
How data delivery works at Yandex and why we're no longer afraid to transfer JSON

Day 3

RU

Almost any company that works with data eventually needs to store and process it in different systems, depending on the task.

Analysts live in ClickHouse and Greenplum, while rainy-day backups are shipped off to cheap HDFS and S3. Developers want to upload whatever they get into Elastic and Kafka, and every Yandex employee wants it in the best storage in the world, the one they and their friends wrote themselves. But the boss insisted on Oracle. In a world like this, there is demand for a service that can move data between all these systems quickly and efficiently.

To solve this problem, Yandex developed Data Transfer, a service for replicating data across systems. Hundreds of teams already use it, continuously moving tens of gigabytes of data per second, and some time ago it also became available to Yandex.Cloud users.
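The abstract does not describe Data Transfer's internals, so what follows is only a minimal, generic sketch of the core loop behind cross-system replication. It assumes a source that can be read from a numbered position and a sink that accepts batches of rows. `Source`, `Sink`, `ListSource`, `ListSink`, and `replicate` are hypothetical names chosen for illustration, not Data Transfer's API or design.

```python
from dataclasses import dataclass, field
from typing import Iterator, Protocol


class Source(Protocol):
    """Hypothetical source interface: yields (position, row) pairs after a checkpoint."""

    def read(self, after: int) -> Iterator[tuple[int, dict]]: ...


class Sink(Protocol):
    """Hypothetical sink interface: accepts a batch of rows."""

    def write(self, rows: list[dict]) -> None: ...


@dataclass
class ListSource:
    """In-memory stand-in for a real source system (hypothetical)."""

    rows: list[dict]

    def read(self, after: int) -> Iterator[tuple[int, dict]]:
        # Positions are 1-based offsets into the list; skip everything up to `after`.
        for pos, row in enumerate(self.rows, start=1):
            if pos > after:
                yield pos, row


@dataclass
class ListSink:
    """In-memory stand-in for a real target system (hypothetical)."""

    written: list[dict] = field(default_factory=list)

    def write(self, rows: list[dict]) -> None:
        self.written.extend(rows)


def replicate(source: Source, sink: Sink, checkpoint: int, batch_size: int = 2) -> int:
    """Copy rows from source to sink in batches and return the new checkpoint.

    The checkpoint advances only after a batch has been written, so a crash
    between the write and saving the checkpoint leads to a re-send
    (at-least-once delivery) rather than a silent gap. A real system would
    persist the checkpoint durably; this sketch simply returns it.
    """
    batch: list[dict] = []
    last_pos = checkpoint
    for pos, row in source.read(after=checkpoint):
        batch.append(row)
        last_pos = pos
        if len(batch) == batch_size:
            sink.write(batch)
            checkpoint = last_pos  # advance only after a successful write
            batch = []
    if batch:  # flush the final partial batch
        sink.write(batch)
        checkpoint = last_pos
    return checkpoint
```

For example, replicating five rows from `ListSource([{"id": i} for i in range(1, 6)])` with `checkpoint=0` writes all five rows and returns `5`; calling `replicate` again with that checkpoint copies nothing new. Committing the position after the write, not before, is what keeps a restarted transfer from losing data, at the cost of possible duplicates that the target must tolerate or deduplicate.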

The talk will be useful both to developers interested in distributed systems for delivering big data and to data engineers, who will learn how this widely used tool works under the hood.

  • #architecture
  • #dataingestion

Speakers

Invited experts