Day 3
RU
In this talk, Ekaterina explains why Citymobil chose Exasol as the DBMS for its warehouse and Data Vault as the data model.
Ekaterina Kolpakova
Citymobil
Day 3
RU
In this talk, we will study the Data Modeling Methodology and consider, step by step, the basic principles of creating an effective data model. We will look at typical cases and common mistakes, and learn the rules that help you get the most out of your DBMS and avoid common problems.
Datastax
Day 1
RU
In this talk, we'll discuss ETL workflows inside the Big Data Tools plugin. With this plugin, you can conveniently work with Zeppelin notebooks, monitor Spark and Hadoop applications, and preview cloud file systems and HDFS files right from IntelliJ-based IDEs.
JetBrains
Day 4
EN
Learn how lakeFS simplifies the management of a Data Lake by enabling git-like operations over files in object storage. See how common processes like experimentation, reproducing data and ensuring data quality are simplified with workflows centered around branching, committing, and the merging of data.
Treeverse
Day 2
RU
We will talk about the schedule, sessions, and share the information. Join the broadcast to find out what's on the air soon!
JetBrains
JetBrains
Day 2
RU
In this talk, Kirill will explain how MTS launched a computer vision AI service on edge devices in 500 company offices, what pitfalls the team faced, and how they managed to keep the entire fleet of devices up to date while processing and verifying data from all offices.
Astronomer.io
Day 3
RU
There are several ways to insert data into ClickHouse correctly, and even more ways to do it incorrectly. We'll talk about how to add data to ClickHouse, what pitfalls we can face, and how to avoid them.
Odnoklassniki
Day 3
RU
In this talk, we'll discuss open Data Lake architecture and the Apache Parquet and Apache Arrow data formats. We'll also cover why we need the Apache Iceberg and Delta Lake table formats, and how the Nessie project helps build a SQL Lakehouse on a Data Lake.
Dremio
Day 3
RU
Panel discussion is not recorded
Apache Calcite is a framework that lets you add a SQL interface to your app. In this live coding session, we will teach an imaginary DBMS to execute SQL queries.
Querify Labs
Day 4
RU
Dmitry will explain how to organize data access and workflows for different specialists: engineers, analysts, and data scientists. He will also describe how approaches to allocating computing resources and organizing access have evolved, how the tool set and modeling approaches have changed, and how approaches to moving results into production have developed.
Day 2
RU
Implementing Airflow as SaaS in a K8s private cloud, and the experience of migrating from Airflow 1.x to Airflow 2.x SaaS.
Day 2
EN
This talk focuses on techniques employed in hybrid storage systems to reduce cloud footprint and improve efficiencies.
Netflix
Day 1
RU
The data warehouse appeared at Avito more than 7 years ago. Since then, the business has grown severalfold and the infrastructure has become more complex. Evgeny will explain how a product approach to platform development helps solve dozens of analytical problems every day without growing the DWH team many times over.
Avito
Day 1
EN
In this talk, Andy will discuss the challenges of using ML to optimize DBMS knobs and the solutions developed to address them. The presentation will be in the context of the OtterTune database tuning service. Andy will also highlight the insights learned from real-world installations of OtterTune for MySQL, Postgres, and Oracle.
Carnegie Mellon University
Day 3
RU
Almost any company that works with data finds it necessary to store and process that data in different systems, depending on the task. This creates demand for a service that can quickly and efficiently transfer data between those systems. To solve this problem, Yandex has developed Data Transfer, a cross-system data replication service, and Andrey will talk about it.
Yandex
Day 4
RU
We take stock, remember the bright moments and talk about our plans. Join the broadcast, so you don't miss anything!
Dodo Engineering
JetBrains
Day 2
RU
Distributed SQL engines must process data across multiple servers. In this talk, Vladimir will explain, using Apache Flink and Presto as examples, how distributed SQL engines are designed and what approaches they use to increase query performance.
Querify Labs
Querify Labs
Valiotti Analytics
Day 2
EN
In this session, we will take a deep dive, with practical examples, into how to map external data with Vertica, which options Vertica offers for pushing queries down to external data repositories, and the technologies behind them. Differences between Vertica and some other solutions will also be explained.
Day 2
RU
Let's talk about why you need to assess the potential impact on the system before making any changes to a pipeline in a production environment. You will find that sometimes the pipeline is so complex and entangled in dependencies that it is almost impossible to predict the outcome without experimenting.
Profitero
Day 1
RU
We will talk about the schedule, sessions, and share the information. Join the broadcast to find out what's on the air soon!
JetBrains
Klarna
Day 4
RU
Panel discussion is not recorded!
We will talk about Hudi, Delta Lake, Iceberg, and other storage formats. Quasi-mutable data storage formats are not only trending, but also mysterious. In this discussion, we will try to figure out what's on the market and where it is all going.
JetBrains
JetBrains
Day 1
EN
In this talk, Sabir will walk you through the physical data layout optimizations available with Delta Lake. He will discuss the factors that make a query execute fast.
Databricks Inc
Day 3
RU
We will talk about the schedule, sessions, and share the information. Join the broadcast to find out what's on the air soon!
Dodo Engineering
Yandex
Day 3
RU
This talk will look at the NiFi ETL tool — its pros and cons, tools and methods for monitoring, and the development process for a large number of teams.
Leroy Merlin
Day 4
RU
The data engineer's role is critical. What skills should a data engineer have, and how well should they know code, algorithms, and data science? Dmitry has identified two types of data engineers and will describe them in this session.
Microsoft
Day 3
EN
Is it possible to set up Spark so that it never touches hard drives and is therefore memory-fast? That's the question Jacek is going to answer during the talk. You'll learn a bit about the internals of Apache Spark, which parts are or could be memory-only, and what challenges that poses.
Day 1
RU
We'll talk about Trino. You'll learn about working with data from primary sources, combining and enriching it, and running subsecond queries. We'll also talk about hidden opportunities, new functionality, and what is available in the project and its forks.
Huawei
Day 4
RU
BigData MTS has grown and matured, but some of the problems it ran into while developing ML still remain. And, as it turned out, the team is not alone in fighting them.
Day 3
RU
Panel discussion is not recorded!
Pasha managed to work in different IT areas — system administration, development, management, data engineering, and now he works on Big Data Tools at JetBrains.
JetBrains
JetBrains
Day 4
RU
The participants in the discussion will try to raise various tricky questions in the spirit of "how convenient is it to store raw data NOT in HDFS" and "is it possible to simply move everyone to a SQL engine", as well as "can you summon a demon with the words Data Mesh, Delta Lake, Anchor" and "how do you build a Kappa architecture in real life, and what is it all about".
Day 3
RU
In this talk, Dmitry will describe how DS teams work at Ozon and what their infrastructure looks like.
Ozon
Day 1
RU
In this talk, Dmitry will explain how to write functional Spark code in Scala at maximum speed.
Ozon
Day 4
RU
During this talk, we will discuss what a data engineer's life consists of and how we help data engineers with Big Data Tools.
JetBrains
Day 2
EN
In this talk, Ton will discuss how to get faster and more secure access to data for testing purposes by generating private data that (a) emulates the state of a dataset/database and (b) increases testing coverage. Several OSS tools are available, but, as usual, the devil is in the details.
Synthesized
Day 4
RU
We will talk about the schedule, sessions, and share the information. Join the broadcast to find out what's on the air soon!
Dodo Engineering
JetBrains
Day 1
RU
Erasure coding in Hadoop 3: a story about how the pursuit of smart savings can turn out to be (almost) a disaster, and how to avoid it. Based on real petabytes of data and a sea of tears.
Odnoklassniki
Day 2
RU
Projector is a self-hosted technology that launches IntelliJ-based IDEs and Swing-based apps on a server, providing you with access to them from anywhere using browsers and native apps. Let's find out how it works and what's inside.
JetBrains
Day 2
RU
In this talk, Arthur will discuss all aspects of building a remote user authentication system on the web, taking into account current technical and legal realities.
Tazeros
Querify Labs
Day 1
RU
Ivan will talk about building DataCrafter, a MongoDB-based data catalog, on top of large, heterogeneous public data in complex formats from unmanaged sources.
Infoculture
Day 4
EN
You'll be introduced to Exasol, the world's fastest analytical database. You will discover how Exasol can simplify your life and make having a data warehouse fun again.
EXASOL
Exasol
Day 1
RU
In this talk, Evgeny and Nikolay will explain how dreams of architectural beauty shatter against reality.
Yandex Go
Yandex Go
Day 1
RU
Over the last ten years, cloud computing has made a gigantic leap and fundamentally changed the way we approach building systems. In this talk, we will discuss how the modern capabilities of cloud infrastructure change the core principles and architecture of a database. We will see how separating compute and storage improves the scalability and availability of the system while giving end users a more predictable cost.
Querify Labs
Cherry Labs
Day 2
RU
Imagine that a company needs to build a powerful analytical platform. ManyChat built such a platform, choosing the latest tools for maximum convenience and minimal cost of ownership. Nikolay will describe the selection process at each step of building the platform, the possible risks, and the resulting experience.
ManyChat
eyeota.com