Program SmartData 2021

Despite his education in psychology, for 14 years Pasha managed to work in a lot of IT areas — system administration, development, management, data engineering, in general, touched almost everything that exists in IT. More than 10 years ago he started practicing DevOps and never focused on just one thing. Now Pasha works at JetBrains on Big Data Tools – tools to make data engineer's life easier. Very sociable, loves, and understands people and is always happy to answer any questions.

Workshop. Making engineers' lives easier with Big Data Tools

Over 2 years in data analytics and BI development. Assists businesses with marketing and product. He is engaged in building and automating reporting, implementing and optimizing analytics, and adjusting data-driven marketing strategies. Besides work Roman together with Dmitry Anoshin is co-founder of DataLearn free educational project, where he is organizing webinars and mentors students.

ayudovin

For the last 5 years, Artem has been working in the area of Big Data. In this area, he came across completely different projects from publishing whitepapers on a NoSQL databases benchmark to writing standard pipelines. He works for Profitero as a tech lead of data engineers. In his free time, Artem tries to take part in different open source projects.

From one big ETL job to experimenting with data pipelines

Versatile engineer with over 8 years of experience in IT. For a long time worked on high-load systems in cludes as a full-stack developer. Last 2 years working in Big Data field. There are several publications of NoSQL database benchmarks.

For 12 years he has been developing software for collecting and processing data from various devices. In 2008, after defending his PhD thesis on the use of Petri nets in the development of remote control and management centers, he started developing business intelligence systems, integration and other Data Engineering and BI projects. For 15 years he taught at NSTU and Microsoft certified training centers.

algoncharuk

Alexey Goncharuk is the Chief Researcher at Querify Labs, where he works on researching query optimizers, distributed systems, and data storages. Alexey worked on building distributed in-memory data management platform Apache Ignite for over nine years where he focused on developing persistence layer and distributed protocols. Alexey is a committer of the Apache Ignite project.

brekelov

Vsevolod has more than 11 years in IT industry. He tried his hands at different roles and directions. He worked with automated testing, fullstack development in different languages. Some time ago he lived in California and worked under contract at Google. Managed to work with various clients from financial sector. Worked at NEO SPCC startup where he was developing on Go.

Bronislav has been working in IT for over 17 years and near 15 years in Software development. For the last 2 years, he has been developing the project "Raw data to DataWarehouse" in a Tinkoff as an Architect and a product owner. Creates an inner product base on Apache NiFi. Bronislav is also an administrator and active member of the NiFi Users community in Russia (@nifiusers).

Graduate of the Faculty of Geography, Moscow State University. Data Janitor since 2015. Long ago was a cartographer and geospatial engineer, but decided not to stop there. Once made a corporate GIS for foresters all over Russia. Worked with the Ministry of Defense and since then has been able to remain calm in any situation. Accidentally walked near Big Data, and since then all the things have started. For the last 2.5 years Andrey has been making the aviation industry better with S7 airlines.

7 years of full-time IT teaching, Information security and math stats. Ph.D. Currently works on developing recommender systems at ok.ru.

He has over 20 years of experience in the IT industry. He began his career by developing systems for monitoring and load testing, and has extensive experience in designing and administering data warehouses, as well as developing data center computing infrastructure. He was involved in deployment of configurations based on SAP solutions and headed IT services migration projects. He is currently the architect of Vertica solutions in Russia and the CIS.

Biography will be added soon

arboook

Graduated from Moscow Institute of Physics and Technology, moved from physics to the creation of IT products. Supervised products at Gaprombank and Otkrytie. Co-founder of COVI Retail startup. At the moment he is engaged in projects with EDGE computer vision at MTS.

Data processing and verification for computer vision in MTS sales offices throughout Russia

A graduate of Innopolis University. After graduation worked at Autodoria, where I got my first experience with computer vision. The next (and current) place was MTS. Currently he is the head of video analytics at MTS AI.

Andrey is a Staff Software Engineer at Google in the Core Data organization, where he is responsible for engineering productivity for petabyte scale OLAP/query processing systems. He is an active participant in the distributed systems community and is serving on the program committee of Hydra conference. In the past he was working on Amazon Aurora at Amazon Web Services, a distributed SQL database at Yandex, cloud antivirus detection system at Kaspersky Lab, an online multiplayer game at Mail.ru, and a foreign exchange pricing service at Deutsche Bank. He is interested in building large scale distributed databases and backend systems.

Vertica Field Chief Technologist. Maurizio has started writing complex code in Fortran in 1985 during his Master’s Degree in Physics when he has built sensors and software to capture and analyse gravitational waves signals. Maurizio has started working in 1986 coding Unix device drivers. In 1992 Maurizio has started working with databases and has implemented his first large Data Warehouse in 1998 when he was in Oracle. In 2006 Maurizio has joined Hewlett-Packard and started working with large MPP databases. In 2011 he begun working with Columnar Databases in Vertica. Maurizio knows several databases, many programming languages and different Data Warehouse Architectures. He has coded several tools in order to: move data from one database to another, assess database throughput and analyse Query Performance. Maurizio has also contributed to the development of the Vertica Federated Queries.

How to bring advanced analytics to hybrid data storage with Vertica

Vertica Field Chief Technologist. Worked with relational databases since 1989; with data warehouses since 1992/1993. Worked for Vertica ever since HP bought Vertica in 2011. Specializes in Big Data architectures and data warehousing ecosystems.

How to bring advanced analytics to hybrid data storage with Vertica

Gianluigi is a Software Engineer, located in Milan. His expertise lies in Data Architecture with a focus on Information Extraction. He contributes with the R&D team to increase Vertica integration with the opensource ecosystem (Hadoop, Kafka, Spark…). Before joining Vertica, Gianluigi worked in several Information Technology companies, as a System Engineer and Technical Architect for parallel cluster and parallel databases architectures.

How to bring advanced analytics to hybrid data storage with Vertica

enchantner

An infrastructure-dealing engineer with almost 10 years of software development using various programming languages and platforms. About 8 years of Python programming experience as well as ~3 years of using Go, good knowledge of web technologies.

Teaching, mentoring, writing and translating articles on Python, Linux, Big Data, clouds, networking and algorithms. Expertise includes distributed and high-performanced systems, networking, algorithms, concurrency/parallelism, capacity planning and basic statistical data analysis. DevOps and CI/CD enthusiast.

Round table: What if not Hadoop

"If artificial intelligence is our future, then big data is the coal of the locomotive that will bring us into it".

Maksim is working with data for 10 years. He has been building ETL Pipelines, Data Storages, analyzing Data, and working on Visualisation in government companies (RCOI), energy companies (MOEK, GAZPROM), Banks (BRC, VTB24), and IT companies (Yandex, Mail.Ru). Big Data is his wife and mistress. He's always ready to talk about it.

Round table: What if not Hadoop

Over 15 years of experience in the IT field. For the last 1.5 years, Nikolay has been developing data storage at Yandex Go. Specializes in MPP Greenplum DBMS.

Greenplum and Anchor modeling: How dreams shutter against reality

iJKos

More than 10 years of experience in IT. Architect of data warehouses and analysis systems at Mail.ru Group and Yandex Go. Candidate of Technical Sciences, author of more than 10 papers in data analysis, co-author of a monograph on the theory and practice of parallel database analysis.

Greenplum and Anchor modeling: How dreams shutter against reality

__ali

Artem is a Huawei expert in big data technologies and graph databases. Before that, he integrated Spark, TinkerPop, Cassandra at Datastax, led a data storage performance optimization team at EMC, and developed Apache Harmony J2SE.

Trino (Presto) DB: Zero copy lakehouse

Nikolay is the Head of Data Engineering of ManyChat (SaaS startup), responsible for the implementation and growth of its Data Platform (AWS+Redis+Snowflake+Tableau). Previously, from 2013 till 2019 he's headed the Data Platform of Avito, Craigslist of Russia, which grew to a multi-billion dollar company from a small startup. In Avito he was responsible for analytical databases (Vertica, ClickHouse), OLTP engines (PostgreSQL, Redis, MongoDB), and data buses (Kafka) for analytics and micro-services integration. In parallel with those jobs, Nikolay is a researcher of Higher School Economics in Moscow, Russia, having few international publications about data warehousing (Anchor Modeling) and aspects of big data processing.

Design steps of building analytical data platform in the clouds

Graduated from St. Petersburg State University in 2004, got a PhD degree in the field of the formal logical methods in 2007. Spent almost 9 years in outsourcing without losing contact with the university and research community. Big data analysis at Odnoklassniki became for Dmitry an unique chance to combine theoretical knowledge and scientific foundation with the development of real and popular products. And this chance he gladly took advantage of by coming there in 2011. Joined Sberbank team in 2019.

D-people workplace — Sber's experience

devozerov

Vladimir Ozerov is the founder of Querify Labs, where he manages the research and development of innovative data management products for technology companies. Before that, Vladimir worked on in-memory data platforms Apache Ignite and Hazelcast for more than eight years, focusing on distributed data processing. Vladimir is a committer to Apache Calcite and Apache Ignite projects.

How to design a high-performance distributed SQL engine

Tejas Chopra is a Senior Software Engineer, working in the Data Storage Platform team at Netflix, where he is responsible for architecting storage solutions to support Netflix Studios and Netflix Streaming Platform. Tejas has worked on distributed file systems & backend architectures, both in on-premise and cloud environments as part of several startups in his career. Tejas is an International Keynote Speaker and periodically conducts seminars on Micro services, NFTs, Software Development & Cloud Computing and has a Masters Degree in Electrical & Computer Engineering from Carnegie Mellon University, with a specialization in Computer Systems.

An experience report on strategies for working with Cloud Storage

Sabir is a software engineer at Databricks working on optimizing physical data layouts for the best performance. Before that, he worked in Databricks performance engineering and benchmarking team.

Sabir was born in Kazakhstan and since then has lived in 4 different countries. He's interested in learning new languages, technologies, and sports, mostly powerlifting and Russian kettlebells.

Delta Lake data layout optimization

hadesarchitect

After many years in Software Development as a developer, technical lead, DevOps engineer, and architect, Aleks focused on cloud computing and distributed systems. Professional Cloud Architect and Developer Advocate, he shares his knowledge and expertise in the field of high-performant and disaster tolerant systems.

Workshop. Building an efficient data model for high-performance applications with Apache Cassandra (part 1) Workshop. Building an efficient data model for high-performance applications with Apache Cassandra (part 2)

ashberlin

Ash has been a contributor to Airflow for almost four years and is a member of the Project Management Committee (a.k.a. the Core team) for almost as long. He was the Release Manager for much of the 1.10 release series and he also re-wrote much of the Scheduler internals to be highly-available and increase performance by an order of magnitude (AIP-15).

Outside of Airflow he is the Director of Airflow Engineering at Astronomer.io where he runs the team of developers contribute to the open source Airflow project.

Apache Airflow 2.3 and beyond: What comes next?

andy_pavlo

Andy Pavlo is an Associate Professor of Databaseology in the Computer Science Department at Carnegie Mellon University. He is also the co-founder of OtterTune.

Lessons learned from using machine learning to optimize database configurations

jaceklaskowski

Jacek is an IT freelancer specializing in Apache Spark, Delta Lake, Apache Kafka and Kafka Streams (with brief forays into a wider data engineering space, e.g. Presto). Jacek offers software development and consultancy services with very hands-on in-depth workshops and mentoring. He is best known by his online books available free of charge at https://books.japila.pl/.

Apache Spark as an in-memory-only data processing engine?

Valerie began her career as Pre-Sales Engineer at EXASOL in 2018. At the start of actively technically consulting prospects — future customers of Exasol. Her responsibilities included deep dive into EXASOL's product capabilities and features, preparing testing environments, delivering POCs, and building SOWs for Data Warehouse migrations into EXASOL. The portfolio of customers Valerie worked with some largest insurance and retail organizations in Germany and Central Europe.

How an analytical database stopped me smoking: A practical story with Exasol

Engineer with over 10 years of hands-on experience in IT. For the past 4 years, Andrey has been dealing with large distributed systems and, in particular, data delivery systems, which he has gradually combined into a universal data delivery service — Yandex DataTransfer.

How data delivery works in Yandex and why we're no longer afraid to transfer JSONs

madhape

Graduate of MSU Faculty of Computational Mathematics and Cybernetics. More than 14 years of experience in fintech and telecom as a developer, architect, expert of data governance, and product owner. Now he builds the MLOps platform at MTS.

Round table: What if not Hadoop

10 years at MTS, in data analytics and numerical marketing, marketing strategy, then headed the functions of data science and data governance, the Big Data team. For 1.5 years as the executive director — chief data scientist (CDS) of Sber, working with distressed assets. For 2 years worked as the leader of the Big Data team of the Social Block of the Moscow Government. Now Nikolay is a private consultant, open to new projects.

Round table: What if not Hadoop

Trying to do something useful with data since 1993 as DBA, DBA team lead, DB/DWH developer. Has experienced the ups and downs of different approaches to data processing. Today Valdis is the data processing team lead at Evolution.

Round table: What if not Hadoop

If "data is the new oil", then Dmitry is responsible for all steps in working with this it, from well drilling and production to refining and transportation. Dmitry has been building and maintaining data warehouses and data lakes in companies and startups on the Apache technology stack (Hadoop, Hive, Impala, Spark) for the past 8 years. In Leroy Merlin he built a ~500TB storage data platform based on DWH Greenplum, with a lake on top of S3, NiFI, and Flink ETL tools, and an operational layer at Clickhouse. Fan of open source and good dialog partner.

NiFi on a large scale: Architecture, monitoring, best practices

ibegtin

Founder of Infoculture, created to popularize the openness of data, the state, digital preservation, and other related technological public topics. He also develops APICrafter/DataCrafter startup to create catalogs and data lakes, primarily based on open data.

Before that, Ivan created state, private and public information systems and IT products.

Data catalog and data lake based on MongoDB: Building tech stack from scratch

Head of DWH at Citymobil. Developed DWH (BigData) at Tinkoff and Mail.Ru Group. Lecturer of the open course "Designing Big Data Warehouses" at the Mail.Ru Technopark at the BMSTU and MSU.

A tale about how we build DWH: From MySQL replicas to Exasol + ClickHouse

sweetweet213

DataStore Enthusiast, Doodle Maker, Tango Lover & fellow coder.

Currently a senior data engineer at eyeota.com — the world's largest audience data marketplace. Formerly at Flipkart.com — India's largest e-commerce company, was part of its data team, MySQL engineering team, website & warehouse/order management teams.

Spark Yoga — saving time & money with lean data pipelines

synorga

Christian Langmayr heads the development of the global Exasol Community with End Customers, Academics, Partners, and technology Alliances. He is passionate about keeping and growing the special spirit that goes beyond the software developed and strives for positive interactions between all parties to drive the development of individuals involved. He has more than 15 years of experience in the IT industry with previous positions in MicroStrategy and Toshiba. Christian holds a degree in Business Administration from the Catholic University in Eichstätt, specializing in Services Management and Marketing. His focus is on supporting business growth, improving processes, and developing a data analytics ecosystem that empowers Exasol to grow in its relevant markets.

How an analytical database stopped me smoking: A practical story with Exasol

Graduated from MSU Faculty of Computational Mathematics and Cybernetics in 2015. For more than 6 years he worked as a programmer, for more than 3 years he has been managing teams. Now he's a lead of DWH unit in Avito. A fan of cool products and implements the DWH strategy as a product. In his free time, Evgeny plays football (captain of Avito team), chess, and learns Spanish (B2).

DWH as a product

Data Engineer at MTS Big Data and a lead of data platform development unit. Built an ETL platform of internal fintech stream, took part in the development of BDaaS (Big Data as a Service) product and MTS Big Data ETL Framework. Currently developing Feature Store.

How we build Feature Store

TonBadal

Ton is an Engineer passionate about Machine Learning and AI. Before joining Synthesized, he worked for a challenger bank in the UK improving their decision process by exploiting their data, and before that, he obtained his MSc in Artificial Intelligence at the University of Edinburgh.

Optimizing test data coverage in functional testing

PhD in Economics, worked in major Russian companies: built analytics at the Lenta network, was responsible for analytical processes at Yota, did forecasting at Baltika, headed the analytics department and then the marketing department at Yulmart, headed the Data & BI direction at US company Airpush. In 2019, he founded Valiotti Analytics, where he provides analytics consulting for mobile and digital startups. Co-founder of the open source self-service BI platform Mprove. Author of the blog leftjoin.ru.

Self-service BI: Data model building practices

8 years in the machine learning industry, 4 years in developing computer vision systems at Cherry Labs. Interested in building ML pipelines, optimizing models, making stuff automated and flexible, for the needs of both production and research.

ML model lifecycle at Cherry Labs

Dmitry has been developing in Scala since 2014, developing everything from simple CRUD APIs to stateful distributed services. In recent years he has been working on DE and developing different kinds of DE tools.

"Functional" Spark

For the last year, Vadim has been working in the Big Data team at Tele2: he makes pipelines, develops internal frameworks, and starts contributing to Airflow. Before he worked at Cian as lead developer, stood at the origins of its rapid development, and was engaged in the development of many features that exist on the site.

Airflow 2.х SaaS

Sergey has over 5 years of experience in DevOps and SRE. Previously, he was involved in the development of Observability and IaaC directions within the TK Center. Now he's helping in the development of his own Hadoop distribution at Tele2. He is also actively developing the SaaS approach in the BigData sphere.

Airflow 2.х SaaS

For more than 6 years, Mikhail has been implementing DevOps practices and ubiquitous automation. He's one of the developers of SaaS cloud Bit.Live, he also successfully defeated the ancient manual monolith "TK Center", moving it onto comfortable IaaC rails. Now Mikhail is a part of the Hadoop distribution development at Tele2 and is also involved in the development of SaaS/PaaS solutions in the BigData team.

Airflow 2.х SaaS

Denis works in BigData, mostly with Hadoop since 2013 and now he's a lead developer at Mail.ru Group. He has been designing and developing a platform for storing and processing statistical data for the Odnoklassniki project since 2018.

Hadoop 3: Erasure coding catastrophe

IT engineer and architect with 10 years of experience. For the last 7 years, he has been working on distributed systems in general and Big Data in particular. Now Artem is a lead developer at Mail.ru Group/OK.RU, Data Platform team. Worked with data at Grid Dynamics for 4 years and has gone from Data Engineer to Data Architect role. Also used to be a full-time Apache Ignite contributor, that is why Artem knows how distributed systems work under the hood.

Insert into ClickHouse and not die

Roman was involved in building the distributed SQL for Apache Ignite at Gridgain Systems. Then he worked at Yandex, where he was engaged in Yandex Query Language. Now he works at Querify Labs that advise technology companies on database development.

How to employ Apache Calcite for building a SQL layer for any system

Itai is an R&D team leader at Treeverse, the company behind open source lakeFS. He thrives on finding creative solutions for complex problems, especially if it involves code. Previously, Itai worked at Microsoft and Ridge on data infrastructure, tooling, and performance. Itai received his B.S. degree in Computer Science and an MBA from Tel Aviv University.

Create a git-like experience for Data Lake analytics

Analytics and Data Engineer Leader with 10+ years of experience working in Business Intelligence, Data Warehouse & Data Integration, BigData, Cloud, and ML space across North America and Europe.

Apart from work, Dmitry is teaching a Cloud Computing course at the University of Victoria, mentoring high school students at CS faculty, and volunteering my time for coaching people with analytics engineering skills in the CIS region. Moreover, he's the author of analytics books and a speaker at data-related conferences and user groups.

Two types of data engineers

ViktorKessler

Viktor Kessler is Sr. Solutions Architect at Dremio since December 2019. Before joining Dremio spent multiple years working at MongoDB, ERGO, and PwC as a Solutions Architect on topics of Big Data, DW, and digital transformation projects.

Dremio SQL Lakehouse: Fast data for all

Schedule

Day 1. October 11

Day 2. October 12

Day 3. October 13

Day 4. October 14

Schedule

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)

Talk(s)