Delta Lake data layout optimization

Day 1

EN

In this talk, Sabir will walk you through the physical data layout optimizations available with Delta Lake. He will discuss the factors that make a query execute fast, then outline the different ways users can optimize their workloads by making sure their data is organized in the best way possible. In particular, the talk will look at data partitioning, bucketing, and Z-order. It will cover factors such as data clustering, statistics, optimal file sizes, and Parquet row group sizes. Finally, Sabir will give you a sneak peek at what the team at Databricks is currently working on to push performance to the next level.

  • #storage
  • #storageoptimization

Speakers

Invited experts