Data catalog and data lake based on MongoDB: Building tech stack from scratch

Day 1

08:30 PM

The problem: cataloging a large number of unmanaged data sources. The audience: data engineers, data analysts, data solution developers, and data solution architects.

Ivan will talk about building DataCrafter, a data catalog based on MongoDB that covers large, heterogeneous public datasets in complex formats from unmanaged sources.

The catalog includes such rarely implemented features as:

automatic data schema creation;
automatic classification/identification of data types (cadastral numbers, emails, company IDs, links, etc.);
automated documentation;
automatic data quality assessment.
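
To make the listed features more concrete, here is a minimal Python sketch of schema inference, data type classification, and a simple quality metric over a batch of JSON-like records. It is not DataCrafter's implementation: the function names (`infer_schema`, `classify_field`, `fill_rate`), the regular expressions, and the 0.9 threshold are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical patterns for two of the types named above (emails, links).
# A production classifier would need far stricter and more numerous rules.
SEMANTIC_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "url": re.compile(r"^https?://\S+$"),
}


def infer_schema(records):
    """Collect the Python type names observed for each top-level field."""
    seen = defaultdict(set)
    for record in records:
        for key, value in record.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}


def classify_field(values, threshold=0.9):
    """Return the first semantic type that matches at least `threshold`
    of the string values, or None if nothing matches well enough."""
    strings = [v for v in values if isinstance(v, str)]
    if not strings:
        return None
    for name, pattern in SEMANTIC_PATTERNS.items():
        hits = sum(1 for v in strings if pattern.match(v))
        if hits / len(strings) >= threshold:
            return name
    return None


def fill_rate(records, field):
    """Fraction of records in which `field` is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)
```

With records fetched from a MongoDB collection (for example through a driver such as PyMongo), the same functions could be run over a sample of documents, and the results stored alongside the dataset as catalog metadata.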

The talk focuses on the experiments that preceded the creation of the catalog, the technology stacks, the problems being solved, and the limitations.

Speakers

Ivan Begtin
Infoculture

Invited experts

Kseniya Tomak
Dodo Engineering
