Engineering, Data / ML

Sparkle: Standardizing Modular ETL at Uber

15 August 2024 / Global
Figure 1: Data Technology Stack At Uber.
Figure 2: Components that are expected to be packaged as part of an ETL tool.
Figure 3: High-level flow of the Sparkle framework.
Figure 4: Details of different Technical Components used in Sparkle.
Figure 5: Configuring the workflow in Base YAML, defining relationships between the modules.
Figure 6: SQL transformation, reading from the source tables with the required filters defined as Jinja template variables.
Figure 7: Class Transformation, implementing ITransform interface method apply().
Figure 8: Configuring applicationConfigMap, writeConfigs, and connector configs in Env YAML (prod, dev, staging).
Figure 9: SQL validation queries, each of which evaluates to a Boolean. A unit test is considered to have passed if all of its test cases (validation SQLs) assert TRUE.
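
The pass criterion in the Figure 9 caption can be illustrated with a minimal sketch. This is not Sparkle's implementation: it uses Python's built-in `sqlite3` as a stand-in for the real SQL engine, and the table `trips_output`, the query names, and the helper `run_validations` are all hypothetical names chosen for illustration.

```python
import sqlite3


def run_validations(conn, validation_sqls):
    """Run each validation query and map its name to a Boolean result.

    Hypothetical helper: each query is assumed to return one row with one
    column that is truthy when the check passes.
    """
    results = {}
    for name, sql in validation_sqls.items():
        row = conn.execute(sql).fetchone()
        results[name] = bool(row is not None and row[0])
    return results


# Stand-in for a pipeline's output table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips_output (trip_id INTEGER, fare REAL)")
conn.executemany(
    "INSERT INTO trips_output VALUES (?, ?)",
    [(1, 12.5), (2, 8.0), (3, 20.25)],
)

# Each validation SQL evaluates to a single Boolean value.
validation_sqls = {
    "row_count_matches": "SELECT COUNT(*) = 3 FROM trips_output",
    "no_negative_fares": "SELECT COUNT(*) = 0 FROM trips_output WHERE fare < 0",
    "trip_id_unique": "SELECT COUNT(*) = COUNT(DISTINCT trip_id) FROM trips_output",
}

results = run_validations(conn, validation_sqls)

# The unit test passes only if every validation SQL asserts TRUE.
test_passed = all(results.values())
```

Keeping a per-query result map, rather than only the final Boolean, makes it easy to report which validation failed when the overall test does not pass.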
Figure 10: Streamlining ETL: From complexity to simplicity with Sparkle Framework.
Dinesh Jagannathan

Dinesh Jagannathan is a Staff Engineer on the Data Intelligence team. He is focused on building scalable data products to improve data quality, standardizing best practices, and improving developer productivity.

Sharath Bhat

Sharath Bhat is a Senior Software Engineer in the Data Intelligence team. He is focused on designing big data systems and building ETL frameworks, boosting developer productivity, enhancing data quality, and evangelizing best practices.

Suman Voleti

Suman Voleti is a Staff Engineer in the Global Data Warehouse team. He is focused on building ETL frameworks to standardize the creation of batch pipelines with better performance, data quality, and observability.

Praveen Raj

Praveen Raj is a Software Engineer in the Data Intelligence team. He loves working on foundational problems in the data world and creating simple, reusable tools and frameworks as solutions. He works on designing big data datasets, their ingestion systems, and ETL frameworks.

Posted by Dinesh Jagannathan, Sharath Bhat, Suman Voleti, Praveen Raj