
Queryparser, an Open Source Tool for Parsing and Analyzing SQL

1 March 2018 / Global
Figure 1: Uber’s data warehouse streaming architecture feeds all queries through Queryparser. Boxes denote services and pipes denote data streams. The catalog info service tracks the schemas of the tables in the data warehouse.
Figure 2: Queryparser takes three passes to fully process a query: parse, resolve, and analyze. The top flow illustrates this sequence conceptually as a transformation of data types. The bottom flow illustrates the sequence concretely on a real query.
Query
drop A_new if exists
create A_new as select … from B
insert into A_new select … from C
drop A_old if exists
rename A to A_old
rename A_new to A
| Query | Table lineage of query | Cumulative observed dataflow | Interpretation of cumulative dataflow |
| --- | --- | --- | --- |
| drop A_new if exists | A_new has no dependencies | | A_new has no dependencies |
| create A_new as select … from B | The data in A_new was determined exclusively by the data in B | | The data in A_new was determined exclusively by the data in B |
| insert into A_new select … from C | The data in A_new was determined by the previous data in A_new and the data in C | | The data in A_new was determined by the data in B and the data in C |
| drop A_old if exists | A_old has no dependencies | | |
| rename A to A_old | | | |
| rename A_new to A | | | |
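
The first four rows of the table can be read as a fold: each query contributes its own table lineage, and the cumulative dataflow replaces every source table with whatever that table has been built from so far. The following is a minimal Python sketch under that reading, not Queryparser's implementation. The names `Lineage`, `apply_lineage`, and `cumulative_dataflow`, and the dictionary representation of lineage, are assumptions made for illustration. The two rename queries are left out because the table shows no lineage for them.

```python
from typing import Dict, List, Set

# Hypothetical representation: target table -> set of tables its data came from.
Lineage = Dict[str, Set[str]]


def apply_lineage(cumulative: Lineage, query_lineage: Lineage) -> Lineage:
    """Fold one query's table lineage into the cumulative dataflow."""
    # Copy so the caller's cumulative state is not mutated.
    updated = {table: set(sources) for table, sources in cumulative.items()}
    for target, sources in query_lineage.items():
        resolved: Set[str] = set()
        for source in sources:
            # A table seen before is replaced by what it was built from so far;
            # a table never seen before is treated as a raw source (assumption).
            resolved |= cumulative.get(source, {source})
        updated[target] = resolved
    return updated


def cumulative_dataflow(lineages: List[Lineage]) -> Lineage:
    """Apply the per-query lineages in order, starting from an empty state."""
    cumulative: Lineage = {}
    for query_lineage in lineages:
        cumulative = apply_lineage(cumulative, query_lineage)
    return cumulative


# Table lineage of the first four queries, transcribed from the table.
observed = [
    {"A_new": set()},           # drop A_new if exists
    {"A_new": {"B"}},           # create A_new as select … from B
    {"A_new": {"A_new", "C"}},  # insert into A_new select … from C
    {"A_old": set()},           # drop A_old if exists
]

result = cumulative_dataflow(observed)
print(result)
```

After the insert, the self-reference to A_new is replaced by its earlier source B, so `result` maps A_new to B and C, matching the third row's interpretation, and maps A_old to an empty set.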
Figure 5: A sample data flow graph representing four raw tables (A, B, C, D) and three modeled tables (E, F, G) portrays how queries are processed by Queryparser. In practice, the raw tables typically come from upstream operational systems such as Kafka topics, Schemaless datastores, and service-oriented architecture (SOA) database tables. The modeled tables are exposed in the data warehouse (Hive) and in downstream data marts (Vertica).

Posted by Matt Halverson