Stay up to date with the latest from Uber Engineering

Presto® Express: Speeding up Query Processing with Minimal Resources

7 November 2024 / Global

Introduction

Presto^® is an open-source, distributed SQL query engine designed for running interactive analytic queries on data sources of any size, from gigabytes to petabytes.

At Uber, Presto is a critical engine for data analytics across various departments. The Operations team relies on it for dashboarding, while Uber Eats and marketing teams use its query results for pricing decisions. Presto is also essential to Uber’s compliance, growth marketing, and ad-hoc data analytics, making it a cornerstone of the company’s data-driven operations.

Figure 1: Uber Presto operational overview.

Uber operates around 20 Presto clusters across over 10,000 nodes in 2 regions, supporting approximately 12,000 weekly active users. These users run about 500,000 queries daily, reading around 100 PB from HDFS. Presto is used to query multiple data sources, including Apache Hive™, Apache Pinot™, MySQL^®, and Apache Kafka^®, through its extensible data source connectors.

This blog describes how Uber designed Presto express to reduce end-to-end SLA for fast-running Presto queries.

Problem Statement

Earlier last year, we observed Presto experiencing query slowness for multiple months. To work around this, we had to add more capacity. The problem of query slowness was caused by throttling. To keep ‌Presto clusters from getting overloaded, we have concurrency limits that limit the number of queries that can run concurrently on the cluster. This creates a fixed pipe, and all the queries have to contend for a spot in that pipe.

Figure 2: High-level Presto architecture.

For a query to get accepted for execution, it has to pass the user, consumer, cluster, and cluster group level concurrency checks. Incoming queries are queued up in Exeggutor post-validation and released to Prism for routing if they satisfy the checks. For a particular user or consumer, queries are processed in a first-come, first-served manner.

Figure 3 shows how many times user and consumer-level limits throttled scheduled queries over a 7-day period.

Figure 3: Queries throttled due to consumer and user limitations.

The cluster concurrency level limit also throttles queries. For example, throttling for batch low-tier queries based on cluster limits.

Figure 4: Queries throttled due to cluster availability.

Figures 3 and 4 show the significant throttling that Presto queries have to go through.

However, looking at the Presto query latencies, we saw that P50 execution times of the queries were well under a minute. Here, the execution time only refers to the running time of the query in Presto and doesn’t include any queuing time for the query in Exeggutor.

That meant that out of the roughly 500,000 queries that Presto executed every day, more than half of the queries could easily finish in under a minute. To reduce the queue time of these fast-running queries, we designed a new approach.

Identifying Express Queries

We define an express query as any Presto query that can finish within 2 minutes. To identify express queries, we developed a method using historical data to predict whether an upcoming query is an express query. To do this, we tested the P90 and P95 query execution times using the exact fingerprint and abstract fingerprint of the query with lookback windows of 2 days, 5 days, and 7 days. An exact fingerprint is a query hash after removing comments and whitespaces. An abstract fingerprint also removes literal values from the query. An abstract fingerprint can identify similar queries coming from the same pipeline or data services even though the query isn’t the same.

We used this candidate definition to predict if a query was express: if the X runtime of the query in the last Y days based on Z fingerprint was less than 2 minutes. We explored variations where X was P90 or P95, Y was 2, 5, or 7, and Z was exact or abstract.

This gave us 12 candidate definitions. To compare the various candidates, we defined Accuracy as True Positive/(True Positive +False Positive) and Coverage as (True Positive+False Positive)/ALL.

Figure 7: Confusion matrix of predictions for express queries.

To test the accuracy of the prediction, we wrote a SQL query to analyze the historical data and come up with the prediction. In the analysis, we found that the P90 value of the abstract fingerprint with a 5-day lookback window provided the best accuracy and coverage, with values of 95.7% and 48.99%, respectively. So, we decided to use this query as our primary indicator for express queries moving forward. By implementing this optimized approach, we could better predict and identify express queries. We’ll continue to monitor and refine our model to ensure its ongoing accuracy and effectiveness.

We added a sink to ingest data from our query events topic to a Pinot table that can be queried in real time. Now when a query comes to Presto, we can determine whether that query is express using the Pinot query shown in Figure 9.

Figure 9: Pinot query determines if a query is express.

We use the _count to make sure that we make the prediction based on at least 5 previous runs of the query. If the percentile that we get above is less than 2 minutes and the count is greater than 5, then the query is deemed an express query. Since the Pinot table has the exact fingerprint and data for the last 90 days, we can easily change the method to use the exact fingerprint, a different percentile and/or larger or smaller lookback window.

One drawback of using historical data for prediction is that we won’t be able to predict accurately for new queries coming to the system. However, this isn’t a problem for batch workloads, where most of the queries are scheduled and repeat themselves over a set frequency.

The query prediction latency is very minimal, with most of the predictions being made in under 100 milliseconds.

Initial Express Design

In our initial design of the express feature, we designated one of our existing batch clusters to be an express cluster and added the aforementioned logic in Prism to determine if a query was express and should be sent to the express clusters.

Figure 11: High-level architecture of initial Presto express design.

However, this approach faced several issues:

Underutilization: The express cluster operated significantly below its full capacity, with CPU usage hovering at approximately 20%, while other low-tier clusters operated at nearly 90%. This disparity was evident in metrics, showcasing a noticeable decrease in CPU usage for batch3_b20b, which was designated as the express cluster. This situation highlighted a substantial underutilization of the express cluster’s resources.

Figure 12: Daily CPU usage of each cluster.

Our approach, limited to including the express tag only in the execution process, had inadvertently overlooked query throttling in QR queues. This oversight resulted in our system remaining within consumer and user limits. Paradoxically, the express cluster, despite being designed for quick queries, often remained idle due to meeting these limits. Consequently, other clusters were burdened with an increased query load, exacerbating the situation.

Figure 13: Daily query count of each cluster.

Our initial attempt revealed crucial insights. Even though more than 50% of the low-tier batch queries were express queries, they consumed much less CPU. The resources allocated to express and non-express queries should match the CPU requirements of those queries. Beyond that, the slowdown experienced by the express queries couldn’t be solved just by the runtime isolation of the express queries from the non-express queries. ‌Express queries needed their own queues in the upstream systems (Exeggutor), so that an express query for a user or consumer didn’t get blocked by a non-express query for the same user or consumer sent earlier.

Final Design

In the final system that’s currently in production, we created a separate queue for express queries in Exeggutor that’s matched by a set of small express clusters. The express determination is done at the time of query validation itself, and the query is added to the express queue if applicable. The express queue has much higher user or consumer-level concurrencies, letting them run much more express queries than non-express queries.

Impact

Currently the express feature is only enabled for batch low-tier queries. The low-tier express clusters use around 10% of the total batch low-tier Presto resources, yet they run about 75% of the batch low-tier queries.

Figure 15: Comparing query count of express and non-express queries in the on-prem batch low tier.

They also provide a much better SLA to ‌users. The p90 queuing time for express queries is about 10 seconds, whereas it can be in hours for non-express queries. 01:37:06 in Figure 16 refers to 1 hour, 37 minutes, and 6 seconds.

Figure 16: Comparing P90 queuing time for the express and non-express queries.

The p90 running time of the express queries is also obviously less, and the express queries provide a much better end-to-end SLA for users.

Figure 17: Comparing the P90 runtime for ‌express and non-express queries.

Next Steps

For next steps, we want to look at separating the express cluster group. In our current implementation, express clusters are a sub-group of the major cluster groups like batch low tier, batch high tier, and interactive. The idea was that each cluster group would have its own set of express clusters. However, based on our experience in production,‌ express can be a separate cluster group instead of being a subgroup. So, an express query should be sent to this cluster group regardless of whether the query is a batch high tier, low tier or interactive query. The reasons for this are:

The current express clusters seem to have low utilization and can‌ run much more queries.
Simplified routing logic. We can just determine that a query is express at query submission time and route it to this cluster group.
The SLA given by the express system is low enough that we don’t need to differentiate between batch and interactive queries.

So far, we have only differentiated between express and non-express queries, but we can have more classifications like small/medium/large and each classification going to a separate cluster group.

Another area we’re exploring is query cost prediction. Historical data has been our primary tool for predicting whether a query qualifies as an express query. Having historical data can be a challenge for interactive workloads. To address this, we can harness machine learning models to determine whether incoming queries should be classified as express or not. By integrating advanced algorithms, we can accurately assess the nature of new queries, ensuring more precise and timely classifications, even for previously unseen queries in our system.

Conclusion

This blog explored the design and implementation of Uber’s Presto express, aimed at reducing the end-to-end SLA for short-running queries. We discussed how express queries are defined, how they integrate into the overall ecosystem, and the challenges we encountered. Additionally, we highlighted the performance gains achieved through our deployment, with Presto express delivering an order-of-magnitude improvement in end-to-end SLA for over 75% of scheduled queries.

Apache^®, Apache Kafka^®, Apache Hive^™, and Apache Pinot^™are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.

Oracle, Java, MySQL, and NetSuite are registered trademarks of Oracle and/or its affiliates.

Presto® is a registered trademark of LF Projects, LLC.

“1st Avenue traffic” by Oran Viriyincy is licensed under CC BY 2.0.

Mingjia Hang

Mingjia Hang is a Senior Software Engineer at Uber. She’s been working on enhancing the Presto ecosystem and developing new connectors, including the Pinot Datalake connector.