Maximizing Process Performance with Maze, Uber’s Funnel Visualization Platform
16 August 2018 / Global
At Uber, we spend a considerable amount of resources making the driver sign-up experience as easy as possible. At Uber’s scale, even a one percent increase in the rate of sign-ups to first trips (the driver conversion rate) carries a monumental impact.
In December 2016, Uber data scientist Andrey Liscovich hypothesized that traditional funnel analytics tools were not adequate for studying the actual driver sign-up experience because they treated it as a fixed sequence of steps, while in practice, the path from sign-up to first trip is a complex maze that any two drivers might navigate differently. To gain a more realistic understanding of how users interact with the sign-up technology, he started a cross-functional effort to develop a new funnel visualization platform, called Maze, that recognized the underlying complexity of the funnel.
By applying Maze to the logs captured during driver sign-up, we can visualize the actual paths drivers take when signing up with Uber, and identify bottlenecks that occur in the process. Maze’s application at Uber has since expanded beyond the sign-up use case, and it is now used to visualize many processes—from rider pick-up and drop-off to user interactions with our website. Read on to learn how the Uber Visualization team developed Maze and why this new solution offers unparalleled insight into the Uber user experience.
Navigating the maze: why we built a new tool
Before 2016, we visualized the driver sign-up experience as a simple series of chronological steps in a fixed order. At any given step in the process, a certain percentage of aspiring drivers would drop out, or “churn.”
In this model, the conversion rate is the product of the percentage of conversions at each step. So, to improve our conversion rate, we needed to reduce churn. This approach, while valuable, was limited because it didn’t take into account the order in which driver sign-up events happened or whether they happened at all.
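To make the fixed-order model concrete, here is a minimal sketch of how the overall conversion rate follows from the per-step rates. The step rates below are made-up numbers for illustration, not Uber data.

```python
from math import prod

def funnel_conversion(step_rates):
    """Overall conversion of a fixed-order funnel: the product of per-step rates."""
    return prod(step_rates)

# Hypothetical fractions of candidates who continue past each step.
baseline = [0.80, 0.75, 0.90, 0.60]
improved = [0.80, 0.75, 0.90, 0.66]  # churn reduced at the last step only

baseline_rate = funnel_conversion(baseline)
improved_rate = funnel_conversion(improved)
relative_lift = improved_rate / baseline_rate
```

Because the rates multiply, a 10 percent relative improvement at any single step lifts the overall rate by the same 10 percent. This is why the model points toward reducing churn step by step, and also why it says nothing about the order in which events happen.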
There are many points of entry to the driver sign-up flow (web pages, email, and app, to name a few), varying by region. Not all driver candidates go through the exact same sign-up process. And even for those who do: what happens if they start the process on their phone and then choose to continue on a desktop computer? What if they pause their application and resume it two months later? What if they go back one step and change innocuous information (e.g., car type or car color) they had already submitted? The answers to these questions have a notable impact on our conversion rate.
The long way to conversion
When we started measuring the number of steps from sign-up to activation (that is, all the events we were able to detect), we discovered huge variability in the number of steps actually taken by drivers. In some cases it took a few dozen steps to go from sign-up to first trip, but in others, it took far longer. In some cases, driver candidates might go through hundreds of events before dropping from the funnel.
Contrary to what we had long assumed, there is not one most effective path to conversion, but rather thousands: some short, some long, some linear, some crooked. Indeed, the journey to conversion is a real maze.
Entering the Maze
To better understand the sign-up process, we built Maze, a tool to visualize aggregated sequences of events, enabling us to answer questions such as:
- How many drivers went through event A?
- How many went through event A, then event B?
- How many went through event A, then event C?
- How many went through event A, then B, then D?
- How many went through event A, then B, then D, but not C?
Answering these questions through traditional means (for instance, a SQL query) is cumbersome and error-prone. Instead, our tool finds the size of every sequence of events and represents the entire funnel visually.
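The questions above can be sketched in a few lines of Python. This is an illustration rather than Maze's actual query engine. It assumes that "A, then B" means B occurs somewhere after A in the same session (not necessarily immediately), and that "but not C" means C never occurs in the session. The sample sessions are invented.

```python
def went_through(session, ordered, excluded=()):
    """True if `session` contains the events of `ordered` in that order
    (not necessarily adjacent) and none of the `excluded` events."""
    if any(event in excluded for event in session):
        return False
    remaining = iter(session)
    # `step in remaining` consumes the iterator up to the first match,
    # so each later step must be found after the previous one.
    return all(step in remaining for step in ordered)

# Hypothetical sessions, each a list of event names in chronological order.
sessions = [
    ["A", "B", "D"],
    ["A", "C", "B", "D"],
    ["A", "B"],
    ["B", "A", "D"],
    ["A", "C"],
]

def count(ordered, excluded=()):
    return sum(went_through(s, ordered, excluded) for s in sessions)

through_a = count(["A"])
a_then_b = count(["A", "B"])
a_then_c = count(["A", "C"])
a_b_d = count(["A", "B", "D"])
a_b_d_not_c = count(["A", "B", "D"], excluded={"C"})
```

Each question needs its own pass over the data, which hints at why answering them one at a time does not scale, and why counting every sequence once and drawing the result is attractive.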
Presentation
When a user opens the driver app and begins sign-up, the app logs every action they perform, enabling Maze to transform each app session into a series of events.
To explore these sessions as a unified whole, we use a sunburst visualization, a visual representation of the funnel made of concentric rings. (For more on this topic, check out previous open source work in this area by Kerry Rodden.) The innermost ring of the sunburst is made up of arcs representing the first events of all the sequences, and it reduces to a single arc if all sequences start from the same event. Then, for each extra step, we draw a ring made of one or several arc segments, with each segment corresponding to a specific event and sized proportionally to the number of sequences that pass through that event.
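The data structure behind such a sunburst can be sketched as a prefix tree with a count on every node, where each node receives an arc proportional to its share of its parent. The node layout and function names below are assumptions made for illustration. They are not Maze's internals or the D3 API.

```python
import math

def build_tree(sequences):
    """Aggregate event sequences into a prefix tree with per-node counts."""
    root = {"event": None, "count": 0, "children": {}}
    for sequence in sequences:
        root["count"] += 1
        node = root
        for event in sequence:
            node = node["children"].setdefault(
                event, {"event": event, "count": 0, "children": {}}
            )
            node["count"] += 1
    return root

def layout(node, start=0.0, end=2 * math.pi, depth=0, arcs=None):
    """Give each node an arc (ring index, event, start angle, end angle) whose
    angular width is proportional to the node's share of its parent."""
    if arcs is None:
        arcs = []
    angle = start
    for child in node["children"].values():
        width = (end - start) * child["count"] / node["count"]
        arcs.append((depth, child["event"], angle, angle + width))
        layout(child, angle, angle + width, depth + 1, arcs)
        angle += width
    return arcs

# Hypothetical sessions.
sequences = [
    ["signup", "docs", "trip"],
    ["signup", "docs"],
    ["signup", "vehicle"],
    ["signup", "docs", "trip"],
]
arcs = layout(build_tree(sequences))
```

Here every sequence starts with `signup`, so the innermost ring is a single full-circle arc. The `docs` arc in the next ring spans three quarters of the circle because three of the four sessions continue with that event.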
In the Maze UI, we can highlight a certain path with our cursor and it will represent the sequences of events in that exact order, enabling us to easily see what proportion of the total sessions they represent under that node.
Because sequences can get quite long and there are only so many layers we can show on the screen, we also let users zoom in on a certain arc and redraw the sunburst from that moment, offering a more precise and detailed view of the data.
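One way to think about zooming, sketched here under our own assumptions rather than taken from Maze's code, is to keep only the sequences that begin with the selected path and restart them at the selected event, which then becomes the new center.

```python
from collections import Counter

def prefix_counts(sequences):
    """Count how many sequences pass through each prefix (one entry per arc)."""
    counts = Counter()
    for sequence in sequences:
        for length in range(1, len(sequence) + 1):
            counts[tuple(sequence[:length])] += 1
    return counts

def zoom(sequences, path):
    """Keep sequences that begin with `path` and restart them at its last event."""
    path = list(path)
    if not path:
        return [list(seq) for seq in sequences]
    keep_from = len(path) - 1
    return [
        list(seq[keep_from:])
        for seq in sequences
        if list(seq[:len(path)]) == path
    ]

# Hypothetical sessions.
sequences = [
    ["signup", "docs", "trip"],
    ["signup", "docs", "back", "docs"],
    ["signup", "vehicle"],
]
zoomed = zoom(sequences, ["signup", "docs"])
zoomed_counts = prefix_counts(zoomed)
```

After zooming, ring depth is counted from the selected event, so layers that previously fell off the screen become visible.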
From there, we can use the UI to refine the results we need. For instance, as depicted in Figure 3, we can filter for the events we want represented on the screen.
In fact, our UI lets us do all kinds of filtering and exploration. For instance, we can apply “sequence filters” to keep only sequences that have certain events in a given order.
We can also narrow our sunburst to user sessions during which a successful data query is followed by an error on the same query. This type of specification lets us explore possible pain points during the driver sign-up experience, enabling us to understand what went wrong and determine how we can fix it for future users.
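A filter of this kind can be sketched as follows. The event shape, a `(query_id, status)` pair, is hypothetical, and we assume "followed by" means "at any later point in the same session."

```python
def has_success_then_error(session):
    """True if some query succeeds and later fails within the same session.

    Each event is a (query_id, status) pair; this shape is an assumption.
    """
    succeeded = set()
    for query_id, status in session:
        if status == "success":
            succeeded.add(query_id)
        elif status == "error" and query_id in succeeded:
            return True
    return False

# Hypothetical sessions keyed by session ID.
sessions = {
    "s1": [("profile", "success"), ("profile", "error")],
    "s2": [("profile", "error"), ("profile", "success")],
    "s3": [("profile", "success"), ("documents", "error")],
}
flagged = [sid for sid, events in sessions.items() if has_success_then_error(events)]
```

Only `s1` is kept: `s2` has the two outcomes in the opposite order, and in `s3` the error belongs to a different query.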
Reaching the center of the maze: first wins
At Uber, our data scientists use Maze to test certain hypotheses and assess their validity.
In Figure 5, for example, we observe a spire of purple nodes at the top of our visualization, representing clicks on the back button of the browser for a certain web page during the driver sign-up process. The purple spire indicates that individuals were unable to return to previous steps, suggesting that the browser’s back button was the blocker. In this scenario, we can tell that driver candidates at that step would continue clicking on that button until they gave up altogether on the process or decided to continue progressing chronologically.
Maze architecture
We built the Maze frontend using Uber’s web tech stack, based on React 16 and Redux. Our web architecture also incorporates a React client and a Node server as an RPC proxy, as well as integrated performance metrics, traffic monitoring, and coverage reports.
To present valuable data visualizations at scale, Maze achieves a well-defined balance between responsive user interactions and high performance. In addition to the Redux framework, layered caching is also applied to our system, as detailed below:
- React Layer Cache: The very first layer of cache above React Store includes in-state cache and React Selectors, preventing heavy calculations and re-rendering.
- Web Worker: As part of the browser cache, the Web Worker holds data backup for instant calculations on actions that don’t have to access the back-end database. SharedWorkers allow a smooth user experience for actions regenerating the whole visualization.
- Node-side Memory Cache: Each request for every sign-up event from all drivers in San Francisco, for example, would result in millions of pieces of data being collected. This layer of cache makes sure the first view of data has a low cost.
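The general idea of layered caching can be sketched as a lookup that tries the fastest layer first, falls through to slower ones, and backfills the layers that missed. In Maze these layers live in different places (the React layer, a Web Worker, and the Node server). The sketch below collapses them into one Python process purely for illustration, and the class and layer names are our own.

```python
class LayeredCache:
    """Look up a key in caches ordered from fastest to slowest, falling back
    to `load` and backfilling every layer that missed."""

    def __init__(self, layer_names, load):
        self.layers = [(name, {}) for name in layer_names]
        self.load = load
        self.hits = []  # which layer answered each lookup, for illustration

    def get(self, key):
        missed = []
        for name, store in self.layers:
            if key in store:
                value = store[key]
                self.hits.append(name)
                break
            missed.append(store)
        else:
            # No layer had the key: pay for the expensive load exactly once.
            value = self.load(key)
            self.hits.append("backend")
        for store in missed:
            store[key] = value
        return value

backend_calls = []

def query_backend(key):
    """Stand-in for an expensive backend query (hypothetical)."""
    backend_calls.append(key)
    return f"aggregated tree for {key}"

cache = LayeredCache(["selector", "web_worker", "node_memory"], query_backend)
first = cache.get("san_francisco")
second = cache.get("san_francisco")
```

The first request reaches the backend and fills every layer, and the second is answered by the fastest layer without touching the backend again.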
Data processing and visualization
Data visualization at scale can be a challenging process. Maze solves this problem by enabling us to visualize very large data trees. Visualizing large sequences of events can result in trees up to 100 layers deep, with potentially hundreds of branches at each node, which translates to millions of nodes and hundreds of megabytes of data that must be downloaded in a single request.
Using React and Redux at such a massive scale can be risky, as we don’t want an unbounded calculation to be triggered by each individual action dispatch. When the response from the backend weighs hundreds of megabytes, propagating rendering states from reducers to containers and components turns out to be a luxury that limited browser resources cannot afford.
We use the D3 library for major data rendering in Maze. Using real document object model (DOM) actions triggered by D3 can be relatively tricky if animations or transitions are required, as React expects the whole DOM structure to be regenerated for state updates. The workaround we applied in Maze is to use in-component states as well as caches for each new set of data and only dispatch a global data update when new aggregations are required.
There are many insights we can glean from fitting the raw data using the right strategy for real world use cases. For instance, a typical question, “filter a certain type of event out of the view,” might turn out to be an algorithm question, “remove nodes with a given event type from the tree.” This situation can be separated into the following steps:
- Save the original data set
- Apply new fittings to the current data
- Render the latest view for all rules
From there, Web Workers execute the data fitting and deliver the results of each calculation so that we can serve real-time user interactions without having to query the backend, while original data sets are shared in the browser cache.
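The "remove nodes with a given event type from the tree" step can be sketched as below. This is one possible reading of the algorithm, not Maze's Web Worker code: the children of a removed node are promoted to its parent and merged with any sibling carrying the same event, and the original tree is left untouched so that later filters can start from it again.

```python
def new_node(event, count=0):
    return {"event": event, "count": count, "children": {}}

def build_tree(sequences):
    """Aggregate event sequences into a prefix tree with per-node counts."""
    root = new_node(None)
    for sequence in sequences:
        root["count"] += 1
        node = root
        for event in sequence:
            node = node["children"].setdefault(event, new_node(event))
            node["count"] += 1
    return root

def merge_into(target, source):
    """Add `source`'s count and subtree into `target` (both carry the same event)."""
    target["count"] += source["count"]
    for event, child in source["children"].items():
        if event in target["children"]:
            merge_into(target["children"][event], child)
        else:
            target["children"][event] = child

def remove_event(node, hidden):
    """Return a filtered copy of the subtree under `node` without `hidden` events."""
    result = new_node(node["event"], node["count"])
    for event, child in node["children"].items():
        cleaned = remove_event(child, hidden)
        # A hidden node disappears and its children take its place.
        promoted = list(cleaned["children"].values()) if event == hidden else [cleaned]
        for candidate in promoted:
            existing = result["children"].get(candidate["event"])
            if existing is None:
                result["children"][candidate["event"]] = candidate
            else:
                merge_into(existing, candidate)
    return result

# Step 1: save the original data set (hypothetical sessions).
original = build_tree([
    ["signup", "back", "docs"],
    ["signup", "docs"],
    ["signup", "back", "vehicle"],
])
# Step 2: apply the new fitting. Step 3 would render `view`.
view = remove_event(original, "back")
signup_view = view["children"]["signup"]["children"]
signup_original = original["children"]["signup"]["children"]
```

In the filtered view, the two sessions that reached `docs` (one directly, one through `back`) collapse into a single `docs` node with a count of 2, while the saved original still contains the `back` node.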
Storage and aggregation
In practical terms, Maze is a sequence analytics tool that provides insights on sessions, where a “session” can be a rider trip, a website visit, a sign-up procedure for a new driver, or any of the many real user interactions with our platform. In other words, Maze collects a list of events for thousands of sessions and aggregates it into a single tree view.
Maze’s backend primarily handles data aggregation. The backend is responsible for data onboarding, backup/backfilling/caching, aggregation, session collection, validation and quality assurance, and serving well-structured data in our OLAP database for data analysis.
For each onboarded data source, the Maze backend receives data by running daily scheduled Spark jobs on pre-defined Hive tables, and real-time data from our open source streaming analytics platform, AthenaX. As the raw data source is very noisy and even a single querying request requires a significant amount of time to process, data “pre-aggregation” occurs before writing into the database. This step ensures that saved data can be as close as possible to the specified read schema definition. Additionally, having this schema allows us to create custom filters on metadata, thereby enabling more dynamic querying.
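The production pipeline runs on Spark and AthenaX. The plain-Python sketch below only illustrates what pre-aggregation might look like conceptually: grouping noisy raw rows into one ordered event list per session before anything is written. The `(session_id, timestamp, event)` row shape and the rule that exact duplicate rows are noise are assumptions.

```python
from collections import defaultdict

def pre_aggregate(raw_events):
    """Group raw (session_id, timestamp, event) rows into ordered event lists,
    dropping exact duplicate rows, which are assumed to be logging noise."""
    by_session = defaultdict(set)
    for session_id, timestamp, event in raw_events:
        by_session[session_id].add((timestamp, event))
    return {
        session_id: [event for _, event in sorted(rows)]
        for session_id, rows in by_session.items()
    }

# Hypothetical raw rows: out of order, with one duplicate.
raw = [
    ("s1", 3, "trip"),
    ("s1", 1, "signup"),
    ("s2", 1, "signup"),
    ("s1", 1, "signup"),
    ("s1", 2, "docs"),
]
sessions = pre_aggregate(raw)
```

Doing this work once at write time means that reads can consume sessions that are already ordered and deduplicated, instead of repeating the cleanup on every query.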
The aggregation process directly serves the Maze frontend with two major results: aggregated data for visualization and detailed session records. During a typical interaction, a Maze user first checks the visualization for unexpected funnels and then dives into each individual session for greater insight. That pattern allows the Maze backend to optimize performance by using Redis as in-memory cache. Since these two results (with very different structures) are built from the same data set, we save resources by not needing to query for both results with millions of events each.
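The pattern described above, two differently shaped results derived from one cached pass over the same sessions, can be sketched like this. A plain dictionary stands in for Redis, and the class, method, and data source names are hypothetical.

```python
from collections import Counter, defaultdict

class SessionAggregator:
    def __init__(self, fetch_sessions):
        self.fetch_sessions = fetch_sessions
        self.cache = {}  # stand-in for an in-memory cache such as Redis

    def _aggregate(self, source):
        if source not in self.cache:
            sessions = self.fetch_sessions(source)  # the expensive query runs once
            path_counts = Counter()
            sessions_by_path = defaultdict(list)
            for session_id, events in sessions.items():
                for length in range(1, len(events) + 1):
                    path = tuple(events[:length])
                    path_counts[path] += 1
                    sessions_by_path[path].append(session_id)
            self.cache[source] = (path_counts, sessions_by_path, sessions)
        return self.cache[source]

    def visualization(self, source):
        """Aggregated counts per path, enough to draw the sunburst."""
        return self._aggregate(source)[0]

    def session_details(self, source, path):
        """Individual sessions that pass through `path`."""
        _, sessions_by_path, sessions = self._aggregate(source)
        return {sid: sessions[sid] for sid in sessions_by_path.get(tuple(path), [])}

fetch_calls = []

def fetch_sessions(source):
    """Stand-in for the expensive database read (hypothetical data)."""
    fetch_calls.append(source)
    return {
        "s1": ["signup", "docs", "trip"],
        "s2": ["signup", "docs"],
        "s3": ["signup", "vehicle"],
    }

aggregator = SessionAggregator(fetch_sessions)
counts = aggregator.visualization("driver_signup")
details = aggregator.session_details("driver_signup", ["signup", "docs"])
```

The user first looks at `counts` and then drills into `details`. Both calls are served from the same cached aggregation, so the underlying data is fetched only once.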
Leveraging these properties, the Maze backend aggregates, stores, and surfaces data accurately and efficiently.
Challenges & improvements
Maze was first developed from an Uber hackathon prototype. It took our engineers a lot of effort to productionize Maze and provide relatively stable data quality and a smooth user interface. Below, we outline some of the challenges we face while developing the next generation of this powerful new visualization tool:
- Back-end query tuning: “Determining the status of all drivers in San Francisco up until the last minute” is not a simple question to answer. We are now working with the fourth generation of Maze’s aggregation logic, and actively improving the data onboarding process.
- Data scalability: When building Maze, we asked ourselves, what’s the best storage approach for our online analytical processing (OLAP) database? There were many options available at Uber, all built for different purposes. It took a lot of engineering effort to come up with workarounds that fit our data scale and real-time requirements, and we recognized this would always be an ongoing process. Even with our solutions using MemSQL, there is still room for growth.
- Dynamic rendering and internal state maintenance: Rendering millions of nodes in a web browser using D3 is not realistic. With dynamic rendering, internal state maintenance, and in-browser caches, we can display very tiny pieces of data for rendering and hide the whole iceberg from our users.
- JavaScript runtime clean up: It was a difficult but reasonable decision to remove Immutable.js and other fancy packages from our solution. On the frontend, we care about every millisecond of performance and kicked out possible blockers to speed up the user experience.
- And more: Other improvements we are working on include full test coverage and integration tests, offline data mock, continually improving monitoring and alerts, more efficient visualization, larger scale of user groups and problem set, and never-ending problem solving.
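As an illustration of the dynamic rendering point above, the sketch below selects only the nodes worth drawing: those within a maximum number of rings and covering at least a minimum share of all sessions. The thresholds and node layout are assumptions, not Maze's actual rendering rules.

```python
def build_tree(sequences):
    """Aggregate event sequences into a prefix tree with per-node counts."""
    root = {"count": 0, "children": {}}
    for sequence in sequences:
        root["count"] += 1
        node = root
        for event in sequence:
            node = node["children"].setdefault(event, {"count": 0, "children": {}})
            node["count"] += 1
    return root

def visible_nodes(node, total, max_depth, min_share, depth=0, path=()):
    """Yield (path, count) for nodes at most `max_depth` rings deep that cover
    at least `min_share` of all sessions."""
    for event, child in node["children"].items():
        if depth >= max_depth or child["count"] / total < min_share:
            # A child never has more sessions than its parent,
            # so the whole subtree can be skipped.
            continue
        child_path = path + (event,)
        yield child_path, child["count"]
        yield from visible_nodes(child, total, max_depth, min_share, depth + 1, child_path)

# Hypothetical data: 100 sessions, most of them following the same path.
sequences = (
    [["signup", "docs", "trip"]] * 98
    + [["signup", "vehicle"]]
    + [["signup", "docs", "back"]]
)
tree = build_tree(sequences)
to_draw = list(visible_nodes(tree, tree["count"], max_depth=2, min_share=0.05))
```

Only two of the five nodes in this tree are handed to the renderer. The rest stay in memory and can be drawn later, for example when the user zooms in.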
Other opportunities
We still have a long way to go to optimize Maze, and we acknowledge that efforts to make a product close to perfect never really end. Several projects could benefit both the product and its engineers: migrating the Uber web tech stack from Bedrock to Fusion.js (using webpack instead of gulp.js), developing a new data transaction procedure to reduce networking latency, upgrading UI components to follow the latest BaseUI design, integrating A/B testing, and simply experimenting with more visualizations.
However, returning to the original goals of Maze, we have reached a place where the product is well-defined, the data structure is stable, and the performance is high even when using massive quantities of real-time data. Looking at these qualities, a new question surfaced: how can we extend what we have now to serve more users with insights into their funnel and conversion-related questions? In other words, how can we serve more useful data so that our users can efficiently determine the root cause of problems in a funnel?
Everything at Uber is a funnel
As a result of Maze and other large-scale efforts to address known issues in the sign-up process, the driver sign-up conversion rate in U.S. cities has improved by more than 50 percent since 2016. Thanks to Maze, we can better detect and explain sign-up anomalies that would otherwise fly under the radar and remain broken.
There are many other processes similar to driver sign-up at Uber that involve a population proceeding through a number of events, ending with either a successful outcome or less desirable ones. A few months after we delivered the first version of Maze, we began opening the tool to over 20 other use cases, such as rider app sessions, Uber Eats, airport pickups, and the Uber Visa card application. Now, anyone at Uber who is responsible for a process can onboard their data to Maze and diagnose conversions.
Because we log the events of all our internal tools, we could also use Maze to understand how these tools are used. (In fact, using Maze to analyze Maze usage is one of the first things we did!)
And funnels are everything
Originally, Maze only proposed sunburst visualizations. The sunburst introduces some biases, as rings towards the outside occupy more space on the screen than rings closer to the center that represent the same proportion.
This distortion actually works well for Maze as the outer arcs are often very small, and as such, outer layers tend to be sparse.
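The size of this distortion is easy to work out. The sketch below assumes rings of equal radial width with no hole in the center, which is a simplification of any real sunburst layout.

```python
import math

def arc_area(ring, share, ring_width=1.0):
    """Area of an arc covering `share` of ring number `ring` (0 = innermost),
    assuming every ring has the same radial width."""
    inner = ring * ring_width
    outer = (ring + 1) * ring_width
    return share * math.pi * (outer**2 - inner**2)

# The same 25 percent share drawn in the innermost ring and in the fifth ring.
inner_area = arc_area(ring=0, share=0.25)
outer_area = arc_area(ring=4, share=0.25)
ratio = outer_area / inner_area
```

Under these assumptions, an arc in ring k covers 2k + 1 times the area of an arc with the same share in the innermost ring, so the fifth ring shows the same proportion with nine times as many pixels.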
Nevertheless, we created many variations of the sunburst and many other ways to look at this dataset, including reversed sunbursts, variable width sunbursts, animated sunbursts, icicles, node-link diagrams, and clustered sequences. We keep on adding new ones to our toolbox because different ways to look at our data can unlock new insights and create tremendous value for our users.
Figure 10: This set of experimental visualizations we developed in Maze shows data insights from a variety of different perspectives. Panels: Markov chain view; Sunpulse animation between two sunbursts; variable width sunburst; different coloring modes.
Beyond the Maze
Maze is not an end in itself. The goal of the project is to play a part in a larger visualization initiative: Funnel Health. With Funnel Health, users can define their own funnel from select events and receive intelligent alerts when certain characteristics of the funnel indicate that action is needed. For instance, Maze can help us determine what actions to take if we observe a huge drop in the pick up rate of rider trip sessions or find user friction points after the deployment of a new app version. When integrated nicely with other funnel-centric tools such as Flow, Uber’s IFTTT Engine, Maze and Funnel Health will give our teams greater insight into these situations.
Do you want to help us build the next generation of funnel tools? We are looking for back-end, front-end and visualization engineers to join us! Maze also relies on the open source work of the visualization team. We are always looking for contributors. Send us an email here for more details!
Acknowledgments: Maze was originally developed by Andrey Liscovich, Jason Libbey, and Aniket Pant. The original product team included Rafi Lurie, Andrey Liscovich, Sam Suen, Yingchao Liu, Sergei Bezborodko, Alan Sheinberg, Bryan Bierce, Tony Jing, Yujia Luo, and Hugh Williams.
Yujia Luo
Yujia Luo is a senior software engineer on Uber’s Growth Insights Platform team. He previously worked on Uber’s mobile feed system, focusing on optimizing its performance and developing its architecture.
Jerome Cukier
Jerome Cukier is a senior software engineer on Uber’s Visualization team. He specializes in information design and building visual tools.
Posted by Yujia Luo, Jerome Cukier