Introduction
The Uber Rider app launches features simultaneously on a global scale, changing details across hundreds of screens with thousands of feature flags. It is no longer possible for any designer, engineer, QA tester, or product manager to fully visualize every single user flow. Uber needs an observability system of matching scale for measuring design quality, especially adoption of the shared UI libraries and accessibility best practices packaged in Uber's design system, Base. Without such a system (let's call it Design System Observability), Uber would learn too late, through complaints and public media, that end users were suffering confusing onboarding rides, inconsistent layouts, and frustrating VoiceOver/TalkBack sessions.
Design System Observability consists of two main components: an eye and an ear.
The designer-eye challenge
It is often hard to tell with the naked eye the differences between design-spec handoffs and the final implementation in the shipped apps.
At Uber, we strive to reuse components that have already been built, so it is critical to measure component reuse as consistently as any other engineering quality metric, such as test coverage, downtime, or latency. Base is Uber's design system, with components shared across design and code; it provides a consistent user experience, with a reduced learning curve and accessibility essentially coming for free. Based on internal research, using Base components yields 3X faster development, 4X fewer visual-parity issues, and 50% less code than building custom components. Moreover, future changes like theme and typography updates take only a few lines of code and roll out in a matter of weeks, not months. Thus, it is important to visualize and measure Base adoption.
Deterministic Counter – the eye for everyone
Users turn on the Base Counter, see elements on the screen get highlighted, and understand what can be improved. This is a first-of-its-kind deterministic measurement with visual highlights for design system adoption. Instead of relying on a small group of experts who understand design quality, thousands of people across functions can now measure their own screens and start work items to improve them, making quality at scale possible.
There are three major steps in the Base Counter tooling: Trigger, Counting Algorithm, and Decorator.
Trigger for counting: We use an internal framework to capture screen-level navigation and trigger the start of the Base Counter. Periodic screen updates automatically trigger the counting and help us quantify the Base stats for a screen without manual intervention. Using screen change as the trigger ensures that the Base Counter runs only when required.
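The screen-change trigger can be sketched as a small gatekeeper that re-runs the counter only when the visible screen actually changes. This is a minimal illustration, not Uber's internal framework; the class and method names here are hypothetical.

```python
class BaseCounterTrigger:
    """Hypothetical sketch: run the Base Counter only on screen-level
    navigation changes, not on every view update."""

    def __init__(self, counter):
        self.counter = counter          # callable: screen_name -> Base stats
        self.current_screen = None

    def on_navigation(self, screen_name):
        # Re-count only when the visible screen actually changes.
        if screen_name == self.current_screen:
            return None
        self.current_screen = screen_name
        return self.counter(screen_name)
```

A periodic navigation event for the same screen is a no-op, which keeps the counter from running redundantly.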
Counting algorithm: Once the tool receives a trigger, it starts counting the Base stats for the screen. We first need to figure out which screen is currently being shown to the user. Starting from the application window, we find the topmost view controller being presented. From there, we run a postorder DFS (depth-first search) traversal of the view hierarchy, starting at the root view of the top view controller.
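In essence, the traversal tallies how many views in the hierarchy come from Base versus how many exist in total; the ratio is the screen's adoption score. A minimal sketch of that postorder count, with a simplified stand-in `View` type rather than real UIKit or Android view classes:

```python
from dataclasses import dataclass, field

@dataclass
class View:
    """Simplified stand-in for a platform view node."""
    name: str
    is_base: bool                        # True if the view comes from Base
    subviews: list = field(default_factory=list)

def count_base(root):
    """Postorder DFS over the view tree.

    Returns (base_views, total_views); children are tallied before
    the node itself is counted.
    """
    base = total = 0
    for child in root.subviews:
        b, t = count_base(child)
        base += b
        total += t
    return base + (1 if root.is_base else 0), total + 1
```

For example, a screen with two Base views out of four total would score 2/4 = 50% Base adoption.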
Decorator: Decorators provide visualization support to developers, and give the architecture an extensible shape that can support future cases as they arise. After a view node is identified, it is colored by the coloring decorator. All decorators implement a common decorator protocol.
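The decorator protocol can be sketched as a small interface that every decorator implements, with the coloring decorator as one concrete case. The protocol shape and color choices below are illustrative assumptions, not Uber's actual API:

```python
class ViewDecorator:
    """Hypothetical decorator protocol: every decorator implements decorate()."""

    def decorate(self, view, is_base):
        raise NotImplementedError

class ColoringDecorator(ViewDecorator):
    """Highlights each identified view node: one color for Base views,
    another for custom views (colors here are arbitrary)."""

    def decorate(self, view, is_base):
        view.highlight = "green" if is_base else "red"
        return view
```

New visualizations (for example, overlaying the component name) would slot in as additional implementations of the same protocol, leaving the traversal code unchanged.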
The have-you-heard questions
Our teammates often ask each other questions like: “Have you heard about that new home screen launch for the India market?” or “When did that Uber shortcuts experience launch?”
In an app like Uber's, a wide variety of user experiences coexist because of a massive user base spread across different regions. Feature teams are likewise scattered across Uber offices around the world and work independently. Each feature team conducts A/B experiments to determine the design that offers the best user engagement. Additionally, legal requirements may necessitate displaying certain screens in specific regions of the world. This makes it challenging to objectively tag a screen with a single Base stats metric at scale, because changes in its UI composition result in different Base stats for the screen.
We experimented with various aggregation techniques and ultimately chose the mode of all observed metrics as the single metric for tagging a screen. The mode automatically captures what the majority of users are actually seeing on a screen.
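Aggregating by mode is straightforward; a minimal sketch (the function name is ours, and Python's `collections.Counter` stands in for whatever aggregation runs in the real pipeline):

```python
from collections import Counter

def screen_score(observed_scores):
    """Tag a screen with the mode of all observed Base scores.

    Unlike the mean, the mode ignores rare experiment variants and
    regional one-offs, reflecting what most users actually see.
    """
    counts = Counter(observed_scores)
    score, _ = counts.most_common(1)[0]
    return score
```

For instance, if most sessions report a score of 80 and a small experiment cohort reports 95, the screen is tagged 80.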
Daily Automated Analysis – the ear to stay on top of launches
The Base Counter tooling gives us the means to calculate the score, but on its own it still requires manually counting each screen. We need a pipeline to measure thousands of different screens.
We deployed two complementary approaches to automation. The first gathers broad statistics through analytics events triggered by default for all internal testers. The second uses testing frameworks to take screenshots and run in-depth analysis, including which Base components were used as well as known custom components we want to track.
With each screen having a defined quality score, we can track it on daily CD builds. Any violation results in an automated Jira ticket assigned to the Design and Engineering managers who own the screen. Combining this automated process with a human Feature Review Readiness process, where stakeholders verify the scores, ensures that future development only improves design quality rather than degrading it.
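The daily guardrail amounts to comparing each screen's score against its recorded baseline and filing a ticket on regression. A hypothetical sketch (function names, data shapes, and the ticket-filing callback are all assumptions for illustration):

```python
def check_screens(scores_today, baselines, file_ticket):
    """Hypothetical daily guardrail: flag screens whose Base score
    dropped below their recorded baseline.

    scores_today -- {screen_name: score} from today's CD build
    baselines    -- {screen_name: score} previously accepted scores
    file_ticket  -- callback standing in for Jira ticket creation
    """
    violations = []
    for screen, score in scores_today.items():
        baseline = baselines.get(screen)
        if baseline is not None and score < baseline:
            violations.append(screen)
            file_ticket(f"{screen}: Base score dropped from {baseline} to {score}")
    return violations
```

Screens without a baseline are skipped here; in practice a new screen would first pass through the human Feature Review Readiness step to establish one.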
Real-world examples
The new eye and ear have helped every Uber Rider product team drive toward the same goal.
Conclusion
We believe Design System Observability is a must-have for any technology organization that needs both velocity and quality. Here are our learnings:
- While all teams expressed enthusiasm for applying a Design System, most have competing priorities. Hence, it is important to have a shared and trackable OKR.
- There’s a lack of shared understanding about what a Design System truly encompasses. Some believe that simply using a text style or a color style is enough, while others attempt to recreate a component’s appearance without leveraging the existing code. Defining Design System metrics is how an organization codifies its design quality expectations.
- Always assume people have good intentions and degradation is caused by the lack of guardrails. The earlier an issue is caught in the product development pipeline, the fewer days it takes to fix it.
At the end of the day, all Uber teams want to improve user experience, but often get lost in finding design resources or reaching out to other teams. These metrics were excellent conversation starters for both the Design and Engineering teams to improve their checkpoint and handoff experiences.
Our biggest win has been elevating design metrics to be as important as engineering and business metrics. It was a journey of steady progress, starting with building awareness around Base, our design system. We hosted Base Race challenges to encourage adoption and eventually secured executive buy-in for an adoption push. Trust was built through countless hours of manual audits with domain experts. We addressed pushback, refined our methodology, and aligned more managers. Today, what once took hours of manual work each month is an automated system accessible to everyone, highlighting the impact of our design system journey.
The above describes our work on the Rider Android and iOS apps; next, we will apply the approach to the broader Uber product portfolio. As Uber's technology stacks across Android, iOS, and Web continue to evolve, Design System Observability will be an effective system for stopping the reinvention of UI components, launching accessible features, and rolling out high-quality design to thousands of screens quickly. We also have many partners across the company, from content designers to researchers, who are interested in building on top of our system to deliver best practices at scale. More metrics to come.
Acknowledgments
Special thank you to Mohit Gupta, Reshma Naik, Anukalp Katyal, Arun Babu A S P, Lucia Pineda, and other colleagues for your technical contributions to the Design System Observability infrastructure.
Last but not least, this project couldn’t have happened without the love and attention to detail of our Design System team at base.uber.com and the support of the Uber Design organization.
Vietanh Nguyen
Vietanh Nguyen is a Principal Design Engineer, leading cross-functional teams to enhance the experiences of millions globally. With a strategic focus on design system adoption, he ensures that every user-facing surface reflects Uber's commitment to functionality and aesthetics. He follows the three main principles of Design Engineering: "Be the Translator between Design and Engineering," "Bridge the Gaps via Toolings," and "Enable Vision with Leadership and Coding."
Alankar Gupta
Alankar Gupta is a Staff Android Engineer at Uber on the Rider Foundations team. He is currently working on BaseUI adoption for core apps and guardrailing UI quality.
Sagar Pant
Sagar Pant is an iOS developer on the Rider Foundations team. He is working on improving observability for Uber apps.
Posted by Vietanh Nguyen, Alankar Gupta, Sagar Pant