ML Education at Uber: Frameworks Inspired by Engineering Principles
July 28, 2022 / Global
Introduction
At Uber, millions of machine learning (ML) predictions are made every second, and hundreds of applied scientists, engineers, product managers, and researchers work on ML solutions daily.
Uber wins by scaling machine learning. We recognize org-wide that a powerful way to scale machine learning adoption is through education. That’s why we created the Machine Learning Education Program: a program driven by Engineering Principles that provides a framework for delivering Uber-specific ML educational resources to Uber Tech employees.
Like a production system, educational resources, content, and distribution channels need to be continuously measured, evaluated, and improved. Designing each component of the ML Education Program on this premise enabled us to quickly deliver new courses and curricula tailored to engineers and scientists of various backgrounds.
This 2-part article focuses on how we have applied engineering principles when designing and scaling this program, and how doing so has helped us achieve our desired outcomes. Part 1 introduces our design principles and explains the benefits of applying them to technical education content design and program frameworks, specifically in the ML domain. Part 2 takes a closer look at critical components of the program and reflects on the outcomes that make ML Education at Uber a success.
Engineering Principles and ML Education: A Perfect Pair
At Uber, we recognize not only that knowledge-sharing in technical subject areas is important, but also that doing it at scale in the AI/ML domain can be extremely difficult. Common challenges of large-scale production ML become relevant as we build learning resources for ML topics on top of our evolving ML infrastructure and ecosystem.
For instance, one of our core principles when designing education resources is to ensure their reproducibility. For students, reproducibility ensures a consistent and repeatable target outcome to build on and extend. For instructors, it reduces the friction of updating content and establishes a baseline that any future instructor can use to jump in and help iterate. However, guaranteeing reproducibility in production ML environments is complex; reproducing an ML model alone depends heavily on a number of factors, many of which can be stochastic and distributed:
- Features: Models can have drastically different performance depending on how features are selected and generated. Every step and transformation taken to generate the training set and the cross-validation datasets should be lineage-tracked and version-controlled.
- Model Training: Production models are often trained and tuned in a distributed manner, e.g., data-parallel distributed training tuned using Bayesian Optimization with explore-exploit patterns. Random seeds need to be fixed, and deterministic data partitioning/shuffling may be required depending on the dataset size.
- Runtime Environments: The software versions and packages used in the workflow that generates the training data or model must be replicated exactly as a base requirement for reproducing the same model. This is often challenging because stakeholders in production ML environments, e.g., Data Scientists, ML Engineers, and Research Scientists, often optimize for different requirements.
- Model Serialization and Deserialization: A trained model often includes a sequence of transformation stages, each with its own set of learned parameters. Serialization and deserialization schemes should be standardized to consistently store and reload models and artifacts.
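To make the seeding and partitioning points concrete, here is a minimal, framework-agnostic Python sketch, not Uber’s implementation. The function names and the `trip-N` record IDs are hypothetical. It shows two habits that keep a data split reproducible: assigning records to partitions with a stable content hash, and shuffling with a dedicated, explicitly seeded random number generator.

```python
import hashlib
import random
from typing import Iterable, List, Sequence, Tuple


def stable_partition(record_id: str, num_partitions: int) -> int:
    # Use a content hash rather than Python's built-in hash(), which is
    # randomized per process for strings and would change the split between runs.
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def seeded_shuffle(items: Iterable[str], seed: int) -> List[str]:
    # A dedicated Random instance keeps the order independent of any other code
    # that touches the global random state. The sequence is only guaranteed for
    # the same Python version, which is one reason runtimes get pinned.
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled


def split_train_validation(
    record_ids: Sequence[str],
    num_partitions: int = 10,
    validation_partitions: Tuple[int, ...] = (0,),
) -> Tuple[List[str], List[str]]:
    # Each record's split depends only on its ID, so adding new records never
    # moves existing records between train and validation.
    train: List[str] = []
    validation: List[str] = []
    for rid in record_ids:
        if stable_partition(rid, num_partitions) in validation_partitions:
            validation.append(rid)
        else:
            train.append(rid)
    return train, validation


if __name__ == "__main__":
    ids = [f"trip-{i}" for i in range(1000)]
    train, validation = split_train_validation(ids)
    print(len(train), len(validation))
    print(seeded_shuffle(train, seed=42)[:3])
```

Because a record’s partition depends only on its ID, rerunning the split, or adding new records, leaves existing train/validation assignments unchanged.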
Thankfully, Uber’s internal ML infrastructure has built-in capabilities to relieve challenges commonly seen across the industry. To tackle the aforementioned problem of ML reproducibility, our internal ML infrastructure (as illustrated in the visual below) ensures that:
- All runtime dependencies and environments are containerized and tracked
- Workflow orchestration is managed to ensure each workflow is re-runnable
- Feature fetching, transformation, and joins are tracked and managed by a Feature Store
- All intermediate artifacts, checkpoints, data, etc., are snapshotted, versioned, and persisted
- Model serialization and deserialization are standardized across model types and use cases
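Standardizing serialization for a model built from transformation stages could look like the following minimal sketch, which stores each stage as a typed record of its learned parameters plus a format version, and rebuilds stages from a registry on load. JSON, the stage classes, `STAGE_TYPES`, and `FORMAT_VERSION` are all assumptions for illustration; this is not Uber’s model format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Sequence, Type, Union


@dataclass
class Standardize:
    # Learned parameters: mean and standard deviation of the training data.
    mean: float = 0.0
    std: float = 1.0

    def fit(self, xs: Sequence[float]) -> "Standardize":
        n = len(xs)
        self.mean = sum(xs) / n
        variance = sum((x - self.mean) ** 2 for x in xs) / n
        self.std = variance ** 0.5 or 1.0  # avoid dividing by zero on constant input
        return self

    def transform(self, xs: Sequence[float]) -> List[float]:
        return [(x - self.mean) / self.std for x in xs]


@dataclass
class Affine:
    # Stand-in for a trained model stage: y = weight * x + bias.
    weight: float = 1.0
    bias: float = 0.0

    def transform(self, xs: Sequence[float]) -> List[float]:
        return [self.weight * x + self.bias for x in xs]


Stage = Union[Standardize, Affine]
STAGE_TYPES: Dict[str, Type] = {"Standardize": Standardize, "Affine": Affine}
FORMAT_VERSION = 1


def serialize(stages: Sequence[Stage]) -> str:
    # Every stage is stored as {type, params}; sort_keys makes the output byte-stable.
    payload = {
        "format_version": FORMAT_VERSION,
        "stages": [{"type": type(s).__name__, "params": asdict(s)} for s in stages],
    }
    return json.dumps(payload, sort_keys=True)


def deserialize(blob: str) -> List[Stage]:
    payload = json.loads(blob)
    if payload["format_version"] != FORMAT_VERSION:
        raise ValueError(f"unsupported format version {payload['format_version']}")
    # Rebuild each stage from the registry using its stored parameters.
    return [STAGE_TYPES[s["type"]](**s["params"]) for s in payload["stages"]]


def predict(stages: Sequence[Stage], xs: Sequence[float]) -> List[float]:
    # Apply the stages in order, exactly as they were saved.
    out = list(xs)
    for stage in stages:
        out = stage.transform(out)
    return out
```

Python’s `json` module writes floats using their shortest round-trip representation, so in this sketch the reloaded pipeline produces exactly the same predictions as the original.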
All ML applications are managed and executed as a Directed Acyclic Graph (DAG) in which each node is containerized, allowing us to exactly control and reproduce the runtime dependencies (e.g., library versions, service configurations) and environments in which we run our ML workflows. Intermediate results (e.g., model checkpoints, intermediate data formats, model evaluation results) are always snapshotted, versioned, and persisted.
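The snapshot-and-version idea can be sketched in a few lines of Python (3.9+, for `graphlib`). This is a hypothetical illustration, not Uber’s orchestration system: `SnapshotStore` is an in-memory stand-in for persisted storage, and each node’s version string stands in for whatever pins its runtime, such as a container image digest. Each node’s output is stored under a key derived from its own version and its upstream keys, so reruns reuse snapshots and any upstream change invalidates exactly the downstream nodes.

```python
import hashlib
import json
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Set, Tuple

# A node is (version, fn). The version string stands in for whatever pins the
# node's behavior, e.g., a code revision plus a container image digest (assumption).
Node = Tuple[str, Callable[[Dict[str, Any]], Any]]


def fingerprint(*parts: Any) -> str:
    return hashlib.sha256(json.dumps(parts, sort_keys=True).encode("utf-8")).hexdigest()[:16]


class SnapshotStore:
    """In-memory stand-in for a persisted, versioned artifact store (hypothetical)."""

    def __init__(self) -> None:
        self._artifacts: Dict[str, Any] = {}

    def __contains__(self, key: str) -> bool:
        return key in self._artifacts

    def get(self, key: str) -> Any:
        return self._artifacts[key]

    def put(self, key: str, value: Any) -> None:
        self._artifacts[key] = value


def run_dag(
    nodes: Dict[str, Node],
    deps: Dict[str, Set[str]],
    store: SnapshotStore,
    executed: List[str],
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    # Every node appears in the graph, mapped to its upstream nodes.
    graph = {name: set(deps.get(name, set())) for name in nodes}
    keys: Dict[str, str] = {}
    outputs: Dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():  # upstream nodes first
        version, fn = nodes[name]
        upstream = sorted(graph[name])
        # The key covers the node's own version and its inputs' keys, so any
        # upstream change yields new keys for everything downstream of it.
        key = fingerprint(name, version, [keys[u] for u in upstream])
        keys[name] = key
        if key not in store:
            store.put(key, fn({u: outputs[u] for u in upstream}))
            executed.append(name)
        outputs[name] = store.get(key)
    return outputs, keys
```

Rerunning with unchanged nodes executes nothing; bumping the version of one node re-executes only that node and its descendants.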
In the context of production ML, this enables us to reproduce runs, plan experiments, or measure usage. In the context of ML Education, these capabilities ensure that course materials and teaching points are reproducible. Existing or new instructors do not need to spend extra effort reproducing a baseline model or training workflow each time they want to refresh the content. For example, if an instructor wants to demonstrate how adjusting the learning rate on a specific library version gets training past a performance plateau at a specific epoch, or to showcase the power of transfer learning, access to specific model checkpoints and data snapshots is critical.
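As an example of the kind of teaching point described above, the sketch below implements a generic reduce-on-plateau rule in plain Python: cut the learning rate by a constant factor after a set number of epochs without meaningful validation improvement. The class name and default parameters are hypothetical, and the rule only approximates library schedulers such as PyTorch’s `ReduceLROnPlateau`. Paired with a pinned checkpoint and data snapshot, every learner would see the plateau and the rate cut at the same epoch.

```python
import math
from typing import List, Sequence


class ReduceOnPlateau:
    """Multiply the learning rate by `factor` after `patience` consecutive epochs
    without at least `min_delta` improvement in validation loss."""

    def __init__(self, lr: float, factor: float = 0.5, patience: int = 2,
                 min_delta: float = 1e-4, min_lr: float = 1e-6) -> None:
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_delta = min_delta
        self.min_lr = min_lr
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss: float) -> float:
        if val_loss < self.best - self.min_delta:
            # Meaningful improvement: remember it and reset the counter.
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0  # give the new rate `patience` epochs to help
        return self.lr


def schedule(losses: Sequence[float], lr: float) -> List[float]:
    # Learning rate in effect after each epoch's validation loss is observed.
    scheduler = ReduceOnPlateau(lr)
    return [scheduler.step(loss) for loss in losses]
```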
Core Principles of Uber’s ML Education Program
The capabilities of Uber’s ML infrastructure and ecosystem have enabled us to design and implement our ML Education program and ground it in our design principles. Aside from the core principle of reproducibility discussed above, the program rests on five other design principles: extensibility, modularity, accountability, scalability, and discoverability.
Because our subject matter is highly technical, we felt it appropriate to derive our design principles from industry-recognized engineering principles. These principles are applied to both our content development workflow and program frameworks.
A Closer Look: Applying Principles to Content Design
Below you’ll find more detail on how we applied the spirit of each engineering principle to designing the ML curriculum.
Reproducibility
Machine learning is a data-intensive topic. A model’s performance relies heavily on the quality of the data an engineer uses as input during its development. At Uber specifically, engineers have a massive volume of data to build from. As if the sheer volume of data were not complex enough, our data is diverse in data types (batch, near-real time), storage hierarchies (pulled from Apache Hive™ tables or streamed in real time via Apache Kafka®), and subject classifications (location data, pricing data).
Given that we use real examples of Uber business problems in our learning curriculum, and that our data landscape is vast and evolves rapidly, reproducibility is critical to keep in mind during a course’s development. Reproducibility enables users to seamlessly bring training jobs they ran during a course into Uber’s production environment. On the instructor side, it ensures we can step in at any time to help users debug their results. Lastly, reproducible course artifacts give users an identical learning experience regardless of their geo-location or the course delivery method they’ve selected.
Extensibility
In June 2021, the ML Education program had 2 live courses and 2 self-serve offerings in its content library. Today, 10+ live courses and 10+ self-serve offerings are available to engineers, TPMs, and product managers across US/CAN, EMEA, APAC, and LATAM.
In just 8 months, the program produced and delivered content globally without sacrificing quality. This velocity can largely be attributed to extensibility. Our content development workflow and library of content modules were designed to apply easily to a variety of subject matter: from topics like Introduction to Linear Regression for true ML beginners to how to access and contribute to Palette, Uber’s industry-renowned Feature Store.
We follow a 6-step process for developing content across any subject. Each development phase is considered to be complete once its corresponding artifacts are documented, reviewed/approved, and stored.
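The completion rule above is simple enough to state as code. In this minimal sketch, the six phases are identified only by number, and the class names and example artifacts are hypothetical: an artifact counts only when it is documented, approved, and stored, and a phase is complete only when all of its artifacts count.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Artifact:
    name: str
    documented: bool = False
    approved: bool = False
    stored: bool = False

    def done(self) -> bool:
        # An artifact counts only once it is documented, reviewed/approved, and stored.
        return self.documented and self.approved and self.stored


@dataclass
class Phase:
    number: int  # 1..6, identified by number only in this sketch
    artifacts: List[Artifact] = field(default_factory=list)

    def complete(self) -> bool:
        # A phase with no recorded artifacts is treated as not started.
        return bool(self.artifacts) and all(a.done() for a in self.artifacts)


def current_phase(phases: List[Phase]) -> Optional[int]:
    """Return the first incomplete phase, or None when every phase is complete."""
    for phase in sorted(phases, key=lambda p: p.number):
        if not phase.complete():
            return phase.number
    return None
```

Here `current_phase` simply reports which phase is still blocking progress, so a new course developer can see what remains without reading every artifact.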
Having a clearly defined list of artifacts (many of which are templatized) enables extensibility across the design/delivery of all ML Education content. For example:
- A standard course spec template to guide the course initiation process
- Standard content modules (theory, hands-on) that can be easily applied to new topics, use cases, and requirements
This checklist of artifacts not only powers extensibility, but also allows any new course developer to clearly understand requirements for building a new course from scratch, which helps streamline onboarding of new program volunteers.
Modularity
Every learning resource created by the ML Education program is made up of one or more content modules. A course’s subject matter, intended scope, target audience, and mode of delivery are all major factors in determining which module(s) are appropriate for the resource. Modules (such as theory or hands-on codelab exercises) are templatized, stored, tailored, and plugged into new courses where relevant.
With modularity at the forefront of our content design, we can update individual components of course material without up- or downstream impact on other modules. We can iterate on course content quickly and update learning materials in step with new feature rollouts or updates. For example, if Uber rolls out enhanced functionality within Palette (our feature store), our course developer team can quickly revise the theory module of our Intro to Feature Engineering course to highlight the release. The hands-on activity remains unchanged, and the edits can be made quickly enough that the next live session includes the necessary content updates.
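The independence of modules can be modeled as a course that references versioned modules rather than embedding them. In this hypothetical sketch, the module names and version numbers are invented for illustration; revising the theory module produces a new course revision whose hands-on reference is untouched.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class ModuleRef:
    kind: str     # e.g., "theory" or "hands-on"
    name: str
    version: int


@dataclass(frozen=True)
class Course:
    title: str
    modules: Tuple[ModuleRef, ...]

    def with_updated(self, kind: str) -> "Course":
        # Return a new course revision: modules of `kind` get a version bump,
        # and every other module reference is reused as-is.
        return replace(self, modules=tuple(
            replace(m, version=m.version + 1) if m.kind == kind else m
            for m in self.modules
        ))

    def select(self, kind: str) -> List[ModuleRef]:
        return [m for m in self.modules if m.kind == kind]


if __name__ == "__main__":
    course = Course("Intro to Feature Engineering", (
        ModuleRef("theory", "feature-store-concepts", 3),
        ModuleRef("hands-on", "palette-codelab", 5),
    ))
    revised = course.with_updated("theory")
    print(revised.select("theory"))
    print(revised.select("hands-on") == course.select("hands-on"))
```

The same `select` lookup also fits the self-serve pattern in the next paragraph, where a learner opens only the theory or only the hands-on module.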
Another benefit of modularity is that participants can focus on the portions of a course that are most immediately relevant to them without compromising the quality of their learning experience. A self-serve learner who prefers auditory over kinesthetic learning can watch the theory portion of a course, save the hands-on module for later, and still feel they’ve had a comprehensive end-to-end learning experience. Conversely, self-serve learners who already understand the basic functionality of a tool or service can skip the theory portion and move right into the hands-on module without feeling like they’ve entered a course at the halfway point.
Lastly, modularity increases consistency across all ML Education learning materials, which positively contributes to our program branding and overall satisfaction (OSAT). Many of our participants attend multiple courses. Applying consistent content modules across courses lets us package and deliver content with the same “look and feel,” reducing cognitive load for repeat users.
Curious about the intricacies of these modules? We’ll take a closer look at our most commonly used content modules in our follow-up blog post.
A Closer Look: Applying Principles to Program Frameworks
With strong design principles guiding our approach to content development, we needed equally strong design principles guiding the operational aspects of the program. We focused on creating operational frameworks that ensure:
- Users actually engage with the learning resources we create
- Users know where to find our learning resources once they are published
- Our learning resources are easily accessible by Uber’s global workforce
This is where our design principles of accountability, scalability, and discoverability come into play.
Accountability
The impact of our learning resources depends on whether the intended audience (Uber engineers, data scientists, applied scientists, PMs, and TPMs) actually uses them effectively.
Discoverability helps ensure that our intended audience knows these resources exist and can easily locate them, but accountability helps us ensure that attendees properly use the resources they are given.
Scalability
The ML Education program’s content should scale in step with our feature releases and be closely aligned with business priorities. As new features are released, our extensible course design framework allows us to quickly update content, and it works hand in hand with our scalable program frameworks to automate the course release process.
Like the ML development workflow, delivering ML content involves an element of ambiguity, and we need flexibility to accommodate it. For example, 2 topics of similar complexity may be delivered in the same format, yet one topic’s OSAT score is disproportionately lower than the other’s. We may not have a clear signal as to why this occurred. To troubleshoot, we need the flexibility to adjust delivery parameters in the hope of improving the course’s performance in its next session. This flexibility is especially important when scaling new courses.
Discoverability
Discoverability can be a massive challenge in a large organization, so we took the issue seriously early in the program’s tenure and wasted no time building the principle into the program’s design.
Adobe said “good visibility can lead to good discovery,” and we agree. We used increased visibility as a means of ensuring discoverability of our program and its content. Discoverability is not a “one-and-done” solution. It’s something we work toward constantly, taking every opportunity to present in all-hands forums and connect with engineers embedded in teams outside of UberAI.
Conclusion
Leveraging engineering principles to guide our content and program design frameworks early on in our program’s tenure has allowed us to:
- Snapshot, version control, and reproduce all course contents and resources
- Minimize resource / content maintenance and friction for developing new courses
- Easily re-use or extend course modules to develop new learning paths or targeted micro-learning experiences
- Reduce the friction of moving from training to production development so that all learning resources effectively prepare our target audience
- Scale our program’s impact and reach by ~3x in 1 year
Applying engineering principles to our program’s design is one of many unique aspects of Machine Learning Education at Uber. Read the next article in this series, ML Education at Uber: Program Design and Outcomes, to learn more about our approach to designing content modules, delivering full-length learning solutions, and the outcomes that make our ML Education program a success.
Acknowledgements
The ML Education Program would not be possible without Thommen Korah, David Morales, Juan Marcano, our Program Sponsor Smitha Shyam, and the hard work of our ML Education core group and course instructors. This team has dedicated a significant amount of their time to educating Uber engineers to recognize ML business problems, apply ML solutions at scale, and accelerate their work using Uber’s internal tools.
Apache®, Apache Spark™ and Spark™ are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
Brooke Carter
Brooke Carter is a Program Manager based in Seattle, WA. She leads end-to-end strategy and execution of Uber’s Machine Learning Education Program. She focuses on scaling the program’s impact globally and designing program experiments to drive efficiencies in ML Education’s operational frameworks.
Melissa Barr
Melissa Barr is a Senior Technical Program Manager on Uber’s AI Platform team. She is based in New York City. She drives a broad set of programs across ML & AI, specializing in embeddings, recommendation systems, and large language models.
Michael Mui
Michael Mui is a Staff Software Engineer on Uber AI's Machine Learning Platform team. He works on the distributed training infrastructure, hyperparameter optimization, model representation, and evaluation. He also co-leads Uber’s internal ML Education initiatives.
Posted by Brooke Carter, Melissa Barr, Michael Mui