By Travis Wolf
Introduction — “What is MLOps? — DevOps for ML” — Kaz Sato
Challenges arise as the production of machine learning models scales up to an enterprise level. MLOps helps mitigate some of them by handling scalability, automating workflows, reducing dependencies, and streamlining decision making. Simply put, MLOps is like the cousin of DevOps.
It is a set of practices that unifies ML development and operations.
This article serves as a general guide for anyone planning their next machine learning pipeline, with short summaries that introduce the core topics of MLOps.
1. Communication and collaboration between roles — “ML products are a team effort”
Producing a successful machine learning lifecycle is a lot like racing in Formula One. From the outside, it appears that the driver is the only one responsible for getting the car around the track, but in reality there are upwards of 80 team members behind the scenes. Developing an enterprise-level ML product is similar. The data scientist sits in the driver’s seat, directing how the model will be built every step of the way. However, this assumes data scientists have expertise in every step of development, which is commonly not the case. The driver is not going to get out and perform maintenance on the car; they need a team of engineers, mechanics, and strategists to be successful.
A successful data science team likewise consists of roles that bring different skill sets and responsibilities to the project. Subject matter experts, data scientists, software engineers, and business analysts each play an important role, but they don’t use the same tools or share the basic knowledge needed to communicate effectively with one another. That is why it is important to practice collaboration and communication between the roles every step of the way: it helps the team get around the track as quickly as possible.
2. Establish Business Objectives — “Have a clear goal in mind”
Every business has key performance indicators (KPIs). These are measurable values that reflect how well a company is achieving its objectives. The first step in the machine learning lifecycle is taking a business question and determining how it can be solved with ML: “What do you want from the data?” Machine learning models are evaluated with metrics such as accuracy, precision, and recall. How can those be translated into real-world metrics that project members outside of the data team can easily understand?
Data science teams often struggle to prove to stakeholders and upper management that their model provides value to the company. In particular, a model falls short when there were no clear objectives during the exploratory and maintenance phases of development. To combat this, broad business questions like “How do we get people to stay longer on a web page?” need to be translated into performance metrics that can be set as goals for the model. The point of this practice is to give data engineers and scientists a foundation to work from and to avoid the risk of solving a problem that doesn’t serve the business in the long run.
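To make the translation from model metrics to business metrics concrete, here is a minimal Python sketch. It computes accuracy, precision, and recall from raw prediction counts, then converts the same counts into a rough monetary figure. The counts, the `value_per_hit` and `cost_per_false_alarm` parameters, and the `estimated_net_value` helper are all assumptions made up for illustration; a real project would agree on these numbers with the business side.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from comparing model predictions against known labels."""
    true_positive: int
    false_positive: int
    true_negative: int
    false_negative: int


def accuracy(c):
    """Share of all predictions that were correct."""
    total = c.true_positive + c.false_positive + c.true_negative + c.false_negative
    return (c.true_positive + c.true_negative) / total if total else 0.0


def precision(c):
    """Of everything the model flagged, how much was really positive."""
    flagged = c.true_positive + c.false_positive
    return c.true_positive / flagged if flagged else 0.0


def recall(c):
    """Of everything that was really positive, how much the model caught."""
    actual = c.true_positive + c.false_negative
    return c.true_positive / actual if actual else 0.0


def estimated_net_value(c, value_per_hit, cost_per_false_alarm):
    """Hypothetical business translation: gains from hits minus cost of false alarms."""
    return c.true_positive * value_per_hit - c.false_positive * cost_per_false_alarm


# Made-up counts for illustration only.
counts = ConfusionCounts(true_positive=80, false_positive=20,
                         true_negative=880, false_negative=20)

print(f"accuracy:  {accuracy(counts):.2f}")
print(f"precision: {precision(counts):.2f}")
print(f"recall:    {recall(counts):.2f}")
print("net value:", estimated_net_value(counts, value_per_hit=50, cost_per_false_alarm=5))
```

A figure like the last one is usually easier for stakeholders to discuss than precision or recall alone, even though it rests on the same counts.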
3. Obtaining the Ground Truth — “Validate the dataset”
Arguably the most important step in developing any machine learning model is verifying the labels of the dataset. Ground truthing is essential for training a model whose predictions accurately reflect the real world.
Legitimizing the source of the data and labeling it correctly can be an arduous process, and possibly the most time-consuming of all. That is why it is important to budget the time and resources for it early in the development process: depending on the size of the dataset, labeling can become a bottleneck that holds back model performance. For example, if the model is trained to detect objects in a picture, obtaining the ground truth will involve labeling each observation in the dataset with a bounding box, which has to be done manually.
After the model is put into production, its performance might drift: the data it sees may no longer reflect the population it was trained on, so its predictions become less accurate. In this case, retraining the model on newly labeled data will be necessary, which costs additional time and resources.
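One way to notice this kind of drift before it shows up in the predictions is to compare the distribution of an input feature in production against the training data. The sketch below uses the Population Stability Index (PSI), a common heuristic chosen here for illustration and not something this article prescribes. The function name, the bin count, and the synthetic samples are all assumptions.

```python
import math
import random


def population_stability_index(reference, current, bins=10):
    """Compare two samples of one numeric feature; a larger value means more drift."""
    low, high = min(reference), max(reference)
    width = (high - low) / bins or 1.0  # fall back to 1.0 if all reference values are equal

    def bucket_shares(values):
        counts = [0] * bins
        for value in values:
            index = int((value - low) / width)
            index = min(max(index, 0), bins - 1)  # clamp values outside the reference range
            counts[index] += 1
        # A small floor avoids log(0) for empty buckets.
        return [max(count / len(values), 1e-6) for count in counts]

    ref_shares = bucket_shares(reference)
    cur_shares = bucket_shares(current)
    return sum((cur - ref) * math.log(cur / ref)
               for ref, cur in zip(ref_shares, cur_shares))


# Synthetic data for illustration: one stable sample and one shifted sample.
rng = random.Random(0)
training_sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
stable_sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
shifted_sample = [rng.gauss(1.5, 1.0) for _ in range(1000)]

stable_psi = population_stability_index(training_sample, stable_sample)
shifted_psi = population_stability_index(training_sample, shifted_sample)
print(f"stable PSI:  {stable_psi:.3f}")
print(f"shifted PSI: {shifted_psi:.3f}")
```

A team would typically pick a threshold for this score and treat anything above it as a signal to review the labels and consider retraining. The right threshold depends on the data and is a judgment call.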
4. Choosing the Right Model — “Experiment and Reproduce”
Different challenges require different tools, in this case algorithms. Algorithms range from very simple ones like linear regression to advanced deep learning neural networks, with everything in between. Each model has its advantages and disadvantages and poses its own considerations for MLOps.
From an MLOps perspective, the best way to narrow down the choice of algorithm depends on two things: what kind of data it will operate on, and how well it fits into the CI/CD pipeline. Keeping these two principles in mind will ultimately reduce dependencies and make life easier when it comes to the deployment phase.
This process will involve a lot of experimentation and validation, and the result must ultimately be consistently reproducible by the DevOps team for deployment.
Start by experimenting with simple models and work up in complexity. This will help you find the best balance between effectiveness and use of resources. The goal is to end up with a model that is plug and play (like an API), scalable, compatible with the production environment, and easy to understand in terms of its inputs and outputs.
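As a rough illustration of starting simple and keeping experiments reproducible, the Python sketch below compares a trivial baseline (always predict the average) against a one-feature linear regression. The dataset is synthetic and every function name is an assumption for this example. The key idea is that the random seed is fixed, so anyone rerunning the experiment gets the same numbers.

```python
import random


def make_dataset(seed, size=200):
    """Synthetic data (an assumption for illustration): y = 3x + 2 plus noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(size)]
    return [(x, 3 * x + 2 + rng.gauss(0, 1)) for x in xs]


def fit_mean(train):
    """Simplest possible model: always predict the average target."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """One step up in complexity: ordinary least squares with one feature."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in train)
    variance = sum((x - mean_x) ** 2 for x, _ in train)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_squared_error(model, rows):
    """Average squared difference between predictions and targets."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


def run_experiment(seed):
    """Train both models on the same split and report held-out error."""
    data = make_dataset(seed)
    train, test = data[:150], data[150:]
    return {
        "mean_baseline": mean_squared_error(fit_mean(train), test),
        "linear_regression": mean_squared_error(fit_line(train), test),
    }


results = run_experiment(seed=42)
for name, error in results.items():
    print(f"{name}: {error:.2f}")
```

If the simple model already meets the target metric, moving to something more complex may not be worth the extra deployment effort.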
5. Determine the Type of Deployment — “Continuous Integration and Delivery”
There are typically two methods to consider when deploying a model once it reaches the production stage: embedding the model into an application, or hosting it as Software-as-a-Service, in this case “model-as-a-service”. Each has its advantages and disadvantages, costs, and constraints.
It is important to decide this before setting off on the development of a machine learning pipeline, because certain software frameworks only support specific packages. The production environment needs to be compatible with the model of choice.
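The difference between the two deployment styles can be sketched in a few lines of Python using only the standard library. The `predict` function below is a stand-in for a trained model with made-up weights, and `PredictionHandler`, `serve`, and the port number are hypothetical names chosen for this example. A production service would normally use a dedicated web framework and add input validation.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in for a trained model: a weighted sum with made-up weights."""
    weights = [0.5, -0.25]
    return sum(w * x for w, x in zip(weights, features))


class PredictionHandler(BaseHTTPRequestHandler):
    """Model-as-a-service: clients POST JSON and receive a prediction back."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def serve(port=8000):
    """Run the service until interrupted (not called automatically here)."""
    HTTPServer(("127.0.0.1", port), PredictionHandler).serve_forever()


# Embedded deployment: the application imports the model and calls it directly.
print(predict([4.0, 2.0]))
```

In the embedded style the model ships inside the application, so both must share one environment. In the service style, calling `serve()` exposes the same `predict` function over HTTP, so the model can be updated without redeploying its clients, at the cost of running and monitoring a separate service.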
MLOps addresses the challenges that arise once the model is ready to enter production. It borrows the continuous integration and continuous delivery (CI/CD) principle commonly used in DevOps. The difference is that in ML the data and the models are continuously updated as well, whereas traditional software only requires CI/CD for code.
Once the model has been trained and evaluated to perform in a way that delivers value to its users, the next step is deploying the pipeline so that it can be continuously integrated as the data changes over time. This adds the challenge of continuously delivering newly trained and validated models.
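A minimal sketch of what one continuous delivery step for a model might look like is shown below: retrain on fresh data, validate the candidate against the model currently in production, and only promote it if it is clearly better. The function names, the `min_improvement` margin, and the toy threshold "model" are all assumptions for illustration, not a specific tool's API.

```python
def should_promote(candidate_score, production_score, min_improvement=0.01):
    """Promote only when the retrained model clearly beats the current one."""
    return candidate_score - production_score >= min_improvement


def continuous_delivery_step(train, evaluate, train_data, validation_data, production_model):
    """Retrain, validate on held-out data, and return the model that should serve."""
    candidate = train(train_data)
    candidate_score = evaluate(candidate, validation_data)
    production_score = evaluate(production_model, validation_data)
    if should_promote(candidate_score, production_score):
        return candidate
    return production_model


# Toy example: a "model" is just a threshold; values above it are labeled 1.
def train_threshold(rows):
    negatives = [value for value, label in rows if label == 0]
    positives = [value for value, label in rows if label == 1]
    return (sum(negatives) / len(negatives) + sum(positives) / len(positives)) / 2


def threshold_accuracy(threshold, rows):
    correct = sum(1 for value, label in rows if int(value > threshold) == label)
    return correct / len(rows)


new_training_data = [(1.0, 0), (1.5, 0), (2.5, 1), (3.0, 1)]
new_validation_data = [(0.5, 0), (1.75, 0), (2.25, 1), (3.5, 1)]
outdated_model = 0.5  # threshold learned on older data

serving_model = continuous_delivery_step(
    train_threshold, threshold_accuracy,
    new_training_data, new_validation_data, outdated_model,
)
print("threshold now serving:", serving_model)
```

In a real pipeline a step like this would be triggered automatically, for example on a schedule or when new labeled data arrives, and the promotion decision would be logged for later review.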
6. Containerization — “Works every time, all the time”
Open-source software packages are often strict about their dependencies, which forces software to rely on exact package and module versions. In keeping with the theme of streamlining and standardization in MLOps, containerization is a way to reproduce the machine learning environment consistently from development into production.
Tools like Docker provide an isolated environment for the model and its accompanying applications to run in production, ensuring that the required packages are always available.
Each module of the pipeline can be kept in its own container, which keeps its environment, including environment variables, consistent. This reduces the number of external dependencies the model needs to work correctly. When a deployment involves multiple containers, Kubernetes is a useful MLOps tool for orchestrating them.
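The reliance on exact package versions can be checked from inside the environment itself. The Python sketch below compares a set of pinned versions against what is actually installed, using the standard library's `importlib.metadata`. The function name `find_version_mismatches` and the package name in the example are hypothetical; a check like this could run when a container starts, as one possible safeguard alongside the container image itself.

```python
from importlib import metadata


def find_version_mismatches(pinned, installed_version=metadata.version):
    """Return {package: (expected, found)} for every pin the environment violates."""
    mismatches = {}
    for package, expected in pinned.items():
        try:
            found = installed_version(package)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


# Hypothetical pin that is certainly missing, to show what a failure looks like.
pinned_versions = {"definitely-not-a-real-package-xyz": "1.0.0"}
problems = find_version_mismatches(pinned_versions)

if problems:
    for package, (expected, found) in problems.items():
        print(f"{package}: expected {expected}, found {found}")
else:
    print("environment matches the pinned versions")
```

An empty result means the environment matches the pins, which is the guarantee a well-built container image is meant to provide on every run.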
In the past year, MLOps has seen rapid growth, and more enterprise-level businesses are looking to invest in it. MLOps helps them streamline, automate, and scale their ML pipelines. The field is growing every day, and not everything could be covered in this article. Still, keep these ideas in mind so that when your team is considering its next project, you can introduce the concept of MLOps.
Reference: Introducing MLOps, by Mark Treveil, Nicolas Omont, Clément Stenac, Kenji Lefevre, Du Phan, Joachim Zentici, Adrien Lavoillotte, Makoto Miyazaki, and Lynn Heidmann (2020).