
Machine Learning Operations

Continuous integration and continuous deployment (CI/CD) have become key building blocks of modern software architectures and have given rise to a whole generation of platforms. With the emergence of machine learning, it was only natural that many technologists tried to carry CI/CD practices over from traditional software development. The results of these experiments were not particularly positive: machine learning models follow a different lifecycle than traditional software applications, so most CI/CD practices prove impractical when applied unchanged to machine learning scenarios. This friction led to the emergence of a new discipline: machine learning operations (MLOps).
Conceptually, MLOps is the equivalent of DevOps for machine learning. The discipline looks to automate the lifecycle of machine learning models and their corresponding datasets from experimentation to production. The point about datasets is worth noting. In machine learning applications, the lifecycle of datasets is as important as the lifecycle of code, a concern that traditional software disciplines rarely face to the same degree. In addition to automation, MLOps processes require active collaboration between data scientists and machine learning engineers.
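
To make the point about datasets concrete, here is a minimal sketch in Python of treating a dataset as a versioned input alongside the code. The function names (fingerprint, describe_run), the example rows and the "abc123" commit label are all hypothetical and not taken from any MLOps platform; real tools typically version files or tables rather than in-memory lists.

```python
import hashlib
import json


def fingerprint(rows):
    """Return a short content hash for a dataset given as a list of dicts."""
    # Serialize deterministically so identical data always yields the same hash.
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def describe_run(code_version, rows, params):
    """Bundle what is needed to identify a training run: code, data, parameters."""
    return {
        "code_version": code_version,       # e.g. a git commit hash (placeholder here)
        "data_version": fingerprint(rows),  # changes whenever the data changes
        "params": dict(params),
    }


train_v1 = [{"x": 1.0, "y": 0}, {"x": 2.0, "y": 1}]
train_v2 = train_v1 + [{"x": 3.0, "y": 1}]  # one new labelled example

run_a = describe_run("abc123", train_v1, {"lr": 0.1})
run_b = describe_run("abc123", train_v2, {"lr": 0.1})

# Same code and parameters, yet the runs differ because the data differs.
print(run_a["code_version"] == run_b["code_version"])  # True
print(run_a["data_version"] == run_b["data_version"])  # False
```

A pipeline that tracks only the code version would consider these two runs identical, which is why MLOps tooling tends to record the data version as well.
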
Most CI/CD tools and platforms prove insufficient for the complexities of MLOps workflows. As a result, we are seeing a new generation of MLOps platforms that try to automate machine learning workflows. These days, most tier 1 machine learning platforms include some level of MLOps capabilities, and there is a new generation of startups specializing in the area. It will be interesting to see whether MLOps becomes a standalone category in the machine learning market. Either way, MLOps capabilities are a key component of any modern machine learning application.
MLOps is becoming a key component of the lifecycle of machine learning models and, consequently, the market is seeing an explosion in the number of platforms and frameworks trying to address this challenge. It is increasingly hard for data scientists to stay up to date with developments in this market. Below, we’ve listed key MLOps platforms you should be aware of, based on their initial traction and trajectory.

Paperspace’s Gradient

Gradient is one of the early pioneers in the MLOps space. Created by Paperspace, Gradient is a platform for streamlining the development, training and deployment of machine learning models. The platform abstracts machine learning programs as projects that include the following components:

  • Notebooks: Instances of Jupyter notebooks that include the initial version of models.
  • Experiments: Workflows that train machine learning models on GPUs without requiring users to manage any infrastructure.
  • Jobs: Components that execute generic tasks related to the lifecycle of machine learning programs.
  • Data: Datasets used in the training and validation of machine learning models.
  • Models: The code related to machine learning models.
  • Deployments: Microservices that abstract the serving of a machine learning model.

Gradient is not an open-source project, but that hasn’t prevented it from developing a very impressive customer base.

How can I use it: Gradient is commercially distributed by Paperspace.

Kubeflow

Kubeflow is one of the most exciting new entrants in the MLOps space. It was initially incubated as an internal project at Google to leverage Kubernetes in the deployment of TensorFlow models. After the initial success, the project was open-sourced and has gained a lot of contributors and adopters.

Kubeflow enables the lifecycle management of machine learning workflows using Kubernetes. Containers have become an essential component for packaging and deploying machine learning models, and Kubeflow capitalizes on that by adapting Kubernetes’ container orchestration capabilities to the requirements of machine learning workflows. In that sense, Kubeflow can be seen as a machine learning toolkit for Kubernetes. Since its initial release, Kubeflow has developed a sophisticated set of capabilities. Its feature set is very extensive, but a few components are worth highlighting:

  • Pipelines: The cornerstone of the Kubeflow platform, the pipelines engine enables the deployment of end-to-end machine learning workflows in a portable format.
  • Feast: Kubeflow incorporates the Feast feature store to enable the discovery and management of features in machine learning models.
  • Fairing: This component enables the training and deployment of machine learning models in the Kubeflow platform.
  • Training Frameworks: Kubeflow integrates with frameworks and libraries such as TensorFlow, PyTorch, MXNet and others to streamline the training of machine learning models.
  • Dashboard: Kubeflow includes a visual dashboard to manage the end-to-end lifecycle of machine learning models.
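
The pipeline idea above can be sketched in plain Python as a graph of steps executed in dependency order. This is an illustration only and does not use the Kubeflow Pipelines SDK: the step functions, the STEPS table and run_pipeline are hypothetical names, and the "model" is a trivial mean predictor. In Kubeflow, each step would typically run in its own container on Kubernetes, whereas this sketch runs everything in one process.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def ingest(inputs):
    # Stand-in for loading raw data from storage.
    return {"rows": [1.0, 2.0, 3.0, 4.0]}


def preprocess(inputs):
    rows = inputs["ingest"]["rows"]
    peak = max(rows)
    return {"rows": [r / peak for r in rows]}  # scale values into [0, 1]


def train(inputs):
    rows = inputs["preprocess"]["rows"]
    return {"model": {"mean": sum(rows) / len(rows)}}  # toy "model"


def evaluate(inputs):
    rows = inputs["preprocess"]["rows"]
    mean = inputs["train"]["model"]["mean"]
    mse = sum((r - mean) ** 2 for r in rows) / len(rows)
    return {"mse": mse}


# Each step maps to (function, names of the steps it depends on).
STEPS = {
    "ingest": (ingest, []),
    "preprocess": (preprocess, ["ingest"]),
    "train": (train, ["preprocess"]),
    "evaluate": (evaluate, ["preprocess", "train"]),
}


def run_pipeline(steps):
    """Run every step after its dependencies and collect the outputs."""
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    outputs, order = {}, []
    for name in TopologicalSorter(graph).static_order():
        func, deps = steps[name]
        outputs[name] = func({dep: outputs[dep] for dep in deps})
        order.append(name)
    return order, outputs


order, outputs = run_pipeline(STEPS)
print(order)
print(outputs["evaluate"])
```

Declaring the dependencies as data, rather than hard-coding the call order, is what lets an orchestrator schedule, retry or cache individual steps.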

Kubeflow has been integrated into different cloud runtimes such as AWS and Azure as well as hybrid cloud infrastructures.

How can I use it: Kubeflow is open source and available on GitHub.

TensorFlow Extended

TensorFlow Extended (TFX) is the implementation of the TFX paper explained in the previous section. TFX provides the key building blocks to define, serve and monitor TensorFlow models. The current implementation of TFX includes the following components:

  • Pipelines: TFX pipelines enable the orchestration of machine learning workflows on different platforms, such as Apache Airflow and Kubeflow Pipelines.
  • Standard Components: TFX standard components include many abstract building blocks such as training or data transformations that are relevant in machine learning workflows.
  • Libraries: TFX libraries group different components to enable specific functionality. Current libraries include TensorFlow Data Validation and TensorFlow Transform.
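
As a rough illustration of what a data validation library in this family is for, the sketch below infers a simple schema from training data and then checks new data against it. It does not call the TensorFlow Data Validation API; infer_schema, find_anomalies and the example rows are hypothetical, and a real library computes much richer statistics and schemas.

```python
def infer_schema(rows):
    """Record, per column, the value type and the observed numeric range."""
    schema = {}
    for row in rows:
        for name, value in row.items():
            entry = schema.setdefault(
                name, {"type": type(value), "min": None, "max": None}
            )
            if isinstance(value, (int, float)):
                entry["min"] = value if entry["min"] is None else min(entry["min"], value)
                entry["max"] = value if entry["max"] is None else max(entry["max"], value)
    return schema


def find_anomalies(rows, schema):
    """Return (row index, column, problem) for every value that breaks the schema."""
    anomalies = []
    for index, row in enumerate(rows):
        for name, spec in schema.items():
            if name not in row:
                anomalies.append((index, name, "missing"))
                continue
            value = row[name]
            if not isinstance(value, spec["type"]):
                anomalies.append((index, name, "wrong type"))
            elif spec["min"] is not None and not spec["min"] <= value <= spec["max"]:
                anomalies.append((index, name, "out of range"))
    return anomalies


training_rows = [
    {"age": 31, "country": "DE"},
    {"age": 45, "country": "US"},
    {"age": 27, "country": "IN"},
]
serving_rows = [
    {"age": 38, "country": "US"},         # fine
    {"age": "unknown", "country": "DE"},  # wrong type
    {"age": 120, "country": "IN"},        # outside the range seen in training
    {"country": "US"},                    # column missing
]

schema = infer_schema(training_rows)
for anomaly in find_anomalies(serving_rows, schema):
    print(anomaly)
```

Treating any value outside the training range as an anomaly is a deliberately strict choice for the sketch; in practice the inferred schema is usually reviewed and relaxed by hand.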

TFX is a native component of TensorFlow, which facilitates its broad adoption within the developer and data science community.

How can I use it: TFX is open source and available on GitHub.

MLflow

Built by Databricks, MLflow is an open-source platform that provides key building blocks to automate the lifecycle of machine learning models. The MLflow project is organized into four main components that abstract different stages of that lifecycle:

  • MLflow Tracking: This component is, essentially, an API for logging different elements of a machine learning model such as parameters, metrics and versioning artifacts.
  • MLflow Projects: This component provides a standard format for structuring machine learning projects so that runs are reproducible in environments such as Docker or Conda.
  • MLflow Models: This component is a packaging format for deploying machine learning models across different runtimes such as Apache Spark, Azure ML and AWS SageMaker.
  • MLflow Model Registry: This component provides a centralized model store, APIs and a user interface to manage the lifecycle of machine learning models.
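
The tracking and registry ideas above can be sketched with a few in-memory classes. This is a conceptual illustration and not MLflow's API: Tracker, Registry, Run, ModelVersion, the stage names, the "churn-model" name and the accuracy values are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    run_id: int
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)


@dataclass
class ModelVersion:
    name: str
    version: int
    run_id: int
    stage: str = "None"


class Tracker:
    """Keeps a log of runs with their parameters and metrics."""

    def __init__(self):
        self.runs = []

    def start_run(self, **params):
        run = Run(run_id=len(self.runs) + 1, params=params)
        self.runs.append(run)
        return run

    def best_run(self, metric):
        candidates = [r for r in self.runs if metric in r.metrics]
        return max(candidates, key=lambda r: r.metrics[metric])


class Registry:
    """Central store of named models, each with numbered versions and a stage."""

    def __init__(self):
        self.versions = {}

    def register(self, name, run):
        versions = self.versions.setdefault(name, [])
        model_version = ModelVersion(name, len(versions) + 1, run.run_id)
        versions.append(model_version)
        return model_version

    def promote(self, name, version, stage):
        # Only one version holds a stage at a time; the previous holder is archived.
        for model_version in self.versions[name]:
            if model_version.version == version:
                model_version.stage = stage
            elif model_version.stage == stage:
                model_version.stage = "Archived"


tracker = Tracker()
# The accuracy values are made-up stand-ins for real training results.
for lr, accuracy in [(0.1, 0.81), (0.01, 0.88), (0.001, 0.84)]:
    run = tracker.start_run(lr=lr)
    run.metrics["accuracy"] = accuracy

registry = Registry()
v1 = registry.register("churn-model", tracker.runs[0])
registry.promote("churn-model", 1, "Production")

best = tracker.best_run("accuracy")
v2 = registry.register("churn-model", best)
registry.promote("churn-model", 2, "Production")

print(best.params, v1.stage, v2.stage)
```

Because each model version keeps the id of the run that produced it, anything deployed can be traced back to the parameters and metrics recorded for that run.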

MLflow provides integration with different deep learning frameworks and platforms, which has facilitated its adoption within the machine learning community.

How can I use it: MLflow is open source and available on GitHub.