Paperspace’s Gradient
Gradient is one of the early pioneers in the MLOps space. Created by Paperspace, Gradient is a platform for streamlining the development, training and deployment of machine learning models. The platform organizes machine learning programs into projects made up of the following components:
- Notebooks: Jupyter notebook instances, typically used to develop the initial version of a model.
- Experiments: Workflows that train machine learning models on GPUs without requiring users to manage any infrastructure.
- Jobs: Components that execute generic tasks related to the lifecycle of machine learning programs.
- Data: Datasets used in the training and validation of machine learning models.
- Models: The code related to machine learning models.
- Deployments: Microservices that abstract the serving of a machine learning model.
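To make the project abstraction concrete, the following is a minimal, hypothetical Python model of a project and the six component types listed above. It is not Gradient's SDK or data model; the class names, method names and example project are invented purely for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class ComponentKind(Enum):
    # One member per component type listed above (illustrative names only).
    NOTEBOOK = "notebook"
    EXPERIMENT = "experiment"
    JOB = "job"
    DATA = "data"
    MODEL = "model"
    DEPLOYMENT = "deployment"


@dataclass
class Component:
    name: str
    kind: ComponentKind


@dataclass
class Project:
    """Hypothetical grouping of every component that belongs to one ML program."""

    name: str
    components: list[Component] = field(default_factory=list)

    def add(self, name: str, kind: ComponentKind) -> Component:
        component = Component(name, kind)
        self.components.append(component)
        return component

    def names_of(self, kind: ComponentKind) -> list[str]:
        # Names of all components of one type, in the order they were added.
        return [c.name for c in self.components if c.kind is kind]


project = Project("churn-prediction")
project.add("exploration.ipynb", ComponentKind.NOTEBOOK)
project.add("train-baseline", ComponentKind.EXPERIMENT)
project.add("customers-v1", ComponentKind.DATA)
project.add("churn-api", ComponentKind.DEPLOYMENT)
print(project.names_of(ComponentKind.EXPERIMENT))  # ['train-baseline']
```

Grouping everything under one project object is what lets a platform like this track which notebook, dataset and experiment produced a given deployment.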
Paperspace’s Gradient is not an open-source project, but that hasn’t prevented it from building an impressive customer base.
How can I use it: Gradient is commercially distributed by Paperspace.
Kubeflow
Kubeflow is one of the most exciting new entrants in the MLOps space. It was initially incubated as an internal project at Google to leverage Kubernetes in the deployment of TensorFlow models. After its initial success, the project was open-sourced and has since attracted many contributors and adopters.
Kubeflow enables the lifecycle management of machine learning workflows using Kubernetes. Containers have become an essential component for packaging and deploying machine learning models, and Kubeflow capitalizes on that by adapting Kubernetes’ container orchestration capabilities to the requirements of machine learning workflows. In that sense, Kubeflow can be seen as a machine learning toolkit for Kubernetes. Since its initial release, Kubeflow has developed a sophisticated set of capabilities for automating the lifecycle of machine learning models on Kubernetes. Although its feature set is extensive, a few components are worth highlighting:
- Pipelines: The cornerstone of the Kubeflow platform, the pipelines engine enables the deployment of end-to-end machine learning workflows in a portable format.
- Feast: Kubeflow incorporates the Feast feature store to enable the discovery and management of features in machine learning models.
- Fairing: This component enables the training and deployment of machine learning models in the Kubeflow platform.
- Training Frameworks: Kubeflow integrates with different frameworks and libraries to streamline the training of machine learning models across frameworks such as TensorFlow, PyTorch, MXNet and others.
- Dashboard: Kubeflow includes a visual dashboard to manage the end-to-end lifecycle of machine learning models.
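At its core, a pipelines engine runs a directed acyclic graph of steps, each step starting only after the steps it depends on have finished. The stdlib-only sketch below illustrates that idea using Python's `graphlib`; it is not the Kubeflow Pipelines SDK, in which each step would run as a container on Kubernetes. All step names, the toy data and the "model" (a simple mean) are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable

Step = Callable[[dict], None]


def ingest(ctx: dict) -> None:
    ctx["rows"] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]


def validate(ctx: dict) -> None:
    # Fail the whole pipeline early if the data is obviously wrong.
    if any(r < 0 for r in ctx["rows"]):
        raise ValueError("negative values in input data")


def split(ctx: dict) -> None:
    rows = ctx["rows"]
    ctx["train_rows"], ctx["eval_rows"] = rows[:4], rows[4:]


def train(ctx: dict) -> None:
    # Toy "model": the mean of the training rows.
    ctx["model"] = sum(ctx["train_rows"]) / len(ctx["train_rows"])


def deploy(ctx: dict) -> None:
    ctx["endpoint"] = f"serving model={ctx['model']}"


# Each step name maps to the set of steps that must finish before it runs.
GRAPH: dict[str, set[str]] = {
    "ingest": set(),
    "validate": {"ingest"},
    "split": {"ingest"},
    "train": {"validate", "split"},
    "deploy": {"train"},
}
STEPS: dict[str, Step] = {
    "ingest": ingest, "validate": validate, "split": split,
    "train": train, "deploy": deploy,
}


def run_pipeline(graph: dict[str, set[str]], steps: dict[str, Step]) -> tuple[list[str], dict]:
    """Run every step once, in an order that respects the dependency graph."""
    order = list(TopologicalSorter(graph).static_order())
    ctx: dict = {}
    for name in order:
        steps[name](ctx)
    return order, ctx


order, ctx = run_pipeline(GRAPH, STEPS)
print(order)
print(ctx["endpoint"])  # serving model=2.5
```

Because `validate` and `split` only depend on `ingest`, a real engine is free to run them in parallel; describing the workflow as a graph rather than a script is what makes that scheduling, and the portability mentioned above, possible.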
Kubeflow has been integrated with cloud platforms such as AWS and Azure, as well as with hybrid cloud infrastructures.
How can I use it: Kubeflow is open source and available on GitHub.
TensorFlow Extended
TensorFlow Extended (TFX) is the implementation of the TFX paper discussed in the previous section. TFX provides the key building blocks to define, serve and monitor TensorFlow models. The current implementation of TFX includes the following components:
- Pipelines: TFX pipelines enable the orchestration of machine learning workflows on different platforms, such as Apache Airflow and Kubeflow Pipelines.
- Standard Components: TFX standard components include many abstract building blocks such as training or data transformations that are relevant in machine learning workflows.
- Libraries: TFX libraries group different components to enable specific functionality. Current libraries include TensorFlow Data Validation and TensorFlow Transform.
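The central idea behind TensorFlow Data Validation is to infer a schema from training data and then check new data, such as serving traffic, against it. The sketch below illustrates that idea with plain Python dictionaries. It does not use the TFDV API (TFDV itself computes dataset statistics and validates those against the schema rather than checking rows one by one), and the function names and toy records are hypothetical.

```python
from typing import Any

Schema = dict[str, type]


def infer_schema(rows: list[dict[str, Any]]) -> Schema:
    """Record each feature's type from training data (first type seen wins)."""
    schema: Schema = {}
    for row in rows:
        for name, value in row.items():
            schema.setdefault(name, type(value))
    return schema


def validate(rows: list[dict[str, Any]], schema: Schema) -> list[str]:
    """Describe every way the given rows deviate from the schema."""
    anomalies: list[str] = []
    for i, row in enumerate(rows):
        for name, expected in schema.items():
            if name not in row:
                anomalies.append(f"row {i}: missing feature '{name}'")
            # Exact type match, so a bool is not silently accepted as an int.
            elif type(row[name]) is not expected:
                anomalies.append(
                    f"row {i}: '{name}' expected {expected.__name__}, "
                    f"got {type(row[name]).__name__}"
                )
        # Sorted so the report order does not depend on set iteration order.
        for name in sorted(row.keys() - schema.keys()):
            anomalies.append(f"row {i}: unexpected feature '{name}'")
    return anomalies


training = [{"age": 34, "country": "DE"}, {"age": 51, "country": "US"}]
serving = [{"age": "unknown"}, {"age": 22, "country": "US", "plan": "pro"}]

schema = infer_schema(training)
for anomaly in validate(serving, schema):
    print(anomaly)
```

Catching a type change or a missing feature at this stage is far cheaper than discovering it through degraded predictions after deployment.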
TFX is a native component of the TensorFlow ecosystem, which has facilitated its broad adoption within the developer and data science communities.
How can I use it: TFX is open source and available on GitHub.
MLflow
Built by Databricks, MLflow is an open-source platform that provides key building blocks for automating the lifecycle of machine learning models. The MLflow project is organized into four main components that abstract different stages of that lifecycle:
- MLflow Tracking: This component is, essentially, an API for logging different elements of a machine learning model, such as parameters, metrics and versioned artifacts.
- MLflow Projects: This component provides a standard format for structuring machine learning projects in a way that ensures reproducible runs in environments such as Docker or Conda.
- MLflow Models: This component is a packaging format for deploying machine learning models across different runtimes such as Apache Spark, Azure ML and AWS SageMaker.
- MLflow Model Registry: This component provides a centralized model store, along with a set of APIs and a UI, for managing the lifecycle of machine learning models.
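MLflow Tracking organizes logging around runs: inside a `with mlflow.start_run():` block, calls such as `mlflow.log_param()` and `mlflow.log_metric()` record values against the active run, which MLflow persists to a local directory or a tracking server. To show what such an API keeps track of, here is a minimal in-memory sketch. The `Tracker` and `Run` classes are hypothetical stand-ins, not MLflow code, and the accuracy values are synthetic.

```python
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Run:
    run_id: str
    params: dict[str, str] = field(default_factory=dict)
    # metric name -> list of (step, value) pairs in logging order
    metrics: dict[str, list[tuple[int, float]]] = field(default_factory=dict)
    status: str = "RUNNING"

    def log_param(self, key: str, value: object) -> None:
        self.params[key] = str(value)  # parameters are kept as strings

    def log_metric(self, key: str, value: float, step: int = 0) -> None:
        self.metrics.setdefault(key, []).append((step, float(value)))


class Tracker:
    """Hypothetical in-memory tracking store (not MLflow's implementation)."""

    def __init__(self) -> None:
        self.runs: list[Run] = []

    @contextmanager
    def start_run(self) -> Iterator[Run]:
        run = Run(run_id=uuid.uuid4().hex)
        self.runs.append(run)
        try:
            yield run
        except Exception:
            run.status = "FAILED"  # keep failed runs around for inspection
            raise
        run.status = "FINISHED"

    def best_run(self, metric: str) -> Run:
        # Highest final value of `metric` among runs that logged it.
        candidates = [r for r in self.runs if metric in r.metrics]
        return max(candidates, key=lambda r: r.metrics[metric][-1][1])


tracker = Tracker()
for lr in (0.1, 0.01):
    with tracker.start_run() as run:
        run.log_param("learning_rate", lr)
        for epoch in range(3):
            # Stand-in for a training loop: a synthetic accuracy curve.
            run.log_metric("accuracy", 0.8 + epoch * (0.02 if lr == 0.1 else 0.05), step=epoch)

print(tracker.best_run("accuracy").params)  # {'learning_rate': '0.01'}
```

Recording parameters and metrics per run is what makes experiments comparable after the fact; the registry then builds on this by attaching a chosen run's model to a named, versioned entry.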
MLflow provides integration with different deep learning frameworks and platforms, which has facilitated its adoption within the machine learning community.
How can I use it: MLflow is open source and available on GitHub.