New Release! Git-backed Machine Learning Model Registry for all your model management needs.
We're excited to announce the launch of our latest open source offering, MLEM! MLEM is a tool that automatically extracts metadata, such as the environment and framework a model was built with, and standardizes that information into a human-readable format stored in Git. ML teams can then use this information to deploy models into downstream production apps and services. MLEM connects easily to platforms like Heroku to dramatically decrease model deployment time.
With MLEM, ML teams get a single tool to run their models anywhere, one that strives to cover every model productionization scenario they have.
MLEM enables this via model metadata codification: saving all the information required to use a model later. Besides packaging a model for deployment, this metadata can be used for many things, including search and documentation. To make it even more convenient, MLEM stores it in human-readable YAML files.
Finally, keeping that metadata in Git lets you create a Git-native model registry, so you can handle model lifecycle management in Git and get all the benefits of CI/CD, bringing your ML team one step closer to GitOps.
We built MLEM to address the issues MLOps teams face in managing model information as models move from training and development to production and, ultimately, retirement. The Git-based approach (one of our core philosophies) aligns model operations and deployment with software development teams: information and automation are all based on familiar DevOps tools, so deploying any model into production is that much faster.
Capturing model-specific information requires an understanding of the programming language and ML framework a model is created with. That's why MLEM is a Python-specific tool. To provide a developer-first experience, MLEM exposes a carefully designed CLI to manage the DevOps parts of the workflow, and a Python API to handle model productionization programmatically.
It's easy to start using MLEM, since it integrates nicely into your existing training workflows by adding a couple of lines:
import mlem

mlem.api.save(
    my_model,
    "mlem-model",
    sample_data=train,
)
That produces two files: the model binary and the model metadata, which lives in a .mlem file:
$ ls models
mlem-model mlem-model.mlem
MLEM automatically detects everything you need to run the model: the ML framework, model dependencies (i.e. Python requirements), methods, and input/output data schema (note that we didn't specify any of those in the save call above!).
This enables easy codification of arbitrarily complex models, such as a Python function that averages predictions from models built with a couple of frameworks, or a custom Python class that uses different libraries to generate features and make a prediction. MLEM saves this information in a simple, human-readable YAML file:
# mlem-model.mlem
artifacts:
  data:
    hash: b7f7e869f2b9270c516b546f09f49cf7
    size: 166864
    uri: mlem-model
description: Random Forest Classifier
labels:
  - random-forest
  - classifier
model_type:
  methods:
    predict_proba:
      args:
        - name: data
          type_:
            columns:
              - sepal length (cm)
              - sepal width (cm)
              - petal length (cm)
              - petal width (cm)
            dtypes:
              - float64
              - float64
              - float64
              - float64
            index_cols: []
            type: dataframe
      name: predict_proba
      returns:
        dtype: float64
        shape:
          - null
          - 3
        type: ndarray
  type: sklearn
object_type: model
requirements:
  - module: sklearn
    version: 1.0.2
  - module: pandas
    version: 1.4.1
  - module: numpy
    version: 1.22.3
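As a sketch of the "arbitrarily complex model" case mentioned above, here is what saving an ensembling function might look like. Note that `sklearn_model`, `xgb_model`, and `train` are placeholders for your own trained models and data, and MLEM's exact handling of plain functions is an assumption to verify against the docs:

import mlem
import numpy as np

# A hypothetical ensemble: average class probabilities from two trained
# models (sklearn_model and xgb_model stand in for your own models).
def ensemble_predict(data):
    return np.mean(
        [sklearn_model.predict_proba(data), xgb_model.predict_proba(data)],
        axis=0,
    )

# MLEM introspects the function and its dependencies the same way it
# introspects a framework model; sample_data lets it infer the schema.
mlem.api.save(ensemble_predict, "ensemble-model", sample_data=train)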
To make ML model development Git-native, MLEM can work with DVC to manage versions of a model stored remotely in the cloud. Committing both the model metadata (mlem-model.mlem) and a pointer to the model binary (mlem-model.dvc, or dvc.lock if you train it in a DVC pipeline) to Git allows you to enable GitFlow and other software engineering best practices like GitOps.
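A minimal sketch of that workflow, assuming a DVC-initialized repo with a default remote configured and the models/ layout from the ls example above:

$ dvc add models/mlem-model    # track the binary with DVC
$ git add models/mlem-model.dvc models/mlem-model.mlem models/.gitignore
$ git commit -m "Add first model version"
$ dvc push                     # upload the binary to remote storage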
The main goal of MLEM is to provide you with a single tool that covers any model productionization scenario. For MLEM, these scenarios fall into three main groups:
The first group lets you load your model into a Python runtime, apply it to a dataset directly from the command line, or serve the model with MLEM from your CLI.
$ python
>>> import mlem
>>> model = mlem.api.load("mlem-model")
>>> model.predict(test)
[[0.4, 0.3, 0.3], [0.2, 0.5, 0.3]]

$ mlem apply mlem-model test.csv
[[0.4, 0.3, 0.3], [0.2, 0.5, 0.3]]

$ mlem serve mlem-model fastapi
Loading model from mlem-model.mlem
Starting fastapi server...
Adding route for /predict
Adding route for /predict_proba
Checkout openapi docs at http://0.0.0.0:8080/docs
INFO: Started server process [5750]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
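Once the server is up, the generated endpoints accept plain JSON requests. As a sketch (the exact payload shape is an assumption here; check the generated OpenAPI docs at /docs for the schema MLEM derived from your sample data):

$ curl -X POST http://0.0.0.0:8080/predict_proba \
    -H "Content-Type: application/json" \
    -d '{"data": {"values": [{"sepal length (cm)": 5.1, "sepal width (cm)": 3.5, "petal length (cm)": 1.4, "petal width (cm)": 0.2}]}}'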
The second group lets you export your model as a Python package, build a Docker image, or export it to a special format (like .onnx, which is coming soon).
$ mlem build mlem-model pip -c package_name=ml-package -c target=build/
Loading model from mlem-model.mlem
Written `ml-package` package data to `build/`
$ tree build/
build
├── MANIFEST.in
├── ml-package
│   ├── __init__.py
│   ├── model
│   └── model.mlem
├── requirements.txt
└── setup.py
The last group lets you deploy models to deployment providers such as Heroku (with AWS SageMaker and Kubernetes coming soon).
$ mlem deployment run my-service -m mlem-model -t staging -c app_name=mlem-quick-start
Loading deployment from my-service.mlem
Loading link to staging.mlem
Loading link to mlem-model.mlem
Updating deployment at my-service.mlem
Creating docker image for heroku
Building MLEM wheel file...
Adding model files...
Generating dockerfile...
Adding sources...
Generating requirements file...
Building docker image registry.heroku.com/mlem-quick-start/web...
Built docker image registry.heroku.com/mlem-quick-start/web
Pushing image registry.heroku.com/mlem-quick-start/web to registry.heroku.com
Pushed image registry.heroku.com/mlem-quick-start/web to registry.heroku.com
Updating deployment at my-service.mlem
Releasing app mlem-quick-start formation
Updating deployment at my-service.mlem
Service mlem-quick-start is up. You can check it out at https://mlem-quick-start.herokuapp.com/
Since MLEM is both a CLI-first and an API-first tool, you can productionize your models just as easily with the Python API:
$ python
>>> from mlem.api import serve, build, deploy
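For instance (a sketch: these calls mirror the CLI commands above, but the exact keyword arguments are assumptions to check against the API reference):

>>> build("mlem-model", "pip", package_name="ml-package", target="build/")  # like `mlem build` above
>>> serve("mlem-model", "fastapi")  # like `mlem serve`; blocks until stopped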
Together with other Iterative tools like GTO and DVC, MLEM is a core building block for a Git-based ML model registry.
ML model registries give your team key capabilities: discovering and sharing models in one place, tracking their versions, and managing their lifecycle from development to production.
Many of these benefits are built into DVC: your modeling process and performance data become codified in Git-based DVC repositories, making it possible to reproduce and manage models (along with code) using standard Git workflows. Large model files are stored separately and efficiently, and can be pushed to remote storage, a scalable access point for sharing.
To make a Git-native registry, one option is to use GTO (Git Tag Ops). It tags ML model releases and promotions, and links them to artifacts in the repo using versioned annotations. This creates abstractions for your models, which lets you manage their lifecycle freely and directly from Git.
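For example, registering a version and promoting it are each a single command (`gto register` and `gto assign` are real GTO commands, but check `gto --help` for the exact arguments in your version):

$ gto register dog-bark-translator      # tags a new version, e.g. v0.0.1
$ gto assign dog-bark-translator prod   # promotes that version to prod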
$ gto show
╒══════════════════════╤══════════╤══════════╤══════════╕
│ name                 │ latest   │ #stage   │ #prod    │
╞══════════════════════╪══════════╪══════════╪══════════╡
│ pet-face-recognition │ v3.1.0   │ -        │ v3.0.0   │
│ mlem-blep-classifier │ v0.4.1   │ v0.4.1   │ -        │
│ dog-bark-translator  │ v0.0.1   │ -        │ v0.0.1   │
╘══════════════════════╧══════════╧══════════╧══════════╛
$ mlem apply dog-bark-translator ./short-dog-phrase.wav
🐶🐕🐕
For more information, visit our model registry page.
⭐ Star MLEM on GitHub and let us know what you think!
Machine Learning should be mlemming!
Resources:
Have something great to say about our tools? We'd love to hear it! Head to this page to record or write a testimonial! Join our Wall of Love ❤️
Do you have any use case questions or need support? Join us in Discord!
Head to the DVC Forum to discuss your ideas and best practices.