Instant Experiment Tracking: Just Add DVC!

Start with a few simple ingredients (DVC, Git, and Python), add a few lines to your code with the included DVCLive logging library, and you have an experiment tracker right inside your development environment. Voilà!

Dave Berenbaum
December 15, 2022 • 4 min read

Header image generated by Dall-E 2

Did you know that DVC can track experiments? Now you can do it by changing just a few lines of your Python code.

And with the optional DVC extension for VS Code, you have a full-fledged experiment tracking interface in your IDE!

Why?

We want to bring the DVC ethos to experiment tracking, but the learning curve for DVC can be steep. That's why we built DVCLive, our Python logging library, to make it easy to start.

source: https://twitter.com/untitled01ipynb/status/1593911944989270016

All you need to start is a Git repo. There are no logins, servers, databases, or UIs to spin up. Every experiment run is saved as a Git commit, but those commits are hidden, so they don't clutter your repo the way a separate directory or a Git branch per run would.

From that simple starting point, DVC experiment tracking grows with your project. You don't have to decide today whether you will need to share with your team or back up to cloud storage. That's because DVC builds on top of the tools you already use and lets you integrate them incrementally.

When you need to share, push existing experiments to your Git provider (GitHub/GitLab). When you need artifact storage, add your own cloud provider and push your existing artifacts. When you need a UI, use VS Code or add ReciprocateX Studio for a collaborative interface.

How to start

Check out the example repo, try it out in a Colab notebook, or follow the steps below to start with your own model training code.

1. Install DVC>=2.38.0 as a library in your Python environment.
```
$ pip install --upgrade dvc
```
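
If you want to confirm the version requirement above from Python, a minimal sketch could look like the following. The helper names `parse_release` and `dvc_is_recent_enough` are hypothetical and not part of DVC, and the version parsing is deliberately simple (it ignores pre-release suffixes).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum version named in this post.
MIN_DVC = (2, 38, 0)


def parse_release(version_string):
    """Return the leading numeric components of a version as a 3-tuple.

    Simplification: suffixes such as "rc1" or ".dev1" are ignored.
    """
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    # Pad short versions such as "3.1" so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def dvc_is_recent_enough(installed=None):
    """Check a version string (or the installed dvc package) against MIN_DVC."""
    if installed is None:
        try:
            installed = version("dvc")
        except PackageNotFoundError:
            return False  # dvc is not installed in this environment
    return parse_release(installed) >= MIN_DVC


print("DVC is recent enough:", dvc_is_recent_enough())
```

Calling `dvc_is_recent_enough()` with no argument inspects the current environment; passing a string such as `"2.37.0"` checks that version instead.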

2. Set up a DVC repo where your model training code lives (or use an existing repo).

```
$ git init
$ dvc init
$ git add -A
$ git commit -m "setup dvc repo"
```
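
As a quick sanity check after the setup commands, the sketch below looks for the `.git` and `.dvc` entries that `git init` and `dvc init` create in the repo root. The helper `missing_repo_markers` is a hypothetical name used only for illustration, and the check assumes you run it from the top of the repo.

```python
from pathlib import Path


def missing_repo_markers(repo_root="."):
    """Return which of the expected repo markers are absent under repo_root.

    Uses exists() rather than is_dir() because `.git` can be a file
    in some layouts (for example, Git worktrees).
    """
    root = Path(repo_root)
    return [name for name in (".git", ".dvc") if not (root / name).exists()]


missing = missing_repo_markers(".")
if missing:
    print("Not set up yet, missing:", ", ".join(missing))
else:
    print("Git and DVC are both initialized here.")
```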

3. In your code, enable DVC experiment tracking using DVCLive with save_dvc_exp=True. Use the callback for your framework or log your own metrics. You can find examples below (other frameworks available):

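PyTorch Lightning

```
from dvclive.lightning import DVCLiveLogger
from pytorch_lightning import Trainer

...

# `model` is your LightningModule, defined in the elided section above.
# The logger saves each run as a DVC experiment.
trainer = Trainer(logger=DVCLiveLogger(save_dvc_exp=True))
trainer.fit(model)
```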

Hugging Face

```
from dvclive.huggingface import DVCLiveCallback

...

# `trainer` is the Trainer you built in the elided section above.
trainer.add_callback(DVCLiveCallback(save_dvc_exp=True))
trainer.train()
```

Keras

```
from dvclive.keras import DVCLiveCallback

...

# `model` and both datasets come from the elided section above.
model.fit(
    train_dataset,
    validation_data=validation_dataset,
    callbacks=[DVCLiveCallback(save_dvc_exp=True)],
)
```

General Python API

```
from dvclive import Live

# NUM_EPOCHS, train_model, and evaluate_model come from your own code.
with Live(save_dvc_exp=True) as live:
    live.log_param("epochs", NUM_EPOCHS)

    for epoch in range(NUM_EPOCHS):
        train_model(...)
        metrics = evaluate_model(...)
        for metric_name, value in metrics.items():
            live.log_metric(metric_name, value)
        live.next_step()  # close out this epoch's step
```
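
To try the shape of that loop before wiring in a real model, here is a minimal sketch that runs without DVCLive installed. Everything in it is illustrative: `train_model` and `evaluate_model` are toy stand-ins, and `RecordingLive` is a hypothetical class that only mimics the three methods used above (`log_param`, `log_metric`, `next_step`); it is not part of DVCLive.

```python
def train_model(epoch):
    """Toy stand-in for a real training step (does nothing)."""


def evaluate_model(epoch):
    """Toy stand-in: returns a made-up loss that shrinks each epoch."""
    return {"train_loss": 1.0 / (epoch + 1)}


class RecordingLive:
    """Hypothetical dry-run logger with the same three methods used above."""

    def __init__(self):
        self.params = {}
        self.history = []  # one dict of metrics per completed step
        self._current = {}

    def log_param(self, name, value):
        self.params[name] = value

    def log_metric(self, name, value):
        self._current[name] = value

    def next_step(self):
        self.history.append(self._current)
        self._current = {}


def run_training(live, num_epochs):
    """The same loop as above, written against any object with those methods."""
    live.log_param("epochs", num_epochs)
    for epoch in range(num_epochs):
        train_model(epoch)
        for metric_name, value in evaluate_model(epoch).items():
            live.log_metric(metric_name, value)
        live.next_step()


recorder = RecordingLive()
run_training(recorder, num_epochs=3)
print(recorder.params)
print(recorder.history)
```

Once the loop behaves as expected, the same function can be called with a real logger, as in `with Live(save_dvc_exp=True) as live: run_training(live, NUM_EPOCHS)`.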

4. Run your code and track the experiment results.

Command line

```
# Show the experiments table in the terminal.
$ dvc exp show
 ────────────────────────────────────────────────────────────────────────────────────
  Experiment                 Created        train_loss   epoch   step   encoder_size
 ────────────────────────────────────────────────────────────────────────────────────
  workspace                  -                0.020196       4    500   512
  main                       Dec 06, 2022            -       -      -   -
  ├── c1759a5 [quare-foil]   08:55 PM         0.020196       4    500   512
  ├── affedee [bitty-tass]   08:55 PM          0.02038       4    500   256
  ├── a5bdc18 [murky-emeu]   08:55 PM         0.016396       4    500   128
  ├── 744f3b6 [sworn-wage]   08:54 PM          0.01972       4    500   64
  └── 0c3ac81 [named-gaby]   08:54 PM         0.031206       4    500   32
 ────────────────────────────────────────────────────────────────────────────────────
```
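
To make the comparison in that table concrete, the sketch below ranks the five experiments by train_loss. The rows are copied by hand from the table; in practice you might export them instead (for example with `dvc exp show --csv`), and the exported column layout is not assumed here. The helper `rank_by` is a hypothetical name, not a DVC function.

```python
# Rows copied from the `dvc exp show` table in this post.
experiments = [
    {"name": "quare-foil", "train_loss": 0.020196, "encoder_size": 512},
    {"name": "bitty-tass", "train_loss": 0.02038, "encoder_size": 256},
    {"name": "murky-emeu", "train_loss": 0.016396, "encoder_size": 128},
    {"name": "sworn-wage", "train_loss": 0.01972, "encoder_size": 64},
    {"name": "named-gaby", "train_loss": 0.031206, "encoder_size": 32},
]


def rank_by(rows, metric, lower_is_better=True):
    """Return rows sorted from best to worst on the given metric."""
    return sorted(rows, key=lambda row: row[metric], reverse=not lower_is_better)


ranked = rank_by(experiments, "train_loss")
best = ranked[0]
print(
    f"best: {best['name']} "
    f"(encoder_size={best['encoder_size']}, train_loss={best['train_loss']})"
)
```

For this table the lowest loss belongs to murky-emeu, the run with encoder_size 128, so the largest encoder is not the best one here.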

```
# Plot the diff of all experiments in an HTML file.
$ dvc plots diff $(dvc exp list --name-only)
file:///Users/dave/Code/dvclive-exp-tracking/dvc_plots/index.html
```

Open the HTML file in a browser to see the plots.

Stay tuned

That's all there is to it! There's lots more coming for DVC experiment tracking, including:

- Showing you where to go from here. Share your experiments, add data or pipelines, and use DVC without ever leaving your notebook or Python IDE.
- Adding more DVCLive features. Share real-time updates to ReciprocateX Studio, log data and model artifacts, and compare experiments in Python.

Try out the repo or Colab notebook and let us know what you think on Discord or GitHub.
