New Release! Git-backed Machine Learning Model Registry for all your model management needs.
This month you will find:
❓ Will NLP have more impact than Computer Vision,
🐙 Dmitry Petrov speaks at GitHub Universe,
🧐 CML in research at NeurIPS,
❣️ Unstructured Data Catalog coming,
✅ SOC 2 Type 1 Compliance,
🚀 MLEM adds SageMaker and Kubernetes deployment,
👀 Lots of new docs,
🚀 Upcoming events, and more!
Image generated with the help of Stable Diffusion
Welcome to November! In the US, this is the time of year we reflect and give thanks. It's been a productive year despite the world's rather extreme challenges, and there is a lot to be thankful for. Here are some of those things from the last month in the ReciprocateX Community.
In his article entitled The Biggest Opportunity In Generative AI Is Language, Not Images, Robert Toews argues that AI-powered text generation will create many orders of magnitude more value than AI image generation.
Language is humanity’s single most important invention. More than anything else, it is what sets us apart from every other species on the planet. Language enables us to reason abstractly, to develop complex ideas about what the world is and could be, to communicate these ideas to one another, and to build on them across generations and geographies. Almost nothing about modern civilization would be possible without language.
He points to many examples across industry and academia that have already benefited, and will continue to benefit, from large language models (LLMs) in the coming years. Read the article for all the applications.
The State of AI Report is published each year and covers the most interesting developments its authors, Nathan Benaich, Ian Hogarth, Othmane Sebbouh, and Nitarshan Rajkumar, come across in the world of AI over the year.
Be sure to digest the whole report for even more AI advances!
💓 So for our “Pulse check” this month:
Do you agree that NLP will have more impact than computer vision? Tell us what you are working on with NLP. We'd love to connect you with others facing similar issues, and to learn how we can improve our tools to help with your NLP projects.
Join us in the #general channel in Discord to weigh in.
We would like to thank Francesco Calcavecchia, vvssttkk, and deepyaman for their contributions to GTO, MLEM, and CML, respectively. Each of them will receive a personalized shirt noting their contribution! And many thanks to Mert Bozkir for leading the Hacktoberfest charge here at ReciprocateX!
2022 Hacktoberfest Contributions
One of our Community Champions, João Santiago of Billie.io, gives an introduction to DVC to set up the rest of the session, in which Carsten Behring, author of Metamorph and the scicloj.ml platform, presents how NLP pipelines can be managed with DVC, Clojure, and Python.
Last month we reported on CML turning up in research here. This work will now be presented at the virtual NeurIPS workshop Challenges in Deploying and Monitoring Machine Learning Systems on December 9th. Find out more and register here.
Research on CML to be presented at NeurIPS (Source link)
Do you use Amazon S3, Azure Blob Storage, or Google Cloud Storage? We have a new solution for finding and managing your datasets of unstructured data like images, audio files, and PDFs! Extend your DVC environment with the first data catalog and query language (SQL->DQL) for unstructured data and machine learning. Learn more on our website and/or schedule a meeting with us!
In case you missed it, MLEM shipped a release on Halloween! MLEM now supports SageMaker and Kubernetes in addition to Heroku and Docker. Learn how easy it now is to package your models for deployment with only a few lines of code, without ever getting lost in the Kubernetes docs again! Find the blog post here and be sure to visit the docs!
We are very excited to announce that ReciprocateX is now SOC 2 Type 1 compliant. This certification signals to our customers our commitment to Security, Availability, Processing Integrity, Confidentiality, and Privacy within our organization. We came through the rigorous process successfully and learned a lot as a team along the way. Guro Bokum reviews the five key lessons in this blog post. You can find the full report on our Security and Privacy page.
On November 8th, our CEO, Dmitry Petrov, spoke at GitHub Universe on ML with Git: experiment tracking in Codespaces. In his presentation, he shows how to use the DVC extension for VS Code and Codespaces to streamline your machine learning experimentation process. If you are registered, you can find his video below on the event platform. We expect the video to be available on YouTube in the next couple of months. We'll keep you updated!
Dmitry Petrov during his talk, ML with Git: experiment tracking in Codespaces
Jupyter Notebooks are great for prototyping, but eventually you will want to move toward reproducible experiments. Converting a notebook to a DVC pipeline requires a bit of a mental shift. Rob de Wit shows you how to get there with an intermediate step: use Papermill to build a one-stage DVC pipeline that executes the entire notebook, then use the resulting pipeline to run and version ML experiments. Look out for a future post with a more advanced pipeline!
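To make the intermediate step a little more concrete, here is a minimal sketch in Python that assembles a `dvc stage add` command for a one-stage pipeline whose command runs the whole notebook with the Papermill CLI (`papermill <input> <output>`). This is not the code from Rob's post: the helper `papermill_stage_command` and the file names (`train.ipynb`, `train_out.ipynb`, `data/raw.csv`, `model.pkl`) are hypothetical, and it assumes DVC and Papermill are installed in a Git repository where `dvc init` has already been run.

```python
import shlex


def papermill_stage_command(
    name: str,
    notebook: str,
    executed: str,
    deps: list[str],
    outs: list[str],
) -> list[str]:
    """Build a `dvc stage add` invocation whose command executes an entire
    notebook with the papermill CLI (hypothetical helper for illustration)."""
    # `papermill <input notebook> <output notebook>` runs every cell in order.
    run_cmd = shlex.join(["papermill", notebook, executed])
    cmd = ["dvc", "stage", "add", "-n", name]
    # The notebook itself and any data it reads are stage dependencies.
    for dep in [notebook, *deps]:
        cmd += ["-d", dep]
    # The executed copy of the notebook and any artifacts it writes are outputs.
    for out in [executed, *outs]:
        cmd += ["-o", out]
    cmd.append(run_cmd)
    return cmd


cmd = papermill_stage_command(
    name="train",
    notebook="train.ipynb",
    executed="train_out.ipynb",
    deps=["data/raw.csv"],
    outs=["model.pkl"],
)
print(shlex.join(cmd))
# dvc stage add -n train -d train.ipynb -d data/raw.csv -o train_out.ipynb -o model.pkl 'papermill train.ipynb train_out.ipynb'
```

Once the stage is written to `dvc.yaml`, `dvc repro` or `dvc exp run` re-executes the notebook whenever one of its dependencies changes. Keeping the whole notebook as a single stage is the "intermediate step": nothing inside the notebook has to be restructured yet.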
At our next meetup on December 14th, Sami Jawhar will present An Open Discussion of Parallel Data Pipelines with DVC and TPI, an advanced use case for distributing experiments in the cloud. Sami is a great discussion driver. If you are interested in higher-level use cases, you will want to join the discussion!
On January 11th, Francesco Calcavecchia will join us to share his recent contribution to MLEM through his work on GTO, and how it helps him create a Git-based model registry at E.ON Energie Deutschland.
We had a great time at ODSC West, with lots of good conversations with conference-goers and some excellent sessions! Dmitry had a packed room for his in-person talk Why You Need a GitOps-based Machine Learning Model Registry, and Alex Kim presented CI/CD for Machine Learning virtually. At each of the conferences we've sponsored this year, we've run a game called Deevee's Ramen Run. (If you don't know the Ramen connection, you need to spend more time reading the monthly Heartbeats 😉.) Below are the top three winners of the game.
Winners 1st to 3rd, shown above: Alexandra Hagmeyer (pictured with me and teammate Daniel Barnes), Ryan Renslow, and a winner who asked that her name be withheld (but she was happy to be in the picture with DeeVee!)
We were also at the MLOps Summit in London only a week later! This time, a different set of team members attended and staffed the booth. Besides attending a variety of great talks, we met many wonderful people from all over the world, which led to some really interesting discussions about how different companies approach MLOps.
At the summit, Casper da Costa-Luis gave a well-received talk on how to run ML experiments in the cloud painlessly with CML. The recording will be made available soon, so look out for it! The talk answered at least one of the Deevee's Ramen Run questions, which produced some surprised (but excited!) winners this time around.
ReciprocateX team members, clockwise from top right: Rob de Wit, Gema Parreño Piqueras, Casper da Costa-Luis, and Chaz Black
Gema Parreño Piqueras presented at TechWeek in Spain with her talk Reproducibility and Version Control are Important: Follow up with the DVC extension for VS Code. She will be giving the same talk at Codemotion. You can watch her talk (in Spanish) starting at 2:02 in the video below!
Stay tuned to our Newsletter for what we will be up to conference-wise in 2023!
The team has been busy improving the docs for you. See all the latest and greatest updates below.
dvc plots show
dvc ls-url: find the description, options, and example code here.
cml comment: find the options here.
And finally, this month's winning Tweet is a thread from Robert Boscacci.
Managing large files (📹🔊📸) for deep learning projects can be a nightmare 😰 . Git isn't built to handle them natively.
— @boscacci@sigmoid.social (@cinemarob1) September 16, 2022
Here's how to use DVC to seamlessly track and version large files. 🚀
🎁 BONUS: Learn to sync those files with remote ☁️ storage such as @awscloud S3! pic.twitter.com/eO1BEwHEbF
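The workflow the thread describes boils down to a short command sequence. Below is a minimal sketch, not taken from the thread, that returns those commands in order for a hypothetical `data/videos` directory and a hypothetical S3 remote (`storage`, `s3://my-bucket/dvc-store`). It assumes you are inside a Git repository where `dvc init` has already been run, and that DVC was installed with S3 support (`pip install "dvc[s3]"`); the helper name `dvc_track_and_push` is made up for illustration.

```python
import shlex
from pathlib import PurePosixPath


def dvc_track_and_push(data_path: str, remote_name: str, remote_url: str) -> list[list[str]]:
    """Return, in order, the commands that put a large file or directory under
    DVC control, record the small pointer files in Git, and upload the data to
    a default remote (hypothetical helper for illustration)."""
    path = PurePosixPath(data_path)
    # `dvc add` writes a small .dvc pointer file next to the data ...
    pointer = str(path.with_name(path.name + ".dvc"))
    # ... and lists the data itself in the neighbouring .gitignore.
    gitignore = str(path.parent / ".gitignore")
    return [
        ["dvc", "add", data_path],
        ["git", "add", pointer, gitignore],
        # -d makes this remote the default target for `dvc push` / `dvc pull`.
        ["dvc", "remote", "add", "-d", remote_name, remote_url],
        ["git", "add", ".dvc/config"],
        ["git", "commit", "-m", f"Track {data_path} with DVC"],
        ["dvc", "push"],
    ]


for cmd in dvc_track_and_push("data/videos", "storage", "s3://my-bucket/dvc-store"):
    print(shlex.join(cmd))
```

Each inner list can be passed to `subprocess.run(cmd, check=True)` to execute it. Git only ever sees the small `.dvc` pointer file and `.dvc/config`, while the large files themselves live in the DVC cache and on S3.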
Have something great to say about our tools? We'd love to hear it! Head to this page to record or write a testimonial and join our Wall of Love ❤️
Do you have any use case questions or need support? Join us in Discord!
Head to the DVC Forum to discuss your ideas and best practices.