“Data Versioning: Towards Reproducibility in Machine Learning,” a Presentation from Tryolabs

Nicolás Eiris, Machine Learning Engineer at Tryolabs, presents the “Data Versioning: Towards Reproducibility in Machine Learning” tutorial at the May 2022 Embedded Vision Summit.

Even in 2022, reproducibility remains a major pain point in most data science workflows. Version control is a critical element of reproducibility. Unfortunately, machine learning notoriously lacks standards for version control, so developers typically resort to crafting ad hoc workflows, and they frequently reinvent the wheel because they are unaware of existing solutions.

In this talk, Eiris introduces DVC, short for “Data Version Control,” an open-source tool that Tryolabs has found can significantly alleviate the pain of reproducibility in data science workflows. He covers the motivation for such a tool, digs into its main features, and aims to convince you that integrating it into your next project will make your life much easier. Everything is illustrated through a real-world example of an end-to-end ML pipeline.
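To make the idea of data versioning concrete, here is a minimal Python sketch of the general principle: record a content hash of a data file in a small pointer file that can be committed to Git, then compare against it later to detect changes. This is not DVC's implementation or file format; the function names, the `.pointer.json` suffix, and the choice of SHA-256 are assumptions made purely for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path: Path) -> Path:
    """Write a small JSON pointer file next to the data file.

    The pointer format is hypothetical: it stores only the file name,
    its content hash, and its size, so it is cheap to keep in Git.
    """
    pointer = {
        "path": data_path.name,
        "sha256": file_digest(data_path),
        "size": data_path.stat().st_size,
    }
    pointer_path = data_path.with_name(data_path.name + ".pointer.json")
    pointer_path.write_text(json.dumps(pointer, indent=2))
    return pointer_path


def is_unchanged(data_path: Path, pointer_path: Path) -> bool:
    """Return True if the data file still matches the recorded hash."""
    pointer = json.loads(pointer_path.read_text())
    return pointer["sha256"] == file_digest(data_path)
```

In this sketch, the large data file would stay out of Git while the pointer file is committed, so each commit pins an exact version of the data. A dedicated tool such as DVC adds the parts this sketch leaves out, including remote storage and pipeline tracking.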

See here for a PDF of the slides.
