Speaker: Ryan Daniels, University of Cambridge
Writing research software in Python presents numerous challenges to reproducibility: what version of Python is being used? What about the versions of PyTorch, scikit-learn or NumPy? Should we use Conda, venv, or Poetry to manage dependencies and environments? How can we control randomness? Do I have the right version of the CUDA Toolkit? In principle, given the same data, algorithms, and methodology, we should be able to reproduce the results of any given experiment to within an acceptable degree of error, yet each of these questions is a potential obstacle to reproducing machine learning experiments. In this talk, I would like to convince you that Docker can help address almost all of them. Furthermore, combining Docker, git, and GitHub can form a powerful workflow, helping to minimise your tech stack and declutter your Python development experience.
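As a minimal sketch of the idea (not taken from the talk itself): a Dockerfile can pin the interpreter and library versions so that the same environment can be rebuilt identically on any machine. The file names requirements.txt and experiment.py below are illustrative placeholders.

    # Pin the exact Python version via the official base image.
    FROM python:3.11-slim

    WORKDIR /app

    # Install dependencies with exact versions listed in requirements.txt,
    # e.g. "torch==2.3.1" and "scikit-learn==1.5.0" (versions illustrative).
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    # Copy the project source into the image.
    COPY . .

    # Run the experiment; random seeds are fixed inside the script itself.
    CMD ["python", "experiment.py"]

Building the image with "docker build -t my-experiment ." and running it with "docker run my-experiment" then executes the experiment in the same pinned environment every time, independent of what is installed on the host.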
Presented at Best Practices in AI Afternoon event 2024-07-05