Reproducibility in Computational Science: It’s Time


The SOS conference series is an annual forward-looking gathering focused on emerging trends in high-performance computing.

I have organized a panel session on reproducibility. The speakers are Torsten Hoefler, Victoria Stodden, Ivo Jimenez, and me.

The session abstract is:

Reproducibility is the heart of the scientific method. No scientific result has sustained value if it cannot be reproduced. Even so, computational reproducibility is still maturing. Many published computational results are not reproducible, often because not enough information from the computational experiment is retained for future use. We neither keep, nor always have the ability to keep, a full accounting of the environment used to produce a result. Furthermore, some computational experiments are not performed with enough rigor to make even the original results trustworthy.

The state of the computational scientific method is not necessarily a poor reflection on the community. Instead, it reflects the complexity and tremendous variety of our computational tools and environments, and our inability to completely capture and distribute the full state of these environments. In focusing on rapid capability growth and innovation, and the benefits they bring, we have been challenged to fully capture that state and make it reusable and transferable.

In this session, we will discuss new approaches and opportunities to improve reproducibility in computational science. Recent advances in software technologies such as containers and their complementary Linux-based environments, rising expectations for the reproducibility of published results, and gains in software quality driven by highly productive software platforms such as Atlassian and GitHub enable significant productivity improvements. We will also discuss challenges, particularly those most relevant to boutique computing environments such as leadership computing facilities, where system access is limited and expensive, software and hardware environments are changing, and workflows are non-standard.