Research Software Engineering

A Guide to the Open Source Ecosystem

Author

Matthias Bannert

Published

July 12, 2022

Preface

Note: This book will be published by Chapman & Hall/CRC. The online version of this book is free to read here (thanks to Chapman & Hall/CRC), and licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. If you have any feedback, please feel free to file an issue on GitHub. Thank you!

“One of the things that Baidu did well early on was to create an internal platform for deep learning. What that did was enable engineers all across the company, including people who were not AI researchers, to leverage deep learning in all sorts of creative ways - applications that an AI researcher like me never would have thought of” – Andrew Ng

The vast majority of data has been created within the last decade. In turn, many fields of research are confronted with an unprecedented wealth of data. The sheer amount of information but also the complexity of modern datasets continues to point a kind researcher to programming approaches who had not considered programming to process data so far. Research Software Engineering aims at two things: First, to give a big picture overview and starting point to reach what the open source software community calls a ‘software carpentry’ level. Second, the book gives an understanding of the opportunities of automation and reproducibility, as well as the effort to maintain the required environment. This book argues a solid programming skill level and self-operation is totally in reach for most researchers. And most importantly, investing is worth the effort: being able to code leverages field specific expertise and fosters interdisciplinary collaboration as source code continues to become an important communication channel.

Meet Dr. Egghead who started his quest to figure out how his assistant got a week’s work done in two hours. “Hacker’s wear hoodies, you know,” he mumbles as he pulls up his coat’s hood and starts to think…

Acknowledgements

I am thankful for the inspiration, help and perspectives of everyone who contributed to this book at its different stages. There are several people and organizations, I would like to thank, in particular.

First, I would like to thank David Grubbs from CRC for getting me started. From our first meeting at useR! in Toulouse, France, David helped streamline writing a book, and he kept adding value throughout the process with his remarks and contacts. I would like to thank Achim Zeileis who also played an important role at this early stage of my book project. Achim did not only inspire me to become a co-host for useR! which led to many experiences that became important to this book. Our discussions about teaching amplified my motivation to have good material to accompany my own course – which clearly turned out to be one of the most important drivers eventually.

In that regard, I would like to thank all participants of my Hacking for … courses. Your insights, questions, feedback and semester projects have been invaluable to this project. Your field specific expertise is inspirational not only to me but also to readers of the book as it shows the broad relevance of the approach. I would like to thank ETH Zurich, in particularly the Department of Management, Technology and Economics (D-MTEC) and the KOF Swiss Economic Institute for hosting my ideas over the last 13 years. Thank you to educational developers Karin Brown and Erik Jentges - having teaching professionals with open ears and minds around helped to channel motivation and ideas into a course concept that continues to be popular among participants across departments and disciplines. Torboern Netlandt deserves credit for enabling this widespread interest. His early advice turned what was initially thought of as Hacking for Economists into Hacking for Social Sciences which eventually became Hacking for Science.

Though this book is not exactly a book about the R programming language, I would like to thank the R community in particular because of its instrumental role in growing my own engagement and horizon of the open source ecosystem. Thank you, statisticians Siegfried Heiler and Toni Stocker who pointed me to the R language almost 20 years ago. One of my most important learnings about how to leverage open source work may have come only those 20 later when the community held a virtual useR! conference during a time that was unprecedented, at least in my lifetime. Through its sharing nature and established workflow, open source work is capable of reaching and impacting so many more than just those on one’s main diagonal. Working with Rocío Joo, Dorothea Hug Peter, Heather Turner and Yanina Bellini Saibene has shown me the inclusion extra mile is not only worth the effort, but an inevitable mindset to bring our work as developers and scientists to the next level. Thank you, ladies! Thanks also to the local R user group here in Zurich, Switzerland, who continues to show the practical relevance of open source through their events. I want to mention the great people at my new employer, cynkra, too. Your influence on the community, myself and therefore this book cannot be forgotten. Thank you for your support and open ears!

Last but not least, I would like to thank Emily Riederer for her time and patience discussing my thoughts and reviewing my drafts. Particulary, her ideas to streamline and balance my ideas at a stage where they were rather messy added great value. Admittedly, your constructive feedback has caused some extra work, but was instrumental to making this book useful and accessible to a wider audience – thank you.