Healthy code, healthy patients: coding best practices in medical Data Science (Part 1)

How would you feel knowing that the quality of every single line of your code will directly impact the lives of thousands of people?

Step #0: Structuring repositories

Messy code and folders are the number one killer of efficiency and team productivity. The first thing to do before starting any project is therefore to make sure the repository (meaning the folders where the code will be stored) has a sensible structure.

A custom repository structure based on Cookiecutter for Data Science.
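To make this concrete, here is a minimal sketch of how such a skeleton could be generated with a few lines of Python. The folder names in `LAYOUT` are assumptions, loosely modelled on Cookiecutter for Data Science, and not necessarily the exact structure shown in the figure; `create_skeleton` and `my_project` are hypothetical names used only for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical layout, loosely modelled on Cookiecutter for Data Science.
# Adapt the folder names to whatever your team agrees on.
LAYOUT = [
    "data/raw",        # original, immutable data
    "data/processed",  # cleaned data, ready for modelling
    "notebooks",       # exploratory analyses
    "src",             # reusable source code
    "tests",           # automated tests
    "models",          # trained models
    "reports",         # figures and written output
]


def create_skeleton(root: Path, layout=LAYOUT) -> list[Path]:
    """Create the folders in `layout` under `root` and return them."""
    root.mkdir(parents=True, exist_ok=True)
    created = []
    for relative in layout:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)
        # Git does not track empty folders, so add a placeholder file.
        (folder / ".gitkeep").touch()
        created.append(folder)
    (root / "README.md").touch()
    return created


# Usage: build the skeleton in a throwaway directory and list the folders.
with tempfile.TemporaryDirectory() as tmp:
    for folder in create_skeleton(Path(tmp) / "my_project"):
        print(folder.relative_to(tmp))
```

Agreeing on one layout and generating it the same way for every project means that anyone on the team knows where to find the data, the code and the results without having to ask.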

A good IDE is a good idea

Another simple trick for maximizing efficiency and eradicating frustration is choosing the right integrated development environment (IDE). Basically, an IDE is the program or application you use to write code.

PyCharm’s user interface, with the example of an auto-import suggestion. (From JetBrains’ website)

Version control: Git ’em all

If working in a team can be frustrating for most tasks, coding together can take frustration and friction within the group to a whole new level. To make sure that collaborative coding becomes an asset rather than a burden, version control is the solution. Version control tools track the changes made to the code and the files over time, so that they can be reverted to a specific version at any point in the future and changes do not get overwritten by accident. The same project can have parallel branches, so that different people can work on the same code simultaneously.

An example of a Merge Request in GitLab. New code to make interpretable models was added to the main branch of the repository, after being thoroughly discussed and reviewed.

  1. The four-eyes principle: code can only be merged into the main branch of the repository if it has been reviewed and approved by at least one other person. As an extra perk, having fellow coders review our work also helps us improve our own coding skills.
  2. Versioning: each time the main (master) branch of the codebase changes, it gets a version number. At any point in the future, we will know which version of the code each project was using when it was released to production. This also works on a small scale: through ‘checkpoints’ called commits, fateful mistakes can be reverted relatively painlessly.
  3. Continuous integration and continuous delivery (CI/CD): a set of principles to automate the way our tools are built, tested, updated and deployed to production environments. Whenever we push code to the repository, a series of automated tests makes sure that our code does not fail and does not break existing functionality and products.
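As an illustration of the kind of automated test a CI pipeline could run on every push, here is a minimal sketch. The function `body_mass_index` is a hypothetical example, not code from an actual product, and the `test_` naming convention assumes a test runner such as pytest; the article does not say which runner is used.

```python
import math


def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Hypothetical function under test: BMI = weight / height squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def test_known_value():
    # 70 kg at 1.75 m gives 70 / 3.0625, roughly 22.857.
    assert math.isclose(body_mass_index(70.0, 1.75), 22.857, abs_tol=1e-3)


def test_rejects_invalid_input():
    # Invalid measurements must raise instead of returning a wrong number.
    try:
        body_mass_index(70.0, 0.0)
    except ValueError:
        return
    raise AssertionError("expected a ValueError for a height of zero")


# A test runner would discover and call these; here we call them directly.
test_known_value()
test_rejects_invalid_input()
```

If a later change breaks either behaviour, the pipeline fails before the code can reach the main branch, which is exactly the safety net described in the third point.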

Protect the (virtual) environment

If you are going to collaborate and work on multiple projects at once, you might find that software requirements for each of these can differ widely. Virtual environments are a great way to manage Python packages independently for each project, by creating isolated directory trees containing an installation of a specific version of Python. Different versions of the same package (or even of Python itself) can be installed for different projects, and one only needs to install the strictly necessary packages for each. Environments can, and should, be shared amongst team members, so that one’s results can be reproduced easily. Another nice thing about environments is that they can be contained in versionable files, creating a wonderful synergy with Git. Each version of a project is accompanied by its own environment file, with the exact version of the packages used at that point in time. No more incompatibility and installation problems!
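A minimal sketch of both ideas, using only Python's standard library: `venv.create` builds an isolated environment, and `importlib.metadata` lists the installed packages with their exact versions so they can be written to a file and committed to Git. The helper names `create_environment` and `pinned_requirements`, the `.venv` folder and the `requirements.txt` file name are assumptions made for illustration; your team may well use another tool, such as conda, for the same purpose.

```python
import tempfile
import venv
from importlib import metadata
from pathlib import Path


def create_environment(env_dir: Path) -> Path:
    """Create an isolated environment (without pip, to keep the sketch fast).

    Returns the path of the pyvenv.cfg file that venv writes into env_dir.
    """
    venv.create(env_dir, with_pip=False)
    return env_dir / "pyvenv.cfg"


def pinned_requirements() -> list[str]:
    """List the packages of the running interpreter as 'name==version'."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with incomplete metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


# Usage: create an environment and an environment file in a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    config = create_environment(Path(tmp) / ".venv")
    print("environment created:", config.exists())

    requirements = Path(tmp) / "requirements.txt"
    requirements.write_text("\n".join(pinned_requirements()) + "\n")
    print("packages pinned:", len(pinned_requirements()))
```

Note that `pinned_requirements` describes the interpreter that runs the script, so in practice it would be run from inside the project's own environment before committing the resulting file.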

Ready to code!

Up to this point we have covered the bases needed to set up efficient and productive collaborative coding, and to start developing solid programs for medical applications. In the second part of the article we will talk about the actual process of code-writing. You can find it here:

Pacmed

Pacmed builds decision support tools for doctors based on machine learning that makes sure patients only receive care that has proven to work for them!