Contributing to this toolbox
We welcome all contributions that help us achieve our aim of speeding up ML/AI research in health and life sciences. Examples of contributions are
Data loaders for specific health & life sciences data
Network architectures and components for deep learning models
Tools to analyze and/or visualize data
All contributions to the toolbox need to come with unit tests, and will be reviewed when a Pull Request (PR) is started.
If in doubt, reach out to the core hi-ml
team before starting your work.
Please look through the existing folder structure to find a good home for your contribution.
Submitting a Pull Request
If you’d like to submit a PR to the codebase, please ensure you:
Include a brief description
Link to an issue, if relevant
Write unit tests for the code - see below for details.
Add appropriate documentation for any new code that you introduce
Ensure that you modified CHANGELOG.md and described your PR there.
Only publish your PR for review once you have a build that is passing. You can make use of the “Create as Draft” feature of GitHub.
Code style
We use
flake8
as a linter, andmypy
and pyright for static typechecking. Both tools run as part of the PR build, and must run without errors for a contribution to be accepted.mypy
requires that all functions and methods carry type annotations, see mypy documentationWe highly recommend to run all those tools before pushing the latest changes to a PR. If you have
make
installed, you can run both tools in one go viamake check
(from the repository root folder)
Unit testing
DO write unit tests for each new function or class that you add.
DO extend unit tests for existing functions or classes if you change their core behaviour.
DO try your best to write unit tests that are fast. Very often, this can be done by reducing data size to a minimum. Also, it is helpful to avoid long-running integration tests, but try to test at the level of the smallest involved function.
DO ensure that your tests are designed in a way that they can pass on the local machine, even if they are relying on specific cloud features. If required, use
unittest.mock
to simulate the cloud features, and hence enable the tests to run successfully on your local machine.DO run all unit tests on your dev machine before submitting your changes. The test suite is designed to pass completely also outside of cloud builds.
DO NOT rely only on the test builds in the cloud (i.e., run test locally before submitting). Cloud builds trigger AzureML runs on GPU machines that have a far higher CO2 footprint than your dev machine.
When fixing a bug, the suggested workflow is to first write a unit test that shows the invalid behaviour, and only then start to code up the fix.
Correct Sphinx Documentation
Common mistakes when writing docstrings:
There must be a separating line between a function description and the documentation for its parameters.
In multi-line parameter descriptions, continuations on the next line must be indented.
Sphinx will merge the class description and the arguments of the constructor
__init__
. Hence, there is no need to write any text in the constructor, only the classes’ parameters.Use
>>>
to include code snippets. PyCharm will run intellisense on those to make authoring easier.To generate the Sphinx documentation on your dev machine, run
make html
in the./docs
folder, and then open./docs/build/html/index.html
Example:
class Foo:
"""
This is the class description.
The following block will be pretty-printed by Sphinx. Note the space between >>> and the code!
Usage example:
>>> from module import Foo
>>> foo = Foo(bar=1.23)
"""
ANY_ATTRIBUTE = "what_ever."
"""Document class attributes after the attribute."""
def __init__(self, bar: float = 0.5) -> None:
"""
:param bar: This is a description for the constructor argument.
Long descriptions should be indented.
"""
self.bar = bar
def method(self, arg: int) -> None:
"""
Method description, followed by an empty line.
:param arg: This is a description for the method argument.
Long descriptions should be indented.
"""