HI-ML Tools for Computational Pathology

The directory hi-ml-cpath contains code for running experiments in Computational Pathology.

The tools for computational pathology are best used directly from the Git repository. You can also use the hi-ml-cpath PyPI package to reuse parts of the code, such as the deep learning architectures, in your own projects.

Setting up your computer

Please follow the instructions in README to set up your local Python environment.

Onboarding to Azure

Please follow the instructions here to create an AzureML workspace if you don’t have one yet. You will also need to download the workspace configuration file, as described here, so that your code knows which workspace to access.
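
Before running anything against Azure, it can help to confirm that the downloaded configuration file is readable and complete. The following is a minimal sketch, not part of the toolbox. It assumes the file is named config.json (the name used later in this document) and that it contains the keys subscription_id, resource_group and workspace_name, which is the usual layout of an AzureML workspace configuration file. The helper name missing_workspace_keys is hypothetical.

```python
import json
from pathlib import Path
from typing import List

# Assumption: these are the keys of an AzureML workspace configuration file.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_workspace_keys(config_path: Path) -> List[str]:
    """Return the required keys that are absent or empty in the given file."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return [key for key in REQUIRED_KEYS if not config.get(key)]


if __name__ == "__main__":
    path = Path("config.json")
    if not path.is_file():
        print(f"{path} not found; download it from your AzureML workspace first.")
    else:
        missing = missing_workspace_keys(path)
        if missing:
            print(f"Missing or empty keys: {missing}")
        else:
            print("config.json contains all expected keys.")
```

Run the script from the folder that holds config.json. It only inspects the file and does not contact Azure.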

Creating datasets

In our example models, we are working with two public datasets, PANDA and TCGA-Crck.

Please follow the detailed instructions to download and prepare these datasets in Azure.

Training models

Visualizing data and results in Digital Slide Archive (DSA)

New model configurations

To define your own model configuration, place a class definition in the directory health_cpath.configs. The class should inherit from LightningContainer. As an example, please check the HelloWorld model or the base class for the MIL models.
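
The general shape of such a configuration class can be sketched as follows. This is an illustration under stated assumptions, not the toolbox's actual code. So that the snippet runs without hi-ml installed, LightningContainer is replaced here by a local stand-in. The method names create_model and get_data_module are assumed to match the HelloWorld example and should be checked against the repository. The class name MyPathologyModel and its parameters are hypothetical.

```python
from typing import Any, Dict


class LightningContainer:
    """Stand-in for the real base class, so this sketch runs on its own.

    In a real configuration, import the base class from the hi-ml code instead.
    """

    def create_model(self) -> Any:
        raise NotImplementedError

    def get_data_module(self) -> Any:
        raise NotImplementedError


class MyPathologyModel(LightningContainer):
    """Hypothetical configuration bundling hyperparameters, model and data."""

    def __init__(self, learning_rate: float = 1e-3, max_epochs: int = 10) -> None:
        self.learning_rate = learning_rate
        self.max_epochs = max_epochs

    def create_model(self) -> Dict[str, Any]:
        # Placeholder: a real container would return a PyTorch Lightning module.
        return {"kind": "model", "learning_rate": self.learning_rate}

    def get_data_module(self) -> Dict[str, Any]:
        # Placeholder: a real container would return a Lightning data module.
        return {"kind": "data", "dataset": "PANDA"}


if __name__ == "__main__":
    container = MyPathologyModel(learning_rate=3e-4)
    print(container.create_model())
    print(container.get_data_module())
```

The point of the pattern is that one class holds everything a training run needs: the hyperparameters, the model and the data. A new experiment is then a new subclass rather than a new script.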

Mount datasets

If you would like to inspect or analyze the datasets that are stored in Azure Blob Storage, you can either download them or mount them. “Mounting” here means that the dataset will be loaded on-demand over the network (see also the docs). This is ideal if you expect that you will only need a small number of files, or if the disk of your machine is too small to download the full dataset.

You can mount the dataset by executing this script in <root>/hi-ml-cpath:

python src/histopathology/scripts/mount_azure_dataset.py --dataset_id PANDA

After a few seconds, this may open a browser window to authenticate you in Azure and give you access to the AzureML workspace that you selected by downloading the config.json file. If authentication fails with an error message that contains “The token is not yet valid (nbf)”, please ensure that your system’s time is set correctly and then try again. On WSL, you can reset the system clock from the hardware clock with sudo hwclock -s.

Upon success, the script will print out:

Dataset PANDA will be mounted at /tmp/datasets/PANDA.
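
Once the dataset is mounted, it can be browsed like a local folder. The sketch below is not part of the toolbox; it counts files by extension under the mount point while visiting only a bounded number of files, because every listing of a mounted dataset goes over the network. The path /tmp/datasets/PANDA is taken from the message above. The function name summarize_files and the limit parameter are hypothetical.

```python
from collections import Counter
from pathlib import Path
from typing import Dict


def summarize_files(root: Path, limit: int = 1000) -> Dict[str, int]:
    """Count files by extension under root, visiting at most `limit` files."""
    counts: Counter = Counter()
    seen = 0
    for path in root.rglob("*"):
        if seen >= limit:
            break  # Stop early to avoid listing a very large dataset in full.
        if path.is_file():
            counts[path.suffix or "<none>"] += 1
            seen += 1
    return dict(counts)


if __name__ == "__main__":
    mount_point = Path("/tmp/datasets/PANDA")
    if mount_point.is_dir():
        for suffix, count in sorted(summarize_files(mount_point).items()):
            print(f"{suffix}: {count}")
    else:
        print(f"{mount_point} does not exist; is the dataset mounted?")
```

Keeping the limit small makes the check quick even for large datasets; raise it only if you need a full inventory.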