HI-ML Tools for Computational Pathology
The directory hi-ml-cpath contains code for running experiments in Computational Pathology.
The tools for computational pathology are best used directly from the Git repository.
You can also use the hi-ml-cpath PyPI package to reuse the code in your own projects, for example the deep learning architectures.
Setting up your computer
Please follow the instructions in the README to set up your local Python environment.
Onboarding to Azure
Please follow the instructions here to create an AzureML workspace if you don’t have one yet. You will also need to download the workspace configuration file, as described here, so that your code knows which workspace to access.
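Before running anything against Azure, it can help to confirm that the downloaded workspace configuration file is complete. The following is a minimal sketch, not part of hi-ml-cpath: it assumes the file is named config.json, sits in the current working directory, and uses the three standard AzureML keys. The helper name missing_workspace_keys is hypothetical.

```python
import json
from pathlib import Path
from typing import List

# Keys that an AzureML workspace config.json is expected to contain.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_workspace_keys(config_path: Path) -> List[str]:
    """Return the required keys that are absent from the given config file.

    Hypothetical helper for illustration; it only inspects the JSON file
    and does not contact Azure.
    """
    config = json.loads(Path(config_path).read_text())
    return [key for key in REQUIRED_KEYS if key not in config]


if __name__ == "__main__":
    # Assumed location: the directory from which you start your scripts.
    path = Path("config.json")
    if not path.is_file():
        print(f"No workspace configuration found at {path.resolve()}")
    else:
        missing = missing_workspace_keys(path)
        if missing:
            print(f"Missing keys: {missing}")
        else:
            print("Workspace configuration looks complete")
```

An empty result only shows that the file has the expected shape. Whether the values point to a workspace you can access is checked when you authenticate.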
Creating datasets
In our example models, we are working with two public datasets, PANDA and TCGA-Crck.
Please follow the detailed instructions to download and prepare these datasets in Azure.
Training models
Visualizing data and results in Digital Slide Archive (DSA)
New model configurations
To define your own model configuration, place a class definition in the directory health_cpath.configs. The class should inherit from LightningContainer. As an example, please check the HelloWorld model or the base class for the MIL models.
Mount datasets
If you would like to inspect or analyze the datasets that are stored in Azure Blob Storage, you can either download them or mount them. “Mounting” here means that the dataset will be loaded on-demand over the network (see also the docs). This is ideal if you expect that you will only need a small number of files, or if the disk of your machine is too small to download the full dataset.
You can mount the dataset by executing this script in <root>/hi-ml-cpath:
python src/histopathology/scripts/mount_azure_dataset.py --dataset_id PANDA
After a few seconds, this may open a browser window to authenticate you in Azure and give you access to the AzureML workspace that you selected by downloading the config.json file. If you get an error message saying that authentication failed (the error message contains “The token is not yet valid (nbf)”), please ensure that your system’s time is set correctly and then try again. On WSL, you can use sudo hwclock -s.
Upon success, the script will print out:
Dataset PANDA will be mounted at /tmp/datasets/PANDA.
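Once the dataset is mounted, you can browse it like a local folder. The sketch below is an illustration and not part of the repository: it lists a handful of files under the mount point printed above. The helper name first_files is hypothetical. It stops after a few entries because each directory listing on a mounted dataset is fetched over the network.

```python
from itertools import islice
from pathlib import Path
from typing import List


def first_files(root: Path, limit: int = 5) -> List[Path]:
    """Return up to `limit` files found below `root`.

    The directory tree is walked lazily, so the search stops as soon as
    enough files have been found. The order of the results is whatever
    the file system yields; it is not sorted.
    """
    files = (p for p in Path(root).rglob("*") if p.is_file())
    return list(islice(files, limit))


if __name__ == "__main__":
    # Mount point as printed by the mounting script.
    mount_point = Path("/tmp/datasets/PANDA")
    if not mount_point.is_dir():
        print(f"{mount_point} does not exist; is the dataset mounted?")
    else:
        for file in first_files(mount_point):
            print(file.relative_to(mount_point))
```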