Creating montages from whole slide images (WSIs)
For working with large amounts of histology data, it is often useful to create montages of the data.
Montages are a collection of images that are stitched together to form a single image.
Montages are useful for visualizing large amounts of data at once, and can be used to create a single image that can be used for analysis.
hi-ml-cpath toolbox contains scripts that help with the creation of montages from whole slide images (WSIs).
Creating montages can be very time-consuming. It can hence be helpful to run the process in the cloud. The montage creation code provided here can be run in AzureML very easily.
Types of data for montage creation
Montages can be created from a folder of images, by specifying the name of the folder and a glob pattern, like
Montages can be created by first reading a file called
dataset.csvlocated in a folder.
dataset.csvis effectively a Pandas DataFrame, with each row corresponding to a single image. More details on the format of
dataset.csvcan be found below.
Montage creation works for all WSI image formats that are supported by either of the two possible backends:
Check out the
git clone https://github.com/microsoft/hi-ml
Run the following commands:
conda activate HimlHisto
All the commands listed below assume that
you have activated the Conda environment
your current working directory is
Creating montages from a folder with files
The following command will create a montage from all files in the folder
/data that match the pattern
python src/health_cpath/scripts/create_montage.py --dataset /data --image_glob_pattern '**/*.tiff' --level 2 --width 1000 --output_path montage1
This will create a montage from all TIFF files in folder
/data. Each TIFF file is read as a multi-level image, and
level 2 is read for creating the montage.
--width argument determines the width in pixel of the output image. The height of the output image is determined
automatically. Note that the width given here, 1000, is suitable only for montages from a very small number of files
(say, 10). See below for more details on this commandline option.
This will create two images in the folder specified by the
--output_path montage1 argument, hence outputting the files
Here’s an example how this could look like for a folder with 6 images,
Creating montages from a
If the montage creation script is only pointed to a folder, without providing a glob pattern,
it assumes that a file
dataset.csv is present. A montage will be created from only the images
dataset.csv. In addition, an optional
label column will be added to the text that is
overlayed onto the images itself.
The dataset file should be a CSV file, with each row corresponding to a single image.
When working with a
dataset.csv file, the following columns are handled:
The path of the image that should be loaded
A unique identifier for the slide
An additional string that will be placed on the montage, This could be
The path of an additional image that will rendered next to the image given in
Consider this example dataset file:
2.tiff,ID 2,Label 2
3.tiff,ID 3,Label 3
4.tiff,ID 4,Label 4
5.tiff,ID 5,Label 5
Run montage creation with the following command:
python src/health_cpath/scripts/create_montage.py --dataset /data --level 2 --width 1000 --output_path montage1
This would produce (assuming that the images
5.tiff are present in the folder
/data) a montage similar to this one:
Using inclusion or exclusion lists
When creating montages from a
dataset.csv file, it is possible to create montages from only a specific subset
of rows, or all rows apart from those in a given list.
--exclude_by_slide_id exclude.txtargument to point to a file with a list of slide IDs that should be excluded from the montage.
--include_by_slide_id include.txtargument to point to a file with a list of slide IDs for which the montage should be created.
include.txt should contain one slide ID per line.
Other commandline options
--width=20000to set the width of the output montage image. The height of the output image is determined automatically. Mind that the default value is 60_000, suitable for several hundreds of input images. If you want to try out montage creation on a small set of files (say, 10), ensure that you set the width to a reasonably small value, like
--parallel=2to specify the number of parallel processes that should be used for creating image thumbnails. Thumbnails are created in a first step, using multiple processes, and then the thumbnails are stitched into the final montage in the main process.
--backend=cucimto switch the image reader backend to
CuCIM. The default backend is
Running in Azure
create_montage.py script can be run in AzureML by adding 3 commandline arguments.
To set up Azure and AzureML:
Follow the steps in the AzureML onboarding.
At the end of the onboarding you will download a file
config.jsonfrom your AzureML workspace to your repository root folder.
To understand datasets, please read through the AzureML datasets documentation. Then create an AzureML datastore that points to your Azure Blob Storage account.
Upload your WSIs to a folder in Azure Blob Storage. This can be done most efficiently via azcopy.
azcopycan also copy directly across cloud providers, for example from AWS to Azure.
The following command will upload all files in the folder
my_test_slides to a container
datasets in your Azure Blob
Storage account called
mystorage, creating a folder
my_test_slides in the storage account in the process:
azcopy copy my_test_slides https://mystorage.blob.core.windows.net/datasets/ --recursive
The following command will then create a run in AzureML that executes montage creation from that folder:
python src/health_cpath/scripts/create_montage.py --dataset my_test_slides --level 2 --width 1000 --cluster <clustername> --conda_env environment.yml --datastore <datastorename>
In this command, replace the following:
my_test_slideswith the name of the folder in blob storage where you uploaded your WSIs.
clusternameis the name of a compute cluster where your job will execute)
datastorenameis the name of an AzureML datastore, essential a pointer to your blob storage account plus the credentials that are necessary to access it. For the above example, the data store needs to point to storage account
The command above will only run for a minute or less - it will mostly create a snapshot of the code and send that off to the cloud for execution. At the end you will see a link printed out that takes you to the AzureML portal, where you can monitor the progress of the run.
Once the run is completed, you will find two files
montage.png in the tab “Outputs” of the run, and
an option to download it to your machine.