HI-ML Multimodal Toolbox
This toolbox provides models for working with multi-modal health data. The code is available on GitHub and Hugging Face 🤗.
Getting started
The best way to get started is by running the phrase grounding notebook. All the dependencies will be installed upon execution, so Python 3.7 and Jupyter are the only prerequisites.
The notebook can also be run on Binder, without the need to download any code or install any libraries.
Installation
The latest version can be installed using pip:
pip install "git+https://github.com/microsoft/hi-ml.git#subdirectory=hi-ml-multimodal"
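After installation, the package should be importable from Python. As a quick check (a minimal sketch, assuming the top-level package is named health_multimodal; see the API documentation for the exact layout):

# Minimal post-install check; the package name health_multimodal is an assumption here.
import health_multimodal
print(health_multimodal.__file__)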
Development
For development, it is recommended to clone the repository and set up the environment using conda:
git clone https://github.com/microsoft/hi-ml.git
cd hi-ml/hi-ml-multimodal
make env
This will create a conda environment named multimodal and install all the dependencies to run and test the package.
You can visit the API documentation for a deeper understanding of our tools.
Examples
For zero-shot classification of images using text prompts, please refer to the example script hi-ml-multimodal/test_multimodal/vlp/test_zero_shot_classification.py, which uses a small subset of the Open-Indiana CXR dataset for pneumonia detection in chest X-ray images. Please note that the examples and models are not intended for deployed use cases (commercial or otherwise), which are currently out of scope.
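As a rough sketch of what that script does, the toolbox pairs an image encoder and a text encoder and scores each candidate prompt against the image; the prompt with the highest image-text similarity is taken as the predicted class. The names below (get_bert_inference, get_image_inference, ImageTextInferenceEngine, get_similarity_score_from_raw_data) are assumptions based on the phrase grounding notebook, so please check the API documentation and the example script for the exact functions and signatures:

from pathlib import Path

# The imports below are assumed names; verify them against the API documentation.
from health_multimodal.text.utils import get_bert_inference
from health_multimodal.image.utils import get_image_inference
from health_multimodal.vlp import ImageTextInferenceEngine

# Build the text and image inference engines and combine them.
text_inference = get_bert_inference()
image_inference = get_image_inference()
engine = ImageTextInferenceEngine(
    image_inference_engine=image_inference,
    text_inference_engine=text_inference,
)

# Hypothetical input image and candidate prompts for pneumonia detection.
image_path = Path("chest_xray.jpg")
prompts = ["Findings suggesting pneumonia", "No evidence of pneumonia"]

# Score each prompt against the image; the highest similarity is the prediction.
scores = [engine.get_similarity_score_from_raw_data(image_path, prompt) for prompt in prompts]
print(prompts[scores.index(max(scores))])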
Hugging Face 🤗
While the GitHub repository provides examples and pipelines to use our models, the weights and model cards are hosted on Hugging Face 🤗.
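For example, the text encoder weights can be loaded directly from the Hub with the transformers library. This is a minimal sketch, assuming the model ID microsoft/BiomedVLP-CXR-BERT-specialized and that the checkpoint ships custom code (hence trust_remote_code=True); the model card on Hugging Face 🤗 is the authoritative reference:

from transformers import AutoModel, AutoTokenizer

# Assumed model ID; see the Hugging Face model card for the authoritative usage.
model_id = "microsoft/BiomedVLP-CXR-BERT-specialized"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

# Encode a radiology sentence with the tokenizer and run it through the model.
inputs = tokenizer(["There is no evidence of pneumonia."], return_tensors="pt")
outputs = model(**inputs)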
Credit
If you use our code or models in your research, please cite the manuscript, which was accepted for presentation at the European Conference on Computer Vision (ECCV) 2022.
APA
Boecking, B., Usuyama, N., Bannur, S., Castro, D., Schwaighofer, A., Hyland, S., Wetscherek, M., Naumann, T., Nori, A., Alvarez-Valle, J., Poon, H., & Oktay, O. (2022). Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing (preprint).
BibTeX
@misc{https://doi.org/10.48550/arxiv.2204.09817,
doi = {10.48550/ARXIV.2204.09817},
url = {https://arxiv.org/abs/2204.09817},
author = {Boecking, Benedikt and Usuyama, Naoto and Bannur, Shruthi and Castro, Daniel C. and Schwaighofer, Anton and Hyland, Stephanie and Wetscherek, Maria and Naumann, Tristan and Nori, Aditya and Alvarez-Valle, Javier and Poon, Hoifung and Oktay, Ozan},
keywords = {Computer Vision and Pattern Recognition (cs.CV), Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing},
publisher = {arXiv},
year = {2022},
}