health_multimodal.vlp
Visual-language processing tools
Tools related to joint image and text inference.
- class health_multimodal.vlp.ImageTextInferenceEngine(image_inference_engine, text_inference_engine)
Functions related to inference on ImageTextModel.
- static convert_similarity_to_image_size(similarity_map, width, height, resize_size, crop_size, val_img_transform=None, interpolation='nearest')
Convert a similarity map from the raw patch grid to the original image size, taking into account whether the image was resized and/or cropped before entering the network.
- Return type
ndarray
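The resize-and-crop bookkeeping described above can be illustrated with a small NumPy sketch. It is not the library's implementation: the name upsample_patch_map is hypothetical, it uses plain nearest-neighbour indexing instead of torch.nn.functional.interpolate(), and it assumes the validation transform resized the shorter image side to resize_size and then took a centre crop of crop_size pixels. Filling the pixels outside the crop with NaN is also an assumption, chosen because the network never saw them.

```python
import numpy as np


def upsample_patch_map(similarity_map, width, height, resize_size=None, crop_size=None):
    """Hypothetical helper (not part of health_multimodal).

    Maps an (n_h, n_w) patch-grid similarity map onto a (height, width) image,
    assuming: resize shorter side to `resize_size`, then centre-crop `crop_size`.
    Assumes the crop, expressed in original pixels, fits inside the image.
    """
    n_h, n_w = similarity_map.shape
    if crop_size is None:
        # No crop: the patch grid covers the whole image.
        target_h, target_w = height, width
    elif resize_size is None:
        # Crop only: the crop keeps its size in original pixel units.
        target_h = target_w = crop_size
    else:
        # Resize then crop: undo the resize to get the crop size in original pixels.
        target_h = target_w = int(crop_size * min(height, width) / resize_size)

    # Nearest-neighbour upsampling: each output pixel copies the patch it falls in.
    rows = (np.arange(target_h) * n_h) // target_h
    cols = (np.arange(target_w) * n_w) // target_w
    upsampled = similarity_map[rows[:, None], cols[None, :]].astype(float)

    # Place the cropped region at the centre of a NaN canvas of the original size.
    out = np.full((height, width), np.nan)
    top = (height - target_h) // 2
    left = (width - target_w) // 2
    out[top:top + target_h, left:left + target_w] = upsampled
    return out


# Toy 2 x 2 patch grid for a 512 x 640 image resized to 512 and cropped to 448.
patch_sim = np.array([[0.1, 0.9],
                      [0.4, 0.2]])
heatmap = upsample_patch_map(patch_sim, width=640, height=512, resize_size=512, crop_size=448)
```

In this example the 448 x 448 crop sits 32 rows from the top and 96 columns from the left of the original image; everything outside it stays NaN.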
- get_similarity_map_from_raw_data(image_path, query_text, interpolation='nearest')
Return a heatmap of the similarities between each patch embedding from the image and the text embedding.
- Parameters
image_path (Path) – Path to the input chest X-ray, either a DICOM or JPEG file.
query_text (str) – Input radiology text phrase.
interpolation (str) – Interpolation method used to upsample the heatmap so that it matches the input image size. See torch.nn.functional.interpolate() for more details.
- Return type
ndarray
- Returns
A heatmap of the similarities between each patch embedding from the image and the text embedding, with the same shape as the input image.
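The core computation of such a heatmap can be sketched as a per-patch cosine similarity, assuming the image encoder yields one embedding per patch on an (n_h, n_w) grid and the text encoder yields a single vector of the same dimension. The function name patch_text_similarity and the toy embeddings are hypothetical; the documented method additionally loads the DICOM or JPEG file, runs both encoders, and upsamples the grid to the input image size (the step controlled by the interpolation parameter).

```python
import numpy as np


def patch_text_similarity(patch_embeddings, text_embedding):
    """Hypothetical sketch: cosine similarity of each patch with one text embedding.

    patch_embeddings: array of shape (n_h, n_w, dim), one vector per image patch.
    text_embedding:   array of shape (dim,).
    Returns an (n_h, n_w) map with values in [-1, 1].
    """
    # L2-normalize every patch vector and the text vector.
    patch_unit = patch_embeddings / np.linalg.norm(patch_embeddings, axis=-1, keepdims=True)
    text_unit = text_embedding / np.linalg.norm(text_embedding)
    # Dot product of unit vectors over the last axis gives the cosine similarity.
    return patch_unit @ text_unit


# Toy 2 x 2 grid of 2-dimensional patch embeddings.
patches = np.array([[[1.0, 0.0], [0.0, 2.0]],
                    [[-3.0, 0.0], [1.0, 1.0]]])
text = np.array([2.0, 0.0])
sim_map = patch_text_similarity(patches, text)
```

Because both sides are normalized, only the direction of each embedding matters: the patch [0, 2] is orthogonal to the text vector and scores 0, while [-3, 0] points the opposite way and scores -1.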
- get_similarity_score_from_raw_data(image_path, query_text)
Compute the cosine similarity score between an image and one or more strings.
If multiple strings are passed, their embeddings are averaged before L2-normalization.
- Parameters
image_path (Path) – Path to the input chest X-ray, either a DICOM or JPEG file.
query_text (Union[List[str], str]) – Input radiology text phrase, or a list of phrases whose embeddings are averaged.
- Return type
float
- Returns
The similarity score between the image and the text.
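The scoring rule stated above (average the text embeddings, then L2-normalize, then take the cosine similarity) can be sketched in NumPy, assuming the image has already been encoded into a single global embedding and each phrase into a text embedding of the same dimension. similarity_score is a hypothetical name, not the library's API.

```python
import numpy as np


def similarity_score(image_embedding, text_embeddings):
    """Hypothetical sketch of an image-text cosine similarity score.

    text_embeddings: shape (dim,) for one phrase or (n, dim) for several.
    Several text embeddings are averaged first and only then L2-normalized.
    """
    texts = np.atleast_2d(np.asarray(text_embeddings, dtype=float))
    text_mean = texts.mean(axis=0)
    text_unit = text_mean / np.linalg.norm(text_mean)
    image = np.asarray(image_embedding, dtype=float)
    image_unit = image / np.linalg.norm(image)
    return float(image_unit @ text_unit)


image = np.array([1.0, 0.0])
single = similarity_score(image, np.array([3.0, 4.0]))    # one phrase: 0.6
multi = similarity_score(image, np.array([[10.0, 0.0],
                                          [0.0, 1.0]]))   # two phrases: about 0.995
```

The second call shows why the order matters when the per-phrase embeddings are not unit-length: averaging before normalization lets the embedding with the larger norm dominate the mean, whereas averaging unit vectors first would give about 0.707 for the same two phrases.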