health_multimodal.vlp

Visual-language processing tools

inference_engine

Tools related to joint image and text inference

class health_multimodal.vlp.ImageTextInferenceEngine(image_inference_engine, text_inference_engine)

Functions related to inference on ImageTextModel.

static convert_similarity_to_image_size(similarity_map, width, height, resize_size, crop_size, val_img_transform=None, interpolation='nearest')

Convert similarity map from raw patch grid to original image size, taking into account whether the image has been resized and/or cropped prior to entering the network.

Return type: ndarray
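
The geometry behind this conversion can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name crop_region_in_original_space is hypothetical, and it assumes that preprocessing scales the shorter image side to resize_size and then takes a centered square crop of crop_size.

```python
from math import ceil, floor
from typing import Optional, Tuple


def crop_region_in_original_space(
    width: int,
    height: int,
    resize_size: Optional[int],
    crop_size: Optional[int],
) -> Tuple[Tuple[int, int], Tuple[int, int, int, int]]:
    """Hypothetical helper: return the (width, height) that the patch grid
    covers in original pixels, plus (left, right, top, bottom) margins."""
    if crop_size is None:
        # Assumption: without a crop, the map spans the whole image.
        return (width, height), (0, 0, 0, 0)
    if resize_size is not None:
        # Assumption: the shorter side was scaled to resize_size, so undo that scale.
        side = int(crop_size * min(width, height) / resize_size)
    else:
        side = crop_size
    margin_w, margin_h = width - side, height - side
    margins = (
        floor(margin_w / 2),
        ceil(margin_w / 2),
        floor(margin_h / 2),
        ceil(margin_h / 2),
    )
    return (side, side), margins


covered, margins = crop_region_in_original_space(1024, 768, resize_size=512, crop_size=480)
print(covered, margins)
```

Under these assumptions, a 1024 x 768 image resized to 512 and cropped to 480 yields a covered region of 720 x 720 pixels with margins of 152 pixels left and right and 24 pixels top and bottom. The patch-level map would be upsampled to the covered region, and the margins would be filled with a placeholder value to reach the original image size.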

get_similarity_map_from_raw_data(image_path, query_text, interpolation='nearest')

Return a heatmap of the similarities between each patch embedding from the image and the text embedding.

Parameters

image_path (Path) – Path to the input chest X-ray, either a DICOM or JPEG file.
query_text (str) – Input radiology text phrase.
interpolation (str) – Interpolation method to upsample the heatmap so it matches the input image size. See torch.nn.functional.interpolate() for more details.

Return type

ndarray

Returns

A heatmap of the similarities between each patch embedding from the image and the text embedding, with the same shape as the input image.
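
As a rough illustration of what such a heatmap contains, the sketch below computes the cosine similarity between every patch embedding and a text embedding, then upsamples the patch grid with a simple nearest-neighbour rule. It uses plain Python lists instead of tensors, and the function names (l2_normalize, patch_similarity_map, upsample_nearest) are hypothetical and not part of health_multimodal.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(vector: Vector) -> Vector:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def patch_similarity_map(patch_grid: List[List[Vector]], text_embedding: Vector) -> List[List[float]]:
    """Cosine similarity between each patch embedding and the text embedding."""
    text_unit = l2_normalize(text_embedding)
    return [
        [sum(p * t for p, t in zip(l2_normalize(patch), text_unit)) for patch in row]
        for row in patch_grid
    ]


def upsample_nearest(grid: List[List[float]], out_height: int, out_width: int) -> List[List[float]]:
    """Nearest-neighbour upsampling: each output pixel copies one source cell."""
    in_height, in_width = len(grid), len(grid[0])
    return [
        [grid[row * in_height // out_height][col * in_width // out_width] for col in range(out_width)]
        for row in range(out_height)
    ]


# A toy 2 x 2 patch grid with 2-dimensional embeddings.
patch_grid = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [-1.0, 0.0]],
]
heatmap = patch_similarity_map(patch_grid, text_embedding=[2.0, 0.0])
full_size = upsample_nearest(heatmap, out_height=4, out_width=4)
```

Values close to 1 mark patches whose embedding points in the same direction as the text embedding, and values close to -1 mark patches pointing the opposite way.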

get_similarity_score_from_raw_data(image_path, query_text)

Compute the cosine similarity score between an image and one or more strings.

If multiple strings are passed, their embeddings are averaged before L2-normalization.

Parameters

image_path (Path) – Path to the input chest X-ray, either a DICOM or JPEG file.
query_text (Union[List[str], str]) – Input radiology text phrase, or a list of phrases.

Return type

float

Returns

The similarity score between the image and the text.
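
The averaging step described above can be sketched as follows. This is a minimal plain-Python illustration under the assumption that the image and text embeddings are already available as lists of floats; similarity_score is a hypothetical name, not a function of health_multimodal.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(vector: Vector) -> Vector:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def similarity_score(image_embedding: Vector, text_embeddings: List[Vector]) -> float:
    """Average the text embeddings, L2-normalize, then take the cosine similarity."""
    count = len(text_embeddings)
    # Element-wise mean over all text embeddings (a single phrase is its own mean).
    mean_text = [sum(values) / count for values in zip(*text_embeddings)]
    text_unit = l2_normalize(mean_text)
    image_unit = l2_normalize(image_embedding)
    return sum(i * t for i, t in zip(image_unit, text_unit))


# Two phrases whose embeddings are orthogonal: the mean lies halfway between them.
score = similarity_score([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(round(score, 4))
```

Because normalization happens after averaging, a phrase with a larger embedding norm contributes more to the mean direction than it would if each embedding were normalized first.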

to(device)

Move models to the specified device.

Return type: None