health_multimodal.image

Image-related tools

`inference_engine`
`utils`

`io`
`transforms`

`encoder`
`model`
`modules`
`resnet`
`transformer`
`types`

class health_multimodal.image.BaseImageModel

Abstract class for image models.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

abstract forward(*args, **kwargs)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass must be defined within this function, call the Module instance itself rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() silently skips them.

Return type: ImageModelOutput
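The note above can be illustrated with a toy, stdlib-only stand-in for the way nn.Module's call path wraps forward(). ToyModule, Doubler and the simplified hook signature are hypothetical illustrations, not PyTorch code; real forward hooks are registered with nn.Module.register_forward_hook and receive the inputs as a tuple.

```python
from typing import Any, Callable, List

Hook = Callable[[Any, Any, Any], None]


class ToyModule:
    """Hypothetical stand-in (not torch) mimicking how nn.Module.__call__ wraps forward()."""

    def __init__(self) -> None:
        self._forward_hooks: List[Hook] = []

    def register_forward_hook(self, hook: Hook) -> None:
        self._forward_hooks.append(hook)

    def forward(self, x: Any) -> Any:
        raise NotImplementedError  # subclasses define the computation

    def __call__(self, x: Any) -> Any:
        output = self.forward(x)
        # Hooks only run on this path, never when forward() is called directly.
        for hook in self._forward_hooks:
            hook(self, x, output)
        return output


class Doubler(ToyModule):
    def forward(self, x: float) -> float:
        return 2 * x


calls: List[float] = []
model = Doubler()
model.register_forward_hook(lambda module, inp, out: calls.append(out))

model(3)          # runs forward() and then the hook
model.forward(3)  # runs forward() only; the hook is skipped
print(calls)      # [6]
```

Only the call through the instance recorded an output, which is why the note recommends calling the model object rather than its forward() method.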

class health_multimodal.image.ImageEncoderType(value)

Enumeration of the supported image encoder types.

class health_multimodal.image.ImageInferenceEngine(image_model, transform)

Encapsulate inference-time operations on an image model.

Parameters

image_model – Trained image model.
transform (Compose) – Transform to apply to the image after loading. Must return a torch.Tensor that can be input directly to the image model.

get_projected_global_embedding(image_path)

Compute global image embedding in the joint latent space.

Parameters: image_path (Path) – Path to the image to compute embeddings for.
Return type: Tensor
Returns: Torch tensor containing the L2-normalised global image embedding, with shape [joint_feature_dim,], where joint_feature_dim is the dimensionality of the joint latent space.
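Because the returned embedding is L2-normalised, comparing it with another unit-length vector in the same joint space (for example a text embedding, which is an assumption about typical usage rather than something this page states) reduces to a dot product, which then equals the cosine similarity. The sketch below uses plain Python lists and made-up 3-dimensional vectors in place of torch tensors; with torch, torch.nn.functional.normalize(v, dim=-1) performs the same L2 normalisation.

```python
import math
from typing import List


def l2_normalise(v: List[float]) -> List[float]:
    """Scale v to unit Euclidean length (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# Hypothetical 3-dimensional stand-ins for joint-space embeddings.
image_emb = l2_normalise([3.0, 4.0, 0.0])
text_emb = l2_normalise([4.0, 3.0, 0.0])

# For unit vectors, the dot product equals the cosine similarity.
similarity = dot(image_emb, text_emb)
print(round(similarity, 2))  # 0.96
```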

get_projected_patch_embeddings(image_path)

Compute image patch embeddings in the joint latent space, preserving the image grid.

Parameters: image_path (Path) – Path to the image to compute embeddings for.
Return type: Tuple[Tensor, Tuple[int, int]]
Returns: A tuple containing the image patch embeddings and the shape of the original image (width, height) before applying transforms.
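The original (width, height) returned alongside the patch grid makes it possible to map per-patch values back onto the input image. The sketch below performs a nearest-neighbour resize of a [n_h][n_w] grid of per-patch scores to the original size. It assumes the patch grid spans the whole original image, so it ignores any cropping or padding the transform may apply; upsample_nearest and the scores are hypothetical, and the library may use a different interpolation scheme.

```python
from typing import List, Tuple


def upsample_nearest(grid: List[List[float]], original_shape: Tuple[int, int]) -> List[List[float]]:
    """Nearest-neighbour resize of an [n_h][n_w] grid to [height][width].

    original_shape follows the (width, height) order returned by
    get_projected_patch_embeddings. Assumes the grid covers the whole image.
    """
    width, height = original_shape
    n_h, n_w = len(grid), len(grid[0])
    return [
        [grid[y * n_h // height][x * n_w // width] for x in range(width)]
        for y in range(height)
    ]


# A hypothetical 2x2 grid of per-patch scores, upsampled to a 4-wide, 2-high image.
patch_scores = [[0.1, 0.9],
                [0.4, 0.2]]
heatmap = upsample_nearest(patch_scores, original_shape=(4, 2))
print(heatmap)  # [[0.1, 0.1, 0.9, 0.9], [0.4, 0.4, 0.2, 0.2]]
```

Note the order swap: the shape tuple is (width, height), while the grid is indexed row-first, so height selects the row index.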

load_and_transform_input_image(image_path, transform)

Read an image and apply the transform to it.

1. Read the image from the given path.
2. Apply the transform.
3. Add the batch dimension.
4. Move the result to the correct device.

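Parameters

image_path – Path to the image to load.
transform – Transform to apply to the loaded image.

Returns: A tuple containing the transformed image tensor (with the batch dimension added, on the correct device) and the original shape of the image (width, height) before the transforms.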
Return type: Tuple[Tensor, Tuple[int, int]]
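The four steps above can be sketched without torch as follows. The image is a hypothetical nested list of pixel rows, and read_image, fake_reader and keep_first_row are hypothetical helpers introduced only for illustration. The key detail is that the (width, height) shape is recorded before the transform runs. Step 4 has no stdlib analogue; with torch it would typically be something like tensor.unsqueeze(0).to(device), covering steps 3 and 4 together.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # hypothetical stand-in: rows of pixel values (height x width)


def load_and_transform(
    read_image: Callable[[str], Image],
    image_path: str,
    transform: Callable[[Image], Image],
) -> Tuple[List[Image], Tuple[int, int]]:
    # 1. Read the image from the given path.
    image = read_image(image_path)
    # Record the shape as (width, height) before any transform is applied.
    original_shape = (len(image[0]), len(image))
    # 2. Apply the transform.
    transformed = transform(image)
    # 3. Add a leading batch dimension of size 1.
    batched = [transformed]
    # 4. Moving to a device is omitted: there is no stdlib equivalent.
    return batched, original_shape


def fake_reader(path: str) -> Image:
    # Hypothetical reader returning a 2-row, 3-column image regardless of path.
    return [[0.0, 0.5, 1.0], [1.0, 0.5, 0.0]]


def keep_first_row(image: Image) -> Image:
    # Hypothetical transform that crops away every row but the first.
    return image[:1]


batch, shape = load_and_transform(fake_reader, "example.png", keep_first_row)
print(shape)       # (3, 2)
print(len(batch))  # 1
```

Even though the transform changed the image size, the reported shape still describes the image as it was read from disk.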

class health_multimodal.image.ImageModel(img_encoder_type, joint_feature_size, freeze_encoder=False, pretrained_model_path=None, **downstream_classifier_kwargs)

Image encoder module

Initializes internal Module state, shared by both nn.Module and ScriptModule.

create_downstream_classifier(**kwargs)

Create the classification module for the downstream task.

Return type: MultiTaskModel

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass must be defined within this function, call the Module instance itself rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() silently skips them.

Return type: ImageModelOutput

get_patchwise_projected_embeddings(input_img, normalize)

Get patch-wise projected embeddings from the CNN model.

Parameters

input_img (Tensor) – input tensor image [B, C, H, W].
normalize (bool) – If True, the embeddings are L2-normalized.

Returns: projected_embeddings, a tensor of embeddings with shape [batch, n_patches_h, n_patches_w, feature_size].
Return type: Tensor
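With normalize=True, each patch vector is L2-normalised. The sketch below assumes the normalisation is applied along the last (feature) axis of the [batch, n_patches_h, n_patches_w, feature_size] output, which is what the torch call torch.nn.functional.normalize(emb, dim=-1) would do; that axis choice is an assumption, not something this page states. Nested lists stand in for the tensor and normalise_patches is a hypothetical helper.

```python
import math
from typing import List

# Nested lists standing in for a tensor of shape
# [batch, n_patches_h, n_patches_w, feature_size].
Embeddings = List[List[List[List[float]]]]


def normalise_patches(emb: Embeddings) -> Embeddings:
    """L2-normalise every patch vector along the last (feature) axis."""
    out: Embeddings = []
    for image in emb:
        rows = []
        for row in image:
            patches = []
            for vec in row:
                norm = math.sqrt(sum(x * x for x in vec))
                patches.append([x / norm for x in vec])
            rows.append(patches)
        out.append(rows)
    return out


# One image, a 1x2 grid of patches, feature_size = 2.
emb = [[[[3.0, 4.0], [0.0, 2.0]]]]
normed = normalise_patches(emb)
print(normed)  # [[[[0.6, 0.8], [0.0, 1.0]]]]
```

The grid layout is preserved; only the length of each feature vector changes, so every patch ends up with unit norm.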

train(mode=True)

Switch the model between training and evaluation modes.

Return type: Any

health_multimodal.image.get_image_inference(image_model_type=ImageModelType.BIOVIL_T)

Create an ImageInferenceEngine for the image model.

Parameters: image_model_type (ImageModelType) – The type of image model to use, BIOVIL or BIOVIL_T.

The model is downloaded from the Hugging Face Hub. The engine can be used to compute global and patch-wise image embeddings in the joint latent space.

Return type: ImageInferenceEngine