health_multimodal.image

Image-related tools

inference_engine

utils

io

transforms

encoder

model

modules

resnet

transformer

types

class health_multimodal.image.BaseImageModel[source]

Abstract class for image models.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

abstract forward(*args, **kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Return type

ImageModelOutput
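The note above distinguishes calling the model instance from calling forward directly. The difference can be sketched in plain Python. This is a simplified illustration only: TinyModule, register_hook and DoublingModel are hypothetical stand-ins, not the PyTorch or health_multimodal implementation.

```python
from abc import ABC, abstractmethod


class TinyModule(ABC):
    """Hypothetical stand-in for a module base class (not torch.nn.Module)."""

    def __init__(self):
        self._hooks = []

    def register_hook(self, hook):
        # A hook receives the output of forward() each time the instance is called.
        self._hooks.append(hook)

    @abstractmethod
    def forward(self, x):
        """Subclasses define the computation here."""

    def __call__(self, x):
        # Calling the instance runs forward() and then every registered hook.
        output = self.forward(x)
        for hook in self._hooks:
            hook(output)
        return output


class DoublingModel(TinyModule):
    def forward(self, x):
        return 2 * x


seen = []
model = DoublingModel()
model.register_hook(seen.append)

via_call = model(3)             # runs forward() and then the hook
via_forward = model.forward(4)  # runs forward() only; the hook is skipped
```

Only the call through the instance is recorded by the hook, which is why the instance should be called rather than forward.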

class health_multimodal.image.ImageEncoderType(value)[source]

An enumeration.

class health_multimodal.image.ImageInferenceEngine(image_model, transform)[source]

Encapsulate inference-time operations on an image model.

Parameters
  • image_model – Trained image model

  • transform (Compose) – Transform to apply to the image after loading. Must return a torch.Tensor that can be input directly to the image model.

get_projected_global_embedding(image_path)[source]

Compute global image embedding in the joint latent space.

Parameters

image_path (Path) – Path to the image to compute embeddings for.

Return type

Tensor

Returns

Torch tensor containing the L2-normalised global image embedding, of shape [joint_feature_dim,], where joint_feature_dim is the dimensionality of the joint latent space.
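Since the returned embedding is L2-normalised, the dot product of two such embeddings equals their cosine similarity. A minimal sketch with plain Python lists standing in for tensors; l2_normalise and dot are hypothetical helpers written for this illustration, not part of the library.

```python
import math


def l2_normalise(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [v / norm for v in vector]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


embedding_a = l2_normalise([3.0, 4.0])
embedding_b = l2_normalise([4.0, 3.0])

# For unit-length vectors the dot product equals the cosine similarity.
similarity = dot(embedding_a, embedding_b)
```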

get_projected_patch_embeddings(image_path)[source]

Compute image patch embeddings in the joint latent space, preserving the image grid.

Parameters

image_path (Path) – Path to the image to compute embeddings for.

Return type

Tuple[Tensor, Tuple[int, int]]

Returns

A tuple containing the image patch embeddings and the shape of the original image (width, height) before applying transforms.
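Because the patch embeddings preserve the image grid, each grid cell can be compared with a query vector in the same latent space to give a coarse similarity map. A minimal sketch with nested lists standing in for tensors; the layout [n_patches_h][n_patches_w][feature_size], the toy values and the similarity_map helper are assumptions made for illustration.

```python
def similarity_map(patch_grid, query):
    """Dot product between every patch embedding in the grid and a query vector."""
    return [
        [sum(p * q for p, q in zip(patch, query)) for patch in row]
        for row in patch_grid
    ]


# Toy 2x2 grid of unit-length patch embeddings with feature size 2.
patch_grid = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.6, 0.8], [-1.0, 0.0]],
]
query = [1.0, 0.0]  # unit-length query embedding

heatmap = similarity_map(patch_grid, query)

# Locate the grid cell (row, column) that best matches the query.
best_row, best_col = max(
    ((r, c) for r in range(len(heatmap)) for c in range(len(heatmap[0]))),
    key=lambda rc: heatmap[rc[0]][rc[1]],
)
```

Note that the grid is indexed by row (height) first, whereas the original image shape returned by the method is given as (width, height).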

load_and_transform_input_image(image_path, transform)[source]

Read an image and apply the transform to it.

  1. Read the image from the given path

  2. Apply transform

  3. Add the batch dimension

  4. Move to the correct device

Parameters
  • image_path – Path to the image to read.

  • transform – Transform to apply to the loaded image.

Return type

Tuple[Tensor, Tuple[int, int]]

Returns

A tuple containing the transformed image tensor, with the batch dimension added, and the original shape of the image (width, height) before the transforms.
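The four steps listed for this method can be mirrored in plain Python. This is a hedged sketch rather than the library code: nested lists stand in for images and tensors, and load_and_transform, fake_reader and scale_to_unit are hypothetical names. The device move has no list equivalent and is omitted.

```python
def load_and_transform(read_image, transform, image_path):
    """Hypothetical pure-Python mirror of the steps, with nested lists as images."""
    # 1. Read the image (here: a list of rows of pixel values).
    image = read_image(image_path)
    # Record the original shape as (width, height) before any transform.
    original_shape = (len(image[0]), len(image))
    # 2. Apply the transform.
    transformed = transform(image)
    # 3. Add the batch dimension by wrapping in an outer list.
    batched = [transformed]
    # 4. Moving to a device is omitted in this list-based sketch.
    return batched, original_shape


def fake_reader(path):
    # Stand-in for an image reader: a grayscale image with 2 rows and 3 columns.
    return [[0, 128, 255], [255, 128, 0]]


def scale_to_unit(image):
    # Stand-in transform: scale pixel values to the range [0, 1].
    return [[pixel / 255 for pixel in row] for row in image]


batch, shape = load_and_transform(fake_reader, scale_to_unit, "example.png")
```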

class health_multimodal.image.ImageModel(img_encoder_type, joint_feature_size, freeze_encoder=False, pretrained_model_path=None, **downstream_classifier_kwargs)[source]

Image encoder module

Initializes internal Module state, shared by both nn.Module and ScriptModule.

create_downstream_classifier(**kwargs)[source]

Create the classification module for the downstream task.

Return type

MultiTaskModel

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Return type

ImageModelOutput

get_patchwise_projected_embeddings(input_img, normalize)[source]

Get patch-wise projected embeddings from the CNN model.

Parameters
  • input_img (Tensor) – input tensor image [B, C, H, W].

  • normalize (bool) – If True, the embeddings are L2-normalized.

Returns

projected_embeddings: tensor of embeddings in shape [batch, n_patches_h, n_patches_w, feature_size].

Return type

Tensor

train(mode=True)[source]

Switch the model between training and evaluation modes.

Return type

Any

health_multimodal.image.get_image_inference(image_model_type=ImageModelType.BIOVIL_T)[source]

Create an ImageInferenceEngine for the image model.

Parameters

image_model_type (ImageModelType) – The type of image model to use, BIOVIL or BIOVIL_T.

The model is downloaded from the Hugging Face Hub. The engine can be used to get global and patch embeddings from images.

Return type

ImageInferenceEngine