health_multimodal.image
Image-related tools
- class health_multimodal.image.BaseImageModel[source]
Abstract class for image models.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- abstract forward(*args, **kwargs)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- Return type
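The note on forward above can be illustrated with a small plain-Python stand-in. This is a sketch only: TinyModule, register_hook and Doubler are hypothetical names invented for illustration and are not part of PyTorch or health_multimodal. It shows why calling the instance runs registered hooks while calling forward directly skips them, and that the abstract base cannot be instantiated until forward is overridden.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, List


class TinyModule(ABC):
    """Hypothetical, heavily simplified stand-in for torch.nn.Module."""

    def __init__(self) -> None:
        self._hooks: List[Callable[[Any], None]] = []

    def register_hook(self, hook: Callable[[Any], None]) -> None:
        """Store a callback to run after every call of the instance."""
        self._hooks.append(hook)

    @abstractmethod
    def forward(self, x: Any) -> Any:
        """Subclasses must override this with the actual computation."""

    def __call__(self, x: Any) -> Any:
        # Calling the instance runs forward and then every registered hook.
        out = self.forward(x)
        for hook in self._hooks:
            hook(out)
        return out


class Doubler(TinyModule):
    """Concrete subclass that overrides forward."""

    def forward(self, x: float) -> float:
        return 2 * x


seen: List[float] = []
model = Doubler()
model.register_hook(seen.append)

via_call = model(3.0)             # runs forward, then the hook
via_forward = model.forward(3.0)  # same result, but the hook is bypassed
```

After both calls, seen holds a single entry, because only the call through the instance triggered the hook.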
- class health_multimodal.image.ImageInferenceEngine(image_model, transform)[source]
Encapsulate inference-time operations on an image model.
- Parameters
image_model – Trained image model.
transform (Compose) – Transform to apply to the image after loading. Must return a torch.Tensor that can be input directly to the image model.
- get_projected_global_embedding(image_path)[source]
Compute global image embedding in the joint latent space.
- Parameters
image_path (Path) – Path to the image to compute embeddings for.
- Return type
Tensor
- Returns
Torch tensor containing the L2-normalised global image embedding, with shape [joint_feature_dim,], where joint_feature_dim is the dimensionality of the joint latent space.
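Because the returned embedding is L2-normalised, the dot product of two such embeddings equals their cosine similarity. The following is a minimal sketch of that property using plain Python lists in place of torch tensors; l2_normalise and dot are hypothetical helper names, not functions from health_multimodal.

```python
import math
from typing import List


def l2_normalise(vec: List[float]) -> List[float]:
    """Scale a vector so that its Euclidean (L2) norm is 1."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [v / norm for v in vec]


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Two toy "embeddings" with joint_feature_dim = 2.
emb_a = l2_normalise([3.0, 4.0])
emb_b = l2_normalise([4.0, 3.0])

# For unit-norm vectors the dot product is the cosine similarity.
cosine = dot(emb_a, emb_b)
```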
- get_projected_patch_embeddings(image_path)[source]
Compute image patch embeddings in the joint latent space, preserving the image grid.
- Parameters
image_path (Path) – Path to the image to compute embeddings for.
- Return type
Tuple[Tensor, Tuple[int, int]]
- Returns
A tuple containing the image patch embeddings and the shape of the original image (width, height) before applying transforms.
- load_and_transform_input_image(image_path, transform)[source]
Read an image and apply the transform to it.
1. Read the image from the given path.
2. Apply the transform.
3. Add the batch dimension.
4. Move to the correct device.
- Parameters
return_original_shape – Whether to return an extra tuple that has the original shape of the image before the transforms. The tuple returned contains (width, height).
- Return type
Tuple[Tensor, Tuple[int, int]]
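The four steps listed above can be sketched without any imaging or tensor library. Everything here is a hypothetical stand-in: the image is a nested list of pixel rows, read_image replaces real file decoding, crop_to_square plays the role of the transform, and the device move is only marked with a comment because it has no plain-Python equivalent. This is not the library's implementation; it only shows the order of operations and the (width, height) convention for the returned shape.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # rows of pixel values, indexed image[row][col]


def read_image(path: str) -> Image:
    """Hypothetical loader: returns a fixed 2x3 image instead of decoding a file."""
    return [[0.0, 0.5, 1.0],
            [1.0, 0.5, 0.0]]


def load_and_transform(
    path: str, transform: Callable[[Image], Image]
) -> Tuple[List[Image], Tuple[int, int]]:
    image = read_image(path)                       # 1. read the image
    original_shape = (len(image[0]), len(image))   # (width, height) before transforms
    transformed = transform(image)                 # 2. apply the transform
    batch = [transformed]                          # 3. add the batch dimension
    # 4. A real implementation would move the tensor to the model's device here.
    return batch, original_shape


def crop_to_square(image: Image) -> Image:
    """Hypothetical transform: keep the top-left square region."""
    side = min(len(image), len(image[0]))
    return [row[:side] for row in image[:side]]


batch, shape = load_and_transform("scan.png", crop_to_square)
```

Note that the shape is recorded before the transform runs, so it describes the image as stored on disk rather than the model input.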
- class health_multimodal.image.ImageModel(img_encoder_type, joint_feature_size, freeze_encoder=False, pretrained_model_path=None, **downstream_classifier_kwargs)[source]
Image encoder module
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- create_downstream_classifier(**kwargs)[source]
Create the classification module for the downstream task.
- Return type
- forward(x)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- Return type
- get_patchwise_projected_embeddings(input_img, normalize)[source]
Get patch-wise projected embeddings from the CNN model.
- Parameters
input_img (Tensor) – Input image tensor of shape [B, C, H, W].
normalize (bool) – If True, the embeddings are L2-normalized.
- Returns
projected_embeddings: tensor of embeddings with shape [batch, n_patches_h, n_patches_w, feature_size].
- Return type
Tensor
- health_multimodal.image.get_image_inference(image_model_type=ImageModelType.BIOVIL_T)[source]
Create an ImageInferenceEngine for the image model.
The model is downloaded from the Hugging Face Hub. The engine can be used to compute global and patch-level image embeddings.
- Parameters
image_model_type (ImageModelType) – The type of image model to use, BIOVIL or BIOVIL_T.
- Return type
ImageInferenceEngine