health_multimodal.text

Text-related tools

Submodules:

  • inference_engine

  • utils

  • io

  • configuration_cxrbert

  • modelling_cxrbert

class health_multimodal.text.CXRBertConfig(projection_size=128, **kwargs)[source]

Config class for CXR-BERT model.

Parameters

projection_size (int) – Dimensionality of the joint latent space.
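
A minimal usage sketch (assuming, as the **kwargs suggest, that any remaining keyword arguments are forwarded to the underlying HuggingFace BertConfig):

```python
from health_multimodal.text import CXRBertConfig

# 128 is the documented default dimensionality of the joint latent space.
config = CXRBertConfig(projection_size=128)
print(config.projection_size)  # 128
```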

class health_multimodal.text.CXRBertModel(config)[source]

Implements the CXR-BERT model outlined in the manuscript: Boecking et al. “Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing”, 2022 https://link.springer.com/chapter/10.1007/978-3-031-20059-5_1

Extends the HuggingFace BertForMaskedLM model by adding a separate projection head. The projection of the “[CLS]” token is used to align the latent vectors of the image and text modalities.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

config_class

alias of health_multimodal.text.model.configuration_cxrbert.CXRBertConfig

forward(input_ids, attention_mask, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, output_cls_projected_embedding=None, return_dict=None, **kwargs)[source]

The BertForMaskedLM forward method, which overrides the __call__ special method.

Tip: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Parameters
  • input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –

    Indices of input sequence tokens in the vocabulary.

    Indices can be obtained using BertTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details.

  • attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) –

    Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

    • 1 for tokens that are not masked,

    • 0 for tokens that are masked.

  • token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) –

    Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:

    • 0 corresponds to a sentence A token,

    • 1 corresponds to a sentence B token.

  • position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) –

    Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].

  • head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional) –

    Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:

    • 1 indicates the head is not masked,

    • 0 indicates the head is masked.

  • inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

  • output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.

  • output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.

  • return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.

  • labels (torch.LongTensor of shape (batch_size, sequence_length), optional) – Labels for computing the masked language modeling loss. Indices should be in [-100, 0, …, config.vocab_size] (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, …, config.vocab_size].

Returns

A transformers.modeling_outputs.MaskedLMOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

  • loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Masked language modeling (MLM) loss.

  • logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) – Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).

  • hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

    Hidden-states of the model at the output of each layer plus the initial embedding outputs.

  • attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type

transformers.modeling_outputs.MaskedLMOutput or tuple(torch.FloatTensor)

Example:

```python
>>> from transformers import BertTokenizer, BertForMaskedLM
>>> import torch

>>> tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
>>> model = BertForMaskedLM.from_pretrained("bert-base-uncased")
>>> inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
>>> labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]
>>> outputs = model(**inputs, labels=labels)
>>> loss = outputs.loss
>>> logits = outputs.logits
```
get_projected_text_embeddings(input_ids, attention_mask, normalize_embeddings=True)[source]

Returns the projected [CLS] token embeddings (L2-normalised by default) for the given input token IDs and attention mask. The joint latent space is trained using a contrastive objective between the image and text modalities.

Parameters
  • input_ids (Tensor) – (batch_size, sequence_length)

  • attention_mask (Tensor) – (batch_size, sequence_length)

  • normalize_embeddings (bool) – Whether to l2-normalise the embeddings.

Return type

Tensor

Returns

(batch_size, projection_size)
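
A minimal sketch of obtaining joint-space text embeddings with this method; the checkpoint name "microsoft/BiomedVLP-CXR-BERT-specialized" and the example sentence are assumptions, not part of this page:

```python
import torch

from health_multimodal.text import CXRBertModel, CXRBertTokenizer

checkpoint = "microsoft/BiomedVLP-CXR-BERT-specialized"  # assumed checkpoint name
tokenizer = CXRBertTokenizer.from_pretrained(checkpoint)
model = CXRBertModel.from_pretrained(checkpoint)
model.eval()

# Tokenize an arbitrary example sentence and project its [CLS] embedding
# into the joint image-text latent space (signature as documented above).
tokens = tokenizer(
    ["No acute cardiopulmonary process."], padding=True, return_tensors="pt"
)
with torch.no_grad():
    embeddings = model.get_projected_text_embeddings(
        input_ids=tokens.input_ids,
        attention_mask=tokens.attention_mask,
    )
print(embeddings.shape)  # (1, projection_size), e.g. (1, 128)
```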

class health_multimodal.text.CXRBertOutput[source]

class health_multimodal.text.CXRBertTokenizer(**kwargs)[source]

class health_multimodal.text.TextInferenceEngine(tokenizer, text_model)[source]

Text inference class that implements the functionality required for sentence embedding extraction, sentence similarity, and masked language model (MLM) prediction tasks.

Parameters
  • tokenizer (BertTokenizer) – A BertTokenizer object.

  • text_model (BertForMaskedLM) – Text model, either the default HuggingFace BertForMaskedLM class or a model derived from it (e.g. CXRBertModel).

get_embeddings_from_prompt(prompts, normalize=True, verbose=True)[source]

Generate L2-normalised embeddings for a list of input text prompts.

Parameters
  • prompts (Union[str, List[str]]) – Input text prompt(s) either in string or list of string format.

  • normalize (bool) – If True, L2-normalise the embeddings.

  • verbose (bool) – If set to True, tokenized words are displayed in the console.

Return type

Tensor

Returns

Tensor of shape (batch_size, embedding_size).
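
For illustration, a short sketch that embeds two arbitrary prompts using an engine created with get_bert_inference (documented at the end of this page); the example sentences are placeholders:

```python
from health_multimodal.text import get_bert_inference

text_inference = get_bert_inference()  # defaults to the BioViL-T text encoder

prompts = [
    "No pleural effusion or pneumothorax is seen.",
    "Findings suggesting pneumonia.",
]
embeddings = text_inference.get_embeddings_from_prompt(prompts, verbose=False)
print(embeddings.shape)  # (2, embedding_size)
```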

get_pairwise_similarities(prompt_set_1, prompt_set_2)[source]

Compute pairwise cosine similarities between the embeddings of the given prompts.

Return type

Tensor
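
A sketch of comparing two prompt sets pairwise; the prompts are placeholders, and the one-similarity-per-pair output shape is an assumption based on the method name:

```python
from health_multimodal.text import get_bert_inference

text_inference = get_bert_inference()
similarities = text_inference.get_pairwise_similarities(
    prompt_set_1=["There is a small right pleural effusion."],
    prompt_set_2=["No pleural effusion is seen."],
)
print(similarities)  # assumed: one cosine similarity per prompt pair
```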

is_in_eval()[source]

Returns True if the model is in eval mode.

Return type

bool

predict_masked_tokens(prompts)[source]

Predict masked tokens for a single input text prompt or a list of prompts.

Requires the model to be trained with an MLM prediction head.

Parameters

prompts (Union[str, List[str]]) – Input text prompt(s) either in string or list of string format.

Return type

List[List[str]]

Returns

Predicted top-1 token candidates at each masked position.
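
A sketch of filling in a masked token; the prompt is a placeholder and the printed output is illustrative, not a guaranteed prediction:

```python
from health_multimodal.text import get_bert_inference

text_inference = get_bert_inference()
predictions = text_inference.predict_masked_tokens(
    "Heart size is [MASK] and the lungs are clear."
)
print(predictions)  # e.g. [["normal"]] (illustrative output only)
```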

tokenize_input_prompts(prompts, verbose=True)[source]

Tokenizes the input sentence(s) and adds special tokens as defined by the tokenizer.

Parameters
  • prompts (Union[str, List[str]]) – Either a string containing a single sentence, or a list of strings, each containing a single sentence. Note that this method will not correctly tokenize multiple sentences if they are input as a single string.

  • verbose (bool) – If set to True, will log the sentence after tokenization.

Return type

Any

Returns

A 2D tensor containing the tokenized sentences

health_multimodal.text.get_bert_inference(bert_encoder_type=BertEncoderType.BIOVIL_T_BERT)[source]

Create a TextInferenceEngine for a text encoder model.

Parameters

bert_encoder_type (BertEncoderType) – The type of text encoder model to use, CXR_BERT or BIOVIL_T_BERT.

The model is downloaded from the Hugging Face Hub. The engine can be used to obtain embeddings for text prompts or to predict masked tokens.

Return type

TextInferenceEngine
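
A minimal sketch of creating the engine; the import location of BertEncoderType is an assumption (it is referenced in the default argument above but not documented on this page):

```python
from health_multimodal.text import get_bert_inference
from health_multimodal.text.utils import BertEncoderType  # assumed import path

# Downloads the corresponding checkpoint from the Hugging Face Hub.
text_inference = get_bert_inference(BertEncoderType.BIOVIL_T_BERT)
print(type(text_inference).__name__)  # TextInferenceEngine
```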