health_multimodal.text
Text-related tools
- class health_multimodal.text.CXRBertConfig(projection_size=128, **kwargs)[source]
Config class for CXR-BERT model.
- Parameters
projection_size (int) – Dimensionality of the joint latent space.
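A minimal usage sketch (assuming, as is typical for HuggingFace config subclasses, that the remaining keyword arguments are forwarded to the parent BertConfig):
```python
from health_multimodal.text import CXRBertConfig

# projection_size sets the dimensionality of the joint image-text latent
# space; 128 is the default from the signature above.
config = CXRBertConfig(projection_size=128)
```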
- class health_multimodal.text.CXRBertModel(config)[source]
Implements the CXR-BERT model outlined in the manuscript: Boecking et al. “Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing”, 2022 https://link.springer.com/chapter/10.1007/978-3-031-20059-5_1
Extends the HuggingFace BertForMaskedLM model by adding a separate projection head. The projected “[CLS]” token embedding is used to align the latent vectors of the image and text modalities.
- config_class
alias of health_multimodal.text.model.configuration_cxrbert.CXRBertConfig
- forward(input_ids, attention_mask, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, output_cls_projected_embedding=None, return_dict=None, **kwargs)[source]
The [BertForMaskedLM] forward method; it overrides the __call__ special method.
Tip: Although the recipe for the forward pass needs to be defined within this function, one should call the [Module] instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
- Parameters
input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.
Indices can be obtained using [BertTokenizer]. See [PreTrainedTokenizer.encode] and [PreTrainedTokenizer.__call__] for details.
attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) –
Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
1 for tokens that are not masked,
0 for tokens that are masked.
token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) –
Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
0 corresponds to a sentence A token,
1 corresponds to a sentence B token.
position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) –
Indices of the position of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional) –
Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:
1 indicates the head is not masked,
0 indicates the head is masked.
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a [~file_utils.ModelOutput] instead of a plain tuple.
labels (torch.LongTensor of shape (batch_size, sequence_length), optional) – Labels for computing the masked language modeling loss. Indices should be in [-100, 0, …, config.vocab_size] (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, …, config.vocab_size].
- Returns
A [transformers.modeling_outputs.MaskedLMOutput] or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration ([BertConfig]) and inputs.
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Masked language modeling (MLM) loss.
logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) – Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).
Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
[transformers.modeling_outputs.MaskedLMOutput] or tuple(torch.FloatTensor)
Example:
```python
>>> from transformers import BertTokenizer, BertForMaskedLM
>>> import torch

>>> tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
>>> model = BertForMaskedLM.from_pretrained("bert-base-uncased")

>>> inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
>>> labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]

>>> outputs = model(**inputs, labels=labels)
>>> loss = outputs.loss
>>> logits = outputs.logits
```
- get_projected_text_embeddings(input_ids, attention_mask, normalize_embeddings=True)[source]
Returns the l2-normalised projected [CLS] token embeddings for the given input token ids and attention mask. The joint latent space is trained using a contrastive objective between the image and text modalities.
- Parameters
input_ids (Tensor) – (batch_size, sequence_length)
attention_mask (Tensor) – (batch_size, sequence_length)
normalize_embeddings (bool) – Whether to l2-normalise the embeddings.
- Return type
Tensor
- Returns
(batch_size, projection_size)
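A usage sketch for this method. The checkpoint name is an assumption; substitute the CXR-BERT checkpoint you intend to use:
```python
import torch
from transformers import AutoTokenizer
from health_multimodal.text import CXRBertModel

checkpoint = "microsoft/BiomedVLP-CXR-BERT-specialized"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = CXRBertModel.from_pretrained(checkpoint)

tokens = tokenizer(["There is a small left pleural effusion."],
                   padding=True, return_tensors="pt")
with torch.no_grad():
    embeddings = model.get_projected_text_embeddings(
        input_ids=tokens.input_ids,
        attention_mask=tokens.attention_mask,
    )
print(embeddings.shape)  # torch.Size([1, projection_size]), 128 by default
```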
- class health_multimodal.text.TextInferenceEngine(tokenizer, text_model)[source]
Text inference class that implements the functionality required for sentence embedding extraction, sentence similarity, and MLM prediction tasks.
- Parameters
tokenizer (BertTokenizer) – A BertTokenizer object.
text_model (BertForMaskedLM) – Text model, either the default HuggingFace class or a class derived from it.
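A construction sketch under the same checkpoint assumption as above; AutoTokenizer is expected to return a BertTokenizer-compatible tokenizer for such a checkpoint:
```python
from transformers import AutoTokenizer
from health_multimodal.text import CXRBertModel, TextInferenceEngine

checkpoint = "microsoft/BiomedVLP-CXR-BERT-specialized"  # assumed checkpoint name
engine = TextInferenceEngine(
    tokenizer=AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True),
    text_model=CXRBertModel.from_pretrained(checkpoint),
)
```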
- get_embeddings_from_prompt(prompts, normalize=True, verbose=True)[source]
Generate L2-normalised embeddings for a list of input text prompts.
- Parameters
prompts (Union[str, List[str]]) – Input text prompt(s), either a string or a list of strings.
normalize (bool) – If True, L2-normalise the embeddings.
verbose (bool) – If set to True, tokenized words are displayed in the console.
- Return type
Tensor
- Returns
Tensor of shape (batch_size, embedding_size).
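A usage sketch, assuming `engine` is a TextInferenceEngine built as sketched above or returned by get_bert_inference() (see below); the report sentences are illustrative:
```python
# One L2-normalised embedding per input prompt.
embeddings = engine.get_embeddings_from_prompt(
    ["No acute cardiopulmonary process.", "Right basilar consolidation."],
    verbose=False,
)
print(embeddings.shape)  # torch.Size([2, embedding_size])
```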
- get_pairwise_similarities(prompt_set_1, prompt_set_2)[source]
Compute pairwise cosine similarities between the embeddings of the given prompts.
- Return type
Tensor
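A usage sketch, reusing `engine` from above; that the two prompt sets are paired element by element is an assumption based on the method name:
```python
# Element i of the result compares prompt_set_1[i] with prompt_set_2[i].
similarities = engine.get_pairwise_similarities(
    ["There is a left pleural effusion.", "Heart size is normal."],
    ["Left pleural fluid is present.", "Cardiac silhouette within normal limits."],
)
print(similarities)  # 1-D tensor of cosine similarities
```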
- predict_masked_tokens(prompts)[source]
Predict masked tokens for a single input text prompt or a list of prompts.
Requires the model to have been trained with an MLM prediction head.
- Parameters
prompts (Union[str, List[str]]) – Input text prompt(s), either a string or a list of strings.
- Return type
List[List[str]]
- Returns
Predicted token candidates (top-1) at each masked position.
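A usage sketch, reusing `engine` from above; the mask token is the one defined by the tokenizer ([MASK] for BERT-style models):
```python
predictions = engine.predict_masked_tokens("Bilateral [MASK] effusions are present.")
print(predictions)  # e.g. [['pleural']] - one top-1 candidate per masked position
```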
- tokenize_input_prompts(prompts, verbose=True)[source]
Tokenizes the input sentence(s) and adds special tokens as defined by the tokenizer.
- Parameters
prompts (Union[str, List[str]]) – Either a string containing a single sentence, or a list of strings each containing a single sentence. Note that this method will not correctly tokenize multiple sentences if they are input as a single string.
verbose (bool) – If set to True, will log the sentence after tokenization.
- Return type
Any
- Returns
A 2D tensor containing the tokenized sentences.
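A usage sketch, reusing `engine` from above:
```python
# One row of token ids per input sentence.
tokenized = engine.tokenize_input_prompts(
    ["Small right apical pneumothorax."], verbose=False
)
```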
- health_multimodal.text.get_bert_inference(bert_encoder_type=BertEncoderType.BIOVIL_T_BERT)[source]
Create a TextInferenceEngine for a text encoder model.
- Parameters
bert_encoder_type (BertEncoderType) – The type of text encoder model to use, CXR_BERT or BIOVIL_T_BERT.
The model is downloaded from the Hugging Face Hub. The engine can be used to get embeddings from text prompts or masked token predictions.
- Return type
TextInferenceEngine
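Putting it together, a minimal end-to-end sketch; the import path for BertEncoderType is an assumption based on the signature above:
```python
from health_multimodal.text import get_bert_inference
from health_multimodal.text.utils import BertEncoderType  # assumed import path

# Downloads the BioViL-T text encoder from the Hugging Face Hub.
engine = get_bert_inference(BertEncoderType.BIOVIL_T_BERT)
embeddings = engine.get_embeddings_from_prompt("No focal consolidation.", verbose=False)
```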