health_multimodal.text
Text-related tools
- class health_multimodal.text.CXRBertConfig(projection_size=128, **kwargs)[source]
Config class for CXR-BERT model.
- Parameters
projection_size (int) – Dimensionality of the joint latent space.
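A minimal usage sketch (assuming, as is typical for HuggingFace config subclasses, that the remaining keyword arguments are forwarded to the parent BertConfig):
```python
from health_multimodal.text import CXRBertConfig

# projection_size sets the dimensionality of the joint image-text latent
# space; 128 is the default from the signature above.
config = CXRBertConfig(projection_size=128)
```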
- class health_multimodal.text.CXRBertModel(config)[source]
Implements the CXR-BERT model outlined in the manuscript: Boecking et al. “Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing”, 2022 https://link.springer.com/chapter/10.1007/978-3-031-20059-5_1
Extends the HuggingFace BertForMaskedLM model by adding a separate projection head. The projected “[CLS]” token embedding is used to align the latent vectors of the image and text modalities.
- config_class
alias of health_multimodal.text.model.configuration_cxrbert.CXRBertConfig
- forward(input_ids, attention_mask, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, output_cls_projected_embedding=None, return_dict=None, **kwargs)[source]
The [BertForMaskedLM] forward method; it overrides the __call__ special method.
Tip: Although the recipe for the forward pass needs to be defined within this function, one should call the [Module] instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
- Parameters
input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.
Indices can be obtained using [BertTokenizer]. See [PreTrainedTokenizer.encode] and [PreTrainedTokenizer.__call__] for details.
attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) –
Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
1 for tokens that are not masked,
0 for tokens that are masked.
token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) –
Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
0 corresponds to a sentence A token,
1 corresponds to a sentence B token.
position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) –
Indices of the position of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional) –
Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:
1 indicates the head is not masked,
0 indicates the head is masked.
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a [~file_utils.ModelOutput] instead of a plain tuple.
labels (torch.LongTensor of shape (batch_size, sequence_length), optional) – Labels for computing the masked language modeling loss. Indices should be in [-100, 0, …, config.vocab_size] (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, …, config.vocab_size].
- Returns
A [transformers.modeling_outputs.MaskedLMOutput] or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration ([BertConfig]) and inputs.
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Masked language modeling (MLM) loss.
logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) – Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).
Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
[transformers.modeling_outputs.MaskedLMOutput] or tuple(torch.FloatTensor)
Example:
```python
>>> from transformers import BertTokenizer, BertForMaskedLM
>>> import torch

>>> tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
>>> model = BertForMaskedLM.from_pretrained("bert-base-uncased")

>>> inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
>>> labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]

>>> outputs = model(**inputs, labels=labels)
>>> loss = outputs.loss
>>> logits = outputs.logits
```
- get_projected_text_embeddings(input_ids, attention_mask, normalize_embeddings=True)[source]
Returns the l2-normalised projected [CLS] token embeddings for the given input token ids and attention mask. The joint latent space is trained using a contrastive objective between the image and text modalities.
- Parameters
input_ids (Tensor) – (batch_size, sequence_length)
attention_mask (Tensor) – (batch_size, sequence_length)
normalize_embeddings (bool) – Whether to l2-normalise the embeddings.
- Return type
Tensor
- Returns
(batch_size, projection_size)
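A usage sketch for this method. The checkpoint name is an assumption; substitute the CXR-BERT checkpoint you intend to use:
```python
import torch
from transformers import AutoTokenizer
from health_multimodal.text import CXRBertModel

checkpoint = "microsoft/BiomedVLP-CXR-BERT-specialized"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = CXRBertModel.from_pretrained(checkpoint)

tokens = tokenizer(["There is a small left pleural effusion."],
                   padding=True, return_tensors="pt")
with torch.no_grad():
    embeddings = model.get_projected_text_embeddings(
        input_ids=tokens.input_ids,
        attention_mask=tokens.attention_mask,
    )
print(embeddings.shape)  # torch.Size([1, projection_size]), 128 by default
```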
- class health_multimodal.text.TextInferenceEngine(tokenizer, text_model)[source]
Text inference class that implements the functionality required for sentence embedding extraction, sentence similarity, and MLM prediction tasks.
- Parameters
tokenizer (BertTokenizer) – A BertTokenizer object.
text_model (BertForMaskedLM) – Text model, either the default HuggingFace class or a class derived from it.
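A construction sketch under the same checkpoint assumption as above; AutoTokenizer is expected to return a BertTokenizer-compatible tokenizer for such a checkpoint:
```python
from transformers import AutoTokenizer
from health_multimodal.text import CXRBertModel, TextInferenceEngine

checkpoint = "microsoft/BiomedVLP-CXR-BERT-specialized"  # assumed checkpoint name
engine = TextInferenceEngine(
    tokenizer=AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True),
    text_model=CXRBertModel.from_pretrained(checkpoint),
)
```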
- get_embeddings_from_prompt(prompts, normalize=True, verbose=True)[source]
Generate L2-normalised embeddings for a list of input text prompts.
- Parameters
prompts (Union[str, List[str]]) – Input text prompt(s), either a string or a list of strings.
normalize (bool) – If True, L2-normalise the embeddings.
verbose (bool) – If set to True, tokenized words are displayed in the console.
- Return type
Tensor
- Returns
Tensor of shape (batch_size, embedding_size).
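A usage sketch, assuming `engine` is a TextInferenceEngine built as sketched above or returned by get_bert_inference() (see below); the report sentences are illustrative:
```python
# One L2-normalised embedding per input prompt.
embeddings = engine.get_embeddings_from_prompt(
    ["No acute cardiopulmonary process.", "Right basilar consolidation."],
    verbose=False,
)
print(embeddings.shape)  # torch.Size([2, embedding_size])
```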
- get_pairwise_similarities(prompt_set_1, prompt_set_2)[source]
Compute pairwise cosine similarities between the embeddings of the given prompts.
- Return type
Tensor
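A usage sketch, reusing `engine` from above; that the two prompt sets are paired element by element is an assumption based on the method name:
```python
# Element i of the result compares prompt_set_1[i] with prompt_set_2[i].
similarities = engine.get_pairwise_similarities(
    ["There is a left pleural effusion.", "Heart size is normal."],
    ["Left pleural fluid is present.", "Cardiac silhouette within normal limits."],
)
print(similarities)  # 1-D tensor of cosine similarities
```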
- predict_masked_tokens(prompts)[source]
Predict masked tokens for a single input text prompt or a list of prompts.
Requires the model to have been trained with an MLM prediction head.
- Parameters
prompts (Union[str, List[str]]) – Input text prompt(s), either a string or a list of strings.
- Return type
List[List[str]]
- Returns
Predicted token candidates (top-1) at each masked position.
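A usage sketch, reusing `engine` from above; the mask token is the one defined by the tokenizer ([MASK] for BERT-style models):
```python
predictions = engine.predict_masked_tokens("Bilateral [MASK] effusions are present.")
print(predictions)  # e.g. [['pleural']] - one top-1 candidate per masked position
```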
- tokenize_input_prompts(prompts, verbose=True)[source]
Tokenizes the input sentence(s) and adds special tokens as defined by the tokenizer.
- Parameters
prompts (Union[str, List[str]]) – Either a string containing a single sentence, or a list of strings each containing a single sentence. Note that this method will not correctly tokenize multiple sentences if they are input as a single string.
verbose (bool) – If set to True, will log the sentence after tokenization.
- Return type
Any
- Returns
A 2D tensor containing the tokenized sentences.
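A usage sketch, reusing `engine` from above:
```python
# One row of token ids per input sentence.
tokenized = engine.tokenize_input_prompts(
    ["Small right apical pneumothorax."], verbose=False
)
```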
- health_multimodal.text.get_bert_inference(bert_encoder_type=BertEncoderType.BIOVIL_T_BERT)[source]
Create a TextInferenceEngine for a text encoder model.
- Parameters
bert_encoder_type (BertEncoderType) – The type of text encoder model to use, CXR_BERT or BIOVIL_T_BERT.
The model is downloaded from the Hugging Face Hub. The engine can be used to get embeddings from text prompts or masked token predictions.
- Return type
TextInferenceEngine
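Putting it together, a minimal end-to-end sketch; the import path for BertEncoderType is an assumption based on the signature above:
```python
from health_multimodal.text import get_bert_inference
from health_multimodal.text.utils import BertEncoderType  # assumed import path

# Downloads the BioViL-T text encoder from the Hugging Face Hub.
engine = get_bert_inference(BertEncoderType.BIOVIL_T_BERT)
embeddings = engine.get_embeddings_from_prompt("No focal consolidation.", verbose=False)
```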