health_multimodal.text.model.modelling_cxrbert

Classes

BertProjectionHead(config)

Projection head to be used with BERT CLS token.

CXRBertModel(config)

Implements the CXR-BERT model outlined in the manuscript: Boecking et al. “Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing”, 2022 https://link.springer.com/chapter/10.1007/978-3-031-20059-5_1.

CXRBertOutput

class health_multimodal.text.model.modelling_cxrbert.BertProjectionHead(config)[source]

Projection head to be used with BERT CLS token.

This is similar to BertPredictionHeadTransform in HuggingFace.

Parameters

config (CXRBertConfig) – Configuration for BERT.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(hidden_states)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

Return type

Tensor
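
For orientation, here is a minimal sketch of what such a projection head might look like, following the BertPredictionHeadTransform pattern mentioned above. The layer names, sizes, and choice of GELU activation are assumptions for illustration, not confirmed by this page:

```python
import torch
import torch.nn as nn


class ProjectionHeadSketch(nn.Module):
    """Hypothetical BERT-style projection head (names and layout assumed)."""

    def __init__(self, hidden_size: int, projection_size: int) -> None:
        super().__init__()
        self.dense_to_hidden = nn.Linear(hidden_size, projection_size)
        self.activation = nn.GELU()  # assumed activation function
        self.layer_norm = nn.LayerNorm(projection_size)
        self.dense_to_output = nn.Linear(projection_size, projection_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Map the [CLS] hidden state into the joint image-text latent space.
        hidden_states = self.dense_to_hidden(hidden_states)
        hidden_states = self.activation(hidden_states)
        hidden_states = self.layer_norm(hidden_states)
        return self.dense_to_output(hidden_states)
```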

class health_multimodal.text.model.modelling_cxrbert.CXRBertModel(config)[source]

Implements the CXR-BERT model outlined in the manuscript: Boecking et al. “Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing”, 2022 https://link.springer.com/chapter/10.1007/978-3-031-20059-5_1

Extends the HuggingFace BertForMaskedLM model by adding a separate projection head. The projected [CLS] token is used to align the latent vectors of the image and text modalities.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

config_class

alias of health_multimodal.text.model.configuration_cxrbert.CXRBertConfig
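
As a usage sketch, the model loads like any other HuggingFace checkpoint. The checkpoint name below is the CXR-BERT checkpoint released alongside the paper, but treat it as an assumption of this example rather than part of this API:

```python
>>> from health_multimodal.text.model.modelling_cxrbert import CXRBertModel
>>> # Assumed checkpoint name for the released CXR-BERT weights.
>>> model = CXRBertModel.from_pretrained("microsoft/BiomedVLP-CXR-BERT-specialized")
```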

forward(input_ids, attention_mask, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, output_cls_projected_embedding=None, return_dict=None, **kwargs)[source]

The BertForMaskedLM forward method overrides the __call__ special method.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Parameters
  • input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –

    Indices of input sequence tokens in the vocabulary.

    Indices can be obtained using BertTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details.

  • attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) –

    Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

    • 1 for tokens that are not masked,

    • 0 for tokens that are masked.

  • token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) –

    Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:

    • 0 corresponds to a sentence A token,

    • 1 corresponds to a sentence B token.

  • position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) –

    Indices of positions of each input sequence tokens in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].

  • head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional) –

    Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:

    • 1 indicates the head is not masked,

    • 0 indicates the head is masked.

  • inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

  • output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.

  • output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.

  • return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.

  • labels (torch.LongTensor of shape (batch_size, sequence_length), optional) – Labels for computing the masked language modeling loss. Indices should be in [-100, 0, …, config.vocab_size] (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, …, config.vocab_size].

Returns

A transformers.modeling_outputs.MaskedLMOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

  • loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Masked language modeling (MLM) loss.

  • logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) – Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).

  • hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

    Hidden-states of the model at the output of each layer plus the initial embedding outputs.

  • attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

    Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type

transformers.modeling_outputs.MaskedLMOutput or tuple(torch.FloatTensor)

Example:

```python
>>> from transformers import BertTokenizer, BertForMaskedLM
>>> import torch

>>> tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
>>> model = BertForMaskedLM.from_pretrained("bert-base-uncased")
>>> inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
>>> labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]
>>> outputs = model(**inputs, labels=labels)
>>> loss = outputs.loss
>>> logits = outputs.logits
```
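
Beyond the inherited BertForMaskedLM behaviour shown above, this override also accepts output_cls_projected_embedding. A hedged sketch of its use, assuming model is the CXRBertModel loaded earlier, tokenizer is a compatible BERT tokenizer, and the output field is named cls_projected_embedding (see CXRBertOutput below):

```python
>>> inputs = tokenizer("No acute cardiopulmonary process.", return_tensors="pt")
>>> outputs = model(
...     input_ids=inputs.input_ids,
...     attention_mask=inputs.attention_mask,
...     output_cls_projected_embedding=True,
...     return_dict=True,
... )
>>> outputs.cls_projected_embedding  # assumed field; shape (batch_size, projection_size)
```
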
get_projected_text_embeddings(input_ids, attention_mask, normalize_embeddings=True)[source]

Returns the projected [CLS] token embeddings (L2-normalised by default) for the given input token IDs and attention mask. The joint latent space is trained using a contrastive objective between the image and text modalities.

Parameters
  • input_ids (Tensor) – (batch_size, sequence_length)

  • attention_mask (Tensor) – (batch_size, sequence_length)

  • normalize_embeddings (bool) – Whether to l2-normalise the embeddings.

Return type

Tensor

Returns

(batch_size, projection_size)
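
A short usage sketch, again assuming the model and a compatible tokenizer from the examples above. Because the embeddings are L2-normalised by default, a dot product between two of them is their cosine similarity:

```python
>>> texts = ["No acute cardiopulmonary process.", "Right lower lobe pneumonia."]
>>> inputs = tokenizer(texts, return_tensors="pt", padding=True)
>>> embeddings = model.get_projected_text_embeddings(
...     input_ids=inputs.input_ids, attention_mask=inputs.attention_mask
... )
>>> similarity = embeddings[0] @ embeddings[1]  # cosine similarity of the two reports
```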

class health_multimodal.text.model.modelling_cxrbert.CXRBertOutput[source]
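
This entry carries no description. Judging from the output_cls_projected_embedding flag on CXRBertModel.forward, a plausible shape for this output type is a MaskedLMOutput extended with the projected [CLS] embedding; the sketch below is an assumption, including the field name:

```python
from dataclasses import dataclass
from typing import Optional

import torch
from transformers.modeling_outputs import MaskedLMOutput


@dataclass
class CXRBertOutput(MaskedLMOutput):
    # Assumed field: the projected [CLS] embedding, presumably populated
    # when output_cls_projected_embedding=True is passed to forward().
    cls_projected_embedding: Optional[torch.Tensor] = None
```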