speechbrain.utils.semdist module

Provides a metrics class for the SemDist metric.

Authors: Sylvain de Langen, 2024

Summary

Classes:

`BaseSemDistStats`	Base class to implement the SemDist metric, for the variants that estimate a single cosine similarity per pair of target and predicted texts.
`SemDistStats`	Computes the SemDist metric with a provided HuggingFace Transformers text encoder.

Reference

class speechbrain.utils.semdist.BaseSemDistStats(embed_function: Callable[[List[str]], Tensor], scale: float = 1000.0, batch_size: int = 64)[source]

Bases: MetricStats

Base class to implement the SemDist metric, for the variants that estimate a single cosine similarity per pair of target and predicted texts. The SemDist metrics are described in the paper "Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric".

Parameters:

embed_function (Callable[[List[str]], torch.Tensor]) – Given a list of sentences, returns one summarized embedding per sentence, using the method of your choice (e.g. mean pooling).
scale (float, optional) – The α scale applied to the cosine similarity result for clarity. The default of 1000 matches the authors’ recommendation.
batch_size (int, optional) – How many pairs of utterances are processed at once. Larger batches are faster but may cause out-of-memory (OOM) errors.
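The role of embed_function and scale can be illustrated with a minimal pure-Python sketch. It assumes that the per-pair SemDist is the cosine distance (1 minus the cosine similarity) between the target and predicted embeddings, multiplied by the scale. The names toy_embed, cosine_similarity and semdist_pairs are hypothetical and not part of SpeechBrain, and plain lists stand in for the torch.Tensor a real embed_function would return.

```python
import math
from typing import Callable, List

Vector = List[float]


def toy_embed(sentences: List[str]) -> List[Vector]:
    """Hypothetical stand-in for embed_function: word counts over a tiny
    fixed vocabulary. A real embed_function would return a torch.Tensor."""
    vocab = ["the", "cat", "sat", "dog", "ran"]
    return [
        [float(sentence.lower().split().count(word)) for word in vocab]
        for sentence in sentences
    ]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def semdist_pairs(
    targets: List[str],
    predictions: List[str],
    embed_function: Callable[[List[str]], List[Vector]] = toy_embed,
    scale: float = 1000.0,
) -> List[float]:
    """Assumed formula: (1 - cosine similarity) * scale, one value per pair."""
    ref = embed_function(targets)
    hyp = embed_function(predictions)
    return [(1.0 - cosine_similarity(r, h)) * scale for r, h in zip(ref, hyp)]


scores = semdist_pairs(
    targets=["the cat sat", "the dog ran"],
    predictions=["the cat sat", "the cat ran"],
)
# The first pair is identical, so its score is approximately 0.
# The second pair shares two of three words, so its score is about 333.3.
```

Under this assumption, a lower value means the prediction is semantically closer to the target, and the scale only changes the readability of the number.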

clear()[source]

Clears the collected metrics.

append(ids, predict, target)[source]

Appends IDs, predictions and targets to the internal lists.

Parameters:

ids (list) – the string IDs for the samples
predict (list) – the model’s predictions in tokenizable format
target (list) – the ground truths in tokenizable format

summarize(field=None)[source]

Summarizes the SemDist metric scores. This is where the embedding function is actually called and the SemDist values are computed.

Full set of fields:

- semdist: The average SemDist over all utterances, multiplied by the scale optionally specified at initialization.

Additionally, a scores list is populated by this function for each pair of sentences. Each entry of that list is a dict, with the fields:

- key: the ID of the utterance.
- semdist: The SemDist of the utterance, multiplied by the scale.

Parameters:

field (str, optional) – The field to return, if you are only interested in one of them. If specified, a single float is returned; otherwise, a dict is returned.

Returns:

dict from str to float, if field is None – A dictionary of the fields documented above.
float, if field is not None – The single field selected by field.
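The append / summarize / clear lifecycle and the shape of the returned values can be sketched with a toy stand-in class. ToySemDistStats and toy_embed are hypothetical and only mirror the interface documented above; they are not SpeechBrain's implementation. The sketch assumes the per-utterance value is (1 minus cosine similarity) times the scale, and uses plain lists instead of tensors.

```python
import math
from typing import Callable, Dict, List, Optional, Union

Vector = List[float]


def toy_embed(sentences: List[str]) -> List[Vector]:
    """Hypothetical 2-D embedding: counts of the words 'yes' and 'no'."""
    return [
        [float(s.split().count("yes")), float(s.split().count("no"))]
        for s in sentences
    ]


class ToySemDistStats:
    """Toy stand-in mirroring the documented clear/append/summarize interface."""

    def __init__(
        self,
        embed_function: Callable[[List[str]], List[Vector]],
        scale: float = 1000.0,
        batch_size: int = 64,
    ):
        self.embed_function = embed_function
        self.scale = scale
        self.batch_size = batch_size
        self.clear()

    def clear(self) -> None:
        self.ids, self.predictions, self.targets = [], [], []
        self.scores: List[Dict[str, object]] = []
        self.summary: Dict[str, float] = {}

    def append(self, ids: List[str], predict: List[str], target: List[str]) -> None:
        self.ids.extend(ids)
        self.predictions.extend(predict)
        self.targets.extend(target)

    def summarize(self, field: Optional[str] = None) -> Union[Dict[str, float], float]:
        self.scores = []  # recompute from scratch on every call
        total = 0.0
        # Embed batch_size pairs at a time to bound memory use.
        for start in range(0, len(self.ids), self.batch_size):
            stop = start + self.batch_size
            ref = self.embed_function(self.targets[start:stop])
            hyp = self.embed_function(self.predictions[start:stop])
            for utt_id, r, h in zip(self.ids[start:stop], ref, hyp):
                dot = sum(x * y for x, y in zip(r, h))
                norm = math.hypot(*r) * math.hypot(*h)
                cos = dot / norm if norm else 0.0
                value = (1.0 - cos) * self.scale  # assumed formula
                self.scores.append({"key": utt_id, "semdist": value})
                total += value
        self.summary = {"semdist": total / len(self.ids)}
        return self.summary if field is None else self.summary[field]


stats = ToySemDistStats(toy_embed, batch_size=1)
stats.append(["utt1", "utt2"], predict=["yes", "no"], target=["yes", "yes"])
summary = stats.summarize()            # dict of all fields
single = stats.summarize("semdist")    # single float
```

In this toy run the first pair matches exactly and the second pair has orthogonal embeddings, so the per-utterance values are 0 and 1000 and their average is 500.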

class speechbrain.utils.semdist.SemDistStats(lm, method: Literal['meanpool', 'cls'] = 'meanpool', *args, **kwargs)[source]

Bases: BaseSemDistStats

Computes the SemDist metric with a provided HuggingFace Transformers text encoder.

Parameters:

lm (speechbrain.lobes.models.huggingface_transformers.TextEncoder) – HF Transformers tokenizer and text encoder wrapper to use as the language model (LM).
method ("meanpool" or "cls") –
- "meanpool" (default): Computes the mean of all contextualized embeddings, excluding padding tokens.
- "cls": Uses only the first contextualized embedding. With BERT-like tokenizers this is the [CLS] token, which is typically intended to capture classification information.
*args – Extra positional arguments passed to the base constructor.
**kwargs – Extra keyword arguments passed to the base constructor.
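The difference between the two pooling methods can be shown on a single toy sequence. This is a pure-Python sketch, not the SemDistStats implementation: mean_pool and cls_pool are hypothetical helpers, the hidden states are made-up numbers, and the mask follows the usual convention of 1 for real tokens and 0 for padding.

```python
from typing import List

Vector = List[float]


def mean_pool(hidden: List[Vector], mask: List[int]) -> Vector:
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, m in zip(hidden, mask) if m == 1]
    dim = len(hidden[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def cls_pool(hidden: List[Vector]) -> Vector:
    """Return only the first token vector (the [CLS] position in BERT-like models)."""
    return list(hidden[0])


# One sequence of four positions with 2-D hidden states; the last is padding.
hidden = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

meanpool_embedding = mean_pool(hidden, mask)  # [3.0, 2.0]
cls_embedding = cls_pool(hidden)              # [1.0, 0.0]
```

Including the padding vector in the mean would shift the result toward [9.0, 9.0], which is why the mask matters for the "meanpool" method.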