Natural Robustness metric
Natural Robustness measures a language model's ability to provide consistent responses across natural variations of an input, such as paraphrased or lightly misspelled versions of a prompt, helping evaluate model stability and reliability.
Metric details
The Natural Robustness metric evaluates a model's ability to maintain response consistency when presented with natural variations in input text. These variations include benign changes such as adjustments to sentence structure, linguistic style, minor noise, paraphrasing, and other non-malicious modifications. This is distinct from the Adversarial Robustness metric, which focuses on deliberately crafted, harmful perturbations.
Scope
To compute the Natural Robustness score, a representative subset of the test dataset is sampled—by default, five records are selected. Users can configure the sample size during the metric setup.
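As a rough illustration of this sampling step, the following Python sketch selects a configurable number of records from a test dataset. The function name, parameter names, and default seed are assumptions for illustration, not part of a published API.

```python
import random

def sample_test_records(test_dataset, sample_size=5, seed=42):
    """Select a representative subset of test records for robustness evaluation.

    `sample_size` mirrors the configurable sample size described above
    (default of five records); the names here are illustrative only.
    """
    rng = random.Random(seed)
    # Sample without replacement; use the full dataset if it is smaller
    # than the requested sample size.
    k = min(sample_size, len(test_dataset))
    return rng.sample(test_dataset, k)
```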
For each selected input, multiple types of perturbations are generated:
- Naive perturbations: Small, surface-level changes such as removing or altering punctuation, modifying letter casing, or introducing natural typos.
- Paraphrasing: Rewriting the input using semantically equivalent expressions.
The model's responses to these perturbed inputs are compared against the original response generated for the unmodified input. The metric quantifies the proportion of perturbations for which the model's response remains consistent with the original.
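The sketch below illustrates, under stated assumptions, how the two perturbation families could be produced in Python. The helper names, the specific surface edits (punctuation removal, casing changes, an adjacent-character swap), and the paraphrasing placeholder are illustrative; they are not the metric's actual implementation.

```python
import random
import string

def naive_perturbations(text, seed=0):
    """Surface-level variants of an input: punctuation removal, casing changes,
    and a single adjacent-character swap to mimic a natural typo.
    The specific edits chosen here are illustrative assumptions.
    """
    rng = random.Random(seed)
    variants = [
        # Remove punctuation.
        text.translate(str.maketrans("", "", string.punctuation)),
        # Modify letter casing.
        text.lower(),
        text.upper(),
    ]
    # Introduce a typo by swapping two adjacent characters.
    if len(text) > 2:
        i = rng.randrange(len(text) - 1)
        variants.append(text[:i] + text[i + 1] + text[i] + text[i + 2:])
    return variants

def paraphrase(text):
    """Placeholder for paraphrasing, for example a call to a separate LLM or a
    dedicated paraphrase model; the mechanism is an assumption, only the intent
    (a semantically equivalent rewrite) comes from the metric description.
    """
    raise NotImplementedError
```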
Scores and Values
- Natural Robustness Score = (Number of perturbations yielding consistent responses) / (Total number of perturbations)
- Score range: 0.0 to 1.0
- A score close to 1.0 indicates that the model is robust and produces stable responses across naturally varied inputs.
- A score closer to 0.0 suggests that the model is sensitive to minor, non-malicious changes, potentially impacting user trust and experience.
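A minimal sketch of the score computation, assuming the formula above and a caller-supplied notion of response consistency (for example, exact match or an embedding- or LLM-based semantic comparison, which this description does not specify):

```python
def natural_robustness_score(original_response, perturbed_responses, is_consistent):
    """Fraction of perturbed inputs whose responses remain consistent with the
    original response. `is_consistent` is a caller-supplied predicate; how
    consistency is judged in practice is an assumption, not specified here.
    """
    if not perturbed_responses:
        raise ValueError("At least one perturbed response is required.")
    consistent = sum(
        1 for response in perturbed_responses
        if is_consistent(original_response, response)
    )
    return consistent / len(perturbed_responses)

# Example: 4 of 5 perturbed inputs yield a consistent answer -> score = 0.8
score = natural_robustness_score(
    "Paris",
    ["Paris", "Paris", "Paris, France", "Paris", "I am not sure"],
    is_consistent=lambda a, b: a.lower() in b.lower(),
)
```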
Natural Robustness helps assess how well a model generalizes across common real-world input variability, providing insight into its reliability and stability.
Parent topic: Evaluation metrics