Keyword inclusion evaluation metric
Last updated: May 08, 2025
The keyword inclusion metric measures the similarity of nouns and pronouns between the foundation model output and the reference or ground truth.
Metric details
Keyword inclusion is a metric that measures how well your model generates text that includes the key phrases or keywords from the reference or ground truth. The metric is available only when you use the Python SDK to calculate evaluation metrics. For more information, see Computing Adversarial robustness and Prompt Leakage Risk using IBM watsonx.governance.
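The SDK computes the score internally, but the following minimal sketch illustrates the idea: extract the nouns and pronouns from both texts and report the proportion of reference keywords that appear in the generated output. NLTK is an assumed tool choice here, and the function names are illustrative; this is not the watsonx.governance SDK API.

```python
import nltk

# Tokenizer and tagger models; both older and newer NLTK resource names
# are requested so the sketch works across NLTK versions.
for resource in ("punkt", "punkt_tab", "averaged_perceptron_tagger",
                 "averaged_perceptron_tagger_eng"):
    nltk.download(resource, quiet=True)

def extract_keywords(text: str) -> set[str]:
    """Return the lowercased nouns and pronouns found in the text."""
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    # NN* tags cover nouns; PRP* tags cover pronouns.
    return {token.lower() for token, tag in tagged
            if tag.startswith(("NN", "PRP"))}

def keyword_inclusion(output: str, reference: str) -> float:
    """Proportion of reference keywords that appear in the model output."""
    ref_keywords = extract_keywords(reference)
    if not ref_keywords:
        return 0.0
    return len(ref_keywords & extract_keywords(output)) / len(ref_keywords)

reference = "The report summarizes quarterly revenue for the sales team."
output = "It covers the revenue that the sales team earned this quarter."
print(keyword_inclusion(output, reference))  # e.g. 0.75 with this tagger
```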
Scope
The keyword inclusion metric evaluates generative AI assets only.
- Types of AI assets: Prompt templates
- Generative AI tasks:
  - Text summarization
  - Question answering
  - Retrieval augmented generation (RAG)
- Supported languages: English
Scores and values
The keyword inclusion metric score indicates the proportion of keywords from the reference or ground truth that also appear in the generated output.
- Range of values: 0.0-1.0
- Best possible score: 1.0
- Ratios:
  - At 0: The output contains none of the keywords from the reference or ground truth.
  - Over 0: A higher score indicates that more of the reference keywords appear in the output.
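For example, under the proportion interpretation above, if the reference contains four keywords and three of them appear in the generated output, the score is 3/4 = 0.75. A minimal sketch of that arithmetic, using hypothetical keyword sets:

```python
# Hypothetical keyword sets that illustrate the score as a proportion.
reference_keywords = {"report", "revenue", "sales", "team"}
output_keywords = {"revenue", "sales", "team", "quarter"}

score = len(reference_keywords & output_keywords) / len(reference_keywords)
print(score)  # 0.75: three of the four reference keywords appear in the output
```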
Parent topic: Evaluation metrics