The ROUGE metric measures how closely generated summaries or translations match reference outputs.
Metric details
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a generative AI quality evaluation metric that measures how closely the text that a generative AI asset produces matches reference outputs, based on overlapping units such as n-grams and word sequences.
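At its core, ROUGE-N is the fraction of reference n-grams that also appear in the generated output. The following minimal sketch computes ROUGE-1 (unigram) recall by hand; the example strings are hypothetical and the code is illustrative, not the platform's implementation:

from collections import Counter

def rouge1_recall(reference: str, candidate: str) -> float:
    """ROUGE-1 recall: fraction of reference unigrams found in the candidate."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clip each token's overlap at its count in the candidate.
    overlap = sum(min(n, cand_counts[tok]) for tok, n in ref_counts.items())
    return overlap / sum(ref_counts.values())

print(rouge1_recall("the cat sat on the mat", "the cat is on the mat"))  # 0.8333...

Here five of the six reference unigrams ("the" twice, "cat", "on", "mat") appear in the candidate, so recall is 5/6.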
Scope
The ROUGE metric evaluates generative AI assets only.
Types of AI assets: Prompt templates
Generative AI tasks:
Text summarization
Content generation
Question answering
Entity extraction
Retrieval augmented generation (RAG)
Supported languages: English
Scores and values
The ROUGE metric score indicates the similarity between the generated output and the reference outputs. Higher scores indicate greater similarity between the generated output and the reference, as shown in the example that follows the value range.
Range of values: 0.0-1.0
Best possible score: 1.0
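For illustration, the open-source rouge-score Python package (used here for demonstration; not necessarily the scorer that the platform uses internally) returns precision, recall, and F-measure values in this same 0.0-1.0 range:

from rouge_score import rouge_scorer  # pip install rouge-score

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"])
# score(target, prediction): the first argument is the reference output
result = scorer.score("the cat sat on the mat", "the cat is on the mat")
print(result["rouge1"].fmeasure)  # between 0.0 (no overlap) and 1.0 (identical)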
Settings
Thresholds:
Lower limit: 0.8
Upper limit: 1.0
Parameters:
Use stemmer: If true, uses the Porter stemmer to strip word suffixes so that inflected forms of a word count as matches. Defaults to false.
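The sketch below shows the stemmer's effect using the open-source rouge-score package, whose use_stemmer flag behaves the same way and also defaults to false; the example strings are hypothetical:

from rouge_score import rouge_scorer

reference = "the cats are running"
candidate = "the cat runs"

for use_stemmer in (False, True):
    scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=use_stemmer)
    recall = scorer.score(reference, candidate)["rouge1"].recall
    print(f"use_stemmer={use_stemmer}: ROUGE-1 recall = {recall:.2f}")
    # False -> 0.25 (only "the" matches)
    # True  -> 0.75 ("cats"/"cat" and "running"/"runs" share a stem)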