The ROUGE metric measures how closely generated summaries or translations match reference outputs.
Metric details
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a generative AI quality evaluation metric that measures how closely the text that a generative AI asset produces matches reference outputs, based on overlapping units such as n-grams and word sequences.
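At its core, ROUGE-N is the fraction of reference n-grams that also appear in the generated output. The following minimal sketch computes ROUGE-1 (unigram) recall by hand; the example strings are hypothetical and the code is illustrative, not the platform's implementation:

from collections import Counter

def rouge1_recall(reference: str, candidate: str) -> float:
    """ROUGE-1 recall: fraction of reference unigrams found in the candidate."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clip each token's overlap at its count in the candidate.
    overlap = sum(min(n, cand_counts[tok]) for tok, n in ref_counts.items())
    return overlap / sum(ref_counts.values())

print(rouge1_recall("the cat sat on the mat", "the cat is on the mat"))  # 0.8333...

Here five of the six reference unigrams ("the" twice, "cat", "on", "mat") appear in the candidate, so recall is 5/6.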
Scope
The ROUGE metric evaluates generative AI assets only.
Types of AI assets: Prompt templates
Generative AI tasks:
Text summarization
Content generation
Question answering
Entity extraction
Retrieval augmented generation (RAG)
Supported languages: English
Scores and values
The ROUGE metric score indicates the similarity between the generated output and the reference outputs. Higher scores indicate greater similarity between the generated output and the reference, as shown in the example that follows the value range.
Range of values: 0.0-1.0
Best possible score: 1.0
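For illustration, the open-source rouge-score Python package (used here for demonstration; not necessarily the scorer that the platform uses internally) returns precision, recall, and F-measure values in this same 0.0-1.0 range:

from rouge_score import rouge_scorer  # pip install rouge-score

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"])
# score(target, prediction): the first argument is the reference output
result = scorer.score("the cat sat on the mat", "the cat is on the mat")
print(result["rouge1"].fmeasure)  # between 0.0 (no overlap) and 1.0 (identical)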
Settings
Thresholds:
Lower limit: 0.8
Upper limit: 1.0
Parameters:
Use stemmer: If true, uses the Porter stemmer to strip word suffixes so that inflected forms of a word count as matches. Defaults to false.
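The sketch below shows the stemmer's effect using the open-source rouge-score package, whose use_stemmer flag behaves the same way and also defaults to false; the example strings are hypothetical:

from rouge_score import rouge_scorer

reference = "the cats are running"
candidate = "the cat runs"

for use_stemmer in (False, True):
    scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=use_stemmer)
    recall = scorer.score(reference, candidate)["rouge1"].recall
    print(f"use_stemmer={use_stemmer}: ROUGE-1 recall = {recall:.2f}")
    # False -> 0.25 (only "the" matches)
    # True  -> 0.75 ("cats"/"cat" and "running"/"runs" share a stem)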