
Macro F1 score evaluation metric

Last updated: May 08, 2025

The macro F1 score metric measures the average of F1 scores that are calculated separately for each class.

Metric details

Macro F1 score is a multi-label and multi-class metric for generative AI quality evaluations. It measures how well generative AI assets perform entity extraction tasks that produce multi-label or multi-class predictions.
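
In formula form, for N classes the macro F1 score is the unweighted mean of the per-class F1 scores:

  Macro F1 = (F1_1 + F1_2 + ... + F1_N) / N

where F1_i = 2 × (precision_i × recall_i) / (precision_i + recall_i) is the F1 score of class i.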

Scope

The macro F1 metric evaluates generative AI assets only.

  • Types of AI assets: Prompt templates
  • Generative AI tasks: Entity extraction
  • Supported languages: English

Scores and values

The macro F1 metric score is the unweighted mean of the F1 scores that are calculated separately for each class. Higher scores indicate that predictions are more accurate; the sketch after the following list illustrates the computation.

  • Range of values: 0.0-1.0
  • Best possible score: 1.0
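
A minimal sketch of the computation, assuming scikit-learn is available; the entity labels and predictions below are hypothetical illustration data, not output from the product.

  # Minimal sketch: macro F1 for a multi-class entity extraction task.
  # The gold and predicted entity types below are hypothetical examples.
  from sklearn.metrics import f1_score

  y_true = ["PERSON", "ORG", "LOCATION", "PERSON", "ORG", "LOCATION"]
  y_pred = ["PERSON", "ORG", "PERSON",   "PERSON", "ORG", "LOCATION"]

  # average="macro" computes an F1 score per class and then takes their
  # unweighted mean, so every class counts equally regardless of frequency.
  macro_f1 = f1_score(y_true, y_pred, average="macro")
  print(f"Macro F1: {macro_f1:.2f}")  # ~0.82 for this toy data; 1.0 is best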

Settings

  • Thresholds:
    • Lower limit: 0.8
    • Upper limit: 1.0

Parent topic: Evaluation metrics