
Macro F1 score evaluation metric

Last updated: May 08, 2025

The macro F1 score metric measures the average of F1 scores that are calculated separately for each class.

Metric details

Macro F1 score is a multi-label and multi-class metric for generative AI quality evaluations. It measures how well generative AI assets perform entity extraction tasks that produce multi-label or multi-class predictions.
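
In formula form, for N classes the macro F1 score is the unweighted mean of the per-class F1 scores:

  Macro F1 = (F1_1 + F1_2 + ... + F1_N) / N

where F1_i = 2 × (precision_i × recall_i) / (precision_i + recall_i) is the F1 score of class i.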

Scope

The macro F1 metric evaluates generative AI assets only.

  • Types of AI assets: Prompt templates
  • Generative AI tasks: Entity extraction
  • Supported languages: English

Scores and values

The macro F1 metric score is the unweighted mean of the F1 scores that are calculated separately for each class. Higher scores indicate that predictions are more accurate; the sketch after the following list illustrates the computation.

  • Range of values: 0.0-1.0
  • Best possible score: 1.0
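
A minimal sketch of the computation, assuming scikit-learn is available; the entity labels and predictions below are hypothetical illustration data, not output from the product.

  # Minimal sketch: macro F1 for a multi-class entity extraction task.
  # The gold and predicted entity types below are hypothetical examples.
  from sklearn.metrics import f1_score

  y_true = ["PERSON", "ORG", "LOCATION", "PERSON", "ORG", "LOCATION"]
  y_pred = ["PERSON", "ORG", "PERSON",   "PERSON", "ORG", "LOCATION"]

  # average="macro" computes an F1 score per class and then takes their
  # unweighted mean, so every class counts equally regardless of frequency.
  macro_f1 = f1_score(y_true, y_pred, average="macro")
  print(f"Macro F1: {macro_f1:.2f}")  # ~0.82 for this toy data; 1.0 is best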

Settings

  • Thresholds:
    • Lower limit: 0.8
    • Upper limit: 1.0

Parent topic: Evaluation metrics