Macro F1 score evaluation metric
Last updated: May 08, 2025
The macro F1 score metric measures the average of F1 scores that are calculated separately for each class.
Metric details
The macro F1 score is a multi-label and multi-class metric for generative AI quality evaluations. It measures how well generative AI assets perform entity extraction tasks that produce multi-label or multi-class predictions.
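The documentation does not prescribe a particular implementation, but as a rough sketch, macro F1 is obtained by computing the F1 score for each class independently and then taking the unweighted mean. The example below uses scikit-learn's f1_score with average="macro"; the entity types, labels, and predictions are hypothetical and are not produced by the product.

```python
# Hedged sketch: macro F1 for a multi-class entity extraction prediction.
# The entity types, labels, and predictions below are hypothetical examples.
from sklearn.metrics import f1_score

# Gold entity types and predicted entity types for six extracted spans.
y_true = ["PERSON", "ORG", "PERSON", "DATE", "ORG", "DATE"]
y_pred = ["PERSON", "ORG", "ORG", "DATE", "ORG", "PERSON"]

# average="macro" computes F1 per class (PERSON, ORG, DATE) and returns the
# unweighted mean, so every class counts equally regardless of its frequency.
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(f"Macro F1: {macro_f1:.2f}")
```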
Scope
The macro F1 metric evaluates generative AI assets only.
- Types of AI assets: Prompt templates
- Generative AI tasks: Entity extraction
- Supported languages: English
Scores and values
The macro F1 score is the average of the F1 scores that are calculated separately for each class. Higher scores indicate more accurate predictions.
- Range of values: 0.0-1.0
- Best possible score: 1.0
Settings
- Thresholds:
- Lower limit: 0.8
- Upper limit: 1.0
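As a hedged illustration of how per-class scores are aggregated and compared against the lower threshold, the sketch below averages hypothetical per-class F1 values; the numbers are illustrative only.

```python
# Hypothetical per-class F1 scores for three entity types.
per_class_f1 = {"PERSON": 0.90, "ORG": 0.75, "DATE": 0.60}

# Macro F1 is the unweighted mean of the per-class F1 scores.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)  # 0.75

# Compare against the default lower threshold of 0.8.
lower_limit = 0.8
print(f"Macro F1: {macro_f1:.2f} (meets threshold: {macro_f1 >= lower_limit})")
```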
Parent topic: Evaluation metrics