Embedding drift evaluation metric
The embedding drift evaluation metric measures the percentage of records that are outliers when compared to the baseline data.
Metric details
Embedding drift is a drift v2 evaluation metric that can help measure changes in your data over time to ensure consistent outcomes for your model.
Scope
The embedding drift metric evaluates generative AI assets only.
- Types of AI assets: Prompt templates
- Generative AI tasks:
  - Text summarization
  - Text classification
  - Content generation
  - Entity extraction
  - Question answering
  - Retrieval Augmented Generation (RAG)
- Supported languages: English
Evaluation process
You must provide embeddings with your baseline data when you enable the embedding drift metric to generate evaluation results. Watsonx.governance builds an auto-encoder that processes the embeddings in your baseline data and computes predefined cosine and euclidean distance metrics for the model output. It then uses the distribution of those distance metrics to set a threshold for outlier detection and detects drift when a distance metric value is higher than the threshold. For RAG tasks, the embeddings for all of the context columns in a model record are combined into a single vector to determine drift.
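The following Python sketch illustrates this threshold-based flow under simplifying assumptions: it uses the distance of each record's embedding to the baseline centroid in place of the auto-encoder reference, and a 99th-percentile cutoff in place of the threshold that watsonx.governance derives from the distance distribution internally. The function and array names are hypothetical.

```python
# Minimal sketch of threshold-based outlier detection over embedding distances.
# Assumption: the baseline centroid stands in for the auto-encoder reference,
# and the 99th-percentile cutoff stands in for the internally derived threshold.
import numpy as np

def embedding_drift(baseline: np.ndarray, production: np.ndarray,
                    percentile: float = 99.0) -> float:
    """Return the percentage of production records flagged as outliers."""
    centroid = baseline.mean(axis=0)

    def cosine_distance(vectors: np.ndarray) -> np.ndarray:
        dots = vectors @ centroid
        norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(centroid)
        return 1.0 - dots / norms

    # The threshold comes from the distribution of baseline distances.
    threshold = np.percentile(cosine_distance(baseline), percentile)

    # A production record counts as an outlier when its distance exceeds the threshold.
    outliers = cosine_distance(production) > threshold
    return 100.0 * outliers.mean()

# Example: 1,000 baseline records compared with 200 shifted production records.
rng = np.random.default_rng(0)
baseline_embeddings = rng.normal(size=(1000, 384))
production_embeddings = rng.normal(loc=0.5, size=(200, 384))
print(f"Outlier records: {embedding_drift(baseline_embeddings, production_embeddings):.1f}%")
```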
Do the math
The following formulas are used to calculate the embedding drift metric:
Cosine distance measures the difference between embedding vectors:

\[ d_{\text{cosine}}(u, v) = 1 - \frac{u \cdot v}{\|u\|\,\|v\|} \]

The cosine distance ranges from 0, which indicates identical vectors, through 1, which indicates no correlation between the vectors, to 2, which indicates opposite vectors.
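As an illustration of those range endpoints, the short Python check below computes the cosine distance for identical, orthogonal, and opposite unit vectors; the vectors and function name are arbitrary examples, not part of the watsonx.governance API.

```python
# Illustrative check of the cosine distance range on example unit vectors.
import numpy as np

def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

u = np.array([1.0, 0.0])
print(cosine_distance(u, np.array([1.0, 0.0])))   # 0.0 -> identical vectors
print(cosine_distance(u, np.array([0.0, 1.0])))   # 1.0 -> orthogonal, no correlation
print(cosine_distance(u, np.array([-1.0, 0.0])))  # 2.0 -> opposite vectors
```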
Euclidean distance is the shortest distance between embedding vectors in euclidean space:

\[ d_{\text{euclidean}}(u, v) = \|u - v\| = \sqrt{\sum_{i}(u_i - v_i)^2} \]

The euclidean distance ranges from 0, which indicates identical vectors, to infinity. However, for vectors that are normalized to have unit length, the maximum euclidean distance is 2.
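A similar check for euclidean distance, again with arbitrary example vectors rather than watsonx.governance internals, shows the lower bound of 0 and the maximum of 2 for unit-length vectors.

```python
# Illustrative check of the euclidean distance range for unit-length vectors.
import numpy as np

def euclidean_distance(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.linalg.norm(u - v))

u = np.array([3.0, 4.0]) / 5.0   # normalized to unit length
print(euclidean_distance(u, u))   # 0.0 -> identical vectors
print(euclidean_distance(u, -u))  # 2.0 -> opposite unit vectors (the maximum)
```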
Parent topic: Evaluation metrics