Context Recall (ctx_recall)
Metric Description
Context recall measures the quality of the retrieved context compared to a golden answer (ideal reference). It evaluates whether all content in the golden answer can be attributed to the context chunks. In other words: does the context contain enough information to support everything the golden answer claims?
The score runs from 0 (poor coverage) to 100 (full coverage). The implementation uses an LLM-as-a-Judge to parse the texts and decide attribution.
How to interpret the score
- Closer to 100: the context covers most or all of the information present in the golden answer; little or nothing in the golden answer is unsupported by the chunks.
- Closer to 0: the context misses significant information that the golden answer expects; much of the content cannot be attributed to any chunk.
Context recall measures how well the retrieved context supports a reference answer—not how faithful an LLM’s output is to that context. For faithfulness of model outputs, use metrics like faithfulness or factfulness.
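The attribution idea behind the score can be sketched without an LLM: split the golden answer into sentences, mark each one as supported by the context or not, and score the supported fraction. The supported/unsupported flags below stand in for the judge's per-sentence verdicts; the real metric derives them with an LLM, not string matching.

```python
def context_recall_score(golden_sentences: list[str], supported_flags: list[bool]) -> float:
    """Fraction of golden-answer sentences attributable to the context, scaled to 0-100.

    `supported_flags[i]` is a stand-in for the LLM judge's verdict on sentence i.
    """
    if not golden_sentences:
        return 0.0
    attributed = sum(supported_flags)
    return 100.0 * attributed / len(golden_sentences)


# 4 of 5 golden-answer sentences attributable to the chunks:
score = context_recall_score(
    ["s1", "s2", "s3", "s4", "s5"],
    [True, True, True, True, False],
)
print(score)  # 80.0, exactly at the default threshold
```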
API usage
Prerequisites
Set the AEGIS_API_KEY and AEGIS_API_BASE_URL environment variables before making requests. Once they are configured, the next step is to create a JSON payload for the custom-runs request. For a field-by-field description of the payload (top-level keys, evaluations, and each row in data), see Custom run request body.
Shortname: ctx_recall
Default threshold: 80
Inputs (each object in data)
- context (str or list[str], required): The retrieved context chunks (documents or passages) to evaluate.
- golden_answer (str, required): The ideal reference answer that the context is expected to support.
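Both context shapes are accepted, so a row can carry a list of chunks or a single passage string (the values below are purely illustrative):

```python
# Two illustrative rows: `context` as a list of chunks, and as a single string.
row_with_chunks = {
    "context": ["Chunk one about the topic.", "Chunk two with more detail."],
    "golden_answer": "A reference answer the chunks should support.",
}
row_with_passage = {
    "context": "A single retrieved passage as one string.",
    "golden_answer": "A reference answer the passage should support.",
}

data = [row_with_chunks, row_with_passage]
print(len(data))  # 2
```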
Evaluation metadata
On successful evaluation, the metric returns eval_metadata with attribution gaps between the golden answer and the chunks:
- unattributed_sentences_reasons (list[str]): Reasons from the judge for golden-answer sentences that could not be attributed to the provided context, explaining what is missing or unmatched.
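Assuming a result object that carries eval_metadata as described above (the surrounding response shape here is hypothetical, only the eval_metadata key is documented), the attribution gaps can be inspected like this:

```python
# Hypothetical result shape; only `eval_metadata` and its
# `unattributed_sentences_reasons` key are documented above.
result = {
    "score": 60,
    "eval_metadata": {
        "unattributed_sentences_reasons": [
            "No chunk mentions the 16GB unified memory claim.",
            "The battery figure in the golden answer is not stated in any chunk.",
        ]
    },
}

reasons = result.get("eval_metadata", {}).get("unattributed_sentences_reasons", [])
for reason in reasons:
    print("missing support:", reason)
```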
Example
import json
import os

import requests
from dotenv import load_dotenv

load_dotenv(override=True)

_API_KEY = os.getenv("AEGIS_API_KEY")
_BASE_URL = os.getenv("AEGIS_API_BASE_URL")
_CUSTOM_RUN_URL = f"{_BASE_URL}/runs/custom"


def post_custom_run(payload: dict) -> requests.Response:
    """POST a JSON payload to the Aegis custom-runs endpoint; return the raw response."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {_API_KEY}",
    }
    return requests.post(
        _CUSTOM_RUN_URL,
        headers=headers,
        data=json.dumps(payload),
    )


if __name__ == "__main__":
    context = [
        "Battery life is up to 17 hours of mixed use.",
        "It includes a 10-core CPU and 16GB unified memory.",
        "Storage starts at 512GB SSD.",
        "There are two Thunderbolt 4 ports.",
        "The notebook ships with a 3024x1964 high-resolution display.",
    ]
    golden_answer = (
        "The notebook ships with a 3024x1964 high-resolution display. "
        "It lasts up to 17 hours on a charge. "
        "It includes a 10-core CPU. "
        "Storage starts at 512GB SSD. "
        "There are two Thunderbolt 4 ports."
    )
    data = [
        {"context": context, "golden_answer": golden_answer},
    ]
    payload = {
        "threshold": 80,
        "model_slug": "o4-mini",
        "is_blocking": True,
        "data_collection_id": None,
        "evaluations": [
            {
                "metrics": ["ctx_recall"],
                "threshold": 80,
                "model_slug": "o4-mini",
                "data": data,
            }
        ],
    }

    response = post_custom_run(payload)
    response.raise_for_status()
    print(json.dumps(response.json(), indent=2))