Context Sufficiency (ctx_suff)

Metric Description

Context sufficiency measures whether the retrieved context is adequate for correctly addressing the user's query. It is crucial for RAG (Retrieval-Augmented Generation) systems, where the Large Language Model (LLM) is provided with relevant information from a knowledge base before generating its answer. If the context is incomplete or poorly matched, the model risks producing an incorrect or hallucinated response that is not grounded in the provided data.

The score runs from 0 (insufficient context) to 100 (fully sufficient context). The implementation uses an LLM-as-a-Judge approach.

How to interpret the score

  • Closer to 100: the context contains enough relevant information to fully answer all aspects of the user's question.
  • Closer to 0: the context provides little or no information needed to answer the user's question.

Important

Context sufficiency evaluates whether the retrieved context is enough to answer the query—it does not evaluate the model's actual output. Pair this with context faithfulness and answer relevancy to assess the full RAG pipeline.
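
As a sketch of what that pairing could look like in the custom-runs payload described under API usage below: a single evaluations entry can request several metrics over the same data rows. Note that ctx_faith and ans_rel are placeholder shortnames invented here for illustration; substitute the real shortnames from each metric's own page.

evaluations = [
    {
        # "ctx_suff" is this metric's shortname; "ctx_faith" and "ans_rel"
        # are placeholder shortnames for context faithfulness and answer
        # relevancy. Substitute the real ones from their metric pages.
        "metrics": ["ctx_suff", "ctx_faith", "ans_rel"],
        "threshold": 80,
        "model_slug": "o4-mini",
        "data": data,
    }
]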

API usage

Prerequisites

Set the AEGIS_API_KEY and AEGIS_API_BASE_URL environment variables (the example below loads them with python-dotenv). Once they are configured, the next step is to create a JSON payload for the custom-runs request. For a field-by-field description of the payload (top-level keys, evaluations, and each row in data), see Custom run request body.
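
For local development, one common way to provide them is a .env file next to the script, which python-dotenv picks up automatically (the values here are placeholders):

AEGIS_API_KEY=<your API key>
AEGIS_API_BASE_URL=<your Aegis API base URL>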

Shortname: ctx_suff

Default threshold: 80

Inputs (each object in data)

  • input (str, required): The user's question or instruction (what the context should help answer).
  • context (str or list, required): The retrieved context or source documents (e.g., chunks from a knowledge base); a sample row is shown below.
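
For example, a single data row pairing a question with two retrieved chunks (the same row used in the full example further below):

{
    "input": "What features does the new laptop have and are they rare?",
    "context": [
        "The new laptop features a 14-inch OLED display, 32GB RAM, and an M3 chip.",
        "OLED screens are uncommon in mid-range laptops."
    ]
}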

Evaluation metadata

On successful evaluation, the metric returns eval_metadata summarizing coverage gaps:

  • missing_details (list[str]): The specific pieces of information missing from the context that would be needed to fully answer the input.
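
As an illustration only (this is not captured output), a run over the laptop example below might flag a gap like this if the context covered the features but not how rare all of them are:

{
    "missing_details": [
        "Whether 32GB RAM and the M3 chip are rare in comparable laptops"
    ]
}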

Example

import json
import os

import requests
from dotenv import load_dotenv

load_dotenv(override=True)

_API_KEY = os.getenv("AEGIS_API_KEY")
_BASE_URL = os.getenv("AEGIS_API_BASE_URL")
_CUSTOM_RUN_URL = f"{_BASE_URL}/runs/custom"


def post_custom_run(payload: dict) -> requests.Response:
    """POST a JSON payload to the Aegis custom-runs endpoint; returns the raw response."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {_API_KEY}",
    }
    return requests.post(
        _CUSTOM_RUN_URL,
        headers=headers,
        data=json.dumps(payload),
        timeout=60,
    )


if __name__ == "__main__":
    # Retrieved chunks that should cover the question below.
    context = [
        "The new laptop features a 14-inch OLED display, 32GB RAM, and an M3 chip.",
        "OLED screens are uncommon in mid-range laptops.",
    ]
    data = [
        {
            "input": "What features does the new laptop have and are they rare?",
            "context": context,
        },
    ]

    payload = {
        "threshold": 80,
        "model_slug": "o4-mini",
        "is_blocking": True,
        "data_collection_id": None,
        "evaluations": [
            {
                # Run ctx_suff at its default threshold of 80.
                "metrics": ["ctx_suff"],
                "threshold": 80,
                "model_slug": "o4-mini",
                "data": data,
            }
        ],
    }

    response = post_custom_run(payload)
    response.raise_for_status()
    print(json.dumps(response.json(), indent=2))
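
The response schema is not reproduced on this page, so the helper below is only a defensive sketch: it walks whatever JSON the run returns and collects every missing_details list it finds, wherever the metric nests it.

def collect_missing_details(node):
    """Yield every "missing_details" value found anywhere in the response JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "missing_details":
                yield value
            else:
                yield from collect_missing_details(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_missing_details(item)


for gaps in collect_missing_details(response.json()):
    print("Coverage gaps:", gaps)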