Faithfulness (faith)
Metric Description
Faithfulness measures whether the model’s output is faithful to the information provided in the input. It evaluates whether the output is consistent with the input, avoids hallucinations or distortions, and preserves the text’s most significant concepts and conclusions. This metric is best suited for summarization tasks, where the output should reflect the source material without inventing or contradicting information.
The score runs from 0 (low faithfulness) to 100 (high faithfulness). The implementation blends (1) LLM-as-a-Judge assessments with (2) heuristic metrics.
How to interpret the score
- Closer to 100: the output is well-aligned with the input.
- Closer to 0: the output contains unsupported or contradicted claims, invents entities not present in the input, or diverges significantly from the source meaning.
Faithfulness focuses on whether the output stays true to the input, not whether facts are universally correct. Pair this with factfulness and answer relevancy for broader evaluation.
API usage
Prerequisites
Set the AEGIS_API_KEY and AEGIS_API_BASE_URL environment variables used by the example below. Once the environment variables are configured, the next step is to create a JSON payload for the custom-runs request. For a field-by-field description of the payload (top-level keys, evaluations, and each row in data), see Custom run request body.
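The example in this section reads credentials from a local .env file via python-dotenv. With the variable names used below, the file might look like this (both values are placeholders, not real credentials or endpoints):

AEGIS_API_KEY=your-api-key
AEGIS_API_BASE_URL=https://your-aegis-host/api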
Shortname: faith
Default threshold: 80
Inputs (each object in data)
- input (str, required): The source text (e.g., the document to be summarized).
- output (str, required): The model-generated output to evaluate (e.g., the summary).
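For instance (illustrative values only), a single row in data is just a pair of strings:

{
    "input": "Full text of the source document ...",
    "output": "Candidate summary to be checked against the source ..."
}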
metric_args
- max_n_claims (int, optional): Maximum number of claims to extract from the output for verification. An optimal number of claims is computed automatically; this parameter only caps it. Default = 50.
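Within the payload, metric_args is attached to the metric entry inside evaluations, mirroring the full example below:

{"metric": "faith", "metric_args": {"max_n_claims": 5}}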
Evaluation metadata
On successful evaluation, the metric returns eval_metadata describing the unsupported claims:
- unsupported_details (list[str]): Reasons why output claims are not supported by the input (contradicted, absent from the source, or otherwise ungrounded).
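Purely as an illustration of the shape (the reasons below are invented, and any surrounding response fields are omitted), eval_metadata might contain:

{
    "unsupported_details": [
        "The output states the population figure refers to the metropolitan area, but the input only gives the figure for the administrative limits.",
        "The output's claim about annual tourist numbers does not appear anywhere in the input."
    ]
}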
Example
import json
import os

import requests
from dotenv import load_dotenv

# Load AEGIS_API_KEY and AEGIS_API_BASE_URL from a local .env file.
load_dotenv(override=True)

_API_KEY = os.getenv("AEGIS_API_KEY")
_BASE_URL = os.getenv("AEGIS_API_BASE_URL")
_CUSTOM_RUN_URL = f"{_BASE_URL}/runs/custom"


def post_custom_run(payload: dict) -> requests.Response:
    """POST JSON payload to Aegis custom runs; returns the raw response."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {_API_KEY}",
    }
    return requests.post(
        _CUSTOM_RUN_URL,
        headers=headers,
        data=json.dumps(payload),
    )


if __name__ == "__main__":
    # Each row pairs the source text (input) with the output to be checked against it.
    data = [
        {
            "input": "Paris is the capital of France. It has a population of about 2.1 million within its administrative limits.",
            "output": "Paris is the capital of France with roughly 2.1 million inhabitants in the city proper.",
        },
    ]

    # Both thresholds use the metric's default of 80; max_n_claims caps claim extraction at 5.
    payload = {
        "threshold": 80,
        "model_slug": "o4-mini",
        "is_blocking": True,
        "data_collection_id": None,
        "evaluations": [
            {
                "metrics": [
                    {"metric": "faith", "metric_args": {"max_n_claims": 5}},
                ],
                "threshold": 80,
                "model_slug": "o4-mini",
                "data": data,
            }
        ],
    }

    response = post_custom_run(payload)
    response.raise_for_status()
    print(json.dumps(response.json(), indent=2))
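Where eval_metadata appears in the response body depends on the run response schema, which is not covered here. As a schema-agnostic sketch (the helper name is hypothetical, not part of the API), you can walk the parsed JSON and collect any unsupported_details lists it contains:

def find_unsupported_details(node) -> list[str]:
    """Recursively collect every unsupported_details list in a parsed JSON structure."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "unsupported_details" and isinstance(value, list):
                found.extend(value)
            else:
                found.extend(find_unsupported_details(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_unsupported_details(item))
    return found


# Appended to the example above, after response.raise_for_status():
for reason in find_unsupported_details(response.json()):
    print("-", reason)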