ROUGE Score (rouge)
Metric description
ROUGE measures n-gram and sequence overlap between the output and the golden answer, emphasizing recall-oriented overlap. Choose ROUGE-N by passing an integer n, or ROUGE-L by passing the string "L". Optional stemming and case-sensitivity flags adjust tokenization.
How to interpret the score
- Closer to 100: higher F1-style overlap in the selected ROUGE variant.
- Closer to 0: little overlap with the reference.
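As a rough sketch of how a ROUGE-N F1 score comes together (a toy implementation assuming simple whitespace tokenization; the service's tokenizer and scoring details may differ):

```python
from collections import Counter

def rouge_n_f1(candidate: str, reference: str, n: int = 2) -> float:
    """Toy ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    def ngrams(text: str, size: int) -> Counter:
        toks = text.lower().split()  # mirrors case_sensitive=False
        return Counter(tuple(toks[i:i + size]) for i in range(len(toks) - size + 1))

    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # clipped bigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Using the data row from the example request on this page:
# precision = 2/2, recall = 2/5, so F1 ≈ 0.571 (about 57 on a 0-100 scale).
print(rouge_n_f1("the cat sat", "the cat sat on the mat", n=2))
```

With the default threshold of 70, this row would score roughly 57 and fail the check.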
API usage
Prerequisites
After the environment variables are configured, the next step is to create a JSON payload for the custom-runs request. For a field-by-field description of the payload (top-level keys, evaluations, and each row in data), see Custom run request body.
Shortname: rouge
Default threshold: 70
Structural metrics run without an LLM (deterministic checks). Your run may still include model_slug where the API expects it; scoring does not depend on it for this category.
Inputs (each object in data)
- output (str, required): Candidate text.
- golden_answer (str, required): Reference text.
metric_args
- rouge_type (number | string, optional): ROUGE-N order as an integer, or "L" for ROUGE-L. Default: 2.
- use_stemmer (boolean, optional): Use stemming when tokenizing. Default: false.
- case_sensitive (boolean, optional): Compare with case sensitivity. Default: false.
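When rouge_type is "L", the score is based on the longest common subsequence of tokens rather than fixed-size n-grams. A minimal sketch of that variant (whitespace tokenization assumed; the service's implementation may differ):

```python
def rouge_l_f1(candidate: str, reference: str) -> float:
    """Toy ROUGE-L F1 via longest common subsequence of tokens."""
    a, b = candidate.lower().split(), reference.lower().split()
    # Classic dynamic-programming LCS table.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a):
        for j, tok_b in enumerate(b):
            if tok_a == tok_b:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(a), lcs / len(b)
    return 2 * precision * recall / (precision + recall)

print(rouge_l_f1("the cat sat", "the cat sat on the mat"))
```

Here the LCS is all three candidate tokens, giving precision 3/3 and recall 3/6, so F1 ≈ 0.667.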
Eval metadata
Structural metrics do not populate eval_metadata; the field is omitted or null on the result object.
Example
import json
import os

import requests
from dotenv import load_dotenv

load_dotenv(override=True)

_API_KEY = os.getenv("AEGIS_API_KEY")
_BASE_URL = os.getenv("AEGIS_API_BASE_URL")
_CUSTOM_RUN_URL = f"{_BASE_URL}/runs/custom"


def post_custom_run(payload: dict) -> requests.Response:
    """POST JSON payload to Aegis custom runs; returns the raw response."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {_API_KEY}",
    }
    return requests.post(
        _CUSTOM_RUN_URL,
        headers=headers,
        data=json.dumps(payload),
    )


if __name__ == "__main__":
    data = [
        {"output": "the cat sat", "golden_answer": "the cat sat on the mat"}
    ]
    payload = {
        "threshold": 70,
        "model_slug": "o4-mini",
        "is_blocking": True,
        "data_collection_id": None,
        "evaluations": [
            {
                "metrics": [
                    {
                        "metric": "rouge",
                        "metric_args": {
                            "rouge_type": 2,
                            "use_stemmer": False,
                            "case_sensitive": False,
                        },
                    },
                ],
                "threshold": 70,
                "model_slug": "o4-mini",
                "data": data,
            }
        ],
    }

    response = post_custom_run(payload)
    response.raise_for_status()
    print(json.dumps(response.json(), indent=2))