Is Valid Python (is_valid_python)
Contents
Metric description
Is valid Python checks whether the output is syntactically valid Python source code. It does not execute the code or verify runtime behavior.
How to interpret the score
- 100: the text parses as Python without syntax errors.
- 0: syntax errors or empty or invalid input for this check.
API usage
Prerequisites
After the environment variables are configured, the next step is to create a JSON payload for the custom-runs request. For a field-by-field description of the payload (top-level keys, evaluations, and each row in data), see Custom run request body.
Shortname: is_valid_python
Default threshold: 100
Structural metrics run without an LLM (deterministic checks). Your run may still include model_slug where the API expects it; scoring does not depend on it for this category.
Inputs (each object in data)
output(str, required): Python source to parse.
Eval metadata
Structural metrics do not populate eval_metadata; the field is omitted or ull on the result object.
Example
import json
import os
import requests
from dotenv import load_dotenv
load_dotenv(override=True)
_API_KEY = os.getenv("AEGIS_API_KEY")
_BASE_URL = os.getenv("AEGIS_API_BASE_URL")
_CUSTOM_RUN_URL = f"{_BASE_URL}/runs/custom"
def post_custom_run(payload: dict) -> requests.Response:
"""POST JSON payload to Aegis custom runs; returns the raw response."""
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {_API_KEY}",
}
return requests.post(
_CUSTOM_RUN_URL,
headers=headers,
data=json.dumps(payload),
)
if __name__ == "__main__":
data = [
{"output": "def hello():\n return 42\n"}
]
payload = {
"threshold": 100,
"model_slug": "o4-mini",
"is_blocking": True,
"data_collection_id": None,
"evaluations": [
{
"metrics": ["is_valid_python"],
"threshold": 100,
"model_slug": "o4-mini",
"data": data,
}
],
}
response = post_custom_run(payload)
response.raise_for_status()
print(json.dumps(response.json(), indent=2))