Is Valid Python (is_valid_python)

Metric description
API usage
Eval metadata

Metric description

Is valid Python checks whether the output is syntactically valid Python source code. It does not execute the code or verify runtime behavior.
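The difference between parsing and executing can be illustrated locally with Python's standard ast module. This is only a minimal sketch, assuming the check behaves like a plain parse; the helper name parses_as_python is hypothetical and not part of the API, and the service's actual implementation is not documented here.

```python
import ast


def parses_as_python(source: str) -> bool:
    """Return True if `source` parses into an AST. The code is never run.

    Hypothetical helper for illustration; not part of the documented API.
    """
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # SyntaxError covers malformed code; ValueError covers inputs such as
        # strings containing null bytes on some Python versions.
        return False
    return True


# Parses fine even though running it would raise ZeroDivisionError.
runtime_error_source = "x = 1 / 0\n"

# Missing closing parenthesis: rejected by the parser.
syntax_error_source = "print('hi'\n"

print(parses_as_python(runtime_error_source))
print(parses_as_python(syntax_error_source))
```

Under this assumption, code that would fail at runtime still counts as valid, because only the syntax is inspected.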

How to interpret the score

100: the text parses as Python without syntax errors.
0: the text contains syntax errors, or the input is empty or otherwise invalid for this check.
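The two score values above could be approximated locally as follows. This is a hedged sketch, not the service's implementation: the function name is_valid_python_score is hypothetical, and treating whitespace-only or non-string input as "empty or invalid" is an assumption.

```python
import ast


def is_valid_python_score(output) -> int:
    """Hypothetical local approximation: 100 if `output` parses, else 0."""
    # Assumption: non-string and blank inputs count as "empty or invalid".
    # This must be checked explicitly, because ast.parse("") on its own
    # succeeds and returns an empty module.
    if not isinstance(output, str) or not output.strip():
        return 0
    try:
        ast.parse(output)
    except (SyntaxError, ValueError):
        return 0
    return 100


print(is_valid_python_score("def hello():\n    return 42\n"))
print(is_valid_python_score("def f(:\n"))
print(is_valid_python_score(""))
```

The explicit empty-input check matters in this sketch because Python's own parser accepts an empty string, whereas the metric scores it 0.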

API usage

Prerequisites

After the environment variables are configured, the next step is to create a JSON payload for the custom-runs request. For a field-by-field description of the payload (top-level keys, evaluations, and each row in data), see Custom run request body.

Shortname: is_valid_python

Default threshold: 100

Structural metrics are deterministic checks that run without an LLM. A run may still include model_slug where the API expects it, but scoring for this category does not depend on it.

Inputs (each object in data)

output (str, required): Python source to parse.

Eval metadata

Structural metrics do not populate eval_metadata; the field is omitted or null on the result object.

Example

import json
import os

import requests
from dotenv import load_dotenv

load_dotenv(override=True)

_API_KEY = os.getenv("AEGIS_API_KEY")
_BASE_URL = os.getenv("AEGIS_API_BASE_URL")
_CUSTOM_RUN_URL = f"{_BASE_URL}/runs/custom"


def post_custom_run(payload: dict) -> requests.Response:
    """POST JSON payload to Aegis custom runs; returns the raw response."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {_API_KEY}",
    }
    return requests.post(
        _CUSTOM_RUN_URL,
        headers=headers,
        data=json.dumps(payload),
    )


if __name__ == "__main__":
    data = [
        {"output": "def hello():\n    return 42\n"}
    ]

    payload = {
        "threshold": 100,
        "model_slug": "o4-mini",
        "is_blocking": True,
        "data_collection_id": None,
        "evaluations": [
            {
                "metrics": ["is_valid_python"],
                "threshold": 100,
                "model_slug": "o4-mini",
                "data": data,
            }
        ],
    }

    response = post_custom_run(payload)
    response.raise_for_status()
    print(json.dumps(response.json(), indent=2))
