HTTP API Reference

All endpoints below are relative to your Evalyard backend base URL:

BASE_URL=https://api.demo.evalyard.com

There are two main types of authentication:

  • X-Api-Key – for direct access to the LLM Gateway.
  • X-User-Token – for user-scoped operations (jobs, usage, datasets, etc.).

1. User Datasets API (custom metrics & extra fields)

Datasets are user-owned time series stored as JSONL files and visualized in the dashboard. Each dataset can have extra metrics beyond the standard fields via adapters.

Quick start: extra metrics in two steps

  1. Define your dataset adapter with POST /user/data/save_adapter – tell Evalyard which fields hold the timestamp, device, and model, and which fields are your extra metrics (with human-readable labels).
  2. Append rows with POST /user/data/append_jsonl – send your rows using that dataset name. Evalyard will store them and, if the adapter is enabled, automatically record usage + extra metrics so they show up on dashboards.

Auth header:

X-User-Token: YOUR_USER_TOKEN

1.1 Endpoints

Method  Path                      Description
------  ------------------------  -------------------------------------------
POST    /user/data/append_jsonl   Append rows to a dataset (and record usage)
GET     /user/data/list           List datasets for the current user
GET     /user/data/get            Get raw dataset rows
GET     /user/data/metrics        Get processed metrics for a dataset
GET     /user/data/get_adapter    Get dataset adapter configuration
POST    /user/data/save_adapter   Save/update dataset adapter configuration

1.2 Dataset adapter configuration (POST /user/data/save_adapter)

Call this once per dataset to describe how to interpret your rows and which fields should appear as extra metrics.

Save/update adapter

POST /user/data/save_adapter
X-User-Token: <YOUR_USER_TOKEN>
Content-Type: application/json

Example body

{
  "dataset": "my_server_metrics",
  "enabled": true,
  "x_field": "ts",
  "device_field": "device",
  "model_field": "model",
  "prompt_tokens_field": "prompt_tokens",
  "completion_tokens_field": "completion_tokens",
  "latency_ms_field": "latency_ms",
  "ttft_ms_field": "ttft_ms",
  "extra_fields": [
    { "label": "GPU power (W)", "field": "gpu_power_w", "kind": "number" },
    { "label": "CPU usage (%)", "field": "cpu_usage_pct", "kind": "number" }
  ]
}

  • dataset – name of the dataset you will use in append_jsonl.
  • enabled – when true, new rows will also be converted into usage events.
  • x_field – which field to use for the X-axis / timestamp.
  • device_field, model_field – which fields identify device and model.
  • prompt_tokens_field, completion_tokens_field, latency_ms_field, ttft_ms_field – standard metrics fields.
  • extra_fields – list of extra metrics:
    • field – key from your rows (e.g. gpu_power_w),
    • label – human-readable name shown in the UI (e.g. GPU power (W)),
    • kind – type hint for the metric (e.g. number).

Once this adapter is saved, any rows you send via POST /user/data/append_jsonl with "dataset": "my_server_metrics" and matching fields will:

  • be stored in the dataset;
  • appear as usage metrics (including your labeled extra metrics) in Evalyard dashboards.
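The adapter setup above can be sketched in Python with the standard library. BASE_URL and the token are placeholders, and actually sending the request is left commented out so the sketch can be inspected offline:

```python
import json
import urllib.request

BASE_URL = "https://api.demo.evalyard.com"  # your Evalyard backend base URL
USER_TOKEN = "YOUR_USER_TOKEN"              # placeholder user token

def build_save_adapter_request(adapter: dict) -> urllib.request.Request:
    """Build the POST /user/data/save_adapter request.

    Sending is left to the caller (urllib.request.urlopen(req)) so this
    sketch works without a live backend.
    """
    return urllib.request.Request(
        f"{BASE_URL}/user/data/save_adapter",
        data=json.dumps(adapter).encode("utf-8"),
        headers={
            "X-User-Token": USER_TOKEN,
            "Content-Type": "application/json",
        },
        method="POST",
    )

adapter = {
    "dataset": "my_server_metrics",
    "enabled": True,
    "x_field": "ts",
    "device_field": "device",
    "model_field": "model",
    "extra_fields": [
        {"label": "GPU power (W)", "field": "gpu_power_w", "kind": "number"},
        {"label": "CPU usage (%)", "field": "cpu_usage_pct", "kind": "number"},
    ],
}
req = build_save_adapter_request(adapter)
# urllib.request.urlopen(req)  # uncomment to actually send
```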

1.3 Append rows to a dataset (POST /user/data/append_jsonl)

Use this to push your own rows (with extra metrics) into a named dataset.

Endpoint

POST /user/data/append_jsonl
X-User-Token: <YOUR_USER_TOKEN>
Content-Type: application/json

Body

{
  "dataset": "my_server_metrics",
  "rows": [
    {
      "ts": "2025-12-03T18:00:00Z",
      "device": "server-1",
      "model": "llama3-8b",
      "prompt_tokens": 50,
      "completion_tokens": 120,
      "latency_ms": 900,
      "ttft_ms": 220,
      "gpu_power_w": 45.5,
      "cpu_usage_pct": 60.3
    },
    {
      "ts": "2025-12-03T18:05:00Z",
      "device": "server-1",
      "model": "llama3-8b",
      "prompt_tokens": 70,
      "completion_tokens": 150,
      "latency_ms": 950,
      "ttft_ms": 240,
      "gpu_power_w": 48.2,
      "cpu_usage_pct": 63.1
    }
  ]
}

The backend will:

  • append each row to data/users/<user_id>/datasets/my_server_metrics.jsonl,
  • automatically add metadata fields like _meta_user_id, _meta_dataset, _meta_ts,
  • if a dataset adapter is configured and enabled, also derive usage metrics (including your extra metrics) from these rows.
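The append call can be sketched the same way in Python (a minimal sketch; the base URL and token are placeholders, and sending is left commented out):

```python
import json
import urllib.request

BASE_URL = "https://api.demo.evalyard.com"  # your Evalyard backend base URL
USER_TOKEN = "YOUR_USER_TOKEN"              # placeholder user token

def build_append_request(dataset: str, rows: list[dict]) -> urllib.request.Request:
    """Build the POST /user/data/append_jsonl request for a batch of rows."""
    payload = {"dataset": dataset, "rows": rows}
    return urllib.request.Request(
        f"{BASE_URL}/user/data/append_jsonl",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-User-Token": USER_TOKEN,
            "Content-Type": "application/json",
        },
        method="POST",
    )

rows = [
    {
        "ts": "2025-12-03T18:00:00Z",
        "device": "server-1",
        "model": "llama3-8b",
        "prompt_tokens": 50,
        "completion_tokens": 120,
        "latency_ms": 900,
        "ttft_ms": 220,
        "gpu_power_w": 45.5,
        "cpu_usage_pct": 60.3,
    },
]
req = build_append_request("my_server_metrics", rows)
# urllib.request.urlopen(req)  # uncomment to actually send
```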

2. Jobs API (queueing and reading runs via X-User-Token)

Jobs are queued executions of prompts on devices/models. You can:

  • enqueue jobs,
  • list them,
  • fetch results,
  • cancel or requeue.

Auth header:

X-User-Token: YOUR_USER_TOKEN

2.1 Endpoints

Method  Path                       Description
------  -------------------------  ------------------------------------
GET     /api/jobs                  List jobs for the current user
POST    /api/jobs                  Enqueue a new job
GET     /api/jobs/{job_id}         Get job status & metadata
GET     /api/jobs/{job_id}/result  Get job result & metrics
POST    /api/jobs/{job_id}/cancel  Cancel a running or queued job
POST    /jobs/{job_id}/requeue     Requeue a failed job (DLQ)
GET     /jobs/dlq                  List jobs in the dead-letter queue
DELETE  /jobs/{job_id}             Remove a job from the DLQ

DLQ endpoints are typically admin/operator focused.
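If you do operate the DLQ, a requeue call might look like the sketch below. This is an assumption-laden sketch: whether these paths are reachable with your token depends on how your deployment is configured, and the base URL and job id are placeholders.

```python
import urllib.request

BASE_URL = "https://api.demo.evalyard.com"  # your Evalyard backend base URL
USER_TOKEN = "YOUR_USER_TOKEN"              # placeholder user token

def build_requeue_request(job_id: str) -> urllib.request.Request:
    """Build the POST /jobs/{job_id}/requeue request for a DLQ'd job."""
    return urllib.request.Request(
        f"{BASE_URL}/jobs/{job_id}/requeue",
        data=b"",  # empty body; the job id lives in the path
        headers={"X-User-Token": USER_TOKEN},
        method="POST",
    )

req = build_requeue_request("JOB_ID")
# urllib.request.urlopen(req)  # uncomment to actually send
```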

2.2 Example: enqueue a job

curl -X POST "<BASE_URL>/api/jobs" \
  -H "Content-Type: application/json" \
  -H "X-User-Token: YOUR_USER_TOKEN" \
  -d '{
        "prompt": "Benchmark this prompt on a specific device.",
        "system": "You are a benchmark helper.",
        "model": "tinyllama-1.1b-chat",
        "options": {
          "num_predict": 256,
          "temperature": 0.7
        },
        "priority": 1,
        "queue_tag": "short",
        "device_serial": "0000asd",
        "adapter_id": "my_server_metrics",
        "callback_url": "https://example.com/hooks/job-finished"
      }'

2.3 Example: fetch job result

curl "<BASE_URL>/api/jobs/JOB_ID/result?format=json&full=1" \
  -H "X-User-Token: YOUR_USER_TOKEN"
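Putting 2.2 and 2.3 together, a typical client enqueues a job and then polls until a result is available. The sketch below assumes the job object exposes a status field with values like "queued"/"running" (check the schema your backend actually returns); fetch is injectable so the loop can be exercised without a live server:

```python
import json
import time
import urllib.request

BASE_URL = "https://api.demo.evalyard.com"  # your Evalyard backend base URL
USER_TOKEN = "YOUR_USER_TOKEN"              # placeholder user token

def http_get_json(path: str) -> dict:
    """GET a user-scoped endpoint and decode the JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"X-User-Token": USER_TOKEN},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_result(job_id: str, fetch=http_get_json,
                    poll_s: float = 2.0, timeout_s: float = 600.0) -> dict:
    """Poll GET /api/jobs/{job_id} until the job leaves the queue,
    then fetch GET /api/jobs/{job_id}/result.

    The "queued"/"running" status values are assumptions -- adjust them
    to match the status field your backend returns.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        job = fetch(f"/api/jobs/{job_id}")
        if job.get("status") not in ("queued", "running"):
            return fetch(f"/api/jobs/{job_id}/result?format=json&full=1")
        time.sleep(poll_s)
    raise TimeoutError(f"job {job_id} did not finish within {timeout_s}s")
```

Injecting fetch keeps the polling logic separate from transport, which also makes it easy to swap urllib for another HTTP client.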

3. Usage API (metrics, runs, CSV export & import)

Usage endpoints expose aggregated metrics and raw runs.

Auth header (user-scoped data):

X-User-Token: YOUR_USER_TOKEN

3.1 Read-only usage

Method  Path               Description
------  -----------------  ---------------------------------------------------
GET     /usage             Raw/aggregated usage data (time series & breakdown)
GET     /usage/summary     Summary stats for dashboards (KPIs)
GET     /usage/compare     Compare two time ranges / configurations
GET     /usage/export.csv  Export raw runs as CSV
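A minimal Python sketch for the read-only endpoints. No query parameters are shown, since the filters these endpoints accept are not specified here; the base URL and token are placeholders:

```python
import json
import urllib.request

BASE_URL = "https://api.demo.evalyard.com"  # your Evalyard backend base URL
USER_TOKEN = "YOUR_USER_TOKEN"              # placeholder user token

def build_usage_request(path: str = "/usage/summary") -> urllib.request.Request:
    """Build a GET request for one of the read-only usage endpoints."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"X-User-Token": USER_TOKEN},
    )

req = build_usage_request()
# with urllib.request.urlopen(req) as resp:
#     summary = json.load(resp)  # uncomment to actually fetch
```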