HTTP API Reference

All endpoints below are relative to your Evalyard backend base URL:

BASE_URL=https://api.demo.evalyard.com

There are two main types of authentication:

  • X-Api-Key – for direct access to the LLM Gateway.
  • X-User-Token – for user-scoped operations (jobs, usage, datasets, etc.).

1. User Datasets API (custom metrics & extra fields)

Datasets are user-owned time series stored as JSONL files and visualized in the dashboard. Each dataset can have extra metrics beyond the standard fields via adapters.

Quick start: extra metrics in two steps

  1. Define your dataset adapter with POST /user/data/save_adapter – tell Evalyard which fields hold the timestamp, device, and model, and which fields are your extra metrics (with human-readable labels).
  2. Append rows with POST /user/data/append_jsonl – send your rows using that dataset name. Evalyard will store them and, if the adapter is enabled, automatically record usage + extra metrics so they show up on dashboards.

Auth header:

X-User-Token: YOUR_USER_TOKEN

1.1 Endpoints

Method  Path                      Description
------  ------------------------  -------------------------------------------
POST    /user/data/append_jsonl   Append rows to a dataset (and record usage)
GET     /user/data/list           List datasets for the current user
GET     /user/data/get            Get raw dataset rows
GET     /user/data/metrics        Get processed metrics for a dataset
GET     /user/data/get_adapter    Get dataset adapter configuration
POST    /user/data/save_adapter   Save/update dataset adapter configuration

1.2 Dataset adapter configuration (POST /user/data/save_adapter)

Call this once per dataset to describe how to interpret your rows and which fields should appear as extra metrics.

Save/update adapter

POST /user/data/save_adapter
X-User-Token: <YOUR_USER_TOKEN>
Content-Type: application/json

Example body

{
  "dataset": "my_server_metrics",
  "enabled": true,
  "x_field": "ts",
  "device_field": "device",
  "model_field": "model",
  "prompt_tokens_field": "prompt_tokens",
  "completion_tokens_field": "completion_tokens",
  "latency_ms_field": "latency_ms",
  "ttft_ms_field": "ttft_ms",
  "extra_fields": [
    { "label": "GPU power (W)", "field": "gpu_power_w", "kind": "number" },
    { "label": "CPU usage (%)", "field": "cpu_usage_pct", "kind": "number" }
  ]
}

  • dataset – name of the dataset you will use in append_jsonl.
  • enabled – when true, new rows will also be converted into usage events.
  • x_field – which field to use for the X-axis / timestamp.
  • device_field, model_field – which fields identify device and model.
  • prompt_tokens_field, completion_tokens_field, latency_ms_field, ttft_ms_field – standard metrics fields.
  • extra_fields – list of extra metrics:
    • field – key from your rows (e.g. gpu_power_w),
    • label – human-readable name shown in the UI (e.g. GPU power (W)),
    • kind – type hint for the metric (e.g. number).

Once this adapter is saved, any rows you send via POST /user/data/append_jsonl with "dataset": "my_server_metrics" and matching fields will:

  • be stored in the dataset;
  • appear as usage metrics (including your labeled extra metrics) in Evalyard dashboards.
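The adapter setup above can be sketched in Python with the standard library. BASE_URL and the token are placeholders, and actually sending the request is left commented out so the sketch can be inspected offline:

```python
import json
import urllib.request

BASE_URL = "https://api.demo.evalyard.com"  # your Evalyard backend base URL
USER_TOKEN = "YOUR_USER_TOKEN"              # placeholder user token

def build_save_adapter_request(adapter: dict) -> urllib.request.Request:
    """Build the POST /user/data/save_adapter request.

    Sending is left to the caller (urllib.request.urlopen(req)) so this
    sketch works without a live backend.
    """
    return urllib.request.Request(
        f"{BASE_URL}/user/data/save_adapter",
        data=json.dumps(adapter).encode("utf-8"),
        headers={
            "X-User-Token": USER_TOKEN,
            "Content-Type": "application/json",
        },
        method="POST",
    )

adapter = {
    "dataset": "my_server_metrics",
    "enabled": True,
    "x_field": "ts",
    "device_field": "device",
    "model_field": "model",
    "extra_fields": [
        {"label": "GPU power (W)", "field": "gpu_power_w", "kind": "number"},
        {"label": "CPU usage (%)", "field": "cpu_usage_pct", "kind": "number"},
    ],
}
req = build_save_adapter_request(adapter)
# urllib.request.urlopen(req)  # uncomment to actually send
```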

1.3 Append rows to a dataset (POST /user/data/append_jsonl)

Use this to push your own rows (with extra metrics) into a named dataset.

Endpoint

POST /user/data/append_jsonl
X-User-Token: <YOUR_USER_TOKEN>
Content-Type: application/json

Body

{
  "dataset": "my_server_metrics",
  "rows": [
    {
      "ts": "2025-12-03T18:00:00Z",
      "device": "server-1",
      "model": "llama3-8b",
      "prompt_tokens": 50,
      "completion_tokens": 120,
      "latency_ms": 900,
      "ttft_ms": 220,
      "gpu_power_w": 45.5,
      "cpu_usage_pct": 60.3
    },
    {
      "ts": "2025-12-03T18:05:00Z",
      "device": "server-1",
      "model": "llama3-8b",
      "prompt_tokens": 70,
      "completion_tokens": 150,
      "latency_ms": 950,
      "ttft_ms": 240,
      "gpu_power_w": 48.2,
      "cpu_usage_pct": 63.1
    }
  ]
}

The backend will:

  • append each row to data/users/<user_id>/datasets/my_server_metrics.jsonl,
  • automatically add metadata fields like _meta_user_id, _meta_dataset, _meta_ts,
  • if a dataset adapter is configured and enabled, also derive usage metrics (including your extra metrics) from these rows.
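The append call can be sketched the same way in Python (a minimal sketch; the base URL and token are placeholders, and sending is left commented out):

```python
import json
import urllib.request

BASE_URL = "https://api.demo.evalyard.com"  # your Evalyard backend base URL
USER_TOKEN = "YOUR_USER_TOKEN"              # placeholder user token

def build_append_request(dataset: str, rows: list[dict]) -> urllib.request.Request:
    """Build the POST /user/data/append_jsonl request for a batch of rows."""
    payload = {"dataset": dataset, "rows": rows}
    return urllib.request.Request(
        f"{BASE_URL}/user/data/append_jsonl",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-User-Token": USER_TOKEN,
            "Content-Type": "application/json",
        },
        method="POST",
    )

rows = [
    {
        "ts": "2025-12-03T18:00:00Z",
        "device": "server-1",
        "model": "llama3-8b",
        "prompt_tokens": 50,
        "completion_tokens": 120,
        "latency_ms": 900,
        "ttft_ms": 220,
        "gpu_power_w": 45.5,
        "cpu_usage_pct": 60.3,
    },
]
req = build_append_request("my_server_metrics", rows)
# urllib.request.urlopen(req)  # uncomment to actually send
```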

2. Jobs API (queueing and reading runs via X-User-Token)

Jobs are queued executions of prompts on devices/models. You can:

  • enqueue jobs,
  • list them,
  • fetch results,
  • cancel or requeue.

Auth header:

X-User-Token: YOUR_USER_TOKEN

2.1 Endpoints

Method  Path                       Description
------  -------------------------  ------------------------------------
GET     /api/jobs                  List jobs for the current user
POST    /api/jobs                  Enqueue a new job
GET     /api/jobs/{job_id}         Get job status & metadata
GET     /api/jobs/{job_id}/result  Get job result & metrics
POST    /api/jobs/{job_id}/cancel  Cancel a running or queued job
POST    /jobs/{job_id}/requeue     Requeue a failed job (DLQ)
GET     /jobs/dlq                  List jobs in the dead-letter queue
DELETE  /jobs/{job_id}             Remove a job from the DLQ

DLQ endpoints are typically admin/operator focused.
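If you do operate the DLQ, a requeue call might look like the sketch below. This is an assumption-laden sketch: whether these paths are reachable with your token depends on how your deployment is configured, and the base URL and job id are placeholders.

```python
import urllib.request

BASE_URL = "https://api.demo.evalyard.com"  # your Evalyard backend base URL
USER_TOKEN = "YOUR_USER_TOKEN"              # placeholder user token

def build_requeue_request(job_id: str) -> urllib.request.Request:
    """Build the POST /jobs/{job_id}/requeue request for a DLQ'd job."""
    return urllib.request.Request(
        f"{BASE_URL}/jobs/{job_id}/requeue",
        data=b"",  # empty body; the job id lives in the path
        headers={"X-User-Token": USER_TOKEN},
        method="POST",
    )

req = build_requeue_request("JOB_ID")
# urllib.request.urlopen(req)  # uncomment to actually send
```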

2.2 Example: enqueue a job

curl -X POST "<BASE_URL>/api/jobs" \
  -H "Content-Type: application/json" \
  -H "X-User-Token: YOUR_USER_TOKEN" \
  -d '{
        "prompt": "Benchmark this prompt on a specific device.",
        "system": "You are a benchmark helper.",
        "model": "tinyllama-1.1b-chat",
        "options": {
          "num_predict": 256,
          "temperature": 0.7
        },
        "priority": 1,
        "queue_tag": "short",
        "device_serial": "0000asd",
        "adapter_id": "my_server_metrics",
        "callback_url": "https://example.com/hooks/job-finished"
      }'

2.3 Example: fetch job result

curl "<BASE_URL>/api/jobs/JOB_ID/result?format=json&full=1" \
  -H "X-User-Token: YOUR_USER_TOKEN"
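Putting 2.2 and 2.3 together, a typical client enqueues a job and then polls until a result is available. The sketch below assumes the job object exposes a status field with values like "queued"/"running" (check the schema your backend actually returns); fetch is injectable so the loop can be exercised without a live server:

```python
import json
import time
import urllib.request

BASE_URL = "https://api.demo.evalyard.com"  # your Evalyard backend base URL
USER_TOKEN = "YOUR_USER_TOKEN"              # placeholder user token

def http_get_json(path: str) -> dict:
    """GET a user-scoped endpoint and decode the JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"X-User-Token": USER_TOKEN},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_result(job_id: str, fetch=http_get_json,
                    poll_s: float = 2.0, timeout_s: float = 600.0) -> dict:
    """Poll GET /api/jobs/{job_id} until the job leaves the queue,
    then fetch GET /api/jobs/{job_id}/result.

    The "queued"/"running" status values are assumptions -- adjust them
    to match the status field your backend returns.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        job = fetch(f"/api/jobs/{job_id}")
        if job.get("status") not in ("queued", "running"):
            return fetch(f"/api/jobs/{job_id}/result?format=json&full=1")
        time.sleep(poll_s)
    raise TimeoutError(f"job {job_id} did not finish within {timeout_s}s")
```

Injecting fetch keeps the polling logic separate from transport, which also makes it easy to swap urllib for another HTTP client.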

3. Usage API (metrics, runs, CSV export & import)

Usage endpoints expose aggregated metrics and raw runs.

Auth header (user-scoped data):

X-User-Token: YOUR_USER_TOKEN

3.1 Read-only usage

Method  Path               Description
------  -----------------  ---------------------------------------------------
GET     /usage             Raw/aggregated usage data (time series & breakdown)
GET     /usage/summary     Summary stats for dashboards (KPIs)
GET     /usage/compare     Compare two time ranges / configurations
GET     /usage/export.csv  Export raw runs as CSV
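A minimal Python sketch for the read-only endpoints. No query parameters are shown, since the filters these endpoints accept are not specified here; the base URL and token are placeholders:

```python
import json
import urllib.request

BASE_URL = "https://api.demo.evalyard.com"  # your Evalyard backend base URL
USER_TOKEN = "YOUR_USER_TOKEN"              # placeholder user token

def build_usage_request(path: str = "/usage/summary") -> urllib.request.Request:
    """Build a GET request for one of the read-only usage endpoints."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"X-User-Token": USER_TOKEN},
    )

req = build_usage_request()
# with urllib.request.urlopen(req) as resp:
#     summary = json.load(resp)  # uncomment to actually fetch
```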