POST /batch/jobs
Create Job
curl --request POST \
  --url https://api.u1.archetypeai.app/v0.5/batch/jobs \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "name": "<string>",
  "pipeline_type": "<string>",
  "pipeline_key": "<string>",
  "pipeline_version": "<string>",
  "inputs": {},
  "parameters": {}
}
'
{
  "id": "<string>",
  "org_id": "<string>",
  "name": "<string>",
  "pipeline_type": "<string>",
  "pipeline_key": "<string>",
  "pipeline_version": "<string>",
  "status": "<string>",
  "parameters": {},
  "retry_count": 123,
  "preemption_count": 123,
  "queue_position": 123,
  "queue_depth": 123,
  "input_progress": {},
  "created_at": "<string>",
  "updated_at": "<string>",
  "started_at": "<string>",
  "completed_at": "<string>",
  "failed_at": "<string>",
  "cancelled_at": "<string>",
  "error": {}
}

Overview

This endpoint creates a new job from the specified pipeline configuration and optional input files. The job is placed in the queue and processed when resources become available. Inputs are organized by port name. The available ports depend on the pipeline; call Get Pipeline Schema first if you don’t know them. The two batch pipelines deployed on the platform today are:
  • machine-state-classification — time-series sensor classification via an Omega encoder + KNN. Input ports: worker.inference (CSV files to classify), worker.n_shots (labeled CSV example files with metadata.class). Output port: worker.results.
  • activity-detection — Newton C language model over a JSONL prompt file. Input port: worker.data (one JSONL file, each line an InferenceRecord). Output port: worker.result.
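
As a rough illustration of how inputs map to ports, the sketch below checks the port names in an inputs object against the ports listed above. The table KNOWN_INPUT_PORTS and the helper unknown_ports are hypothetical names introduced here, not part of the API; Get Pipeline Schema remains the authoritative source for port names.

```python
# Port names copied from the two pipelines described above. This table is
# illustrative only: Get Pipeline Schema is the source of truth.
KNOWN_INPUT_PORTS = {
    "machine-state-classification": {"worker.inference", "worker.n_shots"},
    "activity-detection": {"worker.data"},
}


def unknown_ports(pipeline_key, inputs):
    """Return the sorted port names in `inputs` that the pipeline does not list."""
    allowed = KNOWN_INPUT_PORTS.get(pipeline_key)
    if allowed is None:
        raise KeyError(f"no port table for pipeline {pipeline_key!r}")
    return sorted(port for port in inputs if port not in allowed)
```

Running a check like this before submitting catches a misspelled port name locally instead of waiting for the API to reject the request.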

Request

name (string, required): A human-readable name for the job.
pipeline_type (string, required): The type of pipeline to run. One of: batch, training.
pipeline_key (string, required): The key identifying the pipeline to use from the registry (e.g. machine-state-classification, activity-detection).
pipeline_version (string): Specific pipeline version to use. If omitted, the latest published version is used.
inputs (object): Input files organized by port name. Each key is a port name (see the pipeline schema) and each value is an array of input file objects:
  • file_id (string, required): The file ID returned from the Files API.
  • metadata (object): Optional per-input metadata. For n-shot ports this carries the class label ({"class": "..."}).
parameters (object): Pipeline parameters organized by component name (e.g. worker). Each value is an object with:
  • parallelism (integer): Number of parallel workers for this component.
  • config (object): Free-form configuration passed to the container. The accepted shape is defined by the pipeline’s user_config_schema; fetch it via Get Pipeline Schema.
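
The request fields above can be assembled client-side before sending. The following is a minimal sketch, assuming optional fields may simply be left out of the body when unset; build_job_body and input_file are hypothetical helper names, and the checks mirror only what this page states (required fields and the two pipeline_type values).

```python
PIPELINE_TYPES = {"batch", "training"}  # values listed in the field reference


def input_file(file_id, class_label=None):
    """Build one input file object; class_label fills metadata.class (n-shot ports)."""
    entry = {"file_id": file_id}
    if class_label is not None:
        entry["metadata"] = {"class": class_label}
    return entry


def build_job_body(name, pipeline_type, pipeline_key,
                   pipeline_version=None, inputs=None, parameters=None):
    """Return a dict shaped like the Create Job request body."""
    if not name or not pipeline_key:
        raise ValueError("name and pipeline_key are required")
    if pipeline_type not in PIPELINE_TYPES:
        raise ValueError(f"pipeline_type must be one of {sorted(PIPELINE_TYPES)}")
    body = {
        "name": name,
        "pipeline_type": pipeline_type,
        "pipeline_key": pipeline_key,
    }
    # Assumption: unset optional fields are omitted rather than sent as null.
    if pipeline_version is not None:
        body["pipeline_version"] = pipeline_version
    if inputs:
        body["inputs"] = inputs
    if parameters:
        body["parameters"] = parameters
    return body
```

For example, build_job_body("demo", "batch", "activity-detection", inputs={"worker.data": [input_file("file_abc")]}) yields a body with a single worker.data input; the ID file_abc is a placeholder, not a real file ID.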

Response

id (string): Unique job identifier (TypeID, job_ prefix).
org_id (string): Organization identifier.
name (string): Job name.
pipeline_type (string): Pipeline type (batch or training).
pipeline_key (string): Pipeline key.
pipeline_version (string): Pipeline version used.
status (string): Initial job status (typically PENDING).
parameters (object): Resolved job parameters (user-supplied values merged onto the pipeline’s default_config).
retry_count (integer): Number of times the job has been retried (always 0 on create).
preemption_count (integer): Number of times the job has been preempted (always 0 on create).
queue_position (integer): Position in the queue at admission time. Omitted from the response when not queued (e.g. terminal-state jobs).
queue_depth (integer): Total queue depth at admission time. Omitted from the response when not queued.
input_progress (object): Per-status counts of tracked inputs (pending, processing, completed, failed). Omitted from this response; populated only on read paths such as GET /batch/jobs and GET /batch/jobs/{id}.
created_at (string): Creation timestamp in RFC 3339 format.
updated_at (string): Last update timestamp.
started_at (string): Start timestamp, or null if not yet started.
completed_at (string): Completion timestamp, or null.
failed_at (string): Failure timestamp, or null.
cancelled_at (string): Cancellation timestamp, or null.
error (object): Error details, or null.
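
Because some response fields may be omitted and others may be null, client code needs to read them defensively. The sketch below shows one way to do that; summarize_job and parse_rfc3339 are hypothetical helpers. It assumes every timestamp uses the same RFC 3339 format that this page documents for created_at.

```python
from datetime import datetime


def parse_rfc3339(value):
    """Parse an RFC 3339 timestamp string, passing None through unchanged."""
    if value is None:
        return None
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def summarize_job(job):
    """Pick out a few fields from a Create Job response dict."""
    return {
        "id": job["id"],
        "status": job["status"],
        # queue_position and queue_depth are omitted when the job is not queued.
        "queue_position": job.get("queue_position"),
        "queue_depth": job.get("queue_depth"),
        "created_at": parse_rfc3339(job["created_at"]),
        # started_at is null until the job starts.
        "started_at": parse_rfc3339(job.get("started_at")),
    }
```

Using dict.get for the queue fields treats an omitted key and an explicit null the same way, which keeps the caller's logic simple.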

Examples

The two batch pipelines deployed on the platform take very different request bodies. The example below is for machine-state-classification.
Classify time-series sensor data using n-shot example files. Inputs are split across two ports: worker.inference for the CSVs to classify and worker.n_shots for the labeled example files (class declared via metadata.class).
curl -X POST https://api.u1.archetypeai.app/v0.5/batch/jobs \
  -H "Authorization: Bearer $ATAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "tep-classification",
    "pipeline_type": "batch",
    "pipeline_key": "machine-state-classification",
    "inputs": {
      "worker.inference": [
        {"file_id": "tep_inference.csv"}
      ],
      "worker.n_shots": [
        {"file_id": "tep_normal.csv", "metadata": {"class": "normal"}},
        {"file_id": "tep_fault.csv",  "metadata": {"class": "fault"}}
      ]
    },
    "parameters": {
      "worker": {
        "parallelism": 1,
        "config": {
          "model_type": "omega_1_4_base",
          "batch_size": 32,
          "reader_config": {
            "data_columns": ["xmeas_1", "xmeas_2", "xmv_11"],
            "timestamp_column": "timestamp",
            "window_size": 64,
            "step_size": 1
          },
          "classifier_config": {
            "n_neighbors": 5,
            "metric": "euclidean",
            "weights": "uniform",
            "normalize_embeddings": false
          },
          "flush_every_n_iteration": 150
        }
      }
    }
  }'
Response — 201 Created
{
  "id": "job_2abc3def4ghi5jkl6mno7pqr",
  "org_id": "org_1abc2def3ghi4jkl",
  "name": "tep-classification",
  "pipeline_type": "batch",
  "pipeline_key": "machine-state-classification",
  "pipeline_version": "1.1.1",
  "status": "PENDING",
  "parameters": {
    "worker": {
      "parallelism": 1,
      "config": {
        "model_type": "omega_1_4_base",
        "batch_size": 32,
        "reader_config": {
          "data_columns": ["xmeas_1", "xmeas_2", "xmv_11"],
          "timestamp_column": "timestamp",
          "window_size": 64,
          "step_size": 1
        },
        "classifier_config": {
          "n_neighbors": 5,
          "metric": "euclidean",
          "weights": "uniform",
          "normalize_embeddings": false
        },
        "flush_every_n_iteration": 150
      }
    }
  },
  "retry_count": 0,
  "preemption_count": 0,
  "created_at": "2026-04-14T10:00:00Z",
  "updated_at": "2026-04-14T10:00:00Z",
  "started_at": null,
  "completed_at": null,
  "failed_at": null,
  "cancelled_at": null,
  "error": null
}
See the newton-machine-state-batch skill for model selection (omega_1_4_base vs the legacy 1.3 variants), window_size / step_size guidance at high sample rates, and the within-distribution vs cross-condition accuracy pitfall.
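
For callers not using curl, the same request can be sketched with Python's standard library. The URL and headers are taken from the curl example above; make_create_job_request and create_job are hypothetical helper names, and this is an illustration under those assumptions, not an official client.

```python
import json
import urllib.request

# Endpoint URL copied from the curl example.
API_URL = "https://api.u1.archetypeai.app/v0.5/batch/jobs"


def make_create_job_request(body, api_key):
    """Build (but do not send) the POST request for Create Job."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_job(body, api_key):
    """Send the request and return the decoded JSON response.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    request = make_create_job_request(body, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as create_job(body, os.environ["ATAI_API_KEY"]) would mirror the curl invocation, with body holding the same JSON object passed to -d.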

Error responses

400 - Invalid Request
{
  "code": "INVALID_REQUEST",
  "message": "pipeline_key 'nonexistent-pipeline' not found in registry",
  "error_uid": "err_abc123",
  "suggestion": "Check available pipelines with GET /batch/registry/pipelines"
}
401 - Unauthorized
{
  "detail": "Invalid access with key: api_key_not_found"
}
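
The two error bodies above have different shapes: the 400 response carries code, message, error_uid, and suggestion, while the 401 response carries only detail. A client may want to reduce both to one readable string. The sketch below does that; describe_error is a hypothetical helper, and it assumes no error shapes beyond the two shown here.

```python
import json


def describe_error(status, raw_body):
    """Turn an error response body into a single human-readable line."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        # Not JSON at all: fall back to the raw text.
        return f"HTTP {status}: {raw_body.strip()}"
    if not isinstance(payload, dict):
        return f"HTTP {status}: {payload}"
    if "code" in payload:
        # Shape shown for 400: code, message, error_uid, suggestion.
        text = f"HTTP {status} {payload['code']}: {payload.get('message', '')}"
        if payload.get("suggestion"):
            text += f" ({payload['suggestion']})"
        return text
    if "detail" in payload:
        # Shape shown for 401: a single detail string.
        return f"HTTP {status}: {payload['detail']}"
    return f"HTTP {status}: {payload}"
```

Logging the error_uid alongside this string may also be useful when reporting a failed request, since it identifies the specific error instance.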