POST /query
# The canonical archetypeai-swat-demo-direct-query pattern: the entire state
# snapshot is rendered into `query` as natural language, with a strict
# system prompt that enforces JSON output shape. No file uploads.
curl -X POST https://api.u1.archetypeai.app/v0.5/query \
  -H "Authorization: Bearer $ATAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Newton::c2_4_7b_251215a172f6d7",
    "query": "Current plant state:\n- P1 raw intake: NORMAL\n- P3 ultrafiltration: ATTACK (LIT301=800.16, z=1.5)\n\nReturn one JSON card per anomalous stage.",
    "system_prompt": "Return ONLY a JSON array of {origin,target,direction,text} objects.",
    "instruction_prompt": "Return ONLY a JSON array of {origin,target,direction,text} objects.",
    "file_ids": [],
    "max_new_tokens": 700,
    "sanitize_response": false
  }'
Example response (the echoed query and prompt fields below come from a separate image-description query, not from the request above):
{
  "query_id": "260519c33f8455cddda9a8",
  "status": "completed",
  "query_timestamp": 1779157948.572,
  "loading_timestamp": 1779157948.608,
  "inference_timestamp": 1779157948.630,
  "response_timestamp": 1779157956.404,
  "query_queue_time_sec": 0.035,
  "inference_time_sec": 7.774,
  "query_response_time_sec": 7.831,
  "gpq_node": "",
  "response": {
    "success": true,
    "response": [
      "The image appears to be a screenshot from a software interface designed for monitoring and analyzing a six-stage water treatment process..."
    ],
    "query": "Describe what you see. Identify any stages flagged as anomalous.",
    "prompt": "Describe what you see. Identify any stages flagged as anomalous.",
    "system": "...",
    "instruction": "...",
    "generation_latency": 7.77,
    "query_gpq_latency": 7.83,
    "query_queue_latency": 0.05,
    "results_timestamp": "20260519_02:32:36",
    "prefetch_stats": { "loading_time": 0.012 }
  }
}


This API endpoint is under active development and is subject to change.

Overview

The /query endpoint runs a synchronous query against an Archetype model. The request is enqueued on the GPU Processing Queue (GPQ), routed to a worker node, and the final result is returned in the same HTTP response. A query can be grounded in:
  • Uploaded files — reference files by ID via file_ids after uploading them through the Files API. Supported file types: .png, .jpg, .jpeg, .txt, .json, .csv, .mp4.
  • Inline data events — pass payloads directly via events (text, JSON, base64-encoded image, or numeric arrays) without a separate upload step.
  • Prompt only — neither files nor events.
The model parameter selects what runs against your inputs. Two model families are currently exposed on /query:
  • Newton C language models (Newton::c2_...) — text reasoning, structured-output generation, image understanding. Used in archetypeai-swat-demo-direct-query, archetypeai-earthquake-demo, archetypeai-grid-demo, and the operator-suggestion patterns documented in the newton-query-prompting skill. Examples on this page use Newton::c2_4_7b_251215a172f6d7; the newer Newton::c2_5_8b_260413b723a9ab is also available.
  • Omega encoders (OmegaEncoder::omega_embeddings_01) — numeric-only. Returns embedding vectors instead of text. Used with data.numeric_array events to feed channel-first sensor windows; the response carries one 768-dim embedding per channel. Pattern documented in the newton-machine-state-direct-query skill.
The exact model identifiers available to your organization may differ — invalid values return 400 invalid_model_version.
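The three grounding modes differ only in which optional fields are set on the request body. The following is a minimal sketch in Python using only the standard library, not an official client. The helper names `build_payload` and `build_request` are hypothetical, the file ID is a placeholder, and the `type` key on the event is an assumption: this page confirms `event_data.contents` only for data.json, and the sketch assumes data.text follows the same shape.

```python
import json
import urllib.request

API_URL = "https://api.u1.archetypeai.app/v0.5/query"  # URL from the curl example
MODEL = "Newton::c2_4_7b_251215a172f6d7"


def build_payload(query, file_ids=None, events=None):
    """Return a /query request body; file_ids and events are both optional."""
    payload = {"model": MODEL, "query": query}
    if file_ids:
        payload["file_ids"] = list(file_ids)
    if events:
        payload["events"] = list(events)
    return payload


def build_request(payload, api_key):
    """Wrap a payload in a POST request object without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# 1. Grounded in an uploaded file (placeholder file ID).
with_file = build_payload("Describe this image.", file_ids=["plant_overview.png"])

# 2. Grounded in an inline event. ASSUMPTION: the event type lives under a
#    "type" key and data.text uses event_data.contents like data.json does.
text_event = {"type": "data.text", "event_data": {"contents": "LIT301=800.16"}}
with_event = build_payload("Is this reading anomalous?", events=[text_event])

# 3. Prompt only: neither files nor events.
prompt_only = build_payload("Summarize the six treatment stages.")

request = build_request(prompt_only, api_key="YOUR_API_KEY")
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Check the Data Events page for the actual event schema before relying on the `type` key.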

Request

model
string
required
Versioned model identifier such as Newton::c2_4_7b_251215a172f6d7 (text + image reasoning) or OmegaEncoder::omega_embeddings_01 (numeric encoder). Validated against the available model registry — invalid values return 400 invalid_model_version. See Overview for the model families currently exposed on this endpoint.
query
string
required
The natural-language query to run against the model. For numeric-encoder models (Omega) this is typically "" — pass the sensor window as a data.numeric_array event in events instead.
system_prompt
string
default:""
Optional system prompt prepended to the query.
instruction_prompt
string
default:""
Optional instruction prompt appended to the system prompt.
response_start_prompt
string
default:""
Optional prefix used to seed the model’s response.
template_name
string
default:""
Optional named prompt template to apply server-side.
file_ids
string[]
File IDs returned by the Files API. Two gotchas worth knowing:
  1. Use the file_id (filename) the upload response returned, not the file_uid (fil_…). /query filters file types by extension on the file_id string — fil_… has no extension and is rejected as unsupported_file_type.
  2. Newton text models see the contents of .png, .jpg, .jpeg, .txt, and .json files injected into the prompt. .csv is the exception: the upload succeeds and /query accepts the reference, but the file contents are not visible to the text-reasoning model (likely because CSV is routed to a numeric ingestion path the LLM doesn’t observe). As a workaround, rename the file to end with .txt before uploading, or pass the CSV contents as a data.text event instead. .mp4 is also accepted, but Newton text checkpoints currently return polite refusals when asked to describe video frames; use the Activity Monitor lens for video analysis.
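Both gotchas can be caught on the client before a query is sent. The following is a rough sketch of such a pre-flight check. The function name `check_file_id` is hypothetical, and lowercasing the extension is an assumption, since this page does not say whether the server-side filter is case-sensitive.

```python
import os

# Supported extensions as listed in the Overview.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".txt", ".json", ".csv", ".mp4"}


def check_file_id(file_id):
    """Return a list of warnings for one file_id; an empty list means no known issue."""
    warnings = []
    extension = os.path.splitext(file_id)[1].lower()
    if file_id.startswith("fil_") and not extension:
        warnings.append("looks like a file_uid; pass the file_id (filename) instead")
    elif extension not in SUPPORTED_EXTENSIONS:
        warnings.append(f"extension {extension!r} is not in the supported list")
    if extension == ".csv":
        warnings.append("CSV contents are not visible to Newton text models")
    if extension == ".mp4":
        warnings.append("Newton text checkpoints currently refuse video descriptions")
    return warnings
```

For example, `check_file_id("fil_abc123")` flags the identifier as a probable file_uid, while `check_file_id("frame.png")` returns an empty list.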
events
object[]
Inline data events in place of file uploads. See Data Events for the supported event types: data.text, data.json, data.base64_img, data.base64_img_array, data.numeric_array. For data.json, set event_data.contents to a serialized JSON string (passing a parsed object returns 400 invalid_parameter_type).
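The data.json requirement is easy to miss: `contents` must be a JSON string, not a nested object. The following is a minimal sketch of building such an event. The helper name `make_json_event` and the sample state are hypothetical, and the `type` key is an assumption; only `event_data.contents` is confirmed on this page.

```python
import json


def make_json_event(obj):
    """Build a data.json event whose contents is a JSON *string*, not an object."""
    return {
        "type": "data.json",  # ASSUMPTION: key name for the event type
        "event_data": {"contents": json.dumps(obj)},
    }


state = {"stage": "P3", "tag": "LIT301", "value": 800.16}
event = make_json_event(state)

# Passing {"contents": state} instead would send a parsed object, which this
# page says is rejected with 400 invalid_parameter_type.
```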
max_new_tokens
integer
default:"256"
Maximum tokens to generate in the response.
max_frames
integer
default:"32"
Maximum video frames to sample when an .mp4 file is supplied via file_ids.
temperature
number
Sampling temperature. Omit to use the model default.
do_sample
boolean
Whether to use sampling instead of greedy decoding.
repetition_penalty
number
Penalty for repeating tokens already produced.
top_p
number
Nucleus sampling cutoff.
top_k
integer
Top-k sampling cutoff.
presence_penalty
number
Penalty for tokens already present in the prompt.
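Because the sampling fields are optional, one approach is to add only the ones that were set explicitly. The following sketch uses a hypothetical helper, `with_sampling`, and illustrative values. It assumes omitted fields fall back to server-side defaults, which this page states only for temperature.

```python
def with_sampling(payload, temperature=None, do_sample=None, top_p=None,
                  top_k=None, repetition_penalty=None, presence_penalty=None):
    """Return a copy of payload plus only the sampling fields that were set."""
    options = {
        "temperature": temperature,
        "do_sample": do_sample,
        "top_p": top_p,
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "presence_penalty": presence_penalty,
    }
    merged = dict(payload)
    # "is not None" keeps explicit falsy values such as do_sample=False.
    merged.update({k: v for k, v in options.items() if v is not None})
    return merged


base = {"model": "Newton::c2_4_7b_251215a172f6d7", "query": "Summarize the plant state."}
greedy = with_sampling(base, do_sample=False)
sampled = with_sampling(base, do_sample=True, temperature=0.7, top_p=0.9)
```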
normalize_input
boolean
default:"false"
Apply server-side input normalization. For numeric-encoder models (Omega), this z-scores each data.numeric_array event per window before encoding. That preserves cross-channel comparability but erases cross-window amplitude signal — typically the wrong default for anomaly-detection workloads. Leave false and pre-normalize with a global scaler if cross-window magnitudes carry meaning. Has no effect on text-reasoning models.
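The loss of cross-window amplitude can be seen with a single channel and two windows. The following is an illustrative sketch in plain Python, not the server's implementation. It assumes a population standard deviation, and the readings are made up for the example.

```python
from statistics import mean, pstdev


def zscore(values, mu=None, sigma=None):
    """Z-score values; uses the window's own statistics unless mu/sigma are given."""
    mu = mean(values) if mu is None else mu
    sigma = pstdev(values) if sigma is None else sigma
    if sigma == 0:
        return [0.0 for _ in values]  # constant window: avoid division by zero
    return [(v - mu) / sigma for v in values]


normal_window = [500.0, 501.0, 502.0]
attack_window = [800.0, 801.0, 802.0]  # same shape, much higher level

# Per-window z-scoring: both windows collapse to the same normalized values,
# so the level shift between them is no longer visible.
per_window_normal = zscore(normal_window)
per_window_attack = zscore(attack_window)

# Global scaler fitted on reference data (here, the normal window only):
# the attack window keeps its large offset.
mu, sigma = mean(normal_window), pstdev(normal_window)
global_attack = zscore(attack_window, mu, sigma)
```

With the global scaler, every value in `global_attack` sits hundreds of standard deviations above the reference mean, which is the signal an anomaly detector would need.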
multi_image
boolean
default:"false"
When true, treat multiple file_ids / image events as a single multi-image input rather than independent inputs.
render
boolean
default:"false"
When true, retains rendered intermediate artifacts on the server. The retrieval endpoint for these artifacts is not exposed on /v0.5; leave this false unless instructed otherwise.
query_metadata
object
Free-form metadata stored alongside the query for the caller’s own bookkeeping.
max_query_size_mb
number
Override the maximum combined prompt size in MB. Defaults to the server’s MAX_QUERY_SIZE_MB setting (typically 0.04 MB).
max_wait_time_sec
number
Override the maximum time to wait for a synchronous result before returning a 504.
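A rough client-side size check can flag oversized prompts before they reach the server. The following sketch rests on several assumptions that this page does not confirm: that 1 MB means 1,000,000 bytes, that size is measured on UTF-8 bytes, and that the "combined prompt" covers the four text prompt fields. The function names are hypothetical.

```python
PROMPT_FIELDS = ("query", "system_prompt", "instruction_prompt", "response_start_prompt")


def prompt_size_mb(payload):
    """Approximate combined prompt size in MB (ASSUMPTION: 1 MB = 1_000_000 bytes)."""
    total_bytes = sum(len(payload.get(name, "").encode("utf-8")) for name in PROMPT_FIELDS)
    return total_bytes / 1_000_000


def exceeds_limit(payload, limit_mb=0.04):
    """True when the estimated prompt size is above limit_mb (typical default 0.04)."""
    return prompt_size_mb(payload) > limit_mb


small = {"query": "Current plant state: P1 NORMAL", "system_prompt": "Return JSON."}
large = {"query": "x" * 50_000}
```

Treat the result as an estimate only; the server's own accounting is authoritative.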
sanitize_response
boolean
default:"true"
When true, strips internal fields (api_key, org_id, query_metadata, file_ids, data_types, render, input_items, sanitize) from the response. Set to false only if you need to inspect the raw query record.

Response

query_id
string
Server-generated identifier for this query. Include it when reporting issues to support so the platform team can correlate to server logs.
status
string
Terminal status — completed for successful queries, failed if the worker returned an error.
response
object
Structured payload from the worker. The primary model output is the array at response.response (typically one or more strings). The remaining fields echo the prompt inputs (query, prompt, system, instruction) and per-stage timing (generation_latency, query_gpq_latency, query_queue_latency, results_timestamp, prefetch_stats) for debugging.
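When the system prompt asks for a JSON array, the caller still has to pull that array out of the first string in response.response. The following is a defensive sketch with a hypothetical helper, `extract_cards`. The sample body is made up for illustration and is not a captured response.

```python
import json


def extract_cards(body):
    """Return the parsed JSON array from the first model output string, or None."""
    if body.get("status") != "completed":
        return None
    response = body.get("response")
    if not isinstance(response, dict):
        return None
    outputs = response.get("response") or []
    if not outputs or not isinstance(outputs[0], str):
        return None
    text = outputs[0]
    # Tolerate prose around the array by slicing from the first "[" to the last "]".
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None


# Hypothetical body shaped like the documented response.
sample_body = {
    "status": "completed",
    "response": {
        "success": True,
        "response": [
            '[{"origin": "P3", "target": "operator", "direction": "alert", "text": "LIT301 high"}]'
        ],
    },
}
cards = extract_cards(sample_body)
```

Returning None rather than raising lets the caller decide whether to retry or fall back when the model ignores the requested output shape.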
query_timestamp
number
Unix timestamp when the query was submitted.
loading_timestamp
number
Unix timestamp when data loading began.
inference_timestamp
number
Unix timestamp when inference began.
response_timestamp
number
Unix timestamp when the response was finalized.
query_queue_time_sec
number
Seconds spent in the queue before processing began.
inference_time_sec
number
Seconds spent on inference.
query_response_time_sec
number
End-to-end latency from submission to response.
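The duration fields can be cross-checked against the raw timestamps. The following sketch recomputes them using the values from the example response on this page. Which timestamps bound each duration is an assumption, though it agrees with the reported values to within about a millisecond. The function name is hypothetical.

```python
def derive_timings(body):
    """Recompute durations (seconds) from the Unix timestamps in a response body."""
    return {
        # ASSUMPTION: queue time runs from submission until data loading begins.
        "queue": body["loading_timestamp"] - body["query_timestamp"],
        "inference": body["response_timestamp"] - body["inference_timestamp"],
        "end_to_end": body["response_timestamp"] - body["query_timestamp"],
    }


# Timestamps and reported durations copied from the example response.
example = {
    "query_timestamp": 1779157948.572,
    "loading_timestamp": 1779157948.608,
    "inference_timestamp": 1779157948.630,
    "response_timestamp": 1779157956.404,
    "query_queue_time_sec": 0.035,
    "inference_time_sec": 7.774,
    "query_response_time_sec": 7.831,
}
timings = derive_timings(example)
```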
gpq_node
string
Identifier of the GPQ worker node that processed the query.
error_messages
string[]
Plain-string error log accumulated by GPQ during query processing. Only present when GPQ has appended at least one message — successful queries typically omit this field entirely.
error_msg
string
Single error string set by GPQ only when status is failed.
Non-2xx responses (400, 429, 504) are reduced to an { "errors": [...] } envelope by the shared API response wrapper — fields like query_id, status, and timing data are stripped. 401 responses are rendered as { "detail": "..." } by FastAPI. See Errors for the shared AtaiError shape.
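Because 401 responses use a different envelope from other errors, a client may want to normalize both into one shape. The following is a minimal sketch with a hypothetical helper name. The element type inside `errors` is not specified on this page, so non-string entries are serialized as-is rather than interpreted.

```python
import json


def error_messages_from(status_code, body):
    """Normalize a non-2xx JSON body into a flat list of message strings."""
    if status_code == 401:
        # FastAPI-style envelope: {"detail": "..."}
        return [str(body.get("detail", ""))]
    # Shared wrapper envelope: {"errors": [...]}
    errors = body.get("errors", [])
    return [e if isinstance(e, str) else json.dumps(e) for e in errors]
```

For example, `error_messages_from(401, {"detail": "Unauthorized"})` returns `["Unauthorized"]`, and a 400 body of `{"errors": ["invalid_model_version"]}` returns that single string in a list.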