Direct Query - Archetype AI Documentation

This API endpoint is under active development and is subject to change.

Overview

The /query endpoint runs a synchronous query against an Archetype model. The request is enqueued on the GPU Processing Queue (GPQ), routed to a worker node, and the final result is returned in the same HTTP response. A query can be grounded in:

Uploaded files — reference files by ID via file_ids after uploading them through the Files API. Supported file types: .png, .jpg, .jpeg, .txt, .json, .csv, .mp4.
Inline data events — pass payloads directly via events (text, JSON, base64-encoded image, or numeric arrays) without a separate upload step.
Prompt only — neither files nor events.
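The three grounding modes differ only in the request body. Transport is always the same synchronous POST. The following sketch mirrors the curl example on this page using only the Python standard library. The helper names (build_query_request, run_query) and the 120-second client timeout are illustrative assumptions, not part of the API.

```python
import json
import urllib.request

# Base URL and path taken from the curl example on this page.
QUERY_URL = "https://api.u1.archetypeai.app/v0.5/query"


def build_query_request(api_key: str, body: dict, url: str = QUERY_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /query endpoint."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def run_query(api_key: str, body: dict, timeout_sec: float = 120.0) -> dict:
    """Send the request and return the decoded JSON response body.

    /query answers synchronously, so keep the client timeout longer than any
    max_wait_time_sec you pass; 120 s is an arbitrary placeholder.
    """
    request = build_query_request(api_key, body)
    with urllib.request.urlopen(request, timeout=timeout_sec) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For a prompt-only query, call `run_query(os.environ["ATAI_API_KEY"], {"model": "Newton::c2_4_7b_251215a172f6d7", "query": "..."})`. To ground in uploaded files, add a file_ids list. To ground in inline data, add an events list. For non-2xx responses, urllib.request.urlopen raises urllib.error.HTTPError, and the error body can be read from the exception.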

The model parameter selects what runs against your inputs. Two model families are currently exposed on /query:

Newton C language models (Newton::c2_...) — text reasoning, structured-output generation, image understanding. Used in archetypeai-swat-demo-direct-query, archetypeai-earthquake-demo, archetypeai-grid-demo, and the operator-suggestion patterns documented in the newton-query-prompting skill. Examples on this page use Newton::c2_4_7b_251215a172f6d7; the newer Newton::c2_5_8b_260413b723a9ab is also available.
Omega encoders (OmegaEncoder::omega_embeddings_01) — numeric-only. Returns embedding vectors instead of text. Used with data.numeric_array events to feed channel-first sensor windows; the response carries one 768-dim embedding per channel. Pattern documented in the newton-machine-state-direct-query skill.

The exact model identifiers available to your organization may differ — invalid values return 400 invalid_model_version.
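Omega encoders expect the sensor window in channel-first order: one row per channel, one column per timestep. Data often arrives time-major instead, with one row per timestep as in most CSV exports. The sketch below transposes such a window and checks that the encoder returned one 768-dim vector per channel. Two points are assumptions. First, the event layout (a "type" key, with the array under event_data.contents) should be confirmed against the Data Events page. Second, where the embeddings sit inside the response body is not documented here, so check_embeddings takes them as an argument.

```python
from typing import List, Sequence

EMBEDDING_DIM = 768  # per the Overview: one 768-dim embedding per channel


def to_channel_first(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Transpose a time-major window (rows = timesteps, columns = channels)
    into channel-first order (rows = channels, columns = timesteps)."""
    if not samples:
        return []
    n_channels = len(samples[0])
    if any(len(row) != n_channels for row in samples):
        raise ValueError("every timestep must have the same number of channels")
    return [list(channel) for channel in zip(*samples)]


def numeric_array_event(channel_first: List[List[float]]) -> dict:
    # HYPOTHETICAL layout: the "type" key and placing the array under
    # event_data.contents are assumptions; check the Data Events page.
    return {"type": "data.numeric_array", "event_data": {"contents": channel_first}}


def check_embeddings(embeddings: Sequence[Sequence[float]], n_channels: int) -> None:
    """Raise ValueError unless there is one 768-dim vector per input channel."""
    if len(embeddings) != n_channels:
        raise ValueError(f"expected {n_channels} embeddings, got {len(embeddings)}")
    for i, vec in enumerate(embeddings):
        if len(vec) != EMBEDDING_DIM:
            raise ValueError(f"channel {i}: expected {EMBEDDING_DIM} dims, got {len(vec)}")
```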

Request

model (string, required)

Versioned model identifier such as Newton::c2_4_7b_251215a172f6d7 (text + image reasoning) or OmegaEncoder::omega_embeddings_01 (numeric encoder). Validated against the available model registry — invalid values return 400 invalid_model_version. See Overview for the model families currently exposed on this endpoint.

query (string, required)

The natural-language query to run against the model. For numeric-encoder models (Omega), this is typically an empty string (""). Pass the sensor window as a data.numeric_array event in events instead.

system_prompt (string, default: "")

Optional system prompt prepended to the query.

instruction_prompt (string, default: "")

Optional instruction prompt appended to the system prompt.

response_start_prompt (string, default: "")

Optional prefix used to seed the model’s response.

template_name (string, default: "")

Optional named prompt template to apply server-side.

file_ids (string[])

File IDs returned by the Files API. Two gotchas worth knowing:

Use the file_id (filename) the upload response returned, not the file_uid (fil_…). /query filters file types by extension on the file_id string — fil_… has no extension and is rejected as unsupported_file_type.
Newton text models see the contents of .png, .jpg, .jpeg, .txt, and .json files injected into the prompt. CSV files are the exception. CSV uploads succeed and /query accepts the reference, but the model cannot see the file contents. They are likely routed to the numeric ingestion path, which the LLM does not observe. As a workaround, either rename the file so it ends in .txt before uploading, or pass the CSV contents as a data.text event instead. .mp4 files are also accepted, but Newton text checkpoints currently return polite refusals when asked to describe video frames. Use the Activity Monitor lens for video analysis.
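Both file_ids gotchas can be caught client-side before a request is sent. The pre-flight check below is hypothetical and built only from the rules on this page. Its extension check is case-sensitive, which may be stricter than the server.

```python
import os.path
from typing import List

# Extensions listed as supported on this page.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".txt", ".json", ".csv", ".mp4"}


def check_file_id(file_id: str) -> List[str]:
    """Pre-flight check for one file_id.

    Raises ValueError for IDs that /query would reject; returns warnings for
    IDs it accepts but may not use the way you expect.
    """
    ext = os.path.splitext(file_id)[1]
    if ext not in SUPPORTED_EXTENSIONS:
        if file_id.startswith("fil_") and not ext:
            # A file_uid has no extension, so /query rejects it as unsupported_file_type.
            raise ValueError(f"{file_id!r} looks like a file_uid; use the file_id (filename) instead")
        shown = ext if ext else "(no extension)"
        raise ValueError(f"{file_id!r}: unsupported extension {shown}")
    warnings: List[str] = []
    if ext == ".csv":
        warnings.append("CSV contents are not visible to Newton text models; "
                        "upload as .txt or send a data.text event")
    elif ext == ".mp4":
        warnings.append("Newton text checkpoints currently refuse to describe video frames; "
                        "consider the Activity Monitor lens")
    return warnings
```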

events (object[])

Inline data events in place of file uploads. See Data Events for the supported event types: data.text, data.json, data.base64_img, data.base64_img_array, data.numeric_array. For data.json, set event_data.contents to a serialized JSON string (passing a parsed object returns 400 invalid_parameter_type).
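Here is a minimal sketch of building inline events in Python. Only the event type names come from this page, along with the rule that data.json contents must be a serialized string. The rest is assumed and should be checked against Data Events:

- the surrounding layout (a "type" key next to event_data)
- sending raw base64 without a data-URL prefix for data.base64_img

```python
import base64
import json


def text_event(text: str) -> dict:
    # ASSUMED layout: {"type": ..., "event_data": {"contents": ...}}
    return {"type": "data.text", "event_data": {"contents": text}}


def json_event(payload) -> dict:
    # contents must be a JSON *string*; passing the parsed object
    # returns 400 invalid_parameter_type.
    return {"type": "data.json", "event_data": {"contents": json.dumps(payload)}}


def image_event(image_bytes: bytes) -> dict:
    # ASSUMED: plain base64 text, no "data:image/...;base64," prefix.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "data.base64_img", "event_data": {"contents": encoded}}
```

A request body would then carry, for example, `"events": [json_event({"stage": "P3", "z": 1.5})]` in place of file_ids.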

max_new_tokens (integer, default: 256)

Maximum tokens to generate in the response.

max_frames (integer, default: 32)

Maximum video frames to sample when an .mp4 file is supplied via file_ids.

temperature (number)

Sampling temperature. Omit to use the model default.

do_sample (boolean)

Whether to use sampling instead of greedy decoding.

repetition_penalty (number)

Penalty for repeating tokens already produced.

top_p (number)

Nucleus sampling cutoff.

top_k (integer)

Top-k sampling cutoff.

presence_penalty (number)

Penalty for tokens already present in the prompt.

normalize_input (boolean, default: false)

Apply server-side input normalization. For numeric-encoder models (Omega), this z-scores each data.numeric_array event per window before encoding. That preserves cross-channel comparability but erases cross-window amplitude signal — typically the wrong default for anomaly-detection workloads. Leave false and pre-normalize with a global scaler if cross-window magnitudes carry meaning. Has no effect on text-reasoning models.
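Comparing per-window z-scoring with a scaler fitted once on reference data shows why it hides amplitude. The per-window function approximates what normalize_input=true is described as doing. The server's exact formula is not documented here, including whether it uses population or sample standard deviation. For brevity, the sketch handles a single channel.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_per_window(window: Sequence[float]) -> List[float]:
    """Approximation of normalize_input=true: z-score each window on its own stats."""
    mu, sigma = mean(window), pstdev(window)
    return [(x - mu) / sigma for x in window] if sigma else [0.0] * len(window)


class GlobalScaler:
    """Fit mean/std once on reference data, then reuse them for every window."""

    def __init__(self, reference: Sequence[float]):
        self.mu = mean(reference)
        self.sigma = pstdev(reference) or 1.0  # guard against constant reference data

    def transform(self, window: Sequence[float]) -> List[float]:
        return [(x - self.mu) / self.sigma for x in window]
```

Consider two windows that differ only by a constant positive scale factor. They produce the same per-window z-scores (up to float rounding). The global scaler keeps the louder window's larger spread. Fit the scaler on normal-operation data, and send its output with normalize_input left false.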

multi_image (boolean, default: false)

When true, treat multiple file_ids / image events as a single multi-image input rather than independent inputs.

render (boolean, default: false)

When true, retains rendered intermediate artifacts on the server. The retrieval endpoint for these artifacts is not exposed on /v0.5; leave this false unless instructed otherwise.

query_metadata (object)

Free-form metadata stored alongside the query for the caller’s own bookkeeping.

max_query_size_mb (number)

Override the maximum combined prompt size in MB. Defaults to the server’s MAX_QUERY_SIZE_MB setting (typically 0.04 MB).
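A client can estimate the prompt size itself and set max_query_size_mb only when the default looks too small. The sketch below rests on three assumptions:

- which fields count toward the "combined prompt size"
- treating 1 MB as 10**6 bytes
- the 1.5x headroom

The server may also cap how far the limit can be raised.

```python
DEFAULT_MAX_QUERY_SIZE_MB = 0.04  # "typically", per this page; your server may differ

# ASSUMPTION: only these text fields count toward the prompt size.
PROMPT_FIELDS = ("system_prompt", "instruction_prompt", "query", "response_start_prompt")


def prompt_size_mb(body: dict) -> float:
    """Approximate prompt size: UTF-8 bytes of the text prompt fields, in MB (10**6 bytes)."""
    total_bytes = sum(len(body.get(name, "").encode("utf-8")) for name in PROMPT_FIELDS)
    return total_bytes / 1_000_000


def with_size_override(body: dict, limit_mb: float = DEFAULT_MAX_QUERY_SIZE_MB,
                       headroom: float = 1.5) -> dict:
    """Return body unchanged if it fits, else a copy with a raised max_query_size_mb."""
    size = prompt_size_mb(body)
    if size <= limit_mb:
        return body
    return {**body, "max_query_size_mb": round(size * headroom, 4)}
```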

max_wait_time_sec (number)

Override the maximum time, in seconds, to wait for a synchronous result before the server returns a 504.

sanitize_response (boolean, default: true)

When true, strips internal fields (api_key, org_id, query_metadata, file_ids, data_types, render, input_items, sanitize) from the response. Set to false only if you need to inspect the raw query record.

Response

query_id (string)

Server-generated identifier for this query. Include it when reporting issues to support so the platform team can correlate to server logs.

status (string)

Terminal status — completed for successful queries, failed if the worker returned an error.

response (object)

Structured payload from the worker. The primary model output is the array at response.response (typically one or more strings). The remaining fields echo the prompt inputs (query, prompt, system, instruction) and per-stage timing (generation_latency, query_gpq_latency, query_queue_latency, results_timestamp, prefetch_stats) for debugging.
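For Newton text models, the generated text lives in response.response, not at the top level. The accessor below keeps that nesting in one place. When a query fails, it surfaces error_msg or error_messages. It assumes that any status other than completed should be treated as an error, and the names QueryFailed and extract_output are hypothetical.

```python
from typing import List


class QueryFailed(RuntimeError):
    """Raised when a /query body does not report status == "completed"."""


def extract_output(result: dict) -> List:
    """Return the model output list (response.response) from a /query body."""
    if result.get("status") != "completed":
        detail = (result.get("error_msg")
                  or "; ".join(result.get("error_messages", []))
                  or "no error detail")
        raise QueryFailed(f"query {result.get('query_id', '?')} {result.get('status')}: {detail}")
    # Documented as typically one or more strings for text models.
    return list(result.get("response", {}).get("response", []))
```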

query_timestamp (number)

Unix timestamp when the query was submitted.

loading_timestamp (number)

Unix timestamp when data loading began.

inference_timestamp (number)

Unix timestamp when inference began.

response_timestamp (number)

Unix timestamp when the response was finalized.

query_queue_time_sec (number)

Seconds spent in the queue before processing began.

inference_time_sec (number)

Seconds spent on inference.

query_response_time_sec (number)

End-to-end latency from submission to response.
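The four timestamps can be turned into per-stage durations, which helps separate queueing delay from slow generation. This sketch assumes the stages run back-to-back. For the example response on this page, the derived queue and inference times agree with query_queue_time_sec and inference_time_sec to within a few milliseconds.

```python
def timing_breakdown(result: dict) -> dict:
    """Derive per-stage durations (seconds) from the /query response timestamps."""
    t_query = result["query_timestamp"]
    t_load = result["loading_timestamp"]
    t_infer = result["inference_timestamp"]
    t_resp = result["response_timestamp"]
    return {
        "queue_sec": t_load - t_query,      # submission -> data loading began
        "loading_sec": t_infer - t_load,    # data loading -> inference began
        "inference_sec": t_resp - t_infer,  # inference began -> response finalized
        "total_sec": t_resp - t_query,      # end to end
    }
```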

gpq_node (string)

Identifier of the GPQ worker node that processed the query.

error_messages (string[])

Plain-string error log accumulated by GPQ during query processing. Only present when GPQ has appended at least one message — successful queries typically omit this field entirely.

error_msg (string)

Single error string set by GPQ only when status is failed.

Non-2xx responses (400, 429, 504) are reduced to an { "errors": [...] } envelope by the shared API response wrapper — fields like query_id, status, and timing data are stripped. 401 responses are rendered as { "detail": "..." } by FastAPI. See Errors for the shared AtaiError shape.
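Error bodies come in two shapes, so a client should handle both. The sketch below flattens either envelope into readable strings. It does not assume any particular AtaiError fields (see Errors) and instead serializes each entry as-is. The function name is hypothetical.

```python
import json
from typing import List


def error_messages_from_body(status_code: int, raw_body: bytes) -> List[str]:
    """Flatten a non-2xx /query body into human-readable messages."""
    try:
        body = json.loads(raw_body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Not JSON at all (e.g. a proxy error page): show a truncated raw body.
        return [f"HTTP {status_code}: {raw_body[:200]!r}"]
    if isinstance(body, dict) and "detail" in body:  # 401 rendered by FastAPI
        return [f"HTTP {status_code}: {body['detail']}"]
    if isinstance(body, dict) and isinstance(body.get("errors"), list):
        messages = []
        for err in body["errors"]:
            # Keep each AtaiError entry intact rather than guessing its keys.
            text = err if isinstance(err, str) else json.dumps(err)
            messages.append(f"HTTP {status_code}: {text}")
        return messages
    return [f"HTTP {status_code}: {body!r}"]
```

With urllib, call it from the HTTPError handler as error_messages_from_body(exc.code, exc.read()).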

# The canonical archetypeai-swat-demo-direct-query pattern: the entire state
# snapshot is rendered into `query` as natural language, with a strict
# system prompt that enforces JSON output shape. No file uploads.
curl -X POST https://api.u1.archetypeai.app/v0.5/query \
  -H "Authorization: Bearer $ATAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Newton::c2_4_7b_251215a172f6d7",
    "query": "Current plant state:\n- P1 raw intake: NORMAL\n- P3 ultrafiltration: ATTACK (LIT301=800.16, z=1.5)\n\nReturn one JSON card per anomalous stage.",
    "system_prompt": "Return ONLY a JSON array of {origin,target,direction,text} objects.",
    "instruction_prompt": "Return ONLY a JSON array of {origin,target,direction,text} objects.",
    "file_ids": [],
    "max_new_tokens": 700,
    "sanitize_response": false
  }'

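Example response (abridged). This sample was captured from an image-grounded query, so its echoed query fields and model output do not match the request above. It illustrates the response shape only.

{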
  "query_id": "260519c33f8455cddda9a8",
  "status": "completed",
  "query_timestamp": 1779157948.572,
  "loading_timestamp": 1779157948.608,
  "inference_timestamp": 1779157948.630,
  "response_timestamp": 1779157956.404,
  "query_queue_time_sec": 0.035,
  "inference_time_sec": 7.774,
  "query_response_time_sec": 7.831,
  "gpq_node": "",
  "response": {
    "success": true,
    "response": [
      "The image appears to be a screenshot from a software interface designed for monitoring and analyzing a six-stage water treatment process..."
    ],
    "query": "Describe what you see. Identify any stages flagged as anomalous.",
    "prompt": "Describe what you see. Identify any stages flagged as anomalous.",
    "system": "...",
    "instruction": "...",
    "generation_latency": 7.77,
    "query_gpq_latency": 7.83,
    "query_queue_latency": 0.05,
    "results_timestamp": "20260519_02:32:36",
    "prefetch_stats": { "loading_time": 0.012 }
  }
}
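Even with the strict system prompt in the curl example, the model's answer arrives as a string inside response.response. The sketch below is a hypothetical parser for the {origin,target,direction,text} cards. It slices from the first "[" to the last "]" as a defensive choice, in case the model adds text around the array. That approach fails if the surrounding text itself contains brackets.

```python
import json
from typing import List

CARD_KEYS = {"origin", "target", "direction", "text"}


def parse_cards(model_text: str) -> List[dict]:
    """Extract and validate the JSON array of cards from a Newton text response."""
    start, end = model_text.find("["), model_text.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in model output")
    cards = json.loads(model_text[start:end + 1])
    if not isinstance(cards, list):
        raise ValueError("expected a JSON array")
    for card in cards:
        if not isinstance(card, dict):
            raise ValueError("each card must be a JSON object")
        missing = CARD_KEYS - set(card)
        if missing:
            raise ValueError(f"card missing keys: {sorted(missing)}")
    return cards
```

Typical use is `cards = parse_cards(result["response"]["response"][0])`, where result is the decoded response body.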
