API Reference
Direct Query
Run a natural-language query against a model, optionally grounded in uploaded files or inline data events
POST
Overview
The/query endpoint runs a synchronous query against an Archetype model. The request is enqueued on the GPU Processing Queue (GPQ), routed to a worker node, and the final result is returned in the same HTTP response.
A query can be grounded in:
- Uploaded files — reference files by ID via
file_idsafter uploading them through the Files API. Supported file types:.png,.jpg,.jpeg,.txt,.json,.csv,.mp4. - Inline data events — pass payloads directly via
events(text, JSON, base64-encoded image, or numeric arrays) without a separate upload step. - Prompt only — neither files nor events.
model parameter selects what runs against your inputs. Two model families are currently exposed on /query:
- Newton C language models (
Newton::c2_...) — text reasoning, structured-output generation, image understanding. Used inarchetypeai-swat-demo-direct-query,archetypeai-earthquake-demo,archetypeai-grid-demo, and the operator-suggestion patterns documented in the newton-query-prompting skill. Examples on this page useNewton::c2_4_7b_251215a172f6d7; the newerNewton::c2_5_8b_260413b723a9abis also available. - Omega encoders (
OmegaEncoder::omega_embeddings_01) — numeric-only. Returns embedding vectors instead of text. Used withdata.numeric_arrayevents to feed channel-first sensor windows; the response carries one 768-dim embedding per channel. Pattern documented in the newton-machine-state-direct-query skill.
400 invalid_model_version.
Request
Versioned model identifier such as
Newton::c2_4_7b_251215a172f6d7 (text +
image reasoning) or OmegaEncoder::omega_embeddings_01 (numeric encoder).
Validated against the available model registry — invalid values return
400 invalid_model_version. See Overview for the model families
currently exposed on this endpoint.The natural-language query to run against the model. For numeric-encoder
models (Omega) this is typically
"" — pass the sensor window as a
data.numeric_array event in events instead.Optional system prompt prepended to the query.
Optional instruction prompt appended to the system prompt.
Optional prefix used to seed the model’s response.
Optional named prompt template to apply server-side.
File IDs returned by the Files API. Two
gotchas worth knowing:
- Use the
file_id(filename) the upload response returned, not thefile_uid(fil_…)./queryfilters file types by extension on thefile_idstring —fil_…has no extension and is rejected asunsupported_file_type. - Newton text models see contents of
.png/.jpg/.jpeg/.txt/.jsoninjected into the prompt;.csvis the exception. CSV uploads succeed and/queryaccepts the reference, but the file contents are not visible to the text-reasoning model (likely routed to the numeric ingestion path the LLM doesn’t observe). As a workaround, rename the file to end with.txtbefore uploading, or pass the CSV contents as adata.textevent instead..mp4is accepted but Newton text checkpoints currently return polite refusals when asked to describe video frames — use the Activity Monitor lens for video analysis.
Inline data events in place of file uploads. See
Data Events for the supported
event types:
data.text, data.json, data.base64_img,
data.base64_img_array, data.numeric_array. For data.json, set
event_data.contents to a serialized JSON string (passing a parsed
object returns 400 invalid_parameter_type).Maximum tokens to generate in the response.
Maximum video frames to sample when an
.mp4 file is supplied via file_ids.Sampling temperature. Omit to use the model default.
Whether to use sampling instead of greedy decoding.
Penalty for repeating tokens already produced.
Nucleus sampling cutoff.
Top-k sampling cutoff.
Penalty for tokens already present in the prompt.
Apply server-side input normalization. For numeric-encoder models (Omega),
this z-scores each
data.numeric_array event per window before
encoding. That preserves cross-channel comparability but erases
cross-window amplitude signal — typically the wrong default for
anomaly-detection workloads. Leave false and pre-normalize with a
global scaler if cross-window magnitudes carry meaning. Has no effect
on text-reasoning models.When true, treat multiple
file_ids / image events as a single multi-image
input rather than independent inputs.When true, retains rendered intermediate artifacts on the server. The
retrieval endpoint for these artifacts is not exposed on
/v0.5; leave
this false unless instructed otherwise.Free-form metadata stored alongside the query for the caller’s own bookkeeping.
Override the maximum combined prompt size in MB. Defaults to the server’s
MAX_QUERY_SIZE_MB setting (typically 0.04 MB).Override the maximum time to wait for a synchronous result before returning
a
504.When true, strips internal fields (
api_key, org_id, query_metadata,
file_ids, data_types, render, input_items, sanitize) from the
response. Set to false only if you need to inspect the raw query record.Response
Server-generated identifier for this query. Include it when reporting issues
to support so the platform team can correlate to server logs.
Terminal status —
completed for successful queries, failed if the worker
returned an error.Structured payload from the worker. The primary model output is the array
at
response.response (typically one or more strings). The remaining fields
echo the prompt inputs (query, prompt, system, instruction) and
per-stage timing (generation_latency, query_gpq_latency,
query_queue_latency, results_timestamp, prefetch_stats) for debugging.Unix timestamp when the query was submitted.
Unix timestamp when data loading began.
Unix timestamp when inference began.
Unix timestamp when the response was finalized.
Seconds spent in the queue before processing began.
Seconds spent on inference.
End-to-end latency from submission to response.
Identifier of the GPQ worker node that processed the query.
Plain-string error log accumulated by GPQ during query processing. Only
present when GPQ has appended at least one message — successful queries
typically omit this field entirely.
Single error string set by GPQ only when
status is failed.Non-2xx responses (400, 429, 504) are reduced to an
{ "errors": [...] }
envelope by the shared API response wrapper — fields like query_id,
status, and timing data are stripped. 401 responses are rendered as
{ "detail": "..." } by FastAPI. See Errors
for the shared AtaiError shape.