Batch Processing - Archetype AI Documentation

The Batch Processing API uses the Jobs Orchestration Service (JOS) to run processing jobs against multiple files. JOS owns the entire lifecycle of the jobs it runs: it validates the configuration, manages job pipelines and Kubernetes resources, monitors job progress, and publishes events. Events are delivered and metrics are streamed using Redis Streams.

Concepts

Job

A job is a complete work request submitted to JOS using either the Batch Manager in the Console or the Batch Processing API.

Index

An index is a unit of parallel execution within a job. A job’s parallelism parameter specifies the maximum number of indexes that will be used to process that job.

Completion

A completion is a successful resolution of an index.

Task

A task is the finest-grained data processing unit within the Jobs Orchestration Service. A task represents a single item to be processed. For example, a task may handle a single inference request on one input, or one training step on a batch. Tasks are distributed across indexes, and progress tracking and metrics are reported at the task level.
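The relationship between tasks, indexes, and the parallelism parameter can be illustrated with a small sketch. The text does not say how JOS assigns tasks to indexes, so the round-robin policy below is an assumption, and `distribute_tasks` is a hypothetical helper rather than part of the API.

```python
def distribute_tasks(tasks, parallelism):
    """Spread tasks over at most `parallelism` indexes.

    Round-robin assignment is an assumed policy for illustration only;
    the actual JOS scheduling strategy is not described in the text.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    # parallelism is a maximum: never create more indexes than there are tasks.
    num_indexes = min(parallelism, len(tasks)) or 1
    indexes = [[] for _ in range(num_indexes)]
    for position, task in enumerate(tasks):
        indexes[position % num_indexes].append(task)
    return indexes


# Five single-file tasks processed with a parallelism of 2.
assignment = distribute_tasks(["f1", "f2", "f3", "f4", "f5"], parallelism=2)
```

Each inner list stands for the work handled by one index; an index reaches completion when it resolves successfully.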

Pipeline

A pipeline is a named, versioned configuration template that defines defaults and requirements for a particular workload. Pipelines are managed by the Jobs Orchestration Service and are referenced by APIs when submitting jobs. The pipeline configuration includes:

The container image and command, indicating which worker to use.
Resource defaults such as minimum CPU, memory, and GPU type.
Queue routing information.
Timeout and retry policy settings.
The OpenAPI/JSON schema defining all configuration options and rules.
A user configuration schema defining a subset of the configuration schema that users are allowed to override when submitting a job. Fields outside this schema are locked to pipeline defaults.
The default configuration for the pipeline, specifying the default values for all configuration fields, except for the required fields. This is the base configuration that gets merged with the user overrides.
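The way user overrides combine with pipeline defaults can be sketched as follows. This is a minimal illustration, not the JOS implementation: `build_job_config`, the shallow-merge behaviour, and every field name used here (`batch_size`, `timeout_seconds`, `input_file_ids`) are assumptions made for the example.

```python
def build_job_config(defaults, user_overrides, user_allowed_fields):
    """Merge user overrides onto pipeline defaults.

    Fields outside the user configuration schema are locked, so an
    attempt to override one is rejected. The merge is shallow (top-level
    keys only), which is an assumption for this sketch.
    """
    locked = set(user_overrides) - set(user_allowed_fields)
    if locked:
        raise ValueError(f"fields locked to pipeline defaults: {sorted(locked)}")
    merged = dict(defaults)        # start from the pipeline's default configuration
    merged.update(user_overrides)  # apply the permitted user overrides on top
    return merged


# Hypothetical pipeline defaults and user-overridable fields.
pipeline_defaults = {"batch_size": 8, "timeout_seconds": 3600}
allowed = {"batch_size", "input_file_ids"}

config = build_job_config(
    pipeline_defaults,
    {"batch_size": 16, "input_file_ids": ["file-1"]},
    allowed,
)
```

In this sketch `input_file_ids` plays the role of a required field: it has no default and must be supplied by the user, while `timeout_seconds` stays locked to the pipeline default.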

The Batch Job Lifecycle

Each job has a lifecycle, indicated by the job’s status. A job created with the Create Job endpoint begins in the PENDING state. When it is admitted into a queue for processing, its state changes to ADMITTED, and when a GPU is available to run it, its state changes to RUNNING. Successfully completed jobs have the state COMPLETE, and cancelled jobs transition to the CANCELLED state.

If a higher-priority job is scheduled on the same GPU as an existing job, the existing job transitions into the PREEMPTED state. It automatically returns to the PENDING state when it is re-enqueued.

If a job fails, its state transitions to FAILED. This can happen for a variety of reasons, such as invalid input parameters or issues with the supplied data. FAILED and CANCELLED jobs can be retried using the Retry Job endpoint, which attempts to continue the job from its last checkpoint.
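The lifecycle can be summarised as a small state machine. The sketch below encodes only what the text describes; the names `JobState`, `NEXT_STATES`, and `can_transition` are hypothetical, and two points are assumptions: that cancellation and failure can occur from any unfinished state, and that preemption happens only to a running job. Retry is not modelled as a transition, because the text does not say which state a retried job re-enters.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "PENDING"
    ADMITTED = "ADMITTED"
    RUNNING = "RUNNING"
    PREEMPTED = "PREEMPTED"
    COMPLETE = "COMPLETE"
    CANCELLED = "CANCELLED"
    FAILED = "FAILED"


# States in which a job has finished running.
FINISHED = frozenset({JobState.COMPLETE, JobState.CANCELLED, JobState.FAILED})
# States from which the Retry Job endpoint can be used.
RETRYABLE = frozenset({JobState.CANCELLED, JobState.FAILED})

# Forward transitions described in the lifecycle text.
NEXT_STATES = {
    JobState.PENDING: {JobState.ADMITTED},
    JobState.ADMITTED: {JobState.RUNNING},
    JobState.RUNNING: {JobState.COMPLETE, JobState.PREEMPTED},
    JobState.PREEMPTED: {JobState.PENDING},
}


def can_transition(current, target):
    """Return True if moving from `current` to `target` fits the described lifecycle."""
    if current in FINISHED:
        return False
    # Assumption: a job may be cancelled or may fail from any unfinished state.
    if target in (JobState.CANCELLED, JobState.FAILED):
        return True
    return target in NEXT_STATES.get(current, set())
```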

Running a Batch Using the Console

To run a batch using the console, use the Batch Manager.

Running a Batch Using the Batch Processing API

Upload the data files to process using the Files API.
Create the job using the Create Job endpoint. This creates the job and adds it to the pipeline.
Monitor the job’s status using the Get Job endpoint. The job has finished running if its state is COMPLETE, CANCELLED, or FAILED.
If a job needs to be cancelled, use the Cancel Job endpoint.
Cancelled or failed jobs can be retried by using the Retry Job endpoint.
Once the job is COMPLETE or FAILED, you can get a list of the events that occurred on the job using the List Job Events endpoint.
You can get a list of the output files generated by the job using the List Outputs endpoint. Once you have the list, you can download the artifacts using HTTPS requests.
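The monitoring step above can be sketched as a polling loop. This is a minimal sketch, not the platform's client: `wait_for_job`, the poll interval, and the poll limit are hypothetical, and `fetch_state` stands in for whatever call you make to the Get Job endpoint (it is assumed to return the job's state as a string). Only the three finished states come from the text.

```python
import time

# A job has finished running when it reaches one of these states.
FINISHED_STATES = {"COMPLETE", "CANCELLED", "FAILED"}


def wait_for_job(fetch_state, poll_seconds=10.0, max_polls=360, sleep=time.sleep):
    """Poll until the job reaches a finished state, then return that state.

    `fetch_state` is a caller-supplied function (hypothetical) that wraps
    the Get Job endpoint and returns the current state string.
    """
    for attempt in range(max_polls):
        state = fetch_state()
        if state in FINISHED_STATES:
            return state
        # PENDING, ADMITTED, RUNNING, and PREEMPTED all mean "keep waiting".
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"job did not finish after {max_polls} polls")


# Usage with a stand-in for the real endpoint call.
_fake_states = iter(["PENDING", "ADMITTED", "RUNNING", "COMPLETE"])
final_state = wait_for_job(lambda: next(_fake_states), sleep=lambda seconds: None)
```

A preempted job returns to PENDING when it is re-enqueued, so the loop keeps waiting through preemption instead of treating it as an end state. After the function returns, a caller would typically list job events and outputs, or retry the job if the state is CANCELLED or FAILED.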

Examples

Find examples demonstrating various aspects of using the Batch Processing API in our archetypeai-batch-examples-volve repository. These examples use real-world sensor data from the Equinor Volve Data Village and show how to:

Prepare data for analysis
Upload data to the platform
Run the batch jobs
Download the output data
Evaluate the output data against ground truth
