The Batch Processing API uses the Jobs Orchestration Service (JOS) to run processing jobs against multiple files. JOS owns the entire lifecycle of the jobs it runs: it validates the configuration, manages job pipelines, handles Kubernetes resource management, monitors the progress of the job, and publishes events. Events are delivered and metrics streamed using Redis streams.
Concepts
Job
A job is a complete work request submitted to JOS using either the Batch Manager in the Console or the Batch Processing API.
Index
An index is a unit of parallel execution within a job. A job’s parallelism parameter specifies the maximum number of indexes that will be used to process that job.
Completion
A completion is a successful resolution of an index.
Task
A task is the finest-grained data processing unit within the Jobs Orchestration Service. A task represents a single item to be processed. For example, a task may handle a single inference request on one input, or one training step on a batch. Tasks are distributed across indexes, and progress tracking and metrics are reported at the task level.
Pipeline
A pipeline is a named, versioned configuration template that defines defaults and requirements for a particular workload. Pipelines are managed by the Jobs Orchestration Service and are referenced by APIs when submitting jobs. The pipeline configuration includes:
- The container image and command, indicating which worker to use.
- Resource defaults such as minimum CPU, memory, and GPU type.
- Queue routing information.
- Timeout and retry policy settings.
- The OpenAPI/JSON schema defining all configuration options and rules.
- A user configuration schema defining a subset of the configuration schema that users are allowed to override when submitting a job. Fields outside this schema are locked to pipeline defaults.
- The default configuration for the pipeline, specifying the default values for all configuration fields, except for the required fields. This is the base configuration that gets merged with the user overrides.
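The relationship between the default configuration, the user configuration schema, and user overrides can be illustrated with a minimal Python sketch. The function name `merge_job_config`, the field names, the shallow merge, and the choice to reject (rather than silently ignore) locked fields are all assumptions made for illustration; the actual merge behavior of JOS is not specified here.

```python
from typing import Any, Dict, Set


def merge_job_config(
    defaults: Dict[str, Any],
    user_overrides: Dict[str, Any],
    user_overridable: Set[str],
) -> Dict[str, Any]:
    """Merge user overrides onto pipeline defaults (illustrative only)."""
    # Fields outside the user configuration schema stay locked to defaults.
    locked = set(user_overrides) - user_overridable
    if locked:
        raise ValueError(f"Fields locked to pipeline defaults: {sorted(locked)}")
    # Shallow merge: the defaults are the base, user overrides win.
    merged = dict(defaults)
    merged.update(user_overrides)
    return merged


# Hypothetical pipeline defaults and user-overridable fields.
defaults = {"batch_size": 32, "timeout_seconds": 3600, "gpu_type": "a100"}
user_overridable = {"batch_size", "input_file_ids"}

config = merge_job_config(
    defaults,
    {"batch_size": 64, "input_file_ids": ["file-1"]},
    user_overridable,
)
print(config)
```

In this sketch, required fields such as the hypothetical `input_file_ids` have no default and must come from the user, while an attempt to override `gpu_type` raises an error because it is not in the overridable set.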
The Batch Job Lifecycle
Each job has a lifecycle, indicated by the job’s status. When a job is first created by using the Create Job endpoint, it begins in the PENDING
state. When it’s admitted into a queue for processing, its state changes to ADMITTED. When a
GPU is available to run the job, its state changes to RUNNING. Cancelled jobs transition to
the CANCELLED state, while successfully completed jobs have the state COMPLETE.
If a higher-priority job is scheduled on the same GPU as an existing job, the existing job
transitions into the PREEMPTED state. It will automatically return to the PENDING state when
it is re-enqueued.
If a job fails, its state transitions to FAILED. This can happen for a variety of reasons such
as invalid input parameters or issues with the supplied data.
FAILED and CANCELLED jobs can be retried using the Retry
Job endpoint. This will attempt to continue the job from its
last checkpoint.
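The lifecycle described above can be summarized as a small state machine. The following Python sketch is one possible reading, not a definitive model: the state names come from the text, but two points are assumptions, namely that a job can be cancelled or can fail from any unfinished state, and that a retried job re-enters the PENDING state. The names `JobState` and `allowed_next` are hypothetical.

```python
from enum import Enum
from typing import Set


class JobState(str, Enum):
    PENDING = "PENDING"
    ADMITTED = "ADMITTED"
    RUNNING = "RUNNING"
    PREEMPTED = "PREEMPTED"
    COMPLETE = "COMPLETE"
    CANCELLED = "CANCELLED"
    FAILED = "FAILED"


# Transitions stated explicitly in the lifecycle description.
DOCUMENTED = {
    JobState.PENDING: {JobState.ADMITTED},
    JobState.ADMITTED: {JobState.RUNNING},
    JobState.RUNNING: {JobState.COMPLETE, JobState.PREEMPTED},
    JobState.PREEMPTED: {JobState.PENDING},
}

FINISHED = {JobState.COMPLETE, JobState.CANCELLED, JobState.FAILED}
RETRYABLE = {JobState.CANCELLED, JobState.FAILED}


def allowed_next(state: JobState) -> Set[JobState]:
    """Return the states a job may move to from `state` in this sketch."""
    next_states = set(DOCUMENTED.get(state, set()))
    if state not in FINISHED:
        # Assumption: cancellation or failure can occur in any unfinished state.
        next_states |= {JobState.CANCELLED, JobState.FAILED}
    elif state in RETRYABLE:
        # Assumption: a retried job re-enters the PENDING state.
        next_states.add(JobState.PENDING)
    return next_states


print(sorted(s.value for s in allowed_next(JobState.RUNNING)))
```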
Running a Batch Using the Console
To run a batch using the console, use the Batch Manager.
Running a Batch Using the Batch Processing API
- Upload the data files to process using the Files API.
- Create the job using the Create Job endpoint. This creates the job and adds the job to the pipeline.
- Monitor the job’s status using the Get Job endpoint. The job has finished running if its state is COMPLETE, CANCELLED, or FAILED.
- If a job needs to be cancelled, use the Cancel Job endpoint. Cancelled or failed jobs can be retried by using the Retry Job endpoint.
- Once the job is COMPLETE or FAILED, you can get a list of the events that occurred on the job using the List Job Events endpoint.
- You can get a list of the output files generated by the job using the List Outputs endpoint. Once you have the list, you can download the artifacts using HTTPS requests.
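The monitoring step can be sketched as a simple polling loop. The Python example below makes no assumptions about the HTTP details of the Get Job endpoint: the caller supplies a `get_state` callable (hypothetical) that returns the job's current state as a string. The function name `wait_for_job`, the poll interval, and the timeout are likewise illustrative choices, not part of the API.

```python
import time
from typing import Callable

# A job has finished running once it reaches one of these states.
FINISHED_STATES = {"COMPLETE", "CANCELLED", "FAILED"}


def wait_for_job(
    get_state: Callable[[], str],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a finished state, then return that state."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in FINISHED_STATES:
            return state
        # PENDING, ADMITTED, RUNNING, and PREEMPTED all mean "keep waiting".
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still in state {state} after timeout")
        sleep(poll_seconds)


# Stand-in for a real Get Job call, so the sketch runs without network access.
states = iter(["PENDING", "ADMITTED", "RUNNING", "COMPLETE"])
final_state = wait_for_job(lambda: next(states), sleep=lambda _: None)
print(final_state)  # COMPLETE
```

In real use, `get_state` would wrap a request to the Get Job endpoint and extract the state from the response. Once the returned state is COMPLETE or FAILED, the events and outputs can be listed as described in the steps above.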
Examples
Find examples demonstrating various aspects of using the Batch Processing API in our archetypeai-batch-examples-volve repository. These examples use real-world sensor data from the Equinor Volve Data Village and show how to:
- Prepare data for analysis
- Upload data to the platform
- Run the batch jobs
- Download the output data
- Evaluate the output data against ground truth