Prerequisites
The instructions below assume you are following best practices and have cloned this git repo into the parent directory ~/atai. If you have already installed the ATAI Python Library and cloned the Cookbook repository, proceed directly to Quick Start.
Before proceeding, check the requirements and the required Python version for the ATAI Python Library. To upgrade your Python client settings, see our Python Client library.
Install Conda
Setup dev environment
Install ATAI Python Library
Clone Cookbook Repository
Quick Start
Analyze video content through natural language queries using the interactive CLI.
Running the Demo
Run the demo from the Cookbook root directory with your conda environment activated.
Sample videos are available in the sample_videos/ directory for testing the demo.
Interactive Prompts
- API Endpoint: Your ArchetypeAI API endpoint (press Enter to use the default)
- Input Type: Choose video (local file) or rtsp (camera stream)
- Source:
  - For video: Path to video file (drag & drop supported)
  - For RTSP: Camera stream URL
- Focus: Your question about the video (e.g., “Is there a person?”, “What’s happening?”)
- Temporal Focus: The interval between analyzed frames in seconds (default: 5)
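
The prompts above amount to a small configuration record. The following is a minimal sketch of how those answers could be normalized and checked before a run; it is not the demo's actual code. The names `DemoInputs` and `parse_demo_inputs` are hypothetical, the API endpoint prompt is left out, and the quote stripping (for drag & drop paths) and the `rtsp://` prefix check are assumptions about reasonable validation.

```python
from dataclasses import dataclass

# Default interval in seconds, as stated in the prompt list above.
DEFAULT_TEMPORAL_FOCUS_S = 5.0


@dataclass
class DemoInputs:
    """Hypothetical container for the answers given at the interactive prompts."""
    input_type: str          # "video" or "rtsp"
    source: str              # file path or stream URL
    focus: str               # natural language question about the video
    temporal_focus_s: float  # seconds between analyzed frames


def parse_demo_inputs(input_type: str, source: str, focus: str,
                      temporal_focus: str = "") -> DemoInputs:
    """Normalize raw prompt answers; raise ValueError on unusable input."""
    kind = input_type.strip().lower()
    if kind not in ("video", "rtsp"):
        raise ValueError("input type must be 'video' or 'rtsp'")

    # Assumption: a dragged-and-dropped path may arrive wrapped in quotes.
    src = source.strip().strip("'\"")
    if not src:
        raise ValueError("source must not be empty")
    # Assumption: camera streams are addressed with an rtsp:// or rtsps:// URL.
    if kind == "rtsp" and not src.lower().startswith(("rtsp://", "rtsps://")):
        raise ValueError("RTSP source must be an rtsp:// URL")

    question = focus.strip()
    if not question:
        raise ValueError("focus must not be empty")

    # An empty answer falls back to the documented default of 5 seconds.
    raw = temporal_focus.strip()
    interval = float(raw) if raw else DEFAULT_TEMPORAL_FOCUS_S
    if interval <= 0:
        raise ValueError("temporal focus must be positive")

    return DemoInputs(kind, src, question, interval)
```

For example, `parse_demo_inputs("video", "sample_videos/clip.mp4", "Is there a person?")` returns a record with the default temporal focus of 5 seconds; the file name `clip.mp4` is a placeholder, not a file known to ship with the Cookbook.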
Example Session
Output
The system provides timestamped natural language descriptions of the video content. Each line corresponds to a frame at the given timestamp, describing what the model observes at that point in the video. For local video files, responses are generated for the duration of the video. For RTSP streams, responses update continuously until you press Ctrl+C to stop monitoring.
Temporal Focus Configuration
The temporal focus defines the interval (in seconds) between analyzed frames in each inference cycle. A larger value means fewer frames are examined, which can reduce detail but speed up processing. A smaller value increases the granularity of analysis but may require more computation.
Recommended Settings
| Setting | Value | Use Case |
|---|---|---|
| Default | 5 | Optimal for most scenarios |
| Short videos | 3 | Ensures sufficient granularity for brief clips |
| Long-form content | 10–15 | Captures broader context and patterns |
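
To make the trade-off between temporal focus and granularity concrete, the sketch below computes which timestamps would be analyzed for a local video of a given length. It assumes evenly spaced sampling that starts at 0 seconds and stops before the end of the video; the demo's actual frame selection may differ, and the function name `sample_timestamps` is hypothetical.

```python
import math


def sample_timestamps(duration_s: float, temporal_focus_s: float = 5.0) -> list[float]:
    """Return the timestamps (in seconds) of frames sampled every temporal_focus_s.

    Assumption: sampling starts at t = 0 and includes only timestamps
    strictly before the end of the video.
    """
    if duration_s < 0:
        raise ValueError("duration must be non-negative")
    if temporal_focus_s <= 0:
        raise ValueError("temporal focus must be positive")

    # Number of sample points that fit before the end of the video.
    count = math.ceil(duration_s / temporal_focus_s)
    return [i * temporal_focus_s for i in range(count)]


if __name__ == "__main__":
    # Compare the settings from the table for a 60-second clip.
    for focus in (3, 5, 10):
        print(focus, len(sample_timestamps(60, focus)))
```

Under these assumptions, a 60-second clip yields 20 analyzed frames at a temporal focus of 3, 12 at the default of 5, and 6 at 10.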