
Prerequisites

The instructions below assume the recommended layout, with this repository cloned into a parent directory at ~/atai. If you have already installed the ATAI Python Library and cloned the cookbook repository, proceed directly to Quick Start.
Before proceeding, check the requirements and supported Python version for the ATAI Python Library; see our Python Client library documentation for upgrade instructions.
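If you are unsure whether an existing setup already matches this layout, a quick check such as the one below can help you decide whether to skip ahead. This is an illustrative sketch only: it assumes the ~/atai parent directory and the repository names used in the steps below, so adjust the paths if your layout differs.

# Illustrative sketch: confirm the Python version and the expected ~/atai layout.
# The directory and repository names are assumptions based on these instructions.
import sys
from pathlib import Path

print("Python:", sys.version.split()[0], "(the dev_env created below uses 3.10)")
root = Path.home() / "atai"
for repo in ("python-client", "archetypeai-cookbook"):
    status = "found" if (root / repo).is_dir() else "missing"
    print(f"{root / repo}: {status}")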

Install Conda

wget https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-x86_64.sh
bash Anaconda3-2022.05-Linux-x86_64.sh
source ~/.bashrc

Setup dev environment

conda create -n dev_env python=3.10
conda activate dev_env

Install ATAI Python Library

git clone git@github.com:archetypeai/python-client.git
cd python-client
python -m pip install .

Clone Cookbook Repository

git clone https://github.com/archetypeai/archetypeai-cookbook.git

Quick Start

Analyze video content through natural language queries using the interactive CLI.

Running the Demo

From the Cookbook root directory with your conda environment activated:
cd command-line-demos/activity-monitor
python quickstart.py
Sample videos are available in the sample_videos/ directory for testing the demo.

Interactive Prompts

  1. API Endpoint: Your ArchetypeAI API endpoint (press Enter to use the default)
  2. Input Type: Choose video (local file) or rtsp (camera stream)
  3. Source:
    • For video: Path to video file (drag & drop supported)
    • For RTSP: Camera stream URL
  4. Focus: Your question about the video (e.g., “Is there a person?”, “What’s happening?”)
  5. Temporal Focus: The interval between analyzed frames in seconds (default: 5)
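These prompts can also be answered non-interactively by piping one response per line, in the order listed above. The sketch below is an illustration rather than part of the cookbook; it assumes quickstart.py reads each answer from standard input, and the video path is only the placeholder from the example session.

# Illustrative sketch: drive the interactive prompts from a script by writing
# one answer per line to stdin, in the order listed above. Assumes quickstart.py
# reads its answers from standard input.
import subprocess

answers = "\n".join([
    "",                                 # API Endpoint (blank = default)
    "video",                            # Input type
    "/Users/PATH/Videos/delivery.mp4",  # Path to video file (placeholder)
    "What's happening?",                # Focus
    "5",                                # Temporal focus in seconds
    "",                                 # Press Enter to start monitoring
]) + "\n"

subprocess.run(["python", "quickstart.py"], input=answers, text=True)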

Example Session

=== Activity Monitor ===

Enter your API Endpoint (Press Enter for default):
Input type (video/rtsp): video
Enter path to video file: /Users/PATH/Videos/delivery.mp4

Enter focus (what to look for): What's happening?
Temporal focus (default: 5):

--- Configuration Summary ---
API Endpoint: https://api.u1.archetypeai.app/v0.5
Input:  VIDEO
Video:  /Users/PATH/Videos/delivery.mp4
Focus:  What's happening?

Press Enter to start monitoring...
Uploading video: /Users/PATH/Videos/delivery.mp4

Monitoring started — looking for: 'What's happening?'
Press Ctrl+C to stop

00:00:05: A FedEx Ground delivery truck is parked on the street, and a person is walking towards it with a package.
00:00:06: A FedEx Ground delivery truck is parked on the street, and a person is walking towards it with a package.
00:00:07: A FedEx Ground delivery truck is parked on the street, and a person is walking towards it with a package.
...
Stopped.
Session finished.

Output

The system provides timestamped natural language descriptions of the video content. Each line corresponds to a frame at the given timestamp, describing what the model observes at that point in the video. For local video files, responses are generated for the duration of the video. For RTSP streams, responses update continuously until you press Ctrl+C to stop monitoring.
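If you capture this output to a file, the lines are straightforward to post-process. Below is a minimal parsing sketch, assuming each line follows the HH:MM:SS: description format shown in the example session above.

# Minimal parsing sketch (assumes each output line looks like "HH:MM:SS: description").
import re

line = "00:00:05: A FedEx Ground delivery truck is parked on the street."
match = re.match(r"^(\d{2}):(\d{2}):(\d{2}):\s*(.*)$", line)
if match:
    hours, minutes, seconds = (int(part) for part in match.groups()[:3])
    description = match.group(4)
    offset_seconds = hours * 3600 + minutes * 60 + seconds
    print(offset_seconds, description)  # 5 A FedEx Ground delivery truck ...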

Temporal Focus Configuration

The temporal focus defines the interval (in seconds) between analyzed frames in each inference cycle. A larger value means fewer frames are examined, which can reduce detail but speed up processing. A smaller value increases the granularity of analysis but may require more computation.
Setting              Value   Use Case
Default              5       Optimal for most scenarios
Short videos         3       Ensures sufficient granularity for brief clips
Long-form content    10–15   Captures broader context and patterns
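As a rough guide, the number of frames analyzed for a local video is its duration divided by the temporal focus. The sketch below is a back-of-the-envelope estimate; the 120-second duration is an arbitrary example value, not taken from the demo.

# Back-of-the-envelope estimate: frames analyzed ≈ video duration / temporal focus.
# The 120-second duration is an arbitrary example value.
import math

video_duration_s = 120
for temporal_focus_s in (3, 5, 10, 15):
    frames = math.ceil(video_duration_s / temporal_focus_s)
    print(f"temporal focus {temporal_focus_s:>2}s -> ~{frames} frames analyzed")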