Backends: MoveNet / Whisper / STGCN — Connection steps

> [!NOTE]
> For the latest implementation status, please refer to Functional Implementation Status (Remaining Functionality).

This document summarizes the steps to set up an environment for the real model backends (MoveNet/MediaPipe, Whisper/faster_whisper, Torch-STGCN) and to use the *_real backends in evospikenet.video_analysis.backends.

1) Additional dependencies

The basic dependencies are listed in requirements.txt. To enable the real backends, additionally install the following:

  • MoveNet / MediaPipe (Pose)
    python3 -m pip install mediapipe
  • Whisper (faster_whisper recommended)
    python3 -m pip install faster-whisper
  • Torch + STGCN model (CPU or CUDA environment recommended)
    python3 -m pip install torch
    # Prepare the STGCN model in TorchScript format and point the
    # VIDEO_ANALYSIS_STGCN_MODEL environment variable at it:
    export VIDEO_ANALYSIS_STGCN_MODEL=/path/to/stgcn_model.pt
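After installing, you can check which optional packages are actually importable. The sketch below is a minimal, hypothetical probe (the function name `probe_optional_backends` and the role-to-package mapping are illustrative, not part of the evospikenet API); it uses `importlib.util.find_spec`, which checks for a package without executing its module code.

```python
# Hedged sketch: probe which optional dependencies are installed.
# find_spec() locates a package without importing/executing it.
from importlib.util import find_spec

def probe_optional_backends():
    """Return a dict mapping each backend role to an availability flag."""
    packages = {
        "pose": "mediapipe",      # MoveNet / MediaPipe pose backend
        "asr": "faster_whisper",  # Whisper ASR backend
        "action": "torch",        # Torch runtime for the STGCN action model
    }
    return {role: find_spec(mod) is not None for role, mod in packages.items()}

print(probe_optional_backends())
```

A missing package simply yields `False` for its role, so this probe is safe to run in any environment.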

2) Environment variables/settings

  • VIDEO_ANALYSIS_WHISPER_MODEL : Whisper model name (e.g. tiny)
  • VIDEO_ANALYSIS_WHISPER_DEVICE : cpu or cuda
  • VIDEO_ANALYSIS_STGCN_MODEL : path to the TorchScript STGCN model

Alternatively, add these settings to Docs/video_analysis_config.yaml or a settings.*.yaml file.
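How these variables are resolved can be sketched as follows. This is a minimal illustration, not evospikenet's actual settings loader; the function name `resolve_backend_settings` and the default values (`tiny`, `cpu`) are assumptions based on the examples above.

```python
# Hedged sketch: resolve the backend environment variables with fallbacks.
# The real loader (and YAML override behavior) may differ.
import os
from pathlib import Path

def resolve_backend_settings():
    """Read the three backend settings from the environment."""
    model_path = os.environ.get("VIDEO_ANALYSIS_STGCN_MODEL")  # may be unset
    return {
        "whisper_model": os.environ.get("VIDEO_ANALYSIS_WHISPER_MODEL", "tiny"),
        "whisper_device": os.environ.get("VIDEO_ANALYSIS_WHISPER_DEVICE", "cpu"),
        # The STGCN backend has no sensible default: without a model path
        # it should simply be reported as unavailable.
        "stgcn_model": Path(model_path) if model_path else None,
    }
```

Note the asymmetry: Whisper settings have usable defaults, while the STGCN TorchScript model must be provided explicitly.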

3) Operation confirmation (smoke)

You can list the available backends by running the following from the repository root:

python3 tools/smoke_backends.py

The output reports an available flag for each backend role (pose / action / asr).

4) Discovery in CI

Because a real GPU cannot always be provisioned in CI, it is recommended that tools/smoke_backends.py only report whether each real backend is available, rather than treating availability as a hard requirement.
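This non-blocking policy can be sketched as a CI step that prints per-role availability and always exits 0. The function name `report_backends` and the output format are assumptions for illustration; the real tools/smoke_backends.py may format its report differently.

```python
# Hedged sketch: report backend availability without failing the CI job.
import sys

def report_backends(flags):
    """Print an availability line per backend role and return exit code 0,
    so CI runners without a GPU or the optional packages never fail here."""
    for role, ok in sorted(flags.items()):
        print(f"{role}: {'available' if ok else 'unavailable'}")
    return 0  # availability is informational, never a hard requirement

if __name__ == "__main__":
    # Example with all real backends missing, as on a bare CI runner.
    sys.exit(report_backends({"pose": False, "action": False, "asr": False}))
```

Returning 0 unconditionally keeps the pipeline green while the printed log still shows which real backends the environment supports.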