# EvoSpikeNet Inference Pipeline Implementation Plan
> [!NOTE]
> For the latest implementation status, refer to Functional Implementation Status (Remaining Functionality).
Target version: EvoSpikeNet v4.0
Creation date: 2026-04-01
Author: Masahiro Aoki / Moonlight Technologies Inc.
## Table of Contents

- [1. Overview / Purpose](#1-overview--purpose)
- [2. Overall System Architecture](#2-overall-system-architecture)
- [3. Module Configuration](#3-module-configuration)
- [4. Inference Pipeline Details](#4-inference-pipeline-details)
  - [4.1 Text Inference Sequence](#41-text-inference-sequence)
  - [4.2 Multimodal Inference Sequence](#42-multimodal-inference-sequence)
  - [4.3 RAG Inference Sequence](#43-rag-inference-sequence)
  - [4.4 Distributed Brain Inference Sequence](#44-distributed-brain-inference-sequence)
- [5. Startup / Settings](#5-startup--settings)
  - [5.1 Environment Variable List](#51-environment-variable-list)
  - [5.2 Configuration File Structure](#52-configuration-file-structure)
  - [5.3 Startup Procedure](#53-startup-procedure)
- [6. API Endpoint List](#6-api-endpoint-list)
- [7. TAS Encoding Details](#7-tas-encoding-details)
- [8. Memory / Cache Management](#8-memory--cache-management)
- [9. Implementation Checklist (TODO)](#9-implementation-checklist-todo)
- [10. Future Expansion Plan](#10-future-expansion-plan)
## 1. Overview / Purpose
EvoSpikeNet is a large-scale evolutionary AI framework built around spiking neural networks (SNNs).
This document describes the implementation plan for a "large-scale AI inference service" that uses trained models to return inference results for multimodal inputs such as text, image, audio, and EEG.
### Design principles

| Principle | Description |
| --- | --- |
| Spike-driven | TAS encoding (patent MT25-EV002) converts all inputs into spike trains |
| Multimodal integration | Text, image, and audio encoders are integrated through a Fusion layer |
| Distributed inference | Multi-node distributed processing over Zenoh messaging |
| RAG extension | Switchable VectorDB adapters (InMemory / FAISS / Milvus / Chroma / Qdrant) plus retrieval-augmented generation |
| Async pipeline | Parallel batch processing with AsyncPipeline |
## 2. Overall System Architecture
```mermaid
graph TB
    subgraph "Input Layer"
        P[Prompt<br/>Text]
        I[Image]
        A[Audio<br/>Audio/MFCC]
        E[EEG signal]
        W[WebSocket<br/>Real-time]
    end
    subgraph "API Server (FastAPI :8000)"
        GW[API Gateway<br/>Security / Rate limits<br/>api.py]
        GEN[Text generation<br/>POST /api/generate]
        MM[Multimodal<br/>POST /api/multimodal]
        RAG2[RAG query<br/>POST /api/rag/query]
        MEM[Memory operations<br/>/api/memory/*]
        PIP[Pipeline<br/>/api/pipeline/submit]
    end
    subgraph "Inference Core"
        TAS[TASEncoderDecoder<br/>Spike encoding]
        VIS[SpikingEvoVisionEncoder<br/>CNN → Spike]
        AUD[SpikingEvoAudioEncoder<br/>MFCC → Spike]
        FUS[Fusion Layer<br/>Modality integration]
        STB[SpikingTransformerBlocks<br/>LIF neuron processing]
        DEC[Output FC<br/>Logits → Text]
    end
    subgraph "Model Classes"
        STLM[SpikingEvoTextLM<br/>Text only]
        MMLM[SpikingEvoMultiModalLM<br/>Text + Image + Audio]
        EMBED[SNNEmbeddingModel<br/>RAG embedding]
    end
    subgraph "Memory & Storage"
        EPI[EpisodicMemoryNode<br/>Episodic memory]
        SEM[SemanticMemoryNode<br/>Semantic memory]
        INTEG[MemoryIntegratorNode<br/>Memory integration]
        DB[(PostgreSQL<br/>Model artifacts)]
        VDB[(VectorDB Adapter<br/>InMemory/FAISS/Milvus/Chroma/Qdrant)]
        ES[(Elasticsearch<br/>Full-text search)]
    end
    subgraph "Distributed Network"
        ZR[Zenoh Router<br/>:7447]
        N1[Brain Node<br/>prefrontal]
        N2[Brain Node<br/>hippocampus]
        N3[Brain Node<br/>cerebellum]
        N4[Brain Node<br/>motor_cortex]
    end
    subgraph "External Services"
        REDIS[Redis<br/>Pub/Sub]
        OPA[OPA<br/>Authorization policy]
        RAG_SRV[RAG API Server<br/>:8001]
    end
    P --> GW
    I --> GW
    A --> GW
    E --> GW
    W --> GW
    GW --> GEN & MM & RAG2 & MEM & PIP
    GEN --> STLM
    MM --> MMLM
    RAG2 --> RAG_SRV
    STLM --> TAS --> STB --> DEC
    MMLM --> TAS
    MMLM --> VIS
    MMLM --> AUD
    TAS & VIS & AUD --> FUS --> STB --> DEC
    EMBED --> VDB
    GEN & MM --> EPI & SEM
    EPI & SEM --> INTEG
    DB -- model load --> STLM & MMLM
    GW <--> ZR
    ZR <--> N1 & N2 & N3 & N4
    GW --> REDIS
    GW --> OPA
    classDef core fill:#1a6fd6,stroke:#0d4ca0,color:#fff
    classDef model fill:#d65a1a,stroke:#a04010,color:#fff
    classDef storage fill:#1a9655,stroke:#0d6e3e,color:#fff
    classDef external fill:#9629b5,stroke:#701985,color:#fff
    class TAS,VIS,AUD,FUS,STB,DEC core
    class STLM,MMLM,EMBED model
    class EPI,SEM,INTEG,DB,VDB,ES storage
    class REDIS,OPA,RAG_SRV external
```
## 3. Module Configuration
```mermaid
graph LR
    subgraph PKG["evospikenet/"]
        direction TB
        API[api.py<br/>FastAPI entry point]
        subgraph APIM["api_modules/"]
            T_API[training_api.py]
            RAG_API[rag_api.py]
            MEM_API[memory_api.py]
            PIP_API[pipeline_api.py]
            DIST_API[distributed_brain_api.py]
            EEG_API[eeg_api.py]
            EVO_API[evolution_api.py]
        end
        subgraph CORE["Core models"]
            MODELS[models.py<br/>SpikingEvoTextLM<br/>SpikingEvoMultiModalLM<br/>TransformerLM]
            ENC[encoding.py<br/>TASEncoderDecoder]
            VIS_M[vision.py<br/>SpikingEvoVisionEncoder]
            AUD_M[audio.py<br/>SpikingEvoAudioEncoder]
            ATT[attention.py<br/>SpikingTransformerBlock]
        end
        subgraph INFRA["Infrastructure"]
            BATCH[batch_optimizer.py<br/>DynamicBatchProcessor]
            MEMM[memory_manager.py<br/>MemoryManager]
            TC[tensor_cache.py<br/>TensorCache]
            LB[load_balancer.py<br/>AILoadBalancer]
            ASYNC[async_pipeline.py<br/>AsyncPipeline]
        end
        subgraph COMM["Distributed communication"]
            ZENOH[zenoh_comm.py<br/>zenoh_async.py]
            DBN[distributed_brain_node.py<br/>DistributedBrainNode]
        end
        subgraph MEMS["Memory system"]
            EPIM[episodic_memory.py]
            LTMM[long_term_memory.py]
            MN[memory_nodes.py<br/>EpisodicMemoryNode<br/>SemanticMemoryNode]
        end
        subgraph RAGS["RAG"]
            RAG_C[rag_client.py]
            RAG_B[rag_backends.py]
            SNN_RAG[snn_rag.py<br/>SNNEmbeddingModel]
        end
    end
    API --> APIM
    API --> CORE
    API --> INFRA
    API --> COMM
    API --> MEMS
    API --> RAGS
```
### Module responsibilities

| File | Class / Function | Responsibility |
| --- | --- | --- |
| models.py | SpikingEvoTextLM | Text-only SNN language model (training / inference) |
| models.py | SpikingEvoMultiModalLM | Text + image + audio multimodal SNN |
| models.py | TransformerLM | Standard Transformer (distillation teacher model) |
| models.py | SNNEmbeddingModel | Text embedding model for RAG |
| encoding.py | TASEncoderDecoder | TAS spike encoding (patent MT25-EV002) |
| vision.py | SpikingEvoVisionEncoder | Image → spike conversion CSNN |
| audio.py | SpikingEvoAudioEncoder | MFCC → spike conversion SNN |
| attention.py | SpikingTransformerBlock | Spiking self-attention block |
| batch_optimizer.py | DynamicBatchProcessor | Dynamic batch-size optimization |
| memory_manager.py | MemoryManager | GPU/CPU memory efficiency |
| tensor_cache.py | TensorCache | Tensor cache (inference acceleration) |
| load_balancer.py | AILoadBalancer | AI-predictive load balancing |
| async_pipeline.py | AsyncPipeline | Asynchronous parallel processing pipeline |
| api_modules/training_api.py | router | Training API (background launch) |
| api_modules/rag_api.py | router | RAG proxy API |
| api_modules/memory_api.py | router | Episodic / semantic memory API |
| distributed_brain_node.py | DistributedBrainNode | Distributed brain node |
| zenoh_async.py | AsyncZenohCommunicator | Zenoh asynchronous communication |
| snn_rag.py | SNNEmbeddingModel | SNN vector embedding |
## 4. Inference Pipeline Details

### 4.1 Text Inference Sequence
```mermaid
sequenceDiagram
    actor Client
    participant GW as API Gateway<br/>(api.py)
    participant SM as SecurityMiddleware<br/>RateLimiter / OPA
    participant BP as DynamicBatchProcessor
    participant MM as MemoryManager
    participant LM as SpikingEvoTextLM
    participant TAS as TASEncoderDecoder
    participant STB as SpikingTransformerBlocks
    participant LIF as LIFNeuronLayer (lif_out)
    participant FC as output_fc (Linear)
    participant TOK as BertTokenizer
    participant DB as PostgreSQL<br/>(DataArtifact)
    participant EMN as EpisodicMemoryNode
    Note over DB,LM: At startup: load_model_and_tokenizer() (background thread)
    DB -->> LM: checkpoint.pth (state_dict)
    DB -->> TOK: tokenizer_state / bert-base-uncased
    Client->>GW: POST /api/generate<br/>{"prompt": "...", "max_length": 100}
    GW->>SM: Security checks
    SM-->>GW: OK (API key / rate limit pass)
    GW->>TOK: tokenize(prompt)
    TOK-->>GW: prompt_tokens: Tensor[1, seq_len]
    GW->>BP: process_batch(prompt_tokens, inference_func)
    BP->>MM: optimize_tensor_memory(prompt_tokens)
    MM-->>BP: optimized tensor
    loop max_new_tokens times
        BP->>LM: model.generate(prompt_tokens, max_new_tokens)
        LM->>TAS: encode(tokens)<br/>λ=σ(E), φ=pos×Δφ → spike_trains[batch, time, seq, dim]
        TAS-->>LM: spike_trains
        loop SpikingTransformerBlock × N
            LM->>STB: forward(spike_trains)
            STB-->>LM: processed_spikes
        end
        LM->>LIF: reset(batch*seq), forward(spike_trains_flat[step])
        LIF-->>LM: output_spikes (sum over time = rate code)
        LM->>FC: linear(output_potential_sum)
        FC-->>LM: logits[batch, seq, vocab_size]
        LM->>LM: softmax + multinomial sampling (temperature, top_k)
    end
    LM-->>GW: generated_tokens[1, seq_len + max_new_tokens]
    GW->>TOK: decode(generated_tokens, skip_special_tokens=True)
    TOK-->>GW: generated_text: str
    GW->>BP: optimize_tensor_memory(generated_tokens)
    GW->>EMN: store_episodic_memory(generated_text)
    GW-->>Client: {"generated_text": "...", "prompt": "..."}
```
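The request/response contract above can be exercised end to end with a short client. A minimal sketch, assuming a local server started as in Section 5.3 and the default `test-api-key` from Section 5.1; field names follow the diagram and the curl test in Pattern B.

```python
# Minimal client for POST /api/generate (host and API key are assumptions;
# adjust for your deployment).
import requests

resp = requests.post(
    "http://localhost:8000/api/generate",
    headers={"X-API-Key": "test-api-key"},
    json={"prompt": "What is a spiking neural network?", "max_length": 100},
    timeout=120,  # autoregressive SNN generation can be slow on CPU
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```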
### 4.2 Multimodal Inference Sequence
```mermaid
sequenceDiagram
    actor Client
    participant GW as API Gateway
    participant MM_LM as SpikingEvoMultiModalLM
    participant TAS as TASEncoderDecoder<br/>(text)
    participant VIS as SpikingEvoVisionEncoder<br/>(CSNN)
    participant AUD as SpikingEvoAudioEncoder<br/>(SNN)
    participant FUS as Fusion Layer<br/>(Linear + LayerNorm)
    participant STB as SpikingTransformerBlocks
    participant DEC as Output FC
    Client->>GW: POST /api/multimodal<br/>text + image (base64) + audio (base64)
    Note over GW: Image: base64 → PIL → Tensor[B,C,H,W]<br/>Audio: base64 → waveform → MFCC Tensor[B,T,13]
    par Text encoding
        GW->>TAS: encode(token_ids)
        TAS-->>MM_LM: text_spikes[batch, time, seq, dim]
    and Image encoding
        GW->>VIS: forward(image_tensor)
        Note over VIS: conv1→LIF1→pool1<br/>conv2→LIF2→pool2<br/>flatten→fc_lif
        VIS-->>MM_LM: image_spikes[batch, time, dim]
    and Audio encoding
        GW->>AUD: forward(mfcc_tensor)
        AUD-->>MM_LM: audio_spikes[batch, time, dim]
    end
    MM_LM->>FUS: concat([text_spikes, image_spikes, audio_spikes])
    FUS->>FUS: Linear projection + LayerNorm
    FUS-->>STB: fused_spikes[batch, time, seq, dim]
    loop SpikingTransformerBlock × N
        STB->>STB: SpikingMultiHeadAttention + SNN-FFN
    end
    STB-->>DEC: processed_spikes
    DEC->>DEC: rate-code decode → logits
    DEC-->>GW: generated_text
    GW-->>Client: {"generated_text": "...", "modalities_used": ["text","image","audio"]}
```
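The gateway preprocessing in the note above might look like the following sketch. The target tensor shapes and the 13 MFCC coefficients come from the diagram; the 224×224 resize and the specific torchvision/torchaudio transforms are illustrative assumptions, not the gateway's actual code.

```python
# Sketch: base64 image -> PIL -> Tensor[B,C,H,W], base64 audio -> waveform ->
# MFCC Tensor[B,T,13]. Transform choices are assumptions.
import base64
import io

import torch
import torchaudio
from PIL import Image
from torchvision import transforms

def decode_image(image_b64: str) -> torch.Tensor:
    img = Image.open(io.BytesIO(base64.b64decode(image_b64))).convert("RGB")
    to_tensor = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    return to_tensor(img).unsqueeze(0)  # [1, C, H, W]

def decode_audio(audio_b64: str) -> torch.Tensor:
    # torchaudio.load accepts file-like objects with most backends (e.g. soundfile)
    waveform, sr = torchaudio.load(io.BytesIO(base64.b64decode(audio_b64)))
    mfcc = torchaudio.transforms.MFCC(sample_rate=sr, n_mfcc=13)(waveform)
    return mfcc.transpose(1, 2)  # [1, T, 13], matching the diagram
```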
### 4.3 RAG Inference Sequence
```mermaid
sequenceDiagram
    actor Client
    participant GW as API Gateway
    participant RAG_P as RAG Proxy<br/>(/api/rag/query)
    participant RAG_SRV as RAG API Server<br/>(:8001)
    participant EMBED as SNNEmbeddingModel
    participant VDB as VectorDB Adapter
    participant ES as Elasticsearch
    participant LM as SpikingEvoTextLM
    Note over VDB,ES: Beforehand: training data / documents are indexed
    Client->>GW: POST /api/rag/documents/text<br/>{"text": "Knowledge text..."}
    GW->>RAG_SRV: forward document
    RAG_SRV->>EMBED: encode(text) → vector[768]
    EMBED->>EMBED: TASEncode → SpikingTransformer → mean pooling
    EMBED-->>RAG_SRV: embedding vector
    RAG_SRV->>VDB: upsert(id, vector, metadata)
    RAG_SRV->>ES: index(document)
    RAG_SRV-->>GW: {"status": "indexed"}
    GW-->>Client: 200 OK
    Client->>GW: POST /api/rag/query<br/>{"query": "How is EvoSpikeNet trained?", "k": 5}
    GW->>RAG_P: proxy request
    RAG_P->>RAG_SRV: POST /query
    RAG_SRV->>EMBED: encode(query) → query_vector
    RAG_SRV->>VDB: search(query_vector, top_k=5)
    VDB-->>RAG_SRV: retrieved_docs[5]
    RAG_SRV->>ES: fulltext_search(query)
    ES-->>RAG_SRV: es_docs
    RAG_SRV->>RAG_SRV: hybrid rerank (vector + BM25)
    RAG_SRV->>LM: generate(context + query)
    LM-->>RAG_SRV: answer_text
    RAG_SRV-->>RAG_P: {"answer": "...", "sources": [...]}
    RAG_P-->>GW: response
    GW-->>Client: RAG-augmented inference result
```
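The two round trips above (ingest, then query) map onto two POST requests. A minimal sketch, assuming the same `X-API-Key` authentication as `/api/generate`; request and response field names are taken from the diagram.

```python
# Ingest a document, then ask a question against it via the RAG proxy.
import requests

BASE = "http://localhost:8000"
HEADERS = {"X-API-Key": "test-api-key"}

# 1. Register a text document (embedded by SNNEmbeddingModel, stored via the
#    VectorDB adapter and indexed in Elasticsearch)
requests.post(f"{BASE}/api/rag/documents/text", headers=HEADERS,
              json={"text": "Knowledge text..."}).raise_for_status()

# 2. Hybrid retrieval (vector + BM25) followed by generation
resp = requests.post(f"{BASE}/api/rag/query", headers=HEADERS,
                     json={"query": "How is EvoSpikeNet trained?", "k": 5})
result = resp.json()
print(result["answer"])
print(result["sources"])
```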
### 4.4 Distributed Brain Inference Sequence
```mermaid
sequenceDiagram
    actor Client
    participant API as API Server<br/>(:8000)
    participant ZR as Zenoh Router<br/>(:7447)
    participant N_PFC as BrainNode<br/>prefrontal
    participant N_HPC as BrainNode<br/>hippocampus
    participant N_CRB as BrainNode<br/>cerebellum
    participant N_MOT as BrainNode<br/>motor_cortex
    participant MEM_INT as MemoryIntegratorNode
    Client->>API: POST /api/distributed_brain/query<br/>{"prompt": "...", "nodes": ["prefrontal","hippocampus"]}
    API->>ZR: publish("evospikenet/brain/query", {prompt_id, prompt})
    par Parallel brain-node processing
        ZR->>N_PFC: query
        N_PFC->>N_PFC: BrainSimulation.forward()<br/>→ SpikingTransformerBlock processing
        N_PFC-->>ZR: publish("evospikenet/brain/result", {node_id, partial_result})
    and
        ZR->>N_HPC: query
        N_HPC->>N_HPC: LongTermMemoryModule lookup<br/>→ memory retrieval + inference
        N_HPC-->>ZR: partial_result
    and
        ZR->>N_CRB: query
        N_CRB->>N_CRB: motor-control pattern inference
        N_CRB-->>ZR: partial_result
    end
    ZR-->>API: on_result callback (subscribe "evospikenet/api/result")
    API->>API: write result → /tmp/evospikenet_query_result_{prompt_id}.json
    API->>MEM_INT: integrate(partial_results)
    MEM_INT->>MEM_INT: EpisodicMemoryNode + SemanticMemoryNode integration
    API-->>Client: {"merged_response": "...", "contributing_nodes": [...]}
    Note over N_PFC,N_MOT: Each node shares and converges on results<br/>in real time via Zenoh
```
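For orientation, the query/result key expressions in the diagram could be driven with the eclipse-zenoh Python bindings roughly as follows. This is a sketch only: the project's real transport lives in `zenoh_comm.py` / `zenoh_async.py`, and payload handling differs between zenoh releases.

```python
# Pub/sub sketch on the key expressions from the diagram. The zenoh Python API
# has changed across versions; treat the payload decoding below as approximate.
import json
import zenoh

session = zenoh.open(zenoh.Config())  # production config points at tcp/zenoh-router:7447

def on_query(sample):
    # Payload decoding varies by zenoh version (raw bytes vs. ZBytes).
    query = json.loads(bytes(sample.payload))
    partial = {"node_id": "prefrontal", "partial_result": "..."}
    session.put("evospikenet/brain/result", json.dumps(partial))

# A brain node subscribes to incoming queries...
sub = session.declare_subscriber("evospikenet/brain/query", on_query)

# ...and the API server publishes a query, then collects partial results
# on "evospikenet/api/result" (collection callback omitted here).
session.put("evospikenet/brain/query", json.dumps({"prompt_id": 1, "prompt": "..."}))
```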
## 5. Startup / Settings

### 5.1 Environment Variable List
| Variable | Default | Description |
| --- | --- | --- |
| DATABASE_URL | sqlite:///./evospikenet.db | DB connection string |
| EVOSPIKENET_API_KEY | test-api-key | API authentication key |
| EVOSPIKENET_API_KEYS | test-api-key | Multiple keys, comma-separated |
| EVOSPIKENET_ALLOW_NO_AUTH | false | Skip authentication (development only) |
| LOG_LEVEL | INFO | Log level (DEBUG/INFO/WARNING/ERROR) |
| DEVICE | cpu | Inference device (cpu / cuda) |
| UVICORN_WORKERS | 4 | Number of uvicorn workers |
| RAG_VECTOR_DB_BACKEND | inmemory | RAG vector DB backend (inmemory / faiss / milvus / chroma / qdrant) |
| MILVUS_HOST | milvus-standalone | Milvus host |
| ELASTICSEARCH_HOST | elasticsearch | Elasticsearch host |
| RAG_API_URL | http://rag-api:8001 | RAG API server URL |
| DISTRIBUTED_WS_REDIS | redis://redis:6379 | Redis for WebSocket Pub/Sub |
| EVOSPIKENET_OPA_URL | (not set) | OPA policy server URL |
| EVOSPIKENET_OPA_ENABLED | false | Enable OPA authorization |
| NODE_ID | api_node | Distributed node identifier |
| ACTIVE_RANKS | 0,1,2,3,4,5,6 | Active brain-node ranks |
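As a sketch of how these variables are typically consumed at startup (names and defaults from the table above; the application's actual resolution logic may differ):

```python
# Resolve a few of the variables above with their documented defaults.
import os

DEVICE = os.getenv("DEVICE", "cpu")                                  # cpu / cuda
VECTOR_DB = os.getenv("RAG_VECTOR_DB_BACKEND", "inmemory")           # inmemory / faiss / ...
API_KEYS = os.getenv("EVOSPIKENET_API_KEYS", "test-api-key").split(",")
ALLOW_NO_AUTH = os.getenv("EVOSPIKENET_ALLOW_NO_AUTH", "false").lower() == "true"
```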
### 5.2 Configuration File Structure
```text
config/
├── settings.yaml               # Default settings
├── settings.development.yaml   # Development environment override
├── settings.staging.yaml       # Staging environment
├── settings.production.yaml    # Production environment
├── training_config.yaml        # Training hyperparameters
├── connectome_config.yaml      # Connectome connection settings
└── node_allocation.yaml        # Distributed node assignment settings
```
Main parameters in settings.yaml:

```yaml
# Model settings
model:
  default_device: "cpu"   # switch to "cuda" for GPU inference
  enable_gpu: false
  hidden_size: 256        # d_model
  num_layers: 4           # number of SpikingTransformerBlocks
  num_heads: 8            # number of attention heads
  batch_size: 32

# API server settings
api:
  host: "0.0.0.0"
  port: 8000
  workers: 4
  max_request_size: 104857600  # 100 MB

# Zenoh distributed communication
zenoh:
  router_address: "tcp/zenoh-router:7447"
  mode: "client"
```
### 5.3 Startup Procedure

Pattern A: Local development (uvicorn launched directly)

```bash
# 1. Virtual environment setup
cd /home/maoki/GitHub
source .venv/bin/activate

# 2. Install dependencies
cd EvoSpikeNet-Core
pip install -e ".[dev]"

# 3. DB migration
alembic upgrade head

# 4. Environment variables
export DATABASE_URL="sqlite:///./evospikenet_dev.db"
export EVOSPIKENET_ALLOW_NO_AUTH=true
export LOG_LEVEL=DEBUG
export DEVICE=cpu

# 5. Start the API server
uvicorn evospikenet.api:app \
  --host 0.0.0.0 \
  --port 8000 \
  --reload \
  --log-level debug

# → Swagger UI at http://localhost:8000/docs
# → The model is loaded automatically from the DB (requires trained artifacts)
```
Pattern B: From training to inference

```bash
# Step 1: Train the text model
python examples/train_spiking_evospikenet_lm.py \
  --source wikipedia \
  --page "Artificial_intelligence" \
  --lang en \
  --epochs 10 \
  --d-model 256 \
  --n-heads 4 \
  --num-blocks 4 \
  --run-name my_model_v1 \
  --upload-to-db

# Step 2: Train the multimodal model
python examples/train_multi_modal_lm.py \
  --csv data/captions.csv \
  --img-dir data/images/ \
  --epochs 5 \
  --run-name multimodal_v1

# Step 3: Start the inference server
uvicorn evospikenet.api:app --host 0.0.0.0 --port 8000

# Step 4: Text generation test
curl -X POST http://localhost:8000/api/generate \
  -H "Content-Type: application/json" \
  -H "X-API-Key: test-api-key" \
  -d '{"prompt": "What is a spiking neural network?", "max_length": 100}'

# Step 5: Generation directly from a script
python examples/run_spiking_lm_generation.py \
  --run-name my_model_v1 \
  --prompt "About the future of AI" \
  --max-new-tokens 100 \
  --temperature 0.8 \
  --top-k 40
```
Pattern C: Docker Compose (full environment)

```bash
# GPU environment (CUDA inference)
docker-compose -f docker-compose.gpu.yml up -d

# CPU-only environment
docker-compose -f docker-compose.cpu-only.yml up -d

# Distributed brain environment (multi-node)
docker-compose -f docker-compose.distributed.yml up -d

# Service check
docker-compose ps
# evospikenet-api          :8000  (FastAPI)
# evospikenet-frontend     :8050  (Dash UI)
# evospikenet-postgres     :5432
# evospikenet-redis        :6379
# evospikenet-zenoh-router :7447
# milvus-standalone        :19530
# elasticsearch            :9200
# rag-api                  :8001
```
Pattern D: Standalone inference scripts

```bash
# Text generation (directly from a trained model)
python examples/run_spiking_lm_generation.py \
  --run-name <run_name> \
  --prompt "Prompt text" \
  --max-new-tokens 200 \
  --temperature 0.8

# Multimodal inference demo
python examples/evaluate_multi_modal_lm.py \
  --model-path saved_models/multimodal_v1/

# RAG query demo
python examples/rag_ingest_and_query.py
```
## 6. API Endpoint List

### Text inference

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/generate | Text generation (SpikingEvoTextLM) |
| GET | /api/model_status | Check model loading status |
### Multimodal inference (planned)

| Method | Path | Description | Status |
| --- | --- | --- | --- |
| POST | /api/multimodal/generate | Text + image + audio multimodal inference | TODO |
| POST | /api/vision/encode | Image spike encoding | TODO |
| POST | /api/audio/encode | Audio → MFCC → spike conversion | TODO |
### RAG

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/rag/query | RAG query (retrieval + generation) |
| GET | /api/rag/documents | List documents |
| POST | /api/rag/documents/text | Register a text document |
| POST | /api/rag/documents/file | Upload a file |
### Foundation extensions (implemented)

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/foundation/status | Check enablement status and default settings of the foundation features |
| POST | /api/foundation/scale-benchmark | Run a scale benchmark |
| POST | /api/foundation/secure-store/roundtrip | Round-trip validation of the encrypted at-rest store |
| POST | /api/foundation/zero-trust/evaluate | Zero-trust evaluation |
| POST | /api/foundation/meta-learning/step | MAML few-shot update step |
| POST | /api/foundation/xai/explain | XAI explanation generation + failsafe evaluation |
### Memory system

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/memory/episodic/store | Store episodic memory |
| GET | /api/memory/episodic/retrieve | Retrieve episodic memory |
| POST | /api/memory/semantic/store | Store semantic memory |
| GET | /api/memory/semantic/retrieve | Retrieve semantic memory |
### Asynchronous pipeline

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/pipeline/submit | Submit a task (NORMAL/HIGH/CRITICAL) |
| GET | /api/pipeline/metrics | Pipeline metrics |
| GET | /api/pipeline/status | Pipeline operating status |
### Training

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/train/spiking-lm | Start SpikingEvoTextLM training |
| POST | /api/train/transformer | Start TransformerLM training |
| POST | /api/train/distillation | ANN→SNN distillation training |
| GET | /api/train/status/{session_id} | Training session status |
### WebSocket streaming

| Protocol | Path | Description |
| --- | --- | --- |
| WS | /ws/audio | Real-time audio streaming |
| WS | /ws/video | Real-time video streaming |
| WS | /ws/brain_stream | Real-time distribution of distributed brain state |
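A client for the streaming endpoints can be as small as the sketch below, shown for `/ws/brain_stream`; whether the WebSocket routes require the API key header is not specified here, so authentication is omitted.

```python
# Subscribe to real-time distributed brain state over /ws/brain_stream.
import asyncio
import websockets

async def watch_brain_stream() -> None:
    async with websockets.connect("ws://localhost:8000/ws/brain_stream") as ws:
        while True:
            message = await ws.recv()  # one brain-state update per message
            print(message)

asyncio.run(watch_brain_stream())
```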
## 7. TAS Encoding Details

TAS encoding (patent MT25-EV002) is the core technology that converts all text input into spike trains.
```mermaid
flowchart LR
    subgraph "TASEncoderDecoder.forward(tokens)"
        direction TB
        TOK["token_ids\n[batch, seq_len]"]
        EMB["nn.Embedding\n→ E [batch, seq, dim]"]
        LAMBDA["λ = σ(E)\nfiring-rate calculation"]
        PHI["φ = pos × Δφ\nphase offset"]
        N_SPIKE["n = round(λ × (T − φ))\nnumber of spikes"]
        PLACE["Contiguous spike placement\n[batch, time, seq, dim]"]
        TOK --> EMB --> LAMBDA & PHI
        LAMBDA & PHI --> N_SPIKE --> PLACE
    end
    PLACE --> STB["SpikingTransformerBlocks"]
```
Formulas:

\[\lambda = \sigma(E) \in [0, 1] \quad \text{(firing rate)}\]
\[\phi = \text{pos} \times \Delta\phi \quad \text{(phase offset)}\]
\[n = \operatorname{round}(\lambda \times (T - \phi)) \quad \text{(number of spikes)}\]

Lossless decoding (Claim 3): the original tokens are recovered with 100% fidelity from the \((\lambda, \phi)\) pairs via nearest-neighbor search.
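To make the formulas concrete, here is a toy encoder that follows them literally; the authoritative implementation is `TASEncoderDecoder` in `encoding.py`, and the defaults for `T` and `Δφ` below are illustrative.

```python
# Toy TAS encoder: lambda = sigmoid(E), phi = pos * delta_phi,
# n = round(lambda * (T - phi)), spikes placed contiguously from phi.
import torch

def tas_encode(tokens: torch.Tensor, embedding: torch.nn.Embedding,
               T: int = 32, delta_phi: float = 1.0) -> torch.Tensor:
    E = embedding(tokens)                                  # [batch, seq, dim]
    lam = torch.sigmoid(E)                                 # firing rate in [0, 1]
    pos = torch.arange(E.shape[1], dtype=E.dtype).view(1, -1, 1)
    phi = pos * delta_phi                                  # phase offset per position
    n = torch.round(lam * (T - phi)).clamp(min=0)          # spikes per (token, dim)
    spikes = torch.zeros(E.shape[0], T, E.shape[1], E.shape[2])
    for t in range(T):
        spikes[:, t] = ((t >= phi) & (t < phi + n)).float()  # contiguous placement
    return spikes                                          # [batch, time, seq, dim]
```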
## 8. Memory / Cache Management
```mermaid
flowchart TD
    subgraph "Inference-time memory flow"
        IN[input tensor]
        MC{TensorCache\nhit?}
        OPT[memory_manager\noptimize_tensor_memory]
        FWD[model forward]
        CACHE[TensorCache.put]
        OUT[output tensor]
        IN --> MC
        MC -->|hit| OUT
        MC -->|miss| OPT --> FWD --> CACHE --> OUT
    end
    subgraph "Memory management components"
        MM[MemoryManager]
        GC[GPU memory cleanup\ntorch.cuda.empty_cache]
        PY_GC[Python GC\ngc.collect]
        WEAK[WeakRef registration\nautomatic model release]
        MM --> GC & PY_GC & WEAK
    end
    subgraph "Batch optimization"
        BP[DynamicBatchProcessor]
        METRICS[BatchOptimizationMetrics\nprocessing time / memory usage\nGPU utilization tracking]
        OPT_BS[Automatic optimal batch-size calculation]
        BP --> METRICS --> OPT_BS --> BP
    end
```
## 9. Implementation Checklist (TODO)

### Phase 1: Basic text inference (completed)

- [x] `SpikingEvoTextLM` training pipeline (`train_spiking_evospikenet_lm.py`)
- [x] `POST /api/generate` inference endpoint
- [x] Automatic model loading from the DB (`load_model_and_tokenizer`)
- [x] Script-based inference via `run_spiking_lm_generation.py`
- [x] Inference optimization with `DynamicBatchProcessor`
- [x] Inference caching with `TensorCache`
### Phase 2: Multimodal inference (to be implemented)

- [ ] Implement `load_multimodal_model()` to load `SpikingEvoMultiModalLM` from the DB
  - File: `evospikenet/api.py`
  - Details: load the latest multimodal model from the DB, mirroring the text-model path
- [ ] Add the `POST /api/multimodal/generate` endpoint (a router sketch follows this list)
  - File: new `multimodal_api.py` under `evospikenet/api_modules/`
  - Input: `{"prompt": str, "image": base64_str, "audio": base64_str}`
  - Preprocessing: image → PIL → Tensor[B,C,H,W]; audio → torchaudio MFCC
- [ ] Extend the `generate()` method for multimodal inference
  - File: `SpikingEvoMultiModalLM` in `evospikenet/models.py`
  - Details: implement an autoregressive generation loop following `SpikingEvoTextLM.generate()`
- [ ] `POST /api/vision/encode` endpoint
- [ ] `POST /api/audio/encode` endpoint
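A possible shape for the planned router, following the input schema in the checklist above; `load_multimodal_model()` is the planned helper from the first item, and the `generate()` signature and response fields are assumptions pending implementation.

```python
# Sketch of the planned evospikenet/api_modules/multimodal_api.py router.
from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter(prefix="/api/multimodal")

class MultiModalRequest(BaseModel):
    prompt: str
    image: str | None = None   # base64-encoded image
    audio: str | None = None   # base64-encoded audio

def load_multimodal_model():
    """Planned helper (see checklist): load the latest SpikingEvoMultiModalLM from the DB."""
    raise NotImplementedError

@router.post("/generate")
async def multimodal_generate(req: MultiModalRequest) -> dict:
    model = load_multimodal_model()
    # Hypothetical generate() signature; the real one is still to be implemented.
    text = model.generate(req.prompt, image_b64=req.image, audio_b64=req.audio)
    used = ["text"] + [m for m, v in (("image", req.image), ("audio", req.audio)) if v]
    return {"generated_text": text, "modalities_used": used}
```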
### Phase 3: RAG-enhanced inference (implemented; operational tuning ongoing)

- [x] `POST /api/rag/query` proxy endpoint
- [x] SNN vector embedding via `SNNEmbeddingModel`
- [x] RAG VectorDB adapters (`rag_backends.py`) and runtime switching API (`rag_client.py`)
- [ ] Fix the indentation/unreachability bug in `SNNEmbeddingModel.forward()`
  - File: `SNNEmbeddingModel` in `evospikenet/models.py`
  - Problem: `forward()` is defined inside `models_api_summary()` and is unreachable
  - Fix: move the `forward()` method into the `SNNEmbeddingModel` class body
### Phase 4: Distributed brain inference (partially implemented)

- [x] `DistributedBrainNode` + `AsyncZenohCommunicator`
- [x] Result transfer via Zenoh Pub/Sub
- [ ] Multi-node fusion logic for `POST /api/distributed_brain/query`
- [ ] Distributed result integration via `MemoryIntegratorNode`
- [ ] Reflect emotion / reward signals in inference via `BiomimeticAdapter`
### Phase 5: Large-scale capability (future plan)

- [ ] Large-scale batch inference via integration with vLLM / DeepSpeed Inference
- [ ] Model sharding (tensor_parallel support)
- [ ] ONNX / TensorRT export pipeline
- [ ] Quantized inference option (automatic INT8/FP16 switching)
  - File: `ModelCompressor` in `evospikenet/model_compressor.py`
### Bug fixes (priority)

- [ ] Severe: `SNNEmbeddingModel.forward()` misplaced inside `models_api_summary()` (`evospikenet/models.py`, around L460-L475)
- [ ] `generate()` is not implemented in `SpikingEvoMultiModalLM`
- [ ] `SpikingEvoTextLM.generate()` unnecessarily calls `self.train()`; the model should remain in `eval()` mode during inference
### Addendum: items reflected as of 2026-04-06

- [x] Foundation API added (`/api/foundation/*`) and integrated into the API router
- [x] Foundation settings reflected in `settings.yaml` / `settings.schema.json`
- [x] Stabilization of the system regression test `tests/system/test_scalability_and_performance.py`
## 10. Future Expansion Plan

```mermaid
gantt
    title EvoSpikeNet inference pipeline implementation roadmap
    dateFormat YYYY-MM-DD
    section Phase 1 (completed)
    Text inference foundation        :done, p1, 2025-01-01, 2025-06-30
    DB artifact management           :done, p1b, 2025-04-01, 2025-06-30
    section Phase 2 (in progress)
    SNNEmbeddingModel bug fix        :active, p2a, 2026-04-01, 2026-04-07
    multimodal_api.py implementation :p2b, 2026-04-07, 2026-04-21
    Multimodal generate()            :p2c, 2026-04-14, 2026-04-28
    section Phase 3
    RAG integration testing          :p3a, 2026-04-21, 2026-05-05
    Distributed brain fusion logic   :p3b, 2026-05-01, 2026-05-15
    section Phase 4
    Quantized inference (INT8/FP16)  :p4a, 2026-05-10, 2026-05-31
    ONNX/TensorRT export             :p4b, 2026-06-01, 2026-06-30
    Large-scale batching (vLLM)      :p4c, 2026-06-15, 2026-07-31
```
### Technology stack

- Inference service: FastAPI + uvicorn (async)
- Model framework: PyTorch + snntorch
- Encoding: TAS (patent MT25-EV002)
- Distributed communication: Eclipse Zenoh
- Vector DB: VectorDB adapter (InMemory / FAISS / Milvus / Chroma / Qdrant)
- Full-text search: Elasticsearch
- Message queue: Redis Pub/Sub
- Authorization: Open Policy Agent (OPA)
- Containers: Docker / Kubernetes (k8s/)
- Monitoring: Prometheus metrics (/metrics)
- Documentation: MkDocs
Moonlight Technologies Inc. © 2026 — EvoSpikeNet v4.0