Detailed explanation of EvoSpikeNet RAG system
[!NOTE] For the latest implementation status, please refer to Functional Implementation Status (Remaining Functionality).
Implementation notes (artifacts): See docs/implementation/ARTIFACT_MANIFESTS.md for the artifact_manifest.json output by the training script and the recommended CLI flags.
Creation date: December 10, 2025 Last updated: February 19, 2026 (RAG upload/version control implementation synchronization)
Copyright: 2025 Moonlight Technologies Inc. All Rights Reserved.
Author: Masahiro Aoki
NOTE: The implementation is carved out into rag-system/, and at runtime EvoSpikeNet nodes interact with it only via the RAG API. See rag-system/README.md for details and startup instructions.
Purpose and use of this document
- Purpose: To provide an overview of the processing flow, technical specifications, and implementation locations of the RAG system, and to provide reference for development/operation.
- Target audience: RAG implementation/operation personnel, QA, PM.
- First reading order: Table of contents → RAG system overview → Document registration/search flow → Technical specifications.
- Related links: Distributed brain script in examples/run_zenoh_distributed_brain.py; PFC/Zenoh/Executive details in implementation/PFC_ZENOH_EXECUTIVE.md.
Table of contents
- RAG System Overview
- Document registration process flow
- Search processing flow
- Technical specifications details
- Implementation code explanation
- Advanced Features
1. RAG system overview
EvoSpikeNet's RAG (Retrieval-Augmented Generation) system employs a Hybrid Search architecture. This is a system that achieves highly accurate document retrieval and generation by executing vector searches based on semantic similarity and full text searches based on keyword matching in parallel, and integrating the results using the Reciprocal Rank Fusion (RRF) algorithm.
1.1. Main components
| Components | Roles | Technology Stack |
|---|---|---|
| Milvus | Vector database (semantic search) | Vector dimension: 384 dimensions, index: IVF_FLAT |
| Elasticsearch | Full text search engine (keyword search) | BM25 algorithm, kuromoji Japanese tokenizer |
| SentenceTransformer | Text vectorization | paraphrase-multilingual-MiniLM-L12-v2 (multilingual support) |
| RRF algorithm | Search result integration | k=60 |
| LLM backend | Text generation | HuggingFace / SNN / Standard LM |
1.2. Data structures
Schema of Milvus collection rag_kb:
```
{
  "id": INT64 (Primary Key, auto_id=True),
  "embedding": FLOAT_VECTOR (dim=384),
  "text": VARCHAR (max_length=65535),  # up to 65,535 characters
  "source": VARCHAR (max_length=255)
}
```
2. Document registration process flow
The process of registering documents in the RAG system employs a Dual Indexing strategy. By registering the same document in both Milvus for vector searches and Elasticsearch for keyword searches, you can leverage both search engines during subsequent searches.
2.1. Process flow diagram
sequenceDiagram
    participant U as "User / UI"
    participant A as "add_user_text"
    participant E as "Embedding Model"
    participant M as "Milvus Collection"
    participant ES as "Elasticsearch"
    U->>+A: Document text, source info
    Note over A: Input validation: empty-string check
    A->>+E: Encode text
    Note over E: SentenceTransformer: paraphrase-multilingual-MiniLM-L12-v2
    E-->>-A: 384-dimensional vector
    par Register in Milvus
        A->>+M: Insert Entity: embedding, text, source
        Note over M: Auto-ID generation, IVF_FLAT index update
        M->>M: Flush: persist
        M-->>-A: doc_id: Primary Key
    and Register in Elasticsearch
        A->>+ES: Index Document: id=doc_id, content=text
        Note over ES: BM25 index build, Kuromoji tokenization (Japanese)
        ES-->>-A: Acknowledged
    end
    A-->>-U: Added document with ID: doc_id
2.2. Error handling
- Milvus connection failure: The _ensure_milvus_connection function retries up to 3 times (at 5-second intervals) to account for container start-up delay.
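The retry behaviour described above can be sketched as follows. This is an illustrative helper, not the actual `_ensure_milvus_connection` implementation; the function name and arguments are assumptions:

```python
import time

def ensure_connection(connect, retries=3, delay=5.0):
    """Attempt to connect, retrying to tolerate slow container start-up.

    Illustrative sketch of the behaviour described above: up to `retries`
    attempts, sleeping `delay` seconds between failures.
    """
    last_err = None
    for attempt in range(1, retries + 1):
        try:
            return connect()
        except ConnectionError as err:
            last_err = err
            if attempt < retries:
                time.sleep(delay)
    # All attempts failed: surface the last error to the caller
    raise last_err
```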
3. Search processing flow
The search process consists of three stages: Parallel Hybrid Search + RRF Integration + LLM Generation.
3.1. Overall flow diagram
graph TD
A[user query] --> B[query sanitization]
B --> C{Parallel execution}
C --> D1[Vector search<br/>Milvus]
C --> D2[Keyword search<br/>Elasticsearch]
D1 --> E1[Embedding]
E1 --> E2[L2 distance calculation]
E2 --> F1[Top-K Results<br/>+ Score]
D2 --> G1[Kuromoji<br/>Tokenize]
G1 --> G2[BM25 scoring]
G2 --> F2[Top-K Results<br/>+ Score]
F1 --> H[RRF integration<br/>k=60]
F2 --> H
H --> I[Integrated ranking]
I --> J[Deduplication]
J --> K[Get top documents]
K --> L[Context construction]
L --> M[Prompt generation]
M --> N[LLM Reasoning]
N --> O[Hallucination suppression<br/>Post-processing]
O --> P[Return to user]
3.2. Generation step details
Step 1: Language detection and prompt selection
Detects the query language (Japanese/English) and automatically selects a prompt template optimized for each. This facilitates natural answer generation according to the language.
Step 2: Context construction and truncation
Concatenates the documents found by the search into a single text separated by \n---\n. The result is then truncated to a maximum of 3,000 characters so that it does not exceed the LLM context window.
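A minimal sketch of this step, assuming the `\n---\n` separator and 3,000-character limit described above (`build_context` is an illustrative name, not the actual function):

```python
def build_context(docs, max_chars=3000, sep="\n---\n"):
    """Join retrieved documents into one context string and truncate it
    to the character budget so it fits the LLM context window."""
    context = sep.join(docs)
    if len(context) > max_chars:
        context = context[:max_chars]
    return context
```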
Step 3: Extractive-First mode
First, an extractive answer is attempted using the _extractive_answer method (based on TF-IDF). If a direct answer is found from the text in the context, the result is returned without LLM generation, reducing the risk of hallucinations and increasing response speed.
Step 4: LLM Reasoning
Only when the extractive method cannot produce an answer is one generated using the LLM backend (HuggingFace, SNN, etc.). To suppress repetition, parameters such as repetition_penalty=1.5 are set.
Step 5: Post-processing and hallucination suppression
Apply the following post-processing to the generated answers to improve their quality:
- Vocabulary overlap check with context: Calculates the word overlap between the generated answer and the original context. If the overlap is below 25%, the answer is judged likely to have been generated without grounding in the context (a hallucination), and the system attempts to fall back to an extractive answer.
- Irrelevant content filtering: If extraneous text such as URLs or advertisements is generated, it is discarded and an "Insufficient information" response is returned.
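The overlap check can be illustrated with a word-level sketch. The helper names and naive whitespace tokenization are assumptions; the real implementation may tokenize differently:

```python
def overlap_ratio(answer, context):
    """Fraction of distinct answer words that also appear in the context."""
    answer_words = set(answer.lower().split())
    if not answer_words:
        return 0.0
    context_words = set(context.lower().split())
    return len(answer_words & context_words) / len(answer_words)

def is_grounded(answer, context, threshold=0.25):
    """Apply the 25% overlap rule described above: below the threshold,
    the answer is treated as a likely hallucination."""
    return overlap_ratio(answer, context) >= threshold
```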
4. Technical specifications details
4.1. Milvus settings
- Collection schema: id (PK), embedding (384 dimensions), text (65,535 characters), source (255 characters)
- Index: IVF_FLAT (number of clusters nlist=128)
- Similarity metric: L2 (Euclidean distance)
- Search parameter: nprobe=10 (number of clusters probed during search)
Mathematics of vector search
Milvus computes the distance between the query vector \(\mathbf{q} \in \mathbb{R}^{384}\) and every document vector \(\mathbf{d}_i\) in the collection.
L2 distance (Euclidean distance):
\[ d(\mathbf{q}, \mathbf{d}_i) = \|\mathbf{q} - \mathbf{d}_i\|_2 = \sqrt{\sum_{j=1}^{384} (q_j - d_{i,j})^2} \]
The smaller the distance, the higher the similarity; the Top-K closest documents are returned.
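The distance computation itself is straightforward; a plain-Python sketch (the production system computes this inside Milvus, not in Python):

```python
import math

def l2_distance(q, d):
    """Euclidean (L2) distance between two vectors of equal length.
    Smaller distance means higher similarity."""
    return math.sqrt(sum((qi - di) ** 2 for qi, di in zip(q, d)))
```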
How the IVF_FLAT index works:
- Partition the document vectors into \(K\) clusters (default nlist=128)
- At query time, select the nprobe=10 clusters closest to the query vector
- Compute exact distances only for documents in the selected clusters (faster than exhaustive search)
4.2. Embedding Model specifications
- Model: paraphrase-multilingual-MiniLM-L12-v2
- Output dimension: 384
- Supported languages: Over 100 (including Japanese and English)
- Architecture: BERT-based Transformer (12 layers)
- Learning objective: Paraphrase detection (identification of semantically similar sentences)
Text encoding process
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')
embedding = model.encode(text)  # shape: (384,)
```
Internally, the following happens:
- Tokenization: Division into subword units (BPE based)
- Embeddings Layer: Convert token ID to embedding vector
- Transformer Encoder: 12-layer self-attention mechanism takes context into account
- Pooling: Generate a whole-sentence representation via the [CLS] token or mean pooling
- Normalization: Convert to unit vector by L2 normalization
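The last two steps can be sketched in plain Python. This is illustrative only; the real model performs these operations on tensors inside sentence-transformers:

```python
import math

def mean_pool(token_vectors):
    """Average the token embeddings into a single sentence vector."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]

def l2_normalize(vec):
    """Scale the vector to unit length, so a dot product between two
    normalized vectors equals their cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]
```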
4.3. Elasticsearch BM25 Scoring
Elasticsearch uses the BM25 (Best Matching 25) algorithm to perform keyword searches.
BM25 score calculation formula
BM25 score for document \(d\) and query \(q\):
\[ \text{score}(d, q) = \sum_{t \in q} \text{IDF}(t) \cdot \frac{\text{TF}(t, d) \cdot (k_1 + 1)}{\text{TF}(t, d) + k_1 \cdot \left(1 - b + b \cdot \frac{|d|}{\text{avgdl}}\right)} \]
where:
- \(\text{TF}(t, d)\): frequency of term \(t\) in document \(d\)
- \(\text{IDF}(t) = \log\left(\frac{N - n(t) + 0.5}{n(t) + 0.5}\right)\): inverse document frequency (rare terms score higher)
- \(N\): total number of documents; \(n(t)\): number of documents containing term \(t\)
- \(|d|\): document length (number of words)
- \(\text{avgdl}\): average document length
- \(k_1 = 1.2\): TF saturation parameter
- \(b = 0.75\): document-length normalization factor
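To make the formula concrete, the contribution of a single query term can be computed directly. This is a sketch following the parameters above, not Elasticsearch's internal code:

```python
import math

def bm25_term_score(tf, df, n_docs, doc_len, avgdl, k1=1.2, b=0.75):
    """BM25 contribution of one query term, per the formula above.

    tf: term frequency in the document; df: number of documents
    containing the term; n_docs: total documents in the index.
    """
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    length_norm = 1 - b + b * doc_len / avgdl
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)
```

Note how a rarer term (smaller df) yields a larger IDF and hence a larger score, and how repeated occurrences (larger tf) increase the score with diminishing returns.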
4.4. RRF parameters and integration algorithm
Reciprocal Rank Fusion (RRF) is an algorithm that integrates the results of different search systems.
RRF score calculation formula
RRF score for document \(d\):
\[ \text{RRF}(d) = \sum_{r \in R} \frac{1}{k + \text{rank}_r(d)} \]
where:
- \(R\): set of search systems (in this system \(R = \{\text{Milvus}, \text{Elasticsearch}\}\))
- \(\text{rank}_r(d)\): rank of document \(d\) in system \(r\) (starting from 1)
- \(k = 60\): rank bias correction constant
RRF integration example
| Document ID | Milvus Rank | ES Rank | RRF Score Calculation | Total Score |
|---|---|---|---|---|
| doc_1 | 1st place | 3rd place | \(\frac{1}{60+1} + \frac{1}{60+3}\) | 0.0323 |
| doc_2 | 2nd place | 1st place | \(\frac{1}{60+2} + \frac{1}{60+1}\) | 0.0325 |
| doc_3 | 3rd place | 2nd place | \(\frac{1}{60+3} + \frac{1}{60+2}\) | 0.0320 |
The documents are finally sorted in descending order of RRF score.
Implementation code (evospikenet/rag_milvus.py):
```python
rrf_scores = {}
k = 60
# Process Milvus results
for rank, doc_id in enumerate(milvus_results, start=1):
    rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank)
# Process Elasticsearch results
for rank, doc_id in enumerate(es_results, start=1):
    rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank)
# Sort by descending score
ranked_docs = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)
```
5. Implementation code explanation
5.1. CRUD operations
evospikenet/rag_milvus.py implements the following helper functions to operate both Milvus and Elasticsearch databases consistently.
- add_user_text(): Add document
- get_all_data(): Get all documents
- update_document(): Update document
- delete_document(): Delete document
5.2. EvoRAG class
This class encapsulates the main logic of the RAG system.
- __init__(): Select an LLM backend (huggingface, snn, etc.) and initialize the Milvus/Elasticsearch client.
- retrieve(): Performs hybrid search and RRF integration.
- generate(): Performs text generation including hallucination suppression logic.
- rag(): Provides an end-to-end pipeline that combines retrieve and generate.
6. Advanced features
6.1. SNN cooperation and neuron activity visualization
The rag_with_vis() method is a special function that is only available when llm_type='snn'.
- Purpose: Visualize and analyze how the internal SNN model processes information during RAG pipeline execution.
- How it works:
1. Attach DataMonitorHook to each layer of the SNN model.
2. Run normal RAG pipeline (prompt tokenization, forward pass).
3. Capture time series data of spike firing and membrane potential of all neurons generated during this process.
4. Serialize and save the collected data in a file called rag_neuron_data.pt.
- Usage: Saved .pt files can be loaded into offline analysis scripts such as examples/visualize_rag_neurons.py for detailed visualizations such as firing raster plots.
6.2. Mock implementation for testability
To avoid relying on external services like Milvus in unit tests or CI/CD environments, we have built-in functionality to mock the connection to Milvus when the EVOSPIKENET_TEST_MODE environment variable is set.
- MockCollection class: An in-memory mock that mimics Milvus' Collection object, simulating key methods such as insert and search.
- Branch in _get_or_create_collection(): Checks the environment variable inside this function and returns a MockCollection instance instead of a real Milvus connection when in test mode.
This design makes it possible to test the RAG system's logic quickly and reliably, independent of external services.
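The test-mode branch can be sketched as follows. `real_factory` and the mock's method bodies are illustrative stand-ins, not the actual rag_milvus.py code:

```python
import os

class MockCollection:
    """In-memory stand-in mimicking the parts of Milvus' Collection
    that the RAG logic touches (insert/search)."""
    def __init__(self):
        self._rows = []

    def insert(self, entities):
        # Simulate auto-ID assignment by returning the row index
        self._rows.append(entities)
        return len(self._rows) - 1

    def search(self, *args, **kwargs):
        # A real mock would rank self._rows; an empty hit list suffices here
        return []

def get_or_create_collection(real_factory=None):
    """Return a mock when EVOSPIKENET_TEST_MODE is set, otherwise defer
    to the real connection logic (represented here by real_factory)."""
    if os.environ.get("EVOSPIKENET_TEST_MODE"):
        return MockCollection()
    return real_factory()
```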
6.3. RAG query processing debug function ⭐ NEW (Added on December 17, 2025)
Purpose: Visualize the internal processing of the RAG system and improve search quality and transparency of LLM responses.
Contents of debug information
Enabling the Show query processing details checkbox in the frontend UI will display the following details:
- Query analysis:
  - Detected language (Japanese/English)
  - Extracted keywords (after morphological analysis)
  - Prompt type used
- Vector search results:
  - Retrieved document IDs
  - L2 distance score for each document
  - Preview of document content (up to 200 characters)
- Keyword search results:
  - Retrieved document IDs
  - BM25 score for each document
  - Preview of document content (up to 200 characters)
- RRF fusion process:
  - Rank of each search result
  - RRF score calculation process (k=60)
  - Final integrated ranking
- Generation details:
  - Context character count
  - Prompt character count
  - Preview of the prompt template used
  - Response type (Extractive/Generative/Hallucination Fallback)
Implementation details
Backend (evospikenet/rag_milvus.py):

```python
def retrieve(self, query, top_k=3, return_debug_info=False):
    # ... search processing ...
    if return_debug_info:
        debug_info = {
            'vector_results': [
                {
                    'doc_id': str(doc_id),
                    'score': float(score),
                    'text_preview': text[:200]
                }
                for doc_id, score, text in vector_results
            ],
            'keyword_results': [...],  # similar structure
            'rrf_scores': dict(rrf_scores)
        }
        return docs, debug_info
```
**Frontend** (`frontend/pages/rag.py`):

```python
@callback(
    Output('rag-output', 'children'),
    Input('rag-submit', 'n_clicks'),
    State('show-debug-info', 'value'),  # checkbox
    prevent_initial_call=True
)
def handle_query(n_clicks, show_debug):
    rag = EvoRAG(llm_type='huggingface')
    result = rag.rag(query, return_debug_info=show_debug)
    if show_debug and 'debug_info' in result:
        # Build a detailed UI card from the debug information
        return create_debug_display(result)
```
6.4. Elasticsearch reindex function ⭐ NEW (Added on December 17, 2025)
Background: If a document needs to be registered in both Milvus and Elasticsearch, but only in one, hybrid search will not work properly.
Solution: Provide a script to retrieve all existing documents in Milvus and bulk index them into Elasticsearch.
How to use
python reindex_elasticsearch.py
Implementation (reindex_elasticsearch.py):

```python
from evospikenet.rag_milvus import _get_or_create_collection
from evospikenet.elasticsearch_client import get_es_client

def reindex_all():
    # Get all documents from the Milvus collection
    collection = _get_or_create_collection()
    results = collection.query(
        expr="id >= 0",
        output_fields=["id", "text"]
    )
    # Bulk indexing into Elasticsearch
    es_client = get_es_client()
    for doc in results:
        es_client.index(
            index="rag_kb",
            id=doc['id'],
            document={'content': doc['text']},
            refresh=True
        )
```
Verification script (test_rag_debug_keywords.py):
- Verifies that both vector search and keyword search work properly
- Verifies that each search result includes a document preview
Reference file:
- evospikenet/rag_milvus.py: RAG system core implementation, debug function integration
- evospikenet/elasticsearch_client.py: Elasticsearch client
- evospikenet/rag_backends.py: LLM backend integration
- frontend/pages/rag.py: Knowledge base management UI, debug display function
- reindex_elasticsearch.py: Elasticsearch reindex script
- test_rag_debug_keywords.py: Debug function verification test
7. Document upload/parser implementation status (Plan E completed)
Last updated: February 22, 2026
Status: Implementation completed/in operation (RAG file upload + versioning pipeline is in production)
Related Documents: REMAINING_FEATURES.md
7.1. Current flow
- POST /upload_file (rag_api.py) receives the upload, validates the extension and MIME type on the server side, and runs parse_document → chunk_text_auto → embedding generation → Milvus/Elasticsearch registration.
- Validation: Allowed extensions/MIME types are controlled by DEFAULT_ALLOWED and DEFAULT_ALLOWED_MIME in file_validator.py. Allowed extensions are .txt, .md, .pdf, .docx, .doc, .xlsx, .xls, .pptx, .ppt, .gdoc, .html. Even when the MIME type cannot be detected, the extension check is still enforced.
- Parsing: ParserRegistry in document_parsers.py maps extension → parser and sets metadata["parser"] and metadata["source_path"].
- Chunking: chunk_text_auto(..., target_chunk_tokens=400) embeds the chunk ID and doc_key in the metadata, vectorizes each chunk, and registers it with Milvus/Elasticsearch.
- Response: rag_api.py returns chunks_indexed, document_ids, doc_key, and version; the front end and SDK consume them as-is.
7.2. Supported extensions and parsers
| Extension | Parser | Notes |
|---|---|---|
| .txt | TextParser | Reads UTF-8 text as is |
| .md | MarkdownParser | Reads UTF-8 and adds metadata["parser"]="markdown" |
| .markdown | MarkdownParser | Supported by ParserRegistry; upload is enabled once added to the validator (currently only .md is allowed) |
| .pdf | PdfParser | Requires PyMuPDF. Adds page-number metadata, with a raw-read fallback when PyMuPDF is not installed |
| .doc/.docx | WordParser | Extracts paragraphs/tables with python-docx; ZIP fallback implementation |
| .xls/.xlsx | ExcelParser | Traverses all sheets with openpyxl; sharedStrings fallback |
| .ppt/.pptx | PowerPointParser | Extracts slide/table text with python-pptx |
| .gdoc/.html | GoogleDocsParser | Treated as exported plain text |
7.3. Versioning and indexing
- doc_key is the lower-cased file name. The latest version is fetched via Elasticsearch's get_latest_version(doc_key), and the next version is numbered automatically.
- elasticsearch_client.py assigns doc_key, version, chunk_id, source_filename, checksum, and indexed_at, and stores each chunk.
- Milvus stores <filename>#v<version> as source, so the version can be determined from search results.
- The SHA-1 checksum is retained and can be used for duplicate-upload detection and tracing.
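These conventions can be expressed compactly. The helper names below are illustrative, not the actual API:

```python
import hashlib

def make_doc_key(filename):
    """doc_key is the lower-cased file name."""
    return filename.lower()

def next_version(latest):
    """Auto-number the next version; latest is None for a new document."""
    return 1 if latest is None else latest + 1

def milvus_source(filename, version):
    """Milvus stores '<filename>#v<version>' in the source field."""
    return f"{filename}#v{version}"

def file_checksum(data):
    """SHA-1 checksum retained for duplicate detection and tracing."""
    return hashlib.sha1(data).hexdigest()
```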
7.4. Clients and samples
- Python client: upload_file in rag_client.py.
- SDK: rag_upload_file / rag_upload_file_async in sdk.py.
- Sample: rag_markdown_sdk.py is a minimal example that uploads and searches Markdown.
- UI: rag.py calls /upload_file and displays the upload result and version. The latest release adds a streaming/background checkbox, a Session ID input, and job-status inquiry, allowing large documents to be uploaded interactively in parts.
7.5. Future follow-up
- Add the .markdown extension to the upload validator to fully sync with the ParserRegistry settings.
- Add setup instructions to the FAQ for python-magic / PyMuPDF / python-docx / openpyxl / python-pptx when they are not installed.
- Expand the version-history UI on the Dash side and organize a doc_key-based history retrieval/rollback API (using the existing versioned index).
7.6. Large file support & differential UI improvement plan
Purpose: Allow users to safely upload and search documents larger than 1 GB, and provide a UI that makes differences between versions intuitive to grasp.
7.6.1 Large file support
Challenges:
- Increased memory consumption and server OOM during upload
- Parsing/chunking is slow due to synchronous processing and times out
- Index load on Elasticsearch/Milvus
Plan
1. Streaming parsing/chunking engine
   - Extend parse_document and chunk_text_auto to support streaming; the PDF/Word/Excel/PPT parsers each use a sequential-processing API to emit text piece by piece.
   - Tokenization adopts a generator pattern to avoid holding the full text in memory.
2. Split upload API
   - Add multipart upload to the frontend/SDK. The client splits the file into parts (e.g. 100 MB) and sends them sequentially, without recombining them on the server side.
3. Background jobs
   - Process asynchronously with FastAPI's BackgroundTasks or Celery/RQ; return a job ID and support progress polling/notification.
   - Job metadata carries start_time, end_time, and progress; on completion, doc_key/version are assigned and can be retrieved via /upload_status.
   - With Redis as the backend, workers publish progress to upload:channel:<job_id>, and the main process subscribes to that channel for instant notification. A WebSocket sample (/ws/progress/{job_id}) is also available for subscription.
   - In the frontend UI, entering a job ID starts automatic polling via dcc.Interval, and the reported progress is rendered as a progress bar.
4. Resource limits and auditing
   - Enforce memory limits via Kubernetes/cgroups; record sizes and processing times in logs.
5. Performance testing
   - Run continuous-upload benchmarks with 1 GB/5 GB dummy files, measuring time, memory, and CPU usage.
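The client-side part splitting in the split-upload step could look like the sketch below; the function name and part size are assumptions, and the transport (HTTP multipart, etc.) is left to the implementation:

```python
def split_into_parts(data, part_size):
    """Split a payload into fixed-size parts for a multipart upload.

    The last part may be shorter; parts concatenate back to the original,
    which makes per-part checksum verification straightforward.
    """
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]
```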
7.6.2 Improved difference display UI
Challenges:
- Text-only diffs make changes to images and tables hard to understand
- Slow scrolling on large documents
- Comparison navigation is complicated
Plan
1. Rich diff component
   - Enhance Markdown/HTML diffs with react-diff-viewer or similar, adding heading folding, word-level highlighting, and color coding.
   - Show table diffs via cell background colors and image diffs via a slider.
2. Diff summary page
   - Automatically generate changed-line counts and per-chapter statistics for one-click jumps.
3. Precomputation and caching
   - Generate and store diff patches at version registration time; the UI retrieves them on demand.
4. Large-document measures
   - Lighten loading with virtual scrolling (react-virtualized) and page/section splitting.
5. User testing and feedback
   - Evaluate UX in in-house workshops and feed issues back into sprints.
Schedule (estimate)
- 2026/2–3: Streaming parser prototype & split upload design
- 2026/3–4: Background job and performance test
- 2026/4–5: Differential UI prototype development and review
- 2026/5: Document update/deployment procedure maintenance
Risks and mitigations
- Parse bug: Early detection with library fixing and CI testing
- Split upload failure: Implemented retransmission/checksum verification
- UI performance: Lazy loading/virtualization to avoid scrolling delays
This plan is expected to dramatically improve support for large-volume documents and the experience of viewing differences.
8. Summary
In this document, we have organized the EvoSpikeNet RAG system, from implementation of hybrid search to file upload and version control.
Current strengths:
- ✅ High precision search using hybrid search (Milvus + Elasticsearch) and RRF
- ✅ Multilingual support (Japanese/English) and Extractive-first hallucination suppression
- ✅ File import with /upload_file and index with versioning
- ✅ SDK/front-end cooperation sample (including Markdown upload example)
Follow-up:
- 📅 Lifting the ban on .markdown validators and clarifying the dependent library installation procedure
- 📅 Improved version history UI and added rollback operation procedure
Related documents:
- REMAINING_FEATURES.md - Plan E status and backlog
- rag-system/README.md - Startup/API procedure for single RAG service
- docs/SDK_API_REFERENCE.md - How to use RAG from SDK