
Detailed explanation of the EvoSpikeNet RAG system

[!NOTE] For the latest implementation status, please refer to Functional Implementation Status (Remaining Functionality).

Implementation notes (artifacts): See docs/implementation/ARTIFACT_MANIFESTS.md for the artifact_manifest.json output by the training script and recommended CLI flags.

Creation date: December 10, 2025
Last updated: February 19, 2026 (RAG upload/version control implementation synchronization)

Author: Masahiro Aoki


NOTE: The implementation is split out into rag-system/; at runtime, EvoSpikeNet nodes interact with it only via the RAG API. See rag-system/README.md for details and startup instructions.

Purpose and use of this document

  • Purpose: To give an overview of the RAG system's processing flow, technical specifications, and implementation locations, as a reference for development and operations.
  • Target audience: RAG implementation/operations staff, QA, PM.
  • Suggested reading order: Table of contents → RAG system overview → Document registration/search flow → Technical specifications.
  • Related links: Distributed brain script in examples/run_zenoh_distributed_brain.py, PFC/Zenoh/Executive details in implementation/PFC_ZENOH_EXECUTIVE.md.

Table of contents

  1. RAG System Overview
  2. Document registration process flow
  3. Search processing flow
  4. Technical specifications details
  5. Implementation code explanation
  6. Advanced Features

1. RAG system overview

EvoSpikeNet's RAG (Retrieval-Augmented Generation) system employs a Hybrid Search architecture. This is a system that achieves highly accurate document retrieval and generation by executing vector searches based on semantic similarity and full text searches based on keyword matching in parallel, and integrating the results using the Reciprocal Rank Fusion (RRF) algorithm.

1.1. Main components

| Component | Role | Technology stack |
|---|---|---|
| Milvus | Vector database (semantic search) | Vector dimension: 384, index: IVF_FLAT |
| Elasticsearch | Full-text search engine (keyword search) | BM25 algorithm, Kuromoji Japanese tokenizer |
| SentenceTransformer | Text vectorization | paraphrase-multilingual-MiniLM-L12-v2 (multilingual) |
| RRF algorithm | Search result fusion | k=60 |
| LLM backend | Text generation | HuggingFace / SNN / Standard LM |

1.2. Data structures

Schema of Milvus collection rag_kb:

{
    "id": INT64 (Primary Key, auto_id=True),
    "embedding": FLOAT_VECTOR (dim=384),
    "text": VARCHAR (max_length=65535),  # Maximum 65,535 characters
    "source": VARCHAR (max_length=255)
}
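
For reference, a minimal sketch of how this schema could be declared with the pymilvus client is shown below. The connection parameters are assumptions; the actual collection is created inside evospikenet/rag_milvus.py.

```python
from pymilvus import (
    Collection, CollectionSchema, FieldSchema, DataType, connections,
)

connections.connect(host="localhost", port="19530")  # assumed local Milvus

fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=384),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535),
    FieldSchema(name="source", dtype=DataType.VARCHAR, max_length=255),
]
schema = CollectionSchema(fields, description="EvoSpikeNet RAG knowledge base")
collection = Collection(name="rag_kb", schema=schema)
```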

2. Document registration process flow

The process of registering documents in the RAG system employs a Dual Indexing strategy. By registering the same document in both Milvus for vector searches and Elasticsearch for keyword searches, you can leverage both search engines during subsequent searches.

2.1. Process flow diagram

sequenceDiagram
    participant U as "User/UI"
    participant A as "add_user_text"
    participant E as "Embedding Model"
    participant M as "Milvus Collection"
    participant ES as "Elasticsearch"

    U->>+A: Document text, source info
    Note over A: Input validation: empty-string check

    A->>+E: Encode text
    Note over E: SentenceTransformer: paraphrase-multilingual-MiniLM-L12-v2
    E-->>-A: 384-dim vector

    par Register in Milvus
        A->>+M: Insert Entity: embedding, text, source
        Note over M: Auto-ID generation, IVF_FLAT index update
        M->>M: Flush: persist
        M-->>-A: doc_id: Primary Key
    and Register in Elasticsearch
        A->>+ES: Index Document: id=doc_id, content=text
        Note over ES: BM25 index build, Kuromoji tokenization (Japanese)
        ES-->>-A: Acknowledgement
    end

    A-->>-U: Added document with ID: doc_id

2.2. Error handling

  • Milvus connection failure: the _ensure_milvus_connection function retries up to 3 times at 5-second intervals to absorb container startup delay (sketch below).
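
A minimal sketch of such a retry loop follows; the actual signature in evospikenet/rag_milvus.py may differ, and the host/port defaults are assumptions.

```python
import time
from pymilvus import connections

def _ensure_milvus_connection(host="localhost", port="19530",
                              retries=3, interval=5.0):
    """Connect to Milvus, retrying to absorb container startup delay."""
    for attempt in range(1, retries + 1):
        try:
            connections.connect(alias="default", host=host, port=port)
            return
        except Exception:
            if attempt == retries:
                raise  # give up after the final attempt
            time.sleep(interval)
```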

3. Search processing flow

The search process consists of three stages: Parallel Hybrid Search + RRF Integration + LLM Generation.

3.1. Overall flow diagram

graph TD
    A[user query] --> B[query sanitization]
    B --> C{Parallel execution}

    C --> D1[Vector search<br/>Milvus]
    C --> D2[Keyword search<br/>Elasticsearch]

    D1 --> E1[Embedding]
    E1 --> E2[L2 distance calculation]
    E2 --> F1[Top-K Results<br/>+ Score]

    D2 --> G1[Kuromoji<br/>Tokenize]
    G1 --> G2[BM25 scoring]
    G2 --> F2[Top-K Results<br/>+ Score]

    F1 --> H[RRF integration<br/>k=60]
    F2 --> H

    H --> I[Integrated ranking]
    I --> J[Deduplication]
    J --> K[Get top documents]
    K --> L[Context construction]
    L --> M[Prompt generation]
    M --> N[LLM Reasoning]
    N --> O[Hallucination suppression<br/>Post-processing]
    O --> P[Return to user]

3.2. Generation step details

Step 1: Language detection and prompt selection

Detects the query language (Japanese/English) and automatically selects a prompt template optimized for each. This facilitates natural answer generation according to the language.

Step 2: Context construction and truncation

Concatenates the retrieved documents into a single text separated by \n---\n. The result is truncated to at most 3,000 characters so that it does not exceed the LLM context window.
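
A sketch of this step, assuming the separator and the 3,000-character limit described above (the helper name is hypothetical):

```python
MAX_CONTEXT_CHARS = 3000

def build_context(docs: list[str], max_chars: int = MAX_CONTEXT_CHARS) -> str:
    # Join retrieved documents with the \n---\n separator, then truncate
    # so the context stays within the LLM context window.
    context = "\n---\n".join(docs)
    return context[:max_chars]
```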

Step 3: Extractive-First mode

First, an extractive answer is attempted via the _extractive_answer method (TF-IDF based). If a direct answer is found in the context text, it is returned without invoking the LLM, which reduces the risk of hallucination and improves response speed.
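
The exact logic of _extractive_answer is internal to rag_milvus.py; the sketch below only illustrates the general TF-IDF idea with scikit-learn (the line-based sentence splitting and the similarity threshold are assumptions):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extractive_answer(query: str, context: str, threshold: float = 0.3):
    # Score each context line against the query; return the best match
    # directly (no LLM call) if it is similar enough, otherwise None.
    sentences = [s.strip() for s in context.splitlines() if s.strip()]
    if not sentences:
        return None
    vectorizer = TfidfVectorizer().fit(sentences + [query])
    sims = cosine_similarity(
        vectorizer.transform([query]), vectorizer.transform(sentences)
    )[0]
    best = sims.argmax()
    return sentences[best] if sims[best] >= threshold else None
```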

Step 4: LLM Reasoning

Only if the extractive method cannot produce an answer is one generated with the LLM backend (HuggingFace, SNN, etc.). To suppress repetition, parameters such as repetition_penalty=1.5 are set.

Step 5: Post-processing and hallucination suppression

The following post-processing is applied to generated answers to improve their quality:

  • Vocabulary overlap check against the context: computes the word overlap between the generated answer and the original context. If the overlap is below 25%, the answer is judged likely to have been generated without grounding in the context (a hallucination), and the system falls back to an extractive answer (a minimal sketch of this check follows the list).
  • Irrelevant content filtering: if extraneous text such as URLs or advertisements is generated, it is discarded and an "insufficient information" response is returned instead.
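
A minimal sketch of the overlap check, assuming simple whitespace tokenization (the production code may tokenize differently):

```python
def passes_overlap_check(answer: str, context: str,
                         min_overlap: float = 0.25) -> bool:
    # Fraction of the answer's unique words that also occur in the context.
    # Below 25%, the answer is treated as a likely hallucination.
    answer_words = set(answer.lower().split())
    context_words = set(context.lower().split())
    if not answer_words:
        return False
    overlap = len(answer_words & context_words) / len(answer_words)
    return overlap >= min_overlap
```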


4. Technical specifications details

4.1. Milvus settings

  • Collection schema: id(PK), embedding(384 dimensions), text(65535 characters), source(255 characters)
  • Index: IVF_FLAT (number of clusters nlist=128)
  • Similarity metric: L2 (Euclidean distance)
  • Search parameters: nprobe=10 (number of clusters to search during search)

Milvus calculates the distance of the query vector \(\mathbf{q} \in \mathbb{R}^{384}\) to all document vectors \(\mathbf{d}_i\) in the collection.

L2 distance (Euclidean distance):

\[ \text{dist}_{\text{L2}}(\mathbf{q}, \mathbf{d}_i) = \sqrt{\sum_{j=1}^{384} (q_j - d_{i,j})^2} \]

The smaller the distance, the higher the similarity; the Top-K closest documents are returned.

How the IVF_FLAT index works:

  1. The document vectors are partitioned into \(K\) clusters (default nlist=128)
  2. At query time, the nprobe=10 clusters closest to the query vector are selected
  3. Exact distances are computed only for documents in the selected clusters (faster than exhaustive search)
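
A query against this index could look as follows with pymilvus, using the parameters from section 4.1 (the query text is illustrative, and an established Milvus connection is assumed; see section 2.2):

```python
from pymilvus import Collection
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')
collection = Collection("rag_kb")
collection.load()  # the IVF_FLAT index must be loaded before searching

query_vec = model.encode("How do spiking neurons communicate?").tolist()
hits = collection.search(
    data=[query_vec],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=3,  # Top-K
    output_fields=["text", "source"],
)
```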

4.2. Embedding Model specifications

  • Model: paraphrase-multilingual-MiniLM-L12-v2
  • Output dimension: 384
  • Supported languages: Over 100 (including Japanese and English)
  • Architecture: BERT-based Transformer (12 layers)
  • Learning objective: Paraphrase detection (identification of semantically similar sentences)

Text encoding process

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')
embedding = model.encode(text)  # shape: (384,)

Internally, the following happens:

  1. Tokenization: Division into subword units (BPE based)
  2. Embeddings Layer: Convert token ID to embedding vector
  3. Transformer Encoder: 12-layer self-attention mechanism takes context into account
  4. Pooling: generate a sentence-level representation via the [CLS] token or mean pooling
  5. Normalization: Convert to unit vector by L2 normalization

4.3. Elasticsearch BM25 Scoring

Elasticsearch uses the BM25 (Best Matching 25) algorithm to perform keyword searches.

BM25 score calculation formula

BM25 score for document \(d\) and query \(q\):

\[ \text{score}_{\text{BM25}}(d, q) = \sum_{t \in q} \text{IDF}(t) \cdot \frac{\text{TF}(t, d) \cdot (k_1 + 1)}{\text{TF}(t, d) + k_1 \cdot (1 - b + b \cdot \frac{|d|}{\text{avgdl}})} \]

where:

  • \(\text{TF}(t, d)\): frequency of term \(t\) in document \(d\)
  • \(\text{IDF}(t) = \log\left(\frac{N - n(t) + 0.5}{n(t) + 0.5}\right)\): inverse document frequency (rarer terms score higher)
  • \(N\): total number of documents; \(n(t)\): number of documents containing term \(t\)
  • \(|d|\): document length (in words); \(\text{avgdl}\): average document length
  • \(k_1 = 1.2\): TF saturation parameter
  • \(b = 0.75\): document length normalization factor
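
The following self-contained function transcribes this formula directly. It is for illustration only: Elasticsearch computes BM25 internally, and Lucene uses a variant of the IDF term that stays non-negative.

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    # `doc` and each corpus entry are token lists; `corpus` contains `doc`.
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for t in query_terms:
        n_t = sum(1 for d in corpus if t in d)     # docs containing t
        idf = math.log((N - n_t + 0.5) / (n_t + 0.5))
        tf = doc.count(t)                          # term frequency in doc
        score += idf * (tf * (k1 + 1)) / (
            tf + k1 * (1 - b + b * len(doc) / avgdl)
        )
    return score
```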

4.4. RRF parameters and integration algorithm

Reciprocal Rank Fusion (RRF) is an algorithm that integrates the results of different search systems.

RRF score calculation formula

RRF score for document \(d\):

\[ \text{score}_{\text{RRF}}(d) = \sum_{r \in R} \frac{1}{k + \text{rank}_r(d)} \]

where:

  • \(R\): set of search systems (in this system \(R = \{\text{Milvus}, \text{Elasticsearch}\}\))
  • \(\text{rank}_r(d)\): rank of document \(d\) in system \(r\) (starting from 1)
  • \(k = 60\): rank bias correction constant

RRF integration example

| Document ID | Milvus rank | ES rank | RRF score calculation | Total score |
|---|---|---|---|---|
| doc_1 | 1 | 3 | \(\frac{1}{60+1} + \frac{1}{60+3}\) | 0.0323 |
| doc_2 | 2 | 1 | \(\frac{1}{60+2} + \frac{1}{60+1}\) | 0.0325 |
| doc_3 | 3 | 2 | \(\frac{1}{60+3} + \frac{1}{60+2}\) | 0.0320 |

The documents are finally sorted in descending order of RRF score; in this example, doc_2 ranks first, followed by doc_1 and doc_3.

Implementation code (evospikenet/rag_milvus.py)

rrf_scores = {}
k = 60

# Processing Milvus results
for rank, doc_id in enumerate(milvus_results, start=1):
    rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank)

# Processing Elasticsearch results
for rank, doc_id in enumerate(es_results, start=1):
    rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank)

# Sort by descending score
ranked_docs = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)

5. Implementation code explanation

5.1. CRUD operations

evospikenet/rag_milvus.py implements the following helper functions to operate the Milvus and Elasticsearch databases consistently (a usage sketch follows the list):

  • add_user_text(): add a document
  • get_all_data(): get all documents
  • update_document(): update a document
  • delete_document(): delete a document
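
A hypothetical usage sketch; the argument names and return values are assumptions, so check rag_milvus.py for the exact signatures.

```python
from evospikenet.rag_milvus import (
    add_user_text, delete_document, get_all_data, update_document,
)

# Add a document to both Milvus and Elasticsearch, then inspect/modify it.
doc_id = add_user_text("Spiking neurons communicate via discrete events.",
                       source="notes.md")
print(get_all_data())                 # all registered documents
update_document(doc_id, "Spiking neurons emit binary events over time.")
delete_document(doc_id)               # removed from both indexes
```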

5.2. EvoRAG class

This class encapsulates the main logic of the RAG system.

  • __init__(): selects an LLM backend (huggingface, snn, etc.) and initializes the Milvus/Elasticsearch clients.
  • retrieve(): performs hybrid search and RRF fusion.
  • generate(): performs text generation, including the hallucination-suppression logic.
  • rag(): provides an end-to-end pipeline combining retrieve and generate.


6. Advanced features

6.1. SNN cooperation and neuron activity visualization

The rag_with_vis() method is a special function available only when llm_type='snn'.

  • Purpose: visualize and analyze how the internal SNN model processes information during RAG pipeline execution.
  • How it works:
    1. Attach a DataMonitorHook to each layer of the SNN model.
    2. Run the normal RAG pipeline (prompt tokenization, forward pass).
    3. Capture time-series data of spike firing and membrane potential for all neurons during this process.
    4. Serialize and save the collected data to a file named rag_neuron_data.pt.
  • Usage: saved .pt files can be loaded into offline analysis scripts such as examples/visualize_rag_neurons.py for detailed visualizations such as firing raster plots.

6.2. Mock implementation for testability

To avoid relying on external services like Milvus in unit tests or CI/CD environments, we have built-in functionality to mock the connection to Milvus when the EVOSPIKENET_TEST_MODE environment variable is set.

  • MockCollection class: In-memory mock class that mimics Milvus' Collection object. Simulates key methods such as insert and search.
  • Branch in _get_or_create_collection(): this function checks the environment variable and returns a MockCollection instance instead of a real Milvus connection when in test mode.

This design makes it possible to test the RAG system's logic quickly and reliably, independent of external services.
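
The shape of the real MockCollection is not reproduced here; the sketch below only illustrates the test-mode branch described above.

```python
import os

class MockCollection:
    """In-memory stand-in for a Milvus Collection (illustrative only)."""

    def __init__(self):
        self._rows = []

    def insert(self, entities):
        self._rows.append(entities)

    def search(self, data, anns_field, param, limit, **kwargs):
        # No real distance computation; just return stored rows.
        return [self._rows[:limit]]

def _get_or_create_collection():
    # Test mode: avoid any network dependency on Milvus.
    if os.environ.get("EVOSPIKENET_TEST_MODE"):
        return MockCollection()
    ...  # real Milvus connection path (omitted)
```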

6.3. RAG query processing debug function ⭐ NEW (Added on December 17, 2025)

Purpose: Visualize the internal processing of the RAG system and improve search quality and transparency of LLM responses.

Contents of debug information

Enabling the Show query processing details checkbox in the frontend UI will display the following details:

  1. Query analysis:
     • Detected language (Japanese/English)
     • Extracted keywords (after morphological analysis)
     • Prompt type used
  2. Vector search results:
     • Retrieved document IDs
     • L2 distance score for each document
     • Preview of document content (up to 200 characters)
  3. Keyword search results:
     • Retrieved document IDs
     • BM25 score for each document
     • Preview of document content (up to 200 characters)
  4. RRF fusion process:
     • Ranking of each search result
     • RRF score calculation process (k=60)
     • Final integrated ranking
  5. Generation details:
     • Context character count
     • Prompt character count
     • Preview of the prompt template used
     • Response type (Extractive / Generative / Hallucination Fallback)

Implementation details

Backend (evospikenet/rag_milvus.py):

```python
def retrieve(self, query, top_k=3, return_debug_info=False):
    # ... search processing ...

    if return_debug_info:
        debug_info = {
            'vector_results': [
                {
                    'doc_id': str(doc_id),
                    'score': float(score),
                    'text_preview': text[:200]
                }
                for doc_id, score, text in vector_results
            ],
            'keyword_results': [...],  # similar structure
            'rrf_scores': dict(rrf_scores)
        }
        return docs, debug_info
```

Frontend (frontend/pages/rag.py):

```python
@callback(
    Output('rag-output', 'children'),
    Input('rag-submit', 'n_clicks'),
    State('show-debug-info', 'value'),  # checkbox
    prevent_initial_call=True
)
def handle_query(n_clicks, show_debug):
    rag = EvoRAG(llm_type='huggingface')
    result = rag.rag(query, return_debug_info=show_debug)
    if show_debug and 'debug_info' in result:
        # Build a detailed UI card from the debug information
        return create_debug_display(result)
```

6.4. Elasticsearch reindex function ⭐ NEW (Added on December 17, 2025)

Background: documents must be registered in both Milvus and Elasticsearch; if a document exists in only one of them, hybrid search will not work properly.

Solution: Provide a script to retrieve all existing documents in Milvus and bulk index them into Elasticsearch.

How to use

python reindex_elasticsearch.py

Implementation (reindex_elasticsearch.py):

```python
# The original import line was garbled; this is a best-effort reconstruction.
from evospikenet.elasticsearch_client import get_es_client
from evospikenet.rag_milvus import _get_or_create_collection  # assumed location

def reindex_all():
    # Get all documents from Milvus
    collection = _get_or_create_collection()
    results = collection.query(
        expr="id >= 0",
        output_fields=["id", "text"]
    )

    # Bulk indexing in Elasticsearch
    es_client = get_es_client()
    for doc in results:
        es_client.index(
            index="rag_kb",
            id=doc['id'],
            document={'content': doc['text']},
            refresh=True
        )
```

Verification script (test_rag_debug_keywords.py):

  • Verifies that both vector search and keyword search work properly
  • Verifies that each search result includes a document preview

Reference files:

  • evospikenet/rag_milvus.py: RAG system core implementation, debug function integration
  • evospikenet/elasticsearch_client.py: Elasticsearch client
  • evospikenet/rag_backends.py: LLM backend integration
  • frontend/pages/rag.py: knowledge base management UI, debug display
  • reindex_elasticsearch.py: Elasticsearch reindex script
  • test_rag_debug_keywords.py: debug function verification test


7. Document upload/parser implementation status (Plan E completed)

Last updated: February 22, 2026
Status: implemented and in operation (the RAG file upload + versioning pipeline is in production)
Related documents: REMAINING_FEATURES.md

7.1. Current flow

  • POST /upload_file (rag_api.py) receives the upload, validates the extension and MIME type on the server side, and runs parse_document → chunk_text_auto → embedding generation → Milvus/Elasticsearch registration.
  • Validation: allowed extensions/MIME types are controlled by DEFAULT_ALLOWED and DEFAULT_ALLOWED_MIME in file_validator.py. Allowed extensions are .txt, .md, .pdf, .docx, .doc, .xlsx, .xls, .pptx, .ppt, .gdoc, .html. Even when the MIME type cannot be detected, the extension check is still enforced.
  • Parsing: ParserRegistry in document_parsers.py maps extension → parser and assigns metadata["parser"] and metadata["source_path"].
  • Chunking: chunk_text_auto(..., target_chunk_tokens=400) embeds the chunk ID and doc_key in the metadata, vectorizes each chunk, and registers it with Milvus/Elasticsearch.
  • Response: rag_api.py returns chunks_indexed, document_ids, doc_key, and version; the front end and SDK use these as-is (a minimal client sketch follows).
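
A minimal client sketch against this endpoint; the host/port and the multipart field name are assumptions (see rag-system/README.md for the actual startup parameters):

```python
import requests

with open("design_notes.md", "rb") as fh:
    resp = requests.post(
        "http://localhost:8000/upload_file",          # assumed host/port
        files={"file": ("design_notes.md", fh, "text/markdown")},
    )
resp.raise_for_status()
payload = resp.json()
print(payload["chunks_indexed"], payload["doc_key"], payload["version"])
```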

7.2. Supported extensions and parsers

| Extension | Parser | Notes |
|---|---|---|
| .txt | TextParser | Reads UTF-8 text as-is |
| .md | MarkdownParser | Reads UTF-8 and adds metadata["parser"]="markdown" |
| .markdown | MarkdownParser | Supported by ParserRegistry; upload permission requires adding it to the validator (currently only .md is allowed) |
| .pdf | PdfParser | Requires PyMuPDF. Adds page-number metadata; has a read fallback when PyMuPDF is not installed |
| .doc/.docx | WordParser | Extracts paragraphs/tables with python-docx; ZIP fallback implementation |
| .xls/.xlsx | ExcelParser | Traverses all sheets with openpyxl; sharedStrings fallback |
| .ppt/.pptx | PowerPointParser | Extracts slide/table text with python-pptx |
| .gdoc/.html | GoogleDocsParser | Treated as exported plain text |

7.3. Versioning and indexing

  • doc_key is the lowercased file name. The latest version is fetched via Elasticsearch's get_latest_version(doc_key), and the next version number is assigned automatically.
  • elasticsearch_client.py assigns doc_key, version, chunk_id, source_filename, checksum, and indexed_at, and stores each chunk.
  • Milvus stores <filename>#v<version> as source, so the version can be determined from search results.
  • The SHA-1 checksum is retained and can be used for duplicate-upload detection and tracing (see the sketch after this list).
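
A sketch of how this metadata could be assembled under the rules above (the function name is hypothetical):

```python
import hashlib
from typing import Optional

def next_version_metadata(filename: str, payload: bytes,
                          latest: Optional[int]) -> dict:
    doc_key = filename.lower()                    # doc_key rule
    version = 1 if latest is None else latest + 1
    return {
        "doc_key": doc_key,
        "version": version,
        "source": f"{filename}#v{version}",       # stored in Milvus
        "checksum": hashlib.sha1(payload).hexdigest(),  # duplicate detection
    }
```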

7.4. Clients and samples

  • Python client: upload_file in rag_client.py.
  • SDK: rag_upload_file / rag_upload_file_async in sdk.py.
  • Sample: A minimal example where rag_markdown_sdk.py uploads and searches Markdown.
  • UI: rag.py calls /upload_file and displays the upload result and version. The latest release adds a streaming/background checkbox, a Session ID input, and a job-status inquiry function, so large documents can be uploaded interactively in parts.

7.5. Future follow-up

  • Add the .markdown extension to the upload validator to fully sync with the ParserRegistry settings.
  • Add setup instructions to the FAQ for python-magic / PyMuPDF / python-docx / openpyxl / python-pptx when they are not installed.
  • Expand the version-history UI on the Dash side and organize a doc_key-based history retrieval/rollback API (using the existing versioned index).

7.6. Large file support & differential UI improvement plan

Purpose

Allow users to safely upload and search documents larger than 1 GB, and provide a UI that makes differences between versions intuitive to grasp.

7.6.1 Large file support

Challenges:

  • Memory consumption spikes during upload, risking server OOM
  • Parsing/chunking runs synchronously, takes long, and times out
  • Indexing load on Elasticsearch/Milvus

Plan:

  1. Streaming parsing/chunking engine: extend parse_document and chunk_text_auto to support streaming, with the PDF/Word/Excel/PPT parsers emitting text incrementally through sequential-processing APIs. Tokenization adopts a generator pattern to avoid holding the full text in memory (see the sketch after this list).
  2. Split upload API: add multipart upload to the frontend/SDK. The client splits the file into parts (e.g. 100 MB) and sends them sequentially; the server processes each part without recombining the whole file.
  3. Background jobs: run processing asynchronously with FastAPI's BackgroundTasks or Celery/RQ, returning a job ID for progress polling/notification. Job metadata holds start_time, end_time, and progress; on completion, doc_key/version are assigned and can be fetched via /upload_status. With Redis as the backend, workers publish progress to upload:channel:<job_id>, the main process subscribes to that channel for instant notifications, and a WebSocket endpoint (/ws/progress/{job_id}) sample is also available. In the front-end UI, entering a job ID starts automatic polling via dcc.Interval, and the progress value is rendered as a progress bar.
  4. Resource limits and auditing: enforce memory limits via Kubernetes/cgroups and record size and processing time in logs.
  5. Performance tests: run continuous upload benchmarks with 1 GB/5 GB dummy files, measuring time, memory, and CPU usage.
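Since this is a plan rather than shipped code, the sketch below only illustrates the generator pattern from item 1: reading a large file in fixed-size parts without holding it all in memory.

```python
from typing import Iterator

def iter_file_parts(path: str,
                    part_bytes: int = 100 * 1024 * 1024) -> Iterator[bytes]:
    # Stream a large upload in ~100 MB parts (size from the plan above).
    with open(path, "rb") as fh:
        while True:
            part = fh.read(part_bytes)
            if not part:
                break
            yield part

# Each part would then be posted sequentially to the split-upload API.
```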

7.6.2 Improved difference display UI

Challenges:

  • Only text differences are shown; changes in images and tables are hard to understand
  • Slow scrolling on large documents
  • Comparison navigation is complicated

Plan:

  1. Rich diff component: enhance Markdown/HTML diffs with react-diff-viewer or similar, adding heading folding, word-level highlighting, and color coding. Show table diffs via cell background color and image diffs via a slider.
  2. Diff summary page: automatically generate changed-line counts and per-chapter statistics for one-click jumps.
  3. Precomputation and caching: generate and save diff patches at version registration and fetch them in the UI on demand.
  4. Large-document measures: lighten loading with virtual scrolling (react-virtualized) and page/section splitting.
  5. User testing and feedback: evaluate UX in in-house workshops and feed issues back into sprints.

Schedule (estimate)

  • 2026/2–3: Streaming parser prototype & split upload design
  • 2026/3–4: Background job and performance test
  • 2026/4–5: Differential UI prototype development and review
  • 2026/5: Document update/deployment procedure maintenance

Risks and mitigations

  • Parse bug: Early detection with library fixing and CI testing
  • Split upload failure: Implemented retransmission/checksum verification
  • UI performance: Lazy loading/virtualization to avoid scrolling delays

This plan is expected to dramatically improve support for large-volume documents and the experience of viewing differences.


8. Summary

In this document, we have organized the EvoSpikeNet RAG system, from implementation of hybrid search to file upload and version control.

Current strengths:

  • ✅ High-precision retrieval via hybrid search (Milvus + Elasticsearch) and RRF
  • ✅ Multilingual support (Japanese/English) and Extractive-first hallucination suppression
  • ✅ File import via /upload_file and versioned indexing
  • ✅ SDK/front-end integration samples (including a Markdown upload example)

Follow-ups:

  • 📅 Enable the .markdown extension in the upload validator and document the dependent-library installation procedure
  • 📅 Improve the version-history UI and add a rollback operation procedure

Related documents:

  • REMAINING_FEATURES.md - Plan E status and backlog
  • rag-system/README.md - startup/API procedure for the standalone RAG service