RAG 2.0 Implementation Roadmap - Remaining Functionality and Future Phases

Last Updated: 2026-05-23
Status: Implementation Planning + Operational Hardening Implemented
Owner: RAG System Development Team

Overview

This document tracks the implementation phases and remaining functionality of the RAG 2.0 project. Phases 1-5 are detailed in RAG_JAPANESE_SpecV2.en.md; this document covers the Phase 6+ extension features and continuous improvements.

Implementation & Planning Phases Overview

┌─────────────────────────────────────────────────────────────────┐
│                    RAG 2.0 Implementation Timeline               │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│ Phase 1 (Prep)      Phase 2 (KW Opt)    Phase 3 (Semantic)    │
│ ─────────────       ──────────────────   ────────────────       │
│ • Sudachi          • Sudachi + ES       • Query Expansion       │
│ • NER Model        • Index Rebuild      • NER Integration       │
│ • Test Cases       • Evaluation         • Entity Boosting       │
│ [2-3 days]         [2-3 days]          [3-4 days]             │
│                                                                 │
│         ↓                    ↓                    ↓            │
│ ┌─────────────┐    ┌──────────────┐    ┌──────────────┐        │
│ │   READY     │    │  MRR > 0.7   │    │ Recall > 0.8 │        │
│ └─────────────┘    └──────────────┘    └──────────────┘        │
│                                                                 │
│ Phase 4 (Integration)    Phase 5 (Operations)                 │
│ ──────────────────       ───────────────────                  │
│ • RRF Tuning            • Dashboard                           │
│ • Full Test             • Monitoring                          │
│ • Production            • Feedback Loop                       │
│ [2-3 days]              [Ongoing]                             │
│                                                                 │
│         ↓                    ↓                                │
│ ┌─────────────┐    ┌──────────────┐                          │
│ │NDCG > 0.75  │    │ Production   │                          │
│ └─────────────┘    └──────────────┘                          │
│                                                                 │
│ Phase 6+ (REMAINING - This Document)                          │
│ ────────────────────────────────────                          │
│ • Context-aware RRF                                            │
│ • Advanced NER fine-tuning                                     │
│ • Relevance Feedback Loop                                      │
│ • Multilingual Expansion                                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Phase 1-5: Core Phases (Details in RAG_JAPANESE_SpecV2.en.md)

Phase	Name	Duration	Key Tasks	Exit Criteria
1	Preparation & Validation	2-3 days	Sudachi/NER/Test Cases	Test set frozen
2	KW Search Optimization	2-3 days	Sudachi + ES integration	MRR > 0.7
3	Semantic Optimization	3-4 days	Query Exp + NER	Recall > 0.8
4	Integration & Tuning	2-3 days	RRF tuning + full test	NDCG > 0.75
5	Operations & Monitoring	Ongoing	Dashboard + Feedback	SLA achievement

2026-05-22 Implementation Reflection (RAG v2 Hardening and Operational Observability)

The following items are already reflected in the EvoSpikeNet-Core implementation (evospikenet/api_modules/rag_v2_api.py).

v2 API hardening:
- Strengthened fail-closed behavior for POST /api/v2/rag/search and POST /api/v2/rag/feedback.
- Returns 503 when preprocessing dependencies (Sudachi/NER/QueryExpander) are unavailable instead of silently degrading.
Preprocessing component split:
- Explicitly separated SudachiTokenizer, EntityRecognizer, and QueryExpander responsibilities.
Sudachi version guard:
- Validates the installed versions against the expected defaults: sudachipy 0.7.5 and dictionary 20240716.
- A version mismatch fails closed in production-like environments and only logs a warning in non-production environments.
QueryExpander enhancements:
- Implemented rule|llm|hybrid backends.
- LLM expansion is parsed through a fixed JSON schema ({"expansions": [...]}).
- Invalid JSON raises an error in strict mode; in non-strict mode the expander falls back to deterministic behavior.
Quality guard:
- Evaluates diversity_score and redundancy_rate.
- Automatically falls back from llm/hybrid to rule on low-quality expansion output.
Observability:
- Added GET /api/v2/rag/preprocessing/health.
- Added query_expansion_quality, query_expansion_guard, query_expansion_guard_stats, query_expansion_guard_history, and query_expansion_guard_hash_summary to debug_info.preprocessing.
- Guard history is maintained as a ring buffer with timestamp, reason, fallback backend, query hash, and quality snapshot.
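
The expansion parsing, quality guard, and guard history described above could fit together roughly as follows. This is a minimal sketch, not the code in rag_v2_api.py: the function names, the metric definitions (diversity as the share of distinct expansions, redundancy as the share that merely repeat the query), the thresholds, and the buffer size are all assumptions made for illustration.

```python
import hashlib
import json
import time
from collections import deque
from typing import Deque, Dict, List

# Hypothetical thresholds; in the real system these would come from
# RAG_V2_QUERY_EXPANDER_MIN_DIVERSITY and RAG_V2_QUERY_EXPANDER_MAX_REDUNDANCY.
MIN_DIVERSITY = 0.5
MAX_REDUNDANCY = 0.5

# Ring buffer: once full, appending drops the oldest guard event.
guard_history: Deque[Dict] = deque(maxlen=100)


def parse_expansions(raw: str, strict: bool) -> List[str]:
    """Parse LLM output against the fixed schema {"expansions": [...]}."""
    try:
        expansions = json.loads(raw)["expansions"]
        if not isinstance(expansions, list) or not all(
            isinstance(e, str) for e in expansions
        ):
            raise ValueError("'expansions' must be a list of strings")
        return expansions
    except (KeyError, TypeError, ValueError):  # JSONDecodeError is a ValueError
        if strict:
            raise
        return []  # non-strict: the caller falls back to rule-based expansion


def expansion_quality(query: str, expansions: List[str]) -> Dict[str, float]:
    """Assumed metric definitions, for illustration only."""
    if not expansions:
        return {"diversity_score": 0.0, "redundancy_rate": 1.0}
    normalized = [e.strip().lower() for e in expansions]
    base = query.strip().lower()
    return {
        "diversity_score": len(set(normalized)) / len(normalized),
        "redundancy_rate": sum(e == base for e in normalized) / len(normalized),
    }


def guarded_backend(query: str, expansions: List[str], backend: str) -> str:
    """Return the backend to use, falling back to 'rule' on low quality."""
    if backend == "rule":
        return backend
    quality = expansion_quality(query, expansions)
    if (
        quality["diversity_score"] >= MIN_DIVERSITY
        and quality["redundancy_rate"] <= MAX_REDUNDANCY
    ):
        return backend
    guard_history.append(
        {
            "timestamp": time.time(),
            "reason": "low_quality_expansion",
            "fallback_backend": "rule",
            "query_hash": hashlib.sha256(query.encode("utf-8")).hexdigest()[:12],
            "quality": quality,
        }
    )
    return "rule"
```

Storing a hash rather than the raw query keeps the history usable for spotting repeat offenders without retaining user text.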

Newly Added / Updated Environment Variables

RAG_V2_NER_BACKEND (transformers|regex)
RAG_V2_NER_MODEL
RAG_V2_PREPROCESSING_WARMUP
RAG_V2_PREPROCESSING_WARMUP_STRICT
RAG_V2_SUDACHI_VERSION
RAG_V2_SUDACHI_DICT_VERSION
RAG_V2_SUDACHI_DICT_DIST_NAMES
RAG_V2_QUERY_EXPANDER_BACKEND (rule|llm|hybrid)
RAG_V2_QUERY_EXPANDER_STRICT
RAG_V2_QUERY_EXPANDER_LM_BACKEND
RAG_V2_QUERY_EXPANDER_QUALITY_GUARD
RAG_V2_QUERY_EXPANDER_MIN_DIVERSITY
RAG_V2_QUERY_EXPANDER_MAX_REDUNDANCY
RAG_V2_QUERY_EXPANDER_GUARD_HISTORY_SIZE
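
As an illustration of how a subset of these variables might be read and validated at startup, here is a minimal sketch. The PreprocessingConfig dataclass, the load_config helper, the boolean parsing rule, and every default except the Sudachi versions (taken from the version guard above) are assumptions, not the actual loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Interpret common truthy spellings; the accepted set is an assumption."""
    value = env.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def _choice(env: Mapping[str, str], name: str, allowed: set, default: str) -> str:
    """Read an enumerated setting and reject values outside the allowed set."""
    value = env.get(name, default)
    if value not in allowed:
        raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")
    return value


@dataclass(frozen=True)
class PreprocessingConfig:
    ner_backend: str
    sudachi_version: str
    sudachi_dict_version: str
    expander_backend: str
    expander_strict: bool
    guard_history_size: int


def load_config(env: Mapping[str, str] = os.environ) -> PreprocessingConfig:
    return PreprocessingConfig(
        # Default backends here are hypothetical.
        ner_backend=_choice(
            env, "RAG_V2_NER_BACKEND", {"transformers", "regex"}, "transformers"
        ),
        sudachi_version=env.get("RAG_V2_SUDACHI_VERSION", "0.7.5"),
        sudachi_dict_version=env.get("RAG_V2_SUDACHI_DICT_VERSION", "20240716"),
        expander_backend=_choice(
            env, "RAG_V2_QUERY_EXPANDER_BACKEND", {"rule", "llm", "hybrid"}, "rule"
        ),
        expander_strict=_flag(env, "RAG_V2_QUERY_EXPANDER_STRICT", False),
        # The default history size is hypothetical.
        guard_history_size=int(
            env.get("RAG_V2_QUERY_EXPANDER_GUARD_HISTORY_SIZE", "100")
        ),
    )
```

Passing the environment in as a mapping keeps the loader testable without mutating os.environ.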

Remaining Forward-Looking Work

Full context-aware RRF rollout (Phase 6)
Domain-specific NER fine-tuning (Phase 7)
Automated feedback-driven adjustment loop (Phase 8)
Multilingual extension (Phase 9)

2026-05-23 Implementation Reflection (RAG v2 Memory-Ranking Contract Sync)

The following items are now reflected in EvoSpikeNet-Core and RAG_JAPANESE_SpecV2.en.md v3.1.

RRF aggregation across query expansions:
- When the same doc_id appears in multiple expanded-query results, its RRF score is accumulated.
Memory-enhanced ranking:
- The result of RAGMemoryIntegrator.compute_memory_boost() is added to final_score.
- Results are sorted by final_score before assigning rank.
API contract synchronization:
- POST /api/v2/rag/search explicitly documents session_id, memory_context, and results[].memory_boost.
- POST /api/v2/rag/feedback requires session_id and returns memory_id plus importance on success.
Added test layers:
- Unit: memory-enhanced ranking calculation.
- Integration: API-level final ranking with memory_boost.
- System: RRF accumulation contract across query expansions.
- E2E: search-to-feedback memory journey.
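
The ranking contract above (accumulate RRF across expanded queries, add the memory boost, sort by final_score, then assign rank) can be sketched as follows. The fuse_and_rank function and the memory_boost callable are hypothetical stand-ins; the real signature of RAGMemoryIntegrator.compute_memory_boost() is not shown in this document, and k = 60 is borrowed from the Phase 6 specification.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def fuse_and_rank(
    results_per_query: List[List[str]],
    memory_boost: Callable[[str], float],
    k: int = 60,
) -> List[Dict]:
    """Fuse ranked doc_id lists (one per expanded query) into a final ranking."""
    # 1. Accumulate RRF: a doc_id seen under several expansions sums its scores.
    rrf: Dict[str, float] = defaultdict(float)
    for ranked_ids in results_per_query:
        for position, doc_id in enumerate(ranked_ids):
            rrf[doc_id] += 1.0 / (k + position + 1)

    # 2. Add the memory boost to obtain final_score.
    results = []
    for doc_id, rrf_score in rrf.items():
        boost = memory_boost(doc_id)
        results.append(
            {
                "doc_id": doc_id,
                "rrf_score": rrf_score,
                "memory_boost": boost,
                "final_score": rrf_score + boost,
            }
        )

    # 3. Sort by final_score first, and only then assign rank.
    results.sort(key=lambda r: r["final_score"], reverse=True)
    for rank, item in enumerate(results, start=1):
        item["rank"] = rank
    return results


# Usage: "b" appears under both expansions; "c" carries a memory boost.
ranking = fuse_and_rank(
    [["a", "b"], ["b", "c"]],
    memory_boost=lambda doc_id: 0.05 if doc_id == "c" else 0.0,
)
```

In this usage, "b" outranks "a" because its two RRF contributions are summed, and "c" moves to the top because the boost is applied before ranks are assigned.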

Phase 6: Advanced Context-Aware Search

6.1 Context-Aware RRF Implementation

Objective: Incorporate episodic memory (when, who, context) into ranking weights

Background: Currently, RRF weights the BM25 (keyword) and vector result lists equally and ignores who is searching. Taking user context into account (e.g., the user belongs to Project A) enables ranking that favors documents relevant to that user.

Implementation Specification:

from datetime import datetime
from typing import Any, Dict, List


class ContextAwareRRF:
    """Context-aware RRF scoring"""

    def __init__(self, user_context: Dict[str, Any]):
        self.project_id = user_context.get("project_id")
        self.department = user_context.get("department")
        # Default to the current time so the recency term can always be computed
        self.timestamp = user_context.get("timestamp") or datetime.now()
        self.search_history = user_context.get("search_history", [])

    def compute_context_weight(self, doc: Dict) -> float:
        """
        Compute context weight based on document metadata

        Returns: Weight multiplier clamped to 0.5 ~ 2.0
        - High relevance: 2.0
        - Neutral: 1.0
        - Low relevance: 0.5
        """
        weight = 1.0

        # Project match (a missing project_id on both sides is not a match)
        if self.project_id is not None and doc.get("project_id") == self.project_id:
            weight *= 1.5

        # Department match
        if self.department is not None and doc.get("department") == self.department:
            weight *= 1.2

        # Search history relevance: boost once per recent query found in the source field
        source = doc.get("source") or ""
        for prev_query in self.search_history[-5:]:
            if prev_query and prev_query in source:
                weight *= 1.1

        # Temporal recency (skipped when the document has no updated_at)
        updated_at = doc.get("updated_at")
        if updated_at is not None:
            doc_age_days = max(0, (self.timestamp - updated_at).days)
            recency_penalty = max(0.5, 1.0 - (doc_age_days / 365) * 0.3)
            weight *= recency_penalty

        return max(0.5, min(weight, 2.0))

    def reciprocal_rank_fusion_with_context(
        self,
        search_results_lists: List[List[Dict]],
        k: int = 60
    ) -> List[str]:
        """
        RRF with context weighting
        """
        fused_scores: Dict[str, float] = {}

        for results in search_results_lists:
            for i, result in enumerate(results):
                doc_id = result["id"]
                # Standard RRF term: rank is 1-based, hence i + 1
                base_score = 1 / (k + i + 1)
                context_weight = self.compute_context_weight(result)
                weighted_score = base_score * context_weight

                fused_scores[doc_id] = fused_scores.get(doc_id, 0) + weighted_score

        reranked = sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
        return [doc_id for doc_id, _ in reranked]
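
To get a feel for the recency term in compute_context_weight, the same formula can be evaluated in isolation. The snippet below is a small worked example; the standalone function name and the days_to_floor variable are introduced here only for illustration.

```python
def recency_penalty(doc_age_days: float) -> float:
    """Same formula as the recency term: linear decay of 0.3 per year, floored at 0.5."""
    return max(0.5, 1.0 - (doc_age_days / 365) * 0.3)


# Age at which the linear decay reaches the 0.5 floor: 1.0 - (d / 365) * 0.3 == 0.5
days_to_floor = 365 * (1.0 - 0.5) / 0.3

# A one-year-old document that matches both project (x1.5) and department (x1.2)
combined = 1.5 * 1.2 * recency_penalty(365)
```

A fresh document keeps a multiplier of 1.0, a one-year-old document is scaled by 0.7, and the 0.5 floor is reached after roughly 608 days. With both project and department matches, a one-year-old document still ends up at about 1.26, so context matches outweigh a year of staleness under these constants.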

Timeline: 3-4 weeks after Phase 5, 2-3 weeks implementation

Expected Impact:
- NDCG@10: 0.75 → 0.82 (+9%)
- User satisfaction: 85% → 92%

Phase 7: Advanced Named Entity Recognition Fine-tuning

7.1 Domain-Specific NER Fine-tuning

Objective: Extract internal domain entities (org, department, project IDs) with 95%+ accuracy

Background: Standard NER models have insufficient accuracy (80-85%) for internal entities like project code "EV-2024-001" and department name "AI Systems Development Division".

Implementation Specification:

from typing import Dict, List


class DomainSpecificNER:
    """Organization-specific NER"""

    def __init__(self, base_model: str = "tner/roberta-large-japanese-char-luw-ner"):
        self.base_model = base_model
        self.fine_tuned_model = None
        self.entity_dict = self._load_internal_entities()

    def _load_internal_entities(self) -> Dict[str, List[str]]:
        """Load internal entity dictionary"""
        return {
            "PROJECT_ID": ["EV-2024-001", "EV-2024-002", "SPIKE-00123"],
            "DEPARTMENT": ["AI Systems Dev", "Data Science", "Research Group"],
            "PRODUCT": ["EvoSpikeNet Pro", "EvoSpikeNet Core"],
            "PERSON": ["John Doe", "Jane Smith"]
        }

    def fine_tune_on_internal_data(
        self,
        training_data: List[Dict],
        num_epochs: int = 3
    ):
        """
        Fine-tune on internal annotated data
        """
        # Imported lazily so the class can be constructed without transformers installed
        from transformers import AutoModelForTokenClassification, Trainer

        model = AutoModelForTokenClassification.from_pretrained(self.base_model)
        # Fine-tuning process: the training loop over training_data for
        # num_epochs is not specified yet, so the base model is stored as-is
        self.fine_tuned_model = model

    def extract_entities_with_dict_matching(self, text: str) -> List[Dict]:
        """
        Ensemble model prediction + dictionary matching

        Note: _predict_with_model, _match_against_internal_dict, and
        _merge_predictions are still to be implemented.
        """
        # Model predictions
        model_predictions = self._predict_with_model(text)

        # Dictionary matching
        dict_matches = self._match_against_internal_dict(text)

        # Merge (deduplicate, combine confidence)
        merged = self._merge_predictions(model_predictions, dict_matches)

        return merged
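
The dictionary-matching and merge steps are left unspecified above. One possible shape for them is sketched below, assuming entities are plain dicts with text, label, start, end, and score fields, and assuming a simple merge policy in which an exact dictionary hit always wins over an overlapping model span. Both the field names and the policy are illustrative choices, not the planned implementation.

```python
from typing import Dict, List


def match_against_dict(text: str, entity_dict: Dict[str, List[str]]) -> List[Dict]:
    """Find every exact occurrence of each dictionary surface form."""
    matches = []
    for label, surfaces in entity_dict.items():
        for surface in surfaces:
            start = text.find(surface)
            while start != -1:
                matches.append(
                    {
                        "text": surface,
                        "label": label,
                        "start": start,
                        "end": start + len(surface),
                        "score": 1.0,  # exact match, treated as fully confident
                        "source": "dict",
                    }
                )
                start = text.find(surface, start + 1)
    return matches


def merge_predictions(model_preds: List[Dict], dict_matches: List[Dict]) -> List[Dict]:
    """Keep all dictionary matches; keep model spans that overlap none of them."""

    def overlaps(a: Dict, b: Dict) -> bool:
        return a["start"] < b["end"] and b["start"] < a["end"]

    merged = list(dict_matches)
    for pred in model_preds:
        if not any(overlaps(pred, match) for match in dict_matches):
            merged.append(pred)
    return sorted(merged, key=lambda entity: entity["start"])


# Usage with a truncated model span that the dictionary corrects
text = "EV-2024-001 is owned by Data Science"
model_preds = [
    {"text": "EV-2024", "label": "MISC", "start": 0, "end": 7, "score": 0.6},
    {"text": "owned", "label": "MISC", "start": 15, "end": 20, "score": 0.4},
]
entity_dict = {"PROJECT_ID": ["EV-2024-001"], "DEPARTMENT": ["Data Science"]}
entities = merge_predictions(model_preds, match_against_dict(text, entity_dict))
```

Here the partial model span "EV-2024" is dropped in favor of the full dictionary entry, while the non-overlapping model span survives.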

Timeline: 4 weeks after Phase 5, 3-4 weeks implementation

Expected Impact:
- Entity Recall: 85% → 95%
- Entity Precision: 88% → 96%

Phase 8: Automated User Feedback Loop

8.1 Relevance Feedback & Self-Learning

Objective: Automatically improve model/rules from user "relevance" assessments

Background: Users evaluate the initial search results. Accumulating this feedback and automatically adjusting model parameters from it enables continuous accuracy improvements.

Implementation Specification:

import logging
from datetime import datetime
from typing import Dict


class FeedbackLoop:
    """User feedback collection and learning"""

    def __init__(self, rag_system, feedback_db):
        self.rag = rag_system
        self.db = feedback_db

    def record_feedback(
        self,
        query: str,
        doc_id: str,
        rating: int,
        user_id: str,
        timestamp: datetime
    ):
        """Record user feedback"""
        feedback = {
            "query": query,
            "doc_id": doc_id,
            "rating": rating,
            "user_id": user_id,
            "timestamp": timestamp
        }
        self.db.insert(feedback)

    def analyze_feedback_patterns(self, window_days: int = 30) -> Dict:
        """
        Extract problem patterns from recent feedback
        """
        feedbacks = self.db.query_recent(days=window_days)

        patterns = {
            "low_rated_queries": [],
            "false_negatives": [],
            "entity_misses": [],
            "variation_issues": []
        }

        for feedback in feedbacks:
            # Ratings of 2 or below count as low
            if feedback["rating"] <= 2:
                # Record every low rating, then classify it by likely cause
                patterns["low_rated_queries"].append(feedback)
                if self._is_entity_query(feedback["query"]):
                    patterns["entity_misses"].append(feedback)
                elif self._is_variation_query(feedback["query"]):
                    patterns["variation_issues"].append(feedback)

        # Note: false_negatives is not populated by this pass
        return patterns

    def auto_adjust_parameters(self, patterns: Dict):
        """
        Auto-adjust parameters based on feedback patterns
        """
        # Each call multiplies the weight again, so repeated runs compound
        if len(patterns["entity_misses"]) > 5:
            self.rag.entity_boost_weight *= 1.1
            logging.info("Auto-increased entity boost weight")

        if len(patterns["variation_issues"]) > 5:
            self.rag.query_expansion_enabled = True
            logging.info("Auto-enabled query expansion")
Timeline: 2 weeks after Phase 6, 2-3 weeks implementation

Expected Impact:
- Monthly NDCG improvement: 0.5-1.0% (continuous)
- User satisfaction auto-improvement

Phase 9: Multilingual & Multi-Regional Support

9.1 Multilingual RAG Extension

Objective: Support English, Chinese, and other languages

Current Status:
- Japanese: Full support (Phase 1-5)
- English: Basic support (language detection only)
- Others: Not supported

Implementation Specification:

from typing import Dict, List, Optional


class MultilingualRAG:
    """Multilingual RAG"""

    LANGUAGE_CONFIGS = {
        "ja": {
            "tokenizer": "sudachi",
            "stop_words": "ja_stop",
            "embedding_model": "paraphrase-multilingual-MiniLM-L12-v2",
            "ner_model": "tner/roberta-large-japanese-char-luw-ner"
        },
        "en": {
            "tokenizer": "english",
            "stop_words": "english",
            "embedding_model": "paraphrase-multilingual-MiniLM-L12-v2",
            "ner_model": "dslim/bert-base-NER"
        },
        "zh": {
            "tokenizer": "chinese",
            "stop_words": "chinese",
            "embedding_model": "paraphrase-multilingual-MiniLM-L12-v2",
            "ner_model": "uer/roberta-base-chinese-cluener"
        }
    }

    def __init__(self, rag_system):
        # Assumed constructor: the attributes used below were otherwise undefined
        self.rag = rag_system
        self.supported_languages = set(self.LANGUAGE_CONFIGS)

    def retrieve_multilingual(self, query: str, languages: Optional[List[str]] = None) -> Dict:
        """
        Simultaneous search across multiple languages
        """
        # Default to every configured language when none are requested
        if languages is None:
            languages = list(self.LANGUAGE_CONFIGS)

        # Detect the query language once rather than once per target language
        detected_lang = self._detect_language(query)

        results = {}
        for lang in languages:
            if lang not in self.supported_languages:
                continue

            # Translate only when the target language differs from the query language
            if detected_lang == lang:
                search_query = query
            else:
                search_query = self._translate(query, detected_lang, lang)
            results[lang] = self.rag.retrieve(search_query, lang_config=self.LANGUAGE_CONFIGS[lang])

        return results
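
The _detect_language helper is not specified above. For the three languages in LANGUAGE_CONFIGS, a dependency-free heuristic based on Unicode script ranges could serve as a first approximation. This is an assumption for illustration, not the planned detector, and it has known blind spots noted in the comments.

```python
def detect_language(text: str) -> str:
    """Classify text as 'ja', 'zh', or 'en' from Unicode script ranges."""
    # U+3040-U+30FF covers the hiragana and katakana blocks
    has_kana = any("\u3040" <= ch <= "\u30ff" for ch in text)
    # U+4E00-U+9FFF is the main CJK Unified Ideographs block
    has_han = any("\u4e00" <= ch <= "\u9fff" for ch in text)

    if has_kana:
        return "ja"
    if has_han:
        # Limitation: Japanese written only in kanji is classified as Chinese
        return "zh"
    # Limitation: any other script (including non-English Latin text) maps to 'en'
    return "en"
```

Short kanji-only queries are common in Japanese search, so a production detector would likely need a statistical model or a user-level language preference as a tiebreaker.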

Timeline: 3 weeks after Phase 7, 4-5 weeks implementation

Expected Impact:
- Global support realization
- User base expansion

Phase 10: Real-time Updates & Streaming

10.1 Real-time Index Updates

Objective: Immediately reflect document additions/updates in indexes

Current State: Batch updates (hours to days of lag)

After Improvement: Real-time updates (seconds)

Implementation Specification:

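import asyncio
import logging


class RealtimeRAGIndexer:
    """Real-time index updates"""

    # Assumed retry budget: without a limit, a document that always fails
    # would be re-queued forever
    MAX_RETRIES = 3

    def __init__(self, milvus_client, es_client):
        self.milvus = milvus_client
        self.es = es_client
        self.update_queue = asyncio.Queue()

    async def process_updates(self):
        """
        Asynchronously process documents from queue
        """
        while True:
            doc = await self.update_queue.get()

            try:
                embedding = self._generate_embedding(doc["text"])

                await self._update_milvus(doc["id"], embedding, doc)
                await self._update_elasticsearch(doc["id"], doc)

                logging.info(f"Document {doc['id']} updated in real-time")
            except Exception as e:
                retries = doc.get("_retries", 0) + 1
                logging.error(f"Failed to update document {doc.get('id')} (attempt {retries}): {e}")
                # Re-queue with an incremented counter until the budget is spent
                if retries < self.MAX_RETRIES:
                    await self.update_queue.put({**doc, "_retries": retries})
            finally:
                # Mark the item as processed so update_queue.join() can complete
                self.update_queue.task_done()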
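
One way to keep a permanently failing document from being re-queued indefinitely is to give each document a retry budget. The self-contained sketch below exercises that pattern end to end with an asyncio.Queue. The worker function, the "fail" flag that simulates an indexing error, and the budget of three attempts are all illustrative assumptions.

```python
import asyncio
from typing import Dict, List


async def worker(queue: asyncio.Queue, indexed: List[str], max_retries: int = 3) -> None:
    """Consume documents forever, retrying each failure a bounded number of times."""
    while True:
        doc: Dict = await queue.get()
        try:
            if doc.get("fail"):  # stand-in for an embedding or index error
                raise RuntimeError("simulated failure")
            indexed.append(doc["id"])
        except Exception:
            attempts = doc.get("attempts", 0) + 1
            if attempts < max_retries:
                await queue.put({**doc, "attempts": attempts})
            # Otherwise the document is dropped; a real system might dead-letter it
        finally:
            queue.task_done()


async def main() -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    indexed: List[str] = []
    task = asyncio.create_task(worker(queue, indexed))

    await queue.put({"id": "doc-1"})
    await queue.put({"id": "doc-2", "fail": True})

    # join() returns once every queued item, including retries, is marked done
    await queue.join()

    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return indexed


indexed_ids = asyncio.run(main())
```

Because the retry is put back on the queue before task_done() is called for the failed attempt, join() cannot return while a retry is still pending.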

Timeline: 2 weeks after Phase 8, 3-4 weeks implementation

Expected Impact:
- Information freshness improvement
- Enhanced user experience

Phase 11: Explainable RAG

11.1 Search Result Rationale Display

Objective: Explain to users "why this document appeared"

Implementation Example:

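from typing import Dict, List


class ExplainableRAG:
    """Explainable search results"""

    def __init__(self, rag_system):
        # Assumed constructor: self.rag was otherwise undefined
        self.rag = rag_system

    def retrieve_with_explanation(self, query: str, top_k: int = 5) -> List[Dict]:
        """
        Return search results with explanations
        """
        results = []

        docs, debug_info = self.rag.retrieve(query, return_debug_info=True)

        # Look results up by document ID rather than by position: each retriever
        # orders its hits by its own score, so index i in the fused list does not
        # correspond to index i in the keyword or vector list.
        # Assumes every debug_info result entry carries an "id" field.
        keyword_by_id = {r["id"]: r for r in debug_info["keyword_results"]}
        vector_by_id = {r["id"]: r for r in debug_info["vector_results"]}

        for i, doc in enumerate(docs[:top_k]):
            reasons = []

            # A document may have been found by only one of the retrievers
            keyword_hit = keyword_by_id.get(doc["id"])
            if keyword_hit is not None:
                reasons.append({
                    "type": "keyword_match",
                    "matched_terms": keyword_hit["matched_keywords"],
                    "score": keyword_hit["score"]
                })

            vector_hit = vector_by_id.get(doc["id"])
            if vector_hit is not None:
                reasons.append({
                    "type": "semantic_similarity",
                    "similarity": vector_hit["score"],
                    "explanation": "Small semantic distance to query"
                })

            reasons.append({
                "type": "rrf_fusion",
                "combined_score": debug_info["rrf_scores"][doc["id"]]
            })

            results.append({"document": doc, "rank": i + 1, "reasons": reasons})

        return results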

Timeline: 1 week after Phase 9, 2 weeks implementation

Expected Impact:
- Increased user trust
- Improved debugging/optimization efficiency

Milestone Overview

Phase	Name	Start	Duration	Main Deliverable
1	Preparation	Immediate	2-3 days	Test set frozen
2	KW Optimization	Week 1	2-3 days	MRR 0.7+
3	Semantic Optimization	Week 2	3-4 days	Recall 0.8+
4	Integration & Tuning	Week 3	2-3 days	NDCG 0.75+
5	Operations & Monitoring	Week 4	Ongoing	Production ready
6	Context-aware RRF	Week 7	2-3 weeks	NDCG 0.82+
7	Domain NER	Week 10	3-4 weeks	Entity Recall 95%+
8	Feedback Loop	Week 13	2-3 weeks	Auto-improvement
9	Multilingual	Week 16	4-5 weeks	Global Ready
10	Real-time Updates	Week 21	3-4 weeks	Second-level Updates
11	Explainability	Week 25	2 weeks	Explainable RAG

Resource Planning

Phase 1-5 (Required)

Role	FTE	Duration
ML Engineer	1.5	2 weeks
Backend Engineer	1.5	2 weeks
QA / Testing	0.5	2 weeks
Total	3.5	2 weeks

Phase 6-11 (Extension / Optional)

Role	FTE	Duration
ML Engineer	1.0	8 weeks
Backend Engineer	1.0	8 weeks
DevOps	0.5	8 weeks
Total	2.5	8 weeks

Success Metrics

Phase 1-5 Goals

Metric	Baseline	Target	Deadline
Variation MRR	0.42	0.70	Week 3
Entity Recall	0.35	0.80	Week 4
Overall NDCG	0.55	0.75	Week 4
User Satisfaction	78%	90%	Week 5
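
For reference, the ranking metrics in this table can be computed as sketched below. The sketch assumes binary relevance judgments (a document is either relevant or not) and the common log2-discounted form of NDCG; the evaluation harness used in Phases 1-5 may define these differently. MRR is the mean of reciprocal_rank over all test queries.

```python
import math
from typing import List, Set


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / position of the first relevant document, or 0.0 if none is returned."""
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 1) for position in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Usage: the two relevant documents are ranked second and third
ranked = ["d3", "d1", "d2"]
relevant = {"d1", "d2"}
rr = reciprocal_rank(ranked, relevant)
recall2 = recall_at_k(ranked, relevant, k=2)
ndcg3 = ndcg_at_k(ranked, relevant, k=3)
```

In this usage the reciprocal rank is 0.5, recall@2 is 0.5, and NDCG@3 is about 0.69, which shows how NDCG penalizes relevant documents for sitting below an irrelevant one even when all of them are retrieved.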

Phase 6-11 Goals

Metric	Target	Deadline
Context-aware NDCG	0.82	Week 8
Entity Precision	0.96	Week 11
Self-learning NDCG improvement	+0.5-1.0%/month	Week 13
Global Support	3 languages	Week 21

Dependencies & Constraints

External Dependencies

✅ Sudachi / TNER / Elasticsearch - Available
⚠️ Internal terminology dictionary - Must be built in Phase 1
⚠️ User feedback mechanism - Implemented in Phase 5

Technical Constraints

GPU Memory: 8GB+ recommended for NER model execution
Elasticsearch Storage: 2x capacity needed during index rebuild
Milvus Credits: Additional allocation for large-scale indexing

Risk Management

Risk	Probability	Impact	Mitigation
Sudachi dependency issues	Medium	High	Prepare MeCab alternative
NER model accuracy shortfall	Low	Medium	Fine-tuning on internal data
Index rebuild time overrun	Medium	Medium	Incremental indexing strategy
Insufficient user feedback	Medium	Medium	Introduce incentive mechanism

Document Version: 1.0
Status: Ready for Implementation
Owner: RAG Development Team
Last Updated: 2026-05-23