RAG 2.0 Implementation Roadmap - Remaining Functionality and Future Phases
Last Updated: 2026-05-23
Status: Implementation Planning + Operational Hardening Implemented
Owner: RAG System Development Team
Overview
This document tracks the RAG 2.0 implementation phases and the functionality that remains to be built. Phases 1-5 are detailed in RAG_JAPANESE_SpecV2.en.md; this document covers the Phase 6+ extension features and continuous improvements.
Implementation & Planning Phases Overview
┌─────────────────────────────────────────────────────────────────┐
│                 RAG 2.0 Implementation Timeline                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Phase 1 (Prep)       Phase 2 (KW Opt)     Phase 3 (Semantic)   │
│  ──────────────       ────────────────     ──────────────────   │
│  • Sudachi            • Sudachi + ES       • Query Expansion    │
│  • NER Model          • Index Rebuild      • NER Integration    │
│  • Test Cases         • Evaluation         • Entity Boosting    │
│  [2-3 days]           [2-3 days]           [3-4 days]           │
│                                                                 │
│        ↓                    ↓                    ↓              │
│  ┌─────────────┐      ┌──────────────┐     ┌──────────────┐     │
│  │    READY    │      │  MRR > 0.7   │     │ Recall > 0.8 │     │
│  └─────────────┘      └──────────────┘     └──────────────┘     │
│                                                                 │
│  Phase 4 (Integration)   Phase 5 (Operations)                   │
│  ─────────────────────   ────────────────────                   │
│  • RRF Tuning            • Dashboard                            │
│  • Full Test             • Monitoring                           │
│  • Production            • Feedback Loop                        │
│  [2-3 days]              [Ongoing]                              │
│                                                                 │
│        ↓                       ↓                                │
│  ┌─────────────┐         ┌──────────────┐                       │
│  │ NDCG > 0.75 │         │  Production  │                       │
│  └─────────────┘         └──────────────┘                       │
│                                                                 │
│  Phase 6+ (REMAINING - This Document)                           │
│  ────────────────────────────────────                           │
│  • Context-aware RRF                                            │
│  • Advanced NER fine-tuning                                     │
│  • Relevance Feedback Loop                                      │
│  • Multilingual Expansion                                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Phase 1-5: Core Phases (Details in RAG_JAPANESE_SpecV2.en.md)
| Phase | Name | Duration | Key Tasks | Exit Criteria |
|---|---|---|---|---|
| 1 | Preparation & Validation | 2-3 days | Sudachi/NER/Test Cases | Test set frozen |
| 2 | KW Search Optimization | 2-3 days | Sudachi + ES integration | MRR > 0.7 |
| 3 | Semantic Optimization | 3-4 days | Query Exp + NER | Recall > 0.8 |
| 4 | Integration & Tuning | 2-3 days | RRF tuning + full test | NDCG > 0.75 |
| 5 | Operations & Monitoring | Ongoing | Dashboard + Feedback | SLA achievement |
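The exit criteria above are stated in terms of MRR, Recall, and NDCG. As a reference point, here is a minimal sketch of how these metrics are commonly computed. The function names, the input shapes (ranked lists of document IDs, sets of relevant IDs, graded gains per document), and the log2 discount are assumptions for illustration; the exact definitions used by the evaluation harness are specified in RAG_JAPANESE_SpecV2.en.md, not here.

```python
import math
from typing import Dict, List, Sequence, Set


def mrr(ranked_lists: Sequence[List[str]], relevant: Sequence[Set[str]]) -> float:
    """Mean reciprocal rank: average of 1/rank of the first relevant hit per query."""
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, rel in zip(ranked_lists, relevant):
        for position, doc_id in enumerate(ranked, start=1):
            if doc_id in rel:
                total += 1.0 / position
                break  # only the first relevant hit counts
    return total / len(ranked_lists)


def recall_at_k(ranked: List[str], rel: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not rel:
        return 0.0
    return len(set(ranked[:k]) & rel) / len(rel)


def ndcg_at_k(ranked: List[str], gains: Dict[str, float], k: int) -> float:
    """NDCG@k with graded gains and a log2(rank + 1) discount."""
    dcg = sum(gains.get(d, 0.0) / math.log2(i + 2) for i, d in enumerate(ranked[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

A gate such as "MRR > 0.7" would then compare `mrr(...)` over the frozen test set against the threshold.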
2026-05-22 Implementation Reflection (RAG v2 Hardening and Operational Observability)
The following items are already reflected in the EvoSpikeNet-Core implementation (evospikenet/api_modules/rag_v2_api.py).
- v2 API hardening:
  - Strengthened fail-closed behavior for POST /api/v2/rag/search and POST /api/v2/rag/feedback.
  - Returns 503 when preprocessing dependencies (Sudachi/NER/QueryExpander) are unavailable instead of silently degrading.
- Preprocessing component split:
  - Explicitly separated SudachiTokenizer, EntityRecognizer, and QueryExpander responsibilities.
- Sudachi version guard:
  - Validates expected defaults sudachipy=0.7.5 and dictionary 20240716.
  - Enforced as fail-closed in production-like environments, warning-only in non-production.
- QueryExpander enhancements:
  - Implemented rule|llm|hybrid backends.
  - LLM expansion is parsed through a fixed JSON schema ({"expansions": [...]}).
  - Invalid JSON raises in strict mode and falls back to deterministic behavior in non-strict mode.
- Quality guard:
  - Evaluates diversity_score and redundancy_rate.
  - Automatically falls back from llm/hybrid to rule on low-quality expansion output.
- Observability:
  - Added GET /api/v2/rag/preprocessing/health.
  - Added query_expansion_quality, query_expansion_guard, query_expansion_guard_stats, query_expansion_guard_history, and query_expansion_guard_hash_summary to debug_info.preprocessing.
  - Guard history is maintained as a ring buffer with timestamp, reason, fallback backend, query hash, and quality snapshot.
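To make the quality guard and its ring-buffer history concrete, the following is a rough sketch of one way such a guard could work. It is not the code in rag_v2_api.py: the metric definitions (share of distinct expansions, share of expansions that repeat the query), the class name `ExpansionGuard`, the default thresholds, and the 12-character query hash are all assumptions chosen for illustration. Only the field names in the history entries follow the description above.

```python
import hashlib
import time
from collections import deque
from typing import Deque, Dict, List


def expansion_quality(query: str, expansions: List[str]) -> Dict[str, float]:
    """Illustrative metrics only; the real definitions are not given in this document."""
    normalized = [e.strip().lower() for e in expansions if e.strip()]
    if not normalized:
        return {"diversity_score": 0.0, "redundancy_rate": 1.0}
    base = query.strip().lower()
    # Share of expansions that are distinct from one another
    diversity = len(set(normalized)) / len(normalized)
    # Share of expansions that merely repeat the original query
    redundancy = sum(1 for e in normalized if e == base) / len(normalized)
    return {"diversity_score": diversity, "redundancy_rate": redundancy}


class ExpansionGuard:
    """Falls back to the rule backend and records why, in a bounded history."""

    def __init__(self, min_diversity: float = 0.5, max_redundancy: float = 0.5,
                 history_size: int = 100):
        self.min_diversity = min_diversity
        self.max_redundancy = max_redundancy
        # deque(maxlen=...) drops the oldest entry once full, i.e. a ring buffer
        self.history: Deque[dict] = deque(maxlen=history_size)

    def choose_backend(self, query: str, expansions: List[str], backend: str) -> str:
        quality = expansion_quality(query, expansions)
        if quality["diversity_score"] < self.min_diversity:
            reason = "low_diversity"
        elif quality["redundancy_rate"] > self.max_redundancy:
            reason = "high_redundancy"
        else:
            return backend
        if backend == "rule":
            return backend  # already on the deterministic backend
        self.history.append({
            "timestamp": time.time(),
            "reason": reason,
            "fallback_backend": "rule",
            # Store a hash rather than the raw query text
            "query_hash": hashlib.sha256(query.encode("utf-8")).hexdigest()[:12],
            "quality": quality,
        })
        return "rule"
```

Using `deque(maxlen=...)` keeps the history bounded without explicit eviction code, which matches the ring-buffer behavior described for the guard history.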
Newly Added / Updated Environment Variables
- RAG_V2_NER_BACKEND (transformers|regex)
- RAG_V2_NER_MODEL
- RAG_V2_PREPROCESSING_WARMUP
- RAG_V2_PREPROCESSING_WARMUP_STRICT
- RAG_V2_SUDACHI_VERSION
- RAG_V2_SUDACHI_DICT_VERSION
- RAG_V2_SUDACHI_DICT_DIST_NAMES
- RAG_V2_QUERY_EXPANDER_BACKEND (rule|llm|hybrid)
- RAG_V2_QUERY_EXPANDER_STRICT
- RAG_V2_QUERY_EXPANDER_LM_BACKEND
- RAG_V2_QUERY_EXPANDER_QUALITY_GUARD
- RAG_V2_QUERY_EXPANDER_MIN_DIVERSITY
- RAG_V2_QUERY_EXPANDER_MAX_REDUNDANCY
- RAG_V2_QUERY_EXPANDER_GUARD_HISTORY_SIZE
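As an illustration of how a subset of these variables might be read and validated at startup, here is a minimal sketch. The variable names and the allowed backend values come from the list above, and the Sudachi defaults (0.7.5 and 20240716) come from the version guard description. Everything else is an assumption: the `PreprocessingSettings` class, the `load_settings` function, the accepted boolean spellings, and every other default value.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Set

_TRUTHY = {"1", "true", "yes", "on"}  # assumed boolean spellings


@dataclass(frozen=True)
class PreprocessingSettings:
    ner_backend: str
    sudachi_version: str
    sudachi_dict_version: str
    expander_backend: str
    expander_strict: bool
    min_diversity: float
    max_redundancy: float
    guard_history_size: int


def _choice(env: Mapping[str, str], name: str, allowed: Set[str], default: str) -> str:
    """Read an enumerated setting and reject values outside the allowed set."""
    value = env.get(name, default).strip().lower()
    if value not in allowed:
        raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")
    return value


def load_settings(env: Optional[Mapping[str, str]] = None) -> PreprocessingSettings:
    """Build settings from the environment (or from a mapping, for tests)."""
    env = os.environ if env is None else env
    return PreprocessingSettings(
        ner_backend=_choice(env, "RAG_V2_NER_BACKEND", {"transformers", "regex"}, "regex"),
        sudachi_version=env.get("RAG_V2_SUDACHI_VERSION", "0.7.5"),
        sudachi_dict_version=env.get("RAG_V2_SUDACHI_DICT_VERSION", "20240716"),
        expander_backend=_choice(
            env, "RAG_V2_QUERY_EXPANDER_BACKEND", {"rule", "llm", "hybrid"}, "rule"
        ),
        expander_strict=env.get("RAG_V2_QUERY_EXPANDER_STRICT", "false").strip().lower()
        in _TRUTHY,
        min_diversity=float(env.get("RAG_V2_QUERY_EXPANDER_MIN_DIVERSITY", "0.5")),
        max_redundancy=float(env.get("RAG_V2_QUERY_EXPANDER_MAX_REDUNDANCY", "0.5")),
        guard_history_size=int(env.get("RAG_V2_QUERY_EXPANDER_GUARD_HISTORY_SIZE", "100")),
    )
```

Rejecting unknown backend names at load time is in keeping with the fail-closed stance described above: a misspelled value surfaces immediately instead of silently selecting a default.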
Remaining Forward-Looking Work
- Full context-aware RRF rollout (Phase 6)
- Domain-specific NER fine-tuning (Phase 7)
- Automated feedback-driven adjustment loop (Phase 8)
- Multilingual extension (Phase 9)
2026-05-23 Implementation Reflection (RAG v2 Memory-Ranking Contract Sync)
The following items are now reflected in EvoSpikeNet-Core and RAG_JAPANESE_SpecV2.en.md v3.1.
- RRF aggregation across query expansions:
  - When the same doc_id appears in multiple expanded-query results, its RRF score is accumulated.
- Memory-enhanced ranking:
  - The result of RAGMemoryIntegrator.compute_memory_boost() is added to final_score.
  - Results are sorted by final_score before assigning rank.
- API contract synchronization:
  - POST /api/v2/rag/search explicitly documents session_id, memory_context, and results[].memory_boost.
  - POST /api/v2/rag/feedback requires session_id and returns memory_id plus importance on success.
- Added test layers:
  - Unit: memory-enhanced ranking calculation.
  - Integration: API-level final ranking with memory_boost.
  - System: RRF accumulation contract across query expansions.
  - E2E: search-to-feedback memory journey.
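The ranking contract described above (accumulate RRF across expansions, add the memory boost, sort, then assign ranks) can be sketched as follows. This is an illustrative stand-in rather than the EvoSpikeNet-Core code: the function name `fuse_and_rank` is hypothetical, the precomputed `memory_boost` mapping stands in for the output of RAGMemoryIntegrator.compute_memory_boost() (whose signature is not given here), and k = 60 is the default used elsewhere in this document.

```python
from typing import Dict, List


def fuse_and_rank(
    result_lists: List[List[str]],
    memory_boost: Dict[str, float],
    k: int = 60,
) -> List[dict]:
    """Accumulate RRF per doc_id, add the memory boost, sort, then assign ranks."""
    rrf: Dict[str, float] = {}
    for results in result_lists:  # one ranked list per expanded query
        for position, doc_id in enumerate(results, start=1):
            # The same doc_id in several lists accumulates its contributions
            rrf[doc_id] = rrf.get(doc_id, 0.0) + 1.0 / (k + position)

    rows = []
    for doc_id, score in rrf.items():
        boost = memory_boost.get(doc_id, 0.0)
        rows.append({
            "doc_id": doc_id,
            "rrf_score": score,
            "memory_boost": boost,
            "final_score": score + boost,
        })

    # Rank is assigned only after sorting by final_score
    rows.sort(key=lambda row: row["final_score"], reverse=True)
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows
```

For example, with result lists `[["a", "b"], ["b", "c"]]` and a boost of 0.05 on `"c"`, document `"b"` outranks `"a"` because it appears in both lists, and `"c"` moves to the top because of its memory boost.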
Phase 6: Advanced Context-Aware Search
6.1 Context-Aware RRF Implementation
Objective: Incorporate episodic memory (when, who, context) into ranking weights
Background: RRF currently gives the BM25 and vector result lists equal weight and ignores who is searching. Taking user context into account (for example, that the user belongs to Project A) allows documents relevant to that context to be ranked higher.
Implementation Specification:
from typing import Any, Dict, List


class ContextAwareRRF:
    """Context-aware RRF scoring"""

    def __init__(self, user_context: Dict[str, Any]):
        self.project_id = user_context.get("project_id")
        self.department = user_context.get("department")
        self.timestamp = user_context.get("timestamp")
        self.search_history = user_context.get("search_history", [])

    def compute_context_weight(self, doc: Dict) -> float:
        """
        Compute context weight based on document metadata

        Returns: Weight multiplier 0.5 ~ 2.0
        - High relevance: 2.0
        - Neutral: 1.0
        - Low relevance: 0.5
        """
        weight = 1.0

        # Project match (skipped when the user has no project, so None never matches None)
        if self.project_id is not None and doc.get("project_id") == self.project_id:
            weight *= 1.5

        # Department match
        if self.department is not None and doc.get("department") == self.department:
            weight *= 1.2

        # Search history relevance: one boost per recent query found in the source
        source = doc.get("source") or ""
        for prev_query in self.search_history[-5:]:
            if prev_query and prev_query in source:
                weight *= 1.1

        # Temporal recency (skipped when either timestamp is missing)
        updated_at = doc.get("updated_at")
        if self.timestamp is not None and updated_at is not None:
            doc_age_days = (self.timestamp - updated_at).days
            recency_penalty = max(0.5, 1.0 - (doc_age_days / 365) * 0.3)
            weight *= recency_penalty

        # Clamp to the documented 0.5 ~ 2.0 range
        return max(0.5, min(weight, 2.0))

    def reciprocal_rank_fusion_with_context(
        self,
        search_results_lists: List[List[Dict]],
        k: int = 60
    ) -> List[str]:
        """
        RRF with context weighting
        """
        fused_scores: Dict[str, float] = {}

        for results in search_results_lists:
            for i, result in enumerate(results):
                doc_id = result["id"]
                base_score = 1 / (k + i + 1)
                context_weight = self.compute_context_weight(result)
                weighted_score = base_score * context_weight
                fused_scores[doc_id] = fused_scores.get(doc_id, 0) + weighted_score

        reranked = sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
        return [doc_id for doc_id, _ in reranked]
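To see how the multipliers in the specification combine, the weighting rules can be restated as a standalone function and evaluated for a few cases. The function name and its boolean/count parameters are hypothetical simplifications; the constants (1.5, 1.2, 1.1, the 0.3-per-year recency decay, and the 2.0 cap) are taken from the specification above.

```python
def context_weight(
    project_match: bool,
    department_match: bool,
    history_hits: int,
    doc_age_days: int,
) -> float:
    """Standalone restatement of the context weight arithmetic."""
    weight = 1.0
    if project_match:
        weight *= 1.5
    if department_match:
        weight *= 1.2
    weight *= 1.1 ** history_hits  # one 1.1x boost per matching recent query
    weight *= max(0.5, 1.0 - (doc_age_days / 365) * 0.3)  # recency penalty
    return min(weight, 2.0)


if __name__ == "__main__":
    # Fresh document matching project and department: 1.5 * 1.2
    print(round(context_weight(True, True, 0, 0), 3))
    # Two history hits push the product past the cap
    print(round(context_weight(True, True, 2, 0), 3))
    # One-year-old document with no matches: only the recency penalty applies
    print(round(context_weight(False, False, 0, 365), 3))
```

One consequence worth noting is that a project match, a department match, and two history hits already exceed the 2.0 cap, so further history hits have no effect for such documents.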
Timeline: starts 3-4 weeks after Phase 5; 2-3 weeks of implementation
Expected Impact:
- NDCG@10: 0.75 → 0.82 (+9%)
- User satisfaction: 85% → 92%
Phase 7: Advanced Named Entity Recognition Fine-tuning
7.1 Domain-Specific NER Fine-tuning
Objective: Extract internal domain entities (org, department, project IDs) with 95%+ accuracy
Background: Standard NER models have insufficient accuracy (80-85%) for internal entities like project code "EV-2024-001" and department name "AI Systems Development Division".
Implementation Specification:
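from typing import Dict, List


class DomainSpecificNER:
    """Organization-specific NER"""

    def __init__(self, base_model: str = "tner/roberta-large-japanese-char-luw-ner"):
        self.base_model = base_model
        self.fine_tuned_model = None
        self.entity_dict = self._load_internal_entities()

    def _load_internal_entities(self) -> Dict[str, List[str]]:
        """Load internal entity dictionary"""
        return {
            "PROJECT_ID": ["EV-2024-001", "EV-2024-002", "SPIKE-00123"],
            "DEPARTMENT": ["AI Systems Dev", "Data Science", "Research Group"],
            "PRODUCT": ["EvoSpikeNet Pro", "EvoSpikeNet Core"],
            "PERSON": ["John Doe", "Jane Smith"]
        }

    def fine_tune_on_internal_data(self, training_data: List[Dict], num_epochs: int = 3):
        """
        Fine-tune on internal annotated data
        """
        # Imported lazily so dictionary matching works without transformers installed
        from transformers import AutoModelForTokenClassification

        model = AutoModelForTokenClassification.from_pretrained(self.base_model)
        # Fine-tuning process (Trainer setup and label mapping are not specified here)
        self.fine_tuned_model = model

    def _predict_with_model(self, text: str) -> List[Dict]:
        """Placeholder: model inference is not specified in this roadmap"""
        return []

    def _match_against_internal_dict(self, text: str) -> List[Dict]:
        """Find every occurrence of each dictionary entry in the text"""
        # Assumed entity shape: text, label, start, end, confidence
        matches = []
        for label, surfaces in self.entity_dict.items():
            for surface in surfaces:
                start = text.find(surface)
                while start != -1:
                    matches.append({"text": surface, "label": label, "start": start,
                                    "end": start + len(surface), "confidence": 1.0})
                    start = text.find(surface, start + 1)
        return matches

    def _merge_predictions(self, model_predictions: List[Dict], dict_matches: List[Dict]) -> List[Dict]:
        """Deduplicate by span and label, keeping the higher-confidence entry"""
        merged = {}
        for entity in model_predictions + dict_matches:
            key = (entity["start"], entity["end"], entity["label"])
            if key not in merged or entity["confidence"] > merged[key]["confidence"]:
                merged[key] = entity
        return sorted(merged.values(), key=lambda e: e["start"])

    def extract_entities_with_dict_matching(self, text: str) -> List[Dict]:
        """
        Ensemble model prediction + dictionary matching
        """
        # Model predictions
        model_predictions = self._predict_with_model(text)
        # Dictionary matching
        dict_matches = self._match_against_internal_dict(text)
        # Merge (deduplicate, combine confidence)
        merged = self._merge_predictions(model_predictions, dict_matches)
        return merged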
Timeline: starts 4 weeks after Phase 5; 3-4 weeks of implementation
Expected Impact:
- Entity Recall: 85% → 95%
- Entity Precision: 88% → 96%
Phase 8: Automated User Feedback Loop
8.1 Relevance Feedback & Self-Learning
Objective: Automatically improve model/rules from user "relevance" assessments
Background: Users rate the initial search results. Accumulating this feedback and automatically adjusting model parameters enables continuous accuracy improvements.
Implementation Specification:
import logging
from datetime import datetime
from typing import Dict


class FeedbackLoop:
    """User feedback collection and learning"""

    def __init__(self, rag_system, feedback_db):
        self.rag = rag_system
        self.db = feedback_db

    def record_feedback(
        self,
        query: str,
        doc_id: str,
        rating: int,
        user_id: str,
        timestamp: datetime
    ):
        """Record user feedback"""
        feedback = {
            "query": query,
            "doc_id": doc_id,
            "rating": rating,
            "user_id": user_id,
            "timestamp": timestamp
        }
        self.db.insert(feedback)

    def analyze_feedback_patterns(self, window_days: int = 30) -> Dict:
        """
        Extract problem patterns from recent feedback
        """
        feedbacks = self.db.query_recent(days=window_days)

        patterns = {
            "low_rated_queries": [],
            "false_negatives": [],  # not populated in this sketch
            "entity_misses": [],
            "variation_issues": []
        }

        for feedback in feedbacks:
            if feedback["rating"] <= 2:
                # Every low rating is recorded, then classified by query type
                patterns["low_rated_queries"].append(feedback)
                if self._is_entity_query(feedback["query"]):
                    patterns["entity_misses"].append(feedback)
                elif self._is_variation_query(feedback["query"]):
                    patterns["variation_issues"].append(feedback)

        return patterns

    def _is_entity_query(self, query: str) -> bool:
        """Placeholder: the classification rule is not specified in this roadmap"""
        return False

    def _is_variation_query(self, query: str) -> bool:
        """Placeholder: the classification rule is not specified in this roadmap"""
        return False

    def auto_adjust_parameters(self, patterns: Dict):
        """
        Auto-adjust parameters based on feedback patterns
        """
        if len(patterns["entity_misses"]) > 5:
            self.rag.entity_boost_weight *= 1.1
            logging.info("Auto-increased entity boost weight")

        if len(patterns["variation_issues"]) > 5:
            self.rag.query_expansion_enabled = True
            logging.info("Auto-enabled query expansion")
Timeline: starts 2 weeks after Phase 6; 2-3 weeks of implementation
Expected Impact:
- Monthly NDCG improvement: 0.5-1.0% (continuous)
- User satisfaction improves without manual tuning
Phase 9: Multilingual & Multi-Regional Support
9.1 Multilingual RAG Extension
Objective: Support English, Chinese, and other languages
Current Status:
- Japanese: Full support (Phase 1-5)
- English: Basic support (language detection only)
- Others: Not supported
Implementation Specification:
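from typing import Dict, List, Optional


class MultilingualRAG:
    """Multilingual RAG"""

    LANGUAGE_CONFIGS = {
        "ja": {
            "tokenizer": "sudachi",
            "stop_words": "ja_stop",
            "embedding_model": "paraphrase-multilingual-MiniLM-L12-v2",
            "ner_model": "tner/roberta-large-japanese-char-luw-ner"
        },
        "en": {
            "tokenizer": "english",
            "stop_words": "english",
            "embedding_model": "paraphrase-multilingual-MiniLM-L12-v2",
            "ner_model": "dslim/bert-base-NER"
        },
        "zh": {
            "tokenizer": "chinese",
            "stop_words": "chinese",
            "embedding_model": "paraphrase-multilingual-MiniLM-L12-v2",
            "ner_model": "uer/roberta-base-chinese-cluener"
        }
    }

    def __init__(self, rag_system):
        self.rag = rag_system
        self.supported_languages = set(self.LANGUAGE_CONFIGS)

    def retrieve_multilingual(self, query: str, languages: Optional[List[str]] = None) -> Dict:
        """
        Simultaneous search across multiple languages
        """
        # Default to every configured language when none are requested
        languages = languages or list(self.LANGUAGE_CONFIGS)
        # Detect once; the query language does not depend on the target language
        detected_lang = self._detect_language(query)

        results = {}
        for lang in languages:
            if lang not in self.supported_languages:
                continue
            if detected_lang == lang:
                search_query = query
            else:
                search_query = self._translate(query, detected_lang, lang)
            results[lang] = self.rag.retrieve(search_query, lang_config=self.LANGUAGE_CONFIGS[lang])

        return results

    def _detect_language(self, query: str) -> str:
        """Placeholder: language detection is not specified in this roadmap"""
        raise NotImplementedError

    def _translate(self, query: str, source_lang: str, target_lang: str) -> str:
        """Placeholder: the translation backend is not specified in this roadmap"""
        raise NotImplementedError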
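The specification relies on a language detection step that is not defined in this document. As a rough illustration of what a first-pass detector for the three configured languages could look like, here is a script-based heuristic using Unicode ranges. The function name `detect_language` and the heuristic itself are assumptions, not the planned implementation; a production system would more likely use a trained language-identification model.

```python
def detect_language(text: str) -> str:
    """Very rough script-based guess: returns 'ja', 'zh', or 'en'."""
    has_cjk = False
    for ch in text:
        code = ord(ch)
        if 0x3040 <= code <= 0x30FF:  # Hiragana and Katakana blocks
            return "ja"
        if 0x4E00 <= code <= 0x9FFF:  # CJK Unified Ideographs
            has_cjk = True
    # Ideographs without any kana are guessed as Chinese
    return "zh" if has_cjk else "en"
```

The main weakness of this heuristic is kanji-only Japanese queries (for example, a bare place name), which it classifies as Chinese. That matters here because a wrong guess would trigger an unnecessary translation step.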
Timeline: starts 3 weeks after Phase 7; 4-5 weeks of implementation
Expected Impact:
- Global support realization
- User base expansion
Phase 10: Real-time Updates & Streaming
10.1 Real-time Index Updates
Objective: Immediately reflect document additions/updates in indexes
Current State: batch updates (lag of hours to days)
After Improvement: real-time updates (seconds)
Implementation Specification:
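import asyncio
import logging


class RealtimeRAGIndexer:
    """Real-time index updates"""

    def __init__(self, milvus_client, es_client, max_retries: int = 3):
        self.milvus = milvus_client
        self.es = es_client
        self.max_retries = max_retries
        self.update_queue = asyncio.Queue()

    async def process_updates(self):
        """
        Asynchronously process documents from queue
        """
        # _generate_embedding, _update_milvus and _update_elasticsearch are not shown in this roadmap
        while True:
            doc = await self.update_queue.get()
            try:
                embedding = self._generate_embedding(doc["text"])
                await self._update_milvus(doc["id"], embedding, doc)
                await self._update_elasticsearch(doc["id"], doc)
                logging.info(f"Document {doc['id']} updated in real-time")
            except Exception as e:
                logging.error(f"Failed to update document {doc.get('id')}: {e}")
                # Re-queue a bounded number of times so a bad document cannot loop forever
                attempts = doc.get("_attempts", 0) + 1
                if attempts < self.max_retries:
                    await self.update_queue.put({**doc, "_attempts": attempts})
            finally:
                self.update_queue.task_done()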
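Retrying failed updates by putting the document straight back on the queue can spin indefinitely on a document that always fails. A common alternative is to retry a bounded number of times with backoff and then set the document aside. The sketch below illustrates that pattern with plain asyncio; the names `index_worker`, `write`, and `dead_letters`, the retry count, and the delays are all hypothetical, and the `write` callable stands in for the Milvus and Elasticsearch updates.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def index_worker(
    queue: "asyncio.Queue[Dict]",
    write: Callable[[Dict], Awaitable[None]],
    dead_letters: List[Dict],
    max_attempts: int = 3,
    base_delay: float = 0.01,
) -> None:
    """Drain the queue, retrying failed writes with exponential backoff."""
    while True:
        doc = await queue.get()
        try:
            for attempt in range(1, max_attempts + 1):
                try:
                    await write(doc)
                    break
                except Exception:
                    if attempt == max_attempts:
                        dead_letters.append(doc)  # give up; keep for inspection
                    else:
                        await asyncio.sleep(base_delay * 2 ** (attempt - 1))
        finally:
            queue.task_done()


async def demo() -> tuple:
    queue: asyncio.Queue = asyncio.Queue()
    written: List[str] = []
    dead: List[Dict] = []
    remaining_failures = {"flaky": 2}  # "flaky" fails twice, then succeeds

    async def write(doc: Dict) -> None:
        if remaining_failures.get(doc["id"], 0) > 0:
            remaining_failures[doc["id"]] -= 1
            raise RuntimeError("transient failure")
        if doc["id"] == "broken":
            raise RuntimeError("permanent failure")
        written.append(doc["id"])

    worker = asyncio.create_task(index_worker(queue, write, dead))
    for doc_id in ("ok", "flaky", "broken"):
        queue.put_nowait({"id": doc_id})
    await queue.join()  # returns once every queued document is settled
    worker.cancel()
    return written, [d["id"] for d in dead]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Calling `task_done()` in a `finally` block is what lets `queue.join()` return even when a document ends up in the dead-letter list.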
Timeline: starts 2 weeks after Phase 8; 3-4 weeks of implementation
Expected Impact:
- Information freshness improvement
- Enhanced user experience
Phase 11: Explainable RAG
11.1 Search Result Rationale Display
Objective: Explain to users "why this document appeared"
Implementation Example:
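from typing import Dict, List


class ExplainableRAG:
    """Explainable search results"""

    def __init__(self, rag_system):
        self.rag = rag_system

    def retrieve_with_explanation(self, query: str, top_k: int = 5) -> List[Dict]:
        """
        Return search results with explanations
        """
        results = []
        docs, debug_info = self.rag.retrieve(query, return_debug_info=True)

        # Look up per-retriever results by document id: a document's fused rank
        # generally differs from its position in either individual result list.
        # Assumes each debug entry carries the same "id" field as the documents.
        keyword_by_id = {r["id"]: r for r in debug_info["keyword_results"]}
        vector_by_id = {r["id"]: r for r in debug_info["vector_results"]}

        for i, doc in enumerate(docs[:top_k]):
            reasons = []

            keyword_hit = keyword_by_id.get(doc["id"])
            if keyword_hit is not None:
                reasons.append({
                    "type": "keyword_match",
                    "matched_terms": keyword_hit["matched_keywords"],
                    "score": keyword_hit["score"]
                })

            vector_hit = vector_by_id.get(doc["id"])
            if vector_hit is not None:
                reasons.append({
                    "type": "semantic_similarity",
                    "similarity": vector_hit["score"],
                    "explanation": "Small semantic distance to query"
                })

            reasons.append({
                "type": "rrf_fusion",
                "combined_score": debug_info["rrf_scores"][doc["id"]]
            })

            results.append({"document": doc, "rank": i + 1, "reasons": reasons})

        return results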
Timeline: starts 1 week after Phase 9; 2 weeks of implementation
Expected Impact:
- Increased user trust
- Improved debugging/optimization efficiency
Milestone Overview
| Phase | Name | Start | Duration | Main Deliverable |
|---|---|---|---|---|
| 1 | Preparation | Immediate | 2-3 days | Test set frozen |
| 2 | KW Optimization | Week 1 | 2-3 days | MRR 0.7+ |
| 3 | Semantic Optimization | Week 2 | 3-4 days | Recall 0.8+ |
| 4 | Integration & Tuning | Week 3 | 2-3 days | NDCG 0.75+ |
| 5 | Operations & Monitoring | Week 4 | Ongoing | Production ready |
| 6 | Context-aware RRF | Week 7 | 2-3 weeks | NDCG 0.82+ |
| 7 | Domain NER | Week 10 | 3-4 weeks | Entity Recall 95%+ |
| 8 | Feedback Loop | Week 13 | 2-3 weeks | Auto-improvement |
| 9 | Multilingual | Week 16 | 4-5 weeks | Global Ready |
| 10 | Real-time Updates | Week 21 | 3-4 weeks | Second-level Updates |
| 11 | Explainability | Week 25 | 2 weeks | Explainable RAG |
Resource Planning
Phase 1-5 (Required)
| Role | FTE | Duration |
|---|---|---|
| ML Engineer | 1.5 | 2 weeks |
| Backend Engineer | 1.5 | 2 weeks |
| QA / Testing | 0.5 | 2 weeks |
| Total | 3.5 | 2 weeks |
Phase 6-11 (Extension / Optional)
| Role | FTE | Duration |
|---|---|---|
| ML Engineer | 1.0 | 8 weeks |
| Backend Engineer | 1.0 | 8 weeks |
| DevOps | 0.5 | 8 weeks |
| Total | 2.5 | 8 weeks |
Success Metrics
Phase 1-5 Goals
| Metric | Baseline | Target | Deadline |
|---|---|---|---|
| Variation MRR | 0.42 | 0.70 | Week 3 |
| Entity Recall | 0.35 | 0.80 | Week 4 |
| Overall NDCG | 0.55 | 0.75 | Week 4 |
| User Satisfaction | 78% | 90% | Week 5 |
Phase 6-11 Goals
| Metric | Target | Deadline |
|---|---|---|
| Context-aware NDCG | 0.82 | Week 8 |
| Entity Precision | 0.96 | Week 11 |
| Self-learning NDCG improvement | +0.5-1.0%/month | Week 13 |
| Global Support | 3 languages | Week 21 |
Dependencies & Constraints
External Dependencies
- ✅ Sudachi / TNER / Elasticsearch - Available
- ⚠️ Internal terminology dictionary - Must be built in Phase 1
- ⚠️ User feedback mechanism - Implemented in Phase 5
Technical Constraints
- GPU Memory: 8GB+ recommended for NER model execution
- Elasticsearch Storage: 2x capacity needed during index rebuild
- Milvus Credits: Additional allocation for large-scale indexing
Risk Management
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Sudachi dependency issues | Medium | High | Prepare MeCab alternative |
| NER model accuracy shortfall | Low | Medium | Fine-tuning on internal data |
| Index rebuild time overrun | Medium | Medium | Incremental indexing strategy |
| Insufficient user feedback | Medium | Medium | Introduce incentive mechanism |
Document Version: 1.0
Status: Ready for Implementation
Owner: RAG Development Team
Last Updated: 2026-05-23