
Feature 40: Geographically distributed node management system

Author: Masahiro Aoki

Implementation date: February 20, 2026 Version: 1.0.0 Status: ✅ Implemented

Overview

EvoSpikeNet's Geographically Distributed Node Management System (Feature 40) manages globally distributed nodes by region, providing automatic failover in the event of failure and cross-region latency optimization.

Current status: The basic functions and API are working, but some modules are still marked "🟡 partial implementation". Unit, integration, and performance testing is complete. Documentation links are being prepared, and a final completeness assessment is pending additional testing.


Architecture

┌─────────────────────────────────────────────────────────┐
│                    GeoNodeManager                        │
│                                                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐ │
│  │  GeoRegion  │  │   GeoNode   │  │  LatencyMatrix  │ │
│  │  (5 default)│  │  (per node) │  │ (N×N regions)   │ │
│  └─────────────┘  └─────────────┘  └─────────────────┘ │
│                                                         │
│  ┌──────────────────────────────────────────────────┐  │
│  │  Background Health Poller (30s interval)          │  │
│  │  → RegionStatus update → Failover trigger         │  │
│  └──────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
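The background poller in the diagram can be pictured as a daemon thread that wakes at a fixed interval, checks each region, and hands failures to a callback. The following is a minimal sketch under stated assumptions, not EvoSpikeNet's implementation: `HealthPoller`, `check`, and `on_failure` are hypothetical names, and the real poller's interface may differ.

```python
import threading
from typing import Callable, Iterable, List


class HealthPoller:
    """Hypothetical stand-in for the background health poller."""

    def __init__(
        self,
        region_ids: Iterable[str],
        check: Callable[[str], bool],        # returns True when the region is healthy
        on_failure: Callable[[str], None],   # e.g. mark OFFLINE and trigger failover
        interval_seconds: float = 30.0,      # matches the 30 s interval in the diagram
    ) -> None:
        self._region_ids = list(region_ids)
        self._check = check
        self._on_failure = on_failure
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def poll_once(self) -> List[str]:
        """Check every region once and return the IDs that failed."""
        failed = [rid for rid in self._region_ids if not self._check(rid)]
        for rid in failed:
            self._on_failure(rid)
        return failed

    def _run(self) -> None:
        # Event.wait returns True once stop() has been called, which ends the loop.
        while not self._stop.wait(self._interval):
            self.poll_once()

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()
```

Waiting on a `threading.Event` rather than calling `time.sleep` lets `stop()` interrupt the wait immediately instead of blocking for up to a full interval.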

Default registered regions

GeoNodeManager automatically registers the following 5 regions at startup.

| Region ID | Display name | Latitude | Longitude | Priority |
| --- | --- | --- | --- | --- |
| ap-northeast-1 | Asia Pacific (Tokyo) | 35.68 | 139.69 | 1 |
| us-east-1 | US East (N. Virginia) | 39.04 | -77.49 | 2 |
| us-west-2 | US West (Oregon) | 45.52 | -122.68 | 3 |
| eu-west-1 | Europe (Ireland) | 53.33 | -6.25 | 4 |
| ap-southeast-1 | Asia Pacific (Singapore) | 1.35 | 103.82 | 5 |

Latency calculation

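Inter-region latency is estimated from the great-circle distance between region coordinates, computed with the haversine formula, where \(\phi\) is latitude and \(\lambda\) is longitude (both in radians) and \(r\) is the Earth's radius: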

\[d = 2r \cdot \arcsin\left(\sqrt{\sin^2\left(\frac{\phi_2 - \phi_1}{2}\right) + \cos\phi_1 \cos\phi_2 \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)\]
latency_ms = distance_km / 200.0 * 1000 + jitter_ms

Tokyo → Virginia: approximately 145 ms (approximately 1/3 of the circumference of the Earth)
Tokyo → Singapore: approximately 32 ms
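The distance step of the calculation can be sketched in a few lines of Python. This covers only the haversine distance, not the conversion to milliseconds or the jitter term. `haversine_km` is a hypothetical helper, and the mean Earth radius of 6371 km is an assumed value for \(r\). With the coordinates from the default region table, the distances come out on the order of 10,900 km (Tokyo to N. Virginia) and 5,300 km (Tokyo to Singapore).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed value for r


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Coordinates taken from the default region table.
tokyo = (35.68, 139.69)
virginia = (39.04, -77.49)
singapore = (1.35, 103.82)

print(f"Tokyo -> Virginia:  {haversine_km(*tokyo, *virginia):.0f} km")
print(f"Tokyo -> Singapore: {haversine_km(*tokyo, *singapore):.0f} km")
```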


Core components

GeoRegion

from evospikenet.geo_node_manager import GeoRegion, RegionStatus

region = GeoRegion(
    region_id="ap-northeast-1",
    display_name="Asia Pacific (Tokyo)",
    provider="aws",
    latitude=35.6762,
    longitude=139.6503,
    priority=1,
)
print(region.status)  # RegionStatus.ONLINE

RegionStatus:

| Value | Description |
| --- | --- |
| online | Operating normally |
| offline | Offline |
| degraded | Performance is degraded |
| maintenance | Under maintenance |
| unknown | Status unknown |

GeoNode

from evospikenet.geo_node_manager import GeoNode, NodeGeoStatus

node = GeoNode(
    node_id="prod-gpu-node-01",
    region_id="ap-northeast-1",
    endpoint="10.0.1.100:8000",
    node_type="gpu",   # general / gpu / scheduler / storage
)

GeoNodeManager

# Assumes geo_node_manager is an initialized GeoNodeManager instance.

# Region operations
geo_node_manager.add_region(region)
geo_node_manager.get_region("ap-northeast-1")
geo_node_manager.remove_region("ap-northeast-1")
regions = geo_node_manager.list_regions()

# Node operations
geo_node_manager.register_node(node)
geo_node_manager.deregister_node("prod-gpu-node-01")
nodes = geo_node_manager.list_nodes(region_id="ap-northeast-1")

# active region
active = geo_node_manager.get_active_region()
geo_node_manager.set_active_region("us-east-1")

# failover
event = geo_node_manager.trigger_failover(
    from_region="ap-northeast-1",
    to_region="us-east-1",
    reason="region_health_check_failed",
    triggered_by="health_monitor",
)

# latency matrix
matrix = geo_node_manager.get_latency_matrix()
# matrix["ap-northeast-1"]["us-east-1"] == 145.3

# replication group
source, targets, latencies = geo_node_manager.get_replication_group("ap-northeast-1")
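One plausible way to derive a replication group, consistent with the `/api/geo/replication-group` response example in this document, is to pick the regions with the lowest latency from the source. The sketch below illustrates that rule. `pick_replication_group` is a hypothetical function, and the nearest-by-latency rule is an assumption rather than the documented behavior of `get_replication_group`.

```python
from typing import Dict, List, Tuple

LatencyMatrix = Dict[str, Dict[str, float]]


def pick_replication_group(
    matrix: LatencyMatrix, source: str, group_size: int = 2
) -> Tuple[str, List[str], Dict[str, float]]:
    """Return (source, targets, latencies) using the nearest regions by latency."""
    candidates = {rid: ms for rid, ms in matrix[source].items() if rid != source}
    targets = sorted(candidates, key=candidates.get)[:group_size]
    return source, targets, {rid: candidates[rid] for rid in targets}


# Latency figures taken from the latency-matrix response example in this document.
matrix = {
    "ap-northeast-1": {
        "us-east-1": 145.3,
        "us-west-2": 110.2,
        "eu-west-1": 230.1,
        "ap-southeast-1": 32.4,
    }
}

source, targets, latencies = pick_replication_group(matrix, "ap-northeast-1")
print(targets)    # ['ap-southeast-1', 'us-west-2']
print(latencies)  # {'ap-southeast-1': 32.4, 'us-west-2': 110.2}
```

Here `group_size` plays the role of `replication_group_size` from the configuration file.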

REST API

### GET `/api/geo/status`

**Response example:**
```json
{
  "active_region": "ap-northeast-1",
  "total_regions": 5,
  "total_nodes": 12,
  "online_regions": 5,
  "regions": {
    "ap-northeast-1": { "status": "online", "node_count": 4, "priority": 1 }
  }
}
```

### GET `/api/geo/regions`

Returns a list of all regions.

### POST `/api/geo/regions`

Register a new region.

**Request body:**
```json
{
  "region_id": "ap-northeast-3",
  "display_name": "Asia Pacific (Osaka)",
  "provider": "aws",
  "location": { "lat": 34.69, "lon": 135.50 },
  "priority": 6
}
```

### GET `/api/geo/regions/{region_id}`

### DELETE `/api/geo/regions/{region_id}`

### GET `/api/geo/nodes`

**Query parameters:** `region_id`, `status`

### POST `/api/geo/nodes`

**Request body:**
```json
{
  "node_id": "prod-gpu-node-01",
  "region_id": "ap-northeast-1",
  "endpoint": "10.0.1.100:8000",
  "node_type": "gpu"
}
```

### DELETE `/api/geo/nodes/{node_id}`

### POST `/api/geo/failover`

Perform a manual failover.

**Request body:**
```json
{
  "from_region": "ap-northeast-1",
  "to_region": "us-east-1",
  "reason": "region_unavailable",
  "triggered_by": "operator"
}
```

**Response:**
```json
{
  "status": "failover_executed",
  "event": {
    "event_id": "550e8400-...",
    "from_region": "ap-northeast-1",
    "to_region": "us-east-1",
    "timestamp": "2026-02-20T10:30:00Z",
    "reason": "region_unavailable",
    "triggered_by": "operator",
    "success": true
  }
}
```
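A manual failover request can be assembled from Python with the standard library alone. This is a sketch under stated assumptions: `BASE_URL` and `build_failover_request` are hypothetical, and the host and port depend on where the API is deployed. The request is built but not sent, since sending requires a running server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical; replace with your API host


def build_failover_request(
    from_region: str,
    to_region: str,
    reason: str,
    triggered_by: str = "operator",
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/geo/failover request."""
    body = {
        "from_region": from_region,
        "to_region": to_region,
        "reason": reason,
        "triggered_by": triggered_by,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/geo/failover",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_failover_request("ap-northeast-1", "us-east-1", "region_unavailable")

# Sending requires a running server, so it is left commented out:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     result = json.load(resp)
#     print(result["status"], result["event"]["success"])
```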

### GET `/api/geo/failover/history`

**Query parameters:** `limit` (default: 20)

### GET `/api/geo/latency-matrix`

**Response example:**
```json
{
  "matrix": {
    "ap-northeast-1": {
      "us-east-1": 145.3,
      "us-west-2": 110.2,
      "eu-west-1": 230.1,
      "ap-southeast-1": 32.4
    },
    "us-east-1": {
      "ap-northeast-1": 145.3,
      ...
    }
  }
}
```

### GET `/api/geo/active-region`

### PUT `/api/geo/active-region`

**Request body:**
```json
{ "region_id": "us-east-1" }
```

### GET `/api/geo/replication-group/{region_id}`

**Response example:**
```json
{
  "source_region": "ap-northeast-1",
  "replication_targets": ["ap-southeast-1", "us-west-2"],
  "latencies_ms": {
    "ap-southeast-1": 32.4,
    "us-west-2": 110.2
  }
}
```


Failover flow

1. Health check (every 30 seconds)
        │
        ▼ failure detected
2. RegionStatus → OFFLINE
        │
        ▼
3. Select target_region
   (status == ONLINE and highest priority)
        │
        ▼
4. Switch active_region
        │
        ▼
5. Record FailoverEvent
        │
        ▼
6. Nodes in the active region handle new requests
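Steps 2 to 5 of the flow above can be sketched with plain dataclasses. This is an illustration, not the actual `GeoNodeManager` code: `Status`, `Region`, `Event`, and `auto_failover` are hypothetical stand-ins for `RegionStatus`, `GeoRegion`, `FailoverEvent`, and the manager's internal logic, and the sketch assumes that a smaller `priority` number means a higher priority.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):  # hypothetical stand-in for RegionStatus
    ONLINE = "online"
    OFFLINE = "offline"


@dataclass
class Region:  # hypothetical stand-in for GeoRegion
    region_id: str
    priority: int  # assumption: a smaller number means a higher priority
    status: Status = Status.ONLINE


@dataclass
class Event:  # hypothetical stand-in for FailoverEvent
    from_region: str
    to_region: str
    reason: str
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def auto_failover(
    regions: Dict[str, Region], active: str, history: List[Event]
) -> Optional[str]:
    """Mark the active region OFFLINE, pick a target, record the event."""
    regions[active].status = Status.OFFLINE                         # step 2
    online = [r for r in regions.values() if r.status is Status.ONLINE]
    if not online:
        return None  # no healthy region to fail over to
    target = min(online, key=lambda r: r.priority)                  # step 3
    history.append(
        Event(active, target.region_id, "region_health_check_failed")  # step 5
    )
    return target.region_id                                         # step 4: new active region


regions = {
    "ap-northeast-1": Region("ap-northeast-1", 1),
    "us-east-1": Region("us-east-1", 2),
    "us-west-2": Region("us-west-2", 3),
}
history: List[Event] = []
new_active = auto_failover(regions, "ap-northeast-1", history)
print(new_active)  # us-east-1
```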

Configuration

config/settings.yaml:

geo_node_manager:
  enabled: true
  health_check_interval_seconds: 30
  state_file: "data/geo/geo_state.json"
  default_regions:
    - id: "ap-northeast-1"
      priority: 1
  auto_failover: true
  replication_group_size: 2   # Number of destination regions

Performance indicators

| Metric | Target value | Test reference |
| --- | --- | --- |
| Latency matrix (10 regions) | < 50 ms | test_latency_matrix_10_regions |
| Latency matrix (50 regions) | < 500 ms | test_latency_matrix_50_regions |
| Automatic failover completion time | < 30 seconds | System test |

Tests

# Unit tests
pytest tests/unit/test_geo_node_manager.py -v

# Integration test
pytest tests/integration/test_features_36_39_40_integration.py::TestGeoNodeEndpoints -v

# System testing (multi-region lifecycle)
pytest tests/system/test_features_36_39_40_system.py::TestMultiRegionNodeLifecycle -v

# Performance tests
pytest tests/performance/test_features_36_39_40_performance.py::TestGeoNodeManagerPerformance -v