
Edge Phase 1-5 implementation report

[!NOTE] For the latest implementation status, please refer to Functional Implementation Status (Remaining Functionality).

Last updated: 2026-04-09

Overview:

  • Condensed Phases 1-5 of the Edge implementation plan into a locally re-executable format.
  • The execution pipeline, representative sample generation, conversion difference verification, device runbook generation, and evaluation report generation are already implemented.
  • Measurements on real Raspberry Pi / Android / iPhone hardware have not been carried out because the devices are unavailable, but the execution procedure and deliverables have been prepared.

Implemented content

  Phase 1: Preparation

  • Added scripts/device/generate_rep_samples.py to enable automatic generation of .npy samples for quantization calibration.
  • scripts/device/run_edge_phase_pipeline.py now automatically checks for required dependencies and saves the result as JSON.
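A representative-sample generator of the kind described above can be sketched as follows. This is a hypothetical illustration, not the actual contents of generate_rep_samples.py; the sample count, tensor shape, and file-name pattern are assumptions.

```python
from pathlib import Path

import numpy as np


def generate_rep_samples(out_dir, count=8, shape=(1, 3, 32, 32), seed=0):
    """Write `count` random .npy samples usable for quantization calibration.

    Shape and count are illustrative defaults, not the script's real ones.
    """
    rng = np.random.default_rng(seed)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        path = out / f"rep_{i:03d}.npy"
        np.save(path, rng.standard_normal(shape).astype(np.float32))
        paths.append(path)
    return paths
```

In practice the samples would be drawn from the real input distribution rather than random noise, since quantization ranges are calibrated from them.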

  Phase 2: Local PoC

  • scripts/device/run_edge_phase_pipeline.py starts edge_server.py and automates latency measurement over 100 requests via mobile_client_sim.py.
  • In local execution, steady-state latency was about 1 ms; the maximum, including cold start, was about 4.6 s.
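The latency measurement described above can be sketched with a minimal timing loop. This is an assumption about the approach, not the code of mobile_client_sim.py; `request_fn` stands in for one HTTP round trip to edge_server.py.

```python
import statistics
import time


def measure_latency(request_fn, n=100):
    """Time n sequential calls and summarize in milliseconds.

    In the real pipeline, request_fn would issue one request to the
    locally running edge server; here it is any zero-argument callable.
    """
    samples_ms = []
    for _ in range(n):
        t0 = time.perf_counter()
        request_fn()
        samples_ms.append((time.perf_counter() - t0) * 1000.0)
    samples_ms.sort()
    return {
        "avg_ms": statistics.fmean(samples_ms),
        "p50_ms": samples_ms[int(0.50 * (n - 1))],
        "p95_ms": samples_ms[int(0.95 * (n - 1))],
        "max_ms": samples_ms[-1],
    }
```

Note how a single cold-start outlier can dominate the average while leaving p50/p95 near the steady-state value, which matches the figures reported below (average ~47 ms vs. p50 ~1 ms).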

  Phase 3: Conversion and quantization verification

  • Modified scripts/device/convert_torchscript.py to output eager / TorchScript / ONNX artifacts.
  • Fixed a broken execution flow in scripts/device/ios_convert_coreml.py.
  • Made scripts/device/android_convert_tflite.py compatible with eager / TorchScript artifacts.
  • Added scripts/device/validate_converted_model.py so that output differences between eager and TorchScript models can be compared by L2 distance.
  • In this local run, the TorchScript difference was 0.0 on average and 0.0 at maximum.
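The L2 comparison can be sketched as below. This is a hedged illustration of the idea behind validate_converted_model.py, not its actual code: it takes pre-computed outputs from both model variants and reports mean and max per-sample L2 distance.

```python
import numpy as np


def l2_diffs(eager_outputs, converted_outputs):
    """Per-sample L2 distance between paired lists of output arrays.

    eager_outputs / converted_outputs are lists of equally shaped arrays,
    one pair per representative input sample.
    """
    dists = [
        float(np.linalg.norm(a.astype(np.float64) - b.astype(np.float64)))
        for a, b in zip(eager_outputs, converted_outputs)
    ]
    return {"mean_l2": float(np.mean(dists)), "max_l2": float(np.max(dists))}
```

A mean and max of exactly 0.0, as reported for TorchScript here, indicates bitwise-identical outputs on the tested samples.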

  Phase 4: Preparation for on-device verification

  • scripts/device/run_edge_phase_pipeline.py automatically generates phase4_device_runbook.md, fixing the execution procedure for Raspberry Pi / Android / iPhone.
  • Power measurement on real devices is assumed to use an external USB power meter, Android Studio Profiler, and Xcode Instruments.
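Runbook generation of this kind can be sketched as a small Markdown writer. The device names and steps below are placeholders for illustration; the real phase4_device_runbook.md content is produced by the pipeline script.

```python
from pathlib import Path

# Placeholder steps; the actual runbook content comes from the pipeline.
DEVICE_STEPS = {
    "raspberry_pi": ["Copy artifacts to the device", "Run edge_server.py", "Record USB power-meter readings"],
    "android": ["Install the benchmark build", "Profile with Android Studio Profiler"],
    "iphone": ["Build with Xcode", "Profile with Xcode Instruments"],
}


def write_runbook(path):
    """Render one numbered checklist per device into a Markdown file."""
    lines = ["# Phase 4 device runbook", ""]
    for device, steps in DEVICE_STEPS.items():
        lines.append(f"## {device}")
        lines.extend(f"{i}. {step}" for i, step in enumerate(steps, 1))
        lines.append("")
    Path(path).write_text("\n".join(lines))
    return path
```

Generating the runbook from the same script that runs the local phases keeps the documented procedure in sync with what the pipeline actually executed.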

  Phase 5: Evaluation and decision making

  • Automatically generates a Markdown report from the local execution results and provides a current recommendation.
  • The current judgment is: "SDK-on-edge is the first choice for production models that include SNN-specific processing; conversion is also a good choice for simple dense models."
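The judgment above can be read as a simple decision rule. The following is a toy sketch of that reasoning, not the pipeline's actual logic; the threshold `l2_tol` and the `power_ok` flag are invented for illustration.

```python
def recommend_deployment(has_snn_ops, l2_max, power_ok=None, l2_tol=1e-3):
    """Toy decision rule mirroring the report's current judgment.

    - Models with SNN-specific ops stay on the SDK-on-edge path.
    - Dense models may switch to converted deployment if the conversion
      difference (and, once measured, power draw) is acceptable.
    """
    if has_snn_ops:
        return "sdk-on-edge"
    if l2_max <= l2_tol and (power_ok is None or power_ok):
        return "converted"
    return "sdk-on-edge"
```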

Current execution result

  • Run report: bench_output/edge_phase_runs/20260409-130535/edge_phase1_5_report.md
  • Validation JSON: bench_output/edge_phase_runs/20260409-130535/phase3_validation.json
  • Latency summary: bench_output/edge_phase_runs/20260409-130535/phase2_latency_summary.json
  • Actual device runbook: bench_output/edge_phase_runs/20260409-130535/phase4_device_runbook.md

Main figures

  • Dependencies:
      • Available: torch, requests, psutil, fastapi, uvicorn, zenoh
      • Not installed: onnx, onnx_tf, tensorflow, coremltools
  • Latency:
      • Average: approx. 46.97 ms
      • p50: approx. 1.02 ms
      • p95: approx. 1.26 ms
      • Maximum: approx. 4592.98 ms
  • Conversion difference:
      • TorchScript L2 average: 0.0
      • TorchScript L2 maximum: 0.0
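The available/missing dependency split above can be produced with a small import probe. This is a plausible sketch of the check that run_edge_phase_pipeline.py is described as performing, using only the standard library; it is not the pipeline's actual code.

```python
import importlib.util


def check_dependencies(names):
    """Split module names into those importable here and those missing."""
    available = [n for n in names if importlib.util.find_spec(n) is not None]
    missing = [n for n in names if n not in available]
    return {"available": available, "missing": missing}
```

The resulting dict can be dumped with `json.dump` to produce the phase-1 JSON artifact mentioned earlier.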

Unfinished items

  • TFLite real conversion
      • Reason: onnx, onnx_tf, and tensorflow are not installed.
  • CoreML real conversion
      • Reason: coremltools is not installed.
  • On-device power measurement
      • Reason: the hardware and instruments are not available in this environment.

Judgment

  • For now, we prioritize SDK-on-edge.
  • However, if the production model is a dense architecture, and both the post-conversion difference (TFLite / CoreML) and the measured power consumption fall within acceptable ranges, switching to converted deployment on Android / iPhone is worthwhile.

Next implementation candidates

  1. Install onnx, onnx-tf, tensorflow, coremltools and fully execute Phase 3
  2. Execute phase4_device_runbook.md on real devices and collect power, thermal, and long-term stability data
  3. Create a representative sample with the production model and rerun the same validation pipeline