
Edge Phase 1-5 implementation report

[!NOTE] For the latest implementation status, please refer to Functional Implementation Status (Remaining Functionality).

Last updated: 2026-04-09

Overview:

  • Condensed Phases 1-5 of the Edge implementation plan into a locally re-executable format.
  • The execution pipeline, representative sample generation, conversion difference verification, device runbook generation, and evaluation report generation are already implemented.
  • Measurements on real Raspberry Pi / Android / iPhone hardware have not been carried out because the devices are unavailable, but the execution procedure and deliverables have been prepared.

Implemented content

  Phase 1: Preparation

  • Added scripts/device/generate_rep_samples.py to enable automatic generation of .npy samples for quantization calibration.
  • scripts/device/run_edge_phase_pipeline.py now automatically checks for required dependencies and saves the result as JSON.
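A representative-sample generator of the kind described above can be sketched as follows. This is a hypothetical illustration, not the actual contents of generate_rep_samples.py; the sample count, tensor shape, and file-name pattern are assumptions.

```python
from pathlib import Path

import numpy as np


def generate_rep_samples(out_dir, count=8, shape=(1, 3, 32, 32), seed=0):
    """Write `count` random .npy samples usable for quantization calibration.

    Shape and count are illustrative defaults, not the script's real ones.
    """
    rng = np.random.default_rng(seed)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        path = out / f"rep_{i:03d}.npy"
        np.save(path, rng.standard_normal(shape).astype(np.float32))
        paths.append(path)
    return paths
```

In practice the samples would be drawn from the real input distribution rather than random noise, since quantization ranges are calibrated from them.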

  Phase 2: Local PoC

  • scripts/device/run_edge_phase_pipeline.py starts edge_server.py and automates latency measurement over 100 requests via mobile_client_sim.py.
  • In local execution, steady-state latency was about 1 ms; the maximum, including cold start, was about 4.6 s.
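The latency measurement described above can be sketched with a minimal timing loop. This is an assumption about the approach, not the code of mobile_client_sim.py; `request_fn` stands in for one HTTP round trip to edge_server.py.

```python
import statistics
import time


def measure_latency(request_fn, n=100):
    """Time n sequential calls and summarize in milliseconds.

    In the real pipeline, request_fn would issue one request to the
    locally running edge server; here it is any zero-argument callable.
    """
    samples_ms = []
    for _ in range(n):
        t0 = time.perf_counter()
        request_fn()
        samples_ms.append((time.perf_counter() - t0) * 1000.0)
    samples_ms.sort()
    return {
        "avg_ms": statistics.fmean(samples_ms),
        "p50_ms": samples_ms[int(0.50 * (n - 1))],
        "p95_ms": samples_ms[int(0.95 * (n - 1))],
        "max_ms": samples_ms[-1],
    }
```

Note how a single cold-start outlier can dominate the average while leaving p50/p95 near the steady-state value, which matches the figures reported below (average ~47 ms vs. p50 ~1 ms).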

  Phase 3: Conversion and quantization verification

  • Modified scripts/device/convert_torchscript.py to output eager / TorchScript / ONNX artifacts.
  • Fixed a broken execution flow in scripts/device/ios_convert_coreml.py.
  • Made scripts/device/android_convert_tflite.py compatible with eager / TorchScript artifacts.
  • Added scripts/device/validate_converted_model.py so that output differences between eager and TorchScript models can be compared by L2 distance.
  • In this local run, the TorchScript difference was 0.0 on average and 0.0 at maximum.
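The L2 comparison can be sketched as below. This is a hedged illustration of the idea behind validate_converted_model.py, not its actual code: it takes pre-computed outputs from both model variants and reports mean and max per-sample L2 distance.

```python
import numpy as np


def l2_diffs(eager_outputs, converted_outputs):
    """Per-sample L2 distance between paired lists of output arrays.

    eager_outputs / converted_outputs are lists of equally shaped arrays,
    one pair per representative input sample.
    """
    dists = [
        float(np.linalg.norm(a.astype(np.float64) - b.astype(np.float64)))
        for a, b in zip(eager_outputs, converted_outputs)
    ]
    return {"mean_l2": float(np.mean(dists)), "max_l2": float(np.max(dists))}
```

A mean and max of exactly 0.0, as reported for TorchScript here, indicates bitwise-identical outputs on the tested samples.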

  Phase 4: Preparation for on-device verification

  • scripts/device/run_edge_phase_pipeline.py automatically generates phase4_device_runbook.md, fixing the execution procedure for Raspberry Pi / Android / iPhone.
  • Power measurement on real devices is assumed to use an external USB power meter, Android Studio Profiler, and Xcode Instruments.
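Runbook generation of this kind can be sketched as a small Markdown writer. The device names and steps below are placeholders for illustration; the real phase4_device_runbook.md content is produced by the pipeline script.

```python
from pathlib import Path

# Placeholder steps; the actual runbook content comes from the pipeline.
DEVICE_STEPS = {
    "raspberry_pi": ["Copy artifacts to the device", "Run edge_server.py", "Record USB power-meter readings"],
    "android": ["Install the benchmark build", "Profile with Android Studio Profiler"],
    "iphone": ["Build with Xcode", "Profile with Xcode Instruments"],
}


def write_runbook(path):
    """Render one numbered checklist per device into a Markdown file."""
    lines = ["# Phase 4 device runbook", ""]
    for device, steps in DEVICE_STEPS.items():
        lines.append(f"## {device}")
        lines.extend(f"{i}. {step}" for i, step in enumerate(steps, 1))
        lines.append("")
    Path(path).write_text("\n".join(lines))
    return path
```

Generating the runbook from the same script that runs the local phases keeps the documented procedure in sync with what the pipeline actually executed.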

  Phase 5: Evaluation and decision making

  • Automatically generates a Markdown report from the local execution results and provides a current recommendation.
  • The current judgment is: "SDK-on-edge is the first choice for production models that include SNN-specific processing; conversion is also a good choice for simple dense models."
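The judgment above can be read as a simple decision rule. The following is a toy sketch of that reasoning, not the pipeline's actual logic; the threshold `l2_tol` and the `power_ok` flag are invented for illustration.

```python
def recommend_deployment(has_snn_ops, l2_max, power_ok=None, l2_tol=1e-3):
    """Toy decision rule mirroring the report's current judgment.

    - Models with SNN-specific ops stay on the SDK-on-edge path.
    - Dense models may switch to converted deployment if the conversion
      difference (and, once measured, power draw) is acceptable.
    """
    if has_snn_ops:
        return "sdk-on-edge"
    if l2_max <= l2_tol and (power_ok is None or power_ok):
        return "converted"
    return "sdk-on-edge"
```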

Current execution result

  • Run report: bench_output/edge_phase_runs/20260409-130535/edge_phase1_5_report.md
  • Validation JSON: bench_output/edge_phase_runs/20260409-130535/phase3_validation.json
  • Latency summary: bench_output/edge_phase_runs/20260409-130535/phase2_latency_summary.json
  • Actual device runbook: bench_output/edge_phase_runs/20260409-130535/phase4_device_runbook.md

Main figures

  • Dependencies:
      • Available: torch, requests, psutil, fastapi, uvicorn, zenoh
      • Not installed: onnx, onnx_tf, tensorflow, coremltools
  • Latency:
      • Average: approx. 46.97 ms
      • p50: approx. 1.02 ms
      • p95: approx. 1.26 ms
      • Maximum: approx. 4592.98 ms
  • Conversion difference:
      • TorchScript L2 average: 0.0
      • TorchScript L2 maximum: 0.0
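The available/missing dependency split above can be produced with a small import probe. This is a plausible sketch of the check that run_edge_phase_pipeline.py is described as performing, using only the standard library; it is not the pipeline's actual code.

```python
import importlib.util


def check_dependencies(names):
    """Split module names into those importable here and those missing."""
    available = [n for n in names if importlib.util.find_spec(n) is not None]
    missing = [n for n in names if n not in available]
    return {"available": available, "missing": missing}
```

The resulting dict can be dumped with `json.dump` to produce the phase-1 JSON artifact mentioned earlier.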

Unfinished items

  • TFLite real conversion
      • Reason: onnx, onnx_tf, and tensorflow are not installed.
  • CoreML real conversion
      • Reason: coremltools is not installed.
  • On-device power measurement
      • Reason: the hardware and instruments are not available in this environment.

Judgment

  • For now, we prioritize SDK-on-edge.
  • However, if the production model is a dense architecture, and both the post-conversion difference (TFLite / CoreML) and the measured power consumption fall within acceptable ranges, switching to converted deployment on Android / iPhone is worthwhile.

Next implementation candidates

  1. Install onnx, onnx-tf, tensorflow, coremltools and fully execute Phase 3
  2. Execute phase4_device_runbook.md on real devices and collect power, thermal, and long-term stability data
  3. Create a representative sample with the production model and rerun the same validation pipeline