Evaluation SDK design/implementation guide

[!NOTE] For the latest implementation status, please refer to Functional Implementation Status (Remaining Functionality).

Overview

EvoSpikeNet's evaluation system (Evaluator) is an interface for quantifying the performance of individuals and teams in optimization processes such as evolutionary algorithms and reinforcement learning.

  • BaseEvaluator is an abstract class; subclasses must implement evaluate_competitive (head-to-head evaluation of two individuals) and evaluate_team (evaluation of a whole team).
  • For production use, provide at least one concrete implementation such as DefaultEvaluator, and add further evaluation-strategy classes as the application requires.

Implementation requirements

  • All Evaluators should inherit from BaseEvaluator and implement two methods:
    • evaluate_competitive(genome1, genome2) -> Dict[str, float]
    • evaluate_team(team: List[Any]) -> Dict[str, Any]
  • The evaluation methods may contain any purpose-specific logic, e.g. brain instantiation, forward computation, and score calculation.
  • Providing multiple Evaluator implementations as SDK samples is recommended, so that evaluation strategies can be switched and extended.
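The contract above can be sketched as an abstract base class. Note that this is an illustrative sketch: the exact definition inside EvoSpikeNet's evospikenet.evaluators module may differ, but the two required method signatures match those listed above.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseEvaluator(ABC):
    """Abstract evaluation interface (illustrative sketch, not the
    verbatim EvoSpikeNet source)."""

    @abstractmethod
    def evaluate_competitive(self, genome1: Any, genome2: Any) -> Dict[str, float]:
        """Score a head-to-head match between two genomes."""

    @abstractmethod
    def evaluate_team(self, team: List[Any]) -> Dict[str, Any]:
        """Score a whole team of genomes."""
```

Because both methods are abstract, a subclass that omits either one cannot be instantiated, which enforces the "must implement two methods" requirement at runtime.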

Sample implementation

  • DefaultEvaluator: scores each genome by the L2 norm of its brain's forward output
  • CooperativeEvaluator: evaluates based on the average score of the whole team
  • CustomEvaluator: uses an arbitrary external metric or a more complex reward function
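A DefaultEvaluator along the lines described above might look as follows. This is a self-contained sketch: the `_forward` placeholder stands in for the real brain instantiation and forward pass, which the source does not spell out.

```python
import math
from typing import Any, Dict, List, Sequence


class DefaultEvaluator:
    """Sketch: score each genome by the L2 norm of its brain's
    forward output. `_forward` is a placeholder assumption; a real
    implementation would build the brain from the genome and run it."""

    def _forward(self, genome: Sequence[float]) -> Sequence[float]:
        # Placeholder: the genome itself stands in for the brain output.
        return genome

    def _score(self, genome: Sequence[float]) -> float:
        out = self._forward(genome)
        return math.sqrt(sum(x * x for x in out))  # L2 norm

    def evaluate_competitive(self, genome1, genome2) -> Dict[str, float]:
        s1, s2 = self._score(genome1), self._score(genome2)
        return {"score1": s1, "score2": s2, "winner": 1.0 if s1 >= s2 else 2.0}

    def evaluate_team(self, team: List[Sequence[float]]) -> Dict[str, Any]:
        scores = [self._score(g) for g in team]
        mean = sum(scores) / len(scores) if scores else 0.0
        return {"team_score": mean, "individual_scores": scores}
```

Returning both raw scores and the winner from evaluate_competitive keeps the result self-describing for downstream selection logic.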

Implementation example

```python
from evospikenet.evaluators import BaseEvaluator

class CooperativeEvaluator(BaseEvaluator):
    def evaluate_competitive(self, genome1, genome2):
        # Competitive evaluation (e.g. winner decided by total score)
        ...

    def evaluate_team(self, team):
        # Average score over the whole team
        scores = [self._score(g) for g in team]
        mean_score = sum(scores) / len(scores) if scores else 0.0
        return {"team_score": mean_score, "individual_scores": scores}

    def _score(self, genome):
        # Score calculation, e.g. via a brain forward pass
        ...
```
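To see the team-evaluation flow end to end, the stubs can be filled with a toy score. The summing `_score` below is a stand-in assumption for the real brain forward pass, and the class is redefined locally so the snippet runs without the SDK installed:

```python
class ToyCooperativeEvaluator:
    """Toy completion of the sketch above: _score simply sums the
    genome's values (a stand-in for a brain forward pass)."""

    def _score(self, genome):
        return float(sum(genome))

    def evaluate_team(self, team):
        scores = [self._score(g) for g in team]
        mean_score = sum(scores) / len(scores) if scores else 0.0
        return {"team_score": mean_score, "individual_scores": scores}


ev = ToyCooperativeEvaluator()
result = ev.evaluate_team([[1.0, 2.0], [3.0, 4.0]])
# team_score is the mean of the per-genome scores: (3.0 + 7.0) / 2 = 5.0
```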

Notes

  • A production evaluation system must not return only random numbers or dummy values; it must perform a meaningful score calculation.
  • SDK users can add or replace their own Evaluators according to their needs.