To evaluate and rank the performance of the participating methods, we first used 2D topology-based segmentation metrics, together with the pixel error (for the sake of metric comparison); each metric has its own updated leader board. However, a retrospective evaluation of the original challenge scoring system revealed that it was not sufficiently robust to variations in the width of neurite borders. After evaluating all of these metrics and their variants, we found that specially normalized versions of the Rand error and the Variation of Information best matched our qualitative judgments of segmentation quality:
- Foreground-restricted Rand scoring after border thinning: VRand (thinned)
- Foreground-restricted information-theoretic scoring after border thinning: VInfo (thinned)
We found empirically that, of these two popular metrics, VRand is more robust than VInfo, so the new leader board is sorted by VRand. All details about the new metrics can be found in our open-access challenge publication.
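The two new scores can be sketched roughly as follows, assuming integer label images. The function names (`vrand_fscore`, `vinfo_fscore`) are ours, and the border thinning that the official evaluator applies before scoring is omitted here, so this is an illustration of the underlying quantities rather than the exact challenge code:

```python
import numpy as np

def _contingency(seg, gt):
    """Normalized joint label histogram p_ij and its marginals, restricted
    to the ground-truth foreground (gt != 0)."""
    mask = gt != 0
    s = seg[mask].ravel().astype(np.int64)
    t = gt[mask].ravel().astype(np.int64)
    n = s.size
    joint = s * (t.max() + 1) + t            # unique code per (i, j) pair
    p_ij = np.unique(joint, return_counts=True)[1] / n
    p_i = np.unique(s, return_counts=True)[1] / n
    p_j = np.unique(t, return_counts=True)[1] / n
    return p_ij, p_i, p_j

def vrand_fscore(seg, gt):
    """Foreground-restricted Rand F-score (higher is better, 1 = perfect):
    harmonic mean of a Rand split score and a Rand merge score."""
    p_ij, p_i, p_j = _contingency(seg, gt)
    sum_p2 = (p_ij ** 2).sum()
    precision = sum_p2 / (p_i ** 2).sum()    # split score
    recall = sum_p2 / (p_j ** 2).sum()       # merge score
    return 2 * precision * recall / (precision + recall)

def vinfo_fscore(seg, gt):
    """Foreground-restricted information-theoretic F-score built from the
    mutual information I(S;T) and the entropies H(S), H(T)."""
    p_ij, p_i, p_j = _contingency(seg, gt)
    h_s = -(p_i * np.log(p_i)).sum()
    h_t = -(p_j * np.log(p_j)).sum()
    i_st = (p_ij * np.log(p_ij)).sum() + h_s + h_t  # I = H(S)+H(T)-H(S,T)
    precision = i_st / h_s
    recall = i_st / h_t
    return 2 * precision * recall / (precision + recall)
```

A perfect segmentation yields a score of 1 for both metrics; over-merging lowers the Rand split score, over-splitting lowers the merge score.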
The old (and deprecated) metrics were:
- Minimum Splits and Mergers Warping error: a segmentation metric that penalizes topological disagreements, in this case object splits and mergers.
- Foreground-restricted Rand error: defined as 1 minus the maximal F-score of the foreground-restricted Rand index, a measure of similarity between two clusterings or segmentations. In this version of the Rand index, we exclude the zero component of the original labels (i.e., the background pixels of the ground truth).
- Pixel error: defined as 1 minus the maximal F-score of pixel similarity, i.e., the squared Euclidean distance between the original and the result labels.
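As a rough illustration, the basic quantity behind the pixel error can be sketched as below. The function name `pixel_error` is ours, and the maximization of the F-score over binarization thresholds of the probability map is omitted:

```python
import numpy as np

def pixel_error(pred, gt):
    """Fraction of pixels whose binary label disagrees with the ground
    truth; for 0/1 labels this equals the mean squared Euclidean distance
    between the label images. Sketch only: the challenge score is 1 minus
    the maximal F-score over thresholds, which this omits."""
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    return float(np.mean(pred != gt))
```

Note that the pixel error ignores topology entirely: a one-pixel gap in a membrane barely changes it, yet merges two neurites, which is why the topology-aware metrics above drive the ranking.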
We understand that segmentation evaluation is an ongoing and sensitive research topic; therefore, we keep the metrics open to discussion. Please do not hesitate to contact the organizers to discuss the metric selection.