Intersection F1 calculation code #80

@astrocyted

Description

Since the PSDS Eval package has been removed from GitHub, and with it the support for it, is there a plan to have separate standalone code in the repo for evaluating this metric, without having to import from the somewhat obscure psds_eval package that is no longer available?

I was getting NaNs in my per-class F1 scores, so I had to step through the psds_eval package, only to discover that they are caused by the line:

        num_gts = per_class_tp / tp_ratios

where num_gts is calculated in this really bizarre way that assumes tp_ratios is never zero(!), and hence the false negatives and the F1 score of every class with zero TPs become NaN.

    def compute_macro_f_score(self, detections, beta=1.):
        """Computes the macro F_score for the given detection table

        The DTC/GTC/CTTC criteria presented in the ICASSP paper (link above)
        are exploited to compute the confusion matrix. From the latter, class
        dependent F_score metrics are computed. These are further averaged to
        compute the macro F_score.

        It is important to notice that a cross-trigger is also counted as
        false positive.

        Args:
            detections (pandas.DataFrame): A table of system detections
                that has the following columns:
                "filename", "onset", "offset", "event_label".
            beta: coefficient used to put more (beta > 1) or less (beta < 1)
                emphasis on false negatives.

        Returns:
            A tuple with average F_score and dictionary with per-class F_score

        Raises:
            PSDSEvalError: if class instance doesn't have ground truth table
        """
        if self.ground_truth is None:
            raise PSDSEvalError("Ground Truth must be provided before "
                                "adding the first operating point")

        det_t = self._init_det_table(detections)
        counts, tp_ratios, _, _ = self._evaluate_detections(det_t)

        per_class_tp = np.diag(counts)[:-1]
        num_gts = per_class_tp / tp_ratios
        per_class_fp = counts[:-1, -1]
        per_class_fn = num_gts - per_class_tp
        f_per_class = self.compute_f_score(per_class_tp, per_class_fp,
                                           per_class_fn, beta)

        # remove the injected world label
        class_names_no_world = sorted(set(self.class_names
                                          ).difference([WORLD]))
        f_dict = {c: f for c, f in zip(class_names_no_world, f_per_class)}
        f_avg = np.nanmean(f_per_class)

        return f_avg, f_dict
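A minimal sketch of the failure mode (the array values are made up for illustration; tp_ratios is TP / num_gts, so it is 0 exactly where TP is 0):

```python
import numpy as np

# Illustrative values only: the last class was never detected (zero TPs),
# so its tp_ratio is also zero.
per_class_tp = np.array([10.0, 5.0, 0.0])
tp_ratios = np.array([0.5, 0.25, 0.0])

with np.errstate(invalid="ignore"):
    num_gts = per_class_tp / tp_ratios  # 0/0 -> NaN for the zero-TP class

print(num_gts)  # [20. 20. nan]
```

The NaN then propagates into per_class_fn and the class's F-score.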

This behaviour, by the way, could easily lead to a significant overestimation of the macro intersection F1 computed with this code: if the model's output for a rare class yields zero TPs, the macro F1 in this package (via np.nanmean) simply ignores that class and averages over the remaining classes.
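A toy illustration of the effect (made-up per-class scores; np.nanmean silently drops the NaN entry, while arguably a zero-TP class should contribute an F1 of 0 to the average):

```python
import numpy as np

# Suppose the rare third class got zero TPs and hence an F1 of NaN.
f_per_class = np.array([0.8, 0.6, np.nan])

reported = np.nanmean(f_per_class)            # NaN class dropped, approx. 0.7
honest = np.mean(np.nan_to_num(f_per_class))  # counted as 0, approx. 0.467

print(reported, honest)
```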

So I think it would be helpful to have more transparent, clean, standalone code for the intersection-based F1 in the repo.
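For reference, a division-safe variant could look something like this. This is a sketch only, not part of psds_eval: the inputs are assumed to mean the same as in the function quoted above, and it relies on the fact that a class with zero TPs legitimately scores F = 0 whenever it has any FPs or FNs:

```python
import numpy as np

def safe_per_class_f(per_class_tp, tp_ratios, per_class_fp, beta=1.0):
    """Per-class F-score that yields 0 (not NaN) for zero-TP classes.

    Sketch only: per_class_tp, tp_ratios and per_class_fp are assumed
    to have the same meaning as in psds_eval's compute_macro_f_score.
    """
    # Guard the division: where tp_ratios is 0, per_class_tp is also 0,
    # so num_gts cannot be recovered from these arrays; the F-score of
    # such a class is 0 regardless, as long as it has any FPs or FNs.
    safe_ratio = np.where(tp_ratios > 0, tp_ratios, 1.0)
    num_gts = per_class_tp / safe_ratio
    per_class_fn = num_gts - per_class_tp

    num = (1 + beta ** 2) * per_class_tp
    den = num + beta ** 2 * per_class_fn + per_class_fp
    return np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0)

tp = np.array([10.0, 0.0])
ratios = np.array([0.5, 0.0])
fp = np.array([2.0, 3.0])
print(safe_per_class_f(tp, ratios, fp))  # no NaNs; the zero-TP class scores 0.0
```

Taking a plain np.mean over this result then penalises zero-TP classes instead of dropping them.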
