Intersection F1 calculation code #80

@astrocyted

Description

Since the PSDS Eval package has been removed from GitHub, and with it the support for it, is there a plan to have separate standalone code in the repo for evaluating this metric, without having to import from the somewhat obscure psds_eval package that is no longer available?

I was getting NaNs in my per-class F1 scores, so I had to step through the psds_eval package, only to discover that they are caused by the line:

        num_gts = per_class_tp / tp_ratios

where num_gts is calculated in this really bizarre way that assumes tp_ratios is never zero(!), and hence the false negatives and the F1 score of every class with zero TPs become NaN.

    def compute_macro_f_score(self, detections, beta=1.):
        """Computes the macro F_score for the given detection table

        The DTC/GTC/CTTC criteria presented in the ICASSP paper (link above)
        are exploited to compute the confusion matrix. From the latter, class
        dependent F_score metrics are computed. These are further averaged to
        compute the macro F_score.

        It is important to notice that a cross-trigger is also counted as
        false positive.

        Args:
            detections (pandas.DataFrame): A table of system detections
                that has the following columns:
                "filename", "onset", "offset", "event_label".
            beta: coefficient used to put more (beta > 1) or less (beta < 1)
                emphasis on false negatives.

        Returns:
            A tuple with average F_score and dictionary with per-class F_score

        Raises:
            PSDSEvalError: if class instance doesn't have ground truth table
        """
        if self.ground_truth is None:
            raise PSDSEvalError("Ground Truth must be provided before "
                                "adding the first operating point")

        det_t = self._init_det_table(detections)
        counts, tp_ratios, _, _ = self._evaluate_detections(det_t)

        per_class_tp = np.diag(counts)[:-1]
        num_gts = per_class_tp / tp_ratios
        per_class_fp = counts[:-1, -1]
        per_class_fn = num_gts - per_class_tp
        f_per_class = self.compute_f_score(per_class_tp, per_class_fp,
                                           per_class_fn, beta)

        # remove the injected world label
        class_names_no_world = sorted(set(self.class_names
                                          ).difference([WORLD]))
        f_dict = {c: f for c, f in zip(class_names_no_world, f_per_class)}
        f_avg = np.nanmean(f_per_class)

        return f_avg, f_dict
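A minimal sketch of the failure mode (the array values are made up for illustration; tp_ratios is TP / num_gts, so it is 0 exactly where TP is 0):

```python
import numpy as np

# Illustrative values only: the last class was never detected (zero TPs),
# so its tp_ratio is also zero.
per_class_tp = np.array([10.0, 5.0, 0.0])
tp_ratios = np.array([0.5, 0.25, 0.0])

with np.errstate(invalid="ignore"):
    num_gts = per_class_tp / tp_ratios  # 0/0 -> NaN for the zero-TP class

print(num_gts)  # [20. 20. nan]
```

The NaN then propagates into per_class_fn and the class's F-score.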

This behaviour, by the way, could easily lead to a significant overestimation of the macro intersection F1 computed with this code: if the model's output for a rare class yields zero TPs, the macro F1 in this package (via np.nanmean) simply ignores that class and averages over the remaining classes.
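A toy illustration of the effect (made-up per-class scores; np.nanmean silently drops the NaN entry, while arguably a zero-TP class should contribute an F1 of 0 to the average):

```python
import numpy as np

# Suppose the rare third class got zero TPs and hence an F1 of NaN.
f_per_class = np.array([0.8, 0.6, np.nan])

reported = np.nanmean(f_per_class)            # NaN class dropped, approx. 0.7
honest = np.mean(np.nan_to_num(f_per_class))  # counted as 0, approx. 0.467

print(reported, honest)
```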

So I think it would be helpful to have more transparent, clean, standalone code for the intersection-based F1 in the repo.
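For reference, a division-safe variant could look something like this. This is a sketch only, not part of psds_eval: the inputs are assumed to mean the same as in the function quoted above, and it relies on the fact that a class with zero TPs legitimately scores F = 0 whenever it has any FPs or FNs:

```python
import numpy as np

def safe_per_class_f(per_class_tp, tp_ratios, per_class_fp, beta=1.0):
    """Per-class F-score that yields 0 (not NaN) for zero-TP classes.

    Sketch only: per_class_tp, tp_ratios and per_class_fp are assumed
    to have the same meaning as in psds_eval's compute_macro_f_score.
    """
    # Guard the division: where tp_ratios is 0, per_class_tp is also 0,
    # so num_gts cannot be recovered from these arrays; the F-score of
    # such a class is 0 regardless, as long as it has any FPs or FNs.
    safe_ratio = np.where(tp_ratios > 0, tp_ratios, 1.0)
    num_gts = per_class_tp / safe_ratio
    per_class_fn = num_gts - per_class_tp

    num = (1 + beta ** 2) * per_class_tp
    den = num + beta ** 2 * per_class_fn + per_class_fp
    return np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0)

tp = np.array([10.0, 0.0])
ratios = np.array([0.5, 0.0])
fp = np.array([2.0, 3.0])
print(safe_per_class_f(tp, ratios, fp))  # no NaNs; the zero-TP class scores 0.0
```

Taking a plain np.mean over this result then penalises zero-TP classes instead of dropping them.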
