Since the PSDS-Eval package has been removed from GitHub, and with it its support, is there a plan to provide standalone code in this repo for evaluating this metric, so we don't have to import from the now somewhat obscure, removed psds_eval package?
I was getting NaNs in my per-class F1 scores, so I had to dig through the psds_eval package, only to discover that they come from the line:

```python
num_gts = per_class_tp / tp_ratios
```

which reconstructs `num_gts` in a rather roundabout way that assumes `tp_ratio` is never zero(!). As a result, every class with zero TPs ends up with NaN false negatives and a NaN F1. The full function, for reference:
```python
def compute_macro_f_score(self, detections, beta=1.):
    """Computes the macro F_score for the given detection table

    The DTC/GTC/CTTC criteria presented in the ICASSP paper (link above)
    are exploited to compute the confusion matrix. From the latter, class
    dependent F_score metrics are computed. These are further averaged to
    compute the macro F_score.

    It is important to notice that a cross-trigger is also counted as
    false positive.

    Args:
        detections (pandas.DataFrame): A table of system detections
            that has the following columns:
            "filename", "onset", "offset", "event_label".
        beta: coefficient used to put more (beta > 1) or less (beta < 1)
            emphasis on false negatives.

    Returns:
        A tuple with average F_score and dictionary with per-class F_score

    Raises:
        PSDSEvalError: if class instance doesn't have ground truth table
    """
    if self.ground_truth is None:
        raise PSDSEvalError("Ground Truth must be provided before "
                            "adding the first operating point")
    det_t = self._init_det_table(detections)
    counts, tp_ratios, _, _ = self._evaluate_detections(det_t)
    per_class_tp = np.diag(counts)[:-1]
    num_gts = per_class_tp / tp_ratios
    per_class_fp = counts[:-1, -1]
    per_class_fn = num_gts - per_class_tp
    f_per_class = self.compute_f_score(per_class_tp, per_class_fp,
                                       per_class_fn, beta)
    # remove the injected world label
    class_names_no_world = sorted(set(self.class_names
                                      ).difference([WORLD]))
    f_dict = {c: f for c, f in zip(class_names_no_world, f_per_class)}
    f_avg = np.nanmean(f_per_class)
    return f_avg, f_dict
```
By the way, this behaviour can easily lead to a significant overestimation of the macro intersection-based F1: if the model yields zero TPs for a rare class, `np.nanmean` silently drops that class and averages only over the remaining ones.
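A minimal numpy sketch of both failure modes, using made-up counts (not values from the package):

```python
import numpy as np

# Made-up counts for three classes; the last class has zero TPs and a
# zero tp_ratio, mirroring the failure mode described above.
per_class_tp = np.array([10.0, 5.0, 0.0])
tp_ratios = np.array([0.5, 0.5, 0.0])

with np.errstate(invalid="ignore"):
    num_gts = per_class_tp / tp_ratios  # 0 / 0 -> nan for the last class

# The nan then propagates into that class's F1, and nanmean drops it:
f_per_class = np.array([0.8, 0.6, np.nan])
print(np.nanmean(f_per_class))  # 0.7 -- the rare class is ignored
print((0.8 + 0.6 + 0.0) / 3)    # ~0.467 if it counted as F = 0
```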
So I think it would be helpful to have transparent, clean, standalone code for the intersection-based F1 in the repo.
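As a starting point, here is a hypothetical sketch of what such a standalone helper could look like (the function name and signature are my own, not part of psds_eval): it takes the per-class ground-truth counts directly instead of reconstructing them as `tp / tp_ratio`, and it scores zero-TP classes as F = 0 instead of dropping them via `nanmean`.

```python
import numpy as np

def intersection_macro_f_score(per_class_tp, per_class_fp, num_gts, beta=1.0):
    """Illustrative standalone macro F-score (not the psds_eval API).

    num_gts is the ground-truth event count per class, passed in directly.
    Classes with zero TPs contribute F = 0 to the macro average rather
    than being excluded.
    """
    per_class_tp = np.asarray(per_class_tp, dtype=float)
    per_class_fp = np.asarray(per_class_fp, dtype=float)
    num_gts = np.asarray(num_gts, dtype=float)

    per_class_fn = num_gts - per_class_tp
    numer = (1 + beta**2) * per_class_tp
    denom = numer + beta**2 * per_class_fn + per_class_fp
    # Where the denominator is zero (no GTs and no detections for the
    # class) the score is undefined; treat it as 0 rather than NaN.
    f_per_class = np.divide(numer, denom,
                            out=np.zeros_like(numer), where=denom > 0)
    return f_per_class.mean(), f_per_class

# Toy counts: the third class has zero TPs but still counts in the average.
f_avg, f_per_class = intersection_macro_f_score(
    per_class_tp=[10, 5, 0], per_class_fp=[2, 1, 3], num_gts=[20, 10, 4])
print(f_avg)  # ~0.417, instead of 0.625 if the zero-TP class were dropped
```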