Python code for thesis, PIANO project

The files can be grouped in three parts:
- comment-related files: PIANO_feature_comments, PIANO_clustering_comments and PIANO_textual analysis, which respectively create new features for the comments, apply clustering to them, and analyze their textual characteristics
- PIANO_comment_aggregation: aggregation of the comment features per user, using four aggregation measures (sum, mean, max and min)
- user-related files: PIANO_author_label (definition of toxic authors); PIANO_classification_binary and its no-toxic variant (binary classification and explanation, with and without the toxicity features); PIANO_classification_5_SMOTE and its no-toxic variant (multi-class, single-label classification and explanation, with and without the toxicity features)
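
The per-user aggregation step can be illustrated with a minimal pandas sketch. This is not the code in PIANO_comment_aggregation; the column names (`author`, `toxicity`, `n_words`), the toy data and the helper `aggregate_per_user` are all hypothetical, chosen only to show how comment-level features might be collapsed into one row per user with the four measures.

```python
import pandas as pd

# Hypothetical comment-level table: one row per comment.
# Column names and values are illustrative, not taken from the project data.
comments = pd.DataFrame({
    "author": ["a", "a", "b", "b", "b"],
    "toxicity": [0.1, 0.7, 0.2, 0.4, 0.9],
    "n_words": [12, 30, 5, 8, 20],
})


def aggregate_per_user(df, user_col="author"):
    """Collapse comment-level features into one row per user.

    Each numeric feature yields four columns: sum, mean, max and min.
    """
    agg = df.groupby(user_col).agg(["sum", "mean", "max", "min"])
    # Flatten the (feature, statistic) MultiIndex into names like "toxicity_mean".
    agg.columns = [f"{feature}_{stat}" for feature, stat in agg.columns]
    return agg


user_features = aggregate_per_user(comments)
print(user_features)
```

The resulting table has one row per author and four columns per original feature, which is the shape a user-level classifier would typically expect as input.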