Cajamar Predictive Modelling Challenge (PyConEs16)

Solution for the Cajamar predictive modelling datathon (PyConEs16) - http://www.cajamardatalab.com/datathon-cajamar-pythonhack-2016/

I tried several models and classifiers. The best results came from combining feature selection based on an ExtraTrees model with a kNN classifier.
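The combination described above can be sketched as a scikit-learn pipeline. This is a minimal illustration, not the code in main.py: the synthetic data, the "mean" importance threshold, and values such as `n_estimators=100` and `n_neighbors=5` are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data (assumption: the contest datasets cannot be published).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Step 1: keep only the features whose ExtraTrees importance is at least the mean.
selector = SelectFromModel(
    ExtraTreesClassifier(n_estimators=100, random_state=0),
    threshold="mean",
)

# Step 2: classify the reduced feature set with k-nearest neighbours.
model = make_pipeline(selector, KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

predictions = model.predict(X_test)
n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())
print(f"features kept: {n_selected} of {X.shape[1]}")
```

Reducing the feature set first tends to help kNN, since its distance computations degrade when many uninformative features are present.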

The biggest problem was the extremely imbalanced training dataset: only 1.6% of the instances belonged to the positive class. To compensate, SMOTE oversampling was applied to balance the training dataset (library: https://github.com/scikit-learn-contrib/imbalanced-learn).
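To show what SMOTE does, here is a hand-rolled sketch of its core idea using only the standard library: each synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. It is not the imbalanced-learn implementation used in this project; the function name `smote_sketch`, the toy data, and `k=2` are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding the point itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Random position on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumption): 20 majority points and only 4 minority points.
data_rng = random.Random(1)
majority = [[data_rng.uniform(2, 3), data_rng.uniform(2, 3)] for _ in range(20)]
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Generate enough synthetic minority points to match the majority class size.
synthetic = smote_sketch(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + synthetic
print(len(majority), len(balanced_minority))
```

Oversampling of this kind should only be applied to the training split, so that the test splits keep the original class distribution.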

I tried many classifiers and methods (xgboost, randomForest...), but with the limited time available I couldn't optimize parameters and had to stick with the classifier used in the code. It was far from perfect, but it correctly classified more than half of the positive instances in the test splits.
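"More than half of the positive instances classified correctly" corresponds to a recall above 0.5 on the positive class. The sketch below computes it and shows why accuracy alone is misleading on imbalanced data. The labels are invented for illustration and are not contest results; `positive_recall` is a hypothetical helper.

```python
def positive_recall(y_true, y_pred, positive=1):
    """Fraction of actual positive instances that were predicted as positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    return true_pos / actual_pos if actual_pos else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all instances predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Invented labels (assumption): 3 positives out of 10 instances.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0]
always_negative = [0] * len(y_true)

recall = positive_recall(y_true, y_pred)
baseline_recall = positive_recall(y_true, always_negative)
baseline_accuracy = accuracy(y_true, always_negative)
print(recall, baseline_recall, baseline_accuracy)
```

A classifier that always predicts the negative class reaches a high accuracy here while finding none of the positives, which is why recall on the positive class is the more informative figure.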

Although I didn't get into the top 3, I learnt a lot!

The winning solution can be found here: https://github.com/masdeseiscaracteres/pythonhack2016

The datasets

The datasets cannot be published due to the terms and conditions of the contest.
