Add multi label capabilities for regression and add hyperparameter search #8
Merged
15 commits
288211c add hyperparam tuning and multi label input (ysims)
cf4df3f update tests (ysims)
40c7771 add hyperparam tuning test (ysims)
ad49102 update eval (ysims)
394b94a cleaning (ysims)
c6315a0 Apply suggestions from code review (ysims)
a7a9bfc Update ecosci/evaluation/feature_importance.py (ysims)
76a3dac Update tests/test_hyperopt.py (ysims)
c98a98e explanation (ysims)
051e29a split up (ysims)
a24994b fix warnings (ysims)
ba4fcbb convert hyperopt to pytests (ysims)
c003f76 unpack (ysims)
5fb6048 unused (ysims)
ddd531a minor updates (ysims)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,64 @@ | ||
| # Multi-output classification example using Palmer Penguins dataset | ||
| # This config demonstrates predicting multiple labels simultaneously | ||
|
|
||
| problem_type: classification # Multi-output classification | ||
|
|
||
| data: | ||
| path: data/palmerpenguins_extended.csv | ||
|
|
||
| # Features: Physical measurements and contextual variables | ||
| features: | ||
| - bill_length_mm # Continuous: length of bill in millimeters | ||
| - bill_depth_mm # Continuous: depth of bill in millimeters | ||
| - flipper_length_mm # Continuous: flipper length in millimeters | ||
| - body_mass_g # Continuous: body mass in grams | ||
| - island # Categorical: Biscoe, Dream, Torgersen | ||
| - year # Numeric: 2021-2025 | ||
|
|
||
| # Multiple target variables to predict simultaneously | ||
| labels: | ||
| - species # Adelie, Chinstrap, Gentoo (3 classes) | ||
| - sex # male, female (2 classes) | ||
| - life_stage # adult, juvenile, chick (3 classes) | ||
|
|
||
| test_size: 0.2 # 20% for testing | ||
| val_size: 0.15 # 15% of remaining for validation | ||
| random_state: 42 # For reproducibility | ||
| scaling: standard # Standardize features | ||
| impute_strategy: mean # Imputation for missing values | ||
|
|
||
| # Train multiple models for comparison | ||
| models: | ||
| # Random Forest - natively supports multi-output | ||
| - name: random_forest | ||
| params: | ||
| n_estimators: 100 | ||
| max_depth: 15 | ||
| min_samples_split: 5 | ||
| min_samples_leaf: 2 | ||
| max_features: sqrt | ||
| class_weight: balanced | ||
|
|
||
| # MLP - good for multi-output tasks | ||
| - name: mlp | ||
| params: | ||
| hidden_layer_sizes: [64, 32] | ||
| max_iter: 300 | ||
| early_stopping: true | ||
| validation_fraction: 0.1 | ||
| n_iter_no_change: 15 | ||
|
|
||
| # Logistic Regression - wrapped with MultiOutputClassifier | ||
| - name: logistic | ||
| params: | ||
| max_iter: 1000 | ||
| class_weight: balanced | ||
|
|
||
| # Training configuration | ||
| training: | ||
| repetitions: 5 # Train 5 times with different seeds | ||
| random_seed: 0 | ||
|
|
||
| # Output configuration | ||
| output: | ||
| dir: outputs/penguins_multilabel |
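
The comments in this config distinguish a model that handles several label columns natively (random forest) from one that is wrapped with `MultiOutputClassifier` (logistic regression). The following is a minimal sketch of that distinction in scikit-learn, not the code from this PR: the data is synthetic, the three label columns only mimic the class counts of `species`, `sex`, and `life_stage`, and how `ecosci` actually builds its models is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Synthetic stand-in data (assumption): 120 samples, 4 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))

# Three label columns with 3, 2 and 3 classes, mirroring the config's labels.
Y = np.column_stack([
    (X[:, 0] > 0).astype(int) + (X[:, 1] > 0).astype(int),
    (X[:, 2] > 0).astype(int),
    (X[:, 3] > 0.5).astype(int) + (X[:, 3] > -0.5).astype(int),
])

# Random forest accepts a 2-D label array directly.
forest = RandomForestClassifier(
    n_estimators=100,
    max_depth=15,
    min_samples_split=5,
    min_samples_leaf=2,
    max_features="sqrt",
    class_weight="balanced",
    random_state=0,
)
forest.fit(X, Y)
forest_pred = forest.predict(X)

# Logistic regression expects one label column, so the wrapper
# fits one independent classifier per column.
logistic = MultiOutputClassifier(
    LogisticRegression(max_iter=1000, class_weight="balanced")
)
logistic.fit(X, Y)
logistic_pred = logistic.predict(X)

print(forest_pred.shape, logistic_pred.shape, len(logistic.estimators_))
```

Both predictions have one column per label. The wrapper exposes its per-label models through `estimators_`, which is convenient when metrics or feature importances are reported per label.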
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| # Multi-output regression example using Possum dataset | ||
| # This config demonstrates predicting multiple continuous variables simultaneously | ||
|
|
||
| problem_type: regression # Multi-output regression | ||
|
|
||
| data: | ||
| path: data/possum.csv | ||
|
|
||
| # Features: Various possum measurements | ||
| features: | ||
| - hdlngth # Head length | ||
| - skullw # Skull width | ||
| - totlngth # Total length | ||
| - taill # Tail length | ||
| - footlgth # Foot length | ||
| - earconch # Ear conch length | ||
| - eye # Eye measurement | ||
|
|
||
| # Multiple target variables to predict simultaneously | ||
| labels: | ||
| - age # Age of the possum | ||
| - chest # Chest circumference | ||
| - belly # Belly circumference | ||
|
|
||
| test_size: 0.2 # 20% for testing | ||
| val_size: 0.15 # 15% of remaining for validation | ||
| random_state: 42 # For reproducibility | ||
| scaling: standard # Standardize features | ||
| impute_strategy: mean # Imputation for missing values | ||
|
|
||
| # Train multiple models for comparison | ||
| models: | ||
| # Random Forest - natively supports multi-output | ||
| - name: random_forest | ||
| params: | ||
| n_estimators: 100 | ||
| max_depth: 10 | ||
| min_samples_split: 3 | ||
| min_samples_leaf: 2 | ||
|
|
||
| # MLP Regressor - natively supports multi-output | ||
| - name: mlp | ||
| params: | ||
| hidden_layer_sizes: [32, 16] | ||
| max_iter: 500 | ||
| early_stopping: true | ||
| validation_fraction: 0.15 | ||
| n_iter_no_change: 20 | ||
|
|
||
| # Linear Regression - wrapped with MultiOutputRegressor | ||
| - name: linear | ||
| params: {} | ||
|
|
||
| # Training configuration | ||
| training: | ||
| repetitions: 5 # Train 5 times with different seeds | ||
| random_seed: 0 | ||
|
|
||
| # Output configuration | ||
| output: | ||
| dir: outputs/possum_multilabel |
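
The regression config follows the same pattern: random forest is marked as natively multi-output, while linear regression is described as wrapped with `MultiOutputRegressor`. A minimal scikit-learn sketch under the same caveats (synthetic data in place of `possum.csv`; the wiring inside `ecosci` is an assumption) might look like this:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

# Synthetic stand-in data (assumption): 7 features, 3 continuous targets,
# matching the number of features and labels listed in the config.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 7))
W = rng.normal(size=(7, 3))
Y = X @ W + 0.1 * rng.normal(size=(100, 3))

# Random forest regressor fits all three targets in one model.
forest = RandomForestRegressor(
    n_estimators=100,
    max_depth=10,
    min_samples_split=3,
    min_samples_leaf=2,
    random_state=0,
)
forest.fit(X, Y)
forest_pred = forest.predict(X)

# The wrapper fits one LinearRegression per target column.
linear = MultiOutputRegressor(LinearRegression())
linear.fit(X, Y)
linear_pred = linear.predict(X)

print(forest_pred.shape, linear_pred.shape, len(linear.estimators_))
```

scikit-learn's `LinearRegression` also accepts a 2-D target array on its own; the wrapper is used here only because the config comment names it, and it gives every model the same one-estimator-per-label interface.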