ysims · Nov 24, 2025 · Nov 23, 2025
diff --git a/README.md b/README.md
@@ -7,8 +7,10 @@ EcoNetToolkit lets you train a shallow neural network or classical models on you
 
 - CSV input with automatic preprocessing (impute, scale, encode)
 - Model zoo: MLP (shallow), Random Forest, SVM, XGBoost, Logistic Regression, Linear Regression
+- **Hyperparameter tuning** with grouped train/val/test splits (prevents data leakage)
 - Repeated training with different seeds for stable estimates
 - Metrics, including for unbalanced datasets (balanced accuracy, PR AUC)
+- K-fold cross-validation with spatial/temporal grouping
 - Configure the project from a single config file
 
 ## Table of Contents
@@ -26,6 +28,7 @@ EcoNetToolkit lets you train a shallow neural network or classical models on you
     - [Available models and key parameters](#available-models-and-key-parameters)
     - [Notes on metrics](#notes-on-metrics)
     - [Additional notes](#additional-notes)
+  - [Hyperparameter Tuning](#hyperparameter-tuning)
   - [Using your own data](#using-your-own-data)
   - [Testing](#testing)
     - [Unit Tests](#unit-tests)
@@ -211,6 +214,60 @@ output:
 
 **Note:** If `output.dir` is not specified, outputs are automatically saved to `outputs/<config_name>/` where `<config_name>` is derived from your config file name.
 
+### Multi-output (multi-target) prediction
+
+EcoNetToolkit supports predicting **multiple target variables simultaneously** (multi-output learning). This is useful when you want to predict several related outcomes from the same features.
+
+**Example: Multi-output classification**
+
+```yaml
+problem_type: classification
+
+data:
+    path: data/palmerpenguins_extended.csv
+    features: [bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, island]
+    labels: [species, sex, life_stage]  # Predict 3 labels simultaneously
+    test_size: 0.2
+    val_size: 0.15
+    scaling: standard
+
+models:
+  - name: random_forest
+    params:
+      n_estimators: 100
+      max_depth: 15
+
+training:
+    repetitions: 5
+```
+
+**Example: Multi-output regression**
+
+```yaml
+problem_type: regression
+
+data:
+    path: data/possum.csv
+    features: [hdlngth, skullw, totlngth, taill]
+    labels: [age, chest, belly]  # Predict 3 continuous values
+    test_size: 0.2
+    scaling: standard
+
+models:
+  - name: mlp
+    params:
+      hidden_layer_sizes: [32, 16]
+      max_iter: 500
+```
+
+**Key points:**
+- Use `labels:` (list) instead of `label:` (single string) to specify multiple targets
+- For backward compatibility, `label:` still works for single-output prediction
+- Multi-output metrics report mean and standard deviation across all outputs
+- Some models support multi-output natively (Random Forest, MLP Regressor); others are wrapped automatically (Logistic Regression, SVM, Linear Regression)
+
+See `configs/penguins_multilabel.yaml` and `configs/possum_multilabel.yaml` for complete examples.
+
 ### Available models and key parameters
 
 **MLP (Multi-Layer Perceptron)**
@@ -265,6 +322,70 @@ output:
 - For multi-class problems, macro-averaged Precision/Recall/F1 summarise performance across all classes.
 - Models are ranked by Cohen's kappa (classification) or MSE (regression) to identify the best performer.
 
+## Hyperparameter Tuning
+
+EcoNetToolkit includes automated hyperparameter tuning with proper train/validation/test splits to prevent data leakage. This is especially important for ecological data with spatial or temporal structure.
+
+**Quick Example:**
+
+```bash
+python run.py --config configs/mangrove_tuning.yaml
+```
+
+**Key Features:**
+
+- **Grouped splits**: Assign groups (e.g., patches, sites, years) to train/val/test sets
+- **Automatic search**: GridSearchCV or RandomizedSearchCV to find optimal hyperparameters
+- **Multiple seeds**: Run with different random seeds for stable results
+- **Proper evaluation**: Tune on train+val, evaluate on held-out test set
+
+**Example Config:**
+
+```yaml
+problem_type: regression
+
+data:
+  path: data/mangrove.csv
+  cv_group_column: patch_id    # Group by spatial patches
+  n_train_groups: 4            # 4 patches for training
+  n_val_groups: 2              # 2 patches for validation (tuning)
+  n_test_groups: 2             # 2 patches for test (final eval)
+
+  labels: [NDVI]
+  features: [pu_x, pu_y, temperature, ...]
+  scaling: standard
+
+# Enable hyperparameter tuning
+tuning:
+  enabled: true
+  search_method: random       # "random" or "grid"
+  n_iter: 30                  # Number of parameter combinations
+  cv_folds: 3                 # CV folds during tuning
+  scoring: neg_mean_squared_error
+  n_jobs: -1                  # Use all CPU cores
+
+# Define models and search spaces
+models:
+  - name: random_forest
+    param_space:
+      n_estimators: [100, 200, 500, 1000]
+      max_depth: [10, 20, 30, 50]
+      min_samples_split: [2, 5, 10]
+      max_features: [sqrt, log2, 0.5]
+
+training:
+  repetitions: 5
+  random_seed: 42
+```
+
+**Outputs include:**
+- Best hyperparameters for each seed
+- Validation and test set performance
+- Comparison plots
+- Trained models with optimal parameters
+
+For detailed information, see [docs/HYPERPARAMETER_TUNING.md](docs/HYPERPARAMETER_TUNING.md).
+
 ## Using your own data
 
 1. Place your CSV file in the `data` folder.
@@ -321,7 +442,7 @@ python run.py --config configs/penguins_config.yaml
 python run.py --config configs/possum_config.yaml
 ```
 
-These demonstrate that the toolkit works correctly for both problem types and generates appropriate metrics and visualizations.
+These demonstrate that the toolkit works correctly for both problem types and generates appropriate metrics and visualisations.
 
 ## Troubleshooting
 
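Review note: the grouped split described in the README hunk above (`cv_group_column`, `n_train_groups`, `n_val_groups`, `n_test_groups`) can be sketched roughly as follows. This is a standard-library illustration only, not the toolkit's implementation; `split_groups` and the toy `patch_ids` are hypothetical names introduced here.

```python
import random
from typing import Dict, Hashable, List, Sequence


def split_groups(
    groups: Sequence[Hashable],
    n_train: int,
    n_val: int,
    n_test: int,
    seed: int = 42,
) -> Dict[str, List[int]]:
    """Assign whole groups to train/val/test and return row indices per split."""
    unique = sorted(set(groups))
    if n_train + n_val + n_test > len(unique):
        raise ValueError("not enough groups for the requested split sizes")

    # Shuffle group ids (not rows), so every row of a group stays together.
    rng = random.Random(seed)
    rng.shuffle(unique)

    assignment = {}
    for g in unique[:n_train]:
        assignment[g] = "train"
    for g in unique[n_train:n_train + n_val]:
        assignment[g] = "val"
    for g in unique[n_train + n_val:n_train + n_val + n_test]:
        assignment[g] = "test"

    splits: Dict[str, List[int]] = {"train": [], "val": [], "test": []}
    for row, g in enumerate(groups):
        name = assignment.get(g)
        if name is not None:  # rows of unassigned groups are left out
            splits[name].append(row)
    return splits


# Toy data (hypothetical): 8 patches with 3 rows each.
patch_ids = [patch for patch in range(8) for _ in range(3)]
splits = split_groups(patch_ids, n_train=4, n_val=2, n_test=2)
```

Because the shuffle operates on group ids, no patch can contribute rows to more than one split, which is the leakage guarantee the README refers to.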

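Review note: for readers unfamiliar with the search step named in the README (`search_method: random`, `n_iter`, `cv_folds`, `scoring`), here is a small sketch using scikit-learn's `RandomizedSearchCV` with `GroupKFold`. How the toolkit wires its `tuning:` block to scikit-learn is not shown in this diff, so the mapping below is an assumption; the data is synthetic and the search space is shrunk so the example runs quickly.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, RandomizedSearchCV

# Synthetic stand-in data (assumption): 80 rows, 3 features, 8 groups of 10 rows.
rng = np.random.default_rng(42)
X = rng.normal(size=(80, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=80)
groups = np.repeat(np.arange(8), 10)

# A reduced version of the README's param_space, kept small for speed.
param_space = {
    "n_estimators": [10, 20],
    "max_depth": [3, 5],
    "max_features": ["sqrt", "log2", 0.5],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions=param_space,
    n_iter=4,                          # number of sampled combinations
    cv=GroupKFold(n_splits=3),         # folds never split a group
    scoring="neg_mean_squared_error",
    random_state=42,
)
search.fit(X, y, groups=groups)        # groups are forwarded to GroupKFold
print(search.best_params_)
```

In a real run only the train and validation rows would be passed to `fit`, with the test groups held back for the final evaluation.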
diff --git a/configs/penguins_multilabel.yaml b/configs/penguins_multilabel.yaml
@@ -0,0 +1,64 @@
+# Multi-output classification example using Palmer Penguins dataset
+# This config demonstrates predicting multiple labels simultaneously
+
+problem_type: classification   # Multi-output classification
+
+data:
+  path: data/palmerpenguins_extended.csv
+
+  # Features: Physical measurements and contextual variables
+  features: 
+    - bill_length_mm       # Continuous: length of bill in millimeters
+    - bill_depth_mm        # Continuous: depth of bill in millimeters
+    - flipper_length_mm    # Continuous: flipper length in millimeters
+    - body_mass_g          # Continuous: body mass in grams
+    - island               # Categorical: Biscoe, Dream, Torgersen
+    - year                 # Numeric: 2021-2025
+
+  # Multiple target variables to predict simultaneously
+  labels:
+    - species              # Adelie, Chinstrap, Gentoo (3 classes)
+    - sex                  # male, female (2 classes)
+    - life_stage           # adult, juvenile, chick (3 classes)
+
+  test_size: 0.2           # 20% for testing
+  val_size: 0.15           # 15% of remaining for validation
+  random_state: 42         # For reproducibility
+  scaling: standard        # Standardize features
+  impute_strategy: mean    # Imputation for missing values
+
+# Train multiple models for comparison
+models:
+  # Random Forest - natively supports multi-output
+  - name: random_forest
+    params:
+      n_estimators: 100
+      max_depth: 15
+      min_samples_split: 5
+      min_samples_leaf: 2
+      max_features: sqrt
+      class_weight: balanced
+
+  # MLP - good for multi-output tasks
+  - name: mlp
+    params:
+      hidden_layer_sizes: [64, 32]
+      max_iter: 300
+      early_stopping: true
+      validation_fraction: 0.1
+      n_iter_no_change: 15
+
+  # Logistic Regression - wrapped with MultiOutputClassifier
+  - name: logistic
+    params:
+      max_iter: 1000
+      class_weight: balanced
+
+# Training configuration
+training:
+  repetitions: 5           # Train 5 times with different seeds
+  random_seed: 0
+
+# Output configuration
+output:
+  dir: outputs/penguins_multilabel
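Review note: the config above says Logistic Regression is "wrapped with MultiOutputClassifier". A minimal sketch of what that wrapping looks like in scikit-learn is below. It is not taken from the toolkit's source; the feature matrix and the two label columns are synthetic stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Synthetic stand-in data (assumption): 60 rows, 4 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))

# Two label columns: one binary, one with three classes.
y = np.column_stack([
    (X[:, 0] > 0).astype(int),
    (X[:, 1] > 0).astype(int) + (X[:, 2] > 0).astype(int),
])

# MultiOutputClassifier fits one independent copy of the base estimator per column.
clf = MultiOutputClassifier(
    LogisticRegression(max_iter=1000, class_weight="balanced")
)
clf.fit(X, y)

pred = clf.predict(X[:5])  # one predicted label per target column
print(pred.shape)
```

Each target gets its own classifier, so correlations between labels (for example species and sex) are not modelled by the wrapper itself.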
diff --git a/configs/possum_multilabel.yaml b/configs/possum_multilabel.yaml
@@ -0,0 +1,61 @@
+# Multi-output regression example using Possum dataset
+# This config demonstrates predicting multiple continuous variables simultaneously
+
+problem_type: regression   # Multi-output regression
+
+data:
+  path: data/possum.csv
+
+  # Features: Various possum measurements
+  features: 
+    - hdlngth     # Head length
+    - skullw      # Skull width
+    - totlngth    # Total length
+    - taill       # Tail length
+    - footlgth    # Foot length
+    - earconch    # Ear conch length
+    - eye         # Eye measurement
+
+  # Multiple target variables to predict simultaneously
+  labels:
+    - age         # Age of the possum
+    - chest       # Chest circumference
+    - belly       # Belly circumference
+
+  test_size: 0.2           # 20% for testing
+  val_size: 0.15           # 15% of remaining for validation
+  random_state: 42         # For reproducibility
+  scaling: standard        # Standardize features
+  impute_strategy: mean    # Imputation for missing values
+
+# Train multiple models for comparison
+models:
+  # Random Forest - natively supports multi-output
+  - name: random_forest
+    params:
+      n_estimators: 100
+      max_depth: 10
+      min_samples_split: 3
+      min_samples_leaf: 2
+
+  # MLP Regressor - natively supports multi-output
+  - name: mlp
+    params:
+      hidden_layer_sizes: [32, 16]
+      max_iter: 500
+      early_stopping: true
+      validation_fraction: 0.15
+      n_iter_no_change: 20
+
+  # Linear Regression - wrapped with MultiOutputRegressor
+  - name: linear
+    params: {}
+
+# Training configuration
+training:
+  repetitions: 5           # Train 5 times with different seeds
+  random_seed: 0
+
+# Output configuration
+output:
+  dir: outputs/possum_multilabel
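Review note: the README states that multi-output metrics report the mean and standard deviation across outputs. One way to read that for regression is sketched below: compute MSE per target column, then summarise. The helper names and the toy numbers are hypothetical, and the choice of population standard deviation (`pstdev`) is an assumption, since the diff does not say which variant the toolkit uses.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def per_output_mse(
    y_true: Sequence[Sequence[float]],
    y_pred: Sequence[Sequence[float]],
) -> List[float]:
    """Mean squared error computed separately for each target column."""
    n_outputs = len(y_true[0])
    return [
        mean([(t[j] - p[j]) ** 2 for t, p in zip(y_true, y_pred)])
        for j in range(n_outputs)
    ]


def summarise(scores: Sequence[float]) -> Dict[str, float]:
    """Mean and population standard deviation across outputs."""
    return {"mean": mean(scores), "std": pstdev(scores)}


# Toy values (made up for illustration): 3 rows, 3 targets.
y_true = [[1.0, 10.0, 5.0], [2.0, 20.0, 6.0], [3.0, 30.0, 7.0]]
y_pred = [[1.0, 12.0, 5.0], [2.0, 18.0, 6.0], [4.0, 30.0, 7.0]]

scores = per_output_mse(y_true, y_pred)
summary = summarise(scores)
print(scores, summary)
```

Note that targets on different scales (age versus chest circumference) will dominate an unweighted mean of MSEs, which is worth keeping in mind when models are ranked by MSE.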
diff --git a/ecosci/__init__.py b/ecosci/__init__.py
@@ -4,6 +4,6 @@
 from .data import CSVDataLoader
 from .models import ModelZoo
 from .trainer import Trainer
-from .eval import evaluate_and_report
+from .evaluation import evaluate_and_report
 
 __all__ = ["load_config", "CSVDataLoader", "ModelZoo", "Trainer", "evaluate_and_report"]