58 changes: 45 additions & 13 deletions README.md
@@ -25,7 +25,7 @@ The current version of TPOT was developed at Cedars-Sinai by:
- Jay Moran (jay.moran@cshs.org)
- Nicholas Matsumoto (nicholas.matsumoto@cshs.org)
- Hyunjun Choi (hyunjun.choi@cshs.org)
- Gabriel Ketron (gabriel.ketron@cshs.org)
- Gabriel Ketron (gabriel.ketron@cshs.org)
- Miguel E. Hernandez (miguel.e.hernandez@cshs.org)
- Jason Moore (moorejh28@gmail.com)

@@ -83,7 +83,6 @@ scipy
scikit-learn
update_checker
tqdm
stopit
pandas
joblib
xgboost
@@ -226,23 +225,36 @@ We welcome you to check the existing issues for bugs or enhancements to work on.

If you use TPOT in a scientific publication, please consider citing at least one of the following papers:

Trang T. Le, Weixuan Fu and Jason H. Moore (2020). [Scaling tree-based automated machine learning to biomedical big data with a feature set selector](https://academic.oup.com/bioinformatics/article/36/1/250/5511404). *Bioinformatics*, 36(1): 250-256.
Hernandez, J. G., Saini, A. K., Ghosh, A., & Moore, J. H. (2025). [The tree-based pipeline optimization tool: Tackling biomedical research problems with genetic programming and automated machine learning](https://www.cell.com/patterns/fulltext/S2666-3899(25)00162-X). Patterns, 6(7).

BibTeX entry:

```bibtex
@article{le2020scaling,
title={Scaling tree-based automated machine learning to biomedical big data with a feature set selector},
author={Le, Trang T and Fu, Weixuan and Moore, Jason H},
journal={Bioinformatics},
volume={36},
number={1},
pages={250--256},
year={2020},
publisher={Oxford University Press}
```bibtex
@article{hernandez2025tree,
title={The tree-based pipeline optimization tool: Tackling biomedical research problems with genetic programming and automated machine learning},
author={Hernandez, Jose Guadalupe and Saini, Anil Kumar and Ghosh, Attri and Moore, Jason H},
journal={Patterns},
volume={6},
number={7},
year={2025},
publisher={Elsevier}
}
```

Ribeiro, P., Saini, A., Moran, J., Matsumoto, N., Choi, H., Hernandez, M., & Moore, J. H. (2024). [TPOT2: A New Graph-Based Implementation of the Tree-Based Pipeline Optimization Tool for Automated Machine Learning](https://link.springer.com/chapter/10.1007/978-981-99-8413-8_1). In Genetic programming theory and practice XX (pp. 1-17). Singapore: Springer Nature Singapore.

BibTeX entry:

```bibtex
@incollection{ribeiro2024tpot2,
title={TPOT2: A New Graph-Based Implementation of the Tree-Based Pipeline Optimization Tool for Automated Machine Learning},
author={Ribeiro, Pedro and Saini, Anil and Moran, Jay and Matsumoto, Nicholas and Choi, Hyunjun and Hernandez, Miguel and Moore, Jason H},
booktitle={Genetic programming theory and practice XX},
pages={1--17},
year={2024},
publisher={Springer}
}
```

Randal S. Olson, Ryan J. Urbanowicz, Peter C. Andrews, Nicole A. Lavender, La Creis Kidd, and Jason H. Moore (2016). [Automating biomedical data science through tree-based pipeline optimization](http://link.springer.com/chapter/10.1007/978-3-319-31204-0_9). *Applications of Evolutionary Computation*, pages 123-137.

@@ -286,6 +298,26 @@ BibTeX entry:
}
```

## Related Papers

Trang T. Le, Weixuan Fu and Jason H. Moore (2020). [Scaling tree-based automated machine learning to biomedical big data with a feature set selector](https://academic.oup.com/bioinformatics/article/36/1/250/5511404). *Bioinformatics*, 36(1): 250-256.

BibTeX entry:

```bibtex
@article{le2020scaling,
title={Scaling tree-based automated machine learning to biomedical big data with a feature set selector},
author={Le, Trang T and Fu, Weixuan and Moore, Jason H},
journal={Bioinformatics},
volume={36},
number={1},
pages={250--256},
year={2020},
publisher={Oxford University Press}
}
```


## Support for TPOT

TPOT was developed in the [Artificial Intelligence Innovation (A2I) Lab](http://epistasis.org/) at Cedars-Sinai with funding from the [NIH](http://www.nih.gov/) under grants U01 AG066833 and R01 LM010098. We are incredibly grateful for the support of the NIH and Cedars-Sinai during the development of this project.
37 changes: 25 additions & 12 deletions docs/cite.md
@@ -1,23 +1,36 @@
# Citing TPOT
If you use TPOT in a scientific publication, please consider citing at least one of the following papers:

Trang T. Le, Weixuan Fu and Jason H. Moore (2020). [Scaling tree-based automated machine learning to biomedical big data with a feature set selector](https://academic.oup.com/bioinformatics/article/36/1/250/5511404). *Bioinformatics*, 36(1): 250-256.
Hernandez, J. G., Saini, A. K., Ghosh, A., & Moore, J. H. (2025). [The tree-based pipeline optimization tool: Tackling biomedical research problems with genetic programming and automated machine learning](https://www.cell.com/patterns/fulltext/S2666-3899(25)00162-X). Patterns, 6(7).

BibTeX entry:

```bibtex
@article{le2020scaling,
title={Scaling tree-based automated machine learning to biomedical big data with a feature set selector},
author={Le, Trang T and Fu, Weixuan and Moore, Jason H},
journal={Bioinformatics},
volume={36},
number={1},
pages={250--256},
year={2020},
publisher={Oxford University Press}
```bibtex
@article{hernandez2025tree,
title={The tree-based pipeline optimization tool: Tackling biomedical research problems with genetic programming and automated machine learning},
author={Hernandez, Jose Guadalupe and Saini, Anil Kumar and Ghosh, Attri and Moore, Jason H},
journal={Patterns},
volume={6},
number={7},
year={2025},
publisher={Elsevier}
}
```

Ribeiro, P., Saini, A., Moran, J., Matsumoto, N., Choi, H., Hernandez, M., & Moore, J. H. (2024). [TPOT2: A New Graph-Based Implementation of the Tree-Based Pipeline Optimization Tool for Automated Machine Learning](https://link.springer.com/chapter/10.1007/978-981-99-8413-8_1). In Genetic programming theory and practice XX (pp. 1-17). Singapore: Springer Nature Singapore.

BibTeX entry:

```bibtex
@incollection{ribeiro2024tpot2,
title={TPOT2: A New Graph-Based Implementation of the Tree-Based Pipeline Optimization Tool for Automated Machine Learning},
author={Ribeiro, Pedro and Saini, Anil and Moran, Jay and Matsumoto, Nicholas and Choi, Hyunjun and Hernandez, Miguel and Moore, Jason H},
booktitle={Genetic programming theory and practice XX},
pages={1--17},
year={2024},
publisher={Springer}
}
```

Randal S. Olson, Ryan J. Urbanowicz, Peter C. Andrews, Nicole A. Lavender, La Creis Kidd, and Jason H. Moore (2016). [Automating biomedical data science through tree-based pipeline optimization](http://link.springer.com/chapter/10.1007/978-3-319-31204-0_9). *Applications of Evolutionary Computation*, pages 123-137.

@@ -59,4 +72,4 @@ BibTeX entry:
publisher = {ACM},
address = {New York, NY, USA},
}
```
```
1 change: 0 additions & 1 deletion pyproject.toml
@@ -33,7 +33,6 @@ dependencies = [
"scikit-learn>=1.6",
"update_checker>=0.16",
"tqdm>=4.36.1",
"stopit>=1.1.1",
"pandas>=2.2.0",
"joblib>=1.1.1",
"xgboost>=3.0.0",
46 changes: 0 additions & 46 deletions tpot/utils/eval_utils.py
@@ -40,11 +40,9 @@
import traceback
from collections.abc import Iterable
import warnings
from stopit import threading_timeoutable, TimeoutException
from tpot.selectors import survival_select_NSGA2
import time
import dask
import stopit
from dask.diagnostics import ProgressBar
from tqdm.dask import TqdmCallback
from dask.distributed import progress
@@ -269,47 +267,3 @@ def parallel_eval_objective_list(individual_list,
    final_scores = process_scores(final_scores, n_expected_columns)
    return final_scores, final_start_times, final_end_times, final_eval_errors


###################
# Parallel optimization
#############

@threading_timeoutable(np.nan) #TODO timeout behavior
def optimize_objective(ind, objective, steps=5, verbose=0):

    with warnings.catch_warnings(record=True) as w: # catches all warnings in w so they can be suppressed by verbose
        try:
            value = ind.optimize(objective, steps=steps)
            if not isinstance(value, Iterable):
                value = [value]

            if len(w) and verbose>=2:
                warnings.warn(w[0].message)
            return value
        except Exception as e:
            if verbose >= 2:
                print('WARNING THIS INDIVIDUAL CAUSED AN EXCEPTION')
                print(e)
                print()
            if verbose >= 3:
                print(traceback.format_exc())
                print()
            return [np.nan]



def parallel_optimize_objective(individual_list,
                                objective,
                                n_jobs = 1,
                                verbose=0,
                                steps=5,
                                timeout=None,
                                **objective_kwargs, ):

    Parallel(n_jobs=n_jobs)(delayed(optimize_objective)(ind, objective, steps, verbose, timeout=timeout) for ind in individual_list ) #TODO: parallelize
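
For reviewers unfamiliar with the dependency being dropped: the removed `optimize_objective` relied on stopit's `threading_timeoutable` decorator, which adds a `timeout` keyword to the wrapped function and returns the decorator's default (`np.nan` here) when the call runs too long. The sketch below imitates only that return-a-default-on-timeout behavior with the standard library. The helper `call_with_timeout` and the toy `slow_objective` are hypothetical names and are not part of TPOT or stopit. Unlike stopit, this sketch does not interrupt the worker thread. This diff does not show what, if anything, replaces the decorator.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def call_with_timeout(func, *args, timeout=None, default=math.nan, **kwargs):
    """Run func(*args, **kwargs); return `default` if no result arrives within `timeout` seconds.

    Hypothetical helper for illustration only (not TPOT or stopit API).
    """
    executor = ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(func, *args, **kwargs)
        # Blocks for at most `timeout` seconds; None means wait indefinitely.
        return future.result(timeout=timeout)
    except FutureTimeout:
        return default
    finally:
        # Do not wait for the worker: a timed-out call keeps running in the background.
        executor.shutdown(wait=False)


def slow_objective(seconds):
    """Toy stand-in for an expensive objective (assumption, for demonstration)."""
    time.sleep(seconds)
    return [1.0]


fast_result = call_with_timeout(slow_objective, 0.0, timeout=1.0)
slow_result = call_with_timeout(slow_objective, 0.5, timeout=0.05)
```

Because the timed-out call is abandoned rather than cancelled, this pattern only suits objectives that eventually finish on their own. Hard cancellation would need a separate process or cooperative checks inside the objective.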