diff --git a/README.md b/README.md
index b35c0fe2..266da516 100644
--- a/README.md
+++ b/README.md
@@ -25,7 +25,7 @@ The current version of TPOT was developed at Cedars-Sinai by:
     - Jay Moran (jay.moran@cshs.org)
     - Nicholas Matsumoto (nicholas.matsumoto@cshs.org)
     - Hyunjun Choi (hyunjun.choi@cshs.org)
-    - Gabriel Ketron (gabriel.ketron@cshs.org)
+    - Gabriel Ketron (gabriel.ketron@cshs.org)
     - Miguel E. Hernandez (miguel.e.hernandez@cshs.org)
     - Jason Moore (moorejh28@gmail.com)
 
@@ -83,7 +83,6 @@ scipy
 scikit-learn
 update_checker
 tqdm
-stopit
 pandas
 joblib
 xgboost
@@ -226,23 +225,36 @@ We welcome you to check the existing issues for bugs or enhancements to work on.
 
 If you use TPOT in a scientific publication, please consider citing at least one of the following papers:
 
-Trang T. Le, Weixuan Fu and Jason H. Moore (2020). [Scaling tree-based automated machine learning to biomedical big data with a feature set selector](https://academic.oup.com/bioinformatics/article/36/1/250/5511404). *Bioinformatics*.36(1): 250-256.
+Hernandez, J. G., Saini, A. K., Ghosh, A., & Moore, J. H. (2025). [The tree-based pipeline optimization tool: Tackling biomedical research problems with genetic programming and automated machine learning](https://www.cell.com/patterns/fulltext/S2666-3899(25)00162-X). Patterns, 6(7).
 
 BibTeX entry:
 
-```bibtex
-@article{le2020scaling,
-  title={Scaling tree-based automated machine learning to biomedical big data with a feature set selector},
-  author={Le, Trang T and Fu, Weixuan and Moore, Jason H},
-  journal={Bioinformatics},
-  volume={36},
-  number={1},
-  pages={250--256},
-  year={2020},
-  publisher={Oxford University Press}
+```bibtex
+@article{hernandez2025tree,
+  title={The tree-based pipeline optimization tool: Tackling biomedical research problems with genetic programming and automated machine learning},
+  author={Hernandez, Jose Guadalupe and Saini, Anil Kumar and Ghosh, Attri and Moore, Jason H},
+  journal={Patterns},
+  volume={6},
+  number={7},
+  year={2025},
+  publisher={Elsevier}
 }
 ```
 
+Ribeiro, P., Saini, A., Moran, J., Matsumoto, N., Choi, H., Hernandez, M., & Moore, J. H. (2024). [TPOT2: A New Graph-Based Implementation of the Tree-Based Pipeline Optimization Tool for Automated Machine Learning](https://link.springer.com/chapter/10.1007/978-981-99-8413-8_1). In Genetic programming theory and practice XX (pp. 1-17). Singapore: Springer Nature Singapore.
+
+BibTeX entry:
+
+```bibtex
+@incollection{ribeiro2024tpot2,
+  title={TPOT2: A New Graph-Based Implementation of the Tree-Based Pipeline Optimization Tool for Automated Machine Learning},
+  author={Ribeiro, Pedro and Saini, Anil and Moran, Jay and Matsumoto, Nicholas and Choi, Hyunjun and Hernandez, Miguel and Moore, Jason H},
+  booktitle={Genetic programming theory and practice XX},
+  pages={1--17},
+  year={2024},
+  publisher={Springer}
+}
+```
 
 Randal S. Olson, Ryan J. Urbanowicz, Peter C. Andrews, Nicole A. Lavender, La Creis Kidd, and Jason H. Moore (2016). [Automating biomedical data science through tree-based pipeline optimization](http://link.springer.com/chapter/10.1007/978-3-319-31204-0_9). *Applications of Evolutionary Computation*, pages 123-137.
 
@@ -286,6 +298,26 @@ BibTeX entry:
 }
 ```
 
+## Related Papers
+
+Trang T. Le, Weixuan Fu and Jason H. Moore (2020). [Scaling tree-based automated machine learning to biomedical big data with a feature set selector](https://academic.oup.com/bioinformatics/article/36/1/250/5511404). *Bioinformatics*.36(1): 250-256.
+
+BibTeX entry:
+
+```bibtex
+@article{le2020scaling,
+  title={Scaling tree-based automated machine learning to biomedical big data with a feature set selector},
+  author={Le, Trang T and Fu, Weixuan and Moore, Jason H},
+  journal={Bioinformatics},
+  volume={36},
+  number={1},
+  pages={250--256},
+  year={2020},
+  publisher={Oxford University Press}
+}
+```
+
+
 ## Support for TPOT
 
 TPOT was developed in the [Artificial Intelligence Innovation (A2I) Lab](http://epistasis.org/) at Cedars-Sinai with funding from the [NIH](http://www.nih.gov/) under grants U01 AG066833 and R01 LM010098. We are incredibly grateful for the support of the NIH and the Cedars-Sinai during the development of this project.
diff --git a/docs/cite.md b/docs/cite.md
index ac7de6e6..5683a0ac 100644
--- a/docs/cite.md
+++ b/docs/cite.md
@@ -1,23 +1,36 @@
 # Citing TPOT
 If you use TPOT in a scientific publication, please consider citing at least one of the following papers:
 
-Trang T. Le, Weixuan Fu and Jason H. Moore (2020). [Scaling tree-based automated machine learning to biomedical big data with a feature set selector](https://academic.oup.com/bioinformatics/article/36/1/250/5511404). *Bioinformatics*.36(1): 250-256.
+Hernandez, J. G., Saini, A. K., Ghosh, A., & Moore, J. H. (2025). [The tree-based pipeline optimization tool: Tackling biomedical research problems with genetic programming and automated machine learning](https://www.cell.com/patterns/fulltext/S2666-3899(25)00162-X). Patterns, 6(7).
 
 BibTeX entry:
 
-```bibtex
-@article{le2020scaling,
-  title={Scaling tree-based automated machine learning to biomedical big data with a feature set selector},
-  author={Le, Trang T and Fu, Weixuan and Moore, Jason H},
-  journal={Bioinformatics},
-  volume={36},
-  number={1},
-  pages={250--256},
-  year={2020},
-  publisher={Oxford University Press}
+```bibtex
+@article{hernandez2025tree,
+  title={The tree-based pipeline optimization tool: Tackling biomedical research problems with genetic programming and automated machine learning},
+  author={Hernandez, Jose Guadalupe and Saini, Anil Kumar and Ghosh, Attri and Moore, Jason H},
+  journal={Patterns},
+  volume={6},
+  number={7},
+  year={2025},
+  publisher={Elsevier}
 }
 ```
 
+Ribeiro, P., Saini, A., Moran, J., Matsumoto, N., Choi, H., Hernandez, M., & Moore, J. H. (2024). [TPOT2: A New Graph-Based Implementation of the Tree-Based Pipeline Optimization Tool for Automated Machine Learning](https://link.springer.com/chapter/10.1007/978-981-99-8413-8_1). In Genetic programming theory and practice XX (pp. 1-17). Singapore: Springer Nature Singapore.
+
+BibTeX entry:
+
+```bibtex
+@incollection{ribeiro2024tpot2,
+  title={TPOT2: A New Graph-Based Implementation of the Tree-Based Pipeline Optimization Tool for Automated Machine Learning},
+  author={Ribeiro, Pedro and Saini, Anil and Moran, Jay and Matsumoto, Nicholas and Choi, Hyunjun and Hernandez, Miguel and Moore, Jason H},
+  booktitle={Genetic programming theory and practice XX},
+  pages={1--17},
+  year={2024},
+  publisher={Springer}
+}
+```
 
 Randal S. Olson, Ryan J. Urbanowicz, Peter C. Andrews, Nicole A. Lavender, La Creis Kidd, and Jason H. Moore (2016). [Automating biomedical data science through tree-based pipeline optimization](http://link.springer.com/chapter/10.1007/978-3-319-31204-0_9). *Applications of Evolutionary Computation*, pages 123-137.
 
@@ -59,4 +72,4 @@ BibTeX entry:
   publisher = {ACM},
   address = {New York, NY, USA},
 }
-```
\ No newline at end of file
+```
diff --git a/pyproject.toml b/pyproject.toml
index ee445379..b179c4f0 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -33,7 +33,6 @@ dependencies = [
     "scikit-learn>=1.6",
     "update_checker>=0.16",
     "tqdm>=4.36.1",
-    "stopit>=1.1.1",
     "pandas>=2.2.0",
     "joblib>=1.1.1",
     "xgboost>=3.0.0",
diff --git a/tpot/utils/eval_utils.py b/tpot/utils/eval_utils.py
index 9b0a2ea3..010d42e7 100644
--- a/tpot/utils/eval_utils.py
+++ b/tpot/utils/eval_utils.py
@@ -40,11 +40,9 @@ import traceback
 from collections.abc import Iterable
 import warnings
-from stopit import threading_timeoutable, TimeoutException
 from tpot.selectors import survival_select_NSGA2
 import time
 import dask
-import stopit
 from dask.diagnostics import ProgressBar
 from tqdm.dask import TqdmCallback
 from dask.distributed import progress
@@ -269,47 +267,3 @@ def parallel_eval_objective_list(individual_list,
     final_scores = process_scores(final_scores, n_expected_columns)
 
     return final_scores, final_start_times, final_end_times, final_eval_errors
-
-###################
-# Parallel optimization
-#############
-
-@threading_timeoutable(np.nan) #TODO timeout behavior
-def optimize_objective(ind, objective, steps=5, verbose=0):
-
-    with warnings.catch_warnings(record=True) as w: #catches all warnings in w so it can be supressed by verbose
-        try:
-            value = ind.optimize(objective, steps=steps)
-            if not isinstance(value, Iterable):
-                value = [value]
-
-            if len(w) and verbose>=2:
-                warnings.warn(w[0].message)
-            return value
-        except Exception as e:
-            if verbose >= 2:
-                print('WARNING THIS INDIVIDUAL CAUSED AND EXCEPTION')
-                print(e)
-                print()
-            if verbose >= 3:
-                print(traceback.format_exc())
-                print()
-            return [np.nan]
-
-
-
-def parallel_optimize_objective(individual_list,
-                                objective,
-                                n_jobs = 1,
-                                verbose=0,
-                                steps=5,
-                                timeout=None,
-                                **objective_kwargs, ):
-
-    Parallel(n_jobs=n_jobs)(delayed(optimize_objective)(ind, objective, steps, verbose, timeout=timeout) for ind in individual_list ) #TODO: parallelize
-
-
-
-
-