What does this PR do?
Add a new feature that lets users specify a customized initial pipeline population for TPOT2.
Where should the reviewer start?
tpot2/tests/test_customized_iniPop.py
Contains the SequentialPipeline initialization method (built from scalers, selectors, transformers_layer, inner_estimators_layer, and estimators) and a sample that initializes TPOTClassifier with the customized_initial_population parameter.
tpot2/config/get_configspace.py
A new set_node() function has been added. It mainly contains the operations for adding new nodes to a pipeline.
tpot2/evolvers/base_evolver.py
Add checks that compare the number of customized initial individuals with the number of individuals that still need to be generated (by individual_generator) to fill the population.
tpot2/tpot_estimator/estimator.py
Pass the customized_initial_population parameter through.
How should this PR be tested?
The test code is at tpot2/tests/test_customized_iniPop.py. Run it with: pytest test_customized_iniPop.py
Any background context you want to provide?
In this version, users can specify a well-defined initial pipeline population, currently limited to the SequentialPipeline type. This update has the potential to improve algorithm performance and reduce evolution time.
Some tips:
These SequentialPipeline pipelines can be obtained by referencing the examples in customized_initial_population.py and modifying them according to TPOT2's config_dict.
We handle the relationship between the number of customized initial pipelines and population_size as follows:
init_population_size = len(customized_initial_population)
if self.cur_population_size <= init_population_size:
    initial_population = customized_initial_population[:self.cur_population_size]
else:
    initial_population = [next(self.individual_generator) for _ in range(self.cur_population_size - init_population_size)]
    initial_population = customized_initial_population + initial_population
The current version only applies when search_spaces is linear and the initialized pipelines are of type SequentialPipeline. If you think our approach is appropriate, we will extend it in the near future to the case where search_spaces is a graph and the pipelines are of type GraphPipeline.
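The sizing rule quoted above can be exercised in isolation. The following is a minimal sketch, not the evolver's actual code: `build_initial_population` and `fake_generator` are hypothetical names, and plain strings stand in for SequentialPipeline individuals.

```python
from itertools import count


def build_initial_population(custom, population_size, generator):
    """Mirror the PR's sizing rule outside the evolver.

    custom          -- list of user-supplied individuals
    population_size -- stands in for self.cur_population_size
    generator       -- stands in for self.individual_generator
    """
    n_custom = len(custom)
    if population_size <= n_custom:
        # Enough custom individuals: keep the first population_size of them.
        return list(custom[:population_size])
    # Too few: top up with freshly generated individuals.
    generated = [next(generator) for _ in range(population_size - n_custom)]
    return list(custom) + generated


def fake_generator():
    """Hypothetical stand-in that yields labelled placeholder individuals."""
    for i in count():
        yield f"random_{i}"


custom = ["pipe_a", "pipe_b", "pipe_c"]
truncated = build_initial_population(custom, 2, fake_generator())
topped_up = build_initial_population(custom, 5, fake_generator())
print(truncated)  # ['pipe_a', 'pipe_b']
print(topped_up)  # ['pipe_a', 'pipe_b', 'pipe_c', 'random_0', 'random_1']
```

With a population size of 2 the third custom pipeline is silently dropped, which is the behaviour to keep in mind when passing more pipelines than population_size.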
I do like the idea of being able to specify an initial population. Thanks for your interest and contribution to the project!
Some notes:
1. You modified the default PULL_REQUEST_TEMPLATE.md with your info. That file is intended to be copy-pasted into PRs (it should populate by default on GitHub). This change needs to be reverted to the original template.
2. Bug - The custom initial population is never actually used. Line 447 of base_evolver overwrites the custom population. You can print the initial population with the following snippet and see that the custom individual is not there.
    for ind in est.evaluated_individuals.iterrows():
        print(ind[1]['Instance'])
This could potentially be turned into a test; I'm pretty sure that the order of the custom initial population will match the order in the pandas df.
3. initial_population = customized_initial_population[:self.cur_population_size] - I think if users pass in a list larger than the population size, they probably did that intentionally, and TPOT2 should just use the larger list as is. To me this is more intuitive. I recommend replacing it with initial_population = customized_initial_population
4. I'm wondering if the "set_node" function is necessary. It seems functionally equivalent to just calling "EstimatorNode." Also note that passing in a dictionary fixes the hyperparameters permanently, so they go unlearned. You need to pass in a ConfigurationSpace to make the hyperparameters learned. With set_node, you currently cannot specify an initial/default hyperparameter AND have it be learned/tuned later. Instead, I think that adding a parameter to EstimatorNode (and the wrapper pipeline, etc.) like 'default_hyperparameters' might be a better approach. AND/OR maybe add it as a parameter for get_node, but this gets weird/complicated with the wrapperpipelines...
5. This also needs to be implemented in the steady state evolver/estimator.
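As a rough illustration of notes 2 and 3, the sketch below shows the suggested "use the larger list as is" rule and the kind of ordering check a regression test could make. Everything here is an assumption made for illustration: `seed_population`, `starts_with_custom`, and `placeholder_generator` are hypothetical names, strings stand in for pipeline individuals, and the check assumes the evaluated individuals keep the seeding order, which has not been confirmed.

```python
from itertools import count


def seed_population(custom, population_size, generator):
    """Suggested variant: keep every custom individual, only top up."""
    n_missing = max(0, population_size - len(custom))
    return list(custom) + [next(generator) for _ in range(n_missing)]


def starts_with_custom(evaluated, custom):
    """True if the evaluated individuals begin with the custom ones, in order."""
    return list(evaluated[:len(custom)]) == list(custom)


def placeholder_generator():
    # Hypothetical stand-in for the evolver's individual generator.
    for i in count():
        yield f"random_{i}"


custom = ["pipe_a", "pipe_b", "pipe_c"]

# A list larger than the population size is used as is (no truncation).
oversized = seed_population(custom, 2, placeholder_generator())
# A smaller list is topped up to the population size.
padded = seed_population(custom, 4, placeholder_generator())

# Simulates the reported bug: the seeded population gets overwritten.
gen = placeholder_generator()
overwritten = [next(gen) for _ in range(4)]

print(starts_with_custom(padded, custom))       # True
print(starts_with_custom(overwritten, custom))  # False
```

In an actual test, the `evaluated` argument would presumably come from the 'Instance' column of est.evaluated_individuals, and comparing real pipeline objects may need something other than plain equality.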
What are the relevant issues?
issue-61
Main Contributors
@peiyanpan @t-harden