Bayesian Network Feature Selection (BNFS)

A method for feature selection applied to classification and regression problems using Bayesian networks.

Final Application Project at the University of Trento

Overview

BNFS implements a feature selection strategy based on the reconstruction of Bayesian networks, following the method proposed in arXiv:2204.03526. By modeling probabilistic dependencies between variables, the algorithm identifies a Markov blanket around the target variable to select the most informative features. The approach has been tested by training machine learning classifiers and compared against state-of-the-art selection methods, showing promising results.

The pipeline is structured as follows:

Data Preprocessing: The data is cleaned and normalized to prepare it for Bayesian network training. This step includes discretizing continuous features where necessary.
Bayesian Network Structure Learning: The network structure is learned by identifying the dependencies among the variables. Several strategies are available for this step, including quantum annealing (QA).
Markov Blanket Calculation: The Markov blanket of the target variable is determined to identify the set of features that directly influence the target.
Feature Selection: The features that belong to the Markov blanket are selected. Given these features, the target variable is conditionally independent of all remaining variables in the network.
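
As an illustration of the last two steps, here is a minimal sketch of how a Markov blanket can be read off a learned structure. It is not taken from the BNFS code base: the function name `markov_blanket`, the `full` flag, and the convention that `adj[i][j] == 1` means an edge from node `i` to node `j` are all assumptions made for this example.

```python
from typing import List, Set


def markov_blanket(adj: List[List[int]], target: int, full: bool = True) -> Set[int]:
    """Return the Markov blanket of `target` in a DAG.

    Assumed convention (hypothetical): adj[i][j] == 1 means an edge i -> j.
    """
    n = len(adj)
    parents = {i for i in range(n) if adj[i][target]}
    children = {j for j in range(n) if adj[target][j]}
    blanket = parents | children
    if full:
        # Add the other parents of each child (sometimes called spouses).
        for child in children:
            blanket |= {i for i in range(n) if adj[i][child]}
    blanket.discard(target)  # the target is a parent of its own children
    return blanket


# Toy DAG over nodes 0..4 with target 4: 0 -> 4, 4 -> 2, 1 -> 2, 3 -> 0
adj = [
    [0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]

print(sorted(markov_blanket(adj, target=4)))              # [0, 1, 2]
print(sorted(markov_blanket(adj, target=4, full=False)))  # [0, 2]
```

Node 3 is left out in both cases: it only reaches the target through node 0, so it carries no extra information once node 0 is known.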

Installation

Prerequisites:

Ensure that the following prerequisites are installed on your development machine:

Python 3.6 or later
pip3 (Python package installer)

BNFS Installation:

To install BNFS, use the following pip command:

pip3 install bnfs

Usage

Step 1: Data Preparation

Prepare a CSV file containing the dataset, with the target variable being the last column. The features can be of any type (integer, float, string), but the target must be labeled appropriately. For example:

Example

Feature 1	Feature 2	Feature 3	TARGET
17.27	3	ETVDA	True
44.59	105	FBAER	False
...	...	...	...
26.89	19	DDFBDF	False
15.56	298	CSDSD	True

Mixed data types are supported (int, float, string).
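
To make the expected layout concrete, the sketch below parses a small in-memory CSV and separates the features from the target in the last column. It uses only the four rows shown in the table; the comma delimiter and the presence of a header row are assumptions, since the README does not specify them.

```python
import csv
import io

# The four rows visible in the example table, written as comma-separated
# text (the delimiter is an assumption for this sketch).
raw = """Feature 1,Feature 2,Feature 3,TARGET
17.27,3,ETVDA,True
44.59,105,FBAER,False
26.89,19,DDFBDF,False
15.56,298,CSDSD,True
"""

reader = csv.reader(io.StringIO(raw))
header = next(reader)
rows = list(reader)

# The target is the last column; everything before it is a feature.
feature_names, target_name = header[:-1], header[-1]
X = [row[:-1] for row in rows]
y = [row[-1] for row in rows]

print(feature_names)  # ['Feature 1', 'Feature 2', 'Feature 3']
print(target_name)    # TARGET
print(y)              # ['True', 'False', 'False', 'True']
```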

Step 2: Configuration File

Create a JSON configuration file to specify customizable parameters for the feature selection algorithm.

Details

Available Parameters:

data_path: Path to the CSV file containing the dataset.
output_dir: Directory for output files. If it doesn’t exist, it will be created.
random_state: Set a random seed for reproducibility.
verbose: If set to true, print information at each step (discretization, BN structure learning, Markov blanket calculation).
full_Markov_blanket: If true, the selected features are the union of the target variable's parents, its children, and the other parents of its children.

Discretization Parameters:

discretize: If set to false, skips discretization.
labels: List of indexes for categorical features needing label encoding.
n_bins: Number of bins for discretization.
discretizer_strategy: Discretization strategy (e.g., uniform, quantile, kmeans).
keep_file: If true, generates a CSV file with the discretized dataset.
divide_et_impera: If true, applies the divide et impera approach.
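
The sketch below illustrates what label encoding and equal-width ("uniform") binning do to the columns of the example dataset. It is a plain-Python approximation written for this README, not the BNFS implementation; the helper names `label_encode` and `uniform_bins` are hypothetical. The strategy names uniform, quantile, and kmeans coincide with those accepted by scikit-learn's `KBinsDiscretizer`, but whether BNFS relies on that class is not stated here.

```python
from typing import List, Sequence


def label_encode(values: Sequence[str]) -> List[int]:
    """Map each distinct category to an integer, in sorted order."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values]


def uniform_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Assign each value to one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant column: a single bin
    width = (hi - lo) / n_bins
    # The maximum would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


feature_1 = [17.27, 44.59, 26.89, 15.56]
feature_3 = ["ETVDA", "FBAER", "DDFBDF", "CSDSD"]

print(uniform_bins(feature_1, n_bins=3))  # [0, 2, 1, 0]
print(label_encode(feature_3))            # [2, 3, 1, 0]
```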

Bayesian Network Structure Learning:

dei_n: Number of splits for the divide et impera approach.
bnsl_data_path: Path to the already discretized data, used when discretization is skipped.
bnsl_strategy: Strategy for learning the Bayesian network structure (e.g., QA, SA, bnlearn).

QA and bnlearn Parameters:

reads: Number of reads for the quantum annealing method.
annealing_time: Time (in microseconds) allocated for quantum annealing per read.
metric: The scoring function used to evaluate network fit (e.g., k2, bic, bdeu).
search_algorithm: Search algorithm for optimizing the DAG structure (e.g., ex, hc, cl, tan, cs, naivebayes).
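
As a starting point, the following sketch builds a configuration from the parameters listed above and prints it as JSON. The key names come from this README; every value, the flat (non-nested) layout, and the value types are assumptions chosen for illustration, so check them against the `config.json` shipped with the repository before use.

```python
import json

# Key names follow the parameter list above; all values are illustrative
# placeholders, and the flat layout is an assumption.
config = {
    "data_path": "data/dataset.csv",
    "output_dir": "results",
    "random_state": 42,
    "verbose": True,
    "full_Markov_blanket": True,
    # Discretization
    "discretize": True,
    "labels": [2],  # index of a categorical column to label-encode
    "n_bins": 5,
    "discretizer_strategy": "quantile",
    "keep_file": False,
    "divide_et_impera": False,
    # Bayesian network structure learning
    "dei_n": 2,
    "bnsl_data_path": "data/discretized.csv",  # used when discretize is false
    "bnsl_strategy": "bnlearn",
    # QA and bnlearn parameters
    "reads": 100,
    "annealing_time": 20,
    "metric": "bic",
    "search_algorithm": "hc",
}

config_text = json.dumps(config, indent=2)
print(config_text)
```

Saving the printed text to a file such as `config.json` gives a file that can be passed to the command in Step 3.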

Step 3: Running the Pipeline

Once the data and configuration file are ready, execute the algorithm using the following command:

bnfs -c <config_file>

This will trigger the execution of the feature selection pipeline, generating the following output files:

res.txt: Contains the adjacency matrix of the learned Bayesian network structure and a list of features selected through the Markov blanket method.
