-**Please Note**: This repository has been updated as of **02/11/2021**. Input file formats have been modified to increase readability.
-python >= 3.5
-We suggest using anaconda to create a virtual environment using the provided YAML configuration file:
-`conda env create -f bichrom.yml`
+~~**Please Note**: This repository has been updated as of **02/11/2021**. Input file formats have been modified to increase readability.~~

-Alternatively, to install requirements using pip:
-`pip install -r requirements.txt`
+**Please Note**: As of **03/07/2022**, the tensorflow version used by Bichrom has been changed to **2.6.2**; the cudatoolkit and cudnn versions are included in `bichrom.yml`.

-**Note**: Bichrom uses Pybedtools to construct genome-wide training, test and validation datasets. In order to use this functionality, you must have bedtools installed. To install bedtools, follow instructions here: https://bedtools.readthedocs.io/en/latest/content/installation.html
+We suggest using anaconda to create a virtual environment using the provided YAML configuration file:
+`conda env create -f bichrom.yml`

-**Note**: For GPU compatibility, tensorflow 2.2.1 requires CUDA 10.1 and cuDNN >= 7.
+**Note**: Bichrom now supports **MirroredStrategy** for training on multiple GPUs.

 ## Usage

-
 ### Step 1 - Construct Bichrom Input Data

 Clone and navigate to the Bichrom repository.
@@ -52,7 +48,11 @@ optional arguments:
 format
 -o OUTDIR, --outdir OUTDIR
 Output directory for storing train, test data
-
+-p PROCESSORS Number of processors
+-val_chroms CHROMOSOME
+Space-delimited chromosome names would be used as validation dataset
+-test_chroms CHROMOSOME
+Space-delimited chromosome names would be used as test dataset
 ```

 **Required Arguments**
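
The `-p`, `-val_chroms`, and `-test_chroms` options added in the hunk above take an integer and space-delimited chromosome names. As a minimal sketch of how such options could be declared with `argparse`, the parser below is hypothetical and is not taken from `construct_data.py`; the `dest` name `processors` and the empty-list defaults are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring only the three options in the help text."""
    parser = argparse.ArgumentParser(description="Sketch of the new options")
    # -p: number of worker processes; the README states the default is 1.
    parser.add_argument("-p", dest="processors", type=int, default=1,
                        help="Number of processors")
    # nargs="+" collects one or more space-delimited chromosome names.
    # The empty-list defaults are an assumption, not the script's behavior.
    parser.add_argument("-val_chroms", nargs="+", default=[],
                        help="Chromosomes used as the validation dataset")
    parser.add_argument("-test_chroms", nargs="+", default=[],
                        help="Chromosomes used as the test dataset")
    return parser


# Example invocation with one validation and two test chromosomes.
args = build_parser().parse_args(
    ["-p", "4", "-val_chroms", "chr11", "-test_chroms", "chr17", "chr18"]
)
print(args.processors, args.val_chroms, args.test_chroms)
```

Using `nargs="+"` lets each flag accept several names in one invocation, which matches the "space-delimited" wording in the help text.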
@@ -89,6 +89,10 @@ Output directory for storing output train, test and validation datasets.
 A blacklist BED file, with artifactual regions to be excluded from the training.
 For an example file, please see: `sample_data/mm10_blacklist.bed`.

+**p** (optional):
+Number of processors (default: 1).
+Providing more cores speeds up training sample preparation.
+
 ### Step 1 - Output
 construct_data.py will produce train, test and validation datasets in the specified output directory.
 This function will also produce a configuration file called **bichrom.yaml**, which can be used as input to run Bichrom. This configuration file stores the paths to the created train, test and validation datasets.
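
To illustrate how held-out chromosomes can partition genomic windows into train, validation and test sets, here is a minimal sketch. It is not Bichrom's implementation: the function `split_by_chromosome`, the tuple layout `(chrom, start, end)`, and the rule that every remaining chromosome goes to training are assumptions made for this example.

```python
def split_by_chromosome(intervals, val_chroms, test_chroms):
    """Assign (chrom, start, end) intervals to train/val/test by chromosome.

    Hypothetical helper: any chromosome not listed as validation or test
    is assumed to belong to the training set.
    """
    val_set, test_set = set(val_chroms), set(test_chroms)
    overlap = val_set & test_set
    if overlap:
        # A chromosome in both held-out sets would leak data between them.
        raise ValueError(f"chromosomes in both val and test: {sorted(overlap)}")
    splits = {"train": [], "val": [], "test": []}
    for interval in intervals:
        chrom = interval[0]
        if chrom in val_set:
            splits["val"].append(interval)
        elif chrom in test_set:
            splits["test"].append(interval)
        else:
            splits["train"].append(interval)
    return splits


# Toy windows; coordinates are made up for the example.
windows = [("chr1", 0, 500), ("chr11", 100, 600),
           ("chr17", 200, 700), ("chr2", 50, 550)]
splits = split_by_chromosome(windows, val_chroms=["chr11"], test_chroms=["chr17"])
```

Splitting by whole chromosome keeps overlapping or adjacent windows in the same set, so the held-out sets share no genomic regions with training.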
@@ -144,24 +148,9 @@ Bichrom output directory.
 * best_model.hdf5: A Bichrom tensorflow.Keras Model (with the highest validation set auPRC)
 * precision-recall curves for the sequence-only network and Bichrom.

-### Optional: Custom Training Sets and YAML files
-If generating custom training data, please specify a custom YAML file for training Bichrom. Bichrom requires the following files: **1)** Training files, **2)** Validation files, **3)** Test Files.
-
-Within each category, Bichrom expects **3 file types**:
-***Sequence File**: This file contains sequence data (one training sequence of length L/line). Acceptable nucleotides: A, T, G, C, N.
-For an example: see `custom_data_files/data_train.seq`.
-
-
-***Chromatin Files**: 1 file per chromatin experiment. Each input chromatin file contains chromatin signal (binned at any resolution) over the input genomic windows.
-For an example: see `custom_data_files/data_train.mES_dnaseseq.chromatin` which uses a window length= 500, nbins=20.
-
-
-***Labels File**: This file contains binary labels associated with TF binding over the input genomic windows.
-For an example: see `custom_data_files/data_train.labels`
-
-
-File paths to these files should be summarized in a configuration YAML file. For the structure of the YAML file, please see: `sample_data/sample_custom_config.yaml` or `custom_data_files/bichrom.yaml`
+~~### Optional: Custom Training Sets and YAML files~~

+**TODO**: Because Bichrom currently saves datasets in TensorFlow TFRecord format, a new way of providing custom training sets and YAML files will be released.

 ### 2-D Bichrom embeddings
 For 2-D latent embeddings, please refer to the README in the `Bichrom/latent_embeddings` directory.