Commit a924316

Update README.md
1 parent 6e9cd6e commit a924316

1 file changed

Lines changed: 16 additions & 27 deletions

README.md

@@ -8,21 +8,17 @@ https://doi.org/10.1186/s13059-020-02218-6
 
 ## Installation and Requirements
 
-**Please Note**: This repository has been updated as of **02/11/2021**. Input file formats have been modified to increase readability.
-python >= 3.5
-We suggest using anaconda to create a virtual environment using the provided YAML configuration file:
-`conda env create -f bichrom.yml`
+~~**Please Note**: This repository has been updated as of **02/11/2021**. Input file formats have been modified to increase readability.~~
 
-Alternatively, to install requirements using pip:
-`pip install -r requirements.txt`
+**Please Note**: As of **03/07/2022**, the tensorflow version used by Bichrom has been changed to **2.6.2**, cudatoolkit and cudnn versions are included in `bichrom.yml`.
 
-**Note**: Bichrom uses Pybedtools to construct genome-wide training, test and validation datasets. In order to use this functionality, you must have bedtools installed. To install bedtools, follow instructions here: https://bedtools.readthedocs.io/en/latest/content/installation.html
+We suggest using anaconda to create a virtual environment using the provided YAML configuration file:
+`conda env create -f bichrom.yml`
 
-**Note**: For GPU compatibility, tensorflow 2.2.1 requires CUDA 10.1 and cuDNN >= 7.
+**Note**: Now Bichrom supports **MirroredStrategy** to employ multiple gpus for training.
 
 ## Usage
 
-
 ### Step 1 - Construct Bichrom Input Data
 
 Clone and navigate to the Bichrom repository.
@@ -52,7 +48,11 @@ optional arguments:
 format
 -o OUTDIR, --outdir OUTDIR
 Output directory for storing train, test data
-
+-p PROCESSORS Number of processors
+-val_chroms CHROMOSOME
+Space-delimited chromosome names would be used as validation dataset
+-test_chroms CHROMOSOME
+Space-delimited chromosome names would be used as test dataset
 ```
 
 **Required Arguments**
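
Reviewer's note on the hunk above: the three options it adds to the help text could be declared with `argparse` roughly as follows. This is a minimal sketch, not the code in `construct_data.py`. The function name `build_parser`, the `dest` names, and the use of `nargs="+"` are assumptions inferred from the "space-delimited" wording. The default of 1 for `-p` comes from the README text added later in this commit.

```python
import argparse


def build_parser():
    # Hypothetical parser that mirrors only the three options added in this commit.
    parser = argparse.ArgumentParser(description="Sketch of the new construct_data options")
    parser.add_argument("-p", dest="processors", type=int, default=1,
                        metavar="PROCESSORS", help="Number of processors")
    # nargs="+" collects one or more space-delimited chromosome names into a list (assumed behaviour).
    parser.add_argument("-val_chroms", nargs="+", default=[], metavar="CHROMOSOME",
                        help="Chromosomes used as the validation dataset")
    parser.add_argument("-test_chroms", nargs="+", default=[], metavar="CHROMOSOME",
                        help="Chromosomes used as the test dataset")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["-p", "4", "-val_chroms", "chr11", "-test_chroms", "chr17", "chr18"]
    )
    print(args.processors, args.val_chroms, args.test_chroms)
```
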
@@ -89,6 +89,10 @@ Output directory for storing output train, test and validation datasets.
 A blacklist BED file, with artifactual regions to be excluded from the training.
 For an example file, please see: `sample_data/mm10_blacklist.bed`.
 
+**p** (optional):
+Number of processors, default is 1.
+It is suggested to provide more cores to speed up training sample preparation
+
 ### Step 1 - Output
 construct_data.py will produce train, test and validation datasets in the specified output directory.
 This function will also produce a configuration file called **bichrom.yaml**, which can be used as input to run Bichrom. This configuration file stores the paths to the created train, test and validation datasets.
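
Reviewer's note: the README states that the validation and test datasets are chosen by chromosome name, and the sketch below illustrates one way such a split could work. It is not Bichrom's implementation. The function name `split_by_chromosome`, the `(chrom, start, end)` tuple layout, and the rule that every unlisted chromosome goes to training are assumptions made for illustration.

```python
def split_by_chromosome(windows, val_chroms, test_chroms):
    """Assign (chrom, start, end) windows to train/val/test by chromosome name."""
    val_chroms, test_chroms = set(val_chroms), set(test_chroms)
    overlap = val_chroms & test_chroms
    if overlap:
        # A chromosome in both held-out sets would leak validation data into the test set.
        raise ValueError(f"chromosomes in both val and test: {sorted(overlap)}")

    splits = {"train": [], "val": [], "test": []}
    for window in windows:
        chrom = window[0]
        if chrom in val_chroms:
            splits["val"].append(window)
        elif chrom in test_chroms:
            splits["test"].append(window)
        else:
            # Assumption: any chromosome not held out is used for training.
            splits["train"].append(window)
    return splits


if __name__ == "__main__":
    example = [("chr1", 0, 500), ("chr11", 100, 600), ("chr17", 200, 700)]
    print(split_by_chromosome(example, ["chr11"], ["chr17"]))
```

Holding out whole chromosomes, rather than random windows, keeps overlapping or neighbouring windows from appearing on both sides of the split.
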
@@ -144,24 +148,9 @@ Bichrom output directory.
 * best_model.hdf5: A Bichrom tensorflow.Keras Model (with the highest validation set auPRC)
 * precision-recall curves for the sequence-only network and Bichrom.
 
-### Optional: Custom Training Sets and YAML files
-If generating custom training data, please specify a custom YAML file for training Bichrom. Bichrom requires the following files: **1)** Training files, **2)** Validation files, **3)** Test Files.
-
-Within each category, Bichrom expects **3 file types**:
-* **Sequence File**: This file contains sequence data (one training sequence of lenght L/line). Acceptable nucleotides: A, T, G, C, N.
-For an example: see `custom_data_files/data_train.seq`.
-
-
-* **Chromatin Files**: 1 file per chromatin experiment. Each input chromatin file contains chromatin signal (binned at any resolution) over the input genomic windows.
-For an example: see `custom_data_files/data_train.mES_dnaseseq.chromatin` which is uses a window length= 500, nbins=20.
-
-
-* **Labels File**: This file contains binary labels associated with TF binding over the input genomic windows.
-For an example: see `custom_data_files/data_train.labels`
-
-
-File paths to these files should be summarized in a configuration YAML file. For the structure of the YAML file, please see: `sample_data/sample_custom_config.yaml` or `custom_data_files/bichrom.yaml`
+~~### Optional: Custom Training Sets and YAML files~~
 
+**TODO**: Due to currently Bichrom saving dataset in Tensorflow TFRecord format, a new way of providing custom training set and yaml files will be released.
 
 ### 2-D Bichrom embeddings
 For 2-D latenet embeddings, please refer to the README in the ```Bichrom/latent_embeddings directory```
