100 changes: 82 additions & 18 deletions README.md
@@ -7,11 +7,26 @@ This tutorial will give you an overview of how to take the [cBioportal](https://
You will need to acquire the following files in order to seed the cBioportal database for your new reference.
- [reference_genome_file](https://github.com/cBioPortal/cbioportal/blob/master/docs/Import-reference-genome.md) A TSV file describing the reference you intend to import, including the species, build name, URL, and release date.

- gene_info file: This can be obtained from the NCBI FTP server at ftp.ncbi.nih.gov. If there is not one specific to your species available, you can filter down the `All_Mammalia.gene_info` file by using the taxon id of your species like so: `cat All_Mammalia.gene_info | grep "^9685" > felis_catus.gene_info`
- gene_info file: This can be obtained from the NCBI FTP server at ftp.ncbi.nih.gov. If there is not one specific to your species available, you can filter down the [All_Mammalia.gene_info](https://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/All_Mammalia.gene_info.gz) file using the taxon id of your species.

- gtf file: This can be obtained via the UCSC downloads server. Find your species [here](https://hgdownload.soe.ucsc.edu/downloads.html) and then look under for the `refGene.gtf` file. (The exact directory layout can vary between species)
Assuming you are running a Linux distribution, you can do so as follows (`9685` is the taxon ID for Felis catus):
```bash
zcat All_Mammalia.gene_info.gz | grep -P "^9685\t" > felis_catus.gene_info
```
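The same filter can be sketched in Python for environments without `zcat` or GNU `grep`. This is a minimal sketch, not part of the cBioPortal tooling: the function name `filter_gene_info` and the file names are assumptions chosen for illustration. It compares the whole first column to the taxon ID, so an ID such as `96850` is not matched by accident.

```python
import gzip


def filter_gene_info(src_gz, dst, tax_id):
    """Copy rows whose first tab-separated column equals tax_id.

    Hypothetical helper for illustration. Returns the number of rows kept.
    The '#tax_id ...' header line is dropped, as it is by the grep filter.
    """
    kept = 0
    with gzip.open(src_gz, "rt", encoding="utf-8") as src, \
            open(dst, "w", encoding="utf-8") as out:
        for line in src:
            # Exact match on the first column only.
            if line.split("\t", 1)[0] == tax_id:
                out.write(line)
                kept += 1
    return kept


if __name__ == "__main__":
    # Assumed file names, matching the shell example.
    n = filter_gene_info("All_Mammalia.gene_info.gz", "felis_catus.gene_info", "9685")
    print(f"kept {n} rows")
```

Streaming line by line keeps memory use flat even though the uncompressed file is large.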

- gtf file: This can be obtained via the UCSC downloads server. Find your species [here](https://hgdownload.soe.ucsc.edu/downloads.html) and then look for the `refGene.gtf` file. (The exact directory layout can vary between species.) As an example, the refGene file for the felCat9 assembly is located here: `https://hgdownload.soe.ucsc.edu/goldenPath/felCat9/bigZips/genes/felCat9.refGene.gtf.gz`

- IGV reference track: If your species already has a hosted reference track for IGV [listed here](https://igv.org/doc/igvjs/#Reference-Genome/) you don't need to acquire any additional files for this. If it is not you will need `.fa`, `fa.fai` , and `bands.tsv` files for your species. The `fa` files can be acquired from the [UCSC downloads server](https://hgdownload.soe.ucsc.edu/downloads.html) as well. You may need to generate a bands file yourself. It is a TSV file with the following headers:
- IGV reference track: If your species already has a hosted reference track for IGV [listed here](https://igv.org/doc/igvjs/#Reference-Genome/), you don't need to acquire any additional files for this. If it does not, you will need `.fa` and `.fa.fai` files for your species. The `.fa` file can be acquired from the [UCSC downloads server](https://hgdownload.soe.ucsc.edu/downloads.html) as well, and the `.fa.fai` index can be generated from it. An example for felCat9 is provided below:
```bash
# download the reference genome
wget https://hgdownload.soe.ucsc.edu/goldenPath/felCat9/bigZips/felCat9.fa.gz
gunzip felCat9.fa.gz

# creates the fai file, requires the samtools utility
samtools faidx felCat9.fa
```

A `bands.tsv` file is also required for your species. You may need to generate a bands file yourself. If the gieStain values are unknown, a file containing only the header is sufficient. It should take the following form:

```
#chrom chromStart chromEnd name gieStain
@@ -22,18 +37,31 @@ You will need to acquire the following files in order to seed the cBioportal dat
"yourSpecies": { "1": 1234, "2": 3456 }
```

To create this file, we again provide an example using felCat9 from UCSC:
```bash
# download sizes
wget https://hgdownload.soe.ucsc.edu/goldenPath/felCat9/bigZips/felCat9.chrom.sizes

# convert to json using command line
cat felCat9.chrom.sizes | awk '{print "\t\""$1"\": "$2","}' | sed '$ s/,$//' | awk 'BEGIN{print "{"} {print} END{print "}"}' > felCat9.chrom.sizes.2

# replace the downloaded file with the new one
rm -f felCat9.chrom.sizes
mv felCat9.chrom.sizes.2 felCat9.chrom.sizes
```
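A Python sketch of the same conversion is shown here for readers who prefer not to chain `awk` and `sed`. It is illustrative only and rests on assumptions that are not stated in this README: that the `chr` prefix should be stripped, and that unplaced scaffolds (names containing `_`) and `chrM` should be skipped. The function name `chrom_sizes_entry` is hypothetical.

```python
import json


def chrom_sizes_entry(lines, skip=("chrM",)):
    """Build a {chromosome: length} dict from UCSC chrom.sizes lines.

    Assumptions (not taken from cBioPortal itself): names containing '_'
    are unplaced scaffolds and are skipped, names listed in `skip` are
    dropped, and a leading 'chr' is stripped.
    """
    sizes = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        name, length = fields[0], int(fields[1])
        if "_" in name or name in skip:
            continue
        if name.startswith("chr"):
            name = name[3:]
        sizes[name] = length
    return sizes


if __name__ == "__main__":
    # Assumed input file name, matching the wget command above.
    with open("felCat9.chrom.sizes", encoding="utf-8") as handle:
        entry = {"felCat9": chrom_sizes_entry(handle)}
    print(json.dumps(entry, indent=2))
```

Using `json.dumps` avoids hand-managing commas and quoting, which is the fragile part of the shell pipeline. Check the resulting chromosome names against the names used in your study data before relying on the prefix stripping.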

### Modifying the codebase
There are four repositories you will need to fork in order to get set up. They are:
There are four repositories you will need to fork or clone in order to get set up. They are:

- [https://github.com/cBioPortal/cbioportal-core](https://github.com/cBioPortal/cbioportal-core) This repo contains the validation and loading scripts for importing references, studies, etc.
- [https://github.com/cBioPortal/cbioportal](https://github.com/cBioPortal/cbioportal) This is the main web application backend and the primary repository that you will use to build the final Docker images.
- [https://github.com/cBioPortal/cbioportal-frontend](https://github.com/cBioPortal/cbioportal-frontend) This is the react.js frontend of the web application
- [https://github.com/cBioPortal/cbioportal-docker-compose](https://github.com/cBioPortal/cbioportal-docker-compose) This contains docker-compose configuration that we will use to boot the application.

#### Modifying the frontend
#### Modifying the frontend (cbioportal-frontend)
In `env/master.sh`, find the line that says `export CBIOPORTAL_URL=`. To run the frontend against your local cbioportal, update it to say `http://localhost:8080`. For your release build, update it to the URL/domain of your production site.

In `src/shared/lib/IGVUtils.ts` you will want to add a helper method that points to the IGV reference track files you prepared in the first step. It can either point to the officially hosted ones if they already existed or it can point to the ones you acquired yourself. It should look something like this but with your species specific information:
In `src/shared/lib/IGVUtils.ts` you will want to add a helper method that points to the IGV reference track files you prepared in the first step. It can point either to the officially hosted ones, if they already exist, or to the ones you acquired yourself. It should look something like this, but with your species-specific information and file paths:

```
export function defaultfelCat9ReferenceProps() {
@@ -102,18 +130,26 @@ src/shared/lib/referenceGenomeUtils.tsx --- 4/4 --- TypeScript TSX
91 111 export function formatStudyReferenceGenome(genomeBuild: string) {
92 112 if (isGrch37(genomeBuild)) {
```
#### Modifying the core
#### Modifying the core (cbioportal-core)
In `scripts/importer/chromosome_sizes.json` you will need to add an entry for the reference build you intend to support. In our case the additional
value looked like this:
```
"felCat9": { "A1": 242100913, "C1": 222790142, "B1": 208212889, "A2": 171471747, "C2": 161193150, "B2": 155302638, "B3": 149751809, "B4": 144528695, "A3": 143202405, "X": 130557009, "D1": 117648028, "D3": 96884206, "D4": 96521652, "D2": 90186660, "F2": 85752456, "F1": 71664243, "E2": 64340295, "E1": 63494689, "E3": 44648284 }
```

In `scripts/importer/cbioportal_common.py` you will need to add the name of your reference to the `valid_segment_reference_genomes` array as demonstrated [here](https://github.com/catbioportal/cbioportal-core/blob/main/scripts/importer/cbioportal_common.py#L979).
In `scripts/importer/cbioportal_common.py` you will need to add the name of your reference to the `valid_segment_reference_genomes` array as demonstrated [here](https://github.com/catbioportal/cbioportal-core/blob/main/scripts/importer/cbioportal_common.py#L979).
Replacing:
```
valid_segment_reference_genomes = ["hg19", "hg38"]
```
With:
```
valid_segment_reference_genomes = ["hg19", "hg38", "felCat9"]
```

Likewise, you will need to make the script at `scripts/importer/validateData.py` aware of your new reference. This includes inside [load_chromosome_lengths](https://github.com/catbioportal/cbioportal-core/blob/main/scripts/importer/validateData.py#L1076) and [reference_genome_map](https://github.com/catbioportal/cbioportal-core/blob/main/scripts/importer/validateData.py#L5764).
Likewise, you will need to make the script at `scripts/importer/validateData.py` aware of your new reference. This includes updating [load_chromosome_lengths](https://github.com/catbioportal/cbioportal-core/blob/93439023408c6efa720ecc83a00eaca2e734b658/scripts/importer/validateData.py#L1076) and [reference_genome_map](https://github.com/catbioportal/cbioportal-core/blob/93439023408c6efa720ecc83a00eaca2e734b658/scripts/importer/validateData.py#L5764).

This script also contains logic aliasing the X and Y chromosomes to 23 and 24, if this does not apply to your organism, you will need to change that logic as we have [here](https://github.com/catbioportal/cbioportal-core/blob/main/scripts/importer/validateData.py#L3632).
This script also contains logic aliasing the X and Y chromosomes to 23 and 24. If this does not apply to your organism, you will need to change that logic as we have [here](https://github.com/catbioportal/cbioportal-core/blob/93439023408c6efa720ecc83a00eaca2e734b658/scripts/importer/validateData.py#L3632).
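One way to think about this change is to make the aliasing depend on the reference build instead of applying it unconditionally. The sketch below is not the code in `validateData.py`; the names `CHROMOSOME_ALIASES`, `normalize_chromosome`, and `is_known_chromosome` are hypothetical, and it assumes that `23` and `24` are accepted as alternative spellings of `X` and `Y` for human builds only.

```python
# Hypothetical per-build alias tables. These names are illustrative and do
# not come from validateData.py.
CHROMOSOME_ALIASES = {
    "hg19": {"23": "X", "24": "Y"},
    "hg38": {"23": "X", "24": "Y"},
    "felCat9": {},  # cat chromosomes are named A1..F2 and X: no numeric aliases
}


def normalize_chromosome(chrom, build):
    """Strip an optional 'chr' prefix, then apply the build's aliases, if any."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return CHROMOSOME_ALIASES.get(build, {}).get(name, name)


def is_known_chromosome(chrom, build, chromosome_sizes):
    """Check a chromosome name against a chromosome_sizes.json-style dict."""
    return normalize_chromosome(chrom, build) in chromosome_sizes.get(build, {})


if __name__ == "__main__":
    sizes = {"felCat9": {"A1": 242100913, "X": 130557009}}
    print(is_known_chromosome("chrA1", "felCat9", sizes))
    print(is_known_chromosome("23", "felCat9", sizes))
```

Keeping the aliases in a table keyed by build means adding a new organism is a data change, and a value such as `23` in a cat study is reported as unknown instead of being silently treated as `X`.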

In `src/main/java/org/mskcc/cbio/portal/model/ReferenceGenome.java` you will want to add the appropriate constants for your organism. In our case that looked like this:

@@ -123,37 +159,65 @@ public static String FELIS_CATUS_DEFAULT_GENOME_NAME = "felCat9";
public static String FELIS_CATUS_DEFAULT_GENOME_BUILD_PREFIX = "Felis_catus_";
```

These should be declared within the `public class ReferenceGenome{}` where other organisms are declared.

Depending on what types of data you intend to import, you will likely have to modify the cBioPortal model files representing that data. For example, we had to modify the `ReferenceGenomeId` enum in `src/main/java/org/mskcc/cbio/portal/model/CopyNumberSegmentFile.java` to include felCat9.

Other examples include removing logic in `src/main/java/org/mskcc/cbio/portal/scripts/ImportExtendedMutationData.java` and `src/main/java/org/mskcc/cbio/portal/scripts/ImportCopyNumberSegmentData.java` that [asserts that all chromosomes will be numeric](https://github.com/catbioportal/cbioportal-core/blob/main/src/main/java/org/mskcc/cbio/portal/scripts/ImportExtendedMutationData.java#L195) and modifying `src/main/java/org/mskcc/cbio/portal/scripts/ImportGeneData.java` to [skip importing genes from taxons other than cat](https://github.com/catbioportal/cbioportal-core/blob/main/src/main/java/org/mskcc/cbio/portal/scripts/ImportGeneData.java#L89).
```java
hg18("hg18"),
hg19("hg19"),
hg38("hg38"),
mm10("mm10"),
felCat9("felCat9");
```

Other examples include removing logic in `src/main/java/org/mskcc/cbio/portal/scripts/ImportExtendedMutationData.java` and `src/main/java/org/mskcc/cbio/portal/scripts/ImportCopyNumberSegmentData.java` that asserts that all chromosomes will be numeric (see [here](https://github.com/catbioportal/cbioportal-core/blob/93439023408c6efa720ecc83a00eaca2e734b658/src/main/java/org/mskcc/cbio/portal/scripts/ImportExtendedMutationData.java#L195) and [here](https://github.com/catbioportal/cbioportal-core/blob/93439023408c6efa720ecc83a00eaca2e734b658/src/main/java/org/mskcc/cbio/portal/scripts/ImportCopyNumberSegmentData.java#L73), respectively), and modifying `src/main/java/org/mskcc/cbio/portal/scripts/ImportGeneData.java` to [skip importing genes from taxa other than cat](https://github.com/catbioportal/cbioportal-core/blob/93439023408c6efa720ecc83a00eaca2e734b658/src/main/java/org/mskcc/cbio/portal/scripts/ImportGeneData.java#L89).

#### Modifying the webapp
#### Modifying the webapp (cbioportal)

Fortunately, only minimal modifications to the main API webserver
are required. Primarily, you need to modify the routes configuration in ` src/main/java/org/cbioportal/WebAppConfig.java` to make it aware of any custom pages you have added to your instance.
are required. Primarily, you need to modify the routes configuration in `src/main/java/org/cbioportal/application/WebAppConfig.java` to make it aware of any custom pages you have added to your instance.

ADAM or ZACH TODO on this section

### Building artifacts

1. To build the cbioportal-core JAR file, simply run `mvn package` in the root directory of the repository. This will produce a file named something like `core-1.0.8.jar` in the same directory. Upload this fie somewhere accessible. We recomend attaching it to a GitHub release.
1. To build the cbioportal-core JAR file, run `mvn package` in the root directory of the `cbioportal-core` repository. This will produce a file named something like `core-1.0.8.jar` in the same directory. Upload this file somewhere accessible. We recommend attaching it to a GitHub release. You may first need to install Maven (`mvn`) with a package manager.

2. You will need to have the cbioportal Docker image include your customized cbioportal-core jar file. To do this modify the `Dockerfile` at `docker/web-and-data/Dockerfile`. Look for the line below and change it to point to the customized the jar file you published in the previous step instead of the official cbioportal repository.
2. You will need to switch to the `cbioportal` repository and have the cbioportal Docker image include your customized cbioportal-core JAR file. To do this, modify the `Dockerfile` at `docker/web-and-data/Dockerfile`. Look for the line below and change it to point to the customized JAR file you published in the previous step instead of the official cbioportal repository.

ADAM/ZACH/OBI, need to think about this, git complains about the jar file size, it's 150mb, maybe doesn't matter, testing this via local though
```
RUN wget https://github.com/cBioPortal/cbioportal-core/releases/download/1.0.6/core-1.0.6.jar -P core/ ; cd core ; jar -xf core-1.0.6.jar scripts/ requirements.txt ; chmod -R a+x scripts/ ; cd ..;
```


3. You will also need to point the cbioportal build process at your newly customized cbioportal-frontend release. You can do this by updating the dependency in the `pom.xml` in the main cbioportal repository.
Look for the block below and replace the `groupId` with your GitHub repository and the `version` with either a [commit SHA or a tagged release](https://docs.cbioportal.org/development/build-different-frontend/).

for example:

```
<frontend.groupId>com.github.cbioportal</frontend.groupId>
<frontend.version>v6.0.5</frontend.version>
<frontend.groupId>io.github.cbioportal</frontend.groupId>
<frontend.artifactId>frontend-cbioportal</frontend.artifactId>
<frontend.version>v6.4.4</frontend.version>
```

might change to something like this:

```
<frontend.groupId>com.github.zlskidmore</frontend.groupId>
<frontend.artifactId>cbioportal-frontend</frontend.artifactId>
<frontend.version>1.0</frontend.version>
```

4. With that complete, you can build and push your customized Docker image. For this, you will need a [DockerHub](https://hub.docker.com) account. In the below commands, substitute your Docker username and your preferred image name. You will only need the `linux/arm64` platform if you are on an Apple Silicon Mac.

NOTE to ADAM/ZACH/OBI, when I ran this COPY $PWD in the Dockerfile did not expand, someone will have to doublecheck this, I altered my own Dockerfile to get this to work

```
docker buildx build --platform linux/amd64,linux/arm64 -t my-docker-username/mycustombioportal:latest -f docker/web-and-data/Dockerfile .

docker push my-docker-username/mycustombioportal:latest
```
