2 changes: 1 addition & 1 deletion LICENSE.txt
@@ -1,6 +1,6 @@
The MIT License

Copyright (c) 2016-2018 Kamil Salikhov, Karel Brinda, Simone Pignotti, Gregory Kucherov
Copyright (c) 2016-2020 Kamil Salikhov, Karel Brinda, Simone Pignotti, Gregory Kucherov

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
2 changes: 1 addition & 1 deletion Makefile
@@ -7,7 +7,7 @@ IND=./prophex

DEPS= $(wildcard src/*.h) $(wildcard src/*.c) $(wildcard src/bwa/.*.h) $(wildcard src/bwa/*.c)

all: prophex readme
all: prophex #readme

prophex: $(DEPS)
	$(MAKE) -C src
73 changes: 45 additions & 28 deletions README.md
@@ -11,6 +11,24 @@ designed as a core computational component of
classifier allowing fast and accurate read assignment.


<!-- vim-markdown-toc GFM -->

* [Getting started](#getting-started)
* [Alternative ways of installation](#alternative-ways-of-installation)
* [Quick example](#quick-example)
* [ProPhex commands](#prophex-commands)
* [Output format](#output-format)
* [FAQs](#faqs)
* [Issues](#issues)
* [Changelog](#changelog)
* [Licence](#licence)
* [Authors](#authors)

<!-- vim-markdown-toc -->




## Getting started

```
@@ -44,24 +62,24 @@ conda install prophex



# ProPhex commands
## ProPhex commands
<!---
USAGE-BEGIN
-->
```
Program: prophex (a lossless k-mer index)
Version: 0.1.1
Program: prophex (an exact k-mer index)
Version: 0.2.0
Authors: Kamil Salikhov, Karel Brinda, Simone Pignotti, Gregory Kucherov
Contact: kamil.salikhov@univ-mlv.fr
Contact: kamil.salikhov@univ-mlv.fr, karel.brinda@gmail.com

Usage: prophex <command> [options]

Command: index construct a BWA index and k-LCP
query query reads against index
Command: index index sequences in the FASTA format
query query k-mers

klcp construct an additional k-LCP
bwtdowngrade downgrade .bwt to the old, more compact format without Occ
bwt2fa reconstruct FASTA from BWT
klcp construct an additional k-LCP array
bwtdowngrade remove OCC from .bwt
bwt2fa reconstruct .fa from .fa.bwt

```

@@ -77,11 +95,9 @@ Options: -k INT k-mer length for k-LCP
```
Usage: prophex query [options] <idxbase> <in.fq>

Options: -k INT length of k-mer
Options: -k INT k-mer length
-u use k-LCP for querying
-v output set of chromosomes for every k-mer
-p do not check whether k-mer is on border of two contigs, and show such k-mers in output
-b print sequences and base qualities
-b append sequences and base qualities to the output
-l STR log file name to output statistics
-t INT number of threads [1]
-h print help message
@@ -91,8 +107,8 @@ Options: -k INT length of k-mer
```
Usage: prophex klcp [options] <idxbase>

Options: -k INT length of k-mer
-s construct k-LCP and SA in parallel
Options: -k INT k-mer length
-s construct also SA, in parallel to k-LCP
-i sampling distance for SA
-h print help message

@@ -115,19 +131,20 @@ Usage: prophex bwt2fa <idxbase> <output.fa>

## Output format

Matches are reported in an extended
[Kraken format](http://ccb.jhu.edu/software/kraken/MANUAL.html#output-format).
ProPhex produces a tab-delimited file with the following columns:

1. Category (unused, `U` as a legacy value)
2. Sequence name
3. Final decision (unused, `0` as a legacy value)
4. Sequence length
5. Assigned k-mers. Space-delimited list of k-mer blocks with the same assignments. The list is of
the following format: comma-delimited list of sets (or `A` for ambiguous, or
  `0` for no matches), colon, length. Example: `2157,393595:1 393595:1 0:16` (the first k-mer assigned to the nodes `2157` and `393595`, the second k-mer assigned to `393595`, the subsequent 16 k-mers unassigned)
6. Bases (optional)
7. Base qualities (optional)
Matches are reported in the form of a tab-delimited file with the following
columns:

1. Sequence name
2. Sequence length
3. Assigned k-mers. Space-delimited list of k-mer blocks matching the same
   k-mer sets. Each block has the following format: comma-delimited list of
   k-mer sets (`~` for an ambiguous nucleotide, `*` for no k-mer matches),
   colon, the number of k-mers in the block. Example:
   `2157,393595:1 393595:1 *:16` (the first k-mer is assigned to the k-mer
   sets `2157` and `393595`, the second k-mer is assigned to `393595`, and the
   subsequent 16 k-mers do not match anything).
4. Bases (optional)
5. Base qualities (optional)
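
The "Assigned k-mers" column described in the new README text can be consumed with a few lines of C. The following is a minimal sketch, not code from this PR: `kmer_field_summary` and `summarize_kmer_field` are hypothetical names, and the sketch assumes that `*` and `~` appear alone as the label of a block and that the count follows the last colon of each block.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical summary of one "Assigned k-mers" field (not part of ProPhex). */
typedef struct {
    long total_kmers;     /* sum of the counts of all blocks */
    long unmatched_kmers; /* k-mers in blocks labelled "*" */
    long ambiguous_kmers; /* k-mers in blocks labelled "~" */
    int blocks;           /* number of space-delimited blocks */
} kmer_field_summary;

/* Parses a field such as "2157,393595:1 393595:1 *:16".
 * Returns 0 on success and -1 if the field is empty or malformed. */
int summarize_kmer_field(const char* field, kmer_field_summary* out) {
    assert(field != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    const char* p = field;
    while (*p != '\0') {
        size_t block_len = strcspn(p, " ");
        /* Assumption: the count follows the last ':' of the block. */
        const char* colon = NULL;
        for (size_t i = 0; i < block_len; i++) {
            if (p[i] == ':') colon = p + i;
        }
        if (colon == NULL || colon == p) return -1;
        if (!isdigit((unsigned char)colon[1])) return -1;
        char* end = NULL;
        long n = strtol(colon + 1, &end, 10);
        if (end != p + block_len || n <= 0) return -1;
        size_t label_len = (size_t)(colon - p);
        if (label_len == 1 && p[0] == '*') {
            out->unmatched_kmers += n;
        } else if (label_len == 1 && p[0] == '~') {
            out->ambiguous_kmers += n;
        }
        out->total_kmers += n;
        out->blocks++;
        p += block_len;
        while (*p == ' ') p++; /* skip the block separator */
    }
    return out->blocks > 0 ? 0 : -1;
}
```

For the README example `2157,393595:1 393595:1 *:16`, this sketch reports three blocks and 18 k-mers in total, 16 of them unmatched.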


## FAQs
4 changes: 2 additions & 2 deletions src/Makefile
@@ -45,8 +45,8 @@ clean:
# if BWA Makefile is present
	test -f bwa/Makefile && $(MAKE) -C bwa clean

$(PROG): bwa/libbwa.a $(AOBJS2) main.o prophex_query.o prophex_build.o klcp.o bitarray.o bwa_utils.o prophex_utils.o contig_node_translator.o
	$(CC) $(INCLUDES) $(CFLAGS) $(DFLAGS) $(AOBJS2) main.o prophex_query.o prophex_build.o klcp.o bitarray.o bwa_utils.o prophex_utils.o contig_node_translator.o -o $@ -Lbwa -lbwa $(LIBS)
$(PROG): bwa/libbwa.a $(AOBJS2) main.o prophex_query.o prophex_build.o klcp.o bitarray.o bwa_utils.o prophex_utils.o contig_translator.o
	$(CC) $(INCLUDES) $(CFLAGS) $(DFLAGS) $(AOBJS2) main.o prophex_query.o prophex_build.o klcp.o bitarray.o bwa_utils.o prophex_utils.o contig_translator.o -o $@ -Lbwa -lbwa $(LIBS)

#bwa/libbwa.a $(AOBJS2) bwtexk.o:
bwa/libbwa.a:
1 change: 1 addition & 0 deletions src/bitarray.c
@@ -1,4 +1,5 @@
#include "bitarray.h"

#include <stdio.h>
#include <stdlib.h>

3 changes: 2 additions & 1 deletion src/bwa_utils.c
@@ -3,8 +3,9 @@
#include <stdlib.h>
#include <string.h>
#include <time.h>

#include "bwa.h"
#include "contig_node_translator.h"
#include "contig_translator.h"
#include "khash.h"
#include "kstring.h"
#include "prophex_utils.h"
52 changes: 0 additions & 52 deletions src/contig_node_translator.c

This file was deleted.

17 changes: 0 additions & 17 deletions src/contig_node_translator.h

This file was deleted.

54 changes: 54 additions & 0 deletions src/contig_translator.c
@@ -0,0 +1,54 @@
#include "contig_translator.h"

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "utils.h"

#define MAX_KMERSETS_COUNT 100000
#define MAX_CONTIGS_COUNT 200000000

static int contig_to_kmerset[MAX_CONTIGS_COUNT];
static char* kmerset_names[MAX_KMERSETS_COUNT];
static int kmerset_name_lengths[MAX_KMERSETS_COUNT];
static int kmersets_count = 0;
static int contigs_count = 0;

int get_kmerset_from_contig(int contig) {
	if (contig < 0 || contig >= contigs_count) {
		fprintf(stderr, "[prophex:%s] contig %d is outside of range [%d, %d]\n", __func__, contig, 0, contigs_count - 1);
	}
	return contig_to_kmerset[contig];
}

char* get_kmerset_name(int kmerset) { return kmerset_names[kmerset]; }

int get_kmerset_name_length(int kmerset) { return kmerset_name_lengths[kmerset]; }

void add_contig(char* contig, int contig_number) {
	xassert(contigs_count < MAX_CONTIGS_COUNT,
	        "[prophex] there are more than MAX_CONTIGS_COUNT contigs, try to increase MAX_CONTIGS_COUNT in contig_translator.c\n");
	contigs_count++;
	const char* ch = strchr(contig, '@');
	int index = 0;
	if (ch == NULL) {
		index = strlen(contig);
	} else {
		index = ch - contig;
	}
	contig[index] = '\0';
	if (kmersets_count == 0 || strcmp(contig, kmerset_names[kmersets_count - 1])) {
		char* kmerset_name = malloc((index + 1) * sizeof(char));
		memcpy(kmerset_name, contig, index);
		kmerset_name[index] = '\0';
		xassert(kmersets_count < MAX_KMERSETS_COUNT,
		        "[prophex] there are more than MAX_KMERSETS_COUNT kmersets, try to increase MAX_KMERSETS_COUNT in contig_translator.c\n");
		kmerset_names[kmersets_count] = kmerset_name;
		kmerset_name_lengths[kmersets_count] = strlen(kmerset_name);
		contig_to_kmerset[contig_number] = kmersets_count;
		kmersets_count++;
	} else {
		contig_to_kmerset[contig_number] = kmersets_count - 1;
	}
}
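
Review note on `src/contig_translator.c`: `get_kmerset_from_contig` logs an out-of-range index but still reads the array, `add_contig` does not check the result of `malloc`, and it truncates the caller's string in place. The mapping rule itself (the k-mer set name is everything before the first `@`, and only consecutive contigs with the same name share an id) can be sketched in isolation as below. This is an illustrative sketch with hypothetical `sketch_*` names and a tiny table size, not the code in this PR. Unlike `add_contig`, it numbers contigs sequentially instead of taking a `contig_number` argument.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_CONTIGS 16 /* tiny illustrative limit, not the PR's value */

static int sketch_contig_to_set[SKETCH_MAX_CONTIGS];
static char* sketch_set_names[SKETCH_MAX_CONTIGS];
static int sketch_sets = 0;
static int sketch_contigs = 0;

/* Length of the k-mer set name: everything before the first '@'. */
static size_t sketch_prefix_len(const char* contig) {
    const char* at = strchr(contig, '@');
    return at == NULL ? strlen(contig) : (size_t)(at - contig);
}

/* Registers the next contig and returns its k-mer set id, or -1 on error.
 * The input string is left unmodified. */
int sketch_add_contig(const char* contig) {
    assert(contig != NULL);
    if (sketch_contigs >= SKETCH_MAX_CONTIGS) return -1;
    size_t len = sketch_prefix_len(contig);
    /* As in the diff, only the most recently added name is compared. */
    int same_as_last = sketch_sets > 0 &&
                       strlen(sketch_set_names[sketch_sets - 1]) == len &&
                       strncmp(contig, sketch_set_names[sketch_sets - 1], len) == 0;
    if (!same_as_last) {
        char* name = malloc(len + 1);
        if (name == NULL) return -1; /* allocation failure is reported */
        memcpy(name, contig, len);
        name[len] = '\0';
        sketch_set_names[sketch_sets++] = name;
    }
    sketch_contig_to_set[sketch_contigs++] = sketch_sets - 1;
    return sketch_sets - 1;
}

/* Bounds-checked lookup: returns -1 instead of reading out of range. */
int sketch_set_of_contig(int contig) {
    if (contig < 0 || contig >= sketch_contigs) return -1;
    return sketch_contig_to_set[contig];
}
```

Returning a sentinel on a bad index is one possible design. Aborting through an assertion helper would be equally reasonable if an out-of-range contig always indicates a corrupted index.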
17 changes: 17 additions & 0 deletions src/contig_translator.h
@@ -0,0 +1,17 @@
/*
Correspondence between contig_id in BWA and kmerset_name in taxonomic tree.
Author: Kamil Salikhov <salikhov.kamil@gmail.com>
Licence: MIT
*/

#ifndef CONTIG_TRANSLATOR_H
#define CONTIG_TRANSLATOR_H

#include <stdint.h>

int get_kmerset_from_contig(int contig);
char* get_kmerset_name(int kmerset);
int get_kmerset_name_length(int kmerset);
void add_contig(char* contig, int contig_number);

#endif // CONTIG_TRANSLATOR_H
2 changes: 2 additions & 0 deletions src/klcp.c
@@ -1,7 +1,9 @@
#include "klcp.h"

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#include "utils.h"

int32_t position_of_smallest_zero_bit[MAX_BITARRAY_BLOCK_VALUE + 1];
27 changes: 14 additions & 13 deletions src/main.c
@@ -16,6 +16,7 @@
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include "bwa.h"
#include "bwa_utils.h"
#include "prophex_build.h"
@@ -24,19 +25,19 @@

static int usage() {
	fprintf(stderr, "\n");
	fprintf(stderr, "Program: prophex (a lossless k-mer index)\n");
	fprintf(stderr, "Program: prophex (an exact k-mer index)\n");
	fprintf(stderr, "Version: %s\n", VERSION);
	fprintf(stderr, "Authors: Kamil Salikhov, Karel Brinda, Simone Pignotti, Gregory Kucherov\n");
	fprintf(stderr, "Contact: kamil.salikhov@univ-mlv.fr\n");
	fprintf(stderr, "Contact: kamil.salikhov@univ-mlv.fr, karel.brinda@gmail.com\n");
	fprintf(stderr, "\n");
	fprintf(stderr, "Usage: prophex <command> [options]\n");
	fprintf(stderr, "\n");
	fprintf(stderr, "Command: index construct a BWA index and k-LCP\n");
	fprintf(stderr, " query query reads against index\n");
	fprintf(stderr, "Command: index index sequences in the FASTA format\n");
	fprintf(stderr, " query query k-mers\n");
	fprintf(stderr, "\n");
	fprintf(stderr, " klcp construct an additional k-LCP\n");
	fprintf(stderr, " bwtdowngrade downgrade .bwt to the old, more compact format without Occ\n");
	fprintf(stderr, " bwt2fa reconstruct FASTA from BWT\n");
	fprintf(stderr, " klcp construct an additional k-LCP array\n");
	fprintf(stderr, " bwtdowngrade remove OCC from .bwt\n");
	fprintf(stderr, " bwt2fa reconstruct .fa from .fa.bwt\n");
	fprintf(stderr, "\n");
	return 1;
}
@@ -45,8 +46,8 @@ static int usage_klcp() {
	fprintf(stderr, "\n");
	fprintf(stderr, "Usage: prophex klcp [options] <idxbase>\n");
	fprintf(stderr, "\n");
	fprintf(stderr, "Options: -k INT length of k-mer\n");
	fprintf(stderr, " -s construct k-LCP and SA in parallel\n");
	fprintf(stderr, "Options: -k INT k-mer length\n");
	fprintf(stderr, " -s construct also SA, in parallel to k-LCP\n");
	fprintf(stderr, " -i sampling distance for SA\n");
	fprintf(stderr, " -h print help message\n");
	fprintf(stderr, "\n");
@@ -84,11 +85,11 @@ static int usage_query(int threads) {
	fprintf(stderr, "\n");
	fprintf(stderr, "Usage: prophex query [options] <idxbase> <in.fq>\n");
	fprintf(stderr, "\n");
	fprintf(stderr, "Options: -k INT length of k-mer\n");
	fprintf(stderr, "Options: -k INT k-mer length\n");
	fprintf(stderr, " -u use k-LCP for querying\n");
	fprintf(stderr, " -v output set of chromosomes for every k-mer\n");
	fprintf(stderr, " -p do not check whether k-mer is on border of two contigs, and show such k-mers in output\n");
	fprintf(stderr, " -b print sequences and base qualities\n");
	//fprintf(stderr, " -v output matching k-mer sets for every k-mer\n");
	//fprintf(stderr, " -p do not check whether k-mer is on border of two contigs, and show such k-mers in output\n");
	fprintf(stderr, " -b append sequences and base qualities to the output\n");
	fprintf(stderr, " -l STR log file name to output statistics\n");
	fprintf(stderr, " -t INT number of threads [%d]\n", threads);
	fprintf(stderr, " -h print help message\n");
2 changes: 2 additions & 0 deletions src/prophex_build.c
@@ -1,7 +1,9 @@
#include "prophex_build.h"

#include <pthread.h>
#include <stdio.h>
#include <string.h>

#include "bwa_utils.h"
#include "bwt.h"
#include "klcp.h"