GPU-parallel implementation of the DC3-algorithm for computing suffix-arrays. Implemented in Futhark as part of the course Data Parallel Programming.
The suffix array of a string is a lexicographically sorted list of all suffixes.
As an example, the suffix array of fastfuthark is
ark
astfuthark
fastfuthark
futhark
hark
k
rk
stfuthark
tfuthark
thark
uthark
Suffix arrays have wide uses: For example pattern matching in DNA sequence analysis, and n-gram processing for language models.
| Input | Size of input (~Characters) | PBBS: parallelKs (s) |
PBBS: parallelRange (s) |
Futhark: DC3 (s) |
Futhark: gfg (s) |
|---|---|---|---|---|---|
| Trigram String | 100_000_000 | 2.121 | 0.645 | 1.289 | 14.333 |
| DNA-sequence | 34_000_000 | 0.893 | 0.311 | 0.588 | 4.709 |
| Project Gutenberg | 105_000_000 | 3.089 | 1.751 | 2.029 | 15.125 |
| Wikipedia XML | 100_000_000 | 2.733 | 1.804 | 1.822 | 14.276 |
The parallelKs in PBBS (C++) algorithm approximately corresponds to the DC3 algorithm in Futhark.