Mark Cieliebak1, Thomas Erlebach2,
Zsuzsanna Lipták1, Jens Stoye3, and Emo Welzl1
1 Institute for Theoretical Computer Science, ETH Zürich
2 Computer Engineering and Networks Laboratory (TIK), ETH Zürich
3 Max Planck Institute for Molecular Genetics, Berlin
We investigate a combinatorial problem that originates from computational
biology: Given a string
of length n over a weighted alphabet,
find a data structure and a query algorithm which, for a given weight M,
decides whether
has a (contiguous) substring of weight M. Here,
the weight of a string is the sum of the weights of its letters. There are
two simple algorithms that solve the problem: one runs in time linear in
n, the other has running time
,
but requires an additional
data structure of size O(n2). We present an algorithm that uses
preprocessing on
to construct a data structure of linear size
that allows to perform queries in sublinear time, namely time
.