Papers for stringologist (2011)
Contents
- A Closer Look at the Closest String and Closest Substring Problem.
- A Simple Parallel Cartesian Tree Algorithm and its Application to Suffix Tree Construction.
- Exact Pattern Matching with Feed-Forward Bloom Filters.
- Fast and Lightweight LCP-Array Construction Algorithms.
- Approximate string-matching with a single gap for sequence alignment.
- DynMap: mapping short reads to multiple related genomes.
- A Method to Ensure the Confidentiality of the Compressed Data.
- An Online Algorithm for Lightweight Grammar-Based Compression.
- Asymptotic Optimal Lossless Compression via the CSE Technique.
- Backwards Search in Context Bound Text Transformations.
- Cache Friendly Burrows-Wheeler Inversion.
- Combining Non-stationary Prediction, Optimization and Mixing for Data Compression.
- Generalized Witness Sets.
- Lempel-Ziv Data Compression on Parallel and Distributed Systems.
- Lossless Compression of Hyperspectral Imagery.
- Natural Language Compression per Blocks.
- Pattern Matching on Sparse Suffix Trees.
- Quick Estimation of Data Compression and De-duplication for Large Storage Systems.
- Straight-Line Programs: A Practical Test.
- Wavelet Trees: From Theory to Practice.
- Chrobak Normal Form Revisited, with Applications.
- Tree Template Matching in Ranked Ordered Trees by Pushdown Automata.
- Indexes for highly repetitive document collections.
- On the Right-Seed Array of a String.
- A Coarse-to-Fine Approach to Computing the k-Best Viterbi Paths.
- A Combinatorial Model of Phyllotaxis Perturbations in Arabidopsis thaliana.
- A d-Step Approach for Distinct Squares in Strings.
- Algorithms on Grammar-Compressed Strings.
- Approximation Algorithms for Orienting Mixed Graphs.
- Automatic Discovery of Patterns in Media Content.
- Computational Regulatory Genomics.
- Counting Colours in Compressed Strings.
- Edit Distance with Duplications and Contractions Revisited.
- Efficient Matching of Biological Sequences Allowing for Non-overlapping Inversions.
- Efficient Seeds Computation Revisited.
- Fast Error-Tolerant Quartet Phylogeny Algorithms.
- Faster Subsequence and Don’t-Care Pattern Matching on Compressed Texts.
- Filling Scaffolds with Gene Repetitions: Maximizing the Number of Adjacencies.
- Finding Approximate and Constrained Motifs in Graphs.
- Forest Alignment with Affine Gaps and Anchors.
- Frequent Submap Discovery.
- Improved MAX SNP-Hard Results for Finding an Edit Distance between Unordered Trees.
- LRM-Trees: Compressed Indices, Adaptive Sorting, and Compressed Permutations.
- Lempel-Ziv Factorization Revisited.
- Lightweight BWT Construction for Very Large String Collections.
- On Wavelet Tree Construction.
- On the Weak Prefix-Search Problem.
- Palindrome Pattern Matching.
- Phylogenetic Footprinting and Consistent Sets of Local Aligments.
- Polynomial-Time Approximation Algorithms for Weighted LCS Problem.
- Quick Greedy Computation for Minimum Common String Partitions.
- Real-Time Streaming String-Matching.
- Restricted Common Superstring and Restricted Common Supersequence.
- Self-indexing Based on LZ77.
- Simple Real-Time Constant-Space String Matching.
- Space Lower Bounds for Online Pattern Matching.
- Sparse and Truncated Suffix Trees on Variable-Length Codes.
- String Comparison and Lyndon-Like Factorization Using V-Order in Linear Time.
- Substring Range Reporting.
- Succincter Text Indexing with Wildcards.
- Tractability Results for the Consecutive-Ones Property with Multiplicity.
- Tractability and Approximability of Maximal Strip Recovery.
- Unique Perfect Phylogeny Is NP-Hard.
- Coding of Sets of Words.
- Color Image Compression Using a Learned Dictionary of Pairs of Orthonormal Bases.
- Compressed Context Modeling for Text Compression.
- Compressed Index for Property Matching.
- Compressed Property Suffix Trees.
- Deplump for Streaming Data.
- Error Recovery Method for PPM Compressed Data.
- Improving PPM Algorithm Using Dictionaries.
- Lossless Data Compression Testbed: ExCom and Prague Corpus.
- Mixing Deduplication and Compression on Active Data Sets.
- On Performance of Compressed Pattern Matching on VF Codes.
- Search and Modification in Compressed Texts.
- Sequence Similarity by Gapped LZW.
- Sliding Window Update Using Suffix Arrays.
- The String-to-Dictionary Matching Problem.
- Tree Structure Compression with RePair.
- Specific “scientific” data structures, and their processing
- Abelian Primitive Words.
- Avoiding Abelian Powers in Partial Words.
- On Highly Repetitive and Power Free Words.
- On Prefix Normal Words.
- Scalable Detection of Frequent Substrings by Grammar-Based Compression.
- Alphabet-Independent Compressed Text Indexing.
- Distribution-Aware Compressed Full-Text Indexes.
- Pattern Matching in Lempel-Ziv Compressed Strings: Fast, Simple, and Deterministic.
- Optimal Packed String Matching.
- Range Majority in Constant Time and Linear Space.
- A Dynamic Stabbing-Max Data Structure with Sub-Logarithmic Query Time.
- A New Algorithm for the Characteristic String Problem under Loose Similarity Criteria.
- Compact Representation of Posets.
- Dynamic Range Majority Data Structures.
- Dynamic Range Selection in Linear Space.
- Encoding 2D Range Maximum Queries.
- External Memory Orthogonal Range Reporting with Fast Updates.
- Faster Approximate Pattern Matching in Compressed Repetitive Texts.
- Path Queries in Weighted Trees.
- Range LCP.
- Space-Efficient Data-Analysis Queries on Grids.
- Succinct Indexes for Circular Patterns.
- A Unifying Property for Distribution-Sensitive Priority Queues.
- Parameterized Longest Previous Factor.
- Periods in Partial Words: An Algorithm.
- Two Constant-Factor-Optimal Realizations of Adaptive Heapsort.
- p-Suffix Sorting as Arithmetic Coding.
- Improved Alignment Based Algorithm for Multilingual Text Compression.
- Unary Pattern Avoidance in Partial Words Dense with Holes.
- Compressed Word Problems for Inverse Monoids.
- On Minimising Automata with Errors.
- Periodicity Algorithms for Partial Words.
- The Bounded Search Tree Algorithm for the Closest String Problem Has Quadratic Smoothed Complexity.
- An Empirical Evaluation of Extendible Arrays.
- Compressed String Dictionaries.
- Online Dictionary Matching with Variable-Length Gaps.
- Practical Compressed Document Retrieval.
- Sample selection for dictionary-based corpus compression.
- Succinct nearest neighbor search.
- Improved Space Bounds for Cache-Oblivious Range Reporting.
- Optimal pattern matching in LZW compressed strings.
- Ordered and Unordered Top-K Range Reporting in Large Data Sets.
- Persistent Predecessor Search and Orthogonal point Location on the Word RAM.
- Random Access to grammar-Compressed Strings.
- Top-K Color Queries for Document Retrieval.
- An Improved B+ Tree for Flash File Systems.
- In-Place Sorting.
- A Knowledge-Based Semantic Kernel for Text Classification.
- A Learned Approach for Ranking News in Real-Time Using the Blogosphere.
- A Multi-faceted Approach to Query Intent Classification.
- A New Approach for Verifying URL Uniqueness in Web Crawlers.
- A Succinct Index for Hypertext.
- Approximate Point Set Pattern Matching with L p -Norm.
- Approximate Regular Expression Matching with Multi-strings.
- Approximations and Partial Solutions for the Consensus Sequence Problem.
- Attribute Retrieval from Relational Web Tables.
- COCA Filters: Co-occurrence Aware Bloom Filters.
- Candidate Document Retrieval for Web-Scale Text Reuse Detection.
- Compressed Indexes for Aligned Pattern Matching.
- Compressed Text Indexing with Wildcards.
- Computing All Subtree Repeats in Ordered Ranked Trees.
- Computing the Longest Common Prefix Array Based on the Burrows-Wheeler Transform.
- Constructing Strings at the Nano Scale via Staged Self-assembly.
- Cross-Lingual Text Fragment Alignment Using Divergence from Randomness.
- Detecting Health Events on the Social Web to Enable Epidemic Intelligence.
- Discounted Cumulative Gain and User Decision Models.
- ESP-Index: A Compressed Index Based on Edit-Sensitive Parsing.
- Enhancing Document Snippets Using Temporal Information.
- External Query Reformulation for Text-Based Image Retrieval.
- Fast Computation of a String Duplication History under No-Breakpoint-Reuse - (Extended Abstract).
- Fast q-gram Mining on SLP Compressed Strings.
- Finding Frequent Elements in Compressed 2D Arrays and Strings.
- Fixed Block Compression Boosting in FM-Indexes.
- Improved Compressed Indexes for Full-Text Document Retrieval.
- Indexing with Gaps.
- Navigating the User Query Space.
- Near Real-Time Suffix Tree Construction via the Fringe Marked Ancestor Problem.
- On Suffix Extensions in Suffix Trees.
- On-Line Construction of Position Heaps.
- Persistency in Suffix Trees with Applications to String Interval Problems.
- Query-Sets + + : A Scalable Approach for Modeling Web Sites.
- Reference Sequence Construction for Relative Compression of Genomes.
- Space Efficient Wavelet Tree Construction.
- Spaced Seeds Design Using Perfect Rulers.
- Sparse Spatial Selection for Novelty-Based Search Result Diversification.
- Succinct Gapped Suffix Arrays.
- Weighted Shortest Common Supersequence.
- When Was It Written? Automatically Determining Publication Dates.
- On Minimal Sturmian Partial Words.
- The power of simple tabulation hashing.
- 2001-2010: Ten Years of Exact String Matching Algorithms.
- A Parameterized Formulation for the Maximum Number of Runs Problem.
- Algorithmics of Posets Generated by Words over Partially Commutative Alphabets.
- An Improved Version of the Runs Algorithm Based on Crochemore’s Partitioning Algorithm.
- Analyzing Edit Distance on Trees: Tree Swap Distance is Intractable.
- Binary Image Compression via Monochromatic Pattern Substitution: A Sequential Speed-Up.
- Computing Abelian Periods in Words.
- Computing Longest Common Substring/Subsequence of Non-linear Texts.
- Computing the Number of Cubic Runs in Standard Sturmian Words.
- Efficient Eager XPath Filtering over XML Streams.
- Finding Long and Multiple Repeats with Edit Distance.
- Improving Deduplication Techniques by Accelerating Remainder Calculations.
- Improving Exact Search of Multiple Patterns From a Compressed Suffix Array.
- Inexact Graph Matching by “Geodesic Hashing” for the Alignment of Pseudoknoted RNA Secondary Structures.
- Inferring Strings from Suffix Trees and Links on a Binary Alphabet.
- Minimization of Acyclic DFAs.
- Notes on Sequence Binary Decision Diagrams: Relationship to Acyclic Automata and Complexities of Binary Set Operations.
- Observations On Compressed Pattern-Matching with Ranked Variables in Zimin Words.
- On Compile Time Knuth-Morris-Pratt Precomputation.
- Variations of Forward-SBNDM.
- Space Efficient Data Structures for Dynamic Orthogonal Range Counting.
- De Bruijn Sequences for the Binary Strings with Maximum Density.
- Efficient Top-k Queries for Orthogonal Ranges.
ACM J. Exp. Algorithmics 2011
- Theory and practice of monotone minimal perfect hashing.
ACM Trans. Algorithms 2011
- Fully compressed suffix trees.
- Succinct indexes for strings, binary relations and multilabeled trees.
- The tree inclusion problem: In linear space and faster.
Algorithmica 2011
- On Optimally Partitioning a Text to Improve Its Compression.
Algorithms 2011
- Compressed Matching in Dictionaries.
- Edit Distance with Block Deletions.
- Compression of DNA sequence reads in FASTQ format.
CoRR 2011
- A Compressed Self-Index for Genomic Databases
- A Dynamic Stabbing-Max Data Structure with Sub-Logarithmic Query Time
- A Faster LZ77-Based Index
- A Regularity Measure for Context Free Grammars
- A Searchable Compressed Edit-Sensitive Parsing
- A Simple Linear-Space Data Structure for Constant-Time Range Minimum Query
- Algorithms for Jumbled Pattern Matching in Strings
- An Improved Move-To-Front(IMTF) Off-line Algorithm for the List Accessing Problem
- An implementation of range trees with fractional cascading in C++
- Anomaly Sequences Detection from Logs Based on Compression
- Approximating Edit Distance in Near-Linear Time
- Compressed String Dictionaries
- Computing a Longest Common Palindromic Subsequence
- Computing on Binary Strings
- Computing q-gram Frequencies on Collage Systems
- Computing q-gram Non-overlapping Frequencies on SLP Compressed Texts
- De-amortizing Binary Search Trees
- Don’t Rush into a Union: Take Time to Find Your Roots
- Dynamic Range Majority Data Structures
- Dynamic Range Selection in Linear Space
- Edit Distance to Monotonicity in Sliding Windows
- Efficient Seeds Computation Revisited
- Encoding 2-D Range Maximum Queries
- External Memory Orthogonal Range Reporting with Fast Updates
- Fast $q$-gram Mining on SLP Compressed Strings
- Faster Approximate Pattern Matching in Compressed Repetitive Texts
- Faster fully compressed pattern matching by recompression
- Fixed Block Compression Boosting in FM-Indexes
- Genome Halving by Block Interchange
- I/O-Efficient Data Structures for Colored Range and Prefix Reporting
- Improved Grammar-Based Compressed Indexes
- Improved space-time tradeoffs for approximate full-text indexing with one edit error
- Inducing the LCP-Array
- K-sort: A new sorting algorithm that beats Heap sort for n <= 70 lakhs!
- Linear Time Inference of Strings from Cover Arrays using a Binary Alphabet
- Linear pattern matching on sparse suffix trees
- Linear-Space Data Structures for Range Mode Query in Arrays
- Lossless data compression on GPGPU architectures
- Mining Patterns in Networks using Homomorphism
- New Lower and Upper Bounds for Representing Sequences
- On Approximability of Block Sorting
- On Compressing Permutations and Adaptive Sorting
- On Dynamic Optimality for Binary Search Trees
- On minimising automata with errors
- On the Complexity of Approximate Sum of Sorted List
- On-line construction of position heaps
- Optimal Indexes for Sparse Bit Vectors
- Orthogonal Range Searching on the RAM, Revisited
- Partial Data Compression and Text Indexing via Optimal Suffix Multi-Selection
- Pattern Matching under Polynomial Transformation
- Pattern matching in Lempel-Ziv compressed strings: fast, simple, and deterministic
- Practical Top-K Document Retrieval in Reduced Space
- Privacy-Enhanced Methods for Comparing Compressed DNA Sequences
- Quadratic-time Algorithm for the String Constrained LCS Problem
- Random input helps searching predecessors
- Reference Sequence Construction for Relative Compression of Genomes
- Relative Lempel-Ziv Factorization for Efficient Storage and Retrieval of Web Collections
- Restructuring Compressed Texts without Explicit Decompression
- Self-Index Based on LZ77
- Self-Index based on LZ77 (thesis)
- Sorting Algorithms with Restrictions
- Space Lower Bounds for Online Pattern Matching
- Space-Efficient Data-Analysis Queries on Grids
- SparseAssembler: de novo Assembly with the Sparse de Bruijn Graph
- Stratified B-trees and versioning dictionaries
- String Indexing for Patterns with Wildcards
- String Matching with Variable Length Gaps
- Substring Range Reporting
- Succinct Representations of Permutations and Functions
- Succincter Text Indexing with Wildcards
- The Cell Probe Complexity of Dynamic Range Counting
- Tight lower bounds for online labeling problem
- Towards Optimal Sorting of 16 Elements
- Towards an Optimal Space-and-Query-Time Index for Top-$k$ Document Retrieval
- Tying up the loose ends in fully LZW-compressed pattern matching
- Uncommon Suffix Tries
- Upper Bounds for Maximally Greedy Binary Search Trees
- Self-Indexed Grammar-Based Compression.
Inf. Comput. 2011
- Space-efficient construction of Lempel-Ziv compressed text indexes.
Inf. Process. Lett. 2011
- Computing Longest Previous non-overlapping Factors.
Inf. Process. Manag. 2011
- Improving semistatic compression via phrase-based modeling.
Inf. Syst. 2011
- Fully dynamic metric access methods based on hyperplane partitioning.
Int. J. Found. Comput. Sci. 2011
- Stronger Quickheaps.
J. Discrete Algorithms 2011
- Fast searching in packed strings.
- Missing pattern discovery.
- Tight bounds for online stable sorting.
Probl. Inf. Transm. 2011
- Computing the longest common substring with one mismatch.
Proc. VLDB Endow. 2011
- Relative Lempel-Ziv Factorization for Efficient Storage and Retrieval of Web Collections.
Theor. Comput. Sci. 2011
- Approximate string matching with stuck address bits.
- On-line approximate string matching with bounded errors.
- Succinct data structures for Searchable Partial Sums with optimal worst-case performance.
- Succinct representation of dynamic trees.
- Verifying and enumerating parameterized border arrays.