StringologyTimes

Papers for stringologist (2021)

Contents

ALENEX 2021

  1. A “Learned” Approach to Quicken and Compress Rank/Select Dictionaries.
  2. PFP Compressed Suffix Trees.

CIAA 2021

  1. Approximate Hashing for Bioinformatics.

CIAC 2021

  1. The Parameterized Suffix Tray.

COCOA 2021

  1. Succinct Data Structures for Series-Parallel, Block-Cactus and 3-Leaf Power Graphs.

CPM 2021

  1. A Compact Index for Cartesian Tree Matching.
  2. A Fast and Small Subsampled R-Index.
  3. A Linear Time Algorithm for Constructing Hierarchical Overlap Graphs.
  4. AWLCO: All-Window Length Co-Occurrence.
  5. An Invertible Transform for Efficient String Matching in Labeled Digraphs.
  6. Compressed Weighted de Bruijn Graphs.
  7. Computing Covers of 2D-Strings.
  8. Computing Edit Distance (Invited Talk).
  9. Constructing Strings Avoiding Forbidden Substrings.
  10. Constructing the Bijective and the Extended Burrows-Wheeler Transform in Linear Time.
  11. Data Structures for Categorical Path Counting Queries.
  12. Disorders and Permutations.
  13. Efficient Algorithms for Counting Gapped Palindromes.
  14. Front Matter, Table of Contents, Preface, Conference Organization.
  15. Gapped Indexing for Consecutive Occurrences.
  16. Internal Shortest Absent Word Queries.
  17. On-Line Pattern Matching on D-Texts (Invited Talk).
  18. Optimal Completion and Comparison of Incomplete Phylogenetic Trees Under Robinson-Foulds Distance.
  19. Optimal Construction of Hierarchical Overlap Graphs.
  20. R-enum: Enumeration of Characteristic Substrings in BWT-runs Bounded Space.
  21. Ranking Bracelets in Polynomial Time.
  22. Repetitions in Strings: A “Constant” Problem (Invited Talk).
  23. String Sanitization Under Edit Distance: Improved and Generalized.
  24. The Longest Run Subsequence Problem: Further Complexity Results.
  25. The k-Mappability Problem Revisited.
  26. Weighted Ancestors in Suffix Trees Revisited.

DCC 2021

  1. A Disk-Based Index for Trajectories with an In-Memory Compressed Cache.
  2. A grammar compressor for collections of reads with applications to the construction of the BWT.
  3. Accelerating Knuth-Morris-Pratt String Matching over LZ77 Compressed Text.
  4. Approximate Hashing for Bioinformatics.
  5. Backward Weighted Coding.
  6. Compact Representation of Spatial Hierarchies and Topological Relationships.
  7. DZip: improved general-purpose loss less compression based on novel neural network modeling.
  8. Efficiently Merging r-indexes.
  9. End-to-End optimized image compression for machines, a study.
  10. Improved LZ77 Compression.
  11. Improving Run Length Encoding by Preprocessing.
  12. Neural Networks Optimally Compress the Sawbridge.
  13. On Elias-Fano for Rank Queries in FM-Indexes.
  14. On Random Editing in LZ-End.
  15. PHONI: Streamed Matching Statistics with Multi-Genome References.
  16. Parallel Processing of Grammar Compression.
  17. Smaller RLZ-Compressed Suffix Arrays.
  18. Succinct Data Structures for Small Clique-Width Graphs.
  19. Succinct representations of Intersection Graphs on a Circle.
  20. ndzip: A High-Throughput Parallel Lossless Compressor for Scientific Data.

DLT 2021

  1. Upper Bounds on Distinct Maximal (Sub-)Repetitions in Compressed Strings.
  2. Weighted Prefix Normal Words: Mind the Gap.

ESA 2021

  1. Bidirectional String Anchors: A New String Sampling Mechanism.
  2. Compression by Contracting Straight-Line Programs.
  3. Dynamic Colored Orthogonal Range Searching.
  4. Fast and Space-Efficient Construction of AVL Grammars from the LZ77 Parsing.
  5. Faster Algorithms for Longest Common Substring.
  6. Hypersuccinct Trees - New Universal Tree Source Codes for Optimal Compressed Tree Data Structures and Range Minima.
  7. Lyndon Words Accelerate Suffix Sorting.
  8. Minimum Common String Partition: Exact Algorithms.
  9. Space Efficient Two-Dimensional Orthogonal Colored Range Counting.

FOCS 2021

  1. Breaking the Cubic Barrier for (Unweighted) Tree Edit Distance.
  2. Small-space and streaming pattern matching with $k$ edits.

ICALP 2021

  1. A Linear-Time n0.4-Approximation for Longest Common Subsequence.
  2. An Almost Optimal Edit Distance Oracle.
  3. Analysis of Smooth Heaps and Slim Heaps.
  4. Faster Algorithms for Bounded Tree Edit Distance.
  5. Fine-Grained Hardness for Edit Distance to a Fixed Sequence.
  6. Improved Approximation for Longest Common Subsequence over Small Alphabets.
  7. LF Successor: Compact Space Indexing for Order-Isomorphic Pattern Matching.
  8. Linear Time Runs Over General Ordered Alphabets.
  9. New Sublinear Algorithms and Lower Bounds for LIS Estimation.
  10. Optimal-Time Queries on BWT-Runs Compressed Indexes.
  11. Sorting Short Integers.
  12. Streaming and Small Space Approximation Algorithms for Edit Distance and Longest Common Subsequence.

ISAAC 2021

  1. Algorithms and Complexity on Indexing Elastic Founder Graphs.
  2. Inverse Suffix Array Queries for 2-Dimensional Pattern Matching in Near-Compact Space.
  3. Pattern Masking for Dictionary Matching.
  4. Repetition- and Linearity-Aware Rank/Select Dictionaries.
  5. Resilient Level Ancestor, Bottleneck, and Lowest Common Ancestor Queries in Dynamic Trees.
  6. Streaming Pattern Matching (Invited Talk).

IWOCA 2021

  1. The Tandem Duplication Distance Problem Is Hard over Bounded Alphabets.

LATA 2021

  1. Cadences in Grammar-Compressed Strings.
  2. Succinct Representations for (Non)Deterministic Finite Automata.

MFCS 2021

  1. Matching Patterns with Variables Under Hamming Distance.

RP 2021

  1. Absent Subsequences in Words.

SEA 2021

  1. Document Retrieval Hacks.
  2. Engineering Predecessor Data Structures for Dynamic Integer Sets.

SODA 2021

  1. A Lower Bound for Dynamic Fractional Cascading.
  2. Beating the probabilistic lower bound on perfect hashing.
  3. Competitive Data-Structure Dynamization.
  4. New Data Structures for Orthogonal Range Reporting and Range Minima Queries.
  5. On Indexing and Compressing Finite Automata.
  6. On Locating Paths in Compressed Tries.
  7. Optimal Oblivious Priority Queues.

SOFSEM 2021

  1. A Normal Sequence Compressed by PPM* But Not by Lempel-Ziv 78.
  2. Blocksequences of k-local Words.
  3. Novel Results on the Number of Runs of the Burrows-Wheeler-Transform.

SOSA 2021

  1. Modular Subset Sum, Dynamic Strings, and Zero-Sum Sets.
  2. Soft Sequence Heaps.

SPIRE 2021

  1. A Separation of γ and b via Thue-Morse Words.
  2. All Instantiations of the Greedy Algorithm for the Shortest Common Superstring Problem are Equivalent.
  3. An LMS-Based Grammar Self-index with Local Consistency Properties.
  4. Computing the Original eBWT Faster, Simpler, and with Less Memory.
  5. Exploiting Pseudo-locality of Interchange Distance.
  6. Extracting the Sparse Longest Common Prefix Array from the Suffix Binary Search Tree.
  7. Grammar Index by Induced Suffix Sorting.
  8. Improved Topic Modeling in Twitter Through Community Pooling.
  9. Longest Common Rollercoasters.
  10. Lower Bounds for the Number of Repetitions in 2D Strings.
  11. Minimal Unique Palindromic Substrings After Single-Character Substitution.
  12. On Stricter Reachable Repetitiveness Measures.
  13. On the Approximation Ratio of LZ-End to LZ77.
  14. Permutation-Constrained Common String Partitions with Applications.
  15. Position Heaps for Cartesian-Tree Matching on Strings and Tries.
  16. String Covers of a Tree.
  17. TSXor: A Simple Time Series Compression Algorithm.
  18. Unicode at Gigabytes per Second.
  19. findere: Fast and Precise Approximate Membership Query.
  20. r-Indexing the eBWT.

STACS 2021

  1. Efficiently Testing Simon’s Congruence.
  2. Finding an Optimal Alphabet Ordering for Lyndon Factorization Is Hard.
  3. The Edit Distance to k-Subsequence Universality.

STOC 2021

  1. Fully dynamic approximation of LIS in polylogarithmic time.
  2. Improved dynamic algorithms for longest increasing subsequence.
  3. Subcubic algorithms for Gomory-Hu tree in unweighted graphs.

Stringology 2021

  1. Automata Approach to Inexact Tree Pattern Matching Using 1-degree Edit Distance.
  2. Computational Substantiation of the d-step Conjecture for Distinct Squares Revisited.
  3. Counting Lyndon Subsequences.
  4. Pitfalls of Algorithm Comparison.
  5. Refined Upper Bounds on the Size of the Condensed Neighbourhood of Sequences.
  6. Searching with Extended Guard and Pivot Loop.
  7. The n-ary Initial Literal and Literal Shuffle.
  8. Towards an Efficient Text Sampling Approach for Exact and Approximate Matching.

WABI 2021

  1. Compressing and Indexing Aligned Readsets.
  2. Space-Efficient Representation of Genomic k-Mer Count Tables.

ACM Comput. Surv. 2021

  1. Predecessor Search.

ACM J. Exp. Algorithmics 2021

  1. Engineering Practical Lempel-Ziv Tries.

ACM Trans. Algorithms 2021

  1. Optimal Substring Equality Queries with Applications to Sparse Text Indexing.
  2. Optimal-Time Dictionary-Compressed Indexes.

ACM Trans. Knowl. Discov. Data 2021

  1. Combinatorial Algorithms for String Sanitization.

Algorithmica 2021

  1. Internal Dictionary Matching.
  2. Range Majorities and Minorities in Arrays.
  3. Top Tree Compression of Tries.

Algorithms 2021

  1. Compressed Communication Complexity of Hamming Distance.
  2. Lempel-Ziv Parsing for Sequences of Blocks.
  3. Non-Overlapping LZ77 Factorization and LZ78 Substring Compression Queries with Suffix Trees.
  4. Re-Pair in Small Space.
  5. Reversed Lempel-Ziv Factorization with Suffix Trees.
  6. Subpath Queries on Compressed Graphs: A Survey.

Bioinform. 2021

  1. Accurate spliced alignment of long RNA sequencing reads.

CoRR 2021

  1. $r$-indexing Wheeler graphs.
  2. A Bloom Filter Survey: Variants for Different Domain Applications.
  3. A Conditional Lower Bound for Episode Matching.
  4. A Fast and Small Subsampled R-index.
  5. A Linear Time Algorithm for Constructing Hierarchical Overlap Graphs.
  6. A New Lossless Data Compression Algorithm Exploiting Positional Redundancy.
  7. A Separation of γ and b via Thue-Morse Words.
  8. A Simple Algorithm for Optimal Search Trees with Two-Way Comparisons.
  9. A Tight Analysis of Slim Heaps and Smooth Heaps.
  10. A new compressed cover tree guarantees a near linear parameterized complexity for all $k$-nearest neighbors search in metric spaces.
  11. A new distance based on minimal absent words and applications to biological sequences.
  12. APMF < APSP? Gomory-Hu Tree for Unweighted Graphs in Almost-Quadratic Time.
  13. Abelian Repetition Threshold Revisited.
  14. Absent Subsequences in Words.
  15. Algorithms and Complexity on Indexing Founder Graphs.
  16. Algorithms and Hardness for Multidimensional Range Updates and Queries.
  17. All instantiations of the greedy algorithm for the shortest superstring problem are equivalent.
  18. An Almost Optimal Edit Distance Oracle.
  19. An Improved Algorithm for The k-Dyck Edit Distance Problem.
  20. An O(k log n) algorithm for prefix based ranked autocomplete.
  21. An efficient way to manage blocks of data with Wise Red-Black Trees.
  22. Analysis of Smooth Heaps and Slim Heaps.
  23. Approximate Membership Query Filters with a False Positive Free Set.
  24. Approximating LCS and Alignment Distance over Multiple Sequences.
  25. Approximating Length-Restricted Means under Dynamic Time Warping.
  26. Approximating the Longest Common Subsequence problem within a sub-polynomial factor in linear time.
  27. Approximation Algorithms for LCS and LIS with Truly Improved Running Times.
  28. Arbitrary-length analogs to de Bruijn sequences.
  29. Beyond Single-Deletion Correcting Codes: Substitutions and Transpositions.
  30. Beyond the Longest Letter-duplicated Subsequence Problem.
  31. Binary Dynamic Time Warping in Linear Time.
  32. Boosting the Search Performance of B+-tree for Non-volatile Memory with Sentinels.
  33. Breaking the Cubic Barrier for (Unweighted) Tree Edit Distance.
  34. Breaking the O(n)-Barrier in the Construction of Compressed Suffix Arrays.
  35. Burrows Wheeler Transform on a Large Scale: Algorithms Implemented in Apache Spark.
  36. Cantor Mapping Technique.
  37. Checking whether a word is Hamming-isometric in linear time.
  38. Chunk List: Concurrent Data Structures.
  39. Closed Ziv-Lempel factorization of the m-bonacci words.
  40. Combinatorics of minimal absent words for a sliding window.
  41. Compact Euler Tours of Trees with Small Maximum Degree.
  42. Complex Event Forecasting with Prediction Suffix Trees: Extended Technical Report.
  43. Compressed Communication Complexity of Hamming Distance.
  44. Compression by Contracting Straight-Line Programs.
  45. Computing Matching Statistics on Repetitive Texts.
  46. Computing the original eBWT faster, simpler, and with less memory.
  47. Conditional Lower Bounds for Variants of Dynamic LIS.
  48. Construction of Sparse Suffix Trees and LCE Indexes in Optimal Time and Space.
  49. Counting Lyndon Subsequences.
  50. Counting and Verifying Abelian Border Arrays of Binary Words.
  51. Critical factorisation in square-free words.
  52. Defeating duplicates: A re-design of the LearnedSort algorithm.
  53. Deriving monadic quicksort (Declarative Pearl).
  54. Does Preprocessing help in Fast Sequence Comparisons?
  55. Dynamic Longest Increasing Subsequence and the Erdös-Szekeres Partitioning Problem.
  56. Dynamic Range Mode Enumeration.
  57. Dynamic Suffix Array with Sub-linear update time and Poly-logarithmic Lookup Time.
  58. Efficient construction of the extended BWT from grammar-compressed DNA sequencing reads.
  59. Empirically Improved Tokuda Gap Sequence in Shellsort.
  60. Engineering Predecessor Data Structures for Dynamic Integer Sets.
  61. Entropy of Mersenne-Twisters.
  62. Estimating the Longest Increasing Subsequence in Nearly Optimal Time.
  63. ExtendedHyperLogLog: Analysis of a new Cardinality Estimator.
  64. FM-Indexing Grammars Induced by Suffix Sorting for Long Patterns.
  65. Fast Succinct Retrieval and Approximate Membership using Ribbon.
  66. Fast and Space-Efficient Construction of AVL Grammars from the LZ77 Parsing.
  67. Fast direct access to variable length codes.
  68. Faster Algorithms for Bounded Tree Edit Distance.
  69. Faster Algorithms for Longest Common Substring.
  70. Faster Exponential Algorithm for Permutation Pattern Matching.
  71. Friendly Cut Sparsifiers and Faster Gomory-Hu Trees.
  72. From Bit-Parallelism to Quantum: Breaking the Quadratic Barrier.
  73. Gap Edit Distance via Non-Adaptive Queries: Simple and Optimal.
  74. Gapped Indexing for Consecutive Occurrences.
  75. Grammar Index By Induced Suffix Sorting.
  76. Graphs can be succinctly indexed for pattern matching in $ O(│E│^2 + │V│^{5 / 2}) $ time.
  77. HINT: A Hierarchical Index for Intervals in Main Memory.
  78. HOLZ: High-Order Entropy Encoding of Lempel-Ziv Factor Distances.
  79. Hardness of Detecting Abelian and Additive Square Factors in Strings.
  80. Hierarchical Bitmap Indexing for Range and Membership Queries on Multidimensional Arrays.
  81. How Compression and Approximation Affect Efficiency in String Distance Measures.
  82. Hypersuccinct Trees - New universal tree source codes for optimal compressed tree data structures.
  83. Improved Approximation Algorithms for Dyck Edit Distance and RNA Folding.
  84. Improved Approximation Guarantees for Shortest Superstrings using Cycle Classification by Overlap to Length Ratios.
  85. Improved Approximation for Longest Common Subsequence over Small Alphabets.
  86. Improving Run Length Encoding by Preprocessing.
  87. Internal Shortest Absent Word Queries in Constant Time and Linear Space.
  88. Is this the simplest (and most surprising) sorting algorithm ever?
  89. Learned Sorted Table Search and Static Indexes in Small Space: Methodological and Practical Insights via an Experimental Study.
  90. Linear Approximate Pattern Matching Algorithm.
  91. Linear Time Runs over General Ordered Alphabets.
  92. Linear-time Minimization of Wheeler DFAs.
  93. Load-Balancing Succinct B Trees.
  94. Lower Bounds and Improved Algorithms for Asymmetric Streaming Edit Distance and Longest Common Subsequence.
  95. Lower Bounds for the Number of Repetitions in 2D Strings.
  96. Matching Patterns with Variables under Hamming Distance.
  97. Memory-Optimality for Non-Blocking Containers.
  98. Minimal unique palindromic substrings after single-character substitution.
  99. N-ary Huffman Encoding Using High-Degree Trees - A Performance Comparison.
  100. Near-Optimal Quantum Algorithms for String Problems.
  101. Nearly Tight Lower Bounds for Succinct Range Minimum Query.
  102. Number Parsing at a Gigabyte per Second.
  103. On (co-lex) Ordering Automata.
  104. On Arithmetically Progressed Suffix Arrays and related Burrows-Wheeler Transforms.
  105. On Solving the Minimum Common String Partition Problem by Decision Diagrams.
  106. On Specialization of a Program Model of Naive Pattern Matching in Strings (Extended Abstract).
  107. On Stricter Reachable Repetitiveness Measures.
  108. On the Cost of Unsuccessful Searches in Search Trees with Two-way Comparisons.
  109. On the Optimal Time/Space Tradeoff for Hash Tables.
  110. On the approximation ratio of LZ-End to LZ77.
  111. Optimal Gap Sequences in Shellsort for n≤6 Elements.
  112. Optimal Sorting Circuits for Short Keys.
  113. Optimal Space and Time for Streaming Pattern Matching.
  114. PTHash: Revisiting FCH Minimal Perfect Hashing.
  115. Parallel Batch-Dynamic kd-Trees.
  116. Parallel Batched Interpolation Search Tree.
  117. Parallel and External-Memory Construction of Minimal Perfect Hash Functions with PTHash.
  118. Pattern Matching on Grammar-Compressed Strings in Linear Time.
  119. Pattern-defeating Quicksort.
  120. Position Heaps for Cartesian-tree Matching on Strings and Tries.
  121. Practical evaluation of Lyndon factors via alphabet reordering.
  122. Prefixes of the Fibonacci word that end with a cube.
  123. RLBWT Tricks.
  124. Range Minimum Queries in Minimal Space.
  125. Reconstruction of Sets of Strings from Prefix/Suffix Compositions.
  126. Resilient Level Ancestor, Bottleneck, and Lowest Common Ancestor Queries in Dynamic Trees.
  127. Ruler Wrapping.
  128. SIMD-Optimized Search Over Sorted Data.
  129. Scalable Hash Table for NUMA Systems.
  130. Sensitivity of string compressors and repetitiveness measures.
  131. SetSketch: Filling the Gap between MinHash and HyperLogLog.
  132. Simple Worst-Case Optimal Adaptive Prefix-Free Coding.
  133. Small space and streaming pattern matching with k edits.
  134. Solving one variable word equations in the free group in cubic time.
  135. Sorted Range Reporting.
  136. Sorting Short Integers.
  137. Space Efficient Two-Dimensional Orthogonal Colored Range Counting.
  138. Space-Efficient Huffman Codes Revisited.
  139. Spanner Evaluation over SLP-Compressed Documents.
  140. Stochastic and Worst-Case Generalized Sorting Revisited.
  141. Strictly In-Place Algorithms for Permuting and Inverting Permutations.
  142. String Comparison on a Quantum Computer Using Hamming Distance.
  143. String Sampling with Bidirectional String Anchors.
  144. Succinct Data Structure for Path Graphs.
  145. Succinct Data Structures for Series-Parallel, Block-Cactus and 3-Leaf Power Graphs.
  146. Support Optimality and Adaptive Cuckoo Filters.
  147. Text Searching Allowing for Non-Overlapping Adjacent Unbalanced Translocations.
  148. The Dynamic k-Mismatch Problem.
  149. The k-mappability problem revisited.
  150. Tiny Pointers.
  151. Tree Edit Distance with Variables. Measuring the Similarity between Mathematical Formulas.
  152. Weighted Ancestors in Suffix Trees Revisited.
  153. Weighted Burrows-Wheeler Compression.
  154. Which Regular Languages can be Efficiently Indexed?

Comput. J. 2021

  1. Smaller Compressed Suffix Arrays†.

IEEE Trans. Inf. Theory 2021

  1. On the Approximation Ratio of Ordered Parsings.
  2. The Smallest Grammar Problem Revisited.

Inf. Comput. 2021

  1. Efficient pattern matching in elastic-degenerate strings.
  2. On the cost of unsuccessful searches in search trees with two-way comparisons.
  3. Wheeler languages.

Inf. Process. Lett. 2021

  1. Longest previous overlapping factor array.

Int. J. Geogr. Inf. Sci. 2021

  1. An index for moving objects with constant-time access to their compressed trajectories.

J. Comput. Syst. Sci. 2021

  1. Block trees.
  2. Circular pattern matching with k mismatches.
  3. Grammar-compressed indexes with logarithmic search time.

J. Inf. Process. 2021

  1. Towards a Complete Perspective on Labeled Tree Indexing: New Size Bounds, Efficient Constructions, and Beyond.

J. King Saud Univ. Comput. Inf. Sci. 2021

  1. A survey on data compression techniques: From the perspective of data quality, coding schemes, data type and applications.

Theor. Comput. Sci. 2021

  1. Adaptive learning of compressible strings.
  2. Computing longest palindromic substring after single-character or block-wise edits.
  3. Efficiently computing runs on a trie.
  4. Maximal unbordered factors of random strings.
  5. Space-efficient construction of compressed suffix trees.
  6. Tight upper and lower bounds on suffix tree breadth.
  7. When a dollar makes a BWT.

Theory Comput. Syst. 2021

  1. Constructing Antidictionaries of Long Texts in Output-Sensitive Space.