Random-Access Compressed String Store (RACSS)


RACSS is a hybrid technology positioned between compression and indexing. It is designed for compact storage of large collections of strings with true random access, allowing any individual string to be decompressed independently, without decoding the entire dataset.

RACSS is neither a general-purpose compressor nor a classical string indexing system. It represents a separate class of solutions: a compressed string retrieval subsystem.


What RACSS actually provides

RACSS enables:

In practice, RACSS behaves as a string store optimized for retrieval, not as a streaming compressor.


Core concept

Strings are stored using a recursive dictionary representation:

Despite the recursive structure, real-world datasets exhibit very shallow recursion depth, even for large and highly redundant inputs.

This makes RACSS particularly suitable for systems where:


Example: how RACSS tokenizes real text

To make the core idea concrete, here is a minimal real-world example showing how RACSS processes and tokenizes text.
Input text (8 lines):

Jingle Bells, Jingle Bells
Jingle all the way
Oh what fun it is to ride in a
One horse open sleigh
Jingle Bells, Jingle Bells
Jingle all the way
Oh what fun it is to ride in a one
Horse open sleigh

After compression, RACSS produces two kinds of entries: logical lines (marked L below) and dictionary entries (marked D below).

Tokens in square brackets ([n]) refer to dictionary entries by index.

Logical lines

L       1 :"[7]"
L       2 :"[3]"
L       3 :"[2]"
L       4 :"On[6]h[4]"
L       5 :"[7]"
L       6 :"[3]"
L       7 :"[2] one"
L       8 :"H[4]"

Each logical line is stored independently as a sequence of literals and dictionary references.
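The split into literals and references can be sketched in C. This is a minimal sketch that works on the bracket notation printed above, which may be only the debug rendering; the actual on-disk encoding is not published. The names segment and split_segments are hypothetical and are not part of rfetch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* One segment of a stored string: literal bytes or a dictionary reference.
 * Hypothetical type for illustration only. */
typedef struct {
    int is_ref;           /* 1 = dictionary reference, 0 = literal text */
    const char *text;     /* start of the literal (when is_ref == 0) */
    size_t len;           /* literal length in bytes */
    unsigned long index;  /* 1-based dictionary index (when is_ref == 1) */
} segment;

/* Split src into at most max_segs segments and return how many were written.
 * "[digits]" is treated as a reference; everything else is literal text. */
size_t split_segments(const char *src, segment *out, size_t max_segs)
{
    size_t n = 0;
    assert(src != NULL && out != NULL);
    while (*src != '\0' && n < max_segs) {
        if (src[0] == '[' && isdigit((unsigned char)src[1])) {
            char *end;
            unsigned long idx = strtoul(src + 1, &end, 10);
            if (*end == ']') {
                out[n++] = (segment){1, NULL, 0, idx};
                src = end + 1;
                continue;
            }
        }
        /* Literal run: at least one byte, up to the next '[' or the end. */
        const char *start = src++;
        while (*src != '\0' && *src != '[')
            src++;
        out[n++] = (segment){0, start, (size_t)(src - start), 0};
    }
    return n;
}
```

Applied to logical line 4, "On[6]h[4]", this yields four segments: the literal "On", a reference to entry 6, the literal "h", and a reference to entry 4.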

Dictionary entries

D       1 :"[5]Bells"
D       2 :"Oh what fu[8]it is to rid[6]i[8]a"
D       3 :"[5]all th[6]way"
D       4 :"ors[6]ope[8]sleigh"
D       5 :"Jingl[6]"
D       6 :"e "
D       7 :"[1], [1]"
D       8 :"n "

The dictionary is self-referential: dictionary entries may reference other dictionary entries. This allows RACSS to represent recurring substrings compactly without flattening them into a single global stream.
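To illustrate how a self-referential dictionary can be resolved, here is a minimal C sketch that expands one stored string recursively and records how deep the references go. It hard-codes the eight dictionary entries listed above in their textual notation. The function names, the MAX_DEPTH guard, and the error handling are assumptions made for this example and do not describe the actual rfetch implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Dictionary from the example above; slot 0 is unused so indices stay 1-based. */
static const char *const DICT[] = {
    NULL,
    "[5]Bells",
    "Oh what fu[8]it is to rid[6]i[8]a",
    "[5]all th[6]way",
    "ors[6]ope[8]sleigh",
    "Jingl[6]",
    "e ",
    "[1], [1]",
    "n ",
};
#define DICT_COUNT 8
#define MAX_DEPTH 16 /* assumed guard against cyclic references */

/* Append the expansion of src to out at *pos. Returns 0 on success,
 * -1 on overflow, invalid index, or excessive nesting. */
static int expand_into(const char *src, char *out, size_t cap,
                       size_t *pos, int depth, int *max_depth)
{
    if (depth > MAX_DEPTH)
        return -1;
    if (depth > *max_depth)
        *max_depth = depth;
    while (*src != '\0') {
        if (src[0] == '[' && isdigit((unsigned char)src[1])) {
            char *end;
            unsigned long idx = strtoul(src + 1, &end, 10);
            if (*end == ']') {
                if (idx < 1 || idx > DICT_COUNT)
                    return -1;
                if (expand_into(DICT[idx], out, cap, pos,
                                depth + 1, max_depth) != 0)
                    return -1;
                src = end + 1;
                continue;
            }
        }
        /* Literal run: copy at least one byte, up to the next '['. */
        size_t run = 1 + strcspn(src + 1, "[");
        if (*pos + run >= cap) /* keep room for the terminator */
            return -1;
        memcpy(out + *pos, src, run);
        *pos += run;
        src += run;
    }
    return 0;
}

/* Expand src into out (capacity cap); *max_depth receives the deepest
 * chain of dictionary references that was followed. */
int expand(const char *src, char *out, size_t cap, int *max_depth)
{
    size_t pos = 0;
    assert(src != NULL && out != NULL && cap > 0 && max_depth != NULL);
    *max_depth = 0;
    if (expand_into(src, out, cap, &pos, 0, max_depth) != 0)
        return -1;
    out[pos] = '\0';
    return 0;
}
```

Calling expand("[7]", buf, sizeof buf, &depth) reproduces the first input line, "Jingle Bells, Jingle Bells", with a maximum depth of 4 (entry 7 refers to 1, which refers to 5, which refers to 6). Only the entries reachable from the requested line are touched, which is why one line can be restored without decoding the others.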

What this demonstrates


How RACSS differs from gzip / LZ / zstd

Aspect                  gzip / LZ   RACSS
Data model              Stream      String collection
Random access           -           +
Partial decompression   -           +
Runtime state           Complex     Minimal
Embedded suitability    Limited     High

RACSS does not aim to compete with general-purpose compressors. It is designed for a different problem space, where stream-oriented compression is inherently inefficient and fast random access to individual lines or records is required. RACSS is particularly useful for game localization files, dictionaries, navigation databases, and embedded systems, where compact storage and efficient random access are critical.

The table below is provided solely for comparison with gzip, one of the most widely used compressors in Linux environments. Keep in mind that RACSS files include indices that enable arbitrary line retrieval without decompressing the entire file, so the RACSS sizes include that overhead. Percentages are relative to the raw size.

File                     Raw size   RACSS            GZIP
Wonderful World lyric    596        444 (74.5%)      338 (56.6%)
Let My People Go lyric   730        359 (49.2%)      280 (38.3%)
10,000 substance names   1245737    366063 (29.4%)   334000 (26.8%)
Multilingual article     2445       1234 (50.5%)     1067 (43.6%)
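As noted before the table, RACSS files carry indices so that a single line can be located directly. The published material does not document the actual .rc layout, so the following C sketch only illustrates the general idea using an assumed offset table. The line_store structure, its fields, and fetch_line are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory layout (NOT the real .rc format):
 * stored lines are concatenated in data, and offsets[i] .. offsets[i + 1]
 * delimit line i + 1. The offsets array has count + 1 entries. */
typedef struct {
    const char *data;
    const uint32_t *offsets;
    size_t count;
} line_store;

/* Copy stored line n (1-based) into out and return its length.
 * Returns -1 if n is out of range, the index is inconsistent,
 * or out is too small. No other line is read. */
long fetch_line(const line_store *s, size_t n, char *out, size_t cap)
{
    assert(s != NULL && out != NULL);
    if (n < 1 || n > s->count)
        return -1;
    uint32_t begin = s->offsets[n - 1];
    uint32_t end = s->offsets[n];
    if (end < begin)
        return -1;
    size_t len = (size_t)(end - begin);
    if (len + 1 > cap)
        return -1;
    memcpy(out, s->data + begin, len);
    out[len] = '\0';
    return (long)len;
}
```

With such a table, fetching line N costs two index reads and one copy regardless of where the line sits in the file. The price is the space taken by the index itself, which is part of the RACSS sizes reported in the table.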

Typical application areas

Game engines and consoles

Navigation systems

Embedded and legacy devices

Offline reference data


Demo retrieval tool

The distribution includes a minimal demo retrieval program (rfetch), implemented in approximately 200 lines of C.

Its purpose is to:

This program is not a full API and not a reference specification. It is intentionally minimal, serving as a proof-of-concept and a transparency tool.

Supported modes:


Usage:
  ./rfetch <file.rc>                    - unpack all lines (default)
  ./rfetch <file.rc>  N                 - unpack line N (1-based)
  ./rfetch <file.rc>  0                 - print raw lines and dictionary in debug format
  ./rfetch <file.rc> -N                 - unpack dict entry N (1-based)
  ./rfetch <file.rc> <out-of-range num> - print valid range and header

What is published

This release provides:

It intentionally does not include:

The goal is to demonstrate feasibility, performance, and simplicity, not to disclose the full compression pipeline.


Why this is commercially relevant

RACSS enables:

It is especially relevant for companies maintaining long-lived products where storage size, predictability, and backward compatibility matter more than peak compression ratio.


This site contains the decompressor source code and examples of compressed data.