RACSS is a hybrid technology positioned between compression and indexing. It is designed for compact storage of large collections of strings with true random access, allowing any individual string to be decompressed independently, without decoding the entire dataset.
RACSS is neither a general-purpose compressor nor a classical string indexing system. It represents a separate class of solutions: a compressed string retrieval subsystem.
RACSS enables:
In practice, RACSS behaves as a string store optimized for retrieval, not as a streaming compressor.
Strings are stored using a recursive dictionary representation:
Despite the recursive structure, real-world datasets exhibit very shallow recursion depth, even for large and highly redundant inputs.
This makes RACSS particularly suitable for systems where:
To make the core idea concrete, here is a minimal real-world example showing how RACSS processes and tokenizes text.
Input text (8 lines):
| # | Text |
|---|---|
| 1 | Jingle Bells, Jingle Bells |
| 2 | Jingle all the way |
| 3 | Oh what fun it is to ride in a |
| 4 | One horse open sleigh |
| 5 | Jingle Bells, Jingle Bells |
| 6 | Jingle all the way |
| 7 | Oh what fun it is to ride in a one |
| 8 | Horse open sleigh |
After compression, RACSS produces two kinds of entries:
Tokens in square brackets (`[n]`) refer to dictionary entry `n` (entries are numbered from 1); everything else is literal text.
Logical lines
| Line | Stored form |
|---|---|
| L1 | `"[7]"` |
| L2 | `"[3]"` |
| L3 | `"[2]"` |
| L4 | `"On[6]h[4]"` |
| L5 | `"[7]"` |
| L6 | `"[3]"` |
| L7 | `"[2] one"` |
| L8 | `"H[4]"` |
Each logical line is stored independently as a sequence of literals and dictionary references.
Dictionary entries
| Entry | Stored form |
|---|---|
| D1 | `"[5]Bells"` |
| D2 | `"Oh what fu[8]it is to rid[6]i[8]a"` |
| D3 | `"[5]all th[6]way"` |
| D4 | `"ors[6]ope[8]sleigh"` |
| D5 | `"Jingl[6]"` |
| D6 | `"e "` |
| D7 | `"[1], [1]"` |
| D8 | `"n "` |
The dictionary is self-referential: dictionary entries may reference other dictionary entries. This allows RACSS to represent recurring substrings compactly without flattening them into a single global stream.
What this demonstrates
| Aspect | gzip / LZ | RACSS |
|---|---|---|
| Data model | Stream | String collection |
| Random access | - | + |
| Partial decompression | - | + |
| Runtime state | Complex | Minimal |
| Embedded suitability | Limited | High |
RACSS does not aim to compete with general-purpose compressors. It targets a different problem space: workloads that require fast random access to individual lines or records, where stream-oriented compression is inherently inefficient because reading one record means decoding the stream up to that point. RACSS is particularly useful for game localization files, dictionaries, navigation databases, and embedded systems, where compact storage and efficient random access are critical.
The table below compares RACSS with gzip, one of the most widely used compressors in Linux environments, for reference only. Keep in mind that the RACSS sizes include the indices that allow any line to be retrieved without decompressing the entire file.
| File | Raw size (bytes) | RACSS (bytes, % of raw) | gzip (bytes, % of raw) |
|---|---|---|---|
| Wonderful World lyrics | 596 | 444 (74.5%) | 338 (56.7%) |
| Let My People Go lyrics | 730 | 359 (49.2%) | 280 (38.4%) |
| 10,000 substance names | 1,245,737 | 366,063 (29.4%) | 334,000 (26.8%) |
| Multilingual article | 2,445 | 1,234 (50.5%) | 1,067 (43.6%) |
Typical application areas:
- Game engines and consoles
- Navigation systems
- Embedded and legacy devices
- Offline reference data
The distribution includes a minimal demo retrieval program (rfetch), implemented in approximately 200 lines of C.
Its purpose is to:
This program is neither a full API nor a reference specification. It is intentionally minimal and serves as a proof of concept and a transparency tool.
Supported modes and usage:

| Command | Action |
|---|---|
| `./rfetch <file.rc>` | Unpack all lines (default) |
| `./rfetch <file.rc> N` | Unpack line N (1-based) |
| `./rfetch <file.rc> 0` | Print raw lines and the dictionary in debug format |
| `./rfetch <file.rc> -N` | Unpack dictionary entry N (1-based) |
| `./rfetch <file.rc> <out-of-range num>` | Print the valid range and header |
This release provides:
It intentionally does not include:
The goal is to demonstrate feasibility, performance, and simplicity, not to disclose the full compression pipeline.
RACSS enables:
It is especially relevant for companies maintaining long-lived products where storage size, predictability, and backward compatibility matter more than peak compression ratio.