Bitmap-indexed library catalog engine for million-scale query performance.
CardCatalog analyzes the dataset first, then builds a compact columnar storage layer and dense bitmap inverted indexes to answer title, author, and category queries in microseconds.
We measured books_dataset.txt before designing anything. Two properties — very low field cardinality with large result sets, and dense sequential identifiers — determined every decision that follows.
Only 78 distinct titles, 59 authors and 15 categories span all 1M rows — each category alone covers ~64–65K records.
Identifiers run 1…N with no gaps, so a record's position is just offset = id − 1. The id is never stored.
A single field value matches tens of thousands of records — the canonical case for dense bitmaps over sparse posting lists.
Instead of a generic record-by-record layout, CardCatalog exploits the measured structure of the dataset. Low-cardinality fields become compact single-byte codes, and each field value receives a dense bitmap over record positions — turning every count into a cached number and every intersection into a word-wise AND.
From a memory-mapped source file down through dictionary-encoded columns and a dense bitmap index to the query layer. Scroll to activate each layer.
books_dataset.txt — one million rowstitle_code author_code category_code as byte arraysyear as 16-bit values · id implicit, not storedAND / OR / AND-NOT, vectorised to AVX2get_by_id · exact field query · countA simulated view of the index — not a live backend. Pick an operation and watch the 64-bit blocks combine into a result vector.
Median of 200 repetitions on the release build (-O3 -march=native -flto -pthread), measured at the three assignment scales.
Every write keeps the columns, the bitmaps and the cached cardinalities consistent in constant time. Deletion never moves data.
Append at the next offset, encode each field, set three bitmap bits, grow the columns.
Compute offset = id − 1, check the tombstone bit, decode the byte codes.
Clear the old bitmap bit, set the new bitmap bit, adjust the code and the cardinality.
Set one bit in the shared deleted bitmap and keep the data in place; queries AND-NOT it out.
Restart the system without re-parsing the source. The snapshot stores coded columns, the year column, dictionaries and the deleted bitmap; the bitmaps are rebuilt in memory on load.
A debugging story. Each fix sharpened a correctness or measurement guarantee.
CardCatalog demonstrates how file organization, indexing, and memory-layout decisions can transform a million-record dataset into a compact, high-speed query engine.