RLE: Run-Length Encoding

February 5, 2026

RLE (Run-Length Encoding) is a lossless compression/encoding technique that replaces each run of consecutive repeated values with a single compact pair:

(value, count)

Real-life example: “Which year had the most LinkedIn signups?”

Imagine a small linkedin_signups table with one row per signup and a signup_year column.

In a columnar file (Parquet/ORC), the signup_year column is stored sequentially (and often time-clustered), so it looks like:

2019, 2019, 2019, 2020, 2020, 2021, 2021, 2021, 2022

RLE compresses this into runs:

(2019, 3)
(2020, 2)
(2021, 3)
(2022, 1)
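
As a rough illustration of the encoding step, here is a minimal Python sketch. The function names rle_encode and rle_decode are made up for this post, and this is not how Parquet or ORC implement RLE internally (their formats are more involved); it only shows the (value, count) idea on the sample column.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse each run of consecutive equal values into a (value, count) pair."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]

def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]

# The signup_year column from the example above
signup_year = [2019, 2019, 2019, 2020, 2020, 2021, 2021, 2021, 2022]

runs = rle_encode(signup_year)
print(runs)  # [(2019, 3), (2020, 2), (2021, 3), (2022, 1)]

# Lossless: decoding the runs gives back the original column
assert rle_decode(runs) == signup_year
```

Nine stored values become four pairs, and decoding restores the column exactly.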

Now the question:

“Which year had the most signups?”

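is basically just: pick the year with the biggest count. In this toy sample each year forms a single run, so that means picking the biggest run → 2019 or 2021 (tie). If a year were split across several runs, you would add up its counts first.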

In real datasets, those counts might be:

(2019, 120,000)
(2020, 310,000)
(2021, 540,000) ✅
(2022, 480,000)
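
To make the "answer the question from the runs" idea concrete, here is a small Python sketch. The helper name top_values is hypothetical, and a real query engine does far more than this; the point is only that the answer comes from the (value, count) pairs without expanding them back into rows. It sums counts per value first, in case a value appears in more than one run.

```python
from collections import Counter

def top_values(runs):
    """Return (values with the highest total count, that count); ([], 0) if there are no runs."""
    totals = Counter()
    for value, count in runs:
        totals[value] += count  # a value may be split across several runs
    if not totals:
        return [], 0
    best = max(totals.values())
    return sorted(v for v, c in totals.items() if c == best), best

# Toy sample: 2019 and 2021 tie
toy_runs = [(2019, 3), (2020, 2), (2021, 3), (2022, 1)]
print(top_values(toy_runs))  # ([2019, 2021], 3)

# The illustrative larger counts from above
big_runs = [(2019, 120_000), (2020, 310_000), (2021, 540_000), (2022, 480_000)]
print(top_values(big_runs))  # ([2021], 540000)
```

Either way the work is proportional to the number of runs (four here), not the number of rows.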

RLE use-cases:

Use it when values repeat in long blocks (years, countries, status flags), especially when data is sorted/partitioned by time. This is mostly the case in OLAP systems.
Avoid it when values change constantly → runs are short → little to no benefit, and the (value, count) pairs can even take more space than the raw values.
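
A quick way to see the difference is to count how many runs the same values produce in clustered versus shuffled order. The sketch below uses made-up data and a hypothetical helper, run_count; it is only meant to illustrate the effect of ordering, not to benchmark any real file format.

```python
import random
from itertools import groupby

def run_count(values):
    """Number of (value, count) pairs RLE would emit for this sequence."""
    return sum(1 for _ in groupby(values))

# Made-up column: 900 signup years, clustered by time
years = [2019] * 300 + [2020] * 200 + [2021] * 300 + [2022] * 100

# Same values in random order (fixed seed so the run is repeatable)
shuffled = years[:]
random.Random(0).shuffle(shuffled)

print("rows:", len(years))
print("runs when clustered:", run_count(years))  # 4
print("runs when shuffled:", run_count(shuffled))
```

Clustered, the 900 rows collapse into 4 runs. Shuffled, the same rows produce hundreds of short runs, so storing a count next to each value saves little or nothing.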