RLE: Run-Length Encoding
February 5, 2026
RLE (Run-Length Encoding) is a lossless compression/encoding technique that replaces each run of consecutive repeated values with a single compact pair:
(value, count)
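A minimal Python sketch of encoding and decoding, assuming the data is an in-memory list. The function names `rle_encode` and `rle_decode` are hypothetical, and real columnar formats store runs in a compact binary layout rather than as Python tuples:

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def rle_encode(values: Sequence[T]) -> List[Tuple[T, int]]:
    """Collapse consecutive equal values into (value, count) pairs."""
    runs: List[Tuple[T, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            # Same as the previous value: extend the current run.
            last_value, last_count = runs[-1]
            runs[-1] = (last_value, last_count + 1)
        else:
            # Value changed (or first element): start a new run.
            runs.append((v, 1))
    return runs

def rle_decode(runs: Sequence[Tuple[T, int]]) -> List[T]:
    """Expand (value, count) pairs back into the original sequence."""
    out: List[T] = []
    for value, count in runs:
        out.extend([value] * count)
    return out

flags = ["active", "active", "active", "banned", "active", "active"]
encoded = rle_encode(flags)
print(encoded)  # [('active', 3), ('banned', 1), ('active', 2)]
assert rle_decode(encoded) == flags  # lossless round trip
```

Note that "active" shows up in two separate runs: RLE only merges values that are adjacent.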
Real-life example: “Which year had the most LinkedIn signups?”
Imagine a small linkedin_signups table:

In a columnar file (Parquet/ORC), the values of the signup_year column are stored contiguously, one after another (and are often clustered by time), so the column looks like:
2019, 2019, 2019, 2020, 2020, 2021, 2021, 2021, 2022
RLE compresses this into runs:
(2019, 3)
(2020, 2)
(2021, 3)
(2022, 1)
Now the question:
“Which year had the most signups?”
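reduces to picking the run with the biggest count → 2019 and 2021 (tied at 3) in this toy sample. That shortcut relies on the column being sorted, so each year forms exactly one run; if a year appeared in several runs, you would sum its counts first, which still means touching one pair per run rather than one value per row.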
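A small Python sketch of both steps: `itertools.groupby` (which groups consecutive equal elements, i.e. runs) encodes the toy column, and the question is then answered from the runs alone. The helper `most_frequent_from_runs` is a hypothetical name; it sums counts per value so it also works when a year appears in more than one run, and it returns ties together:

```python
from collections import Counter
from itertools import groupby

signup_year = [2019, 2019, 2019, 2020, 2020, 2021, 2021, 2021, 2022]

# groupby yields one (key, group) pair per block of *consecutive* equal values,
# which is exactly one run; counting the group gives the run length.
year_runs = [(year, sum(1 for _ in group)) for year, group in groupby(signup_year)]
print(year_runs)  # [(2019, 3), (2020, 2), (2021, 3), (2022, 1)]

def most_frequent_from_runs(runs):
    """Return every value whose total count, summed across its runs, is the maximum."""
    totals = Counter()
    for value, count in runs:
        totals[value] += count  # one addition per run, not per row
    if not totals:
        return []
    best = max(totals.values())
    return sorted(v for v, c in totals.items() if c == best)

print(most_frequent_from_runs(year_runs))  # [2019, 2021]

# If the column were not sorted, a year could span several runs: 2019 = 2 + 2.
print(most_frequent_from_runs([(2019, 2), (2020, 3), (2019, 2)]))  # [2019]
```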
In real datasets, those counts might be:
(2019, 120,000)
(2020, 310,000)
(2021, 540,000) ✅
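(2022, 480,000)
Here 2021 wins, and finding it means comparing four counts instead of scanning 1,450,000 individual rows.
RLE use-cases: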
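- Use when values repeat in long blocks (years, countries, status flags), especially when data is sorted or partitioned by time. This is typical of OLAP/columnar workloads.
- Avoid when values change constantly: runs are short, so there is little to no benefit (naive RLE can even make the data larger, since every lone value still costs a full (value, count) pair).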
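To see why ordering matters, here is a small illustration on synthetic data (not real signups): the same 1,000 year values produce 4 runs when grouped by year, but 1,000 runs when the years alternate row by row. The helper name `count_runs` is hypothetical:

```python
from itertools import groupby

def count_runs(values):
    """Number of (value, count) pairs RLE would emit for this sequence."""
    return sum(1 for _ in groupby(values))

# Synthetic column: 4 distinct years, 250 rows each, interleaved row by row.
interleaved = [2019, 2020, 2021, 2022] * 250
clustered = sorted(interleaved)  # same values, grouped by year

print(len(interleaved), count_runs(interleaved))  # 1000 1000
print(len(clustered), count_runs(clustered))      # 1000 4
```

In the interleaved case every run has length 1, so the encoded form is larger than the raw column. This is one reason columnar formats tend to pair RLE with other encodings; Parquet, for example, uses a hybrid of RLE and bit-packing.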