RLE: Run-Length Encoding

By Pradyumna Chippigiri

February 5, 2026


RLE (Run-Length Encoding) is a lossless compression/encoding technique that replaces each run of consecutive repeated values with a single compact pair:


(value, count)

Real-life example: “Which year had the most LinkedIn signups?”

Imagine a small linkedin_signups table:

[Figure: linkedin_signups sample table]


In a columnar file format (Parquet/ORC), the values of the signup_year column are stored contiguously, and because rows often arrive in time order, equal years tend to sit next to each other:

2019, 2019, 2019, 2020, 2020, 2021, 2021, 2021, 2022

RLE compresses this into runs:

(2019, 3)
(2020, 2)
(2021, 3)
(2022, 1)
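
To make the idea concrete, here is a minimal Python sketch of the encoding step above, using the same nine values. The function names rle_encode and rle_decode are made up for this post; real Parquet/ORC writers use their own binary layouts, so treat this as an illustration of the concept rather than how those formats implement it.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of consecutive equal values into a (value, count) pair."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


signup_year = [2019, 2019, 2019, 2020, 2020, 2021, 2021, 2021, 2022]

runs = rle_encode(signup_year)
print(runs)  # [(2019, 3), (2020, 2), (2021, 3), (2022, 1)]

# Lossless: decoding the runs gives back exactly the original column.
assert rle_decode(runs) == signup_year
```

The round-trip check at the end is what "lossless" means in practice: nothing about the original column is lost, it is just stored in fewer entries.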

Now the question:


“Which year had the most signups?”


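is basically just: pick the run with the biggest count → 2019 and 2021 tie (3 each) in this toy sample. The shortcut works because the column is sorted by year, so each year forms exactly one run; if a year were split across several runs, you would add up its counts first.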
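
Here is a small Python sketch of answering that question straight from the runs, without expanding them back into rows. The helper names signups_per_year and busiest_years are hypothetical, and the code sums counts per year so it still gives the right answer when a year is spread over more than one run.

```python
from collections import Counter


def signups_per_year(runs):
    """Total the counts per year; a year may appear in more than one run."""
    totals = Counter()
    for year, count in runs:
        totals[year] += count
    return totals


def busiest_years(runs):
    """Return every year that shares the highest total, in ascending order."""
    totals = signups_per_year(runs)
    if not totals:
        return []
    top = max(totals.values())
    return sorted(year for year, total in totals.items() if total == top)


runs = [(2019, 3), (2020, 2), (2021, 3), (2022, 1)]
print(busiest_years(runs))  # [2019, 2021]
```

The loop touches 4 runs instead of 9 rows here; on a column with long runs the gap is much larger, which is why aggregations over RLE data can be cheap.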

In a real dataset, the counts might look more like this (illustrative numbers):

(2019, 120,000)
(2020, 310,000)
(2021, 540,000) ✅
(2022, 480,000)

RLE use-cases: