Polars and Parquet

Polars has been making waves in the Python data world, and for good reason. It’s fast, expressive, and built with performance-first principles. If you’re dealing with Parquet files in an S3 bucket and care even a little about speed or memory, this post is for you.

Let’s talk about three powerful components working together:

🪣 S3 bucket storing Parquet files
📦 PyArrow-style datasets
⚡ Polars doing its thing, lazily and efficiently

🔍 The Case for Lazy Reading

When you reach for read_parquet, you get everything. That’s fine… until it’s not. Instead, scan_parquet gives you a lazy frame, and that changes everything. ...

April 24, 2025 · 2 min · 327 words · Joost