Supercharging Your Database: Partitioning + Indexing for Blazing-Fast Queries

Datacenter cartoon composition with indoor view of data analysts working place with desktop computer and servers vector illustration

In large-scale systems, database performance can make or break the user experience.
When APIs slow down, dashboards lag, or permission checks take too long, the usual instinct is: “Add an index.”

While indexing is crucial, it’s not always enough — especially when dealing with billions of rows. In such cases, partitioning can be a game-changer.

Recently, while working for a client, I significantly improved their system performance by combining PostgreSQL partitioning with targeted indexing. The result? Queries that used to take hundreds of milliseconds now run in a few milliseconds.


🛠 What is Partitioning?

Partitioning is the process of splitting a large table into smaller, more manageable pieces, called partitions.
From the application’s perspective, it still looks like a single table, but under the hood, queries can skip irrelevant partitions — drastically reducing the data scanned.

Types of Partitioning in PostgreSQL

  1. Range Partitioning
    • Rows are stored based on a value range (e.g., date ranges).
    • Example: One partition for each month of data.
    • Best for time-series data like logs, metrics, or events.
  2. List Partitioning
    • Rows are stored based on discrete values in a column.
    • Example: Partition per country or region.
    • Best when the column has a small set of distinct values.
  3. Hash Partitioning
    • A hash function distributes rows evenly across partitions.
    • Example: Partition by tenant_id for a multi-tenant SaaS system.
    • Best for evenly distributing data without natural ranges.
  4. Composite Partitioning
    • Combines two or more methods.
    • Example: First partition by range (year), then subpartition by hash (tenant_id).

PostgreSQL supports native partitioning since version 10, and it automatically routes inserts and queries to the correct partition when set up properly.


🔍 What is Indexing?

An index is like a fast lookup table for your database — instead of scanning the whole table, the database can jump directly to the rows that match your query conditions.

Types of common indexes in PostgreSQL:

  • B-Tree Indexes → Best for equality and range lookups (most common)
  • Hash Indexes → Best for equality checks (less common, more specialized)
  • GIN / GiST Indexes → Best for full-text search or JSONB fields

Indexes speed up lookups, but as tables grow into hundreds of millions of rows, even indexes can become large and slow. That’s where partitioning helps: instead of one massive index, you get many small indexes — one per partition.


🧩 Real-World Scenario: Massive Permissions Lookup

My client runs a large authentication and authorization system that manages:

  • 10 million users (entities)
  • 5 million resources (APIs, files, services)
  • Billions of permission mappings

Every API request must check:

“Does this user have permission X for resource Y?”

A simplified schema:

CREATE TABLE permissions (
    organization_id INT,
    entity_id BIGINT,
    resource_id BIGINT,
    permission_type TEXT,
    granted_at TIMESTAMP
);

🐌 The Problem

Even with multiple indexes:

CREATE INDEX idx_entity_perm 
    ON permissions (entity_id, permission_type);

CREATE INDEX idx_resource_perm 
    ON permissions (resource_id, permission_type);

Queries like:

SELECT 1
FROM permissions
WHERE organization_id = 32
  AND entity_id = 983233
  AND resource_id = 424
  AND permission_type = 'read';

…took 150–250 ms under heavy load.

Why?

  • The table was too large
  • Indexes were too big, making scans expensive
  • All queries were scoped to a specific organization (tenant), but the database still had to search across the entire table

⚙️ My Solution: Tenant-Based Hash Partitioning + Targeted Indexing

After profiling with EXPLAIN ANALYZE, I saw a clear pattern:

  • Every query filtered by organization_id
  • Inside that scope, queries filtered by entity_id and resource_id

So, I decided to partition by organization_id using hash partitioning.

Step 1: Create Partitioned Table





CREATE TABLE permissions (
    organization_id INT,
    entity_id BIGINT,
    resource_id BIGINT,
    permission_type TEXT,
    granted_at TIMESTAMP
) PARTITION BY HASH (organization_id);

Step 2: Create Hash Partitions





CREATE TABLE permissions_org_0
    PARTITION OF permissions FOR VALUES WITH (MODULUS 8, REMAINDER 0);

CREATE TABLE permissions_org_1
    PARTITION OF permissions FOR VALUES WITH (MODULUS 8, REMAINDER 1);

-- ... up to permissions_org_7

This split the data evenly across 8 partitions.

Step 3: Add Indexes Inside Each Partition

CREATE INDEX idx_entity_resource_perm
    ON permissions_org_0 (entity_id, resource_id, permission_type);

Step 4: Query Execution

Now, when running:

SELECT 1
FROM permissions
WHERE organization_id = 32
  AND entity_id = 983233
  AND resource_id = 424
  AND permission_type = 'read';
  • PostgreSQL immediately routes the query to the right partition
  • The small index inside that partition is scanned instead of a massive global index

📈 Performance Boost

Before: 150–250 ms
After: 2–5 ms

This improvement allowed the client to handle more concurrent API requests, reduce DB load, and eliminate permission-check bottlenecks.

📊 Partitioning vs Indexing: When to Use What?

FeatureIndexingPartitioning
PurposeSpeeds up row lookupsReduces scanned data
Best forSelective columnsAlways-filtered columns
ComplexityLowMedium–High
ScalingSlower at huge scaleScales better with data growth
Write impactMinimalSlight overhead
MaintenanceEasyNeeds automation

🤝 Why They Work Best Together

  • Partitioning: Skips irrelevant data entirely
  • Indexing: Finds the exact row inside the relevant data

In this client project, hash partitioning by organization_id plus indexes on (entity_id, resource_id, permission_type) made the permission checks instant.


⚠️ Things to Watch

  • Don’t over-partition (too many partitions slow the planner)
  • Pick the right key — it must be in almost every query’s WHERE clause
  • Joins across partitions can be tricky
  • Automate partition creation if the key space changes

🧠 Final Takeaway

Think of indexing as a magnifying glass — fast for finding something in a book.
But when the “book” is a warehouse, partitioning organizes shelves so you only search in one section.

For massive multi-tenant systems like authentication/permissions, partitioning + indexing isn’t just an optimization — it’s a necessity for scalability.


Comments

Leave a Reply

Your email address will not be published. Required fields are marked *