Tags: mysql, optimization, database

Optimizing ORDER BY: Issues Involving Filesort and Large Fields

Dive deep into how MySQL processes ORDER BY, how filesort works, and how to optimize it to achieve maximum performance and eliminate severe slowdowns.

Have you ever encountered a situation where an ORDER BY query runs extremely slowly, taking seconds or even tens of seconds? The problem usually lies in filesort - one of the most common reasons for a massive drop in MySQL performance. In this article, we'll dive deep into how MySQL processes ORDER BY, how filesort works, and most importantly - how to optimize it to achieve maximum performance (potentially improving it up to 70x).

I. What is Filesort?

Before diving into details, let's understand filesort - the most crucial concept in this article:

Filesort is the sorting algorithm MySQL uses when it cannot use an index to sort the ORDER BY results. Despite the name "filesort", it doesn't always use a file on the disk:

Filesort in RAM (Fast):

When the data to be sorted fits inside the sort_buffer_size (default 256KB, tunable per session).

  • Time: 50-200ms for 10,000 rows
  • Sort_merge_passes = 0

Filesort on Disk (Very Slow):

When the data is larger than the sort buffer.

  • MySQL has to break it down, sort each chunk, write it to disk, and then merge it back.
  • Time: 5-30+ seconds for 1M+ rows.
  • Sort_merge_passes > 0 (can reach 10-30 merge passes!)
  • Disk I/O is 100-1000 times slower than RAM.
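You can check which of the two cases a query hit by inspecting the session sort counters; a minimal sketch using standard MySQL status variables:

```sql
-- Reset session counters, run your ORDER BY query, then inspect the results
FLUSH STATUS;
-- ... run your ORDER BY query here ...
SHOW SESSION STATUS LIKE 'Sort%';
-- Sort_merge_passes = 0  → the sort fit in sort_buffer_size (RAM)
-- Sort_merge_passes > 0  → the sort spilled to disk and had to be merged back
```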

Why is filesort slow?

  1. It must read all data requiring a sort into memory.
  2. If it doesn't fit in memory → it must read/write to disk multiple times.
  3. The merge operation consumes a lot of CPU and I/O.

Our Goal: Completely eliminate filesort by creating indexes properly!


Preparing a Sample Database for Practice

To help you practice and verify the optimization techniques in this article, here's a sample database with realistic data.

Step 1: Create Database and Tables

-- Create a new database
CREATE DATABASE IF NOT EXISTS performance_demo;
USE performance_demo;

-- Create customers table
CREATE TABLE customers (
    customer_id INT PRIMARY KEY AUTO_INCREMENT,
    customer_name VARCHAR(100) NOT NULL,
    email VARCHAR(100),
    country VARCHAR(50),
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;

-- Create orders table
CREATE TABLE orders (
    order_id INT PRIMARY KEY AUTO_INCREMENT,
    customer_id INT NOT NULL,
    order_date DATETIME NOT NULL,
    total_amount DECIMAL(10, 2) NOT NULL,
    status ENUM('pending', 'processing', 'completed', 'cancelled'),
    shipping_country VARCHAR(50),
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_customer (customer_id),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
) ENGINE=InnoDB;

Step 2: Create Stored Procedure to Generate Data

DELIMITER $
CREATE PROCEDURE generate_customers(IN num_customers INT)
BEGIN
    DECLARE i INT DEFAULT 1;
    WHILE i <= num_customers DO
        INSERT INTO customers (customer_name, email, country, created_at)
        VALUES (
            CONCAT('Customer_', i),
            CONCAT('customer', i, '@email.com'),
            ELT(FLOOR(1 + RAND() * 7), 'Vietnam', 'USA', 'UK', 'Japan', 'Korea', 'Singapore', 'Thailand'),
            DATE_SUB(NOW(), INTERVAL FLOOR(RAND() * 730) DAY)
        );
        SET i = i + 1;
    END WHILE;
END$

CREATE PROCEDURE generate_orders(IN num_orders INT)
BEGIN
    DECLARE i INT DEFAULT 1;
    DECLARE max_customer INT;
    SELECT MAX(customer_id) INTO max_customer FROM customers;
    
    WHILE i <= num_orders DO
        INSERT INTO orders (customer_id, order_date, total_amount, status, shipping_country, created_at)
        VALUES (
            1 + FLOOR(RAND() * max_customer),
            DATE_SUB(NOW(), INTERVAL FLOOR(RAND() * 365) DAY),
            10 + (RAND() * 990),
            ELT(FLOOR(1 + RAND() * 4), 'pending', 'processing', 'completed', 'cancelled'),
            ELT(FLOOR(1 + RAND() * 7), 'Vietnam', 'USA', 'UK', 'Japan', 'Korea', 'Singapore', 'Thailand'),
            DATE_SUB(NOW(), INTERVAL FLOOR(RAND() * 365) DAY)
        );
        SET i = i + 1;
    END WHILE;
END$
DELIMITER ;

Step 3: Generate the Sample Data

-- Create 1,000 customers and 100,000 orders
CALL generate_customers(1000);
CALL generate_orders(100000);

-- Verify the data
SELECT COUNT(*) FROM customers;
SELECT COUNT(*) FROM orders;
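After bulk-loading, it can also help to refresh the optimizer's index statistics so the row estimates in EXPLAIN are accurate; a quick optional step:

```sql
-- Refresh index statistics after generating the sample data
ANALYZE TABLE customers, orders;
```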

Step 4: Check Performance BEFORE Optimization

-- Enable profiling for accurate metrics
SET profiling = 1;

EXPLAIN SELECT order_id, order_date, total_amount, status 
FROM orders 
WHERE customer_id = 100 
ORDER BY order_date DESC 
LIMIT 20;

-- Execute query and measure time
SELECT order_id, order_date, total_amount, status 
FROM orders 
WHERE customer_id = 100 
ORDER BY order_date DESC 
LIMIT 20;

-- View execution time
SHOW PROFILES;

EXPLAIN Results before Optimization:

  • type: ref → Uses idx_customer index (Good for WHERE)
  • key: idx_customer → Only filters customer_id
  • rows: 116 → Must examine 116 rows
  • Extra: Using filesort → ❌ STILL REQUIRES manual sorting!

The Issue: The index only covers customer_id, not order_date, so MySQL cannot read the rows in order_date order from the index and has to sort them itself.

⏱️ Execution Time: ~0.0234 seconds (23.4ms with 100,000 rows). With a larger dataset (1M rows), time can balloon up to 0.5-2 seconds.
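To see where those milliseconds go, each entry from SHOW PROFILES can be drilled into stage by stage; a sketch (the query number 2 is an assumption, read the real Query_ID off your own SHOW PROFILES output):

```sql
-- Stage-by-stage timing for one profiled query
-- (replace 2 with the Query_ID shown by SHOW PROFILES)
SHOW PROFILE FOR QUERY 2;
-- Look for the "Sorting result" stage: that is the filesort cost
```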

Step 5: Create Optimized Index and Remeasure

-- Create a composite index
-- (a descending index key requires MySQL 8.0+; on older versions an
-- ascending index still avoids filesort via a backward index scan)
CREATE INDEX idx_customer_date ON orders(customer_id, order_date DESC);

-- Reset profiling and measure again
SET profiling = 0;
SET profiling = 1;

-- Run query again
EXPLAIN SELECT order_id, order_date, total_amount, status 
FROM orders 
WHERE customer_id = 100 
ORDER BY order_date DESC 
LIMIT 20;

-- View execution time
SHOW PROFILES;

EXPLAIN Results AFTER optimization:

  • key: idx_customer_date → Uses the NEW composite index
  • rows: 20 → Only examines 20 rows (instead of 116!)
  • Extra: Using where → ✅ NO MORE "Using filesort"!

⏱️ Execution Time: 0.0012 seconds (1.2ms)

Comparison:

  • ✅ Removed filesort entirely
  • ✅ Decreased rows examined from 116 to 20
  • ✅ Reduced time from 23.4ms to 1.2ms on this 100,000-row dataset
  • ✅ On a larger dataset (1M rows, a customer with ~5,000 orders), time drops from ~1.8s to ~3ms
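On MySQL 8.0.18+ you can go one step further than plain EXPLAIN and get measured per-step timings; a sketch against the same query:

```sql
-- EXPLAIN ANALYZE executes the query and reports actual time/rows per plan node
EXPLAIN ANALYZE
SELECT order_id, order_date, total_amount, status
FROM orders
WHERE customer_id = 100
ORDER BY order_date DESC
LIMIT 20;
-- With idx_customer_date in place there should be no "Sort" node in the plan
```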

II. How Does ORDER BY Work?

When you run a query using ORDER BY, MySQL has two ways to sort the results:

1. Using Index (Most Optimized)

If a suitable index exists, MySQL reads the data in the predefined sorted order from the index. No extra sorting effort is necessary.

CREATE INDEX idx_order_date ON orders(order_date);

-- ✅ If you only SELECT columns present in the index
SELECT order_id, order_date FROM orders ORDER BY order_date;
-- Extra: Using index (Perfect, no filesort!)

-- ⚠️ If you SELECT * or multiple unindexed columns
SELECT * FROM orders ORDER BY order_date;
-- Extra: Using filesort (must still sort because it needs to read extra columns from the table!)

Crucial Note: An index alone isn't enough! MySQL avoids filesort only if:

  1. There's an index on the ORDER BY column.
  2. The query only SELECTs columns contained in the index (Covering index).
  3. OR the cost of reading sequentially from the index + a table lookup is cheaper than performing a filesort.

2. Using Filesort (No Index Available)

When there's no suitable index, MySQL must perform a filesort - a manual sorting operation.

SELECT * FROM orders WHERE status = 'completed' ORDER BY total_amount; 
-- ❌ Must sort manually → SLOW

When Does MySQL Avoid Filesort?

MySQL avoids filesort when one of the following conditions holds:

Case 1: Covering Index

CREATE INDEX idx_date_id_total ON orders(order_date, order_id, total_amount);
SELECT order_id, order_date, total_amount FROM orders ORDER BY order_date; 
-- Extra: Using index (all columns exist in the index!)

Case 2: Index + Table Lookup is cheaper than Filesort

CREATE INDEX idx_customer_date ON orders(customer_id, order_date);
SELECT * FROM orders WHERE customer_id = 123 ORDER BY order_date LIMIT 20; 
-- Avoids filesort because it only needs to sequentially pull 20 rows from the index

Case where Filesort is Mandatory

-- No index on order_date
SELECT * FROM orders ORDER BY order_date;

-- Index exists, but SELECT * pulls many columns
SELECT * FROM orders ORDER BY order_date; 
-- MySQL estimates that filesort is cheaper than an index scan plus many table lookups

-- Index doesn't match the ORDER BY field
SELECT * FROM orders WHERE status = 'completed' ORDER BY total_amount;

III. 8 Strategies to Optimize ORDER BY

1. Create the Appropriate Index (Most Important)

-- ❌ BEFORE: Filesort
SELECT * FROM orders WHERE customer_id = 123 ORDER BY order_date;

-- Create composite index
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);

-- ✅ AFTER: No filesort, 70x faster!

Composite Index Principle: The order of the columns matters: (WHERE column, ORDER BY column). Put the equality-filtered WHERE column first, followed by the sorting column.

2. Increase sort_buffer_size

SHOW VARIABLES LIKE 'sort_buffer_size';
SET SESSION sort_buffer_size = 16777216; -- 16MB

⚠️ Warning:

  • Every connection creates its own buffer!
  • 100 connections × 16MB = 1.6GB RAM
  • Suggested buffer: 256KB - 16MB
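Before raising the buffer, check whether disk sorts are actually happening; a sketch using the server-wide counter, raising the buffer only for the session that needs it:

```sql
-- If this counter grows steadily, sorts are spilling to disk
SHOW GLOBAL STATUS LIKE 'Sort_merge_passes';

-- Raise sort_buffer_size only for the session that needs it
SET SESSION sort_buffer_size = 4 * 1024 * 1024;  -- 4MB, this connection only
```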

3. Utilize LIMIT

-- Only retrieve what is strictly necessary
SELECT * FROM orders ORDER BY order_date DESC LIMIT 10;

MySQL optimizes filesort better when a LIMIT is present: instead of sorting everything, it can keep only the top N rows in a priority queue.
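You can verify that MySQL actually switched to this top-N strategy through the optimizer trace; a sketch (when the priority queue is chosen, the trace JSON contains a filesort_priority_queue_optimization entry):

```sql
SET optimizer_trace = 'enabled=on';
SELECT * FROM orders ORDER BY order_date DESC LIMIT 10;
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace = 'enabled=off';
-- Search the trace output for "filesort_priority_queue_optimization"
```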

4. Only SELECT Required Columns

-- ❌ SLOWER
SELECT * FROM orders ORDER BY order_date;

-- ✅ MUCH FASTER
SELECT order_id, order_date, total_amount FROM orders ORDER BY order_date;

Reduces the amount of data each row occupies in the sort buffer.

5. Covering Index

-- An index containing every column the query needs
CREATE INDEX idx_covering ON orders(order_date, order_id, total_amount);
SELECT order_id, order_date, total_amount FROM orders ORDER BY order_date;
-- Extra: Using index (the best!)

6. Avoid ORDER BY with Functions

-- ❌ Cannot use an index
SELECT * FROM orders ORDER BY ABS(total_amount);

-- ✅ Can use an index
SELECT * FROM orders ORDER BY total_amount;

7. Guarantee Deterministic Sorting

-- ❌ Non-deterministic sorting
SELECT * FROM orders ORDER BY status LIMIT 10;

-- ✅ Deterministic sorting
SELECT * FROM orders ORDER BY status, order_id LIMIT 10;

When several rows share the same ORDER BY value, adding a unique tiebreaker column (order_id) makes the result order deterministic across executions.
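Deterministic ordering is also what makes keyset (seek) pagination possible; a sketch, assuming the last row of the previous page had status 'completed' and order_id 12345:

```sql
-- Next page: everything strictly after the last row already shown
SELECT * FROM orders
WHERE (status, order_id) > ('completed', 12345)
ORDER BY status, order_id
LIMIT 10;
```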

8. Isolate Large Fields

-- Main table: small, frequently sorted columns (illustrative redesign)
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATETIME,
    total_amount DECIMAL(10,2)
);

-- Separate Table: TEXT/BLOB
CREATE TABLE order_details (
    order_id INT PRIMARY KEY,
    description TEXT,
    notes TEXT
);
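With the split in place, sort and paginate on the narrow table first and fetch the heavy text only for the final page; a sketch of this "deferred join" pattern:

```sql
-- Sort/paginate on the narrow table, then join in the TEXT columns
SELECT o.order_id, o.order_date, o.total_amount, d.description
FROM (
    SELECT order_id, order_date, total_amount
    FROM orders
    ORDER BY order_date DESC
    LIMIT 10
) AS o
JOIN order_details d ON d.order_id = o.order_id;
```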

IV. The Problem with Large Fields

Fields like TEXT, BLOB, or very wide VARCHAR columns can severely slow down sorting. Why?

Impact on the Sort Buffer

Formula: Buffer Fit Rows = sort_buffer_size / avg_row_size

Example 1: Small Rows (100 bytes)
2MB buffer / 100 bytes = 20,000 rows ✅

Example 2: Rows containing TEXT columns (5KB)
2MB buffer / 5KB = 400 rows ❌
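The two examples can be reproduced with a quick calculation (the numbers are approximate; a real sort row also carries the sort key and per-row metadata):

```sql
SELECT FLOOR(2 * 1024 * 1024 / 100)  AS rows_per_buffer_small,  -- ≈ 20,000
       FLOOR(2 * 1024 * 1024 / 5120) AS rows_per_buffer_text;   -- ≈ 400
```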

Why are Large Fields a Problem?

  1. They fill the sort buffer quickly → MySQL spills the sort to disk.
  2. Indexes that include them grow large → less of the data fits in the RAM cache.
  3. They crowd other pages out of the InnoDB buffer pool.
  4. Disk I/O multiplies: the merge phase turns into many extra read/write passes.

Solution:

DO:

  • Move TEXT/BLOB columns into separate tables.
  • Use prefix indexes for long VARCHAR columns: CREATE INDEX idx(column(50))
  • Only SELECT the columns you need!

DON'T:

  • Use SELECT * on tables that contain TEXT/BLOB columns.
  • ORDER BY long string columns, or expressions computed over them.
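A prefix index in concrete form (note: a prefix index helps WHERE lookups and shrinks the index, but it cannot fully order by the whole column):

```sql
-- Index only the first 50 characters of a long VARCHAR
CREATE INDEX idx_email_prefix ON customers(email(50));
```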

V. Case Study: E-commerce Dashboard

Scenario: An e-commerce system displays the 50 most recent orders for any given user. The underlying orders table holds 5 million rows.

Before Tuning:

SELECT * FROM orders WHERE customer_id = 12345 ORDER BY created_at DESC LIMIT 50;

❌ Measured time: 8.472 seconds.

  • idx_customer correctly filters by customer.
  • SELECT * drags along heavy columns, including a description TEXT field.
  • The sort buffer overflows: the sort spills to disk with 5 merge passes.

After Tuning:

Step 1: Construct a Composite index

CREATE INDEX idx_customer_created ON orders(customer_id, created_at DESC);

Step 2: SELECT only the needed columns

SELECT order_id, created_at, total_amount, status, shipping_address 
FROM orders WHERE customer_id = 12345 ORDER BY created_at DESC LIMIT 50;

Results: Execution time dropped to 29.8ms!

  • Rows stream straight from the index in order: no filesort, no disk I/O.

ROI: At roughly 10,000 such queries per day, that is about 23 hours of database time saved daily.


VI. Best Practices

✅ DO - What you should focus on

  1. Create indexes for the ORDER BY columns of your most frequent queries.
  2. Design composite indexes that pair the WHERE and ORDER BY columns.
  3. Add a LIMIT whenever the caller only needs the top rows.
  4. Profile with EXPLAIN and monitor the Sort_merge_passes metric.
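Point 4 can be automated: performance_schema (MySQL 5.7+) aggregates sort statistics per query digest; a sketch for finding the worst filesort offenders server-wide:

```sql
-- Statements whose sorts have spilled to disk, worst first
SELECT DIGEST_TEXT, COUNT_STAR, SUM_SORT_MERGE_PASSES, SUM_SORT_ROWS
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_SORT_MERGE_PASSES > 0
ORDER BY SUM_SORT_MERGE_PASSES DESC
LIMIT 10;
```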

❌ DON'T - What to avoid

  1. Use SELECT * instead of listing only the columns you need.
  2. Apply functions to the sort key: ORDER BY ABS().
  3. Oversize sort_buffer_size globally (it is allocated per connection).
  4. Skip diagnostics on complex queries in production.