Have you ever encountered a situation where an ORDER BY query runs extremely slowly, taking seconds or even tens of seconds? The problem usually lies in filesort - one of the most common reasons for a massive drop in MySQL performance. In this article, we'll dive deep into how MySQL processes ORDER BY, how filesort works, and most importantly - how to optimize it to achieve maximum performance (potentially improving it up to 70x).
I. What is Filesort?
Before diving into details, let's understand filesort - the most crucial concept in this article:
Filesort is the sorting algorithm MySQL uses when it cannot use an index to sort the ORDER BY results. Despite the name "filesort", it doesn't always use a file on the disk:
Filesort in RAM (Fast):
- Occurs when the data to be sorted fits inside sort_buffer_size (default 256KB-2MB).
- Time: 50-200ms for 10,000 rows.
- Sort_merge_passes = 0.
Filesort on Disk (Very Slow):
- Occurs when the data is larger than the sort buffer.
- MySQL has to split the data into chunks, sort each chunk, write it to disk, and then merge the sorted chunks back together.
- Time: 5-30+ seconds for 1M+ rows.
- Sort_merge_passes > 0 (can reach 10-30 passes!).
- Disk I/O is 100-1000 times slower than RAM.
Why is filesort slow?
- It must read all data requiring a sort into memory.
- If it doesn't fit in memory → it must read/write to disk multiple times.
- The merge operation consumes a lot of CPU and I/O.
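You can check whether a sort spilled to disk by comparing the session sort counters before and after running the query (a quick diagnostic, complementing EXPLAIN):

```sql
-- Note the counters first
SHOW SESSION STATUS LIKE 'Sort%';

-- ...run the ORDER BY query you want to inspect, then check again:
-- Sort_merge_passes > 0  → the sort spilled to disk
-- Sort_range / Sort_scan → how rows were fetched for the sort
SHOW SESSION STATUS LIKE 'Sort%';
```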
Our Goal: Completely eliminate filesort by creating indexes properly!
Preparing a Sample Database for Practice
To help you practice and verify the optimization techniques in this article, here's a sample database with realistic data.
Step 1: Create Database and Tables
-- Create a new database
CREATE DATABASE IF NOT EXISTS performance_demo;
USE performance_demo;
-- Create customers table
CREATE TABLE customers (
customer_id INT PRIMARY KEY AUTO_INCREMENT,
customer_name VARCHAR(100) NOT NULL,
email VARCHAR(100),
country VARCHAR(50),
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;
-- Create orders table
CREATE TABLE orders (
order_id INT PRIMARY KEY AUTO_INCREMENT,
customer_id INT NOT NULL,
order_date DATETIME NOT NULL,
total_amount DECIMAL(10, 2) NOT NULL,
status ENUM('pending', 'processing', 'completed', 'cancelled'),
shipping_country VARCHAR(50),
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
INDEX idx_customer (customer_id),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
) ENGINE=InnoDB;
Step 2: Create Stored Procedure to Generate Data
DELIMITER $
CREATE PROCEDURE generate_customers(IN num_customers INT)
BEGIN
DECLARE i INT DEFAULT 1;
WHILE i <= num_customers DO
INSERT INTO customers (customer_name, email, country, created_at)
VALUES (
CONCAT('Customer_', i),
CONCAT('customer', i, '@email.com'),
ELT(FLOOR(1 + RAND() * 7), 'Vietnam', 'USA', 'UK', 'Japan', 'Korea', 'Singapore', 'Thailand'),
DATE_SUB(NOW(), INTERVAL FLOOR(RAND() * 730) DAY)
);
SET i = i + 1;
END WHILE;
END$
CREATE PROCEDURE generate_orders(IN num_orders INT)
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE max_customer INT;
SELECT MAX(customer_id) INTO max_customer FROM customers;
WHILE i <= num_orders DO
INSERT INTO orders (customer_id, order_date, total_amount, status, shipping_country, created_at)
VALUES (
1 + FLOOR(RAND() * max_customer),
DATE_SUB(NOW(), INTERVAL FLOOR(RAND() * 365) DAY),
10 + (RAND() * 990),
ELT(FLOOR(1 + RAND() * 4), 'pending', 'processing', 'completed', 'cancelled'),
ELT(FLOOR(1 + RAND() * 7), 'Vietnam', 'USA', 'UK', 'Japan', 'Korea', 'Singapore', 'Thailand'),
DATE_SUB(NOW(), INTERVAL FLOOR(RAND() * 365) DAY)
);
SET i = i + 1;
END WHILE;
END$
DELIMITER ;
Step 3: Generate the Sample Data
-- Create 1,000 customers and 100,000 orders
CALL generate_customers(1000);
CALL generate_orders(100000);
-- Verify the data
SELECT COUNT(*) FROM customers;
SELECT COUNT(*) FROM orders;
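Row-by-row inserts inside a procedure commit one transaction per row by default, which makes data generation slow on InnoDB. An optional speed-up (shown here as a sketch) is to wrap the calls in a single transaction:

```sql
-- Commit once at the end instead of once per INSERT
SET autocommit = 0;
CALL generate_customers(1000);
CALL generate_orders(100000);
COMMIT;
SET autocommit = 1;
```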
Step 4: Check Performance BEFORE Optimization
-- Enable profiling for accurate metrics
SET profiling = 1;
EXPLAIN SELECT order_id, order_date, total_amount, status
FROM orders
WHERE customer_id = 100
ORDER BY order_date DESC
LIMIT 20;
-- Execute query and measure time
SELECT order_id, order_date, total_amount, status
FROM orders
WHERE customer_id = 100
ORDER BY order_date DESC
LIMIT 20;
-- View execution time
SHOW PROFILES;
❌ EXPLAIN Results before Optimization:
- type: ref → uses the idx_customer index (good for the WHERE filter)
- key: idx_customer → only filters on customer_id
- rows: 116 → must examine 116 rows
- Extra: Using filesort → ❌ STILL requires manual sorting!
The Issue: The index covers only customer_id, not order_date, so MySQL cannot read the rows in chronological order straight from the index.
⏱️ Execution Time: ~0.0234 seconds (23.4ms with 100,000 rows). With a larger dataset (1M rows), time can balloon up to 0.5-2 seconds.
Step 5: Create Optimized Index and Remeasure
-- Create a composite index
-- (DESC index keys require MySQL 8.0+; older versions ignore the DESC,
--  but can still scan the index backward)
CREATE INDEX idx_customer_date ON orders(customer_id, order_date DESC);
-- Reset profiling and measure again
SET profiling = 0;
SET profiling = 1;
-- Check the execution plan again
EXPLAIN SELECT order_id, order_date, total_amount, status
FROM orders
WHERE customer_id = 100
ORDER BY order_date DESC
LIMIT 20;
-- Run the query itself so SHOW PROFILES has a timing to report
SELECT order_id, order_date, total_amount, status
FROM orders
WHERE customer_id = 100
ORDER BY order_date DESC
LIMIT 20;
-- View execution time
SHOW PROFILES;
✅ EXPLAIN Results AFTER optimization:
- key: idx_customer_date → uses the NEW composite index
- rows: 20 → only examines 20 rows (instead of 116!)
- Extra: Using where → ✅ no more "Using filesort"!
⏱️ Execution Time: 0.0012 seconds (1.2ms)
Comparison:
- ✅ Time reduced from 23.4ms to 1.2ms (100,000-row dataset)
- ✅ With a larger dataset (1M rows, customers with ~5,000 orders): from 1.8s to 3ms
- ✅ Filesort eliminated entirely
- ✅ Rows examined reduced from 116 to 20
II. How Does ORDER BY Work?
When you run a query using ORDER BY, MySQL has two ways to sort the results:
1. Using Index (Most Optimized)
If a suitable index exists, MySQL reads the data in the predefined sorted order from the index. No extra sorting effort is necessary.
CREATE INDEX idx_order_date ON orders(order_date);
-- ✅ If you only SELECT columns present in the index
SELECT order_id, order_date FROM orders ORDER BY order_date;
-- Extra: Using index (Perfect, no filesort!)
-- ⚠️ If you SELECT * or multiple unindexed columns
SELECT * FROM orders ORDER BY order_date;
-- Extra: Using filesort (must still sort because it needs to read extra columns from the table!)
Crucial Note: An index alone isn't enough! MySQL avoids filesort only if:
- There's an index on the ORDER BY column, AND
- the query SELECTs only columns contained in the index (covering index), OR
- the cost of reading sequentially from the index plus a table lookup per row is cheaper than performing a filesort.
2. Using Filesort (No Index Available)
When there's no suitable index, MySQL must perform a filesort - a manual sorting operation.
SELECT * FROM orders WHERE status = 'completed' ORDER BY total_amount;
-- ❌ Must sort manually → SLOW
When Does MySQL Avoid Filesort?
MySQL avoids filesort when one of the following holds:
✅ Case 1: Covering Index
CREATE INDEX idx_date_id_total ON orders(order_date, order_id, total_amount);
SELECT order_id, order_date, total_amount FROM orders ORDER BY order_date;
-- Extra: Using index (all columns exist in the index!)
✅ Case 2: Index + Table Lookup is cheaper than Filesort
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);
SELECT * FROM orders WHERE customer_id = 123 ORDER BY order_date LIMIT 20;
-- Avoids filesort because it only needs to sequentially pull 20 rows from the index
❌ Case where Filesort is Mandatory
-- No index on order_date
SELECT * FROM orders ORDER BY order_date;
-- Has index but SELECT * grabs a bunch of columns
SELECT * FROM orders ORDER BY order_date;
-- MySQL assumes filesort is cheaper than an index scan + a bunch of table lookups
-- Index doesn't match the ORDER BY field
SELECT * FROM orders WHERE status = 'completed' ORDER BY total_amount;
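The last query above can also be rescued with a composite index. A sketch (the index name is illustrative):

```sql
-- Equality-filtered column first, then the sort column
CREATE INDEX idx_status_amount ON orders(status, total_amount);

SELECT * FROM orders WHERE status = 'completed' ORDER BY total_amount;
-- EXPLAIN's Extra column should no longer show "Using filesort"
-- (assuming the optimizer picks the index over a full table scan)
```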
III. 8 Strategies to Optimize ORDER BY
1. Create the Appropriate Index (Most Important)
-- ❌ BEFORE: Filesort
SELECT * FROM orders WHERE customer_id = 123 ORDER BY order_date;
-- Create composite index
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);
-- ✅ AFTER: No filesort, 70x faster!
Composite Index Principle: The order of the columns matters: (WHERE column, ORDER BY column). Put the equality-filtered WHERE column first, followed by the sorting column. Note that this eliminates filesort only when the WHERE condition is an equality; with a range predicate (>, <, BETWEEN) on the first column, the index can no longer deliver rows in ORDER BY order.
2. Increase sort_buffer_size
SHOW VARIABLES LIKE 'sort_buffer_size';
SET SESSION sort_buffer_size = 16777216; -- 16MB
⚠️ Warning:
- Every connection creates its own buffer!
- 100 connections × 16MB = 1.6GB RAM
- Suggested buffer: 256KB - 16MB
3. Utilize LIMIT
-- Only retrieve what is strictly necessary
SELECT * FROM orders ORDER BY order_date DESC LIMIT 10;
MySQL optimizes filesort when a LIMIT is present: if the top-N rows fit in the sort buffer, it keeps only those N rows in an in-memory priority queue instead of sorting the full result set.
4. Only SELECT Required Columns
-- ❌ SLOWER
SELECT * FROM orders ORDER BY order_date;
-- ✅ MUCH FASTER
SELECT order_id, order_date, total_amount FROM orders ORDER BY order_date;
Fewer columns means a smaller per-row footprint in the sort buffer, so far more rows fit in memory.
5. Covering Index
-- An index containing every column the query reads
CREATE INDEX idx_covering ON orders(order_date, order_id, total_amount);
SELECT order_id, order_date, total_amount FROM orders ORDER BY order_date;
-- Extra: Using index (the best!)
6. Avoid ORDER BY with Functions
-- ❌ Cannot use an index
SELECT * FROM orders ORDER BY ABS(total_amount);
-- ✅ Can use an index
SELECT * FROM orders ORDER BY total_amount;
7. Guarantee Deterministic Sorting
-- ❌ Non-deterministic sorting
SELECT * FROM orders ORDER BY status LIMIT 10;
-- ✅ Deterministic sorting
SELECT * FROM orders ORDER BY status, order_id LIMIT 10;
When multiple rows share the same ORDER BY value, their relative order is undefined and can change between runs. Appending a unique column (order_id) as a tiebreaker makes the result deterministic.
8. Isolate Large Fields
-- Primary Table: Small footprint data
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATETIME,
total_amount DECIMAL(10,2)
);
-- Separate Table: TEXT/BLOB
CREATE TABLE order_details (
order_id INT PRIMARY KEY,
description TEXT,
notes TEXT
);
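If you cannot restructure the tables, a deferred join gives a similar effect: sort and limit on the slim columns first, then fetch the heavy fields only for the winning rows. A sketch using the tables above:

```sql
-- Sort/LIMIT on the narrow columns, then join in the TEXT columns
SELECT o.order_id, o.order_date, o.total_amount, d.description
FROM (
    SELECT order_id, order_date, total_amount
    FROM orders
    ORDER BY order_date DESC
    LIMIT 50
) o
JOIN order_details d ON d.order_id = o.order_id;
```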
IV. The Problem with Large Fields
Columns such as TEXT, BLOB, or very long VARCHARs can drastically slow down ORDER BY queries. Why?
Impact on the Sort Buffer
Formula: rows that fit in the buffer = sort_buffer_size / avg_row_size
Example 1: Small rows (100 bytes)
2MB buffer / 100 bytes = ~20,000 rows ✅
Example 2: Rows containing TEXT columns (5KB)
2MB buffer / 5KB = ~400 rows ❌
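Both inputs to this formula can be estimated from the server itself (avg_row_length is an InnoDB statistics estimate, so treat the result as a ballpark figure):

```sql
SELECT t.avg_row_length,
       @@sort_buffer_size AS sort_buffer_bytes,
       @@sort_buffer_size DIV t.avg_row_length AS approx_rows_per_buffer
FROM information_schema.tables t
WHERE t.table_schema = 'performance_demo'
  AND t.table_name   = 'orders';
```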
Why are Large Fields a Problem?
- They fill the sort buffer quickly → MySQL spills the sort to disk.
- The data being sorted becomes much larger → more temporary-file traffic.
- Oversized rows crowd useful pages out of the InnoDB buffer pool.
- I/O escalates into heavy read/write cycles on physical disks.
Solution:
✅ DO:
- Move TEXT/BLOB columns into a separate table.
- Use prefix indexes for long VARCHAR columns: CREATE INDEX idx ON t(column(50))
- SELECT only the columns you actually need!
❌ DON'T:
- Use SELECT * on tables containing TEXT/BLOB columns.
- ORDER BY long string columns.
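A prefix index keeps the index entries small; note that it can satisfy a WHERE filter on the column's leading characters, but it cannot act as a covering index or fully order the column. A sketch on the sample schema (index name is illustrative):

```sql
-- Index only the first 20 characters of customer_name
CREATE INDEX idx_name_prefix ON customers(customer_name(20));
```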
V. Case Study: E-commerce Dashboard
Scenario: An e-commerce system displays the 50 most recent orders for any given user. The underlying orders table contains 5 million rows.
Before Tuning:
SELECT * FROM orders WHERE customer_id = 12345 ORDER BY created_at DESC LIMIT 50;
❌ Measured time: 8.472 seconds.
- idx_customer filters by customer_id just fine, but...
- SELECT * drags along heavy columns such as description TEXT.
- The sort buffer spills to disk, requiring 5 separate merge passes.
After Tuning:
Step 1: Construct a Composite index
CREATE INDEX idx_customer_created ON orders(customer_id, created_at DESC);
Step 2: Select Only the Necessary Columns
SELECT order_id, created_at, total_amount, status, shipping_address
FROM orders WHERE customer_id = 12345 ORDER BY created_at DESC LIMIT 50;
✅ Result: execution time dropped to 29.8ms!
- The index is read sequentially; no filesort, no disk I/O.
ROI: At roughly 10,000 such queries per day, that saves about 23 hours of cumulative database time daily.
VI. Best Practices
✅ DO - What you should focus on
- Always create indexes for frequently queried ORDER BY columns.
- Build composite indexes that pair the WHERE and ORDER BY columns.
- Add LIMIT constraints wherever possible.
- Profile with EXPLAIN and monitor the Sort_merge_passes metric.
❌ DON'T - What to avoid
- Use SELECT * instead of listing the columns you need.
- Apply functions to the sort column: ORDER BY ABS(...).
- Oversize the sort_buffer_size parameter.
- Skip diagnostics on complex production queries.