Tags: mysql, optimization, database

Optimizing ORDER BY: Issues Involving Filesort and Large Fields

Dive deep into how MySQL processes ORDER BY, how filesort works, and how to optimize it to achieve maximum performance and eliminate severe slowdowns.

Have you ever encountered a situation where an ORDER BY query runs extremely slowly, taking seconds or even tens of seconds? The problem usually lies in filesort - one of the most common reasons for a massive drop in MySQL performance. In this article, we'll dive deep into how MySQL processes ORDER BY, how filesort works, and most importantly - how to optimize it to achieve maximum performance (potentially improving it up to 70x).

I. What is Filesort?

Before diving into details, let's understand filesort - the most crucial concept in this article:

Filesort is the sorting algorithm MySQL uses when it cannot use an index to sort the ORDER BY results. Despite the name "filesort", it doesn't always use a file on the disk:

Filesort in RAM (Fast):

When the data to be sorted fits inside the sort_buffer_size (default 256KB, tunable per session).

  • Time: 50-200ms for 10,000 rows
  • Sort_merge_passes = 0

Filesort on Disk (Very Slow):

When the data is larger than the sort buffer.

  • MySQL has to break it down, sort each chunk, write it to disk, and then merge it back.
  • Time: 5-30+ seconds for 1M+ rows.
  • Sort_merge_passes > 0 (can reach 10-30 merge passes!)
  • Disk I/O is 100-1000 times slower than RAM.
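You can check which of the two cases a query hit by inspecting the session sort counters; a minimal sketch using standard MySQL status variables:

```sql
-- Reset session counters, run your ORDER BY query, then inspect the results
FLUSH STATUS;
-- ... run your ORDER BY query here ...
SHOW SESSION STATUS LIKE 'Sort%';
-- Sort_merge_passes = 0  → the sort fit in sort_buffer_size (RAM)
-- Sort_merge_passes > 0  → the sort spilled to disk and had to be merged back
```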

Why is filesort slow?

  1. It must read all data requiring a sort into memory.
  2. If it doesn't fit in memory → it must read/write to disk multiple times.
  3. The merge operation consumes a lot of CPU and I/O.

Our Goal: Completely eliminate filesort by creating indexes properly!


Preparing a Sample Database for Practice

To help you practice and verify the optimization techniques in this article, here's a sample database with realistic data.

Step 1: Create Database and Tables

-- Create a new database
CREATE DATABASE IF NOT EXISTS performance_demo;
USE performance_demo;

-- Create customers table
CREATE TABLE customers (
    customer_id INT PRIMARY KEY AUTO_INCREMENT,
    customer_name VARCHAR(100) NOT NULL,
    email VARCHAR(100),
    country VARCHAR(50),
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;

-- Create orders table
CREATE TABLE orders (
    order_id INT PRIMARY KEY AUTO_INCREMENT,
    customer_id INT NOT NULL,
    order_date DATETIME NOT NULL,
    total_amount DECIMAL(10, 2) NOT NULL,
    status ENUM('pending', 'processing', 'completed', 'cancelled'),
    shipping_country VARCHAR(50),
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_customer (customer_id),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
) ENGINE=InnoDB;

Step 2: Create Stored Procedure to Generate Data

DELIMITER $
CREATE PROCEDURE generate_customers(IN num_customers INT)
BEGIN
    DECLARE i INT DEFAULT 1;
    WHILE i <= num_customers DO
        INSERT INTO customers (customer_name, email, country, created_at)
        VALUES (
            CONCAT('Customer_', i),
            CONCAT('customer', i, '@email.com'),
            ELT(FLOOR(1 + RAND() * 7), 'Vietnam', 'USA', 'UK', 'Japan', 'Korea', 'Singapore', 'Thailand'),
            DATE_SUB(NOW(), INTERVAL FLOOR(RAND() * 730) DAY)
        );
        SET i = i + 1;
    END WHILE;
END$

CREATE PROCEDURE generate_orders(IN num_orders INT)
BEGIN
    DECLARE i INT DEFAULT 1;
    DECLARE max_customer INT;
    SELECT MAX(customer_id) INTO max_customer FROM customers;
    
    WHILE i <= num_orders DO
        INSERT INTO orders (customer_id, order_date, total_amount, status, shipping_country, created_at)
        VALUES (
            1 + FLOOR(RAND() * max_customer),
            DATE_SUB(NOW(), INTERVAL FLOOR(RAND() * 365) DAY),
            10 + (RAND() * 990),
            ELT(FLOOR(1 + RAND() * 4), 'pending', 'processing', 'completed', 'cancelled'),
            ELT(FLOOR(1 + RAND() * 7), 'Vietnam', 'USA', 'UK', 'Japan', 'Korea', 'Singapore', 'Thailand'),
            DATE_SUB(NOW(), INTERVAL FLOOR(RAND() * 365) DAY)
        );
        SET i = i + 1;
    END WHILE;
END$
DELIMITER ;

Step 3: Generate the Sample Data

-- Create 1,000 customers and 100,000 orders
CALL generate_customers(1000);
CALL generate_orders(100000);

-- Verify the data
SELECT COUNT(*) FROM customers;
SELECT COUNT(*) FROM orders;
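After bulk-loading, it can also help to refresh the optimizer's index statistics so the row estimates in EXPLAIN are accurate; a quick optional step:

```sql
-- Refresh index statistics after generating the sample data
ANALYZE TABLE customers, orders;
```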

Step 4: Check Performance BEFORE Optimization

-- Enable profiling for accurate metrics
SET profiling = 1;

EXPLAIN SELECT order_id, order_date, total_amount, status 
FROM orders 
WHERE customer_id = 100 
ORDER BY order_date DESC 
LIMIT 20;

-- Execute query and measure time
SELECT order_id, order_date, total_amount, status 
FROM orders 
WHERE customer_id = 100 
ORDER BY order_date DESC 
LIMIT 20;

-- View execution time
SHOW PROFILES;

EXPLAIN Results before Optimization:

  • type: ref → Uses idx_customer index (Good for WHERE)
  • key: idx_customer → Only filters customer_id
  • rows: 116 → Must examine 116 rows
  • Extra: Using filesort → ❌ STILL REQUIRES manual sorting!

The Issue: The index only covers customer_id, not order_date, so MySQL cannot read the rows in order_date order from the index and has to sort them itself.

⏱️ Execution Time: ~0.0234 seconds (23.4ms with 100,000 rows). With a larger dataset (1M rows), time can balloon up to 0.5-2 seconds.
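To see where those milliseconds go, each entry from SHOW PROFILES can be drilled into stage by stage; a sketch (the query number 2 is an assumption, read the real Query_ID off your own SHOW PROFILES output):

```sql
-- Stage-by-stage timing for one profiled query
-- (replace 2 with the Query_ID shown by SHOW PROFILES)
SHOW PROFILE FOR QUERY 2;
-- Look for the "Sorting result" stage: that is the filesort cost
```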

Step 5: Create Optimized Index and Remeasure

-- Create a composite index
-- (a descending index key requires MySQL 8.0+; on older versions an
-- ascending index still avoids filesort via a backward index scan)
CREATE INDEX idx_customer_date ON orders(customer_id, order_date DESC);

-- Reset profiling and measure again
SET profiling = 0;
SET profiling = 1;

-- Run query again
EXPLAIN SELECT order_id, order_date, total_amount, status 
FROM orders 
WHERE customer_id = 100 
ORDER BY order_date DESC 
LIMIT 20;

-- View execution time
SHOW PROFILES;

EXPLAIN Results AFTER optimization:

  • key: idx_customer_date → Uses the NEW composite index
  • rows: 20 → Only examines 20 rows (instead of 116!)
  • Extra: Using where → ✅ NO MORE "Using filesort"!

⏱️ Execution Time: 0.0012 seconds (1.2ms)

Comparison:

  • ✅ Removed filesort entirely
  • ✅ Decreased rows examined from 116 to 20
  • ✅ Reduced time from 23.4ms to 1.2ms on this 100,000-row dataset
  • ✅ On a larger dataset (1M rows, a customer with ~5,000 orders), time drops from ~1.8s to ~3ms
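On MySQL 8.0.18+ you can go one step further than plain EXPLAIN and get measured per-step timings; a sketch against the same query:

```sql
-- EXPLAIN ANALYZE executes the query and reports actual time/rows per plan node
EXPLAIN ANALYZE
SELECT order_id, order_date, total_amount, status
FROM orders
WHERE customer_id = 100
ORDER BY order_date DESC
LIMIT 20;
-- With idx_customer_date in place there should be no "Sort" node in the plan
```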

II. How Does ORDER BY Work?

When you run a query using ORDER BY, MySQL has two ways to sort the results:

1. Using Index (Most Optimized)

If a suitable index exists, MySQL reads the data in the predefined sorted order from the index. No extra sorting effort is necessary.

CREATE INDEX idx_order_date ON orders(order_date);

-- ✅ If you only SELECT columns present in the index
SELECT order_id, order_date FROM orders ORDER BY order_date;
-- Extra: Using index (Perfect, no filesort!)

-- ⚠️ If you SELECT * or multiple unindexed columns
SELECT * FROM orders ORDER BY order_date;
-- Extra: Using filesort (must still sort because it needs to read extra columns from the table!)

Crucial Note: An index alone isn't enough! MySQL avoids filesort only if:

  1. There's an index on the ORDER BY column.
  2. The query only SELECTs columns contained in the index (Covering index).
  3. OR the cost of reading sequentially from the index + a table lookup is cheaper than performing a filesort.

2. Using Filesort (No Index Available)

When there's no suitable index, MySQL must perform a filesort - a manual sorting operation.

SELECT * FROM orders WHERE status = 'completed' ORDER BY total_amount; 
-- ❌ Must sort manually → SLOW

When Does MySQL Avoid Filesort?

MySQL avoids filesort when one of the following conditions holds:

Case 1: Covering Index

CREATE INDEX idx_date_id_total ON orders(order_date, order_id, total_amount);
SELECT order_id, order_date, total_amount FROM orders ORDER BY order_date; 
-- Extra: Using index (all columns exist in the index!)

Case 2: Index + Table Lookup is cheaper than Filesort

CREATE INDEX idx_customer_date ON orders(customer_id, order_date);
SELECT * FROM orders WHERE customer_id = 123 ORDER BY order_date LIMIT 20; 
-- Avoids filesort because it only needs to sequentially pull 20 rows from the index

Case where Filesort is Mandatory

-- No index on order_date
SELECT * FROM orders ORDER BY order_date;

-- Index exists, but SELECT * pulls many columns
SELECT * FROM orders ORDER BY order_date; 
-- MySQL estimates that filesort is cheaper than an index scan plus many table lookups

-- Index doesn't match the ORDER BY field
SELECT * FROM orders WHERE status = 'completed' ORDER BY total_amount;

III. 8 Strategies to Optimize ORDER BY

1. Create the Appropriate Index (Most Important)

-- ❌ BEFORE: Filesort
SELECT * FROM orders WHERE customer_id = 123 ORDER BY order_date;

-- Create composite index
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);

-- ✅ AFTER: No filesort, 70x faster!

Composite Index Principle: The order of the columns matters: (WHERE column, ORDER BY column). Put the equality-filtered WHERE column first, followed by the sorting column.

2. Increase sort_buffer_size

SHOW VARIABLES LIKE 'sort_buffer_size';
SET SESSION sort_buffer_size = 16777216; -- 16MB

⚠️ Warning:

  • Every connection creates its own buffer!
  • 100 connections × 16MB = 1.6GB RAM
  • Suggested buffer: 256KB - 16MB
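Before raising the buffer, check whether disk sorts are actually happening; a sketch using the server-wide counter, raising the buffer only for the session that needs it:

```sql
-- If this counter grows steadily, sorts are spilling to disk
SHOW GLOBAL STATUS LIKE 'Sort_merge_passes';

-- Raise sort_buffer_size only for the session that needs it
SET SESSION sort_buffer_size = 4 * 1024 * 1024;  -- 4MB, this connection only
```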

3. Utilize LIMIT

-- Only retrieve what is strictly necessary
SELECT * FROM orders ORDER BY order_date DESC LIMIT 10;

MySQL optimizes filesort better when a LIMIT is present: instead of sorting everything, it can keep only the top N rows in a priority queue.
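You can verify that MySQL actually switched to this top-N strategy through the optimizer trace; a sketch (when the priority queue is chosen, the trace JSON contains a filesort_priority_queue_optimization entry):

```sql
SET optimizer_trace = 'enabled=on';
SELECT * FROM orders ORDER BY order_date DESC LIMIT 10;
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace = 'enabled=off';
-- Search the trace output for "filesort_priority_queue_optimization"
```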

4. Only SELECT Required Columns

-- ❌ SLOWER
SELECT * FROM orders ORDER BY order_date;

-- ✅ MUCH FASTER
SELECT order_id, order_date, total_amount FROM orders ORDER BY order_date;

Reduces the amount of data each row occupies in the sort buffer.

5. Covering Index

-- An index containing every column the query needs
CREATE INDEX idx_covering ON orders(order_date, order_id, total_amount);
SELECT order_id, order_date, total_amount FROM orders ORDER BY order_date;
-- Extra: Using index (the best!)

6. Avoid ORDER BY with Functions

-- ❌ Cannot use an index
SELECT * FROM orders ORDER BY ABS(total_amount);

-- ✅ Can use an index
SELECT * FROM orders ORDER BY total_amount;

7. Guarantee Deterministic Sorting

-- ❌ Non-deterministic sorting
SELECT * FROM orders ORDER BY status LIMIT 10;

-- ✅ Deterministic sorting
SELECT * FROM orders ORDER BY status, order_id LIMIT 10;

When several rows share the same ORDER BY value, adding a unique tiebreaker column (order_id) makes the result order deterministic across executions.
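Deterministic ordering is also what makes keyset (seek) pagination possible; a sketch, assuming the last row of the previous page had status 'completed' and order_id 12345:

```sql
-- Next page: everything strictly after the last row already shown
SELECT * FROM orders
WHERE (status, order_id) > ('completed', 12345)
ORDER BY status, order_id
LIMIT 10;
```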

8. Isolate Large Fields

-- Main table: small, frequently sorted columns (illustrative redesign)
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATETIME,
    total_amount DECIMAL(10,2)
);

-- Separate Table: TEXT/BLOB
CREATE TABLE order_details (
    order_id INT PRIMARY KEY,
    description TEXT,
    notes TEXT
);
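With the split in place, sort and paginate on the narrow table first and fetch the heavy text only for the final page; a sketch of this "deferred join" pattern:

```sql
-- Sort/paginate on the narrow table, then join in the TEXT columns
SELECT o.order_id, o.order_date, o.total_amount, d.description
FROM (
    SELECT order_id, order_date, total_amount
    FROM orders
    ORDER BY order_date DESC
    LIMIT 10
) AS o
JOIN order_details d ON d.order_id = o.order_id;
```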

IV. The Problem with Large Fields

Fields like TEXT, BLOB, or very wide VARCHAR columns can severely slow down sorting. Why?

Impact on the Sort Buffer

Formula: Buffer Fit Rows = sort_buffer_size / avg_row_size

Example 1: Small Rows (100 bytes)
2MB buffer / 100 bytes = 20,000 rows ✅

Example 2: Rows containing TEXT columns (5KB)
2MB buffer / 5KB = 400 rows ❌
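The two examples can be reproduced with a quick calculation (the numbers are approximate; a real sort row also carries the sort key and per-row metadata):

```sql
SELECT FLOOR(2 * 1024 * 1024 / 100)  AS rows_per_buffer_small,  -- ≈ 20,000
       FLOOR(2 * 1024 * 1024 / 5120) AS rows_per_buffer_text;   -- ≈ 400
```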

Why are Large Fields a Problem?

  1. They fill the sort buffer quickly → MySQL spills the sort to disk.
  2. Indexes that include them grow large → less of the data fits in the RAM cache.
  3. They crowd other pages out of the InnoDB buffer pool.
  4. Disk I/O multiplies: the merge phase turns into many extra read/write passes.

Solution:

DO:

  • Move TEXT/BLOB columns into separate tables.
  • Use prefix indexes for long VARCHAR columns: CREATE INDEX idx(column(50))
  • Only SELECT the columns you need!

DON'T:

  • Use SELECT * on tables that contain TEXT/BLOB columns.
  • ORDER BY long string columns, or expressions computed over them.
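A prefix index in concrete form (note: a prefix index helps WHERE lookups and shrinks the index, but it cannot fully order by the whole column):

```sql
-- Index only the first 50 characters of a long VARCHAR
CREATE INDEX idx_email_prefix ON customers(email(50));
```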

V. Case Study: E-commerce Dashboard

Scenario: An e-commerce system displays the 50 most recent orders for any given user. The underlying orders table holds 5 million rows.

Before Tuning:

SELECT * FROM orders WHERE customer_id = 12345 ORDER BY created_at DESC LIMIT 50;

❌ Measured time: 8.472 seconds.

  • idx_customer correctly filters by customer.
  • SELECT * drags along heavy columns, including a description TEXT field.
  • The sort buffer overflows: the sort spills to disk with 5 merge passes.

After Tuning:

Step 1: Construct a Composite index

CREATE INDEX idx_customer_created ON orders(customer_id, created_at DESC);

Step 2: SELECT only the needed columns

SELECT order_id, created_at, total_amount, status, shipping_address 
FROM orders WHERE customer_id = 12345 ORDER BY created_at DESC LIMIT 50;

Results: Execution time dropped to 29.8ms!

  • Rows stream straight from the index in order: no filesort, no disk I/O.

ROI: At roughly 10,000 such queries per day, that is about 23 hours of database time saved daily.


VI. Best Practices

✅ DO - What you should focus on

  1. Create indexes for the ORDER BY columns of your most frequent queries.
  2. Design composite indexes that pair the WHERE and ORDER BY columns.
  3. Add a LIMIT whenever the caller only needs the top rows.
  4. Profile with EXPLAIN and monitor the Sort_merge_passes metric.
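Point 4 can be automated: performance_schema (MySQL 5.7+) aggregates sort statistics per query digest; a sketch for finding the worst filesort offenders server-wide:

```sql
-- Statements whose sorts have spilled to disk, worst first
SELECT DIGEST_TEXT, COUNT_STAR, SUM_SORT_MERGE_PASSES, SUM_SORT_ROWS
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_SORT_MERGE_PASSES > 0
ORDER BY SUM_SORT_MERGE_PASSES DESC
LIMIT 10;
```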

❌ DON'T - What to avoid

  1. Use SELECT * instead of listing only the columns you need.
  2. Apply functions to the sort key: ORDER BY ABS().
  3. Oversize sort_buffer_size globally (it is allocated per connection).
  4. Skip diagnostics on complex queries in production.