Practical MySQL indexing guidelines

Percona Live
October 24th-25th, 2011
London, UK

Stéphane Combaudon
stephane.combaudon@dailymotion.com

Agenda

• Introduction
• Bad indexes & performance drops
• Guidelines for efficient indexing
• Tools and methods to improve index usage

Introduction

Goals

• Having fun with indexes!!!
• Getting rid of the trial-and-error approach
• Knowing the performance penalty of bad indexes
• Being productive
  - Knowing simple rules to design indexes
  - Knowing tools that can help

Indexing basics

• Index: data structure to speed up SELECTs
  - Think of an index in a book
  - In MySQL, key = index
  - We'll consider that indexes are trees
• InnoDB's clustered index
  - Data is stored with the PK: PK lookups are fast
  - Secondary keys hold the PK values
  - Designing InnoDB's PKs with care is critical for perf.

Strengths

• An index can filter and/or sort values
• An index can contain all the fields needed for a query
  - No need to access data anymore
• A leftmost prefix can be used
  - Indexes on several columns are useful
  - Order of columns in composite keys is important
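As a rough illustration of the leftmost-prefix rule, the sketch below uses the sample table t from these slides and a composite index named ab (the name is an assumption, borrowed from the EXPLAIN outputs shown further on). The comments describe what the optimizer can do in principle; the plan it actually picks depends on your data.

```sql
-- Assumed composite index on the sample table t
ALTER TABLE t ADD INDEX ab (a, b);

-- Can use ab: the condition is on the leftmost column
EXPLAIN SELECT * FROM t WHERE a = 10;

-- Can use ab on both columns
EXPLAIN SELECT * FROM t WHERE a = 10 AND b = 20;

-- Cannot use ab for a lookup: b alone is not a leftmost prefix of (a, b)
EXPLAIN SELECT * FROM t WHERE b = 20;
```

This is why the order of columns matters: key(a,b) and key(b,a) serve different sets of queries.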
Limitations

• MySQL only uses 1 index per table per query
  - OK, that's not 100% true (OR clauses...)
  - Think of composite indexes when you can!!
• Can't index full TEXT fields
  - You must use a prefix
  - Same for BLOBs and long VARCHARs
• Maintaining an index has a cost
  - Read speed vs write speed
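A minimal sketch of a prefix index follows. The table article, its column body and the prefix length of 20 are all hypothetical and only there for illustration; pick the length from the selectivity of your own data.

```sql
-- Hypothetical table with a TEXT column (not part of the original slides)
CREATE TABLE article (
  id INT NOT NULL AUTO_INCREMENT,
  body TEXT NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Index only the first 20 characters of body; 20 is an arbitrary choice
ALTER TABLE article ADD INDEX body_prefix (body(20));
```

Keep in mind that a prefix index only knows the first characters of each value, so it can help filtering but it cannot act as a covering index for that column.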
Sample table

CREATE TABLE t (
  id INT NOT NULL AUTO_INCREMENT,
  a INT NOT NULL DEFAULT 0,
  b INT NOT NULL DEFAULT 0,
  [more columns here]
  PRIMARY KEY(id)
) ENGINE=InnoDB;

• Populated with "many" rows
  - Means that queries against the table are "slow"
• Replace "many" and "slow" with your own values

Bad indexes & performance drops

Adding an index

• 3 main consequences:
  - Can speed up queries (good)
  - Increases the size of your dataset (bad)
  - Slows down writes (bad)
• How big is the write slow-down?
  - Let's run some simple tests
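One possible way to see how much space indexes add is to compare data and index sizes in information_schema. This is only a sketch: 'mydb' is a placeholder schema name, and for InnoDB the figures are estimates (data_length is the clustered index, index_length the secondary indexes).

```sql
-- Compare data size and secondary index size per table, in MB
-- 'mydb' is a placeholder schema name
SELECT table_name,
       ROUND(data_length / 1024 / 1024, 1)  AS data_mb,
       ROUND(index_length / 1024 / 1024, 1) AS index_mb
FROM information_schema.tables
WHERE table_schema = 'mydb'
ORDER BY index_length DESC;
```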

Write slow-downs, pictured

[Figure: two bar charts, "In-memory test" and "On-disk test". X axis: number of indexes (2, 3 and 4 idx). Y axis: time to load data. Baseline is 100 for 1 key for both graphs.]

• For in-memory workloads, adding 2 keys makes perf. 2x worse
• For on-disk workloads, adding 2 keys makes perf. 40x worse!!

So what?

• Removing bad indexes is crucial for perf.
  - Especially for write-intensive workloads
  - Tools will help us
• What if your workload is read-intensive?
  - A few hot tables may handle most of the writes
  - These tables will be write-intensive

Identifying bad indexes

• Before removing bad indexes, identify them!
• What is a bad index?
  - Duplicate indexes: always bad
  - Redundant indexes: generally bad
  - Low-cardinality indexes: depends
  - Unused indexes: always bad
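To make the first two categories concrete, here is a small sketch on a hypothetical table t_bad (not from the slides). It assumes the usual reading of the terms: duplicate means two indexes on exactly the same columns, redundant means an index that is a leftmost prefix of another one.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE t_bad (
  id INT NOT NULL AUTO_INCREMENT,
  a INT NOT NULL DEFAULT 0,
  b INT NOT NULL DEFAULT 0,
  PRIMARY KEY (id),
  KEY a1 (a),      -- a1 and a2 are duplicates: same column, same order
  KEY a2 (a),
  KEY ab (a, b)    -- makes a1 and a2 redundant: (a) is a leftmost prefix of (a, b)
) ENGINE=InnoDB;

-- The Cardinality column gives a rough idea of how selective each index is
SHOW INDEX FROM t_bad;
```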

Guidelines for efficient indexes

Before we start...

• Indexing is not an exact science
  - But guessing is not the best way to design indexes
• A few simple rules will help 90% of the time
• Always check your assumptions
  - EXPLAIN does not tell you everything
  - Time your queries with different index combinations
  - SHOW PROFILES is often valuable
• Slow query log is a good place to start!
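A minimal sketch of how these checks might look in a mysql session. The query is Q1 from the next slides, the query number 1 assumes it is the first profiled statement, and the 1-second threshold is an arbitrary example value. SHOW PROFILE is deprecated in later MySQL versions, so treat this as matching the servers of the time.

```sql
-- Per-session profiling
SET profiling = 1;
SELECT * FROM t WHERE a = 10 AND b = 20;
SHOW PROFILES;              -- lists recent statements with their duration
SHOW PROFILE FOR QUERY 1;   -- time spent in each stage of statement 1

-- Slow query log; the 1-second threshold is an arbitrary example value
-- (the new global long_query_time applies to connections opened afterwards)
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 1;
```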


Rule #1: Filter

Q1: SELECT * FROM t WHERE a = 10 AND b = 20
• Without an index, always a full table scan

mysql> EXPLAIN SELECT * FROM t WHERE a = 10 AND b = 20\G
*********** 1. row ***********
           id: 1
  select_type: SIMPLE
        table: t
         type: ALL           <- ALL means table scan
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1000545       <- Estimated # of rows to read
        Extra: Using where   <- Post-filtering needed to discard the non-matching rows

Rule #1: Filter

• Idea: filter as much data as possible by focusing on the WHERE clause
• Candidates for Q1:
  - key(a), key(b), key(a,b), key(b,a)
• Condition is on both a and b with an AND
  - A composite index should be better
  - Let's test!
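One way such a test could be set up is sketched below. The index names a, b, ab and ba match the key names that appear in the EXPLAIN outputs of this talk; using FORCE INDEX to try each candidate in turn is an assumption, not necessarily how the original measurements were taken.

```sql
-- Create the four candidate indexes on the sample table t
ALTER TABLE t
  ADD INDEX a (a),
  ADD INDEX b (b),
  ADD INDEX ab (a, b),
  ADD INDEX ba (b, a);

-- Try each candidate in turn (FORCE INDEX is an assumed testing method)
EXPLAIN SELECT * FROM t FORCE INDEX (a)  WHERE a = 10 AND b = 20;
EXPLAIN SELECT * FROM t FORCE INDEX (b)  WHERE a = 10 AND b = 20;
EXPLAIN SELECT * FROM t FORCE INDEX (ab) WHERE a = 10 AND b = 20;
EXPLAIN SELECT * FROM t FORCE INDEX (ba) WHERE a = 10 AND b = 20;
```

Remember to also time the real query for each candidate, since EXPLAIN only gives estimates.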
Rule #1: Filter
mysql> EXPLAIN SELECT * ... (one run per candidate index)

key   key_len   rows    Exec time
a     4         20      0.00s
b     4         67368   0.20s
ab    8         10      0.00s
ba    8         10      0.00s

ab and ba: same perf. for this query. Other queries will guide us to choose between them.

Rule #2: Sort

Q2: SELECT * FROM t WHERE a = 10 ORDER BY b
• Remember: indexed values are sorted
• An index can avoid costly filesorts
  - Think of filesorts performed on on-disk temp tables
  - ORDER BY clause must be a leftmost prefix of the index
• Caveat: an index scan is fast in itself, but retrieving the rows in index order may be slow
  - Seq. scan on index but random access on table

Rule #2: Sort

• Let's try key(b) for Q2 vs full table scan

mysql> EXPLAIN SELECT * ...

            With key(b)    Full table scan
type:       index          ALL
key:        b              NULL
key_len:    4              NULL
rows:       1000638        1000638
Extra:      Using where    Using where; Using filesort
Exec time:  1.52s          0.37s

EXPLAIN suggests key(b) is better, but it's wrong!

Rule #2: Sort

• An index is not always the best for sorting
• If possible, try to sort and filter
• Exception to the leftmost prefix rule:
  - Leading columns appearing in the WHERE clause as constants can fill the holes in the index
  - WHERE a = 10 ORDER BY b: key(a,b) can filter and sort
  - Not true with WHERE a > 10 ORDER BY b

Rule #2: Sort

• With key(a,b)

mysql> EXPLAIN SELECT * FROM t WHERE a = 10 ORDER BY b\G
mysql> EXPLAIN SELECT * FROM t WHERE a > 10 ORDER BY b\G

           WHERE a = 10    WHERE a > 10
type:      ref             ALL
key:       ab              NULL
key_len:   8               NULL
rows:      20              1000638
Extra:                     Using where; Using filesort

For WHERE a > 10: could have been a range scan. Depends on the distribution of the values.

Rule #3: Cover

Q3: SELECT a,b FROM t WHERE a > 100;

• With key(a), you filter efficiently
• But with key(a,b)
  - You filter
  - The index holds all the columns you need
  - Means you don't need to access data
• key(a,b) is a covering index
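A quick way to check whether an index covers a query is to look at the Extra column of EXPLAIN. The sketch below assumes a composite index named ab on (a, b) already exists on t, and that t has columns other than id, a and b (the "[more columns here]" of the sample table).

```sql
-- Assumes the composite index ab (a, b) exists on t
EXPLAIN SELECT a, b FROM t WHERE a > 100;
-- When the index covers the query, the Extra column contains "Using index"

-- Selecting columns that are not in the index means rows must be read from the table
EXPLAIN SELECT * FROM t WHERE a > 100;
```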


Rule #3: Cover

• Back to InnoDB's clustered index
  - It is always covering
  - SELECT by PK is the fastest access with InnoDB
  - Take care of your PKs!!
• Remember full table scan + filesort vs index?
  - If the index used for sorting is also covering, it will outperform the table scan

Rating an index

• An index can give you 3 benefits: filtering, sorting, covering
• 1-star index: 1 property
• 2-star index: 2 properties
• 3-star index: 3 properties
• This is my own rating, other systems exist

Range queries and ORDER BY

Q4: SELECT * FROM t WHERE a > 10 and b = 20 ORDER BY a
mysql> EXPLAIN SELECT * ...\G
********** 1. row **********
           [...]
         type: range
          key: a
         rows: 500319
        Extra: Using where
Exec time: 35.9s

Key filters and sorts, but filtering is not efficient. Getting data is very slow (random access + I/O-bound).

mysql> EXPLAIN SELECT * ...\G
********** 1. row **********
           [...]
         type: ref
possible_keys: a,b,ab,ba
          key: ba
         rows: 64814
        Extra: Using where; Using filesort
Exec time: 0.2s

Key filters but doesn't sort. Filtering is efficient, so getting data, post-filtering and post-sorting is not too slow.

Joins and ORDER BY

• All columns in the ORDER BY clause must refer to the 1st table (otherwise an index can't be used for sorting)
• Forcing the join order with SELECT STRAIGHT_JOIN is sometimes useful
• Sometimes you can't fulfill this condition
  - This can be a reason to denormalize
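A hedged sketch of the STRAIGHT_JOIN idea. The comment table follows the definition shown later in this talk; the video table, its video_id column and the value 42 are hypothetical. STRAIGHT_JOIN makes MySQL join the tables in the order they are listed, so comment is read first and the ORDER BY only refers to that first table.

```sql
-- comment(comment_id, video_id, user_id, ...) comes from the talk;
-- video(video_id, ...) is a hypothetical table used for illustration
SELECT STRAIGHT_JOIN c.comment_id, c.video_id
FROM comment AS c
JOIN video AS v ON v.video_id = c.video_id
WHERE c.user_id = 42
ORDER BY c.comment_id;
```

Whether forcing the order actually helps depends on the data, so compare timings with and without the modifier.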

Tools and methods to improve index usage

Userstats v2

• You need Percona Server or MariaDB 5.2+

mysql> SELECT s.table_name,s.index_name,rows_read
       FROM information_schema.statistics s
       LEFT JOIN information_schema.index_statistics i   <- Table added by this feature
       ON (i.table_schema=s.table_schema
           AND i.table_name=s.table_name
           AND i.index_name=s.index_name)
       WHERE s.table_name='comment'
             AND s.table_schema='mydb'
             AND seq_in_index=1;                         <- Deals with composite indexes

+------------+----------------------+-----------+
| table_name | index_name           | rows_read |
+------------+----------------------+-----------+
| comment    | PRIMARY              |     50361 |
| comment    | user_id              |      NULL | <- Useless
| comment    | video_comment_idx    |  18276197 |
| comment    | created_language_idx |      NULL | <- Useless
+------------+----------------------+-----------+

Workload matters!

• OLAP server
+------------+----------------------+-----------+
| table_name | index_name           | rows_read |
+------------+----------------------+-----------+
| comment    | PRIMARY              |     50361 |
| comment    | user_id              |      NULL | <- Never used
| comment    | video_comment_idx    |  18276197 |
| comment    | created_language_idx |      NULL |
+------------+----------------------+-----------+

• OLTP server
+------------+----------------------+-----------+
| table_name | index_name           | rows_read |
+------------+----------------------+-----------+
| comment    | PRIMARY              |  12220798 |
| comment    | user_id              |  96674982 | <- Useful!
| comment    | video_comment_idx    | 365691254 |
| comment    | created_language_idx |    217176 |
+------------+----------------------+-----------+

Pros

• Very easy to use
  - Turn on the variable and forget
• Easy to write queries to discover unused indexes automatically
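As a sketch of such a query, the statement below adapts the userstats query from the earlier slide to list indexes that have no entry in index_statistics. The variable name is an assumption: it is userstat on MariaDB and on recent Percona Server releases, and userstat_running on older Percona Server releases, so check your server's documentation. 'mydb' is the schema name used in the slides.

```sql
-- Variable name is an assumption (userstat vs userstat_running, depending on the server)
SET GLOBAL userstat = ON;

-- Indexes with no row in index_statistics have not been read
-- since statistics collection started
SELECT s.table_schema, s.table_name, s.index_name
FROM information_schema.statistics s
LEFT JOIN information_schema.index_statistics i
  ON (i.table_schema = s.table_schema
      AND i.table_name = s.table_name
      AND i.index_name = s.index_name)
WHERE s.table_schema = 'mydb'
  AND s.seq_in_index = 1
  AND i.index_name IS NULL;
```

Run it only after a sampling period long enough to include all of your workload, including rare batch jobs.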

Cons

• Large sample period needed for accurate stats
• Not always obvious to say if index is useful
  - Look at created_language_idx in previous slide
• Has some CPU overhead

pt-duplicate-key-checker

• Anything wrong with the keys?

CREATE TABLE comment (
  comment_id int(10) ... AUTO_INCREMENT,
  video_id int(10) ...,
  user_id int(10) ...,
  language char(2) ...,
  [...]
  PRIMARY KEY (comment_id),
  KEY user_id (user_id),
  KEY video_comment_idx (video_id,language,comment_id)
) ENGINE=InnoDB;

Tool is aware of InnoDB's clustered index!

$ pt-duplicate-key-checker u=root,h=localhost
[...]
# Key video_comment_idx ends with a prefix of the clustered index
# Key definitions:
#   KEY video_comment_idx (video_id,language,comment_id)
#   PRIMARY KEY (comment_id),
[...]
# To shorten this duplicate clustered index, execute:
ALTER TABLE mydb.comment DROP INDEX video_comment_idx, ADD INDEX video_comment_idx (video_id,language)

The ALTER TABLE statement above is the query to remove the index.

pt-index-usage

• Helps answer questions not solved by userstats
  - Are there any queries with a changing exec plan?
  - Is an index necessary for a query?
• Reads a slow log file/general log file
• Can give you invaluable information on your index usage
  - See the man page for more
• Thanks for your attention!
• Any questions?
