Practical MySQL Indexing Guidelines
Percona Live London, UK
October 24th-25th, 2011
Stéphane Combaudon
stephane.combaudon@dailymotion.com
Agenda
Introduction
Bad indexes & performance drops
Guidelines for efficient indexing
Tools and methods to improve index usage
Introduction
Goals
Being productive
Knowing simple rules to design indexes
Knowing tools that can help
Indexing basics
CREATE TABLE t (
  id INT NOT NULL AUTO_INCREMENT,
  a INT NOT NULL DEFAULT 0,
  b INT NOT NULL DEFAULT 0,
  [more columns here]
  PRIMARY KEY(id)
) ENGINE=InnoDB;
Adding an index
3 main consequences:
Can speed up queries (good)
Increases the size of your dataset (bad)
Slows down writes (bad)
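A minimal sketch of how the size increase could be observed on table t. The index name a and the use of DATABASE() as the schema filter are assumptions made for illustration, and the sizes reported for InnoDB are approximate.

```sql
-- Data and index size before adding a key (InnoDB reports estimates).
SELECT table_name, data_length, index_length
FROM information_schema.tables
WHERE table_schema = DATABASE() AND table_name = 't';

-- Hypothetical single-column index on a; the name is an assumption.
ALTER TABLE t ADD INDEX a (a);

-- Same query again: index_length is expected to be larger.
SELECT table_name, data_length, index_length
FROM information_schema.tables
WHERE table_schema = DATABASE() AND table_name = 't';
```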
Write slow-downs, pictured
In-memory test
[Chart: series for 2 idx, 3 idx and 4 idx; x-axis: number of indexes; y-axis from 0 to 200]
For in-memory workloads, adding 2 keys makes perf. 2x worse
On-disk test
[Chart: series for 2 idx, 3 idx and 4 idx; x-axis: number of indexes; y-axis from 0 to 12000]
On disk, adding 2 keys makes perf. 40x worse!!
So what?
Identifying bad indexes
Guidelines for efficient indexes
Before we start...
Q1: SELECT * FROM t WHERE a = 10 AND b = 20
Without an index, always a full table scan
mysql> EXPLAIN SELECT * FROM t WHERE a = 10 AND b = 20\G
*********** 1. row ***********
           id: 1
  select_type: SIMPLE
        table: t
         type: ALL          <- ALL means table scan
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1000545      <- Estimated # of rows to read
        Extra: Using where  <- Post-filtering needed to discard the non-matching rows
Rule #1: Filter
With key ab:
mysql> EXPLAIN SELECT * ...
********** 1. row **********
[...]
key: ab
key_len: 8
[...]
rows: 10
Exec time: 0.00s

With key ba:
mysql> EXPLAIN SELECT * ...
********** 1. row **********
[...]
key: ba
key_len: 8
[...]
rows: 10
Exec time: 0.00s
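A sketch of how the two plans above might be reproduced. The key names ab and ba come from the EXPLAIN output; their column order, (a, b) and (b, a), is an assumption, as is running the comparison against Q1.

```sql
-- Assumed definitions of the two composite keys named in the EXPLAIN output.
ALTER TABLE t
  ADD INDEX ab (a, b),
  ADD INDEX ba (b, a);

-- Ask the optimizer to use each key in turn and compare key_len and rows.
EXPLAIN SELECT * FROM t FORCE INDEX (ab) WHERE a = 10 AND b = 20\G
EXPLAIN SELECT * FROM t FORCE INDEX (ba) WHERE a = 10 AND b = 20\G
```

A key_len of 8 is consistent with both NOT NULL INT columns (4 bytes each) being used for filtering.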
Q2: SELECT * FROM t WHERE a = 10 ORDER BY b
Remember: indexed values are sorted
With key(b):
mysql> EXPLAIN SELECT * ...
********** 1. row **********
[...]
type: index
key: b
key_len: 4
[...]
rows: 1000638
Extra: Using where
Exec time: 1.52s

With a full table scan:
mysql> EXPLAIN SELECT * ...
********** 1. row **********
[...]
type: ALL
key: NULL
key_len: NULL
[...]
rows: 1000638
Extra: Using where; Using filesort
Exec time: 0.37s

EXPLAIN suggests that key(b) is better, but it's wrong!
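Since EXPLAIN alone can mislead here, one way to check is to run both variants and compare the actual execution times. This sketch assumes the single-column index is named b, as the key column of the EXPLAIN output indicates.

```sql
-- Variant 1: read rows in index order through key b (no filesort).
SELECT * FROM t FORCE INDEX (b) WHERE a = 10 ORDER BY b;

-- Variant 2: ignore key b, scan the table, then sort the matching rows.
SELECT * FROM t IGNORE INDEX (b) WHERE a = 10 ORDER BY b;
```

Timings are only comparable if both variants run under similar cache conditions.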
Rule #2: Sort
With key(a,b)
mysql> EXPLAIN SELECT * FROM t WHERE a = 10 ORDER BY b\G
********** 1. row **********
[...]
type: ref
key: ab
key_len: 8
[...]
rows: 20
Extra:

mysql> EXPLAIN SELECT * FROM t WHERE a > 10 ORDER BY b\G
********** 1. row **********
[...]
type: ALL
key: NULL
key_len: NULL
[...]
rows: 1000638
Extra: Using where; Using filesort
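The difference between the two plans follows from the order of entries in key ab, assumed here to be defined on (a, b). The statements below are an illustrative sketch; the FORCE INDEX hints and the third query are additions for comparison, not part of the original slides.

```sql
-- Within a single value of a, index entries are already sorted by b,
-- so the sort can follow the index.
EXPLAIN SELECT * FROM t WHERE a = 10 ORDER BY b\G

-- Across several values of a, entries are sorted by a first,
-- so ORDER BY b alone is expected to need a separate sort (filesort).
EXPLAIN SELECT * FROM t FORCE INDEX (ab) WHERE a > 10 ORDER BY b\G

-- ORDER BY a, b matches the index order, so no separate sort is expected.
EXPLAIN SELECT * FROM t FORCE INDEX (ab) WHERE a > 10 ORDER BY a, b\G
```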
Q3: SELECT a,b FROM t WHERE a > 100;
Rating an index
Q4: SELECT * FROM t WHERE a > 10 and b = 20 ORDER BY a
mysql> EXPLAIN SELECT * ...\G
********** 1. row **********
[...]
type: range
key: a
rows: 500319
Extra: Using where
Exec time: 35.9s
Key filters and sorts, but filtering is not efficient.
Getting data is very slow (random access + I/O-bound).

mysql> EXPLAIN SELECT * ...\G
********** 1. row **********
[...]
type: ref
possible_keys: a,b,ab,ba
key: ba
rows: 64814
Extra: Using where; Using filesort
Exec time: 0.2s
Key filters but doesn't sort.
Filtering is efficient, so getting data, post-filtering and post-sorting is not too slow.
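One rough way to see why filtering on b is more efficient than filtering on a for Q4 is to count how many rows each condition matches. This is only a sketch: the counts depend entirely on the data, and none are given in the slides beyond the EXPLAIN estimates above.

```sql
-- Rows matched by the range condition (what key a has to read).
SELECT COUNT(*) FROM t WHERE a > 10;

-- Rows matched by the equality condition (what key ba has to read).
SELECT COUNT(*) FROM t WHERE b = 20;

-- Total number of rows, for comparison.
SELECT COUNT(*) FROM t;
```

The smaller the share of the table a condition matches, the fewer rows have to be fetched before post-filtering and sorting.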
Joins and ORDER BY
Tools and methods to improve index usage
Userstats v2
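+------------+----------------------+-----------+
| table_name | index_name           | rows_read |
+------------+----------------------+-----------+
| comment    | PRIMARY              |     50361 |
| comment    | user_id              |      NULL |  <- Useless
| comment    | video_comment_idx    |  18276197 |
| comment    | created_language_idx |      NULL |  <- Useless
+------------+----------------------+-----------+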
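A report like the one above could be produced with a query along these lines. This is a sketch that assumes a server with the userstat feature (such as Percona Server) exposing information_schema.index_statistics; the schema name mydb is taken from the pt-duplicate-key-checker example later in the deck, and the exact query used for the slide is not shown.

```sql
-- Statistics collection must be enabled first; the variable name depends
-- on the server version (userstat is assumed here).
SET GLOBAL userstat = ON;

-- List every index of the table, with NULL in rows_read for indexes
-- that have not been read since statistics collection started.
SELECT s.table_name, s.index_name, i.rows_read
FROM information_schema.statistics s
LEFT JOIN information_schema.index_statistics i
  ON  i.table_schema = s.table_schema
  AND i.table_name   = s.table_name
  AND i.index_name   = s.index_name
WHERE s.table_schema = 'mydb'
  AND s.table_name   = 'comment'
  AND s.seq_in_index = 1;  -- one row per index, not one per indexed column
```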
Workload matters!
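OLAP server
+------------+----------------------+-----------+
| table_name | index_name           | rows_read |
+------------+----------------------+-----------+
| comment    | PRIMARY              |     50361 |
| comment    | user_id              |      NULL |  <- Never used
| comment    | video_comment_idx    |  18276197 |
| comment    | created_language_idx |      NULL |
+------------+----------------------+-----------+
OLTP server
+------------+----------------------+-----------+
| table_name | index_name           | rows_read |
+------------+----------------------+-----------+
| comment    | PRIMARY              |  12220798 |
| comment    | user_id              |  96674982 |  <- Useful!
| comment    | video_comment_idx    | 365691254 |
| comment    | created_language_idx |    217176 |
+------------+----------------------+-----------+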
Pros
Cons
pt-duplicate-key-checker
$ pt-duplicate-key-checker u=root,h=localhost
[...]
# Key video_comment_idx ends with a prefix of the clustered index
# Key definitions:
#   KEY video_comment_idx (video_id,language,comment_id)
#   PRIMARY KEY (comment_id),
[...]
# To shorten this duplicate clustered index, execute:
ALTER TABLE mydb.comment DROP INDEX video_comment_idx, ADD INDEX video_comment_idx (video_id,language)
(Query to remove the index)
pt-index-usage
Any questions?