SQL is not well written, and overtime is inevitable. SQL is an essential technology in daily work, but many people do not pay much attention to SQL problems. One is the small amount of data, and the other is the lack of awareness of the importance of indexing. This article is mainly to organize the SQL failure scenarios. If you know all the details, then you must be a good learner. “If you can follow the rhythm of reading the article, you will certainly gain at least I feel much more transparent thinking after writing the next 90% of the SQl indexing problems and interviews in this area can grasp the two
Summary of the article
Indexing failure. Tidying up.
Basic data preparation
Prepare a data table for the data demo, in which a total of three indexes are created.
Union indexsname
,s_code
,address
- primary key index
id
- ordinary index
height
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
-- ----------------------------
-- Table structure for student
-- ----------------------------
DROP TABLE IF EXISTS `student`;
CREATE TABLE `student` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`sname` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
`s_code` int(100) NULL DEFAULT NULL,
`address` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
`height` double NULL DEFAULT NULL,
`classid` int(11) NULL DEFAULT NULL,
`create_time` datetime(0) NOT NULL ON UPDATE CURRENT_TIMESTAMP(0),
PRIMARY KEY (`id`) USING BTREE,
INDEX ``(`height`) USING BTREE,
INDEX ``(`sname`, `s_code`, `address`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 5 CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;
Reflections on the issue
With the SQL above, we’ve created the basic data, but before we validate it, let’s take a few questions.
Let’s start by verifying from top to bottom
(math.) the leftmost matching principle
Write in the front: I have heard of the database of the leftmost matching principle a long time ago, at that time it is through the major blog forum to understand, but the limitation of these blogs is that their description of the leftmost matching principle is like some mathematical definitions, often is a list of 123 points, to meet the 123 points will be able to match on the index, otherwise it can not. The leftmost matching principle means that in the joint index, if your SQL statement used in the joint index of the leftmost index, then this SQL statement can be used to match the joint index, we established above the joint index can be used to test the leftmost matching principle sname
, s_code
, address
Look at the following SQL statement and think about whether or not it will go to an index.
In your experience, what are some of the things that are indexed? Think about it for a moment and write down the numbers in your mind.
Full table scan rows = 4
If you don’t know what EXPLAIN
is, or can’t read the analyzed data, we recommend checking out the other article on analyzing the EXPLAIN command.
If you don’t get all the answers right, read on.
The leftmost match principle is as the name implies: leftmost first, any consecutive indexes starting with the leftmost will match. Meanwhile, range queries (>, <, between, like) will stop matching.
For example: s_code = 2 If you build indexes in the order of ( sname
, s_code
), you can’t match indexes in the order of ( sname
, s_code
).
But if the query condition is sname = “turn into Patrick” and s_code = 2 or a=1 (aka s_code = 2 and sname = “turn into Patrick” ) it’s OK, because the optimizer will automatically adjust the order of sname
, s_code
. Another example is sname = “turns into Patrick” and s_code > 1 and address = “Shanghai” address
, because the s_code field is a range query, and the fields after it will stop matching.
Query Without Range Index Usage Type
Type of use with range
Based on the previous post, you can see that the meaning of ref and range are quite different in terms of level.
Why must left links follow the leftmost suffix principle?
Read one of the funnier responses
You can think of co-indexing as the design of a breakout game
For example you have this union index as state/city/zipCode
So state is level one, city is level two, and zipCode is level three.
You must match the first level to match the second level, and match the first and second levels to match the third level
That’s not entirely accurate, but that’s the idea.
To understand the leftmost match principle of a union index, let’s first understand the underlying principle of an index. The underlying layer of an index is a B+ tree, so the underlying layer of a union index is also a B+ tree, except that the B+ tree nodes of a union index store key values. Since the construction of a B + tree can only be based on a value to determine the index relationship, so the database relies on the joint index of the leftmost field to build Text is more abstract Let’s look at it
Join us to create a joint index of A and B. What do they look like in the underlying storage?
- Orange represents field A
- Light green Represents field B
Illustration:
We can discern several features
- A is sequential 1, 1, 2, 2, 3, 4
B is out of order. 1, 2, 1, 4, 1, 2. This is hashed.
B is ordered if A is equal, e.g. (1, 1), (1, 2) Here B is ordered (2, 1),(2, 4) B is also ordered.
It should be obvious here that without A’s support, B’s index would be hashed, not contiguous.
Let’s get a little more detailed. Let’s create a new table.
DROP TABLE IF EXISTS `leftaffix`;
CREATE TABLE `leftaffix` (
`a` int(11) NOT NULL AUTO_INCREMENT,
`b` int(11) NULL DEFAULT NULL,
`c` int(11) NULL DEFAULT NULL,
`d` int(11) NULL DEFAULT NULL,
`e` varchar(11) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
PRIMARY KEY (`a`) USING BTREE,
INDEX `联合索引`(`b`, `c`, `d`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 8 CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;
-- ----------------------------
-- Records of leftaffix
-- ----------------------------
INSERT INTO `leftaffix` VALUES (1, 1, 1, 1, '1');
INSERT INTO `leftaffix` VALUES (2, 2, 2, 2, '2');
INSERT INTO `leftaffix` VALUES (3, 3, 2, 2, '3');
INSERT INTO `leftaffix` VALUES (4, 3, 1, 1, '4');
INSERT INTO `leftaffix` VALUES (5, 2, 3, 5, '5');
INSERT INTO `leftaffix` VALUES (6, 6, 4, 4, '6');
INSERT INTO `leftaffix` VALUES (7, 8, 8, 8, '7');
SET FOREIGN_KEY_CHECKS = 1;
When creating the index tree, the data will be sorted according to the principle of the leftmost suffix will be sorted first by B, that is, if the same value occurs according to the C sort if the C is the same according to the D sort after sorting the order is the following figure:
The indexes are generated in the same order as in Figure 2, so let’s see what the resulting tree looks like.
Explaining some of this tree diagram, first of all, based on the sorting in Figure 2, we know that the order is 1111a 2222b, so at the third level, we can see that 1111a is at the first level, and 2222b is at the second level, because 111 < 222, so 111 goes to the second level, and then we get to the first level, and then we get to the first level.
Simplified it looks like this
But this order is relative. This is because MySQL’s rule for creating a union index is that it will first sort the leftmost of the union index, on top of the ordering of the first field, and then on the second field. So there is no way to utilize the index for a query condition like B=2.
One more thing you can see here is why we don’t recommend building indexes on fields that change frequently, because then our index structure has to change as you change, so it’s a big performance hog.
The leftmost affix principle can be broken by jump scanning, as hinted by the old man in the comments section.
This is optimized for 8.0
Started to increase the index jump scanning function, when the first column of the index of the unique value is less, even if the where condition does not have the first column of the index, the query can also be used when the joint index. For example, we use the union index bcd but there are fewer fields in b. We do not use b when we use the union index, but we can still use the union index. MySQL union indexes sometimes follow the principle of left-most prefix matching, and sometimes they do not.
Prerequisites If you create a b,c,d joint index plane
If the condition after my where isc = 1 and d = 1
why can’t I go to the index if there’s no b the value of your query is equivalent to*11
as we all know*
means all which means I can match all the data
If my where is followed byb = 1 and d =1
why does it go to the index? You’re querying for1*1
and I can index match it with the first 1, so it’s indexed.- The most important aspect of the leftmost match principle is the first field.
Let’s move on to the next failure scenario
select *
Here is a misunderstanding of my previous thinking select * will not cause index failure before the test found that the failure is because of where after the query range is too large, resulting in index failure is not caused by Select *, but why do not we recommend the use of select *?
- Increase query analyzer parsing costs.
Incrementing and decrementing fields can easily be inconsistent with the resultMap configuration.
Unused fields increase network consumption, especially text-type fields.
In Ali’s development manual, the above points are summarized in broad terms.
When using Select * indexes are used normally
It’s indexed, but it’s not recommended. Why not?
First of all, we created a joint index in the last validation We use B = 1 will go to the index But with the direct query index field is different Use SELECT*
,to get the unwanted data, then first filter the data through the auxiliary index, and then through the aggregated index to get all the columns, which is more than one b + tree query, the speed will inevitably be much slower to reduce the use of select * is to reduce the loss of back to the table.
That is, Select * will go to the index in some cases, and if it doesn’t go to the index, it’s because the where query is too large, causing MySQL to optimize a full table scan, not a Select * problem.
The image above is a case of index failure
Range searches don’t always cause indexes to fail, but they do in the following cases: the indexes work at a low level, and the reason they work is because they’re narrowed down.
select * will go to the index
Range lookups have a probability that the index will fail but will work in certain circumstances Smaller ranges will be used It can also be interpreted to mean that indexes will be used if the returned result set is small
The principle of join query in mysql is to query the driving table first, and then use the data obtained from the driving table as the condition to query the driven table one by one.
Each time the driver table loads a piece of data into memory, and then all the data of the driven table need to be loaded into memory once for comparison. Efficiency is very low, so mysql can specify the size of a buffer pool, buffer pool if large can be loaded at the same time more than one driving table data for comparison, the more data put in the performance of the io operation will be less, the performance is also better. Therefore, if you useselect *
to put some useless columns, it will only take up buffer space for nothing. It is a waste of an opportunity to improve performance.
According to the comment section, select * is not the direct cause of the index failure, it’s mostly a problem with the conditions after where, but it’s still a good idea to minimize the use of select *, as much as possible.
Using Functions
Using a function after Select will work with indexes, but the following won’t.
Because the index holds the original value of the indexed field, not the value calculated by the function, there is naturally no way to go to the index.
However, since MySQL 8.0, the indexing feature adds function indexes, which means that you can build an index against the value calculated by a function, that is, the value of the index is the value calculated by the function, so you can query the data by scanning the index.
I’ve never used this style of writing, and I think it’s less common and easier to notice.
computational operation
This case is the same as the above The reason why the index is invalidated is because the original value of the index has been changed, and the corresponding data cannot be found in the tree, so we have to scan the whole table.
The following calculation would use the index
Java is more familiar with may be a bit of a question, this kind of simple expression on the index of the calculation, in the code under the special handling, it should be able to do index scanning, let’s say b – 1 = 6 into b = 6 – 1. Yes, it is able to achieve, but MySQL still stole this lazy, did not implement.
In short, in a nutshell, if it affects the value of an indexed column, the index is invalidated.
Like %
This is a really tough one, because it’s used a lot, so you have to be careful. Before we look at why it doesn’t work, let’s look at the Like % explanation.
% percent sign wildcard: Indicates that any character occurs any number of times (can be 0).
_Underline wildcard: Indicates that only a single character can be matched, no more and no less, that is, a single character.
The LIKE operator: LIKE is used to indicate to mysql that later search patterns are compared using wildcards rather than direct equality matches.
Note: If the like operator is used without a generic match after it, the effect is the same as =, but the effect is the same as if the like operator is used without a generic match after it.
SELECT * FROM products WHERE products.prod_name like '1000';
2. Match records containing “Li” (including record “Li”).
SELECT* FROM products WHERE products.prod_name like '%Li%';
3. Match records ending in “Li” (including the record “Li”, excluding the record “Li”, that is, Li followed by a space in the record, here you need to pay attention)
SELECT * FROM products WHERE products.prod_name like '%Li';
Not on the left. On the right.
It goes, but the index level is lower mainly because of the fuzzy query, the range is bigger, so the index level is lower.
This is a very large range, so there’s no need to use an index This may not be very well optimized Luckily, it’s not spliced on all the time
Indexing has a lot to do with the query range, and it’s not uncommon for indexes to fail because the range is too large, making them meaningless.
Using Or causes indexes to fail
The reason for this is even simpler.
In the WHERE clause, if the conditional column before the OR is an indexed column and the conditional column after the OR is not an indexed column, then the index will fail As an example, let’s say that the following query statement, where b is the primary key and e is a normal column, went through a full table scan from the results of the execution plan.
The way to optimize this is to index both sides of the Or.
Then we’ll use indexes to avoid full table scans.
Inappropriate use of in
First of all, the use of In does not necessarily result in a full table scan IN will definitely go to the index, but when the range of IN values is large it will cause the index to fail and go to a full table scan
in The index fails when the result set is greater than 30%.
The failure scenarios for not in and In are the same
order By
- Go Index + Back to Table
- No indexing. Just a full table scan.
Mysql thinks it’s faster to just do a full table scan than to go back to the table, so it just goes straight to the index. In the case of Order By, it’s better to do a full table scan.
Does the subquery go to the index?
The answer is yes, but not if you use it badly.
Optimization Ideas for Reducing Table Returns
This is a term that may be familiar to those who have been working with SQL for a long time, but it’s very interesting and important.
In this indexing problem there is also a detailed thing which is more impressive is back to the table will cause efficiency loss but in our daily work is more commonly used single column indexes joint indexes are not very common for newcomers but single column indexes in some cases is certainly not the optimal solution, such as the like % problem will cause indexing problems I recently learned a ICP
knowledge I have not paid attention to do not know how much you know about this I here some organization. I don’t know how much you know about it, so I’m going to organize it here.
First of all, the ICP is Index Condition Pushdown
, which can be called index push down. The main function is to solve the problem of querying the data back to the table, but only if it is used in conjunction with a union index, then it can be effective. If you don’t know about it, you can take a serious look at this, personally I think it is still something interesting.
The question of table returns
The above doesn’t really explain much about table lookups, so I’ll just mention what a table lookup is.
Table lookups typically occur on non-primary key indexes and require two tree lookups, so efficiency is compromised. To address this behavior, we can optimize using federated indexes.
ICP Index Downpush
This is a feature provided after MySQL 5.6. If we ask this in an interview, when we usually interview, the interviewer likes to ask what version has been added. If you are asked what optimizations have been added after MySQL 5.6, I don’t know what you can say, but this is a very good point. If you can say what index push is, the interviewer will be more impressed with you, at least it shows that you have something in your body. I’m not going to talk about it. Let’s get started.
Let’s take a look at how the query process changes before and after 5.6
Let’s say we need to look up select * from table1 where b like '3%' and c = 3
5.6 Before
First, we look up the data starting with 3 through the union index, and then we get the primary key (the green block in the above image is the primary key).
And then the primary key goes to the primary key index to go back to the table, and the secondary index goes back to the table as many times as it comes up with a few 3’s.
5.6 After
First, we look up the data starting with 3 in the secondary index, then we find the data with c = 3 and filter it, and then we get the primary key.- Table lookup by primary key
All of the above will do table lookups but before 5.6, they didn’t fully utilize the secondary cache for data filtering and if there was a lot of data starting with a 3, then it would go all the way back to the table, but after 5.6, it’s going to utilize the subsequent indexed fields for lookups.
How to say it is why index push down should be used with the joint index common so no index push down is to make full use of the joint index fields into the filter to minimize the need to go back to the table data to increase the efficiency of the query it feels like the idea is very simple
ICP for Innodb Engine Only Secondary Indexes