The MySQL Series I Can't Stop Talking About: How Join Works (1)

Foreword

That afternoon, the interviewer cornered me at the door of the conference room

He picked up my resume and asked me about SQL tuning

I said, bring it on

I'm already strong, I don't need a reason

Then the interviewer asked about Join

My heart started to tremble

I still remember the opening of the DBA newcomer's handbook

"How to use SQL Join"

Time flew by, and the lights in the conference room stayed on for a long time

The interviewer told me to leave quickly and made no effort to keep me

But he didn't forget to comfort me warmly: read Ba Yu's article and cheer yourself on

Main text

Let's put it this way: nowadays, anyone whose resume lacks the line "proficient in MySQL internals and tuning" is almost embarrassed to submit it. Today we look at Join from that angle and show the interviewer why we dare to put the word "proficient" right at the top of the resume!

Join process

Let's first create two tables, t1 and t2:

-- Create table t1 with an ordinary (non-unique) index on idx
create table t1
(
	id int auto_increment,
	idx int null,
	constraint t1_pk
		primary key (id)
);
create index t1__index_idx
	on t1 (idx);
-- Create table t2 with an ordinary (non-unique) index on idx
create table t2
(
	id int auto_increment,
	idx int null,
	constraint t2_pk
		primary key (id)
);
create index t2__index_idx
	on t2 (idx);

Now execute the SQL:

select * from t1 straight_join t2 on t1.idx = t2.idx;

straight_join: joins the tables in a fixed order, forcing the optimizer to use the left table as the driving table.

What is the execution flow of this SQL?

Step 1: Read one row of data, ROW, from table t1
Step 2: Take the idx field from ROW and look it up in table t2
Step 3: Take the rows in t2 that satisfy the condition and combine each of them with ROW to form a row of the result set
Repeat steps 1, 2, and 3 until every row of t1 has been processed
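
The steps above can be sketched in a few lines of Python. This is only a toy model, not how MySQL is implemented: the rows are made-up (id, idx) tuples, a sorted list searched with bisect stands in for the B+ tree index t2__index_idx, and the names t2_index, t2_by_id and index_nested_loop_join are hypothetical. NULL values in idx are left out to keep it short.

```python
from bisect import bisect_left

# Toy rows as (id, idx) tuples; the values are invented for illustration.
t1 = [(1, 10), (2, 20), (3, 30)]
t2 = [(1, 20), (2, 30), (3, 30), (4, 40)]

# Stand-in for the secondary index on t2.idx: (idx, id) pairs sorted by idx.
t2_index = sorted((idx, row_id) for row_id, idx in t2)
# Stand-in for the primary key of t2: id -> full row.
t2_by_id = {row_id: (row_id, idx) for row_id, idx in t2}


def index_nested_loop_join(driving_rows, index, rows_by_id):
    result = []
    for row in driving_rows:                 # step 1: read one row from t1
        key = row[1]                         # step 2: take its idx value
        pos = bisect_left(index, (key,))     # ordered search, not a full scan
        while pos < len(index) and index[pos][0] == key:
            matched = rows_by_id[index[pos][1]]  # step 3: fetch the t2 row
            result.append(row + matched)         # and combine it with ROW
            pos += 1
    return result                            # done once t1 is exhausted


joined = index_nested_loop_join(t1, t2_index, t2_by_id)
```

The outer for loop is the driving table and the index probe is the inner "loop", which is where the double for loop picture comes from.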

This kind of JOIN is called Index Nested-Loop Join, and the name itself hints at its characteristics:

The first is Nested-Loop: a nested query, equivalent to a double for loop.

The second is Index: during the query we can use the index on the driven table. Looking up the driven table is a tree search, which saves a lot of scanned rows.

Let's try to analyze the query process of Index Nested-Loop Join.

![Index Nested-Loop Join](/Users/huxingbo/Documents/elastic/Index Neste-Loop Join.png)

When a Join cannot use an index

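We have just chatted through how a JOIN works when it goes through an index. But what if the driven table has no index? Intuitively, borrowing from Index Nested-Loop Join, we can still use a nested loop: iterate over the rows of t1, do a full scan of t2 for each one, pick out the rows that satisfy the condition, and assemble the result set. This approach is very similar to Index Nested-Loop Join, and it has a similar name too: Simple Nested-Loop Join.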
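
A minimal sketch of that idea, again with made-up (id, idx) tuples and a hypothetical function name, counts how many row comparisons the double loop performs:

```python
def simple_nested_loop_join(driving_rows, driven_rows):
    result = []
    comparisons = 0
    for left in driving_rows:         # outer loop: every row of t1
        for right in driven_rows:     # inner loop: full scan of t2 each time
            comparisons += 1
            if left[1] == right[1]:   # join condition t1.idx = t2.idx
                result.append(left + right)
    return result, comparisons


# Toy data, invented for illustration.
t1 = [(1, 10), (2, 20), (3, 30)]
t2 = [(1, 20), (2, 30), (3, 30), (4, 40)]
rows, comparisons = simple_nested_loop_join(t1, t2)
```

With 3 rows in t1 and 4 rows in t2 the loop performs 3 × 4 = 12 comparisons, and that product is what grows painfully as the tables get bigger.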

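But we know this is bound to be slow, because t2 has to be scanned in full for every row of t1. The overall flow cannot be changed, but what we want to explore is whether some of its steps can be optimized.

In fact, since the number of rows to compare stays the same, the time complexity cannot be improved any further. So MySQL sets aside a dedicated area of memory, the Join Buffer, and moves the matching work of the JOIN into memory. The join without an index then looks like this: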

select * from t1 straight_join t2 on t1.idx = t2.idx;

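Step 1: Put the queried columns of t1 into the in-memory Join_Buffer. This memory area is private to each thread.

Step 2: Scan table t2, take out each row, match it against the data in the Join_Buffer, and assemble the result set.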

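If you have been paying attention, you may have spotted a problem: the Join_Buffer lives in memory, so how big is it? In practice it is quite small. The default is 256K, and its size is controlled by join_buffer_size. So there will certainly be cases where the data does not fit. MySQL then splits the data into blocks and loads them one block at a time, which is why this join method is called Block Nested-Loop Join.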
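
The block-by-block flow can be sketched as follows. One loud simplification: the real buffer is sized in bytes by join_buffer_size, while the hypothetical buffer_rows parameter here simply caps the number of t1 rows per block. The function name and the toy data are likewise invented for illustration.

```python
def block_nested_loop_join(driving_rows, driven_rows, buffer_rows):
    result = []
    driven_scans = 0
    for start in range(0, len(driving_rows), buffer_rows):
        # Fill the join buffer with the next block of t1 rows.
        join_buffer = driving_rows[start:start + buffer_rows]
        driven_scans += 1
        for right in driven_rows:        # one full scan of t2 per block
            for left in join_buffer:     # matching happens in memory
                if left[1] == right[1]:  # join condition t1.idx = t2.idx
                    result.append(left + right)
    return result, driven_scans


# Toy data: 3 rows in t1, and a buffer that only holds 2 of them.
t1 = [(1, 10), (2, 20), (3, 30)]
t2 = [(1, 20), (2, 30), (3, 30), (4, 40)]
rows, driven_scans = block_nested_loop_join(t1, t2, buffer_rows=2)
```

Here t1 needs two blocks, so t2 is scanned twice instead of once per t1 row. The number of comparisons is unchanged, but they run against memory and the driven table is read far fewer times.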

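Stock interview answers

From the analysis above, we now know how a JOIN is executed.

Here are a few stock answers I want to leave you with:

A JOIN that uses an index (Index Nested-Loop Join) is fine to use; a Block Nested-Loop Join, which does not go through an index, is not recommended.
Whatever kind of JOIN it is, use the small table as the driving table.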
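
To see why the small table should drive, here is a rough back-of-the-envelope model for Block Nested-Loop Join. It is an assumption-laden sketch, not MySQL's actual cost model: bnl_rows_read is a hypothetical helper, the row counts are invented, and the buffer is again measured in rows rather than bytes.

```python
import math


def bnl_rows_read(driving_count, driven_count, buffer_rows):
    # The driving table is read once; the driven table is scanned
    # once for every block of driving rows that fits in the buffer.
    blocks = math.ceil(driving_count / buffer_rows)
    return driving_count + blocks * driven_count


small, large = 100, 10_000   # invented table sizes
buffer_rows = 50             # invented buffer capacity, in rows

small_drives = bnl_rows_read(small, large, buffer_rows)
large_drives = bnl_rows_read(large, small, buffer_rows)
```

Under these assumptions, driving with the small table reads 100 + 2 × 10,000 = 20,100 rows, while driving with the large table reads 10,000 + 200 × 100 = 30,000 rows. The same intuition holds for Index Nested-Loop Join, where every driving row costs one index lookup in the driven table, so fewer driving rows means fewer lookups.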

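This is also why tuning advice so often says to use the small table as the driving table. But how do we define "small table"?

Strictly speaking, we should first apply each table's filter conditions, and then judge by the total amount of data in the columns the query actually needs.
Join optimization should not be limited to indexes and a small driving table; we can also optimize JOINs with MRR (Multi-Range Read) and the Batched Key Access (BKA) algorithm from MySQL 5.6.

So what exactly are MRR and BKA? See you next time. Follow along and nudge me for the next installment.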