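Week 13 Translation

Source overview:

Pro SQL Server Internals, 2nd edition

Author: Dmitri Korotkevitch

About the author:

Dmitri Korotkevitch is a Microsoft Data Platform MVP and a Microsoft Certified Master (SQL Server 2008) with more than 20 years of IT experience. This includes work with Microsoft SQL Server as an application and database developer, database administrator, and database architect. He specializes in the design, development, and performance tuning of complex OLTP systems that handle thousands of transactions per second. Dmitri speaks regularly at various Microsoft and SQL PASS events, and he provides SQL Server training to clients around the world.

Author's blog: http://aboutsqlserver.com

Source link: http://www.doc88.com/p-4042504089228.html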

Clustered Indexes

A clustered index dictates the physical order of the data in a table, which is sorted according to the clustered index key. A table can have only one clustered index.

Let's assume that you want to create a clustered index on a heap table that already contains data. As a first step, shown in Figure 2-5, SQL Server creates another copy of the data and sorts it by the value of the clustered index key. The data pages are linked in a doubly linked list, where every page contains pointers to the next and previous pages in the chain. This list is called the leaf level of the index, and it contains the actual table data.
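The following T-SQL sketch walks through that first step: it loads a heap and then builds a clustered index on it. The dbo.Customers table, its columns, the 50,000 generated rows, and the IDX_Customers_CustomerId name are hypothetical, and sys.all_objects serves only as a convenient row source. Run it in a scratch database on SQL Server 2016 or later, which DROP TABLE IF EXISTS requires.

```sql
-- Hypothetical demo table; run only in a scratch database (SQL Server 2016+)
drop table if exists dbo.Customers;

create table dbo.Customers
(
    CustomerId int not null,
    Name varchar(64) not null,
    Placeholder char(200) null
);

-- Load 50,000 rows into the heap; sys.all_objects is only a row source
;with IDs(ID) as
(
    select row_number() over (order by (select null))
    from sys.all_objects as o1 cross join sys.all_objects as o2
)
insert into dbo.Customers(CustomerId, Name)
    select ID, 'Customer ' + convert(varchar(10), ID)
    from IDs
    where ID <= 50000;

-- Before: the table is a heap (index_id = 0, type_desc = HEAP)
select index_id, type_desc
from sys.indexes
where object_id = object_id(N'dbo.Customers');

-- Building the clustered index writes a sorted copy of the data by CustomerId
create unique clustered index IDX_Customers_CustomerId
on dbo.Customers(CustomerId);

-- After: the heap entry is replaced by index_id = 1, type_desc = CLUSTERED
select index_id, type_desc
from sys.indexes
where object_id = object_id(N'dbo.Customers');
```

The clustered index replaces the heap rather than sitting beside it. Once the index exists, sys.indexes no longer lists an index_id 0 row for the table.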

Figure 2-5. Clustered index structure: Leaf level

■ Note The sort order on the page is controlled by a slot array. Actual data on the page is unsorted.

When the leaf level consists of multiple pages, SQL Server starts to build an intermediate level of the index, as shown in Figure 2-6.

Figure 2-6. Clustered index structure: Intermediate and leaf levels

The intermediate level stores one row per leaf-level page. Each row holds two pieces of information: the physical address of the page it references and the minimum value of the index key on that page. The only exception is the very first row on the first page, where SQL Server stores NULL rather than the minimum index key value. With this optimization, SQL Server does not need to update non-leaf-level rows when you insert a row with the lowest key value in the table.
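You can check the one-row-per-page relationship with sys.dm_db_index_physical_stats in DETAILED mode, which returns one row per B-tree level. The sketch below uses a hypothetical dbo.Customers table with 50,000 generated rows, in a scratch database on SQL Server 2016 or later. In its output, record_count on each non-leaf level should equal page_count on the level directly below it, and the highest level (the root) should have a page_count of 1.

```sql
-- Hypothetical demo table; run only in a scratch database (SQL Server 2016+)
drop table if exists dbo.Customers;

create table dbo.Customers
(
    CustomerId int not null,
    Name varchar(64) not null,
    Placeholder char(200) null
);

;with IDs(ID) as
(
    select row_number() over (order by (select null))
    from sys.all_objects as o1 cross join sys.all_objects as o2
)
insert into dbo.Customers(CustomerId, Name)
    select ID, 'Customer ' + convert(varchar(10), ID)
    from IDs
    where ID <= 50000;

create unique clustered index IDX_Customers_CustomerId
on dbo.Customers(CustomerId);

-- One row per level: index_level 0 is the leaf level, the highest level is the root
;with Levels as
(
    select index_level, page_count, record_count
    from sys.dm_db_index_physical_stats
        (db_id(), object_id(N'dbo.Customers'), 1, null, 'DETAILED')
    where alloc_unit_type_desc = N'IN_ROW_DATA'
)
select
    upper_level.index_level as NonLeafLevel,
    upper_level.record_count as RowsOnNonLeafLevel, -- one row per page below...
    lower_level.page_count as PagesOnLevelBelow     -- ...so these two should match
from Levels as upper_level
    join Levels as lower_level
        on lower_level.index_level = upper_level.index_level - 1
order by upper_level.index_level;
```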

The pages on the intermediate levels are also linked into a doubly linked list. SQL Server keeps adding intermediate levels until a level contains just a single page. This level is called the root level, and it becomes the entry point to the index, as shown in Figure 2-7.

Figure 2-7. Clustered index structure: Root level

As you can see, the index always has one leaf level, one root level, and zero or more intermediate levels. The only exception is when the index data fits into a single page. In that case, SQL Server does not create a separate root-level page, and the index consists of just the single leaf-level page.
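The single-page exception can be observed directly. In this sketch, a hypothetical dbo.TinyCustomers table (scratch database, SQL Server 2016 or later) starts with three rows, so index_depth should be 1. After about a thousand 60-character rows are added, the data no longer fits on one page, and the index should grow to two levels with a single root page.

```sql
-- Hypothetical demo table; run only in a scratch database (SQL Server 2016+)
drop table if exists dbo.TinyCustomers;

create table dbo.TinyCustomers
(
    CustomerId int not null,
    Name varchar(64) not null
);

create unique clustered index IDX_TinyCustomers_CustomerId
on dbo.TinyCustomers(CustomerId);

insert into dbo.TinyCustomers(CustomerId, Name)
values (1, 'Customer 1'), (2, 'Customer 2'), (3, 'Customer 3');

-- Expect index_depth = 1: the single leaf page is the whole index, no separate root page
select index_depth, index_level, page_count
from sys.dm_db_index_physical_stats
    (db_id(), object_id(N'dbo.TinyCustomers'), 1, null, 'DETAILED');

-- Add enough rows to spill over one page; SQL Server then builds a root level
;with IDs(ID) as
(
    select row_number() over (order by (select null))
    from sys.all_objects as o1 cross join sys.all_objects as o2
)
insert into dbo.TinyCustomers(CustomerId, Name)
    select ID, replicate('x', 60)
    from IDs
    where ID between 4 and 1000;

-- Expect index_depth = 2 now, with page_count = 1 on the root (index_level = 1)
select index_depth, index_level, page_count
from sys.dm_db_index_physical_stats
    (db_id(), object_id(N'dbo.TinyCustomers'), 1, null, 'DETAILED');
```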

The number of levels in the index largely depends on the row and index key sizes. For example, an index on a 4-byte integer column requires 13 bytes per row on the intermediate and root levels. Those 13 bytes consist of a 2-byte slot-array entry, a 4-byte index-key value, a 6-byte page pointer, and a 1-byte row overhead. One byte of row overhead is enough here because the index key contains no variable-length or nullable columns.

As a result, you can accommodate 8,060 bytes / 13 bytes per row = 620 rows per page. This means that, with one intermediate level, you can store information about up to 620 * 620 = 384,400 leaf-level pages. If your data row size is 200 bytes, you can store 40 rows per leaf-level page. That gives up to 384,400 * 40 = 15,376,000 rows in an index with just three levels. Adding another intermediate level would essentially cover all possible integer values.
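The capacity arithmetic above can be reproduced in a few lines of T-SQL. The 8,060-byte page budget, the 13-byte non-leaf row, and the 40 rows per leaf page are the figures from the text. The sketch multiplies them out and compares the four-level capacity with the 4,294,967,296 distinct values of a 4-byte int.

```sql
-- Figures taken from the text
declare
    @NonLeafRowSize int = 2 + 4 + 6 + 1, -- slot-array entry + int key + page pointer + row overhead
    @PageBytes int = 8060,               -- usable bytes per page
    @RowsPerLeafPage int = 40;           -- 200-byte data rows

declare
    @RowsPerNonLeafPage bigint = @PageBytes / @NonLeafRowSize; -- integer division

select
    @NonLeafRowSize as NonLeafRowSize,                                 -- 13
    @RowsPerNonLeafPage as RowsPerNonLeafPage,                         -- 620
    @RowsPerNonLeafPage * @RowsPerNonLeafPage as LeafPagesThreeLevels, -- 384,400
    @RowsPerNonLeafPage * @RowsPerNonLeafPage * @RowsPerLeafPage
        as RowsThreeLevels,                                            -- 15,376,000
    power(@RowsPerNonLeafPage, 3) * @RowsPerLeafPage
        as RowsFourLevels,                                             -- 9,533,120,000
    power(convert(bigint, 2), 32) as DistinctIntValues;                -- 4,294,967,296
```

With four levels, the index could address more than twice as many rows as there are distinct int values. That is the sense in which one more intermediate level "covers all possible integer values".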

■ Note In real life, index fragmentation would reduce those numbers. We will talk about index fragmentation in Chapter 6.

There are three different ways in which SQL Server can read data from an index. The first one is an ordered scan. Let's assume that we want to run the SELECT Name FROM dbo.Customers ORDER BY CustomerId query. The data on the leaf level of the index is already sorted by the CustomerId column value. As a result, SQL Server can scan the leaf level of the index from the first to the last page and return the rows in the order in which they are stored.

SQL Server starts with the root page of the index and reads the first row from there. That row references the intermediate page with the minimum key value in the table. SQL Server reads that page and repeats the process until it finds the first page on the leaf level. Then SQL Server starts to read rows one by one, moving through the linked list of pages until all rows have been read. Figure 2-8 illustrates this process.

Figure 2-8. Ordered index scan

Figure 2-9. Ordered index scan execution plan

It is worth mentioning that an ORDER BY clause is not required to trigger an ordered scan. An ordered scan just means that SQL Server reads the data in the order of the index key.

SQL Server can navigate through indexes in both directions, forward and backward. However, there is one important aspect that you must keep in mind: SQL Server does not use parallelism during backward index scans.

You can check the scan direction by examining the properties of the INDEX SCAN or INDEX SEEK operator in the execution plan. Keep in mind, however, that Management Studio does not display these properties in the graphical representation of the execution plan. To see them, select the operator in the execution plan and open the Properties window, either through the View/Properties Window menu item or by pressing the F4 key.
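To see both directions, the sketch below runs the same ordered query forward and backward against a hypothetical dbo.Customers table with 50,000 generated rows (scratch database, SQL Server 2016 or later). With the actual execution plan enabled, both statements typically show a Clustered Index Scan with no Sort operator. The Properties window (F4) should report Ordered = True, with a Scan Direction of FORWARD or BACKWARD. In the showplan XML, the same details appear as the Ordered and ScanDirection attributes of the IndexScan element.

```sql
-- Hypothetical demo table; run only in a scratch database (SQL Server 2016+)
drop table if exists dbo.Customers;

create table dbo.Customers
(
    CustomerId int not null,
    Name varchar(64) not null,
    Placeholder char(200) null
);

create unique clustered index IDX_Customers_CustomerId
on dbo.Customers(CustomerId);

;with IDs(ID) as
(
    select row_number() over (order by (select null))
    from sys.all_objects as o1 cross join sys.all_objects as o2
)
insert into dbo.Customers(CustomerId, Name)
    select ID, 'Customer ' + convert(varchar(10), ID)
    from IDs
    where ID <= 50000;

-- Forward ordered scan: expect Ordered = True, Scan Direction = FORWARD
select CustomerId, Name from dbo.Customers order by CustomerId;

-- Backward ordered scan: expect Ordered = True, Scan Direction = BACKWARD;
-- per the text, SQL Server does not parallelize this scan
select CustomerId, Name from dbo.Customers order by CustomerId desc;
```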

The Enterprise Edition of SQL Server has an optimization feature called merry-go-round scan that allows multiple tasks to share the same index scan. Let's assume that you have session S1, which is scanning the index. At some point in the middle of the scan, another session, S2, runs a query that needs to scan the same index. With a merry-go-round scan, S2 joins S1 at its current scan location. SQL Server reads each page only once, passing rows to both sessions.

When the S1 scan reaches the end of the index, S2 starts scanning data from the beginning of the index until the point where the S2 scan started. A merry-go-round scan is another example of why you cannot rely on the order of the index keys, and why you should always specify an ORDER BY clause when the order matters.

The next access method after the ordered scan is called an allocation order scan. SQL Server accesses the table data through the IAM pages, similar to how it does so with heap tables. The SELECT Name FROM dbo.Customers WITH (NOLOCK) query and Figure 2-10 illustrate this method. Figure 2-11 shows the query execution plan.

Figure 2-10. Allocation order scan

Figure 2-11. Allocation order scan execution plan

Unfortunately, it is not easy to detect when SQL Server uses an allocation order scan. Even when the Ordered property in the execution plan shows False, it only indicates that SQL Server does not care whether the rows are read in the order of the index key. It does not indicate that an allocation order scan was used.

An allocation order scan can be faster for scanning large tables, although it has a higher startup cost. SQL Server does not use this access method when the table is small. Another important consideration is data consistency. SQL Server does not use forwarding pointers in tables that have a clustered index, so an allocation order scan can produce inconsistent results: rows can be skipped or read multiple times due to the data movement caused by page splits. As a result, SQL Server usually avoids allocation order scans unless it reads the data in the READ UNCOMMITTED or SERIALIZABLE transaction isolation levels.

■ Note We will talk about page splits and fragmentation in Chapter 6, “Index Fragmentation,” and discuss locking and data consistency in Part III, “Locking, Blocking, and Concurrency.”

The last index access method is called an index seek. The SELECT Name FROM dbo.Customers WHERE CustomerId BETWEEN 4 AND 7 query and Figure 2-12 illustrate the operation.

Figure 2-12. Index seek

Chapter 2 ■ Tables and Indexes: Internal Structure and Access Methods

To read the range of rows from the table, SQL Server needs to find the row with the minimum key value in the range, which is 4. SQL Server starts with the root page, where the second row references the page with a minimum key value of 350. Because 350 is greater than the key value we are looking for (4), SQL Server reads the intermediate-level data page (1:170) referenced by the first row on the root page.

Similarly, the intermediate page leads SQL Server to the first leaf-level page (1:176). SQL Server reads that page and the rows with CustomerId values of 4 and 5. Finally, it reads the two remaining rows from the second page.

Figure 2-13 shows the execution plan.

Figure 2-13. Index seek execution plan

As you can guess, an index seek is more efficient than an index scan, because SQL Server processes just a subset of rows and data pages rather than scanning the entire table.
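SET STATISTICS IO shows the difference quickly, because it prints logical reads to the Messages tab. The sketch below uses a hypothetical dbo.Customers table with 50,000 generated rows and no index on Name (scratch database, SQL Server 2016 or later). The seek should report roughly one logical read per index level, close to index_depth. The predicate on the unindexed Name column forces a scan that reads roughly every leaf page. Exact numbers depend on the page layout.

```sql
-- Hypothetical demo table; run only in a scratch database (SQL Server 2016+)
drop table if exists dbo.Customers;

create table dbo.Customers
(
    CustomerId int not null,
    Name varchar(64) not null,
    Placeholder char(200) null
);

create unique clustered index IDX_Customers_CustomerId
on dbo.Customers(CustomerId);

;with IDs(ID) as
(
    select row_number() over (order by (select null))
    from sys.all_objects as o1 cross join sys.all_objects as o2
)
insert into dbo.Customers(CustomerId, Name)
    select ID, 'Customer ' + convert(varchar(10), ID)
    from IDs
    where ID <= 50000;

-- Index depth and leaf page count, for comparison with the logical reads below
select index_depth, page_count as LeafPages
from sys.dm_db_index_physical_stats
    (db_id(), object_id(N'dbo.Customers'), 1, null, 'DETAILED')
where index_level = 0 and alloc_unit_type_desc = N'IN_ROW_DATA';

set statistics io on;

-- Index seek: root -> intermediate -> leaf, so roughly index_depth logical reads
select Name from dbo.Customers where CustomerId between 4 and 7;

-- No index on Name: clustered index scan over (roughly) every leaf page
select CustomerId from dbo.Customers where Name = 'Customer 5';

set statistics io off;
```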

Technically speaking, there are two kinds of index seek operations. The first is called a singleton lookup, or sometimes point lookup, where SQL Server seeks and returns a single row. Think of the WHERE CustomerId = 2 predicate as an example. The other type of index seek operation is called a range scan. It requires SQL Server to find the lowest or highest value of the key and then scan the set of rows, either forward or backward, until it reaches the end of the scan range. The WHERE CustomerId BETWEEN 4 AND 7 predicate leads to a range scan. Both cases are shown as INDEX SEEK operations in the execution plans.
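Both forms appear as a Clustered Index Seek, so the difference shows up only in the operator's Seek Predicates property. The sketch below runs both against a hypothetical dbo.Customers table with 50,000 generated rows (scratch database, SQL Server 2016 or later). Management Studio typically shows the equality predicate with a Prefix seek key and the BETWEEN predicate with Start and End keys. In the showplan XML, these correspond to the Prefix, StartRange, and EndRange elements under SeekKeys.

```sql
-- Hypothetical demo table; run only in a scratch database (SQL Server 2016+)
drop table if exists dbo.Customers;

create table dbo.Customers
(
    CustomerId int not null,
    Name varchar(64) not null,
    Placeholder char(200) null
);

create unique clustered index IDX_Customers_CustomerId
on dbo.Customers(CustomerId);

;with IDs(ID) as
(
    select row_number() over (order by (select null))
    from sys.all_objects as o1 cross join sys.all_objects as o2
)
insert into dbo.Customers(CustomerId, Name)
    select ID, 'Customer ' + convert(varchar(10), ID)
    from IDs
    where ID <= 50000;

-- Singleton lookup: at most one row, because CustomerId is a unique key
select Name from dbo.Customers where CustomerId = 2;

-- Range scan: seek to CustomerId = 4, then read forward to the end of the range (7)
select Name from dbo.Customers where CustomerId between 4 and 7;

-- Typically a backward range scan: seek to 7 and read toward 4, with no Sort operator
select Name from dbo.Customers where CustomerId between 4 and 7 order by CustomerId desc;
```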

As you can guess, it is entirely possible for range scans to force SQL Server to process a large number of data pages from the index, or even all of them. For example, if you changed the query to use a WHERE CustomerId > 0 predicate, SQL Server would read all rows and pages, even though the execution plan would display an Index Seek operator. You must keep this behavior in mind and always analyze the efficiency of range scans during query performance tuning.
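The sketch below puts the two cases side by side, using a hypothetical dbo.Customers table with 50,000 generated rows (scratch database, SQL Server 2016 or later). The first statement typically still shows a Clustered Index Seek. Even so, SET STATISTICS IO should report about as many logical reads for it as for the plain scan in the second statement, because the range covers the whole table.

```sql
-- Hypothetical demo table; run only in a scratch database (SQL Server 2016+)
drop table if exists dbo.Customers;

create table dbo.Customers
(
    CustomerId int not null,
    Name varchar(64) not null,
    Placeholder char(200) null
);

create unique clustered index IDX_Customers_CustomerId
on dbo.Customers(CustomerId);

;with IDs(ID) as
(
    select row_number() over (order by (select null))
    from sys.all_objects as o1 cross join sys.all_objects as o2
)
insert into dbo.Customers(CustomerId, Name)
    select ID, 'Customer ' + convert(varchar(10), ID)
    from IDs
    where ID <= 50000;

set statistics io on;

-- Seek operator in the plan, but the range CustomerId > 0 includes every row
select count(*) from dbo.Customers where CustomerId > 0;

-- Full clustered index scan, for comparison
select count(*) from dbo.Customers;

set statistics io off;
```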

There is a concept in relational databases called SARGable predicates, which stands for Search ARGumentable. A predicate is SARGable if SQL Server can use an index seek operation to evaluate it, provided that a suitable index exists. In a nutshell, predicates are SARGable when SQL Server can isolate the single value or range of index key values to process, thus limiting the search during predicate evaluation. Obviously, it is beneficial to write queries using SARGable predicates and to use index seeks whenever possible.

SARGable predicates include the following operators: =, >, >=, <, <=, IN, BETWEEN, and LIKE (in the case of prefix matching). Non-SARGable operators include NOT, <>, LIKE (in the case of non-prefix matching), and NOT IN. Predicates also become non-SARGable when they apply functions or mathematical calculations to the table columns, because SQL Server has to call the function or perform the calculation for every row it processes.
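The LIKE and function cases can be compared on a varchar clustered key. In the sketch below, the dbo.Products table and its generated P00001 to P50000 codes are hypothetical (scratch database, SQL Server 2016 or later). The prefix LIKE is typically executed as a Clustered Index Seek. The non-prefix LIKE and LEFT() applied to the column both force a scan, even though LEFT() returns the same rows as the prefix LIKE.

```sql
-- Hypothetical demo table; run only in a scratch database (SQL Server 2016+)
drop table if exists dbo.Products;

create table dbo.Products
(
    Code varchar(20) not null,
    Placeholder char(100) null
);

create unique clustered index IDX_Products_Code on dbo.Products(Code);

-- Codes P00001 ... P50000
;with IDs(ID) as
(
    select row_number() over (order by (select null))
    from sys.all_objects as o1 cross join sys.all_objects as o2
)
insert into dbo.Products(Code)
    select 'P' + right('00000' + convert(varchar(10), ID), 5)
    from IDs
    where ID <= 50000;

-- SARGable: prefix match, a seek on the range of codes starting with 'P001'
select Code from dbo.Products where Code like 'P001%';

-- Non-SARGable: function on the column, evaluated for every row (same rows as above)
select Code from dbo.Products where left(Code, 4) = 'P001';

-- Non-SARGable: non-prefix match, a scan of the whole index
select Code from dbo.Products where Code like '%001';
```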

Fortunately, in some cases you can refactor the queries to make such predicates SARGable. Table 2-1 shows a few examples of this.

Table 2-1. Examples of refactoring non-SARGable predicates into SARGable ones
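As an illustration of this kind of rewrite (not a reproduction of Table 2-1), the sketch below uses a hypothetical dbo.Orders table clustered on OrderDate, in a scratch database on SQL Server 2016 or later. The fixed @AsOf date is also an assumption. Each pair of statements returns the same count. The first form applies a function to the column and typically scans. The second moves the calculation to the other side of the predicate and can seek.

```sql
-- Hypothetical demo table; run only in a scratch database (SQL Server 2016+)
drop table if exists dbo.Orders;

create table dbo.Orders
(
    OrderId int not null,
    OrderDate date not null,
    Placeholder char(100) null
);

create clustered index IDX_Orders_OrderDate on dbo.Orders(OrderDate);

-- 50,000 orders spread over 1,000 consecutive days starting on 2016-01-01
;with IDs(ID) as
(
    select row_number() over (order by (select null))
    from sys.all_objects as o1 cross join sys.all_objects as o2
)
insert into dbo.Orders(OrderId, OrderDate)
    select ID, dateadd(day, convert(int, ID % 1000), convert(date, '20160101'))
    from IDs
    where ID <= 50000;

declare @AsOf date = '20180101';

-- Date part comparison: non-SARGable, then SARGable
select count(*) from dbo.Orders where year(OrderDate) = 2017;
select count(*) from dbo.Orders where OrderDate >= '20170101' and OrderDate < '20180101';

-- Calculation on the column: non-SARGable, then SARGable
select count(*) from dbo.Orders where dateadd(day, 7, OrderDate) > @AsOf;
select count(*) from dbo.Orders where OrderDate > dateadd(day, -7, @AsOf);
```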

Another important factor that you must keep in mind is type conversion. In some cases, you can make predicates non-SARGable by using incorrect data types. Let's create a table with a varchar column and populate it with some data, as shown in Listing 2-6.

Listing 2-6. SARG predicates and data types: Test table creation

create table dbo.Data
(
    VarcharKey varchar(10) not null,
    Placeholder char(200)
);

create unique clustered index IDX_Data_VarcharKey
on dbo.Data(VarcharKey);

;with N1(C) as (select 0 union all select 0) -- 2 rows
,N2(C) as (select 0 from N1 as T1 cross join N1 as T2) -- 4 rows
,N3(C) as (select 0 from N2 as T1 cross join N2 as T2) -- 16 rows
,N4(C) as (select 0 from N3 as T1 cross join N3 as T2) -- 256 rows
,N5(C) as (select 0 from N4 as T1 cross join N4 as T2) -- 65,536 rows
,IDs(ID) as (select row_number() over (order by (select null)) from N5)
insert into dbo.Data(VarcharKey)
    select convert(varchar(10),ID) from IDs;

The clustered index key column is defined as varchar, even though it stores integer values. Now, let's run two selects, as shown in Listing 2-7, and look at the execution plans.


Listing 2-7. SARG predicates and data types: Select with integer parameter

declare @IntParam int = 200;

select * from dbo.Data where VarcharKey = @IntParam;

select * from dbo.Data where VarcharKey = convert(varchar(10),@IntParam);

As you can see in Figure 2-14, with the integer parameter SQL Server scans the clustered index, converting the varchar value to an integer for every row. In the second case, SQL Server converts the integer parameter to varchar once, at the beginning, and uses a much more efficient clustered index seek operation.
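The conversion direction follows data type precedence: int ranks above varchar, so SQL Server converts the column rather than the parameter. The sketch below uses a hypothetical dbo.PrecedenceDemo table (scratch database, SQL Server 2016 or later) to show one side effect. The int comparison matches '0200' as well as '200', while the varchar comparison matches only '200'. Many strings convert to the same integer, and string order differs from numeric order. One way to see the consequence: the int predicate cannot be mapped onto a single seek range of the varchar key.

```sql
-- Hypothetical demo table; run only in a scratch database (SQL Server 2016+)
drop table if exists dbo.PrecedenceDemo;

create table dbo.PrecedenceDemo
(
    VarcharKey varchar(10) not null
);

create unique clustered index IDX_PrecedenceDemo_VarcharKey
on dbo.PrecedenceDemo(VarcharKey);

insert into dbo.PrecedenceDemo(VarcharKey)
values ('200'), ('0200'), ('1000'), ('30');

declare @IntParam int = 200;

-- int has higher precedence: every VarcharKey is converted to int, so '200' and '0200' match
select VarcharKey from dbo.PrecedenceDemo where VarcharKey = @IntParam;

-- varchar comparison: only the exact string '200' matches
select VarcharKey from dbo.PrecedenceDemo where VarcharKey = convert(varchar(10), @IntParam);

-- String order is not numeric order: '0200', '1000', '200', '30'
select VarcharKey from dbo.PrecedenceDemo order by VarcharKey;
```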

Figure 2-14. SARG predicates and data types: Execution plans with integer parameter

■ Tip Pay attention to the column data types in join predicates. Implicit or explicit data type conversions can significantly decrease the performance of the queries.

You will observe very similar behavior in the case of unicode string parameters. Let's run the queries shown in Listing 2-8. Figure 2-15 shows the execution plans for the statements.

Listing 2-8. SARG predicates and data types: Select with string parameter

select * from dbo.Data where VarcharKey = '200';
select * from dbo.Data where VarcharKey = N'200'; -- unicode parameter

Figure 2-15. SARG predicates and data types: Execution plans with string parameter

As you can see, a unicode string parameter is non-SARGable for varchar columns. This is a much bigger issue than it appears to be. You rarely write queries the way Listing 2-8 does, but most application development environments nowadays treat strings as unicode. As a result, SQL Server client libraries generate unicode (nvarchar) parameters for string objects unless the parameter data type is explicitly specified as varchar. This makes the predicates non-SARGable, and it can lead to major performance hits due to unnecessary scans, even when the varchar columns are indexed.
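Client libraries usually send parameterized statements through sp_executesql, so the effect can be reproduced in T-SQL without any application code. The dbo.UnicodeDemo table below is hypothetical (scratch database, SQL Server 2016 or later). Its column is deliberately pinned to the SQL collation SQL_Latin1_General_CP1_CI_AS as an assumption. With a Windows collation, SQL Server may still manage a seek for the nvarchar parameter, so the plan shape can depend on the column collation.

```sql
-- Hypothetical demo table; run only in a scratch database (SQL Server 2016+)
drop table if exists dbo.UnicodeDemo;

create table dbo.UnicodeDemo
(
    VarcharKey varchar(10) collate SQL_Latin1_General_CP1_CI_AS not null,
    Placeholder char(200) null
);

create unique clustered index IDX_UnicodeDemo_VarcharKey
on dbo.UnicodeDemo(VarcharKey);

;with IDs(ID) as
(
    select row_number() over (order by (select null))
    from sys.all_objects as o1 cross join sys.all_objects as o2
)
insert into dbo.UnicodeDemo(VarcharKey)
    select convert(varchar(10), ID)
    from IDs
    where ID <= 50000;

-- Roughly what a client library sends for an untyped string: an nvarchar parameter.
-- With the SQL collation above, expect CONVERT_IMPLICIT on VarcharKey and an index scan.
exec sp_executesql
    N'select * from dbo.UnicodeDemo where VarcharKey = @P',
    N'@P nvarchar(10)',
    @P = N'200';

-- The same statement with an explicitly typed varchar parameter: a clustered index seek
exec sp_executesql
    N'select * from dbo.UnicodeDemo where VarcharKey = @P',
    N'@P varchar(10)',
    @P = '200';
```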

■ Important Always specify parameter data types in client applications. For example, in ADO.NET, use Parameters.Add("@ParamName", SqlDbType.VarChar, <Size>).Value = stringVariable rather than an overload that infers the type from the value, such as Parameters.AddWithValue("@ParamName", stringVariable), which produces an nvarchar parameter for a .NET string. Use mapping in ORM frameworks to explicitly specify non-unicode attributes in the classes.

It is also worth mentioning that varchar parameters are SARGable for nvarchar unicode data columns.
