SQL Server instance stops and cannot be restarted after a parameter change

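Today a colleague on our project team made a serious blunder while maintaining a production server. At around 15:00 she copied a statement from the internet and, without really understanding what it did, ran it against the production server. As a result the SQL Server instance stopped and could not be started again, and none of the applications could connect.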

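When the production department noticed the problem and reported it, she found that SQL Server could not be reached (the SQL Server service had stopped). She started the service (which of course failed), then tried again several times (the repeated attempts are visible in the SQL Server error log)...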

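After struggling with it until about 17:30 she gave up and came to me, anxiously asking: "The SQL Server service won't start and I can't connect to the database. What's going on?"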


1. Without further ado, I connected to the server remotely and checked the SQL Server error log (D:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log\ERRORLOG).

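The detailed log follows. In the original post the errors were highlighted in red; they are the "Failed allocate pages" line and the three "Error" entries near the end.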

2017-08-14 15:10:02.48 Server      (c) Microsoft Corporation.
2017-08-14 15:10:02.48 Server      All rights reserved.
2017-08-14 15:10:02.48 Server      Server process ID is 2896.
2017-08-14 15:10:02.48 Server      System Manufacturer: 'LENOVO', System Model: 'Lenovo System x3850 X6                               -[6241AC1]-'.
2017-08-14 15:10:02.48 Server      Authentication mode is MIXED.
2017-08-14 15:10:02.48 Server      Logging SQL Server messages in file 'd:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log\ERRORLOG'.
2017-08-14 15:10:02.48 Server      The service account is 'NT Service\MSSQLSERVER'. This is an informational message; no user action is required.
2017-08-14 15:10:02.48 Server      Registry startup parameters: 
-d d:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\master.mdf
-e d:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log\ERRORLOG
-l d:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\mastlog.ldf
2017-08-14 15:10:02.48 Server      Command Line Startup Parameters:
-s "MSSQLSERVER"
2017-08-14 15:10:02.68 Server      SQL Server detected 4 sockets with 8 cores per socket and 16 logical processors per socket, 64 total logical processors; using 40 logical processors based on SQL Server licensing. This is an informational message; no user action is required.
2017-08-14 15:10:02.68 Server      SQL Server is starting at normal priority base (=7). This is an informational message only. No user action is required.
2017-08-14 15:10:02.68 Server      Detected 130959 MB of RAM. This is an informational message; no user action is required.
2017-08-14 15:10:02.68 Server      Using conventional memory in the memory manager.
2017-08-14 15:10:03.98 Server      This instance of SQL Server last reported using a process ID of 14064 at 2017/8/14 15:06:38 (local) 2017/8/14 7:06:38 (UTC). This is an informational message only; no user action is required.
2017-08-14 15:10:03.99 Server      Node configuration: node 0: CPU mask: 0x000000000000ffff:0 Active CPU mask: 0x000000000000ffff:0. This message provides a description of the NUMA configuration for this computer. This is an informational message only. No user action is required.
2017-08-14 15:10:03.99 Server      Node configuration: node 1: CPU mask: 0x00000000ffff0000:0 Active CPU mask: 0x00000000ffff0000:0. This message provides a description of the NUMA configuration for this computer. This is an informational message only. No user action is required.
2017-08-14 15:10:03.99 Server      Node configuration: node 2: CPU mask: 0x0000ffff00000000:0 Active CPU mask: 0x000000ff00000000:0. This message provides a description of the NUMA configuration for this computer. This is an informational message only. No user action is required.
2017-08-14 15:10:03.99 Server      Node configuration: node 3: CPU mask: 0xffff000000000000:0 Active CPU mask: 0x0000000000000000:0. This message provides a description of the NUMA configuration for this computer. This is an informational message only. No user action is required.
2017-08-14 15:10:04.00 Server      Using dynamic lock allocation.  Initial allocation of 2500 Lock blocks and 5000 Lock Owner blocks per node.  This is an informational message only.  No user action is required.
2017-08-14 15:10:04.00 Server      Lock partitioning is enabled.  This is an informational message only. No user action is required.

2017-08-14 15:10:09.06 Server      Failed allocate pages: FAIL_PAGE_ALLOCATION 1

...
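2017-08-14 15:10:09.06 Server      Error: 17300, Severity: 16, State: 1. (Params:). The error is printed in terse mode because there was error during formatting. Tracing, ETW, notifications etc are skipped.

...
2017-08-14 15:10:09.06 Server      Error: 17312, Severity: 16, State: 1. (Params:). The error is printed in terse mode because there was error during formatting. Tracing, ETW, notifications etc are skipped.

...
2017-08-14 15:10:09.06 Server      Error: 33086, Severity: 10, State: 1. (Params:). The error is printed in terse mode because there was error during formatting. Tracing, ETW, notifications etc are skipped.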

...
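When the error log is long, the relevant entries can be pulled out with a short script instead of reading it by eye. The following is a minimal Python sketch, not a tool from the original incident: the function name `find_errors` is hypothetical, and it assumes the log text has already been read into a string. It matches both the English "Error: N, Severity: N" format and the Chinese-localized "错误: N,严重性: N" format that this server actually produced, and it flags the page-allocation failure.

```python
import re
from typing import List, Tuple

# Matches "Error: 17300, Severity: 16" as well as the localized "错误: 17300,严重性: 16".
# Accepting both ASCII and full-width colons/commas is an assumption about localized output.
ERROR_RE = re.compile(r"(?:Error|错误)[::]\s*(\d+)[,,]\s*(?:Severity|严重性)[::]\s*(\d+)")
ALLOC_MARKER = "FAIL_PAGE_ALLOCATION"


def find_errors(log_text: str) -> Tuple[List[Tuple[int, int]], bool]:
    """Return (error number, severity) pairs and whether a page-allocation failure was logged."""
    errors = [(int(num), int(sev)) for num, sev in ERROR_RE.findall(log_text)]
    alloc_failed = ALLOC_MARKER in log_text
    return errors, alloc_failed


if __name__ == "__main__":
    sample = """\
2017-08-14 15:10:09.06 Server      Failed allocate pages: FAIL_PAGE_ALLOCATION 1
2017-08-14 15:10:09.06 Server      Error: 17300, Severity: 16, State: 1. (Params:).
2017-08-14 15:10:09.06 Server      Error: 17312, Severity: 16, State: 1. (Params:).
2017-08-14 15:10:09.06 Server      错误: 33086,严重性: 10,状态: 1。(参数:)。
"""
    errors, alloc_failed = find_errors(sample)
    print(errors, alloc_failed)
```

On the server itself the file could be read with something like `open(path, encoding="utf-16")` before calling `find_errors`; SQL Server typically writes ERRORLOG as Unicode, but check the encoding of your own file first.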

2. Analysis: I asked her what she had done on the server at around 15:10, and she sent me this script:

-- the memory change she made
exec sp_configure 'show advanced options', 1
-- set max server memory (256 MB) to flush the existing cache
exec sp_configure 'max server memory', 256
EXEC ('RECONFIGURE')
-- wait one second
WAITFOR DELAY '00:00:01'
-- set max server memory again (4096 MB)
EXEC  sp_configure 'max server memory', 4096
EXEC ('RECONFIGURE')
-- turn advanced options off again
exec sp_configure 'show advanced options',0
GO

  

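I nearly passed out when I saw it. This is a production server with 128 GB of RAM and about 1 TB of data; how could its memory be capped at 4 GB, after first being squeezed down to 256 MB?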
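To put those numbers in perspective: max server memory is specified in MB, and the startup log reports "Detected 130959 MB of RAM". A small Python sketch (the helper name `share_of_ram` is hypothetical) computing what share of physical memory each value in the script left to SQL Server:

```python
def share_of_ram(max_server_memory_mb: int, physical_mb: int) -> float:
    """Fraction of physical RAM that a 'max server memory' setting (in MB) allows."""
    return max_server_memory_mb / physical_mb


PHYSICAL_MB = 130959  # "Detected 130959 MB of RAM" in the error log above

for setting_mb in (256, 4096):
    pct = share_of_ram(setting_mb, PHYSICAL_MB) * 100
    print(f"{setting_mb} MB = {setting_mb / 1024:.2f} GB, {pct:.2f}% of RAM")
```

The 4096 MB cap works out to roughly 3% of the machine's memory, for an instance serving about 1 TB of data.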

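I calmed myself down and asked whether she knew what the script does. She said she did: "It sets the memory parameter. I just limited the server's memory, and then the server stopped"...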


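3. The cause is clear: once the memory limit was applied, SQL Server did not have enough memory, stopped, and could not start normally afterwards. The "Failed allocate pages: FAIL_PAGE_ALLOCATION" entry in the startup log is consistent with this.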


4. Fix: start the SQL Server instance in minimal configuration mode and change the memory limit.

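Because several applications on different servers connect to this instance, first isolate it (block application connections to the instance) so they cannot interfere with the repair. Minimal configuration mode is also single-user mode, so an application that connects first would take the only available connection.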

-- Stop the applications on the local server that connect to this instance;
-- for access from other servers, block the SQL Server port in the firewall.


--1. Open a cmd window (window 1) and start the instance in minimal configuration mode with -f

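cd /D "C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn"
(cd into the Binn directory, not into sqlservr.exe itself; adjust the drive and path to where your instance is actually installed)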

sqlservr.exe -f -sMSSQLSERVER
MSSQLSERVER is the default instance name; replace it with your actual instance name.


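--2. In cmd window 2, as soon as window 1 is running, run the following command to open a sqlcmd session (the instance is in single-user mode, so connect before anything else does)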

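sqlcmd -E -S .
(-S is the server: "." is the local default instance, use .\InstanceName for a named instance; lowercase -s is the column separator, not the server)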

--Repair: for this failure, reset max server memory to unlimited (2147483647, the default):

EXEC sp_configure 'show advanced options', '1' RECONFIGURE WITH OVERRIDE;
EXEC sp_configure 'max server memory', 2147483647 RECONFIGURE WITH OVERRIDE;
EXEC sp_configure 'show advanced options', '0' RECONFIGURE WITH OVERRIDE;
GO
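The commands in steps 1 and 2 name the instance differently: sqlservr.exe takes an instance name with -s, while sqlcmd takes a server name with -S. A minimal Python sketch of that mapping for a default versus a named instance; the helper names are hypothetical, and the sketch only builds argument lists, it does not start or connect to anything.

```python
from typing import List


def sqlservr_args(instance: str = "MSSQLSERVER") -> List[str]:
    """Arguments for starting the instance in minimal configuration mode (-f)."""
    return ["sqlservr.exe", "-f", f"-s{instance}"]


def sqlcmd_args(instance: str = "MSSQLSERVER") -> List[str]:
    """Arguments for a Windows-authenticated (-E) sqlcmd session to the local instance.

    The default instance is addressed as "."; a named instance as ".\\NAME".
    """
    server = "." if instance.upper() == "MSSQLSERVER" else f".\\{instance}"
    return ["sqlcmd", "-E", "-S", server]


print(sqlservr_args())         # window 1
print(sqlcmd_args())           # window 2
print(sqlcmd_args("SQL2012"))  # hypothetical named instance
```

On the server, lists like these could be handed to `subprocess.Popen` from the Binn directory, but typing the commands by hand as in steps 1 and 2 works just as well.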


-- 3. Stop the minimal-mode instance (run SHUTDOWN in sqlcmd, or close cmd window 1), then start the SQL Server service normally

If it starts, you are done. If it does not, check the error log, fix any other parameters, and repeat until it starts successfully.

Once it is running, re-tune the parameters to sensible values as needed (in an emergency, restoring service comes first).
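As a starting point for that re-tuning, one common rule of thumb (an assumption here, not something from the original incident) is to reserve part of the RAM for the operating system and other processes and give the rest to SQL Server. A hedged Python sketch that computes such a value and emits a matching sp_configure script; the 10% / 4 GB reserve and the function names are hypothetical and should be adjusted to the server's real workload.

```python
def suggested_max_server_memory_mb(physical_mb: int,
                                   reserve_fraction: float = 0.10,
                                   min_reserve_mb: int = 4096) -> int:
    """Reserve max(reserve_fraction of RAM, min_reserve_mb) for the OS; return the rest in MB."""
    reserve = max(int(physical_mb * reserve_fraction), min_reserve_mb)
    return max(physical_mb - reserve, 0)


def sp_configure_script(max_mb: int) -> str:
    """T-SQL that applies the value, following the same sp_configure pattern as the repair."""
    return "\n".join([
        "EXEC sp_configure 'show advanced options', 1; RECONFIGURE;",
        f"EXEC sp_configure 'max server memory', {max_mb}; RECONFIGURE;",
        "EXEC sp_configure 'show advanced options', 0; RECONFIGURE;",
    ])


print(sp_configure_script(suggested_max_server_memory_mb(130959)))
```

With the 130959 MB reported in this server's log, the heuristic yields 117864 MB; treat that as an opening value to watch, not a recommendation.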




Reposted from blog.csdn.net/qing7416/article/details/77165115