SQL SERVER System Information Collection and Analysis

1.      Windows Event Log
a.      System log – sql server service start and close, operating system events, hardware changes etc.
b.      Application log
c.      Security log – for authentication errors
2.      SQL Server Errorlog
a.      Where? –one way is to check the –e in startup parameters in configuration manager
b.      7 files by default
c.      What are not reported in errorlog?
                                                    i.     Blocking
                                                   ii.     Performance problem due to poor table and statement design
                                                  iii.     Errors at the OS level
3.      Performance Monitor
a.      Perfmon.exe
b.      Live counters
c.      Limited time window at one time (1m4 40s by default), but you can set start and end time
d.      Typical use case: start it just before the problems, then stop.
e.      The default 15s interval is generally fine
4.      SQL Trace
a.      Many event classes
                                                    i.     RPC:Starting and RPC:Completed – an application such as ADO runs a sp, we will see an RPC event.
                                                   ii.     SP:Starting and SP:Completed – if use a sp in SQL batch
                                                  iii.     SP:StmtStarting and SP:StmtCompleted – track the statements inside a complex sp, used when the sp has many commands inside.
                                                  iv.     Choose the option - SQL Processing trace data (if not, Profiler – client side trace will be used). A better way is to use SQL trace scripts, which can be generated from an exported trace.
b.      Analyze Trace Results
                                                    i.     Read/Write – number of times, not page or KB; Logical, not physical
                                                   ii.     fn_trace_gettable(FILENAME,number_files) -- Filename cannot be Chinese
1.      can import multiple files into a table
2.      easy to query, even with index
USE [tempdb] --一定要指定tempdb数据库
 GO
IF OBJECT_ID('#temp_trc') IS NOT  NULL
    DROP TABLE #temp_trc
GO

--不需要事先存在#temp_trc
SELECT  *
INTO    #temp_trc
FROM    fn_trace_gettable('D:\1.trc', 1)  --trace文件名字不能是中文!!!!!!!!!
WHERE   [EventClass] IN ( 10, 12 )
 GO

SELECT  *
FROM    [#temp_trc]
                                                  iii.     ReadTrace (now Replay Markup Language (RML) Utilities for SQL Server) at http://support.microsoft.com/kb/944837

--自动化分析SQL Trace
 --每次都要手写查询语句,来分析trace文件里的记录,有很多重复劳动,比较麻烦
 --微软技术支持部门为了节省大家的时间,开发了一个叫ReadTrace的工具
 --可以自动帮你分析抓来的语句。这个工具可以在下面这个文章里的的连接下载

 --http://support.microsoft.com/kb/944837

 --安装完毕后,在C:\Program Files\Microsoft Corporation\RMLUtils
 --会有一个小工具叫ReadTrace
 --这个是cmd命令行工具,要在cmd下运行

 --例如要分析在D:\TraceAnalysis里的trace文件,输出也产生在D:\TraceAnalysis下,命令可以是:
 --注意参数是-f  不能是/f
 --参数区分大小写
 --参数后面要紧挨着参数值,路径不能有双引号:-oD:\TraceAnalysis

 >"C:\Program Files\Microsoft Corporation\RMLUtils\ReadTrace.exe" -ID:\TraceAnalysis\1.trc  -oD:\TraceAnalysis -f


 --如果ReadTrace正常执行,会将Trace文件的记录自动导入SQL里的一个叫PerfAnalysis的数据库里
 --然后再作出计算。在生成目录下会产生ReadTrace.log文件

 --生成的报表,可以选择RML Utilities for SQLSERVER ->Reporter来访问

 --注意,要确保当前你的数据库有PerfAnalysis 数据库!!!!!!!!!!!
5.      DMVs
a.      Why need them?
                                                    i.     Provide many that SQL trace cannot capture
                                                   ii.     Low overhead
                                                  iii.     If we have no ideas about the problem, we can use DMVs first, then use SQL Trace after problem becomes clearer.

b.      Which ones to use?
                                                    i.     To capture live data
--DBA比较有帮助的跟踪指令有下面这些:
 SELECT * FROM sys.[sysprocesses]
 --更精确地,是下面几张视图的联合结果
 SELECT * FROM sys.[dm_exec_sessions]
 SELECT * FROM sys.[dm_exec_requests]
 SELECT * FROM sys.[dm_os_tasks]
 SELECT * FROM sys.[dm_exec_connections]

 --SQL 各种等待状态出现的次数的统计
 DBCC SQLPERF(waitstats)

 --每个调度进程里的队列以及他们的状态信息Scheduler ID
 --如果线程的数目远小于设置的最大数,那就要考虑是不是memtoleave有压力了
 --由于线程使用的是memtoleave的内存,确认SQL还有足够的memtoleave
 DBCC SQLPERF(umsstats)--User-Mode Scheduling: internal thread scheduler in SQL
  
 --内存分配信息
 SELECT  type ,
         SUM(virtual_memory_reserved_kb) AS [vm reserved] ,
         SUM(virtual_memory_committed_kb) AS [vm commited] ,
         SUM(awe_allocated_kb) AS [awe allocated] ,
         SUM(shared_memory_reserved_kb) AS [sm reserved] ,
         SUM(shared_memory_committed_kb) AS [sm committed] ,
         SUM(single_pages_kb) AS [singlepage allocator],
         SUM(multi_pages_kb) AS [multi page allocated]
 FROM    sys.dm_os_memory_clerks
 GROUP BY type
 ORDER BY type

--An example of 跟踪系统性能脚本:
--Save the script to c:\users\xxx\desktop\queryprocess.sql

 SET NOCOUNT ON
 GO
 USE master
 GO

 DECLARE @i INT
 DECLARE @loops INT
 DECLARE @delayStr VARCHAR(20)

 --根据需求,可以调整@loops@delayStr的值
 SET @loops = 1
 SET @delayStr = '00:0:10'  --每秒执行一次脚本

 DBCC SQLPERF(waitstats,CLEAR)
 DBCC SQLPERF(umsstats,CLEAR)

 SET @i = 0
 WHILE @i < @loops
     BEGIN
         WAITFOR DELAY @delayStr
         PRINT 'START TIME'
         SELECT  CURRENT_TIMESTAMP
         PRINT 'END TIME'
         PRINT ''
     
         PRINT 'START SYSPROC'
         SELECT  *
         FROM    [sys].[sysprocesses]
         PRINT 'END SYSPROC'
         PRINT ''
    
         PRINT 'START WAITSTATS'
         DBCC SQLPERF(waitstats)
         PRINT 'END WAITSTATS'
         PRINT ''
    
         PRINT 'START SYSSCHED'
         DBCC SQLPERF(umsstats)
         PRINT 'END SYSSCHED'
         PRINT ''
         --     ...
         --根据需求加入其他监视语句
         SELECT  @i = @i + 1
     END
    
Run the following osql or sqlcmd:
osql /E /S /i c:\users\charlie\desktop\queryprocess.sql  /o c:\users\charlie\desktop\queryprocess.out /w 3000

                                                   ii.     Download more sample scripts from the SQL Nexus

What is SQL Nexus?
SQL Nexus is a tool that helps you identify the root cause of SQL Server performance issues. It loads and analyzes performance data collected by SQLDiag and PSSDiag. It can dramatically reduce the amount of time you spend manually analyzing data. Visit Getting Started page.

SQLdiag is a successor of PSSdiag. SQL Nexus is a tool used to load and analyze performance data collected by SQLdiag

What is SQLDiag?
In SQL Server 2005, the SQLdiag utility is a general purpose diagnostics collection utility that can be run as a console application or as a service. For more information about sqldiag, please refer to SQL Server books online documentation:http://msdn2.microsoft.com/en-us/library/ms162833.aspx.

Thought default templates are available, shipped products don't have a configuration user interface that would allow you to customize what profiler trace events or perfmon counters to collect. If you want to customize data collection, please use 
pssdiag/sqldiag manager
 from codeplex.

What is PSSDIAG?
PSSDIAG is a diagnostics utility used to collect profiler trace, perfmon data for SQL Server. PSSDAIG was created to troubleshoot SQL Server 7.0 and 2000 issues. It was evolved and shipped with SQL Server 2005 and beyond. But the shipped product only has a data collector engine called sqldiag.exe. Microsoft Product support continues to maintain a configuration tool that allows engineers to dynamically configure on what to collect. Due to customers' demand, this configuration tool has released under codeplex. please go toPssdiag/SqlDiag Manager project

If you troubleshoot SQL Server 2000 performance problems, you can continue to use an older version of pssdiag (
http://support.microsoft.com/kb/830232)

Steps 1-2-3

1.     Install SQL Nexus and RML Utilities
2.      Collect perf data from your SQL Server 2005/2008/2008R2/2012 instance using SQLDiag
3.      Import and analyze the data using SQL Nexus

What is RML?
The Microsoft SQL Server support team uses several internally written utilities to ease the work that is related to a typical customer support case. This article describes one utility suite that is called the Replay Markup Language (RML) Utilities for Microsoft SQL Server. Database developers and system administrators can use the RML Utilities for SQL Server to work with SQL Server 2000, SQL Server 2005, SQL Server 2008, SQL Server 2008 R2, SQL Server 2012 and SQL Server 2014 CTP2.

                                                  iii.     Some useful DMVs for cumulative statistics
--关于现在SQL缓存有执行计划的指令,他们的执行历史记录,包括CPU,READS,WRITES,编译重编译等
 --从这个视图可以了解语句执行历史
 SELECT * FROM sys.[dm_exec_query_stats]

 --表格以及其索引的查询,修改情况,以及上面发生过的I/O,阻塞等统计信息。从这个视图
 --可以了解数据库到底哪些表最繁忙
 SELECT * FROM sys.[dm_db_index_operational_stats](DB_ID('AdventureWorks'),NULL,NULL,NULL)
 SELECT * FROM sys.[dm_db_index_usage_stats] WHERE [database_id]=DB_ID('AdventureWorks')

 --以文件为单位,统计各个文件上发生过的I/O及其I/O等待。可以知道在SQL层面,哪个数据文件最繁忙
 SELECT * FROM sys.[dm_io_virtual_file_stats](DB_ID('AdventureWorks'),NULL)

 --可以计算出tempdb的使用情况。
 SELECT * FROM sys.[dm_db_file_space_usage] WHERE [database_id]=DB_ID('AdventureWorks')
 SELECT * FROM sys.[dm_db_session_space_usage] WHERE [database_id]=DB_ID('AdventureWorks')
 SELECT * FROM sys.[dm_db_task_space_usage] WHERE [database_id]=DB_ID('AdventureWorks')

 --内存里数据页面的缓存情况
 SELECT * FROM sys.[dm_os_buffer_descriptors] WHERE [database_id]=DB_ID('AdventureWorks')

                                                  iv.     More scripts from others
                                                   v.     Scripts from the author

6.      SQLDiag – Automatic Data Collection
有个微软内部工具,叫SQLDiag/PSSDiag,可以自动收集很多信息。

SQLDiag可以收集的信息有:
 ()Windows事件日志
 ()SQLSERVER ErrorLog,以及SQL配置信息,一些重要运行信息
 ()SQL曾经产生的DUMP文件
 ()服务器系统配置信息
 ()同时包含有系统和SQL性能计数器的性能日志
 ()服务器端Trace

 而且它可以帮助DBA自动开服务器端Trace,比用Profiler要安全多了

 SQLDiag工具默认安装在:
 C:\Program Files\Microsoft SQL Server\90\Tools\Binn
 需要在命令行窗口运行。如果一台服务器从来没有运行过这个工具,可以
 在命令行直接运行\SQLdiag.exe,工具会使用默认的配置,收集系统信息

 当出现SQLDIAG Collection started.  Press Ctrl+C to stop. 信息以后就
 Crtl+C,终止这个工具的执行。SQLDIAG 工具会在Binn目录下产生三个
 XML文件,并且产生一个SQLDIAG子目录,存放刚才收集的信息


 使用默认配置(即Binn目录下的SQLDiag.xml)只会收到:
 (1)SQL errorlog文件内容,以及SQL配置信息和一些重要运行信息(XXXX_sp_sqldiag_Shutdown.out
 (2)SQL曾经产生的dump记录(XXXX_SQLDUMPER_ERRORLOG.log
 (3)服务器系统配置信息(XXXX_MSINFO32.txt
 (4)SQL默认开启的Default Trace文件(log_XXX.trc

 但是使用另外两个XML配置文件SD_General.xml SD_Detailed.xml,就能够收集到
 SQL Trace和性能日志。但是这样,产生的文件就会比较大,放在SQL安装目录下
 就不再合适。需要在SQLDiag里指定输出文件路径。

 使用的指令是:
 SQLdiag.exe /I <configure_file> /O <output_directory>
 例如:
 SQLdiag.exe /I SD_General.XML /O D:\SQLDIAOUT

  运行这句话之后,就能看到SQLdiag开启PerfmonAddingPerfmon counters
 TraceStarting Profiler Trace)。等SQLdiag成功开启以后,DBA
 就可以开始尝试重现自己的问题。当问题跟踪下来以后,再按Ctrl+C
 工具会停止日志收集。有些信息是在工具关闭前收集的,所以
 可能要等一会才能完全停止

  而在指定的D:\SQLDIAOUT目录,可以看到以下更多信息
 ()SQL Trace文件(XXXX_sp_trace.trc
 ()Windows事件日志(XXXX_applog_Shutdown.txt,XXXX_seclog_Shutdown.txt,XXXX_syslog_Shutdown.txt
 ()Windows性能日志(SQLdiag.blg)用性能监视器也打不开这个文件,要使用系统自带的小工具relog /? cmd下运行的

要将名为SQLDIAG.BLG sqldiag性能日志文件转换为名为sqldiag.csv.csv 日志文件,使用命令:

 relog D:\SQLDIAOUT\SQLDIAG.BLG -f CSV -o D:\sqldiag.csv

 两个XML配置文件SD_General.xml SD_Detailed.xml的主要区别是SD_Detailed.xml
 会在SQL Trace里收集更多的事件,所以输出会要大很多。DBA可以根据需要选择其中一个

 SQLdiag的唯一缺点,是不会定期查询系统管理视图,所以建议的自动化信息收集方式
 SQLdiag(使用SD_General.xml SD_Detailed.xml做配置文件),再加上
 PerfStatsScript.sql的监视脚本的输出。用这两个手段,就能够比较全面地收集
 系统信息了。

PerfStatsScript2005 - Mainly Current Data
PerfStatsScript2008
PerfStatsScript2008R2 (the same as 2008)
PerfStatsScript2012
PerfStatsScript2014

PerfStatsScript:说明:建立两个存储过程,
    1)sp_perf_stats09:–列出最耗时的请求及找到阻塞源头,#tmp_requests:查询当前活动的请求,任务,等基本信息–#tmp_requests2:据#tmp_requests查询当前的活动事务,阻塞的进程
    2)sp_perf_stats_infrequent09:收集sqlserver的性能计数器,如内存管理,CLR等sys.dm_os_performance_counters,CPU运行情况,I/O情况
    每10分钟执行一次sp_perf_stats09,每小时执行一次sp_perf_stats_infrequent09
7.      Performance Dashboard – for Customer Reports in SQL Server
a.      SQL Server 2005
b.      SQL Server 2008 and R2
c.      SQL Server 2012