1. Windows Event Log
a.
System log – sql server service start and close,
operating system events, hardware changes etc.
b.
Application log
c.
Security log – for authentication errors
2. SQL Server Errorlog
a.
Where? –one way is to check the –e in startup
parameters in configuration manager
b.
7 files by default
c.
What are not reported in errorlog?
i. Blocking
ii. Performance
problem due to poor table and statement design
iii. Errors
at the OS level
3. Performance Monitor
a.
Perfmon.exe
b.
Live counters
c.
Limited time window at one time (1m4 40s by
default), but you can set start and end time
d.
Typical use case: start it just before the
problems, then stop.
e.
The default 15s interval is generally fine
4. SQL Trace
a.
Many event classes
i. RPC:Starting
and RPC:Completed – an application such as ADO runs a sp, we will see an RPC
event.
ii. SP:Starting
and SP:Completed – if use a sp in SQL batch
iii. SP:StmtStarting
and SP:StmtCompleted – track the statements inside a complex sp, used when the
sp has many commands inside.
iv. Choose
the option - SQL Processing trace data (if not, Profiler – client side trace
will be used). A better way is to use SQL trace scripts, which can be generated
from an exported trace.
b.
Analyze Trace Results
i. Read/Write
– number of times, not page or KB; Logical, not physical
ii. fn_trace_gettable(FILENAME,number_files) -- Filename cannot be
Chinese
1.
can import multiple files into a table
2.
easy to query, even with index
USE [tempdb] --一定要指定tempdb数据库
GO
IF OBJECT_ID('#temp_trc') IS NOT NULL
DROP TABLE #temp_trc
GO
--不需要事先存在#temp_trc表
SELECT
*
INTO
#temp_trc
FROM
fn_trace_gettable('D:\1.trc', 1) --trace文件名字不能是中文!!!!!!!!!
WHERE
[EventClass] IN (
10, 12 )
GO
SELECT
*
FROM [#temp_trc]
iii. ReadTrace
(now Replay Markup Language (RML) Utilities for SQL Server) at http://support.microsoft.com/kb/944837
--自动化分析SQL Trace
--每次都要手写查询语句,来分析trace文件里的记录,有很多重复劳动,比较麻烦
--微软技术支持部门为了节省大家的时间,开发了一个叫ReadTrace的工具
--可以自动帮你分析抓来的语句。这个工具可以在下面这个文章里的的连接下载
--http://support.microsoft.com/kb/944837
--安装完毕后,在C:\Program Files\Microsoft Corporation\RMLUtils下
--会有一个小工具叫ReadTrace
--这个是cmd命令行工具,要在cmd下运行
--例如要分析在D:\TraceAnalysis里的trace文件,输出也产生在D:\TraceAnalysis下,命令可以是:
--注意参数是-f 不能是/f
--参数区分大小写
--参数后面要紧挨着参数值,路径不能有双引号:-oD:\TraceAnalysis
>"C:\Program Files\Microsoft
Corporation\RMLUtils\ReadTrace.exe" -ID:\TraceAnalysis\1.trc
-oD:\TraceAnalysis
-f
--如果ReadTrace正常执行,会将Trace文件的记录自动导入SQL里的一个叫PerfAnalysis的数据库里
--然后再作出计算。在生成目录下会产生ReadTrace.log文件
--生成的报表,可以选择RML Utilities for SQLSERVER ->Reporter来访问
--注意,要确保当前你的数据库有PerfAnalysis
数据库!!!!!!!!!!!
5. DMVs
a.
Why need them?
i. Provide
many that SQL trace cannot capture
ii. Low
overhead
iii. If
we have no ideas about the problem, we can use DMVs first, then use SQL Trace
after problem becomes clearer.
b.
Which ones to use?
i. To
capture live data
--对DBA比较有帮助的跟踪指令有下面这些:
SELECT * FROM sys.[sysprocesses]
--更精确地,是下面几张视图的联合结果
SELECT * FROM sys.[dm_exec_sessions]
SELECT * FROM sys.[dm_exec_requests]
SELECT * FROM sys.[dm_os_tasks]
SELECT * FROM sys.[dm_exec_connections]
--SQL 各种等待状态出现的次数的统计
DBCC SQLPERF(waitstats)
--每个调度进程里的队列以及他们的状态信息Scheduler ID
--如果线程的数目远小于设置的最大数,那就要考虑是不是memtoleave有压力了
--由于线程使用的是memtoleave的内存,确认SQL还有足够的memtoleave
DBCC SQLPERF(umsstats)--User-Mode Scheduling:
internal thread scheduler in SQL
--内存分配信息
SELECT type ,
SUM(virtual_memory_reserved_kb) AS [vm reserved] ,
SUM(virtual_memory_committed_kb) AS [vm commited] ,
SUM(awe_allocated_kb) AS [awe allocated] ,
SUM(shared_memory_reserved_kb) AS [sm reserved] ,
SUM(shared_memory_committed_kb) AS [sm committed] ,
SUM(single_pages_kb) AS [singlepage
allocator],
SUM(multi_pages_kb) AS [multi page
allocated]
FROM sys.dm_os_memory_clerks
GROUP BY type
ORDER BY type
--An example of 跟踪系统性能脚本:
--Save the script to c:\users\xxx\desktop\queryprocess.sql
SET NOCOUNT ON
GO
USE master
GO
DECLARE @i INT
DECLARE @loops INT
DECLARE @delayStr VARCHAR(20)
--根据需求,可以调整@loops和@delayStr的值
SET @loops = 1
SET @delayStr = '00:0:10' --每秒执行一次脚本
DBCC SQLPERF(waitstats,CLEAR)
DBCC SQLPERF(umsstats,CLEAR)
SET @i = 0
WHILE @i < @loops
BEGIN
WAITFOR DELAY
@delayStr
PRINT 'START TIME'
SELECT CURRENT_TIMESTAMP
PRINT 'END TIME'
PRINT ''
PRINT 'START SYSPROC'
SELECT *
FROM [sys].[sysprocesses]
PRINT 'END SYSPROC'
PRINT ''
PRINT 'START WAITSTATS'
DBCC SQLPERF(waitstats)
PRINT 'END WAITSTATS'
PRINT ''
PRINT 'START SYSSCHED'
DBCC SQLPERF(umsstats)
PRINT 'END SYSSCHED'
PRINT ''
-- ...
--根据需求加入其他监视语句
SELECT @i = @i + 1
END
Run the following osql or sqlcmd:
osql /E /S /i c:\users\charlie\desktop\queryprocess.sql /o c:\users\charlie\desktop\queryprocess.out /w 3000
ii. Download
more sample scripts from the SQL Nexus
What is SQL Nexus?
SQL Nexus is a tool
that helps you identify the root cause of SQL Server performance issues. It
loads and analyzes performance data collected by SQLDiag and PSSDiag. It can dramatically reduce the amount of time
you spend manually analyzing data. Visit Getting Started page.
SQLdiag is a successor of PSSdiag. SQL Nexus is
a tool used to load and analyze performance data collected by SQLdiag
What is SQLDiag?
In SQL Server 2005, the SQLdiag utility is a general purpose
diagnostics collection utility that can be run as a console application or as a
service. For more information about sqldiag, please refer to SQL Server books
online documentation:http://msdn2.microsoft.com/en-us/library/ms162833.aspx.
Thought default templates are available, shipped products don't have a configuration user interface that would allow you to customize what profiler trace events or perfmon counters to collect. If you want to customize data collection, please use pssdiag/sqldiag manager from codeplex.
What is PSSDIAG?
PSSDIAG is a
diagnostics utility used to collect profiler trace, perfmon data for SQL
Server. PSSDAIG was created to troubleshoot SQL Server 7.0 and 2000 issues. It
was evolved and shipped with SQL Server 2005 and beyond. But the shipped
product only has a data collector engine called sqldiag.exe. Microsoft Product
support continues to maintain a configuration tool that allows engineers to
dynamically configure on what to collect. Due to customers' demand, this
configuration tool has released under codeplex. please go toPssdiag/SqlDiag Manager project
If you troubleshoot SQL Server 2000 performance problems, you can continue to use an older version of pssdiag (http://support.microsoft.com/kb/830232)
If you troubleshoot SQL Server 2000 performance problems, you can continue to use an older version of pssdiag (http://support.microsoft.com/kb/830232)
Steps 1-2-3
1. Install SQL Nexus and RML Utilities
3.
Import and analyze the data
using SQL Nexus
What is RML?
The Microsoft SQL Server support team uses
several internally written utilities to ease the work that is related to a
typical customer support case. This article describes one utility suite that is
called the Replay Markup Language (RML) Utilities for Microsoft SQL Server.
Database developers and system administrators can use the RML Utilities for SQL
Server to work with SQL Server 2000, SQL Server 2005, SQL Server 2008, SQL
Server 2008 R2, SQL Server 2012 and SQL Server 2014 CTP2.
iii. Some
useful DMVs for cumulative statistics
--关于现在SQL缓存有执行计划的指令,他们的执行历史记录,包括CPU,READS,WRITES,编译重编译等
--从这个视图可以了解语句执行历史
SELECT * FROM sys.[dm_exec_query_stats]
--表格以及其索引的查询,修改情况,以及上面发生过的I/O,阻塞等统计信息。从这个视图
--可以了解数据库到底哪些表最繁忙
SELECT * FROM sys.[dm_db_index_operational_stats](DB_ID('AdventureWorks'),NULL,NULL,NULL)
SELECT * FROM sys.[dm_db_index_usage_stats]
WHERE [database_id]=DB_ID('AdventureWorks')
--以文件为单位,统计各个文件上发生过的I/O及其I/O等待。可以知道在SQL层面,哪个数据文件最繁忙
SELECT * FROM sys.[dm_io_virtual_file_stats](DB_ID('AdventureWorks'),NULL)
--可以计算出tempdb的使用情况。
SELECT * FROM sys.[dm_db_file_space_usage]
WHERE [database_id]=DB_ID('AdventureWorks')
SELECT * FROM sys.[dm_db_session_space_usage]
WHERE [database_id]=DB_ID('AdventureWorks')
SELECT * FROM sys.[dm_db_task_space_usage]
WHERE [database_id]=DB_ID('AdventureWorks')
--内存里数据页面的缓存情况
SELECT * FROM sys.[dm_os_buffer_descriptors]
WHERE [database_id]=DB_ID('AdventureWorks')
iv. More
scripts from others
v. Scripts
from the author
6. SQLDiag – Automatic Data Collection
有个微软内部工具,叫SQLDiag/PSSDiag,可以自动收集很多信息。
SQLDiag可以收集的信息有:
()Windows事件日志
()SQLSERVER
ErrorLog,以及SQL配置信息,一些重要运行信息
()SQL曾经产生的DUMP文件
()服务器系统配置信息
()同时包含有系统和SQL性能计数器的性能日志
()服务器端Trace
而且它可以帮助DBA自动开服务器端Trace,比用Profiler要安全多了
SQLDiag工具默认安装在:
C:\Program Files\Microsoft SQL
Server\90\Tools\Binn
需要在命令行窗口运行。如果一台服务器从来没有运行过这个工具,可以
在命令行直接运行\SQLdiag.exe,工具会使用默认的配置,收集系统信息
当出现SQLDIAG Collection started. Press Ctrl+C to stop. 信息以后就
按Crtl+C,终止这个工具的执行。SQLDIAG 工具会在Binn目录下产生三个
XML文件,并且产生一个SQLDIAG子目录,存放刚才收集的信息
使用默认配置(即Binn目录下的SQLDiag.xml)只会收到:
(1)SQL errorlog文件内容,以及SQL配置信息和一些重要运行信息(XXXX_sp_sqldiag_Shutdown.out)
(2)SQL曾经产生的dump记录(XXXX_SQLDUMPER_ERRORLOG.log)
(3)服务器系统配置信息(XXXX_MSINFO32.txt)
(4)SQL默认开启的Default Trace文件(log_XXX.trc)
但是使用另外两个XML配置文件SD_General.xml 和SD_Detailed.xml,就能够收集到
SQL Trace和性能日志。但是这样,产生的文件就会比较大,放在SQL安装目录下
就不再合适。需要在SQLDiag里指定输出文件路径。
使用的指令是:
SQLdiag.exe /I <configure_file>
/O <output_directory>
例如:
SQLdiag.exe /I SD_General.XML /O D:\SQLDIAOUT
运行这句话之后,就能看到SQLdiag开启Perfmon(AddingPerfmon
counters)
和Trace(Starting
Profiler Trace)。等SQLdiag成功开启以后,DBA
就可以开始尝试重现自己的问题。当问题跟踪下来以后,再按Ctrl+C
工具会停止日志收集。有些信息是在工具关闭前收集的,所以
可能要等一会才能完全停止
而在指定的D:\SQLDIAOUT目录,可以看到以下更多信息
()SQL Trace文件(XXXX_sp_trace.trc)
()Windows事件日志(XXXX_applog_Shutdown.txt,XXXX_seclog_Shutdown.txt,XXXX_syslog_Shutdown.txt)
()Windows性能日志(SQLdiag.blg)用性能监视器也打不开这个文件,要使用系统自带的小工具relog /? 在cmd下运行的
要将名为SQLDIAG.BLG 的sqldiag性能日志文件转换为名为sqldiag.csv的.csv 日志文件,使用命令:
relog D:\SQLDIAOUT\SQLDIAG.BLG
-f CSV -o D:\sqldiag.csv
两个XML配置文件SD_General.xml 和SD_Detailed.xml的主要区别是SD_Detailed.xml
会在SQL Trace里收集更多的事件,所以输出会要大很多。DBA可以根据需要选择其中一个
SQLdiag的唯一缺点,是不会定期查询系统管理视图,所以建议的自动化信息收集方式
是SQLdiag(使用SD_General.xml 和SD_Detailed.xml做配置文件),再加上
PerfStatsScript.sql的监视脚本的输出。用这两个手段,就能够比较全面地收集
系统信息了。
PerfStatsScript2005 - Mainly Current Data
PerfStatsScript2008
PerfStatsScript2008R2 (the same as 2008)
PerfStatsScript2012
PerfStatsScript2014
PerfStatsScript:说明:建立两个存储过程,
1)sp_perf_stats09:–列出最耗时的请求及找到阻塞源头,#tmp_requests:查询当前活动的请求,任务,等基本信息–#tmp_requests2:据#tmp_requests查询当前的活动事务,阻塞的进程
2)sp_perf_stats_infrequent09:收集sqlserver的性能计数器,如内存管理,CLR等sys.dm_os_performance_counters,CPU运行情况,I/O情况
每10分钟执行一次sp_perf_stats09,每小时执行一次sp_perf_stats_infrequent09
PerfStatsScript2005 - Mainly Current Data
PerfStatsScript2008
PerfStatsScript2008R2 (the same as 2008)
PerfStatsScript2012
PerfStatsScript2014
PerfStatsScript:说明:建立两个存储过程,
1)sp_perf_stats09:–列出最耗时的请求及找到阻塞源头,#tmp_requests:查询当前活动的请求,任务,等基本信息–#tmp_requests2:据#tmp_requests查询当前的活动事务,阻塞的进程
2)sp_perf_stats_infrequent09:收集sqlserver的性能计数器,如内存管理,CLR等sys.dm_os_performance_counters,CPU运行情况,I/O情况
每10分钟执行一次sp_perf_stats09,每小时执行一次sp_perf_stats_infrequent09
7. Performance Dashboard – for Customer Reports in SQL Server
a.
SQL Server 2005
http://www.mssqltips.com/sqlservertip/1553/installing-sql-server-2005-performance-dashboard-reports/
b.
SQL Server 2008 and R2
c.
SQL Server 2012