第6章 连接的建立和问题排查 229
6.1. 协议的选择与别名 230
·
Shard Memory: 本地访问会使用的,一般用不到
·
TCP/IP – 优先考虑之
·
Named Pipes:命名管道不是基于网络协议的,而是为局域网开发的一种协议,它使用IPC$共享来无缝和透明地传输数据和用户认证的上下文,在访问IPC$共享的时候先要通过Windows认证,这也是命名管道的好处之一。
6.1.1. 服务器网络配置 231
·
设置
o
SQL Server
Configuration Manager|SQL Server Network Configuration
o
注册表
·
查看开启否 – SQL Server ErrorLog
6.1.2. SQL Server browser的作用 233
·
Background:
o
In SQL Server 2000, SQL
Server Resolution Protocol (SSRP) was developed to listen on UDP port 1434,
which responds to client requests with the names of the installed instances,
and the ports or named pipes used by the instance.
o
SQL Server 2005 introduced
the SQL Server Browser service as a replacement for SSRP.
·
How SQL Server Browser is better than SSRP? – use minimum
Account Privileges
·
Purpose – SQL Server Browser
listens for incoming requests for Microsoft SQL Server resources and
provides information about SQL Server instances installed on the computer.
·
How SQL Server Browser Works?
o
Upon startup, SQL Server
Browser starts and claims UDP port 1434. SQL Server Browser reads the registry,
identifies all instances of SQL Server on the computer, and notes the ports and
named pipes that they use. When a server has two or more network cards, SQL
Server Browser returns the first enabled port it encounters for SQL Server. SQL
Server Browser support ipv6 and ipv4.
o
When SQL Server clients
request SQL Server resources, the client network library sends a UDP message to
the server using port 1434. SQL Server Browser responds with the TCP/IP port or
named pipe of the requested instance. The network library on the client
application then completes the connection by sending a request to the server
using the port or named pipe of the desired instance.
·
What if SQL Server is not running?
o
If you provide the correct
port number or named pipe, you are still able to connect to SQL Server.
·
Connection Problems related to SLQ Server Browser?
o
SQL Server Browser itself rarely causes problems.
o
If you see “SQL Server
doesn’t exist or access denied”, the UDP 1434 on the server may be prohibited.
You need to open the port on the firewall and gateway. You may also try a
different port or piped name to see if SQL Server can be connected.
o
You also need to make sure
that the service account for SQL Server Browser has the permission to Read
and write the SQL Server registry keys related to network communication (ports
and pipes) (HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.X\MSSQLServer\SuperSocketNetLib)
6.1.3. 客户端网络配置 235
·
客户端应用程序都是通过加载SQL Server的数据驱动控件来连接SQL Server的。
·
4种常见的数据驱动是:
o
MDAC
ü
传统的ODBC和OLE DB接口
ü
老的VB,VC,或COM应用程序还在用
ü
需要运行cliconfg.exe来配置网络协议,也可直接修改注册表。
o
SQL Server Native
Client
ü
SQL Server 2005+
ü
SQL Server OLEDB and SQL
Server ODBC use One DLL
ü
如果装了客户端,在SQL Server配置管理器中对Native Client进行配置,如没装,直接修改注册表。
o
SQLClient
ü
For managed applications。目前一般使用ADO.NET来连接SQL Server。
ü
SQLClient没有配置工具,按照Shared Memory->TCP/IP->Named Pipes顺序来连接.
o
Microsoft JDBC Driver
for SQL Server
ü
For JAVA applications
connecting to SQL Server。
ü
没有专门的网络配置界面。
6.1.4. 客户端网络连接选择机制 239
如果客户端开启了多个网络协议,选择的顺序是:Connection String, Alias, LastConnect, and SQL Server Browser.
·
Specified in the connection
string
o
Option 1(使用了Server关键词)
Server = [protocol:] Server [,
port], where protocol can be tcp (TCP/IP), lpc (shared memory), or np (named pipes)
Example 1: Server=np:myServer\myInstance
Example 2: if you specify the connection server is
np:server_name in SSMS, then SSMS will only use this named pipe for the
connection.
o
Option 2(使用了Network关键词): Network = dbmssocn (for TCP/IP) or dbnmpntw (for Named Pipe)
·
Alias – 连接有别名, 就用别名。如果使用别名连不上就报错。
·
The “LastConnect” value in
the registry – if not successful, will try the methods below
·
If no protocol is specified
in method 1, server name does not appear in the alias, and no ‘LastConnect”, then
the client chooses the protocol based on the precedence of protocols, SQL
Server Browser will acquire the port or piped name dynamically.
6.2. 连接失败检测步骤——命名管道 241
命名管道是Windows系统中进程间通信机制(Inter-Process
Communications)的一种。它是通过共享ICP$ 来进行通讯。
6.2.1. SQL Server命名管道工作原理 242
o
Server, Pipe, Path
o
Local server can be
replaced with ‘.’
o
Examples:
\\.\pipe\sql\query for the default instance or \\.\Pipe\MSSQL$instancename\sql\query
for a named instance.
·
配置或查看命名管道
– SQL Server Configuration Manger(需重启生效)
·
启动否? - 看ErrorLog
6.2.2. 客户端的命名管道配置 243
·
MDAC数据库接口 (run cliconfig.exe)
·
SQL Server
Configuration Manager | SQL Server Native Client
·
When SQL Server监听的命名管道或IP的address, port is NOT the default one anymore, you can use alias
in either tool above. However, the alias should be set up for all clients, not
on the server.
6.2.3. 命名管道连接问题的解决步骤 245
·
Use SQL Server Client Network
Utility or SQL Server Configuration
Manager|SQL Server Native Client to
check the setup of the named pipe and make sure SQL Server monitors it
·
客户端连接的命名管道与SQL Server监听的要一致,服务器别名没有错误。
·
Make sure the network
connection is good. You can PING both the IP and the name for the SQL Server
·
Make sure the client can pass
through the Windows authentication (under DOS prompt):
·
Make sure the client can
login into SQL Server.
6.2.4. 一些常见的连接问题 246
·
Example 1 - [named pipes] SQL
Server does not exist or access denied
o
Cause: the client has not
found the named pipe.
o
Solution: (1) SQL Server is
started, (2) the setup of named pipe matches on the server and client, and (3)
network connection is through (PING),
·
Example 2 - Login failed for
user ‘NULL’ or Login failed for
user anonymous
o
Cause: 这个错误意味着网络连通性没问题,但命名管道访问服务器有权限上的问题。没有权限访问IPC$就无法使用命名管道。另一种情况是你使用了本地账号登录client机器,又尝试用这个权限不足的本地账号去登录SQL Server服务器。
o
Solution: (1) run Net use \\servername \IPC$ to check, (2) Use a domain account to log in again the client machine,
while ensuring that the domain account has been added to the inside of the SQL Server login.
==========http://www.databaseskill.com/1621432/==========
Connectivity issues:
Login failed for user 'NULL' or Login failed for user
anonymous
This error basically means that the network connectivity is no
problem, just use named on the pipeline have permission to access the server. Do
not forget that the IPC $ share Oh. Not have access to the IPC
$ will not be able to use named pipes. Can run "net use "command
to test.
In most cases this error is because you use the insufficient
privileges account login client machine, and you use
Windows Authentication to access SQL Server cause. For
example, you use the client machine local account login, use windows authentication way
to access SQL Server. Access client local account does not
have permission to access server resources, will inevitably lead to failure. The solution is
to use a domain account to log in again the client machine, while
ensuring that the domain account has been added to the inside of the SQL
Server login. There is another method is to use a SQL Server account instead
of the Windows account attempts to connect to SQL
Server, and uses the TCP / IP protocol. Do not forget, using SQL account,
if the agreement is named pipes, but also need Windows authentication.
==================================================
·
Example 3 - Login failed for
user ‘xxx’
o
Cause: insufficient
permission. It is not a problem of connection, instead, it is a problem of
authentication.
o
Solution: give the user
sufficient permission
·
Two Tips
o
Which protocol the connection
is using?
SELECT session_id, net_transport,client_net_address
FROM sys.dm_exec_connections
o
除SSMS外, ODBC数据源也可用来测试SQL
Server 的连接问题
6.3. 连接失败检测步骤——tcp/ip 247
6.3.1. SQL Server监听的tcp/ip端口号 248
·
查看设置: TCP/IP|Properties(右击)|看IP4。注意,这里只是设置,连接未必用TCP/IP。个人的单一机器常用的是Shared Memory。可以进一步查看:
SELECT session_id, local_tcp_port,
net_transport
FROM sys.dm_exec_connections
·
查看监听的端口:
netstat –an (DOS command)(如不用TCP/IP连接SQL Server,设置的端口如1433未必出现。出现的这些TCP/IP应该是其它程序所用。)
·
Port: 0为动态,非0为静态,default为1433(静态)。如果是静态但Port不是1433,会怎样?
o
Default instance – 在客户端使用相同的Port或使用别名
o
Named Instance – 不需要设置,客户端可以通过SQL
Server Browser连接到命名实例。
6.3.2. 客户端的tcp/ip协议配置 249
·
一般无需配置,因为默认是Enabled。
·
但也可以通过界面或注册表改变。
6.3.3. tcp/ip连接的keepalive机制 251
当客户端和服务器之间通过TCP/IP建立连接后,如果过一段时间,客户端不向服务器发出SQL Server的请求,是否会断掉?如果客户端没问题,就一直连着。如果客户端有问题,35秒后就断掉了(By default, keepalivetime为30s,keepaliveinterval为1s,windows
tcp配置默认TcpMaxDataRetransmissions为5s)。
6.3.4. 配置SQL Server的keepalive配置 252
·
Keepalivetime – 可以通过界面和注册表修改
·
tcpMaxDataRetransmissions
– 可以通过注册表修改
·
keepaliveinterval –
cannot be changed
6.3.5. 配置客户端的keepalive配置 253
·
客户端也有keepalive机制,用来控制客户端程序。
·
当当客户端和服务器之间通过TCP/IP建立连接,客户端和服务器的Keepalive都会起作用。
·
By default, keepalivetime也为30s,keepaliveinterval为1s。没有tcpMaxDataRetransmissions。
·
与服务器不同,客户端的Keepaliveinterval可修改。
·
以上2个参数值,可以通过注册表修改。对SQL Server Native Client,还可以在SQL Server配置管理器上修改。
6.3.6. tcp/ip连接问题的解决步骤 254
·
开启否?- Make sure SQL Server is listening on TCP/IP – errorlog(Enabled了就监听,但监听未必表示就用它)
·
端口一致否? - Make sure the TCP/IP port the server is listening to is the
same as that on the client machine or the alias
·
网络能否连通? - Examine the network connectivity by PING the IP address and
the name of the SQL Server
o
Make sure you can PING SQL
Server IP address
ü
How to obtain the IP address
for SQL Server?
If you're running SQL Server 2005 or
above you can use:
Select client_net_address,local_net_address
from sys.dm_exec_connections
where session_id=@@SPId;
/*
client_net_address
local_net_address
169.254.161.61
169.254.161.61
*/
Limitation: You only get IP addresses if you're connective
via TCP/IP. If you use Shared Memory then those attributes don't exist. If you turn off
Shared Memory (or any protocols except for TCP/IP) via Server Configuration
Manager you will always get IP address for any connection using this technique.
ü
How to PING?
PING
169.254.161.61
o
You also need to make sure
that the client and the server are on the same network. You can run ipconfig
/all on both the client and the server to check. Or run the above script in
SSMS.
o
Make sure you can PING SQL
Server name
PING
XU_XPS
If there is a problem, try to add the IP/name pair to the hosts
file under c:\windows\system32\drivers\etc
·
能否Telnet那个端口? - Use Telnet to check the port that SQL Server is listening on.
(Telnet was disable in Windows 8.1 by default, you need to turn it on through
Windows features)
Telnet 169.254.161.61 1433
·
有权限否? - 客户端的登录用户是否有权限登录SQL Server
·
TIPS
o
如果服务器是NUMA结构,可以通过配置多个静态端口,把NUMA的节点和端口绑定起来。即Network Connection
Affinity。
o
How to know which program is
using port 1433 or 1434(导致TCP/IP端口绑定失败)?
Download and install PortQry.exe
Run the following query:
Portqry-n localhost –p TCP –e 1433
Portqry-n localhost –p UDP –e 1434
o
Which protocol is being used
for the current connection?
SELECT session_id, net_transport,
client_net_address FROM sys.dm_exec_connections
--For SQL Server 2000, Net_Libary =
LPC (local PC or shared memory)
SELECT Net_library, hostname, program_name,
nt_domain, nt_username,loginame
FROM master..sysprocesses
WHERE spid>50
o
如何访问防火墙后面的SQL Server?
o
How to connect to SQL Server
with a specific port (强制使用)?
TCP: ServerName, 1433 (e.g., TCP:localhost, 1433)
o
最后一招 – 用Network Monitor抓网络包来分析
Microsoft
Network Monitor has been superseded by Microsoft Message Analyzer
6.4. 一般性网络错误 258
6.4.1. Client已找到SQL Server, 但在连接、传送或接受返回结果时,出现问题。
6.4.2. 可能原因:
·
服务器负荷太大
·
Application Server没用Connection
Pool
·
网络传输问题
·
Windows系统出于自身考虑,拒绝了SQL Server连接。
·
客户端的Windows层面问题
·
不恰当的系统设置如修改注册表中TCP的设定,SQL
Server层面的 priority boost
or lightweight pooling。
·
防病毒软件和防火墙误杀SQL
Server连接。
·
有些网络设备空闲了30-45分钟就关闭了
·
错误的网卡设置
6.4.3. Problem
Solving
·
Communicate with Network Administrators
·
Check SQL Server and client health
o
make sure no obvious error in windows’ error log
o
CPU usage cannot be 100% for a long time
o
No high pressure on cache and RAM
o
No serious errors in errorlog such as AV, RAM, disk, or
17883/17884
o
The blocked QL Server connections should not be over 100.
o
SQL Server should not have many runnable and running
processes
Select *
FROM master..sysprocesses
WHERE spid>50
·
Recommendations
o
Set EnableRSS in HKey to 0
o
Set DisableTaskOff to 1
o
Disable TCP Chimney
o
Set TCPA in HKey to 0
o
Restart the machine
o
Do not change priority boost and lightweight pooling
o
Do not change the default values for HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
o
Do not use teaming network card
o
Make sure the automatic closing function for the idle status
of the network equipment is disabled.
o
Temporarily disable AV and Firewall
o
Make the server and client machine physically closer if
possible
o
Change to a different protocol such as TCP/IP to a named pipe
or vice versa
o
Much more…
6.5. 利用ring buffer排查连接问题 266
6.5.1. SQLServer2008+ has a new feature - Connectivity Ring Buffer, which is used to捕捉每个由服务器发起的连接关闭记录,包含异常中断和登陆失败。
6.5.2. Ring Buffer记录3种数据:ConnectionClose,Error,LoginTimers(整个登陆过程所花的时间。最多1000条数据。
6.5.3. Connectivity Ring Buffer的优势在于可以让你在不使用Network Monitor的情况下解决棘手的问题。
6.5.4. 你可以直接分析以下的XML结果,也可以将查询结果导入到临时表中去做进一步的分析。
SELECT CAST(record AS XML) record
,CAST( record AS XML). value('(//Record/ConnectivityTraceRecord/RecordTime)[1]','datetime') recordtime
,CAST( record AS XML). value('(//Record/ConnectivityTraceRecord/RecordType)[1]','varchar(20)') Recordtype
FROM sys.dm_os_ring_buffers
WHERE ring_buffer_type ='RING_BUFFER_CONNECTIVITY'
6.6. 小结 271
在一些比较复杂的网络条件下,造成SQL
Server连接问题的罪魁祸首往往来自网络层面。