류짱:Beyond MySelf

Reference Sites for Cluster Log Analysis

リュちゃん 2012. 2. 8. 17:05

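Inquiries about analyzing Windows Server 2003 cluster logs have increased noticeably of late. This post summarizes how the cluster log is generated and lists sites worth consulting when analyzing it.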

[How to Look Up Cluster Error Messages]
Running the command below at a command prompt shows the message that corresponds to an error code.

net helpmsg [error_number]

C:\Users\Administrator>net helpmsg 32
The process cannot access the file because it is being used by another process.
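
Codes copied out of a log are sometimes written in hexadecimal, while net helpmsg takes a decimal number. The following is a minimal Python sketch of that conversion. The helper name helpmsg_command is hypothetical, and the rule that a leading "0x" marks a hexadecimal value is an assumption about how the code was written down, not something the cluster log guarantees.

```python
from typing import List


def helpmsg_command(code: str) -> List[str]:
    """Build the `net helpmsg` command line for a decimal or 0x-prefixed hex code."""
    text = code.strip().lower()
    # Assumption: a leading "0x" marks hexadecimal; anything else is decimal.
    number = int(text, 16) if text.startswith("0x") else int(text)
    if number < 0:
        raise ValueError("error codes are expected to be non-negative")
    return ["net", "helpmsg", str(number)]


if __name__ == "__main__":
    # 0x20 is decimal 32, the code used in the example above.
    print(" ".join(helpmsg_command("0x20")))
```

The returned list can be passed to subprocess.run on a Windows host, or simply printed and pasted into a command prompt.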

[Log Summary of Cluster Formation]
When the Cluster service forms a cluster, it does the following:

1. Starts the Resource Monitor, the manager of the cluster's resources.
2. Brings the Quorum resource online.
3. Updates local copies of the cluster database.
4. Recreates groups and resources.
5. Configures the networks, recreates network and interface objects, registers the networks and interfaces with cluster transport, and then brings them online.
6. Brings all resources online.
7. Takes a checkpoint of the cluster database.
8. The cluster is formed. Resources continue to be brought online after the cluster formation is reported complete.


Forming a Cluster
Forming a cluster involves the following stages:

1. Starting an instance of the Resource Monitor (Resrcmon.exe).
2. Bringing the Quorum resource online, including applying the quorum log changes to the cluster database.
3. Recreating groups and resources in the cluster database. The Cluster service might have used stale object information at startup; now it needs to destroy and then recreate all the group and resource objects in order to refresh their information.
4. Configuring the networks.
5. Bringing resources online, which might involve updating resources' registry keys. The cluster can be successfully formed before all the resources have been brought online.

Initializing the Node
The following entries represent initialization of the local node.

378.32c::1999/06/09-18:00:18.874 Cluster Service started - Cluster Node Version 3.2051
378.32c::1999/06/09-18:00:18.874 OS Version 5.0.2051
Note that the preceding entries include, with the time the Cluster service started, the version number of the Cluster service and of the node's operating system.

378.380::1999/06/09-18:00:18.874 [CS] Service Starting...
378.380::1999/06/09-18:00:19.210 [EP] Initialization...
378.380::1999/06/09-18:00:19.218 [DM]: Initialization
378.380::1999/06/09-18:00:19.226 [DM]: Loading cluster database from C:\WINNT\cluster\CLUSDB
In the preceding entry, the Database Manager loads the cluster database into the local registry. Later, the Database Manager updates the cluster's registry data with any cluster database checkpoints or quorum log change records that are more recent than the version of the cluster database that it just loaded into the cluster registry key.

378.380::1999/06/09-18:00:19.382 [DM] DmpStartFlusher: Entry
378.380::1999/06/09-18:00:19.382 [DM] DmpStartFlusher: thread created
378.380::1999/06/09-18:00:19.406 [NM] Initializing...
378.380::1999/06/09-18:00:19.429 [NM] Local node name = NODE1.
378.380::1999/06/09-18:00:19.429 [NM] Local node ID = 1.
The last two preceding entries identify the name and ID of the node whose activity this log tracks. This identity is important for tracking interactions across the various nodes' cluster logs.
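
When logs from several nodes are compared side by side, it helps to label each file with the node it came from. The following Python sketch pulls the local node name and ID out of the [NM] entries. The function name local_node_identity is hypothetical, and the patterns are copied from the wording of the entries shown here, so other Cluster service versions may need different patterns.

```python
import re

# Patterns modeled on the [NM] entries shown in this post (an assumption
# about wording; other log versions may phrase these lines differently).
NAME_RE = re.compile(r"\[NM\] Local node name = (\S+?)\.?$")
ID_RE = re.compile(r"\[NM\] Local node ID = (\d+)\.?$")


def local_node_identity(lines):
    """Return (name, id) of the local node, or None for a part that is missing."""
    name, node_id = None, None
    for line in lines:
        line = line.strip()
        match = NAME_RE.search(line)
        if match:
            name = match.group(1)
        match = ID_RE.search(line)
        if match:
            node_id = int(match.group(1))
    return name, node_id


if __name__ == "__main__":
    sample = [
        "378.380::1999/06/09-18:00:19.429 [NM] Local node name = NODE1.",
        "378.380::1999/06/09-18:00:19.429 [NM] Local node ID = 1.",
    ]
    print(local_node_identity(sample))
```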

378.380::1999/06/09-18:00:19.429 [NM] Creating object for node 1 (NODE1)
378.380::1999/06/09-18:00:19.429 [NM] Initializing networks.
378.380::1999/06/09-18:00:19.437 [NM] Initializing network interfaces.
378.380::1999/06/09-18:00:19.609 [NM] Initialization complete.
378.380::1999/06/09-18:00:19.632 [FM] Starting worker thread...
378.3a8::1999/06/09-18:00:19.632 [FM] Worker thread running
378.380::1999/06/09-18:00:19.632 [API] Initializing
378.380::1999/06/09-18:00:19.632 [lm] :LmInitialize Entry.
378.380::1999/06/09-18:00:19.640 [lm] :TimerActInitialize Entry.
378.380::1999/06/09-18:00:19.640 [CS] Service Domain Account = ITRESKIT\administrator
378.380::1999/06/09-18:00:19.640 [CS] Initializing RPC server.
378.380::1999/06/09-18:00:19.734 [INIT] Attempting to join cluster CLUSTER1
After it is initialized, the Cluster service immediately tries to join a cluster.
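
The entries above share one layout, so they can be split into fields before any further analysis. The following Python sketch does that and measures the time between two entries. It assumes the two leading values are the process and thread IDs in hexadecimal and that the component tag is optional; parse_entry and the field names are hypothetical, not part of any Microsoft tool.

```python
import re
from datetime import datetime

# Assumed layout: <pid>.<tid>::<yyyy/mm/dd-hh:mm:ss.mmm> [component] message
ENTRY_RE = re.compile(
    r"^(?P<pid>[0-9a-fA-F]+)\.(?P<tid>[0-9a-fA-F]+)::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?:\[(?P<component>[^\]]+)\]\s*:?\s*)?(?P<message>.*)$"
)


def parse_entry(line):
    """Split one log line into fields; return None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return {
        "pid": int(match.group("pid"), 16),
        "tid": int(match.group("tid"), 16),
        "time": datetime.strptime(match.group("ts"), "%Y/%m/%d-%H:%M:%S.%f"),
        # Tags appear in mixed case ([LM] is logged as [lm]), so normalize.
        "component": (match.group("component") or "").upper(),
        "message": match.group("message"),
    }


if __name__ == "__main__":
    sample = [
        "378.380::1999/06/09-18:00:18.874 [CS] Service Starting...",
        "378.380::1999/06/09-18:00:19.218 [DM]: Initialization",
        "378.380::1999/06/09-18:00:19.734 [INIT] Attempting to join cluster CLUSTER1",
    ]
    entries = [parse_entry(s) for s in sample]
    elapsed = entries[-1]["time"] - entries[0]["time"]
    print(entries[1]["component"], entries[1]["message"])
    print(elapsed.total_seconds())
```

Lines that were wrapped when the log was copied will not match and come back as None, which makes them easy to spot and rejoin.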

[Anatomy of a Cluster Log Entry]
Cluster log abbreviations for components and node states are shown in Table 20.1.

Table 20.1 Cluster Log Abbreviations for Components and Node States

Abbreviation | Node state or component
[API] | API support. These entries come from the Cluster service component that provides support for the Server Cluster API.
[ClMsg] | Cluster messaging. The component that Regroup (also known as Membership Manager; see later in this table) uses to send and receive its messages.
[ClNet] | Cluster network engine. Generic code to determine a node's network configuration.
[CP] | Checkpoint Manager. If a resource has its registry key registered for checkpointing, the Checkpoint Manager monitors any changes to the key while the resource is online and writes a checkpoint to the quorum disk whenever there is a change to the registered key. On the node to which the resource is being failed over, the resource key in the registry is updated with the resource key's checkpoint before the resource is brought online.
[CS] | Cluster service. This abbreviation is assigned to messages that come out of the Cluster service rather than one of its components.
[DM] | Database Manager. The agent through which other components read or make changes to the cluster configuration database.
[EP] | Event Processor. Components of the Cluster service register with the Event Processor to receive internal cluster events, such as a node's going up or down.
[FM] | Failover Manager. Coordinates the moving of a group from one node to another based on failure criteria specified by the group's properties.
[GUM] | Global Update Manager. A cluster-wide, broadcast-like remote procedure call (RPC) mechanism used to distribute information to all nodes in the cluster.
[INIT] | The initial state of a node prior to joining or forming a cluster.
[JOIN] | The node state that follows [INIT] when the node attempts to join a cluster. If the join operation succeeds, the state of the node then moves to cluster member.
[LM] | Log Manager. Maintains the quorum log.
[MM] | Membership Manager, also known and written to the cluster log as Regroup ([RGP]). See [RGP] in this table.
[NM] | Node Manager. Keeps track of the state of other nodes in the cluster as well as maintaining the cluster-wide network configuration.
[OM] | Object Manager. Maintains an in-memory database of entities, or objects (nodes, networks, groups, and so on). Each object has an associated type and a set of methods with which other components can manipulate it. Each cluster object is represented in the Object Manager space. The Object Manager does not differentiate between types of objects.
[RGP] | Regroup, also known and written to the cluster log as Membership Manager ([MM]). Tracks which nodes are members of the cluster. Regroup writes entries to the log during initialization, form operations, and join operations, and when cluster membership changes.
[RM] | Resource Monitor. Any of the processes (instances of Resrcmon.exe) of the Cluster service that actually monitor individual resources.
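
The abbreviations in Table 20.1 can be turned into a lookup so that a long log is summarized by component name. The following Python sketch counts entries per component. The short names in COMPONENTS are taken from the table, while tally_components and the choice to match tags case-insensitively are assumptions made for this illustration.

```python
import re
from collections import Counter

# Short names from Table 20.1, keyed by the upper-cased abbreviation.
COMPONENTS = {
    "API": "API support",
    "CLMSG": "Cluster messaging",
    "CLNET": "Cluster network engine",
    "CP": "Checkpoint Manager",
    "CS": "Cluster service",
    "DM": "Database Manager",
    "EP": "Event Processor",
    "FM": "Failover Manager",
    "GUM": "Global Update Manager",
    "INIT": "Initial node state",
    "JOIN": "Joining node state",
    "LM": "Log Manager",
    "MM": "Membership Manager",
    "NM": "Node Manager",
    "OM": "Object Manager",
    "RGP": "Regroup",
    "RM": "Resource Monitor",
}

TAG_RE = re.compile(r"\[([A-Za-z]+)\]")


def tally_components(lines):
    """Count log lines per component, using the first [TAG] found on each line."""
    counts = Counter()
    for line in lines:
        match = TAG_RE.search(line)
        if match:
            tag = match.group(1).upper()
            # Unknown tags are kept under their own abbreviation.
            counts[COMPONENTS.get(tag, tag)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "378.380::1999/06/09-18:00:19.218 [DM]: Initialization",
        "378.380::1999/06/09-18:00:19.382 [DM] DmpStartFlusher: Entry",
        "378.380::1999/06/09-18:00:19.632 [lm] :LmInitialize Entry.",
    ]
    for name, count in tally_components(sample).most_common():
        print(name, count)
```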



[References]

Interpreting the Cluster Log
http://technet.microsoft.com/en-us/library/cc961673.aspx

The meaning of state codes in the Cluster log
http://support.microsoft.com/kb/286052/en-us

Recovering from a lost or corrupted quorum log
http://support.microsoft.com/kb/245762/EN-US

Server Cluster Troubleshooting
http://technet.microsoft.com/en-us/library/cc776978(WS.10).aspx

How to Troubleshoot Cluster Service Startup Issue
http://support.microsoft.com/kb/266274/en-us

How to Use the Cluster TMP file to Replace a Damaged Clusdb File
http://support.microsoft.com/kb/224999/EN-US

Thank you.