Thursday, October 10, 2013

Inside AlwaysOn

Monitor Performance for AlwaysOn Availability Groups

I am referring to http://technet.microsoft.com/en-us/library/dn135338.aspx
Availability groups in SQL server 2012 is a hot topic. Recently, I have set up and configured availability groups at my work. I have SQL server 2012 enterprise version on Windows 2012 Data Center Edition.

We will need to know how primary and secondary servers are communicating to each other in order to monitor performance of availability groups.


 Data Synchronization Process







1 . Log Flush

Log data is flushed to disk. This log must be replicated to the secondary replicas. The log records enter the send queue.


select * from sys.dm_os_performance_counters
where counter_name like '%log bytes Flushed/sec%' 


2. Log Capture

Logs for each database is captured and sent to the corresponding partner queue (one per database-replica pair). This capture process runs continuously as long as the availability replica is connected and data movement is not suspended for any reason, and the database-replica pair is shown to be either Synchronizing or Synchronized. If the capture process is not able to scan and enqueue the messages fast enough, the log send queue builds up.

select * from sys.dm_os_performance_counters where counter_name like '%bytes sent to Replica/sec%' 

3. Send

The messages in each database-replica queue is dequeued and sent across the wire to the respective secondary replica.

select  * from sys.dm_os_performance_counters where
counter_name like '%bytes sent to transport/sec%'



4. Received and Cached

Each secondary replica receives and caches the message.
select  * from sys.dm_os_performance_counters where
counter_name like '%log bytes received/sec%' 


5. Harden
Log is flushed on the secondary replica for hardening. After the log flush, an acknowledgement is sent back to the primary replica. Once the log is hardened, data loss is avoided. 


select  *  from sys.dm_os_wait_stats
where  wait_type = 'HADR_LOGCAPTURE_SYNC'

select  * from sys.dm_os_performance_counters
where  counter_name like '%log bytes Flushed/sec%'



6. Redo

Redo the flushed pages on the secondary replica. Pages are kept in the redo queue as they wait to be redone.


select  * from sys.dm_os_performance_counters
where  counter_name like '%redone bytes/sec%'


select  * from sys.dm_os_wait_stats  where  wait_type = 'REDO_THREAD_SYNC'


Flow Control Gates

AlwaysOn Availability Groups is designed with flow control gates on the primary replica to avoid excessive resource consumption, such as network and memory resources, on all availability replicas. These flow control gates do not affect the synchronization health state of the availability replicas, but they can affect the overall performance of your availability databases, including RPO.

how many times flow control was activated and how much time was spent waiting on flow control. Higher wait time on the flow control translate to higher RPO.


select  * from sys.dm_os_performance_counters  where  counter_name like '%Flow control%'






No comments:

Post a Comment

How to add a Database to AlwaysOn Availability Group with four different options

To add a database to an existing AlwaysOn availability group, MS has given us four options to choose from Automatic seeding Full database an...