Wednesday, January 22, 2014

SQL Server 2012 AlwaysOn Testing

Preparation

Use SQL server management studio (SMS), create a test database and add the database to availability group database.

create database test

create table test
(testno int)

Data Load script

I will use the following simple data load script to test data synchronization and connection.

declare @i as integer
declare @id as integer
set @i = 200000
set @id = 1
insert into test(testno)
select @id

while ((select max(testno) from test) < @i )
begin
insert into test(testno)
select @id
set @id = @id +1
print @id
end


Test Case

Test 1: Ensure synchronization works normal

  • Connect to Availability group Listener and run data load script
  • Check the secondary replica to see if the log records made it across


Expected Result :Logs are written to Secondary replica. Row counts are the same.

Test 2: Primary replica with Manual failover

  • Start manual failover from AlwaysOn Dashboard
  • Confirm Primary is successfully failover to Secondary replica
  • Reconnect to Availability group Listener
Expected Error : In SMS,Remote harden of transaction 'INSERT' (ID 0x0000000007016159 0000:000199ce) started at Jan 17 2014  2:04PM in database 'test' at LSN (211:148:3) failed.

Expected Result: Failover successfully. During the failover, client connection is disconnected to Availability group listener. Reconnect to Listener is successful and data load continues. Connection lost during failover is expected by design. It is between 3 sec to 15 sec.

Test 3: Primary replica with Manual failover (cont.): missing rows to determine the data loss

  • Start manual failover from AlwaysOn Dashboard while data load script is running
  •  Determine any missing rows are not submitted to Primary
  •  Determine any missing rows are not replicated to Secondary
Expected Result: No missing rows in primary and secondary replicas

Test 4: Primary replica with Automatic (forced) failover

4.1    Stopping SQL Server service

  • Stopping SQL Server service
  • Confirm secondary replica becomes primary replica
  •  At this point, primary replica run exposed and RPO of 0 data loss is not met because there is no secondary replica
  • Reconnect to listener and run data load script


Expected Result: In Failover Cluster Manager, Secondary replica becomes Primary. During timeout, client connection is disconnected to Availability group listener. Then, reconnect to Listener is successful and data load continues.

4.2    Power down Primary replica
  • Power down Primary replica from VSphere
  • Confirm secondary replica becomes primary replica
  •  Reconnect to listener and run data load script

Expected Result: In Failover Cluster Manager, Secondary replica becomes Primary. During timeout, client connection is disconnected to Availability group listener. Then, reconnect to Listener is successful and data load continues

4.3    Stop Current owner Cluster service

  • Stop a current owner cluster service from failover cluster manager
  • Confirm secondary replica becomes primary replica
  • Reconnect to listener and run data load script

Expected Result: In Failover Cluster Manager, Secondary replica becomes Primary. During timeout, client connection is disconnected to Availability group listener. Then, reconnect to Listener is successful and data load continues.

4.4    Disable Primary server network adapter

  • Open Network Connection on Primary Replica
  • Click on both device and disable
  • Confirm secondary replica becomes primary replica
  • Reconnect to listener and run data load script
  • Open VShpere and open console of earlier Primary replica and enable network adapers

Expected Result: In Failover Cluster Manager, Secondary replica becomes Primary. During timeout, client connection is disconnected to Availability group listener. Then, reconnect to Listener is successful and data load continues.

Test 5: Primary replica with Automatic failover (cont.): missing rows to determine the data loss

  • During automatic forced failover cases (4.1 – 4.4)
  • Determine any missing rows are not submitted to Primary
  • Determine any missing rows are not replicated to Secondary
Expected Result : No missing rows

Test 6: Secondary replica down
  • Connect to availability group listener and run data load script
  • Power off Secondary replica
  • Confirm data load script is not interrupted

 Expected Result: Data load is NOT interrupted.

Test 7:  Primary replica also fails and Secondary comes up later

Imagine due to power issues, Primary and Secondary replicas are down.

  • In Test 6, secondary replica is down
  • Now Power off Primary
  • Both nodes are offline and Cluster is down
  • After 1 minute, power on earlier Secondary
  • Confirm earlier Secondary comes up as a primary replica
  • Confirm Cluster service comes up
  • Reconnect to Listener and run data load
Expected Result : Earlier Secondary replica is online but it doesn’t change the role to Primary. Databases are Recovery pending state in Secondary. Unable to connect to Listener. The failover cluster manager is unable to connect to our WSFC.

Test 8:  Bring earlier Primary replica back online

  • Power on earlier primary replica
  • Confirm Cluster service is up
  • Reconnect to Listener and run data load
Expected Result : Earlier Primary replica doesn’t come up as a secondary. It role is still Primary. Data load started and database is synchronized.

Test 9:  Quorum is down

  • Take quorum offline. The WSFC cluster service offline
  • Determine how application behaves in this situation
  • After 1 min, Bring quorum online
  • Confirm cluster service online
  • Confirm all nodes are online
  • Confirm database are synchronizing
  • Confirm SQL Management Studio is able to connect to listener
E    Expected result : The cluster service stays online even if the quorum is taken offline. The quorum configuration is Node and File Share Majority. We had three nodes before the quorum is taken offline.  After the quorum is taken offline, the total node is 2 and we have two votes. We need more than half of the votes to running the cluster service. This means we can only sustain a single node failure.


     Test 10: Primary replica is taken offline while the quorum is 

  •   Stop SQL service on Primary Replica
  •  Verify a cluster service is still online
      Expected Result : The cluster service is offline. We need at least 2 out of 3 nodes to be online in order to the cluster service to stay online


Test 11:  VMotion

  • Test VMotion on both replicas with System Adminstrator team
  • Confirm no replica role changes
  • Confirm no changes in cluster node role
Expected Result : VMotion is performed with no impact to running workloads or connected users and replica role.

Test 12:  Load testing/ performance test

  • Application will log millions of rows
  • Take a performance base line of the server while the loading takes place

Ø  IO
Ø  CPU
Ø  Disk
Ø  Network
Ø  SQL Server : Availability group replica
Ø  SQL Server: Availability group Database












2 comments:

  1. I'm planning to launch AOAG on 2 node for test purpose, if the quorum is down, so that the AOAG ?

    ReplyDelete
    Replies
    1. Quorum is down, primary,secondary nodes are still up. AG is still up.

      Delete

How to add a Database to AlwaysOn Availability Group with four different options

To add a database to an existing AlwaysOn availability group, MS has given us four options to choose from Automatic seeding Full database an...