Preparation
Use SQL
server management studio (SMS), create a test database and add the database to
availability group database.
create database test
create table test
(testno int)
Data Load script
I will
use the following simple data load script to test data synchronization and connection.
declare @i as integer
declare @id as integer
set @i = 200000
set @id = 1
insert into test(testno)
select @id
while ((select max(testno) from test) < @i )
begin
insert into test(testno)
select @id
set @id = @id +1
print @id
end
Test Case
Test 1: Ensure synchronization works normal
- Connect to Availability group Listener and run data load script
- Check the secondary replica to see if the log records made it
across
Expected Result :Logs are written to
Secondary replica. Row counts are the same.
Test 2: Primary replica
with Manual failover
- Start manual failover from AlwaysOn Dashboard
- Confirm Primary is successfully failover to Secondary replica
- Reconnect to Availability group Listener
Expected Error : In
SMS,Remote harden of transaction 'INSERT' (ID 0x0000000007016159 0000:000199ce)
started at Jan 17 2014 2:04PM in
database 'test' at LSN (211:148:3) failed.
Expected Result: Failover successfully. During the failover, client
connection is disconnected to Availability group listener. Reconnect to
Listener is successful and data load continues. Connection lost during failover
is expected by design. It is between 3 sec to 15 sec.
Test 3: Primary replica with Manual failover (cont.): missing
rows to determine the data loss
- Start manual failover from AlwaysOn Dashboard while data load
script is running
- Determine any missing rows are not submitted to Primary
- Determine any missing rows are not replicated to Secondary
Expected Result: No missing rows in primary and secondary replicas
Test 4: Primary replica with Automatic (forced) failover
4.1 Stopping SQL Server
service
- Stopping SQL Server service
- Confirm secondary replica becomes primary replica
- At this point, primary replica run exposed and RPO of 0 data
loss is not met because there is no secondary replica
- Reconnect to listener and run data load script
Expected Result: In Failover Cluster Manager, Secondary replica becomes
Primary. During timeout, client connection is disconnected to Availability
group listener. Then, reconnect to Listener is successful and data load
continues.
4.2 Power down Primary
replica
- Power down Primary replica from VSphere
- Confirm secondary replica becomes primary replica
- Reconnect to listener and run data load script
Expected Result: In Failover Cluster Manager, Secondary replica
becomes Primary. During timeout, client connection is disconnected to
Availability group listener. Then, reconnect to Listener is successful and data
load continues
4.3 Stop Current owner
Cluster service
- Stop a current owner cluster service from failover cluster
manager
- Confirm secondary replica becomes primary replica
- Reconnect to listener and run data load script
Expected Result: In Failover Cluster Manager, Secondary replica
becomes Primary. During timeout, client connection is disconnected to
Availability group listener. Then, reconnect to Listener is successful and data
load continues.
4.4 Disable Primary
server network adapter
- Open Network Connection on Primary Replica
- Click on both device and disable
- Confirm secondary replica becomes primary replica
- Reconnect to listener and run data load script
- Open VShpere and open console of earlier Primary replica and
enable network adapers
Expected Result: In Failover Cluster Manager, Secondary replica
becomes Primary. During timeout, client connection is disconnected to
Availability group listener. Then, reconnect to Listener is successful and data
load continues.
Test 5: Primary replica with Automatic failover (cont.): missing
rows to determine the data loss
- During
automatic forced failover cases (4.1 – 4.4)
- Determine any missing rows are not submitted to Primary
- Determine any missing rows are not replicated to Secondary
Expected Result : No missing rows
Test 6: Secondary replica down
- Connect to availability group listener and run data load script
- Power off Secondary replica
- Confirm data load script is not interrupted
Expected Result: Data load is NOT interrupted.
Test 7: Primary replica
also fails and Secondary comes up later
Imagine
due to power issues, Primary and Secondary replicas are down.
- In Test 6, secondary replica is down
- Now Power off Primary
- Both nodes are offline and Cluster is down
- After 1 minute, power on earlier Secondary
- Confirm earlier Secondary comes up as a primary replica
- Confirm Cluster service comes up
- Reconnect to Listener and run data load
Expected Result : Earlier Secondary replica is online but it
doesn’t change the role to Primary. Databases are Recovery pending state in Secondary.
Unable to connect to Listener. The failover cluster manager is unable to
connect to our WSFC.
Test 8: Bring earlier Primary
replica back online
- Power on earlier primary replica
- Confirm Cluster service is up
- Reconnect to Listener and run data load
Expected Result : Earlier Primary replica doesn’t come up as a
secondary. It role is still Primary. Data load started and database is
synchronized.
Test 9: Quorum is down
- Take quorum offline. The WSFC cluster service offline
- Determine how application behaves in this situation
- After 1 min, Bring quorum online
- Confirm cluster service online
- Confirm all nodes are online
- Confirm database are synchronizing
- Confirm SQL Management Studio is able to connect to listener
E Expected result : The cluster service stays online even if the quorum
is taken offline. The quorum configuration is Node and File Share Majority. We
had three nodes before the quorum is taken offline. After the quorum is taken offline, the total node
is 2 and we have two votes. We need more than half of the votes to running the
cluster service. This means we can only sustain a single node failure.
Test 10: Primary replica is taken offline while the quorum is
- Stop SQL service on Primary Replica
- Verify a cluster service is still online
Expected Result : The cluster service is offline. We need at least
2 out of 3 nodes to be online in order to the cluster service to stay online
Test 11: VMotion
- Test VMotion on both replicas with System Adminstrator team
- Confirm no replica role changes
- Confirm no changes in cluster node role
Expected Result : VMotion is performed with no impact to running
workloads or connected users and replica role.
Test 12: Load testing/
performance test
- Application will log millions of rows
- Take a performance base line of the server while the loading
takes place
Ø IO
Ø CPU
Ø Disk
Ø Network
Ø SQL
Server : Availability group replica
Ø SQL
Server: Availability group Database