Using SimSANs to Study SAN Performance
1. Introduction
This page illustrates how SimSANs can be used to study SAN
performance. Two simulation scenarios are demonstrated: a frame
latency study, discussed in section 3, and a long-distance data
movement study, discussed in section 4. Each section presents the
simulation's configuration and results, along with some
considerations. Section 2 outlines the network topology that both
scenarios run on. The executables (for Windows), parameter
configuration files, and output data are also provided here.
2. Network Topology
The network topology used to run the scenarios is derived from the
sample network, with minor changes. The first scenario (scenario 1),
studying frame latency, uses Hosts 1, 2, 3, and 5 to issue sequential
write IOs to LUNs at Device1. The second scenario (scenario 2),
studying the performance of long-distance data movement, uses Hosts 4
and 6 to issue sequential write IOs to the LUNs at Device1, which
then clones the data to the mirrored LUNs at the remotely located
Device3. Figure 1 shows the network topology and the IO paths for
both scenarios, with the yellow path for scenario 1 and the pink path
for scenario 2.
Figure 1. Network Topology and IO Paths
3. Scenario I – Frame Latency
This scenario investigates frame latency versus the number of
concurrent clients. Four clients (Hosts 1, 2, 3, and 5) take part in
the simulation; each initiates 100MB of sequential write IO to a
separate LUN at Device1, and the IOs travel two switch hops (Sw1 and
Sw3) from host to device. Each client maintains a constant write
throughput of 40MBps during the simulation. Simulation data is
collected to analyze how the number of concurrent clients affects
frame latency. In addition, three different IO sizes (16KB, 32KB,
and 64KB) are applied so that the effect of IO size on frame latency
can be observed.
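As a quick sanity check on the offered load, the sketch below (plain
Python, assuming a 2Gbps FC link carries roughly 200MBps of payload
after 8b/10b encoding; that figure is not from the simulation itself)
compares the clients' aggregate throughput against the single device
port they share. With all four clients active, the 160MBps aggregate
approaches the port's limit, which is where queuing starts to matter.

    # Rough offered-load check for scenario 1 (assumption: a 2Gbps FC
    # link carries roughly 200 MBps of payload after 8b/10b encoding).
    LINK_PAYLOAD_MBPS = 200.0   # approximate usable payload rate of one 2Gbps port
    CLIENT_MBPS = 40.0          # per-client sustained write throughput (from config)

    for clients in (1, 2, 3, 4):
        offered = clients * CLIENT_MBPS
        utilization = offered / LINK_PAYLOAD_MBPS
        print(f"{clients} client(s): {offered:.0f} MBps offered, "
              f"~{utilization:.0%} of the device port")
    # With 4 clients the shared device port runs near 80% utilization,
    # so frames queue in the Rx buffers and latency climbs.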
Configurations
Client configuration – All four hosts have similar configurations
except that they target different LUNs. The following parameters are
configured for the clients:
- Concurrent IOs: 256
- HBA Port Speed: 2Gbps
- Rx Buffer: 6
- Tx Buffer: None
- Class-3 Concurrent Sequence: 255
- Application: Sequential Write
- Application Data: 100MB
- Application Throughput: 40MBps
- IO Size: 16KB, 32KB, and 64KB
- Target LUN: LUN1, 2, 3, and 5
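The actual SimSANs parameter file format is not reproduced here;
purely as an illustration, the hypothetical sketch below captures the
client settings above in a Python dict and enumerates the 12
(client count, IO size) simulation instances used in the results
section.

    # Hypothetical representation of the client parameters above; the
    # real SimSANs configuration files use their own format.
    client_config = {
        "concurrent_ios": 256,
        "hba_port_speed_gbps": 2,
        "rx_buffer": 6,
        "tx_buffer": None,
        "class3_concurrent_sequences": 255,
        "application": "sequential_write",
        "application_data_mb": 100,
        "application_throughput_mbps": 40,
    }

    # The 12 simulation instances: 4 client counts x 3 IO sizes.
    instances = [(clients, io_kb)
                 for clients in (1, 2, 3, 4)
                 for io_kb in (16, 32, 64)]
    for clients, io_kb in instances:
        print(f"instance: {clients} client(s), {io_kb}KB IOs")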
Switch configuration – Both switches have similar configurations.
The following parameters are configured for the switches:
- Domain_ID: 101 for Sw1 and 103 for Sw3
- Port Speed: 2Gbps
- Rx Buffer: 12
- Tx Buffer: None
- Frame Forwarding: cut-through
- Forwarding Latency: around 1.88us
- Hops: 2
Device configuration – Device1 is equipped with 2 CHIPs, each of
which has four N_Ports; only one of the eight ports is used in the
simulation. Four LUNs are set up, each accessible by one of the four
clients. Except for the LUN WWN, all LUNs have the same
configuration. In addition, the LUN member disks are assigned across
separate array groups to minimize resource contention. The following
parameters are configured for the device:
- CHIP Port Speed: 2Gbps
- CHIP Port Rx Buffer: 6
- CHIP Port Tx Buffer: None
- CHIP Class-3 Concurrent Sequence: 255
- Cache: disabled to avoid the cache-hit effect on the simulation
results
- ACP Array Group Number: 4
- RAID: RAID 0
- LUN Member Disks: 4
- LUN Block Size: 8KB
- LUN Stripe Depth: 32KB
- Disk Sequential Access Rate: about 25 to 40MBps
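To make the LUN layout concrete, here is a minimal sketch of how a
logical block address would map onto the four member disks with a
32KB stripe depth and 8KB blocks. This is the generic RAID 0
striping calculation, not SimSANs' actual internal model.

    # Minimal RAID 0 address mapping for the LUN layout above
    # (4 member disks, 32KB stripe depth, 8KB blocks). Generic
    # striping math, not SimSANs' internals.
    STRIPE_DEPTH_KB = 32
    BLOCK_KB = 8
    MEMBER_DISKS = 4
    BLOCKS_PER_STRIPE = STRIPE_DEPTH_KB // BLOCK_KB   # 4 blocks per stripe unit

    def map_block(lba):
        """Map a LUN block number to (member disk, block offset on disk)."""
        stripe_unit = lba // BLOCKS_PER_STRIPE        # which stripe unit
        disk = stripe_unit % MEMBER_DISKS             # round-robin across disks
        disk_block = (stripe_unit // MEMBER_DISKS) * BLOCKS_PER_STRIPE \
                     + lba % BLOCKS_PER_STRIPE
        return disk, disk_block

    for lba in range(0, 24, 4):
        print(lba, map_block(lba))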
Simulation Results
Simulations were performed for different numbers of concurrent
clients and different IO sizes, giving a total of 12 separate
simulation instances: concurrent client counts of 1, 2, 3, and 4,
combined with IO sizes of 16KB, 32KB, and 64KB. Figure 2 shows
average network latency vs. concurrent client count, Figure 3 shows
average end-to-end latency vs. concurrent client count, and Figure 4
shows average IO response time vs. concurrent client count. Network
latency denotes the duration from the time a frame is sent out onto
the link by the source N_Port to the time the frame is completely
received by the destination N_Port, i.e., the time it takes a frame
to travel through the switch fabric. End-to-end latency is measured
from the time the source N_Port's FC-2 layer initiates a frame until
the time the destination N_Port's FC-2 layer completely receives the
whole frame; end-to-end latency therefore includes both the network
latency and the N_Ports' processing cost.
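As a back-of-the-envelope check on these definitions, the sketch
below estimates the unloaded network latency of one full-size frame
across the two cut-through hops, assuming a maximum FC frame of
roughly 2148 bytes on the wire and about 200MBps of payload rate on a
2Gbps link (both assumptions, not simulation outputs); queuing under
contention comes on top of this floor.

    # Rough unloaded network latency for one max-size frame over the
    # two cut-through hops (Sw1, Sw3). Assumptions: ~2148 bytes per
    # max-size FC frame on the wire, ~200 MBps effective rate at 2Gbps.
    FRAME_BYTES = 2148
    LINK_BYTES_PER_SEC = 200e6
    HOP_FORWARD_LATENCY_S = 1.88e-6   # per-switch cut-through latency (from config)
    HOPS = 2

    serialization = FRAME_BYTES / LINK_BYTES_PER_SEC    # ~10.7 us for the frame body
    network_latency = serialization + HOPS * HOP_FORWARD_LATENCY_S
    print(f"~{network_latency * 1e6:.1f} us with empty buffers")
    # Measured latencies above this floor are queuing delay from contention.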
The simulation results show that frame latency (both network and
end-to-end) and IO response time increase as more concurrent clients
join. More concurrent clients produce a larger aggregate throughput,
and therefore a greater chance of frames piling up in frame buffers
waiting for processing. The frame latency becomes much more
significant with a large IO size, say 64KB, because a larger IO
generates more frames at a time through Fibre Channel sequence
segmentation, making frame pile-up along the IO path much more
likely.
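To see why larger IOs pile up more frames, this sketch counts the
frames each IO size produces under sequence segmentation, assuming
the standard 2112-byte maximum FC frame payload.

    # Frames generated per IO by sequence segmentation (assumption:
    # the standard 2112-byte maximum FC frame payload).
    import math

    MAX_PAYLOAD = 2112
    for io_kb in (16, 32, 64):
        frames = math.ceil(io_kb * 1024 / MAX_PAYLOAD)
        print(f"{io_kb}KB IO -> {frames} frames per sequence")
    # 16KB -> 8, 32KB -> 16, 64KB -> 32 frames: a 64KB IO injects four
    # times as many frames into the fabric at once as a 16KB IO, so the
    # Rx buffers (6 at the N_Ports, 12 at the switches) fill up sooner.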
Considerations
The simulation results basically reflect the frame latency behavior
that a SAN environment usually presents. Note that this simulation
is just a demonstration of using SimSANs to study frame latency. In
practice, many factors can affect frame latency, including Rx/Tx
buffer settings, link speed, concurrent IO/Sequence values, N_Port
processing cost, switch processing cost, IO pattern, application
throughput, network topology, and so on. To investigate latency
thoroughly, more advanced scenarios modeled on real-world cases and
test beds can be simulated in SimSANs, taking into account the
combined effect of many of those factors.
Figure 2. Average Network Latency vs. Concurrent Client Number
Figure 3. Average End-to-End Latency vs. Concurrent Client Number
Figure 4. Average IO Response Time vs. Concurrent Client Number
4. Scenario II – Long Distance Data Movement
This scenario investigates application throughput for data movement
over long distance, such as remote mirroring, replication, or
cloning. Two clients (Hosts 4 and 6) take part in the simulation;
each simultaneously initiates 100MB of sequential write IO to a
separate LUN at Device1, and Device1 then initiates a remote
mirroring operation (say, by using extended copy or third-party
copy) to copy the data to the remotely located Device3 through Sw6
and Sw11. Refer to Figure 1 for the network topology and IO path.
The sustained throughput of the mirroring operation is recorded; the
larger the throughput, the better the performance.
Simulation Configurations and Results
Most of the configurations of hosts, switches, and devices are
similar to those used in scenario 1, except for the following:
- ISL between Sw6 and Sw11: 1Gbps link speed, large BB_Credit (or Rx
buffer); see the credit-sizing sketch below
- Distance between Sw6 and Sw11: 1786km
- Target LUN: LUN4 and LUN6 accessible by Host4 and Host6
respectively
- LUN Block Size: 64KB
- LUN Stripe Depth: 256KB
The recorded mirroring operation throughput is 97.29 MBps.
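The large BB_Credit on the ISL is what keeps the long link full. The
sketch below estimates the minimum number of credits needed to
sustain line rate over 1786km, assuming ~5us/km propagation in
fiber, a ~2148-byte maximum frame, and ~100MBps of payload on a
1Gbps link (all assumptions, not simulation parameters); the
recorded 97.29MBps is essentially that link running at capacity.

    # Minimum BB_Credit to keep the 1786km ISL streaming at line rate.
    # Assumptions: ~5 us/km propagation in fiber, ~2148 bytes per
    # max-size frame, ~100 MBps effective payload rate at 1Gbps.
    import math

    DISTANCE_KM = 1786
    PROP_S_PER_KM = 5e-6
    FRAME_BYTES = 2148
    LINK_BYTES_PER_SEC = 100e6

    round_trip = 2 * DISTANCE_KM * PROP_S_PER_KM    # ~17.9 ms for frame + R_RDY
    frame_time = FRAME_BYTES / LINK_BYTES_PER_SEC   # ~21.5 us per frame
    credits = math.ceil(round_trip / frame_time)
    print(f"RTT ~{round_trip * 1e3:.1f} ms -> at least {credits} BB_Credits")
    # Roughly 830+ credits are needed; with fewer, the sender stalls
    # waiting for R_RDYs and sustained throughput drops below ~97 MBps.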
Considerations
The simulation assumes a direct ISL connection between Sw6 and Sw11,
without an intermediate network, say an IP network, between them.
With the implementation of iFCP/iSCSI in a future version of
SimSANs, we will be able to study the performance of Fibre Channel
SAN extension over IP networks. Considering the protocol overhead
introduced by iFCP/iSCSI/TCP/IP, one can expect the application
throughput of data movement over an IP SAN to be somewhat lower than
in the above simulation, which uses a direct FC ISL.
This page was last updated 2003.11.25