Control Memory & Cache - CtrlMemCch
The CtrlMemCch module transforms SCSI IOs into LUN Block IOs (between CHIP and LUN), performs caching operations on LUN Blocks (between LUN and Cache), and schedules physical block IOs for disk accesses (between Cache and ACP). A central Control Memory CPU is installed to handle all of these operations. Multiple Meta Channels are used to exchange control information with CHIPs and ACPs, and a Bus Selector is used to exchange data with Cache Switches. At least one LUN Controller module must be set up to work as the Logical Unit Device Server and Task Manager as defined in SAM-2 (the Application Client is the Host SCSI module or the CHIP SCSI_Rmt module). The following figure shows an example CtrlMemCch module with four LUNs configured, communicating with two CHIPs, two ACPs, and two Cache Switches.
CtrlMemCch Module
A LUN_IOSB (LUN IO State Block) is created when a SCSI IO request arrives; it is maintained to record and monitor the IO's operational state. Accordingly, an IO opened in the LUN Controller is called a "LunIO". Note that a LunIO has a one-to-one relationship with a SubIO (defined in the Host SCSI Layer) since a SCSI CDB is created on a per-SubIO basis. Of the five SCSI commands implemented in SimSANs, RPT_LUNS is handled by LUN0 while the other four (MODE_SENSE, INQUIRY, READ, and WRITE) are handled by LUNx where x > 0.
Upon receiving a SCSI_READ or SCSI_WRITE IO request, the LUN Controller segments it into multiple LUN Block IO requests and forwards them to the CtrlMemCPU, where the caching and/or physical IO operations are performed. Note that all the LUN Block IO requests generated by a SCSI IO must be contiguous in block address. For example, if a SCSI_READ IO requests 24KB starting at block address 1000 on LUN1, whose block size is 8KB, then three LUN Block IO requests are generated: the 1st to block 1000, the 2nd to block 1001, and the last to block 1002.
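As a concrete sketch of this segmentation step, the following Python function (illustrative only; the names are not from the SimSANs source) reproduces the example above:

def segment_scsi_io(init_blk, size_kb, blk_size_kb):
    """Split a SCSI READ/WRITE into address-continuous LUN Block IO requests
    (illustrative helper; assumes the request size is block-aligned)."""
    num_blks = size_kb // blk_size_kb
    return [init_blk + i for i in range(num_blks)]

# The example above: SCSI_READ of 24KB starting at block 1000, 8KB LUN blocks
print(segment_scsi_io(1000, 24, 8))   # -> [1000, 1001, 1002]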
Each LUN (except LUN0) uses a RAID 1+0 configuration, where two or more drives are mirrored together and the mirrors are then striped together. Unlike version 1.0, which requires member disks to be set up within the same Array Group, version 2.0 allows member disks to be set up across the whole Disk Array. LUN parameters must be set up manually before running a simulation; see System Configuration for the details of LUN configuration.

Note that LUN block mapping translates LUN Block requests into the corresponding Member Disk Logical Block requests so that disk accesses can be processed. Based on the RAID 1+0 scheme described above, the mapping rules are illustrated as follows, given Block_Size = 4KB, Stripe_Depth = 16KB, and Member_Disk_Num = 3:
LUN_Mapping
From the above example it is easy to see how a LUN block number maps to a disk block number, e.g., LUN block #21 is mapped to Disk2's block #5 for the primary image and Disk5's block #5 for the mirror image. The following generic formulas calculate diskId and diskBlk for a given lunBlk:
strDpthInBlk = strDpth / blkSize
strInBlk = strDpthInBlk x diskNum
destStrId = lunBlk / strInBlk
destStrBlk = lunBlk % strInBlk
diskId = destStrBlk / strDpthInBlk
diskBlk = destStrId x strDpthInBlk + destStrBlk % strDpthInBlk
Here strDpth, blkSize, diskNum, and lunBlk are input parameters while diskId and diskBlk are output parameters; their meanings should be evident from the example. Note that diskNum denotes the number of primary disks and diskId is the primary disk's id. Clearly, each LUN block WRITE request generates two disk block requests when disk mirroring is set up, while a block READ request can be fulfilled by either disk. Unlike version 1.0, version 2.0 of SimSANs does not require LUN member disks to be the same disk model, although that is recommended. If a user sets up mixed disk models for a LUN, the LUN capacity is limited by the smallest-capacity member disk. See LUN Configuration for more detail.
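These formulas translate directly into code. The following sketch (illustrative naming, not the SimSANs source) also reproduces the worked example from the figure:

def lun_to_disk(lun_blk, str_dpth_kb, blk_size_kb, disk_num):
    """Map a LUN block number to (primary diskId, diskBlk) under RAID 1+0.

    The mirror image lives at the same diskBlk on diskId's mirror disk
    (Disk(diskId + diskNum) in the figure's numbering).
    """
    str_dpth_in_blk = str_dpth_kb // blk_size_kb   # strDpthInBlk
    str_in_blk = str_dpth_in_blk * disk_num        # strInBlk
    dest_str_id = lun_blk // str_in_blk            # destStrId
    dest_str_blk = lun_blk % str_in_blk            # destStrBlk
    disk_id = dest_str_blk // str_dpth_in_blk      # diskId
    disk_blk = dest_str_id * str_dpth_in_blk + dest_str_blk % str_dpth_in_blk
    return disk_id, disk_blk

# The worked example: Block_Size=4KB, Stripe_Depth=16KB, Member_Disk_Num=3
print(lun_to_disk(21, 16, 4, 3))   # -> (2, 5): Disk2 block #5, mirror on Disk5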
The CtrlMemCPU performs the following tasks: Caching Operation, Dirty Block Flushing, Cache Cleaning, and Physical Block IO Scheduling. In addition, the LUN Controller (LUN_Ctrl) performs its own tasks by consuming CtrlMemCPU resources, which implies that LUN_Ctrl is a software module loaded into the array's control memory. Since the CtrlMemCPU is a hardware processor, the necessary timing elements must be set up to compute the response time of the above tasks. The Control Memory Timing Elements are introduced at the end of this section.
Caching Operation: SimSANs uses Write-Back + LRU (Least Recently Used). This mechanism works well for random access and transaction applications. Highly sequential access would benefit from a Read-Ahead mechanism, but the current version (2.0) does not have this feature; a future version may add it. Unlike version 1.0, where the Cache Block Size was a single fixed value that proved inflexible, version 2.0 allows various values to be set up to accommodate various LUN Block Sizes. As of version 2.0.15, Write-Through caching is also available.
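To make the Write-Back + LRU behavior concrete, here is a minimal sketch of such a cache; it is an assumption-laden illustration only (block indexing, timing, and flushing in SimSANs are more involved):

from collections import OrderedDict

class WriteBackLRUCache:
    """Minimal write-back + LRU block cache (illustrative sketch only)."""

    def __init__(self, capacity_blks):
        self.capacity = capacity_blks
        self.blocks = OrderedDict()        # lun_blk -> dirty flag, in LRU order

    def write(self, lun_blk):
        """Write-back: acknowledge once the block is in Cache; mark it dirty."""
        self.blocks[lun_blk] = True
        self.blocks.move_to_end(lun_blk)   # most recently used
        self._evict()

    def read(self, lun_blk):
        """Return True on a Cache hit; a miss triggers a physical READ."""
        if lun_blk in self.blocks:
            self.blocks.move_to_end(lun_blk)
            return True
        self.blocks[lun_blk] = False       # cache the block fetched from disk
        self._evict()
        return False

    def _evict(self):
        while len(self.blocks) > self.capacity:
            blk, dirty = next(iter(self.blocks.items()))   # LRU victim
            if dirty:
                pass                       # a dirty victim is flushed to disk first
            del self.blocks[blk]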
Dirty Block Flushing (or Cache Update): the write-back mechanism significantly improves WRITE performance since the IO acknowledgment can be returned as soon as the block is written into the Cache, without waiting for a time-consuming disk access to complete. However, if dirty blocks are not flushed periodically, the Cache space will run out sooner or later and overall system performance will drop quickly. On the other hand, flushing too many dirty blocks at one time may congest the IO path, which in turn degrades performance.
To flush dirty blocks efficiently without significantly impacting system performance, SimSANs provides two Cache Update mechanisms: Fixed_Update, which flushes a fixed percentage of dirty blocks at a pre-set time interval, and Adaptive_Update, which estimates future IO load from the previously measured IO load and then adaptively flushes a dynamic number of dirty blocks at a pre-set time interval. The Adaptive_Update in version 2.0 has been redesigned and is a more complex and more powerful mechanism than the one implemented in version 1.0 (in fact, I don't think the Adaptive_Update in version 1.0 was very effective). In addition, both update mechanisms try to flush the least-used blocks across all LUNs instead of working on a per-LUN basis as in version 1.0. The following flow chart and its parameter list show the basic idea of how the Cache Update works. A separate flow chart and parameter list detail the flush operations for both Fixed_Update and Adaptive_Update. Note that the Adaptive_Update is still a rough idea that I have not explored in depth; it would be highly appreciated if interested users did more work on this topic.
Parameter List for Cache Update Work Flow
Parameter Name | Description | Attribute
dirtyRatio | the ratio of dirty Cache space over total Cache space | working
dirtyCchSize | dirty Cache in KB | working
totalCchSize | total installed Cache Memory in KB | input
cchUpdActv | a flag denoting whether the update operation is active | working
FTH_actv | dirtyRatio threshold to activate the flush operation | input
FTH_deactv | dirtyRatio threshold to de-activate the flush operation | input
INT_flush | flush operation time interval in ms | input
Cache Update Work Flow
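The flow chart itself is not reproduced here, but based on the parameter list a plausible reading of the activation logic is the following hysteresis sketch (the exact flow chart may differ):

def cache_update_tick(dirty_cch_size, total_cch_size, cch_upd_actv,
                      fth_actv, fth_deactv):
    """One INT_flush tick of the Cache Update work flow (hedged sketch).

    Assumes simple hysteresis: flushing activates when dirtyRatio rises
    to FTH_actv and de-activates when it falls to FTH_deactv.
    """
    dirty_ratio = dirty_cch_size / total_cch_size     # dirtyRatio
    if not cch_upd_actv and dirty_ratio >= fth_actv:
        cch_upd_actv = True
    elif cch_upd_actv and dirty_ratio <= fth_deactv:
        cch_upd_actv = False
    if cch_upd_actv:
        pass   # run the Fixed_Update or Adaptive_Update flush (see below)
    return cch_upd_actv

# e.g. 750KB dirty of 1000KB Cache, activate at 0.7, de-activate at 0.3:
print(cache_update_tick(750, 1000, False, 0.7, 0.3))  # -> True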
Parameter List for Flush Operations
Parameter Name | Description | Attribute
currDirtyBlk | current dirty Cache blocks | working
FBR_max | maximum value of Flush Block Ratio - the ratio of dirty blocks to be flushed over total dirty blocks | input (Fixed_Update only)
lastPhyRtn | completed physical block IOs (RD + WR) for the last time interval | working
lastAveRespTime | average response time per completed physical block IO for the last time interval | working
lastTotalRespTime | sum of the response times of all completed physical block IOs for the last time interval | working
lastPhyReq | physical block IO requests (RD + WR) issued during the last time interval | working
lastPhyWrReq | physical block WRITE IO requests issued during the last time interval | working
expFlushRatio | expected ratio of dirty blocks to be flushed | working
thisTotalTime | total response time to be allocated for physical IO completion | working
TOL_fact | tolerance factor - usually a multiple of INT_flush | input (Adaptive_Update only)
thisSurpTime | surplus response time to be allocated for this round of flushing | working
currDirtyActvBlk | active dirty blocks - disk accesses currently in progress and not yet completed | working
thisExpPhyReq | physical block IO requests expected to be issued | working
FBR_ini | initial value of Flush Block Ratio | input (Adaptive_Update only)
thisExpFlushBlk | physical WRITE blocks expected to be flushed | working
thisMaxFlushBlk | maximum number of physical WRITE blocks to be flushed | output (both updates)
AGG_fact | aggressiveness factor used to generate the value of thisMaxFlushBlk | input (Adaptive_Update only)
thisMinFbm | physical WRITE blocks to be flushed, limited by FBR_min | working
FBR_min | minimum value of Flush Block Ratio | input (Adaptive_Update only)
Flush Operations
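The Fixed_Update side follows directly from its description (a fixed percentage of dirty blocks per interval); the Adaptive_Update below is only a loose approximation reconstructed from the parameter list, not the exact flow-chart algorithm:

def fixed_update(curr_dirty_blk, fbr_max):
    """Fixed_Update: thisMaxFlushBlk is a fixed fraction of the dirty blocks."""
    return int(curr_dirty_blk * fbr_max)

def adaptive_update(curr_dirty_blk, curr_dirty_actv_blk, last_phy_rtn,
                    last_total_resp_time, tol_fact, agg_fact, fbr_ini, fbr_min):
    """Adaptive_Update: a loose approximation only, not the exact algorithm.

    TOL_fact is treated here as the response-time budget for this round
    (the parameter list says it is usually a multiple of INT_flush).
    """
    if last_phy_rtn == 0:                                  # no history yet
        this_exp_flush_blk = int(curr_dirty_blk * fbr_ini)
    else:
        last_ave_resp_time = last_total_resp_time / last_phy_rtn
        this_exp_phy_req = tol_fact / last_ave_resp_time   # IOs the budget allows
        # subtract disk accesses already in flight:
        this_exp_flush_blk = max(0, int(this_exp_phy_req) - curr_dirty_actv_blk)
    this_min_fbm = int(curr_dirty_blk * fbr_min)           # FBR_min floor
    this_max_flush_blk = max(int(this_exp_flush_blk * agg_fact), this_min_fbm)
    return min(this_max_flush_blk, curr_dirty_blk)

# Fixed_Update flushing 30% of 10000 dirty blocks:
print(fixed_update(10000, 0.3))   # -> 3000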
Cache Cleaning: SimSANs uses direct mapping to create an index for each data block in the Cache. Direct mapping has a drawback: the indexing operation becomes quite slow if a high percentage of Cache misses is encountered, e.g., under highly sequential IOs (note that SimSANs does not use Read-Ahead to boost sequential IOs). The Cache Cleaning operation periodically cleans out least-used Cache blocks to free more space so that the indexing operation (especially on a Cache miss) incurs less overhead when searching the Cache for each IO operation. The cleaning algorithm used in SimSANs is very simple: once the Cache-hit ratio drops below a pre-defined threshold and the Cache block occupancy ratio reaches a pre-defined threshold, a certain number of Cache blocks are cleaned out. Five parameters need to be set up: HRTH, CTH_H_actv, CBR_H_max, CTH_N_actv, and CBR_N_max. The descriptions of these parameters and the cleaning algorithm are shown below. Users may find that different combinations of the five parameters yield rather different results; sometimes the system may suffer a significant performance penalty. Interested users could work more on this topic, say, the impact of the IO traffic pattern on the effectiveness of the cleaning operation, and the combination of Cache Cleaning and Dirty Block Flushing (see above). I observe that a larger HRTH (say, between 0.4 and 0.6) and a smaller CTH_H_actv (say, < 0.2, but depending on the total Cache size) may help boost performance, but this still needs further research.
Parameter List for Clean Operation
Parameter Name | Description | Attribute
hitRatio | the Cache-Hit ratio for the last time interval | working
blkReqHit | total Cache blocks hit during the last time interval | working
totalBlkReq | total Cache block requests during the last time interval | working
occupyRatio | the Cache space occupancy ratio | working
openCchSize | opened Cache space in KB | working
totalCchSize | total installed Cache Memory in KB | input
HRTH | Cache-Hit ratio threshold | input
CTH_H_actv | when the Cache-Hit ratio is below HRTH: the ratio of occupied Cache space over total Cache space above which the cleaning operation is activated | input
CBR_H_max | when the Cache-Hit ratio is below HRTH: the maximum ratio of Cache blocks to be cleaned out over total opened Cache blocks | input
CTH_N_actv | when the Cache-Hit ratio is above HRTH: the ratio of occupied Cache space over total Cache space above which the cleaning operation is activated | input
CBR_N_max | when the Cache-Hit ratio is above HRTH: the maximum ratio of Cache blocks to be cleaned out over total opened Cache blocks | input
INT_clean | clean operation time interval in ms | input
Clean Operations
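Putting the five parameters together, the cleaning decision described above can be sketched as follows (parameter handling is inferred from the list; the actual flow chart may differ in detail):

def clean_blocks(hit_ratio, open_cch_size, total_cch_size, opened_blks,
                 hrth, cth_h_actv, cbr_h_max, cth_n_actv, cbr_n_max):
    """One INT_clean tick of the cleaning decision (inferred sketch).

    Returns the number of least-used Cache blocks to clean out this round.
    """
    occupy_ratio = open_cch_size / total_cch_size          # occupyRatio
    if hit_ratio < hrth:                 # hit ratio dropped below HRTH
        if occupy_ratio >= cth_h_actv:
            return int(opened_blks * cbr_h_max)
    else:                                # hit ratio at or above HRTH
        if occupy_ratio >= cth_n_actv:
            return int(opened_blks * cbr_n_max)
    return 0

# hit ratio 0.3 < HRTH 0.5 and 40% occupancy > CTH_H_actv 0.2:
print(clean_blocks(0.3, 400, 1000, 4000, 0.5, 0.2, 0.25, 0.8, 0.1))  # -> 1000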
Physical Block IO Scheduling: this covers how the Control Memory handles missed block requests and schedules physical disk accesses. A Pending_Block_Table is set up to keep track of all missed block requests. Several rules are applied, as sketched in the code example after this list:
[1] Multiple pending READ requests for the same LUN block generate only one physical READ request. The pending block that initiates the physical READ request is called the active pending block; the others are static pending blocks. For a given LUN block, at any one time, there may be multiple static pending blocks but only one active pending block.
[2] While a physical READ operation is in progress, if WRITE block data arrives for the same LUN block, all pending READ requests (both the active pending block and any static pending blocks) for that LUN block are fulfilled immediately, and the in-progress physical READ operation is discarded.
[3] A WRITE miss due to insufficient space to cache the block data will try to free Cache space (using LRU) or flush existing dirty blocks to accommodate its caching requirement. Note that the flush operation mentioned here simply generates physical WRITE requests; it differs from the one used in the Cache Update operation, where complex algorithms are applied before physical requests are issued.
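Rules [1] and [2] can be sketched as a small table structure; this is an illustration of the rules as stated, not the actual SimSANs data structure:

class PendingBlockTable:
    """Sketch of the Pending_Block_Table implementing rules [1] and [2]."""

    def __init__(self):
        self.pending = {}                # lun_blk -> pending READ requests

    def read_miss(self, lun_blk, request):
        """Rule [1]: only the first pending READ (the active pending block)
        issues a physical READ; later ones become static pending blocks."""
        first = lun_blk not in self.pending
        self.pending.setdefault(lun_blk, []).append(request)
        if first:
            pass  # issue the physical READ on behalf of the active block
        return first

    def write_arrives(self, lun_blk):
        """Rule [2]: incoming WRITE data fulfills all pending READs for the
        LUN block at once; the in-progress physical READ is discarded."""
        return self.pending.pop(lun_blk, [])

    def read_completes(self, lun_blk):
        """Physical READ done: fulfill the active and static pending blocks."""
        return self.pending.pop(lun_blk, [])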
Timing Elements: the Control Memory CPU defines seven timing elements: PER_CCH_IDX and COM_PND_BLK for Cache-related operations, and SCSI_WR_SEG, SCSI_RD_SEG, RMV_LUN_IO, COM_BLK_WR, and COM_BLK_RD for LUN Controller modules. Please refer to Timing Elements for their definitions and configurations.
This page was last updated 2004.08.20