
 

Control Memory & Cache - CtrlMemCch

 

The CtrlMemCch module transforms SCSI IOs into LUN Block IOs (between CHIP and LUN), performs caching operations on LUN Blocks (between LUN and Cache), and schedules physical block IOs for disk accesses (between Cache and ACP). A central Control Memory CPU handles all of these operations. Multiple Meta Channels exchange control information with CHIPs and ACPs, and a Bus Selector exchanges data with Cache Switches. At least one LUN Controller module must be set up to work as the Logical Unit Device Server and Task Manager as defined in SAM-2 (the Application Client is the Host SCSI module or the CHIP SCSI_Rmt module). The following figure shows an example CtrlMemCch module with four LUNs set up, communicating with two CHIPs, two ACPs, and two Cache Switches.

 


CtrlMemCch Module


 

LUN Controller

A LUN_IOSB (LUN IO State Block) is created when a SCSI IO request arrives and is maintained to record and monitor the IO's operational state. Accordingly, an IO opened in the LUN Controller is called a "LunIO". Note that LunIO has a one-to-one relationship with SubIO (defined in the Host SCSI Layer) since a SCSI CDB is created on a per-SubIO basis. Of the five SCSI commands implemented in SimSANs, RPT_LUNS is handled by LUN0 while the other four (MODE_SENSE, INQUIRY, READ, and WRITE) are handled by LUNx where x>0.

 

Upon receiving a SCSI_READ or SCSI_WRITE IO request, the LUN Controller segments it into multiple LUN Block IO requests and forwards them to CtrlMemCPU, where the caching and/or physical IO operations are performed. Note that all the LUN Block IO requests generated by one SCSI IO must have contiguous block addresses. For example, if a SCSI_READ IO requests 24KB starting at block address 1000 on LUN1, whose block size is 8KB, then three LUN Block IO requests are generated: the first to block 1000, the second to block 1001, and the last to block 1002.
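The segmentation described above can be sketched in a few lines of Python. This is only an illustration, not SimSANs code: the function name and its parameters are hypothetical, and rounding a partial trailing block up to a full block request is an assumption.

```python
def segment_scsi_io(start_blk, io_size_kb, blk_size_kb):
    """Split one SCSI READ/WRITE into contiguous LUN block numbers.

    start_blk   -- initial LUN block address of the SCSI IO
    io_size_kb  -- requested IO size in KB
    blk_size_kb -- LUN block size in KB
    """
    # Ceiling division: a partial trailing block still needs its own
    # LUN Block IO request (assumption, not stated in the text).
    num_blks = -(-io_size_kb // blk_size_kb)
    return [start_blk + i for i in range(num_blks)]


# The example from the text: 24KB READ at block 1000 with 8KB blocks.
print(segment_scsi_io(1000, 24, 8))  # [1000, 1001, 1002]
```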

 

Each LUN (except LUN0) uses a RAID 1+0 configuration, where two or more drives are mirrored together and the mirrors are then striped together. Unlike version 1.0, which requires member disks to be set up within the same Array Group, version 2.0 allows member disks to be set up across the whole Disk Array. LUN parameters must be set up manually before running a simulation. System Configuration has the details regarding LUN configuration.

 

LUN block mapping translates LUN Block requests into the corresponding Member Disk Logical Block requests so that the disk accesses can be processed. Based on the RAID 1+0 configuration mentioned above, the mapping rules are illustrated as follows, given Block_Size = 4KB, Stripe_Depth = 16KB, and Member Disk_Num = 3:

 

LUN_Mapping


 

The above example shows how a LUN block number maps to a Disk block number, e.g., LUN block #21 is mapped to Disk2’s block #5 for the primary image and Disk5’s block #5 for the mirror image. The following generic formulas calculate diskId and diskBlk for a given lunBlk (all divisions are integer divisions):

 

strDpthInBlk = strDpth / blkSize
strInBlk = strDpthInBlk x diskNum
destStrId = lunBlk / strInBlk
destStrBlk = lunBlk % strInBlk
diskId = destStrBlk / strDpthInBlk
diskBlk = destStrId x strDpthInBlk + destStrBlk % strDpthInBlk

 

Here strDpth (stripe depth), blkSize (LUN block size), diskNum (number of primary disks), and lunBlk (LUN block number) are input parameters, while diskId (primary disk ID) and diskBlk (block number on that disk) are output parameters. Each LUN block WRITE request generates two disk block requests if disk mirroring is set up. A block READ request can be fulfilled by either disk. Unlike version 1.0, version 2.0 of SimSANs does not require LUN member disks to be the same disk model, although this is recommended. If the user sets up mixed disk models for a LUN, the LUN capacity is limited by the member disk with the least capacity. See LUN Configuration for more detail.
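As a quick check, the formulas can be written out in Python and run against the example from the mapping figure. The function name is hypothetical, and the mirror disk numbering (diskId + diskNum) is an assumption inferred from the single Disk2/Disk5 example rather than a documented rule.

```python
def map_lun_block(lun_blk, str_dpth_kb, blk_size_kb, disk_num):
    """Map a LUN block to (primary diskId, mirror diskId, diskBlk)."""
    str_dpth_in_blk = str_dpth_kb // blk_size_kb   # blocks per stripe unit
    str_in_blk = str_dpth_in_blk * disk_num        # blocks per full stripe
    dest_str_id = lun_blk // str_in_blk            # which stripe
    dest_str_blk = lun_blk % str_in_blk            # offset inside the stripe
    disk_id = dest_str_blk // str_dpth_in_blk
    disk_blk = dest_str_id * str_dpth_in_blk + dest_str_blk % str_dpth_in_blk
    # Assumption: the mirror of primary disk i is disk i + diskNum,
    # as suggested by the Disk2 / Disk5 pair in the example.
    mirror_disk_id = disk_id + disk_num
    return disk_id, mirror_disk_id, disk_blk


# Block_Size = 4KB, Stripe_Depth = 16KB, Member Disk_Num = 3, LUN block #21
print(map_lun_block(21, 16, 4, 3))  # (2, 5, 5)
```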

 

CtrlMemCPU

CtrlMemCPU performs the following tasks: Caching Operation, Dirty Block Flushing, Cache Cleaning, and Physical Block IO Scheduling. In addition, the LUN Controller (LUN_Ctrl) performs its own tasks by consuming CtrlMemCPU resources, which implies that LUN_Ctrl is a software module loaded into the array control memory. Since CtrlMemCPU is a hardware processor, the necessary timing elements must be set up to compute the response time for the above tasks. The Control Memory Timing Elements are introduced at the end of this section.

 

Caching Operation: SimSANs uses Write-Back + LRU (Least Recently Used). Such a mechanism works well for random-access and transaction applications. For highly sequential access a Read-Ahead mechanism should be used, but the current version (version 2.0) does not have this feature; a future version may add it. Unlike version 1.0, where the Cache Block Size is a single fixed value that proved to lack flexibility, version 2.0 allows various values to be set up to accommodate various LUN Block Sizes. Starting with version 2.0.15, Write-Through caching is also supported.
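To make the Write-Back + LRU idea concrete, here is a minimal Python sketch. It is not the SimSANs implementation: the class and attribute names are hypothetical, the cache holds block numbers only, and writing a dirty victim to disk at eviction time is a simplification (in SimSANs dirty blocks are mainly flushed by the Cache Update operation described below).

```python
from collections import OrderedDict


class WriteBackLruCache:
    """Toy write-back cache with LRU replacement (illustrative only)."""

    def __init__(self, capacity_blks):
        self.capacity = capacity_blks
        self.blocks = OrderedDict()  # block number -> dirty flag, LRU first
        self.flushed = []            # dirty blocks written to disk on eviction

    def _touch(self, blk, dirty):
        # Re-insert the block at the most-recently-used end,
        # keeping it dirty if it was already dirty.
        was_dirty = self.blocks.pop(blk, False)
        self.blocks[blk] = was_dirty or dirty
        if len(self.blocks) > self.capacity:
            victim, victim_dirty = self.blocks.popitem(last=False)
            if victim_dirty:
                self.flushed.append(victim)  # physical WRITE needed

    def read(self, blk):
        hit = blk in self.blocks
        self._touch(blk, dirty=False)
        return hit

    def write(self, blk):
        # Write-back: the block is only marked dirty in the cache;
        # the IO can be acknowledged before any disk access happens.
        hit = blk in self.blocks
        self._touch(blk, dirty=True)
        return hit
```

With a capacity of two blocks, writing block 1, reading block 2, re-reading block 1 and then writing block 3 evicts the clean block 2 without any disk write; a further read of block 4 evicts the dirty block 1, which is the point where a physical WRITE would be generated.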

 

Dirty Block Flushing (or Cache Update): Using the write-back mechanism significantly improves performance for WRITE operations, since the IO acknowledgment can be returned as soon as the block is written into the Cache, without waiting for the time-consuming disk access to complete. However, if the dirty blocks are not flushed periodically, the cache space will eventually run out and overall system performance will drop quickly. On the other hand, flushing too many dirty blocks at one time may congest the IO path, which in turn degrades performance.

 

To flush dirty blocks efficiently without significantly impacting system performance, SimSANs provides two Cache Update mechanisms: Fixed_Update, which performs the flushing operation at a preset time interval and writes back a fixed percentage of dirty blocks, and Adaptive_Update, which estimates the future IO load from the previously calculated IO load and then adaptively flushes a dynamic number of dirty blocks at a preset time interval. The Adaptive_Update in version 2.0 has been redesigned and is a more complex and more powerful mechanism than the one implemented in version 1.0 (in fact, I do not think the Adaptive_Update in version 1.0 was very effective). In addition, both update mechanisms try to flush the least-used blocks across all LUNs instead of working on a per-LUN basis as in version 1.0. The following flow chart and its parameter list show the basic idea of how the Cache Update works. A separate flow chart and parameter list detail the flush operations for both Fixed_Update and Adaptive_Update. Note that the Adaptive_Update is still a rough idea that I have not explored in depth. It would be highly appreciated if interested users could do more work on this topic.

Parameter List for Cache Update Work Flow

| Parameter Name | Description | Attribute |
| --- | --- | --- |
| dirtyRatio | the ratio of dirty Cache space over total Cache space | working |
| dirtyCchSize | dirty Cache in KB | working |
| totalCchSize | total installed Cache Memory in KB | input |
| cchUpdActv | a flag denoting if update operation is active | working |
| FTH_actv | dirtyRatio threshold to activate flush operation | input |
| FTH_deactv | dirtyRatio threshold to de-activate flush operation | input |
| INT_flush | flush operation time interval in ms | input |

 


Cache Update Work Flow
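Since the flow chart itself is an image, the following Python sketch shows one plausible reading of the parameter list: a two-threshold (hysteresis) switch evaluated once every INT_flush. The function name is hypothetical, and the comparison directions and their inclusiveness are assumptions, not taken from the chart.

```python
def update_flush_state(dirty_cch_size, total_cch_size, cch_upd_actv,
                       fth_actv, fth_deactv):
    """Return the new value of cchUpdActv for this time interval.

    Assumed behaviour: flushing switches on once dirtyRatio reaches
    FTH_actv and stays on until dirtyRatio falls to FTH_deactv.
    """
    dirty_ratio = dirty_cch_size / total_cch_size
    if not cch_upd_actv and dirty_ratio >= fth_actv:
        return True    # too much dirty data: start flushing
    if cch_upd_actv and dirty_ratio <= fth_deactv:
        return False   # dirty data back under control: stop flushing
    return cch_upd_actv  # between the thresholds: keep the current state
```

Using two thresholds instead of one avoids switching the flush operation on and off every interval when dirtyRatio hovers around a single limit.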


 

Parameter List for Flush Operations

| Parameter Name | Description | Attribute |
| --- | --- | --- |
| currDirtyBlk | current dirty Cache blocks | working |
| FBR_max | maximum value of Flush Block Ratio - the ratio of dirty blocks to be flushed over total dirty blocks | input (for Fixed Update only) |
| lastPhyRtn | completed physical block IOs (RD + WR) for the last time interval | working |
| lastAveRespTime | average response time per completed physical block IO for the last time interval | working |
| lastTotalRespTime | sum of response time of each completed physical block IO for the last time interval | working |
| lastPhyReq | physical block IO requests (RD + WR) issued for the last time interval | working |
| lastPhyWrReq | physical block WRITE IO requests issued for the last time interval | working |
| expFlushRatio | expected ratio of dirty blocks to be flushed | working |
| thisTotalTime | total response time to be allocated for physical IO completion | working |
| TOL_fact | tolerance factor - usually a multiple of INT_flush | input (for Adapt Update only) |
| thisSurpTime | the surplus response time to be allocated for this round flushing | working |
| currDirtyActvBlk | active dirty blocks - currently in the progress of disk accesses and still haven't completed | working |
| thisExpPhyReq | physical block IO requests expected to be issued | working |
| FBR_ini | initial value of Flush Block Ratio | input (for Adapt Update only) |
| thisExpFlushBlk | physical WRITE blocks expected to be flushed | working |
| thisMaxFlushBlk | maximum number of physical WRITE blocks to be flushed | output (for both Updates) |
| AGG_fact | aggressiveness factor to generate the value of thisMaxFlushBlk | input (for Adapt Update only) |
| thisMinFbm | physical WRITE blocks to be flushed, limited by FBR_min | working |
| FBR_min | minimum value of Flush Block Ratio | input (for Adapt Update only) |

 

Flush Operations
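The simpler of the two mechanisms, Fixed_Update, can be sketched from the parameter descriptions alone. The Python below is an assumption-laden illustration, not the flow chart itself: the function names are hypothetical, thisMaxFlushBlk is taken to be FBR_max applied to currDirtyBlk and rounded down, and lastAveRespTime is derived from its table definition. The Adaptive_Update steps are not reproduced because they cannot be recovered from the text.

```python
def fixed_update_max_flush(curr_dirty_blk, fbr_max):
    """thisMaxFlushBlk for Fixed_Update: a fixed share of the dirty blocks.

    Rounding down to a whole block is an assumption.
    """
    return int(curr_dirty_blk * fbr_max)


def last_ave_resp_time(last_total_resp_time, last_phy_rtn):
    """Average response time per completed physical block IO.

    Follows the table definitions: lastTotalRespTime / lastPhyRtn.
    Returning 0.0 when nothing completed is an assumption.
    """
    if last_phy_rtn == 0:
        return 0.0
    return last_total_resp_time / last_phy_rtn


print(fixed_update_max_flush(1000, 0.25))  # 250
print(last_ave_resp_time(120.0, 40))       # 3.0
```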


 

Cache Cleaning: SimSANs uses direct mapping to create an index for each data block in the Cache. Direct mapping has a problem: the indexing operation becomes quite slow when a high percentage of Cache misses is encountered, e.g., with highly sequential IOs (note that SimSANs does not use Read-Ahead to boost sequential IOs). The Cache Cleaning operation periodically cleans out the least-used Cache blocks to free space, so that the indexing operation (especially on a Cache miss) has less overhead when searching the Cache for each IO operation. The Cache Cleaning operation used in SimSANs is very simple: once the Cache-hit ratio drops to a predefined threshold and the Cache block occupy ratio reaches a predefined threshold, a certain amount of Cache blocks is cleaned out. Five parameters need to be set up: HRTH, CTH_H_actv, CBR_H_max, CTH_N_actv, and CBR_N_max. The description of these parameters and the cleaning operation algorithm are shown below. Users may find that different combinations of the five parameters yield rather different results, and sometimes the system may suffer a significant performance penalty. Interested users could do more work on this topic, for example on the impact of the IO traffic pattern on the effectiveness of the cleaning operation, and on the combination of Cache Cleaning and Dirty Block Flushing (see above). I have observed that a larger HRTH (say, between 0.4 and 0.6) and a smaller CTH_H_actv (say, < 0.2, but depending on the total Cache Size) may help boost performance, but this still needs further research.

 

Parameter List for Clean Operation

| Parameter Name | Description | Attribute |
| --- | --- | --- |
| hitRatio | the Cache-Hit ratio for the last time interval | working |
| blkReqHit | total Cache blocks hit for the last time interval | working |
| totalBlkReq | total Cache block requests for the last time interval | working |
| occupyRatio | the Cache space occupy ratio | working |
| openCchSize | opened Cache space in KB | working |
| totalCchSize | total installed Cache Memory in KB | input |
| HRTH | Cache-Hit ratio threshold value | input |
| CTH_H_actv | when Cache-Hit is below HRTH, the threshold on the ratio of occupied Cache space over total Cache space; when it is exceeded, the cleaning operation is activated | input |
| CBR_H_max | when Cache-Hit is below HRTH, the maximum ratio of Cache blocks to be cleaned out over total Cache blocks opened | input |
| CTH_N_actv | when Cache-Hit is above HRTH, the threshold on the ratio of occupied Cache space over total Cache space; when it is exceeded, the cleaning operation is activated | input |
| CBR_N_max | when Cache-Hit is above HRTH, the maximum ratio of Cache blocks to be cleaned out over total Cache blocks opened | input |
| INT_clean | clean operation time interval in ms | input |

 

Clean Operations
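The decision made at each INT_clean interval can be sketched from the parameter descriptions as follows. This Python is illustrative only: the function name and the open_blks argument (number of opened Cache blocks) are hypothetical, hitRatio and occupyRatio are assumed to be blkReqHit / totalBlkReq and openCchSize / totalCchSize, and treating an interval with no requests as a perfect hit ratio is also an assumption.

```python
def clean_quota(blk_req_hit, total_blk_req, open_cch_size, total_cch_size,
                open_blks, hrth, cth_h_actv, cbr_h_max, cth_n_actv, cbr_n_max):
    """Return the maximum number of Cache blocks to clean out this interval."""
    # Assumption: no requests in the interval counts as "no misses".
    hit_ratio = blk_req_hit / total_blk_req if total_blk_req else 1.0
    occupy_ratio = open_cch_size / total_cch_size

    if hit_ratio < hrth:
        # Poor hit ratio: use the "H" pair of parameters.
        threshold, max_ratio = cth_h_actv, cbr_h_max
    else:
        # Normal hit ratio: use the "N" pair of parameters.
        threshold, max_ratio = cth_n_actv, cbr_n_max

    if occupy_ratio <= threshold:
        return 0  # Cache not full enough to justify cleaning
    return int(open_blks * max_ratio)
```

For instance, with HRTH = 0.5, CTH_H_actv = 0.25 and CBR_H_max = 0.5, an interval with a hit ratio of 0.1 and a half-occupied Cache holding 200 opened blocks yields a quota of 100 blocks.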


 

Physical Block IO Scheduling: this concerns how the Control Memory handles missed block requests and schedules physical disk accesses. A Pending_Block_Table is set up to keep track of all the missed block requests. Several rules are applied:
[1] Multiple pending READ requests to the same LUN block generate only one physical READ request. The pending block that initiates the physical READ request is called the active pending block; the others are static pending blocks. For a given LUN block, at any one time, there may be multiple static pending blocks but only one active pending block.
[2] While a physical READ operation is in progress, if WRITE block data comes in for the same LUN block, all the pending READ requests for that LUN block (whether from the active pending block or static pending blocks) are fulfilled immediately. The physical READ operation in progress is discarded.

[3] A WRITE miss caused by insufficient space to cache the block data will try to free Cache space (using LRU) or flush existing dirty blocks to make room. Note that the flush operation mentioned here simply generates physical WRITE requests, which differs from the one used in the Cache Update operation, where complex algorithms are applied before physical requests are issued.
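Rules [1] and [2] can be illustrated with a small Python sketch of a pending block table. This is not the SimSANs data structure: the class, its methods, and the use of request IDs are hypothetical, and rule [3] is left out because it depends on the cache replacement logic.

```python
class PendingBlockTable:
    """Toy table of missed READ requests, keyed by LUN block (illustrative)."""

    def __init__(self):
        self.pending = {}         # lun_blk -> waiting READ request ids
        self.physical_reads = []  # LUN blocks for which a physical READ was issued
        self.discarded = set()    # LUN blocks whose in-flight READ is to be ignored

    def read_miss(self, lun_blk, req_id):
        waiters = self.pending.setdefault(lun_blk, [])
        if not waiters:
            # Rule [1]: the first waiter is the active pending block and
            # triggers the single physical READ for this LUN block.
            self.physical_reads.append(lun_blk)
        waiters.append(req_id)    # later waiters are static pending blocks

    def write_arrives(self, lun_blk):
        # Rule [2]: incoming WRITE data fulfils every pending READ at once,
        # and the physical READ still in progress is discarded.
        waiters = self.pending.pop(lun_blk, [])
        if waiters:
            self.discarded.add(lun_blk)
        return waiters

    def physical_read_done(self, lun_blk):
        if lun_blk in self.discarded:
            self.discarded.remove(lun_blk)
            return []             # result ignored, nobody is waiting any more
        return self.pending.pop(lun_blk, [])
```

Each method returns the list of READ requests that can be completed at that moment, so two READ misses on the same block followed by a WRITE to it return both request IDs from write_arrives and an empty list from the later physical_read_done.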

 

Timing Elements: The Control Memory CPU defines seven timing elements: PER_CCH_IDX and COM_PND_BLK for Cache-related operations, and SCSI_WR_SEG, SCSI_RD_SEG, RMV_LUN_IO, COM_BLK_WR, and COM_BLK_RD for LUN Controller modules. Please refer to Timing Elements for their definitions and configurations.

 

 


This page was last updated 2004.08.20