Control Memory & Cache - CtrlMemCch
The CtrlMemCch module transforms SCSI IOs into LUN Block IOs (between CHIP and LUN), performs caching operations on LUN Blocks (between LUN and Cache), and schedules physical block IOs for disk accesses (between Cache and ACP). A central Control Memory CPU is installed to handle all of these operations. Multiple Meta Channels are used to exchange control information with CHIPs and ACPs, and a Bus Selector is used to exchange data with Cache Switches. At least one LUN Controller module must be set up to work as the Logical Unit Device Server and Task Manager as defined in SAM-2 (the Application Client is the Host SCSI module or the CHIP SCSI_Rmt module). The following figure shows an example CtrlMemCch module with four LUNs configured, communicating with two CHIPs, two ACPs, and two Cache Switches.
CtrlMemCch Module
A LUN_IOSB (LUN IO State Block) is created when a SCSI IO request arrives; it is maintained to record and monitor the IO's operational state. Accordingly, an IO opened in the LUN Controller is called a "LunIO". Note that a LunIO has a one-to-one relationship with a SubIO (defined in the Host SCSI Layer) since a SCSI CDB is created on a per-SubIO basis. Of the five SCSI commands implemented in SimSANs, RPT_LUNS is handled by LUN0 while the other four (MODE_SENSE, INQUIRY, READ, and WRITE) are handled by LUNx where x > 0.
Upon receiving a SCSI_READ or SCSI_WRITE IO request, the LUN Controller segments it into multiple LUN Block IO requests and forwards them to the CtrlMemCPU, where the caching and/or physical IO operations are performed. Note that all the LUN Block IO requests generated by a SCSI IO must be contiguous in block address. For example, if a SCSI_READ IO requests 24KB starting at block address 1000 on LUN1, whose block size is 8KB, then three LUN Block IO requests are generated: the 1st to block 1000, the 2nd to block 1001, and the last to block 1002.
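As a concrete sketch of this segmentation step, the following Python function (illustrative only; the names are not from the SimSANs source) reproduces the example above:

def segment_scsi_io(init_blk, size_kb, blk_size_kb):
    """Split a SCSI READ/WRITE into address-continuous LUN Block IO requests
    (illustrative helper; assumes the request size is block-aligned)."""
    num_blks = size_kb // blk_size_kb
    return [init_blk + i for i in range(num_blks)]

# The example above: SCSI_READ of 24KB starting at block 1000, 8KB LUN blocks
print(segment_scsi_io(1000, 24, 8))   # -> [1000, 1001, 1002]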
Each LUN (except LUN0) uses a RAID 1+0 configuration, where two or more drives are mirrored together and the mirrors are then striped together. Unlike version 1.0, which requires member disks to be set up within the same Array Group, version 2.0 allows member disks to be set up across the whole Disk Array. LUN parameters must be set up manually before running a simulation; see System Configuration for the details of LUN configuration.

Note that LUN block mapping translates LUN Block requests into the corresponding Member Disk Logical Block requests so that disk accesses can be processed. Based on the RAID 1+0 scheme described above, the mapping rules are illustrated as follows, given Block_Size = 4KB, Stripe_Depth = 16KB, and Member_Disk_Num = 3:
LUN_Mapping
From the above example it is easy to see how a LUN block number maps to a disk block number, e.g., LUN block #21 is mapped to Disk2's block #5 for the primary image and Disk5's block #5 for the mirror image. The following generic formulas calculate diskId and diskBlk for a given lunBlk:
strDpthInBlk = strDpth / blkSize
strInBlk = strDpthInBlk x diskNum
destStrId = lunBlk / strInBlk
destStrBlk = lunBlk % strInBlk
diskId = destStrBlk / strDpthInBlk
diskBlk = destStrId x strDpthInBlk + destStrBlk % strDpthInBlk
Here strDpth, blkSize, diskNum, and lunBlk are input parameters while diskId and diskBlk are output parameters; their meanings should be evident from the example. Note that diskNum denotes the number of primary disks and diskId is the primary disk's id. Clearly, each LUN block WRITE request generates two disk block requests when disk mirroring is set up, while a block READ request can be fulfilled by either disk. Unlike version 1.0, version 2.0 of SimSANs does not require LUN member disks to be the same disk model, although that is recommended. If a user sets up mixed disk models for a LUN, the LUN capacity is limited by the smallest-capacity member disk. See LUN Configuration for more detail.
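These formulas translate directly into code. The following sketch (illustrative naming, not the SimSANs source) also reproduces the worked example from the figure:

def lun_to_disk(lun_blk, str_dpth_kb, blk_size_kb, disk_num):
    """Map a LUN block number to (primary diskId, diskBlk) under RAID 1+0.

    The mirror image lives at the same diskBlk on diskId's mirror disk
    (Disk(diskId + diskNum) in the figure's numbering).
    """
    str_dpth_in_blk = str_dpth_kb // blk_size_kb   # strDpthInBlk
    str_in_blk = str_dpth_in_blk * disk_num        # strInBlk
    dest_str_id = lun_blk // str_in_blk            # destStrId
    dest_str_blk = lun_blk % str_in_blk            # destStrBlk
    disk_id = dest_str_blk // str_dpth_in_blk      # diskId
    disk_blk = dest_str_id * str_dpth_in_blk + dest_str_blk % str_dpth_in_blk
    return disk_id, disk_blk

# The worked example: Block_Size=4KB, Stripe_Depth=16KB, Member_Disk_Num=3
print(lun_to_disk(21, 16, 4, 3))   # -> (2, 5): Disk2 block #5, mirror on Disk5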
The CtrlMemCPU performs the following tasks: Caching Operation, Dirty Block Flushing, Cache Cleaning, and Physical Block IO Scheduling. In addition, the LUN Controller (LUN_Ctrl) performs its own tasks by consuming CtrlMemCPU resources, which implies that LUN_Ctrl is a software module loaded into the array's control memory. Since the CtrlMemCPU is a hardware processor, the necessary timing elements must be set up to compute the response time of the above tasks. The Control Memory Timing Elements are introduced at the end of this section.
Caching Operation: SimSANs uses Write-Back + LRU (Least Recently Used). This mechanism works well for random access and transaction applications. Highly sequential access would benefit from a Read-Ahead mechanism, but the current version (2.0) does not have this feature; a future version may add it. Unlike version 1.0, where the Cache Block Size was a single fixed value that proved inflexible, version 2.0 allows various values to be set up to accommodate various LUN Block Sizes. As of version 2.0.15, Write-Through caching is also available.
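To make the Write-Back + LRU behavior concrete, here is a minimal sketch of such a cache; it is an assumption-laden illustration only (block indexing, timing, and flushing in SimSANs are more involved):

from collections import OrderedDict

class WriteBackLRUCache:
    """Minimal write-back + LRU block cache (illustrative sketch only)."""

    def __init__(self, capacity_blks):
        self.capacity = capacity_blks
        self.blocks = OrderedDict()        # lun_blk -> dirty flag, in LRU order

    def write(self, lun_blk):
        """Write-back: acknowledge once the block is in Cache; mark it dirty."""
        self.blocks[lun_blk] = True
        self.blocks.move_to_end(lun_blk)   # most recently used
        self._evict()

    def read(self, lun_blk):
        """Return True on a Cache hit; a miss triggers a physical READ."""
        if lun_blk in self.blocks:
            self.blocks.move_to_end(lun_blk)
            return True
        self.blocks[lun_blk] = False       # cache the block fetched from disk
        self._evict()
        return False

    def _evict(self):
        while len(self.blocks) > self.capacity:
            blk, dirty = next(iter(self.blocks.items()))   # LRU victim
            if dirty:
                pass                       # a dirty victim is flushed to disk first
            del self.blocks[blk]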
Dirty Block Flushing (or Cache Update): the write-back mechanism significantly improves WRITE performance since the IO acknowledgment can be returned as soon as the block is written into the Cache, without waiting for a time-consuming disk access to complete. However, if dirty blocks are not flushed periodically, the Cache space will run out sooner or later and overall system performance will drop quickly. On the other hand, flushing too many dirty blocks at one time may congest the IO path, which in turn degrades performance.
To flush dirty blocks efficiently without significantly impacting system performance, SimSANs provides two Cache Update mechanisms: Fixed_Update, which flushes a fixed percentage of dirty blocks at a pre-set time interval, and Adaptive_Update, which estimates future IO load from the previously measured IO load and then adaptively flushes a dynamic number of dirty blocks at a pre-set time interval. The Adaptive_Update in version 2.0 has been redesigned and is a more complex and more powerful mechanism than the one implemented in version 1.0 (in fact, I don't think the Adaptive_Update in version 1.0 was very effective). In addition, both update mechanisms try to flush the least-used blocks across all LUNs instead of working on a per-LUN basis as in version 1.0. The following flow chart and its parameter list show the basic idea of how the Cache Update works. A separate flow chart and parameter list detail the flush operations for both Fixed_Update and Adaptive_Update. Note that the Adaptive_Update is still a rough idea that I have not explored in depth; it would be highly appreciated if interested users did more work on this topic.
Parameter List for Cache Update Work Flow
Parameter Name | Description | Attribute
dirtyRatio | the ratio of dirty Cache space over total Cache space | working
dirtyCchSize | dirty Cache in KB | working
totalCchSize | total installed Cache Memory in KB | input
cchUpdActv | a flag denoting whether the update operation is active | working
FTH_actv | dirtyRatio threshold to activate the flush operation | input
FTH_deactv | dirtyRatio threshold to de-activate the flush operation | input
INT_flush | flush operation time interval in ms | input
Cache Update Work Flow
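The flow chart itself is not reproduced here, but based on the parameter list a plausible reading of the activation logic is the following hysteresis sketch (the exact flow chart may differ):

def cache_update_tick(dirty_cch_size, total_cch_size, cch_upd_actv,
                      fth_actv, fth_deactv):
    """One INT_flush tick of the Cache Update work flow (hedged sketch).

    Assumes simple hysteresis: flushing activates when dirtyRatio rises
    to FTH_actv and de-activates when it falls to FTH_deactv.
    """
    dirty_ratio = dirty_cch_size / total_cch_size     # dirtyRatio
    if not cch_upd_actv and dirty_ratio >= fth_actv:
        cch_upd_actv = True
    elif cch_upd_actv and dirty_ratio <= fth_deactv:
        cch_upd_actv = False
    if cch_upd_actv:
        pass   # run the Fixed_Update or Adaptive_Update flush (see below)
    return cch_upd_actv

# e.g. 750KB dirty of 1000KB Cache, activate at 0.7, de-activate at 0.3:
print(cache_update_tick(750, 1000, False, 0.7, 0.3))  # -> True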
Parameter List for Flush Operations
Parameter Name | Description | Attribute
currDirtyBlk | current dirty Cache blocks | working
FBR_max | maximum value of Flush Block Ratio - the ratio of dirty blocks to be flushed over total dirty blocks | input (Fixed_Update only)
lastPhyRtn | completed physical block IOs (RD + WR) for the last time interval | working
lastAveRespTime | average response time per completed physical block IO for the last time interval | working
lastTotalRespTime | sum of the response times of all completed physical block IOs for the last time interval | working
lastPhyReq | physical block IO requests (RD + WR) issued during the last time interval | working
lastPhyWrReq | physical block WRITE IO requests issued during the last time interval | working
expFlushRatio | expected ratio of dirty blocks to be flushed | working
thisTotalTime | total response time to be allocated for physical IO completion | working
TOL_fact | tolerance factor - usually a multiple of INT_flush | input (Adaptive_Update only)
thisSurpTime | surplus response time to be allocated for this round of flushing | working
currDirtyActvBlk | active dirty blocks - disk accesses currently in progress and not yet completed | working
thisExpPhyReq | physical block IO requests expected to be issued | working
FBR_ini | initial value of Flush Block Ratio | input (Adaptive_Update only)
thisExpFlushBlk | physical WRITE blocks expected to be flushed | working
thisMaxFlushBlk | maximum number of physical WRITE blocks to be flushed | output (both updates)
AGG_fact | aggressiveness factor used to generate the value of thisMaxFlushBlk | input (Adaptive_Update only)
thisMinFbm | physical WRITE blocks to be flushed, limited by FBR_min | working
FBR_min | minimum value of Flush Block Ratio | input (Adaptive_Update only)
Flush Operations
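The Fixed_Update side follows directly from its description (a fixed percentage of dirty blocks per interval); the Adaptive_Update below is only a loose approximation reconstructed from the parameter list, not the exact flow-chart algorithm:

def fixed_update(curr_dirty_blk, fbr_max):
    """Fixed_Update: thisMaxFlushBlk is a fixed fraction of the dirty blocks."""
    return int(curr_dirty_blk * fbr_max)

def adaptive_update(curr_dirty_blk, curr_dirty_actv_blk, last_phy_rtn,
                    last_total_resp_time, tol_fact, agg_fact, fbr_ini, fbr_min):
    """Adaptive_Update: a loose approximation only, not the exact algorithm.

    TOL_fact is treated here as the response-time budget for this round
    (the parameter list says it is usually a multiple of INT_flush).
    """
    if last_phy_rtn == 0:                                  # no history yet
        this_exp_flush_blk = int(curr_dirty_blk * fbr_ini)
    else:
        last_ave_resp_time = last_total_resp_time / last_phy_rtn
        this_exp_phy_req = tol_fact / last_ave_resp_time   # IOs the budget allows
        # subtract disk accesses already in flight:
        this_exp_flush_blk = max(0, int(this_exp_phy_req) - curr_dirty_actv_blk)
    this_min_fbm = int(curr_dirty_blk * fbr_min)           # FBR_min floor
    this_max_flush_blk = max(int(this_exp_flush_blk * agg_fact), this_min_fbm)
    return min(this_max_flush_blk, curr_dirty_blk)

# Fixed_Update flushing 30% of 10000 dirty blocks:
print(fixed_update(10000, 0.3))   # -> 3000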
Cache Cleaning: SimSANs uses direct mapping to create an index for each data block in the Cache. Direct mapping has a drawback: the indexing operation becomes quite slow if a high percentage of Cache misses is encountered, e.g., under highly sequential IOs (note that SimSANs does not use Read-Ahead to boost sequential IOs). The Cache Cleaning operation periodically cleans out least-used Cache blocks to free more space so that the indexing operation (especially on a Cache miss) incurs less overhead when searching the Cache for each IO operation. The cleaning algorithm used in SimSANs is very simple: once the Cache-hit ratio drops below a pre-defined threshold and the Cache block occupancy ratio reaches a pre-defined threshold, a certain number of Cache blocks are cleaned out. Five parameters need to be set up: HRTH, CTH_H_actv, CBR_H_max, CTH_N_actv, and CBR_N_max. The descriptions of these parameters and the cleaning algorithm are shown below. Users may find that different combinations of the five parameters yield rather different results; sometimes the system may suffer a significant performance penalty. Interested users could work more on this topic, say, the impact of the IO traffic pattern on the effectiveness of the cleaning operation, and the combination of Cache Cleaning and Dirty Block Flushing (see above). I observe that a larger HRTH (say, between 0.4 and 0.6) and a smaller CTH_H_actv (say, < 0.2, but depending on the total Cache size) may help boost performance, but this still needs further research.
Parameter List for Clean Operation
Parameter Name | Description | Attribute
hitRatio | the Cache-Hit ratio for the last time interval | working
blkReqHit | total Cache blocks hit during the last time interval | working
totalBlkReq | total Cache block requests during the last time interval | working
occupyRatio | the Cache space occupancy ratio | working
openCchSize | opened Cache space in KB | working
totalCchSize | total installed Cache Memory in KB | input
HRTH | Cache-Hit ratio threshold | input
CTH_H_actv | when the Cache-Hit ratio is below HRTH: the ratio of occupied Cache space over total Cache space above which the cleaning operation is activated | input
CBR_H_max | when the Cache-Hit ratio is below HRTH: the maximum ratio of Cache blocks to be cleaned out over total opened Cache blocks | input
CTH_N_actv | when the Cache-Hit ratio is above HRTH: the ratio of occupied Cache space over total Cache space above which the cleaning operation is activated | input
CBR_N_max | when the Cache-Hit ratio is above HRTH: the maximum ratio of Cache blocks to be cleaned out over total opened Cache blocks | input
INT_clean | clean operation time interval in ms | input
Clean Operations
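Putting the five parameters together, the cleaning decision described above can be sketched as follows (parameter handling is inferred from the list; the actual flow chart may differ in detail):

def clean_blocks(hit_ratio, open_cch_size, total_cch_size, opened_blks,
                 hrth, cth_h_actv, cbr_h_max, cth_n_actv, cbr_n_max):
    """One INT_clean tick of the cleaning decision (inferred sketch).

    Returns the number of least-used Cache blocks to clean out this round.
    """
    occupy_ratio = open_cch_size / total_cch_size          # occupyRatio
    if hit_ratio < hrth:                 # hit ratio dropped below HRTH
        if occupy_ratio >= cth_h_actv:
            return int(opened_blks * cbr_h_max)
    else:                                # hit ratio at or above HRTH
        if occupy_ratio >= cth_n_actv:
            return int(opened_blks * cbr_n_max)
    return 0

# hit ratio 0.3 < HRTH 0.5 and 40% occupancy > CTH_H_actv 0.2:
print(clean_blocks(0.3, 400, 1000, 4000, 0.5, 0.2, 0.25, 0.8, 0.1))  # -> 1000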
Physical Block IO Scheduling: this covers how the Control Memory handles missed block requests and schedules physical disk accesses. A Pending_Block_Table is set up to keep track of all missed block requests. Several rules are applied, as sketched in the code example after this list:
[1] Multiple pending READ requests for the same LUN block generate only one physical READ request. The pending block that initiates the physical READ request is called the active pending block; the others are static pending blocks. For a given LUN block, at any one time, there may be multiple static pending blocks but only one active pending block.
[2] While a physical READ operation is in progress, if WRITE block data arrives for the same LUN block, all pending READ requests (both the active pending block and any static pending blocks) for that LUN block are fulfilled immediately, and the in-progress physical READ operation is discarded.
[3] A WRITE miss due to insufficient space to cache the block data will try to free Cache space (using LRU) or flush existing dirty blocks to accommodate its caching requirement. Note that the flush operation mentioned here simply generates physical WRITE requests; it differs from the one used in the Cache Update operation, where complex algorithms are applied before physical requests are issued.
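Rules [1] and [2] can be sketched as a small table structure; this is an illustration of the rules as stated, not the actual SimSANs data structure:

class PendingBlockTable:
    """Sketch of the Pending_Block_Table implementing rules [1] and [2]."""

    def __init__(self):
        self.pending = {}                # lun_blk -> pending READ requests

    def read_miss(self, lun_blk, request):
        """Rule [1]: only the first pending READ (the active pending block)
        issues a physical READ; later ones become static pending blocks."""
        first = lun_blk not in self.pending
        self.pending.setdefault(lun_blk, []).append(request)
        if first:
            pass  # issue the physical READ on behalf of the active block
        return first

    def write_arrives(self, lun_blk):
        """Rule [2]: incoming WRITE data fulfills all pending READs for the
        LUN block at once; the in-progress physical READ is discarded."""
        return self.pending.pop(lun_blk, [])

    def read_completes(self, lun_blk):
        """Physical READ done: fulfill the active and static pending blocks."""
        return self.pending.pop(lun_blk, [])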
Timing Elements: the Control Memory CPU defines seven timing elements: PER_CCH_IDX and COM_PND_BLK for Cache-related operations, and SCSI_WR_SEG, SCSI_RD_SEG, RMV_LUN_IO, COM_BLK_WR, and COM_BLK_RD for LUN Controller modules. Please refer to Timing Elements for their definitions and configurations.
This page was last updated 2004.08.20