Feb 12 12:27 1993  SCSI RAID Paper  Page 1

Doc: X3T9.2/93-030

        SCSI RAID Problems Presented to the Software Developer

                            By Bill Dallas
                                  of
                     DIGITAL EQUIPMENT CORPORATION
                     (dallas@wasted.enet.dec.com)

This paper is a discussion of SCSI RAID controllers and the problems
they present to SCSI peripheral driver developers. I mainly have
questions about what to expect from a RAID controller, since there is
no ANSI standard, or even an industry standard, defined. I do have a
defined model that solves the flexible configuration problems on a
generic basis, and it is available to this study group if wanted.
There are RAID controller models that allow different devices on the
backend (worms, opticals, tapes (for striped tapes, backup speed, and
independent backup)). What are the allowable device types on the
backend of a RAID box?

SCSI RAID controllers are a problem for software developers because of
the lack of a defined standard, and as more RAID controllers come on
the market this problem only grows.

There are a number of RAID boxes for the SCSI bus on the market today.
These boxes all present a different behavioral model to the SCSI
peripheral driver. Some boxes present a fixed configuration RAID 5
(4+1) and act exactly like the current SCSI disk model. Other boxes
have a completely flexible configuration (RAID 1, RAID 3, or RAID 5)
with combinations of various RAID levels (RAID 1 over RAID 5, etc.).
These different boxes can present different error handling models over
the current SCSI disk error handling model. Some boxes have defined
vendor unique commands for specific error instances, and recovery
procedures that vary from RAID box to RAID box.

Configuration control of different boxes can range from very simple to
extremely complex. Boxes range from the simple SCSI plug-in (fixed
configuration) to mix and match, where you choose what type of RAID
level over what disks, and combinations of RAID over RAID over RAID,
etc.
In some boxes the only way to configure a box is through a terminal
line with no driver intervention. Other boxes require SCSI commands to
configure a box.

To further complicate the configuration control problems, now
introduce the concept of backup controllers. These controllers can be
in active (load sharing) or hot standby mode. In either case,
knowledge must be available to the software to effect failover in the
event of a controller failure for the box. In each configuration
model, the problem of configuration is solved only for that box and
not for the market in general.

The task of writing a generic peripheral driver for RAID boxes is an
impossible one. Each driver must be specific to the type of box it is
controlling. From the software engineering viewpoint, this is a
maintenance nightmare and a waste of a number of man-years to develop
and test each driver.

The model I am going to use for the purpose of discussion is the
following:

It follows the disk model for error event notification. Request sense
information is similar to that of disks, with additional sense codes
and qualifiers carrying specific information about the RAID logical
volume event. Additional sense information, beyond byte 18, has a
specific format that narrows down the event to a specific member of
the RAID logical volume.

The configuration of the box is completely up to human/software
control. The box is capable of all RAID levels, and can handle
simultaneous RAID levels and multiple RAID levels. An example of this
is:

    lun 1   RAID 5 logical volume
    lun 2   RAID 3 logical volume
    lun 3   RAID 1 logical volume comprised of 2 RAID 5 logical volumes

Performance of the box and access patterns must be taken into
consideration when configuring the RAID logical volumes.

The box also has a primary controller with an active backup
controller. The active backup controller can be used for load
balancing.
This means that normal-operation, disk-type SCSI commands can be
handled by the active backup controller, but the active backup
controller does not allow configuration control. If a primary
controller failure or an active backup failure is detected, the RAID
logical volumes migrate automatically to the alternate controller
without fault.

Below is a model of a physical RAID box with a backup controller and
no configuration parameters (a RAW box). The RAID box is comprised of
one primary controller and a backup controller on the frontend,
connected to the same bus and host. The frontend connections are not
limited to the same host and bus. The frontend could have the primary
controller on SCSI bus A and the backup controller on SCSI bus B of
the same host. The frontend could also be dual/tri initiator or have
separate initiators for the primary and backup controllers. The number
of frontend configuration possibilities does have bounds, but the SCSI
RAID specification should address these frontend configuration types.

Out of the backend are five separate SCSI busses, numbered 0 - 4. The
number of backend busses is not limited to that number; in fact the
number of backend busses could exceed 100. While this is impractical
in today's market, what is impractical today has been shown in this
industry to be very practical tomorrow. The coming of GPP, FCB, SSA
SCSI transports and the higher bus bandwidths will soon make cluster
RAID controllers practical.

Each bus has 3 disks except for bus 3, which has 4 disks. All of the
disks are the same type, but this is not a requirement. The number of
disks on each bus is bounded by the current limit on the number of
targets per bus and the number of luns per target. SCSI 3 will soon
increase those limits.
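
The raw box described above can also be written down as a small data
structure. This is only a sketch for discussion: the class name
RawRaidBox, its fields, and the method disk_names are hypothetical and
are not part of any defined RAID model. It records the two frontend
controllers and the number of disks on each backend bus, and names each
disk by bus and position.

```python
from dataclasses import dataclass, field


@dataclass
class RawRaidBox:
    """Hypothetical model of the unconfigured (raw) box; not a defined format."""
    # Frontend: one primary and one backup controller.
    controllers: tuple = ("primary", "backup")
    # Backend: bus number -> number of disks on that bus (bus 3 has 4).
    disks_per_bus: dict = field(
        default_factory=lambda: {0: 3, 1: 3, 2: 3, 3: 4, 4: 3})

    def disk_names(self):
        """Name each disk D<bus><position>, e.g. D00 for bus 0, position 0."""
        return [f"D{bus}{pos}"
                for bus, count in sorted(self.disks_per_bus.items())
                for pos in range(count)]


box = RawRaidBox()
names = box.disk_names()
print(len(names), names[0], names[-1])   # 16 D00 D42
```

A real box would have to report this information itself, which is one
of the questions raised later in this paper.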

                    -----------------
                    |     Host      |
                    |               |
                    -----------------
                    |               |
     SCSI BUS ->    |               |
 -------------------------------------------------------
 RAID BOX below     | -Bus Connect- |
                    V               V        Frontend
            ---------------------------------
            |   Prim Ctrl   |  Backup Ctrl  |
            ---------------------------------
              |      |      |      |      |
            BUS 0  BUS 1  BUS 2  BUS 3  BUS 4    Backend
              |      |      |      |      |
             D00    D10    D20    D30    D40
              |      |      |      |      |
             D01    D11    D21    D31    D41
              |      |      |      |      |
             D02    D12    D22    D32    D42
                                   |
                                  D33

Some configuration examples:

RAID 5 members are:

    lun x   Label set1:   Dsk00  Dsk10  Dsk21  Dsk32  Dsk42
    lun y   Label set2:   Dsk02  Dsk12  Dsk22  Dsk33  Dsk40
    Hot spare:            Dsk40

Both RAID sets share the hot spare. If one of the members fails, the
failed member is taken out of the set, the hot spare is brought in,
and the failed member's data is reconstructed onto the hot spare. If
another member fails, there is no hot spare available, and the set
runs in degraded mode.

RAID 0 members are:

    lun z   Label set3:   Dsk01  Dsk31  Dsk41
    lun w   Label set4:   Dsk03  Dsk13  Dsk23

RAID 1 members are (mirrored stripe sets, instead of having luns z and
w defined):

    lun n:                set3  set4

I have taken the approach of listing the problems/questions from boot
and configuration through to normal operation of a SCSI RAID box
within an operating system environment. This is in no way a definitive
statement of the problems that RAID presents to an operating system,
but a living document that changes over time as solutions are found or
defined and new problems are discovered.

At initial system boot the bus is probed. This is done with the
INQUIRY command. A response is expected from a device if it is
physically connected and powered up. The response either gives a
device type or indicates that there may be something at this address
later. A RAID box that has no RAID logical volumes configured could
respond with a peripheral device type of RAID and a qualifier of 001b.

Questions:

    What is the expected response in data byte 0?

    How to determine that this is a RAID controller that will need
    specific commands to make a RAID device appear at this target/lun?

    If a RAID logical volume has been configured at this target/lun,
    how to determine what RAID level it is configured for?

    What about a RAID 1 set over 2 RAID 5 sets, how is it reported?

Possible solutions:

    Define a new device type for RAID controllers?

    For RAID levels, define new inquiry data?

    For RAID over RAID, report only the topmost level?

The operating system is now up and running. The probe of the SCSI
busses found a controller type of RAID. No RAID logical volumes are
seen.

Questions:

    How to determine if a backup controller is available? Is it active
    or passive? Is the controller that is being accessed currently the
    primary or the backup controller? At what bus/target, and possibly
    host, does the other controller reside?

    How to determine what RAID configurations this box supports? RAID
    1 or RAID 3 or RAID 5; one, some, or all.

    How to determine if the box supports RAID over RAID, and to what
    depth? E.g., RAID 1 over RAID 0 over RAID 5.

    How to determine the number of backend busses? Number of targets
    per bus? Number of luns per target?

    How to scan and report what is physically connected on the backend
    busses?

    How to rescan a bus after a new device is plugged in on the
    backend bus?

    On completely flexible boxes a few pieces of information should be
    known about the physical device:

        Disk product and vendor IDs (maintenance).
        Backend bus, target, and lun currently on.
        Status of device: up, down, warned.
        Number of LBNs.
        Number of bytes per LBN.
        Does this physical device belong to a logical grouping of
        physical devices (RAID set)?
        Where to find the description of the logical volume?

    How to assign a physical disk to a target/lun for maintenance
    purposes (update of the drive's microcode)? Non-maintenance
    operation for the device?

    If a logical volume has been configured, how to determine what
    physical devices make up the volume set? Can we determine the
    status of each of the members? How to determine the total volume
    status? Reconstruct state, degraded operation, hot spare
    reconstruct state, volume set constructed but not assigned to a
    target/lun, RAID level assigned, is a hot spare assigned, etc.?

    Configuring a RAID logical volume: assigning the RAID level of
    operation, what physical devices are part of this RAID logical
    volume, how many LBNs, size of an LBN (bytes), chunk size (how
    many LBNs make up a logical block)?

    Who is allowed to configure, only the primary controller or both
    controllers?

    If the RAID box has an active backup controller, how is a lun
    assigned to the backup (load sharing)? Can it be migrated back?
    Can we manually fail one of the controllers?

    How to return a failed controller to a non-failed, active status
    after it has been replaced?

Once we have configured the box, reads/writes can occur. System/user
data can be placed on the device in a reliable fashion.
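
The configuration items raised in the questions above (RAID level,
member devices, number of LBNs, LBN size, chunk size, member status)
can be pictured as one record per logical volume. The sketch below is
an assumption made for illustration only: the names LogicalVolume,
Member, usable_lbns, and degraded, and the default sizes, are
hypothetical and are not defined by SCSI or by any RAID box. The
capacity rule it uses is the usual one: RAID 0 uses every member,
RAID 1 keeps one copy, and RAID 3 and RAID 5 give up one member's
worth of space to parity.

```python
from dataclasses import dataclass
from enum import Enum


class RaidLevel(Enum):
    RAID0 = 0
    RAID1 = 1
    RAID3 = 3
    RAID5 = 5


@dataclass
class Member:
    name: str            # physical disk label, or the label of a lower-level set
    lbn_count: int       # number of LBNs this member contributes
    status: str = "up"   # "up", "down", or "warned"


@dataclass
class LogicalVolume:
    label: str
    level: RaidLevel
    members: list
    lbn_size: int = 512    # bytes per LBN (assumed value)
    chunk_lbns: int = 16   # LBNs per chunk (assumed value)

    def usable_lbns(self):
        """LBNs available for data, sized by the smallest member."""
        smallest = min(m.lbn_count for m in self.members)
        n = len(self.members)
        if self.level is RaidLevel.RAID0:
            return smallest * n
        if self.level is RaidLevel.RAID1:
            return smallest
        return smallest * (n - 1)   # RAID 3 / RAID 5: one member of parity

    def degraded(self):
        """True when any member is down."""
        return any(m.status == "down" for m in self.members)


set1 = LogicalVolume(
    "set1", RaidLevel.RAID5,
    [Member(n, 1000) for n in ("Dsk00", "Dsk10", "Dsk21", "Dsk32", "Dsk42")])
print(set1.usable_lbns(), set1.degraded())   # 4000 False
```

A RAID over RAID volume could be described the same way by listing the
lower-level sets as members. How a box would actually report or accept
such a record is exactly what is undefined today.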
The error reporting/handling model is not defined. What recovery
methods can be employed once a condition is detected?

Questions:

    Are there any error conditions reported for the RAID box that
    require newly defined error handling? Is it specific to a sense
    key and the ASC/ASCQ fields? One sense key comes to mind: ABORTED
    COMMAND. For a write command, did all the data make it out, and
    was the XOR updated in a RAID 5 configuration? Could recovery be
    to rewrite and do an XOR check against all blocks that apply?

    New ASC/ASCQ fields defined?

    Additional sense data format defined beyond byte 18 for member-
    and volume-specific information dealing with this event?

As can be seen from this discussion, RAID hardware implementations
present new and unique problems to a system software developer. The
question remains what can or should be done to ease software support
of SCSI RAID implementations, and SCSI devices in general.
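
For reference, the sense data fields mentioned in the error handling
questions can be picked out of fixed-format sense data as sketched
below. The byte positions for the sense key, additional sense length,
ASC, and ASCQ follow the fixed sense data format used for disks. The
function name decode_sense, the returned field names, and the sample
bytes are hypothetical, and the contents of the bytes from 18 onward
are left uninterpreted because no format for them is defined.

```python
ABORTED_COMMAND = 0x0B   # sense key value for ABORTED COMMAND


def decode_sense(sense):
    """Split fixed-format sense data into the fields discussed above."""
    if len(sense) < 8 or (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    # Byte 7 is the additional sense length: bytes that follow byte 7.
    total = min(len(sense), 8 + sense[7])
    if total < 14:
        raise ValueError("sense data too short to carry ASC/ASCQ")
    return {
        "sense_key": sense[2] & 0x0F,
        "asc": sense[12],
        "ascq": sense[13],
        # Bytes 18 and up: where member- and volume-specific information
        # could go. The layout is undefined, so it is returned raw.
        "extra": bytes(sense[18:total]),
    }


# Made-up sample: ABORTED COMMAND with four extra bytes after byte 17.
sample = bytearray(22)
sample[0] = 0x70                       # current error, fixed format
sample[2] = ABORTED_COMMAND
sample[7] = 14                         # 22 bytes total minus the first 8
sample[18:22] = b"\x01\x02\x03\x04"    # placeholder values only
fields = decode_sense(sample)
print(fields["sense_key"] == ABORTED_COMMAND, fields["extra"].hex())
```

A driver written against a defined standard could branch on the sense
key and ASC/ASCQ, then use the extra bytes to find the failing member.
Without a standard, each box needs its own version of the last step.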