200TB (just upgraded to 500TB) COTS SAN system using storage virtualization
(non-proprietary, with open legacy)
JES Hardware Solutions has been working with several DOD prime contractors, providing high-speed storage to various simulation projects. At first we simply supplied storage to the DOD contractors; later JES analyzed their existing topologies and migrated them to high-speed FIBRE. After a few technological bumps, we had successfully designed infrastructure for 10-, 25-, 35-, and 55-IG (Image Generator) channel simulators using various FIBRE topologies. These simulators included, but were not limited to, high-speed aircraft and ground troop maneuvers. The key to all of these designs was the COTS paradigm: COTS equipment is easy to replace and repair in the field and, of course, less expensive.
On the strength of this completed simulator infrastructure work, JES was contacted by a DOD contractor to create a simulation design system starting with 200TB (100TB mirrored) of storage on 4G FIBRE, with a growth path to several PB (1PB = 1,000TB). The system of course had to be a COTS design, unlike most proprietary and expensive SAN systems on the market today. The next several paragraphs describe the elements and design of such a system, successfully installed at the end of October 2006 at a cost of one million dollars.
Topology (diagram 1)
Please note the diagram above; we will describe four elements of this SAN system: infrastructure, storage, SAN virtualization software, and power requirements.
The infrastructure consists of Storage Engines, 4G switches, and Application Engines.
Storage engines (Storcache)
Storage Engines allow up to six RAID systems to be attached via 4G FIBRE. The engine computers are standard COTS machines fitted with four Emulex LP110002 enterprise-class dual-channel 4G FIBRE adapters (three PCI Express and one PCI-X). The computers are dual XEON, and each engine contains 4GB of high-speed RAM. They run Windows XP and are mirrored using the DataCore SANmelody software, which consolidates the storage to be presented to the Application Engines. We describe the elements of the DataCore software further on.
The switches are standard 16-port 4G high-availability units (dual power supplies); in this case we chose QLogic SANbox 5602 4G switches. These switches are stackable up to 96 ports, which allows expansion up to 2.3PB mirrored. The switches are managed over a private Ethernet network local to the SAN only.
Application Engines (Appcache)
Application Engines virtualize all storage presented to them by the Storage Engines, which is then presented to the Application Servers for use. The engine computers are standard COTS machines fitted with four Emulex LP110002 enterprise-class dual-channel 4G FIBRE adapters (three PCI Express and one PCI-X). The computers are dual XEON, and each engine contains 16GB of high-speed RAM. They run Windows 2003 Advanced Server and are mirrored using the DataCore SANsymphony software, which allows the Application Servers access to storage. We describe the elements of the DataCore software further on.
So we have a simple COTS infrastructure. Picture it in two levels: storage-level switches and application-level switches. The storage attaches to the Storage Engines, which attach to the storage switches; the storage switches attach to the application switches; and the Application Servers attach to the application switches. Do not be confused by the two paths: they exist because we are redundant and use MPIO. Granted, there is a single switch hop, but it was necessary to allow growth to several PB.
The RAID systems used in the SAN are based on SATA II disks with a 4G FIBRE host interface. Each RAID system has 16 bays, and we used storage-rated 500GB hard drives. Each RAID contains 1GB of cache RAM. Naturally, the RAID systems have triple-redundant, hot-swappable power supplies, hot-swappable drives, and a swappable controller. The storage systems are a JES product and COTS in nature. Each RAID has a unique IP address and is managed over a private Ethernet network local to the SAN only.
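The capacity arithmetic behind these figures can be sketched as follows. This is an illustration only: the 16-bay chassis, 500GB drives, and 20-RAID count come from the text, and mirroring halving the usable space follows from the RAID 0+1 design described below.

```python
# Capacity arithmetic implied by the text: 16-bay RAID systems populated with
# 500GB drives, striped at RAID 0 in hardware and mirrored (+1) in software.
BAYS_PER_RAID = 16
DRIVE_GB = 500
RAID_COUNT = 20                 # "20 RAIDs ... 320 separate hard drives"

raw_per_raid_tb = BAYS_PER_RAID * DRIVE_GB / 1000    # 8 TB per RAID-0 set
raw_total_tb = raw_per_raid_tb * RAID_COUNT          # 160 TB raw in the pool
usable_mirrored_tb = raw_total_tb / 2                # mirroring halves usable space

print(raw_per_raid_tb, raw_total_tb, usable_mirrored_tb)  # 8.0 160.0 80.0
```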
The RAID level used was level 0, chosen for speed: simulations on standard RAID systems can take up to a week to calculate, but with this SAN system we wanted that down to a few hours. The overall SAN system is actually RAID 0+1: the hardware provides the level 0 striping, and the DataCore software adds the +1 mirroring. The switches on the SAN are divided into two zones, a Storage Zone and a Mirror Zone; the Mirror Zone carries a synchronous mirror of the data storage. This arrangement gives the best overall RAID performance. Had we used JBOD, there would be too many drives to manage; it is easier to manage 20 RAIDs at level 0 than 320 separate hard drives, and with JBOD we would not take advantage of the cache RAM on the RAIDs. The beauty of this system is the massive amount of cache RAM, 68GB in total. When a drive pack failure occurs we only have to rebuild 4 drives, and because of the way the DataCore software works this is extremely fast. RAID 5 would give more storage yield, but the requirement was to be as fast as possible. Finally, backing up 100TB of data is impossible, a logistical nightmare, while a mirror gives two copies; the customer wanted to get everything off the tapes they already had.
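The RAID 0+1 layering described above can be modeled in a few lines. This is a deliberately simplified sketch, not how the DataCore software is implemented: the hardware deals chunks round-robin across drives (RAID 0), and the virtualization layer writes the same stripe set to both mirror sides (+1).

```python
# Minimal model of RAID 0+1: stripe a write across drives, then mirror the
# whole stripe set to a second side. Function names are invented for clarity.

def stripe(data: bytes, n_drives: int, chunk: int = 4):
    """RAID 0: split data into chunks and deal them round-robin across drives."""
    drives = [bytearray() for _ in range(n_drives)]
    for i in range(0, len(data), chunk):
        drives[(i // chunk) % n_drives] += data[i:i + chunk]
    return drives

def mirror_write(data: bytes, n_drives: int):
    """RAID 0+1: the identical stripe set is written to both mirror sides."""
    side_a = stripe(data, n_drives)
    side_b = stripe(data, n_drives)   # synchronous copy on the other side
    return side_a, side_b

a, b = mirror_write(b"ABCDEFGHIJKLMNOP", n_drives=4)
assert a == b                         # both sides hold identical stripe sets
print([bytes(d) for d in a])          # chunks dealt across 4 drives
```

Losing one drive on one side destroys that side's stripe set (the RAID 0 cost the article accepts for speed), but the surviving mirror side still holds a complete copy.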
Because we would not attempt to back up the data, a great deal of redundancy had to be built into the SAN system. As you can see in diagram 1, everything is duplicated and there are multiple I/O paths. Note the Mirror Zone(s) in diagram 1: a single line comes out of the APPCACHE1 server into the Mirror Zone and over to the other side of the mirror, and a second line comes out of the Mirror Zone into APPCACHE2. Once the mirror data is in APPCACHE2's memory, it is written to the proper STORCACHE machine's memory, then on to the RAID's memory, and finally to disk. This establishes a synchronous mirror between the APPCACHE machines. If any part of the system fails, there is a secondary path to the data; there is no single point of failure. We could lose the entire left side of the diagram and the right side would take on double duty until the left is repaired. If we lose an APPCACHE machine, the working APPCACHE machine(s) keep feeding the STORCACHE machines on the dead APPCACHE's side.
If a STORCACHE is lost, we lose only half of the mirror, and the working STORCACHE keeps its half of the mirror in sync. Once the STORCACHE is repaired, that part of the mirror is re-synced. Losing a switch is just like losing an APPCACHE machine. Since these machines follow the COTS paradigm, they are easy to replace and can even be replaced with faster machines of any brand (i.e., open legacy). The RAID systems have standard repair features such as hot-swappable drives, power supplies, and swappable controllers.
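The failover behavior just described is essentially MPIO path selection: each volume is reachable over multiple paths, and I/O shifts to a surviving path when a side fails. The following sketch illustrates the idea only; the host names and structure are invented and do not reflect DataCore's or Windows MPIO's actual interfaces.

```python
# Illustrative MPIO-style failover: pick the first healthy path to a volume.
paths = {
    "lun0": [
        {"via": "APPCACHE1/STORCACHE1", "healthy": True},   # left side
        {"via": "APPCACHE2/STORCACHE3", "healthy": True},   # right side
    ]
}

def active_path(lun: str) -> str:
    """Return the first surviving path; fail only if every path is down."""
    for p in paths[lun]:
        if p["healthy"]:
            return p["via"]
    raise RuntimeError(f"no surviving path to {lun}")

print(active_path("lun0"))              # left side serves I/O
paths["lun0"][0]["healthy"] = False     # lose the entire left side
print(active_path("lun0"))              # right side takes on double duty
```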
Complete system with storage awaiting UPS system
JES chose a DataCore software-based COTS (Commercial Off The Shelf) SAN system able to deliver the scalability and flexibility benefits of an open architecture without sacrificing performance or data protection.
The high-performance storage network employs dual DataCore SANsymphony (APPCACHE) systems as the backbone of the architecture, enabling automatic failover and mirroring of storage resources for mission-critical data protection. SANsymphony's storage virtualization software runs on COTS servers and supports vendor-independent disks and SAN hardware while providing storage and services to all the major open operating systems (Windows, Linux, UNIX, MacOS, NetWare, etc.). The SAN utilizes state-of-the-art 4G Fibre Channel connectivity throughout for maximum throughput. All disks and systems are fibre-attached to a highly redundant shared pool of storage devices protected by four DataCore SANmelody (STORCACHE) storage servers running under the control of the dual SANsymphony-based servers.
DataCore's SANsymphony and SANmelody software also caches and accelerates storage I/O, adds storage management services, and automates disk capacity administration for all systems through its advanced thin-provisioning capability. This powerful feature optimizes disk space utilization and enhances productivity: thin provisioning serves up virtual volumes to application servers while providing only thin slices of the total capacity available within the storage pool. These thin slices are allocated automatically by the system as users actually need more storage space -- "just in time" virtual capacity.
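The thin-provisioning behavior can be modeled simply: a volume advertises its full virtual size, but physical slices are drawn from the shared pool only when a region is first written. This is an illustrative sketch under assumed names (ThinPool, ThinVolume), not DataCore's implementation.

```python
# Simple model of thin provisioning: allocate physical slices "just in time".
class ThinPool:
    def __init__(self, physical_gb: int, slice_gb: int = 1):
        self.free_gb = physical_gb
        self.slice_gb = slice_gb

    def allocate_slice(self) -> None:
        if self.free_gb < self.slice_gb:
            raise RuntimeError("pool exhausted: add physical disk")
        self.free_gb -= self.slice_gb

class ThinVolume:
    def __init__(self, pool: ThinPool, virtual_gb: int):
        self.pool, self.virtual_gb = pool, virtual_gb
        self.allocated = set()            # which slices are physically backed

    def write(self, offset_gb: int) -> None:
        s = offset_gb // self.pool.slice_gb
        if s not in self.allocated:       # first touch pulls a thin slice
            self.pool.allocate_slice()
            self.allocated.add(s)

pool = ThinPool(physical_gb=100)
vol = ThinVolume(pool, virtual_gb=1000)   # presents 1TB, backed by 100GB
vol.write(0); vol.write(0); vol.write(5)  # rewrites consume nothing new
print(len(vol.allocated), pool.free_gb)   # 2 slices used, 98 GB free
```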
SANsymphony software is an enterprise-level open storage networking platform that doubles effective disk utilization by pooling disk space from multiple devices and automatically allocating just enough capacity, just in time, to needy applications. DataCore's SANmelody disk server software enables enterprises that have maxed out their servers' internal disk storage to expand beyond it by converting PC servers into cost-effective expansion disk servers. Both address the most common storage-related problems:
-- Optimizing disk space utilization.
-- Accelerating disk I/O performance.
-- Automating provisioning and reducing administrator burdens.
-- Speeding up responses to change requests; shortening the time to add disk space, restore failed servers or disks, and task new ones.
-- Delivering mission-critical data protection and auto-failover capabilities to ensure the highest level of system availability.
The open and hardware independent SAN system was built to evolve and scale to meet future requirements and changes.
Delivery and Installation
The entire system was assembled at our facility. The people who would work on the system attended a full week of training on SAN theory, the SAN hardware, and the management software, and delivery was accepted at our site. As you may have noticed in the pictures so far, all the 74" 48U cabinets are on pallets. The entire system was shipped fully assembled on a private truck and arrived at the customer site in three days.
Once the systems arrived at the customer site, we had to get the rack cabinets off the pallets (you will need a forklift!). Once the cabinets were on the floor (make sure you get ones with casters), we started installing the 400lb 10U UPS systems into the racks. They are COTS UPS units from Tripp Lite and come in three pieces: control head, PDU (power distribution unit), and battery. We used separate UPS systems instead of one big UPS because the system has to be portable. We had electricians install the five 208V single-phase drops. The UPSs were plugged in, and the technician from Tripp Lite started them, tested them, and trained all of us on their use. Shutdown code was then loaded onto the APPCACHE and STORCACHE machines, so that after 10 minutes of power loss a phased shutdown of the SAN begins.
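The shutdown policy can be sketched as a timer plus an ordered list of phases. This is an assumed illustration of the logic, not the actual vendor shutdown script; the 10-minute threshold is from the text, while the phase ordering (application servers first, storage engines last so cached writes can drain) and host names are our own.

```python
# Sketch of a phased SAN shutdown triggered after 10 minutes on battery.
ON_BATTERY_LIMIT_S = 10 * 60
PHASES = [
    ("application servers", ["app1", "app2"]),
    ("APPCACHE engines",    ["appcache1", "appcache2"]),
    ("STORCACHE engines",   ["storcache1", "storcache2",
                             "storcache3", "storcache4"]),
]

def phased_shutdown(seconds_on_battery: int) -> list:
    """Return the ordered list of hosts to stop once the limit is exceeded."""
    if seconds_on_battery < ON_BATTERY_LIMIT_S:
        return []                         # power may return; do nothing yet
    order = []
    for _phase, hosts in PHASES:
        order.extend(hosts)               # each phase completes before the next
    return order

print(phased_shutdown(300))                               # still waiting: []
print(phased_shutdown(601)[0], phased_shutdown(601)[-1])  # app1 storcache4
```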
Once the UPSs were installed and the shutdown sequence tested, we started the SAN and debugged it, set up the Windows XP 64 application servers, and then spent a few days labeling every wire and port on the system and making sure there were no unforeseen problems.
Completed SAN project
In the final picture you can see a 100TB mirrored SAN. There is enough room to add another 200TB to the existing racks and UPSs (a requirement), giving 400TB raw, or rather a 200TB mirrored cluster. When the customer is ready to add another cluster, we simply connect the new cluster to the application-level switches. Naturally, building and installing the SAN system was not as easy as described here; there are many pitfalls and unforeseen problems, so plan carefully. The project took about six weeks and 800 man-hours to complete in the form seen above, not including the man-hours we paid for from other vendors. The system is totally COTS: any section can be replaced or expanded using commercially available FIBRE storage and Windows-based computer systems (AMD/Intel).
JES Hardware Solutions specializes in designing and implementing COTS storage solutions for customer requirements of all types and sizes. JES can also design custom enclosures and cabling systems to fit customer needs and security concerns. JES is an 8(a), MBE-owned company that participates in the American Indian-owned business incentive program, which gives DOD sub and prime contractors a 5% rebate from DOD customers for JES contracts.