Data Center Solutions Case Study
Northern Arizona University (NAU), with nearly 30,000 students and 13 statewide locations, was experiencing poor system performance in their virtualized student labs. At peak times, students had trouble logging onto the system and additional difficulties running their academic and Microsoft® Office applications. By deploying Nexenta’s software-defined storage product equipped with Optimus Ultra™ SAS SSDs from SanDisk, the IT Services team was able to restore students’ productivity and support their success.
Northern Arizona University is home to 28,000 students and 3,500 faculty and staff spread across 13 statewide locations. With seven colleges and 35 schools and departments, the University is known for intellectual rigor, accessible faculty, and both online and on-site classes. NAU offers baccalaureate degrees in 15 disciplines, ranging from the Humanities to Mathematics and Science. Master’s and doctoral programs are also available in multiple fields, from Nursing and the Sciences to Early Childhood Education.
The University has a mixture of virtualized labs with as many as 410 client PCs across campus. With nearly 20,000 students at the Flagstaff Campus, some 8,000 routinely utilize the technology labs that are on-site at each of the NAU sites. In addition, the team at Information Technology Services (ITS) creates lab images that are used at remote locations. Their aim is to have a uniform set of software applications within a working environment, irrespective of a student’s unique location. There is a great deal of diverse use of computer labs across the campus—from student orientation and registration to course work and printing. Students use virtual terminals not only within the technology labs, but also at kiosks, at print release stations, and in small dormitory labs.
Tobias Kreidl, the Academic Team Lead for NAU’s Computing and Communication Systems (CCS) Division of Information Technology Services (ITS), and Duane Booher, Senior Software Systems Engineer at NAU’s CCS, both wear many hats. The ITS team supports a number of services, such as remote connectivity, local labs, authentication and authorization, web services, and problem support. “Students are our number one customer. We have to keep them productive and happy, and keep satisfaction high. Our job is to provide IT solutions to the students to make them successful in their academic pursuits and their eventual career paths,” commented Booher.
The ITS team was seeing significant issues with the performance of applications in the student labs. Students frequently had trouble logging on, and when they did log on, applications sometimes would not run well. Typically, these issues arose during peak load times when the labs were heavily used, such as early in the morning, at the noon hour, and in the late afternoon. The team observed a recurring pattern during the semester: when the labs were very busy and other network events coincided, the labs would start emptying out, to the point that only five to ten users would still be there working. “This was the worst illustration of the fact that we had problems,” said Booher. “We were having problems providing consistent response times to the users, and the services would not always be available. When we had those situations, other departments and groups would call us to say that the systems were not responding to the students’ needs.”
Behind the scenes, the service time on the NAU hard disk drive (HDD) arrays would increase tremendously during peak times, resulting in much slower HDD I/O. This slowed down the entire login process and degraded the user experience with applications, as users competed for limited I/O cycles. In the virtualized labs, the HDD arrays simply could not keep up.
“We would have peak times where students would all tend to enter the labs, and the system was overwhelmed with the burden of dealing with everyone at the same time,” said Kreidl. “It simply could not handle the traffic adequately. Once things started getting bad, they got progressively worse at a much faster rate, and the system was hitting the threshold where the performance was unacceptable.” The ITS team rapidly began looking for a solution to deal much better with the I/O at these peak times.
“The array we use has effectively two controllers on it, and as you add more HDDs, the I/O burden for every HDD eventually falls on one of the controllers,” continued Kreidl. “As we kept adding storage, the resources got spread even thinner, so the controllers didn’t have enough power to deal with all of the I/O, which at times can be as high as up to 30,000 IOPS—more than what that particular device was designed for. It was effectively running out of capacity to service all of the concurrent I/O.”
Login times, normally anywhere from 30 to 40 seconds, had now increased to 5 minutes or more. Once logged in, users would launch applications such as standard Windows office products and would experience significant delays. Moving the cursor from one cell to another in Microsoft Excel would result in considerable wait times. Essentially, the systems would be completely unusable from the students’ point of view.
“Since every NAU student has to pay an IT fee, a poor user experience does not bode well for the perception of satisfying our students’ needs,” commented Booher. “In addition to enabling students’ productivity, we’re also being graded by them in terms of our success rate, in terms of being able to meet their needs. Students use our computing environment to access the Internet and to do their academic work. Without the students, we wouldn’t be here doing our jobs—we’re here to serve the needs of the students.” Booher added, “When many of our students would call us or email us and let us know that they were having a really difficult time logging on, we knew that performance was really sluggish.”
Not only was NAU running out of storage space, but the practice of simply adding additional storage was not alleviating the problem. The storage arrays were already overburdened with respect to processing capability. NAU needed an innovative and economical solution that would be sustainable over a longer period of time.
Kreidl and Booher attended the Citrix Synergy Conference in Anaheim, California in 2014 with the express goal of scouting out potential storage solutions that could satisfy their criteria of economics, flexibility, sustainability, and performance. A linear storage model was no longer an option because the system’s controllers were already having a terrible time keeping up with the I/O. Therefore, simply adding HDDs would have exacerbated the problem.
“We were seeking out a different solution,” said Booher. “We were using network-based storage solutions; that is, a high-density, high-volume, high-capacity storage solution over the network, and it wasn’t meeting our needs. We were doing something quite innovative with the virtual terminal labs and it just wasn’t performing at an acceptable level, so we needed a storage solution that was more scalable and more responsive.”
The team looked at different types of network-based solutions, such as software-defined storage, which included Nexenta with SanDisk flash storage. Back at the office, the team downloaded and tested the community edition of NexentaStor, determining that the Nexenta solution would indeed outperform their existing system.
The team was impressed that Nexenta uses Open Source for their base operating system and their I/O mechanisms. NAU, along with many other universities, is a staunch supporter of Open Source. Open Source solutions are known for their reliability and provide a great deal of flexibility to develop custom solutions, which is critical in a university environment where IT must support diverse needs. In addition, Nexenta is a certified platform for one of NAU’s hardware vendors, which enabled the University team to utilize existing hardware and facilitate the purchasing process.
“The specific reason that we ultimately wanted to go with Nexenta was to leverage some of the existing hardware we already had, which for us as a university is important from a cost-saving perspective,” explained Kreidl. “The other thing was the ability to get a trial system and really put it through the paces well ahead of time. Nexenta’s solution provided affordability, reliability, scalability, performance, and the ability to move to a highly available system by clustering two servers. The Nexenta solution encompassed all of the really important points for us.”
Initially, the NAU system did not include a solid-state drive (SSD) card. Therefore, one of the steps the ITS team took to improve system performance was to add the Optimus Ultra SAS SSD from SanDisk. “The Optimus Ultra SAS SSD was readily available, and it was part of the Nexenta certification package as a compatible solution,” said Booher. “It is also a fast device and can sustain many writes for an extended period of time.” NAU procured two devices to ensure high availability. NAU uses paired SSDs, in a RAID 1 configuration, to ensure internal redundancy. In the unlikely event that one drive should fail, the other drive will completely take over.
“The Optimus Ultra (150GB) SAS SSD seemed like a perfect choice because of its availability and its ability to integrate into the 2.5” SAS form factor, which allowed us to make use of our existing hardware,” said Kreidl. “These are the storage array trays that we had available, and so obviously we wanted to make the best use of that, if possible. The Optimus Ultra SAS SSDs from SanDisk really stood out for us because, for the NexentaStor solution, you really don’t need more than roughly 8GB of storage. In that respect, the 150GB version of the Optimus Ultra SSD was a perfect choice because of its availability, its reliability in terms of how many read/write cycles it could sustain over its predicted lifetime, and its ease of deployment.”
Nexenta uses the SanDisk SSDs as write-ahead devices, which results in a faster user experience, especially when hundreds of students are simultaneously accessing the system. “NexentaStor uses the SSDs as a buffering mechanism to immediately respond back on an I/O request, and in the background, it takes care of the rest of the I/O integrity and writes it,” explained Kreidl. “The fact is, in our virtual environment, 75 percent of all I/O is writes, and a fast write-ahead device, called the Nexenta log device, makes for a significant amount of increased throughput and faster response time.”
By using the Optimus Ultra SAS SSDs for the Nexenta write-ahead logs, the system improved the throughput significantly. “Most of the logging is an ongoing process that fights against the same cycles that users need in order to have their data written to permanent storage,” continued Kreidl. “However, the Optimus Ultra SSD provides the means of dividing the I/O, so that the I/O is not taking place on the same physical devices that hold the data. This leaves more of the efficient cycles available for users to write their data. The incorporation of the Optimus Ultra SAS SSDs from SanDisk made a very noticeable difference in terms of the overall performance.”
First and foremost, the Nexenta/SanDisk solution allowed NAU to provide consistent performance and reliability to IT users. Response times for logging in and launching applications were now both consistent and acceptable, even during the highest peak loads.
“You have this inherent inefficiency of everybody wanting to work at the same time. By isolating the log portion, the performance improved very, very significantly,” said Kreidl. “Plus, if the system should suddenly crash for any reason, you have a record of all of the changes that have taken place. When the system comes back up again, you haven’t lost anything because the changes that normally would have been written to disk are then replayed and caught up before the system becomes live again. Not only do you have the efficiency, but you also have very nice protection against any potential system crash.”
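The write-ahead idea Kreidl describes can be illustrated with a toy model. This is only a sketch of the general technique, not NexentaStor's implementation: the class `ToyStore` and its method names are hypothetical, the "log device" is a Python list standing in for the SSD, and the "disk" is a dictionary standing in for the HDD pool.

```python
class ToyStore:
    """Toy model of a fast log device in front of slower main storage."""

    def __init__(self):
        self.log = []    # stands in for the SSD log device (assumed to survive a crash)
        self.disk = {}   # stands in for the HDD pool

    def write(self, key, value):
        # Record the change on the log device and acknowledge right away.
        self.log.append((key, value))
        return "ack"

    def flush(self):
        # Background step: apply logged changes to main storage, then clear the log.
        for key, value in self.log:
            self.disk[key] = value
        self.log.clear()

    def recover(self):
        # After a crash, replay whatever is still in the log before going live.
        self.flush()


store = ToyStore()
store.write("report.docx", "v1")
store.flush()                      # v1 reaches main storage
store.write("report.docx", "v2")   # acknowledged, but not yet on main storage
# ...a crash here would leave v2 only in the log...
store.recover()
print(store.disk["report.docx"])   # prints "v2"
```

The caller gets its acknowledgement as soon as the change is in the log, and the replay step is what makes the pending change survive a restart. A real storage system differs in many details.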
NexentaStor, together with SanDisk SSDs, enabled the ITS team to move from their iSCSI network-based protocol to an NFS-based storage protocol, which provides a mechanism for thin provisioning. With thin provisioning, the team was hoping to save a significant amount of storage space, while retaining the same efficiencies.
Thin provisioning significantly cut down on the amount of storage that NAU needed. “We typically have achieved something on the order of about a 20 to 1 reduction in our storage requirements, which has been tremendous,” said Kreidl. “More importantly, we could see that login times became much more uniform, and many of the problems with slow response for users had completely disappeared.”
Students in the lab environment were utilizing approximately 13.4TB of data with the original architecture. The ITS team considered buying another tray to add an additional 5.5TB to accommodate the standard thick provisioning and the iSCSI solution. However, once the team moved to the thinly provisioned model, storage requirements were reduced to approximately 450GB. “The model is now very sustainable because the thin provisioning is extremely efficient, and the I/O is so good,” said Kreidl. “The elegant solution of Nexenta with SanDisk allowed us to take a significant portion of the read I/O—usually 90 or more percent—and cache that. So when users are accessing the identical data, they can pull that right out of the cache memory instead of having to go to hard disk for every one of these same processes. The time it now takes to provision is significantly less than before. That’s another benefit of this new paradigm.”
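As a quick check of the figures in this paragraph, the reduction can be worked out directly. The sketch below assumes decimal units (1GB = 1,000,000,000 bytes, as in the capacity footnote); the function name `reduction_ratio` is purely illustrative.

```python
GB = 1_000_000_000   # decimal gigabyte, per the document's capacity footnote
TB = 1_000 * GB

def reduction_ratio(before_bytes, after_bytes):
    """How many times smaller the footprint is after thin provisioning."""
    return before_bytes / after_bytes

thick = 13_400 * GB   # approximately 13.4TB with the original architecture
thin = 450 * GB       # approximately 450GB once thinly provisioned

print(f"{reduction_ratio(thick, thin):.1f} to 1")   # prints "29.8 to 1"
```

For these two figures the ratio comes out near 30 to 1, somewhat above the "about 20 to 1" that Kreidl cites as typical.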
Since NAU deployed the NexentaStor/SanDisk solution in late 2014, metrics have shown the system to be robust and responsive. The ITS team has not encountered any issues with the system and has been extremely satisfied with the stability, performance, and overall integrity of the combined solution. “The execution and reliability of the SanDisk SSDs have been flawless from the beginning,” said Booher. “We were able to dynamically add them while the system was up and running and get immediate benefits. The internal I/O operation is handling thousands of requests, but it all translates back to the user running their Windows virtual terminals. Every operation that they do forces a 75 percent write activity, and it was much slower without these fast SSDs.”
Booher and Kreidl observed that the metrics for peak loads and low usage loads were consistent across the board, regardless of system loads. Response time is now consistent and reliable, despite high usage. Response times used to increase as usage increased, but the system is now stable during peak times with heavy usage. “The line chart shows the number of logins and usage on the system and how they vary at different times of the day,” said Booher. “Sometimes they’ll go up at peak time during the day. In contrast, the bar charts are the response times that the user is receiving along with the login times, and they’re very consistent no matter the system demand.”
NAU monitors a typical performance metric—input/output operations per second (IOPS). In the previous environment, the metrics were periodically hitting up to 20,000 to 30,000 IOPS, which was much higher than the team anticipated. Unfortunately, IOPS greater than 20,000 noticeably degraded system performance.
“Now, because the read cache rate is high—typically around 90 percent—no HDD I/O has to take place,” explained Kreidl. “Instead of going to HDD to read data, all of that information is pulled out of NexentaStor memory. We now see peak rates of only 2,000 to 3,000 IOPS—decreased by at least a factor of ten.”
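The factor-of-ten figure follows from the cache hit rate alone under a simple model. The sketch below assumes, as a simplification, that the entire earlier peak load consists of cacheable reads (the team reports elsewhere that much of the I/O is writes); the function name `disk_read_iops` is hypothetical.

```python
def disk_read_iops(read_iops, hit_pct):
    """Reads per second that miss the cache and must be served by the HDDs."""
    return read_iops * (100 - hit_pct) / 100

for peak in (20_000, 30_000):
    print(peak, "->", disk_read_iops(peak, 90))
# prints:
# 20000 -> 2000.0
# 30000 -> 3000.0
```

With a 90 percent hit rate, only one read in ten reaches the disks, which is consistent with the reported drop from 20,000 to 30,000 IOPS down to 2,000 to 3,000 IOPS.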
The team confirmed that combining a high cache rate with writing journal logs to the Optimus Ultra SAS SSDs from SanDisk makes a significant difference in end-user experience. “The very high cache hit rate indicates that memory is responding directly to I/O requests, instead of going to I/O on the disk drive,” added Booher. “The server response time for the storage is well under 10 milliseconds.”
Latency and Response Time
ITS has been running the Nexenta/SanDisk solution for nearly a year and is satisfied with the performance to date. “We have had zero issues with any of the SanDisk SSDs that we’ve deployed so far,” said Kreidl. “Latency is another really important metric. Before, we had service times that were up to 100 or more milliseconds, which by normal standards is considered really slow. At one very busy time in production, I noticed that the IOPS rate hit something like 27,000. If that had happened before, our latency would have gone through the roof. In this case, the latency was just 12.8 milliseconds, which is remarkably good. We definitely have the numbers that show that this solution is providing a significantly better performance. We very rarely see latency above about 10 milliseconds, and our IOPS rates are significantly lower than before.”
The Nexenta/SanDisk storage solution works well in a write-intensive environment. “End users don’t realize it, but what they are doing is very write-intensive,” said Booher. “Once we went to the Nexenta/SanDisk storage solution, our results were like night and day. The SanDisk SSDs can sustain a high volume of writes very quickly, so the throughput compared to HDDs is quite a bit faster. We went from an unacceptable situation to an exceptional responding situation. We basically stopped getting complaints from users about the reliability and response of the system.”
NAU gained significant cost savings by re-deploying existing products with the Nexenta storage solution. Although they had previously deployed their hardware in an iSCSI configuration, they converted it to NFS-based storage to take advantage of thin provisioning. “The 20-to-1 consolidation is a very interesting combination of thin provisioning (through NFS along with SanDisk) and on-the-fly compression and decompression,” said Kreidl. “For our Windows images, I’d estimate we gained a factor of 1.7 to 1. To put it in perspective, instead of having to buy 17TB of storage, we can effectively have the same type of storage using only 10TB. That’s a substantial savings overall, and because the space utilization has been so efficient, we are now using some of the additional space we gained back for other storage projects.”
The ITS team conducts customer satisfaction surveys each semester. Booher commented, “If our services are not functioning properly, we hear about it. The number of complaints went way down once we deployed the Nexenta/SanDisk solution. In fact, it went from many complaints during the course of the semester to essentially zero complaints during a whole semester after that.”
The NAU team recognizes the track record and longevity of SanDisk products. “SanDisk has been around for a long time,” said Booher. “They have a very consistent track record providing a quality, fast-performing product. They’re able to meet the needs of an IT environment where servers and storage are constantly changing. We’re very comfortable using SanDisk as our equipment vendor going forward.”
The team has been extremely satisfied with the responsiveness demonstrated by the SanDisk personnel. SanDisk engineers provided both support and advice on how to configure the system, and they recommended the solution that best fits the needs of NAU.
The success of the Nexenta/SanDisk solution has given the ITS team the confidence to consider deploying additional systems across NAU campuses. “We’ve been running for nearly a year in production, and the Nexenta/SanDisk storage system has executed flawlessly,” said Booher. “As a result, we’ve been working with other departments on campus to implement additional storage solutions. For example, one department with a high availability cluster system is implementing a massive storage farm (624TB) with the Nexenta storage solution and will soon be in production.”
Another NAU college is also considering expanding its virtual environment and deploying Nexenta. “At this point, we have no reservations about recommending the Nexenta/SanDisk solution to other groups and departments within the university and on virtual forums. We have a very high confidence in its reliability, performance gains, and cost-effectiveness,” continued Booher.
*1GB = 1,000,000,000 bytes. Actual user capacity less.
The performance results and cost savings discussed herein are based on internal testing and use of Optimus products. Results and performance may vary according to configurations and systems, including drive capacity, system architecture and applications.