A Simple and Robust Virtual Server Architecture
Not too long ago I started looking for a more robust and flexible architecture for my personal server environment, one that I could also apply to future IT infrastructure. I collaborated with our Engineering Fellow, Scott Ellis, and the architecture we came up with turned out to be quite a nice setup. So, does it work? First, a description. The core concept was virtualization, since one of my goals was to reduce the number of physical machines and, with them, as much power consumption and cooling as possible. The obvious choice for the virtualization platform was ESXi, as it is well known and relatively easy to work with (that may not be true for everyone, but that is the subject of another blog post). For storage I wanted an OS I was already familiar with, as I do not have a lot of free time to work on or administer this stuff, and I wanted it to be as robust as possible. Sharing the datastore (hah, stole that term from VMware!) via NFS, with a ZFS file system as the backend, seemed quite workable. The added benefit is that I would not be the only one in the world doing this, so it is a somewhat tested architecture and I could probably find help online if needed.
The hardware is not really that important, except for the ZFS sub-system, as I found out later; I will discuss those details in the next paragraph and at the end of the post. I decided to go with a single ZFS pool and divide it into the necessary file systems. One ZFS file system would contain the virtual hard drives for the virtual machines, and the ESXi server would access that datastore via NFS. The other ZFS file systems would contain the majority of the data for those servers, effectively being the /home directories, and would be mountable by the virtual machines via NFS as well. I also set up a private network between the two machines for the datastores, for an additional level of security. First off was a 64-bit server install of CentOS 6.4, then adding ZFS to it. Since I was not going to mess around with trying to boot off of ZFS, the boot drives for the storage server were mirrored with mdraid, which is easy enough to do during install; I used a couple of old 74GB Raptor SATA drives (in fact, all the hardware was old, used stuff lying around not being used at all). Once the OS was installed, I added the ELRepo repository and updated to the latest stable kernel, which is necessary to support ZFS. Then I added the ZoL (ZFS on Linux) repository and installed the ZFS packages.
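For reference, the install sequence sketched above looks roughly like the following. The repo package names and URLs change over time, so treat these as placeholders and check the current ELRepo and ZFS on Linux documentation before running anything.

```shell
# Rough sketch of the storage-server setup on CentOS 6 (run as root).
# Repo package names are illustrative -- verify against the current
# ELRepo and ZoL release pages.

# Add the ELRepo repository and install a newer mainline kernel
yum install elrepo-release                 # ELRepo repo RPM for EL6
yum --enablerepo=elrepo-kernel install kernel-ml
reboot                                     # boot into the new kernel

# Add the ZFS on Linux repository and install ZFS
yum install zfs-release                    # ZoL repo RPM for EL6
yum install zfs
modprobe zfs                               # load the module and verify
```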
The next step was to create the ZFS pool, and here I found the online Solaris ZFS documentation invaluable; the same goes for creating the ZFS file systems. Some things to note about zpool and zfs settings: set ashift=12 so you can replace older disks with newer 4K-sector drives when you need to upgrade (zpool setting), turn off atime for better performance (zfs setting), and turn compression on by setting it to lz4, as it can save disk space at almost no realized cost to system performance (zfs setting). Most of these settings can be changed after creation, but you should do so immediately, because any files written before the change will not have the new settings applied. The exception is ashift=12: it affects how ZFS talks to and formats the drives, so to change it you have to destroy and recreate the pool. After the ZFS file systems were created, I used the standard NFS configuration to share them, restricting the shares to the private network so they would not be exposed to the rest of the world.
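The pool and file-system settings above can be sketched as follows. The pool name ("tank"), device names, file-system names, and the private subnet are all placeholders I made up for illustration; substitute your own.

```shell
# Create the pool with ashift=12 (must be set at creation time)
zpool create -o ashift=12 tank mirror /dev/sdb /dev/sdc

# atime and compression are inherited, so set them once at the pool root
zfs set atime=off tank
zfs set compression=lz4 tank

# One file system for the ESXi datastore, one for the VMs' /home data
zfs create tank/vmstore
zfs create tank/home

# Standard NFS exports, restricted to the private storage network
# (10.10.10.0/24 is a placeholder; ESXi needs no_root_squash)
cat >> /etc/exports <<'EOF'
/tank/vmstore 10.10.10.0/24(rw,no_root_squash,sync)
/tank/home    10.10.10.0/24(rw,sync)
EOF
exportfs -ra
```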
Once the NFS shares were set up, it was time to bring up the VMware ESXi machine. I loaded ESXi from a USB drive, making the server setup very portable in case I ever need to change the VM server, the most obvious reasons being a need for more performance or a system failure. One SATA drive was added to the system to provide scratch space just in case; otherwise all storage would be via NFS to the storage server. Next I had to determine a method to migrate physical servers to virtual machines. Unfortunately, for Linux this is not straightforward without setting up a separate converter system (VMware Standalone Converter), so I opted for another method: install fresh VMs with the appropriate OS and then migrate the data via rsync. This actually allowed me to upgrade one system to CentOS 6 from the CentOS 5 it had been running; the other needed to stay at CentOS 5 due to the mail application running on it. This method required some trial and error, as I found that moving some directories over to the 'new' system messed it up! Namely (for those of you wanting to try this): the /boot, /sys, /proc, and /dev directories, plus the network config files (ifcfg-*) and /etc/fstab. Those are hardware dependent, and copying them from the old machine will cause issues on the new system. The easiest way I found was to create a script that rsyncs the appropriate directories over to the new machine, saving the files mentioned above before copying the /etc directory so they can be restored after the rsync completes. I did the copy twice: once for the initial large amount of data, and a second time after idling the old system to pick up any changes made after the first pass. This kept downtime minimal, since the final rsync was very fast. Once it completed, the old system was shut down and the new system brought online in its place. In my case I was able to give the new machines new internal network IP addresses and simply redirect the external IPs to them (i.e. change the NAT'ed internal IP from old to new). That was the process in a nutshell, and it was relatively painless and simple. For me it was great to apply some technical expertise, since the bulk of my job is management. It was great to get some hands-on time again!
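The two-pass rsync migration described above could be scripted along these lines. The hostname, paths, and exclude list are illustrative, not my exact script; adapt them to your own systems, and test on a throwaway VM first.

```shell
# Run on the NEW VM, pulling from the old physical machine (placeholder
# hostname "oldserver"). Hardware-specific paths are excluded entirely.
OLD=oldserver
EXCLUDES="--exclude=/boot --exclude=/sys --exclude=/proc --exclude=/dev"

# Save the new VM's own hardware-specific files before /etc gets overwritten
mkdir -p /root/keep
cp -a /etc/fstab /etc/sysconfig/network-scripts/ifcfg-* /root/keep/

# Pass 1: bulk copy while the old system is still live and serving
rsync -aHAX --numeric-ids $EXCLUDES root@$OLD:/ /

# ...stop services on the old system, then pass 2 picks up only the deltas
rsync -aHAX --numeric-ids $EXCLUDES root@$OLD:/ /

# Restore the files that must stay local to the new VM
cp -a /root/keep/fstab /etc/fstab
cp -a /root/keep/ifcfg-* /etc/sysconfig/network-scripts/
```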
After observing the system for a while, I saw that the datastore latency was a bit high, on the order of 50-200ms or more before a write operation completed. This did not seem to impact overall system performance, since the machines were providing internet services, but it bugged me, and I finally decided to make a concerted effort to address the read/write latency. I tried quad bonded network connections, and I tried moving the VMs directly onto the storage server using KVM (another article! That ended up being a fail for the most part, probably because of limited time and understanding on my part). The last technique, and the one that resolved the issue, was to add an SSD for the ZFS log and read cache. Getting the SSD installed was a bit of a headache, as the mainboard had a proprietary connector to power it (all the other HDDs were powered through the backplane), but I was able to find parts and build a cable to get it going. Once the SSD (a 32GB drive) was installed, I split it into two partitions: a small 2GB partition (way more than needed; about 500MB max is used by the system) allocated to the log, and the rest of the drive allocated to the read cache. It is quite simple to do this on the fly with ZFS while the system is live, and the results are immediate. Now the read and write latency is very close to that of real physical disks, on the order of 25ms, and often much less for writes. I tested and discovered that you can even make writes fully synchronous through the log and still be well under 50ms write latency, if you are concerned about your data. Since these systems provide internet services, the data is not that critical, so I kept the standard sync setting.
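Adding the SSD partitions to the live pool takes just two commands. "tank" and the /dev/sdd partition names are placeholders for your own pool and device:

```shell
# Attach the SSD partitions to a running pool; both take effect immediately
zpool add tank log /dev/sdd1      # small partition: ZFS intent log (SLOG)
zpool add tank cache /dev/sdd2    # large partition: L2ARC read cache

zpool status tank                 # the log and cache devices show up here

# Optional: force every write through the log if data safety matters more
# than latency (I left this at the default, sync=standard)
zfs set sync=always tank
```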
That is it. Performance is as good as or better than the physical machines I used to have, and there is a LOT of overhead left on both the ESXi system and the storage system. I also have the robustness of ZFS for all of my data, the machine count and power consumption are reduced, and snapshots for backups are easy to accomplish. One could even set up a machine in a remote location to receive differential snapshots, or, if there is not a lot of data, use an external hard drive for offsite storage. The good thing about our hardware is that it is highly adaptable to almost any application, appliance software, or complicated architecture; pretty much anything out there can run on our stuff due to that flexibility.
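The remote differential-snapshot idea mentioned above is a few commands in ZFS. The pool, snapshot, and host names here are placeholders; an incremental send transfers only the blocks changed since the previous snapshot.

```shell
# Take recursive snapshots of the whole pool on a schedule
zfs snapshot -r tank@monday
# ...a day later...
zfs snapshot -r tank@tuesday

# Seed the remote machine with a full copy of the first snapshot
zfs send -R tank@monday | ssh backuphost zfs receive -F backuppool/tank

# Thereafter, only the monday->tuesday delta goes over the wire
zfs send -R -i tank@monday tank@tuesday | ssh backuphost zfs receive backuppool/tank
```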
Let me know what you think. Leave me a comment below.