NFS alternatives / solutions

One of the biggest pain points of EC2 is that there is no easy drop-in replacement for an HA NFS server.  In a typical web server setup your content lives in a shared location, and hopefully in your production stack that location is redundant.  A simple setup might just be a clustered NFS server, which involves a shared IP and (ideally) a shared disk.  Larger organizations tend to use dedicated NAS solutions like a NetApp or EMC product (or one of many, many other possibilities).

Unfortunately, none of these are easily possible with EC2, and I’m still amazed that Amazon isn’t offering NAS as a service (NaaS?) – it seems the one glaring omission in their comprehensive smorgasbord of products.  So, what are the potential solutions?

1) Fake a HA cluster using DRBD and an elastic IP.

I don’t like this solution for a few reasons.  First, I’ve been burned by DRBD too many times in my career.  Every time I’ve implemented it, I’ve found that it will _sometimes_ cause a kernel panic on a node when the partner node fails.  Admittedly, the last time I checked it out was about 5 years ago, but I think I’ll save revisiting it until I have no choice (if that day ever comes).  Second, using the elastic IP solution means that a failover leaves you down for roughly the EC2 internal TTL, which could be 10-20 minutes.  That’s not an acceptable amount of downtime for me.

2) Set up dual, replicated (somehow) NFS servers, one per zone, with servers in each zone pointing to their respective NFS server.

There are a couple of problems with this.  First, you still need a way to replicate everything, but let’s assume for the sake of argument that that’s not an issue for your environment.  Second, even with replication solved, you’re consuming 2x the required disk, and each NFS server is a SPOF for its zone.  Overall, not something I like.

IMHO, there’s no good way to replicate your standard NAS setup.  I’ve had lots of discussions with colleagues and with Amazon’s own solutions architects about this, and there’s really no good one-size-fits-all substitute for it.

So, what to do?  Well, unfortunately, the answer is “it depends”.

If you’re just getting started (i.e., you’re not migrating an NFS-dependent solution to EC2), then you have lots of flexibility.  Here are some examples:

  • Store everything in S3 and use s3fs to remotely access it from your web servers.  Unfortunately this can be pretty laggy, especially if you don’t have any sort of caching layer (like a CDN or just a local caching proxy).
  • Same as above, but separate static and dynamic content and serve the static content straight from S3.  The separation can be a bit of a pain but it’ll speed things up considerably.  Of course, if you have lots of include files or other objects that are used in generating that dynamic content, then things can still get pretty slow.
  • Use S3 as a “source” and periodically sync it to clients, and have clients use local media.  This works great, except that it’s tough to manage, and running something like s3sync constantly can be very resource-intensive.  In practice, this means you’ve got to write a system to manage your deployments.
  • Use S3 for your static content and use a source control package for your dynamic content.  This works great, but of course you need to have a solid deployment process, and you’ll have to come up with some creative deployment solutions if you want all your servers to get updated simultaneously.
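
To make the “S3 as a source” option a bit more concrete, here’s a rough sketch of the bookkeeping a homegrown sync system would need.  This is just an illustration, not a finished tool: the function name plan_sync and the manifest format (a dict of key to checksum, e.g. an S3 ETag) are my own assumptions, and the actual listing and downloading from S3 is left out entirely.

```python
from typing import Dict, List, NamedTuple


class SyncPlan(NamedTuple):
    download: List[str]  # keys that are new or changed in the source
    delete: List[str]    # local keys that no longer exist in the source


def plan_sync(remote: Dict[str, str], local: Dict[str, str]) -> SyncPlan:
    """Compare two {key: checksum} manifests and decide what to transfer.

    `remote` is a hypothetical listing of the S3 "source" (for example,
    key -> ETag); `local` is the manifest saved after the previous sync.
    """
    # Fetch anything missing locally or whose checksum differs.
    download = sorted(k for k, digest in remote.items() if local.get(k) != digest)
    # Remove anything that has disappeared from the source.
    delete = sorted(k for k in local if k not in remote)
    return SyncPlan(download=download, delete=delete)


if __name__ == "__main__":
    remote = {"index.php": "a1", "lib/db.php": "b2", "css/site.css": "c3"}
    local = {"index.php": "a1", "lib/db.php": "OLD", "tmp/stale.php": "d4"}
    plan = plan_sync(remote, local)
    print(plan.download)  # ['css/site.css', 'lib/db.php']
    print(plan.delete)    # ['tmp/stale.php']
```

Comparing manifests like this means each client only pulls what actually changed, which takes some of the sting out of syncing frequently.  You’d still need something to build the remote manifest, do the transfers, and coordinate the rollout across servers.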

Of course, a lot of larger organizations (or those who don’t want to think about separating out their content) effectively need to come up with a replacement for NFS.  I’ll get into that in another post, but the short answer is Gluster.