Hacker News

Get off NFS while you still can. Thank me later.


I think NFS often gets an undue bad reputation. I work at a company which uses NFS at scale, and it does the job without too much trouble. NFS is used as the storage for VMware datastores, Xen primary storage, and also for shared storage mounted between servers.

For the latter case, mounting these partitions can be automated with config management on the Linux servers. You have to be careful with UIDs and GIDs, but config management helps with this too.
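The UID/GID pitfall can be checked mechanically before mounting. A minimal sketch of the kind of check a config-management run might perform (the account names and numeric IDs here are made up for illustration, not from the post):

```python
# Hypothetical pre-mount check: every client of a shared NFS export must
# resolve the same names to the same numeric UIDs/GIDs, or files appear
# owned by the wrong user on some hosts.
import grp
import pwd

def check_ids(expected_users, expected_groups):
    """Return a list of (kind, name, actual, expected) mismatches."""
    mismatches = []
    for name, uid in expected_users.items():
        try:
            actual = pwd.getpwnam(name).pw_uid
        except KeyError:
            actual = None  # account missing entirely on this host
        if actual != uid:
            mismatches.append(("user", name, actual, uid))
    for name, gid in expected_groups.items():
        try:
            actual = grp.getgrnam(name).gr_gid
        except KeyError:
            actual = None
        if actual != gid:
            mismatches.append(("group", name, actual, gid))
    return mismatches

# e.g. run check_ids({"appuser": 1500}, {"appgroup": 1500}) on each client
# and refuse to mount if the result is non-empty.
```

An empty result means the host's ID mappings match what the export expects; anything else is a mount you probably don't want to make.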

The filers supplying the NFS storage can also provide replication to other datacenters, snapshots, and redundancy with multiple heads serving the volumes.

In the past I've used Fibre Channel (found it overly complex) and iSCSI. iSCSI was fairly straightforward to use, though I've never tried to automate it. I guess there isn't a reason you couldn't, however. In terms of complexity, it's Fibre Channel > iSCSI > NFS.

Performance-wise we don't have any issues with NFS itself; the bottleneck is sometimes the filer trying to keep up :-)

Anyhow, in complex environments it's sometimes good to keep things simple where you can. NFS helps with that: it's stable, scalable, and its performance is comparable to iSCSI.

Removing the need for shared storage on the OS where possible is the ultimate aim though.


I wonder how much the experience differs based on the NFS version being used.


Agreed. I've run Oracle with thousands of commits/sec on NFS with no problems. Or no more problems than we'd have had on any storage.


Yep. I work for a big company with a bureaucratic system engineering department that puts all program code on NFS (SANs).

It's been the cause of virtually all of our service outages and many of our performance problems---and it's completely obsolete in the era of Jenkins and Ansible.


It sounds like they are using NFS for distributing their code, which should be almost entirely read-only - the only place that writes new files to the file system is the single Jenkins machine that manages the deploys. It seems likely to me that this would avoid most of the risks inherent in running NFS at scale.


Let's say you distribute a binary over NFS -- compile some C or C++ or whatever you like. Then various hosts run /path/to/binary. At some point, the NFS server changes that file out from underneath those hosts, because, well, it can. The usual "text file busy" you'd get when trying to do that on a local filesystem never happens.

At some point after that, a host running the binary will try to page in part of that binary, take a SIGBUS, and die.

That's just one of many failure modes.
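The overwrite-in-place hazard above can be reproduced locally with plain POSIX file semantics (no NFS required; NFS adds client caching on top, but the inode behaviour is the same). A sketch with illustrative paths and version strings, contrasting in-place rewrite with the write-new-then-rename pattern that avoids the problem:

```python
import os
import tempfile

# A reader holding an open file stands in for a host that already
# loaded/mapped the binary; the writer stands in for the deploy.
d = tempfile.mkdtemp()
path = os.path.join(d, "binary")

with open(path, "w") as f:
    f.write("v1")
reader = open(path)

# In-place rewrite: the file changes out from under the reader --
# the analogue of the NFS server swapping pages under a running process.
with open(path, "w") as f:
    f.write("v2")
seen_after_rewrite = reader.read()   # the old content is gone
reader.close()

with open(path, "w") as f:
    f.write("v1")
reader = open(path)

# Atomic replace: write a sibling file, then rename it over the old name.
# The reader still holds the old inode and keeps seeing the old bytes.
tmp = path + ".new"
with open(tmp, "w") as f:
    f.write("v3")
os.replace(tmp, path)
seen_after_rename = reader.read()
reader.close()

print(seen_after_rewrite, seen_after_rename)  # v2 v1
```

This is why a deploy that only ever writes new files (and flips a name or symlink) sidesteps the SIGBUS scenario: running processes keep their old inode until they exit.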


They say they put out a new package; I take that to imply they don't rewrite files once deployed.

But what you're describing has interesting failure modes indeed - from just successfully patching the running process, to a SIGSEGV, to the target process ending up in an infinite loop (BTST).


That's not the only issue with running NFS behind a horizontally scaled web cluster; stuff like this starts to hurt: http://www.serverphorums.com/read.php?7,655118


That is something we're working on at the moment. It is read-only from the web servers' perspective, but it's definitely not without its problems.


Looks like a large distributed company...I have some experience with casinos...similar...

I'd be interested to know more about your experiences with an NFS "share"...which he mentions...?

Thanks!



