
My "restarter process" is upstart. It's convenient, since the OOM-killer tries to not kill init (for bad things happen when you kill init), so it's a somewhat-safe place to put supervisory logic. One of the better calls Canonical has made, I think. :)

Still, in your use-case, I'd definitely recommend only letting users run their "wild code" inside a memory cgroup+process namespace (e.g. an LXC container.)

Crash-only systems only work when a faulty component crashes itself before it crashes you. Processes that can be modelled as mutually untrustworthy agents should always have a failure boundary drawn between them. (User A shouldn't be able to bring down the cluster-agent; but they shouldn't be able to snipe user B's job by OOMing their job on the same cluster node, either.) And on a Unix box, the only true failure boundaries are jails/zones/containers; nothing else really stops a user from using up any number of rarely-considered resources (file descriptors, PIDs, etc.)
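To make the "not-oft-considered resources" point concrete, here is a minimal sketch of the weaker, per-process alternative: lowering `setrlimit` limits on a child before it execs. It assumes Linux and Python's standard `resource`/`subprocess` modules; `run_untrusted` and `limit_child` are hypothetical names. This is not a failure boundary in the sense above: rlimits apply per process (RLIMIT_NPROC even counts per user, not per job), so a user can still fork around them, which is exactly why the comment recommends cgroups plus namespaces instead.

```python
import resource
import subprocess
import sys


def limit_child(max_fds: int, max_cpu_seconds: int):
    """Build a preexec_fn that lowers rlimits in the forked child before exec."""
    def apply():
        # Cap open file descriptors (soft and hard); lowering is allowed unprivileged.
        resource.setrlimit(resource.RLIMIT_NOFILE, (max_fds, max_fds))
        # Cap CPU seconds; the kernel signals the process once this is exceeded.
        resource.setrlimit(resource.RLIMIT_CPU, (max_cpu_seconds, max_cpu_seconds))
    return apply


def run_untrusted(argv, max_fds=64, max_cpu_seconds=5):
    """Run argv with per-process limits. NOT a real isolation boundary."""
    # Note: preexec_fn is unsafe in multithreaded parents; fine for a small sketch.
    return subprocess.run(
        argv,
        preexec_fn=limit_child(max_fds, max_cpu_seconds),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # The child reports its own soft file-descriptor limit.
    result = run_untrusted(
        [sys.executable, "-c",
         "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"],
        max_fds=64,
    )
    print(result.stdout.strip())  # prints "64"
```

A job that leaks descriptors now hits EMFILE instead of starving its neighbours, but nothing here limits memory across a process tree or stops PID exhaustion by a forking user; that still takes a memory cgroup and a PID namespace (e.g. an LXC container).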



Do you have any good resources on how to get started setting up failure boundaries/jails/zones/containers like this properly?

I think it's surprisingly easy to get yourself in the situation where this is a concern for you[0] but you don't know how to solve it.

[0] Just run "adduser" and have SSH running, or just create an upstart job, or write a custom daemon that accepts and executes jobs from not-quite-trustworthy-undergrads, or...


If you are running Ubuntu, docker.io makes it pretty easy to create and maintain LXC containers.



