HaaS Fast Deployment

Adding features such as fast provisioning by running HaaS on top of HaaS

Project logistics

Preferred past experience

Some experience in one or more of the following is preferred in possible team members:

What is HaaS?

We're developing Hardware as a Service (HaaS) to be a new fundamental layer in cloud datacenters. In a HaaS-enabled datacenter, researchers and developers can independently stand up services on bare hardware, with guaranteed isolation between them. They can scale their deployments by allocating and freeing hardware nodes, without having to change the physical setup of the datacenter. HaaS allows bare-metal machines to be used much like virtual machines in the cloud.

Why do we need Recursive HaaS?

After new nodes are allocated, software must be deployed onto them. Currently, the best deployment options we have involve PXE-booting the nodes into an install image, which is slow: the BIOS POST alone, the first part of this process, can take 5 or 10 minutes. A faster deployment method is key to letting users scale their applications up more rapidly.

Although it is not acceptable in a general environment, users who have some level of trust in each other could use kexec to move from a running Linux kernel to a pre-boot environment without going through a full hardware reboot. As long as the different users of the HaaS agree to leave machines in a state that is ready to kexec, this could be used to deploy quickly to machines allocated with the HaaS. Massively parallel jobs could then run for 10 or 20 seconds across thousands of nodes to accomplish tasks like high-resolution rendering or other real-time tasks.
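As a rough illustration of the kexec step described above, the following Python sketch builds the two kexec-tools invocations involved: one to load a new kernel and initrd into memory, and one to jump into it. The kernel path, initrd path, and kernel command line are hypothetical placeholders, not values used by the project, and the function only constructs the commands rather than running them.

```python
import shlex


def build_kexec_commands(kernel, initrd, cmdline):
    """Return the two kexec-tools invocations for a fast reboot, as argv lists.

    The first command stages the new kernel in memory; the second jumps into
    it, skipping the firmware (BIOS POST) stage of a normal reboot.
    """
    load = [
        "kexec",
        "--load", kernel,
        f"--initrd={initrd}",
        f"--command-line={cmdline}",
    ]
    # Note: `kexec --exec` switches kernels immediately and does not run a
    # clean shutdown first, so a real deployment would stop services and
    # sync disks before this step.
    execute = ["kexec", "--exec"]
    return [load, execute]


if __name__ == "__main__":
    # Hypothetical paths and command line, for illustration only.
    commands = build_kexec_commands(
        "/boot/vmlinuz-installer",
        "/boot/initrd-installer.img",
        "console=ttyS0",
    )
    for argv in commands:
        print(shlex.join(argv))
```

Running the commands for real requires root privileges and the kexec-tools package on the node; the sketch prints them so they can be inspected first.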

Since HaaS is a small, trusted layer in the datacenter that is used for both production and research purposes, HaaS needs to remain stable. Thus, new features need to be well tested before they can be rolled out, similar to the philosophy of Red Hat Enterprise Linux (RHEL). Rather than immediately merging freshly developed functionality like fast provisioning into the stable datacenter HaaS layer, it would make more sense to have another HaaS into which these changes could be made, similar to how Fedora is the more up-to-date counterpart to RHEL. It is with this in mind that the other major part of this project, Recursive HaaS, comes into play.

Users or groups of users could run their own HaaS on top of the datacenter's base HaaS: the derived HaaS's nodes would be allocated from the base HaaS's nodes, the derived HaaS's networks would be set up by the base HaaS, and so forth. But users of the derived HaaS would have an agreement with each other that users of the base HaaS do not have: when freeing nodes, they leave them in a state ready to be kexec'd. This allows users of the derived HaaS to scale up their deployments much more quickly.
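The relationship between a base and a derived HaaS can be sketched with a toy model. This is not the HaaS API: the class names, method names, and the "pxe"/"kexec" labels are all hypothetical, chosen only to illustrate the idea that a derived HaaS draws its nodes from the base HaaS and that nodes freed inside it are left kexec-ready by agreement.

```python
class BaseHaaS:
    """Toy stand-in for the datacenter's base HaaS: a pool of free node names."""

    def __init__(self, nodes):
        self.free_nodes = list(nodes)

    def allocate_node(self):
        if not self.free_nodes:
            raise RuntimeError("base HaaS has no free nodes")
        return self.free_nodes.pop(0)

    def free_node(self, node):
        self.free_nodes.append(node)


class DerivedHaaS:
    """Toy derived HaaS whose node pool is allocated from a base HaaS."""

    def __init__(self, base, node_count):
        self.base = base
        # Maps node name -> True if the node was left in a kexec-ready state.
        # Nodes fresh from the base HaaS carry no such guarantee.
        self.free_nodes = {base.allocate_node(): False for _ in range(node_count)}

    def allocate_node(self):
        """Return (node, method), preferring nodes that can be kexec'd."""
        if not self.free_nodes:
            raise RuntimeError("derived HaaS has no free nodes")
        node = next((n for n, ready in self.free_nodes.items() if ready), None)
        if node is None:
            node = next(iter(self.free_nodes))
        ready = self.free_nodes.pop(node)
        return node, ("kexec" if ready else "pxe")

    def free_node(self, node):
        # Users of the derived HaaS agree to leave freed nodes kexec-ready.
        self.free_nodes[node] = True

    def return_to_base(self, node):
        """Hand a free node back to the base HaaS, which makes no kexec promise."""
        del self.free_nodes[node]
        self.base.free_node(node)


if __name__ == "__main__":
    base = BaseHaaS(["node1", "node2", "node3"])
    derived = DerivedHaaS(base, node_count=2)
    node, method = derived.allocate_node()
    print(node, method)
    derived.free_node(node)
    print(derived.allocate_node())
```

In this model, the first allocation of a node falls back to a full PXE boot, while a node that has already been used and freed within the derived HaaS can be redeployed with kexec.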

Fast provisioning is just one example of what recursive HaaS makes possible. We see a recursive HaaS implementation enabling a number of interesting research projects.

Possible Goals

Some Technologies Expected To Be Learned/Used