Why Oxide Chose Illumos
The Oxide Rack will use KVM or bhyve as the VMM, considering Rust for system programming. Key features include live migration, security measures, and strong isolation for enhanced reliability.
The document discusses the design and implementation considerations for the Oxide Rack's host operating system and hypervisor. It outlines the hardware and software components that will form the backbone of compute and storage services, emphasizing the importance of selecting an appropriate Virtual Machine Monitor (VMM). The primary candidates for the VMM are KVM on GNU/Linux and bhyve on illumos, with KVM being favored for its feature richness and support for modern virtualization needs. The document also highlights the potential of using Rust for system programming, advocating for its safety and performance benefits over traditional C. It notes the necessity of supporting popular guest operating systems like Linux and Windows, and outlines essential features such as live migration, security measures against microarchitecture attacks, and the need for robust out-of-band management. The document concludes by emphasizing the importance of isolation and sandboxing to enhance security and reliability in the virtualized environment.
- The Oxide Rack will utilize KVM on GNU/Linux or bhyve on illumos as the VMM.
- Rust is being considered for system programming due to its safety and performance advantages.
- Essential features for the Oxide Rack include live migration, security against microarchitecture attacks, and support for Linux and Windows guests.
- Strong isolation and sandboxing facilities are necessary to mitigate potential security vulnerabilities.
- The design aims to balance performance, safety, and maintainability in the virtualized environment.
Related
Xen Project in peril as colo provider housing test platform closes
The Xen Project faces challenges due to the closure of its colocation provider's data center, impacting crucial testing tools. Options like relocation and system switch are considered amid concerns about maintaining testing capabilities. Shutdown risks reduced bug detection and slower responses, affecting development efforts.
Lennart Poettering: Fitting Everything Together
The blog post explores integrating systemd components for Linux OS development, emphasizing hermetic /usr/ design, image-based OS with security features, self-updating systems, and community-driven desktop OS with advanced security measures.
Rust for Filesystems
At the 2024 Linux Summit, Wedson Almeida Filho and Kent Overstreet explored Rust for Linux filesystems. Rust's safety features offer benefits for kernel development, despite concerns about compatibility and adoption challenges.
Oxide: Control plane data storage requirements
The document specifies requirements for a control plane data storage system for Oxide, highlighting the need for high availability, scalability, security, and a thorough evaluation of NewSQL technologies.
Rust in Illumos
The article discusses the integration of Rust into the illumos operating system, highlighting development challenges, the need for community collaboration, and inviting contributions for userland tools and driver development.
{{citation needed}}?
When I ran the numbers in 2019, there hadn't been guest exploitable vulnerabilities that affected devices normally used for IaaS for 3 years. Pretty much every cloud outside the big three (AWS, GCE, Azure) runs on QEMU.
Here's a talk I gave about it that includes that analysis:
slides - https://kvm-forum.qemu.org/2019/kvmforum19-bloat.pdf
Even if you think it's a foregone conclusion given the history of bcantrill and other founders of Oxide, there absolutely is value in putting the decision to paper and trying to provide a rationale, because then it can be challenged.
The company I co-founded does an RFD process as well, and even if there is a 99% chance that we're going to use the thing we've always used, if you're a serious person, the act of expressing it is useful and sometimes you even change your own mind thanks to the process.
Bryan Cantrill is CTO of Oxide [1].
I assume that has no bearing on the choice, otherwise it would be mentioned in the discussion.
[1] https://bcantrill.dtrace.org/2019/12/02/the-soul-of-a-new-co...
Given all of that, and taking into account building a product on top of it, and thus needing to support it and stand behind it, Linux wasn't the best choice. Looking ahead (in terms of decades) and not just shipping a product now, it was found that an alternate ecosystem existed to support that.
Culture of the community, design principles, maintainability are all things to consider beyond just "is it popular".
Exciting times in computing once again!
1. Xen Type-1 hypervisor is smaller than KVM/QEMU.
2. Xen "dom0" = Linux/FreeBSD/OpenSolaris. KVM/bhyve also need host OS.
3. AMZN KVM-subset: x86 cpu/mem virt, blk/net via Arm Nitro hardware.
4. bhyve is Type-2.
5. Xen has Type-2 (uXen).
6. Xen dom0/host can be disaggregated (Hyperlaunch), unlike KVM.
7. pKVM (Arm/Android) is smaller than KVM/Xen.
> The Service Management Facility (SMF) is responsible for the supervision of services under illumos.. a [Linux] robust infrastructure product would likely end up using few if any of the components provided by the systemd project, despite there now being something like a hundred of them. Instead, more traditional components would need to be revived, or thoroughly bespoke software would need to be developed, in order to avoid the technological and political issues with this increasingly dominant force in the Linux ecosystem.
Is this an argument for Illumos over Linux, or for translating SMF to Linux?
curious about what bugs are being thought of there. Sounds like a very interesting situation to be in
It's a small operation, but https://openbsd.amsterdam/ have absolutely proven that OpenBSD's hypervisor is production-capable in terms of stability - but there are indeed other problems that rule against it on scale.
For those who are unfamiliar with OpenBSD: the primary caveat is that its hypervisor can so far only provide guests with a single CPU core.
If I were Oxide, though, I’d be sprinting to seamless VMWare support. Broadcom has turned into a modern-day Oracle (but dumber??) and many customers will migrate in the next two years. Even if those legacy VMs aren’t “hyperscale”, there’s going to be lots of budget devoted to moving off VMWare.
Linux has a massive advantage when it comes to hardware support for all kinds of esoteric devices. If you don't need that, and you've got engineers who are capable of patching the OS to support your hardware, yep, have at it. Good call.
* https://www.youtube.com/watch?v=UvEKSqBBcZw
Certainly they already had experience with ZFS (as it is built into Illumos/Solaris), but as it was told to them by someone they trusted who ran a lot of Ceph: "Ceph is operated, not shipped [like ZFS]".
There's more care-and-feeding required for it, and they probably don't want that, as they want to treat the product in a more appliance/toaster-like fashion.
> To mitigate all this, we’re intending to stick with the OSS build, which includes no CCL code.
[0] https://news.ycombinator.com/item?id=41256222
[1] https://rfd.shared.oxide.computer/rfd/0110
It is so sad that we've ended up with designs where this is the case. There is no intrinsic reason why nested virtualization should be hard to implement or should perform poorly. Path dependence strikes again.
That's with virtio, the virtual intel "card" is even slower.
They went with Illumos though, so curious if the poor performance is a FreeBSD-specific thing.
I know NetApp (stack based on FreeBSD) contributed significantly to Bhyve when they were exploring options to virtualize Data ONTAP (C mode)
https://forums.freebsd.org/threads/bhyve-the-freebsd-hypervi...
The section about Rust as a first class citizen seems to contain references to its potential use in Linux that are a few years out of date, with nothing more current than 2021.
> As of March 2021, work on a prototype for writing Linux drivers in Rust is happening in the linux-next tree.
Bryan Cantrill, ex-Sun dev, ex-Joyent CTO, now CTO of Oxide, is the reason they chose Illumos. Oxide is primarily an attempt to give Solaris (albeit Rustified) a second life, similar to Joyent before. The company even cites Sun co-founder Scott McNealy for its principles:
https://oxide.computer/principles
>"Kick butt, have fun, don't cheat, love our customers and change computing forever."
>If this sounds familiar, it's because it's essentially Scott McNealy's coda for Sun Microsystems.
Frankly I don't understand why they blogged that at all. It reeks of desperation, like they feel they need to defend their choice. They don't.
It also should not matter to their customers. They get exposed APIs and don't have to care about the implementation details.