July 25th, 2024

Unfashionably secure: why we use isolated VMs

Thinkst Canary's security architecture uses isolated virtual machines for each customer, enhancing data security and compliance while incurring higher operational costs and requiring strong configuration management skills.

Thinkst Canary employs a unique security architecture that emphasizes complete customer isolation through the use of isolated virtual machines (VMs). Unlike many cloud-managed services that utilize multi-tenant environments, Canary assigns each customer their own Console, ensuring that data remains separate and secure. This design choice mitigates risks associated with unauthorized access and data breaches, which are common in shared environments. The architecture consists of various services, all contained within individual AWS EC2 instances for each customer, which simplifies monitoring and performance assessment.

While this approach may lack the trendy appeal of modern cloud-native technologies, it offers significant security benefits. The reliance on AWS's hypervisor provides a robust security boundary, limiting the impact of potential vulnerabilities. Additionally, operational issues are confined to individual customers, enhancing reliability and compliance with regulatory requirements. The isolated VM model also facilitates easier geographic data management and staged rollouts of new features.

However, this architecture incurs higher operational costs and demands strong configuration management skills, as maintaining thousands of instances can be complex. Custom monitoring solutions are necessary to ensure the health of these instances, as existing AWS tools may not meet all requirements. Despite these challenges, the benefits of enhanced security and customer-focused service delivery make the isolated VM approach a strategic choice for Thinkst Canary.
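
The article does not publish Thinkst's tooling, but the shape of the pattern is easy to sketch. Below is a hypothetical boto3 example of one dedicated, tagged EC2 instance per customer plus a trivial health sweep; the AMI ID, instance type, and tag names are all invented:

```python
# Hypothetical sketch of the pattern (not Thinkst's actual tooling):
# one dedicated, tagged EC2 instance per customer, plus a trivial
# health sweep. The AMI ID, instance type, and tag names are invented.
import boto3

ec2 = boto3.resource("ec2", region_name="eu-west-1")

def provision_console(customer_id: str):
    """Launch a dedicated Console instance for a single customer."""
    instance = ec2.create_instances(
        ImageId="ami-0123456789abcdef0",  # pre-baked Console image (placeholder)
        InstanceType="t3.small",
        MinCount=1,
        MaxCount=1,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "customer", "Value": customer_id}],
        }],
    )[0]
    return instance

def unhealthy_consoles():
    """List per-customer instances that are not in a running state."""
    tagged = ec2.instances.filter(
        Filters=[{"Name": "tag-key", "Values": ["customer"]}]
    )
    return [i.id for i in tagged if i.state["Name"] != "running"]
```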

AI: What people are saying
The discussion around Thinkst Canary's use of isolated virtual machines (VMs) reveals several key themes regarding virtualization and security architecture.
  • There is a divide between proponents of VMs for security and those who advocate for containerization, with some arguing that VMs are overused and inefficient.
  • Concerns about the operational costs and resource management of using VMs versus containers are prevalent, with some suggesting alternatives like Kubernetes namespaces for customer isolation.
  • Many commenters emphasize the importance of data security and isolation, questioning the effectiveness of current multi-tenant architectures.
  • Some participants highlight the need for better orchestration and management tools for VMs, particularly in open-source environments.
  • There is a general consensus that while VMs provide strong isolation, they come with trade-offs in terms of resource efficiency and operational complexity.
30 comments
By @PedroBatista - 6 months
As a permanent "out of style" curmudgeon for the last ~15 years, I like that people are discovering that VMs may in fact be the best approach for a lot of workloads, and that the LXC cottage industry and Docker industrial complex (built around solving problems they created themselves, or problems solved decades ago) might need to take a hike.

Modern "containers" were invented to make things more reproducible ( check ) and simplify dev and deployments ( NOT check ).

Personally, FreeBSD Jails / Solaris Zones are the thing I like to dream about: pretty much as secure as a VM and a perfect fit for a sane dev and ops workflow. I haven't dug too deep into this in practice; maybe I'm afraid to learn the contrary, but I hope not.

Either way, Docker is "fine" but WAY overused and overrated IMO.

By @ploxiln - 6 months
> we operate in networks where outbound MQTT and HTTPS is simply not allowed (which is why we rely on encrypted DNS traffic for device-to-Console communication)

HTTPS is not allowed (locked down for security!), so communication is smuggled over DNS? uhh ... I suspect that a lot of what the customer "security" departments do doesn't really make sense ...
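
For readers unfamiliar with the trick being criticized, data-over-DNS is conceptually simple. This dnspython sketch (not Canary's actual protocol; the zone is invented) shows the general shape:

```python
# General shape of data-over-DNS, not Canary's actual protocol: the
# payload rides in a subdomain label of a query and the reply comes
# back as a TXT record. The zone is invented; payloads must stay small
# (a single DNS label tops out at 63 characters).
import base64
import dns.resolver  # pip install dnspython

def dns_exchange(payload: bytes, zone: str = "c2.example.com"):
    label = base64.b32encode(payload).decode().rstrip("=").lower()
    answer = dns.resolver.resolve(f"{label}.{zone}", "TXT")
    return [rdata.to_text() for rdata in answer]
```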

By @tptacek - 6 months
The cool kids have been combining containers and hardware virtualization for something like 10 years now (back to QEMU-Lite and kvmtool). Don't use containers if the abstraction gets in your way, of course, but if they work for you --- as a mechanism for packaging and shipping software and coordinating deployments --- there's no reason you need to roll all the way back to individually managed EC2 instances.

A short survey on this stuff:

https://fly.io/blog/sandboxing-and-workload-isolation/
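
For a concrete taste of that container/VM combination, here is a rough sketch of booting a Firecracker microVM from Python (kernel and rootfs paths are placeholders; it assumes the firecracker binary and /dev/kvm are available on the host):

```python
# Rough sketch of booting a Firecracker microVM from Python. Kernel and
# rootfs paths are placeholders; assumes the firecracker binary and
# /dev/kvm are available on the host.
import json
import subprocess
import tempfile

vm_config = {
    "boot-source": {
        "kernel_image_path": "/srv/vmlinux",        # placeholder
        "boot_args": "console=ttyS0 reboot=k panic=1",
    },
    "drives": [{
        "drive_id": "rootfs",
        "path_on_host": "/srv/rootfs.ext4",         # placeholder
        "is_root_device": True,
        "is_read_only": False,
    }],
    "machine-config": {"vcpu_count": 1, "mem_size_mib": 128},
}

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(vm_config, f)
    config_path = f.name

# --no-api skips the control socket and boots straight from the config file.
subprocess.run(["firecracker", "--no-api", "--config-file", config_path], check=True)
```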

By @bobbob1921 - 6 months
My big struggle with docker/containers vs VMs is the storage layer (on containers). I'm sure it's mostly lack of experience/knowledge on my end, but I never have a doubt or concern that my storage is persistent and clearly defined when using a VM-based workload. I can't say the same for my docker/container-based workloads; I'm always a tad concerned about the persistence of storage (or the resource management around storage). This becomes even more true once you deal with networked storage on both platforms.
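
For what it's worth, the closest Docker gets to a VM's virtual disk is a named volume declared up front. A small sketch with the Docker SDK for Python (the image, container, and volume names here are arbitrary):

```python
# A named volume, declared up front, survives container removal, much
# like a VM's virtual disk. Sketch using the Docker SDK for Python
# (pip install docker); names are arbitrary.
import docker

client = docker.from_env()

client.volumes.create(name="pgdata")  # persists independently of any container

client.containers.run(
    "postgres:16",
    name="db",
    detach=True,
    environment={"POSTGRES_PASSWORD": "example"},
    # Explicit mount: the data directory lives in the named volume.
    volumes={"pgdata": {"bind": "/var/lib/postgresql/data", "mode": "rw"}},
)
```
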
By @stacktrust - 6 months
A modern virtualization architecture can be found in the OSS pKVM L0 nested hypervisor for Android Virtualization Framework, which has some architectural overlap with HP/Bromium AX L0 + [Hyper-V | KVM | Xen] L1 + uXen L2 micro-VMs with copy-on-write memory.

A Bromium demo circa 2014 was a web browser where every tab was an isolated VM, and every HTTP request was an isolated VM. Hundreds of VMs could be launched in a couple of hundred milliseconds. Firecracker has some overlap.

> Lastly, this approach is almost certainly more expensive. Our instances sit idle for the most part and we pay EC2 a pretty penny for the privilege.

With many near-idle server VMs running identical code for each customer, there may be an opportunity to use copy-on-memory-write VMs with fast restore of unique memory state, using the techniques employed in live migration.

Xen/uXen/AX: https://www.platformsecuritysummit.com/2018/speaker/pratt/

pKVM: https://www.youtube.com/watch?v=9npebeVFbFw

By @mikewarot - 6 months
It's nice to see the Principle Of Least Access (POLA) in practical use. Some day, we'll have operating systems that respect it as well.

As more people wake up to the realization that we shouldn't trust code, I expect that the number of civilization wide outages will decrease.

Working in the cloud, they're not going to be able to use my other favorite security tool, the data diode, which can positively guarantee no ingress of control while still allowing egress of reporting data.

By @fsckboy - 6 months
Just as a meta idea, I'm mystified that systems folks find it impossible to create protected-mode operating systems that are actually protected, so we all engage in wasteful kluges like VMs.

I'm not anti-VM, they're great technology, I just don't think they should be the only way to get protection. VMs are incredibly inefficient... what's that you say, they're not? OK, then why aren't they integrated into protected-mode OSes so that those will actually be protected?

By @jonathanlydall - 6 months
Sure, it’s an option which eliminates the possibility of certain types of errors, but it’s costing you the ability to pool computing resources as efficiently as you could have with a multi-tenant approach.

The author did acknowledge it’s a trade off, but the economics of this trade off may or may not make sense depending on how much you need to charge your customers to remain competitive with competing offerings.

By @vin10 - 6 months
> If you wouldn't trust running it on your host, you probably shouldn't run it in a container as well.

- From a Docker/Moby Maintainer

By @ianpurton - 6 months
I've solved the same problem but used Kubernetes namespaces instead.

Each customer gets their own namespace; each namespace is locked down in terms of networking, and I deploy Postgres into each one using the Postgres operator.

I've built an operator for my app, so deploying the app into a namespace is as simple as deploying the manifest.
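
The commenter doesn't share code, but a minimal sketch of the namespace-per-customer pattern with the official Kubernetes Python client might look like this (a default-deny NetworkPolicy standing in for "locked down in terms of networking"; names are illustrative):

```python
# Minimal sketch of the namespace-per-customer pattern with the official
# Kubernetes Python client (pip install kubernetes). The default-deny
# NetworkPolicy stands in for "locked down in terms of networking";
# names are illustrative.
from kubernetes import client, config

config.load_kube_config()

def provision_tenant(customer: str):
    ns = f"tenant-{customer}"
    client.CoreV1Api().create_namespace(
        client.V1Namespace(metadata=client.V1ObjectMeta(name=ns))
    )
    # Deny all ingress and egress until something is explicitly allowed.
    client.NetworkingV1Api().create_namespaced_network_policy(
        namespace=ns,
        body=client.V1NetworkPolicy(
            metadata=client.V1ObjectMeta(name="default-deny"),
            spec=client.V1NetworkPolicySpec(
                pod_selector=client.V1LabelSelector(),  # selects every pod
                policy_types=["Ingress", "Egress"],
            ),
        ),
    )
```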

By @jefurii - 6 months
Using VMs as the unit allows them to move to another provider if they need to. They could even move to something like an on-prem Oxide rack if they wanted. [Yes I know, TFA lists this as a "false benefit" i.e. something they think doesn't benefit them.]

By @smitty1e - 6 months
> Switching to another provider would be non-trivial, and I don’t see the VM as a real benefit in this regard. The barrier to switching is still incredibly high.

This point is made in the context of VM bits, but that switching cost could (in theory, haven't done it myself) be mitigated using, e.g. Terraform.

The brace-for-shock barrier at the enterprise level is going to be exfiltrating all of that valuable data. Bezos is running a Hotel California for that data: "You can checkout any time you like, but you can never leave" (easily).

By @SunlitCat - 6 months
VMs are awesome for what they can offer. Docker (and the like) is kind of a lean VM for a specific tool scenario.

What I would like to see is more app virtualization software that isolates the app from the underlying OS enough to provide a safe enough cage for it.

I know there are some commercial offerings out there (and a free one), but maybe someone who has opinions about them, or knows some additional ones, can chime in?

By @er4hn - 6 months
One thing I wasn't able to grok from the article is orchestration of VMs. Are they using AWS to manage the VM lifecycles, restart them, etc?

Last time I looked into this for on-prem, the solutions seemed very enterprise, pay-the-big-bux focused. There's not a lot in the OSS space. What do people use for on-prem VM orchestration that is OSS?

By @JohnCClarke - 6 months
Question: could you get the customer isolation by running all console access through customer-specific lambdas which simply add a unique (and secret) header to all requests? Then you can run a single database with sets of tables keyed by that secret header value.

Would give you very nearly as good isolation for much lower cost.
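
A minimal sketch of the idea makes the trade visible: isolation now rests on header secrecy and on every query remembering its WHERE clause. Flask and SQLite stand in for the real stack here, and the header name is invented:

```python
# Sketch of the suggestion: an upstream per-customer Lambda injects a
# secret header, and every query is scoped by it. Flask and SQLite
# stand in for the real stack; the header name is invented. Note the
# trade: one forgotten WHERE clause mixes tenants.
import sqlite3
from flask import Flask, abort, jsonify, request

app = Flask(__name__)
db = sqlite3.connect("consoles.db", check_same_thread=False)

@app.route("/alerts")
def alerts():
    tenant = request.headers.get("X-Tenant-Key")
    if not tenant:
        abort(403)
    rows = db.execute(
        "SELECT id, message FROM alerts WHERE tenant_key = ?", (tenant,)
    ).fetchall()
    return jsonify(rows)
```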

By @osigurdson - 6 months
When thinking about multi-tenancy, remember that your bank doesn't have a special VM or container, just for you.

By @sim7c00 - 6 months
I wish nanoVMs were better. It's a cool concept, leveraging the actual VM extensions for security, but all the ones I've seen hardly get into user mode and don't have stack protectors or other trivial security features (SMAP/SMEP) enabled, making them super insecure anyway.

Maybe someday that market will boom a bit more, so we can run hypervisors with VMs in there that host single-application kinds of things. Like a BSD kernel that runs postgres as its init process or something (I know that's oversimplified, probably :P).

There's a lot of room in the VM space for improvement, but pretty much all of it is impossible if you need to load an entire multi-purpose, multi-user OS into the VM...

By @Melatonic - 6 months
Eventually we'll get a great system for managing some form of micro-VM that lots of people use and that has years of documentation and troubleshooting behind it.

Until then, the debate between VMs and containerisation will continue.

By @solatic - 6 months
There's nothing in Kubernetes and containers that prevents you from running single-tenant architectures (one tenant per namespace), or from colocating all single-tenant services on the same VM, and preventing multiple customers from sharing the same VM (pod affinity and anti-affinity).

I'm not sure why the author doesn't understand that he could have his cake and eat it too.
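
For reference, the scheduling half of that claim looks roughly like this: a sketch of the pod-spec fragment, expressed as a Python dict, where the "tenant" label key and "acme" value are illustrative:

```python
# Sketch of per-tenant VM pinning with pod (anti-)affinity. Assumes every
# pod carries a "tenant" label; keys and values here are illustrative.
pod_spec_fragment = {
    "affinity": {
        # Pull this tenant's pods onto the same node...
        "podAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [{
                "labelSelector": {"matchLabels": {"tenant": "acme"}},
                "topologyKey": "kubernetes.io/hostname",
            }],
        },
        # ...and keep every other tenant's pods off that node.
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [{
                "labelSelector": {
                    "matchExpressions": [{
                        "key": "tenant",
                        "operator": "NotIn",
                        "values": ["acme"],
                    }],
                },
                "topologyKey": "kubernetes.io/hostname",
            }],
        },
    },
}
```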

By @Havoc - 6 months
So you end up with thousands of near-idle AWS instances?

There has got to be a better middle ground. Like multi-tenant but with strong splits (each customer on its own db, etc.).
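
One hedged sketch of such a split: shared hardware, but a dedicated database and login role per customer, using psycopg2 (the connection string and identifiers are illustrative):

```python
# Sketch of the "strong splits" middle ground: shared hardware, but a
# dedicated database and role per customer. Uses psycopg2 (pip install
# psycopg2-binary); connection string and identifiers are illustrative.
import psycopg2
from psycopg2 import sql

admin = psycopg2.connect("dbname=postgres user=admin")
admin.autocommit = True  # CREATE DATABASE cannot run inside a transaction

def provision_tenant(customer: str, password: str):
    name = f"tenant_{customer}"
    with admin.cursor() as cur:
        # A login role and a database owned by it, per customer.
        cur.execute(
            sql.SQL("CREATE ROLE {} LOGIN PASSWORD %s").format(sql.Identifier(name)),
            (password,),
        )
        cur.execute(
            sql.SQL("CREATE DATABASE {} OWNER {}").format(
                sql.Identifier(name), sql.Identifier(name)
            )
        )
```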

By @coppsilgold - 6 months
If you think about it, virtualization is just a narrowing of the application-kernel interface. In a standard setting the application has a wide kernel interface available to it, dozens to hundreds of syscalls (unless restricted with e.g. seccomp), and a vulnerability in any one of them could result in full system compromise.

With virtualization the attack surface is narrowed to pretty much just the virtualization interface.
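
As a toy illustration of that narrowing inside an ordinary process, seccomp's strict mode shrinks the kernel interface to just four syscalls. This ctypes sketch is Linux-only, and since CPython issues syscalls of its own, the interpreter may die even earlier than the comments suggest:

```python
# Toy illustration of narrowing the application-kernel interface:
# seccomp strict mode leaves only read, write, _exit and sigreturn
# available, and any other syscall kills the process. Linux-only;
# CPython makes its own syscalls, so the interpreter may die sooner.
import ctypes

PR_SET_SECCOMP = 22
SECCOMP_MODE_STRICT = 1

libc = ctypes.CDLL("libc.so.6", use_errno=True)
if libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0:
    raise OSError(ctypes.get_errno(), "prctl(PR_SET_SECCOMP) failed")

print("still alive")   # write() is on the allow-list
open("/etc/hostname")  # openat() is not: the kernel kills us here
```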

The problem with current virtualization (or more specifically, the VMMs) is that it can be cumbersome; memory management, for example, is a serious annoyance. The kernel is built to hog memory for cache and the like, but you don't want the guest doing that, since you want to overcommit memory: guests will rarely use 100% of what is given to them (especially when the guest is just a jailed singular application). Workarounds such as free page reporting and drop_caches hacks exist.

I would expect eventually to see high-performance custom kernels for application jails. For example, gVisor[1] acts as a syscall interceptor (and can use KVM too!) paired with a custom kernel; or imagine a modified Linux kernel with the pain points patched for the guest.

In effect, what virtualization achieves is the ability to roll back much of the advantage of having an operating system in the first place, in exchange for securely isolating the workload. But because the workload expects an underlying operating system to serve it, one has to be provided. So now you have a host operating system, a guest operating system, and some narrow interface between the two to keep things from being a complete clown show. As you grow that interface to properly slave the guest to the host, to reduce resource consumption and gain more control, will you eventually end up reimagining the operating system, perhaps? Or come full circle to the BSD jail idea: imagine the host kernel having hooks into every guest kernel syscall. Is this not a BSD jail with extra steps?

[1] <https://gvisor.dev/>

By @JackSlateur - 6 months
Meh.

This can be boiled down to "we use AWS's built-in security, not our own". Using EC2 instances is then nothing but a choice. You could do the exact same thing with containers (with Fargate, perhaps?): one container per tenant, no relations between containers => same thing (but cheaper).
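
A rough boto3 sketch of that one-container-per-tenant variant on Fargate (the cluster, task definition, and subnet IDs are placeholders):

```python
# Rough boto3 sketch of the one-container-per-tenant variant on Fargate.
# Cluster, task definition, and subnet IDs are placeholders.
import boto3

ecs = boto3.client("ecs")

def launch_tenant_console(customer_id: str):
    return ecs.run_task(
        cluster="consoles",                              # placeholder
        taskDefinition="console:1",                      # placeholder
        launchType="FARGATE",
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"], # placeholder
                "assignPublicIp": "DISABLED",
            }
        },
        tags=[{"key": "customer", "value": customer_id}],
    )
```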

By @udev4096 - 6 months
> Nothing here will earn us a speaking invite to CNCF events

This made me laugh for some reason

By @Thaxll - 6 months
You could use a different node pool per customer while sharing the same k8s control plane.

By @kkfx - 6 months
The more stuff you add, the more attack surface you have. Virtualized infra is a commercial need, but from an IT and operations standpoint it is an OBSCENITY, definitively never safe in practice.
By @javier_e06 - 6 months
Months ago I went to the movie theater. With a $20.00 USD bill in my hand I asked the young one (yes, I am that old) for a medium popcorn. "Credit card only," he warned me. "You have to take cash," I reminded him. He directed me to the box office, where I had to purchase a $20 USD gift card, which I then used to purchase the popcorn. I never used the remaining balance. Management does not trust the crew of low-wage minions with cash; who would?

I had my popcorn, right? What is the complaint here?

If the network comes down, stores will have no choice but to hand out the food for free.

I am currently not troubleshooting my solutions. I am troubleshooting the VM.