> On premise in my opinion needs a dedicated team managing hardware and leverage...

apelapan · on Dec 10, 2024

If you are in the cloud, you are going to need a team that understands cloud networking, storage, deployment, security etc. You will need enough people to maintain support rotations and survive normal churn.

It seems like many people/organizations believed that they would be rid of the whole "operations problem" once they shifted all their workloads from on-prem to cloud. They believed that they paid a full team for running cables and replacing broken fans/hard drives/PSUs, when that aspect of on-prem is a tiny (but non-zero) amount of work.

movedx · on Dec 11, 2024

I don't believe a lot of this is required.

OS level security? So, "apt update && apt upgrade", then? I mean, what else are you doing, writing patches for the kernel? Checking every line of code that runs? Are you aware of how effective SELinux and systemd containers are? Just a simple firewall at the OS level? Maybe even just using Tailscale (or the open source Headscale) to introduce zero trust access capabilities.

There's a Terraform provider for Proxmox, which is an excellent hypervisor. Making a template takes less than an hour with configuration.

You do need an Ops person for sure, but an entire _team_?

hmmm-i-wonder · on Dec 11, 2024

>"apt update && apt upgrade",

Across 10k-100k+ servers, all running services and needing to orchestrate restarting across the whole fleet, while providing 0 downtime or impact to thousands of clients with terabytes of data being processed and analyzed at any given time.

Sure, what's so hard about changing a tire? Well, try to do it on an 18-wheeler while it's driving down the highway without any impact to its speed.
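To make the gap between the one-liner and the fleet version concrete, here is a minimal sketch of a rolling upgrade loop. Everything in it is an assumption for illustration: `rolling_upgrade` and the `drain`, `upgrade`, `healthy` and `restore` callbacks are hypothetical names, not part of any real tool, and a real fleet would also need retries, concurrency limits, rollback and observability on top.

```python
from typing import Callable, List, Optional, Tuple


def rolling_upgrade(
    hosts: List[str],
    batch_size: int,
    drain: Callable[[str], None],     # hypothetical: take host out of rotation
    upgrade: Callable[[str], None],   # hypothetical: e.g. run the package upgrade and restart
    healthy: Callable[[str], bool],   # hypothetical: post-upgrade health check
    restore: Callable[[str], None],   # hypothetical: put host back in rotation
) -> Tuple[List[str], Optional[str]]:
    """Upgrade hosts batch by batch and halt at the first unhealthy host.

    Returns (hosts upgraded and restored, first failing host or None).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    upgraded: List[str] = []
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]

        # Only this batch leaves rotation; the rest of the fleet keeps serving.
        for host in batch:
            drain(host)
        for host in batch:
            upgrade(host)

        for host in batch:
            if not healthy(host):
                # Stop the rollout. Hosts not yet restored in this batch stay
                # drained for a human to inspect; later batches are untouched.
                return upgraded, host
            restore(host)
            upgraded.append(host)

    return upgraded, None
```

Even this toy version has to decide on batch size, what "healthy" means, and what to do with a half-finished batch, which is where most of the actual work lives.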

> Are you aware of how effective SELinux and systemd containers are? Just a simple firewall at the OS level?

Part of a layered and in-depth system but one that introduces complexity.

>Maybe even just using Tailscale (or the open source Headscale) to introduce zero trust access capabilities.

Tailscale in an enterprise production environment? Not going to pass any sort of security audit, and it probably violates a number of certifications customers require at the enterprise level for network access controls, visibility and auditing.

Just managing the git/jenkins/spinnaker/terraform infrastructure in dozens of locations deploying to and maintaining tens of thousands of servers/pods requires a 24x7 team on top of the hundreds of teams and tens of thousands of devs using it.

If you're small enough that doesn't make sense, then you might be small enough one Ops person can handle the load (One is never enough if you're smart but...), but you are dealing with a very small amount of infrastructure and services at this point.

CRConrad · on Dec 20, 2024

> Across 10k-100k+ servers

If you "need" that many servers (and aren't Google), you've built your systems massively wrong.

hmmm-i-wonder · on Dec 10, 2024

Absolutely.

My issue is really at the other end of that scale: getting C-suites to recognize when owning that core competency actually benefits the company, even if it's not the company's focus.

I grew up around companies leveraging vertical integration at the right scales to improve costs. Seeing companies go the opposite direction, trading all those advantages for benefits that often never materialize, is... frustrating.