October 8th, 2024

Your CI pipeline isn't ready for AI

Morgante Pell criticized continuous integration systems for inefficiencies, noting that they slow down development and require complex configurations. He emphasized the need for improvements to support faster cycles, especially for AI-generated code.

In a recent reflection on the challenges of continuous integration (CI) systems, Morgante Pell expressed frustration with the inefficiencies of CI pipelines while developing an AI code generation agent. He noted that simple code changes often take longer to build, review, and deploy than to write, a sentiment echoed by many customers who experience brittle build pipelines. Pell highlighted that CI performance has not kept pace with local development environments, often resulting in slower builds. A significant issue is the repetitive work in pipelines, which can consume over 50% of compute cycles on redundant tasks, leading to increased costs and potential failures.

While various tools like Nx, Bazel, and Docker aim to address these inefficiencies through caching and task management, Pell found them ultimately unsatisfactory, as they require complex configurations that detract from a straightforward local development experience. He criticized the need to define build graphs and the redundancy of teaching tools what they already know, such as optimizing Dockerfile configurations.

Pell concluded that while he is not ready to abandon CI, there is a pressing need for improvements to enable faster development cycles, especially in the context of AI-generated code, to prevent backlogs of untested pull requests.

- CI pipelines often slow down development, taking longer to build and deploy than to write code.

- Many developers find their CI systems brittle and inefficient, leading to frustration.

- Existing tools for optimizing CI processes do not adequately address the fundamental issues.

- There is a need for improvements in CI to support faster development cycles, particularly with AI-generated code.

- The complexity of configuring CI systems can detract from the simplicity of local development setups.

8 comments
By @pocketarc - 4 months
> More troublingly, performance has not improved in CI at the same pace as on developer machines—it’s usually a lot slower to build our app in CI than it is to do it locally on my M1 laptop.

While some of the other comments around optimizing CI pipelines are solid, this whole thing seems to be due to having CI running on servers that are -worse- than a laptop. Isn't that wild? Servers weaker than laptops. Not even desktops or workstations. LAPTOPS.

And they are, because they're just cloud instances. And most cloud instances... are not fast.

Consider the idea that you could run your CI runner on an M1 laptop if you so choose. Setting up a self-hosted GH Actions runner (for example) is quite straightforward. It doesn't even need to be an internet-facing machine; it can be a spare machine sitting at home/office. $600 will get you a Mac mini with an M2 CPU and super-fast SSD; everything will build faster than it ever could on any generic CI build server.

By @skeptrune - 4 months
It's incredibly frustrating that LLMs still aren't useful for automating CI and IaC configs, despite all the hype.
By @dan_manges - 4 months
We're solving a lot of these problems with Mint: https://rwx.com/mint

Key differentiators:

* Content-based caching eliminates duplicate execution – only run what's relevant based on the changes being made

* Filters are applied before execution, ensuring that cache keys are reliable

* Steps are defined as a DAG, with machines abstracted away, for better performance, efficiency, and composition/reuse
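
As a rough sketch of the first two ideas (this is the general technique, not Mint's actual implementation; the file names, cache location, and filter rule below are invented for the example), content-based caching boils down to hashing a task's filtered inputs and skipping the task when that key has been produced before:

```python
import hashlib
import json
import pathlib
import subprocess

# Illustrative cache location; a real tool would use remote, shared storage.
CACHE_DIR = pathlib.Path(".task-cache")

def cache_key(input_files, command):
    """Key a task by its command plus the *content* of its inputs.

    Filtering happens before hashing: files that cannot affect the result
    (here, hypothetically, Markdown docs) are dropped, so the key stays
    stable across irrelevant edits and cache hits remain reliable.
    """
    h = hashlib.sha256()
    h.update(json.dumps(command).encode())
    for path in sorted(p for p in input_files if not p.endswith(".md")):
        h.update(path.encode())
        h.update(pathlib.Path(path).read_bytes())
    return h.hexdigest()

def run_cached(input_files, command):
    """Run `command` only if identical inputs have not already been run."""
    marker = CACHE_DIR / cache_key(input_files, command)
    if marker.exists():
        print(f"cache hit, skipping: {' '.join(command)}")
        return
    subprocess.run(command, check=True)
    CACHE_DIR.mkdir(exist_ok=True)
    marker.touch()

# Hypothetical usage: the test task re-runs only when source or tests change.
run_cached(["src/app.py", "tests/test_app.py"], ["pytest", "-q"])
```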

By @jononor - 4 months
At our company, our Machine Learning train+eval pipelines run in standard GitLab CI (in addition to all the standard backend/frontend software builds, and some IoT builds). We have some 4 small PCs at the office set up as runners for the compute-intensive jobs, so that each job gets multi-core CPUs with NVMe storage, not just vCPUs and virtualized storage. Each job execution is around 8x faster than on the standard GitLab CI runners. And much cheaper than dedicated compute at the standard cloud vendors. Hetzner would be similarly cheap, but I did not want to bother with remote management, another vendor, networking, etc.
By @mike_hearn - 4 months
There are some quick wins you can use to improve CI times and reliability. I use some of these and they do ease the pain. I have a company that develops a tool which is itself a build system and does complex and intensive builds as part of its testing process, so CI times are something I keep an eye on. These tips are mostly useful for JVM/.NET projects, I think. We use self-managed TeamCity, which makes this stuff easy.

1. Preserve checkout/build directories between builds. In other words, don't do clean builds. Let your build system do incremental builds and use its dependency caches as it would when running locally. This means not running builds in Docker containers, for instance (unless you take steps to keep them running).

2. Make sure your servers run behind caching HTTP proxies so if you do need to trigger a clean build downloads are properly cached and optimized.

3. Run builds on Macs! Yes, they are now much faster than other machines so if you can afford them and your codebase is portable enough, throw them into the mix and let high priority changes run on them instead of on slower Linux VMs. Apple silicon machines are a bit too new to be reaching obsolescence, but if you do have employees who give up "old" ARM machines then turn those into CI workers.

4. Ensure all build machines have fast SSDs.

5. Use dedicated machines for build workers, i.e. not cloud VMs, which are often over-subscribed. Or use a cloud that's good value for money and doesn't over-subscribe VMs, like Oracle's [1]. Dedicated machines in the big clouds can be expensive, but you can get cheaper, smaller machines elsewhere. Or just buy hardware and wire it up yourself in an office. It's not important for build machines to be HA. You always have the option of mixing machines and adding cloud VMs too if your load suddenly increases.

6. Use a build system that understands build graphs properly (i.e. not Maven) and modularize the codebase well. Most build systems can't eliminate redundant unit testing within a module, but can do so between modules, so finer grained modules + incremental builds can reduce the number of tests that are run for a given change.

7. Be judicious about what tests are run on every change. Do you really need to run a full blown end to end test on every commit? Probably not.
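
A rough sketch of what points 6 and 7 buy you (the modules and graph here are invented for illustration, not taken from any particular build tool): with a module-level dependency graph, a change only requires re-running tests for the changed modules and their transitive dependents.

```python
from collections import defaultdict

# Invented example graph: module -> modules it depends on.
DEPENDENCIES = {
    "core": [],
    "api": ["core"],
    "billing": ["core"],
    "web": ["api"],
}

def affected_modules(changed):
    """Return the changed modules plus everything that transitively depends on them."""
    # Invert the graph: module -> modules that depend on it directly.
    dependents = defaultdict(set)
    for module, deps in DEPENDENCIES.items():
        for dep in deps:
            dependents[dep].add(module)

    affected = set(changed)
    queue = list(changed)
    while queue:
        module = queue.pop()
        for dependent in dependents[module]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected

# A change to "api" means re-testing "api" and "web", but not "core" or "billing".
print(sorted(affected_modules({"api"})))  # ['api', 'web']
```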

Test times are definitely an area where we need some more fundamental R&D though. Integration testing is the highest value testing but it's also the type of test build systems struggle the most to optimize out, as figuring out what might have been broken by a change is too hard.

[1] Disclosure: I do some work for Oracle Labs, but I think this statement is true regardless.

By @rurban - 4 months
My pipeline is. GitHub self-hosted runners, running on the inference server with many NVIDIA GPUs. Pretty easy to set up.
By @jdlshore - 4 months
Not really about AI, but instead a complaint about the difficulty of optimizing build pipelines.
By @fire_lake - 4 months
Weird article. Bazel does exactly what the author wants. And it seems unrelated to AI.