---
title: "Amazon's AI Coding Bot Caused an AWS Outage"
date: 2026-02-20
author: "Digital Frontier"
draft: false
categories: ["Technical"]
tags: ["aws", "coding agents", "kiro", "infrastructure incidents", "ai safety"]
description: "Amazon's Kiro AI coding tool triggered an AWS outage in December after an engineer had broader permissions than expected."
summary: "An AI coding bot, Amazon's Kiro, caused an AWS service disruption in December. Amazon attributes the incident to user access control failures rather than AI autonomy, but the event highlights real risks of deploying agentic coding tools against production infrastructure."
article:
  type: "analysis"
  technologies: ["AWS", "Amazon Kiro", "Amazon Q Developer"]
keywords: ["aws outage", "kiro ai", "ai coding agent", "amazon kiro incident", "agentic coding risk", "ai infrastructure safety", "coding assistant production"]
---
In December, Amazon's AI coding assistant Kiro caused a disruption to an AWS service in parts of mainland China. Amazon described it as an "extremely limited event" affecting a single service. A second incident involving the earlier Amazon Q Developer tool did not impact any customer-facing AWS service.
Amazon's position: "In both instances, this was user error, not AI error." The engineer involved in the December incident had "broader permissions than expected": a user access control issue, not an AI autonomy issue. Kiro's default behavior requests authorization before taking any action, but the engineer had bypassed the normal two-person approval workflow.
Neither incident approached the severity of the 15-hour AWS outage in October 2025 that took down multiple customer applications, including ChatGPT.
## What Happened
Amazon's internal AI coding tools were treated as extensions of the operator and inherited the same permissions. In both incidents, engineers did not require peer approval before deploying changes, a deviation from standard procedure.
| Factor | December Incident (Kiro) | Earlier Incident (Q Developer) |
|---|---|---|
| Tool | Kiro | Amazon Q Developer |
| Scope | Single service, mainland China | No customer-facing impact |
| Root cause | Overly broad user permissions | Under investigation |
| Peer review required | No (bypassed) | No |
AWS launched Kiro in July 2025 as a step beyond "vibe coding," generating code from structured specifications rather than freeform prompts.
## The Access Control Problem
The core failure is straightforward: the AI agent inherited human-level permissions without human-level review gates. Kiro's default configuration requires authorization before acting, but the deployment environment granted the operator, and by extension the agent, permissions that skipped mandatory peer review.
This is not a novel failure mode. It is the same class of misconfiguration that causes incidents with any automation tool. The difference is velocity: an AI coding agent can generate and apply changes faster than a human operator, compressing the window between mistake and impact.
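The missing control is a gate that enforces the approval quorum in code, independent of whatever IAM permissions the calling identity holds. A minimal sketch of that idea, with hypothetical names (`ProposedChange`, `apply_change`) not drawn from any Amazon tooling:

```python
# Hypothetical sketch: an agent's deployment is blocked unless a
# two-person approval quorum is met, regardless of the permissions
# the agent inherited from its operator.
from dataclasses import dataclass, field

REQUIRED_APPROVERS = 2  # the two-person rule the engineer bypassed

@dataclass
class ProposedChange:
    description: str
    approvals: set = field(default_factory=set)

def approve(change: ProposedChange, reviewer: str) -> None:
    change.approvals.add(reviewer)

def apply_change(change: ProposedChange) -> str:
    # Enforce the review gate at deploy time, not at permission-grant time.
    if len(change.approvals) < REQUIRED_APPROVERS:
        raise PermissionError(
            f"{len(change.approvals)}/{REQUIRED_APPROVERS} approvals; deployment blocked"
        )
    return f"deployed: {change.description}"

change = ProposedChange("update service config in cn-north-1")
approve(change, "reviewer-a")
approve(change, "reviewer-b")
print(apply_change(change))
```

The point of the sketch is where the check lives: a gate enforced by the deployment pipeline itself cannot be skipped by an identity that happens to hold broad permissions.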
## Adoption Pressure
Amazon has set an internal target for 80 percent of developers to use AI coding tools at least once per week and is actively tracking adoption. Some employees remain skeptical of the tools' utility for core work given error risk.
Following the December incident, AWS implemented mandatory peer review and staff training for AI-assisted deployments.
## Implications for Agentic Coding
The incident pattern is predictable and will recur across organizations:
1. **Permission inheritance**: agents get operator-level access by default
2. **Review bypass**: speed incentives erode approval gates
3. **Blame framing**: "user error, not AI error" deflects from systemic design questions
The relevant question is not whether the AI made an autonomous mistake. It is whether organizations deploying agentic tools are enforcing the same change management controls they require for human operators. In this case, they were not.
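One concrete control is to run the agent under a dedicated role that is narrower than its operator's. An illustrative AWS IAM policy fragment, assuming a hypothetical agent role (the action lists are examples, not a complete least-privilege set):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AgentReadOnly",
      "Effect": "Allow",
      "Action": ["cloudwatch:Get*", "cloudwatch:List*", "logs:Get*"],
      "Resource": "*"
    },
    {
      "Sid": "DenyDirectDeployment",
      "Effect": "Deny",
      "Action": ["cloudformation:*", "codedeploy:*", "lambda:UpdateFunctionCode"],
      "Resource": "*"
    }
  ]
}
```

An explicit `Deny` overrides any `Allow` the role inherits elsewhere, so the agent can observe but cannot deploy directly; deployments would flow through a pipeline that enforces peer review.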
## References
1. [An AI coding bot took down Amazon Web Services (Ars Technica)](https://arstechnica.com/ai/2026/02/an-ai-coding-bot-took-down-amazon-web-services/)
2. [Original reporting (Financial Times)](https://www.ft.com/)