How I Built a Full Audio/Video Streaming Microservice in One Day with Claude Fable 5 Auto Mode
Claude Fable 5 in auto mode built my entire HLS streaming microservice in under a day — AWS infra, security, backend, frontend, CI/CD, migrations, runbooks.

Last week I did something I would have called irresponsible a year ago: I handed an entire production microservice — a full audio/video streaming service with real-time delivery, AWS infrastructure, security, CI/CD, and data migrations — to Claude Fable 5 running in auto mode, and I shipped it the same day.
Not a prototype. Not a demo repo with a TODO: auth comment. A service that today sits in front of every video and audio file my other services used to serve as raw, full-size downloads — now delivered as adaptive HLS streams with signed playback, sub-second startup, and a live-streaming path.
My total contribution was a written brief and 17 design decisions. Fable wrote the PRD, the TRD, the high-level and low-level designs, the Terraform, the backend, the frontend player, the GitHub Actions pipelines, the migration scripts that transcoded my existing media library, and the runbooks my other services now use to integrate. If you searched for how to build a microservice with an AI agent, or you’re wondering whether Claude Fable 5 auto mode is actually different from babysitting an autocomplete — this is the full, honest teardown.
THE DAY IN NUMBERS
- ~9 hrs wall clock: brief to production
- 17 human inputs: all design decisions
- 4 docs before any code: PRD, TRD, HLD, LLD
- 100% infra as code: Terraform, zero console clicks
TL;DR
- Claude Fable 5 in auto mode built a complete HLS audio/video streaming microservice in under a day — AWS resources, security hardening, backend, frontend player, CI/CD, and integration runbooks included.
- It wrote PRD → TRD → HLD → LLD before writing a single line of code, and paused only to ask me real design questions: HLS vs DASH, signed cookies vs signed URLs, IVS vs MediaLive for the realtime path.
- The architecture it chose is boring in the best way: S3 + MediaConvert + CloudFront signed cookies for on-demand media, AWS IVS for real-time streams, a small Lambda control plane, all provisioned by Terraform.
- It also wrote idempotent migration scripts that transcoded my entire back catalog of raw videos and audio files — with a DynamoDB state table, concurrency caps, and verification.
- My job changed from typing code to making decisions and reviewing diffs. That’s the actual story of agentic coding in 2026: the bottleneck moved from implementation to judgment.
What “Auto Mode” Means (and What It Doesn’t)
Claude Fable 5 is Anthropic’s newest model tier — the Mythos-class model that sits above Opus — and inside Claude Code, “auto mode” means the agent runs the full loop autonomously: it plans, edits files, runs commands, executes tests, deploys infrastructure, and keeps going until the task is done or it genuinely needs a human decision. You’re not approving every tool call. You’re not pasting code between windows. You set the destination and the constraints; the agent drives.
The crucial nuance: auto mode is not “no human input.” It’s minimal, high-leverage human input. Across nine hours, Fable stopped me 17 times — and every single stop was a question only I could answer: a product trade-off, a cost ceiling, a security posture choice. It never once asked me “how do I configure CloudFront?” It asked me “do you want playback URLs to be shareable, or locked to the session?” Those are very different questions, and the second one is the one that actually deserves my time.
💡 Key insight: The quality bar of an agentic build is set by the quality of the questions the agent asks you. Fable 5 asks product questions, not syntax questions — that’s the generational difference.
The Starting Point: Raw Files Pretending to Be Streaming
Here’s the embarrassing “before” picture, because every good case study needs one.
I run several services that handle user-facing media — course recordings, podcast-style audio, screen captures. The original implementation was the one every team ships first: media files uploaded to S3, served back through the app as raw, full-size files. A GET /files/{id} endpoint, a presigned URL, and an HTML <video> tag pointed at a 400 MB MP4.
It worked, in the way a tent works as a house. Here’s the same media library, before and after the one-day build:
RAW FILE SERVING VS A REAL STREAMING SERVICE
BEFORE — RAW FILES
A presigned URL and a prayer
- Seeking was brutal — jumping to minute 40 meant range-requesting through a monolithic file, and mobile networks often re-buffered from scratch
- One bitrate for everyone — hotel Wi-Fi got the same 1080p file as fiber, with no audio-only fallback
- Bandwidth scaled with file size, not watch time — a 2-minute viewer still pulled half the file
- Zero real-time story — live sessions ran on a third-party embed I didn't control
AFTER — HLS MICROSERVICE
Adaptive, signed, instant
- Instant seek — short HLS segments mean jumping anywhere starts playback in under a second
- Adaptive bitrate ladder — 1080p to audio-only, picked per viewer per second by the player
- CDN-served segments, billed by watch time — viewers download only what they actually watch
- Realtime built in — AWS IVS streams play in the exact same player as on-demand content
Same S3 library, same users — the delivery model is the only thing that changed.
The fix is well understood: transcode to HLS (HTTP Live Streaming) with an adaptive bitrate ladder, serve segments from a CDN, sign playback, and use a managed service for live. The reason I hadn’t done it: done properly, it’s a solid two-to-three week project across infra, backend, frontend, and a scary migration of existing content. That estimate is what Fable 5 deleted.
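To make "adaptive bitrate ladder" concrete, here is a minimal sketch of how a player might pick a rendition from measured throughput. It is an illustration only: the bitrates, the `pickRendition` name, and the 0.8 safety factor are my assumptions, and hls.js uses its own, more sophisticated bandwidth estimator rather than this rule.

```typescript
// Hypothetical ladder: rendition names follow the article, bitrates are illustrative.
interface Rendition {
  name: string;
  bitsPerSecond: number; // peak bitrate this rendition needs
}

const LADDER: Rendition[] = [
  { name: "1080p", bitsPerSecond: 5000000 },
  { name: "720p", bitsPerSecond: 2800000 },
  { name: "480p", bitsPerSecond: 1200000 },
  { name: "audio-only", bitsPerSecond: 128000 },
];

// Pick the highest rendition that fits inside a safety fraction of the
// measured throughput. If nothing fits, fall back to the lowest rendition.
function pickRendition(
  ladder: Rendition[],
  measuredBitsPerSecond: number,
  safetyFactor: number = 0.8,
): Rendition {
  if (ladder.length === 0) throw new Error("ladder must not be empty");
  // Sort a copy from highest to lowest bitrate so the input is not mutated.
  const sorted = ladder.slice().sort((a, b) => b.bitsPerSecond - a.bitsPerSecond);
  const budget = measuredBitsPerSecond * safetyFactor;
  for (const rendition of sorted) {
    if (rendition.bitsPerSecond <= budget) return rendition;
  }
  return sorted[sorted.length - 1];
}
```

Because the choice is re-evaluated as throughput changes, a viewer on a weak connection degrades toward audio-only instead of stalling on a single fixed file.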
The One-Day Timeline
Here’s how the day actually broke down. Times are approximate; the shape is exact.
BRIEF TO PRODUCTION IN ~9 HOURS
The striking part is the ratio: roughly a quarter of the day went into documents and decisions before any code existed — and that front-loading is why the rest of the day didn't derail.
08:30 — 09:00
The brief
I wrote roughly a page: what the service must do (VOD streaming, realtime streams, drop-in integration for existing services), constraints (AWS, Terraform, no public buckets), and what done looks like.
09:00 — 11:00
PRD, TRD, HLD, LLD
Fable produced all four documents, pausing for design decisions: protocol, live-streaming service, signing strategy, compute model. I answered questions and red-penned the docs. Zero code yet.
11:00 — 13:30
Infrastructure + backend
Terraform for every AWS resource — buckets, MediaConvert, CloudFront, IVS, DynamoDB, IAM — plus the Lambda control plane: upload, transcode orchestration, playback sessions.
13:30 — 15:30
Frontend + CI/CD
A typed player package wrapping hls.js with adaptive quality, audio-only mode, and live support. GitHub Actions with OIDC — no long-lived AWS keys — and a gated terraform plan/apply flow.
15:30 — 17:00
Migration of the back catalog
Idempotent scripts that walked the legacy bucket, queued MediaConvert jobs with a concurrency cap, tracked state in DynamoDB, and verified every output against the source.
17:00 — 18:00
Integration + runbooks
Fable patched my other services to request playback sessions instead of raw URLs, kept a dual-read fallback for safety, and wrote the runbooks any future service needs to onboard.
Docs Before Code: The Part Everyone Skips
This is the section I most want you to steal, because it’s the highest-leverage behavior in the whole workflow — and it costs nothing.
Before touching code, Fable wrote four documents into the repo, in order, and made me approve each one:
- PRD (Product Requirements Document) — what the service does and for whom: VOD playback, live streams, integration contract for sibling services, explicit non-goals (no DRM in v1, no user-generated live streams).
- TRD (Technical Requirements Document) — the measurable bar: time-to-first-frame under 1 second on broadband, live glass-to-glass latency under 5 seconds, playback URLs unusable after expiry, migration must be resumable and verifiable.
- HLD (High-Level Design) — the architecture diagram, the AWS services chosen and the ones rejected (with reasons), data flow for upload, playback, live, and migration.
- LLD (Low-Level Design) — DynamoDB key design, every API route with request/response shapes, IAM policy boundaries per function, error taxonomy, the exact MediaConvert ladder.
If you’ve read my piece on spec-driven development with AI agents, you know I’m already convinced specs are the steering wheel for agentic coding. What Fable 5 adds is that the agent now writes the spec itself and interrogates you against it. The TRD review is where I caught the one thing I’d have regretted: the first draft proposed signed URLs per segment. I pushed back, because signing every segment URL means generating a per-viewer manifest, and per-viewer manifests cannot be shared in the CDN cache. Fable switched the design to CloudFront signed cookies scoped to a playback session, then updated the LLD and the threat notes to match, unprompted.
💡 Key insight: Reviewing a 2-page TRD takes 10 minutes and catches architecture mistakes. Reviewing 4,000 lines of generated code to find the same mistake takes a day. Auto mode works because of the documents, not despite them.
The Architecture Fable Built
The short answer: a serverless control plane around managed media services. Fable’s HLD argued — correctly — that in 2026 you should not be running your own transcoders or packagers for this workload class, and every component it picked is the boring, durable choice:
- S3 (two private buckets): a mezzanine bucket for original uploads and a packaged bucket for HLS output. Both with Block Public Access on, KMS encryption at rest, and lifecycle rules that abort incomplete multipart uploads.
- AWS Elemental MediaConvert: VOD transcoding. Each video becomes an adaptive ladder (1080p / 720p / 480p / audio-only) of HLS segments; audio files become segmented HLS audio so podcasts get the same instant-seek behavior as video.
- EventBridge: glue. ObjectCreated on the mezzanine bucket triggers the transcode orchestrator; MediaConvert job-state changes flow back to update asset status. No polling anywhere.
- CloudFront with Origin Access Control + signed cookies: the only public face of media. The packaged bucket is unreachable except through the CDN, and the CDN only serves you with a valid short-lived cookie.
- A small Lambda + API Gateway control plane: four routes: request an upload (presigned multipart), check asset status, create a playback session (the integration contract), and create a live channel.
- Amazon IVS (Interactive Video Service): the real-time path. Managed RTMPS ingest, 2–5 second latency, an HLS-compatible playback URL that drops into the same player. Fable’s HLD explicitly rejected MediaLive for v1 as cost- and ops-overkill, which matched my instinct exactly.
- DynamoDB: asset and session metadata, single-table, with the migration state tracked in the same table under its own key prefix.
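For readers who have not looked inside an HLS package, the sketch below shows the shape of the master playlist that ties the ladder together. MediaConvert writes this file itself; the `buildMasterPlaylist` helper, the URIs, and the bandwidth numbers are hypothetical and exist only to show the format defined in RFC 8216.

```typescript
interface Variant {
  uri: string;         // path to the rendition's media playlist
  bandwidth: number;   // peak bits per second (BANDWIDTH is a required attribute)
  resolution?: string; // e.g. "1920x1080"; left out for audio-only renditions
}

// Build a master playlist: one EXT-X-STREAM-INF tag per rendition,
// each followed by the URI of that rendition's media playlist.
function buildMasterPlaylist(variants: Variant[]): string {
  const lines: string[] = ["#EXTM3U"];
  for (const variant of variants) {
    const attrs: string[] = [`BANDWIDTH=${variant.bandwidth}`];
    if (variant.resolution) attrs.push(`RESOLUTION=${variant.resolution}`);
    lines.push(`#EXT-X-STREAM-INF:${attrs.join(",")}`);
    lines.push(variant.uri);
  }
  return lines.join("\n") + "\n";
}
```

The player downloads this one small file first, then switches between the listed media playlists as conditions change, which is why video and audio-only content can share a single player.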
The integration contract is the part my other services care about, and it’s one endpoint:
// Before: every service hand-rolled raw file access
const url = await getPresignedUrl(fileId); // 400 MB MP4, good luck

// After: one call, any service, audio or video, VOD or live
const session = await streamSvc.createPlaybackSession({
  assetId,
  viewerId, // bound to the session, not shareable
  expiresIn: 3600
});
// → { manifestUrl: "https://media.example.com/hls/{assetId}/master.m3u8",
//     cookies: { "CloudFront-Policy": "...", "CloudFront-Signature": "..." } }

Raw-file serving didn’t just get faster; it got deleted as a concept. There is no code path left that hands a full original file to a browser.
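The cookies in that response carry a CloudFront custom policy. The sketch below builds such a policy document scoped to one asset path with an expiry, which is the part that makes a session non-shareable after it lapses. The `buildPlaybackPolicy` name, the `/hls/{assetId}/*` layout, and the asset ID validation are my assumptions. Signing is deliberately left out: in a real setup the policy is signed with the private key of a CloudFront key pair, and a `CloudFront-Key-Pair-Id` cookie accompanies the policy and signature.

```typescript
// Shape of a CloudFront custom policy statement limited to an expiry condition.
interface CloudFrontPolicy {
  Statement: Array<{
    Resource: string;
    Condition: { DateLessThan: { "AWS:EpochTime": number } };
  }>;
}

// Build a policy that only matches objects under one asset's HLS prefix
// and stops being valid expiresInSeconds after nowMs.
function buildPlaybackPolicy(
  cdnBaseUrl: string,
  assetId: string,
  expiresInSeconds: number,
  nowMs: number = Date.now(),
): CloudFrontPolicy {
  // Reject IDs that could escape the asset prefix (hypothetical ID format).
  if (!/^[A-Za-z0-9_-]+$/.test(assetId)) throw new Error("invalid assetId");
  const expiresAtEpochSeconds = Math.floor(nowMs / 1000) + expiresInSeconds;
  return {
    Statement: [
      {
        Resource: `${cdnBaseUrl}/hls/${assetId}/*`,
        Condition: { DateLessThan: { "AWS:EpochTime": expiresAtEpochSeconds } },
      },
    ],
  };
}
```

Because the wildcard covers the manifest and every segment under the asset prefix, the manifest itself stays identical for all viewers and can be cached normally.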
The security work I didn’t have to ask for
I gave Fable one sentence of security direction: “private by default, no long-lived credentials, signed playback.” Here’s what it derived from that sentence:
SECURITY FABLE SHIPPED UNPROMPTED
IAM
Least-privilege per function
Each Lambda has its own role scoped to exactly its resources — the playback-session function cannot touch the mezzanine bucket, the upload function cannot read DynamoDB sessions.
EDGE
OAC + signed cookies, short TTL
Buckets are unreachable except via CloudFront Origin Access Control. Playback cookies expire with the session and are scoped to one asset path.
CI/CD
OIDC instead of AWS keys
GitHub Actions assumes a role via OpenID Connect. There is no AWS secret stored in the repo or the CI environment at all.
DATA
KMS at rest, TLS in transit
Both buckets and the DynamoDB table are KMS-encrypted; the API enforces TLS and validates upload content types before issuing presigned URLs.
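The OIDC card is the one that tends to confuse people, so here is a sketch of the IAM trust policy behind it. In this build the policy lives in Terraform; it is written as TypeScript here only to show the document shape. The `buildGithubOidcTrustPolicy` helper and its parameters are hypothetical, while the provider host, the `sts:AssumeRoleWithWebIdentity` action, and the `aud`/`sub` claim keys are the standard GitHub Actions to AWS setup.

```typescript
// Build a role trust policy that lets GitHub Actions workflows from one
// repository branch assume the role through OIDC, with no stored AWS keys.
function buildGithubOidcTrustPolicy(accountId: string, repo: string, branch: string) {
  const provider = "token.actions.githubusercontent.com";
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: {
          Federated: `arn:aws:iam::${accountId}:oidc-provider/${provider}`,
        },
        Action: "sts:AssumeRoleWithWebIdentity",
        Condition: {
          StringEquals: {
            // The token must be issued for AWS STS...
            [`${provider}:aud`]: "sts.amazonaws.com",
            // ...and must come from this exact repo and branch.
            [`${provider}:sub`]: `repo:${repo}:ref:refs/heads/${branch}`,
          },
        },
      },
    ],
  };
}
```

Pinning the `sub` claim to a single repo and branch is what keeps a fork or a feature branch from assuming the deploy role.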
I still ran my own review pass and an automated code review over the diff — trust but verify is doing heavy lifting in this workflow — and the review came back with style nits, not security findings.
The Migration: Transcoding an Existing Library Without Fear
New architectures are easy; old data is where projects go to die. I had years of raw media sitting in the legacy bucket, all of it needing transcoding into HLS, none of it allowed to break while users were actively consuming it.
Fable’s migration design treated the transcode of the back catalog as a resumable, verifiable batch job, not a script you run and pray over:
HOW THE MIGRATION SCRIPTS WORK
Every stage is idempotent — the whole pipeline can be killed and re-run at any point and it picks up exactly where it stopped.
STAGE 01
Inventory
Walk the legacy bucket, fingerprint each object (ETag + size), and write one migration record per asset into DynamoDB with status PENDING. Re-running only adds new files.
STAGE 02
Queue with a concurrency cap
Submit MediaConvert jobs from the PENDING set, capped to stay inside account quotas and a cost ceiling I set ($/day). Job IDs land back on the migration record.
STAGE 03
Verify, don’t assume
On job completion, compare output duration against source duration (±1s), check every rendition in the ladder exists, and probe the manifest. Only then: status VERIFIED.
STAGE 04
Cut over with a net
Consuming services dual-read: VERIFIED assets stream HLS, everything else falls back to the legacy raw path. When the failure list hit zero, the fallback was removed.
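The first three stages can be sketched in a few dozen lines. This is not the script Fable wrote: a plain object stands in for the DynamoDB table, `submitJob` stands in for the MediaConvert call, the QUEUED status is my addition for in-flight jobs, and the manifest probe from stage 3 is omitted. What it does show is why re-running is safe: records are keyed by a fingerprint, and the queue only ever fills free slots.

```typescript
type Status = "PENDING" | "QUEUED" | "VERIFIED" | "FAILED";

interface SourceObject { key: string; etag: string; size: number }
interface MigrationRecord extends SourceObject { status: Status; jobId?: string }

// In-memory stand-in for the DynamoDB state table (hypothetical shape).
type StateTable = { [fingerprint: string]: MigrationRecord };

// Fingerprint = key + ETag + size, so an unchanged object maps to the same record.
function fingerprint(o: SourceObject): string {
  return `${o.key}#${o.etag}#${o.size}`;
}

// Stage 1: add a PENDING record for every object not seen before.
// Returns the number of new records, so a second run returns 0.
function inventory(table: StateTable, objects: SourceObject[]): number {
  let added = 0;
  for (const o of objects) {
    const id = fingerprint(o);
    if (!(id in table)) {
      table[id] = { key: o.key, etag: o.etag, size: o.size, status: "PENDING" };
      added++;
    }
  }
  return added;
}

// Stage 2: submit PENDING records, never exceeding maxInFlight QUEUED jobs.
function queueBatch(
  table: StateTable,
  maxInFlight: number,
  submitJob: (record: MigrationRecord) => string, // returns a job ID
): string[] {
  const records = Object.keys(table).map((id) => table[id]);
  const inFlight = records.filter((r) => r.status === "QUEUED").length;
  const freeSlots = Math.max(0, maxInFlight - inFlight);
  const jobIds: string[] = [];
  for (const record of records.filter((r) => r.status === "PENDING").slice(0, freeSlots)) {
    const jobId = submitJob(record);
    record.jobId = jobId;
    record.status = "QUEUED";
    jobIds.push(jobId);
  }
  return jobIds;
}

interface TranscodeOutput { durationSeconds: number; renditions: string[] }

// Stage 3: duration must match within the tolerance and every expected
// rendition must be present before a record may be marked VERIFIED.
function verifyOutput(
  sourceDurationSeconds: number,
  output: TranscodeOutput,
  expectedRenditions: string[],
  toleranceSeconds: number = 1,
): boolean {
  const durationOk =
    Math.abs(output.durationSeconds - sourceDurationSeconds) <= toleranceSeconds;
  const renditionsOk = expectedRenditions.every(
    (name) => output.renditions.indexOf(name) !== -1,
  );
  return durationOk && renditionsOk;
}
```

Killing the process between any two calls loses nothing, because every decision is derived from the persisted status rather than from in-memory progress.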
The detail that sold me: Fable added a --dry-run flag and a cost estimate to the queue stage before I asked, because the TRD it had written contained a cost ceiling — so it treated “don’t surprise me on the bill” as a requirement to implement, not a vibe. Eleven assets failed verification on the first pass (corrupted sources, one mislabeled codec). They were exactly the kind of thing a hand-rolled for loop over aws s3 ls would have silently butchered.
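A dry run with a cost ceiling can be as simple as the sketch below. Everything in it is an assumption for illustration: the `planQueue` and `runQueue` names, the per-output-minute rate (a placeholder, not an AWS price), and the use of integer cents to avoid floating-point drift. It only shows the idea of estimating before submitting and deferring whatever would break the daily budget.

```typescript
interface PendingAsset { key: string; durationMinutes: number }

interface QueuePlan {
  toSubmit: PendingAsset[];
  deferred: PendingAsset[];   // pushed to a later day to respect the ceiling
  estimatedCostCents: number; // cost of toSubmit only
}

// Estimate cost per asset and admit assets in order while the running
// total stays at or under the daily ceiling.
function planQueue(
  pending: PendingAsset[],
  rateCentsPerOutputMinute: number, // placeholder rate, not a real price
  outputsPerAsset: number,          // number of renditions in the ladder
  dailyCeilingCents: number,
): QueuePlan {
  const toSubmit: PendingAsset[] = [];
  const deferred: PendingAsset[] = [];
  let total = 0;
  for (const asset of pending) {
    const cost = asset.durationMinutes * outputsPerAsset * rateCentsPerOutputMinute;
    if (total + cost <= dailyCeilingCents) {
      toSubmit.push(asset);
      total += cost;
    } else {
      deferred.push(asset);
    }
  }
  return { toSubmit, deferred, estimatedCostCents: total };
}

// With dryRun set, nothing is submitted; the caller just reports the plan.
// Returns the number of jobs actually submitted.
function runQueue(
  plan: QueuePlan,
  dryRun: boolean,
  submit: (asset: PendingAsset) => void,
): number {
  if (dryRun) return 0;
  for (const asset of plan.toSubmit) submit(asset);
  return plan.toSubmit.length;
}
```

Running the plan with the dry-run flag first gives a number to compare against the budget before any transcode job exists.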
CI/CD and Runbooks: The Unsexy 20% That Makes It Real
A microservice without a pipeline is a liability with good intentions. Fable shipped both halves of operability the same afternoon:
- GitHub Actions: on PR, lint, type-check, unit tests, and terraform plan posted as a PR comment; on merge, a gated terraform apply, Lambda deploy, and a canary that creates a real playback session against production and fails the deploy if time-to-first-byte on the manifest regresses.
- Runbooks in the repo (/runbooks): Onboarding a new service (the playback-session contract, with copy-paste client code), Live stream operations (create channel, rotate stream key, end-of-stream archive), Transcode failure triage (where MediaConvert errors land, how to requeue one asset), and Cost monitoring (the CloudWatch dashboard + budget alarms it provisioned).
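The pass/fail decision inside a canary like that can be sketched as a pure function. The `evaluateCanary` name, the median statistic, and the 20% allowance are my assumptions; the real pipeline's thresholds are not documented here. The network call that collects the samples is left out so the logic stays testable.

```typescript
interface CanaryResult { ok: boolean; reason: string }

// Compare the median time-to-first-byte of a few manifest requests
// against a stored baseline, allowing a fractional regression.
function evaluateCanary(
  samplesMs: number[],
  baselineMs: number,
  allowedRegression: number = 0.2,
): CanaryResult {
  if (samplesMs.length === 0) return { ok: false, reason: "no samples collected" };
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  // Upper median for even-length input; robust to a single slow outlier.
  const median = sorted[Math.floor(sorted.length / 2)];
  const limitMs = baselineMs * (1 + allowedRegression);
  if (median <= limitMs) {
    return { ok: true, reason: `median ${median}ms within limit` };
  }
  return { ok: false, reason: `median ${median}ms exceeds limit` };
}
```

In a pipeline, a result with `ok` set to false would translate into a non-zero exit code so the deploy step fails.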
The runbooks are why integrating my first two services took an hour instead of a week of Slack archaeology. The third service was onboarded by a teammate without talking to me at all — they read the runbook Fable wrote, called one endpoint, and shipped. That’s the real productivity story: the agent didn’t just write code, it wrote down how to use the code, which is the part humans chronically skip.
What I Actually Did All Day
Let’s be precise about the human role, because this is where most coverage of agentic coding gets hand-wavy. Every one of my 17 inputs was a product trade-off, a cost ceiling, or a risk-tolerance call — never a technical how-to. These five carried the most weight:
THE DECISION LOG — WHERE THE HUMAN EARNED THEIR KEEP
Fable framed each question with the trade-offs already researched; my job was to bring the context only I had.
- Protocol
THE QUESTION 01
HLS only, or HLS + DASH?
THE CALL
HLS-only for v1, skip DASH entirely.
WHY IT WAS MINE TO MAKE
My audience skews mobile and Safari, where HLS is native. One protocol halves the test matrix — and DASH can be added later without re-transcoding.
- Realtime
THE QUESTION 02
AWS IVS or MediaLive for the realtime path?
THE CALL
IVS — managed ingest, 2–5 second latency.
WHY IT WAS MINE TO MAKE
Pure cost/ops trade-off. Sub-second latency wasn't worth 10x the operational surface for my use case. Only I knew the latency my product actually needs.
- Security
THE QUESTION 03
Signed URLs or signed cookies for playback?
THE CALL
Signed cookies, scoped per playback session.
WHY IT WAS MINE TO MAKE
Product stance: paid content shouldn't be hot-linkable, and per-segment URL signing wrecks CDN cache efficiency. A judgment call about the business, not the tech.
- Compute
THE QUESTION 04
Lambda or Fargate for the control plane?
THE CALL
Lambda, accepting cold starts.
WHY IT WAS MINE TO MAKE
My traffic is spiky with long idle valleys — scale-to-zero wins. Someone with steady traffic should choose the opposite, which is exactly why the agent asked.
- Migration
THE QUESTION 05
How aggressive should the migration cutover be?
THE CALL
Dual-read fallback until verified failures hit zero.
WHY IT WAS MINE TO MAKE
Risk tolerance is a business decision. I chose the slow-and-safe path because live users were consuming this media during the migration.
Plus the review passes: each design doc, the Terraform plan before the first apply, the IAM policies line by line, and the final diff. Call it two hours of genuine attention across the nine.
Notice what’s not on the list: I never wrote a handler, never created a resource in the AWS console (my single console visit was confirming the budget alarm fired during a test), never debugged a YAML indentation error. Every hour I spent was on decisions that needed my context — which is exactly the division of labor I’ve been arguing the agent landscape was heading toward. It also confirmed something smaller but important: a well-maintained CLAUDE.md with your conventions is the cheapest force multiplier in this whole setup — Fable followed my repo conventions because they were written down where it looks.
Where It Stumbled (Because Nothing Is Magic)
Honesty section. Three real friction points:
- First MediaConvert ladder was over-provisioned. The draft included a 4K rendition my content doesn’t have sources for. Caught in LLD review — a 30-second fix at doc stage, but it would have quietly doubled transcode costs if I’d rubber-stamped it.
- One quota assumption. Fable assumed default MediaConvert concurrent-job quotas; my account had a lower legacy limit. The migration’s capped queue absorbed it gracefully (jobs just drained slower), and Fable filed the quota-increase request when the throttling showed up in logs — but it discovered the limit by hitting it, not by checking first.
- It’s only as good as your brief. I forgot to mention that some legacy audio was in a deprecated codec. The verification stage caught every affected file, but a better inventory in my brief would have saved a requeue cycle.
None of these are “AI wrote bad code” stories. They’re the same integration realities any senior engineer hits — the difference is the system was designed (by the agent, in the TRD) to surface them loudly instead of corrupting silently.
Best Practices: How to Run an Auto-Mode Build
THE AUTO-MODE PLAYBOOK
FAQ
Questions readers usually have
The questions I've been asked since posting the before/after numbers.
Final Take
The headline isn’t “AI wrote a lot of code fast.” Code generation has been cheap for two years. The headline is that Claude Fable 5 ran an engineering process — requirements, design review, infrastructure, security posture, migration safety, operations docs — and the process is what made one-day delivery survivable instead of reckless.
My role didn’t shrink; it concentrated. Seventeen decisions and two hours of review were the entire human footprint, but they were the seventeen decisions that determine whether this service is still standing in two years. That’s the trade every senior engineer should want.
If you’re going to try this, start with the playbook above and a service you actually need — not a toy. The toys don’t force the migration, the security review, or the runbooks, and those are exactly where auto mode earns its keep.
If you found this useful, read spec-driven development with AI agents next — it’s the methodology that makes builds like this one repeatable instead of lucky.
Written for umesh-malik.com — no-fluff technical writing on AI, Web Dev, and Engineering.
About the Author
Software engineer writing about AI, Claude Code, LLMs, OpenAI, Anthropic, and developer tooling. 5+ years building production systems at Expedia Group, Tekion, and BYJU'S.