How I Built a Full Audio/Video Streaming Microservice in One Day with Claude Fable 5 Auto Mode

Claude Fable 5 in auto mode built my entire HLS streaming microservice in under a day — AWS infra, security, backend, frontend, CI/CD, migrations, runbooks.

Cover showing the one-day flow of Claude Fable 5 building a streaming microservice: spec, infrastructure, code, and ship

Last week I did something I would have called irresponsible a year ago: I handed an entire production microservice — a full audio/video streaming service with real-time delivery, AWS infrastructure, security, CI/CD, and data migrations — to Claude Fable 5 running in auto mode, and I shipped it the same day.

Not a prototype. Not a demo repo with a TODO: auth comment. A service that today sits in front of every video and audio file my other services used to serve as raw, full-size downloads — now delivered as adaptive HLS streams with signed playback, sub-second startup, and a live-streaming path.

My total contribution was a written brief and 17 design decisions. Fable wrote the PRD, the TRD, the high-level and low-level designs, the Terraform, the backend, the frontend player, the GitHub Actions pipelines, the migration scripts that transcoded my existing media library, and the runbooks my other services now use to integrate. If you searched for how to build a microservice with an AI agent, or you’re wondering whether Claude Fable 5 auto mode is actually different from babysitting an autocomplete — this is the full, honest teardown.

THE DAY IN NUMBERS

  • ~9 hrs wall clock: brief to production
  • 17 human inputs: all design decisions
  • 4 docs before any code: PRD, TRD, HLD, LLD
  • 100% infra as code: Terraform, no resources created in the console

TL;DR

  • Claude Fable 5 in auto mode built a complete HLS audio/video streaming microservice in under a day — AWS resources, security hardening, backend, frontend player, CI/CD, and integration runbooks included.
  • It wrote PRD → TRD → HLD → LLD before writing a single line of code, and paused only to ask me real design questions: HLS vs DASH, signed cookies vs signed URLs, IVS vs MediaLive for the realtime path.
  • The architecture it chose is boring in the best way: S3 + MediaConvert + CloudFront signed cookies for on-demand media, AWS IVS for real-time streams, a small Lambda control plane, all provisioned by Terraform.
  • It also wrote idempotent migration scripts that transcoded my entire back catalog of raw videos and audio files — with a DynamoDB state table, concurrency caps, and verification.
  • My job changed from typing code to making decisions and reviewing diffs. That’s the actual story of agentic coding in 2026: the bottleneck moved from implementation to judgment.

What “Auto Mode” Means (and What It Doesn’t)

Claude Fable 5 is Anthropic’s newest model tier — the Mythos-class model that sits above Opus — and inside Claude Code, “auto mode” means the agent runs the full loop autonomously: it plans, edits files, runs commands, executes tests, deploys infrastructure, and keeps going until the task is done or it genuinely needs a human decision. You’re not approving every tool call. You’re not pasting code between windows. You set the destination and the constraints; the agent drives.

The crucial nuance: auto mode is not “no human input.” It’s minimal, high-leverage human input. Across nine hours, Fable stopped me 17 times — and every single stop was a question only I could answer: a product trade-off, a cost ceiling, a security posture choice. It never once asked me “how do I configure CloudFront?” It asked me “do you want playback URLs to be shareable, or locked to the session?” Those are very different questions, and the second one is the one that actually deserves my time.

💡 Key insight: The quality bar of an agentic build is set by the quality of the questions the agent asks you. Fable 5 asks product questions, not syntax questions — that’s the generational difference.

The Starting Point: Raw Files Pretending to Be Streaming

Here’s the embarrassing “before” picture, because every good case study needs one.

I run several services that handle user-facing media — course recordings, podcast-style audio, screen captures. The original implementation was the one every team ships first: media files uploaded to S3, served back through the app as raw, full-size files. A GET /files/{id} endpoint, a presigned URL, and an HTML <video> tag pointed at a 400 MB MP4.

It worked, in the way a tent works as a house. Here’s the same media library, before and after the one-day build:

RAW FILE SERVING VS A REAL STREAMING SERVICE

BEFORE — RAW FILES

A presigned URL and a prayer

  • Seeking was brutal — jumping to minute 40 meant range-requesting through a monolithic file, and mobile networks often re-buffered from scratch
  • One bitrate for everyone — hotel Wi-Fi got the same 1080p file as fiber, with no audio-only fallback
  • Bandwidth scaled with file size, not watch time — a 2-minute viewer still pulled half the file
  • Zero real-time story — live sessions ran on a third-party embed I didn't control

AFTER — HLS MICROSERVICE

Adaptive, signed, instant

  • Instant seek — short HLS segments mean jumping anywhere starts playback in under a second
  • Adaptive bitrate ladder: 1080p down to audio-only, chosen per viewer, segment by segment, by the player
  • CDN-served segments, so bandwidth tracks watch time: viewers download only what they actually watch
  • Realtime built in — AWS IVS streams play in the exact same player as on-demand content

Same S3 library, same users — the delivery model is the only thing that changed.

The fix is well understood: transcode to HLS (HTTP Live Streaming) with an adaptive bitrate ladder, serve segments from a CDN, sign playback, and use a managed service for live. The reason I hadn’t done it: done properly, it’s a solid two-to-three week project across infra, backend, frontend, and a scary migration of existing content. That estimate is what Fable 5 deleted.
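To make "adaptive bitrate ladder" concrete, here is a minimal sketch of the choice a player makes before requesting the next segment. The rendition names echo the ladder described in this article, but the bitrates, the `pickRendition` helper, and the 0.8 safety factor are illustrative assumptions. Real players such as hls.js rely on smoothed bandwidth estimates and buffer state rather than a single measurement.

```typescript
// Hypothetical ladder entry: bitrates below are illustrative assumptions.
interface Rendition {
  name: string;
  bitrateBps: number; // average bitrate the rendition needs to play smoothly
}

const ladder: Rendition[] = [
  { name: "1080p", bitrateBps: 5_000_000 },
  { name: "720p", bitrateBps: 2_800_000 },
  { name: "480p", bitrateBps: 1_200_000 },
  { name: "audio-only", bitrateBps: 128_000 },
];

// Pick the highest-bitrate rendition that fits inside a safety margin of the
// measured throughput. If nothing fits, fall back to the lowest rendition so
// playback degrades instead of stalling.
function pickRendition(
  renditions: Rendition[],
  measuredBps: number,
  safetyFactor = 0.8,
): Rendition {
  const budget = measuredBps * safetyFactor;
  let lowest: Rendition | undefined;
  let best: Rendition | undefined;
  for (const r of renditions) {
    if (!lowest || r.bitrateBps < lowest.bitrateBps) lowest = r;
    if (r.bitrateBps <= budget && (!best || r.bitrateBps > best.bitrateBps)) {
      best = r;
    }
  }
  const chosen = best ?? lowest;
  if (!chosen) throw new Error("ladder must not be empty");
  return chosen;
}
```

With this ladder, a viewer measured at 3 Mbit/s has a budget of 2.4 Mbit/s and lands on 480p, while a viewer on a very poor connection drops to audio-only rather than re-buffering.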

The One-Day Timeline

Timeline showing Claude Fable 5's one-day build: morning spent on PRD, TRD, HLD and LLD documents, midday on Terraform infrastructure and backend code, afternoon on frontend, CI/CD and migration scripts, evening on integration runbooks and production cutover

Here’s how the day actually broke down. Times are approximate; the shape is exact.

BRIEF TO PRODUCTION IN ~9 HOURS

The striking part is the ratio: roughly a quarter of the day went into documents and decisions before any code existed — and that front-loading is why the rest of the day didn't derail.

  1. 08:30 — 09:00

    The brief

    I wrote roughly a page: what the service must do (VOD streaming, realtime streams, drop-in integration for existing services), constraints (AWS, Terraform, no public buckets), and what done looks like.

  2. 09:00 — 11:00

    PRD, TRD, HLD, LLD

    Fable produced all four documents, pausing for design decisions: protocol, live-streaming service, signing strategy, compute model. I answered questions and red-penned the docs. Zero code yet.

  3. 11:00 — 13:30

    Infrastructure + backend

    Terraform for every AWS resource — buckets, MediaConvert, CloudFront, IVS, DynamoDB, IAM — plus the Lambda control plane: upload, transcode orchestration, playback sessions.

  4. 13:30 — 15:30

    Frontend + CI/CD

    A typed player package wrapping hls.js with adaptive quality, audio-only mode, and live support. GitHub Actions with OIDC — no long-lived AWS keys — and a gated terraform plan/apply flow.

  5. 15:30 — 17:00

    Migration of the back catalog

    Idempotent scripts that walked the legacy bucket, queued MediaConvert jobs with a concurrency cap, tracked state in DynamoDB, and verified every output against the source.

  6. 17:00 — 18:00

    Integration + runbooks

    Fable patched my other services to request playback sessions instead of raw URLs, kept a dual-read fallback for safety, and wrote the runbooks any future service needs to onboard.

Docs Before Code: The Part Everyone Skips

This is the section I most want you to steal, because it’s the highest-leverage behavior in the whole workflow — and it costs nothing.

Before touching code, Fable wrote four documents into the repo, in order, and made me approve each one:

  1. PRD (Product Requirements Document) — what the service does and for whom: VOD playback, live streams, integration contract for sibling services, explicit non-goals (no DRM in v1, no user-generated live streams).
  2. TRD (Technical Requirements Document) — the measurable bar: time-to-first-frame under 1 second on broadband, live glass-to-glass latency under 5 seconds, playback URLs unusable after expiry, migration must be resumable and verifiable.
  3. HLD (High-Level Design) — the architecture diagram, the AWS services chosen and the ones rejected (with reasons), data flow for upload, playback, live, and migration.
  4. LLD (Low-Level Design) — DynamoDB key design, every API route with request/response shapes, IAM policy boundaries per function, error taxonomy, the exact MediaConvert ladder.

If you’ve read my piece on spec-driven development with AI agents, you know I’m already convinced specs are the steering wheel for agentic coding. What Fable 5 adds is that the agent now writes the spec itself and interrogates you against it. The TRD review is where I caught the one thing I’d have regretted: the first draft proposed signed URLs per segment. I pushed back, because per-segment signing means every viewer needs a freshly signed manifest, which defeats CDN caching of the manifests, and Fable switched the design to CloudFront signed cookies scoped to a playback session. It then updated the LLD and the threat notes to match, unprompted.

💡 Key insight: Reviewing a 2-page TRD takes 10 minutes and catches architecture mistakes. Reviewing 4,000 lines of generated code to find the same mistake takes a day. Auto mode works because of the documents, not despite them.

The Architecture Fable Built

Architecture diagram of the streaming microservice: uploads flow into a private S3 mezzanine bucket, EventBridge triggers MediaConvert to produce HLS renditions into a packaged bucket served by CloudFront with signed cookies; a Lambda control plane issues playback sessions backed by DynamoDB; AWS IVS handles real-time streams; other services integrate through one playback-session API

The short answer: a serverless control plane around managed media services. Fable’s HLD argued — correctly — that in 2026 you should not be running your own transcoders or packagers for this workload class, and every component it picked is the boring, durable choice:

  • S3 (two private buckets) — a mezzanine bucket for original uploads and a packaged bucket for HLS output. Both with Block Public Access on, KMS encryption at rest, and lifecycle rules that expire failed multipart uploads.
  • AWS Elemental MediaConvert — VOD transcoding. Each video becomes an adaptive ladder (1080p / 720p / 480p / audio-only) of HLS segments; audio files become segmented HLS audio so podcasts get the same instant-seek behavior as video.
  • EventBridge — glue. ObjectCreated on the mezzanine bucket triggers the transcode orchestrator; MediaConvert job-state changes flow back to update asset status. No polling anywhere.
  • CloudFront with Origin Access Control + signed cookies — the only public face of media. The packaged bucket is unreachable except through the CDN, and the CDN only serves you with a valid short-lived cookie.
  • A small Lambda + API Gateway control plane — four routes: request an upload (presigned multipart), check asset status, create a playback session (the integration contract), and create a live channel.
  • AWS IVS (Interactive Video Service) — the real-time path. Managed RTMPS ingest, 2–5 second latency, an HLS-compatible playback URL that drops into the same player. Fable’s HLD explicitly rejected MediaLive for v1 as cost- and ops-overkill, which matched my instinct exactly.
  • DynamoDB — asset and session metadata, single-table, with the migration state tracked in the same table under its own key prefix.
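For readers who have not looked inside an HLS package, the sketch below shows roughly what the ladder amounts to at the manifest level: a master playlist that lists one variant per rendition. MediaConvert writes the real manifest in this architecture; the `buildMasterPlaylist` helper, the bitrates, and the child playlist paths are assumptions made only for illustration.

```typescript
// One entry per rendition in the ladder. All values used with this type in
// the example are illustrative, not the settings from the actual build.
interface Variant {
  uri: string;          // rendition playlist path, relative to the master
  bandwidthBps: number; // peak bitrate advertised to the player
  resolution?: { width: number; height: number }; // omitted for audio-only
  codecs?: string;      // e.g. "mp4a.40.2" for AAC-LC audio
}

// Render an HLS master playlist: an #EXT-X-STREAM-INF tag followed by the
// URI of the rendition playlist, once per variant.
function buildMasterPlaylist(variants: Variant[]): string {
  const lines: string[] = ["#EXTM3U"];
  for (const v of variants) {
    const attrs: string[] = [`BANDWIDTH=${v.bandwidthBps}`];
    if (v.resolution) {
      attrs.push(`RESOLUTION=${v.resolution.width}x${v.resolution.height}`);
    }
    if (v.codecs) attrs.push(`CODECS="${v.codecs}"`);
    lines.push(`#EXT-X-STREAM-INF:${attrs.join(",")}`, v.uri);
  }
  return lines.join("\n") + "\n";
}

const master = buildMasterPlaylist([
  { uri: "1080p/index.m3u8", bandwidthBps: 5000000, resolution: { width: 1920, height: 1080 } },
  { uri: "720p/index.m3u8", bandwidthBps: 2800000, resolution: { width: 1280, height: 720 } },
  { uri: "480p/index.m3u8", bandwidthBps: 1200000, resolution: { width: 854, height: 480 } },
  { uri: "audio/index.m3u8", bandwidthBps: 128000, codecs: "mp4a.40.2" },
]);
```

The player downloads this one file first, then switches between the listed rendition playlists as network conditions change.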

The integration contract is the part my other services care about, and it’s one endpoint:

// Before: every service hand-rolled raw file access
const url = await getPresignedUrl(fileId); // 400 MB MP4, good luck

// After: one call, any service, audio or video, VOD or live
const session = await streamSvc.createPlaybackSession({
  assetId,
  viewerId,            // bound to the session, not shareable
  expiresIn: 3600
});
// → { manifestUrl: "https://media.example.com/hls/{assetId}/master.m3u8",
//     cookies: { "CloudFront-Policy": "...", "CloudFront-Signature": "...",
//                "CloudFront-Key-Pair-Id": "..." } }

Raw-file serving didn’t just get faster — it got deleted as a concept. There is no code path left that hands a full original file to a browser.
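As a rough illustration of what "scoped to a playback session" can mean, the sketch below builds a CloudFront custom policy that covers one asset path and expires with the session. The helper names (`buildSessionPolicy`, `serializePolicy`, `toCloudFrontSafe`) are hypothetical, and the path layout is taken from the manifest URL shown above. The signing step, an RSA signature over the policy JSON with the CloudFront key pair, is deliberately left out, so this is not the service's actual implementation.

```typescript
// Shape of a CloudFront custom policy with a single expiry condition.
interface CloudFrontPolicy {
  Statement: Array<{
    Resource: string;
    Condition: { DateLessThan: { "AWS:EpochTime": number } };
  }>;
}

// Build a policy that allows every object under one asset's HLS prefix
// (manifests and segments) until the session expires.
function buildSessionPolicy(
  baseUrl: string,
  assetId: string,
  nowEpochSec: number,
  expiresInSec: number,
): CloudFrontPolicy {
  return {
    Statement: [
      {
        Resource: `${baseUrl}/hls/${assetId}/*`,
        Condition: {
          DateLessThan: { "AWS:EpochTime": nowEpochSec + expiresInSec },
        },
      },
    ],
  };
}

// The policy must be serialized without whitespace before it is signed and
// base64-encoded; JSON.stringify with no indent argument does that.
function serializePolicy(policy: CloudFrontPolicy): string {
  return JSON.stringify(policy);
}

// CloudFront expects base64 with three characters swapped so the value is
// safe in cookies and URLs: "+" becomes "-", "=" becomes "_", "/" becomes "~".
function toCloudFrontSafe(base64: string): string {
  return base64.replace(/\+/g, "-").replace(/=/g, "_").replace(/\//g, "~");
}
```

The encoded policy goes into the `CloudFront-Policy` cookie, the signature into `CloudFront-Signature`, and the public key ID into `CloudFront-Key-Pair-Id`. Because the wildcard covers the whole asset prefix, one set of cookies authorizes every segment request without touching the manifest.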

The security work I didn’t have to ask for

I gave Fable one sentence of security direction: “private by default, no long-lived credentials, signed playback.” Here’s what it derived from that sentence:

SECURITY FABLE SHIPPED UNPROMPTED

IAM

Least-privilege per function

Each Lambda has its own role scoped to exactly its resources — the playback-session function cannot touch the mezzanine bucket, the upload function cannot read DynamoDB sessions.

EDGE

OAC + signed cookies, short TTL

Buckets are unreachable except via CloudFront Origin Access Control. Playback cookies expire with the session and are scoped to one asset path.

CI/CD

OIDC instead of AWS keys

GitHub Actions assumes a role via OpenID Connect. There is no AWS secret stored in the repo or the CI environment at all.

DATA

KMS at rest, TLS in transit

Both buckets and the DynamoDB table are KMS-encrypted; the API enforces TLS and validates upload content types before issuing presigned URLs.

I still ran my own review pass and an automated code review over the diff — trust but verify is doing heavy lifting in this workflow — and the review came back with style nits, not security findings.

The Migration: Transcoding an Existing Library Without Fear

New architectures are easy; old data is where projects go to die. I had years of raw media sitting in the legacy bucket, all of it needing transcoding into HLS, none of it allowed to break while users were actively consuming it.

Fable’s migration design treated the transcode of the back catalog as a resumable, verifiable batch job, not a script you run and pray over:

HOW THE MIGRATION SCRIPTS WORK

Every stage is idempotent — the whole pipeline can be killed and re-run at any point and it picks up exactly where it stopped.

  1. STAGE 01

    Inventory

    Walk the legacy bucket, fingerprint each object (ETag + size), and write one migration record per asset into DynamoDB with status PENDING. Re-running only adds new files.

  2. STAGE 02

    Queue with a concurrency cap

    Submit MediaConvert jobs from the PENDING set, capped to stay inside account quotas and a cost ceiling I set ($/day). Job IDs land back on the migration record.

  3. STAGE 03

    Verify, don’t assume

    On job completion, compare output duration against source duration (±1s), check every rendition in the ladder exists, and probe the manifest. Only then: status VERIFIED.

  4. STAGE 04

    Cut over with a net

    Consuming services dual-read: VERIFIED assets stream HLS, everything else falls back to the legacy raw path. When the failure list hit zero, the fallback was removed.
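The idempotency and the concurrency cap in the stages above can be sketched in a few lines. This is a simplified model, not the migration scripts themselves: an in-memory `Map` stands in for the DynamoDB state table, `submitJob` is a hypothetical callback standing in for the MediaConvert job submission, and all function and type names are invented for illustration.

```typescript
type MigrationStatus = "PENDING" | "QUEUED" | "VERIFIED" | "FAILED";

interface MigrationRecord {
  key: string;         // object key in the legacy bucket
  fingerprint: string; // ETag plus size, as described in Stage 01
  status: MigrationStatus;
  jobId?: string;
}

interface LegacyObject {
  key: string;
  etag: string;
  size: number;
}

// Stand-in for the DynamoDB state table (assumption: keyed by object key).
type StateTable = Map<string, MigrationRecord>;

// Stage 01: create a PENDING record only for objects not tracked yet, so a
// re-run never resets progress. Returns the number of records added.
function inventory(table: StateTable, objects: LegacyObject[]): number {
  let added = 0;
  for (const o of objects) {
    if (table.has(o.key)) continue;
    table.set(o.key, {
      key: o.key,
      fingerprint: `${o.etag}:${o.size}`,
      status: "PENDING",
    });
    added++;
  }
  return added;
}

// Stage 02: submit PENDING records, never letting the number of in-flight
// (QUEUED) jobs exceed the cap. Returns the keys submitted in this pass.
function queueNext(
  table: StateTable,
  maxInFlight: number,
  submitJob: (key: string) => string,
): string[] {
  const records = Array.from(table.values());
  const inFlight = records.filter((r) => r.status === "QUEUED").length;
  const slots = Math.max(0, maxInFlight - inFlight);
  const batch = records.filter((r) => r.status === "PENDING").slice(0, slots);
  for (const r of batch) {
    r.jobId = submitJob(r.key);
    r.status = "QUEUED";
  }
  return batch.map((r) => r.key);
}

// Stage 04: dual-read routing. Only VERIFIED assets stream as HLS; anything
// else, including unknown keys, falls back to the legacy raw path.
function playbackPath(table: StateTable, key: string): "hls" | "legacy" {
  return table.get(key)?.status === "VERIFIED" ? "hls" : "legacy";
}
```

Because every function reads its starting point from the state table, killing the process between any two calls loses nothing: the next run sees the same records and continues from there.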

The detail that sold me: Fable added a --dry-run flag and a cost estimate to the queue stage before I asked, because the TRD it had written contained a cost ceiling — so it treated “don’t surprise me on the bill” as a requirement to implement, not a vibe. Eleven assets failed verification on the first pass (corrupted sources, one mislabeled codec). They were exactly the kind of thing a hand-rolled for loop over aws s3 ls would have silently butchered.
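The verification rule itself is simple enough to sketch. The version below is an assumption-laden illustration: the `SourceProbe` and `OutputProbe` shapes and the `verifyOutput` name are hypothetical, and how the durations and rendition list are obtained (for example by probing the files) is left outside the sketch. Only the three checks come from the stage description: duration within one second, every rendition present, manifest loads.

```typescript
interface SourceProbe {
  durationSec: number;
}

interface OutputProbe {
  durationSec: number;
  renditions: string[];   // rendition names found in the packaged output
  manifestLoads: boolean; // result of probing the master manifest
}

// Return a list of problems; an empty list means the asset may be marked
// VERIFIED. Collecting every problem, instead of stopping at the first,
// makes the failure list more useful for triage.
function verifyOutput(
  source: SourceProbe,
  output: OutputProbe,
  expectedRenditions: string[],
  toleranceSec = 1,
): string[] {
  const problems: string[] = [];
  const drift = Math.abs(output.durationSec - source.durationSec);
  if (drift > toleranceSec) {
    problems.push(`duration drift ${drift.toFixed(2)}s exceeds ${toleranceSec}s`);
  }
  for (const name of expectedRenditions) {
    if (output.renditions.indexOf(name) === -1) {
      problems.push(`missing rendition ${name}`);
    }
  }
  if (!output.manifestLoads) problems.push("manifest probe failed");
  return problems;
}
```

A truncated or corrupted source typically shows up here as duration drift or a missing rendition, which is the kind of failure a plain copy loop would never notice.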

CI/CD and Runbooks: The Unsexy 20% That Makes It Real

A microservice without a pipeline is a liability with good intentions. Fable shipped both halves of operability the same afternoon:

  • GitHub Actions: on PR — lint, type-check, unit tests, terraform plan posted as a PR comment; on merge — gated terraform apply, Lambda deploy, and a canary that creates a real playback session against production and fails the deploy if time-to-first-byte on the manifest regresses.
  • Runbooks in the repo (/runbooks): Onboarding a new service (the playback-session contract, with copy-paste client code), Live stream operations (create channel, rotate stream key, end-of-stream archive), Transcode failure triage (where MediaConvert errors land, how to requeue one asset), and Cost monitoring (the CloudWatch dashboard + budget alarms it provisioned).
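The canary's pass/fail decision can be reduced to a small comparison. The sketch below is one plausible way to do it, not the pipeline's actual logic: the `canaryPasses` name, the use of a median over several samples, and the 20% allowed regression are all assumptions. Collecting the time-to-first-byte samples from a real playback session is left out.

```typescript
// Median of a non-empty list of numbers.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("need at least one sample");
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Pass when the median time-to-first-byte stays within the allowed
// regression over the stored baseline. Using the median keeps a single slow
// sample from failing the deploy.
function canaryPasses(
  samplesMs: number[],
  baselineMs: number,
  allowedRegression = 0.2,
): boolean {
  return median(samplesMs) <= baselineMs * (1 + allowedRegression);
}
```

In a deploy job, a `false` result would translate into a non-zero exit code so the pipeline stops before the release is marked healthy.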

The runbooks are why integrating my first two services took an hour instead of a week of Slack archaeology. The third service was onboarded by a teammate without talking to me at all — they read the runbook Fable wrote, called one endpoint, and shipped. That’s the real productivity story: the agent didn’t just write code, it wrote down how to use the code, which is the part humans chronically skip.

What I Actually Did All Day

Let’s be precise about the human role, because this is where most coverage of agentic coding gets hand-wavy. Every one of my 17 inputs was a product trade-off, a cost ceiling, or a risk-tolerance call — never a technical how-to. These five carried the most weight:

THE DECISION LOG — WHERE THE HUMAN EARNED THEIR KEEP

Fable framed each question with the trade-offs already researched; my job was to bring the context only I had.

  1. THE QUESTION 01

    HLS only, or HLS + DASH?

    Protocol

    THE CALL

    HLS-only for v1, skip DASH entirely.

    WHY IT WAS MINE TO MAKE

    My audience skews mobile and Safari, where HLS is native. One protocol halves the test matrix — and DASH can be added later without re-transcoding.

  2. THE QUESTION 02

    AWS IVS or MediaLive for the realtime path?

    Realtime

    THE CALL

    IVS — managed ingest, 2–5 second latency.

    WHY IT WAS MINE TO MAKE

    Pure cost/ops trade-off. Sub-second latency wasn't worth 10x the operational surface for my use case. Only I knew the latency my product actually needs.

  3. THE QUESTION 03

    Signed URLs or signed cookies for playback?

    Security

    THE CALL

    Signed cookies, scoped per playback session.

    WHY IT WAS MINE TO MAKE

    Product stance: paid content shouldn't be hot-linkable, and per-segment URL signing means per-viewer manifests that the CDN can't cache. A judgment call about the business, not the tech.

  4. THE QUESTION 04

    Lambda or Fargate for the control plane?

    Compute

    THE CALL

    Lambda, accepting cold starts.

    WHY IT WAS MINE TO MAKE

    My traffic is spiky with long idle valleys — scale-to-zero wins. Someone with steady traffic should choose the opposite, which is exactly why the agent asked.

  5. THE QUESTION 05

    How aggressive should the migration cutover be?

    Migration

    THE CALL

    Dual-read fallback until verified failures hit zero.

    WHY IT WAS MINE TO MAKE

    Risk tolerance is a business decision. I chose the slow-and-safe path because live users were consuming this media during the migration.

Plus the review passes: each design doc, the Terraform plan before the first apply, the IAM policies line by line, and the final diff. Call it two hours of genuine attention across the nine.

Notice what’s not on the list: I never wrote a handler, never created a resource in the AWS console (my single console visit was confirming the budget alarm fired during a test), never debugged a YAML indentation error. Every hour I spent was on decisions that needed my context — which is exactly the division of labor I’ve been arguing the agent landscape was heading toward. It also confirmed something smaller but important: a well-maintained CLAUDE.md with your conventions is the cheapest force multiplier in this whole setup — Fable followed my repo conventions because they were written down where it looks.

Where It Stumbled (Because Nothing Is Magic)

Honesty section. Three real friction points:

  1. First MediaConvert ladder was over-provisioned. The draft included a 4K rendition my content doesn’t have sources for. Caught in LLD review — a 30-second fix at doc stage, but it would have quietly doubled transcode costs if I’d rubber-stamped it.
  2. One quota assumption. Fable assumed default MediaConvert concurrent-job quotas; my account had a lower legacy limit. The migration’s capped queue absorbed it gracefully (jobs just drained slower), and Fable filed the quota-increase request when the throttling showed up in logs — but it discovered the limit by hitting it, not by checking first.
  3. It’s only as good as your brief. I forgot to mention that some legacy audio was in a deprecated codec. The verification stage caught every affected file, but a better inventory in my brief would have saved a requeue cycle.

None of these are “AI wrote bad code” stories. They’re the same integration realities any senior engineer hits — the difference is the system was designed (by the agent, in the TRD) to surface them loudly instead of corrupting silently.

Best Practices: How to Run an Auto-Mode Build

THE AUTO-MODE PLAYBOOK


Final Take

The headline isn’t “AI wrote a lot of code fast.” Code generation has been cheap for two years. The headline is that Claude Fable 5 ran an engineering process — requirements, design review, infrastructure, security posture, migration safety, operations docs — and the process is what made one-day delivery survivable instead of reckless.

My role didn’t shrink; it concentrated. Seventeen decisions and two hours of review were the entire human footprint, but they were the seventeen decisions that determine whether this service is still standing in two years. That’s the trade every senior engineer should want.

If you’re going to try this, start with the playbook above and a service you actually need — not a toy. The toys don’t force the migration, the security review, or the runbooks, and those are exactly where auto mode earns its keep.

If you found this useful, read spec-driven development with AI agents next — it’s the methodology that makes builds like this one repeatable instead of lucky.

Written for umesh-malik.com — no-fluff technical writing on AI, Web Dev, and Engineering.
