00 — The Meta-Story
Who Built This
Matthew is a technology leader who manages engineering teams at a SaaS company. He started this platform on February 22, 2026. Today it runs 62 Lambdas, 121 MCP tools, a 72-page website, and a CI/CD pipeline. How? Every single conversation was with Claude.
This is what happens when a domain expert — someone who knows their health data, their goals, and their constraints — pairs with an AI that can write production code. The human sets the architecture. The AI writes the implementation. The human reviews the output. The AI iterates. Zero Stack Overflow.
00b — The Partnership
What Claude Did vs. What Matt Did
Claude wrote
- Every Lambda function (62 and counting)
- The full CDK infrastructure (8 stacks)
- The observatory CSS design system
- The MCP tool registry (121 tools)
- The correlation engine
- All 68 site pages
Matt defined
- Every architecture decision (45 ADRs)
- The editorial design language
- The data model and source priorities
- The Board of Directors system (34 personas)
- The Henning Brandt evidence standard
- What questions to ask the data
01 — Audience
Who This Is For
You're a developer, hobbyist, or technical leader who wants to build a personal data system —
health tracking, quantified self, home automation, or any domain where you're collecting data from multiple
sources and want AI to reason about it. You don't need a team. You don't need a budget.
You need a pattern.
This page documents the architectural pattern I followed: what I chose, what I avoided, what broke,
and what I'd do differently if I started over tomorrow.
02 — Architecture decisions
What I Chose (and What I Didn't)
Every decision here was optimized for a single operator. No team coordination overhead, no multi-tenant complexity,
no premature abstraction. The philosophy: the simplest thing that works, run by one person, at near-zero cost.
// chose
DynamoDB single-table, no GSIs
One table. PK = USER#matthew#SOURCE#{source}, SK = DATE#YYYY-MM-DD. Every query is a known key pattern. No GSIs means no extra cost and no index propagation lag.
// trade-off: ad-hoc queries are impossible. You must know your access patterns upfront. For N=1 data, you always do.
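Concretely, every access is a string-building exercise against that key scheme. A stdlib sketch (the key layout follows this page; the function names and the raw Query shape are illustrative, not the platform's actual code):

```python
def keys_for(source: str, day: str) -> dict:
    """Single-table key pair: PK partitions by user+source, SK sorts by day."""
    return {"PK": f"USER#matthew#SOURCE#{source}", "SK": f"DATE#{day}"}

def range_query_args(source: str, start: str, end: str) -> dict:
    """Arguments for a DynamoDB Query over a date range. Every read is a
    known key pattern, so no GSI is ever consulted."""
    return {
        "KeyConditionExpression": "PK = :pk AND SK BETWEEN :lo AND :hi",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"USER#matthew#SOURCE#{source}"},
            ":lo": {"S": f"DATE#{start}"},
            ":hi": {"S": f"DATE#{end}"},
        },
    }
```

Pass the returned dict straight to a boto3 `client.query(TableName=..., **args)` call; nothing else is needed.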
// avoided
PostgreSQL / RDS
RDS starts at ~$15/month even for the smallest instance, runs 24/7, and requires patching. For write-once-read-many time-series health data, it's overkill. DynamoDB on-demand: $0.48/month.
// would reconsider if: ad-hoc analytical queries became critical, or multi-user support was needed
// chose
Lambda + EventBridge (no containers)
Every function is event-driven. Ingestion runs on cron (06:45–11:00 AM PT). Compute runs in sequence. Zero idle cost. Cold starts are <2s and irrelevant for batch processing.
// trade-off: 15-min max timeout, no long-running jobs, package size limits. All manageable for this workload.
// avoided
ECS / Fargate / EC2
Always-on compute makes no sense for a system that runs ~100 invocations/day. Even the smallest Fargate task is ~$10/month. The platform's entire Lambda bill is $0.12/month.
// would reconsider if: real-time streaming ingestion or websocket features were needed
// chose
CDK (TypeScript) for IaC
8 CDK stacks define all infrastructure. IAM roles are least-privilege per Lambda. EventBridge schedules, DynamoDB table, S3 buckets, CloudFront — all in code, all version-controlled.
// trade-off: CDK learning curve is steep. But "infrastructure as code" means rebuilding from scratch takes minutes, not days.
// avoided
Terraform / SAM / manual console
Terraform is cloud-agnostic but adds a state file management burden. SAM is fine but CDK gives you real programming constructs (loops, conditionals). Manual console is how incidents happen.
// would reconsider if: multi-cloud was a requirement (it never will be for a personal project)
// chose
MCP (Model Context Protocol) for AI
121 tools exposed via MCP. Claude calls them in natural language. No SQL, no dashboards, no manual queries. The AI is the interface. OAuth 2.1 + HMAC auth on the Lambda Function URL.
// trade-off: vendor lock-in to Anthropic's protocol. But MCP is open spec, and the tool implementations are just Python functions.
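Because the tool implementations are just Python functions, the registry itself can be a dict. A sketch under that assumption (names like `tool` and `dispatch` are hypothetical, not the platform's actual registry):

```python
TOOL_REGISTRY: dict = {}

def tool(name: str):
    """Decorator: register a plain Python function under an MCP tool name."""
    def wrap(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return wrap

@tool("get_sleep_summary")
def get_sleep_summary(start: str, end: str) -> dict:
    # In the real system this would read pre-computed results from DynamoDB.
    return {"range": [start, end], "source": "whoop"}

def dispatch(name: str, args: dict):
    """What the MCP server does with an incoming tools/call request."""
    return TOOL_REGISTRY[name](**args)
```

A dict keyed by tool name is also what makes the registry-integrity CI gate in section 04 a simple set comparison.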
// avoided
Custom dashboard / Grafana / Retool
Dashboards require maintenance, are hard to make flexible, and can't answer questions you didn't anticipate. "What correlates with my bad sleep last week?" is a natural language question, not a dashboard filter.
// exception: the public website (averagejoematt.com) is a read-only dashboard for visitors. But I query via Claude.
03 — The stack
The Exact Stack
No frameworks. No ORMs. No dependency trees. The entire platform runs on Python stdlib + boto3 + one Claude API call per digest.
Language
Python 3.12 — stdlib only for all Lambdas. No pip dependencies. urllib for HTTP, json for parsing, boto3 from Lambda runtime.
Infrastructure
AWS CDK (TypeScript) — 8 stacks: Compute, Data, Web, Alarms, IAM, Shared Layer, Site, Monitoring.
Database
DynamoDB — single table, on-demand, PK+SK only, no GSIs, KMS encrypted, PITR 35-day, deletion protection.
Object store
S3 — raw JSON archive (raw/{source}/{type}/{Y}/{M}/{D}.json), config files, static site hosting.
AI
Claude (Anthropic) — Sonnet for analysis, Haiku for classification. ~$3/month. MCP for tool calls.
CI/CD
GitHub Actions — OIDC federation (no static keys), lint → test → plan → deploy with manual approval gate, auto-rollback on smoke test failure.
Monitoring
CloudWatch + X-Ray — 66 alarms, synthetic canary every 4h, dead-letter queues on all async invocations.
Auth
Secrets Manager + KMS — 10 secrets, in-memory Lambda caching, OIDC for CI/CD, OAuth 2.1 for MCP.
Frontend
Vanilla HTML/CSS/JS — no React, no build step, no bundler. S3 + CloudFront. Site API via Lambda Function URL.
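The in-memory secret caching in the Auth row can be sketched as follows. The `fetch` injection point is mine, added so the pattern runs without AWS; the real helper presumably calls Secrets Manager directly on every cold start and hits the cache on warm invocations:

```python
import json

_CACHE: dict = {}  # survives across warm invocations of the same container

def get_secret(name: str, fetch=None) -> dict:
    """Fetch a JSON secret once per container, then serve it from memory."""
    if name not in _CACHE:
        if fetch is None:
            import boto3  # provided by the Lambda runtime
            client = boto3.client("secretsmanager")
            fetch = lambda n: client.get_secret_value(SecretId=n)["SecretString"]
        _CACHE[name] = json.loads(fetch(name))
    return _CACHE[name]
```

Warm invocations never touch the Secrets Manager API, which keeps both latency and per-call cost near zero.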
04 — Lessons learned the hard way
What Broke (and What I Learned)
These aren't hypothetical. Each lesson cost at least one incident, one late-night debug session, or one "how did I miss that" moment.
01
MCP registration integrity requires automated validation
MCP is a new protocol with little mature tooling, so nothing stops a tool from being registered without ever being implemented. A CI gate (test_mcp_registry.py) cross-references every registered tool name against its implementing function. Zero tolerance for registration without implementation.
// pattern: automated registry integrity test runs on every deploy
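The cross-reference the CI gate performs is a set comparison. A sketch of the idea (not the actual test_mcp_registry.py):

```python
def registry_violations(registered: dict, implementations: dict) -> list:
    """Names registered without an implementation, plus implementations
    that were never registered. CI fails if this list is non-empty."""
    missing_impl = sorted(set(registered) - set(implementations))
    unregistered = sorted(set(implementations) - set(registered))
    return missing_impl + unregistered
```

Running this on every deploy turns "I forgot to wire up the function" from a runtime surprise into a red build.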
02
Mixed-ownership S3 prefixes require deployment boundaries
When static site files and Lambda-generated files coexist in the same S3 bucket, sync --delete creates a mixed-ownership problem — deployment removes files it didn't create. ADR-032 established deployment boundaries: separate prefixes per owner, bucket policy blocks DeleteObject on protected paths, and a safe_sync wrapper enforces the rules.
// pattern: ADR-032 safe_sync.sh wrapper — never sync --delete to bucket root
03
CI/CD is non-optional, even for solo projects
I chose manual deploys in weeks 1–2, accepting the risk in exchange for faster iteration. By week 3, the error rate made it clear: even for a single engineer, CI/CD isn't optional. Architecture Review #13 formalized this as the top finding. The pipeline now enforces lint, test, plan, manual approval, deploy, smoke test, and auto-rollback.
// pattern: GitHub Actions with OIDC federation, manual approval gate, auto-rollback
04
Lambda@Edge deploys to us-east-1 regardless of your home region
Spent 3 hours debugging why Lambda@Edge couldn't read secrets. Everything was in us-west-2. Lambda@Edge runs in us-east-1. Secrets must be there too.
// also: CloudWatch billing alarms must use SNS in us-east-1
05
macOS ships bash 3.2 — no associative arrays
Wrote a deploy script using declare -A for a Lambda mapping. Worked on Linux, crashed on macOS. Bash 3.2 doesn't support associative arrays (Apple ships ancient bash due to GPL v3 licensing).
// fix: use parallel indexed arrays or inline Python blocks in bash scripts
06
Secrets governance requires dependency mapping
In any Lambda-based system, the relationship between secrets and their consumers isn't visible from the AWS console. ADR-014 established the governance pattern: document which Lambdas consume which secrets, enforce via automated cross-reference, and never bundle secrets unless consumed by the same Lambda set.
// pattern: ADR-014 secrets dependency mapping — automated validation in CI
07
Correlation ≠ Causation (and your AI will forget this)
Early daily briefs said things like "your high HRV caused better sleep quality." The correlation engine found a relationship. Claude narrated it as causal. Had to add explicit correlational framing instructions to every AI prompt.
// fix: system prompt mandates "correlates with" / "is associated with" — never "causes" or "leads to"
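One cheap way to enforce that framing is to lint the generated brief before it goes out. A sketch (the prompt wording and regex are illustrative, not the platform's actual rule text):

```python
import re

FRAMING_RULE = (
    "Describe relationships as correlational: say 'correlates with' or "
    "'is associated with'. Never say 'causes', 'caused', or 'leads to'."
)

CAUSAL = re.compile(r"\b(causes?|caused|leads? to|led to)\b", re.IGNORECASE)

def causal_claims(brief: str) -> list:
    """Flag causal phrasing in AI-generated text before it is emailed."""
    return CAUSAL.findall(brief)
```

The prompt instruction does the heavy lifting; the regex is a backstop for the days the model forgets anyway.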
08
Personal data systems need explicit domain boundaries
Health data, behavioral data, and productivity data each belong in separate partitions — even when all three are yours. Cross-contamination creates governance complexity that's trivial to prevent upfront and painful to untangle later. The platform enforces strict data domain boundaries: each source writes to its own partition, and no employer or third-party system is ever ingested.
// pattern: single-table DynamoDB with source-prefixed partition keys — no cross-domain writes
05 — Build timeline
How Fast It Grew
From zero to 62 Lambdas in six weeks. Every version was built with Claude as the sole engineering partner.
Feb 22
Day 1 — First Lambda, first DynamoDB write
Whoop ingestion. Single Lambda, single table. 15 minutes from idea to working code.
Week 1
8 data sources online
Whoop, Withings, Strava, Apple Health, MacroFactor, Habitify, Garmin, Eight Sleep. All on EventBridge crons.
Week 2
Daily Brief email + MCP server
Claude synthesizes all data into a coaching email every morning. MCP tools let me query data in natural language.
Week 3
Character Sheet engine + public website
7-pillar scoring system with EMA smoothing, level/tier transitions. averagejoematt.com goes live with live data.
Week 4
Intelligence layer: correlations + hypothesis engine
23-pair Pearson correlation matrix with BH-FDR correction. Weekly AI hypothesis generation from data patterns.
Week 5
CI/CD, architecture reviews, 103 MCP tools
GitHub Actions pipeline with OIDC. 17 architecture reviews by 14-member AI board. Challenge system with XP gamification.
Week 6+
Observatory editorial design, 67-page site, launch prep
Full editorial observatory pattern across 6 health domains. Usability study with 15 simulated participants. Site-api split for isolated AI endpoints. 62 Lambdas, 121 MCP tools, 26 data sources. $19/month.
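The week-4 correlation math fits comfortably in stdlib Python. A sketch of Pearson r plus Benjamini-Hochberg selection (illustrative only; the platform's actual engine is not shown here):

```python
from math import sqrt

def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def bh_reject(pvals: list, q: float = 0.05) -> list:
    """Benjamini-Hochberg FDR: find the largest rank k with
    p_(k) <= (k/m) * q, then reject the k smallest p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k = rank
    rejected = set(order[:k])
    return [i in rejected for i in range(m)]
```

BH correction matters here: with 23 pairs tested every week, uncorrected p < 0.05 would hand you a spurious "finding" roughly every run.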
06 — If I started over tomorrow
What I'd Do Differently
01
CI/CD from day one. Not day 30. The 8 deployment incidents were entirely preventable. GitHub Actions + OIDC takes 2 hours to set up and saves hundreds of hours.
02
Start with 3 data sources, not 8. Whoop + Habitify + one nutrition tracker is enough to build the full pipeline pattern. Adding sources later is trivial once the pattern exists.
03
Design the MCP tools as the primary interface from the start. I built dashboards first, then realized Claude was a better interface. Would skip straight to MCP and build the website as a secondary output.
04
Write Architecture Decision Records from day one. ADRs are 5-minute documents that save hours of "why did I do it this way?" 3 months later. The platform now has 45 ADRs and they're invaluable.
05
Pre-compute everything, query nothing live. The Compute → Store → Read pattern means MCP tools read from pre-calculated results, never from raw data. This makes the system fast, cheap, and predictable.
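The Compute → Store → Read split can be sketched with a dict standing in for DynamoDB (all names here are illustrative):

```python
STORE: dict = {}  # stands in for the DynamoDB table in this sketch

def compute_daily_summary(day: str, raw_samples: list) -> None:
    """Compute step: runs on a schedule, writes a finished result item."""
    STORE[f"SUMMARY#{day}"] = {
        "day": day,
        "avg": sum(raw_samples) / len(raw_samples),
        "n": len(raw_samples),
    }

def tool_get_summary(day: str) -> dict:
    """Read step: the MCP tool fetches the pre-computed item only.
    No aggregation ever happens at query time."""
    return STORE[f"SUMMARY#{day}"]
```

Because the read path is a single key lookup, query latency and cost are flat no matter how much raw data accumulates.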
07 — Start here
Your First Weekend
You can have a working personal data system in one weekend. Here's the minimum viable path:
1
Saturday morning
npx cdk init. One DynamoDB table, one Lambda, one EventBridge cron.
Pick your most interesting data source (Whoop, Oura, Garmin).
Write the ingestion Lambda. Deploy. Verify data in DynamoDB console.
2
Saturday afternoon
Add a second data source. Write an MCP tool that reads both.
Test with Claude Desktop. Ask it: "How did I sleep this week?"
The moment Claude answers from your own data is when it clicks.
3
Sunday
Write a Daily Brief Lambda that reads DynamoDB and calls Claude to synthesize an email.
Schedule it with EventBridge. Tomorrow morning, your AI sends you a coaching brief.
You now have a platform. Everything else is iteration.
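The whole weekend plan condenses into one handler shape. A sketch with a placeholder endpoint and field names (the real Whoop/Oura/Garmin APIs differ, and the put_item call is commented out so the sketch stays self-contained):

```python
import json
import urllib.request
from datetime import date

def to_item(source: str, day: str, payload: dict) -> dict:
    """Map an API payload onto the single-table key scheme."""
    return {
        "PK": f"USER#matthew#SOURCE#{source}",
        "SK": f"DATE#{day}",
        "data": payload,
    }

def handler(event, context):
    """EventBridge-cron Lambda: pull today's data, write one item."""
    req = urllib.request.Request(
        "https://api.example.com/v1/recovery",  # placeholder endpoint
        headers={"Authorization": "Bearer <token>"},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)

    item = to_item("whoop", date.today().isoformat(), payload)
    # boto3.resource("dynamodb").Table("health").put_item(Item=item)
    return {"statusCode": 200, "body": json.dumps({"wrote": item["SK"]})}
```

Every additional source is the same handler with a different endpoint and a different `source` string, which is why adding sources later is trivial.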
Want to see exactly how each piece works? The full architecture diagram, all 62 Lambda definitions,
the MCP tool catalog, and every EventBridge schedule are documented on the platform page.
The cost page shows exactly what you'll spend.
08 — Show me the code
Why the Repo Is Private
The GitHub repo is private. Not because the code is proprietary, but because the git history contains API keys, personal health data references, and infrastructure identifiers that would be irresponsible to expose publicly.
What I can show you instead: the full architecture diagram, every architecture review grade (19 reviews by a 12-member AI board), the exact cost breakdown, and every decision documented in 45 ADRs. The code patterns are described in detail on this page — enough to reproduce the system yourself.
If you're building something similar and want to compare notes, reach out. I'm happy to share specific implementation details one-on-one.