Volume 4 · Engineering Intelligence

Built in 25 Days

How a lean team shipped a multi-AI production tax engine — 10,700 lines of extraction logic, a 16-tool autonomous agent, and psychographic personalization from 1,885 real interactions — in under a month.

  • 10.7K lines of extraction logic
  • 42+ IRS form types
  • 16 AI agent tools
  • 575 test cases

The Timeline

From first commit to production multi-AI system. Every week compounded on the last.

Week 1 — Foundation
March 1–7, 2026
  • Next.js App Router + Supabase auth scaffolding
  • 12 core database tables with RLS policies
  • Document upload infrastructure + viewer
  • Organizer flow (taxpayer info collection)
  • Referral system + OG dynamic preview cards
Week 2 — Pipeline
March 8–14, 2026
  • 22-status admin pipeline with 6-bucket view
  • Stripe billing integration (on-demand checkout sessions)
  • Cal.com meeting sync cron (daily at 2 AM UTC)
  • Mobile-first admin UI (3px color track, tap-to-expand)
  • Growth analytics dashboard with weekly metrics
Week 3 — Intelligence
March 15–21, 2026
  • Atlas AI assistant — SSE streaming + 16 tools + memory
  • Azure Document Intelligence for W-2, 1040, 1099 family
  • 3-tier extraction pipeline: Azure DI + Azure CU + Claude fallback
  • 42+ IRS form types with custom normalizers (10,700 LOC)
  • Voice agent (Vapi integration, conversational mode)
  • 30 unit tests + 16 Playwright E2E tests shipped same week
Week 4 — Orchestration
March 22–26, 2026
  • Psychographic intelligence: 5 emotional archetypes from 705 real calls
  • Gemini Flash reasoning layer (12 specialized tax use cases)
  • Langfuse prompt versioning + A/B testing (20/80 split)
  • Confidence scoring per extracted field (0.0–1.0)
  • Alert system: 8 risk flag types with severity tiers
  • Post-filing dashboard + tax planning page + Learn tab
  • 575 test cases across unit + E2E + Lighthouse CI
The pivotal moment

March 20, 5:26 PM — The system shifted from "Claude does everything" to a three-tier extraction architecture. Azure Document Intelligence for structure, Azure Content Understanding for semantics, Claude only as fallback. This single decision cut extraction cost by 50–70% and improved accuracy to 99%+ on standard forms.

The Extraction Engine

The technical core

10,700 lines of domain-specific tax logic — not a wrapper around an LLM. Custom normalizers for every IRS form type. Cross-document validation. Confidence scoring per field. This is the hardest piece to replicate.

Architecture

  • Document upload
  • Azure DI (structured extraction) and Azure CU (semantic understanding) run in parallel with a 45–90s timeout; winner takes all
  • 20+ validation rules, plus 8 cross-document checks
  • Claude fallback, only if Azure returns empty
  • return_intelligence: 60+ aggregated fields per return
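
The parallel race and fallback described above can be sketched roughly as follows. This is a minimal illustration, not the contents of extractor.ts: the `Extractor` type, `withTimeout`, `firstNonEmpty`, and `extractDocument` are hypothetical names, and "winner takes all" is interpreted here as "the first Azure result that contains any fields wins".

```typescript
// Hypothetical types and names; the real orchestrator is not shown in this article.
type Fields = Record<string, string | number>;
type Extractor = (doc: Uint8Array) => Promise<Fields>;

// Reject if the work does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Resolve with the first non-empty result. Resolve with null if every
// attempt fails, times out, or returns no fields.
function firstNonEmpty(attempts: Promise<Fields>[]): Promise<Fields | null> {
  return new Promise((resolve) => {
    let pending = attempts.length;
    if (pending === 0) resolve(null);
    const settle = (fields: Fields | null) => {
      if (fields && Object.keys(fields).length > 0) resolve(fields);
      else if (--pending === 0) resolve(null);
    };
    for (const attempt of attempts) {
      attempt.then(settle, () => settle(null));
    }
  });
}

async function extractDocument(
  doc: Uint8Array,
  azureDI: Extractor,
  azureCU: Extractor,
  claudeFallback: Extractor,
  timeoutMs = 45000, // lower bound of the 45–90s window quoted above
): Promise<{ source: "azure" | "claude"; fields: Fields }> {
  // Both Azure services start at the same time.
  const winner = await firstNonEmpty([
    withTimeout(azureDI(doc), timeoutMs),
    withTimeout(azureCU(doc), timeoutMs),
  ]);
  if (winner) return { source: "azure", fields: winner };
  // Claude is called only when neither Azure service produced fields.
  return { source: "claude", fields: await claudeFallback(doc) };
}
```

Keeping the fallback outside the race means the most expensive model is never invoked while a cheaper result is still possible.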

Form Coverage

  • 42+ IRS form types
  • 22 W-2 Box 12 codes
  • 80+ entries in the brokerage registry
  • 5 typed error codes
| Document Family | Types Covered | Key Fields |
|---|---|---|
| W-2 | W-2, W-2c, W-4 | All 22 Box 12 codes (D=401k, AA=Roth, W=HSA, Z=409A, V=NSO), multi-state, locality |
| 1040 Family | 1040, 1040-SR, 1040-NR + Schedules A/B/C/D/E/SE/1/2/3 | 180+ field mapping, derived effective tax rate, AMT flag |
| 1099 Family | INT, DIV, NEC, MISC, R, B, G, K, SSA, SA, S, DA + 10 more | Transaction-level wash sale, RSU $0 cost basis detection, qualified vs ordinary dividends |
| 1098 Family | 1098, 1098-E, 1098-T | Mortgage interest, student loan, tuition + AOC eligibility gating |
| Other | K-1, 1095-A/C, paystub, ID docs, bank statements, receipts | YTD 401k tracking, over-contribution risk flag, employer match detection |
| Custom Analyzers | Form 8949, Form 8889, state returns | Gen AI-powered (trained CU analyzers), not generic OCR |

Normalizer Depth

| File | LOC | What It Does |
|---|---|---|
| azure-cu-normalizers.ts | 1,311 | W-2 + 1099 combo via Content Understanding field names |
| azure-normalizers-income.ts | 1,261 | 15 income form normalizers via Document Intelligence |
| azure-cu-normalizers-income.ts | 1,127 | 1099 family via CU (different field naming than DI) |
| azure-normalizers-1040.ts | 996 | 1040 main form + all schedules via DI |
| extractor.ts | 986 | Unified extraction orchestrator with retry + timeout strategy |
Why this is hard to replicate

Box 12 code Z marks income from a nonqualified deferred compensation plan that fails Section 409A. If it is missed, the return omits a 20% additional tax and the immediate taxation of the deferred income. Most competitors map 5–8 Box 12 codes; we map all 22. RSU $0 cost basis flagging on 1099-B catches double-tax risk that costs clients $3,000+. A Box 14 SDI/FLI regex handles OCR variants (CASDI, NJSDI, NYSDI) and routes state deductibility across CA/NJ/NY/WA.
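
To make the Box 14 handling concrete, here is a small sketch of label classification. The pattern below is an assumption written for illustration (the production regex is not shown in this article), and `classifyBox14` and `Box14Match` are hypothetical names.

```typescript
// Hypothetical result shape: which state the label belongs to, and whether
// it is a disability (SDI) or family leave (FLI) withholding.
type Box14Match = { state: string | null; kind: "SDI" | "FLI" };

// Assumed pattern: optional state prefix, optional separator noise from OCR
// (spaces, dots, underscores, hyphens), then SDI or FLI.
const BOX14_PATTERN = /^(CA|NJ|NY|WA)?[\s._-]*(SDI|FLI)$/i;

function classifyBox14(label: string): Box14Match | null {
  const match = BOX14_PATTERN.exec(label.trim());
  if (!match) return null; // not an SDI/FLI label, leave it for other rules
  const state = match[1] ? match[1].toUpperCase() : null;
  const kind = (match[2] ?? "").toUpperCase() === "FLI" ? "FLI" : "SDI";
  return { state, kind };
}
```

With this pattern, `classifyBox14("CASDI")` and `classifyBox14("ca sdi")` both resolve to California SDI, while an unrelated label such as `"UNION DUES"` returns `null`.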

Validation & Quality

20+ form-specific validation rules catch extraction errors before they reach the CPA. 8 cross-document checks catch inconsistencies across the full return.

Form-Level Checks

W-2: fed withheld ≤ wages, SS wages within 20% of total. 1099-B: proceeds ≥ 0, cost basis sanity. 1098: mortgage interest ≤ outstanding balance. Any field with confidence < 0.7 flagged for CPA review.
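
As an illustration of the W-2 rules and the 0.7 confidence gate, a validator might look like the sketch below. The field names and issue codes are hypothetical, and "within 20% of total" is read here as a relative difference against Box 1 wages, which is an assumption.

```typescript
// Hypothetical shapes; the real extraction schema is not shown in this article.
type ExtractedField = { value: number; confidence: number };
type W2Extraction = {
  wages: ExtractedField;
  federalWithheld: ExtractedField;
  socialSecurityWages: ExtractedField;
};

const REVIEW_THRESHOLD = 0.7; // fields below this go to CPA review

function validateW2(w2: W2Extraction): string[] {
  const issues: string[] = [];

  // Rule: federal withholding cannot exceed wages.
  if (w2.federalWithheld.value > w2.wages.value) {
    issues.push("federal_withheld_exceeds_wages");
  }

  // Rule: Social Security wages within 20% of total wages (assumed reading).
  const gap = Math.abs(w2.socialSecurityWages.value - w2.wages.value);
  if (w2.wages.value > 0 && gap / w2.wages.value > 0.2) {
    issues.push("ss_wages_out_of_range");
  }

  // Rule: any low-confidence field is flagged regardless of its value.
  for (const [name, field] of Object.entries(w2)) {
    if (field.confidence < REVIEW_THRESHOLD) issues.push(`low_confidence:${name}`);
  }
  return issues;
}
```

An empty array means the form passed every check; anything else is surfaced to the reviewer with a machine-readable reason.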

Cross-Document Checks

Paystub YTD ≈ W-2 wages. Duplicate 1099-INT from same payer. Duplicate 1099-B from same brokerage. Withholding rate sanity (5–40%). W-2 wages vs prior 1040 AGI consistency.
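
Three of these checks are simple enough to sketch directly. The types, function names, and the 2% paystub tolerance are assumptions for illustration; only the 5–40% withholding band comes from the text above.

```typescript
// Hypothetical minimal shape of an extracted 1099-INT.
type Form1099Int = { payerTin: string; interest: number };

// Returns the payer TINs that appear on more than one 1099-INT.
function findDuplicatePayers(forms: Form1099Int[]): string[] {
  const counts = new Map<string, number>();
  for (const form of forms) {
    counts.set(form.payerTin, (counts.get(form.payerTin) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count > 1)
    .map(([tin]) => tin);
}

// Withholding rate sanity: federal withholding between 5% and 40% of wages.
function withholdingRateIsSane(wages: number, federalWithheld: number): boolean {
  if (wages <= 0) return false;
  const rate = federalWithheld / wages;
  return rate >= 0.05 && rate <= 0.4;
}

// Paystub YTD should approximately equal W-2 wages.
// The default tolerance is an assumed value, not one stated in the article.
function paystubMatchesW2(paystubYtd: number, w2Wages: number, tolerance = 0.02): boolean {
  if (w2Wages <= 0) return false;
  return Math.abs(paystubYtd - w2Wages) / w2Wages <= tolerance;
}
```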

Atlas AI Agent

Not a chatbot

A 16-tool autonomous agent that fills organizer forms, books meetings, tracks IRS refunds, submits revision requests, and approves returns for e-filing. It acts on behalf of the user — it doesn't just talk.

Tool Ecosystem

| Tool | What It Does | Writes to DB |
|---|---|---|
| fill_organizer_field | Saves to 4 tables (profiles, tax_profiles, addresses, income_info) | Yes |
| approve_draft | Marks return approved, notifies expert, triggers e-file pipeline | Yes |
| show_booking | Embeds Cal.com widget inline, updates return status | Yes |
| submit_revision_request | Records revision, updates pipeline status | Yes |
| fill_questionnaire_events | Batch yes/no for life events + financial events | Yes |
| get_tax_estimate | Real-time federal tax calculation from extracted data | Read |
| track_irs_refund | Triggers background IRS refund check | Yes |
| suggest_upload_category | Opens file picker for specific document type | Read |
| navigate_organizer_section | Moves sidebar to specific section (11 options) | Read |
| suggest_replies | Renders quick-reply chips in chat UI | Read |
| list_uploaded_documents | Current documents with extraction status | Read |
| get_missing_documents | Personalized missing doc list by income type | Read |
| check_return_status | Current pipeline stage + task completion | Read |
| get_organizer_status | Fields filled vs missing across all sections | Read |
| trigger_external_workflow | Starts background async job (IRS checks) | Yes |
| contact_expert | Escalation to human CPA/EA | Yes |

Context Assembly

Before every response, Atlas loads 8 context layers in parallel via Promise.all(). Total assembly time: ~200–300ms.

buildAgentContext(): 8 parallel DB loaders
  • Return Intelligence: 60+ aggregated fields
  • Conversation Memory: top 3 by importance
  • Personality Profile: analyst / driver / amiable / expressive
  • Pending Doc Requests: top 5 by dollar impact
  • Tax Rules: 40 IRS limits (anti-hallucination)
  • What-If Scenarios: actionable savings ranked by delta
  • Journey State: 9 sub-queries, next action
  • Emotional Archetype: detected from calls + behavior
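
A rough sketch of the parallel assembly follows. Only the name `buildAgentContext` and the use of `Promise.all()` come from the text; the layer keys, the `Loaders` type, and the loader signatures are assumptions.

```typescript
// The eight layers listed above, as hypothetical keys.
const LAYERS = [
  "returnIntelligence",
  "conversationMemory",
  "personalityProfile",
  "pendingDocRequests",
  "taxRules",
  "whatIfScenarios",
  "journeyState",
  "emotionalArchetype",
] as const;

type Layer = (typeof LAYERS)[number];
type Loaders = Record<Layer, (userId: string) => Promise<unknown>>;
type AgentContext = Record<Layer, unknown>;

async function buildAgentContext(userId: string, loaders: Loaders): Promise<AgentContext> {
  // Every loader starts before any of them is awaited, so total latency is
  // roughly the slowest single query rather than the sum of all eight.
  const values = await Promise.all(LAYERS.map((layer) => loaders[layer](userId)));
  const context = {} as AgentContext;
  LAYERS.forEach((layer, i) => {
    context[layer] = values[i];
  });
  return context;
}
```

One design note: `Promise.all()` rejects as soon as any loader rejects. A production version would likely decide per layer whether a failure should abort the turn or fall back to an empty value.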

Streaming & Performance

  • ~400ms: stream opens before context loads
  • 80% cheaper on prompt cache hit
  • 5 max iterations per turn

SSE streaming opens the HTTP connection ~400ms before context assembly completes — the user sees the typing indicator immediately. Bidirectional sync: form fields update live as the user talks. Ephemeral prompt caching (5-minute TTL) saves ~80% on input tokens for rapid-fire conversations.
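
The "open the stream first, load context second" ordering can be sketched with an async generator that yields SSE-formatted chunks. The event names (`typing`, `token`, `done`) and the `streamTurn` signature are hypothetical. The `event:` / `data:` framing with a blank line between messages is the standard SSE wire format.

```typescript
// Format one Server-Sent Events message.
function sseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

async function* streamTurn(
  loadContext: () => Promise<unknown>,
  generate: (context: unknown) => AsyncIterable<string>,
): AsyncGenerator<string> {
  // Start context assembly immediately, but do not wait for it yet.
  const contextPromise = loadContext();

  // The first chunk goes out before the context has resolved, so the
  // client can show a typing indicator right away.
  yield sseEvent("typing", { active: true });

  const context = await contextPromise;
  for await (const token of generate(context)) {
    yield sseEvent("token", { text: token });
  }
  yield sseEvent("done", {});
}
```

In a route handler, these chunks would typically be written to a response with `Content-Type: text/event-stream`.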

Determinism rule

Every dollar amount Atlas cites must come from database context, never from model weights. The 40 IRS tax rules are loaded as ground truth — Atlas never guesses a deduction limit or bracket threshold. This is how you build trust with anxious immigrants handling $200K+ in W-2 income.
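
One way to enforce this rule in code is to route every cited amount through a lookup against the loaded rules and to return an explicit "unknown" when nothing matches. The `TaxRule` shape, the `citeLimit` function, and the rule keys below are hypothetical.

```typescript
// Hypothetical shape for one row of the IRS rules loaded into context.
type TaxRule = { key: string; taxYear: number; amount: number };

// Returns a citable sentence only when the amount exists in the loaded rules.
// When it does not, the caller gets an explicit refusal instead of a guess.
function citeLimit(rules: TaxRule[], key: string, taxYear: number): string {
  const rule = rules.find((r) => r.key === key && r.taxYear === taxYear);
  if (!rule) {
    return `No verified ${taxYear} value for "${key}" is loaded; escalate instead of estimating.`;
  }
  return `The ${taxYear} limit for "${key}" is $${rule.amount.toLocaleString("en-US")}.`;
}
```

The point of the sketch is the missing-rule branch: the model never gets a chance to fill the gap from its own weights.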

The Intelligence Layer

Psychographic Intelligence

5 emotional archetypes detected from 705 real client calls (Fireflies + RingCentral), not personas invented in a workshop. Atlas adapts tone, urgency, and detail level per archetype. Refinement runs every 5 messages based on emotional signals.

  • Anxious Immigrant (52%): Reassure before inform. Avoid scary words (audit, penalty). "This is very common."
  • Optimizer (~20%): Show the math. Precise numbers. Asks "why"; answer with calculations.
  • Overwhelmed (~12%): Bottom line first. Skip jargon. One thing at a time. Don't stack questions.
  • Price Shopper (~8%): Lead with value, not features. Compare to alternatives. Quantify savings.
  • Relationship Seeker (~8%): Mirror warmth. "I've got you." Long-term loyalty over quick wins.

Alert System

8 risk flag types auto-detected from extracted data. Severity tiers prevent alert fatigue. Anti-contradiction rules ensure Atlas never contradicts CPA advice.

| Alert | Detection Logic | Why It Matters |
|---|---|---|
| 401(k) Over-Contribution | Sum Box 12 code D across all W-2s > $23,500 | Excess deferrals are taxed twice unless withdrawn by April 15 of the following year |
| HSA Over-Contribution | Sum Box 12 code W vs family/individual limit | 6% excise tax + taxable income |
| RSU $0 Cost Basis | 1099-B transactions with cost_basis = 0 | Double-tax risk: $3,000+ per vesting event |
| Wash Sale Accumulation | Sum wash_sale_loss_disallowed across all 1099-Bs | Overstated loss = IRS audit trigger |
| FBAR / FATCA Required | Foreign account indicators in extraction | $10,000+ penalty per unreported account |
| Multi-State Filing | Multiple state entries across W-2s | Incorrect allocation = state audit |
| Backdoor Roth Opportunity | AGI > $161K + no IRA distribution | Missing $1,500+ annual tax savings |
| Underpayment Penalty Risk | Withheld < 90% of estimated liability | Avoid penalty via Q4 estimated payment |
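
As an example of how one of these detections could be written, here is a sketch of the 401(k) check: sum Box 12 code D across every W-2 and compare against the limit. The types and function name are hypothetical; the $23,500 threshold is the one quoted in the table.

```typescript
// Hypothetical shapes for extracted W-2 data.
type Box12Entry = { code: string; amount: number };
type W2 = { employer: string; box12: Box12Entry[] };
type OverContributionAlert = {
  type: "401k_over_contribution";
  total: number;
  excess: number;
};

const ELECTIVE_DEFERRAL_LIMIT = 23500; // threshold quoted in the alert table

function detect401kOverContribution(
  w2s: W2[],
  limit = ELECTIVE_DEFERRAL_LIMIT,
): OverContributionAlert | null {
  // The limit applies per taxpayer, so deferrals are summed across employers.
  let total = 0;
  for (const w2 of w2s) {
    for (const entry of w2.box12) {
      if (entry.code.toUpperCase() === "D") total += entry.amount;
    }
  }
  return total > limit
    ? { type: "401k_over_contribution", total, excess: total - limit }
    : null;
}
```

The cross-employer sum is the important part: each employer's payroll system can be individually compliant while the combined total is not.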

Prompt Management

Langfuse Versioning

Edit Atlas behavior without deploying code. 5 archetype prompt variants, emotional framework, curiosity mode. A/B testing with 20% experiment / 80% control via user ID hash.
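
A deterministic 20/80 split by user ID hash might look like the sketch below. The article does not say which hash is used, so FNV-1a is an assumption chosen because it is short and dependency-free; `assignVariant` is a hypothetical name.

```typescript
// 32-bit FNV-1a over the string's UTF-16 code units (assumed hash choice).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned
}

// Maps a user to a stable bucket in [0, 100) and assigns the experiment
// arm to the first `experimentPercent` buckets.
function assignVariant(userId: string, experimentPercent = 20): "experiment" | "control" {
  return fnv1a(userId) % 100 < experimentPercent ? "experiment" : "control";
}
```

Because the assignment depends only on the user ID, a user sees the same prompt variant on every request without any stored state.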

Emotional Scoring

Every response is scored for engagement and emotional shift: confident (+2), relieved (+1), neutral (0), confused (-1), anxious (-1), frustrated (-2). Scores are tracked in Langfuse traces.
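
The score table translates directly into a lookup. Computing the shift as "score after minus score before" is an assumption about how the two readings are combined; the names are hypothetical.

```typescript
// Scores as listed above.
const EMOTION_SCORES = {
  confident: 2,
  relieved: 1,
  neutral: 0,
  confused: -1,
  anxious: -1,
  frustrated: -2,
} as const;

type Emotion = keyof typeof EMOTION_SCORES;

// Positive result: the response moved the user toward a better state.
function emotionalShift(before: Emotion, after: Emotion): number {
  return EMOTION_SCORES[after] - EMOTION_SCORES[before];
}
```

For example, a user who goes from anxious to relieved scores a shift of +2, and one who goes from confident to frustrated scores -4.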

Production Infrastructure

  • 169 API routes
  • 692 TypeScript files
  • 575 test cases
  • 59 database tables

Integrations

| Service | Purpose | Status |
|---|---|---|
| Stripe | Dynamic checkout sessions, webhook payment confirmation, split invoicing | Production |
| Cal.com | Meeting scheduling, daily cron sync, auto-status (confirmed/completed/cancelled) | Production |
| Resend | Transactional email (welcome, referral, expert notify), archetype-aware copy | Production |
| Langfuse | Prompt versioning, A/B testing, conversation tracing, emotional scoring | Production |
| Upstash Redis | Rate limiting across serverless instances (5/min, 80/hr, 200/day) | Production |
| Sentry | Error tracking + performance monitoring + release health | Production |

Security

Data Protection

RLS on all 59 tables. SSN encrypted with AES-256-GCM. UUID validation on all route params. 10 MB file upload limit enforced server-side. HSTS + CSP + X-Frame-Options: DENY.

Rate Limiting

Upstash Redis sliding window: 5 msg/min burst, 80 msg/hr sustained, 200 msg/day cap. The limiter fails open if Redis is unreachable, so an outage never blocks users. Clients retry with exponential backoff and honor the Retry-After header.
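
The three-tier limit and the fail-open behavior can be illustrated with an in-memory sliding window. This is a sketch of the technique only: it does not use the Upstash client, a single process-local `Map` stands in for Redis, and all names are hypothetical.

```typescript
type Decision = { allowed: boolean; retryAfterSec: number };
type RateWindow = { limit: number; ms: number };

// The three tiers quoted above: 5/min, 80/hr, 200/day.
const RATE_WINDOWS: RateWindow[] = [
  { limit: 5, ms: 60 * 1000 },
  { limit: 80, ms: 60 * 60 * 1000 },
  { limit: 200, ms: 24 * 60 * 60 * 1000 },
];

class SlidingWindowLimiter {
  // Timestamps (ms) of accepted messages per user, oldest first.
  private hits = new Map<string, number[]>();

  check(userId: string, now: number = Date.now()): Decision {
    const longest = Math.max(...RATE_WINDOWS.map((w) => w.ms));
    const recent = (this.hits.get(userId) ?? []).filter((t) => now - t < longest);

    // A request must fit inside every window. If several are full,
    // report the longest wait.
    let retryAfterMs = 0;
    for (const w of RATE_WINDOWS) {
      const inWindow = recent.filter((t) => now - t < w.ms);
      if (inWindow.length >= w.limit) {
        const oldest = inWindow[0] ?? now;
        retryAfterMs = Math.max(retryAfterMs, oldest + w.ms - now);
      }
    }

    if (retryAfterMs === 0) recent.push(now); // only accepted messages count
    this.hits.set(userId, recent);
    return { allowed: retryAfterMs === 0, retryAfterSec: Math.ceil(retryAfterMs / 1000) };
  }
}

// Fail open: if the limiter's backing store throws, let the request through.
function checkFailOpen(check: () => Decision): Decision {
  try {
    return check();
  } catch {
    return { allowed: true, retryAfterSec: 0 };
  }
}
```

The `retryAfterSec` value is what a route would place in the `Retry-After` response header when it rejects a message.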

Cost Efficiency

Prompt Caching

Ephemeral 5-min TTL on system prompt + tools array. ~80% cheaper input tokens, ~40% faster TTFT on cache hit. Estimated ~$0.50/user/month vs $2–3 without caching.

Model Selection

Claude Sonnet for conversation. Haiku for memory extraction (70% cheaper). The parallel Azure timeout cuts expensive CU calls by 50–70%. A deterministic income builder replaced Claude calls that cost $0.005 per document.

Testing

575 test cases across Vitest unit tests, Playwright E2E scenarios, and Lighthouse CI. Seeded test clients (w2-simple, freelancer, missing-docs) for repeatable E2E runs. GitHub Actions CI on every push.

The Multi-AI Orchestra


"The secret sauce isn't one AI model — it's the orchestration. Six specialized engines, each playing its role in a carefully choreographed system."

  • Azure DI: Structured extraction. Prebuilt IRS form models, field-level confidence, deterministic output.
  • Azure CU: Semantic understanding. Handles unusual layouts, Gen AI-powered, custom trainable analyzers.
  • Claude: Conversation + fallback. SSE streaming agent with 16 tools, edge case extraction.
  • Gemini Flash: Reasoning layer. 12 specialized tax reasoning tasks, scenario analysis, planning.
  • ElevenLabs: Voice synthesis. Filing summaries as audio, sharable MP3s with weather-effect player.
  • Langfuse: Prompt ops. Versioning, A/B testing, conversation tracing, emotional scoring, cost tracking.

Each service is chosen for what it does best. Azure for structured document understanding. Claude for nuanced conversation. Gemini for fast reasoning. ElevenLabs for natural voice. Langfuse for iteration speed. The orchestration layer is the moat — not any single model.

What's Next

The platform thesis expands

Everything built so far — the extraction engine, the agent architecture, the psychographic layer — is a horizontal capability. Tax was the proving ground. The intelligence layer is the product.

Beyond Tax
A workflow intelligence layer that monitors every client relationship in real time. It flags unreplied communications, missing documents, and approaching deadlines before they become problems.
Beyond One Firm
Multi-tenant architecture. One codebase, vertical-specific configuration. Status pipelines, deadline calendars, signal types, and alert templates — all configurable per vertical.
Beyond US
300,000+ chartered accountants in India. Massively underserved. Distribution moat + product validation + seasonality buffer. Peak seasons offset: India (Jan–Mar, Jul–Sep) vs US (Jan–Apr).
Beyond Filing
Year-round engagement: tax planning, RSU vesting calendar, quarterly estimated payments, Roth conversion optimizer. The client relationship doesn't end at e-file — it begins.

"We built this in 25 days with a lean team. Imagine what happens with 12 months and a platform thesis."