# Auditor response — 2026-04-25 — role confirmed + venue plan revised + Step 1/2 drafted

## 1. Role confirmed

I am the **advisor / system architect**. I read, analyze, design, and write prompts. I do not write or modify production code. The dev CLI executes.

I will:

- Continue read-only inspection (logs, DB, configs, file system) — these are within my mandate.
- Continue advising and drafting dev prompts.
- **Wait for your explicit approval before sending any prompt to the dev CLI.**
- Not perform "convenient" sysadmin tasks unless you explicitly authorize them as scoped exceptions (as you did earlier with nginx + the published plan).

If I'm about to take an action that crosses into execution, I will pause and ask first.

---

## 2. Plan adjustment: broaden venues from day 1

You're right — confining the initial historical test to Kraken + Coinbase risks "confirming what we already know." Two-venue spot arb between two US-regulated retail venues at retail fees is a known-thin market.

### Updated venue set for initial historical replay

| Venue | Why include | Inefficiency expectation |
|---|---|---|
| Kraken | Already in live engine | Baseline |
| Coinbase | Already in live engine | Baseline |
| **Binance** (NEW) | Largest by volume globally; sets the "anchor" price. Useful as a reference even if it's the most efficient venue. | Low — but seeing spreads vs Binance reveals where Kraken/Coinbase mis-quote |
| **OKX** (NEW) | Top-5 by volume, Asia-centric, has known microstructure differences during US off-hours. Most likely place to find genuine inefficiency at retail tier. | Medium-high |

**Recommendation: start with all 4 venues.** This gives `4 × 3 = 12` directed venue-pairs per currency vs `2` today (6× the surface area for finding inefficiency). Marginal data cost is small.

If the budget needs to be tighter, drop to 3 venues: keep **Kraken + Coinbase + OKX**.
OKX is the better single addition for inefficiency-detection because Binance is so efficient it tends to compress the spreads of everyone else.

I am **not** including new currency pairs (SOL, XRP, DOGE, etc.) in this round. That's a separate axis (Step 6 in my prior plan), and mixing the two axes makes it harder to attribute results.

### Revised cost / scope estimate

| Item | Old plan | New plan |
|---|---|---|
| Venues | 2 (kraken, coinbase) | 4 (kraken, coinbase, binance, okx) — or 3 with okx as the cheapest extra |
| Pairs | 2 (BTC/USD, ETH/USD) | 2 (unchanged) |
| Window | 7 days | 7 days (unchanged) |
| Tier B (L1 tick) data cost from Tardis.dev | ~$30–$60 | ~$60–$120 (4 venues) / ~$45–$90 (3 venues) |
| Directed venue-pairs per currency | 2 | 12 (or 6 if 3 venues) |

I will re-publish the updated `/plan.md` only on your authorization.

---

## 3. Step 1 — read-only audit (this is what I will do, no dev involvement)

**Scope:** all reading. No writes, no config changes, no dev prompts.

**Tasks:**

| # | Task | Tool I'll use | Output |
|---|---|---|---|
| 1.1 | Inventory self-captured tick data (`engine/data/ticks/`): file format, daily volume per venue/symbol, gaps, completeness, retention policy | sudo read of `engine/data/ticks/`, file inspection | A coverage table: `(venue, symbol, days, MB/day, gaps, ts integrity)` |
| 1.2 | Audit engine module boundaries (`engine/src/feeds/`, `book/`, `strategy/`, `risk/`, `execution/`) to confirm a feed-layer swap is structurally feasible | sudo read of source | Yes/no on "can replay swap feeds without modifying downstream"; list of any cross-cutting touch points (clock, IDs, persistence) |
| 1.3 | Audit ledger DB schema for replay-compatibility | sudo read of migrations + schema dump | Confirm the same `opportunities` schema can hold replay rows; flag any UNIQUE constraints that would prevent replay re-runs |
| 1.4 | Identify all `nowUtcMs()`/`nowMelb()` call sites — this is the surface area the replay clock must cover | grep through `engine/src/` | A list of files/lines that need replay-clock injection |
| 1.5 | Research historical L1 data vendors (Tardis.dev, Kaiko, CryptoChassis, Crypto-Lake) for a 7-day window, kraken+coinbase+binance+okx, BTC/USD+ETH/USD | WebFetch/WebSearch on vendor pricing pages | Comparison table: cost, format, latency-of-quote granularity, ease of integration. Final venue/vendor recommendation. |
| 1.6 | Identify existing live-capture coverage gaps (e.g., does `data/ticks/` capture both venues for both pairs in unified time? do timestamps line up?) | sudo read + simple JSONL diff | Statement on whether self-captured data is good enough for a parallel "free" replay test before paid data lands |

**Time:** ~20 minutes if I dive in.

**Output:** a single audit report in this conversation, plus an offer to update `/plan.md` if anything changes the strategy materially.

**This step does not need dev approval — it's all read-only.** I'll start it on your "go."

---

## 4. Step 2 developer prompt (DRAFTED, NOT SENT)

This is what I would send to the dev CLI **after** Step 1 confirms feasibility and **after** you approve. Reviewing it now lets you change scope before any code is touched.

---

> **Task: replay/backtest infrastructure (read-only over historical data)**
>
> Add the ability to replay captured or vendor-supplied L1 tick data through the existing engine pipeline (`OrderBook → Strategy → Risk → Simulator`) without modifying any of those modules and without touching live state.
>
> **Constraints (must hold):**
> - `mode: paper` config lock unchanged.
> - No schema changes, no migrations against the live `engine/data/db/ledger.db`.
> - No modification of strategy / risk / execution / fees / slippage logic.
> - No `new Date()` introduced anywhere in business code.
> - All 85 existing tests must still pass.
> - Replay must be **venue-agnostic**: accept any `(exchange, symbol)` tuple present in the input stream — do not hardcode kraken/coinbase.
> - Replay output writes to a separate SQLite file path (e.g. `data/db/replay-{runId}.db`), never the live ledger.
>
> **Components to add:**
>
> 1. **`FeedAdapter` interface** in `engine/src/feeds/feed-adapter.ts` exposing:
>    - `start(handler: (quote: Quote) => void): Promise<void>`
>    - `stop(): Promise<void>`
>    - `health(): { exchange: string; healthy: boolean; lastMessageAt: number }`
>
>    Refactor `feeds/kraken.ts` and `feeds/coinbase.ts` to implement it. **No behavior change.**
>
> 2. **`HistoricalFeedAdapter`** in `engine/src/feeds/historical.ts`:
>    - Constructor accepts an array of file paths (JSONL in the existing `data/ticks/` format) and an optional `[fromMs, toMs]` window.
>    - On `start`, opens all files, k-way-merges quote events by `ts` ascending, and calls the handler for each quote.
>    - Emits in batches with no real-time delay (as-fast-as-possible mode).
>    - Validates each line against the existing `Quote` zod schema; rejects malformed lines with a counted-and-logged error rather than crashing.
>
> 3. **`ReplayClock`** in `engine/src/time/replay-clock.ts`:
>    - Module-level pluggable clock. The existing `nowUtcMs()` / `nowMelb()` helpers must read from a clock source that defaults to wall-clock but can be swapped at startup to a `ReplayClock`.
>    - The replay clock returns the timestamp of the most recently dispatched quote.
>    - Audit and update **only** the existing time-helper module — do not change call sites.
>
> 4. **`replay` CLI runner** in `engine/src/replay.ts` (entrypoint registered in `package.json` as `npm run replay`):
>    - Args: `--tick-dir <dir>` (or `--tick-files <glob>`), `--from <iso>`, `--to <iso>`, `--pairs <list>`, `--venues <list>`, `--out-db <path>`, optional `--fees-override <json>`, `--slippage-override <json>`, `--min-edge-pct-override <pct>`.
>    - Wires `HistoricalFeedAdapter` + `ReplayClock` into the existing pipeline factory.
>    - Writes `opportunities` rows to `--out-db` using the same repo code as live (so ledger code is unchanged).
>    - Prints a summary at the end: total quotes, total opportunities, qualifying count, wall-clock duration, replayed-time duration.
>
> 5. **Tests** (additional, must keep all 85 prior passing):
>    - Unit: `HistoricalFeedAdapter` k-way merges two files in correct order on a synthetic 100-quote fixture.
>    - Unit: `ReplayClock` returns the last-dispatched quote `ts`.
>    - Integration: replay the engine over a fixture JSONL containing 1,000 quotes spanning 5 minutes; assert opportunity count, qualified count, and net_edge percentile match a hand-computed baseline.
>    - **Regression-of-equivalence test:** take a 30-min slice of `data/ticks/` from a known live run, replay it through the new infra, and assert the output `opportunities` rows match the live ledger rows (filtered to the same time window) on `(detected_at, symbol, buy_exchange, sell_exchange, gross_spread_usd, net_edge_usd, qualifies)`. The ID column may differ — exclude it from the comparison. Allow ≤1% mismatch tolerance for races at quote boundaries.
>
> **Out of scope for this task:**
> - Live WS adapters for Binance / OKX (separate task; only needed for live capture, not historical replay).
> - Ingestion of vendor (Tardis/Kaiko) data formats (separate task).
> - Dashboard UI for replay results.
> - Performance optimization beyond "completes 7 days of 4-venue ticks in <10 minutes wall-clock."
>
> **Definition of done:**
> - `npm test` ≥ 90 passing (was 85, plus at least 5 new).
> - `npm run replay -- --tick-dir engine/data/ticks --pairs BTC/USD,ETH/USD --venues kraken,coinbase --out-db /tmp/replay-self.db` completes successfully.
> - `curl 'http://127.0.0.1:3200/opportunities/stats'` continues to return live-ledger stats unchanged. Replay does not contaminate the live DB; verified by checksum of the live `ledger.db` before and after the replay run.
> - One git commit with a clear message; no side commits to unrelated files.

---

## 5. Awaiting your decisions

Three things I need from you before moving:

1. **Approve Step 1** (read-only audit) — yes / no / change scope?
2. **Venue choice for the initial historical run** — 4 venues (kraken, coinbase, binance, okx) or 3 (drop one — which)?
3. **Step 2 prompt review** — any changes? Or hold it as-is until Step 1 finishes, in case the audit findings let me tighten it.

I will not start Step 1 or send the Step 2 prompt until you respond.
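To make item 3 of the drafted prompt (the pluggable clock) concrete for your review, here is a minimal TypeScript sketch of the pattern I have in mind. Names other than `nowUtcMs()` (`ClockSource`, `setClockSource`, `onQuoteDispatched`) are illustrative assumptions, not the engine's actual API; the dev would adapt this to the existing time-helper module.

```typescript
// Pluggable clock source: defaults to wall clock, swapped at replay startup.
type ClockSource = () => number;

let clockSource: ClockSource = () => Date.now();

export function setClockSource(source: ClockSource): void {
  clockSource = source;
}

// The existing helper keeps its signature; only its internals change to
// read from the swappable source instead of calling Date.now() directly.
export function nowUtcMs(): number {
  return clockSource();
}

// ReplayClock: "now" is the ts of the most recently dispatched quote.
export class ReplayClock {
  private lastQuoteTs = 0;

  onQuoteDispatched(ts: number): void {
    this.lastQuoteTs = ts;
  }

  // Arrow function so it can be passed directly to setClockSource.
  now = (): number => this.lastQuoteTs;
}
```

The point of this shape is that no `nowUtcMs()` call site changes, which is exactly the constraint in the drafted prompt ("do not change call sites").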
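Similarly, to make item 2 of the drafted prompt concrete: a sketch of the k-way merge at the heart of `HistoricalFeedAdapter`. The `Quote` shape is my assumption from the `data/ticks/` JSONL description, and a real adapter would stream files line-by-line rather than hold arrays in memory; this just shows the ordering logic.

```typescript
// Minimal Quote shape (assumed; the engine's zod schema is authoritative).
interface Quote {
  exchange: string;
  symbol: string;
  ts: number; // epoch ms
  bid: number;
  ask: number;
}

// Merge k ts-ascending quote streams into one ts-ascending stream.
// A linear scan over the stream heads is fine for small k (4 venues here).
function kWayMerge(streams: Quote[][]): Quote[] {
  const cursors = streams.map(() => 0);
  const out: Quote[] = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < streams.length; i++) {
      if (
        cursors[i] < streams[i].length &&
        (best === -1 || streams[i][cursors[i]].ts < streams[best][cursors[best]].ts)
      ) {
        best = i;
      }
    }
    if (best === -1) break; // every stream exhausted
    out.push(streams[best][cursors[best]]);
    cursors[best] += 1;
  }
  return out;
}
```

Ties on `ts` resolve to the earlier stream in the input array, so the dev prompt's unit test ("k-way merges two files in correct order") should pin down the intended tie-break explicitly.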