Building to Make Myself Redundant

Building an AI Agent to Replace Me

Here’s what a typical day of my life in the Quant Data team used to look like: a trader Slacks me asking for yesterday’s PnL breakdown on a specific desk. I write a SQL query against ClickHouse, format the results, send back a screenshot. Twenty minutes later, another question from another desk. Meanwhile, Grafana alerts — once a useful signal — have become white noise. The alert count keeps growing, but the number of engineers reading them hasn’t.

So I built an agent to handle the ad-hoc queries and the alert fatigue. We call it D.

Tools and Skills

D runs on both the Claude CLI and the Copilot SDK. That was a deliberate choice: we designed the system to be vendor-agnostic so that we are not locked into any single provider. We actively test against multiple backends to make sure the tools and skills layer remains interoperable.
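One way to keep the tools and skills layer independent of the provider is to have the agent code depend on a small interface instead of a specific SDK. The sketch below is illustrative only: `Backend`, `EchoBackend`, and `run_agent` are hypothetical names, and the stub stands in for real Claude CLI or Copilot SDK adapters, whose actual APIs are not shown here.

```python
from typing import Protocol


class Backend(Protocol):
    """Hypothetical interface that every model backend adapter would implement."""

    name: str

    def complete(self, system_prompt: str, user_prompt: str) -> str: ...


class EchoBackend:
    """Stub backend for testing; a real adapter would call the provider here."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, system_prompt: str, user_prompt: str) -> str:
        return f"[{self.name}] {user_prompt}"


def run_agent(backend: Backend, user_prompt: str) -> str:
    # The agent layer only sees the Backend interface, never a vendor SDK.
    system_prompt = "You are D, a data assistant."  # placeholder prompt
    return backend.complete(system_prompt, user_prompt)


# The same agent code runs unchanged against two different backends.
replies = [run_agent(EchoBackend(n), "ping") for n in ("backend-a", "backend-b")]
```

Running one test suite against every adapter is then a matter of looping over backends, which is roughly what testing against multiple backends amounts to.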

D has two types of capabilities:

Tools are functions — concrete actions like querying ClickHouse, generating a PDF report, uploading a file to Slack, or calling an internal API. Each tool is a thin wrapper around something that already exists.
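A minimal sketch of what a thin wrapper can look like, assuming a simple in-process registry. The names `Tool`, `TOOLS`, `register`, and `call_tool` are hypothetical, and the ClickHouse function is a stub rather than D's actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    """A named, described function the agent can choose to call."""

    name: str
    description: str
    func: Callable[..., Any]


TOOLS: dict[str, Tool] = {}


def register(name: str, description: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that adds an existing function to the tool registry."""

    def wrap(func: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = Tool(name, description, func)
        return func

    return wrap


@register("query_clickhouse", "Run a read-only SQL query and return rows.")
def query_clickhouse(sql: str) -> list[dict[str, Any]]:
    # Stub: a real tool would delegate to the existing ClickHouse client.
    return [{"sql": sql, "rows": 0}]


def call_tool(name: str, **kwargs: Any) -> Any:
    """Dispatch a tool call chosen by the model to the wrapped function."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name].func(**kwargs)


result = call_tool("query_clickhouse", sql="SELECT 1")
```

The description string is what the model would see when deciding which tool to call; the wrapped function stays an ordinary function that can be tested on its own.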

Skills are knowledge. This is where it gets interesting. We split them into two tiers:

  • Internal skills — Hidden from end users. These encode data and infrastructure knowledge: which ClickHouse tables matter, what each column means, how to interpret schema quirks. The stuff that lives in an engineer’s head.
  • External skills — Visible and editable by the trading desk. These encode business logic and domain context. More on this below, because this is the part that makes D actually useful.

We gave D its own GitHub account and connected it to everything it needs:

  • Data — Read-only access to ClickHouse and MongoDB for trade data, positions, and risk metrics
  • Observability — Grafana dashboards, K8s logs via Teleport, Airflow DAG status
  • Collaboration — Jira, Confluence (with write access to specific folders), and GitHub repos
  • Interface — Slack and Chainlit as the user-facing frontends
┌────────────────────────────────────────────────────────────┐
│                      USER INTERFACES                       │
│               ┌──────────┐    ┌───────────┐                │
│               │  Slack   │    │ Chainlit  │                │
│               └─────┬────┘    └─────┬─────┘                │
│                     └───────┬───────┘                      │
└─────────────────────────────┼──────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                      AGENT D                               │
│                                                            │
│  ┌──────────────┐  ┌─────────────┐  ┌───────────────────┐  │
│  │ System Prompt│  │  Reasoning  │  │ Tool Selection    │  │
│  └──────────────┘  └─────────────┘  └───────────────────┘  │
│                                                            │
│  ┌─────────────────────┐  ┌──────────────────────────────┐ │
│  │ Internal Skills     │  │ External Skills (user-owned) │ │
│  │ (data/infra context)│  │ (business logic/domain)      │ │
│  └─────────────────────┘  └──────────────────────────────┘ │
│                                                            │
└──────────┬─────────────────┬─────────────────┬─────────────┘
           │                 │                 │
           ▼                 ▼                 ▼
┌────────────────┐  ┌────────────────┐  ┌────────────────────┐
│  DATA (read)   │  │ OBSERVABILITY  │  │  COLLAB (write)    │
│                │  │                │  │                    │
│  ClickHouse    │  │  Grafana       │  │  Jira              │
│  MongoDB       │  │  K8s / Teleport│  │  Confluence        │
│                │  │  Airflow       │  │  GitHub            │
└────────────────┘  └────────────────┘  └────────────────────┘

┌────────────────────────────────────────────────────────────┐
│                      CONTROL PLANE                         │
│                                                            │
│  MySQL — traces, access control, audit logs                │
│  Every prompt, tool call, query, and response is logged    │
└────────────────────────────────────────────────────────────┘

External Skills: Letting the Desk Teach the Agent

This is the highest-leverage decision we made. External skills are Markdown files in a GitHub repo that the trading desk can edit directly. Each desk has its own folder: no engineering ticket, no deployment, no waiting. We strongly encourage desks to follow the design principles in Claude’s guide on skills, which are built around progressive disclosure.

Each file gives D domain context that only the desk has. For example, when risk on a particular book exceeds a threshold, the cause could be one of several things — and the correct response depends on which one:

  • Insufficient balance on an external exchange → Request a funding transfer
  • Backfill trades weren’t accounted for → Manual hedge needed
  • Sudden price swing causing risk to spike → Monitor, let the VWAP hedge correct over time

An engineer can’t write these rules. A trader can — and they do, in plain Markdown. When D encounters a risk alert, it pulls the relevant skill files and reasons through the possible causes with the desk’s own decision framework.

When a trader notices D doesn’t understand something, they add a skill file and it gets picked up on the next reload. The feedback loop is tight: identify gap → write context → D gets smarter. No middleman.
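The folder-per-desk layout and the reload step could be sketched as follows. This assumes a `<root>/<desk>/*.md` layout and treats the first non-empty line of each file as its summary, which is one simple reading of progressive disclosure (short summaries up front, full text on demand). `SkillRef`, `build_index`, and `load_body` are hypothetical names, and the temporary directory stands in for the real skills repo.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SkillRef:
    desk: str      # name of the desk folder
    path: Path     # full path to the Markdown file
    summary: str   # first non-empty line, shown to the model up front


def first_line(path: Path) -> str:
    """Return the first non-empty line with any leading '#' stripped."""
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.strip():
            return line.strip().lstrip("#").strip()
    return ""


def build_index(root: Path) -> list[SkillRef]:
    """Scan <root>/<desk>/*.md; calling this again acts as the reload."""
    return [
        SkillRef(desk=p.parent.name, path=p, summary=first_line(p))
        for p in sorted(root.glob("*/*.md"))
    ]


def load_body(ref: SkillRef) -> str:
    """Read the full skill text only when it is judged relevant."""
    return ref.path.read_text(encoding="utf-8")


# Demo with a throwaway directory standing in for the skills repo.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "desk_a").mkdir()
    (root / "desk_a" / "risk.md").write_text("# Risk alerts\nDetails...\n", encoding="utf-8")
    before = build_index(root)

    # A trader adds a new file; the next reload picks it up.
    (root / "desk_a" / "funding.md").write_text("# Funding transfers\n", encoding="utf-8")
    after = build_index(root)
    summaries = [ref.summary for ref in after]
    body = load_body(after[1])
```

Keeping only summaries in the index keeps the prompt small; the full file is read only for the skills that look relevant to the question at hand.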

Alert-Driven Troubleshooting

Grafana alerts are wired to trigger D via @bot mentions. When an alert fires, D follows a simple loop:

  1. Checks logs — Pulls relevant K8s pod logs and Airflow DAG run status
  2. Queries the database — Looks at recent trade data or system metrics for anomalies
  3. Reasons through skills — Matches the symptoms against known patterns from the desk’s skills
  4. Posts summary and action item — Sends a structured diagnosis to the Slack channel with a recommended action
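The four steps above can be sketched as a small pipeline. This is a simplified illustration, not D's implementation: in the real system the third step is the model reasoning over skill files, whereas here it is a hypothetical keyword rule, and `Diagnosis`, `triage`, `format_summary`, and the default action are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Diagnosis:
    alert: str
    evidence: list[str] = field(default_factory=list)
    probable_cause: str = "unknown"
    recommended_action: str = "escalate to on-call"  # assumed default


Match = Optional[tuple[str, str]]  # (probable cause, recommended action)


def triage(
    alert: str,
    fetch_logs: Callable[[str], list[str]],
    query_metrics: Callable[[str], list[str]],
    match_skills: Callable[[list[str]], Match],
) -> Diagnosis:
    diagnosis = Diagnosis(alert=alert)
    diagnosis.evidence += fetch_logs(alert)      # step 1: check logs
    diagnosis.evidence += query_metrics(alert)   # step 2: query the database
    match = match_skills(diagnosis.evidence)     # step 3: reason through skills
    if match is not None:
        diagnosis.probable_cause, diagnosis.recommended_action = match
    return diagnosis


def format_summary(d: Diagnosis) -> str:
    """Step 4: build the text that would be posted to Slack."""
    return (
        f"Alert: {d.alert}\n"
        f"Probable cause: {d.probable_cause}\n"
        f"Recommended action: {d.recommended_action}\n"
        f"Evidence lines: {len(d.evidence)}"
    )


def keyword_match(evidence: list[str]) -> Match:
    # Stand-in for model reasoning: a single hard-coded pattern.
    if any("insufficient balance" in line.lower() for line in evidence):
        return ("insufficient balance on an external exchange", "request a funding transfer")
    return None


diagnosis = triage(
    "risk threshold exceeded",
    fetch_logs=lambda alert: ["pod restarted 0 times"],
    query_metrics=lambda alert: ["Insufficient balance on exchange account"],
    match_skills=keyword_match,
)
summary = format_summary(diagnosis)
```

Note that nothing in this sketch acts on the recommendation; it only produces a summary for a human to review.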

The human on-call reviews and approves. D doesn’t auto-fix — that trust hasn’t been earned yet. But it turns a 30-minute investigation into a 2-minute review.

The Moment It Clicked

The real validation came during a genuine insufficient-balance alert. D detected the issue, diagnosed the root cause, and posted the recommendation to Slack — all within seconds. The traders saw it, confirmed, and initiated the funding transfer within minutes of the alert firing.

Before D, this would’ve been: alert fires → engineer wakes up or context-switches → investigates → Slacks the desk → desk investigates on their end → action taken. That’s easily 30 minutes to an hour, sometimes more if it’s overnight. In crypto markets that never close, those minutes have real PnL impact.

That was the moment it stopped being a side project.

Access Control and Observability

The security model is deliberately simple:

  • All database access is read-only. D can query ClickHouse and MongoDB but can’t write to them.
  • Write access is narrowly scoped — specific folders in Grafana, Jira, and Confluence. Enough to update dashboards and create tickets, not enough to do damage.
  • Full trace logging. Every interaction is recorded end-to-end in a MySQL control plane: the user prompt, system prompt, D’s reasoning chain, which tools were called, what queries were executed, and the final response.
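A rough sketch of how the first two rules could be checked in the agent layer, as an extra guard on top of database and application permissions, not a replacement for them. The keyword check is deliberately naive and conservative, and the folder path in `WRITABLE_FOLDERS` is a made-up example, not D's real configuration.

```python
from pathlib import PurePosixPath

READ_ONLY_KEYWORDS = {"select", "show", "describe"}

# Hypothetical allowlist: system name -> folders the agent may write to.
WRITABLE_FOLDERS = {
    "confluence": [PurePosixPath("/agent-d/reports")],
}


def is_read_only(sql: str) -> bool:
    """Accept a single statement that starts with a read-only keyword."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        # Conservative: rejects multi-statement input, and also any query
        # that merely contains a semicolon inside a string literal.
        return False
    return statement.split(None, 1)[0].lower() in READ_ONLY_KEYWORDS


def can_write(system: str, target: str) -> bool:
    """Allow writes only inside an allowlisted folder for that system."""
    target_path = PurePosixPath(target)
    if ".." in target_path.parts:
        return False  # refuse path traversal instead of trying to resolve it
    return any(
        folder == target_path or folder in target_path.parents
        for folder in WRITABLE_FOLDERS.get(system, [])
    )


checks = {
    "select": is_read_only("SELECT * FROM trades"),
    "insert": is_read_only("INSERT INTO trades VALUES (1)"),
    "allowed_write": can_write("confluence", "/agent-d/reports/daily.md"),
    "blocked_write": can_write("confluence", "/finance/budget.md"),
}
```

The stronger guarantee still comes from giving the agent database credentials that cannot write in the first place; a check like this mainly produces a clearer error earlier.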

The trace logging isn’t optional — it’s the foundation of trust. When a trader says “D gave me the wrong number,” you need to reconstruct exactly what happened. No black boxes.
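To make the reconstruction step concrete, here is a minimal sketch of a trace table and a lookup by trace ID. It uses an in-memory SQLite database as a stand-in for the MySQL control plane, and the schema, column names, and the `record_trace` and `load_trace` helpers are assumptions for illustration.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS traces (
    trace_id      TEXT PRIMARY KEY,
    created_at    TEXT NOT NULL,
    user_id       TEXT NOT NULL,
    user_prompt   TEXT NOT NULL,
    system_prompt TEXT NOT NULL,
    reasoning     TEXT NOT NULL,
    tool_calls    TEXT NOT NULL,  -- JSON list of {tool, args}
    queries       TEXT NOT NULL,  -- JSON list of SQL strings
    response      TEXT NOT NULL
)
"""


def record_trace(conn, *, user_id, user_prompt, system_prompt,
                 reasoning, tool_calls, queries, response) -> str:
    """Store one full interaction and return its trace ID."""
    trace_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO traces VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (trace_id, datetime.now(timezone.utc).isoformat(), user_id,
         user_prompt, system_prompt, reasoning,
         json.dumps(tool_calls), json.dumps(queries), response),
    )
    conn.commit()
    return trace_id


def load_trace(conn, trace_id: str) -> dict:
    """Fetch one trace and decode its JSON columns."""
    row = conn.execute(
        "SELECT * FROM traces WHERE trace_id = ?", (trace_id,)
    ).fetchone()
    if row is None:
        raise KeyError(trace_id)
    trace = dict(row)
    trace["tool_calls"] = json.loads(trace["tool_calls"])
    trace["queries"] = json.loads(trace["queries"])
    return trace


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets rows be converted to dicts
conn.execute(SCHEMA)

trace_id = record_trace(
    conn,
    user_id="trader-1",
    user_prompt="PnL for yesterday?",
    system_prompt="You are D.",
    reasoning="Need the daily PnL table.",
    tool_calls=[{"tool": "query_clickhouse", "args": {"sql": "SELECT 1"}}],
    queries=["SELECT 1"],
    response="PnL was ...",
)
trace = load_trace(conn, trace_id)
```

With the executed queries stored next to the prompt and the response, a disputed number can be checked by rerunning the exact SQL that produced it.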

Early Results

It’s too early for precise metrics. But the qualitative shift is unmistakable.

Traders get answers in seconds instead of waiting for an engineer to context-switch. Alert response went from “wait until someone investigates” to immediate triage with a recommended action. And in cases like the insufficient-balance alert, the speed improvement has direct, measurable PnL impact.

Roughly 30–40% of my repetitive work — the ad-hoc queries, the routine troubleshooting, the “can you check this for me” Slack messages — is now handled by D.

What I’ve Learned

Start read-only. D earned trust by proving it could answer questions accurately before we gave it any write access. Even now, writes are scoped to documentation, Jira tickets, and GitHub PRs, never direct changes to production systems without human oversight.

Let the domain experts own the context. Engineers shouldn’t be the bottleneck for teaching an agent about trading desks. The external skills repo — where traders write Markdown files with their own decision logic — turned out to be the single most impactful design choice.

Trace everything. Full observability over the agent’s reasoning isn’t a nice-to-have. It’s what makes the difference between “interesting prototype” and “tool the desk actually relies on.”

Redundancy is a spectrum. D handles the work that was never a good use of my time in the first place. The ad-hoc queries. The routine alert triage. The human-API work.

That’s the point. Not to replace the engineer, but to stop wasting one.


The goal was never to automate myself out of a job. It was to stop being the human middleware between traders and data — and spend my time on the systems that actually need engineering judgment.