
Build a Knowledge Base That Compounds

The Signal Ledger Contract & Workflow


Andrej Karpathy recently posted about using LLMs to build personal knowledge bases. Fifteen million people saw it. His follow-up explicitly said "there is room here for an incredible new product instead of a hacky collection of scripts."

I've been running exactly this system for 29 sessions over the past five weeks. Not as a product, as a workflow. And the thing I've learned is that the value has almost nothing to do with the tooling. It lives in the rules you set before the first session.

Without a system, I was the same as everyone else: reading 10-20 technical sources a week, retaining maybe 10% of them a few days later, and losing the connections between ideas entirely. Bookmarks are where links go to die. My Pocket/Matter queue just kept getting longer and guiltier.

So I built what I call a Signal Ledger. I drop sources into a conversation with Claude, and the LLM filters for signal relevant to my actual work. Not summaries. Distillations. The difference matters, and I'll come back to it.

Write the Rules Down Before Session 1

The single most important thing I did was write the rules down before processing a single source. I call it the contract, a document that tells the LLM exactly how to behave during every session.

The key clauses:

  • 3-5 bullets per source, maximum. This forces compression. If the LLM can't distill a 3,000-word article into 3-5 bullets, it didn't understand the article.

  • Lead with actionable signal, not interesting signal. "This is a cool finding" is not the same as "this changes how you should build X." Actionable comes first.

  • Every entry gets a "so what for you" tied to an active project or theme. This is the part that makes the whole thing work. Every source has to connect to something I'm currently building or tracking. If there's no connection, the source might still be worth reading, but it's not signal for the ledger.

  • Negative signal matters. "This article wasn't useful because X" is explicitly part of the output. It calibrates future sessions and stops me from wasting time on similar sources later.
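The clauses above are mechanical enough to check in code. Here's a minimal sketch of what that validation could look like; the entry fields (`so_what`, `linked_project`, `negative`) and limits are my own illustrative naming, not a prescribed schema:

```python
from dataclasses import dataclass

MIN_BULLETS = 3  # contract: 3-5 bullets per source, maximum
MAX_BULLETS = 5

@dataclass
class LedgerEntry:
    source: str
    bullets: list[str]
    so_what: str           # "so what for you" for this source
    linked_project: str    # active project or theme it connects to
    negative: bool = False # explicit "this wasn't useful because X"

def violations(entry: LedgerEntry) -> list[str]:
    """Return a list of contract violations for one ledger entry."""
    problems: list[str] = []
    if entry.negative:
        # Negative signal is a valid entry on its own; it calibrates
        # future sessions rather than feeding a theme.
        return problems
    if not (MIN_BULLETS <= len(entry.bullets) <= MAX_BULLETS):
        problems.append(f"{len(entry.bullets)} bullets; contract says 3-5")
    if not entry.so_what.strip():
        problems.append("missing 'so what for you'")
    if not entry.linked_project.strip():
        problems.append("not tied to an active project or theme")
    return problems
```

Running this after each session catches the failure mode where the LLM drifts back into summarization: an entry with no "so what" line fails loudly instead of quietly diluting the ledger.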

And then the distinction I mentioned: no summaries, only distillations. A summary tells you what the article said. A distillation tells you what it means for your work. A summary of a Stripe engineering post might say "Stripe built an internal coding agent that merges 1,300 PRs per week." A distillation says "Stripe's key insight is that their agent works because of infrastructure built for human engineers years before LLMs existed. The harness predates the model. Apply this to your own CI/CD before building agent tooling on top."

Without the contract, you get a glorified RSS reader. With it, you get an editor that applies judgment.

Themes and Corroboration

The core mechanism that makes this compound rather than accumulate is theme tracking with corroboration requirements.

Every signal gets classified into one of three tiers:

Parking Lot. A single-source signal. One person said something interesting. Might be noise. Might be early signal. It sits here until corroborated.

Active Theme (watch). Two sources from independent origins describing the same pattern. Worth tracking, not yet worth acting on.

Active Theme (confirmed). Three or more independent sources converging. Signal you can write about, build on, or cite with confidence.

The independence requirement is the key. Two blog posts that both cite the same viral tweet aren't independent. They're amplification of one signal. Three practitioners at different companies, different tech stacks, describing the same architectural pattern without referencing each other? That's convergence.
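The tier logic reduces to counting deduplicated origins. A sketch, with the tier names from above; deduplication (collapsing two posts that cite the same viral tweet into one origin) has to happen before the set is built, which is the judgment call the code can't make for you:

```python
def tier(origins: set[str]) -> str:
    """Classify a theme by its count of *independent* origins.

    `origins` must already be deduplicated: amplifications of one
    signal count as a single origin, not several.
    """
    n = len(origins)
    if n >= 3:
        return "active_confirmed"  # 3+ independent sources converging
    if n == 2:
        return "active_watch"      # worth tracking, not yet acting on
    return "parking_lot"           # single source; might be noise

# Three practitioners, different companies, no cross-references:
assert tier({"stripe-blog", "meta-eng", "solo-dev"}) == "active_confirmed"
```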

Concrete example: I track a theme called "Harness Engineering" that started as a single mention in Session 2. One practitioner's blog post arguing that the infrastructure wrapping an LLM matters more than the model itself. Interesting, single source, into the parking lot.

By Session 29, it has 59 independent sources. Practitioners at Stripe, Meta, Vercel, and Anthropic. Academic researchers. Solo developers. Open-source tool authors. A VC investor. An Anthropic safety researcher. All independently converging on the same pattern.

I didn't go looking for that theme. It emerged from the data. The corroboration requirement forced me to notice it organically rather than cherry-pick evidence for something I already believed. And that's the real reason for counting sources and tracking independence: it forces intellectual honesty about what you actually know versus what you want to be true.

What Compounding Looks Like in Practice

After 29 sessions across roughly five weeks, here's what's different.

I've processed over 200 sources. I track 11 active themes with named mechanisms and cross-references between them. I have a parking lot of 15+ single-source signals being watched. I have a backlog of writing candidates where each one has 10+ sources of evidence assembled before I write a word.

The qualitative shift matters more. I stopped chasing individual articles and started seeing patterns across articles. A new source on AI code review lands differently when I already have 25 sources on the broader pattern it fits into. Processing takes less time as sessions progress because existing themes provide immediate context. "This is the 4th independent source on comprehension debt" is more useful than "here's another article about AI coding risks."

My first published blog post after adopting this process drew from sources tracked across 8 sessions. I didn't have to go find supporting material after deciding to write. The ledger had already assembled it.

I think the difference is between reading and research. Reading without a system is consumption. Each article exists in isolation, competes with everything else you read that week, and fades. With a compounding system, today's reading makes tomorrow's reading more valuable because it either reinforces, nuances, or contradicts something you're already tracking.

This is what compounding looks like for me:

| Theme | Sources | First Appeared |
| --- | --- | --- |
| Harness Engineering | 59 | Session 2 |
| Capability/Practice Gap | 29 | Session 1 |
| Vibe Coding Risk | 26 | Session 3 |
| BDD / Spec-First | 16+ | Session 3 |
| Comprehension Debt | 18 | Session 17 |
| Context Infrastructure | 21 | Session 4 |
| Execution Layer > Model Layer | 19 | Session 7 |
| Autonomous Compounding Loop | 17 | Session 12 |
| Third Era | 13 | Session 3 |
| Persistent Agent Memory | 11 | Session 3 |
| Code Review as Delivery Chokepoint | 4 | Session 16 |

What Doesn't Work

The ledger gets unwieldy. Past roughly 50,000 words, appending to a single file causes performance issues. I added "theme docs" as a patch: standalone reference documents per theme that get rewritten when a theme evolves significantly. If you've built agent memory systems, this problem will sound familiar. Any append-only store needs a compaction mechanism eventually.

Confirmation bias is a real risk. When you have a named theme, you start seeing it everywhere. A source that vaguely touches the topic gets filed under the theme even when it's tangential at best. I added a monthly health check: a structured audit that reviews the entire ledger for contradictions, unsourced claims, stale themes, and drift between what the sources actually say and what my framing claims they say. The first one flagged two themes where I was stretching the source material. I wouldn't have caught it otherwise.
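One part of the health check is purely mechanical: flagging parking-lot signals that have sat uncorroborated past the audit window. A sketch, with an illustrative signal-to-date mapping rather than a fixed schema:

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(weeks=4)  # matches the monthly health-check cadence

def stale_parking_lot(signals: dict[str, date], today: date) -> list[str]:
    """Return parking-lot signals logged more than a month ago.

    `signals` maps a signal name to the date it was first logged.
    Anything flagged here never found a second independent source,
    which usually means it was noise.
    """
    return sorted(
        name for name, logged in signals.items()
        if today - logged > STALE_AFTER
    )
```

The drift and contradiction checks still need an LLM pass (or your own eyes); staleness is just the part cheap enough to automate outright.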

Not every session is high-signal. Some batches of 10 sources produce one actionable insight and nine entries of "this was noise." That's calibration, not waste. But it doesn't feel great in the moment.

The "so what for you" framing requires active projects. If you're not building anything, it collapses back into summarization. The system is for people who are working on things, not for passive readers.

Steal This

The specific tools don't matter. I use Obsidian and Claude, but the principles work with any LLM and any notes app. Here's what does matter:

1. Write a contract before Session 1. Format, length limits, what counts as signal, how to handle noise. Write it down. The LLM should follow these rules every session without you re-explaining them. Take a version of the contract I use as a starting point and make it your own.

2. Require a "so what for your work" on every source. This kills the instinct to collect interesting things and replaces it with a filter for useful things. If a source can't be tied to something you're building, it might still be worth reading. It's just not signal.

3. Track themes with corroboration requirements. Don't promote a signal to "something I believe" until three or more independent sources converge. Count the sources. Track whether they're independent. This is what separates a knowledge base from a collection of highlights.

4. Log negative signal. "This wasn't useful because the author conflated two different patterns" teaches the system and you what to skip next time.

5. Audit yourself. Monthly health checks. Drift detection. Review your parking lot. Stuff that's been sitting there for weeks without corroboration was probably noise.

6. Each session should make the next session faster. If your 20th session takes the same effort as your 5th, you're accumulating, not compounding. Existing themes should provide context that accelerates processing. If they don't, something's off.

Where I Am Now

Karpathy said there's room for a product here. Maybe. But after 29 sessions, I think the product question is less interesting than the workflow question. A tool that summarizes your reading is an RSS reader with a language model. What changes the dynamic is enforcing corroboration requirements, tying every input to active projects, and auditing yourself for confirmation bias. Those are editorial decisions, not features.

The ledger now tracks connections I've forgotten and assembles evidence bases for things I haven't written yet. I don't read articles the same way I used to. I read them as potential entries in a system that's been building context for weeks.

That's the workflow. Tools are up to you.