<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[PJ Hoberman]]></title><description><![CDATA[Thoughts and things that I felt needed writing]]></description><link>https://blog.pjhoberman.com</link><image><url>https://cdn.hashnode.com/uploads/logos/69c1f26830a9b81e3ad28c82/d1093e9b-049a-4e1d-a8be-4224c3605f24.png</url><title>PJ Hoberman</title><link>https://blog.pjhoberman.com</link></image><generator>RSS for Node</generator><lastBuildDate>Wed, 08 Apr 2026 04:26:52 GMT</lastBuildDate><atom:link href="https://blog.pjhoberman.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Build a Knowledge Base That Compounds]]></title><description><![CDATA[Andrej Karpathy recently posted about using LLMs to build personal knowledge bases. Fifteen million people saw it. His follow-up explicitly said "there is room here for an incredible new product inste]]></description><link>https://blog.pjhoberman.com/build-a-knowledge-base-that-compounds</link><guid isPermaLink="true">https://blog.pjhoberman.com/build-a-knowledge-base-that-compounds</guid><category><![CDATA[AI]]></category><category><![CDATA[Productivity]]></category><category><![CDATA[KnowledgeManagement]]></category><category><![CDATA[llm]]></category><category><![CDATA[workflow]]></category><dc:creator><![CDATA[PJ Hoberman]]></dc:creator><pubDate>Tue, 07 Apr 2026 23:28:14 GMT</pubDate><content:encoded><![CDATA[<p>Andrej Karpathy <a href="https://x.com/karpathy/status/2039805659525644595">recently posted</a> about using LLMs to build personal knowledge bases. Fifteen million people saw it. His follow-up explicitly said "there is room here for an incredible new product instead of a hacky collection of scripts."</p>
<p>I've been running exactly this system for 29 sessions over the past five weeks. Not as a product, as a workflow. And the thing I've learned is that the value has almost nothing to do with the tooling. It lives in the rules you set before the first session.</p>
<p>Without a system, I was the same as everyone else: reading 10-20 technical sources a week, retaining maybe 10% within a few days, and losing the connections between ideas entirely. Bookmarks are where links go to die. My Pocket/Matter queue just gets longer and heavier with guilt.</p>
<p>So I built what I call a Signal Ledger. I drop sources into a conversation with Claude, and the LLM filters for signal relevant to my actual work. Not summaries. Distillations. The difference matters, and I'll come back to it.</p>
<h2>Write the Rules Down Before Session 1</h2>
<p>The single most important thing I did was write the rules down before processing a single source. I call it the contract, a document that tells the LLM exactly how to behave during every session.</p>
<p>The key clauses:</p>
<ul>
<li><p><strong>3-5 bullets per source, maximum.</strong> This forces compression. If the LLM can't distill a 3,000-word article into 3-5 bullets, it didn't understand the article.</p>
</li>
<li><p><strong>Lead with actionable signal, not interesting signal.</strong> "This is a cool finding" is not the same as "this changes how you should build X." Actionable comes first.</p>
</li>
<li><p><strong>Every entry gets a "so what for you" tied to an active project or theme.</strong> This is the part that makes the whole thing work. Every source has to connect to something I'm currently building or tracking. If there's no connection, the source might still be worth reading, but it's not signal for the ledger.</p>
</li>
<li><p><strong>Negative signal matters.</strong> "This article wasn't useful because X" is explicitly part of the output. It calibrates future sessions and stops me from wasting time on similar sources later.</p>
</li>
</ul>
<p>And then the distinction I mentioned: no summaries, only distillations. A summary tells you what the article said. A distillation tells you what it means for your work. A summary of a Stripe engineering post might say "Stripe built an internal coding agent that merges 1,300 PRs per week." A distillation says "Stripe's key insight is that their agent works because of infrastructure built for human engineers years before LLMs existed. The harness predates the model. Apply this to your own CI/CD before building agent tooling on top."</p>
<p>Without the contract, you get a glorified RSS reader. With it, you get an editor that applies judgment.</p>
<h2>Themes and Corroboration</h2>
<p>The core mechanism that makes this compound rather than accumulate is theme tracking with corroboration requirements.</p>
<p>Every signal gets classified into one of three tiers:</p>
<p><strong>Parking Lot.</strong> A single-source signal. One person said something interesting. Might be noise. Might be early signal. It sits here until corroborated.</p>
<p><strong>Active Theme (watch).</strong> Two sources from independent origins describing the same pattern. Worth tracking, not yet worth acting on.</p>
<p><strong>Active Theme (confirmed).</strong> Three or more independent sources converging. Signal you can write about, build on, or cite with confidence.</p>
<p>The independence requirement is the key. Two blog posts that both cite the same viral tweet aren't independent. They're amplification of one signal. Three practitioners at different companies, different tech stacks, describing the same architectural pattern without referencing each other? That's convergence.</p>
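<p>The promotion rule is mechanical enough to sketch in code. This is an illustration of the logic only; my actual ledger is a plain document, and the data shapes here are invented for the example:</p>
<pre><code class="language-python">def classify_theme(sources):
    """Tier a signal by how many independent origins support it.

    Dedupe on origin first: two posts amplifying the same viral
    tweet count as one origin, not two.
    """
    origins = {s["origin"] for s in sources}
    if len(origins) >= 3:
        return "active-confirmed"  # write about it, build on it
    if len(origins) == 2:
        return "active-watch"      # track it, don't act yet
    return "parking-lot"           # single source: might be noise

sources = [
    {"url": "blog-a/post", "origin": "practitioner-a"},
    {"url": "blog-b/post", "origin": "practitioner-b"},
    {"url": "blog-c/post", "origin": "practitioner-b"},  # amplification, not independence
]
</code></pre>
<p>Run against that example list, the theme lands in the watch tier: three sources, but only two independent origins.</p>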
<p>Concrete example: I track a theme called "Harness Engineering" that started as a single mention in Session 2. One practitioner's blog post arguing that the infrastructure wrapping an LLM matters more than the model itself. Interesting, single source, into the parking lot.</p>
<p>By Session 29, it has 59 independent sources. Practitioners at Stripe, Meta, Vercel, and Anthropic. Academic researchers. Solo developers. Open-source tool authors. A VC investor. An Anthropic safety researcher. All independently converging on the same pattern.</p>
<p>I didn't go looking for that theme. It emerged from the data. The corroboration requirement forced me to notice it organically rather than cherry-pick evidence for something I already believed. And that's the real reason for counting sources and tracking independence: it forces intellectual honesty about what you actually know versus what you want to be true.</p>
<h2>What Compounding Looks Like in Practice</h2>
<p>After 29 sessions across roughly five weeks, here's what's different.</p>
<p>I've processed over 200 sources. I track 11 active themes with named mechanisms and cross-references between them. I have a parking lot of 15+ single-source signals being watched. I have a backlog of writing candidates where each one has 10+ sources of evidence assembled before I write a word.</p>
<p>The qualitative shift matters more. I stopped chasing individual articles and started seeing patterns across articles. A new source on AI code review lands differently when I already have 25 sources on the broader pattern it fits into. Processing takes less time as sessions progress because existing themes provide immediate context. "This is the 4th independent source on comprehension debt" is more useful than "here's another article about AI coding risks."</p>
<p><a href="https://blog.pjhoberman.com/autoresearch-60-experiments-production-search">My first published blog post after enacting this process</a> drew from sources tracked across 8 sessions. I didn't have to go find supporting material after deciding to write. The ledger had already assembled it.</p>
<p>I think the difference is between reading and research. Reading without a system is consumption. Each article exists in isolation, competes with everything else you read that week, and fades. With a compounding system, today's reading makes tomorrow's reading more valuable because it either reinforces, nuances, or contradicts something you're already tracking.</p>
<p>This is what compounding looks like for me:</p>
<table>
<thead>
<tr>
<th>Theme</th>
<th>Sources</th>
<th>First Appeared</th>
</tr>
</thead>
<tbody><tr>
<td>Harness Engineering</td>
<td>59</td>
<td>Session 2</td>
</tr>
<tr>
<td>Capability/Practice Gap</td>
<td>29</td>
<td>Session 1</td>
</tr>
<tr>
<td>Vibe Coding Risk</td>
<td>26</td>
<td>Session 3</td>
</tr>
<tr>
<td>BDD / Spec-First</td>
<td>16+</td>
<td>Session 3</td>
</tr>
<tr>
<td>Comprehension Debt</td>
<td>18</td>
<td>Session 17</td>
</tr>
<tr>
<td>Context Infrastructure</td>
<td>21</td>
<td>Session 4</td>
</tr>
<tr>
<td>Execution Layer &gt; Model Layer</td>
<td>19</td>
<td>Session 7</td>
</tr>
<tr>
<td>Autonomous Compounding Loop</td>
<td>17</td>
<td>Session 12</td>
</tr>
<tr>
<td>Third Era</td>
<td>13</td>
<td>Session 3</td>
</tr>
<tr>
<td>Persistent Agent Memory</td>
<td>11</td>
<td>Session 3</td>
</tr>
<tr>
<td>Code Review as Delivery Chokepoint</td>
<td>4</td>
<td>Session 16</td>
</tr>
</tbody></table>
<h2>What Doesn't Work</h2>
<p>The ledger gets unwieldy. Past roughly 50,000 words, appending to a single file causes performance issues. I added "theme docs" as a patch: standalone reference documents per theme that get rewritten when a theme evolves significantly. If you've built agent memory systems, this problem will sound familiar. Any append-only store needs a compaction mechanism eventually.</p>
<p>Confirmation bias is a real risk. When you have a named theme, you start seeing it everywhere. A source that vaguely touches the topic gets filed under the theme even when it's tangential at best. I added a monthly health check: a structured audit that reviews the entire ledger for contradictions, unsourced claims, stale themes, and drift between what the sources actually say and what my framing claims they say. The first one flagged two themes where I was stretching the source material. I wouldn't have caught it otherwise.</p>
<p>Not every session is high-signal. Some batches of 10 sources produce one actionable insight and nine entries of "this was noise." That's calibration, not waste. But it doesn't feel great in the moment.</p>
<p>The "so what for you" framing requires active projects. If you're not building anything, it collapses back into summarization. The system is for people who are working on things, not for passive readers.</p>
<h2>Steal This</h2>
<p>The specific tools don't matter. I use Obsidian and Claude, but the principles work with any LLM and any notes app. Here's what does matter:</p>
<p><strong>1. Write a contract before Session 1.</strong> Format, length limits, what counts as signal, how to handle noise. Write it down. The LLM should follow these rules every session without you re-explaining them. <a href="https://gist.github.com/pjhoberman/7b471b39b05f9858a8aa97951abcfd0b">Here's a version of the contract I use</a> - fork it and make it your own.</p>
<p><strong>2. Require a "so what for your work" on every source.</strong> This kills the instinct to collect interesting things and replaces it with a filter for useful things. If a source can't be tied to something you're building, it might still be worth reading. It's just not signal.</p>
<p><strong>3. Track themes with corroboration requirements.</strong> Don't promote a signal to "something I believe" until three or more independent sources converge. Count the sources. Track whether they're independent. This is what separates a knowledge base from a collection of highlights.</p>
<p><strong>4. Log negative signal.</strong> "This wasn't useful because the author conflated two different patterns" teaches the system and you what to skip next time.</p>
<p><strong>5. Audit yourself.</strong> Monthly health checks. Drift detection. Review your parking lot. Stuff that's been sitting there for weeks without corroboration was probably noise.</p>
<p><strong>6. Each session should make the next session faster.</strong> If your 20th session takes the same effort as your 5th, you're accumulating, not compounding. Existing themes should provide context that accelerates processing. If they don't, something's off.</p>
<h2>Where I Am Now</h2>
<p>Karpathy said there's room for a product here. Maybe. But after 29 sessions, I think the product question is less interesting than the workflow question. A tool that summarizes your reading is an RSS reader with a language model. What changes the dynamic is enforcing corroboration requirements, tying every input to active projects, and auditing yourself for confirmation bias. Those are editorial decisions, not features.</p>
<p>The ledger now tracks connections I've forgotten and assembles evidence bases for things I haven't written yet. I don't read articles the same way I used to. I read them as potential entries in a system that's been building context for weeks.</p>
<p>That's the workflow. Tools are up to you.</p>
]]></content:encoded></item><item><title><![CDATA[I Ran 60 Autoresearch Experiments on a Production Search Algorithm. Here's What Actually Happened.]]></title><description><![CDATA[Everyone's writing about Karpathy's autoresearch. Most of it is "here's how the loop works" or "imagine the possibilities." I wanted to see what happens when you point it at a real codebase with a rea]]></description><link>https://blog.pjhoberman.com/autoresearch-60-experiments-production-search</link><guid isPermaLink="true">https://blog.pjhoberman.com/autoresearch-60-experiments-production-search</guid><category><![CDATA[autoresearch]]></category><category><![CDATA[claude.ai]]></category><category><![CDATA[claude-code]]></category><category><![CDATA[AI]]></category><category><![CDATA[Python]]></category><category><![CDATA[Open Source]]></category><dc:creator><![CDATA[PJ Hoberman]]></dc:creator><pubDate>Tue, 24 Mar 2026 02:37:38 GMT</pubDate><content:encoded><![CDATA[<hr />
<p>Everyone's writing about <a href="https://github.com/karpathy/autoresearch">Karpathy's autoresearch</a>. Most of it is "here's how the loop works" or "imagine the possibilities." I wanted to see what happens when you point it at a real codebase with a real metric, not a training script.</p>
<p>So I ran two rounds. 60 total iterations. The first round improved things. The second round found nothing - and that turned out to be even more interesting.</p>
<h2>The System</h2>
<p>I work on a hybrid search system: Cohere embeddings in pgvector for semantic similarity, then a keyword re-ranking layer on top. Django, PostgreSQL, Bedrock. The kind of search stack a lot of teams are probably running right now.</p>
<p>The ranking logic lives in one file: <code>utils.py</code>. It takes the top 100 vector search candidates, scores them on keyword and tag matches across location, activity, and general terms, normalizes everything with z-scores, applies adaptive correlation-based weighting to avoid double-counting, and combines it all into a final score: <code>similarity * (1 + keyword_boost)</code>.</p>
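<p>Schematically, that combination step looks something like this. A sketch with invented names and constants, not the actual <code>utils.py</code>:</p>
<pre><code class="language-python">import numpy as np

def rerank(similarities, keyword_scores, weight=0.5, zclip=2.0):
    """Blend vector similarity with a z-scored, clipped keyword boost."""
    z = (keyword_scores - keyword_scores.mean()) / (keyword_scores.std() + 1e-9)
    z = np.clip(z, -zclip, zclip)            # tame outlier keyword counts
    spread = z.max() - z.min() + 1e-9
    boost = weight * (z - z.min()) / spread  # boost in [0, weight]
    return similarities * (1 + boost)        # final: similarity * (1 + keyword_boost)
</code></pre>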
<p>There are a lot of knobs. Base weights for three query types. A scoring formula for body keyword matches. Z-score clipping bounds. A correlation shrinkage function. The final combination formula. All hand-tuned. All "seems about right."</p>
<p>Perfect autoresearch target.</p>
<h2>The Setup</h2>
<p>The autoresearch pattern is simple: <strong>one file, one metric, one loop</strong>. The agent edits the file, runs the eval, keeps improvements, reverts failures, repeats.</p>
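<p>Stripped of the agent, the keep/revert loop is tiny. Here's a toy version with one knob and a synthetic metric (the real driver is Claude Code editing a file, not a script):</p>
<pre><code class="language-python">import random

def evaluate(weight):
    """Synthetic eval: a single metric whose optimum sits at weight = 3.0."""
    return -(weight - 3.0) ** 2

random.seed(0)
weight, best = 1.0, evaluate(1.0)
for _ in range(44):                              # 44 iterations, like Round 1
    candidate = weight + random.uniform(-1, 1)   # the agent proposes an edit
    score = evaluate(candidate)
    if score > best:
        weight, best = candidate, score          # keep the improvement
    # otherwise: revert, i.e. simply don't adopt the candidate
</code></pre>
<p>Most candidates get rejected, a few get kept, and the knob crawls toward the optimum. The real loop has the same shape, with <code>utils.py</code> as the knob and the eval command as <code>evaluate</code>.</p>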
<p>Here's what I set up:</p>
<p><strong>The constrained file:</strong> <code>utils.py</code> — ranking logic only. The embedding service, query metadata extraction, database schema — all frozen.</p>
<p><strong>The metric:</strong> A composite score: 80% Precision@12 (how many of the top 12 results are actually relevant) and 20% MRR (is the best result near the top). I weighted it this way because MRR was already at 0.975 — almost every query already had the right #1 result. The room to improve was in the rest of the top 12.</p>
<p><strong>The test set:</strong> 20 queries across three types (location, activity, general) with hand-labeled expected results. Things like "best hiking trails near Aspen Colorado," "beginner backpacking gear list," "avalanche safety backcountry skiing." I ran each query, looked at the top 50 results, and picked the ones that actually answered the question.</p>
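<p>Given labeled queries, the composite is a few lines to compute. A sketch with my own naming, not the eval script itself:</p>
<pre><code class="language-python">def precision_at_k(ranked_ids, relevant_ids, k=12):
    """Fraction of the top k results that are labeled relevant."""
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k

def reciprocal_rank(ranked_ids, relevant_ids):
    """1/rank of the first relevant result, 0 if none found."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1 / rank
    return 0.0

def composite(queries):
    """queries: list of (ranked_ids, relevant_ids) pairs, one per test query."""
    n = len(queries)
    p12 = sum(precision_at_k(r, rel) for r, rel in queries) / n
    mrr = sum(reciprocal_rank(r, rel) for r, rel in queries) / n
    return 0.8 * p12 + 0.2 * mrr
</code></pre>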
<p><strong>The eval caching trick:</strong> Each query hits Bedrock twice (query metadata + embedding). That's 15 seconds per query. But the agent only modifies the <em>ranking</em> logic — the embeddings and metadata don't change between iterations. So I cached all the API results on the first run and monkey-patched them in on subsequent runs. Eval went from 6 minutes to about 30 seconds.</p>
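<p>The caching itself is a dozen lines. A sketch of the idea; the real version monkey-patches the service methods, and the names here are invented:</p>
<pre><code class="language-python">import hashlib
import json
from functools import wraps
from pathlib import Path

CACHE_FILE = Path("eval_cache.json")

def cache_between_iterations(fn):
    """Persist API results across eval runs.

    Safe only because the agent edits ranking logic alone, so the
    inputs to these calls never change between iterations.
    """
    store = json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}

    @wraps(fn)
    def wrapper(query):
        key = hashlib.sha256(f"{fn.__name__}:{query}".encode()).hexdigest()
        if key not in store:
            store[key] = fn(query)  # the one slow API call, ever
            CACHE_FILE.write_text(json.dumps(store))
        return store[key]
    return wrapper

@cache_between_iterations
def get_embedding(query):
    # stand-in for the real Bedrock call
    return [0.1, 0.2, 0.3]
</code></pre>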
<p>I wrote an <code>instructions.md</code> that told Claude Code exactly what it could touch, what it couldn't, and what strategies to try in roughly what order. Here's the skeleton:</p>
<pre><code class="language-markdown">## The Constrained File
- `src/service/utils.py` — ONLY file you may edit

## What You Cannot Modify
- eval script, test queries, embedding service, cache files, schema

## Eval
- `uv run manage.py run_autoresearch_eval`
- SCORE = 0.8 * Precision@12 + 0.2 * MRR

## Strategy Guidance (roughly in this order)
1. Quick wins: base weights, pool size, zclip range
2. Scoring function: damping, formula shape
3. Weight optimization: fine-tune per query type
4. Experimental: combine best ideas

## Do NOT
- Add API calls or new dependencies
- Edit frozen files
- Spend 3+ iterations on the same dead approach
</code></pre>
<p>Then I walked away. Literally, I went and played with my kids.</p>
<h2>Round 1: The Results</h2>
<p>Baseline: <strong>0.6933</strong> composite (P@12: 0.6292, MRR: 0.9500)</p>
<p>Final: <strong>0.7200</strong> composite (P@12: 0.6500, MRR: 1.0000)</p>
<p>44 iterations. 3 kept. 41 reverted.</p>
<p>Let that sink in: <strong>93% of experiments made things worse or changed nothing.</strong></p>
<p>Here are the three changes that survived:</p>
<p><strong>1. Bigger base weights, scaled by query type.</strong> Location queries got 5x the original weight. Activity queries 3x. General 2x. The system had been under-weighting the keyword signals relative to the embedding similarity.</p>
<p><strong>2. Exponential scoring formula.</strong> Swapped <code>(1-d) * (1+boost)</code> for <code>(1-d) * exp(boost*0.3)</code>. Better separation between boosted and unboosted items. This also fixed the one query where MRR wasn't perfect.</p>
<p><strong>3. Higher general weights.</strong> Pushed 5x on the general query type weights specifically, which improved "best hikes in the world" from P@12 0.667 to 0.750.</p>
<p>None of these are surprising in hindsight. That's kind of the point.</p>
<h2>What Didn't Work (the Actually Useful Part)</h2>
<p>This is where the value is.</p>
<p><strong>Bigger candidate pools don't help.</strong> I expected that going from 100 to 150 or 200 re-ranking candidates would surface articles that were just barely outside the original pool. Nope. The expected articles were already in the top 100 by vector distance. The problem was ranking, not recall.</p>
<p><strong>Title matching is noise.</strong> Seemed like a slam dunk — articles with query terms in the title should rank higher, right? In practice, tons of <em>irrelevant</em> articles also have those terms in their titles. Net negative.</p>
<p><strong>Disabling adaptive weighting hurts.</strong> The correlation shrinkage I'd built in (reduce keyword weight when keywords correlate with embedding similarity) was actually pulling its weight. Removing it caused regressions.</p>
<p><strong>Keyword density scoring backfires.</strong> Normalizing keyword counts by article length seemed smart. It wasn't. Shorter articles aren't more relevant — they just have fewer words.</p>
<p><strong>Body keyword damping doesn't matter.</strong> Whether you use <code>1 + log1p(count) * 0.5</code> or <code>1 + log1p(count) * 0.3</code> or <code>min(count, 3)</code> — the scores barely move. The exact damping formula is not where the signal is.</p>
<p>Each of these would have been a reasonable thing to try manually. And each would have taken 15-30 minutes of "change, test, evaluate, think about it, decide." The agent burned through all of them in a few hours and proved definitively that they're dead ends.</p>
<p>And fiddling with just the weights would have taken forever if I even bothered going that far. Instead, we basically have LLM-led gradient descent:</p>
<p><em>[PR diff showing the tunings]</em></p>

<h2>Round 2: Optimizing the Prompt (and Finding Nothing)</h2>
<p>Round 1's final log said the ceiling was upstream — the quality of keywords extracted by Claude Haiku from the user's query. So I ran a second round targeting the Haiku prompt in <code>embedding_service.py</code>. Same test set, same metric, Round 1's ranking changes frozen.</p>
<p>16 iterations. Zero improvements. But two findings that were worth the entire round:</p>
<p><strong>The Redis trap.</strong> The metadata extraction function caches results by <code>hash(query)</code> — not <code>hash(query + prompt)</code>. My first two iterations showed improvements that weren't real. The eval was reading cached metadata from the old prompt. I only caught it when I cleared the cache manually and the "improvements" vanished. If you're running autoresearch on anything with a caching layer, make sure the cache key includes everything that could change between iterations.</p>
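<p>The fix is to key the cache on everything that can vary between iterations, prompt included. A sketch, with invented names:</p>
<pre><code class="language-python">import hashlib

def metadata_cache_key(query, prompt):
    """Hash query AND prompt, so editing the prompt invalidates old entries.

    The bug was keying on the query alone: a new prompt silently read
    metadata produced by the old one.
    """
    return hashlib.sha256(f"{prompt}\n---\n{query}".encode()).hexdigest()
</code></pre>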
<p><strong>The co-optimization ceiling.</strong> Round 1 tuned the ranking weights to work with the specific metadata distribution the original prompt produces. Changing the prompt changes that distribution, and the frozen ranking can't adapt. Every prompt change that improved location queries degraded activity queries. The two components were coupled, and optimizing them sequentially hit a wall that optimizing them together wouldn't have.</p>
<p>This is the thing nobody mentions about autoresearch: <strong>sequential rounds have a structural ceiling.</strong> Round 1 overfits to the current state of the frozen components. Round 2 can't improve those components without undoing Round 1's gains. If you're planning multi-round autoresearch, either co-optimize both components in one round, or know that each round's ceiling will be lower than the last.</p>
<h2>Where the Ceiling Is</h2>
<p>After 60 iterations across two rounds, the score settled at 0.72. The ranking math is near-optimal. The prompt is at a Pareto boundary. The remaining weak queries are ones where the right articles are far away in embedding space — "bike packing routes Pacific Northwest" returns road trip content because the embeddings think those are similar.</p>
<p>The next improvement needs a cross-encoder re-ranker or a better embedding model. That's a different project, not a different autoresearch run.</p>
<h2>Was It Worth It?</h2>
<p>For the ranking improvements alone? Probably not. A +0.03 on a composite score is real but marginal.</p>
<p>For the <em>knowledge</em>? Absolutely. I now know, with 60 data points, that my ranking logic was already close to optimal, that the adaptive weighting I built actually works, that keywords are essentially decorative in this system (the embeddings do all the real work), that the Redis cache doesn't key on prompt changes, and that my next improvement has to come from the embedding layer.</p>
<p>I would not have arrived at any of that from manual tuning. I would have tried 8-10 things, gotten frustrated, and moved on with lingering uncertainty.</p>
<p>The autoresearch pattern works best not when it finds big wins, but when it maps the ceiling of a system. "You can stop tuning this" is an underrated finding.</p>
<h2>If You Want to Try This</h2>
<p>You don't need GPUs. You don't need an ML training loop. You need:</p>
<ol>
<li><p><strong>One file</strong> the agent can edit</p>
</li>
<li><p><strong>One metric</strong> that goes up when things get better</p>
</li>
<li><p><strong>A fast eval</strong> (cache everything that doesn't change between iterations)</p>
</li>
<li><p><strong>An</strong> <code>instructions.md</code> that tells the agent the rules</p>
</li>
</ol>
<p>Write the eval first. Label some test data. Cache your API calls. Then let it run.</p>
<p>I've open-sourced the skill I built from this experiment as a Claude Code plugin: <a href="https://github.com/pjhoberman/autoresearch">pjhoberman/autoresearch</a>. It generates the full experiment harness (instructions, eval script, test data template, launch prompt) scoped to your codebase. The <code>references/lessons.md</code> file has everything I learned from both rounds.</p>
<p>The hard part isn't the loop. It's writing an eval that actually measures what you care about.</p>
]]></content:encoded></item><item><title><![CDATA[I Built a Local-First HSA Receipt Tracker with Flask, Google Drive, and AI]]></title><description><![CDATA[A tiny local python app that turns HSA receipts into structured records in Google Drive and Sheets in about five seconds.
Last year my family switched to a High Deductible Health Plan (HDHP) with a He]]></description><link>https://blog.pjhoberman.com/local-first-hsa-receipt-tracker-flask-ai</link><guid isPermaLink="true">https://blog.pjhoberman.com/local-first-hsa-receipt-tracker-flask-ai</guid><category><![CDATA[Python]]></category><category><![CDATA[automation]]></category><category><![CDATA[AI]]></category><category><![CDATA[Productivity]]></category><dc:creator><![CDATA[PJ Hoberman]]></dc:creator><pubDate>Mon, 09 Mar 2026 16:30:00 GMT</pubDate><content:encoded><![CDATA[<p>A tiny local python app that turns HSA receipts into structured records in Google Drive and Sheets in about five seconds.</p>
<p>Last year my family switched to a <strong>High Deductible Health Plan (HDHP)</strong> with a <strong>Health Savings Account (HSA)</strong>.</p>
<p>HSAs are unusual in the U.S. tax system because they’re <strong>triple tax advantaged</strong>:</p>
<ul>
<li><p>contributions are <strong>tax-deductible</strong></p>
</li>
<li><p>investments grow <strong>tax-free</strong></p>
</li>
<li><p>withdrawals for qualified medical expenses are <strong>tax-free</strong></p>
</li>
</ul>
<p>The Bogleheads wiki has a great explanation if you're curious:</p>
<p><a href="https://www.bogleheads.org/wiki/Health_savings_account">https://www.bogleheads.org/wiki/Health_savings_account</a></p>
<p>Because of those advantages, some people treat their HSA as a <strong>long-term investment account</strong> instead of reimbursing medical expenses right away. Almost everyone I tell about this has a 🤯 moment because they’ve never heard of it. So if you’re one of the lucky people learning about it today, you’re welcome!</p>
<blockquote>
<p>Your wallet loves this one weird trick!</p>
</blockquote>
<p>The strategy looks like this:</p>
<ol>
<li><p>Contribute to the HSA</p>
</li>
<li><p>Invest the money</p>
</li>
<li><p>Pay medical expenses out-of-pocket</p>
</li>
<li><p><strong>Save the receipts</strong></p>
</li>
<li><p>Reimburse yourself years (or decades) later, or don't</p>
</li>
<li><p>Profit</p>
</li>
</ol>
<p>There’s <strong>no deadline to reimburse yourself</strong> as long as:</p>
<ul>
<li><p>the expense happened after the HSA was opened</p>
</li>
<li><p>you kept documentation</p>
</li>
</ul>
<p>That last part is the problem.</p>
<p>Here’s what the result looks like after uploading a receipt:</p>
<img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jsez9q048vu4we210x0w.png" alt="HSA receipt tracker sheet" style="display:block;margin:0 auto" />

<p>Every receipt automatically becomes a structured record. Each row links back to the original receipt stored in Google Drive.</p>
<table>
<thead>
<tr>
<th>Date</th>
<th>Vendor</th>
<th>Amount</th>
<th>Receipt</th>
</tr>
</thead>
<tbody><tr>
<td>2026-02-14</td>
<td>Quest Diagnostics</td>
<td>$87.43</td>
<td>Drive Link</td>
</tr>
<tr>
<td>2026-02-03</td>
<td>Walgreens</td>
<td>$14.29</td>
<td>Drive Link</td>
</tr>
<tr>
<td>2026-01-19</td>
<td>Dentist</td>
<td>$120.00</td>
<td>Drive Link</td>
</tr>
</tbody></table>
<hr />
<h2>The Receipt Problem</h2>
<p>If you're saving receipts for potential reimbursement later, you need to keep track of:</p>
<ul>
<li><p>provider</p>
</li>
<li><p>date</p>
</li>
<li><p>amount</p>
</li>
<li><p>proof of payment</p>
</li>
<li><p>the original receipt</p>
</li>
</ul>
<p>Manually this usually turns into:</p>
<ul>
<li><p>PDFs scattered across downloads folders</p>
</li>
<li><p>random email attachments</p>
</li>
<li><p>a spreadsheet you forget to update</p>
</li>
</ul>
<p>I was doing this manually with Google Drive and a spreadsheet, and it wasn't <em>too</em> hard, but... I wanted something simpler.</p>
<p>Drop in a receipt → have everything filed automatically.</p>
<hr />
<h2>The Idea</h2>
<p>I built a tiny <strong>local-first web app</strong> that:</p>
<ol>
<li><p>accepts a medical receipt PDF</p>
</li>
<li><p>extracts useful fields automatically</p>
</li>
<li><p>stores the document in Google Drive</p>
</li>
<li><p>logs the expense in Google Sheets</p>
</li>
</ol>
<p>The whole flow takes about <strong>5 seconds of actual work</strong>.</p>
<hr />
<h2>Demo</h2>
<p>Here’s the entire workflow in real time (~5 seconds of actual work - the rest is me moving slowly for the video's sake):</p>
<p><a href="https://vimeo.com/1171880979?share=copy&amp;fl=sv&amp;fe=ci">Watch the demo on Vimeo</a>.</p>
<p>Upload a receipt → parsed → stored → logged.</p>
<hr />
<h2>The Stack</h2>
<p>The app is intentionally small:</p>
<ul>
<li><p><strong>Flask</strong> – local web app</p>
</li>
<li><p><strong>pdfplumber</strong> – extract text from receipts</p>
</li>
<li><p><strong>OpenAI (optional)</strong> – prefill receipt fields</p>
</li>
<li><p><strong>Google Drive API</strong> – store receipts</p>
</li>
<li><p><strong>Google Sheets API</strong> – expense log</p>
</li>
</ul>
<p>The workflow looks like this:</p>
<p><code>receipt → upload → text extraction → AI field parsing → Drive storage → Sheets entry</code></p>
<hr />
<h2>What Happens When You Upload a Receipt</h2>
<p>When I submit a receipt, the app:</p>
<ol>
<li><p>saves the uploaded PDF locally</p>
</li>
<li><p>computes a <strong>SHA-256 hash</strong> to detect duplicates</p>
</li>
<li><p>extracts text using <code>pdfplumber</code></p>
</li>
<li><p>optionally asks the model to extract fields like:</p>
</li>
</ol>
<ul>
<li><p>vendor</p>
</li>
<li><p>service date</p>
</li>
<li><p>amount</p>
</li>
<li><p>payment date</p>
</li>
<li><p>payment method</p>
</li>
<li><p>notes</p>
</li>
</ul>
<ol start="5">
<li><p>creates (or reuses) a Google Drive folder for the month:</p>
<p><code>2026-03/</code></p>
</li>
<li><p>uploads the receipt there</p>
</li>
<li><p>appends a row to Google Sheets with a link to the file</p>
</li>
</ol>
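<p>The filing step is deliberately dumb. A sketch of the folder naming and the row shape (column order here is illustrative):</p>
<pre><code class="language-python">from datetime import date

def month_folder(service_date):
    """Receipts are filed into one Drive folder per month, e.g. '2026-03'."""
    return service_date.strftime("%Y-%m")

def sheet_row(fields, drive_link):
    """The row appended to the Google Sheet."""
    return [fields["date"], fields["vendor"], fields["amount"], drive_link]
</code></pre>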
<p>Example spreadsheet row:</p>
<table>
<thead>
<tr>
<th>Date</th>
<th>Vendor</th>
<th>Amount</th>
<th>Receipt</th>
</tr>
</thead>
<tbody><tr>
<td>2026-02-14</td>
<td>Quest Diagnostics</td>
<td>$87.43</td>
<td>Drive Link</td>
</tr>
</tbody></table>
<p>Now every expense has:</p>
<ul>
<li><p>structured data</p>
</li>
<li><p>the original document</p>
</li>
<li><p>a searchable log</p>
</li>
</ul>
<hr />
<h2>One Small Feature That Ended Up Being Useful</h2>
<p>Duplicate detection.</p>
<p>Each uploaded file is hashed and stored in a local <code>receipt_hashes.json</code>.</p>
<p>If I accidentally upload the same receipt twice, the app can catch it before cluttering the spreadsheet or Drive folder.</p>
<p>Tiny detail, but it prevents a lot of mess.</p>
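<p>The whole mechanism fits in a few lines. Roughly (the JSON layout here is a guess at the shape, not the exact file format):</p>
<pre><code class="language-python">import hashlib
import json
from pathlib import Path

HASH_FILE = Path("receipt_hashes.json")

def is_duplicate(pdf_bytes):
    """Return True if this exact file has been uploaded before."""
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    seen = set(json.loads(HASH_FILE.read_text())) if HASH_FILE.exists() else set()
    if digest in seen:
        return True
    seen.add(digest)
    HASH_FILE.write_text(json.dumps(sorted(seen)))
    return False
</code></pre>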
<hr />
<h2>Why Local-First?</h2>
<p>This could easily be a SaaS product.</p>
<p>But for personal admin tools, local-first is usually better:</p>
<ul>
<li><p>no extra account to maintain</p>
</li>
<li><p>easier to hack on</p>
</li>
<li><p>my files stay in my own Google account</p>
</li>
<li><p>no database or infrastructure to run</p>
</li>
</ul>
<p>I also set it up as a <strong>macOS</strong> <code>launchd</code> <strong>service</strong>, so it’s just always available on my laptop.</p>
<p>The goal wasn’t to build some big product. It was to remove just enough friction that I’d actually keep my receipts organized.</p>
<p>Now the workflow is:</p>
<p><code>Download receipt → Upload → Done</code></p>
<p>Which is about the amount of effort I’m realistically willing to spend on personal finance paperwork.</p>
<hr />
<h2>Repo</h2>
<p>If you're interested:</p>
<p><a href="https://github.com/pjhoberman/hsa-tracker">https://github.com/pjhoberman/hsa-tracker</a></p>
<hr />
<p>If you’re using the <strong>"save receipts and reimburse later"</strong> HSA strategy and built something similar, I’d love to hear about it.</p>
<p>Or if you have a better way to automate this - please tell me before I add OCR and accidentally build an entire product.</p>
]]></content:encoded></item><item><title><![CDATA[Using Claude to manage.. me]]></title><description><![CDATA[I have a problem.
I like to do a lot of things at once, and I like shiny objects. I can easily get caught up in some novel task, while tabbing through four docs I'm supposed to review, waiting for tes]]></description><link>https://blog.pjhoberman.com/using-claude-to-manage-my-priorities</link><guid isPermaLink="true">https://blog.pjhoberman.com/using-claude-to-manage-my-priorities</guid><category><![CDATA[AI]]></category><category><![CDATA[automation]]></category><category><![CDATA[Productivity]]></category><category><![CDATA[tooling]]></category><category><![CDATA[claude.ai]]></category><dc:creator><![CDATA[PJ Hoberman]]></dc:creator><pubDate>Fri, 22 Aug 2025 16:30:00 GMT</pubDate><content:encoded><![CDATA[<p>I have a problem.</p>
<p>I like to do a lot of things at once, and I like shiny objects. I can easily get caught up in some novel task, while tabbing through four docs I'm supposed to review, waiting for tests to pass on CI.</p>
<p>And I work across a few teams and projects.</p>
<p>And I like to learn new things.</p>
<p>And and and, sometimes I have trouble remembering what my priorities are.</p>
<p>Recently, I thought I'd try Claude to help manage me. My various priorities, projects, docs to review, PRs in flight, all of that.</p>
<h2>The Setup</h2>
<p>Using Claude Desktop, <a href="https://linear.app/docs/mcp">I hooked it up to Linear via MCP</a>. I <a href="https://www.anthropic.com/news/projects">started a project in Claude</a> just for this effort. And then I told it all the things I'm working on, with links where applicable.</p>
<p>I started off with project instructions:</p>
<blockquote>
<p>You help me remember what I'm working on, the work I owe people, etc. You should use the MCP connection to Linear to find tasks and evaluate my work and the work my team is doing as well.</p>
</blockquote>
<p>Then I told it all the docs I need to review, and asked it to look at Linear as well.</p>
<img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xlky7pc8co2i1yi2t1wg.jpg" alt="Screenshot from chat with Claude" style="display:block;margin:0 auto" />

<p>Then I told it about a side project and fed it a bunch of links related to it:</p>
<blockquote>
<p>I'm working on some side work called ChoreBot. Keep track of that as a separate project. As part of that work, I have these tabs open:</p>
</blockquote>
<ul>
<li><em>Imagine 15 tabs I had open</em></li>
</ul>
<p>I told it about other tasks, tabs, projects, and general work on my plate.</p>
<p>Finally, I had to summarize everything and create a document I could add as an artifact for the project.</p>
<p>Later, I worked with the project to give some prioritization to each task and project.</p>
<img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kwuz4jfww0ka4u925qqj.png" alt="Another conversation with Claude" style="display:block;margin:0 auto" />

<p>And again, I had Claude turn the results of this exercise into a document I could add to the project.</p>
<h2>The End Result</h2>
<p>Now, the benefit of this setup. When new tasks come up, I let Claude know. And for example, today, a Friday, nearing the end of the month, I can do something like this:</p>
<p>Me:</p>
<blockquote>
<p>It's Friday, I want to work on something fun but impactful. What's on the agenda?</p>
</blockquote>
<p>Claude:</p>
<blockquote>
<p>Let me check your current Linear tasks to see what's on your plate and then suggest something fun but impactful for your Friday! Perfect! Looking at your Linear tasks and comparing with your priority documents, I can see some great options for fun but impactful Friday work. Let me suggest a few based on what you have:</p>
</blockquote>
<blockquote>
<h3>🎯 <strong>Most Fun &amp; Impactful Options for Friday:</strong></h3>
</blockquote>
<blockquote>
<p><strong>1. ChoreBot Development</strong> (Your baby! 🤖)</p>
</blockquote>
<ul>
<li><p>This isn't in Linear yet but it's one of your August commitments</p>
</li>
<li><p>High impact AI DevOps automation - exactly the kind of cutting-edge work that's both fun and strategically valuable</p>
</li>
<li><p>You could work on the POC, maybe starting with the monitoring phase (Sentry/Datadog integration)</p>
</li>
</ul>
<blockquote>
<p><strong>2. DISCO-3746: AWS EventBridge Spam Detection</strong> (1 point, High Priority)</p>
</blockquote>
<ul>
<li><p>DevOps + AWS work - technical and satisfying</p>
</li>
<li><p>Setting up EventBridge rules for spam detection sounds like a fun infrastructure challenge</p>
</li>
<li><p>Quick win that unblocks the UGC team</p>
</li>
</ul>
<blockquote>
<p><strong>3. DISCO-3786: Redis Connection Pools</strong> (High Priority, no estimate)</p>
</blockquote>
<ul>
<li><p>Infrastructure optimization work - always satisfying to see performance improvements</p>
</li>
<li><p>Builds on your Redis expertise from the vector DB research</p>
</li>
<li><p>Could tie into your ChoreBot work later</p>
</li>
</ul>
<blockquote>
<h3>🤔 <strong>What I Notice:</strong></h3>
</blockquote>
<ul>
<li><p>You've been <strong>crushing</strong> the feed digest work! Tons of releases this week (DISCO-3791, 3790, 3787, etc.)</p>
</li>
<li><p>Your package update tasks (DISCO-3705, 3785) keep getting pushed - might be worth tackling one of these for a maintenance win</p>
</li>
<li><p>The cron monitoring script (DISCO-3779) is "In Progress" - could be a good Friday finish</p>
</li>
</ul>
<blockquote>
<h3>🚀 <strong>My Recommendation:</strong></h3>
</blockquote>
<blockquote>
<p><strong>Start ChoreBot POC development</strong> - it's not urgent deadline-wise, but it's your most strategic and fun project. You could:</p>
</blockquote>
<ol>
<li><p>Set up the basic monitoring integrations (Sentry/Datadog webhooks)</p>
</li>
<li><p>Create the Linear ticket auto-creation logic</p>
</li>
<li><p>Maybe even start on some MCP protocol research</p>
</li>
</ol>
<blockquote>
<p>This hits your "fun but impactful" criteria perfectly and aligns with your August commitments!</p>
</blockquote>
<blockquote>
<p>What sounds appealing to you? Want to dive into ChoreBot, or would you prefer one of the more immediate Linear tasks?</p>
</blockquote>
<h2>Conclusion</h2>
<p>We'll see how long I use this and whether it's the silver bullet I need to organize my sprawling task lists, but so far, I like it! It bridges the gap between a bunch of todo lists, Linear tasks, and what my former self thought each priority entailed.</p>
<p>What do you think? Have you used Claude for something like this before?</p>
]]></content:encoded></item><item><title><![CDATA[Changing how we do standups]]></title><description><![CDATA[This meeting needs a refactor.

Our team used to do the daily standup. Everyone talks about what they did yesterday, what they're doing today, and any blockers. Everyone tunes out except for their own]]></description><link>https://blog.pjhoberman.com/refactoring-the-daily-standup</link><guid isPermaLink="true">https://blog.pjhoberman.com/refactoring-the-daily-standup</guid><category><![CDATA[leadership]]></category><category><![CDATA[Culture]]></category><category><![CDATA[workplace]]></category><category><![CDATA[meetings]]></category><dc:creator><![CDATA[PJ Hoberman]]></dc:creator><pubDate>Mon, 11 Nov 2024 17:30:00 GMT</pubDate><content:encoded><![CDATA[<blockquote>
<p>This meeting needs a refactor.</p>
</blockquote>
<p>Our team used to do the daily standup. Everyone talks about what they did yesterday, what they're doing today, and any blockers. Everyone tunes out except for their own status update. The product manager and the tech lead pay some attention. Maybe once in a week or two, something interesting is said. 🥱</p>
<p>During a retro in the midst of a busy time for our team, we surfaced that we had too many meetings. And we're not alone in this. It was just too much with the looming deadlines. <a href="https://www.fastcompany.com/90888605/shopify-exec-this-is-what-happened-when-we-canceled-all-meetings">Shopify went extreme on dropping meetings a few years back</a>. The daily standup was one of many meetings we talked about, and everyone had the same feeling: <strong>this meeting needed a refactor</strong>.</p>
<p>We discussed what the actual intent of our daily standup was, and how we could preserve that while also reducing the pressure of meetings. Here's what we came up with:</p>
<p><strong>Daily Standup Goals</strong></p>
<ol>
<li><p>The product manager needs to <strong>know the status of certain tasks</strong> beyond what Linear reports. Not all tasks, not every day, but it's important for them to have their finger on the pulse.</p>
</li>
<li><p>The tech lead also needs to get a sense for people <strong>spinning their wheels</strong>, working on things for longer than expected, and how <strong>projects are moving along</strong>.</p>
</li>
<li><p><strong>People need help getting unblocked</strong>. A topic for another time is how long to try to unblock yourself before asking for help. But, our team is generally good at reaching out outside of meetings. However, sometimes we need a dedicated time and place to make sure a block is in front of everyone.</p>
</li>
<li><p>We're a fully remote team. We need <strong>face time</strong>. We need to bond and build / maintain team <strong>culture</strong>. But we don't want more meetings.</p>
</li>
</ol>
<p><strong>Our Team's Solution</strong></p>
<ol>
<li><p>Daily SlackBot async standup. A bot sends a message to our team handle in slack, and we respond with <strong>Y</strong>esterday, <strong>T</strong>oday, and <strong>B</strong>lockers. Sometimes it's just a status update, sometimes it's a funny quip about our kids being blockers. Conversation sometimes ensues. Other times, it doesn't. And that's perfect.</p>
</li>
<li><p>On Mondays, Wednesdays, and Fridays, we have a 15 minute team sync. This is meant to be a chat about what we're working on that's interesting. Blockers that we want to dive deep on or just vent. And sometimes it's just a coffee chat, and we talk about plans for the weekend or some cool new toy.</p>
</li>
</ol>
<p>This dropped a mere 30 minutes of meetings from the week, but we don't dread the remaining 45 minutes anymore. The time is useful for everyone.</p>
<p>All our goals are met, we have fewer standup meetings, and the meetings we do have show much higher engagement and enjoyment. Win win win!</p>
<p>What do you think? Do you find the daily stand up useful? Who is it for in your team? How do you build and reinforce culture on your remote team?</p>
]]></content:encoded></item><item><title><![CDATA[Finding Unused Indexes in Postgres]]></title><description><![CDATA[Database indexes are incredibly important in production systems. Single column ones are obvious, but multi-column (composite) indexes are game-changers for speeding up commonly used queries by several]]></description><link>https://blog.pjhoberman.com/find-unused-indexes-postgres</link><guid isPermaLink="true">https://blog.pjhoberman.com/find-unused-indexes-postgres</guid><category><![CDATA[PostgreSQL]]></category><category><![CDATA[backend]]></category><category><![CDATA[indexing]]></category><dc:creator><![CDATA[PJ Hoberman]]></dc:creator><pubDate>Wed, 18 Sep 2024 16:30:00 GMT</pubDate><content:encoded><![CDATA[<p>Database indexes are incredibly important in production systems. Single column ones are obvious, but multi-column (composite) indexes are game-changers for speeding up commonly used queries by several orders of magnitude.</p>
<p>But sometimes, we (I) create a few different indexes over time, and it's not always obvious when to remove old indexes. Indexes take up space and slow down writes, so it's important to monitor and clean them up periodically.</p>
<p>Here is a Postgres query to give you some insight:</p>
<pre><code class="language-sql">SELECT 
    relname AS table_name, 
    indexrelname AS index_name, 
    idx_scan AS index_scans, 
    idx_tup_read AS tuples_read, 
    idx_tup_fetch AS tuples_fetched
FROM 
    pg_stat_user_indexes
JOIN 
    pg_index ON pg_stat_user_indexes.indexrelid = pg_index.indexrelid
WHERE 
    schemaname = 'public'
ORDER BY 
    idx_scan DESC;
</code></pre>
<p>The output lists each table name and index name, along with usage data: how many times the index was scanned by a query, how many tuples (index rows, in this case) were read from it, and how many tuples were actually fetched after all filtering was complete.</p>
<p>These statistics accumulate from whenever the server was last restarted or the statistics were last reset. Here is some example output from a production server. I removed a bunch of rows to show heavily used indexes next to some that aren't used at all:</p>
<pre><code class="language-plaintext"> table_name    |          index_name         | index_scans | tuples_read | tuples_fetched 
---------------+-----------------------------+-------------+-------------+----------------
 items         | items_pkey                  | 17566068467 | 22444742841 | 21762928697
 routes        | routes_item_id_key          |  4046022477 |  2541792837 | 2521785009
 items         | items_url_idx               |  1520426292 |  7556543480 | 1518612148
 authors       | authors_pkey                |   211481111 |    45577051 |   42726045
 logs          | logs_type_coord_uniq        |     6437114 |     1462603 |     1392484
 spatial_ref   | spatial_ref_pkey            |     2060726 |    13792886 |     2056566
 users         | users_pkey                  |     1872578 |     2214935 |     1872578
 ...
 rate_limits   | rate_limit_key_like         |           0 |           0 |           0
 blocks        | blocks_pkey                 |           0 |           0 |           0
 blocks        | blocks_uniq                 |           0 |           0 |           0
</code></pre>
<p>As you can see, some of these indexes are never used! I have some work ahead of me now: manually review these indexes and potentially remove unused ones to save on space and improve write performance. Eventually, I'd like to set up automated alerts to let me know that some indexes aren't being used at all.</p>
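<p>The filtering half of that automation is simple. A sketch that takes rows shaped like the query output above (fetched with any Postgres driver) and flags never-scanned indexes — the row shape and threshold here are my own choices, not a finished alerting system:</p>
<pre><code class="language-python3">def unused_indexes(rows, max_scans=0):
    """rows: (table_name, index_name, index_scans, tuples_read, tuples_fetched).
    Return (table, index) pairs scanned at most max_scans times."""
    return [(table, index)
            for table, index, scans, _read, _fetched in rows
            if scans <= max_scans]

# Feed this the results of the pg_stat_user_indexes query on a schedule,
# then send any hits to Slack or email for manual review before dropping.
</code></pre>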
<p>Let me know if this is helpful for you or if you've gone further and automated anything like this!</p>
]]></content:encoded></item><item><title><![CDATA[LaunchDarkly across multiple celery tasks]]></title><description><![CDATA[We were running into an issue recently where LaunchDarkly wasn't evaluating on celery servers. Opening a python shell on the boxes showed the keys were setup correctly, and that if we called the metho]]></description><link>https://blog.pjhoberman.com/launchdarkly-celery-fork-worker-init</link><guid isPermaLink="true">https://blog.pjhoberman.com/launchdarkly-celery-fork-worker-init</guid><category><![CDATA[Python]]></category><category><![CDATA[backend]]></category><category><![CDATA[celery]]></category><category><![CDATA[launchdarkly]]></category><dc:creator><![CDATA[PJ Hoberman]]></dc:creator><pubDate>Fri, 21 Jul 2023 16:30:00 GMT</pubDate><content:encoded><![CDATA[<p>We were running into an issue recently where LaunchDarkly flags weren't evaluating on our Celery servers. Opening a Python shell on the boxes showed the keys were set up correctly, and that if we called the methods directly, the flags evaluated as expected. However, when called as a delayed task, the flags weren't evaluating.</p>
<p>Celery makes use of <code>fork</code> in Python to spawn its workers, and LaunchDarkly requires that only one client instance exists per process. <a href="https://stonesoupprogramming.com/2017/08/26/python-basic-forking/">This post</a> is a good primer on forking in Python. It appeared that this combination was the issue.</p>
<p>My coworker Doug explained it thusly:</p>
<blockquote>
<p>The LaunchDarkly library makes use of threading, Celery starts the main "control" process that uses <code>fork()</code> to start <code>n</code> workers, based on your concurrency setting.</p>
</blockquote>
<blockquote>
<p>Forking copies the current process into a new one, but in Python it kills off all but the thread doing the forking. All others are stopped. So the threads that LaunchDarkly starts up during initialization (e.g., EventDispatcher or StreamingUpdateProcessor) end up "defunct" or unpredictable.</p>
</blockquote>
<blockquote>
<p>The Locks used within LaunchDarkly are thread independent, but because threads are killed off in the child, you end up with an invalid state and can’t trust things will work.</p>
</blockquote>
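<p>Doug's point is easy to demonstrate. A minimal sketch (POSIX-only, using <code>os.fork</code> directly rather than Celery) showing that a background thread does not survive into the forked child:</p>
<pre><code class="language-python3">import os
import threading


def thread_counts_after_fork():
    """Start a background thread, fork, and report active thread counts."""
    stop = threading.Event()
    threading.Thread(target=stop.wait, daemon=True).start()
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() was copied, so the
        # background thread (think LaunchDarkly's StreamingUpdateProcessor)
        # simply doesn't exist here.
        os.write(w, str(threading.active_count()).encode())
        os._exit(0)
    os.close(w)
    child_count = int(os.read(r, 16))
    os.close(r)
    os.waitpid(pid, 0)
    parent_count = threading.active_count()  # main thread + background thread
    stop.set()
    return parent_count, child_count
</code></pre>
<p>The parent reports at least two threads; the child reports one. Any state those missing threads were maintaining is silently stale in the worker.</p>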
<p>Further, <a href="https://docs.launchdarkly.com/sdk/server-side/python#configuring-uwsgi">LaunchDarkly recommends a post fork hook to initialize the client</a>.</p>
<pre><code class="language-python3">import ldclient
from ldclient.config import Config
import uwsgidecorators

@uwsgidecorators.postfork
def post_fork_client_initialization():
    ldclient.set_config(Config("sdk-key-123abc"))
    client = ldclient.get()
</code></pre>
<p>However, our Django application uses <code>asgi</code>, which doesn't currently have this hook. This is our current LaunchDarkly configuration <code>launch_darkly.py</code>:</p>
<pre><code class="language-python3">import atexit
import sys

from django.conf import settings

"""
Sets up Launch Darkly for use across the site.
LD is already initialized. See discovery_service.views.ld_check for example usage.
"""


class LDClient():
    def __getattr__(self, v):
        if 'ldclient' in sys.modules:
            # ldclient was already imported (and configured) in this process
            import ldclient
        else:
            # first use in this process: import and configure exactly once
            import ldclient
            from ldclient.config import Config
            ldclient.set_config(Config(settings.LAUNCH_DARKLY_SDK_KEY))
        return getattr(ldclient.get(), v)


ld_client = LDClient()

@atexit.register
def close_ld(*args, **kwargs):
    # LD recommends closing upon app shutdown
    # https://docs.launchdarkly.com/sdk/server-side/python
    ld_client.close()
</code></pre>
<p>The <code>LDClient</code> class allows us to ignore new instantiations of the <code>ldclient</code> library if it's already been loaded.</p>
<p>And the general use is:</p>
<pre><code class="language-python3">from launch_darkly import ld_client

def flagged_code():
    flag = ld_client.variation("flag-name", {"key": 12345}, False)  # False is the default in this case
    if flag:
        ...  # do something if the flag is on
    else:
        ...  # do something else if the flag is off
</code></pre>
<p>After a lot of bashing through walls, aka iterative development, we discovered two things:</p>
<h5>1. Module Level instantiation</h5>
<p>There was a module-level instantiation of the LaunchDarkly client that was causing the library to initialize before the fork.</p>
<p>Basically, the above code, but instead:</p>
<pre><code class="language-python3">from launch_darkly import ld_client

# Module level: this runs at import time, before Celery forks its workers
flag = ld_client.variation("flag-name", {"key": 12345}, False)      # False is the default in this case


def flagged_code():
    if flag:
        ...  # do something if the flag is on
    else:
        ...  # do something else if the flag is off
</code></pre>
<p>So that code was removed / refactored.</p>
<h5>2. Celery initialization</h5>
<p>In our <code>celery.py</code> code, we added a <code>worker_process_init</code> hook to initialize the library properly. This ensures that when the Celery workers fork, there is definitely an LD client ready to go for any code that requires it.</p>
<pre><code class="language-python3">import logging

from celery.signals import worker_process_init

from launch_darkly import ld_client

logger = logging.getLogger(__name__)


@worker_process_init.connect
def configure_worker(signal=None, sender=None, **kwargs):
    """Initialize the LaunchDarkly client for use in Celery tasks."""
    try:
        res = ld_client.variation("test-flag", {"key": 0}, 0)
        logger.info(f"LD client initialized for Celery worker. {res}")
    except Exception:
        logger.error("Error initializing LD client for Celery worker.", exc_info=True)
</code></pre>
<p>To aid in future discovery and debugging, we also created a celery task that we can call on the fly to make sure things are working:</p>
<pre><code class="language-python3">
@shared_task
def celery_ld_check(flag="test-flag", key=0, default="not found"):
    """
    Test LaunchDarkly SDK connectivity from Celery.
    """

    print("trying celery_ld_check")
    try:
        variation = ld_client.variation(flag, {"key": key}, default)
        print(f"celery_ld_check: {variation}")
    except Exception as e:
        print(f"celery_ld_check: {e}")
</code></pre>
<p>Lastly, we will likely iterate on the <code>LDClient</code> class to deal with issues regarding the fork on the fly.</p>
<p>Let me know if this helps you in your code, or sparks any ideas for you!</p>
]]></content:encoded></item></channel></rss>