Pranav Belhekar

Guides

Long-form visual explainers. Each one is built to be the article you send a colleague when they ask "how does this actually work?"

The KV Cache, Illustrated

How modern LLMs actually remember the conversation while they generate. From the redundant computation that made naïve attention untenable, through exactly what's inside the cache, to the optimizations every production team uses on top: paged attention, GQA, prompt caching, attention sinks, speculative decoding. With worked math and thirteen diagrams.

Jun 15, 2026
A Visual Guide to AI Agent Memory

Your agent rebuilds its entire mind from scratch on every single turn. This guide explains, visually and from first principles, how agents remember and forget, and what actually works in production.

Jun 10, 2026
Agents Need to Sleep: The Architecture of Memory Consolidation

Why a 24/7 agent inevitably suffers from context collapse, and how to build an offline memory consolidation daemon inspired by human sleep.

Jun 10, 2026
A Visual Guide to Belief Revision in AI Agents

Your agent stores facts but cannot change its mind. This guide explains, visually, why vector databases fail at contradictions, what forty years of belief revision theory can teach us, and how to build the machinery that lets an agent update what it knows.

Jun 2, 2026