Engineering Notes

A collection of thoughts on building production AI systems. The focus is on architecture, failure modes, and trade-offs.

Why most LangChain tutorials fail in production and how to build robust state management for autonomous agents.

Analyzing TTFT (Time To First Token) vs. total latency, and how standard caching strategies affect user retention.

A flowchart for decision makers. Sometimes a regex or a simple classifier is 100x cheaper and 1000x faster.

Strategies for handling both high-throughput embeddings and high-latency generation on the same cluster.