Engineering Notes
A collection of thoughts on building production AI systems, with a focus on architecture, failure modes, and trade-offs.
Oct 24, 2024
Designing Agentic AI Systems Beyond the "Toy Demo" Phase
Why most LangChain tutorials fail in production and how to build robust state management for autonomous agents.
Architecture
Agents
Sep 12, 2024
The Hidden Cost of Latency in LLM Applications
Analyzing TTFT (Time To First Token) vs. total latency, and how standard caching strategies improve user retention.
Performance
Aug 05, 2024
When NOT to use an LLM
A flowchart for decision makers. Sometimes a regex or a simple classifier is 100x cheaper and 1000x faster.
Strategy
Jul 22, 2024
Optimizing Triton Inference Server for Mixed Workloads
Strategies for handling both high-throughput embeddings and high-latency generation on the same cluster.
MLOps