Avdhesh Singh Chouhan

AI ARCHITECT

Building Organizational Intelligence at Scale.

Over 8 years of experience architecting intelligent systems at scale. Currently building Donnie at LEC Robotics, an organizational intelligence platform that captures enterprise knowledge and compounds it into strategic assets through persistent memory and agentic reasoning.

Technical Identity

I specialize in building intelligent systems that capture, reason, and act across complex domains. My expertise spans agentic AI architectures, organizational intelligence platforms, and foundation model orchestration—from system design to production deployment at scale. My work covers the full AI stack: multi-agent orchestration, persistent memory systems, LLM/SLM fine-tuning, cloud infrastructure, and edge optimization.

Currently, I am a Principal AI Engineer at LEC Robotics, architecting Donnie, an organizational intelligence platform that builds persistent, compounding knowledge systems across enterprises. Previously, at Turing, I specialized in AI Agents, Generative AI, and LLM orchestration on Google Cloud Platform (GCP) for global clients. At Capgemini Engineering, I led AI initiatives, architected in-house GenAI accelerators, and steered cross-functional teams to deliver high-impact innovations. I am a recipient of the 2024 Annual Engineering Excellence Award for scalable AI innovation.

Design

GenAI Accelerators, RAG Pipelines, Multi-Agent Orchestration, and Microservices Architecture.

Build

Fine-tuning SLMs, Custom Transformers, LangChain Workflows, and core Python/C++ components.

Deploy

Docker, Kubernetes, TensorRT optimization, and Heterogeneous Compute.

Operate

CI/CD for ML, Model Evaluation, Cloud Infrastructure (AWS/GCP/Azure), and Scale.

Technical Domains

GenAI & Agentic Systems

Architecting autonomous workflows and RAG at scale.

  • Frameworks: LangChain, LlamaIndex, Transformers
  • Core: RAG, Vector DBs (Pinecone), Semantic Search
  • Skills: Prompt Engineering, Multi-Modal Systems, Agentic AI

Edge AI & Computing

Deploying heavy models on constrained hardware.

  • Hardware: NVIDIA Jetson, Qualcomm Snapdragon
  • Optimization: TensorRT, SNPE, Model Quantization

Cloud & MLOps

Full-cycle production engineering.

  • Platforms: AWS, GCP, Azure, Kubeflow
  • Infra: Docker, Kubernetes, Microservices, CI/CD
  • Tools: MLflow, Jira, Git, FastAPI

Publications

Published Author: "AI: Thriving in a World with Smart Machines"

Global Collaboration | Strategic Insights into the Future of Intelligence

View Publication

The Book

Co-authored a comprehensive guide exploring the convergence of Generative AI and Robotics, featuring insights from 32 global experts.

My Contribution

Authored Chapter 5: "From Task-Bots to General-Purpose Robots", analyzing how Multimodal AI and VLA (Vision-Language-Action) models are redefining robotic autonomy.

Key Topics

Multimodal AI, Robotics, Reinforcement Learning, Future of Work

Impact

Provides a strategic roadmap for navigating the "Intelligence Age," bridging the gap between theoretical AI research and practical industrial application.

Flagship System Engineering

Infinite Context Local RAG (MLX)

High-performance local chatbot for Apple Silicon with unlimited context via recursive summarization.

View Code

The Problem

Running long-context chats on local hardware (MacBook) hits memory limits immediately. Cloud LLMs are expensive and privacy-invasive.

Architecture & Stack

Implemented a hybrid RAG + Recursive Summarization engine using Apple's MLX framework. It dynamically compresses conversation history while retaining key context.

Python, Apple MLX, Llama 3, ChromaDB, FastAPI

Key Innovations

  • Memory Optimization: Achieved infinite conversation length on consumer hardware by abstracting "memory" into short-term (RAM) and long-term (Vector Store).
  • Privacy First: 100% offline execution with performance comparable to cloud-based GPT-3.5.
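The short-term/long-term memory split described above can be sketched in plain Python. This is a simplified illustration, not the project's actual code: the `summarize` stub stands in for an MLX-hosted Llama 3 summarization call, and a plain list stands in for the ChromaDB vector store.

```python
# Sketch of tiered conversation memory with recursive compression.
# Assumption: `summarize` would really prompt a local LLM; here it
# just keeps the first sentence of each overflowing turn.

from dataclasses import dataclass, field

def summarize(turns: list[str]) -> str:
    # Placeholder summarizer: first sentence of each turn.
    return " ".join(t.split(".")[0] + "." for t in turns)

@dataclass
class ConversationMemory:
    window: int = 4                       # turns kept verbatim ("RAM")
    short_term: list[str] = field(default_factory=list)
    long_term: list[str] = field(default_factory=list)  # stand-in for a vector store

    def add_turn(self, text: str) -> None:
        self.short_term.append(text)
        if len(self.short_term) > self.window:
            # Compress the oldest turns into a summary and push it
            # to long-term storage, keeping the recent window intact.
            overflow = self.short_term[: -self.window]
            self.long_term.append(summarize(overflow))
            self.short_term = self.short_term[-self.window:]

    def context(self) -> str:
        # Prompt context = compressed history + verbatim recent turns.
        return "\n".join(self.long_term + self.short_term)
```

Because each overflow is re-summarized into long-term storage, the prompt stays bounded while the conversation itself is not.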

Impact

Open Source: Gained community traction for enabling "ChatGPT-like" experience on local silicon without data leaving the device.

AIClient (LLM Wrapper)

A unified Python client for interacting with various LLM providers (OpenAI, Anthropic, Gemini) with a consistent API.

View Code

The Problem

Switching between LLM providers requires rewriting integration code due to differing APIs and response formats.

Architecture & Stack

Built a lightweight, extensible wrapper that normalizes inputs and outputs across major LLM APIs, simplifying model swapping.

Python, OpenAI API, Anthropic API, Gemini API, PyPI

Key Innovations

  • Unified Interface: Drop-in replacement for different provider SDKs.
  • Streamlined Development: Accelerates prototyping by decoupling logic from specific model providers.
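The normalization pattern can be sketched as an adapter interface; the class names here are illustrative and the stub provider stands in for real vendor SDK calls (openai, anthropic, google-genai):

```python
# Sketch of a provider-agnostic LLM client: every adapter returns the
# same ChatResponse shape, so application code never touches vendor SDKs.

from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class ChatResponse:
    text: str
    model: str
    provider: str

class Provider(ABC):
    @abstractmethod
    def chat(self, prompt: str) -> ChatResponse: ...

class EchoProvider(Provider):
    """Stand-in adapter; a real one would wrap a vendor SDK."""
    def __init__(self, name: str, model: str):
        self.name, self.model = name, model

    def chat(self, prompt: str) -> ChatResponse:
        return ChatResponse(text=f"[{self.model}] {prompt}",
                            model=self.model, provider=self.name)

class AIClient:
    """One call surface; swapping providers is a constructor change."""
    def __init__(self, provider: Provider):
        self._provider = provider

    def chat(self, prompt: str) -> ChatResponse:
        return self._provider.chat(prompt)
```

Swapping models then means changing one constructor argument rather than rewriting integration code.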

Impact

Developer Productivity: Reduces boilerplate and vendor lock-in for Python-based GenAI applications.

GenAI Accelerator & Agentic Orchestration

End-to-end automation of the AI lifecycle for enterprise deployment.

The Problem

Enterprises struggle to move from GenAI PoCs to production due to fragmented tooling for data ingestion, training, and deployment.

Architecture & Stack

Architected a centralized accelerator automating data preprocessing, model training, evaluation, and CI/CD integration using containerized environments.

LangChain, Vector DBs, Docker, Kubernetes, Python

Key Innovations

  • Full Automation: Streamlined the path from raw data to deployed model microservices.
  • Showcase Ready: Spearheaded solutions presented at CES 2025, demonstrating domain-specific LLM capabilities.

Impact

Led a 10-member AI team to deliver production-grade applications featuring multimodal systems and RAG pipelines.

Secure Enterprise LLM Gateway

A high-performance proxy for unified auth, rate limiting, and PII redaction across LLM providers.

View Code

The Problem

Directly exposing LLM API keys to frontend apps is a security nightmare. Enterprises need a control layer for cost and compliance.

Architecture & Stack

Built a centralized gateway that handles API key rotation, request logging, and real-time PII stripping before data leaves the trust boundary.

Python, Redis, Presidio (PII), OIDC Auth, Prometheus

Key Innovations

  • Unified Interface: One API surface for OpenAI, Anthropic, and Local models, abstracting provider differences.
  • Cost Control: Granular rate limiting per user/tenant to prevent budget runaways.
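The gateway's control layer reduces to two checks before a prompt is forwarded upstream. In this sketch, Presidio is stood in for by a single email regex and Redis by an in-memory fixed-window counter, so the names and logic are illustrative only.

```python
# Sketch of per-tenant rate limiting + PII redaction at the gateway.
# Assumptions: a real deployment would use Presidio for entity detection
# and Redis INCR/EXPIRE for the shared counter.

import re
import time
from collections import defaultdict

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(prompt: str) -> str:
    # Presidio would detect many entity types; here only emails.
    return EMAIL.sub("<EMAIL>", prompt)

class RateLimiter:
    """Fixed-window limiter keyed by tenant."""
    def __init__(self, limit: int, window_s: float = 60.0):
        self.limit, self.window_s = limit, window_s
        self.counts = defaultdict(lambda: [0, time.monotonic()])

    def allow(self, tenant: str) -> bool:
        count, start = self.counts[tenant]
        now = time.monotonic()
        if now - start >= self.window_s:
            self.counts[tenant] = [1, now]   # new window
            return True
        if count < self.limit:
            self.counts[tenant][0] += 1
            return True
        return False

def gateway(tenant: str, prompt: str, limiter: RateLimiter) -> str:
    if not limiter.allow(tenant):
        raise PermissionError("rate limit exceeded")
    return redact(prompt)  # now safe to forward upstream
```

Because redaction happens before the upstream call, raw PII never reaches the provider, which is the "firewall for prompts" property described above.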

Impact

Security & Compliance: Ensures GDPR/SOC2 compliance for GenAI applications by acting as a "firewall" for prompts.

GCP Agentic Knowledge Pipeline

Production-Grade MLOps service for heterogeneous document retrieval and agentic reasoning.

View Code

The Problem

Building "Chat with PDF" demos is easy; building a scalable, observable pipeline that handles thousands of concurrent documents is hard.

Architecture & Stack

Designed a FastAPI microservice deployed on Google Cloud Run, integrating Gemini Pro with a custom knowledge graph for document understanding.

Google Cloud Platform, Gemini Pro, LangChain, Docker, CI/CD

Key Innovations

  • Full MLOps Lifecycle: Automated deployment pipelines, drift detection, and structured logging suitable for enterprise audits.
  • Heterogeneous Retrieval: Handles PDF, TXT, and MD files with a unified ingestion layer.
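The unified ingestion layer reduces to a loader registry keyed by file extension; this sketch uses hypothetical loader names and stubs the PDF branch, where a real parser (e.g. pypdf) would go.

```python
# Sketch of heterogeneous ingestion: one entry point, per-format loaders,
# one normalized document record out.

from pathlib import Path
from typing import Callable

def load_text(path: Path) -> str:
    return path.read_text(encoding="utf-8")

def load_pdf(path: Path) -> str:
    # Placeholder: a real loader would parse the PDF binary.
    raise NotImplementedError("plug in a PDF parser here")

LOADERS: dict[str, Callable[[Path], str]] = {
    ".txt": load_text,
    ".md": load_text,
    ".pdf": load_pdf,
}

def ingest(path: str) -> dict:
    """Return a normalized record regardless of source format."""
    p = Path(path)
    loader = LOADERS.get(p.suffix.lower())
    if loader is None:
        raise ValueError(f"unsupported format: {p.suffix}")
    return {"source": p.name, "format": p.suffix.lstrip("."), "text": loader(p)}
```

Adding a new format is then a one-line registry entry, which keeps the downstream chunking and embedding stages format-agnostic.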

Impact

Demonstrates Production Readiness: Moving beyond "toy code" to deployable, reliable cloud architecture.

Professional Experience

Principal AI Engineer

Feb 2026 – Present

LEC Robotics (Building Donnie)

Architecting Donnie, an organizational intelligence platform that captures, connects, and reasons across entire organizations. Leading the design of persistent multi-layered memory systems, department-specific AI modules, and agentic reasoning engines that compound organizational knowledge into strategic assets.

Senior AI Engineer

Oct 2025 – Feb 2026

Turing

Worked as a self-employed consultant specializing in AI Agents, Generative AI, and LLM orchestration on Google Cloud Platform (GCP), solving complex problems in AI software development for global clients.

Lead AI/ML Engineer

Jun 2022 – Oct 2025

Capgemini Engineering

Led a 10-member team in architecting GenAI accelerators and Edge AI solutions. Delivered flagship projects for CES 2025. Awarded the 2024 Engineering Excellence Award for pioneering scalable AI innovations.

Software Engineer (Cloud & ML)

Sep 2019 – Jun 2022

ConnectWise

Managed cloud infrastructure for data recovery systems and deployed ML models for anomaly detection and classification tasks.

Senior Development Engineer

Sep 2017 – Aug 2019

Calsoft

Designed data collection engines for converged infrastructure and built Python-based backend integrations for enterprise storage solutions.

Education & Certifications

Education

  • M.Sc. in AI & Machine Learning
    Liverpool John Moores University
  • PG Diploma in ML & AI
    IIIT Bangalore
  • B.E. in Computer Science
    Acropolis Institute

Top Skills

Generative AI, Agentic AI Systems, LLMs & SLMs, RAG Pipelines, Multimodal AI (VLM/VLA), Reinforcement Learning, Computer Vision, GCP & AWS, Kubernetes & Docker, System Architecture, Python & PyTorch, Vector Databases, Fine-tuning (QLoRA), MLOps & CI/CD, TensorFlow & Keras, Scikit-Learn

Engineering Philosophy

Research != Production

A Jupyter notebook is an experiment, not a product. I enforce rigorous testing, versioning, and CI/CD for all ML artifacts.

Boring Technology

I prioritize proven, maintainable stacks (Postgres, Docker, standard PyTorch) over the latest hype, unless the newer tool solves a specific, otherwise-blocking problem.

Fail Fast & Loud

Systems should fail deterministically and loudly. I design monitoring that alerts on data drift and latency spikes before users notice.

Get in Touch

If you are solving hard problems in AI infrastructure, efficient inference, or autonomous systems, let's talk.

Email Me LinkedIn GitHub Download Resume