Role
AI Systems Researcher
Inference · Orchestration · Scientific ML
Company
Coforge
Jun 2024 – Present
I design how AI systems behave under real-world constraints — from decision layers and inference routing to physics-informed models. Not what a model outputs, but how the system decides, executes, and holds under load.
Most AI work focuses on what a model outputs. I focus on how the system behaves — how requests are routed, how decisions are made under constraint, and how the system holds when inputs are noisy, capacity is saturated, or governing assumptions break.
This spans three areas: inference and orchestration systems (scheduling, routing, execution), scientific ML (physics-informed models where governing equations constrain learning), and retrieval pipelines that have to be reliable — not just accurate on benchmarks.
Controla is inference infrastructure that gets smarter the longer it runs. ScholarOS is structured research execution. PHYSCLIP aligns symbolic physics with observed behavior. The PINNs work embeds PDEs into training. These are not tools — they are systems with decision logic.
Inference as a System
Inference is not a function call — it is a workload with scheduling, routing, and resource constraints. I design the layer that decides how requests move through hardware, when to batch, and how to degrade gracefully when capacity is hit.
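The batching half of that decision layer fits in a few lines. Below is a hypothetical micro-batcher, not Controla's actual code; `handle_batch`, the batch size, and the wait budget are all illustrative. It flushes a batch when it fills, or when a small time budget expires, whichever comes first.

```python
import asyncio

async def micro_batcher(queue, handle_batch, max_batch=8, max_wait_ms=5):
    """Drain requests into batches: flush on size or on a small time budget.

    `handle_batch` is a stand-in for the real model-execution step; the
    batching policy, not the execution, is the point of this sketch.
    """
    while True:
        batch = [await queue.get()]               # block for the first request
        loop = asyncio.get_running_loop()
        deadline = loop.time() + max_wait_ms / 1000
        while len(batch) < max_batch:
            timeout = deadline - loop.time()
            if timeout <= 0:
                break                             # time budget spent: flush
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break                             # no more arrivals in budget
        await handle_batch(batch)
```

Tuning `max_wait_ms` is the routing trade-off in miniature: a longer wait builds fuller batches for the hardware, but every request in the batch pays that wait in tail latency.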
Execution Under Constraints
Real systems run under latency budgets, VRAM ceilings, and cost targets. I design around those constraints before they become failures — not after. Observability is part of the system, not bolted on.
Physics-Informed Scientific ML
Governing equations exist for many systems. I embed them directly into the learning objective — PDEs as training constraints, not post-hoc validators. PHYSCLIP and the PINNs work both come from this.
Experience

AI Engineer @ Coforge
Own the inference and concurrency architecture for HSBC voice AI — redesigned from a thread pool to an asyncio event loop, eliminating GIL contention across the full SIP/STT/LLM pipeline · 7× session capacity · $1.3M annualized savings · MTTR 1–2 hr → ~5 min via cross-stack log correlation

Data Scientist @ Gida Technologies
Built RAG systems spanning 163+ languages and recommendation engines with sub-50ms latency

Head of Machine Learning @ IISc
Founded and led the ML team at NMCAD Lab, Aerospace Engineering — eVTOL design optimization using ML/DL under Dr. Dineshkumar Harursampath. Delivered 5 projects in 8 months.

AI Product Developer @ CellStrat
Built production-ready AI products based on OpenAI research for cellstrathub.com — 11k+ global AI developers at time of deployment.

Graphic Designer @ OutLawed
Designed social media content and teaching aids to empower underprivileged students through grassroots teaching programs.
Education

Executive Diploma in ML & AI @ IIIT Bangalore
Generative AI & Agentic AI · MLOps

B.E. Mechanical @ BMS College of Engineering
Tech Stack
Profiled under load. Not just imported.
Production AI Infrastructure
Operated inference under 300ms latency SLOs, 1,600+ concurrent sessions, and cost ceilings where token spend maps directly to monthly burn — $118K → $8K/month.
Distributed Inference Systems
asyncio + uvloop replacing thread-blocked, GIL-contended workers. 20 → 140+ concurrent calls per VM under sustained production load. Latency profiled at the 99th percentile, not the average.
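The shape of that migration, reduced to a stdlib-only sketch (stage names and numbers are illustrative, not the production pipeline): every session becomes a coroutine that yields the event loop at each I/O wait, so a single thread multiplexes hundreds of in-flight calls instead of parking one GIL-contended OS thread per call.

```python
import asyncio

async def handle_call(call_id: int, sem: asyncio.Semaphore) -> str:
    """One voice session: the awaits yield the loop during I/O (STT, LLM,
    TTS round-trips) rather than blocking a worker thread."""
    async with sem:                  # cap concurrent in-flight sessions
        await asyncio.sleep(0.01)    # stands in for awaiting pipeline I/O
        return f"call-{call_id}:done"

async def run_sessions(n: int, limit: int) -> list:
    """Run n sessions on one event loop, at most `limit` in flight at once."""
    sem = asyncio.Semaphore(limit)
    return await asyncio.gather(*(handle_call(i, sem) for i in range(n)))

# results = asyncio.run(run_sessions(140, 50))
```

The semaphore is the graceful-degradation knob: past the limit, new sessions queue instead of oversubscribing the pipeline.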
Retrieval & Indexing Infrastructure
HNSW indexing with dense embeddings, configurable Top-K (3–4), and 1024–1536 token context windows tuned for recall vs. coherence. Reproducible index artifacts for air-gapped operation.
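For illustration, here is the retrieval contract with an exact linear scan standing in for the HNSW index; a real index (e.g. hnswlib or FAISS) returns approximate neighbours far faster but exposes the same interface: query vector in, ranked document ids out. The function names, toy vectors, and `k` default are all assumptions of this sketch.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, index, k=3):
    """Exact Top-K over an {doc_id: vector} mapping. In production the
    linear scan below is replaced by an HNSW graph traversal; only the
    ranking contract stays the same."""
    return heapq.nlargest(k, ((cosine(query, vec), doc_id)
                              for doc_id, vec in index.items()))
```

Keeping Top-K small (3–4) and context windows bounded is the recall-vs-coherence tuning the card above refers to: more retrieved chunks raise recall but dilute the context the generator has to stay coherent over.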
Monitoring, Telemetry & Failure Isolation
250K+ log lines reconstructed in <5s via GCP Logging APIs. MTTR: 1–2 hours → ~5 minutes. Trace correlation built into the stack — not bolted on after the incident.
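The correlation step itself is simple once every service stamps a shared trace id. A toy version follows, with an assumed `ts service trace_id message` line format; the real pipeline reads structured GCP Logging entries, not strings like these.

```python
from collections import defaultdict

def correlate(log_lines):
    """Group raw log lines from different services by a shared trace id,
    then sort each trace by timestamp to reconstruct its timeline.
    The whitespace-delimited line format is illustrative only."""
    traces = defaultdict(list)
    for line in log_lines:
        ts, service, trace_id, msg = line.split(" ", 3)
        traces[trace_id].append((float(ts), service, msg))
    # Sorting by the (timestamp, ...) tuple interleaves services correctly.
    return {tid: sorted(events) for tid, events in traces.items()}
```

The MTTR win comes from doing this at query time across the whole stack: one trace id yields the SIP, STT, and LLM legs of a failing session in order, instead of three separate log searches.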
Graph-Based Retrieval Systems
Weighted directed graph encoding multi-level skill hierarchies as typed edges with dynamic weight updates. Sub-50ms inference on NVIDIA T4 under production concurrency. 30% relevance improvement over flat matching.
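A minimal sketch of that data structure, with hypothetical edge types and a clamped weight update; the production scoring is richer than a single-hop weight sort, but the typed-edge representation is the core idea.

```python
from collections import defaultdict

class SkillGraph:
    """Typed, weighted directed edges over a skill hierarchy.
    Edge types ('subskill', etc.) and the clamp bounds are illustrative."""

    def __init__(self):
        # src -> {(dst, edge_type): weight}
        self.edges = defaultdict(dict)

    def add_edge(self, src, dst, edge_type, weight):
        self.edges[src][(dst, edge_type)] = weight

    def update_weight(self, src, dst, edge_type, delta, lo=0.0, hi=1.0):
        """Dynamic weight update, clamped so feedback cannot blow up a score."""
        key = (dst, edge_type)
        w = self.edges[src].get(key, 0.0) + delta
        self.edges[src][key] = min(hi, max(lo, w))

    def related(self, src, k=3):
        """Rank neighbours by edge weight. Flat matching ignores edge types
        and hierarchy entirely; keeping them is where the relevance gain
        over flat matching comes from."""
        ranked = sorted(self.edges[src].items(), key=lambda kv: -kv[1])
        return [(dst, etype, w) for (dst, etype), w in ranked[:k]]
```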
Scientific ML Systems
Dual-loss PINN framework embedding PDE/ODE constraints directly into the optimization objective. Stable convergence validated across 6 physics benchmarks with limited labeled data — fluid, structural, and thermal domains.
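The dual-loss idea in miniature: the ODE y′ = −y stands in for a real PDE, and a quadratic model stands in for a network, so the sketch runs in plain Python. The data term anchors the single labeled point y(0) = 1; the physics term penalises the ODE residual at collocation points; the two are minimised jointly, which is how governing equations constrain learning when labels are scarce.

```python
def dual_loss(c, xs_data, ys_data, xs_phys):
    """Data loss + physics-residual loss for y' = -y.
    Model: y(x) = c0 + c1*x + c2*x^2 (a stand-in for a network)."""
    y  = lambda x: c[0] + c[1] * x + c[2] * x * x
    dy = lambda x: c[1] + 2 * c[2] * x            # analytic derivative
    data = sum((y(x) - t) ** 2 for x, t in zip(xs_data, ys_data)) / len(xs_data)
    phys = sum((dy(x) + y(x)) ** 2 for x in xs_phys) / len(xs_phys)
    return data + phys                            # equal weighting, for brevity

def fit(steps=2000, lr=0.05, eps=1e-6):
    """Gradient descent with finite-difference gradients (no autograd needed
    at this size). One labeled point; eleven collocation points on [0, 1]."""
    xs_data, ys_data = [0.0], [1.0]
    xs_phys = [i / 10 for i in range(11)]
    c = [0.0, 0.0, 0.0]
    for _ in range(steps):
        base = dual_loss(c, xs_data, ys_data, xs_phys)
        grads = []
        for j in range(3):
            cp = list(c)
            cp[j] += eps
            grads.append((dual_loss(cp, xs_data, ys_data, xs_phys) - base) / eps)
        for j in range(3):
            c[j] -= lr * grads[j]
    return c
```

With only one labeled point, the physics term does almost all the work: the fit recovers y(1) ≈ e⁻¹ because the residual forces the model to behave like a solution of the equation, not just to interpolate data.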
Projects

ashwingupta.dev — Design Handoff to Production (Shipped)
PageIndexOllama — Local-First Fork of PageIndex (Shipped)
Research-It — Fully Local RAG System (Shipped)
Real-Time AI Voice Infrastructure for Banking (Client Delivery)
AI-Powered Azure Infrastructure Documentation Engine (Client Delivery · Coforge)
AI Contract Intelligence System for Airline Agreements (Client Delivery)
Here.app – Multilingual Vehicle Intelligence Platform (Client Delivery · HDFC ERGO)
Laminar · Metamorph · Polymorph — AI Delivery Toolchain (Client Delivery · Gida Technologies)
Graph-Based Skill Recommendation Engine (Client Delivery · Prismforce)
Physics-Informed Neural Networks (PINNs) (Best Outgoing Project 2022–23 🏆 · BMS College of Engineering)

Open to full-time AI/ML engineering roles, research collaborations, and interesting problems at the intersection of LLMs, distributed systems, and scientific ML.
Send a message