Ashwin Gupta

Role
AI Systems Researcher
Inference · Orchestration · Scientific ML

Company
Coforge
Jun 2024 – Present

I design how AI systems behave under real-world constraints — from decision layers and inference routing to physics-informed models. Not what a model outputs, but how the system decides, executes, and holds under load.

Building, not Browsing
Ashwin Gupta
MLOps & GenAI — IIIT Bangalore · PyTorch · LLMs · RAG · GCP
01 — About

Systems that decide. Infrastructure that holds.

Most AI work focuses on what a model outputs. I focus on how the system behaves — how requests are routed, how decisions are made under constraint, and how the system holds when inputs are noisy, capacity is saturated, or governing assumptions break.

This spans three areas: inference and orchestration systems (scheduling, routing, execution), scientific ML (physics-informed models where governing equations constrain learning), and retrieval pipelines that have to be reliable — not just accurate on benchmarks.

Controla is inference infrastructure that gets smarter the longer it runs. ScholarOS is structured research execution. PHYSCLIP aligns symbolic physics with observed behavior. The PINNs work embeds PDEs into training. These are not tools — they are systems with decision logic.

Inference as a System

Inference is not a function call — it is a workload with scheduling, routing, and resource constraints. I design the layer that decides how requests move through hardware, when to batch, and how to degrade gracefully when capacity is hit.
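The batch-or-wait decision can be sketched as a toy asyncio micro-batcher: requests queue up, and a batch fires when it is full or when a latency budget expires. All names here are hypothetical, a sketch of the idea rather than the production layer.

```python
import asyncio

# Toy micro-batcher: collect requests until the batch is full or a latency
# budget expires, then run them as one call. Illustrative only.
class MicroBatcher:
    def __init__(self, batch_fn, max_batch=8, max_wait_ms=20):
        self.batch_fn = batch_fn            # runs a whole batch at once
        self.max_batch = max_batch
        self.max_wait = max_wait_ms / 1000
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, item):
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut                    # resolved when its batch completes

    async def run(self):
        while True:
            # block for the first request, then fill the batch under a deadline
            item, fut = await self.queue.get()
            batch = [(item, fut)]
            deadline = asyncio.get_running_loop().time() + self.max_wait
            while len(batch) < self.max_batch:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            results = await self.batch_fn([i for i, _ in batch])
            for (_, f), r in zip(batch, results):
                f.set_result(r)
```

The trade-off lives in `max_wait_ms`: a larger budget means fuller batches and better throughput, at the cost of added tail latency for the first request in each batch.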

Execution Under Constraints

Real systems run under latency budgets, VRAM ceilings, and cost targets. I design around those constraints before they become failures — not after. Observability is part of the system, not bolted on.

Physics-Informed Scientific ML

Governing equations exist for many systems. I embed them directly into the learning objective — PDEs as training constraints, not post-hoc validators. PHYSCLIP and the PINNs work both come from this.

Experience

Jun 2024 – Present
Coforge

AI Engineer @ Coforge

Owns the inference and concurrency architecture for HSBC voice AI — redesigned it from a thread pool to an asyncio event loop, eliminating GIL contention across the full SIP/STT/LLM pipeline · 7× session capacity · $1.3M annualized savings · MTTR cut from 1–2 hr to ~5 min via cross-stack log correlation

🏆Best Team Award — HSBC Account
🏆Pat on the Back — Think Customer Award (Individual Excellence)
🎓Led Java Spring AI training for 130+ colleagues
62% were Senior Engineers, Tech Leads & Architects
81% voted preferred trainer
Net Promoter Score +50 · 4.4/5 avg satisfaction
Jan 2023 – May 2024
Gida Technologies

Data Scientist @ Gida Technologies

Built RAG systems spanning 163+ languages and sub-50ms recommendation engines

Jan 2022 – Sep 2022
IISc

Head of Machine Learning @ IISc

Founded and led the ML team at NMCAD Lab, Aerospace Engineering — eVTOL design optimization using ML/DL under Dr Dineshkumar Harursampath. Delivered 5 projects in 8 months.

Feb 2021 – Dec 2021
CellStrat

AI Product Developer @ CellStrat

Built production-ready AI products based on OpenAI research for cellstrathub.com — 11k+ global AI developers at time of deployment.

Jan 2020 – Oct 2022
OutLawed

Graphic Designer @ OutLawed

Designed social media content and teaching aids to empower underprivileged students through grassroots teaching programs.

Education

Oct 2025 – Present
IIIT Bangalore

Executive Diploma in ML & AI @ IIIT Bangalore

Generative AI & Agentic AI · MLOps

Aug 2019 – May 2023
BMS College of Engineering

B.E. Mechanical @ BMS College of Engineering

🏆Best Outgoing Project — Mechanical Engineering '23
📄Published @ NCISCT 2022
02 — Core Capabilities

What runs in production.

Profiled under load. Not just imported.

Tech Stack

PyTorch · LangChain · Hugging Face · Ollama · FastAPI · GCP · Azure · Docker · Kubernetes · W&B · MLflow · FAISS · Redis · SQL · Terraform · Python

Production AI Infrastructure

01

Operated inference under 300ms latency SLOs, 1,600+ concurrent sessions, and cost ceilings where token spend maps directly to monthly burn — $118K → $8K/month.

latency budgets · SLA design · cost modeling · service observability

Distributed Inference Systems

02

asyncio + uvloop replacing thread-blocked, GIL-contended workers. 20 → 140+ concurrent calls per VM under sustained production load. Latency profiled at the 99th percentile, not the average.

async runtimes · batching · capacity planning · tail-latency control
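The coroutine-per-session model behind these numbers can be illustrated with a minimal sketch: each session is a coroutine awaiting its pipeline stages, a semaphore caps in-flight sessions, and latency is reported at the tail rather than the mean. Stage and function names are illustrative, not the production code.

```python
import asyncio
import time

# Toy coroutine-per-session model: a semaphore bounds in-flight sessions
# instead of dedicating a thread to each one. Names are illustrative.
async def handle_session(session_id, limiter, latencies):
    async with limiter:
        start = time.perf_counter()
        await asyncio.sleep(0.01)   # stand-in for awaiting STT / LLM stages
        latencies.append(time.perf_counter() - start)
        return session_id

async def run_sessions(n_sessions=100, max_inflight=20):
    limiter = asyncio.Semaphore(max_inflight)
    latencies = []
    results = await asyncio.gather(
        *(handle_session(i, limiter, latencies) for i in range(n_sessions))
    )
    # report the tail, not the average
    p99 = sorted(latencies)[int(0.99 * len(latencies)) - 1]
    return results, p99
```

Because each session spends most of its life awaiting I/O, one event loop multiplexes far more sessions than a thread pool of the same footprint, and the semaphore is the explicit knob for graceful degradation at capacity.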

Retrieval & Indexing Infrastructure

03

HNSW indexing with dense embeddings, configurable Top-K (3–4), and 1024–1536 token context windows tuned for recall vs. coherence. Reproducible index artifacts for air-gapped operation.

FAISS/HNSW · chunking strategies · embedding maintenance · retrieval grounding
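The retrieval contract can be shown with a toy dense-retrieval sketch: overlap chunking, a deterministic stand-in embedding, and exact top-k search. The real index is approximate HNSW via FAISS, but the interface is the same: query vector in, k nearest chunk ids out. Everything here is illustrative.

```python
import hashlib
import numpy as np

def chunk(text, size=40, overlap=10):
    # overlapping windows so boundary content appears in two chunks
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text, dim=64):
    # deterministic stand-in for a dense embedding model
    seed = int.from_bytes(hashlib.sha1(text.encode()).digest()[:4], "big")
    vec = np.random.default_rng(seed).standard_normal(dim)
    return vec / np.linalg.norm(vec)

def top_k(query_vec, chunk_vecs, k=3):
    # exact cosine search over unit vectors; a production index
    # swaps this scan for approximate HNSW with the same contract
    sims = chunk_vecs @ query_vec
    return np.argsort(-sims)[:k]
```

The overlap parameter is the recall-vs-coherence knob mentioned above: more overlap means a fact split across a boundary still lands whole in some chunk, at the cost of a larger index.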

Monitoring, Telemetry & Failure Isolation

04

250K+ log lines reconstructed in <5s via GCP Logging APIs. MTTR: 1–2 hours → ~5 minutes. Trace correlation built into the stack — not bolted on after the incident.

SLA monitoring · log correlation · telemetry pipelines · failure domain isolation

Graph-Based Retrieval Systems

05

Weighted directed graph encoding multi-level skill hierarchies as typed edges with dynamic weight updates. Sub-50ms inference on NVIDIA T4 under production concurrency. 30% relevance improvement over flat matching.

graph traversal · weighted scoring · sub-50ms inference loops
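A minimal version of the idea, typed weighted edges with scores decaying along traversal paths, might look like the sketch below; skill names, weights, and the scoring rule are made up for illustration.

```python
from collections import defaultdict

# Toy weighted skill graph: typed edges with weights, and a depth-bounded
# traversal that scores related skills by the product of edge weights
# along the path. Illustrative only.
class SkillGraph:
    def __init__(self):
        self.edges = defaultdict(list)  # node -> [(neighbor, edge_type, weight)]

    def add_edge(self, src, dst, edge_type, weight):
        # dynamic insertion: new nodes and edges join without retraining
        self.edges[src].append((dst, edge_type, weight))

    def recommend(self, skill, max_depth=2):
        scores = {}
        frontier = [(skill, 1.0, 0)]
        while frontier:
            node, score, depth = frontier.pop()
            if depth == max_depth:
                continue
            for nbr, _etype, w in self.edges[node]:
                s = score * w               # decay along the path
                if s > scores.get(nbr, 0.0):
                    scores[nbr] = s
                    frontier.append((nbr, s, depth + 1))
        scores.pop(skill, None)             # don't recommend the query itself
        return sorted(scores.items(), key=lambda kv: -kv[1])
```

Because relevance comes from structure rather than a trained model, taxonomy changes are edge insertions, not batch retraining, which is what keeps the inference loop inside a tight latency budget.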

Scientific ML Systems

06

Dual-loss PINN framework embedding PDE/ODE constraints directly into the optimization objective. Stable convergence validated across 6 physics benchmarks with limited labeled data — fluid, structural, and thermal domains.

physics-informed models · physics-constrained training · regime classification
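The dual-objective idea can be reduced to a linear toy: fit u' + u = 0 with u(0) = 1 (solution e^-x) by solving a single least-squares problem whose rows are the physics residual at collocation points plus a weighted data row. The real framework uses neural networks and autograd; this numpy stand-in only shows how data fit and physical law enter one objective.

```python
import numpy as np

# Toy physics-informed fit: approximate the ODE u' + u = 0, u(0) = 1
# with polynomial coefficients c_k. The physics term (ODE residual at
# collocation points) and the data term (the boundary condition) enter
# one least-squares objective, mirroring the dual-loss structure.
def pinn_poly_fit(degree=6, n_colloc=50):
    x = np.linspace(0.0, 2.0, n_colloc)
    k = np.arange(degree + 1)
    # residual u'(x) + u(x), linear in c: rows = collocation points
    phys = k * x[:, None] ** np.clip(k - 1, 0, None) + x[:, None] ** k
    data = np.zeros((1, degree + 1))
    data[0, 0] = 1.0                        # u(0) = c_0
    A = np.vstack([phys, 10.0 * data])      # weight balances the two terms
    b = np.concatenate([np.zeros(n_colloc), [10.0]])
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef, x
```

The weight on the data row plays the same role as the loss-balancing coefficient in a dual-loss PINN: too small and the fit ignores the data, too large and the physics residual is sacrificed.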
03 — Research & Systems Thinking

Observe. Abstract. Construct.

04 — Projects

Delivered, Scaled.


ashwingupta.dev — Design Handoff to Production

Shipped
Personal
Problem: The original portfolio claimed performance engineering while shipping 400 animated DOM nodes and a 2 MB hero — self-defeating on load.
System: Rebuilt as a three-layer spatial interface — environment shell, Canvas particle field, hologram surface — collapsing visual effects into one system.
Design: Offscreen pre-rendering, visibility-gated RAF, lazy loading, and asset compression cut work at the source, making optimization structural, not cosmetic.
01

PageIndexOllama — Local-First Fork of PageIndex

Shipped
Open Source
Problem: Tree-RAG was hardwired to one provider contract — completion differences silently corrupted recursive traversal, and failures surfaced only after collapse.
System: Added a provider-routing layer with finish-reason normalization, so traversal depends on stable internal contracts rather than whichever runtime answered.
Design: Prompt externalization, bounded concurrency, and hierarchical fallback keep long-document runs stable on local models with uneven outputs and limited memory.
02

Research-It — Fully Local RAG System

Shipped
Open Source
Problem: Academic RAG assumed cloud inference by default — air-gapped institutions and low-VRAM machines had no private path from ingestion to QA.
System: Built a local-first retrieval stack: LEANN/HNSW indexes, dense embeddings, Ollama inference, and normalized ingestion across PDFs, HTML, and paper folders.
Design: Chunk overlap, tuned Top-K and context windows, plus PyMuPDF and BeautifulSoup cleanup fix retrieval quality before errors reach query time.
03

Real-Time AI Voice Infrastructure for Banking

Client Delivery
HSBC
Problem: Voice infrastructure was thread-bound at 20 calls per VM, documentation took 10–15 minutes, and incident recovery demanded 1–2 hours.
System: Redesigned around asyncio + uvloop — each SIP session became a coroutine across SBC, STT, and LLM stages, removing thread contention.
Design: Built cross-stack log correlation, SIPp load testing, and secure media transport — capacity, observability, and cost treated as one system.
04

AI-Powered Azure Infrastructure Documentation Engine

Client Delivery
Coforge
Problem: Azure documentation relied on manual exports and hand-drawn diagrams — every project took 2–3 days and drifted from live state.
System: Built a live-state extraction pipeline — subscription scan, topology mapping, and security analysis generate documents from current resource evidence.
Design: Few-shot prompting grounds generation in extracted inventory; guardrails reject any component without a matching live resource in the estate.
05
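The guardrail reduces to a membership check against the live inventory. A toy sketch, with made-up field names rather than the real schema:

```python
# Toy guardrail: a generated documentation component is accepted only if
# it references a resource present in the live inventory. The "id" and
# "resource_id" field names are illustrative.
def validate_components(generated, live_inventory):
    live_ids = {res["id"] for res in live_inventory}
    accepted = [c for c in generated if c["resource_id"] in live_ids]
    rejected = [c for c in generated if c["resource_id"] not in live_ids]
    return accepted, rejected
```

Rejected components are dropped before the document is assembled, so hallucinated resources cannot drift back into the output.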

AI Contract Intelligence System for Airline Agreements

Client Delivery
Amex GBT
Problem: Airline agreements mixed scan-quality and readable PDF tables, and carrier template drift made manual review the only reliable extraction path.
System: Camelot + Ghostscript extracted tables from both formats; GPT-4o one-shot normalization mapped varied carrier layouts into one contract view.
Design: The pipeline preserves context without per-carrier tuning — low-quality scans, nested tables, and layout drift are handled inside one extraction flow.
06

Here.app — Multilingual Vehicle Intelligence Platform

Client Delivery
HDFC ERGO
Problem: Vehicle assistants answered specification queries inconsistently across languages — the same request could contradict itself, making manual escalation the safe fallback.
System: Built a RAG system over a structured vehicle database with image-linked attributes — every answer grounded in one canonical source.
Design: QA-gated retrieval validates lookup quality before generation, while 163-language delivery stays anchored to one data model instead of post-hoc translation.
07

Laminar · Metamorph · Polymorph — AI Delivery Toolchain

Client Delivery
Gida Technologies
Problem: Content generation, chatbot delivery, and API conversion lived in separate tools with manual handoffs — output drifted across every project.
System: Designed a three-part AI toolchain: Laminar for multilingual content, Metamorph for no-code chatbots, and Polymorph for API conversion scaffolds.
Design: Each tool ships standardized deployable artifacts — brand-consistent visuals, multilingual content at scale, and cURL-derived code across 20+ languages.
08

Graph-Based Skill Recommendation Engine

Client Delivery
Prismforce
Problem: Skill recommendations ignored hierarchical relationships, taxonomy changes forced full batch retraining, and live inference missed the sub-50ms target.
System: Built a weighted directed graph over multi-level skill hierarchies with typed edges and lightweight scoring — structure, not retraining, drives relevance.
Design: Dynamic node insertion and deterministic traversal keep the graph current; latency was profiled at the 99th percentile under production load.
09

Physics-Informed Neural Networks (PINNs)

Best Outgoing Project · 2022–23 🏆
BMS College of Engineering
Problem: Purely data-driven simulation needed large labeled datasets and produced physically invalid solutions when sparse data let models ignore governing laws.
System: Developed a dual-loss PINN framework that embeds PDE/ODE constraints directly into optimization — data fit and physical law are solved together.
Design: Validated across six benchmarks spanning fluid, structural, and thermal domains, including Burgers' equation plus Neumann and Dirichlet variants.
10
05 — Contact

Let's build something that matters.

Open to full-time AI/ML engineering roles, research collaborations, and interesting problems at the intersection of LLMs, distributed systems, and scientific ML.

Send a message

© 2026 Ashwin Gupta — Personal Portfolio · AI Engineer — Bangalore, India