Ashwin Gupta

Role
AI Systems Researcher
Inference · Orchestration · Scientific ML

Company
Coforge
Jun 2024 – Present

I design how AI systems behave under real-world constraints — from decision layers and inference routing to physics-informed models. Not what a model outputs, but how the system decides, executes, and holds under load.

Building, not Browsing
Ashwin Gupta
MLOps & GenAI — IIIT Bangalore · PyTorch · LLMs · RAG · GCP
01 — About

Systems that decide. Infrastructure that holds.

Most AI work focuses on what a model outputs. I focus on how the system behaves — how requests are routed, how decisions are made under constraint, and how the system holds when inputs are noisy, capacity is saturated, or governing assumptions break.

This spans three areas: inference and orchestration systems (scheduling, routing, execution), scientific ML (physics-informed models where governing equations constrain learning), and retrieval pipelines that have to be reliable — not just accurate on benchmarks.

Controla is inference infrastructure that gets smarter the longer it runs. ScholarOS is structured research execution. PHYSCLIP aligns symbolic physics with observed behavior. The PINNs work embeds PDEs into training. These are not tools — they are systems with decision logic.

Inference as a System

Inference is not a function call — it is a workload with scheduling, routing, and resource constraints. I design the layer that decides how requests move through hardware, when to batch, and how to degrade gracefully when capacity is hit.
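The batch-or-wait decision can be sketched as a toy asyncio micro-batcher: requests queue up, and a batch fires when it is full or when a latency budget expires. All names here are hypothetical, a sketch of the idea rather than the production layer.

```python
import asyncio

# Toy micro-batcher: collect requests until the batch is full or a latency
# budget expires, then run them as one call. Illustrative only.
class MicroBatcher:
    def __init__(self, batch_fn, max_batch=8, max_wait_ms=20):
        self.batch_fn = batch_fn            # runs a whole batch at once
        self.max_batch = max_batch
        self.max_wait = max_wait_ms / 1000
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, item):
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut                    # resolved when its batch completes

    async def run(self):
        while True:
            # block for the first request, then fill the batch under a deadline
            item, fut = await self.queue.get()
            batch = [(item, fut)]
            deadline = asyncio.get_running_loop().time() + self.max_wait
            while len(batch) < self.max_batch:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            results = await self.batch_fn([i for i, _ in batch])
            for (_, f), r in zip(batch, results):
                f.set_result(r)
```

The trade-off lives in `max_wait_ms`: a larger budget means fuller batches and better throughput, at the cost of added tail latency for the first request in each batch.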

Execution Under Constraints

Real systems run under latency budgets, VRAM ceilings, and cost targets. I design around those constraints before they become failures — not after. Observability is part of the system, not bolted on.

Physics-Informed Scientific ML

Governing equations exist for many systems. I embed them directly into the learning objective — PDEs as training constraints, not post-hoc validators. PHYSCLIP and the PINNs work both come from this.

Experience

Jun 2024 – Present
Coforge

AI Engineer @ Coforge

Owns the inference and concurrency architecture for HSBC voice AI — redesigned it from a thread pool to an asyncio event loop, eliminating GIL contention across the full SIP/STT/LLM pipeline · 7× session capacity · $1.3M annualized savings · MTTR cut from 1–2 hr to ~5 min via cross-stack log correlation

🏆Best Team Award — HSBC Account
🏆Pat on the Back — Think Customer Award (Individual Excellence)
🎓Led Java Spring AI training for 130+ colleagues
62% were Senior Engineers, Tech Leads & Architects
81% voted preferred trainer
Net Promoter Score +50 · 4.4/5 avg satisfaction
Jan 2023 – May 2024
Gida Technologies

Data Scientist @ Gida Technologies

Built RAG systems spanning 163+ languages and sub-50ms recommendation engines

Jan 2022 – Sep 2022
IISc

Head of Machine Learning @ IISc

Founded and led the ML team at NMCAD Lab, Aerospace Engineering — eVTOL design optimization using ML/DL under Dr Dineshkumar Harursampath. Delivered 5 projects in 8 months.

Feb 2021 – Dec 2021
CellStrat

AI Product Developer @ CellStrat

Built production-ready AI products based on OpenAI research for cellstrathub.com — 11k+ global AI developers at time of deployment.

Jan 2020 – Oct 2022
OutLawed

Graphic Designer @ OutLawed

Designed social media content and teaching aids to empower underprivileged students through grassroots teaching programs.

Education

Oct 2025 – Present
IIIT Bangalore

Executive Diploma in ML & AI @ IIIT Bangalore

Generative AI & Agentic AI · MLOps

Aug 2019 – May 2023
BMS College of Engineering

B.E. Mechanical @ BMS College of Engineering

🏆Best Outgoing Project — Mechanical Engineering '23
📄Published @ NCISCT 2022
02 — Core Capabilities

What runs in production.

Profiled under load. Not just imported.

Tech Stack

PyTorch · LangChain · Hugging Face · Ollama · FastAPI · GCP · Azure · Docker · Kubernetes · W&B · MLflow · FAISS · Redis · SQL · Terraform · Python

Production AI Infrastructure

01

Operated inference under 300ms latency SLOs, 1,600+ concurrent sessions, and cost ceilings where token spend maps directly to monthly burn — $118K → $8K/month.

latency budgets · SLA design · cost modeling · service observability

Distributed Inference Systems

02

asyncio + uvloop replacing thread-blocked, GIL-contended workers. 20 → 140+ concurrent calls per VM under sustained production load. Latency profiled at the 99th percentile, not the average.

async runtimes · batching · capacity planning · tail-latency control
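The coroutine-per-session model behind these numbers can be illustrated with a minimal sketch: each session is a coroutine awaiting its pipeline stages, a semaphore caps in-flight sessions, and latency is reported at the tail rather than the mean. Stage and function names are illustrative, not the production code.

```python
import asyncio
import time

# Toy coroutine-per-session model: a semaphore bounds in-flight sessions
# instead of dedicating a thread to each one. Names are illustrative.
async def handle_session(session_id, limiter, latencies):
    async with limiter:
        start = time.perf_counter()
        await asyncio.sleep(0.01)   # stand-in for awaiting STT / LLM stages
        latencies.append(time.perf_counter() - start)
        return session_id

async def run_sessions(n_sessions=100, max_inflight=20):
    limiter = asyncio.Semaphore(max_inflight)
    latencies = []
    results = await asyncio.gather(
        *(handle_session(i, limiter, latencies) for i in range(n_sessions))
    )
    # report the tail, not the average
    p99 = sorted(latencies)[int(0.99 * len(latencies)) - 1]
    return results, p99
```

Because each session spends most of its life awaiting I/O, one event loop multiplexes far more sessions than a thread pool of the same footprint, and the semaphore is the explicit knob for graceful degradation at capacity.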

Retrieval & Indexing Infrastructure

03

HNSW indexing with dense embeddings, configurable Top-K (3–4), and 1024–1536 token context windows tuned for recall vs. coherence. Reproducible index artifacts for air-gapped operation.

FAISS/HNSW · chunking strategies · embedding maintenance · retrieval grounding
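The retrieval contract can be shown with a toy dense-retrieval sketch: overlap chunking, a deterministic stand-in embedding, and exact top-k search. The real index is approximate HNSW via FAISS, but the interface is the same: query vector in, k nearest chunk ids out. Everything here is illustrative.

```python
import hashlib
import numpy as np

def chunk(text, size=40, overlap=10):
    # overlapping windows so boundary content appears in two chunks
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text, dim=64):
    # deterministic stand-in for a dense embedding model
    seed = int.from_bytes(hashlib.sha1(text.encode()).digest()[:4], "big")
    vec = np.random.default_rng(seed).standard_normal(dim)
    return vec / np.linalg.norm(vec)

def top_k(query_vec, chunk_vecs, k=3):
    # exact cosine search over unit vectors; a production index
    # swaps this scan for approximate HNSW with the same contract
    sims = chunk_vecs @ query_vec
    return np.argsort(-sims)[:k]
```

The overlap parameter is the recall-vs-coherence knob mentioned above: more overlap means a fact split across a boundary still lands whole in some chunk, at the cost of a larger index.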

Monitoring, Telemetry & Failure Isolation

04

250K+ log lines reconstructed in <5s via GCP Logging APIs. MTTR: 1–2 hours → ~5 minutes. Trace correlation built into the stack — not bolted on after the incident.

SLA monitoring · log correlation · telemetry pipelines · failure domain isolation

Graph-Based Retrieval Systems

05

Weighted directed graph encoding multi-level skill hierarchies as typed edges with dynamic weight updates. Sub-50ms inference on NVIDIA T4 under production concurrency. 30% relevance improvement over flat matching.

graph traversal · weighted scoring · sub-50ms inference loops
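A minimal version of the idea, typed weighted edges with scores decaying along traversal paths, might look like the sketch below; skill names, weights, and the scoring rule are made up for illustration.

```python
from collections import defaultdict

# Toy weighted skill graph: typed edges with weights, and a depth-bounded
# traversal that scores related skills by the product of edge weights
# along the path. Illustrative only.
class SkillGraph:
    def __init__(self):
        self.edges = defaultdict(list)  # node -> [(neighbor, edge_type, weight)]

    def add_edge(self, src, dst, edge_type, weight):
        # dynamic insertion: new nodes and edges join without retraining
        self.edges[src].append((dst, edge_type, weight))

    def recommend(self, skill, max_depth=2):
        scores = {}
        frontier = [(skill, 1.0, 0)]
        while frontier:
            node, score, depth = frontier.pop()
            if depth == max_depth:
                continue
            for nbr, _etype, w in self.edges[node]:
                s = score * w               # decay along the path
                if s > scores.get(nbr, 0.0):
                    scores[nbr] = s
                    frontier.append((nbr, s, depth + 1))
        scores.pop(skill, None)             # don't recommend the query itself
        return sorted(scores.items(), key=lambda kv: -kv[1])
```

Because relevance comes from structure rather than a trained model, taxonomy changes are edge insertions, not batch retraining, which is what keeps the inference loop inside a tight latency budget.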

Scientific ML Systems

06

Dual-loss PINN framework embedding PDE/ODE constraints directly into the optimization objective. Stable convergence validated across 6 physics benchmarks with limited labeled data — fluid, structural, and thermal domains.

physics-informed models · physics-constrained training · regime classification
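The dual-objective idea can be reduced to a linear toy: fit u' + u = 0 with u(0) = 1 (solution e^-x) by solving a single least-squares problem whose rows are the physics residual at collocation points plus a weighted data row. The real framework uses neural networks and autograd; this numpy stand-in only shows how data fit and physical law enter one objective.

```python
import numpy as np

# Toy physics-informed fit: approximate the ODE u' + u = 0, u(0) = 1
# with polynomial coefficients c_k. The physics term (ODE residual at
# collocation points) and the data term (the boundary condition) enter
# one least-squares objective, mirroring the dual-loss structure.
def pinn_poly_fit(degree=6, n_colloc=50):
    x = np.linspace(0.0, 2.0, n_colloc)
    k = np.arange(degree + 1)
    # residual u'(x) + u(x), linear in c: rows = collocation points
    phys = k * x[:, None] ** np.clip(k - 1, 0, None) + x[:, None] ** k
    data = np.zeros((1, degree + 1))
    data[0, 0] = 1.0                        # u(0) = c_0
    A = np.vstack([phys, 10.0 * data])      # weight balances the two terms
    b = np.concatenate([np.zeros(n_colloc), [10.0]])
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef, x
```

The weight on the data row plays the same role as the loss-balancing coefficient in a dual-loss PINN: too small and the fit ignores the data, too large and the physics residual is sacrificed.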
03 — Research & Systems Thinking

Observe. Abstract. Construct.

04 — Projects

Delivered, Scaled.


ashwingupta.dev — Design Handoff to Production

Shipped
Personal
Problem: The original portfolio claimed performance engineering while shipping 400 animated DOM nodes and a 2 MB hero — self-defeating on load.
System: Rebuilt as a three-layer spatial interface — environment shell, Canvas particle field, hologram surface — collapsing visual effects into one system.
Design: Offscreen pre-rendering, visibility-gated RAF, lazy loading, and asset compression cut work at the source, making optimization structural, not cosmetic.
01

PageIndexOllama — Local-First Fork of PageIndex

Shipped
Open Source
Problem: Tree-RAG was hardwired to one provider contract — completion differences silently corrupted recursive traversal, and failures surfaced only after collapse.
System: Added a provider-routing layer with finish-reason normalization, so traversal depends on stable internal contracts rather than whichever runtime answered.
Design: Prompt externalization, bounded concurrency, and hierarchical fallback keep long-document runs stable on local models with uneven outputs and limited memory.
02

Research-It — Fully Local RAG System

Shipped
Open Source
Problem: Academic RAG assumed cloud inference by default — air-gapped institutions and low-VRAM machines had no private path from ingestion to QA.
System: Built a local-first retrieval stack: LEANN/HNSW indexes, dense embeddings, Ollama inference, and normalized ingestion across PDFs, HTML, and paper folders.
Design: Chunk overlap, tuned Top-K and context windows, plus PyMuPDF and BeautifulSoup cleanup fix retrieval quality before errors reach query time.
03

Real-Time AI Voice Infrastructure for Banking

Client Delivery
HSBC
Problem: Voice infrastructure was thread-bound at 20 calls per VM, documentation took 10–15 minutes, and incident recovery demanded 1–2 hours.
System: Redesigned around asyncio + uvloop — each SIP session became a coroutine across SBC, STT, and LLM stages, removing thread contention.
Design: Built cross-stack log correlation, SIPp load testing, and secure media transport — capacity, observability, and cost treated as one system.
04

AI-Powered Azure Infrastructure Documentation Engine

Client Delivery
Coforge
Problem: Azure documentation relied on manual exports and hand-drawn diagrams — every project took 2–3 days and drifted from live state.
System: Built a live-state extraction pipeline — subscription scan, topology mapping, and security analysis generate documents from current resource evidence.
Design: Few-shot prompting grounds generation in extracted inventory; guardrails reject any component without a matching live resource in the estate.
05
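The guardrail reduces to a membership check against the live inventory. A toy sketch, with made-up field names rather than the real schema:

```python
# Toy guardrail: a generated documentation component is accepted only if
# it references a resource present in the live inventory. The "id" and
# "resource_id" field names are illustrative.
def validate_components(generated, live_inventory):
    live_ids = {res["id"] for res in live_inventory}
    accepted = [c for c in generated if c["resource_id"] in live_ids]
    rejected = [c for c in generated if c["resource_id"] not in live_ids]
    return accepted, rejected
```

Rejected components are dropped before the document is assembled, so hallucinated resources cannot drift back into the output.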

AI Contract Intelligence System for Airline Agreements

Client Delivery
Amex GBT
Problem: Airline agreements mixed scan-quality and readable PDF tables, and carrier template drift made manual review the only reliable extraction path.
System: Camelot + Ghostscript extracted tables from both formats; GPT-4o one-shot normalization mapped varied carrier layouts into one contract view.
Design: The pipeline preserves context without per-carrier tuning — low-quality scans, nested tables, and layout drift are handled inside one extraction flow.
06

Here.app — Multilingual Vehicle Intelligence Platform

Client Delivery
HDFC ERGO
Problem: Vehicle assistants answered specification queries inconsistently across languages — the same request could contradict itself, making manual escalation the safe fallback.
System: Built a RAG system over a structured vehicle database with image-linked attributes — every answer grounded in one canonical source.
Design: QA-gated retrieval validates lookup quality before generation, while 163-language delivery stays anchored to one data model instead of post-hoc translation.
07

Laminar · Metamorph · Polymorph — AI Delivery Toolchain

Client Delivery
Gida Technologies
Problem: Content generation, chatbot delivery, and API conversion lived in separate tools with manual handoffs — output drifted across every project.
System: Designed a three-part AI toolchain: Laminar for multilingual content, Metamorph for no-code chatbots, and Polymorph for API conversion scaffolds.
Design: Each tool ships standardized deployable artifacts — brand-consistent visuals, multilingual content at scale, and cURL-derived code across 20+ languages.
08

Graph-Based Skill Recommendation Engine

Client Delivery
Prismforce
Problem: Skill recommendations ignored hierarchical relationships, taxonomy changes forced full batch retraining, and live inference missed the sub-50ms target.
System: Built a weighted directed graph over multi-level skill hierarchies with typed edges and lightweight scoring — structure, not retraining, drives relevance.
Design: Dynamic node insertion and deterministic traversal keep the graph current; latency was profiled at the 99th percentile under production load.
09

Physics-Informed Neural Networks (PINNs)

Best Outgoing Project · 2022–23 🏆
BMS College of Engineering
Problem: Purely data-driven simulation needed large labeled datasets and produced physically invalid solutions when sparse data let models ignore governing laws.
System: Developed a dual-loss PINN framework that embeds PDE/ODE constraints directly into optimization — data fit and physical law are solved together.
Design: Validated across six benchmarks spanning fluid, structural, and thermal domains, including Burgers' equation plus Neumann and Dirichlet variants.
10
05 — Contact

Let's build something that matters.

Open to full-time AI/ML engineering roles, research collaborations, and interesting problems at the intersection of LLMs, distributed systems, and scientific ML.

Send a message

© 2026 Ashwin Gupta — Personal Portfolio · AI Engineer — Bangalore, India