Sai Mitheran Jagadesh Kumar

I am currently part of the Machine Learning team at Latent AI, where I work on accelerating AI on the edge. I recently graduated from Carnegie Mellon University, where I specialized in AI/ML systems within Electrical and Computer Engineering. I graduated as a Gold Medalist from the National Institute of Technology, Tiruchirappalli in Electronics and Communication Engineering in 2022. I was also affiliated with the Max Planck Institute of Informatics, Saarbrücken, funded by the DAAD-WISE Scholarship.


I'm a recipient of the Indian Academy of Sciences Research Fellowship, the prestigious Dr. A.L. Abdussattar Memorial Award, the Sri Janardhana Iyengar Memorial Award, and the Graphics Replicability Stamp Award. Outside of research, I'm open to anything related to Sustainability and Mental Health. Scroll down to learn more!

Linktree  /  Email  /  LinkedIn  /  GitHub

profile photo

News
  • 01/24: Joining Latent AI as an AI Application Engineer (L8)!
  • 12/23: Graduated from CMU with a Master's degree in ECE and a 4.0 GPA!
  • 09/23: Joining AirLab to explore large-scale scene understanding!
  • 08/23: Teaching (assistant) 18-290 (Signals and Systems) once more at CMU!
  • 07/23: Serving as a Reviewer for the IEEE Transactions on Neural Networks and Learning Systems!
  • 06/23: Transforming Edge AI at Latent AI as an MLE intern for the summer!
  • 05/23: Awarded the Sri Janardhana Iyengar Memorial Award at NIT Trichy for the best academic performance in 2022!
  • 02/23: Paper accepted at ICRA 2023!
  • 01/23: Teaching (assistant) 18-290, Signals and Systems at CMU!
  • 10/22: Paper accepted at IEEE Robotics and Automation Letters (RA-L)!
  • 09/22: Started a Research Assistantship at CyLab, CMU in the Biometrics team!
  • 08/22: Teaching (assistant) 18-794, Pattern Recognition Theory at CMU!
  • 07/22: Paper accepted at Optik!
  • 05/22: Paper accepted at ICML 2022!
  • 03/22: Accepted to Carnegie Mellon University as a full-time grad student!
  • 01/22: Paper accepted at ICRA 2022!
  • 01/22: Selected for Research Week with Google Research, 2022!
  • 12/21: Check out my Linktree!
  • 12/21: Paper accepted at AAAI 2022!
  • 11/21: Received an honorary mention and award for the AI For Good Challenge!

Research and Experience

During my time at Carnegie Mellon University, I worked at the AirLab (Robotics Institute) on large-scale real-world scene understanding for urban robots. My research primarily explores the intersection of computer vision and efficient deep learning. I collaborated with researchers at the Medical Mechatronics Lab, National University of Singapore, as a Research Assistant on Graph-based Deep Reasoning and Surgical Scene Understanding.

As a Deep Learning Engineer at AIMonk Labs Pvt. Ltd., I worked on a team building Neuralmarker, a product that brings computer vision to businesses. On a recommendation from Foxconn Country Head Josh Foulger, I also worked with their Intelligent Systems Team on prototyping.

Previously, I was affiliated with the Advanced Geometric Computing Lab and the Shakti Group at the Indian Institute of Technology, Madras, supervised by Dr. M. Ramanathan and Dr. V. Kamakoti. I also worked on Deep Generative Modelling of Real-time Wireless Communication Channels using UAVs with Dr. E.S. Gopi at NIT Trichy, and Dr. Nalin Jayakody at the Tomsk Infocomm Lab.

Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data
Cherie Ho*, Jiaye Zou*, Omar Alama*, Sai Mitheran Jagadesh Kumar, Benjamin Chiang, Taneesh Gupta, Chen Wang, Nikhil Keetha, Katia Sycara, Sebastian Scherer
Under Review
Webpage / arXiv

Top-down Bird's Eye View (BEV) maps are essential for ground robot navigation, yet current methods for predicting BEV maps from First-Person View (FPV) images lack scalability. We introduce Map It Anywhere (MIA), a data engine leveraging Mapillary and OpenStreetMap to curate a diverse dataset of 1.2 million FPV-BEV pairs. Our model trained on MIA's dataset significantly outperforms existing methods, demonstrating the effectiveness of large-scale public maps for enhancing BEV map prediction and autonomous navigation.

Compressing Vision Transformers for Low-Resource Visual Learning
Eric Youn, Sai Mitheran, Sanjana Prabhu, Siyuan Chen
arXiv / Paper

Our work introduces a framework for compressing Vision Transformer models for efficient segmentation, with a focus on enabling deployment on resource-constrained devices like the NVIDIA Jetson Nano (4GB). Our approach combines structured pruning, distillation from a stronger teacher, and quantization strategies to significantly reduce memory usage and inference latency while maintaining high segmentation accuracy and mean IoU. This allows for the rapid deployment of Vision Transformers on the edge.
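
As a rough illustration of the kind of compression involved (a sketch under assumptions, not the paper's pipeline), post-training dynamic quantization in PyTorch shrinks the linear layers that dominate a ViT's footprint; the torchvision classifier below is only a stand-in for the segmentation backbone.

# Minimal sketch: int8 dynamic quantization of a ViT's linear layers.
# The torchvision classifier is a stand-in; the paper targets segmentation.
import torch
from torchvision.models import vit_b_16

model = vit_b_16(weights=None).eval()

# Replace nn.Linear modules with int8 dynamic-quantized equivalents;
# activations remain in floating point.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = quantized(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 1000])

Structured pruning and distillation from a stronger teacher would come before a step like this; quantization alone covers only part of the memory and latency budget.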

Rethinking Feature Extraction: Gradient-based Localized Feature Extraction for End-to-End Surgical Downstream Tasks
Winnie Pang, Mobarakol Islam, Sai Mitheran, Lalithkumar Seenivasan, Mengya Xu, Hongliang Ren
ICRA, 2023 and IEEE RA-L
Code / Paper

We develop a detector-free gradient-based localized feature extraction approach for end-to-end model training in surgical tasks like report generation and tool-tissue interaction graph prediction. By using gradient-based localization techniques (e.g., Grad-CAM) to extract features directly from discriminative regions in classification models' feature maps, we eliminate the need for object detection or region proposal networks. Our approach enables real-time deployment of end-to-end models for surgical downstream tasks.
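
A minimal sketch of the underlying Grad-CAM idea (illustrative, not the paper's implementation, with a torchvision ResNet standing in for the surgical classification model): class-score gradients weight the final convolutional maps, and the resulting saliency pools a localized feature without any detector.

# Grad-CAM-style localized feature extraction (illustrative sketch).
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
feats = {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(conv=o))

x = torch.randn(1, 3, 224, 224)
logits = model(x)
score = logits[0, logits[0].argmax()]                 # top-class score
grads = torch.autograd.grad(score, feats["conv"])[0]  # d(score)/d(features)

weights = grads.mean(dim=(2, 3), keepdim=True)        # per-channel importance
cam = F.relu((weights * feats["conv"]).sum(dim=1))    # [B, H, W] saliency map
mask = (cam / (cam.max() + 1e-8)).unsqueeze(1)        # normalize to [0, 1]

# Detector-free localized descriptor: saliency-weighted average of the maps.
region_feat = (feats["conv"] * mask).sum(dim=(2, 3)) / mask.sum(dim=(2, 3))
print(region_feat.shape)  # torch.Size([1, 512])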

Rich Feature Distillation with Feature Affinity Module for Efficient Image Dehazing
Sai Mitheran, Anushri Suresh, Varun P Gopi
Optik, Elsevier
Code / arXiv / Paper

This work introduces a simple, lightweight, and efficient framework for single-image haze removal, exploiting rich "dark knowledge" from a lightweight pre-trained super-resolution model via heterogeneous knowledge distillation.
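
As a toy sketch of the heterogeneous distillation idea (under assumed shapes, not the paper's architecture), a frozen feature extractor from a pre-trained teacher guides the intermediate features of a small dehazing student alongside the usual reconstruction loss:

# Toy feature-distillation sketch; the teacher stands in for a pre-trained
# super-resolution encoder, and matching channel widths are assumed.
import torch
import torch.nn as nn

class SmallDehazer(nn.Module):
    """Hypothetical lightweight student network."""
    def __init__(self, width=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, width, 3, padding=1), nn.ReLU())
        self.decoder = nn.Conv2d(width, 3, 3, padding=1)

    def forward(self, x):
        feat = self.encoder(x)
        return self.decoder(feat), feat

teacher_encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU()).eval()
for p in teacher_encoder.parameters():
    p.requires_grad_(False)                      # teacher stays frozen

student = SmallDehazer()
hazy, clean = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)

pred, student_feat = student(hazy)
with torch.no_grad():
    teacher_feat = teacher_encoder(clean)        # "dark knowledge" features

# Reconstruction loss plus a feature-affinity-style distillation term.
loss = nn.L1Loss()(pred, clean) + 0.1 * nn.L1Loss()(student_feat, teacher_feat)
loss.backward()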

Not All Lotteries Are Made Equal
Surya Kant Sahu, Sai Mitheran, Ritul Mahapatra
ICML, 2022 (HAET Workshop)
arXiv / Paper

The Lottery Ticket Hypothesis (LTH) states that a reasonably sized neural network contains a subnetwork that, when trained from the same initialization, performs no worse than its dense counterpart. We investigate how model size affects the ease of finding such winning tickets, and show that they are, in fact, easier to find for smaller models.
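
A minimal lottery-ticket-style loop (a sketch of the standard iterative magnitude pruning procedure, not the paper's exact setup, with random data standing in for a real task) prunes low-magnitude weights and rewinds the survivors to their original initialization:

# Iterative magnitude pruning with rewind to the saved initialization.
import copy
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
init_state = copy.deepcopy(model.state_dict())   # saved initialization

def train(model, steps=100):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
        opt.zero_grad()
        nn.functional.cross_entropy(model(x), y).backward()
        opt.step()

for _ in range(3):                               # pruning rounds
    train(model)
    for module in model:
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.2)
    # Rewind surviving weights to their initial values; masks stay in place.
    with torch.no_grad():
        for name, module in model.named_modules():
            if isinstance(module, nn.Linear):
                module.weight_orig.copy_(init_state[name + ".weight"])
                module.bias.copy_(init_state[name + ".bias"])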

Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding
Lalithkumar Seenivasan*, Sai Mitheran*, Mobarakol Islam, Hongliang Ren
ICRA, 2022 and IEEE RA-L [SOTA, Endovis18]
Code / arXiv / Paper

This paper introduces a globally-reasoned multi-task surgical scene understanding model capable of performing instrument segmentation and tool-tissue interaction detection.

Audiomer: A Convolutional Transformer for Keyword Spotting
Surya Kant Sahu, Sai Mitheran, Juhi Kamdar, Meet Gandhi
AAAI, 2022 (DSTC10 Workshop) [SOTA, Keyword Spotting]
Code / Paper

In this work, we introduce Audiomer, an architecture that combines 1D Residual Networks with Performer Attention to achieve state-of-the-art performance in Keyword Spotting on raw audio waveforms, outperforming all previous methods while being computationally cheaper and more parameter-efficient.
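
A rough sketch of the general pattern (standard multi-head attention stands in for Performer/FAVOR+ attention, and all sizes are made up for illustration): a strided 1D convolution embeds the raw waveform, a residual convolution refines it, and self-attention mixes information across time steps.

# Conv-plus-attention block over raw audio (illustrative, sizes arbitrary).
import torch
import torch.nn as nn

class ConvAttentionBlock(nn.Module):
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(channels, channels, 3, padding=1),
            nn.BatchNorm1d(channels),
            nn.ReLU(),
            nn.Conv1d(channels, channels, 3, padding=1),
            nn.BatchNorm1d(channels),
        )
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):                  # x: [batch, channels, time]
        x = torch.relu(x + self.conv(x))   # 1D residual convolution
        t = x.transpose(1, 2)              # [batch, time, channels]
        attn_out, _ = self.attn(t, t, t)   # self-attention over time
        return self.norm(t + attn_out).transpose(1, 2)

stem = nn.Conv1d(1, 64, kernel_size=16, stride=8, padding=4)   # waveform stem
wave = torch.randn(2, 1, 16000)            # two 1-second clips at 16 kHz
out = ConvAttentionBlock()(stem(wave))
print(out.shape)                           # torch.Size([2, 64, 2000])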

Introducing Self-Attention to Target Attentive Graph Neural Networks
Sai Mitheran, Abhinav Java, Surya Kant Sahu, Arshad Shaikh
AISP, 2022
Paper / Code / arXiv

We propose combining a Transformer with a target-attentive GNN, which enables richer representation learning. Our method outperforms existing approaches on real-world benchmark datasets.

User-Friendly Waveguide Modes Visualiser
Sai Mitheran, Ram T N, Raghavan S
IEEE Microwave Magazine 2022, Microwaves 101, Recent Trends on Metamaterial Antennas for Wireless Applications and Deep Learning Techniques, 2021
Paper / Application / Microwaves 101

This article presents the procedure and results of a web application built to visualize the electric and magnetic field lines inside a waveguide. We propose a first-of-its-kind graphical user interface for waveguide visualization as a public resource.

'CADSketchNet' - An Annotated Sketch dataset for 3D CAD Model Retrieval with Deep Neural Networks
Bharadwaj Manda, Shubham Dhayarkar, Sai Mitheran, Viekash V.K, Ramanathan Muthuganapathy
3DOR, 2021 and Computers & Graphics Journal
Project Page / Paper

We introduce CADSketchNet, an annotated dataset of sketches of 3D CAD models intended to advance research on AI-enabled search engines for 3D CAD models. We also construct several experimental retrieval models and evaluate their performance on CADSketchNet.


Service
Student Volunteer, ICLR 2021
Student Volunteer, ICML 2021

Affiliations (to date)

Volunteering and Initiatives