cv | Partha P. Nath

Basics

Name	Partha Pratim Nath
Label	Machine Learning Engineer | 3D Pose Estimation and Reconstruction
Email	nath.partha@outlook.com
Phone	+49 15239522871
Url	https://nath-partha.github.io
Summary	Bridging 3D Vision & Language at Scale. I am a Machine Learning Engineer specializing in 3D perception methods, with over 2 years of experience refining prototypes from SOTA research. My work ranges from constructing the ADL4D dataset for complex human activity understanding to training large-scale 3D VLMs (SpatialLM, LLaMA3.2) on high-performance cloud clusters. My technical foundation is built on three pillars: 3D human tracking, rigid/non-rigid object pose estimation, and point cloud scene reconstruction. I am currently focused on unifying these distinct fields to tackle the most complex challenges in spatial intelligence and embodied AI.

Key highlights

🚀 Nvidia Inception Program

Preparing experiments and team for the Nvidia-DGX-Inception Program (Accepted)
🚀 3D Object Detection R&D

Applying DETR fundamentals to point cloud detection tasks using local attention methods
🚀 3D Vision-Language Models

Prototyping and reverse-engineering point cloud VLMs from SOTA research
📚 Teaching & Curriculum Design

Created the Project Lab Human Activity Understanding @ TUM
🏆 Kurt Fischer Prize

Markerless multi-subject hand tracking awarded the Kurt Fischer Prize

Work

2024.12 - Present
Machine Learning and Computer Vision Engineer

Cirqular Pointcloud Analytics GmbH

End-to-end R&D and MLOps for large-scale 3D VLM and Scan2BIM pipelines.
- ML Ops & Distributed Training Infrastructure
  - Scaled the Pointcept training engine using standalone ZeRO optimizers and model sharding to resolve memory bottlenecks.
  - Evaluated DeepSpeed compilation strategies, documenting Pointcept module incompatibilities and workarounds.
  - Re-architected training backend by wrapping HuggingFace Trainer, unifying distributed strategies with custom checkpointing and synchronisation for config/source files.
  - Deployed on-premise ClearML infrastructure for experiment tracking.
  - Evaluated external libraries for unified cloud resource provisioning and training jobs.
  - Integrated multi-TB scale datasets for preprocessing and training in segmentation and detection tasks.
- R&D: 3D Vision-Language & Object Detection
  - Integrated LLMs (LLaMA3.2, Qwen2) with 3D backbones (PTv3, Sonata) and reverse-engineered SOTA models (SpatialLM, Locate3D) to build custom VLM training pipelines on A100 clusters.
  - Developed and ablated 3D-DETR/RoomFormer architectures. Tested 3D local attention encoders; reformulated losses to improve rotation regression for high-aspect-ratio objects.
  - Utilized 150k in GCP startup credits to scale data augmentation and model training experiments on A100 clusters.
- Production Engineering & Scan2BIM
  - Implemented and containerized LiDAR panoptic segmentation pipelines for Scan2BIM/Scan2CAD tasks, deploying robust models (PTv3, Sonata) for Industry Foundation Classes.
  - Maintained production code and managed model updates for out-of-core segmentation pipeline.
- Visualization & Strategy
  - Technical Communication: Visualized progress using Rerun, Open3D, and high-quality GRUT (Nvidia) renders for internal presentations and the Nvidia-DGX-Inception Program.
  - Collaboration: Contributed to core research objectives regarding superpoints, object detection, and 3D reconstruction.
2023.09 - 2024.09
System Engineer: Machine Learning and Computer Vision

RevTec Systems AG: Casinos Austria International

Object detection and tracking in RGB-D images for on-premise casino surveillance.
- Real-time Surveillance System
  - Developed real-time surveillance software tracking currency, gestures, and equipment.
  - Reviewed and integrated external projects to handle camera calibration and drift stabilization.
  - Managed the CVML lifecycle and outreach for alpha customers (UK & ZA), providing rolling updates and leadership demos.
- Customer Onboarding Toolkit
  - Built a customer onboarding toolkit using SAM and foundation models to generate customer-specific object models and datasets.
  - Improved legacy code and automated model preparation, reducing installation timelines from 2 weeks to under 5 days.
  - Evaluated and integrated newer model compilation tools and edge devices to scale inference.
2022.10 - 2023.05
Software Engineering Intern

Infineon Technologies AG

Developed machine vision software for human pose estimation with a 3D camera.
- 3D Machine Vision Development and Data Generation
  - Torch- and open-source-based detection and tracking | 3D multiview calibration.
  - Sensor data acquisition library for training radar gesture detection pipelines with cameras.
  - Designed scalable calibration routines for multi-camera setups and prototyped RGB-only multiview data acquisition.
2021.06 - 2023.08
Research and Teaching Assistant

TUM Chair of Media Technology

Awarded Kurt Fischer Prize for Markerless Motion Capture research.
- Research: Markerless Motion Capture (Kurt Fischer Prize)
  - Created a novel markerless motion capture toolkit for RGB images.
  - Created a high-fidelity human-object interaction dataset that surpasses previous contributions in pose diversity, accuracy, and the ability to robustly record very long sequences.
  - Used deep learning pose estimation, 3D multiview algorithms, and linear solvers to robustly reconstruct the 3D humans in view.
  - Benchmark tasks: 3D tracking, hand mesh recovery, hand action segmentation.
  - Featured: https://www.ce.cit.tum.de/en/lmt/home/ Slide 6
- Teaching Assistantship + New Project Lab
  - Designed Course | Guided Projects | 3DML Topics: ICP, Camera Projection, Rendering | Demo Scripts
  - Course Link: https://www.ce.cit.tum.de/en/lmt/lehre/projektpraktikum-project-lab-human-activity-understanding/
- Multicamera Studio Setup
  - Designed a multicamera studio for RGB-D streams with RealSense sensors in a streamlined setup.
  - Low Latency | Extrinsics Calibration | 8-12 Cameras | Distributed ROS | OptiTrack Integration
- VR Simulation Tool
  - UE4-based VR simulation and photorealistic data capture tool built on https://sim2realai.github.io/UnrealROX/

Education

2020.10 - 2023.09
Master

Technical University of Munich

School of Computation, Information and Technology
- Kurt Fischer Prize (€1000)
2016.06 - 2020.05
Bachelor

SRM Institute of Science & Technology

Electronics and Communication Engineering
- Project: Multispectral Optics Module for a firefighting robot
- First Class with Distinction

Skills

	3D Vision & VLMs
	PointTransformerV3
	Sonata
	LLaMA3.2
	Qwen2
	CLIP
	SpatialLM
	Locate3D

	Deep Learning Frameworks
	PyTorch
	DeepSpeed
	Hugging Face
	PyTorch3D
	Detectron2
	Pointcept
	mmLabs

	Production ML & Cloud Tools
	GCP (A100)
	AWS (L40, A100)
	Docker
	Multi-node Training
	Model Deployment

	3D Understanding
	Scan2CAD
	Point Cloud Segmentation
	Point Mesh Loss Functions
	ICP
	SLAM
	Multiview Geometry
	Camera Calibration

	Programming & Core Tools
	Python
	C++
	OpenCV
	Open3D
	NumPy
	Scikit-learn
	PyTorch3D
	Rerun
	CVAT

	Open-Source Datasets
	*ADL4D [1.1M frames]
	ScanNet++
	Structured3D
	H2O3D
	DexYCB
	SpatialLM
	CV4AEC

Publications

2024.02.01

ADL4D: Towards A Contextually Rich Dataset for 4D Activities of Daily Living

Zakour*, Nath*

Designed a weakly supervised, markerless hand pose annotation method for 4D human activity understanding, capable of handling complex multisubject interactions. Published a multiview, multisubject activity dataset of 1.1M frames.

Languages

	English
	Native

	German
	Conversational

	Misc (French, Hindi, Bengali)
	Beginner/Conversational

Interests

	Robotics
	Perception Stack
	Safe Grasp/Interaction

	Motion Capture
	Marker / Markerless
	Monocular and Multiview
	Parametric and Non-Parametric

	3D Scanning
	Point/Mesh Reconstruction
	Point Focused Gaussian Splatting
	Scan Vectorisation (Scan2CAD)

	Drone/Autonomous Camera Tracking
	Point/Object/Area Tracking
	Ground Stabilization

References

	Dr Rahul Gopal Chaudhari
	TUM Senior Scientist

	M.Sc. Marsil Zakour
	TUM Doctoral Candidate

	Maximilian Strobel
	Infineon, System Architect Machine Learning

	Michael Winking
	Infineon, Staff Engineer
