cv
Professional Experience
Basics
| Name | Partha Pratim Nath |
| Label | Machine Learning Engineer, 3D Pose Estimation and Reconstruction |
| Email | nath.partha@outlook.com |
| Phone | +49 15239522871 |
| Url | https://nath-partha.github.io |
| Summary | Bridging 3D Vision & Language at Scale. I am a Machine Learning Engineer specializing in 3D perception methods, with over 2 years of experience refining prototypes from SOTA research. My work ranges from constructing the ADL4D dataset for complex human activity understanding to training large-scale 3D VLMs (SpatialLM, LLaMA3.2) on high-performance cloud clusters. My technical foundation is built on three pillars: 3D human tracking, rigid/non-rigid object pose estimation, and point cloud scene reconstruction. I am currently focused on unifying these fields to tackle the most complex challenges in spatial intelligence and embodied AI. |
Key highlights
-
🚀 Nvidia Inception Program
Preparing experiments and the team for the Nvidia-DGX-Inception Program (accepted)
-
🚀 3D Object Detection R&D
Applying DETR fundamentals to point cloud detection tasks using local attention methods
-
🚀 3D Vision-Language Models
Prototyping and reverse-engineering point cloud VLMs from SOTA research
-
📚 Teaching & Curriculum Design
Created the Project Lab Human Activity Understanding at TUM
-
🏆 Kurt Fischer Prize
Markerless multi-subject hand tracking research, awarded the Kurt Fischer Prize
Work
-
2024.12 - Present Machine Learning and Computer Vision Engineer
Cirqular Pointcloud Analytics GmbH
End-to-end R&D and MLOps for large-scale 3D VLM and Scan2BIM pipelines.
- ML Ops & Distributed Training Infrastructure
- Scaled the Pointcept training engine using standalone ZeRO optimizers and model sharding to resolve memory bottlenecks.
- Evaluated DeepSpeed compilation strategies, documenting Pointcept module incompatibilities and workarounds.
- Re-architected training backend by wrapping HuggingFace Trainer, unifying distributed strategies with custom checkpointing and synchronisation for config/source files.
- Deployed on-premise ClearML infrastructure for experiment tracking.
- Evaluated external libraries for unified cloud resource provisioning and training jobs.
- Integrated multi-TB-scale datasets for preprocessing and training in segmentation and detection tasks.
- R&D: 3D Vision-Language & Object Detection
- Integrated LLMs (LLaMA3.2, Qwen2) with 3D backbones (PTv3, Sonata) and reverse-engineered SOTA methods (SpatialLM, Locate3D) to build custom VLM training pipelines on A100 clusters.
- Developed and ablated 3D-DETR/RoomFormer architectures. Tested 3D local attention encoders and reformulated losses to improve rotation regression for high-aspect-ratio objects.
- Utilized 150k in GCP startup credits to scale data augmentation and model training experiments on A100 clusters.
- Production Engineering & Scan2BIM
- Implemented and containerized LiDAR panoptic segmentation pipelines for Scan2BIM/Scan2CAD tasks, deploying robust models (PTv3, Sonata) for Industry Foundation Classes.
- Maintained production code and managed model updates for the out-of-core segmentation pipeline.
- Visualization & Strategy
- Technical Communication: Visualized progress using Rerun, Open3D, and high-quality GRUT (Nvidia) renders for internal presentations and the Nvidia-DGX-Inception Program.
- Collaboration: Contributed to core research objectives regarding superpoints, object detection, and 3D reconstruction.
-
2023.09 - 2024.09 System Engineer: Machine Learning and Computer Vision
RevTec Systems AG: Casinos Austria International
Object Detection and Tracking in RGBD images in on-prem Casino surveillance.
- Real-time Surveillance System
- Developed real-time surveillance software tracking currency, gestures, and equipment.
- Reviewed and integrated external projects to handle camera calibration and drift stabilization.
- Managed the CVML lifecycle and outreach for alpha customers (UK & ZA), providing rolling updates and leadership demos.
- Customer Onboarding Toolkit
- Built a customer onboarding toolkit using SAM and foundation models to generate customer-specific object models and datasets.
- Improved legacy code and automated model preparation, reducing installation timelines from 2 weeks to less than 5 days.
- Evaluated and integrated newer model compilation tools and edge devices as options for scaling inference.
-
2022.10 - 2023.05 Software Engineering Intern
Infineon Technologies AG
Developed machine vision software for human pose estimation with 3D cameras.
- 3D Machine Vision Development and data generation
- Torch- and open-source-based detection and tracking; 3D multiview calibration.
- Sensor data acquisition library to train gesture detection radar pipelines with cameras.
- Designed scalable calibration routines for multi-camera setups and prototyped RGB-only multiview data acquisition.
-
2021.06 - 2023.08 Research and Teaching Assistant
TUM Chair of Media Technology
Awarded Kurt Fischer Prize for Markerless Motion Capture research.
- Research: Markerless Motion Capture (Kurt Fischer Prize)
- Created a novel markerless motion capture toolkit for RGB images.
- Created a high-fidelity human-object interaction dataset that outperformed previous contributions in pose diversity, accuracy, and the ability to robustly record very long sequences.
- Used deep learning pose estimation, 3D multiview algorithms, and linear solvers to robustly estimate the 3D poses of humans in view.
- Benchmark Tasks (3D Tracking, Hand Mesh Recovery, Hand Action Segmentation)
- Featured: https://www.ce.cit.tum.de/en/lmt/home/ Slide 6
- Teaching Assistantship + New Project Lab
- Course Design | Guided Projects | 3DML Topics (ICP, Camera Projection, Rendering) | Demo Scripts
- Course Link: https://www.ce.cit.tum.de/en/lmt/lehre/projektpraktikum-project-lab-human-activity-understanding/
- Multicamera Studio Setup
- Designed a multicamera studio for RGBD streams with RealSense sensors in a streamlined setup.
- Low Latency | Extrinsics Calibration | 8-12 Cameras | Distributed ROS | Optitrack Integration
- VR Simulation Tool
- UE4-based VR simulation and photorealistic data capture tool, built on https://sim2realai.github.io/UnrealROX/
Education
-
2020.10 - 2023.09 Master
Technical University of Munich
School of Computation, Information and Technology
- Kurt Fischer Prize (€1000)
-
2016.06 - 2020.05 Bachelor
SRM Institute Of Science & Technology
Electronics and Communication Engineering
- Project: Multispectral Optics Module for a firefighting robot
- First Class with Distinction
Skills
| 3D Vision & VLMs | PointTransformerV3, Sonata, LLaMA3.2, Qwen2, CLIP, SpatialLM, Locate3D |
| Deep Learning Frameworks | PyTorch, DeepSpeed, Hugging Face, PyTorch3D, Detectron2, Pointcept, mmLabs |
| Production ML & Cloud Tools | GCP (A100), AWS (L40, A100), Docker, Multi-node Training, Model Deployment |
| 3D Understanding | Scan2CAD, Point Cloud Segmentation, Point Mesh Loss Functions, ICP, SLAM, Multiview Geometry, Camera Calibration |
| Programming & Core Tools | Python, C++, OpenCV, Open3D, NumPy, Scikit-learn, PyTorch3D, Rerun, CVAT |
| Open-Source Datasets | *ADL4D [1.1M], ScanNet++, Structured3D, H2O3D, DexYCB, SpatialLM, CV4AEC |
Publications
-
2024.02.01 ADL4D: Towards A Contextually Rich Dataset for 4D Activities of Daily Living
Zakour*, Nath*
Designed a weakly supervised, markerless hand pose annotation method for 4D human activity understanding, capable of handling complex multi-subject interactions. Published a multiview, multi-subject activity dataset of 1.1M frames.
Languages
| English | Native |
| German | Conversational |
| Misc (French, Hindi, Bengali) | Beginner/Conversational |
Interests
| Robotics | Perception Stack, Safe Grasp/Interaction |
| Motion Capture | Marker / Markerless, Monocular and Multiview, Parametric and Non-Parametric |
| 3D Scanning | Point/Mesh Reconstruction, Point Focused Gaussian Splatting, Scan Vectorisation (Scan2CAD) |
| Drone/Autonomous Camera Tracking | Point/Object/Area Tracking, Ground Stabilization |
References
| Dr Rahul Gopal Chaudhari | TUM Senior Scientist |
| M.Sc. Marsil Zakour | TUM Doctoral Candidate |
| Maximilian Strobel | Infineon, System Architect Machine Learning |
| Michael Winking | Infineon, Staff Engineer |