About Me

Bridging 3D Perception & MLOps Infrastructure

  • Details:
    • Partha P. Nath
    • Machine Learning Engineer
    • Vienna, AT
  • Background:
    • 3D Vision & ML (TU Munich).
    • CS, Math, Electronics, Drones, Simulations.
  • Core Mission: Solving the Hard Problems.
  • Technical Pillars: 3D Human Tracking, Object Pose Estimation, Pointcloud Encoding
1 / 13

ADL4D — Challenge & Scope

CV and Geometry Challenge

[Figure: ADL4D banner]
  • Goal: Capture Long-Horizon Activities
    • Provide full-fidelity 6D hand poses.
    • Multi-subject, multi-object, low-fps capture.
    • In contrast, previous datasets cover a single hand/subject, single object, and single action at high fps.
  • Challenge: Standard epipolar matching of landmarks fails under inter-hand occlusion across views.
  • Result: Degenerate triangulation that destroys data quality.
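
To illustrate why a bad cross-view match degrades the 3D data, here is a minimal sketch of two-view midpoint triangulation with a parallax check. It is not the ADL4D pipeline: the function name, the ray-based camera model, and the 2 degree threshold are assumptions chosen for illustration.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _unit(v):
    n = math.sqrt(_dot(v, v))
    return [x / n for x in v]

def triangulate_midpoint(c1, d1, c2, d2, min_angle_deg=2.0):
    """Triangulate a 3D point from two camera rays (center c, direction d).

    Returns (point, angle_deg). The point is None when the rays are nearly
    parallel, which is the degenerate case for triangulation.
    min_angle_deg is an illustrative threshold, not a tuned value.
    """
    d1, d2 = _unit(d1), _unit(d2)
    b = max(-1.0, min(1.0, _dot(d1, d2)))
    angle = math.degrees(math.acos(abs(b)))
    if angle < min_angle_deg:
        return None, angle
    # Closest points on the two lines c1 + s*d1 and c2 + t*d2.
    w = [x - y for x, y in zip(c1, c2)]
    d, e = _dot(d1, w), _dot(d2, w)
    denom = 1.0 - b * b
    s = (b * e - d) / denom
    t = (e - b * d) / denom
    p1 = [c + s * x for c, x in zip(c1, d1)]
    p2 = [c + t * x for c, x in zip(c2, d2)]
    # The midpoint of the shortest segment between the rays is the estimate.
    return [(x + y) / 2 for x, y in zip(p1, p2)], angle
```

When a landmark in one view is matched to the wrong hand in another, the two rays no longer meet near a real joint, so the estimate is either rejected or lands far from the true position.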
2 / 13

ADL4D — Innovation & Impact

ReID in 3D Space, unlocking pose capture

[Figure: t-SNE of pose variation]
[Figure: Dataset samples]
  • Innovation: Formulate ReID as clustering in 3D space, with or without temporal guidance
  • Outcome:
    • Automated multi-subject tracking.
    • Reduced untrackable frames from 1088 → 22 (Internal) and 6302 → 213 (H2O Benchmark).
    • Captured the most diverse hand pose dataset with just basic activity sequences.
    • 1.1M frames of annotated RGB-D at 20 FPS, paired with aligned MANO hands
    • Strongest inter-dataset HMR generalization
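
The idea of ReID in 3D with temporal guidance can be sketched as a greedy nearest-neighbour assignment of 3D detections to the previous frame's track positions. This is a simplified stand-in, not the ADL4D method: the function name, integer track ids, and the 0.15 distance threshold (assumed to be in metres) are hypothetical.

```python
import math

def assign_identities(prev_tracks, detections, max_dist=0.15):
    """Assign 3D detections to existing track ids by proximity.

    prev_tracks: dict of int track id -> (x, y, z) last known position.
    detections:  list of (x, y, z) positions in the current frame.
    Returns a dict of track id -> detection index. Detections farther than
    max_dist from every free track start new tracks.
    """
    # All (distance, track, detection) candidates, closest first.
    pairs = sorted(
        (math.dist(pos, det), tid, j)
        for tid, pos in prev_tracks.items()
        for j, det in enumerate(detections)
    )
    assigned, used = {}, set()
    for dist, tid, j in pairs:
        if dist > max_dist:
            break  # remaining candidates are even farther away
        if tid in assigned or j in used:
            continue
        assigned[tid] = j
        used.add(j)
    # Unmatched detections become new identities.
    next_id = max(prev_tracks, default=-1) + 1
    for j in range(len(detections)):
        if j not in used:
            assigned[next_id] = j
            next_id += 1
    return assigned
```

Without temporal guidance, the same distance test could be applied between candidates within a single frame instead of against the previous frame.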
3 / 13

ADL4D MLOps — Fragmentation & Infrastructure

Learning to isolate training

[Figure: Cross-dataset qualitative HMR]
  • Training Challenges:
    • Dependency, code, and orchestration variations across models, model tasks, versions, and GPU architectures.
    • Fragmented Cluster (A4000, A5000, A6000 nodes).
    • GPU efficiency
  • Actions:
    • Docker Containerization
    • WandB for experiment tracking, grouping, artifact uploads, and code version checks
    • PyTorch Lightning and hyperparameter sweep libraries for experiment jobs
    • Fixing OSS codebases to cope with challenges raised in later work and our own
    • Diagnosing GPU bottlenecks with nvidia-smi, htop, and WandB
    • Standardizing DDP across all codebases
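
One way to keep a DDP recipe portable across a mixed A4000/A5000/A6000 cluster is to size the micro-batch for the smallest card and recover the target global batch through gradient accumulation. The sketch below is only a rough illustration: the function name and the `gb_per_sample` figure are hypothetical, and it ignores memory taken by weights and optimizer state.

```python
# Vendor-listed memory capacities in GB.
GPU_MEMORY_GB = {"A4000": 16, "A5000": 24, "A6000": 48}

def plan_batches(gpus, global_batch, gb_per_sample):
    """Pick one micro-batch for all ranks plus the accumulation steps.

    gpus: list of GPU names, one per DDP rank.
    gb_per_sample: assumed activation memory per sample (measure it in
    practice; the value is a placeholder here).
    """
    smallest = min(GPU_MEMORY_GB[g] for g in gpus)
    max_micro = int(smallest // gb_per_sample)
    world = len(gpus)
    # Largest micro-batch that fits the smallest GPU and divides evenly.
    for micro in range(max_micro, 0, -1):
        if global_batch % (micro * world) == 0:
            return {
                "micro_batch": micro,
                "accum_steps": global_batch // (micro * world),
            }
    raise ValueError("no micro-batch fits the smallest GPU")
```

For example, `plan_batches(["A4000", "A5000", "A6000", "A6000"], 64, 3.0)` picks a micro-batch of 4 with 4 accumulation steps, so every rank runs the same schedule.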
4 / 13

Cirqular — Zero to One

Pointcloud Segmentation Week 0–1

  • Context:
    • Zero initial cloud resources.
    • Building a LiDAR segmentation training pipeline from scratch
  • Action:
    • Physically built an A6000 on-prem node.
    • Identified and selected Pointcept (Point Transformer series) as the baseline
    • Adapted loaders for processed data, ran sample trainings, and confirmed metrics for the top-10 models using WandB
    • Built the preprocessing workflow to ingest new raw LiDAR data
    • Containerized and upgraded the training environment for the latest dependencies (CUDA, PyTorch, PyG, SpConv)
    • Deployed an on-prem ClearML server to reduce experiment-tracking cost.
5 / 13

Cirqular — Feedback Loop

Weeks 2–3

  • Action:
    • Automated sweeps for training recipes and model sizes to balance latency vs. accuracy.
    • Model Zoo: Auto-loading tagged checkpoints from registry.
    • Feedback Workflow: Users → Inference Team → Annotators+Trainers → Retraining.
    • Integrated methods to address immediately visible issues such as class imbalance, slicing dimensions, and early stopping
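
As one example of handling class imbalance in segmentation, the sketch below computes smoothed inverse-frequency class weights for a weighted loss. It is a generic technique, not necessarily the one used here; the function name and the smoothing constant are assumptions.

```python
from collections import Counter

def inverse_frequency_weights(labels, num_classes, smoothing=1.0):
    """Return per-class loss weights, normalized to a mean of 1.

    labels: iterable of integer class ids (for example, per-point labels).
    smoothing: added to each count so absent classes do not divide by zero.
    """
    labels = list(labels)
    counts = Counter(labels)
    total = len(labels)
    raw = [total / (counts.get(c, 0) + smoothing) for c in range(num_classes)]
    mean = sum(raw) / num_classes
    return [w / mean for w in raw]
```

Rare classes receive larger weights, so errors on them contribute more to the loss than errors on dominant classes such as ground or walls.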
6 / 13

Cirqular — Consulting & Maintenance

  • Action:
    • Retriggering training for new data distributions.
    • Precalculating semantic re-weighting based on instance tracking.
    • Fixing the inference stack after Docker/PyPA outages.
  • Outcome: Self-sustaining internal model zoo requiring minimal manual intervention.
7 / 13

RnD — SpatialLM

Scale & Abstraction

  • Context:
    • Transitioned to GCP with steady cloud credits
    • Segmentation training already highly optimized
      • Multi-node DDP, gradient accumulation, standalone ZeRO optimizers (Stages 1 & 2)
    • ClearML hosted globally in GCP, tracking commit ID + diff for every experiment
  • Subject:
    • SpatialLM and SceneScript showed promising results using structured language.
    • Both were trained on internal (Meta) and synthetic datasets.
    • No released training code, and no results from tuning on real data.
  • Goals:
    • Replicate SpatialLM public results
    • Consolidate annotations across public and internal datasets
    • Build the optimal VLM for our specific use case.
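
To make the ZeRO Stage 1 & 2 item in the context above concrete, the sketch below estimates per-GPU memory for model states using the accounting from the ZeRO paper. It assumes mixed-precision Adam (2 bytes each for fp16 weights and gradients, 12 bytes of fp32 optimizer state per parameter) and excludes activations; the function name is hypothetical.

```python
def zero_model_state_gb(num_params, num_gpus, stage):
    """Approximate per-GPU model-state memory in GB (1 GB = 1e9 bytes).

    stage 0: plain data parallelism, everything replicated.
    stage 1: optimizer states sharded across GPUs.
    stage 2: optimizer states and gradients sharded across GPUs.
    """
    weights, grads, optim = 2.0, 2.0, 12.0  # bytes per parameter
    if stage == 0:
        per_param = weights + grads + optim
    elif stage == 1:
        per_param = weights + grads + optim / num_gpus
    elif stage == 2:
        per_param = weights + (grads + optim) / num_gpus
    else:
        raise ValueError("only stages 0, 1, and 2 are modeled")
    return num_params * per_param / 1e9
```

For a 7.5B-parameter model on 64 GPUs this gives about 120 GB, 31.4 GB, and 16.6 GB for stages 0, 1, and 2, matching the figures reported in the ZeRO paper.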
8 / 13

SpatialLM — Iteration 1

  • Action:
    • Integration into Pointcept
      • SceneScript encoder, Llama and Qwen language models
      • Basic embedding patching and vocab resizing utility
      • Data conversion and layout annotation processing for TBs of public data
      • VLM build and training code, with initial tests for Pointcept encoders
    • Identified and fixed critical issues in sorting determinism, quantization, and token limits
  • Result: Retraining after the fixes gave far better results, approaching the v1 public release
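
The sorting-determinism and quantization issues can be illustrated with a small sketch: coordinates are mapped to integer bins first, and entities are then sorted on a total-order key so the target sequence is identical from run to run. This is an assumed reading of the problem, not the actual fix; the function names, coordinate range, and bin count are hypothetical.

```python
def quantize(value, lo, hi, num_bins):
    """Map a float to an integer bin in [0, num_bins - 1], clipping to [lo, hi]."""
    clipped = min(max(value, lo), hi)
    idx = int((clipped - lo) / (hi - lo) * num_bins)
    return min(idx, num_bins - 1)  # value == hi falls in the last bin

def serialize_layout(entities, lo=0.0, hi=32.0, num_bins=1280):
    """Turn layout entities into a deterministic list of token rows.

    entities: list of (class_name, [coord, ...]) pairs in any order.
    Sorting happens after quantization, so tiny float differences cannot
    change the order of two entities that share the same bins.
    """
    rows = [
        (name, tuple(quantize(v, lo, hi, num_bins) for v in coords))
        for name, coords in entities
    ]
    rows.sort()  # total order on (class_name, quantized coords)
    return rows
```

Because the output no longer depends on input order, the same scene always yields the same training target.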
9 / 13

SpatialLM — Iter 2: Scaling, Standardisation & Abstraction

  • Action:
    • Processed and integrated THOR, CV4AEC, and internal datasets with S3 streaming, local caching, and per-process shard IDs
    • Extended the necessary Pointcept augmentations to handle layout data
    • Iteratively trained and rebalanced
    • Identified codebase chokepoints in distributed training
    • Unified training under an extended HF Trainer
      • FSDP, checkpointing, optimised data workers, and multiple experiment trackers
    • Set up a DeepSpeed launcher script that handles environment and source-code forwarding
  • Result: Experiments became purely config-driven, even at large scale
  • Validation: The SpatialLM authors' second update mirrors our architecture.
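
The combination of streaming, local caching, and a process shard id can be sketched as follows: each (rank, data worker) pair reads a disjoint strided slice of the shard list, and each shard maps to a stable local cache file. The function names and URL layout are hypothetical, and the actual download step is omitted.

```python
import hashlib
import os

def shards_for_worker(shards, rank, world_size, worker_id, num_workers):
    """Return the shards owned by one data worker of one training process.

    The global id combines the process rank and the worker index, so no two
    workers anywhere in the job read the same shard.
    """
    global_id = rank * num_workers + worker_id
    total = world_size * num_workers
    return shards[global_id::total]

def cache_path(cache_dir, shard_url):
    """Map a shard URL to a stable local file name for caching."""
    digest = hashlib.sha256(shard_url.encode()).hexdigest()[:16]
    ext = os.path.splitext(shard_url)[1]
    return os.path.join(cache_dir, digest + ext)
```

A worker would check `cache_path(...)` before streaming a shard and write the file there after the first read, so later epochs are served from local disk.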
10 / 13

SpatialLM — Future Proofing (Iter 3)

Self-Healing Infrastructure with Ray Train

  • Action:
    • Investigated Ray Train / Anyscale to unify execution.
  • Why:
    • Solving "Crash Anxiety" during week-long training runs.
    • Native integration with Grafana dashboards
    • Same level of abstraction as DeepSpeed launching
  • Feature: Automatic GPU provisioning + Node Crash Recovery (Restart & Checkpoint management).
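
The restart and checkpoint management behaviour can be sketched in plain Python, independent of any framework. This does not use Ray's API; the function names, the JSON checkpoint format, the dummy loss, and the simulated crash are all assumptions made to keep the example self-contained.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write atomically so a crash mid-write keeps the last good checkpoint."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"step": 0, "loss": None}
    with open(path) as f:
        return json.load(f)

def train(path, total_steps, crash_at=None, save_every=10):
    """Resume from the checkpoint at path and run until total_steps."""
    state = load_checkpoint(path)
    for step in range(state["step"], total_steps):
        if crash_at is not None and step == crash_at:
            raise RuntimeError("simulated node crash")
        state = {"step": step + 1, "loss": 1.0 / (step + 1)}  # dummy loss
        if state["step"] % save_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state

def run_with_restarts(path, total_steps, crash_at=None, max_restarts=3):
    """Retry training after a crash; each retry resumes from the checkpoint."""
    for attempt in range(max_restarts + 1):
        try:
            # Only the first attempt simulates a crash.
            return train(path, total_steps, crash_at if attempt == 0 else None)
        except RuntimeError:
            continue
    raise RuntimeError("exceeded max_restarts")
```

With a crash at step 37 and checkpoints every 10 steps, the retry resumes from step 30 rather than from scratch, which is the property that matters for week-long runs.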
11 / 13

Summary of Competencies

Approximate Tech Stack

  • Compute: GCP, AWS, On-prem (A6000/A5000), Multi-node Clusters.
  • Orchestration: Docker, DeepSpeed, Ray Train (experimental).
  • Training Frameworks: PyTorch Lightning, HF Trainer, FSDP, DDP.
  • Experiment Tracking: WandB, ClearML (Self-hosted & Cloud).
  • Data Ops: S3 Streaming, Local Caching, Voxelization Pipelines.
12 / 13

Conclusion

Thank you for your time

13 / 13