# Senior Data Engineer (F/M/D)

**Company:** [Animore](http://jobs.workable.com/companies/tSaLZx2jAc18bDq1VV8Pah.md)
**Location:** Munich, Germany
**Workplace:** On-site
**Employment type:** Full-time
**Department:** Engineering

[Apply for this job](http://jobs.workable.com/view/430d222e-be5d-4dab-80ba-aecb94e7ebec)

## Description

### The Opportunity

We’re looking for a Senior Data Engineer to architect and scale the data backbone powering next-generation AI models in robotics and real-world environments.

This role sits at the intersection of distributed systems, multimodal data processing, and applied machine learning, with a strong focus on building high-quality datasets for robotic foundation models. You will ensure that data pipelines, infrastructure, and data strategy directly translate into measurable improvements in model performance.

### Your Responsibilities

-   Drive the model–data loop: connect application requirements to data collection, and turn model failures into targeted improvements through data collection, curation, and augmentation
-   Build and scale distributed data pipelines (Ray/Anyscale or similar) for TB-scale video, sensor, and robotics datasets
-   Design multimodal data schemas aligning video, actions, and high-frequency sensor streams
-   Develop Python tooling for data quality, including cleaning, anomaly detection, and dataset versioning
-   Own dataset quality and coverage, including annotation workflows, data diversity, and storage trade-offs
-   Lead a small team and coordinate with data providers and annotation vendors
-   Oversee real-world data collection, including technical setup, compliance, and secure data handling

### Technologies

-   Python (advanced, production-grade)
-   Ray / Anyscale or Apache Spark
-   AWS / GCP for large-scale data and GPU training pipelines
-   Video and sensor data formats (H.264/H.265, ROS bags, MCAP)
-   PyTorch, NumPy
-   DVC, lakeFS, or similar data versioning tools
-   Distributed data processing and storage systems

## Requirements

### Must Have

-   5+ years in Data/ML Engineering, including 2+ years in a senior or lead role
-   Experience with large-scale real-world data (robotics, autonomous systems, or video AI)
-   Strong experience with Ray/Anyscale or Spark for distributed pipelines
-   Advanced Python (performance, concurrency, and the ML stack, e.g., NumPy/PyTorch)
-   Experience working with video and sensor data formats (e.g., H.264/H.265, ROS bags, MCAP)
-   Experience building scalable data pipelines for GPU-based training workloads (AWS/GCP)
-   Experience with data versioning tools such as DVC or lakeFS
-   Proven experience owning systems and mentoring engineers

### Nice to Have

-   Experience building datasets for multimodal foundation models (VLA, VLM or similar)
-   Robotics fundamentals (sensor synchronization, 3D transforms)
-   Experience with active learning or data-centric ML workflows

## Benefits

-   Competitive compensation package
-   Various employee subsidies and perks, including public transportation and Wellpass
-   Work with a world-class team in a flat hierarchy, collaborating directly with the founders and the engineering team
-   Opportunity to make a real impact by working on cutting-edge robotics and AI systems
-   Fast growth potential in a rapidly evolving company and industry
-   International office environment with English as the official working language

### Recruiting Process

Your recruiting partner for this role is Madhulika (she/her). You can expect a screening call and up to four rounds of interviews, including an on-site visit to our Munich office to meet the team.

We hire across backgrounds, identities, and experiences, and we are committed to a workplace where everyone belongs. Discrimination has no place here.

If you need any accommodations during the recruiting process, just reach out to your recruiting partner.