Last updated on March 18, 2026. This conference program is tentative and subject to change.
Technical Program for Thursday, March 12, 2026

ThuBA
Oral Session 2 (Regular Session)
Chair: Vollmer, Anna-Lisa | Bielefeld University

09:30-09:36, Paper ThuBA.1
Flying Model-Based Gas Tomography for Carbon Dioxide Mapping: A Cooperative Robotic System
Schaab, Marius | Technical University of Munich (TUM)
Wiedemann, Thomas | German Aerospace Center (DLR)
Lilienthal, Achim J. | Technical University of Munich (TUM)
Keywords: Robotics in Hazardous Fields, RIG TC: Robot Perception, Environment Monitoring and Management
Abstract: Drones equipped with gas sensors monitor hazardous emissions; however, rotor downwash frequently renders onboard in-situ sensors ineffective by dispersing gas plumes. To address this, we present a cooperative robotic system designed for remote gas tomography. The architecture consists of a ground-based tracking unit and an aerial reflector unit, establishing an open measurement path that traverses the undisturbed plume while keeping the drone's propulsion external to the sensing volume. The system relies on a robust visual tracking and control framework that maintains precise laser alignment between the stationary ground robot and the moving aerial target. By fusing these remote measurements through model-based gas tomography, we generate spatial maps of gas distribution. Field validation confirms the system’s ability to bypass aerodynamic disturbances, outperforming standard in-situ drones and enabling environmental monitoring of gas concentrations.

09:36-09:42, Paper ThuBA.2
Bootstrapping Indoor Semantic Digital Twins from 2D Video
Alt, Benjamin | University of Bremen
Krohm, Luca | University of Bremen
Mania, Patrick | University of Bremen
Stefańczyk, Maciej | Warsaw University of Technology
Wilkowski, Artur | Institute of Control and Computation Engineering, Warsaw University of Technology
Beetz, Michael | University of Bremen

09:42-09:48, Paper ThuBA.3
Role-Played Human-Robot Interaction under Uncertainty and Its Impact on Trust
Shen, Shuyuan | University of Augsburg
Klein, Stina | University of Augsburg
Nasir, Jauwairia | University of Augsburg
Andre, Elisabeth | University of Augsburg
Kraus, Matthias | University of Augsburg
Keywords: Human-Centered Robotics, Design and Human Factors, RIG Cluster: Human-Robot Interaction
Abstract: Large language models (LLMs) are increasingly embedded in robotic systems to enable flexible, open-ended human–robot interaction. However, there are concerns that these models fail to communicate uncertainty, leading to overconfident behavior and potential overtrust by users, particularly in sensitive domains such as healthcare. This work used role-play to investigate how uncertainty is communicated in human–robot dialogue and how such uncertainty communication shapes trust. Forty-two participants took part in paired role-play sessions in which one participant enacted a robot and the other a user in a care facility scenario. Our analysis focused on how perceived uncertainty shaped trust, considering both the robot players’ self-assessed trust and the user players’ trust. Findings showed that when robot players perceived uncertainty, their self-assessed trust decreased significantly in both the capacity and moral dimensions, while interestingly, users’ trust also declined in the capacity dimension. We propose treating uncertainty communication as a two-party process and translating it into dialogue policies that calibrate trust, foster transparency, and support grounding for safer human–robot collaboration.

09:48-09:54, Paper ThuBA.4
Understanding and Addressing Mental Model Mismatches in Human-Robot Teaching
Richter, Phillip | Universität Bielefeld
Wersing, Heiko | Honda Research Institute Europe
Vollmer, Anna-Lisa | Bielefeld University
Keywords: Learning Categories and Concepts, RIG Cluster: Human-Robot Interaction
Abstract: A major challenge in human-robot interaction is the mental model mismatch, which arises when a human's understanding of a robot's capabilities differs from the robot's actual operational model. Such mismatches can result in ineffective teaching, suboptimal performance, and interaction breakdowns. This work aims to quantify and systematically categorize mental model mismatches by formalizing human expectations and comparing them with robot learning processes. We have developed metrics to measure mismatch and are currently developing a comprehensive taxonomy that identifies specific types of cognitive misalignments. By providing structured frameworks for understanding these mismatches, the goal is to enable targeted interventions fostering intuitive and effective cooperative learning between humans and robots.

09:54-10:00, Paper ThuBA.5
Adaptive Domestic Service Robotics through Foundation Models for Perception, Interaction, and Action
Memmesheimer, Raphael | University of Bonn
Pavlichenko, Dmytro | University of Bonn
Kruzhkov, Evgenii | University of Bonn
Bode, Jonas | University of Bonn
Schilke, Fynn | University of Bonn
Tutevych, Vitalii | University of Bonn
Lenz, Christian | University of Bonn
Schreiber, Michael | University of Bonn
Behnke, Sven | University of Bonn
Keywords: Domestic Robotics, Mobile Manipulation, RIG TC: Robotics Foundation Models
Abstract: This extended abstract presents an overview of the approaches and results of the NimbRo@Home team, winner of the RoboCup@Home 2024 Open Platform League (OPL) in Eindhoven, Netherlands and runner-up of the RoboCup@Home OPL in Salvador, Brazil. The competition evaluates domestic service robots in realistic household scenarios, requiring robust perception, interaction, manipulation, and autonomous task execution. Our work focuses on advancing generalization and robustness through the integration of foundation models for perception, human-robot interaction, and planning. A central contribution is the deployment of foundation models such as Large Language Models (LLMs) for human-robot interaction and task planning, Vision Language Models (VLMs) for scene state analysis, and open-vocabulary object segmentation for perception and grasping. Traditional RoboCup systems rely on closed-set, supervised vision pipelines that require extensive labeling and retraining for each competition environment. We demonstrate that promptable vision models enable segmentation and manipulation of previously unseen objects described only by natural language. This significantly reduces labeling overhead and allows the robot to adapt on the fly. We combine these models with a closed-set pipeline based on YOLO and MaskDINO, yielding a hybrid perception system that balances robustness, speed, and flexibility across tasks.

10:00-10:06, Paper ThuBA.6
How Far Are LLMs from AGI? Evidence from a Large World Problem
Zenkri, Oussama | Technische Universität Berlin
Brock, Oliver | Technische Universität Berlin
Keywords: Performance Evaluation and Benchmarking, RIG TC: AI-powered and Cognition-Enabled Robotics, RIG Cluster: Learning and Multimodal AI for Robotics
Abstract: Robotics stands to benefit profoundly from Artificial General Intelligence (AGI), which promises agents capable of solving complex, long-horizon problems, showing adaptive behavior, and performing robust decision-making in dynamic and uncertain environments. While Large Language Models (LLMs) are widely promoted as a viable pathway to this goal, their efficacy is typically benchmarked on tasks that fall into what Leonard Savage defined as the small world: static, fully observable, and deterministic. We evaluate LLMs, including reasoning models, on a long-horizon task designed to enforce large-world constraints, requiring the accumulation and updating of information across successive interactions and the formation of stable hypotheses about latent task structure across varying instances. The task prevents reliance on trial-specific recall and instead demands the identification of behavioral regularities. We contrast LLMs’ performance with a human baseline that serves as a reference for genuine generalization. Our evaluation reveals that current models consistently fail to leverage interaction history to infer the latent regularities that are necessary for robust generalization across related task instances. These findings indicate that, while LLMs are effective at small-world inference, they do not yet exhibit the cognitive properties required to cope with the demands of real-world problem solving.

10:06-10:12, Paper ThuBA.7
Versatile LiDAR Bundle Adjustment for Multi-Scan Alignment Utilizing Continuous-Time Trajectories
Wiesmann, Louis | University of Bonn
Marks, Elias Ariel | University of Bonn
Gupta, Saurabh | University of Bonn
Guadagnino, Tiziano | University of Bonn
Behley, Jens | University of Bonn
Stachniss, Cyrill | University of Bonn
Keywords: SLAM, Mapping, Localization
Abstract: Constructing precise global maps is a key task in robotics and is required for localization, surveying, monitoring, or constructing digital twins. To build accurate maps, data from mobile light detection and ranging (LiDAR) sensors is often used. Mapping requires correctly aligning the individual point clouds to each other to obtain a globally consistent map. In this paper, we investigate the problem of multi-scan alignment to obtain globally consistent point cloud maps. We propose a 3D LiDAR bundle adjustment approach that targets high-accuracy mapping and pose estimation. Our method solves the misalignment error of the corresponding scans in a joint least squares adjustment using all available data. We utilize a continuous-time trajectory to better model the actual ego-motion of the LiDAR instead of using the classical discrete-time assumption. To enable the joint optimization of thousands of LiDAR scans, we prune the search space of correspondences and utilize out-of-core circular buffers. We show that with our general optimization strategy, we can address tasks like simultaneous localization and mapping, multi-session alignment, and scan-to-map matching with different sensor types in different application areas.

10:12-10:18, Paper ThuBA.8
SE(3)-PoseFlow: Estimating 6D Pose Distributions for Uncertainty-Aware Robotic Manipulation
Jin, Yufeng | Technische Universität Darmstadt
Funk, Niklas Wilhelm | TU Darmstadt
Prasad, Vignesh | TU Darmstadt
Li, Zechu | Technische Universität Darmstadt
Franzius, Mathias | Honda Research Institute (HRI)
Peters, Jan | Technische Universität Darmstadt
Chalvatzaki, Georgia | Technische Universität Darmstadt
Keywords: Perception for Grasping and Manipulation
Abstract: Object pose estimation is challenging due to partial observability and symmetries, which often lead to pose ambiguity. While deterministic networks struggle to capture this multi-modality, we propose a novel probabilistic framework leveraging flow matching on the SE(3) manifold. Our approach models full 6D pose distributions via sample-based estimates, enabling robust reasoning about uncertainty in ambiguous scenarios. We achieve state-of-the-art results on Real275, YCB-V, and LM-O, and demonstrate how our uncertainty-aware estimates effectively guide downstream robotic tasks such as active perception and grasp synthesis.

10:18-10:24, Paper ThuBA.9
Minsound: Adding Internal Audio Sensing to Internal Vision Enables Human-Like In-Hand Fabric Recognition with Soft Robotic Fingertips
Andrussow, Iris | Max Planck Institute for Intelligent Systems
Solano, Jans | Max Planck Institute for Intelligent Systems
Richardson, Benjamin A. | Max Planck Institute for Intelligent Systems
Martius, Georg | Max Planck Institute for Intelligent Systems
Kuchenbecker, Katherine J. | Max Planck Institute for Intelligent Systems
Keywords: Force and Tactile Sensing, Soft Sensors and Actuators, RIG TC: Tactile Robotics
Abstract: Distinguishing the feel of smooth silk from coarse cotton is a trivial everyday task for humans. When exploring such fabrics, fingertip skin senses both spatio-temporal force patterns and texture-induced vibrations that are integrated to form a haptic representation of the explored material. In this work, we present a robotic system that can sense both of these types of haptic information, and we investigate how each type influences robotic tactile perception of fabrics. Our robotic hand's middle finger and thumb each feature a soft tactile sensor: one is the open-source Minsight sensor that uses an internal camera to measure fingertip deformation and force at 50 Hz, and the other is our new sensor Minsound that captures vibrations through an internal MEMS microphone with a bandwidth from 50 Hz to 15 kHz. Inspired by the movements humans make to evaluate fabrics, our robot actively encloses and rubs folded fabric samples between its two sensitive fingers. Experiments test the influence of each sensing modality on overall classification performance on a new dataset of 20 common fabrics; our transformer-based method achieves a maximum fabric classification accuracy of 97%. Incorporating an external microphone away from Minsound increases our method's robustness in loud ambient-noise conditions.

10:24-10:30, Paper ThuBA.10
Automated On-Site Assembly of Timber Building Components on the livMatS Biomimetic Shell
Lauer, Anja Patricia Regina | University of Stuttgart
Benner, Elisabeth | University of Stuttgart
Stark, Tim | University of Stuttgart
Klassen, Sergej | University of Stuttgart
Abolhasani, Sahar | University of Stuttgart
Schroth, Lukas | ETH Zürich
Gienger, Andreas | University of Stuttgart
Wagner, Hans-Jakob | University of Stuttgart
Schwieger, Volker | University of Stuttgart
Menges, Achim | Institute for Computational Design and Construction, University of Stuttgart
Sawodny, Oliver | University of Stuttgart

ThuGA
Interactive Session & Demos 2 & Lunch (Interactive Session)

11:30-13:00, Paper ThuGA.1
Pololu-Rs: A Rust-Based Framework for Reproducible Multi-Robot Experiments
Li, Jiaming | TU Berlin
Stentzler, Charlotte | TU Berlin
Roser, Johannes | TU Berlin
Hoenig, Wolfgang | TU Berlin
Keywords: RIG Cluster: Multi-Robot Systems, Software-Hardware Integration for Robot Systems, Wheeled Robots
Abstract: We introduce a modular, low-cost, open-source framework for differential-drive and tracked multi-robot experiments. The framework combines commercial off-the-shelf (COTS) hardware, a Rust-based real-time firmware, and a ROS 2 integration that together support reproducible, real-world multi-robot experiments. Its successful use in published research demonstrates its real-world applicability.

11:30-13:00, Paper ThuGA.2
Extended Abstract: Preferential Bayesian Optimization with Crash Feedback
Menn, Johanna | RWTH Aachen University
Stenger, David | RWTH Aachen University
Trimpe, Sebastian | RWTH Aachen University
Keywords: Human Factors and Human-in-the-Loop, Machine Learning for Robot Control
Abstract: Bayesian optimization is commonly used for black-box parameter tuning in robotics, but typically requires an explicit objective function, which is often unavailable in practice. Preferential Bayesian optimization (PBO) addresses this limitation by directly learning from human preferences. When applied to hardware systems, evaluating unsafe parameters can cause crashes, resulting in downtime and increased hardware stress. Standard PBO cannot exploit feedback from such failures and therefore repeatedly explores unsafe regions. We introduce CrashPBO, an extension of PBO that incorporates both preference feedback and explicit crash reports. Synthetic benchmarks show a substantial reduction in crashes, and experiments across three real robotic platforms demonstrate the potential to reduce tuning time and hardware strain, resulting in a more reliable and practical framework for parameter learning with human feedback. Video: https://tinyurl.com/crashpbo

11:30-13:00, Paper ThuGA.3
Risk Awareness and Management for Autonomous Robots: Assessing Non-Perceivable Hazards through Context-Aware Safety Adaptation
Wolf, Patrick | University of Kaiserslautern-Landau | Fraunhofer IESE
Helten, Catharina | RPTU University of Kaiserslautern-Landau
Adler, Rasmus | Fraunhofer IESE
Schneider, Daniel | Fraunhofer IESE
Keywords: Robot Safety, RIG Cluster: Safety, Reliability and Resilience of AI-based Robotics, RIG TC: Safety and Reliability of AI-based Robotics
Abstract: Autonomous robots can operate under a wide range of conditions and must act safely, which raises the question of how to describe all unsafe conditions. Machinery manufacturers are responsible for specifying safe operating conditions, but operators are responsible for ensuring they are met. This interface between manufacturers and operators is becoming increasingly challenging because manufacturers do not know how operators will use the robots. A further challenge is that some dangerous situations cannot be detected by sensors alone and must be managed by restricting the robot's operational domain. This paper discusses these challenges and proposes an approach to incorporate context-aware risk assessment into a robot's runtime autonomy. The central idea is demonstrated using a Unitree Go2 legged robot and an autonomous Unimog truck in a realistic workshop scenario.

11:30-13:00, Paper ThuGA.4
Fast IMU-Based Contact Detection and Reaction for Spatial Parallel Robots for Human-Robot Collaboration
Piosik, Jan | Leibniz Universität Hannover
Mohammad, Aran | Leibniz University Hannover
Seel, Thomas | Leibniz Universität Hannover
Schappler, Moritz | Institute of Mechatronic Systems, Leibniz Universität Hannover
Keywords: RIG Cluster: Human-Robot Interaction, Parallel Robots, Safety in HRI
Abstract: Ensuring safety in human-robot collaboration (HRC) requires rapid contact detection and reaction to minimize potential damage. While such strategies are established for serial and demonstrated for planar parallel robots, spatial parallel robots (PRs) pose unique challenges due to their complex, nonlinear dynamics. This work presents a framework for fast contact detection and reaction specifically developed for spatial PRs. The approach utilizes an Unscented Kalman Filter (UKF) for sensor fusion of IMU and encoder data to directly determine external forces. Four distinct reaction strategies—stop, zero-g, retraction, and reflex—are investigated to terminate contact immediately upon detection. Experimental results demonstrate that the IMU-based detection is approximately 3.6 times faster than a conventional momentum observer. Furthermore, the investigated strategies successfully terminate contact within a short time. These findings provide a vital foundation for implementing safe HRC in spatial parallel robotic systems.

11:30-13:00, Paper ThuGA.5
AI-Assisted Risk Assessment for Safe Industrial Robot Applications
Stuhlenmiller, Florian | ABB AG Corporate Research Center Germany
Dai, Fan | ABB AG, Corporate Research Germany
Benzi, Federico | ABB AG
Keywords: Robot Safety, AI-Based Methods, Agent-Based Systems
Abstract: Designing safe industrial robot applications is essential to protect people, equipment, and the environment. The growing complexity of robot applications increases the respective design effort. To support the early design phase, an agent-based approach leveraging generative artificial intelligence is presented to semi-automatically identify hazards and propose feasible risk reduction measures under human supervision. A prototype implementation demonstrates the potential to reduce engineering effort, indicating its applicability in industrial scenarios.

11:30-13:00, Paper ThuGA.6
A Multi-View Heterogeneous Multi-Robot Dataset for Relative Localization and Collaborative Perception in Dynamic Scenes
Lichtenfeld, Jonathan | Technical University of Darmstadt
von Stryk, Oskar | Technische Universität Darmstadt
Keywords: RIG Cluster: Multi-Robot Systems, RIG Cluster: Field Robotics, Multi-Robot SLAM
Abstract: Multi-robot research in localization and mapping has primarily focused on large-scale SLAM in static environments, often lacking the mutual visibility needed for close-range coordination. We propose a novel multi-sensor dataset featuring heterogeneous platforms with frequent mutual line-of-sight in highly dynamic scenarios. Currently in development, the dataset will provide precise ground-truth poses and pointwise dynamic labels to facilitate research in robust relative localization and multi-robot moving object detection. By offering simultaneous, diverse viewpoints of dynamic scenes, this dataset is designed to enable algorithms that overcome occlusions and improve collective scene understanding. We invite researchers within the Robotics Institute Germany (RIG) to discuss and potentially collaborate on this initiative to establish a highly recognized RIG multi-robot benchmark.

11:30-13:00, Paper ThuGA.7
PINGS: Gaussian Splatting Meets Distance Fields within a Point-Based Implicit Neural Map
Pan, Yue | University of Bonn
Zhong, Xingguang | University of Bonn
Jin, Liren | University of Bonn
Wiesmann, Louis | University of Bonn
Popovic, Marija | TU Delft
Behley, Jens | University of Bonn
Stachniss, Cyrill | University of Bonn
Keywords: SLAM, Mapping, RIG TC: Robot Perception
Abstract: Robots benefit from high-fidelity reconstructions of their environment, which should be geometrically accurate and photorealistic to support downstream tasks. While this can be achieved by building distance fields from range sensors and radiance fields from cameras, realizing scalable incremental mapping of both fields consistently and at the same time with high quality is challenging. In this paper, we propose a novel map representation that unifies a continuous signed distance field and a Gaussian splatting radiance field within an elastic and compact point-based implicit neural map. By enforcing geometric consistency between these fields, we achieve mutual improvements by exploiting both modalities. We present a novel LiDAR-visual SLAM system called PINGS using the proposed map representation and evaluate it on several challenging large-scale datasets. Experimental results demonstrate that PINGS can incrementally build globally consistent distance and radiance fields encoded with a compact set of neural points. Compared to state-of-the-art methods, PINGS achieves superior photometric and geometric rendering at novel views by constraining the radiance field with the distance field. Furthermore, by utilizing dense photometric cues and multi-view consistency from the radiance field, PINGS produces more accurate distance fields, leading to improved odometry estimation and mesh reconstruction.

11:30-13:00, Paper ThuGA.8
Towards Industrial Robot As a Service: Current Status and Future Research Directions
Tanz, Lukas | Technical University of Munich
Geng, Paul | Technical University of Munich (TUM)
Daub, Rüdiger | Technical University of Munich (TUM), Fraunhofer IGCV
Keywords: Industrial Robots, Flexible Robotics, Factory Automation
Abstract: Industrial robot as a service aims to increase the flexibility and economic viability of industrial automation in high-mix, low-volume production environments. Instead of focusing on robot-specific programming, this paradigm requires integrated solutions for rapid programming and commissioning, robust perception under uncertainty, and efficient adaptation of hardware to changing products and processes. Recent advances in artificial intelligence, including large language models, symbolic planning, and data-driven optimization, have enabled new approaches to these challenges, but essential industrial requirements remain unmet. This paper highlights current research gaps in programming and commissioning, learning perception skills, and hardware adaptation.

11:30-13:00, Paper ThuGA.9
How Can AI Empower Autonomous Sediment Sampling in Deep-Sea Environments?
Sourkounis, Cora Maria | Leibniz University Hannover
Kwasnitschka, Tom | GEOMAR Helmholtz Centre for Ocean Research Kiel
Raatz, Annika | Leibniz Universität Hannover
Keywords: RIG TC: AI-driven Marine Robotics, RIG Cluster: Field Robotics
Abstract: This article presents a project focused on accelerating suction sampling in deep-sea environments through the development of an innovative robotic system. The novel design aims to reduce transportation and preparation time while enhancing the sampling process itself, as manual operation of deep-sea robots is notably time-consuming. Following the completion of the robotic system concept, the current phase involves optimizing the design of a suction sampling pipeline. This pipeline begins with the generation of a 3D model of the target area using stereo camera data. The ultimate goal is to achieve a seamless and largely automated sampling process. This article introduces the discussion on how artificial intelligence can enhance and support this pipeline, paving the way for more efficient and effective deep-sea sediment sampling.

11:30-13:00, Paper ThuGA.10
Child and Parent Perspectives on SonoBox: A Robotic Contactless Ultrasound System for Pediatric Forearm Fracture Diagnosis
Golwalkar, Rucha | University of Lübeck
Polzin, Louis | University of Lübeck
de Vries, Anton | University of Lübeck
Tüshaus, Ludger | Klinik für Kinderchirurgie, Universitätsklinikum Schleswig-Holstein, Lübeck
Ernst, Floris | University of Lübeck

11:30-13:00, Paper ThuGA.11
FlowTouch: View-Invariant Visuo-Tactile Prediction
Bien, Seongjin | University of Technology Nuremberg
Kneissl, Carlo | LMU Munich
Ressler-Antal, Thomas | Ludwig Maximilian University of Munich
Fundel, Frank | LMU Munich
Jülg, Tobias Thomas | University of Technology Nuremberg
Walter, Florian | Technical University Munich
Ommer, Bjorn | LMU Munich
Kutyniok, Gitta | Ludwig Maximilian University of Munich
Burgard, Wolfram | University of Technology Nuremberg
Keywords: RIG TC: Tactile Robotics, RIG Cluster: Learning and Multimodal AI for Robotics
Abstract: Humans can predict tactile sensation from visual stimuli. This remarkable ability helps us to guide the way we manipulate objects and interact with our environment. However, tactile prediction is not an isolated capability that arises solely from vision. It draws upon prior experiences and models formed about the environment. Developing a similar capability for robots should thus not be viewed as a purely vision-based task, but rather as a problem that requires multiple modalities to achieve similar results. In this work, we propose FlowTouch, which builds upon these premises for tactile state prediction. It leverages 3D information of the target object to extract more information than would be available in an image alone, in order to achieve more robust tactile prediction capabilities that can generalize to a wider range of objects.

11:30-13:00, Paper ThuGA.12
MuST-C: The Multi-Sensor, Multi-Temporal, and Multi-Crop Dataset for In-Field Phenotyping and Monitoring
Chong, Yue Linn | University of Bonn
Krämer, Julie | Forschungszentrum Jülich
Chakhvashvili, Erekle | Forschungszentrum Jülich
Marks, Elias Ariel | University of Bonn
Esser, Felix | University of Bonn
Dreier, Ansgar | University of Bonn
Rosu, Radu Alexandru | University of Bonn
Warstat, Kevin | Forschungszentrum Jülich
Pude, Ralf | University of Bonn
Behnke, Sven | University of Bonn
Muller, Onno | Forschungszentrum Jülich
Rascher, Uwe | Forschungszentrum Jülich GmbH
Kuhlmann, Heiner | University of Bonn
Stachniss, Cyrill | University of Bonn
Behley, Jens | University of Bonn
Klingbeil, Lasse | University of Bonn
Keywords: Data Sets for Robotic Vision, Robotics and Automation in Agriculture and Forestry, RIG TC: Agri-Robotics
Abstract: Phenotyping is crucial for understanding crop trait variation and advancing research, but is currently limited by expensive, labor-intensive monitoring. New phenotypic trait monitoring methods are being proposed to reduce this so-called phenotyping bottleneck via automation. These methods are often data-driven, requiring a dataset recorded with a specific sensor and corresponding reference values for developing novel methods. To this end, we present the MuST-C (Multi-Sensor, multi-Temporal, multiple Crops) dataset, which contains field data from various sensors collected over a growing season, covering six crop species. All data was georeferenced for alignment across sensors and dates. To collect our dataset, we deployed aerial and ground robotic platforms equipped with RGB cameras, LiDARs, and multispectral cameras, aiming to capture a wide variety of modalities and observations from different viewpoints. In addition to sensor data, we also provide manually collected leaf area index and biomass reference measurements. Our dataset enables the development of novel automatic phenotypic trait estimation methods, allows comparisons across different sensors, and supports generalization across crop species.

11:30-13:00, Paper ThuGA.13
ReMoRA: A Resilient and Modular Framework for Building Dependable Robotic Applications
Wu, Ruichao | Fraunhofer IPA
Youssef, Mohamed | University of Stuttgart
Kahl, Bjoern | Fraunhofer IPA
Kraus, Werner | Fraunhofer IPA
Morozov, Andrey | University of Stuttgart
Keywords: Software Tools for Robot Programming, Engineering for Robotic Systems, RIG TC: Integration and Engineering of Industrial Robot Systems
Abstract: Robotic systems increasingly require modular and resilient software that supports reuse, runtime awareness, and structured data access. However, many existing solutions lack explicit execution supervision and semantically consistent execution traces, limiting both reliability and learning-based research. This paper presents the Resilient Modular Robotic Application (ReMoRA) framework, a ROS 2-based architecture that structures robotic applications into reusable skill servers with standardized interfaces and explicit control flows. By embedding quality supervision at skill boundaries, ReMoRA enables fault containment and transparent execution monitoring. Beyond application development, ReMoRA provides structured execution data that supports research directions such as predictive maintenance, noise-aware data collection for imitation learning, and learning over a robot's operational lifetime.

11:30-13:00, Paper ThuGA.14
Zero-Shot Semantic Object Placement with Foundation Models
Mirjalili, Reihaneh | University of Technology Nuremberg
Krawez, Michael | University of Technology Nuremberg
Blei, Yannik | University of Technology Nuremberg
Walter, Florian | Technical University Munich
Burgard, Wolfram | University of Technology Nuremberg
Keywords: Perception for Grasping and Manipulation, AI-Enabled Robotics, RIG TC: Robotics Foundation Models
Abstract: In this paper, we present a zero-shot object placement pipeline that uses pretrained foundation models to place objects in semantically appropriate set-down orientations. Given a single RGB image, we reconstruct a 3D model and estimate the object pose. We then render a small set of axis-aligned candidate orientations and prompt a vision-language model (VLM) to choose the orientation that matches proper placement. Next, we convert the chosen orientation into an end-effector rotation and execute it on the robot. We refer to this estimation-and-execution step as an alignment cycle. We repeat the alignment cycle once more after a fixed 90° yaw reorientation, and then place the object on a planar surface. Experiments on six household objects across multiple initial roll-pitch configurations achieve an average placement success rate of 0.87, with failures primarily due to perception challenges (e.g., transparent objects) or incorrect orientation selection by the vision-language model.
|
| |
| 11:30-13:00, Paper ThuGA.15 | |
| LLM-Agent Supported Programming of Micro-Assembly Machines |
|
| Wiemann, Rolf | Leibniz University of Hannover |
| Terei, Niklas | Leibniz University of Hanover |
| Raatz, Annika | Leibniz Universität Hannover |
|
|
| |
| 11:30-13:00, Paper ThuGA.16 | |
| Toward Human-Like Locomotion through Modal Gait Decomposition and Optimal Control |
|
| Kist, Arian | Technical University of Munich |
| Flor, Isabella | Technical University Munich |
| Perrin, Clément | Technische Universität München |
| Rixen, Daniel | Technische Universität München |
Keywords: Humanoid and Bipedal Locomotion, Motion Control, Whole-Body Motion Planning and Control
Abstract: The degree to which a bipedal robot's locomotion pattern resembles that of a human is rarely quantified or explicitly considered in motion planning. In this work, we present an approach to address this issue. Using modal decomposition techniques on gait data enables a quantitative analysis of motion patterns and precise comparison of bipedal robot and human locomotion. Based on this, we define an optimal control problem for bipedal locomotion to actively generate a human-like gait. Using a planar minimal bipedal model, we demonstrate preliminary results that indicate an improved human-like motion pattern and provide an outlook on future work.
|
| |
| 11:30-13:00, Paper ThuGA.17 | |
| Learning Dexterous Manipulation with Three Independent Fingers from Human Demonstrations |
|
| Gürtler, Nico | Uni. Tübingen and Max Planck Institute for Intelligent Systems |
| Andrussow, Iris | Max-Planck-Institute for Intelligent Systems |
| Walia, Rohan | University of Tübingen |
| Schölkopf, Bernhard | Max Planck Institute for Intelligent Systems |
| Martius, Georg | Uni. Tübingen and Max Planck Institute for Intelligent Systems |
Keywords: Dexterous Manipulation, In-Hand Manipulation, Imitation Learning
Abstract: Humans have proven to be powerful teachers for robot manipulation skills via imitation learning. How can we leverage this potential for robots with a morphology that differs from humans? In this work, we demonstrate that teleoperation of a three-fingered robot morphology is both feasible and effective for dexterous manipulation tasks. To address the challenges posed by the embodiment gap between human demonstrators and non-humanoid robots, we investigate three teleoperation strategies: fingertip matching using hand tracking from a commercial AR headset, direct control via motion controllers, and kinesthetic teaching with a leader robot. For each of the three strategies, we collect demonstrations on a suite of dexterous manipulation tasks, including assembling a 3D-printed object and folding a napkin. We then train manipulation policies with state-of-the-art imitation learning methods and evaluate their success on the respective tasks. The policies trained on data collected via motion controllers and kinesthetic teaching generally outperform those trained on hand-tracking data.
|
| |
| 11:30-13:00, Paper ThuGA.18 | |
| End-To-End Low-Level Neural Control of an Industrial-Grade 6D Magnetic Levitation System |
|
| Hartmann, Philipp | Bielefeld University |
| Stranghöner, Jannick | Bielefeld University |
| Neumann, Klaus | Bielefeld University / Fraunhofer IOSB-INA |
Keywords: RIG TC: Foundations of Optimization and Learning for Robotics, RIG Cluster: Learning and Multimodal AI for Robotics, Neural and Fuzzy Control
Abstract: Magnetic levitation (MagLev) is poised to revolutionize industrial automation by integrating flexible product transport and seamless manipulation. However, controlling such systems is inherently difficult due to their complex, unstable dynamics. Traditional control methods depend on complex, hand-crafted pipelines that are sensitive to model mismatches, resulting in robust but conservative solutions. In contrast, we present the first neural controller for 6D MagLev. Trained end-to-end on interaction data from a proprietary controller, it maps raw sensor data directly to coil currents. The controller demonstrates robust stabilization, generalizes to unseen trajectories, and extrapolates to previously unseen situations while maintaining accurate and robust control. This suggests that learning-based control can effectively substitute traditional engineering in demanding high-frequency physical systems. Demonstration videos are publicly available at https://sites.google.com/view/neural-maglev.
|
| |
| 11:30-13:00, Paper ThuGA.19 | |
| MoLaB - A Benchmark for Mobile Mapping Systems |
|
| Wagner, Markus | University of Bonn |
| Stapper, Tobias | University of Bonn |
| Klingbeil, Lasse | University of Bonn |
| Kuhlmann, Heiner | University of Bonn |
Keywords: Mapping, Performance Evaluation and Benchmarking, SLAM
Abstract: The generation of maps of the environment is one of the tasks for which mobile sensing systems, such as robots, are utilized. These maps are created using various perception sensors, such as LiDARs, in conjunction with the pose information of the system. Determining the accuracy of these maps is challenging due to multiple influencing factors, including pose estimation and system calibration, which impact the final map. We propose a method to benchmark the 3D mapping accuracy of mobile sensing systems using a freely accessible test environment with highly accurate reference data. We evaluate the accuracy of various parameters derived from the generated 3D map of the test environment, which are relevant for real-world applications. Our approach can assess mapping accuracy under different conditions, such as changing environmental settings, and provides insights into correlations primarily arising from pose estimation.
|
| |
| 11:30-13:00, Paper ThuGA.20 | |
| FARM: Force-Aware Robotic Manipulation with Tactile-Conditioned Diffusion Policies |
|
| Helmut, Erik | Technische Universität Darmstadt |
| Funk, Niklas Wilhelm | TU Darmstadt |
| Schneider, Tim | Technical University Darmstadt |
| de Farias, Cristiana | TU Darmstadt |
| Peters, Jan | Technische Universität Darmstadt |
Keywords: Imitation Learning, Deep Learning Methods, Force and Tactile Sensing
Abstract: Contact-rich manipulation requires precise force control, yet many imitation-learning approaches treat visuotactile feedback as a passive observation rather than an explicit control target. In this work, we present Force-Aware Robotic Manipulation (FARM), an imitation learning framework that leverages high-dimensional tactile data to define a force-based action space. Using a modified version of the handheld Universal Manipulation Interface (UMI) gripper equipped with a GelSight Mini tactile sensor, we collect human demonstrations and deploy them on a matching actuated gripper. During policy rollouts, the proposed FARM diffusion policy jointly predicts robot pose, grip width, and grip force. FARM outperforms several baselines across high-force, low-force, and dynamic force adaptation tasks, demonstrating the advantages of force-grounded, high-dimensional tactile observations and a force-based control space. The codebase and design files are open-sourced and available at https://tactile-farm.github.io.
|
| |
| 11:30-13:00, Paper ThuGA.21 | |
| Stein Variational Ergodic Surface Coverage with SE(3) Constraints |
|
| Li, Jiayun | Technical University of Darmstadt |
| Jin, Yufeng | Technische Universität Darmstadt |
| Teng, Sangli | University of California, Berkeley |
| Gong, Dejian | Technical University of Darmstadt |
| Chalvatzaki, Georgia | Technische Universität Darmstadt |
Keywords: Constrained Motion Planning, Optimization and Optimal Control
Abstract: Robotic surface manipulation requires generating trajectories that achieve comprehensive coverage of complex 3D surfaces while maintaining precise end-effector poses in SE(3). Although ergodic trajectory optimization (TO) provides a principled framework for coverage, existing approaches struggle on discrete point-cloud surfaces due to highly nonconvex objectives and the lack of manifold-aware sampling mechanisms. This work presents TSVEC, a sampling-based ergodic trajectory optimization framework that extends Stein Variational Gradient Descent (SVGD) to SE(3) and incorporates trajectory-level preconditioning. By formulating point-cloud ergodic coverage as inference on the SE(3) manifold, TSVEC enables parallel exploration of multiple trajectory modes while preserving geometric consistency. A Gauss–Newton preconditioner further mitigates the severe ill-conditioning inherent in long-horizon ergodic optimization. Experiments on point-cloud surface coverage benchmarks and real-world robotic surface drawing tasks demonstrate that TSVEC consistently produces higher-quality coverage trajectories than representative optimization-based and sampling-based baselines, with successful validation on a robot manipulator.
|
| |
| 11:30-13:00, Paper ThuGA.22 | |
| From Expert Fusion to Scalable Reinforcement Learning for Complex Legged Robots |
|
| Enslin, Louis-Elias | Karlsruhe Institute of Technology |
| Eichmann, Christian | FZI Research Center for Information Technology |
| Roennau, Arne | Karlsruhe Institute of Technology (KIT) |
Keywords: RIG TC: Manyfold Legged Locomotion in Various Terrains, RIG Cluster: Legged Locomotion
Abstract: This extended abstract presents a reinforcement learning (RL) approach for complex legged robots with many degrees of freedom. The work focuses on simplifying training through a multi-expert policy distillation method developed for the six-legged robot LAURON VI. Several expert policies were trained for different terrains and then combined into one generalized policy through imitation learning. The resulting controller showed smooth transitions between tasks, reduced reward shaping requirements, and improved generalization compared to single-policy RL. The approach was successfully tested in simulation and transferred to the real robot. Based on these results, future research will explore scalable RL methods for robots with higher complexity, such as quadrupeds with manipulation arms or physically coupled robots, aiming to make RL training more efficient and adaptable for real-world applications.
|
| |
| 11:30-13:00, Paper ThuGA.23 | |
| Evidential Learning of Semantic Scene Graphs for Occlusion-Aware Pepper Plant Perception |
|
| Mueller-Goldingen, Niklas | University of Bonn |
| Menon, Rohit | University of Bonn |
| Pan, Sicong | University of Bonn |
| Chenchani, Gokul Krishna Gandhi | Hochschule Bonn-Rhein-Sieg |
| Bennewitz, Maren | University of Bonn |
Keywords: Semantic Scene Understanding, RIG TC: Semantic Perception, Representation Learning
Abstract: Automated harvesting in dense horticultural environments remains challenging due to complex plant topology, severe occlusions, and limited sensor viewpoints. In sweet pepper plants, fruits, stems, peduncles, and leaves form articulated structures that are often only partially observable, leading to failures when hidden attachments or occlusions cannot be inferred reliably. While modern perception pipelines can detect and map plant instances, they typically do not explicitly model structural relations or quantify uncertainty arising from partial observability. This paper proposes an uncertainty-aware semantic scene graph formulation for pepper plant perception based on evidential deep learning. From geometric observations of plant organs, we infer semantic scene graphs whose nodes represent plant organs and whose edges encode attachment and direction-conditioned occlusion relations. A GraphSAGE-based graph neural network with evidential prediction heads models uncertainty in node and edge predictions using Dirichlet distributions. Occlusion relations are supervised using a projection-based geometric formulation in a fruit-local reference frame. Experiments on a procedurally generated dataset show that the proposed approach accurately predicts plant structure while expressing meaningful uncertainty for ambiguous relations caused by occlusion. The resulting uncertainty estimates support downstream decisions such as targeted leaf manipulation and next-best-view planning.
|
| |
| 11:30-13:00, Paper ThuGA.24 | |
| A Novel Powered Jaw Exoskeleton to Treat Temporomandibular Disorders: Design and Control Challenges |
|
| Müller, Paul-Otto | Technical University of Darmstadt |
| von Stryk, Oskar | Technische Universität Darmstadt |
Keywords: Rehabilitation Robotics, RIG Cluster: Healthcare Robotics and Human Augmentation, RIG TC: Robotic Augmentation of the Human Body
Abstract: Temporomandibular disorders severely impair masticatory function and quality of life. While powered jaw exoskeletons offer potential for rehabilitation, they face challenges related to complex biomechanics and safe force transmission. This paper presents a novel hybrid active jaw exoskeleton that addresses these issues by combining a rigid chin mechanism for precise force application with a compliant facial interface for enhanced safety. We develop a high-fidelity MuJoCo simulation and outline a control concept to handle partial observability and soft dynamics. This integrates a learned, deformation-aware dynamics model with latent states into a constrained, differentiable model predictive control scheme tuned via RL. This work establishes a foundation for safe, wearable robot-assisted therapy for temporomandibular disorders.
|
| |
| 11:30-13:00, Paper ThuGA.25 | |
| Sparse and Dense Rendering for Event-Based 3D Gaussian Splatting |
|
| Kohyama, Kai | Keio University |
| Aoki, Yoshimitsu | Keio University |
| Gallego, Guillermo | Technische Universität Berlin |
| Shiba, Shintaro | Keio University |
Keywords: RIG TC: Robot Perception, Computer Vision for Automation, SLAM
Abstract: Event cameras offer advantages over traditional frame-based cameras, making them suitable for motion and structure estimation. However, it is unclear how event-based 3D Gaussian Splatting (3DGS) approaches can leverage fine-grained temporal information of sparse events. This work proposes a framework to address the trade-off between accuracy and temporal resolution in event-based 3DGS. Our key idea is to decouple the rendering into two branches: sparse, event-by-event geometry (depth) rendering, and dense, snapshot-based radiance (intensity) rendering, by using ray-tracing and the image of warped events. Our method achieves state-of-the-art performance on real-world datasets and competitive results on a synthetic dataset. It works without prior information (e.g., pretrained image reconstruction models) or COLMAP-based initialization, is more flexible in the number of events sliced, and achieves sharp reconstruction on scene edges with fast training time. We hope that this work deepens our understanding of the sparse nature of events for 3D reconstruction.
|
| |
| 11:30-13:00, Paper ThuGA.26 | |
| Class-Incremental End-To-End Motion Prediction |
|
| Schischka, Nicolas | University of Freiburg |
| Gosala, Nikhil | University of Freiburg |
| Ravi, Kiran | Qualcomm |
| Yogamani, Senthil | Qualcomm |
| Valada, Abhinav | University of Freiburg |
Keywords: Deep Learning for Visual Perception, Incremental Learning, Continual Learning
Abstract: In recent years, end-to-end autonomous driving models for motion prediction and planning have become increasingly popular due to their potential to cope with imperfect detections by leveraging end-to-end differentiability. Camera-based end-to-end systems, in particular, have emerged as a promising alternative to LiDAR-centric pipelines due to their affordability. In this abstract, we focus on motion forecasting, which aims to predict the future movement of all agents present in a scene over the next few seconds. Achieving this in the most effective manner is crucial for scene understanding and subsequent planning of the ego-vehicle. Despite this progress, most existing approaches implicitly assume that exhaustive annotations for all classes are available during training, which constrains real-world deployability. In practice, operational domains evolve as new classes need to be added due to regional differences or novel modes of transportation, such as e-scooters. Accommodating such changes typically requires full retraining, incurring substantial computational cost and delaying deployment. A more practical alternative is class-incremental learning, where the model is updated to recognize and forecast newly introduced agent classes using annotations for those classes only, while retaining performance on previously learned ones. Although class-incremental learning has been studied in other perception tasks such as 2D object detection and semantic segmentation, it remains largely unexplored for camera-based end-to-end motion forecasting. Moreover, even in continual learning work on tracklet-based motion forecasting that relies on ground-truth detections as input, class-incremental settings have received little attention, leaving a notable gap in the current literature.
|
| |
| 11:30-13:00, Paper ThuGA.27 | |
| Temporal Task Segmentation and Attribute-Based Rules for Hazard Identification in Human-Robot Collaboration |
|
| Scharping, Robert | Fraunhofer Institute for Factory Operation and Automation IFF |
| Öltjen, Julian | Voraus Robotik GmbH |
| Bollmann, Yannick | Fraunhofer Institute for Factory Operation and Automation IFF |
| Behrens, Roland | Fraunhofer IFF |
| Stark, Alexander | Voraus Robotik GmbH |
Keywords: Human-Robot Collaboration, Human-Centered Robotics, Human Factors and Human-in-the-Loop
Abstract: This work presents an attribute-based method for systematic hazard identification and risk assessment in human-robot collaboration applications. By formalizing ISO 12100 concepts through rule-based reasoning and temporal task segmentation, the approach reduces the possible hazard space while preserving qualified personnel's responsibility.
|
| |
| 11:30-13:00, Paper ThuGA.28 | |
| Dynamic Human-To-Robot Object Handover with VLM-Based Intention Detection and Movement Primitives |
|
| Rietsch, Sebastian | Karlsruhe Institute of Technology (KIT) |
| Ruf, Lukas | Karlsruhe Institute of Technology (KIT) |
| Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: RIG Cluster: Human-Robot Interaction, Human-Robot Collaboration
Abstract: This work presents an initial exploration of using Vision-Language Models (VLMs) for dynamic Human-to-Robot (H2R) handovers, integrating VLM-based intention detection with Via-Point Movement Primitives (VMPs) for adaptive motion generation. By employing a structured chain-of-thought prompt and a majority vote over a prediction circular buffer, the system achieves 95.1% handover intention detection accuracy on the ARMAR-6 robot without task-specific training. Preliminary results suggest the approach can react dynamically to changing human behaviors and grasp strategies, though our evaluation reveals current challenges that must be addressed before practical deployment.
|
| |
| 11:30-13:00, Paper ThuGA.29 | |
| From Demonstrations to Safe Deployment: Path-Consistent Safety Filtering for Diffusion Policies |
|
| Römer, Ralf | Technical University of Munich |
| Balletshofer, Julian | Technical University of Munich |
| Thumm, Jakob | Technical University of Munich |
| Pavone, Marco | Stanford University |
| Schoellig, Angela P. | TU Munich |
| Althoff, Matthias | Technische Universität München |
Keywords: RIG Cluster: Safety, Reliability and Resilience of AI-based Robotics, RIG Cluster: Learning and Multimodal AI for Robotics, RIG Cluster: Human-Robot Interaction
Abstract: Diffusion policies (DPs) achieve state-of-the-art performance on complex, long-horizon manipulation tasks by learning from expert demonstration datasets. However, since they cannot guarantee safe behavior, external safety mechanisms are needed. These, however, alter actions in ways unseen during training, causing unpredictable behavior and performance degradation. To address these problems, we propose path-consistent safety filtering (PACS) for DPs. Our approach performs path-consistent braking on a trajectory computed from the sequence of generated actions, keeping execution consistent with the policy's training distribution. We verify safety using set-based reachability analysis, enabling real-time deployment at 1kHz. Our experimental evaluation in simulation and on three challenging real-world human-robot interaction tasks shows that PACS (a) provides formal safety guarantees in dynamic environments, (b) preserves task success rates, and (c) outperforms reactive safety approaches, such as control barrier functions, by up to 68% in task success. Videos and extensive results are available at tum-lsy.github.io/pacs.
|
| |
| 11:30-13:00, Paper ThuGA.30 | |
| A Framework for Learning Temporal Task Constraints for Bimanual Manipulation Tasks from Human Demonstration |
|
| Dreher, Christian R. G. | Karlsruhe Institute of Technology (KIT) |
| Dormanns, Patrick | Karlsruhe Institute of Technology (KIT) |
| Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: RIG Cluster: Learning and Multimodal AI for Robotics, RIG TC: AI-powered and Cognition-Enabled Robotics
Abstract: This work presents a framework for learning temporal task constraints for bimanual manipulation tasks from human demonstration. The approach integrates three parts: assessing temporal relationships between actions, inferring symbolic and subsymbolic task constraints, and generating executable temporal plans for robots. By combining qualitative Allen relations with quantitative timing parameters, the framework enables flexible and precise synchronization of bimanual actions in robot task executions.
|
| |
| 11:30-13:00, Paper ThuGA.31 | |
| Augmented Reality for RObots (ARRO): Pointing Visuomotor Policies towards Visual Robustness |
|
| Mirjalili, Reihaneh | University of Technology Nuremberg |
| Jülg, Tobias Thomas | University of Technology Nuremberg |
| Walter, Florian | Technical University Munich |
| Burgard, Wolfram | University of Technology Nuremberg |
Keywords: RIG TC: Robotics Foundation Models, Imitation Learning, AI-Enabled Robotics
Abstract: In this paper, we present ARRO, a novel visual representation that leverages zero-shot open-vocabulary segmentation and object detection models to efficiently mask out task-irrelevant regions of the scene in real time without requiring additional training, modeling of the setup, or camera calibration. By filtering visual distractors and overlaying virtual cues during both training and inference, ARRO improves robustness to scene variations and reduces the need for additional data collection. We extensively evaluate ARRO with Diffusion Policy on a range of tabletop manipulation tasks in real-world environments, and further demonstrate its compatibility and effectiveness with generalist robot policies, such as Octo, OpenVLA and pizero. Across all settings in our evaluation, ARRO yields consistent performance gains, allows for selective masking to choose between different objects, and shows robustness even to challenging segmentation conditions. Videos showcasing our results are available at: https://augmented-reality-for-robots.github.io/
|
| |
| 11:30-13:00, Paper ThuGA.32 | |
| Dynamics-Informed Vision–Language Models: An Extended Abstract on Dynamics-Aware Reasoning towards Next Generation Autonomous Systems |
|
| Schäfer, Finn Rasmus | Technical University Munich |
| Betz, Johannes | Technical University of Munich |
Keywords: RIG Cluster: Learning and Multimodal AI for Robotics, AI-Enabled Robotics, Formal Methods in Robotics and Automation
Abstract: Recent advances in vision–language models (VLMs) and vision–language–action (VLA) architectures have enabled impressive semantic understanding and generalization capabilities in robotics and autonomous driving. However, current foundation models are predominantly vision-centric and often abstract away the agent’s internal physical state. In contrast, classical robotic and autonomous driving systems explicitly rely on ego-dynamics, system constraints, and physical feasibility to ensure safe and reliable behavior. This discrepancy leads to an increasing semantic–dynamic mismatch between what learned models reason about and what embodied systems can physically execute, particularly in safety-critical and out-of-distribution scenarios. In this extended abstract, we argue that the next generation of VLM and VLA architectures must move beyond vision-dominant representations and toward dynamics-informed multimodal alignment. We propose treating ego-motion and dynamic state as first-class modalities that condition semantic reasoning, rather than as post-hoc safety filters. By embedding vision, language, and dynamics into a shared representation space, models can align intent with physical feasibility and execution constraints. We discuss current trends in robotics that indicate a shift from classical sensor fusion toward representation-level multimodal alignment, while highlighting the absence of explicit ego-dynamic grounding in existing approaches. Finally, we outline key challenges related to multimodal alignment, data availability, and evaluation under physical constraints, and argue that dynamics-informed foundation models are a necessary step toward reliable, deployable embodied intelligence in robotics and autonomous driving.
|
| |
| 11:30-13:00, Paper ThuGA.33 | |
| Walking on Roofs: Exploring the Potential of Walking Robots for Construction Work on Roofs |
|
| Dettmar, Bjoern-Felix | Karlsruher Institute of Technology (KIT) |
| Roennau, Arne | Karlsruhe Institute of Technology (KIT) |
Keywords: RIG Cluster: Legged Locomotion, RIG TC: Manyfold Legged Locomotion in Various Terrains
Abstract: The construction industry remains one of the least automated sectors, with roof work posing particularly high safety risks to human workers due to steep inclines, smooth surfaces, and risk of falling. Quadruped robots offer promising mobility and adaptability, yet their suitability for roof environments has not been systematically investigated. This paper presents an experimental evaluation of the commercial quadruped robot Unitree Go2 on realistic roof inclines. An adjustable roof test rig and a motion capture system were employed to objectively assess locomotion performance, with a particular focus on foot slippage. Three fundamental capabilities – static balancing, getting up and lying down, and incline walking – were evaluated across increasing slope angles. The results show that the unmodified robot fails to meet manufacturer-claimed slope capabilities, with critical failure occurring at significantly lower inclines. Static balance was on average lost at (27.58±0.72)°, get-up and lay-down motions were only reliable on shallow slopes, and incline walking exhibited a highly significant quadratic increase in slippage with slope angle. These findings demonstrate that current commercial quadrupeds are not yet suitable for safe roof operation without targeted adaptations. The study establishes quantitative performance limits and provides a baseline for future roof-specific developments, including adapted gaits, improved friction modeling, and end-effector design.
|
| |
| 11:30-13:00, Paper ThuGA.34 | |
| Hashed TSDF Submapping with Loop Closure Using NDDs |
|
| Kuhlmann, Jan | Fulda University of Applied Sciences |
| Wiemann, Thomas | Fulda University of Applied Sciences |
Keywords: Mapping, SLAM
Abstract: Truncated Signed Distance Fields (TSDFs) are a continuous representation of surfaces in 3D space. For accurate mapping, loop closure detection and pose graph optimization are crucial to compensate for drift. TSDF representations are normally static and therefore do not support optimization after loop detection. Overlapping submaps can be used to solve this problem at the cost of increased memory consumption. In previous work, we presented a cluster-hashed associative and discretized data structure (CHAD TSDF) tailored to address this problem. It reduces memory consumption by hashing node contents instead of spatial positions. In this paper, we extend CHAD TSDF to support memory-efficient submapping for pose graph optimization. Loop closures are detected with normal distribution descriptors (NDD), which give a rough estimation of rotational error. The translational error is compensated by a gradient descent approach on TSDF values. To construct a consistent global map from the submaps, TSDF-to-TSDF fusion with weighted trilinear interpolation is used.
|
| |
| 11:30-13:00, Paper ThuGA.35 | |
| Hierarchical Bayesian Optimization for Efficient Multi-Task Robot Controller Parameter Learning |
|
| Hirt, Sebastian | TU Darmstadt |
| Theiner, Lukas | TU Darmstadt |
| Pfefferkorn, Maik | Technical University of Darmstadt |
| Findeisen, Rolf | Control and Cyber-Physical Systems Laboratory |
Keywords: Machine Learning for Robot Control, Optimization and Optimal Control, Transfer Learning
Abstract: Robots often rely on controllers with tunable parameters (e.g., MPC weights, whole-body control shaping terms, safety or comfort penalties). These parameters must be re-tuned across tasks such as changing objectives, payloads, terrains, or users, while each closed-loop evaluation may be expensive. Bayesian optimization is commonly used for this purpose, but typically treats each task independently and models the total episode cost as a black-box function of the parameters, resulting in limited data efficiency and poor task transfer. We therefore propose a hierarchical Bayesian optimization method that exploits the rollout structure of closed-loop evaluations: instead of learning a black-box mapping from parameters to total cost, we learn parameter-dependent closed-loop trajectories and compute task-specific costs from predicted rollouts. This enables efficient transfer across tasks that share the same robot and controller structure but differ in evaluation criteria. We provide theoretical guarantees showing sublinear regret comparable to standard approaches and demonstrate improved sample efficiency and faster adaptation in a multi-task simulation benchmark.
|
| |
| 11:30-13:00, Paper ThuGA.36 | |
| Effective Explanations for Belief-Desire-Intention Robots: When and What to Explain |
|
| Wang, Cong | TU Dresden |
| Calandra, Roberto | TU Dresden |
| Klös, Verena | Carl Von Ossietzky Universität Oldenburg |
Keywords: Human-Robot Collaboration, Social HRI, Human-Centered Robotics
Abstract: When robots perform complex and context-dependent tasks in our daily lives, deviations from expectations can confuse users. Explanations of the robot’s reasoning process can help users understand the robot’s intentions. However, when to provide explanations and what they should contain must be chosen carefully to avoid user annoyance. We have investigated user preferences for explanation demand and content for a robot that helps with daily cleaning tasks in a kitchen. Our results show that users want explanations in surprising situations and prefer concise explanations that clearly state the intention behind the confusing action and the contextual factors that were relevant to this decision. Based on these findings, we propose two algorithms to identify surprising actions and to construct effective explanations for Belief-Desire-Intention (BDI) robots. Our algorithms can be easily integrated into the BDI reasoning process and pave the way for better human-robot interaction with context- and user-specific explanations. This paper summarizes and builds upon the research presented at IEEE RO-MAN 2025.
|
| |
| 11:30-13:00, Paper ThuGA.37 | |
| On the Impact of Sensor Modalities in ACT-Based Humanoid Manipulation |
|
| Kühn, Robin | Leibniz University Hanover |
| Seel, Thomas | Leibniz Universität Hannover |
| Schappler, Moritz | Institute of Mechatronic Systems, Leibniz Universitaet Hannover |
Keywords: RIG Cluster: Learning and Multimodal AI for Robotics, Imitation Learning, Humanoid Robot Systems
Abstract: The widespread adoption of humanoid robots in industrial environments is hindered by the complexity of teaching new tasks. While Imitation Learning (IL), particularly Action Chunking with Transformers (ACT), enables rapid task acquisition, there is no consensus yet on the optimal sensory hardware required for manipulation tasks. This paper benchmarks 14 sensor combinations, explicitly evaluating the integration of tactile and proprioceptive modalities alongside active vision, on the Unitree G1 humanoid robot equipped with three-finger hands. Our analysis demonstrates that strategic sensor selection can outperform complex configurations in data-limited regimes. We introduce an open-source Unified Ablation Framework that utilizes sensor masking on a comprehensive master dataset to eliminate human variability. Results indicate that additional modalities often degrade performance for IL with limited data. A minimal active stereo camera setup outperformed complex multi-sensor configurations, achieving 87.5% success in spatial generalization. Conversely, adding pressure sensors reduced success from 94% to 67% due to a low signal-to-noise ratio. We conclude that in data-limited regimes, active vision offers a superior trade-off between robustness and complexity. While tactile modalities may require larger datasets to be effective, our findings validate that strategic sensor selection is critical for designing an efficient learning process.
|
| |
| 11:30-13:00, Paper ThuGA.38 | |
| Exploiting Foundation Model Guided BEV Maps for 3D Object Detection and Tracking |
|
| Käppeler, Markus | University of Freiburg |
| Çiçek, Özgün | Bosch |
| Cattaneo, Daniele | University of Freiburg |
| Glaeser, Claudius | Robert Bosch GmbH |
| Miron, Yakov | Bosch |
| Valada, Abhinav | University of Freiburg |
Keywords: Computer Vision for Transportation, Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: Camera-based 3D object detection and tracking are fundamental tasks in autonomous driving. Existing state-of-the-art approaches often rely exclusively on either perspective-view (PV) or bird’s-eye-view (BEV) features, limiting their ability to leverage both fine-grained object details and spatially structured scene representations. In this work, we propose DualViewDistill, a hybrid detection and tracking framework that incorporates both PV and BEV camera image features to leverage their complementary strengths. Our approach introduces BEV maps guided by foundation models, leveraging descriptive DINOv2 features that are distilled into BEV representations through a novel distillation process. By integrating PV features with BEV maps enriched with semantic and geometric features from DINOv2, our model leverages this hybrid representation via deformable aggregation to enhance 3D object detection and tracking. Extensive experiments on the nuScenes benchmark demonstrate that DualViewDistill achieves state-of-the-art performance. The results showcase the potential of foundation model BEV maps to enable more reliable perception for autonomous driving.
|
| |
| 11:30-13:00, Paper ThuGA.39 | |
| Next Best View for Text Detection and Recognition in Port Monitoring Unmanned Aerial Vehicles |
|
| Gülsoylu, Emre | University of Hamburg |
| Fiedler, Niklas | University of Hamburg |
| Frintrop, Simone | University of Hamburg |
Keywords: Aerial Systems: Applications, Computer Vision for Transportation, Motion and Path Planning
Abstract: Next-Best-View (NBV) planning is a critical capability for autonomous drones operating in complex, occluded environments. While NBV has been widely applied to tasks such as 3D reconstruction, object detection, and exploration, its use for scene-text detection and recognition, particularly in industrial settings, remains underexplored. This work addresses this gap by formalising NBV optimisation for identifying intermodal loading units (ILUs) in ports, where textual identifiers (e.g., ISO 6346 ID codes) can be occluded or degraded, leading to operational inefficiencies. We propose a two-mission approach for robust ILU identification. First, a survey mission captures nadir-view images using the Divide Areas Algorithm for Optimal Multi-Robot Coverage Path Planning (DARP), generating georeferenced 3D point clouds and orthophotos. These are processed via the Three-stage Identification of Transportation UnitS (TITUS) pipeline for ILU segmentation, text detection, and ID code recognition. However, survey missions are limited by their top-down perspective, which fails to capture legible ID codes on stacked or damaged ILUs. To resolve this, we introduce a targeted mission, where drones dynamically navigate to optimal viewpoints for text detection, guided by a novel Legibility Score (LS). The LS balances viewing angle, distance, and line-of-sight constraints to maximise ID code legibility while minimising flight time. The targeted mission leverages 3D point clouds from the survey mission to estimate each ILU’s pose. For each ILU face, candidate waypoints are sampled within a truncated half-cone and evaluated using the LS, which combines an angle term (viewing alignment) and a distance term (proximity). Waypoints are optimised using Ant Colony Optimisation, prioritising both path efficiency and legibility. This work proposes a domain-specific NBV utility function for text detection. 
Future work includes adaptive weighting for the LS and extending the framework to dynamic multi-drone coordination.
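The abstract describes the Legibility Score (LS) as combining an angle term (viewing alignment) and a distance term (proximity). The paper's exact formula is not given here, so the following is a hypothetical sketch: the convex weighting, the cosine angle term, the exponential distance term, and the optimal stand-off distance are all illustrative assumptions, not the authors' definition.

```python
import math

def legibility_score(view_angle_deg, distance_m,
                     best_distance_m=8.0, w_angle=0.6, w_dist=0.4):
    """Hypothetical Legibility Score sketch.

    Combines an angle term (viewing alignment) and a distance term
    (proximity) as a convex combination; all constants are assumed.
    """
    # Angle term: 1.0 when the camera faces the ILU text head-on,
    # falling to 0.0 at grazing angles.
    angle_term = max(0.0, math.cos(math.radians(view_angle_deg)))
    # Distance term: peaks at an assumed optimal stand-off distance.
    dist_term = math.exp(-abs(distance_m - best_distance_m) / best_distance_m)
    return w_angle * angle_term + w_dist * dist_term
```

A waypoint sampled inside the truncated half-cone would be scored with such a function and the best-scoring, line-of-sight-feasible candidates passed to the Ant Colony Optimisation path search.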
|
| |
| 11:30-13:00, Paper ThuGA.40 | |
| Robust Robotic Disassembly under Structural and Shape Uncertainty |
|
| Baumgärtner, Jan | Karlsruhe Institute of Technology |
| Fleischer, Jürgen | Karlsruhe Institute of Technology (KIT) |
Keywords: Disassembly, RIG TC: AI-Robotics in Industry, Task and Motion Planning
Abstract: To support the circular economy, robotic systems must not only assemble new products but also disassemble end-of-life (EOL) ones for reuse, recycling, or safe disposal. Existing approaches to disassembly sequence planning often assume deterministic and fully observable product models, yet real EOL products frequently deviate from their initial designs in both structure and shape due to wear, corrosion, or undocumented repairs. In this work, we argue that the uncertainty inherent in the structure of the EOL products should be formulated as a POMDP, and we propose a probabilistic task and motion planning framework for disassembly that can cope with such uncertainties. We furthermore show that uncertainty in the object shape can be addressed by autonomously designing grippers that consider shape uncertainty during grasp planning.
|
| |
| 11:30-13:00, Paper ThuGA.41 | |
| Synthesis Costs of Specialized Robot Controllers in an Object Retrieval and Delivery Scenario for Multi-Robot Systems |
|
| Leopardi, Paolo | University of Konstanz |
| Hamann, Heiko | University of Konstanz |
| Kuckling, Jonas | University of Konstanz |
| Kaiser, Tanja Katharina | University of Technology Nuremberg |
Keywords: RIG Cluster Multi-Robot Systems, RIG TC: Swarm Robotics, Swarm Robotics
Abstract: Designing control strategies for multi-robot systems is often guided by the intuition that dividing work into specialized roles simplifies individual robot behavior and improves overall system efficiency. This intuition is supported by biological examples of task partitioning and self-organized specialization. However, in engineered systems, specialization is not free: it introduces additional synthesis effort, coordination requirements, and interfaces between controllers that must function reliably under uncertainty. In this work, we investigate the cost of task specialization in multi-robot systems when controller synthesis is constrained by a limited evaluation budget. We study a two-stage object retrieval and delivery scenario in which robots must transport objects from a source to a target area. The task can be executed either by generalist robots that perform the full task end-to-end, or by task-specialist robots that split the task into sequential subtasks connected by an intermediate handoff. Robot controllers are synthesized using evolutionary optimization, represented as neural network policies. To ensure a fair comparison, evaluation budgets account for differences in task duration between specialist and generalist behaviors, while keeping the total number of evaluations constant. After optimization, controllers are deployed in a multi-robot setting and evaluated based on task completion performance. Our results show that, across all tested configurations, teams of generalist robots consistently outperform task-specialist teams. While specialized controllers for individual subtasks can be successfully synthesized in isolation, their combination leads to substantially lower system-level performance. Performance across specialist combinations varies widely, indicating strong sensitivity to weak links among subtasks. We attribute this performance gap primarily to task interdependence. 
In specialized systems, overall performance is constrained by the weaker subtask, and additional handoffs increase coordination demands and failure probabilities. In contrast, generalist robots contribute more independently to task completion, resulting in higher robustness under limited synthesis budgets.
|
| |
| 11:30-13:00, Paper ThuGA.42 | |
| Classical Trajectory Planning for Dual-Camera Visual Servoing on Edge Systems |
|
| Madavath, Abilash Philip | Cologne University of Applied Sciences |
| Aubeeluck, Chandra Yuvesh | Cologne University of Applied Sciences |
| Raju, Augustin | Cologne University of Applied Sciences |
| Pyschny, Nicolas | Cologne University of Applied Sciences |
| Zwanzig, Florian | Cologne University of Applied Sciences |
| Hackelöer, Felix | Cologne University of Applied Sciences |
Keywords: Visual Servoing, Sensor Fusion, Industrial Robots
Abstract: Dynamic grasping of moving objects in industrial environments requires tight synchronization between perception and robot actuation. Although deep learning has advanced object detection, inference latency on resource-constrained edge platforms can significantly reduce interception accuracy in high-speed conveyor systems. This paper presents a comparative analysis of classical trajectory prediction and interception algorithms implemented on an NVIDIA Jetson Orin Nano for industrial conveyor picking. We evaluate two prediction methods—RANSAC-based linear extrapolation and Kalman filtering—and benchmark five interception solvers using high-precision ground-truth data from an OptiTrack motion capture system. The results show that Kalman filtering achieves sub-2 ms execution times suitable for real-time control, while iterative numerical solvers outperform analytical closed-form solutions in robustness. Additionally, we quantify and compensate for systematic perception-to-action delays via temporal lead-time adjustment, providing practical guidelines for algorithm selection in real-time dynamic picking systems.
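The abstract reports Kalman filtering with sub-2 ms execution times and lead-time compensation for perception-to-action delay. A minimal sketch of such a constant-velocity Kalman predictor is shown below; the 1-D state model, noise parameters, and lead-time handling are illustrative assumptions, not the paper's implementation.

```python
def kalman_cv_predict(measurements, dt=0.02, lead_time=0.1,
                      q=1e-4, r=1e-2):
    """Constant-velocity Kalman filter over 1-D conveyor positions.

    Returns the position extrapolated `lead_time` seconds past the last
    measurement, compensating for a known perception-to-action delay.
    All noise parameters are assumed for illustration.
    """
    # State [pos, vel]; covariance P stored element-wise.
    pos, vel = measurements[0], 0.0
    p00, p01, p10, p11 = 1.0, 0.0, 0.0, 1.0
    for z in measurements[1:]:
        # Predict: x = F x, P = F P F^T + Q, with F = [[1, dt], [0, 1]].
        pos += dt * vel
        a00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q
        a01 = p01 + dt * p11
        a10 = p10 + dt * p11
        a11 = p11 + q
        p00, p01, p10, p11 = a00, a01, a10, a11
        # Update with scalar position measurement (H = [1, 0]).
        s = p00 + r
        k0, k1 = p00 / s, p10 / s
        y = z - pos
        pos += k0 * y
        vel += k1 * y
        n00 = (1 - k0) * p00
        n01 = (1 - k0) * p01
        n10 = p10 - k1 * p00
        n11 = p11 - k1 * p01
        p00, p01, p10, p11 = n00, n01, n10, n11
    # Temporal lead-time adjustment: aim ahead of the last observation.
    return pos + vel * lead_time
```

For an object moving at 0.5 m/s sampled every 20 ms, the filter converges to the true velocity and the 100 ms lead time shifts the interception point roughly 5 cm downstream.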
|
| |
| 11:30-13:00, Paper ThuGA.43 | |
| Smart Fabrics: A Scalable, Modular Approach to In-Place Printed Strain Sensors for Robotic Proprioception |
|
| Macher, Philipp Linus | Technical University of Darmstadt |
| Ali, Usama | Technische Universität Darmstadt |
| Gross, Roderich | Technical University of Darmstadt |
Keywords: RIG Cluster Multi-Robot Systems, RIG TC: Reconfigurable Robotics, RIG TC: Swarm Robotics
Abstract: We present a smart fabric with sensing ability that can scale to large sensor counts due to its modular structure. The fabric combines inkjet-printed dual-layer resistive strain sensors with distributed compute nodes. By comparing resistance changes on the two sensor layers, the system distinguishes stretching from bending within the same sensing element. Each compute node digitizes up to six sensors using a high-resolution ADC and transmits measurements via identifier-based CAN arbitration. Prototype measurements and throughput analysis support operation at approximately 4,750 sensors at 1 Hz, or 36 sensors at 125 Hz for the tested message format and acquisition pipeline. This architecture enables reconfigurable, large-area robotic fabrics with scalable wiring and incremental node addition.
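The throughput figures in the abstract (roughly 4,750 sensors at 1 Hz, or 36 at 125 Hz) are on the order of a classical CAN bus frame budget. A back-of-envelope check follows; the bit rate and frame size are assumptions, since the paper's tested message format is not given here.

```python
# Assumed parameters -- the abstract does not state them.
BITRATE_BPS = 500_000   # classical CAN, assumed 500 kbit/s
FRAME_BITS = 105        # assumed bits per data frame, including the
                        # arbitration ID, payload, CRC, and worst-case
                        # bit stuffing

# With one measurement per frame, the bus carries a fixed frame budget
# that can be spent on many slow sensors or few fast ones.
frames_per_second = BITRATE_BPS / FRAME_BITS
sensors_at_1_hz = int(frames_per_second // 1)
sensors_at_125_hz = int(frames_per_second // 125)

print(sensors_at_1_hz, sensors_at_125_hz)
```

The resulting counts land in the same range as the abstract's numbers, illustrating why sensor count trades off linearly against sampling rate for a fixed message format.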
|
| |
| 11:30-13:00, Paper ThuGA.44 | |
| Leveraging Generative Models to Learn Preference Vectors for Context-Based Multi-Objective Robot Navigation |
|
| Sethuraman, Tharun | Hochschule Bonn-Rhein-Sieg |
| Agrawal, Subham | University of Bonn |
| de Heuvel, Jorge | University of Bonn |
| Hassan, Teena | Bonn-Rhein-Sieg University of Applied Sciences |
| Bennewitz, Maren | University of Bonn |
Keywords: Human-Aware Motion Planning, Humanoid Robot Systems, RIG TC: Robotics Foundation Models
Abstract: Robots are increasingly deployed in environments where they share physical space with humans and collaborate on tasks. In such settings, humans expect robots to follow social norms and personal preferences, ensuring comfort, safety, and acceptance. These dynamic preferences are context-dependent, shaped by environmental factors, such as the room type or object locations, and necessitate context understanding to reflect human preferences. Recently, advancements in generative models have led to general-purpose reasoning systems with strong generalization and contextual understanding capabilities. These capabilities offer great potential for robot behavior in human-shared environments, as they allow robots to reason about their surroundings, but are often impractical for direct robot control due to high latency and resource consumption. In response, hybrid approaches using low-latency motion policies, such as Multi-Objective Reinforcement Learning (MORL) for low-level control, offer a viable alternative. Such multi-objective navigation approaches use a numerical vector to weigh the different objectives during runtime and, in this way, tune robot behavior to reflect user preferences. However, since numerical preference vectors are not intuitive for users, we propose a framework that uses multiple generative models to understand and maintain context-dependent preferences and translate them into vectors for MORL control.
|
| |
| 11:30-13:00, Paper ThuGA.45 | |
| Correct Robots before They Make Mistakes: Proactive Interactive Learning Framework Via Extended Reality |
|
| Jiang, Xinkai | Karlsruhe Institute of Technology |
| Zhou, Hongyi | Karlsruhe Institute of Technology |
| Vanjani, Pankhuri | Karlsruhe Institute of Technology |
| Li, Zhuoyue | Karlsruhe Institute of Technology |
| Baki, Ahmad | Karlsruhe Institute of Technology |
| Neumann, Gerhard | Karlsruhe Institute of Technology |
| Lioutikov, Rudolf | Karlsruhe Institute of Technology |
Keywords: Imitation Learning, Learning from Demonstration, Human Factors and Human-in-the-Loop
Abstract: Imitation learning has shown strong potential for training robot policies from human demonstrations, but its performance critically depends on large, high-quality datasets. In practice, limited data coverage often causes learned policies to encounter out-of-distribution states during execution, leading to compounding errors and task failures. Addressing these failures typically requires human intervention; however, existing approaches rely on post-hoc manual inspection and correction of collected data, which is labor-intensive and difficult to scale. Extended Reality (XR) offers a natural interface for human-in-the-loop robot learning by enabling intuitive visualization and interaction with 3D robot states, trajectories, and policy behavior. In this work, we propose an XR-based framework that allows humans to proactively correct robot data and policies before failures occur. By visualizing policy execution, users can provide timely, structured corrections that are directly integrated into the learning pipeline. Our approach shifts human correction from a reactive, offline process to a proactive, in-context interaction, reducing the need for manual dataset cleanup while improving data efficiency and policy robustness. This framework demonstrates the potential of XR as a scalable and effective tool for human-guided robot policy learning.
|
| |
| 11:30-13:00, Paper ThuGA.46 | |
| Robotics Data Management at Scale Via Query-Centric Storage |
|
| Krack, Pierre | University of Technology Nuremberg |
| Blei, Yannik | University of Technology Nuremberg |
| Jülg, Tobias Thomas | University of Technology Nuremberg |
| Walter, Florian | Technical University Munich |
| Burgard, Wolfram | University of Technology Nuremberg |
Keywords: Data Sets for Robot Learning, RIG Cluster: Learning and Multimodal AI for Robotics, Big Data in Robotics and Automation
Abstract: Robot learning research is increasingly constrained by data engineering. Datasets vary in structure, modalities, and file formats, requiring significant effort in reading documentation, writing parsers and extract-transform-load code, and converting large datasets into task-specific formats—only to repeat the process for every new dataset, model, or experiment. At its core, researchers face a data problem that has been studied extensively by the database community. By taking a database perspective, robotics datasets can be treated as structured, queryable collections rather than opaque files tied to specific training pipelines. In this paper, we analyze the data requirements of robot learning research and propose a query-centric approach to storing datasets. We show how heterogeneous robotics datasets can be explored, filtered, transformed, and combined using simple, high-performance SQL queries.
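The abstract's query-centric idea can be illustrated with a small sketch: episodes from heterogeneous datasets land in one table and are filtered and aggregated declaratively. The engine, schema, column names, and sample rows below are all hypothetical, chosen only to make the example self-contained (Python's built-in sqlite3 stands in for whatever storage backend the paper actually uses).

```python
import sqlite3

# Hypothetical episode-level schema; the paper does not specify one.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE episodes (
    id INTEGER PRIMARY KEY, dataset TEXT, task TEXT,
    robot TEXT, length_steps INTEGER, success INTEGER)""")
conn.executemany(
    "INSERT INTO episodes VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "bridge", "pick", "widowx", 120, 1),
     (2, "rt-1", "pick", "google_robot", 90, 0),
     (3, "bridge", "place", "widowx", 200, 1)])

# One declarative query explores, filters, and combines datasets --
# no per-dataset parser or conversion step required.
rows = conn.execute("""
    SELECT dataset, COUNT(*) AS n, AVG(success) AS success_rate
    FROM episodes
    WHERE task = 'pick'
    GROUP BY dataset
    ORDER BY dataset""").fetchall()
print(rows)  # [('bridge', 1, 1.0), ('rt-1', 1, 0.0)]
```

Swapping the WHERE clause or joining additional metadata tables yields new task-specific subsets without touching the underlying files.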
|
| |
| 11:30-13:00, Paper ThuGA.47 | |
| Natural Control – Hybrid Impedance on Series Elastic Actuators |
|
| Vonwirth, Patrick | RPTU University Kaiserslautern-Landau |
| Berns, Karsten | University of Kaiserslautern-Landau |
Keywords: Compliance and Impedance Control, Natural Machine Motion, RIG Cluster: Legged Locomotion
Abstract: Modern robots, especially humanoids, have made significant advances in actuation, control, and motion capabilities. However, they are still outperformed by their biological counterparts, particularly in adaptability and computational power. Studying natural control therefore offers significant potential to advance the fundamental principles of robot control. Modeled on the biological muscle–tendon complex and the first neural reflex circuits, natural control can be cast as a hybrid impedance approach: centralized whole-body force control, stabilized with distributed local damping.
|
| |
| 11:30-13:00, Paper ThuGA.48 | |
| Object Collection with Modular Robots in Aquatic Environments |
|
| Ali, Usama | Technische Universität Darmstadt |
| Lei, Zheshui | The University of Sheffield |
| Talamali, Mohamed S. | University of Sheffield |
| Argote-Gerald, Jahir | The University of Sheffield |
| Miyauchi, Genki | The University of Sheffield |
| Rau, Julian | Technical University of Darmstadt |
| Cao, Lin | University of Sheffield |
| Gross, Roderich | Technical University of Darmstadt |
Keywords: RIG Cluster Multi-Robot Systems, RIG TC: Reconfigurable Robotics, RIG TC: Multi-Robot Coordination
Abstract: Floating-object collection on water surfaces is an important capability for environmental cleanup and monitoring. We study object collection using a modular aquatic robot assembled into a U-shaped morphology that captures stationary objects by funneling and retaining them in a frontal cavity, avoiding precise grasping. The central technical challenge is scalability: as the number of modules increases, the number of independently actuated pump faces grows rapidly. We present a morphology-aware wrench mapping and a constant-size composite allocation method that realizes body wrench commands with nearly constant optimization dimension, independent of module count. In simulation, the composite allocator achieves collection performance comparable to a face-level (“granular”) allocator while providing substantial computational advantages at scale. We further show how cavity geometry should be adapted to object density and how modular resolution improves robustness to sensor/actuator faults with diminishing returns.
|
| |
| 11:30-13:00, Paper ThuGA.49 | |
| Leveraging 2D Foundation Models for 3D Segmentation |
|
| Knaebel, Karim | RWTH Aachen University |
| Yilmaz, Kadir | RWTH Aachen University |
| de Geus, Daan | Eindhoven University of Technology |
| Hermans, Alexander | RWTH Aachen University |
| Adrian, David Benjamin | Bosch Corporate Research & Ulm University |
| Linder, Timm | Robert Bosch GmbH |
| Leibe, Bastian | RWTH Aachen University |
Keywords: Deep Learning for Visual Perception
Abstract: Vision foundation models (VFMs) trained on large-scale image datasets provide high-quality features that have significantly advanced 2D visual recognition. However, their potential in 3D vision remains largely untapped, despite the common availability of 2D images alongside 3D point cloud datasets. While significant research has been dedicated to 2D--3D fusion, recent state-of-the-art 3D methods predominantly focus on 3D data, leaving the integration of VFMs into 3D models underexplored. In this work, we challenge this trend by introducing DITR, a simple yet effective approach that extracts 2D foundation model features, projects them to 3D, and finally injects them into a 3D point cloud segmentation model. DITR achieves state-of-the-art results on both indoor and outdoor 3D semantic segmentation benchmarks.
|
| |
| 11:30-13:00, Paper ThuGA.50 | |
| Data Generation Via Reinforcement Learning for Language-Conditioned Bimanual Dexterous Manipulation |
|
| Li, Zechu | Technische Universität Darmstadt |
| Jin, Yufeng | Technische Universität Darmstadt |
| Liu, Puze | German Research Center for Artificial Intelligence |
| Peters, Jan | Technische Universität Darmstadt |
| Chalvatzaki, Georgia | Technische Universität Darmstadt |
Keywords: Dexterous Manipulation, Reinforcement Learning
Abstract: A key bottleneck in training generalist policies for bimanual dexterous manipulation is the lack of large-scale, high-quality datasets. Synthetic data generation in simulation provides a scalable alternative to human video demonstrations by overcoming challenges such as morphology mismatch, missing physical interactions, and the generation of robot actions. We propose a systematic RL-based data generation framework that integrates generalizable reward design, effective domain randomization, and language-conditioned task annotations. This framework synthesizes diverse, high-quality datasets for dexterous bimanual manipulation and enables training of language-conditioned multi-task policies. Our experiments show that the generated data significantly improves generalization across a wide range of manipulation tasks.
|
| |
| 11:30-13:00, Paper ThuGA.51 | |
| Steering-Angle-Controlled Robotic Ultrasound for Spinal Imaging |
|
| Bi, Yuan | TUM |
| Duelmer, Felix | Technical University of Munich |
| Manalil, Larissa | TUM |
| Navab, Nassir | TU Munich |
Keywords: Medical Robots and Systems, RIG TC: Surgical Robotics
Abstract: Spinal interventions are commonly guided by fluoroscopy or computed tomography (CT), exposing both patients and clinicians to ionizing radiation. Robotic ultrasound (US) offers a real-time, radiation-free alternative, but accurate 3D reconstruction of the spine remains challenging due to limited visualization of surfaces aligned with the ultrasound propagation direction. We propose a robotic ultrasound scanning approach that dynamically controls the steering angle of a linear probe to enhance spinal surface visibility. Ultrasound images acquired at multiple steering angles are fused to generate a more complete 3D reconstruction of the spinal surface. Experimental results demonstrate improved reconstruction accuracy and completeness compared to fixed-angle scanning, achieving a mean error of 0.79 mm and a coverage of 80%. This approach provides improved 3D visualization of spinal anatomy and could potentially support downstream tasks such as image-guided spinal interventions.
|
| |
| 11:30-13:00, Paper ThuGA.52 | |
| Damage Risk Quantification for Robot Collisions Using Vision-Language Models |
|
| Kiemel, Jonas | Karlsruhe Institute of Technology |
| Oztop, Erhan | Osaka University / Ozyegin University |
| Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: RIG Cluster: Safety, Reliability and Resilience of AI-based Robotics, RIG TC: Safety and Reliability of AI-based Robotics, Collision Avoidance
Abstract: This work investigates the use of Vision-Language Models (VLMs) to estimate the risk of damage from robot-object collisions. Using a curated dataset of 100 images, each depicting a moving object or substance close to collision with a robot, we evaluate how state-of-the-art VLMs quantify the risk of damage to both the robot and the object on a scale from 0 to 10. While numerical outputs vary among models, an analysis across eight object categories shows that VLMs can produce plausible risk quantifications. Our dataset of everyday objects provides reference points for quantified risk values, enabling future VLM applications in damage-aware collision avoidance.
|
| |
| 11:30-13:00, Paper ThuGA.53 | |
| Task and Motion Planning for Humanoid Loco-Manipulation |
|
| Ciebielski, Michal | Technical University of Munich |
| Dhédin, Victor | Technical University of Munich |
| Khadiv, Majid | Technical University of Munich |
Keywords: RIG TC: Foundations of Optimization and Learning for Robotics, RIG Cluster: Legged Locomotion, RIG TC: Safety and Reliability of AI-based Robotics
Abstract: This work presents an optimization-based task and motion planning (TAMP) framework that unifies planning for locomotion and manipulation through a shared representation of contact modes. We define symbolic actions as contact mode changes, grounding high-level planning in low-level motion. This enables a unified search that spans task, contact, and motion planning while incorporating whole-body dynamics, as well as all constraints between the robot, the manipulated object, and the environment. Results on a humanoid platform show that our method can generate a broad range of physically consistent loco-manipulation behaviors over long action sequences requiring complex reasoning. To the best of our knowledge, this is the first work that enables the resolution of an integrated TAMP formulation with fully acyclic planning and whole body dynamics with actuation constraints for the humanoid loco-manipulation problem.
|
| |
| 11:30-13:00, Paper ThuGA.54 | |
| A Gamified Testbed for Teleoperated Robotics Enabled by Digital Twins and 6G |
|
| Sabanovic, Kevin-Ismet | TU Dortmund University |
| Schippers, Hendrik | TU Dortmund University |
| Heimann, Karsten | TU Dortmund University |
| Wietfeld, Christian | TU Dortmund University |
Keywords: Telerobotics and Teleoperation, Human-Robot Collaboration, Virtual Reality and Interfaces
Abstract: Teleoperation is a key enabler for robotic systems in complex and unpredictable environments. With the advent of 6G, immersive teleoperation using Virtual Reality (VR) and Digital Twins (DTs) is becoming a core service, offering intuitive control and robust perception through semantic state representations. We developed a teleoperation testbed that combines two industrial robotic arms with a virtualized air hockey scenario, enabling real-time interaction under different visual feedback modes. To showcase the system’s capabilities, we created a science communication video, which was successfully presented at public events, generating significant interest and demonstrating the potential of the platform. We evaluated four modes: high-quality 6G VR Video and 6G VR DT, and their impaired counterparts, impaired VR Video and impaired VR DT, under emulated wireless impairments. Players controlled the robots using gesture-based interaction in VR, ensuring consistent control while isolating the impact of visual feedback. Our findings reveal that digital twin feedback is significantly more robust to packet loss compared to video streaming. While impaired VR DT maintained near-optimal performance, impaired VR Video suffered from severe artifacts and interruptions, leading to reduced playability and confidence. Digital twin feedback achieved up to 70% higher offensive and defensive performance under degraded conditions, while requiring substantially less bandwidth. These results highlight the potential of semantic, state-based visual feedback to enhance both Quality of Experience (QoE) and reliability, providing a robust foundation for immersive teleoperation in challenging network environments.
|
| |
| 11:30-13:00, Paper ThuGA.55 | |
| A Force-Amplified Tendon–Pulley Finger for Humanoid Robotic Hands |
|
| Mueller, Tobias | Fulda University of Applied Sciences |
| Schultheis, Marius | Fulda University of Applied Sciences |
| Schreiner, Niklas | Alpaka Innovation |
Keywords: Human-Robot Collaboration, Humanoid Robot Systems, Grippers and Other End-Effectors
Abstract: Humanoid robotic hands require compact finger mechanisms with high force density to robustly grasp a wide range of real-world objects. This contribution presents a finger module with two actively driven degrees of freedom and integrated force amplification, combining electric motors with planetary gearboxes and a miniaturized block-and-tackle (4:1) tendon–pulley mechanism.
|
| |
| ThuMA |
|
| Interactive Session & Demos 3 & Coffee |
Interactive |
| |
| 15:00-16:20, Paper ThuMA.1 | |
| Out of the Cage: Advanced Functional Safety for Humanoids – Positron Safety AI Architecture |
|
| Weisshardt, Florian | Synapticon |
| Fröhlich, Tim | Synapticon |
| Volpert, Dieter | Synapticon |
| Lukin, Petr | Synapticon |
| Bharadwaj, Varun | Synapticon |
| Ingale, Abhilash | Synapticon |
| Habib, William | Synapticon |
| Ballesteros, Roque | Synapticon |
| Gofre, Jauri | Synapticon |
Keywords: Robot Safety, Humanoid Robot Systems, AI-Enabled Robotics
Abstract: As humanoid robots mature from research novelties to viable solutions in logistics, healthcare, and home assistance, they must exit the traditional industrial cage. However, existing safety paradigms - predicated on the assumption that a stopped robot is a safe robot - are insufficient for dynamically stable mobile robots. Humanoids introduce unique risks: they are mechanically unstable (inverted pendulums), heavy, and increasingly driven by non-deterministic AI controllers. Existing safety functions such as Safe Torque Off (STO) are dangerous for unstable bipedal systems. This paper introduces Positron Safety AI, a 3-layer architecture (Safe Motion, Safe Human Detection, AI Behavior) designed to address humanoid tipping hazards and AI non-determinism, and proposes a new separation distance calculation for humanoids based on the formula from ISO 10218-2:2025 Annex L.
|
| |
| 15:00-16:20, Paper ThuMA.2 | |
| Strongly Entangled Wire Harness Disentangling with Interactive Perception |
|
| Zhou, Zexu | University of Stuttgart |
| Zeh, Lukas | University of Stuttgart |
| Lechler, Armin | University Stuttgart |
| Verl, Alexander | University of Stuttgart |
Keywords: Intelligent and Flexible Manufacturing, RIG TC: Deformable Object Manipulation, RIG Cluster AI-Powered Industrial Robotics
Abstract: Researchers have spent decades working to enable robots to manipulate objects as humans do, yet dexterous manipulation of deformable objects remains a challenge. In the automotive industry, there has been significant interest in robotized wire harness assembly. At our institute, a series of depth-image-based tracking solutions for shape-variant cables and complex branched wire harnesses have been implemented. For more complex wire harnesses, graph-based topology matching is enabled using feature extraction. Strongly entangled wire harnesses, however, present an enormous challenge to perception. A disentangling solution inspired by interactive perception could address this problem.
|
| |
| 15:00-16:20, Paper ThuMA.3 | |
| Diffusion-Based Radar Point Cloud Enhancement for Robust 3D Perception |
|
| Xiong, Mengchen | Technical University of Munich |
| Peng, Yifei | Technical University of Munich |
| Xu, Xiao | Technical University of Munich |
| Steinbach, Eckehard | Technical University of Munich |
Keywords: RIG TC: Robot Perception, RIG Cluster: Rigorous Perception, Object Detection, Segmentation and Categorization
Abstract: Millimeter-wave radar is a robust sensing modality for autonomous perception, yet its utility for 3D tasks is often limited by inherent sparsity and multipath noise. In this work, we present a diffusion-based framework that reconstructs dense, LiDAR-like geometric representations from sparse radar data. Experimental results show that the enhanced radar point clouds effectively recover scene geometry and enable reliable performance in downstream 3D object detection.
|
| |
| 15:00-16:20, Paper ThuMA.4 | |
| Fast and Accurate Radar-Only Teach-And-Repeat Localization |
|
| Hilger, Maximilian | Technical University of Munich |
| Adolfsson, Daniel | Örebro University |
| Becker, Ralf | Bosch Rexroth |
| Andreasson, Henrik | Örebro University |
| Lilienthal, Achim J. | TU Munich |
Keywords: RIG Cluster: Field Robotics, RIG TC: Civil Safety Robotics, Localization
Abstract: Reliable localization in prior maps is crucial for autonomous navigation, especially in vision-degraded settings where optical sensors may fail. In this work, we present a teach-and-repeat localization pipeline utilizing a spinning radar, designed for robust and accurate performance in adverse conditions. Our method performs localization by jointly aligning incoming scans to stored keyframes from the teach pass and to a sliding window of recent live keyframes. We represent scans as a sparse set of oriented surface points, computed from Doppler-compensated measurements. The map is maintained as a pose graph whose nodes are traversed during localization. Experiments on the Boreas dataset demonstrate localization accuracies of 0.117 m and 0.096°, corresponding to improvements of up to 63% over the previous state of the art, while running efficiently at 29 Hz. These results reduce the gap to lidar-level localization, with the largest improvement observed in heading estimation.
|
| |
| 15:00-16:20, Paper ThuMA.5 | |
| Uncertainty-Aware Intention Prediction from Egocentric Video: A Controlled Comparison of Temporal Models |
|
| Schlegel, Patricia | University of Tuebingen |
| Gaus, Johannes Albert | University of Tuebingen |
| Wochner, Isabell | University of Tübingen |
| Haeufle, Daniel Florian Benedict | University of Tübingen |
|
|
| |
| 15:00-16:20, Paper ThuMA.6 | |
| Comparison of Omni-Directional Platforms for Mobile Manipulation |
|
| Hess, Daniel | University of Applied Sciences and Arts in Dortmund |
| Trinh, Buu Hai Dang | Dortmund University of Applied Sciences and Arts, Dortmund, Germany |
| Roehrig, Christof | Univ. of Appl. Sci. Dortmund |
|
|
| |
| 15:00-16:20, Paper ThuMA.7 | |
| Evaluating an MR-Based System for Human–Robot Assembly Training |
|
| Lang, Silvio | Technical University of Applied Sciences Würzburg-Schweinfurt (thws) |
| Pfister, Tom | Technical University of Applied Sciences Würzburg-Schweinfurt |
| Kaupp, Tobias | Technical University of Applied Sciences Würzburg-Schweinfurt |
|
|
| |
| 15:00-16:20, Paper ThuMA.8 | |
| A Comparative Study of Intuitive Teleoperation Interfaces for Dexterous Robotic Manipulation |
|
| Zhong, Weiqiang | Karlsruhe Institute of Technology (KIT) |
| Welte, Edgar | Karlsruhe Institute of Technology (KIT) |
| Rayyes, Rania | Karlsruhe Institute of Technology (KIT) |
Keywords: Telerobotics and Teleoperation, Dexterous Manipulation, Human-Robot Collaboration
Abstract: Teleoperating dexterous robotic hands remains challenging due to limited feedback, occlusions, and the difficulty of accurately mapping human hand motion to high-DOF robot joints. This work presents a comparative study of three teleoperation interfaces for controlling a Shadow Dexterous Hand: a haptic glove with force-feedback, a VR headset with hand tracking, and a custom vision-based stereo camera system. The objective of this study is to systematically compare these interfaces in terms of task performance, control reliability, usability, and user experience, and to identify their task-dependent trade-offs. All interfaces are evaluated within a unified teleoperation framework, sharing the same Shadow Hand control interface while relying on different sensing modalities. User experiments across three manipulation tasks (gesture imitation, pick-and-place, and pouring) provide objective performance measures and subjective user experience ratings, highlighting the strengths and limitations of each interface. This work reports preliminary results from an initial user study with 15 participants, intended to inform an ongoing and larger-scale evaluation.
|
| |
| 15:00-16:20, Paper ThuMA.9 | |
| Reinforcement Learning Control of Unstable Nonlinear Physical Systems: An Inverted Hydraulic Pendulum Application |
|
| Karaoglu, Selim | RWTH Aachen University, Institute for Fluid Power Drives and Systems (ifas) |
| Roeder, Patrick | RWTH Aachen University, Institute for Fluid Power Drives and Systems (ifas) |
| Brumand-Poor, Faras | RWTH Aachen University, Institute for Fluid Power Drives and Systems (ifas) |
| Schmitz, Katharina | RWTH Aachen University, Institute for Fluid Power Drives and Systems (ifas) |
|
|
| |
| 15:00-16:20, Paper ThuMA.10 | |
| Style-Biased Reinforcement Learning for Quadruped Locomotion |
|
| Ju, Siwei | Technische Universität Darmstadt |
| Peters, Jan | Technische Universität Darmstadt |
| Arenz, Oleg | Technische Universität Darmstadt |
Keywords: Imitation Learning, Legged Robots, RIG Cluster: Legged Locomotion
Abstract: Reinforcement learning has emerged as a powerful approach for learning locomotion policies, typically commanded through desired velocities or keyframes. However, such interfaces lack the spatial and temporal expressiveness needed to capture motion styles or to serve as context for low-level policies in hierarchical settings. When using more detailed references such as end-effector trajectories, manual tuning of reward coefficients becomes difficult. In addition, reference motions generated by high-level policies or originating from different embodiments (e.g., humans or dogs) are often physically infeasible, leaving the agent uncertain about when to deviate from them. To address these challenges, we propose a style-biased reinforcement learning (SBRL) framework that formulates hybrid reinforcement–imitation learning as a constrained optimization problem, automatically adjusting reward coefficients to satisfy predefined imitation error bounds. We further introduce a receding-horizon trajectory prediction module that improves temporal credit assignment. We evaluate our method on both simulated and real quadruped locomotion tasks with toe trajectory tracking, demonstrating that it achieves a more favorable Pareto frontier than prior state-of-the-art approaches.
|
| |
| 15:00-16:20, Paper ThuMA.11 | |
| UniFField: A Generalizable Unified Neural Feature Field for Visual, Semantic, and Spatial Uncertainties in Any Scene |
|
| Maurer, Christian | Technische Universität Darmstadt |
| Jauhri, Snehal | Technische Universität Darmstadt |
| Lueth, Sophie C. | Technische Universität Darmstadt |
| Chalvatzaki, Georgia | Technische Universität Darmstadt |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, RGB-D Perception
Abstract: Comprehensive visual, geometric and semantic understanding of a 3D scene is crucial for successful execution of robotic tasks, especially in unstructured and complex environments. While recent 3D neural feature fields enable robots to leverage pretrained vision models for tasks such as language-guided manipulation and navigation, existing methods are typically scene-specific and do not model prediction uncertainty. We present UniFField, a unified uncertainty-aware neural feature field that combines visual, semantic, and geometric features in a single generalizable representation while also predicting uncertainty in each modality. Our approach generalizes zero-shot to any new environment and incrementally integrates RGB-D images into our voxel-based feature representation as the robot explores the scene, while simultaneously updating its uncertainty estimates. We evaluate the quality of the uncertainty predictions and demonstrate their effectiveness in an active object search task with a mobile manipulator robot.
|
| |
| 15:00-16:20, Paper ThuMA.12 | |
| An Integrated Robotic Platform for Autonomous Fresco Assembly |
|
| Dengler, Nils | University of Bonn |
| Kreis, Benedikt | University of Bonn |
| Catalano, Manuel Giuseppe | Istituto Italiano Di Tecnologia |
| Tsagarakis, Nikos | Istituto Italiano Di Tecnologia |
| Bennewitz, Maren | University of Bonn |
Keywords: Manipulation Planning, Dual Arm Manipulation, Assembly
Abstract: Preserving cultural heritage is a fundamental challenge in modern archaeology, as it enables the transfer of knowledge across generations. However, this process is complicated by factors such as natural aging, environmental change, and human activities. In the case of the ancient city of Pompeii, countless archaeological treasures were damaged or destroyed by the eruption of Mount Vesuvius and later by bombings during the Second World War. Archaeological restoration is traditionally performed by hand and requires exceptional skill and patience, often taking months or years depending on the number of fragments. The reconstruction of ancient frescoes in this context is comparable to assembling a jigsaw puzzle with damaged or missing pieces and no reference image. In this work, we present an integrated robotic platform designed to support the reconstruction process in a safe and robust manner, as handling ancient fresco fragments differs fundamentally from industrial robotics tasks in structured environments. To this end, within the EU Horizon 2020 project RePAIR, we developed a dual-arm robotic system that integrates perception, motion planning, and grasping to enable precise manipulation and assembly of fragmented cultural heritage frescoes. Building upon game-theoretic puzzle-solving algorithms, we validate the system through real-world assembly trials under supervised conditions.
|
| |
| 15:00-16:20, Paper ThuMA.13 | |
| FBGA: A Forward-Backward Method for Online Time-Optimal Velocity Planning with Generic Acceleration Constraints |
|
| Piazza, Mattia | University of Trento |
| Piccinini, Mattia | Technical University of Munich |
| Taddei, Sebastiano | University of Trento, Politecnico Di Bari |
| Biral, Francesco | University of Trento |
| Bertolazzi, Enrico | University of Trento |
Keywords: Constrained Motion Planning, Optimization and Optimal Control, Motion and Path Planning
Abstract: We present FBGA, a new algorithm for time-optimal velocity planning under generic acceleration constraints. By extending previous forward-backward approaches to handle custom acceleration constraints, our FBGA matches the accuracy of optimal control baselines while being up to three orders of magnitude faster. Our open-source C++ implementation is available at: https://github.com/DRIVEWISE/FBGA.
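The forward-backward structure this abstract builds on can be illustrated with a minimal sketch. This is not the authors' FBGA implementation (which handles generic acceleration constraints); it is the classic special case with simple box limits, where a backward pass enforces braking feasibility and a forward pass enforces acceleration feasibility under a curvature-dependent speed cap. All function and parameter names here are illustrative.

```python
import math

def velocity_profile(curvatures, ds, v_cap, a_acc, a_brake, a_lat):
    """Time-optimal velocity profile along a sampled path (forward-backward sketch).

    curvatures: path curvature at each sample [1/m]
    ds: arc-length spacing between samples [m]
    v_cap, a_acc, a_brake, a_lat: speed cap and longitudinal/lateral accel limits
    """
    n = len(curvatures)
    # Pointwise speed limit from the lateral-acceleration constraint v^2*|k| <= a_lat.
    v = [min(v_cap, math.sqrt(a_lat / abs(k)) if k != 0 else v_cap)
         for k in curvatures]
    v[0] = 0.0   # start at rest
    v[-1] = 0.0  # stop at the end
    # Backward pass: every point must be reachable under the braking limit.
    for i in range(n - 2, -1, -1):
        v[i] = min(v[i], math.sqrt(v[i + 1] ** 2 + 2 * a_brake * ds))
    # Forward pass: every point must be reachable under the acceleration limit.
    for i in range(1, n):
        v[i] = min(v[i], math.sqrt(v[i - 1] ** 2 + 2 * a_acc * ds))
    return v

# Straight - curve - straight toy path, 5 m between samples.
profile = velocity_profile(
    curvatures=[0.0, 0.0, 0.2, 0.2, 0.0, 0.0], ds=5.0,
    v_cap=20.0, a_acc=2.0, a_brake=4.0, a_lat=3.0)
```

Each pass is a single sweep over the samples, which is why this family of methods is fast enough for online replanning; FBGA generalizes the per-point limits used in the two sweeps.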
|
| |
| 15:00-16:20, Paper ThuMA.14 | |
| Learning Quadruped Locomotion from Casual Videos |
|
| Hausdörfer, Oliver | Technical University of Munich (Chair of Safety, Performance and Reliability for Learning Systems, Prof. Angela Schoellig) |
| von Rohr, Alexander | Technical University of Munich |
| Skubacz, Filip | Technical University of Munich |
| Omar, Shafeef | Munich Institute of Robotics and Machine Intelligence, Technical University of Munich |
| Zhou, Siqi | Technical University of Munich |
| Khadiv, Majid | Technical University of Munich |
| Schoellig, Angela P. | TU Munich |
|
|
| |
| 15:00-16:20, Paper ThuMA.15 | |
| A Participatory Interview Study for Service Robots and Their Value for Care |
|
| Klein, Stina | University of Augsburg |
| Shen, Shuyuan | University of Augsburg |
| Andre, Elisabeth | University of Augsburg |
| Kraus, Matthias | University of Augsburg |
Keywords: Human-Centered Robotics, Long term Interaction, Social HRI
Abstract: Care facilities increasingly consider service robots (SRs) to mitigate staff shortages and documentation burden, yet adoption often stalls because robots misalign with care as a value-driven, relational practice. We report a participatory interview study in three German care facilities with caregivers (n=7) and care recipients (n=3), grounded in Value-Sensitive Design (VSD). Stakeholders identified credible SR roles in logistics, documentation support, reminders, and wayfinding/visitor guidance, but drew strong boundaries around intimate and safety-critical care tasks. Acceptance depends less on any single function than on whether SRs can adapt their initiative, autonomy, timing, and interaction modality to preserve human attentiveness and warmth, while sustaining care recipients' independence and addressing their safety concerns as well as caregivers' job security, control over their workload, and legal constraints. We synthesize these insights into a framing of fluid adaptivity as an operational bridge from abstract values to concrete robot behavior.
|
| |
| 15:00-16:20, Paper ThuMA.16 | |
| COFFAIL: A Dataset of Successful and Anomalous Robot Skill Executions in the Context of Coffee Preparation |
|
| Mitrevski, Alex | Chalmers University of Technology |
| Salunke, Ayush | Hochschule Bonn-Rhein-Sieg |
Keywords: Data Sets for Robot Learning, Learning from Demonstration
Abstract: In the context of robot learning for manipulation, curated datasets are an important resource for advancing the state of the art; however, available datasets typically only include successful executions or are focused on one particular type of skill. In this short paper, we briefly describe a dataset of various skills performed in the context of coffee preparation. The dataset, which we call COFFAIL, includes both successful and anomalous skill execution episodes collected with a physical robot in a kitchen environment, a couple of which are performed with bimanual manipulation. In addition to describing the data collection setup and the collected data, the paper illustrates the use of the data in COFFAIL to learn a robot policy using imitation learning.
|
| |
| 15:00-16:20, Paper ThuMA.17 | |
| Transparent Robot Skill Execution Using Visual Predictive Capabilities |
|
| Mitrevski, Alex | Chalmers University of Technology |
| Zhang, Jing | Chalmers University of Technology |
| Ramirez-Amaro, Karinne | Chalmers University of Technology |
| Dean, Emmanuel | Chalmers University of Technology |
Keywords: Cognitive Control Architectures, Learning from Experience
Abstract: Learning-based robot skills represent the current state of the art in robot manipulation, but they can struggle to generalise to out-of-distribution tasks, and failures may be difficult to understand and resolve. In our ongoing work, we aim to develop a more interpretable framework that combines a learned policy with a learned forward model and a learned semantic representation that facilitates monitoring and simplifies adaptation. In this short paper, we briefly describe the ideas we pursue in this direction, with a concrete focus on the forward modelling aspect. We particularly discuss two network-based forward model variants and illustrate some preliminary results of the obtained predictions on a domestic object pick-up task.
|
| |
| 15:00-16:20, Paper ThuMA.18 | |
| Robustness Evaluation of Uncertainty-Gated Intention Prediction with Noise and Dropouts |
|
| Mees, Hans | Eberhard Karls Universität Tübingen |
| Gaus, Johannes Albert | University of Tuebingen |
| Schmitt, Syn | University of Stuttgart, Germany |
| Haeufle, Daniel Florian Benedict | University of Tübingen |
|
|
| |
| 15:00-16:20, Paper ThuMA.19 | |
| LeARN: Learnable and Adaptive Representations for Nonlinear Dynamics in System Identification |
|
| Singh, Arunabh | Birla Institute of Technology and Science, Hyderabad |
| Mukherjee, Joyjit | Birla Institute of Technology and Science, Hyderabad |
Keywords: RIG TC: Principles and Methods for Building AI-powered Robust and Resilient Robots, Calibration and Identification, Model Learning for Control
Abstract: System identification, the process of deriving mathematical models of dynamical systems from observed input-output data, has undergone a paradigm shift with the advent of learning-based methods. Addressing the intricate challenges of data-driven discovery in nonlinear dynamical systems, these methods have garnered significant attention. Among them, Sparse Identification of Nonlinear Dynamics (SINDy) has emerged as a transformative approach, distilling complex dynamical behaviors into interpretable linear combinations of basis functions. However, SINDy's reliance on domain-specific expertise to construct its foundational 'library' of basis functions limits its adaptability and universality. In this work, we introduce a nonlinear system identification framework, LeARN, that transcends the need for prior domain knowledge by learning the library of basis functions directly from data. To enhance adaptability to evolving system dynamics under varying noise conditions, we employ a novel meta-learning-based system identification approach that utilizes a lightweight Deep Neural Network (DNN) to dynamically refine these basis functions. This not only captures intricate system behaviors but also adapts seamlessly to new dynamical regimes. We validate our framework on the Neural Fly dataset, showcasing its robust adaptation and generalization capabilities. Despite its simplicity, our LeARN achieves dynamical error performance competitive with SINDy. This work presents a step towards autonomous discovery of dynamical systems, paving the way for a future where machine learning uncovers the governing principles of complex systems without requiring extensive domain-specific interventions.
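The SINDy baseline the abstract refers to can be sketched in a few lines: stack a hand-chosen library of candidate basis functions into a matrix and solve for sparse coefficients by sequentially thresholded least squares. This toy example (a known linear system, illustrative names throughout) shows the fixed-library step that LeARN replaces with learned basis functions; it is not the LeARN method itself.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, iters=10):
    """Sequentially thresholded least squares, the sparse solver behind SINDy."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0          # prune weak terms from the model
        big = ~small
        if big.any():            # refit the surviving terms only
            xi[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return xi

# Data from the known system dx/dt = -2x, i.e. x(t) = exp(-2t).
t = np.linspace(0.0, 2.0, 200)
x = np.exp(-2.0 * t)
dxdt = -2.0 * x
# Hand-chosen candidate library: [1, x, x^2, x^3] -- the part SINDy
# requires domain expertise for, and which LeARN learns from data.
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])
xi = stlsq(theta, dxdt)
# xi is sparse: only the coefficient of x survives, close to -2.
```

The identified model is read off directly from the nonzero entries of `xi`, which is what makes the representation interpretable.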
|
| |
| 15:00-16:20, Paper ThuMA.20 | |
| Efficient Fine-Tuning of VLA Models for Industrial Manipulation |
|
| Wrede, Konstantin | Fraunhofer IIS/EAS |
| Di, Yibo | Fraunhofer IIS |
| Neumann, Julius | Fraunhofer IIS/EAS |
| Martin, Ron | Fraunhofer IIS/EAS |
| Schneider, Peter | Fraunhofer IIS/EAS |
Keywords: AI-Enabled Robotics, Data Sets for Robot Learning
Abstract: This work investigates the potential of Vision-Language-Action (VLA) models for industrial robotic manipulation, aiming to address the need for flexible automation solutions. By deploying and evaluating the open-source VLA model π0 (openpi) on a Franka Panda robot, this study analyzes the data efficiency and generalization capabilities of robot foundation models in industrial settings. We compare three training strategies: Real-only, Sim-and-Real Co-Training, and Sim-then-Real across two tasks of varying complexity: Plug Removal and Long-Horizon Object Sorting. Results demonstrate that the Sim-then-Real approach significantly outperforms other strategies, achieving high success rates with minimal real-world data (100% success on Plug Removal with only 10 real demonstrations). This study shows how to efficiently leverage simulation-based pre-training coupled with real-world fine-tuning in industrial robotic manipulation tasks.
|
| |
| 15:00-16:20, Paper ThuMA.21 | |
| Human-Interpretable Uncertainty Explanations for Point Cloud Registration |
|
| Gaus, Johannes Albert | University of Tuebingen |
| Schneider, Loris | Karlsruhe Institute of Technology |
| Shi, Yitian | Karlsruhe Institute of Technology |
| Lee, Jongseok | German Aerospace Center |
| Rayyes, Rania | Karlsruhe Institute of Technology (KIT) |
| Triebel, Rudolph | German Aerospace Center (DLR) |
|
|
| |
| 15:00-16:20, Paper ThuMA.22 | |
| Autonomous Docking of Multi-Rotor UAVs on Blimps under the Influence of Wind Gusts |
|
| Goldschmid, Pascal | University of Stuttgart |
| Ahmad, Aamir | University of Stuttgart |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control, Aerial Systems: Perception and Autonomy
Abstract: Multi-rotor UAVs face limited flight time due to battery constraints. Autonomous docking on blimps with onboard battery recharging and data offloading offers a promising solution for extended UAV missions. However, the vulnerability of blimps to wind gusts causes trajectory deviations, requiring precise, obstacle-aware docking strategies. To this end, this work introduces two key novelties: (i) a temporal convolutional network that predicts blimp responses to wind gusts, enabling rapid gust detection and estimation of points where the wind gust effect has subsided; (ii) a model predictive controller (MPC) that leverages these predictions to compute collision-free trajectories for docking, enabled by a novel obstacle avoidance method for close-range maneuvers near the blimp. Simulation results show that our method significantly outperforms a baseline using a constant-velocity model of the blimp across different scenarios. We further validate the approach in real-world experiments, demonstrating the first autonomous multi-rotor docking control strategy on blimps shown outside simulation. Source code is available at https://github.com/robot-perception-group/multi_rotor_airship_docking.
|
| |
| 15:00-16:20, Paper ThuMA.23 | |
| Joint Denoising and Motion Estimation with Event Cameras |
|
| Shiba, Shintaro | Keio University |
| Aoki, Yoshimitsu | Keio University |
| Gallego, Guillermo | Technische Universität Berlin |
Keywords: Computer Vision for Automation, RIG TC: Robot Perception, RIG Cluster: Rigorous Perception
Abstract: Event cameras are emerging vision sensors whose noise is challenging to characterize. Existing denoising methods for event cameras are often designed in isolation and thus consider other tasks, such as motion estimation, separately (i.e., sequentially after denoising). However, motion is an intrinsic part of event data, since scene edges cannot be sensed without motion. We propose the first method that simultaneously estimates motion in its various forms (e.g., ego-motion, optical flow) and noise. The method is flexible, as it allows replacing the one-step motion estimation of the widely-used Contrast Maximization framework with any other motion estimator, such as deep neural networks. The experiments show that the proposed method achieves state-of-the-art results on the E-MLB denoising benchmark and competitive results on the DND21 benchmark, while demonstrating effectiveness across motion estimation and intensity reconstruction tasks. Our approach advances event-data denoising theory and expands practical denoising use-cases via open-source code. Project page: https://github.com/tub-rip/ESMD
|
| |
| 15:00-16:20, Paper ThuMA.24 | |
| Towards Mixed-Reality-Based Robot Programming |
|
| Pfister, Tom | Technical University of Applied Sciences Würzburg-Schweinfurt |
| Lang, Silvio | Technical University of Applied Sciences Würzburg-Schweinfurt (thws) |
| Kaupp, Tobias | Technical University of Applied Sciences Würzburg-Schweinfurt |
|
|
| |
| 15:00-16:20, Paper ThuMA.25 | |
| Memory-Aware Environmental Knowledge Sharing for Cooperative Autonomous Robot Systems |
|
| Helten, Catharina | RPTU University of Kaiserslautern-Landau |
| Wolf, Patrick | University of Kaiserslautern-Landau | Fraunhofer IESE |
Keywords: RIG Cluster: Safety, Reliability and Resilience of AI-based Robotics, Cooperating Robots
Abstract: Cooperative autonomous operation requires effective coordination under uncertainty. Leader-follower convoying in off-road environments is challenged by limited visibility, which makes purely local perception and short-term data exchange insufficient. This paper presents a memory-aware leader-follower concept that introduces persistent, confidence-aware environmental knowledge as a basis for cooperation. Vehicles maintain structured short- and long-term representations of environmental context and selectively share abstract, confidence-annotated knowledge via vehicle-to-vehicle (V2V) communication. By explicitly distinguishing transient observations from persistent environmental properties and reasoning about their reliability, follower vehicles adapt their behavior through interpretable behavior modes. This enables robust and self-adaptive convoy operation under uncertainty without explicit path or trajectory sharing.
|
| |
| 15:00-16:20, Paper ThuMA.26 | |
| Task-Adaptive Perception for Human-Robot Interaction |
|
| Mania, Patrick | University of Bremen |
| Beetz, Michael | University of Bremen |
Keywords: RIG TC: Robot Perception, Perception for Grasping and Manipulation, RIG TC: Semantic Perception
Abstract: Human-robot interaction (HRI) requires perception systems that can handle diverse tasks such as multi-person re-identification, gaze estimation and control, gesture and activity recognition, body posture analysis, and spatial reasoning in dynamic crowds. These requirements span both single-shot tasks (e.g., classifying guest attributes) and continuous perception (e.g., tracking speakers or monitoring groups over time), often in unstructured environments where rigid pipelines are insufficient. However, no single perception algorithm can robustly address this spectrum: human detection, re-identification, gaze estimation, activity recognition, and continuous tracking differ fundamentally in their assumptions, data modalities, and temporal characteristics, making monolithic solutions impractical in real-world HRI. To address these limitations, our system RoboKudo enables generalized, task-adaptive perception through query-driven Perception Pipeline Trees (PPTs). PPTs use Behavior Trees (BTs) to compose specialized vision experts into task-specific pipelines at runtime, while a shared data structure allows annotations and belief state information to be maintained consistently across single-shot and continuous execution. By supporting reactivity, looping, and parallel inference within a unified process model, RoboKudo accommodates the heterogeneous demands of HRI. We demonstrate this capability in the Receptionist and Restaurant challenges at RoboCup@Home, which require seamless integration of diverse perception behaviors.
|
| |
| 15:00-16:20, Paper ThuMA.27 | |
| Task-Based Evaluation of Robot Foot Geometries for Granular Substrates Using Three-Dimensional Resistive Force Theory |
|
| Aslam, Umair | RWTH Aachen University |
| Adak, Omer Kemal | RWTH Aachen |
| Fuentes, Raul | RWTH Aachen |
Keywords: Legged Robots, Dynamics, Field Robots
Abstract: Foot geometry plays a critical role in legged robot locomotion on granular substrates, influencing thrust generation, energetic cost, load support, and stability. However, systematic evaluation of robot foot designs in three-dimensional granular interaction remains limited. In this paper, we present a compact, task-based framework for evaluating rigid robot foot geometries using three-dimensional resistive force theory (3D-RFT). A representative stance-phase interaction is defined, consisting of vertical intrusion followed by horizontal shear under prescribed yaw misalignment. Performance is quantified using stroke-integrated metrics capturing thrust, energetic cost, and peak yaw moment, complemented by a sinkage-based proxy for load support. The framework is applied to three representative foot geometries (a flat plate, a high-aspect-ratio ski, and a ribbed plate) under identical kinematic conditions. Results reveal clear trade-offs between shallow-sinkage load support, thrust generation, and yaw robustness, demonstrating the utility of 3D-RFT as a practical design evaluation tool for robot feet interacting with granular media.
|
| |
| 15:00-16:20, Paper ThuMA.28 | |
| Towards Whole-Body VLA: A Scalable Data Collection Framework for Quadrupedal Mobile Manipulators |
|
| Gao, Yuan | Technical University of Munich |
| Piccinini, Mattia | Technical University of Munich |
| Betz, Johannes | Technical University of Munich |
Keywords: Whole-Body Motion Planning and Control, Data Sets for Robot Learning, RIG TC: Robotics Foundation Models
Abstract: Quadrupedal mobile manipulators combine locomotion and manipulation for versatile operation in unstructured environments. While conventional model-based control ensures stability, it often lacks the generalization capabilities required for diverse daily tasks. Conversely, data-driven approaches offer a promising alternative but are hindered by a critical scarcity of unified whole-body data. To bridge this gap, we introduce a scalable dataset collection pipeline designed to enable the training of Vision Language Action (VLA) models. Our framework automates the generation of diverse demonstrations through a two-stage process: bootstrapping from expert model-based controllers and scaling via autonomous rollouts. This work provides the foundational data infrastructure to extend the success of VLA models to quadrupedal mobile manipulators.
|
| |
| 15:00-16:20, Paper ThuMA.29 | |
| ANN-CMCGS: Generalizing Continuous Monte Carlo Graph Search with Approximate Nearest Neighbors |
|
| Scherer, Christoph | Technical University Berlin |
| Hoenig, Wolfgang | Technical University Berlin |
Keywords: RIG Cluster: Learning and Multimodal AI for Robotics, Motion and Path Planning, Planning under Uncertainty
Abstract: Continuous Monte Carlo Graph Search (CMCGS) enables state reuse in continuous domains but relies on a layered, acyclic structure, limiting its effectiveness. We introduce ANN-CMCGS, a generalized, non-layered formulation to detect approximate transpositions in continuous spaces via approximate nearest-neighbor search. By allowing arbitrary directed graphs and enabling incremental reuse across decision steps, ANN-CMCGS demonstrates improved exploration efficiency and success rates in challenging continuous domains.
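The core idea of detecting approximate transpositions in a continuous state space can be sketched with a simple spatial hash: states that fall within the same cell are merged onto one graph node, so search effort is reused. This is an illustrative stand-in (all names hypothetical); ANN-CMCGS itself uses approximate nearest-neighbor search over arbitrary directed graphs rather than a fixed grid.

```python
import math

def make_merger(cell_size):
    """Approximate transposition detection via a spatial hash.

    Continuous states closer than roughly `cell_size` collapse onto the
    same graph node, so statistics gathered for one state are reused for
    its near-duplicates.
    """
    nodes = {}

    def node_for(state):
        # Quantize each coordinate; nearby states share a cell key.
        key = tuple(math.floor(c / cell_size) for c in state)
        return nodes.setdefault(key, {"state": state, "visits": 0})

    return node_for

node_for = make_merger(cell_size=0.5)
a = node_for((1.02, 3.40))
b = node_for((1.10, 3.45))  # same cell -> treated as a transposition of `a`
c = node_for((9.00, 0.00))  # distant state -> fresh node
a["visits"] += 1
b["visits"] += 1            # accumulates on the shared node
```

A grid hash misses near-duplicates that straddle a cell boundary, which is one reason a true approximate nearest-neighbor index is the more robust choice for this merging step.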
|
| |
| 15:00-16:20, Paper ThuMA.30 | |
| A Manipulation Pipeline for Grasping Unknown Objects in Heavy Clutter for Decontamination |
|
| Hyseni, Engjell | Karlsruhe Institute of Technology (KIT) |
| Nutto, Sebastian | Karlsruhe Institute of Technology (KIT) |
| Nefzer, Janna | Karlsruhe Institute of Technology (KIT) |
| De Diego Pérez, Miguel | Universitat Jaume I |
| Morales, Antonio | Universitat Jaume I |
| Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: RIG Cluster: Manipulate Anything, Manipulation Planning, Perception for Grasping and Manipulation
Abstract: The decontamination of nuclear waste remains a challenging and largely manual process, exposing human workers to physical strain and potential health risks due to radiation. In this work, we present a manipulation pipeline for grasping unknown objects in heavily cluttered environments, motivated by real-world decontamination scenarios addressed in the ROBDEKON project. The proposed system integrates a robust perception pipeline for scene segmentation, a manipulation framework for grasp generation and selection, and a failure detection and recovery mechanism. Our approach enables autonomous grasping of previously unseen objects in a cluttered scene from containers and prepares them for subsequent decontamination steps, thereby improving worker safety and increasing overall process efficiency.
|
| |
| 15:00-16:20, Paper ThuMA.31 | |
| Towards Maximum Distance and Accurate Throwing by Exploiting Dynamics of Robotic Manipulators |
|
| Barten, Moritz | Karlsruhe Institute of Technology |
| Meyer, Anne | Karlsruhe Institute of Technology |
| Roennau, Arne | Karlsruhe Institute of Technology (KIT) |
Keywords: RIG Cluster AI-Powered Industrial Robotics, RIG TC: AI-Robotics in Industry
Abstract: Modern warehouse logistics demand automated and highly efficient solutions to ensure rapid and reliable commissioning processes. Even though there are several approaches addressing robotic throwing, most of them rely on motion primitives, learning only end-effector velocities or specific joint behaviors. These restrictions prevent the robot from fully exploiting its dynamics, which limits the reachable task space and maximum throwing range. To address these gaps, this work proposes an architecture to exploit the dynamic capabilities of a robotic arm, optimizing both throwing distance and accuracy for stationary and moving targets. To this end, we will extend an existing optimization approach and combine it with reinforcement learning. The optimization module will compute an optimal release state for a maximum throwing distance, whereas the reinforcement learning agent will be trained to find a trajectory to the determined release state. In a second step, the trained agent for maximum distance throwing will serve as a pre-trained policy to warm-start the training of a second reinforcement learning agent addressing throwing accuracy. While purely data-driven RL suffers from limited sample efficiency and a lack of physical guarantees, we will incorporate physical knowledge directly into the loss terms of the underlying neural networks.
|
| |
| 15:00-16:20, Paper ThuMA.32 | |
| Development of Robotic Hands for Grasping of Deformable Objects |
|
| Hundhausen, Felix | Karlsruhe Institute of Technology |
| Moosmüller, Moritz | Karlsruhe Institute of Technology (KIT) |
| Ruffler, Daniel | Karlsruhe Institute of Technology (KIT) |
| Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Grippers and Other End-Effectors, RIG TC: Deformable Object Manipulation
Abstract: Manipulating deformable objects such as textiles remains a key challenge in robotics, requiring dexterous hardware and tactile sensing. This paper presents the design and development of an anthropomorphic five-finger robotic hand for deformable object manipulation. The fingers are realized using a four-bar mechanism to replicate human finger trajectories. We compare two distinctive thumb designs with two actuated degrees of freedom to allow pinch grasps. Each finger incorporates an embedded tactile sensing system without the need for finger internal cables. A hand-internal embedded system is designed for real-time control and sensor data processing. To evaluate the performance of the hand design in grasping deformable objects, a prototype is currently being built.
|
| |
| 15:00-16:20, Paper ThuMA.33 | |
| Composable and Interpretable Theory of Mind for Fluid Human-Robot Collaboration Via Behavior Trees |
|
| Schröder, Florian | Bielefeld University |
| Heinrich, Fabian | Bielefeld University |
| Kopp, Stefan | Bielefeld University |
Keywords: RIG Cluster: Human-Robot Interaction, Human-Robot Collaboration, Intention Recognition
Abstract: Fluid collaboration (FC) refers to highly flexible, real-time teamwork characterized by dynamic task and role allocation, as often observable in everyday human interaction. Enabling fluid human-robot collaboration requires robots to employ online Theory of Mind (ToM), inferring latent mental states such as goals and intentions of others to support decentralized planning. Key challenges include managing the computational complexity of ToM, enabling extensibility to new tasks and strategies, providing planning-relevant mental state representations, and ensuring interpretability and explainability for human-robot interaction. We present an approach to ToM based on Behavior Trees (BTs) to address these challenges and support fluid human-robot collaboration. Our approach offers extensibility, adaptable computational complexity, explicit integration of uncertainty, and reuse of the robot’s action policy for ToM, enabling action-driven mental state representations that map to the robot’s task knowledge.
|
| |
| 15:00-16:20, Paper ThuMA.34 | |
| Entity-Grounded Procedural Knowledge Graphs for Executable Task Understanding from Instructional Videos |
|
| Oguz, Cennet | German Research Center for Artificial Intelligence (DFKI) |
| Ostermann, Simon | Deutsches Forschungszentrum Für Künstliche Intelligenz |
| Neumann, Günter | DFKI GmbH & University of Saarland |
Keywords: Integrated Planning and Learning, Visual Learning, Visual Tracking
Abstract: Instructional videos contain rich procedural knowledge that could support robotic task execution. However, most existing video understanding approaches produce free-form captions or high-level action labels that lack the explicit, entity-centric semantics required for robotic planning. We present Entity-Grounded Procedural Knowledge Graphs (EGPKGs), a neuro-symbolic representation that decomposes instructional videos into explicit entity-level transformations with grounded preconditions and effects. EGPKGs integrate language-based action schemas, vision-based entity grounding, and symbolic state transitions to produce executable task representations suitable for AI-powered robotic systems.
|
| |
| 15:00-16:20, Paper ThuMA.35 | |
| Visual Event-Gait-Based Human Following for Quadruped Robots |
|
| Nguyen, Hong Phuoc Nguyen | Karlsruhe Institute of Technology (KIT) |
| Roennau, Arne | Karlsruhe Institute of Technology (KIT) |
Keywords: Human Detection and Tracking, Biologically-Inspired Robots, Recognition
Abstract: As personal service robots transition into human-centric environments, autonomous human-following capabilities are essential for practical deployment. This work presents a robust pipeline for gait-based human following utilizing an event-based camera, leveraging its high temporal resolution to track fast-moving subjects. Our approach employs a detection network to identify pedestrians, followed by a recognition network that utilizes unique walking gait biometric features for human identification. Once a target is identified, the system enables continuous tracking and autonomous following. We introduce a novel event frame representation that retains information from the previous accumulation window, significantly enhancing network performance in dynamic settings. Experimental results confirm the effectiveness of gait-based recognition in real-world scenarios, demonstrating high reliability even when subjects are unseen during the training phase.
|
| |
| 15:00-16:20, Paper ThuMA.36 | |
| Sampling-Based Trajectory Optimization for Humanoid Loco-Manipulation Motion Retargeting |
|
| Dhédin, Victor | Technical University of Munich |
| Khadiv, Majid | Technical University of Munich |
Keywords: RIG TC: Foundations of Optimization and Learning for Robotics, Humanoid Robot Systems, Multi-Contact Whole-Body Motion Planning and Control
Abstract: In this work, we present a sampling-based trajectory optimization framework that retargets imperfect kinematic humanoid loco-manipulation demonstrations into dynamically feasible motions. Our method leverages the temporal structure of the tracking objective by incrementally increasing the optimization horizon, enabling the use of single-shooting to optimize long trajectories efficiently. We validate the approach by successfully retargeting hundreds of demonstrated motions on a fully actuated humanoid interacting with a box. The framework also generalizes across varying object properties such as mass, size, and geometry with the exact same tracking objective. This ability to robustly retarget diverse demonstrations opens the door to generating large-scale synthetic datasets of humanoid loco-manipulation trajectories, addressing a major bottleneck in real-world data collection.
|
| |
| 15:00-16:20, Paper ThuMA.37 | |
| Race Car Aerobatics Via Position-Indexed Iterative-Learning Control |
|
| Wildberger, Lukas | RWTH Aachen University |
| Hose, Henrik | RWTH Aachen |
| Solowjow, Friedrich | RWTH Aachen University |
| Trimpe, Sebastian | RWTH Aachen University |
Keywords: RIG TC: Foundations of Optimization and Learning for Robotics, Wheeled Robots, Learning from Experience
Abstract: We develop a robust 1:10 scale race car platform for repeated autonomous jump experiments with controlled takeoff, free flight, and touchdown. A position-indexed iterative learning control (ILC) formulation is proposed to refine open-loop jump maneuvers from hardware data while mitigating timing variability. Using this approach, precise and reliable jumps exceeding 2 m are achieved within 50 learning iterations.
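The iteration described in this abstract can be illustrated with a toy position-indexed ILC update, u_{k+1}(s) = u_k(s) + L·e_k(s), applied per position bin rather than per time step so that timing variability does not smear the correction. The gain, trivial stand-in plant, and function names are illustrative assumptions, not the authors' formulation:

```python
import numpy as np

def ilc_update(u: np.ndarray, e: np.ndarray, gain: float = 0.5) -> np.ndarray:
    """One position-indexed ILC iteration: correct the feedforward input at
    each position bin by a fraction of that bin's tracking error."""
    return u + gain * e

# toy example: learn to track a constant reference through a unit-gain plant
ref = np.ones(100)          # desired output over 100 position bins
u = np.zeros(100)
for _ in range(20):
    y = u                   # trivial plant y = u (stand-in for the hardware)
    e = ref - y             # position-indexed tracking error
    u = ilc_update(u, e)
```

With a gain of 0.5 and this unit-gain plant, the residual error halves every iteration, mirroring how the paper's maneuvers converge over repeated hardware trials.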
|
| |
| 15:00-16:20, Paper ThuMA.38 | |
| Soft Drag Gripper - a Soft Simultaneous Multiple Object Gripper Designed to Work in Rectangle Boxes |
|
| Friedl, Werner | German Aerospace Center (DLR) |
Keywords: Grippers and Other End-Effectors, Robotics and Automation in Agriculture and Forestry, Soft Robot Applications
Abstract: In the field of logistics, humans have the ability to grasp multiple objects simultaneously. This paper presents a hardware solution in the form of the Soft Drag Gripper (SDG), which demonstrates excellent capabilities for grasping multiple objects simultaneously. The drag design enables efficient emptying of rectangular boxes. Simple strategies can be employed to increase the number of picks per grasp, thereby reducing time and costs. A benchmark compares the SDG to existing design solutions, showing a higher pick rate than other designs.
|
| |
| 15:00-16:20, Paper ThuMA.39 | |
| Including Meshed-Based Costlayers in Path Following Control |
|
| Braun, Justus | Osnabrück University |
| Mock, Alexander | Osnabrück University |
| kl. Piening, Malte | Nature Robots |
| Wiemann, Thomas | Fulda University of Applied Sciences |
Keywords: Task and Motion Planning, Motion and Path Planning, RIG Cluster: Field Robotics
Abstract: Robust collision-free navigation in uneven terrain is necessary for autonomous robots to be deployed in dynamic outdoor environments. The mobile robot navigation problem can be split into global path planning and path-following control. One method that has been shown to be an effective solution to the control problem is Model Predictive Control (MPC). Existing solutions either assume that the robot moves on a flat surface or use 2D height maps, which are limited to single-story environments. In this contribution, we provide a proof of concept that incorporates mesh-based cost maps into MPPI (Model Predictive Path Integral control, a sampling-based MPC variant). It uses a unified cost map representation of geometric traversability metrics derived from the mesh geometry and unmapped obstacles detected using 3D LiDAR sensors. We show that the representation of terrain geometry used by our controller enables safer behaviors than existing 2D and 3D methods.
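For readers unfamiliar with MPPI: it scores sampled control rollouts against a cost map (here mesh-based) and averages the sampled perturbations with softmin importance weights. A generic sketch of that weighting step, not the authors' code, with the temperature parameter `lam` chosen arbitrarily:

```python
import numpy as np

def mppi_weights(costs: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Softmin importance weights over sampled rollout costs, as used in
    MPPI to average control perturbations: low-cost rollouts dominate."""
    c = costs - costs.min()     # shift for numerical stability
    w = np.exp(-c / lam)
    return w / w.sum()
```

The controller then applies the weighted mean of the sampled control sequences; rollouts crossing high-cost mesh regions receive exponentially smaller weight.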
|
| |
| 15:00-16:20, Paper ThuMA.40 | |
| Human Gesture & Activity Recognition in Scene Context for Intralogistic Mobile Robots |
|
| Käs, Stephanie | RWTH Aachen University |
| Linder, Timm | Robert Bosch GmbH |
| Leibe, Bastian | RWTH Aachen University |
Keywords: Datasets for Human Motion, Gesture, Posture and Facial Expressions, Human-Robot Collaboration
Abstract: We study human gesture and activity recognition for intralogistic mobile robots using fisheye cameras. To address strong distortions, we propose a dynamic projection selection strategy for monocular 3D human pose estimation, validated on the new FISHnCHIPS dataset. Building on robust pose estimates, we evaluate gesture recognition using skeleton-based models and vision foundation models on NUGGET, highlighting current trade-offs between accuracy and adaptability for human–robot interaction.
|
| |
| 15:00-16:20, Paper ThuMA.41 | |
| Rendering Forces with a Modular Cable System, Motors, and Brakes |
|
| Bartels, Jan U. | Max-Planck Institute for Intelligent Systems |
| Achberger, Alexander | University of Stuttgart |
| Kuchenbecker, Katherine J. | Max Planck Institute for Intelligent Systems |
| Sedlmair, Michael | University of Stuttgart |
Keywords: Haptics and Haptic Interfaces, Virtual Reality and Interfaces, Human-Centered Robotics
Abstract: We describe the hardware design, force-rendering approach, and evaluation of a new reconfigurable haptic interface consisting of a network of hybrid motor-brake actuation modules that apply forces via cables. Each module contains both a motor and a brake, enabling it to smoothly render active forces up to 6 N using its motor and collision forces up to 186 N using its passive one-way brake. The modular design, meanwhile, allows the system to deliver rich haptic feedback in a flexible number of DoF and widely ranging configurations.
|
| |
| 15:00-16:20, Paper ThuMA.42 | |
| BATEX: Biarticular Soft Exosuit Assistance Improves Walking Efficiency |
|
| Ahmadi, Arjang | Technische Universität Darmstadt |
| Firouzi, Vahid | Technical University of Darmstadt |
| Seyfarth, Andre | TU Darmstadt |
| Rinderknecht, Stephan | TU Darmstadt |
| Findeisen, Rolf | Control and Cyber-Physical Systems Laboratory |
| Sharbafi, Maziar | Technische Universität Darmstadt |
Keywords: Wearable Robotics, RIG TC: Robotic Augmentation of the Human Body, RIG Cluster: Healthcare Robotics and Human Augmentation
Abstract: Human locomotion is a highly adaptive and efficient process shaped by biomechanical and neuromuscular control. However, aging and neuromuscular impairments can reduce walking efficiency and increase the metabolic cost of movement. This study investigates the Biarticular Thigh Exosuit (BATEX), a soft wearable device designed to assist both hip and knee joints through coordinated biarticular actuation. By supporting the function of the rectus femoris and hamstring muscle groups, BATEX aims to improve locomotor efficiency and reduce the energetic demands of walking. An experimental study with 12 participants evaluated the effects of BATEX on energy consumption. Results demonstrate that BATEX significantly lowers energy expenditure during walking, achieving a 9% reduction compared to the No-Exosuit (NE) condition and an 18% reduction compared to the Zero-Torque (ZT) condition. These findings indicate that BATEX can enhance walking efficiency while reducing neuromuscular effort, supporting its potential for applications in healthcare and human-robot interaction, such as mobility assistance.
|
| |
| 15:00-16:20, Paper ThuMA.43 | |
| A Human-Centered Perspective on Interactive Robot Learning |
|
| Beierling, Helen | Bielefeld University |
| Vollmer, Anna-Lisa | Bielefeld University |
Keywords: Human-Centered Robotics, Human-Robot Collaboration, Human Factors and Human-in-the-Loop
Abstract: Recent advances in artificial intelligence and robotics have enabled robots to increasingly enter everyday environments, including highly burdened domains such as healthcare. However, these contexts are strongly shaped by individual needs, making it infeasible to rely solely on preprogrammed robot behaviors. Therefore, robots must be trained by end users, who are often non-experts. While Human-in-the-Loop and interactive robot learning approaches address this challenge by incorporating user feedback, they commonly assume that users are able to provide effective and meaningful input. In practice, mismatches between users’ mental models and the robot’s learning process can slow down or hinder learning. To address these challenges, we developed an initial implementation of Co-Constructive Training (CCT) within a four-year research project. CCT conceptualizes robot learning as a mutual process in which both the user and the system monitor and scaffold each other, aiming to align user understanding and robot learning for more effective human–robot interaction.
|
| |
| 15:00-16:20, Paper ThuMA.44 | |
| Baseline Lower-Limb Kinematics Are Associated with Individual Responses to Exosuit Assistance |
|
| Firouzi, Vahid | Technical University of Darmstadt |
| von Stryk, Oskar | Technische Universität Darmstadt |
| Sharbafi, Maziar | Technische Universität Darmstadt |
Keywords: Prosthetics and Exoskeletons, RIG Cluster: Healthcare Robotics and Human Augmentation, RIG TC: Robotic Augmentation of the Human Body
Abstract: Individuals respond differently to exosuit assistance, with some experiencing metabolic benefits and others not. This study examined whether baseline lower-limb joint kinematics during unassisted walking differ between positive and negative responders to a passive biarticular exosuit. Subjects were classified based on metabolic cost changes across multiple assisted configurations, and unassisted joint kinematics were compared between groups. Significant differences were found in hip flexion–extension, hip abduction–adduction, and knee flexion–extension. These results suggest that baseline gait mechanics may help predict responsiveness to exosuit assistance.
|
| |
| 15:00-16:20, Paper ThuMA.45 | |
| Bipedal Robot Squatting Control Using Human Kinematics |
|
| Jiang, Yelin | Technical University of Darmstadt |
| Zhao, Guoping | Southeast University |
| Haufe, Dennis | Technische Universität Darmstadt |
| Findeisen, Rolf | Control and Cyber-Physical Systems Laboratory |
| Ahmad Sharbafi, Maziar | Technical University of Darmstadt |
Keywords: Humanoid and Bipedal Locomotion, RIG Cluster: Legged Locomotion, Biologically-Inspired Robots
Abstract: Controlling humanoid robot locomotion can be challenging, while biological systems demonstrate adaptive and robust locomotion with minimal control efforts. In that sense, human kinematics hold potential for locomotion controller design. Among various human movements, squatting serves as a fundamental behavior that integrates both stance and balance subfunctions of locomotion. This study investigates how observed human kinematics can be leveraged to control squatting motions in a humanoid robot. We propose a bioinspired open-loop controller to map human joint angles to robot reference trajectories. This controller was implemented on our simulation model and real robot. We explored the parameter space of joint gains to evaluate three key performance metrics: stability, efficiency, and similarity. Experimental results show that our kinematic-based controller is effective for human-like squatting behaviors. By tuning gains, trade-offs among stability, efficiency, and similarity can be achieved to obtain optimal performance. This work contributes to the control of bipedal squatting motion and the understanding of bio-inspired legged locomotion.
|
| |
| 15:00-16:20, Paper ThuMA.46 | |
| Large Language Models for Automatic Specification Design in Supervisory Control of Multi-Robot Systems |
|
| Isildar, Ecem | Technical University of Darmstadt |
| Miyauchi, Genki | The University of Sheffield |
| Gross, Roderich | Technical University of Darmstadt |
Keywords: RIG Cluster Multi-Robot Systems, RIG TC: Swarm Robotics, RIG TC: Multi-Robot Coordination
Abstract: This paper explores using Large Language Models (LLMs) to generate formal control specifications for Supervisory Control Theory (SCT) to ensure safe multi-robot navigation in unmapped environments. Manual specification design is labor-intensive, while direct LLM commanding poses significant safety risks, including collisions and unpredictable behavior in human-populated spaces; our approach bridges this gap by leveraging LLMs for formal synthesis. Results show that GPT-4.1 reliably generates valid specifications from predefined events, enabling collision- and deadlock-free collaborative patrolling. Furthermore, the model effectively filters redundant events and discovers strategies that improve environmental coverage, highlighting the potential for LLMs to accelerate safe controller design.
|
| |
| 15:00-16:20, Paper ThuMA.47 | |
| Robotics in Sensitive Settings: Lessons Learned from Case Studies Exploring Real-World Integration |
|
| Rixen, Jan Ole | Karlsruhe Institute of Technology |
| Gerling, Kathrin | KIT |
| Neef, Caterina | Karlsruhe Institute of Technology |
| Bruno, Barbara | Karlsruhe Institute of Technology (KIT) |
| Herzog, Olivia | Technical University of Munich |
| Ackermann, Marko | Karlsruhe Institute of Technology |
| Mombaur, Katja | Karlsruhe Institute of Technology |
| Pascher, Max | TU Dortmund University |
| Gerken, Jens | TU Dortmund University |
| Vollmer, Anna-Lisa | Bielefeld University |
Keywords: RIG TC: Human-Robot Interaction in Sensitive Settings, RIG Cluster: Human-Robot Interaction, Long term Interaction
Abstract: In this work, we draw on six case studies conducted to gain insights into how robots can support stakeholders in different sensitive settings. In this way, we identify and outline research opportunities for future work within the robotics community that bridges the gap between human-oriented and technical robotics research, enabling the development of solutions that are both effective and suitably aligned with societal demands.
|
| |
| 15:00-16:20, Paper ThuMA.48 | |
| AI-Ready Information Architecture for Smart Factories with RFID, Edge Intelligence, Digital Twins, and Policy Control |
|
| Nagrath, Vineet | Technical University of Munich (TUM) |
| Rajaei, Nader | Technical University of Munich |
| Lilienthal, Achim J. | TU Munich |
Keywords: Mapping, Integrated Planning and Control, RIG Cluster: Safety, Reliability and Resilience of AI-based Robotics
Abstract: Smart factories increasingly rely on Artificial Intelligence (AI) operating over shared physical infrastructure, heterogeneous robots, and multi-stakeholder environments. This paper presents an information architecture that positions RFID-based localization as a foundational sensing layer for AI-driven cyber-physical production systems (CPPS). The architecture integrates high-performance UHF RFID (Siemens SIMATIC RF600), edge computing, digital twins, and policy-governed control (AI.Lock) to provide real-time awareness of assets, personnel, robots, tools, and work-in-progress. By combining RFID proximity, reflected-power triangulation, digital twin baselines, and trajectory persistence, the system enables probabilistic localization, safety enforcement, experiment traceability, and scalable coordination across trustless, multi-vendor environments. The architecture supports use cases ranging from safety geofencing to automated bill-of-material verification and synthetic data generation for machine learning. This paper details system components, key capabilities, and application domains, positioning the information architecture as a critical enabler of safe, explainable, and reusable AI in Industry 4.0.
|
| |
| 15:00-16:20, Paper ThuMA.49 | |
| Evaluation and Future Prospects of the SHIVAA Strawberry-Picking Robot |
|
| Wirkus, Malte | German Research Center for Artificial Intelligence (DFKI) |
| Peters, Heiner | German Research Center for Artificial Intelligence (DFKI) |
| Janzen, Janne | German Research Center for Artificial Intelligence (DFKI) |
| Stark, Tobias | German Research Center for Artificial Intelligence (DFKI) |
| Stoeffler, Christoph | German Research Center for Artificial Intelligence (DFKI) |
Keywords: RIG TC: Agri-Robotics, Robotics and Automation in Agriculture and Forestry, RIG Cluster Multi-Robot Systems
Abstract: The SHIVAA robot was specifically designed for harvesting strawberries grown in outdoor environments. The system features a lightweight manipulator and a perception system based on multispectral camera images for strawberry detection and classification. A passive suspension mechanism ensures all-wheel contact with uneven open-field terrain. A series of field tests was conducted during the 2024 and 2025 strawberry seasons on different professional strawberry plantations at various stages throughout the season. The aim was to evaluate the system's performance in picking strawberries and navigating within rows of plants. Performance parameters such as manipulation success rate, damage or bycatch rate, and total output were determined from the data acquired during the field and outdoor laboratory tests. In addition to the field tests, opportunities to increase the operating speed of the system were identified. Video analysis revealed potential for optimizing high-level coordination, and laboratory tests determined the maximum manipulator speed. To obtain an initial limit value for maximum movement speed, optimal trajectory plans for the manipulator's upward and downward movements were generated using an iterative linear-quadratic regulator. Differential times of 0.8 seconds were feasible in laboratory experiments. During normal operation, the system's individual capabilities are combined to create an autonomous sequence control for the gripping process. Some sequential actions can also be performed in parallel to save time. For example, the manipulator can be moved to its rest position at the same time as moving to the next harvesting section. Additionally, the linear joint can be integrated into the manipulator control system, meaning it no longer needs to be controlled individually during harvesting or fruit placement. Currently, the robot is being further developed to operate within a hybrid team of human field workers and other robots to complete the field logistics.
|
| |
| 15:00-16:20, Paper ThuMA.50 | |
| An Open-Source Humanoid Research Platform for Democratizing Robotics (pib Introduction for Researchers) |
|
| Okujava, Shota | Isento GmbH |
| Baier, Jürgen | Isento GmbH |
Keywords: Education Robotics, Developmental Robotics, Embedded Systems for Robotic and Automation
Abstract: Research in humanoid robotics is often hindered by high hardware costs and proprietary software barriers. This paper introduces pib (printable intelligent bot), an open-source, 3D-printable research platform designed to democratize access to advanced robotics through a modular hardware approach and agile development processes. Built on industry standards like ROS 2 and Onshape, pib features a comprehensive digital twin environment in Webots and MuJoCo to facilitate seamless Sim2Real transfer and Reinforcement Learning. The platform advances Human-Robot Interaction (HRI) by integrating LLMs and a LangGraph-based "Intelligence Node" for orchestrating complex sensor-actor workflows. Supported by the cloud-based perception platform TRYB, pib accelerates the development of vision models and offers a scalable ecosystem for future mobility and embodied AI research.
|
| |
| 15:00-16:20, Paper ThuMA.51 | |
| Learning to Race in Minutes: Infoprop Dyna on the Mini Wheelbot |
|
| Subhasish, Devdutt | RWTH Aachen University |
| Hose, Henrik | RWTH Aachen |
| Trimpe, Sebastian | RWTH Aachen University |
Keywords: RIG TC: Foundations of Optimization and Learning for Robotics, RIG Cluster: Learning and Multimodal AI for Robotics
Abstract: Reinforcement Learning (RL) has the potential to enable robots with fast, nonlinear, and unstable dynamics to reach the limits of their performance. However, most recent advances rely on carefully designed physics-based simulators and domain randomization to achieve successful sim-to-real transfer within reasonable wall-clock time. In this work, we bypass the need for such simulators and demonstrate that Infoprop Dyna, a state-of-the-art uncertainty-aware model-based reinforcement learning (MBRL) framework, can enable robots to learn directly from real-world interactions. Using Infoprop Dyna, the Mini Wheelbot, an underactuated unicycle robot, learns to race around a track within 11 minutes of real-world experience.
|
| |
| 15:00-16:20, Paper ThuMA.52 | |
| Distributed Boat Detection Via Acoustic Buoy Networks with Consensus-Based Fusion |
|
| Matzdorf, Felix | Technical University of Darmstadt |
| Talamali, Mohamed S. | University of Sheffield |
| Rau, Julian | Technical University of Darmstadt |
| Miyauchi, Genki | The University of Sheffield |
| Watteyne, Thomas | Inria |
| Gross, Roderich | Technical University of Darmstadt |
Keywords: RIG Cluster Multi-Robot Systems, RIG TC: Swarm Robotics, RIG TC: Networked Robotics
Abstract: Multi-agent sensing enables spatially distributed measurements that improve coverage and robustness compared to a single platform. We explore the use of a network of buoys equipped with microphone arrays with the purpose of detecting passing boats. Each buoy estimates time differences of arrival from short audio windows using the generalized cross-correlation with phase transform and derives a local direction-of-arrival under a plane-wave approximation. We fuse measurements by exchanging limited information parameters and running decentralized consensus, yielding a global maximum a posteriori estimate without transmitting raw audio. Simulation results suggest that localization error increases with measurement noise, whereas additional buoys improve accuracy and convergence. These findings indicate a scalable approach for distributed acoustic boat detection.
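The first processing step named in this abstract, generalized cross-correlation with phase transform (GCC-PHAT), can be sketched compactly: whiten the cross-spectrum so that only phase (i.e., delay) information remains, then pick the peak lag. This is a generic textbook version under our own assumptions (function name, regularization constant, frame lengths), not the authors' implementation:

```python
import numpy as np

def gcc_phat(sig: np.ndarray, ref: np.ndarray, fs: float) -> float:
    """Estimate the time difference of arrival (seconds) of `sig` relative
    to `ref` using generalized cross-correlation with phase transform."""
    n = sig.size + ref.size
    X = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    X /= np.abs(X) + 1e-12                 # PHAT weighting: keep phase only
    cc = np.fft.irfft(X, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs
```

Pairwise TDOAs from a microphone array then yield the local direction-of-arrival under the plane-wave approximation, which is the per-buoy quantity fused by the consensus step.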
|
| |
| 15:00-16:20, Paper ThuMA.53 | |
| Magnetic Jamming for Reconfigurable Robotic Structures |
|
| Aktas, Buse | Robotic Composites and Compositions Group, Max Planck Institute for Intelligent Systems |
| Kim, Minsoo | ETH Zurich |
| Bäckert, Marc | ETH Zurich |
| Posada, Alejandro | Max Planck Institute for Intelligent Systems |
|
|
| |
| 15:00-16:20, Paper ThuMA.54 | |
| Applications and Functional Extensions of Vision–Language–Navigation Models in Indoor Environments |
|
| Chen, Donglin | Northeastern University |
| Zhang, Jiazhao | Peking University |
| Liu, Jiahang | Harbin Institute of Technology (Shenzhen) |
| Qiguan, Shiqun | University of Hamburg |
| Liu, Shang-Ching | Universität Hamburg |
| Wang, He | Peking University |
| Zhang, Jianwei | Hamburg University |
Keywords: AI-Enabled Robotics, AI-Based Methods, Vision-Based Navigation
Abstract: In this work, we propose a unified framework for mobile robots based on a vision–language–navigation (VLN) model, enabling navigation-driven execution of multiple indoor tasks. Specifically, we formulate object counting, object searching, and human-instruction-based navigation within a single navigation paradigm, allowing a robot to interpret natural language instructions and accomplish diverse goals through consistent action planning. To validate the feasibility of our approach, we conduct preliminary deployment and evaluation in simulation. Our preliminary results indicate that the framework's performance is constrained by several practical factors, including the gap in data quality, the scale of fine-tuning, the level of instruction specificity, and the flexibility of action control. These factors also point to promising directions for future work. The code has been open-sourced to facilitate further research and development.
|
| |
| 15:00-16:20, Paper ThuMA.55 | |
| Toolbox of Modular Components to Demonstrate Reconfigurable Space Robots |
|
| Langosz, Malte | DFKI GmbH |
| Brinkmann, Wiebke | DFKI Robotics Innovation Center Bremen |
| Schilling, Moritz | University of Bremen |
| Eisenmenger, Jonas | DFKI GmbH |
| Wirkus, Malte | DFKI GmbH |
Keywords: RIG TC: Reconfigurable Robotics, RIG Cluster Multi-Robot Systems, RIG TC: Space Robotics for Sustainability and Exploration
Abstract: The MODKOM (Modular components as Building Blocks for application-specific configurable space robots) project aims to create a toolbox that allows robots to be configured and recombined for specific tasks using specialized, standardized building blocks, throughout different mission phases. To showcase the toolbox's capabilities, a demonstration scenario was created using a selection of hardware and software components. The video shows the demonstration scenario involving autonomous docking, rover reconfiguration and payload deployment, all of which are embedded within a broader mission context.
|
| |
| 15:00-16:20, Paper ThuMA.56 | |
| Contact-Implicit Optimization for Sequential Object Placements |
|
| Zhang, Yuezhe | Technische Universität Darmstadt |
| Tateo, Davide | Lund University |
| Chalvatzaki, Georgia | Technische Universität Darmstadt |
Keywords: Optimization and Optimal Control, Collision Avoidance
Abstract: Robotic object packing has attracted significant research interest in both academia and the automation industry over the last decade. It is challenging due to the curse of dimensionality in the combinatorial search and the difficulty of dealing with dynamic and contact constraints for irregular-shaped objects. Current heuristic and learning-based methods assume a limited resolution of spatial discretization and overlook the significance of contacts. In this work, we eliminate these assumptions by introducing a contact-implicit optimization framework that naturally incorporates contact constraints into Signed Distance Functions. We use a convex decomposition module to divide the collision-free space into various convex sets, which yields tight solutions for convex objects. For non-convex objects, we divide the obstacle space into convex sets and parallelize the collision checking to improve efficiency. Through extensive evaluations on a variety of irregular-shaped objects and comparison with existing methods, we demonstrate that our method can handle convex and non-convex object placements and leads to better performance in terms of packing utility and computational efficiency.
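The idea of expressing contact and collision constraints through Signed Distance Functions can be illustrated with a standard axis-aligned-box SDF: a placement of a spherical object is collision-free when the SDF at its center is at least its radius, and contact corresponds to the SDF equaling the radius. This is a generic sketch (our function names and the sphere simplification), not the paper's formulation:

```python
import numpy as np

def sdf_box(p: np.ndarray, half_extents: np.ndarray) -> float:
    """Signed distance from point p to an axis-aligned box centered at the
    origin (negative inside, positive outside)."""
    q = np.abs(p) - half_extents
    outside = np.linalg.norm(np.maximum(q, 0.0))
    inside = min(q.max(), 0.0)
    return outside + inside

def collision_free(center: np.ndarray, r: float, half_extents: np.ndarray) -> bool:
    """A sphere of radius r does not penetrate the box iff sdf(center) >= r."""
    return sdf_box(center, half_extents) >= r
```

Because the SDF is a smooth(-ish) scalar function of the placement variables, constraints like `sdf(center) >= r` can be handed directly to a continuous optimizer instead of a discretized grid search.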
|
| |