Last updated on March 18, 2026. This conference program is tentative and subject to change.
Technical Program for Friday March 13, 2026
FrZA: Interactive Session & Demos 4 & Coffee (Interactive)

10:45-12:05, Paper FrZA.1
Neuromusculoskeletal Modeling of Human Bipedal Gaits: Exploring the Role of Reflexes in Locomotion
| Bunz, Elsa Katharina | University of Stuttgart |
| Haeufle, Daniel Florian Benedict | University of Tübingen |
| Schmitt, Syn | University of Stuttgart, Germany |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Humanoid and Bipedal Locomotion, Biologically-Inspired Robots
Abstract: Human locomotion is characterized by impressive robustness and versatility that are currently unmatched by humanoid robots. These abilities arise from a complex interplay between the musculoskeletal system, spinal circuits, and input from supraspinal centers. Despite decades of research, the neural mechanisms underlying this intricate human motor control remain controversial. The most basal control components in the central nervous system are reflexes: involuntary responses to sensory stimuli that occur via neural pathways in the spinal cord. While their importance in reacting to perturbations has been widely accepted, the role of reflexes as a stand-alone control component is unclear. In several works, we used neuromusculoskeletal simulations to investigate the role of reflexes in robust and versatile locomotion. The results show that reflexes play an important role in ensuring robust locomotion and can replicate versatile human locomotion involving different gaits and speed adaptations. They support the idea that reflexes are a powerful control primitive and encourage future research on implementing reflexive control on humanoids, prostheses, orthoses, and exoskeletons.

10:45-12:05, Paper FrZA.2
Autonomous Robotics As an Enabler for Sustainable Agroforestry Systems
| Troesken, Lennart | Technical University of Munich (TUM) |
| Duecker, Daniel Andre | Technical University of Munich (TUM) |
Keywords: RIG TC: Agri-Robotics, RIG Cluster: Field Robotics, Agricultural Automation
Abstract: Robotic technologies in agriculture have primarily focused on automating existing machinery in large-scale, homogeneous production systems. As a result, their applicability to structurally heterogeneous and ecologically driven farming paradigms remains limited, particularly with respect to long-term autonomy, learning, and system-level integration. This work argues for a shift in perspective toward autonomous robotics as a system enabler for sustainable agriculture. Agroforestry is considered as an emerging farming paradigm and reference system that exposes fundamental challenges for agricultural robotics. The paper outlines research directions on long-term autonomy, learning, human–robot interaction, and extended planning horizons.

10:45-12:05, Paper FrZA.3
Underwater Manipulation Wrench Estimation with Small-Scale Robots
| Graf, Moritz | Technical University of Munich |
| Duecker, Daniel Andre | Technical University of Munich |
Keywords: RIG TC: AI-driven Marine Robotics, RIG Cluster: Field Robotics
Abstract: Performing dexterous manipulation underwater with small-scale robots remains a challenging problem due to unpredictable current disturbances and the impracticality of integrating force/torque sensors. We propose a model-based wrench estimator, relying solely on onboard IMU and DVL measurements, together with a probabilistic interaction detection method based on a Gaussian mixture model that treats wrenches resulting from steady currents as a quasi-static background and interaction wrenches as a dynamic foreground. The proposed method is evaluated in practical experiments in an indoor basin.
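The background/foreground idea can be sketched numerically. The paper fits a Gaussian mixture; this hypothetical NumPy sketch uses a single-Gaussian background model and flags interactions by Mahalanobis distance (the threshold, dimensionality, and data are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def fit_background(wrenches):
    """Fit a Gaussian to quasi-static background wrenches.
    (The paper uses a Gaussian mixture; one component keeps the sketch short.)"""
    mu = wrenches.mean(axis=0)
    cov = np.cov(wrenches, rowvar=False) + 1e-9 * np.eye(wrenches.shape[1])
    return mu, np.linalg.inv(cov)

def is_interaction(wrench, mu, cov_inv, threshold=16.0):
    """Flag a wrench as dynamic foreground when its squared Mahalanobis
    distance from the background model exceeds a chi-square-style threshold."""
    d = wrench - mu
    return float(d @ cov_inv @ d) > threshold

rng = np.random.default_rng(0)
background = rng.normal(0.0, 0.1, size=(500, 6))   # 6-D wrenches under steady current
mu, cov_inv = fit_background(background)

contact = np.full(6, 2.0)                          # strong interaction wrench
print(is_interaction(contact, mu, cov_inv))        # True: far outside background
```

In a real system the background model would be refit online so that slowly varying currents stay in the quasi-static foreground/background split.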

10:45-12:05, Paper FrZA.4
Vision-Based Screw Detection and Robotic Unscrewing
| Menetrey-Meinhold, Sara | Chemnitz University of Technology |
| Schlegel, Holger | Technische Universität Chemnitz |
| Rehm, Matthias | Chemnitz University of Technology |
| Dix, Martin | Chemnitz University of Technology |
Keywords: Disassembly, AI-Enabled Robotics, RGB-D Perception
Abstract: Automated disassembly is an important topic given the limited resources available on Earth. The most established nondestructive process remains unscrewing. However, there is still no system on the market that allows automated unscrewing in a robust and highly flexible manner. Most research focuses on the removal of very specific screws in predefined environments. This document presents a solution tested on screws from M5 to M8, with hexagonal, Torx, and Allen drives and different head types, placed in orientations of up to 51° from the vertical. The system includes a robotic arm, an AI-assisted 3D vision module, and an industrial screwdriver adapted for unscrewing. Experiments showed a detection rate of 84% to 100% and an unscrewing rate between 88% and 100%, both depending on the screw type. Overall, including the detection, the unscrewing, and the removal of the screws from the working table, the system is successful at least 80% of the time and shows potential for improvement.

10:45-12:05, Paper FrZA.5
Learned Incremental Nonlinear Dynamic Inversion for Quadrotors with and without Slung Payloads
| Cobo-Briesewitz, Eckart | TU Berlin |
| Wahba, Khaled | TU Berlin |
| Hoenig, Wolfgang | TU Berlin |
Keywords: RIG Cluster: Learning and Multimodal AI for Robotics, Machine Learning for Robot Control, Model Learning for Control
Abstract: The increasing complexity of multirotor applications demands flight controllers that can accurately account for all forces acting on the vehicle. Conventional controllers model most aerodynamic and dynamic effects but often neglect higher-order forces, as their accurate estimation is computationally expensive. Incremental Nonlinear Dynamic Inversion (INDI) offers an alternative by estimating residual forces from differences in sensor measurements; however, its reliance on specialized and often noisy sensors limits its applicability. Recent work has demonstrated that residual forces can be predicted using learning-based methods. In this paper, we show that a neural network can generate smooth approximations of INDI outputs without requiring additional sensor inputs. We further propose a hybrid approach that integrates learning-based predictions with INDI and demonstrate both methods for multirotors and multirotors carrying slung payloads. Experimental results on trajectory tracking errors demonstrate that the specialized sensor measurements required by INDI can be eliminated by replacing the residual computation with a neural network.
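The residual quantity such a network learns to predict can be illustrated with a toy translational point-mass model. All numbers, the nominal-model form, and the function name below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])   # world-frame gravity [m/s^2]

def indi_residual(mass, accel_meas, thrust_cmd):
    """Residual force unexplained by the nominal model, recovered INDI-style
    from measured acceleration; a network can be trained to predict this
    label from state alone, removing the need for the noisy measurement."""
    f_model = thrust_cmd + mass * GRAVITY   # nominal: commanded thrust + gravity
    f_total = mass * accel_meas             # Newton's law with measured accel
    return f_total - f_model

# hypothetical hover with a small lateral drag/wind residual
mass = 0.03                                    # [kg]
accel = np.array([0.1, 0.0, 0.0])              # measured acceleration [m/s^2]
thrust = np.array([0.0, 0.0, mass * 9.81])     # hover thrust command [N]
print(indi_residual(mass, accel, thrust))      # ~[0.003, 0, 0] N unmodeled force
```

The hybrid approach in the paper would blend such a learned prediction with the sensor-based residual rather than replace it outright.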

10:45-12:05, Paper FrZA.6
BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning
| Zhou, Hongyi | Karlsruhe Institute of Technology |
| Lioutikov, Rudolf | Karlsruhe Institute of Technology |
Keywords: Imitation Learning, Deep Learning in Grasping and Manipulation
Abstract: We present the B-spline Encoded Action Sequence Tokenizer (BEAST), a novel action tokenizer that encodes action sequences into compact discrete or continuous tokens using B-splines. In contrast to existing action tokenizers based on vector quantization or byte pair encoding, BEAST requires no separate tokenizer training and consistently produces tokens of uniform length, enabling fast action sequence generation via parallel decoding. Leveraging our B-spline formulation, BEAST inherently generates smooth trajectories without discontinuities between adjacent segments.
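As a rough illustration of the encoding idea (not the authors' code; the token count, spline degree, and SciPy-based least-squares fit are assumptions), a 1-D action sequence can be compressed into a fixed-length coefficient vector and decoded back into a smooth trajectory:

```python
import numpy as np
from scipy.interpolate import BSpline, make_lsq_spline

def encode(actions, t_grid, n_tokens=8, k=3):
    """Least-squares fit of a clamped B-spline; the control-point coefficients
    serve as a fixed-length, continuous token vector for the whole sequence."""
    interior = np.linspace(t_grid[0], t_grid[-1], n_tokens - k + 1)[1:-1]
    knots = np.concatenate([[t_grid[0]] * (k + 1), interior,
                            [t_grid[-1]] * (k + 1)])
    return make_lsq_spline(t_grid, actions, knots, k).c, knots

def decode(tokens, knots, t_grid, k=3):
    """Evaluate the spline: the reconstruction is smooth by construction."""
    return BSpline(knots, tokens, k)(t_grid)

t = np.linspace(0.0, 1.0, 50)
actions = np.sin(2.0 * np.pi * t)             # toy 1-D action sequence
tokens, knots = encode(actions, t)
recon = decode(tokens, knots, t)
print(tokens.shape)                           # (8,): uniform token length
print(round(float(np.abs(recon - actions).max()), 4))
```

Because every sequence maps to the same number of coefficients, all tokens can be emitted in one parallel decoding step, which is the efficiency argument in the abstract.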

10:45-12:05, Paper FrZA.7
Plan2Pose: Bridging Symbolic Planning and Robot Control Via Context-Aware Goal Generation on Point Clouds
| Swoboda, Daniel Maximilian | RWTH Aachen University |
| Rakhman, Ulzhalgas | RWTH Aachen University |
| Hofmann, Till | RWTH Aachen University |
Keywords: AI-Enabled Robotics, Deep Learning in Grasping and Manipulation, Deep Learning for Visual Perception
Abstract: Goal-conditioned policies excel at manipulation tasks but lack the context to respect long-term constraints. Conversely, symbolic planners efficiently generate long-horizon action skeletons, but often abstract away the object geometries necessary for execution. However, combining the two approaches is challenging: translating task plans into geometric sub-goals requires translating an abstract symbolic state into a complete geometric goal representation. In this work, we present preliminary results on Plan2Pose, a neuro-symbolic transformer based on point clouds that produces pose transformations for each object and every state transition in the high-level plan. Each step in the plan is represented by a combination of symbolic predicate embeddings and geometric point cloud embeddings. The resulting sequence of grounded state representations is processed using a transformer. Plan2Pose attends over the full plan to predict target poses that satisfy both immediate and future geometric constraints. The proposed approach can be used as a sub-goal generator and combined with a goal-conditioned policy to realize each step, thereby executing long-term plans while taking all geometric constraints into account.

10:45-12:05, Paper FrZA.8
Bridging the Loco-Manipulation Disconnect: A Framework for Dynamic Whole-Body Impulse Control on Floating-Base Robots
| Brusnicki, Roberto | Technical University of Munich |
| Betz, Johannes | Technical University of Munich |
| Piccinini, Mattia | Technical University of Munich |
Keywords: Whole-Body Motion Planning and Control, Integrated Planning and Learning, Reinforcement Learning
Abstract: Mobile manipulation in the mid-2020s is defined by a paradox: locomotion has achieved remarkable agility across legged and wheeled-legged platforms, while Vision-Language-Action (VLA) models enable semantic manipulation with increasing generalization. Yet, a fundamental "Loco-Manipulation Disconnect" remains a critical barrier yet to be overcome for deployment in high-utility environments. This disconnect – the architectural segregation of the floating base from the manipulation chain – results in robots that must stop to act, failing in forceful scenarios where body momentum could be exploited. We propose Impulse-Aware Whole-Body Control (IA-WBC), a framework that transitions from quasi-static interaction to Dynamic Whole-Body Impulse Control. Our approach unifies a monolithic RL policy with certified stability guarantees, enabling floating-base robots to reason about impulse (J = ∫F dt) as a resource rather than force as a disturbance. The framework targets deployment on humanoid and quadruped platforms with manipulation capabilities, with experimental validation planned on dynamic tasks such as door breaching, cart pushing, and dynamic lifting – scenarios requiring peak forces beyond the robot's static actuation limits.
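The impulse integral the abstract reasons about is straightforward to evaluate numerically; this sketch uses a hypothetical half-sine contact-force profile (all numbers are illustrative):

```python
import numpy as np

# Impulse as a resource: integrate a contact-force profile over the contact
# window to get the momentum change J = integral of F dt that it delivers.
t = np.linspace(0.0, 0.2, 201)                 # 200 ms contact window [s]
force = 300.0 * np.sin(np.pi * t / 0.2)        # half-sine contact force [N]
dt = np.diff(t)
impulse = float(np.sum(0.5 * (force[1:] + force[:-1]) * dt))  # trapezoid rule
print(round(impulse, 1))                       # analytic value: 2*300*0.2/pi ~ 38.2
```

A modest 300 N peak sustained for 200 ms thus delivers roughly 38 N·s of momentum, which is the kind of quantity an impulse-aware controller can budget for tasks like door breaching.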

10:45-12:05, Paper FrZA.9
Development of a Differential Hip for Dynamic Bipedal Locomotion
| Schroeders, Florian | Technical University of Munich |
| Radecker, Philipp | Technical University of Munich |
| Rixen, Daniel | Technische Universität München |
Keywords: Actuation and Joint Mechanisms, Tendon/Wire Mechanism, Legged Robots
Abstract: High leg inertia limits dynamic agility and energy efficiency in humanoid robotics. This work develops a differential hip architecture that reduces distal mass by relocating actuation to the torso via a tendon-driven differential. A prototype was designed and modeled in the flexible multibody dynamics framework Exudyn to validate kinematic coupling and structural loads. Comparative simulations against a standard serial hip show substantially reduced leg inertia and peak power demand during dynamic gait cycles. Physical experiments corroborate the simulation trends, supporting proximal actuation as a promising strategy for improving bipedal performance.

10:45-12:05, Paper FrZA.10
An Effective and Robust Loop Closure Detection Pipeline for 3D LiDARs in Urban Environments
| Gupta, Saurabh | University of Bonn |
| Guadagnino, Tiziano | University of Bonn |
| Mersch, Benedikt | Filics GmbH |
| Trekel, Niklas | University of Bonn |
| Malladi, Meher Venkata Ramakrishna | University of Bonn |
| Stachniss, Cyrill | University of Bonn |
Keywords: SLAM, Mapping, Localization
Abstract: Globally consistent mapping for autonomous robots relies on accurate pose estimation, where loop closures are essential for correcting accumulated drift. This extended abstract presents a robust loop closure detection pipeline for outdoor LiDAR SLAM. Our method constructs local maps from LiDAR scans, performs ground-based alignment, and generates a density-preserving bird's-eye-view representation. We then extract ORB feature descriptors for place recognition, followed by self-similarity pruning to mitigate perceptual aliasing. Experimental results demonstrate high-precision loop closure detection across varying LiDAR scanning patterns, fields of view, and motion profiles.

10:45-12:05, Paper FrZA.11
ROS2 Interface for Aggregating Robot Manipulation Performances with Electronic Task Boards
| So, Peter | Technical University of Munich |
| Abdelrahman, Ahmed | Technical University of Munich |
| Le, Hoan Quang | Technical University of Munich |
| Steinbach, Eckehard | Technical University of Munich |
Keywords: Performance Evaluation and Benchmarking, Dexterous Manipulation, Disassembly
Abstract: Conducting cross-comparable robotic object manipulation experiments remains a goal for robotics research, yet progress is still reported manually and with non-harmonized use cases that are difficult to compare. We present a software architecture for benchmarking robot performance capable of supporting multiple versions of physical electronic task boards and contribute a streamlined process for conducting and reporting experiment performance with test condition reproducibility guarantees. We extend the software interface of the existing electronic task board using a ROS2 Action Server and a local web server to perform standard and custom tasks. The interface allows users to programmatically start experiments with the task board and seamlessly publish their experimental data to a public web dashboard for the benchmarking community.

10:45-12:05, Paper FrZA.12
DexTeRo: Dexterous Telemanipulation System for Upper Body Humanoid Robots
| Schwarz, Stephan Andreas | Chemnitz University of Technology |
| Nieberle, Benedikt | Chemnitz University of Technology |
| Thomas, Ulrike | Chemnitz University of Technology |
Keywords: Telerobotics and Teleoperation, Human-Centered Robotics, Multi-Robot Systems
Abstract: The rapid development of complex robots such as humanoids increases the interest in telemanipulation systems. Besides the capability to control robots remotely, dual-arm telemanipulation setups are suitable tools for many research areas such as imitation learning or complex manipulation tasks. In this work, we introduce our new telemanipulation system DexTeRo. It enables an operator to control the upper-body of a humanoid robot and provides new features to overcome previous restrictions, such as limited payloads and small workspaces. The components of the follower and leader side together with the applied control architectures are presented. Finally, an outlook on our planned improvements and current research is given.

10:45-12:05, Paper FrZA.13
Implementation of the Patellar Tendon Reflex in a Muscle-Driven Robotic Leg Based on Bioinspired Motor Control
| Nadler, Tobias | University of Stuttgart |
| Schmitt, Syn | University of Stuttgart, Germany |
Keywords: Biologically-Inspired Robots, Robust/Adaptive Control, Humanoid and Bipedal Locomotion
Abstract: We developed an anthropomorphic, muscle-driven biorobotic leg that replicates the human monosynaptic reflex loop. By explicitly wiring the artificial muscle spindle signals to mimic the human afferent pathway, defined impacts on the patellar tendon elicit feedback-modulated stimulation mirroring human physiology. By calibrating the controller against dynamics from 14 healthy subjects, we achieved reflex behaviors indistinguishable from humans. This demonstrates the successful implementation of low-level sensorimotor control in muscle-like, soft actuation devices, enabling engineered systems to exploit the robustness and adaptability of biological locomotion.

10:45-12:05, Paper FrZA.14
VLAgents: A Policy Server for Efficient VLA Inference
| Jülg, Tobias Thomas | University of Technology Nuremberg |
| Gamal, Khaled | University of Technology Nuremberg |
| Nilavadi, Nisarga | University of Technology Nuremberg |
| Krack, Pierre | University of Technology Nuremberg |
| Bien, Seongjin | University of Technology Nuremberg |
| Krawez, Michael | University of Technology Nuremberg |
| Walter, Florian | Technical University Munich |
| Burgard, Wolfram | University of Technology Nuremberg |
Keywords: RIG TC: Robotics Foundation Models, Software Architecture for Robotic and Automation, Software, Middleware and Programming Environments
Abstract: The rapid emergence of Vision-Language-Action models (VLAs) has a significant impact on robotics. However, their deployment remains complex due to fragmented interfaces and the inherent communication latency in distributed setups. To address this, we introduce VLAgents, a modular policy server that abstracts VLA inference behind a unified Gymnasium-style protocol. Crucially, its communication layer transparently adapts to the context by supporting both zero-copy shared memory for high-speed simulation and compressed streaming for remote hardware. In this work, we present the architecture of VLAgents and validate it by integrating seven policies, including OpenVLA and Pi0. In a benchmark with both local and remote communication, we further demonstrate how it outperforms the default policy servers provided by OpenVLA, OpenPi, and LeRobot. VLAgents is available at github.com/RobotControlStack/vlagents.

10:45-12:05, Paper FrZA.15
Collaborative Multi-Agent Architectures for Autonomous Robot Self-Optimization Via Shared Deliberation
| Enose Kamalabai, Nampuraja | Infosys |
Keywords: AI-Enabled Robotics
Abstract: Autonomous mobile robots require continuous adaptation of operational and control parameters across tightly coupled subsystems to maintain stable performance. We propose a multi-agent architecture where specialized Large Language Model (LLM) based agents collaborate through a shared conversation network and leverage interaction history to achieve runtime self-optimization. Validated on the Rover Robotics 2WD platform running ROS2 Humble, the system orchestrates multiparameter tuning across 396+ parameters, spanning 15 ROS2 nodes including the Nav2 navigation stack, AMCL localization, and velocity control. This is defined within a transformation tree that encompasses 18 coordinate frames, extending from the map down to individual sensor frames. All agents participate in shared deliberation, enabling emergent cross-subsystem optimization and inherent explainability through natural language reasoning.

10:45-12:05, Paper FrZA.16
What Over How: Sparse Graphical Task Models from Minimal Demonstrations
| Röfer, Adrian | University of Freiburg |
| Heppert, Nick | University of Freiburg |
| Valada, Abhinav | University of Freiburg |
Keywords: Learning from Demonstration, Task and Motion Planning, Probability and Statistical Methods
Abstract: Learning robotic manipulation from demonstration has traditionally emphasized behavior‑cloning approaches that map raw state observations to actions, thereby focusing on how a task is performed. Such methods are fragile to substantial variations in task‑space scale, layout, or embodiment, even after extensive training. To improve robustness, recent work has introduced object‑centric representations, yet these still struggle under large environmental changes. An answer to this challenge can lie in understanding the high-level goals of a task first, by modeling manipulation tasks as evolving object graphs which capture the semantic intent (e.g., toast to toaster, to plate, to tray). Our method constructs probabilistic kinematic graphs, which connect objects throughout an entire manipulation, including idle phases. Unlike earlier approaches that require known object correspondences, we match objects across demonstrations using similarity of pretrained visual feature vectors, and we further simplify matching by focusing on transitions between independent subgraphs. This yields compact activation/deactivation sequences and pose distributions for objects at key moments. We study the quality of segmentations on two datasets and a robotic benchmark and qualitatively deploy our approach on a real robotic system.
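The cross-demonstration matching step can be sketched as a one-to-one assignment over cosine similarities of feature vectors (the feature dimensionality, the solver choice, and the toy data below are illustrative assumptions, not the authors' pipeline):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_objects(feats_a, feats_b):
    """One-to-one object matching across two demonstrations by maximizing
    the total cosine similarity of (pretrained) visual feature vectors."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    rows, cols = linear_sum_assignment(-(a @ b.T))   # minimize negative similarity
    return dict(zip(rows.tolist(), cols.tolist()))

# hypothetical features: demo B observes the same 3 objects, permuted and noisy
rng = np.random.default_rng(1)
feats_a = rng.normal(size=(3, 16))
feats_b = feats_a[[2, 0, 1]] + 0.01 * rng.normal(size=(3, 16))
print(match_objects(feats_a, feats_b))           # {0: 1, 1: 2, 2: 0}
```

The Hungarian assignment guarantees each object is matched exactly once, which is what makes the recovered correspondences usable for building a consistent object graph across demonstrations.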

10:45-12:05, Paper FrZA.17
Towards Analyzing the Characteristics of Model-Based and Model-Free Decision Making Algorithms
| Hall, Adam W. | University of Toronto |
| Che, Mingxuan | TU Munich |
| Sawant, Shambhuraj | Max Planck Institute for Intelligent Systems |
| Zou, Joey | University of Toronto |
| Pizarro Bejarano, Federico | University of Toronto |
| Brunke, Lukas | Technical University of Munich |
| Zhou, Siqi | Technical University of Munich |
| Schoellig, Angela P. | TU Munich |
Keywords: RIG TC: Foundations of Optimization and Learning for Robotics, Machine Learning for Robot Control, RIG TC: Principles and Methods for Building AI-powered Robust and Resilient Robots
Abstract: Continuous advancements in robot sensors, actuators, and computing hardware have made robot platforms more compact, capable, and accessible. With improvements in hardware, as well as supporting tools such as large-scale simulation, we have seen decision-making algorithms push the limits of what is possible, achieving previously unseen performance. The rapid progress is also creating a disparity between modular, model-based control methods and data-driven learning approaches, with a limited understanding of the fundamental trade-offs that differentiate them in real-world deployments. These algorithms vary primarily in how they internalize a model of the world, from analytical representations in optimal and learning-based control to experiential data representations in reinforcement learning; these choices lead to distinct processes during policy optimization and, consequently, distinct behaviors during deployment. In this work, we present a systematic analysis of representative methods across this spectrum and provide a holistic view of their characteristics using six metrics that capture core trade-offs in robot decision-making: model complexity, learning complexity, runtime efficiency, performance, robustness, and task generalization. Through extensive experiments on a practical robotic platform, supported by open-source implementations, we demonstrate how these trade-offs manifest in realistic settings and highlight key considerations for selecting appropriate decision-making frameworks. As robot systems begin to incorporate larger amounts of data, these distinctions will provide a crucial foundation for developing decision-making algorithms that scale safely and effectively for future applications.

10:45-12:05, Paper FrZA.18
KI.Fabrik: Shaping the AI-Driven Factory of the Future by Turning Embodied AI into Industrial Reality
| Rajaei, Nader | Technical University of Munich |
| Rudenko, Andrey | Robert Bosch GmbH |
| Lehsing, Christian | Technical University of Munich |
| Nagrath, Vineet | Technical University of Munich |
| Koenig, Alexander | Technical University of Munich |
| Diepold, Klaus | Technical University of Munich |
| Knoll, Alois | Technical University of Munich |
| Lilienthal, Achim J. | TU Munich |
Keywords: Intelligent and Flexible Manufacturing, Factory Automation, Embodied Cognitive Science
Abstract: While AI and automation are successfully prototyped in research labs, they often struggle with complex real production environments. This is particularly true for "Embodied AI" systems. KI.Fabrik was established as a long-term program to bridge this gap, moving beyond isolated demonstrations toward reliable industrial use. By utilizing a networked ecosystem of modular components, ranging from robot learning hubs to remote teleoperation portals, KI.Fabrik enables the system-level development necessary to turn flexible, AI-driven manufacturing and "Production-as-a-Service" into a tangible reality.

10:45-12:05, Paper FrZA.19
SURE: Safe Uncertainty-Aware Robot-Environment Interaction Using Trajectory Optimization
| Zhang, Zhuocheng | Technical University of Munich |
| Zhao, Haizhou | New York University |
| Sun, Xudong | Technical University of Munich |
| Johnson, Aaron M. | Carnegie Mellon University |
| Khadiv, Majid | Technical University of Munich |
Keywords: Optimization and Optimal Control, Planning under Uncertainty, RIG TC: Foundations of Optimization and Learning for Robotics
Abstract: Robotic tasks involving contact interactions pose significant challenges for trajectory optimization due to discontinuous dynamics. Conventional formulations typically assume deterministic contact events, which limit robustness and adaptability in real-world settings. In this work, we propose SURE, a robust trajectory optimization framework that explicitly accounts for contact timing uncertainty. By allowing multiple trajectories to branch from possible pre-impact states and later rejoin a shared trajectory, SURE achieves both robustness and computational efficiency within a unified optimization framework. We evaluate SURE on two representative tasks with unknown impact times. In a cart–pole balancing task involving an uncertain wall location, SURE achieves an average improvement of 21.6% in success rate when branch switching is enabled during control. In an egg-catching experiment using a robotic manipulator, SURE improves the success rate by 40%. These results demonstrate that SURE substantially enhances robustness compared to conventional nominal formulations.

10:45-12:05, Paper FrZA.20
Multimodal Human-Cobot-Interaction for Collaborative Tasks
| Milde, Sven | Fulda University of Applied Sciences |
| Milde, Jan-Torsten | Fulda University of Applied Sciences |
| Blum, Rainer | Fulda University of Applied Sciences |
| Mueller, Tobias | Fulda University of Applied Sciences |
| Schultheis, Marius | Fulda University of Applied Sciences |
| Schreiner, Niklas | Alpaka Innovation |
Keywords: Human-Robot Collaboration, Behavior-Based Systems, Virtual Reality and Interfaces
Abstract: We present CoMeSy, a specialized system for multimodal human-cobot interaction designed to make collaboration more intuitive and efficient through the use of natural language and physical gestures. A primary finding of our research is the system's ability to interpret linguistic instructions situationally, which enables the robot to resolve ambiguous commands and deictic expressions—such as pointing—within a shared workspace. To achieve this, we implemented Dynamic Action Planning using Behavior Trees, which allow the system to decompose complex tasks into manageable subtasks while adapting to evolving environmental states. This is paired with a reactive and robust control architecture that ensures the cobot can adjust to unexpected changes or interruptions without sacrificing performance. Furthermore, the system utilizes a Unified Control Architecture, allowing it to operate interchangeably between a physical work cell and a virtual Augmented Reality (AR) application via a Meta Quest 3, all powered by the same ROS2 computer. Finally, the framework employs Hierarchical Language Processing, a two-level conceptual approach that processes both immediate situational modifiers and more complex multi-stage action sequences.
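A minimal behavior-tree skeleton illustrates the task-decomposition mechanism described above. This is a generic sketch under common behavior-tree semantics, not the CoMeSy implementation; node names and the example tree are invented for illustration:

```python
SUCCESS, FAILURE, RUNNING = "SUCCESS", "FAILURE", "RUNNING"

class Action:
    """Leaf node wrapping a callable that returns a status string."""
    def __init__(self, fn):
        self.fn = fn
    def tick(self):
        return self.fn()

class Sequence:
    """Ticks children in order; stops at the first child that is not SUCCESS.
    This is how a complex task decomposes into ordered subtasks."""
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != SUCCESS:
                return status
        return SUCCESS

class Fallback:
    """Ticks children until one does not fail; encodes reactive alternatives,
    which is what lets the robot recover from interruptions mid-task."""
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != FAILURE:
                return status
        return FAILURE

# "pick the pointed-at object": try a grasp first, otherwise run a recovery
tree = Sequence(
    Action(lambda: SUCCESS),                 # locate the referenced object (stub)
    Fallback(Action(lambda: FAILURE),        # first grasp attempt fails...
             Action(lambda: SUCCESS)),       # ...recovery behavior succeeds
)
print(tree.tick())                           # SUCCESS
```

Re-ticking the tree every control cycle is what makes the decomposition reactive: a changed environment simply alters which branch succeeds on the next tick.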

10:45-12:05, Paper FrZA.21
Real-Time Ground Reaction Force Estimation for Wearable Robotics Using Temporal Convolutional Networks
| Jazini, Mohammadjavad | Technical University of Darmstadt |
| Firouzi, Vahid | Technical University of Darmstadt |
| Sharbafi, Maziar | Technical University of Darmstadt |
| Findeisen, Rolf | Technical University of Darmstadt |
Keywords: AI-Based Methods, Rehabilitation Robotics
Abstract: Estimating ground reaction forces outside laboratory environments is a key requirement for wearable robotics and mobile gait analysis. While force plates and instrumented treadmills provide accurate measurements, their stationary nature limits applicability in real-world scenarios. This paper presents a learning-based framework for estimating vertical ground reaction forces from lower-body inertial measurement unit data using temporal convolutional networks. The approach exploits the causal and computationally efficient structure of temporal convolutional networks to enable continuous, low-latency estimation suitable for real-time deployment. Two processing paradigms are investigated: a gait-cycle–segmented formulation and a fully continuous formulation that avoids explicit gait phase detection. Their behavior with respect to accuracy, robustness, and generalization across subjects and walking speeds is analyzed. The results highlight the trade-off between prediction accuracy and real-time feasibility, demonstrating the suitability of the proposed approach for wearable and assistive robotic systems.
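The causal structure that makes TCNs suitable for streaming, low-latency estimation can be shown with a dilated causal convolution in plain NumPy (a generic sketch, not the authors' network; weights and dilation are illustrative):

```python
import numpy as np

def causal_conv1d(x, weights, dilation=1):
    """Dilated causal convolution: y[t] = sum_i weights[i] * x[t - i*dilation].
    Left zero-padding preserves length and guarantees no future samples leak
    into the estimate, which is what permits real-time deployment."""
    k = len(weights)
    pad = (k - 1) * dilation
    padded = np.concatenate([np.zeros(pad), x])
    taps = np.stack([padded[i * dilation : i * dilation + len(x)]
                     for i in range(k)])
    return weights[::-1] @ taps

x = np.zeros(10)
x[5] = 1.0                                     # unit impulse at t = 5
y = causal_conv1d(x, np.array([1.0, 0.5, 0.25]))
print(y[:5].any(), y[5], y[6], y[7])           # False 1.0 0.5 0.25
```

Stacking such layers with growing dilations gives the long receptive field needed to cover a gait cycle while each output still depends only on past IMU samples.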

10:45-12:05, Paper FrZA.22
Identifying Inductive Biases for Efficient Robot Co-Design
| Vaish, Apoorv | Technische Universität Berlin |
| Brock, Oliver | Technische Universität Berlin |
Keywords: Soft Robot Materials and Design, Soft Sensors and Actuators, RIG TC: AI-Driven Co-Design for Task-Oriented Humanoids
Abstract: The co-design of robot morphology and control is computationally intensive, as we lack inductive biases tailored to it. We analyze co-design landscapes to systematically identify inductive biases tailored to the structure of co-design problems. Our method reveals that within regions of the co-design space, a low-dimensional manifold governs the quality of co-designs. Higher-quality regions exhibit variation across more dimensions, with a tighter coupling between morphology and control. To leverage these inductive biases, we propose an adaptive co-design algorithm that extracts this low-dimensional structure within a region and adjusts its exploration bias accordingly.

10:45-12:05, Paper FrZA.23
SHaRe-RL: Towards Co-Constructing Industrial Manipulation with Human-In-The-Loop Reinforcement Learning
| Stranghöner, Jannick | Universität Bielefeld |
| Hartmann, Philipp | Bielefeld University |
| Braun, Marco | Bielefeld University |
| Wrede, Sebastian | Bielefeld University / Fraunhofer IOSB-INA |
| Neumann, Klaus | Bielefeld University / Fraunhofer IOSB-INA |
Keywords: RIG Cluster: Human-Robot Interaction, RIG Cluster AI-Powered Industrial Robotics, RIG Cluster: Learning and Multimodal AI for Robotics
Abstract: Reinforcement learning (RL) is a promising route to adaptive robot assembly, yet real-world training is often sample-inefficient and unsafe in contact-rich tasks. A recurring lesson from practical systems is to leverage domain expertise as priors that restrict learning to plausible behaviors. Motivated by this principle, we present SHaRe-RL, a real-world RL framework that takes a first step toward co-constructing contact-rich assembly skills with an operator by combining multiple forms of prior knowledge. SHaRe-RL integrates (i) a partial task specification via a manipulation primitive net, (ii) operator demonstrations and online interventions, and (iii) a deterministic per-axis compliance layer that bounds interaction forces during exploration. This design yields sample-efficient learning and scales to complex, vision-based assembly. On insertion of Harting Han-Modular connectors with 0.2 mm–0.4 mm clearance, SHaRe-RL reaches reliable performance within a three-hour wall-clock budget.
|
| |
| 10:45-12:05, Paper FrZA.24 | |
| Automated Acceptance Testing of Robotic Systems Using Behavior-Driven Models |
|
| Nguyen, Minh | University of Bremen |
| Wrede, Sebastian | Bielefeld University |
| Hochgeschwender, Nico | University of Bremen |
Keywords: Software Tools for Robot Programming, RIG TC: Principles and Methods for Building AI-powered Robust and Resilient Robots, RIG Cluster: Safety, Reliability and Resilience of AI-based Robotics
Abstract: We present an approach extending behavior-driven development (BDD) for robotic systems by introducing explicit, composable behavior-driven models for automated acceptance testing. These models formalize acceptance criteria by combining temporal semantics, domain knowledge about robots, objects, and environments, and inter-scenario relations, specifying what makes robotic behavior acceptable. We represent these models as knowledge graphs, which enable querying, manipulation, and transformation into executable test artifacts. To improve developer usability, a domain-specific language is provided and can be transformed into the underlying graph representation. These models enable systematic test case generation, automated execution in simulation (e.g., Isaac Sim), and evaluation of robotic behavior under diverse variations, providing evidence of behavior conformance and failure modes.
|
| |
| 10:45-12:05, Paper FrZA.25 | |
| Robot Path Planning Via Flow Matching with Safety and Adaptivity through Predictive Control |
|
| Holzmann, Philipp | Technical University of Darmstadt |
| Pfefferkorn, Maik | Technical University of Darmstadt |
| Carvalho, João | Albert-Ludwigs-Universitaet Freiburg |
| Younes, Ali | TU Darmstadt |
| Le, An Thai | Vin University |
| Chalvatzaki, Georgia | Technische Universität Darmstadt |
| Peters, Jan | Technische Universität Darmstadt |
| Findeisen, Rolf | Control and Cyber-Physical Systems Laboratory |
Keywords: Constrained Motion Planning, Robot Safety, Machine Learning for Robot Control
Abstract: Learning-based path planners based on diffusion and flow matching can generate diverse trajectories from demonstrations but classically lack guarantees on safety and constraint satisfaction during deployment. We propose a framework that integrates flow-matching-based path planning, trained on demonstrations, with model predictive path-following control to combine data-driven path diversity with real-time safe execution. The flow-matching model efficiently captures multimodal path distributions, while the predictive controller adapts the motion online, ensuring satisfaction of state/input constraints and obstacle avoidance. We introduce an event-triggered re-planning scheme that biases new path generation using solutions from the predictive controller, enabling safety even in environments with previously unseen obstacles.
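The event-triggered re-planning scheme can be sketched as a control loop (an illustrative sketch only; `sample_path` and `mpc_step` are hypothetical stand-ins for the flow-matching sampler and the path-following MPC, and the error-based trigger is an assumption):

```python
import numpy as np

def tracking_error(path, state):
    """Distance from the current state to the closest point on the reference path."""
    return np.min(np.linalg.norm(path - state, axis=1))

def control_loop(sample_path, mpc_step, state, steps=200, err_thresh=0.1):
    """Follow a sampled path with a predictive controller. A new path is
    sampled, biased by the controller's last solution, whenever tracking
    degrades, e.g. because an unseen obstacle forced the MPC off the
    reference. Both callables are stand-ins for components the abstract
    does not fully specify."""
    path = sample_path(state, bias=None)
    for _ in range(steps):
        state, mpc_solution = mpc_step(path, state)   # safe, constraint-satisfying step
        if tracking_error(path, state) > err_thresh:  # event trigger
            path = sample_path(state, bias=mpc_solution)
    return state
```

The key design point is that the sampler is only invoked on the trigger, so the expensive generative model runs far less often than the controller.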
|
| |
| 10:45-12:05, Paper FrZA.27 | |
| CRISP - Compliant ROS2 Controllers for Learning-Based Manipulation Policies and Teleoperation |
|
| San José Pro, Daniel | Technical University Munich |
| Hausdörfer, Oliver | Technical University of Munich |
| Römer, Ralf | Technical University of Munich |
| Dösch, Maximilian | Technical University of Munich |
| Schuck, Martin | Technical University of Munich |
| Schoellig, Angela P. | Technical University of Munich |
Keywords: RIG Cluster: Learning and Multimodal AI for Robotics, Imitation Learning, Compliance and Impedance Control
Abstract: Learning-based controllers, such as diffusion policies and vision-language action models, often generate low-frequency or discontinuous robot state changes. Achieving smooth reference tracking requires a low-level controller that converts high-level target commands into joint torques, enabling compliant behavior during contact interactions. We present CRISP, a lightweight C++ implementation of compliant Cartesian and joint-space controllers for the ROS2 control standard, designed for seamless integration with high-level learning-based policies as well as teleoperation. The controllers are compatible with any manipulator that exposes a joint-torque interface. Through our Python and Gymnasium interfaces, CRISP provides a unified pipeline for recording data from hardware and simulation and for deploying high-level learning-based policies, facilitating rapid experimentation. The system has been validated on hardware with the Franka Robotics FR3 and in simulation with the Kuka IIWA14 and Kinova Gen3. Designed for rapid integration, flexible deployment, and real-time performance, our implementation lowers the barrier to applying learning-based methods on ROS2-compatible manipulators. Detailed documentation is available at the project website.
|
| |
| 10:45-12:05, Paper FrZA.28 | |
| Cable Combat: The Manipulandum Begins – a Force-Exerting 3D Interface for Biomechanical Assessment and Assistive Robotics |
|
| Behrendt, Jacob | Friedrich-Alexander Universität Erlangen-Nürnberg |
| Demir, Ayşe Betül | FAU Erlangen-Nuernberg |
| Scheidl, Marc-Anton | Friedrich-Alexander Universität Erlangen-Nürnberg |
| Castellini, Claudio | Friedrich-Alexander-Universität Erlangen-Nürnberg |
| Thuerauf, Sabine | Friedrich-Alexander-University Erlangen-Nuremberg |
Keywords: Calibration and Identification, Telerobotics and Teleoperation, RIG Cluster: Healthcare Robotics and Human Augmentation
Abstract: In this work, we describe our concept for a 3D manipulandum (a device to concurrently measure position, orientation, and interaction forces) and its necessity. The designed manipulandum is cable-driven and an accurate measurement system that can also exert forces on the user. It can be used for multiple purposes, like the evaluation of assistive devices, the evaluation and calibration of biomechanical models, the tracking of rehabilitation or neurodegenerative diseases, and for impedance parameter measurements of the human body, like stiffness.
|
| |
| 10:45-12:05, Paper FrZA.29 | |
| Whole-Body Diffusion Trajectory Generation and Reinforcement Learning Control for Humanoid Loco-Manipulation |
|
| Omar, Shafeef | Technical University of Munich |
| Yu, Dian | Technical University of Munich |
| Khadiv, Majid | Technical University of Munich |
Keywords: RIG Cluster: Learning and Multimodal AI for Robotics, RIG Cluster: Legged Locomotion, Whole-Body Motion Planning and Control
Abstract: Loco-manipulation for humanoid robots remains a long-standing challenge due to the difficulty of learning complex whole-body behaviours from scratch. While humans can easily perform many such tasks, collecting teleoperation data at scale is often infeasible. In this work, we propose a framework that enables large-scale whole-body trajectory generation and control for humanoid robots from limited demonstrations. Specifically, we leverage Sampling-Based Trajectory Optimisation (SBTO) to generate a rich and physically consistent motion dataset after kinematic retargeting, which is then used by a diffusion-based whole-body motion planner to generate motions that are tracked using a general low-level tracking RL policy. By decoupling motion generation from control, the diffusion model produces diverse, smooth, and feasible trajectories, while the lightweight RL policy ensures robust high-frequency tracking. Experiments demonstrate versatile object transport behaviors, including kicking, pushing, and carrying, from arbitrary initial states, significantly outperforming baselines trained on raw retargeted data with physical inconsistencies.
|
| |
| 10:45-12:05, Paper FrZA.30 | |
| Velocity Field Based Data Augmentation for Corrective Imitation Learning |
|
| Ma, Shiping | Technische Universität Berlin |
| Auddy, Sayantan | Technische Universität Berlin |
| Toussaint, Marc | TU Berlin |
Keywords: Data Sets for Robot Learning, Imitation Learning
Abstract: While imitation learning (IL) has demonstrated strong performance in a variety of tasks, policies trained purely from demonstrations often suffer from distribution mismatch during execution. Small errors can accumulate over time, driving the system into states that are insufficiently covered by the training data and leading to task failure. Although distribution shift in IL has been widely studied, existing approaches often provide limited coverage of recovery behaviors from off-trajectory states. In this abstract, we collect demonstrations using a motion-planning solver in simulation and then construct a velocity field around the trajectories to generate additional recovery behaviors. We investigate whether the proposed approach can enhance the robustness of imitation learning.
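The velocity-field construction might look as follows (a minimal sketch; the nearest-point rule, the corrective gain, and the Gaussian perturbations are assumptions for illustration, not the authors' exact procedure):

```python
import numpy as np

def recovery_velocity(traj, traj_vel, state, gain=1.0):
    """Velocity-field label for an off-trajectory state: the demonstrated
    velocity at the nearest trajectory point plus a corrective term pulling
    the system back toward the demonstration. Gain and nearest-point rule
    are illustrative assumptions."""
    i = np.argmin(np.linalg.norm(traj - state, axis=1))
    correction = gain * (traj[i] - state)
    return traj_vel[i] + correction

def augment(traj, traj_vel, noise_std=0.05, samples_per_point=4, seed=0):
    """Generate (state, action) training pairs around a demonstration,
    covering off-trajectory states with recovery behaviors."""
    rng = np.random.default_rng(seed)
    pairs = []
    for x in traj:
        for _ in range(samples_per_point):
            x_off = x + rng.normal(0.0, noise_std, size=x.shape)
            pairs.append((x_off, recovery_velocity(traj, traj_vel, x_off)))
    return pairs
```

A policy trained on the augmented pairs then sees labeled recovery actions for exactly the perturbed states that compounding errors would otherwise leave uncovered.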
|
| |
| 10:45-12:05, Paper FrZA.31 | |
| A ROS-Based Platform for Standard-Compliant Haptic Teleoperation |
|
| Liu, Siwen | Technical University of Munich |
| Zhou, Xuanyu | Technical University of Munich |
| Xu, Xiao | Technical University of Munich |
| Steinbach, Eckehard | Technical University of Munich |
Keywords: Telerobotics and Teleoperation, RIG Cluster: Rigorous Perception, RIG Cluster: Human-Robot Interaction
Abstract: This work presents an IEEE 1918.1.1-compliant human-in-the-loop haptic teleoperation testbed. The testbed implements the haptic codec defined in the standard, enabling bandwidth-efficient haptic communication and teleoperation in the presence of communication delay. Two key contributions are the first implementations of the required metadata exchange as an implicit handshake mechanism and the bilateral Plug-and-Play (PnP) procedure specified in IEEE 1918.1.1, which were not included in the previously released open-source reference code accompanying the standard. The system is realized on Linux and evaluated in human-in-the-loop experiments using a NOVINT Falcon as the leader device and a simulated follower robot in Gazebo (Panda arm), demonstrating correct operation of the codec, metadata exchange, and PnP mechanism.
|
| |
| 10:45-12:05, Paper FrZA.32 | |
| LLM-Pack: Intuitive Grocery Handling for Logistics Applications |
|
| Blei, Yannik | University of Technology Nuremberg |
| Krawez, Michael | University of Technology Nuremberg |
| Göß, Adrian | University of Technology Nuremberg (UTN) |
| Jülg, Tobias Thomas | University of Technology Nuremberg |
| Krack, Pierre | University of Technology Nuremberg |
| Walter, Florian | Technical University Munich |
| Burgard, Wolfram | University of Technology Nuremberg |
Keywords: Logistics, Manipulation Planning, Deep Learning in Grasping and Manipulation
Abstract: Robotics and automation are increasingly influential in logistics but remain largely confined to traditional warehouses. In grocery retail, advancements such as cashier-less supermarkets exist, yet customers still manually pick and pack groceries. While robotics research has extensively addressed bin picking, packing objects has received comparatively little attention. Packing grocery items, however, can be crucial for several reasons. First, densely packing objects is typically beneficial for optimizing subsequent logistics due to the saved space. Second, the order in which the items are packed can be important for preventing product damage, e.g., heavy objects should not be placed on top of fragile ones. Unfortunately, it is difficult to exactly specify the criteria for the right packing scheme for all objects the robot might encounter, given the huge variety of objects typically found in stores. In this paper, we introduce LLM-Pack, a novel approach to grocery packing. LLM-Pack leverages language and vision foundation models for identifying groceries and generating packing constraints that mimic human packing strategies. These constraints serve as input to a Mixed-Integer Linear Programming (MIP) approach, which computes an optimal packing scheme. LLM-Pack does not require any training and can flexibly handle new grocery items. We evaluate our approach in simulation and real-world experiments to demonstrate its performance.
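The ordering constraint described above ("heavy must not be placed on top of fragile") can be illustrated with a toy enumerator (the paper solves this as a MIP; brute force and the `cost` callback here are hypothetical stand-ins, viable only for a handful of items):

```python
from itertools import permutations

def violates(order, heavy, fragile):
    """Packing order is bottom-up: a heavy item placed after a fragile one
    would rest on top of it, which the constraint forbids."""
    seen_fragile = False
    for item in order:
        if item in fragile:
            seen_fragile = True
        elif item in heavy and seen_fragile:
            return True
    return False

def best_order(items, heavy, fragile, cost):
    """Tiny stand-in for the MIP solve: enumerate packing orders and keep
    the cheapest feasible one under a user-supplied cost function."""
    feasible = [p for p in permutations(items) if not violates(p, heavy, fragile)]
    return min(feasible, key=cost)
```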
|
| |
| 10:45-12:05, Paper FrZA.33 | |
| Policy Distillation from a Model-Based Expert for Non-Prehensile Manipulation |
|
| Shcherba, Denis | TU Berlin |
| Toussaint, Marc | TU Berlin |
Keywords: Imitation Learning
Abstract: While significant advancements have been made in prehensile pick-and-place operations, achieving human-level manipulation capabilities requires further advancements in non-prehensile manipulation. Strategies such as pushing, sliding, or toppling remain formidable challenges, as they require the system to reason about complex contact dynamics and frictional forces without the stability of a firm grasp. This motivates the use of policy distillation from a model-based solver. We leverage a nonlinear-programming-based optimal planner in simulation to generate an abundance of optimal trajectories for complex non-prehensile tasks, such as hooking books from a shelf or pushing a puck. This privileged knowledge is then distilled into a sensorimotor student policy. Through extensive ablation studies, we evaluate the impact of different observation modalities—including depth maps and pre-trained visual features—alongside various policy paradigms, ranging from transformer-based sequence-to-sequence models to diffusion-based policies. Finally, we investigate sim-to-real capabilities, addressing sensor mismatch and distribution shifts by comparing simulation-trained models with policies trained on real-world data.
|
| |
| 10:45-12:05, Paper FrZA.34 | |
| Learning Semantic-Geometric Task Graph-Representations from Human Demonstrations |
|
| Herbert, Franziska | Technische Universität Darmstadt |
| Prasad, Vignesh | Technische Universität Darmstadt |
| Liu, Han | Technische Universität Darmstadt |
| Koert, Dorothea | Technische Universität Darmstadt |
| Chalvatzaki, Georgia | Technische Universität Darmstadt |
Keywords: Representation Learning, Semantic Scene Understanding, Bimanual Manipulation
Abstract: Learning structured task representations from human demonstrations is essential for understanding long-horizon manipulation behaviors, particularly in bimanual settings where action ordering, object involvement, and interaction geometry can vary significantly. A key challenge lies in jointly capturing the discrete semantic structure of tasks and the temporal evolution of object-centric geometric relations in a form that supports reasoning over task progression. In this work, we introduce a semantic–geometric task graph-representation that encodes object identities, inter-object relations, and their temporal geometric evolution from human demonstrations. Building on this formulation, we propose a learning framework that combines a Message Passing Neural Network (MPNN) encoder with a Transformer-based decoder, decoupling scene representation learning from action-conditioned reasoning about task progression. The encoder operates solely on temporal scene graphs to learn structured representations, while the decoder conditions on action-context to predict future action sequences, associated objects, and object motions over extended time horizons. Through extensive evaluation on human demonstration datasets, we show that semantic–geometric task graph-representations are particularly beneficial for tasks with high action and object variability, where simpler sequence-based models struggle to capture task progression. Finally, we demonstrate that task graph representations can be transferred to a physical bimanual robot and used for online action selection, highlighting their potential as reusable task abstractions for downstream decision-making in manipulation systems.
|
| |
| 10:45-12:05, Paper FrZA.35 | |
| A Unified Human-Likeness Criterion for Evaluating Human-Like Motion Retargeting on Bimanual Manipulation Tasks |
|
| Meixner, Andre | Karlsruhe Institute of Technology (KIT) |
| Carl, Mischa | Karlsruhe Institute of Technology (KIT) |
| Krebs, Franziska | Karlsruhe Institute of Technology (KIT) |
| Steudle, Steffen | Karlsruhe Institute of Technology |
| Jaquier, Noémie | KTH Royal Institute of Technology |
| Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: RIG Cluster: Manipulate Anything, Anywhere, Anytime, Human and Humanoid Motion Analysis and Synthesis, Bimanual Manipulation
Abstract: Understanding bimanual and human-like motion is pivotal to equip humanoid robots with human-like capabilities and manipulation skills and to enable intuitive human-robot interaction. In this extended abstract, we present a multi-modal dataset of accurate whole-body human bimanual manipulation actions. Moreover, we conceptualize and derive a novel unified human-likeness criterion to assess human-like robot motions, which we evaluated across applications related to motion retargeting on bimanual manipulation tasks. Building on these results, we propose an importance-based motion retargeting approach improving human likeness.
|
| |
| 10:45-12:05, Paper FrZA.36 | |
| Supervisory Control for Runtime-Safe LLM-Generated Swarm Controllers |
|
| Bauer, Jannis | Technical University of Darmstadt |
| Isildar, Ecem | Technical University of Darmstadt |
| Gross, Roderich | Technical University of Darmstadt |
|
|
| |
| 10:45-12:05, Paper FrZA.37 | |
| Towards Automated Disassembly for Battery Removal of Robot Vacuum Cleaners |
|
| Singh, Dheeraj | Fraunhofer IPA |
| Hoeltge, Lasse | Fraunhofer IPA |
| Al Assadi, Anwar | Fraunhofer IPA |
| Bargmann, Daniel | Fraunhofer IPA |
| Kraus, Werner | Fraunhofer IPA |
| Huber, Marco F. | Fraunhofer IPA |
Keywords: Disassembly, Factory Automation
Abstract: The annual amount of electronic waste is increasing worldwide, and the number of battery-powered electrical appliances, such as Robot Vacuum Cleaners (RVCs), is also on the rise. The current state-of-the-art disposal process involves collection, partial disassembly, shredding, and subsequent sorting, which currently does not prioritize batteries, leading to fire incidents. Furthermore, this process does not guarantee that other value-preserving recycling methods can directly reuse materials as recyclate. Rather than having humans perform this dirty and hazardous task, robot-based disassembly could serve as a new and effective treatment to address these issues. This paper presents a case study of a critical step: the robot-based removal of batteries from RVCs. The proposed solution is a robust and modular robotic cell that fully automates the removal of spring terminal batteries from RVCs once a user feeds the object into the cell. The disassembly process consists of screw detection, unfastening, removal of the battery cover, and finally, removal of the battery. For this objective, we use a force-controlled skill-based robot programming framework combined with computer vision-based detection and comprehensive error handling strategies. Using the robot cell, we achieved an 81% success rate in the entire pipeline for different models.
|
| |
| 10:45-12:05, Paper FrZA.38 | |
| Data-Free Training of Diverse Neural Samplers for Constrained Sets |
|
| Burghoff, Tilman | Technical University Berlin |
| Braun, Cornelius Valentin | Technische Universität Berlin |
| Toussaint, Marc | TU Berlin |
Keywords: Deep Learning Methods, AI-Based Methods, RIG Cluster: Learning and Multimodal AI for Robotics
Abstract: Sampling from constrained sets is a core paradigm for many robotics problems, for example grasping or path planning. Although optimizing under constraints is a well studied problem, sampling under constraints has received less attention. In comparison, diffusion and flow based sampling is well-studied and has proven effective in many domains. However, these models usually need a lot of data for their training. This proves to be a bottleneck, especially in robotics, where datasets are often not readily available and expensive to create. We propose a novel method to train diffusion and flow matching models to sample uniformly from a constrained set. Our algorithm trains the model without any initial dataset, using only differentiable constraints. This allows novel applications for problems where constraints are known but data is rare.
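The idea of supervising a sampler with constraints alone can be sketched as a loss that penalizes constraint violation while rewarding spread over the feasible set (illustrative; the unit-disc constraint and the repulsion term are assumptions, and the actual method trains diffusion/flow models rather than scoring a fixed sample batch):

```python
import numpy as np

def constraint_violation(x):
    """Example differentiable constraint g(x) = ||x||^2 - 1 <= 0 (unit disc).
    A hypothetical stand-in for a grasp or path-planning constraint."""
    return np.maximum((x ** 2).sum(axis=-1) - 1.0, 0.0)

def data_free_loss(samples, repulsion_scale=0.1):
    """No dataset required: the only supervision is the constraint itself,
    plus a pairwise-repulsion term encouraging uniform coverage."""
    feas = constraint_violation(samples).mean()
    diff = samples[:, None, :] - samples[None, :, :]
    d = np.linalg.norm(diff, axis=-1) + np.eye(len(samples))  # mask self-distances
    diversity = (1.0 / d).mean()
    return feas + repulsion_scale * diversity
```

In the actual method this objective would be differentiated through the generative model's samples; here NumPy only evaluates it.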
|
| |
| 10:45-12:05, Paper FrZA.39 | |
| CLEVERR: Commonsense LLM-Enhanced Vehicle Routing for Efficient Room Rearrangement |
|
| Gassen, Martina | Technical University of Darmstadt |
| Rudra, Sohan | TU Darmstadt |
| Prasad, Vignesh | TU Darmstadt |
| Schultze, Sven | Technical University of Darmstadt |
| Chalvatzaki, Georgia | Technische Universität Darmstadt |
Keywords: Integrated Planning and Learning, Task Planning, Domestic Robotics
Abstract: Tidying up rooms is a challenging task that requires agents to navigate their environments, locate and interact with objects, and reason about placements. Given the open-ended nature of room tidying, recent works have shifted their focus to using Large Language Models (LLMs). However, existing approaches often rely on predefined goal states or incur high prompting costs, leaving efficiency in goal-free tidying largely unaddressed. Different from existing works, we present a novel method that combines the commonsense reasoning of LLMs with classical optimization-based planning methods for efficient and open-ended goal-free room rearrangement. Our method incrementally constructs a 3D semantic scene graph and queries an LLM to identify misplaced objects, propose plausible placements, and highlight areas of interest. The resulting waypoints are optimized via a Vehicle Routing Problem (VRP) formulation to minimize travel and improve tidying efficiency. Our method outperforms greedy baselines in both tidying efficiency and success rates, achieving high accuracy in misplaced object detection and placement suggestions. Finally, we demonstrate real-world feasibility through qualitative experiments on misplacement detection and placement reasoning.
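The travel-minimization step can be illustrated with a greedy stand-in for the VRP solve (nearest-neighbor routing over waypoints; the actual formulation solves a proper VRP rather than this heuristic):

```python
import math

def route_length(points, order):
    """Total Euclidean travel along a visiting order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))

def nearest_neighbor_route(points, start=0):
    """Greedy tour over waypoints (misplaced-object pickups and placements):
    always visit the closest unvisited waypoint next. Simple, but can be
    arbitrarily worse than the optimal VRP solution."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = order[-1]
        nxt = min(unvisited, key=lambda i: math.dist(points[last], points[i]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order
```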
|
| |
| 10:45-12:05, Paper FrZA.40 | |
| Unified Legged Locomotion: A Single Policy for Millions of Morphologies |
|
| Bohlinger, Nico | TU Darmstadt |
| Peters, Jan | Technische Universität Darmstadt |
Keywords: RIG Cluster: Legged Locomotion, Reinforcement Learning, Humanoid and Bipedal Locomotion
Abstract: We present a single, general locomotion policy trained on a diverse collection of 50 legged robots. By combining an improved embodiment-aware architecture (URMAv2) with a performance-based curriculum for extreme Embodiment Randomization, our policy learns to control millions of morphological variations. Our policy achieves zero-shot transfer to unseen real-world humanoid and quadruped robots.
|
| |
| 10:45-12:05, Paper FrZA.41 | |
| Verifier-Guided Action Selection for Embodied Agents |
|
| Singhi, Nishad | Technische Universität Darmstadt |
| Bialas, Christian | TU Darmstadt |
| Jauhri, Snehal | TU Darmstadt |
| Prasad, Vignesh | TU Darmstadt |
| Chalvatzaki, Georgia | Technische Universität Darmstadt |
| Rohrbach, Marcus | TU Darmstadt |
| Rohrbach, Anna | TU Darmstadt |
Keywords: Integrated Planning and Learning, Task Planning, RIG Cluster: Learning and Multimodal AI for Robotics
Abstract: Creating generalist embodied agents that solve complex real-world tasks is a grand challenge for AI. Multimodal large language models (MLLMs) have enhanced the reasoning capabilities of embodied agents, yet they struggle with distributional shifts. Standard Chain-of-Thought (CoT) reasoning improves performance but is insufficient to overcome these out-of-distribution challenges. We introduce Verifier-Guided Action Selection (VeGAS), a novel framework that fundamentally enhances the robustness of MLLM reasoning by integrating an explicit verification step. Instead of relying on a single decoded action, VeGAS generates an ensemble of candidate actions and employs a learned generative verifier to select the most reliable action. To train the verifier without any costly human data collection, we introduce an LLM-driven data generation strategy to automatically synthesise a diverse curriculum of failures, enabling the verifier to learn from a rich distribution of potential mistakes. Experiments on long-horizon embodied reasoning tasks showcase the power of the proposed approach to improve performance and generalization across all tasks, leading even to a 70% relative performance gain on challenging scenarios over strong CoT baselines.
|
| |
| 10:45-12:05, Paper FrZA.42 | |
| Fast Path, Slow State: Dual-Rate Vision-Language-Action Control under Asynchrony |
|
| Vanjani, Pankhuri | Karlsruhe Institute of Technology |
| Reuss, Moritz | Karlsruhe Institute of Technology |
| Li, Zhuoyue | Karlsruhe Institute of Technology |
| Suliga, Jakub | Karlsruhe Institute of Technology |
| Lioutikov, Rudolf | Karlsruhe Institute of Technology |
Keywords: RIG TC: Robotics Foundation Models, Imitation Learning, Learning from Demonstration
Abstract: Real-world robot perception runs on heterogeneous sensors at different rates and receives observations asynchronously, but most VLA methods assume synchronized observation bundles. We propose a dual-rate VLA design that decouples control-rate action generation from slower context estimation. A fast flow-matching-style action generator acts continuously from the latest available tokens, while a compact slow-state context is updated asynchronously. We evaluate robustness by injecting controlled modality delays and frame dropouts at inference time, and study update rules and representations for the slow-state context. Preliminary LIBERO results indicate that compressed memory tokens, particularly GRU-based compression, improve success rates and inference-time dropout robustness compared to naive frame stacking and a single-system baseline, motivating event-triggered context updates as a next step.
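The dual-rate scheduling can be sketched as follows (illustrative; `update_context` stands in for the GRU-based compression, and the fixed `slow_period` is a simplification of asynchronous observation arrival):

```python
def dual_rate_control(fast_policy, update_context, observations, slow_period=5):
    """Fast path: act every tick from the latest observation plus the current
    compressed context. Slow path: refresh the context only when new (possibly
    delayed) observations arrive, modeled here as every `slow_period` ticks.
    Both callables are hypothetical stand-ins for the paper's components."""
    context = None
    actions = []
    for t, obs in enumerate(observations):
        if t % slow_period == 0:          # slow, asynchronous context update
            context = update_context(context, obs)
        actions.append(fast_policy(obs, context))
    return actions
```

The fast policy never blocks on the slow path, which is what lets action generation run at control rate even when context estimation lags.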
|
| |
| 10:45-12:05, Paper FrZA.43 | |
| CampusEye: A Visual Corpus for Self-Localization on the Fulda Campus |
|
| Milde, Jan-Torsten | Fulda University of Applied Sciences |
Keywords: Data Sets for Robotic Vision, Vision-Based Navigation
Abstract: The CampusEye project establishes a visual dataset designed to help autonomous robots navigate the Fulda University of Applied Sciences using only basic cameras. We identified 55 specific locations across the campus, using the existing tactile guidance system to ensure precise positioning and environmental variety. By recording panoramic videos at these spots, we captured over one million images reflecting different lighting conditions and perspectives. This data was used to train a lightweight deep learning model that identifies a robot's location and heading with high accuracy. The study shows that vision-based localization is a functional, low-cost alternative to expensive sensors like LiDAR for navigating complex pedestrian environments. Overall, the corpus provides a robust foundation for real-time robotic orientation in semi-structured outdoor spaces.
|
| |
| 10:45-12:05, Paper FrZA.44 | |
| RoboGrounder - Grounded Spatio-Temporal Reasoning for Robotics |
|
| Blank, Nils | KIT |
| Lioutikov, Rudolf | Karlsruhe Institute of Technology |
Keywords: Deep Learning for Visual Perception, Data Sets for Robot Learning, Data Sets for Robotic Vision
Abstract: High-quality language and reasoning annotations are essential for training generalizable Vision Language Action Models (VLAs), yet manual labeling is unscalable. Existing automated methods often lack precision in robotic domains or require complex hand-crafted annotation pipelines. To address this, we introduce RoboGrounder, a VLM framework designed to generate reliable spatio-temporal reasoning and object-centric annotations for manipulation demonstrations. We propose a robust annotation pipeline that combines foundation models with robot proprioception to annotate a diverse dataset of 500k VQA grounding pairs sourced from a variety of robot manipulation demonstrations. Experiments demonstrate that RoboGrounder significantly outperforms base models, showing substantial improvements in task identification, temporal action localization, object interaction detection, and spatial grounding accuracy.
|
| |
| 10:45-12:05, Paper FrZA.45 | |
| MAD-IRL: Multi-Agent Drone Racing Using Inverse RL and Iterative Best-Response MPC |
|
| Schlüter, Niklas | Technical University of Munich |
| Schuck, Martin | Technical University of Munich |
| Brunke, Lukas | Technical University of Munich |
| Samavi, Sepehr | University of Toronto |
| Zhou, Siqi | Technical University of Munich |
| Schoellig, Angela P. | TU Munich |
Keywords: Aerial Systems: Perception and Autonomy, Multi-Robot Systems, RIG Cluster: Safety, Reliability and Resilience of AI-based Robotics
Abstract: While autonomous drone racing has achieved superhuman performance in time trials, competitive multi-agent racing remains challenging due to complex interactions. We address this by modeling opponents as optimal agents, using Inverse Reinforcement Learning (IRL) to infer their reward functions. This learned reward drives an iterative best response Model Predictive Control (MPC) that jointly predicts opponent behavior and optimizes our agent’s strategy, explicitly accounting for collisions and aerodynamic effects. Extensive experiments demonstrate that our approach significantly improves prediction accuracy across diverse opponent controllers, which translates directly to higher success rates in interactive maneuvers, such as overtaking, and therefore lower crash rates.
|
| |
| 10:45-12:05, Paper FrZA.46 | |
| Pinky 2: A Vision-Based Tactile Sensor for Minimal Invasive Surgery |
|
| Koch, Robin | Technische Universität Dresden |
| Mascot, Annabella | Stanford University |
| Younis, Rayan | University Hospital and Medical Faculty Carl Gustav Carus, TU Dresden |
| Wagner, Martin | University Hospital and Faculty of Medicine Carl Gustav Carus at TUD Dresden University of Technology |
| Speidel, Stefanie | National Center for Tumor Diseases |
| Cutkosky, Mark | Stanford University |
| Sieber, Ingo | Karlsruhe Institute of Technology (KIT) |
| Calandra, Roberto | TU Dresden |
|
|
| |
| 10:45-12:05, Paper FrZA.47 | |
| Structured Planning Using Vision Language Models for Physical Agents |
|
| S Prabhu, Pranav | Technische Universität Dortmund |
| Xavier, Aaron | Technische Universität Dortmund |
|
|
| |
| 10:45-12:05, Paper FrZA.48 | |
| Sim-To-Real for Muscle-Actuated Robots Via Learned Actuator Models |
|
| Schneider, Jan | Max Planck Institute for Intelligent Systems, Tübingen |
| Mahajan, Mridul | Boston University |
| Chen, Le | Max Planck Institute for Intelligent Systems, Tübingen |
| Guist, Simon | Max Planck Institute for Intelligent Systems, Tübingen |
| Schölkopf, Bernhard | ELLIS Institute Tübingen |
| Posner, Ingmar | University of Oxford |
| Büchler, Dieter | University of Alberta |
Keywords: Soft Robot Applications, Reinforcement Learning, Transfer Learning
Abstract: Tendon drives and soft muscle actuation, as seen in humans, can make robots safer, faster, and potentially accelerate skill learning. Still, such robots are rarely used due to inherent nonlinearities, friction, and hysteresis, which make such systems challenging to model and control. These challenges prohibit learning entire policies in simulation and transferring them to the real system. To enable sim-to-real training for robots with soft actuation and tendon drives, we propose learning a model of the actuators and combining it with a torque-based simulator, allowing us to learn the difficult-to-model actuation and leverage well-studied rigid body models for the rest. This combined simulation enables training reinforcement learning policies for a goal-reaching and a ball-in-a-cup task purely in simulation, achieving success rates of 90% and 75%, respectively, on the real robot.
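The hybrid simulation idea, a learned actuator model feeding a torque-based rigid-body model, can be sketched with a single pendulum joint (the pendulum dynamics and the linear "learned" model in the test are illustrative assumptions, not the paper's robot):

```python
import math

def pendulum_step(state, torque, dt=0.01, g=9.81, l=1.0, m=1.0):
    """Minimal torque-based rigid-body model: one pendulum joint integrated
    with explicit Euler. Stands in for the well-studied rigid-body part."""
    theta, omega = state
    alpha = (torque - m * g * l * math.sin(theta)) / (m * l * l)
    return (theta + dt * omega, omega + dt * alpha)

def hybrid_sim_step(actuator_model, state, command, dt=0.01):
    """The learned model covers the hard-to-model actuation (nonlinearities,
    friction, hysteresis), mapping (state, command) to joint torque; the
    analytic model then integrates the resulting dynamics."""
    torque = actuator_model(state, command)
    return pendulum_step(state, torque, dt)
```

Policies trained in a loop over `hybrid_sim_step` see actuation behavior learned from the real system while still benefiting from a cheap analytic simulator.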
|
| |
| 10:45-12:05, Paper FrZA.49 | |
| Hybrid Approach for Asymmetric Teacher-Student Training |
|
| Schik, Maximilian | FZI Forschungszentrum Informatik |
| Daaboul, Karam | Karlsruhe Institut for Technology |
| Neumann, Gerhard | Karlsruhe Institute of Technology |
Keywords: AI-Enabled Robotics
Abstract: Vision-based locomotion policies typically train with on-policy methods in massively parallel simulation, but rendering depth images is computationally expensive and limits feasible parallelism. Off-policy methods can reuse rendered observations through experience replay, reducing computational cost, but struggle with instability when training from high-dimensional visual inputs. We introduce a hybrid approach that combines off-policy RL with time-decaying knowledge distillation from a privileged teacher. Strong early supervision prevents the student from exploiting poorly calibrated critics, while exponential decay transfers control to the RL objective as the replay buffer matures. Trained in parallelized simulation, the resulting policy transfers zero-shot to a Unitree Go2 quadruped, where it executes crawling behaviors under low obstacles using only egocentric depth and proprioception.
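The time-decaying distillation described above amounts to a weighted sum of the RL objective and a teacher-imitation term whose weight shrinks exponentially with training steps. A minimal sketch, with illustrative hyperparameters (`w0`, `tau`) that are assumptions rather than the paper's values:

```python
import math

def hybrid_loss(rl_loss, distill_loss, step, w0=1.0, tau=10_000.0):
    """Off-policy RL loss plus teacher distillation whose weight decays
    exponentially, handing control to the RL objective as the replay
    buffer matures."""
    w = w0 * math.exp(-step / tau)
    return rl_loss + w * distill_loss

early = hybrid_loss(rl_loss=0.5, distill_loss=2.0, step=0)        # teacher dominates
late = hybrid_loss(rl_loss=0.5, distill_loss=2.0, step=100_000)   # nearly pure RL
```

Early in training the student is held close to the privileged teacher; once the critic has seen enough replayed data, the distillation term is negligible.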
|
| |
| 10:45-12:05, Paper FrZA.50 | |
| Coarse-To-Fine BEAST: Dual-System Spline Tokenization for Vision-Language-Action Models |
|
| Stranghöner, Jannick | Universität Bielefeld |
| Gruner, Theo | TU Darmstadt |
| Vanjani, Pankhuri | Karlsruhe Institute of Technology |
| Jülg, Tobias Thomas | University of Technology Nuremberg |
| Scherer, Christian Felix | TU Darmstadt |
| Peters, Jan | Technische Universität Darmstadt |
| Burgard, Wolfram | University of Technology Nuremberg |
| Neumann, Klaus | Bielefeld University / Fraunhofer IOSB-INA |
| Lioutikov, Rudolf | Karlsruhe Institute of Technology |
Keywords: RIG Cluster: Learning and Multimodal AI for Robotics, RIG TC: Robotics Foundation Models, Imitation Learning
Abstract: B-spline action tokenizers such as BEAST compress high-frequency robot trajectories into a small, fixed-length token sequence, enabling fast parallel decoding and smooth motion within the chunk. In online control, however, a fundamental tension between reactivity and throughput remains: replanning frequently improves responsiveness, but repeatedly invoking a large vision-language-action (VLA) backbone is expensive, while longer action chunks improve efficiency at the cost of delayed corrections and reduced responsiveness. We propose Coarse-to-Fine BEAST, a dual-system formulation that factorizes planning and refinement directly in B-spline control-point space. A slow System~2 VLA predicts a small set of coarse control points, while a lightweight System~1 refiner performs fast, observation-conditioned residual updates on a small subset of future fine control points, preserving continuity by construction. This decomposition reduces backbone calls and provides an interface for adding new modalities, such as force/torque or tactile input, during finetuning without modifying the large VLA.
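The coarse-to-fine factorization can be illustrated in control-point space: the slow system emits a few coarse control points, which are mapped to a fine grid, and the fast system adds residuals only to a suffix of future points so already-executed motion is untouched. Linear interpolation here is a stand-in for the actual spline-space mapping, and all values are illustrative assumptions:

```python
import numpy as np

def upsample_control_points(coarse, factor=4):
    """Map coarse System-2 control points onto a fine grid (linear
    interpolation stands in for the true B-spline-space mapping)."""
    n = len(coarse)
    x_fine = np.linspace(0, n - 1, (n - 1) * factor + 1)
    return np.interp(x_fine, np.arange(n), coarse)

def refine(fine, residuals, k):
    """System-1 residual update on the last k future control points;
    earlier points stay fixed, preserving continuity with the
    already-executed portion by construction."""
    out = fine.copy()
    out[-k:] += residuals
    return out

coarse = np.array([0.0, 1.0, 0.5, 2.0])       # assumed slow-VLA output
fine = upsample_control_points(coarse)        # 13 fine control points
fine = refine(fine, residuals=0.1 * np.ones(3), k=3)
```

Because only the trailing control points change, the refiner can run at a much higher rate than the backbone without introducing discontinuities.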
|
| |
| 10:45-12:05, Paper FrZA.51 | |
| Enhanced Co-Design of a Collective Robotic Construction System for the Assembly of In-Plane Timber Structures |
|
| Leder, Samuel | University of Stuttgart |
| Kim, HyunGyu | Korea Aerospace University |
| Sitti, Metin | Max-Planck Institute for Intelligent Systems |
| Menges, Achim | Institute for Computational Design and Construction, University of Stuttgart |
|
|
| |
| 10:45-12:05, Paper FrZA.52 | |
| Digital Twins for Virtual Validation of Social Robots Leveraging Semantic Knowledge and Cognition |
|
| Zebisch, Raoul | University of Augsburg |
| Schilp, Johannes | University of Augsburg |
Keywords: Social HRI, Semantic Scene Understanding, Cognitive Modeling
Abstract: In recent years, the field of social robotics has gained popularity, most prominently in areas such as service, healthcare, and care of the elderly. More recently, the topic has also gained relevance in other application fields such as manufacturing. Social capabilities have the potential to improve the quality of interaction between humans and robots, enhancing robot acceptance and quality of life in the process. However, validating these capabilities through digital approaches is difficult: processes of social human-robot interaction involve a large number of social facts and rules, often highly context-dependent, which makes solutions difficult even with data-driven AI approaches. It is therefore the authors' belief that context-sensitive virtual validation systems for social robots, possibly a digital twin with an interactive 3D virtualization of the robot, achieve better results if an AI system is combined with a formal, possibly fuzzified, rule base realized in a semantic knowledge graph. In this extended abstract, the authors give an overview of their methodological approach and recent advances in developing a social robot system in a digital twin, enabling virtual validation and improved social robot cognition, such as for communication or task planning.
|
| |
| 10:45-12:05, Paper FrZA.53 | |
| Mistake-Aware LLM Finetuning for Robust Planning |
|
| Prescher, Erik | Technical University Darmstadt |
| Schultze, Sven | Technical University of Darmstadt |
| Rudra, Sohan | TU Darmstadt |
| Prasad, Vignesh | TU Darmstadt |
| Stock-Homburg, Ruth | Technical University of Darmstadt |
| Chalvatzaki, Georgia | Technische Universität Darmstadt |
Keywords: Task Planning, Failure Detection and Recovery, AI-Based Methods
Abstract: While finetuned Large Language Models (LLMs) for embodied planning excel at producing reliable plans for a given environment, they do so in a very narrow area of operation: usually a single wrong step in such a plan is sufficient to put the agent into an unseen scenario. Current training paradigms focus on preventing mistakes by learning from optimal demonstrations, but neglect the crucial skill of recovering after deviating from the correct plan. To address this gap, we propose Mistake-Aware Finetuning (MAF), a novel training methodology that explicitly teaches agents to recover from planning errors. In the MAF paradigm, the model is exposed to plans containing intentional mistakes, but a targeted loss mask ensures it only learns from the subsequent, correct recovery actions. This allows the model to learn the association between a failure state and its resolution without being negatively influenced by the erroneous action. We demonstrate the effectiveness of MAF, substantially improving the task success rate from 67% to 96% on the complex MiniGrid MiniBossLevel environment.
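The targeted loss mask described above can be sketched as follows: per-token losses are computed over the whole plan, but tokens belonging to injected mistakes are excluded from the average, so gradients flow only through the recovery actions. The function and example values are illustrative assumptions, not the paper's implementation:

```python
def maf_loss(token_losses, is_mistake):
    """Mistake-Aware Finetuning style masking (illustrative sketch):
    zero out the contribution of injected-mistake tokens so the model
    learns only from the subsequent recovery actions."""
    assert len(token_losses) == len(is_mistake)
    kept = [loss for loss, m in zip(token_losses, is_mistake) if not m]
    return sum(kept) / max(len(kept), 1)

# Plan tokens: [ok, ok, MISTAKE, recovery, recovery]
losses = [0.2, 0.3, 5.0, 0.4, 0.1]
mask = [False, False, True, False, False]
avg = maf_loss(losses, mask)   # the high mistake loss is excluded
```

The model thus sees the failure state as context while never being trained to reproduce the erroneous action itself.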
|
| |
| 10:45-12:05, Paper FrZA.54 | |
| Close-Proximity Human-Robot Interaction in Medical Interventions and Patient Care |
|
| Plonka, Björn Sören | Karlsruhe Institute of Technology (KIT) |
| Stallkamp, Jan | University of Heidelberg, Medical Faculty Mannheim, MIiSM |
| Mombaur, Katja | Karlsruhe Institute of Technology |
Keywords: Medical Robots and Systems, Physical Human-Robot Interaction, Task Planning
Abstract: Healthcare systems are increasingly strained by demographic change and staff shortages, motivating the integration of robotic assistants into daily patient care. While existing hospital robots mostly focus on surgical applications, routine patient-centered tasks remain unsupported. This work presents a research vision for the interaction strategies of a humanoid robotic assistant designed to operate safely in close physical contact with patients. We propose intuitive task selection from a predefined yet adaptive set of robot-executable actions. Additionally, we plan to examine the applicability and limitations of state-of-the-art dexterous humanoid hands in medical settings. Finally, we discuss the usability of artificial intelligence in hospital environments, highlighting its potential to increase robot acceptance through human-like conversation, while emphasizing that AI-driven actuation of the robot's motors may lead to catastrophic failure.
|
| |
| 10:45-12:05, Paper FrZA.55 | |
| Designing Gaze-Guided Spatial Referencing for Trustworthy Mobile Robot Interaction |
|
| Elangovan, Govindaprasath | PES University, Bangalore |
| Janardhana, Vivek Kashyap | PES University, Bangalore |
| Vinay Krishna Sharma, Vinay Krishna | Indian Institute of Science |
| Bharti, Priyanka | IIT Kanpur |
Keywords: Design and Human Factors, Social HRI
Abstract: This work proposes a gaze-driven spatial referencing framework that couples robot gaze behavior with SLAM-based spatial representations to enable explainable and human-legible interaction in autonomous mobile robots. While gaze has been widely used as a social cue in Human–Robot Interaction (HRI), its integration with a robot’s internal spatial cognition remains limited. Building on the Mirror Eyes concept of gaze as an explainability interface and prior findings on trust and legibility in HRI, we transform gaze into a computational mechanism that grounds spatial intent in observable robot behavior. The proposed approach projects the robot’s gaze vector into the SLAM-generated map to anchor objects, regions, and navigation goals, allowing the robot to communicate spatial references through coordinated gaze and interaction cues. This framework bridges the gap between internal navigation reasoning and human-understandable communication, supporting transparency, predictability, and user trust in mobile robot interaction.
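Projecting a gaze vector into a SLAM map reduces, in the simplest case, to intersecting the gaze ray with the map's ground plane. A minimal geometric sketch, assuming a planar map and (x, y, z) tuples in the map frame (function and variable names are illustrative, not from the paper):

```python
def gaze_to_map_point(origin, direction, ground_z=0.0):
    """Intersect the robot's gaze ray with the ground plane of a SLAM
    map to anchor a spatial reference. Returns the (x, y) map point,
    or None when the ray never reaches the ground."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    if dz >= 0:                       # gaze level with or above horizon
        return None
    t = (ground_z - oz) / dz          # ray parameter at the plane
    return (ox + t * dx, oy + t * dy)

# Head at 1.2 m height, looking forward and slightly down:
point = gaze_to_map_point((0.0, 0.0, 1.2), (1.0, 0.0, -0.6))
```

The resulting map point can then be matched against mapped objects or regions to turn an observable gaze cue into a grounded spatial reference.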
|
| |
| 10:45-12:05, Paper FrZA.56 | |
| Learning Autonomous Excavation: A Reinforcement Learning Approach to Rock Manipulation |
|
| Daaboul, Karam | Karlsruhe Institute of Technology |
| Wiberg, Viktor | Algoryx Simulation AB |
| Singh, Arvind | Karlsruhe Institute of Technology |
| Weißkopf, Tobias | Karlsruhe Institute of Technology |
| Neumann, Gerhard | Karlsruhe Institute of Technology |
Keywords: Robotics and Automation in Construction, Reinforcement Learning
Abstract: Autonomous control of hydraulic excavators is challenging due to complex contact dynamics with granular materials, actuator lag, and stability constraints. We present a reinforcement learning approach for robustly grasping and lifting rocks while maintaining machine stability under realistic physical constraints. Using Proximal Policy Optimization, we train an agent entirely in a high-fidelity AGX Dynamics simulation environment that accurately captures excavator dynamics and soil interactions. Through domain randomization and curriculum learning, the agent acquires coordinated control policies without manually crafted heuristics. The learned policy achieves a 91% success rate across randomized soil parameters, rock positions, and joint configurations, and 98% under fixed conditions. These results demonstrate the potential of reinforcement learning for automating complex operations with heavy construction machinery.
|
| |