Last updated on September 13, 2025. This conference program is tentative and subject to change.
Technical Program for Tuesday September 30, 2025
|
TO1B | ASEM Ballroom | Oral Session 1 | Regular |
Chair: Mingo Hoffman, Enrico | INRIA Nancy - Grand Est |
Co-Chair: Yoshida, Eiichi | Faculty of Advanced Engineering, Tokyo University of Science |
|
09:00-09:10, Paper TO1B.1 | |
Developing a Vision and Contact Sensor-Based Android-To-Human Touch System That Enables Face and Arm Touch |
|
Fox-Tierney, Aidan Edward | Osaka University |
Sakai, Kurima | ATR |
Shiomi, Masahiro | ATR |
Minato, Takashi | RIKEN |
Ishiguro, Hiroshi | Osaka University |
Keywords: Touch in HRI, Social HRI
Abstract: Mainstream use of AI has enabled human-like conversations with machines. Despite the perceived naturalness of these interactions, they are limited to 2D screens. Androids offer us an opportunity to explore the third dimension, touch, with a human-centered approach. In this work, we developed and evaluated a system that lays some of the groundwork for future dialogues with these androids. While there have been robot-to-human studies that focus on behavior modification, handshakes, and hugs, many do not use human-looking androids, do not adapt to humans' positions, or do not explore face touch. We demonstrate these shortcomings are surmountable with a system that can direct an android to touch along someone's arm or cheek. Our research goals were to create a system capable of subjective accuracy approaching human touch perception while maintaining participant-rated naturalness scores no worse than those found with systems using nonadaptive android-to-human touch. To evaluate the system, 25 participants selected 30 touch locations while seated in front of a realistic, male-looking android. Coders found our mean contact-to-target arm accuracy to be 4.9 cm, which is only 4 mm over the 4.5 cm upper range for two-point discrimination tests of adults' arms. Participants rated the naturalness of our touches as 3.9/7, which is no worse than the 3.0/7 found in previous work with a mechanically identical android that nonadaptively touched people's forearms. With both goals met, we have demonstrated that natural adaptive touch with a realistic android, even on the sensitive face, is within reach of scientific studies.
|
|
09:10-09:20, Paper TO1B.2 | |
DexForce: Extracting Force-Informed Actions from Kinesthetic Demonstrations for Dexterous Manipulation |
|
Chen, Claire | Stanford University |
Yu, Zhongchun | Stanford University |
Choi, Hojung | Stanford University |
Cutkosky, Mark | Stanford University |
Bohg, Jeannette | Stanford University |
Keywords: Dexterous Manipulation, Learning from Demonstration
Abstract: Imitation learning requires high-quality demonstrations consisting of sequences of state-action pairs. For contact-rich manipulation tasks that require dexterity, the actions in these state-action pairs must produce the right forces. Current widely used methods for collecting dexterous manipulation demonstrations are difficult to use for demonstrating contact-rich tasks due to unintuitive human-to-robot motion retargeting and the lack of direct haptic feedback. Motivated by these concerns, we propose DexForce. DexForce leverages contact forces, measured during kinesthetic demonstrations, to compute force-informed actions for policy learning. We collect demonstrations for six tasks and show that policies trained on our force-informed actions achieve an average success rate of 76% across all tasks. In contrast, policies trained directly on actions that do not account for contact forces have near-zero success rates. We also conduct a study ablating the inclusion of force data in policy observations. We find that while using force data never hurts policy performance, it helps most for tasks that require advanced levels of precision and coordination, like opening an AirPods case and unscrewing a nut.
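One concrete way to picture the force-informed actions described above: offset each demonstrated pose along the measured contact force so that a compliant controller replaying the action reproduces that force. The sketch below is a minimal illustration under that assumption; the function name, scalar stiffness, and data shapes are ours, not DexForce's.

```python
import numpy as np

def force_informed_action(demo_pos, contact_force, stiffness=200.0):
    """Offset a demonstrated fingertip position along the measured contact
    force so that a Cartesian stiffness controller tracking the result pushes
    with approximately the demonstrated force (illustrative sketch only).

    demo_pos:      (3,) fingertip position recorded during the demo [m]
    contact_force: (3,) contact force measured during the demo [N]
    stiffness:     scalar Cartesian stiffness [N/m]; hypothetical value
    """
    # Replaying demo_pos directly yields ~zero force at contact; commanding a
    # target displaced into the contact by f/k restores the demonstrated force.
    return demo_pos + contact_force / stiffness

# Toy usage: 2 N of demonstrated normal force becomes a 1 cm target offset.
print(force_informed_action(np.zeros(3), np.array([0.0, 0.0, 2.0])))
```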
|
|
09:20-09:30, Paper TO1B.3 | |
The Foundational Pose As a Selection Mechanism for the Design of Tool-Wielding Multi-Finger Robotic Hands |
|
Wang, Sunyu | Carnegie Mellon University |
Oh, Jean | Carnegie Mellon University |
Pollard, Nancy S | Carnegie Mellon University |
Keywords: Multifingered Hands, Mechanism Design, Dexterous Manipulation
Abstract: To wield an object means to hold and move it in a way that exploits its functions. When humans wield tools---such as writing with a pen or cutting with scissors---our hands reach very specific poses, often drastically different from how we pick up the same objects just to transport them. In this work, we investigate the design of tool-wielding multi-finger robotic hands through a hypothesis: if a hand can kinematically reach a foundational pose (FP) with a tool, then it can wield the tool from that FP. We interpret FPs as snapshots that capture the workings of underlying parallel mechanisms formed by the tool and the hand, and one hand can form multiple mechanisms with the same tool. We tested our hypothesis in a hand design experiment, where we developed a sampling-based multi-objective design optimization framework that uses three FPs to computationally generate many different hand designs and evaluate them. The results show that 10,785 out of 100,480 sampled hand designs reached the FPs; more than 99% of those 10,785 hands successfully wielded tools in simulation, supporting our hypothesis. Meanwhile, our methods provide insights into the non-convex, multi-objective hand design optimization problem that would be hard to unveil otherwise, such as clustering and the Pareto front. Lastly, we demonstrate our methods' real-world feasibility and potential with a hardware prototype equipped with a rigid endoskeleton and soft skin.
|
|
09:30-09:40, Paper TO1B.4 | |
From Canada to Japan: How 10,000 Km Affect User Perception in Robot Teleoperation
|
Capy, Siméon | Tokyo University of Science |
Kwok, Thomas M. | University of Waterloo |
Joseph, Kevin | University of Waterloo |
Kawasumi, Yuichiro | Kawada Technologies, Inc |
Nagashima, Koichi | Kawada Technologies, Inc |
Sasaki, Tomoya | Tokyo University of Science |
Hu, Yue | University of Waterloo |
Yoshida, Eiichi | Faculty of Advanced Engineering, Tokyo University of Science |
Keywords: Telerobotics and Teleoperation, Human-Centered Robotics
Abstract: Robot teleoperation (RTo) has emerged as a viable alternative to local control, particularly when human intervention is still necessary. This research studies the effect of distance on user perception in RTo, exploring the potential of teleoperated robots for older adult care. We propose an evaluation of non-expert users' perception of long-distance RTo, examining how their perception changes before and after interaction, as well as comparing it to that of locally operated robots. We designed a specific protocol consisting of multiple questionnaires, along with a dedicated software architecture using the Robot Operating System (ROS) and Unity. The results revealed no statistically significant differences between the local and remote robot conditions, suggesting that long-distance teleoperation may be a viable alternative to traditional local control.
|
|
09:40-09:50, Paper TO1B.5 | |
AHMP: Agile Humanoid Motion Planning with Contact Sequence Discovery |
|
Tsikelis, Ioannis | Inria |
Tsiatsianas, Evangelos | University of Patras |
Kiourt, Chairi | Athena Research Centre |
Ivaldi, Serena | INRIA |
Chatzilygeroudis, Konstantinos | University of Patras |
Mingo Hoffman, Enrico | INRIA Nancy - Grand Est |
Keywords: Multi-Contact Whole-Body Motion Planning and Control, Hybrid Logical/Dynamical Planning and Verification, Optimization and Optimal Control
Abstract: Planning agile whole-body motions for legged and humanoid robots is a fundamental requirement for enabling dynamic tasks such as running, jumping, and fast reactive maneuvers. In this work, we present AHMP, a multi-contact motion planning framework based on bi-level optimization that integrates a contact sequence discovery technique, using the Mixed-Distribution Cross-Entropy Method (CEM-MD), and an efficient trajectory optimization scheme, which parameterizes the robot’s poses and motions in the tangent space of SE(3). AHMP permits the automatic generation of feasible contact configurations, with associated whole-body dynamic transitions. We validate our approach on a set of challenging agile motion planning tasks for humanoid robots, demonstrating that contact sequence discovery combined with tangent space parameterization leads to highly dynamic motion plans while remaining computationally efficient.
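For readers unfamiliar with CEM-MD, the sketch below shows a mixed-distribution cross-entropy method in miniature: Bernoulli parameters for discrete choices (e.g., which surfaces are in contact) and Gaussians for continuous ones (e.g., contact locations or timings). The smoothing scheme and toy cost are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def cem_md(cost, n_disc, n_cont, iters=30, pop=128, elites=16):
    """Mixed-distribution CEM: sample discrete variables from Bernoullis and
    continuous ones from a diagonal Gaussian, then refit both to the elites."""
    p = np.full(n_disc, 0.5)                       # Bernoulli probabilities
    mu, sigma = np.zeros(n_cont), np.ones(n_cont)  # Gaussian parameters
    for _ in range(iters):
        d = rng.random((pop, n_disc)) < p            # discrete samples
        c = rng.normal(mu, sigma, (pop, n_cont))     # continuous samples
        costs = np.array([cost(di, ci) for di, ci in zip(d, c)])
        elite = np.argsort(costs)[:elites]
        p = 0.5 * p + 0.5 * d[elite].mean(axis=0)    # smoothed refits
        mu = 0.5 * mu + 0.5 * c[elite].mean(axis=0)
        sigma = 0.5 * sigma + 0.5 * c[elite].std(axis=0) + 1e-3
    return p, mu

# Toy problem: the optimum is the discrete mask [1, 0] with point (1.0, -2.0).
p, mu = cem_md(lambda d, c: np.sum((c - [1.0, -2.0])**2) + d[1] + (1 - d[0]),
               n_disc=2, n_cont=2)
print(p.round(2), mu.round(2))
```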
|
|
TI4C | ASEM Ballroom Lobby | Interactive Session 1 | Interactive |
|
14:00-15:00, Paper TI4C.1 | |
Reinforcement Learning of Contact Preferability in Multi-Contact Locomotion Planning for Humanoids |
|
Kumagai, Iori | National Institute of Advanced Industrial Science and Technology (AIST) |
Murooka, Masaki | AIST |
Morisawa, Mitsuharu | National Institute of Advanced Industrial Science and Technology (AIST) |
Kanehiro, Fumio | National Institute of Advanced Industrial Science and Technology (AIST) |
Keywords: Humanoid Robot Systems, Multi-Contact Whole-Body Motion Planning and Control, Reinforcement Learning
Abstract: In this paper, we propose a multi-contact locomotion planning framework for humanoid robots that selects target contacts according to their long-term preferability, learned via reinforcement learning (RL), within optimization-based motion generation under feasibility constraints. In multi-contact locomotion, where humanoid robots must perform complex motions under kinematic constraints and static equilibrium, it is difficult to predict how the next target contact will affect the robot's motion over the future. To solve this problem, we evaluate the preferability of the motion planned by the optimization-based motion planner to reach the target contact, which we define as contact preferability, and use it as the reward for RL. This enables us to train a policy that provides contacts with high future preferability without explicitly designing a measure of their future promise ourselves. We also propose a design of the RL action space based on the robot's reachability. We construct sets of feasible joint angles for each limb of the robot as successors and use them as the action space instead of directly managing contact poses. By defining a deterministic mapping from successors to target contacts, the proposed framework can manage acyclic multi-contact motion in which the number of contacts can change. We evaluate the proposed framework in three scenarios and show that it can plan a preferable contact sequence for multi-contact locomotion with a high success rate and short computation time.
|
|
14:00-15:00, Paper TI4C.2 | |
Addressing Reachability and Discrete Component Selection in Robotic Manipulator Design through Kineto-Static Bi-Level Optimization |
|
Mingo Hoffman, Enrico | INRIA Nancy - Grand Est |
Costanzi, Daniel | PAL Robotics |
Fadini, Gabriele | ZHAW |
Miguel, Narcís | PAL Robotics |
Del Prete, Andrea | University of Trento |
Marchionni, Luca | Pal Robotics SL |
Keywords: Methods and Tools for Robot System Design, Mechanism Design, Optimization and Optimal Control
Abstract: Designing robotic manipulators for generic tasks while meeting specific requirements is a complex, iterative process involving mechanical design, simulation, control, and testing. New computational design tools are needed to simplify and speed up such processes. This work presents an original formulation of the computational design problem, tailored to help design generic manipulators with strong reachability requirements. The primary challenges addressed in this work are twofold. First, the necessity to consider the design of both continuous quantities and discrete components. Second, the ability to guide the design using high-level requirements, like the robot’s workspace, without needing a specific manipulation task, unlike other co-design frameworks. These two challenges are addressed by employing a novel kineto-static formulation, resulting in a Mixed Integer Nonlinear Programming problem, which is solved using bi-level optimization. A compelling use case from a real industrial application is presented to highlight the practical effectiveness of the proposed method.
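The bi-level structure can be pictured with a toy problem: an outer level enumerates a hypothetical discrete component catalog while an inner continuous level optimizes link lengths for a reachability requirement under the static torque limits those components impose. Catalog, targets, and costs below are illustrative assumptions, not the paper's formulation.

```python
import itertools
import numpy as np
from scipy.optimize import minimize

# Hypothetical catalog of discrete components: (max torque [Nm], mass [kg]).
CATALOG = [(40.0, 0.8), (60.0, 1.2), (80.0, 1.9)]
TARGETS = np.array([[0.4, 0.3], [0.9, 0.1], [0.2, 0.7]])  # workspace points
PAYLOAD = 30.0  # [N] static load held at the tip of a toy 2-link planar arm

def inner(tau_max):
    """Continuous level: link lengths that reach TARGETS while respecting the
    static gravity torques allowed by the chosen motors."""
    def cost(L):
        d = np.linalg.norm(TARGETS, axis=1)
        r_min, r_max = abs(L[0] - L[1]), L[0] + L[1]
        reach = np.sum(np.maximum(d - r_max, 0)**2 + np.maximum(r_min - d, 0)**2)
        tau = PAYLOAD * np.array([L[0] + L[1], L[1]])   # worst-case torques
        return reach + 10.0 * np.sum(np.maximum(tau - tau_max, 0)**2)
    res = minimize(cost, [0.4, 0.4], bounds=[(0.1, 0.8)] * 2)
    return res.fun, res.x

best = None
for m1, m2 in itertools.product(CATALOG, repeat=2):  # discrete level
    f, L = inner(np.array([m1[0], m2[0]]))
    total = f + 0.05 * (m1[1] + m2[1])                # reachability + mass proxy
    if best is None or total < best[0]:
        best = (total, (m1, m2), L)
print(best)
```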
|
|
14:00-15:00, Paper TI4C.3 | |
Multi-Contact Inertial Parameters Estimation and Localization in Legged Robots |
|
Martinez, Sergi | Heriot-Watt University |
Griffin, Robert J. | Institute for Human and Machine Cognition (IHMC) |
Mastalli, Carlos | Heriot-Watt University |
Keywords: Optimization and Optimal Control, Legged Robots, Calibration and Identification
Abstract: Optimal estimation is a promising tool for estimating payload inertial parameters and localizing robots in the presence of multiple contacts. To harness its advantages in robotics, it is crucial to solve these large and challenging optimization problems efficiently. To tackle this, we (i) develop a multiple-shooting solver that exploits both temporal and parametric structures through a parametrized Riccati recursion. Additionally, we (ii) propose an inertial manifold that ensures the full physical consistency of inertial parameters and enhances convergence. To handle its manifold singularities, we (iii) introduce a nullspace approach in our optimal estimation solver. Finally, we (iv) develop the analytical derivatives of contact dynamics for both inertial parametrizations. Our framework can successfully solve estimation problems for complex maneuvers such as brachiation in humanoids, achieving higher accuracy than conventional least-squares approaches. We demonstrate its numerical capabilities across various robotics tasks and its benefits in experimental trials with the Go1 robot.
|
|
14:00-15:00, Paper TI4C.4 | |
SLAG: Scalable Language-Augmented Gaussian Splatting |
|
Szilagyi, Laszlo | Stanford University |
Engelmann, Francis | Stanford University |
Bohg, Jeannette | Stanford University |
Keywords: Semantic Scene Understanding, Mapping, Deep Learning for Visual Perception
Abstract: Language-augmented scene representations hold great promise for large-scale robotics applications such as search-and-rescue, smart cities, and mining. Many of these scenarios are time-sensitive, requiring rapid scene encoding, while also being data-intensive, necessitating scalable solutions. Deploying these representations on robots with limited computational resources further adds to the challenge. To address this, we introduce SLAG, a multi-GPU framework for language-augmented Gaussian splatting that enhances the speed and scalability of embedding large scenes. Our method integrates 2D visual-language model features into 3D scenes using SAM and CLIP. Unlike prior approaches, SLAG eliminates the need for a loss function to compute per-Gaussian language embeddings. Instead, it derives embeddings from 3D Gaussian scene parameters via a normalized weighted average, enabling highly parallelized scene encoding. Additionally, we introduce a vector database for efficient embedding storage and retrieval. Our experiments show that SLAG achieves an 18× speedup in embedding computation on a 16-GPU setup compared to OpenGaussian, while preserving embedding quality on the ScanNet and LERF datasets. For more details, visit our project website: https://sleg-project.github.io/
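The normalized weighted average the abstract mentions can be pictured as follows; the array shapes and the way per-Gaussian rendering weights are accumulated are our assumptions, not SLAG's actual pipeline.

```python
import numpy as np

def gaussian_embeddings(weights, pixel_feats):
    """Per-Gaussian language embedding as a normalized weighted average of 2D
    features (loss-free, hence trivially parallelizable across Gaussians).

    weights:     (G, P) rendering weight of Gaussian g on pixel p, e.g.
                 alpha-blending contributions accumulated over views
    pixel_feats: (P, D) CLIP-style feature for each pixel
    returns:     (G, D) unit-norm embedding per Gaussian
    """
    emb = weights @ pixel_feats                              # weighted sums
    emb /= np.maximum(weights.sum(1, keepdims=True), 1e-8)   # normalize weights
    emb /= np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-8)
    return emb

# Toy usage: 3 Gaussians, 4 pixels, 8-D features.
rng = np.random.default_rng(0)
print(gaussian_embeddings(rng.random((3, 4)), rng.normal(size=(4, 8))).shape)
```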
|
|
14:00-15:00, Paper TI4C.5 | |
Planar Velocity Estimation for Fast-Moving Mobile Robots Using Event-Based Optical Flow |
|
Boyle, Liam | ETH Zurich |
Baumann, Nicolas | ETH Zurich |
Kühne, Jonas | ETH Zurich |
Bastuck, Niklas | ETH Zurich |
Magno, Michele | ETH Zurich |
Keywords: Wheeled Robots, Field Robots, Sensor Fusion
Abstract: Accurate velocity estimation is critical in mobile robotics, particularly for driver assistance systems and autonomous driving. Wheel odometry fused with Inertial Measurement Unit (IMU) data is a widely used method for velocity estimation; however, it typically requires strong assumptions, such as non-slip steering, or complex vehicle dynamics models that do not hold under varying environmental conditions, like slippery surfaces. We introduce an approach to velocity estimation that is decoupled from wheel-to-surface traction assumptions by leveraging planar kinematics in combination with optical flow from event cameras pointed perpendicularly at the ground. The asynchronous, microsecond-scale latency and high dynamic range of event cameras make them highly robust to motion blur, a common challenge in vision-based perception techniques for autonomous driving. The proposed method is evaluated through in-field experiments on a 1:10 scale autonomous racing platform and compared to precise motion capture data, demonstrating not only performance on par with the state-of-the-art Event-VIO method but also a 38.3% improvement in lateral error. Qualitative experiments at highway speeds of up to 32 m/s further confirm the effectiveness of our approach, indicating significant potential for real-world deployment.
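The traction-independent principle can be sketched with a toy pinhole model: for a camera at height Z looking straight down, the image flow of the static ground maps directly to body velocity. Numbers below are illustrative, and rotational flow is assumed to be compensated separately (e.g., with a gyroscope).

```python
import numpy as np

def planar_velocity(mean_flow_px, height_m, focal_px):
    """Back-project the mean optical flow of a ground-facing camera into
    planar body velocity (toy pinhole model, not the paper's estimator).

    A static ground point at depth Z moves in the image at u = -f * v / Z,
    so v = -u * Z / f, regardless of wheel-to-surface traction.
    """
    return -np.asarray(mean_flow_px) * height_m / focal_px

# 500 px/s of backward image flow at Z = 0.1 m, f = 500 px -> 0.1 m/s forward.
print(planar_velocity([-500.0, 0.0], height_m=0.1, focal_px=500.0))
```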
|
|
14:00-15:00, Paper TI4C.6 | |
SonicBoom: Contact Localization Using Array of Microphones |
|
Lee, Moonyoung | Carnegie Mellon University |
Yoo, Uksang | Carnegie Mellon University |
Oh, Jean | Carnegie Mellon University |
Ichnowski, Jeffrey | Carnegie Mellon University |
Kantor, George | Carnegie Mellon University |
Kroemer, Oliver | Carnegie Mellon University |
Keywords: Agricultural Automation, Force and Tactile Sensing, Grippers and Other End-Effectors
Abstract: In cluttered environments where visual sensors encounter heavy occlusion, such as in agricultural settings, tactile signals can provide crucial spatial information for the robot to locate rigid objects and maneuver around them. We introduce SonicBoom, a holistic hardware and learning pipeline that enables contact localization through an array of contact microphones. While conventional sound source localization methods effectively triangulate sources in air, localization through solid media with irregular geometry and structure presents challenges that are difficult to model analytically. We address this challenge through a feature-engineering and learning-based approach, autonomously collecting 18,000 robot interaction-sound pairs to learn a mapping between acoustic signals and collision locations on the robot end-effector link. By leveraging relative features between microphones, SonicBoom achieves localization errors of 0.43 cm for in-distribution interactions and maintains robust performance, with a 2.22 cm error, even with novel objects and contact conditions. We demonstrate the system's practical utility through haptic mapping of occluded branches in mock canopy settings, showing that acoustic-based sensing can enable reliable robot navigation in visually challenging environments.
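A minimal example of "relative features between microphones": the lag of the cross-correlation peak for every microphone pair, which a learned regressor can then map to a contact location. This is a generic stand-in for the paper's feature engineering, not SonicBoom's actual pipeline.

```python
import numpy as np

def pairwise_delay_features(signals, fs):
    """Relative time-delay features: for every microphone pair, the lag (in
    seconds) of the peak of their cross-correlation."""
    feats = []
    for i in range(len(signals)):
        for j in range(i + 1, len(signals)):
            xc = np.correlate(signals[i], signals[j], mode="full")
            feats.append((np.argmax(xc) - (len(signals[j]) - 1)) / fs)
    return np.array(feats)

# A regressor then maps such features to contact location; the simplest
# baseline is a linear least-squares fit on (features, location) pairs:
#   W, *_ = np.linalg.lstsq(X_train, Y_train, rcond=None)
#   predicted_locations = X_test @ W
```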
|
|
14:00-15:00, Paper TI4C.7 | |
A Reinforcement Learning Approach to Non-Prehensile Manipulation through Sliding |
|
Raei, Hamidreza | Italian Institute of Technology |
De Momi, Elena | Politecnico Di Milano |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Keywords: In-Hand Manipulation, Reinforcement Learning, Transfer Learning
Abstract: Although robotic applications increasingly demand versatile and dynamic object handling, most existing techniques are predominantly focused on grasp-based manipulation, limiting their applicability in non-prehensile tasks. To address this need, this study introduces a Deep Deterministic Policy Gradient (DDPG) reinforcement learning (RL) framework for efficient non-prehensile manipulation, specifically for sliding an object on a surface. The algorithm generates a linear trajectory by precisely controlling the acceleration of a robotic arm rigidly coupled to the horizontal surface, enabling the relative manipulation of an object as it slides along the surface. Furthermore, two distinct algorithms have been developed to estimate the frictional forces dynamically during the sliding process. These algorithms provide friction estimates online after each action, serving as critical feedback to the actor model. This feedback mechanism enhances the policy's adaptability and robustness, ensuring more precise control of the platform's acceleration in response to varying surface conditions. The proposed algorithm is validated through simulations and real-world experiments. Results demonstrate that the proposed framework effectively generalizes sliding manipulation across varying distances and, more importantly, adapts to different surfaces with diverse frictional properties. Notably, the trained model exhibits zero-shot sim-to-real transfer capabilities.
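For intuition about the friction feedback loop, a Coulomb-friction toy model relates release speed, friction, and slide distance, and shows how an online per-action estimate could be refined; the paper's two estimation algorithms are not reproduced here.

```python
G = 9.81  # gravitational acceleration [m/s^2]

def slide_distance(v0, mu):
    """Coulomb sliding: an object released at speed v0 decelerates at mu*g,
    traveling d = v0^2 / (2*mu*g). Toy model for intuition only."""
    return v0**2 / (2.0 * mu * G)

def mu_online(prev_mu, v0, observed_d, alpha=0.3):
    """After each action, invert the model on the observed slide distance and
    blend it into a running friction estimate, mimicking the per-action
    online feedback described above (the paper's estimators differ)."""
    mu_obs = v0**2 / (2.0 * G * max(observed_d, 1e-6))
    return (1.0 - alpha) * prev_mu + alpha * mu_obs

print(slide_distance(0.5, 0.3))    # ~0.042 m
print(mu_online(0.25, 0.5, 0.05))  # ~0.251, nudged toward the observation
```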
|
|
14:00-15:00, Paper TI4C.8 | |
Robotic In-Hand Manipulation for Large-Range Precise Object Movement: The RGMC Champion Solution |
|
Yu, Mingrui | Tsinghua University |
Jiang, Yongpeng | Tsinghua University |
Chen, Chen | Tsinghua University |
Jia, Yongyi | Tsinghua University |
Li, Xiang | Tsinghua University |
Keywords: In-Hand Manipulation, Multifingered Hands, Dexterous Manipulation
Abstract: In-hand manipulation using multiple dexterous fingers is a critical robotic skill that can reduce the reliance on large arm motions, thereby saving space and energy. This letter focuses on in-grasp object movement, which refers to manipulating an object to a desired pose through only finger motions within a stable grasp. The key challenge lies in simultaneously achieving high precision and large-range movements while maintaining a constant stable grasp. To address this problem, we propose a simple and practical approach based on kinematic trajectory optimization with no need for pretraining or object geometries, which can be easily applied to novel objects in real-world scenarios. Adopting this approach, we won the championship for the in-hand manipulation track at the 9th Robotic Grasping and Manipulation Competition (RGMC) held at ICRA 2024. Implementation details, discussion, and further quantitative experimental results are presented in this letter, which aims to comprehensively evaluate our approach and share our key takeaways from the competition. Supplementary materials including video and code are available at https://rgmc-xl-team.github.io/ingrasp_manipulation.
|
|
14:00-15:00, Paper TI4C.9 | |
Learning Human-Aware Robot Policies for Adaptive Assistance |
|
Qin, Jason | Stony Brook University |
Ban, Shikun | Peking University |
Zhu, Wentao | Eastern Institute of Technology, Ningbo |
Wang, Yizhou | Peking University |
Samaras, Dimitris | Stony Brook University |
Keywords: Reinforcement Learning, Human-Robot Collaboration, Physically Assistive Devices
Abstract: Developing robots that can assist humans efficiently, safely, and adaptively is crucial for real-world applications such as healthcare. While previous work often assumes a centralized simulation system for co-optimizing human-robot interactions, we argue that real-world scenarios are much more complicated, as humans have individual preferences regarding how tasks are performed. Robots typically lack direct access to these implicit preferences. However, to provide effective assistance, robots must still be able to recognize and adapt to the individual needs and preferences of different users. To address these challenges, we first delineate the misalignment problem in human and robot observations, then propose a novel framework in which robots infer human intentions and reason about human utilities through interaction. Our approach features two critical modules: the anticipation module is a motion predictor that captures the spatial-temporal relationship between the robot agent and user agent, which contributes to predicting human behavior; the utility module infers the underlying human utility functions through progressive task demonstration sampling. Extensive experiments across various robot types and assistive tasks demonstrate that the proposed framework not only enhances task success but also significantly improves human preference alignment in simulated environments, paving the way for more personalized and adaptive assistive robotic systems.
|
|
14:00-15:00, Paper TI4C.10 | |
VR-Robo: A Real-To-Sim-To-Real Framework for Visual Robot Navigation and Locomotion |
|
Zhu, Shaoting | Tsinghua University |
Mou, Linzhan | Princeton University |
Li, Derun | Shanghai Jiao Tong University |
Ye, Baijun | Tsinghua University |
Huang, Runhan | Tsinghua University |
Zhao, Hang | Tsinghua University |
Keywords: Legged Robots, AI-Based Methods, Vision-Based Navigation
Abstract: Recent success in legged robot locomotion is attributed to the integration of reinforcement learning and physical simulators. However, these policies often encounter challenges when deployed in real-world environments due to sim-to-real gaps, as simulators typically fail to replicate visual realism and complex real-world geometry. Moreover, the lack of realistic visual rendering limits the ability of these policies to support high-level tasks requiring RGB-based perception like ego-centric navigation. This paper presents a Real-to-Sim-to-Real framework that generates photorealistic and physically interactive "digital twin" simulation environments for visual navigation and locomotion learning. Our approach leverages 3D Gaussian Splatting (3DGS) based scene reconstruction from multi-view images and integrates these environments into simulations that support ego-centric visual perception and mesh-based physical interactions. To demonstrate its effectiveness, we train a reinforcement learning policy within the simulator to perform a visual goal-tracking task. Extensive experiments show that our framework achieves RGB-only sim-to-real policy transfer. Additionally, our framework facilitates the rapid adaptation of robot policies with effective exploration capability in complex new environments, highlighting its potential for applications in households and factories.
|
|
14:00-15:00, Paper TI4C.11 | |
Learning Inverse Hitting Problem |
|
Khurana, Harshit | EPFL |
Hermus, James | EPFL |
Gautier, Maxime | École Polytechnique Fédérale De Lausanne (EPFL) |
Schakkal, André | EPFL |
Billard, Aude | EPFL |
Keywords: Manipulation Planning, Dual Arm Manipulation, Learning Categories and Concepts
Abstract: This paper presents a data collection framework and a learning model to understand the motion of an object after being subject to an impulse. The data collection framework consists of an automated dual-arm setup in which two robots hit an object back and forth to each other, like a collaborative air-hockey game. An impact-aware extended Kalman filter is proposed to automate the air-hockey setup; it approximates the discontinuous impulse motion equations through a hitting force model by balancing the energies during collision. To capture the variance in motion introduced by the stochasticity of friction and by errors in the hitting-flux controls, we model the stochastic relationship between the hitting flux and the object's resulting displacement using full density modeling. Further, we show how the learned motion model can be used to plan sequential hits with two or more robots, following a golf-like principle, to drive an object to a location far beyond the reach of a single robot.
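"Full density modeling" can be approximated in miniature with a heteroscedastic Gaussian whose mean and variance both depend on the hitting flux; the synthetic data and linear models below are illustrative assumptions, not the authors' learned model.

```python
import numpy as np

# Model displacement | hitting flux as a Gaussian whose mean and variance
# both depend on the flux (heteroscedastic least squares on synthetic data).
rng = np.random.default_rng(1)
flux = rng.uniform(0.2, 1.5, 500)
disp = 0.8 * flux + rng.normal(0.0, 0.05 + 0.1 * flux)   # synthetic data

A = np.c_[flux, np.ones_like(flux)]
w_mu, *_ = np.linalg.lstsq(A, disp, rcond=None)          # mean model
resid2 = (disp - A @ w_mu) ** 2
w_var, *_ = np.linalg.lstsq(A, resid2, rcond=None)       # variance model

def predict(f):
    """Predicted displacement mean and standard deviation for flux f."""
    mu = w_mu[0] * f + w_mu[1]
    var = max(w_var[0] * f + w_var[1], 1e-6)
    return mu, float(np.sqrt(var))

print(predict(1.0))  # e.g. (~0.8, ~0.15) for a unit hitting flux
```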
|
|
14:00-15:00, Paper TI4C.12 | |
Language-Embedded 6D Pose Estimation for Tool Manipulation |
|
Tu, Yuyang | Universität Hamburg |
Wang, Yunlong | Universität Hamburg |
Zhang, Hui | University of Hamburg |
Chen, Wenkai | University of Hamburg |
Zhang, Jianwei | University of Hamburg |
Keywords: Grasping, Deep Learning for Visual Perception
Abstract: Robotic tool manipulation requires understanding task-relevant semantics under visually challenging conditions, such as shape variation and occlusion. This paper presents a novel framework for Language-Embedded Semantic 6D Pose Estimation that combines natural language instructions with 3D point cloud data to achieve category-level 6D pose estimation of tools' functional parts. By embedding semantic information from large language models (LLMs) and leveraging a diffusion-based pose estimator, our approach achieves robust generalization across diverse tool categories. We introduce a comprehensive synthetic dataset, tailored for tool manipulation scenarios, with annotated 6D poses of functional parts. Extensive experiments conducted on both the synthetic dataset and real-world robots demonstrate our system’s ability to interpret natural language commands, predict poses of functional parts, and perform manipulation tasks with significant improvements in accuracy and generalization.
|
|
14:00-15:00, Paper TI4C.13 | |
Design and Control of the Humanoid Robot COMAN+ |
|
Ruscelli, Francesco | Istituto Italiano Di Tecnologia |
Rossini, Luca | Istituto Italiano Di Tecnologia |
Mingo Hoffman, Enrico | INRIA Nancy - Grand Est |
Baccelliere, Lorenzo | Istituto Italiano Di Tecnologia |
Laurenzi, Arturo | Istituto Italiano Di Tecnologia |
Muratore, Luca | Istituto Italiano Di Tecnologia |
Antonucci, Davide | Istituto Italiano Di Tecnologia |
Cordasco, Stefano | Istituto Italiano Di Tecnologia (IIT) |
Tsagarakis, Nikos | Istituto Italiano Di Tecnologia |
Keywords: Humanoid and Bipedal Locomotion, Multi-Contact Whole-Body Motion Planning and Control, Humanoid Robot Systems
Abstract: Despite the prevalence of robots operating within controlled environments such as industries, recent advancements in both autonomy and human-robot interaction have expanded the potential for their use within a diverse range of scenarios. Research efforts in humanoid robotics aim to develop platforms possessing the requisite versatility and dexterity to mimic human motion. This allows such machines to perform complex tasks alongside humans while ensuring safety during operations. Following these principles, this paper presents the robot COMAN+, focusing on its hardware capabilities and software implementations. COMAN+ is developed by the Humanoid and Human Centred Mechatronics Research Line at Istituto Italiano di Tecnologia: a human-sized torque-controlled humanoid assembled with a focus on robustness, reliability, and strength. Its custom-made actuation system and sturdy yet lightweight skeleton make it ideal for working in rough conditions with a high power-to-weight ratio for heavy-duty tasks.
|
|
14:00-15:00, Paper TI4C.14 | |
The Open Stack of Tasks Library: OpenSoT |
|
Mingo Hoffman, Enrico | INRIA Nancy - Grand Est |
Laurenzi, Arturo | Istituto Italiano Di Tecnologia |
Tsagarakis, Nikos | Istituto Italiano Di Tecnologia |
Keywords: Robust/Adaptive Control
Abstract: The OpenSoT library is a state-of-the-art framework for instantaneous whole-body motion planning and control based on Quadratic Programming optimization. The library is designed to enable users to easily write and solve a variety of complex instantaneous whole-body control problems with minimal input, facilitating the addition of new tasks, constraints, and solvers. OpenSoT is designed to be real-time safe and can be conveniently interfaced with other software components such as ROS or other robotic-oriented frameworks. This paper aims to present the usage of the OpenSoT library to a large audience of researchers, engineers, and practitioners, as well as to provide insights into its software design, which has matured over nearly 10 years of development.
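OpenSoT's own API is not reproduced here, but the kind of instantaneous QP it stacks can be illustrated generically: a single differential-IK task with joint-velocity bounds, solved as a bounded least-squares problem.

```python
import numpy as np
from scipy.optimize import lsq_linear

def ik_qp_step(J, v_des, qd_max):
    """One instantaneous differential-IK step in the spirit of a
    stack-of-tasks QP:  min ||J*qdot - v_des||^2  s.t.  |qdot| <= qd_max.
    Generic sketch; OpenSoT's actual tasks, constraints, and solvers differ."""
    res = lsq_linear(J, v_des, bounds=(-qd_max, qd_max))
    return res.x

J = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 1.0]])      # toy 2x3 task Jacobian
print(ik_qp_step(J, np.array([0.2, -0.1]), qd_max=1.0))
```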
|
|
14:00-15:00, Paper TI4C.15 | |
A Compact 6D Suction Cup Model for Robotic Manipulation Via Symmetry Reduction |
|
Oliva, Alexander Antonio | Eindhoven University of Technology (TU/e) |
Jongeneel, Maarten | Eindhoven University of Technology |
Saccon, Alessandro | Eindhoven University of Technology - TU/e |
Keywords: Dynamics, Calibration and Identification, Contact Modeling, Compliance and Impedance Control
Abstract: Active suction cups are widely adopted in industrial and logistics automation. Despite that, validated dynamic models describing their 6D force/torque interaction with objects are rare. This work aims at filling this gap by showing that it is possible to employ a compact model for suction cups that provides good accuracy even for large deformations. Its potential use is for advanced manipulation planning and control. We model the interconnected object-suction cup system as a lumped 6D mass-spring-damper system, employing a potential energy function on SE(3) parametrized by a 6x6 stiffness matrix. By exploiting geometric symmetries of the suction cup, we reduce the parameter identification problem from 6(6+1)/2 = 21 to only 5 independent parameters, greatly simplifying the identification procedure, which is otherwise ill-conditioned. Experimental validation is provided and the data is shared openly to further stimulate research. As an indication of the achievable pose prediction in steady state, for an object of about 1.75 kg, we obtain a pose error in the order of 5 mm and 3 deg, with a gripper inclination of 60 deg.
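One plausible form of the symmetry reduction, in notation of ours rather than the paper's: a stiffness matrix that is symmetric about the cup axis z, which cuts the 6(6+1)/2 = 21 parameters of a general symmetric 6x6 matrix down to 5.

```latex
% Axisymmetric 6x6 stiffness about the cup axis z (our illustrative reading;
% the paper's exact parametrization may differ). Five parameters remain:
% k_t (lateral), k_a (axial), k_b (bending), k_tau (torsion), c (coupling).
K =
\begin{pmatrix}
k_t & 0 & 0 & 0 & c & 0\\
0 & k_t & 0 & -c & 0 & 0\\
0 & 0 & k_a & 0 & 0 & 0\\
0 & -c & 0 & k_b & 0 & 0\\
c & 0 & 0 & 0 & k_b & 0\\
0 & 0 & 0 & 0 & 0 & k_\tau
\end{pmatrix}
```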
|
|
14:00-15:00, Paper TI4C.16 | |
UniphorM: A New Uniform Spherical Image Representation for Robotic Vision |
|
André, Antoine N. | AIST |
Morbidi, Fabio | Université De Picardie Jules Verne |
Caron, Guillaume | CNRS |
Keywords: Omnidirectional Vision, Visual Tracking, Computer Vision for Other Robotic Applications, Rotation estimation
Abstract: In this paper, we present a new spherical image representation, called Uniform Spherical Mapping of Omnidirectional Images (UniphorM), and show its strong potential in robotic vision. UniphorM provides an accurate and distortion-free representation of a 360-degree image by relying on multiple subdivisions of an icosahedron and its associated Voronoi diagrams. The geometric mapping procedure is described in detail, and the trade-off between pixel accuracy and computational complexity is investigated. To demonstrate the benefits of UniphorM in real-world problems, we applied it to direct visual attitude estimation and visual place recognition (VPR), considering dual-fisheye images captured by a camera mounted on multiple robotic platforms. In the experiments, we measured the impact of the number of subdivision levels of the icosahedron on the attitude estimation error, time efficiency, and size of the convergence domain of the visual gyroscope, using UniphorM and three existing mapping algorithms. A similar evaluation procedure was carried out for VPR, using images from the Mapillary platform. A new omnidirectional image dataset generated with a hexacopter, called SVMI
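The accuracy/cost trade-off of subdividing the icosahedron can be sized with the standard icosphere vertex-count formula 10*4^n + 2; the angular-pitch estimate below is a rough illustration, not the paper's evaluation.

```python
import math

def icosphere_stats(level):
    """Vertex count of an icosahedron subdivided `level` times (standard
    10*4^n + 2 formula) and a rough angular pitch between samples."""
    v = 10 * 4**level + 2
    pitch_deg = math.degrees(math.sqrt(4.0 * math.pi / v))  # ~sqrt(solid angle)
    return v, pitch_deg

for n in range(6):
    v, p = icosphere_stats(n)
    print(f"level {n}: {v:6d} vertices, ~{p:5.2f} deg between samples")
```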
|
|
14:00-15:00, Paper TI4C.17 | |
On the Passive Virtual Viscous Element Injection Method for Elastic Joint Robots |
|
Zhang, Jiexin | Huazhong University of Science and Technology |
Hou, Tengyu | Shanghai Jiao Tong University |
Ding, Ye | Shanghai Jiao Tong University |
Zhang, Bo | Shanghai Jiao Tong University |
Liu, Honghai | Portsmouth University |
Keywords: Flexible Robots, Physical Human-Robot Interaction, Force Control, Virtual Viscous Element Injection
Abstract: Increasing the viscosity of elastic joints can significantly improve the performance of elastic joint robots during physical human-robot interaction. However, current approaches for injecting viscous elements require an additional damper to be added in parallel with the elastic elements. In this paper, we propose a new concept called virtual viscous element injection (VVI), which enables a robot to exhibit viscoelasticity without altering its mechanical structure. VVI relies only on motor-side dynamics reshaping and state feedback. Interestingly, the VVI method allows high-resolution joint torque measurements in elastic joint robots, unlike physical viscoelastic joint robots, which measure joint torque using higher-order derivatives of the positions. Furthermore, the VVI method is proven to preserve the passivity of the robot dynamics, which opens numerous possibilities for combining it with passivity-based controllers. Specifically, we first emphasize the impedance control method using VVI. The results demonstrate that the VVI-DF method, which combines the direct feedback (DF) method with VVI, addresses the issue of excessive acceleration feedback in the controller. This provides looser constraints for achieving a high-gain torque loop in impedance control. This paper also provides examples of the application of VVI combined with passivity-based position and torque controllers. Experiments and simulations demonstrate the effectiveness of the proposed methods, which can be extended to various robots, such as exoskeletons and collaborative robots.
|
|
14:00-15:00, Paper TI4C.18 | |
Constrained Articulated Body Dynamics Algorithms |
|
Sathya, Ajay Suresha | Inria |
Carpentier, Justin | INRIA |
Keywords: Direct/Inverse Dynamics Formulation, Dynamics, Optimization and Optimal Control, Humanoid Robots
Abstract: Few constrained rigid-body dynamics algorithms proposed so far adequately account for constrained dynamical systems in challenging singular cases (e.g., redundant or singular constraints) while maintaining low algorithmic complexity. In this article, we introduce a series of new algorithms with reduced (and lowest) complexity for the forward simulation of constrained dynamical systems. Notably, we revisit the articulated body algorithm (ABA) and the Popov-Vereshchagin algorithm (PV) in light of proximal-point optimization and introduce two new algorithms, called constrainedABA and proxPV. These two new algorithms have linear complexity while being robust to singular cases. We establish the connection with existing formulations in the literature, especially the relaxed formulation at the heart of the MuJoCo and Drake simulators. We also propose a new and efficient algorithm to compute the damped Delassus inverse matrix with the lowest known computational complexity. All these algorithms have been implemented inside the open-source framework Pinocchio and achieve state-of-the-art performance compared to existing algorithms.
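The proximal-point view the abstract alludes to can be written compactly as follows (standard notation, ours; the papers' recursions additionally exploit kinematic-tree sparsity to reach linear complexity).

```latex
% Proximal-point view of constrained forward dynamics. With free acceleration
% \dot{v}_f = M^{-1}(\tau - h), Delassus matrix \Lambda = J M^{-1} J^\top,
% and constraint acceleration target a^*:
(\Lambda + \mu I)\,\lambda^{k+1} = a^{*} - J\,\dot{v}_f + \mu\,\lambda^{k},
\qquad
\dot{v}^{k+1} = \dot{v}_f + M^{-1} J^{\top} \lambda^{k+1}.
% Iterating converges even when \Lambda is singular (redundant or singular
% constraints); (\Lambda + \mu I)^{-1} is the damped Delassus inverse.
```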
|
|
14:00-15:00, Paper TI4C.19 | |
Addressing Operator Physical Ergonomics in Teleoperation with Multi-Modal Dynamic Workspace Re-Indexing |
|
Exterkate, Thijs | Delft University of Technology |
Mol, Nicky | Delft University of Technology |
Abbink, David A. | Delft University of Technology |
Prendergast, J. Micah | Delft University of Technology |
Peternel, Luka | Delft University of Technology |
Keywords: Telerobotics and Teleoperation, Physical Human-Robot Interaction, Human Factors and Human-in-the-Loop
Abstract: This paper presents a multi-modal dynamic workspace re-indexing method for addressing operator ergonomics and workspace limitations. The proposed method has two interactive modes: pose-to-pose mode, which is active when the operator is within an ergonomic workspace of comfortable arm postures, and ergonomic workspace drift mode, which activates after the operator makes an excursion beyond the boundaries of the ergonomic workspace when trying to reach more distant targets with the remote robot. In the ergonomic workspace drift mode, the operator temporarily stays slightly outside these boundaries, while the offset between the local and remote workspace drifts with a velocity proportional to the excursion distance. This dynamically re-indexes the remote workspace toward the distant target, and the operator can remain in a comfortable posture while the remote robot moves toward the intended target where the task is. To construct the ergonomic workspace, we employed the Rapid Upper Limb Assessment method. To validate the proposed method, we conducted experiments on a teleoperation setup involving a Force Dimension Sigma7 haptic device controlling a Kuka LBR iiwa robotic arm. The results show that the proposed controller successfully addresses workspace limitations by dynamically re-indexing the follower's workspace towards target objects, while maintaining good operator ergonomics.
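The ergonomic workspace drift mode can be sketched with a spherical workspace model: no drift inside the comfortable region, and a drift velocity proportional to the excursion beyond it. The gain and geometry below are illustrative assumptions, not the paper's controller.

```python
import numpy as np

def workspace_drift(p_hand, center, radius, gain=0.5):
    """Drift velocity for re-indexing the remote workspace: zero inside the
    ergonomic region (pose-to-pose mode), proportional to the excursion
    beyond it (ergonomic workspace drift mode)."""
    d = p_hand - center
    dist = np.linalg.norm(d)
    excursion = dist - radius
    if excursion <= 0.0:
        return np.zeros(3)              # inside the ergonomic workspace
    return gain * excursion * d / dist  # drift along the excursion direction

# 10 cm beyond a 20 cm ergonomic sphere -> 0.05 m/s of workspace drift.
print(workspace_drift(np.array([0.3, 0.0, 0.0]), np.zeros(3), 0.2))
```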
|
|
14:00-15:00, Paper TI4C.20 | |
CPC: Cascaded Predictive Control |
|
Scianca, Nicola | Sapienza University of Rome |
Mingo Hoffman, Enrico | INRIA Nancy - Grand Est |
Keywords: Optimization and Optimal Control, Reinforcement Learning, Humanoid and Bipedal Locomotion
Abstract: We present Cascaded Predictive Control (CPC), a framework that decomposes a model-predictive control horizon into multiple stages, each solved with the model and method best suited to its accuracy requirements. CPC transfers information across stages through the value function: the terminal cost produced by a downstream stage acts as a boundary condition for the preceding one, enabling heterogeneous combinations of different techniques and solvers. Two proof-of-concept studies illustrate the approach. First, a humanoid walking controller uses a double-DDP cascade that schedules single-rigid-body and linear-inverted-pendulum models to extend the prediction horizon at a lower computational cost. Second, a cart–pendulum swing-up couples a DDP stage with a value function learned by an actor–critic policy; the hybrid controller succeeds where the policy alone fails. Although the examples are preliminary, they indicate that CPC can provide a flexible framework for achieving effective results with reduced computational effort by combining the strengths of different techniques.
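The value-function stitching at the core of CPC can be stated compactly (notation ours, inferred from the abstract):

```latex
% Stage i plans over its own horizon N_i with its own model f_i and running
% cost \ell_i, taking the next stage's value function as terminal cost:
V_i(x) = \min_{u_0,\dots,u_{N_i-1}} \sum_{k=0}^{N_i-1} \ell_i(x_k, u_k) + V_{i+1}(x_{N_i}),
\quad x_0 = x,\; x_{k+1} = f_i(x_k, u_k),
% with the final stage's terminal cost hand-designed or, as in the cart-pole
% example above, supplied by a learned actor-critic value function.
```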
|
|
14:00-15:00, Paper TI4C.21 | |
Simultaneous Contact Sequence and Patch Planning for Dynamic Locomotion |
|
Dhédin, Victor | Technical University of Munich |
Zhao, Haizhou | New York University |
Khadiv, Majid | Technical University of Munich |
Keywords: Multi-Contact Whole-Body Motion Planning and Control, Legged Robots, Collision Avoidance
Abstract: Legged robots have the potential to traverse highly constrained environments with agile maneuvers. However, planning such motions requires solving a highly challenging optimization problem with a mixture of continuous and discrete decision variables. In this paper, we present a full pipeline based on Monte-Carlo tree search (MCTS) and whole-body trajectory optimization (TO) to perform simultaneous contact sequence and patch selection on highly challenging environments. Through extensive simulation experiments, we show that our framework can quickly find a diverse set of dynamically consistent plans. We experimentally show that these plans are transferable to a real quadruped robot. To the best of our knowledge, this is the first demonstration of simultaneous contact sequence and patch selection using the whole-body dynamics of a quadruped.
|
|
14:00-15:00, Paper TI4C.22 | |
Geodesic Tracing-Based Kinematic Integration of Rolling and Sliding Contact on Manifold Meshes for Dexterous In-Hand Manipulation |
|
Wang, Sunyu | Carnegie Mellon University |
Lakshmipathy, Arjun S | Carnegie Mellon University |
Oh, Jean | Carnegie Mellon University |
Pollard, Nancy S | Carnegie Mellon University |
Keywords: Dexterous Manipulation, Contact Modeling, Multifingered Hands
Abstract: Reasoning about rolling and sliding contact, or roll-slide contact for short, is critical for dexterous manipulation tasks that involve intricate geometries. However, existing work on roll-slide contact mostly focuses on continuous shapes with differentiable parametrizations. This work extends roll-slide contact modeling to manifold meshes. Specifically, we present an integration scheme based on geodesic tracing to first-order time-integrate roll-slide contact directly on meshes, enabling dexterous manipulation to reason over high-fidelity discrete representations of an object's true geometry. Using our method, we planned dexterous motions of a multi-finger robotic hand manipulating five objects in-hand in simulation. The planning was achieved with a least-squares optimizer that strives to maintain the most stable instantaneous grasp by minimizing contact sliding and spinning. We then evaluated our method against a baseline using collision detection and a baseline using primitive shapes. The results show that our method performed best in accuracy and precision, even for coarse meshes. We conclude with a discussion of future work on incorporating multiple contacts and contact forces to achieve accurate and robust mesh-based surface contact modeling.
|
|
14:00-15:00, Paper TI4C.23 | |
VAISI: Vision-Based Adaptive Impedance-Control for Surgical Incisions |
|
Lee, Chun Hei Jeffrey | University of Waterloo |
Marotta, Teresa | University of Waterloo |
McLachlin, Stewart | University of Waterloo |
Wong, Alexander | University of Waterloo |
Hu, Yue | University of Waterloo |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics, Compliance and Impedance Control
Abstract: Robot-assisted surgery requires precise control of both position and interaction forces during soft tissue manipulation, a task challenged by the non-linear and highly variable mechanical properties of soft tissues. Skin incisions are a critical first step in many surgical procedures, and performing them accurately presents a fundamental robotic challenge in soft material manipulation. This paper introduces VAISI (Vision-Based Adaptive Impedance-Control for Surgical Incisions), a novel model-free robotic control framework that leverages real-time stereo vision feedback for precise depth regulation and force modulation during skin incisions. This approach is coupled with a compact scalpel-camera end-effector to measure the state and deformations of the targeted soft tissue. The VAISI framework uses vision-based feedback to adapt end-effector stiffness and trajectory via Cartesian impedance control, minimizing excess force while achieving accurate incisions. Experimental validation on ex vivo porcine belly and hock skin demonstrates that a low-constant-stiffness approach fails to apply enough force to create incisions, whereas VAISI enables sub-millimeter depth-accurate cuts with a maximum standard deviation of 1.23 mm, emphasizing the necessity of force adaptation in unknown situations. These results highlight the ability of vision-guided adaptive control to deliver safe and precise soft tissue incisions in future autonomous surgical systems.
|
|
14:00-15:00, Paper TI4C.24 | |
ARTEMIS: An Open-Source, Full-Sized Humanoid Robot for Dynamic Locomotion |
|
Zhu, Taoyuanmin | University of California, Los Angeles |
Ahn, Min Sung | University of California, Los Angeles |
Hong, Dennis | UCLA |
Keywords: Humanoid Robot Systems, Legged Robots, Humanoid and Bipedal Locomotion
Abstract: This paper presents ARTEMIS, a full-sized humanoid robot designed for dynamic motions. With 20 active degrees of freedom using custom proprioceptive actuators, ARTEMIS is capable of walking at up to 2.1 m/s using a model-based control approach, making it one of the fastest humanoid robots at the time. It can also seamlessly transition between walking and running, making it the first platform entirely developed in academia to demonstrate such capabilities. This paper explains the details of the platform as well as the controller. ARTEMIS's performance and robustness are validated on various outdoor terrains as well as by winning a global robotics soccer competition. Having validated the platform, we open-source it to the wider community, from its actuation approach to the robot model with baseline controllers, to provide an accessible foundation for building custom humanoids.
|
|
14:00-15:00, Paper TI4C.25 | |
CLAP: Clustering to Localize across N Possibilities, a Simple, Robust Geometric Approach in the Presence of Symmetries |
|
Fernandez, Gabriel Ikaika | University of California Los Angeles |
Hou, Ruochen | UCLA |
Xu, Alex | University of California, Los Angeles |
Togashi, Colin | UCLA |
Hong, Dennis | UCLA |
Keywords: Localization, Humanoid Robot Systems
Abstract: In this paper, we present our localization method called CLAP, Clustering to Localize Across n Possibilities, which helped us win the RoboCup 2024 adult-sized autonomous humanoid soccer competition. Competition rules limited our sensor suite to stereo vision and an inertial sensor, similar to humans. In addition, our robot had to deal with varying lighting conditions, dynamic feature occlusions, noise from high-impact stepping, and mistaken features from bystanders and neighboring fields. Therefore, we needed an accurate and, most importantly, robust localization algorithm to serve as the foundation for our path-planning and game-strategy algorithms. CLAP achieves these requirements by clustering estimated states of our robot from pairs of field features to localize its global position and orientation. Correct state estimates naturally cluster together, while incorrect estimates spread apart, making CLAP resilient to noise and incorrect inputs. CLAP is paired with a particle filter and an extended Kalman filter to improve consistency and smoothness. Tests of CLAP against other landmark-based localization methods showed similar accuracy; however, tests with increased false-positive feature detection showed that CLAP outperformed the other methods in robustness, with very little divergence and few velocity jumps. Our localization performed well in competition, allowing our robot to shoot faraway goals and narrowly defend our goal.
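The core of CLAP in miniature: pose hypotheses computed from pairs of landmarks agree when the data association is correct and scatter when it is not, so the largest cluster wins. The greedy radius clustering below is a simple stand-in for the authors' clustering step.

```python
import numpy as np

def clap(hypotheses, radius=0.5):
    """Return the mean of the largest cluster of (x, y, heading) pose
    hypotheses, plus its size. Greedy radius clustering, illustrative only."""
    H = np.asarray(hypotheses)
    best = None
    for h in H:
        members = np.linalg.norm(H[:, :2] - h[:2], axis=1) < radius
        if best is None or members.sum() > best.sum():
            best = members
    return H[best].mean(axis=0), int(best.sum())

# Correct hypotheses agree near (1, 2); false detections scatter elsewhere.
hyps = [(1.0, 2.0, 0.10), (1.1, 2.05, 0.12), (0.95, 1.9, 0.08),
        (4.0, -1.0, 2.00), (-3.0, 0.5, -1.00)]
print(clap(hyps))  # -> (~[1.02, 1.98, 0.10], 3)
```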
|
|
14:00-15:00, Paper TI4C.26 | |
Dual Arm Whole Body Motion Planning: Leveraging Overlapping Kinematic Chains |
|
Cheng, Richard | California Institute of Technology |
Werner, Peter | Massachusetts Institute of Technology |
Matl, Carolyn | Toyota Research Institute |
Keywords: Motion and Path Planning, Whole-Body Motion Planning and Control
Abstract: High degree-of-freedom dual-arm robots are becoming increasingly common due to their effectiveness for operating in human environments and their similarity to the human form factor. However, motion planning in real time within unknown, changing environments remains a challenge for such robots due to the high dimensionality of the configuration space and the complex collision constraints that must be obeyed. In this work, we propose a novel way to alleviate the curse of dimensionality by leveraging the structure imposed by shared joints (e.g., torso joints) in a dual-arm robot. First, we build two dynamic roadmaps, one for each kinematic chain (i.e., left arm + torso, right arm + torso), with specific structure induced by the shared joints. Then we show that we can leverage this structure to intelligently search through the composition of the two roadmaps. We show that this substantially alleviates the curse of dimensionality while being much more efficient than naive search through the Cartesian product of the roadmaps. We ran several experiments in a real-world grocery store with this motion planner on a 19-DoF mobile manipulation robot executing a grocery fulfillment task, achieving 0.4 s average planning times with a 99.9% success rate across more than 2000 motion plans.
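The shared-joint trick can be pictured by indexing each arm's roadmap by its torso configuration and composing only nodes that agree on the torso, rather than searching the full Cartesian product; the toy data below are illustrative.

```python
from collections import defaultdict

# Each roadmap node is (torso_config, arm_config). Index both roadmaps by the
# shared torso value, then compose only nodes that agree on the torso.
left = [((0,), "L0"), ((0,), "L1"), ((1,), "L2")]
right = [((0,), "R0"), ((1,), "R1"), ((1,), "R2")]

by_torso = defaultdict(lambda: ([], []))
for t, arm in left:
    by_torso[t][0].append(arm)
for t, arm in right:
    by_torso[t][1].append(arm)

composite = [(t, l, r) for t, (ls, rs) in by_torso.items() for l in ls for r in rs]
print(composite)  # 4 torso-consistent states vs. 9 in the naive product
```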
|
|
14:00-15:00, Paper TI4C.27 | |
CoRe: A Hybrid Approach of Contact-Aware Optimization and Learning for Humanoid Robot Motions |
|
Jeong, Taemoon | Korea University |
Chai, Yoonbyung | Korea University |
Choi, Sol | Korea Institute of Science and Technology |
Bak, Jaewan | Korea Institute of Science and Technology |
Kim, Chanwoo | Korea University |
Yoon, JiHwan | Korea University |
Lee, Yisoo | Korea Institute of Science and Technology |
Lee, Jongwon | Korea Institute of Science and Technology |
Lee, Kyungjae | Korea University |
Kim, Joohyung | University of Illinois Urbana-Champaign |
Choi, Sungjoon | Korea University |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Multi-Contact Whole-Body Motion Planning and Control, Whole-Body Motion Planning and Control
Abstract: Recent advances in text-to-motion generation enable realistic human-like motions directly from natural language. However, translating these motions into physically executable motions for humanoid robots remains challenging due to significant embodiment differences and physical constraints. Existing methods primarily rely on reinforcement learning (RL) without addressing initial kinematic infeasibility. This often leads to unstable robot behaviors. We introduce Contact-aware motion Refinement (CoRe), a fully automated pipeline consisting of human motion generation from text, robot-specific retargeting, optimization-based motion refinement, and a subsequent RL phase enhanced by contact-aware rewards. This integrated approach mitigates common motion artifacts such as foot sliding, unnatural floating, and excessive joint accelerations prior to RL training, thereby improving overall motion stability and physical plausibility. We validate our pipeline across diverse humanoid platforms without task-specific tuning or dynamic-level optimization. Results demonstrate effective sim-to-real transferability in various scenarios, from simple upper-body gestures to complex whole-body locomotion tasks.
|
|
14:00-15:00, Paper TI4C.28 | |
Contact-Rich and Deformable Foot Modeling for Locomotion Control of the Human Musculoskeletal System |
|
Gong, Haixin | Tsinghua University |
Zhang, Chen | Tsinghua University |
Sui, Yanan | Tsinghua University |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Modeling and Simulating Humans, Multi-Contact Whole-Body Motion Planning and Control
Abstract: The human foot serves as the critical interface between the body and environment during locomotion. Existing musculoskeletal models typically oversimplify foot-ground contact mechanics, limiting their ability to accurately simulate human gait dynamics. We developed a novel contact-rich and deformable model of the human foot integrated within a complete musculoskeletal system that captures the complex biomechanical interactions during walking. To overcome the control challenges inherent in modeling multi-point contacts and deformable material, we developed a two-stage policy training strategy to learn natural walking patterns for this interface-enhanced model. Comparative analysis between our approach and conventional rigid musculoskeletal models demonstrated improvements in kinematic, kinetic, and gait stability metrics. Validation against human subject data confirmed that our simulation closely reproduced real-world biomechanical measurements. This work advances contact-rich interface modeling for human musculoskeletal systems and establishes a robust framework that can be extended to humanoid robotics applications requiring precise foot-ground interaction control.
|
|
14:00-15:00, Paper TI4C.29 | |
Learning to Act through Contact: A Unified View of Multi-Task Robot Learning |
|
Omar, Shafeef | Munich Institute of Robotics and Machine Intelligence, Technical |
Khadiv, Majid | Technical University of Munich |
Keywords: Reinforcement Learning, Legged Robots, Whole-Body Motion Planning and Control
Abstract: We present a unified framework for multi-task locomotion and manipulation policy learning grounded in contact-explicit representations. Instead of designing different policies for different tasks, our approach unifies the definition of a task through a sequence of contact goals: desired contact positions, timings, and active end-effectors. This enables leveraging the shared structure across diverse contact-rich tasks, leading to a single policy that can perform a wide range of tasks. In particular, we train a goal-conditioned reinforcement learning (RL) policy to realize given contact plans. We validate our framework on multiple robotic embodiments and tasks: a quadruped performing multiple gaits, a humanoid performing multiple biped and quadrupedal gaits, a humanoid executing different bimanual object manipulation tasks, and a dexterous hand performing in-hand manipulation. Each robot is controlled by a single policy trained to execute different tasks grounded in contacts, demonstrating versatile and robust behaviors across morphologically distinct systems. Our results show that explicit contact reasoning significantly improves generalization to unseen scenarios, positioning contact-explicit policy learning as a promising foundation for scalable loco-manipulation. Video available at: https://youtu.be/chKcB8Un22w
|
|
14:00-15:00, Paper TI4C.30 | |
Frequency Response Data-Driven Optimization for Fast and Precise Motion Control of Flexible Joint Robots |
|
Lee, Deokjin | Daegu Gyeongbuk Institute of Science and Technology |
Song, JunHo | Daegu Gyeongbuk Institute of Science and Technology |
Oh, Sehoon | DGIST |
Keywords: Flexible Robotics, Optimization and Optimal Control, Motion Control
Abstract: A frequency response function (FRF)-based data-driven optimization framework is proposed for motion control of flexible joint robots (FJRs). Unlike conventional model-based methods, the controller directly utilizes measured FRF data to synthesize joint-level controllers via convex optimization. This enables automated, high-bandwidth, and robust control without explicit model identification. Experimental validation on a 7-DOF FJR demonstrates superior performance in tracking accuracy and impact tolerance during a high-speed drumming task.
|
|
14:00-15:00, Paper TI4C.31 | |
Graceful: The Synergistic Impact of Ultra-Low Center of Gravity and Aesthetic Integration for Humanoid |
|
Bear, Wynn | International WYNNBEAR AITech |
Wu, Silence | International WYNNBEAR AITech |
Keywords: Body Balancing, Legged Robots, Humanoid and Bipedal Locomotion
Abstract: This paper presents a novel approach to humanoid robotics, centered on an Ultra-Low Center of Gravity (ULCG) Innate Equilibrium Nexus (IEN) and Aesthetic Integration Invisibility (AII). The current generation of robots often struggles with two fundamental challenges: maintaining dynamic stability and achieving a flawless, integrated form. Our ULCG IEN system provides a foundational solution, enabling humanoids to achieve remarkable, self-contained stability, dynamically adapting to disturbances through an internal, intuitive balancing mechanism. In parallel, the AII system innovates with independent, multi-functional cavities in aesthetic design, seamlessly concealing necessary functional protrusions to ensure a perfect external appearance. This dual innovation fundamentally transforms humanoid capabilities, offering robots superior, intrinsic balance and an uncompromised aesthetic, and opening a new perspective on more sophisticated and socially integrated applications.
|
|
14:00-15:00, Paper TI4C.32 | |
Potentiometer Proprioceptive Sensor for Rolling Contact Joints |
|
Jeong, Inchul | Seoul National University |
Cho, Kyu-Jin | Seoul National University, Biorobotics Laboratory |
Keywords: Grippers and Other End-Effectors, Multifingered Hands, Actuation and Joint Mechanisms
Abstract: Rolling contact joints (RCJs) have low friction and low backlash and endure higher compression forces than conventional rotational joints. Due to these advantages, many robot hands have incorporated RCJs into their designs. However, sensing mechanisms that allow precise control of RCJs are needed to perform complex and delicate manipulation tasks. In this paper, a potentiometer-based angle-sensing mechanism for RCJs is proposed. It provides highly reliable and repeatable outputs and is easily integrated into the structure of the joints.
|
|
14:00-15:00, Paper TI4C.33 | |
Novel Isomorphic Embodiment & Expert Skill Alignment: Humanoid Robot Imitation Learning and Generalization |
|
Bear, Wynn | International WYNNBEAR AITech |
Wu, Silence | International WYNNBEAR AITech |
Keywords: Grasping, Bioinspired Robot Learning, Imitation Learning
Abstract: Our proposed approach addresses a critical limitation of current humanoid robot imitation learning, which often fails to achieve professional proficiency due to the lack of a platform with structural and functional isomorphism to the human hand. A robot trying to emulate a world-class masseur, for instance, is like a culinary master lacking the right tools; without an isomorphic mechanical hand, precise manipulation and a comfortable user experience are unattainable. Our paradigm fundamentally transforms existing technical frameworks by grounding expert skill acquisition in a structure-function isomorphic platform. Through fine-tuning, it equips humanoid robots with genuine professional operational capabilities, human-like interactive experiences, and robust performance. Our manipulators (e.g., Palm-VS01 and Palm-VS02) are modular and detachable, integrating stereo vision and IMUs to create a unified "observation-action" database from expert demonstrations. This process, based on visual-inertial odometry (VIO), requires only vision data for deployment. A cross-embodiment imitation learning framework, which combines high-level visual-motor strategies and low-level motion redirection, facilitates seamless skill migration and generalization across different robotic platforms, thereby laying a solid foundation for the widespread application of embodied intelligence. To achieve this, we are pursuing two core initiatives. First, we are building a universal professional skills database by collecting isomorphic expert operation videos across industries—such as master masseurs, highly skilled bartenders and restaurant servers, airline engine technicians, virtuoso surgeons, and master chefs. Aligning these demonstrations with our manipulators enables them to master diverse skills, achieving performance comparable to or surpassing general human capabilities. Second, we are establishing a collaborative data-sharing protocol to aggregate vast amounts of successful operation data from millions of users. This astronomical data accumulation will create a real-world, dynamic database, enabling the seamless transfer and generalization of high-skill operations.
|
|
14:00-15:00, Paper TI4C.34 | |
Kinematic Heatmap for Pose Estimation in Dynamic Human Motion |
|
Ullah, Saif | Hanyang University |
Kim, Gyurin | Hanyang University |
Jaffrey, Zahra Batool | Hanyang University |
Ahmad, Niaz | Toronto Metropolitan University |
Khan, Jawad | Gachon University Republic of Korea |
Lee, Youngmoon | Hanyang University |
Keywords: Human Detection and Tracking, Deep Learning for Visual Perception, Human and Humanoid Motion Analysis and Synthesis
Abstract: Multi-person human pose estimation enables understanding human activities, human-object interactions, and human-humanoid interactions. Yet, human pose estimation in dynamic video sequences suffers large performance degradation due to significant occlusions and rapid motions. Our proposed end-to-end framework isolates individuals using instance-level feature maps, thereby eliminating dependencies on bounding boxes and reducing distractions from the surroundings. By adaptively adjusting Gaussian kernels based on joint movement vectors between consecutive frames, we capture both spatial and temporal motion uncertainties. This adaptive strategy enables precise keypoint detection even during rapid movements or occlusions. Experimental results show that our method achieves higher accuracy while also improving computational efficiency over prior state-of-the-art methods on benchmarks including PoseTrack2018, PoseTrack21, and COCO.
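The adaptive Gaussian kernel idea lends itself to a compact sketch: stretch the heatmap covariance along the joint's inter-frame motion vector so fast-moving joints get elongated kernels. All parameter values below are illustrative assumptions, not the paper's.

```python
import numpy as np

def kinematic_heatmap(center, velocity, size=64, base_sigma=2.0, gain=1.5):
    """Render a joint heatmap whose Gaussian kernel stretches along the
    joint's motion vector between consecutive frames."""
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    d = np.stack([xs - center[0], ys - center[1]], axis=-1)

    speed = np.linalg.norm(velocity)
    if speed < 1e-6:
        cov = np.eye(2) * base_sigma**2           # static joint: isotropic
    else:
        u = velocity / speed                      # motion direction
        v = np.array([-u[1], u[0]])               # perpendicular direction
        # Inflate variance along the motion direction with speed.
        R = np.stack([u, v], axis=1)
        cov = R @ np.diag([(base_sigma + gain * speed)**2,
                           base_sigma**2]) @ R.T
    inv = np.linalg.inv(cov)
    expo = np.einsum('...i,ij,...j->...', d, inv, d)
    return np.exp(-0.5 * expo)

h = kinematic_heatmap(center=(32, 32), velocity=np.array([4.0, 0.0]))
print(h.shape, h.max())   # (64, 64) 1.0
```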
|
|
14:00-15:00, Paper TI4C.35 | |
Crowd-Aware Path Planning and Visualization for Predictable Human Robot Interaction |
|
Bae, Minseong | Hanyang University ERICA |
Choi, Youngeun | Hanyang University ERICA |
Kim, Youngseong | Hanyang University ERICA |
Ahn, Taeseong | Hanyang University ERICA |
Lee, Aron | Hanyang University ERICA |
Lee, Junhwa | Hanyang University ERICA |
Lee, Youngmoon | Hanyang University |
Keywords: Acceptability and Trust, Human-Aware Motion Planning, Motion and Path Planning
Abstract: Humanoid robots in crowded indoor settings must navigate safely and predictably, avoiding collisions while still arriving at destinations on time. Existing systems do not leverage the crowd information available in modern building infrastructure and thus yield congestion-inefficient paths. They also lack a predictable pedestrian-robot interface. This paper presents a crowd-aware navigation and visualization system that integrates real-time crowd density data into dynamic heatmaps for adaptive path planning. A path-projection interface displays intuitive directional cues on the floor, enabling pedestrians to anticipate the robot's movements. Unlike static obstacle avoidance methods, our system dynamically adjusts trajectories based on congestion and estimates arrival times by factoring in crowd density. This approach enhances safety and transparency in shared spaces, with applications in high-traffic environments.
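One plausible reading of crowd-aware planning is a graph search whose edge costs blend distance with real-time crowd density; the accumulated cost then doubles as a rough arrival-time proxy. The sketch below assumes a hypothetical density grid supplied by building infrastructure.

```python
import heapq
import numpy as np

def crowd_aware_path(grid_density, start, goal, crowd_weight=5.0):
    """Dijkstra on a grid where each step pays 1 + crowd_weight * density,
    so the planner trades distance against congestion. `grid_density`
    holds crowd density in [0, 1] per cell (hypothetical input)."""
    h, w = grid_density.shape
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, cell = heapq.heappop(pq)
        if cell == goal:
            break
        if d > dist.get(cell, np.inf):
            continue
        r, c = cell
        for nr, nc in ((r+1, c), (r-1, c), (r, c+1), (r, c-1)):
            if 0 <= nr < h and 0 <= nc < w:
                nd = d + 1.0 + crowd_weight * grid_density[nr, nc]
                if nd < dist.get((nr, nc), np.inf):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(pq, (nd, (nr, nc)))
    path, cell = [], goal
    while cell != start:
        path.append(cell)
        cell = prev[cell]
    return [start] + path[::-1], dist[goal]  # cost also yields an ETA proxy

density = np.zeros((20, 20))
density[5:15, 8:12] = 0.9                    # a crowded corridor to avoid
path, cost = crowd_aware_path(density, (0, 0), (19, 19))
```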
|
|
14:00-15:00, Paper TI4C.36 | |
A Versatile Real-Time Controller for Multi-Interface Robot Applications |
|
Kim, Junyoung | KIRO (Korea Institute of Robotics & Technology Convergence) |
Hong, Jeongwoo | Korea Institute of Robotics & Technology Convergence (KIRO) |
Yun, WonBum | Korea Institute of Robotics and Technology Convergence |
Song, Ha-Yoon | Korea Institute of Robotics & Technology Convergence |
Chung, Hyun-Joon | Korea Institute of Robotics and Technology Convergence |
Keywords: Control Architectures and Programming, Software, Middleware and Programming Environments, Software Architecture for Robotic and Automation
Abstract: This study presents a Linux-based real-time controller for rapid prototyping of robotic systems with integration of diverse communication protocols, including EtherCAT, CAN, and DAQ. The controller manages multiple real-time and non-real-time threads, enabling advanced functionalities such as GUIs, computer vision, and AI-based services without degrading real-time performance. Experiments show that all real-time threads maintained accurate 1 kHz sampling with low jitter, even under heavy workloads and concurrent non-RT tasks. The proposed architecture provides a flexible and reliable solution for multi-interface integration, reducing development effort for advanced robotic applications.
|
|
14:00-15:00, Paper TI4C.37 | |
A Fully Soft Exosuit Using Pneumatic Artificial Muscles for Load-Lifting Assistance |
|
Hong, Taehwa | Seoul National University |
Lee, Chihyeong | Seoul National University |
Chang, Shinwon | Seoul National University |
Choi, Eunsik | Seoul National University |
Kim, Beomdo | Republic of Korea Naval Academy |
Ahn, Jooeun | Seoul National University |
Park, Yong-Lae | Seoul National University |
Keywords: Wearable Robotics, Soft Sensors and Actuators
Abstract: We present a fully-soft, untethered lift-assist wearable suit powered by Flat Inflatable Artificial Muscles (FIAMs). The device consists of lightweight actuators, a load distribution harness, and a compact pneumatic control unit. The force and displacement of the actuators are modeled and characterized under the practical condition of a bent surface, and the durability under repetitive actuation is inspected. Preliminary human trials with five participants lifting 15 kg are conducted, and the results show reduced muscle activation and energy expenditure in some users, demonstrating the potential of fully-soft wearable assistance.
|
|
TO5B |
ASEM Ballroom |
Oral Session 2 |
Regular |
Chair: Cheng, Gordon | Technical University of Munich |
Co-Chair: Kim, Joohyung | University of Illinois Urbana-Champaign |
|
15:30-15:40, Paper TO5B.1 | |
TACT: Humanoid Whole-Body Contact Manipulation through Deep Imitation Learning with Tactile Modality |
|
Murooka, Masaki | AIST |
Hoshi, Takahiro | Tokyo University of Science |
Fukumitsu, Kensuke | Tokyo University of Science |
Masuda, Shimpei | University of Tsukuba |
Hamze, Marwan | Tokyo University of Science |
Sasaki, Tomoya | Tokyo University of Science |
Morisawa, Mitsuharu | National Inst. of AIST |
Yoshida, Eiichi | Faculty of Advanced Engineering, Tokyo University of Science |
Keywords: Multi-Contact Whole-Body Motion Planning and Control, Dual Arm Manipulation, Deep Learning in Grasping and Manipulation
Abstract: Manipulation with whole-body contact by humanoid robots offers distinct advantages, including enhanced stability and reduced load. On the other hand, we need to address challenges such as the increased computational cost of motion generation and the difficulty of measuring broad-area contact. We therefore have developed a humanoid control system that allows a humanoid robot equipped with tactile sensors on its upper body to learn a policy for whole-body manipulation through imitation learning based on human teleoperation data. This policy, named tactile-modality extended ACT (TACT), has a feature to take multiple sensor modalities as input, including joint position, vision, and tactile measurements. Furthermore, by integrating this policy with retargeting and locomotion control based on a biped model, we demonstrate that the life-size humanoid robot is capable of achieving whole-body contact manipulation while maintaining balance and walking. Through detailed experimental verification, we show that inputting both vision and tactile modalities into the policy contributes to improving the robustness of manipulation involving broad and delicate contact.
|
|
15:40-15:50, Paper TO5B.2 | |
Bridging the Reality Gap: Analyzing Sim-To-Real Transfer Techniques for Reinforcement Learning in Humanoid Bipedal Locomotion |
|
Kim, Donghyeon | Graduate School of Convergence Science and Technology, Seoul National University |
Lee, Hokyun | Seoul National University |
Cha, Junhyeok Rui | Seoul National University |
Park, Jaeheung | Seoul National University |
Keywords: Humanoid and Bipedal Locomotion
Abstract: Reinforcement learning (RL) offers a promising solution for controlling humanoid robots, particularly for bipedal locomotion, by learning adaptive and flexible control strategies. However, direct RL application is hindered by time-consuming trial-and-error processes, necessitating training in simulation before real-world transfer. This introduces a reality gap that degrades performance. Although various methods have been proposed for sim-to-real transfer, they have not been validated on a consistent hardware platform, making it difficult to determine which components are key to overcoming the reality gap. In contrast, we systematically evaluate techniques to enhance RL policy robustness during sim-to-real transfer by controlling variables and comparing them on a single robot to isolate and analyze the impact of each technique. These techniques include dynamics randomization, state history usage, noise/bias/delay modeling, state selection, perturbations, and network size. We quantitatively assess the reality gap by simulating diverse conditions and conducting experiments on real hardware. Our findings provide insights into bridging the reality gap, advancing robust RL-trained humanoid robots for real-world applications.
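Several of the evaluated techniques are straightforward to express in code. The sketch below shows episode-level dynamics randomization plus an observation corruptor combining the noise, bias, and delay models; all ranges are illustrative placeholders, not the values swept in the paper.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)

def randomize_dynamics():
    """Sample one episode's physics parameters (ranges are illustrative)."""
    return {
        "mass_scale":     rng.uniform(0.8, 1.2),
        "friction":       rng.uniform(0.4, 1.1),
        "motor_strength": rng.uniform(0.9, 1.1),
        "joint_damping":  rng.uniform(0.8, 1.2),
    }

class ObservationCorruptor:
    """Adds the noise, bias, and delay models of the kind the paper ablates."""
    def __init__(self, dim, noise_std=0.01, bias_std=0.02, delay_steps=2):
        self.noise_std = noise_std
        self.bias = rng.normal(0.0, bias_std, size=dim)  # fixed per episode
        self.buffer = deque(maxlen=delay_steps + 1)

    def __call__(self, obs):
        self.buffer.append(obs + self.bias +
                           rng.normal(0.0, self.noise_std, size=obs.shape))
        # Oldest buffered entry models sensor delay (once the buffer fills).
        return self.buffer[0]

params = randomize_dynamics()
corrupt = ObservationCorruptor(dim=48)
noisy_obs = corrupt(np.zeros(48))
```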
|
|
15:50-16:00, Paper TO5B.3 | |
Heavy Lifting Tasks Via Haptic Teleoperation of a Wheeled Humanoid |
|
Purushottam, Amartya | University of Illinois, Urbana-Champaign |
Yan, Huihan | University of Illinois at Urbana-Champaign |
Xu, Christopher | University of Illinois Urbana-Champaign |
Ramos, Joao | University of Illinois at Urbana-Champaign |
Keywords: Telerobotics and Teleoperation, Wheeled Robots, Haptics and Haptic Interfaces
Abstract: Humanoid robots can support human workers in physically demanding environments by performing tasks that require whole-body coordination, such as lifting and transporting heavy objects. These tasks, which we refer to as Dynamic Mobile Manipulation (DMM), require the simultaneous control of locomotion, manipulation, and posture under dynamic interaction forces. This paper presents a teleoperation framework for DMM on a height-adjustable wheeled humanoid robot for carrying heavy payloads. A Human-Machine Interface (HMI) enables whole-body motion retargeting from the human pilot to the robot by capturing the motion of the human and applying haptic feedback. The pilot uses body motion to regulate robot posture and locomotion, while arm movements guide manipulation. Real-time haptic feedback delivers end-effector wrenches and balance-related cues, closing the loop between human perception and robot-environment interaction. We evaluate different telelocomotion mappings that offer varying levels of balance assistance, allowing the pilot to either manually or automatically regulate the robot’s lean in response to payload-induced disturbances. The system is validated in experiments involving dynamic lifting of barbells and boxes up to 2.5 kg (21% of robot mass), demonstrating coordinated whole-body control, height variation, and disturbance handling under pilot guidance. Video demo can be found at: https://youtu.be/jF270_bG1h8
|
|
16:00-16:10, Paper TO5B.4 | |
Learning a Vision-Based Footstep Planner for Hierarchical Walking Control |
|
Kim, Minku | University of Pennsylvania |
Acosta, Brian | University of Pennsylvania |
Chaudhari, Pratik | University of Pennsylvania |
Posa, Michael | University of Pennsylvania |
Keywords: Humanoid and Bipedal Locomotion, Reinforcement Learning, Vision-Based Navigation
Abstract: Bipedal robots demonstrate potential in navigating challenging terrains through dynamic ground contact. However, current frameworks often depend solely on proprioception or use manually designed visual pipelines, which are fragile in real-world settings and complicate real-time footstep planning in unstructured environments. To address this problem, we present a vision-based hierarchical control framework that integrates a reinforcement learning high-level footstep planner, which generates footstep commands based on a local elevation map, with a low-level Operational Space Controller that tracks the generated trajectories. We utilize the Angular Momentum Linear Inverted Pendulum model to construct a low-dimensional state representation to capture an informative encoding of the dynamics while reducing complexity. We evaluate our method across different terrain conditions using the underactuated bipedal robot Cassie and investigate the capabilities and challenges of our approach through simulation and hardware experiments.
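The ALIP abstraction mentioned above admits a closed-form solution, which is what makes it attractive as a low-dimensional planner state. A minimal sketch, with Cassie-like placeholder values for mass and CoM height rather than the paper's numbers:

```python
import numpy as np

def alip_step(x0, L0, T, m=32.0, H=0.8, g=9.81):
    """Closed-form propagation of the Angular Momentum Linear Inverted
    Pendulum (ALIP) over a stance phase of duration T:
        xdot = L / (m * H),   Ldot = m * g * x
    with x the CoM offset from the stance foot and L the angular momentum
    about the contact point.
    """
    lam = np.sqrt(g / H)                     # pendulum frequency
    c, s = np.cosh(lam * T), np.sinh(lam * T)
    A = np.array([[c,               s / (m * H * lam)],
                  [m * H * lam * s, c]])
    return A @ np.array([x0, L0])

# One 0.4 s stance phase starting 5 cm behind the foot with zero momentum:
x1, L1 = alip_step(0.05, 0.0, 0.4)
print(x1, L1)
```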
|
|
16:10-16:20, Paper TO5B.5 | |
CHILD (Controller for Humanoid Imitation and Live Demonstration): A Whole-Body Humanoid Teleoperation System |
|
Myers, Noboru | University of Illinois Urbana-Champaign |
Kwon, Obin | University of Illinois Urbana-Champaign |
Yamsani, Sankalp | University of Illinois Urbana-Champaign |
Kim, Joohyung | University of Illinois Urbana-Champaign |
Keywords: Telerobotics and Teleoperation, Whole-Body Motion Planning and Control, Hardware-Software Integration in Robotics
Abstract: Recent advances in teleoperation have demonstrated robots performing complex manipulation tasks. However, existing works rarely support whole-body joint-level teleoperation for humanoid robots, limiting the diversity of tasks that can be accomplished. This work presents Controller for Humanoid Imitation and Live Demonstration (CHILD), a compact reconfigurable teleoperation system that enables joint level control over humanoid robots. CHILD fits within a standard baby carrier, allowing the operator control over all four limbs, and supports both direct joint mapping for full-body control and loco-manipulation. Adaptive force feedback is incorporated to enhance operator experience and prevent unsafe joint movements. We validate the capabilities of this system by conducting loco-manipulation and full-body control demonstrations on a humanoid robot and multiple dual-arm systems. Lastly, we open-source the design of the hardware promoting accessibility and reproducibility. Additional details and open-source information are available at our project website: https://uiuckimlab.github.io/CHILD-pages.
|
|
16:20-16:30, Paper TO5B.6 | |
Latent Conditioned Loco-Manipulation Using Motion Priors |
|
Stępień, Maciej | LAAS CNRS |
Kourdis, Rafael | University of Edinburgh |
Roux, Constant | LAAS, CNRS |
Stasse, Olivier | LAAS, CNRS |
Keywords: Imitation Learning, Reinforcement Learning, Representation Learning
Abstract: Although humanoid and quadruped robots provide a wide range of capabilities, current control methods, such as Deep Reinforcement Learning, focus mainly on single skills. This approach is inefficient for solving more complicated tasks where high-level goals, physical robot limitations and desired motion style might all need to be taken into account. A more effective approach is to first train a multipurpose motion policy that acquires low-level skills through imitation, while providing latent space control over skill execution. Then, this policy can be used to efficiently solve downstream tasks. This method has already been successful for controlling characters in computer graphics. In this work, we apply the approach to humanoid and quadrupedal loco-manipulation by imitating either simple synthetic motions or kinematically retargeted dog motions. We extend the original formulation to handle constraints, ensuring deployment safety, and use a diffusion discriminator for better imitation quality. We verify our methods by performing loco-manipulation in simulation for the H1 humanoid and Solo12 quadruped, as well as deploying policies on Solo12 hardware. Videos and code are available at https://gepetto.github.io/LaCoLoco/
|
|
16:30-16:40, Paper TO5B.7 | |
Adding Internal Audio Sensing to Internal Vision Enables Human-Like In-Hand Fabric Recognition with Soft Robotic Fingertips |
|
Andrussow, Iris | Max-Planck-Institute for Intelligent Systems |
Solano, Jans | Max Planck Institute for Intelligent Systems |
Richardson, Benjamin A. | Max Planck Institute for Intelligent Systems |
Martius, Georg | Max Planck Institute for Intelligent Systems |
Kuchenbecker, Katherine J. | Max Planck Institute for Intelligent Systems |
Keywords: Force and Tactile Sensing, Learning Categories and Concepts, Soft Sensors and Actuators
Abstract: Distinguishing the feel of smooth silk from coarse cotton is a trivial everyday task for humans. When exploring such fabrics, fingertip skin senses both spatio-temporal force patterns and texture-induced vibrations that are integrated to form a haptic representation of the explored material. It is challenging to reproduce this rich, dynamic perceptual capability in robots because tactile sensors typically cannot achieve both high spatial resolution and high temporal sampling rate. In this work, we present a system that can sense both types of haptic information, and we investigate how each type influences robotic tactile perception of fabrics. Our robotic hand’s middle finger and thumb each feature a soft tactile sensor: one is the open-source Minsight sensor that uses an internal camera to measure fingertip deformation and force at 50 Hz, and the other is our new sensor Minsound that captures vibrations through an internal MEMS microphone with a bandwidth from 50 Hz to 15 kHz. Inspired by the movements humans make to evaluate fabrics, our robot actively encloses and rubs folded fabric samples between its two sensitive fingers. Our results test the influence of each sensing modality on overall classification performance, showing high utility for the audio-based sensor. Our transformer-based method achieves a maximum fabric classification accuracy of 97% on a dataset of 20 common fabrics. Incorporating an external microphone away from Minsound increases our method’s robustness in loud ambient noise conditions. To show that this audio-visual tactile sensing approach generalizes beyond the training data, we learn general representations of fabric stretchiness, thickness, and roughness.
|
|
16:40-16:50, Paper TO5B.8 | |
Skin-Machine Interface with Multimodal Contact Motion Classifier |
|
Confente, Alberto | Technical University of Munich |
Jin, Takanori | National Institute of Informatics/SOKENDAI |
Kobayashi, Taisuke | National Institute of Informatics |
Guadarrama-Olvera, J. Rogelio | Technical University of Munich |
Cheng, Gordon | Technical University of Munich |
Keywords: Haptics and Haptic Interfaces, Intention Recognition, Deep Learning Methods
Abstract: This paper proposes a novel framework for utilizing skin sensors as a new operation interface of complex robots. The skin sensors employed in this study possess the capability to quantify multimodal tactile information at multiple contact points. The time-series data generated from these sensors is anticipated to facilitate the classification of diverse contact motions exhibited by an operator. By mapping the classification results with robot motion primitives, a diverse range of robot motions can be generated by altering the manner in which the skin sensors are interacted with. In this paper, we focus on a learning-based contact motion classifier employing recurrent neural networks. This classifier is a pivotal factor in the success of this framework. Furthermore, we elucidate the requisite conditions for software-hardware designs. Firstly, multimodal sensing and its comprehensive encoding significantly contribute to the enhancement of classification accuracy and learning stability. Utilizing all modalities simultaneously as inputs to the classifier proves to be an effective approach. Secondly, it is essential to mount the skin sensors on a flexible and compliant support to enable the activation of three-axis accelerometers. These accelerometers are capable of measuring horizontal tactile information, thereby enhancing the correlation with other modalities. Furthermore, they serve to absorb the noise generated by the robot's movements during deployment. Through these discoveries, the accuracy of the developed classifier surpassed 95%, enabling the dual-arm mobile manipulator to execute a diverse range of tasks via the Skin-Machine Interface.
|
|
16:50-17:00, Paper TO5B.9 | |
INTENTION: Inferring Tendencies of Humanoid Robot Motion through Interactive Intuition and Grounded VLM |
|
Wang, Jin | Italian Institute of Technology |
Wang, Weijie | Institute of Automation,Chinese Academy of Sciences |
Deng, Boyuan | Istituto Italiano Di Tecnologia |
Zhang, Heng | Italian Institute of Technology |
Dai, Rui | Istituto Italiano Di Tecnologia |
Tsagarakis, Nikos | Istituto Italiano Di Tecnologia |
Keywords: Humanoid Robot Systems, AI-Enabled Robotics, Intention Recognition
Abstract: Traditional control and planning for robotic manipulation heavily rely on precise physical models and predefined action sequences. While effective in structured environments, such approaches often fail in real-world scenarios due to modeling inaccuracies and struggle to generalize to novel tasks. In contrast, humans intuitively interact with their surroundings, demonstrating remarkable adaptability, making efficient decisions through implicit physical understanding. In this work, we propose INTENTION, a novel framework enabling robots with learned interactive intuition and autonomous manipulation in diverse scenarios, by integrating Vision-Language Models (VLMs) based scene reasoning with interaction-driven memory. We introduce Memory Graph to record scenes from previous task interactions which embodies human-like understanding and decision-making about different tasks in real world. Meanwhile, we design an Intuitive Perceptor that extracts physical relations and affordances from visual scenes. Together, these components empower robots to infer appropriate interaction behaviors in new scenes without relying on repetitive instructions. Videos: https://robo-intention.github.io
|
|
TI6C |
ASEM Ballroom Lobby |
Interactive Session 2 |
Interactive |
|
17:00-18:00, Paper TI6C.1 | |
Laplacian Trajectory Editing for Robotic Ultrasound Systems: Adapting Scan Trajectories to Patient Motion |
|
Koelmans, Toine | TU Delft |
Mol, Nicky | Delft University of Technology |
Prendergast, J. Micah | Delft University of Technology |
Keywords: Physical Human-Robot Interaction, Human-Aware Motion Planning, Medical Robots and Systems
Abstract: Robotic Ultrasound Systems (RUSS) provide a promising solution to reduce operator dependency, alleviate physical strain, and meet the growing demand for ultrasound procedures. However, their clinical applicability remains limited by their inability to adapt to dynamic patient movements and tissue deformations during scans. This work introduces a novel framework that leverages Laplacian Trajectory Editing (LTE) for real-time adaptation of scan trajectories in response to both rigid and non-rigid patient movements. It integrates an RGB-D camera to capture surface point clouds, which are processed to estimate displacements between consecutive frames. These displacements define anchor points for LTE-based trajectory adaptations, ensuring smooth motion while preserving local trajectory properties. This approach is validated through experiments spanning rigid phantom movements, generalization across differently shaped phantoms, and non-rigid human arm motion. Adaptation accuracy is quantified by comparing adapted trajectories to a ground-truth reference, with root mean squared errors averaging 0.026 ± 0.012 m in non-rigid scenarios. Real-time trajectory adaptation is achieved, with an average LTE adaptation processing time of 373 ms per trial. Furthermore, our implementation achieved low tracking errors across all conditions while maintaining a high success rate in diverse movement scenarios. These results demonstrate the feasibility of LTE for real-time trajectory adaptation in ultrasound scanning, offering a pathway to more autonomous and clinically viable RUSS implementations.
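Laplacian editing itself reduces to a sparse least-squares problem: preserve the discrete Laplacian of the original trajectory while softly pinning anchor points to their displaced targets. A minimal dense-matrix sketch (the anchor weight and sizes are illustrative, not the paper's implementation):

```python
import numpy as np

def laplacian_edit(traj, anchors):
    """Deform a scan trajectory so anchor waypoints move to new targets
    while local shape (the discrete Laplacian) is preserved.

    traj    : (N, 3) original trajectory points.
    anchors : dict {index: new_xyz} from the displacement estimates.
    """
    n = len(traj)
    L = np.zeros((n, n))
    for i in range(1, n - 1):                 # interior second differences
        L[i, i - 1], L[i, i], L[i, i + 1] = 1.0, -2.0, 1.0
    delta = L @ traj                          # shape descriptors to preserve

    w = 10.0                                  # anchor weight (soft constraint)
    rows, rhs = [L], [delta]
    for idx, target in anchors.items():
        e = np.zeros((1, n))
        e[0, idx] = w
        rows.append(e)
        rhs.append(w * np.asarray(target)[None, :])
    A, b = np.vstack(rows), np.vstack(rhs)
    new_traj, *_ = np.linalg.lstsq(A, b, rcond=None)
    return new_traj

traj = np.stack([np.linspace(0, 0.3, 50), np.zeros(50), np.full(50, 0.1)], 1)
# Pin both ends, lift the midpoint 2 cm (e.g., the patient's arm moved up):
moved = laplacian_edit(traj, {0: traj[0], 25: traj[25] + [0, 0, 0.02],
                              49: traj[49]})
```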
|
|
17:00-18:00, Paper TI6C.2 | |
Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration |
|
Chenu, Alexandre | Sorbonne Université |
Serris, Olivier | Sorbonne Université |
Sigaud, Olivier | Sorbonne Université |
Perrin-Gilbert, Nicolas | Université Pierre Et Marie Curie-Paris 6, CNRS UMR 7222 |
Keywords: Reinforcement Learning, Learning from Demonstration, Humanoid and Bipedal Locomotion
Abstract: Deep reinforcement learning faces challenges in long-horizon, high-dimensional robotic tasks that provide rewards only upon completion. In this context, while expert demonstrations can help, they are costly. In this work, we leverage a sequential decomposition bias to learn control policies for such tasks from a single demonstration. Our method learns a goal-conditioned policy that guides the system through low-dimensional intermediate goals, offering more flexibility than using states as goals. However, this approach can complicate the task of ensuring compatibility between successive goals. To address these challenges, we propose a formal framework, GC-SeqMDPs, and an algorithm, STIL, which properly handles a hindsight goal relabeling mechanism. We first demonstrate the benefits of STIL on a relatively simple, long-horizon, nonholonomic robotic task. Then, we show that STIL achieves unprecedented sample efficiency on more complex simulated tasks, such as humanoid locomotion and fast running on the Cassie robot, marking a significant step toward the resolution of complex robotic tasks with minimal task-specific information.
|
|
17:00-18:00, Paper TI6C.3 | |
PriorFormer: A Transformer for Real-Time Monocular 3D Human Pose Estimation with Versatile Geometric Priors |
|
Adjel, Mohamed | LISSI, Université De Paris-Est Créteil |
Bonnet, Vincent | University Paul Sabatier |
Keywords: Modeling and Simulating Humans, Human and Humanoid Motion Analysis and Synthesis
Abstract: This paper proposes a new lightweight Transformer-based lifter that maps short sequences of human 2D joint positions to 3D poses using a single camera. The proposed model takes as input geometric priors, including segment lengths and camera intrinsics, and is designed to operate in both calibrated and uncalibrated settings. To this end, a masking mechanism enables the model to ignore missing priors during training and inference. This yields a single versatile network that can adapt to different deployment scenarios, from fully calibrated lab environments to in-the-wild monocular videos without calibration. The model was trained using 3D keypoints from the AMASS dataset with corresponding 2D synthetic data generated by sampling random camera poses and intrinsics. It was then compared to an expert model trained only on complete priors, and validated through an ablation study. Results show that both camera and segment-length priors improve performance, that the versatile model outperforms the expert even when all priors are available, and that it maintains high accuracy when priors are missing. Overall, the average 3D joint-center position estimation error was as low as 36 mm, improving on the state of the art by half a centimeter at a much lower computational cost. Indeed, the proposed model runs in 380 µs on GPU and 1800 µs on CPU, making it suitable for deployment on embedded platforms and low-power devices.
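The masking mechanism can be pictured as a learned mask token substituted for any prior that is unavailable, so the same network runs with or without calibration. A minimal PyTorch sketch with hypothetical names and sizes:

```python
import torch
import torch.nn as nn

class PriorEmbedding(nn.Module):
    """Embed optional geometric priors; missing ones are replaced by a
    learned mask token so one network serves calibrated and uncalibrated
    deployments (names and sizes are illustrative, not the paper's)."""
    def __init__(self, n_priors=2, dim=64):
        super().__init__()
        self.proj = nn.ModuleList([nn.LazyLinear(dim) for _ in range(n_priors)])
        self.mask_token = nn.Parameter(torch.zeros(n_priors, dim))

    def forward(self, priors):
        # priors: list of tensors (B, d_i), or None when unavailable.
        out = []
        for i, p in enumerate(priors):
            if p is None:
                B = next(x.shape[0] for x in priors if x is not None)
                out.append(self.mask_token[i].expand(B, -1))
            else:
                out.append(self.proj[i](p))
        return torch.stack(out, dim=1)        # (B, n_priors, dim)

emb = PriorEmbedding()
segment_lengths = torch.rand(8, 16)           # segment-length prior available
tokens = emb([segment_lengths, None])         # camera intrinsics missing
print(tokens.shape)                           # torch.Size([8, 2, 64])
```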
|
|
17:00-18:00, Paper TI6C.4 | |
A Framework for Optimal Ankle Design of Humanoid Robots |
|
Cervettini, Guglielmo | Istituto Italiano Di Tecnologia |
Mauceri, Roberto | Istituto Italiano Di Tecnologia |
Coppola, Alex | Istituto Italiano Di Tecnologia |
Bergonti, Fabio | Istituto Italiano Di Tecnologia |
Fiorio, Luca | Istituto Italiano Di Tecnologia |
Maggiali, Marco | Italian Institute of Technology |
Pucci, Daniele | Italian Institute of Technology |
Keywords: Mechanism Design, Legged Robots, Methods and Tools for Robot System Design
Abstract: The design of the humanoid ankle is critical for safe and efficient ground interaction. Key factors such as mechanical compliance and motor mass distribution have driven the adoption of parallel mechanism architectures. However, selecting the optimal configuration depends on both actuator availability and task requirements. We propose a unified methodology for the design and evaluation of parallel ankle mechanisms. First, a multi-objective optimization synthesizes the mechanism geometry, then the resulting solutions are evaluated using a scalar cost function that aggregates key performance metrics for cross-architecture comparison. We focus on two representative architectures: the Spherical-Prismatic-Universal (SPU) and the Revolute-Spherical-Universal (RSU). For both, we resolve the kinematics, and for the RSU, introduce a parameterization that ensures workspace feasibility and accelerates optimization. We validate our approach by redesigning the ankle of an existing humanoid robot. The optimized RSU consistently outperforms both the original serial design and a conventionally engineered RSU, reducing the cost function by up to 41% and 14%, respectively.
|
|
17:00-18:00, Paper TI6C.5 | |
Learning Differentiable Reachability Maps for Optimization-Based Humanoid Motion Generation |
|
Murooka, Masaki | AIST |
Kumagai, Iori | National Inst. of AIST |
Morisawa, Mitsuharu | National Inst. of AIST |
Kanehiro, Fumio | National Inst. of AIST |
Keywords: Whole-Body Motion Planning and Control, Humanoid and Bipedal Locomotion, Humanoid Robot Systems
Abstract: To reduce the computational cost of humanoid motion generation, we introduce a new approach to representing robot kinematic reachability: the differentiable reachability map. This map is a scalar-valued function defined in the task space that takes positive values only in regions reachable by the robot's end-effector. A key feature of this representation is that it is continuous and differentiable with respect to task-space coordinates, enabling its direct use as constraints in continuous optimization for humanoid motion planning. We describe a method to learn such differentiable reachability maps from a set of end-effector poses generated using a robot's kinematic model, using either a neural network or a support vector machine as the learning model. By incorporating the learned reachability map as a constraint, we formulate humanoid motion generation as a continuous optimization problem. We demonstrate that the proposed approach efficiently solves various motion planning problems, including footstep planning, multi-contact motion planning, and loco-manipulation planning for humanoid robots.
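A toy version of the idea: fit a classifier on reachable versus unreachable end-effector positions and use its continuous decision function as the scalar reachability map inside an optimization. The sketch below uses an RBF-kernel SVM with synthetic shell-shaped labels standing in for kinematic sampling, and SLSQP with finite differences rather than analytic gradients:

```python
import numpy as np
from sklearn.svm import SVC
from scipy.optimize import minimize

# Toy stand-in: label end-effector positions reachable if inside a sphere
# shell around the shoulder; real labels come from the robot's kinematics.
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(2000, 3))
r = np.linalg.norm(pts, axis=1)
labels = ((r > 0.3) & (r < 0.8)).astype(int)

svm = SVC(kernel="rbf", gamma=8.0).fit(pts, labels)

def reach(p):
    """Continuous scalar map: positive inside the learned reachable set."""
    return svm.decision_function(p.reshape(1, -1))[0]

# Use the map as an inequality constraint in continuous optimization:
# find the point closest to a desired target that remains reachable.
target = np.array([1.0, 0.0, 0.0])
res = minimize(lambda p: np.sum((p - target) ** 2),
               x0=np.array([0.5, 0.0, 0.0]),
               constraints=[{"type": "ineq", "fun": reach}])
print(res.x, reach(res.x))
```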
|
|
17:00-18:00, Paper TI6C.6 | |
Velocity Modulation in Robotic Texture Exploration Based on Human Perceptual Dimensions |
|
Li, Ao | University of Bristol |
Brayshaw, George | University of Bristol |
Lu, Yanhui | University of Bristol |
Ward-Cherrier, Benjamin | University of Bristol |
Keywords: Force and Tactile Sensing, Haptics and Haptic Interfaces
Abstract: Humans adjust the scanning velocity applied during tactile exploration based on textural properties such as roughness and hardness. In this study, we investigate whether robots can benefit from a similar adaptation strategy for texture discrimination. We conduct a user study to estimate the perceptual dimensions of 10 natural textures. We then examine how the scanning velocity of a robotic arm equipped with a neuromorphic tactile sensor affects clustering performance under varying force conditions, and how this performance relates to human perceptual dimensions. Our results show that each texture exhibits a specific optimal velocity that enhances discrimination. Furthermore, this optimal velocity can be reliably estimated from human perceptual ratings of traction, roughness, and hardness (R^2 = 0.77). This approach demonstrates the potential of integrating human-inspired perceptual principles into humanoid robot touch, contributing to more explainable and effective sensorimotor control.
|
|
17:00-18:00, Paper TI6C.7 | |
Novel Vision-Based One-Shot Adaptation of Learned Skills |
|
Prados, Adrian | Universidad Carlos III De Madrid |
Hertel, Brendan | University of Masssachusetts Lowell |
Espinoza, Gonzalo | Universidad Carlos III De Madrid |
Méndez, Alberto | Universidad Carlos III De Madrid |
Barber, Ramon | Universidad Carlos III of Madrid |
Azadeh, Reza | University of Massachusetts Lowell |
Keywords: Learning from Demonstration, Perception for Grasping and Manipulation, Imitation Learning
Abstract: This paper introduces VASO (Visual Adaptation of Skills in One-Shot), a novel framework that enables robots to adapt learned manipulation skills to new environments using a single demonstration. VASO leverages visual keypoint-based representations and integrates them with a trajectory modeling approach inspired by the Fast Marching Square method to generate safe, obstacle-aware paths. It incorporates object detection from visual data and 3D point clouds to identify key manipulation points, applying a Variational Autoencoder to determine optimal grasp positions and orientations. The learned skill is then adapted through an Elastic Map-inspired deformation technique, enabling task generalization with respect to new waypoints or constraints. Unlike existing visual imitation learning methods, VASO achieves robust one-shot generalization without iterative re-planning, and is validated in both simulated and real-world scenarios, demonstrating effectiveness in dynamic and cluttered environments.
|
|
17:00-18:00, Paper TI6C.8 | |
Comparing Machine Learning Methods for Force Myography Based Estimation of Isokinetic Knee and Ankle Torques |
|
Wolk, Tim | Karlsruher Institut Für Technologie (KIT) |
Marquardt, Charlotte | Karlsruhe Institute of Technology (KIT) |
Dezman, Miha | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Wearable Robotics, Prosthetics and Exoskeletons, Machine Learning for Robot Control
Abstract: Wearable sensors enable accurate estimation of joint moments through easy-to-use myography-based methods, such as force myography (FMG), offering practical benefits and valuable insights into continuous muscle state estimation to enhance control strategies. This paper presents a comparative analysis of four commonly used machine learning methods, Gaussian process regression (GPR), support vector regression (SVR), feed-forward neural network (FFNN), and temporal convolutional network (TCN), for estimation of human knee and ankle joint torques based on joint angles, velocities, and FMG signals from eight muscles on the human leg. The performance of the methods was evaluated on isokinetic motions of ten participants and compared to the models enhanced by electromyography (EMG) signals. Among the evaluated models, neural networks consistently demonstrated the highest accuracy in both inter- and intra-participant validations. Incorporating the FMG modality yielded performance comparable to EMG-based estimation for unknown participants. Additionally, FMG outperforms EMG-based estimation for novel task characteristics within a single participant. These findings demonstrate the potential of FMG as a viable alternative to EMG for human joint torque estimation and highlight its potential for personalized exoskeleton control.
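Three of the four model families compare directly in scikit-learn (the TCN is omitted here since it needs a sequence model); the data below are a synthetic stand-in for the joint-state and eight-channel FMG features:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical feature matrix: joint angle, velocity, and 8 FMG channels
# per sample; y stands in for measured knee torque from a dynamometer.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = X[:, :2].sum(axis=1) + 0.5 * X[:, 2:].sum(axis=1) + rng.normal(0, 0.1, 500)

models = {
    "GPR":  GaussianProcessRegressor(alpha=1e-2),
    "SVR":  SVR(C=10.0),
    "FFNN": MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                         random_state=0),
}
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: R2 = {r2.mean():.3f} +/- {r2.std():.3f}")
```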
|
|
17:00-18:00, Paper TI6C.9 | |
Friction-Aware Safety Locomotion for Wheeled-Legged Robots Using Vision Language Models and Reinforcement Learning |
|
Peng, Bo | University of Illinois Urbana Champaign |
Baek, DongHoon | Georgia Institute of Technology |
Wang, Qijie | Tsinghua University |
Ramos, Joao | University of Illinois at Urbana-Champaign |
Keywords: Wheeled Robots, Semantic Scene Understanding, Reinforcement Learning
Abstract: Controlling wheeled-legged robots is challenging, especially on slippery surfaces, due to their dependence on continuous ground contact. Unlike quadrupeds or bipeds, which can leverage multiple fixed contact points for recovery, wheeled-legged robots are highly susceptible to slip, where even momentary loss of traction can result in irrecoverable instability. Anticipating ground physical properties such as friction before contact would allow proactive control adjustments, reducing slip risk. In this paper, we propose a friction-aware safety locomotion framework that integrates Vision-Language Models (VLMs) with a Reinforcement Learning (RL) policy. Our method employs a Retrieval-Augmented Generation (RAG) approach to estimate the Coefficient of Friction (CoF), which is then explicitly incorporated into the RL policy. This enables the robot to adapt its speed based on predicted friction conditions before contact. The framework is validated through experiments in both simulation and on a physical customized Wheeled Inverted Pendulum (WIP). Experimental results show that our approach successfully completes trajectory tracking tasks on slippery surfaces, whereas baseline methods relying solely on proprioceptive feedback fail. These findings highlight the importance and effectiveness of explicitly predicting and utilizing ground friction information for safe locomotion. They also point to a promising research direction in exploring the use of VLMs for estimating ground conditions, which remains a significant challenge for purely vision-based methods.
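A simple way to picture how a friction estimate can gate speed before contact: Coulomb friction bounds the available braking acceleration at roughly mu * g, so a stopping-distance requirement caps the velocity command. The stopping distance and safety factor below are illustrative, not the paper's policy:

```python
import numpy as np

def friction_limited_command(v_cmd, mu_hat, g=9.81, safety=0.8):
    """Scale a velocity command by a friction-derived acceleration budget.
    With Coulomb friction, the ground can supply at most a = mu * g of
    horizontal deceleration before the wheel slips, so braking within
    distance d requires v <= sqrt(2 * mu * g * d).
    mu_hat is the upstream (e.g., VLM/RAG) friction estimate.
    """
    d_stop = 0.5                                  # required stopping distance
    v_max = safety * np.sqrt(2.0 * mu_hat * g * d_stop)
    return np.clip(v_cmd, -v_max, v_max)

print(friction_limited_command(2.0, mu_hat=0.9))   # dry floor: full 2.0 m/s
print(friction_limited_command(2.0, mu_hat=0.1))   # icy floor: ~0.8 m/s
```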
|
|
17:00-18:00, Paper TI6C.10 | |
Learning Multimodal Attention for Manipulating Deformable Objects with Changing States |
|
Saito, Namiko | Microsoft Research Asia - Tokyo |
Tatsumi, Mayu | Waseda University |
Kubo, Ayuna | Waseda University |
Suzuki, Kanata | Fujitsu Limited |
Ito, Hiroshi | Hitachi, Ltd. / Waseda University |
Sugano, Shigeki | Waseda University |
Ogata, Tetsuya | Waseda University |
Keywords: Perception-Action Coupling, Learning from Experience, Perception for Grasping and Manipulation
Abstract: To support humans in their daily lives, robots are required to adapt to objects whose states change due to external factors such as heat and force, and perform appropriate actions accordingly. Many objects in everyday environments exhibit such dynamic and continuous changes in their physical properties. In these situations, sensory input from multiple modalities often contains both valuable and noisy information, and the importance of each sensor modality can shift over time as the object's state changes. This makes real-time perception and motion generation particularly challenging. We propose a predictive recurrent neural network with an attention mechanism that dynamically weights sensor modalities based on their current reliability and relevance, enabling robots to achieve efficient perception and adaptive manipulation of objects undergoing state changes. To demonstrate the effectiveness of the proposed method, we validated it on a physical humanoid robot, using a manipulation task of cooking scrambled eggs as an example scenario.
|
|
17:00-18:00, Paper TI6C.11 | |
End-Effector Position Control of Joint Sensor-Less Robots for Harsh Environment Based on SLAM Using Hand-Eye Camera |
|
Ichiwara, Hideyuki | Hitachi, Ltd. / Waseda University |
Nagai, Takahiro | Hitachi-GE Nuclear Energy, Ltd |
Ueno, Katsunori | Hitachi, Ltd |
Kobayashi, Ryosuke | Hitachi, Ltd |
Hirano, Katsuhiko | Hitachi-GE Nuclear Energy, Ltd |
Keywords: Field Robots, Visual Servoing, Robotics in Hazardous Fields
Abstract: For robots used in harsh environments, such as radiation environments, it may be difficult to attach sensors to joints due to concerns about environmental resistance and reliability. This paper proposes a control system for robots that lack sensors at the joints, whose position and velocity are difficult to control. In principle, the proposed method can be widely applied to humanoid and dual-arm robots. The robot is equipped with a hand-eye camera, which enables it to estimate the end effector's position through SLAM while controlling its movements. Additionally, the joint angles and Jacobian matrix related to the end-effector positions are estimated, and the system calculates which joints to drive and in which direction to approach the target position by driving one joint at a time, thereby controlling the end-effector position. To validate the effectiveness of the proposed method, experiments were conducted using a water hydraulic robot for debris handling tasks. Compared to a human-operated positioning time of 50 seconds, the proposed method achieved an average positioning time of 31 seconds, demonstrating its effectiveness in assisting manual work and potential for field application.
|
|
17:00-18:00, Paper TI6C.12 | |
PCHands: PCA-Based Hand Pose Synergy Representation on Manipulators with N-DoF |
|
Puang, En Yen | Istituto Italiano Di Tecnologia |
Ceola, Federico | Istituto Italiano Di Tecnologia |
Pasquale, Giulia | Istituto Italiano Di Tecnologia |
Natale, Lorenzo | Istituto Italiano Di Tecnologia |
Keywords: Multifingered Hands, Dexterous Manipulation, Learning from Demonstration
Abstract: We consider the problem of learning a common representation for dexterous manipulation across manipulators of different morphologies. To this end, we propose PCHands, a novel approach for extracting hand postural synergies from a large set of manipulators. We define a simplified and unified description format based on anchor positions for manipulators ranging from 2-finger grippers to 5-finger anthropomorphic hands. This enables learning a variable-length latent representation of the manipulator configuration and the alignment of the end-effector frame of all manipulators. We show that it is possible to extract principal components from this latent representation that is universal across manipulators of different structures and degrees of freedom. To evaluate PCHands, we use this compact representation to encode observation and action spaces of control policies for dexterous manipulation tasks learned with Reinforcement Learning (RL). In terms of learning efficiency and consistency, the proposed representation outperforms a baseline that learns the same tasks in joint space. We additionally show that PCHands performs robustly in RL from demonstration, when demonstrations are provided from a different manipulator. We further support our results with real-world experiments that involve a 2-finger gripper and a 4-finger anthropomorphic hand. Code and additional material are available at https://hsp-iit.github.io/PCHands/.
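The synergy extraction step itself is classical PCA over the unified anchor-position descriptions; a minimal sketch with random stand-in data:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical dataset: 1000 hand configurations described by anchor-point
# positions (e.g., 5 fingertips x 3D = 15 numbers), the kind of unified
# format that lets grippers and anthropomorphic hands share one description.
rng = np.random.default_rng(0)
grasps = rng.normal(size=(1000, 15))

pca = PCA(n_components=3).fit(grasps)      # 3 postural synergies

def encode(pose):                          # hand pose -> low-dim synergy
    return pca.transform(pose.reshape(1, -1))[0]

def decode(z):                             # synergy coordinates -> hand pose
    return pca.inverse_transform(z.reshape(1, -1))[0]

# An RL policy would then act in the 3-D synergy space instead of joint
# space, decoding to anchor positions and solving IK per manipulator.
z = encode(grasps[0])
reconstructed = decode(z)
print(z.shape, reconstructed.shape)        # (3,) (15,)
```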
|
|
17:00-18:00, Paper TI6C.13 | |
RGB-D versus Omnidirectional Visual SLAM in Humanoid Robot Positioning |
|
Caillot, Antoine | CNRS-AIST JRL |
André, Antoine N. | AIST |
Duvinage, Thomas | CNRS |
Caron, Guillaume | CNRS |
Keywords: Vision-Based Navigation, SLAM, Omnidirectional Vision
Abstract: Visual Simultaneous Localization and Mapping (VSLAM) with an RGB-Depth (RGB-D) camera is the gold standard for correcting humanoid robots' drift while walking toward a goal pose indoors. However, the bounded range and Field of View (FoV) of these cameras restrict humanoid use cases to environments with few changes. At the same time, compact omnidirectional cameras with a 360° FoV are spreading quickly at low cost, giving the VSLAM system more opportunities to observe reliable features whatever the robot's orientation. This paper therefore investigates VSLAM with a 360° camera versus an RGB-D camera in the humanoid head for correcting the walking drift of a baseline walking controller over several meters. The resulting baseline biped positioning is evaluated in terms of final pose accuracy and robustness to environmental changes through multiple sets of real experiments on a 1.8 m tall humanoid robot.
|
|
17:00-18:00, Paper TI6C.14 | |
Minimal Observations Inverse Reinforcement Learning for Predicting Human Box-Lifting Motions |
|
Sabbah, Maxime | LAAS-CNRS |
Becanovic, Filip | University of Belgrade |
Mehrdad, Sarmad | New York University |
Righetti, Ludovic | New York University |
Watier, Bruno | LAAS, CNRS, Université Toulouse 3 |
Bonnet, Vincent | University Paul Sabatier |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Modeling and Simulating Humans, Human Factors and Human-in-the-Loop
Abstract: Heavy-load manual lifting poses a significant risk of injury, motivating the need for personalized robotic assistance. The Minimal Observations Inverse Reinforcement Learning (MO-IRL) algorithm has recently demonstrated strong capabilities in recovering underlying optimality principles from very few demonstrations of simulated robotic motions, and at a very reasonable computational cost. Building on this, the present study integrates ten biomechanically informed cost functions into a direct optimal control formulation to predict human motion during heavy-load manual box-lifting tasks. Contrary to previous literature, thanks to the computational efficiency of MO-IRL, we allow time-varying optimal weights and include a collision-avoidance constraint within the set of cost functions. This constraint represents the subject's apprehension of hitting the target table. As MO-IRL requires careful tuning of multiple hyperparameters, we employ a grid search to identify the optimal set. With this configuration, the predicted motion achieves an average accuracy of 11.5 ± 6.2 deg across all joint angles, outperforming comparable methods. The inferred cost weights reveal a time-varying control strategy: initially minimizing lower-limb torques, then smoothing the motion through reduced joint accelerations and load velocity, and finally adjusting to avoid table collision. These findings show that biomechanically guided MO-IRL, coupled with direct optimal control, can accurately recover complex, constrained lifting motions while providing interpretable insights into human motor objectives, paving the way for adaptive and user-specific robotic assistance.
|
|
17:00-18:00, Paper TI6C.15 | |
Fast Estimation of Globally Optimal Independent Contact Regions for Robust Grasping and Manipulation |
|
King, Jonathan | The Robotics Institute, Carnegie Mellon University |
Ahluwalia, Harnoor | Wakeland High School |
Zhang, Michael | Carnegie Mellon University |
Pollard, Nancy S | Carnegie Mellon University |
Keywords: Grasping, Manipulation Planning, Multifingered Hands
Abstract: This work presents a fast anytime algorithm for computing globally optimal independent contact regions (ICRs). ICRs are regions such that one contact within each region enables a valid grasp. Locations of ICRs can provide guidance for grasp and manipulation planning, learning, and policy transfer. However, ICRs for modern applications have been little explored, in part due to the expense of computing them, as they have a search space exponential in the number of contacts. We present a divide and conquer algorithm based on incremental n-dimensional Delaunay triangulation that produces results with bounded suboptimality in times sufficient for real-time planning. This paper presents the base algorithm for grasps where contacts lie within a plane. Our experiments show substantial benefits over competing grasp quality metrics and speedups of 100X and more for competing approaches to computing ICRs.
|
|
17:00-18:00, Paper TI6C.16 | |
Semantic Navigation with Embodied AI for Humanoid Robots in Personalized Household Environments |
|
Noh, DongKi | LG Electronics Inc |
Lee, Jaemin | LG Electronics |
Park, Jongkuk | LG Electronics |
Kim, Hyoung-Rock | LG Electronics Co. Advanced Research Institute |
Baek, Seung-Min | LG Electronics |
Keywords: AI-Enabled Robotics, Semantic Scene Understanding, Field Robots
Abstract: Humanoid robots designed for household assistance must autonomously navigate complex indoor environments originally built for humans. Unlike industrial settings, household environments are highly heterogeneous and exhibit significant variations across different users. To address this challenge, we propose a navigation system for humanoid robots that integrates 3D semantic simultaneous localization and mapping (SLAM) with high-level semantic understanding derived from large language models (LLMs). This combination enhances mapping accuracy and enables LLM-driven scene-aware planning. Specifically, our method represents environmental information in natural language form, allowing users to intuitively communicate tasks to the robot via verbal instructions. In addition, our framework supports user-friendly interactions grounded in semantic context. We validate the effectiveness of the system through experiments on public datasets and in real-world household scenarios. The results demonstrate significant improvements in mapping accuracy, localization robustness, navigation efficiency, and overall system reliability compared with baseline methods. Furthermore, our approach consistently achieves superior high-level path planning by leveraging semantic cues and is well suited for deployment on resource-constrained humanoid platforms.
|
|
17:00-18:00, Paper TI6C.17 | |
Bidirectional Mapping between Physical Contacts and Visual Tactile Images for Physics-Based Simulation |
|
Hong, Taehwa | Seoul National University |
Park, Yong-Lae | Seoul National University |
Keywords: Data Sets for Robot Learning, Calibration and Identification, Soft Sensors and Actuators
Abstract: This paper introduces a simulation framework for the vision-based tactile (ViTac) sensor through accurate modeling of contact deformation and pressure. A finite element model replicates a ViTac sensor to simulate contact events and generate high-resolution surface deformation and force. These simulated contacts are translated into tactile RGB images using data-driven mapping, enabling large-scale synthetic data generation without requiring real measurements. Conversely, the framework also infers depth and contact force from real RGB images, using simulated contacts as supervision. The resulting bidirectional mapping connects simulated and real tactile domains, supporting both synthetic data generation and the addition of physical annotations to existing datasets. This framework is applicable to learning-based tactile perception tasks where high-quality paired data are limited or difficult to collect.
|
|
17:00-18:00, Paper TI6C.18 | |
Proactive Ergonomic Human Motion Generation for Human-Humanoid Collaboration |
|
LEE, DONGGYU | Hanyang University |
Hwang, Seunghoon | California State University Northridge |
Choi, SeungMin | Hanyang University |
Kim, Yoon Seo | Hanyang University |
Kim, Wansoo | Hanyang University ERICA |
Keywords: Human-Robot Collaboration, Human-Centered Automation, Human Factors and Human-in-the-Loop
Abstract: This paper presents a human-centric proactive ergonomic safety framework for human-humanoid collaboration, aiming to ensure that human motion remains physically safe and ergonomically compliant throughout cooperative tasks. A vector auto-regressive model predicts short-term human joint trajectories based on repetitive task data, and the predicted motion is tracked by a virtual human model. Control Barrier Functions enforce ergonomic constraints in real time, including stance stability via the Center of Pressure and effort alignment via the force manipulability ellipsoid. Simulations of collaborative lifting tasks demonstrate that the framework effectively constrains predicted motions within an ergonomic safety set. Quantitative results show that average overloading joint torque was reduced by 24.6% at the shoulder, and total biomechanical effort decreased by 48.1% and 60.8% at the shoulder and elbow, respectively. These reductions highlight the potential of our framework to enable humanoid systems that proactively adapt to human motion while maintaining ergonomic safety in physically collaborative tasks.
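For a single ergonomic constraint, the CBF safety filter reduces to an analytic projection of the desired command onto the safe half-space. The sketch below shows this minimal form with toy numbers; the paper's constraints on CoP stability and force manipulability would each supply their own h, Lfh, Lgh:

```python
import numpy as np

def cbf_filter(u_des, h, Lfh, Lgh, alpha=5.0):
    """Minimal closed-form Control Barrier Function safety filter:
        min ||u - u_des||^2  s.t.  Lfh + Lgh @ u >= -alpha * h,
    where h(x) >= 0 defines the ergonomic safety set (e.g., CoP inside
    the support polygon, or joint torque under an overload threshold).
    The single-constraint QP admits the analytic projection used below.
    """
    residual = Lfh + Lgh @ u_des + alpha * h
    if residual >= 0.0:             # desired motion already satisfies the CBF
        return u_des
    # Project onto the constraint boundary along Lgh.
    return u_des - residual * Lgh / (Lgh @ Lgh)

# Toy example: a 2-DoF velocity command filtered against one constraint.
u_safe = cbf_filter(u_des=np.array([1.0, 0.5]),
                    h=0.02,                     # barely inside the safe set
                    Lfh=0.0,
                    Lgh=np.array([-1.0, 0.0]))  # moving +x decreases h
print(u_safe)                                   # [0.1, 0.5]: +x motion slowed
```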
|
|
17:00-18:00, Paper TI6C.19 | |
Combining Episodic Memory and LLMs for the Verbalization of Robot Experiences |
|
Plewnia, Joana | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Natural Dialog for HRI, Long term Interaction, Robot Companions
Abstract: The ability to communicate past experiences is fundamental for intelligent and natural interaction. Humanoid robots continuously accumulate rich, multi-modal experiential data and must be able to articulate their episodic experiences in natural language to support effective human-robot communication. Existing approaches either rely on the generalization capabilities of large language models (LLMs), sometimes combined with episodic memory, or the precision of rule-based verbalization systems, each presenting limitations when used in isolation. In this work, we present a novel hybrid framework that integrates the adaptability of LLMs with the robustness of rule-based methods and the generalizable structure of memory-based approaches. Our system implements strategies to retrieve and transform memory representations of past perceptions and actions into natural language responses. This enables humanoid robots to respond to natural language queries about their experience. Experimental evaluation based on a set of distinct query types demonstrates that our approach successfully answers 89.4% of episodic memory questions with human-in-the-loop refinement, while reducing token consumption by 97% compared to pure LLM-based methods. Furthermore, we demonstrate the system's extensibility by leveraging LLMs, such as GPT-4.1, to expand the range of permissible queries through example-based interaction. The evaluation on our humanoid robot ARMAR-7 performing household tasks validates that our hybrid approach balances response quality with computational efficiency to address the crucial need for dependable yet flexible verbalization of robot experiences. Code and examples are available at https://robotexperienceverbalization.github.io/REV.github.io/
|
|
17:00-18:00, Paper TI6C.20 | |
A Universal Wire Testing Machine for Enhancing the Performance of Wire-Driven Robots |
|
Suzuki, Temma | The University of Tokyo |
Kawaharazuka, Kento | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Keywords: Tendon/Wire Mechanism, Mechanism Design, Redundant Robots
Abstract: Compared with gears and linkages, wires constitute a lightweight, low-friction transmission mechanism. However, because wires are flexible materials, they tend to introduce large modeling errors, and their adoption in industrial and research robots remains limited. In this study, we built a Universal Wire Testing Machine that enables measurement and adjustment of wire characteristics to improve the performance of wire-driven mechanisms. Using this testing machine, we removed initial wire stretch, measured the tension transmission efficiency of passive pulleys across eight different diameters, and measured the dynamic behavior of variable-length wires. Finally, we applied the data obtained from this testing machine to the force control of an actual wire-driven robot, reducing the end-effector force error.
|
|
17:00-18:00, Paper TI6C.21 | |
IBSA Joint: A Fully Integrated Bi-Stiffness Actuator Joint for Precise Energy Transfer Timing in Explosive Motion Tasks |
|
Yildirim, Mehmet Can | Technical University of Munich |
Pozo Fortunić, Edmundo | Technical University of Munich |
Ossadnik, Dennis | Technical University of Munich |
Rakcevic, Vasilije | Technical University of Munich |
Samuel, Kangwagye | Technical University of Munich |
Swikir, Abdalla | Mohamed Bin Zayed University of Artificial Intelligence |
Le Mesle, Valentin | Technical University of Munich |
Haddadin, Sami | Mohamed Bin Zayed University of Artificial Intelligence |
Keywords: Actuation and Joint Mechanisms, Motion Control
Abstract: Despite the promise of elastic actuation for energy efficiency and performance, the deployment of robotic systems with elastic joints capable of executing highly dynamic and explosive motions is still limited. We address that gap in this paper by advancing the Bi-Stiffness Actuation (BSA) concept through the design and control of a fully integrated BSA joint (iBSA) that combines elastic energy storage with robust high-speed control. In particular, we address the need for system-level integration rather than isolated component analysis, as commonly seen in existing literature. Additionally, we introduce an advanced and robust high-speed control system specifically tailored to the iBSA joint's complex control and dynamic response requirements. Through detailed simulations and experimental validation, we demonstrate that the iBSA joint significantly improves energy-transfer efficiency, control precision, and dynamic performance in challenging tasks such as explosive movements. We develop the iBSA joint with a focus on optimizing both actuator design and overall system performance. Moreover, this work addresses key challenges in the timing control of energy storage and release, enabling a practical framework for implementing next-generation high-performance elastic manipulators.
|
|
17:00-18:00, Paper TI6C.22 | |
Lattice Membrane for Improved Mechanical Characteristics of Monolithic Flexure-Hinge Based Anthropomorphic Hands |
|
Hadjigeorgiou, Nicos | King's College London |
Spyrakos-Papastavridis, Emmanouil | King's College London |
Irving, Ryan Alexander | King's College London |
Metcalfe, Benjamin W | University of Bath |
Bailey, Nicola | King's College London |
Keywords: Compliant Joints and Mechanisms, Multifingered Hands, Prosthetics and Exoskeletons
Abstract: Flexure-based mechanisms offer a promising alternative to traditional rigid-link joints in robotic hands. They enable simplified construction, reduced weight, and adaptability through their intrinsic compliance. This paper presents the design and mechanical characterization of a compound flexure-hinge joint with a lattice membrane for use in a monolithic anthropomorphic hand. A series of experimental tests were conducted to evaluate the joint's stiffness in flexion and torsion, which were compared to joints from existing literature. Additionally, its durability under cyclic loading was examined, together with the functional grasping capabilities of a full anthropomorphic hand utilizing this joint. Results demonstrate that the joint exhibits an increased torsional stiffness of 5.51 ± 0.47 Nmm/degree, compared to 0.52 ± 0.08 Nmm/degree for a simple flexure-hinge joint. Additionally, it has a smooth response under flexion at a stiffness of 0.50 ± 0.07 N/mm, compared to a full membrane, which has a variable stiffness between 0.54 ± 0.10 N/mm and 1.30 ± 0.17 N/mm caused by a buckling response. Preliminary results also showed that the joint withstands over 300,000 cycles of loading without failure, albeit with some deformation, and that a hand utilizing these joints can perform the essential grips required for successful operation. Clear benefits of incorporating the lattice structure are presented, including added torsional stiffness compared to a simple flexure-hinge joint and a smooth force profile compared to a full membrane. The lattice design also prevents object interference with the joint cavity during grasping. These findings support the viability of the proposed design approach for use in adaptive, lightweight and compact robotic hands, with applications in humanoid robots and upper-limb prosthetics.
|
|
17:00-18:00, Paper TI6C.23 | |
CAD-Driven Co-Design for Flight-Ready Jet-Powered Humanoids |
|
Vanteddu, Punith Reddy | Istituto Italiano Di Tecnologia |
Gorbani, Davide | Italian Institute of Technology |
L'Erario, Giuseppe | Istituto Italiano Di Tecnologia |
Mohamed, Hosameldin Awadalla Omer | Italian Institute of Technology |
Bergonti, Fabio | Istituto Italiano Di Tecnologia |
Pucci, Daniele | Italian Institute of Technology |
Keywords: Aerial Systems: Mechanics and Control, Humanoid Robot Systems, Legged Robots
Abstract: This paper presents a CAD-driven co-design framework for optimizing jet-powered aerial humanoid robots to execute dynamically constrained trajectories. Starting from the iRonCub-Mk3 model, a Design of Experiments (DoE) approach is used to generate 5,000 geometrically varied and mechanically feasible designs by modifying limb dimensions, jet interface geometry (e.g., angle and offset), and overall mass distribution. Each model is constructed through CAD assemblies to ensure structural validity and compatibility with simulation tools. To reduce computational cost and enable parameter sensitivity analysis, the models are clustered using K-means, with representative centroids selected for evaluation. A minimum-jerk trajectory is used to assess flight performance, providing position and velocity references for a momentum-based linear Model Predictive Control (MPC) strategy. A multi-objective optimization is then conducted using the NSGA-II algorithm, jointly exploring the space of design centroids and MPC gain parameters. The objectives are to minimize trajectory tracking error and mechanical energy expenditure. The framework outputs a set of flight-ready humanoid configurations with validated control parameters, offering a structured method for selecting and implementing feasible aerial humanoid designs.
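A minimal sketch of the clustering step that reduces the 5,000 sampled designs to a handful of representatives before expensive simulation; the design-parameter columns, their bounds, and the cluster count are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Columns: limb-length scale, jet mount angle [rad], jet offset [m], mass scale
designs = rng.uniform([0.8, -0.3, 0.00, 0.9],
                      [1.2,  0.3, 0.10, 1.1], size=(5000, 4))

km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(designs)
representatives = km.cluster_centers_   # 10 candidate designs
# Each centroid would then be rebuilt as a CAD assembly and scored on the
# minimum-jerk trajectory under the momentum-based MPC; NSGA-II searches
# jointly over centroids and MPC gains.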
|
|
17:00-18:00, Paper TI6C.24 | |
Did You Notice? Unveiling Robot-Induced Synchronization in Human-Robot Interaction |
|
Zhu, Lixiao | McGill University |
Wong, Christopher Yee | Concordia University |
Moon, AJung | McGill University |
Keywords: Human-Robot Collaboration, Physical Human-Robot Interaction, Ethics and Philosophy
Abstract: Decades of studies in psychology and neuroscience establish that humans naturally synchronize their movements with others. This phenomenon is not limited to Human-Human Interaction (HHI) but is also observed in Human-Robot Interaction (HRI), where individuals align their motions to the rhythm of a robot without conscious effort. While such influences are often subtle, characterizing the dynamics of robot-to-human synchronization is critical for designing effective collaborative systems. This study explores whether humans synchronize differently to the movements of robots compared to other humans, and at what point individuals become conscious of the changes in the robot's movements. Our results reveal an asymmetry in synchronization between HHI and HRI. We find that participants are more likely to notice changes in a robot's speed, particularly when the robot speeds up or slows down by over 20% compared to the base speed. Participants' perception of the robot's animacy is also influenced by these speed changes. Building on our findings, we provide insights into how roboticists can design robot behaviours to minimize unwanted influence and respect human autonomy.
|
|
17:00-18:00, Paper TI6C.25 | |
The Surprising Effectiveness of Linear Models for Whole-Body Model-Predictive Control |
|
Bishop, Arun | Carnegie Mellon University |
Alvarez Padilla, Juan Rodolfo | Carnegie Mellon University |
Schoedel, Samuel | Carnegie Mellon University |
Sow, Ibrahima Sory | Carnegie Mellon University |
Chandrachud, Juee Mandar | Carnegie Mellon University |
Sharma, Sheitej | Carnegie Mellon University |
Kraus, William | Carnegie Mellon University |
Park, Beomyeong | Florida Institute for Human and Machine Cognition |
Griffin, Robert J. | Institute for Human and Machine Cognition (IHMC) |
Dolan, John M. | Carnegie Mellon University |
Manchester, Zachary | Carnegie Mellon University |
Keywords: Whole-Body Motion Planning and Control, Legged Robots, Optimization and Optimal Control
Abstract: When do locomotion controllers require reasoning about nonlinearities? In this work we show that a whole-body model-predictive controller using a simple linear time-invariant approximation of the whole-body dynamics is able to execute basic locomotion tasks on legged robots. The formulation requires no online nonlinear dynamics evaluations or matrix inversions. We demonstrate walking, disturbance rejection, and even navigation to a goal position without a separate footstep planner on a quadrupedal robot. In addition, we demonstrate dynamic walking on a hydraulic humanoid, a robot with significant limb inertia, complex actuator dynamics, and a large sim-to-real gap.
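A minimal sketch of why a fixed LTI model is cheap: with dynamics x+ = A x + B u frozen offline, the unconstrained finite-horizon problem condenses to a single linear solve with no online linearization. The matrices, weights, and the double-integrator stand-in are illustrative assumptions, not the paper's whole-body model.

import numpy as np

def lti_mpc(A, B, x0, x_ref, N, wq=1.0, wr=0.01):
    n, m = B.shape
    # Stack predictions over the horizon: X = Sx x0 + Su U
    Sx = np.vstack([np.linalg.matrix_power(A, k + 1) for k in range(N)])
    Su = np.zeros((N * n, N * m))
    for i in range(N):
        for j in range(i + 1):
            Su[i*n:(i+1)*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, i - j) @ B
    # Minimize wq||Sx x0 + Su U - Xref||^2 + wr||U||^2 in closed form.
    H = wq * Su.T @ Su + wr * np.eye(N * m)
    g = wq * Su.T @ (Sx @ x0 - np.tile(x_ref, N))
    U = np.linalg.solve(H, -g)
    return U[:m]   # receding horizon: apply only the first input

# Toy double integrator standing in for the linearized whole-body dynamics.
dt = 0.02
A = np.array([[1, dt], [0, 1]]); B = np.array([[0], [dt]])
u0 = lti_mpc(A, B, x0=np.array([0.1, 0.0]), x_ref=np.zeros(2), N=30)

Since Sx, Su, and the factorization of H depend only on the fixed model, they can all be precomputed, leaving a matrix-vector product per control step.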
|
|
17:00-18:00, Paper TI6C.26 | |
The Role of Embodiment in Intuitive Whole-Body Teleoperation for Mobile Manipulation |
|
Bianchi Moyen, Sophia | Technical University of Darmstadt |
Krohn, Rickmer | TU Darmstadt |
Lueth, Sophie C. | Technical University of Darmstadt |
Pompetzki, Kay | Intelligent Autonomous Systems Group, Technical University Darmstadt |
Peters, Jan | Technische Universität Darmstadt |
Prasad, Vignesh | TU Darmstadt |
Chalvatzaki, Georgia | Technische Universität Darmstadt |
Keywords: Mobile Manipulation, Telerobotics and Teleoperation, Virtual Reality and Interfaces
Abstract: Intuitive teleoperation interfaces are essential for mobile manipulation robots to ensure high-quality data collection while reducing operator workload. A strong sense of embodiment combined with minimal physical and cognitive demands not only enhances the user experience during large-scale data collection, but also helps maintain data quality over extended periods. This becomes especially crucial for challenging long-horizon mobile manipulation tasks that require whole-body coordination. We compare two distinct robot control paradigms: a coupled embodiment integrating arm manipulation and base navigation functions, and a decoupled embodiment treating these systems as separate control entities. Additionally, we evaluate two visual feedback mechanisms: immersive virtual reality and conventional screen-based visualization of the robot's field of view. These configurations were systematically assessed across a complex, multi-stage task sequence requiring integrated planning and execution. Our results show that the use of VR as a feedback modality increases task completion time, cognitive workload, and perceived effort of the teleoperator. Coupling manipulation and navigation imposes a workload on the user comparable to that of the decoupled embodiment, while preliminary experiments suggest that data acquired by coupled teleoperation leads to better imitation learning performance. Our holistic view on intuitive teleoperation interfaces provides valuable insight into collecting high-quality, high-dimensional mobile manipulation data at scale with the human operator in mind.
|
|
17:00-18:00, Paper TI6C.27 | |
Learning to Walk in Costume: Adversarial Motion Priors for Aesthetically Constrained Humanoids |
|
Flores Alvarez, Arturo Moises | University of California, Los Angeles |
Zargarbashi, Fatemeh | ETH Zurich |
Liu, Havel | UCLA |
Wang, Shiqi | UCLA |
Edwards, Liam | University of California, Los Angeles |
Anz, Jessica | UCLA |
Xu, Alex | University of California, Los Angeles |
Shi, Fan | National University of Singapore |
Coros, Stelian | ETH Zurich |
Hong, Dennis | UCLA |
Keywords: Humanoid Robot Systems, Imitation Learning, Reinforcement Learning
Abstract: We present a Reinforcement Learning (RL)-based locomotion system for Cosmo, a custom-built humanoid robot designed for entertainment applications. Unlike traditional humanoids, entertainment robots present unique challenges due to aesthetic-driven design choices. Cosmo embodies these challenges with a disproportionately large head (16% of total mass), limited sensing, and protective shells that considerably restrict movement. To address these challenges, we apply Adversarial Motion Priors (AMP) to enable the robot to learn natural-looking movements while maintaining physical stability. We develop tailored domain randomization techniques and specialized reward structures to ensure safe sim-to-real transfer, protecting valuable hardware components during deployment. Our experiments demonstrate that AMP generates stable standing and walking behaviors despite Cosmo's extreme mass distribution and movement constraints. These results establish a promising direction for robots that balance aesthetic appeal with functional performance, suggesting that learning-based methods can effectively adapt to aesthetic-driven design constraints.
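A minimal sketch of the AMP idea in general form: a discriminator is trained to separate reference-motion transitions from policy transitions, and its output becomes a "naturalness" style reward added to the task reward. The least-squares loss and reward shaping below follow the common AMP formulation; network sizes and dimensions are illustrative assumptions, not Cosmo's configuration.

import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(2 * 32, 256), nn.ReLU(),
                     nn.Linear(256, 1))   # input: (state, next_state) pairs

def style_reward(s, s_next):
    """Higher when a transition resembles the reference motion data."""
    with torch.no_grad():
        d = disc(torch.cat([s, s_next], dim=-1))
        # Common least-squares AMP form: r = max(0, 1 - 0.25 (d - 1)^2)
        return torch.clamp(1.0 - 0.25 * (d - 1.0) ** 2, min=0.0)

def disc_loss(ref_pairs, policy_pairs):
    """Least-squares GAN loss: refs pushed toward +1, policy toward -1."""
    return ((disc(ref_pairs) - 1.0) ** 2).mean() + \
           ((disc(policy_pairs) + 1.0) ** 2).mean()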
|
|
17:00-18:00, Paper TI6C.28 | |
No More Blind Spots: Learning Vision-Based Omnidirectional Bipedal Locomotion for Challenging Terrain |
|
Gadde, Mohitvishnu S. | Oregon State University |
Dugar, Pranay | Oregon State University |
Malik, Ashish | Oregon State University |
Fern, Alan | Oregon State University |
Keywords: Humanoid and Bipedal Locomotion, Omnidirectional Vision, Legged Robots
Abstract: Effective bipedal locomotion in dynamic environments, such as cluttered indoor spaces or uneven terrain, requires agile and adaptive movement in all directions. This necessitates omnidirectional terrain sensing and a controller capable of processing such input. We present a learning framework for vision-based omnidirectional bipedal locomotion, enabling seamless movement using depth images. A key challenge is the high computational cost of rendering omnidirectional depth images in simulation, making traditional sim-to-real reinforcement learning (RL) impractical. Our method combines a robust blind controller with a teacher policy that supervises a vision-based student policy, trained on noise-augmented terrain data to avoid rendering costs during RL and ensure robustness. We also introduce a data augmentation technique for supervised student training, accelerating training by up to 10 times compared to conventional methods. Our framework is validated through simulation and real-world tests, demonstrating effective omnidirectional locomotion with minimal reliance on expensive rendering. This is, to the best of our knowledge, the first demonstration of vision-based omnidirectional bipedal locomotion, showcasing its adaptability to diverse terrains.
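A minimal sketch of the teacher-student step: the student is regressed onto actions from a privileged teacher, with noise-augmented terrain inputs standing in for rendered depth during training. Shapes, the noise model, and the networks are illustrative assumptions.

import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(48 + 187, 256), nn.ReLU(), nn.Linear(256, 12))
student = nn.Sequential(nn.Linear(48 + 187, 256), nn.ReLU(), nn.Linear(256, 12))
opt = torch.optim.Adam(student.parameters(), lr=3e-4)

def student_update(proprio, terrain):
    noisy_terrain = terrain + 0.05 * torch.randn_like(terrain)   # augmentation
    with torch.no_grad():
        target = teacher(torch.cat([proprio, terrain], -1))      # clean input
    pred = student(torch.cat([proprio, noisy_terrain], -1))
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()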
|
|
17:00-18:00, Paper TI6C.29 | |
XoPercept: Fine-Grained RGB-D Perception and Mapping for Safe Navigation in Wearable Humanoid Exoskeletons |
|
Honari, Diba | Simon Fraser University |
Pedrammanesh, Ali | Simon Fraser University |
Peykari, Behzad | Human in Motion |
Park, Edward J. | Simon Fraser University |
Arzanpour, Siamak | Simon Fraser University |
Najafi, Farshid | Simon Fraser University |
Keywords: RGB-D Perception, Object Detection, Segmentation and Categorization, Prosthetics and Exoskeletons
Abstract: Lower-limb exoskeletons can restore walking ability for individuals with mobility impairments by augmenting human strength. However, even millimeter-scale obstacles or minor surface irregularities can adversely affect an exoskeleton’s stability and hinder real-world deployment. This paper presents XoPercept, an integrated 360° perception system for the self-balancing XoMotion exoskeleton that employs multiple RGB-D cameras to continuously map the walking surface. In indoor experiments, XoPercept detected 98% of obstacles as small as 1 mm and, using a novel hybrid sizing method, provided 3D measurements of obstacles with errors under 3 cm, thereby generating detailed terrain models. By closing this critical safety gap, XoPercept enhances the reliability and confidence of exoskeleton-assisted mobility in everyday environments.
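One generic way to flag millimeter-scale obstacles from fused RGB-D points is to fit the ground plane and threshold residual heights; the sketch below shows that idea only, and the threshold and synthetic data are assumptions, not the paper's hybrid sizing method.

import numpy as np

def ground_plane(points):
    """Fit z = ax + by + c to N x 3 points by least squares."""
    A = np.c_[points[:, :2], np.ones(len(points))]
    coef, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return coef   # (a, b, c)

def obstacle_mask(points, coef, height_thresh=0.001):   # 1 mm
    a, b, c = coef
    residual = points[:, 2] - (a * points[:, 0] + b * points[:, 1] + c)
    return residual > height_thresh

pts = np.random.rand(10000, 3) * [2.0, 2.0, 0.0005]   # near-flat floor patch
mask = obstacle_mask(pts, ground_plane(pts))

A deployed system would use a robust fit (e.g., RANSAC) so that the obstacles themselves do not bias the plane estimate.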
|
|
17:00-18:00, Paper TI6C.30 | |
Locomotion on Constrained Footholds Via Layered Architectures and Model Predictive Control |
|
Olkin, Zachary | California Institute of Technology |
Ames, Aaron | Caltech |
Keywords: Multi-Contact Whole-Body Motion Planning and Control, Legged Robots, Humanoid and Bipedal Locomotion
Abstract: Computing stabilizing and optimal control actions for legged locomotion in real time is difficult due to the nonlinear, hybrid, and high-dimensional nature of these robots. The hybrid nature of the system introduces a combination of discrete and continuous variables, which causes issues for numerical optimal control. To address these challenges, we propose a layered architecture that separates the choice of discrete variables from a smooth Model Predictive Controller (MPC). The layered formulation allows for online flexibility and optimality without sacrificing real-time performance through a combination of gradient-free and gradient-based methods. The architecture leverages a sampling-based method for determining discrete variables, and a classical smooth MPC formulation using these fixed discrete variables. We demonstrate the results on a quadrupedal robot stepping over gaps and onto terrain with varying heights. In simulation, we demonstrate the controller on a humanoid robot for gap traversal. The layered approach is shown to be closer to optimal and more reliable than common heuristic-based approaches, and faster to compute than pure sampling methods.
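A minimal sketch of the layered split: a gradient-free outer layer enumerates the discrete choice (which foothold region), and a smooth inner solve is run with that choice fixed, keeping the lowest-cost pair. The candidate set and the quadratic surrogate standing in for the smooth MPC are illustrative assumptions.

import numpy as np

def smooth_mpc_cost(foothold, com, goal):
    """Stand-in for the convex MPC solve with discrete variables fixed."""
    step_effort = np.sum((foothold - com) ** 2)
    progress = np.sum((foothold - goal) ** 2)
    return step_effort + 2.0 * progress

def select_foothold(candidates, com, goal):
    costs = [smooth_mpc_cost(f, com, goal) for f in candidates]
    return candidates[int(np.argmin(costs))]

# Candidate footholds, e.g. centers of safe stepping-stone regions.
stones = np.array([[0.3, 0.1], [0.35, -0.1], [0.5, 0.0]])
best = select_foothold(stones, com=np.array([0.0, 0.0]),
                       goal=np.array([1.0, 0.0]))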
|
|
17:00-18:00, Paper TI6C.31 | |
A Robotic Hand with Fiber-Optic Tendons for Embedded Tension Sensing to Enhance Dexterity in Teleoperation |
|
Yi, Jaehyun | Seoul National University |
Chung, Wook Joon | Seoul National University |
Lee, Jeongwon | Seoul National University |
Park, Jeonghun | Seoul National University |
Muzammal, Hamza | Argonne National Laboratory |
Park, Young Soo | Argonne National Laboratory |
Park, Yong-Lae | Seoul National University |
Keywords: Underactuated Robots, Force and Tactile Sensing, Telerobotics and Teleoperation
Abstract: Underactuated robotic hands are extensively used in remote manipulation due to their adaptability and mechanical simplicity. This work presents a fiber-optic tendon with embedded fiber Bragg gratings (FBGs) that enables simultaneous actuation, tendon tension sensing, and fingertip tactile sensing. Each finger’s tendon, routed to a servomotor at the wrist, performs actuation while providing real-time haptic information. Experimental results demonstrate the feasibility of this approach, showcasing its multifunctional capabilities and potential to enhance real-time feedback in remote manipulation.
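The underlying FBG relation can be illustrated with a short worked example: an axial strain shifts the Bragg wavelength as Δλ/λ0 = (1 − p_e)·ε, and for an elastic fiber ε = F/(EA), so tension follows from the measured shift. The material constants below are generic silica-fiber values, assumptions rather than the paper's calibration.

E = 70e9          # Young's modulus of silica [Pa]
A = 1.2e-8        # fiber cross-section [m^2] (~125 um cladding diameter)
p_e = 0.22        # effective photo-elastic coefficient
lam0 = 1550e-9    # nominal Bragg wavelength [m]

def tension_from_shift(dlam):
    strain = dlam / ((1.0 - p_e) * lam0)   # invert the Bragg shift relation
    return E * A * strain                  # Hooke's law -> tension [N]

print(tension_from_shift(0.1e-9))   # a 0.1 nm shift maps to roughly 0.07 N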
|
|
17:00-18:00, Paper TI6C.32 | |
Seeing through Occlusion: Improving Human Motion Capture Accuracy under Body Interference |
|
Koenig, Nicholas | University of Waterloo |
Kwok, Thomas M. | University of Waterloo |
Hu, Yue | University of Waterloo |
Keywords: RGB-D Perception, Human Detection and Tracking, Multi-Modal Perception for HRI
Abstract: Marker-less motion capture using single RGB-D cameras provides a cost-effective, non-invasive alternative to conventional marker-based systems. However, depth accuracy often degrades under self-occlusion, particularly during arm movements. We propose a simple yet effective kinematic correlation method that leverages geometric relations from human anatomy. By re-projecting elbow and shoulder positions using the more reliable wrist position and known limb lengths via the Pythagorean theorem, our method corrects depth estimates of occluded joints. Evaluation against a Vicon reference system during repeated arm motions shows improvements of 56.8% in RMSE and 12.4% in Pearson correlation for elbow depth estimation. These results demonstrate the approach’s potential to enhance pose estimation robustness under self-occlusion and warrant further validation across additional body segments and movements.
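A minimal sketch of the stated correction: with a reliable 3D wrist position, a reliable image-plane (x, y) for the elbow, and a known forearm length L, the occluded elbow depth is recovered from the constraint ||p_elbow − p_wrist|| = L. Choosing the root nearer the raw depth estimate is an illustrative assumption, not necessarily the authors' disambiguation rule.

import numpy as np

def correct_elbow_depth(wrist, elbow_xy, elbow_z_raw, L):
    dx, dy = elbow_xy - wrist[:2]
    rad = L**2 - dx**2 - dy**2
    if rad < 0:                   # inconsistent limb length: keep raw depth
        return elbow_z_raw
    dz = np.sqrt(rad)             # Pythagoras: dz^2 = L^2 - dx^2 - dy^2
    z_candidates = wrist[2] + np.array([dz, -dz])
    return z_candidates[np.argmin(np.abs(z_candidates - elbow_z_raw))]

wrist = np.array([0.10, -0.20, 0.80])   # reliable 3D wrist position [m]
z = correct_elbow_depth(wrist, elbow_xy=np.array([0.05, 0.02]),
                        elbow_z_raw=0.95, L=0.26)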
|
|
17:00-18:00, Paper TI6C.33 | |
Language-Guided Semantic Navigation with Spatial Understanding Via Vision-Language Foundation Models |
|
Kwak, Myeongcheol | LG Electronics |
Kim, Kayoung | LG Electronics |
Ko, Hyun | LG Electronics |
Park, Duckgee | LG Electronics |
Kim, Hyoung-Rock | LG Electronics Co. Advanced Research Institute |
Baek, Seung-Min | LG Electronics |
Keywords: Semantic Scene Understanding, Mapping, Object Detection, Segmentation and Categorization
Abstract: Spatial understanding of service environments is crucial for humanoid robots, which must perform natural-language-driven tasks in diverse and unpredictable environments. In this paper, we propose an open-set semantic mapping framework that leverages vision-language foundation models in conjunction with LiDAR SLAM to create precise 3D object representations associated with language embeddings. By integrating the semantic maps with a large language model (LLM), we demonstrate an application case showing language-based object retrieval and subsequent navigation tasks using both common sense reasoning and spatial understanding.
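A minimal sketch of language-based retrieval over such a map: objects carry vision-language embeddings, and a text query is matched by cosine similarity. Here embed_text() is a random placeholder for a CLIP-style encoder, and the map contents are illustrative assumptions.

import numpy as np

semantic_map = [
    {"name": "mug",    "pos": (1.2, 0.4, 0.9), "emb": None},
    {"name": "remote", "pos": (3.0, 1.1, 0.5), "emb": None},
]

def embed_text(text, dim=512):   # placeholder for a CLIP-style text encoder
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

for obj in semantic_map:         # offline: embed object labels/crops
    obj["emb"] = embed_text(obj["name"])

def retrieve(query):
    q = embed_text(query)        # unit vectors, so dot product = cosine
    scores = [float(q @ o["emb"]) for o in semantic_map]
    return semantic_map[int(np.argmax(scores))]

target = retrieve("something to drink coffee from")
# target["pos"] would then seed the navigation goal.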
|
|
17:00-18:00, Paper TI6C.34 | |
Neural-Augmented Inertia-Adaptive MPC with Whole-Body Control for 10-DoF Bipedal Robots |
|
Lee, JungHwan | Sungkyunkwan University |
Kim, Hyeonju | Sungkyunkwan University |
Lee, JaeHong | Sungkyunkwan University |
Thanh, Truong | Sungkyunkwan University |
Lee, Hyunyong | AIDIN ROBOTICS |
Nam, SeongWon | Sungkyunkwan University |
Choi, Hyouk Ryeol | Sungkyunkwan University |
Keywords: Humanoid and Bipedal Locomotion, Whole-Body Motion Planning and Control, Humanoid Robot Systems
Abstract: Model Predictive Control (MPC) using single rigid body dynamics (SRBD) performs well in legged locomotion, but is less accurate for bipeds due to their relatively large leg mass. To address this, we adopt an Inertia-Adaptive MPC (IA-MPC) that incorporates centroidal inertia predicted in real time by a Centroidal Composite Inertia Neural Network (CCINN). This inertia enables MPC to achieve posture-dependent control accuracy. Additionally, a Whole-Body Controller (WBC) maps the MPC-computed contact forces and moments into joint torques consistent with full-body dynamics. Simulation results show improved long-horizon walking stability, with reduced lateral deviation and more than 50% reduction in roll error.
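A minimal sketch of the learned-inertia idea: a small network maps the joint configuration to the six unique entries of the symmetric centroidal rotational inertia, which the MPC reads each step instead of a constant SRBD inertia. Network size, joint count, and training data are illustrative assumptions, not the CCINN architecture.

import torch
import torch.nn as nn

class InertiaNet(nn.Module):
    def __init__(self, n_joints=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_joints, 128), nn.ELU(),
                                 nn.Linear(128, 6))   # Ixx Iyy Izz Ixy Ixz Iyz

    def forward(self, q):
        v = self.net(q)
        I = torch.zeros(*q.shape[:-1], 3, 3)
        idx = [(0, 0), (1, 1), (2, 2), (0, 1), (0, 2), (1, 2)]
        for k, (i, j) in enumerate(idx):
            I[..., i, j] = v[..., k]   # fill symmetric entries
            I[..., j, i] = v[..., k]
        return I   # fed to the MPC in place of a fixed SRBD inertia

I_hat = InertiaNet()(torch.zeros(1, 10))   # inertia at the nominal posture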
|
|
17:00-18:00, Paper TI6C.35 | |
Design and Development of a 10-DoF Humanoid Lower Body System Using Proximal Actuation and Transmission Mechanisms |
|
Kim, Hyeonju | Sungkyunkwan University |
Lee, JaeHong | Sungkyunkwan University |
Lee, JungHwan | Sungkyunkwan University |
Thanh, Truong | Sungkyunkwan University |
Lee, Hyunyong | AIDIN ROBOTICS |
Choi, Hyouk Ryeol | Sungkyunkwan University |
Keywords: Humanoid Robot Systems, Mechanism Design, Humanoid and Bipedal Locomotion
Abstract: This paper presents a 10-DoF bipedal robot with proximal actuation for lightweight and low-inertia leg dynamics. Unlike conventional designs with joint-mounted actuators, our approach places actuators near the torso to reduce mass and inertia. The robot uses low gear ratio Quasi-Direct Drive (QDD) actuators, transmitting power via a timing belt at the knee and a four-bar linkage at the ankle. The robot’s mass is distributed such that both weight and rotational inertia decrease toward the end of the leg, improving responsiveness and control stability. Experimental evaluation showed a knee RMSE of 0.0048 rad, confirming the accuracy of the timing belt-driven transmission. In addition, simulation results demonstrated that joint torques during 0.4 m/s walking remained within actuator limits. Future work includes hardware walking experiments and integration of an upper body to complete the humanoid platform.
|
|
17:00-18:00, Paper TI6C.36 | |
Tendon-Based Proprioception for Manipulation in an Underactuated Anthropomorphic Robotic Hand with Series Elastic Actuators |
|
Lee, JaeHyun | Seoul National University |
Park, Jong Hoo | Seoul National University |
Cho, Kyu-Jin | Seoul National University, Biorobotics Laboratory |
Keywords: Perception for Grasping and Manipulation, Grasping, Mechanism Design
Abstract: This paper presents an anthropomorphic underactuated robotic hand equipped with a miniature, proprioceptive, cable-driven series elastic actuator (MPC-SEA) that enables grasp posture and object deformation estimation using only tendon-based proprioceptive sensing. Unlike conventional tactile-based approaches, the proposed method avoids additional sensors on the finger, preserving the simplicity and compactness of underactuated mechanisms. By integrating the MPC-SEA with a sensor-less underactuated finger, we enable the estimation of phalanx-level contact timing, joint angles, and object deformation during grasping. Experimental results using a single finger validate the proposed sensing framework and demonstrate its feasibility for proprioception-only manipulation without relying on vision or tactile sensing.
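The core of SEA-based proprioception can be shown in two lines: tendon tension follows from the series-spring deflection, and contact is flagged when measured tension departs from a free-motion prediction. The constants and the threshold are assumptions, not the MPC-SEA calibration.

k_spring = 2000.0   # series spring stiffness [N/m], illustrative value

def tendon_tension(x_motor, x_tendon):
    """Tension from spring deflection between motor and tendon sides [N]."""
    return k_spring * (x_motor - x_tendon)

def contact_detected(tension_meas, tension_free_motion, thresh=0.5):
    """Flag phalanx contact when tension deviates from the no-contact model."""
    return abs(tension_meas - tension_free_motion) > thresh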
|
|
17:00-18:00, Paper TI6C.37 | |
Real-Time Gait Adaptation for Quadrupeds Using Model Predictive Control and Reinforcement Learning |
|
Nair B, Ganga | Indian Institute of Science, Bengaluru |
Kotecha, Prakrut | Indian Institute of Science |
Kolathaya, Shishir | Indian Institute of Science |
Keywords: Legged Robots, Reinforcement Learning, Optimization and Optimal Control
Abstract: Model-free reinforcement learning (RL) policies for quadruped locomotion often converge to a single gait, limiting adaptability and efficiency. We propose a control framework for real-time gait adaptation using Model Predictive Path Integral (MPPI) control and a Dreamer-based module. Our approach jointly optimizes actions and continuous gait parameters to enable smooth transitions, velocity tracking, and energy-efficient locomotion. Simulation results on the Unitree Go1 show up to 55% lower energy consumption compared to fixed-gait RL policies, while maintaining accurate tracking and stable transitions across a range of target speeds.
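A minimal sketch of the MPPI update: sample perturbed action sequences, weight them by the exponentiated negative rollout cost, and average. Here rollout_cost() is a placeholder; in the paper, continuous gait parameters are optimized jointly with the actions.

import numpy as np

def mppi_step(u_nom, rollout_cost, n_samples=256, sigma=0.2, lam=1.0):
    H, m = u_nom.shape
    eps = sigma * np.random.randn(n_samples, H, m)
    costs = np.array([rollout_cost(u_nom + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / lam)   # importance weights
    w /= w.sum()
    return u_nom + np.einsum('n,nhm->hm', w, eps)

def rollout_cost(U):   # placeholder tracking-plus-effort cost
    return float(np.sum(U ** 2))

u = np.zeros((20, 12))   # horizon 20, 12 actuated joints
for _ in range(5):
    u = mppi_step(u, rollout_cost)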
|
|
17:00-18:00, Paper TI6C.38 | |
Public Perception of Humanoid AI Robots: Topic Modeling and Sentiment Analysis of YouTube Comments |
|
Kim, Natalie | University of Southern California |
Keywords: AI-Based Methods, Robot Companions
Abstract: Fueled by rapid advancements in generative Artificial Intelligence (AI), large language models (LLMs), and large multimodal models (LMMs), excitement around the boundless opportunities for AI robotics and humanoid AI is also increasing. In contrast to these potential benefits, however, some express concerns about potential job displacement, wealth concentration, and the erosion of human connection. Furthermore, while some believe that humanoid AIs are on the horizon, others express discomfort, saying we humans are not fully prepared to integrate them into our society. There is thus a pressing need to understand public sentiment towards humanoid AI robots in particular. The present study applies topic modeling and sentiment analysis to YouTube comments on humanoid AI robots to characterize public opinion. We collected 116,611 comments from the top 20 most-viewed YouTube videos (as of April 19, 2025) that featured humanoid AI robots. This dataset is examined using topic modeling and sentiment analysis: topic modeling with BERTopic was implemented to identify the topics within a 3,000-comment subset, and sentiment classification was performed using the OpenAI GPT-3.5 API.
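A minimal sketch of such a pipeline using the real BERTopic API; the comment list is a duplicated stand-in for the actual corpus, and the sentiment function is a stub rather than the study's exact GPT-3.5 configuration.

from bertopic import BERTopic

comments = ["These robots will take our jobs",
            "Amazing engineering, can't wait to see more",
            "Feels creepy, like the uncanny valley"] * 100  # stand-in corpus

topic_model = BERTopic(min_topic_size=10)
topics, probs = topic_model.fit_transform(comments)
print(topic_model.get_topic_info())   # topic ids, sizes, keyword labels

def classify_sentiment(text):
    """Stub for an LLM sentiment call (positive/negative/neutral)."""
    prompt = f"Classify the sentiment of this comment: {text}"
    return prompt   # replace with an actual API call in a real pipeline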
|
| |