| |
Last updated on October 7, 2024. This conference program is tentative and subject to change.
Technical Program for Friday October 18, 2024
|
FrPI6T1 |
Room 1 |
Humanoid and Bipedal Locomotion |
Teaser Session |
Chair: Pucci, Daniele | Italian Institute of Technology |
|
09:00-10:00, Paper FrPI6T1.1 | |
Demonstrating a Robust Walking Algorithm for Underactuated Bipedal Robots in Non-Flat, Non-Stationary Environments |
|
Dosunmu-Ogunbi, Oluwami | University of Michigan |
Shrivastava, Aayushi | University of Michigan Ann Arbor |
Grizzle, J.W. | University of Michigan |
Keywords: Humanoid and Bipedal Locomotion
Abstract: This work explores an innovative algorithm designed to enhance the mobility of underactuated bipedal robots across challenging terrains, especially when navigating through spaces with constrained opportunities for foot support, like steps or stairs. By combining ankle torque with a refined angular momentum-based linear inverted pendulum model (ALIP), our method allows variability in the robot's center of mass height. We employ a dual-strategy controller that merges virtual constraints for precise motion regulation across essential degrees of freedom with an ALIP-centric model predictive control (MPC) framework, aimed at enforcing gait stability. The effectiveness of our feedback design is demonstrated through its application on the Cassie bipedal robot, which features 20 degrees of freedom. Key to our implementation is the development of tailored nominal trajectories and an optimized MPC that reduces the execution time to under 500 microseconds—and, hence, is compatible with Cassie's controller update frequency. This paper not only showcases the successful hardware deployment but also demonstrates a new capability, a bipedal robot using a moving walkway.
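For readers unfamiliar with the ALIP template referenced above, the sketch below propagates a generic planar angular-momentum-based inverted pendulum with an ankle-torque input; the mass, center-of-mass height, and timing are placeholder values, not the paper's identified parameters or controller.

    import numpy as np

    # Generic planar ALIP template (illustrative only; not the authors' exact model).
    # State: x = horizontal CoM position relative to the stance foot [m]
    #        L = angular momentum about the stance contact point [kg m^2/s]
    m, g, zc = 32.0, 9.81, 0.9   # assumed mass, gravity, and CoM height for a Cassie-like robot

    def alip_step(x, L, ankle_torque, dt):
        """One Euler step of the ALIP dynamics with ankle torque as input."""
        xdot = L / (m * zc)                # CoM velocity recovered from angular momentum
        Ldot = m * g * x + ankle_torque    # gravity moment about the contact plus ankle torque
        return x + dt * xdot, L + dt * Ldot

    # Propagate over one single-support phase (0.3 s) at a 2 kHz control rate.
    x, L = 0.02, 0.0
    for _ in range(600):
        x, L = alip_step(x, L, ankle_torque=0.0, dt=5e-4)
    print(x, L)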
|
|
09:00-10:00, Paper FrPI6T1.2 | |
Compliance Optimization Control for Rigid-Soft Hybrid System and Its Application in Humanoid Robot Motion Control |
|
He, Zewen | University of Tokyo |
Ishigaki, Taiki | The University of Tokyo |
Yamamoto, Ko | University of Tokyo |
Keywords: Humanoid and Bipedal Locomotion, Legged Robots, Modeling, Control, and Learning for Soft Robots
Abstract: Flexibility and softness play a significant role in dynamic human motions. This includes the flexibility owing to ligaments in the human body and the softness of external structures such as a leaf-spring-type prosthesis. Thus, robotic systems need to utilize such flexibility to achieve dynamic and energy-efficient motion. In this study, we proposed a compliance optimization-based control framework for a rigid-soft hybrid robot system where the continuous deformation of a flexible structure is represented using the piece-wise constant strain (PCS) model. We divided the hybrid system into two states: single support and double support. We validated the proposed method in these states using forward dynamics simulations, assuming a hybrid link system that consists of a humanoid robot with a flexible prosthesis.
|
|
09:00-10:00, Paper FrPI6T1.3 | |
Whole-Body Humanoid Robot Locomotion with Human Reference |
|
Zhang, Qiang | The Hong Kong University of Science and Technology (Guangzhou) |
Cui, Peter | Peter & David Robotics (Beijing) Co., Ltd |
Yan, David | Peter & David Robotics (Beijing) Co., Ltd |
Sun, Jingkai | The Hong Kong University of Science and Technology (GZ) |
Duan, Yiqun | University of Technology Sydney |
Han, Gang | PND Robotics |
Zhao, Wen | Nankai University |
Zhang, Weining | Beijing Innovation Center of Humanoid Robotics |
Guo, Yijie | UBTECH Robotics |
Zhang, Arthur | Peter & David Robotics (Beijing) Co., Ltd |
Xu, Renjing | The Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Humanoid and Bipedal Locomotion, Humanoid Robot Systems, Imitation Learning
Abstract: Recently, humanoid robots have made significant advances in their ability to perform complex tasks due to the deployment of Reinforcement Learning (RL). However, the inherent complexity of humanoid robots, including the difficulty of planning complex reward functions and training entire complex systems, still poses a notable challenge. To overcome these challenges, after many iterations and in-depth investigations, we have meticulously developed a full-size humanoid robot, "Adam", whose innovative structural design greatly improves the efficiency and effectiveness of the imitation learning process. In addition, we have developed a novel imitation learning framework based on an adversarial motion prior, which applies not only to Adam but also to humanoid robots in general. Using the framework, Adam can exhibit unprecedented human-like characteristics in locomotion tasks. Our experimental results demonstrate that the proposed framework enables Adam to achieve human-comparable performance in complex locomotion tasks, marking the first time that human locomotion data has been used for imitation learning in a full-size humanoid robot.
|
|
09:00-10:00, Paper FrPI6T1.4 | |
Toward Understanding Key Estimation in Learning Robust Humanoid Locomotion |
|
Wang, Zhicheng | Zhejiang University |
Wei, Wandi | Zhejiang University |
Yu, Ruiqi | Zhejiang University |
Wu, Jun | Zhejiang University |
Zhu, Qiuguo | Zhejiang University |
Keywords: Humanoid and Bipedal Locomotion, Reinforcement Learning, Motion Control
Abstract: Accurate state estimation plays a critical role in ensuring the robust control of humanoid robots, particularly in the context of learning-based control policies for legged robots. However, there is a notable gap in analytical research concerning estimations. Therefore, we endeavor to further understand how various types of estimations influence the decision-making processes of policies. In this paper, we provide quantitative insight into the effectiveness of learned state estimations, employing saliency analysis to identify key estimation variables and optimize their combination for humanoid locomotion tasks. Evaluations assessing tracking precision and robustness are conducted on comparative groups of policies with varying estimation combinations in both simulated and real-world environments. Results validated that the proposed policy is capable of crossing the sim-to-real gap and demonstrating superior performance relative to alternative policy configurations.
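A gradient-based saliency analysis of the kind described above can be sketched as follows; the policy network, input layout, and assumption that the first inputs are learned estimates are hypothetical stand-ins, not the authors' implementation.

    import torch
    import torch.nn as nn

    # Hypothetical policy: maps [estimated states | proprioception] -> joint targets.
    policy = nn.Sequential(nn.Linear(48, 128), nn.ELU(), nn.Linear(128, 12))

    obs = torch.randn(256, 48, requires_grad=True)   # batch of observations
    actions = policy(obs)

    # Saliency: mean absolute gradient of the action norm w.r.t. each input dimension.
    actions.norm(dim=-1).sum().backward()
    saliency = obs.grad.abs().mean(dim=0)

    # Rank the estimation variables (assume the first 16 inputs are learned estimates).
    print(torch.argsort(saliency[:16], descending=True))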
|
|
09:00-10:00, Paper FrPI6T1.5 | |
Joint-Level IS-MPC: A Whole-Body MPC with Centroidal Feasibility for Humanoid Locomotion |
|
Belvedere, Tommaso | CNRS |
Scianca, Nicola | Sapienza University of Rome |
Lanari, Leonardo | Sapienza University of Rome |
Oriolo, Giuseppe | Sapienza University of Rome |
Keywords: Humanoid and Bipedal Locomotion, Whole-Body Motion Planning and Control, Legged Robots
Abstract: We propose an effective whole-body MPC controller for locomotion of humanoid robots. Our method generates motions using the full kinematics, allowing it to account for joint limits and to exploit upper-body motions to reject disturbances. Each MPC iteration solves a single QP that considers the interplay between dynamic and kinematic features of the robot. Thanks to our special formulation, we are able to perform a feasibility analysis, which opens the door to future enhancements of functionality and performance, e.g., step adaptation in complex environments. We demonstrate its effectiveness through a campaign of dynamic simulations aimed at highlighting how the joint limits and the use of the angular momentum through upper-body motions are fundamental for maximizing performance and robustness, and ultimately for making the robot able to execute more challenging gaits.
|
|
09:00-10:00, Paper FrPI6T1.6 | |
Integrating Model-Based Footstep Planning with Model-Free Reinforcement Learning for Dynamic Legged Locomotion |
|
Lee, Ho Jae | Massachusetts Institute of Technology |
Hong, Seungwoo | MIT (Massachusetts Institute of Technology) |
Kim, Sangbae | Massachusetts Institute of Technology |
Keywords: Humanoid and Bipedal Locomotion, Reinforcement Learning, Machine Learning for Robot Control
Abstract: In this work, we introduce a control framework that combines model-based footstep planning with Reinforcement Learning (RL), leveraging desired footstep patterns derived from the Linear Inverted Pendulum (LIP) dynamics. Utilizing the LIP model, our method forward predicts robot states and determines the desired foot placement given the velocity commands. We then train an RL policy to track the foot placements without following the full reference motions derived from the LIP model. This partial guidance from the physics model allows the RL policy to integrate the predictive capabilities of the physics-informed dynamics and the adaptability characteristics of the RL controller without overfitting the policy to the template model. Our approach is validated on the MIT Humanoid, demonstrating that our policy can achieve stable yet dynamic locomotion for walking and turning. We further validate the adaptability and generalizability of our policy by extending the locomotion task to unseen, uneven terrain. During the hardware deployment, we have achieved forward walking speeds of up to 1.5 m/s on a treadmill and have successfully performed dynamic locomotion maneuvers such as 90-degree and 180-degree turns.
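The LIP-based forward prediction and foot-placement step can be illustrated with the generic sketch below; the step duration, CoM height, and Raibert-style feedback gain are assumptions rather than the values used on the MIT Humanoid.

    import numpy as np

    # Illustrative LIP-based desired foot placement (generic form only).
    g, zc, T = 9.81, 0.7, 0.35          # gravity, CoM height, step duration (assumed)
    w = np.sqrt(g / zc)                  # LIP natural frequency

    def predict_lip(x, xdot, t):
        """Forward-predict the LIP CoM state (relative to the stance foot) after time t."""
        x_t = x * np.cosh(w * t) + (xdot / w) * np.sinh(w * t)
        xdot_t = x * w * np.sinh(w * t) + xdot * np.cosh(w * t)
        return x_t, xdot_t

    def desired_foot_placement(x, xdot, v_cmd):
        """Foot placement at the end of the step that regulates velocity toward v_cmd."""
        x_T, xdot_T = predict_lip(x, xdot, T)
        # Raibert-style heuristic: capture point plus velocity-error feedback (gain assumed).
        return x_T + xdot_T / w + 0.1 * (xdot_T - v_cmd)

    print(desired_foot_placement(x=0.0, xdot=0.4, v_cmd=1.0))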
|
|
09:00-10:00, Paper FrPI6T1.7 | |
Revisiting Reward Design and Evaluation for Robust Humanoid Standing and Walking |
|
van Marum, Bart | Oregon State University |
Shrestha, Aayam | Oregon State University |
Duan, Helei | Oregon State University |
Dugar, Pranay | Oregon State University |
Dao, Jeremy | Oregon State University |
Fern, Alan | Oregon State University |
Keywords: Humanoid and Bipedal Locomotion, Robust/Adaptive Control, Reinforcement Learning
Abstract: A necessary capability for humanoid robots is the ability to stand and walk while rejecting natural disturbances. Recent progress has been made using sim-to-real reinforcement learning (RL) to train such locomotion controllers, with approaches differing mainly in their reward functions. However, prior works lack a clear method to systematically test new reward functions and compare controller performance through repeatable experiments. This limits our understanding of the trade-offs between approaches and hinders progress. To address this, we propose a low-cost, quantitative benchmarking method to evaluate and compare the real-world performance of standing and walking (SaW) controllers on metrics like command following, disturbance recovery, and energy efficiency. We also revisit reward function design and construct a minimally constraining reward function to train SaW controllers. We experimentally verify that our benchmarking framework can identify areas for improvement, which can be systematically addressed to enhance the policies. We also compare our new controller to state-of-the-art controllers on the Digit humanoid robot. The results provide clear quantitative trade-offs among the controllers and suggest directions for future improvements to the reward functions and expansion of the benchmarks.
|
|
09:00-10:00, Paper FrPI6T1.8 | |
Bipedal Safe Navigation Over Uncertain Rough Terrain: Unifying Terrain Mapping and Locomotion Stability |
|
Muenprasitivej, Kasidit | Georgia Institute of Technology |
Jiang, Jesse | Georgia Institute of Technology |
Shamsah, Abdulaziz | Georgia Institute of Technology |
Coogan, Samuel | Georgia Tech |
Zhao, Ye | Georgia Institute of Technology |
Keywords: Humanoid and Bipedal Locomotion, Motion and Path Planning, Robot Safety
Abstract: We study the problem of bipedal robot navigation in complex environments with uncertain and rough terrain. In particular, we consider a scenario in which the robot is expected to reach a desired goal location by traversing an environment with uncertain terrain elevation. Such terrain uncertainties induce not only untraversable regions but also robot motion perturbations. Thus, the problems of terrain mapping and locomotion stability are intertwined. We evaluate three different kernels for Gaussian process (GP) regression to learn the terrain elevation. We also learn the motion deviation resulting from both the terrain as well as the discrepancy between the reduced-order Prismatic Inverted Pendulum Model used for planning and the full-order locomotion dynamics. We propose a hierarchical locomotion-dynamics-aware sampling-based navigation planner. The global navigation planner plans a series of local waypoints to reach the desired goal locations while respecting locomotion stability constraints. Then, a local navigation planner is used to generate a sequence of dynamically feasible footsteps to reach local waypoints. We develop a novel trajectory evaluation metric to minimize motion deviation and maximize information gain of the terrain elevation map. We evaluate the efficacy of our planning framework on Digit bipedal robot simulation in MuJoCo.
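Gaussian process regression of terrain elevation, as used above for each evaluated kernel, can be sketched as follows; the squared-exponential kernel, hyperparameters, and synthetic elevation data are illustrative assumptions only.

    import numpy as np

    # Minimal GP regression for terrain elevation (kernel choice and hyperparameters assumed).
    def rbf_kernel(A, B, length=0.5, sigma_f=0.3):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return sigma_f**2 * np.exp(-0.5 * d2 / length**2)

    X_train = np.random.uniform(0, 5, size=(200, 2))                       # measured (x, y) positions
    z_train = 0.1 * np.sin(X_train[:, 0]) + 0.02 * np.random.randn(200)    # synthetic elevations

    X_query = np.array([[2.5, 1.0], [4.0, 3.0]])                 # candidate footstep locations
    K = rbf_kernel(X_train, X_train) + 1e-4 * np.eye(200)        # noisy training covariance
    K_s = rbf_kernel(X_query, X_train)

    alpha = np.linalg.solve(K, z_train)
    mean = K_s @ alpha                                           # predicted elevation
    cov = rbf_kernel(X_query, X_query) - K_s @ np.linalg.solve(K, K_s.T)
    std = np.sqrt(np.clip(np.diag(cov), 0, None))                # terrain uncertainty
    print(mean, std)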
|
|
09:00-10:00, Paper FrPI6T1.9 | |
Whleaper: A 10-DOF High-Performance Bipedal Wheeled Robot |
|
Zhu, Yinglei | Tsinghua University |
He, SiXiao | Tsinghua University |
Qi, Zhenghao | Tsinghua University |
Yong, Zhuoyuan | Tsinghua University |
Qin, Yihua | Tsinghua University |
Chen, Jianyu | Tsinghua University |
Keywords: Humanoid and Bipedal Locomotion, Legged Robots, Wheeled Robots
Abstract: Wheel-legged robots combine the advantages of both wheeled robots and legged robots, offering versatile locomotion capabilities with excellent stability on challenging terrains and high efficiency on flat surfaces. However, existing wheel-legged robots typically have limited hip joint mobility compared to humans, which plays a crucial role in locomotion. In this paper, we introduce Whleaper, a novel 10-degree-of-freedom (DOF) bipedal wheeled robot, with 3 DOFs at the hip of each leg. Its humanoid joint design enables adaptable motion in complex scenarios, ensuring stability and flexibility. This paper introduces the details of Whleaper, with a focus on innovative mechanical design, control algorithms and system implementation. Firstly, stability stems from the increased DOFs at the hip, which maximize the motion range and improve the contact attitude between the feet and the ground. Secondly, the extra DOFs also augment its mobility capabilities. During walking or sliding, more complex movements can be adopted to execute obstacle avoidance tasks. Thirdly, we utilize two control algorithms to implement multimodal motion for walking and sliding. By controlling specific DOFs of the robot, we conducted a series of simulation and practical experiments, demonstrating that a high-DOF hip joint design can effectively enhance the stability and flexibility of wheel-legged robots. Whleaper shows its capability to perform actions such as squatting, obstacle avoidance sliding, and rapid turning in real-world scenarios.
|
|
09:00-10:00, Paper FrPI6T1.10 | |
Physically Consistent Online Inertial Adaptation for Humanoid Loco-Manipulation |
|
Foster, James Paul | University of West Florida |
McCrory, Stephen | Institute for Human and Machine Cognition |
DeBuys, Christian | Texas A&M University |
Bertrand, Sylvain | Institute for Human and Machine Cognition |
Griffin, Robert J. | Institute for Human and Machine Cognition (IHMC) |
Keywords: Humanoid Robot Systems, Humanoid and Bipedal Locomotion, Model Learning for Control
Abstract: The ability to accomplish manipulation and locomotion tasks in the presence of significant time-varying external loads is a remarkable skill of humans that has yet to be replicated convincingly by humanoid robots. Such an ability will be a key requirement in the environments we envision deploying our robots: dull, dirty, and dangerous. External loads constitute a large model bias, which is typically unaccounted for. In this work, we enable our humanoid robot to engage in loco-manipulation tasks in the presence of significant model bias due to external loads. We propose an online estimation and control framework involving the combination of a physically consistent extended Kalman filter for inertial parameter estimation coupled to a whole-body controller. We showcase our results both in simulation and in hardware, where weights are mounted on Nadia's wrist links as a proxy for engaging in tasks where large external loads are applied to the robot.
|
|
09:00-10:00, Paper FrPI6T1.11 | |
Feasible Region Construction by Polygon Merging for Continuous Bipedal Walking |
|
Li, Chao | Beijing Institute of Technology |
Chen, Xuechao | Beijing Institute of Technology |
Hengbo, Qi | Beijing Institute of Technology, School of Mechatronical Engineering |
Li, Qingqing | Beijing Institute of Technology |
Zhao, Qingrui | Beijing Institute of Technology |
Shi, Yongliang | Tsinghua University |
Yu, Zhangguo | Beijing Institute of Technology |
Zhao, Lingxuan | Beijing Institute of Technology |
Jiang, Zhihong | Beijing Institute of Technology |
Keywords: Humanoid Robot Systems, Vision-Based Navigation, Humanoid and Bipedal Locomotion
Abstract: Feasible regions for continuous walking must provide necessary information for footstep planning, including surrounding landing areas and details about obstacles to be avoided during foot swing. However, the current frame lacks sufficient information to construct a feasible region needed at the current moment due to knee occlusion. To this end, this paper uses polygon merging to construct an information-complete feasible region. This polygon merging refers to merging polygons from the current frame and a specific previous frame. Since the polygon is more concise and efficient than point cloud for environmental representation, construction can be completed quickly without GPU acceleration. Experiments show that the proposed method successfully constructs informative feasible regions within the allowed time frame, enabling the robot to navigate stairs.
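The polygon-merging step can be illustrated with the short sketch below, which unions a current-frame polygon with one from a previous frame; the use of Shapely and the example coordinates are assumptions for illustration, not the paper's implementation.

    from shapely.geometry import Polygon
    from shapely.ops import unary_union

    # Illustrative polygon merge between the current frame and a previous frame
    # (coordinates are made up; both polygons are assumed to be expressed in a
    # common world frame after odometry alignment).
    current_frame_region = Polygon([(0.0, 0.0), (1.0, 0.0), (1.0, 0.6), (0.0, 0.6)])
    previous_frame_region = Polygon([(0.4, 0.3), (1.6, 0.3), (1.6, 1.0), (0.4, 1.0)])

    feasible_region = unary_union([current_frame_region, previous_frame_region])
    print(feasible_region.area, feasible_region.geom_type)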
|
|
09:00-10:00, Paper FrPI6T1.12 | |
Magnetic Tactile Sensor with Load Tolerance and Flexibility Using Frame Structures for Estimating Triaxial Contact Force Distribution of Humanoid |
|
Hiraoka, Takuma | The University of Tokyo |
Kunita, Ren | The University of Tokyo |
Kojima, Kunio | The University of Tokyo |
Hiraoka, Naoki | The University of Tokyo |
Konishi, Masanori | The University of Tokyo |
Makabe, Tasuku | The University of Tokyo |
Tang, Annan | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Humanoid Robot Systems, Force and Tactile Sensing, Sensor Networks
Abstract: For humanoid whole body contact motions, it is important to recognize the existence of whole body contacts and the contact forces. The challenges in recognizing the existence of whole body contacts and the contact forces in life-size humanoids are: 1) the measurement part with low mechanical strength must be tolerant of high load and 2) it is difficult to model thick elastic bodies with high impact tolerance and uneven sensor placements when applied to various shapes of the whole body. This paper proposes a method of constructing a load tolerant tactile sensor by separating the loaded part from the measuring part with magnetism and protecting the measuring part inside the frame of the robot. For modeling difficulties, this paper proposes learning the relationship between the change in the detected physical quantity due to deformation of the elastic body and the contact force distribution. This paper shows through experiments that the proposed tactile sensor based on a robot frame is load tolerant enough to support the weight of a life-sized humanoid, and that it can acquire contact force distribution and the robot is able to acclimate to external forces.
|
|
09:00-10:00, Paper FrPI6T1.13 | |
Fly by Book: How to Train a Humanoid Robot to Fly an Airplane Using Large Language Models |
|
Kim, Hyungjoo | Korea Advanced Institute of Science and Technology (KAIST) |
Min, Sungjae | Korea Advanced Institute of Science and Technology (KAIST) |
Kang, Gyuree | Korea Advanced Institute of Science and Technology (KAIST) |
Kim, Jihyeok | Korea Advanced Institute of Science and Technology |
Shim, David Hyunchul | KAIST |
Keywords: Humanoid Robot Systems, Task Planning, Formal Methods in Robotics and Automation
Abstract: A pilot needs to manipulate various gadgets in the cockpit based on vast knowledge of rules and procedures while verbally communicating with air traffic controllers. While precision manipulation in the cockpit during the flight is already a difficult task, a far more difficult thing is how to make a robot learn all the knowledge needed to fly an airplane in accordance with all the rules and regulations. As a pioneering effort, this paper introduces LLM-PIBOT, which leverages the latest advances in Large Language Models (LLMs) to empower a humanoid pilot robot (PIBOT) to take full authority of an airplane by understanding and executing complex procedures outlined in Pilot's Operating Handbooks (POHs). Unlike traditional rule-based methods, the LLM-PIBOT system infers suitable flight procedures, employs an embedding process to accurately identify relevant procedures within documents, and structures the text-extracted flight tasks into tuples using our carefully crafted prompts. This approach enables PIBOT to adapt to the given POHs, generating and executing task plans in real-time in response to commands and situations. Experimental results show that LLM-PIBOT can comprehend and follow the complex procedures specified in the manuals and fly the airplane on a full-scale simulator using the generated flight plans.
|
|
09:00-10:00, Paper FrPI6T1.14 | |
Towards Designing a Low-Cost Humanoid Robot with Flex Sensors-Based Movement |
|
Al Omoush, Muhammad H. | Dublin City University |
Kishore, Sameer | Middlesex University |
Mehigan, Tracey | Dublin City University |
Keywords: Humanoid Robot Systems, Education Robotics, Product Design, Development and Prototyping
Abstract: Humanoid robots have potential applications across diverse sectors, including education, healthcare, and customer service. This paper presents a project on designing and building a low-cost humanoid robot equipped with a flex sensor-based movement mechanism, highlighting its compatibility with Raspberry Pi and microcontrollers such as Arduino Uno and Nano. The project aims to investigate the robot's relevance and effectiveness within educational settings to showcase how a low-cost humanoid robot can potentially support the United Nations' fourth Sustainable Development Goal (UN SDG4) by improving access to quality education through innovative robotics solutions. The robot was tested in a cycle two school (covering Grades 5 to 8, ages 10 to 13) in Dubai, United Arab Emirates. It was integrated into math, science, and design technology classes to assess its functionality and efficiency. Surveys conducted among students and teachers showed a high level of acceptance towards the robot, with over 85% of respondents expressing positive attitudes about its presence and interaction in the classroom. However, teachers and students provided feedback concerning the robot's shape, capabilities, and movement mechanism. Teachers also appreciated the robot's alignment with the UN SDG4, stating its capability to support students' learning and engagement. The authors highlighted the robot's potential to assist students with sensory challenges, such as hearing and vision impairments, and learning difficulties like dyslexia, while emphasizing their commitment to enhancing its accessibility features for a more inclusive learning environment.
|
|
09:00-10:00, Paper FrPI6T1.15 | |
Driving Style Alignment for LLM-Powered Driver Agent |
|
Yang, Ruoxuan | Tsinghua University |
Zhang, Xinyue | Tsinghua University |
Fernandez-Laaksonen, Anais | Tsinghua University |
Ding, Xin | Tsinghua University |
Gong, Jiangtao | Tsinghua University |
Keywords: Humanoid Robot Systems, Intelligent Transportation Systems, AI-Based Methods
Abstract: Recently, LLM-powered driver agents have demonstrated considerable potential in the field of autonomous driving, showcasing human-like reasoning and decision-making abilities. However, current research on aligning driver agent behaviors with human driving styles remains limited, partly due to the scarcity of high-quality natural language data from human driving behaviors. To address this research gap, we propose a multi-alignment framework designed to align driver agents with human driving styles through demonstrations and feedback. Notably, we construct a natural language dataset of human driver behaviors through naturalistic driving experiments and post-driving interviews, offering high-quality human demonstrations for LLM alignment. The framework’s effectiveness is validated through simulation experiments in the CARLA urban traffic simulator and further corroborated by human evaluations. Our research offers valuable insights into designing driving agents with diverse driving styles. The implementation of the framework and details of the dataset can be found at the link.
|
|
09:00-10:00, Paper FrPI6T1.16 | |
From CAD to URDF: Co-Design of a Jet-Powered Humanoid Robot Including CAD Geometry |
|
Vanteddu, Punith Reddy | Istituto Italiano Di Tecnologia |
Nava, Gabriele | Istituto Italiano Di Tecnologia |
Bergonti, Fabio | Istituto Italiano Di Tecnologia |
L'Erario, Giuseppe | Istituto Italiano Di Tecnologia |
Paolino, Antonello | Istituto Italiano Di Tecnologia |
Pucci, Daniele | Italian Institute of Technology |
Keywords: Humanoid Robot Systems, Aerial Systems: Mechanics and Control, Methods and Tools for Robot System Design
Abstract: Co-design optimization strategies usually rely on simplified robot models extracted from CAD. While these models are useful for optimizing geometrical and inertial parameters for robot control, they might overlook important details essential for prototyping the optimized mechanical design. For instance, they may not account for mechanical stresses exerted on the optimized geometries and the complexity of assembly-level design. In this paper, we introduce a co-design framework aimed at improving both the control performance and mechanical design of our robot. Specifically, we identify the robot links that significantly influence control performance. The geometric characteristics of these links are parameterized and optimized using a multi-objective evolutionary algorithm to achieve optimal control performance. Additionally, an automated Finite Element Method (FEM) analysis is integrated into the framework to filter solutions not satisfying the required structural safety margin. We validate the framework by applying it to enhance the mechanical design for flight performance of the jet-powered humanoid robot iRonCub.
|
|
FrPI6T2 |
Room 2 |
Soft and Flexible Robotics II |
Teaser Session |
Chair: George Thuruthel, Thomas | University College London |
Co-Chair: Vazquez, Andres S. | Universidad De Castilla La Mancha |
|
09:00-10:00, Paper FrPI6T2.1 | |
Robust-Adaptive Two-Loop Control for Robots with Mixed Rigid-Elastic Joints |
|
Hua, Minh Tuan | University of Agder |
Sveen, Emil Mühlbradt | University of Agder |
Schlanbusch, Siri Marte | University of Agder |
Sanfilippo, Filippo | University of Agder |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Applications, Dynamics
Abstract: In robotics, while rigid joints are common due to their accuracy and fast response ability, elastic joints are well-known for their safety when interacting with the environment. To harmonise the advantages of these joint types, robots with mixed rigid-elastic joints can be considered. In this paper, a robust-adaptive two-loop control algorithm is proposed to control this type of robot when there are uncertainties in system parameters. In the outer loop, a robust control algorithm is proposed to deal with the uncertainties in the dynamic parameters of the joint side, together with an adaptive controller for the rigid joints. In the inner loop, another robust control algorithm is proposed to handle the uncertainties in system parameters of the elastic joint’s motor side, and a similar adaptive control algorithm is presented to manipulate the elastic joints' motors. The stability of the system is assured by Lyapunov's stability theory. Finally, simulation experiments are conducted to verify the proposed control algorithm.
|
|
09:00-10:00, Paper FrPI6T2.2 | |
CFD-Enabled Approach for Optimizing CPG Control Network for Underwater Soft Robotic Fish |
|
Wang, Yunfei | Tsinghua University |
Sun, Weiyuan | Tsinghua University |
Tang, Wei | Tsinghua University |
Zhang, Xianrui | Tsinghua University |
Yu, Zhenping | Tsinghua University |
Cao, Shunxiang | Tsinghua University |
Qu, Juntian | Tsinghua University |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Applications
Abstract: The Central Pattern Generator (CPG) nonlinear oscillation network is increasingly used in the control of multi-joint collaborative robots. The motion attitude of robots can be effectively adjusted by tuning the parameters of the CPG neural network. However, the mapping from CPG parameters to motion attitude is relatively complicated. To improve the motion performance, an optimization method combining computational fluid dynamics (CFD) and a CPG network is proposed. In this work, we design a three-joint biomimetic soft robotic fish following the body structure of a trevally, and an improved CPG network based on the Hopf model is incorporated into the control system. Directly optimizing the swimming performance through experiments is time-consuming and complex, so a mode of first adjusting parameters on the simulation platform and then refining them on the robot is usually adopted. Therefore, a CFD simulation platform using hydrodynamic solutions has been established to assist in analyzing the swimming effect after parameter optimization. Finally, the experimental results show that the swimming effect simulated by the CFD simulation platform is highly similar to the real test, and the swimming performance after the improved CPG network optimization has been significantly increased.
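The Hopf-model CPG unit mentioned above can be illustrated with the minimal oscillator sketch below; the gains, frequency, and integration step are placeholder values, and the paper's coupling terms between joints are omitted.

    import numpy as np

    # Single Hopf oscillator of the kind commonly used as a CPG unit (illustrative;
    # the coupling terms and parameter values of the paper's improved network are
    # not reproduced here).
    mu, omega, alpha, dt = 1.0, 2 * np.pi * 1.0, 10.0, 1e-3   # amplitude^2, rad/s, gain, step

    x, y = 0.1, 0.0
    trajectory = []
    for _ in range(int(2.0 / dt)):                             # integrate 2 s with Euler steps
        r2 = x * x + y * y
        dx = alpha * (mu - r2) * x - omega * y
        dy = alpha * (mu - r2) * y + omega * x
        x, y = x + dt * dx, y + dt * dy
        trajectory.append(x)                                   # x would drive one joint angle

    print(max(trajectory), min(trajectory))                    # converges toward +/- sqrt(mu)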
|
|
09:00-10:00, Paper FrPI6T2.3 | |
Design, Modelling, and Experimental Validation of a Soft Continuum Wrist Section Developed for a Prosthetic Hand |
|
Sulaiman, Shifa | University of Naples, Federico II, Naples |
Menon, Mehul | NIT Durgapur |
Schetter, Francesco | University of Naples, Federico II, Naples |
Ficuciello, Fanny | Università Di Napoli Federico II |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Materials and Design, Soft Robot Applications
Abstract: Soft continuum sections are widely used in robotic mechanisms for achieving dexterous motions. However, most available designs of soft continuum sections cannot support a payload during motion. This paper presents the development of a novel soft robotic wrist section for a prosthetic hand named ‘PRISMA Hand II’. Our research focuses on various phases of development of a soft continuum wrist section that can support a substantial payload and maintain postures of the hand. Mechanical design, fabrication, and modelling strategies adopted for developing the wrist section are described. The design of the wrist section is constructed by assembling springs, discs, and tendons. The number and dimensions of springs and discs are optimised using static structural analysis. Kinematic modelling and dynamic modelling of the wrist section are carried out using the Geometric Variable Strain (GVS) approach based on Cosserat rod theory and a generalised coordinate method, respectively. The geometric formulations involved in Cosserat rod theory guaranteed accurate and quick computations considering deformation parameters. The dynamic modelling approach also enhanced the performance of the wrist section, reducing errors and computational time during real-time implementations. This paper also discusses a dynamic-model-based controller strategy for the wrist section, and the advantages of the proposed controller are demonstrated using a comparative study with a kinematic-model-based PID controller. Experimental validations of motions of the fabricated wrist section employing the dynamic controller are also included in the paper.
|
|
09:00-10:00, Paper FrPI6T2.4 | |
Theoretical Modeling and Bio-Inspired Trajectory Optimization of a Multiple-Locomotion Origami Robot |
|
Zhu, Keqi | Zhejiang University |
Guo, Haotian | National University of Singapore |
Yu, Wei | Zhejiang University |
Sirag, Hassen Nigatu | ZJU |
Li, Tong | Zhejiang University |
Dong, Ruihong | Zhejiang University |
Dong, Huixu | Zhejiang University |
Keywords: Modeling, Control, and Learning for Soft Robots, Biomimetics, Soft Robot Applications
Abstract: Recent research on mobile robots has focused on increasing their adaptability to unpredictable and unstructured environments using soft materials and structures. However, the determination of key design parameters and control over these compliant robots are predominantly iterated through experiments, lacking a solid theoretical foundation. To improve their efficiency, this paper aims to provide mathematical modeling of two locomotion modes, crawling and swimming. Specifically, a dynamic model is first devised to reveal the influence of the contact surfaces’ frictional coefficients on displacements in different motion phases. Besides, a swimming kinematics model is provided using coordinate transformation, based on which we further develop an algorithm that systematically plans human-like swimming gaits, with maximum thrust obtained. The proposed algorithm is highly generalizable and has the potential to be applied in other soft robots with similar multiple joints. Simulation experiments have been conducted to illustrate the effectiveness of the proposed modeling.
|
|
09:00-10:00, Paper FrPI6T2.5 | |
Fractional Order Modeling and Control of Hydrogel-Based Soft Pneumatic Bending Actuators |
|
de la Morena, Jesús | UCLM |
Redrejo López, David | Universidad De Castilla-La Mancha |
Ramos, Francisco | University of Castilla-La Mancha |
Feliu, Vicente | Escuela Técnica Superior De Ingenieros Industriales / Universidad D |
Vazquez, Andres S. | Universidad De Castilla La Mancha |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Materials and Design
Abstract: Soft pneumatic bending actuators (SPBAs) are commonly employed in soft robotics due to their unique characteristics, including safety, low weight, speed, and load capacity. However, the combination of pneumatics with soft materials causes SPBAs to exhibit nonlinearities and infinite degrees of freedom, complicating their dynamic modeling. In this work, we present how the dynamics of SPBAs can be adjusted to a fractional order model (FOM), showing an approach for their empirical identification. We also present a method for designing fractional order controllers (FOCs) for this type of actuators, based on the inversion of the empirical FOM. This modeling and control is applied to a modular SPBA made of a smart hydrogel, which endows the actuators with self-healing, self-adhesion, and self-sensing capabilities.
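A generic commensurate fractional order model of the kind used for such empirical identification can be written as follows, where Θ(s) is the bending output, P(s) the pressure input, and K, τ, α are the identified gain, time constant, and fractional order; this is only an illustrative form, not the specific model identified in the paper.

    G(s) = \frac{\Theta(s)}{P(s)} = \frac{K}{\tau\, s^{\alpha} + 1}, \qquad 0 < \alpha < 2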
|
|
09:00-10:00, Paper FrPI6T2.6 | |
A Soft Robotic System Automatically Learns Precise Agile Motions without Model Information |
|
Bachhuber, Simon | FAU Erlangen-Nürnberg |
Pawluchin, Alexander | Berliner Hochschule Für Technik |
Pal, Arka | Student |
Boblan, Ivo | Berliner Hochschule Fuer Technik |
Seel, Thomas | Leibniz Universität Hannover |
Keywords: Modeling, Control, and Learning for Soft Robots, Machine Learning for Robot Control
Abstract: Many application domains, e.g., in medicine and manufacturing, can greatly benefit from pneumatic Soft Robots (SRs). However, the accurate control of SRs has remained a significant challenge to date, mainly due to their nonlinear dynamics and viscoelastic material properties. Conventional control design methods often rely on either complex system modeling or time-intensive manual tuning, both of which require significant amounts of human expertise and thus limit their practicality. In recent works, the data-driven method, Automatic Neural ODE Control (ANODEC) has been successfully used to -- fully automatically and utilizing only input-output data -- design controllers for various nonlinear systems in silico, and without requiring prior model knowledge or extensive manual tuning. In this work, we successfully apply ANODEC to automatically learn to perform agile, non-repetitive reference tracking motion tasks in a real-world SR and within a finite time horizon. To the best of the authors' knowledge, ANODEC achieves, for the first time, performant control of a SR with hysteresis effects from only 30 seconds of input-output data and without any prior model knowledge. We show that for multiple, qualitatively different and even out-of-training-distribution reference signals, a single feedback controller designed by ANODEC outperforms a manually tuned PID baseline consistently. Overall, this contribution not only further strengthens the validity of ANODEC, but it marks an important step towards more practical, easy-to-use SRs that can automatically learn to perform agile motions from minimal experimental interaction time.
|
|
09:00-10:00, Paper FrPI6T2.7 | |
Human-Robot Interaction Control for Multi-Mode Exosuit with Reinforcement Learning |
|
Huang, Kaizhen | Nanjing University of Aeronautics and Astronautics |
Xu, Jiajun | Nanjing University of Aeronautics and Astronautics |
Zhang, Tianyi | Nanjing University of Aeronautics and Astronautics |
Zhao, Mengcheng | Nanjing University of Aeronautics and Astronautics |
Ji, Aihong | Nanjing University of Aeronautics and Astronautics |
Song, Guoli | Shenyang Institute of Automation, Chinese Academy of Sciences |
Li, Y.F. | City University of Hong Kong |
Keywords: Modeling, Control, and Learning for Soft Robots, Dynamics, Intention Recognition
Abstract: Soft exoskeleton robots have promising potential in walking assistance with a comfortable wearing experience. In this study, an exosuit equipped with a twisted string actuator (TSA) is developed to provide powerful driving force and diverse operating modes for hemiplegic patients in daily life. Due to the soft structure of the exosuit and the tight coupling, it is challenging to establish the human-robot coupling dynamic model, and precise control and effective assistance are difficult to guarantee in current exosuits. Considering the impedance characteristics of human-robot interaction, an adaptive impedance control method based on reinforcement learning (RL) is proposed, where human motion intention is utilized to optimize impedance parameters and adjust the robot's operating mode. A nonlinear disturbance observer is proposed to compensate for the effects of model estimation errors, joint friction, and external disturbances. Experimental verification demonstrates the effectiveness and superiority of the robotic system.
|
|
09:00-10:00, Paper FrPI6T2.8 | |
Predicting Interaction Shape of Soft Continuum Robots Using Deep Visual Models |
|
Huang, Yunqi | University College London |
Alkayas, Abdulaziz Y. | Khalifa University |
Shi, Jialei | Imperial College London |
Renda, Federico | Khalifa University of Science and Technology |
Wurdemann, Helge Arne | University College London |
George Thuruthel, Thomas | University College London |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Applications
Abstract: Soft continuum robots, characterized by their inherent compliance and dexterity, are increasingly pivotal in applications requiring delicate interactions with the environment such as the medical field. Despite their advantages, challenges persist in accurately modeling and controlling their shape during interactions with surrounding objects. This is because of the difficulty in modeling the large degrees of freedom in soft-bodied objects that become more active during interactions. In this study, we present a deep visual model to predict the interaction shapes of a soft continuum robot in contact with surrounding objects. By formulating this task as a forward-statics problem, the model uses the initial state images containing the object configuration and future actuation values to predict interactive state images of the robot under this actuation condition. We developed and tested the model in both simulated and physical environments, explored the model's predictive capabilities using monocular and binocular views, and tested the model's generalization ability on different datasets. Our results show that deep learning methods are a promising tool for solving the complex problem of predicting the shape of a soft continuum robot interacting with the environment, requiring no prior knowledge about the system dynamics and explicit mapping of the environment. This study paves the way for future explorations in robot-environment interaction modeling and the development of more adaptable interaction shape control strategies.
|
|
09:00-10:00, Paper FrPI6T2.9 | |
Learning Dynamic Tasks on a Large-Scale Soft Robot in a Handful of Trials |
|
Zwane, Sicelukwanda Njabuliso Tunner | University College London |
Cheney, Daniel G. | Brigham Young University |
Johnson, Curtis C | Brigham Young University |
Luo, Yicheng | UCL |
Bekiroglu, Yasemin | Chalmers University of Technology, University College London |
Killpack, Marc | Brigham Young University |
Deisenroth, Marc Peter | University College London |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Applications, Machine Learning for Robot Control
Abstract: Soft robots offer more flexibility, compliance, and adaptability than traditional rigid robots. They are also typically lighter and cheaper to manufacture. However, their use in real-world applications is limited due to modeling challenges and difficulties in integrating effective proprioceptive sensors. Large-scale soft robots (approx two meters in length) have greater modeling complexity due to increased inertia and related effects of gravity. Common efforts to ease these modeling difficulties such as assuming simple kinematic and dynamics models also limit the general capabilities of soft robots and are not applicable in tasks requiring fast, dynamic motion like throwing and hammering. To overcome these challenges, we propose a data-efficient Bayesian optimization-based approach for learning control policies for dynamic tasks on a large-scale soft robot. Our approach optimizes the task objective function directly from commanded pressures, without requiring approximate kinematics or dynamics as an intermediate step. We demonstrate the effectiveness of our approach through both simulated and real-world experiments.
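Optimizing a task objective directly over commanded pressures with Bayesian optimization can be sketched as below; the toy objective, pressure bounds, GP kernel, and UCB acquisition are illustrative assumptions rather than the setup used on the physical robot.

    import numpy as np

    # Compact Bayesian-optimization loop over normalized pressure commands.
    def rbf(A, B, length=0.15, sf=1.0):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return sf**2 * np.exp(-0.5 * d2 / length**2)

    def task_objective(p):
        """Stand-in for a measured task score (e.g., throwing distance) at command p."""
        return -np.sum((p - 0.6) ** 2) + 0.02 * np.random.randn()

    dim, n_init, n_iters = 3, 5, 20
    X = np.random.rand(n_init, dim)                      # initial pressure commands
    y = np.array([task_objective(x) for x in X])

    candidates = np.random.rand(2000, dim)               # candidate commands to score
    for _ in range(n_iters):
        K = rbf(X, X) + 1e-4 * np.eye(len(X))
        Ks = rbf(candidates, X)
        mu = Ks @ np.linalg.solve(K, y)
        var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
        ucb = mu + 2.0 * np.sqrt(np.clip(var, 0, None))  # upper-confidence-bound acquisition
        x_next = candidates[np.argmax(ucb)]
        X = np.vstack([X, x_next])
        y = np.append(y, task_objective(x_next))

    print(X[np.argmax(y)], y.max())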
|
|
09:00-10:00, Paper FrPI6T2.10 | |
Design and Control of an Ultra-Slender Push-Pull Multisection Continuum Manipulator for In-Situ Inspection of Aeroengine |
|
Zhong, Weiheng | Beijing Institute of Technology |
Huang, Yuancan | Beijing Institute of Technology |
Hong, Da | Beijing Institute of Technology |
Shao, Nianfeng | Beijing Institute of Technology |
Keywords: Soft Robot Applications, Engineering for Robotic Systems, Manufacturing, Maintenance and Supply Chains
Abstract: Since the shape of industrial endoscopes is passively altered according to the contact around them, manual inspection approaches of aeroengines through the inspection ports have unreachable areas, and it's difficult to traverse multistage blades and inspect them simultaneously, which requires engine disassembly or the cooperation of multiple operators, resulting in efficiency decline and increased costs. To this end, this paper proposes a novel continuum manipulator with a push-pull multisection structure, which provides a potential solution for the disadvantages mentioned above due to its higher flexibility, passability, and controllability in confined spaces. The ultra-slender design combined with a tendon-driven mechanism makes the manipulator acquire enough workspace and more flexible postures while maintaining a light weight. Considering the coupling between the tendons in the multisection structure, an innovative kinematics decoupling control method is implemented, which can realize real-time control in the case of limited computational resources. A prototype is built to validate the capabilities of the mechatronic design and the performance of the control algorithm. The experimental results demonstrate the advantages of our continuum manipulator in the in-situ inspection of aeroengines' multistage blades, which has the potential to be a replacement solution for industrial endoscopes.
|
|
09:00-10:00, Paper FrPI6T2.11 | |
Origami Actuator with Tunable Limiting Layer for Reconfigurable Soft Robotic Grasping |
|
Yang, Yang | Nanjing University of Information Science and Technology |
Kejin, Zhu | Nanjing University of Information Science and Technology |
Xie, Yuan | Nanjing University of Information Science and Technology |
Yan, Shaoyang | Nanjing University of Information Science and Technology |
Yi, Juan | Southern University of Science and Technology |
Jiang, Pei | Chongqing University |
Li, Yunquan | South China University of Technology |
Zhang, Yazhan | Peng Cheng National Laboratory |
Li, Yingtian | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences |
Keywords: Soft Robot Materials and Design, Soft Robot Applications, Grippers and Other End-Effectors
Abstract: This paper presents a soft actuator inspired by origami and a tunable strain limiting layer, which is proposed for reconfigurable soft robotic grasping. The main structure of the actuator is based on Miura origami, which generates extension under pressurized air, while a limiting layer with tunable length endows the actuator with different motion patterns. By driving the limiting layer through a servo motor, the range of motion and trajectory of the actuator can be pre-programmed and the gripper’s grasping range will be affected accordingly. This paper discusses the design, fabrication, analysis and experimental verification of the actuator. Then the grasping performance of the gripper on objects of different shapes, sizes, and weights is experimentally evaluated. The reconfigurable soft gripper can be applied as an end-effector to accomplish adaptive grasping tasks with various targets.
|
|
09:00-10:00, Paper FrPI6T2.12 | |
A 'MAP' to Find High-Performing Soft Robot Designs: Traversing Complex Design Spaces Using MAP-Elites and Topology Optimization |
|
Xie, Yue | University of Cambridge |
Pinskier, Joshua | CSIRO |
Liow, Lois | CSIRO |
Howard, David | CSIRO |
Iida, Fumiya | University of Cambridge |
Keywords: Soft Robot Materials and Design, Perception for Grasping and Manipulation
Abstract: Soft robotics has emerged as the standard solution for grasping deformable objects, and has proven invaluable for mobile robotic exploration in extreme environments. However, despite this growth, there are no widely adopted computational design tools that produce quality, manufacturable designs. To advance beyond the diminishing returns of heuristic bio-inspiration, the field needs efficient tools to explore the complex, non-linear design spaces present in soft robotics, and find novel high-performing designs. In this work, we investigate a hierarchical design optimization methodology which combines the strengths of topology optimization and quality diversity optimization to generate diverse and high-performance soft robots by evolving the design domain. The method embeds variably sized void regions within the design domain and evolves their size and position to facilitate a richer exploration of the design space and find a diverse set of high-performing soft robots. We demonstrate its efficacy on both benchmark topology optimization problems and soft robotic design problems, and show the method enhances grasp performance when applied to soft grippers. Our method provides a new framework to design parts in complex design domains, both soft and rigid.
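The quality-diversity side of the method, MAP-Elites, can be sketched in a few lines; here the topology-optimization and FEM evaluation are replaced by a toy fitness and a toy behaviour descriptor, so the archive structure is illustrative only.

    import numpy as np

    # Minimal MAP-Elites loop over void-region parameters (illustrative only).
    rng = np.random.default_rng(0)
    GRID = 10                                        # resolution of the behaviour archive
    archive_fit = np.full((GRID, GRID), -np.inf)
    archive_sol = np.empty((GRID, GRID), dtype=object)

    def evaluate(x):
        """Return (fitness, behaviour descriptor in [0,1]^2) for a design vector x."""
        fitness = -np.sum((x - 0.5) ** 2)            # toy stand-in for grasp performance
        behaviour = np.clip([x[0], x[1]], 0.0, 1.0)  # e.g., void size and void position
        return fitness, behaviour

    def insert(x):
        fit, beh = evaluate(x)
        i, j = np.minimum((beh * GRID).astype(int), GRID - 1)
        if fit > archive_fit[i, j]:                  # keep only the elite per cell
            archive_fit[i, j], archive_sol[i, j] = fit, x

    for _ in range(200):                             # random initialisation
        insert(rng.random(4))
    for _ in range(5000):                            # mutate randomly chosen elites
        elites = archive_sol[archive_fit > -np.inf]
        parent = elites[rng.integers(len(elites))]
        insert(np.clip(parent + 0.05 * rng.normal(size=4), 0.0, 1.0))

    print("filled cells:", np.sum(archive_fit > -np.inf), "best:", archive_fit.max())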
|
|
09:00-10:00, Paper FrPI6T2.13 | |
Pneumatic Bladder Links with Wide Range of Motion Joints for Articulated Inflatable Robots |
|
Uchiyama, Katsu | Meiji University |
Niiyama, Ryuma | Meiji University |
Keywords: Soft Robot Materials and Design, Soft Robot Applications, Modeling, Control, and Learning for Soft Robots
Abstract: Exploration of various applications is the frontier of research on inflatable robots. We propose an articulated robot consisting of multiple pneumatic bladder links connected by rolling contact joints called Hillberry joints. The bladder link is made of a double-layered structure of tarpaulin sheet and polyurethane sheet, which is both airtight and flexible in shape. The integration of the Hillberry joint into an inflatable robot is also a new approach. The rolling contact joint allows a wide range of motion of ±150 degrees, the largest among the conventional inflatable joints. Using the proposed mechanism for inflatable robots, we demonstrated moving a 500 g payload with a 3-DoF arm and lifting 3.4 kg and 5 kg payloads with 2-DoF and 1-DoF arms, respectively. We also experimented with a single 3-DoF inflatable leg attached to a dolly to show that the proposed structure worked for legged locomotion.
|
|
09:00-10:00, Paper FrPI6T2.14 | |
Bistable Valve for Electronics-Free Soft Robots |
|
Kan, Longxin | National University of Singapore |
Lam, Jia Qing Joshua | National University of Singapore |
Qin, Zhihang | National University of Singapore |
Li, Keyi | National University of Singapore |
Tang, Zhiqiang | National University of Singapore |
Laschi, Cecilia | National University of Singapore |
Keywords: Soft Sensors and Actuators
Abstract: Recently, there has been a notable shift towards electronics-free designs, which offer promising integration possibilities with soft robots, reducing reliance on traditional electronics. Despite numerous demonstrations showcasing logical control, contact sensors, and gait control, the conventional quake valve-based design is gradually struggling to meet the demands of electronics-free soft robots with increasingly complex functionalities. Integrating multiple tubes, channels, and valves has led to larger and bulkier overall systems. In this study, we introduce a simple yet powerful electronics-free pneumatic valve that excels in various aspects: it allows for flexible function configurations (operating individually, in pairs, or in larger groups), offers high-frequency synchronous reverse outputs, stores valve status, and ensures efficient maintenance. We believe that this work lays the groundwork for developing straightforward yet highly effective fully autonomous soft robots.
|
|
09:00-10:00, Paper FrPI6T2.15 | |
A Facile One-Step Injection Novel Composite Sensor for Robot Tactile Assistance |
|
Zhang, Yuyin | Shanghai University |
Wang, Yue | Shanghai University |
Liu, Na | Shanghai University, Shanghai, China |
Zhong, Songyi | Shanghai University |
Li, Long | Shanghai University |
Xie, Xie | Shanghai University |
Zhang, Quan | Shanghai University |
Yue, Tao | Shanghai University |
Fukuda, Toshio | Nagoya University |
Keywords: Soft Sensors and Actuators, Force and Tactile Sensing
Abstract: Tactile information is the research hotspot of wearable flexible sensors due to its importance and complexity. With the innovation of wearable technology and robotics in healthcare, researchers are increasingly integrating wearable flexible sensors on the front end of robots to reproduce the hand tactile manipulation of human tissues. Therefore, it is hoped to develop a thin-film sensor that can be deployed in a small area to assist robots in surgery and data collection of human tissues. Here we use a one-step injection method to fabricate a novel composite sensor based on liquid metal. By laminating multiple PDMS microfluidic layers, the two parameters of pressure and deformation are measured simultaneously in a decoupled manner. The sensor is small and thin, making it easy to integrate into fingers/robot fingers for assistance. The finger/robot finger exerts pressure on the sensor and the sensor deforms with the material to identify the hardness of the material being touched. Separate performance tests of the two sensors show that the strain and pressure functions are decoupled from each other, and their ratios can identify and classify the hardness of different touched materials (glass, PDMS and silicone). This novel composite sensor we proposed can assist robots in manipulating human tissues during medical surgeries. At the same time, its function in tactile information feedback also has broad applications in medical treatment, rehabilitation and services.
|
|
09:00-10:00, Paper FrPI6T2.16 | |
Development of Permanent Magnet Elastomer-Based Tactile Sensor with Adjustable Compliance and Sensitivity |
|
Abhyankar, Devesh | Waseda University |
Wang, Yushi | Waseda University |
Iwamoto, Yuhiro | Nagoya Institute of Technology |
Sugano, Shigeki | Waseda University |
Kamezaki, Mitsuhiro | The University of Tokyo |
Keywords: Soft Sensors and Actuators, Soft Robot Materials and Design, Human-Robot Collaboration
Abstract: Tactile sensors are crucial in robotics as they enable robots to perceive and interact with their environment through touch, akin to the human sense of touch. Adjustable sensors that can adapt to various tasks by functional or structural modification have not been extensively explored. In terms of sensing adjustability of a sensor, two important aspects are the sensor’s sensitivity and compliance. This paper proposes a novel design for an adjustable compliance and sensitivity sensor composed of a silicone base, a permanent magnetic elastomer (PME), and a printed circuit board (PCB) with magnetic transducers installed. Its adjustability is achieved by varying the pneumatic pressure. This paper presents the design, manufacturing process, and experimental characterization of such an adjustable compliance and sensitivity sensor. This paper thoroughly investigates how altering the pressure of the sensor influences its sensing properties. The results show that it can achieve adjustability in all three axes. For the current design, the sensitivity can be varied from 0.093 to 0.125 mT/N (34.41%), 0.089 to 0.13 mT/N (31.54%), and 0.169 to 0.45 mT/N (62.44%) in the X-, Y-, and Z-axes, respectively. The deformation it undergoes varies from 3.20 to 3.79 mm (18.44%), indicating the compliance change.
|
|
FrPI6T3 |
Room 3 |
Cognitive Systems |
Teaser Session |
Chair: Roennau, Arne | Karlsruhe Institute of Technology (KIT) |
|
09:00-10:00, Paper FrPI6T3.1 | |
Online Hand Movement Recognition System with EEG-EMG Fusion Using One-Dimensional Convolutional Neural Network |
|
Wang, Haozheng | Nankai University |
Jia, Hao | University of Vic Central University of Catalonia |
Sun, Zhe | RIKEN |
Duan, Feng | Nankai University |
Keywords: Intention Recognition, Deep Learning Methods, Sensor-based Control
Abstract: Upper limb amputees face significant challenges in their daily lives due to the loss of hand or arm functionality. Researchers have developed upper limb prostheses to restore normal hand movements for them. Most hand movement recognition systems of prostheses use electromyography (EMG) as the input signal source, but ignore the interrelationship with electroencephalography (EEG), which may contain valuable movement-related information as well. In order to enhance the accuracy of hand movement classification, we proposed a hand movement recognition system based on a one-dimensional convolutional neural network (1D-CNN) that combines EEG and EMG as the input signal sources to increase the quantity of accessible information. In this work, we collected the EEG and EMG of five subjects during the hand movements and used a 1D-CNN based model to classify the preprocessed signals. The average accuracy of using EEG-EMG fusion is 96.59±2.63%, significantly higher than the 74.99±8.24% of using EEG alone and the 90.31±7.16% of using EMG alone. Then, we applied the model trained in the offline experiment for online recognition, and controlled the Pepper robot to complete the corresponding hand movements. The average accuracy of online recognition can reach 93.00±4.85% by using the majority voting method. The results indicate that the method of EEG-EMG fusion can effectively enhance the performance of the hand movement recognition system, which promotes the development of upper limb prostheses and contributes to the rehabilitation of upper limb amputees.
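A minimal 1D-CNN that fuses EEG and EMG by channel concatenation, in the spirit of the system described above, might look like the following; the channel counts, window length, and layer sizes are assumptions, not the authors' architecture.

    import torch
    import torch.nn as nn

    # Assumed channel counts, window length, and number of movement classes.
    N_EEG, N_EMG, WINDOW, N_CLASSES = 32, 8, 256, 5

    class FusionCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(N_EEG + N_EMG, 64, kernel_size=7, padding=3), nn.ReLU(),
                nn.MaxPool1d(4),
                nn.Conv1d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),
            )
            self.classifier = nn.Linear(128, N_CLASSES)

        def forward(self, eeg, emg):
            x = torch.cat([eeg, emg], dim=1)          # fuse along the channel axis
            return self.classifier(self.features(x).squeeze(-1))

    model = FusionCNN()
    logits = model(torch.randn(16, N_EEG, WINDOW), torch.randn(16, N_EMG, WINDOW))
    print(logits.shape)                               # (16, N_CLASSES)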
|
|
09:00-10:00, Paper FrPI6T3.2 | |
Goal Estimation-Based Adaptive Shared Control for Brain-Machine Interfaces Remote Robot Navigation |
|
Muraoka, Tomoka | Osaka University |
Aoki, Tatsuya | Osaka University |
Hirata, Masayuki | Osaka University |
Taniguchi, Tadahiro | Ritsumeikan University |
Horii, Takato | Osaka University |
Nagai, Takayuki | Osaka University |
Keywords: Brain-Machine Interfaces, Telerobotics and Teleoperation, Human Factors and Human-in-the-Loop
Abstract: To address the low frequency and discontinuity of user commands in brain-machine interface (BMI) remote robot navigation, our method estimates the user's intended goal from their commands and uses this goal to generate auxiliary commands through the autonomous system that are both higher in input frequency and more continuous. Furthermore, by defining a confidence level for the estimation, we adaptively calculate the weights for combining user and autonomous commands, thus achieving shared control. We conducted navigation experiments in simulated environments and participant experiments, including user ratings, in real environments using a pseudo-BMI setup. The proposed method significantly reduced obstacle collisions in all experiments. It markedly shortened path lengths under almost all conditions in simulation and in the participant experiments, especially when user inputs became more discrete and noisy (p<0.01). Furthermore, under such challenging conditions, users could operate more easily, with greater confidence, and at a comfortable pace with this system.
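The abstract mentions adaptively weighting user and autonomous commands by the confidence of the goal estimate. The snippet below is only a hedged sketch of that blending step; the blending rule, the minimum user authority, and all names are illustrative assumptions, not the paper's formulation.

```python
# Minimal sketch of confidence-weighted shared control, assuming the goal
# estimator returns a confidence value in [0, 1].
import numpy as np

def blend_commands(user_cmd, auto_cmd, confidence, min_user_weight=0.2):
    """Blend user and autonomous velocity commands.

    The autonomous command gains weight as goal-estimation confidence rises,
    while the user always retains a minimum share of authority.
    """
    alpha = np.clip(confidence, 0.0, 1.0 - min_user_weight)
    return (1.0 - alpha) * np.asarray(user_cmd) + alpha * np.asarray(auto_cmd)

# Example: a discrete, noisy user input versus a smooth autonomous suggestion.
print(blend_commands(user_cmd=[0.0, 0.5], auto_cmd=[0.3, 0.4], confidence=0.8))
```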
|
|
09:00-10:00, Paper FrPI6T3.3 | |
Voltage Regulation in Polymer Electrolyte Fuel Cell Systems Using Gaussian Process Model Predictive Control |
|
Li, Xiufei | Lund University |
Yang, Miao | City University of Hong Kong |
Zhang, Miao | Tsinghua Shenzhen International Graduate School |
Qi, Yuanxin | Lund University |
Li, Zhuowei | The University of Nottingham, China |
Yu, Senbin | Gala Sports |
Yuantao, Wang | Beijing University of Technology;Fengtai Technology (Beij |
Shen, Linpeng | Tsinghua University |
Li, Xiang | Nantong University |
Keywords: Cognitive Control Architectures, Model Learning for Control, Intelligent Transportation Systems
Abstract: This study introduces a novel approach utilizing Gaussian process model predictive control (MPC) to stabilize the output voltage of a polymer electrolyte fuel cell (PEFC) system by simultaneously regulating hydrogen and airflow rates. Two Gaussian process models are developed to capture PEFC dynamics, taking into account constraints including hydrogen pressure and input change rates, thereby helping to mitigate errors inherent to PEFC predictive control. The dynamic performance of the physical model and the Gaussian process MPC in constraint handling and system inputs is compared and analyzed. Simulation outcomes demonstrate that the proposed Gaussian process MPC effectively maintains the voltage at the 48 V target while adhering to safety constraints, even amid workload disturbances ranging from 110 to 120 A.
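As a rough illustration of the idea of using a Gaussian process surrogate inside an MPC loop, the sketch below fits a toy voltage model and picks a one-step input that tracks 48 V. It is an assumption-laden simplification: the paper uses two GP models and multi-step constrained MPC, whereas this only shows the surrogate-plus-cost-minimization pattern with made-up data.

```python
# Minimal sketch: GP surrogate of voltage response + one-step input selection.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
# Toy data: (airflow input, current load) -> voltage (entirely synthetic).
X = rng.uniform([0.5, 110.0], [2.0, 120.0], size=(50, 2))
y = 48.0 + 2.0 * (X[:, 0] - 1.2) - 0.05 * (X[:, 1] - 115.0) + 0.05 * rng.standard_normal(50)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=[0.5, 5.0])).fit(X, y)

def mpc_step(load, target=48.0, candidates=np.linspace(0.5, 2.0, 101)):
    """Evaluate candidate airflow inputs and return the lowest-cost one."""
    queries = np.column_stack([candidates, np.full_like(candidates, load)])
    mean, std = gp.predict(queries, return_std=True)
    cost = (mean - target) ** 2 + 0.1 * std ** 2   # penalize model uncertainty
    return candidates[np.argmin(cost)]

print(mpc_step(load=118.0))
```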
|
|
09:00-10:00, Paper FrPI6T3.4 | |
Roaming with Robots: Utilizing Artificial Curiosity in Global Path Planning for Autonomous Mobile Robots |
|
Spielbauer, Niklas | FZI Forschungszentrum Informatik |
Laube, Till Jasper | FZI Forschungszentrum Informatik |
Oberacker, David | FZI Forschungszentrum Informatik |
Roennau, Arne | Karlsruhe Institute of Technology (KIT) |
Dillmann, Rüdiger | FZI - Forschungszentrum Informatik - Karlsruhe |
Keywords: Cognitive Control Architectures, Biologically-Inspired Robots, Autonomous Agents
Abstract: Autonomous mobile robots are used with increasing frequency in inspection and maintenance tasks completing fixed goal sequences. The downtime robots experience between goals offers an opportunity to gather additional environment information instead of resting. Uncertainty in the amount of available downtime rules out a pre-determined schedule set by an external operator. Instead, the robot itself should decide dynamically what information to gather before its next task begins. This results in a multi-objective optimization problem that tries to maximize information gain while utilizing as much of the available time as possible. We propose a genetic algorithm to solve this optimization problem and introduce two models of artificial curiosity used inside the fitness function to gather as much information as possible. For planning, the genetic algorithm utilizes a multi-map approach using information and obstacle maps. We evaluated our models in a pre-defined and pre-mapped Gazebo environment with a given information map and compared their performance against an information-agnostic coverage algorithm. In this work, we show that utilizing artificial curiosity in path planning can yield major information gains by effectively using downtime.
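The abstract frames downtime exploration as a multi-objective fitness combining information gain and time utilization. The sketch below is one hypothetical way such a fitness term could look; the grid map, robot speed, weights, and the hard infeasibility rule are assumptions and do not reproduce the paper's curiosity models.

```python
# Minimal sketch of a GA fitness: reward information gathered along a candidate
# path while using as much of the available downtime as possible.
import numpy as np

def path_fitness(path, info_map, available_time, speed=0.5, w_info=1.0, w_time=0.5):
    cells = {tuple(p) for p in path}                     # count each visited cell once
    info_gain = sum(info_map[c] for c in cells)
    length = sum(np.linalg.norm(np.subtract(a, b)) for a, b in zip(path[:-1], path[1:]))
    travel_time = length / speed
    if travel_time > available_time:                     # infeasible: must be back in time
        return -np.inf
    time_use = travel_time / available_time
    return w_info * info_gain + w_time * time_use

info_map = np.random.default_rng(1).random((20, 20))     # synthetic information map
path = [(0, 0), (0, 5), (5, 5), (5, 0)]                  # one GA individual
print(path_fitness(path, info_map, available_time=60.0))
```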
|
|
09:00-10:00, Paper FrPI6T3.5 | |
ROBOVERINE: A Human-Inspired Neural Robotic Process Model of Active Visual Search and Scene Grammar in Naturalistic Environments |
|
Grieben, Raul | Ruhr-Universität Bochum |
Sehring, Stephan | Ruhr-Universität Bochum |
Tekülve, Jan | Ruhr-Universitaet Bochum |
Spencer, John P. | University of East Anglia |
Schöner, Gregor | Ruhr University Bochum |
Keywords: Cognitive Modeling, Neurorobotics, Bioinspired Robot Learning
Abstract: We present ROBOVERINE, a neural dynamic robotic active vision process model of selective visual attention and scene grammar in naturalistic environments. The model addresses significant challenges for cognitive robotic models of visual attention: combined bottom-up salience and top-down feature guidance, combined overt and covert attention, coordinate transformations, two forms of inhibition of return, finding objects outside of the camera frame, integrated space- and object-based analysis, minimally supervised few-shot continuous online learning for recognition and guidance templates, and autonomous switching between exploration and visual search. Furthermore, it incorporates a neural process account of scene grammar (prior knowledge about the relations between objects in a scene) to reduce the search space and increase search efficiency. The model also showcases the strength of bridging two frameworks: Deep Neural Networks for feature extraction and Dynamic Field Theory for cognitive operations.
|
|
09:00-10:00, Paper FrPI6T3.6 | |
Interactive Reinforcement Learning from Natural Language Feedback |
|
Tarakli, Imene | Sheffield Hallam University |
Vinanzi, Samuele | Sheffield Hallam University |
Di Nuovo, Alessandro | Sheffield Hallam University |
Keywords: Human Factors and Human-in-the-Loop, Reinforcement Learning, Cognitive Modeling
Abstract: Large Language Models (LLMs) are increasingly influential in advancing robotics. This paper introduces ECLAIR (Evaluative Corrective Guidance Language as Reinforcement), a novel framework that leverages LLMs to interpret and incorporate diverse natural language feedback into robotic learning. ECLAIR unifies various forms of human advice into actionable insights within a Reinforcement Learning context, enabling more efficient robot instruction. Experiments with real-world users demonstrate that ECLAIR accelerates the robot's learning process, aligning its policy closer to optimal from the outset and reducing the need for extensive human intervention. Additionally, ECLAIR effectively integrates multiple types of advice and adapts well to prompt modifications. It also supports multilingual instruction, broadening its applicability and fostering more inclusive human-robot interactions. Project website: https://sites.google.com/view/eclairiros
|
|
09:00-10:00, Paper FrPI6T3.7 | |
Synthetic Dataset Using Diffusion Model for Pixel-Level Dense Pose Estimation |
|
Wen, Jiaixiao | South China University of Technology |
Liu, Qiong | South China University of Technology |
|
09:00-10:00, Paper FrPI6T3.8 | |
Contacts from Motion: Learning Discrete Features for Automatic Contact Detection and Estimation from Human Movements |
|
Miyake, Hibiki | Tokyo University of Science |
Ayusawa, Ko | National Institute of Advanced Industrial Science and Technology |
Sagawa, Ryusuke | National Institute of Advanced Industrial Science and Technology |
Yoshida, Eiichi | Faculty of Advanced Engineering, Tokyo University of Science |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Modeling and Simulating Humans, Human-Centered Robotics
Abstract: This paper presents a novel method for detecting and estimating contact forces solely from human motions using machine learning techniques. Knowing the locations of contacts with the environment and the magnitude of the exerted forces is critical for dynamic human motion analysis. However, their annotation is usually done manually from captured motion data, especially in the case of multiple contacts, even if the data include force measurements. Moreover, most existing human motion datasets do not include contact forces. To overcome these bottlenecks, we introduce a network that leverages a vector-quantized variational autoencoder (VQ-VAE) and self-attention to learn a small set of discrete feature values representing various contact states. These feature values, called contact codes, allow human motions to be converted to contact states and the resulting forces. By applying an optimization for contact estimation with a reduced set of manual annotations, the existence of contacts can be determined automatically, which is essential information for dynamic analysis. We validated the effectiveness and potential usefulness of the proposed method on a human walking gait dataset by converting the human motions into contact sequences and forces and applying the estimated contacts to dynamic motion analysis.
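The "contact codes" described above are discrete codebook indices produced by vector quantization. The sketch below only illustrates the generic VQ step of snapping continuous motion features to their nearest codebook entries; the feature dimension, codebook size, and frame count are assumptions, and the paper's full VQ-VAE with self-attention is not reproduced.

```python
# Minimal sketch of vector quantization behind "contact codes".
import torch

def quantize(features, codebook):
    """features: (T, D) continuous motion features; codebook: (K, D) entries."""
    dists = torch.cdist(features, codebook)      # (T, K) pairwise distances
    codes = dists.argmin(dim=1)                  # discrete contact code per frame
    quantized = codebook[codes]                  # nearest codebook vectors
    return codes, quantized

codebook = torch.randn(16, 64)                   # K = 16 hypothetical contact codes
features = torch.randn(100, 64)                  # 100 frames of motion features
codes, quantized = quantize(features, codebook)
print(codes[:10])
```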
|
|
09:00-10:00, Paper FrPI6T3.9 | |
Dual-Branch Graph Transformer Network for 3D Human Mesh Reconstruction from Video |
|
Tang, Tao | Peking University |
Liu, Hong | Peking University |
You, Yingxuan | Peking University |
Wang, Ti | Peking University |
Li, Wenhao | Peking University |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Modeling and Simulating Humans, Human Detection and Tracking
Abstract: Human mesh reconstruction (HMR) from monocular video plays an important role in human-robot interaction and collaboration. However, existing video-based human mesh reconstruction methods face a tradeoff between accurate reconstruction and smooth motion. These methods design networks based on either RNNs or attention mechanisms to extract local temporal correlations or global temporal dependencies, but the lack of complementary long-term information and local details limits their representation of the human body. To address this problem, we propose a Dual-branch Graph Transformer network for 3D human mesh Reconstruction from video, named DGTR. DGTR employs a dual-branch network comprising a Global Motion Attention (GMA) branch and a Local Details Refine (LDR) branch to extract long-term dependencies and locally crucial information in parallel, helping model global human motion and local human details (e.g., local motion, tiny movements). Specifically, GMA utilizes a global transformer to model long-term human motion. LDR combines modulated graph convolutional networks and the transformer framework to aggregate local information in adjacent frames and extract crucial information about human details. Experiments demonstrate that our DGTR outperforms state-of-the-art video-based methods in reconstruction accuracy and maintains competitive motion smoothness. Moreover, DGTR uses fewer parameters and FLOPs, which validates the effectiveness and efficiency of the proposed DGTR. Code is publicly available at https://github.com/TangTao-PKU/DGTR.
|
|
09:00-10:00, Paper FrPI6T3.10 | |
Predicting Long-Term Human Behaviors in Discrete Representations Via Physics-Guided Diffusion |
|
Zhang, Zhitian | Simon Fraser University |
Li, Anjian | Princeton University |
Lim, Angelica | Simon Fraser University |
Chen, Mo | Simon Fraser University |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Intention Recognition, Modeling and Simulating Humans
Abstract: Long-term human trajectory prediction is a challenging yet critical task in robotics and autonomous systems. Prior work that studied how to predict accurate short-term human trajectories with only unimodal features often failed in long-term prediction. Reinforcement learning provides a good solution for learning human long-term behaviors but can suffer from challenges in data efficiency and optimization. In this work, we propose a long-term human trajectory forecasting framework that leverages a guided diffusion model to generate diverse long-term human behaviors in a high-level latent action space, obtained via a hierarchical action quantization scheme using a VQ-VAE to discretize continuous trajectories and the available context. The latent actions are predicted by our guided diffusion model, which uses physics-inspired guidance at test time to constrain generated multimodal action distributions. Specifically, we use reachability analysis during the reverse denoising process to guide the diffusion steps toward physically feasible latent actions. We evaluate our framework on two publicly available human trajectory forecasting datasets: SFU-Store-Nav and JRDB, and extensive experimental results show that our framework achieves superior performance in long-term human trajectory forecasting.
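The abstract describes guiding the reverse diffusion over discrete latent actions with a reachability check. The snippet below is a heavily simplified, hypothetical illustration of that guidance pattern: infeasible latent actions are masked out before sampling. The action-to-displacement decoding, the kinematic bound, and all names are assumptions, and no actual diffusion model is included.

```python
# Minimal sketch of test-time physics guidance: mask latent actions whose
# displacement exceeds what is reachable within the time budget, then sample.
import numpy as np

def guided_sample(action_probs, action_displacements, dt, v_max, rng):
    reachable = np.linalg.norm(action_displacements, axis=1) <= v_max * dt
    probs = np.where(reachable, action_probs, 0.0)
    if probs.sum() == 0.0:                 # fall back if nothing is feasible
        probs = action_probs
    probs = probs / probs.sum()
    return rng.choice(len(probs), p=probs)

rng = np.random.default_rng(0)
probs = np.array([0.1, 0.4, 0.3, 0.2])                              # from the denoiser
disp = np.array([[0.5, 0.0], [3.0, 0.0], [0.2, 0.2], [0.0, 1.0]])   # metres per step
print(guided_sample(probs, disp, dt=1.0, v_max=1.5, rng=rng))
```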
|
|
09:00-10:00, Paper FrPI6T3.11 | |
Multi-View 2D to 3D Lifting Video-Based Optimization: A Robust Approach for Human Pose Estimation with Occluded Joint Prediction |
|
Rato, Daniela | University of Aveiro, Institute of Electronics and Informatics E |
Oliveira, Miguel | University of Aveiro |
Santos, Vitor | University of Aveiro |
Sappa, Angel | Computer Vision Center |
Raducanu, Bogdan | Computer Vision Center |
Keywords: Human Detection and Tracking, Human-Robot Collaboration, Computer Vision for Manufacturing
Abstract: In the context of robotics, accurate 3D human pose estimation is essential for enhancing human-robot collaboration and interaction. This manuscript introduces a multi-view 2D-to-3D lifting optimization-based method designed for video-based 3D human pose estimation, incorporating temporal information. Our technique addresses key challenges, namely robustness to 2D joint detection error, occlusions, and varying camera perspectives. We evaluate the performance of the algorithm through extensive experiments on the MPI-INF-3DHP dataset. Our method remains robust up to 25 pixels of 2D joint error and shows resilience in scenarios involving several occluded joints. Comparative analyses against existing 2D-to-3D lifting and multi-view methods showcase the strong performance of our approach.
|
|
09:00-10:00, Paper FrPI6T3.12 | |
Can Reasons Help Improve Pedestrian Intent Estimation? A Cross-Modal Approach |
|
Khindkar, Vaishnavi | IIIT Hyderabad |
Balasubramanian, Vineeth | Indian Institute of Technology, Hyderabad |
Arora, Chetan | Indian Institute of Technology, Delhi |
Subramanian, Anbumani | Intel / IIIT-Hyderabad |
Jawahar, C.V. | IIIT, Hyderabad |
Keywords: Intention Recognition
Abstract: With the increased importance of autonomous navigation systems has come an increasing need to protect the safety of Vulnerable Road Users (VRUs) such as pedestrians. Predicting pedestrian intent is one such challenging task, where prior work predicts the binary cross/no-cross intention from a fusion of visual and motion features. However, there has been no effort so far to support such predictions with human-understandable reasons. We address this issue by introducing a novel problem setting that explores the intuitive reasoning behind a pedestrian's intent. In particular, we show that predicting the 'WHY' can be very useful in understanding the 'WHAT'. To this end, we propose a novel, reason-enriched PIE++ dataset consisting of multi-label textual explanations/reasons for pedestrian intent. (Explanations, in our context, refer to the interpretation of pedestrian intent and not model interpretability.) We also introduce a novel multi-task learning framework called MINDREAD, which leverages a cross-modal representation learning framework to predict pedestrian intent as well as the reason behind the intent. Our comprehensive experiments show significant improvements of 5.6% and 7% in accuracy and F1-score for the task of intent prediction on the PIE++ dataset using MINDREAD. We also achieved a 4.4% improvement in accuracy on the commonly used JAAD dataset. Extensive evaluation using quantitative/qualitative metrics and user studies shows the effectiveness of our approach.
|
|
09:00-10:00, Paper FrPI6T3.13 | |
Enhanced Robotic Assistance for Human Activities through Human-Object Interaction Segment Prediction |
|
Wu, Yuankai | TUM |
Messaoud, Rayene | TUM |
Hildebrandt, Arne-Christoph | Technische Universität München |
Baldini, Marco | ABB AG |
Salihu, Driton | Technical University Munich |
Patsch, Constantin | Technical University of Munich |
Steinbach, Eckehard | Technical University of Munich |
Keywords: Human-Centered Automation, Intention Recognition, Humanoid Robot Systems
Abstract: Robotic assistance is a current research topic with high application value and multiple challenges. Assistive robots are used in various scenarios, such as production lines, operating tables, and elderly care. While providing effective assistance, most assistance tasks that current robots can perform are limited to predefined tasks. This limitation arises because current robot perception systems cannot forecast future human activities. To address this issue, we propose a novel 2-stage robotic assistant for human activities through future human-object interaction (HOI) segment prediction. Unlike previous work focusing on predefined or short-term tasks, our robotic assistant can make predictions for future assistance according to human habits. In the first stage, we propose a vision-based human-object interaction segment prediction method to predict human activities, which enables the robotic system to infer human intention. Moreover, we define the robotic executable tasks as an interactive tuple to keep the robotic assistance normatively consistent with human activity. Meanwhile, a graph convolutional network with geometric features is proposed to predict human-object interaction segments and provide the target manipulation and target object for the assistive robot. In the second stage, we present a mobile task completion process including visual navigation, object localization, and grasping. The perception stage is evaluated on the MPHOI dataset and the custom-collected SPHOI dataset. Finally, we evaluate our comprehensive framework through real-time experimentation.
|
|
09:00-10:00, Paper FrPI6T3.14 | |
Aligning Learning with Communication in Shared Autonomy |
|
Hoegerman, Joshua | Virginia Polytechnic Institute and State University |
Sagheb, Shahabedin | Virginia Tech |
Christie, Benjamin | Virginia Tech |
Losey, Dylan | Virginia Tech |
Keywords: Human-Robot Collaboration, Telerobotics and Teleoperation, Intention Recognition
Abstract: Assistive robot arms can help humans by partially automating their desired tasks. Consider an adult with motor impairments controlling an assistive robot arm to eat dinner. The robot can reduce the number of human inputs — and how precise those inputs need to be — by recognizing what the human wants (e.g., a fork) and assisting for that task (e.g., moving towards the fork). Prior research has largely focused on learning the human’s task and providing meaningful assistance. But as the robot learns and assists, we also need to ensure that the human understands the robot’s intent (e.g., does the human know the robot is reaching for a fork?). In this paper, we study the effects of communicating learned assistance from the robot back to the human operator. We do not focus on the specific interfaces used for communication. Instead, we develop experimental and theoretical models of a) how communication changes the way humans interact with assistive robot arms, and b) how robots can harness these changes to better align with the human’s intent. We first conduct online and in-person user studies where participants operate robots that provide partial assistance, and we measure how the human’s inputs change with and without communication. With communication, we find that humans are more likely to intervene when the robot incorrectly predicts their intent, and more likely to release control when the robot correctly understands their task. We then use these findings to modify an established robot learning algorithm so that the robot can correctly interpret the human’s inputs when communication is present. Our results from a second in-person user study suggest that this combination of communication and learning outperforms assistive systems that isolate either learning or communication.
|
|
09:00-10:00, Paper FrPI6T3.15 | |
Learned Sensor Fusion for Robust Human Activity Recognition in Challenging Environments |
|
Conway, Max | University of Denver |
Reily, Brian | Army Research Laboratory |
Reardon, Christopher M. | University of Denver |
Keywords: Multi-Modal Perception for HRI, RGB-D Perception, Data Sets for Robotic Vision
Abstract: Human activity recognition is a vital area of robotics with significant real-world applications, from enhancing security and surveillance to improving healthcare and human-robot interaction. A critical challenge lies in bridging the gap between research models, which often assume ideal conditions, and the complexities of real-world environments. In practice, conditions can be far from perfect, including scenarios with poor lighting, adverse weather, or blurred views. In this paper, we present an innovative approach for robust activity recognition through learned sensor fusion, in which our recognition framework identifies a latent weighted combination of input modalities, enabling classifiers to capitalize on the advantages provided by various sensors. In support of our work, we have released a dataset of human activities across multiple modalities with environmental degradation factors such as darkness, fog, and thermal blur. Our proposed approach identifies a weighted combination of modality representations derived from existing architectures. We show that our approach achieves 24% higher classification performance than existing single-modality approaches. Our approach also attains performance comparable to modality-fusion approaches with significantly reduced classification time. In real-world robotics applications, particularly those occurring in dangerous, degraded environments, this speed is critical.
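The core mechanism described above is a learned weighted combination of modality representations. The sketch below is a minimal, hedged illustration of that pattern only; the embedding size, modality count, class count, and the softmax weighting are assumptions rather than the paper's architecture.

```python
# Minimal sketch of latent weighted modality fusion: per-modality embeddings
# are combined with learned, softmax-normalized weights before classification.
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    def __init__(self, n_modalities=3, embed_dim=256, n_classes=10):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_modalities))  # learned fusion weights
        self.classifier = nn.Linear(embed_dim, n_classes)

    def forward(self, embeddings):
        # embeddings: (batch, n_modalities, embed_dim) from existing backbones
        w = torch.softmax(self.logits, dim=0)
        fused = (w[None, :, None] * embeddings).sum(dim=1)
        return self.classifier(fused)

model = WeightedFusion()
out = model(torch.randn(4, 3, 256))  # e.g. RGB, depth, thermal embeddings
print(out.shape)                     # torch.Size([4, 10])
```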
|
|
09:00-10:00, Paper FrPI6T3.16 | |
Human Orientation Estimation under Partial Observation |
|
Zhao, Jieting | Southern University of Science and Technology |
Ye, Hanjing | Southern University of Science and Technology |
Zhan, Yu | Southern University of Science and Technology |
Luan, Hao | National University of Singapore |
Zhang, Hong | SUSTech |
Keywords: Robot Companions, Human-Centered Automation, Human Detection and Tracking
Abstract: Reliable Human Orientation Estimation (HOE) from a monocular image is critical for autonomous agents to understand human intention. Significant progress has been made in HOE under full observation. However, existing methods easily make wrong predictions under partial observation and assign them unexpectedly high confidence. To solve these problems, this study first develops a method called Part-HOE that estimates orientation from the visible joints of a target person, enabling it to handle partial observation. We then introduce a confidence-aware orientation estimation method, enabling more accurate orientation estimation and reasonable confidence estimation under partial observation. The effectiveness of our method is validated on both public and custom-built datasets, and it shows substantial accuracy and reliability improvements in partial observation scenarios. In particular, we show in real experiments that our method can benefit the robustness and consistency of the Robot Person Following (RPF) task.
|
|
FrPI6T4 |
Room 4 |
Robot Vision IV |
Teaser Session |
Chair: Tanaka, Kanji | University of Fukui |
Co-Chair: Patel, Amir | University of Cape Town |
|
09:00-10:00, Paper FrPI6T4.1 | |
An Ultrafast Multi-Object Zooming System Based on Low-Latency Stereo Correspondence |
|
Li, Qing | Tsinghua University |
Hu, Shaopeng | Hiroshima University |
Shimasaki, Kohei | Hiroshima University |
Ishii, Idaku | Hiroshima University |
Keywords: Surveillance Robotic Systems, Visual Tracking
Abstract: In this paper, we develop a multiple-object zooming system which can capture clear images of different objects at ultrafast speed. The system consists of a panoramic HFR stereo camera and a galvanometer-based reflective pan-tilt-zoom (PTZ) camera. To alleviate the impact of brightness, noise, and viewing angle in the image, we use the motion information of the objects for stereo correspondence. From the spatial positions of all objects obtained by HFR stereo correspondence, we obtain the control voltages of the pan and tilt mirrors of the galvanometer-based reflective PTZ camera through a mapping relationship. The PTZ camera then captures clear images of multiple objects in a time-division-multiplexed manner at extremely high speed. Experimental results show that we can distinguish multiple fast-moving people indoors in the HFR stereo camera and capture their high-definition facial images simultaneously.
|
|
09:00-10:00, Paper FrPI6T4.2 | |
Enhanced Model Robustness to Input Corruptions by Per-Corruption Adaptation of Normalization Statistics |
|
Camuffo, Elena | University of Padova |
Michieli, Umberto | Samsung Research |
Milani, Simone | University of Padova |
Moon, Jijoong | Samsung Research Korea |
Ozay, Mete | Samsung Research |
Keywords: Computer Vision for Automation, Vision-Based Navigation, Semantic Scene Understanding
Abstract: Developing a reliable vision system is a fundamental challenge for robotic technologies (e.g., indoor service robots and outdoor autonomous robots) which can ensure reliable navigation even in challenging environments such as adverse weather conditions (e.g., fog, rain), poor lighting conditions (e.g., over/under exposure), or sensor degradation (e.g., blurring, noise), and can guarantee high performance in safety-critical functions. Current solutions proposed to improve model robustness usually rely on generic data augmentation techniques or employ costly test-time adaptation methods. In addition, most approaches focus on addressing a single vision task (typically, image recognition) utilising synthetic data. In this paper, we introduce Per-corruption Adaptation of Normalization statistics (PAN) to enhance the model robustness of vision systems. Our approach entails three key components: (i) a corruption type identification module, (ii) dynamic adjustment of normalization layer statistics based on identified corruption type, and (iii) real-time update of these statistics according to input data. PAN can integrate seamlessly with any convolutional model for enhanced accuracy in several robot vision tasks. In our experiments, PAN obtains robust performance improvement on challenging real-world corrupted image datasets (e.g., OpenLoris, ExDark, ACDC), where most of the current solutions tend to fail. Moreover, PAN outperforms the baseline models by 20-30% on synthetic benchmarks in object recognition tasks.
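PAN's key step is swapping and then updating normalization-layer statistics according to the identified corruption type. The sketch below is a generic, hedged illustration of that mechanism on standard BatchNorm layers; the statistics store, the momentum value, and the toy model are placeholders, and the paper's corruption-identification module is not shown.

```python
# Minimal sketch of per-corruption normalization: write corruption-specific
# (mean, var) statistics into BatchNorm layers, then let running updates track
# the incoming frames in real time.
import torch
import torch.nn as nn

def apply_corruption_stats(model, stats, momentum=0.1):
    """stats: list of (mean, var) tensors, one pair per BatchNorm2d layer."""
    bns = [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
    for bn, (mean, var) in zip(bns, stats):
        bn.running_mean.copy_(mean)
        bn.running_var.copy_(var)
        bn.momentum = momentum      # keep adapting to incoming frames
        bn.train()                  # forward passes refresh the running statistics

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
fog_stats = [(torch.zeros(8), torch.ones(8))]   # pretend: collected on foggy data
apply_corruption_stats(model, fog_stats)
_ = model(torch.randn(2, 3, 32, 32))
```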
|
|
09:00-10:00, Paper FrPI6T4.3 | |
A Low-Cost, High-Speed, and Robust Bin Picking System for Factory Automation Enabled by a Non-Stop, Multi-View, and Active Vision Scheme |
|
Fu, Xingdou | OMRON Corporation |
Miao, Lin | Omron Corporation |
Ohnishi, Yasuhiro | OMRON Corporation |
Hasegawa, Yuki | OMRON Corporation |
Suwa, Masaki | OMRON Corporation |
Keywords: Computer Vision for Automation, Perception for Grasping and Manipulation, Computer Vision for Manufacturing
Abstract: Bin picking systems in factory automation usually face robustness issues caused by sparse and noisy 3D data of metallic objects. Utilizing multiple views, especially with a one-shot 3D sensor in a "sensor on hand" configuration, is gaining popularity due to its effectiveness, flexibility, and low cost. However, moving the 3D sensor to acquire multiple views for 3D fusion, joint optimization, or active vision suffers from low speed, because sensing is treated as a module decoupled from motion tasks and is not intentionally designed for a bin picking system. To address these problems, we designed a bin picking system that tightly couples a multi-view, active vision scheme with motion tasks in a "sensor on hand" configuration. It not only speeds up the system by parallelizing the high-speed sensing scheme with the robot place action but also decides the next sensing path to maintain the continuity of the whole picking process. Unlike others focusing only on sensing evaluation, we also evaluated our design through picking experiments on 5 different types of objects without human intervention. Our experiments show that the whole sensing scheme can be finished within 1.682 seconds (maximum) on a CPU and the average picking completion rate is over 97.75%. Due to the parallelization with robot motion, the sensing scheme accounts for only 0.635 seconds of takt time on average.
|
|
09:00-10:00, Paper FrPI6T4.4 | |
Every Dataset Counts: Scaling up Monocular 3D Object Detection with Joint Datasets Training |
|
Ma, Fulong | The Hong Kong University of Science and Technology |
Yan, Xiaoyang | The Hong Kong University of Science and Technology |
Zhao, Guoyang | HKUST(GZ) |
Xu, Xiaojie | The Hong Kong University of Science and Technology (Guangzhou) |
Liu, Yuxuan | Hong Kong University of Science and Technology |
Ma, Jun | The Hong Kong University of Science and Technology |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Computer Vision for Automation, Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: Monocular 3D object detection plays a crucial role in autonomous driving. However, existing monocular 3D detection algorithms depend on 3D labels derived from LiDAR measurements, which are costly to acquire for new datasets and challenging to deploy in novel environments. To address this, this study investigates a pipeline for training a monocular 3D object detection model on a diverse collection of 3D and 2D datasets. The proposed framework comprises three components: (1) a robust monocular 3D model capable of functioning across various camera settings, (2) a selective-training strategy to accommodate datasets with differing class annotations, and (3) a pseudo 3D training approach using 2D labels to enhance detection performance in scenes containing only 2D labels. With this framework, we can train models on a joint set of various open 3D/2D datasets to obtain models with significantly stronger generalization capability and enhanced performance on new datasets with only 2D labels. We conduct extensive experiments on the KITTI, nuScenes, ONCE, Cityscapes, and BDD100K datasets to demonstrate the scaling ability of the proposed method.
|
|
09:00-10:00, Paper FrPI6T4.5 | |
Direct TPS-Based 3D Non-Rigid Motion Estimation on 3D Colored Point Cloud in Eye-In-Hand Configuration |
|
Cuau, Lénaïc | LIRMM |
Cavalcanti Santos, Joao | University of Montpellier, LIRMM |
Poignet, Philippe | LIRMM University of Montpellier CNRS |
Zemiti, Nabil | LIRMM, Université Montpellier II - CNRS UMR 5506 |
Keywords: Computer Vision for Automation, Visual Tracking, RGB-D Perception
Abstract: In this paper, a method for 3D non-rigid motion estimation of a surface using an RGB-D camera in an eye-in-hand configuration is presented. The eye-in-hand configuration eliminates errors typically associated with camera-end-effector calibration and is thus desirable for tasks on moving surfaces such as bioprinting. However, its implementation is challenging since both the camera and the surface of interest are moving, making mesh-based approaches unsuitable. Thus, the proposed method operates directly on point clouds, benefiting from accurate and simplified data processing. A point cloud contains both intensity and depth data, with the former used to estimate in-plane deformation and the latter to compute the full 3D deformation. Surface deformation is modeled via a Thin Plate Spline (TPS) model. The method achieves 0.1 mm accuracy on simulated datasets, rendering it suitable for precision tasks, and its feasibility is validated experimentally on a moving platform that deforms at a rate of 0.8 Hz with a 4 mm in-plane amplitude and a 20 mm elevation amplitude.
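To make the TPS modeling step concrete, the sketch below interpolates synthetic control-point displacements into a dense 3D deformation field with a thin-plate-spline interpolator. It is only an illustration of the spline idea under assumed data; it does not reproduce the paper's intensity/depth estimation pipeline.

```python
# Minimal sketch: dense 3D deformation from sparse displacements via TPS.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
control_xy = rng.uniform(-0.05, 0.05, size=(25, 2))            # metres, on the surface
displacement = np.column_stack([
    0.004 * np.sin(40 * control_xy[:, 0]),                     # in-plane x
    0.004 * np.cos(40 * control_xy[:, 1]),                     # in-plane y
    0.02 * np.exp(-np.sum(control_xy ** 2, axis=1) / 0.001),   # elevation
])

tps = RBFInterpolator(control_xy, displacement, kernel='thin_plate_spline')

query = rng.uniform(-0.05, 0.05, size=(1000, 2))               # dense point-cloud locations
dense_deformation = tps(query)                                 # (1000, 3) displacement field
print(dense_deformation.shape)
```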
|
|
09:00-10:00, Paper FrPI6T4.6 | |
OW3Det: Toward Open-World 3D Object Detection for Autonomous Driving |
|
Hu, Wenfei | Peking University |
Lin, Weikai | Peking University |
Fang, Hongyu | Peking University, Beijing, China |
Wang, Yi | Tsinghua University |
Luo, Dingsheng | Peking University |
Keywords: Computer Vision for Automation, Computer Vision for Transportation, Object Detection, Segmentation and Categorization
Abstract: Despite their success in LIDAR object detection, modern detectors are vulnerable to uncommon instances and corner cases (e.g., a runaway tire) since they are closed-set and static. Networks under the closed-set setup only predict labels of seen classes, while static models suffer from catastrophic forgetting when gradually learning novel concepts. This motivates us to formulate the open-world 3D object detection task for autonomous driving, which aims to 1) tackle the closed-set issue by identifying unseen instances as unknown and 2) incrementally learn novel classes without forgetting previously obtained knowledge. To achieve the open-world objectives, we propose Open-World 3D Detector (OW3Det), the first framework for open-world 3D object detection. The OW3Det comprises a base detector, a self-supervised unknown identifier, and a knowledge-distillation-restricted incremental learner. Although knowledge distillation facilitates preserving memories, imposing penalties on areas containing unknown objects hinders the incremental learning process. We mitigate this hindrance by employing unknown-driven pivotal mask, which eliminates unnecessary restrictions on regions overlapping with novel instances. Abundant experiments and visualizations demonstrate that the proposed OW3Det attains state-of-the-art performance.
|
|
09:00-10:00, Paper FrPI6T4.7 | |
FoveaCam++: Systems-Level Advances for Long Range Multi-Object High-Resolution Tracking |
|
Zhang, Yuxuan | University of Florida |
Koppal, Sanjeev | University of Florida |
Keywords: Computer Vision for Automation, Hardware-Software Integration in Robotics
Abstract: UAVs and other fast-moving robots often need to keep track of distant objects. Conventional zoom cameras commit to a particular viewpoint, and carrying multiple zoom cameras for multi-object tracking is not feasible for power-limited robotic systems. We present a dual-camera setup that allows tracking of multiple targets at nearly 1 km distance with high resolution. Our setup includes a wide-angle camera providing a conventional-resolution view and a MEMS-driven zoom camera that can query a specific region within the wide-angle camera (WAC). We built and calibrated the two-camera system and implemented a real-time image fusion pipeline. We show multi-object tracking and stabilization in real-world scenarios.
|
|
09:00-10:00, Paper FrPI6T4.8 | |
Robot Traversability Prediction: Towards Third-Person-View Extension of Walk2Map with Photometric and Physical Constraints |
|
Tay Yu Liang, Jonathan | University of Fukui |
Tanaka, Kanji | University of Fukui |
Keywords: Computer Vision for Automation, Mapping, SLAM
Abstract: Walk2Map has emerged as a promising data-driven method to generate indoor traversability maps based solely on pedestrian trajectories, offering great potential for indoor robot navigation. In this study, we investigate a novel approach, referred to as Walk2Map++, which involves replacing Walk2Map’s first-person sensor (i.e., IMU) with a human observing third-person view from the robot’s onboard camera. However, human observation from a third-person camera is significantly ill-posed due to visual uncertainties resulting from occlusion, nonlinear perspective, depth ambiguity, and human-to-human interaction. To regularize the ill-posedness, we propose to integrate two types of constraints: photometric (i.e., occlusion ordering) and physical (i.e., collision avoidance). We demonstrate that these constraints can be effectively inferred from the interaction between past and present observations, human trackers, and object reconstructions. We depict the seamless integration of asynchronous map optimization events, like loop closure, into the real-time traversability map, facilitating incremental and efficient map refinement. We validate the efficacy of our enhanced methodology through rigorous fusion and comparison with established techniques, demonstrating its capability to advance traversability prediction in complex indoor environments. The code and datasets associated with this study will be made publicly available upon acceptance, facilitating further research and adoption in the field.
|
|
09:00-10:00, Paper FrPI6T4.9 | |
Click to Grasp: Zero-Shot Precise Manipulation Via Visual Diffusion Descriptors |
|
Tsagkas, Nikolaos | University of Edinburgh |
Rome, Jack A | University of Edinburgh |
Ramamoorthy, Subramanian | The University of Edinburgh |
Mac Aodha, Oisin | University of Edinburgh |
Lu, Chris Xiaoxuan | University College London |
Keywords: Computer Vision for Automation, Perception for Grasping and Manipulation
Abstract: Precise manipulation that is generalizable across scenes and objects remains a persistent challenge in robotics. Current approaches for this task heavily depend on having a significant number of training instances to handle objects with pronounced visual and/or geometric part ambiguities. Our work explores the grounding of fine-grained part descriptors for precise manipulation in a zero-shot setting by utilizing web-trained text-to-image diffusion-based generative models. We tackle the problem by framing it as a dense semantic part correspondence task. Our model returns a gripper pose for manipulating a specific part, using as reference a user-defined click from a source image of a visually different instance of the same object. We require no manual grasping demonstrations as we leverage the intrinsic object geometry and features. Practical experiments in a real-world tabletop scenario validate the efficacy of our approach, demonstrating its potential for advancing semantic-aware robotics manipulation.
|
|
09:00-10:00, Paper FrPI6T4.10 | |
Refining Airway Segmentation through Breakage Filling and Leakage Reduction Using Point Clouds |
|
Hu, Yan | University of New South Wales |
Meijering, Erik | University of New South Wales |
Song, Yang | University of New South Wales |
Keywords: Computer Vision for Automation, Computer Vision for Medical Robotics, Object Detection, Segmentation and Categorization
Abstract: Bronchoscopy reveals air passages and internal tissues for accurate diagnosis of various lung diseases. Robot-assisted bronchoscopy using an airway tree model can help path planning before surgery and navigation during surgery. In airway tree modeling, although volumetric deep learning methods have achieved good performance for airway segmentation, it remains challenging due to breakages and leakages. Some existing methods adopt post-processing using traditional methods like morphological and fuzzy connected algorithms. Others convert the volumetric data to point cloud format to refine segmentation. In this paper, we develop a new point cloud-based approach to refine volumetric segmentation. To address the breakage issue, we treat it as a regression problem over the branch extension direction and length. To tackle the leakage issue, we treat it as a segmentation task that eliminates leakages caused by breakage filling and by volumetric segmentation. Moreover, the direction information of branches is crucial for constructing the airway tree, yet point clouds do not naturally encode it. To introduce this information, we propose a directional feature aggregation, which first decomposes features of neighboring points based on their locations and then aggregates the decomposed features to help the network capture directional information effectively. Our proposed model has been evaluated on two public datasets, and the results show that our refinement improves the volumetric segmentation.
|
|
09:00-10:00, Paper FrPI6T4.11 | |
Differentiable Fluid Physics Parameter Identification by Stirring and for Stirring |
|
Xu, Wenqiang | Shanghai Jiaotong University |
Zheng, Dongzhe | Shanghai Jiao Tong University |
Li, Yutong | Shanghai Jiao Tong University |
Ren, Jieji | Shanghai Jiao Tong University |
Lu, Cewu | Shanghai Jiao Tong University |
Keywords: Computer Vision for Automation, Simulation and Animation, Calibration and Identification
Abstract: Fluid interactions are crucial in daily tasks, with properties like density and viscosity being key parameters. These property estimates can be used as control signals for robot operation. While density estimation is simple, assessing viscosity, especially across different fluid types, is complex. This study introduces a novel differentiable fitting framework, DiffStir, tailored to identify key physics parameters through stirring. Given the estimated physics parameters, we can then generate commands to guide the robotic stirring. Comprehensive experiments were conducted to validate the efficacy of DiffStir, showcasing its precision in parameter estimation when benchmarked against values reported in the literature. More experiments and videos can be found in the supplementary materials and on the website: https://diffstir.robotflow.ai.
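As a rough illustration of differentiable parameter identification by stirring, the toy below fits a scalar viscosity by gradient descent so a simple drag-torque model matches "measured" stirring torques. The linear torque model and all values are assumptions and are far simpler than the differentiable fluid simulator the paper describes.

```python
# Minimal sketch: gradient-based fitting of a viscosity-like parameter.
import torch

omega = torch.linspace(1.0, 5.0, 20)                 # stirring speeds (rad/s)
true_mu = 0.8
measured = true_mu * omega + 0.01 * torch.randn(20)  # synthetic "observed" torques

mu = torch.tensor(0.1, requires_grad=True)           # initial viscosity guess
opt = torch.optim.Adam([mu], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = ((mu * omega - measured) ** 2).mean()     # simulate, compare, backprop
    loss.backward()
    opt.step()
print(float(mu))   # converges near 0.8
```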
|
|
09:00-10:00, Paper FrPI6T4.12 | |
Enhancing 3D Single Object Tracking with Efficient Point Cloud Segmentation |
|
Yang, Yu Shi | Nanjing University of Posts and Telecommunications |
Fan, Baojie | Nanjing University of Posts and Telecommunications |
Jiang, Yuyu | Nanjing University of Posts and Telecommunications |
Zhou, Wuyang | Nanjing University of Posts and Telecommunications |
Chen, Dong | Nanjing University of Posts and Telecommunications |
Xu, Hongxin | Delft University of Technology |
Keywords: Computer Vision for Automation, Deep Learning for Visual Perception, Autonomous Vehicle Navigation
Abstract: 3D single object tracking (SOT) based on point clouds has attracted much attention due to its important role in machine vision and autonomous driving. Recently, M^2-Track proposed a two-stage, motion-centered tracking structure, but it ignores the effect of segmentation errors in sparse point cloud scenarios, which hinders the network's ability to accurately represent tracking targets. To solve these problems, we propose an efficient 3D single object tracker (abbr. EST) that can effectively segment point cloud features. First, the proposed fusion segmentation module compensates for the feature loss caused by the downsampling strategy and enhances the network's ability to recognize foreground points. In addition, a global embedded module is used to further focus on the crucial features of the target. This module provides global information by using residual networks and adding background information. Numerous experiments conducted on the KITTI and nuScenes benchmarks show that EST achieves superior point cloud tracking in both performance and efficiency.
|
|
09:00-10:00, Paper FrPI6T4.13 | |
SwinMTL: A Shared Architecture for Simultaneous Depth Estimation and Semantic Segmentation from Monocular Camera Images |
|
Taghavi, Pardis | Texas A&M |
Pandey, Gaurav | Texas A&M |
Langari, Reza | Texas A&M University |
Keywords: Computer Vision for Automation, Deep Learning for Visual Perception, RGB-D Perception
Abstract: This research paper presents an innovative multi-task learning framework that allows concurrent depth estimation and semantic segmentation using a single camera. The proposed approach is based on a shared encoder-decoder architecture, which integrates various techniques to improve the accuracy of the depth estimation and semantic segmentation tasks without compromising computational efficiency. Additionally, the paper incorporates an adversarial training component, employing a Wasserstein GAN framework with a critic network, to refine the model's predictions. The framework is thoroughly evaluated on two datasets, the outdoor Cityscapes dataset and the indoor NYU Depth V2 dataset, and it outperforms existing state-of-the-art methods in both segmentation and depth estimation tasks. We also conducted ablation studies to analyze the contributions of different components, including pre-training strategies, the inclusion of the critic, the use of logarithmic depth scaling, and advanced image augmentations, to provide a better understanding of the proposed framework.
|
|
09:00-10:00, Paper FrPI6T4.14 | |
Monocular 3D Reconstruction of Cheetahs in the Wild |
|
da Silva, Zico | University of Cape Town |
Parkar, Zuhayr | University of Cape Town |
Muramatsu, Naoya | University of Cape Town |
Nicolls, Fred | University of Cape Town |
Patel, Amir | University of Cape Town |
Keywords: Computer Vision for Automation, Sensor Fusion, Robotics and Automation in Life Sciences
Abstract: This paper introduces a framework for monocular 3D reconstruction of cheetah movements, leveraging a combination of data-driven and physics-based modeling as well as trajectory optimization. Unlike traditional methods that rely solely on kinematics, our approach integrates dynamic motion principles, enhancing the plausibility and generalization of motion estimates. Validated on the cheetah running dataset, AcinoSet, we achieve mean per-joint position errors of 78.8 mm and 72.5 mm, showcasing significant advancements over the existing model used in AcinoSet. By addressing the challenge of absent ground truth data, this work not only advances animal motion capture techniques but also informs the development of bio-inspired robotic systems, offering a robust solution for accurately capturing complex animal locomotion in natural settings.
|
|
09:00-10:00, Paper FrPI6T4.15 | |
Scalable Network and Adaptive Refinement Module for 6D Pose Estimation of Diverse Industrial Components |
|
Qian, Kun | University of Manchester |
Erden, Mustafa Suphi | Heriot-Watt University |
Kong, Xianwen | Heriot-Watt University |
Keywords: Computer Vision for Automation, Deep Learning for Visual Perception, Industrial Robots
Abstract: The estimation of the 6D pose of industrial components is essential for smart manufacturing. Especially for complex units that require intensive manual operations, such as a concentrator photovoltaics solar panel, accurate spatial localization provides visual aids for industrial automation. In this paper, we propose an accurate and scalable framework to address the dimensional variability of industrial components and tackle practical implementation issues. First, we use the scalable architecture EfficientNet as the backbone, coupled with an enhanced feature pyramid network, to estimate the object's pose. By introducing vertical and horizontal connections of shallow layers, the feature extraction of small objects is optimized for better detection accuracy. Second, leveraging the reliable 2D detection results and geometry information, an adaptive pose refinement module is designed to adjust the estimated 6D pose. The scaling of the backbone network and the computational complexity of the refinement module are uniformly adjusted via a shared hyperparameter, resulting in a globally scalable framework. Pose estimation accuracy, the effectiveness of the refinement module, and real-time performance are validated both on the LINEMOD dataset and on our customized datasets comprising objects from an industrial photovoltaic system. Additionally, to further illustrate the effectiveness of the proposed method, a precision parallel robot is employed to validate the accuracy of real-time object pose tracking.
|
|
09:00-10:00, Paper FrPI6T4.16 | |
AirShot: Efficient Few-Shot Detection for Autonomous Exploration |
|
Wang, Zihan | Carnegie Mellon University |
Li, Bowen | Carnegie Mellon University |
Wang, Chen | University at Buffalo |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Computer Vision for Automation, Deep Learning for Visual Perception, Data Sets for Robotic Vision
Abstract: Few-shot object detection has drawn increasing attention in the field of robotic exploration, where robots are required to find unseen objects from a few online-provided examples. Although recent efforts have been made to yield online processing capabilities, the slow inference speed of low-powered robots fails to meet the demands of real-time detection, making these methods impractical for autonomous exploration. Existing methods still face performance and efficiency challenges, mainly due to unreliable features and exhaustive class loops. In this work, we propose a new paradigm, AirShot, and discover that by fully exploiting the valuable correlation map, AirShot can yield a more robust and faster few-shot object detection system, which is more applicable to the robotics community. The core module, Top Prediction Filter (TPF), can operate on multi-scale correlation maps in both the training and inference stages. During training, TPF supervises the generation of a more representative correlation map, while during inference it reduces looping iterations by selecting top-ranked classes, thus cutting computational costs while improving performance. Surprisingly, this dual functionality exhibits general effectiveness and efficiency across various off-the-shelf models. Exhaustive experiments on the COCO2017, VOC2014, and SubT datasets demonstrate that TPF can significantly boost the efficacy and efficiency of most off-the-shelf models, achieving up to 36.4% precision improvement along with 56.3% faster inference. Code and data are at: https://github.com/ImNotPrepared/AirShot.
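The inference-time role of TPF is to rank support classes by their correlation-map response and loop only over the top candidates. The sketch below is a hedged, hypothetical rendering of that selection step; the peak-response scoring, map shapes, and k are assumptions, not the released AirShot code.

```python
# Minimal sketch of the top-ranked class selection idea on correlation maps.
import torch

def top_prediction_filter(correlation_maps, k=3):
    """correlation_maps: (n_classes, H, W) query/support correlation maps."""
    scores = correlation_maps.amax(dim=(1, 2))     # peak response per class
    return torch.topk(scores, k=k).indices         # classes kept for detection

maps = torch.rand(20, 32, 32)                      # 20 candidate few-shot classes
keep = top_prediction_filter(maps, k=3)
print(keep)                                        # only these classes enter the detector loop
```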
|
|
09:00-10:00, Paper FrPI6T4.17 | |
RATE: Real-Time Asynchronous Feature Tracking with Event Cameras |
|
Ikura, Mikihiro | University of Tokyo |
Le Gentil, Cedric | University of Technology Sydney |
Müller, Marcus Gerhard | German Aerospace Center |
Schuler, Florian | German Aerospace Center |
Yamashita, Atsushi | The University of Tokyo |
Stuerzl, Wolfgang | DLR, Institute of Robotics and Mechatronics |
Keywords: Visual Tracking, Computer Vision for Automation, Vision-Based Navigation
Abstract: Vision-based self-localization is a crucial technology for enabling autonomous robot navigation in GPS-deprived environments. However, standard frame cameras are subject to motion blur and suffer from a limited dynamic range. This research focuses on efficient feature tracking for self-localization using event-based cameras. Such cameras do not provide regular snapshots of the environment but asynchronously collect events that correspond to a small change of illumination in each pixel independently, thus mitigating motion blur during fast motion and offering a high dynamic range. Specifically, we propose a continuous real-time asynchronous event-based feature tracking pipeline, named RATE. This pipeline integrates (i) a corner detector node utilizing a time slice of the Surface of Active Events to initialize trackers continuously, along with (ii) a tracker node with a proposed tracking manager, consisting of a grid-based distributor to reduce redundant trackers and to remove feature tracks of poor quality. Evaluations using public datasets reveal that our method maintains a stable number of tracked features and performs real-time tracking efficiently while maintaining or even improving tracking accuracy compared to state-of-the-art event-only tracking methods. Our ROS implementation is released as open-source: https://github.com/mikihiroikura/RATE
|
|
FrPI6T5 |
Room 5 |
Field Robotics |
Teaser Session |
|
09:00-10:00, Paper FrPI6T5.1 | |
PhysORD: A Neuro-Symbolic Approach for Physics-Infused Motion Prediction in Off-Road Driving |
|
Zhao, Zhipeng | University at Buffalo |
Li, Bowen | Carnegie Mellon University |
Du, Yi | University at Buffalo |
Fu, Taimeng | University at Buffalo |
Wang, Chen | University at Buffalo |
Keywords: Dynamics, AI-Based Methods, Field Robots
Abstract: Motion prediction is critical for autonomous off-road driving; however, it presents significantly more challenges than on-road driving because of the complex interaction between the vehicle and the terrain. Traditional physics-based approaches encounter difficulties in accurately modeling dynamic systems and external disturbances. In contrast, data-driven neural networks require extensive datasets and struggle to explicitly capture the fundamental physical laws, which can easily lead to poor generalization. By merging the advantages of both methods, neuro-symbolic approaches present a promising direction. These methods embed physical laws into neural models, potentially improving generalization capabilities significantly. However, no prior work has been evaluated in real-world settings for off-road driving. To bridge this gap, we present PhysORD, a neuro-symbolic approach integrating the conservation law, i.e., the Euler-Lagrange equation, into data-driven neural models for motion prediction in off-road driving. Our experiments show that PhysORD can accurately predict vehicle motion and tolerate external disturbance by modeling uncertainties. It outperforms existing methods in both accuracy and efficiency and demonstrates data-efficient learning and generalization ability in long-term prediction.
|
|
09:00-10:00, Paper FrPI6T5.2 | |
Kinetic-Energy-Optimal and Safety-Guaranteed Trajectory Planning for Bridge Inspection Robot Manipulator |
|
Zhang, Tianyu | University of Chinese Academy of Sciences |
Chang, Yong | Chinese Academy of Sciences, Shenyang Institute of Automation |
Wang, Hongguang | Shenyang Institute of Automation, Chinese Academy of Sciences |
Wang, Tianlong | Shenyang Institute of Automation, Chinese Academy of Sciences |
Keywords: Field Robots, Motion and Path Planning, Motion Control
Abstract: Bridge inspections are essential for maintaining key infrastructure and preventing structural and functional failures. Nevertheless, traditional manual inspection techniques are laborious, high-risk, and inefficient. Although numerous automated inspection methods have been studied, inspection performance remains challenging. The main difficulties are redundant mechanisms, complex control, high energy consumption, and limited autonomy and safety. To address these problems, we are developing a small, lightweight, electrically driven robotic manipulator for bridge inspection named the BIRM. Here, we propose kinetic-energy-optimal and safety-guaranteed trajectory planning for the BIRM. Compared with existing methods, it simultaneously addresses energy consumption and safety. The approach formulates a quadratic programming (QP) problem that takes the robot's kinetic energy as the objective function, and the augmented Lagrange multiplier (ALM) method is applied to solve the QP. The proposed method fully satisfies the joint position, velocity, and acceleration limits at the velocity level while considering collision avoidance. In this paper, the collision detection strategy achieves low-complexity computation through several structural parameters of the bridge, thereby quickly adapting to environmental changes. Simulation experiments validate the effectiveness and superiority of the proposed method, and physical experiments demonstrate the sustainability and safety of bridge inspections in the field.
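To show the shape of a velocity-level, kinetic-energy-minimizing QP, the sketch below minimizes 0.5 * qdot^T M qdot subject to a task-space velocity constraint and joint-rate bounds. The mass matrix, Jacobian, bounds, and the use of a generic QP solver (cvxpy, for brevity) are assumptions; the paper's ALM solver, acceleration-level limits, and collision-avoidance constraints are not reproduced.

```python
# Minimal sketch of a kinetic-energy-optimal velocity-level QP.
import cvxpy as cp
import numpy as np

n = 4
M = np.diag([2.0, 1.5, 1.0, 0.5])                   # joint-space inertia (placeholder)
J = np.random.default_rng(0).normal(size=(3, n))    # end-effector Jacobian (placeholder)
v_des = np.array([0.05, 0.0, 0.02])                 # desired tool velocity (m/s)
qdot_max = 0.5                                      # joint-rate limit (rad/s)

qdot = cp.Variable(n)
objective = cp.Minimize(0.5 * cp.quad_form(qdot, M))        # kinetic energy
constraints = [J @ qdot == v_des, cp.abs(qdot) <= qdot_max]  # task + rate limits
cp.Problem(objective, constraints).solve()
print(qdot.value)
```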
|
|
09:00-10:00, Paper FrPI6T5.3 | |
Proprioception Is All You Need: Terrain Classification for Boreal Forests |
|
LaRocque, Damien | Université Laval |
Guimont-Martin, William | Université Laval |
Duclos, David-Alexandre | Université Laval |
Giguère, Philippe | Université Laval |
Pomerleau, Francois | Université Laval |
Keywords: Field Robots, Energy and Environment-Aware Automation, Deep Learning Methods
Abstract: Recent works in field robotics highlighted the importance of resiliency against different types of terrains. Boreal forests, in particular, are home to many mobility-impeding terrains that should be considered for off-road autonomous navigation. Also, being one of the largest land biomes on Earth, boreal forests are an area where autonomous vehicles are expected to become increasingly common. In this paper, we address the issue of classifying boreal terrains by introducing BorealTC, a publicly available dataset for proprioceptive-based terrain classification (TC). Recorded with a Husky A200, our dataset contains 116 min of Inertial Measurement Unit (IMU), motor current, and wheel odometry data, focusing on typical boreal forest terrains, notably snow, ice, and silty loam. Combining our dataset with another dataset from the literature, we evaluate both a Convolutional Neural Network (CNN) and the novel state space model (SSM)-based Mamba architecture on a TC task. We show that while CNN outperforms Mamba on each separate dataset, Mamba achieves greater accuracy when trained on a combination of both. In addition, we demonstrate that Mamba's learning capacity is greater than a CNN for increasing amounts of data. We show that the combination of two TC datasets yields a latent space that can be interpreted with the properties of the terrains. We also discuss the implications of merging datasets on classification. Our source code and dataset are publicly available online: https://github.com/norlab-ulaval/BorealTC.
|
|
09:00-10:00, Paper FrPI6T5.4 | |
On Predicting Terrain Changes Induced by Mobile Robot Traversal |
|
Pragr, Milos | Czech Technical University in Prague, FEE |
Bayer, Jan | Czech Technical University in Prague |
Faigl, Jan | Czech Technical University in Prague |
Keywords: Field Robots, Mapping, Wheeled Robots
Abstract: Mobile robots operating in convoys have a limited view of the terrain to be traversed if it is occluded by the preceding vehicle. Furthermore, the preceding vehicle might change the terrain geometry, and eventually significantly alter its traversability, by driving over the terrain. If the following vehicles do not account for such changes, they may base their decision on outdated terrain appearance and geometry when choosing whether to follow in the preceding vehicle's tracks or to avoid them, even though those tracks can render the terrain untraversable. We propose to predict the terrain changes induced by robot traversal and thus support the decision-making of the following vehicles. The developed model projects the robot wheel footprint along the planned robot path and combines the projection with the terrain appearance and prior terrain elevation. The coupled model is used in a convolutional neural network that predicts the elevation after traversal. The footprint projection component is designed so that learned networks can be transferred to vehicles with different wheel footprints without relearning the model. The proposed model is verified using a dataset captured with a real, one-ton, six-wheel robot traversing rigid roads and vegetated fields.
|
|
09:00-10:00, Paper FrPI6T5.5 | |
Real-Time Terrain Assessment and Bayesian-Based Path Planning for Off-Road Navigation |
|
Niu, Tianwei | Beijing Institute of Technology |
Yu, Shuwei | Beijing Institute of Technology |
Wang, Liang | Beijing Institute of Technology |
Yuan, Haoyu | Beijing Institute of Technology |
Wang, Shoukun | Beijing Institute of Technology |
Wang, Junzheng | Beijing Institute of Technology |
Keywords: Field Robots, Motion and Path Planning, Reactive and Sensor-Based Planning
Abstract: In unstructured and unknown environments, autonomous navigation still faces many challenges, such as assessing rough terrain and deciding how to safely traverse complex terrain. In this work, we propose a robust and practical off-road navigation framework that has been successfully deployed on a vibroseis truck for land exploration. First, in degraded wild scenes, a tightly coupled lidar-GNSS-inertial fusion odometry and mapping framework is adopted to construct a local point cloud map around the vehicle in real time and provide precise localization. Then, based on amplitude-frequency characteristic analysis and point cloud PCA, a multi-layer terrain assessment map containing terrain roughness, obstacles, and slope information is obtained. Finally, by combining a Gaussian-distribution-based adaptive sampler with a Bayesian sequentially updated proposal distribution, a local graph is efficiently built to obtain multiple path solutions under constrained conditions. Both simulations and field experiments show that the proposed navigation framework can choose a flat route even in harsh terrain conditions, naturally suppressing frequent attitude-angle changes and preventing vehicle accidents.
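The point-cloud PCA step used for the slope and roughness layers of such a terrain map can be illustrated as below: the eigenvector of the local covariance with the smallest eigenvalue approximates the surface normal (giving slope), and the smallest eigenvalue itself is a roughness proxy. Patch extraction, the amplitude-frequency analysis, and the map layering are omitted, and the toy patch is an assumption.

```python
# Minimal sketch of slope/roughness estimation for a local terrain patch via PCA.
import numpy as np

def assess_patch(points):
    """points: (N, 3) array of local terrain points."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    normal = eigvecs[:, 0]                       # direction of least variance
    if normal[2] < 0:                            # orient the normal upward
        normal = -normal
    slope_deg = np.degrees(np.arccos(np.clip(normal[2], -1.0, 1.0)))
    roughness = eigvals[0]                       # residual off-plane variance
    return slope_deg, roughness

patch = np.random.rand(500, 3) * [2.0, 2.0, 0.05]   # nearly flat toy patch
slope, rough = assess_patch(patch)
```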
|
|
09:00-10:00, Paper FrPI6T5.6 | |
PARE: A Plane-Assisted Autonomous Robot Exploration Framework in Unknown and Uneven Terrain |
|
Xu, Pu | Northeastern University |
Bai, Zhaoqiang | Northeastern University |
Liu, Haoming | Northeastern University (CN) |
Fang, Zheng | Northeastern University |
Keywords: Field Robots, Collision Avoidance, Motion and Path Planning
Abstract: Identifying traversable areas is a critical task for unmanned vehicles navigating safely through unstructured environments. In practice, the ambiguity in perceiving terrain traversability usually brings great challenges for autonomous exploration in unknown and uneven terrain, which often leads to conservative strategies or a potential risk of vehicle damage, leaving many areas of the environment unexplored. To this end, this paper proposes a plane-assisted autonomous robot exploration framework (PARE) to achieve maximum-volume and safe autonomous exploration. The process is carried out by a three-step dual-layer framework: constructing a local tree using Plane-Assisted RRT* (PA-RRT*), calculating exploration gain based on terrain information, and maintaining a global search graph. First, planar feature metrics (flatness, sparsity, elevation variation, slope, and slope variation) are introduced to determine terrain traversability. Second, to completely explore the rugged environment, we propose a dual-layer exploration framework comprising local and global strategies. A local planner based on PA-RRT* is proposed to find the best path by evaluating the planar information and the volumetric gain within the local exploration tree. Meanwhile, a graph-based global planner is proposed to record unexplored nodes with high exploration gain from the local tree to ensure a high level of exploration volume. Extensive simulation and real-world experiments demonstrate that our method significantly outperforms existing frameworks, with an average improvement of more than 12% in exploration volume.
|
|
09:00-10:00, Paper FrPI6T5.7 | |
Low-Cost Urban Localization with Magnetometer and LoRa Technology |
|
Benham, Derek | Brigham Young University |
Palacios, Ashton | Brigham Young University |
Lundrigan, Philip | Brigham Young University |
Mangelson, Joshua | Brigham Young University |
Keywords: Field Robots, Localization, Networked Robots
Abstract: With the goal of developing low-cost and innovative perception and localization techniques for autonomous vehicles, this work explores a system that solely relies on a LoRa receiver and a magnetometer for agent localization within urban environments. Using the received signal strength from LoRa beacons distributed across a test area of 16,000 square meters, a model of expected RSSI values per beacon is estimated using Gaussian Process (GP) regression. Motion is estimated using a probabilistic signal similarity classifier, and localization is obtained via a particle filter. Our experiments demonstrate that our proposed system is able to estimate our location to within three meters RMSE when the agent is within the convex hull of prior data. In real-world scenarios, characterized by signal interference and environmental complexities, our approach highlights the potential of leveraging affordable technology such as LoRa receivers and magnetometers for robust and accurate location estimation in complex urban environments. The integration of low-cost LoRa devices, Gaussian Process regression, particle filtering and our novel signal similarity motion estimator offers a promising avenue for achieving cost-effective localization solutions without compromising accuracy or reliability.
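The RSSI-map and particle-weighting steps described in this abstract can be sketched as follows: a Gaussian Process per LoRa beacon maps 2-D position to expected RSSI, and its predictive likelihood weights the particles. The kernel choice, noise level, toy data, and the simplified particle-filter step are all illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch: GP regression of an RSSI map and a likelihood-based particle weighting.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Survey data gathered beforehand: positions (m) and measured RSSI (dBm), toy model here.
train_xy = np.random.rand(300, 2) * 100.0
train_rssi = -60 - 0.3 * np.linalg.norm(train_xy - 50, axis=1) + np.random.randn(300)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=10.0) + WhiteKernel(1.0),
                              normalize_y=True).fit(train_xy, train_rssi)

def particle_weights(particles, measured_rssi):
    """particles: (P, 2) positions; weight by Gaussian measurement likelihood."""
    mu, sigma = gp.predict(particles, return_std=True)
    return np.exp(-0.5 * ((measured_rssi - mu) / sigma) ** 2) / sigma

w = particle_weights(np.random.rand(1000, 2) * 100.0, measured_rssi=-75.0)
w /= w.sum()                                   # normalized importance weights
```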
|
|
09:00-10:00, Paper FrPI6T5.8 | |
Side-Scan Sonar Based Landmark Detection for Underwater Vehicles |
|
Hoff, Simon Andreas | Norwegian University of Science and Technology |
Haraldstad, Vegard | Norwegian University of Science and Technology |
Reitan Hogstad, Bjørnar | Norwegian University of Science and Technology |
Varagnolo, Damiano | Norwegian University of Science and Technology |
Keywords: Marine Robotics, Autonomous Vehicle Navigation, Mapping
Abstract: We propose and analyze a pipeline to transform raw sonar data from underwater vehicles into actionable information for Simultaneous Localization and Mapping (SLAM) in real time. The pipeline encompasses three sequential steps, each building upon state-of-the-art algorithms from the existing literature: swath processing to preprocess the raw sonar data, with a primary focus on eliminating blind zones and noise reduction; transformation of these swaths into probabilistic maps of the surroundings of the sonar; and finally, detection and classification of underwater landmarks from the probabilistic maps. Our work contributes by modifying algorithms from the literature to ensure computational efficiency and integrating them so that they operate in sequence, thereby furnishing valuable information for navigation purposes. Through validation with field data, we then discuss which scenarios may prove difficult for the individual stages of the proposed pipeline. We provide indications on whether each step may encounter challenges and discuss how this occurrence may affect the overall quality of the final result. This empirical discussion is useful for discerning the practical applicability of the proposed pipeline in real-world settings.
|
|
09:00-10:00, Paper FrPI6T5.9 | |
Development of a Throwbot with Shock Absorption Structure |
|
Keum, Jaeyeong | DGIST |
Kim, Jaemin | DGIST |
Lee, Changgi | DGIST |
Lim, Seunghyun | DGIST |
Ju, Insung | DGIST |
Yun, Dongwon | Daegu Gyeongbuk Institute of Science and Technology (DGIST) |
Keywords: Wheeled Robots, Product Design, Development and Prototyping, Search and Rescue Robots
Abstract: In this study, a throwing robot equipped with a shock-absorbing structure, utilizing a paired-Cross Flexural Hinge (p-CFH) and an airbag, was fabricated and validated to assess the effectiveness of its impact absorption mechanism. This robot was developed in anticipation of situations where direct human intervention for life rescue would be challenging. Throwing robots can be broadly categorized into ball type, wheel type, and hybrid type. The hybrid type combines the advantages of both: the ease of throwing of the ball type, due to its low air resistance coefficient, and the versatile mobility of the wheel type in diverse environments. However, hybrid-type throwing robots are more vulnerable to external impacts due to the complexity of their internal structure, resulting in a lower maximum drop height compared to wheel-type robots. To address these challenges, this research proposes a Throwbot that combines the easy throwing capability of the ball type with the obstacle-overcoming ability of the wheel type, while also addressing the low free-fall height drawback inherent in hybrid types. To achieve this, we developed a Throwbot with a ball-to-wheel transformation structure, a p-CFH mechanism, and an airbag-based impact absorption system. Additionally, materials were selected based on simulation results to refine the Throwbot. The performance of the proposed robot was evaluated through various assessments, including free-fall experiments and obstacle-overcoming tests. Through this research, the proposed Throwbot effectively addresses the shortcomings of existing throwing robots, establishing a novel approach to throwing robot design.
|
|
09:00-10:00, Paper FrPI6T5.10 | |
Archie Jnr: A Robotic Platform for Autonomous Cane Pruning of Grapevines |
|
Williams, Henry | University of Auckland |
Smith, David Anthony James | University of Auckland |
Shahabi, Jalil | University of Auckland |
Gee, Trevor | The University of Auckland |
Qureshi, Ans | University of Auckland |
McGuinness, Benjamin John | University of Waikato |
Harvey, Scott | University of Waikato |
Downes, Catherine | University of Waikato |
Jangali, Rahul | The University of Waikato |
Black, Kale | Black Box Technologies LTD |
Lim, Shen Hin | University of Waikato |
Duke, Mike | Waikato University |
MacDonald, Bruce | University of Auckland |
Keywords: Agricultural Automation, Robotics and Automation in Agriculture and Forestry, Field Robots
Abstract: Cane pruning grapevines is a complex manual task requiring expert vine assessment to determine which canes to prune. This paper presents Archie Jnr, which was developed to autonomously assess the structure of the vine and prune the lower-quality canes as an expert pruner would. The platform has been extensively evaluated in a real-world commercial vineyard using a three-cane pruning method. The results show the effectiveness of the vision system for generating accurate assessments of a vine’s canes. The platform is also shown to be capable of successfully pruning 71.1% of the 311 total canes that required pruning across 25 vines.
|
|
09:00-10:00, Paper FrPI6T5.11 | |
CAIS: Culvert Autonomous Inspection Robotic System |
|
Le, Chuong | University of Nevada, Reno |
Walunj, Pratik | University of Nevada Reno |
Nguyen, An | University of Nevada, Reno |
Zhou, Yong | University of Nevada, Reno |
Nguyen, Thanh Binh | TAMUCC |
Nguyen, Thang | Texas A&M University-Corpus Christi |
Netchaev, Anton | USACE ERDC |
La, Hung | University of Nevada at Reno |
Keywords: Service Robotics, Software-Hardware Integration for Robot Systems, Field Robots
Abstract: Culverts, essential components of drainage systems, require regular inspection to ensure optimal functionality. However, culvert inspections pose numerous challenges, including accessibility, manpower, defect localization, and reliance on superficial assessments. To address these challenges, we propose a novel Culvert Autonomous Inspection Robotic System (CAIS) equipped with advanced sensing and evaluation capabilities. Our solution integrates an RGBD camera, deep learning, lighting systems, and non-destructive evaluation (NDE) techniques to enable accurate and comprehensive condition assessments. We present a pioneering Partially Observable Markov Decision Process (POMDP) framework to resolve uncertainty in autonomous inspections, especially in confined and unstructured environments like culverts or tunnels. The framework outputs detailed 3D maps highlighting visual defects and NDE condition assessments, demonstrating consistent and reliable performance in both indoor and outdoor scenarios. Additionally, we provide an open-source implementation of our framework on GitHub, contributing to the advancement of autonomous inspection technology and fostering collaboration within the research community.
|
|
09:00-10:00, Paper FrPI6T5.12 | |
Intelligent Fish Detection System with Similarity-Aware Transformer |
|
Li, Shengchen | Tongji University |
Zuo, Haobo | University of Hong Kong |
Fu, Changhong | Tongji University |
Wang, Zhiyong | Fishery Machinery and Instrument Research Institute, Chinese Aca |
Xu, Zhiqiang | Fishery Machinery and Instrument Research Institute |
Keywords: Agricultural Automation, Robotics and Automation in Agriculture and Forestry, Computer Vision for Automation
Abstract: Fish detection during water-land transfer contributes significantly to the fishery industry. However, manual fish detection, even with many workers collaborating, is inefficient, expensive, and insufficiently accurate. To further enhance water-land transfer efficiency, improve detection accuracy, and reduce labor costs, this work designs a new type of lightweight, plug-and-play edge intelligent vision system to automatically conduct fast fish detection with a high-speed camera. Moreover, a novel similarity-aware vision Transformer for fast fish detection (FishViT) is proposed to identify every single fish onboard within a dense group of similar fish. Specifically, a novel similarity-aware multi-level encoder is developed to enhance multi-scale features in parallel, thereby yielding discriminative representations for varying-size fish. Additionally, a new soft-threshold attention mechanism is introduced, which not only effectively eliminates background noise from images but also accurately recognizes both the edge details and overall features of different similar fish. 85 challenging video sequences with high frame rate and high resolution are collected to establish a benchmark from real fish water-land transfer scenarios. Exhaustive evaluation with this challenging benchmark has proved the robustness and effectiveness of FishViT, which runs at over 80 FPS. Real work scenario tests validate the practicality of the proposed method. The code and demo video are available at https://github.com/vision4robotics/FishViT.
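One generic reading of a soft-threshold attention mechanism is sketched below: attention weights below a learned threshold are shrunk toward zero so that background-like responses are suppressed. This is an illustrative interpretation only; the exact FishViT module, its multi-level encoder, and its detection head are not reproduced, and all sizes are assumptions.

```python
# Minimal sketch of a soft-thresholded self-attention block (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftThresholdAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.heads, self.scale = heads, (dim // heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.tau = nn.Parameter(torch.tensor(0.01))    # learnable threshold

    def forward(self, x):                               # x: (batch, tokens, dim)
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.heads, -1).transpose(1, 2) for t in (q, k, v))
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        # Soft thresholding: shrink small (background-like) attention weights to zero.
        attn = F.relu(attn - self.tau)
        attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-6)   # renormalize rows
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)

block = SoftThresholdAttention(dim=64)
y = block(torch.randn(2, 196, 64))
```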
|
|
09:00-10:00, Paper FrPI6T5.13 | |
Calibration-Free Vision-Assisted Container Loading of RTG Cranes |
|
Yang, Jianbing | Nanyang Technological University |
Wang, Yuanzhe | Nanyang Technological University |
Jiang, Hao | Shanghai Zhenhua Heavy Industries Co., Ltd |
Zhao, Bin | Shanghai Zhenhua Heavy Industries Co., Ltd |
Li, Yiming | Shanghai Zhenhua Heavy Industries Co., Ltd |
Wang, Danwei | Nanyang Technological University |
Keywords: Engineering for Robotic Systems, Field Robots, Automation Technologies for Smart Cities
Abstract: Vision-assisted container loading with Rubber Tyred Gantry (RTG) cranes faces two primary challenges. Firstly, the uncertainty inherent in Convolutional Neural Network (CNN) based detection hinders its direct application in the safety-critical operation of such heavy-duty machinery. Secondly, sensor calibration introduces additional complexities and errors into the system. However, existing studies have not adequately addressed these challenges. Motivated by this gap, this paper proposes an integrated approach for target detection and alignment control in container loading of RTG cranes. To ensure reliable target marker identification, a heuristic postprocessing algorithm is developed as a complement to CNN-based foreground segmentation, thereby ensuring safety during the container handling process. On this basis, a pixel-based control scheme is designed to align the container with the target markers, which eliminates the need for offline or online sensor calibration. The proposed approach has been successfully implemented on a real RTG crane manufactured by Shanghai Zhenhua Heavy Industries Co., Ltd. (ZPMC) and validated at the Port of Ningbo, China. Experimental results demonstrate the superiority of the proposed approach over current manual operations in port industries, highlighting its potential for crane automation.
|
|
09:00-10:00, Paper FrPI6T5.14 | |
Online Tree Reconstruction and Forest Inventory on a Mobile Robotic System |
|
Freißmuth, Leonard | Technical University Munich |
Mattamala, Matias | University of Oxford |
Chebrolu, Nived | University of Oxford |
Schaefer, Simon | Technical University of Munich |
Leutenegger, Stefan | Technical University of Munich |
Fallon, Maurice | University of Oxford |
Keywords: Robotics and Automation in Agriculture and Forestry, Mapping, Field Robots
Abstract: Terrestrial laser scanning (TLS) is the standard technique used to create accurate point clouds for digital forest inventories. However, the measurement process is demanding, requiring up to two days per hectare for data collection, significant data storage, as well as resource-heavy post-processing of 3D data. In this work, we present a real-time mapping and analysis system that enables online generation of forest inventories using mobile laser scanners that can be mounted e.g. on mobile robots. Given incrementally created and locally accurate submaps—data payloads—our approach extracts tree candidates using a custom, Voronoi-inspired clustering algorithm. Tree candidates are reconstructed using an algorithm based on the Hough transform, which enables robust modeling of the tree stem. Further, we explicitly incorporate the incremental nature of the data collection by consistently updating the database using a pose graph LiDAR SLAM system. This enables us to refine our estimates of the tree traits if an area is revisited later during a mission. We demonstrate competitive accuracy to TLS or manual measurements using laser scanners that we mounted on backpacks or mobile robots operating in conifer, broad-leaf and mixed forests. Our results achieve RMSE of 1.93 cm, a bias of 0.65 cm and a standard deviation of 1.81 cm (averaged across these sequences)—with no post-processing required after the mission is complete.
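The Hough-transform stem fit mentioned above can be illustrated on a single horizontal slice of points from one tree candidate: candidate circle centres and radii are voted over a grid and the best-supported circle is kept as the stem cross-section. Grid resolutions, tolerances, and the toy slice are assumptions; the full multi-slice reconstruction and SLAM-based updates are omitted.

```python
# Minimal sketch of a brute-force Hough circle fit for one stem cross-section.
import numpy as np

def hough_circle(points_xy, r_min=0.05, r_max=0.4, res=0.01, tol=0.02):
    xs = np.arange(points_xy[:, 0].min() - r_max, points_xy[:, 0].max() + r_max, res)
    ys = np.arange(points_xy[:, 1].min() - r_max, points_xy[:, 1].max() + r_max, res)
    rs = np.arange(r_min, r_max, res)
    best, best_votes = None, -1
    for cx in xs:                                     # accumulate votes over the grid
        for cy in ys:
            d = np.hypot(points_xy[:, 0] - cx, points_xy[:, 1] - cy)
            for r in rs:
                votes = np.count_nonzero(np.abs(d - r) < tol)
                if votes > best_votes:
                    best, best_votes = (cx, cy, r), votes
    return best                                       # (centre_x, centre_y, radius)

# Toy slice: noisy ring of radius 0.15 m.
theta = np.random.rand(200) * 2 * np.pi
slice_xy = np.c_[0.15 * np.cos(theta), 0.15 * np.sin(theta)] + 0.005 * np.random.randn(200, 2)
print(hough_circle(slice_xy))
```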
|
|
09:00-10:00, Paper FrPI6T5.15 | |
Roofus: Learning-Based Robotic Moisture Mapping on Flat Rooftops with Ground Penetrating Radar |
|
Lee, Kevin | New York University |
Lin, Wei-Heng | New York University |
Javed, Talha | Building Diagnostic Robotics |
Madhusudhan, Sruti | Building Diagnostic Robotics |
Sher, Bilal | Building Diagnostic Robotics |
Feng, Chen | New York University |
Keywords: Robotics and Automation in Construction, Field Robots, AI-Based Methods
Abstract: Robust moisture detection is crucial for building maintenance and cost reduction. Current methods are often limited by the type of roofing material or are cumbersome and expensive. Ground Penetrating Radar (GPR) has shown promise in recent works in moisture detection due to its effectiveness across a broader range of materials, its compactness and lightweight nature, and its ability to image the subsurface. We introduce Roofus, an integrated robotic moisture detection system for flat rooftops, designed to overcome traditional method limitations. It combines a remotely controlled robot with deep learning GPR data processing and automatic map generation. Real-world data is collected and manually annotated for supervised learning. We investigate a novel approach to interpreting GPR data via deep learning using Transformer-based classifiers. LiDAR inertial odometry is employed to integrate multiple individual GPR scans into a holistic moisture map over the rooftop. When evaluated against existing methods such as infrared thermal imaging, electrical capacitance surveys, and nuclear moisture gauges, our system shows promising viability for industry application.
|
|
09:00-10:00, Paper FrPI6T5.16 | |
Archie Snr: A Robotic Platform for Autonomous Apple Fruitlet Thinning |
|
Williams, Henry | University of Auckland |
Qureshi, Ans | University of Auckland |
Smith, David Anthony James | University of Auckland |
Gee, Trevor | The University of Auckland |
McGuinness, Benjamin John | University of Waikato |
Jangali, Rahul | The University of Waikato |
Black, Kale | Black Box Technologies LTD |
Harvey, Scott | University of Waikato |
Downes, Catherine | University of Waikato |
Lim, Shen Hin | University of Waikato |
Oliver, Richard | Plant and Food Research |
Duke, Mike | Waikato University |
MacDonald, Bruce | University of Auckland |
Keywords: Robotics and Automation in Agriculture and Forestry, Field Robots
Abstract: Apple fruitlet thinning is critical in cultivating high-quality apples, requiring an expert workforce to manage the orchard. The thinning process requires precise mapping of fruitlet clusters across the tree branches to manage the desired load for each tree. This paper presents Archie Snr, which was developed to autonomously assess the current load of the tree and thin the excess apples as an expert thinner would. The platform has been extensively evaluated in a real-world commercial orchard. The results show the platform can generate an average load count accuracy of 82.1% with a recall of 93.3%. The system was then able to successfully thin 66.14% of the fruitlets from the canopy.
|
|
FrPI6T6 |
Room 6 |
Learning V |
Teaser Session |
Chair: Wang, Yang | ShanghaiTech University |
|
09:00-10:00, Paper FrPI6T6.1 | |
Neural Kinodynamic Planning: Learning for Kinodynamic Tree Expansion |
|
Lai, Tin | University of Sydney |
Zhi, Weiming | Carnegie Mellon University |
Hermans, Tucker | University of Utah |
Ramos, Fabio | University of Sydney, NVIDIA |
Keywords: Learning from Experience, Deep Learning in Grasping and Manipulation, Motion and Path Planning
Abstract: We integrate neural networks into kinodynamic motion planning and present the Learning for KinoDynamic Tree Expansion (L4KDE) method. Tree-based planning approaches, such as the rapidly exploring random tree (RRT), are the dominant approach to finding globally optimal plans in continuous state-space motion planning. Central to these approaches is tree expansion, the procedure in which new nodes are added to an ever-expanding tree. We study the kinodynamic variants of tree-based planning, where we have known system dynamics and kinematic constraints. Because nodes must be selected quickly when connecting newly sampled coordinates, existing methods typically cannot search for the nodes with the lowest cost of transitioning to the sampled coordinates. Instead, they use metrics such as the Euclidean distance between coordinates as a heuristic for selecting candidate nodes to connect to the search tree. We propose L4KDE to address this issue. L4KDE uses a neural network to predict transition costs between queried states, which can be computed efficiently in batch, providing much higher-quality estimates of transition cost compared to commonly used heuristics while maintaining the almost-sure asymptotic optimality guarantee. We empirically demonstrate the significant performance improvement provided by L4KDE on a variety of challenging system dynamics, with the ability to generalise across different instances of the same model class and in conjunction with a suite of modern tree-based motion planners.
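The core idea of batched transition-cost prediction for node selection can be sketched as follows: an MLP scores the cost from every tree node to a newly sampled coordinate in a single forward pass, and the lowest-cost node is chosen for expansion. The network size, state dimension, and omitted training loop are illustrative assumptions, not the L4KDE implementation.

```python
# Minimal sketch of learned, batched transition-cost queries for tree expansion.
import torch
import torch.nn as nn

class TransitionCostNet(nn.Module):
    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Softplus(),        # predicted costs are non-negative
        )

    def forward(self, from_states, to_state):
        # from_states: (N, state_dim); to_state: (state_dim,)
        pairs = torch.cat([from_states, to_state.expand_as(from_states)], dim=-1)
        return self.net(pairs).squeeze(-1)            # (N,) predicted transition costs

net = TransitionCostNet(state_dim=4)
tree_nodes = torch.randn(500, 4)                      # states of nodes currently in the tree
sample = torch.randn(4)                               # newly sampled coordinate
with torch.no_grad():
    best_node = net(tree_nodes, sample).argmin()      # node selected for expansion
```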
|
|
09:00-10:00, Paper FrPI6T6.2 | |
Unsupervised Multiple Proactive Behavior Learning of Mobile Robots for Smooth and Safe Navigation |
|
Srisuchinnawong, Arthicha | University of Southern Denmark and Vidyasirimedhi Institute of S |
Baech, Jonas | Danish Technological Institute |
Hyzy, Marek Piotr | Technical University of Denmark, Lungby |
Kounalakis, Tsampikos | Danish Technological Institute |
Boukas, Evangelos | Technical University of Denmark |
Manoonpong, Poramate | Vidyasirimedhi Institute of Science and Technology (VISTEC) |
Keywords: Machine Learning for Robot Control, Sensorimotor Learning, Bioinspired Robot Learning
Abstract: While different control approaches have been developed for smooth and safe navigation in service applications, they are limited by the need for model-based assumptions, a ground-truth training target/reward function, and/or large amounts of sample data. To overcome these limitations, this study proposes a model-free neural control architecture with a generic plug-and-play online Multiple Proactive Behavior Learning (MPL) module. The MPL adapts the robot's neural control policy in an online, unsupervised manner from small amounts of sample data by correlating its sensory inputs to a local planner command. As a result, it allows a mobile robot to autonomously and quickly learn and balance various proactive behaviors related to smooth motion and collision avoidance. It also compensates for the limited planning update rate and the planning model mismatch of an arbitrary local motion planner. Compared with existing control approaches without the MPL, our control architecture with the MPL leads to (1) a 10% improvement in the smoothness of robot motion and 30% fewer collisions in a narrow static environment, and (2) trading motion smoothness for up to 70% fewer collisions in an unknown dynamic environment. Taken together, this study also demonstrates how to apply model-free neural control with unsupervised learning to existing model-based control (e.g., a local motion planner) for efficient proactive behavior learning and control of mobile robots.
|
|
09:00-10:00, Paper FrPI6T6.3 | |
NFPDE: Normalizing Flow-Based Parameter Distribution Estimation for Offline Adaptive Domain Randomization |
|
Takano, Rin | NEC Corporation |
Takaya, Kei | NEC Corporation |
Oyama, Hiroyuki | NEC Corporation |
Keywords: Machine Learning for Robot Control, Model Learning for Control, Reinforcement Learning
Abstract: Reinforcement learning with domain randomization (DR) has been proposed as a promising approach for learning robust policies to environmental changes. However, for DR to work well in real-world environments, it is necessary to design appropriate DR distributions for model parameters. This paper proposes Normalizing Flow-based Parameter Distribution Estimation (NFPDE), a new estimation method for DR distributions. NFPDE models the target distribution by a flow-based generative model using normalizing flow and estimates the target distribution based on an offline dataset collected a priori in the target environment. Through numerical experiments on the OpenAI gym environment, we show that NFPDE can estimate the target distribution more accurately and efficiently than the previous estimation methods. We also show that the estimated DR distributions can improve the robustness of trained policies.
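The flow-based density estimation at the heart of such an approach can be sketched with a small RealNVP-style coupling flow fitted by maximum likelihood to parameter samples derived from offline target-environment data, which is then sampled as a DR distribution. The flow depth, dimensions, toy data, and training loop below are illustrative assumptions, not the NFPDE implementation.

```python
# Minimal sketch: fit a small normalizing flow to parameter samples and draw DR samples.
import torch
import torch.nn as nn

class Coupling(nn.Module):
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(nn.Linear(self.half, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * (dim - self.half)))

    def forward(self, x):                        # returns z and log|det J|
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=-1)
        s = torch.tanh(s)                        # keep scales bounded for stability
        return torch.cat([x1, x2 * torch.exp(s) + t], dim=-1), s.sum(dim=-1)

    def inverse(self, z):
        z1, z2 = z[:, :self.half], z[:, self.half:]
        s, t = self.net(z1).chunk(2, dim=-1)
        s = torch.tanh(s)
        return torch.cat([z1, (z2 - t) * torch.exp(-s)], dim=-1)

class Flow(nn.Module):
    def __init__(self, dim, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(Coupling(dim) for _ in range(n_layers))
        self.base = torch.distributions.Normal(torch.zeros(dim), torch.ones(dim))

    def log_prob(self, x):
        logdet = 0.0
        for layer in self.layers:
            x, ld = layer(x)
            x = x.flip(dims=[-1])                # swap halves between layers
            logdet = logdet + ld
        return self.base.log_prob(x).sum(-1) + logdet

    def sample(self, n):
        z = self.base.sample((n,))
        for layer in reversed(self.layers):
            z = layer.inverse(z.flip(dims=[-1]))
        return z

# Fit to toy dynamics parameters (e.g. mass and friction) inferred offline.
params = torch.randn(2000, 2) * torch.tensor([0.1, 0.05]) + torch.tensor([1.0, 0.3])
flow = Flow(dim=2)
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    loss = -flow.log_prob(params).mean()         # maximum likelihood
    loss.backward()
    opt.step()
dr_samples = flow.sample(1000)                   # use as the DR distribution
```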
|
|
09:00-10:00, Paper FrPI6T6.4 | |
The Power of Input: Benchmarking Zero-Shot Sim-To-Real Transfer of Reinforcement Learning Control Policies for Quadrotor Control |
|
Dionigi, Alberto | University of Perugia |
Costante, Gabriele | University of Perugia |
Loianno, Giuseppe | New York University |
Keywords: Machine Learning for Robot Control, Aerial Systems: Applications
Abstract: In the last decade, data-driven approaches have become popular choices for quadrotor control, thanks to their ability to facilitate the adaptation to unknown or uncertain flight conditions. Among the different data-driven paradigms, Deep Reinforcement Learning (DRL) is currently one of the most explored. However, the design of DRL agents for Micro Aerial Vehicles (MAVs) remains an open challenge. While some works have studied the output configuration of these agents (i.e., what kind of control to compute), there is no general consensus on the type of input data these approaches should employ. Multiple works simply provide the DRL agent with full state information, without questioning whether this might be redundant, unnecessarily complicate the learning process, or pose superfluous constraints on the availability of such information on real platforms. In this work, we provide an in-depth benchmark analysis of different configurations of the observation space. We optimize multiple DRL agents in simulated environments with different input choices and study their robustness and their sim-to-real transfer capabilities with zero-shot adaptation. We believe that the outcomes and discussions presented in this work, supported by extensive experimental results, could be an important milestone in guiding future research on the development of DRL agents for aerial robot tasks.
|
|
09:00-10:00, Paper FrPI6T6.5 | |
Tube-GAN: A Novel Virtual Tube Generation Method for Unmanned Aerial Swarms Based on Generative Adversarial Network |
|
Zhai, Shixun | North Automatic Control Technology Institute |
Zhang, Kaige | Utah State University |
Nan, Bo | North Automatic Control Technology Institute |
Sun, Yanwen | North Automatic Control Technology Institute |
Fu, Qianyi | Leeds/Zhejiang University |
Keywords: Machine Learning for Robot Control, Path Planning for Multiple Mobile Robots or Agents, Computer Vision for Automation
Abstract: A virtual tube is a two-dimensional or three-dimensional strip or tubular area, similar to an RSFC (Relative Safe Flight Corridor), that can provide smooth, feasible, and safe space for a UAV swarm in environments with dense obstacles. To address the problem that current virtual tube planning methods are mainly based on complex heuristic algorithms with high time complexity, we modify the model architecture by introducing a generative adversarial network (GAN) and propose a Tube-GAN model. Tube-GAN takes a key-point prompt image and an obstacle environment image as inputs and outputs an image of the virtual tube, which transforms the optimization problem into an image generation problem and improves the computational efficiency of virtual tube construction. The experimental results demonstrate that the proposed Tube-GAN model can quickly generate virtual tubes in random environments (in less than 25 ms), providing a new direction for the construction of virtual tubes in real time.
|
|
09:00-10:00, Paper FrPI6T6.6 | |
Repairing Neural Networks for Safety in Robotic Systems Using Predictive Models |
|
Majd, Keyvan | Arizona State University |
Clark, Geoffrey | ASU |
Fainekos, Georgios | Toyota NA-R&D |
Ben Amor, Heni | Arizona State University |
Keywords: Machine Learning for Robot Control, Neural and Fuzzy Control, Learning from Demonstration
Abstract: This paper introduces a new method for safety-aware robot learning, focusing on repairing policies using predictive models. Our method combines behavioral cloning with neural network repair in a two-step supervised learning framework. It first learns a policy from expert demonstrations and then applies repair subject to predictive models to enforce safety constraints. The predictive models can encompass various aspects relevant to robot learning applications, such as proprioceptive states and collision likelihood. Our experimental results demonstrate that the learned policy successfully adheres to a predefined set of safety constraints on two applications: mobile robot navigation, and real-world lower-leg prostheses. Additionally, we have shown that our method effectively reduces repeated interaction with the robot, leading to substantial time savings during the learning process.
|
|
09:00-10:00, Paper FrPI6T6.7 | |
Performing Efficient and Safe Deformable Package Transport Operations Using Suction Cups |
|
Shukla, Rishabh | University of Southern California |
Yu, Zeren | Covariant.ai |
Moode, Samrudh | University of Southern California |
Manyar, Omey Mohan | University of Southern California |
Wang, Fan | Amazon Robotics |
Mayya, Siddharth | Amazon Robotics |
Gupta, Satyandra K. | University of Southern California |
Keywords: Machine Learning for Robot Control, Manipulation Planning, Motion and Path Planning
Abstract: Suction cups are popular for picking and transporting packages in warehouse applications. To maximize throughput, high transport speeds are desired. Many packages are deformable and may detach from the suction cups due to inertial loading if trajectories use excessive velocities. This paper introduces a novel methodology that analyzes package deformation through its curvature at the package-suction cup contact interface to generate a Factor-of-Safety (FOS) score for each waypoint in a given trajectory. By maintaining the FOS above a predetermined threshold, the trajectory planner is able to generate transport trajectories that are both safe and time-optimized. Experimental results show the method's efficacy, demonstrating a 21.92% reduction in transport times compared to a conservative trajectory generation. Our FOS predictor identified trajectories that ensured safe package transport with 100% accuracy across all 627 real-world experiments.
|
|
09:00-10:00, Paper FrPI6T6.8 | |
Dynamic Modeling of Robotic Fish Considering Background Flow Using Koopman Operators |
|
Lin, Xiaozhu | ShanghaiTech University |
Liu, Song | ShanghaiTech University |
Liu, Chengyuan | Loughborough University |
Wang, Yang | ShanghaiTech University |
Keywords: Model Learning for Control
Abstract: Robotic fish are increasingly employed in various aquatic applications, where modelling and controlling their dynamic behaviour is crucial. Despite considerable efforts in robotic fish dynamic modeling, controllers based on existing models cannot perform satisfactorily in non-stationary flow environments. The main reason is the nonlinear, two-way coupled nature of the interaction between the background flow and the robotic fish dynamics, which makes it challenging to integrate the influence of the background flow on the robot's behavior, a topic that remains relatively under-explored in the existing literature. To this end, we propose a novel approach for dynamic modeling of robotic fish that accounts for background flow using Koopman operators, namely Flow-Aware Robot-Fish Modeling (FARM). Specifically, we first collect motion data of the robotic fish in different background flow fields, and then obtain a linear approximation (dynamic model) of the nonlinear dynamics through carefully selected lifted functions. The obtained model provides the next state given the current state, control input, and average flow velocity of the background flow field. We evaluate the accuracy of the obtained model by computing the RMSE between predicted and real motion trajectories in different flow field environments. The results indicate that FARM is highly promising for obtaining reliable dynamic models, and the obtained model achieves comparable prediction accuracy even in unseen flow field environments given only a rough flow map of the activity region. This lays a solid foundation for the motion control of robotic fish in different background flow fields.
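The Koopman-style identification step behind such a model can be sketched as an EDMD-like regression: lift the state (and here the mean background-flow speed) with chosen basis functions, then solve a least-squares problem for a linear predictor in the lifted space. The lifting functions and the toy data below are placeholders, not the lifted functions selected in the paper.

```python
# Minimal sketch of a flow-aware, Koopman-style linear predictor fitted by least squares.
import numpy as np

def lift(x, u, v_flow):
    """Lifted observables: state, simple nonlinearities, control, flow speed, bias."""
    return np.concatenate([x, np.sin(x), np.cos(x), x**2, u, [v_flow], [1.0]])

def fit_koopman(data):
    """data: list of (x_k, u_k, v_flow_k, x_{k+1}) tuples collected offline."""
    Z = np.array([lift(x, u, v) for x, u, v, _ in data])            # (N, d) lifted features
    X_next = np.array([x_next for _, _, _, x_next in data])         # (N, n) next states
    K, *_ = np.linalg.lstsq(Z, X_next, rcond=None)                   # linear predictor (d, n)
    return K

def predict(K, x, u, v_flow):
    return lift(x, u, v_flow) @ K                                     # predicted next state

# Toy usage with a 3-D state and 2-D control.
rng = np.random.default_rng(0)
data = [(rng.standard_normal(3), rng.standard_normal(2), 0.2,
         rng.standard_normal(3)) for _ in range(1000)]
K = fit_koopman(data)
x_next = predict(K, np.zeros(3), np.zeros(2), v_flow=0.2)
```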
|
|
09:00-10:00, Paper FrPI6T6.9 | |
Data-Driven Force Observer for Human-Robot Interaction with Series Elastic Actuators Using Gaussian Processes |
|
Tesfazgi, Samuel | Technical University of Munich |
Keßler, Markus | Technical University of Munich |
Trigili, Emilio | Scuola Superiore Sant'Anna |
Lederer, Armin | ETH Zurich |
Hirche, Sandra | Technische Universität München |
Keywords: Model Learning for Control, Human-Centered Robotics, Wearable Robotics
Abstract: Ensuring safety and adapting to the user's behavior are of paramount importance in physical human-robot interaction. Thus, incorporating elastic actuators in the robot's mechanical design has become popular, since it offers intrinsic compliance and additionally provides a coarse estimate of the interaction force by measuring the deformation of the elastic components. While observer-based methods have been shown to improve these estimates, they rely on accurate models of the system, which are challenging to obtain in complex operating environments. In this work, we overcome this issue by learning the unknown dynamics components using Gaussian process (GP) regression. By employing the learned model in a Bayesian filtering framework, we improve the estimation accuracy and additionally obtain an observer that explicitly considers local model uncertainty in the confidence measure of the state estimate. Furthermore, we derive guaranteed estimation error bounds, thus facilitating the use in safety-critical applications. We demonstrate the effectiveness of the proposed approach experimentally in a human-exoskeleton interaction scenario.
|
|
09:00-10:00, Paper FrPI6T6.10 | |
Guiding Reinforcement Learning with Incomplete System Dynamics |
|
Wang, Shuyuan | University of British Columbia |
Duan, Jingliang | University of Science and Technology Beijing |
Lawrence, Nathan P. | University of British Columbia |
Loewen, Philip D | University of British Columbia, Vancouver |
Forbes, Michael | Honeywell |
Gopaluni, Bhushan | University of British Columbia |
Zhang, Lixian | Harbin Institute of Technology |
Keywords: Model Learning for Control, Reinforcement Learning, Machine Learning for Robot Control
Abstract: Model-free reinforcement learning (RL) is inherently a reactive method, operating under the assumption that it starts with no prior knowledge of the system and entirely depends on trial-and-error for learning. This approach faces several challenges, such as poor sample efficiency, generalization, and the need for well-designed reward functions to guide learning effectively. On the other hand, controllers based on complete system dynamics do not require data. This paper addresses the intermediate situation where there is not enough model information for complete controller design, but there is enough to suggest that a model-free approach is not the best approach either. By carefully decoupling known and unknown information about the system dynamics, we obtain an embedded controller guided by our partial model and thus improve the learning efficiency of an RL-enhanced approach. A modular design allows us to deploy mainstream RL algorithms to refine the policy. Simulation results show that our method significantly improves sample efficiency compared with standard RL methods on continuous control tasks, and also offers enhanced performance over traditional control approaches. Experiments on a real ground vehicle also validate the performance of our method, including generalization and robustness.
|
|
09:00-10:00, Paper FrPI6T6.11 | |
Learning Agile Locomotion on Risky Terrains |
|
Zhang, Chong | ETH Zurich |
Rudin, Nikita | ETH Zurich, NVIDIA |
Hoeller, David | ETH Zurich, NVIDIA |
Hutter, Marco | ETH Zurich |
Keywords: Sensorimotor Learning, Legged Robots
Abstract: Quadruped robots have shown remarkable mobility on various terrains through reinforcement learning. Yet, in the presence of sparse footholds and risky terrains such as stepping stones and balance beams, which require precise foot placement to avoid falls, model-based approaches are often used. In this paper, we show that end-to-end reinforcement learning can also enable the robot to traverse risky terrains with dynamic motions. To this end, our approach involves training a generalist policy for agile locomotion on disorderly and sparse stepping stones before transferring its reusable knowledge to various more challenging terrains by finetuning specialist policies from it. Given that the robot needs to rapidly adapt its velocity on these terrains, we formulate the task as a navigation task instead of the commonly used velocity tracking which constrains the robot's behavior and propose an exploration strategy to overcome sparse rewards and achieve high robustness. We validate our proposed method through simulation and real-world experiments on an ANYmal-D robot achieving peak forward velocity of >= 2.5 m/s on sparse stepping stones and narrow balance beams. Video: youtu.be/Z5X0J8OH6z4
|
|
09:00-10:00, Paper FrPI6T6.12 | |
Sensorimotor Attention and Language-Based Regressions in Shared Latent Variables for Integrating Robot Motion Learning and LLM |
|
Suzuki, Kanata | Fujitsu Limited |
Ogata, Tetsuya | Waseda University |
Keywords: Sensorimotor Learning, Imitation Learning, Learning from Experience
Abstract: In recent years, studies have been actively conducted on combining large language models (LLM) and robotics; however, most have not considered end-to-end feedback in the robot-motion generation phase. Since the predictions of deep neural networks inevitably contain errors, the trained model must be updated to match the real environment in order to generate robot motion adaptively. This study proposes an integration method that connects the robot-motion learning model and the LLM using shared latent variables. When generating robot motion, the proposed method updates shared parameters based on prediction errors from both sensorimotor attention points and the task language instructions given to the robot. This allows the model to efficiently search for latent parameters appropriate for the robot task. Through simulator experiments on multiple robot tasks, we demonstrated the effectiveness of our proposed method from two perspectives: position generalization and language instruction generalization abilities.
|
|
09:00-10:00, Paper FrPI6T6.13 | |
GeRM: A Generalist Robotic Model with Mixture-Of-Experts for Quadruped Robot |
|
Song, Wenxuan | Westlake University |
Zhao, Han | Westlake University |
Ding, Pengxiang | Westlake University |
Cui, Can | Westlake University |
Lyu, Shangke | Westlake University |
Fan, YaNing | Hebei University of Technology |
Wang, Donglin | Westlake University |
Keywords: Perception-Action Coupling, Reinforcement Learning, Legged Robots
Abstract: Multi-task robot learning holds significant importance in tackling diverse and complex scenarios. However, current approaches are hindered by performance issues and difficulties in collecting training datasets. In this paper, we propose GeRM (Generalist Robotic Model). We utilize offline reinforcement learning to optimize data utilization strategies and learn from both demonstrations and sub-optimal data, thus surpassing the limitations of human demonstrations. Thereafter, we employ a transformer-based VLA network to process multi-modal inputs and output actions. By introducing the Mixture-of-Experts structure, GeRM allows faster inference speed with higher whole-model capacity, and thus resolves the issue of limited RL parameters, enhancing model performance in multi-task learning while controlling computational costs. Through a series of experiments, we demonstrate that GeRM outperforms other methods across all tasks, while also validating its efficiency in both training and inference processes. We also uncover its potential to acquire emergent skills. Furthermore, we contribute the QUARD-Auto dataset, collected automatically to support our training approach and foster advancements in multi-task quadruped robot learning. This work presents a new paradigm for reducing the cost of collecting robot data and driving progress in the multi-task learning community.
|
|
09:00-10:00, Paper FrPI6T6.14 | |
Feeling Optimistic? Ambiguity Attitudes for Online Decision Making |
|
Beard, Jared | West Virginia University |
Butts, R. Michael | West Virginia University |
Gu, Yu | West Virginia University |
Keywords: Probability and Statistical Methods, Reinforcement Learning, Marine Robotics
Abstract: Due to the complexity of many decision making problems, tree search algorithms often have inadequate information to produce accurate transition models. This results in ambiguities (uncertainties for which there are multiple plausible models). Faced with ambiguities, robust methods have been used to produce safe solutions—often by maximizing the lower bound over the set of plausible transition models. However, they often overlook how much the representation of uncertainty can impact how a decision is made. This work introduces the Ambiguity Attitude Graph Search (AAGS), advocating for more comprehensive representations of ambiguities in decision making. Additionally, AAGS allows users to adjust their ambiguity attitude (or preference), promoting exploration and improving users’ ability to control how an agent should respond when faced with a set of plausible alternatives. Simulation in a dynamic sailing environment shows how environments with high entropy transition models can lead robust methods to fail. Results further demonstrate how adjusting ambiguity attitudes better fulfills objectives while mitigating this failure mode of robust approaches. Because this approach is a generalization of the robust framework, these results further demonstrate how algorithms focused on ambiguity have applicability beyond safety-critical systems.
|
|
09:00-10:00, Paper FrPI6T6.15 | |
Offline Meta-Reinforcement Learning with Evolving Gradient Agreement |
|
Chen, Jiaxing | National University of Defense Technology |
Yuan, Weilin | National University of Defense Technology |
Chen, Shaofei | National University of Defense Technology |
Liu, Furong | National University of Defense Technology |
Ma, Ao | National University of Defense Technology |
Hu, Zhenzhen | National University of Defense Technology |
Li, Peng | National University of Defence Technology |
Keywords: Evolutionary Robotics, Machine Learning for Robot Control, Learning from Experience
Abstract: Meta-Reinforcement Learning (Meta-RL) is a machine learning paradigm aimed at learning reinforcement learning policies that can quickly adapt to unseen tasks with few-shot data. Nevertheless, applying Meta-RL to real-world applications faces challenges due to the cost of data acquisition. To address this problem, offline Meta-RL has emerged as a promising solution, focusing on learning policies from pre-collected data that can effectively and rapidly adapt to unseen tasks. In this paper, we propose a new offline Meta-RL method called Meta-Actor-Critic with Evolving Gradient Agreement (MACEGA). MACEGA utilizes an evolutionary approach to estimate meta-gradients conducive to generalization across unseen tasks. During meta-training, gradient evolution is utilized to meta-update the value network and policies. Moreover, we use gradient agreement as an optimization objective for meta-learning, thereby enhancing the generalization ability of the meta-policy. We experimentally demonstrate the robustness of MACEGA in handling offline data quality. Furthermore, extensive experiments on various benchmarks provide empirical evidence that MACEGA outperforms previous state-of-the-art methods in generalizing to unseen tasks, thus demonstrating its potential for real-world applications.
|
|
09:00-10:00, Paper FrPI6T6.16 | |
Stein Movement Primitives for Adaptive Multi-Modal Trajectory Generation |
|
Zeya, Yin | University of Sydney |
Lai, Tin | University of Sydney |
Khan, Subhan | The University of Sydney |
Jacob, Jayadeep | The University of Sydney |
Li, Yong Hui | University of Sydney |
Ramos, Fabio | University of Sydney, NVIDIA |
Keywords: Learning from Experience, Probabilistic Inference, Imitation Learning
Abstract: Probabilistic Movement Primitives (ProMPs) and their variants are powerful methods for robots to learn complex tasks from human demonstrations, where motion trajectories are represented as stochastic processes with Gaussian assumptions. However, despite being computationally efficient, this method has limited expressivity in capturing the diversity found in human demonstrations, which are typically characterised by the multi-modality of motions. For example, when picking up an object partially obscured by an obstacle, some individuals may opt to go to the right while others may choose the left side of the object. In this paper, we introduce Stein Movement Primitives (SMPs), a novel approach to probabilistic movement primitives. We formulate motion primitive adaptation as non-parametric probabilistic inference using Stein Variational Gradient Descent (SVGD), thus not constraining our method to any posterior distribution assumption and enabling direct representation of the multi-modality in human demonstrations. We illustrate how our method can adapt robot motion to different scenarios for collision avoidance and adaptation to new tasks without being restricted to single modal assumptions. Experimentally, we demonstrate our approach on several domain adaptation problems using the LASA dataset and with a real robotic arm.
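The SVGD update that underlies this kind of adaptation can be sketched as follows: particles (here standing in for trajectory-parameter vectors) are moved along the kernelized Stein direction of a target log-density, so that multiple modes can be represented simultaneously. The target density, kernel bandwidth heuristic, and step sizes below are illustrative; the ProMP-specific likelihood and task costs from the paper are not reproduced.

```python
# Minimal sketch of a Stein Variational Gradient Descent step with an RBF kernel.
import torch

def rbf_kernel(x):
    """x: (n, d) particles. Returns kernel matrix and the summed repulsive gradient."""
    diff = x.unsqueeze(1) - x.unsqueeze(0)                       # diff[i, j] = x_i - x_j
    dist2 = (diff ** 2).sum(-1)
    h = dist2.median() / (2.0 * torch.log(torch.tensor(float(x.shape[0]) + 1.0)))
    k = torch.exp(-dist2 / (h + 1e-8))
    grad_k = 2.0 / (h + 1e-8) * (k.unsqueeze(-1) * diff)         # d k(x_j, x_i) / d x_j
    return k, grad_k.sum(dim=1)                                  # (n, n), (n, d)

def svgd_step(particles, log_prob, step_size=0.05):
    particles = particles.detach().requires_grad_(True)
    grad_logp = torch.autograd.grad(log_prob(particles).sum(), particles)[0]
    k, repulsion = rbf_kernel(particles.detach())
    phi = (k @ grad_logp + repulsion) / particles.shape[0]       # Stein direction
    return (particles + step_size * phi).detach()

# Toy bimodal target standing in for "go left" vs "go right" motion solutions.
def log_prob(x):
    return torch.logsumexp(torch.stack([
        -0.5 * ((x - 2.0) ** 2).sum(-1),
        -0.5 * ((x + 2.0) ** 2).sum(-1)]), dim=0)

particles = torch.randn(64, 2)
for _ in range(300):
    particles = svgd_step(particles, log_prob)    # particles spread over both modes
```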
|
|
FrPI6T7 |
Room 7 |
Optimal Control in Robotics |
Teaser Session |
Chair: Tortora, Stefano | University of Padova |
|
09:00-10:00, Paper FrPI6T7.1 | |
Toward Control of Wheeled Humanoid Robots with Unknown Payloads: Equilibrium Point Estimation Via Real-To-Sim Adaptation |
|
Baek, DongHoon | University of Illinois Urbana-Champaign |
Sim, Youngwoo | University of Illinois at Urbana-Champaign |
Purushottam, Amartya | University of Illinois, Urbana-Champaign |
Gupta, Saurabh | UIUC |
Ramos, Joao | University of Illinois at Urbana-Champaign |
Keywords: Wheeled Robots, Representation Learning, Transfer Learning
Abstract: Model-based controllers that use a model linearized around the system's equilibrium point are a common approach in the control of wheeled humanoids due to their lower computational load and ease of stability analysis. However, controlling a wheeled humanoid robot while it lifts an unknown object presents significant challenges, primarily due to the lack of knowledge of the object dynamics. This paper presents a framework designed to explicitly predict the new equilibrium point in order to control a wheeled-legged robot with unknown dynamics. We estimate the total mass and center of mass of the system from its response to the initially unknown dynamics, then calculate the new equilibrium point accordingly. To avoid using additional sensors (e.g., a force-torque sensor) and reduce the effort of obtaining expensive real data, a data-driven approach is utilized with a novel real-to-sim adaptation. A more accurate nonlinear dynamics model, offering a closer representation of real-world physics, is injected into a rigid-body simulation for real-to-sim adaptation. The nonlinear dynamics model parameters were optimized using Particle Swarm Optimization. The efficacy of this framework was validated on a physical wheeled inverted pendulum, a simplified model of a wheeled-legged robot. The experimental results indicate that employing a more precise analytical model with optimized parameters significantly reduces the gap between simulation and reality, thus improving the efficiency of a model-based controller in controlling a wheeled robot with unknown dynamics.
|
|
09:00-10:00, Paper FrPI6T7.2 | |
CLIPSwarm: Generating Drone Shows from Text Prompts with Vision-Language Models |
|
Pueyo, Pablo | Universidad De Zaragoza |
Montijano, Eduardo | Universidad De Zaragoza |
Murillo, Ana Cristina | University of Zaragoza |
Schwager, Mac | Stanford University |
Keywords: Art and Entertainment Robotics, Optimization and Optimal Control, AI-Based Methods
Abstract: This paper introduces CLIPSwarm, a new algorithm designed to automate the modeling of swarm drone formations based on natural language. The algorithm begins by enriching a provided word, to compose a text prompt that serves as input to an iterative approach to find the formation that best matches the provided word. The algorithm iteratively refines formations of robots to align with the textual description, employing different steps for “exploration” and “exploitation”. Our framework is currently evaluated on simple formation targets, limited to contour shapes. A formation is visually represented through alpha-shape contours and the most representative color is automatically found for the input word. To measure the similarity between the description and the visual representation of the formation, we use CLIP [1], encoding text and images into vectors and assessing their similarity. Subsequently, the algorithm rearranges the formation to visually represent the word more effectively, within the given constraints of available drones. Control actions are then assigned to the drones, ensuring robotic behavior and collision-free movement. Experimental results demonstrate the system’s efficacy in accurately modeling robot formations from natural language descriptions. The algorithm’s versatility is showcased through the execution of drone shows in photorealistic simulation with varying shapes. We refer the reader to the supplementary video for a visual reference of the results.
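The CLIP-based scoring step of such a pipeline can be sketched as below: render a candidate formation as an image, then measure its similarity to the text prompt in CLIP's joint embedding space. This sketch assumes OpenAI's `clip` package (https://github.com/openai/CLIP) and substitutes a plain scatter rendering for the paper's alpha-shape contours; both are assumptions for illustration.

```python
# Minimal sketch: score a drone formation against a text prompt with CLIP.
import io
import numpy as np
import torch
import clip                                   # assumes the openai/CLIP package is installed
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from PIL import Image

model, preprocess = clip.load("ViT-B/32", device="cpu")

def render(formation_xy):
    """Render drone positions as a simple black-on-white image (placeholder rendering)."""
    fig, ax = plt.subplots(figsize=(3, 3))
    ax.scatter(formation_xy[:, 0], formation_xy[:, 1], s=80, c="black")
    ax.set_axis_off(); ax.set_aspect("equal")
    buf = io.BytesIO(); fig.savefig(buf, format="png"); plt.close(fig)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def clip_score(formation_xy, prompt):
    image = preprocess(render(formation_xy)).unsqueeze(0)
    text = clip.tokenize([prompt])
    with torch.no_grad():
        img_f = model.encode_image(image)
        txt_f = model.encode_text(text)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    return (img_f @ txt_f.T).item()            # cosine similarity, higher is better

score = clip_score(np.random.rand(20, 2), "a drone formation shaped like a star")
```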
|
|
09:00-10:00, Paper FrPI6T7.3 | |
Robust Two-View Geometry Estimation with Implicit Differentiation |
|
Pyatov, Vladislav | Skolkovo Institute of Science and Technology |
Koshelev, Iaroslav | AI Foundation and Algorithm Lab |
Lefkimmiatis, Stamatios | MTS AI |
Keywords: Computational Geometry, Optimization and Optimal Control, SLAM
Abstract: We present a novel two-view geometry estimation framework which is based on a differentiable robust loss function fitting. We propose to treat the robust fundamental matrix estimation as an implicit layer, which allows us to avoid backpropagation through time and significantly improves the numerical stability. To take full advantage of the information from the feature matching stage we incorporate learnable weights that depend on the matching confidences. In this way our solution brings together feature extraction, matching and two-view geometry estimation in a unified end-to-end trainable pipeline. We evaluate our approach on the camera pose estimation task in both outdoor and indoor scenarios. The experiments on several datasets show that the proposed method outperforms both classic and learning-based state-of-the-art methods by a large margin. The project webpage is available at: https://github.com/VladPyatov/ihls
|
|
09:00-10:00, Paper FrPI6T7.4 | |
Robustifying Model-Based Locomotion by Zero-Order Stochastic Nonlinear Model Predictive Control with Guard Saltation Matrix |
|
Katayama, Sotaro | Sony Group Corporation |
Takasugi, Noriaki | Sony Group Corporation |
Kaneko, Mitsuhisa | Sony Global Manufacturing & Operations Corporation |
Nagatsuka, Norio | Sony Interactive Entertainment |
Kinoshita, Masaya | Sony Group Corporation |
Keywords: Optimization and Optimal Control, Multi-Contact Whole-Body Motion Planning and Control, Legged Robots
Abstract: This paper presents a stochastic/robust nonlinear model predictive control (NMPC) to enhance the robustness of model-based legged locomotion against contact uncertainties. We integrate the contact uncertainties into the covariance propagation of the stochastic/robust NMPC framework by leveraging the guard saltation matrix and an extended Kalman filter-like covariance update. We achieve fast stochastic/robust NMPC computation by utilizing the zero-order algorithm with additional improvements in computational efficiency concerning the feedback gains. We conducted numerical experiments and demonstrate that the proposed method can accurately forecast future state covariance and generate trajectories that satisfy constraints even in the presence of contact uncertainties. Hardware experiments on the perceptive locomotion of a wheeled-legged robot were also carried out, validating the feasibility of the proposed method on a real-world system with limited on-board computation.
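The covariance-propagation idea can be sketched as follows: across a smooth phase the state covariance follows the usual linearized update, and across a contact event it is mapped through the guard saltation matrix, EKF-style. The dynamics Jacobian, the saltation matrix, and the noise terms below are placeholders supplied by the user; the zero-order NMPC machinery itself is not shown.

```python
# Minimal sketch of EKF-like covariance propagation through smooth steps and a contact event.
import numpy as np

def propagate_smooth(P, A, Q):
    """Covariance through one smooth step with linearized dynamics A."""
    return A @ P @ A.T + Q

def propagate_contact(P, S, Q_contact):
    """Covariance across a contact event via the guard saltation matrix S."""
    return S @ P @ S.T + Q_contact

# Toy 4-state example alternating smooth steps and one contact event.
P = 0.01 * np.eye(4)
A = np.eye(4) + 0.02 * np.random.randn(4, 4)      # placeholder dynamics Jacobian
S = np.eye(4) + 0.10 * np.random.randn(4, 4)      # stands in for the saltation matrix
for _ in range(10):
    P = propagate_smooth(P, A, Q=1e-5 * np.eye(4))
P = propagate_contact(P, S, Q_contact=1e-4 * np.eye(4))
```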
|
|
09:00-10:00, Paper FrPI6T7.5 | |
Momentum-Aware Trajectory Optimisation Using Full-Centroidal Dynamics and Implicit Inverse Kinematics |
|
Papatheodorou, Aristotelis | University of Oxford |
Merkt, Wolfgang Xaver | University of Oxford |
Mitchell, Alexander Luis | University of Oxford |
Havoutis, Ioannis | University of Oxford |
Keywords: Optimization and Optimal Control, Legged Robots, Dynamics
Abstract: The current state-of-the-art gradient-based optimisation frameworks are able to produce impressive dynamic manoeuvres such as linear and rotational jumps. However, these methods, which optimise over the full rigid-body dynamics of the robot, often require precise foothold locations a priori, while real-time performance is not guaranteed without elaborate regularisation and tuning of the cost function. In contrast, we investigate the advantages of a task-space optimisation framework, with special focus on acrobatic motions. Our proposed formulation exploits the system’s high-order nonlinearities, such as the nonholonomy of the angular momentum, in order to produce feasible, high-acceleration manoeuvres. By leveraging the full-centroidal dynamics of the quadruped ANYmal C and directly optimising its footholds and contact forces, the framework is capable of producing efficient motion plans with low computational overhead. Finally, we deploy our proposed framework on the ANYmal C platform and demonstrate its capabilities through real-world experiments, with the successful execution of high-acceleration motions such as linear and rotational jumps. Extensive analysis of these experiments shows that the robot’s dynamics can be exploited to surpass its hardware limitations of high mass and low torque limits.
|
|
09:00-10:00, Paper FrPI6T7.6 | |
Model Predictive Control for Frenet-Cartesian Trajectory Tracking of a Tricycle Kinematic Automated Guided Vehicle |
|
Subash, Akash John | University of Freiburg |
Kloeser, Daniel | Ek Robotics |
Frey, Jonathan | University of Freiburg |
Reiter, Rudolf | University of Freiburg |
Diehl, Moritz | Univ. of Heidelberg |
Bohlmann, Karsten | Eberhard-Karls-Universität Tübingen |
Keywords: Optimization and Optimal Control, Kinematics, Embedded Systems for Robotic and Automation
Abstract: This work proposes an optimal control scheme for a trajectory-tracking Automated Guided Vehicle considering motion and collision constraints in a warehouse environment. We outline how the simpler obstacle avoidance constraints in the Cartesian Coordinate Frame (CCF) can be retained, while projecting the tricycle kinematics to the Frenet Coordinate Frame (FCF) for track progress. The Nonlinear Model Predictive Control (NMPC) scheme is subsequently implemented using acados and its real-time feasibility is demonstrated in simulation and aboard a test vehicle at a warehouse.
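Illustrative sketch (independent of the paper's acados implementation): a Frenet-Cartesian formulation of this kind relies on mapping a point given by arc length s and lateral offset n along a reference path back to Cartesian coordinates, for instance as in the following numpy snippet with a hypothetical sampled path.

```python
import numpy as np

def frenet_to_cartesian(s, n, path_xy):
    """Map Frenet coordinates (s, n) to Cartesian, given a densely sampled reference path.

    s       : arc length along the path
    n       : signed lateral offset (left of the travel direction is positive)
    path_xy : (N, 2) array of reference path points
    """
    seg = np.diff(path_xy, axis=0)
    seg_len = np.linalg.norm(seg, axis=1)
    arc = np.concatenate(([0.0], np.cumsum(seg_len)))

    i = np.clip(np.searchsorted(arc, s) - 1, 0, len(seg) - 1)
    t = (s - arc[i]) / seg_len[i]                 # fraction along segment i
    base = path_xy[i] + t * seg[i]                # point on the path at arc length s
    tangent = seg[i] / seg_len[i]
    normal = np.array([-tangent[1], tangent[0]])  # left-hand normal
    return base + n * normal

# Toy usage on a straight reference path.
path = np.stack([np.linspace(0, 10, 101), np.zeros(101)], axis=1)
print(frenet_to_cartesian(s=3.5, n=0.4, path_xy=path))
```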
|
|
09:00-10:00, Paper FrPI6T7.7 | |
Ensuring Joint Constraints of Torque-Controlled Robot Manipulators under Bounded Jerk |
|
Ko, Dongwoo | POSTECH |
Kim, Jonghyeok | POSTECH |
Chung, Wan Kyun | POSTECH |
Keywords: Optimization and Optimal Control, Robot Safety
Abstract: This paper proposes an optimization-based control framework for torque-controlled robots that satisfies joint position, velocity, and acceleration constraints under a bounded jerk. The optimization filter is incorporated as a module that modifies the nominal controller output to ensure the joint constraints. To formulate the optimization problem as a QP, the torque optimization problem is converted to a jerk optimization problem using an augmented state, and the constraints are reformulated to be affine in the jerk. Here, viable constraints are derived using the time-optimal braking policy to guarantee the feasibility of the QP. The proposed method was validated in simulation and with a 6-DOF robot manipulator.
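Illustrative sketch (a toy version, not the paper's exact formulation): an optimization filter of this kind can be posed as a QP at the jerk level, where discrete integration makes the joint limits affine in the jerk and the objective keeps the filtered jerk close to the nominal one; all gains, limits, and the nominal jerk below are placeholders.

```python
import cvxpy as cp
import numpy as np

dt, N = 0.01, 10                     # sample time and horizon length
q0, dq0, ddq0 = 0.0, 1.0, 0.0        # current joint position, velocity, acceleration
q_max, dq_max, ddq_max, jerk_max = 1.0, 1.2, 5.0, 200.0
jerk_nominal = np.full(N, 50.0)      # jerk implied by the nominal controller (placeholder)

j = cp.Variable(N)                   # jerk sequence to optimize

# Forward-integrate acceleration, velocity and position as affine expressions of the jerk.
ddq = ddq0 + dt * cp.cumsum(j)
dq = dq0 + dt * cp.cumsum(ddq)
q = q0 + dt * cp.cumsum(dq)

constraints = [
    cp.abs(j) <= jerk_max,
    cp.abs(ddq) <= ddq_max,
    cp.abs(dq) <= dq_max,
    q <= q_max,
]
problem = cp.Problem(cp.Minimize(cp.sum_squares(j - jerk_nominal)), constraints)
problem.solve()
print("filtered jerk:", np.round(j.value, 1))
```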
|
|
09:00-10:00, Paper FrPI6T7.8 | |
Collaboration Strategies for Two Heterogeneous Pursuers in a Pursuit-Evasion Game Using Deep Reinforcement Learning |
|
Zhong, Zhanping | Beihang University |
Dong, Zhuoning | Beihang University |
Duan, Xiaoming | Shanghai Jiao Tong University |
He, Jianping | Shanghai Jiao Tong University |
Keywords: Optimization and Optimal Control, Reinforcement Learning, Agent-Based Systems
Abstract: We investigate a pursuit-evasion game taking place in an unbounded three-dimensional space, where a flexible pursuer with hybrid dynamics collaborates with a fast pursuer and aims to capture a flexible evader within a finite time. The key feature of this problem lies in the hybrid dynamics of the flexible pursuer, which can change its dynamics once during the game and switch to a fast pursuer with increased speed but lower maneuverability. To address this challenge, we devise a hybrid strategy based on the soft actor-critic framework, tailored specifically for the flexible pursuer, which encompasses both maneuvering and switching tactics. We introduce a switch factor to the input of the actor network and incorporate switch actions to further expand the action space. These additions enable the flexible pursuer to execute maneuvering actions and determine the moment to switch to a fast pursuer. The reward function is designed to account for relative angle, altitude, and speed, as well as a sparse reward. Through extensive ablation experiments conducted in a simulated environment, we demonstrate the efficacy of our algorithm in facilitating the learning of hybrid strategies for the flexible pursuer, resulting in significantly improved capture rates compared to alternative methods.
|
|
09:00-10:00, Paper FrPI6T7.9 | |
Adaptive Trajectory Database Learning for Nonlinear Control with Hybrid Gradient Optimization |
|
Tseng, Kuan-Yu | University of Illinois at Urbana-Champaign |
Zhang, Mengchao | University of Illinois at Urbana-Champaign |
Hauser, Kris | University of Illinois at Urbana-Champaign |
Dullerud, Geir E. | University of Illinois |
Keywords: Optimization and Optimal Control, Continual Learning, Learning from Experience
Abstract: This paper presents a novel experience-based technique, called EHGO, for sample-efficient adaptive control of nonlinear systems in the presence of dynamical modeling errors. The starting point for EHGO is a database seeded with many trajectories optimized under a reference estimate of real system dynamics. When executed on the real system, these trajectories will be suboptimal due to errors in the reference dynamics. The approach then leverages a hybrid gradient optimization technique, GRILC, which observes executed trajectories and computes gradients from the reference model to refine the control policy without requiring an explicit model of the real system. In past work, GRILC was applied in a restrictive setting in which a robot executes multiple rollouts from identical start states. In this paper, we show how to leverage a database to enable GRILC to operate across a wide envelope of possible start states in different iterations. The database is used to balance between start state proximity and recentness-of-experience via a learned distance metric to generate good initial guesses. Experiments on three dynamical systems (pendulum, car, drone) show that the proposed approach adapts quickly to online experience even when the reference model has significant errors. In these examples EHGO generates near-optimal solutions within hundreds of epochs of real execution, which can be orders of magnitude more sample efficient than reinforcement learning techniques.
|
|
09:00-10:00, Paper FrPI6T7.10 | |
Bi-Level Trajectory Optimization on Uneven Terrains with Differentiable Wheel-Terrain Interaction Model |
|
Manoharan, Amith | University of Tartu |
Sharma, Aditya | Robotics Research Center, IIIT Hyderabad |
Belsare, Himani | International Institute of Information Technology, Hyderabad |
Pal, Kaustab | International Institute of Information Technology, Hyderabad |
Krishna, Madhava | IIIT Hyderabad |
Singh, Arun Kumar | University of Tartu |
Keywords: Optimization and Optimal Control, Motion and Path Planning, Autonomous Vehicle Navigation
Abstract: Navigation of wheeled vehicles on uneven terrain necessitates going beyond the 2D approaches for trajectory planning. Specifically, it is essential to incorporate the full 6dof variation of vehicle pose and its associated stability cost in the planning process. To this end, most recent works aim to learn a neural network model to predict vehicle evolution. However, such approaches are data-intensive and fraught with generalization issues. In this paper, we present a purely model-based approach that just requires the digital elevation information of the terrain. Specifically, we express the wheel-terrain interaction and 6dof pose prediction as a non-linear least squares (NLS) problem. As a result, trajectory planning can be viewed as a bi-level optimization. The inner optimization layer predicts the pose on the terrain along a given trajectory, while the outer layer deforms the trajectory itself to reduce the stability and kinematic costs of the pose. We improve the state-of-the-art in the following respects. First, we show that our NLS-based pose prediction closely matches the output of a high-fidelity physics engine. This result, coupled with the fact that we can query gradients of the NLS solver, makes our pose predictor a differentiable wheel-terrain interaction model. We further leverage this differentiability to efficiently solve the proposed bi-level trajectory optimization problem. Finally, we perform extensive experiments and comparisons with a baseline to showcase the effectiveness of our approach in obtaining smooth, stable trajectories.
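Illustrative sketch (heavily simplified relative to the paper): the NLS view of wheel-terrain interaction can be pictured as solving for the vehicle's height, roll, and pitch so that every wheel contact point lies on the terrain surface; the elevation function, wheel layout, and initial guess below are placeholders for the actual digital elevation map and vehicle geometry.

```python
import numpy as np
from scipy.optimize import least_squares

def terrain_height(x, y):
    """Stand-in for a digital elevation map."""
    return 0.2 * np.sin(x) + 0.1 * np.cos(y)

# Wheel positions in the body frame (4 wheels of a 1.0 x 0.6 m vehicle).
wheels_body = np.array([[ 0.5,  0.3, 0.0], [ 0.5, -0.3, 0.0],
                        [-0.5,  0.3, 0.0], [-0.5, -0.3, 0.0]])

def residuals(pose, x, y):
    """pose = (z, roll, pitch); residual = wheel height above the local terrain."""
    z, roll, pitch = pose
    cr, sr, cp, sp = np.cos(roll), np.sin(roll), np.cos(pitch), np.sin(pitch)
    # Rotation R = Ry(pitch) @ Rx(roll), applied to each wheel offset.
    R = np.array([[cp, sp * sr, sp * cr],
                  [0.0, cr, -sr],
                  [-sp, cp * sr, cp * cr]])
    wheels_world = wheels_body @ R.T + np.array([x, y, z])
    return wheels_world[:, 2] - terrain_height(wheels_world[:, 0], wheels_world[:, 1])

sol = least_squares(residuals, x0=[0.0, 0.0, 0.0], args=(1.0, 2.0))
print("z, roll, pitch:", np.round(sol.x, 3))
```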
|
|
09:00-10:00, Paper FrPI6T7.11 | |
Disturbance-Aware Model Predictive Control of Underactuated Robotics Systems |
|
Kim, Jiwon | KAIST |
Kim, Min Jun | KAIST |
Keywords: Optimization and Optimal Control, Motion Control
Abstract: While robust model predictive control (MPC) has been studied extensively in recent decades, addressing unmatched disturbances in underactuated robotic systems is still challenging. In this paper, we propose a method to enhance the robustness of MPC through the online estimation of disturbances using a nonlinear disturbance observer (NDOB). We call this method disturbance-aware MPC (DA-MPC), because the proposed method explicitly utilizes the estimated disturbance in the future prediction. We provide a performance analysis of the NDOB, establishing the boundedness of the error between the predicted and real states. The main advantages of the DA-MPC include its applicability to real-time control and its compatibility with off-the-shelf optimal control problem (OCP) solvers. We demonstrate the application of the proposed method on an underactuated quadrotor system. The simulation validation shows the effectiveness of the proposed method compared to L1-adaptive MPC, one of the state-of-the-art robust MPC methods.
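Illustrative sketch (not the paper's observer design): the basic idea of a disturbance observer on a 1-DOF double integrator is to drive the estimate with the mismatch between the measured acceleration and what the nominal model plus the current estimate predicts; the gains and toy plant below are placeholders.

```python
def ndob_step(d_hat, v_meas, v_prev, dt, model_accel, gain):
    """One update of a simple disturbance-observer estimate on a 1-DOF double integrator.

    d_hat       : current disturbance estimate
    v_meas      : measured velocity at this step
    v_prev      : measured velocity at the previous step
    model_accel : nominal model acceleration given the applied input
    gain        : observer bandwidth (larger = faster but noisier)
    """
    accel_meas = (v_meas - v_prev) / dt
    innovation = accel_meas - (model_accel + d_hat)
    return d_hat + gain * dt * innovation

# Toy usage: a constant unknown disturbance of 0.5 acting on a unit mass.
dt, d_true, d_hat, v = 0.01, 0.5, 0.0, 0.0
for _ in range(500):
    u = 0.2                                  # arbitrary control input (acceleration command)
    v_next = v + dt * (u + d_true)           # true plant: unit mass plus disturbance
    d_hat = ndob_step(d_hat, v_next, v, dt, model_accel=u, gain=20.0)
    v = v_next
print(f"estimated disturbance: {d_hat:.3f} (true: {d_true})")
```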
|
|
09:00-10:00, Paper FrPI6T7.12 | |
A Fast Online Omnidirectional Quadrupedal Jumping Framework Via Virtual-Model Control and Minimum Jerk Trajectory Generation |
|
Yue, Linzhu | The Chinese University of Hong Kong |
Zhang, Lingwei | Hong Kong Centre for Logistics Robotics |
Song, Zhitao | The Chinese University of Hong Kong |
Zhang, Hongbo | The Chinese University of Hong Kong |
Dong, Jinhu | Tongji University |
Zeng, Xuanqi | Chinese University of Hong Kong |
Liu, Yunhui | Chinese University of Hong Kong |
Keywords: Motion Control, Optimization and Optimal Control, Legged Robots
Abstract: Exploring the limits of quadruped robot agility, particularly in the context of rapid and real-time planning and execution of omnidirectional jump trajectories, presents significant challenges due to the complex dynamics involved, especially when considering significant impulse contacts. This paper introduces a new framework to enable fast, omnidirectional jumping capabilities for quadruped robots. Utilizing minimum-jerk technology, the proposed framework efficiently generates jump trajectories that exploit its analytical solutions, ensuring numerical stability and dynamic compatibility with minimal computational resources. Virtual model control is employed to formulate a Quadratic Programming (QP) optimization problem to accurately track the Center of Mass (CoM) trajectories during the jump phase. In contrast, whole-body control strategies facilitate precise and compliant landing motion. The framework's efficacy is demonstrated through its implementation on an enhanced version of the open-source Mini Cheetah robot. Omnidirectional jumps—including forward, backward, and other directions—were successfully executed, showcasing the robot's capability to perform rapid and consecutive jumps with an average trajectory generation and tracking solution time of merely 50 microseconds.
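Illustrative sketch (not from the paper): the closed-form minimum-jerk interpolation that such a trajectory generator builds on is the standard quintic profile between rest-to-rest boundary conditions; the jump-specific boundary conditions and dynamic-compatibility checks of the framework are not reproduced.

```python
import numpy as np

def min_jerk(x0, xf, T, t):
    """Closed-form minimum-jerk position between rest states x0 and xf over duration T."""
    tau = np.clip(t / T, 0.0, 1.0)
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5   # smoothstep with zero boundary velocity/acceleration
    return x0 + (xf - x0) * s

# Example: a hypothetical CoM height profile for a 0.3 s launch phase.
t = np.linspace(0.0, 0.3, 7)
print(np.round(min_jerk(0.28, 0.40, 0.3, t), 3))
```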
|
|
09:00-10:00, Paper FrPI6T7.13 | |
Adaptive Feedforward Super-Twisting Sliding Mode Control of Parallel Kinematic Manipulators with Real-Time Experiments |
|
Saied, Hussein | University of Montpellier, LIRMM |
Chemori, Ahmed | LIRMM, University of Montpellier, CNRS |
Bouri, Mohamed | EPFL |
El Rafei, Maher | Lebanese University, Faculty of Engineering, CRSI |
Francis, Clovis | Lebanese University |
Keywords: Robust/Adaptive Control, Dynamics, Parallel Robots
Abstract: In this paper, we propose a novel adaptive feedforward super-twisting sliding mode control algorithm to address the tracking control problem of parallel manipulators. The proposed control scheme includes three main terms: (i) the standard super-twisting algorithm, (ii) an adaptive feedforward dynamic model, and (iii) a feedback term to ensure stability. The proposed controller provides robustness against uncertainties and disturbances, is less sensitive to measurement noise, and allows adaptation of the manipulator's dynamic parameters while executing a task. Real-time experiments are conducted on a 3-DOF non-redundant Delta parallel robot, covering two main scenarios: (i) the nominal case, and (ii) robustness towards operating acceleration changes. The relevance of the proposed controller is verified experimentally in both scenarios and compared with two other controllers from the literature, namely the standard and the feedforward super-twisting sliding mode control algorithms.
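Illustrative sketch (not the paper's adaptive controller): the standard, non-adaptive super-twisting law for a scalar sliding variable, which the proposed scheme extends with an adaptive feedforward model and a stabilizing feedback term; the gains and the toy error dynamics below are placeholders.

```python
import numpy as np

def super_twisting_step(s, w, k1, k2, dt):
    """One discrete step of the standard super-twisting algorithm.

    s  : sliding variable (e.g. tracking-error surface)
    w  : integrator state of the algorithm
    k1 : gain of the continuous sqrt term
    k2 : gain of the discontinuous integral term
    """
    u = -k1 * np.sqrt(abs(s)) * np.sign(s) + w
    w = w - k2 * np.sign(s) * dt
    return u, w

# Toy usage on a first-order error dynamic s_dot = u + d, with a constant matched disturbance d.
dt, s, w = 0.001, 1.0, 0.0
for _ in range(3000):
    u, w = super_twisting_step(s, w, k1=5.0, k2=10.0, dt=dt)
    s += dt * (u + 0.3)
print(f"|s| after 3 s: {abs(s):.4f}")
```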
|
|
09:00-10:00, Paper FrPI6T7.14 | |
SoftMAC: Differentiable Soft Body Simulation with Forecast-Based Contact Model and Two-Way Coupling with Articulated Rigid Bodies and Clothes |
|
Liu, Min | Carnegie Mellon University |
Yang, Gang | National University of Singapore |
Luo, Siyuan | Xi'an Jiaotong University |
Shao, Lin | National University of Singapore |
Keywords: Simulation and Animation, Optimization and Optimal Control
Abstract: Differentiable physics simulation provides an avenue to tackle previously intractable challenges through gradient-based optimization, thereby greatly improving the efficiency of solving robotics-related problems. To apply differentiable simulation in diverse robotic manipulation scenarios, a key challenge is to integrate various materials in a unified framework. We present SoftMAC, a differentiable simulation framework that couples soft bodies with articulated rigid bodies and clothes. SoftMAC simulates soft bodies with the continuum-mechanics-based Material Point Method (MPM). We provide a novel forecast-based contact model for MPM, which effectively reduces penetration without introducing other artifacts like unnatural rebound. To couple MPM particles with deformable and non-volumetric cloth meshes, we also propose a penetration tracing algorithm that reconstructs the signed distance field in a local area. Diverging from previous works, SoftMAC simulates the complete dynamics of each modality and incorporates them into a cohesive system with an explicit and differentiable coupling mechanism. This feature empowers SoftMAC to handle a broader spectrum of interactions, such as soft bodies serving as manipulators and engaging with underactuated systems. We conducted comprehensive experiments to validate the effectiveness and accuracy of the proposed differentiable pipeline in downstream robotic manipulation applications. Supplementary materials are available on our project website at https://damianliumin.github.io/SoftMAC.
|
|
09:00-10:00, Paper FrPI6T7.15 | |
Task-Based Design and Policy Co-Optimization for Tendon-Driven Underactuated Kinematic Chains |
|
Islam, Sharfin | Columbia University |
He, Zhanpeng | Columbia University |
Ciocarlie, Matei | Columbia University |
Keywords: Underactuated Robots, Mechanism Design, Optimization and Optimal Control
Abstract: Underactuated manipulators reduce the number of bulky motors, thereby enabling compact and mechanically robust designs. However, fewer actuators than joints means that the manipulator can only access a specific manifold within the joint space, which is particular to a given hardware configuration and can be low-dimensional and/or discontinuous. Determining an appropriate set of hardware parameters for this class of mechanisms, therefore, is difficult - even for traditional task-based co-optimization methods. In this paper, our goal is to implement a task-based design and policy co-optimization method for underactuated, tendon-driven manipulators. We first formulate a general model for an underactuated, tendon-driven transmission. We then use this model to co-optimize a three-link, two-actuator kinematic chain using reinforcement learning. We demonstrate that our optimized tendon transmission and control policy can be transferred reliably to physical hardware with real-world reaching experiments.
|
|
FrPI6T8 |
Room 8 |
Robot Motion Planning V |
Teaser Session |
Chair: Quattrini Li, Alberto | Dartmouth College |
|
09:00-10:00, Paper FrPI6T8.1 | |
Search-Based Strategy for Spatio-Temporal Environmental Property Restoration |
|
Docena, Amel Nestor | Dartmouth College |
Quattrini Li, Alberto | Dartmouth College |
Keywords: Environment Monitoring and Management, Task Planning
Abstract: This paper addresses the spatio-temporal areas restoration problem: a robot with limited battery life, deployed in a known environment, needs to persistently plan a schedule to visit areas of interest and charge its battery as needed. The goal is to restore the areas' temporal properties that decay over time, such as air quality, so that the time the measured property values are below a certain threshold is minimized. This problem is different from typical problems solved in the area of monitoring a spatio-temporal environment. A related problem is the orienteering problem, where a robot visits nodes to maximize the profit collected at each visited node within a time budget. That problem is NP-hard. The typical formulation considers static profit, while we consider a time-varying one. Given a look-ahead time window or schedule length, we formulate the problem as an optimization search problem with a temporal objective, and devise a heuristic function that enables finding solutions in polynomial time. The heuristic evaluates the discounted opportunity cost of a visit, a concept borrowed from economics. We then develop a greedy algorithm that takes the immediate feasible visit that minimizes this heuristic. This strategy addresses a primary limitation of a recent approach in applications where being able to revisit highly urgent areas within the time window of the schedule is critical. We provide a theoretical analysis of lower and upper bounds for the problem. Extensive experimental results with a robotic simulator show that our method is able to keep the areas in the environment above the threshold better than other methods and closer to the optimal. This work can enable high-impact applications, such as environmental preservation.
|
|
09:00-10:00, Paper FrPI6T8.2 | |
Elliptical K-Nearest Neighbors - Path Optimization Via Coulomb's Law and Invalid Vertices in C-Space Obstacles |
|
Zhang, Liding | Technical University of Munich |
Bing, Zhenshan | Technical University of Munich |
Zhang, Yu | Technical University of Munich |
Cai, Kuanqi | Technical University of Munich |
Chen, Lingyun | Technical University of Munich |
Wu, Fan | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Motion and Path Planning, Task and Motion Planning, Path Planning for Multiple Mobile Robots or Agents
Abstract: Path planning has long been an important and active research area in robotics. To address challenges in high-dimensional motion planning, this study introduces the Force Direction Informed Trees (FDIT*), a sampling-based planner designed to enhance speed and cost-effectiveness in pathfinding. FDIT* builds upon the state-of-the-art informed sampling planner, Effort Informed Trees (EIT*), by capitalizing on often-overlooked information in invalid vertices, and incorporates principles of physical force, particularly Coulomb's law. This approach proposes an elliptical k-nearest-neighbors search method, enabling fast convergence while avoiding high-cost or infeasible paths by exploring more problem-specific, search-worthy areas. It demonstrates benefits in search efficiency and cost reduction, particularly in confined, high-dimensional environments, and can be viewed as an extension of nearest-neighbors search techniques. Fusing invalid-vertex data with physical dynamics yields force-direction-based search regions, resulting in an improved convergence rate to the optimum. FDIT* outperforms existing single-query, sampling-based planners on the tested problems in R^4 to R^16 and has been demonstrated on a real-world mobile manipulation task.
|
|
09:00-10:00, Paper FrPI6T8.3 | |
EMPOWER: Embodied Multi-Role Open-Vocabulary Planning with Online Grounding and Execution |
|
Argenziano, Francesco | Sapienza University of Rome |
Brienza, Michele | Sapienza University of Rome |
Suriani, Vincenzo | University of Basilicata |
Nardi, Daniele | Sapienza University of Rome |
Bloisi, Domenico | International University of Rome UNINT |
Keywords: Semantic Scene Understanding, Task Planning
Abstract: Task planning for robots in real-life settings presents significant challenges. These challenges stem from three primary issues: the difficulty in identifying grounded sequences of steps to achieve a goal; the lack of a standardized mapping between high-level actions and low-level commands; and the challenge of maintaining low computational overhead given the limited resources of robotic hardware. We introduce EMPOWER, a framework designed for open-vocabulary online grounding and planning for embodied agents aimed at addressing these issues. By leveraging efficient pre-trained foundation models and a multi-role mechanism, EMPOWER demonstrates notable improvements in grounded planning and execution. Quantitative results highlight the effectiveness of our approach, achieving an average success rate of 0.73 across six different real-life scenarios using a TIAGo robot.
|
|
09:00-10:00, Paper FrPI6T8.4 | |
Extended Tree Search for Robot Task and Motion Planning |
|
Ren, Tianyu | Technische Universität Darmstadt |
Chalvatzaki, Georgia | Technische Universität Darmstadt |
Peters, Jan | Technische Universität Darmstadt |
Keywords: Task and Motion Planning, Manipulation Planning, Service Robotics
Abstract: Integrated Task and Motion Planning (TAMP) offers opportunities for achieving generalized autonomy in robots but also poses challenges. It involves searching in both symbolic task space and high-dimensional motion space, while also addressing geometrically infeasible actions within its hierarchical process. We introduce a novel TAMP decision-making framework, utilizing an extended decision tree for both symbolic task planning and high-dimensional motion variable binding. Employing top-k planning, we generate a skeleton space with diverse candidate plans, seamlessly integrating it with motion variable spaces into an extended decision space. Subsequently, Monte-Carlo Tree Search (MCTS) is utilized to maintain a balance between exploration and exploitation at decision nodes, ultimately yielding optimal solutions. Our approach combines symbolic top-k planning with concrete motion variable binding, leveraging MCTS for proven optimality, resulting in a powerful algorithm for handling combinatorial complexity in long-horizon manipulation tasks. Empirical evaluations demonstrate the algorithm’s effectiveness in diverse, challenging robot tasks, in comparison with its most competitive baseline method.
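Illustrative sketch (generic, not the paper's planner): the UCB-style selection and backup that an MCTS over such an extended decision space relies on, assuming each node stores a visit count and an accumulated value; the integration with top-k skeleton planning and motion-variable binding is not shown.

```python
import math
import random

class Node:
    def __init__(self, children=None):
        self.children = children or []
        self.visits = 0
        self.value = 0.0   # accumulated reward

def ucb_select(node, c=1.4):
    """Pick the child balancing exploitation (mean value) and exploration (visit count)."""
    def score(child):
        if child.visits == 0:
            return float("inf")               # always try unvisited children first
        mean = child.value / child.visits
        return mean + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children, key=score)

def backpropagate(path, reward):
    """Propagate a rollout reward up the visited path."""
    for node in path:
        node.visits += 1
        node.value += reward

# Toy usage: one selection and backup on a two-child root.
root = Node(children=[Node(), Node()])
root.visits = 1
chosen = ucb_select(root)
backpropagate([root, chosen], reward=random.random())
print(chosen.visits, round(chosen.value, 3))
```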
|
|
09:00-10:00, Paper FrPI6T8.5 | |
HPHS: Hierarchical Planning Based on Hybrid Frontier Sampling for Unknown Environments Exploration |
|
Long, Shijun | Beijing Institute of Technology |
Li, Ying | Beijing Institute of Technology |
Wu, Chenming | Baidu Research |
Xu, Bin | Beijing Institute of Technology |
Fan, Wei | Beijing Institute of Technology |
Keywords: Task and Motion Planning, Search and Rescue Robots, Mapping
Abstract: Rapid sampling from the environment to acquire available frontier points and timely incorporating them into subsequent planning to reduce fragmented regions are critical to improving the efficiency of autonomous exploration. We propose HPHS, a fast and effective method for autonomous exploration of unknown environments. In this work, we quickly sample hybrid frontier points directly from the LiDAR data and the local map around the robot, and exploit a hierarchical planning strategy to provide the robot with a global perspective. The hierarchical planning framework divides the environment into multiple subregions and arranges the order of access to them by evaluating the revenue of each subregion. The combination of the frontier point sampling method and the hierarchical planning strategy reduces the complexity of the planning problem and mitigates the issue of region remnants during the exploration process. Detailed simulation and real-world experiments demonstrate the effectiveness and efficiency of our approach in various aspects. The source code will be released to benefit further research.
|
|
09:00-10:00, Paper FrPI6T8.6 | |
Multi-Robot Multi-Goal Mission Planning in Terrains of Varying Energy Consumption |
|
Herynek, Jáchym | Czech Technical University in Prague |
Edelkamp, Stefan | Computer Science & Artificial Intelligence Center Faculty of Ele |
Keywords: Task and Motion Planning, Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems
Abstract: This paper considers planning missions for a fleet of robots with limited energy. Each robot has a size, heading, and velocity, and its motion is described by nonlinear differential equations. The dynamics of movement, existing obstacles, multiple robots, and waypoints pose additional challenges, as the combined task and motion planning procedure must prevent collisions. On their long-term missions, robots have to visit several waypoints in a cost-minimizing manner to satisfy the overall mission task. The robots consume energy and have to be recharged. The framework guides the expansion of a motion tree via a state projection to a discrete problem, whose solutions serve as search heuristics. Our experiments highlight that, despite all these challenges, even sizable problem instances can be solved in complex environments.
|
|
09:00-10:00, Paper FrPI6T8.7 | |
A Framework for Neurosymbolic Goal-Conditioned Continual Learning for Open World Environments |
|
Lorang, Pierrick | AIT Austrian Institute of Technology GmbH - Tufts University |
Goel, Shivam | Tufts University |
Shukla, Yash | Tufts University |
Zips, Patrik | AIT Austrian Institute of Technology GmbH |
Scheutz, Matthias | Tufts University |
Keywords: Task and Motion Planning, AI-Based Methods, Continual Learning
Abstract: In dynamic open-world environments, agents continually face new challenges due to sudden and unpredictable novelties, hindering Task and Motion Planning (TAMP) in autonomous systems. We introduce a novel TAMP architecture that integrates symbolic planning with reinforcement learning to enable autonomous adaptation in such environments, operating without human guidance. Our approach employs symbolic goal representation within a goal-oriented learning framework, coupled with planner-guided goal identification, effectively managing abrupt changes where traditional reinforcement learning, re-planning, and hybrid methods fall short. Through sequential novelty injections in our experiments, we assess our method's adaptability to continual learning scenarios. Extensive simulations conducted in a robotics domain corroborate the superiority of our approach, demonstrating faster convergence to higher performance compared to traditional methods. The success of our framework in navigating diverse novelty scenarios within a continuous domain underscores its potential for critical real-world applications.
|
|
09:00-10:00, Paper FrPI6T8.8 | |
Multi-Stage Monte Carlo Tree Search for Non-Monotone Object Rearrangement Planning in Narrow Confined Environments |
|
Ren, Hanwen | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Keywords: Task and Motion Planning, Task Planning, AI-Based Methods
Abstract: Non-monotone object rearrangement planning in confined spaces such as cabinets and shelves is a widely occurring but challenging problem in robotics. Both the robot motion and the available regions for object relocation are highly constrained because of the limited space. This work proposes a Multi-Stage Monte Carlo Tree Search (MS-MCTS) method to solve non-monotone object rearrangement planning problems in confined spaces. Our approach decouples the complex problem into simpler subproblems using an object stage topology. A subgoal-focused tree expansion algorithm that jointly considers the high-level planning and the low-level robot motion is designed to reduce the search space and better guide the search process. By fitting the task into the MCTS paradigm, our method generates short object rearrangement sequences by balancing exploration and exploitation. The experiments demonstrate that our method outperforms the existing methods in terms of the planning time, the number of steps, the object moving distance and the gripper moving distance. Moreover, we deploy our MS-MCTS to a real-world robot system and verify its performance in different scenarios.
|
|
09:00-10:00, Paper FrPI6T8.9 | |
LLM^3: Large Language Model-Based Task and Motion Planning with Motion Failure Reasoning |
|
Wang, Shu | UCLA |
Han, Muzhi | University of California, Los Angeles |
Jiao, Ziyuan | Beijing Institute for General Artificial Intelligence |
Zhang, Zeyu | Beijing Institute for General Artificial Intelligence |
Wu, Ying Nian | University of California, Los Angeles |
Zhu, Song-Chun | UCLA |
Liu, Hangxin | Beijing Institute for General Artificial Intelligence (BIGAI) |
Keywords: Task and Motion Planning
Abstract: Conventional Task and Motion Planning (TAMP) approaches rely on manually designed interfaces connecting symbolic task planning with continuous motion generation. These domain-specific and labor-intensive modules are limited in addressing emerging tasks in real-world settings. Here, we present LLM^3, a novel LLM-based TAMP framework featuring a domain-independent interface. Specifically, we leverage the powerful reasoning and planning capabilities of a pre-trained LLM to propose symbolic action sequences and select continuous action parameters for motion planning. Crucially, LLM^3 incorporates motion planning feedback through prompting, allowing the LLM to iteratively refine its proposals by reasoning about motion failure. Consequently, LLM^3 interfaces between task planning and motion planning, alleviating the intricate design process of handling domain-specific messages between them. Through a series of simulations in a box-packing domain, we quantitatively demonstrate the effectiveness of LLM^3 in solving TAMP problems and its efficiency in selecting action parameters. Ablation studies underscore the significant contribution of motion failure reasoning to the success of LLM^3. Furthermore, we conduct qualitative experiments on a physical manipulator, demonstrating the practical applicability of our approach in real-world settings.
|
|
09:00-10:00, Paper FrPI6T8.11 | |
StratXplore: Strategic Novelty-Seeking and Instruction-Aligned Exploration for Vision and Language Navigation |
|
Gopinathan, Muraleekrishna | Edith Cowan University |
Abu-Khalaf, Jumana | Edith Cowan University |
Suter, David | Edith Cowan University, School of Science, Centre of AI and Mach |
Masek, Martin | ECU |
Keywords: Task and Motion Planning, Vision-Based Navigation, Autonomous Agents
Abstract: Embodied navigation requires robots to understand and interact with the environment based on given tasks. Vision-Language Navigation (VLN) is an embodied navigation task in which a robot navigates within previously seen and unseen environments, based on linguistic instructions and visual inputs. VLN agents need access to both local and global action spaces; the former for immediate decision-making and the latter for recovering from navigational mistakes. Prior VLN agents rely only on instruction-viewpoint alignment for local and global decision-making and back-track to a previously visited viewpoint if the instruction and the current viewpoint mismatch. These methods are prone to mistakes, due to the complexity of the instructions and the partial observability of the environment. We posit that back-tracking is sub-optimal and that an agent aware of its mistakes can recover efficiently. For optimal recovery, exploration should be extended to unexplored viewpoints (or frontiers). The optimal frontier is a recently observed but unexplored viewpoint that aligns with the instruction and is novel. We introduce a memory-based and mistake-aware path planning strategy for VLN agents, called StratXplore, that performs global and local action planning to select the optimal frontier for path correction. The proposed method collects all past actions and viewpoint features during navigation and then selects the optimal frontier suitable for recovery. Experimental results show that this simple yet effective strategy improves the success rate on two VLN datasets with different task complexities.
|
|
09:00-10:00, Paper FrPI6T8.12 | |
Efficient Target Singulation with Multi-Fingered Gripper Using Propositional Logic |
|
Kim, Hyojeong | Korea Institute of Science and Technology (KIST) |
Jo, Jeong Yong | Hanyang University |
Lim, Myo-Taeg | Korea University |
Kim, ChangHwan | Korea Institute of Science and Technology |
Keywords: Task and Motion Planning, Manipulation Planning, Task Planning
Abstract: When multiple pieces of tableware are closely packed on a table, rearranging obstacles to make space is necessary to grasp the target, a task often called target singulation. Due to the nature of handling fragile tableware (e.g., plates, bowls), we make a few assumptions for the target singulation problem. First, tableware is grasped with a multi-fingered gripper; second, rearrangement is based on prehensile motions like pick-and-place. Under these assumptions, we aim to generate a relocation plan that guarantees global optimality. Furthermore, if no relocation plan can singulate the target, we aim to determine this quickly. Therefore, we propose a search method that utilizes the relationship between the object and its nearby obstacles, expressed in propositional logic. We define the problem as determining logical entailment (i.e., whether the target can be singulated) and expand the search tree from the target while generating an optimal relocation plan. We demonstrate the performance of our algorithm by increasing the number of objects and validate the plan in a simulation environment.
|
|
09:00-10:00, Paper FrPI6T8.13 | |
Reactive Temporal Logic-Based Planning and Control for Interactive Robotic Tasks |
|
Savvas Sadiq Ali, Farhad Nawaz | University of Pennsylvania |
Peng, Shaoting | University of Pennsylvania |
Lindemann, Lars | University of Southern California |
Figueroa, Nadia | University of Pennsylvania |
Matni, Nikolai | University of Pennsylvania |
Keywords: Task and Motion Planning, Physical Human-Robot Interaction, Formal Methods in Robotics and Automation
Abstract: Robots interacting with humans must be safe and reactive, and must adapt online to unforeseen environmental and task changes. Achieving these requirements concurrently is a challenge, as interactive planners lack formal safety guarantees, while safe motion planners lack the flexibility to adapt. To tackle this, we propose a modular control architecture that generates both safe and reactive motion plans for human-robot interaction by integrating temporal logic-based discrete task-level plans with continuous Dynamical System (DS)-based motion plans. We formulate a reactive temporal logic formula that enables users to define task specifications through structured language, and propose a planning algorithm at the task level that generates a sequence of desired robot behaviors while being adaptive to environmental changes. At the motion level, we incorporate control Lyapunov functions and control barrier functions to compute stable and safe continuous motion plans for two types of robot behaviors: (i) complex, possibly periodic motions given by autonomous DS and (ii) time-critical tasks specified by Signal Temporal Logic (STL). Our methodology is demonstrated on the Franka robot arm performing wiping tasks on a whiteboard and on a mannequin, while remaining compliant to human interaction and adaptive to environmental changes.
|
|
09:00-10:00, Paper FrPI6T8.14 | |
NLNS-MASPF for Solving Multi-Agent Scheduling and Path-Finding |
|
Park, Heemang | KAIST |
Ahn, Kyuree | Omelet |
Park, Jinkyoo | Korea Advanced Institute of Science and Technology |
Keywords: Task Planning, Path Planning for Multiple Mobile Robots or Agents, Cooperating Robots
Abstract: In this work, we propose a novel method, NLNS-MASPF, to solve the Multi-Agent Scheduling and Pathfinding (MASPF) problem. The problem exhibits a bi-level structure, consisting of High-level Scheduling and Low-level Pathfinding. Our method applies a graph neural network in the high-level scheduling process and utilizes a MAPF solver with a schedule segmenting technique in the low-level pathfinding process. Through these approaches, NLNS-MASPF has experimentally demonstrated superior performance compared to the previous state-of-the-art MASPF algorithm, LNS-PBS, in solving the MASPF problem.
|
|
09:00-10:00, Paper FrPI6T8.15 | |
DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment |
|
Guo, Yanjiang | Tsinghua University |
Wang, Yen-Jen | Tsinghua University |
Zha, Lihan | Stanford University |
Chen, Jianyu | Tsinghua University |
Keywords: Task Planning, Machine Learning for Robot Control, Manipulation Planning
Abstract: Large language models (LLMs) encode a vast amount of semantic knowledge and possess remarkable understanding and reasoning capabilities. Previous work has explored how to ground LLMs in robotic tasks to generate feasible and executable textual plans. However, low-level execution in the physical world may deviate from the high-level textual plan due to environmental perturbations or imperfect controller design. In this paper, we propose DoReMi, a novel language model grounding framework that enables immediate Detection and Recovery from Misalignments between plan and execution. Specifically, we leverage LLMs to play a dual role, aiding not only in high-level planning but also in generating constraints that can indicate misalignment during execution. Vision language models (VLMs) are then utilized to detect constraint violations continuously. Our pipeline can monitor the low-level execution and enable timely recovery if a plan-execution misalignment occurs. Experiments on various complex tasks, including robot arms and humanoid robots, demonstrate that our method can lead to higher task success rates and shorter task completion times.
|
|
09:00-10:00, Paper FrPI6T8.16 | |
Sequential Discrete Action Selection Via Blocking Conditions and Resolutions |
|
Merz Hoffmeister, Liam | Yale University |
Scassellati, Brian | Yale |
Rakita, Daniel | Yale University |
Keywords: Task Planning
Abstract: In this work, we introduce a strategy that frames the sequential action selection problem for robots in terms of resolving blocking conditions, i.e., situations that impede progress on an action en route to a goal. This strategy allows a robot to make one-at-a-time decisions that take in pertinent contextual information and swiftly adapt and react to current situations. We present a first instantiation of this strategy that combines a state-transition graph and a zero-shot Large Language Model (LLM). The state-transition graph tracks which previously attempted actions are currently blocked and which candidate actions may resolve existing blocking conditions. This information from the state-transition graph is used to automatically generate a prompt for the LLM, which then uses the given context and set of possible actions to select a single action to try next. This selection process is iterative, with each chosen and executed action further refining the state-transition graph, continuing until the agent either fulfills the goal criteria or encounters a termination condition. We demonstrate the effectiveness of our approach by comparing it to various LLM and traditional task-planning methods in a testbed of simulation experiments. We discuss the implications of our work based on our results.
|
|
09:00-10:00, Paper FrPI6T8.17 | |
SMART-LLM: Smart Multi-Agent Robot Task Planning Using Large Language Models |
|
Kannan, Shyam Sundar | Purdue University |
Venkatesh, L.N Vishnunandan | Purdue University |
Min, Byung-Cheol | Purdue University |
Keywords: Task Planning, Multi-Robot Systems
Abstract: In this work, we introduce SMART-LLM, an innovative framework designed for embodied multi-robot task planning. SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models (LLMs), harnesses the power of LLMs to convert high-level task instructions provided as input into a multi-robot task plan. It accomplishes this by executing a series of stages, including task decomposition, coalition formation, and task allocation, all guided by programmatic LLM prompts within the few-shot prompting paradigm. We create a benchmark dataset designed for validating the multi-robot task planning problem, encompassing four distinct categories of high-level instructions that vary in task complexity. Our evaluation experiments span both simulation and real-world scenarios, demonstrating that the proposed model can achieve promising results for generating multi-robot task plans. The experimental videos, code, and datasets from the work can be found at https://sites.google.com/view/smart-llm/.
|
|
FrPI6T9 |
Room 9 |
Telerobotics and Teleoperation |
Teaser Session |
Chair: Piater, Justus | University of Innsbruck |
|
09:00-10:00, Paper FrPI6T9.1 | |
Local Linearity Is All You Need (in Data Driven Teleoperation) |
|
Przystupa, Michael | University of Alberta |
Gidel, Gauthier | Université De Montréal |
Taylor, Matthew | University of Alberta |
Jagersand, Martin | University of Alberta |
Piater, Justus | University of Innsbruck |
Tosatto, Samuele | University of Innsbruck |
Keywords: Telerobotics and Teleoperation, Imitation Learning, Deep Learning Methods
Abstract: One of the critical aspects of assistive robotics is to provide control of a high-dimensional robot from a low-dimensional user input (e.g., a 2D joystick). Data-driven teleoperation seeks to provide an intuitive user interface, called an action map, to map the low-dimensional input to robot velocities from human demonstrations. Action maps are machine learning models trained on robotic demonstration data to map user input directly to desired movements, as opposed to aspects of robot pose (“move to cup or pour content” vs. “move along x- or y-axis”). Many works have investigated nonlinear action maps with multi-layer perceptrons, but recent work suggests that local-linear neural approximations provide better control of the system. However, local-linear models assume actions exist on a linear subspace and may not capture nuanced motions in training data. In this work, we hypothesize that local-linear neural networks are effective because they make the action map odd w.r.t. the user input, enhancing the intuitiveness of the controller. Based on this assumption, we propose two nonlinear means of encoding odd behavior that do not constrain the action map to a locally linear function. However, our analysis reveals that these models effectively behave like local-linear models for relevant mappings between user joysticks and robot movements. We support this claim in simulation, and show on a real-world use case that there is no statistical benefit of using nonlinear maps, according to the users' experience. These negative results suggest that further investigation into model architectures beyond local-linear models may offer diminishing returns for improving user experience in data-driven teleoperation systems.
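Illustrative sketch (an assumption, not necessarily one of the paper's two encodings): one simple way to make an arbitrary network odd with respect to the user input is to keep only its antisymmetric part, as in the following PyTorch snippet with hypothetical input and action dimensions.

```python
import torch
import torch.nn as nn

class OddActionMap(nn.Module):
    """Wrap an arbitrary network g so the resulting action map is odd: f(-u) = -f(u)."""
    def __init__(self, input_dim=2, action_dim=7, hidden=64):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(input_dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, action_dim))

    def forward(self, u):
        return 0.5 * (self.g(u) - self.g(-u))   # antisymmetric part of g

f = OddActionMap()
u = torch.randn(1, 2)                            # hypothetical 2D joystick input
print(torch.allclose(f(u), -f(-u), atol=1e-6))   # True: reversing the joystick reverses the action
```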
|
|
09:00-10:00, Paper FrPI6T9.2 | |
GELLO: A General, Low-Cost, and Intuitive Teleoperation Framework for Robot Manipulators |
|
Wu, Shiyao | University of California, Berkeley |
Shentu, Yide | University of California -- Berkeley |
Yi, Zhongke | Covariant |
Lin, Xingyu | UC Berkeley |
Abbeel, Pieter | UC Berkeley |
Keywords: Telerobotics and Teleoperation, Learning from Demonstration, Bimanual Manipulation
Abstract: Humans can teleoperate robots to accomplish complex manipulation tasks. Imitation learning has emerged as a powerful framework that leverages human teleoperated demonstrations to teach robots new skills. However, the performance of the learned policies is bottlenecked by the quality, scale, and variety of the demonstration data. In this paper, we aim to lower the barrier to collecting large and high-quality human demonstration data by proposing a GEneraL framework for building LOw-cost and intuitive teleoperation systems for robotic manipulation (GELLO). Given a target robot arm, we build a GELLO controller device that has the same kinematic structure as the target arm, leveraging 3D-printed parts and economical off-the-shelf motors. GELLO is easy to build and intuitive to use. Through an extensive user study, we show that GELLO enables more reliable and efficient demonstration collection compared to other cost efficient teleoperation devices commonly used in the imitation learning literature such as virtual reality controllers and 3D spacemouses. We further demonstrate the capabilities of GELLO for performing complex bi-manual and contact-rich manipulation tasks. To make GELLO accessible to everyone, we have designed and built GELLO systems for 3 commonly used robotic arms: Franka, UR5, and xArm. All software and hardware are open-sourced and can be found on our website: https://wuphilipp.github.io/gello/.
|
|
09:00-10:00, Paper FrPI6T9.3 | |
Real-Time Dexterous Telemanipulation with an End-Effect-Oriented Learning-Based Approach |
|
Wang, Haoyang | Oklahoma State University |
Bai, He | Oklahoma State University |
Zhang, Xiaoli | Colorado School of Mines |
Jung, Yunsik | Colorado School of Mines |
Bowman, Michael | University of Pennsylvania |
Tao, Lingfeng | Oklahoma State University |
Keywords: Telerobotics and Teleoperation, Dexterous Manipulation, Reinforcement Learning
Abstract: Dexterous telemanipulation is crucial in advancing human-robot systems, especially in tasks requiring precise and safe manipulation. However, it faces significant challenges due to the physical differences between human and robotic hands, the dynamic interaction with objects, and the indirect control and perception of the remote environment. Current approaches predominantly focus on mapping the human hand onto robotic counterparts to replicate motions, which exhibits a critical oversight: it often neglects the physical interaction with objects and relegates the interaction burden to the human to adapt and make laborious adjustments in response to the indirect and counter-intuitive observation of the remote environment. This work develops an End-Effects-Oriented Learning-based Dexterous Telemanipulation (EFOLD) framework to address telemanipulation tasks. EFOLD models telemanipulation as a Markov Game, introducing multiple end-effect features to interpret the human operator's commands during interaction with objects. These features are used by a Deep Reinforcement Learning policy to control the robot and reproduce such end effects. EFOLD was evaluated with real human subjects and two end-effect extraction methods for controlling a virtual Shadow Robot Hand in telemanipulation tasks. EFOLD achieved real-time control capability with low command following latency (delay<0.11s) and highly accurate tracking (MSE<0.084 rad).
|
|
09:00-10:00, Paper FrPI6T9.4 | |
Development of a Bilateral Control Teleoperation System for Bipedal Humanoid Robot Utilizing Foot Sole Haptics Feedback |
|
Shen, Yang | Faculty of Science and Engineering, Waseda University |
Kanazawa, Masanobu | Waseda University |
Mori, Kazuki | Waseda University |
Isono, Ryu | Faculty of Science and Engineering, Waseda University |
Nakazawa, Yuri | Waseda University |
Takanishi, Atsuo | Waseda University |
Otani, Takuya | Shibaura Institute of Technology |
Keywords: Telerobotics and Teleoperation, Haptics and Haptic Interfaces, Humanoid Robot Systems
Abstract: Teleoperating bipedal humanoid robots presents unique challenges, including decreased stability and reduced operator presence. This paper addresses these challenges by proposing a method that leverages the operator's inherent sense of stability, using feedback from a sole haptics display to operate a bipedal humanoid robot. We developed a bilateral control system that integrates a device replicating sole haptic feedback and provides the operator with feedback on changes in the robot's center of gravity. We conducted operating experiments in the forward-backward direction to evaluate the system and investigate the effect of sole haptics feedback on robot operation. The results demonstrate that operating with both vision and sole haptics feedback significantly reduces the robot's fall rate, by over 56%, when disturbances are applied, compared to using only vision feedback. Moreover, operators reported a 21% higher sense of presence with both vision and sole haptics feedback compared to using only vision feedback.
|
|
09:00-10:00, Paper FrPI6T9.5 | |
Immersive Human-In-The-Loop Control: Real-Time 3D Surface Meshing and Physics Simulation |
|
Akturk, Sait | University of Alberta |
Valentine, Justin | University of Alberta |
Ahmad, Junaid | University of Alberta |
Jagersand, Martin | University of Alberta |
Keywords: Telerobotics and Teleoperation, Virtual Reality and Interfaces, Simulation and Animation
Abstract: This paper introduces the TactiMesh Teleoperator Interface (TTI), a novel predictive visual and haptic system designed explicitly for human-in-the-loop robot control using a head-mounted display (HMD). By employing simultaneous localization and mapping (SLAM) in tandem with a space carving method (CARV), TTI creates a real-time 3D surface mesh of remote environments from an RGB camera mounted on a Barrett WAM arm. The generated mesh is integrated into a physics simulator featuring a digital twin of the WAM robot arm to create a virtual environment. In this virtual environment, TTI provides haptic feedback directly in response to the operator's movements, eliminating the problem of delayed response from the haptic follower robot. Furthermore, texturing the 3D mesh with keyframes from SLAM allows the operator to control the viewpoint of the HMD independently of the arm-mounted robot camera, giving better visual immersion and improving manipulation speed. Incorporating predictive visual and haptic feedback significantly improves teleoperation in applications such as search and rescue, inspection, and remote maintenance.
|
|
09:00-10:00, Paper FrPI6T9.6 | |
6D Variable Virtual Fixtures for Telemanipulated Insertion Tasks |
|
Schwarz, Stephan Andreas | Chemnitz University of Technology |
Thomas, Ulrike | Chemnitz University of Technology |
Keywords: Telerobotics and Teleoperation, Physical Human-Robot Interaction, Safety in HRI
Abstract: Telemanipulation enables humans to perform tasks in dangerous environments without exposing them to any risk. The COVID-19 pandemic sadly showed that these environments can also include the treatment of and interaction with infected patients. Since human-robot interactions demand low interaction forces yet high precision, telemanipulation often results in a high mental workload for the operator. To overcome this, we present a virtual guidance approach for performing telemanipulated insertion tasks. A nasopharyngeal swab sampling procedure is taken as a use case. We extend our previously presented approach by adding an additional position fixture, introducing distance-dependent variable stiffness values, and guaranteeing stability using energy tanks. Based on RGB-D data, the operator is guided towards a desirable insertion line while approaching the nostril. The distance-dependent stiffness values increase the smoothness of the fixture. Since variable stiffness values can result in unstable behavior, energy tanks for the fixtures are introduced. Experiments show the improvements compared to our previous approach. Further, a comparison between guided and unguided samplings performed by an expert user gives a first impression of the improvements resulting from the fixture.
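Illustrative sketch (not the paper's formulation): a common way to keep variable-stiffness virtual fixtures passive is to gate stiffness increases through a scalar energy tank that is refilled by dissipated energy; the function below is a generic toy version with placeholder limits.

```python
def energy_tank_update(tank, dissipated_power, stiffening_power, dt,
                       tank_min=0.1, tank_max=2.0):
    """One step of a scalar energy tank gating variable-stiffness virtual fixtures.

    tank              : current stored energy [J]
    dissipated_power  : power dissipated by damping, which refills the tank [W]
    stiffening_power  : power a requested stiffness increase would inject [W]
    Returns the updated tank level and whether the stiffness change is allowed.
    """
    candidate = tank + dt * (dissipated_power - stiffening_power)
    allow_stiffening = candidate > tank_min
    if not allow_stiffening:
        candidate = tank + dt * dissipated_power   # reject the injection, keep refilling
    return min(candidate, tank_max), allow_stiffening

# Toy usage with placeholder power values.
tank, allowed = energy_tank_update(tank=0.5, dissipated_power=0.2,
                                   stiffening_power=1.0, dt=0.001)
print(round(tank, 4), allowed)
```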
|
|
09:00-10:00, Paper FrPI6T9.7 | |
Evaluation of Predictive Display for Teleoperated Driving Using CARLA Simulator |
|
Kashwani, Fatima | Khalifa University |
Hassan, Bilal | Khalifa University, Abu Dhabi |
Kong, Peng-Yong | Khalifa University |
Khonji, Majid | Khalifa University |
Dias, Jorge | Khalifa University |
Keywords: Telerobotics and Teleoperation, Object Detection, Segmentation and Categorization, Machine Learning for Robot Control
Abstract: Before the worldwide deployment of autonomous vehicles, it is essential to implement intermediate solutions with partial autonomy. One such solution is vehicle teleoperation, the act of controlling a vehicle from a distance. In real-time applications of teleoperation, it is often pertinent to use augmented reality components within the teleoperator view, which are referred to as a predictive display. In this work, we evaluate our predictive display method, a guiding path based on the free space in the environment. The path is generated based on our Dual Transformer Network (DTNet), which uses both object detection and lane semantic segmentation to define the free space in the environment. While the model has previously performed well on image data, it is necessary to observe its accuracy in the presence of time delay and packet loss to assess its performance in a real-time setting. Thus, in this work, we use the CARLA simulator to compare the detected free space on the teleoperator side to the true free space on the vehicle side across different values of time delay and packet loss. Under optimal network conditions, our model yielded a remarkable 87.9% DSC score and 81.3% IoU score. Defining our minimum performance threshold as 80% DSC and 70% IoU, we conclude that our model can effectively mitigate the challenges of time delay below 100 ms and packet loss below 1%, both of which represent substantial tolerances.
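Illustrative sketch (not from the paper): the DSC and IoU scores quoted above can be computed from two binary free-space masks as follows; the rectangles in the toy usage merely stand in for teleoperator-side and vehicle-side masks.

```python
import numpy as np

def dice_and_iou(pred, gt):
    """Dice (DSC) and IoU between two binary free-space masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum() + 1e-9)
    iou = inter / (union + 1e-9)
    return dice, iou

# Toy usage with two overlapping rectangles as placeholder masks.
a = np.zeros((100, 100), dtype=np.uint8); a[20:80, 20:80] = 1
b = np.zeros((100, 100), dtype=np.uint8); b[25:85, 25:85] = 1
print("DSC=%.3f, IoU=%.3f" % dice_and_iou(a, b))
```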
|
|
09:00-10:00, Paper FrPI6T9.8 | |
User-Customizable Shared Control for Robot Teleoperation Via Virtual Reality |
|
Luo, Rui | Northeastern University |
Zolotas, Mark | Northeastern University |
Moore, Drake | Northeastern University |
Padir, Taskin | Northeastern University |
Keywords: Telerobotics and Teleoperation, Human Factors and Human-in-the-Loop, Virtual Reality and Interfaces
Abstract: Shared control can ease and enhance a human operator's ability to teleoperate robots, particularly for intricate tasks demanding fine control over multiple degrees of freedom. However, the arbitration process dictating how much autonomous assistance to administer in shared control can confuse novice operators and impede their understanding of the robot's behavior. To overcome these adverse side-effects, we propose a novel formulation of shared control that enables operators to tailor the arbitration to their unique capabilities and preferences. Unlike prior approaches to customizable shared control where users could indirectly modify the latent parameters of the arbitration function by issuing a feedback command, we instead make these parameters observable and directly editable via a virtual reality (VR) interface. We present our user-customizable shared control method for a teleoperation task in SE(3), known as the buzz wire game. A user study is conducted with participants teleoperating a robotic arm in VR to complete the game. The experiment spanned two weeks per subject to investigate longitudinal trends. Our findings reveal that users allowed to interactively tune the arbitration parameters across trials generalize well to adaptations in the task, exhibiting improvements in precision and fluency over direct teleoperation and conventional shared control.
|
|
09:00-10:00, Paper FrPI6T9.9 | |
Exploring Cognitive Load Dynamics in Human-Machine Interaction for Teleoperation: A User-Centric Perspective on Remote Operation System Design |
|
García Cárdenas, Juan José | ENSTA - Institut Polytechnique De Paris |
Hei, Xiaoxuan | ENSTA Paris, Institut Polytechnique De Paris |
Tapus, Adriana | ENSTA Paris, Institut Polytechnique De Paris |
Keywords: Telerobotics and Teleoperation, Human Factors and Human-in-the-Loop, Design and Human Factors
Abstract: Teleoperated robots, especially in hazardous environments, integrate human cognition with machine efficiency, but can increase cognitive load, causing stress and reducing task performance and safety. This study examines the impact of the information available to the operator on cognitive load, physiological responses (e.g., GSR, blinking, facial temperature), and performance during teleoperation in three conditions: C1 - in presence, C2 - remote with visual feedback, and C3 - remote with a telepresence robot. The findings from our user study involving 20 participants show that information availability significantly impacts perceived cognitive load, as evidenced by the differences observed between conditions in our analysis. Furthermore, the results indicated that blinking rates varied significantly among the conditions. The results also underline that individuals with higher error scores on the spatial orientation test (SOT), reflecting lower spatial ability, are more likely to experience failure in conditions 2 and 3. The results show that information availability significantly affects cognitive load and teleoperation performance, especially depth perception of the robot's actions. Additionally, the thermal and GSR data findings indicate an increase in stress and anxiety levels when operators perform under conditions 2 and 3, thus corroborating an increase in the user's cognitive load.
|
|
09:00-10:00, Paper FrPI6T9.10 | |
Deep Learning-Based Delay Compensation Framework for Teleoperated Wheeled Rovers on Soft Terrains |
|
Abubakar, Ahmad | Khalifa University |
Zweiri, Yahya | Khalifa University |
Yakubu, Mubarak | Khalifa University |
Alhammadi, Ruqqayya | Khalifa University |
Mohiuddin, Mohammed | Khalifa University |
Haddad, Abdel Gafoor | Khalifa University |
Dias, Jorge | Khalifa University |
Seneviratne, Lakmal | Khalifa University |
Keywords: Telerobotics and Teleoperation, Wheeled Robots, Space Robotics and Automation
Abstract: The difficulties posed by terrain-induced slippage for wheeled rovers traversing soft terrains are critical to ensuring safe and precise mobility. While bilateral teleoperation systems offer a promising solution to this issue, the inherent network-induced delays hinder the fidelity of the closed-loop integration, potentially compromising the teleoperation system's controls and resulting in poor command-tracking performance. This work introduces a new model-free, deep learning-based predictor framework designed to improve prediction performance and effectively compensate for large network delays in teleoperated wheeled rovers. Our approach employs a recurrent neural network (RNN) to achieve a significant improvement in modeling complexity and prediction accuracy. In particular, our framework consists of two distinct predictors, each tailored to the forward and backward coupling variables of the teleoperated wheeled rover. Human-in-the-loop experiments were conducted to validate the effectiveness of the developed framework in compensating for the delays encountered by teleoperated wheeled rovers coupled with terrain-induced slippage. The results confirm the framework's improved prediction accuracy, as evidenced by better performance and transparency metrics, which in turn lead to better command-tracking performance.
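For readers unfamiliar with model-free delay prediction, the sketch below shows a generic GRU-based predictor in PyTorch that maps a window of past coupling signals to an estimate of the signal after the network delay; the feature choice, window length, and delay horizon are assumptions for illustration and do not reproduce the authors' architecture.

import torch
import torch.nn as nn

class DelayPredictor(nn.Module):
    # Maps a window of past coupling signals to an estimate of the
    # signal value 'delay_steps' into the future.
    def __init__(self, n_features=2, hidden=64, delay_steps=10):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)
        self.delay_steps = delay_steps

    def forward(self, window):           # window: (batch, T, n_features)
        _, h = self.gru(window)          # h: (1, batch, hidden)
        return self.head(h.squeeze(0))   # predicted value after the delay

model = DelayPredictor()
past = torch.randn(8, 50, 2)             # e.g. commanded and measured wheel speed
print(model(past).shape)                  # torch.Size([8, 2])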
|
|
09:00-10:00, Paper FrPI6T9.11 | |
An Optimization Based Scheme for Real-Time Transfer of Human Arm Motion to Robot Arm |
|
Yang, Zhelin | Technical University of Munich |
Bien, Seongjin | Technical University of Munich |
Nertinger, Simone | Technical University of Munich |
Naceri, Abdeldjallil | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Keywords: Telerobotics and Teleoperation, Motion Control, Human-Centered Robotics
Abstract: Performing human-like motion is crucial for service humanoid robots. Real-time motion retargeting allows clear observation of the robot's pose and provides instant feedback during human demonstrator actions. This paper presents an optimization-based real-time anthropomorphic motion retargeting framework for transferring human arm motion to a robot arm. The framework is generic, applicable to both spherical-rotational-spherical (SRS) and non-SRS robot arms. We introduce the normalized normal vector of the arm plane as an anthropomorphic criterion within our framework. The method is validated on a service humanoid robot, with both static and dynamic evaluations. The statistical analysis shows that our method maintains strong anthropomorphic features while ensuring accurate wrist pose tracking.
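One plausible reading of the arm-plane criterion is the unit normal of the plane spanned by the upper-arm and forearm vectors, sketched below; the joint names and degenerate-case handling are assumptions, not the paper's exact definition.

import numpy as np

def arm_plane_normal(shoulder, elbow, wrist):
    # Unit normal of the plane spanned by the upper-arm and forearm vectors.
    upper = np.asarray(elbow, dtype=float) - np.asarray(shoulder, dtype=float)
    fore = np.asarray(wrist, dtype=float) - np.asarray(elbow, dtype=float)
    n = np.cross(upper, fore)
    norm = np.linalg.norm(n)
    if norm < 1e-9:            # fully extended arm: the plane is undefined
        return None
    return n / norm

print(arm_plane_normal([0, 0, 0], [0.30, 0.0, -0.10], [0.55, 0.10, -0.10]))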
|
|
09:00-10:00, Paper FrPI6T9.12 | |
A Tetherless Soft Robotic Wearable Haptic Human Machine Interface for Robot Teleoperation |
|
Thakur, Shilpa | Worcester Polytechnic Institute |
Diaz Armas, Nathalia | University of Massachusetts Lowell |
Adegite, Joseph | Worcester Polytechnic Institute |
Pandey, Ritwik | Worcester Polytechnic Institute |
Mead, Joey | University of Massachusetts Lowell |
Rao, Pratap | Worcester Polytechnic Institute |
Onal, Cagdas | WPI |
Keywords: Telerobotics and Teleoperation, Haptics and Haptic Interfaces, Soft Sensors and Actuators
Abstract: This work describes the development, demonstration, and performance evaluation study of a wearable human-machine interface (HMI) for robotic teleoperation. We present a novel tetherless HMI in the form of a backpack, body-worn 3-D arm motion capture sensors, finger flexion sensors, and haptic feedback muscles embedded in a glove. The system is integrated into a complete teleoperation framework, enabling users to be immersed in a remote environment through virtual reality headgear, facilitating intuitive manipulation of an industrial articulated arm. The designed HMI measures the kinematic configuration of the user’s arm, hand, and fingers using multiple inertial measurement units (IMUs) and capacitive sensors, respectively. Subsequently, the captured data is channeled into the teleoperation software stack. The gripping forces experienced at the robot’s end-effector are acquired using a custom three-dimensional Hall-effect magnetic sensor. The system simultaneously renders the kinesthetic and tactile feedback on the user’s fingers through custom-designed pneumatically actuated soft robotic haptic muscles. The efficacy of the HMI and the teleoperation system was tested and evaluated by conducting user study experiments, which showed 31.4% faster teleoperation vs. a keypad controller and 60% less gripping force exerted when the haptics were enabled. The findings of the pilot study guided the design and prototype development of a printed-electronics-based stretchable sleeve and glove motion capture unit to improve the portability, ergonomics, and user experience of the HMI.
|
|
FrPI6T10 |
Room 10 |
Simultaneous Localization and Mapping (SLAM) VI |
Teaser Session |
Chair: Yue, Yufeng | Beijing Institute of Technology |
Co-Chair: Kornilova, Anastasiia | Skolkovo Institute of Science and Technology |
|
09:00-10:00, Paper FrPI6T10.1 | |
PS-Loc: Robust LiDAR Localization with Prior Structural Reference |
|
Li, Rui | Shanghai Jiao Tong University |
Zhao, Wentao | Shanghai Jiao Tong University |
Deng, Tianchen | Shanghai Jiao Tong University |
Yanbo, Wang | Shanghai Jiao Tong University |
Wang, Jingchuan | Shanghai Jiao Tong University |
Keywords: Localization, SLAM
Abstract: Prior structural references such as floor plans are readily accessible in indoor scenes and exhibit the potential to improve localization quality without requiring a previously built high-precision map. This paper introduces a novel optimal transport-based framework for prior structural reference-based localization, aiming to improve the robustness of robot localization. Leveraging the spatial relations of structures, a matching method based on optimal transport theory is proposed that improves the robustness of matching results in dynamic scenes and under rapid rotation. Additionally, this paper handles metric inaccuracies in the known structural reference by implementing a prior-guided, plane-adjustment-based updating strategy. This strategy combines prior and observational information to jointly optimize the structural information within a sliding window. The performance of the framework is validated through real-world experiments, demonstrating superior accuracy and robustness to disturbances from dynamic occlusion and rapid rotation compared to common state-of-the-art SLAM and localization methods.
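As background, entropy-regularized optimal transport (the Sinkhorn iteration) produces a soft matching between two point sets, as in the minimal sketch below; the cost definition and uniform marginals are illustrative assumptions, not the exact formulation used in the paper.

import numpy as np

def sinkhorn(cost, reg=0.05, n_iters=200):
    # Entropy-regularized optimal transport between two uniform marginals.
    n, m = cost.shape
    K = np.exp(-cost / reg)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return np.diag(u) @ K @ np.diag(v)    # transport plan acts as a soft matching

# Toy example: match 3 observed wall segments to 4 floor-plan segments
# using squared distance between segment midpoints as the cost.
obs = np.array([[1.0, 0.2], [2.1, 1.0], [0.1, 2.0]])
ref = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 2.0], [3.0, 3.0]])
cost = ((obs[:, None, :] - ref[None, :, :]) ** 2).sum(-1)
print(sinkhorn(cost).round(3))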
|
|
09:00-10:00, Paper FrPI6T10.2 | |
Backpropagation-Based Analytical Derivatives of EKF Covariance for Active Sensing |
|
Benhamou, Jonas | Mines Paris/Safran |
Bonnabel, Silvere | Mines ParisTech |
Chapdelaine, Camille | Safran SA |
Keywords: Localization, Optimization and Optimal Control, Reactive and Sensor-Based Planning
Abstract: To enhance the accuracy of robot state estimation, active sensing (or perception-aware) methods seek trajectories that maximize the information gathered by the sensors. To this aim, one possibility is to seek trajectories that minimize the (estimation error) covariance matrix output by an extended Kalman filter (EKF) w.r.t. its control inputs over a given horizon. However, this is computationally demanding. In this article, we derive novel backpropagation analytical formulas for the derivatives of the covariance matrices of an EKF w.r.t. all its inputs. We then leverage the obtained analytical gradients as an enabling technology to derive perception-aware optimal motion plans. Simulations validate the approach, showcasing improvements in execution time, notably over PyTorch's automatic differentiation. Experimental results on a real vehicle also support the method.
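To make the objective concrete, the sketch below shows the automatic-differentiation baseline the paper compares against: propagating an EKF covariance through a toy unicycle model in PyTorch and backpropagating the trace of the final covariance to the control inputs. The dynamics, noise levels, and cost are illustrative assumptions; the paper's contribution is the analytical (and faster) counterpart of this gradient.

import torch

def step_jacobian(v, theta, dt):
    # Jacobian of the unicycle motion model w.r.t. the state (x, y, theta).
    one, zero = torch.ones(()), torch.zeros(())
    return torch.stack([
        torch.stack([one, zero, -v * torch.sin(theta) * dt]),
        torch.stack([zero, one, v * torch.cos(theta) * dt]),
        torch.stack([zero, zero, one]),
    ])

def propagate_covariance(P0, controls, Q, dt=0.1):
    # EKF covariance propagation (prediction steps only) along a control sequence.
    P, theta = P0, torch.zeros(())
    for v, w in controls:
        F = step_jacobian(v, theta, dt)
        P = F @ P @ F.T + Q
        theta = theta + w * dt
    return P

controls = torch.tensor([[1.0, 0.1]] * 20, requires_grad=True)
P0, Q = 0.01 * torch.eye(3), 1e-4 * torch.eye(3)
cost = torch.trace(propagate_covariance(P0, controls, Q))
cost.backward()
print(controls.grad.shape)   # gradient of the final uncertainty w.r.t. all inputs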
|
|
09:00-10:00, Paper FrPI6T10.3 | |
EgoVM: Achieving Precise Ego-Localization Using Lightweight Vectorized Maps |
|
He, Yuzhe | Baidu |
Liang, Shuang | Baidu |
Rui, XiaoFei | BAIDU |
Cai, Chengying | Baidu |
Wan, Guowei | Baidu |
Keywords: Localization, Sensor Fusion
Abstract: Accurate and reliable ego-localization is critical for autonomous driving. In this paper, we present EgoVM, an end-to-end localization network that achieves comparable localization accuracy to prior state-of-the-art methods, but uses lightweight vectorized maps instead of heavy point-based maps. To begin with, we extract BEV features from online multi-view images and LiDAR point cloud. Then, we employ a set of learnable semantic embeddings to encode the semantic types of map elements and supervise them with semantic segmentation, to make their feature representation consistent with BEV features. After that, we feed map queries, composed of learnable semantic embeddings and coordinates of map elements, into a transformer decoder to perform cross-modality matching with BEV features. Finally, we adopt a robust histogram-based pose solver to estimate the optimal pose by searching exhaustively over candidate poses. We comprehensively validate the effectiveness of our method using both the nuScenes dataset and a newly collected dataset. The experimental results show that our method achieves centimeter-level localization accuracy, and outperforms existing methods using vectorized maps by a large margin. Furthermore, our model has been extensively tested in a large fleet of autonomous vehicles under various challenging urban scenes.
|
|
09:00-10:00, Paper FrPI6T10.4 | |
Deep Sensor Fusion with Constraint Safety Bounds for High Precision Localization |
|
Schmidt, Sebastian | BMW |
Stumpp, Ludwig | AppliedAI Initiative GmbH |
Valverde Garrro, Diego | BMW |
Günnemann, Stephan | Technical University of Munich |
Keywords: Wheeled Robots, Sensor Fusion, Localization
Abstract: In mobile robotics, particularly in autonomous driving, localization is one of the key challenges for navigation and planning. For safe operation in the open world where vulnerable participants are present, precise and guaranteed safe localization is required. While current classical fusion approaches are safe due to their provably bounded closed-form formulation, their situation-adaptivity is limited. In contrast, data-driven approaches are situation-adaptive based on the underlying training data but unbounded and unsafe. In our work, we propose a novel data-driven but provably bounded sensor fusion approach and apply it to mobile robot localization. In extensive experiments using an autonomous driving test vehicle, we show that our fusion method outperforms other safe fusion approaches.
|
|
09:00-10:00, Paper FrPI6T10.5 | |
LCP-Fusion: A Neural Implicit SLAM with Enhanced Local Constraints and Computable Prior |
|
Wang, Jiahui | Beijing Institute of Technology |
Deng, Yinan | Beijing Institute of Technology |
Yang, Yi | Beijing Institute of Technology |
Yue, Yufeng | Beijing Institute of Technology |
Keywords: Mapping, SLAM
Abstract: Recently, dense Simultaneous Localization and Mapping (SLAM) based on neural implicit representations has shown impressive progress in hole filling and high-fidelity mapping. Nevertheless, existing methods either heavily rely on known scene bounds or suffer inconsistent reconstruction due to drift in potential loop-closure regions, or both, which can be attributed to the inflexible representation and lack of local constraints. In this paper, we present LCP-Fusion, a neural implicit SLAM system with enhanced local constraints and computable prior, which takes the sparse voxel octree structure containing feature grids and SDF priors as hybrid scene representation, enabling scalability and robustness during mapping and tracking. To enhance the local constraints, we propose a novel sliding window selection strategy based on visual overlap to address loop closure, and a practical warping loss to constrain relative poses. Moreover, we estimate SDF priors as coarse initialization for implicit features, which brings additional explicit constraints and robustness, especially when a light but efficient adaptive early ending is adopted. Experiments demonstrate that our method achieves better localization accuracy and reconstruction consistency than existing RGB-D implicit SLAM systems, especially in challenging real scenes (ScanNet) as well as self-captured scenes with unknown scene bounds. The code is available at https://github.com/laliwang/LCP-Fusion.
|
|
09:00-10:00, Paper FrPI6T10.6 | |
Long-Term Map-Maintenance in Changing Environments Using Ray-Bundle-Impact-Factor Estimation |
|
Breitfuss, Matthias | Karlsruhe Institute of Technology (KIT) |
Geimer, Marcus | Karlsruhe Institute of Technology |
Gruber, Christoph Johannes | Self-Employed |
Keywords: Mapping, Localization, Robotics and Automation in Construction
Abstract: Ensuring accurate and robust localization is one of the most significant problems in the field of mobile robotics. In this context, map-based localization methods utilizing 3D LiDARs for environmental perception are widely used. Even though multiple promising techniques exist in this field, the majority of approaches can only guarantee accurate and robust operation if there is no deviation between the map and the real surroundings. Consequently, state-of-the-art localization methods frequently suffer from unreliable results or even complete failure in changing environments. In this paper, we propose an efficient technique for precise and robust maintenance of localization maps through real-time incorporation of 3D LiDAR scans. Our map update procedure is based on a novel way of estimating the interference between laser beams and map contents, denoted as the Ray-Bundle-Impact-Factor (RBIF). Our technique additionally solves the widespread problem of disruptive hole creation caused by discretization effects. Experiments on real-world as well as synthetic data demonstrate the precision and stability of our method under various challenging conditions and evaluate our approach in comparison to multiple SOTA map maintenance algorithms.
|
|
09:00-10:00, Paper FrPI6T10.7 | |
DeepMIF: Deep Monotonic Implicit Fields for Large-Scale LiDAR 3D Mapping |
|
Yilmaz, Kutay | Technical University of Munich |
Niessner, Matthias | Technical University of Munich |
Kornilova, Anastasiia | Skolkovo Institute of Science and Technology |
Artemov, Alexey | Technical University of Munich |
Keywords: Mapping, Range Sensing, Deep Learning for Visual Perception
Abstract: Recently, significant progress has been achieved in sensing real large-scale outdoor 3D environments, particularly by using modern acquisition equipment such as LiDAR sensors. Unfortunately, they are fundamentally limited in their ability to produce dense, complete 3D scenes. To address this issue, recent learning-based methods integrate neural implicit representations and optimizable feature grids to approximate surfaces of 3D scenes. However, naively fitting samples along raw LiDAR rays leads to noisy 3D mapping results due to the nature of sparse, conflicting LiDAR measurements. In this work we instead depart from fitting LiDAR data exactly, letting the network optimize a non-metric monotonic implicit field defined in 3D space. To fit our field, we design a learning system integrating a monotonicity loss that enables optimizing neural monotonic fields and leverages recent progress in large-scale 3D mapping. Our algorithm achieves high-quality dense 3D mapping performance as captured by multiple quantitative and perceptual measures and visual results obtained for the Mai City, Newer College, and KITTI benchmarks. The code of our approach is publicly available at https://github.com/artonson/deepmif.
|
|
09:00-10:00, Paper FrPI6T10.8 | |
MM-Gaussian: 3D Gaussian-Based Multi-Modal Fusion for Localization and Reconstruction in Unbounded Scenes |
|
Wu, Chenyang | University of Science and Technology of China |
Duan, Yifan | University of Science and Technology of China |
Zhang, Xinran | University of Science and Technology of China |
Sheng, Yu | University of Science and Technology of China |
Ji, Jianmin | University of Science and Technology of China |
Zhang, Yanyong | University of Science and Technology of China |
Keywords: Mapping, SLAM
Abstract: Localization and mapping are critical tasks for various applications such as autonomous vehicles and robotics. The challenges posed by outdoor environments present particular complexities due to their unbounded characteristics. In this work, we present MM-Gaussian, a LiDAR-camera multi-modal fusion system for localization and mapping in unbounded scenes. Our approach is inspired by the recently developed 3D Gaussians, which demonstrate remarkable capabilities in achieving high rendering quality and fast rendering speed. Specifically, our system fully utilizes the geometric structure information provided by solid-state LiDAR to address the problem of inaccurate depth encountered when relying solely on visual solutions in unbounded, outdoor scenarios. Additionally, we utilize 3D Gaussian point clouds, with the assistance of pixel-level gradient descent, to fully exploit the color information in photos, thereby achieving realistic rendering effects. To further bolster the robustness of our system, we designed a relocalization module, which assists in returning to the correct trajectory in the event of a localization failure. Experiments conducted in multiple scenarios demonstrate the effectiveness of our method.
|
|
09:00-10:00, Paper FrPI6T10.9 | |
Large-Scale Indoor Mapping with Failure Detection and Recovery in SLAM |
|
Rahman, Sharmin | Amazon |
DiPietro, Robert | Johns Hopkins University |
Kedarisetti, Dharanish | Amazon |
Kulathumani, Vinod | Amazon |
Keywords: Mapping, SLAM, Failure Detection and Recovery
Abstract: This paper addresses the failure detection and recovery problem in visual-inertial Simultaneous Localization and Mapping (SLAM) systems for large-scale indoor environments. Cameras and inertial measurement units (IMUs) are popular choices for SLAM in many robotics tasks (e.g., navigation) due to their complementary sensing capabilities and low cost. However, vision has inherent challenges even in well-lit scenes, including motion blur, lack of features, or even accidental camera blockage. These failures can cause drift to accumulate over time and can severely impact the scalability of existing solutions to large areas. To address these issues, we propose an automatic map generation service with (i) a failure detection method based on visual feature tracking quality using a health tracker that identifies and discards faulty measurements and (ii) a continuous session merging approach in SLAM. Taken together, this allows us to handle erroneous data without any manual intervention and to scale to extremely large spaces. The proposed system has been validated on benchmark datasets. Also, experimental results on multiple custom large-scale grocery stores, each between 1700 m^2 and 3700 m^2 in area with sessions lasting 60 to 80 minutes, are presented. Our approach shows the lowest error in all large-scale SLAM cases when compared with state-of-the-art visual-inertial SLAM packages, which often produce highly erroneous trajectories or lose track. Additionally, when a depth camera is present, we provide dense 3D reconstruction by simply registering the point cloud from the RGB-D images with respect to the SLAM-generated trajectory -- and the quality of the reconstruction illustrates the efficacy of our proposed method.
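A minimal sketch of the feature-tracking health idea is given below: flag measurements as unhealthy when the ratio of tracked to detected features stays low over a sliding window. The window length, threshold, and decision rule are assumptions for illustration, not the system's actual health tracker.

from collections import deque

class TrackingHealth:
    # Flags a frame as unhealthy when the windowed mean of the
    # tracked-to-detected feature ratio drops below a threshold.
    def __init__(self, window=30, min_ratio=0.4):
        self.history = deque(maxlen=window)
        self.min_ratio = min_ratio

    def update(self, n_tracked, n_detected):
        self.history.append(n_tracked / max(n_detected, 1))
        mean_ratio = sum(self.history) / len(self.history)
        return mean_ratio >= self.min_ratio   # True -> keep measurement

health = TrackingHealth()
print(health.update(n_tracked=12, n_detected=150))    # False: blur or blockage
print(health.update(n_tracked=120, n_detected=150))   # True once the window recovers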
|
|
09:00-10:00, Paper FrPI6T10.10 | |
Active Loop Closure for OSM-Guided Robotic Mapping in Large-Scale Urban Environments |
|
Gao, Wei | University of Macau |
Sun, Zezhou | Nanjing University of Science and Technology |
Zhao, Mingle | University of Macau |
Xu, Chengzhong | University of Macau |
Kong, Hui | University of Macau |
Keywords: Autonomous Agents, View Planning for SLAM, Mapping
Abstract: The autonomous mapping of large-scale urban scenes presents significant challenges for autonomous robots. To mitigate the challenges, global planning, such as utilizing prior GPS trajectories from OpenStreetMap (OSM), is often used to guide the autonomous navigation of robots for mapping. However, due to factors like complex terrain, unexpected body movement, and sensor noise, the uncertainty of the robot's pose estimates inevitably increases over time, ultimately leading to the failure of robotic mapping. To address this issue, we propose a novel active loop closure procedure, enabling the robot to actively re-plan the previously planned GPS trajectory. The method can guide the robot to re-visit the previous places where the loop-closure detection can be performed to trigger the back-end optimization, effectively reducing errors and uncertainties in pose estimation. The proposed active loop closure mechanism is implemented and embedded into a real-time OSM-guided robot mapping framework. Empirical results on several large-scale outdoor scenarios demonstrate its effectiveness and promising performance.
|
|
09:00-10:00, Paper FrPI6T10.11 | |
MOE: A Dense LiDAR Moving Event Dataset, Detection Benchmark and LeaderBoard |
|
Chen, Zhiming | Hong Kong University of Science and Technology |
Fang, Haozhe | Hong Kong University of Science and Technology |
Chen, Jiapeng | The Individual Researcher |
Wang, Michael Yu | Mywang@gbu.edu.cn |
Yu, Hongyu | The Hong Kong University of Science and Technology |
Keywords: Data Sets for SLAM, Big Data in Robotics and Automation, Mapping
Abstract: Detecting moving events produced by moving objects is a crucial task in the realms of autonomous driving and mobile robots. Moving objects have the potential to create ghost artifacts in mapped environments and pose risks to autonomous navigation. LiDAR serves as a vital sensor for autonomous systems due to its ability to provide dense and precise range measurements. However, existing LiDAR datasets often lack sufficient discussion on the motion labeling of moving objects, containing only a limited representation of moving entities within a single scene. Furthermore, the methodologies for Moving Event Detection (MED) on LiDAR sensors have not been comprehensively explored or evaluated. To address these gaps, this study focuses on constructing a diverse LiDAR moving event dataset encompassing multiple scenes with a high density of moving objects. A thorough review of current MED techniques is conducted, followed by the establishment of a performance benchmark based on evaluating these methods using our dataset. Additionally, part sequences of the dataset are utilized to host an online MED competition, aimed at fostering collaboration within the research community and advancing related studies.
|
|
09:00-10:00, Paper FrPI6T10.12 | |
I-ASM: Iterative Acoustic Scene Mapping for Enhanced Robot Auditory Perception in Complex Indoor Environments |
|
Fu, Linya | Southern University of Science and Technology |
He, Yuanzheng | Southern University of Science and Technology |
Wang, Jiang | Southern University of Science and Technology |
Qiao, Xu | Department of Mechanical and Energy Engineering, Southern University of Science and Technology |
Kong, He | Southern University of Science and Technology |
Keywords: Robot Audition, Mapping, Localization
Abstract: This paper addresses the challenge of acoustic scene mapping (ASM) in complex indoor environments with multiple sound sources. Unlike existing methods that rely on prior data association or SLAM frameworks, we propose a novel particle filter-based iterative framework, termed I-ASM, for ASM using a mobile robot equipped with a microphone array and LiDAR. I-ASM harnesses an innovative “implicit association” to align sound sources with Direction of Arrival (DoA) observations without requiring explicit pairing, thereby streamlining the mapping process. Given inputs including an occupancy map, DoA estimates from various robot positions, and corresponding robot pose data, I-ASM performs multi-source mapping through an iterative cycle of “Filtering-Clustering-Implicit Associating”. The proposed framework has been tested in real-world scenarios with up to 10 concurrent sound sources, demonstrating its robustness against missing and false DoA estimates while achieving high-quality ASM results. To benefit the community, we open-source all the code and data at https://github.com/AISLAB-sustech/Acoustic-Scene-Mapping
|
|
09:00-10:00, Paper FrPI6T10.13 | |
TivNe-SLAM: Dynamic Mapping and Tracking Via Time-Varying Neural Radiance Fields |
|
Duan, Chengyao | Yunnan University |
Yang, Zhiliu | Yunnan University |
Keywords: RGB-D Perception, SLAM, Localization
Abstract: Previous attempts to integrate Neural Radiance Fields (NeRF) into the Simultaneous Localization and Mapping (SLAM) framework either rely on the assumption of static scenes or require the ground truth camera poses, which impedes their application in real-world scenarios. This paper proposes a time-varying representation to track and reconstruct the dynamic scenes. Firstly, two processes, a tracking process and a mapping process, are maintained simultaneously in our framework. In the tracking process, all input images are uniformly sampled and then progressively trained in a self-supervised paradigm. In the mapping process, we leverage motion masks to distinguish dynamic objects from the static background, and sample more pixels from dynamic areas. Secondly, the parameter optimization for both processes is comprised of two stages: the first stage associates time with 3D positions to convert the deformation field to the canonical field. The second stage associates time with the embeddings of the canonical field to obtain colors and a Signed Distance Function (SDF). Lastly, we propose a novel keyframe selection strategy based on the overlapping rate. Our approach is evaluated on two synthetic datasets and one real-world dataset, and the experiments validate that our method achieves competitive results in both tracking and mapping when compared to existing state-of-the-art NeRF-based dynamic SLAM systems.
|
|
09:00-10:00, Paper FrPI6T10.14 | |
RCAL: A Lightweight Road Cognition and Automated Labeling System for Autonomous Driving Scenarios |
|
Chen, Jiancheng | Li Auto |
Yu, Chao | Li Auto |
Wang, Huayou | Li Auto |
Liu, Kun | Li Auto |
Zhan, Yifei | Li Auto |
Lang, Xianpeng | Li Auto |
Xue, Changliang | Li Auto |
Keywords: Semantic Scene Understanding, Mapping, Autonomous Vehicle Navigation
Abstract: Vectorized reconstruction and topological cognition of road structures are crucial for autonomous vehicles to handle complex scenes. Traditional frameworks rely heavily on high-definition (HD) maps, which place significant demands on storage, computation, and manual labor. To overcome these limitations, we introduce a lightweight Road Cognition and Automated Labeling (RCAL) system. It leverages lightweight road data captured from mass-produced vehicles to vectorize road elements and cognize their topology. RCAL compiles multi-trip data on cloud servers for enhanced accuracy and coverage, addressing the limitations of single-trip data. For element extraction, we propose a pivotal-point priority sampling strategy that balances the trade-off between road scale and processing efficiency. Additionally, traffic flow is utilized to enhance the accuracy of road topology cognition. With its impressive automation, reliability, and efficiency, RCAL stands as an advanced solution in the field. Our evaluations on a real-world intersection dataset confirm that RCAL not only achieves precision comparable to traditional HD map labeling systems but also substantially reduces resource costs.
|
|
09:00-10:00, Paper FrPI6T10.15 | |
AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans |
|
Perauer, Cedric | Technical University of Munich |
Zhang, Haifan | Technical University of Munich |
Heidrich, Laurenz Adrian | Technical University of Munich |
Niessner, Matthias | Technical University of Munich |
Kornilova, Anastasiia | Skolkovo Institute of Science and Technology |
Artemov, Alexey | Technical University of Munich |
Keywords: Semantic Scene Understanding, Mapping, Deep Learning for Visual Perception
Abstract: Recently, progress in acquisition equipment such as LiDAR sensors has enabled sensing increasingly spacious outdoor 3D environments. Making sense of such 3D acquisitions requires fine-grained scene understanding, such as constructing instance-based 3D scene segmentations. Commonly, a neural network is trained for this task; however, this requires access to a large, densely annotated dataset, which is widely known to be challenging to obtain. To address this issue, in this work we propose to predict instance segmentations for 3D scenes in an unsupervised way, without relying on ground-truth annotations. To this end, we construct a learning framework consisting of two components: (1) a pseudo-annotation scheme for generating initial unsupervised pseudo-labels; and (2) a self-training algorithm for instance segmentation to fit robust, accurate instances from initial noisy proposals. To enable generating 3D instance mask proposals, we construct a weighted proxy-graph by connecting 3D points with edges integrating multi-modal image- and point-based self-supervised features, and perform graph cuts to isolate individual pseudo-instances. We then build on a state-of-the-art point-based architecture and train a 3D instance segmentation model, resulting in significant refinement of initial proposals. To scale to 3D scenes of arbitrary complexity, we design our algorithm to operate on local 3D point chunks and construct a merging step to generate scene-level instance segmentations. Experiments on the challenging SemanticKITTI benchmark demonstrate the potential of our approach, where it attains 13.3% higher Average Precision and 9.1% higher F1 score compared to the best-performing baseline. The code is publicly available at https://github.com/artonson/autoinst.
|
|
09:00-10:00, Paper FrPI6T10.16 | |
Visual Timing for Sound Source Depth Estimation in the Wild |
|
Sun, Wei | UT AUSTIN |
Qiu, Lili | UT Austin |
Keywords: Range Sensing, Sensor Fusion
Abstract: Inspired by the flash-to-bang theory, we develop FBDepth, the first passive audio-visual depth estimation framework. It is based on the difference between the time-of-flight (ToF) of light and that of sound. We formulate sound source depth estimation as an audio-visual event localization task for collision events. To approach decimeter-level depth accuracy, we design a coarse-to-fine pipeline that pushes the temporal localization accuracy from event level to millisecond level by aligning audio-visual correspondence and manipulating optical flow. FBDepth feeds the estimated visual timestamp together with the audio clip and the object's visual features to regress the source depth. We use a mobile phone to collect 3.6K+ video clips with 24 different objects at ranges up to 65 m. FBDepth shows superior performance, especially at long range, compared to monocular and stereo methods.
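In its simplest form, the flash-to-bang principle underlying FBDepth reduces to depth ≈ speed of sound × (audio arrival time − visual event time), since the light time-of-flight is negligible at these ranges. A minimal sketch with assumed timestamps:

SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 degrees C

def flash_to_bang_depth(t_visual, t_audio):
    # Depth from the delay between seeing a collision and hearing it.
    return SPEED_OF_SOUND * (t_audio - t_visual)

# A collision seen at t = 1.200 s and heard at t = 1.326 s is about 43 m away.
print(round(flash_to_bang_depth(1.200, 1.326), 1))    # 43.2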
|
|
FrPI6T11 |
Room 11 |
Safety for Robots |
Teaser Session |
Chair: Althoefer, Kaspar | Queen Mary University of London |
|
09:00-10:00, Paper FrPI6T11.1 | |
TacLink-Integrated Robot Arm Toward Safe Human-Robot Interaction |
|
Luu, Quan | Japan Advanced Institute of Science and Technology |
Albini, Alessandro | University of Oxford |
Maiolino, Perla | University of Oxford |
Ho, Van | Japan Advanced Institute of Science and Technology |
Keywords: Force and Tactile Sensing, Soft Sensors and Actuators, Soft Robot Applications
Abstract: Recent developments in vision-based tactile sensing offer a simple means to enable robots to perceive touch interactions. However, existing sensors are primarily designed for small-scale applications like robotic hands, lacking research on their integration for large-sized robot bodies that can be leveraged for safe human-robot interactions. This paper explores the utilization of the previously-developed vision-based tactile sensing link (called TacLink) with soft skin as a safety control mechanism, which can serve as an alternative to conventional rigid robot links and impact observers. We characterize the behavior of a robot integrated with the soft TacLink in response to collisions, particularly employing a reactive control strategy. The controller is primarily driven by tactile force information acquired from the soft TacLink sensor through a data-driven sim2real learning method. Compared with a standard rigid link, the results obtained from collision experiments also confirm the advantages of our "soft" solution in impact resilience and in facilitating controls that are difficult to achieve with a stiff robot body. This study can act as a benchmark for assessing the efficiency of soft tactile-sensitive skins in reactive collision responses and open new safety standards for soft skin-based collaborative robots in human-robot interaction scenarios.
|
|
09:00-10:00, Paper FrPI6T11.2 | |
Fixing Symbolic Plans with Reinforcement Learning in Object-Based Action Spaces |
|
Thierauf, Christopher | Woods Hole Oceanographic Institution |
Scheutz, Matthias | Tufts University |
Keywords: Failure Detection and Recovery, Robust/Adaptive Control, Reinforcement Learning
Abstract: Reinforcement learning techniques are widely used when robots have to learn new tasks, but they typically operate on action spaces defined by the joints of the robot. We present a contrasting approach where action spaces are the trajectories of objects in the environment, requiring robots to discover events such as object changes and behaviors that must occur to accomplish the task. We show that this allows robots to learn faster, to learn semantic representations that can be communicated to humans, and to learn in a manner that does not depend on the robot itself, enabling low-cost policy transfer between different types of robots. Our demonstrations can be replicated using the provided source code.
|
|
09:00-10:00, Paper FrPI6T11.3 | |
Online Efficient Safety-Critical Control for Mobile Robots in Unknown Dynamic Multi-Obstacle Environments |
|
Zhang, Yu | Technical University of Munich |
Tian, Guangyao | Technische Universität München |
Wen, Long | Technical University of Munich |
Yao, Xiangtong | Technical University of Munich |
Zhang, Liding | Technical University of Munich |
Bing, Zhenshan | Technical University of Munich |
He, Wei | University of Science and Technology Beijing |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Robot Safety, Collision Avoidance, Motion and Path Planning
Abstract: This paper proposes a LiDAR-based goal-seeking and exploration framework, addressing the efficiency of online obstacle avoidance in unstructured environments populated with static and moving obstacles. This framework addresses two significant challenges associated with traditional dynamic control barrier functions (D-CBFs): their online construction and the diminished real-time performance caused by utilizing multiple D-CBFs. To tackle the first challenge, the framework's perception component begins with clustering point clouds via the DBSCAN algorithm, followed by encapsulating these clusters with the minimum bounding ellipses (MBEs) algorithm to create elliptical representations. By comparing the current state of MBEs with those stored from previous moments, the differentiation between static and dynamic obstacles is realized, and the Kalman filter is utilized to predict the movements of the latter. Such analysis facilitates the D-CBF's online construction for each MBE. To tackle the second challenge, we introduce buffer zones, generating Type-II D-CBFs online for each identified obstacle. Utilizing these buffer zones as activation areas substantially reduces the number of D-CBFs that need to be activated. Upon entering these buffer zones, the system prioritizes safety, autonomously navigating safe paths, and hence referred to as the exploration mode. Exiting these buffer zones triggers the system's transition to goal-seeking mode. We demonstrate that the system's states under this framework achieve safety and asymptotic stabilization. Experimental results in simulated and real-world environments have validated our framework's capability, allowing a LiDAR-equipped mobile robot to efficiently and safely reach the desired location within dynamic environments containing multiple obstacles.
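The perception step described above can be illustrated with the generic sketch below: cluster 2D LiDAR points with DBSCAN and approximate each cluster by an enclosing ellipse derived from its sample covariance. Note that this covariance-based ellipse is a simplification of the minimum bounding ellipse used in the paper, and all parameters here are assumptions.

import numpy as np
from sklearn.cluster import DBSCAN

def cluster_ellipses(points, eps=0.3, min_samples=5, n_sigma=2.0):
    # Cluster 2D LiDAR points and fit an approximate enclosing ellipse
    # (center, semi-axes, orientation) to each cluster from its covariance.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    ellipses = []
    for k in set(labels) - {-1}:                    # label -1 marks noise points
        cluster = points[labels == k]
        center = cluster.mean(axis=0)
        cov = np.cov(cluster.T) + 1e-6 * np.eye(2)
        eigvals, eigvecs = np.linalg.eigh(cov)
        semi_axes = n_sigma * np.sqrt(eigvals)
        angle = np.arctan2(eigvecs[1, 1], eigvecs[0, 1])
        ellipses.append((center, semi_axes, angle))
    return ellipses

pts = np.vstack([np.random.randn(60, 2) * 0.2 + [2.0, 1.0],
                 np.random.randn(60, 2) * 0.2 + [-1.0, 3.0]])
print(len(cluster_ellipses(pts)))                    # typically 2 obstacle ellipses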
|
|
09:00-10:00, Paper FrPI6T11.4 | |
Safe Reinforcement Learning Via Hierarchical Adaptive Chance-Constraint Safeguards |
|
Chen, Zhaorun | Purdue University |
Zhao, Zhuokai | University of Chicago |
He, Tairan | Carnegie Mellon University |
Chen, BinHao | Shanghai Jiao Tong University |
Zhao, Xuhao | Shanghai Jiao Tong University |
Gong, Liang | Shanghai Jiao Tong University |
Liu, Chengliang | Shanghai Jiao Tong University |
|
|
09:00-10:00, Paper FrPI6T11.5 | |
Adaptive Splitting of Reusable Temporal Monitors for Rare Traffic Violations |
|
Innes, Craig | University of Edinburgh |
Ramamoorthy, Subramanian | The University of Edinburgh |
Keywords: Robot Safety, Hybrid Logical/Dynamical Planning and Verification, Autonomous Vehicle Navigation
Abstract: Autonomous Vehicles (AVs) are often tested in simulation to estimate the probability they violate safety specifications. Two common issues arise when using existing techniques to produce this estimation: If violations are rare, Monte-Carlo sampling can fail to produce efficient estimates; if simulation horizons are too long, importance samplers (which learn distributions from past simulations) can fail to converge. This paper addresses both issues by interleaving rare-event sampling techniques with online specification monitoring algorithms. We use importance splitting to decompose simulations into partial trajectories, then calculate the distance of those partial trajectories to failure by leveraging robustness metrics from Signal Temporal Logic (STL). By caching those partial robustness metric values, we can efficiently re-use computations across multiple sampling stages. Our experiments on an interstate lane-change scenario show our method is viable for testing simulated AV-pipelines, efficiently estimating failure probabilities for STL specifications based on real traffic rules. We produce better estimates than Monte-Carlo and importance sampling in fewer simulations.
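As a concrete example of the robustness values being cached, the STL specification "always keep the gap above d_min" has robustness equal to the worst-case margin over the trace, as in the sketch below; the signal and threshold are illustrative and do not correspond to the paper's traffic-rule specifications.

def robustness_always_min_gap(gaps, d_min=2.0):
    # Robustness of G(gap > d_min): the worst-case margin over the trace.
    # Positive means the partial trajectory satisfies the spec so far.
    return min(g - d_min for g in gaps)

# Partial trajectory of ego-to-lead-vehicle gaps (metres), one value per step.
partial_trace = [8.0, 6.5, 4.2, 3.1, 2.6]
print(round(robustness_always_min_gap(partial_trace), 2))   # 0.6: close to failure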
|
|
09:00-10:00, Paper FrPI6T11.6 | |
Interruptive Language Control of Bipedal Locomotion |
|
Malik, Ashish | Oregon State University |
Lee, Stefan | Oregon State University |
Fern, Alan | Oregon State University |
Keywords: Robot Safety, Humanoid and Bipedal Locomotion
Abstract: We study the problem of natural language-based control of dynamic bipedal locomotion from the perspective of operational robustness and hardware safety. Existing work on natural language-based robot control has focused on episodic command execution for stable robot platforms, such as fixed-base manipulators in table-top scenarios. These scenarios feature non-overlapping phases of instruction and execution, with execution mishaps usually posing no threat to robot safety. This allows for non-trivial failure rates to be acceptable. In contrast, our work involves indistinguishable instruction and execution stages for a dynamically unstable robot where execution failures can harm the robot. For example, interrupting a bipedal robot with a new instruction in certain states may cause it to fall. Our first contribution is to design and train a natural language-based controller for the bipedal robot Cassie that can take in new language commands at any time. Our second contribution is to introduce a protocol for evaluating the robustness to interruptions of such controllers and evaluating the learned controller in simulation under different interruption distributions. Our third contribution is to learn a detector for interruptions that are likely to lead to failure and to integrate that detector into a failure mitigation strategy. Overall, our results show that interruptions can lead to non-trivial failure rates for the original controller and that the proposed mitigation strategy can help to significantly reduce that rate.
|
|
09:00-10:00, Paper FrPI6T11.7 | |
Safe Offline-To-Online Multi-Agent Decision Transformer: A Safety Conscious Sequence Modeling Approach |
|
Shah, Aamir Bader | University of Houston |
Wen, Yu | University of Houston |
Chen, Jiefu | University of Houston |
Wu, Xuqing | University of Houston |
Fu, Xin | University of Houston |
Keywords: Robot Safety, Deep Learning Methods, Reinforcement Learning
Abstract: We introduce the Safe Offline-to-Online Multi-Agent Decision Transformer (SO2-MADT), an innovative framework that revolutionizes safety considerations in Multi-agent Reinforcement Learning (MARL) through a novel sequence modeling approach. Leveraging the dynamic capabilities inherent in Decision Transformers, our methodology seamlessly incorporates safety protocols as a cornerstone element, ensuring secure operations throughout both the offline pre-training phase and the adaptive online fine-tuning phase. At the core of our framework lie two pivotal innovations: the Safety-To-Go (STG) token, embedding safety at a macro level, and the Agent Prioritization Module (APM), facilitating explicit credit assignment at a micro level. Through extensive testing against the challenging environments of the StarCraft Multi-Agent Challenge (SMAC) and Multi-agent MuJoCo, our SO2-MADT not only excels in offline pre-training but also demonstrates superior performance during online fine-tuning, without any degradation in performance. The implications of our work provide a pathway for deployment in critical real-world applications where safety is paramount and non-negotiable. The code is available at https://github.com/shahaamirbader/SO2-MADT.
|
|
09:00-10:00, Paper FrPI6T11.8 | |
Differential-Algebraic Equation Control Barrier Function for Flexible Link Manipulator |
|
Park, Younghwa | Maersk Mc-Kinney Moller Institute, University of Southern Denmark |
Sloth, Christoffer | University of Southern Denmark |
Keywords: Robot Safety, Underactuated Robots, Optimization and Optimal Control
Abstract: This paper presents a control barrier function (CBF) for systems described by differential-algebraic equations and applies the method to guarantee the safety of a two-link flexible-link manipulator. The two main contributions of the paper are: a) an extension of CBFs to systems governed by differential-algebraic equations; b) a simulation framework for flexible-link robots based on a floating frame of reference formulation (FFRF) finite element method (FEM). Numerical simulations demonstrate the minimally invasive safety control of a flexible two-link manipulator with position constraints through CBF quadratic programming without converting the differential-algebraic equations to a control-affine system.
|
|
09:00-10:00, Paper FrPI6T11.9 | |
MIXED-SENSE: A Mixed Reality Sensor Emulation Framework for Test and Evaluation of UAVs against False Data Injection Attacks |
|
Pant, Kartik Anand | Purdue University |
Lin, Li-Yu | Purdue University |
Kim, Jaehyeok | Purdue University - West Lafayette |
Sribunma, Worawis | Purdue University |
Goppert, James | Purdue University |
Hwang, Inseok | Purdue University |
Keywords: Virtual Reality and Interfaces, Robot Safety, Aerial Systems: Perception and Autonomy
Abstract: We present a high-fidelity Mixed Reality sensor emulation framework for testing and evaluating the resilience of Unmanned Aerial Vehicles (UAVs) against false data injection (FDI) attacks. The proposed approach can be utilized to assess the impact of FDI attacks, benchmark attack detector performance, and validate the effectiveness of mitigation/reconfiguration strategies in single-UAV and UAV swarm operations. Our Mixed Reality framework leverages high-fidelity simulations of Gazebo and a Motion Capture system to emulate proprioceptive (e.g., GNSS) and exteroceptive (e.g., camera) sensor measurements in real-time. We propose an empirical approach to faithfully recreate signal characteristics such as latency and noise in these measurements. Finally, we illustrate the efficacy of our proposed framework through a Mixed Reality experiment consisting of an emulated GNSS attack on an actual UAV, which (i) demonstrates the impact of false data injection attacks on GNSS measurements and (ii) validates a mitigation strategy utilizing a distributed camera network developed in our previous work. Our open-source implementation is available at https://github.com/CogniPilot/mixed_sense
|
|
09:00-10:00, Paper FrPI6T11.10 | |
Safe Multi-Agent Reinforcement Learning for Bimanual Dexterous Manipulation |
|
Zhan, Weishu | Dartmouth College |
Chin, Peter | Dartmouth College |
Keywords: Bimanual Manipulation, Reinforcement Learning, Robot Safety
Abstract: Bimanual dexterous manipulation in robotics, essential for a wide range of applications, addresses the critical challenge of balancing intricate operational capabilities with assured safety and reliability. While safe reinforcement learning is integral to the robustness of robotic systems, safe multi-agent reinforcement learning (MARL) for cooperative control of multiple robots has been scarcely studied. In this study, we explore MARL for safe cooperative control with multiple robot hands. Each robot must follow individual and collective safety guidelines to ensure safe team actions. However, the non-stationarity inherent in current algorithms hinders the precise updating of strategies to satisfy these safety constraints effectively. In this paper, we propose Multi-Agent Constrained Proximal Advantage Optimization (MACPAO), which considers the sequence of agent updates and integrates non-stationarity into sequential update schemes. This algorithm ensures consistent improvement in both rewards and adherence to safety constraints in each iteration. We tested MACPAO on various tasks with safety constraints and demonstrated that it outperforms other MARL algorithms in balancing reward enhancement and safety compliance. Supplementary materials and code are available at https://github.com/YONEX4090/MultiSafeHand.git.
|
|
09:00-10:00, Paper FrPI6T11.11 | |
CBFkit: A Control Barrier Function Toolbox for Robotics Applications |
|
Black, Mitchell | MIT Lincoln Laboratory |
Fainekos, Georgios | Toyota NA-R&D |
Hoxha, Bardh | Toyota Research Institute of North America |
Okamoto, Hideki | Toyota Motor North America |
Prokhorov, Danil | Toyota Tech Center |
Keywords: Control Architectures and Programming, Formal Methods in Robotics and Automation, Optimization and Optimal Control
Abstract: This paper introduces CBFkit, a Python/ROS toolbox for safe robotics planning and control under uncertainty. The toolbox provides a general framework for designing control barrier functions for mobility systems within both deterministic and stochastic environments. It can be connected to the ROS open-source robotics middleware, allowing for the setup of multi-robot applications, encoding of environments and maps, and integrations with predictive motion planning algorithms. Additionally, it offers multiple CBF variations and algorithms for robot control. CBFkit is demonstrated on the Toyota Human Support Robot (HSR) in both simulation and physical experiments.
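For orientation, the core safety-filter idea behind a CBF toolbox can be written in a few lines for single-integrator dynamics and one circular obstacle, where the single-constraint QP has a closed-form solution; this generic sketch is not CBFkit's actual API, and the gain and geometry are assumptions.

import numpy as np

def cbf_safety_filter(x, u_des, obstacle, radius, alpha=1.0):
    # Minimally modify u_des so that h(x) = ||x - o||^2 - r^2 stays nonnegative
    # for x_dot = u. The single affine constraint grad_h . u >= -alpha * h(x)
    # turns the QP into a closed-form projection.
    h = float(np.sum((x - obstacle) ** 2) - radius ** 2)
    grad_h = 2.0 * (x - obstacle)
    slack = grad_h @ u_des + alpha * h
    if slack >= 0:
        return u_des                       # nominal command is already safe
    return u_des - slack * grad_h / (grad_h @ grad_h)

x = np.array([1.5, 0.0])                   # robot position
u = np.array([-1.0, 0.0])                  # nominal command: straight at the obstacle
print(cbf_safety_filter(x, u, obstacle=np.array([0.0, 0.0]), radius=1.0))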
|
|
09:00-10:00, Paper FrPI6T11.12 | |
RECOVER: A Neuro-Symbolic Framework for Failure Detection and Recovery |
|
Cornelio, Cristina | Samsung AI |
Diab, Mohammed | Imperial College London |
Keywords: Failure Detection and Recovery, AI-Enabled Robotics, AI-Based Methods
Abstract: Recognizing failures during task execution and implementing recovery procedures is challenging in robotics. Traditional approaches rely on the availability of extensive data or a tight set of constraints, while more recent approaches leverage large language models (LLMs) to verify task steps and replan accordingly. However, these methods often operate offline, necessitating scene resets and incurring high costs. This paper introduces RECOVER, a neuro-symbolic framework for online failure identification and recovery. By integrating ontologies, logical rules, and LLM-based planners, RECOVER exploits symbolic information to enhance the ability of LLMs to generate recovery plans and also to decrease the associated costs. To demonstrate the capabilities of our method in a simulated kitchen environment, we introduce ONTOTHOR, an ontology describing the AI2Thor simulator setting. Empirical evaluation shows that ONTOTHOR’s logical rules accurately detect all failures in the analyzed tasks, and that RECOVER considerably outperforms, for both failure detection and recovery, a baseline method reliant solely on LLMs. Supplementary material, including the ONTOTHOR ontology, is available at: https://recover-ontothor.github.io.
|
|
09:00-10:00, Paper FrPI6T11.13 | |
Hybrid Continuum-Eversion Robot: Precise Navigation and Decontamination in Nuclear Environments Using Vine Robot |
|
Al-Dubooni, Mohammed | Queen Mary University of London |
Wong, Cuebong | National Nuclear Laboratory |
Althoefer, Kaspar | Queen Mary University of London |
Keywords: Soft Robot Applications, Robot Safety
Abstract: Soft growing vine robots show great potential for navigation and decontamination tasks in the nuclear industry. This paper introduces a novel hybrid continuum-eversion robot designed to address challenges in navigating and operating within pipe networks and enclosed remote vessels. The hybrid robot combines the flexibility of a soft eversion robot with the precision of a continuum robot at its tip, allowing for controlled steering and movement in hard-to-access and/or complex environments. The design enables the delivery of sensors, liquids, and aerosols to remote areas, supporting remote decontamination activities. This paper outlines the design and construction of the robot and the methods by which it achieves selective steering. We also include a comprehensive review of current related work in eversion robotics, as well as other steering devices and actuators currently under research, which underpin this novel active steering approach. This is followed by an experimental evaluation that demonstrates the robot’s real-world capabilities in delivering liquids and aerosols to remote locations. The experiments reveal successful outcomes, with over 95% success in precision spraying tests. The paper concludes by discussing future work alongside limitations in the current design, ultimately showcasing its potential as a solution for remote decontamination operations in the nuclear industry.
|
|
09:00-10:00, Paper FrPI6T11.14 | |
RoboGuardZ: A Scalable Zero-Shot Framework for Zero-Day Malware Detection in Robots |
|
Kaur, Upinder | Purdue University |
Celik, Berkay | Purdue University |
Voyles, Richard | Purdue University |
Keywords: Engineering for Robotic Systems, AI-Based Methods
Abstract: The ubiquitous deployment of robots across diverse domains, from industrial automation to personal care, underscores their critical role in modern society. However, this growing dependence has also revealed security vulnerabilities. An attack vector involves the deployment of malicious software (malware) on robots, which can cause harm to robots themselves, users, and even the surrounding environment. Machine learning approaches, particularly supervised ones, have shown promise in malware detection by building intricate models to identify known malicious code patterns. However, these methods are inherently limited in detecting unseen or zero-day malware variants as they require regularly updated massive datasets that might be unavailable to robots. To address this challenge, we introduce RoboGuardZ, a novel malware detection framework based on zero-shot learning for robots. This approach allows RoboGuardZ to identify unseen malware by establishing relationships between known malicious code and benign behaviors, allowing detection even before the code executes on the robot. To ensure practical deployment in resource-constrained robotic hardware, we employ a unique parallel structured pruning and quantization strategy that compresses the RoboGuardZ detection model by 37.4% while maintaining its accuracy. This strategy reduces the size of the model and computational demands, making it suitable for real-world robotic systems. We evaluated RoboGuardZ on a recent dataset containing real-world binary executables from multi-sensor autonomous car controllers. The framework was deployed on two popular robot embedded hardware platforms. Our results demonstrate an average detection accuracy of 94.25% and a low false negative rate of 5.8% with a minimal latency of 20 ms, which demonstrates its effectiveness and practicality.
|
|
09:00-10:00, Paper FrPI6T11.15 | |
RoboCop: A Robust Zero-Day Cyber-Physical Attack Detection Framework for Robots |
|
Kaur, Upinder | Purdue University |
Celik, Berkay | Purdue University |
Voyles, Richard | Purdue University |
Keywords: Embedded Systems for Robotic and Automation, AI-Based Methods
Abstract: Zero-day vulnerabilities pose a significant challenge to robot cyber-physical systems (CPS). Attackers can exploit software vulnerabilities in widely-used robotics software, such as the Robot Operating System (ROS), to manipulate robot behavior, compromising both safety and operational effectiveness. The hidden nature of these vulnerabilities requires strong defense mechanisms to guarantee the safety and dependability of robotic systems. In this paper, we introduce RoboCop, a cyber-physical attack detection framework designed to protect robots from zero-day threats. RoboCop leverages static software features in the pre-execution analysis along with runtime state monitoring to identify attack patterns and deviations that signal attacks, thus ensuring the robot's operational integrity. We evaluated RoboCop on the F1-tenth autonomous car platform. It achieves a 93% detection accuracy against a variety of zero-day attacks targeting sensors, actuators, and controller logic. Importantly, in on-robot deployments, it identifies attacks in less than 7 seconds with a 12% computational overhead.
|
|
09:00-10:00, Paper FrPI6T11.16 | |
Collision Detection between Smooth Convex Bodies Via Riemannian Optimization Framework |
|
An, Seoki | Seoul National University |
Lee, Somang | Seoul National University |
Lee, Jeongmin | Seoul National University |
Park, Sunkyung | Seoul National University |
Lee, Dongjun | Seoul National University |
Keywords: Computational Geometry, Contact Modeling, Simulation and Animation
Abstract: Collision detection is a fundamental problem across various fields such as robotics, physical simulation, and computer graphics. While numerous studies have provided efficient solutions based on the well-known Gilbert-Johnson-Keerthi (GJK) algorithm and the Expanding Polytope Algorithm (EPA), existing methods utilizing GJK-EPA often struggle with smooth, strictly convex shapes like ellipsoids. This paper proposes a novel approach that converts the collision detection problem into an unconstrained Riemannian optimization problem. Moreover, we present a specific method for solving this problem based on twice-differentiable support functions and the Riemannian trust region (RTR) method. The method exhibits a fast and robust convergence rate, leveraging the well-established theory of Riemannian optimization. In addition, it offers the capability to compute derivatives of the resultant contact features. Evaluation studies comparing our method to the GJK-EPA method are conducted with pre-defined primitive shapes. Additionally, a test with several more complex shapes demonstrates the method's effectiveness and applicability.
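The support-function formulation lends itself to a compact illustration. The sketch below maximizes a separation-distance objective over the unit sphere with projected gradient ascent as a simple stand-in for the Riemannian trust-region method; the objective and the ellipsoid support functions are illustrative, not the paper's exact problem.

```python
import numpy as np

# For disjoint convex bodies A and B with support functions h_A, h_B,
#   dist(A, B) = max over unit u of [ -h_A(-u) - h_B(u) ],
# an optimization over the unit sphere, i.e. a Riemannian manifold.

def ellipsoid_support(u, center, Q):
    """Support value and gradient of {x : (x - c)^T Q^{-1} (x - c) <= 1}."""
    s = np.sqrt(u @ Q @ u)
    return center @ u + s, center + (Q @ u) / s

def separation_distance(A, B, iters=200, step=0.05):
    (cA, QA), (cB, QB) = A, B
    u = (cA - cB) / np.linalg.norm(cA - cB)        # initial separating direction
    for _ in range(iters):
        _, gA = ellipsoid_support(-u, cA, QA)      # d/du of -h_A(-u)
        _, gB = ellipsoid_support(u, cB, QB)       # gradient of h_B(u)
        grad = gA - gB                             # Euclidean gradient of the objective
        rgrad = grad - (grad @ u) * u              # project onto tangent space of the sphere
        u = u + step * rgrad
        u /= np.linalg.norm(u)                     # retract back onto the sphere
    return -ellipsoid_support(-u, cA, QA)[0] - ellipsoid_support(u, cB, QB)[0]

# Two unit spheres whose centres are 3 apart: separation distance is 1.
d = separation_distance((np.array([0., 0., 0.]), np.eye(3)),
                        (np.array([3., 0., 0.]), np.eye(3)))
```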
|
|
FrPI6T12 |
Room 12 |
Sensor Fusion for Robots |
Teaser Session |
Co-Chair: Kim, Jinwhan | KAIST |
|
09:00-10:00, Paper FrPI6T12.1 | |
A Case Study on Visual-Audio-Tactile Cross-Modal Retrieval |
|
Wojcik, Jagoda | King's College London |
Jiang, Jiaqi | King's College London |
Wu, Jiacheng | King’s College London |
Luo, Shan | King's College London |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation
Abstract: Cross-Modal Retrieval (CMR), which retrieves relevant items from one modality (e.g., audio) given a query in another modality (e.g., visual), has undergone significant advancements in recent years. This capability is crucial for robots to integrate and interpret information across diverse sensory inputs. However, the retrieval space in existing robotic CMR approaches often consists of only one modality, which limits the robot’s performance. In this paper, we propose a novel CMR model, named VAT-CMR, that incorporates three different modalities, i.e., visual, audio and tactile, for enhanced multi-modal object retrieval. In this model, multi-modal representations are first fused to provide a holistic view of object features. To mitigate the semantic gaps between representations of different modalities, a dominant modality is then selected during the classification training phase to improve the distinctiveness of the representations and, in turn, the retrieval performance. To evaluate our proposed approach, we conducted a case study, and the results demonstrate that our VAT-CMR model surpasses competing approaches. Further, our proposed dominant modality selection significantly enhances cross-retrieval accuracy.
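Once fused embeddings are available, the retrieval step itself reduces to a similarity ranking. The sketch below assumes such embeddings already exist (random vectors stand in) and is not the VAT-CMR model.

```python
import numpy as np

# Cross-modal retrieval sketch: given a query embedding from one modality,
# rank items in a gallery of embeddings from another modality by cosine similarity.

def cosine_retrieve(query, gallery, top_k=5):
    q = query / (np.linalg.norm(query) + 1e-9)
    g = gallery / (np.linalg.norm(gallery, axis=1, keepdims=True) + 1e-9)
    scores = g @ q
    ranking = np.argsort(-scores)
    return ranking[:top_k], scores[ranking[:top_k]]

rng = np.random.default_rng(0)
tactile_query = rng.normal(size=128)           # embedding of a tactile observation
audio_gallery = rng.normal(size=(1000, 128))   # embeddings of audio clips
idx, sims = cosine_retrieve(tactile_query, audio_gallery)
```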
|
|
09:00-10:00, Paper FrPI6T12.2 | |
ASY-VRNet: Waterway Panoptic Driving Perception Model Based on Asymmetric Fair Fusion of Vision and 4D mmWave Radar |
|
Guan, Runwei | University of Liverpool |
Yao, Shanliang | XJTLU |
Man, Ka Lok | Xi'an Jiaotong-Liverpool University |
Zhu, Xiaohui | Xi'an Jiaotong-Liverpool University |
Yue, Yong | Xi'an Jiaotong-Liverpool University |
Smith, Jeremy | University of Liverpool |
Lim, Eng Gee | Xi'an Jiaotong-Liverpool University |
Yue, Yutao | Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Sensor Fusion, Computer Vision for Transportation
Abstract: Panoptic Driving Perception (PDP) is critical for the autonomous navigation of Unmanned Surface Vehicles (USVs). A PDP model typically integrates multiple tasks, necessitating the simultaneous and robust execution of various perception tasks to facilitate downstream path planning. The fusion of visual and radar sensors is currently acknowledged as a robust and cost-effective approach. However, most existing research has primarily focused on fusing visual and radar features dedicated to object detection or utilizing a shared feature space for multiple tasks, neglecting the individual representation differences between various tasks. To address this gap, we propose a pair of Asymmetric Fair Fusion (AFF) modules with favorable explainability designed to efficiently interact with independent features from both visual and radar modalities, tailored to the specific requirements of object detection and semantic segmentation tasks. The AFF modules treat image and radar maps as irregular point sets and transform these features into a crossed-shared feature space for multitasking, ensuring equitable treatment of vision and radar point cloud features. Leveraging AFF modules, we propose a novel and efficient PDP model, ASY-VRNet, which processes image and radar features based on irregular super-pixel point sets. Additionally, we propose an effective multi-task learning method specifically designed for PDP models. Compared to other lightweight models, ASY-VRNet achieves state-of-the-art performance in object detection, semantic segmentation, and drivable-area segmentation on the WaterScenes benchmark. Our project is publicly available at https://github.com/GuanRunwei/ASY-VRNet.
|
|
09:00-10:00, Paper FrPI6T12.3 | |
KLILO: Kalman Filter Based LiDAR-Inertial-Leg Odometry for Legged Robots |
|
Xu, Shaohang | Huazhong University of Science and Technology |
Zhang, Wentao | Huazhong University of Science and Technology |
Zhu, Lijun | Huazhong University of Science and Technology |
Keywords: Sensor Fusion, SLAM, Mapping
Abstract: This paper presents a Kalman filter based LiDAR-Inertial-Leg Odometry (KLILO) system for legged robots to navigate in challenging environments. In particular, we employ the iterated error-state extended Kalman filter framework on manifolds to fuse measurements from the inertial measurement unit (IMU), LiDAR, joint encoders, and contact force sensors in a tightly coupled manner. To assess the performance of KLILO, we build a dataset that encompasses intricate environments with challenging conditions such as dynamic objects and deformable terrains. The results demonstrate that our algorithm can provide efficient and reliable localization in all tests. It exhibits an average improvement of around 40% in positioning accuracy compared to the baselines. Furthermore, we validate KLILO in a challenging navigation task on a real robot, where the LiDAR encounters ineffective measurements.
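The structure of such a filter can be outlined generically. The sketch below shows the predict/correct skeleton of an error-state Kalman filter fusing high-rate IMU propagation with lower-rate LiDAR and leg-odometry corrections; models, Jacobians, and the manifold retraction are placeholders rather than the KLILO implementation.

```python
import numpy as np

# Generic error-state Kalman filter skeleton (not the KLILO system itself).

def predict(x, P, imu, F, Q, propagate):
    x = propagate(x, imu)          # nominal-state propagation with one IMU sample
    P = F @ P @ F.T + Q            # error-state covariance propagation
    return x, P

def correct(x, P, z, h, H, R, retract):
    y = z - h(x)                   # innovation (on manifolds: a boxminus residual)
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    dx = K @ y                     # error-state estimate
    x = retract(x, dx)             # inject the error into the nominal state
    P = (np.eye(P.shape[0]) - K @ H) @ P
    return x, P

# Typical usage: call predict() for every IMU sample, then correct() whenever a
# LiDAR scan residual or a leg-kinematics/contact measurement arrives; an
# iterated variant repeats correct() with relinearised h and H.
```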
|
|
09:00-10:00, Paper FrPI6T12.4 | |
AnytimeFusion: Parameter-Free RGB Camera-Radar Sensor Fusion Algorithm in Complex Maritime Situations |
|
Shin, Yeongha | Korea Advanced Institute of Science and Technology |
Kim, Hanguen | Seadronix Corp |
Kim, Jinwhan | KAIST |
Keywords: Sensor Fusion, Marine Robotics, Autonomous Vehicle Navigation
Abstract: Determining the position of obstacles is crucial for unmanned vehicles, and, to achieve this, cameras and radar sensors are widely utilized. However, establishing the correlation between two or more sensors proves challenging in the dynamically changing maritime environment. To solve these issues, we propose the AnytimeFusion algorithm. The key innovation of AnytimeFusion lies in the utilization of a parameter-free method that does not require accurate sensor alignment and calibration. The algorithm consists of four stages. First, calibration targets are selected in the maritime environment based on segmentation images. Second, radar and camera data are pre-fused to model the correlation of azimuth information. Third, after these auto-calibration stages, Inverse Perspective Mapping (IPM) is employed to integrate the coordinate systems of the two sensors, with the parameters of this integration determined by Particle Swarm Optimization (PSO). Finally, an Error Polygon for the positions of the camera and radar is generated, and sensor fusion is carried out based on this information. We validated our method through experiments conducted on real ships in complex maritime environments, achieving an average accuracy of 95.7%.
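The IPM step can be illustrated as a ray-to-plane intersection, assuming a calibrated pinhole camera above a flat water surface; the intrinsics, pose, and flat-plane assumption below are illustrative, not the paper's calibration.

```python
import numpy as np

# Inverse Perspective Mapping sketch: back-project a pixel and intersect the ray
# with the water plane z = 0, giving a metric position comparable to radar returns.

def pixel_to_plane(uv, K, R_wc, t_wc):
    """Map pixel (u, v) to a point on the z = 0 plane in world coordinates."""
    ray_cam = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    ray_world = R_wc @ ray_cam                 # ray direction in the world frame
    s = -t_wc[2] / ray_world[2]                # scale so the z component reaches 0
    return t_wc + s * ray_world

K = np.array([[800., 0., 640.], [0., 800., 360.], [0., 0., 1.]])
theta = np.deg2rad(10.0)                       # downward pitch of the camera
R_wc = np.array([[1., 0., 0.],                 # columns: camera axes (x right,
                 [0., -np.sin(theta),  np.cos(theta)],   # y down, z forward)
                 [0., -np.cos(theta), -np.sin(theta)]])  # expressed in a z-up world
t_wc = np.array([0., 0., 5.0])                 # camera 5 m above the water surface
ground_point = pixel_to_plane((700., 500.), K, R_wc, t_wc)
```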
|
|
09:00-10:00, Paper FrPI6T12.5 | |
Implicit Neural Fusion of RGB and Far-Infrared 3D Imagery for Invisible Scenes |
|
Li, Xiangjie | The University of Tokyo |
Xie, Shuxiang | The University of Tokyo |
Sakurada, Ken | National Institute of Advanced Industrial Science and Technology |
Sagawa, Ryusuke | National Institute of Advanced Industrial Science and Technology |
Oishi, Takeshi | The University of Tokyo |
Keywords: Sensor Fusion, Recognition, Computer Vision for Automation
Abstract: Optical sensors, such as the Far Infrared (FIR) sensor, have demonstrated advantages over traditional imaging. For example, 3D reconstruction in the FIR field captures the heat distribution of a scene that is invisible to RGB, aiding various applications like gas leak detection. However, the limited texture information in FIR imagery and the challenges in acquiring FIR frames hinder the reconstruction process. Given that implicit neural representations (INRs) can integrate geometric information across different sensors, we propose Implicit Neural Fusion (INF) of RGB and FIR for 3D reconstruction of invisible scenes in the FIR field. Our method first obtains a neural density field of objects from RGB frames. Then, with the trained object density field, a separate neural density field of gases is optimized using limited view inputs of FIR frames. Our method not only demonstrates outstanding reconstruction quality in the FIR field through extensive experiments but also can isolate the geometric information of invisible content, offering a new dimension of scene understanding.
|
|
09:00-10:00, Paper FrPI6T12.6 | |
Audio-Visual Traffic Light State Detection for Urban Robots |
|
Gupta, Sagar | Deakin University |
Cosgun, Akansel | Monash University |
Keywords: Sensor Fusion, Intelligent Transportation Systems, Autonomous Vehicle Navigation
Abstract: We present a multimodal traffic light state detection method using vision and sound, from the viewpoint of a quadruped robot navigating in urban settings. This is a challenging problem because of visual occlusions and noise from robot locomotion. Our method combines features from raw audio with the ratios of red and green pixels within bounding boxes identified by established vision-based detectors. The fusion method aggregates features across multiple frames in a given timeframe, increasing robustness and adaptability. Results show that our approach effectively addresses the challenge of visual occlusion and surpasses the performance of single-modality solutions when the robot is in motion. This study serves as a proof of concept, highlighting the significant, yet often overlooked, potential of multi-modal perception in robotics.
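The colour-ratio feature can be sketched directly, assuming a bounding box from an off-the-shelf detector; the RGB thresholds below are illustrative simplifications of such a pipeline.

```python
import numpy as np

# Per-frame colour-ratio feature: within a traffic-light bounding box, count
# predominantly red vs. green pixels and normalise by the box area.

def red_green_ratios(image, box):
    """image: HxWx3 uint8 RGB; box: (x1, y1, x2, y2) from a vision detector."""
    x1, y1, x2, y2 = box
    roi = image[y1:y2, x1:x2].astype(np.float32)
    r, g, b = roi[..., 0], roi[..., 1], roi[..., 2]
    red_mask = (r > 120) & (r > 1.4 * g) & (r > 1.4 * b)
    green_mask = (g > 120) & (g > 1.4 * r) & (g > 1.4 * b)
    n = roi.shape[0] * roi.shape[1] + 1e-9
    return red_mask.sum() / n, green_mask.sum() / n

frame = np.zeros((480, 640, 3), dtype=np.uint8)
frame[100:120, 300:320] = (0, 200, 0)            # a lit green lamp, for example
red_ratio, green_ratio = red_green_ratios(frame, (290, 90, 330, 160))
# Per-frame ratios (plus audio features) would then be aggregated over a short
# time window before classification.
```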
|
|
09:00-10:00, Paper FrPI6T12.7 | |
Accurately Tracking Relative Positions of Moving Trackers Based on UWB Ranging and Inertial Sensing without Anchors |
|
Armani, Rayan | ETH Zurich |
Holz, Christian | ETH Zürich |
Keywords: Sensor Fusion, Range Sensing, Sensor Networks
Abstract: We present a tracking system for relative positioning that can operate on entirely moving tracking nodes without the need for stationary anchors. Each node embeds a 9-DOF magnetic and inertial measurement unit and a single-antenna ultra-wideband radio. We introduce a multi-stage filtering pipeline through which our system estimates the relative layout of all tracking nodes within the group. The key novelty of our method is the integration of a custom Extended Kalman filter (EKF) with a refinement step via multidimensional scaling (MDS). Our method integrates the MDS output back into the EKF, thereby creating a dynamic feedback loop for more robust estimates. We complement our method with a UWB ranging protocol that we designed to allow tracking nodes to opportunistically join and leave the group. In our evaluation with constantly moving nodes, our system estimated relative positions with an error of 10.2 cm (in 2D) and 21.7 cm (in 3D), including cases where obstacles occluded the line of sight between tracking nodes. Our approach requires no external infrastructure, making it particularly suitable for operation in environments where stationary setups are impractical.
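The MDS refinement step can be illustrated with classical multidimensional scaling, which recovers a relative layout (up to rotation, translation, and reflection) from pairwise ranges; the synthetic range matrix below stands in for UWB measurements, and the EKF feedback loop is omitted.

```python
import numpy as np

# Classical MDS: recover a relative 2D layout of n nodes from pairwise distances.

def classical_mds(D, dim=2):
    """D: symmetric n x n matrix of pairwise distances between nodes."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred squared distances
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:dim]            # keep the largest eigenvalues
    return V[:, order] * np.sqrt(np.maximum(w[order], 0.0))

true_pts = np.array([[0., 0.], [1.2, 0.1], [0.4, 1.5], [1.8, 1.1]])
D = np.linalg.norm(true_pts[:, None, :] - true_pts[None, :, :], axis=-1)
noise = 0.02 * np.random.default_rng(0).normal(size=D.shape)
layout = classical_mds(D + (noise + noise.T) / 2)   # layout up to a rigid transform
```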
|
|
09:00-10:00, Paper FrPI6T12.8 | |
Bridging Language, Vision and Action: Multimodal VAEs in Robotic Manipulation Tasks |
|
Sejnova, Gabriela | Czech Technical University in Prague |
Vavrecka, Michal | Czech Technical University CIIRC |
Stepanova, Karla | Czech Technical University |
Keywords: Sensor Fusion, Visual Learning, Semantic Scene Understanding
Abstract: In this work, we focus on unsupervised vision-language-action mapping in the area of robotic manipulation. Recently, multiple approaches employing pre-trained large language and vision models have been proposed for this task. However, they are computationally demanding and require careful fine-tuning of the produced output. A more lightweight alternative would be the implementation of multimodal Variational Autoencoders (VAEs) which can extract the latent features of the data and integrate them into a joint representation, as has been demonstrated mostly on image-image or image-text data for the state-of-the-art models. Here, we explore whether and how multimodal VAEs can be employed in unsupervised robotic manipulation tasks in a simulated environment. Based on the results obtained, we propose a model-invariant training alternative that improves the models' performance in a simulator by up to 55%. Moreover, we systematically evaluate the challenges raised by individual tasks, such as object or robot position variability, number of distractors, or task length. Our work thus also sheds light on the potential benefits and limitations of using the current multimodal VAEs for unsupervised learning of robotic motion trajectories based on vision and language.
|
|
09:00-10:00, Paper FrPI6T12.9 | |
Adaptive Visual-Aided 4D Radar Odometry through Transformer-Based Feature Fusion |
|
Zhang, Yuanfan | Harbin Institute of Technology |
Xiao, Renxiang | Harbin Institute of Technology, Shenzhen |
Hong, Ziyang | Heriot-Watt University |
Hu, Liang | Harbin Institute of Technology, Shenzhen |
Liu, Jie | Harbin Institute of Technology |
Keywords: Sensor Fusion, SLAM, Deep Learning for Visual Perception
Abstract: Multimodal sensor fusion has been successfully utilized in many odometry and localization methods as it increases both estimation accuracy and robustness in application scenarios. To address the challenge of odometry under varying weather conditions, we propose a novel visual-4D radar fusion based odometry using an unsupervised deep learning approach. In our method, we adopt transformer-based cascaded decoders to facilitate efficient feature extraction from images and radar point clouds. Considering that radars are weather-agnostic while information-rich cameras are susceptible to adverse weather, we deliberately introduce an adaptive feature fusion strategy via the attention mechanism, in which the attention shifts dynamically to adapt to changing weather conditions based on the amount of information content in image features. Through extensive comparative experiments, our method surpasses different state-of-the-art single-modal odometry estimation methods. Our code and trained model will be released publicly.
|
|
09:00-10:00, Paper FrPI6T12.10 | |
VIRUS-NeRF - Vision, InfraRed and UltraSonic Based Neural Radiance Fields |
|
Schmid, Nicolaj | EPFL |
von Einem, Cornelius | ETH Zürich |
Cadena Lerma, Cesar | ETH Zurich |
Siegwart, Roland | ETH Zurich |
Hruby, Lorenz | Filics GmbH |
Tschopp, Florian | Voliro AG |
Keywords: Sensor Fusion, Mapping, AI-Enabled Robotics
Abstract: Autonomous mobile robots are an increasingly integral part of modern factory and warehouse operations. Obstacle detection, avoidance and path planning are critical safety-relevant tasks, which are often solved using expensive LiDAR sensors and depth cameras. We propose to use cost-effective low-resolution ranging sensors, such as ultrasonic and infrared time-of-flight sensors, by developing VIRUS-NeRF - Vision, InfraRed, and UltraSonic based Neural Radiance Fields. Building upon Instant Neural Graphics Primitives with a Multiresolution Hash Encoding (Instant-NGP), VIRUS-NeRF incorporates depth measurements from ultrasonic and infrared sensors and utilizes them to update the occupancy grid used for ray marching. Experimental evaluation in 2D demonstrates that VIRUS-NeRF achieves comparable mapping performance to LiDAR point clouds regarding coverage. Notably, in small environments, its accuracy aligns with that of LiDAR measurements, while in larger ones, it is bounded by the utilized ultrasonic sensors. An in-depth ablation study reveals that adding ultrasonic and infrared sensors is highly effective when dealing with sparse data and low view variation. Further, the proposed occupancy grid of VIRUS-NeRF improves the mapping capabilities and increases the training speed by 46% compared to Instant-NGP. Overall, VIRUS-NeRF presents a promising approach for cost-effective local mapping in mobile robotics, with potential applications in safety and navigation tasks. The code can be found at https://github.com/ethz-asl/virus_nerf.
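The occupancy-grid update from a low-resolution range reading can be sketched with a standard log-odds ray update; the resolution and log-odds constants below are illustrative, not the values used in VIRUS-NeRF.

```python
import numpy as np

# Cells along the ray up to the measured depth receive a "free" decrement,
# the cell at the hit receives an "occupied" increment. Such a grid can then
# gate ray marching during NeRF training.

L_FREE, L_OCC = -0.4, 0.85

def update_grid(grid, origin, direction, depth, cell_size=0.1):
    direction = direction / np.linalg.norm(direction)
    n_steps = int(depth / cell_size)
    for i in range(n_steps + 1):
        p = origin + i * cell_size * direction
        idx = tuple((p / cell_size).astype(int))
        if not all(0 <= idx[k] < grid.shape[k] for k in range(2)):
            break
        grid[idx] += L_OCC if i == n_steps else L_FREE
    return grid

grid = np.zeros((100, 100))                       # 10 m x 10 m at 0.1 m resolution
grid = update_grid(grid, origin=np.array([1.0, 1.0]),
                   direction=np.array([1.0, 0.5]), depth=2.3)
occupancy_prob = 1.0 / (1.0 + np.exp(-grid))      # convert log-odds to probability
```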
|
|
09:00-10:00, Paper FrPI6T12.11 | |
Monocular Event-Inertial Odometry with Adaptive Decay-Based Time Surface and Polarity-Aware Tracking |
|
Tang, Kai | Zhejiang University |
Lang, Xiaolei | Zhejiang University |
Ma, Yukai | Zhejiang University |
Huang, Yuehao | Zhejiang University |
Li, Laijian | Zhejiang University |
Liu, Yong | Zhejiang University |
Lv, Jiajun | Zhejiang University |
Keywords: Sensor Fusion, Visual-Inertial SLAM, Visual Tracking
Abstract: In this paper, we propose a monocular event-inertial odometry that incorporates an adaptive decay kernel-based time surface with a polarity-aware tracking method. Event cameras have garnered considerable attention because of their advantages over traditional cameras in low power consumption, high dynamic range, and absence of motion blur. To extract texture information from asynchronous events, our odometry implements a Time Surface based on adaptive decay, which adapts to the dynamic characteristics of the event stream and enhances the representation of environmental textures. Moreover, polarity-weighted time surfaces suffer from event polarity shifts when the motion direction changes. To mitigate the adverse effects on feature tracking, we optimize the tracking process by incorporating an additional polarity-inverted time surface to enhance robustness. We compare our method with visual-inertial and event-inertial odometry methods in terms of accuracy. Additionally, we conduct ablation experiments on both decay strategies, the parameter settings, and the polarity-aware tracking method. The experimental results demonstrate better performance over the state-of-the-art methods, along with competitive outcomes across various datasets.
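The underlying time-surface construction can be sketched as follows; a fixed decay constant is used here, so this shows only the basic, non-adaptive form of the representation.

```python
import numpy as np

# Time surface: each pixel stores the timestamp of its most recent event, and the
# surface at query time t is an exponential decay of that recency. An adaptive
# variant would modulate tau from the local event rate.

def build_time_surface(events, shape, t_query, tau=0.03):
    """events: iterable of (t, x, y, polarity); shape: (H, W)."""
    last_t = np.full(shape, -np.inf)
    for t, x, y, _pol in events:
        if t <= t_query:
            last_t[y, x] = max(last_t[y, x], t)
    return np.exp(-(t_query - last_t) / tau)      # in (0, 1], 0 where no events seen

events = [(0.010, 5, 3, 1), (0.021, 5, 3, -1), (0.025, 7, 4, 1)]
ts = build_time_surface(events, shape=(8, 10), t_query=0.030)
```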
|
|
09:00-10:00, Paper FrPI6T12.12 | |
DCSANet: Dual Cross-Channel and Spatial Attention Make RGB-T Object Detection Better |
|
Lan, Xiaoxiong | Sun Yat-Sen University |
Liu, Shenghao | Sun Yat-Sen University |
Zhang, Zhiyong | Sun Yat-Sen University |
Qiu, Changzhen | Sun Yat-Sen University |
Keywords: Sensor Fusion, Computer Vision for Automation, Object Detection, Segmentation and Categorization
Abstract: Multimodal image pairs can make object detection more reliable in challenging environments, so RGB-T object detection has gained extensive attention over the past decade. To exploit the complementarity of the visible and thermal modalities, we propose a novel lightweight Feature Enhancement-fusion Module (FEM), which is composed of a Channel Enhancement-fusion Unit (CEU) and a Spatial Enhancement-fusion Unit (SEU), obtained by extending the attention mechanism to operate on two modalities. The CEU is used to exploit the complementarity and alleviate the data imbalance by combining internal and global channel attention. Additionally, the SEU is utilized to guide the model to pay more attention to the regions of interest. By incorporating FEM, enhanced and fused features are obtained, leading to improved performance. The effectiveness and generalizability of FEM are validated on two public datasets, and our proposed DCSANet achieves competitive performance while maintaining high speed (+7.0% on LLVIP and +1.2% on FLIR in mAP). Moreover, we conducted ablation experiments to verify the effectiveness of the proposed operators.
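The channel-attention building block that such a unit extends across two modalities can be sketched in a few lines; the weights below are random placeholders, and this is not the CEU/SEU definition from the paper.

```python
import numpy as np

# Squeeze-and-excitation style channel attention: global average pooling produces
# per-channel statistics, a small bottleneck MLP maps them to per-channel weights
# that rescale the feature map.

def channel_attention(feat, W1, W2):
    """feat: (C, H, W) feature map; W1: (C//r, C); W2: (C, C//r)."""
    s = feat.mean(axis=(1, 2))                     # squeeze: global average pool
    z = np.maximum(W1 @ s, 0.0)                    # excitation: bottleneck + ReLU
    w = 1.0 / (1.0 + np.exp(-(W2 @ z)))            # sigmoid channel weights
    return feat * w[:, None, None]                 # rescale each channel

rng = np.random.default_rng(0)
rgb_feat = rng.normal(size=(64, 32, 32))
W1, W2 = rng.normal(size=(16, 64)) * 0.1, rng.normal(size=(64, 16)) * 0.1
enhanced = channel_attention(rgb_feat, W1, W2)
```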
|
|
09:00-10:00, Paper FrPI6T12.13 | |
Advanced Liquid and Dust Detection Sensor Setup and Algorithm Based on YOLO and Feature Extraction for Commercial Autonomous Cleaning Robots |
|
Jung, Dae-Hwan | Samsung Electronics Company, Ltd |
Hong, Hyun Seok | Samsung Electronics |
Park, Sahng-Gyu | Samsung Electronics |
Lee, Yeongrok | Samsung Electronics |
Lee, Woosub | Samsung |
|
09:00-10:00, Paper FrPI6T12.14 | |
Error-State Kalman Filter Based Visual-Inertial Odometry Using Orientation Measurement on Unit Quaternion Group |
|
Chang, Chao-Wei | National Taiwan University |
Lian, Feng-Li | National Taiwan University |
Keywords: Sensor Fusion, Visual-Inertial SLAM, Aerial Systems: Perception and Autonomy
Abstract: The inaccessibility of data from standard sensor suites on closed-platform unmanned aerial vehicles (UAVs) has been a hindrance to developing a compatible visual-inertial odometry (VIO). Despite the advance of recent VIO research, these works often emphasize fusing detailed sensor models with available sensor data at a relatively high frequency. To address this issue, in this paper, we derive an innovation signal for an orientation measurement model on the unit quaternion group S^3 based on the error-state Kalman filter (ESKF) framework. Leveraging the error-state formulation, the innovation signal directly exploits the geometric error representation on S^3 instead of treating unit quaternions as R^4 vectors. Flight experiments on a small commercial UAV have been carried out to compare the performance of the proposed ESKF with quaternion measurements on S^3 (ESKF-Q) against the original ESKF framework. Experimental results demonstrate that while both representations of unit quaternion measurements in the ESKF framework improve orientation estimates with an unperturbed orientation measurement model, only the proposed ESKF-Q exhibits convergent state estimates in the presence of uncertainties in the orientation measurement model.
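The geometric innovation on S^3 can be illustrated with the quaternion error between measured and estimated attitude; Hamilton, scalar-first conventions are assumed, and this is a generic construction rather than the paper's full filter.

```python
import numpy as np

# For small errors, twice the vector part of q_err = q_meas * conj(q_est) is a
# 3-vector residual in the tangent space of S^3, usable as an ESKF innovation.

def quat_mul(a, b):
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_conj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def orientation_innovation(q_meas, q_est):
    q_err = quat_mul(q_meas, quat_conj(q_est))
    if q_err[0] < 0:                 # enforce the shorter rotation
        q_err = -q_err
    return 2.0 * q_err[1:]           # small-angle residual in R^3

q_est = np.array([1.0, 0.0, 0.0, 0.0])
q_meas = np.array([np.cos(0.05), np.sin(0.05), 0.0, 0.0])    # ~0.1 rad about x
residual = orientation_innovation(q_meas, q_est)             # approx. [0.0998, 0, 0]
```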
|
|
09:00-10:00, Paper FrPI6T12.15 | |
Real-Time Truly-Coupled Lidar-Inertial Motion Correction and Spatiotemporal Dynamic Object Detection |
|
Le Gentil, Cedric | University of Technology Sydney |
Falque, Raphael | University of Technology Sydney |
Vidal-Calleja, Teresa A. | University of Technology Sydney |
Keywords: Sensor Fusion, Object Detection, Segmentation and Categorization, SLAM
Abstract: Over the past decade, lidars have become a cornerstone of robotics state estimation and perception thanks to their ability to provide accurate geometric information about their surroundings in the form of 3D scans. Unfortunately, most of today's lidars do not take snapshots of the environment but sweep the environment over a period of time (typically around 100 ms). Such a rolling-shutter-like mechanism introduces motion distortion into the collected lidar scan, thus hindering downstream perception applications. In this paper, we present a novel method for motion distortion correction of lidar data by tightly coupling lidar with Inertial Measurement Unit (IMU) data. The motivation of this work is map-free dynamic object detection based on lidar. The proposed lidar data undistortion method relies on continuous preintegration of IMU measurements, which allows parameterising the sensors' continuous 6-DoF trajectory using solely eleven discrete state variables (biases, initial velocity, and gravity direction). The undistortion consists of feature-based distance minimisation of point-to-line and point-to-plane residuals in a non-linear least-squares formulation. Given undistorted geometric data over a short temporal window, the proposed pipeline computes the spatiotemporal normal vector of each of the lidar points. The temporal component of the normals is a proxy for the corresponding point's velocity, therefore allowing for learning-free dynamic object classification without the need for registration in a global reference frame. We demonstrate the soundness of the proposed method and its different components using public datasets and compare them with state-of-the-art lidar-inertial state estimation and dynamic object detection algorithms.
|
|
09:00-10:00, Paper FrPI6T12.16 | |
Accurate and Efficient Loop Closure Detection with Deep Binary Image Descriptor and Augmented Point Cloud Registration |
|
Wang, Jialiang | The Chinese University of Hong Kong |
Gao, Zhi | Temasek Laboratories @ NUS |
Lin, Zhipeng | The Chinese University of Hong Kong |
Zhou, Zhiyu | Wuhan University |
Wang, Xiaonan | ZG Technology Co., Ltd |
Cheng, Jianhua | ZG Technology Co., Ltd |
Zhang, Hao | Wuhan University |
Liu, Xinyi | Wuhan University |
Chen, Ben M. | Chinese University of Hong Kong |
Keywords: Sensor Fusion, Vision-Based Navigation, Computer Vision for Automation
Abstract: Loop Closure Detection (LCD) is an essential component of Simultaneous Localization and Mapping (SLAM), helping to correct drift errors, facilitate map merging, or both by identifying previously observed scenes. Despite its importance, traditional LCD algorithms based on a single sensor, such as a camera or LiDAR, exhibit degraded performance in challenging scenarios due to their inherent limitations. To address this issue, we propose a novel LCD method based on camera-LiDAR fusion, exploiting the rich textural information from cameras and the accurate geometric data from LiDAR to ensure robustness and speed in challenging environments. Specifically, we first employ deep hashing learning to encode deep image features into binary image descriptors for extremely fast loop candidate (LC) retrieval. Then, LiDAR points are augmented with image color for accurate geometric verification. Finally, we incorporate a spatial-temporal consistency check that requires an LC to have consistently matched neighbors before it is accepted as true. Our method is extensively verified and compared with state-of-the-art methods on various public datasets and our own data, encompassing both indoor and outdoor environments. Experimental results demonstrate that our method obtains the best performance, increasing the maximum recall rate at 100% precision by a significant margin of 20% while operating in real-time at an average speed of 30 fps.
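The binary-descriptor retrieval stage can be sketched with XOR and popcount; descriptor length and the distance threshold below are illustrative, and geometric verification with the colour-augmented point cloud would follow for the top candidates.

```python
import numpy as np

# Loop-candidate retrieval with packed binary descriptors and Hamming distance.

def hamming_retrieve(query_bits, db_bits, max_dist=80):
    """query_bits: (B,) uint8 packed bits; db_bits: (N, B) packed bits."""
    xor = np.bitwise_xor(db_bits, query_bits)
    dists = np.unpackbits(xor, axis=1).sum(axis=1)      # per-frame Hamming distance
    order = np.argsort(dists)
    return [(int(i), int(dists[i])) for i in order if dists[i] <= max_dist]

rng = np.random.default_rng(0)
db = rng.integers(0, 256, size=(500, 32), dtype=np.uint8)    # 500 frames, 256-bit codes
query = db[123] ^ np.eye(32, dtype=np.uint8)[0]              # near-duplicate of frame 123
candidates = hamming_retrieve(query, db)                     # frame 123 ranks first
```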
|
|
09:00-10:00, Paper FrPI6T12.17 | |
Event-Intensity Stereo with Cross-Modal Fusion and Contrast |
|
Wang, Yuanbo | Dalian University of Technology |
Qu, Shanglai | Dalian University of Technology |
Meng, Tianyu | Dalian University of Technology |
Cui, Yan | China Germany Artificial Intelligence Institute |
Piao, Haiyin | Northwestern Polytechnical University |
Wei, Xiaopeng | Dalian University of Technology |
Yang, Xin | Dalian University of Technology |
Keywords: Sensor Fusion, Deep Learning Methods, RGB-D Perception
Abstract: For binocular stereo, traditional cameras excel in capturing fine details and texture information but are limited in terms of dynamic range and their ability to handle rapid motion. On the other hand, event cameras provide pixel-level intensity changes with low latency and a wide dynamic range, albeit at the cost of less detail in their output. It is natural to leverage the strengths of both modalities. We solve this problem by introducing a cross-modal fusion module that learns a visual representation from both sensor inputs. Additionally, we extract and compare dense event-intensity stereo pair features by contrasting "pairs of event-intensity pairs from different views, different modalities, and different timestamps". This provides the flexibility to mask hard negatives and enables networks to effectively combine event-intensity signals within a contrastive learning framework, leading to improved matching accuracy and facilitating more accurate estimation of disparity. Experimental results validate the effectiveness of our model and the improvement of disparity estimation accuracy.
|
|
FrAT1 |
Room 1 |
SLAM V |
Regular session |
|
10:00-10:15, Paper FrAT1.1 | |
EverySync: An Open Hardware Time Synchronization Sensor Suite for Common Sensors in SLAM |
|
Wu, Xuankang | Northeastern University |
Sun, Haoxiang | Northeastern University |
Wu, Rongguang | Northeastern University |
Fang, Zheng | Northeastern University |
Keywords: Software-Hardware Integration for Robot Systems, Sensor Fusion, SLAM
Abstract: Multi-sensor fusion systems have been widely applied in various fields, including mobile robots, simultaneous localization and mapping (SLAM), and autonomous driving. For a tightly coupled multi-sensor fusion system, strict time synchronization between sensors improves the accuracy of the system. However, there is currently a lack of open-source, general-purpose hardware synchronization systems for cameras, IMUs, LiDARs, and GNSS/RTK in the academic community. Therefore, we propose EverySync, an open hardware time synchronization system to address this gap. The synchronization accuracy of the system was evaluated through multiple experiments, achieving an accuracy of less than 1 ms. Moreover, real-world experiments show that hardware time synchronization improves the accuracy of the SLAM system. This open-source system is available on GitHub.
|
|
10:15-10:30, Paper FrAT1.2 | |
A Point-Line Features Fusion Method for Fast and Robust Monocular Visual-Inertial Initialization |
|
Xie, Guoqiang | Sichuan University |
Chen, Jie | Sichuan University |
Tang, Tianhang | Sichuan University |
Chen, Zeyu | Sichuan University |
Lei, Ling | Sichuan University |
Liu, Yiguang | Sichuan University |
Keywords: Visual-Inertial SLAM, SLAM
Abstract: Fast and robust initialization is essential for highly accurate monocular visual-inertial odometry (VIO), but at present the majority of initialization methods rely only on point features, which are unstable in low-texture and blurred situations. Therefore, we propose a novel point-line feature fusion method for monocular visual-inertial initialization, as line features are more stable and provide richer geometric information than point features: 1) a closed-form line-feature initialization method is presented and combined with point features to obtain a more integrated and robust linear system; 2) a monocular depth network is adopted to provide a learned affine-invariant depth map, requiring only one prior depth map for the first frame, which improves performance under low-parallax scenarios; 3) outliers can easily be rejected with RANSAC when solving the linear system based on our formulation. Moreover, a line-feature re-projection residual is added to visual-inertial bundle adjustment (VI-BA) to obtain more accurate initial parameters. Owing to the line features, the proposed method is more accurate and robust than state-of-the-art methods, especially under extreme low-parallax scenarios, as extensive experiments on popular datasets confirm: a 0.5 s initialization window on EuRoC MAV and a 0.3 s window on TUM-VI, whereas the standard method normally waits for a 2 s window.
|
|
10:30-10:45, Paper FrAT1.3 | |
NVINS: Robust Visual Inertial Navigation Fused with NeRF-Augmented Camera Pose Regressor and Uncertainty Quantification |
|
Han, Juyeop | Massachusetts Institute of Technology |
Lao Beyer, Lukas | Massachusetts Institute of Technology |
Cavalheiro, Guilherme | MIT |
Karaman, Sertac | Massachusetts Institute of Technology |
Keywords: Visual-Inertial SLAM, Deep Learning for Visual Perception, Probabilistic Inference
Abstract: In recent years, Neural Radiance Fields (NeRF) have emerged as a powerful tool for 3D reconstruction and novel view synthesis. However, the computational cost of NeRF rendering and the degradation in quality due to the presence of artifacts pose significant challenges for its application in real-time and robust robotic tasks, especially on embedded systems. This paper introduces a novel framework that integrates NeRF-derived localization information with Visual-Inertial Odometry (VIO) to provide a robust solution for robotic navigation in real time. By training an absolute pose regression network with augmented image data rendered from a NeRF and quantifying its uncertainty, our approach effectively counters positional drift and enhances system reliability. We also establish a mathematically sound foundation for combining visual-inertial navigation with camera localization neural networks, considering uncertainty under a Bayesian framework. Experimental validation in a photorealistic simulation environment demonstrates significant improvements in accuracy compared to a conventional VIO approach.
|
|
10:45-11:00, Paper FrAT1.4 | |
Online Refractive Camera Model Calibration in Visual Inertial Odometry |
|
Singh, Mohit | NTNU: Norwegian University of Science and Technology |
Alexis, Kostas | NTNU - Norwegian University of Science and Technology |
Keywords: Visual-Inertial SLAM, Marine Robotics
Abstract: This paper presents a general refractive camera model and online co-estimation of odometry and the refractive index of an unknown media. This enables operation in diverse and varying refractive fluids, given only the camera calibration in air. The refractive index is estimated online as a state variable of a monocular visual-inertial odometry framework in an iterative formulation using the proposed camera model. The method was verified on data collected using an underwater robot traversing inside a pool. The evaluations demonstrate convergence to the ideal refractive index for water despite significant perturbations in the initialization. Simultaneously, the approach enables on-par visual-inertial odometry performance in refractive media without prior knowledge of the refractive index or requirement of medium-specific camera calibration.
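The effect being modeled can be illustrated with the vector form of Snell's law at a flat interface; estimating the refractive index online amounts to treating the ratio eta as an additional filter state. The flat-interface geometry below is an illustrative simplification, not the paper's camera model.

```python
import numpy as np

# Vector form of Snell's law: bend a ray crossing the air/medium interface,
# with refractive index ratio eta = n_air / n_medium.

def refract(d, n, eta):
    """d: unit incident ray, n: unit interface normal (towards the camera), eta: n1/n2."""
    cos_i = -np.dot(d, n)
    sin2_t = eta**2 * (1.0 - cos_i**2)
    if sin2_t > 1.0:
        return None                                # total internal reflection
    return eta * d + (eta * cos_i - np.sqrt(1.0 - sin2_t)) * n

d = np.array([0.3, 0.0, -1.0])
d /= np.linalg.norm(d)                             # ray leaving the camera, going down
n = np.array([0.0, 0.0, 1.0])                      # water surface normal, pointing up
bent = refract(d, n, eta=1.0 / 1.333)              # air-to-water, n_water approx. 1.333
```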
|
|
FrAT2 |
Room 2 |
Neurorobotics |
Regular session |
Co-Chair: Anil Meera, Ajith | Radboud University |
|
10:00-10:15, Paper FrAT2.1 | |
Confidence-Aware Decision-Making and Control for Tool Selection |
|
Anil Meera, Ajith | Radboud University |
Lanillos, Pablo | Donders Institute for Brain, Cognition and Behavior, Radboud Uni |
Keywords: Neurorobotics, Cognitive Modeling, Cognitive Control Architectures
Abstract: Reflecting on our performance (e.g., how confident we are) before doing a task is essential for decision making, such as selecting the most suitable tool or choosing the best route to drive. While this form of awareness (thinking about our performance, i.e., metacognitive performance) is well known in humans, robots still lack this cognitive ability. This reflective monitoring can enhance their embodied decision power, robustness and safety. Here, we take a step in this direction by introducing a mathematical framework that allows robots to use their control self-confidence to make better-informed decisions. We derive a mathematical closed-form expression for control confidence for dynamic systems (i.e., the posterior inverse covariance of the control action). This control confidence seamlessly integrates within an objective function for decision making that balances: i) performance for task completion, ii) control effort, and iii) self-confidence. To evaluate our theoretical account, we framed the decision-making within the tool selection problem, where the agent has to select the best robot arm for a particular control task. The statistical analysis of numerical simulations with randomized 2-DOF arms shows that using control confidence during tool selection improves both real task performance and the reliability of the tool under unmodelled perturbations (e.g., external forces). Furthermore, our results indicate that control confidence is an early indicator of performance and thus can be used as a heuristic for making decisions when computation power is restricted or decision-making is intractable. Overall, we show the advantages of using a confidence-aware decision-making and control scheme for dynamic systems.
|
|
10:15-10:30, Paper FrAT2.2 | |
Environment Transformer and Policy Optimization for Model-Based Offline Reinforcement Learning |
|
Wang, Pengqin | The Hong Kong University of Science and Technology |
Zhu, Meixin | Hong Kong University of Science and Technology (Guangzhou) |
Shen, Shaojie | Hong Kong University of Science and Technology |
Keywords: Environment Monitoring and Management, Long term Interaction, Dynamics
Abstract: Interacting with the actual environment to acquire data is often costly and time-consuming in robotic tasks. Model-based offline reinforcement learning (RL) provides a feasible solution. On the one hand, it eliminates the requirement of interaction with the actual environment. On the other hand, it learns the transition dynamics and reward function from offline datasets and generates simulated rollouts to accelerate training. Previous model-based offline RL methods adopt probabilistic ensemble neural networks (NN) to model aleatoric uncertainty and epistemic uncertainty. However, this results in a great increase in training time and computing resource requirements. Furthermore, these methods are easily disturbed by the accumulated errors of the environment dynamics models when simulating long-term rollouts. To solve the above problems, we propose an uncertainty-aware sequence modeling architecture called Environment Transformer. It models the probability distribution of the environment dynamics and reward function to capture aleatoric uncertainty and treats epistemic uncertainty as a learnable noise parameter. Benefiting from the accurate modeling of the transition dynamics and reward function, Environment Transformer can be combined with arbitrary planning, dynamic programming, or policy optimization algorithms for offline RL. In this case, we perform Conservative Q-Learning (CQL) to learn a conservative Q-function. Through simulation experiments, we demonstrate that our method achieves or exceeds state-of-the-art performance on widely studied offline RL benchmarks. Moreover, we show that Environment Transformer's simulated rollout quality, sample efficiency, and long-term rollout simulation capability are superior to those of previous model-based offline RL methods.
|
|
10:30-10:45, Paper FrAT2.3 | |
Learning to Recover from Plan Execution Errors During Robot Manipulation: A Neuro-Symbolic Approach |
|
Kalithasan, Namasivayam | Indian Institute of Technology, Delhi |
Tuli, Arnav | Indian Institute of Technology, Delhi |
Bindal, Vishal | Indian Institute of Technology, Delhi |
Singh, Himanshu Gaurav | University of California Berkeley |
Singla, Parag | Indian Institute of Technology, Delhi |
Paul, Rohan | Indian Institute of Technology Delhi |
Keywords: Failure Detection and Recovery, Manipulation Planning, Learning from Demonstration
Abstract: Automatically detecting and recovering from failures is an important but challenging problem for autonomous robots. Most of the recent work on learning to plan from demonstrations lacks the ability to detect and recover from errors in the absence of an explicit state representation and/or a (sub-) goal check function. We propose an approach (blending learning with symbolic search) for automated error discovery and recovery, without needing annotated data of failures. Central to our approach is a neuro-symbolic state representation, in the form of a dense scene graph, structured according to the objects present within the environment. This enables efficient learning of the transition function and a discriminator that not only identifies failures but also localizes them, facilitating fast re-planning via computation of a heuristic distance function. We also present an anytime version of our algorithm, where instead of recovering to the last correct state, we search for a sub-goal in the original plan that minimizes the total distance to the goal given a re-planning budget. Experiments on a physics simulator with a variety of simulated failures show the effectiveness of our approach compared to existing baselines, both in terms of efficiency and accuracy of our recovery mechanism.
|
|
10:45-11:00, Paper FrAT2.4 | |
MLPER: Multi-Level Prompts for Adaptively Enhancing Vision-Language Emotion Recognition |
|
Gao, Yu | Harbin Institute of Technology, Shenzhen |
Ren, Weihong | Harbin Institute of Technology (Shenzhen) |
Xu, Xinglong | Harbin Institute of Technology (Shenzhen) |
Wang, Yan | Harbin Institute of Technology |
Wang, Zhiyong | Harbin Institute of Technology Shenzhen |
Liu, Honghai | Portsmouth University |
Keywords: Gesture, Posture and Facial Expressions, Computer Vision for Medical Robotics, Emotional Robotics
Abstract: In the field of robotics, vision-based Emotion Recognition (ER) has achieved significant progress, but it still faces the challenge of poor generalization ability under unconstrained conditions (e.g., occlusions and pose variations). In this work, we propose the MLPER model, which introduces a Vision-Language Model for Emotion Recognition to learn discriminative representations adaptively. Specifically, instead of leveraging a typical hand-crafted prompt (e.g., “a photo of a [class] person”), we first establish Multi-Level Prompts covering three aspects: facial expression, human posture and situational condition, using large language models like ChatGPT. Correspondingly, we extract visual tokens from three levels: the face, body, and context. Further, to achieve fine-grained alignment at each level, we adopt textual tokens from the positive and the hard negative to query visual tokens, predicting whether a pair of image and text is matched. Experimental results demonstrate that our MLPER model outperforms state-of-the-art methods on several ER benchmarks, especially under conditions of occlusion and pose variation.
|
|
FrAT3 |
Room 3 |
Cooperative Manipulation |
Regular session |
Chair: Hamaya, Masashi | OMRON SINIC X Corporation |
Co-Chair: Park, J. hyeon | Samsung Electronics |
|
10:00-10:15, Paper FrAT3.1 | |
Hierarchical Action Chunking Transformer: Learning Temporal Multimodality from Demonstrations with Fast Imitation Behavior |
|
Park, J. hyeon | Samsung Electronics |
Choi, Wonhyuk | SAMSUNG Electronics |
Hong, Sunpyo | Samsung Electronics |
Seo, Hoseong | Seoul National University |
Ahn, Joonmo | Samsung Electronics |
Ha, Changsu | Samsung Electronics |
Han, Heungwoo | Samsung Research |
Kwon, Junghyun | Seoul National University |
Keywords: Bimanual Manipulation, Learning from Demonstration, Deep Learning in Grasping and Manipulation
Abstract: Behavioral cloning from human demonstrations has succeeded in programming a robot to generate fine-grained motion, but it is still challenging to learn multimodal trajectories, such as ones executed at various speeds. This restricts the use of robot datasets collected by multiple users, because the operators' differing proficiency gives the dataset diverse speed distributions. To tackle this issue, we develop the Hierarchical Action Chunking Transformer with Vector-quantization (HACT-Vq) to efficiently learn temporal multimodality in addition to fine-grained motion. The proposed hierarchical model consists of a high-level policy that plans a latent subgoal and style, and a low-level policy that predicts an action chunk conditioned on the latent subgoal and style. The latent subgoal and style are trained as discrete representations so that the high-level policy can efficiently learn multimodal distributions of demonstrations and retrieve the fast-behavior mode. In experiments, we set up bimanual robots in both simulation and real-world environments and collected demonstrations at various speeds. The proposed model with the quantized subgoal and style showed the highest success rates with fast imitation behavior. Our code is available at https://github.com/SamsungLabs/hierarchical-act.
|
|
10:15-10:30, Paper FrAT3.2 | |
Dynamic Manipulation of Deformable Objects Using Imitation Learning with Adaptation to Hardware Constraints |
|
Hannus, Eric | Aalto University |
Nguyen Le, Tran | Aalto University |
Blanco-Mulero, David | Institut De Robòtica I Informàtica Industrial, CSIC-UPC |
Kyrki, Ville | Aalto University |
Keywords: Bimanual Manipulation, Imitation Learning, Dual Arm Manipulation
Abstract: Imitation Learning (IL) is a promising paradigm for learning dynamic manipulation of deformable objects since it does not depend on difficult-to-create accurate simulations of such objects. However, the translation of motions demonstrated by a human to a robot is a challenge for IL, due to differences in the embodiments and the robot's physical limits. These limits are especially relevant in dynamic manipulation, where high velocities and accelerations are typical. To address this problem, we propose a framework that first maps a dynamic demonstration into a motion that respects the robot's constraints using a constrained Dynamic Movement Primitive. Second, the resulting object state is further optimized by quasi-static refinement motions to optimize task performance metrics. This allows both efficient alteration of the object state through dynamic motions and stable small-scale refinements. We evaluate the framework on the challenging task of bag opening, designing the system BILBO: Bimanual dynamic manipulation using Imitation Learning for Bag Opening. Our results show that BILBO can successfully open a wide range of crumpled bags, using a demonstration with a single bag. See supplementary material at https://sites.google.com/view/bilbo-bag.
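A constrained motion primitive can be sketched with a discrete DMP whose rollout clips velocity and acceleration; the clipping shown below is a simple stand-in for the paper's constrained DMP formulation, and the forcing term is a fixed placeholder rather than one learned from the human demonstration.

```python
import numpy as np

# Minimal discrete dynamic movement primitive (DMP) rollout with hard limits on
# velocity and acceleration, one simple way to keep a demonstrated dynamic
# motion within a robot's physical limits.

def rollout_dmp(x0, goal, forcing, dt=0.002, tau=1.0,
                alpha=25.0, beta=6.25, v_max=1.5, a_max=10.0):
    x, v, s = x0, 0.0, 1.0                              # s: canonical phase variable
    traj = [x]
    for _ in range(int(tau / dt)):
        f = forcing(s) * s * (goal - x0)                # phase-scaled forcing term
        a = (alpha * (beta * (goal - x) - v) + f) / tau
        a = np.clip(a, -a_max, a_max)                   # respect acceleration limit
        v = np.clip(v + a * dt, -v_max, v_max)          # respect velocity limit
        x += v * dt
        s += -2.0 * s * dt / tau                        # canonical system decay
        traj.append(x)
    return np.array(traj)

traj = rollout_dmp(x0=0.0, goal=0.6, forcing=lambda s: 40.0 * np.sin(6.0 * s))
```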
|
|
10:30-10:45, Paper FrAT3.3 | |
Learning Variable Compliance Control from a Few Demonstrations for Bimanual Robot with Haptic Feedback Teleoperation System |
|
Kamijo, Tatsuya | The University of Tokyo |
Beltran-Hernandez, Cristian Camilo | OMRON SINIC X Corporation |
Hamaya, Masashi | OMRON SINIC X Corporation |
Keywords: AI-Enabled Robotics, Dual Arm Manipulation, Telerobotics and Teleoperation
Abstract: Automating dexterous, contact-rich manipulation tasks using rigid robots is a significant challenge in robotics. Rigid robots, defined by their actuation through position commands, face issues of excessive contact forces due to their inability to adapt to contact with the environment, potentially causing damage. While compliance control schemes have been introduced to mitigate these issues by controlling forces via external sensors, they are hampered by the need for fine-tuning task-specific controller parameters. Learning from Demonstrations (LfD) offers an intuitive alternative, allowing robots to learn manipulations through observed actions. In this work, we introduce a novel system to enhance the teaching of dexterous, contact-rich manipulations to rigid robots. Our system is twofold: firstly, it incorporates a teleoperation interface utilizing Virtual Reality (VR) controllers, designed to provide an intuitive and cost-effective method for task demonstration with haptic feedback. Secondly, we present Comp-ACT (Compliance Control via Action Chunking with Transformers), a method that leverages the demonstrations to learn variable compliance control from a few demonstrations. Our methods have been validated across various complex contact-rich manipulation tasks using single-arm and bimanual robot setups in simulated and real-world environments, demonstrating the effectiveness of our system in teaching robots dexterous manipulations with enhanced adaptability and safety. Code available at https://github.com/omron-sinicx/CompACT
|
|
10:45-11:00, Paper FrAT3.4 | |
Multi-Agent Behavior Retrieval: Retrieval-Augmented Policy Training for Cooperative Push Manipulation by Mobile Robots |
|
Kuroki, So | The University of Tokyo |
Nishimura, Mai | Omron Sinic X |
Kozuno, Tadashi | Omron Sinic X |
Keywords: Cooperating Robots, Path Planning for Multiple Mobile Robots or Agents, Machine Learning for Robot Control
Abstract: Due to the complex interactions between agents, learning multi-agent control policy often requires a prohibitive amount of data. This paper aims to enable multi-agent systems to effectively utilize past memories to adapt to novel collaborative tasks in a data-efficient fashion. We propose the Multi-Agent Coordination Skill Database, a repository for storing a collection of coordinated behaviors associated with key vectors distinctive to them. Our Transformer-based skill encoder effectively captures spatio-temporal interactions that contribute to coordination and provides a unique skill representation for each coordinated behavior. By leveraging only a small number of demonstrations of the target task, the database enables us to train the policy using a dataset augmented with the retrieved demonstrations. Experimental evaluations demonstrate that our method achieves a significantly higher success rate in push manipulation tasks compared with baseline methods like few-shot imitation learning. Furthermore, we validate the effectiveness of our retrieve-and-learn framework in a real environment using a team of wheeled robots.
|
|
FrAT4 |
Room 4 |
Underactuated Robots |
Regular session |
Co-Chair: Renda, Federico | Khalifa University of Science and Technology |
|
10:00-10:15, Paper FrAT4.1 | |
On Performing Non-Prehensile Rolling Manipulations: Stabilizing Synchronous Motions of Butterfly Robots |
|
Surov, Maksim | Arrival R&D |
Pchelkin, Stepan | Huawei |
Shiriaev, Anton | Norwegian University of Science and Technology |
Gusev, Sergei V. | St. Petersburg State University |
Freidovich, Leonid | Umeå University |
Keywords: Motion Control, Underactuated Robots, Manipulation Planning
Abstract: The paper explores the challenging task of performing a non-prehensile manipulation of several balls synchronously rolling on the curved hands of Butterfly robots. Each Butterfly robot represents a standard benchmark hardware setup, comprising a DC motor rotating a butterfly-shaped frame in a vertical plane, with a ball moving freely upon it, equipped with integrated computer vision, communication, programmable control, and computation interfaces. The combined dynamics of the considered system, consisting of N (2 or more) such robots, is inherently underactuated, characterized by N active and N passive degrees of freedom, as well as N independent unilateral constraints that model the interactions between the frames and the balls, assuming no slipping. We focus on designing a model-based centralized feedback controller to achieve synchronized rotations of the balls. We assume the accuracy of our mathematical model and the feasibility of implementing a discretized digital controller with a small sampling time. Without rigorous analysis, we will experimentally check robustness to various inevitable challenges such as noises, disturbances, uncertainties, and communication delays. Hence, our concentration lies in designing an orbitally stabilizing controller for the underactuated models. The primary contribution is proposing one set of transverse coordinates, enabling transverse-linearization-based controller design, accompanied by pertinent closed-loop system analysis tools, thereby enhancing the efficacy of solving the manipulation task. Analytical and model-based arguments are validated through successful simulations and experiments conducted on two Butterfly robots, thereby emphasizing the validity and practicality of the proposed approach.
|
|
10:15-10:30, Paper FrAT4.2 | |
Dynamic Walking on Highly Underactuated Point Foot Humanoids: Closing the Loop between HZD and HLIP |
|
Ghansah, Adrian | California Institute of Technology |
Kim, Jeeseop | Caltech |
Li, Kejun | California Institute of Technology |
Ames, Aaron | Caltech |
Keywords: Underactuated Robots, Humanoid and Bipedal Locomotion, Whole-Body Motion Planning and Control
Abstract: Realizing bipedal locomotion on humanoid robots with point feet is especially challenging due to their highly underactuated nature, high degrees of freedom, and hybrid dynamics resulting from impacts. With the goal of addressing this challenging problem, this paper develops a control framework for realizing dynamic locomotion and implements it on a novel point-foot humanoid: ADAM. To this end, we close the loop between Hybrid Zero Dynamics (HZD) and Hybrid Linear Inverted Pendulum (HLIP) based step length regulation. To leverage the full-order hybrid dynamics of the robot, walking gaits are first generated offline by utilizing HZD. These trajectories are stabilized online through the use of an HLIP-based regulator. Finally, the planned trajectories are mapped into the full-order system by using a task space controller incorporating inverse kinematics. The proposed method is verified through numerical simulations and hardware experiments on the humanoid robot ADAM, marking the first point-foot humanoid walking. Moreover, we experimentally demonstrate the robustness of the realized walking via the ability to track a desired reference speed, robustness to pushes, and locomotion on uneven terrain.
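The step-to-step regulation idea can be sketched with the linear inverted pendulum: the pre-impact CoM state flows linearly between touchdowns, and a linear feedback on the deviation from a nominal orbit sets the next step length. The gains and nominal orbit below are placeholders, not the values used on ADAM.

```python
import numpy as np

# Step-to-step LIP sketch in the spirit of HLIP-based step-length regulation.

g, z0, T = 9.81, 0.6, 0.35                         # gravity, CoM height, step time
w = np.sqrt(g / z0)
A = np.array([[np.cosh(w * T), np.sinh(w * T) / w],   # LIP flow over one step
              [w * np.sinh(w * T), np.cosh(w * T)]])

K = np.array([[1.3, 0.4]])                         # placeholder feedback gain
x_nom = np.array([[-0.05], [0.45]])                # nominal pre-impact [position; velocity]
u_nom = 0.18                                       # nominal step length

def next_step_length(x_pre):
    return (u_nom + K @ (x_pre - x_nom)).item()

def step_to_step(x_pre, u):
    """Apply step length u at touchdown, then propagate the LIP for one step."""
    x_post = x_pre + np.array([[-u], [0.0]])       # position now relative to the new stance foot
    return A @ x_post

x = np.array([[-0.02], [0.5]])
for _ in range(5):
    x = step_to_step(x, next_step_length(x))
```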
|
|
10:30-10:45, Paper FrAT4.3 | |
Reinforcement Learning Control for Autonomous Hydraulic Material Handling Machines with Underactuated Tools |
|
Spinelli, Filippo Alberto | ETH Zürich |
Egli, Pascal Arturo | RSL, ETHZ |
Nubert, Julian | ETH Zürich |
Nan, Fang | ETH Zurich |
Bleumer, Thilo | Liebherr Hydraulikbagger GmbH |
Goegler, Patrick | Liebherr Hydraulikbagger GmbH |
Brockes, Stephan | Liebherr Hydraulikbagger GmbH |
Hofmann, Ferdinand | Liebherr Hydraulikbagger GmbH |
Hutter, Marco | ETH Zurich |
Keywords: Robotics and Automation in Construction, Hydraulic/Pneumatic Actuators, Reinforcement Learning
Abstract: The precise and safe control of heavy material handling machines presents numerous challenges due to the hard-to-model hydraulically actuated joints and the need for collision-free trajectory planning with a free-swinging end-effector tool. In this work, we propose an RL-based controller that commands the cabin joint and the arm simultaneously. It is trained in a simulation combining data-driven modeling techniques with first-principles modeling. On the one hand, we employ a neural network model to capture the highly nonlinear dynamics of the upper carriage turn hydraulic motor, incorporating explicit pressure prediction to handle delays better. On the other hand, we model the arm as velocity-controllable and the free-swinging end-effector tool as a damped pendulum using first principles. This combined model enhances our simulation environment, enabling the training of RL controllers that can be directly transferred to the real machine. Designed to reach steady-state Cartesian targets, the RL controller learns to leverage the hydraulic dynamics to improve accuracy, maintain high speeds, and minimize end-effector tool oscillations. Our controller, tested on a mid-size prototype material handler, is more accurate than an inexperienced operator and causes fewer tool oscillations. It demonstrates competitive performance even compared to an experienced professional driver.
|
|
10:45-11:00, Paper FrAT4.4 | |
Motion Primitives Planning for Center-Articulated Vehicles |
|
Hu, Jiangpeng | ETH |
Yang, Fan | ETH Zurich |
Nan, Fang | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Reactive and Sensor-Based Planning, Motion and Path Planning, Motion Control
Abstract: Autonomous navigation across unstructured terrains, including forests and construction areas, faces unique challenges due to intricate obstacles and the element of the unknown. Lacking pre-existing maps, these scenarios necessitate a motion planning approach that combines agility with efficiency. Critically, it must also incorporate the robot's kinematic constraints to navigate more effectively through complex environments. This work introduces a novel planning method for center-articulated vehicles (CAV), leveraging motion primitives within a receding horizon planning framework using onboard sensing. The approach commences with the offline creation of motion primitives, generated through forward simulations that reflect the distinct kinematic model of center-articulated vehicles. These primitives undergo evaluation through a heuristic-based scoring function, facilitating the selection of the most suitable path for real-time navigation. To account for disturbances, we develop a pose-stabilizing controller, tailored to the kinematic specifications of center-articulated vehicles. During experiments, our method demonstrates a 67% improvement in SPL (Success Rate weighted by Path Length) performance over existing strategies. Furthermore, its efficacy was validated through real-world experiments conducted with a tree harvester vehicle - SAHA.
|
|
FrAT5 |
Room 5 |
Robust and Adaptive Control II |
Regular session |
Chair: Kumar, Shivesh | DFKI GmbH |
|
10:00-10:15, Paper FrAT5.1 | |
Grow-To-Shape Control of Variable Length Continuum Robots Via Adaptive Visual Servoing |
|
Gandhi, Abhinav | Worcester Polytechnic Institute |
Chiang, Shou-Shan | Worcester Polytechnic Institute |
Onal, Cagdas | WPI |
Calli, Berk | Worcester Polytechnic Institute |
Keywords: Modeling, Control, and Learning for Soft Robots, Visual Servoing, Robust/Adaptive Control
Abstract: In this paper, we propose an adaptive eye-to-hand vision-based control methodology, which enables a closed-loop grow-to-shape capability for variable length continuum manipulators in 2D. Our method utilizes shape features of the continuum robot, i.e., module curvature and length, which are obtained from the image. Our adaptive control algorithm servos the robot to converge to and track the desired values of these features in the image space without the need for a robot model. As a result, the robot starts from a minimum length configuration and grows into a given desired shape, always staying on the course of the desired shape. We believe that this approach unlocks capabilities for variable length continuum robots by leveraging their actuation redundancy and avoiding obstacles while carrying out object manipulation or inspection tasks in cluttered and constrained environments. We perform experiments in simulations and on a real robot to assess the performance of our visual servoing algorithm. Our experimental results demonstrate the controller's ability to accurately converge the current features to their references, for a variety of desired shapes in the image, while ensuring a smooth tracking response. We also present some proof-of-concept results demonstrating the effectiveness of this technique for controlling the robot in constrained environments. Markedly, this is the first successful demonstration of automatic grow-to-shape control using visual feedback for variable length continuum manipulators.
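One common way to realize model-free adaptive servoing of image-space features is to estimate the feature Jacobian online (e.g., with a Broyden-style update) and servo on the feature error. The sketch below shows that generic scheme only, not the authors' specific adaptation law; the feature choice and gains are hypothetical:

```python
import numpy as np

def broyden_update(J_hat, dq, ds, alpha=0.5, eps=1e-9):
    """Model-free correction of the estimated feature Jacobian so that it better
    explains the last observed feature change ds caused by actuation change dq."""
    return J_hat + alpha * np.outer(ds - J_hat @ dq, dq) / (float(dq @ dq) + eps)

def servo_command(J_hat, s, s_ref, gain=0.5):
    """Actuation-rate command driving the image features toward their references."""
    return gain * np.linalg.pinv(J_hat) @ (s_ref - s)

# Hypothetical setup: two image features (e.g., module curvature and length)
# and two actuation inputs; J_hat is refined as the robot moves.
J_hat = np.eye(2)
s, s_ref = np.array([0.10, 0.40]), np.array([0.30, 0.80])
dq = servo_command(J_hat, s, s_ref)   # command for this control cycle
ds = np.array([0.02, 0.03])           # feature change measured after moving
J_hat = broyden_update(J_hat, dq, ds)
```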
|
|
10:15-10:30, Paper FrAT5.2 | |
Feasibility-Guided Safety-Aware Model Predictive Control for Jump Markov Linear Systems |
|
Laouar, Zakariya | University of Colorado |
Ho, Qi Heng | University of Colorado Boulder |
Mazouz, Rayan | University of Colorado Boulder |
Becker, Tyler | University of Colorado Boulder |
Sunberg, Zachary | University of Colorado |
Keywords: Robust/Adaptive Control, Motion Control, Autonomous Agents
Abstract: In this paper, we present a controller framework that synthesizes control policies for Jump Markov Linear Systems subject to stochastic mode switches and imperfect mode estimation. Our approach builds on safe and robust methods for Model Predictive Control (MPC), but in contrast to existing approaches that either optimize without regard to feasibility or utilize soft constraints that increase computational requirements, we employ a safe and robust control approach informed by the feasibility of the optimization problem. We formulate and encode finite horizon safety for multiple model systems in our MPC design using Control Barrier Functions (CBFs). When subject to inaccurate hybrid state estimation, our feasibility-guided MPC generates a control policy that is maximally robust to uncertainty in the system's modes. We evaluate our approach on an orbital rendezvous problem and a six degree-of-freedom hexacopter under several scenarios and benchmarks to demonstrate the utility of the framework. Results indicate that the proposed technique of maximizing the robustness horizon, and the use of CBFs for safety awareness, improve the overall safety and performance of MPC for Jump Markov Linear Systems.
|
|
10:30-10:45, Paper FrAT5.3 | |
Adaptive Stochastic Nonlinear Model Predictive Control with Look-Ahead Deep Reinforcement Learning for Autonomous Vehicle Motion Control |
|
Zarrouki, Baha | Technical University of Munich |
Wang, Chenyang | Technical University of Munich |
Betz, Johannes | Technical University of Munich |
Keywords: Robust/Adaptive Control, Reinforcement Learning, Motion Control
Abstract: Propagating uncertainties through nonlinear system dynamics in the context of Stochastic Nonlinear Model Predictive Control (SNMPC) is challenging, especially for high-dimensional systems requiring real-time control and operating under time-variant uncertainties, such as autonomous vehicles. In this work, we propose an Adaptive SNMPC (aSNMPC) driven by Deep Reinforcement Learning (DRL) to optimize uncertainty handling, constraint robustification, feasibility, and closed-loop performance. To this end, our SNMPC uses Polynomial Chaos Expansion (PCE) for efficient uncertainty propagation, limits its propagation time through an Uncertainty Propagation Horizon (UPH), and transforms nonlinear chance constraints into robustified deterministic ones. We conceive a DRL agent to proactively anticipate upcoming control tasks, to dynamically reduce conservatism by determining the most suitable constraint robustification factor kappa, and to enhance feasibility by choosing an optimal UPH length T_u. We analyze the trained DRL agent's decision-making process and highlight its ability to learn context-dependent optimal parameters. We showcase the enhanced robustness and feasibility of our DRL-driven aSNMPC through the real-time motion control task of an autonomous passenger vehicle when confronted with significant time-variant disturbances, while achieving a minimum solution frequency of 110 Hz. The code used in this research is publicly accessible as open-source software: https://github.com/bzarr/TUM-CONTROL
|
|
10:45-11:00, Paper FrAT5.4 | |
Accelerating Model Predictive Control for Legged Robots through Distributed Optimization |
|
Amatucci, Lorenzo | Istituto Italiano Di Tecnologia |
Turrisi, Giulio | Istituto Italiano Di Tecnologia |
Bratta, Angelo | Istituto Italiano Di Tecnologia |
Barasuol, Victor | Istituto Italiano Di Tecnologia |
Semini, Claudio | Istituto Italiano Di Tecnologia |
Keywords: Whole-Body Motion Planning and Control, Legged Robots, Optimization and Optimal Control
Abstract: This paper presents a novel approach to enhance Model Predictive Control (MPC) for legged robots through Distributed Optimization. Our method focuses on decomposing the robot dynamics into smaller, parallelizable subsystems, and utilizing the Alternating Direction Method of Multipliers (ADMM) to ensure consensus among them. Each subsystem is managed by its own Optimal Control Problem, with ADMM facilitating consistency between their optimizations. This approach not only decreases the computational time but also allows for effective scaling with more complex robot configurations, facilitating the integration of additional subsystems such as articulated arms on a quadruped robot. We demonstrate, through numerical evaluations, the convergence of our approach on two systems with increasing complexity. In addition, we showcase that our approach converges towards the same solution when compared to a state-of-the-art centralized whole-body MPC implementation. Moreover, we quantitatively compare the computational efficiency of our method to the centralized approach, revealing up to a 75% reduction in computational time. Overall, our approach offers a promising avenue for accelerating MPC solutions for legged robots, paving the way for more effective utilization of the computational performance of modern hardware. Accompanying video at https://youtu.be/Yar4W-Vlh2A. The related code can be found at https://github.com/iit-DLSLab/DWMPC.
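A generic consensus-ADMM skeleton illustrating the coordination mechanism described above; each per-subsystem optimal control problem is stood in for by a small quadratic cost with a closed-form minimizer, so the structure (local solve, consensus update, dual update) is the point, not the authors' DWMPC implementation:

```python
import numpy as np

def consensus_admm(Q_list, a_list, rho=1.0, iters=50):
    """Toy consensus ADMM: each 'subsystem' i has a local quadratic cost
    0.5*(x - a_i)^T Q_i (x - a_i) over shared coupling variables x, standing in
    for the per-subsystem optimal control problems of a distributed MPC."""
    n, m = a_list[0].shape[0], len(a_list)
    z = np.zeros(n)                      # consensus copy of the coupling variables
    x = [np.zeros(n) for _ in range(m)]  # local copies
    u = [np.zeros(n) for _ in range(m)]  # scaled dual variables
    for _ in range(iters):
        for i in range(m):
            # Local solve (closed form here; an OCP/QP solve in the real method).
            x[i] = np.linalg.solve(Q_list[i] + rho * np.eye(n),
                                   Q_list[i] @ a_list[i] + rho * (z - u[i]))
        z = np.mean([x[i] + u[i] for i in range(m)], axis=0)   # consensus update
        for i in range(m):
            u[i] += x[i] - z                                    # dual update
    return z

# Two subsystems that must agree on a 2-D coupling variable.
Q1, Q2 = np.diag([2.0, 1.0]), np.diag([1.0, 3.0])
a1, a2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(consensus_admm([Q1, Q2], [a1, a2]))   # matches the centralized optimum
```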
|
|
FrAT6 |
Room 6 |
Aerial Systems: Applications III |
Regular session |
Chair: Loianno, Giuseppe | New York University |
|
10:00-10:15, Paper FrAT6.1 | |
Det-Recon-Reg: An Intelligent Framework towards Automated Large-Scale Infrastructure Inspection |
|
Yang, Guidong | The Chinese University of Hong Kong |
Zhang, Jihan | Chinese University of Hong Kong |
Zhao, Benyun | The Chinese University of Hong Kong |
Gao, Chuanxiang | The Chinese University of Hong Kong |
Huang, Yijun | The Chinese University of Hong Kong |
Wen, Junjie | The Chinese University of Hong Kong |
Li, Qingxiang | The Chinese University of Hong Kong |
Tang, Haoyun (Jerry) | UC Berkeley |
Chen, Xi | The Chinese University of Hong Kong |
Chen, Ben M. | Chinese University of Hong Kong |
Keywords: Aerial Systems: Applications
Abstract: Visual inspection plays a predominant role in inspecting infrastructure surfaces. However, the generalization of existing visual inspection systems to large-scale real-world scenes remains challenging. In this paper, we introduce Det-Recon-Reg, an intelligent framework separating the complex inspection procedure into three stages: Detect, Reconstruct, and Register. For defect detection (Detect), we present the first high-resolution defect dataset tailored for large-scale defect detection. Based on the dataset, we evaluate the most effective real-time object detection algorithms and push the boundary by proposing CUBIT-Net for real-world defect inspection. For infrastructure reconstruction (Reconstruct), we propose a learning-based multi-view stereo (MVS) network to adapt to large-scale scenes, which takes multi-view images as input and outputs a point cloud reconstruction; its performance has been validated on the standard MVS benchmarks, including the BlendedMVS, DTU, and Tanks and Temples datasets. For defect localization (Register), we propose an effective registration method based on a geographic information system that registers the detected defects onto the reconstructed infrastructure model to establish a global reference for maintenance measures. The real-world experiments verify the effectiveness and efficiency of our proposed framework. Dataset, code, and appendix are available on our project page.
|
|
10:15-10:30, Paper FrAT6.2 | |
Kinodynamic Motion Planning for a Team of Multirotors Transporting a Cable-Suspended Payload in Cluttered Environments |
|
Wahba, Khaled | Technical University of Berlin |
Ortiz-Haro, Joaquim | TU Berlin |
Toussaint, Marc | TU Berlin |
Hoenig, Wolfgang | TU Berlin |
Keywords: Aerial Systems: Applications, Path Planning for Multiple Mobile Robots or Agents, Nonholonomic Motion Planning
Abstract: We propose a motion planner for cable-driven payload transportation using multiple unmanned aerial vehicles (UAVs) in an environment cluttered with obstacles. Our planner is kinodynamic, i.e., it considers the full dynamics model of the transporting system including actuation constraints. Due to the high dimensionality of the planning problem, we use a hierarchical approach where we first solve the geometric motion planning using a sampling-based method with a novel sampler, followed by constrained trajectory optimization that considers the full dynamics of the system. Both planning stages consider inter-robot and robot/obstacle collisions. We demonstrate in a software-in-the-loop simulation and real flight experiments that there is a significant benefit in kinodynamic motion planning for such payload transport systems with respect to payload tracking error and energy consumption compared to the standard methods of planning for the payload alone. Notably, we observe a significantly higher success rate in scenarios where the team formation changes are needed to move through tight spaces.
|
|
10:30-10:45, Paper FrAT6.3 | |
Learning Long-Horizon Predictions for Quadrotor Dynamics |
|
Rao, Pratyaksh | New York University |
Saviolo, Alessandro | New York University |
Castiglione Ferrari, Tommaso | Technology Innovation Institute |
Loianno, Giuseppe | New York University |
Keywords: Aerial Systems: Applications, Machine Learning for Robot Control, Deep Learning Methods
Abstract: Accurate modeling of system dynamics is crucial for achieving high-performance planning and control of robotic systems. Although existing data-driven approaches are promising for modeling dynamics, their accuracy is limited to a short prediction horizon, overlooking the impact of compounding prediction errors over longer prediction horizons. Strategies to mitigate these cumulative errors remain underexplored. To bridge this gap, in this paper, we study the key design choices for efficiently learning long-horizon prediction dynamics for quadrotors. Specifically, we analyze the impact of multiple architectures, historical data, and multi-step loss formulation. We show that sequential modeling techniques showcase their advantage in minimizing compounding errors compared to other types of solutions. Furthermore, we propose a novel decoupled dynamics learning approach, which further simplifies the learning process while also enhancing the approach's modularity. Extensive experiments and ablation studies on real-world quadrotor data demonstrate the versatility and precision of the proposed approach. Our outcomes offer several insights and methodologies for enhancing long-term predictive accuracy of learned quadrotor dynamics for planning and control.
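One of the design choices analyzed above is a multi-step loss; a minimal sketch of how such a loss rolls the learned model forward on its own predictions, here with a placeholder linear "model" and NumPy instead of an autodiff framework:

```python
import numpy as np

def multi_step_loss(model, states, actions, horizon=10):
    """Compounding-error-aware objective: roll the learned model forward from the
    first state of a window and penalize error at every step, instead of only
    one-step-ahead error.  `model(s, a) -> s_next` is any learned predictor."""
    pred = states[0]
    loss = 0.0
    for k in range(horizon):
        pred = model(pred, actions[k])                 # feed predictions back in
        loss += np.mean((pred - states[k + 1]) ** 2)   # compare against recorded state
    return loss / horizon

# Toy usage with a hypothetical linear predictor standing in for a neural network.
A, B = 0.99 * np.eye(3), 0.1 * np.ones((3, 2))
model = lambda s, a: A @ s + B @ a
traj_states = np.random.randn(11, 3)    # recorded states x_0 .. x_10
traj_actions = np.random.randn(10, 2)   # recorded actions u_0 .. u_9
print(multi_step_loss(model, traj_states, traj_actions))
```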
|
|
10:45-11:00, Paper FrAT6.4 | |
Design of an Adaptive Lightweight LiDAR to Decouple Robot-Camera Geometry (I) |
|
Chen, Yuyang | University at Buffalo |
Wang, Dingkang | University of Florida |
Thomas, Lenworth | University of Florida |
Dantu, Karthik | University at Buffalo |
Koppal, Sanjeev | University of Florida |
Keywords: Sensor-based Control, Range Sensing, SLAM, Adaptive Sensor Design
Abstract: A fundamental challenge in robot perception is the coupling of the sensor pose and the robot pose. This has led to research in active vision, where the robot pose is changed to reorient the sensor toward areas of interest for perception. Further, egomotion effects such as jitter, and external effects such as wind, degrade perception and require additional effort in software, such as image stabilization. This effect is particularly pronounced in micro-air vehicles and micro-robots, which are typically lighter, subject to larger jitter, and lack the computational capability to perform stabilization in real time. We present a novel microelectromechanical (MEMS) mirror LiDAR system that changes the field of view of the LiDAR independent of the robot motion. Our design has the potential for use on small, low-power systems where the expensive components of the LiDAR can be placed external to the small robot. We show the utility of our approach in simulation and on prototype hardware mounted on a UAV. We believe that this LiDAR and its compact movable scanning design provide mechanisms to decouple robot and sensor geometry, allowing us to simplify robot perception. We also demonstra
|
|
FrAT7 |
Room 7 |
Medical Robotics IV |
Regular session |
Chair: Arai, Fumihito | The University of Tokyo |
Co-Chair: Manoonpong, Poramate | Vidyasirimedhi Institute of Science and Technology (VISTEC) |
|
10:00-10:15, Paper FrAT7.1 | |
Single Protoplasts Pickup System Combining Brightfield and Confocal Images |
|
Ando, Daito | The University of Tokyo |
Turan, Bilal | The University of Tokyo |
Amaya, Satoshi | The University of Tokyo |
Ukai, Yuko | Nagoya University |
Sato, Yoshikatsu | Nagoya University |
Arai, Fumihito | The University of Tokyo |
Keywords: Biological Cell Manipulation, Automation at Micro-Nano Scales
Abstract: This paper presents a system that picks up protoplasts produced by removing the surrounding cell wall of root cells while preserving their positional information. The fundamental concept of this system involves scanning the root tip over time using confocal microscopy to measure the positional information of each cell. Then, the protoplast pickup is conducted after switching to brightfield microscopy to ensure the certainty of pickup. The system measures the position of single protoplasts, adjusts the position of the pipette using a 3-axis micromanipulator, and picks up the target protoplast using a microfluidic pump driven by a piezoelectric actuator. To automate this pickup process, we calibrated the system: fully automatic 3D calibration of the pipette tip was achieved, allowing 3D micromanipulation under the microscope with an accuracy of 3.1 um in the XY-plane. Furthermore, by implementing multiple functions such as automatic detection of protoplasts, automated protoplast pickup has been achieved.
|
|
10:15-10:30, Paper FrAT7.2 | |
SuPerPM: A Surgical Perception Framework Based on Deep Point Matching Learned from Physical Constrained Simulation Data |
|
Lin, Shan | University of California, San Diego |
Miao, Albert | University of California, San Diego |
Alabiad, Ali | University of California San Diego |
Liu, Fei | University of Tennessee Knoxville |
Wang, Kaiyuan | University of California, San Diego |
Lu, Jingpei | University of California San Diego |
Richter, Florian | University of California, San Diego |
Yip, Michael C. | University of California, San Diego |
Keywords: Computer Vision for Medical Robotics, Surgical Robotics: Laparoscopy
Abstract: A major source of endoscopic tissue tracking errors during deformations stems from incorrect data association between observed sensor measurements and the previously tracked scene. To mitigate this issue, we present a surgical perception framework, SuPerPM, that leverages learning-based non-rigid point cloud matching for data association, thus accommodating larger deformations than previous approaches which relied on Iterative Closest Point (ICP) for point associations. The learning models typically require training data with ground truth point cloud correspondences, which is challenging or even impractical to collect in surgical environments. Thus, for tuning the learning model, we gather endoscopic data of soft tissue being manipulated by a surgical robot and then establish correspondences between point clouds at different time points to serve as ground truth. This was achieved by employing a position-based dynamics (PBD) simulation to ensure that the correspondences adhered to physical constraints. The proposed framework is demonstrated on several challenging surgical datasets that are characterized by large deformations, achieving superior performance over advanced surgical scene tracking algorithms.
|
|
10:30-10:45, Paper FrAT7.3 | |
Towards Design and Development of a Soft Pressure Sensing Sleeve for Performing Safe Colonoscopic Procedures |
|
Rafiee Javazm, Mohammad | University of Texas at Austin |
Kiehler, Sonika | The University of Texas at Austin |
Kara, Ozdemir Can | University of Texas at Austin |
Alambeigi, Farshid | University of Texas at Austin |
Keywords: Medical Robots and Systems, Soft Sensors and Actuators, Surgical Robotics: Steerable Catheters/Needles
Abstract: In this paper, with the goal of enhancing the safety of current colonoscopic procedures and providing the pressure and location of the contact between the colonoscope and the colon's surface, we propose the design and development of a unique Soft Pressure Sensing Sleeve (SPSS). SPSS can seamlessly be integrated with existing colonoscopic devices and would not change the existing diagnosis workflow. The pressure sensing of SPSS is performed based on the resistance change of a liquid metal (i.e., Gallium) embedded into several micro-channels located within SPSS's deformable sleeve when it interacts with the colon surface. To demonstrate the functionality of the SPSS, without loss of generality, in this paper, we designed and fabricated an SPSS with four sensing regions. We also proposed and experimentally evaluated an empirical calibration function for this sensor. Results demonstrate the high accuracy (RMSE = 2.45 and mean absolute error < 3%) of the proposed calibration function when compared against the evaluation experiments.
|
|
10:45-11:00, Paper FrAT7.4 | |
Online Adaptive Impedance Control with Gravity Compensation for an Interactive Lower-Limb Exoskeleton |
|
Janna, Run | Vidyasirimedhi Institute of Science and Technology |
Tarapongnivat, Kanut | Vidyasirimedhi Institute of Science and Technology |
Sricom, Natchaya | VISTEC |
Akkawutvanich, Akkawutvanich | Vidyasirimedhi Institute of Science and Technology |
Xiong, Xiaofeng | University of Southern Denmark |
Manoonpong, Poramate | Vidyasirimedhi Institute of Science and Technology (VISTEC) |
Keywords: Rehabilitation Robotics, Physical Human-Robot Interaction, Compliance and Impedance Control
Abstract: While lower-limb exoskeletons have been increasingly used for gait assistance and rehabilitation, most of them continue to function as assistive devices, with the exoskeleton and user in a leader-follower relationship. This limits the user's ability to interactively contribute to gait control. Therefore, this study proposes an interactive user-exoskeleton control strategy aiming to turn exoskeletons into interactive, compliant companion devices in which the exoskeleton and user act as collaborators. This control strategy is implemented through online adaptive impedance control with gravity compensation (OAIC-GC). It relies solely on internal pose feedback (joint position), rather than external sensors such as electromyography, torque, or force, as utilized in other assist-as-needed (AAN) control methods. The OAIC-GC can automatically capture the mechanical impedance dynamics of the user's lower limbs during walking, and thus facilitate adaptive, versatile, and personalized gait assistance. It is evaluated using a real lower-limb exoskeleton system with six degrees of freedom (DOFs) across different users engaging in various activities. These activities include symmetrical and asymmetrical walking on a split-belt treadmill at different speeds, as well as walking up stairs. The results indicate a significant improvement in the exoskeleton's performance in terms of adaptability and movement smoothness under all activities when compared to traditional control. The proposed control reduces joint assistance torque across all exoskeleton joints, enhancing user interaction and comfort. This allows users to actively control their gait patterns, enabling the exoskeleton to operate in an interactive assist-as-needed (IAAN) mode.
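A textbook joint-space impedance law with gravity compensation, plus a toy stiffness-adaptation rule, to illustrate the kind of controller the abstract refers to; this is not the published OAIC-GC law, and the gravity model, joint count, and gains are hypothetical:

```python
import numpy as np

def impedance_with_gravity(q, qd, q_ref, qd_ref, k, d, gravity_torque):
    """Joint-space impedance control with gravity compensation:
    tau = k*(q_ref - q) + d*(qd_ref - qd) + g(q), with per-joint gains k, d."""
    return k * (q_ref - q) + d * (qd_ref - qd) + gravity_torque(q)

def adapt_stiffness(k, e, lr=0.5, leak=0.01, k_min=5.0, k_max=80.0):
    """Toy online adaptation: stiffen joints whose tracking error grows, and let
    stiffness leak back down otherwise (not the paper's adaptation rule)."""
    return np.clip(k + lr * np.abs(e) - leak * k, k_min, k_max)

# Hypothetical 2-joint (hip, knee) example with a placeholder gravity model.
gravity_torque = lambda q: np.array([9.81 * 2.0 * np.sin(q[0]),
                                     9.81 * 0.8 * np.sin(q[0] + q[1])])
k, d = np.array([40.0, 30.0]), np.array([2.0, 1.5])
q, qd = np.array([0.10, 0.20]), np.zeros(2)
q_ref, qd_ref = np.array([0.30, 0.10]), np.zeros(2)
tau = impedance_with_gravity(q, qd, q_ref, qd_ref, k, d, gravity_torque)
k = adapt_stiffness(k, q_ref - q)   # updated stiffness for the next control cycle
```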
|
|
FrAT8 |
Room 8 |
Mapping II |
Regular session |
Co-Chair: Verdoja, Francesco | Aalto University |
|
10:00-10:15, Paper FrAT8.1 | |
Bayesian Floor Field: Transferring People Flow Predictions across Environments |
|
Verdoja, Francesco | Aalto University |
Kucner, Tomasz Piotr | Aalto University |
Kyrki, Ville | Aalto University |
Keywords: Mapping, Semantic Scene Understanding, Service Robotics
Abstract: Mapping people dynamics is a crucial skill for robots, because it enables them to coexist in human-inhabited environments. However, learning a model of people dynamics is a time-consuming process which requires observing a large number of people moving in an environment. Moreover, approaches for mapping dynamics are unable to transfer the learned models across environments: each model is only able to describe the dynamics of the environment it has been built in. However, the impact of architectural geometry on people's movement can be used to anticipate their patterns of dynamics, and recent work has looked into learning maps of dynamics from occupancy. So far however, approaches based on trajectories and those based on geometry have not been combined. In this work, we propose a novel Bayesian approach to learning people dynamics that combines knowledge about the environment geometry with observations from human trajectories. An occupancy-based deep prior is used to build an initial transition model without requiring any observations of pedestrians; the model is then updated when observations become available using Bayesian inference. We demonstrate the ability of our model to increase data efficiency and to generalize across real large-scale environments, which is unprecedented for maps of dynamics.
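A minimal sketch of the Bayesian combination described above, treating the directional transition model of one grid cell as a Dirichlet whose prior comes from an occupancy-based predictor and whose counts come from observed trajectories; the numbers and prior-strength weighting are illustrative assumptions:

```python
import numpy as np

def posterior_transitions(prior_probs, prior_strength, observed_counts):
    """Dirichlet-style update for one cell's distribution over motion directions.
    prior_probs     : prior over (here) 8 directions, e.g. from an occupancy-based model
    prior_strength  : pseudo-count weight given to that prior
    observed_counts : transition counts extracted from human trajectories"""
    alpha = prior_strength * np.asarray(prior_probs) + np.asarray(observed_counts)
    return alpha / alpha.sum()          # posterior mean of the Dirichlet

# With no observations the map falls back to the geometry-based prior; as trajectory
# observations accumulate, they progressively dominate the estimate.
prior = np.array([0.30, 0.20, 0.10, 0.05, 0.05, 0.10, 0.10, 0.10])
counts = np.array([0, 14, 3, 0, 0, 0, 1, 0])
print(posterior_transitions(prior, prior_strength=10.0, observed_counts=counts))
```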
|
|
10:15-10:30, Paper FrAT8.2 | |
Leveraging GNSS and Onboard Visual Data from Consumer Vehicles for Robust Road Network Estimation |
|
Opra, István Balázs | Woven by Toyota / University of Bonn |
Le Dem, Betty | Woven by Toyota |
Walls, Jeffrey | University of Michigan |
Lukarski, Dimitar | Woven by Toyota |
Stachniss, Cyrill | University of Bonn |
Keywords: Mapping, Intelligent Transportation Systems
Abstract: Maps are essential for diverse applications, such as vehicle navigation and autonomous robotics. Both require spatial models for effective route planning and localization. This paper addresses the challenge of road graph construction for autonomous vehicles. Despite recent advances, creating a road graph remains labor-intensive and has yet to achieve full automation. The goal of this paper is to generate such graphs automatically and accurately. Modern cars are equipped with onboard sensors used for today's advanced driver assistance systems like lane keeping. We propose using global navigation satellite system (GNSS) traces and basic image data acquired from these standard sensors in consumer vehicles to estimate road-level maps with minimal effort. We exploit the spatial information in the data by framing the problem as a road centerline semantic segmentation task using a convolutional neural network. We also utilize the data's time series nature to refine the neural network's output by using map matching. We implemented and evaluated our method using a fleet of real consumer vehicles, only using the deployed onboard sensors. Our evaluation demonstrates that our approach not only matches existing methods on simpler road configurations but also significantly outperforms them on more complex road geometries and topologies. This work received the 2023 Woven by Toyota Invention Award.
|
|
10:30-10:45, Paper FrAT8.3 | |
Refractive COLMAP: Refractive Structure-From-Motion Revisited |
|
She, Mengkun | Kiel University |
Seegräber, Felix | Kiel University |
Nakath, David | University Kiel |
Koeser, Kevin | University of Kiel |
Keywords: Mapping, SLAM, Marine Robotics
Abstract: In this paper, we present a complete refractive Structure-from-Motion (RSfM) framework for underwater 3D reconstruction using refractive camera setups (for both flat- and dome-port underwater housings). Despite notable achievements in refractive multi-view geometry over the past decade, a robust, complete and publicly available solution for such tasks is not available at present, and often practical applications have to resort to approximating refraction effects by the intrinsic (distortion) parameters of a pinhole camera model. To fill this gap, we have integrated refraction considerations throughout the entire SfM process within the state-of-the-art, open-source SfM framework COLMAP. Numerical simulations and reconstruction results on synthetically generated but photo-realistic images with ground truth validate that enabling refraction does not compromise accuracy or robustness as compared to in-air reconstructions. Finally, we demonstrate the capability of our approach for large-scale refractive scenarios using a dataset consisting of nearly 6000 images. The implementation is released as open-source at: https://cau-git.rz.uni-kiel.de/inf-ag-koeser/colmap_underwater
|
|
10:45-11:00, Paper FrAT8.4 | |
Evaluation and Deployment of LiDAR-Based Place Recognition in Dense Forests |
|
Oh, Haedam | University of Oxford |
Chebrolu, Nived | University of Oxford |
Mattamala, Matias | University of Oxford |
Freißmuth, Leonard | Technical University Munich |
Fallon, Maurice | University of Oxford |
Keywords: Robotics and Automation in Agriculture and Forestry, SLAM, Localization
Abstract: Many LiDAR place recognition systems have been developed and tested specifically for urban driving scenarios. Their performance in natural environments such as forests and woodlands has been studied less closely. In this paper, we analyze the capabilities of four different LiDAR place recognition systems, both handcrafted and learning-based methods, using LiDAR data collected with a handheld device and a legged robot in dense forest environments. In particular, we focused on evaluating localization where there is a significant translational and orientation difference between corresponding LiDAR scan pairs. This is particularly important for forest survey systems where the sensor or robot does not follow a defined road or path. Extending our analysis, we then incorporated the best performing approach, Logg3dNet, into a full 6-DoF pose estimation system, introducing several verification layers for precise registration. We demonstrated the performance of our methods in three operational modes: online SLAM, offline multi-mission SLAM map merging, and relocalization into a prior map. We evaluated these modes using data captured in forests from three different countries, achieving 80% correct loop closure candidates with baseline distances up to 5 m, and 60% up to 10 m. Video at: https://youtu.be/86l-oxjwmjY
|
|
FrAT9 |
Room 9 |
Optimization and Optimal Control |
Regular session |
Co-Chair: Kyriakopoulos, Kostas | New York University - Abu Dhabi |
|
10:00-10:15, Paper FrAT9.1 | |
Centroidal State Estimation Based on the Koopman Embedding for Dynamic Legged Locomotion |
|
Khorshidi, Shahram | University of Bonn |
Elnagdi, Murad | University of Bonn |
Bennewitz, Maren | University of Bonn |
Keywords: Optimization and Optimal Control, Legged Robots, Deep Learning Methods
Abstract: In this paper, we introduce a novel approach to centroidal state estimation, which plays a crucial role in predictive model-based control strategies for dynamic legged locomotion. Our approach uses the Koopman operator theory to transform the robot's complex nonlinear dynamics into a linear system, by employing dynamic mode decomposition and deep learning for model construction. We evaluate both models on their linearization accuracy and capability to capture both fast and slow dynamic system responses. We then select the most suitable model for estimation purposes, and integrate it within a moving horizon estimator. This estimator is formulated as a convex quadratic program to facilitate robust, real-time centroidal state estimation. Through extensive simulation experiments on a quadruped robot executing various dynamic gaits, our data-driven framework outperforms a conventional Extended Kalman Filtering technique based on nonlinear dynamics. Our estimator addresses challenges posed by force/torque measurement noise in highly dynamic motions and accurately recovers the centroidal states, demonstrating the adaptability and effectiveness of the Koopman-based linear representation for complex locomotive behaviors. Importantly, our model based on dynamic mode decomposition, trained with two locomotion patterns (trot and jump), successfully estimates the centroidal states for a different motion (bound) without retraining.
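The dynamic-mode-decomposition ingredient mentioned above amounts to a least-squares fit of a linear operator on (possibly lifted) state snapshots; a minimal sketch on a toy two-dimensional system follows, with the lifting, the deep-learning variant, and the moving horizon estimator omitted:

```python
import numpy as np

def fit_dmd(X, X_next, rcond=1e-8):
    """Dynamic mode decomposition: least-squares fit of a linear operator A such
    that x_{k+1} ≈ A x_k, from snapshot matrices X = [x_0 .. x_{N-1}] and
    X_next = [x_1 .. x_N] (columns are lifted/observable states)."""
    return X_next @ np.linalg.pinv(X, rcond=rcond)

# Toy check: snapshots generated by a known linear system are recovered exactly.
A_true = np.array([[0.95, 0.10], [-0.10, 0.95]])
x = np.array([1.0, 0.0])
snaps = [x]
for _ in range(50):
    x = A_true @ x
    snaps.append(x)
snaps = np.array(snaps).T                      # shape (state_dim, N+1)
A_fit = fit_dmd(snaps[:, :-1], snaps[:, 1:])
print(np.allclose(A_fit, A_true, atol=1e-6))   # True
```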
|
|
10:15-10:30, Paper FrAT9.2 | |
Perfecting Periodic Trajectory Tracking: Model Predictive Control with a Periodic Observer |
|
Pabon, Luis | Stanford University |
Köhler, Johannes | ETH Zurich |
Alora, John Irvin | Stanford University |
Eberhard, Patrick Benito | ETH Zurich |
Carron, Andrea | ETH Zurich |
Zeilinger, Melanie N. | ETH Zurich |
Pavone, Marco | Stanford University |
Keywords: Optimization and Optimal Control, Motion Control, Modeling, Control, and Learning for Soft Robots
Abstract: In Model Predictive Control (MPC), discrepancies between the actual system and the predictive model can lead to substantial tracking errors and significantly degrade performance and reliability. While such discrepancies can be alleviated with more complex models, this often complicates controller design and implementation. By leveraging the fact that many trajectories of interest are periodic, we show that perfect tracking is possible when incorporating a simple observer that estimates and compensates for periodic disturbances. We present the design of the observer and the accompanying tracking MPC scheme, proving that their combination achieves zero tracking error asymptotically, regardless of the complexity of the unmodelled dynamics. We validate the effectiveness of our method, demonstrating asymptotically perfect tracking on a high-dimensional soft robot with nearly 10,000 states and a fivefold reduction in tracking errors compared to a baseline MPC on small-scale autonomous race car experiments.
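A minimal illustration of the core idea, an observer that stores one disturbance estimate per phase of the period and updates it from the output innovation, shown on a trivial scalar plant rather than the paper's MPC setting:

```python
import numpy as np

def run_periodic_observer(period=20, steps=400, gain=0.5, seed=0):
    """Tracking with a per-phase disturbance estimate on the toy plant y_k = u_k + d_k,
    where d_k is an unknown period-N signal; the estimate converges and the tracking
    error vanishes regardless of what the disturbance is."""
    rng = np.random.default_rng(seed)
    d_true = rng.normal(scale=1.0, size=period)   # unknown periodic disturbance
    d_hat = np.zeros(period)                      # one estimate per phase
    y_ref = np.sin(2 * np.pi * np.arange(steps) / period)
    errors = []
    for k in range(steps):
        phase = k % period
        u = y_ref[k] - d_hat[phase]               # compensate the current estimate
        y = u + d_true[phase]                     # plant response
        y_pred = u + d_hat[phase]                 # model prediction
        d_hat[phase] += gain * (y - y_pred)       # innovation-driven periodic update
        errors.append(abs(y_ref[k] - y))
    return max(errors[-period:])

print(run_periodic_observer())   # on the order of 1e-6: error vanishes asymptotically
```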
|
|
10:30-10:45, Paper FrAT9.3 | |
Pose Graph Optimization Over Planar Unit Dual Quaternions: Improved Accuracy with Provably Convergent Riemannian Optimization |
|
Warke, William | University of Florida |
Ramos, J Humberto | University of Florida |
Ganesh, Prashant | EpiSys Science Inc |
Brink, Kevin | AFRL |
Hale, Matthew | Georgia Institute of Technology |
Keywords: Optimization and Optimal Control, Localization
Abstract: It is common in pose graph optimization (PGO) algorithms to assume that noise in the translations and rotations of relative pose measurements is uncorrelated. However, existing work shows that in practice these measurements can be highly correlated, which leads to degradation in the accuracy of PGO solutions that rely on this assumption. Therefore, in this paper we develop a novel algorithm derived from a realistic, correlated model of relative pose uncertainty, and we quantify the resulting improvement in the accuracy of the solutions we obtain relative to state-of-the-art PGO algorithms. Our approach utilizes Riemannian optimization on the planar unit dual quaternion (PUDQ) manifold, and we prove that it converges to first-order stationary points of a Lie-theoretic maximum likelihood objective. Then we show experimentally that, compared to state-of-the-art PGO algorithms, this algorithm produces estimation errors that are lower by 10% to 25% across several orders of magnitude of correlated noise levels and graph sizes.
|
|
10:45-11:00, Paper FrAT9.4 | |
Probabilistic Homotopy Optimization for Dynamic Motion Planning |
|
Chignoli, Matthew | Massachusetts Institute of Technology |
Pardis, Shayan | MIT |
Kim, Sangbae | Massachusetts Institute of Technology |
Keywords: Optimization and Optimal Control, Task and Motion Planning, Humanoid Robot Systems
Abstract: We present a homotopic approach to solving challenging, optimization-based motion planning problems. The approach uses Homotopy Optimization, which, unlike standard continuation methods for solving homotopy problems, solves a sequence of constrained optimization problems rather than a sequence of nonlinear systems of equations. The insight behind our proposed algorithm is formulating the discovery of this sequence of optimization problems as a search problem in a multidimensional homotopy parameter space. Our proposed algorithm, the Probabilistic Homotopy Optimization algorithm, switches between solve and sample phases, using solutions to easy problems as initial guesses to more challenging problems. We analyze how our algorithm performs in the presence of common challenges to homotopy methods, such as bifurcation, folding, and disconnectedness of the homotopy solution manifold. Finally, we demonstrate its utility via a case study on two dynamic motion planning problems: the cart-pole and the MIT Humanoid.
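A toy solve/sample loop in the spirit of the algorithm described above, with a one-dimensional homotopy parameter interpolating between an easy convex surrogate and a hard nonconvex objective; the objectives, step sizes, and failure handling are illustrative, not the paper's formulation:

```python
import numpy as np
from scipy.optimize import minimize

def homotopy_cost(x, lam):
    """Family of problems: lam=0 is an easy convex bowl, lam=1 is the hard
    nonconvex target objective (both illustrative stand-ins)."""
    easy = np.sum((x - 1.0) ** 2)
    hard = np.sum((x ** 2 - 1.0) ** 2) + 0.1 * np.sum(np.sin(5 * x) ** 2)
    return (1 - lam) * easy + lam * hard

def homotopy_optimize(x0, rng, max_tries=5):
    """Solve/sample loop: advance the homotopy parameter, warm-starting each solve
    with the previous solution; if a solve fails, sample a smaller step."""
    x, lam = np.asarray(x0, float), 0.0
    while lam < 1.0:
        step, advanced = 0.25, False
        for _ in range(max_tries):
            trial = min(1.0, lam + step)
            res = minimize(homotopy_cost, x, args=(trial,), method="BFGS")
            if res.success:
                x, lam, advanced = res.x, trial, True
                break
            step = rng.uniform(0.0, step)     # sample phase: retry with a smaller step
        if not advanced:                      # accept the last iterate on this toy example
            x, lam = res.x, min(1.0, lam + 0.25)
    return x

print(homotopy_optimize(np.zeros(2), np.random.default_rng(1)))
```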
|
|
FrAT10 |
Room 10 |
Deep Learning for Perception |
Regular session |
Chair: Jayasuriya, Suren | Arizona State University |
|
10:00-10:15, Paper FrAT10.1 | |
DarkGS: Learning Neural Illumination and 3D Gaussians Relighting for Robotic Exploration in the Dark |
|
Zhang, Tianyi | Carnegie Mellon University |
Huang, Kaining | Carnegie Mellon University |
Zhi, Weiming | Carnegie Mellon University |
Johnson-Roberson, Matthew | Carnegie Mellon University |
Keywords: Deep Learning for Visual Perception, Calibration and Identification
Abstract: Humans have the remarkable ability to construct consistent mental models of an environment, even under limited or varying levels of illumination. We wish to endow robots with this same capability. In this paper, we tackle the challenge of constructing a photorealistic scene representation under poorly illuminated conditions and with a moving light source. We approach the task of modeling illumination as a learning problem, and utilize the developed illumination model to aid in scene reconstruction. We introduce an innovative framework that uses a data-driven approach, Neural Light Simulators (NeLiS), to model and calibrate the camera-light system. Furthermore, we present DarkGS, a method that applies NeLiS to create a relightable 3D Gaussian scene model capable of real-time, photorealistic rendering from novel viewpoints. We show the applicability and robustness of our proposed simulator and system in a variety of real-world environments.
|
|
10:15-10:30, Paper FrAT10.2 | |
NeuralFloors++: Consistent Street-Level Scene Generation from BEV Semantic Maps |
|
Musat, Valentina | University of Oxford |
De Martini, Daniele | University of Oxford |
Gadd, Matthew | University of Oxford |
Newman, Paul | Oxford University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Semantic Scene Understanding
Abstract: Learning autonomous driving capabilities requires diverse and realistic training data. This has led to exploring generative techniques as an alternative to real-world data collection. In this paper we propose a method for synthesising photo-realistic urban driving scenes, along with semantic, instance and depth ground-truth. Our model relies on Bird's Eye View (BEV) representations due to their compositionality and scene content control capabilities, reducing the need for traditional simulators. We employ a two-stage process: first, a 3D scene representation is extracted from BEV semantic, instance and style maps using a neural field. After rendering the semantic, instance, depth and style maps from a ground-view perspective, a second stage based on a diffusion model is used to generate the photo-realistic scene. We extend our prior work - NeuralFloors, to include multiple-view outputs, style manipulation for finer control at the object level through instance-wise style maps and cross-frame consistency via auto-regressive training. The proposed system is evaluated extensively on the KITTI-360 dataset, showing improved realism and semantic alignment for generated images.
|
|
10:30-10:45, Paper FrAT10.3 | |
PathFinder: Attention-Driven Dynamic Non-Line-Of-Sight Tracking with a Mobile Robot |
|
Kannapiran, Shenbagaraj | Arizona State University |
Chandran, Sreenithy | Arizona State University, USA |
Jayasuriya, Suren | Arizona State University |
Berman, Spring | Arizona State University |
Keywords: Deep Learning for Visual Perception, Visual Tracking, Data Sets for Robotic Vision
Abstract: The study of non-line-of-sight (NLOS) imaging is growing due to its many potential applications, including rescue operations and pedestrian detection by self-driving cars. However, implementing NLOS imaging on a moving camera remains an open area of research. Existing NLOS imaging methods rely on time-resolved detectors and laser configurations that require precise optical alignment, making it difficult to deploy them in dynamic environments. This work proposes a data-driven approach to NLOS imaging, PathFinder, that can be used with a standard RGB camera mounted on a small, power-constrained mobile robot, such as an aerial drone. Our experimental pipeline is designed to accurately estimate the 2D trajectory of a person who moves in a Manhattan-world environment while remaining hidden from the camera’s field of view. We introduce a novel approach to process a sequence of dynamic successive frames in a line-of-sight (LOS) video using an attention-based neural network that performs inference in real-time. The method also includes a preprocessing selection metric that analyzes images from a moving camera which contain multiple vertical planar surfaces, such as walls and building facades, and extracts planes that return maximum NLOS information. We validate the approach on in-the-wild scenes using a drone for video capture, thus demonstrating low-cost NLOS imaging in dynamic capture environments.
|
|
10:45-11:00, Paper FrAT10.4 | |
Text3DAug – Prompted Instance Augmentation for LiDAR Perception |
|
Reichardt, Laurenz | HS Mannheim |
Uhr, Luca | Hochschule Mannheim |
Wasenmüller, Oliver | Mannheim University of Applied Sciences |
Keywords: Deep Learning Methods, Object Detection, Segmentation and Categorization, Computer Vision for Transportation
Abstract: LiDAR data of urban scenarios poses unique challenges, such as heterogeneous characteristics and inherent class imbalance. Therefore, large-scale datasets are necessary to apply deep learning methods. Instance augmentation has emerged as an efficient method to increase dataset diversity. However, current methods require the time-consuming curation of 3D models or costly manual data annotation. To overcome these limitations, we propose Text3DAug, a novel approach leveraging generative models for instance augmentation. Text3DAug does not depend on labeled data and is the first of its kind to generate instances and annotations from text. This allows for a fully automated pipeline, eliminating the need for manual effort in practical applications. Additionally, Text3DAug is sensor agnostic and can be applied regardless of the LiDAR sensor used. Comprehensive experimental analysis on LiDAR segmentation, detection and novel class discovery demonstrates that Text3DAug is effective in supplementing existing methods or as a standalone method, performing on par with or better than established methods while overcoming their specific drawbacks. The code is publicly available.
|
|
FrAT11 |
Room 11 |
Legged Robots I |
Regular session |
Co-Chair: Zimmermann, Karel | Ceske Vysoke Uceni Technicke V Praze, FEL |
|
10:00-10:15, Paper FrAT11.1 | |
MonoForce: Self-Supervised Learning of Physics-Informed Model for Predicting Robot-Terrain Interaction |
|
Agishev, Ruslan | Czech Technical University in Prague, FEE |
Zimmermann, Karel | Ceske Vysoke Uceni Technicke V Praze, FEL |
Kubelka, Vladimir | Örebro University |
Pecka, Martin | Ceske Vysoke Uceni Technicke V Praze, FEL |
Svoboda, Tomas | Ceske Vysoke Uceni Technicke V Praze, FEL |
Keywords: Learning from Experience, Vision-Based Navigation, Field Robots
Abstract: While autonomous navigation of mobile robots on rigid terrain is a well-explored problem, navigating on deformable terrain such as tall grass or bushes remains a challenge. To address it, we introduce an explainable, physics-aware and end-to-end differentiable model which predicts the outcome of robot-terrain interaction from camera images, both on rigid and non-rigid terrain. The proposed MonoForce model consists of a black-box module which predicts robot-terrain interaction forces from onboard cameras, followed by a white-box module, which transforms these forces and control signals into predicted trajectories, using only the laws of classical mechanics. The differentiable white-box module allows backpropagating the predicted trajectory errors into the black-box module, serving as a self-supervised loss that measures consistency between the predicted forces and ground-truth trajectories of the robot. Experimental evaluation on a public dataset and our data has shown that while the prediction capabilities are comparable to state-of-the-art algorithms on rigid terrain, MonoForce shows superior accuracy on non-rigid terrain such as tall grass or bushes. To facilitate the reproducibility of our results, we release both the code and datasets.
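A forward-only sketch of the white-box part described above, integrating predicted interaction forces and control inputs through simple point-mass mechanics and scoring the result against a recorded trajectory; in the actual model this rollout is differentiable so the error can be backpropagated into the force predictor, and all shapes and values here are hypothetical:

```python
import numpy as np

def rollout_trajectory(forces, controls, mass=50.0, dt=0.1):
    """Integrate simple planar point-mass mechanics driven by predicted
    terrain-interaction forces and commanded control forces."""
    pos, vel = np.zeros(2), np.zeros(2)
    traj = [pos.copy()]
    for f_terrain, f_cmd in zip(forces, controls):
        acc = (f_terrain + f_cmd) / mass
        vel = vel + acc * dt
        pos = pos + vel * dt
        traj.append(pos.copy())
    return np.array(traj)

def trajectory_loss(pred_forces, controls, gt_traj):
    """Self-supervised consistency loss between predicted and ground-truth trajectories."""
    pred_traj = rollout_trajectory(pred_forces, controls)
    return np.mean(np.sum((pred_traj - gt_traj) ** 2, axis=1))

# Hypothetical shapes: 20 steps of 2-D forces/controls and a recorded trajectory.
pred_forces = np.zeros((20, 2))
controls = np.tile(np.array([20.0, 0.0]), (20, 1))
gt_traj = rollout_trajectory(np.zeros((20, 2)), controls)   # stand-in ground truth
print(trajectory_loss(pred_forces, controls, gt_traj))      # 0.0 for a perfect predictor
```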
|
|
10:15-10:30, Paper FrAT11.2 | |
LEEPS: Learning End-To-End Legged Perceptive Parkour Skills on Challenging Terrains |
|
Qian, Tangyu | University of Science and Technology of China |
Zhang, Hao | University of Science and Technology of China |
Zhou, Zhangli | University of Science and Technology of China |
Wang, Hao | University of Science and Technology of China |
Mingyu, Cai | University of California Riverside |
Kan, Zhen | University of Science and Technology of China |
Keywords: Legged Robots, Reinforcement Learning
Abstract: Empowering legged robots with agile maneuvers is a great challenge. While existing works have proposed diverse control-based and learning-based methods, it remains an open problem to endow robots with animal-like perception and athleticism. Towards this goal, we develop an End-to-End Legged Perceptive Parkour Skill Learning (LEEPS) framework to train quadruped robots to master parkour skills in complex environments. In particular, LEEPS incorporates a vision-based perception module equipped with multi-layered scans, supplying robots with comprehensive, precise, and adaptable information about their surroundings. Leveraging such visual data, a position-based task formulation liberates the robot from velocity constraints and directs it toward the target using innovative reward mechanisms. The resulting controller empowers an affordable quadruped robot to successfully traverse previously challenging and unprecedented obstacles. We evaluate LEEPS on various challenging tasks, which demonstrate its effectiveness, robustness, and generalizability. Supplementary material and videos are available at: https://sites.google.com/view/leeps
|
|
10:30-10:45, Paper FrAT11.3 | |
DexDribbler: Learning Dexterous Soccer Manipulation Via Dynamic Supervision |
|
Hu, Yutong | ETH Zurich |
Wen, Kehan | ETH Zurich |
Yu, Fisher | ETH Zürich |
Keywords: Legged Robots, Reinforcement Learning, Mobile Manipulation
Abstract: Learning dexterous locomotion policy for legged robots is becoming increasingly popular due to its ability to handle diverse terrains and resemble intelligent behaviors. However, joint manipulation of moving objects and locomotion with legs, such as playing soccer, receive scant attention in the learning community, although it is natural for humans and smart animals. A key challenge to solve this multitask problem is to infer the objectives of locomotion from the states and targets of the manipulated objects. The implicit relation between the object states and robot locomotion can be hard to capture directly from the training experience. We propose adding a feedback control block to compute the necessary body-level movement accurately and using the outputs as dynamic joint-level locomotion supervision explicitly. We further utilize an improved ball dynamic model, an extended context-aided estimator, and a comprehensive ball observer to facilitate transferring policy learned in simulation to the real world. We observe that our learning scheme can not only make the policy network converge faster but also enable soccer robots to perform sophisticated maneuvers like sharp cuts and turns on flat surfaces, a capability that was lacking in previous methods. Video and code are available at github.com/SysCV/soccer-player/.
|
|
10:45-11:00, Paper FrAT11.4 | |
Modeling and Gait Analysis of Passive Rimless Wheel with Compliant Feet |
|
Zheng, Yanqiu | Ritsumeikan University |
Yan, Cong | Ritsumeikan University |
He, Yuetong | Japan Advanced Institute of Science and Technology |
Asano, Fumihiko | Japan Advanced Institute of Science and Technology |
Tokuda, Isao | Ritsumeikan University |
Keywords: Legged Robots, Passive Walking, Modeling and Simulating Humans
Abstract: The movement of the legs involves the interaction between the feet and the ground. Consequently, most animals possess a wide variety of foot morphologies and multifunctional capabilities. The selection and switching of these foot functions are passive and environment-dependent, ensuring environmental compliance. Despite this, current research on compliant feet lacks mathematical models that simultaneously encompass locomotion and foot compliance. Therefore, in-depth studies of locomotion properties under such conditions remain challenging. In this study, we present novel passive compliant feet applicable to the passive walking of a rimless wheel. We first introduce a dynamic model, achieve passive walking through numerical simulations, and subsequently analyze the gait patterns for compliance and multi-period gait. This study bridges a gap in understanding the interaction between motion and compliance in foot design, providing insights into the dynamics of compliant motion.
|
|
FrAT12 |
Room 12 |
Semantic Scene Understanding II |
Regular session |
|
10:00-10:15, Paper FrAT12.1 | |
Volumetric Semantically Consistent 3D Panoptic Mapping |
|
Miao, Yang | ETH Zurich |
Armeni, Iro | Stanford University |
Pollefeys, Marc | ETH Zurich |
Barath, Daniel | MTA SZTAKI; Visual Recognition Group in CTU Prague |
Keywords: Semantic Scene Understanding, Computer Vision for Automation, RGB-D Perception
Abstract: We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating comprehensive, accurate, and efficient semantic 3D maps suitable for autonomous agents in unstructured environments. The proposed approach is based on a Voxel-TSDF representation used in recent algorithms. It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantic and instance-consistent 3D regions. Further improvements are achieved by graph optimization-based semantic labeling and instance refinement. The proposed method achieves accuracy superior to the state of the art on public large-scale datasets, improving on a number of widely used metrics. We also highlight a shortcoming in the evaluation of recent studies: using the ground truth trajectory as input instead of a SLAM-estimated one substantially affects the accuracy, creating a large gap between the reported results and the actual performance on real-world data. The code is available: https://github.com/y9miao/ConsistentPanopticSLAM.
|
|
10:15-10:30, Paper FrAT12.2 | |
Answerability Fields: Answerable Location Estimation Via Diffusion Models |
|
Azuma, Daichi | Sony Semiconductor Solutions |
Miyanishi, Taiki | Advanced Telecommunications Research Institute International |
Kurita, Shuhei | RIKEN |
Sakamoto, Koya | Kyoto University, ATR |
Kawanabe, Motoaki | Advanced Telecommunications Research Institutte International |
Keywords: Semantic Scene Understanding, Data Sets for Robotic Vision
Abstract: We propose Answerability Fields (AnsFields), a novel approach for predicting the answerability of questions at different locations within indoor environments. AnsFields is represented as a map, where each grid’s score reflects how well a question can be answered using the panoramic image at that location. Using a 3D question-answering dataset, we construct comprehensive AnsFields covering diverse scenes from ScanNet. Additionally, we employ a diffusion model to infer AnsFields from a scene’s top-down view image and the question. We then conduct 3D question-answering using these predicted AnsFields and achieve a 24% improvement in accuracy over the standard 3D-QA method. Our results demonstrate the importance of object locations for answering questions in the environment, highlighting the potential of AnsFields for applications in robotics, augmented reality, and human-robot interaction.
|
|
10:30-10:45, Paper FrAT12.3 | |
Multi-Modal NeRF Self-Supervision for LiDAR Semantic Segmentation |
|
Timoneda, Xavier | CARIAD SE |
Herb, Markus | Technische Universität München |
Duerr, Fabian | Audi AG |
Goehring, Daniel | Freie Universität Berlin |
Yu, Fisher | ETH Zürich |
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception
Abstract: LiDAR Semantic Segmentation is a fundamental task in autonomous driving perception consisting of associating each LiDAR point to a semantic label. Fully-supervised models have widely tackled this task, but they require labels for each scan, which either limits their domain or requires impractical amounts of expensive annotations. Camera images, which are generally recorded alongside LiDAR pointclouds, can be processed by the widely available 2D foundation models, which are generic and dataset-agnostic. However, distilling knowledge from 2D data to improve LiDAR perception raises domain adaptation challenges. For example, the classical perspective projection suffers from the parallax effect produced by the position shift between both sensors at their respective capture times. We propose a Semi-Supervised Learning setup to leverage unlabeled LiDAR pointclouds alongside distilled knowledge from the camera images. To self-supervise our model on the unlabeled scans, we add an auxiliary NeRF head and cast rays from the camera viewpoint over the unlabeled voxel features. The NeRF head predicts densities and semantic logits at each sampled ray location which are used for rendering pixel semantics. Concurrently, we query the Segment-Anything (SAM) foundation model with the camera image to generate a set of unlabeled generic masks. We fuse the masks with the rendered pixel semantics from LiDAR to produce pseudo-labels that supervise the pixel predictions. During inference, we drop the NeRF head and run our model with only LiDAR. We show the effectiveness of our approach in three public LiDAR Semantic Segmentation benchmarks: nuScenes, SemanticKITTI and ScribbleKITTI.
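The rendering step of the auxiliary NeRF head described above can be realized with standard volume-rendering accumulation; a minimal sketch for a single ray, where the sample count, densities, and class count are illustrative:

```python
import numpy as np

def render_ray_semantics(densities, logits, deltas):
    """Standard volume-rendering accumulation: convert per-sample densities into
    ray weights and blend per-sample semantic logits into one pixel-level prediction.
    densities : (N,)   non-negative volume densities along the ray
    logits    : (N, C) per-sample semantic logits
    deltas    : (N,)   distances between consecutive samples"""
    alphas = 1.0 - np.exp(-densities * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1] + 1e-10]))
    weights = alphas * trans
    return weights @ logits          # (C,) rendered pixel semantic logits

# Toy ray with 4 samples and 3 classes.
dens = np.array([0.0, 0.5, 3.0, 0.1])
logs = np.random.randn(4, 3)
dels = np.full(4, 0.25)
print(render_ray_semantics(dens, logs, dels))
```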
|
|
10:45-11:00, Paper FrAT12.4 | |
PanopticRecon: Leverage Open-Vocabulary Instance Segmentation for Zero-Shot Panoptic Reconstruction |
|
Yu, Xuan | Zhejiang University |
Liu, Yili | Zhejiang University |
Han, Chenrui | Zhejiang University |
Mao, Sitong | ShenZhen Huawei Cloud Computing Technologies Co., Ltd |
Zhou, Shunbo | The Chinese University of Hong Kong |
Xiong, Rong | Zhejiang University |
Liao, Yiyi | Zhejiang University |
Wang, Yue | Zhejiang University |
Keywords: Semantic Scene Understanding, RGB-D Perception, Mapping
Abstract: Panoptic reconstruction is a challenging task in 3D scene understanding. However, most existing methods heavily rely on pre-trained semantic segmentation models and known 3D object bounding boxes for 3D panoptic segmentation, which are not available for in-the-wild scenes. In this paper, we propose a novel zero-shot panoptic reconstruction method from RGB-D images of scenes. For zero-shot segmentation, we leverage open-vocabulary instance segmentation, but it faces partial labeling and instance association challenges. We tackle both challenges by propagating partial labels with the aid of dense generalized features and building a 3D instance graph for associating 2D instance IDs. Specifically, we exploit partial labels to learn a classifier for generalized semantic features to provide complete labels for scenes with dense distilled features. Moreover, we formulate instance association as a 3D instance graph segmentation problem, allowing us to fully utilize the scene geometry prior and all 2D instance masks to infer globally unique pseudo 3D instance IDs. Our method outperforms state-of-the-art methods on the indoor dataset ScanNet V2 and the outdoor dataset KITTI-360, demonstrating the effectiveness of our graph segmentation method and reconstruction network.
|
|
FrAT13 |
Room 13 |
Computer Vision for Automation III |
Regular session |
|
10:00-10:15, Paper FrAT13.1 | |
NRDF - Neural Region Descriptor Fields As Implicit ROI Representation for Robotic 3D Surface Processing |
|
Pratheepkumar, Anish | Profactor Gmbh |
Ikeda, Markus | PROFACTOR GmbH |
Hofmann, Michael | Profactor Gmbh |
Widmoser, Fabian | Profactor Gmbh |
Pichler, Andreas | Profactor Gmbh |
Vincze, Markus | Vienna University of Technology |
Keywords: Computer Vision for Automation, Deep Learning Methods, Representation Learning
Abstract: To automate 3D surface processing across diverse category-level objects, it is imperative to represent the process-related region of interest (P-ROI), which is not obtained with conventional keypoint or semantic part correspondences. To resolve this issue, we propose Neural Region Descriptor Fields (NRDF) for achieving unsupervised dense 3D surface region correspondence such that an arbitrary ROI is retrieved for a new instance of a known category of object. We utilize the NRDF representation as a medium to facilitate one-shot P-ROI-level process knowledge transfer. Recent developments in implicit 3D object representations have focused on keypoint or part correspondences, which have resulted in applications like robotic grasping and manipulation. However, explicit one-shot P-ROI correspondence, and its application for 3D surface process knowledge transfer, is treated for the first time in this work, to the best of our knowledge. The evaluation results show that the proposed approach outperforms the dense correspondence baselines in implicit shape representation and the capacity to retrieve matching arbitrary ROIs. In addition, we validate the practicality of our proposed system in a real-world robotic surface processing application. Our code is available at https://github.com/Profactor/Neural-Region-Descriptor-Fields.
|
|
10:15-10:30, Paper FrAT13.2 | |
Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data |
|
Kumar, Aakash | University of Central Florida |
Chen, Chen | University of Central Florida |
Mian, Ajmal | University of Western Australia |
Lobo, Niels | University of Central Florida |
Shah, Mubarak | University of Central Florida |
Keywords: Computer Vision for Automation, Vision-Based Navigation, Visual Learning
Abstract: 3D detection is a critical task that enables machines to identify and locate objects in three-dimensional space. It has a broad range of applications in several fields, including autonomous driving, robotics and augmented reality. Monocular 3D detection is attractive as it requires only a single camera; however, it lacks the accuracy and robustness required for real-world applications. High-resolution LiDAR, on the other hand, can be expensive and lead to interference problems in heavy traffic given its active transmissions. We propose a balanced approach that combines the advantages of monocular and point cloud-based 3D detection. Our method requires only a small number of 3D points that can be obtained from a low-cost, low-resolution sensor. Specifically, we use only 512 points, which is just 1% of a full LiDAR frame in the KITTI dataset. Our method reconstructs a complete 3D point cloud from this limited 3D information combined with a single image. The reconstructed 3D point cloud and corresponding image can be used by any off-the-shelf multi-modal detector for 3D object detection. By using the proposed network architecture with an off-the-shelf multi-modal 3D detector, the accuracy of 3D detection improves by 20% compared to state-of-the-art monocular detection methods and by 6% to 9% compared to the baseline multi-modal methods on the KITTI and JackRabbot datasets.
|
|
10:30-10:45, Paper FrAT13.3 | |
Conditional Generative Denoiser for Nighttime UAV Tracking |
|
Wang, Yucheng | Tongji University |
Fu, Changhong | Tongji University |
Lu, Kunhan | Tongji University |
Yao, Liangliang | Tongji University |
Zuo, Haobo | University of Hong Kong |
Keywords: Computer Vision for Automation, Deep Learning for Visual Perception, Aerial Systems: Applications
Abstract: State-of-the-art (SOTA) visual object tracking methods have significantly enhanced the autonomy of unmanned aerial vehicles (UAVs). However, in low-light conditions, the presence of irregular real noise from the environment severely degrades the performance of these SOTA methods. Moreover, existing SOTA denoising techniques often fail to meet the real-time processing requirements when deployed as plug-and-play denoisers for UAV tracking. To address this challenge, this work proposes a novel conditional generative denoiser (CGDenoiser), which breaks free from the limitations of traditional deterministic paradigms and generates the noise conditioned on the input, subsequently removing it. To better align the input dimensions and accelerate inference, a novel nested residual Transformer conditionalizer is developed. Furthermore, an innovative multi-kernel conditional refiner is designed to pertinently refine the denoised output. Extensive experiments show that CGDenoiser improves the tracking precision of the SOTA tracker by 18.18% on DarkTrack2021 while running 5.8 times faster than the second-best-performing denoiser. Real-world tests with complex challenges also prove the effectiveness and practicality of CGDenoiser. Code, video demo and supplementary proof for CGDenoiser are now available at: https://github.com/vision4robotics/CGDenoiser.
|
|
FrBT1 |
Room 1 |
Vision-Based Navigation II |
Regular session |
|
11:00-11:15, Paper FrBT1.1 | |
DD-VNB: A Depth-Based Dual-Loop Framework for Real-Time Visually Navigated Bronchoscopy |
|
Tian, Qingyao | University of Chinese Academy of Sciences |
Liao, Huai | Department of Pulmonary and Critical Care Medicine, the First Af |
Huang, Xinyan | Department of Pulmonary and Critical Care Medicine, the First Af |
Chen, Jian | Hong Kong Institute of Science and Innovation, Chinese Academy O |
Zhang, Zihui | Institute of Automation, Chinese Academy of Sciences |
Yang, Bingyu | Institute of Automation, Chinese Academy of Sciences; Sch |
Ourselin, Sebastien | University College London |
Liu, Hongbin | Institute of Automation, Chinese Academy of Sciences |
Keywords: Vision-Based Navigation, Computer Vision for Medical Robotics, Localization
Abstract: Real-time 6 DOF localization of bronchoscopes is crucial for enhancing intervention quality. However, current vision-based technologies struggle to balance generalization to unseen data and computational speed. In this study, we propose a Depth-based Dual-Loop framework for real-time Visually Navigated Bronchoscopy (DD-VNB) that can generalize across patient cases without the need for re-training. The DD-VNB framework integrates two key modules: depth estimation and dual-loop localization. To address the domain gap among patients, we propose a knowledge-embedded depth estimation network that maps endoscope frames to depth, ensuring generalization by eliminating patient-specific textures. The network embeds view synthesis knowledge into a cycle adversarial architecture for scale-constrained monocular depth estimation. For real-time performance, our localization module embeds a fast ego-motion estimation network into the loop of depth registration. The ego-motion inference network estimates the pose change of the bronchoscope at high frequency, while depth registration against the pre-operative 3D model periodically provides the absolute pose. Specifically, the relative pose changes are fed into the registration process as the initial guess to boost its accuracy and speed. Experiments on phantom and in-vivo data from patients demonstrate the effectiveness of our framework: 1) monocular depth estimation outperforms SOTA, 2) localization achieves an Absolute Tracking Error (ATE) of 4.7 ± 3.17 mm in phantom and 6.49 ± 3.88 mm in patient data, 3) the framework runs at a frame rate approaching the video capture speed, and 4) no case-wise network retraining is necessary. The framework's superior speed and accuracy demonstrate its promising clinical potential for real-time bronchoscopic navigation.
|
|
11:15-11:30, Paper FrBT1.2 | |
RNR-Nav: A Real-World Visual Navigation System Using Renderable Neural Radiance Maps |
|
Kim, Minsoo | Seoul National University |
Kwon, Obin | Seoul Natl University |
Jun, Howoong | Seoul National University |
Oh, Songhwai | Seoul National University |
Keywords: Vision-Based Navigation, Localization
Abstract: We propose a novel visual localization and navigation framework for real-world environments, directly integrating observed visual information into a bird's-eye-view map. While the renderable neural radiance map (RNR-Map) shows considerable promise in simulated settings, its deployment in real-world scenarios poses undiscovered challenges. RNR-Map utilizes projections of multiple vectors into a single latent code, resulting in information loss under suboptimal conditions. To address such issues, our enhanced RNR-Map for real-world robots, RNR-Map++, incorporates strategies to mitigate information loss, such as a weighted map and positional encoding. For robust real-time localization, we integrate a particle filter into the correlation-based localization framework using RNR-Map++ without a rendering procedure. Consequently, we establish a real-world robot system for visual navigation utilizing RNR-Map++, which we call “RNR-Nav.” Experimental results demonstrate that the proposed methods significantly enhance rendering quality and localization robustness compared to previous approaches. In real-world navigation tasks, RNR-Nav achieves a success rate of 84.4%, marking a 68.8% enhancement over the methods of the original RNR-Map paper.
|
|
11:30-11:45, Paper FrBT1.3 | |
Mind the Error! Detection and Localization of Instruction Errors in Vision-And-Language Navigation |
|
Taioli, Francesco | University of Verona |
Rosa, Stefano | Istituto Italiano Di Tecnologia |
Castellini, Alberto | Verona University |
Natale, Lorenzo | Istituto Italiano Di Tecnologia |
Del Bue, Alessio | Istituto Italiano Di Tecnologia |
Farinelli, Alessandro | University of Verona |
Cristani, Marco | University of Verona |
Wang, Yiming | Fondazione Bruno Kessler |
Keywords: Vision-Based Navigation, Deep Learning Methods, Deep Learning for Visual Perception
Abstract: Vision-and-Language Navigation in Continuous Environments (VLN-CE) is one of the most intuitive yet challenging embodied AI tasks. Agents are tasked to navigate towards a target goal by executing a set of low-level actions, following a series of natural language instructions. All VLN-CE methods in the literature assume that language instructions are exact. However, in practice, instructions given by humans can contain errors when describing a spatial environment due to inaccurate memory or confusion. Current VLN-CE benchmarks do not address this scenario, making the state-of-the-art methods in VLN-CE fragile in the presence of erroneous instructions from human users. For the first time, we propose a novel benchmark dataset that introduces various types of instruction errors considering potential human causes. This benchmark provides valuable insight into the robustness of VLN systems in continuous environments. We observe a noticeable performance drop (up to −25%) in Success Rate when evaluating the state-of-the-art VLN-CE methods on our benchmark. Moreover, we formally define the task of Instruction Error Detection and Localization, and establish an evaluation protocol on top of our benchmark dataset. We also propose an effective method, based on a cross-modal transformer architecture, that achieves the best performance in error detection and localization, compared to baselines. Surprisingly, our proposed method has revealed errors in the validation set of the two commonly used datasets for VLN-CE, i.e., R2R-CE and RxR-CE, demonstrating the utility of our technique in other tasks.
|
|
11:45-12:00, Paper FrBT1.4 | |
Distilling Knowledge for Short-To-Long Term Trajectory Prediction |
|
Das, Sourav | University of Padova |
Camporese, Guglielmo | University of Padova |
Cheng, Shaokang | Northwestern Polytechnical University |
Ballan, Lamberto | University of Padova |
Keywords: Vision-Based Navigation, Visual Tracking, Human-Aware Motion Planning
Abstract: Long-term trajectory forecasting is an important and challenging problem in the fields of computer vision, machine learning, and robotics. One fundamental difficulty lies in the evolution of the trajectory, which becomes increasingly uncertain and unpredictable as the time horizon grows, subsequently increasing the complexity of the problem. To overcome this issue, in this paper we propose Di-Long, a new method that distills knowledge from a short-term trajectory forecaster to guide a student network for long-term trajectory prediction during the training process. Given a total sequence length that comprises the observation allowed for the student network and the complementary target sequence, we let the student and the teacher solve two different related tasks defined over the same full trajectory: the student observes a short sequence and predicts a long trajectory, whereas the teacher observes a longer sequence and predicts the remaining short target trajectory. The teacher's task is less uncertain, and we use its accurate predictions to guide the student through our knowledge distillation framework, reducing long-term future uncertainty. Our experiments show that the proposed Di-Long method is effective for long-term forecasting and achieves state-of-the-art performance on the Intersection Drone Dataset (inD) and the Stanford Drone Dataset (SDD).
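The teacher-student split over a single full trajectory can be sketched as a loss function. The snippet below is an illustrative Python/PyTorch sketch, assuming simple MSE losses, an observation length t_short, and a weighting factor alpha; these choices, and the student and teacher interfaces, are assumptions rather than the authors' implementation.

    import torch
    import torch.nn.functional as F

    def di_long_loss(full_traj, student, teacher, t_short=8, alpha=0.5):
        # full_traj: (B, T, 2) positions over the full sequence.
        # Student: observes the first t_short steps, predicts the remaining T - t_short steps.
        # Teacher: observes the first T - t_short steps, predicts the last t_short steps.
        obs_student = full_traj[:, :t_short]
        target_student = full_traj[:, t_short:]
        with torch.no_grad():
            teacher_pred = teacher(full_traj[:, :-t_short])   # easier, less uncertain task
        student_pred = student(obs_student)                    # (B, T - t_short, 2)
        pred_loss = F.mse_loss(student_pred, target_student)
        # Distil the teacher's accurate short-horizon tail into the student's long prediction.
        distill_loss = F.mse_loss(student_pred[:, -t_short:], teacher_pred)
        return pred_loss + alpha * distill_loss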
|
|
FrBT2 |
Room 2 |
Human-Aware Motion Planning |
Regular session |
|
11:00-11:15, Paper FrBT2.1 | |
SparseGTN: Human Trajectory Forecasting with Sparsely Represented Scene and Incomplete Trajectories |
|
Liu, Jianbang | The Chinese University of Hong Kong |
Li, Guangyang | Harbin Institute of Technology, Shenzhen |
Mao, Xinyu | The Chinese University of Hong Kong |
Meng, Fei | The Chinese University of Hong Kong |
Mei, Jie | Harbin Institute of Technology |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Deep Learning Methods, Autonomous Vehicle Navigation
Abstract: In recent years, great progress has been made in forecasting human motion in crowded scenes. However, current methods are far from practical applications due to unbearably high computation costs, especially for encoding scene context. In addition, neglecting partially detected trajectories makes the predicted outcome deviate from the real trajectory distribution. To handle the aforementioned concerns, we propose to represent the scene context and partially observed trajectories with sparse graphs. Customized for this special data structure, we design a hierarchical Graph Transformer Network model, SparseGTN, to predict multiple possible future trajectories of the target pedestrian by digesting the sparsely represented inputs. Our approach exhibits superiority over the state-of-the-art (SOTA) methods, utilizing a mere 3.42% of the floating-point operations (FLOPs) and 0.53% of the model parameters. The code will be available online.
|
|
11:15-11:30, Paper FrBT2.2 | |
GazeMotion: Gaze-Guided Human Motion Forecasting |
|
Hu, Zhiming | University of Stuttgart |
Schmitt, Syn | University of Stuttgart, Germany |
Haeufle, Daniel Florian Benedict | Heidelberg University, Germany |
Bulling, Andreas | University of Stuttgart |
Keywords: Human-Aware Motion Planning
Abstract: We present GazeMotion – a novel method for human motion forecasting that combines information on past human poses with human eye gaze. Inspired by evidence from behavioural sciences showing that human eye and body movements are closely coordinated, GazeMotion first predicts future eye gaze from past gaze, then fuses predicted future gaze and past poses into a gaze-pose graph, and finally uses a residual graph convolutional network to forecast body motion. We extensively evaluate our method on the MoGaze, ADT, and GIMO benchmarks and demonstrate that it outperforms state-of-the-art methods by up to 7.4% in mean per-joint position error. Using head direction as a proxy for gaze, our method still achieves an average improvement of 5.5%. We finally report an online user study showing that our method also outperforms prior methods in terms of perceived realism. These results show the significant information content available in eye gaze for human motion forecasting as well as the effectiveness of our method in exploiting this information.
|
|
11:30-11:45, Paper FrBT2.3 | |
Hyp2Nav: Hyperbolic Planning and Curiosity for Crowd Navigation |
|
D'Amely di Melendugno, Guido Maria | Sapienza University of Rome |
Flaborea, Alessandro | Sapienza University of Rome |
Mettes, Pascal | University of Amsterdam |
Galasso, Fabio | Sapienza University of Rome |
Keywords: Human-Aware Motion Planning, Task and Motion Planning, Reinforcement Learning
Abstract: Autonomous robots are increasingly becoming a strong fixture in social environments. Effective crowd navigation requires not only safe yet fast planning, but should also enable interpretability and computational efficiency for working in real-time on embedded devices. In this work, we advocate for hyperbolic learning to enable crowd navigation and we introduce Hyp2Nav. Different from conventional reinforcement learning-based crowd navigation methods, Hyp2Nav leverages the intrinsic properties of hyperbolic geometry to better encode the hierarchical nature of decision-making processes in navigation tasks. We propose a hyperbolic policy model and a hyperbolic curiosity module that results in effective social navigation, best success rates, and returns across multiple simulation settings, using up to 6 times fewer parameters than competitor state-of-the-art models. With our approach, it becomes even possible to obtain policies that work in 2-dimensional embedding spaces, opening up new possibilities for low-resource crowd navigation and model interpretability. Insightfully, the internal hyperbolic representation of Hyp2Nav correlates with how much attention the robot pays to the surrounding crowds, e.g. due to multiple people occluding its pathway or to a few of them showing colliding plans, rather than to its own planned route. The code is available at https://github.com/GDam90/hyp2nav.
|
|
11:45-12:00, Paper FrBT2.4 | |
Map-Aware Human Pose Prediction for Robot Follow-Ahead |
|
Jiang, Qingyuan | University of Minnesota |
Susam, Burak | University of Minnesota |
Chao, Jun-Jee | University of Minnesota |
Isler, Volkan | University of Minnesota |
Keywords: Human-Aware Motion Planning, Gesture, Posture and Facial Expressions, Human Detection and Tracking
Abstract: In the robot follow-ahead task, a mobile robot is tasked to maintain its relative position in front of a moving human actor while keeping the actor in sight. To accomplish this task, it is important that the robot understand the full 3D pose of the human (since the head orientation can be different than the torso) and predict future human poses so as to plan accordingly. This prediction task is especially tricky in a complex environment with junctions and multiple corridors. In this work, we address the problem of forecasting the full 3D trajectory of a human in such environments. Our main insight is to show that one can first predict the 2D trajectory and then estimate the full 3D trajectory by conditioning the estimator on the predicted 2D trajectory. With this approach, we achieve results comparable to or better than state-of-the-art methods while running three times faster. As part of our contribution, we present a new dataset where, in contrast to existing datasets, the human motion is in a much larger area than a single room. We also present a complete robot system that integrates our human pose forecasting network on the mobile robot to enable real-time robot follow-ahead and present results from real-world experiments in multiple buildings on campus. Our project page, including supplementary material and videos, can be found at: https://qingyuan-jiang.github.io/iros2024_poseForecasting/
|
|
FrBT3 |
Room 3 |
Micro/Nano Robots I |
Regular session |
Co-Chair: Liu, Song | ShanghaiTech University |
|
11:00-11:15, Paper FrBT3.1 | |
Learning a Tracking Controller for Rolling μbots |
|
Beaver, Logan | Old Dominion University |
Sokolich, Max | University of Delaware |
Alsalehi, Suhail | Boston University |
Weiss, Ron | Massachusetts Institute of Technology |
Das, Sambeeta | University of Delaware |
Belta, Calin | Boston University |
Keywords: Micro/Nano Robots, Machine Learning for Robot Control, Optimization and Optimal Control
Abstract: Micron-scale robots (μbots) have recently shown great promise for emerging medical applications. Accurate control of μbots, while critical to their successful deployment, is challenging. In this work, we consider the problem of tracking a reference trajectory using a μbot in the presence of disturbances and uncertainty. The disturbances primarily come from Brownian motion and other environmental phenomena, while the uncertainty originates from errors in the model parameters. We model the μbot as an uncertain unicycle that is controlled by a global magnetic field. To compensate for disturbances and uncertainties, we develop a nonlinear mismatch controller. We define the model mismatch error as the difference between our model's predicted velocity and the actual velocity of the μbot. We employ a Gaussian Process to learn the model mismatch error as a function of the applied control input. Then we use a least-squares minimization to select a control action that minimizes the difference between the actual velocity of the μbot and a reference velocity. We demonstrate the online performance of our joint learning and control algorithm in simulation, where our approach accurately learns the model mismatch and improves tracking performance. We also validate our approach in an experiment and show that certain error metrics are reduced by up to 40%.
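The joint learning-and-control loop described above can be sketched in a few lines. The Python sketch below assumes a unicycle-style velocity model and uses scikit-learn's Gaussian process regressor for the mismatch and a derivative-free least-squares solve for control selection; nominal_velocity, fit_mismatch, and select_control are hypothetical names, not the authors' code.

    import numpy as np
    from scipy.optimize import minimize
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    def nominal_velocity(u):
        # Hypothetical nominal unicycle model: u = (speed command, heading command).
        speed, heading = u
        return speed * np.array([np.cos(heading), np.sin(heading)])

    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)

    def fit_mismatch(controls, measured_velocities):
        # Learn the mismatch e(u) = v_measured - v_model(u) as a function of the applied control.
        errors = np.asarray(measured_velocities) - np.array([nominal_velocity(u) for u in controls])
        gp.fit(np.asarray(controls), errors)

    def select_control(v_ref, u0=(1.0, 0.0)):
        # Least-squares selection of the control whose GP-corrected velocity matches v_ref.
        def cost(u):
            v_pred = nominal_velocity(u) + gp.predict(np.atleast_2d(u))[0]
            return float(np.sum((v_pred - v_ref) ** 2))
        return minimize(cost, np.asarray(u0), method="Nelder-Mead").x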
|
|
11:15-11:30, Paper FrBT3.2 | |
The Design of a Layered Brain-Computer Interface System with Target Identification Module to Control Home Service Robot |
|
Wang, Wenzhi | Nankai University |
Mao, Yuqing, Troy | University of California, Davis |
Duan, Feng | Nankai University |
Keywords: Micro/Nano Robots, Biomimetics, Brain-Machine Interfaces
Abstract: Brain-computer interface (BCI) systems, a communication channel between the human mind and the environment, turn brain activity into control commands. However, disabled people cannot sustain the continuous, high-intensity operation of robots required to fulfill complicated tasks. In order to simplify the operating process and strengthen the practicality of services, this paper proposes a layered BCI system integrated with a two-level target identification system to control a home service robot. We recorded single-channel steady-state visual evoked potentials (SSVEP) to diminish the number of electrodes in use. This hierarchical architecture can enhance accuracy and sustainability during operation. The target identification system was established to reduce the burden on users and accelerate the service procedures. Subjects recruited in the experiment succeeded in operating the robot to provide basic services. The average accuracy of the online experiment is 87.38%. These results prove the effectiveness of this hierarchical system in operating a multifunctional home service robot with single-channel SSVEP, which plays an important role in both medical care and daily life.
|
|
11:30-11:45, Paper FrBT3.3 | |
A Magnetic Helical Miniature Robot with Soft Magnetic-Controlled Gripper |
|
Zhu, Aoji | Fudan University |
Bai, Chenyao | Fudan University |
Lu, Xiwen | Fudan University |
Zhu, Yunlong | Fudan University |
Wang, Kezhi | Brunel University London |
Zhu, Jiarui | Fudan University |
Keywords: Micro/Nano Robots, Biologically-Inspired Robots, Modeling, Control, and Learning for Soft Robots
Abstract: Magnetic helical miniature robots (MHMRs) exhibit efficient motion performance in low Reynolds number environments, showing great promise for biomedical applications like targeted delivery. However, during targeted delivery, the backward propulsion of MHMRs leads to cargo being released, limiting their degrees of freedom and interference resistance. Furthermore, the basic magnetic field parameter, amplitude, has not been effectively utilized in previous MHMRs. In this letter, we propose a magnetic helical miniature robot with a soft magnetic-controlled gripper (MHMR-G), using magnetic field amplitude to functionalize the MHMR for the first time. The velocity of the MHMR-G is controlled by the magnetic field frequency, and the grasping of the gripper is controlled by the magnetic field amplitude. It is proposed that the lag angle and rotation frequency will adversely affect the grasping of the gripper under a rotating magnetic field, but results show that increasing the magnetic field amplitude can effectively mitigate these adverse effects. Finally, a manipulation test of cargo transport is performed, demonstrating that the gripper of the MHMR-G can effectively confine cargo during propulsion.
|
|
11:45-12:00, Paper FrBT3.4 | |
ActNeRF: Uncertainty-Aware Active Learning of NeRF-Based Object Models for Robot Manipulators Using Visual and Re-Orientation Actions |
|
Dasgupta, Saptarshi | Indian Institute of Technology Delhi |
Gupta, Akshat | Indian Institute of Technology Delhi |
Tuli, Shreshth | Indian Institute of Technology Delhi |
Paul, Rohan | Indian Institute of Technology Delhi |
Keywords: Perception-Action Coupling, Deep Learning in Grasping and Manipulation, Incremental Learning
Abstract: Manipulating unseen objects is challenging without a 3D representation, as objects generally have occluded surfaces. This requires physical interaction with objects to build their internal representations. This paper presents an approach that enables a robot to rapidly learn the complete 3D model of a given object for manipulation in unfamiliar orientations. We use an ensemble of partially constructed NeRF models to quantify model uncertainty and to determine the next action (a visual or re-orientation action) by optimizing informativeness and feasibility. Further, our approach determines when and how to grasp and re-orient an object given its partial NeRF model, and re-estimates the object pose to rectify misalignments introduced during the interaction. Experiments with a simulated Franka Emika Robot Manipulator operating in a tabletop environment with benchmark objects demonstrate an improvement of (i) 14% in visual reconstruction quality (PSNR), (ii) 20% in the geometric/depth reconstruction of the object surface (F-score) and (iii) 71% in the task success rate of manipulating objects in a-priori unseen orientations/stable configurations in the scene, over current methods. The project page can be found at https://actnerf.github.io/
|
|
FrBT4 |
Room 4 |
Micro/Nano Robots II |
Regular session |
Co-Chair: Liu, Song | ShanghaiTech University |
|
11:00-11:15, Paper FrBT4.1 | |
Learning the Inverse Kinematics of Magnetic Continuum Robot for Teleoperated Navigation |
|
Xiang, Pingyu | Zhejiang University |
Qiu, Ke | Zhejiang University |
Sun, Danying | Zhejiang University |
Zhang, Jingyu | Zhejiang University |
Fang, Qin | Zhejiang University |
Mi, Xiangyu | Zhejiang University |
Wang, Shudong | Xi'an Jiaotong University |
Chen, Mengxiao | Zhejiang Lab |
Wang, Yue | Zhejiang University |
Xiong, Rong | Zhejiang University |
Lu, Haojian | Zhejiang University |
Keywords: Automation at Micro-Nano Scales, Micro/Nano Robots, Biologically-Inspired Robots
Abstract: Magnetic continuum robots are actuated and deformed remotely by external magnetic fields, which simplifies the robot's transmission mechanism and provides significant potential for miniaturization and operational flexibility. However, modeling the magnetic field distribution generated by permanent magnets is complex and requires time-consuming pre-calibrations. Moreover, it is highly susceptible to environments with ferromagnetic materials, posing significant challenges for the control of magnetic continuum robots. In response, we propose an approach that does not overly focus on the magnetic field distribution but instead directly learns the inverse kinematics of magnetic continuum robots end-to-end. By binding the robot's configuration to the pose of the external magnet, precise control of the continuum robot is facilitated. Additionally, we leverage teleoperation techniques to broaden the applicability of this method. By mounting magnets on a robotic arm and directly utilizing the target pose of the external magnet predicted by a multi-layer perceptron (MLP), we achieve the operation and navigation of magnetic continuum robots in complex environments. Experiments demonstrate that the mean control accuracy along the robot using our learning-based inverse kinematics is about half of the robot's diameter.
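The end-to-end inverse-kinematics mapping can be illustrated with a small multi-layer perceptron. The PyTorch sketch below assumes a 3-D tip configuration as input and a 6-D external-magnet pose as output; the dimensions, layer sizes, and names (MagnetPoseMLP, train) are illustrative assumptions, not the authors' network.

    import torch
    import torch.nn as nn

    class MagnetPoseMLP(nn.Module):
        # Maps a desired continuum-robot configuration (assumed here to be a 3-D tip position)
        # to the 6-D pose of the external permanent magnet held by the robotic arm.
        def __init__(self, config_dim=3, pose_dim=6, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(config_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, pose_dim),
            )

        def forward(self, desired_config):
            return self.net(desired_config)

    def train(model, configs, magnet_poses, epochs=100, lr=1e-3):
        # Supervised training on (robot configuration, magnet pose) pairs collected on the system.
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(model(configs), magnet_poses)
            loss.backward()
            opt.step()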
|
|
11:15-11:30, Paper FrBT4.2 | |
Real-Time Particle Cluster Manipulation with Holographic Acoustic End-Effector under Microscope |
|
An, Siyuan | Shanghaitech University |
Zhong, Chengxi | ShanghaiTech University |
Wang, Mingyue | Shanghaitech Univerisity |
Wang, Shudong | Xi'an Jiaotong University |
Lu, Haojian | Zhejiang University |
Li, Jiaqi | ShanghaiTech University |
Li, Y.F. | City University of Hong Kong |
Liu, Song | ShanghaiTech University |
Keywords: Micro/Nano Robots, Grippers and Other End-Effectors, Deep Learning in Grasping and Manipulation
Abstract: Non-contact particle cluster manipulation holds significant promise in the realms of advanced manufacturing, chemistry, and pharmacy. However, achieving precise and dynamic control over the spatial kinematics of particle clusters remains a significant challenge, necessitating a real-time, accurately programmable robotic end-effector. To this end, we develop an innovative non-contact, precise particle cluster manipulation system with an ultrasonic phased array transducer (PAT) under a microscope. This system combines a physics-based deep learning algorithm for real-time calculation of phase-only holograms (POHs), enabling the PAT to dynamically form acoustic fields, namely the holographic acoustic end-effector (HAE). Leveraging the dynamically and accurately generated HAEs, our system yields kinematic control of particle clusters, including aggregation, rotation, and translation. Extensive experiments demonstrate the effectiveness of the proposed system for particle cluster manipulation.
|
|
11:30-11:45, Paper FrBT4.3 | |
Absolute Pose Estimation for a Millimeter-Scale Vision System |
|
Ozturk, Derin | Cornell University |
Wang, Zilin | Cornell University |
Helbling, E. Farrell | Cornell University |
Keywords: Micro/Nano Robots, Embedded Systems for Robotic and Automation, Vision-Based Navigation
Abstract: Vision is an important component of robotic perception systems due to the rich information provided by high resolution image sensors, but computer vision algorithms can be computationally expensive and ill-suited to resource-constrained robotic systems. Here, we present a mm-scale vision system capable of performing absolute pose estimation at 16.5 FPS. This novel vision system uses a commercial-off-the-shelf sensor and microcontroller unit, as well as planar light-based landmarks in the environment to simplify feature detection. We exploit the structure of the planar pose problem to reduce algorithmic complexity and improve latency and energy consumption through software-, processor-, and hardware-in-the-loop testing. The end-to-end system consumes 49 mA of current and computes absolute pose estimates within 15 mm over a number of reference trajectories.
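Planar light-based landmarks admit a standard planar perspective-n-point solve, which gives a flavour of the absolute pose estimation step. The sketch below uses OpenCV's solvePnP with the IPPE solver for coplanar points; the landmark layout and function name are placeholders, and the paper exploits the planar structure further to reduce complexity for its microcontroller, which this desktop-grade sketch does not reproduce.

    import numpy as np
    import cv2

    # Hypothetical square landmark layout (metres), all in the z = 0 plane.
    landmarks_3d = np.array([[0.00, 0.00, 0.0],
                             [0.05, 0.00, 0.0],
                             [0.05, 0.05, 0.0],
                             [0.00, 0.05, 0.0]], dtype=np.float64)

    def estimate_pose(detected_px, camera_matrix, dist_coeffs):
        # detected_px: 4x2 pixel coordinates of the detected landmarks, in the same order.
        ok, rvec, tvec = cv2.solvePnP(landmarks_3d, detected_px, camera_matrix,
                                      dist_coeffs, flags=cv2.SOLVEPNP_IPPE)
        if not ok:
            return None
        R, _ = cv2.Rodrigues(rvec)
        return R, tvec   # pose of the landmark plane in the camera frame (invert for camera pose)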
|
|
11:45-12:00, Paper FrBT4.4 | |
Design and Control of a Three-Dimensional Electromagnetic Drive System for Micro-Robots |
|
Zhang, Yunrui | Jiangnan University |
Liu, Yueyue | Jiangnan University |
Fan, Qigao | Jiangnan University |
Keywords: Micro/Nano Robots
Abstract: Three-dimensional electromagnetic field drive technology, as a cutting-edge remote wireless control method, is extensively utilized in the biomedical diagnosis and treatment of micro-robots. This paper presents the design of a three-dimensional electromagnetic drive system for micro-robots, leveraging a gradient magnetic field to achieve comprehensive automatic control in three axes. Firstly, we refine the iron core's end structure to produce a uniform gradient magnetic field throughout the three-dimensional space. Following that, the parameters at the end of the iron core are fine-tuned to meet the specifications for magnetic field gradient, magnetic flux density, and effective workspace. A three-dimensional electromagnetic drive system with a strong magnetic field gradient is then established, achieving a remarkable maximum gradient of 1.70 T/m at the center of the workspace. Compared with other systems, the gradient is significantly enhanced. Subsequently, we carry out a three-dimensional drive experiment for a micro-robot, confirming the system's driving efficacy. To enable precise path following for micro-robots within three-dimensional space, we formulate a control strategy rooted in micro-robot dynamics. Controller stability is guaranteed through Lyapunov theory. Ultimately, a three-dimensional path-following experiment is executed on the developed electromagnetic drive system. The experiment confirms that the designed system can achieve three-dimensional closed-loop motion of the micro-robot.
|
|
FrBT5 |
Room 5 |
Grasping Control |
Regular session |
Co-Chair: Heppert, Nick | University of Freiburg |
|
11:00-11:15, Paper FrBT5.1 | |
AO-Grasp: Articulated Object Grasp Generation |
|
Pares-Morlans, Carlota | Stanford University |
Chen, Claire | Stanford University |
Weng, Yijia | Stanford |
Yi, Michelle | Stanford University |
Huang, Yuying | Stanford University |
Heppert, Nick | University of Freiburg |
Zhou, Linqi | Stanford University |
Guibas, Leonidas | Stanford University |
Bohg, Jeannette | Stanford University |
Keywords: Grasping, Data Sets for Robot Learning, Deep Learning in Grasping and Manipulation
Abstract: We introduce AO-Grasp, a grasp proposal method that generates 6 DoF grasps that enable robots to interact with articulated objects, such as opening and closing cabinets and appliances. AO-Grasp consists of two main contributions: the AO-Grasp Model and the AO-Grasp Dataset. Given a segmented partial point cloud of a single articulated object, the AO-Grasp Model predicts the best grasp points on the object with an Actionable Grasp Point Predictor. Then, it finds corresponding grasp orientations for each of these points, resulting in stable and actionable grasp proposals. We train the AO-Grasp Model on our new AO-Grasp Dataset, which contains 78K actionable parallel-jaw grasps on synthetic articulated objects. In simulation, AO-Grasp achieves a 45.0% grasp success rate, whereas the highest performing baseline achieves a 35.0% success rate. Additionally, we evaluate AO-Grasp on 120 real-world scenes of objects with varied geometries, articulation axes, and joint states, where AO-Grasp produces successful grasps on 67.5% of scenes, while the baseline only produces successful grasps on 33.3% of scenes. To the best of our knowledge, AO-Grasp is the first method for generating 6 DoF grasps on articulated objects directly from partial point clouds without requiring part detection or hand-designed grasp heuristics. The AO-Grasp Dataset and a pre-trained AO-Grasp model are available at our project website: https://stanford-iprl-lab.github.io/ao-grasp/.
|
|
11:15-11:30, Paper FrBT5.2 | |
Evaluating a Movable Palm in Caging Inspired Grasping Using a Reinforcement Learning-Based Approach |
|
Beddow, Luke Jonathan | University College London |
Wurdemann, Helge Arne | University College London |
Kanoulas, Dimitrios | University College London |
Keywords: Grippers and Other End-Effectors, Grasping, Reinforcement Learning
Abstract: In this paper, we study the effectiveness of using a rigid movable palm for grasping varied objects, on a caging-inspired gripper with three flexible fingers. This rigid palm extends to actively exert downward force on objects, in contrast with existing methods, which combine movable palms with negative pressure to exert lifting forces on objects. We compare grasping with and without the palm, whilst also changing finger stiffness and fingertip angle, to analyse the effect on grasp success rate and stability over 24 design permutations. Reinforcement learning was used to train a unique grasping controller in every design case, aiming to achieve optimal grasping as the basis for comparison. Validation in both simulation and the real world was completed for every permutation. We demonstrated that using the palm improved success rates on average by 11% in simulation and 13% in the real world, and achieved a best real-world success rate of 96% on 18 YCB benchmark food objects. Grasp stability against disturbances in three axes improved by 15% on average when using the palm. Our investigation determined that fingertip angle had a large effect, whereas finger stiffness was less important.
|
|
11:30-11:45, Paper FrBT5.3 | |
Learning a Shape-Conditioned Agent for Purely Tactile In-Hand Manipulation of Various Objects |
|
Pitz, Johannes | German Aerospace Center |
Röstel, Lennart | German Aerospace Center (DLR) |
Sievers, Leon | German Aerospace Center |
Burschka, Darius | Technische Universitaet Muenchen |
Bäuml, Berthold | Technical University of Munich |
Keywords: In-Hand Manipulation, Dexterous Manipulation, Multifingered Hands
Abstract: Reorienting diverse objects with a multi-fingered hand is a challenging task. Current methods in robotic in-hand manipulation are either object-specific or require permanent supervision of the object state from visual sensors. This is far from human capabilities and from what is needed in real-world applications. In this work, we address this gap by training shape-conditioned agents to reorient diverse objects in hand, relying purely on tactile feedback (via torque and position measurements of the fingers' joints). To achieve this, we propose a learning framework that exploits shape information in a reinforcement learning policy and a learned state estimator. We find that representing 3D shapes by vectors from a fixed set of basis points to the shape's surface, transformed by its predicted 3D pose, is especially helpful for learning dexterous in-hand manipulation. In simulation and real-world experiments, we show the reorientation of many objects with high success rates, on par with state-of-the-art results obtained with specialized single-object agents. Moreover, we show generalization to novel objects, achieving success rates of ~90% even for non-convex shapes.
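The basis-point-set idea mentioned above, vectors from a fixed basis to the shape's surface, can be sketched directly. The Python snippet below is an illustrative encoding that assumes the predicted pose is applied to the object's surface points and that a nearest-neighbour query defines the surface vectors; the basis size and sampling are arbitrary choices, not the authors'.

    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(0)
    basis_points = rng.uniform(-1.0, 1.0, size=(256, 3))   # fixed basis set (size is arbitrary)

    def bps_encode(surface_points, R, t):
        # Apply the predicted pose (R, t) to the surface points (an assumption about where the
        # transform enters), then store the vector from each basis point to its nearest
        # surface point as the shape feature fed to the policy and state estimator.
        transformed = surface_points @ R.T + t
        _, idx = cKDTree(transformed).query(basis_points)
        return transformed[idx] - basis_points              # (256, 3) shape encoding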
|
|
11:45-12:00, Paper FrBT5.4 | |
Fine Manipulation Using a Tactile Skin: Learning in Simulation and Sim-To-Real Transfer |
|
Kasolowsky, Ulf | Technical University of Munich |
Bäuml , Berthold | Technical University of Munich |
Keywords: In-Hand Manipulation, Deep Learning in Grasping and Manipulation, Multifingered Hands
Abstract: We want to enable fine manipulation with a multi-fingered robotic hand by using modern deep reinforcement learning methods. Key for fine manipulation is a spatially resolved tactile sensor. Here, we present a novel model of a tactile skin that can be used together with rigid-body (hence fast) physics simulators. The model considers the softness of the real fingertips such that a contact can spread across multiple taxels of the sensor depending on the contact geometry. We calibrate the model parameters to allow for an accurate simulation of the real-world sensor. For this, we present a self-contained calibration method without external tools or sensors. To demonstrate the validity of our approach, we learn two challenging fine manipulation tasks: Rolling a marble and a bolt between two fingers. We show in simulation experiments that tactile feedback is crucial for precise manipulation and reaching sub-taxel resolution of <1mm (despite a taxel spacing of 4mm). Moreover, we demonstrate that all policies successfully transfer from the simulation to the real robotic hand. Website: https://aidx-lab.org/skin/iros24
|
|
FrBT6 |
Room 6 |
Aerial Systems: Motion Control and Planning |
Regular session |
Chair: Saska, Martin | Czech Technical University in Prague |
Co-Chair: Agarwal, Saurav | University of Pennsylvania |
|
11:00-11:15, Paper FrBT6.1 | |
Identifying Optimal Launch Sites of High-Altitude Latex-Balloons Using Bayesian Optimisation for the Task of Station-Keeping |
|
Saunders, Jack | University of Bath |
Saeedi, Sajad | Toronto Metropolitan University |
Hartshorne, Adam | University of Bath |
Xu, Binbin | University of Toronto |
Şimşek, Özgür | University of Bath |
Hunter, Alan Joseph | University of Bath |
Li, Wenbin | University of Bath |
Keywords: Aerial Systems: Applications, Machine Learning for Robot Control, Reinforcement Learning
Abstract: Station-keeping tasks for high-altitude balloons show promise in areas such as ecological surveys, atmospheric analysis, and communication relays. However, identifying the optimal time and position to launch a latex high-altitude balloon is still a challenging and multifaceted problem. For example, tasks such as forest fire tracking place geometric constraints on the launch location of the balloon. Furthermore, identifying the most optimal location also heavily depends on atmospheric conditions. We first illustrate how reinforcement learning-based controllers, frequently used for station-keeping tasks, can exploit the environment. This exploitation can degrade performance on unseen weather patterns and affect station-keeping performance when identifying an optimal launch configuration. Valuing all states in the region equally, the agent exploits the region's geometry by flying near the edge, leading to risky behaviours. We propose a modification that compensates for this exploitation and find that it leads, on average, to more steps within the target region on unseen data. Then, we illustrate how Bayesian Optimisation (BO) can identify the optimal launch location to perform station-keeping tasks, maximising the return from a given rollout. We show BO can find this launch location in fewer steps compared to other optimisation methods. Results indicate that, surprisingly, the most optimal location to launch from is not commonly within the target region. Please find further information about our project at https://sites.google.com/view/bo-lauch-balloon/.
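The launch-site search can be illustrated with a generic Bayesian-optimisation loop: a Gaussian-process surrogate over launch configurations and an expected-improvement acquisition maximised over random candidates. In the sketch below, rollout_return (the station-keeping return of a launch configuration) and the bounds are hypothetical interfaces, and the acquisition choice is an assumption rather than the paper's setting.

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def expected_improvement(gp, candidates, y_best):
        mu, sigma = gp.predict(candidates, return_std=True)
        sigma = np.maximum(sigma, 1e-9)
        z = (mu - y_best) / sigma
        return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)

    def optimise_launch(rollout_return, bounds, n_init=5, n_iter=20, seed=0):
        # rollout_return(x): station-keeping return of launch configuration x (hypothetical).
        # bounds: (dim, 2) array of lower/upper limits on, e.g., launch position and time.
        rng = np.random.default_rng(seed)
        X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_init, bounds.shape[0]))
        y = np.array([rollout_return(x) for x in X])
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        for _ in range(n_iter):
            gp.fit(X, y)
            cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(2048, bounds.shape[0]))
            x_next = cand[np.argmax(expected_improvement(gp, cand, y.max()))]
            X = np.vstack([X, x_next])
            y = np.append(y, rollout_return(x_next))
        return X[np.argmax(y)]   # best launch configuration found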
|
|
11:15-11:30, Paper FrBT6.2 | |
TOPPQuad: Dynamically-Feasible Time-Optimal Path Parametrization for Quadrotors |
|
Mao, Katherine | University of Pennsylvania |
Spasojevic, Igor | University of Pennsylvania |
Hsieh, M. Ani | University of Pennsylvania |
Kumar, Vijay | University of Pennsylvania |
Keywords: Aerial Systems: Mechanics and Control, Motion and Path Planning, Optimization and Optimal Control
Abstract: Planning time-optimal trajectories for quadrotors in cluttered environments is a challenging, non-convex problem. This paper addresses minimizing the traversal time of a given collision free geometric path without violating actuation bounds of the vehicle. Previous approaches have either relied on convex relaxations that do not guarantee dynamic feasibility, or have generated overly conservative time parametrizations. We propose TOPPQuad, a time-optimal path parameterization algorithm for quadrotors which explicitly incorporates quadrotor rigid body dynamics and constraints such as bounds on inputs (including motor thrusts) and state of the vehicle (including the pose, linear and angular velocity and acceleration). We demonstrate the ability of the planner to generate faster trajectories that respect hardware constraints of the robot compared to several planners with relaxed notions of dynamic feasibility in simulation and on hardware. We also demonstrate how TOPPQuad can be used to plan trajectories for quadrotors that utilize bidirectional motors. Overall, the proposed approach paves a way towards maximizing the efficacy of autonomous micro aerial vehicles while ensuring their safety.
|
|
11:30-11:45, Paper FrBT6.3 | |
Model Predictive Path Integral Control for Agile Unmanned Aerial Vehicles |
|
Minařík, Michal | Czech Technical University in Prague |
Penicka, Robert | Czech Technical University in Prague |
Vonasek, Vojtech | Czech Technical University in Prague |
Saska, Martin | Czech Technical University in Prague |
Keywords: Aerial Systems: Mechanics and Control, Optimization and Optimal Control, Motion and Path Planning
Abstract: This paper introduces a control architecture for real-time and onboard control of Unmanned Aerial Vehicles (UAVs) in environments with obstacles using the Model Predictive Path Integral (MPPI) methodology. MPPI allows the use of the full nonlinear model of UAV dynamics and a more general cost function at the cost of a high computational demand. To run the controller in real-time, the sampling-based optimization is performed in parallel on a graphics processing unit onboard the UAV. We propose an approach to simulating the nonlinear system that respects low-level constraints while also dynamically handling obstacle avoidance, and we show that our method is able to run in real-time without the need for external computers. The MPPI controller is compared to MPC and SE(3) controllers on the reference tracking task, showing comparable performance. We demonstrate the viability of the proposed method in multiple simulation and real-world experiments, tracking a reference at up to 44 km/h and acceleration close to 20 m/s^2, while still being able to avoid obstacles. To the best of our knowledge, this is the first method to demonstrate an MPPI-based approach in real flight.
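The core MPPI update, sample perturbed control sequences, roll them out, and importance-weight them by cost, can be written compactly. The Python sketch below is a serial, CPU-only illustration with placeholder dynamics and cost callables; the paper evaluates the full nonlinear quadrotor model with these rollouts parallelised on an onboard GPU.

    import numpy as np

    def mppi_step(x0, U_nom, dynamics, cost, K=1024, sigma=0.3, lam=1.0, seed=None):
        # x0: current state; U_nom: (H, m) nominal control sequence.
        # dynamics(x, u) -> next state; cost(x, u) -> scalar stage cost (placeholder callables).
        rng = np.random.default_rng(seed)
        H, m = U_nom.shape
        noise = rng.normal(scale=sigma, size=(K, H, m))
        costs = np.zeros(K)
        for k in range(K):                       # the paper parallelises these rollouts on a GPU
            x = x0
            for t in range(H):
                u = U_nom[t] + noise[k, t]
                x = dynamics(x, u)
                costs[k] += cost(x, u)
        w = np.exp(-(costs - costs.min()) / lam)
        w /= w.sum()
        return U_nom + np.einsum("k,khm->hm", w, noise)   # apply the first control, then shift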
|
|
11:45-12:00, Paper FrBT6.4 | |
CoDe: A Cooperative and Decentralized Collision Avoidance Algorithm for Small-Scale UAV Swarms Considering Energy Efficiency |
|
Huang, Shuangyao | Xi'an Jiaotong-Liverpool University |
Zhang, Haibo | University of Otago |
Huang, Zhiyi | University of Otago |
Keywords: Collision Avoidance, Swarm Robotics, Reinforcement Learning
Abstract: This paper introduces a cooperative and decentralized collision avoidance algorithm (CoDe) for small-scale UAV swarms consisting of up to three UAVs. CoDe improves energy efficiency of UAVs by achieving effective cooperation among UAVs. Moreover, CoDe is specifically tailored for UAV's operations by addressing the challenges faced by existing schemes, such as ineffectiveness in selecting actions from continuous action spaces and high computational complexity. CoDe is based on Multi-Agent Reinforcement Learning (MARL), and finds cooperative policies by incorporating a novel credit assignment scheme. The novel credit assignment scheme estimates the contribution of an individual by subtracting a baseline from the joint action value for the swarm. The credit assignment scheme in CoDe outperforms other benchmarks as the baseline takes into account not only the importance of a UAV's action but also the interrelation between UAVs. Furthermore, extensive experiments are conducted against existing MARL-based and conventional heuristic-based algorithms to demonstrate the advantages of the proposed algorithm.
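The baseline-subtraction idea behind the credit assignment can be illustrated with a counterfactual-style advantage: each UAV's credit is the joint action value minus a baseline obtained by marginalising its own action. The Python sketch below is a generic illustration with hypothetical joint_q and agent_policies interfaces; CoDe's actual baseline additionally accounts for the interrelation between UAVs, which this sketch does not capture.

    import numpy as np

    def per_uav_credit(joint_q, joint_action, agent_policies):
        # joint_q(a): value of joint action a for the swarm (hypothetical interface).
        # agent_policies[i]: list of (probability, action) pairs approximating UAV i's policy.
        credits = []
        q_joint = joint_q(joint_action)
        for i, policy in enumerate(agent_policies):
            baseline = 0.0
            for prob, a_i in policy:                 # marginalise UAV i's own action out
                counterfactual = list(joint_action)
                counterfactual[i] = a_i
                baseline += prob * joint_q(counterfactual)
            credits.append(q_joint - baseline)       # UAV i's estimated contribution
        return np.array(credits)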
|
|
FrBT7 |
Room 7 |
Computer Vision for Medical Robotics |
Regular session |
Co-Chair: Nasseri, M. Ali | Technische Universitaet Muenchen |
|
11:00-11:15, Paper FrBT7.1 | |
DeepBHMR: Learning Bidirectional Hybrid Mixture Models for Generalized Rigid Point Set Registration |
|
Min, Zhe | University College London |
Zhang, Zhengyan | Harbin Institute of Technology, Shenzhen |
Zhang, Ang | The Chinese University of Hong Kong |
Song, Rui | Shandong University |
Li, Yibin | Shandong University |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Keywords: Computer Vision for Medical Robotics, Medical Robots and Systems, Probability and Statistical Methods
Abstract: This paper presents a novel normal-assisted learning-based rigid registration method, Deep Bi-directional Hybrid Mixture Registration (DeepBHMR), where normal vectors are used in both the correspondence and transformation stages, and the optimization objective is formulated in a bi-directional way. The designed neural network consists of three components: (1) a correspondence network that estimates the correspondence probability between points within one generalized point set and components of Hybrid Mixture Models (HMMs) representing the other generalized point set; (2) a posterior module that computes the HMM parameters; and (3) a transformation module that computes the rotation matrix and the translation vector given the estimated generalized-point-to-hybrid-distribution correspondences and HMM parameters. DeepBHMR has been validated on a medical dataset and outperforms state-of-the-art registration methods. In the case of femur bones, the mean rotation error is around 1° (i.e., 1.01°) and the mean translation error is less than 1 mm (i.e., 0.36 mm). Even under large transformations (i.e., global registration), the mean rotation and translation errors of 3.47° and 2.08 mm are still satisfactory. The results demonstrate DeepBHMR's favorable generalizability across different shapes (e.g., from femur to hip) and its ability to successfully handle large transformations and partial registration.
|
|
11:15-11:30, Paper FrBT7.2 | |
A CT-Guided Control Framework of a Robotic Flexible Endoscope for the Diagnosis of the Maxillary Sinusitis |
|
Zhu, Puchen | The Chinese University of Hong Kong |
Zhang, Huayu | The Chinese University of Hong Kong |
Ma, Xin | Chinese Univerisity of HongKong |
Zheng, Xiaoyin | XMotors.ai |
Wang, Xuchen | The Chinese University of Hong Kong |
Au, K. W. Samuel | The Chinese University of Hong Kong |
Keywords: Computer Vision for Medical Robotics, Flexible Robotics, Surgical Robotics: Planning
Abstract: Flexible endoscopes are commonly adopted in narrow and confined anatomical cavities due to their higher reachability and dexterity. However, prolonged and unintuitive manipulation of these endoscopes leads to an increased workload on surgeons and risks of collision. To address these challenges, this paper proposes a CT-guided control framework for the diagnosis of maxillary sinusitis by using a robotic flexible endoscope. In the CT-guided control framework, a feasible path to the target position in the maxillary sinus cavity for the robotic flexible endoscope is designed. Besides, an optimal control scheme is proposed to autonomously control the robotic flexible endoscope to follow the feasible path. This greatly improves the efficiency and reduces the workload for surgeons. Several experiments were conducted based on a widely utilized sinus phantom, and the results showed that the robotic flexible endoscope can accurately and autonomously follow the feasible path and reach the target position in the maxillary sinus cavity. The results also verified the feasibility of the CT-guided control framework, which contributes an effective approach to early diagnosis of sinusitis in the future.
|
|
11:30-11:45, Paper FrBT7.3 | |
Estimating the Joint Angles of a Magnetic Surgical Tool Using Monocular 3D Keypoint Detection and Particle Filtering |
|
Fredin, Erik | University of Toronto |
Diller, Eric D. | University of Toronto |
Keywords: Computer Vision for Medical Robotics, Micro/Nano Robots, Medical Robots and Systems
Abstract: Magnetic surgical tools benefit greatly from real-time pose estimation, as this is essential for controlling them safely and effectively. Current pose estimation methods for surgical tools either focus on rigid tools, or are developed specifically for the da Vinci surgical system. In this work, we use computer vision from a monocular endoscopic camera to estimate the pose of an articulated magnetic surgical tool. In particular, we present a deep 3D keypoint estimation framework and a particle filter to achieve this. The former method can be used for any articulated surgical tool, while the latter method is specific to magnetic tools. We show that the deep 3D keypoint estimation framework estimates the surgical tool's joint angles with an average error of 4.0 degrees and a speed of 29 Hz. In addition, we demonstrate the robustness of the magnetic particle filter and the deep pose estimation method for real-time tool pose estimation.
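The particle-filter component can be sketched as a standard predict-weight-resample loop over the tool's joint angles, with weights computed from the agreement between keypoints predicted by forward kinematics and the detected 3D keypoints. In the Python sketch below, the random-walk process model, noise scales, and forward_kinematics interface are assumptions, not the paper's magnetic-tool-specific filter.

    import numpy as np

    def pf_step(particles, weights, observed_kps, forward_kinematics,
                process_std=0.02, obs_std=0.005, seed=None):
        # particles: (N, n_joints) joint-angle hypotheses; observed_kps: (K, 3) detected keypoints.
        rng = np.random.default_rng(seed)
        particles = particles + rng.normal(scale=process_std, size=particles.shape)  # random-walk predict
        for i, q in enumerate(particles):
            predicted = forward_kinematics(q)                     # (K, 3) keypoints for hypothesis q
            err = np.linalg.norm(predicted - observed_kps, axis=1)
            weights[i] *= np.exp(-0.5 * np.sum((err / obs_std) ** 2))
        weights = weights / weights.sum()
        estimate = weights @ particles                            # weighted-mean joint angles
        if 1.0 / np.sum(weights ** 2) < 0.5 * len(particles):     # effective sample size too low
            idx = rng.choice(len(particles), size=len(particles), p=weights)  # multinomial resampling
            particles, weights = particles[idx], np.full(len(particles), 1.0 / len(particles))
        return particles, weights, estimate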
|
|
11:45-12:00, Paper FrBT7.4 | |
Intraocular Reflection Modeling and Avoidance Planning in Image-Guided Ophthalmic Surgeries |
|
Yang, Junjie | TUM |
Zhao, Zhihao | Technische Universität München |
Zhao, Yinzheng | Klinikum Rechts Der Isar |
Zapp, Daniel | Klinikum Rechts Der Isar Der TU München |
Maier, Mathias | Klinikum Rechts Der Isar Der TU München |
Huang, Kai | Sun Yat-Sen University |
Navab, Nassir | TU Munich |
Nasseri, M. Ali | Technische Universitaet Muenchen |
Keywords: Computer Vision for Medical Robotics, Surgical Robotics: Planning, Motion and Path Planning
Abstract: Intuitive enhancement of surgical precision in robotic retinal surgery depends highly on the stable acquisition of intraocular imaging data. Such acquisition requires segmenting intraocular components, especially instrument-tip positions, to achieve state estimation and subsequent navigation and motion control. However, intraocular light reflections and glares significantly impact instrument segmentation, state estimation, and subsequent visual servoing in retinal surgery. At the same time, light reflections are among the sources of information for intraoperative navigation. In this work, we propose a method for modeling and optimizing light reflections using microscopy as the standard surgical imaging modality. Beyond optimization, our approach seamlessly integrates the optimized reflection model with path planning, strategically circumventing reflection areas and ensuring uninterrupted visibility of instrument tips throughout the surgical procedure. Experiments demonstrate the efficacy and potential of the presented methodology to avoid glare effects during eye surgeries.
|
|
FrBT8 |
Room 8 |
Autonomous Vehicle Navigation II |
Regular session |
Chair: Yang, Ming | Shanghai Jiao Tong University |
Co-Chair: Qin, Tong | Shanghai Jiao Tong University |
|
11:00-11:15, Paper FrBT8.1 | |
METAVerse: Meta-Learning Traversability Cost Map for Off-Road Navigation |
|
Seo, Junwon | Carnegie Mellon University |
Kim, Taekyung | University of Michigan |
Ahn, Seongyong | KAIST |
Kwak, Kiho | Agency for Defense Development |
Keywords: Autonomous Vehicle Navigation, Vision-Based Navigation, Field Robots
Abstract: Autonomous navigation in off-road conditions requires an accurate estimation of terrain traversability. However, traversability estimation in unstructured environments is subject to high uncertainty due to the variability of numerous factors that influence vehicle-terrain interaction. Consequently, it is challenging to obtain a generalizable model that can accurately predict traversability in a variety of environments. This paper presents METAVerse, a meta-learning framework for learning a global model that accurately and reliably predicts terrain traversability across diverse environments. We train the traversability prediction network to generate a dense and continuous-valued cost map from a sparse LiDAR point cloud, leveraging vehicle-terrain interaction feedback in a self-supervised manner. Meta-learning is utilized to train a global model with driving data collected from multiple environments, effectively minimizing estimation uncertainty. During deployment, online adaptation is performed to rapidly adapt the network to the local environment by exploiting recent interaction experiences. To conduct a comprehensive evaluation, we collect driving data from various terrains and demonstrate that our method can obtain a global model that minimizes uncertainty. Moreover, by integrating our model with a model predictive controller, we demonstrate that the reduced uncertainty results in safe and stable navigation in unstructured and unknown terrains.
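The combination of multi-environment meta-training and rapid online adaptation can be illustrated with a Reptile-style update, in which per-environment adapted weights pull a shared initialisation toward them; the same inner loop is then reused at deployment on recent interaction data. The sketch below is a simplification under assumed interfaces (env_losses as per-environment loss callables); the abstract does not specify the exact meta-learning algorithm, so this is illustrative only.

    import copy
    import torch

    def meta_train_step(model, env_losses, inner_lr=1e-3, meta_lr=1e-2, inner_steps=5):
        # env_losses: one callable per environment, env_loss(model) -> scalar training loss
        # computed from that environment's LiDAR input and vehicle-terrain feedback (assumed interface).
        meta_weights = copy.deepcopy(model.state_dict())
        for env_loss in env_losses:
            model.load_state_dict(meta_weights)
            opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
            for _ in range(inner_steps):                 # adapt to this environment
                opt.zero_grad()
                env_loss(model).backward()
                opt.step()
            adapted = model.state_dict()
            for k in meta_weights:                       # pull the meta-weights toward the adapted ones
                if meta_weights[k].is_floating_point():
                    meta_weights[k] += meta_lr * (adapted[k] - meta_weights[k])
        model.load_state_dict(meta_weights)

    # Online adaptation at deployment reuses the same inner loop on recent interaction experiences.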
|
|
11:15-11:30, Paper FrBT8.2 | |
MapLocNet: Coarse-To-Fine Feature Registration for Visual Re-Localization in Navigation Maps |
|
Wu, Hang | Huawei Technology |
Zhang, Zhenghao | Huawei Technology |
Lin, Siyuan | Huawei Technology |
Mu, Xiangru | Shanghai Jiao Tong University |
Zhao, Qiang | Huawei |
Yang, Ming | Shanghai Jiao Tong University |
Qin, Tong | Shanghai Jiao Tong University |
Keywords: Autonomous Vehicle Navigation, Localization
Abstract: Robust localization is the cornerstone of autonomous driving, especially in challenging urban environments where GPS signals suffer from multipath errors. Traditional localization approaches rely on high-definition (HD) maps, which consist of precisely annotated landmarks. However, building HD map is expensive and challenging to scale up. Given these limitations, leveraging navigation maps has emerged as a promising low-cost alternative for localization. Current approaches based on navigation maps can achieve highly accurate localization, but their complex matching strategies lead to unacceptable inference latency that fails to meet the real-time demands. To address these limitations, we introduce MapLocNet, a novel transformer-based neural re-localization method. Inspired by image registration, our approach performs a coarse-to-fine neural feature registration between navigation map features and visual bird's-eye view features. MapLocNet substantially outperforms the current state-of-the-art methods on both nuScenes and Argoverse datasets, demonstrating significant improvements in localization accuracy and inference speed across both single-view and surround-view input settings. We highlight that our research presents an HD-map-free localization method for autonomous driving, offering a cost-effective, reliable, and scalable solution for challenging urban environments.
|
|
11:30-11:45, Paper FrBT8.3 | |
ParkingE2E: Camera-Based End-To-End Parking Network, from Images to Planning |
|
Li, Changze | Shanghai Jiao Tong University |
Ji, Ziheng | Shanghai Jiao Tong University |
Chen, Zhe | Shanghai Jiao Tong University |
Qin, Tong | Shanghai Jiao Tong University |
Yang, Ming | Shanghai Jiao Tong University |
Keywords: Autonomous Vehicle Navigation
Abstract: Autonomous parking is a crucial task in the intelligent driving field. Traditional parking algorithms are usually implemented using rule-based schemes. However, these methods are less effective in complex parking scenarios due to the intricate design of the algorithms. In contrast, neural-network-based methods tend to be more intuitive and versatile than rule-based methods. By collecting a large amount of expert parking trajectory data and emulating human strategy via learning-based methods, the parking task can be effectively addressed. In this paper, we employ imitation learning to perform end-to-end planning from RGB images to planned paths by imitating human driving trajectories. The proposed end-to-end approach utilizes a target query encoder to fuse image and target features, and a transformer-based decoder to autoregressively predict future waypoints. We conducted extensive experiments in real-world scenarios, and the results demonstrate that the proposed method achieved an average parking success rate of 87.8% across four different real-world garages. Real-vehicle experiments further validate the feasibility and effectiveness of the method proposed in this paper.
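A minimal sketch of the autoregressive waypoint decoding described above, assuming a standard PyTorch transformer decoder; module names, dimensions, and the 2D waypoint parameterization are illustrative, not the released ParkingE2E code:

```python
# Illustrative sketch: a target-query-conditioned transformer decoder that
# predicts parking waypoints one step at a time (inference-time rollout).
import torch
import torch.nn as nn

class WaypointDecoder(nn.Module):
    def __init__(self, d_model=256, n_heads=8, n_layers=4, horizon=30):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.wp_embed = nn.Linear(2, d_model)   # embed (x, y) waypoints
        self.wp_head = nn.Linear(d_model, 2)    # predict the next (x, y)
        self.horizon = horizon

    @torch.no_grad()
    def rollout(self, fused_feats, start_wp):
        """fused_feats: (B, N, d) fused image+target features; start_wp: (B, 1, 2)."""
        wps = [start_wp]
        for _ in range(self.horizon):
            tgt = self.wp_embed(torch.cat(wps, dim=1))
            out = self.decoder(tgt, fused_feats)   # cross-attend to fused features
            wps.append(self.wp_head(out[:, -1:]))  # append the newest waypoint
        return torch.cat(wps[1:], dim=1)           # (B, horizon, 2)
```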
|
|
11:45-12:00, Paper FrBT8.4 | |
M3-GMN: A Multi-Environment, Multi-LiDAR, Multi-Task Dataset for Grid Map Based Navigation |
|
Xie, Guanglei | National University of Defense Technology |
Fu, Hao | National University of Defense Technology |
Xue, Hanzhang | National University of Defense Technology |
Liu, Bokai | National University of Defense Technology |
Xu, Xin | National University of Defense Technology |
Li, Xiaohui | National University of Defense Technology |
Sun, Zhenping | National University of Defense Technology |
Keywords: Autonomous Vehicle Navigation, Field Robots
Abstract: In this paper, we propose a multi-environment, multi-LiDAR, multi-task dataset to promote grid map-based navigation capabilities for autonomous vehicles. The dataset comprises structured and unstructured environmental data captured by different types of LiDAR and contains various challenging scenarios, including moving objects, negative obstacles, steep slopes, cliffs, overhangs, etc. Further, we have devised an innovative method for generating ground truth, facilitating the creation of dense, accurate, and stable grid maps with minimal human annotation effort. A new baseline method and two existing approaches are evaluated on this dataset. Results indicate that the existing approaches perform much worse than the proposed baseline. The dataset will be made publicly available at https://github.com/guanglei96/M3-GMN.
|
|
FrBT9 |
Room 9 |
Path Planning for Multiple Robots |
Regular session |
Chair: Indelman, Vadim | Technion - Israel Institute of Technology |
Co-Chair: Bezzo, Nicola | University of Virginia |
|
11:00-11:15, Paper FrBT9.1 | |
Multi-Robot Communication-Aware Cooperative Belief Space Planning with Inconsistent Beliefs: An Action-Consistent Approach |
|
Kundu, Tanmoy | Technion - Israel Institute of Technology |
Rafaeli, Moshe | Technion - Israel Institute of Technology |
Indelman, Vadim | Technion - Israel Institute of Technology |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Planning, Scheduling and Coordination, Multi-Robot Systems
Abstract: Multi-robot belief space planning (MR-BSP) is essential for reliable and safe autonomy. While planning, each robot maintains a belief over the state of the environment and reasons about how the belief would evolve in the future for different candidate actions. Yet, existing MR-BSP works share a common assumption that the beliefs of different robots are consistent at planning time. Such an assumption is often highly unrealistic, as it requires prohibitively extensive and frequent communication capabilities. In practice, each robot may have a different belief about the state of the environment. Crucially, when the beliefs of different robots are inconsistent, state-of-the-art MR-BSP approaches could result in a lack of coordination between the robots and, in general, could yield dangerous, unsafe, and sub-optimal decisions. In this paper, we tackle this crucial gap. We develop a novel decentralized algorithm that is guaranteed to find a consistent joint action. For a given robot, our algorithm reasons about action preferences based on 1) its local information, 2) what it perceives about the reasoning of the other robot, and 3) what it perceives about how the other robot perceives its own reasoning. The algorithm finds a consistent joint action whenever these steps yield the same best joint action; otherwise, it self-triggers communication between the robots. Experimental results show the efficacy of our algorithm in comparison with two baseline algorithms.
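A minimal sketch of the self-triggering decision rule described above (placeholder objective functions; not the paper's full algorithm): each robot compares the best joint action under its own belief, under its estimate of the other robot's belief, and under its estimate of how the other robot perceives its own belief, and communicates only when these disagree:

```python
# Illustrative sketch: action-consistent joint action selection with
# self-triggered communication. J_* are placeholder objective functions.
def choose_joint_action(joint_actions, J_local, J_other, J_other_about_self):
    """Each J_* maps a candidate joint action to an expected objective value
    computed from a different belief: the robot's own belief, its estimate of
    the other robot's belief, and its estimate of the other robot's estimate
    of its own belief."""
    best_local = max(joint_actions, key=J_local)
    best_other = max(joint_actions, key=J_other)
    best_reflected = max(joint_actions, key=J_other_about_self)
    if best_local == best_other == best_reflected:
        return best_local, False   # consistent joint action, no communication needed
    return None, True              # inconsistent: self-trigger communication
```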
|
|
11:15-11:30, Paper FrBT9.2 | |
Robust Online Epistemic Replanning of Multi-Robot Missions |
|
Bramblett, Lauren | University of Virginia |
Miloradovic, Branko | Mälardalen University |
Sherman, Patrick | University of Virginia |
Papadopoulos, Alessandro Vittorio | Mälardalen University |
Bezzo, Nicola | University of Virginia |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Distributed Robot Systems, Planning, Scheduling and Coordination
Abstract: As Multi-Robot Systems (MRS) become more affordable and computing capabilities grow, they provide significant advantages for complex applications such as environmental monitoring, underwater inspections, or space exploration. However, accounting for potential communication loss or the unavailability of communication infrastructure in these application domains remains an open problem. Much of the applicable MRS research assumes that the system can sustain communication through proximity regulations and formation control, or by devising a framework in which robots separate and adhere to a predetermined plan for extended periods of disconnection. The latter technique enables an MRS to be more efficient, but breakdowns and environmental uncertainties can have a domino effect throughout the system, particularly when the mission goal is intricate or time-sensitive. To deal with this problem, our proposed framework has two main phases: i) a centralized planner that allocates mission tasks by rewarding intermittent rendezvous between robots to mitigate the effects of unforeseen events during mission execution, and ii) a decentralized replanning scheme leveraging epistemic planning to formalize belief propagation and a Monte Carlo tree search for policy optimization given distributed rational belief updates. The proposed framework outperforms a baseline heuristic and is validated using simulations and experiments with aerial vehicles.
|
|
11:30-11:45, Paper FrBT9.3 | |
A Heterogeneous System of Systems Framework for Proactive Path Planning of a UAV-Assisted UGV in Uncertain Environments |
|
Sherman, Patrick | University of Virginia |
Bezzo, Nicola | University of Virginia |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Cooperating Robots, Multi-Robot Systems
Abstract: A common challenge for mobile robots is traversing uncertain environments containing obstacles, rough terrain, or hazards. Without full knowledge of the environment, an unmanned ground vehicle (UGV) navigating towards a goal could easily drive down a path that is blocked (requiring the robot to retrace sections of its path) or run into a hazard causing a catastrophic failure. To address this issue, we propose a system of systems (SoS) abstraction to group a distributed set of robots into a single system. Specifically, we propose augmenting the sensing capabilities of a UGV using an unmanned aerial vehicle (UAV). With different dynamic and sensing capabilities, the UAV scouts ahead and proactively updates the plan for the UGV using information discovered about the environment. To predict reachable states of the UGV, the UAV employs a sampling-based method in which a set of virtual particles representing simulated instances of the UGV is used to approximate the distribution of possible trajectories. The UAV assesses whether the current UGV path plan is inefficient or unsafe, and if so, provides an alternative path to the UGV. For robustness, a model predictive path integral (MPPI) optimization method is used to modify the waypoints when they are delivered to the UGV. The strategy is validated in simulation and experimentally.
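A minimal sketch of the particle-based reachability prediction described above, assuming a simple noisy unicycle model for the UGV (the dynamics, noise levels, and time step are assumptions, not the authors' model):

```python
# Illustrative sketch: the UAV forward-simulates virtual UGV particles under a
# noisy unicycle model to approximate the distribution of reachable states.
import numpy as np

def propagate_particles(x0, controls, dt=0.1, n_particles=200,
                        vel_std=0.05, yaw_std=0.02, rng=None):
    """x0: (3,) initial [x, y, yaw]; controls: list of (v, w) commands."""
    rng = rng or np.random.default_rng()
    particles = np.tile(np.asarray(x0, dtype=float), (n_particles, 1))
    history = [particles.copy()]
    for v, w in controls:
        v_s = v + rng.normal(0.0, vel_std, n_particles)   # perturbed linear velocity
        w_s = w + rng.normal(0.0, yaw_std, n_particles)   # perturbed yaw rate
        particles[:, 0] += v_s * np.cos(particles[:, 2]) * dt
        particles[:, 1] += v_s * np.sin(particles[:, 2]) * dt
        particles[:, 2] += w_s * dt
        history.append(particles.copy())
    return np.stack(history)   # (T+1, n_particles, 3): sampled reachable states
```

The UAV could then check these sampled states against the terrain and obstacle information it has discovered to decide whether the current UGV plan is unsafe or inefficient.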
|
|
11:45-12:00, Paper FrBT9.4 | |
IR2: Implicit Rendezvous for Robotic Exploration Teams under Sparse Intermittent Connectivity |
|
Tan, Derek Ming Siang | National University of Singapore |
Ma, Yixiao | National University of Singapore |
Liang, Jingsong | National University of Singapore |
Chng, Yi Cheng | Singapore Technologies Engineering Land Systems |
Cao, Yuhong | National University of Singapore |
Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Reinforcement Learning, Distributed Robot Systems
Abstract: Information sharing is critical in time-sensitive and realistic multi-robot exploration, especially for smaller robotic teams in large-scale environments where connectivity may be sparse and intermittent. Existing methods often overlook such communication constraints by assuming unrealistic global connectivity. Other works account for communication constraints (by maintaining close proximity or line of sight during information exchange), but are often inefficient. For instance, preplanned rendezvous approaches typically involve unnecessary detours resulting from poorly timed rendezvous, while pursuit-based approaches often result in short-sighted decisions due to their greedy nature. We present IR2, a deep reinforcement learning approach to information sharing for multi-robot exploration. Leveraging attention-based neural networks trained via reinforcement and curriculum learning, IR2 allows robots to effectively reason about the longer-term trade-offs between disconnecting for solo exploration and reconnecting for information sharing. In addition, we propose a hierarchical graph formulation to maintain a sparse yet informative graph, enabling our approach to scale to large-scale environments. We present simulation results in three large-scale Gazebo environments, which show that our approach yields 6.6 - 34.1% shorter exploration paths and significantly improved mapped area consistency among robots when compared to state-of-the-art baselines. Our simulation training and testing code is available at https://github.com/marmotlab/IR2.
|
|
FrBT10 |
Room 10 |
Computer Vision for Transportation II |
Regular session |
Chair: Valada, Abhinav | University of Freiburg |
|
11:00-11:15, Paper FrBT10.1 | |
PCT: Perspective Cue Training Framework for Multi-Camera BEV Segmentation |
|
Ishikawa, Haruya | Keio University |
Iida, Takumi | SenseTime Japan |
Konishi, Yoshinori | SenseTime Japan Ltd |
Aoki, Yoshimitsu | Keio University |
Keywords: Computer Vision for Transportation, Object Detection, Segmentation and Categorization, Mapping
Abstract: Generating annotations for bird's-eye-view (BEV) segmentation presents significant challenges due to the scenes' complexity and the high manual annotation cost. In this work, we address these challenges by leveraging the abundance of unlabeled data available. We propose the Perspective Cue Training (PCT) framework, a novel training framework that utilizes pseudo-labels generated from unlabeled perspective images using publicly available semantic segmentation models trained on large street-view datasets. PCT applies a perspective view task head to the image encoder shared with the BEV segmentation head, effectively utilizing the unlabeled data to be trained with the generated pseudo-labels. Since image encoders are present in nearly all camera-based BEV segmentation architectures, PCT is flexible and applicable to various existing BEV architectures. In this paper, we applied PCT for semi-supervised learning (SSL) and unsupervised domain adaptation (UDA). Additionally, we introduce strong input perturbation through Camera Dropout (CamDrop) and feature perturbation via BEV Feature Dropout (BFD), which are crucial for enhancing SSL capabilities using our teacher-student framework. Our comprehensive approach is simple and flexible but yields significant improvements over various baselines for SSL and UDA, achieving competitive performances even against the current state-of-the-art.
|
|
11:15-11:30, Paper FrBT10.2 | |
A Point-Based Approach to Efficient LiDAR Multi-Task Perception |
|
Lang, Christopher | University of Freiburg |
Braun, Alexander | Robert Bosch GmbH |
Schillingmann, Lars | Robert Bosch GmbH |
Valada, Abhinav | University of Freiburg |
Keywords: Computer Vision for Transportation, Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: Multi-task networks can potentially improve performance and computational efficiency compared to single-task networks, facilitating online deployment. However, current multi-task architectures in point cloud perception combine multiple task-specific point cloud representations, each requiring a separate feature encoder and making the network structures bulky and slow. We propose PAttFormer, an efficient multi-task architecture for joint semantic segmentation and object detection in point clouds that relies solely on a point-based representation. The network builds on transformer-based feature encoders using neighborhood attention and grid pooling, and a query-based detection decoder using a novel 3D deformable-attention detection head design. Unlike other LiDAR-based multi-task architectures, our proposed PAttFormer does not require separate feature encoders for multiple task-specific point cloud representations, resulting in a network that is 3x smaller and 1.4x faster while achieving competitive performance on the nuScenes and KITTI benchmarks for autonomous driving perception. Our extensive evaluations show substantial gains from multi-task learning, improving LiDAR semantic segmentation by +1.7% in mIoU and 3D object detection by +1.7% in mAP on the nuScenes benchmark compared to the single-task models.
|
|
11:30-11:45, Paper FrBT10.3 | |
Depth Completion Using Galerkin Attention |
|
Xu, Yinuo | Beijing University of Posts and Telecommunications |
Zhang, Xuesong | Beijing University of Posts and Telecommunications |
Keywords: Vision-Based Navigation, Computer Vision for Transportation, Computer Vision for Automation
Abstract: Current depth completion methods usually employ a pair of calibrated RGB and depth sensors to reconstruct a dense depth map. Although RGB (dense) and depth (sparse) measurements are collected from the same underlying scene, they reflect different physical characteristics, and thus it remains intricate how a devised RGB guidance scheme can effectively lead to faithful depth recovery. Different from existing 3D geometry representations, such as point clouds, voxels, or meshes, we propose to define 3D scenes as vector-valued functions mapping from the image plane to RGBD vectors. This scene function representation brings two benefits: 1) it allows for the adaptation of the Galerkin method to explore the nodal basis of the scene function space, and 2) it transforms the irregularly scattered (X,Y,Z) points in Euclidean space into a depth function defined over the regular grid of the image plane. We further leverage these two benefits within a deep neural network, characterized by an efficient Galerkin attention-based RGBD function embedding to effectively explore the interaction of color and depth information, and by the utilization of equivariant convolution operations on the RGBD feature map as efficient basic blocks. Experiments show that the proposed method achieves significant performance improvements over state-of-the-art methods. Code at https://github.com/ZXS-Labs/DCGA.
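A minimal sketch of a softmax-free, Galerkin-type attention block in the general form used in the operator-learning literature (layer-normalized keys and values, linear complexity in the number of grid points); this is illustrative only and not the paper's exact module:

```python
# Illustrative sketch: Galerkin-type attention over RGBD tokens on the image grid.
import torch
import torch.nn as nn

class GalerkinAttention(nn.Module):
    def __init__(self, dim, heads=4):   # assumes dim is divisible by heads
        super().__init__()
        self.heads, self.dk = heads, dim // heads
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.norm_k = nn.LayerNorm(self.dk)
        self.norm_v = nn.LayerNorm(self.dk)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):   # x: (B, N, dim) RGBD function tokens on the grid
        B, N, _ = x.shape
        def split(t):        # (B, N, dim) -> (B, heads, N, dk)
            return t.view(B, N, self.heads, self.dk).transpose(1, 2)
        q, k, v = split(self.q(x)), split(self.k(x)), split(self.v(x))
        k, v = self.norm_k(k), self.norm_v(v)
        ctx = torch.matmul(k.transpose(-2, -1), v) / N   # (B, heads, dk, dk)
        y = torch.matmul(q, ctx)                          # cost linear in N
        y = y.transpose(1, 2).reshape(B, N, -1)
        return self.out(y)
```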
|
|
11:45-12:00, Paper FrBT10.4 | |
BEV^2PR: BEV-Enhanced Visual Place Recognition with Structural Cues |
|
Ge, Fudong | Institute of Automation,Chinese Academy of Sciences |
Zhang, Yiwei | Institute of Automation, Chinese Academy of Sciences |
Shen, Shuhan | Institute of Automation, Chinese Academy of Sciences |
Hu, Weiming | University of Chinese Academy of Sciences |
Wang, Yue | Zhejiang University |
Gao, Jin | Institute of Automation Chinese Academy of Sciences |
Keywords: Vision-Based Navigation, Localization, Computer Vision for Transportation
Abstract: In this paper, we propose a new image-based visual place recognition (VPR) framework by exploiting the structural cues in bird’s-eye view (BEV) from a single monocular camera. The motivation arises from two key observations about place recognition methods based on both appearance and structure: 1) For methods relying on LiDAR sensors, the integration of LiDAR in robotic systems has led to increased expenses, while the alignment of data between different sensors is also a major challenge. 2) Other image-/camera-based methods, which integrate RGB images and their derived variants (e.g., pseudo depth images, pseudo 3D point clouds), exhibit several limitations, such as the failure to effectively exploit the explicit spatial relationships between different objects. To tackle the above issues, we design a new BEV-enhanced VPR framework, namely BEV^2PR, which generates a composite descriptor with both visual cues and spatial awareness based on a single camera. The key points are: 1) We use BEV features as an explicit source of structural knowledge in constructing global features. 2) The lower layers of the pre-trained backbone from BEV generation are shared by the visual and structural streams in VPR, facilitating the learning of fine-grained local features in the visual stream. 3) The complementary visual and structural features can jointly enhance VPR performance. Our BEV^2PR framework enables consistent performance improvements over several popular aggregation modules for RGB global features. The experiments on our collected VPR-NuScenes dataset demonstrate an absolute gain of 2.47% on Recall@1 for the strong Conv-AP baseline, achieving the best performance in our setting, and notably, an 18.06% gain on the hard set. The code and dataset will be available at https://github.com/FudongGe/BEV2PR.
|
|
FrBT11 |
Room 11 |
Legged Robots II |
Regular session |
Co-Chair: Zimmermann, Karel | Ceske Vysoke Uceni Technicke V Praze, FEL |
|
11:00-11:15, Paper FrBT11.1 | |
Accurate Power Consumption Estimation Method Makes Walking Robots Energy Efficient and Quiet |
|
Valsecchi, Giorgio | Robotic System Lab, ETH |
Vicari, Andrea | Scuola Superiore Sant'Anna |
Tischhauser, Fabian | ETH Zurich |
Garabini, Manolo | Università Di Pisa |
Hutter, Marco | ETH Zurich |
Keywords: Machine Learning for Robot Control, Model Learning for Control, Legged Robots
Abstract: Power consumption is a frequently overlooked aspect of robotics, especially in the context of legged robots. Nevertheless, improving the efficiency of walking robots is crucial to overcoming current limitations in runtime. This work proposes a novel method for precisely estimating actuator power consumption based on LSTM neural networks. The performance of this approach is benchmarked against currently employed models and validated on real hardware using certified instruments. The proposed method is integrated into the Isaac Gym framework and utilized to train a power-efficient policy. Instead of optimizing handcrafted cost functions, such as the commonly used torque-squared minimization, our approach for the first time trains RL policies that minimize the effective energy consumption. Hardware results demonstrate a reduction of approximately 25% in the robot's total power consumption, with a notable 50% decrease observed for the knee actuator. Additionally, the newly developed policy generates significantly smoother and quieter motions.
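A minimal sketch of the idea of learning an actuator power model and using it as a training signal (the architecture, input features, and reward weighting below are assumptions, not the authors' implementation):

```python
# Illustrative sketch: an LSTM maps sequences of per-actuator signals to
# instantaneous power; its prediction replaces a torque-squared penalty in RL.
import torch
import torch.nn as nn

class ActuatorPowerLSTM(nn.Module):
    def __init__(self, in_dim=3, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, seq):
        """seq: (B, T, 3) per-step [joint velocity, commanded torque, measured current]."""
        out, _ = self.lstm(seq)
        return self.head(out).squeeze(-1)   # (B, T) predicted power per step

def energy_penalty(power_model, seq, dt=0.005, weight=1e-3):
    """Reward term: negative predicted energy over the control window."""
    with torch.no_grad():
        power = power_model(seq).clamp(min=0.0)
    return -weight * power.sum(dim=-1) * dt
```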
|
|
11:15-11:30, Paper FrBT11.2 | |
Co-RaL: Complementary Radar-Leg Odometry with 4-DoF Optimization and Rolling Contact |
|
Jung, Sangwoo | Seoul National University |
Yang, Wooseong | Seoul National University |
Kim, Ayoung | Seoul National University |
Keywords: Range Sensing, Legged Robots, SLAM
Abstract: Robust and accurate localization in challenging environments is becoming crucial for SLAM. In this paper, we propose a unique sensor configuration for precise and robust odometry by integrating a chip radar and a legged robot. Specifically, we introduce a tightly coupled radar-leg odometry algorithm for complementary drift correction. Applying 4-DoF optimization and decoupled RANSAC to mmWave chip radar significantly enhances radar odometry beyond existing methods, especially in the z-direction, even when using a single radar. For the leg odometry, we employ rolling-contact-modeling-aided forward kinematics, accommodating scenarios with potential contact drift and radar failure. We evaluate our method by comparing it with other chip radar odometry algorithms on real-world datasets from diverse environments; the datasets will be released for the robotics community. https://github.com/SangwooJung98/Co-RaL-Dataset
|
|
11:30-11:45, Paper FrBT11.3 | |
Experience-Learning Inspired Two-Step Reward Method for Efficient Legged Locomotion Learning towards Natural and Robust Gaits |
|
Li, Yinghui | Shanghai Jiao Tong University |
Wu, Jinze | Shanghai Jiao Tong University |
Liu, Xin | Shanghai Jiao Tong University |
Guo, Weizhong | Shanghai Jiao Tong University |
Xue, Yufei | Shanghai Jiao Tong University |
Keywords: Reinforcement Learning, Legged Robots, Bioinspired Robot Learning
Abstract: Legged robots excel at navigating complex terrains, yet learning natural and robust motions in such environments remains challenging. Inspired by the experience-based stepwise learning process of animals, we propose a two-stage framework in which legged robots progressively learn naturally robust movements using a two-step reward method. Initially, robots learn fundamental gaits on flat terrain using gait rewards, generating valuable motion data. Subsequently, leveraging the learned motion experience, they adopt adversarial imitation learning to tackle challenging terrains with refined movements. Our method addresses the challenge of acquiring effective imitation data and facilitates the learning process under various gait parameters with ease. The effectiveness of this approach has been validated on both quadruped and hexapod robots, demonstrating naturally robust gaits in real-world applications.
|
|
11:45-12:00, Paper FrBT11.4 | |
CaT: Constraints As Terminations for Legged Locomotion Reinforcement Learning |
|
Chane-Sane, Elliot | LAAS, CNRS |
Leziart, Pierre-Alexandre | Laboratory for Analysis and Architecture of Systems (LAAS-CNRS), |
Flayols, Thomas | LAAS, CNRS |
Stasse, Olivier | LAAS, CNRS |
Soueres, Philippe | LAAS-CNRS |
Mansard, Nicolas | CNRS |
Keywords: Reinforcement Learning, Legged Robots
Abstract: Deep Reinforcement Learning (RL) has demonstrated impressive results in solving complex robotic tasks such as quadruped locomotion. Yet, current solvers fail to produce efficient policies respecting hard constraints. In this work, we advocate for integrating constraints into robot learning and present Constraints as Terminations (CaT), a novel constrained RL algorithm. Departing from classical constrained RL formulations, we reformulate constraints through stochastic terminations during policy learning: any violation of a constraint triggers a probability of terminating potential future rewards the RL agent could attain. We propose an algorithmic approach to this formulation, by minimally modifying widely used off-the-shelf RL algorithms in robot learning (such as Proximal Policy Optimization). Our approach leads to excellent constraint adherence without introducing undue complexity and computational overhead, thus mitigating barriers to broader adoption. Through empirical evaluation on the real quadruped robot Solo crossing challenging obstacles, we demonstrate that CaT provides a compelling solution for incorporating constraints into RL frameworks. Videos and code are available at https://constraints-as-terminations.github.io.
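One simple way to realize the constraints-as-terminations idea during rollout collection (illustrative only; the released implementation may discount future rewards analytically rather than sampling a done flag, and the normalization of violations is an assumption):

```python
# Illustrative sketch: constraint violations are mapped to a termination
# probability that stochastically cuts off future rewards during rollout.
import numpy as np

def termination_probability(violations, max_probs):
    """violations: dict of non-negative, [0, 1]-normalized constraint violations
    (0 = satisfied); max_probs: per-constraint cap on the termination probability."""
    p_keep = 1.0
    for name, v in violations.items():
        p_c = max_probs[name] * np.clip(v, 0.0, 1.0)
        p_keep *= (1.0 - p_c)
    return 1.0 - p_keep

def apply_cat(reward, done, violations, max_probs, rng=np.random):
    """Stochastically terminate the episode when constraints are violated, so the
    agent loses access to future rewards and learns to respect the constraints."""
    p_term = termination_probability(violations, max_probs)
    if rng.random() < p_term:
        done = True
    return reward, done
```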
|
|
FrBT12 |
Room 12 |
Semantic Scene Understanding III |
Regular session |
Chair: Beltrame, Giovanni | Ecole Polytechnique De Montreal |
|
11:00-11:15, Paper FrBT12.1 | |
QueSTMaps: Queryable Semantic Topological Maps for 3D Scene Understanding |
|
Mehan, Yash | International Institute of Information Technology |
Gupta, Kumaraditya | IIIT Hyderabad |
Jayanti, Rohit | Robotics Research Center, IIIT Hyderabad |
Govil, Anirudh | Robotics Research Center, International Institute of Information |
Garg, Sourav | University of Adelaide |
Krishna, Madhava | IIIT Hyderabad |
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, Recognition
Abstract: Robotic tasks such as planning and navigation require a hierarchical semantic understanding of a scene, which could include multiple floors and rooms. Current methods primarily focus on object segmentation for 3D scene understanding. However, such methods struggle to segment out topological regions like ``kitchen'' in the scene. In this work, we introduce a two-step pipeline to solve this problem. First, we extract a topological map, i.e., the floorplan of the indoor scene, using a novel multi-channel occupancy representation. Then, we generate CLIP-aligned features and semantic labels for every room instance based on the objects it contains, using a self-attention transformer. Our language-topology alignment supports natural language querying, e.g., a ``place to cook'' locates the ``kitchen''. We outperform the current state-of-the-art on room segmentation by ~20% and room classification by ~12%. Our detailed qualitative analysis and ablation studies provide insights into the problem of joint structural and semantic 3D scene understanding. Project Page: https://quest-maps.github.io
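As an illustration of the natural-language room querying enabled by CLIP-aligned room features (using the public openai/CLIP package; the feature shapes and helper names are assumptions, not the QueSTMaps code):

```python
# Illustrative sketch: rank room instances by cosine similarity between their
# CLIP-aligned embeddings and an open-vocabulary text query.
import torch
import clip   # https://github.com/openai/CLIP

def query_rooms(room_features, room_names, text_query, device="cpu"):
    """room_features: (R, D) CLIP-aligned embeddings, one per room instance,
    on the same device and dtype as the text features."""
    model, _ = clip.load("ViT-B/32", device=device)
    tokens = clip.tokenize([text_query]).to(device)
    with torch.no_grad():
        text_feat = model.encode_text(tokens).float()
    sims = torch.nn.functional.cosine_similarity(room_features, text_feat, dim=-1)
    best = int(sims.argmax())
    return room_names[best], float(sims[best])

# Example: query_rooms(feats, names, "a place to cook") would be expected to
# return the room labeled "kitchen".
```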
|
|
11:15-11:30, Paper FrBT12.2 | |
Commonsense Scene Graph-Based Target Localization for Object Search |
|
Ge, Wenqi | Southern University of Science and Technology |
Tang, Chao | Southern University of Science and Technology |
Zhang, Hong | SUSTech |
Keywords: Semantic Scene Understanding, Task Planning, Mapping
Abstract: Object search is a fundamental skill for household robots, yet the core problem lies in the robot's ability to locate the target object accurately. The dynamic nature of household environments, characterized by the arbitrary placement of daily objects by users, makes it challenging to perform target localization. To efficiently locate the target object, the robot needs to be equipped with knowledge at both the object and room level. However, existing approaches rely solely on one type of knowledge, leading to unsatisfactory object localization performance and, consequently, inefficient object search processes. To address this problem, we propose a commonsense scene graph-based target localization method, CSG-TL, to enhance target object search in household environments. Given a pre-built map with stationary items, the robot models room-level knowledge with a scene graph and incorporates object-level commonsense knowledge generated by a large language model (LLM). To demonstrate the superiority of CSG-TL for object localization, extensive experiments are performed on the real-world ScanNet dataset and the AI2Thor simulator. Moreover, we have extended CSG-TL to an object search framework, CSG-OS, validated in both simulated and real-world environments. Code and videos are available at https://sites.google.com/view/csg-os.
|
|
11:30-11:45, Paper FrBT12.3 | |
Language-Embedded Gaussian Splats (LEGS): Incrementally Building Room-Scale Representations with a Mobile Robot |
|
Yu, Justin | University of California Berkeley |
Hari, Kush | UC Berkeley |
Srinivas, Kishore | UC Berkeley |
El-Refai, Karim | University of California, Berkeley |
Rashid, Adam | UC Berkeley |
Kim, Chung Min | University of California, Berkeley |
Kerr, Justin | University of California, Berkeley |
Cheng, Richard | California Institute of Technology |
Irshad, Muhammad Zubair | Georgia Institute of Technology |
Balakrishna, Ashwin | Toyota Research Institute |
Kollar, Thomas | Toyota Research Institute |
Goldberg, Ken | UC Berkeley |
Keywords: Semantic Scene Understanding, Inventory Management, AI-Enabled Robotics
Abstract: Building semantic 3D maps is valuable for searching for objects of interest in offices, warehouses, stores, and homes. We present a mapping system that incrementally builds a Language-Embedded Gaussian Splat (LEGS): a detailed 3D scene representation that encodes both appearance and semantics in a unified representation. LEGS is trained online as a robot traverses its environment to enable localization of open-vocabulary object queries. We evaluate LEGS on 4 room-scale scenes where we query for objects in the scene to assess how LEGS can capture semantic meaning. We compare LEGS to LERF [1] and find that while both systems have comparable object query success rates, LEGS trains over 3.5x faster than LERF. Results suggest that a multi-camera setup and incremental bundle adjustment can boost visual reconstruction quality in constrained robot trajectories, and suggest LEGS can localize open-vocabulary and long-tail object queries with up to 66% accuracy. See project website at: berkeleyautomation.github.io/LEGS
|
|
11:45-12:00, Paper FrBT12.4 | |
SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving |
|
Li, Yiming | New York University |
Li, Sihang | New York University |
Liu, Xinhao | New York University |
Gong, Moonjun | New York University |
Li, Kenan | New York University |
Nuo, Chen | New York University |
Wang, Zijun | AI4CE |
Li, Zhiheng | New York University |
Jiang, Tao | Tsinghua |
Yu, Fisher | ETH Zürich |
Wang, Yue | USC |
Zhao, Hang | Tsinghua University |
Yu, Zhiding | NVIDIA |
Feng, Chen | New York University |
Keywords: Semantic Scene Understanding, AI-Enabled Robotics, Autonomous Agents
Abstract: Monocular scene understanding is a foundational component of autonomous systems. Within the spectrum of monocular perception topics, one crucial and useful task for holistic 3D scene understanding is semantic scene completion (SSC), which jointly completes semantic information and geometric details from RGB input. However, progress in SSC, particularly in large-scale street views, is hindered by the scarcity of high-quality datasets. To address this issue, we introduce SSCBench, a comprehensive benchmark that integrates scenes from widely used automotive datasets (e.g., KITTI-360, nuScenes, and Waymo). SSCBench follows an established setup and format in the community, facilitating the easy exploration of SSC methods in various street views. We benchmark models using monocular, trinocular, and point cloud input to assess the performance gap resulting from sensor coverage and modality. Moreover, we have unified semantic labels across diverse datasets to simplify cross-domain generalization testing. We commit to including more datasets and SSC models to drive further advancements in this field. Our data and code are available at https://github.com/ai4ce/SSCBench.
|
|
FrBT13 |
Room 13 |
Computer Vision for Automation IV |
Regular session |
Chair: Lim, Yongseob | DGIST |
Co-Chair: Popovic, Marija | TU Delft |
|
11:00-11:15, Paper FrBT13.1 | |
Exploiting Priors from 3D Diffusion Models for RGB-Based One-Shot View Planning |
|
Pan, Sicong | University of Bonn |
Jin, Liren | University of Bonn |
Huang, Xuying | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Popovic, Marija | TU Delft |
Bennewitz, Maren | University of Bonn |
Keywords: Computer Vision for Automation, Motion and Path Planning
Abstract: Object reconstruction is relevant for many autonomous robotic tasks that require interaction with the environment. A key challenge in such scenarios is planning view configurations to collect informative measurements for reconstructing an initially unknown object. One-shot view planning enables efficient data collection by predicting view configurations and planning the globally shortest path connecting all views at once. However, prior knowledge about the object is required to conduct one-shot view planning. In this work, we propose a novel one-shot view planning approach that utilizes the powerful 3D generation capabilities of diffusion models as priors. By incorporating such geometric priors into our pipeline, we achieve effective one-shot view planning starting with only a single RGB image of the object to be reconstructed. Our planning experiments in simulation and real-world setups indicate that our approach balances well between object reconstruction quality and movement cost.
|
|
11:15-11:30, Paper FrBT13.2 | |
Shape-Prior Free Space-Time Neural Radiance Field for 4D Semantic Reconstruction of Dynamic Scene from Sparse-View RGB Videos |
|
Biswas, Sandika | IIT Bombay |
Banerjee, Biplab | Indian Institute of Technology, Bombay |
Rezatofighi, Hamid | Monash University |
|
11:30-11:45, Paper FrBT13.3 | |
Hybrid Stereo Dense Depth Estimation for Robotics Tasks in Industrial Automation |
|
Singh, Suhani | Roboception GmbH |
Suppa, Michael | Roboception GmbH and University of Bremen |
Suarez, Raul | Universitat Politecnica De Catalunya (UPC) |
Rosell, Jan | Universitat Politècnica De Catalunya (UPC) |
Keywords: Computer Vision for Automation, Deep Learning for Visual Perception, Deep Learning Methods
Abstract: We introduce a simple yet effective approach for dense depth reconstruction that operates directly on raw disparity data, eliminating the need for additional disparity refinement stages. By leveraging disparity maps generated from conventional stereo methods, we train a U-Net-based model to directly map disparity to depth, bypassing complex feature engineering. Our method capitalizes on the robustness of traditional stereo matching techniques to varying scenes, focusing exclusively on dense depth reconstruction. This approach not only simplifies the training process but also significantly reduces the requirement for large-scale training datasets. Extensive evaluations demonstrate that our method surpasses classical stereo matching frameworks and state-of-the-art classical post-refinement techniques, achieving superior accuracy. Additionally, our approach offers competitive inference times, comparable to classical as well as end-to-end deep learning methods, making it highly suitable for real-time robotic applications.
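A minimal sketch of a U-Net that maps a raw disparity map directly to dense depth, in the spirit of the approach above (layer widths and depth are assumptions, not the authors' architecture):

```python
# Illustrative sketch: a small U-Net taking a raw, possibly sparse disparity
# map as its single input channel and regressing a dense depth map.
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))

class DisparityToDepthUNet(nn.Module):
    def __init__(self, base=32):
        super().__init__()
        self.enc1, self.enc2 = block(1, base), block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = block(base * 2, base)
        self.head = nn.Conv2d(base, 1, 1)

    def forward(self, disparity):   # (B, 1, H, W), H and W divisible by 4
        e1 = self.enc1(disparity)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)        # dense depth map (B, 1, H, W)
```

Training such a model against ground-truth depth, with disparity produced by a conventional stereo matcher, reflects the paper's point that no additional disparity refinement stage or complex feature engineering is required.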
|
|
11:45-12:00, Paper FrBT13.4 | |
Recovering Missed Detections in an Elevator Button Segmentation Task |
|
Verzic, Nicholas | University of Texas at Austin |
Chadaga, Abhinav | The University of Texas at Austin |
Hart, Justin | University of Texas at Austin |
Keywords: Computer Vision for Automation, Data Sets for Robotic Vision
Abstract: One obstacle that mobile service robots face is operating elevators. Reading elevator control panel buttons involves both an instance segmentation of buttons and labels and associating buttons with their respective metal labels in the elevator. Segmentation algorithms, however, can miss detections. This paper presents a segmentation model specifically designed to solve the problem of missed detections, which can be used to recover detections that the initial model misses. This work presents: 1) a new elevator button dataset containing 108 images sampled from the internet and 292 images captured in 24 buildings on the University of Texas at Austin campus and in the surrounding neighborhood, along with their segmentation boundaries and associated labels; 2) a vision pipeline based on Mask-RCNN for solving the initial image segmentation and labeling task; and 3) a novel method for identifying missed detections, using a Mask-RCNN network trained on expected button locations. Results show that the missed-detections model, specifically developed to recover buttons and labels missed by the initial pass, is accurate on up to 99.33% of its predicted missed features on a synthetic missed-detection dataset and on 97.14% of its predictions for features missed on a non-synthetic dataset. For a specifically trained "weak" initial detector at a standard IoU threshold of 0.5, the missed-detections model improves the average accuracy of successful button and label detections from 78.57% with the initial segmentation model alone to 87.08% with the missed-detections model enabled. The overall accuracy of the best-performing pipeline implementing the missed-detections model is 91.73% and 98.27% on the Internet subset and the Campus subset of our dataset, respectively.
|
|