| |
Last updated on October 7, 2024. This conference program is tentative and subject to change.
Technical Program for Wednesday October 16, 2024
|
WePI2T1 |
Room 1 |
Robotics and Automation II |
Teaser Session |
Chair: Sahoo, Soumya Ranjan | Indian Institute of Technology Kanpur |
|
09:00-10:00, Paper WePI2T1.1 | |
Robust Backstepping Controller with Adaptive Sliding Mode Observer for a Tilt-Augmented Quadrotor with Uncertainty Using SO(3) |
|
Seshasayanan, Sathyanarayanan | Indian Institute of Technology Kanpur |
Sahoo, Soumya Ranjan | Indian Institute of Technology Kanpur |
Keywords: Robust/Adaptive Control, Space Robotics and Automation
Abstract: The conventional quadrotor is incapable of controlling position and orientation independently. To mitigate this deficiency, we use a tilt-augmented quadrotor for greater mobility in a constrained environment. When the rotors tilt in a tilt-augmented quadrotor, the moment of inertia changes. These changes in the moment of inertia, together with external disturbances, introduce uncertainty terms into the model. In this paper, we design an adaptive super-twisting sliding mode observer that guarantees finite-time estimation of uncertain terms with unknown maximum bounds. With the help of this observer, a backstepping controller using SO(3) is developed to establish exponential convergence to the desired trajectory. The exponential convergence of the backstepping controller and the finite-time convergence of the observer are established using the Lyapunov approach. Experiments are performed to compare the performance of an existing controller and our proposed controller; the corresponding videos are available at https://www.youtube.com/watch?v=brTd5UYvciM.
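To make the super-twisting idea above concrete, here is a minimal scalar sketch of such an observer; the gains, the toy integrator dynamics, and the constant disturbance are placeholder assumptions, not the authors' implementation, which operates on SO(3) with adaptive gains.

```python
import numpy as np

def super_twisting_observer(y_meas, dt, k1=2.0, k2=1.5):
    """Minimal scalar super-twisting disturbance observer sketch.

    y_meas : noisy measurements of a state x with x_dot = u + d
             (here u = 0 for simplicity).
    k1, k2 : observer gains (placeholder values, not from the paper).
    Returns estimates of x and the lumped disturbance d.
    """
    x_hat, d_hat = 0.0, 0.0
    x_log, d_log = [], []
    for y in y_meas:
        e = y - x_hat                       # estimation error
        # Super-twisting correction: sqrt term plus integral of sign term
        x_hat += dt * (k1 * np.sqrt(abs(e)) * np.sign(e) + d_hat)
        d_hat += dt * (k2 * np.sign(e))     # finite-time disturbance estimate
        x_log.append(x_hat)
        d_log.append(d_hat)
    return np.array(x_log), np.array(d_log)

# Toy usage: estimate a constant disturbance acting on an integrator.
dt = 1e-3
t = np.arange(0, 2, dt)
x_true = 0.7 * t                            # x_dot = d with d = 0.7
x_est, d_est = super_twisting_observer(x_true, dt)
print(f"final disturbance estimate: {d_est[-1]:.2f} (true 0.7)")
```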
|
|
09:00-10:00, Paper WePI2T1.2 | |
Design, Prototype, and Performance Assessment of an Autonomous Manipulation System for Mars Sample Recovery Helicopter |
|
Kalantari, Arash | NASA JPL |
Brinkman, Alexander | Jet Propulsion Laboratory, California Institute of Technology |
Carpenter, Kalind | Jet Propulsion Laboratory |
Gildner, Matthew | Jet Propulsion Laboratory |
Jenkins, Justin | Jet Propulsion Laboratory |
Newill-Smith, David | NASA Jet Propulsion Laboratory |
Seiden, Jeffrey | NASA Jet Propulsion Laboratory |
Umali, Allen | NASA Jet Propulsion Laboratory |
McCormick, Ryan | University of Nebraska - Lincoln |
Keywords: Mobile Manipulation, Space Robotics and Automation, Aerial Systems: Applications
Abstract: This paper presents the design, prototype, and testing of a 150 g (current best estimate) manipulation system that enables the Mars Sample Recovery Helicopter (SRH) concept to autonomously pick up, stow, and drop off Returnable Sample Tube and Glove Assemblies (RGAs) on the surface of Mars next to the Sample Retrieval Lander (SRL). It consists of a 3-DOF planar Robotic Arm (RA), a novel 2-DOF Gripper with compliant fingers, and a Stow Mechanism. Within the planned Mars Sample Return (MSR) campaign, two SRHs would operate in parallel to retrieve and transfer a total of 10 RGAs (146 g each) to the SRL, as the backup to the Perseverance Rover. Once the SRH arrives at the target pickup location, the RA places the Gripper precisely over the RGA. The Gripper grabs and picks up RGAs using a linkage-based non-back-drivable mechanism and its compliant fingers. Subsequently, after dislodging rocks and pebbles, the RA is secured into the stow features by going through a specific sequence of joint trajectories. This ensures the RA and RGA are stable and secure during transit to the SRL while all Manipulation System actuators are powered off. The whole manipulation sequence is performed autonomously using feedback from a pair of stereo cameras and absolute encoders. Experimental evaluation of the Manipulation System has demonstrated its robustness and consistency in successful RGA pickup, stow, and drop-off.
|
|
09:00-10:00, Paper WePI2T1.3 | |
The Control Strategy for Vehicle Transfer Robots in RO/RO Terminal Environments |
|
Liu, Zhi | Beijing Institute of Technology |
Xu, Yongkang | Beijing Institute of Technology |
Zhang, Lin | Beijing Institute of Technology |
Wang, Shoukun | Beijing Institute of Technology |
Wang, Junzheng | Beijing Institute of Technology |
Keywords: Industrial Robots, Sensor-based Control, Motion Control
Abstract: In the labor-intensive Roll-On/Roll-Off (RO/RO) terminal environment, research on vehicle transfer robots offering mobility, stability, and reliability is receiving increasing attention. This paper presents a novel control framework for a straddle-type dual-body vehicle transfer robot. First, fine segmentation and processing of point clouds from different areas around the robot are performed, with perception strategies switched between areas based on event triggers. For target pose estimation, a traversal-based point cloud matrix fitting algorithm is designed. Additionally, for loading and unloading operations, a docking controller based on real-time target detection is developed to ensure minimal lateral and angular errors during target docking. Finally, the proposed control framework is validated through operations of the vehicle transfer robot in outdoor RO/RO terminal yards. Experimental results indicate that the average docking error remains within 3 cm, with a 6.5% reduction in docking time under the same conditions. The docking precision and stability of the vehicle transfer robot surpass traditional methods, demonstrating satisfactory performance.
|
|
09:00-10:00, Paper WePI2T1.4 | |
Visuo-Tactile Exploration of Unknown Rigid 3D Curvatures by Vision-Augmented Unified Force-Impedance Control |
|
Karacan, Kübra | Technical University of Munich |
Zhang, Anran | Technical University of Munich |
Sadeghian, Hamid | Technical University of Munich |
Wu, Fan | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Keywords: Energy and Environment-Aware Automation, Sensor-based Control, Factory Automation
Abstract: Despite recent advancements in torque-controlled tactile robots, integrating them into manufacturing settings remains challenging, particularly in complex environments. Simplifying robotic skill programming for non-experts is crucial for increasing robot deployment in manufacturing. This work proposes an innovative approach, Vision-Augmented Unified Force-Impedance Control (VA-UFIC), aimed at intuitive visuo-tactile exploration of unknown 3D curvatures. VA-UFIC stands out by seamlessly integrating vision and tactile data, enabling the exploration of diverse contact shapes in three dimensions, including point contacts, flat contacts with concave and convex curvatures, and scenarios involving contact loss. A pivotal component of our method is a robust online contact alignment monitoring system that considers tactile error, local surface curvature, and orientation, facilitating adaptive adjustments of robot stiffness and force regulation during exploration. We introduce virtual energy tanks within the control framework to ensure safety and stability, effectively addressing inherent safety concerns in visuo-tactile exploration. Evaluation using a Franka Emika research robot demonstrates the efficacy of VA-UFIC in exploring unknown 3D curvatures while adhering to arbitrarily defined force-motion policies. By seamlessly integrating vision and tactile sensing, VA-UFIC offers a promising avenue for intuitive exploration of complex environments, with potential applications spanning manufacturing, inspection, and beyond.
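The virtual energy tank mentioned above is a standard passivity device: the controller may only inject as much energy as the tank currently stores. A minimal sketch with placeholder energy budgets and a scalar force channel, not the paper's full force-impedance controller:

```python
class EnergyTank:
    """Minimal virtual energy tank sketch for passivity-based force control.

    E_init and E_min are placeholder budgets, not values from the paper.
    """
    def __init__(self, E_init=4.0, E_min=0.2):
        self.E = E_init            # stored tank energy [J]
        self.E_min = E_min         # lower bound that must never be crossed

    def gate(self, f_cmd, v, dt):
        """Scale the commanded force so the tank never depletes.

        f_cmd : commanded (possibly active) force [N]
        v     : end-effector velocity along the force direction [m/s]
        """
        p_out = f_cmd * v          # power the controller injects (>0 drains tank)
        if p_out > 0 and self.E - p_out * dt < self.E_min:
            # Not enough energy budget left: scale the command down.
            f_cmd *= max(self.E - self.E_min, 0.0) / (p_out * dt)
            p_out = f_cmd * v
        self.E -= max(p_out, 0.0) * dt   # only drain on active power
        return f_cmd

tank = EnergyTank()
dt = 1e-3
for _ in range(3000):
    f_safe = tank.gate(f_cmd=5.0, v=0.3, dt=dt)
print(f"remaining tank energy: {tank.E:.2f} J, gated force: {f_safe:.2f} N")
```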
|
|
09:00-10:00, Paper WePI2T1.5 | |
Time-Optimal TCP and Robot Base Placement for Pick-And-Place Tasks in Highly Constrained Environments |
|
Wachter, Alexander | TU Wien |
Kugi, Andreas | TU Wien |
Hartl-Nesic, Christian | TU Wien |
Keywords: Factory Automation, Industrial Robots, Manufacturing, Maintenance and Supply Chains
Abstract: This work proposes a highly parallelized optimization scheme to simultaneously optimize the robot base and tool center point (TCP) placement within a robotic work cell for a sequence of pick-and-place tasks. The placement is optimized for minimum cycle time by considering the scenario holistically, including point-to-point trajectory planning while respecting the kinodynamic constraints of the robot, collision avoidance in highly constrained environments, redundancy in grasp configurations and inverse kinematic solutions, and the cyclic constraint of the process. The proposed algorithm is applied to optimize the robot base and TCP placements in a spatially constrained packaging scenario, demonstrating a cycle time reduction of 41% compared to state-of-the-art approaches. The results are validated experimentally using a KUKA LBR iiwa with 7 degrees of freedom, where the TCP placement is realized using topology optimization and 3D printing.
|
|
09:00-10:00, Paper WePI2T1.6 | |
Evaluation of the Design of a Tool for the Automated Assembly of Preconfigured Wires |
|
Bartelt, Stefanie | Ruhr-Universität Bochum |
Kuhlenkötter, Bernd | Ruhr-Universität Bochum, Chair of Production Systems |
Keywords: Factory Automation, Mechanism Design, Grippers and Other End-Effectors
Abstract: The assembly of control cabinets is still dominated by manual processes. For reasons such as the shortage of skilled workers, there is a considerable need to automate the production steps. The wiring of prefabricated wires, a major process step in the manufacturing, has not yet been automated. Besides different sensor technologies, a reliable tool for assembly must be developed. This article discusses the challenges and criteria for the development of such a tool. The most important functions of the tool include the handling of wires with different lengths, cross-sections, and colors, as well as the consideration of close mounting positions. Based on a morphological box, a tool concept is derived and validated via tests on a demonstrator.
|
|
09:00-10:00, Paper WePI2T1.7 | |
Image to Patterning: Density-Specified Patterning of Micro-Structured Surfaces with a Mobile Robot |
|
Taylor, Annalisa T. | Northwestern University |
Landis, Malachi | Northwestern University |
Wang, Yaoke | Northwestern University |
Murphey, Todd | Northwestern University |
Guo, Ping | Northwestern University |
Keywords: Intelligent and Flexible Manufacturing, Control Architectures and Programming, Engineering for Robotic Systems
Abstract: Micro-structured surfaces possess useful properties such as friction modification, anti-fouling, and hydrophobicity. However, manufacturing these surfaces in an affordable, scalable, and efficient manner remains challenging. Standard coverage methods for surface patterning require precise placement of micro-scale features over meter-scale surfaces with expensive tooling for support. In this work, we address the scalability challenge in surface patterning by designing a mobile robot with a credit-card-sized footprint to generate micro-scale divots using a modulated tool tip. We provide a control architecture with a target feature density to specify surface coverage, eliminating the dependence on individual indentation locations. Our robot produces high-fidelity surface patterns and achieves automatic coverage of a surface from sophisticated target images. We validate an exemplary application of such micro-structured surfaces by controlling the friction coefficients at different locations according to the density of indentations. These results show the potential for compact robots to perform scalable manufacturing of functional surfaces, switching the focus from precision machines to small-footprint devices tasked with matching only the density of features.
|
|
09:00-10:00, Paper WePI2T1.8 | |
Modernising Delivery: A Low-Energy Tethered Package System Using Fixed-Wing Drones |
|
Ord, Samuel | RMIT University |
Marino, Matthew | RMIT University |
Wiley, Timothy Colin | RMIT University |
Keywords: Logistics, Product Design, Development and Prototyping, Dynamics
Abstract: Fixed-wing Uncrewed Aerial Vehicles (UAVs) can be used for remote package delivery missions by connecting a package to a UAV via a long tether. With a circular flight path at a calculated loiter radius, tether length, and orbiting velocity, a package can be lowered to the ground in a quasi-stationary manner. This method requires significantly less energy than the hybrid hovering aircraft currently used. UAV operating limitations pose a challenge for this method: in most environmental conditions it is difficult to ensure that the package stabilises so that it can be safely deployed without damage or risk to people or property. To improve tether and package stabilisation, we introduce a novel Mid-Tether Drag Device (MTDD) that improves the stability of the package and enables compliant delivery missions within regulatory frameworks. We present a mathematical model of the delivery with a UAV and MTDD. We verify the accuracy of our model with real-world flight tests in low-wind conditions without the MTDD, which have not previously been conducted at this scale in the literature. Further validation is presented with flight tests at a flight range using a UAV instrumented with a package deployment system, with both UAV and package tracking data acquisition. Our work enhances the ability of UAVs to conduct aerial package delivery.
|
|
09:00-10:00, Paper WePI2T1.9 | |
IDF-MFL: Infrastructure-Free and Drift-Free Magnetic Field Localization for Mobile Robot |
|
Shen, Hongming | Nanyang Technological University |
Wu, Zhenyu | Nanyang Technological University |
Wang, Wei | Nanyang Technological University |
Lyu, Qiyang | Nanyang Technological University |
Zhou, Huiqin | Nanyang Technological University |
Wang, Danwei | Nanyang Technological University |
Keywords: Logistics, Factory Automation, Localization
Abstract: In recent years, infrastructure-based localization methods have achieved significant progress thanks to their reliable and drift-free localization capability. However, pre-installed infrastructures suffer from inflexibility and high maintenance costs. This poses an interesting problem: how to develop a drift-free localization system without pre-installed infrastructure. In this paper, an infrastructure-free and drift-free localization system using ambient magnetic field (MF) information, named IDF-MFL, is proposed. IDF-MFL is infrastructure-free thanks to the high distinctiveness of the ambient MF information produced by inherent ferromagnetic objects in the environment, such as steel and reinforced concrete structures of buildings, and underground pipelines. The MF-based localization problem is defined as a stochastic optimization problem that accounts for the non-Gaussian heavy-tailed noise introduced by MF measurement outliers (caused by dynamic ferromagnetic objects), and an outlier-robust state estimation algorithm is derived to find the optimal distribution of the robot state that makes the expectation of the MF matching cost achieve its lower bound. The proposed method is evaluated in multiple scenarios, including high-fidelity simulations and real-world environments. The results demonstrate that the proposed method can achieve high-accuracy, reliable, real-time localization without any pre-installed infrastructure.
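To illustrate why the heavy-tailed noise model matters, compare the per-residual cost of a Gaussian model with that of a Student-t model: the quadratic Gaussian cost lets a single MF outlier dominate the estimate, whereas the t-cost grows only logarithmically. The sketch below shows just this effect; it is not the paper's full variational estimator, and sigma and nu are illustrative values.

```python
import numpy as np

def gaussian_cost(r, sigma=1.0):
    return 0.5 * (r / sigma) ** 2             # quadratic: one outlier dominates

def student_t_cost(r, sigma=1.0, nu=3.0):
    # Negative log-likelihood of a Student-t (up to constants): grows
    # logarithmically, so outliers are automatically down-weighted.
    return 0.5 * (nu + 1.0) * np.log1p((r / sigma) ** 2 / nu)

residuals = np.array([0.1, 0.5, 1.0, 5.0, 20.0])   # last two act as outliers
for r in residuals:
    print(f"r={r:5.1f}  gaussian={gaussian_cost(r):8.2f}  "
          f"student-t={student_t_cost(r):6.2f}")
```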
|
|
09:00-10:00, Paper WePI2T1.10 | |
Data-Driven Modeling of Cable Slab Dynamics Via Neural Networks |
|
Al-Rawashdeh, Yazan | Memorial University of Newfoundland |
Al Saaideh, Mohammad | Memorial University of Newfoundland |
Pumphrey, Michael Joseph | Memorial University of Newfoundland |
Alatawneh, Natheer | Cysca Technologies |
Al Janaideh, Mohammad | University of Guelph |
Keywords: Semiconductor Manufacturing, Actuation and Joint Mechanisms, Force Control
Abstract: A novel method for analyzing the dynamics and bend geometry of a cable slab via trained neural networks is introduced. The neural networks are trained on real-time visual feedback captured by a high-speed camera during cyclic motion, tracking the positions of multiple markers affixed to the cable slab through image processing techniques. Experimental parameters are systematically varied to ensure a diverse range of training patterns. Two distinct data-driven neural network models are developed: a coupled model and a decoupled model. These models accurately predict the two-dimensional positions of the markers, even during non-cyclic motion profiles. Subsequently, the marker positions are used as waypoints to generate a cubic spline curve with time-varying coefficients, approximating the spatiotemporal solution of the cable slab dynamics. Notably, this spline can be segmented into smaller sections tailored to specific research objectives. Experimental results validate the effectiveness of the proposed methodology.
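A minimal sketch of the waypoint-to-spline step described above, using SciPy's CubicSpline on hypothetical marker positions; in the paper the waypoints come from the trained networks, and re-fitting at each frame yields the time-varying coefficients.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical 2D marker positions along the cable slab, e.g. as predicted
# by the trained network at one time instant (values are made up).
s = np.linspace(0.0, 1.0, 6)                 # normalized arc length of markers
xy = np.array([[0.00, 0.00], [0.10, 0.03], [0.20, 0.10],
               [0.30, 0.21], [0.40, 0.35], [0.50, 0.52]])

# Fit one cubic spline per coordinate, using the markers as waypoints.
spline_x = CubicSpline(s, xy[:, 0])
spline_y = CubicSpline(s, xy[:, 1])

# Densely resample the curve, e.g. to approximate the slab's bend geometry
# or to extract a sub-segment of interest.
s_dense = np.linspace(0.0, 1.0, 200)
curve = np.stack([spline_x(s_dense), spline_y(s_dense)], axis=1)
print(curve.shape, curve[100])               # midpoint of the fitted curve
```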
|
|
09:00-10:00, Paper WePI2T1.11 | |
One Problem, One Solution: Unifying Robot Design and Cell Layout Optimization |
|
Baumgärtner, Jan | Karlsruhe Institute of Technology |
Puchta, Alexander | Karlsruhe Institute of Technology |
Fleischer, Jürgen | Karlsruhe Institute of Technology (KIT) |
Keywords: Industrial Robots, Mechanism Design, Motion Control
Abstract: The task-specific optimization of robotic systems has, since the inception of the field, been divided into the optimization of the robot and the optimization of the layout of its workstations. In this letter, we argue that these two problems are interdependent and should be treated as such. To this end, we present a unified problem formulation that enables the simultaneous optimization of both the robot kinematics and the workstation layout. We demonstrate the effectiveness of our approach by jointly optimizing a robotic milling system. To compare our approach to the state of the art, we also optimize the robot's kinematics and layout separately. The results show that our approach outperforms the state of the art and that simultaneous optimization leads to up to eight times better solutions.
|
|
09:00-10:00, Paper WePI2T1.12 | |
Soft Task Planning with Hierarchical Temporal Logic Specifications |
|
Chen, Ziyang | University of Science and Technology of China |
Zhou, Zhangli | University of Science and Technology of China |
Li, Lin | University of Science and Technology of China |
Kan, Zhen | University of Science and Technology of China |
Keywords: Formal Methods in Robotics and Automation, Task and Motion Planning
Abstract: This work exploits soft constraints in linear temporal logic (LTL) task planning to enhance the agent's capability to handle potentially conflicting or even infeasible tasks. Unlike most existing works, which focus on sticking to the original plan and finding a relaxed plan if the workspace does not permit, we augment the soft constraints to represent possible candidate sub-tasks that can be selected to fulfill the global task. Specifically, a hierarchical temporal logic specification is developed to represent LTL tasks with soft constraints and preferences. The hierarchical structure consists of an outer and an inner layer, where the outer layer uses co-safe LTL to specify the task-level specifications and the inner layer specifies the low-level task-related atomic propositions via soft constraints. To cope with the hierarchical temporal logic specification, a hierarchical iterative search (HIS) algorithm is developed, which incrementally searches feasible atomic propositions and automaton states, and returns a task plan with minimum cost. Rigorous analysis shows that HIS-based planning is feasible (i.e., the generated plan is applicable and satisfactory with respect to the task specification) and optimal (i.e., with minimum cost). Extensive simulation demonstrates the effectiveness of the proposed soft task planning approach.
|
|
09:00-10:00, Paper WePI2T1.13 | |
Efficiently Obtaining Reachset Conformance for the Formal Analysis of Robotic Contact Tasks |
|
Tang, Chencheng | Technical University of Munich |
Althoff, Matthias | Technische Universität München |
Keywords: Formal Methods in Robotics and Automation
Abstract: Formal verification of robotic tasks requires a simple yet conformant model of the used robot. We present the first work on generating reachset conformant models for robotic contact tasks considering hybrid (mixed continuous and discrete) dynamics. Reachset conformance requires that the set of reachable outputs of the abstract model encloses all previous measurements to transfer safety properties. Aiming for industrial applications, we describe the system using a simple hybrid automaton with linear dynamics. We inject non-determinism into the continuous dynamics and the discrete transitions, and we optimally identify all model parameters together with the non-determinism required to capture the recorded behaviors. Using two 3-DOF robots, we show that our approach can effectively generate models to capture uncertainties in system behavior and substantially reduce the required testing effort in industrial applications.
|
|
09:00-10:00, Paper WePI2T1.14 | |
Stick Roller: Precise In-Hand Stick Rolling with a Sample-Efficient Tactile Model |
|
Du, Yipai | Hong Kong University of Science and Technology |
Zhou, Pokuang | Purdue University |
Wang, Michael Yu | Mywang@gbu.edu.cn |
Lian, Wenzhao | Google X |
She, Yu | Purdue University |
Keywords: Force and Tactile Sensing, Contact Modeling, In-Hand Manipulation
Abstract: In-hand manipulation is challenging in robotics due to the intricate contact dynamics and high degrees of control freedom. Precise manipulation with high accuracy often requires tactile perception, which adds further complexity to the system. Despite the challenges in perception and control, stick rolling is an essential and practical motion primitive with many demanding industrial applications. This work aims to learn the high-resolution tactile dynamics of stick rolling. Specifically, we manipulate a small stick using the Allegro hand equipped with the Digit vision-based tactile sensor. The learning framework includes an action filtering module, a tactile perception module, and a learning-with-uncertainty module, all designed to operate in low-data regimes. With only 2.3% of the data and 5.7% of the model complexity of similar previous work, our learned contact dynamics model achieves better grasp stability, sub-millimeter precision, and promising zero-shot generalizability across novel objects. The proposed framework demonstrates the potential for precise in-hand manipulation with tactile feedback on real hardware. The project source code is available at: https://github.com/duyipai/Allegro Digit.
|
|
09:00-10:00, Paper WePI2T1.15 | |
Robotic Measurement for Electrical Property of Polymers by Force-Sensing Robot Toward Materials Lab-Automation |
|
Asano, Yuki | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Shiomi, Junichiro | University of Tokyo |
Keywords: Engineering for Robotic Systems, Software-Hardware Integration for Robot Systems, Force and Tactile Sensing
Abstract: Against the background of research on materials laboratory automation, this study aims to construct an automation system for measuring dielectric properties, an electrical property of materials. The automation system combines manipulation by a force-sensing robot with a control system for the measurement instrument. As challenges for the automation system, we worked on stabilizing the placement of a polymer film during insertion into the measurement instrument, implementing a communication control system between different platforms, and constructing a polymer film transfer environment. In measurement experiments using the automation system, it was confirmed that the dielectric properties could be measured as well as by a human.
|
|
09:00-10:00, Paper WePI2T1.16 | |
Lang2LTL-2: Grounding Spatiotemporal Navigation Commands Using Large Language and Vision-Language Models |
|
Liu, Jason Xinyu | Brown University |
Shah, Ankit | Brown University |
Konidaris, George | Brown University |
Tellex, Stefanie | Brown |
Paulius, David | Brown University |
Keywords: Formal Methods in Robotics and Automation, AI-Enabled Robotics, AI-Based Methods
Abstract: Grounding spatiotemporal navigation commands to structured task specifications enables autonomous robots to understand a broad range of natural language and solve long-horizon tasks with safety guarantees. Prior works mostly focus on grounding spatial or temporally extended language for robots. We propose Lang2LTL-2, a modular system that leverages pretrained large language and vision-language models and multimodal semantic information to ground spatiotemporal navigation commands in novel city-scaled environments without retraining. Lang2LTL-2 achieves 93.53% language grounding accuracy on a dataset of 21,780 semantically diverse natural language commands in unseen environments. We run an ablation study to validate the need for different modalities. We also show that a physical robot equipped with the same system without modification can execute 50 semantically diverse natural language commands in both indoor and outdoor environments.
|
|
09:00-10:00, Paper WePI2T1.17 | |
Scheduling of Robotic Cellular Manufacturing Systems with Timed Petri Nets and Reinforcement Learning |
|
Yao, ZhuTao | Nanjing University of Sci & Tech |
Huang, Bo | Nanjing University of Science and Technology |
Lv, Jianyong | Nanjing University of Science and Technology |
Lu, Xiaoyu | Nanjing University of Science and Technology |
Cui, MeiJi | Nanjing University of Science and Technology |
Yu, ShaoHua | Nanjing University of Science and Technology |
Keywords: Planning, Scheduling and Coordination, Petri Nets for Automation Control, Intelligent and Flexible Manufacturing
Abstract: Scheduling of robotic cellular manufacturing (RCM) systems belongs to the class of NP-hard problems. In this paper, we propose a Petri-net-based Q-learning scheduling method to efficiently schedule RCM systems. First, we use generalized and place-timed Petri nets to model RCM systems, since they can naturally and concisely model system structures such as conflict, concurrency, and synchronization. Then, we formulate a reinforcement learning method with a sparse Q-table to evaluate state-transition pairs of the net's reachability graph. It uses the negative transition firing time as the reward for an action selection and a large penalty for any encountered deadlock, and it balances state exploration and experience exploitation using a dynamic epsilon-greedy policy to update the state values with an accumulative reward. This provides a new way to efficiently schedule such systems based on Petri nets. Benchmark RCM systems are tested with the proposed method and with popular PN-based online dispatching rules, such as FIFO and SRPT. Simulation results indicate that the designed method can schedule RCM systems as quickly as the online dispatching rules while outperforming them in terms of result quality.
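A toy version of the described scheme, assuming a hand-made three-state reachability graph with illustrative firing times: negative firing time as reward, a sparse Q-table, and a dynamic epsilon-greedy schedule. The real method operates on the reachability graph of a place-timed Petri net.

```python
import random
from collections import defaultdict

# Toy reachability graph: state -> list of (transition, next_state, firing_time).
# Structure and times are illustrative, not a real Petri-net model.
graph = {
    "s0": [("t1", "s1", 2.0), ("t2", "s2", 5.0)],
    "s1": [("t3", "s3", 4.0)],
    "s2": [("t4", "s3", 3.0)],
    "s3": [],                                  # terminal marking
}

Q = defaultdict(float)          # sparse Q-table: (state, transition) -> value
alpha, gamma, episodes = 0.1, 1.0, 2000

for ep in range(episodes):
    eps = max(0.05, 1.0 - ep / episodes)       # dynamic epsilon-greedy schedule
    s = "s0"
    while graph[s]:
        acts = graph[s]
        if random.random() < eps:              # explore
            t, s_next, dur = random.choice(acts)
        else:                                  # exploit best known transition
            t, s_next, dur = max(acts, key=lambda a: Q[(s, a[0])])
        reward = -dur                          # negative firing time as reward
        best_next = max((Q[(s_next, a[0])] for a in graph[s_next]), default=0.0)
        Q[(s, t)] += alpha * (reward + gamma * best_next - Q[(s, t)])
        s = s_next

greedy = max(graph["s0"], key=lambda a: Q[("s0", a[0])])[0]
print("preferred first transition:", greedy)   # expect t1: makespan 6 vs 8
```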
|
|
WePI2T2 |
Room 2 |
Robotics in Healthcare I |
Teaser Session |
Chair: Alambeigi, Farshid | University of Texas at Austin |
Co-Chair: Tavakoli, Mahdi | University of Alberta |
|
09:00-10:00, Paper WePI2T2.1 | |
A Feasibility Study of a Soft, Low-Cost, 6-Axis Load Cell for Haptics |
|
Veliky, Madison | Vanderbilt University |
Johnston, Garrison | Vanderbilt University |
Yildiz, Ahmet | Vanderbilt University |
Simaan, Nabil | Vanderbilt University |
Keywords: Force and Tactile Sensing, Haptics and Haptic Interfaces, Surgical Robotics: Laparoscopy
Abstract: Haptic devices have been shown to be valuable in supplementing surgical training, especially when providing haptic feedback based on user performance metrics such as the wrench applied by the user on the tool. However, current 6-axis force/torque sensors are prohibitively expensive. This paper presents the design and calibration of a low-cost, six-axis force/torque sensor specially designed for laparoscopic haptic training applications. The proposed design uses Hall-effect sensors to measure the change in position of magnets embedded in a silicone layer that results from a wrench applied to the device. Preliminary experimental validation demonstrates that these sensors can achieve an accuracy of 0.45 N and 0.014 Nm, with a theoretical XY range of +/-50 N, a Z range of +/-20 N, and a torque range of +/-0.2 Nm. This study indicates that the proposed low-cost 6-axis force/torque sensor can accurately measure user force and provide useful feedback during laparoscopic training on a haptic device.
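If the magnet displacements respond approximately linearly, calibration reduces to fitting a linear map from Hall readings to a reference wrench. A least-squares sketch under that linearity assumption; the shapes, channel count, and synthetic data are illustrative, and the actual sensor may require a nonlinear model.

```python
import numpy as np

# Calibration sketch: solve for a linear map C such that wrench ≈ hall @ C.
rng = np.random.default_rng(0)
n_samples, n_hall = 500, 8
hall = rng.normal(size=(n_samples, n_hall))            # Hall-effect readings
C_true = rng.normal(size=(n_hall, 6))                  # unknown ground truth
wrench = hall @ C_true + 0.01 * rng.normal(size=(n_samples, 6))

# Least-squares calibration from paired (hall, reference wrench) data,
# as collected against a reference force/torque sensor.
C_hat, *_ = np.linalg.lstsq(hall, wrench, rcond=None)  # shape (n_hall, 6)

# Predict a wrench from a new reading.
w_pred = hall[0] @ C_hat
print("per-axis RMS error:",
      np.sqrt(((hall @ C_hat - wrench) ** 2).mean(axis=0)))
```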
|
|
09:00-10:00, Paper WePI2T2.2 | |
Dung Beetle Optimizer-Based High-Precision Localization for Magnetic-Controlled Capsule Robot |
|
Zeng, Zijin | Beihang University |
Wang, Fengwu | Beihang University |
Li, Chan | Beihang University |
Tan, Menglu | Beihang University |
Wang, Shengyuan | Beihang University |
Feng, Lin | Beihang University |
Keywords: Medical Robots and Systems
Abstract: As medical microrobots, magnetic-controlled capsule robots (MCRs) are pivotal in internal diagnostics and therapeutic interventions. Achieving high-precision localization of MCRs is essential for the successful execution of medical procedures. This paper introduces a novel Dung Beetle Optimizer (DBO)-based localization method for MCRs, demonstrating high localization accuracy and flexibility in static magnetic field environments and under the control of existing magnetic control systems. With the aid of an FPGA-based parallel measurement system, it can effectively eliminate measurement distortion. The average position and orientation errors reach 0.53 mm and 0.60° when performing 600 iterations per computation, and increasing the number of iterations reduces the errors further, which is superior to existing methods. Experimental validations underscore the method's robust performance and compatibility with existing magnetic control systems.
|
|
09:00-10:00, Paper WePI2T2.3 | |
3D Ultrasound Image Acquisition and Diagnostic Analysis of the Common Carotid Artery with a Portable Robotic Device |
|
Tan, Longyue | Institute of Automation, Chinese Academy of Sciences |
Deng, Zhaokun | Institute of Automation, Chinese Academy of Sciences |
Hao, Mingrui | Institute of Automation, Chinese Academy of Sciences |
Zhang, Pengcheng | Institute of Automation, Chinese Academy of Sciences |
Hou, Xilong | Centre for Artificial Intelligence and Robotics, Hong Kong Insti |
Chen, Chen | Institute of Automation, Chinese Academy of Sciences |
Gu, Xiaolin | Lingshu Medical Company |
Zhou, Xiao-Hu | Institute of Automation, Chinese Academy of Sciences |
Hou, Zeng-Guang | Chinese Academy of Science |
Wang, Shuangyi | Chinese Academy of Sciences |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics, Object Detection, Segmentation and Categorization
Abstract: Ultrasound (US) imaging of the carotid artery (CA) is a non-invasive diagnostic tool widely used in the medical field to assess the condition of the carotid artery and thereby predict the risk of cardiovascular and cerebrovascular diseases. However, implementing this method in primary healthcare can be challenging due to the requirement for professionally trained sonographers. With the adoption of US robotic devices, the probe pose can be acquired while scanning, offering the possibility of 3D reconstruction and providing analyses that do not depend on operator experience. This article introduces a method to semi-automatically acquire serialized ultrasound images of the common carotid artery (CCA). The method involves a specially designed robotic device built with a 6-RSU parallel mechanism, which is controlled according to the robot pose, force sensor data, and synchronous ultrasound images. To validate the acquired images, a method is proposed to segment the intima-media of the CCA and calculate the intima-media thickness (IMT), a key indicator for predicting cerebrovascular events. We then propose an algorithm to reconstruct the CCA into a 3D voxel volume with patient movement and the cardiac cycle compensated, from which a longitudinal-view US image of the CCA can be resliced. The methods are tested on human subjects, and the results indicate that the system and workflow can provide both quantitative and qualitative information on the CCA for further diagnosis.
|
|
09:00-10:00, Paper WePI2T2.4 | |
Robot-Enabled Machine Learning-Based Diagnosis of Gastric Cancer Polyps Using Partial Surface Tactile Imaging |
|
Kapuria, Siddhartha | University of Texas at Austin |
Bonyun, Jeff | University of Texas at Austin |
Kulkarni, Yash | The University of Texas at Austin |
Ikoma, Naruhiko | The University of Texas MD Anderson Cancer Center |
Chinchali, Sandeep | The University of Texas at Austin |
Alambeigi, Farshid | University of Texas at Austin |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics, Soft Sensors and Actuators
Abstract: In this paper, to collectively address the existing limitations on endoscopic diagnosis of Advanced Gastric Cancer (AGC) Tumors, for the first time, we propose (i) utilization and evaluation of our recently developed Vision-based Tactile Sensor (VTS), and (ii) a complementary Machine Learning (ML) algorithm for classifying tumors using their textural features. Leveraging a seven DoF robotic manipulator and unique custom-designed and additively-manufactured realistic AGC tumor phantoms, we demonstrated the advantages of automated data collection using the VTS addressing the problem of data scarcity and biases encountered in traditional ML-based approaches. Our synthetic-data-trained ML model was successfully evaluated and compared with traditional ML models utilizing various statistical metrics even under mixed morphological characteristics and partial sensor contact.
|
|
09:00-10:00, Paper WePI2T2.5 | |
Development of a Low Pressure Pouch Sensor for Force Measurement in Colonoscopy Procedures |
|
Borvorntanajanya, Korn | Imperial College London |
Ahmed, Jabed F | Department of Surgery & Cancer, Imperial College London |
Runciman, Mark | Imperial College London |
Franco, Enrico | Imperial College London |
Patel, Nisha | Imperial College London, Department of Surgery and Cancer |
Rodriguez y Baena, Ferdinando | Imperial College, London, UK |
Keywords: Medical Robots and Systems, Soft Sensors and Actuators, Force and Tactile Sensing
Abstract: This paper presents a novel pneumatic pouch sensor, designed to mount on a colonoscope, that can effectively estimate contact forces with the environment. The pouch sensor was designed to maximize the sensing range and was fabricated using a 2D laser welding technique from our previous work. A flow compensation (FC) algorithm was introduced to improve the accuracy of the sensor in the presence of static load. The proposed system can reliably measure external forces up to 9.5 N with high repeatability. The system allows discriminating between the levels of force typically associated with increasing patient discomfort in colonoscopy: low (0-4 N), medium (4-6 N), and high (>6 N). The system achieves over 80% accuracy compared to the ground truth under steady-state conditions (P ≤ 0.05) and maintains over 68% accuracy in dynamic scenarios.
|
|
09:00-10:00, Paper WePI2T2.6 | |
Thermal Ablation Therapy Control with Tissue Necrosis-Driven Temperature Feedback Enabled by Neural State Space Model with Extended Kalman Filter |
|
Murakami, Ryo | Worcester Polytechnic Institute |
Mori, Satoshi | NA |
Zhang, Haichong | Worcester Polytechnic Institute |
Keywords: Medical Robots and Systems, Model Learning for Control, Computer Vision for Medical Robotics
Abstract: Thermal ablation therapy is a major minimally invasive treatment. One of its challenges is that the targeted region and therapeutic progression are often invisible to clinicians, requiring feedback in the form of numerical information or imaging. Several emerging imaging modalities offer visualization of ablation-induced necrosis formation; however, relying solely on necrosis monitoring can result in tissue overheating and endanger patients. Some necrosis monitoring modalities are known for their temperature-sensing capabilities, but the principles on which they are based have several limitations, such as sensitivity to tissue motion and the environment. In this study, we propose a necrosis-progression-based temperature estimation technique as an added safety feature for avoiding overheating. This model-based method does not require additional sensing hardware. It is designed to work as an independent estimator or as a complementary estimation component alongside other thermometers for improved robustness. For this objective, a Neural State Space model is used to approximate the ablation therapy, whose theoretical models involve nonlinear partial differential equations. An Extended Kalman Filter is then designed based on the model. The simulation study shows that the estimation module robustly estimates the tissue temperature under several types of noise. The maximum estimation error observed before terminating ablation was around 1 K, and the desired safety feature was successfully demonstrated. The estimator is expected to be usable with a variety of necrosis monitoring modalities to guarantee more precise and safer treatment. More ambitiously, the architecture combining the Neural State Space model and the Extended Kalman Filter is generalizable to other medical/biological procedures involving nonlinear and patient/environment-specific physics, and even to procedures having no reliable theoretical models.
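The estimator structure described above is a standard EKF wrapped around a learned process model. A generic sketch, with a hand-written first-order heating law standing in for the Neural State Space model and all numeric values invented:

```python
import numpy as np

def ekf_step(x, P, u, z, f, h, F_jac, H_jac, Q, R):
    """One generic extended-Kalman-filter step (predict + update).

    f, h         : nonlinear process / measurement functions
    F_jac, H_jac : their Jacobians evaluated at the current estimate
    In the paper, the Neural State Space model would play the role of f
    (and could supply F_jac by automatic differentiation).
    """
    # Predict
    x_pred = f(x, u)
    F = F_jac(x, u)
    P_pred = F @ P @ F.T + Q
    # Update
    H = H_jac(x_pred)
    y = z - h(x_pred)                          # innovation
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Toy use: estimate tissue temperature from a noisy (e.g. necrosis-derived)
# scalar observation, with a first-order heating model. All values invented.
a, T_env, dt = 0.1, 37.0, 0.1
f = lambda x, u: x + dt * np.array([-a * (x[0] - T_env) + u])
h = lambda x: x
F_jac = lambda x, u: np.array([[1.0 - dt * a]])
H_jac = lambda x: np.array([[1.0]])
x, P = np.array([37.0]), np.eye(1)
for _ in range(100):
    z = np.array([50.0]) + np.random.normal(0, 0.5, 1)   # noisy observation
    x, P = ekf_step(x, P, u=1.3, z=z, f=f, h=h, F_jac=F_jac, H_jac=H_jac,
                    Q=1e-3 * np.eye(1), R=0.25 * np.eye(1))
print(f"estimated temperature: {x[0]:.1f} °C")
```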
|
|
09:00-10:00, Paper WePI2T2.7 | |
Towards Robotised Palpation for Cancer Detection through Online Tissue Viscoelastic Characterisation with a Collaborative Robotic Arm |
|
Beber, Luca | University of Trento |
Lamon, Edoardo | University of Trento |
Moretti, Giacomo | University of Trento |
Fontanelli, Daniele | University of Trento |
Saveriano, Matteo | University of Trento |
Palopoli, Luigi | University of Trento |
Keywords: Medical Robots and Systems, Force and Tactile Sensing, Compliance and Impedance Control
Abstract: This paper introduces a new method for online estimation of the end-effector penetration and the viscoelastic properties of a soft body during palpation exams performed with a collaborative robotic arm. The estimator is based on a dimensionality reduction method that simplifies the nonlinear Hunt-Crossley model. In addition, in our algorithm, the model parameters can be found without a force sensor, leveraging only the robotic arm controller data. To achieve online estimation, an extended Kalman filter is employed, which embeds the dynamic contact model. The algorithm is tested on various types of silicone, a material that resembles biological tissues, including samples with hard intrusions to simulate cancerous cells within softer tissue. The results indicate that this technique can accurately determine the model parameters and estimate the penetration of the end-effector into the soft body. These promising preliminary results demonstrate the potential for robots to serve as an effective tool for early-stage cancer diagnostics.
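For reference, the Hunt-Crossley model mentioned above expresses the contact force as f = k*x^n + lambda*x^n*x_dot for penetration x >= 0. A sketch with illustrative tissue-like parameters; in the paper, k, lambda, and n are the quantities estimated online without a force sensor.

```python
import numpy as np

def hunt_crossley_force(x, x_dot, k=800.0, lam=1.5, n=1.35):
    """Hunt-Crossley contact force: f = k*x**n + lam*x**n*x_dot for x >= 0.

    k, lam, n are illustrative soft-tissue-like parameters, not values
    from the paper.
    x     : penetration depth of the end-effector
    x_dot : penetration rate
    """
    if x <= 0.0:
        return 0.0                       # no contact, no force
    xn = x ** n
    return k * xn + lam * xn * x_dot

# Simulate a slow sinusoidal palpation and record the resulting force.
t = np.linspace(0, 2 * np.pi, 200)
x = 0.01 * (1 - np.cos(t)) / 2           # 0..1 cm penetration
x_dot = np.gradient(x, t)
f = np.array([hunt_crossley_force(xi, vi) for xi, vi in zip(x, x_dot)])
print(f"peak contact force: {f.max():.3f}")
```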
|
|
09:00-10:00, Paper WePI2T2.8 | |
Wirelessly Actuated Rotation-Free Magnetic Motor |
|
Harman, Umur Ulas | University of Sheffield |
Hafez, Ahmed | University of Sheffield |
Duffield, Cameron | The University of Sheffield |
Zhao, Zihan | The University of Sheffield |
Dixon, Luke | University of Sheffield |
Rus, Daniela | MIT |
Miyashita, Shuhei | University of Sheffield |
Keywords: Medical Robots and Systems, Mechanism Design, Micro/Nano Robots
Abstract: This paper addresses the challenge of actuating millimetre-sized motors that are wirelessly driven by external magnetic fields. Traditional approaches, relying on rotating magnetic fields, often inadvertently cause the entire robot (especially if it is small and lightweight) to rotate, instead of the specified shaft in the motor. To overcome this issue, our study introduces a novel mechanism that leverages symmetrically configured magnetic motors to cancel out the torques, thus preventing unwanted rotation of the robot. This is achieved by utilizing a magnetic field along a single axis to induce rotational movement. The design features two millimetre-sized rotating magnets that interact to achieve a 90-degree rotation, complemented by an external magnetic field that accomplishes the remaining 270 degrees, thus completing a full rotation. Furthermore, we demonstrate that applying a perpendicularly oriented magnetic field can reverse the motor's rotation direction. A proof-of-concept experiment employing this mechanism successfully actuated a gripper in a water tank while free-floating, showcasing its potential for enhancing robotic applications at the sub-centimeter scale, where the small net torque of a miniature motor is essential.
|
|
09:00-10:00, Paper WePI2T2.9 | |
Development of Five-Finger Hand-Type Robotic Forceps for Laparoscopic Gastrointestinal Surgery |
|
Wakamatsu, Hiroyuki | Yokohama National University |
Kobayashi, Ibuki | Yokohama National University |
Nagase, Yuya | Yokohama National University |
Kato, Ryu | Yokohama National University |
Mukai, Masaya | Tokai University |
Keywords: Medical Robots and Systems, Surgical Robotics: Laparoscopy
Abstract: Compared to open surgery, laparoscopic surgery, with its smaller incisions, is superior in terms of invasiveness, cosmetic outcome, and postoperative hospital stay. However, the forceps used in laparoscopic surgery have small end effectors, making it difficult to manipulate large organs with a single forceps and resulting in low surgical efficiency. To address these issues, research is being conducted on robotic hands referencing Hand-Assisted Laparoscopic Surgery (HALS). In previous research, the posture required when using the proposed instrument put great physical strain on the surgeon. Therefore, in this study, we aim to develop a surgical instrument that minimizes the physical strain on the operating surgeon while allowing insertion through small incisions and handling of large organs. We propose an instrument with a five-fingered robotic hand and a gun-grip-type input device. First, the five-fingered robotic hand achieves three actions (pinch, grasp, and exclusion) and, when folded, is insertable through an incision of less than 20 mm. Second, the surgeon holds the input device by grasping the grip with the ring and little fingers and operates the robotic hand with the thumb, index finger, and middle finger; the wrist of the instrument is manipulated with the arm by changing the angle of the input device's joint. Because the angles of the surgeon's fingers serve as the control input, manipulation is intuitive. We conducted a comparative experiment between the proposed robotic forceps and conventional forceps for sigmoidectomy, a typical surgery targeting the digestive organs. In this experiment, we verified whether the proposed robotic forceps would shorten the surgical time and investigated the postural burden when using them. The results of an experiment simulating sigmoidectomy showed that the proposed instrument could shorten the surgical time, likely because there is no longer a need to give instructions to, or coordinate with, an assistant. The task completion times for moving the instrument and grasping were also reduced. When grasping and pulling the large intestine to apply tension to the membrane tissue, the tissue must be grasped at multiple contact points. Since conventional forceps can only grasp one point, two forceps are needed, and the movement and grasping actions must be performed twice. In contrast, since the proposed robotic forceps has a large surface and multiple fingers, multi-point contact can be achieved with a single movement, which may have reduced the task completion time. In addition, we confirmed that the physical burden while using the robotic forceps was comparable to that of conventional forceps. This demonstrates the potential of robotic forceps to improve operative efficiency with little additional physical strain.
|
|
09:00-10:00, Paper WePI2T2.10 | |
A Novel Approach for Precise Tissue Tracking in Breast Lumpectomy |
|
Aliyari, Yeganeh | University of Alberta |
Afshar, Mehrnoosh | University of Alberta |
Wiebe, Ericka | University of Alberta |
Peiris, Lashan | University of Alberta |
Tavakoli, Mahdi | University of Alberta |
Keywords: Medical Robots and Systems, AI-Based Methods, Simulation and Animation
Abstract: Breast cancer is one of the most common cancers in the female population and can be treated surgically in the early stages with a lumpectomy technique. In breast lumpectomy procedures, accurately tracking tumours presents a critical challenge, worsened by various sources of anatomical deformation, including breathing, tissue cutting, and ultrasound probe pressure. To address this, we explore how a realistic tissue deformation simulator can enhance the precision of locating internal targets by accurately assessing the deformation applied to a preoperative model of the breast, considering the distinct mechanical properties of both the breast tissue and the tumour within it. Our method combines a generative variational autoencoder (GNN-VAE) with an updating method called ensemble smoother with multiple data assimilation (ES-MDA), creating a dynamic model that uses surface node data exclusively to update all nodes within the tissue. By leveraging a realistic tissue deformation simulator, our approach uses breast surface tracking to infer full tissue deformations. This makes the method compatible with various simulation tools and suitable for tissues with complex properties. The results show an accuracy of 0.014 cm on training data and 0.026 cm on testing data, demonstrating precision in tumour localization and significantly improving upon current methods. This innovation has the potential to enhance patient outcomes by making breast cancer surgery safer, less invasive, and more efficient.
|
|
09:00-10:00, Paper WePI2T2.11 | |
Portable Robot for Needle Insertion Assistance to Femoral Artery |
|
Cheng, Zhuoqi | University of Southern Denmark |
Mány, Bence | Neurescue ApS |
Jørgensen, Kasper Balsby | University of Southern Denmark |
An, Siheon | University |
Jensen, Marcus Leander | Neurescue ApS |
Thulstrup, Richard | Neurescue ApS |
Frost, Habib | Neurescue ApS |
Savarimuthu, Thiusius Rajeeth | University of Southern Denmark |
Huldt, Olof | Neurescue ApS |
Keywords: Medical Robots and Systems, Robotics and Automation in Life Sciences, Sensor-based Control
Abstract: Femoral artery access is a common and critical procedure for various cardiovascular interventions. Although it is a time-critical operation, accessing the Common Femoral Artery (CFA) typically requires expertise found in specialized medical settings. The need for specialized personnel, or for transport to equipped facilities, can lead to delays, potentially exacerbating patient outcomes. To address this challenge, we developed a portable and cost-effective robotic device that autonomously localizes the CFA and precisely positions a needle guide. Through the needle guide, a needle can be quickly and accurately inserted into the artery even by non-specialist physicians. Unlike the conventional B-mode ultrasound-guided procedure, the proposed robotic solution utilizes a Doppler transducer to detect the arterial location and a single M-mode transducer for depth measurement. A series of experiments validates the system's feasibility, achieving accuracy within 2 mm, processing within 1.5 min, and a 100% success rate. These results motivate further refinement of the system and support its evaluation in animal studies.
|
|
09:00-10:00, Paper WePI2T2.12 | |
The Design of a Sensorized Laryngoscope Training System for Pediatric Intubation |
|
Hou, Ningzhe | University of Oxford |
He, Liang | University of Oxford |
Albini, Alessandro | University of Oxford |
Halamek, Louis | Stanford University |
Maiolino, Perla | University of Oxford |
Keywords: Medical Robots and Systems, Health Care Management, Learning from Demonstration
Abstract: Intubation is essential for ventilating critically ill patients and involves precise maneuvering of a laryngoscope to place an endotracheal tube (ETT). However, training for this procedure is fraught with challenges. Traditional methods, relying on manikins or on a single sensing modality, fail to adequately convey important interaction information. This challenge is heightened in pediatric intubation due to anatomical differences that demand greater precision. Furthermore, integrating multiple sensing modalities into a laryngoscope without changing its size presents a significant design challenge, critical for maintaining realistic training scenarios. To overcome these obstacles, we developed a sensorized laryngoscope system equipped with a force-torque sensor, a 9-axis inertial measurement unit (IMU), and tactile sensors. The system, validated in a preliminary user study, provides online feedback on angles, forces, and grip strength through a feedback GUI. Adopting a learning-by-demonstration approach with both experts and novices, the initial validation confirmed the system's potential, paving the way for expanded trials with more participants.
|
|
09:00-10:00, Paper WePI2T2.13 | |
Enhancing Surgical Precision in Autonomous Robotic Incisions Via Physics-Based Tissue Cutting Simulation |
|
Ge, Jiawei | Johns Hopkins University |
Kilmer, Ethan | Johns Hopkins University |
Mady, Leila | Johns Hopkins University |
Opfermann, Justin | Johns Hopkins University |
Krieger, Axel | Johns Hopkins University |
Keywords: Medical Robots and Systems, Simulation and Animation, Surgical Robotics: Planning
Abstract: In soft tissue surgeries, such as tumor resections, achieving precision is of utmost importance. Surgeons conventionally achieve this precision through intraoperative adjustments to the cutting plan, responding to deformations from tool-tissue interactions. This study examines the integration of physics-based tissue cutting simulations into autonomous robotic surgery to preoperatively predict and compensate for such deformations, aiming to improve surgical precision and reduce the need for dynamic adjustments during autonomous surgeries. The study adopts a real-to-sim-to-real workflow. Initially, the Autonomous System for Tumor Resection (ASTR) was employed to evaluate its accuracy in performing preoperatively intended incisions along the irregular contours of porcine tongue pseudotumors. Following this, a finite element analysis-based simulation, utilizing the Simulation Open Framework Architecture (SOFA), was developed and tuned to accurately mimic these tissue and incision interactions. Insights gained from this simulation were applied to refine the robot's path planning, ensuring closer alignment of actual incisions with the initially intended surgical plan. The efficacy of this approach was validated by comparing surface incision precision on ex vivo porcine tongues, with the average absolute error reduced from 1.73 mm to 1.46 mm after applying simulation-driven path adjustments (p<0.001). Our method not only improved adherence to the intended cutting shapes and locations, with shape matching scores using Hu moments improving from 0.10 to 0.06 and centroid shifts decreasing from 2.09 mm to 1.33 mm, but also potentially reduced the likelihood of adverse oncologic outcomes by preventing clinically flagged excessively close margins of 2.2 mm. This feasibility study suggests that merging physics-based cutting simulations with autonomous robotic surgery could lead to more accurate incisions.
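The Hu-moment shape matching and centroid-shift metrics quoted above can be reproduced with OpenCV. A sketch on two synthetic stand-in incision masks; the ellipses are illustrative, not the study's data.

```python
import cv2
import numpy as np

def hu_distance(mask_a, mask_b):
    # matchShapes compares Hu-moment invariants; 0 means identical shape.
    return cv2.matchShapes(mask_a, mask_b, cv2.CONTOURS_MATCH_I1, 0.0)

def centroid(mask):
    m = cv2.moments(mask)
    return np.array([m["m10"] / m["m00"], m["m01"] / m["m00"]])

# Stand-in masks for a planned and an executed incision contour.
planned = np.zeros((200, 200), np.uint8)
executed = np.zeros((200, 200), np.uint8)
cv2.ellipse(planned, (100, 100), (60, 35), 0, 0, 360, 255, -1)
cv2.ellipse(executed, (104, 98), (58, 37), 5, 0, 360, 255, -1)  # slightly off

print(f"Hu-moment shape distance: {hu_distance(planned, executed):.4f}")
print(f"centroid shift: {np.linalg.norm(centroid(planned) - centroid(executed)):.2f} px")
```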
|
|
09:00-10:00, Paper WePI2T2.14 | |
Head-Mounted Hydraulic Needle Driver for Targeted Interventions in Neurosurgery |
|
Fang, Zhiwei | The Chinese University of Hong Kong |
Xu, Chao | The Chinese University of Hong Kong |
Gao, Huxin | National University of Singapore |
Chan, Tat-Ming | Prince of Wales Hospital |
Yuan, Wu | The Chinese University of Hong Kong |
Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore(NUS) |
Keywords: Medical Robots and Systems, Surgical Robotics: Steerable Catheters/Needles
Abstract: Needle interventions are crucial in neurosurgery, requiring high precision and stability. This paper presents a 5-DoF head-mounted hydraulic needle robot designed for accurate and targeted needle insertion and neuroimaging in the deep brain. The robot is kept compact and lightweight by utilizing a hydraulic pipe transmission to connect the needle driver and actuator. Syringe pistons serve as the actuator and executor, enabling synchronized motion, minimal hysteresis, and high-accuracy insertion. The hydraulic transmission system exhibits hysteresis of less than 0.8 mm, with bidirectional insertion accuracy of approximately 0.05 mm. The resulting needle driver features a compact structure measuring 48 mm × 25 mm × 9 mm, accompanied by a 70-mm-long needle guide. The needle driver is mainly 3D printed, while the hydraulic transmission ensures full compatibility with magnetic resonance imaging (MRI) by isolating all electromagnetic parts from the executor. This compact and lightweight robot-assisted needle intervention system significantly enhances the safety, accuracy, and effectiveness of deep-brain neuroimaging. The feasibility of precise positioning and insertion is further demonstrated by deploying an optical coherence tomography (OCT) microneedle in a rat brain.
|
|
09:00-10:00, Paper WePI2T2.15 | |
CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers |
|
Ranne, Alex | Imperial College London |
Kuang, Liming | Techinical University of Munich |
Velikova, Yordanka | TU Munich |
Navab, Nassir | TU Munich |
Rodriguez y Baena, Ferdinando | Imperial College, London, UK |
Keywords: Medical Robots and Systems, Object Detection, Segmentation and Categorization, Deep Learning Methods
Abstract: In minimally invasive endovascular procedures, contrast-enhanced angiography remains the most robust imaging technique. However, it comes at the expense of the patient's and clinician's health due to prolonged radiation exposure. As an alternative, interventional ultrasound has notable benefits such as being radiation-free, fast to deploy, and having a small footprint in the operating room. Yet, ultrasound is hard to interpret and highly prone to artifacts and noise. Additionally, interventional radiologists must undergo extensive training before they become qualified to diagnose and treat patients effectively, leading to a shortage of staff and a lack of open-source datasets. In this work, we seek to address both problems by introducing a self-supervised deep learning architecture that segments catheters in longitudinal ultrasound images without demanding any labeled data. The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism, and is capable of learning feature changes across time and space. To facilitate training, we used synthetic ultrasound data based on physics-driven catheter insertion simulations and translated the data into a unique CT-ultrasound common domain, CACTUSS, to improve segmentation performance. We generated ground truth segmentation masks by computing the optical flow between adjacent frames using FlowNet2 and thresholding it to obtain a binary map estimate. Finally, we validated our model on a test dataset consisting of unseen synthetic data and images collected from silicone aorta phantoms, demonstrating its potential for future application to clinical data.
|
|
09:00-10:00, Paper WePI2T2.16 | |
Seven Benefits of Using Series Elastic Actuators in the Design of an Affordable, Simple Controlled, and Functional Prosthetic Hand |
|
Koochakzadeh, Erfan | University of Tehran |
Kargar, Alireza | University of Tehran |
Sattari, Parsa | University of Tehran |
Ravanshid, Diba | University of Tehran |
Nasiri, Rezvan | University of Tehran |
Keywords: Prosthetics and Exoskeletons, Compliant Joints and Mechanisms, Grasping
Abstract: This paper highlights the benefits of using series elastic actuators (SEAs) in designing a cost-efficient, easily controlled, and functional prosthetic hand. The 3D-printed hand uses only two motors in an antagonistic configuration, transferring power to the fingers via pulleys, cables, and springs; i.e., the motors are in an SEA configuration with the load/fingers. In the designed underactuated prosthetic hand, the thumb is adjustable for various tasks, and the optimization of pulley diameters ensures synchronized finger movement during hand flexion and extension. Thanks to the SEA configuration of the motors and fingers, simple position control of the motors enables features such as hand position control, morphological grasp, force control, impedance control, slippage detection, safe interaction, and efficient grasp. An extensive set of experiments has been conducted to evaluate the designed prosthetic hand's performance. The experiments confirm the hand's satisfactory performance while also highlighting aspects of the design to be improved. To attain better position control and morphological grasp, minimizing cable-body and joint friction is recommended. A higher-resolution current/torque sensor is needed for precise force control and slippage detection. Finally, a motor brake system is required to achieve efficient grasping.
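The core SEA property exploited above is that the output force follows directly from the measured spring deflection, so force control reduces to motor position control. A minimal sketch with invented stiffness and pulley values, not the hand's actual parameters:

```python
k_s = 300.0                    # spring stiffness [N/m] (illustrative)

def tendon_force(theta_motor, x_finger, r_pulley=0.01):
    """Force = k_s * (motor-side cable travel - finger-side travel)."""
    deflection = theta_motor * r_pulley - x_finger
    return k_s * max(deflection, 0.0)          # a cable can only pull

def motor_setpoint_for_force(f_des, x_finger, r_pulley=0.01):
    """Invert the spring law: the motor position that yields f_des."""
    return (x_finger + f_des / k_s) / r_pulley

theta = motor_setpoint_for_force(f_des=4.0, x_finger=0.02)
print(f"command {theta:.2f} rad -> force {tendon_force(theta, 0.02):.2f} N")
```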
|
|
WePI2T3 |
Room 3 |
Social HRI I |
Teaser Session |
Co-Chair: Alami, Rachid | CNRS |
|
09:00-10:00, Paper WePI2T3.1 | |
Autonomous Storytelling for Social Robot with Human-Centered Reinforcement Learning |
|
Zhang, Lei | Ocean University of China |
Zheng, Chuanxiong | Ocean University of China |
Wang, Hui | Ocean University of China |
Gomez, Randy | Honda Research Institute Japan Co., Ltd |
Nichols, Eric | Honda Research Institute Japan |
Li, Guangliang | Ocean University of China |
Keywords: Human-Centered Automation, Human-Centered Robotics, Emotional Robotics
Abstract: Social robots are gradually integrating into humans' daily lives. Storytelling by social robots can offer users a different experience than text-only storytelling through non-verbal and emotional capabilities. However, as user needs and preferences for storytelling might change over time during long-term interaction with social robots, it is important for social robots to learn from social interactions with human users in real time. In this paper, we propose to allow our social robot Haru to learn personalized storytelling styles for different human users' emotional states via human-centered reinforcement learning, using reward signals delivered explicitly by the user through direct interaction. Results of our user study show that Haru can learn to adapt its storytelling style to detected human emotional states within a small number of interactions, and was perceived to have better storytelling performance, experience, and impact than a neutral baseline.
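The learning loop described above can be pictured as a small contextual-bandit update, with the detected emotion as context and the user's explicit reward as feedback. A minimal sketch under those assumptions; the emotion set, style set, and reward scale are hypothetical, not Haru's actual configuration:

    import random
    from collections import defaultdict

    EMOTIONS = ["happy", "sad", "neutral"]     # assumed context set
    STYLES = ["calm", "excited", "dramatic"]   # assumed storytelling styles
    EPSILON = 0.1                              # exploration rate (assumed)

    q = defaultdict(float)   # value estimate per (emotion, style) pair
    n = defaultdict(int)     # visit counts

    def pick_style(emotion: str) -> str:
        """Epsilon-greedy choice of storytelling style for the detected emotion."""
        if random.random() < EPSILON:
            return random.choice(STYLES)
        return max(STYLES, key=lambda s: q[(emotion, s)])

    def update(emotion: str, style: str, reward: float) -> None:
        """Incremental-mean update from the user's explicit reward."""
        key = (emotion, style)
        n[key] += 1
        q[key] += (reward - q[key]) / n[key]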
|
|
09:00-10:00, Paper WePI2T3.2 | |
Understanding Robot Minds: Leveraging Machine Teaching for Transparent Human-Robot Collaboration across Diverse Groups |
|
Jayaraman, Suresh Kumaar | Carnegie Mellon University |
Simmons, Reid | Carnegie Mellon University |
Steinfeld, Aaron | Carnegie Mellon University |
Admoni, Henny | Carnegie Mellon University |
Keywords: Human-Centered Robotics, Human-Robot Teaming, Modeling and Simulating Humans
Abstract: In this work, we aim to improve transparency and efficacy in human-robot collaboration by developing machine teaching algorithms suitable for groups with varied learning capabilities. While previous work focused on approaches tailored to teaching individuals, our method teaches teams with various compositions of diverse learners using team belief representations. We investigate various group teaching strategies, such as focusing on individual beliefs or the group's collective beliefs, and assess their impact on learning robot policies for different team compositions. Our findings reveal that team belief strategies produce less variation in learning duration and better accommodate diverse teams compared to individual belief strategies, suggesting their suitability in mixed-proficiency settings with limited resources. In contrast, individual belief strategies provide a more uniform knowledge level, which is particularly effective for homogeneously inexperienced groups. Our study indicates that the effectiveness of the teaching strategy is significantly influenced by team composition and learner proficiency, highlighting the importance of assessing learner proficiency in real time and adapting the teaching approach accordingly for optimal outcomes.
|
|
09:00-10:00, Paper WePI2T3.3 | |
Emotional Tandem Robots: How Different Robot Behaviors Affect Human Perception While Controlling a Mobile Robot |
|
Kaduk, Julian | University of Konstanz |
Weilbeer, Friederike | Universität Zu Lübeck |
Hamann, Heiko | University of Konstanz |
Keywords: Human-Robot Collaboration, Emotional Robotics, Multi-Robot Systems
Abstract: In human-robot interaction (HRI), we study how humans interact with robots, but also the effects of robot behavior on human perception and well-being. In particular, the influence on humans of tandem robots (one human-controlled and one autonomous robot) or even semi-autonomous multi-robot systems is not yet fully understood. Here, we focus on a leader-follower scenario and study how emotionally expressive motion patterns of a small, mobile follower robot affect the perception of a human operator controlling the leading robot. We examined three distinct emotional behaviors for the follower compared to a neutral condition: angry, happy, and sad. We asked participants to maneuver the leader robot along a set path while experiencing each follower behavior in a randomized order. We identified a significant shift in subjective attention toward the follower with emotionally expressive behaviors compared to the neutral condition. For example, the angry behavior significantly heightened participant stress levels and was considered the least preferred behavior. The happy behavior was the most preferred and was associated with increased excitement among participants. Integrating the proposed behaviors into robots can profoundly influence the human operator's perceived attention, emotional state, and overall experience. These insights are valuable for future HRI tandem robot designs.
|
|
09:00-10:00, Paper WePI2T3.4 | |
Good Things Come in Threes: The Impact of Robot Responsiveness on Workload and Trust in Multi-User Human-Robot Collaboration |
|
Semeraro, Francesco | The University of Manchester |
Carberry, Jon | BAE Systems |
Leadbetter, James Hugo | BAE Systems Ltd |
Cangelosi, Angelo | University of Manchester |
Keywords: Human-Robot Collaboration, Acceptability and Trust
Abstract: Human-robot collaboration has the potential to unlock new manufacturing paradigms by introducing a robotic architecture into a production chain that involves human workers. One innovative instantiation of this is the use of collaborative robots to enable two workers to act concurrently on the same manufacturing target without causing mutual disturbances. By doing so, the efficiency of the process would be preserved while reducing production times. This work designs a physical collaborative task that involves two users and one collaborative robot. The users act concurrently on the same target object, while the robot physically intervenes in the scene as a mediator by adjusting the position and orientation of the object to accommodate both users at the same time. Through this experimental setup, 78 apprentices and teachers of the BAE Systems Academy for Skills and Knowledge Centre were recruited to investigate the users' perception of the task workload and trust towards the robotic system. Specifically, they performed the same task under two experimental conditions, in which the robot responded to changes in the interaction in a reactive or timed way, respectively. The statistical analysis showed that a timed response of the robot was associated with lower perceived workload and higher predictability of the system.
|
|
09:00-10:00, Paper WePI2T3.5 | |
PhotoBot: Reference-Guided Interactive Photography Via Natural Language |
|
Limoyo, Oliver | University of Toronto |
Li, Jimmy | McGill University |
Rivkin, Dmitriy | None |
Kelly, Jonathan | University of Toronto |
Dudek, Gregory | McGill University |
Keywords: Natural Dialog for HRI, Computer Vision for Automation, Human Factors and Human-in-the-Loop
Abstract: We introduce PhotoBot, a framework for fully automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. We propose to communicate photography suggestions to the user via reference images that are selected from a curated gallery. We leverage a visual language model (VLM) and an object detector to characterize the reference images via textual descriptions and then use a large language model (LLM) to retrieve relevant reference images based on a user's language query through text-based reasoning. To establish correspondences between the reference image and the observed scene, we exploit pre-trained features from a vision transformer capable of capturing semantic similarity across marked appearance variations. Using these features, we compute suggested pose adjustments for an RGB-D camera by solving a perspective-n-point (PnP) problem. We demonstrate our approach using a manipulator equipped with a wrist camera. Our user studies show that photos taken by PhotoBot are often more aesthetically pleasing than those taken by users themselves, as measured by human feedback. We also show that PhotoBot can generalize to other reference sources such as paintings.
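The pose-adjustment step above reduces to a standard perspective-n-point solve once 2D-3D correspondences are available. A minimal sketch using OpenCV, where the correspondence arrays and camera intrinsics are assumed inputs; the paper's own feature-matching pipeline is not reproduced here:

    import cv2
    import numpy as np

    def suggest_pose(points_3d: np.ndarray, points_2d: np.ndarray,
                     K: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        """Solve PnP for the camera pose aligning scene points with matched pixels.

        points_3d: (N, 3) scene points from the RGB-D camera, N >= 4.
        points_2d: (N, 2) matched pixel coordinates in the reference image.
        K:         (3, 3) camera intrinsic matrix.
        """
        ok, rvec, tvec = cv2.solvePnP(
            points_3d.astype(np.float64),
            points_2d.astype(np.float64),
            K.astype(np.float64),
            distCoeffs=None,                  # assume undistorted images
            flags=cv2.SOLVEPNP_ITERATIVE,
        )
        assert ok, "PnP failed; check the correspondences"
        return rvec, tvec  # rotation (Rodrigues vector) and translation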
|
|
09:00-10:00, Paper WePI2T3.6 | |
Multimodal Coherent Explanation Generation of Robot Failures |
|
Pramanick, Pradip | University of Naples Federico II |
Rossi, Silvia | Universita' Di Napoli Federico II |
Keywords: Natural Dialog for HRI, Deep Learning Methods
Abstract: The explainability of a robot's actions is crucial to its acceptance in social spaces. Explaining why a robot fails to complete a given task is particularly important for non-expert users to be aware of the robot's capabilities and limitations. So far, research on explaining robot failures has only considered generating textual explanations, even though several studies have shown the benefits of multimodal ones. However, a simple combination of multiple modalities may lead to semantic incoherence between the information across different modalities - a problem that is not well studied. An incoherent multimodal explanation can be difficult to understand, and it may even become inconsistent with what the robot and the human observe and how they reason over those observations. Such inconsistencies may lead to wrong conclusions about the robot's capabilities. In this paper, we introduce an approach to generate coherent multimodal explanations by checking the logical coherence of explanations from different modalities, followed by refinements as required. We propose a classification approach for coherence assessment, where we evaluate whether one explanation logically follows another. Our experiments suggest that fine-tuning a neural network that was pre-trained to recognize textual entailment performs well for coherence assessment of multimodal explanations. Code & data: https://pradippramanick.github.io/coherent-explain/.
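The entailment-based coherence check can be prototyped with an off-the-shelf natural language inference model. A minimal sketch using Hugging Face transformers and the public roberta-large-mnli checkpoint; the paper's fine-tuned model and decision rule are not reproduced, so treat this as an assumption-laden stand-in:

    from transformers import pipeline

    # Off-the-shelf NLI classifier (labels: CONTRADICTION / NEUTRAL / ENTAILMENT).
    nli = pipeline("text-classification", model="roberta-large-mnli")

    def coherent(premise: str, hypothesis: str) -> bool:
        """Return True if one modality's explanation logically follows the other."""
        result = nli([{"text": premise, "text_pair": hypothesis}])[0]
        return result["label"] == "ENTAILMENT"

    # Example: a textual explanation versus a caption of the visual explanation.
    print(coherent("The robot could not grasp the cup because it was too far away.",
                   "The cup is out of the robot's reach."))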
|
|
09:00-10:00, Paper WePI2T3.7 | |
Where and When Should the Teleoperated Avatar Look: Gaze Instruction Dataset for Enhanced Teleoperated Avatar Communication |
|
Hoshimure, Kenya | Osaka University |
Baba, Jun | CyberAgent, Inc |
Nakanishi, Junya | Osaka Univ |
Yoshikawa, Yuichiro | Osaka University |
Ishiguro, Hiroshi | Osaka University |
Keywords: Natural Dialog for HRI, Social HRI, Telerobotics and Teleoperation
Abstract: Effective teleoperated avatar communication requires expressing social behaviors. Gaze behavior is one of the crucial social behaviors and includes reflexive reactions to the avatar's surroundings and intentional responses to the operator's speech and actions. Teleoperated avatars must have their gaze behavior controlled according to situational changes in both the avatar's and operator's contexts. However, it is not clear how to adjust the avatar's gaze in response to changes in both situations. In this paper, we collect a dataset of gazing positions that the avatar is instructed to face, taking into account both avatar and operator situations, and annotation labels that represent both situations in detail. We then exploratorily analyze the ratio of gazing positions per situation through dynamic area-of-interest (AOI) analysis. Our analysis provides insights into determining the gaze behavior of teleoperated avatars.
|
|
09:00-10:00, Paper WePI2T3.8 | |
Empathetic Response Generation System: Enhancing Photo Reminiscence Chatbot with Emotional Context Analysis |
|
Herrera Ruiz, Alberto | National Taiwan University |
Qian, Xiaobei | National Taiwan University |
Fu, Li-Chen | National Taiwan University |
Keywords: Natural Dialog for HRI, Robot Companions, Cognitive Modeling
Abstract: Dementia affects 50 million people worldwide, underscoring the urgent need for effective interventions to enhance their well-being. While reminiscence intervention shows promise, its implementation is hindered by limited human resources, making machine-aided systems a viable automated solution for seamless photo-reminiscence sessions. In this paper, we introduce an empathetic response generation system specifically designed to enhance a question-only photo-reminiscence chatbot, with a focus on improving emotional context understanding and enhancing conversation engagement. We leverage Transformers to encode dialogue history, infer emotional states from user responses, and extract named entities. By combining template-based utterances with a retrieval chatbot, our system generates relevant and empathetic responses to user replies. Our system's effectiveness is validated through human evaluations using a Likert-like scale to assess engagement levels. The results demonstrate that our approach surpasses both the question-only system and other models from existing works, including retrieval and generated models. This highlights our system’s potential to enhance interactions and engagement, advancing technology-driven interventions for dementia that improve well-being and quality of life.
|
|
09:00-10:00, Paper WePI2T3.9 | |
OmniRace: 6D Hand Pose Estimation for Intuitive Guidance of Racing Drone |
|
Serpiva, Valerii | Skolkovo Institute of Science and Technology |
Fedoseev, Aleksey | Skolkovo Institute of Science and Technology |
Karaf, Sausar | Skolkovo Institute of Science and Technology |
Abdulkarim, Ali Alridha | Skolkovo Institute of Science and Technology |
Tsetserukou, Dzmitry | Skolkovo Institute of Science and Technology |
Keywords: Natural Dialog for HRI, Aerial Systems: Applications, Human Detection and Tracking
Abstract: This paper presents the OmniRace approach to controlling a racing drone with 6-degree-of-freedom (DoF) hand pose estimation and gesture recognition. To our knowledge, this is the first technology enabling low-level control of high-speed drones through gestures. OmniRace employs a gesture interface based on computer vision and a deep neural network to estimate 6-DoF hand pose. The advanced machine learning algorithm robustly interprets human gestures, allowing users to control drone motion intuitively. Real-time control tests validate the system's effectiveness and its potential to revolutionize drone racing and other applications. Experimental results in a simulation environment revealed that OmniRace allows users to complete the UAV race track significantly faster (by 25.1%) and to decrease the length of the test drone path (from 102.9 to 83.7 m). Users preferred the gesture interface for attractiveness (1.57 UEQ score), hedonic quality (1.56 UEQ score), and lower perceived temporal demand (32.0 score in NASA-TLX), while noting the high efficiency (0.75 UEQ score) and low physical demand (19.0 score in NASA-TLX) of the baseline remote controller. The deep neural network attains an average accuracy of 99.75% when applied to both normalized and raw datasets. OmniRace can potentially change the way humans interact with and navigate racing drones in dynamic and complex environments. The source code is available at https://github.com/SerValera/OmniRace.git.
|
|
09:00-10:00, Paper WePI2T3.10 | |
Investigating Behavioral and Cognitive Changes Induced by Autonomous Delivery Robots in Incidentally Copresent Persons |
|
Kim, Nayoung | Korea Institute of Science and Technology (KIST) |
Kwak, Sonya Sona | Korea Institute of Science and Technology (KIST) |
Keywords: Social HRI, Physical Human-Robot Interaction
Abstract: Autonomous delivery robots (ADRs) encounter incidentally copresent persons (InCoPs) during their delivery journeys. Despite the potential for ADRs' behavior to influence the behavior and cognition of InCoPs, there is limited research on the interaction between ADRs and InCoPs. Therefore, in this study, we conducted a within-participants experiment (N=30) with a 3 (confederate types: humans vs. high-anthropomorphism robots vs. low-anthropomorphism robots) x 2 (jaywalking status: jaywalking vs. not jaywalking) design to investigate the behavioral and cognitive changes induced in InCoPs by the social influence of ADRs. During the experiment, participants watched a video depicting interactions between ADRs and InCoPs at a crosswalk. Each participant was immersed in the video as an InCoP, instructed to make jaywalking decisions, and subsequently completed questionnaires. Results indicated that, behaviorally, participants displayed similar levels of conformity towards jaywalking behaviors across both the human and robot confederates. Cognitively, there were significant differences in morality based on confederate type. Additionally, robots that refrained from jaywalking received more positive ratings in terms of morality and intention to use. This study confirms that ADRs have the capacity to induce conformity similar to humans and that the ethical behavior of ADRs can positively influence InCoPs' impressions of and intention to use ADRs.
|
|
09:00-10:00, Paper WePI2T3.11 | |
Are Large Language Models Aligned with People's Social Intuitions for Human–Robot Interactions? |
|
Wachowiak, Lennart | King's College London |
Coles, Andrew | King's College London |
Celiktutan, Oya | King's College London |
Canal, Gerard | King's College London |
Keywords: Social HRI, AI-Enabled Robotics, Human-Centered Robotics
Abstract: Large language models (LLMs) are increasingly used in robotics, especially for high-level action planning. Meanwhile, many robotics applications involve human supervisors or collaborators. Hence, it is crucial for LLMs to generate socially acceptable actions that align with people's preferences and values. In this work, we test whether LLMs capture people's intuitions about behavior judgments and communication preferences in human–robot interaction (HRI) scenarios. For evaluation, we reproduce three HRI user studies, comparing the output of LLMs with that of real participants. We find that GPT-4 strongly outperforms other models, generating answers that correlate strongly with users' answers in two studies — the first study dealing with selecting the most appropriate communicative act for a robot in various situations (r = 0.82), and the second with judging the desirability, intentionality, and surprisingness of behavior (r = 0.83). However, for the last study, testing whether people judge the behavior of robots and humans differently, no model achieves strong correlations. Moreover, we show that vision models fail to capture the essence of video stimuli and that LLMs tend to rate different communicative acts and behavior desirability higher than people.
|
|
09:00-10:00, Paper WePI2T3.12 | |
Belief-Aided Navigation Using Bayesian Reinforcement Learning for Avoiding Humans in Blind Spots |
|
Kim, Jinyeob | Kyung Hee University |
Kwak, Daewon | Kyung Hee University |
Rim, Hyunwoo | Kyung Hee University |
Kim, Donghan | Kyung Hee University |
Keywords: Social HRI, Reinforcement Learning, Autonomous Vehicle Navigation
Abstract: Recent research on mobile robot navigation has focused on socially aware navigation in crowded environments. However, existing methods do not adequately account for human-robot interactions and demand accurate location information from omnidirectional sensors, rendering them unsuitable for practical applications. In response to this need, this study introduces a novel algorithm, BNBRL+, predicated on the partially observable Markov decision process framework to assess risks in unobservable areas and formulate movement strategies under uncertainty. BNBRL+ consolidates belief algorithms with Bayesian neural networks to probabilistically infer beliefs based on the positional data of humans. It further integrates the interactions between the robot, humans, and inferred beliefs to determine the navigation paths, thereby facilitating socially aware navigation. Through experiments in various risk-laden scenarios, this study validates the effectiveness of BNBRL+ in navigating crowded environments with blind spots. The model's ability to navigate effectively in spaces with limited visibility and avoid obstacles dynamically can significantly improve the safety and reliability of autonomous vehicles. The complete source code can be accessed here: https://github.com/JinnnK/BNBRLplus.
|
|
09:00-10:00, Paper WePI2T3.13 | |
A Service Robot in the Wild: Analysis of Users Intentions, Robot Behaviors, and Their Impact on the Interaction |
|
Arreghini, Simone | IDSIA USI-SUPSI |
Abbate, Gabriele | Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA) |
Giusti, Alessandro | IDSIA USI-SUPSI |
Paolillo, Antonio | IDSIA USI-SUPSI |
Keywords: Social HRI, Intention Recognition
Abstract: We consider a service robot that offers chocolate treats to people passing in its proximity: it has the capability of predicting in advance a person's intention to interact, and of actuating an "offering" gesture, subtly extending the tray of chocolates towards a given target. We run the system for more than 5 hours across 3 days and two different crowded public locations; the system implements three possible behaviors that are randomly toggled every few minutes: passive (i.e., never performing the offering gesture), or active, triggered by either a naive distance-based rule or a smart approach that relies on various behavioral cues of the user. We collect a real-world dataset that includes information on 1777 users with several spontaneous human-robot interactions and study the influence of robot actions on people's behavior. Our comprehensive analysis suggests that users are more prone to engage with the robot when it proactively starts the interaction. We release the dataset and provide insights to make our work reproducible for the community. We also report qualitative observations collected during the acquisition campaign and identify future challenges and research directions in the domain of social human-robot interaction.
|
|
09:00-10:00, Paper WePI2T3.14 | |
Context-Aware Conversation Adaptation for Human-Robot Interaction |
|
Su, Zhidong | Oklahoma State University |
Sheng, Weihua | Oklahoma State University |
Keywords: Social HRI, Reinforcement Learning, Robot Companions
Abstract: Existing conversational robots are mostly reactive in that interactions are usually initiated by the users. With knowledge of the environmental context, such as people's daily activities, robots can be more intelligent and proactive. In this paper, we propose a context-aware conversation adaptation system (CACAS) for human-robot interaction (HRI). First, a context recognition module and a language processing module are developed to obtain the context information, user intent, and slots, which become part of the state. Second, a reinforcement learning algorithm is developed to train an initial policy with a simulated user. User feedback data is collected through HRI using the initial policy. Third, a policy that combines the reinforcement-learning-based policy with a neural-network-based policy is adapted based on the user feedback. We conducted both simulated user tests and real human subject tests to evaluate the proposed system. The results show that CACAS achieved a success rate of 85% in the real human subject test and that 87.5% of participants were satisfied with the adaptation results. In the simulation test, CACAS had the highest success rate compared with the baseline methods.
|
|
09:00-10:00, Paper WePI2T3.15 | |
AEGO: Modeling Attention for HRI in Ego-Sphere Neural Networks |
|
Ferreira Chame, Hendry | University of Lorraine / CNRS |
Alami, Rachid | CNRS |
Keywords: Social HRI, Neurorobotics, Embodied Cognitive Science
Abstract: Despite important progress in recent years, social robots are still far away from showing advanced behavior for interaction and adaptation in human environments. Thus, we are interested in studying social cognition in human-robot interaction (HRI), notably in improving communication skills relying on joint attention (JA) and knowledge sharing. Since JA involves low-level cognitive processes in humans, we take into account the implications of Moravec's Paradox and focus on the aspect of knowledge representation. Inspired by 4E cognition principles, we study egocentric localization through the concept of sensory ego-sphere. We propose a neural network architecture named AEGO to model attention for each agent in interaction and show how to fuse information in a common representation space. From the perspective of dynamic fields theory, AEGO takes into account the dynamics of bottom-up and top-down modulation processes and the effects of neural excitatory and inhibitory synaptic interaction. In this work we evaluate the model in simulation and experiments with the robot Pepper in JA tasks based on proprioception, vision, rudimentary natural language and Hebbian plasticity. Results show that AEGO is convenient for HRI, allowing the human and the robot to share attention and knowledge about objects in scenarios close to everyday situations. AEGO constitutes a novel brain-inspired architecture to model attention that is suitable for multi-agent applications relying on social cognition skills, having the potential to generalize to several robotics platforms and HRI scenarios.
|
|
09:00-10:00, Paper WePI2T3.16 | |
Architectural-Scale Artistic Brush Painting with a Hybrid Cable Robot |
|
Chen, Gerry | Georgia Institute of Technology |
Al-Haddad, Tristan | Formations Studio |
Dellaert, Frank | Verdant Robotics/Georgia Tech |
Hutchinson, Seth | Georgia Institute of Technology |
Keywords: Art and Entertainment Robotics, Engineering for Robotic Systems, Parallel Robots
Abstract: Robot art presents an opportunity to both showcase and advance state-of-the-art robotics through the challenging task of creating art. Creating large-scale artworks in particular engages the public in a way that small-scale works cannot, and the distinct qualities of brush strokes contribute to an organic and human-like quality. Combining the large scale of murals with the strokes of the brush medium produces an especially impactful result, but also introduces unique challenges in maintaining precise, dexterous motion control of the brush across such a large workspace. In this work, we present the first robot to our knowledge that can paint architectural-scale murals with a brush. We create a hybrid robot consisting of a cable-driven parallel robot and a 4-degree-of-freedom (DoF) serial manipulator to paint a 27 m by 3.7 m mural on windows spanning two stories of a building. We discuss our approach to achieving both the scale and accuracy required for brush-painting a mural through a combination of novel mechanical design elements, coordinated planning and control, and on-site calibration algorithms with experimental validations.
|
|
WePI2T4 |
Room 4 |
Perception I (Detection and Categorization) |
Teaser Session |
Chair: Xiang, Yu | University of Texas at Dallas |
Co-Chair: Bensalem, Saddek | University Grenoble |
|
09:00-10:00, Paper WePI2T4.1 | |
Swiss DINO: Efficient and Versatile Vision Framework for On-Device Personal Object Search |
|
Paramonov, Kirill | Samsung Research UK |
Zhong, Jia-Xing | University of Oxford |
Michieli, Umberto | Samsung Research |
Moon, Jijoong | Samsung Research Korea |
Ozay, Mete | Samsung Research |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding, Visual Learning
Abstract: In this paper, we address a recent trend in robotic home appliances to include vision systems on personal devices, capable of personalizing the appliances on the fly. In particular, we formulate and address an important technical task of personal object search, which involves localization and identification of personal items of interest on images captured by robotic appliances, with each item referenced only by a few annotated images. The task is crucial for robotic home appliances and mobile systems, which need to process personal visual scenes or to operate with particular personal objects (e.g., for grasping or navigation). In practice, personal object search presents two main technical challenges. First, a robot vision system needs to be able to distinguish between many fine-grained classes in the presence of occlusions and clutter. Second, the strict resource requirements for the on-device system restrict the usage of most state-of-the-art methods for few-shot learning and often prevent on-device adaptation. In this work, we propose Swiss DINO: a simple yet effective framework for one-shot personal object search based on the recent DINOv2 transformer model, which was shown to have strong zero-shot generalization properties. Swiss DINO handles challenging on-device personalized scene understanding requirements and does not require any adaptation training. We show significant improvement (up to 55%) in segmentation and recognition accuracy compared to common lightweight solutions, and significant reductions in backbone inference time (up to 100x) and GPU consumption (up to 10x) compared to heavy transformer-based solutions.
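The training-free matching idea above can be approximated with the public DINOv2 weights: embed a reference crop of the personal object once, then score scene patches by cosine similarity. A minimal sketch assuming the torch.hub DINOv2 entry point and a 14-pixel patch grid; this is a stand-in for the paper's full localization pipeline, and input normalization is omitted:

    import torch

    # Publicly released DINOv2 ViT-S/14 backbone, loaded via torch.hub.
    model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()

    @torch.no_grad()
    def patch_similarity(scene: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
        """Cosine similarity of every scene patch to the reference object embedding.

        scene, reference: (1, 3, H, W) tensors with H, W divisible by 14.
        Returns a (num_patches,) similarity map usable for localization.
        """
        ref_emb = model(reference)                                     # (1, D) global embedding
        patches = model.forward_features(scene)["x_norm_patchtokens"]  # (1, N, D)
        ref = torch.nn.functional.normalize(ref_emb, dim=-1)
        pats = torch.nn.functional.normalize(patches[0], dim=-1)
        return pats @ ref[0]                                           # (N,) cosine scores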
|
|
09:00-10:00, Paper WePI2T4.2 | |
Continuous Rapid Learning by Human Imitation Using Audio Prompts and One-Shot Learning |
|
Duque Domingo, Jaime | University of Valladolid |
García-Gómez, Miguel | University of Valladolid |
Zalama, Eduardo | Instituto de las Tecnologías de la Producción (ITAP), University of Valladolid |
Gomez Garcia Bermejo, Jaime | University of Valladolid |
Keywords: Object Detection, Segmentation and Categorization, Imitation Learning, Deep Learning for Visual Perception
Abstract: In the general field of collaborative robotics, one of the topics of greatest interest to the scientific community is the ability to learn to perform certain actions by imitating humans. When someone teaches a human how to perform a certain action, being shown just once is often enough. Likewise, we believe that robotics should follow this line, using models that do not involve the capture of huge datasets or exhaustive training. Furthermore, while general models can typically be pretrained offline, the robot must quickly adapt to new knowledge without requiring an expensive retraining process. In this article we present a flexible neural learning architecture that allows a robot to learn how to pick up a given object just by watching how a human does it. The robot is then able to pick up the current object, or other previously learned objects, anywhere in the workspace, given a simple audible indication from the user. This is achieved using continuous incremental learning techniques and generic segmentation networks integrated with Siamese network models according to the recently proposed CP-CVV method. Results are presented for the success rate in grasping a varied set of objects.
|
|
09:00-10:00, Paper WePI2T4.3 | |
FedRC: A Rapid-Converged Hierarchical Federated Learning Framework in Street Scene Semantic Understanding |
|
Kou, Wei-Bin | The University of Hong Kong |
Lin, Qingfeng | The University of Hong Kong |
Tang, Ming | Southern University of Science and Technology |
Wang, Shuai | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences |
Zhu, Guangxu | Shenzhen Research Institute of Big Data |
Wu, Yik-Chung | The University of Hong Kong |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: Street Scene Semantic Understanding (TriSU) is a crucial but complex task for globally distributed autonomous driving (AD) vehicles (e.g., Tesla). Its inference model suffers from poor generalization due to inter-city domain shift. Hierarchical Federated Learning (HFL) offers a potential solution for improving TriSU model generalization, but suffers from a slow convergence rate because of the heterogeneity of vehicles' surroundings across cities. Going beyond existing HFL works that have deficient capabilities in complex tasks, we propose a rapidly converging heterogeneous HFL framework (FedRC) to address the inter-city data heterogeneity and accelerate the HFL model convergence rate. In our proposed FedRC framework, both individual RGB images and whole RGB datasets are modelled as Gaussian distributions. This approach not only differentiates individual RGB samples rather than treating them as equal, but also considers statistical properties alongside data volume rather than data quantity alone. Extensive experiments on the TriSU task using cross-city datasets demonstrate that FedRC converges faster than the state-of-the-art benchmark by 38.7%, 37.5%, 35.5%, and 40.6% in terms of mIoU, mPrecision, mRecall, and mF1, respectively. Furthermore, qualitative evaluations in the CARLA simulation environment confirm that the proposed FedRC framework delivers top-tier performance.
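One way to picture the Gaussian modelling above: summarize each client's image statistics as a diagonal Gaussian and weight aggregation by distributional closeness. A minimal sketch using the closed-form 2-Wasserstein distance between diagonal Gaussians; the inverse-distance weighting rule here is an illustrative assumption, not the paper's exact formulation:

    import numpy as np

    def image_gaussian(img: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        """Summarize an (H, W, 3) RGB image as per-channel mean and std."""
        pixels = img.reshape(-1, 3).astype(np.float64)
        return pixels.mean(axis=0), pixels.std(axis=0)

    def w2_diag(mu1, s1, mu2, s2) -> float:
        """Closed-form 2-Wasserstein distance between diagonal Gaussians."""
        return float(np.sqrt(np.sum((mu1 - mu2) ** 2) + np.sum((s1 - s2) ** 2)))

    def client_weights(client_stats, global_stats) -> np.ndarray:
        """Illustrative aggregation weights: statistically closer clients count more."""
        d = np.array([w2_diag(mu, s, *global_stats) for mu, s in client_stats])
        w = 1.0 / (1.0 + d)
        return w / w.sum()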
|
|
09:00-10:00, Paper WePI2T4.4 | |
Model Agnostic Defense against Adversarial Patch Attacks on Object Detection in Unmanned Aerial Vehicles |
|
Pathak, Saurabh | Technology Innovation Institute |
Shrestha, Samridha | Technology Innovation Institute |
AlMahmoud, Abdelrahman | Technology Innovation Institute |
Keywords: Object Detection, Segmentation and Categorization, Aerial Systems: Perception and Autonomy, Deep Learning for Visual Perception
Abstract: Object detection forms a key component in Unmanned Aerial Vehicles (UAVs) for completing high-level tasks that depend on awareness of objects on the ground from an aerial perspective. In that scenario, adversarial patch attacks on an onboard object detector can severely impair the performance of upstream tasks. This paper proposes a novel model-agnostic defense mechanism against the threat of adversarial patch attacks in the context of UAV-based object detection. We formulate adversarial patch defense as an occlusion removal task. The proposed defense method can neutralize adversarial patches located on objects of interest without exposure to adversarial patches during training. Our lightweight single-stage defense approach allows us to maintain a model-agnostic nature that, once deployed, does not need to be updated in response to changes in the object detection pipeline. The evaluations in digital and physical domains show the feasibility of our method for deployment in UAV object detection pipelines, significantly decreasing the Attack Success Ratio without incurring significant processing costs. As a result, the proposed defense solution can improve the reliability of object detection for UAVs.
|
|
09:00-10:00, Paper WePI2T4.5 | |
Proto-CLIP: Vision-Language Prototypical Network for Few-Shot Learning |
|
P, Jishnu Jaykumar | The University of Texas at Dallas |
Palanisamy, Kamalesh | University of Texas at Dallas |
Chao, Yu-Wei | NVIDIA |
Du, Xinya | UT Dallas |
Xiang, Yu | University of Texas at Dallas |
Keywords: Object Detection, Segmentation and Categorization, Recognition, Representation Learning
Abstract: We propose a novel framework for few-shot learning by leveraging large-scale vision-language models such as CLIP. Motivated by unimodal prototypical networks for few-shot learning, we introduce Proto-CLIP, which utilizes image prototypes and text prototypes for few-shot learning. Specifically, Proto-CLIP adapts the image and text encoder embeddings from CLIP in a joint fashion using few-shot examples. The embeddings from the two encoders are used to compute the respective prototypes of image classes for classification. During adaptation, we propose aligning the image and text prototypes of the corresponding classes. Such alignment is beneficial for few-shot classification due to the reinforced contributions from both types of prototypes. Proto-CLIP has both training-free and fine-tuned variants. We demonstrate the effectiveness of our method by conducting experiments on benchmark datasets for few-shot learning, as well as in the real world for robot perception. The project page can be found at https://irvlutd.github.io/Proto-CLIP.
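The prototype classification at the core of the approach is compact enough to sketch: average the normalized embeddings of each class's few-shot support images and texts, then score a query against both prototype sets. A minimal, encoder-agnostic sketch; the fusion weight alpha is a hypothetical placeholder rather than the paper's learned combination:

    import torch
    import torch.nn.functional as F

    def build_prototypes(support_emb: torch.Tensor, labels: torch.Tensor,
                         num_classes: int) -> torch.Tensor:
        """Average normalized support embeddings per class -> (C, D) prototypes."""
        support_emb = F.normalize(support_emb, dim=-1)
        protos = torch.zeros(num_classes, support_emb.shape[-1])
        for c in range(num_classes):
            protos[c] = support_emb[labels == c].mean(dim=0)
        return F.normalize(protos, dim=-1)

    def classify(query_emb: torch.Tensor, image_protos: torch.Tensor,
                 text_protos: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
        """Fuse cosine similarities to image and text prototypes (alpha assumed)."""
        q = F.normalize(query_emb, dim=-1)
        logits = alpha * q @ image_protos.T + (1 - alpha) * q @ text_protos.T
        return logits.argmax(dim=-1)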
|
|
09:00-10:00, Paper WePI2T4.6 | |
SWCF-Net: Similarity-Weighted Convolution and Local-Global Fusion for Efficient Large-Scale Point Cloud Semantic Segmentation |
|
Lin, Zhenchao | Guangdong University of Technology |
He, Li | Southern University of Science and Technology |
Yang, Hongqiang | Meituan Technology Co., Ltd |
Sun, Xiaoqun | Meituan |
Zhang, Guojin | Meituan |
Chen, Weinan | Guangdong University of Technology |
Guan, Yisheng | Guangdong University of Technology |
Zhang, Hong | Southern University of Science and Technology |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding, Deep Learning Methods
Abstract: A large-scale point cloud consists of a multitude of individual objects and thus encompasses rich structural and underlying semantic contextual information, making efficient segmentation a challenging problem. Most existing research mainly focuses on capturing intricate local features without giving due consideration to global ones, thus failing to leverage semantic context. In this paper, we propose a Similarity-Weighted Convolution and local-global Fusion Network, named SWCF-Net, which takes into account both local and global features. We propose a Similarity-Weighted Convolution (SWConv) to effectively extract local features, where similarity weights are incorporated into the convolution operation to enhance the generalization capabilities. Then, we employ a downsampling operation on the K and V channels within the attention module, thereby reducing the quadratic complexity to linear and enabling the Transformer to deal with large-scale point clouds. Finally, orthogonal components are extracted from the global features and then aggregated with local features, thereby eliminating redundant information between local and global features and consequently improving efficiency. We evaluate SWCF-Net on the large-scale outdoor datasets SemanticKITTI and Toronto3D. Our experimental results demonstrate the effectiveness of the proposed network. Our method achieves competitive results with less computational cost and is able to handle large-scale point clouds efficiently. The code is available at https://github.com/Sylva-Lin/SWCF-Net.
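The complexity reduction mentioned above comes from shrinking the key/value set before attention, so the cost scales with the number of queries times the reduced set rather than quadratically. A minimal PyTorch sketch of that idea, with strided subsampling and an assumed downsampling ratio standing in for the paper's operator:

    import torch

    def downsampled_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                              ratio: int = 4) -> torch.Tensor:
        """Attention with strided K/V downsampling: O(N * N/ratio) instead of O(N^2).

        q, k, v: (B, N, D) tensors over N points; ratio is an assumed setting.
        """
        k_ds, v_ds = k[:, ::ratio], v[:, ::ratio]   # subsampled keys and values
        scale = q.shape[-1] ** 0.5
        attn = torch.softmax(q @ k_ds.transpose(1, 2) / scale, dim=-1)
        return attn @ v_ds                          # (B, N, D) aggregated features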
|
|
09:00-10:00, Paper WePI2T4.7 | |
3D Object Detection Via Stereo Pyramid Transformers with Rich Semantic Feature Fusion |
|
Gu, Rongqi | Tongji University |
Yang, Chu | Tongji University |
Lu, Yaohan | Westwell-Lab |
Liu, Peigen | Tongji University |
Wu, Fei | Westwell-Lab |
Chen, Guang | Tongji University |
Keywords: Object Detection, Segmentation and Categorization, Computer Vision for Transportation, Deep Learning Methods
Abstract: Camera-based 3D object detectors, prized for their broader applicability and cost-effectiveness compared to LiDAR sensors, still grapple with the inherently ill-posed nature of depth extraction from images. In this work, we present a novel approach that employs a transformer-based backbone and a fused geometry volume to bolster feature richness and elevate detection accuracy. Firstly, we propose the Stereo Pyramid Transformer backbone to extract features from stereo images, which can capture global information and establish cross-image semantic connections. Then, to tackle the challenge posed by small baseline binocular cameras, we propose to fuse stereo geometry volumes constructed by Stereo Plane Sweeping Volume (SPSV), Monocular Semantic Volume (MSV), and Lifted Volume (LV) to create finely detailed feature volumes. Through extensive experiments on both the KITTI and our datasets, our approach not only surpasses all existing transformer-based stereo 3D detection methods but also marks a significant milestone by achieving comparable performance with the leading CNN-based 3D detectors for the first time.
|
|
09:00-10:00, Paper WePI2T4.8 | |
MOSFormer: A Transformer-Based Multi-Modal Fusion Network for Moving Object Segmentation |
|
Cheng, Zike | Shanghai Jiao Tong University |
Zhao, Hengwang | Shanghai Jiao Tong University |
Shen, Qiyuan | Shanghai Jiao Tong University |
Yan, Weihao | Shanghai Jiao Tong University |
Wang, Chunxiang | Shanghai Jiaotong University |
Yang, Ming | Shanghai Jiao Tong University |
Keywords: Object Detection, Segmentation and Categorization, Computer Vision for Transportation, Intelligent Transportation Systems
Abstract: 3D moving object segmentation (MOS) is vital for autonomous systems, providing essential information for downstream tasks like mapping and localization. However, current MOS methods face challenges due to the limitation of existing datasets, which are sparse in moving objects and limited in scene diversity. Meanwhile, the prevalent methods are projection-based, struggling with the challenge of blurred boundaries. To tackle the dataset issue, we introduce a nuScenes-based MOS dataset, which provides richer scenes and more dynamic instances. To alleviate the boundary blurring issue and further improve accuracy and generalizability, we propose a dual-branch multimodal fusion MOS network, MOSFormer. The Transformer structure is incorporated to extract spatio-temporal information better, while image semantic information is utilized to refine the boundaries of moving objects. Finally, experiments on two datasets show that our method achieves state-of-the-art performance, and a mapping experiment with our method confirms its effectiveness in downstream tasks such as mapping and localization.
|
|
09:00-10:00, Paper WePI2T4.9 | |
CTS: Sim-To-Real Unsupervised Domain Adaptation on 3D Detection |
|
Zhang, Meiying | Southern University of Science and Technology |
Peng, Weiyuan | Southern University of Science and Technology |
Ding, Guangyao | Southern University of Science and Technology |
Lei, Chenyang | Southern University of Science and Technology |
Ji, Chunlin | Kuang-Chi Institute of Advanced Technology |
Hao, Qi | Southern University of Science and Technology |
Keywords: Object Detection, Segmentation and Categorization, Transfer Learning, Computer Vision for Transportation
Abstract: Simulation data can be accurately labeled and have been expected to improve the performance of data-driven algorithms, including object detection. However, due to the various domain inconsistencies from simulation to reality (sim-to-real), cross-domain object detection algorithms usually suffer from dramatic performance drops. While numerous unsupervised domain adaptation (UDA) methods have been developed to address cross-domain tasks between real-world datasets, progress in sim-to-real remains limited. This paper presents a novel Complex-to-Simple (CTS) framework to transfer models from labeled simulation (source) to unlabeled reality (target) domains. Based on a two-stage detector, the novelty of this work is threefold: 1) developing fixed-size anchor heads and RoI augmentation to address size bias and feature diversity between the two domains, thereby improving the quality of pseudo-labels; 2) developing a novel corner-format representation of aleatoric uncertainty (AU) for the bounding box, to uniformly quantify pseudo-label quality; 3) developing a noise-aware mean teacher domain adaptation method based on AU, as well as object-level and frame-level sampling strategies, to mitigate the impact of noisy labels. Experimental results demonstrate that our proposed approach significantly enhances the sim-to-real domain adaptation capability of 3D object detection models, outperforming state-of-the-art cross-domain algorithms, which are usually developed for real-to-real UDA tasks.
|
|
09:00-10:00, Paper WePI2T4.10 | |
BAM: Box Abstraction Monitors for Real-Time OoD Detection in Object Detection |
|
Wu, Changshun | Université Grenoble Alpes |
He, Weicheng | Université Grenoble Alpes |
Cheng, Chih-Hong | Chalmers University of Technology |
Huang, Xiaowei | University of Liverpool |
Bensalem, Saddek | University Grenoble |
Keywords: Object Detection, Segmentation and Categorization, Computer Vision for Transportation
Abstract: Out-of-distribution (OoD) detection techniques for deep neural networks (DNNs) have become crucial because they filter out abnormal inputs, especially when DNNs are used in safety-critical applications and interact with an open and dynamic environment. Nevertheless, integrating OoD detection into state-of-the-art (SOTA) object detection DNNs poses significant challenges, partly due to the complexity introduced by SOTA OoD construction methods, which require modifying the DNN architecture and introducing complex loss functions. This paper proposes a simple, yet surprisingly effective, method that requires neither retraining nor architectural change in the object detection DNN, called Box Abstraction-based Monitors (BAM). The novelty of BAM stems from using a finite union of convex box abstractions to capture the learned features of objects for in-distribution (ID) data, together with the important observation that features from OoD data are more likely to fall outside of these boxes. The union of convex regions within the feature space allows the formation of non-convex and interpretable decision boundaries, overcoming the limitations of VOS-like detectors without sacrificing real-time performance. Experiments integrating BAM into Faster R-CNN-based object detection DNNs demonstrate considerably improved performance over SOTA OoD detection techniques, with a reduction in the false detection rate of over 10% in most cases.
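The box abstraction idea admits a very small implementation: record per-class axis-aligned boxes over ID feature vectors during a monitoring pass, then flag a feature as OoD when it lies outside every box. A minimal sketch with a single box per class; the paper's finite union of boxes (e.g., obtained by clustering the features) is simplified away here:

    import numpy as np

    class BoxMonitor:
        """Per-class axis-aligned box abstraction over ID feature vectors."""

        def __init__(self):
            self.boxes = {}  # class_id -> (min_vector, max_vector)

        def fit(self, feats: np.ndarray, labels: np.ndarray) -> None:
            """Record one bounding box per class from ID features (N, D)."""
            for c in np.unique(labels):
                f = feats[labels == c]
                self.boxes[int(c)] = (f.min(axis=0), f.max(axis=0))

        def is_ood(self, feat: np.ndarray, margin: float = 0.0) -> bool:
            """OoD if the feature falls outside every class box (margin assumed)."""
            for lo, hi in self.boxes.values():
                if np.all(feat >= lo - margin) and np.all(feat <= hi + margin):
                    return False
            return True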
|
|
09:00-10:00, Paper WePI2T4.11 | |
Embodied Uncertainty-Aware Object Segmentation |
|
Fang, Xiaolin | MIT |
Kaelbling, Leslie | MIT |
Lozano-Perez, Tomas | MIT |
Keywords: Object Detection, Segmentation and Categorization, Perception-Action Coupling, Perception for Grasping and Manipulation
Abstract: We introduce uncertainty-aware object instance segmentation (UNCOS) and demonstrate its usefulness for embodied interactive segmentation. To deal with uncertainty in robot perception, we propose a method for generating a hypothesis distribution of object segmentations. We obtain a set of region-factored segmentation hypotheses together with confidence estimates by making multiple queries of large pre-trained models. This process can produce segmentation results that achieve state-of-the-art performance on the unseen object segmentation problem. The output can also serve as input to a belief-driven process for selecting robot actions to perturb the scene to reduce ambiguity. We demonstrate the effectiveness of this method in real-robot experiments.
|
|
09:00-10:00, Paper WePI2T4.12 | |
Unsupervised 3D Part Decomposition Via Leveraged Gaussian Splatting |
|
Choy, Jae Goo | Sequor Robotics |
Cha, Geonho | NAVER Corp |
Kee, Hogun | Seoul National University |
Oh, Songhwai | Seoul National University |
Keywords: Object Detection, Segmentation and Categorization, Recognition
Abstract: We propose a novel unsupervised method for motion-based 3D part decomposition of articulated objects using a single monocular video of a dynamic scene. In contrast to existing unsupervised methods relying on optical flow or tracking techniques, our approach addresses this problem without additional information by leveraging Gaussian splatting techniques. We generate a series of Gaussians from a monocular video and analyze their relationships to decompose the dynamic scene into motion-based parts. To decompose dynamic scenes consisting of articulated objects, we design an articulated deformation field suitable for the movement of articulated objects. To effectively understand the relationships between Gaussians of different shapes, we also propose a 3D reconstruction loss using 3D occupied voxel maps generated from the Gaussians. Experimental results demonstrate that our method outperforms existing approaches in terms of 3D part decomposition for articulated objects. More demos and code are available at https://choonsik93.github.io/artnerf/.
|
|
09:00-10:00, Paper WePI2T4.13 | |
Non-Repetitive: A Promising LiDAR Scanning Pattern |
|
Xie, Angchen | Shanghai Jiao Tong University |
Qian, Yeqiang | Shanghai Jiao Tong University |
Yan, Weihao | Shanghai Jiao Tong University |
Wang, Chunxiang | Shanghai Jiaotong University |
Yang, Ming | Shanghai Jiao Tong University |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Data Sets for Robotic Vision
Abstract: LiDAR is an essential sensor for intelligent vehicles. Recently, LiDARs used in vehicles produced by different companies have shown significant differences in their scanning patterns. Some vehicles use mechanical and solid-state (repetitive) LiDARs, while others use prism-based (non-repetitive) LiDARs. The scanning pattern of a LiDAR has a profound impact on its scanning performance. To investigate the influence of LiDAR scanning patterns, we created the "Repetitive-or-not" dataset, which is collected simultaneously by LiDARs with both repetitive and non-repetitive scanning patterns in the CARLA simulation environment. Using this dataset, we conducted a comprehensive statistical analysis of the scanning ability of repetitive and non-repetitive LiDARs. Furthermore, we looked into the effects of these two LiDAR scanning patterns on the performance of various 3D object detection algorithms. Finally, we explored the domain gap in the point cloud data produced by repetitive and non-repetitive LiDARs. Through an in-depth investigation of the "Repetitive-or-not" dataset, we have discovered that non-repetitive LiDAR shows great promise. This conclusion is primarily supported by its superior object scanning capabilities.
|
|
09:00-10:00, Paper WePI2T4.14 | |
Scale Disparity of Instances in Interactive Point Cloud Segmentation |
|
Han, Chenrui | Zhejiang University |
Yu, Xuan | Zhejiang University |
Xie, Yuxuan | Tongji University |
Liu, Yili | Zhejiang University |
Mao, Sitong | ShenZhen Huawei Cloud Computing Technologies Co., Ltd |
Zhou, Shunbo | The Chinese University of Hong Kong |
Xiong, Rong | Zhejiang University |
Wang, Yue | Zhejiang University |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding, Computer Vision for Automation
Abstract: Interactive point cloud segmentation has become a pivotal task for understanding 3D scenes, enabling users to guide segmentation models with simple interactions such as clicks, therefore significantly reducing the effort required to tailor models to diverse scenarios and new categories. However, in the realm of interactive segmentation, the meaning of instance diverges from that in instance segmentation, because users might desire to segment instances of both thing and stuff categories that vary greatly in scale. Existing methods have focused on thing categories, neglecting the segmentation of stuff categories and the difficulties arising from scale disparity. To bridge this gap, we propose ClickFormer, an innovative interactive point cloud segmentation model that accurately segments instances of both thing and stuff categories. We propose a query augmentation module to augment click queries by a global query sampling strategy, thus maintaining consistent performance across different instance scales. Additionally, we employ global attention in the query-voxel transformer to mitigate the risk of generating false positives, along with several other network structure improvements to further enhance the model's segmentation performance. Experiments demonstrate that ClickFormer outperforms existing interactive point cloud segmentation methods across both indoor and outdoor datasets, providing more accurate segmentation results with fewer user clicks in an open-world setting.
|
|
09:00-10:00, Paper WePI2T4.15 | |
MDHA: Multi-Scale Deformable Transformer with Hybrid Anchors for Multi-View 3D Object Detection |
|
Adeline, Michelle | Monash University Malaysia |
Loo, Junn Yong | Monash Malaysia |
Baskaran, Vishnu Monn | Monash University Malaysia |
Keywords: Object Detection, Segmentation and Categorization, Autonomous Vehicle Navigation, Deep Learning for Visual Perception
Abstract: Multi-view 3D object detection is a crucial component of autonomous driving systems. Contemporary query-based methods primarily depend either on dataset-specific initialization of 3D anchors, introducing bias, or utilize dense attention mechanisms, which are computationally inefficient and unscalable. To overcome these issues, we present MDHA, a novel sparse query-based framework, which constructs adaptive 3D output proposals using hybrid anchors from multi-view, multi-scale image input. Fixed 2D anchors are combined with depth predictions to form 2.5D anchors, which are projected to obtain 3D proposals. To ensure high efficiency, our proposed Anchor Encoder performs sparse refinement and selects the top-k anchors and features. Moreover, while existing multi-view attention mechanisms rely on projecting reference points to multiple images, our novel Circular Deformable Attention mechanism only projects to a single image but allows reference points to seamlessly attend to adjacent images, improving efficiency without compromising on performance. On the nuScenes val set, it achieves 46.4% mAP and 55.0% NDS with a ResNet101 backbone. MDHA significantly outperforms the baseline where anchor proposals are modelled as learnable embeddings. Code is available at https://github.com/NaomiEX/MDHA.
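The 2.5D-to-3D lifting described above is a one-line unprojection once camera intrinsics are known: scale the homogeneous pixel coordinate by the predicted depth and apply the inverse intrinsic matrix. A minimal sketch producing a camera-frame point; the extrinsic transform to the ego frame is omitted:

    import numpy as np

    def lift_anchor(u: float, v: float, depth: float, K: np.ndarray) -> np.ndarray:
        """Lift a 2D anchor center (u, v) plus predicted depth to a 3D proposal.

        K: (3, 3) camera intrinsic matrix. Returns a camera-frame 3D point.
        """
        pixel_h = np.array([u, v, 1.0])              # homogeneous pixel coordinate
        return depth * (np.linalg.inv(K) @ pixel_h)  # 2.5D anchor -> 3D point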
|
|
09:00-10:00, Paper WePI2T4.16 | |
R2SNet: Scalable Domain Adaptation for Object Detection in Cloud-Based Robotic Ecosystems Via Proposal Refinement |
|
Antonazzi, Michele | University of Milan |
Luperto, Matteo | Università Degli Studi Di Milano |
Borghese, N. Alberto | University of Milano |
Basilico, Nicola | University of Milan |
Keywords: Distributed Robot Systems, Object Detection, Segmentation and Categorization
Abstract: We introduce a novel approach for scalable domain adaptation in cloud robotics scenarios where robots rely on third-party AI inference services powered by large pre-trained deep neural networks. Our method is based on a downstream proposal-refinement stage running locally on the robots, exploiting a new lightweight DNN architecture, R2SNet. This architecture aims to mitigate performance degradation from domain shifts by adapting the object detection process to the target environment, focusing on relabeling, rescoring, and suppression of bounding-box proposals. Our method allows for local execution on robots, addressing the scalability challenges of domain adaptation without incurring significant computational costs. Real-world results on mobile service robots performing door detection show the effectiveness of the proposed method in achieving scalable domain adaptation.
|
|
WePI2T5 |
Room 5 |
Deep Learning II |
Teaser Session |
Chair: Shafique, Muhammad | New York University Abu Dhabi |
|
09:00-10:00, Paper WePI2T5.1 | |
A Non-Invasive Device for Skin Cancer Diagnosis: First Clinical Evidence with Spectroscopic Data Enhanced by Machine Learning Algorithms |
|
Mainardi, Vanessa | Scuola Superiore Sant'Anna |
Carletti, Laura | Scuola Superiore Sant'Anna |
Tsiakmakis, Dimitrios | Aristotle University of Thessaloniki |
Dal Canto, Marco | Scuola Superiore Sant'Anna |
Mellilo, Tommaso | Scuola Superiore Sant'Anna |
Noferi, Stefano | Noze Srl |
Bagnoni, Giovanni | Dermatological Department of Spedali Riuniti |
Rubegni, Pietro | Dermatological Department of Senese Hospital |
Ciuti, Gastone | Scuola Superiore Sant'Anna |
Keywords: AI-Based Methods, Health Care Management, Engineering for Robotic Systems
Abstract: Skin cancer represents a significant global health concern, with melanoma alone accounting for thousands of deaths annually. Early diagnosis is crucial for improving survival rates and reducing healthcare costs. While traditional diagnostic approaches involve visual inspection followed by biopsy, emerging technologies offer less invasive options with improved precision. In this study, a novel non-invasive device employing near-infrared reflectance spectroscopy for skin lesion analysis was designed, developed, and validated. Furthermore, this work presents a machine learning approach aimed at classifying different types of skin lesions, as well as a new sequential approach to distinguish benign from malignant lesions based on spectral data, exploring the impact of anamnestic features. The device was used in two independent hospitals in Italy to collect data from 69 patients in total, covering various types of skin lesions, all of whom followed the standard protocol for screening and diagnostic intervention. The implemented model achieved a recall of 93.8% and an accuracy of 75% for melanoma versus benign classification, and a recall of 100% and an accuracy of 98.6% in distinguishing non-melanoma cancer from benign lesions, demonstrating promising results for skin cancer diagnosis utilizing spectral and anamnestic data. In summary, this study contributes to the development of non-invasive diagnostic tools and underscores the potential of machine learning in dermatology using spectroscopic data.
|
|
09:00-10:00, Paper WePI2T5.2 | |
A Deep Signed Directional Distance Function for Shape Representation |
|
Zobeidi, Ehsan | University of California San Diego |
Atanasov, Nikolay | University of California, San Diego |
Keywords: Deep Learning for Visual Perception, Representation Learning, RGB-D Perception
Abstract: Predicting accurate observations efficiently from novel views is a key requirement for several robotics applications. Existing shape and surface representations, however, either require expensive ray-tracing operations, e.g., in the case of meshes or signed distance functions (SDFs), or offer only a coarse view, e.g., in the case of quadrics or point clouds. We develop a new representation that captures viewing direction and enables fast novel view synthesis. Our first contribution is a signed directional distance function (SDDF) that extends the SDF definition by measuring distance in a desired viewing direction rather than to the nearest point. As a result, SDDF removes post-processing steps for view synthesis required by SDF, such as surface extraction via marching cubes or rendering via sphere tracing, and allows ray-tracing through a single function call. SDDF also encodes by construction the property that distance decreases linearly along the viewing direction. We show that this enables dimensionality reduction in the function representation and guarantees the prediction accuracy independent of the distance to the surface. Recent advances demonstrate impressive performance of deep neural networks for shape learning, including IGR for SDF, Occupancy Networks for occupancy, AtlasNet for meshes, and NeRF for density. Our second contribution, DeepSDDF, is a deep neural network model for SDDF shape learning. Similar to IGR, we show that DeepSDDF can model whole object categories and interpolate or complete shapes from partial views.
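As context for the abstract above, the core SDDF idea can be written compactly. This is only a sketch of the definition described in the abstract; the signed convention and the treatment of rays that miss the surface are details left to the paper:

```latex
% Distance from point p along a unit viewing direction v to the surface S:
f(\mathbf{p}, \mathbf{v}) = \min \{\, t : \mathbf{p} + t\,\mathbf{v} \in \mathcal{S} \,\}
% Linearity along the ray, the property exploited for dimensionality reduction:
f(\mathbf{p} + d\,\mathbf{v}, \mathbf{v}) = f(\mathbf{p}, \mathbf{v}) - d
```

The second identity is what allows ray-tracing through a single function call: the distance at any point on a ray determines the distance everywhere else along that ray.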
|
|
09:00-10:00, Paper WePI2T5.3 | |
Best of Both Worlds: Hybrid SNN-ANN Architecture for Event-Based Optical Flow Estimation |
|
Negi, Shubham | Purdue University |
Sharma, Deepika | Purdue University |
Kosta, Adarsh Kumar | Purdue University |
Roy, Kaushik | Purdue University |
Keywords: Deep Learning for Visual Perception, Vision-Based Navigation
Abstract: In the field of robotics, event-based cameras are emerging as a promising low-power alternative to traditional frame-based cameras for capturing high-speed motion and high-dynamic-range scenes, owing to their sparse and asynchronous event outputs. Spiking Neural Networks (SNNs), with their asynchronous event-driven compute, show great potential for extracting spatio-temporal features from these event streams. In contrast, standard Analog Neural Networks (ANNs) fail to process event data effectively. However, training SNNs is difficult due to additional trainable parameters (thresholds and leaks), vanishing spikes at deeper layers, and a non-differentiable binary activation function. Furthermore, an additional data structure, the "membrane potential", responsible for keeping track of temporal information, must be fetched and updated at every timestep in SNNs. To overcome these challenges, we propose a novel SNN-ANN hybrid architecture that combines the strengths of both. Specifically, we leverage the asynchronous compute capabilities of SNN layers to effectively extract the input temporal information. Concurrently, the ANN layers facilitate training and efficient hardware deployment on traditional machine learning hardware such as GPUs. We provide extensive experimental analysis for assigning each layer to be spiking or analog, leading to a network configuration optimized for performance and ease of training. We evaluate our hybrid architecture for optical flow estimation on the DSEC-flow and Multi-Vehicle Stereo Event-Camera (MVSEC) datasets. On the DSEC-flow dataset, the hybrid SNN-ANN architecture achieves a 40% reduction in average endpoint error (AEE) with 22% lower energy consumption compared to Full-SNN, and 48% lower AEE compared to Full-ANN, while maintaining comparable energy usage.
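To make the abstract's point about the "membrane potential" concrete, below is a minimal leaky integrate-and-fire update in PyTorch. The leak, threshold, and soft reset are generic textbook choices, not the paper's configuration:

```python
import torch

def lif_step(x, v_mem, leak=0.9, threshold=1.0):
    """One leaky integrate-and-fire timestep.

    v_mem is the extra state the abstract refers to: unlike a stateless
    ANN layer, it must be fetched and updated at every timestep.
    """
    v_mem = leak * v_mem + x               # integrate input with leaky decay
    spikes = (v_mem >= threshold).float()  # non-differentiable binary activation
    v_mem = v_mem - spikes * threshold     # soft reset after spiking
    return spikes, v_mem

# Toy usage: process an event tensor over T timesteps.
T, batch, features = 10, 4, 128
events = torch.rand(T, batch, features)
v = torch.zeros(batch, features)
for t in range(T):
    out, v = lif_step(events[t], v)
```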
|
|
09:00-10:00, Paper WePI2T5.4 | |
Just Flip: Flipped Observation Generation and Optimization for Neural Radiance Fields to Cover Unobserved View |
|
Lee, Sibaek | Sungkyunkwan University (SKKU) |
Kang, Kyeongsu | Ulsan National Institute of Science and Technology (UNIST) |
Yu, Hyeonwoo | SungKyunKwan University |
Keywords: Deep Learning for Visual Perception
Abstract: With the advent of Neural Radiance Fields (NeRF), representing 3D scenes through multiple observations has shown significant improvements. Since this cutting-edge technique can obtain high-resolution renderings by interpolating dense 3D environments, various approaches have been proposed to apply NeRF to the spatial understanding of robot perception. However, previous works struggle to represent unobserved scenes or views along unexplored robot trajectories, as they do not account for 3D reconstruction without observation information. To overcome this problem, we propose a method that generates flipped observations to cover observations absent along the unexplored robot trajectory. To achieve this, we propose a data augmentation method for 3D reconstruction using NeRF that flips observed images and estimates the flipped cameras' 6DOF poses. Furthermore, to ensure the NeRF model operates robustly in general scenarios, we also propose a training method that adjusts the flipped pose and accordingly accounts for the uncertainty in flipped images. Our technique does not utilize an additional network, making it simple and fast, thus ensuring its suitability for robotic applications where real-time performance is important.
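The flipped-observation idea admits a compact geometric sketch: treat the horizontally flipped image as a view from a camera pose reflected across a world mirror plane. The camera-to-world convention and the choice of the world YZ plane below are illustrative assumptions; the paper additionally refines the flipped pose and models the uncertainty of flipped images during training:

```python
import numpy as np

def flipped_camera_pose(T_cam2world):
    """Reflect a 4x4 camera-to-world pose across the world YZ plane (x -> -x).

    Conjugating by D = diag(-1, 1, 1, 1) both mirrors the camera position and
    flips the camera's x-axis, so the result remains a proper rotation
    (det = +1) whose view corresponds to the horizontally flipped image.
    """
    D = np.diag([-1.0, 1.0, 1.0, 1.0])
    return D @ T_cam2world @ D

def flipped_observation(image, T_cam2world):
    """Pair a left-right flipped image (H, W, C) with its flipped pose."""
    return image[:, ::-1].copy(), flipped_camera_pose(T_cam2world)
```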
|
|
09:00-10:00, Paper WePI2T5.5 | |
RAM-NAS: Resource-Aware Multiobjective Neural Architecture Search Method for Robot Vision Tasks |
|
Mao, Shouren | Harbin Institute of Technology |
Qin, MingHao | Harbin Institute of Technology |
Dong, Wei | Harbin Institute of Technology |
Liu, Huajian | Harbin Institute of Technology |
Gao, Yongzhuo | Harbin Institute of Technology |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Neural architecture search (NAS) has shown great promise in automatically designing lightweight models. However, conventional approaches fall short in training the supernet and pay little attention to actual robot hardware resources. To meet these challenges, we propose RAM-NAS, a resource-aware multi-objective NAS method that focuses on improving supernet pretraining and resource awareness on robot hardware devices. We introduce the concept of subnet mutual distillation, which refers to mutually distilling all subnets sampled by the sandwich rule. Additionally, we utilize the Decoupled Knowledge Distillation (DKD) loss to enhance logits distillation performance. To expedite the search process with consideration for hardware resources, we use data from three types of robotic edge hardware to train latency surrogate predictors. These predictors facilitate the estimation of hardware inference latency during the search phase, enabling a unified multi-objective evolutionary search to balance model accuracy and latency trade-offs. Our discovered model family, the RAM-NAS models, achieves top-1 accuracy ranging from 76.7% to 81.4% on ImageNet. In addition, the resource-aware multi-objective NAS we employ significantly reduces the model's inference latency on edge hardware for robots. We conducted experiments on downstream tasks to verify the scalability of our methods. The inference time for detection and segmentation is reduced on all three hardware types compared to MobileNetv3-based methods. Our work fills the gap in resource-aware NAS for robot hardware.
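A rough sketch of the sandwich rule with subnet mutual distillation as described above; `supernet.sample_subnet` is a hypothetical interface, and plain temperature-scaled KL stands in for the DKD loss the paper actually uses:

```python
import torch
import torch.nn.functional as F

def mutual_distillation_step(supernet, x, y, n_random=2, tau=4.0):
    """Sandwich rule: largest + smallest + a few random subnets per step,
    with every sampled subnet distilling from every other sampled subnet."""
    subnets = ([supernet.sample_subnet("max"), supernet.sample_subnet("min")]
               + [supernet.sample_subnet("random") for _ in range(n_random)])
    logits = [net(x) for net in subnets]

    loss = sum(F.cross_entropy(l, y) for l in logits)
    for i, l_i in enumerate(logits):
        for j, l_j in enumerate(logits):
            if i == j:
                continue
            # KL against a detached peer; the paper uses DKD here instead.
            loss = loss + tau**2 * F.kl_div(
                F.log_softmax(l_i / tau, dim=1),
                F.softmax(l_j.detach() / tau, dim=1),
                reduction="batchmean",
            )
    return loss
```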
|
|
09:00-10:00, Paper WePI2T5.6 | |
Towards Dynamic and Small Objects Refinement for Unsupervised Domain Adaptative Nighttime Semantic Segmentation |
|
Pan, Jingyi | The Hong Kong University of Science and Technology (Guangzhou) |
Li, Sihang | New York University |
Chen, Yucheng | Hong Kong University of Science and Technology (Guangzhou) |
Zhu, Jinjing | HKUST(GZ) |
Wang, Lin | HKUST |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Object Detection, Segmentation and Categorization
Abstract: Nighttime semantic segmentation plays a crucial role in practical applications, such as autonomous driving, where it frequently encounters difficulties caused by inadequate illumination and the absence of well-annotated datasets. Moreover, semantic segmentation models trained on daytime datasets often struggle to generalize effectively to nighttime conditions. Unsupervised domain adaptation (UDA) has shown the potential to address these challenges and has achieved remarkable results for nighttime semantic segmentation. However, existing methods still face limitations in 1) their reliance on style transfer or relighting models, which struggle to generalize to complex nighttime environments, and 2) their neglect of dynamic and small objects like vehicles and poles, which are difficult to learn directly from other domains. This paper proposes a novel UDA method that refines both the label and feature levels for dynamic and small objects in nighttime semantic segmentation. First, we propose a dynamic and small object refinement module to transfer knowledge of dynamic and small objects from the source domain to the target nighttime domain. These dynamic and small objects are normally context-inconsistent in under-exposed conditions. Then, we design a feature prototype alignment module to reduce the domain gap by deploying contrastive learning between features and prototypes of the same class from different domains, while re-weighting the categories of dynamic and small objects. Extensive experiments on three benchmark datasets demonstrate that our method outperforms prior art by a large margin for nighttime segmentation. Project page: https://rorisis.github.io/DSRNSS/.
|
|
09:00-10:00, Paper WePI2T5.7 | |
Latent Disentanglement for Low Light Image Enhancement |
|
Zheng, Zhihao | Lehigh University |
Chuah, Mooi Choo | Lehigh University |
Keywords: Deep Learning for Visual Perception, Deep Learning Methods, Visual Learning
Abstract: Many learning-based low light image enhancement (LLIE) algorithms are based on the Retinex theory. However, Retinex-based decomposition models introduce corruptions that limit their enhancement performance. In this paper, we propose a Latent Disentangle-based Enhancement Network (LDE-Net) for low light vision tasks. The latent disentanglement module disentangles the input image in latent space such that no corruption remains in the disentangled Content and Illumination components. For the LLIE task, we design a Content-Aware Embedding (CAE) module that utilizes Content features to direct the enhancement of the Illumination component. For downstream tasks (e.g., nighttime UAV tracking and low light object detection), we develop an effective lightweight enhancer based on the latent disentanglement framework. Comprehensive quantitative and qualitative experiments demonstrate that our LDE-Net significantly outperforms state-of-the-art methods on various LLIE benchmarks. In addition, the strong results obtained by applying our framework to the downstream tasks further demonstrate the usefulness of our latent disentanglement design.
|
|
09:00-10:00, Paper WePI2T5.8 | |
CaFNet: A Confidence-Driven Framework for Radar Camera Depth Estimation |
|
Sun, Huawei | Technical University of Munich; Infineon Technologies AG |
Feng, Hao | Technical University of Munich |
Ott, Julius | Infineon Technologies AG |
Servadei, Lorenzo | Technical University of Munich |
Wille, Robert | Technical University of Munich |
Keywords: Deep Learning for Visual Perception, Sensor Fusion, RGB-D Perception
Abstract: Depth estimation is critical in autonomous driving for interpreting 3D scenes accurately. Recently, radar-camera depth estimation has attracted considerable interest due to the robustness and low cost of radar. Thus, this paper introduces a two-stage, end-to-end trainable Confidence-aware Fusion Net (CaFNet) for dense depth estimation, combining RGB imagery with sparse and noisy radar point cloud data. The first stage addresses radar-specific challenges, such as ambiguous elevation and noisy measurements, by predicting a radar confidence map and a preliminary coarse depth map. A novel approach is presented for generating the ground truth for the confidence map, which involves associating each radar point with its corresponding object to identify potential projection surfaces. These maps, together with the initial radar input, are processed by a second encoder. For the final depth estimation, we introduce a confidence-aware gated fusion mechanism to integrate radar and image features effectively, thereby enhancing the reliability of the depth map by filtering out radar noise. Our methodology, evaluated on the nuScenes dataset, demonstrates superior performance, improving upon the current leading model by 3.2% in Mean Absolute Error (MAE) and 2.7% in Root Mean Square Error (RMSE). Code: https://github.com/harborsarah/CaFNet
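Confidence-aware gated fusion can be illustrated with a minimal PyTorch module that predicts a per-pixel gate from the concatenated streams. The channel count and the single-convolution gate are assumptions for illustration, not CaFNet's exact design:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Blend image and radar feature maps with a learned per-pixel gate."""

    def __init__(self, channels=64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, img_feat, radar_feat):
        g = self.gate(torch.cat([img_feat, radar_feat], dim=1))
        # Where radar evidence is noisy the gate goes toward zero,
        # falling back on image features: the filtering the abstract describes.
        return g * radar_feat + (1.0 - g) * img_feat
```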
|
|
09:00-10:00, Paper WePI2T5.9 | |
VANP: Learning Where to See for Navigation with Self-Supervised Vision-Action Pre-Training |
|
Nazeri, Mohammad | George Mason University |
Wang, Junzhe | George Mason University |
Payandeh, Amirreza | George Mason University |
Xiao, Xuesu | George Mason University |
Keywords: Deep Learning for Visual Perception, Representation Learning, Vision-Based Navigation
Abstract: Humans excel at efficiently navigating through crowds without collision by focusing on specific visual regions relevant to navigation. However, most robotic visual navigation methods rely on deep learning models pre-trained on vision tasks, which prioritize salient objects that are not necessarily relevant to navigation and are potentially misleading. Alternative approaches train specialized navigation models from scratch, requiring significant computation. On the other hand, self-supervised learning has revolutionized computer vision and natural language processing, but its application to robotic navigation remains underexplored due to the difficulty of defining effective self-supervision signals. Motivated by these observations, in this work, we propose a Self-Supervised Vision-Action Model for Visual Navigation Pre-Training (VANP). Instead of detecting salient objects that are beneficial for tasks such as classification or detection, VANP learns to focus only on specific visual regions that are relevant to the navigation task. To achieve this, VANP uses a history of visual observations, future actions, and a goal image for self-supervision, and embeds them using two small Transformer encoders. VANP then maximizes the information between the embeddings using a mutual information maximization objective function. We demonstrate that most VANP-extracted features match human navigation intuition. VANP achieves performance comparable to models learned end-to-end, with half the training time, and to models trained on a large-scale, fully supervised dataset, i.e., ImageNet, with only 0.08% of the data.
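Mutual information maximization between two embeddings is commonly realized with an InfoNCE-style contrastive objective. The generic sketch below illustrates that idea and is not necessarily VANP's exact objective:

```python
import torch
import torch.nn.functional as F

def info_nce(z_obs, z_act, temperature=0.1):
    """InfoNCE lower bound on the MI between paired embeddings.

    Matching pairs along the batch diagonal are positives; every other
    combination in the batch serves as a negative.
    """
    z_obs = F.normalize(z_obs, dim=1)
    z_act = F.normalize(z_act, dim=1)
    logits = z_obs @ z_act.t() / temperature      # (B, B) similarity matrix
    labels = torch.arange(z_obs.size(0), device=z_obs.device)
    return F.cross_entropy(logits, labels)
```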
|
|
09:00-10:00, Paper WePI2T5.10 | |
SD-Net: Symmetric-Aware Keypoint Prediction and Domain Adaptation for 6D Pose Estimation in Bin-Picking Scenarios |
|
Huang, Dingtao | Tsinghua University |
Lin, Ente | Tsinghua University |
Chen, Lipeng | Tencent |
Liu, Lifu | Shenzhen International Graduate School, Tsinghua University |
Zeng, Long | Tsinghua University |
Keywords: Deep Learning for Visual Perception, Perception for Grasping and Manipulation, Computer Vision for Manufacturing
Abstract: Despite the success of 6D pose estimation in bin-picking scenarios, existing methods still struggle to produce accurate predictions for symmetric objects and real-world scenarios. The primary bottlenecks include 1) ambiguous keypoints caused by object symmetries; 2) the domain gap between real and synthetic data. To circumvent these problems, we propose a new 6D pose estimation network with symmetric-aware keypoint prediction and self-training domain adaptation (SD-Net). SD-Net builds on pointwise keypoint regression and deep Hough voting to perform reliable keypoint detection under clutter and occlusion. Specifically, at the keypoint prediction stage, we design a robust 3D keypoint selection strategy considering the symmetry class of objects and equivalent keypoints, which facilitates locating 3D keypoints even in highly occluded scenes. Additionally, we build an effective filtering algorithm on predicted keypoints to dynamically eliminate ambiguous and outlier keypoint candidates. At the domain adaptation stage, we propose a self-training framework using a student-teacher training scheme. To carefully distinguish reliable predictions, we harness a tailored heuristic for 3D geometry pseudo-labelling based on semi-chamfer distance. On the public Sil'eane dataset, SD-Net achieves state-of-the-art results, obtaining an average precision of 96%. Testing learning and generalization abilities on public Parametric datasets, SD-Net is 8% higher than the state-of-the-art method. The code is available at https://github.com/dingthuang/SD-Net.
|
|
09:00-10:00, Paper WePI2T5.11 | |
MaskingDepth: Masked Consistency Regularization for Semi-Supervised Monocular Depth Estimation |
|
Baek, Jongbeom | Korea University |
Kim, Gyeongnyeon | Korea University |
Park, Seonghoon | Korea University |
An, Honggyu | Korea University |
Poggi, Matteo | University of Bologna |
Kim, Seungryong | Korea University |
Keywords: Deep Learning for Visual Perception, Visual Learning, Deep Learning Methods
Abstract: We propose MaskingDepth, a semi-supervised learning framework for monocular depth estimation. MaskingDepth is designed to enforce consistency between the depths obtained from strongly-augmented images and the pseudo-depths derived from weakly-augmented images, which mitigates the reliance on large quantities of ground-truth depth. In this framework, we leverage uncertainty estimation to retain only high-confidence depth predictions from the weakly-augmented branch as pseudo-depths. We also present a novel data augmentation, dubbed K-way disjoint masking, that takes advantage of a naive token masking strategy as an augmentation, while avoiding the scale ambiguity problem between depths from the weakly- and strongly-augmented branches and the risk of missing small-scale objects. Experiments on the KITTI and NYU-Depth-v2 datasets demonstrate the effectiveness of each component, the method's robustness to the use of fewer depth-annotated images, and its superior performance compared to other state-of-the-art semi-supervised learning methods for monocular depth estimation.
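One plausible reading of K-way disjoint masking is a partition of the image tokens into K non-overlapping subsets that jointly cover every token. A minimal sketch of such a partition follows, with the token count and K as free parameters; how each subset is consumed in the two branches follows the paper:

```python
import numpy as np

def k_way_disjoint_masks(num_tokens, k, rng=None):
    """Return k boolean masks that are pairwise disjoint and jointly
    cover all tokens, each selecting roughly num_tokens / k of them."""
    rng = rng or np.random.default_rng()
    perm = rng.permutation(num_tokens)
    masks = np.zeros((k, num_tokens), dtype=bool)
    for i, chunk in enumerate(np.array_split(perm, k)):
        masks[i, chunk] = True
    return masks

masks = k_way_disjoint_masks(num_tokens=196, k=4)  # e.g. 14x14 ViT tokens
assert masks.sum(axis=0).max() == 1  # disjoint: each token in exactly one subset
```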
|
|
09:00-10:00, Paper WePI2T5.12 | |
Learning to Estimate the Pose of a Peer Robot in a Camera Image by Predicting the States of Its LEDs |
|
Carlotti, Nicholas | Dalle Molle Institute for Artificial Intelligence (IDSIA) |
Nava, Mirko | IDSIA |
Giusti, Alessandro | IDSIA USI-SUPSI |
Keywords: Deep Learning for Visual Perception, Deep Learning Methods, Visual Learning
Abstract: We consider the problem of training a fully convolutional network to estimate the relative 6D pose of a robot given a camera image, when the robot is equipped with independently controllable LEDs placed on different parts of its body. The training data is composed of a few (or zero) images labeled with a ground-truth relative pose and many images labeled only with the true state (on or off) of each of the peer's LEDs. The former data is expensive to acquire, requiring external infrastructure for tracking the two robots; the latter is cheap, as it can be acquired by two unsupervised robots moving randomly and toggling their LEDs while sharing the true LED states via radio. Training with the latter dataset on estimating the LEDs' state of the peer robot (pretext task) promotes learning the relative localization task (end task). Experiments on real-world data acquired by two autonomous wheeled robots show that a model trained only on the pretext task successfully learns to localize a peer robot on the image plane; fine-tuning such a model on the end task with few labeled images yields statistically significant improvements in 6D relative pose estimation with respect to baselines that do not use pretext-task pre-training, as well as alternative approaches. Estimating the state of multiple independent LEDs promotes learning to estimate relative heading. The approach works even when a large fraction of training images do not include the peer robot and generalizes well to unseen environments.
|
|
09:00-10:00, Paper WePI2T5.13 | |
Exploring Few-Beam LiDAR Assistance in Self-Supervised Multi-Frame Depth Estimation |
|
Fan, Rizhao | University of Bologna |
Poggi, Matteo | University of Bologna |
Mattoccia, Stefano | University of Bologna |
Keywords: Deep Learning for Visual Perception, Sensor Fusion, Computer Vision for Automation
Abstract: Self-supervised multi-frame depth estimation methods only require unlabeled monocular videos for training. However, most existing methods face challenges, including accuracy degradation caused by moving objects in dynamic scenes and scale ambiguity due to the absence of real-world references. In this field, the emergence of low-cost LiDAR sensors highlights the potential to improve the robustness of multi-frame depth estimation by exploiting accurate sparse measurements at the correct scale. Moreover, the LiDAR ranging points often intersect moving objects, providing more precise depth cues for them. This paper explores the impact of few-beam LiDAR data on self-supervised multi-frame depth estimation, proposing a method that fuses multi-frame matching and sparse depth features. It significantly enhances depth estimation robustness, particularly in scenarios involving moving objects and textureless backgrounds. We demonstrate the effectiveness of our approach through comprehensive experiments, showcasing its potential to address the limitations of existing methods and paving the way for more robust and reliable depth estimation based on this paradigm.
|
|
09:00-10:00, Paper WePI2T5.14 | |
MARVIS: Motion & Geometry Aware Real and Virtual Image Segmentation |
|
Wu, Jiayi | University of Maryland, College Park |
Lin, Xiaomin | University of Maryland |
Negahdaripour, Shahriar | University of Miami |
Fermuller, Cornelia | University of Maryland |
Aloimonos, Yiannis | University of Maryland |
Keywords: Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization, Marine Robotics
Abstract: Tasks such as autonomous navigation, 3D reconstruction, and object recognition near the water surface are crucial in marine robotics applications. However, challenges arise due to dynamic disturbances, e.g., light reflections and refraction from the random air-water interface, irregular liquid flow, and similar factors, which can lead to potential failures in perception and navigation systems. Traditional computer vision algorithms struggle to differentiate between real and virtual image regions, significantly complicating such tasks. A virtual image region is an apparent representation formed by the redirection of light rays, typically through reflection or refraction, creating the illusion of an object's presence without its actual physical location. This work proposes a novel approach for segmenting real and virtual image regions, exploiting synthetic images combined with domain-invariant information, a Motion Entropy Kernel, and Epipolar Geometric Consistency. Our segmentation network does not need to be re-trained if the domain changes. We show this by deploying the same segmentation network in two different domains: simulation and the real world. By creating realistic synthetic images that mimic the complexities of the water surface, we provide fine-grained training data for our network (MARVIS) to discern between real and virtual images effectively. Through motion- and geometry-aware design choices and comprehensive experimental analysis, we achieve state-of-the-art real-virtual image segmentation performance in the unseen real-world domain, achieving an IoU over 78% and an F1-Score over 86% while ensuring a small computational footprint. MARVIS offers over 43 FPS (8 FPS) inference rates on a single GPU (CPU core). Our code and dataset are available at https://github.com/jiayi-wu-umd/MARVIS.
|
|
09:00-10:00, Paper WePI2T5.15 | |
SSAP: A Shape-Sensitive Adversarial Patch for Comprehensive Disruption of Monocular Depth Estimation in Autonomous Navigation Applications |
|
Guesmi, Amira | NYU Abu Dhabi |
Hanif, Muhammad Abdullah | New York University Abu Dhabi (NYUAD) |
Alouani, Ihsen | Queen's University Belfast |
Ouni, Bassem | Technology Innovation Institute |
Shafique, Muhammad | New York University Abu Dhabi |
Keywords: Deep Learning for Visual Perception, Autonomous Vehicle Navigation, Vision-Based Navigation
Abstract: Monocular depth estimation (MDE) has advanced significantly, primarily through the integration of convolutional neural networks (CNNs) and more recently, Transformers. However, concerns about their susceptibility to adversarial attacks have emerged, especially in safety-critical domains like autonomous driving and robotic navigation. Existing approaches for assessing CNN-based depth prediction methods have fallen short in inducing comprehensive disruptions to the vision system, often limited to specific local areas. In this paper, we introduce SSAP (Shape-Sensitive Adversarial Patch), a novel approach designed to comprehensively disrupt monocular depth estimation (MDE) in autonomous navigation applications. Our patch is crafted to selectively undermine MDE in two distinct ways: by distorting estimated distances or by creating the illusion of an object disappearing from the system's perspective. Notably, our patch is shape-sensitive, meaning it considers the specific shape and scale of the target object, thereby extending its influence beyond immediate proximity. Furthermore, our patch is trained to effectively address different scales and distances from the camera. Experimental results demonstrate that our approach induces a mean depth estimation error surpassing 0.5, impacting up to 99% of the targeted region for CNN-based MDE models. Additionally, we investigate the vulnerability of Transformer-based MDE models to patch-based attacks, revealing that SSAP yields a significant error of 0.59 and exerts substantial influence over 99% of the target region on these models.
|
|
09:00-10:00, Paper WePI2T5.16 | |
Conditional Variational Autoencoders for Probabilistic Pose Regression |
|
Zangeneh, Fereidoon | KTH Royal Institute of Technology |
Bruns, Leonard | KTH Royal Institute of Technology |
Dekel, Amit | Univrses AB |
Pieropan, Alessandro | KTH |
Jensfelt, Patric | KTH - Royal Institute of Technology |
Keywords: Deep Learning for Visual Perception, Localization
Abstract: Robots rely on visual relocalization to estimate their pose from camera images when they lose track. One of the challenges in visual relocalization is repetitive structures in the robot's operating environment. This calls for probabilistic methods that support multiple hypotheses for the robot's pose. We propose such a probabilistic method to predict the posterior distribution of camera poses given an observed image. Our proposed training strategy results in a generative model of camera poses given an image, which can be used to draw samples from the pose posterior distribution. Our method is streamlined and well-founded in theory and outperforms existing methods on localization in the presence of ambiguities.
|
|
WePI2T6 |
Room 6 |
Learning I |
Teaser Session |
Co-Chair: Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
|
09:00-10:00, Paper WePI2T6.1 | |
Bayesian Optimization for Sample-Efficient Policy Improvement in Robotic Manipulation |
|
Röfer, Adrian | University of Freiburg |
Nematollahi, Iman | University of Freiburg |
Welschehold, Tim | Albert-Ludwigs-Universität Freiburg |
Burgard, Wolfram | University of Technology Nuremberg |
Valada, Abhinav | University of Freiburg |
Keywords: Imitation Learning, Reinforcement Learning, Learning from Experience
Abstract: Sample-efficient learning of manipulation skills poses a major challenge in robotics. While recent approaches demonstrate impressive advances in the type of task that can be addressed and the sensing modalities that can be incorporated, they still require large amounts of training data. Especially with regard to learning actions on robots in the real world, this poses a major problem due to the high costs associated with both demonstrations and real-world robot interactions. To address this challenge, we introduce BOpt-GMM, a hybrid approach that combines imitation learning with the robot's own experience collection. We first learn a skill model as a dynamical system encoded in a Gaussian Mixture Model from a few demonstrations. We then improve this model with Bayesian optimization building on a small number of autonomous skill executions in a sparse reward setting. We demonstrate the sample efficiency of our approach on multiple complex manipulation skills in both simulations and real-world experiments. We make the video, code, and pre-trained models publicly available at http://bopt-gmm.cs.uni-freiburg.de.
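One plausible shape for the Bayesian-optimization stage, assuming the GMM skill model is improved through a low-dimensional parameter perturbation scored by sparse episode reward; `rollout_return`, the bounds, and the use of scikit-optimize's gp_minimize are all illustrative assumptions rather than the authors' implementation:

```python
from skopt import gp_minimize
from skopt.space import Real

def rollout_return(update_vector):
    """Placeholder: apply update_vector to the GMM skill model, execute
    the skill on the robot, and return the sparse episode reward."""
    return 0.0  # replace with a real skill execution

space = [Real(-0.05, 0.05) for _ in range(6)]  # small perturbation per parameter
result = gp_minimize(
    lambda u: -rollout_return(u),  # gp_minimize minimizes, so negate the reward
    space,
    n_calls=30,                    # a small budget of real skill executions
)
best_update = result.x
```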
|
|
09:00-10:00, Paper WePI2T6.2 | |
DecAP : Decaying Action Priors for Accelerated Imitation Learning of Torque-Based Legged Locomotion Policies |
|
Sood, Shivam | Indian Institute of Technology Kharagpur |
Sun, Ge | National University of Singapore |
Li, Peizhuo | National University of Singapore |
Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
Keywords: Imitation Learning, Legged Robots, Reinforcement Learning
Abstract: Optimal Control for legged robots has gone through a paradigm shift from position-based to torque-based control, owing to the latter's compliant and robust nature. In parallel to this shift, the community has also turned to Deep Reinforcement Learning (DRL) as a promising approach to directly learn locomotion policies for complex real-life tasks. However, most end-to-end DRL approaches still operate in position space, mainly because learning in torque space is often sample-inefficient and does not consistently converge to natural gaits. To address these challenges, we propose a two-stage framework. In the first stage, we generate our own imitation data by training a position-based policy, eliminating the need for expert knowledge to design optimal controllers. The second stage incorporates decaying action priors, a novel method to enhance the exploration of torque-based policies aided by imitation rewards. We show that our approach consistently outperforms imitation learning alone and is robust to scaling these rewards from 0.1x to 10x. We further validate the benefits of torque control by comparing the robustness of a position-based policy to a position-assisted torque-based policy on a quadruped (Unitree Go1) without any domain randomization in the form of external disturbances during training.
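The decaying action prior can be sketched as a torque action plus a prior term whose weight decays over training. The linear schedule and additive blend below are illustrative assumptions, not the paper's exact formulation:

```python
def decayed_action(torque_policy, prior_policy, obs, step, decay_steps=1_000_000):
    """Torque action plus a position-derived prior whose influence decays to zero.

    Early in training the prior shapes exploration toward natural gaits;
    by decay_steps the policy acts purely on its own learned torques.
    """
    alpha = max(0.0, 1.0 - step / decay_steps)  # linear decay, an assumption
    return torque_policy(obs) + alpha * prior_policy(obs)

# Toy usage with stand-in policies: 0.2 + 0.75 * 1.0 == 0.95
act = decayed_action(lambda o: 0.2, lambda o: 1.0, obs=None, step=250_000)
```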
|
|
09:00-10:00, Paper WePI2T6.3 | |
Efficient Trajectory Forecasting and Generation with Conditional Flow Matching |
|
Ye, Sean | Georgia Institute of Technology |
Gombolay, Matthew | Georgia Institute of Technology |
Keywords: Imitation Learning, Deep Learning Methods, Learning from Demonstration
Abstract: Trajectory prediction and generation are crucial for autonomous robots in dynamic environments. While prior research has typically focused on either prediction or generation, our approach unifies these tasks to provide a versatile framework and achieve state-of-the-art performance. While diffusion models excel in trajectory generation, their iterative sampling process is computationally intensive, hindering robotic systems' dynamic capabilities. We introduce Trajectory Conditional Flow Matching (T-CFM), a novel approach that uses flow matching techniques to learn a time-varying vector field for efficient, fast trajectory generation. T-CFM demonstrates effectiveness in adversarial tracking, real-world aircraft trajectory forecasting, and long-horizon planning, outperforming state-of-the-art baselines with 35% higher predictive accuracy and 142% improved planning performance. Crucially, T-CFM achieves up to 100x speed-up compared to diffusion models without sacrificing accuracy, enabling real-time decision making in robotics. Codebase: https://github.com/CORE-Robotics-Lab/TCFM
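For readers unfamiliar with flow matching, a standard conditional flow matching training step regresses a time-dependent vector field onto the straight-line displacement between a noise sample and a data sample. This generic sketch omits T-CFM's task conditioning:

```python
import torch

def cfm_loss(vector_field, x1):
    """Conditional flow matching with linear interpolation paths.

    x1: batch of target trajectories, flattened to (B, D).
    vector_field(x_t, t) -> predicted velocity, shape (B, D).
    """
    x0 = torch.randn_like(x1)                        # noise source sample
    t = torch.rand(x1.size(0), 1, device=x1.device)  # uniform time in [0, 1]
    x_t = (1.0 - t) * x0 + t * x1                    # point on the straight path
    target = x1 - x0                                 # constant path velocity
    return ((vector_field(x_t, t) - target) ** 2).mean()
```

At inference, a trajectory is generated by integrating dx/dt = v(x, t) from noise with a handful of ODE steps, which is where the speed-up over iterative diffusion sampling comes from.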
|
|
09:00-10:00, Paper WePI2T6.4 | |
Driving from Vision through Differentiable Optimal Control |
|
Acerbo, Flavia Sofia | Siemens Digital Industries Software |
Swevers, Jan | KU Leuven |
Tuytelaars, Tinne | KU Leuven |
Tong, Son | Siemens Digital Industries Software |
Keywords: Imitation Learning, Optimization and Optimal Control, Intelligent Transportation Systems
Abstract: This paper proposes DriViDOC: a framework for Driving from Vision through Differentiable Optimal Control, and its application to learn autonomous driving controllers from human demonstrations. DriViDOC combines the automatic inference of relevant features from camera frames with the properties of nonlinear model predictive control (NMPC), such as constraint satisfaction. Our approach leverages the differentiability of parametric NMPC, allowing for end-to-end learning of the driving model from images to control. The model is trained on an offline dataset comprising various human demonstrations collected on a motion-base driving simulator. During online testing, the model demonstrates successful imitation of different driving styles, and the interpreted NMPC parameters provide insights into the achievement of specific driving behaviors. Our experimental results show that DriViDOC outperforms other methods involving NMPC and neural networks, exhibiting an average improvement of 20% in imitation scores.
|
|
09:00-10:00, Paper WePI2T6.5 | |
Learning Multi-Reference Frame Skills from Demonstration with Task-Parameterized Gaussian Processes |
|
Ramirez Montero, Mariano | Delft University of Technology |
Franzese, Giovanni | TU Delft |
Kober, Jens | TU Delft |
Della Santina, Cosimo | TU Delft |
Keywords: Imitation Learning, Probabilistic Inference, Probability and Statistical Methods
Abstract: A central challenge in Learning from Demonstration is to generate representations that are adaptable and can generalize to unseen situations. This work proposes to learn such a representation, without using task-specific heuristics, within the context of multi-reference frame skill learning by superimposing local skills in the global frame. Local policies are first learned by fitting the relative skills with respect to each frame using Gaussian Processes (GPs). Then, another GP, which determines the relevance of each frame at every time step, is trained in a self-supervised manner from a different batch of demonstrations. The uncertainty quantification capability of GPs is exploited to stabilize the local policies and to train the frame relevance in a fully Bayesian way. We validate the method on a dataset of multi-frame tasks generated in simulation and in real-world experiments with a robotic pick-and-place re-shelving manipulation task. We evaluate the performance of our method with two metrics: how close the generated trajectories get to each of the task goals, and the deviation between these trajectories and test expert trajectories. On both of these metrics, the proposed method consistently outperforms the state-of-the-art baseline, the Task-Parameterised Gaussian Mixture Model (TPGMM).
|
|
09:00-10:00, Paper WePI2T6.6 | |
IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning |
|
Hoque, Ryan | University of California, Berkeley |
Mandlekar, Ajay Uday | NVIDIA |
Garrett, Caelan | NVIDIA |
Goldberg, Ken | UC Berkeley |
Fox, Dieter | University of Washington |
Keywords: Imitation Learning, Data Sets for Robot Learning, Learning from Demonstration
Abstract: Imitation learning is a promising paradigm for training robot control policies, but these policies can suffer from distribution shift, where the conditions at evaluation time differ from those in the training data. A popular approach for increasing policy robustness to distribution shift is interactive imitation learning (i.e., DAgger and variants), where a human operator provides corrective interventions during policy rollouts. However, collecting a sufficient amount of interventions to cover the distribution of policy mistakes can be burdensome for human operators. We propose IntervenGen (I-Gen), a novel data augmentation system for robot control that autonomously produces a large set of corrective interventions with rich coverage of the state space from a small number of human interventions. We apply I-Gen to 4 simulated environments and 1 physical environment with object pose estimation error and show that it can increase policy robustness by up to 39x with only 10 human interventions. Videos and more results are available at https://sites.google.com/view/intervengen2024.
|
|
09:00-10:00, Paper WePI2T6.7 | |
Learning Generalizable Tool-Use Skills through Trajectory Generation |
|
Qi, Carl | University of Texas at Austin |
Wu, Yilin | Carnegie Mellon University |
Yu, Lifan | Carnegie Mellon University |
Liu, Haoyue | Carnegie Mellon University |
Jiang, Bowen | Carnegie Mellon University |
Lin, Xingyu | UC Berkeley |
Held, David | Carnegie Mellon University |
Keywords: Imitation Learning, Deep Learning in Grasping and Manipulation, Learning from Demonstration
Abstract: Autonomous systems that efficiently utilize tools can assist humans in completing many common tasks such as cooking and cleaning. However, current systems fall short of matching human-level intelligence in terms of adapting to novel tools. Prior works based on affordance often make strong assumptions about the environments and cannot scale to more complex, contact-rich tasks. In this work, we tackle this challenge and explore how agents can learn to use previously unseen tools to manipulate deformable objects. We propose to learn a generative model of the tool-use trajectories as a sequence of point clouds, which generalizes to different tool shapes. Given any novel tool, we first generate a tool-use trajectory and then optimize the sequence of tool poses to align with the generated trajectory. We train a single model for four different challenging deformable object manipulation tasks. Our model is trained with demonstration data from just a single tool for each task and is able to generalize to various novel tools, significantly outperforming baselines. We also test our trained policy in the real world with unseen tools, and it achieves performance similar to a human oracle. Additional materials can be found on our project website.
|
|
09:00-10:00, Paper WePI2T6.8 | |
ARCADE: Scalable Demonstration Collection and Generation Via Augmented Reality for Imitation Learning |
|
Yang, Yue | The University of North Carolina at Chapel Hill |
Ikeda, Bryce | University of North Carolina Chapel Hill |
Bertasius, Gedas | UNC Chapel Hill |
Szafir, Daniel J. | University of North Carolina at Chapel Hill |
Keywords: Imitation Learning, Learning from Demonstration, Physical Human-Robot Interaction
Abstract: Robot Imitation Learning (IL) is a crucial technique in robot learning, where agents learn by mimicking human demonstrations. However, IL encounters scalability challenges stemming from both non-user-friendly demonstration collection methods and the extensive time required to amass a sufficient number of demonstrations for effective training. In response, we introduce the Augmented Reality for Collection and generAtion of DEmonstrations (ARCADE) framework, designed to scale up demonstration collection for robot manipulation tasks. Our framework combines two key capabilities: 1) it leverages AR to make demonstration collection as simple as users performing daily tasks using their hands, and 2) it enables the automatic generation of additional synthetic demonstrations from a single human-derived demonstration, significantly reducing user effort and time. We assess ARCADE's performance on a real Fetch robot across three robotics tasks: 3-Waypoints-Reach, Push, and Pick-And-Place. Using our framework, we rapidly trained a policy using vanilla Behavioral Cloning (BC), a classic IL algorithm, which excelled across these three tasks. We also deploy ARCADE on a real household task, Pouring-Water, achieving an 80% success rate.
|
|
09:00-10:00, Paper WePI2T6.9 | |
Self Supervised Detection of Incorrect Human Demonstrations: A Path Toward Safe Imitation Learning by Robots in the Wild |
|
Sojib, Noushad | University of New Hampshire |
Begum, Momotaz | University of New Hampshire |
Keywords: Imitation Learning, Data Sets for Robot Learning, Learning from Demonstration
Abstract: A major appeal of learning from demonstrations or imitation learning (IL) in robotics is that it learns a policy directly from lay users. However, lay users may inadvertently provide erroneous demonstrations that lead to learning policies that are inaccurate and hence unsafe for humans and/or the robot. This paper makes two contributions toward recognizing human errors in demonstrations and thereby helping to learn a safe IL policy. First, we created a dataset, Layman V1.0, with 15 lay users who provided a total of 1200 demonstrations for three simulated tasks (Lift, Can, and Square in the simulated Robosuite environment) and two real-robot tasks with a Sawyer robot, using a custom-designed Android app for tele-operation. Second, we propose a framework named Behavior Cloning for Error Detection (BED) to autonomously detect and discard erroneous demonstrations from a demonstration pool. Our method uses behavior cloning as a self-supervised technique and assigns a binary weight to each demonstration based on its inconsistencies with the rest of the demonstrations. We show the effectiveness of this framework in detecting incorrect demonstrations in the Layman V1.0 dataset. We further show that state-of-the-art (SOTA) policy learners learn a better policy when bad demonstrations, identified through the proposed framework, are removed from the training pool. The dataset and code are available at https://github.com/AssistiveRoboticsUNH/bed
|
|
09:00-10:00, Paper WePI2T6.10 | |
Learning Force-Based Control Policies Via Differentiable Virtual Coupling (Diff-VC) |
|
Galvan, Aldo | University of Texas at Austin |
Majewicz Fey, Ann | University of Texas at Austin |
Patel, Ravi | University of Texas at Austin |
|
09:00-10:00, Paper WePI2T6.11 | |
RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective |
|
Wang, Chenxi | Shanghai Noematrix Intelligence Technology Ltd |
Fang, Hongjie | Shanghai Jiao Tong University |
Fang, Hao-Shu | Massachusetts Institute of Technology |
Lu, Cewu | Shanghai Jiao Tong University |
Keywords: Imitation Learning, RGB-D Perception
Abstract: Precise robot manipulations require rich spatial information in imitation learning. Image-based policies model object positions from fixed cameras, which are sensitive to camera view changes. Policies utilizing 3D point clouds usually predict keyframes rather than continuous actions, posing difficulty in frequently changing scenarios. To utilize 3D perception efficiently, we present RISE, an end-to-end baseline for real-world imitation learning, which predicts continuous actions directly from single-view point clouds. It compresses the point cloud to tokens with a sparse 3D encoder. After adding sparse positional encoding, the tokens are then featurized using a transformer. Finally, the features are decoded into robot actions by a diffusion head. Trained with 50 demonstrations for each real-world task, RISE surpasses currently representative 2D and 3D policies by a large margin, showcasing significant advantages in both accuracy and efficiency. Experiments also demonstrate that RISE is more general and robust to environmental change compared with previous baselines. Project website: rise-policy.github.io.
|
|
09:00-10:00, Paper WePI2T6.12 | |
TinyLidarNet: 2D Lidar-Based End-To-End Deep Learning Model for F1TENTH Autonomous Racing |
|
Zarrar, Mohammed Misbah | University of Kansas |
Weng, QiTao | University of Kansas |
Yerjan, Bakhbyergyen | University of Kansas |
Soyyigit, Ahmet | University of Kansas |
Yun, Heechul | University of Kansas |
Keywords: Imitation Learning, Machine Learning for Robot Control, Embedded Systems for Robotic and Automation
Abstract: Prior research has demonstrated the effectiveness of end-to-end deep learning for robotic navigation, where the control signals are directly derived from raw sensory data. However, the majority of existing end-to-end navigation solutions are predominantly camera-based. In this paper, we introduce TinyLidarNet, a lightweight 2D LiDAR-based end-to-end deep learning model for autonomous racing. An F1TENTH vehicle using TinyLidarNet won 3rd place in the 12th F1TENTH Autonomous Grand Prix competition, demonstrating its competitive performance. We systematically analyze its performance on untrained tracks and computing requirements for real-time processing. We find that TinyLidarNet's 1D Convolutional Neural Network (CNN) based architecture significantly outperforms widely used Multi-Layer Perceptron (MLP) based architecture. In addition, we show that it can be processed in real-time on low-end micro-controller units (MCUs).
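The abstract's 1D-CNN design maps a raw planar scan directly to control commands. A minimal PyTorch sketch follows, with an assumed scan length, channel widths, and a steering/throttle head; the actual TinyLidarNet architecture may differ:

```python
import torch
import torch.nn as nn

class TinyLidarNet1D(nn.Module):
    """1D CNN from a raw 2D-LiDAR scan to [steering, throttle]."""

    def __init__(self, scan_points=1080):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 24, kernel_size=10, stride=4), nn.ReLU(),
            nn.Conv1d(24, 36, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv1d(36, 48, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer the flattened size for the head
            n = self.features(torch.zeros(1, 1, scan_points)).shape[1]
        self.head = nn.Sequential(nn.Linear(n, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, scan):  # scan: (B, scan_points) range readings
        return self.head(self.features(scan.unsqueeze(1)))
```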
|
|
09:00-10:00, Paper WePI2T6.13 | |
Robust Imitation Learning for Mobile Manipulator Focusing on Task-Related Viewpoints and Regions |
|
Ishida, Yutaro | Toyota Motor Corporation |
Noguchi, Yuki | Toyota Motor Corporation |
Kanai, Takayuki | Toyota Motor Corporation |
Shintani, Kazuhiro | Toyota Motor Corporation |
Bito, Hiroshi | Toyota Motor Corporation |
Keywords: Imitation Learning, Visual Learning, Mobile Manipulation
Abstract: We study how to generalize the visuomotor policy of a mobile manipulator from the perspective of visual observations. A mobile manipulator is prone to occlusion by its own body when only a single viewpoint is employed, and to significant domain shift when deployed in diverse situations. However, to the best of the authors' knowledge, no study has solved occlusion and domain shift simultaneously and proposed a robust policy. In this paper, we propose a robust imitation learning method for mobile manipulators that focuses on task-related viewpoints and their spatial regions when observing multiple viewpoints. The multiple-viewpoint policy includes an attention mechanism, which is learned with an augmented dataset, and yields optimal viewpoints and visual embeddings that are robust to occlusion and domain shift. Comparison of our results for different tasks and environments with those of previous studies revealed that our proposed method improves the success rate by up to 29.3 points. We also conduct ablation studies using our proposed method. Learning task-related viewpoints from the multiple-viewpoint dataset increases robustness to occlusion compared to using a single fixed viewpoint. Focusing on task-related regions contributes to up to a 33.3-point improvement in the success rate under domain shift.
|
|
09:00-10:00, Paper WePI2T6.14 | |
Safe CoR: A Dual-Expert Approach to Integrating Imitation Learning and Safe Reinforcement Learning Using Constraint Rewards |
|
Kwon, Hyeokjin | Seoul National University |
Lee, Gunmin | Seoul National University |
Lee, Junseo | Seoul National University |
Oh, Songhwai | Seoul National University |
Keywords: Reinforcement Learning, Imitation Learning
Abstract: In the realm of autonomous agents, ensuring safety and reliability in complex and dynamic environments remains a paramount challenge. Safe reinforcement learning addresses these concerns by introducing safety constraints, but still faces challenges in navigating intricate environments such as complex driving situations. To overcome these challenges, we present the safe constraint reward (Safe CoR) framework, a novel method that utilizes two types of expert demonstrations: reward expert demonstrations focusing on performance optimization and safe expert demonstrations prioritizing safety. By exploiting a constraint reward (CoR), our framework guides the agent to balance performance goals of reward sum with safety constraints. We test the proposed framework in diverse environments, including Safety Gym, MetaDrive, and the real-world Jackal platform. Our proposed framework improves algorithm performance by 39% and reduces constraint violations by 88% on the real-world Jackal platform, highlighting its effectiveness. Through this innovative approach, we expect significant advancements in real-world performance, leading to transformative effects in the realm of safe and reliable autonomous agents.
|
|
09:00-10:00, Paper WePI2T6.15 | |
Imitation Learning for Sim-To-Real Adaptation of Robotic Cutting Policies Based on Residual Gaussian Process Disturbance Force Model |
|
Hathaway, Jamie | University of Birmingham, Birmingham, UK |
Rastegarpanah, Alireza | University of Birmingham |
Stolkin, Rustam | University of Birmingham |
Keywords: Reinforcement Learning, Transfer Learning, Model Learning for Control
Abstract: Robotic cutting, a crucial task in applications such as disassembly and decommissioning, faces challenges due to uncertainties in real-world environments. This paper presents a novel approach to enhance sim-to-real transfer of robotic cutting policies, leveraging a hybrid method that integrates Gaussian process (GP) regression to model disturbance forces encountered during cutting tasks. By learning from a limited number of real-world trials, our method captures residual process dynamics, enabling effective adaptation to diverse materials without the need for fine-tuning on physical robots. Key to our approach is the utilisation of imitation learning, where expert actions in the uncorrected simulation are paired with GP-corrected observations. This pairing aligns action distributions between simulated and real-world domains, facilitating robust policy transfer. We illustrate the efficacy of our method through real-world cutting trials in autonomously adapting to diverse material properties; our method surpasses re-training while providing benefits similar to fine-tuning in real-world cutting scenarios. Notably, policies transferred using our approach exhibit enhanced resilience to noise and disturbances, while maintaining fidelity to expert behaviours from the source domain.
|
|
09:00-10:00, Paper WePI2T6.16 | |
ViSaRL: Visual Reinforcement Learning Guided by Human Saliency |
|
Liang, Anthony | University of Southern California |
Thomason, Jesse | USC Viterbi School of Engineering |
Bıyık, Erdem | University of Southern California |
Keywords: Representation Learning, Reinforcement Learning, Imitation Learning
Abstract: Training robots to perform complex control tasks from high-dimensional pixel input using reinforcement learning (RL) is sample-inefficient, because image observations are composed primarily of task-irrelevant information. By contrast, humans are able to visually attend to task-relevant objects and areas. Based on this insight, we introduce Visual Saliency-Guided Reinforcement Learning (ViSaRL). Using ViSaRL to learn visual representations significantly improves the success rate, sample efficiency, and generalization of an RL agent on diverse tasks, including the DeepMind Control benchmark and robot manipulation both in simulation and on a real robot. We present approaches for incorporating saliency into both CNN- and Transformer-based encoders. We show that visual representations learned using ViSaRL are robust to various sources of visual perturbation, including perceptual noise and scene variations. ViSaRL nearly doubles the success rate on the real-robot tasks compared to the baseline, which does not use saliency.
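One straightforward way to incorporate a saliency map into a CNN encoder, as the abstract mentions, is to append it as an extra input channel. This is a hedged sketch of that idea, not necessarily the paper's exact mechanism:

```python
import torch
import torch.nn as nn

class SaliencyConditionedEncoder(nn.Module):
    """CNN encoder over RGB plus a 1-channel human-saliency map."""

    def __init__(self, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, stride=2), nn.ReLU(),  # RGB + saliency
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, out_dim),
        )

    def forward(self, rgb, saliency):
        # saliency: (B, 1, H, W) in [0, 1], aligned with the RGB frame
        return self.net(torch.cat([rgb, saliency], dim=1))
```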
|
|
WePI2T7 |
Room 7 |
Grasping & Manipulation I |
Teaser Session |
|
09:00-10:00, Paper WePI2T7.1 | |
RTTF: Rapid Tactile Transfer Framework for Contact-Rich Manipulation Tasks |
|
Wu, Qiwei | Harbin Institute of Technology, Shenzhen |
Peng, Xuanbin | Harbin Institute of Technology, Shenzhen |
Zhou, Jiayu | Harbin Institute of Technology, Shenzhen |
Sun, Zhuoran | Harbin Institute of Technology, Shenzhen |
Xiong, Xiaogang | Harbin Institute of Technology, Shenzhen |
Lou, Yunjiang | Harbin Institute of Technology, Shenzhen |
Keywords: Force and Tactile Sensing, Dexterous Manipulation, Machine Learning for Robot Control
Abstract: An increasing number of robotic manipulation tasks now use optical tactile sensors to provide tactile feedback, making tactile servo control a crucial aspect of robotic operations. This paper presents a rapid tactile transfer framework (RTTF) that achieves optical-tactile image sim2real transfer and robust tactile servo control using limited paired data. The sim2real aspect of RTTF employs a semi-supervised approach, beginning with pretraining the latent-space representations of tactile images and subsequently mapping different tactile image domains to a shared latent space within a simulated tactile image domain. This latent space, combined with the proprioceptive information of the robotic arm, is then integrated into a privileged learning framework for policy training, which results in a deployable tactile control policy. Our results demonstrate the robustness of the proposed framework in achieving task objectives across different tactile sensors with varying physical parameters. Furthermore, manipulators equipped with tactile sensors allow for rapid training and deployment on diverse contact-rich tasks, including object pushing and surface-following.
|
|
09:00-10:00, Paper WePI2T7.2 | |
Seg2Grasp: A Robust Modular Suction Grasping in Bin Picking |
|
Yoon, Hye Jung | Seoul National University |
Kim, Juno | Seoul National University |
Park, Yesol | Seoul National University |
Lee, Jun Ki | Seoul National University |
Zhang, Byoung-Tak | Seoul National University |
Keywords: Logistics, RGB-D Perception, Grasping
Abstract: Current bin picking methods that rely heavily on end-to-end learning often falter when confronted with unfamiliar or complex objects in unstructured environments. To overcome these limitations, we introduce Seg2Grasp, a modular pipeline designed for robust suction grasping in dynamic and cluttered bin scenarios. Seg2Grasp is built on a three-step process: Segmentation, Grasping, and Classification. The Segmentation module employs a Transformer-based model to generate class-agnostic object masks from RGB-D images, ensuring accurate detection across various conditions. The Grasping module uses surface normals and mask proposals to determine the optimal suction points, enhancing grasp success. Finally, the Classification module leverages fine-tuned open-vocabulary Mask-CLIP for precise object identification, enabling versatile handling of diverse objects. Real-world robotic experiments demonstrate that Seg2Grasp outperforms existing methods in success rates and adaptability, establishing it as a powerful tool for automated bin picking in industrial settings.
|
|
09:00-10:00, Paper WePI2T7.3 | |
Beyond the Cascade: Juggling Vanilla Siteswap Patterns |
|
Gomez Andreu, Mario Alejandro | Technical University Darmstadt |
Ploeger, Kai | Technische Universität Darmstadt |
Peters, Jan | Technische Universität Darmstadt |
Keywords: Art and Entertainment Robotics, Manipulation Planning, Dexterous Manipulation
Abstract: Being widespread in human motor behavior, dynamic movements demonstrate higher efficiency and greater capacity to address a broader range of skill domains compared to their quasi-static counterparts. Among the frequently studied dynamic manipulation problems, robotic juggling tasks stand out due to their inherent ability to scale their difficulty levels to arbitrary extents, making them an excellent subject for investigation. In this study, we explore juggling patterns with mixed throw heights, following the vanilla siteswap juggling notation, which is widely adopted by jugglers to describe toss juggling patterns. This requires extending our previous analysis of the simpler cascade juggling task with a throw-height sequence planner and further constraints on the end-effector trajectory. These are not necessary for cascade patterns but are vital to achieving patterns with mixed throw heights. Using a simulated environment, we demonstrate successful juggling of most common 3-9 ball siteswap patterns up to 9-ball height, transitions between these patterns, and random sequences covering all possible vanilla siteswap patterns with throws between 2 and 9 ball height. https://kai-ploeger.com/beyond-cascades
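For context on the notation, a vanilla siteswap is valid exactly when no two throws land on the same beat, and the number of balls it juggles is the average of its throw values. This is a standard property of the notation rather than anything specific to the paper's planner:

```python
def is_valid_vanilla_siteswap(pattern):
    """Classic vanilla siteswap condition: the landing beats
    (i + throw_i) mod n must all be distinct, one catch per beat."""
    n = len(pattern)
    return len({(i + t) % n for i, t in enumerate(pattern)}) == n

def num_balls(pattern):
    """A valid vanilla siteswap juggles mean(pattern) balls."""
    return sum(pattern) / len(pattern)

assert is_valid_vanilla_siteswap([5, 3, 1])      # a valid 3-ball pattern
assert not is_valid_vanilla_siteswap([5, 4, 3])  # two throws land together
assert num_balls([5, 3, 1]) == 3
```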
|
|
09:00-10:00, Paper WePI2T7.4 | |
Insert-One: One-Shot Robust Visual-Force Servoing for Novel Object Insertion with 6-DoF Tracking |
|
Chang, Haonan | Rutgers University |
Boularias, Abdeslam | Rutgers University |
Jain, Siddarth | Mitsubishi Electric Research Laboratories (MERL) |
Keywords: Assembly, Perception for Grasping and Manipulation, Visual Servoing
Abstract: Recent advancements in autonomous robotic assembly have shown promising results, especially in addressing the precision insertion challenge. However, achieving adaptability across diverse object categories and tasks often necessitates a learning phase that requires costly real-world data collection. Moreover, previous research often assumes either the rigid attachment of the inserted object to the robot’s end-effector or relies on precise calibration within structured environments. We propose a one-shot method for high-precision contact-rich manipulation assembly tasks, enabling a robot to perform insertions of new objects from randomly presented orientations using just a single demonstration image. Our method incorporates a hybrid framework that blends 6-DoF visual tracking-based iterative control and impedance control, facilitating high-precision tasks with real-time visual feedback. Importantly, our approach requires no pre-training and demonstrates resilience against uncertainties arising from camera pose calibration errors and disturbances in the object in-hand pose. We validate the effectiveness of the proposed framework through extensive experiments in real-world scenarios, encompassing various high-precision assembly tasks.
|
|
09:00-10:00, Paper WePI2T7.5 | |
Exploring How Non-Prehensile Manipulation Expands Capability in Robots Experiencing Multi-Joint Failure |
|
Briscoe-Martinez, Gilberto | University of Colorado Boulder |
Pasricha, Anuj | University of Colorado Boulder |
Abderezaei, Ava | University of Colorado Boulder |
Chaganti, Rama Durga Santosh Kumar | University of Colorado Boulder |
Vajrala, Sarath Chandra | University of Colorado Boulder |
Popuri, Srikanth | University of Colorado Boulder |
Roncone, Alessandro | University of Colorado Boulder |
Keywords: Failure Detection and Recovery, Manipulation Planning, Motion and Path Planning
Abstract: This work explores non-prehensile manipulation (NPM) and whole-body interaction as strategies for enabling robotic manipulators to conduct manipulation tasks despite experiencing locked multi-joint (LMJ) failures. LMJs are critical system faults where two or more joints become inoperable; they impose constraints on the robot's configuration and control spaces, consequently limiting the capability and reach of a prehensile-only approach. Our approach involves three components: i) modeling the failure-constrained workspace of the robot, ii) generating a kinodynamic map of NPM actions within this workspace, and iii) a manipulation action planner that uses a sim-in-the-loop approach to select the best actions to take from the kinodynamic map. The experimental evaluation shows that our approach can increase the failure-constrained reachable area in LMJ cases by 79%. Further, it demonstrates the ability to complete real-world manipulation tasks with up to 88.9% success when the end-effector is unusable and up to 100% success when it is usable.
|
|
09:00-10:00, Paper WePI2T7.6 | |
Multimodal Failure Prediction for Vision-Based Manipulation Tasks with Camera Faults |
|
Ma, Yuliang | University of Stuttgart |
Liu, Jingyi | University of Stuttgart |
Mamaev, Ilshat | Proximity Robotics & Automation GmbH |
Morozov, Andrey | University of Stuttgart |
Keywords: Failure Detection and Recovery, Data Sets for Robotic Vision, Deep Learning in Grasping and Manipulation
Abstract: Due to the increasing behavioral and structural complexity of robots, it is challenging to predict the execution outcome after error detection. Anomaly detection methods can help detect errors and prevent potential failures. However, not every fault leads to a failure, due to the system's fault tolerance or unintended error masking. In practical applications, a robotic system should have a potential failure evaluation module to estimate the probability of failure when receiving an error alert. Subsequently, a decision-making mechanism should help to take the next action, e.g., terminate, degrade performance, or continue the execution of the task. This paper proposes a multimodal failure prediction method for vision-based manipulation systems that are subject to potential camera faults. We inject faults into images (e.g., noise and blur) and observe manipulation failure scenarios (e.g., pick failure, place failure, and collision) that can occur during the task. Through extensive fault injection experiments, we created a FAULT-to-FAILURE dataset containing 4000 real-world manipulation samples. The dataset is subsequently used to train the failure predictor. Our approach processes the combination of RGB images, masked images, and planned paths to effectively evaluate whether a certain faulty image could potentially lead to a manipulation failure. Results demonstrate that the proposed method outperforms state-of-the-art models in terms of overall performance, requires fewer sensors, and achieves faster inference speeds. The analytical software prototype and dataset are available at Github: MultimodalFailurePrediction.
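A sketch of the kind of camera-fault injection the abstract describes (noise and blur); the authors' exact fault models and parameter ranges are not specified here, so the values below are assumptions:

import numpy as np
from scipy.ndimage import gaussian_filter

def inject_gaussian_noise(img, sigma=10.0, rng=np.random.default_rng(0)):
    # Additive pixel noise, clipped back to the valid intensity range.
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def inject_blur(img, sigma=2.0):
    # Gaussian blur applied per channel (sigma 0 on the channel axis).
    blurred = gaussian_filter(img.astype(np.float32), sigma=(sigma, sigma, 0))
    return np.clip(blurred, 0, 255).astype(np.uint8)

rgb = (np.random.default_rng(1).random((480, 640, 3)) * 255).astype(np.uint8)
faulty = inject_blur(inject_gaussian_noise(rgb))   # one corrupted training sample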
|
|
09:00-10:00, Paper WePI2T7.7 | |
Development of a Bendable and Extendable Soft Gripper Driven by Differential Worm Gear Mechanism |
|
Selvamuthu, Moses Gladson | Yamagata University |
Tadakuma, Riichiro | Yamagata University |
Keywords: Grippers and Other End-Effectors, Mechanism Design, Soft Robot Applications
Abstract: A gripper mechanism was developed that uses flexible double-rack fingers actuated by a differential worm gear mechanism and can change its finger length according to the object being grasped. The thermoplastic polyurethane (TPU) based fingers were soft, impact resistant, and highly compliant, and were able to conform to the contour of objects when grasping. The fingers can extend, contract, and bend in 2D space when driven by the differential worm gear mechanism. A normal worm and an inner worm gear were used to actuate the flexible double-rack fingers. The relative motions between the two worm gears resulted in different motions of the finger. A multi-fingered configuration of the gripper can be driven by the same mechanism using two actuators. A mathematical model was developed to describe the kinematics of the finger for positioning experiments. Experiments were also conducted to measure the fingertip force for different finger lengths. Grasping experiments with two- and three-finger grippers were performed to test the grasping performance for objects of different size, shape, and weight. The gripper successfully grasped objects of different sizes by adjusting the finger length and conforming to the shape of the objects.
|
|
09:00-10:00, Paper WePI2T7.8 | |
Gravity-Aware Grasp Generation with Implicit Grasp Mode Selection for Underactuated Hands |
|
Ko, Tianyi | Woven by Toyota, Inc |
Ikeda, Takuya | Woven by Toyota, Inc |
Stewart, Thomas | Woven by Toyota |
Lee, Robert | Australian Centre for Robotic Vision |
Nishiwaki, Koichi | Woven by Toyota |
Keywords: Grasping, Perception for Grasping and Manipulation, AI-Enabled Robotics
Abstract: Learning-based grasp detectors typically assume a precision grasp, where each finger has only one contact point, and estimate the grasp probability. In this work, we propose a data generation and learning pipeline that can leverage power grasping, which has more contact points with an enveloping configuration and is robust against both positioning error and force disturbance. To train a grasp detector to prioritize power grasp while still keeping precision grasp as the secondary choice, we propose to train the network against the magnitude of disturbance in the gravity direction that a grasp can resist (gravity-rejection score) rather than the binary classification of success. We also provide an efficient data generation pipeline for a dataset with gravity-rejection score annotations. In addition to thorough ablation studies, quantitative evaluation both in simulation and on a real robot demonstrates the significant improvement of our approach, especially when the objects are heavy.
|
|
09:00-10:00, Paper WePI2T7.9 | |
Multi-Fingered End-Effector Grasp Reflex Modeling for One-Shot Tactile Servoing in Tool Manipulation Tasks |
|
Sheetz, Emily | University of Michigan |
Savchenko, Misha | METECS |
Zemler, Emma | NASA |
Presswala, Abbas | Aeyon (Jacobs) |
Crouch, Andrew | CACI |
Azimi, Shaun | NASA |
Kuipers, Benjamin | University of Michigan |
Keywords: Grasping, In-Hand Manipulation, Multifingered Hands
Abstract: Autonomous tool manipulation tasks are challenging for robots because they must reason over the tool's object affordances, how to grasp the tool so it may be used, how the tool will interact with other objects in the environment, and how to perform the complex tool affordances to complete the manipulation task. Focusing on tool grasping presents further challenges, specifically generalization to novel tools and modeling the problem in an explainable way suitable for safety-critical task domains, such as robots operating autonomously to perform repair tasks in NASA lunar habitats. In this work, we focus on grasping tools in an explainable way that can be generalized to novel tools. We present a logistic regression based grasp reflex model, which maps continuous end-effector sensor data to a set of discrete symbolic states. An adjustment policy uses these symbolic states to compute the appropriate gradient to change the end-effector pose and increase the probability of a secure tool grasp. Once the tool grasp is sufficiently secure, the robot proceeds with the rest of the manipulation task. We test our grasp reflex model on 6 novel tools, and find that the model achieves one-shot generalization by successfully using tactile servoing to secure grasps from one example of a secure grasp state. The robot's ability to learn to grasp tools in an explainable way that achieves one-shot generalization to novel tools demonstrates the power of our grasp reflex model in allowing robots to achieve autonomous tool manipulation tasks.
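To make the reflex idea concrete, the sketch below fits a logistic model over end-effector sensor features and uses a finite-difference gradient on the predicted secure-grasp probability to suggest an adjustment direction. The feature names and data are invented for illustration; this is not the paper's trained model or its symbolic state set.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))   # hypothetical features: [normal force, shear, torque]
y = (X[:, 0] - 0.5 * np.abs(X[:, 1]) > 0).astype(int)  # synthetic secure/insecure labels
clf = LogisticRegression().fit(X, y)

def p_secure(features):
    return clf.predict_proba(features.reshape(1, -1))[0, 1]

def adjustment_direction(features, eps=1e-3):
    # Finite-difference gradient of P(secure) with respect to the features.
    grad = np.array([(p_secure(features + eps * e) - p_secure(features - eps * e)) / (2 * eps)
                     for e in np.eye(len(features))])
    return grad / (np.linalg.norm(grad) + 1e-9)

f = np.array([0.2, -0.4, 0.1])
print(p_secure(f), adjustment_direction(f))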
|
|
09:00-10:00, Paper WePI2T7.10 | |
MultiGripperGrasp: A Dataset for Robotic Grasping from Parallel Jaw Grippers to Dexterous Hands |
|
Casas, Luis Felipe | University of Texas at Dallas |
Khargonkar, Ninad | University of Texas at Dallas |
Prabhakaran, B | University of Texas at Dallas |
Xiang, Yu | University of Texas at Dallas |
Keywords: Grasping, Multifingered Hands, Dexterous Manipulation
Abstract: We introduce a large-scale dataset named MultiGripperGrasp for robotic grasping. Our dataset contains 30.4M grasps from 11 grippers for 345 objects. These grippers range from two-finger grippers to five-finger grippers, including a human hand. All grasps in the dataset are verified in the robot simulator Isaac Sim to classify them as successful or unsuccessful grasps. Additionally, the object fall-off time for each grasp is recorded as a grasp quality measurement. Furthermore, the grippers in our dataset are aligned according to the orientation and position of their palms, allowing us to transfer grasps from one gripper to another. The grasp transfer significantly increases the number of successful grasps for each gripper in the dataset. Our dataset is useful for studying generalized grasp planning and grasp transfer across different grippers.
|
|
09:00-10:00, Paper WePI2T7.11 | |
Speeding up 6-DoF Grasp Sampling with Quality-Diversity |
|
Huber, Johann | ISIR, Sorbonne Université |
Hélénon, François | Sorbonne Université |
Kappel, Mathilde | Institut Des Systèmes Intelligents Et De Robotique |
Chelly, Elie | Sorbonne Université - Institut Des Systèmes Intelligents Et Rob |
Khoramshahi, Mahdi | Sorbonne Université |
Ben Amar, Faiz | Université Pierre Et Marie Curie, Paris 6 |
Doncieux, Stéphane | Sorbonne University |
Keywords: Grasping, Data Sets for Robot Learning, Evolutionary Robotics
Abstract: Recent advances in AI have led to significant results in robotic learning, including natural language-conditioned planning and efficient optimization of controllers using generative models. However, the interaction data remains the bottleneck for generalization. Getting data for grasping is a critical challenge, as this skill is required to complete many manipulation tasks. Quality-Diversity (QD) algorithms optimize a set of solutions to get diverse, high-performing solutions to a given problem. This paper investigates how QD can be combined with priors to speed up the generation of diverse grasp poses in simulation compared to standard 6-DoF grasp sampling schemes. Experiments conducted on 4 grippers with 2-to-5 fingers on standard objects show that QD outperforms commonly used methods by a large margin. Further experiments show that QD optimization automatically finds some efficient priors that are usually hard-coded. The deployment of generated grasps on a 2-finger gripper and an Allegro hand shows that the diversity produced maintains sim-to-real transferability. We believe these results to be a significant step toward the generation of large datasets that can lead to robust and generalizing robotic grasping policies.
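For readers unfamiliar with Quality-Diversity, a minimal MAP-Elites loop conveys the mechanism the paper builds on: maintain an archive indexed by a behavior descriptor and keep the best solution per cell. The fitness, descriptor, and mutation below are toy stand-ins for the paper's grasp-specific priors and simulator.

import numpy as np

rng = np.random.default_rng(0)
fitness = lambda x: -np.sum(x ** 2)              # toy quality measure
descriptor = lambda x: np.clip(x[:2], -1, 1)     # toy behavior descriptor
bins, archive = 10, {}                           # cell -> (fitness, solution)

def cell(d):
    return tuple(np.minimum(((d + 1) / 2 * bins).astype(int), bins - 1))

for _ in range(5000):
    if archive and rng.random() < 0.9:           # usually mutate a random elite
        parent = archive[list(archive)[rng.integers(len(archive))]][1]
        x = parent + rng.normal(0, 0.1, size=parent.shape)
    else:                                        # occasionally sample from scratch
        x = rng.uniform(-1, 1, size=4)
    c, f = cell(descriptor(x)), fitness(x)
    if c not in archive or f > archive[c][0]:
        archive[c] = (f, x)                      # keep the best elite per cell

print(f"{len(archive)} diverse elites kept out of {bins * bins} cells")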
|
|
09:00-10:00, Paper WePI2T7.12 | |
Toward an Analytic Theory of Intrinsic Robustness for Dexterous Grasping |
|
Li, Albert H. | California Institute of Technology |
Culbertson, Preston | Stanford University |
Ames, Aaron | Caltech |
Keywords: Grasping, Dexterous Manipulation, Multifingered Hands
Abstract: Conventional approaches to grasp planning require perfect knowledge of an object's pose and geometry. Uncertainties in these quantities induce uncertainties in the quality of planned grasps, which can lead to failure. Classically, grasp robustness refers to the ability to resist external disturbances after grasping an object. In contrast, this work studies robustness to intrinsic sources of uncertainty like object pose or geometry affecting grasp planning before execution. To do so, we develop a novel analytic theory of grasping that reasons about this intrinsic robustness by characterizing the effect of friction cone uncertainty on a grasp's force closure status. We apply this result in two ways. First, we analyze the theoretical guarantees on intrinsic robustness of two grasp metrics in the literature, the classical Ferrari-Canny metric and the more recent min-weight metric. We validate these results with hardware trials that compare grasps synthesized with and without robustness guarantees, showing a clear improvement in success rates. Second, we use our theory to develop a novel analytic notion of probabilistic force closure, which we show can generate unique, uncertainty-aware grasps in simulation.
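For reference, this is the standard point-contact-with-friction model that force-closure analyses of this kind start from, in generic notation (the paper's uncertainty characterization builds on top of it and is not reproduced here): each contact force must lie in its friction cone,

\[
\mathcal{FC}_i=\{\,f\in\mathbb{R}^3 : \|f-(f^{\top}n_i)\,n_i\|\le \mu_i\, f^{\top}n_i \,\},
\]

and a grasp with grasp map \(G=[G_1\ \cdots\ G_m]\) is in force closure iff the achievable contact wrenches positively span the whole wrench space, \(\{Gf : f_i\in\mathcal{FC}_i\}=\mathbb{R}^6\). Uncertainty in the normals \(n_i\) and friction coefficients \(\mu_i\) perturbs the cones, which is the effect the abstract's theory quantifies.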
|
|
09:00-10:00, Paper WePI2T7.13 | |
6-DoF Grasp Detection in Clutter with Enhanced Receptive Field and Graspable Balance Sampling |
|
Wang, Hanwen | Beijing University of Posts and Telecommunications |
Ying, Zhang | Beijing University of Posts and Telecommunications |
Wang, Yunlong | Institute of Automation, Chinese Academy of Sciences (CASIA) |
Li, Jian | Beihang University & National Research Center for Rehabilitation |
Keywords: Grasping, Perception for Grasping and Manipulation, Deep Learning in Grasping and Manipulation
Abstract: Detecting small-scale 6-DoF grasps is crucial for robots to perform specific tasks. This paper focuses on enhancing the recognition capability of small-scale grasping, aiming to improve the overall accuracy of grasping prediction results and the generalization ability of the network. We propose an enhanced receptive field method that includes a multi-radii cylinder grouping module and a passive attention module. This method enlarges the receptive field area within the graspable space and strengthens the learning of graspable features. Additionally, we design a graspable balance sampling module based on a segmentation network, which enables the network to focus on features of small objects, thereby improving the recognition capability of small-scale grasping. Our network achieves state-of-the-art performance on the GraspNet-1Billion dataset, with an overall improvement of approximately 10% in average precision@k (AP). Furthermore, we deployed our grasp detection model on the PyBullet grasping platform and in real-world scenarios, validating the effectiveness of our method.
|
|
09:00-10:00, Paper WePI2T7.14 | |
GripFlexer: Development of Hybrid Gripper with a Novel Shape That Can Perform in Narrow Spaces |
|
Kim, Donghyun | Daegu Gyeongbuk Institute of Science and Technology |
Choi, Sunghyun | Daegu Gyeongbuk Institute of Science & Technology |
Song, Bongsub | Daegu Gyeongbuk Institute of Science and Technology |
Song, Jinhyeok | DGIST |
Yoon, Jingon | Daegu Gyeongbuk Institute of Science and Technology (DGIST), Dae |
Yun, Dongwon | Daegu Gyeongbuk Institute of Science and Technology (DGIST) |
Keywords: Grasping, Tendon/Wire Mechanism, Underactuated Robots
Abstract: In recent years, the role of robots across industries has become increasingly diverse, and they are now required to perform complex missions beyond simple repetitive tasks. However, robots used in confined spaces that humans cannot reach, or in disaster field missions, face challenges in performing various tasks due to their small size. In this study, we developed a compact hybrid gripper that fuses a multi-finger gripper and a jamming gripper to perform various tasks in a confined environment. Such a hybrid gripper combines the strengths of a multi-finger gripper, which can perform various tasks, and a jamming gripper, which can effectively handle small irregular objects. We developed the hybrid gripper "GripFlexer" based on theoretical analysis and confirmed its performance through experiments, taking the task of turning a circular doorknob, one of the most difficult tasks at disaster sites, as the final target task. We also confirmed that the two gripper mechanisms of GripFlexer interact beneficially, showing improved performance when operated simultaneously.
|
|
09:00-10:00, Paper WePI2T7.15 | |
Enhancing Object Grasping Efficiency with Deep Learning and Post-Processing for Multi-Finger Robotic Hands |
|
Samandi, Pouya | Simon Fraser University |
Gupta, Kamal | Simon Fraser University |
Mehrandezh, Mehran | University of Regina |
Keywords: Grasping, Force and Tactile Sensing, Computer Vision for Automation
Abstract: This paper builds upon the well-established ML-based grasping technique known as the Grasp-Rectangle (GR) method. The original GR method made two simplifying assumptions: it was designed exclusively for two-finger grippers, and it assumed that the gripper would approach objects solely from a top-down perspective on a horizontal surface. We have extended the GR method beyond these assumptions to a multi-finger hand, (1) enabling grasping from top and side angles and (2) engaging multiple points of contact, enhancing the algorithm’s overall performance. Our approach leverages geometric cues extracted from object images to calculate the optimal grasp pose and contact points, thereby enhancing grasp reliability. Extensive testing was conducted using a 7-DOF robotic arm equipped with a 7-DOF 3-finger gripper. We achieved an accuracy of 98.6% on the Cornell Grasping Dataset with a processing time of 120 milliseconds. Furthermore, when assessing object grasping from both top and side perspectives, our algorithm delivered successful grasps at rates of 95% and 96%, respectively. These findings are rooted in a comprehensive series of tests performed across a diverse array of objects.
|
|
09:00-10:00, Paper WePI2T7.16 | |
Task-Oriented Design Method for Monolithic Flexible Hands with Wire Drive Systems |
|
Kusuhara, Rina | Osaka University |
Higashimori, Mitsuru | Osaka University |
Keywords: Tendon/Wire Mechanism, Flexible Robotics, Multifingered Hands
Abstract: This paper discusses a novel task-oriented design method for wire-driven flexible hands. For a monolithic hand fabricated using 3D printing, an analytical design method is proposed to enable it to perform the given tasks. First, the wiring-synergy equation, which relates the parameters of the hand mechanism, the wire tension, and the generated posture, is derived based on an analytical model of a hand with wire drive systems. Next, the posture-synergy equation is derived, using principal component analysis on multiple desired postures given to perform a task. Based on the isomorphism of the mathematical structure in the two synergy equations, a method for designing a hand is developed. By quantitatively evaluating the posture reproducibility with respect to the number of wire drive systems, this method can analytically determine the mechanism parameters and wire tension for the desired postures. Subsequently, the proposed method is validated through case studies. Finally, a hand for an in-hand manipulation task is developed, and the feasibility of the proposed method is validated experimentally. The method potentially contributes to expediting the design procedure, increasing the accuracy of the posture reproduction, and reducing the number of actuators.
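A hedged sketch of the PCA step suggested by the posture-synergy equation (the joint count, number of desired postures, and data below are placeholders, not the paper's): given a matrix of desired postures, the leading principal components act as synergies, and the reconstruction error quantifies posture reproducibility for a given number of wire drive systems.

import numpy as np

postures = np.random.default_rng(0).normal(size=(12, 8))  # 12 desired postures, 8 joints
mean = postures.mean(axis=0)
U, S, Vt = np.linalg.svd(postures - mean, full_matrices=False)

k = 2                                    # candidate number of wire drive systems
synergies = Vt[:k]                       # principal posture directions
weights = (postures - mean) @ synergies.T
reconstruction = mean + weights @ synergies
error = np.linalg.norm(postures - reconstruction) / np.linalg.norm(postures - mean)
print(f"relative reconstruction error with {k} synergies: {error:.3f}")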
|
|
09:00-10:00, Paper WePI2T7.17 | |
A Novel Geometrical Structure Robot Hand for Linear-Parallel Pinching and Coupled Self-Adaptive Hybrid Grasping |
|
Chen, Shi | Nanchang University |
Zhang, Bihao | University of Science and Technology of China |
Feng, Kehan | Nanjing University of Aeronautics and Astronautics |
Wang, Yizhou | Southern University of Science and Technology |
Li, Jiayun | The Hong Kong University of Science and Technology |
Zhang, Wenzeng | Shenzhen X-Institute |
Keywords: Underactuated Robots, Grippers and Other End-Effectors
Abstract: Current robot hand grippers capable of self-adaptive or coupled grasping often cannot perform linear-parallel pinching at the physical end of the gripper, which is widely used in industrial applications. For this reason, this paper introduces a gripper with hybrid grasping modes: the LPCSA hand. It can achieve three grasping modes: linear-parallel pinching, coupled grasping, and self-adaptive grasping. The design cleverly couples two kinds of Chebyshev straight-line mechanisms to enable linear movement at the fingertip. It also utilizes the deformability of the parallelogram to achieve self-adaptive grasping. Furthermore, the gripper uses an idle stroke and a special component to facilitate switching between the three modes. The linear-parallel pinching function is suitable for pinching objects of different sizes on a tabletop. The self-adaptive grasping mode can adapt to objects of various shapes and sizes. The coupled grasping mode enables fast grasping of irregular objects. This paper also analyzes the kinematics and dynamics of the LPCSA hand. Combined with experiments, it demonstrates that the LPCSA hand has a wide grasping space and stable performance.
|
|
WePI2T8 |
Room 8 |
Robot Motion Planning I |
Teaser Session |
Chair: Manoonpong, Poramate | Vidyasirimedhi Institute of Science and Technology (VISTEC) |
|
09:00-10:00, Paper WePI2T8.1 | |
Active Information Gathering for Long-Horizon Navigation under Uncertainty by Predicting the Value of Information |
|
Arnob, Raihan Islam | George Mason University |
Stein, Gregory | George Mason University |
Keywords: Motion and Path Planning, AI-Enabled Robotics, Autonomous Agents
Abstract: We address the task of long-horizon navigation in partially mapped environments, for which active gathering of information about faraway unseen space is essential for good behavior. We present a novel planning strategy that, at training time, affords tractable computation of the value of information associated with revealing potentially informative regions of unseen space; these data are used to train a graph neural network to predict the goodness of temporally extended exploratory actions. Our learning-augmented model-based planning approach predicts the expected value of information of revealing unseen space and is capable of using these predictions to actively seek information and so improve long-horizon navigation. Across two simulated office-like environments, our planner outperforms both competitive learned and non-learned baseline navigation strategies, achieving improvements of up to 63.76% and 36.68%, respectively, demonstrating its capacity to actively seek performance-critical information.
|
|
09:00-10:00, Paper WePI2T8.2 | |
Time-Optimal Path Parameterization for Cooperative Multi-Arm Robotic Systems with Third-Order Constraints |
|
Dio, Maximilian | Friedrich-Alexander-Universität Erlangen-Nürnberg |
Graichen, Knut | Friedrich Alexander University Erlangen-Nürnberg |
Völz, Andreas | Friedrich-Alexander-Universität Erlangen-Nürnberg |
Keywords: Motion and Path Planning, Dual Arm Manipulation, Multi-Robot Systems
Abstract: This paper presents a time-optimal path parameterization (TOPP) method for cooperative multi-arm robotic systems (MARS) manipulating heavy objects with third-order constraints that include jerk, torque rate and wrench rate limits. The method is based on a problem reformulation as a sequential linear program and provides a unified planning approach that is faster than previous convex optimization techniques. The equivalence to a reachability-based TOPP is shown and simulation results for a cooperative MARS consisting of two 7 degree of freedom (DOF) robots and a tightly grasped object with 6 DOFs are provided.
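For context, these are the standard path-parameterization identities that third-order TOPP constraints build on, in generic notation (the paper's sequential-linear-program reformulation itself is not reproduced here). With \(q(t)=q(s(t))\) along a fixed path,

\[
\dot q = q'\,\dot s,\qquad
\ddot q = q''\,\dot s^{2} + q'\,\ddot s,\qquad
\dddot q = q'''\,\dot s^{3} + 3\,q''\,\dot s\,\ddot s + q'\,\dddot s,
\]

so jerk, torque-rate, and wrench-rate limits become constraints on \((\dot s^{3},\ \dot s\,\ddot s,\ \dddot s)\) along the path, which is what makes a linear-programming treatment of the third-order problem natural.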
|
|
09:00-10:00, Paper WePI2T8.3 | |
Neural Trajectory Model: Implicit Neural Trajectory Representation for Trajectories Generation |
|
Yu, Zihan | The Hong Kong University of Science and Technology (Guangzhou) |
Tang, Yuqing | International Digital Economy Academy (IDEA) |
Keywords: Motion and Path Planning, Path Planning for Multiple Mobile Robots or Agents, AI-Based Methods
Abstract: The multi-agent trajectory planning problem is a difficult problem in robotics due to its computational complexity and real-world environment complexity with uncertainty, non-linearity, and real-time requirements. Many existing solutions are either search-based or optimization-based approaches with simplified assumptions of the environment, limited planning speed, and limited scalability in the number of agents. In this work, we first attempt to reformulate single-agent and multi-agent trajectory planning problems as query problems over an implicit neural representation of trajectories. We formulate such implicit representations as Neural Trajectory Models (NTM), which can be queried to generate nearly optimal trajectories in complex environments. We conduct experiments in simulation environments and demonstrate that NTM can solve single-agent and multi-agent trajectory planning problems. In the experiments, NTMs achieve (1) sub-millisecond planning times using GPUs, (2) avoidance of almost all environment collisions, (3) avoidance of almost all inter-agent collisions, and (4) generation of near-shortest paths. We also demonstrate that the same NTM framework can be used for trajectory correction and multi-trajectory conflict resolution, efficiently refining low-quality and conflicting multi-agent trajectories into nearly optimal solutions. (Open source code is available at https://github.com/laser2099/neural-trajectory-model)
|
|
09:00-10:00, Paper WePI2T8.4 | |
Planning for Long-Term Monitoring Missions in Time-Varying Environments |
|
Stephens, Alex | University of Oxford |
Lacerda, Bruno | University of Oxford |
Hawes, Nick | University of Oxford |
Keywords: Planning, Scheduling and Coordination, Environment Monitoring and Management, Field Robots
Abstract: Recent years have seen autonomous robots deployed in long-term missions across an ever-increasing breadth of domains. We consider robots deployed over a sequence of finite-horizon missions in the same environment, with the objective of maximising the value from observations of some unknown spatiotemporal process. This work is motivated by applications such as ecological monitoring, in which a robot might be repeatedly deployed in the field over weeks or months with the task of modelling processes of scientific interest. We formalise the problem of long-term monitoring over multiple finite-horizon missions as a Markov decision process with a partially unknown state, and present an online planning approach to address it. Our approach uses a spatiotemporal Gaussian process to model the environment and make predictions about unvisited states, integrating this with a belief-based Monte Carlo tree search algorithm which decides where the robot should go next. We empirically demonstrate the strengths of our framework through a series of experiments using synthetic data as well as real acoustic data from monitoring of bioactivity in coral reefs.
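A sketch of the spatiotemporal-GP ingredient in isolation (not the authors' full belief-based MCTS planner): fit a GP over (x, y, t) samples, then query the predictive mean and uncertainty at an unvisited location and a future time. The kernel and its length scales are illustrative assumptions.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 3))         # columns: x, y, t
y = np.sin(4 * X[:, 0]) * np.cos(3 * X[:, 1]) + 0.5 * X[:, 2] + rng.normal(0, 0.05, 60)

# Separate length scales let spatial and temporal correlation decay differently.
kernel = RBF(length_scale=[0.3, 0.3, 0.5]) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

query = np.array([[0.5, 0.5, 1.2]])         # unvisited place at a future time
mean, std = gp.predict(query, return_std=True)
print(mean, std)                            # value estimate plus uncertainty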
|
|
09:00-10:00, Paper WePI2T8.5 | |
Local Path Planning among Pushable Objects Based on Reinforcement Learning |
|
Yao, Linghong | University College London |
Modugno, Valerio | University College London |
Delfaki, Andromachi Maria | University College London |
Liu, Yuanchang | University College London |
Stoyanov, Danail | University College London |
Kanoulas, Dimitrios | University College London |
Keywords: Motion and Path Planning, Legged Robots, Autonomous Vehicle Navigation
Abstract: In this paper, we introduce a method to tackle the problem of robot local path planning among pushable objects - an open problem in robotics. In particular, we simultaneously train multiple agents in a physics-based simulation environment, utilizing an Advantage Actor-Critic algorithm coupled with a deep neural network. The developed online policy enables these agents to push obstacles in ways that are not limited to axial alignments, adapt to unforeseen changes in obstacle dynamics instantaneously, and effectively tackle local path planning in confined areas. We tested the method in various simulated environments to demonstrate its adaptation to unseen scenarios in unfamiliar settings. Moreover, we have successfully applied this policy on an actual quadruped robot, confirming its capability to handle the unpredictability and noise associated with real-world sensors and the inherent uncertainties in unexplored object-pushing tasks.
|
|
09:00-10:00, Paper WePI2T8.6 | |
Learning-Informed Long-Horizon Navigation under Uncertainty for Vehicles with Dynamics |
|
Khanal, Abhish | George Mason University |
Bui, Hoang-Dung | George Mason University |
Plaku, Erion | U.S. National Science Foundation |
Stein, Gregory | George Mason University |
Keywords: Motion and Path Planning, Nonholonomic Motion Planning, AI-Enabled Robotics
Abstract: We present a novel approach to learning-augmented, long-horizon navigation under uncertainty in large-scale environments in which considering the robot dynamics is essential for informing good behavior. Our approach tightly integrates high-level planning, in which a dynamics-aware learned model estimates the goodness of actions that enter unseen space, and low-level planning, which provides dynamically feasible trajectories for both informing high-level decision-making and low-level progress towards the unseen goal. Owing to its ability to understand the impacts of the robot’s dynamics on how it should attempt to reach the goal, our approach achieves both higher reliability and improved navigation performance compared to competitive learning-informed and non-learned baselines in simulated office-building-like environments.
|
|
09:00-10:00, Paper WePI2T8.7 | |
Enhancing Safety Via Deep Reinforcement Learning in Trajectory Planning for Agile Flights within Unknown Environments |
|
Rocha, Lidia | UFSCar |
Bidinotto, Jorge | University of Sao Paulo |
Heintz, Fredrik | Linköping University |
Tiger, Mattias | AI and Integrated Computer Systems (AIICS), Linköping University |
Vivaldini, Kelen Cristiane Teixeira | FEL-CTU / DC - UFSCar |
Keywords: Motion and Path Planning, AI-Enabled Robotics, Autonomous Agents
Abstract: Unmanned aerial vehicles (UAVs), known for their agile flight capabilities, require safe trajectory planning to achieve high-speed flights. This is necessary to swiftly evade obstacles and adapt trajectories under hard real-time constraints. These adjustments are essential to generate viable paths that prevent collisions while maintaining high speeds with minimal tracking errors. This paper addresses the challenge of enhancing the safety of agile trajectory planning. The proposed method combines a supervised learning approach, as teacher policy, with deep reinforcement learning (DRL), as student policy. Initially, we train the teacher policy using a path planning algorithm that prioritizes safety while minimizing jerk and flight time. Then, we use this policy to guide the learning of the student policy in various unknown environments. Testing in simulation demonstrates noteworthy advancements, including an 80% reduction in tracking error, a 31% decrease in flight time, a 19% increase in high-speed duration, and a success rate improvement from 50% to 100%, as compared to baseline methods.
|
|
09:00-10:00, Paper WePI2T8.8 | |
A Generic Trajectory Planning Method for Constrained All-Wheel-Steering Robots |
|
Xin, Ren | The Hong Kong University of Science and Technology |
Liu, Hongji | The Hong Kong University of Science and Technology |
Chen, Yingbing | The Hong Kong University of Science and Technology |
Cheng, Jie | Hong Kong University of Science and Technology |
Wang, Sheng | Hong Kong University of Science and Technology |
Ma, Jun | The Hong Kong University of Science and Technology |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Motion and Path Planning, Nonholonomic Motion Planning, Optimization and Optimal Control
Abstract: This paper presents a generic trajectory planning method for wheeled robots with fixed steering axes in which the steering angle of each wheel is constrained. In the existing literature, All-Wheel-Steering (AWS) robots, incorporating modes such as rotation-free translation maneuvers, in-situ rotational maneuvers, and proportional steering, exhibit inefficient performance due to time-consuming mode switches. This inefficiency arises from wheel rotation constraints and inter-wheel cooperation requirements. The direct application of a holonomic moving strategy can lead to significant slip angles or even structural failure. Additionally, the limited steering range of AWS wheeled robots exacerbates non-linear characteristics, thereby complicating control processes. To address these challenges, we developed a novel planning method termed Constrained AWS (C-AWS), which integrates second-order discrete search with predictive control techniques. Experimental results demonstrate that our method adeptly generates feasible and smooth trajectories for C-AWS while adhering to steering angle constraints.
|
|
09:00-10:00, Paper WePI2T8.9 | |
A Safe and Efficient Timed-Elastic-Band Planner for Unstructured Environments |
|
Xi, Haoyu | University of Chinese Academy of Sciences |
Li, Wei | Institute of Computing Technology, Chinese Academy of Sciences |
Zhao, Fangzhou | Institute of Computing Technology, Chinese Academy of Sciences |
Chen, Liang | Institute of Computing Technology: Beijing, CN |
Hu, Yu | Institute of Computing Technology Chinese Academy of Sciences |
Keywords: Motion and Path Planning, Autonomous Vehicle Navigation, Field Robots
Abstract: In unstructured environments with complex obstacles and obscure road boundaries, the local planner faces more severe challenges in terms of safety and real-time performance. In order to fulfill these emerging requirements, we propose a novel Timed-Elastic-Band approach for unstructured environments, abbreviated as TEB-U. This approach incorporates a free space extraction optimization module for 2D occupancy grid maps, which efficiently transforms irregular free space boundaries into polygons and restrains robots within the boundaries. Moreover, a dynamic global point adjustment module is designed to adaptively correct the trajectory points obtained from the global planner, thereby enabling robots to travel along the centerline of free space and providing a better initial trajectory for subsequent modules. To reduce the computational cost, we replace the obstacle constraint of TEB with the boundary constraint in hyper-graph optimization. We evaluate our planner in three distinct scenarios, and the results show that TEB-U improves the average success rate by 21% and reduces the planning time by 23% compared to TEB in unstructured road, which demonstrates its safety and efficiency.
|
|
09:00-10:00, Paper WePI2T8.10 | |
An Optimization-Based Planner with B-Spline Parameterized Continuous-Time Reference Signals |
|
Tao, Chuyuan | University of Illinois, Urbana and Champaign |
Cheng, Sheng | University of Illinois Urbana-Champaign |
Zhao, Yang | University of Illinois Urbana-Champaign |
Wang, Fanxin | University of Illinois at Urbana-Champaign |
Hovakimyan, Naira | University of Illinois at Urbana-Champaign |
Keywords: Motion and Path Planning, Optimization and Optimal Control
Abstract: For the cascaded planning and control modules implemented for robot navigation, the frequency gap between the planner and controller has received limited attention. In this study, we introduce a novel B-spline parameterized optimization-based planner (BSPOP) designed to address the frequency gap challenge with limited onboard computational power in robots. The proposed planner generates continuous-time control inputs for low-level controllers running at arbitrary frequencies to track. Furthermore, when considering convex control action sets, BSPOP uses the convex hull property to automatically constrain the continuous-time control inputs within the convex set. Consequently, compared with discrete-time optimization-based planners, BSPOP reduces the number of decision variables and inequality constraints, which improves computational efficiency as a byproduct. Simulation results demonstrate that our approach can achieve planning performance comparable to high-frequency baseline optimization-based planners while demanding less computational power. Both simulation and experiment results show that the proposed method performs better in planning compared with baseline planners at the same frequency.
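The convex hull property the abstract exploits can be demonstrated in a few lines: a B-spline stays inside the convex hull of its control points, so constraining the control points to a convex set constrains the continuous-time signal everywhere, not just at discretization points. The degree and knot vector below are illustrative choices, not BSPOP's.

import numpy as np
from scipy.interpolate import BSpline

degree = 3
ctrl = np.array([0.0, 0.8, 1.0, 0.6, 0.9, 0.2])   # control inputs, all within [0, 1]
n = len(ctrl)
# Clamped knot vector so the curve starts and ends at the boundary control points.
knots = np.concatenate(([0.0] * degree, np.linspace(0, 1, n - degree + 1), [1.0] * degree))
u = BSpline(knots, ctrl, degree)

t = np.linspace(0, 1, 1001)
vals = u(t)
assert ctrl.min() - 1e-9 <= vals.min() and vals.max() <= ctrl.max() + 1e-9
print(f"continuous-time input stays in [{vals.min():.3f}, {vals.max():.3f}]")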
|
|
09:00-10:00, Paper WePI2T8.11 | |
Sequential Convex Programming for Time-Optimal Quadrotor Waypoint Flight |
|
Shen, Zhipeng | The Hong Kong Polytechnic University |
Zhou, Guanzhong | The Hong Kong Polytechnic University |
Huang, Hailong | The Hong Kong Polytechnic University |
Keywords: Motion and Path Planning, Motion Control, Optimization and Optimal Control
Abstract: Agile flight is significant for target tracking, search and rescue, and delivery applications. To achieve agile flight, we can exploit the actuator's potential by utilizing the full dynamics of the quadrotor. However, the 6-degrees-of-freedom dynamics render the optimization problem non-convex, and thus computationally intractable. To tackle this issue, we convert the original non-convex optimal control problem (OCP) into a convex subproblem and use the sequential convex programming (SCP) algorithm to iteratively solve the subproblems. Moreover, the state-triggered constraints are proposed to simultaneously optimize the time allocation of the waypoint and the trajectory itself. The numerical and physical experiment results show that the SCP algorithm can significantly reduce the computing time while ensuring a satisfactory solution.
|
|
09:00-10:00, Paper WePI2T8.12 | |
Energy-Optimized Planning in Non-Uniform Wind Fields with Fixed-Wing Aerial Vehicles |
|
Duan, Yufei | KTH Royal Institute of Technology |
Achermann, Florian | ETH Zurich, ASL |
Lim, Jaeyoung | ETH Zurich |
Siegwart, Roland | ETH Zurich |
Keywords: Motion and Path Planning, Nonholonomic Motion Planning, Aerial Systems: Perception and Autonomy
Abstract: Fixed-wing small uncrewed aerial vehicles (sUAVs) possess the capability to remain airborne for extended durations and traverse vast distances. However, their operation is susceptible to wind conditions, particularly in regions of complex terrain where high wind speeds may push the aircraft beyond its operational limits, potentially raising safety concerns. Moreover, wind impacts the energy required to follow a path, especially in locations where the wind direction and speed are not favorable. Incorporating wind information into mission planning is essential to ensure both safety and energy efficiency. In this paper, we propose a sampling-based planner using the kinematic Dubins aircraft paths with respect to the ground, to plan energy-efficient paths in non-uniform wind fields. We study the characteristics of the planner with synthetic and real-world wind data and compare its performance against baseline cost and path formulations. We demonstrate that the energy-optimized planner effectively utilizes updrafts to minimize energy consumption, albeit at the expense of increased travel time. The ground-relative path formulation facilitates the generation of safe trajectories onboard sUAVs within reasonable computational timeframes.
|
|
09:00-10:00, Paper WePI2T8.13 | |
Efficient Path Planning for Modular Reconfigurable Robots |
|
Mayer, Matthias | Technical University of Munich |
Li, Zihao | Technical University of Munich |
Althoff, Matthias | Technische Universität München |
Keywords: Motion and Path Planning, Cellular and Modular Robots, Industrial Robots
Abstract: Industrial robots are essential for modern production but often struggle to adapt to new tasks. Modular (reconfigurable) robots can overcome this challenge by eliminating the need to replace the whole robot. However, finding the optimal assembly for a task remains difficult because a valid path has to be computed for each generated assembly - consuming a significant fraction of the computation time. Similar to online path planning, where previous approaches adapt known paths to a changing environment, we show that transferring paths from previously considered module assemblies accelerates path planning for the next assemblies. On average, our method reduces the planning time for single-goal tasks by 50%. The usefulness of our method is evaluated by integrating it in a genetic algorithm (GA) for optimizing assemblies and evaluating it on our benchmark suite CoBRA. Within the optimization loop for modular robots, the time used to check a single assembly is shortened by up to 50%.
|
|
09:00-10:00, Paper WePI2T8.14 | |
Robust Precision Landing of a Quadrotor with Online Temporal Scaling Adaptation of Dynamic Movement Primitives |
|
Rothomphiwat, Kongkiat | Vidyasirimedhi Institute of Science and Technology (VISTEC) |
Jaroonsorn, Prakarn | AI and Robotics Ventures Co., Ltd |
Kriengkomol, Pakpoom | AI and Robotics Ventures |
Manoonpong, Poramate | Vidyasirimedhi Institute of Science and Technology (VISTEC) |
Keywords: Motion and Path Planning, Aerial Systems: Mechanics and Control, Robust/Adaptive Control
Abstract: In this work, we address the challenges of robust precision landing maneuvers for a quadrotor on both stationary and moving ground targets in the presence of disturbances that can cause the quadrotor to deviate from its desired trajectory, leading to maneuver failure. To overcome this, we propose a novel online adaptive trajectory planning approach based on the online temporal scaling adaptation of dynamic movement primitives (DMPs). This adaptation enables the desired trajectory to be dynamically adjusted in response to tracking errors and the goal’s state. Consequently, our proposed approach enhances accuracy, precision, and safety during landing maneuvers. The effectiveness of the approach is evaluated through comprehensive experiments conducted in both physical simulations and real-world environments, covering various disturbance scenarios.
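A minimal discrete DMP transformation system with the temporal scaling factor tau that the paper adapts online (the forcing term is omitted and the gains simplified, so this is a sketch of the mechanism rather than the authors' formulation):

import numpy as np

alpha, beta = 25.0, 25.0 / 4.0                  # critically damped gains
def rollout(tau, g=1.0, y0=0.0, dt=0.002, T=3.0):
    y, z, traj = y0, 0.0, []
    for _ in range(int(T / dt)):
        # tau * dz/dt = alpha * (beta * (g - y) - z);  tau * dy/dt = z
        z += dt * alpha * (beta * (g - y) - z) / tau
        y += dt * z / tau
        traj.append(y)
    return np.array(traj)

fast, slow = rollout(tau=0.5), rollout(tau=1.5)
# A larger tau stretches the same motion shape over more time; adapting tau
# online is what lets the landing trajectory slow down as tracking error grows.
print(fast[500], slow[500])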
|
|
09:00-10:00, Paper WePI2T8.15 | |
Sampling-Based Motion Planning for Optimal Probability of Collision under Environment Uncertainty |
|
Lu, Hao | Australian National University |
Kurniawati, Hanna | Australian National University |
Shome, Rahul | The Australian National University |
Keywords: Motion and Path Planning
Abstract: Motion planning is a fundamental capability in robotics applications. Real-world scenarios can introduce uncertainty to the motion planning problem. In this work we study environment uncertainty in general high-dimensional problems, wherein the choice of appropriate metrics and formulations is shown to have a significant effect on the probability of collision of the solution path. Several practically motivated cost functions have been proposed in the literature to model and solve the problem but are shown in this work to suffer from higher probabilities of collision. The current work presents a theoretically sound formulation that was first mentioned in previous work on minimum constraint removal. Approximating this optimal formulation is shown to achieve a lower probability of collision. To demonstrate the formulation in a sampling-based setting, a mixed integer linear program seeded by greedy search over a roadmap with sampled environments is used to report paths with low probability of collision. Compared against cost functions that minimize the sum and the maximum of collision probabilities on a seven-degree-of-freedom robotic arm in uncertain environments, we show clear benefits and promise towards motion planning for optimal probability of collision.
|
|
09:00-10:00, Paper WePI2T8.16 | |
Flexible Informed Trees (FIT*): Adaptive Batch-Size Approach in Informed Sampling-Based Path Planning |
|
Zhang, Liding | Technical University of Munich |
Bing, Zhenshan | Technical University of Munich |
Chen, Kejia | Technical University of Munich |
Chen, Lingyun | Technical University of Munich |
Cai, Kuanqi | Technical University of Munich |
Zhang, Yu | Technical University of Munich |
Wu, Fan | Technical University of Munich |
Krumbholz, Peter | KION Group |
Yuan, Zhilin | KION Group |
Haddadin, Sami | Technical University of Munich |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Motion and Path Planning, Manipulation Planning, Task and Motion Planning
Abstract: In path planning, anytime almost-surely asymptotically optimal planners dominate the benchmark of sampling-based planners. A notable example is Batch Informed Trees (BIT*), where planners iteratively determine paths to batches of vertices within the exploration area. However, using a fixed batch size is inefficient for initial pathfinding, and optimal performance relies on effective task allocation. This paper introduces Flexible Informed Trees (FIT*), a sampling-based planner that integrates an adaptive batch-size method to enhance the initial path convergence rate. FIT* adjusts batch sizes dynamically based on the inherent dimension of the configuration space and the hypervolume of the n-dimensional hyperellipsoid. By applying dense and sparse sampling strategies, FIT* improves the convergence rate, finding successful solutions faster with lower initial solution cost. This method enhances the planner's ability to handle confined, narrow spaces in the initial pathfinding phase and increases the batch vertex sampling frequency in the optimization phase. FIT* outperforms existing single-query, sampling-based planners on the tested problems in R^2 to R^8, and was demonstrated on a real-world mobile manipulation task.
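For context, the hypervolume referred to is that of the informed set used in informed sampling-based planning, a prolate hyperspheroid whose Lebesgue measure is

\[
\lambda\big(X_{\hat f}\big)=\zeta_n\,\frac{c_{\mathrm{best}}}{2}\left(\frac{\sqrt{c_{\mathrm{best}}^{2}-c_{\min}^{2}}}{2}\right)^{n-1},
\]

where \(\zeta_n\) is the volume of the unit n-ball, \(c_{\min}\) the start-goal distance, and \(c_{\mathrm{best}}\) the current solution cost; how FIT* maps this quantity and the space dimension to a batch size is the paper's contribution and is not reproduced here.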
|
|
WePI2T9 |
Room 9 |
Navigation I |
Teaser Session |
Co-Chair: Moustakas, Konstantinos | University of Patras |
|
09:00-10:00, Paper WePI2T9.1 | |
DriVLMe: Enhancing LLM-Based Autonomous Driving Agents with Embodied and Social Experiences |
|
Huang, Yidong | University of Michigan |
Sansom, Jacob | University of Michigan |
Ma, Ziqiao | University of Michigan |
Gervits, Felix | DEVCOM Army Research Laboratory |
Chai, Joyce | University of Michigan |
Keywords: Autonomous Vehicle Navigation, Natural Dialog for HRI, Multi-Modal Perception for HRI
Abstract: Recent advancements in foundation models (FMs) have unlocked new prospects in autonomous driving, yet the experimental settings of these studies are preliminary, over-simplified, and fail to capture the complexity of real-world driving scenarios in human environments. It remains under-explored whether FM agents can handle long-horizon navigation tasks with free-form dialogue and deal with unexpected situations caused by environmental dynamics or task changes. To explore the capabilities and boundaries of FMs faced with the challenges above, we introduce DriVLMe, a video-language-model-based agent to facilitate natural and effective communication between humans and autonomous vehicles that perceive the environment and navigate. We develop DriVLMe from both embodied experiences in a simulated environment and social experiences from real human dialogue. While DriVLMe demonstrates competitive performance in both open-loop benchmarks and closed-loop human studies, we reveal several limitations and challenges, including unacceptable inference time, imbalanced training data, limited visual understanding, challenges with multi-turn interactions, simplified language generation from robotic experiences, and difficulties in handling on-the-fly unexpected situations like environmental dynamics and task changes. Nevertheless, DriVLMe offers a promising new direction for autonomous driving agents that need to navigate not just complex environments but also complex social interactions.
|
|
09:00-10:00, Paper WePI2T9.2 | |
Perception for Connected Autonomous Vehicles under Adverse Weather Conditions |
|
Tsakmakopoulou, Dimitra | University of Patras |
Moustakas, Konstantinos | University of Patras |
Keywords: Autonomous Vehicle Navigation, Object Detection, Segmentation and Categorization, Intelligent Transportation Systems
Abstract: Autonomous Vehicles (AVs) have recently attracted considerable attention due to their potential to significantly reduce road accidents and improve people’s lives. However, they rely solely on the data collected by their mounted sensors to make predictions, which can lead to inaccurate results if a sensor becomes occluded or damaged. This issue can be addressed by employing Vehicle-to-Vehicle communication, which allows a Connected Autonomous Vehicle (CAV) to interact with other CAVs within its field of view and exchange information about their surrounding objects. Existing research on cooperative perception has primarily focused on clear weather scenarios, with limited exploration into adverse weather conditions. This paper demonstrates the necessity of Vehicle-to-Vehicle communication by showcasing its benefits in maintaining high accuracy under adverse weather conditions. A collaborative perception system is introduced and its performance in foggy weather scenarios is assessed to further improve adverse weather perception. The pipeline of the network combines state-of-the-art methods for accurate object detection. Specifically, with PointPillars as the backbone, the Spatial-wise Adaptive Feature Fusion method is used to aggregate information from different vehicles. The model is trained on the large-scale dataset OPV2V and evaluated on modified data to simulate fog. The experiments show that cooperative perception can maintain high detection accuracy even in challenging weather conditions. Finally, a comparative analysis of LiDAR detectors for cooperative perception in bad weather conditions is presented.
|
|
09:00-10:00, Paper WePI2T9.3 | |
Reward-Field Guided Motion Planner for Navigation with Limited Sensing Range |
|
Bayer, Jan | Czech Technical University in Prague |
Faigl, Jan | Czech Technical University in Prague |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning
Abstract: In this paper, we focus on improving planning efficiency for ground vehicles in navigation and exploration tasks where the environment is unknown or partially known, leading to frequent updates of the navigational goal as new sensory information is acquired. Asymptotically optimal motion planners like RRT* or FMT* can be used to plan the sequence of actions the robot can follow to achieve its current goal. Frequent replanning of the whole action sequence becomes computationally demanding when actions are not executed precisely because of limited information about the foreground terrain. The decoupled approach can decrease the computational burden with separated path planning and path following; however, it might lead to suboptimal solutions. Therefore, we propose a novel approach based on generating a reusable reward function that guides a fast sampling-based motion planner. The proposed method provides improved results in navigation scenarios compared to the former approaches, and it led to about 7% faster autonomous exploration than the decoupled approach. The present results support the suitability of the proposed method in navigation tasks with continuously updated navigation goals.
|
|
09:00-10:00, Paper WePI2T9.4 | |
Real-Time Path Generation and Alignment Control for Autonomous Curb Following |
|
Wang, Yuanzhe | Nanyang Technological University |
Dai, Yunxiang | Nanyang Technological University |
Wang, Danwei | Nanyang Technological University |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, Perception-Action Coupling
Abstract: Curb following is a key technology for autonomous road sweeping vehicles. Currently, existing implementations primarily involve pre-recording waypoints during human driving and subsequently retracing them autonomously. Moreover, existing research related to this topic predominantly focuses on curb detection for driver assistance, yet the resultant curb detection outcomes remain underutilized in the development of autonomous curb following systems. To fill this gap, this paper proposes a real-time path generation and alignment control approach to facilitate autonomous curb following. Firstly, a segmented path generation algorithm is introduced that progressively generates reference path segments while ensuring the overall continuity of the reference path. Secondly, a parameterized alignment control algorithm is developed to accurately navigate the vehicle along the planned reference path with proven stability. Real public road experiments have been conducted to validate the proposed approach. The experimental results demonstrate the efficacy of the proposed methodologies across various curb following scenarios, including common concave, convex, and straight-concave curbs, thereby showcasing the practical viability of our methods in real-world applications.
|
|
09:00-10:00, Paper WePI2T9.5 | |
Real-Time Hazard Prediction in Connected Autonomous Vehicles: A Digital Twin Approach |
|
Barroso Ramírez, Sergio | Universidad De Extremadura |
Zapata Cornejo, Noé José | Universidad De Extremadura |
Pérez González, Gerardo | Universidad De Extremadura |
Bustos, Pablo | Universidad De Extremadura |
Núñez, Pedro | University of Extremadura |
Keywords: Autonomous Vehicle Navigation, Collision Avoidance
Abstract: The growing interest in connected autonomous vehicles (CAVs) has intensified the focus on technologies and algorithms that enhance behavior, comfort, and safety. Among these, the concept of Digital Twins (DT) represents an emerging field of research that is now beginning to be applied to autonomous systems. Traditional Advanced Driver-Assistance Systems (ADAS) can prevent real-time collisions using sensor data. However, we propose that employing a DT can enable the accounting for complex, simulated decisions before they occur in reality. This paper introduces an initial model of a Digital Twin, founded on an internal simulator aligned with vehicle control architecture, for real-time hazard prediction and effective decision-making. Our DT synchronizes with the vehicle's state to simulate various hazardous scenarios in advance, allowing for preemptive actions. To support our hypothesis, we introduce an algorithm for the early detection of potential collisions between CAVs and pedestrians through the unsupervised simulation of diverse traffic scenarios. This solution integrates the CORTEX cognitive architecture with CARLA for internal simulation, leveraging probabilistic models to select optimal scenarios. Employing data from external pedestrian cameras, a particle filter predicts the most probable pedestrian trajectories via DT simulations, thereby informing safe maneuvers. Although the algorithm itself is established, the novelty of our approach lies in incorporating a simulator within the digital twin. This simulator, informed by real-time data on the vehicle's and environment's state, facilitates appropriate responses to unpredictable behaviors. We have conducted extensive tests with an actual autonomous electric vehicle on a university campus to validate the system's predictive and adaptive functions.
|
|
09:00-10:00, Paper WePI2T9.6 | |
Domain Adaptation in Visual Reinforcement Learning Via Self-Expert Imitation with Purifying Latent Feature |
|
Chen, Lin | Hunan University
Huang, Jianan | Hunan University |
Zhou, Zhen | Hunan University |
Wang, Yaonan | Hunan University |
Mo, Yang | Hunan University |
Miao, Zhiqiang | Hunan University |
Zeng, Kai | Hunan University |
Feng, Mingtao | Xidian University |
Wang, Danwei | Nanyang Technological University |
Keywords: Autonomous Vehicle Navigation, Intelligent Transportation Systems
Abstract: Generalizing visual reinforcement learning is fundamental to robot visual navigation: a policy acquired from interactions with source environments must adapt to analogous yet unfamiliar target environments. Recent advancements capitalize on data augmentation techniques, self-supervised learning methods, and the generative adversarial network framework to train policy neural networks with enhanced generalizability. However, after extracting domain-general latent features, current methods use these features to train the reinforcement learning policy, which degrades the learned policy's ability to guide the agent in accomplishing tasks. To tackle this challenge, we devise a framework of self-expert imitation with purified latent features, empowering the policy to achieve robust and stable zero-shot generalization to previously unseen, visually similar domains without diminishing its task performance. A method for extracting domain-general latent features, based on a variational autoencoder, is proposed to enhance feature quality. Extensive experiments show that, compared with state-of-the-art counterparts, our policy does not lose task performance after generalization.
|
|
09:00-10:00, Paper WePI2T9.7 | |
Switching Sampling Space of Model Predictive Path-Integral Controller to Balance Efficiency and Safety in 4WIDS Vehicle Navigation |
|
Aoki, Mizuho | Nagoya University |
Honda, Kohei | Nagoya University |
Okuda, Hiroyuki | Nagoya University |
Suzuki, Tatsuya | Nagoya University |
Keywords: Autonomous Vehicle Navigation, Redundant Robots, Motion and Path Planning
Abstract: A four-wheel independent drive and steering vehicle (4WIDS vehicle, swerve drive robot) can move in any direction by means of its eight control degrees of freedom (DoF). Although this high maneuverability enables efficient navigation in narrow spaces, obtaining the optimal command is challenging due to the high dimension of the solution space. This paper presents a navigation architecture using the Model Predictive Path Integral (MPPI) control algorithm to avoid collisions with obstacles of any shape and reach a goal point. The key idea that makes the problem tractable is to explore the optimal control input in a reasonably reduced dimension that is adequate for navigation. Through evaluation in simulation, we found that the choice of MPPI sampling space greatly affects navigation performance. In addition, our proposed controller, which switches between multiple sampling spaces according to the real-time situation, achieves balanced behavior between efficiency and safety. Source code is available at https://github.com/MizuhoAOKI/mppi_swerve_drive_ros
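The sketch below illustrates the core idea of sampling MPPI perturbations only in a selected control subspace, which is what switching sampling spaces amounts to. The toy double-integrator dynamics, the quadratic cost, and the two "modes" are placeholder assumptions, not the paper's 4WIDS model.

```python
import numpy as np

rng = np.random.default_rng(0)

def mppi_step(u_nom, sample_dims, cost_fn, K=256, sigma=0.5, lam=1.0):
    """u_nom: (T, n_u) nominal inputs; sample_dims: indices to perturb."""
    T, n_u = u_nom.shape
    noise = np.zeros((K, T, n_u))
    noise[:, :, sample_dims] = rng.normal(0, sigma, (K, T, len(sample_dims)))
    costs = np.array([cost_fn(u_nom + noise[k]) for k in range(K)])
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    return u_nom + np.tensordot(w, noise, axes=1)   # weighted perturbation

def rollout_cost(u_seq, x0=np.array([2.0, 0.0]), dt=0.1):
    """Toy cost: drive a 1D double integrator (pos, vel) to the origin."""
    x = x0.copy()
    c = 0.0
    for u in u_seq:
        x = x + dt * np.array([x[1], u[0]])
        c += x @ x + 0.01 * u @ u
    return c

u = np.zeros((20, 2))
# A restricted mode samples only dim 0; a richer mode could use [0, 1].
u = mppi_step(u, sample_dims=[0], cost_fn=rollout_cost)
print("first optimized control:", u[0])
```

Restricting `sample_dims` shrinks the search space (efficiency); switching to a fuller set restores maneuvering freedom when obstacles demand it (safety).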
|
|
09:00-10:00, Paper WePI2T9.8 | |
Visual Perception System for Autonomous Driving |
|
Zhang, Qi | University of Bath |
Gou, Siyuan | University of Bath |
Li, Wenbin | University of Bath |
Keywords: Autonomous Vehicle Navigation, Vision-Based Navigation, SLAM
Abstract: The recent surge in interest in autonomous driving is fueled by its rapidly developing capacity to enhance safety, efficiency, and convenience. A key component of autonomous driving technology lies in its perceptual systems, where advancements have led to more precise algorithms applicable to autonomous driving, such as vision-based Simultaneous Localization and Mapping (SLAM), object detection, and tracking algorithms. This work introduces a vision-based perception system for autonomous driving that integrates trajectory tracking and prediction of moving objects to prevent collisions, while addressing the localization and mapping needs of autonomous driving. The system leverages motion cues from pedestrians to monitor and forecast their movements while simultaneously mapping the environment. This integrated approach resolves camera localization and tracks other moving objects in the scene, ultimately generating a sparse map to facilitate vehicle navigation. The performance, efficiency, and resilience of this approach are demonstrated through comprehensive evaluations on both simulated and real-world datasets.
|
|
09:00-10:00, Paper WePI2T9.9 | |
Rain-Reaper: Unmasking LiDAR-Based Detector Vulnerabilities in Rain |
|
Capraru, Richard | Nanyang Technological University |
Lupu, Emil Constantin | Imperial College London |
Demetriou, Soteris | Imperial College London |
Wang, Jian-Gang | Institute for Infocomm Research |
Soong, Boon Hee | Nanyang Technological University |
Keywords: Autonomous Vehicle Navigation, Object Detection, Segmentation and Categorization, Intelligent Transportation Systems
Abstract: LiDAR-based 3D object detection aims to enhance the situational awareness of autonomous vehicles. Despite recent advancements in this technology, there is evidence that 3D object detectors are highly susceptible to signal spoofing, leading to the erroneous detection of 'ghost objects' or the failure to detect genuine ones. While prior work has investigated the design of new attacks and new defenses, the effect of weather conditions, a hot topic in autonomous vehicle research, on both attacks and defenses has never been studied. Motivated by this observation, in this paper we present a novel genetic algorithm-based attack, entitled Rain-Reaper, that leverages the effect of rain and identifies critical detection points used by 3D detectors. We show that adverse weather conditions not only diminish detection distance and accuracy but also expose the limitations of existing defenses. We have found that the unique characteristics of wet roads lead to under-performing defenses, thus creating a false sense of confidence in them. The effectiveness and efficiency of the attack and the robustness of the defenses have been evaluated with both simulated and real data. Rain-Reaper demonstrates a high attack success rate while successfully evading existing defenses with an adversarial point budget up to 8.8 times smaller than previously demonstrated state-of-the-art attacks.
|
|
09:00-10:00, Paper WePI2T9.10 | |
An Observability Constrained Downward-Facing Optical-Flow-Aided Visual-Inertial Odometry |
|
Liu, Dandi | Zhejiang University |
Mei, Jiahao | Zhejiang University of Technology |
Zhou, Jin | Zhejiang University |
Li, Shuo | Zhejiang University |
Keywords: Autonomous Vehicle Navigation, Sensor Fusion
Abstract: Visual-Inertial Odometry (VIO) has been widely used by autonomous drones as an onboard navigation method. However, it suffers from drift, especially in scenarios where the environment has few texture features, such as an empty room with solid-color walls. Optical flow sensors are another type of onboard sensor used by drones; they face downward and measure velocity by detecting changes in pixels between consecutive images, which does not introduce accumulated error. In this work, we present an efficient tightly-coupled estimator that improves the accuracy of VIO by consistently fusing the measurements of a downward-facing optical flow sensor into the VIO framework. We further analyze the observability of the estimators, prove that there are four unobservable directions in the ideal case, and then utilize OC-EKF to maintain the consistency of the estimator. Furthermore, we extend an adaptive weighting algorithm to the proposed method, which can better adapt to scenes where feature tracking is less accurate. Finally, both simulation and real-world experiments demonstrate the feasibility of the proposed method.
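A minimal sketch of the fusion step follows: a Kalman-style measurement update that corrects a position/velocity state with an optical-flow velocity reading. The 4D state, linear measurement model, and noise values are simplifying assumptions; the paper's estimator uses a full VIO state with observability-constrained (OC-EKF) Jacobians.

```python
import numpy as np

def velocity_update(x, P, z_vel, R):
    """x = [px, py, vx, vy]; z_vel is the optical-flow velocity reading."""
    H = np.array([[0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)   # sensor observes velocity only
    y = z_vel - H @ x                           # innovation
    S = H @ P @ H.T + R                         # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)              # Kalman gain
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
    return x, P

x = np.array([0.0, 0.0, 1.2, -0.1])
P = np.diag([0.5, 0.5, 0.4, 0.4])
x, P = velocity_update(x, P, z_vel=np.array([1.0, 0.0]), R=0.05 * np.eye(2))
print("corrected velocity:", x[2:])
```

Because the optical-flow measurement is drift-free, such updates bound the velocity error that would otherwise accumulate in texture-poor scenes.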
|
|
09:00-10:00, Paper WePI2T9.11 | |
Learning Autonomous Driving from Aerial Imagery |
|
Murali, Varun | Massachusetts Institute of Technology |
Rosman, Guy | Massachusetts Institute of Technology |
Karaman, Sertac | Massachusetts Institute of Technology |
Rus, Daniela | MIT |
Keywords: Autonomous Vehicle Navigation, Vision-Based Navigation, Autonomous Agents
Abstract: In this work, we consider the problem of learning end-to-end perception-to-control for ground vehicles solely from aerial imagery. Photogrammetric simulators allow the synthesis of novel views through the transformation of pre-generated assets. However, they have a large setup cost, require careful data collection, and often demand human effort to create usable simulators. We use a Neural Radiance Field (NeRF) as an intermediate representation to synthesize novel views from the point of view of a ground vehicle. These novel viewpoints can then be used for several downstream autonomous navigation applications. In this work, we demonstrate the utility of novel view synthesis through the application of training a policy for end-to-end learning from images and depth data. In a traditional real-to-sim-to-real framework, the collected data would be transformed into a visual simulator which could then be used to generate novel views. In contrast, using a NeRF allows a compact representation and the ability to optimize over the parameters of the visual simulator as more data is gathered in the environment. We demonstrate the efficacy of our method in a custom-built mini-city environment through the deployment of imitation policies on robotic cars. We additionally consider the task of place localization and demonstrate that our method is able to relocalize the car in the real world.
|
|
09:00-10:00, Paper WePI2T9.12 | |
Magnetic Field Aided Vehicle Localization with Acceleration Correction |
|
Deshpande, Mrunmayee | Texas A&M University |
Majji, Manoranjan | Texas A&M University |
Ramos, J Humberto | University of Florida |
Keywords: Autonomous Vehicle Navigation, Localization, Mapping
Abstract: This paper presents a novel approach for vehicle localization by leveraging the ambient magnetic field within a given environment. Our approach introduces a global mathematical function for magnetic field mapping, combined with a Euclidean distance-based matching technique for accurately estimating the vehicle position in suburban settings. The function-based map structure ensures the efficiency and scalability of the magnetic field map, while the batch-processing-based localization provides continuity in pose estimation. Additionally, we establish a bias estimation pipeline for an onboard accelerometer by utilizing the updated poses obtained through magnetic field matching. Our work aims to showcase the potential utility of magnetic fields as supplementary aids to existing localization methods, particularly beneficial in scenarios where the Global Positioning System (GPS) signal is restricted or where cost-effective navigation systems are required.
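As a rough illustration of Euclidean distance-based matching against a function-based field map, the sketch below slides a window of measured field magnitudes along a 1D analytic map and picks the best-fitting position. The sinusoidal map function, road coordinate, and noise level are fabricated assumptions for illustration only.

```python
import numpy as np

def field_map(s):
    """Analytic magnetic-magnitude map along the road coordinate s (metres)."""
    return 48.0 + 3.0 * np.sin(0.79 * s) + 0.5 * np.sin(2.33 * s)

def match_position(window, candidates, spacing=1.0):
    """Return the candidate start position whose map window is closest (L2)."""
    offsets = spacing * np.arange(len(window))
    errs = [np.linalg.norm(window - field_map(c + offsets)) for c in candidates]
    return candidates[int(np.argmin(errs))]

true_start = 12.0
rng = np.random.default_rng(0)
window = field_map(true_start + np.arange(8)) + rng.normal(0, 0.1, 8)
candidates = np.linspace(0, 50, 501)
print("estimated position:", match_position(window, candidates))  # ≈ 12.0
```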
|
|
09:00-10:00, Paper WePI2T9.13 | |
Neuro-Explorer: Efficient and Scalable Exploration Planning Via Learned Frontier Regions |
|
Han, Kyung Min | Ewha Womans University
Kim, Young J. | Ewha Womans University |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, Machine Learning for Robot Control
Abstract: We present an efficient and scalable learning-based autonomous exploration system for mobile robots navigating unknown indoor environments. Our system incorporates three network models trained to identify the frontier region (FR), to evaluate detected FRs based on their proximity to the robot (A*-Net), and to measure the coverage reward at the FRs (Viz-Net). Our method employs an active window of the map that moves along with the robot, offering scalable exploration capabilities while maintaining a high rate of exploration coverage owing to the two exploratory measures utilized by A*-Net (proximity) and Viz-Net (coverage). Consequently, our system completes over 99% coverage in a large-scale benchmarking world scaling up to 135 m × 80 m. In contrast, other state-of-the-art approaches completed less than 40% of the same world, with a 30% slower exploration speed than ours.
|
|
09:00-10:00, Paper WePI2T9.14 | |
Skill Q-Network: Learning Adaptive Skill Ensemble for Mapless Navigation in Unknown Environments |
|
Seong, Hyunki | KAIST |
Shim, David Hyunchul | KAIST |
Keywords: Autonomous Vehicle Navigation, Reinforcement Learning, Autonomous Agents
Abstract: This paper focuses on the acquisition of mapless navigation skills within unknown environments. We introduce the Skill Q-Network (SQN), a novel reinforcement learning method featuring an adaptive skill ensemble mechanism. Unlike existing methods, our model concurrently learns a high-level skill decision process alongside multiple low-level navigation skills, all without the need for prior knowledge. Leveraging a tailored reward function for mapless navigation, the SQN is capable of learning adaptive maneuvers that incorporate both exploration and goal-directed skills, enabling effective navigation in new environments. Our experiments demonstrate that the SQN can effectively navigate complex environments, exhibiting 40% higher performance compared to baseline models. Without explicit guidance, the SQN discovers how to combine low-level skill policies, showcasing both goal-directed navigation to reach destinations and exploration maneuvers to escape from local-minimum regions in challenging scenarios. Remarkably, our adaptive skill ensemble method enables zero-shot transfer to out-of-distribution domains characterized by unseen observations from non-convex obstacles or uneven, subterranean-like environments. The project page is available at https://sites.google.com/view/skill-q-net.
|
|
09:00-10:00, Paper WePI2T9.15 | |
Look before You Leap: Socially Acceptable High-Speed Ground Robot Navigation in Crowded Hallways |
|
Sharma, Lakshay | Massachusetts Institute of Technology |
Buono, Nicolaniello | Massachusetts Institute of Technology |
Flather, Ashton | Massachusetts Institute of Technology |
Cai, Xiaoyi | Massachusetts Institute of Technology |
How, Jonathan | Massachusetts Institute of Technology |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, Social HRI
Abstract: To operate safely and efficiently, autonomous warehouse/delivery robots must be able to accomplish tasks while navigating in dynamic environments and handling the large uncertainties associated with the motions/behaviors of other robots and/or humans. A key scenario in such environments is the hallway problem, where robots must operate in the same narrow corridor as human traffic going in one or both directions. Traditionally, robot planners have tended to focus on socially acceptable behavior in the hallway scenario at the expense of performance. This paper proposes a planner that aims to address the consequent "robot freezing problem" in hallways by allowing for "peek-and-pass" maneuvers. We then go on to demonstrate in simulation how this planner improves robot time to goal without violating social norms. Finally, we show initial hardware demonstrations of this planner in the real world, along with a novel STAR (Socially Trained Agile Robot) platform designed with human comfort in mind.
|
|
09:00-10:00, Paper WePI2T9.16 | |
Learning Sampling Distribution and Safety Filter for Autonomous Driving with VQ-VAE and Differentiable Optimization |
|
Idoko, Simon | University of Tartu |
Sharma, Basant | University of Tartu |
Singh, Arun Kumar | University of Tartu |
Keywords: Autonomous Vehicle Navigation, Optimization and Optimal Control, Machine Learning for Robot Control
Abstract: Sampling trajectories from a distribution and then ranking them based on a specified cost function is a common approach in autonomous driving. Typically, the sampling distribution is hand-crafted (e.g., a Gaussian or a grid). Recently, there have been efforts towards learning the sampling distribution through generative models such as the Conditional Variational Autoencoder (CVAE). However, these approaches fail to capture the multi-modality of driving behaviour due to the Gaussian latent prior of the CVAE. Thus, in this paper, we re-imagine distribution learning through the vector quantized variational autoencoder (VQ-VAE), whose discrete latent space is well equipped to capture a multi-modal sampling distribution. The VQ-VAE is trained with demonstration data of optimal trajectories. We further propose a differentiable optimization-based safety filter to minimally correct the VQ-VAE-sampled trajectories to ensure collision avoidance. We use backpropagation through the optimization layers in a self-supervised learning set-up to learn a good initialization and the optimal parameters of the safety filter. We perform extensive comparisons with a state-of-the-art CVAE-based baseline in dense and aggressive traffic scenarios and show a reduction of up to 12 times in collision rate while being competitive in driving speeds.
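For readers unfamiliar with VQ-VAEs, here is a minimal sketch of the vector-quantization step that produces the discrete latent space credited with capturing multi-modality: each encoder output is snapped to its nearest codebook vector. Shapes, the codebook size, and the random data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(z_e, codebook):
    """z_e: (N, D) encoder outputs; codebook: (K, D). Returns indices, z_q."""
    # Squared distances between every latent vector and every code vector.
    d2 = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

codebook = rng.normal(size=(32, 8))   # 32 discrete codes, 8-dim latents
z_e = rng.normal(size=(5, 8))         # a batch of encoder outputs
idx, z_q = quantize(z_e, codebook)
print("selected codes:", idx)         # sampling codes and decoding them
                                      # yields distinct trajectory modes
```

Because each sample is tied to one of finitely many codes, distinct driving modes (e.g., overtake left vs. slow down) map to distinct codes instead of being averaged by a Gaussian prior.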
|
|
WePI2T10 |
Room 10 |
Simultaneous Localization and Mapping (SLAM) II |
Teaser Session |
Chair: La, Hung | University of Nevada at Reno |
Co-Chair: Milford, Michael J | Queensland University of Technology |
|
09:00-10:00, Paper WePI2T10.1 | |
CBGL: Fast Monte Carlo Passive Global Localisation of 2D LIDAR Sensor |
|
Filotheou, Alexandros | Aristotle University of Thessaloniki |
Keywords: Localization, Range Sensing
Abstract: Navigation of a mobile robot is conditioned on the knowledge of its pose. In observer-based localisation configurations its initial pose may not be knowable in advance, leading to the need for its estimation. Solutions to the problem of global localisation are either robust against noise and environment arbitrariness but require motion and time, which may (need to) be economised on, or require minimal estimation time but assume environmental structure, may be sensitive to noise, and demand preprocessing and tuning. This article proposes a method that retains the strengths and avoids the weaknesses of the two approaches. The method leverages properties of the Cumulative Absolute Error per Ray (CAER) metric with respect to the errors of pose estimates of a 2D LIDAR sensor, and utilises scan-to-map-scan matching for fine(r) pose estimation. A large number of tests, in real and simulated conditions, involving disparate environments and sensor properties, illustrate that the proposed method outperforms state-of-the-art methods of both classes of solutions in terms of pose discovery rate and execution time. The source code is available for download.
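A minimal sketch of hypothesis scoring with a CAER-like metric follows: sum the absolute per-ray error between the real scan and a scan simulated from the map at each candidate pose, then rank. The toy `fake_simulate` stub is a placeholder assumption; CBGL itself ray-casts against a 2D occupancy map.

```python
import numpy as np

def caer(real_scan, map_scan):
    """Cumulative absolute error per ray between two range arrays."""
    return np.abs(real_scan - map_scan).sum()

def rank_hypotheses(real_scan, hypotheses, simulate_scan):
    """Return pose hypotheses sorted by ascending CAER score."""
    scores = [caer(real_scan, simulate_scan(p)) for p in hypotheses]
    order = np.argsort(scores)
    return [hypotheses[i] for i in order], [scores[i] for i in order]

# Toy stand-in for map ray-casting: ranges depend on pose heading only.
def fake_simulate(pose, n_rays=360):
    angles = np.linspace(-np.pi, np.pi, n_rays) + pose[2]
    return 5.0 + np.sin(angles)

truth = fake_simulate((0.0, 0.0, 0.3))
candidates = [(0, 0, t) for t in np.linspace(-np.pi, np.pi, 21)]
ranked, scores = rank_hypotheses(truth, candidates, fake_simulate)
print("best heading guess:", ranked[0][2], "score:", scores[0])
```

The best-scoring hypotheses would then be refined by scan-to-map-scan matching, as the abstract describes.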
|
|
09:00-10:00, Paper WePI2T10.2 | |
SGNet: Salient Geometric Network for Point Cloud Registration |
|
Wu, Qianliang | Nanjing University of Science and Technology |
Ding, Yaqing | Czech Technical University in Prague |
Luo, Lei | Nanjing University of Science and Technology |
Jiang, Haobo | Nanjing University of Science and Technology |
Gu, Shuo | Nanjing University of Science and Technology |
Zhou, Chuanwei | Nanjing University of Science and Technology |
Xie, Jin | Nanjing University of Science and Technology |
Yang, Jian | Nanjing University of Science & Technology |
Keywords: Localization, Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Point Cloud Registration (PCR) is a critical and challenging task in computer vision and robotics. One of the primary difficulties in PCR is identifying salient and meaningful points that exhibit consistent semantic and geometric properties across different scans. Previous methods have encountered challenges with ambiguous matching due to the similarity among patch blocks throughout the entire point cloud and the lack of consideration for efficient global geometric consistency. To address these issues, we propose a new framework that includes several novel techniques. Firstly, we introduce a semantic-aware geometric encoder that combines object-level and patch-level semantic information. This encoder significantly improves registration recall by reducing ambiguity in patch-level superpoint matching. Additionally, we incorporate a prior-knowledge approach that utilizes an intrinsic shape signature to identify salient points. This enables us to extract the most salient superpoints and meaningful dense points in the scene. Secondly, we introduce an innovative transformer that encodes High-Order (HO) geometric features. These features are crucial for identifying salient points within initial overlap regions while considering global high-order geometric consistency. We introduce an anchor node selection strategy to further optimize this high-order transformer. By encoding inter-frame triangle or polyhedron consistency features based on these anchor nodes, we can effectively learn high-order geometric features of salient superpoints. These high-order features are then propagated to dense points and utilized by a Sinkhorn matching module to identify critical correspondences for successful registration. The experiments conducted on the 3DMatch/3DLoMatch and KITTI datasets demonstrate the effectiveness of our method.
|
|
09:00-10:00, Paper WePI2T10.3 | |
Resource-Aware Collaborative Monte Carlo Localization with Distribution Compression |
|
Zimmerman, Nicky | University of Lugano |
Giusti, Alessandro | IDSIA USI-SUPSI |
Guzzi, Jerome | IDSIA, USI-SUPSI |
Keywords: Localization, Multi-Robot Systems
Abstract: Global localization is essential in enabling robot autonomy, and collaborative localization is key for multi-robot systems, allowing for more efficient planning and execution of tasks. In this paper, we address the task of collaborative global localization under computational and communication constraints. We propose a method which reduces the amount of information exchanged and the computational cost. We also analyze, implement and open-source seminal approaches, which we believe to be a valuable contribution to the community. We exploit techniques for distribution compression in near-linear time, with error guarantees. We evaluate our approach and the implemented baselines on multiple challenging scenarios, simulated and real-world. Our approach can run online on an onboard computer. We release an open-source C++/ROS2 implementation of our approach, as well as the baselines.
|
|
09:00-10:00, Paper WePI2T10.4 | |
Neighborhood Consensus Guided Matching Based Place Recognition with Spatial-Channel Embedding |
|
Li, Kunmo | Northeastern University |
Zhang, Yunzhou | Northeastern University |
Ning, Jian | Northeastern University |
Zhao, Xinge | Northeastern University |
Wang, Guiyuan | Jiangsu Shuguang Optoelectronics Co., Ltd., Yangzhou, China |
Liu, Wei | Jiangsu Shuguang Optoelectronics Co., Ltd., Yangzhou, China |
Keywords: Localization, SLAM
Abstract: As a crucial part of mobile robotics and autonomous driving, Visual Place Recognition (VPR) is usually addressed by recognizing similar reference images from a pre-obtained database. However, VPR suffers from environmental changes such as weather, illumination, and perceptual aliasing. To address this, we first introduce a robust and discriminative global descriptor aggregation technique that normalizes the spatial and channel dimensions of features. A Spatial-Channel Embedding (SCE) module is proposed to learn the spatial and scale information of features, making global features more discriminative. Meanwhile, traditional re-ranking methods for geometric consistency verification (e.g., RANSAC) are time-consuming. Here we propose a Neighborhood Consensus Guided Matching (NCGM) module, which uses neighborhood consensus to filter features from patch-level matching, achieving more accurate matching while reducing time consumption. Through extensive experiments on multiple benchmarks, we demonstrate that our method outperforms several state-of-the-art methods while maintaining lower time consumption and storage requirements.
|
|
09:00-10:00, Paper WePI2T10.5 | |
Optimal Robot Formations: Balancing Range-Based Observability and User-Defined Configurations |
|
Ahmed, Syed Shabbir | McGill University |
Shalaby, Mohammed Ayman | McGill University |
Le Ny, Jerome | Polytechnique Montreal |
Forbes, James Richard | McGill University |
Keywords: Localization, Path Planning for Multiple Mobile Robots or Agents, Aerial Systems: Applications
Abstract: This paper introduces a set of customizable and novel cost functions that enable the user to easily specify desirable robot formations, such as a "high-coverage" infrastructure-inspection formation, while maintaining high relative pose estimation accuracy. The overall cost function balances the need for the robots to be close together for good ranging-based relative localization accuracy and the need for the robots to achieve specific tasks, such as minimizing the time taken to inspect a given area. The formations found by minimizing the aggregated cost function are evaluated in a coverage path planning task in simulation and experiment, where the robots localize themselves and unknown landmarks using a simultaneous localization and mapping algorithm based on the extended Kalman filter. Compared to an optimal formation that maximizes ranging-based relative localization accuracy, these formations significantly reduce the time to cover a given area with minimal impact on relative pose estimation accuracy.
|
|
09:00-10:00, Paper WePI2T10.6 | |
Augmenting Vision with Radar for All-Weather Geo-Localization without a Prior HD Map |
|
Dong, Can | Harbin Institute of Technology, Shenzhen |
Hong, Ziyang | Heriot-Watt University |
Li, Siru | Harbin Institute of Technology, Shenzhen |
Hu, Liang | Harbin Institute of Technology, Shenzhen |
Gao, Huijun | Harbin Institute of Technology |
Keywords: Localization, SLAM, Sensor Fusion
Abstract: Accurate and robust geo-localization in all weather conditions is essential for enabling autonomous vehicles and delivery robots to offer uninterrupted mobility services in the real world. In this paper, we propose the first camera-radar fusion-based geo-localization method that is robust to all weather conditions. The core of the proposed method is to leverage the rich semantic information in images and the sensing consistency of radar across all weather. Extensive comparative experiments show that our proposed method surpasses state-of-the-art camera-based and LiDAR-camera-based methods in inclement weather conditions. Notably, our approach requires only an openly accessible map, eliminating the need for high-definition maps and offering a cost-effective solution for geo-localizing or globally localizing autonomous vehicles in any weather condition. Our code and trained model will be released publicly.
|
|
09:00-10:00, Paper WePI2T10.7 | |
High-Accuracy 2-D AoA Estimation Using Lightweight UWB Arrays |
|
Li, Yi | Tsinghua University |
Zhao, Hanying | Tsinghua University |
Liu, Yiman | Tsinghua University |
Wang, Tianyu | QiYuan Lab |
Yu, Jincheng | Tsinghua University
Shen, Yuan | Tsinghua University |
Keywords: Localization, Sensor Networks
Abstract: UWB systems are gaining popularity for multi-robot localization, benefiting from their high-accuracy ranging capabilities. However, current UWB systems fall short in determining orientations and realizing pair-wise localization because they neglect bearing information. Given the importance of bearing capabilities, especially when vision-based methods fail, this paper proposes a high-accuracy 2-D bearing estimation method using stereo UWB arrays. We propose a novel phase error calibration method that effectively mitigates various phase imperfections. The array is designed with antenna spacing larger than half the wavelength to diminish antenna coupling and enhance bearing accuracy. To resolve the phase ambiguity arising from this large antenna spacing, a distributed range-assisted phase ambiguity determination method is developed. Our bearing estimation method exhibits low complexity and is well-suited for deployment on mobile robots with limited computational resources. The performance of the proposed method is validated on practical platforms under dynamic scenarios, yielding RMSEs of less than 4° and 3° for azimuth and elevation angle estimation, respectively.
|
|
09:00-10:00, Paper WePI2T10.8 | |
Explicit Interaction for Fusion-Based Place Recognition |
|
Xu, Jingyi | Beijing Institute of Technology |
Ma, Junyi | Beijing Institute of Technology |
Wu, Qi | Shanghai Jiao Tong University |
Zhou, Zijie | Beijing Institute of Technology |
Wang, Yue | Zhejiang University |
Chen, Xieyuanli | National University of Defense Technology |
Yu, Wenxian | Shanghai Jiao Tong University |
Pei, Ling | Shanghai Jiao Tong University |
Keywords: Localization, SLAM, Sensor Fusion
Abstract: Fusion-based place recognition is an emerging technique jointly utilizing multi-modal perception data, to recognize previously visited places in GPS-denied scenarios for robots and autonomous vehicles. Recent fusion-based place recognition methods combine multi-modal features in implicit manners. While achieving remarkable results, they do not explicitly consider what the individual modality affords in the fusion system. Therefore, the benefit of multi-modal feature fusion may not be fully explored. In this paper, we propose a novel fusion-based network, dubbed EINet, to achieve explicit interaction of the two modalities. EINet uses LiDAR ranges to supervise more robust vision features for long time spans, and simultaneously uses camera RGB data to improve the discrimination of LiDAR point clouds. In addition, we develop a new benchmark for the place recognition task based on the nuScenes dataset. To establish this benchmark for future research with comprehensive comparisons, we introduce both supervised and self-supervised training schemes alongside evaluation protocols. We conduct extensive experiments on the proposed benchmark, and the experimental results show that our EINet exhibits better recognition performance as well as solid generalization ability compared to the state-of-the-art fusion-based place recognition approaches. Our open-source code and benchmark are released at: https://github.com/BIT-XJY/EINet.
|
|
09:00-10:00, Paper WePI2T10.9 | |
ModaLink: Unifying Modalities for Efficient Image-To-PointCloud Place Recognition |
|
Xie, Weidong | Xi'an Jiaotong University |
Luo, Lun | Zhejiang University |
Ye, Nanfei | Haomo |
Ren, Yi | Carnegie Mellon University |
Du, Shaoyi | Xi'an Jiaotong University |
Wang, Minhang | HAOMO.AI Technology Co., Ltd |
Xu, Jintao | HAOMO.AI Technology Co., Ltd |
Ai, Rui | HAOMO.AI Technology Co., Ltd |
Gu, Weihao | HAOMO.AI Technology Co., Ltd |
Chen, Xieyuanli | National University of Defense Technology |
Keywords: Localization, Autonomous Vehicle Navigation, Visual Learning
Abstract: Place recognition is an important task for robots and autonomous cars to localize themselves and close loops in pre-built maps. While single-modal sensor-based methods have shown satisfactory performance, cross-modal place recognition that retrieves images from a point-cloud database remains a challenging problem. Current cross-modal methods transform images into 3D points using depth estimation for modality conversion, which is usually computationally intensive and needs expensive labeled data for depth supervision. In this work, we introduce a fast and lightweight framework to encode images and point clouds into place-distinctive descriptors. We propose an effective Field of View (FoV) transformation module to convert point clouds into a modality analogous to images. This module eliminates the need for depth estimation and helps subsequent modules achieve real-time performance. We further design a non-negative factorization-based encoder to extract mutually consistent semantic features between point clouds and images. This encoder yields more distinctive global descriptors for retrieval. Experimental results on the KITTI dataset show that our proposed methods achieve state-of-the-art performance while running in real time. An additional evaluation on the HAOMO dataset, covering a 17 km trajectory, further shows the practical generalization capabilities. We have released the implementation of our methods as open source at: https://github.com/SpadyDong/ModaLink.git
|
|
09:00-10:00, Paper WePI2T10.10 | |
A Multi-Model Fusion of LiDAR-Inertial Odometry Via Localization and Mapping |
|
Nguyen, An | University of Nevada, Reno |
Le, Chuong | University of Nevada, Reno |
Walunj, Pratik | University of Nevada Reno |
Do, Thanh Nho | UNSW |
Netchaev, Anton | USACE ERDC |
La, Hung | University of Nevada at Reno |
Keywords: Localization, Sensor Fusion, Range Sensing
Abstract: This work presents a comprehensive LiDAR-inertial odometry framework featuring robust smoothing and mapping capabilities, effectively correcting LiDAR feature point skewness using an inertial measurement unit (IMU). While the Extended Kalman Filter (EKF) is a common choice for nonlinear motion estimation, its complexity grows when handling maneuvering targets. To overcome this challenge, a new framework is presented that incorporates the Iterated Interacting Multiple Model Kalman Filter (IMMKF), providing a solution for reliable navigation under dynamic motion and noisy conditions. To ensure map consistency, an ikd-tree that facilitates continuous updates and adaptive rebalancing is employed, preserving the map's integrity. To guarantee the robustness of our approach, it undergoes extensive testing across diverse scales of indoor and outdoor environments; this testing scenario simulates absolute GPS denial. In terms of estimated motion, the new algorithm demonstrates superior accuracy compared to existing approaches. The implementation is openly accessible on GitHub for further exploration.
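A minimal sketch of the interacting-multiple-model idea behind an IMMKF follows: run one Kalman filter per motion model and update the model probabilities from each filter's measurement likelihood. The two-model setup, the transition matrix, and the likelihood values are illustrative assumptions, not the paper's LiDAR-inertial formulation.

```python
import numpy as np

def imm_probability_update(mu, likelihoods, Pi):
    """mu: prior model probabilities; likelihoods: per-model innovation
    likelihoods; Pi: model transition matrix. Returns posterior probs."""
    c = Pi.T @ mu                 # predicted model probabilities
    post = likelihoods * c
    return post / post.sum()

Pi = np.array([[0.95, 0.05],      # constant-velocity <-> maneuvering
               [0.10, 0.90]])
mu = np.array([0.5, 0.5])
# A maneuver makes the CV filter's innovations unlikely, the other's likely.
for lik in [np.array([0.8, 0.2]),
            np.array([0.1, 0.9]),
            np.array([0.05, 0.9])]:
    mu = imm_probability_update(mu, lik, Pi)
print("model probabilities:", mu)  # weight shifts to the maneuvering model
```

Each filter's state estimates are then mixed with these probabilities, which is what lets the odometry stay accurate through aggressive motion without a single over-complex model.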
|
|
09:00-10:00, Paper WePI2T10.11 | |
Dynamically Modulating Visual Place Recognition Sequence Length for Minimum Acceptable Performance Scenarios |
|
Malone, Connor | Queensland University of Technology |
Vora, Ankit | Ford Motor Company |
Peynot, Thierry | Queensland University of Technology (QUT) |
Milford, Michael J | Queensland University of Technology |
Keywords: Localization
Abstract: Mobile robots and autonomous vehicles are often required to function in environments where critical position estimates from sensors such as GPS become uncertain or unreliable. Single-image visual place recognition (VPR) provides an alternative for localization but often requires techniques such as sequence matching to improve robustness, which incurs additional computation and latency costs. Even then, the sequence length required to localize at an acceptable performance level varies widely, and simply setting overly long fixed sequence lengths creates unnecessary latency and computational overhead, and can even degrade performance. In these scenarios it is often more desirable to meet or exceed a set target performance at minimal expense. In this paper we present an approach which uses a calibration set of data to fit a model that modulates sequence length for VPR as needed to exceed a target localization performance. We make use of a coarse position prior, which could be provided by any other localization system, and capture the variation in appearance across this region. We use the correlation between appearance variation and sequence length to curate VPR features and fit a Multi-Layer Perceptron (MLP) for selecting the optimal length. We demonstrate that this method is effective at modulating sequence length to maximize the number of sections in a dataset which meet or exceed a target performance whilst minimizing the median length used. We show applicability across several datasets and reveal key phenomena such as generalization capabilities, the benefits of curating features, and the utility of non-state-of-the-art feature extractors with nuanced properties.
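The core selection logic can be illustrated with a minimal sketch: per map section, pick the shortest sequence length whose calibration performance meets the target. The recall table below is fabricated illustration data; the paper fits an MLP on curated VPR features rather than using a lookup.

```python
def min_length_meeting_target(recall_by_length, target=0.95):
    """recall_by_length: {seq_len: calibration recall} for one map section."""
    for L in sorted(recall_by_length):
        if recall_by_length[L] >= target:
            return L
    return max(recall_by_length)     # fall back to the longest sequence

# Fabricated calibration results for three map sections.
sections = {
    "open_road":  {1: 0.97, 5: 0.99, 10: 0.99},
    "tree_lined": {1: 0.70, 5: 0.93, 10: 0.96},
    "tunnel":     {1: 0.40, 5: 0.80, 10: 0.94},
}
lengths = {s: min_length_meeting_target(r) for s, r in sections.items()}
print(lengths)   # {'open_road': 1, 'tree_lined': 10, 'tunnel': 10}
```

Sections with distinctive appearance get short (cheap, low-latency) sequences, while ambiguous sections get longer ones, which is the efficiency/performance trade the abstract targets.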
|
|
09:00-10:00, Paper WePI2T10.12 | |
JointLoc: A Real-Time Visual Localization Framework for Planetary UAVs Based on Joint Relative and Absolute Pose Estimation |
|
Luo, Xubo | University of Chinese Academy of Sciences |
Wan, Xue | Technology and Engineering Center for Space Utilization, Chinese |
Gao, Yixing | Jilin University |
Tian, Yaolin | University of Chinese Academy of Sciences |
Zhang, Wei | Chinese Academy of Sciences |
Shu, Leizheng | Chinese Academy of Sciences |
Keywords: Localization, SLAM, Vision-Based Navigation
Abstract: Visual localization of unmanned aerial vehicles (UAVs) on planetary surfaces aims to estimate the absolute pose of the UAV in the world coordinate system using satellite maps and images captured by onboard cameras. However, since planetary scenes often lack significant landmarks and there are modal differences between satellite maps and UAV images, the accuracy and real-time performance of UAV positioning are reduced. To accurately determine the position of the UAV in a planetary scene in the absence of a global navigation satellite system (GNSS), this paper proposes JointLoc, which estimates the real-time UAV position in the world coordinate system by adaptively fusing the absolute 2-degree-of-freedom (2-DoF) pose and the relative 6-degree-of-freedom (6-DoF) pose. Extensive comparative experiments were conducted on a proposed planetary UAV image cross-modal localization dataset, which contains three types of typical Martian topography generated via a simulation engine as well as real Martian UAV images from the Ingenuity helicopter. JointLoc achieved a root-mean-square error of 0.237 m over trajectories of up to 1,000 m, compared to 0.594 m and 0.557 m for ORB-SLAM2 and ORB-SLAM3, respectively. The source code will be available at https://github.com/LuoXubo/JointLoc.
|
|
09:00-10:00, Paper WePI2T10.13 | |
Enhancing Visual Place Recognition Via Fast and Slow Adaptive Biasing in Event Cameras |
|
B Nair, Gokul | QUT Centre for Robotics, Brisbane, Australia |
Milford, Michael J | Queensland University of Technology |
Fischer, Tobias | Queensland University of Technology |
Keywords: Localization
Abstract: Event cameras are increasingly popular in robotics due to beneficial features such as low latency, energy efficiency, and high dynamic range. Nevertheless, their downstream task performance is greatly influenced by the optimization of bias parameters. These parameters, for instance, regulate the necessary change in light intensity to trigger an event, which in turn depends on factors such as the environment lighting and camera motion. This paper introduces feedback control algorithms that automatically tune the bias parameters through two interacting methods: 1) An immediate, on-the-fly fast adaptation of the refractory period, which sets the minimum interval between consecutive events, and 2) if the event rate exceeds the specified bounds even after changing the refractory period repeatedly, the controller adapts the pixel bandwidth and event thresholds, which stabilizes after a short period of noise events across all pixels (slow adaptation). Our evaluation focuses on the visual place recognition task, where incoming query images are compared to a given reference database. We conducted comprehensive evaluations of our algorithms’ adaptive feedback control in real-time. To do so, we collected the QCR-Fast-and-Slow dataset that contains DAVIS346 event camera streams from 366 repeated traversals of a Scout Mini robot navigating through a 100 meter long indoor lab setting (totaling over 35km distance traveled) in varying brightness conditions with ground truth location information. Our proposed feedback controllers result in superior performance when compared to the standard bias settings and prior feedback control methods. Our findings also detail the impact of bias adjustments on task performance and feature ablation studies on the fast and slow adaptation mechanisms.
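The fast/slow control idea can be sketched compactly: nudge the refractory period immediately when the event rate leaves its bounds, and only adjust the event thresholds after repeated refractory changes fail. All bias names, step sizes, and bounds below are illustrative assumptions, not DAVIS346 values or the authors' controller gains.

```python
class BiasController:
    """Two-timescale event-camera bias adaptation (illustrative sketch)."""

    def __init__(self, rate_lo=1e4, rate_hi=1e6):
        self.rate_lo, self.rate_hi = rate_lo, rate_hi
        self.refractory_us = 100.0   # fast knob: min gap between events
        self.threshold = 20          # slow knob: contrast threshold bias
        self.fast_tries = 0

    def step(self, event_rate):
        if self.rate_lo <= event_rate <= self.rate_hi:
            self.fast_tries = 0
            return
        # Fast adaptation: scale the refractory period immediately.
        scale = 1.5 if event_rate > self.rate_hi else 1 / 1.5
        self.refractory_us = min(max(self.refractory_us * scale, 10), 1e4)
        self.fast_tries += 1
        # Slow adaptation: after repeated failures, move the threshold too.
        if self.fast_tries > 5:
            self.threshold += 1 if event_rate > self.rate_hi else -1
            self.fast_tries = 0

ctrl = BiasController()
for rate in [5e6, 4e6, 3e6, 2.5e6, 2e6, 1.8e6, 1.6e6, 5e5]:
    ctrl.step(rate)
print(ctrl.refractory_us, ctrl.threshold)
```

Keeping the event rate bounded in this way stabilizes the event stream that the downstream place-recognition pipeline consumes across brightness changes.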
|
|
09:00-10:00, Paper WePI2T10.14 | |
Tightly-Coupled Factor Graph Formulation for Radar-Inertial Odometry |
|
Michalczyk, Jan | University of Klagenfurt |
Quell, Julius Karsten Oskar | Institute of Robotics and Mechatronics - German Aerospace Center |
Steidle, Florian | German Aerospace Center |
Müller, Marcus Gerhard | German Aerospace Center |
Weiss, Stephan | Universität Klagenfurt |
Keywords: Localization, Sensor Fusion, SLAM
Abstract: In this paper, we present a Radar-Inertial Odometry (RIO) method based on the nonlinear optimization of factor graphs in a sliding-window fashion. Our method makes use of lightweight, low-power, inexpensive, and commonly available hardware, enabling easy deployment on small Unmanned Aerial Vehicles (UAVs). We keep the state estimation problem bounded by employing partial marginalization of the oldest states, rendering the method real-time capable. We compare the implemented approach to the state-of-the-art multi-state Extended Kalman Filter (EKF)-based method in a one-to-one fashion. That is, we implemented both estimation back-ends in a single custom C++ RIO framework, with all other parts shared and thus identical, for a fair direct comparison. In real-world flight experiments, we compare the two methods and show that both perform similarly in terms of accuracy when the linearization point is not far from the true state. Upon wrong initialization, the factor graph approach heavily outperforms the EKF approach. We also acknowledge that the influence of undetected outliers can overwhelm the inherent benefits of the nonlinear optimization approach, leading to the insight that the estimator front-end has an important (and often underestimated) role in overall performance.
|
|
09:00-10:00, Paper WePI2T10.15 | |
Three-Dimensional Vehicle Dynamics State Estimation for High-Speed Race Cars under Varying Signal Quality |
|
Goblirsch, Sven | Technical University Munich |
Weinmann, Marcel | Technical University Munich |
Betz, Johannes | Technical University of Munich |
Keywords: Localization, Autonomous Agents, Autonomous Vehicle Navigation
Abstract: This work aims to present a three-dimensional vehicle dynamics state estimation under varying signal quality. Few researchers have investigated the impact of three-dimensional road geometries on the state estimation and, thus, neglect road inclination and banking. Especially considering high velocities and accelerations, the literature does not address these effects. Therefore, we compare two- and three-dimensional state estimation schemes to outline the impact of road geometries. We use an Extended Kalman Filter with a point-mass motion model and extend it by an additional formulation of reference angles. Furthermore, virtual velocity measurements significantly improve the estimation of road angles and the vehicle’s side slip angle. We highlight the importance of steady estimations for vehicle motion control algorithms and demonstrate the challenges of degraded signal quality and Global Navigation Satellite System dropouts. The proposed adaptive covariance facilitates a smooth estimation and enables stable controller behavior. The developed state estimation has been deployed on a high-speed autonomous race car at various racetracks. Our findings indicate that our approach outperforms state-of-the-art vehicle dynamics state estimators and an industry-grade Inertial Navigation System. Further studies are needed to investigate the performance under varying track conditions and on other vehicle types.
|
|
09:00-10:00, Paper WePI2T10.16 | |
LiDAR-Based HD Map Localization Using Semantic Generalized ICP with Road Marking Detection |
|
Gong, Yansong | UISEE Technology Co., Ltd |
Zhang, Xinglian | UISEE (Shanghai) Automotive Technologies Ltd |
Feng, Jingyi | UISEE Technology Co., Ltd |
He, Xiao | UISEE Technology (Beijing) Co., Ltd |
Zhang, Dan | Uisee Technology (Beijing) Co., Ltd |
Keywords: Localization, Autonomous Vehicle Navigation, Intelligent Transportation Systems
Abstract: In GPS-denied scenarios, a robust environmental perception and localization system becomes crucial for autonomous driving. In this paper, a LiDAR-based online localization system is developed, incorporating road marking detection and registration on a high-definition (HD) map. Within our system, a road marking detection approach with real-time performance is proposed, in which an adaptive segmentation technique is first introduced to isolate high-reflectance points correlated with road markings, enhancing real-time efficiency. Then, a spatio-temporal probabilistic local map is formed by aggregating historical LiDAR scans, providing a dense point cloud. Finally, a LiDAR bird's-eye view (LiBEV) image is generated, and an instance segmentation network is applied to accurately label the road markings. For road marking registration, a semantic generalized iterative closest point (SG-ICP) algorithm is designed. Linear road markings are modeled as 1-manifolds embedded in 2D space, mitigating the influence of constraints along the linear direction, addressing the under-constrained problem and achieving lower localization errors on HD maps than ICP. Extensive experiments are conducted in real-world scenarios, demonstrating the effectiveness and robustness of our system.
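The 1-manifold modeling can be illustrated with a minimal sketch: for a linear marking, penalize only the component of the point-to-line error perpendicular to the line direction, so the under-constrained along-line direction contributes nothing. The data and the single closed-form translation step below are illustrative assumptions, not the paper's full SG-ICP.

```python
import numpy as np

def perpendicular_residuals(points, line_pt, line_dir):
    """Signed distances of 2D points to an infinite line (1-manifold)."""
    d = line_dir / np.linalg.norm(line_dir)
    n = np.array([-d[1], d[0]])              # unit normal to the marking
    return (points - line_pt) @ n

# Detected marking points, offset 0.4 m laterally from the HD-map line y = 0.
pts = np.column_stack([np.linspace(0, 10, 20), 0.4 * np.ones(20)])
r = perpendicular_residuals(pts, np.zeros(2), np.array([1.0, 0.0]))
# Least-squares translation along the normal only (closed form: mean residual).
correction = -r.mean() * np.array([0.0, 1.0])
print("lateral correction:", correction)     # ≈ [0, -0.4]
```

Standard point-to-point ICP would also try to "explain" error along the marking, where lane lines carry no information; projecting onto the normal removes that spurious constraint.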
|
|
WePI2T11 |
Room 11 |
Multi-Robot Systems and Swarms I |
Teaser Session |
Chair: Parasuraman, Ramviyas | University of Georgia |
Co-Chair: Simonin, Olivier | INSA De Lyon |
|
09:00-10:00, Paper WePI2T11.1 | |
HGP-RL: Distributed Hierarchical Gaussian Processes for Wi-Fi-Based Relative Localization in Multi-Robot Systems |
|
Latif, Ehsan | University of Georgia |
Parasuraman, Ramviyas | University of Georgia |
Keywords: Multi-Robot Systems, Localization, Networked Robots
Abstract: Relative localization is crucial for multi-robot systems to perform cooperative tasks, especially in GPS-denied environments. Current techniques for multi-robot relative localization rely on expensive or short-range sensors such as cameras and LiDARs. As a result, these algorithms face challenges such as high computational complexity (e.g., map merging) and dependencies on well-structured environments. To remedy this gap, we propose a new distributed approach to perform relative localization (RL) using a common Access Point (AP). To achieve this efficiently, we propose a novel Hierarchical Gaussian Processes (HGP) mapping of the Received Signal Strength Indicator (RSSI) values from a Wi-Fi AP to which the robots are connected. We term this approach HGP-RL (Hierarchical Gaussian Processes for Relative Localization). Each robot performs hierarchical inference using the HGP map to locate the AP in its reference frame, and the robots obtain the relative locations of neighboring robots leveraging AP-oriented algebraic transformations. The approach readily applies to resource-constrained devices and relies only on the ubiquitously available Wi-Fi RSSI measurement. We extensively validate the performance of the proposed HGP-RL in Robotarium simulations against several state-of-the-art methods. The results indicate superior performance of HGP-RL regarding localization accuracy, computation, and communication overheads. Finally, we showcase the utility of HGP-RL through a multi-robot cooperative experiment to achieve a rendezvous task in a team of three mobile robots.
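A single-level sketch of GP-mapping RSSI to locate the AP follows: fit a GP over (position, RSSI) samples and take the argmax of the posterior mean as the AP estimate. The RBF kernel, path-loss model, and grid are illustrative assumptions; the paper's hierarchical inference and the AP-oriented frame transforms are omitted.

```python
import numpy as np

def rbf(A, B, ell=2.0, sf=1.0):
    """Squared-exponential kernel between point sets A (n,2) and B (m,2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf**2 * np.exp(-0.5 * d2 / ell**2)

def gp_mean(X, y, Xq, noise=1.0):
    """GP posterior mean at query points Xq given training data (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    return rbf(Xq, X) @ np.linalg.solve(K, y)

# RSSI falls off log-distance from the (unknown) AP at (3, 4).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (40, 2))          # positions where RSSI was sampled
y = -40 - 20 * np.log10(np.linalg.norm(X - [3, 4], axis=1) + 0.1)
gx, gy = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
Xq = np.column_stack([gx.ravel(), gy.ravel()])
mu = gp_mean(X, y - y.mean(), Xq) + y.mean()
print("estimated AP position:", Xq[mu.argmax()])   # near (3, 4)
```

Once each robot has such an AP estimate in its own frame, relative robot positions follow from aligning the frames about the shared AP.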
|
|
09:00-10:00, Paper WePI2T11.2 | |
Anchor-Oriented Localized Voronoi Partitioning for GPS-Denied Multi-Robot Coverage |
|
Munir, Aiman | University of Georgia |
Latif, Ehsan | University of Georgia |
Parasuraman, Ramviyas | University of Georgia |
Keywords: Multi-Robot Systems, Distributed Robot Systems, Sensor Networks
Abstract: Multi-robot coverage is crucial in numerous applications, including environmental monitoring, search and rescue operations, and precision agriculture. In modern applications, a multi-robot team must collaboratively explore unknown spatial fields in GPS-denied and extreme environments where global localization is unavailable. Coverage algorithms typically assume that the robot positions and the coverage environment are defined in a global reference frame. However, coordinating robot motion and ensuring coverage of the shared convex workspace without global localization is challenging. This paper proposes a novel anchor-oriented coverage (AOC) approach to generate dynamic localized Voronoi partitions based around a common anchor position. We further propose a consensus-based coordination algorithm that achieves agreement on the coverage workspace around the anchor in the robots' relative frames of reference. Through extensive simulations and real-world experiments, we demonstrate that the proposed anchor-oriented approach using localized Voronoi partitioning performs as well as the state-of-the-art coverage controller using GPS.
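The Voronoi-based coverage step can be sketched compactly: each robot claims the workspace points nearest to it and moves toward the centroid of its cell (a Lloyd step), with all positions expressed relative to a common anchor. The consensus machinery of the paper is omitted and the numbers are illustrative assumptions.

```python
import numpy as np

def lloyd_step(robots, workspace, gain=0.5):
    """robots: (N, 2) anchor-relative positions; workspace: (M, 2) points."""
    d = np.linalg.norm(workspace[:, None, :] - robots[None, :, :], axis=2)
    owner = d.argmin(axis=1)                 # localized Voronoi assignment
    new = robots.copy()
    for i in range(len(robots)):
        cell = workspace[owner == i]
        if len(cell):                        # move toward the cell centroid
            new[i] += gain * (cell.mean(axis=0) - robots[i])
    return new

gx, gy = np.meshgrid(np.linspace(0, 10, 40), np.linspace(0, 10, 40))
workspace = np.column_stack([gx.ravel(), gy.ravel()])
robots = np.array([[1.0, 1.0], [1.5, 1.2], [2.0, 0.8]])  # anchor-relative
for _ in range(30):
    robots = lloyd_step(robots, workspace)
print(robots)   # the robots spread out to cover the square
```

Because every quantity is defined relative to the shared anchor, the partition is consistent across robots without any global (GPS) frame.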
|
|
09:00-10:00, Paper WePI2T11.3 | |
Deep Ad-Hoc Sub-Team Partition Learning for Multi-Agent Air Combat Cooperation |
|
Fan, Songyuan | Harbin Institute of Technology |
Piao, Haiyin | Northwestern Polytechnical University |
Hu, Yi | Harbin Institute of Technology |
Jiang, Feng | Harbin Institute of Technology |
Yang, Roushu | SAIL |
Keywords: Multi-Robot Systems, Machine Learning for Robot Control, Reinforcement Learning
Abstract: In the future, unmanned autonomous air combat will encounter large-scale confrontation scenarios, where agents must consider complex time-varying relationships among aircraft when making decisions. Previous works have already introduced Multi-Agent Reinforcement Learning (MARL) into air combat and succeeded in surpassing the human expert level. However, they mainly focus on small-scale air combat with low relationship complexity, e.g., 1-vs-1 or 2-vs-2. As more agents join the confrontation, existing algorithms tend to suffer significant performance degradation due to the increase in problem dimensions. In view of this, this paper proposes Deep Ad-hoc Sub-Team Partition Learning (DASPL) to address large-scale air combat problems. DASPL models multi-agent air combat as a graph to handle the complex relations and introduces an automatic partitioning mechanism to generate dynamic sub-teams, converting the existing large-scale multi-agent air combat cooperation problem into multiple small-scale equivalent problems. Additionally, DASPL incorporates an efficient message passing method among the participating sub-teams. Extensive experiments demonstrate that DASPL outperforms state-of-the-art algorithms by at least about 28.3% in large-scale air combat environments.
|
|
09:00-10:00, Paper WePI2T11.4 | |
Robustness Study of Optimal Geometries for Cooperative Multi-Robot Localization |
|
Theunissen, Mathilde | LS2N, CNRS |
Fantoni, Isabelle | CNRS |
Malis, Ezio | Inria |
Martinet, Philippe | INRIA |
Keywords: Multi-Robot Systems, Localization
Abstract: This work focuses on localizing a single target robot with multi-robot formations in 2D space. The cooperative robots employ inter-robot range measurements to estimate the target's position. In the presence of noisy measurements, the choice of formation geometry significantly impacts the accuracy of the target robot's pose estimation. While an infinite number of geometries exists to optimize localization accuracy, the current practice is to choose the final formation geometry based on convenience criteria such as simplicity or proximity to the initial position of the robots. The former leads to the selection of regular polygon-shaped formations, while the latter results in behaviour-based formations. Different from existing works, we conduct a complete robustness study of formation geometries in the presence of deviations from the desired formation and range measurement errors. In 2D scenarios, we establish necessary and sufficient conditions for formation geometries to be robust against robot positioning errors. This result substantiates the extensive use of regular polygon formations. However, our analysis reveals the lack of robustness of the commonly used square formation geometry, which stands as an exception. Simulation results illustrate the advantages of these robust geometries in enhancing target localization accuracy.
|
|
09:00-10:00, Paper WePI2T11.5 | |
Decentralized Communication-Maintained Coordination for Multi-Robot Exploration: Achieving Connectivity and Adaptability |
|
Tang, Wei | Zhejiang University |
Li, Chao | Hangzhou Deeprobotics Co.Ltd |
Wu, Jun | Zhejiang University |
Zhu, Qiuguo | Zhejiang University |
Keywords: Multi-Robot Systems, Reinforcement Learning, Planning, Scheduling and Coordination
Abstract: Multi-robot autonomous exploration tasks underscore the critical role of communication in coordinating group activities. This paper introduces an innovative decentralized multi-robot exploration algorithm, meticulously crafted to ensure unbroken communication within robotic groups, a crucial element for effective coordination. The motivation for our work is two-fold: firstly, seamless communication is vital for coordinating multi-robot autonomous exploration tasks; secondly, in applications such as disaster rescue operations or military maneuvers, there are numerous scenarios where the spatial congregation of multiple robots is imperative for joint task accomplishment. Our approach addresses these challenges through a stringent communication constraint, ensuring that each robot remains in constant communicative contact with the rest of the group. This is realized by employing a decentralized policy that integrates Graph Neural Network (GNN) layers with a self-attention mechanism. This policy network design allows adaptation to different numbers of robots and varied environments. After an initial imitation learning phase, the policy is refined through learning from experiences generated via a tree-search-based lookahead technique. Our experimental analysis validates that the algorithm not only maintains consistent communication links among all group members but also improves exploration efficiency under the communication constraints. These results highlight the potential of our method in enhancing the effectiveness of robotic group exploration while ensuring robust communication connectivity.
|
|
09:00-10:00, Paper WePI2T11.6 | |
Collaborative Object Manipulation on the Water Surface by a UAV-USV Team Using Tethers |
|
Novák, Filip | Czech Technical University in Prague |
Baca, Tomas | Ceske Vysoke Uceni Technicke V Praze, FEL |
Saska, Martin | Czech Technical University in Prague |
Keywords: Multi-Robot Systems, Motion Control, Motion and Path Planning
Abstract: This paper introduces an innovative methodology for object manipulation on the surface of water through the collaboration of an Unmanned Aerial Vehicle (UAV) and an Unmanned Surface Vehicle (USV) connected to the object by tethers. We propose a novel mathematical model of a robotic system that combines the UAV, USV, and the tethered floating object. A novel Model Predictive Control (MPC) framework is designed for using this model to achieve precise control and guidance for this collaborative robotic system. Extensive simulations in the realistic robotic simulator Gazebo demonstrate the system’s readiness for real-world deployment, highlighting its versatility and effectiveness. Our multi-robot system overcomes the state-of-the-art single-robot approach, exhibiting smaller control errors during the tracking of the floating object’s reference. Additionally, our multi-robot system demonstrates a shorter recovery time from a disturbance compared to the single-robot approach.
|
|
09:00-10:00, Paper WePI2T11.7 | |
Multi-Robot Path Planning with Boolean Specification Tasks under Motion Uncertainties |
|
Zhang, Zhe | Shaanxi University of Science and Technology |
He, Zhou | Shaanxi University of Science and Technology |
Ran, Ning | Hebei University |
Reniers, Michel | Eindhoven University of Technology |
Keywords: Multi-Robot Systems, Task and Motion Planning, Planning, Scheduling and Coordination
Abstract: This paper studies the path planning problem of multi-robot systems under motion uncertainties with high-level tasks that are expressed as Boolean specifications. The specification imposes logical constraints on robot trajectories and final states. First, a global Markov decision process model of the multi-robot system is constructed to provide its current state. In order to tackle the state explosion problem, at each stage, we construct a local Markov decision process for every individual agent in sequence to compute the local optimal movement strategy and update the global Markov decision process accordingly (i.e., compute locally and update globally). Next, we propose a heuristic reward function design method that provides different rewards for visiting different task points by introducing the estimated distance to complete the global task. Finally, a series of numerical experiments are conducted to demonstrate the computational efficiency and scalability of our developed approach.
|
|
09:00-10:00, Paper WePI2T11.8 | |
Coalition Formation Game Approach for Task Allocation in Heterogeneous Multi-Robot Systems under Resource Constraints |
|
Zhang, Liwang | National University of Defense Technology |
Liang, Dong | College of Sciences, National University of Defense Technology |
Li, Minglong | National University of Defense Technology |
Yang, Wenjing | State Key Laboratory of High Performance Computing (HPCL), Schoo |
Yang, Shaowu | National University of Defense Technology |
Keywords: Multi-Robot Systems, Aerial Systems: Applications, Distributed Robot Systems
Abstract: This paper studies a case of the multi-robot task allocation (MRTA) problem, where each unmanned aerial vehicle (UAV) is endowed with multiple but limited resources. Completing each task necessitates UAVs to combine different resources through coalition formation, which will incur various costs including flight cost, execution cost, and cooperation cost. To minimize the total cost while maximizing both task completion rate and resource utilization rate, we model the MRTA problem of the UAVs as a leader-follower coalition formation game. In this game, leader UAVs coordinate follower UAVs to fulfill task resource requisites. Meanwhile, follower UAVs select suitable coalitions to join based on altruistic preferences. Theoretical analysis confirms the existence of a Nash stable partition in the coalition formation game. To achieve this stable partition, we propose a coalition formation algorithm. Simulation experiments validate that the proposed algorithm outperforms existing methods for the MRTA problem under resource constraints in terms of both task completion rate and resource utilization rate.
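To make the game-theoretic machinery concrete, a best-response loop of the kind used to reach a Nash stable partition in hedonic coalition formation games can be sketched as follows; `utility` is a hypothetical per-follower payoff (the paper's altruistic preference would be encoded there), and termination relies on a stability result like the one the paper proves:

```python
def form_coalitions(followers, coalitions, utility):
    """Repeatedly let each follower switch to its preferred coalition
    until no follower benefits from switching (a Nash stable partition)."""
    partition = {f: None for f in followers}
    changed = True
    while changed:
        changed = False
        for f in followers:
            best = max(coalitions, key=lambda c: utility(f, c, partition))
            if best != partition[f]:
                partition[f] = best
                changed = True
    return partition
```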
|
|
09:00-10:00, Paper WePI2T11.9 | |
Design of a Multi-Robot Coordination System Based on Functional Expressions Using Large Language Models |
|
Kato, Yuki | Osaka University |
Yoshida, Takahiro | Osaka University |
Sueoka, Yuichiro | Osaka Univ |
Osuka, Koichi | Osaka University |
Yajima, Ryosuke | The University of Tokyo |
Nagatani, Keiji | The University of Tokyo |
Asama, Hajime | The University of Tokyo |
Keywords: Multi-Robot Systems, Distributed Robot Systems, Swarm Robotics
Abstract: A system is expected to facilitate coordination among multiple construction machines or robots, enabling them to adaptively perform various tasks at disaster sites and in unknown environments. Prior research generally adopts a model-based approach to designing cooperative behavior; however, such approaches struggle to adapt to environments and scenarios the model cannot predict. In recent years, it has been reported that robots equipped with foundation models can adapt to unknown (open) environments and unpredictable situations. However, there has been little discussion of foundation models for multi-robot systems, and no workflow exists for cooperatively handling unexpected events. In this paper, we propose a system flow that enables multiple robots to adapt their coordination to unforeseen scenarios based on functional expressions of one another and environment understanding, using GPT-4 and GPT-4V. Through robot experiments, we verify that the proposed flow is able to adapt to an unforeseen environment, in particular path obstruction. Furthermore, we examine the validity of the proposed flow by varying the robots' functional expressions and the sensor information for the environment.
|
|
09:00-10:00, Paper WePI2T11.10 | |
CGA: Corridor Generating Algorithm for Multi-Agent Environments |
|
Pertzovsky, Arseniy | Ben-Gurion University of the Negev |
Stern, Roni | Ben Gurion University of the Negev, Palo Alto Research Center (P |
Zivan, Roie | Ben Gurion University of the Negev |
Keywords: Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents, Planning, Scheduling and Coordination
Abstract: In this work, we consider path planning for a team of mobile agents where one agent must reach a given target as soon as possible and the others must move aside to avoid collisions. We call this practical problem the Single-Agent Corridor Generating (SACG) problem and explore several algorithms for solving it. We propose two baseline algorithms based on existing Multi-Agent Path Finding (MAPF) algorithms and outline their limitations. Then, we present the Corridor Generating Algorithm (CGA), a fast and complete algorithm for solving SACG. CGA performs well compared to the baseline approaches. In addition, we show how CGA can be generalized to address the lifelong version of MAPF, where new goals appear over time.
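To illustrate the corridor idea, one building block is evacuating an agent that blocks the main agent's corridor: a breadth-first search from the blocking agent's cell to the nearest traversable cell off the corridor. The grid representation below is hypothetical, not the authors' implementation:

```python
from collections import deque

def evacuation_path(grid, start, corridor):
    """grid[r][c] is True when traversable; corridor is a set of (r, c)
    cells reserved for the main agent. Returns a path off the corridor."""
    queue, seen = deque([(start, [start])]), {start}
    while queue:
        cell, path = queue.popleft()
        if cell not in corridor and cell != start:
            return path
        r, c = cell
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (nb not in seen and 0 <= nb[0] < len(grid)
                    and 0 <= nb[1] < len(grid[0]) and grid[nb[0]][nb[1]]):
                seen.add(nb)
                queue.append((nb, path + [nb]))
    return None  # no free cell off the corridor is reachable
```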
|
|
09:00-10:00, Paper WePI2T11.11 | |
Learning to Imitate Spatial Organization in Multi-Robot Systems |
|
Agunloye, Ayomide Oluwaseyi | University of Southampton |
Ramchurn, Sarvapali | University of Southampton |
Soorati, Mohammad D. | University of Southampton |
Keywords: Multi-Robot Systems, Imitation Learning, Swarm Robotics
Abstract: Understanding collective behavior and how it evolves is important to ensure that robot swarms can be trusted in a shared environment. One way to understand the behavior of the swarm is through collective behavior reconstruction using prior demonstrations. Existing approaches often require access to the swarm controller which may not be available. We reconstruct collective behaviors in distinct swarm scenarios involving shared environments without using swarm controller information. We achieve this by transforming prior demonstrations into features that describe multi-agent interactions before behavior reconstruction with multi-agent generative adversarial imitation learning (MA-GAIL). We show that our approach outperforms existing algorithms in spatial organization, and can be used to observe and reconstruct a swarm's behavior for further analysis and testing, which might be impractical or undesirable on the original robot swarm.
|
|
09:00-10:00, Paper WePI2T11.12 | |
D-MARL: A Dynamic Communication-Based Action Space Enhancement for Multi Agent Reinforcement Learning Exploration of Large Scale Unknown Environments |
|
Calzolari, Gabriele | Luleå Tekniska Universitet |
Sumathy, Vidya | Luleå University of Technology |
Kanellakis, Christoforos | LTU |
Nikolakopoulos, George | Luleå University of Technology |
Keywords: Multi-Robot Systems, Reinforcement Learning, Cooperating Robots
Abstract: In this article, we propose a novel communication-based action space enhancement for the D-MARL exploration algorithm to improve the efficiency of mapping an unknown environment, represented by an occupancy grid map. In general, communication between autonomous systems is crucial when exploring large and unstructured environments. In such real-world scenarios, data transmission is limited and relies heavily on inter-agent proximity and the attributes of the autonomous platforms. In the proposed approach, each agent's policy is optimized by utilizing the heterogeneous-agent proximal policy optimization algorithm to autonomously choose whether to communicate or explore the environment. To accomplish this, multiple novel reward functions are formulated by integrating inter-agent communication and exploration. The investigated approach aims to increase efficiency and robustness in the mapping process, minimize exploration overlap, and prevent agent collisions. The D-MARL policies trained on different reward functions have been compared to understand the effect of different reward terms on the collaborative attitude of the homogeneous agents. Finally, multiple simulation results are provided to prove the efficacy of the proposed scheme.
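As an illustration of how such reward terms can be combined (the weights and term names here are hypothetical, not the paper's formulations):

```python
def step_reward(new_cells, communicated, collided,
                w_explore=1.0, w_comm=0.5, w_collide=5.0):
    """Hypothetical per-step reward mixing exploration gain, a bonus
    for successful inter-agent communication, and a collision penalty."""
    return (w_explore * new_cells
            + w_comm * float(communicated)
            - w_collide * float(collided))
```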
|
|
09:00-10:00, Paper WePI2T11.13 | |
Opinion-Based Strategy for Distributed Multi-Robot Task Allocation in Swarms of Robots |
|
Zhang, Ziqiao | Georgia Institute of Technology |
Chen, Shengkang | Georgia Tech |
Mayberry, Scott | Georgia Institute of Technology |
Zhang, Fumin | Georgia Institute of Technology |
Keywords: Multi-Robot Systems, Task Planning, Swarm Robotics
Abstract: Opinions of individuals in large groups evolve through interactions with neighbors and the environment, which can be modeled with opinion dynamics. In this paper, we propose a distributed opinion-based strategy for large-scale multi-robot task allocation utilizing the convergence behaviors of opinion dynamics. The strategy relies on the specialized opinion dynamics on the unit sphere for robot task selection. We investigate the convergence behaviors of opinion dynamics in the context of regions of attraction. Simulation results with a swarm of 200 homogeneous robots validate the effectiveness of our proposed strategy.
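For background, a common continuous-time form of opinion dynamics on the unit sphere (the paper's specialized dynamics may differ) projects the neighbor influence onto the tangent space, so each opinion x_i remains unit-norm:

```latex
\dot{x}_i = \left( I - x_i x_i^{\top} \right) \sum_{j \in \mathcal{N}_i} a_{ij}\, x_j,
\qquad \|x_i\| = 1,
```

where a_{ij} are interaction weights and N_i is the neighbor set of robot i.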
|
|
09:00-10:00, Paper WePI2T11.14 | |
Robust and Safe Task-Driven Planning and Navigation for Heterogeneous Multi-Robot Teams with Uncertain Dynamics |
|
Pan, Tianyang | Rice University |
Verginis, Christos | Uppsala University |
Kavraki, Lydia | Rice University |
Keywords: Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents, Robust/Adaptive Control
Abstract: Task and motion planning (TAMP) can enhance intelligent multi-robot coordination, but it becomes significantly more complicated in obstacle-cluttered environments and in the presence of robot dynamic uncertainties. We propose a control framework that solves the motion-planning problem for multi-robot teams with uncertain dynamics, addressing a key component of the TAMP pipeline. The principal part of the proposed algorithm is a decentralized feedback control policy for tracking reference paths while avoiding collisions and adapting in real time to the underlying dynamic uncertainties. The proposed framework further leverages sampling-based motion planners to free the robots from local-minimum configurations. Extensive experimental results in complex, realistic environments illustrate the superior efficiency of the proposed approach, in terms of planning time and the number of encountered local minima, with respect to state-of-the-art baseline methods.
|
|
09:00-10:00, Paper WePI2T11.15 | |
Communication-Constrained Multi-Robot Exploration with Intermittent Rendezvous |
|
Ribeiro da Silva, Alysson | Universidade Federal De Minas Gerais |
Chaimowicz, Luiz | Federal University of Minas Gerais |
Costa Silva, Thales | University of Pennsylvania |
Hsieh, M. Ani | University of Pennsylvania |
Keywords: Multi-Robot Systems, Distributed Robot Systems, Cooperating Robots
Abstract: Communication constraints can significantly impact robots' ability to share information, coordinate their movements, and synchronize their actions, thus limiting coordination in Multi-Robot Exploration (MRE) applications. In this work, we address these challenges by modeling the MRE application as a DEC-POMDP and designing a joint policy that follows a rendezvous plan. This policy allows robots to explore unknown environments while intermittently sharing maps opportunistically or at rendezvous locations without being constrained by joint path optimizations. To generate the rendezvous plan, robots represent the MRE task as an instance of the Job Shop Scheduling Problem (JSSP) and minimize JSSP metrics. They aim to reduce waiting times and increase connectivity, which correlates with the DEC-POMDP rewards and the time to complete the task. Our simulation results suggest that our method is more efficient than using relays or maintaining intermittent communication with a base station, making it a suitable approach for Multi-Robot Exploration. We developed a proof-of-concept using the Robot Operating System (ROS) that is available at: https://github.com/multirobotplayground/ROS-Noetic-Multi-robot-Sandbox.
|
|
09:00-10:00, Paper WePI2T11.16 | |
Tree-Based Reconfiguration of Metamorphic Robots |
|
Ondika, Patrick | Faculty of Informatics Masaryk University |
Mrázek, Jan | Masaryk University |
Barnat, Jiri | Faculty of Informatics Masaryk University |
Keywords: Multi-Robot Systems, Cellular and Modular Robots, Cooperating Robots
Abstract: Metamorphic robots have gained traction since the start of the 21st century due to their ability to change shape and adapt to various tasks. In order to build versatile and robust metamorphic systems, we need to be able to find a reconfiguration plan efficiently. This paper presents a new approach to the reconfiguration problem of chain-type metamorphic robots. Our algorithm relies on forming tentacles and searching through a lower-dimensional space by solving smaller planning instances. As a result, we obtain a solution that is more scalable than optimal planners, while producing higher quality plans than previously introduced fast solutions.
|
|
09:00-10:00, Paper WePI2T11.17 | |
Multi-Robot Navigation among Movable Obstacles: Implicit Coordination to Deal with Conflicts and Deadlocks |
|
Renault, Benoit | INSA Lyon |
Saraydaryan, Jacques | Cpe Lyon |
Brown, David | Inria |
Simonin, Olivier | INSA De Lyon |
Keywords: Planning, Scheduling and Coordination, Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems
Abstract: How to find efficient paths for multiple robots in modifiable cluttered environments? This question leads us to the formulation of the new problem of Multi-Robot Navigation Among Movable Obstacles (MR-NAMO). In MR-NAMO, robots must not only plan for the possibility of displacing obstacles as needed to facilitate their navigation, but also solve conflicts that may arise when trying to simultaneously access a location or obstacle. As a first approach to this new problem, we introduce and compare variants of an implicit coordination strategy allowing the use of existing NAMO Algorithms in a Multi-Robot context. We also show how our previously introduced social occupation cost model can improve the efficiency of multi-robot plans with better obstacle placement choices, and how it can be applied in a novel way to find relevant robot placement choices in deadlock situations.
|
|
WePI2T12 |
Room 12 |
Mechanisms and Actuation |
Teaser Session |
Chair: Ikemoto, Shuhei | Kyushu Institute of Technology |
|
09:00-10:00, Paper WePI2T12.1 | |
Active Learning for Forward/Inverse Kinematics of Redundantly-Driven Flexible Tensegrity Manipulator |
|
Yoshimitsu, Yuhei | Kyushu Institute of Technology |
Osa, Takayuki | University of Tokyo |
Ben Amor, Heni | Arizona State University |
Ikemoto, Shuhei | Kyushu Institute of Technology |
Keywords: Flexible Robotics, Redundant Robots, Modeling, Control, and Learning for Soft Robots
Abstract: In flexible, redundantly-driven multi-DOF systems, as in living beings, representing the redundant kinematics, including the diversity of solutions, is crucial for leveraging their distinctive characteristics. This paper proposes an active learning framework for forward and inverse modeling of complex kinematics that improves the representations of the control, task, and null spaces. It consists of a VAE-type network that internally holds these representations and an algorithm for selecting new data using the cross-entropy method. The validity of the proposed system was verified using a tensegrity manipulator driven by 40 pneumatic cylinders. The results confirmed that active learning contributed to covering the entire range of motion and to a well-organized representation of the null space.
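For reference, the cross-entropy method used for data selection is a generic sampling-based optimizer: sample candidates from a Gaussian, keep the elite fraction, and refit. A minimal sketch with placeholder hyperparameters and a user-supplied score function:

```python
import numpy as np

def cross_entropy_method(score, dim, iters=20, pop=64, n_elite=8):
    """Maximize `score` over R^dim by iteratively refitting a Gaussian
    to the elite samples (illustrative hyperparameters)."""
    mu, sigma = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        samples = mu + sigma * np.random.randn(pop, dim)
        elite = samples[np.argsort([score(s) for s in samples])[-n_elite:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu
```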
|
|
09:00-10:00, Paper WePI2T12.2 | |
Design of a Variable Wheel-Propeller Integrated Mechanism for Amphibious Robots |
|
Lu, Liang | Huazhong University of Science and Technology |
Gao, Xiangquan | Huazhong University of Science and Technology |
Xiang, Ming | Huazhong University of Science and Technology |
Yan, Zefeng | Huazhong University of Science & Technology |
Han, Bin | Huazhong University of Science and Technology |
Keywords: Wheeled Robots, Field Robots
Abstract: To address the high complexity and low efficiency of amphibious propulsion systems, this paper proposes a novel variable wheel-propeller integrated mechanism for amphibious robots. By adjusting the blade pitch angle, it enables multiple motion modes, including rapid and stable movement on flat ground, obstacle crossing, and omnidirectional movement on the water surface. This study establishes a kinematic model for the propeller blades and conducts multi-objective optimization of the structural parameters by considering both land obstacle-crossing performance and underwater propulsion performance. Based on the optimized structural parameters, a virtual simulation prototype is constructed. Simulation results indicate that during water-surface movement, with a driving torque of 3 N·m, the robot achieves a maximum linear velocity of 1.25 m/s and a maximum self-rotation angular velocity of 3.5 rad/s. Moreover, varying the blade pitch angle can alter the thrust direction, enabling omnidirectional mobility on the water surface. During land movement, at a rotation speed of 60 rpm, the maximum obstacle-crossing height is 184 mm. This wheel-propeller integrated mechanism exhibits robust overall motion performance and environmental adaptability, with convenient switching between motion modes.
|
|
09:00-10:00, Paper WePI2T12.3 | |
Static Modeling of the Stiffness and Contact Forces of Rolling Element Eccentric Drives for Use in Robotic Drive Systems |
|
Fritsch, Simon | Technical University of Munich |
Landler, Stefan | Technical University of Munich |
Otto, Michael | Technical University of Munich, Chair of Machine Elements, Gear |
Vogel-Heuser, Birgit | Technical University Munich |
Zimmermann, Markus | Technical University of Munich |
Stahl, Karsten | Technical University of Munich |
Keywords: Actuation and Joint Mechanisms, Methods and Tools for Robot System Design, Mechanism Design
Abstract: Rolling element eccentric drives promise to be an easy-to-manufacture and performant gear system for robotic actuators. They share characteristics with other eccentric drives, such as strain wave and cycloidal drives, but use rolling elements instead of an eccentric gear. They offer reduced manufacturing complexity and costs by using readily available standard parts. Little research into rolling element eccentric drives is available, and their characteristics are still underexplored. This work uses a contact-based model to investigate the previously unknown stiffness of rolling element eccentric drives. Such calculation methods are well established for structurally similar components, such as cycloidal drives and roller bearings, and provide a high-level and computationally efficient model. Good stiffness models are critical for accurately predicting robotic actuator behavior and enabling better control of robotic systems. Additionally, the proposed model is used to calculate the contact forces under load occurring in rolling element eccentric drives. Contact forces are critical to calculating a drive’s load capacity, lifetime, and efficiency and serve as the foundation for further research. The mathematical description of the proposed model is derived, and the stiffness of a representative rolling element eccentric drive is calculated. Different manufacturing techniques, characterized by tolerance levels and material choices, are compared. Irrespective of manufacturing precision, similar stiffness curves result for drives made of steel, but higher contact forces result from less precise manufacturing. The stiffness of drives made from 3D printed plastic is considerably lower than that of drives made from steel. Additionally, the stiffness of rolling element eccentric drives is compared to similar eccentric drives, and a comparable twist-over-torque curve is shown.
|
|
09:00-10:00, Paper WePI2T12.4 | |
Energy Minimization Using Custom-Designed Magnetic-Spring Actuators |
|
Fu, Yue Yang | Vanderbilt University |
Kilic, Ali Umut | Vanderbilt University |
Braun, David | Vanderbilt University |
Keywords: Actuation and Joint Mechanisms, Compliant Joints and Mechanisms
Abstract: This study introduces an innovative actuator that resembles a motor with a non-uniform permanent magnetic field. We have developed a prototype of the actuator by combining a standard motor, characterized by a uniform magnetic field, with a custom rotary magnetic spring exhibiting a non-uniform magnetic field. We have also presented a systematic computational approach to customize the magnetic field to minimize the energy consumption of the actuator when used for a user-defined oscillatory task. Experiments demonstrate that this optimized actuator significantly lowers energy consumption in a typical oscillatory task, such as pick-and-place or oscillatory limb motion during locomotion, compared to conventional motors. Our findings imply that incorporating task-optimized non-uniform permanent magnetic fields into conventional motors and direct-drive actuators could enhance the energy efficiency of robotic systems.
|
|
09:00-10:00, Paper WePI2T12.5 | |
Novel Multiport Output Twisted String Actuator with Self-Differential Mechanism: Hand Glove Application |
|
Wei, Dunwen | University of Electronic Science and Technology of China |
Cui, Chenguang | University of Electronic Science and Technology of China |
Yu, Haitao | University of Electronic Science and Technology of China |
Gao, Tao | University of Electronic Science and Technology of China |
Li, Chao | Sichuan Cancer Center |
Hussain, Sajjad | University of Naples Federico II |
Ficuciello, Fanny | Università Di Napoli Federico II |
Keywords: Actuation and Joint Mechanisms, Tendon/Wire Mechanism, Prosthetics and Exoskeletons
Abstract: A differential mechanism can reduce the number of actuators and efficiently distribute force or power. We propose a novel multiport output twisted string actuator (MO-TSA) with a self-differential mechanism that employs a single actuator to achieve multiport outputs. The differential MO-TSA is adaptively controlled in accordance with the force differences at each output port, thus replacing traditional differential gears and whiffletree mechanisms. Inspired by the hand muscles, we designed a hand glove using the MO-TSA, aiming to enhance the range of achievable grasp configurations. The hand glove is capable of performing various grasps with a single actuator, resulting in a lighter and simpler hand design and offering a streamlined solution for versatile actuation with twisted string actuators (TSAs).
|
|
09:00-10:00, Paper WePI2T12.6 | |
Torque Ripple Reduction in Quasi-Direct Drive Motors through Angle-Based Repetitive Learning Observer and Model Predictive Torque Controller |
|
Zhang, Hefei | University of Science and Technology of China |
Zhang, Xiaohu | University of Science and Technology of China |
Cheng, Jinyu | University of Science and Technology of China |
Hu, Jiangtao | University of Science and Technology of China |
Ji, Chao | University of Science and Technology of China |
Wang, Yu | Harbin Institute of Technology, Shenzhen |
Jiang, Yutong | China North Vehicle Research Institute |
Han, Zhen | China North Vehicle Research Institute |
Gao, Wei | University of Science and Technology of China |
Zhang, Shiwu | University of Science and Technology of China |
Keywords: Actuation and Joint Mechanisms, Force Control, Legged Robots
Abstract: Torque ripple reduction in quasi-direct drive (QDD) motors is crucial in their robotic applications for dynamic locomotion and dexterous manipulation. In this paper, we present a novel approach for reducing torque ripples of QDD motors, which integrates an angle-based repetitive learning observer (ARLO) and a model predictive control-based field-oriented controller (MPC-FOC). The proposed method successfully improves the torque loop control bandwidth and surpasses conventional proportional-integral (PI) controllers owing to the integrated physical constraints inside MPC. Additionally, the ARLO portion is able to mitigate ripple caused by the inherent cogging torque in brushless motors and also the periodic friction torque from the planetary gearboxes in QDD systems. The effectiveness of the proposed method is demonstrated through both simulation of a single QDD motor and experiments on a two-degree-of-freedom robotic leg, where the performance improvement can be 72.7% in speed tracking and 58.5% in trajectory tracking. The proposed method shows great potential in facilitating smooth motion and precise force control in future robotic applications.
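The essence of an angle-based repetitive learning observer is that cogging and gear-mesh disturbances are periodic in rotor angle θ rather than in time, so the disturbance estimate can be updated once per mechanical period (notation ours, not the paper's):

```latex
\hat{\tau}_d^{(k+1)}(\theta) = \hat{\tau}_d^{(k)}(\theta) + \Gamma\, e^{(k)}(\theta),
\qquad \hat{\tau}_d^{(k)}(\theta + 2\pi) = \hat{\tau}_d^{(k)}(\theta),
```

where e^{(k)}(θ) is the tracking error observed at angle θ during the k-th period and Γ is a learning gain.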
|
|
09:00-10:00, Paper WePI2T12.7 | |
Development of a Mobile Reconfigurable Mecanum Robot with a Locking Device of Rollers |
|
Zakharov, Dmitrii | ITMO University |
Iaremenko, Andrei | ITMO |
Kurovskii, Denis | ITMO University |
Kurovskii, Artem | ITMO University |
Borisov, Oleg | ITMO University |
Zhang, Botao | Hangzhou Dianzi University |
Keywords: Wheeled Robots, Mechanism Design, Service Robotics
Abstract: This paper presents the design and analysis of an omnidirectional reconfigurable wheeled robot capable of switching between omnidirectional and conventional wheeled modes. We have developed a new pneumatic roller-locking mechanism for the mecanum wheel. In the mecanum mode, the robot can perform holonomic movements; in the wheeled-platform mode, it can overcome inclined surfaces and move more energy-efficiently. In addition, the locking device allows the robot to brake faster than other mecanum robots. Unlike other works describing reconfigurable mecanum-wheel structures, this work offers a new design characterized by the simplicity of the mechanism that does not require active reconfiguration elements to be located inside the wheel itself. The paper describes the design concept and presents the roller-locking mechanism. The study evaluates the use of the developed robot in various scenarios, including movement on an inclined surface and sudden braking on flat and inclined surfaces, and also analyzes the energy efficiency of the resulting solution for several operating scenarios. The experiments confirm that this mobile platform, when switching modes, is able to move on surfaces with a large angle of inclination and to decelerate more effectively on both flat and inclined surfaces.
|
|
09:00-10:00, Paper WePI2T12.8 | |
Parametric Synthesis of Compliant Joints for Impact Robust Shaftless Leg Mechanisms |
|
Rakshin, Egor | ITMO University |
Ogureckiy, Dmitriy | ITMO University |
Borisov, Ivan | ITMO University |
Kolyubin, Sergey | ITMO University |
Keywords: Compliant Joints and Mechanisms, Mechanism Design, Legged Robots
Abstract: This paper describes a novel parametric optimization procedure for three flexure cross hinges (TFCH) integrated into multi-link leg mechanisms with closed-loop kinematics. Despite advantages such as compliance, no need for joint lubrication, light weight and cost-efficiency, such shaftless mechanisms have not been widely used, especially in the field of dynamic locomotion, also because their design is challenging and barely studied. Using a morphological computation approach, we have optimized the TFCH geometry to achieve the desired joint stiffness using frequency analysis, ensuring safe and stable hopping under external perturbations. We combined rigid body dynamics with lumped stiffness model and finite element modeling using the SPACAR toolbox to simulate various designs within our optimization pipeline. To illustrate the efficiency of the resulting designs, we built a prototype and conducted a series of full-scale experiments with ramp jumps whose trajectories were recorded by a motion capture system. The experiments showed that TFCH can be effectively integrated into leg mechanisms, providing benefits such as impact robustness, energy recuperation, and the ability to work in extreme conditions.
|
|
09:00-10:00, Paper WePI2T12.9 | |
Trans-Rotor: An Active Omnidirectional Aerial-Ground Vehicle with Differential Gear Joint Transformation Mechanism |
|
Wu, Xuankang | Northeastern University |
Sun, Haoxiang | Northeastern University |
Xiao, Tong | Northeastern University |
Pan, Yanzhang | Northeastern University |
Fang, Zheng | Northeastern University |
Keywords: Mechanism Design, Aerial Systems: Mechanics and Control, Aerial Systems: Applications
Abstract: Air-ground vehicles have shown great potential in various fields due to their superior mobility and outstanding endurance. However, most morphing air-ground vehicles give little consideration to controllability and traversability in ground mode. We present a novel air-ground vehicle called Trans-Rotor. By proposing a differential gear joint, we equip Trans-Rotor with omnidirectional mobility in both air and ground modes. Besides, a four-wheel-steering model in ground mode provides better traversability and ground flexibility. With mid-mode transformation, Trans-Rotor can switch modes smoothly and rapidly. In this work, we first propose a novel design of an air-ground vehicle with a differential gear joint transformation mechanism. Furthermore, to achieve autonomous navigation, we propose the vehicle's decoupled controller based on the four-wheel-steering model. Meanwhile, comprehensive experiments and a benchmark comparison are carried out to validate the system's performance, showing ground flexibility and energy savings of more than 95%.
|
|
09:00-10:00, Paper WePI2T12.10 | |
SNU-Avatar Haptic Glove: Novel Modularized Haptic Glove Via Trigonometric Series Elastic Actuators |
|
Sung, Eunho | Seoul National University |
You, Seungbin | Seoul National University |
Moon, Seongkyeong | Seoul National University |
Kim, Juhyun | Seoul National University |
Park, Jaeheung | Seoul National University |
Keywords: Mechanism Design, Haptics and Haptic Interfaces
Abstract: An avatar robot is a robot capable of realistic remote operation. In remote operation, the controllability of the glove, which manipulates the hand interacting directly with the environment at the remote site, is an important factor. The glove must accurately estimate hand posture and provide haptic feedback to convey information about the remote environment and enhance operability, while minimizing user discomfort throughout the process. To achieve this goal, we propose providing force feedback to the fingers using Trigonometric Series Elastic Actuators; the devices are attached to the middle phalanx to facilitate the easy installation of additional add-ons and to ensure users feel securely fixed when attached. Additionally, we propose an algorithm to estimate the fingertip position without attaching a device directly to the fingertip, allowing us to estimate hand posture and provide appropriate force when necessary. Furthermore, several add-ons can be attached to the fingertips to enable roughness and temperature feedback. Finally, we used this system to participate in the ANA Avatar XPRIZE and performed eight missions, including not only remote manipulation of objects but also social interactions, demonstrating its effectiveness.
|
|
09:00-10:00, Paper WePI2T12.11 | |
Versatile Variable-Stiffness Scooping End-Effector: Tilting-Scooping-Transfer Mechanism for Objects with Various Properties |
|
Takahashi, Yuta | Tohoku University |
Tadakuma, Kenjiro | Osaka University |
Abe, Kazuki | Osaka University |
Watanabe, Masahiro | Osaka University |
Shimizu, Shoya | Tohoku University |
Tadokoro, Satoshi | Tohoku University |
Keywords: Mechanism Design, Grippers and Other End-Effectors
Abstract: To address the setup and changeover time issues in high-mix, low-volume production systems, we developed an end-effector capable of uniformly scooping, holding, and transporting a wide variety of objects, and demonstrated this system with a prototype. Our experiments showed that the prototype successfully scooped up and transported a wide variety of objects and could be applied to high-mix, low-volume production systems. In addition, load testing of the spatula and modeling of objects that can be tilted backwards provided insight into further improving the scooping performance of this mechanism.
|
|
09:00-10:00, Paper WePI2T12.12 | |
Enhanced Omni-Ball: Spherical Omnidirectional Wheel Achieving Passive Rollers with High Load Capacity and Smoothness through an Offset Rotational Axis |
|
Tadakuma, Kenjiro | Osaka University |
Sakiyama, Seiji | Tohoku University |
Takane, Eri | Tohoku University |
Tadakuma, Riichiro | Yamagata University |
Tadokoro, Satoshi | Tohoku University |
Keywords: Mechanism Design, Wheeled Robots
Abstract: This paper introduces an enhanced version of the Spherical Omnidirectional Wheel, designed to achieve omnidirectional driving motion. In previous models, the supporting shaft was placed at the center of the mechanism; however, achieving both smoothness and high load capacity in such designs proved challenging. The mechanism proposed in this study features an offset design, enabling outer support for the wheel. A prototype was developed and its basic motion was experimentally validated.
|
|
09:00-10:00, Paper WePI2T12.13 | |
Design and Control of a Novel Six-Degree-Of-Freedom Hybrid Robotic Arm |
|
Chen, Yang | Beijing Academy of Agriculture and Forestry Sciences |
Miao, Zhonghua | Shanghai University |
Ge, Yuanyue | Beijing Academy of Agriculture and Forestry Sciences |
Lin, Sen | Intelligent Equipment Research Center, Beijing Academy of Agricu |
Chen, Liping | Intelligent Equipment Research Center, Beijing Academy of Agricu |
Xiong, Ya | Beijing Academy of Agriculture and Forestry Sciences |
Keywords: Mechanism Design, Robotics and Automation in Agriculture and Forestry, Parallel Robots
Abstract: Robotic arms are key components in fruit-harvesting robots. In agricultural settings, conventional serial or parallel robotic arms often fall short in meeting the demands for a large workspace, rapid movement, enhanced obstacle avoidance, and affordability. This study proposes LingXtend, a novel hybrid six-degree-of-freedom (DoF) robotic arm that combines the advantages of parallel and serial mechanisms. Inspired by yoga, we designed two sliders capable of moving independently along a single rail, acting as two feet. These sliders are interconnected with linkages and a meshed-gear set, allowing the parallel mechanism to lower itself and perform a split to pass under obstacles. This unique feature allows the arm to avoid obstacles such as pipes, tables, and beams typically found in greenhouses. Integrated with serially mounted joints, the patented hybrid arm is able to maintain the end-effector's pose even when it moves with a mobile platform, facilitating fruit picking with the optimal pose in dynamic conditions. Moreover, the hybrid arm's workspace is substantially larger, almost three times the volume of the UR3 serial arm and fourteen times that of the ABB IRB parallel arm. Experiments show that the repeatability errors are 0.017 mm, 0.03 mm, and 0.109 mm for the two sliders and the arm's end-effector, respectively, providing sufficient precision for agricultural robots.
|
|
09:00-10:00, Paper WePI2T12.14 | |
DIABLO: A 6-DoF Wheeled Bipedal Robot Composed Entirely of Direct-Drive Joints |
|
Liu, Dingchuan | Sun Yat-Sen University |
Fangfang, Yang | Sun Yat-Sen University |
Liao, Xuanhong | Direct Drive Technology Ltd |
Lyu, Ximin | Sun Yat-Sen University |
Keywords: Wheeled Robots, Field Robots, Legged Robots
Abstract: Wheeled bipedal robots offer the advantages of both wheeled and legged robots, combining the ability to traverse a wide range of terrains and environments with high efficiency. However, the conventional approach in existing wheeled bipedal robots involves motor-driven joints with high-ratio gearboxes. While this approach provides specific benefits, it also presents several challenges, including increased mechanical complexity, efficiency losses, noise, vibrations, and higher maintenance and lubrication requirements. Addressing these concerns, we developed a direct-drive wheeled bipedal robot called DIABLO, which eliminates the use of gearboxes entirely. Our robotic system is simplified as a second-order inverted pendulum, and we have designed an LQR-based balance controller to ensure stability. Additionally, we implemented comprehensive motion controllers, including yaw, split-angle, height, and roll controllers. Through simulations and field experiments, we have demonstrated that our platform achieves satisfactory performance.
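As background for the balance controller, computing an LQR gain for a linearized model is a one-liner with SciPy; the model matrices would come from linearizing the robot's inverted-pendulum dynamics (illustrative, not DIABLO's actual model):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_gain(A, B, Q, R):
    """Continuous-time LQR: returns K such that u = -K @ x minimizes the
    quadratic cost for dx/dt = A @ x + B @ u."""
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)
```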
|
|
09:00-10:00, Paper WePI2T12.15 | |
Safe Imitation Learning of Nonlinear Model Predictive Control for Flexible Robots |
|
Mamedov, Shamil | KU Leuven |
Reiter, Rudolf | University of Freiburg |
Basiri Azad, Seyed Mahdi | University of Freiburg |
Viljoen, Ruan Matthys | KU Leuven |
Boedecker, Joschka | University of Freiburg |
Diehl, Moritz | Univ. of Heidelberg |
Swevers, Jan | KU Leuven |
Keywords: Flexible Robotics, Optimization and Optimal Control, Imitation Learning
Abstract: Flexible robots may overcome some of industry's major challenges, such as enabling intrinsically safe human-robot collaboration and achieving a higher payload-to-mass ratio. However, controlling flexible robots is complicated due to their complex dynamics, which include oscillatory behavior and a high-dimensional state space. Nonlinear model predictive control (NMPC) offers an effective means to control such robots, but its significant computational demand often limits its application in real-time scenarios. To enable fast control of flexible robots, we propose a framework for a safe approximation of NMPC using imitation learning and a predictive safety filter. Our framework significantly reduces computation time while incurring a slight loss in performance. Compared to NMPC, our framework shows more than an eightfold improvement in computation time when controlling a three-dimensional flexible robot arm in simulation, all while guaranteeing safety constraints. Notably, our approach outperforms state-of-the-art reinforcement learning methods. The development of fast and safe approximate NMPC holds the potential to accelerate the adoption of flexible robots in industry.
|
|
09:00-10:00, Paper WePI2T12.16 | |
Design and Modeling of a Thin-Walled Multi-Segment Continuum Robotic Bronchoscope |
|
Bian, Gui-Bin | Institute of Automation, Chinese Academy of Sciences |
Zhang, Ming-Yang | Institute of Automation, Chinese Academy of Sciences |
Ye, Qiang | Institute of Automation, Chinese Academy of Sciences |
Ren, Han | Institute of Automation, Chinese Academy of Sciences |
Zhai, Yu-Peng | School of Automation, Beijing Information Science and Technology |
Ma, Ruichen | Institute of Automation, Chinese Academy of Sciences |
Li, Zhen | Institute of Automation, Chinese Academy of Sciences |
Keywords: Flexible Robotics, Mechanism Design, Medical Robots and Systems
Abstract: Cable-driven continuum robots in bronchoscopic procedures hold immense potential to revolutionize the diagnosis and treatment of lung cancer. However, robotic bronchoscopes in current studies are typically large and inflexible. This article therefore introduces a novel cable-driven continuum robotic bronchoscopy system with a modular design separating the actuation and operation ends. A continuum structure with a dual-segment notched flexible skeleton, featuring a wall thickness of 0.45 mm, has been designed to perform bending movements exceeding 190°. This enhances flexibility and increases the spatial capacity of the working channels. A kinematic model was developed, integrating the actuation force and the mechanical characteristics of the driving cables for error compensation, to estimate the correlation between the displacement of the driving cables and the position of the continuum robot's end-effector. Verification showed that the root mean square error (RMSE) of the end-effector position is 2.57 mm, which accounts for 4.8% of the continuum's length. A prototype of the robotic bronchoscopy system was built, and its performance and potential applications in bronchoscopic intervention surgeries were validated through in vivo porcine intervention experiments.
|
|
WeAT1 |
Room 1 |
Best Safety, Security, and Rescue Robotics Papers (IRSI) |
Regular session |
Chair: Kyrki, Ville | Aalto University |
|
10:00-10:15, Paper WeAT1.1 | |
Automating ROS2 Security Policies Extraction through Static Analysis |
|
Zanatta, Giacomo | Ca' Foscari University of Venice |
Caiazza, Gianluca | Ca Foscari University of Venice |
Ferrara, Pietro | Ca' Foscari University of Venice |
Negrini, Luca | Ca' Foscari University of Venice |
White, Ruffin | University of California San Diego |
Keywords: Formal Methods in Robotics and Automation, Software, Middleware and Programming Environments, Robot Safety
Abstract: Cybersecurity in mission-critical robotic applications is a necessity for scaling deployments securely. ROS2 builds upon the DDS-Security specs in the ROS Client Library (RCL) to implement its security features. Utilizing SROS2, developers have access to a set of utilities to help set up security in a way RCL can use, easing security deployment. However, while access control is handled by DDS and consequently based on the SROS2-generated permission artifacts, the necessary authorization policies are manually generated by developers. This requires exercising the entire system via live extraction and, for each node, listing all the necessary Topics, Services, and Actions, which is a daunting and laborious process: developers first have to generate tests, then obtain a 'snapshot' of the system for each test, and finally collect these snapshots and group them into a policy with a minimum set of rules. This entire procedure is error-prone. This paper introduces LiSA4ROS2, a tool that automatically extracts the ROS2 computational graph via static analysis to derive a minimal, correct configuration for ROS2 security policies. Our approach relies on abstract interpretation theory to statically overapproximate all possible executions and extract a minimal and complete configuration per node. We evaluate our approach on minimal examples covering all the main communication patterns in ROS2 tutorials and on all publicly available real-world ROS2 Python systems extracted from GitHub. The results on the minimal examples show that LiSA4ROS2 precisely supports all the main communication patterns. The extensive evaluation underlines that our prototype implementation of the analysis in LiSA4ROS2 is already able to precisely analyze 66% of existing repositories, automatically producing detailed computational graphs and access policies. All the results of the analysis, as well as a Docker artifact to reproduce them, are publicly available.
|
|
10:15-10:30, Paper WeAT1.2 | |
Jointly Learning Cost and Constraints from Demonstrations for Safe Trajectory Generation |
|
Chaubey, Shivam | Aalto University |
Verdoja, Francesco | Aalto University |
Kyrki, Ville | Aalto University |
Keywords: Learning from Demonstration, Task and Motion Planning, Optimization and Optimal Control
Abstract: Learning from Demonstration allows robots to mimic human actions. However, these methods do not model the constraints that are crucial to ensure the safety of the learned skill. Moreover, even when explicitly modeling constraints, they rely on the assumption of a known cost function, which limits their practical usability for tasks with unknown costs. In this work we propose a two-step optimization process that estimates cost and constraints by decoupling the learning of cost functions from the identification of unknown constraints within the demonstrated trajectories. Initially, we identify the cost function by isolating the effect of constraints on parts of the demonstrations. Subsequently, a constraint learning method is used to identify the unknown constraints. Our approach is validated both on simulated trajectories and on a real robotic manipulation task. Our experiments show the impact that incorrect cost estimation has on the learned constraints and illustrate how the proposed method is able to infer unknown constraints, such as obstacles, from demonstrated trajectories without any initial knowledge of the cost.
|
|
10:30-10:45, Paper WeAT1.3 | |
Learned Regions of Attraction for Safe Motion Primitive Transitions |
|
Ubellacker, Wyatt | California Institute of Technology |
Ames, Aaron | California Institute of Technology |
Keywords: Legged Robots, Machine Learning for Robot Control, Dynamics
Abstract: Estimating regions of attraction (ROAs) of dynamical systems is critical for understanding the operational bounds within which a system will converge to a desired state. In this paper, we introduce a neural network-based approach to approximating ROAs that leverages labeled data generated by offline sampling and simulation of initial conditions, with labels determined by flow membership in an “explicit region of attraction.” This framework is designed to estimate ROAs with a level of precision suitable for integration into a motion primitive transition framework as conditions to switch between candidate primitive behaviors. To account for gaps between the simulated environment and the real world, online learning is employed; this refines the offline-learned model of the ROA based on observed discrepancies between predicted and actual system behaviors. We validate this methodology on a quadrupedal robot, demonstrating that our ROA estimates can effectively model regions of attraction for a high-dimensional system. We show this for multiple primitive behaviors and in environments different from the training data. The outcomes highlight the usefulness of our method in estimating regions of attraction and informing transition conditions between primitive behaviors.
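A minimal version of the offline stage — training a classifier on sampled initial conditions labeled by whether their simulated flows reach the explicit region of attraction — might look like the sketch below; the architecture and state dimension are assumptions:

```python
import torch
import torch.nn as nn

STATE_DIM = 12  # hypothetical state dimension

roa_net = nn.Sequential(
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),  # logit of "this initial condition converges"
)
optimizer = torch.optim.Adam(roa_net.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

def train_step(states, labels):
    """states: (batch, STATE_DIM); labels: (batch, 1) in {0, 1},
    produced by offline simulation of each initial condition."""
    optimizer.zero_grad()
    loss = loss_fn(roa_net(states), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```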
|
|
10:45-11:00, Paper WeAT1.4 | |
Embodied AI with Two Arms: Zero-Shot Learning, Safety and Modularity |
|
Varley, Jacob | Google |
Singh, Sumeet | Google |
Jain, Deepali | Robotics at Google |
Choromanski, Krzysztof | Google DeepMind Robotics |
Zeng, Andy | Google DeepMind |
Basu Roy Chowdhury, Somnath | UNC Chapel Hill |
Dubey, Avinava | Google |
Sindhwani, Vikas | Google Brain, NYC |
Keywords: Bimanual Manipulation, AI-Enabled Robotics, Robot Safety
Abstract: We present an embodied AI system which receives open-ended natural language instructions from a human, and controls two arms to collaboratively accomplish potentially long-horizon tasks over a large workspace. Our system is modular: it deploys state of the art Large Language Models for task planning, Vision-Language models for semantic perception, and Point Cloud transformers for grasping. With semantic and physical safety in mind, these modules are interfaced with a real-time trajectory optimizer and a compliant tracking controller to enable human-robot proximity. We demonstrate performance for the following tasks: bi-arm sorting, bottle opening, and trash disposal tasks. These are done zero-shot where the models used have not been trained with any real world data from this bi-arm robot, scenes or workspace. Composing both learning- and non-learning-based components in a modular fashion with interpretable inputs and outputs allows the user to easily debug points of failures and fragilities. One may also in-place swap modules to improve the robustness of the overall platform, for instance with imitation-learned policies.
|
|
WeAT2 |
Room 2 |
Best Mobile Manipulation Papers (OMRON SINIC X Corp.) |
Regular session |
Chair: Harada, Kensuke | Osaka University |
|
10:00-10:15, Paper WeAT2.1 | |
Harmonic Mobile Manipulation |
|
Yang, Ruihan | UC San Diego |
Kim, Yejin | Allen Institute for AI |
Hendrix, Rose | Allen Institute for AI |
Kembhavi, Aniruddha | Allen Institute for AI |
Wang, Xiaolong | UC San Diego |
Ehsani, Kiana | Allen Institute for Artificial Intelligence |
Keywords: Mobile Manipulation, Visual Learning, Reinforcement Learning
Abstract: Recent advancements in robotics have enabled robots to navigate complex scenes or manipulate diverse objects independently. However, robots still fall short in many household tasks requiring coordinated behaviors, such as opening doors. The factorization of navigation and manipulation, while effective for some tasks, fails in scenarios requiring coordinated actions. To address this challenge, we introduce HarmonicMM, an end-to-end learning method that optimizes both navigation and manipulation, showing notable improvement over existing techniques in everyday tasks. This approach is validated in simulated and real-world environments and adapts to novel unseen settings without additional tuning. Our contributions include a new benchmark for mobile manipulation and a successful deployment with only RGB visual observation in a real, unseen apartment, demonstrating the potential for practical indoor robot deployment in daily life.
|
|
10:15-10:30, Paper WeAT2.2 | |
BaSeNet: A Learning-Based Mobile Manipulator Base Pose Sequence Planning for Pickup Tasks |
|
Naik, Lakshadeep | University of Southern Denmark (SDU) |
Kalkan, Sinan | Middle East Technical University |
Sørensen, Sune Lundø | University of Southern Denmark |
Mikkel, Kjærgaard | University of Southern Denmark |
Krüger, Norbert | University of Southern Denmark |
Keywords: Mobile Manipulation, Reinforcement Learning
Abstract: In many applications, a mobile manipulator robot is required to grasp a set of objects distributed in space. This may not be feasible from a single base pose, so the robot must plan the sequence of base poses for grasping all objects, minimizing the total navigation and grasping time. This is a Combinatorial Optimization problem that can be solved using exact methods, which provide optimal solutions but are computationally expensive, or approximate methods, which offer computationally efficient but sub-optimal solutions. Recent studies have shown that learning-based methods can solve Combinatorial Optimization problems, providing near-optimal and computationally efficient solutions. In this work, we present BaSeNet - a learning-based approach to plan the sequence of base poses for the robot to grasp all the objects in the scene. We propose a Reinforcement Learning based solution that uses Layered Learning to learn the base poses for grasping individual objects and the sequence in which the objects should be grasped, minimizing the total navigation and grasping costs. As the problem has a varying number of states and actions, we represent states and actions as a graph and use Graph Neural Networks for learning. We show that the proposed method can produce solutions comparable to exact and approximate methods with significantly less computation time. The code, Reinforcement Learning environments, and pre-trained models will be made available on the project webpage.
|
|
10:30-10:45, Paper WeAT2.3 | |
MAkEable: Memory-Centered and Affordance-Based Task Execution Framework for Transferable Mobile Manipulation Skills |
|
Pohl, Christoph | Karlsruhe Institute of Technology (KIT) |
Reister, Fabian | Karlsruhe Institute of Technology (KIT) |
Peller-Konrad, Fabian | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Mobile Manipulation, Cognitive Control Architectures, Software Architecture for Robotic and Automation
Abstract: To perform versatile mobile manipulation tasks in human-centered environments, the ability to efficiently transfer learned skills, knowledge, and experiences from one robot to another or across different environments is critical. In this paper, we present MAkEable, a versatile uni- and multi-manual mobile manipulation framework that facilitates the transfer of capabilities and knowledge across different tasks, environments, and robots. Our framework integrates an affordance-based task description into the memory-centric cognitive architecture of the ARMAR humanoid robot family, which supports the sharing of experiences and demonstrations for transferring mobile manipulation skills. By representing mobile manipulation actions through affordances, i.e., interaction possibilities of the robot with its environment, we provide a unifying framework for the autonomous uni- and multi-manual manipulation of known and unknown objects in various environments. We demonstrate MAkEable’s applicability in real-world experiments for multiple robots, tasks, and environments. This includes grasping known and unknown objects, object placing, bimanual object grasping, memory-enabled skill transfer in a drawer opening scenario across two different humanoid robots, and a pouring task learned from human demonstration. Code is available through our project page https://h2t-projects.webarchiv.kit.edu/software/MAkEable.
|
|
10:45-11:00, Paper WeAT2.4 | |
A Novel Variable Stiffness Suspension System for Improved Stability and Control of Tactile Mobile Manipulators |
|
Kuhn, Sebastian | Technical University of Munich |
Yildirim, Mehmet Can | Technical University of Munich |
Pozo Fortunić, Edmundo | Technical University of Munich |
Karacan, Kübra | Technical University of Munich |
Swikir, Abdalla | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Keywords: Compliant Joints and Mechanisms, Actuation and Joint Mechanisms, Wheeled Robots
Abstract: Mobile manipulators (MM) have proven valuable in assisting humans in industrial settings. However, their strict separation from humans in controlled environments limits their effectiveness. Efforts have been made to bridge this gap for physical human-robot interaction (pHRI), leading to the development of collaborative mobile manipulators (CMM). Nonetheless, unpredictable environments continue to present challenges. This paper introduces an innovative suspension design for mobile bases (MBs) to enhance the safety and autonomy of CMMs. We propose an electromechanical approach leveraging variable stiffness and combining passive springs with adaptive transmission mechanisms. Through simulation, physical prototype development, and experimental validation, we demonstrate the effectiveness of our approach in stabilizing the MB against external disturbances. Our findings provide valuable insights for the development of CMMs in dynamic environments.
|
|
WeAT3 |
Room 3 |
Manipulation and Grasping I |
Regular session |
Chair: D'Avella, Salvatore | Sant'Anna School of Advanced Studies |
Co-Chair: Khorrami, Farshad | New York University Tandon School of Engineering |
|
10:00-10:15, Paper WeAT3.1 | |
A Novel Dual-Robot Accurate Calibration Method Using Convex Optimization and Lie Derivative (I) |
|
Jiang, Cheng | Huazhong University of Science and Technology |
Li, Wen-long | Huazhong University of Science and Technology |
Li, Wen-pan | The Chinese University of Hong Kong |
Wang, Dongfang | Huazhong University of Science and Technology |
Zhu, Lijun | Huazhong University of Science and Technology |
Xu, Wei | Huazhong University of Science & Technology |
Zhao, Huan | Huazhong University of Science and Technology |
Ding, Han | Huazhong University of Science and Technology |
Keywords: Dual Arm Manipulation, Calibration and Identification, Convex Optimization and Lie Derivative, Industrial Robots
Abstract: Calibrating unknown transformation relationships is an essential task for multi-robot cooperative systems. Traditional linear methods are inadequate for decoupling and simultaneously solving the unknown matrices due to their coupling. This paper proposes a novel dual-robot accurate calibration method that uses convex optimization and the Lie derivative to solve the dual-robot calibration problem simultaneously. The key idea is that a convex optimization model based on the dual-robot transformation chain is established using the Lie representation of SE(3). The Jacobian matrix of the established optimization model is explicitly derived using the corresponding Lie derivative of SE(3). To balance the influence of the magnitudes of the rotational and translational variables, a weight coefficient is defined. Owing to the closure and smoothness of the Lie group, the optimization model can be solved simultaneously using Newton-like iterative methods without orthogonalization processing. The performance of the proposed method is verified through simulation and actual calibration experiments. The results show that the proposed method outperforms previous calibration methods in terms of accuracy.
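For context, dual-robot calibration is commonly cast as solving AX = YB over SE(3); an optimization with residuals expressed in the Lie algebra, matching the structure described above (notation ours), reads:

```latex
\min_{X,\,Y \in SE(3)} \; \sum_{i} \left\| \log\!\left( (A_i X)^{-1} (Y B_i) \right)^{\vee} \right\|_W^2,
```

where A_i and B_i are measured pose pairs, log(·)^∨ maps an SE(3) element to its 6-vector in se(3), and the weight W balances the rotational and translational components.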
|
|
10:15-10:30, Paper WeAT3.2 | |
Grasp Multiple Objects with One Hand |
|
Li, Yuyang | Tsinghua University |
Liu, Bo | National University of Singapore |
Geng, Yiran | Peking University |
Li, Puhao | Tsinghua University |
Yang, Yaodong | Peking University |
Zhu, Yixin | Peking University |
Liu, Tengyu | Beijing Institute for General Artificial Intelligence |
Huang, Siyuan | Beijing Institute for General Artificial Intelligence |
Keywords: Grasping, Data Sets for Robot Learning
Abstract: The intricate kinematics of the human hand enable simultaneous grasping and manipulation of multiple objects, essential for tasks such as object transfer and in-hand manipulation. Despite its significance, the domain of robotic multi-object grasping is relatively unexplored and presents notable challenges in kinematics, dynamics, and object configurations. This paper introduces MultiGrasp, a novel two-stage approach for multi-object grasping using a dexterous multi-fingered robotic hand on a tabletop. The process consists of (i) generating pre-grasp proposals and (ii) executing the grasp and lifting the objects. Our experimental focus is primarily on dual-object grasping, achieving a success rate of 44.13%, highlighting adaptability to new object configurations and tolerance for imprecise grasps. Additionally, the framework demonstrates the potential for grasping more than two objects at the cost of inference speed.
|
|
10:30-10:45, Paper WeAT3.3 | |
One-Finger Manipulation of 3D Objects by Planning Start-To-Push Points and Pushing Forces |
|
Xiao, Mubang | National University of Defense Technology
Ding, Ye | Shanghai Jiao Tong University |
Fan, Shixun | National University of Defense Technology |
Keywords: Manipulation Planning, Dexterous Manipulation
Abstract: This paper presents a one-finger nonprehensile manipulation framework for steering a 3D object to the desired pose. The proposed manipulation framework consists of the periodic searching and pushing modes for the robotic finger. In the searching mode, the optimal start-to-push point on the object is planned based on the defined manipulability indices that comprehensively reflect the continuous pushing distance, the contact force transmissibility, and the pushing demand. In the pushing mode, the pushing force is planned to accelerate the object toward the desired pose. During the pushing process, the equivalent inertia and the pushing force offset at the contact point are estimated online. Both simulation and experimental results show that different 3D objects on a frictional table can be driven to the desired pose by using a torque-driven robotic finger when the finger's base frame is appropriately selected.
|
|
10:45-11:00, Paper WeAT3.4 | |
Enabling Grasp Synthesis Approaches to Task-Oriented Grasping Considering the End-State Comfort and Confidence Effects |
|
Maranci, Emilio | Scuola Superiore Sant'Anna |
D'Avella, Salvatore | Sant'Anna School of Advanced Studies |
Tripicchio, Paolo | Scuola Superiore Sant'Anna |
Avizzano, Carlo Alberto | Scuola Superiore Sant'Anna |
Keywords: Manipulation Planning, Grasping
Abstract: Choosing a good grasp is fundamental for accomplishing robotic grasping and manipulation tasks. Typically, grasp synthesis is addressed separately from the planning phase, which can lead to failures during the execution of the task. In addition, most current grasping approaches privilege stability metrics, providing grasps unsuitable for executing subsequent tasks. The proposed work presents a framework for high-level reasoning to select the best-suited grasp depending on the task. The best grasp is chosen among a set of grasp candidates by solving an optimization problem, considering the environmental constraints, and guaranteeing the end-state comfort and confidence effects for the task, similar to human behavior. The framework leverages Generalized Benders Decomposition to decouple the main non-linear optimization problem into sub-problems, thus presenting a modular structure. The method is validated with an experimental campaign using three different state-of-the-art grasping algorithms and three low-level motion planners in three different types of tasks: pick-and-place in a constrained environment, handover/tool-use, and object re-orientation. The experiments show that the proposed approach is able to find the best grasp, or at least a feasible one, among the provided candidates for each task.
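As a toy illustration of the selection step (not the paper's Generalized Benders Decomposition itself), a sketch that scores feasible candidates with hypothetical stability, end-state-comfort, and confidence terms:

```python
from dataclasses import dataclass

@dataclass
class Grasp:
    stability: float    # e.g. force-closure quality in [0, 1]
    comfort: float      # end-state comfort of the arm pose after the task
    confidence: float   # tolerance margin for imprecise execution
    reachable: bool     # result of a low-level motion-planner feasibility check

def select_grasp(candidates, w=(1.0, 1.0, 1.0)):
    """Return the best-suited feasible grasp, or None if no candidate is feasible."""
    feasible = [g for g in candidates if g.reachable]
    if not feasible:
        return None
    return max(feasible,
               key=lambda g: w[0]*g.stability + w[1]*g.comfort + w[2]*g.confidence)

print(select_grasp([Grasp(0.9, 0.2, 0.5, True), Grasp(0.6, 0.8, 0.7, True)]))
```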
|
|
WeAT4 |
Room 4 |
Soft Robot Materials and Design I |
Regular session |
Chair: Wakimoto, Shuichi | Okayama University |
Co-Chair: Cha, Youngsu | Korea University |
|
10:00-10:15, Paper WeAT4.1 | |
Electroactive Soft Bistable Actuator with Adjustable Energy Barrier and Stiffness (I) |
|
Jiang, Lei | Xi'an Jiaotong University |
Li, Bo | Xi'an Jiaotong University
Ma, Wentao | Xi'an Jiaotong University, School of Mechanical Engineering |
Wu, Yehui | Xi'an Jiaotong University |
Bai, Ruiyu | Xi'an Jiaotong University |
Sun, Wenjie | School of Mechanical and Precision Instrument Engineering, Xi' A |
Wang, Yanjie | Hohai University |
Chen, Guimin | Xi'an Jiaotong University |
Keywords: Soft Robot Materials and Design, Modeling, Control, and Learning for Soft Robots, Soft Robot Applications, Electroactive soft bistable actuator
Abstract: A soft bistable actuator can generate high-speed motion between two prescribed stable positions, which is very useful for boosting the actuation of soft robots. Generally, the stroke of such an actuator is completely determined once the design is finalized, which prohibits its applications in robots that perform multiple tasks. In the current work, a bistable actuator with adjustable characteristics is proposed by exploring its strain energy landscape, in which the energy barrier is manipulatable via electroactive twisted and coiled polymer fibers. As such, the actuator can operate in either bistable or postbistable mode, both of which exhibit adjustable stiffness. A kinetostatic model that combines the chained beam constraint model and the mechanics of electroactive materials is established to characterize the actuator design. Experimental results validate the kinetostatic model and the behaviors of the actuator. As a robotic demonstration, a gripper that is formed by two actuators is prototyped, and it exhibits an adjustable load capacity (up to 6.5 times its weight under a 3 V voltage).
|
|
10:15-10:30, Paper WeAT4.2 | |
Multi-Modal Soft Amphibious Robots Using Simple Plastic Sheet-Reinforced Thin Pneumatic Actuators (I) |
|
Wu, Jiaxi | Peking University |
Wu, Mingxin | Peking University
Chen, Wenhui | Peking University |
Wang, Chen | Peking University |
Xie, Guangming | Peking University |
Keywords: Soft Robot Materials and Design, Biologically-Inspired Robots, Hydraulic/Pneumatic Actuators, Multi-Modal Locomotion
Abstract: A major challenge in the field of soft amphibious robotics is achieving high maneuverability and multi-terrain adaptability through multi-modal locomotion. To address this issue, drawing inspiration from fruit-fly larvae and Spanish dancer sea slugs, a novel tethered soft amphibious robot with multi-modal locomotion is proposed in this paper, performing forward, backward, turning, and self-overturn motions both on land and in water. It leverages plastic sheet-reinforced thin pneumatic actuators, which are constructed from thermoplastic membranes and semi-rigid plastic sheets. The robot achieves a forward jumping velocity of 1.77 BL/s and a forward swimming velocity of 0.69 BL/s, both faster than previously reported soft amphibious robots; by connecting two actuator units in parallel, it achieves agile turning with a velocity of 111.8°/s. Our proposed robot demonstrates exceptional multi-terrain adaptability, facile terrestrial-aquatic transition capabilities, and underwater buoyancy adjustment ability. Especially when accidentally overturned, it can recover itself without external assistance, a capability rarely achieved by other soft robots.
|
|
10:30-10:45, Paper WeAT4.3 | |
Design of an Accordion-Fold-Inspired Soft Electrohydraulic Actuator for Angular Motion |
|
Kim, Sohyun | Korea University |
Oh, Yenee | Korea University |
Kang, Joohyeon | Korea University |
Cha, Youngsu | Korea University |
Keywords: Soft Robot Materials and Design, Soft Sensors and Actuators, Soft Robot Applications
Abstract: Angular motion of soft actuators is important for their use in robotic systems. In such robots, the angular motion should be agile and cover a wide range. This study proposes a novel design of a soft electrohydraulic actuator inspired by an accordion, with an improved angular deformation range. The actuator generates instantaneous hydraulic force and achieves a fast response within 0.15 s upon activation. Furthermore, the accordion-fold structure of the actuator induces large rotational deformations of 171 degrees. We conduct a series of experiments to investigate how the angular motion characteristics depend on the geometrical parameters of the actuator. Additionally, based on the experimental results, we demonstrate how to apply electrohydraulic actuators to animate objects.
|
|
10:45-11:00, Paper WeAT4.4 | |
Fabrication Process for Twisting Artificial Muscles by Utilizing Braiding Technology and Water-Soluble Fibers |
|
Tian, Weihang | Okayama University |
Wakimoto, Shuichi | Okayama University |
Yamaguchi, Daisuke | Okayama University |
Kanda, Takefumi | Okayama University
Keywords: Soft Robot Materials and Design, Hydraulic/Pneumatic Actuators
Abstract: The McKibben artificial muscle is a pneumatic soft actuator consisting of a rubber tube covered with fibers. In this study, we propose a new fabrication process, based on the McKibben artificial muscle, to easily realize twisting artificial muscles by utilizing braiding technology and water-soluble fibers. In the braiding process, half of the sleeve fibers of the artificial muscle are substituted with water-soluble fibers. The water-soluble fibers are then removed by placing the muscle in warm water, which enables the twisting motion. By changing the direction of the fibers, artificial muscles that twist clockwise or counterclockwise can be produced. We believe that the establishment of such a simple and efficient fabrication process will promote the practical application of artificial muscles.
|
|
WeAT5 |
Room 5 |
Robot Safety I |
Regular session |
Chair: Saveriano, Matteo | University of Trento |
|
10:00-10:15, Paper WeAT5.1 | |
Safe-VLN: Collision Avoidance for Vision-And-Language Navigation of Autonomous Robots Operating in Continuous Environments |
|
Yue, Lu | Peking University |
Zhou, Dongliang | Harbin Institute of Technology, Shenzhen |
Xie, Liang | Unmanned Systems Research Center, National Institute of Defense |
Zhang, Feitian | Peking University |
Yan, Ye | Academy of Military Sciences China |
Yin, Erwei | Harbin Engineering University |
Keywords: Collision Avoidance, Embodied Cognitive Science, Vision-Based Navigation
Abstract: The task of vision-and-language navigation in continuous environments (VLN-CE) aims at training an autonomous agent to perform low-level actions to navigate through 3D continuous surroundings using visual observations and language instructions. The significant potential of VLN-CE for mobile robots has been demonstrated across a large number of studies. However, most existing works in VLN-CE focus primarily on transferring standard discrete vision-and-language navigation (VLN) methods to continuous environments, overlooking the problem of collisions. Such oversight often results in the agent deviating from the planned path or, in severe instances, becoming trapped in obstacle areas and failing the navigational task. To address these issues, this paper investigates various collision scenarios within VLN-CE and proposes a classification method to identify the underlying causes of collisions. Furthermore, a new VLN-CE algorithm, named Safe-VLN, is proposed to bolster collision avoidance capabilities; it comprises two key components, a waypoint predictor and a navigator. In particular, the waypoint predictor leverages a simulated 2D LiDAR occupancy mask to prevent the predicted waypoints from being situated in obstacle-ridden areas. The navigator, on the other hand, employs the strategy of 're-selection after collision' to prevent the robot agent from becoming ensnared in a cycle of perpetual collisions. The proposed Safe-VLN is evaluated on the R2R-CE benchmark; the results demonstrate enhanced navigational performance and a statistically significant reduction in collision incidences.
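A minimal sketch of the two safety components, assuming waypoint scores and the simulated 2D LiDAR occupancy mask are given as aligned grids (in Safe-VLN the predictor is a learned network; only the masking and 're-selection after collision' logic is illustrated):

```python
import numpy as np

def masked_waypoint(scores, occupied, banned=()):
    """Best waypoint cell that is neither occupied nor already tried."""
    s = scores.copy()
    s[occupied] = -np.inf                  # keep waypoints out of obstacle cells
    for cell in banned:                    # 're-selection after collision'
        s[cell] = -np.inf
    return np.unravel_index(np.argmax(s), s.shape)

scores = np.random.rand(24, 24)            # predicted waypoint scores
occupied = np.zeros((24, 24), dtype=bool)
occupied[10:14, 10:14] = True              # simulated 2D LiDAR occupancy mask
tried = []
wp = masked_waypoint(scores, occupied, tried)
tried.append(wp)                           # suppose moving toward wp collided:
wp = masked_waypoint(scores, occupied, tried)   # ban it and re-select
```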
|
|
10:15-10:30, Paper WeAT5.2 | |
Safe Control for Navigation in Cluttered Space Using Multiple Lyapunov-Based Control Barrier Functions |
|
Jang, Inkyu | Seoul National University |
Kim, H. Jin | Seoul National University |
Keywords: Robot Safety, Autonomous Vehicle Navigation, Motion Control
Abstract: Control barrier functions (CBFs) are powerful tools for ensuring safety in controlled systems, commonly employed through the construction of a safety filter using quadratic programming (QP), known as CBF-QP. However, synthesizing a CBF specifically for the navigation tasks of mobile robots, where safety is crucial, poses challenges due to the complexity of the operating environments. In addition, the CBF synthesis must be repeated for every new environment, further escalating the computational burden. In this paper, we introduce Lyapunov-based CBFs, which are CBFs built solely from a control Lyapunov function (CLF). By utilizing multiple Lyapunov-based CBFs as building blocks to create a large control invariant set, we formulate a CBF-QP-like safety filter to ensure safety in cluttered environments. The proposed safety filter inherits the favorable characteristics of CBF-QP, such as fast computation and a safety guarantee, and can adapt to diverse environments without the need for burdensome resynthesis of an environment-specific CBF. We demonstrate the effectiveness of the proposed approach through multiple simulation and real-world experiments; the results show that the proposed safety filter provides safety for the robot even in complex workspaces with many obstacles.
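For illustration, a sketch of a CBF-QP-style safety filter for a single-integrator robot (x_dot = u); with one affine constraint the QP has the closed-form projection used below, and multiple barriers are handled here by sequential projection rather than a full QP solve:

```python
import numpy as np

def cbf_filter(u_nom, x, barriers, alpha=1.0):
    """min ||u - u_nom||^2  s.t.  grad_h(x) . u >= -alpha * h(x) for each barrier.
    Constraints are applied by sequential projection (a simplification of the QP)."""
    u = np.asarray(u_nom, float).copy()
    for h, grad_h in barriers:
        a, b = grad_h(x), -alpha * h(x)
        if a @ u < b:                      # constraint active: project onto it
            u += (b - a @ u) / (a @ a) * a
    return u

# Example: keep the robot outside the unit disk centered at the origin.
h = lambda x: x @ x - 1.0                  # h(x) >= 0 defines the safe set
grad_h = lambda x: 2.0 * x
x, u_nom = np.array([1.2, 0.0]), np.array([-1.0, 0.0])   # nominal input pushes inward
print(cbf_filter(u_nom, x, [(h, grad_h)]))               # filtered input stays safe
```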
|
|
10:30-10:45, Paper WeAT5.3 | |
A Novel Safety-Aware Energy Tank Formulation Based on Control Barrier Functions |
|
Michel, Youssef | Technical University of Munich |
Saveriano, Matteo | University of Trento |
Lee, Dongheui | Technische Universität Wien (TU Wien) |
Keywords: Robot Safety, Compliance and Impedance Control, Force Control
Abstract: In this work, we propose a novel formulation for energy tanks based on Control Barrier Functions (CBF). Our approach simultaneously handles energy constraints to ensure passivity and enforces power limits in the system to enhance safety. Furthermore, our approach overcomes the discrete switching nature of classical energy tanks, ensuring smooth control commands. To achieve our desiderata, we formulate our tank as a second-order dynamical system, where we exploit CBFs and Higher-Order CBFs to obtain theoretical guarantees on fulfilling the energy and power constraints in the system. Furthermore, we derive conditions related to our tank design that ensure the passivity of the controlled robot. Our proposed approach is tested in a series of robot experiments where we validate it on tasks such as variable stiffness and force control, and in a scenario where it is desired to constrain the kinetic energy in the system.
|
|
10:45-11:00, Paper WeAT5.4 | |
Compliant Robust Control for Robotic Insertion of Soft Bodies |
|
Liu, Yi | Ghent University
Verleysen, Andreas | Ghent University |
Wyffels, Francis | Ghent University |
Keywords: Robot Safety, Reinforcement Learning
Abstract: This paper proposes a novel framework for insertion-type tasks with soft bodies, such as cleaning a bottle with a soft brush. First, a multimodal model based on vision and force perception is trained, where we use domain randomization for the soft body’s properties to overcome the simulation-to-reality gap. Second, we propose a dynamic safety lock method based on force perception, which is embedded in the training model to make sure that the tool explores and traverses the hole’s path in a compliant way; this results in a higher success rate without damaging the tools or holes. Finally, we perform experiments in simulation and the real world, and the success rate of our proposed method reaches 85.14% in simulation and 83.45% in the real world. Ablation experiments in the real world demonstrate that our method is effective for complex paths and soft bodies with varying deformation intensities.
|
|
WeAT6 |
Room 6 |
Actuation and Joint Mechanisms |
Regular session |
Chair: Khorasani, Amin | Vrije Universiteit Brussel |
|
10:00-10:15, Paper WeAT6.1 | |
Mitigating Collision Forces and Improving Response Performance in Human-Robot Interaction by Using Dual-Motor Actuators |
|
Khorasani, Amin | Vrije Universiteit Brussel |
Usman, Muhammad | Vrije Universiteit Brussel |
Hubert, Thierry | Vrije Universiteit Brussel |
Furnémont, Raphaël | Vrije Universiteit Brussel |
Lefeber, Dirk | Vrije Universiteit Brussel - VUB |
Vanderborght, Bram | VUB |
Verstraten, Tom | Vrije Universiteit Brussel |
Keywords: Actuation and Joint Mechanisms, Safety in HRI, Compliance and Impedance Control
Abstract: In collaborative robotics, the safety of humans interacting with cobots is crucial. There is a need for cobots that can move quickly while still being safe. This paper introduces the use of a kinematically redundant actuator in impedance control mode to reduce collision forces, aiming to improve both the safety and efficiency of cobots. By distributing power across multiple drive trains, each with unique properties such as reflected inertia, the actuator's behavior during collisions is optimized, which is key for safe interactions. Using theoretical analysis and practical experiments, we evaluate the response performance of the redundant actuator in various collision situations according to ISO/TS 15066, comparing it with that of a standard single-drive actuator. Our experiments show that the redundant actuator significantly lowers collision forces, with a 44% reduction in peak forces and an 81% decrease in transferred impulses during collisions. The paper concludes by offering a design parameter recommendation for designing actuators with reduced reflected inertia.
|
|
10:15-10:30, Paper WeAT6.2 | |
Flexible Shaft As Remote and Elastic Transmission for Robot Arms |
|
Usman, Muhammad | Vrije Universiteit Brussel |
Hubert, Thierry | Vrije Universiteit Brussel |
Khorasani, Amin | Vrije Universiteit Brussel |
Furnémont, Raphaël | Vrije Universiteit Brussel |
Vanderborght, Bram | Vrije Universiteit Brussel |
Lefeber, Dirk | Vrije Universiteit Brussel - VUB |
Van de Perre, Greet | Vrije Universiteit Brussel |
Verstraten, Tom | Vrije Universiteit Brussel |
Keywords: Compliant Joints and Mechanisms, Actuation and Joint Mechanisms, Mechanism Design
Abstract: Research on human-friendly robots focuses on safety through software and hardware. Hardware-based safety offers a significant advantage over software-based safety if an accurate hardware model is integrated into the solution. The design of elastic and off-joint actuation has established safety by hardware, where the inherent elasticity and lightweight nature make the robot safe for interaction. Combining series elastic actuators with cable/belt pulley-based remote transmission offers inherently safe hardware design, albeit with increased design and modeling complexity. This paper introduces remote and elastic actuation as a single-element solution for robot arm design using a flexible shaft. A test-bench approach is used to study the remote and elastic effects of a flexible shaft-based transmission for a robot. A set of nine flexible shafts, differing in length and diameter, is benchmarked, and the results are presented as empirical 3-D surface maps to facilitate optimal shaft selection for robot design. An example 3 Degree Of Freedom (DOF) robot arm using a flexible shaft as a remote and elastic actuator is designed and modeled. A low-level controller based on the flexible shaft model is proposed and supported by experimental results.
|
|
10:30-10:45, Paper WeAT6.3 | |
Universal Actuation Module and Kinematic Model for Heart Valve Interventional Catheter Robotization |
|
Wang, Weizhao | King's College London |
Wu, Zicong | King's College London |
Saija, Carlo | King's College London |
Zeidan, Aya Mutaz | King’s College London |
Xu, Zhouyang | King's College London |
Pishkahi, Aryana | King's College London |
Patterson, Tiffany | Guy’s & St. Thomas’ Hospitals NHS Foundation Trust |
Redwood, Simon | King’s College London |
Wang, Shuangyi | Chinese Academy of Sciences |
Rhode, Kawal | King's College London |
Housden, Richard James | King's College London |
Keywords: Kinematics, Actuation and Joint Mechanisms, Modeling, Control, and Learning for Soft Robots
Abstract: Catheters have been widely used to treat heart valve diseases. However, the diversity in handle structures and bending curvatures imposes significant complexities in safe delivery and positioning. In this letter, we designed a module for single-knob actuation assembled coaxially on the catheter handle, composed of a chuck for universal clamping of diameters from 15 to 45 mm and a position-adjustable shaft to accommodate various spacing between knobs. In addition, we proposed a two-curvature with pseudo joints (TC-PJ) model for bending control of bendable sections (BSs) in catheters. The verification was decoupled into two steps based on the other three deformation patterns. First, comparing the two-curvature (TC) model with pseudo-rigid-body (PRB), constant curvature (CC), and Euler spiral (ES) models in simulating planar bending and elongation, the results showed a more accurate shape representation. Then, five distinct catheters were employed to test the clamping universality of the module and the tip-positioning precision of the TC-PJ model, which takes torsion and shear strain into consideration. The root-mean-square error (RMSE) and the standard deviation (SD) of tip position and direction were analysed. Results indicated the module’s suitability for clamping these catheters, with the large guide sheath exhibiting a minimal position RMSE (SD) of around 0.10 (0.051) mm and 0.049 (2.15) degrees, while the puncture catheter demonstrated the highest position and direction RMSE (SD), extending to about 1.16 (0.53) mm and 0.70 (31.33) degrees, primarily attributed to the coupling of two sequential bendable components. Overall, the proposed actuation module and kinematic model showed the ability of universal manipulation and an average tip position and direction RMSE of 0.65 mm and 0.23 degrees in free space.
|
|
10:45-11:00, Paper WeAT6.4 | |
Foam-Embedded Soft Robotic Joint with Inverse Kinematic Modeling by Iterative Self-Improving Learning |
|
Huang, Anlun | University of California, San Diego |
Cao, Yongxi | Delft University of Technology |
Guo, Jiajie | Nanyang Technological University |
Fang, Zhonggui | Southern University of Science and Technology |
Su, Yinyin | The University of Hong Kong |
Liu, Sicong | Southern University of Science and Technology |
Yi, Juan | Southern University of Science and Technology |
Wang, Hongqiang | Southern University of Science and Technology |
Dai, Jian | School of Natural and Mathematical Sciences, King's College London
Wang, Zheng | Southern University of Science and Technology |
Keywords: Soft Robot Materials and Design, Compliant Joints and Mechanisms, Modeling, Control, and Learning for Soft Robots
Abstract: Soft robotic arms have gained significant attention owing to their flexibility and adaptability. Nonetheless, the instability caused by their highly elastic structure makes precise kinematic modeling and control difficult. This work introduces a novel solution employing a foam-embedded joint design (Fe-Joint), effectively mitigating oscillations and enhancing motion stability. This innovation is integrated into a new continuum soft robotic arm (Fe-Arm). Through iterative design optimization, the Fe-Arm attains superior mechanical performance and control capabilities, settling within 0.4 seconds after an external force is applied. Enabled by the quasi-static behavior of the Fe-Arm, we propose a long short-term memory network (LSTM) based iterative self-improving learning strategy (ISL) for end-to-end inverse kinematics modeling, tailored to the Fe-Arm’s mechanical traits and enhancing modeling performance with limited data. Investigating key control parameters, we achieve target trajectory modeling errors within 9% of the workspace radius. The generalization potential of the ISL method is demonstrated using a pentagonal trajectory and a different Fe-Arm configuration.
|
|
WeAT7 |
Room 7 |
Rehabilitation Robotics |
Regular session |
Chair: Kyung, Ki-Uk | Korea Advanced Institute of Science & Technology (KAIST) |
Co-Chair: Campolo, Domenico | Nanyang Technological University |
|
10:00-10:15, Paper WeAT7.1 | |
Hierarchical Trajectory Deformation Algorithm with Hybrid Controller for Active Lower Limb Rehabilitation |
|
Yang, Ze | University of Science and Technology of China |
Jin, Hu | University of Science and Technology of China |
Gao, Wei | University of Science and Technology of China |
Wang, Erlong | University of Science and Technology of China |
Shu, Yang | University of Science and Technology of China |
Wu, Ming | The First Affiliated Hospital of USTC, Division of Life Sciences |
Zhang, Shiwu | University of Science and Technology of China |
Keywords: Physical Human-Robot Interaction, Rehabilitation Robotics
Abstract: Robot-aided active rehabilitation has been shown to be an effective treatment approach for hemiplegic patients. This paper presents an active control framework for lower limb rehabilitation, combining an interaction layer with a hierarchical trajectory deformation algorithm (HTDA), and an assist-as-needed (AAN) layer with a hybrid controller. The HTDA uses constrained optimization in both position and velocity domains to continuously generate smooth reference trajectories based on virtual interaction forces during physical human-robot interaction (pHRI). An additional optimization loop is also implemented to achieve adaptive parameter adjustment for the HTDA. Meanwhile, the hybrid controller relies on a force field term and a velocity field term to provide the AAN feature. The proposed method is validated on a two-degree-of-freedom lower limb rehabilitation robot for walking with variable step height and step length. The experimental results demonstrate that, under four different evaluation metrics, HTDA improves dimensionless squared jerk (DSJ) by 73.6% compared to a previously developed admittance model (AM) and improves constraint force percentage (CFP) by 20.4% compared to a trajectory deformation algorithm (TDA). These results demonstrate the effectiveness of the proposed framework in reducing human-robot confrontation, especially in improving robot actuation compliance and movement smoothness.
|
|
10:15-10:30, Paper WeAT7.2 | |
Optimization-Based Adaptive Assistance for Lower Limb Exoskeleton Robots with a Robotic Walker Via Spatially Quantized Gait (I) |
|
Zou, Chaobin | University of Electronic Science and Technology of China |
Peng, Zhinan | University of Electronic Science and Technology of China
Zhang, Long | University of Electronic Science and Technology of China |
Mu, Fengjun | University of Electronic Science and Technology of China |
Huang, Rui | University of Electronic Science and Technology of China |
Cheng, Hong | University of Electronic Science and Technology |
Keywords: Rehabilitation Robotics, Motion and Path Planning, Optimization and Optimal Control
Abstract: Gait training with human-like gait patterns can be provided by lower limb exoskeletons (LLEs) for patients with gait impairments. For patients with little ability to keep balance, using a mobile robotic walker to assist gait training with LLEs is an effective approach. Since gait patterns vary with walking speed, it is a critical issue to coordinate the control of the robotic walker and the exoskeleton to obtain a natural and human-like walking posture. In this paper, a novel adaptive assistance approach named SQG-OPT is proposed to tackle the problem, which comprises two parts: the Spatially Quantized Gait (SQG) and the optimization. The SQG generates reference joint angles and a reference trajectory of the Center Of Mass (COM) for the human-exoskeleton system in the space domain. The optimization part converts the reference joint angles from the space domain to the time domain; it is based on the dynamics model of the human-exoskeleton-walker system and adapts to different walking speeds. The proposed approach has been tested on the robot simulation platform CoppeliaSim, and the experimental results indicate that it can generate human-like gait patterns for walking speeds from 0 to 0.8 m/s. Additionally, in comparison with other methods, the proposed approach achieves better tracking of the COM movement for a natural walking posture.
|
|
10:30-10:45, Paper WeAT7.3 | |
Development of a Dual Function Joint Modular Soft Actuator and Its Evaluation Using a Novel Dummy Finger Joint-Soft Actuator Complex Model |
|
Tortós, Pablo | Department of Medical System Engineering, Chiba University |
Kokubu, Shota | Chiba University |
Matsunaga, Fuko | Chiba University |
Lu, Yuxi | Department of Medical System Engineering, Chiba University |
Zhou, Zhongchao | Graduate School of Science and Engineering, Chiba University |
Gomez-Tames, Jose | Chiba University |
Yu, Wenwei | Chiba University |
Keywords: Soft Robot Materials and Design, Modeling, Control, and Learning for Soft Robots, Rehabilitation Robotics
Abstract: Soft actuators, made from soft materials, offer a safer alternative to rigid robots for use in hand rehabilitation devices. A current challenge is to ensure these actuators comply with human finger morphology. To gain better insights into actuator mechanics when worn on and interacting with human fingers, combining physical experiments with simulation approaches is necessary. However, no simulation has previously been implemented for finger-actuator interactions. This study proposes a new joint modular soft actuator designed to comply with a dummy finger joint. The new actuator has a dual-function design that increases axial elongation during bending, facilitating compliance with finger morphology. In addition, a novel FEM of the new actuator’s interaction with the dummy finger joint (dummy joint-soft actuator complex) is developed and used together with physical experiments to evaluate actuator performance. Results show that the new design increases the dummy joint’s bending range while exerting smaller contact forces on the joint. Even when the joint is blocked at specific bending angles, the actuator remains compliant with finger morphology. This research is a significant advancement in soft actuator design for hand rehabilitation, emphasizing the interaction between human fingers and soft actuators.
|
|
10:45-11:00, Paper WeAT7.4 | |
Origami-Inspired Wearable Robot for Shoulder Abduction Assistance: A Double-Petal Mechanism Utilizing Shape Memory Alloy Actuators |
|
Chung, Chongyoung | Korea Advanced Institute of Science and Technology (KAIST) |
Hyeon, Kyujin | KAIST |
Jeong, Jaeyeon | Korea Advanced Institute of Science and Technology
Lee, Dae-Young | Korea Advanced Institute of Science and Technology |
Kyung, Ki-Uk | Korea Advanced Institute of Science & Technology (KAIST) |
Keywords: Wearable Robotics, Rehabilitation Robotics, Mechanism Design
Abstract: This paper proposes a novel wearable robot designed to assist with shoulder abduction using a double-petal mechanism based on petal fold origami driven by shape memory alloy (SMA)-based artificial muscle. The proposed double-petal mechanism consists of two petal structures that mimic the scapula and humerus, respectively. It follows the scapulohumeral rhythm to prevent bone collision and reduce the compressive force on the glenohumeral joint. The mechanism is designed to achieve high mechanical advantage and torque output while minimizing the overall weight using lightweight SMA spring actuators and carbon fiber-reinforced plastic-based frames. The proposed robot can assist with shoulder abduction both with (active support) and without energy input (passive support) using bundles of SMA spring actuators. It can generate assistance torque up to 6.36 Nm passively and 12.6 Nm actively at a 90° abduction angle. To verify the assistance performance of the proposed robot, surface electromyography of the lateral deltoid is measured during shoulder abduction with and without the assistance of the robot and the results confirm that the robot effectively assists in shoulder abduction.
|
|
WeAT8 |
Room 8 |
Mapping I |
Regular session |
Chair: Nuechter, Andreas | University of Würzburg |
|
10:00-10:15, Paper WeAT8.1 | |
An Integrated Hierarchical Approach for Real-Time Mapping with Gaussian Mixture Model |
|
Gao, Yuan | Shanghai Jiao Tong University |
Dong, Wei | Shanghai Jiao Tong University |
Keywords: Mapping, Swarm Robotics, Aerial Systems: Perception and Autonomy
Abstract: Effective collaboration among multiple robots requires efficient exchange of map information. As directly exchanging commonly used depth maps requires high communication bandwidth, it is practical to enhance efficiency using map compression techniques based on Gaussian mixture models. Currently, the parameters of the Gaussian mixture model are mostly computed using the expectation-maximization algorithm. This is time-consuming, as it iteratively updates parameters by traversing all points in a point cloud converted from the depth map, and it is not suitable for real-time applications. Other methods directly segment the point cloud into grids and then perform a single Gaussian parameter estimation for each grid. They achieve real-time compression but generate parameter-sensitive results. To tackle the issues above, we improve compression methods with an integrated hierarchical approach. First, the points are clustered hierarchically and efficiently by K-means, generating coarse clusters. Then, each cluster is further hierarchically clustered by the expectation-maximization algorithm for accuracy enhancement. After each clustering process, an evaluation index for ensuring accuracy and preventing over-fitting is calculated to determine whether pruning or retention of newly generated clusters is appropriate. At last, the parameters of each Gaussian distribution in the model are estimated from the points in the corresponding cluster. Experiments conducted in various environments demonstrate that our approach improves computing efficiency by over 79 times compared to the state-of-the-art approach.
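A minimal sketch of the coarse stage, assuming scikit-learn's KMeans for clustering followed by a per-cluster Gaussian fit; the EM refinement stage and the pruning/retention index from the paper are omitted:

```python
import numpy as np
from sklearn.cluster import KMeans

def compress_point_cloud(points, n_coarse=32):
    """Compress an (N, 3) point cloud into a list of (weight, mean, covariance)."""
    labels = KMeans(n_clusters=n_coarse, n_init=10).fit_predict(points)
    gmm = []
    for k in range(n_coarse):
        cluster = points[labels == k]
        if len(cluster) < 4:               # too few points for a stable covariance
            continue
        gmm.append((len(cluster) / len(points),
                    cluster.mean(axis=0), np.cov(cluster.T)))
    return gmm                             # kilobytes instead of a dense depth map

cloud = np.random.randn(10000, 3)
print(len(compress_point_cloud(cloud)), "Gaussian components")
```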
|
|
10:15-10:30, Paper WeAT8.2 | |
Incremental Triangle Mesh Generation with Mesh Refinement |
|
Niedźwiedzki, Jakub | Lodz University of Technology |
Lipinski, Piotr | Lodz University of Technology |
Podsedkowski, Leszek | Lodz University of Technology, Institute of Machine Tools and Pr |
Keywords: Mapping, SLAM, Range Sensing
Abstract: This letter presents an incremental algorithm to generate triangle meshes from Light Detection and Ranging (LiDAR) point clouds with mesh refinement. The algorithm produces a high-quality triangle mesh directly from the LiDAR output without storing a dense point cloud. In our algorithm, as the number of captured points increases during LiDAR operation and robot movement, the new scan points from the LiDAR output refine or extend the existing triangle mesh. Such an approach is suitable for computationally constrained systems like aerial vehicles, mobile robots, and smartphones, as it requires relatively limited resources. Our algorithm can reconstruct the topology of city-sized scenes while maintaining a maximum triangle mesh error below 2 cm, and it does so much faster than state-of-the-art triangle mesh generation algorithms, as we demonstrate on publicly available data sets.
|
|
10:30-10:45, Paper WeAT8.3 | |
Uni-Fusion: Universal Continuous Mapping (I) |
|
Yuan, Yijun | University of Wuerzburg |
Nuechter, Andreas | University of Würzburg |
Keywords: Mapping, RGB-D Perception, Semantic Scene Understanding, Universal mapping
Abstract: We present Uni-Fusion, a universal continuous mapping framework for surfaces, surface properties (color, infrared, etc.) and more (latent features in CLIP embedding space, etc.). We propose the first universal implicit encoding model that supports encoding of both geometry and different types of properties (RGB, infrared, features, etc.) without requiring any training. Based on this, our framework divides the point cloud into regular grid voxels and generates a latent feature in each voxel to form a Latent Implicit Map (LIM) for geometries and arbitrary properties. Then, by fusing local LIMs frame-wise into a global LIM, an incremental reconstruction is achieved. Encoded with corresponding types of data, our Latent Implicit Map is capable of generating continuous surfaces, surface property fields, surface feature fields, and all other possible options. To demonstrate the capabilities of our model, we implement three applications: (1) incremental reconstruction for surfaces and color, (2) 2D-to-3D transfer of fabricated properties, and (3) open-vocabulary scene understanding by creating a text CLIP feature field on surfaces. We evaluate Uni-Fusion by comparing it
|
|
10:45-11:00, Paper WeAT8.4 | |
BeautyMap: Binary-Encoded Adaptable Ground Matrix for Dynamic Points Removal in Global Maps |
|
Jia, Mingkai | The Hong Kong University of Science and Technology |
Zhang, Qingwen | KTH Royal Institute of Technology |
Yang, Bowen | The Hong Kong University of Science and Technology, Robotics Ins |
Wu, Jin | HKUST |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Jensfelt, Patric | KTH - Royal Institute of Technology |
Keywords: Object Detection, Segmentation and Categorization, Mapping, Autonomous Agents
Abstract: Global point clouds that correctly represent the static environment features can facilitate accurate localization and robust path planning. However, dynamic objects introduce undesired 'ghost' tracks that are mixed up with the static environment. Existing dynamic-point removal methods normally fail to balance computational efficiency and accuracy. In response, we present 'BeautyMap' to efficiently remove the dynamic points while retaining static features for high-fidelity global maps. Our approach utilizes a binary-encoded matrix to efficiently extract the environment features. With a bit-wise comparison between the matrices of each frame and the corresponding map region, we can extract potential dynamic regions. Then we use coarse-to-fine hierarchical segmentation along the z-axis to handle terrain variations. The final static restoration module accounts for the range-visibility of each single scan and protects static points that are out of sight. Comparative experiments underscore BeautyMap's superior performance in both accuracy and efficiency against other dynamic-point removal methods. The code is open-sourced at https://github.com/MKJia/BeautyMap.
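A minimal sketch of binary height encoding and bit-wise comparison, assuming a fixed grid with at most 64 height bins per cell; the ground handling and range-visibility modules from the paper are omitted:

```python
import numpy as np

def encode(points, res=0.5, shape=(128, 128)):
    """Pack each cell's occupied height bins (up to 64) into one uint64 per (x, y) cell."""
    grid = np.zeros(shape, dtype=np.uint64)
    ix = (points[:, 0] / res).astype(int) % shape[0]
    iy = (points[:, 1] / res).astype(int) % shape[1]
    iz = np.clip((points[:, 2] / res).astype(int), 0, 63)
    np.bitwise_or.at(grid, (ix, iy), np.uint64(1) << iz.astype(np.uint64))
    return grid

def potential_dynamic(map_grid, scan_grid):
    """Height bins occupied in the map but free in the current scan region
    flag candidate 'ghost' (dynamic) cells for further checks."""
    return map_grid & ~scan_grid

pts = np.random.rand(5000, 3) * np.array([60.0, 60.0, 10.0])
print(int(np.count_nonzero(potential_dynamic(encode(pts), encode(pts[:2500])))))
```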
|
|
WeAT9 |
Room 9 |
Task and Motion Planning I |
Regular session |
Chair: Loianno, Giuseppe | New York University |
Co-Chair: Ornik, Melkior | University of Illinois Urbana-Champaign |
|
10:00-10:15, Paper WeAT9.1 | |
E(2)-Equivariant Graph Planning for Navigation |
|
Zhao, Linfeng | Northeastern University |
Li, Hongyu | Brown University |
Padir, Taskin | Northeastern University |
Jiang, Huaizu | Northeastern University |
Wong, Lawson L.S. | Northeastern University |
Keywords: Integrated Planning and Learning, Deep Learning Methods, Vision-Based Navigation
Abstract: Learning for robot navigation presents a critical and challenging task. The scarcity and costliness of real-world datasets necessitate efficient learning approaches. In this letter, we exploit Euclidean symmetry in planning for 2D navigation, which originates from Euclidean transformations between reference frames and enables parameter sharing. To address the challenges of unstructured environments, we formulate the navigation problem as planning on a geometric graph and develop an equivariant message passing network to perform value iteration. Furthermore, to handle multi-camera input, we propose a learnable equivariant layer to lift features to a desired space. We conduct comprehensive evaluations across five diverse tasks encompassing structured and unstructured environments, with both known and unknown maps, and with point goals or semantic goals. Our experiments confirm substantial benefits in training efficiency, stability, and generalization.
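For intuition, a sketch of plain value iteration on a geometric graph, the planning backbone that the paper wraps in an E(2)-equivariant message passing network (equivariance and the multi-camera feature lifting are not shown):

```python
import numpy as np

def value_iteration(nodes, edges, goal, iters=100):
    """nodes: (N, 2) positions; edges: list of (i, j) index pairs; returns the
    cost-to-go of every node to the goal node."""
    V = np.full(len(nodes), np.inf)
    V[goal] = 0.0
    for _ in range(iters):
        for i, j in edges:
            w = np.linalg.norm(nodes[i] - nodes[j])  # Euclidean edge cost
            V[i] = min(V[i], w + V[j])               # Bellman backup, in both
            V[j] = min(V[j], w + V[i])               # directions of the edge
    return V

nodes = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
print(value_iteration(nodes, [(0, 1), (1, 2)], goal=2))
```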
|
|
10:15-10:30, Paper WeAT9.2 | |
Text2Reaction : Enabling Reactive Task Planning Using Large Language Models |
|
Yang, Zejun | University |
Ning, Li | University of Chinese Academy of Sciences
Wang, Haitao | University of Chinese Academy of Sciences |
Jiang, Tianyu | Institute of Automation, Chinese Academy of Sciences
Zhang, Shaolin | Institute of Automation, Chinese Academy of Sciences |
Cui, Shaowei | Institute of Automation, Chinese Academy of Sciences |
Jiang, Hao | Institute of Computing Technology, Chinese Academy of Sciences |
Li, Chunpeng | University |
Wang, Shuo | Chinese Academy of Sciences |
Wang, Zhaoqi | Institute of Computing Technology, Chinese Academy of Sciences
Keywords: Planning under Uncertainty, AI-Based Methods, Learning from Demonstration
Abstract: To complete tasks in dynamic environments, robots need to update their plans in a timely manner to react to environment changes. Traditional STRIPS-like or learning-based planners struggle to achieve this due to their heavy reliance on meticulously predefined planning rules or labeled data. Fortunately, recent works find that Large Language Models (LLMs) can be effectively prompted to solve planning problems. Thus, we investigate strategies for LLMs to master reactive planning problems without complex definitions or extra training. We propose Text2Reaction, an LLM-based framework enabling robots to continuously reason and update plans according to the latest environment changes. Inspired by humans' step-by-step re-planning process, we present the Re-planning Prompt, which informs LLMs of the basic principles of re-planning and fosters the gradual development of the current plan into a new one in a three-hop reasoning manner: cause analysis, consequence inference, and plan adjustment. Additionally, Text2Reaction is designed to first generate an initial plan based on the task description before execution, allowing for subsequent iterative updates of this plan. We demonstrate the superior performance of Text2Reaction over prior works in reacting to various environment changes and completing varied tasks. Additionally, we validate the reliability of our re-planning prompt through ablation experiments and its capability when deployed on real-world robots, enabling continuous reasoning in the face of diverse changes until the user instructions are successfully completed.
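A minimal sketch of the three-hop re-planning pattern; the prompt wording is illustrative rather than the paper's exact Re-planning Prompt, and query_llm is a hypothetical stand-in for any chat-completion API:

```python
REPLAN_PROMPT = """You are a robot task planner.
Task: {task}
Current plan: {plan}
Observed change: {change}
Reason in three steps:
1. Cause analysis: what caused the change?
2. Consequence inference: which plan steps does it invalidate?
3. Plan adjustment: output the updated list of steps only."""

def replan(task, plan, change, query_llm):
    """query_llm: any callable that sends a prompt to an LLM and returns text."""
    return query_llm(REPLAN_PROMPT.format(task=task, plan=plan, change=change))
```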
|
|
10:30-10:45, Paper WeAT9.3 | |
Graph Neural Network for Decentralized Multi-Robot Goal Assignment |
|
Goarin, Manohari | New York University, Tandon School of Engineering |
Loianno, Giuseppe | New York University |
Keywords: Task and Motion Planning, Integrated Planning and Learning, Deep Learning Methods
Abstract: The problem of assigning a set of spatial goals to a team of robots plays a crucial role in various multi-robot planning applications including, but not limited to, exploration, search and rescue, and surveillance. The Linear Sum Assignment Problem (LSAP) is a common way of formulating and resolving this problem. This optimization problem aims at assigning tasks to robots while minimizing the sum of costs and respecting a one-to-one matching constraint. However, communication restrictions in real-world scenarios pose significant challenges. Existing decentralized solutions often rely on numerous communication interactions to converge to a conflict-free and optimal solution, or assume a prior conflict-free random assignment. In this paper, we propose a novel Decentralized Graph Neural Network approach for multi-robot Goal Assignment (DGNN-GA). We leverage a heterogeneous graph representation to model the inter-robot communication and the assignment relations between goals and robots. We compare its performance in simulation to other decentralized state-of-the-art approaches. Specifically, our method outperforms popular state-of-the-art approaches in strictly restricted communication scenarios and, unlike two other algorithms, does not rely on any initial conflict-free guess. Finally, DGNN-GA is also deployed and validated in real-world experiments.
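For reference, the centralized LSAP baseline that DGNN-GA approximates in a decentralized way can be solved exactly with the Hungarian algorithm (this assumes unrestricted communication):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

robots = np.random.rand(5, 2)                  # robot positions
goals = np.random.rand(5, 2)                   # goal positions
cost = np.linalg.norm(robots[:, None, :] - goals[None, :, :], axis=-1)
rows, cols = linear_sum_assignment(cost)       # one-to-one, minimum total cost
print(dict(zip(rows.tolist(), cols.tolist())), "total cost:", cost[rows, cols].sum())
```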
|
|
10:45-11:00, Paper WeAT9.4 | |
Modular Multi-Level Replanning TAMP Framework for Dynamic Environment |
|
Lin, Tao | Harbin Institute of Technology |
Yue, Chengfei | Harbin Institute of Technology, Shenzhen |
Liu, Ziran | Research Center of the Satellite Technology |
Cao, Xibin | Research Center of the Satellite Technology |
Keywords: Task and Motion Planning, Task Planning, Manipulation Planning
Abstract: Task and Motion Planning (TAMP) algorithms can generate plans that combine logic and motion aspects for robots. However, these plans are sensitive to interference and control errors. To make TAMP algorithms more applicable and robust in the real world, we propose the modular multi-level replanning TAMP framework (MMRF), which extends existing TAMP algorithms with real-time replanning components. MMRF generates a nominal plan from the initial state and then reconstructs this plan in real time to reorder manipulations. Following the logic-level adjustment, MMRF attempts to replan a new motion path, ensuring that the updated plan is feasible at the motion level. Finally, we conducted several real-world experiments. The results demonstrate that MMRF swiftly completes tasks in scenarios with moving obstacles and varying degrees of interference.
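A minimal skeleton of such a multi-level replanning loop; the tamp, motion, and executor objects and their methods are hypothetical placeholders for whatever TAMP stack the framework wraps:

```python
def execute_with_replanning(tamp, motion, executor, state, goal):
    plan = tamp.plan(state, goal)                 # nominal task-and-motion plan
    while plan:
        action, rest = plan[0], plan[1:]
        if executor.run(action):                  # action succeeded
            plan = rest
            continue
        state = executor.observe()                # interference detected
        reordered = tamp.reorder(rest, state)     # logic level: reorder manipulations
        if reordered and motion.replan(reordered[0], state):
            plan = reordered                      # motion level: feasible new path
        else:
            plan = tamp.plan(state, goal)         # fall back to full replanning
```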
|
|
WeAT10 |
Room 10 |
Vision-Based Navigation I |
Regular session |
Chair: Yang, Tao | Northwestern Polytechnical University |
Co-Chair: Ehsan, Shoaib | University of Essex |
|
10:00-10:15, Paper WeAT10.1 | |
RMSC-VIO: Robust Multi-Stereoscopic Visual-Inertial Odometry for Local Visually Challenging Scenarios |
|
Zhang, Tong | Northwestern Polytechnical University |
Xu, Jianyu | Northwestern Polytechnical University |
Shen, Hao | Northwestern Polytechnical University |
Yang, Rui | Université De Technologie De Belfort Montbéliard |
Yang, Tao | Northwestern Polytechnical University |
Keywords: SLAM, Vision-Based Navigation, Sensor Fusion
Abstract: We present a Multi-Stereoscopic Visual-Inertial Odometry (VIO) system capable of integrating an arbitrary number of stereo cameras, exhibiting excellent robustness in the face of visually challenging scenarios. During system initialization, we introduce multi-view keyframes for simultaneous processing of multiple image inputs and propose an adaptive feature selection method to alleviate the computational burden of multi-camera systems. This method iteratively updates the state information of visual features, retaining high-quality image feature points and effectively reducing unnecessary redundancy. In the backend phase, we propose an adaptive tightly coupled optimization method, assigning corresponding optimization weights based on the quality of different image feature points, effectively enhancing localization precision. We validate the effectiveness and robustness of our system on a series of datasets encompassing various visually challenging scenarios, as well as in practical flight experiments. Our approach achieves up to a 90% reduction in Absolute Trajectory Error (ATE) compared to state-of-the-art multi-camera VIO methods.
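The flavor of the adaptive weighting can be illustrated with quality-weighted least squares on a toy linear problem (the actual backend optimizes visual-inertial factors, not this stand-in):

```python
import numpy as np

A = np.random.randn(50, 3)                 # toy measurement Jacobian
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 0.01 * np.random.randn(50)
quality = np.random.rand(50)               # e.g. per-feature tracking quality in (0, 1]
W = np.diag(quality)                       # higher-quality features weigh more
x_hat = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
print(x_hat)                               # close to x_true, dominated by good features
```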
|
|
10:15-10:30, Paper WeAT10.2 | |
LIVER: A Tightly Coupled LiDAR-Inertial-Visual State Estimator with High Robustness for Underground Environments |
|
Wen, Tianci | Nankai University |
Fang, Yongchun | Nankai University |
Lu, Biao | Nankai University |
Zhang, Xuebo | Nankai University
Tang, Chaoquan | China University of Mining and Technology |
Keywords: SLAM, Localization, Sensor Fusion
Abstract: In this paper, we propose a tightly coupled LiDAR-inertial-visual (LIV) state estimator termed LIVER, which achieves robust and accurate localization and mapping in underground environments. LIVER starts with an effective strategy for LIV synchronization. A robust initialization process that integrates LiDAR, vision, and IMU is realized. A tightly coupled, nonlinear optimization-based method achieves highly accurate LiDAR-inertial-visual odometry (LIVO) by fusing LiDAR, visual, and IMU information. We consider scenarios in underground environments that are unfriendly to LiDARs and cameras. A visual-IMU-assisted method enables the evaluation and handling of LiDAR degeneracy. A deep neural network is introduced to eliminate the impact of poor lighting conditions on images. We verify the performance of the proposed method by comparing it with state-of-the-art methods on public datasets and in real-world experiments, including underground mines (see our attached video https://youtu.be/0wjXEz3K3ng). In the underground mine tests, tightly coupled methods without degeneracy handling fail due to self-similar areas (affecting LiDAR) and poor lighting conditions (affecting vision). In these conditions, our degeneracy handling approach successfully eliminates the impact of these disturbances on the system.
|
|
10:30-10:45, Paper WeAT10.3 | |
Aggregating Multiple Bio-Inspired Image Region Classifiers for Effective and Lightweight Visual Place Recognition |
|
Arcanjo, Bruno | University of Essex |
Ferrarini, Bruno | University of Essex
Fasli, Maria | University of Essex |
Milford, Michael J | Queensland University of Technology |
McDonald-Maier, Klaus | University of Essex |
Ehsan, Shoaib | University of Essex |
Keywords: Vision-Based Navigation, Localization, Bioinspired Robot Learning
Abstract: Visual place recognition (VPR) enables autonomous systems to localize themselves within an environment using image information. While VPR techniques built upon a Convolutional Neural Network (CNN) backbone dominate state-of-the-art VPR performance, their high computational requirements make them unsuitable for platforms equipped with low-end hardware. Recently, a lightweight VPR system based on multiple bio-inspired classifiers, dubbed DrosoNets, has been proposed, achieving great computational efficiency at the cost of reduced absolute place retrieval performance. In this work, we propose a novel multi-DrosoNet localization system, dubbed RegionDrosoNet, with significantly improved VPR performance, while preserving a low-computational profile. Our approach relies on specializing distinct groups of DrosoNets on differently sliced partitions of the original images, increasing model differentiation. Furthermore, we introduce a novel voting module to combine the outputs of all DrosoNets into the final place prediction which considers multiple top reference candidates from each DrosoNet. RegionDrosoNet outperforms other lightweight VPR techniques when dealing with both appearance changes and viewpoint variations. Moreover, it competes with computationally expensive methods on some benchmark datasets at a small fraction of their online inference time.
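A minimal sketch of the voting module, assuming each classifier returns similarity scores over all reference places and its top candidates vote with rank-based weights (DrosoNet internals and the exact vote weighting are not reproduced):

```python
import numpy as np

def vote(scores_per_model, top_k=3):
    """scores_per_model: (M, P) similarity scores from M classifiers over P places."""
    votes = np.zeros(scores_per_model.shape[1])
    for scores in scores_per_model:
        for rank, place in enumerate(np.argsort(scores)[::-1][:top_k]):
            votes[place] += top_k - rank      # higher-ranked candidates weigh more
    return int(np.argmax(votes))              # final place prediction

print(vote(np.random.rand(8, 100)))
```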
|
|
10:45-11:00, Paper WeAT10.4 | |
Design Space Exploration of Low-Bit Quantized Neural Networks for Visual Place Recognition |
|
Grainge, Oliver Edward | University of Southampton |
Milford, Michael J | Queensland University of Technology |
Bodala, Indu | University of Southampton |
Ramchurn, Sarvapali | University of Southampton |
Ehsan, Shoaib | University of Essex |
Keywords: Vision-Based Navigation, Localization, Recognition
Abstract: Visual Place Recognition (VPR) is a critical task for performing global re-localization in visual perception systems, requiring the ability to recognize a previously visited location under variations such as illumination, occlusion, appearance, and viewpoint. In robotics, the target devices for deployment are usually embedded systems. Therefore, whilst the accuracy of VPR systems is important, so too are memory consumption and latency. Recently, new works have focused on the Recall@1 metric while paying limited attention to resource utilization, resulting in methods that use complex models unsuitable for edge deployment. We hypothesize that these methods can be optimized to satisfy the constraints of low-powered embedded systems whilst maintaining high recall performance. Our work studies the impact of compact architectural design in combination with full-precision and mixed-precision post-training quantization on VPR performance. Importantly, we measure performance not only via the Recall@1 score but also via memory consumption and latency. We characterize the design implications for memory, latency, and recall scores, and provide a number of design recommendations for VPR systems under these limitations.
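A minimal sketch of uniform symmetric post-training quantization of one weight tensor, the basic operation whose per-layer bit-width such a design-space study varies:

```python
import numpy as np

def quantize(w, bits=8):
    """Uniform symmetric per-tensor quantization; dequantize with q * scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize(w, bits=4)
print("mean abs error:", float(np.abs(w - q * s).mean()),
      "memory ratio vs float32:", 4 / 32)
```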
|
|
WeAT11 |
Room 11 |
Path Planning for Multiple Mobile Robots or Agents |
Regular session |
Chair: Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
|
10:00-10:15, Paper WeAT11.1 | |
Collaborative Planning for Catching and Transporting Objects in Unstructured Environments |
|
Pei, Liuao | Zhejiang University |
Lin, Junxiao | Zhejiang University |
Han, Zhichao | Zhejiang University |
Quan, Lun | Zhejiang University |
Cao, Yanjun | Zhejiang University, Huzhou Institute of Zhejiang University |
Xu, Chao | Zhejiang University |
Gao, Fei | Zhejiang University |
Keywords: Intelligent Transportation Systems, Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems
Abstract: Multi-robot teams have attracted attention from industry and academia for their ability to perform collaborative tasks in unstructured environments, such as wilderness rescue and collaborative transportation. In this paper, we propose a trajectory planning method for a non-holonomic robotic team that collaborates in unstructured environments. To enable the adaptive state collaboration of a robot team that catches and transports rescue targets using a net, we model the process of catching a falling target with the net in a continuous and differentiable form. This enables the robot team to fully exploit its kinematic potential, adaptively catching the target in an appropriate state. Furthermore, the size safety and topological safety of the net, resulting from the collaborative support of the robots, are guaranteed through geometric constraints. We deploy our algorithm on a car-like robot team and test it in simulations and real-world experiments to validate our performance. Our method is compared to state-of-the-art multi-vehicle trajectory planning methods, demonstrating superior efficiency and trajectory quality.
|
|
10:15-10:30, Paper WeAT11.2 | |
A TSP-Based Online Algorithm for Multi-Task Multi-Agent Pickup and Delivery |
|
Kudo, Fumiya | Osaka Metropolitan University |
Cai, Kai | Osaka Metropolitan University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Collision Avoidance, Discrete Event Dynamic Automation Systems
Abstract: The Multi-Agent Path Finding (MAPF) problem and its extension, Multi-Agent Pickup and Delivery (MAPD), have received much attention in academia. In industry, on the other hand, automatic control of teams of robots and AGVs on factory floors and in logistics warehouses for pickup and delivery operations has also been studied intensively. Currently, the MAPD problem formulation does not fully capture important aspects of many real-world industrial applications: e.g., MAPD allocates only one task at a time to each agent, the payload capacity of each agent is ignored, and pickup & dropoff operations are assumed to be done instantaneously. In this paper, we extend the MAPD problem to a multi-task setting where each agent is allocated multiple tasks, considering payload capacity as well as pickup & dropoff cost. We propose an online multi-task MAPD algorithm that combines a MAPF algorithm with a Traveling Salesman Problem (TSP) algorithm. Comparisons between the proposed and conventional MAPD show that the proposed MAPD achieves 18%-38% shorter makespans over a wide range of agent numbers. We also examine the behavior of the proposed online multi-task MAPD under varying payload capacity distributions and pickup & dropoff costs. Simulation results indicate that an increase in pickup cost can largely increase the makespan when the agent number is small; on the other hand, an increase in dropoff cost tends to increase the makespan when the agent number is large. Our empirical study also demonstrates that the proposed online multi-task MAPD is applicable to large-scale environments (e.g., 300 agents) in an online manner.
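A toy sketch of the per-agent sequencing idea: once tasks are allocated, order pickups with a nearest-neighbor TSP heuristic under a payload-capacity check (MAPF collision-free routing and the paper's pickup & dropoff cost model are omitted; each load is assumed to fit the capacity on its own):

```python
import numpy as np

def order_pickups(depot, pickups, loads, capacity):
    """Greedy nearest-neighbor ordering of pickups; insert a depot dropoff
    whenever the next pickup would exceed the payload capacity."""
    assert all(l <= capacity for l in loads)   # each task must fit on its own
    pos, load = np.asarray(depot, float), 0.0
    remaining, route = list(range(len(pickups))), []
    while remaining:
        i = min(remaining, key=lambda k: np.linalg.norm(pos - pickups[k]))
        if load + loads[i] > capacity:         # full: return to depot first
            route.append("dropoff")
            pos, load = np.asarray(depot, float), 0.0
            continue
        route.append(i)
        load += loads[i]
        pos = pickups[i]
        remaining.remove(i)
    return route + ["dropoff"]

pickups = np.random.rand(8, 2) * 10
print(order_pickups((0, 0), pickups, loads=np.ones(8), capacity=3))
```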
|
|
10:30-10:45, Paper WeAT11.3 | |
Motion Planning for Multiple Heterogeneous Magnetic Robots under Global Input (I) |
|
Asadi, Farshid | Southern Methodist University |
Hurmuzlu, Yildirim | Southern Methodist University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Underactuated Robots, Magnetic Actuation, Collision Avoidance
Abstract: Magnetism provides an untethered actuation mechanism and an alternative way to actuate robots. Using a magnetic field, we can control the motion of robots embedded with magnets. This scales down the size of the robots dramatically, so that they can be used in applications such as drug delivery, sample collection, micromanipulation, and non-invasive procedures. Despite these advantages and potentials, magnetic actuation has one major drawback: because the magnetic field interacts identically with the magnets embedded in each robot of a multi-robot system, controlling the robots independently is challenging. Using heterogeneous magnetic robots is one way to overcome this challenge. Here, motion planning for multiple magnetic robots that move in parallel directions at different speeds in response to a global input is addressed in the absence of obstacles in a polygonal workspace. Through controllability analysis, it is shown that having n linearly independent heterogeneous responses to the global input, called Modes of Motion here, enables independent position control of the n robots in the system. Further, a procedure to obtain a potentially feasible sequence of modes is presented.
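The controllability argument can be illustrated with a minimal linear-algebra sketch: if the robots' speeds across the "modes of motion" are linearly independent, the dwell times that realize a desired set of displacements solve a linear system. The speed matrix and displacements below are invented for illustration, not taken from the paper.

```python
# Minimal sketch of the "modes of motion" idea: if each robot responds to the
# global input with its own (linearly independent) speed in each mode, the
# dwell time per mode realizing desired displacements solves V @ t = dx.
import numpy as np

# Hypothetical speeds: rows = robots, columns = modes (e.g., mm/s per mode).
V = np.array([[1.0, 0.5],
              [0.4, 1.2]])

dx = np.array([10.0, -4.0])   # desired displacement of each robot (mm)

t = np.linalg.solve(V, dx)    # dwell time in each mode (a negative entry
                              # means applying the input in reverse)
print("mode durations:", t)
print("check:", V @ t)        # reproduces dx
```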
|
|
10:45-11:00, Paper WeAT11.4 | |
Leadership Inference for Multi-Agent Interactions |
|
Khan, Hamzah | The University of Texas at Austin |
Fridovich-Keil, David | The University of Texas at Austin |
Keywords: Probabilistic Inference, Optimization and Optimal Control
Abstract: Effectively predicting intent and behavior requires inferring leadership in multi-agent interactions. Dynamic games provide an expressive theoretical framework for modeling these interactions. Employing this framework, we propose a novel method to infer the leader in a two-agent game by observing the agents' behavior in complex, long-horizon interactions. We make two contributions. First, we introduce an iterative algorithm that solves dynamic two-agent Stackelberg games with nonlinear dynamics and nonquadratic costs, and demonstrate that it consistently converges in repeated trials. Second, we propose the Stackelberg Leadership Filter (SLF), an online method for identifying the leading agent in interactive scenarios based on observations of the game interactions. We validate the leadership filter's efficacy on simulated driving scenarios to demonstrate that the SLF can draw conclusions about leadership that match right-of-way expectations.
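A minimal sketch of the filtering idea (not the authors' SLF, which couples the filter to solutions of dynamic Stackelberg games): maintain a Bernoulli belief over which agent leads and update it recursively with the likelihood of the observed state under each leadership hypothesis' prediction. The Gaussian likelihood and all numbers are assumptions for illustration.

```python
# Hedged sketch of leadership inference as a recursive Bayes filter over a
# binary leader hypothesis; the predictions would come from solving the
# game under each hypothesis (not reproduced here).
import numpy as np

def gaussian_likelihood(obs, pred, sigma=1.0):
    r = np.asarray(obs) - np.asarray(pred)
    return np.exp(-0.5 * np.dot(r, r) / sigma**2)

def update_leader_belief(belief, obs, pred_if_1_leads, pred_if_2_leads):
    """One Bayes step; belief = P(agent 1 is the leader | observations)."""
    l1 = gaussian_likelihood(obs, pred_if_1_leads)
    l2 = gaussian_likelihood(obs, pred_if_2_leads)
    return belief * l1 / (belief * l1 + (1 - belief) * l2 + 1e-12)

belief = 0.5
for obs, p1, p2 in [([1.0, 0.1], [0.9, 0.0], [0.2, 0.5]),
                    ([2.1, 0.2], [2.0, 0.2], [1.0, 0.8])]:
    belief = update_leader_belief(belief, obs, p1, p2)
print("P(agent 1 leads) =", belief)
```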
|
|
WeAT12 |
Room 12 |
Reinforcement Learning I |
Regular session |
Co-Chair: Panov, Aleksandr | AIRI |
|
10:00-10:15, Paper WeAT12.1 | |
Learning Whole-Body Manipulation for Quadrupedal Robot |
|
Jeon, Seunghun | KAIST |
Jung, Moonkyu | Korea Advanced Institute of Science and Technology |
Choi, Suyoung | KAIST |
Kim, Beomjoon | Korea Advanced Institute of Science and Technology |
Hwangbo, Jemin | Korea Advanced Institute of Science and Technology |
Keywords: Reinforcement Learning, Deep Learning Methods, Legged Robots
Abstract: We propose a learning-based system that enables quadrupedal robots to manipulate large, heavy objects using their whole body. Our system is based on a hierarchical control strategy that uses a deep latent-variable embedding to capture manipulation-relevant information from interactions, proprioception, and action history, allowing the robot to implicitly understand object properties. We evaluate our framework in both simulation and real-world scenarios. In simulation, it achieves a success rate of 93.6% in accurately re-positioning and re-orienting various objects within a tolerance of 0.03 m and 5°. Real-world experiments demonstrate the successful manipulation of objects such as a 19.2 kg water-filled drum and a 15.3 kg plastic box filled with heavy objects, while the robot itself weighs 27 kg. Unlike previous works that focus on manipulating small, light objects with prehensile manipulation, our framework illustrates the possibility of using a quadruped's entire body to manipulate large, heavy objects that cannot be grasped. Our method does not require explicit object modeling and offers significant computational efficiency compared to optimization-based methods.
|
|
10:15-10:30, Paper WeAT12.2 | |
OmniDrones: An Efficient and Flexible Platform for Reinforcement Learning in Drone Control |
|
Xu, Botian | Tsinghua University |
Gao, Feng | Tsinghua University |
Yu, Chao | Tsinghua University |
Zhang, Ruize | Tsinghua University |
Wu, Yi | Tsinghua University |
Wang, Yu | Tsinghua University |
Keywords: Reinforcement Learning, Machine Learning for Robot Control
Abstract: In this work, we introduce OmniDrones, an efficient and flexible platform tailored for reinforcement learning in drone control, built on Nvidia's Omniverse Isaac Sim. It employs a bottom-up design approach that allows users to easily design and experiment with various application scenarios on top of GPU-parallelized simulations. It also offers a range of benchmark tasks, presenting challenges ranging from single-drone hovering to over-actuated system tracking. In summary, we propose an open-source drone simulation platform equipped with an extensive suite of tools for drone learning. It includes 4 drone models, 5 sensor modalities, 4 control modes, over 10 benchmark tasks, and a selection of widely used RL baselines. To showcase the capabilities of OmniDrones and to support future research, we also provide preliminary results on these benchmark tasks. We hope this platform will encourage further studies on applying RL to practical drone systems. For more resources, including documentation and code, please visit: https://omnidrones.readthedocs.io/.
|
|
10:30-10:45, Paper WeAT12.3 | |
Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning |
|
Hao, Ce | University of California, Berkeley |
Weaver, Catherine | University of California, Berkeley |
Tang, Chen | University of California Berkeley |
Kawamoto, Kenta | Sony Research Inc |
Tomizuka, Masayoshi | University of California |
Zhan, Wei | University of California, Berkeley |
Keywords: Reinforcement Learning, Learning from Demonstration, Transfer Learning
Abstract: Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Promising results in sparse-reward environments have been seen with skills, i.e., sequences of primitive actions. Typically, a skill latent space and policy are discovered from offline data. However, the resulting low-level policy can be unreliable due to low-coverage demonstrations or distribution shifts. As a solution, we propose the Skill-Critic algorithm to fine-tune the low-level policy in conjunction with high-level skill selection. Our Skill-Critic algorithm optimizes both the low-level and high-level policies; these policies are initialized and regularized by the latent space learned from offline demonstrations to guide the parallel policy optimization. We validate Skill-Critic in multiple sparse-reward RL environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport. The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for good performance. Code and videos are available at our website: https://sites.google.com/view/skill-critic.
|
|
10:45-11:00, Paper WeAT12.4 | |
Data-Efficient Task Generalization Via Probabilistic Model-Based Meta Reinforcement Learning |
|
Bhardwaj, Arjun | ETH Zurich |
Rothfuss, Jonas | ETH Zurich |
Sukhija, Bhavya | ETH Zürich |
As, Yarden | ETH Zurich |
Hutter, Marco | ETH Zurich |
Coros, Stelian | ETH Zurich |
Krause, Andreas | ETH Zurich |
Keywords: Reinforcement Learning, Model Learning for Control, Learning from Experience
Abstract: We introduce PACOH-RL, a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics. PACOH-RL meta-learns priors for the dynamics model, allowing swift adaptation to new dynamics with minimal interaction data. Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics, where data is costly to obtain. To address this, PACOH-RL incorporates regularization and epistemic uncertainty quantification in both the meta-learning and task adaptation stages. When facing new dynamics, we use these uncertainty estimates to effectively guide exploration and data collection. Overall, this enables positive transfer even when access to data from prior tasks or dynamic settings is severely limited. Our experimental results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions. Finally, on a real robotic car, we showcase the potential for efficient RL policy adaptation in diverse, data-scarce conditions.
|
|
WeAT13 |
Room 13 |
Scientific Exploration |
Regular session |
Chair: Sutoh, Masataku | Japan Aerospace Exploration Agency |
Co-Chair: Kim, Pyojin | Gwangju Institute of Science and Technology (GIST) |
|
10:00-10:15, Paper WeAT13.1 | |
Transition Gradient from Standing to Traveling Waves for Energy-Efficient Slope Climbing of a Gecko-Inspired Robot |
|
Haomachai, Worasuchad | Nanjing University of Aeronautics and Astronautics |
Dai, Zhendong | Nanjing University of Aeronautics and Astronautics |
Manoonpong, Poramate | Vidyasirimedhi Institute of Science and Technology (VISTEC) |
Keywords: Climbing Robots, Biomimetics, Biologically-Inspired Robots
Abstract: Lateral undulation patterns of a flexible spine, including standing waves, traveling waves, and their transitions, enable agile and versatile locomotion in sprawling animals. Inspired by this, we propose body-wave transition strategies for energy-efficient inclined-surface climbing of a gecko-inspired robot with a bendable body. Using the robot as a scientific tool, we searched a large space of body movements (i.e., percentage of traveling wave and stride frequency) to explore climbing performance at different slope angles. We then designed a body-wave strategy that smoothly transitions from a standing wave at low speeds to a traveling wave at high speeds to achieve energy-efficient climbing for each slope angle. In a real-robot experiment on the steepest slope (30 degrees), we demonstrated that the robot can reduce energy consumption by 7% compared to climbing with a constant body movement, owing to the transition gradient from standing to traveling waves at an optimal speed. Our study can thus pave the way for the development of climbing robots that utilize multiple body-movement patterns with smooth transitions. Moreover, it makes a valuable contribution to biology by formulating a novel hypothesis concerning the energy efficiency of gecko climbing.
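The transition gradient can be pictured as a blend between a standing and a traveling body wave. The sketch below uses an assumed wave equation with illustrative amplitude, wavenumber, and frequency; the robot's actual gait parameters come from the paper's search, not from this toy.

```python
# A minimal sketch of a standing-to-traveling body-wave transition. With
# blend parameter p in [0, 1], p = 0 gives a pure standing wave and p = 1 a
# pure traveling wave; intermediate p realizes the "transition gradient".
import numpy as np

def body_wave(s, t, p, A=1.0, k=2*np.pi, omega=2*np.pi):
    """Lateral displacement along normalized body coordinate s at time t."""
    standing = np.sin(k * s) * np.cos(omega * t)
    traveling = np.sin(k * s - omega * t)
    return A * ((1 - p) * standing + p * traveling)

s = np.linspace(0.0, 1.0, 11)     # points along the spine
for p in (0.0, 0.5, 1.0):         # 0%, 50%, 100% traveling wave
    print(f"p={p}:", np.round(body_wave(s, t=0.25, p=p), 2))
```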
|
|
10:15-10:30, Paper WeAT13.2 | |
A Multi-Arm Robotic Platform for Scientific Exploration (I) |
|
Marques Marinho, Murilo | The University of Manchester |
Quiroz Omana, Juan Jose | The University of Tokyo |
Harada, Kanako | The University of Tokyo |
Keywords: Robotics and Automation in Life Sciences, Software-Hardware Integration for Robot Systems, Kinematics
Abstract: There are a large number of robotic platforms with two or more arms targeting surgical applications. Despite that, very few groups have employed such platforms for scientific exploration. Possible applications of a multi-arm platform in scientific exploration include the study of the mechanisms of intractable diseases using organoids (i.e., miniature human organs). The study of organoids requires the preparation of a cranial window, which is done by carefully removing an 8 mm patch of the mouse skull. In this work, we present the first prototype of the AI robot science platform for scientific experimentation together with its digital twins, and perform validation experiments under teleoperation. The experiments showcase the dexterity of the platform through peg transfer, gauze cutting, mock experiments using eggs, and the world's first four-hand teleoperated drilling of a cranial window. The digital twins and related control software are freely available for noncommercial use at https://AISciencePlatform.github.io.
|
|
10:30-10:45, Paper WeAT13.3 | |
Astrobee ISS Free-Flyer Datasets for Space Intra-Vehicular Robot Navigation Research |
|
Kang, Suyoung | Sookmyung Women's University |
Soussan, Ryan | Aerodyne Industries |
Lee, Daekyeong | Sookmyung Women's University |
Coltin, Brian | Carnegie Mellon University |
Mora, Andres | NASA Ames Research Center |
Moreira, Marina | Instituto Superior Técnico, Lisbon University |
Browne, Katie | University of Nevada, Reno |
Garcia Ruiz, Ruben | KBR Inc, NASA Ames |
Bualat, Maria | NASA Ames Research Center |
Smith, Trey | NASA Ames Research Center |
Barlow, Jonathan | KBR, Inc |
Benavides, Jose | NASA |
Jeong, Eunju | Sookmyung Women's University |
Kim, Pyojin | Gwangju Institute of Science and Technology (GIST) |
Keywords: Space Robotics and Automation, Data Sets for SLAM, Autonomous Vehicle Navigation
Abstract: We present the first annotated benchmark datasets for evaluating free-flyer visual-inertial localization and mapping algorithms in a zero-g spacecraft interior. The Astrobee free-flying robots that operate inside the International Space Station (ISS) collected the datasets. Space intra-vehicular free-flyers face unique localization challenges: their IMU does not provide a gravity vector, their attitude is fully arbitrary, and they operate in a dynamic, cluttered environment. We extensively evaluate state-of-the-art visual navigation algorithms on these challenging Astrobee datasets, showing superior performance of classical geometry-based methods over recent data-driven approaches. The datasets include monocular images and IMU measurements, with multiple sequences performing a variety of maneuvers and covering four ISS modules. The sensor data is spatio-temporally aligned, and extrinsic/intrinsic calibrations, ground-truth 6-DoF camera poses, and detailed 3D CAD models are included to support evaluation. The datasets are available at: https://astrobee-iss-dataset.github.io/.
|
|
10:45-11:00, Paper WeAT13.4 | |
Transformable Nano Rover for Space Exploration |
|
Hirano, Daichi | Japan Aerospace Exploration Agency |
Inazawa, Mariko | Japan Aerospace Exploration Agency |
Sutoh, Masataku | Japan Aerospace Exploration Agency |
Sawada, Hirotaka | JAXA |
Kawai, Yuta | Japan Aerospace Exploration Agency |
Nagata, Masaharu | Sony Group Corporation |
Sakoda, Gen | Sony Group Corporation |
Yoneda, Yousuke | TAKARATOMY |
Watanabe, Kimitaka | Doshisha University |
Keywords: Space Robotics and Automation, Wheeled Robots, Computer Vision for Automation
Abstract: This letter introduces a novel nano rover designed to transform its shape for efficient movement on the lunar surface. The rover, resembling a compact ball, has a diameter of roughly 80 mm and a mass of around 250 g. Its transformation mechanism allows for compactness during planetary transportation, with enhanced mobility achieved through the use of extendable wheels, a tail stabilizer, and cameras. To traverse soft terrain efficiently, the rover utilizes an eccentric wheel mechanism, offering two distinct movement modes based on wheel synchronization. This mechanism provides a locomotion velocity of 20 mm/s or more on a flat surface. Moreover, the rover features onboard image processing to detect spacecraft shielded by Multi-Layer Insulation (MLI) films, facilitating autonomous control and selective image transmission. The rover has been deployed in a real space mission, having been mounted on a lunar lander. This letter presents the design specifics of this transformable rover and results from field tests simulating lunar conditions. These tests affirmed the efficacy of the proposed motion mechanism and onboard image processing.
|
|
WeAT14 |
Room 14 |
Terrestrial Navigation |
Regular session |
Chair: Lim, Yongseob | DGIST |
Co-Chair: Karki, Hamad | Khalifa University |
|
10:00-10:15, Paper WeAT14.1 | |
Horizontal Attention Based Generation Module for Unsupervised Domain Adaptive Stereo Matching |
|
Wang, Sungjun | DGIST |
Seo, Junghyun | DGIST |
Jeon, Hyeonjae | Daegu Gyeongbuk Institute of Science and Technology (DGIST) |
Lim, Sungjin | Daegu Gyeongbuk Institute of Science and Technology (DGIST) |
Park, Sang Hyun | DGIST |
Lim, Yongseob | DGIST |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception
Abstract: The emergence of convolutional neural networks (CNNs) has led to significant advancements in various computer vision tasks. Among them, stereo matching is one of the most important research areas, as it enables the reconstruction of depth and 3D information that is difficult to obtain with only a monocular camera. However, CNNs have their limitations, particularly their susceptibility to domain shift. State-of-the-art CNN-based stereo matching networks suffer from performance degradation under domain changes. Moreover, obtaining a significant amount of real-world ground-truth data to address these issues is laborious and costly compared to acquiring synthetic ground-truth data. In this paper, we propose an end-to-end framework that utilizes image-to-image translation to overcome the domain gap in stereo matching. Specifically, we propose a horizontal attentive generation (HAG) module that incorporates the epipolar constraint on content when generating target-stylized left-right views. By employing a horizontal attention mechanism during the generation process, our method addresses the issues caused by small receptive fields, aggregating more information from each view without using the entire feature map. Our network can therefore maintain consistency between the left and right views during image generation, making it more robust across different datasets.
|
|
10:15-10:30, Paper WeAT14.2 | |
BeautyMap: Binary-Encoded Adaptable Ground Matrix for Dynamic Points Removal in Global Maps |
|
Jia, Mingkai | The Hong Kong University of Science and Technology |
Zhang, Qingwen | KTH Royal Institute of Technology |
Yang, Bowen | The Hong Kong University of Science and Technology, Robotics Ins |
Wu, Jin | HKUST |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Jensfelt, Patric | KTH - Royal Institute of Technology |
Keywords: Object Detection, Segmentation and Categorization, Mapping, Autonomous Agents
Abstract: Global point clouds that correctly represent static environment features can facilitate accurate localization and robust path planning. However, dynamic objects introduce undesired `ghost' tracks that are mixed up with the static environment. Existing dynamic-point removal methods typically fail to balance computational efficiency and accuracy. In response, we present `BeautyMap' to efficiently remove dynamic points while retaining static features for high-fidelity global maps. Our approach utilizes a binary-encoded matrix to efficiently extract environment features. With a bit-wise comparison between the matrices of each frame and the corresponding map region, we can extract potentially dynamic regions. We then use a coarse-to-fine hierarchical segmentation of the z-axis to handle terrain variations. A final static-restoration module accounts for the range visibility of each scan and protects static points that are out of sight. Comparative experiments underscore BeautyMap's superior accuracy and efficiency against other dynamic-point removal methods. The code is open-sourced at https://github.com/HKUSTGZ-IADC/BeautyMap.
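The bit-wise idea can be sketched in a few lines: pack each (x, y) cell's z-axis occupancy into the bits of an integer, so a single AND with the negated map column flags z-bins occupied in the scan but free in the map. This mirrors only the encoding concept; BeautyMap's ground segmentation and visibility-based static restoration are not reproduced here.

```python
# Illustrative sketch of binary-encoded occupancy comparison for dynamic-
# point detection (concept only, not the full BeautyMap pipeline).
def encode_column(z_occupied, n_bits=32):
    """Pack a boolean z-column (length <= n_bits) into one integer."""
    bits = 0
    for i, occ in enumerate(z_occupied[:n_bits]):
        if occ:
            bits |= 1 << i
    return bits

map_col  = encode_column([1, 1, 0, 0, 1, 0, 0, 0])  # static map column
scan_col = encode_column([1, 1, 1, 0, 1, 0, 1, 0])  # current scan column

dynamic_bits = scan_col & ~map_col  # occupied now but not in the map
print(f"dynamic z-bins: {dynamic_bits:08b}")  # bits 2 and 6 flagged
```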
|
|
10:30-10:45, Paper WeAT14.3 | |
Under-Canopy Navigation Using Aerial Lidar Maps |
|
Carvalho de Lima, Lucas | The University of Queensland |
Lawrance, Nicholas | CSIRO Data61 |
Khosoussi, Kasra | The Commonwealth Scientific and Industrial Research Organisation (CSIRO) |
Borges, Paulo Vinicius Koerich | CSIRO |
Bruenig, Michael | The University of Queensland |
Keywords: Field Robots, Mapping, Robotics and Automation in Agriculture and Forestry
Abstract: Autonomous navigation in unstructured natural environments poses a significant challenge. In goal navigation tasks without prior information, the limited look-ahead of onboard sensors utilised by robots compromises path efficiency. We propose a novel approach that leverages an above-the-canopy aerial map for improved ground robot navigation. Our system utilises aerial lidar scans to create a 3D probabilistic occupancy map, uniquely incorporating the uncertainty in the aerial vehicle’s trajectory for improved accuracy. Novel path planning cost functions are introduced, combining path length with obstruction risk estimated from the probabilistic map. The D* Lite algorithm then calculates an optimal (minimum-cost) path to the goal. This system also allows for dynamic replanning upon encountering unforeseen obstacles on the ground. Extensive experiments and ablation studies in simulated and real forests demonstrate the effectiveness of our system.
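A minimal sketch of a risk-aware edge cost in the spirit of the paper's planner (the exact cost functions and the D* Lite machinery are not reproduced): combine metric edge length with a penalty derived from the destination cell's occupancy probability in the aerial map. The weight and probabilities below are assumptions.

```python
# Illustrative risk-aware edge cost over a probabilistic occupancy map.
import math

def edge_cost(a, b, p_occ, risk_weight=5.0):
    """Length of edge (a, b) plus an obstruction-risk penalty from the
    occupancy probability of the destination cell (p_occ: cell -> [0, 1])."""
    length = math.dist(a, b)
    risk = -math.log(max(1e-6, 1.0 - p_occ.get(b, 0.5)))  # -log(free prob)
    return length + risk_weight * risk

p_occ = {(1, 0): 0.05, (1, 1): 0.60}
print(edge_cost((0, 0), (1, 0), p_occ))  # low-risk cell: ~1.26
print(edge_cost((0, 0), (1, 1), p_occ))  # high-risk cell: ~6.0
```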
|
|
WeBT1 |
Room 1 |
Best Industrial Robotics Research for Application Papers (Mujin Inc.) |
Regular session |
Co-Chair: Nakamura, Taro | Chuo University |
|
11:00-11:15, Paper WeBT1.1 | |
Peristaltic Soft Robot for Long-Distance Pipe Inspection with an Endoskeletal Structure for Propulsion and Traction Amplification |
|
Okuma, Ryusei | Chuo University |
Naruse, Yuta | Chuo University |
Ito, Fumio | Chuo University |
Nakamura, Taro | Chuo University |
Keywords: Biologically-Inspired Robots, Soft Robot Applications
Abstract: This study proposes a peristaltic motion-type inspection robot equipped with a “linear antagonistic mechanism using artificial muscles with an endoskeletal structure” to amplify propulsion and traction. We sought to develop an in-pipe inspection robot for long, narrow, and complex pipes that requires large propulsion, traction, and flexibility. In a previous study, we proposed a linear antagonistic mechanism allowing the inspection robot to generate both high propulsion and traction along with flexibility in narrow pipes. The mechanism consisted of two extension actuators and a gripping actuator sandwiched between them. The large extension force generated by the extension actuators is distributed to both propulsion and traction. However, owing to the piston-shaped configuration of the extension actuators, the generated force decreased with the cross-sectional area within narrow pipelines. Therefore, the in-pipe inspection robot took a long time to move through long-distance, small-diameter pipes with multiple bends. This paper describes a “linear antagonistic mechanism using artificial muscles with an endoskeletal structure” that amplifies propulsion and traction by inserting a tension spring (skeleton) inside the contraction actuators (artificial muscles) and utilizing the action force generated by the actuator and transmitted by the tension spring. The developed robot with an endoskeleton exhibited a maximum propulsion of 60.2 N, surpassing its non-endoskeleton counterpart by a factor of 1.61. Furthermore, the robot equipped with the endoskeleton passed through an elbow pipe 1.29 times faster than the robot without it, reducing the time from 741 to 576 s. The function value comparing propulsion and traction, which accounts for the effects of the applied pressure and the pipe diameter required for long-distance inspection, was more than 1.13 times that of the previous study. In addition, the non-dimensionalized traction was 1.55 times greater than that of any other pipe inspection robot, and the propulsion was large enough to pass through a bent pipe. These results indicate the feasibility of the developed robot for inspecting long, narrow, and complex pipes.
|
|
11:15-11:30, Paper WeBT1.2 | |
A Robust and Efficient Robotic Packing Pipeline with Dissipativity-Based Adaptive Impedance-Force Control |
|
Zhou, Zhenning | Shanghai Jiao Tong University |
Zhou, Lei | National University of Singapore |
Sun, Shengxin | Shanghai Jiao Tong University |
Ang Jr, Marcelo H | National University of Singapore |
Keywords: Manipulation Planning, Robust/Adaptive Control
Abstract: For humans, dense bin packing relies heavily on force perception. However, current robotic packing studies focus only on visual input or adopt auxiliary push-to-place actions to eliminate gaps, suffering from high time expenditure and poor robustness. To address these limitations, we first introduce a novel external force estimation method based on the generalized momentum observer, which avoids the influence of joint acceleration noise and achieves real-time, high-precision monitoring. Second, to obtain compliant interaction and strong robustness, an adaptive variable impedance policy is developed to track dynamic motion and desired force and to compensate for uncertainties. Meanwhile, we perform a dissipativity analysis, and a virtual energy supply function is incorporated into the system for optimization, providing a solid foundation for stability. Third, we propose an efficient packing methodology with three subtasks that accounts for the distinct interaction and constraint states in different areas. Our packing strategies eliminate the need for subsequent auxiliary actions and are proven to enhance efficiency. We perform quantitative evaluations to verify our external force estimation method, conduct comparison studies with current packing methods, and investigate the contribution of our dissipativity-based adaptive controller. The superior results not only prove the robustness and efficiency of our pipeline but also pave the way for practical packing applications.
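The external force estimator builds on the classical generalized momentum observer; a discrete-time sketch of that standard construction follows. The gain K, time step, and dynamics terms are placeholders, not the paper's identified model; the residual r converges to the external joint torque.

```python
# Discrete-time sketch of a generalized momentum observer:
#   r = K * (p - integral), integral' = tau + C(q,qd)^T qd - g(q) + r,
# where p = M(q) @ qdot is the generalized momentum.
import numpy as np

def make_momentum_observer(n_joints, K=50.0):
    integral = np.zeros(n_joints)  # running integral of the estimated p-dot
    def step(p, tau, coriolis_T_qdot, gravity, dt):
        nonlocal integral
        r = K * (p - integral)
        integral = integral + (tau + coriolis_T_qdot - gravity + r) * dt
        return r                   # estimate of the external torque tau_ext
    return step

obs = make_momentum_observer(n_joints=2)
# p = M(q) @ qdot would be computed from the robot's dynamic model.
r = obs(p=np.array([0.10, 0.00]), tau=np.zeros(2),
        coriolis_T_qdot=np.zeros(2), gravity=np.zeros(2), dt=0.001)
print("estimated external torque:", r)
```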
|
|
11:30-11:45, Paper WeBT1.3 | |
Harnessing with Twisting: Single-Arm Deformable Linear Object Manipulation for Industrial Harnessing Task |
|
Zhang, Xiang | University of California, Berkeley |
Lin, Hsien-Chung | FANUC Corporation |
Zhao, Yu | FANUC America Corporation |
Tomizuka, Masayoshi | University of California |
Keywords: Assembly, Force and Tactile Sensing, Industrial Robots
Abstract: Wire-harnessing tasks pose great challenges for robotic automation due to the complex dynamics and unpredictable behavior of the deformable wire. Traditional methods, often reliant on dual robot arms or tactile sensing, face limitations in adaptability, cost, and scalability. This paper introduces a novel single-robot wire-harnessing pipeline that leverages a robot's twisting motion to generate the wire tension necessary for precise insertion into clamps, using only one robot arm with an integrated force/torque (F/T) sensor. Benefiting from this design, a single robot arm can efficiently apply tension for wire routing and insertion into clamps in a narrow space. Our approach is structured around four principal components: a Model Predictive Control (MPC) scheme based on the Koopman operator for tension tracking and wire following, a motion planner for sequencing harnessing waypoints, a suite of insertion primitives for clamp engagement, and a fix-point switching mechanism for updating wire constraints. Evaluated on an industrial-level wire-harnessing task, our method demonstrated superior performance and reliability over conventional approaches, efficiently handling both single- and multiple-wire configurations with high success rates.
|
|
11:45-12:00, Paper WeBT1.4 | |
Beyond Feasibility: Efficiently Planning Robotic Assembly Sequences That Minimize Assembly Path Lengths |
|
Cebulla, Alexander | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Kroeger, Torsten | Intrinsic Innovation LLC |
Keywords: Assembly, Intelligent and Flexible Manufacturing
Abstract: Advancements in Industry 4.0 demand sophisticated solutions for automatic robotic assembly sequence planning (RASP), capable of handling the diversity and complexity of modern manufacturing tasks. One approach to RASP is Assembly-by-Disassembly (AbD). It first searches for a disassembly sequence that is then inverted to obtain an assembly sequence. One of the challenges of AbD, however, is the exponential number of potential assembly sequences for any given assembly. To mitigate this challenge, we propose to transfer knowledge obtained during previous planning attempts. Specifically, we present an approach that combines Monte Carlo Tree Search (MCTS) with deep Q-learning to optimize the total length of robotic assembly paths. We use a graph-based representation of disassembly states in combination with a graph neural network to learn the Q-function. We further discuss a principled approach to generate 3D assemblies out of aluminium profiles that a single robot manipulator can assemble. With this approach, we generated two datasets consisting of 14 assemblies with 21 removable parts and 7 assemblies with 30 removable parts. Using leave-one-out cross-validation, we were able to demonstrate how our approach outperformed an unmodified MCTS. Moreover, we successfully transferred knowledge between datasets.
|
|
WeBT2 |
Room 2 |
Best Robot Mechanisms and Design Papers (ROBOTIS) |
Regular session |
|
11:00-11:15, Paper WeBT2.1 | |
A Novel Vitreoretinal Surgical Robot System to Maximize the Internal Reachable Workspace and Minimize the External Link Motion |
|
Jeong, Gowoon | Chonnam National University |
Ko, Seong Young | Chonnam National University |
Keywords: Medical Robots and Systems, Surgical Robotics: Steerable Catheters/Needles, Mechanism Design
Abstract: This paper presents a novel robotic system designed for efficient minimally invasive eye surgery. The proposed prototype integrates a concentric tube mechanism (CTM) and a belt-driven remote center of motion (RCM) mechanism, aiming to maximize the internal reachable workspace while minimizing external robot movements. The integrated system provides several advantages, including preventing collisions between surgical tools and the lens, minimizing scleral stress, and providing efficient robot motion inside and outside the eyeball. It provides sufficient link motion with roll and pitch angles of ±32° and ±85°, respectively, at the RCM point, allowing access to 89% of the retina. Experiments evaluate the system's performance, with an RCM point accuracy of 0.718 mm, a CTM position accuracy of 207 µm, and a repeatability error of 246 µm. To reduce hysteresis errors at the RCM point caused by the belt, a lever-based belt tensioner is used for initial calibration, while an optical tracking system tracks each joint's movement. Targeting experiments highlight that a wider workspace is achieved by the CTM+RCM system compared to the traditional RCM mechanism with a straight tool. The results show the system's compactness, efficiency, and dexterity, confirming the feasibility and potential of the proposed eye surgery robot.
|
|
11:15-11:30, Paper WeBT2.2 | |
Multistable Soft Actuator for Physical Human-Robot Interaction |
|
Long, Juncai | Zhejiang University |
Li, Jituo | Zhejiang University |
Diao, Xiaojie | Zhejiang University |
Zhou, Chengdi | ZheJiang University |
Lu, GuoDong | Zhejiang University |
Feng, Yixiong | Zhejiang University |
Keywords: Physical Human-Robot Interaction, Soft Robot Materials and Design, Haptics and Haptic Interfaces
Abstract: Collaboration with robots through physical contact offers a more intuitive, natural, and engaging operational experience, showcasing vast potential in the field of human-robot interaction. However, current physical interaction devices, such as collaborative robots and haptic feedback mechanisms, are limited by their singular modes of motion and feedback, hindering improvements in the interaction experience. Herein, we present a multistable soft actuator capable of driving multimodal shape changes and passively conforming to user touch. This actuator can memorize and maintain any deformation with zero power consumption. Its structural mechanical properties can be dynamically adjusted to produce rich haptic feedback for the user, including changes in shape, elasticity, and stiffness, and even sensations of rupture and weightlessness. Structurally, the mechanism consists of a network of pneumatic bistable units in series and parallel configurations, which can switch states under air pressure or external force, achieving extension, contraction, and omnidirectional bending. The input air pressure can either impede or assist deformation, altering structural stiffness and resulting in varied loading curves. With its high safety in physical interactions, robust operability, and rich mechanical tactile feedback, the multistable soft actuator promises new design directions for physical human-robot interaction devices.
|
|
11:30-11:45, Paper WeBT2.3 | |
Development of a Compact Robust Passive Transformable Omni-Ball for Enhanced Step-Climbing and Vibration Reduction |
|
Hongo, Kazuo | Sony Group Corporation |
Kito, Takashi | Sony Group Corporation |
Kamikawa, Yasuhisa | Sony Group Corporation |
Kinoshita, Masaya | Sony Group Corporation |
Kawanami, Yasunori | Sony Group Corporation |
Keywords: Wheeled Robots, Field Robots, Mechanism Design
Abstract: This paper introduces the Passive Transformable Omni-Ball (PTOB), an advanced omnidirectional wheel engineered to enhance step-climbing performance, incorporate built-in actuators, diminish vibrations, and fortify structural integrity. By modifying the omni-ball's structure from two to three segments, we have achieved improved in-wheel actuation and a reduction in vibrational feedback. Additionally, we have implemented a sliding mechanism in the follower wheels to boost the wheel's step-climbing abilities. A prototype with a 127 mm diameter PTOB was constructed, which confirmed its functionality for omnidirectional movement and internal actuation. Compared to a traditional omni-wheel, the PTOB demonstrated a comparable level of vibration while offering superior capabilities. Extensive testing in varied settings showed that the PTOB can adeptly handle step obstacles up to 45 mm, equivalent to 35% of the wheel's diameter, in both the forward and lateral directions. The PTOB showcased robust construction and proved to be versatile in navigating through environments with diverse obstacles.
|
|
11:45-12:00, Paper WeBT2.4 | |
BaRiFlex: A Robotic Gripper with Versatility and Collision Robustness for Robot Learning |
|
Jeong, Gu-Cheol | University of Texas at Austin |
Bahety, Arpit | Columbia University |
Pedraza, Gabriel | The University of Texas at Austin |
Deshpande, Ashish | The University of Texas |
Martín-Martín, Roberto | University of Texas at Austin |
Keywords: Grippers and Other End-Effectors, Compliant Joints and Mechanisms, Grasping
Abstract: We present a new approach to robot hand design specifically suited to enable robot learning methods and daily tasks in human environments. We introduce BaRiFlex, an innovative gripper design that alleviates the issues caused by unexpected contact and collisions during robot learning, offering collision mitigation, grasping versatility, task versatility, and simplicity in the learning process. This is enabled by the incorporation of low-inertia actuators, providing high Back-drivability, and the strategic combination of Rigid and Flexible materials, which enhances versatility and the gripper's resilience against unpredicted collisions. Furthermore, the integration of flexible Fin-Ray and rigid linkages allows the gripper to execute both compliant grasping and precise pinching. We conducted rigorous performance tests to characterize the novel gripper's compliance, durability, grasping and task versatility, and precision. We also integrated the BaRiFlex with a 7-Degree-of-Freedom (DoF) Franka Emika Panda robotic arm to evaluate its capacity to support a trial-and-error (reinforcement learning) training procedure. The results of our experimental study are then compared to those obtained using the original rigid Franka Hand and a reference Fin-Ray soft gripper, demonstrating the superior capabilities and advantages of our developed gripper system. More information and videos are available at https://robin-lab.cs.utexas.edu/bariflex.
|
|
WeBT3 |
Room 3 |
Manipulation and Grasping II |
Regular session |
Chair: Tzes, Anthony | New York University Abu Dhabi |
Co-Chair: Khorrami, Farshad | New York University Tandon School of Engineering |
|
11:00-11:15, Paper WeBT3.1 | |
On the Generality and Application of Mason’s Voting Theorem to Center of Mass Estimation for Pure Translational Motion (I) |
|
Gao, Ziyan | Japan Advanced Institute of Science and Technology |
Elibol, Armagan | Forschungszentrum Jülich GmbH |
Chong, Nak Young | Japan Advanced Institute of Science and Technology |
Keywords: Non-prehensile Manipulation, Calibration and Identification, Manipulation Planning, Foundations of Automation
Abstract: Object rearrangement is widely demanded in many manipulation tasks performed by industrial and service robots. Rearranging an object through planar pushing is considered energy-efficient and safer than pick-and-place operations. However, due to the unknown physical properties of the object, rearranging an object toward a target position is difficult to accomplish. Even though robots can benefit from multi-modal sensory data for estimating novel object dynamics, the exact estimation error bound is still unknown. In this work, we first demonstrate a way to obtain an error bound on the center of mass (CoM) estimate for a novel object using only a position-controlled robot arm and a vision sensor. Specifically, we extend Mason's Voting Theorem (MVT) to object CoM estimation in the absence of accurate information on friction and object shape. The probable CoM locations are monotonically narrowed down to a convex region, and the Extended Voting Theorems (EVTs) guarantee that the convex region contains the CoM ground truth in the presence of contact-normal estimation error and pushing execution error. For the object translation task, existing methods generally assume that the pusher-object system's physical properties and full-state feedback are available, or rely on iterative pushing executions, which limits the application of planar pushing in real-world settings. In this work, assuming a nominal friction coefficient between the pusher and object through contact-normal error-bound analysis, we leverage the estimated convex region and the Zero Moment Two Edge Pushing (ZMTEP) method to select contact configurations for pure object translation. The selected contact configurations are guaranteed to tolerate the CoM estimation error. The experimental results show that the object can be accurately translated to the target position with at most two controlled pushes.
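A toy illustration of the monotone narrowing (not the paper's EVT machinery, which also accounts for contact-normal and execution error bounds): each push observation constrains which side of the push line the CoM can lie on, so a candidate set shrinks as observations accumulate. The sign convention and the two pushes below are invented for illustration.

```python
# Toy sketch of narrowing candidate CoM locations by intersecting the
# half-planes implied by observed push outcomes.
import numpy as np

rng = np.random.default_rng(0)
candidates = rng.uniform(-1.0, 1.0, size=(5000, 2))  # initial CoM region

def narrow(candidates, point, direction, rotation_sign):
    """Keep candidates on the side of the push line consistent with the
    observed rotation: sign of cross(direction, com - point)."""
    d = np.asarray(direction) / np.linalg.norm(direction)
    rel = candidates - np.asarray(point)
    cross = d[0] * rel[:, 1] - d[1] * rel[:, 0]
    return candidates[np.sign(cross) == rotation_sign]

# Two hypothetical pushes with observed CCW (+1) and CW (-1) rotations.
candidates = narrow(candidates, (0.0, 0.0), (1.0, 0.0), rotation_sign=+1)
candidates = narrow(candidates, (0.0, 0.0), (0.0, 1.0), rotation_sign=-1)
print(len(candidates), "candidates remain; the region shrinks with each push")
```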
|
|
11:15-11:30, Paper WeBT3.2 | |
Probabilistic Closed-Loop Active Grasping |
|
Schaub, Henry | Hochschule Muenchen University of Applied Sciences |
Wolff, Christian | University of Regensburg |
Hoh, Maximilian | University of Applied Sciences Munich |
Schöttl, Alfred | University of Applied Sciences Munich, Dept. for Electrical Engi |
Keywords: Perception for Grasping and Manipulation, Grasping, Sensor Fusion
Abstract: Picking a specific object is an essential task in assistive robotics. While the majority of grasp detection approaches focus on grasp synthesis from a single depth image or point cloud, this approach is often not viable in an unstructured, uncontrolled environment. Due to occlusion, strong noise, or simply because no collision-free grasp is visible from some perspectives, it is beneficial to collect additional information from other views before opting for grasp execution. We present a closed-loop approach that selects and navigates towards the next best view by minimizing the entropy of the volume under consideration. We use a local measure of the estimation uncertainty of the surface reconstruction to sample grasps and estimate their success probabilities in an online fashion. Our experiments show that our algorithm achieves better grasp success rates than comparable approaches when presented with challenging household objects.
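The view-selection criterion can be sketched with per-voxel Shannon entropy: score each candidate view by the uncertainty of the voxels it would observe and pick the most informative one. The idealized sensor model and the voxel/view numbers are assumptions; the paper's surface-uncertainty grasp sampling is not reproduced.

```python
# Minimal entropy-based next-best-view sketch over voxel occupancy beliefs.
import numpy as np

def voxel_entropy(p):
    """Shannon entropy (bits) of occupancy probabilities p in (0, 1)."""
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def next_best_view(p_occ, views):
    """views: dict mapping view name -> indices of voxels it would observe.
    Assumes an observed voxel's entropy drops to ~0 (idealized sensor)."""
    gains = {v: voxel_entropy(p_occ[idx]).sum() for v, idx in views.items()}
    return max(gains, key=gains.get)

p_occ = np.array([0.5, 0.9, 0.5, 0.1, 0.45])   # current voxel beliefs
views = {"left": np.array([0, 1]), "right": np.array([2, 4])}
print(next_best_view(p_occ, views))  # "right": it sees the most uncertain voxels
```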
|
|
11:30-11:45, Paper WeBT3.3 | |
Pre-Grasp Approaching on Mobile Robots: A Pre-Active Layered Approach |
|
Naik, Lakshadeep | University of Southern Denmark (SDU) |
Kalkan, Sinan | Middle East Technical University |
Krüger, Norbert | University of Southern Denmark |
Keywords: Mobile Manipulation, Reinforcement Learning
Abstract: In Mobile Manipulation (MM), navigation and manipulation are generally solved as sequential, disjoint tasks. Combined optimization of navigation and manipulation costs can improve the time efficiency of MM. However, this is challenging because the precise object pose estimates necessary for such combined optimization are often not available until the later stages of MM. Moreover, optimizing navigation and manipulation costs with conventional planning methods using uncertain object pose estimates can lead to failures and hence requires re-planning. In the presence of object pose uncertainty, pre-active approaches are therefore preferred. We propose such a pre-active approach for determining the base pose and pre-grasp manipulator configuration to improve the time efficiency of MM. We devise a Reinforcement Learning (RL) based solution that learns suitable base poses for grasping and pre-grasp manipulator configurations using layered learning, which guides exploration and enables sample-efficient learning. Further, we accelerate the learning of pre-grasp manipulator configurations by providing dense rewards using a predictor network trained on previously learned base poses for grasping. Our experiments validate that, in the presence of uncertain object pose estimates, the proposed approach reduces execution time. Finally, we show that our policy learned in simulation can be easily transferred to a real robot. The code repository and the supplementary video can be found on the project webpage.
|
|
11:45-12:00, Paper WeBT3.4 | |
Smooth Distances for Second Order Kinematic Robot Control (I) |
|
Gonçalves, Vinicius Mariano | New York University Abu Dhabi, United Arab Emirates |
Tzes, Anthony | New York University Abu Dhabi |
Khorrami, Farshad | New York University Tandon School of Engineering |
Fraisse, Philippe | LIRMM |
Keywords: Motion Control of Manipulators, Optimization and Optimal Control, Obstacle Avoidance, Kinematics
Abstract: In this paper, we propose an algorithm for computing a smoothed version of the distance between two objects. As opposed to the traditional Euclidean distance between two objects, which may not be differentiable, this smoothed distance is guaranteed to be differentiable. Differentiability is an important property in many applications, in particular in robotics, in which obstacle-avoidance schemes often rely on the derivative/Jacobian of the distance between two objects. We prove mathematical properties of this smoothed distance and of the algorithm for computing it, and show its applicability in robotics by applying it to a second order kinematic control framework, also proposed in this paper. The control framework using smooth distances was successfully implemented on a 7 DOF manipulator.
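One common smoothing with the properties the abstract describes is the log-sum-exp softmin over pairwise distances; the sketch below is illustrative and not necessarily the paper's algorithm. It is everywhere differentiable and approaches the true minimum distance from below as the sharpness h grows.

```python
# Illustrative differentiable softmin distance between two point clouds:
#   d_smooth = -(1/h) * log(sum_ij exp(-h * d_ij)).
import numpy as np

def smooth_min_distance(points_a, points_b, h=50.0):
    diff = points_a[:, None, :] - points_b[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1) + 1e-12)  # all pairwise distances
    m = d.min()                               # shift for numerical stability
    return m - np.log(np.exp(-h * (d - m)).sum()) / h

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 2.0], [3.0, 0.0]])
print("true min:", 2.0, " smooth:", round(smooth_min_distance(A, B), 4))
```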
|
|
WeBT4 |
Room 4 |
Soft Robot Materials and Design II |
Regular session |
Chair: Nabae, Hiroyuki | Tokyo Institute of Technology |
Co-Chair: Wakimoto, Shuichi | Okayama University |
|
11:00-11:15, Paper WeBT4.1 | |
A Nitinol-Embedded Wearable Soft Robotic Gripper for Deep-Sea Manipulation (I) |
|
Zuo, Zonghao | Beihang University |
He, Xia | Beihang University |
Wang, Haoxuan | Beihang University |
Shao, Zhuyin | Beihang University |
Liu, Jiaqi | Beihang University |
Zhang, Qiyi | Beihang University |
Pan, Fei | Beihang University |
Wen, Li | Beihang University |
Keywords: Marine Robotics, Soft Robot Materials and Design, Soft Robot Applications
Abstract: Soft robotic gripper systems that can safely and nondestructively collect deep-sea biological and artifact samples and perform deep-sea manipulation tasks are essential for deep-sea science and engineering applications. In this paper, we implement a soft robotic gripper composed of nitinol-embedded soft fingers and an in-situ wearable mechanism that allows the soft gripper to be put on and removed from a traditional rigid gripper according to the deep-sea task. We apply finite element simulation to investigate the influence of the nitinol wires' diameter on the soft finger and then examine the strength and grasping ability. The results indicate that the soft gripper's maximum horizontal and vertical pulling forces reach 75.5 N and 135.7 N, respectively. We show that the gripper can perform nondestructive sampling tasks, including picking and placing fragile porcelain and operating a precision instrument, at depths ranging from 1410 m to 3600 m aboard a human-crewed deep-sea submersible (Deep Sea Warrior). The results of this study may provide new design insights for the creation of next-generation deep-sea intelligent robotic systems that can perform dexterous manipulation.
|
|
11:15-11:30, Paper WeBT4.2 | |
A Novel Hybrid Variable Stiffness Mechanism: Synergistic Integration of Layer Jamming and Shape Memory Polymer |
|
Yu, WenKai | Department of Mechanics and Aerospace Engineering, Southern University of Science and Technology |
Liu, Jingyi | Southern University of Science and Technology |
Li, Xin | Department of Mechanics and Aerospace Engineering, Southern University of Science and Technology |
Yu, Ziyue | Southern University of Science and Technology |
Yuan, Hongyan | Southern University of Science and Technology |
Keywords: Soft Robot Applications, Soft Robot Materials and Design, Grippers and Other End-Effectors
Abstract: Soft robots have garnered considerable attention recently due to their versatility, compliance, and myriad applications. However, the inherently low stiffness of soft robots also limits their stability and force output capability. Hence, variable stiffness technology has emerged as a solution, enabling soft robots to modulate stiffness according to the application scenario. Two primary methods have been developed to regulate stiffness: material phase transition (MPT) based methods and geometric reconfiguration (GR) based methods. However, these approaches have not achieved miniaturization while maintaining a wide range of stiffness change. This work introduces a novel hybrid variable stiffness (HVS) concept that combines the MPT-based and GR-based variable stiffness methods for the first time. Specifically, the HVS structure leverages a shape memory polymer (SMP) and layer jamming (LJ) to obtain a simultaneous response. Bending tests reveal that the compact bi-layer HVS structure achieves a wide stiffness range (0.31 N/mm ~ 4.86 N/mm, 15.7 times) and load-bearing capacity (1.76 N ~ 28.1 N, 16.0 times), which is corroborated by finite element analysis. Response tests show that the jamming response is rapid (~ms) while the maximum heating rate is 2.77 ± 0.16 °C/s, indicating that the HVS structure can achieve a relatively fast response. Furthermore, a soft gripper equipped with the HVS structure is developed to illustrate the enhanced grasping ability. Grasping tests reveal that the variable-stiffness soft gripper can grasp objects of diverse shapes (40.0 mm ~ 190 mm, 4.75 times) and materials while lifting a weight of up to 650 g, providing an effective solution for the complex application requirements of soft robots.
|
|
11:30-11:45, Paper WeBT4.3 | |
A Soft Crawling Robot That Can Self-Repair Material Removal and Deep Lengthwise Cuts, Actuated by Thin McKibben Muscles |
|
Xie, Mengfei | Tokyo Institute of Technology |
Feng, Yunhao | Tokyo Institute of Technology |
Nabae, Hiroyuki | Tokyo Institute of Technology |
Suzumori, Koichi | Tokyo Institute of Technology |
Keywords: Soft Robot Applications, Soft Robot Materials and Design
Abstract: Soft robots are prone to damage when they come into contact with sharp objects, decreasing their functionality. Self-repairing soft robots have great potential to restore functionality after the damage has been repaired. However, for damage where it is difficult to reconnect the cut surfaces, existing self-repairing soft robots often require external intervention to establish contact between the cut surfaces and achieve recovery. This paper proposes a novel self-repairing soft robot composed of thin McKibben muscles and self-healing materials. Experimental validation and mathematical model analysis demonstrate that this robot can self-repair damage on hard-to-reconnect cut surfaces, such as material removal and deep lengthwise cuts, by actuating the thin McKibben muscles in a designed sequence. Furthermore, bending and crawling experiments confirm that this robot exhibits robust self-repair properties. The recovery process is achieved without external intervention and shows potential to be extrapolated to other systems.
|
|
11:45-12:00, Paper WeBT4.4 | |
Experimental Validation of a 7-DOF Power Soft Robot Driven by Hydraulic Artificial Muscles |
|
Feng, Yunhao | Tokyo Institute of Technology |
Ide, Tohru | Tokyo Institute of Technology |
Nabae, Hiroyuki | Tokyo Institute of Technology |
Endo, Gen | Tokyo Institute of Technology |
Sakurai, Ryo | Bridgestone Corporation |
Ohno, Shingo | Bridgestone Corporation |
Suzumori, Koichi | Tokyo Institute of Technology |
Keywords: Soft Robot Applications, Soft Robot Materials and Design, Hydraulic/Pneumatic Actuators
Abstract: Hydraulic artificial muscles offer superior performance compared to most pneumatic artificial muscles, but their suitability for multi-DOF robotics remains unverified. We fabricated a 7-DOF power soft robot spanning over 1.5 m using 29 McKibben hydraulic artificial muscles. We analyzed the proposed robot's workspace, payload capacity, and compliance based on the properties of the hydraulic muscles and conducted several validation experiments. The robot successfully handled a payload of more than 25 kg at a maximum pressure of 5.0 MPa and exhibited passive compliance ranging from 0.5 to 2.0 mm/N with the valves fully closed. Furthermore, the robot demonstrated strong impact resistance and successfully performed tasks such as concrete chipping. These results demonstrate the capability of muscle-driven robots to perform diverse tasks in a range of industrial environments.
|
|
WeBT5 |
Room 5 |
Robust and Adaptive Control I |
Regular session |
Chair: Kumar, Shivesh | DFKI GmbH |
Co-Chair: Monje, Concepción A. | University Carlos III of Madrid |
|
11:00-11:15, Paper WeBT5.1 | |
Hierarchical Incremental MPC for Redundant Robots: A Robust and Singularity-Free Approach (I) |
|
Wang, Yongchao | Technical University of Munich |
Liu, Yang | Technical University of Munich |
Leibold, Marion | Technische Universität München |
Buss, Martin | Technische Universität München |
Lee, Jinoh | German Aerospace Center (DLR) |
Keywords: Optimization and Optimal Control, Robust/Adaptive Control of Robotic Systems, Redundant Robots, Model Predictive Control
Abstract: This paper presents a model predictive control (MPC) method for redundant robots controlling multiple hierarchical tasks formulated as multi-layer constrained optimal control problems (OCPs). The proposed method, named hierarchical incremental MPC (HIMPC), is robust to dynamic uncertainties, free from kinematic/algorithmic singularities, and capable of handling input and state constraints such as joint torque and position limits. To this end, we first derive robust incremental systems that approximate uncertain system dynamics without computing complex nonlinear functions or identifying model parameters. The constrained OCPs are then cast as quadratic programming problems, resulting in a linear MPC in which dynamically consistent task priority is achieved by deploying equality constraints and optimal control is attained under input and state constraints. Moreover, hierarchical feasibility and recursive feasibility are theoretically proven. Since the computational complexity of HIMPC is drastically lower than that of nonlinear MPC-based methods, it is implemented at a sampling frequency of 1 kHz for physical experiments on redundant manipulator setups, where robustness (high tracking accuracy and enhanced dynamic consistency), admissibility of multiple constraints, and singularity avoidance are demonstrated and compared with state-of-the-art task-prioritized controllers.
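For reference, strict task priority for redundant robots is often realized by nullspace projection, sketched below; HIMPC itself encodes priorities as equality constraints inside a QP-based MPC, which this toy does not reproduce. The Jacobians and task velocities are invented.

```python
# Minimal prioritized (nullspace-projected) kinematic task resolution:
# each lower-priority task acts only in the nullspace of the tasks above it.
import numpy as np

def hierarchical_solve(tasks):
    """tasks: list of (J, xdot_des), highest priority first.
    Returns joint velocities realizing each task without disturbing
    higher-priority tasks."""
    n = tasks[0][0].shape[1]
    qdot = np.zeros(n)
    N = np.eye(n)                       # nullspace of all higher tasks
    for J, xd in tasks:
        JN = J @ N
        qdot = qdot + N @ np.linalg.pinv(JN) @ (xd - J @ qdot)
        N = N @ (np.eye(n) - np.linalg.pinv(JN) @ JN)
    return qdot

J1 = np.array([[1.0, 0.0, 0.0]])        # top-priority task Jacobian
J2 = np.array([[0.0, 1.0, 1.0]])        # secondary task Jacobian
qd = hierarchical_solve([(J1, np.array([0.2])), (J2, np.array([0.1]))])
print(qd, J1 @ qd, J2 @ qd)             # both task velocities achieved
```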
|
|
11:15-11:30, Paper WeBT5.2 | |
Traversability-Aware Adaptive Optimization for Path Planning and Control in Mountainous Terrain |
|
Yoo, Se-Wook | Seoul National University |
Son, E-In | Seoul National University |
Seo, Seung-Woo | Seoul National University |
Keywords: Robust/Adaptive Control, Integrated Planning and Learning, Field Robots
Abstract: Autonomous navigation in extreme mountainous terrain poses challenges due to the presence of mobility-stressing elements and undulating surfaces, making it particularly difficult compared to conventional off-road driving scenarios. In such environments, estimating traversability solely from exteroceptive sensors often makes the goal unreachable due to the high prevalence of non-traversable areas. In this paper, we treat traversability as a relative value that integrates the robot's internal state, such as speed and torque, so that the robot exhibits resilient behavior and reaches its goal successfully. We separate traversability into apparent traversability and relative traversability, and incorporate this distinction into the optimization process of sampling-based planning and motion predictive control. Our method enables robots to execute the desired behaviors more accurately while avoiding hazardous regions and avoiding getting stuck. Experiments conducted in simulation on 27 diverse types of mountainous terrain and in the real world demonstrate the robustness of the proposed framework, with increasingly better performance observed in more complex environments.
|
|
11:30-11:45, Paper WeBT5.3 | |
Neural-FxSMC: A Robust Adaptive Neural Fixed-Time Sliding Mode Control for Quadrotors with Unknown Uncertainties |
|
Yogi, Subhash Chand | Indian Institute of Technology - Kanpur |
Behera, Laxmidhar | IIT Kanpur |
Tripathy, Twinkle | IIT Bombay |
Keywords: Robust/Adaptive Control, Aerial Systems: Applications
Abstract: This paper presents Neural-FxSMC, a robust and precise control scheme for quadrotors to counter unknown dynamics, uncertainties, and external disturbances. Neural-FxSMC (i) simultaneously addresses fixed-time convergence of the tracking error, control singularity, and chattering, which is not possible with existing Fixed-time Sliding Mode Control (FxSMC), and (ii) relaxes the a priori bound assumption on the uncertainties, which are often assumed to have a constant or state-dependent upper bound. Fixed-time convergence of the tracking error is guaranteed by establishing fixed-time convergence of the Non-singular Fast Terminal Sliding Surface (NFTSS), contrary to existing works where NFTSS convergence depends on initial conditions. Chattering is suppressed via Radial Basis Function Network (RBFN) based uncertainty estimation. Finally, using Lyapunov theory, we prove the fixed-time convergence and boundedness of the Neural-FxSMC weights. We comprehensively evaluate Neural-FxSMC in challenging scenarios such as unknown payloads and turbulent wind. Apart from handling unknown dynamics and uncertainties, Neural-FxSMC also offers direct gravity compensation without using the quadrotor's mass or gravity.
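The chattering-suppression idea can be sketched with a small RBF network whose weights adapt in proportion to the sliding variable, replacing the discontinuous switching gain (a chattering source) with a smooth learned compensation. Everything below, including the adaptation gain, basis centers, and the scalar example, is an assumption for illustration, not the paper's design.

```python
# Hedged sketch of RBFN-based disturbance estimation for sliding mode control.
import numpy as np

class RBFNDisturbanceEstimator:
    def __init__(self, centers, width=1.0, gamma=5.0):
        self.c = np.asarray(centers)     # basis centers over the state space
        self.w = np.zeros(len(centers))  # adaptive weights
        self.width, self.gamma = width, gamma

    def phi(self, x):
        """Gaussian basis activations for state x."""
        return np.exp(-np.sum((x - self.c) ** 2, axis=1) / self.width**2)

    def update(self, x, s, dt):
        """Adaptation law w' = gamma * phi(x) * s (s: sliding variable);
        returns the current disturbance estimate w^T phi(x)."""
        self.w += self.gamma * self.phi(x) * s * dt
        return self.w @ self.phi(x)

est = RBFNDisturbanceEstimator(centers=[[0.0], [0.5], [1.0]])
x, s = np.array([0.4]), 0.2              # state and sliding variable
d_hat = est.update(x, s, dt=0.001)
print("disturbance estimate:", d_hat)    # feeds the smooth control term
```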
|
|
11:45-12:00, Paper WeBT5.4 | |
An Open Source Dual Purpose Acrobot and Pendubot Platform for Benchmarking Control Algorithms for Underactuated Robotics (I) |
|
Wiebe, Felix | DFKI GmbH Robotics Innovation Center |
Kumar, Shivesh | DFKI GmbH |
Shala, Lasse | Deutsches Forschungszentrum Für Künstliche Intelligenz |
Vyas, Shubham | Robotics Innovation Center, DFKI GmbH |
Javadi, Mahdi | German Research Center for Artificial Intelligence, Robotics Innovation Center |
Kirchner, Frank | University of Bremen |
Keywords: Robust/Adaptive Control, Software Tools for Benchmarking and Reproducibility, Optimization and Optimal Control
Abstract: This paper presents an open-source and low-cost test bench for validating, comparing and benchmarking the performance of control algorithms for underactuated robots with strong non-linear dynamics. It introduces a double pendulum platform built using two off-the-shelf quasi-direct drives (QDDs). Due to the low friction and high mechanical transparency offered by QDDs, one of the actuators can be kept passive and used as an encoder, so that the system can be operated as a double pendulum, a pendubot or an acrobot without changing the hardware. Using the proposed platform, trajectory optimization and control algorithms for the swing-up and upright stabilization of the acrobot and pendubot systems are compared and benchmarked. By considering simple variations of the design, the difficulty of the control problem can be varied, giving researchers the opportunity to show the robustness of their control algorithms.
|
|
WeBT6 |
Room 6 |
Mechanism Design I |
Regular session |
Chair: Stefanini, Cesare | Scuola Superiore Sant'Anna |
|
11:00-11:15, Paper WeBT6.1 | |
MTABot: An Efficient Morphable Terrestrial-Aerial Robot with Two Transformable Wheels |
|
Shi, Ke | Harbin Institute of Technology |
Jiang, Zainan | State Key Laboratory of Robotics and System, Harbin Institute Of |
Ma, Liyan | Harbin Institute of Technology |
Qi, Le | Harbin Institute of Technology |
Jin, Minghe | Harbin Institute of Technology |
Keywords: Mechanism Design, Aerial Systems: Mechanics and Control
Abstract: Terrestrial-aerial robots, capable of swift aerial navigation and enduring terrestrial operation, hold significant potential for exploration and rescue missions. However, enabling them to negotiate diverse terrains with a power-efficient structure remains a formidable challenge. This paper presents a morphable terrestrial-aerial robot, named MTABot, which achieves three modalities through two multifunctional appendages: (1) rolling mode and (2) climbing mode, both achieved with a transformable two-wheeled configuration, and (3) flying mode, achieved with a bicopter configuration. Moreover, the radius and sector angle of the transformable wheels are optimized to enhance obstacle-climbing capability, and the position of the robot's center of gravity is optimized to balance ground gripping capacity against flight dynamic response speed. Finally, the robot's multi-terrain capability is validated through obstacle-climbing experiments and continuous terrestrial-aerial transformation experiments, and its high power efficiency is affirmed, demonstrating the feasibility of the design.
|
|
11:15-11:30, Paper WeBT6.2 | |
Rail DRAGON: Long-Reach Bendable Modularized Rail Structure for Constant Observation Inside PCV |
|
Yokomura, Ryota | The University of Tokyo |
Goto, Masataka | The University of Tokyo |
Yoshida, Takehito | University of Tokyo |
Warisawa, Shin'ichi | The University of Tokyo |
Hanari, Toshihide | JAEA |
Kawabata, Kuniaki | Japan Atomic Energy Agency |
Fukui, Rui | The University of Tokyo |
Keywords: Mechanism Design, Cellular and Modular Robots, Environment Monitoring and Management
Abstract: To reduce errors in the remote control of robots during decommissioning, we developed Rail DRAGON, which enables continuous observation of the work environment. Rail DRAGON is constructed by assembling and pushing a long rail structure inside the primary containment vessel (PCV) and then repeatedly deploying several monitoring robots on the rails, enabling constant observation in a high-radiation environment. In particular, we have developed the following components of Rail DRAGON: bendable rail modules, straight rail modules, a basement unit, and monitoring robots. Concretely, this research proposes and demonstrates a method to realize an ultralong articulated structure with high portability and workability. In addition, it proposes and verifies the feasibility of a method for installing observation equipment that can be easily deployed and replaced, while also considering disposal.
|
|
11:30-11:45, Paper WeBT6.3 | |
Transformable Inspection Robot Design and Implementation for Complex Pipeline Environment |
|
Wang, Jianlin | Chinese University of Hongkong |
Wang, Yixiang | Rensselaer Polytechnic Institute |
Peng, Lining | The Chinese University of Hong Kong, Shenzhen |
Zhang, Haixiang | The Chinese University of Hong Kong, Shenzhen |
Gao, Hang | The Chinese University of Hong Kong, Shenzhen |
Wang, Chengjiang | The Chinese University of Hong Kong, ShenZhen |
Gao, Yuan | Shenzhen Institute of Artificial Intelligence and Robotics for S |
Luo, Huanliang | Dapeng Customs of the People's Republic of China |
Chen, Yongquan | The Chinese University of Hong Kong, Shenzhen |
Keywords: Mechanism Design, Engineering for Robotic Systems, Surveillance Robotic Systems
Abstract: Pipeline inspections are crucial to ensure the reliability of transmission systems. However, with the growing complexity and aging of pipeline systems, traditional pipeline inspection robots struggle to adapt to complex environments with obstacles, cracks, changing cross-sections, and other challenges. This paper introduces a novel transformable inspection robot with remarkable adaptability to varying pipeline environments with inner diameters from 163 mm to 312 mm. The robot is composed of several motion modules arranged along its central axis at a 60-degree angle. The pneumatically powered robot has good active and passive deformation capabilities, enabling it to passively adapt to its surroundings and actively switch between different postures. The robot can also navigate autonomously in complex pipeline environments using a LiDAR camera. Experiments demonstrate the robot adjusting to varying pipeline scenarios, including obstacles, diameter changes, turns of up to 90 degrees, climbs of up to 45 degrees, and cross-section changes with a deformation rate of up to 191.4%, overcoming the limitations of traditional designs.
|
|
11:45-12:00, Paper WeBT6.4 | |
Enhancing Maximum Stroke of Twisted String Actuators by Adjusting Twisting Ratio |
|
Baek, Seungjoon | Korea Advanced Institute of Science and Technology |
Jang, JaeHyung | Korea Advanced Institute of Science and Technology |
Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
Keywords: Tendon/Wire Mechanism, Soft Robot Applications, Soft Sensors and Actuators
Abstract: In robotics, twisted string actuators (TSAs) have attracted considerable attention owing to their high gear ratio, flexibility, and simplicity. However, TSAs face challenges such as control issues, limited lifespan, and limited stroke. In particular, their practical use is hindered by limited stroke: overtwisting leads to hysteresis, efficiency, and lifespan issues, so TSAs are confined to a narrow stroke range of approximately 30% of the original string length. This study introduces an approach to enhance TSA stroke without introducing overtwisting. The key lies in adjusting the ratio of overlap and individual twisting (ROI) to strategically use the portion of the string that retains twisting capacity. The method builds on the observation that the maximum strokes of two TSA twisting methods, the twisted single string and the twisted looped single string (TLS), are approximately equal, which is attributed to the locking point induced by overlap twisting in TLS. To model this phenomenon mathematically, the study develops a novel kinematic model of the TSA that accounts for the locking point. Additionally, we propose an optimization process achieving a maximum stroke of 53.44% of the original string length, significantly surpassing the conventional limit of 30%. This enhancement is achieved without introducing overtwisting, thereby avoiding hysteresis.
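For context, the classical helix model of TSA kinematics, from which the conventional ~30% stroke limit follows, can be written as below; note this is the textbook model, not the paper's locking-point model:

```python
import numpy as np

def tsa_stroke(theta, L, r):
    """Textbook twisted-string-actuator kinematics (helix model):
    the contracted length is sqrt(L^2 - (theta*r)^2), so the stroke is
    X(theta) = L - sqrt(L^2 - theta^2 * r^2)."""
    x2 = L**2 - (theta * r) ** 2
    if x2 <= 0:
        raise ValueError("theta beyond kinematic limit")
    return L - np.sqrt(x2)

# e.g. a 0.30 m string of 0.5 mm radius twisted by 300 rad:
print(tsa_stroke(300.0, 0.30, 0.0005))  # ~0.040 m, i.e. ~13% of L
```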
|
|
WeBT7 |
Room 7 |
Wearable Robotics |
Regular session |
Chair: Hussain, Irfan | Khalifa University |
|
11:00-11:15, Paper WeBT7.1 | |
A Wearable Finger Tremor-Suppression Orthosis Using the PVC Gel Linear Actuator |
|
Liu, Chen | Queen Mary University of London |
Zhang, Ketao | Queen Mary University of London |
Keywords: Wearable Robotics, Prosthetics and Exoskeletons, Soft Sensors and Actuators
Abstract: Tremor is a prevalent neurological disorder that affects individuals of almost all ages and can significantly impede their quality of life and occupational functioning. Wearable medical devices for suppressing tremors, typically low-frequency vibrations between 3 and 12 Hz, are gaining popularity, since the active vibration absorbers integrated into such devices have demonstrated immediate efficacy and are noninvasive. However, miniaturizing active absorbers for wearable applications is challenging with traditional actuators. To address this problem, we present a lightweight wearable active finger tremor-suppression orthosis (AFTO) consisting of a stacked polyvinyl chloride (PVC) gel actuator-based absorber, an inertial measurement unit (IMU), and a force sensor. The integrated sensors allow the device to detect tremors and trigger the absorber to suppress vibrations, regardless of whether the fingertip is vibrating in the air or applying tremor force while in contact with an object. A 3D-printed compliant Sarrus-mechanism exoskeleton houses the stacked PVC gel actuator, minimizing the linear actuator's sway while maximizing the effective actuation area. This wearable finger tremor absorption system has potential applications in daily life and occupational contexts, such as stabilizing the finger during grasping, typing, operating surgical instruments, and drawing.
|
|
11:15-11:30, Paper WeBT7.2 | |
Novel Lightweight Lower Limb Exoskeleton Design for Single-Motor Sequential Assistance of Knee & Ankle Joints in Real World |
|
Wu, Xinyu | Xi'an Jiaotong University |
Zhu, Aibin | Xi'an Jiaotong University |
Li, Xiao | Rehabilitation Department, Senior Department of Orthopedics, The |
Bao, Bingsheng | Institute of Robotics & Intelligent Systems, Shaanxi Key Laborat |
Zhang, Jing | Xi'an Jiaotong University |
Shi, Lei | Xi'an Jiaotong University |
Diyang, Dang | Xi'an Jiaotong University |
Xu, Peng | Honghui Hospital, Xi'an Jiaotong University |
Keywords: Wearable Robotics, Prosthetics and Exoskeletons
Abstract: In this paper, we introduce a lightweight lower limb exoskeleton that provides auxiliary torque to both the ankle and knee joints during the stance phase of gait in a real-world environment, using a single quasi-direct-drive (QDD) motor. The exoskeleton incorporates a novel driving mechanism, the Unidirectional Ankle-Knee Gait Clutch (UAKC), which sequentially provides auxiliary torque to the knee and ankle joints during the stance phase. We trained a lightweight convolutional neural network to determine gait phases from scanned insole pressure data and generate the corresponding biological torques. We provide a detailed exposition of the design concept and operation of the exoskeleton, and evaluate its performance through a series of experiments. In real-world conditions, it reduced human energy consumption by 8.9±1.3% compared to walking without the exoskeleton. This study demonstrates the potential of this rigid-cable-coupled underactuated exoskeleton for enhancing human movement.
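A minimal sketch of a lightweight 1-D CNN that maps a window of insole-pressure channels to gait-phase classes, in the spirit of the abstract (the architecture, channel counts, and window length are illustrative assumptions, not the authors' network):

```python
import torch
import torch.nn as nn

class GaitPhaseCNN(nn.Module):
    """Lightweight 1-D CNN over a short window of insole-pressure
    channels, producing gait-phase class logits."""
    def __init__(self, n_channels=16, n_phases=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))          # pool over time
        self.head = nn.Linear(64, n_phases)

    def forward(self, x):                     # x: (batch, channels, window)
        return self.head(self.features(x).squeeze(-1))

phase_logits = GaitPhaseCNN()(torch.randn(8, 16, 50))  # -> (8, 4)
```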
|
|
11:30-11:45, Paper WeBT7.3 | |
Advanced Enhanced Control of a Novel Wearable Lower-Limb Exoskeleton |
|
Qiu, Shuang | Beihang University |
Pei, Zhongcai | Beihang University |
Shi, Jia | Beihang University |
Zhang, Xu | Beijing Legendary Soaring Technology Company |
Wang, Chen | Beihang University |
Tang, Zhiyong | Beihang University |
Keywords: Wearable Robotics, Force Control, Physical Human-Robot Interaction
Abstract: In this paper, a novel powered lower limb exoskeleton prototype called PTEXO, designed to reduce user burden and enhance following comfort, is presented. PTEXO uses a new control strategy, Enhanced Sensitivity Amplification Control (ESAC), and improves the comfort of lower-limb locomotion in three ways: obtaining high-quality angular acceleration signals, adjusting sensitivities among different model terms, and increasing continuity during gait phase transitions. This opens a new algorithmic option for improving the comfort of wearable robotic exoskeletons. The mechatronic structure of PTEXO is designed for ESAC, from which dynamic models are established. Finally, wearable experiments validate the proper functioning of the integrated technique, demonstrating the effectiveness of the ESAC strategy in improving PTEXO's smoothness. A user survey illustrates that ESAC can effectively and comfortably assist users with lower limb locomotion.
|
|
11:45-12:00, Paper WeBT7.4 | |
Bio-Inspired Cable-Driven Actuation System for Wearable Robotic Devices: Design, Control and Characterization (I) |
|
Xu, Ming | Peking University |
Zhou, Zhihao | Peking University |
Wang, Zezheng | Peking University |
Ruan, Lecheng | University of California Los Angeles |
Mai, Jingeng | Peking University |
Wang, Qining | Peking University |
Keywords: Wearable Robots, Prosthetics and Exoskeletons, Mechanism Design, Human-Centered Robotics
Abstract: Wearable robotic devices interact with humans by applying assistive force in parallel with muscle-tendon systems. Designing actuation that mimics the natural activation patterns of human muscles is a promising way to optimize the performance of wearable robots. In this paper, we propose a bio-inspired cable-driven actuation system capable of efficiently providing anisometric contraction (including concentric and eccentric contraction) assistance or acting as a nearly transparent device. A novel clutch-spring mechanism accomplishes switching between the assistive modes and the transparent mode. Corresponding control strategies coordinated with the mechanical design are presented and described in detail. Multiple evaluations were conducted on a test bench to characterize system performance: the closed-loop bandwidth under concentric assistance control was 18.2 Hz, the R-squared values of linear fitting under eccentric assistance control were above 0.99, and the engagement time of the proposed clutch was about 90 ms. Applying the actuation to an ankle exoskeleton, walking experiments with electromyography measurements were performed on five subjects to show its potential in existing wearable robots. Experimental results revealed that the proposed design could reduce soleus muscle activity by 27.32% compared with normal walking. This study highlights the importance of functional bionic design in human-assistance devices and introduces a general actuation system that can be directly applied to existing cable-driven wearable robots.
|
|
WeBT8 |
Room 8 |
Localization I |
Regular session |
Chair: Ma, Junyi | Beijing Institute of Technology |
|
11:00-11:15, Paper WeBT8.1 | |
LCPR: A Multi-Scale Attention-Based LiDAR-Camera Fusion Network for Place Recognition |
|
Zhou, Zijie | Beijing Institute of Technology |
Xu, Jingyi | Beijing Institute of Technology |
Xiong, Guangming | Beijing Institute of Technology |
Ma, Junyi | Beijing Institute of Technology |
Keywords: Localization, Sensor Fusion, SLAM
Abstract: Place recognition is one of the most crucial modules for autonomous vehicles, enabling them to identify previously visited places in GPS-denied environments. Sensor fusion is considered an effective way to overcome the weaknesses of individual sensors, and in recent years multimodal place recognition fusing information from multiple sensors has gathered increasing attention. However, most existing multimodal place recognition methods use only limited field-of-view camera images, which leads to an imbalance between features from different modalities and limits the effectiveness of sensor fusion. In this paper, we present a novel neural network named LCPR for robust multimodal place recognition, which fuses LiDAR point clouds with multi-view RGB images to generate discriminative, yaw-rotation-invariant representations of the environment. A multi-scale attention-based fusion module is proposed to fully exploit the panoramic views from the different modalities and their correlations. We evaluate our method on the nuScenes dataset, and the experimental results show that it effectively utilizes multi-view camera and LiDAR data to improve place recognition performance while maintaining strong robustness to viewpoint changes. Our open-source code and pre-trained models are available at https://github.com/ZhouZijie77/LCPR.
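A toy version of attention-based LiDAR-camera fusion, where panoramic camera tokens attend to LiDAR tokens (a sketch of the general mechanism, not LCPR's multi-scale module):

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Camera tokens query LiDAR tokens via multi-head attention,
    followed by a residual connection and layer norm."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cam_feats, lidar_feats):
        fused, _ = self.attn(cam_feats, lidar_feats, lidar_feats)
        return self.norm(cam_feats + fused)

# (batch, tokens, dim) feature sequences from each modality:
fused = CrossModalFusion()(torch.randn(2, 36, 256), torch.randn(2, 64, 256))
```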
|
|
11:15-11:30, Paper WeBT8.2 | |
Robust Cooperative Localization with Failed Communication and Biased Measurements |
|
He, Ronghai | Sun Yat-Sen University |
Shan, Yunxiao | Sun Yat-Sen University |
Huang, Kai | Sun Yat-Sen University |
Keywords: Localization, Multi-Robot Systems, Distributed Robot Systems
Abstract: Cooperative Localization (CL) plays a crucial role in achieving precise localization without relying on dedicated localization sensors. However, the performance of CL can be significantly degraded by failed communication and biased measurements. This paper presents a robust decentralized CL method that addresses these challenges effectively. To tackle communication failures, the proposed method adopts a multi-centralized framework that separates the measurement and communication processes; this decoupling allows each robot to utilize measurement information even in the absence of communication. Additionally, a reasonable state estimation method for the other robots is proposed by approximating the actual input velocity model of the unknown states and then propagating them through the motion model. To handle biased measurements, the method incorporates M-estimation into the measurement update, weighting received measurements according to their reliability and mitigating the impact of biased measurements on estimation accuracy. Simulation experiments validate the effectiveness of the proposed method in challenging scenarios. The source code is publicly available at https://github.com/RonghaiHe/RobustCL.
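A minimal example of the M-estimation idea: a Huber weight down-weights measurements with large normalized residuals before the update (k=1.345 is the standard Huber tuning constant; the usage is illustrative):

```python
def huber_weight(residual, sigma, k=1.345):
    """Huber M-estimator weight for one measurement residual:
    1 inside the k*sigma band, decaying outside, so biased
    measurements contribute less to the update."""
    r = abs(residual) / sigma
    return 1.0 if r <= k else k / r

# Down-weight a relative-range measurement before an EKF-style update:
w = huber_weight(residual=2.4, sigma=0.5)   # ~0.28 -> largely discounted
```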
|
|
11:30-11:45, Paper WeBT8.3 | |
GeoCluster: Enhancing Visual Place Recognition in Spatial Domain on Aerial Vehicle Platforms |
|
Chen, Chao | Beijing University of Chemical Technology |
He, Mengfan | Tsinghua University |
Wang, Jun | Beijing University of Chemical Technology |
Meng, Ziyang | Tsinghua University |
Keywords: Localization, Recognition, Aerial Systems: Perception and Autonomy
Abstract: Visual Place Recognition (VPR) is a critical technology for robust long-term visual geo-localization. In recent years, VPR research has mainly focused on ground-based platforms in street-level scenes using deep learning methods (e.g., NetVLAD, GeM), while little attention has been paid to the VPR task on aerial vehicles; algorithms and models designed for ground-based platforms are usually applied directly to the aerial VPR problem. However, viewpoint variance on Unmanned Aerial Vehicles (UAVs) is much larger than on ground-based platforms. Because aerial image features are sparsely distributed, when the camera viewpoint changes, the features of the query image become largely inconsistent with the descriptors in the database, causing failures in image retrieval and visual geo-localization. In this paper, we propose an aerial VPR enhancement module called GeoCluster, a feature aggregation method that uses spatial clustering information to improve the robustness and consistency of global descriptors for UAV-captured frames. Moreover, it can be applied to any NetVLAD-based VPR method and boosts a pre-trained model without any further training. By integrating GeoCluster into an existing state-of-the-art localization method, we achieve about a 10% improvement on aerial image retrieval tasks and more accurate and robust geo-localization results. To foster future research, we make the code and datasets publicly available at https://github.com/cbbhuxx/GeoCluster.
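One plausible reading of spatial-cluster-aware aggregation, sketched with k-means over keypoint locations and per-cluster pooling (an assumption-laden stand-in, not GeoCluster's actual formulation):

```python
import numpy as np
from sklearn.cluster import KMeans

def geo_aggregate(local_feats, keypoints_xy, n_clusters=8):
    """Group local features by keypoint location, pool per cluster,
    and concatenate, so the global descriptor keeps coarse spatial
    layout information."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(keypoints_xy)
    pooled = [local_feats[labels == c].mean(axis=0) if (labels == c).any()
              else np.zeros(local_feats.shape[1]) for c in range(n_clusters)]
    desc = np.concatenate(pooled)
    return desc / (np.linalg.norm(desc) + 1e-12)   # L2-normalized descriptor
```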
|
|
WeBT9 |
Room 9 |
Motion and Path Planning I |
Regular session |
Co-Chair: Bennewitz, Maren | University of Bonn |
|
11:00-11:15, Paper WeBT9.1 | |
Safe Navigation Using Density Functions |
|
Zheng, Andrew | Clemson University |
Krishnamoorthy Shankara Narayanan, Sriram Sundar | Clemson University |
Vaidya, Umesh | Clemson University |
Keywords: Motion and Path Planning, Collision Avoidance, Task and Motion Planning
Abstract: This paper presents a novel approach to safe control synthesis using the dual formulation of the navigation problem. The main contribution is the analytical construction of density functions for almost-everywhere navigation with safety constraints. In contrast to existing approaches, where density functions are used for the analysis of navigation problems, we use density functions for the synthesis of safe controllers. We provide a convergence proof using the proposed density functions for navigation with safety, and use these density functions to design feedback controllers capable of navigating in cluttered environments and high-dimensional configuration spaces. The proposed analytical construction overcomes the problems associated with navigation functions, which are known to exist but are challenging to construct, and with potential functions, which suffer from local minima. The developed framework is demonstrated on simple integrator dynamics and fully actuated robotic systems. Our project page with the implementation is available at https://github.com/clemson-dira/density_feedback_control
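A simplified sketch of a density of the form rho(x) = Psi(x) / V(x)^alpha, with V the squared distance to the goal and Psi a smooth obstacle indicator, in the spirit of the dual construction (the exact form in the paper differs):

```python
import numpy as np

def density(x, goal, obstacles, alpha=2.0):
    """Density peaks at the goal (V -> 0) and vanishes inside
    obstacles (Psi -> 0); a controller follows its gradient uphill."""
    V = np.sum((x - goal) ** 2)
    psi = 1.0
    for c, r in obstacles:                          # circular (center, radius)
        d2 = max(np.sum((x - c) ** 2), 1e-9)
        psi *= np.clip((d2 - r ** 2) / d2, 0.0, 1.0)
    return psi / max(V, 1e-9) ** alpha

# A feedback direction can be taken as the numerical gradient of density().
```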
|
|
11:15-11:30, Paper WeBT9.2 | |
State-Feedback Optimal Motion Planning in the Presence of Obstacles |
|
Rousseas, Panagiotis | National Technical University of Athens |
Bechlioulis, Charalampos | University of Patras |
Kyriakopoulos, Kostas | New York University - Abu Dhabi |
Keywords: Motion and Path Planning, Optimization and Optimal Control
Abstract: In this work, a solution to the kinematic optimal motion planning problem is presented, where a previous nearly globally optimal approach is extended to workspaces with internal obstacles. The method is inspired by fundamental properties of velocity fields in the presence of obstacles, where topological restrictions inhibit naive approaches. The topological perplexity problem presents itself as a challenging issue for optimal control, even for low-dimensional cases with simple dynamics. Our scheme is formulated such that a locally optimal workspace decomposition enables extracting a close-to-optimal solution. Several synthetic workspace examples are demonstrated, along with comparisons against existing optimal approaches, where our scheme is superior w.r.t. both cost value and execution time.
|
|
11:30-11:45, Paper WeBT9.3 | |
Efficiency Improvement to Neural-Network-Driven Optimal Path Planning Via Region and Guideline Prediction |
|
Huang, Yuan | Waseda University |
Tsao, Cheng Tien | Waseda University |
Lee, Hee-hyol | Waseda University |
Keywords: Motion and Path Planning, AI-Based Methods
Abstract: Traditional sampling-based algorithms rely on random samples to explore the whole configuration space of a robot for optimal path planning, but a uniform sampler impedes exploration with randomly generated samples, leading to long calculation times, especially in complex environments. Recently, neural-network-driven methods have attracted wide interest for developing non-uniform sampling that improves sampling efficiency and reduces calculation time: a region containing an optimal path is predicted by a neural network and then used to generate samples in a biased manner. This work enhances the sampling efficiency and reduces the calculation time of optimal path planning via a novel region and guideline prediction (RGP) model. The RGP model includes a guideline prediction module that estimates guideline distributions, characterized by the central line of the predicted region. The predicted region and guideline are integrated into a sampling-based algorithm, RGP-RRT*, with an adaptively biased sampling strategy that selects a proper domain for sampling. Simulations demonstrate that the RGP model outperforms other region prediction models in accuracy and robustness. Moreover, RGP-RRT* reliably achieves a 7.2-80.1% reduction in calculation time and a 2.0-58.1% reduction in sample number compared with other neural-network-driven methods. The code is available at https://github.com/RTPWDSDM/RGP-RRTstar.
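A minimal sketch of the adaptively biased sampler the abstract describes: samples are drawn near the predicted guideline, inside the predicted region, or uniformly as a fallback (all probabilities and the jitter scale are illustrative assumptions):

```python
import numpy as np

def biased_sample(region_mask, guideline_pts, p_region=0.6, p_guide=0.3):
    """Draw one 2D sample biased toward the predicted guideline/region,
    with a uniform fallback over the (h, w) map."""
    h, w = region_mask.shape
    u = np.random.rand()
    if u < p_guide and len(guideline_pts):
        pt = guideline_pts[np.random.randint(len(guideline_pts))]
        return np.asarray(pt, float) + np.random.normal(0, 2.0, size=2)
    if u < p_guide + p_region:
        ys, xs = np.nonzero(region_mask)
        if len(ys):
            i = np.random.randint(len(ys))
            return np.array([xs[i], ys[i]], dtype=float)
    return np.random.rand(2) * [w, h]   # uniform fallback
```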
|
|
11:45-12:00, Paper WeBT9.4 | |
Spatiotemporal Attention Enhances Lidar-Based Robot Navigation in Dynamic Environments |
|
de Heuvel, Jorge | University of Bonn |
Zeng, Xiangyu | University of Bonn |
Shi, Weixian | University of Bonn |
Sethuraman, Tharun | Hochschule Bonn-Rhein-Sieg |
Bennewitz, Maren | University of Bonn |
Keywords: Motion and Path Planning, Collision Avoidance, Reinforcement Learning
Abstract: Foresighted robot navigation in dynamic indoor environments with cost-efficient hardware necessitates a lightweight yet dependable controller. Inferring the scene dynamics from sensor readings without explicit object tracking is therefore a pivotal aspect of foresighted navigation among pedestrians. In this paper, we introduce a spatiotemporal attention pipeline for enhanced navigation based on 2D lidar sensor readings, complemented by a novel lidar-state representation that emphasizes dynamic obstacles over static ones. The attention mechanism enables selective scene perception across both space and time, improving overall navigation performance in dynamic scenarios. We thoroughly evaluated the approach in different scenarios and simulators, finding excellent generalization to unseen environments. The results demonstrate outstanding performance compared to state-of-the-art methods, enabling seamless deployment of the learned controller on a real robot.
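A hypothetical lidar-state representation that emphasizes dynamic obstacles by appending per-beam temporal differences to a stack of recent scans (an assumption about the general idea, not the authors' exact encoding):

```python
import numpy as np

def dynamic_emphasis(scans):
    """Stack the last k range scans and append per-beam temporal
    differences, so moving obstacles stand out against static
    structure for the downstream attention module."""
    scans = np.asarray(scans)                 # (k, n_beams) range readings
    diffs = np.abs(np.diff(scans, axis=0))    # (k-1, n_beams) motion cue
    return np.concatenate([scans, diffs], axis=0)
```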
|
|
WeBT10 |
Room 10 |
Data Sets for Robotic Vision I |
Regular session |
Chair: Aguiari, Davide | TII |
Co-Chair: Meyer, Lukas | Friedrich-Alexander-Universität Erlangen-Nürnberg |
|
11:00-11:15, Paper WeBT10.1 | |
Race against the Machine: A Fully-Annotated, Open-Design Dataset of Autonomous and Piloted High-Speed Flight |
|
Bosello, Michael | Technology Innovation Institute |
Aguiari, Davide | TII |
Keuter, Yvo | TII |
Pallotta, Enrico | TII |
Kiade, Sara | TII |
Caminati, Gyordan | TII |
Pinzarrone, Flavio | TII |
Halepota, Junaid | TII |
Panerati, Jacopo | Technology Innovation Institute |
Pau, Giovanni | TII - Technology Innovation Institute |
Keywords: Aerial Systems: Perception and Autonomy, Data Sets for Robotic Vision, Software Tools for Benchmarking and Reproducibility
Abstract: Unmanned aerial vehicles, and multi-rotors in particular, can now perform dexterous tasks in hard-to-reach environments, from infrastructure monitoring to emergency deliveries. Autonomous drone racing has emerged as an ideal benchmark to develop and evaluate these capabilities. Its challenges include accurate and robust visual-inertial odometry during aggressive maneuvers, complex aerodynamics, and constrained computational resources. As researchers increasingly channel their efforts into it, they also need tools to compare their results and advances in a timely and equitable manner. With this dataset, we want to (i) support the development of new methods and (ii) establish quantitative comparisons for approaches originating from the broader robotics and artificial intelligence communities. We aim to provide a one-stop resource comprising (i) aggressive autonomous and piloted flight, (ii) high-resolution, high-frequency visual, inertial, and motion capture data, (iii) commands and control inputs, (iv) multiple light settings, and (v) corner-level labeling of drone racing gates. We also release the complete specifications to recreate our flight platform, using commercial off-the-shelf components and the open-source flight controller Betaflight, to democratize drone racing research. Our dataset, open-source scripts, and drone design are available at: https://github.com/tii-racing/drone-racing-dataset
|
|
11:15-11:30, Paper WeBT10.2 | |
Multi-Class Trajectory Prediction in Urban Traffic Using the View-Of-Delft Prediction Dataset |
|
Boekema, Hidde | TU Delft |
Martens, Bruno | TU Delft |
Kooij, Julian Francisco Pieter | TU Delft |
Gavrila, Dariu | Delft University of Technology |
Keywords: Datasets for Human Motion, Data Sets for Robot Learning, Deep Learning Methods
Abstract: This paper presents View-of-Delft Prediction, a new dataset for trajectory prediction, to address the lack of on-board trajectory datasets in urban mixed-traffic environments. View-of-Delft Prediction builds on the recently released urban View-of-Delft (VoD) dataset to make it suitable for trajectory prediction. Unique features of this dataset are the challenging road layouts of Delft, with many narrow roads and bridges, and the close proximity between vehicles and Vulnerable Road Users (VRUs). It contains a large proportion of VRUs, with 569 prediction instances for vehicles, 347 for cyclists, and 934 for pedestrians. We additionally provide high-definition map annotations for the VoD dataset to enable state-of-the-art prediction models to be used. We analyse two state-of-the-art trajectory prediction models, PGP and P2T, which were originally developed for vehicle-dominated traffic scenarios, to assess the strengths and weaknesses of current modelling approaches in mixed traffic settings with large numbers of VRUs. Our analysis shows that there is a significant domain gap between the vehicle-dominated nuScenes and VRU-dominated VoD Prediction datasets. The dataset is publicly released for non-commercial research purposes.
|
|
11:30-11:45, Paper WeBT10.3 | |
Car-Studio: Learning Car Radiance Fields from Single-View and Unlimited In-The-Wild Images |
|
Liu, Tianyu | Hong Kong University of Science and Technology |
Zhao, Hao | Tsinghua University |
Yu, Yang | Hong Kong University of Science and Technology (Guangzhou) |
Zhou, Guyue | Tsinghua University |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Deep Learning for Visual Perception, Data Sets for Robotic Vision, Computer Vision for Transportation
Abstract: Compositional neural scene graph studies have shown that radiance fields can be an efficient tool in an editable autonomous driving simulator. However, previous studies trained on sequences from autonomous driving datasets, resulting in unsatisfactory blurring when the car is rotated in the simulator. In this letter, we propose a pipeline for learning from unconstrained images and building a dataset from the processed images. To meet the requirements of the simulator, which demands that the vehicle remain sharp when the perspective changes and that its contour stay crisp against the background to avoid artifacts when editing, we design a radiance field of the vehicle, a crucial part of the urban scene foreground. Through experiments, we demonstrate that our model achieves competitive performance compared to baselines. Using the dataset built from in-the-wild images, our method additionally provides a controllable appearance editing function. We will release the dataset and code at https://lty2226262.github.io/car-studio/ to facilitate further research in the field.
|
|
WeBT11 |
Room 11 |
Multi-Robot Systems I |
Regular session |
Chair: Parasuraman, Ramviyas | University of Georgia |
Co-Chair: Sun, Guibin | Beihang University |
|
11:00-11:15, Paper WeBT11.1 | |
A Spatial Calibration Method for Robust Cooperative Perception |
|
Song, Zhiying | Tsinghua University |
Xie, Tenghui | Tsinghua University |
Zhang, Hailiang | Tsinghua University |
Liu, Jiaxin | Tsinghua University |
Fuxi, Wen | Tsinghua University |
Li, Jun | Tsinghua University |
Keywords: Multi-Robot Systems, Cooperating Robots, Object Detection, Segmentation and Categorization
Abstract: Cooperative perception is a promising technique for intelligent and connected vehicles through vehicle-to-everything (V2X) cooperation, provided that accurate pose information and relative pose transforms are available. Nevertheless, obtaining precise positioning information often entails the high costs associated with navigation systems, so relative pose information must be calibrated for multi-agent cooperative perception. This paper proposes a simple but effective object association approach named context-based matching (CBM), which identifies inter-agent object correspondences using intra-agent geometric context. In detail, the method constructs contexts from the relative positions of the detected bounding boxes, followed by local context matching and global consensus maximization. The optimal relative pose transform is estimated from the matched correspondences, followed by cooperative perception fusion. Extensive experiments are conducted on both simulated and real-world datasets. Even with large inter-agent localization errors, high object association precision and decimeter-level relative pose calibration accuracy are achieved among the cooperating agents.
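A toy rendition of context-based matching: each box's context is its sorted distances to the other boxes, and correspondences are found by greedy context similarity (simplified; the paper additionally performs global consensus maximization, and matched pairs would feed a relative-pose estimate):

```python
import numpy as np

def context(boxes):
    """Intra-agent geometric context: sorted distances from each box
    center to all the others (a simplified stand-in for CBM)."""
    d = np.linalg.norm(boxes[:, None] - boxes[None, :], axis=-1)
    return np.sort(d, axis=1)[:, 1:]          # drop the self-distance

def match(boxes_a, boxes_b, tol=0.5):
    """Greedy inter-agent correspondence by context similarity."""
    ca, cb = context(boxes_a), context(boxes_b)
    n = min(ca.shape[1], cb.shape[1])
    cost = np.linalg.norm(ca[:, None, :n] - cb[None, :, :n], axis=-1)
    return [(i, int(j)) for i, j in enumerate(cost.argmin(axis=1))
            if cost[i, j] < tol]
```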
|
|
11:15-11:30, Paper WeBT11.2 | |
Mean-Shift Shape Formation of Multi-Robot Systems without Target Assignment |
|
Zhang, Yunjie | Beihang University |
Zhou, Rui | School of Automation Science and Electrical Engineering, Beihang |
Li, Xing | Beihang Univeristy |
Sun, Guibin | Beihang University |
Keywords: Multi-Robot Systems, Distributed Robot Systems, Cooperating Robots
Abstract: Shape-formation methods for robot swarms are usually classified into two categories according to whether target assignment is used. The first category uses target assignment to assemble precise formations, but an additional re-assignment algorithm is required to handle unreasonable situations, which lowers efficiency. The second, also called the assignment-free category, uses local behaviors to assemble formations, but existing methods can rarely achieve precise formations. In this paper, we present a distributed assignment-free algorithm that achieves precise shape formation based on the mean-shift algorithm. Specifically, each target location in a robot's perception range is regarded equally as a point of the mean-shift vector, and the weight of each point is computed according to the density of the target location, which each robot obtains from the distribution of its neighbors. This density calculation also considers the states of non-neighboring robots via a hop-count algorithm, thus avoiding conflicts among robots. Each robot can then use the calculated mean-shift vector as its control command. Simulation results show that our algorithm forms precise shapes at least 8 times more efficiently than an assignment-based approach, and physical experiments confirm that the proposed algorithm has promising potential for practical applications.
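A minimal sketch of the mean-shift control command: each target cell in range contributes a kernel-weighted vector, discounted where neighbors already crowd the cell (the kernel and occupancy weights are illustrative, not the paper's exact scheme):

```python
import numpy as np

def mean_shift_command(robot_pos, targets, neighbor_pos, bandwidth=2.0):
    """Mean-shift-style velocity command: crowded target cells attract
    less, so robots spread over the shape without explicit assignment."""
    v, total_w = np.zeros(2), 0.0
    for t in targets:                                   # targets in sensing range
        k = np.exp(-np.sum((t - robot_pos) ** 2) / (2 * bandwidth ** 2))
        occupancy = sum(np.linalg.norm(t - q) < 1.0 for q in neighbor_pos)
        w = k / (1.0 + occupancy)                       # down-weight crowded cells
        v += w * (t - robot_pos)
        total_w += w
    return v / total_w if total_w > 0 else v
```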
|
|
11:30-11:45, Paper WeBT11.3 | |
Distributed Coverage Control for Spatial Processes Estimation with Noisy Observations |
|
Mantovani, Mattia | University of Modena and Reggio Emilia |
Pratissoli, Federico | Università Degli Studi Di Modena E Reggio Emilia |
Sabattini, Lorenzo | University of Modena and Reggio Emilia |
Keywords: Multi-Robot Systems, Distributed Robot Systems
Abstract: This study addresses the challenge of deploying a multi-robot team to optimally cover a domain with an unknown density distribution. Specifically, we propose a distributed coverage-based control algorithm that enables a group of autonomous robots to simultaneously learn and estimate a spatial field over the domain. Additionally, we consider scenarios where the robots are deployed in a noisy environment or equipped with noisy sensors. The control strategy uses Gaussian Process Regression (GPR) to construct a model of the monitored spatial process, and it tackles the computational limits of Gaussian processes (GPs) on large data sets: the control algorithm filters the set of samples, limiting the GP training data to those that are relevant to improving the process estimate, thereby avoiding excessive computational complexity and managing the observation noise. To evaluate the effectiveness of the proposed algorithm, we conducted several simulations and real-platform experiments.
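A minimal sketch of GPR with a noise-aware kernel and a sample filter that keeps only observations taken where the model is still uncertain (the variance threshold and kernel choices are illustrative assumptions):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# WhiteKernel models the observation noise explicitly.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel)

def keep_sample(gp_model, x_new, var_thresh=0.05):
    """Retain a new sample only where the current model's predictive
    variance is still high, bounding the GP's cubic training cost."""
    _, std = gp_model.predict(np.atleast_2d(x_new), return_std=True)
    return std[0] ** 2 > var_thresh
```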
|
|
11:45-12:00, Paper WeBT11.4 | |
Communication-Efficient Multi-Robot Exploration Using Distributed Coverage-Biased Q-Learning |
|
Latif, Ehsan | University of Georgia |
Parasuraman, Ramviyas | University of Georgia |
Keywords: Multi-Robot Systems, Networked Robots, Path Planning for Multiple Mobile Robots or Agents
Abstract: Frontier exploration and reinforcement learning have historically been used to enable many mobile robots to autonomously and cooperatively explore complex surroundings. These methods maintain an internal global map for navigation, but they do not account for the high costs of communication and information sharing between robots. This study offers CQLite, a novel distributed Q-learning technique designed to minimize data communication overhead between robots while achieving rapid convergence and thorough coverage in multi-robot exploration. The proposed method uses ad hoc map merging and selectively shares updated Q-values at recently identified frontiers to significantly reduce communication costs. Theoretical analysis of CQLite's convergence and efficiency, together with extensive numerical verification on simulated indoor maps with several robots, demonstrates the method's novelty. With over 2x reductions in computation and communication alongside improved mapping performance, CQLite outperforms cutting-edge multi-robot exploration techniques based on rapidly exploring random trees and deep reinforcement learning.
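A toy illustration of the selective-sharing idea: standard Q-updates locally, with only the Q-rows of newly identified frontier states broadcast and merged (the data layout and merge rule are illustrative assumptions, not CQLite's exact protocol):

```python
def q_update(Q, s, a, r, s_next, n_actions=4, alpha=0.3, gamma=0.95):
    """Standard local Q-learning update on a dict of per-state Q-rows."""
    Q.setdefault(s, [0.0] * n_actions)
    Q.setdefault(s_next, [0.0] * n_actions)
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

def share_frontier_updates(Q, new_frontiers):
    """Communication payload: Q-rows for frontier states only."""
    return {s: list(Q[s]) for s in new_frontiers if s in Q}

def merge_remote(Q, payload):
    """Merge a received payload by elementwise max (one simple rule)."""
    for s, qvals in payload.items():
        Q[s] = [max(a, b) for a, b in zip(Q.get(s, qvals), qvals)]
```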
|
|
WeBT12 |
Room 12 |
Reinforcement Learning II |
Regular session |
Co-Chair: Seo, Seung-Woo | Seoul National University |
|
11:00-11:15, Paper WeBT12.1 | |
Diffusion Policies for Out-Of-Distribution Generalization in Offline Reinforcement Learning |
|
Ada, Suzan Ece | Bogazici University |
Oztop, Erhan | Osaka University / Ozyegin University |
Ugur, Emre | Bogazici University |
Keywords: Reinforcement Learning, Deep Learning Methods, Learning from Demonstration
Abstract: Offline Reinforcement Learning (RL) methods leverage previous experiences to learn better policies than the behavior policy used for data collection. However, they face challenges handling distribution shifts due to the lack of online interaction during training. To this end, we propose a novel method named State Reconstruction for Diffusion Policies (SRDP) that incorporates state reconstruction feature learning into the recent class of diffusion policies to address the problem of out-of-distribution (OOD) generalization. Our method promotes the learning of generalizable state representations to alleviate the distribution shift caused by OOD states. To illustrate the OOD generalization and faster convergence of SRDP, we design a novel 2D Multimodal Contextual Bandit environment and realize it on a 6-DoF real-world UR10 robot, as well as in simulation, and compare its performance with prior algorithms. In particular, we show the importance of the proposed state reconstruction via ablation studies. In addition, we assess the performance of our model on standard continuous control benchmarks (D4RL), namely the navigation of an 8-DoF ant and forward locomotion of half-cheetah, hopper, and walker2d, achieving state-of-the-art results. Finally, we demonstrate that our method can achieve a 167% improvement over the competing baseline on a sparse continuous control navigation task where various regions of the state space are removed from the offline RL dataset, including the region encapsulating the goal.
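A minimal sketch of a state-reconstruction auxiliary head of the kind SRDP adds: a small decoder reconstructs the state from the policy's latent features, and its MSE loss is added to the diffusion-policy objective (sizes and weighting are illustrative assumptions):

```python
import torch
import torch.nn as nn

class StateReconstructionHead(nn.Module):
    """Auxiliary decoder from policy latents back to the input state;
    its loss encourages OOD-robust state representations."""
    def __init__(self, latent_dim=128, state_dim=17):
        super().__init__()
        self.decode = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                    nn.Linear(128, state_dim))

    def loss(self, latent, state):
        return nn.functional.mse_loss(self.decode(latent), state)

# In training: total_loss = diffusion_loss + lambda_rec * head.loss(z, s)
```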
|
|
11:15-11:30, Paper WeBT12.2 | |
Self-Supervised Curriculum Generation for Autonomous Reinforcement Learning without Task-Specific Knowledge |
|
Lee, Sang-Hyun | Seoul National University |
Seo, Seung-Woo | Seoul National University |
|