Last updated on May 3, 2024. This conference program is tentative and subject to change.
Technical Program for Thursday, May 16, 2024
ThAT1-CC Oral Session, CC-303
Motion Planning I
Co-Chair: Qureshi, Ahmed H. | Purdue University |
10:30-12:00, Paper ThAT1-CC.1
Solving Rearrangement Puzzles Using Path Defragmentation in Factored State Spaces
Bayraktar, Servet Bora | TU Berlin |
Orthey, Andreas | Realtime Robotics Inc |
Kingston, Zachary | Rice University |
Toussaint, Marc | TU Berlin |
Kavraki, Lydia | Rice University |
Keywords: Constrained Motion Planning, Task and Motion Planning, Task Planning
Abstract: Rearrangement puzzles are variations of rearrangement problems in which the elements of a problem are potentially logically linked together. To efficiently solve such puzzles, we develop a motion planning approach based on a new state space that is logically factored, integrating the capabilities of the robot through factors of simultaneously manipulatable joints of an object. Based on this factored state space, we propose less-actions RRT (LA-RRT), a planner which optimizes for a low number of actions to solve a puzzle. At the core of our approach lies a new path defragmentation method, which rearranges and optimizes consecutive edges to minimize action cost. We solve six rearrangement scenarios with a Fetch robot, involving planar table puzzles and an escape room scenario. LA-RRT significantly outperforms the next best asymptotically-optimal planner, improving final action cost by a factor of 4.01 to 6.58.
10:30-12:00, Paper ThAT1-CC.2
Leveraging Opportunism in Sample-Based Motion Planning
Lanighan, Michael | TRACLabs, Inc |
Youngquist, Oscar | University of Massachusetts Amherst |
Keywords: Motion and Path Planning
Abstract: Sample-based motion planning approaches, such as RRT*, have been widely adopted in robotics due to their support for high-dimensional state spaces and guarantees of completeness and optimality. This paper introduces an RRT* approach (ORRT*) that leverages opportunism to (1) find solutions quickly, (2) reduce wasted compute, and (3) improve data efficiency. The key insight of the approach is to make the most of compute when expanding the search tree by adding the last viable configurations found when connecting new nodes rather than rejecting the sampled nodes outright, allowing for more productive exploration of the space. We evaluate the proposed approach in a set of mobility and manipulator postural control domains, contrasting the performance of the opportunistic approach with state-of-the-art RRT* variants. Our analysis shows that such an approach has desirable characteristics and warrants further exploration.
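The opportunistic rule is easiest to see against a standard RRT-style extend step: rather than discarding a sample whose connection to the tree hits an obstacle, the planner keeps the last collision-free configuration found along the attempted connection. A minimal Python sketch under that reading follows; the function and predicate names are illustrative, not from the paper.

import numpy as np

def extend_opportunistic(tree, q_rand, step, collision_free):
    """Extend the tree toward q_rand; instead of rejecting a blocked
    sample outright, keep the last viable configuration found."""
    q_near = min(tree, key=lambda q: np.linalg.norm(q - q_rand))
    direction = q_rand - q_near
    dist = np.linalg.norm(direction)
    if dist < 1e-9:
        return None
    direction /= dist
    q_last = None  # last collision-free configuration along the segment
    for i in range(1, int(np.ceil(dist / step)) + 1):
        q = q_near + direction * min(i * step, dist)
        if not collision_free(q):
            break  # blocked: stop here, but do not discard partial progress
        q_last = q
    if q_last is not None:
        tree.append(q_last)  # opportunistic node addition
    return q_last

In an RRT* variant, the appended node would still be wired through its lowest-cost neighbor and trigger the usual rewiring; only the acceptance rule sketched above changes.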
10:30-12:00, Paper ThAT1-CC.3
SE(2) Assembly Planning for Magnetic Modular Cubes
Keune, Kjell | Technische Universität Braunschweig |
Becker, Aaron | University of Houston |
Keywords: Constrained Motion Planning, Assembly, Path Planning for Multiple Mobile Robots or Agents
Abstract: Magnetic modular cubes are cube-shaped bodies with embedded permanent magnets. The cubes are uniformly controlled by a global time-varying magnetic field. A 2D physics simulator is used to simulate global control and the resulting continuous movement of magnetic modular cube structures. We develop local plans: closed-loop control algorithms for connecting two structures at desired faces. The global planner generates a building instruction graph for a target structure that we traverse in a depth-first-search approach by repeatedly applying local plans. We analyze how structure size and shape affect planning time. The planner solves 80% of the randomly created instances with up to 12 cubes in an average time of about 200 seconds.
10:30-12:00, Paper ThAT1-CC.4
A Constrained Path Following Method for Snake-Like Manipulators Via Controlled Winding Uncoiling Strategy
Luo, Mingrui | Institute of Automation, Chinese Academy of Sciences |
Tian, Yunong | Institute of Automation, Chinese Academy of Sciences |
Cao, Yinghua | Institute of Automation, Chinese Academy of Sciences
Chen, Minghao | Institute of Automation, Chinese Academy of Sciences |
Zhang, Yanfeng | Institute of Automation, Chinese Academy of Sciences |
Li, En | Institute of Automation, Chinese Academy of Sciences |
Tan, Min | Institute of Automation, Chinese Academy of Sciences |
Keywords: Constrained Motion Planning, Redundant Robots, Motion and Path Planning
Abstract: Benefiting from its hyper-redundant structure, the biomimetic snake-like manipulator retains its remarkable flexibility even within confined spaces. However, its motion planning and control pose significant challenges. This paper imitates the winding uncoiling behavior of snakes to achieve controllable constrained path following. Firstly, based on control points, a recursive computational model and an equivalent planning angle model are established, enabling efficient and analytical determination of joint positions, collision regions, and motion parameters during path following. Subsequently, the sliding control point algorithm and the motion smoothing restriction algorithm are designed. The former ensures that the remaining segments strictly stay within the collision-free regions defined by the base and path controls, while the latter smooths the control parameters based on velocity and acceleration limitations. Finally, simulation and practical experiments demonstrate the feasibility of the proposed methods. A prototype applying our method can reach targets and accomplish tasks, further validating the applicability of the snake-like manipulator.
10:30-12:00, Paper ThAT1-CC.5
Multi-Profile Quadratic Programming (MPQP) for Optimal Gap Selection and Speed Planning of Autonomous Driving
Anon, Alexandre Miranda | Honda Research Institute USA |
Bae, Sangjae | Honda Research Institute, USA |
Saroya, Manish | Honda Research Institute USA, Inc |
Isele, David | University of Pennsylvania, Honda Research Institute USA |
Keywords: Constrained Motion Planning, Autonomous Vehicle Navigation, Intelligent Transportation Systems
Abstract: Smooth and safe speed planning is imperative for the successful deployment of autonomous vehicles. This paper presents a mathematical formulation for the optimal speed planning of autonomous driving, which has been validated in high-fidelity simulations and real-road demonstrations with practical constraints. The algorithm explores the inter-traffic gaps in the time and space domain using a breadth-first search. For each gap, quadratic programming finds an optimal speed profile, synchronizing the time and space pair along with dynamic obstacles. Qualitative and quantitative analyses in CARLA are reported, discussing the smoothness and robustness of the proposed algorithm. Finally, we present a road demonstration result for urban city driving.
10:30-12:00, Paper ThAT1-CC.6
IKLink: End-Effector Trajectory Tracking with Minimal Reconfigurations
Wang, Yeping | University of Wisconsin-Madison |
Sifferman, Carter | University of Wisconsin-Madison |
Gleicher, Michael | University of Wisconsin - Madison |
Keywords: Constrained Motion Planning, Motion and Path Planning
Abstract: Many applications require a robot to accurately track reference end-effector trajectories. Certain trajectories may not be tracked as single, continuous paths due to the robot's kinematic constraints or obstacles elsewhere in the environment. In this situation, it becomes necessary to divide the trajectory into shorter segments. Each such division introduces a reconfiguration, in which the robot deviates from the reference trajectory, repositions itself in configuration space, and then resumes task execution. The occurrence of reconfigurations should be minimized because they increase time and energy usage. In this paper, we present IKLink, a method for finding joint motions to track reference end-effector trajectories while executing the minimum number of reconfigurations. Our graph-based method generates a diverse set of Inverse Kinematics (IK) solutions for every waypoint on the reference trajectory and utilizes a dynamic programming algorithm to find the optimal motion by linking the IK solutions. We demonstrate the effectiveness of IKLink through a simulation experiment and an illustrative demonstration using a physical robot.
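The linking step is a shortest-path dynamic program over a layered graph: one layer of IK solutions per waypoint, with a transition counted as a reconfiguration when it exceeds a joint-space step limit. A minimal sketch of such a DP, assuming the per-waypoint IK sets are already computed; the threshold test and lexicographic cost are simplifications, not the paper's exact criteria.

import numpy as np

def link_ik_solutions(ik_sets, max_step):
    """Pick one IK solution per waypoint, minimizing first the number of
    reconfigurations (jumps larger than max_step), then joint distance."""
    T = len(ik_sets)
    cost = [[(0, 0.0)] * len(ik_sets[0])]   # (n_reconfig, joint_dist) per node
    parent = [[None] * len(ik_sets[0])]
    for t in range(1, T):
        row_cost, row_parent = [], []
        for q in ik_sets[t]:
            best, best_i = None, None
            for i, p in enumerate(ik_sets[t - 1]):
                d = float(np.linalg.norm(q - p))
                r, s = cost[t - 1][i]
                c = (r + (d > max_step), s + d)  # lexicographic cost
                if best is None or c < best:
                    best, best_i = c, i
            row_cost.append(best)
            row_parent.append(best_i)
        cost.append(row_cost)
        parent.append(row_parent)
    j = min(range(len(ik_sets[-1])), key=lambda k: cost[-1][k])
    motion = [ik_sets[-1][j]]
    for t in range(T - 1, 0, -1):            # backtrack through the layers
        j = parent[t][j]
        motion.append(ik_sets[t - 1][j])
    return motion[::-1]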
10:30-12:00, Paper ThAT1-CC.7
A Control Barrier Function-Based Motion Planning Scheme for a Quadruped Robot
Unlu, Halil Utku | New York University |
Gonçalves, Vinicius Mariano | New York University Abu Dhabi, United Arab Emirates |
Chaikalis, Dimitris | New York University |
Tzes, Anthony | New York University Abu Dhabi |
Khorrami, Farshad | New York University Tandon School of Engineering |
Keywords: Motion and Path Planning, Autonomous Vehicle Navigation, Legged Robots
Abstract: A Control Barrier Function (CBF)-based motion planning algorithm is proposed. The algorithm explores an unknown environment to reach a target point, providing velocity commands to the robot controller module. CBFs, along with a circulation inequality, are used to generate safe paths toward the goal while preventing collisions with obstacles. The proposed global navigation scheme is experimentally verified on a quadruped platform to demonstrate safe, collision-free exploration over long distances.
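For context, the core of a CBF-based scheme is a safety filter on the commanded velocity: the nominal command is minimally modified so that a barrier function h(x) >= 0 (positive in free space) satisfies the CBF condition. With single-integrator dynamics and a single constraint, the quadratic program has a closed form. A minimal sketch; the paper's circulation inequality, which steers the robot around obstacles rather than merely stopping it, is deliberately omitted here.

import numpy as np

def cbf_safety_filter(u_des, h, grad_h, alpha=1.0):
    """Solve min ||u - u_des||^2 s.t. grad_h . u >= -alpha * h,
    the standard CBF condition for single-integrator dynamics."""
    g = np.asarray(grad_h, dtype=float)
    slack = g @ u_des + alpha * h
    if slack >= 0:
        return u_des  # nominal command already satisfies the CBF condition
    # otherwise, project onto the constraint boundary grad_h . u = -alpha * h
    return u_des - (slack / (g @ g)) * g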
10:30-12:00, Paper ThAT1-CC.8
Physics-Informed Neural Motion Planning on Constraint Manifolds
Ni, Ruiqi | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Keywords: Constrained Motion Planning, Deep Learning Methods
Abstract: Constrained Motion Planning (CMP) aims to find a collision-free path between the given start and goal configurations on the kinematic constraint manifolds. These problems appear in various scenarios ranging from object manipulation to legged-robot locomotion. However, the zero-volume nature of manifolds makes the CMP problem challenging, and the state-of-the-art methods still take several seconds to find a path and require a computationally expensive path dataset for imitation learning. Recently, physics-informed motion planning methods have emerged that directly solve the Eikonal equation through neural networks for motion planning and do not require expert demonstrations for learning. Inspired by these approaches, we propose the first physics-informed CMP framework that solves the Eikonal equation on the constraint manifolds and trains a neural function for CMP without expert data. Our results show that the proposed approach efficiently solves various CMP problems in both simulation and the real world, including object manipulation under orientation constraints and door opening with a high-dimensional 6-DOF robot manipulator. In these complex settings, our method exhibits high success rates and finds paths in sub-seconds, which is many times faster than the state-of-the-art CMP methods.
10:30-12:00, Paper ThAT1-CC.9
Practical and Safe Navigation Function Based Motion Planning of UAVs
Sinhmar, Himani | Cornell University |
Greiff, Marcus | Mitsubishi Electric Research Laboratories |
Di Cairano, Stefano | Mitsubishi Electric Research Laboratories |
Keywords: Constrained Motion Planning, Aerial Systems: Applications, Robot Safety
Abstract: This paper offers a practical method for certifiably safe operations of an unmanned aerial vehicle (UAV) with limited power and computation, useful for real-time operations where the UAV is exposed to significant disturbances in non-convex free space. We propose a motion planning method based on the Explicit Reference Governor (ERG) framework to ensure the safety of a flying quadrotor UAV. From a small set of experimental data and assumptions on modeling errors, a Lyapunov function is synthesized, by which an ERG is constructed to modify the UAV set-points. The method can handle polyhedral obstacles and constraints imposed on the maximum thrust of the UAV and its maximum tilt. We demonstrate the approach with extensive simulations and experiments using a Crazyflie 2.1.
ThAT2-CC Oral Session, CC-311
Swarm Robotics
Chair: Birattari, Mauro | Université Libre De Bruxelles |
Co-Chair: Grosu, Radu | TU Wien |
10:30-12:00, Paper ThAT2-CC.1
Flock-Formation Control of Multi-Agent Systems Using Imperfect Relative Distance Measurements
Brandstätter, Andreas | Technische Universität Wien |
Smolka, Scott | Stony Brook University |
Stoller, Scott | Stony Brook University |
Tiwari, Ashish | Microsoft Corp |
Grosu, Radu | TU Wien |
Keywords: Swarm Robotics, Aerial Systems: Mechanics and Control
Abstract: We present distributed distance-based control (DDC), a novel approach for controlling a multi-agent system, such that it achieves a desired formation, in a resource-constrained setting. Our controller is fully distributed and only requires local state-estimation and scalar measurements of inter-agent distances. It does not require an external localization system or inter-agent exchange of state information. Our approach uses spatial-predictive control (SPC) to optimize a cost function given strictly in terms of inter-agent distances and the distance to the target location. In DDC, each agent continuously learns and updates a very abstract model of the actual system, in the form of a dictionary of three independent key-value pairs (Δs, Δd), where Δd is the partial derivative of the distance measurements along a spatial direction Δs. This is sufficient for an agent to choose the best next action. We validate our approach by using DDC to control a collection of Crazyflie drones to achieve formation flight and reach a target while maintaining flock formation.
10:30-12:00, Paper ThAT2-CC.2
From Shadows to Light: A Swarm Robotics Approach with Onboard Control for Seeking Dynamic Sources in Constrained Environments
Karagüzel, Tugay Alperen | Vrije Universiteit Amsterdam |
Retamal Guiberteau, Victor | Vrije Universiteit Amsterdam |
Cambier, Nicolas | Vrije Universiteit Amsterdam |
Ferrante, Eliseo | Vrije Universiteit Amsterdam |
Keywords: Swarm Robotics, Aerial Systems: Perception and Autonomy, Multi-Robot Systems
Abstract: In this paper, we present a swarm robotics control and coordination approach that can be used for locating a moving target or source in a GNSS-denied indoor setting. The approach is completely on-board and can be deployed on nano-drones such as the Crazyflie. The swarm acts on a simple set of rules to identify and trail a dynamically changing source gradient. To validate the effectiveness of our approach, we conduct experiments to detect the maxima of the dynamic gradient, which was implemented with a set of lights turned on and off with a time-varying pattern. Additionally, we also introduce a minimalistic, fully onboard obstacle avoidance method, and assess the flexibility of our method by introducing an obstacle into the environment. The strategies rely on local interactions among UAVs, and the sensing of the source happens only at the individual level and is scalar, making it a viable option for UAVs with limited capabilities. Our method is adaptable to other swarm platforms with only minor parameter adjustments. Our findings demonstrate the potential of this approach as a flexible solution for tackling such tasks in constrained, GNSS-denied indoor environments.
10:30-12:00, Paper ThAT2-CC.3
Multi-Swarm Interaction through Augmented Reality for Kilobots
Feola, Luigi | University of Rome "La Sapienza" - National Research Council IST |
Reina, Andreagiovanni | Université Libre De Bruxelles |
Talamali, Mohamed S. | University College London (UCL) |
Trianni, Vito | Consiglio Nazionale Delle Ricerche |
Keywords: Swarm Robotics, Autonomous Agents, Engineering for Robotic Systems
Abstract: Research with swarm robotics systems can be complicated, time-consuming, and often expensive in terms of space and resources. The situation is even worse for studies involving multiple, possibly heterogeneous robot swarms. Augmented reality can provide an interesting solution to these problems, as demonstrated by the ARK system (Augmented Reality for Kilobots), which enhanced the experimentation possibilities with Kilobots, also relieving researchers from demanding tracking and logging activities. However, ARK is mostly limited to experimentation with a single swarm. In this paper, we introduce M-ARK, a system to support studies on multi-swarm interaction. M-ARK is based on the synchronisation over a network connection of multiple ARK systems, whether real or simulated, serving a twofold purpose: (i) to study the interaction of multiple, possibly heterogeneous swarms, and (ii) to enable a gradual transition from simulation to reality. Moreover, M-ARK enables the interaction between swarms located in multiple labs worldwide, encouraging scientific collaboration and advancement in multi-swarm interaction studies.
10:30-12:00, Paper ThAT2-CC.4
Morphobot: A Platform for Morphogenesis in Robot Swarm
Qin, Xiaoyang | Shenyang Institute of Automation, CAS |
Yang, Yongliang | Shenyang Institute of Automation, CAS |
Pan, Mengyun | Shenyang Institute of Automation, Chinese Academy of Sciences |
Cui, Long | Shenyang Institute of Automation, Chinese Academy of Sciences |
Liu, Lianqing | Shenyang Institute of Automation |
Keywords: Swarm Robotics, Cellular and Modular Robots, Biologically-Inspired Robots
Abstract: Various robot platforms have been developed for investigating new algorithms for swarm robotics. Morphogenetic engineering in robot swarms, however, imposes new requirements on platforms: precise motion control, physical interactions with the environment or neighbor robots, and functionalized shells. Few current platforms fulfill all the above characteristics. Here, we present Morphobot, a robot platform for morphogenetic engineering in swarm robotics. Its direct-current coreless motors provide physical support and strong power; meanwhile, these needle-like motors also enhance the physical interactions among robots. Each Morphobot has a changeable shell, functionalized for programming local interactions, through physical contact or communication, among Morphobots. We characterized the mobility of Morphobot to test its capability of moving and physically interacting with its neighbors. To demonstrate its advantages in the morphogenesis of robot swarms, we designed two morphogenetic engineering experiments. The results revealed that swarms of Morphobots can form patterns via physical interactions and optical communications.
10:30-12:00, Paper ThAT2-CC.5
FireAntV3: A Modular Self-Reconfigurable Robot towards Free-Form Self-Assembly Using Attach-Anywhere Continuous Docks
Swissler, Petras | New Jersey Institute of Technology |
Rubenstein, Michael | Northwestern University |
Keywords: Swarm Robotics, Cellular and Modular Robots, Mechanism Design
Abstract: FireAntV3 uses a refined version of the 3D Continuous Docks to attach to other such docks at any location and orientation, with simple control and without alignment. The robot improves upon previous FireAnt-series robots by redesigning the locomotion drive system to improve mechanical and attachment reliability while also reducing the number of motors from six to three. We also expand the sensory capabilities of FireAntV3 to enable the robot to sense forces, sense the direction to a light source, and sense contacting neighbors using vibrations. We validate this robot through full-robot tests demonstrating phototaxis and neighbor-detecting behavior. This paper also describes the method for manufacturing the continuous docks in a variety of geometries.
10:30-12:00, Paper ThAT2-CC.6
Reciprocal and Non-Reciprocal Swarmalators with Programmable Locomotion and Formations for Robot Swarms
Ceron, Steven | Massachusetts Institute of Technology |
Xiao, Wei | MIT |
Rus, Daniela | MIT |
Keywords: Swarm Robotics, Cellular and Modular Robots, Micro/Nano Robots
Abstract: Natural and robotic swarms often exhibit non-reciprocal interactions; agents do not exert equal and opposite forces on each other. By studying the effects of reciprocal and non-reciprocal interactions, we are better able to design emergent behaviors in robot collectives composed of agents that exert attractive and repulsive forces on each other. Moreover, by controlling agent-specific coupling forces on demand, we can enable a collective to exhibit desired behaviors previously not possible. We use a general form of the swarming oscillator (swarmalator) model to study reciprocal and non-reciprocal interactions among agents that affect each other's motions over long and short distances. We use non-reciprocal coupling to elicit collective locomotion toward or away from target sites, and we use the control barrier function method to optimize the non-reciprocal interactions for a desired spatial formation. This work addresses the interests of the active matter, swarm robotics, and control barrier function communities and demonstrates various collective behaviors with strong potential to be realized in macro- and micro-length-scale robot swarms.
10:30-12:00, Paper ThAT2-CC.7
Automatically Designing Robot Swarms in Environments Populated by Other Robots: An Experiment in Robot Shepherding
Garzón Ramos, David | Université Libre De Bruxelles |
Birattari, Mauro | Université Libre De Bruxelles |
Keywords: Swarm Robotics, Evolutionary Robotics
Abstract: Automatic design is a promising approach to realizing robot swarms. Given a mission to be performed by the swarm, an automatic method produces the required control software for the individual robots. So far, automatic design has concentrated on missions that a swarm can execute independently, interacting only with a static environment and without the involvement of other active entities. In this paper, we investigate the design of robot swarms that perform their mission by interacting with other robots that populate their environment. We frame our research within robot shepherding: the problem of using a small group of robots—the shepherds—to coordinate a relatively larger group—the sheep. In our study, the group of shepherds is the swarm that is automatically designed, and the sheep are pre-programmed robots that populate its environment. We use automatic modular design and neuroevolution to produce the control software for the swarm of shepherds to coordinate the sheep. We show that automatic design can leverage mission-specific interaction strategies to enable effective coordination between the two groups.
10:30-12:00, Paper ThAT2-CC.8
CrazySim: A Software-In-The-Loop Simulator for the Crazyflie Nano Quadrotor
Llanes, Christian | The Georgia Institute of Technology |
Kakish, Zahi | Sandia National Laboratories |
Williams, Kyle | Sandia National Labs |
Coogan, Samuel | Georgia Tech |
Keywords: Swarm Robotics, Software-Hardware Integration for Robot Systems, Aerial Systems: Applications
Abstract: In this work, we develop a software-in-the-loop simulator platform for Crazyflie nano quadrotor drone fleets. One of the challenges in maintaining a large fleet of drones is ensuring that the fleet performs its task as expected without collision, and this becomes more challenging as the number of drones scales, possibly into the hundreds. Software-in-the-loop simulation is an important component in verifying that drone fleets operate correctly and can significantly reduce development time. The simulator interface that we develop runs an instance of the Crazyflie flight stack firmware for each individual drone on a commercial desktop machine, along with a sensor and communication plugin in Gazebo Sim. The plugin transmits simulated sensor information to the firmware and provides a socket link interface for running external scripts that would run on a ground station during hardware deployment. The plugin simulates a radio communication delay between the drones and the ground station to test offboard control algorithms and high-level fleet commands. To validate the proposed simulator, we provide a case study of decentralized model predictive control (MPC) that is run on a ground station to command a fleet of sixteen drones to follow a specified trajectory. We first run the controller on the simulator interface to verify the performance and robustness of the algorithm before deployment to a Crazyflie hardware experiment in the Georgia Tech Robotarium.
ThAT3-CC Oral Session, CC-313
Range Sensing
Chair: De Martini, Daniele | University of Oxford |
Co-Chair: Caron, Guillaume | CNRS |
10:30-12:00, Paper ThAT3-CC.1
Geometry-Informed Distance Candidate Selection for Adaptive Lightweight Omnidirectional Stereo Vision with Fisheye Images
Pulling, Conner | Carnegie Mellon University |
Tan, Je Hon | Defence Science and Technology Agency |
Hu, Yaoyu | Carnegie Mellon University |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Omnidirectional Vision
Abstract: Multi-view stereo omnidirectional distance estimation usually needs to build a cost volume with many hypothetical distance candidates. The cost volume building process is often computationally heavy considering the limited resources a mobile robot has. We propose a new geometry-informed distance candidate selection method which enables the use of a very small number of candidates and reduces the computational cost. We demonstrate the use of the geometry-informed candidates in a set of model variants. We find that by adjusting the candidates during robot deployment, our geometry-informed distance candidates also improve a pre-trained model's accuracy if the extrinsics or the number of cameras changes. Without any re-training or fine-tuning, our models outperform models trained with evenly distributed distance candidates. Models are also released as hardware-accelerated versions with a new dedicated large-scale dataset. The project page, code, and dataset can be found at https://theairlab.org/gicandidates/.
10:30-12:00, Paper ThAT3-CC.2
On Camera Model Conversions
Goichon, Eva | UPJV |
Caron, Guillaume | CNRS |
Vasseur, Pascal | Université De Picardie Jules Verne |
Kanehiro, Fumio | National Inst. of AIST |
Keywords: Omnidirectional Vision
Abstract: On the one hand, cameras with the conventional field-of-view usually considered in computer vision and robotics are very often modeled as a pinhole plus, possibly, a distortion model. On the other hand, there is a large variety of models for panoramic cameras. Many camera models have been proposed for fisheye cameras, catadioptric cameras, and super fisheye cameras. In both cases, however, few models offer the possibility of conversion into another model. This paper contributes to filling this gap, allowing an algorithm designed with one projection model to accept data from a camera calibrated with another model. A pre-existing dataset can thus be used without having to recalibrate the camera. We provide the methodology and mathematical developments for three conversions considering three different types of cameras, which are evaluated with respect to calibration and within a visual Simultaneous Localization And Mapping benchmark. The source code of the camera model conversions studied in this paper is shared within the libPeR library for Perception in Robotics: https://github.com/PerceptionRobotique/libPeR_base.
10:30-12:00, Paper ThAT3-CC.3
RadCloud: Real-Time High-Resolution Point Cloud Generation Using Low-Cost Radars for Aerial and Ground Vehicles
Hunt, David | Duke University |
Luo, Shaocheng | Duke University |
Khazraei, Amir | Duke University |
Zhang, Xiao | Duke University |
Hallyburton, Robert | Duke University |
Chen, Tingjun | Duke University |
Pajic, Miroslav | Duke University |
Keywords: Range Sensing, Deep Learning for Visual Perception, Aerial Systems: Perception and Autonomy
Abstract: In this work, we present RadCloud, a novel real-time framework for directly obtaining higher-resolution lidar-like 2D point clouds from low-resolution radar frames on resource-constrained platforms commonly used in unmanned aerial and ground vehicles (UAVs and UGVs, respectively); such point clouds can then be used for accurate environmental mapping, navigating unknown environments, and other robotics tasks. While high-resolution sensing using radar data has been previously reported, existing methods cannot be used on most UAVs, which have limited computational power and energy; thus, existing demonstrations focus on offline radar processing. RadCloud overcomes these challenges by using a radar configuration with 1/4th of the range resolution and employing a deep learning model with 2.25× fewer parameters. Additionally, RadCloud utilizes a novel chirp-based approach that makes the obtained point clouds resilient to rapid movements (e.g., aggressive turns or spins) that commonly occur during UAV flights. In real-world experiments, we demonstrate the accuracy and applicability of RadCloud on commercially available UAVs and UGVs, with off-the-shelf radar platforms on board.
10:30-12:00, Paper ThAT3-CC.4
That's My Point: Compact Object-Centric LiDAR Pose Estimation for Large-Scale Outdoor Localisation
Pramatarov, Georgi | University of Oxford |
Gadd, Matthew | University of Oxford |
Newman, Paul | Oxford University |
De Martini, Daniele | University of Oxford |
Keywords: Range Sensing, Localization, Mapping
Abstract: This paper is about 3D pose estimation on LiDAR scans with extremely minimal storage requirements to enable scalable mapping and localisation. We achieve this by clustering all points of segmented scans into semantic objects and representing them only with their respective centroid and semantic class. In this way, each LiDAR scan is reduced to a compact collection of four-number vectors. This abstracts away important structural information from the scenes, which is crucial for traditional registration approaches. To mitigate this, we introduce an object-matching network based on self- and cross-correlation that captures geometric and semantic relationships between entities. The respective matches allow us to recover the relative transformation between scans through weighted Singular Value Decomposition (SVD) and RANdom SAmple Consensus (RANSAC). We demonstrate that such a representation is sufficient for metric localisation by registering point clouds taken under different viewpoints on the KITTI dataset, and at different periods of time by localising between KITTI and KITTI-360. We achieve accurate metric estimates comparable with state-of-the-art methods with almost half the representation size, specifically 1.33 kB on average.
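The weighted SVD step is the classical weighted Kabsch alignment: given matched object centroids and per-match weights, the rigid transform follows in closed form. A minimal sketch; in the paper this sits inside a RANSAC loop over the network's matches, and the array shapes below are assumptions.

import numpy as np

def weighted_svd_transform(P, Q, w):
    """Rigid transform (R, t) aligning matched 3D centroids P[i] -> Q[i]
    with match weights w[i], via weighted Kabsch/SVD. P, Q: (N, 3)."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    p_bar, q_bar = w @ P, w @ Q                     # weighted centroids
    H = (P - p_bar).T @ np.diag(w) @ (Q - q_bar)    # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # reflection guard
    R = Vt.T @ D @ U.T
    t = q_bar - R @ p_bar
    return R, t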
10:30-12:00, Paper ThAT3-CC.5
A Point-To-Distribution Degeneracy Detection Factor for LiDAR SLAM Using Local Geometric Models
Ji, Sehua | Guangdong University of Technology |
Chen, Weinan | Guangdong University of Technology |
Su, Zerong | Institute of Intelligent Manufacturing, Guangdong Academy of Sciences
Guan, Yisheng | Guangdong University of Technology |
Li, Jiehao | South China University of Technology |
Zhang, Hong | SUSTech |
Zhu, Haifei | Guangdong University of Technology |
Keywords: Range Sensing, Localization, Mapping
Abstract: Limited by their working principles, LiDAR-SLAM systems suffer from the degeneration phenomenon in environments such as long corridors and tunnels, due to the lack of sufficient geometric features for frame-to-frame matching. The accuracy and sensitivity of existing degeneracy detection methods need to be further improved. In this paper, we propose a novel method for degeneracy detection using local geometric models based on point-to-distribution matching. To obtain an accurate description of local geometric models, an adaptive adjustment of voxel segmentation according to the point cloud distribution and density is designed. The code of the proposed method is open-source and available at https://github.com/jisehua/Degenerate-Detection.git. Experiments with public datasets and self-built robots were conducted to evaluate the method. The results show that our proposed method achieves higher accuracy than the other existing approaches. Applying our proposed method is beneficial for improving the robustness of LiDAR-SLAM systems.
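As background, point-to-distribution matching (as in NDT) fits a Gaussian to the points in each voxel and scores a query point by its Mahalanobis distance to that voxel's distribution; the paper's degeneracy factor and adaptive voxel segmentation are built on top of such local geometric models. A minimal sketch of the basic matching cost; the voxel handling here is deliberately simple and illustrative.

import numpy as np

def voxel_gaussians(points, voxel_size, min_pts=5):
    """Fit one Gaussian (mean, covariance) per occupied voxel."""
    buckets = {}
    for p in points:
        key = tuple(np.floor(p / voxel_size).astype(int))
        buckets.setdefault(key, []).append(p)
    models = {}
    for key, pts in buckets.items():
        if len(pts) >= min_pts:  # enough points for a stable covariance
            pts = np.asarray(pts)
            models[key] = (pts.mean(axis=0), np.cov(pts.T) + 1e-6 * np.eye(3))
    return models

def point_to_distribution_cost(p, mean, cov):
    """Mahalanobis distance of point p to a voxel's local Gaussian."""
    r = p - mean
    return float(r @ np.linalg.solve(cov, r))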
10:30-12:00, Paper ThAT3-CC.6
I-Octree: A Fast, Lightweight, and Dynamic Octree for Proximity Search
Zhu, Jun | Tsinghua University |
Li, Hongyi | Tsinghua University |
Wang, Zhepeng | Lenovo Research |
Wang, Shengjie | Tsinghua University |
Zhang, Tao | Tsinghua University |
Keywords: Range Sensing, Localization, Mapping
Abstract: Establishing the correspondences between newly acquired points and historically accumulated data (i.e., map) through nearest neighbors search is crucial in numerous robotic applications. However, static tree data structures are inadequate to handle large and dynamically growing maps in real-time. To address this issue, we present the i-Octree, a dynamic octree data structure that supports both fast nearest neighbor search and real-time dynamic updates, such as point insertion, deletion, and on-tree down-sampling. The i-Octree is built upon a leaf-based octree and has two key features: a local spatially continuous storing strategy that allows for fast access to points while minimizing memory usage, and local on-tree updates that significantly reduce computation time compared to existing static or dynamic tree structures. The experiments show that i-Octree outperforms contemporary state-of-the-art approaches by achieving, on average, a 19% reduction in runtime on real-world open datasets.
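For readers new to the structure, the sketch below shows the general leaf-based design such a point octree follows: points live in leaves, a full leaf splits into eight octants, and nearest-neighbor search visits the closest octants first while pruning with a box-distance bound. It is a toy illustration only; the i-Octree's actual contributions, contiguous local storage and local on-tree updates such as deletion and down-sampling, are not captured here.

import numpy as np

class PointOctree:
    """Toy leaf-based point octree with dynamic insertion and NN search."""
    def __init__(self, center, half, cap=8):
        self.center = np.asarray(center, dtype=float)
        self.half = half          # half side length of this cube
        self.cap = cap            # leaf capacity before splitting
        self.points = []          # points stored while this node is a leaf
        self.children = None      # 8 sub-octants once split

    def insert(self, p):
        p = np.asarray(p, dtype=float)
        if self.children is None:
            self.points.append(p)
            if len(self.points) > self.cap and self.half > 1e-6:
                self._split()
        else:
            self.children[self._octant(p)].insert(p)

    def _octant(self, p):
        gx, gy, gz = p > self.center
        return int(gx) | int(gy) << 1 | int(gz) << 2

    def _split(self):
        h = self.half / 2.0
        offsets = [np.array([x, y, z])  # index = x_bit + 2*y_bit + 4*z_bit
                   for z in (-h, h) for y in (-h, h) for x in (-h, h)]
        self.children = [PointOctree(self.center + o, h, self.cap) for o in offsets]
        for p in self.points:       # push stored points down to the new leaves
            self.children[self._octant(p)].insert(p)
        self.points = []

    def nearest(self, q, best=None):
        q = np.asarray(q, dtype=float)
        if self.children is None:   # leaf: scan its points
            for p in self.points:
                d = np.linalg.norm(p - q)
                if best is None or d < best[0]:
                    best = (d, p)
            return best
        for c in sorted(self.children, key=lambda c: np.linalg.norm(c.center - q)):
            # lower bound on the distance from q to any point inside c's cube
            bound = np.linalg.norm(np.maximum(np.abs(q - c.center) - c.half, 0.0))
            if best is None or bound < best[0]:
                best = c.nearest(q, best)
        return best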
10:30-12:00, Paper ThAT3-CC.7
Self-Supervised Depth Correction of Lidar Measurements from Map Consistency Loss
Agishev, Ruslan | Czech Technical University in Prague, FEE |
Petricek, Tomas | Czech Technical University in Prague |
Zimmermann, Karel | Ceske Vysoke Uceni Technicke V Praze, FEL |
Keywords: Range Sensing, Mapping, Data Sets for SLAM
Abstract: Depth perception is considered an invaluable source of information in the context of 3D mapping and various robotics applications. However, point cloud maps acquired using consumer-level light detection and ranging sensors (lidars) still suffer from bias related to local surface properties such as the measuring beam-to-surface incidence angle. This fact has recently motivated researchers to exploit traditional filters, as well as the deep learning paradigm, in order to suppress the aforementioned depth sensor error while preserving geometric and map consistency details. Despite these efforts, depth correction of lidar measurements is still an open challenge, mainly due to the lack of clean 3D data that could be used as ground truth. In this paper, we introduce two novel point cloud map consistency losses, which facilitate self-supervised learning on real data of lidar depth correction models. Specifically, the models exploit multiple point cloud measurements of the same scene from different viewpoints in order to learn to reduce the bias based on the constructed map consistency signal. Complementary to the removal of the bias from the measurements, we demonstrate that the depth correction models help to reduce localization drift. Additionally, we release a dataset that contains point cloud scans captured in an indoor corridor environment with precise localization and ground truth mapping information.
10:30-12:00, Paper ThAT3-CC.8
GroundGrid: LiDAR Point Cloud Ground Segmentation and Terrain Estimation
Steinke, Nicolai | Freie Universität Berlin |
Goehring, Daniel | Freie Universität Berlin |
Rojas, Raul | Freie Universität Berlin |
Keywords: Range Sensing, Mapping, Field Robots
Abstract: Precise point cloud ground segmentation is a crucial prerequisite of virtually all perception tasks for LiDAR sensors in autonomous vehicles. In particular, the clustering and extraction of objects from a point cloud usually relies on an accurate removal of ground points. The correct estimation of the surrounding terrain is important for assessing surface drivability, path planning, and obstacle prediction. In this article, we propose our system GroundGrid, which relies on 2D elevation maps to solve the terrain estimation and point cloud ground segmentation problems. We evaluate the ground segmentation and terrain estimation performance of GroundGrid and compare it to other state-of-the-art methods using the SemanticKITTI dataset and a novel evaluation method relying on airborne LiDAR scanning. The results show that GroundGrid outperforms other state-of-the-art systems with an average IoU of 94.78% while maintaining a high run-time performance of 171 Hz. The source code is available at https://github.com/dcmlr/groundgrid
10:30-12:00, Paper ThAT3-CC.9
Morphable-SfS: Enhancing Shape-From-Silhouette Via Morphable Modeling
Lu, Guoyu | University of Georgia |
Keywords: Range Sensing, RGB-D Perception, Mapping
Abstract: Reconstructing accurate object shapes from single image inputs is still a critical and challenging task, mainly due to the potential shape ambiguity and occlusion. Most existing single-image 3D reconstruction approaches, trained either in a stereo setting or with structure-from-motion, estimate 2.5D visible models that generally reconstruct only one viewpoint of an object. We propose a method that leverages both a general Morphable Model of common objects and a multi-view synthesis-based shape-from-silhouette model to reconstruct complete object shapes. We use the proposed method to exploit strong geometric and perceptual cues in 3D shape reconstruction. During inference, the trained model is able to produce high-quality and complete meshes with finely detailed structures from a 2D image captured from arbitrary perspectives. The proposed method is evaluated on both the large-scale synthetic ShapeNet and the real-world Pascal 3D+ and Pix3D datasets. The proposed work achieves state-of-the-art results compared with other recent self-supervised methods. Moreover, it shows a good capability of being applied to unseen object reconstruction tasks.
ThAT4-CC Oral Session, CC-315
Path Planning for Multiple Mobile Robots or Agents I
Chair: Zhu, Quanyan | New York University |
Co-Chair: Min, Byung-Cheol | Purdue University |
10:30-12:00, Paper ThAT4-CC.1
Collision Detection and Avoidance for Black Box Multi-Robot Navigation
Ayoubi, Sara | Nokia Bell Labs |
Hadzic, Ilija | Nokia Bell Labs |
Salaun, Lou | Nokia Bell Labs |
Massaro, Antonio | Nokia Bell Labs |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Collision Avoidance, Deep Learning Methods
Abstract: To date, commercial industrial robots only provide multi-robot coordination for their own fleet of robots and treat robots from other vendors as general obstacles. The ability to enable robots from different vendors to co-exist in the same space is crucial to prevent vendor lock-in. We present the first decentralized system that achieves coordination between a heterogeneous fleet of black-box robots for which the internals of the navigation stack are presumed unmodifiable. Our system, which we call CODAK, achieves the coordination by relying on a minimum set of interfaces that are commonly available on most industrial and service robots. For each robot, CODAK uses a trained recurrent neural network to anticipate collisions from externally observable metrics. Anticipated collisions are avoided using a simple, yet effective, concurrency control scheme. We run a series of experiments in simulation and with real robots to demonstrate CODAK’s ability to enable safe navigation in different environments. We also experimentally compare CODAK with previously published white-box solutions to evaluate the penalty of the black-box constraint.
10:30-12:00, Paper ThAT4-CC.2
Stackelberg Game-Theoretic Trajectory Guidance for Multi-Robot Systems with Koopman Operator
Zhao, Yuhan | New York University |
Zhu, Quanyan | New York University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Cooperating Robots, Model Learning for Control
Abstract: Guided trajectory planning involves a leader robot strategically directing a follower robot to collaboratively reach a designated destination. However, this task becomes notably challenging when the leader lacks complete knowledge of the follower's decision-making model, creating a need for learning-based methods to effectively design the cooperative plan. To this end, we develop a Stackelberg game-theoretic approach based on the Koopman operator to address the challenge. We first formulate the guided trajectory planning problem through the lens of a dynamic Stackelberg game. We then leverage Koopman operator theory to acquire a learning-based linear system model that approximates the follower's feedback dynamics. Based on this learned model, the leader devises a collision-free trajectory to guide the follower using receding-horizon planning. We use simulations to demonstrate the effectiveness of our approach in generating learned models that accurately predict the follower's multi-step behavior when compared to alternative learning techniques. Moreover, our approach successfully accomplishes the guidance task and notably reduces the leader's planning time to nearly half when compared with the model-based baseline method.
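In practice, acquiring such a learning-based linear model with the Koopman operator typically reduces to extended dynamic mode decomposition (EDMD): lift the follower's states with a dictionary of observables and fit z' ≈ A z + B u by least squares. A minimal sketch of that standard recipe; the dictionary and variable names are assumptions, not the paper's.

import numpy as np

def fit_koopman_model(X, X_next, U, lift):
    """EDMD-style least-squares fit of lifted linear dynamics
    z_{k+1} ≈ A z_k + B u_k, where z = lift(x).
    X, X_next: (N, n) state snapshots; U: (N, m) inputs."""
    Z = np.array([lift(x) for x in X])        # (N, d) lifted states
    Z_next = np.array([lift(x) for x in X_next])
    W = np.hstack([Z, U])                     # regressors [z, u], shape (N, d+m)
    theta, *_ = np.linalg.lstsq(W, Z_next, rcond=None)
    d = Z.shape[1]
    return theta[:d].T, theta[d:].T           # A: (d, d), B: (d, m)

# example dictionary: the state, its squares, and a constant term
lift = lambda x: np.concatenate([x, np.square(x), [1.0]])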
10:30-12:00, Paper ThAT4-CC.3
Optimal Path Planning for a Convoy-Support Vehicle Pair through a Repairable Network (I)
Bhadoriya, Abhay Singh | Texas A&M University |
Montez, Christopher | Texas A&M University - College Station |
Rathinam, Sivakumar | TAMU |
Darbha, Swaroop | TAMU |
Casbeer, David | AFRL |
Manyam, Satyanarayana Gupta | Infoscitex Corp |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Cooperating Robots, Multi-Robot Systems
Abstract: In this article, we consider a multi-agent path planning problem in a partially impeded environment. The impeded environment is represented by a graph with select road segments (edges) in disrepair impeding vehicular movement in the road network. A primary vehicle, which we refer to as a convoy, wishes to travel from a starting location to a destination while minimizing some accumulated cost. The convoy may traverse an impeded edge at an additional cost (associated with repairing the edge) compared to traversing it unimpeded. A support vehicle, which we refer to as a service vehicle, is simultaneously deployed to assist the convoy by repairing edges, reducing the cost for the convoy to traverse those edges. The convoy is permitted to wait at any vertex to allow the service vehicle to repair an edge. The service vehicle is permitted to terminate its path at any vertex. The goal is then to find a pair of paths so that the convoy reaches its destination while minimizing the total time (cost) the two vehicles are active, including any time the convoy waits. We refer to this problem as the Assisted Shortest Path Problem (ASPP). We present a generalized permanent labeling algorithm (GPLA) to find an optimal solution for the ASPP. We also introduce additional modifications to the labeling algorithm that significantly improve the computation time and refer to the modified labeling algorithm as GPLA*. Computational results are presented to illustrate the effectiveness of GPLA* in solving the ASPP.
10:30-12:00, Paper ThAT4-CC.4
Learning Heterogeneous Multi-Agent Allocations for Ergodic Search
Rao, Ananya | Carnegie Mellon University |
Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
Choset, Howie | Carnegie Mellon University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Distributed Robot Systems, Autonomous Agents
Abstract: Information-based coverage directs robots to move over an area to optimize a pre-defined objective function based on some measure of information. Our prior work determined that the spectral decomposition of an information map can be used to guide a set of heterogeneous agents, each with different sensor and motion models, to optimize coverage in a target region, based on a measure called ergodicity. In this paper, we build on this insight to construct a reinforcement learning formulation of the problem of allocating heterogeneous agents to different search regions in the frequency domain. We relate the spectral coefficients of the search map to each other in three different ways. The first method maps agents to pre-defined sets of spectral coefficients. In the second method, each agent learns a weight distribution over all spectral coefficients. Finally, in the third method, each agent learns weight distributions as parameterized curves over coefficients. Our numerical results demonstrate that distributing and assigning coverage responsibilities to agents depending on their sensing and motion models leads to 40%, 51%, and 46% improvement in coverage performance as measured by the ergodic metric, and 15%, 22%, and 20% improvement in time to find all targets in the search region, for the three methods respectively.
10:30-12:00, Paper ThAT4-CC.5
Multi-Robot Cooperative Socially-Aware Navigation Using Multi-Agent Reinforcement Learning
Wang, Weizheng | Purdue University |
Mao, Le | Beijing University of Chemical Technology |
Wang, Ruiqi | Purdue University |
Min, Byung-Cheol | Purdue University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Human-Aware Motion Planning, Human Factors and Human-in-the-Loop
Abstract: In public spaces shared with humans, ensuring multi-robot systems navigate without collisions while respecting social norms is challenging, particularly with limited communication. Although current robot social navigation techniques leverage advances in reinforcement learning and deep learning, they frequently overlook robot dynamics in simulations, leading to a simulation-to-reality gap. In this paper, we bridge this gap by presenting a new multi-robot social navigation environment crafted using Dec-POSMDP and multi-agent reinforcement learning. Furthermore, we introduce SAMARL: a novel benchmark for cooperative multi-robot social navigation. SAMARL employs a unique spatial-temporal transformer combined with multi-agent reinforcement learning. This approach effectively captures the complex interactions between robots and humans, thus promoting cooperative tendencies in multi-robot systems. Our extensive experiments reveal that SAMARL outperforms existing baseline and ablation models in our designed environment. Demo videos for this work can be found at: https://sites.google.com/view/samarl
10:30-12:00, Paper ThAT4-CC.6
Multi-Agent Path Finding for Cooperative Autonomous Driving
Yan, Zhongxia | Massachusetts Institute of Technology |
Zheng, Han | Massachusetts Institute of Technology |
Wu, Cathy | MIT |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Intelligent Transportation Systems, Motion and Path Planning
Abstract: Anticipating the possible future deployment of connected and automated vehicles (CAVs), cooperative autonomous driving at intersections has been studied by many works in control theory and intelligent transportation across decades. Simultaneously, recent parallel works in robotics have devised efficient algorithms for multi-agent path finding (MAPF), though often in environments with simplified kinematics. In this work, we hybridize insights and algorithms from MAPF with the structure and heuristics of optimizing the crossing order of CAVs at signal-free intersections. We devise an optimal and complete algorithm, Order-based Search with Kinematics Arrival Time Scheduling (OBS-KATS), which significantly outperforms existing algorithms, fixed heuristics, and prioritized planning with KATS. The performance is maintained under different vehicle arrival rates, lane lengths, crossing speeds, and control horizons. Through ablations and dissections, we offer insight into the factors contributing to OBS-KATS's performance. Our work is directly applicable to many similarly scaled traffic and multi-robot scenarios with directed lanes.
10:30-12:00, Paper ThAT4-CC.7
Communication-Aware Map Compression for Online Path-Planning
Psomiadis, Evangelos | Georgia Institute of Technology |
Maity, Dipankar | University of North Carolina - Charlotte |
Tsiotras, Panagiotis | Georgia Tech |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Optimization and Optimal Control, Cooperating Robots
Abstract: This paper addresses the problem of communicating optimally compressed information for mobile robot path-planning. In this context, mobile robots compress their current local maps to assist another robot in reaching a target in an unknown environment. We propose a framework that sequentially selects the optimal level of compression, guided by the robot's path, by balancing map resolution and communication cost. Our approach is tractable in close-to-real scenarios and does not necessitate prior knowledge of the environment. We design a novel decoder that leverages compressed information to estimate the unknown environment via convex optimization with linear constraints, and an encoder that utilizes the decoder to select the optimal compression. Numerical simulations are conducted in a large close-to-real map and a maze map and compared with two alternative approaches. The results confirm the effectiveness of our framework in assisting the robot in reaching its target by reducing transmitted information, on average, by approximately 50%, while maintaining satisfactory performance.
10:30-12:00, Paper ThAT4-CC.8
Wind Field Modeling for Formation Planning in Multi-Drone Systems
Park, Minhyuk | UNIST |
Au, Tsz-Chiu | Ulsan National Institute of Science and Technology |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Planning, Scheduling and Coordination
Abstract: In multi-drone systems such as drone light shows, drones move in formation while avoiding collisions. However, few existing formation planning algorithms consider the wind fields of drones during planning. Since the wind field effect is prominent when drones have to fly close to each other, we cannot ignore the effect during planning. In this paper, we extend the reservation system in autonomous intersection management for grid-based formation planning by including a new type of reservation, called a non-exclusive reservation, specifically for handling wind fields. We train a deep learning model to predict the deviation of a drone's trajectory when the drone enters the wind field of another drone and then use the reservation grid to prevent collisions. Based on the reservation system, we develop a new formation planning algorithm that focuses on adjusting the start times of motion plans to avoid collisions. Our experimental results show that trajectory prediction can help make better decisions in task assignments for minimizing makespans.
10:30-12:00, Paper ThAT4-CC.9
Multi-Robot Informative Path Planning from Regression with Sparse Gaussian Processes
Jakkala, Kalvik | University of North Carolina at Charlotte |
Akella, Srinivas | University of North Carolina at Charlotte |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Probability and Statistical Methods, Environment Monitoring and Management
Abstract: This paper addresses multi-robot informative path planning (IPP) for environmental monitoring. The problem involves determining informative regions in the environment that should be visited by robots to gather the most information about the environment. We propose an efficient sparse Gaussian process-based approach that uses gradient descent to optimize paths in continuous environments. Our approach efficiently scales to both spatially and spatio-temporally correlated environments. Moreover, our approach can simultaneously optimize the informative paths while accounting for routing constraints, such as a distance budget and limits on the robot's velocity and acceleration. Our approach can be used for IPP with both discrete and continuous sensing robots, with point and non-point field-of-view sensing shapes, and for both single and multi-robot IPP. We demonstrate that the proposed approach is fast and accurate on real-world data.
ThAT5-CC Oral Session, CC-411
Visual Learning I
Chair: Aragon-Camarasa, Gerardo | University of Glasgow |
Co-Chair: Yi, Li | Tsinghua University |
10:30-12:00, Paper ThAT5-CC.1
ASP-LED: Learning Ambiguity-Aware Structural Priors for Joint Low-Light Enhancement and Deblurring
Ye, Jing | Sun Yat-Sen University |
Liu, Yang | Sun Yat-Sen University |
Yu, Congjing | Sun Yat-Sen University |
Qiu, Changzhen | Sun Yat-Sen University |
Zhang, Zhiyong | Sun Yat-Sen University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Visual Learning
Abstract: Low-light enhancement and deblurring is vital for high-level vision-related nighttime tasks. Most existing cascaded and joint enhancement methods may provide undesirable results, suffering from severe artifacts, deteriorating blur, and unclear details. In this paper, we propose a novel ambiguity-aware network (ASP-LED) with structural priors, including high-frequency and edge priors, to enable effective image representation learning for joint low-light enhancement and deblurring. Specifically, we employ a Transformer backbone to explore the global clues of the image. To compensate for the inadequate local detail optimization, we propose a multi-patch perception pyramid block that models the correlation between different-sized patches and ambiguity, and identifies non-uniform deblurring spatial features, facilitating the reconstruction of potential high-frequency and edge information. Furthermore, a prior-guided reconstruction block based on a parallel attention mechanism is presented to adaptively correct the global image with statistical features, which helps guide the model to refine sharp texture and structure. Extensive experiments performed on simulated and real-world datasets demonstrate the efficacy of our proposed method in restoring low-light blurry images with increased visual perception compared to state-of-the-art methods.
10:30-12:00, Paper ThAT5-CC.2
AnyOKP: One-Shot and Instance-Aware Object Keypoint Extraction with Pretrained ViT
Qin, Fangbo | Institute of Automation, Chinese Academy of Sciences |
Hou, Taogang | Beijing Jiaotong University |
Lin, Shan | University of California, San Diego |
Wang, Kaiyuan | University of California, San Diego |
Yip, Michael C. | University of California, San Diego |
Yu, Shan | Institute of Automation, Chinese Academy of Sciences |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Visual Learning
Abstract: Towards flexible object-centric visual perception, we propose a one-shot instance-aware object keypoint (OKP) extraction approach, AnyOKP, which leverages the powerful representation ability of a pretrained vision transformer (ViT) and can obtain keypoints on multiple object instances of arbitrary category after learning from a support image. An off-the-shelf pretrained ViT is directly deployed for generalizable and transferable feature extraction, which is followed by training-free feature enhancement. The best-prototype pairs (BPPs) are searched for in the support and query images based on appearance similarity, to yield instance-unaware candidate keypoints. Then, the entire graph with all candidate keypoints as vertices is divided into sub-graphs according to the feature distributions on the graph edges. Finally, each sub-graph represents an object instance. AnyOKP is evaluated on real object images collected with the cameras of a robot arm, a mobile robot, and a surgical robot, which not only demonstrates its cross-category flexibility and instance awareness, but also shows remarkable robustness to domain shift and viewpoint change.
|
|
10:30-12:00, Paper ThAT5-CC.3 | Add to My Program |
RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision |
|
Pan, Mingjie | Peking University |
Liu, Jiaming | Peking University |
Zhang, Renrui | CUHK |
Huang, Peixiang | Peking University |
Li, Xiaoqi | Peking University |
Wang, Bing | NTU |
Xie, Hongwei | Nanjing University |
Liu, Li | Xiaomi Car |
Zhang, Shanghang | Peking University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Visual Learning
Abstract: 3D occupancy prediction holds significant promise in the fields of robot perception and autonomous driving, as it quantifies 3D scenes into grid cells with semantic labels. Recent works mainly utilize complete occupancy labels in 3D voxel space for supervision. However, the expensive annotation process and sometimes ambiguous labels have severely constrained the usability and scalability of 3D occupancy models. To address this, we present RenderOcc, a novel paradigm for training 3D occupancy models using only 2D labels. Specifically, we extract a NeRF-style 3D volume representation from multi-view images and employ volume rendering techniques to establish 2D renderings, thus enabling direct 3D supervision from 2D semantic and depth labels. Additionally, we introduce an Auxiliary Ray method to tackle the issue of sparse viewpoints in autonomous driving scenarios, which leverages sequential frames to construct a comprehensive 2D rendering for each object. To the best of our knowledge, RenderOcc is the first attempt to train multi-view 3D occupancy models using only 2D labels, reducing the dependence on costly 3D occupancy annotations. Extensive experiments demonstrate that RenderOcc achieves performance comparable to models fully supervised with 3D labels, underscoring the significance of this approach in real-world applications.
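The 2D supervision rests on standard NeRF-style volume rendering: density-weighted compositing along each camera ray produces a rendered semantic vector and an expected depth that can be compared directly against 2D labels. A minimal single-ray sketch of that quadrature (illustrative only, not the RenderOcc implementation):

```python
import torch

def render_ray(densities, semantics, z_vals):
    """Classic volume-rendering quadrature along one ray.

    densities: (N,)   non-negative density at each ray sample
    semantics: (N, C) per-sample semantic logits
    z_vals:    (N,)   increasing sample depths along the ray
    Returns a rendered semantic vector and an expected depth, both of
    which can be supervised with 2D semantic and depth labels.
    """
    deltas = torch.diff(z_vals, append=z_vals[-1:] + 1e10)   # interval lengths
    alpha = 1.0 - torch.exp(-densities * deltas)             # per-sample opacity
    ones = torch.ones_like(alpha[:1])
    trans = torch.cumprod(torch.cat([ones, 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans                                  # rendering weights
    sem_2d = (weights[:, None] * semantics).sum(dim=0)       # rendered semantics
    depth_2d = (weights * z_vals).sum()                      # expected depth
    return sem_2d, depth_2d
```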
|
|
10:30-12:00, Paper ThAT5-CC.4 | Add to My Program |
DRO: Deep Recurrent Optimizer for Video to Depth |
|
Gu, Xiaodong | Alibaba Group |
Yuan, Weihao | Hong Kong University of Science and Technology |
Dai, Zuozhuo | Alibaba Group |
Zhu, Siyu | Alibaba AI Lab |
Tang, Chengzhou | Simon Fraser University |
Dong, Zilong | Company |
Tan, Ping | Simon Fraser University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Visual Learning
Abstract: There is increasing interest in studying the video-to-depth (V2D) problem with machine learning techniques. While earlier methods directly learn a mapping from images to depth maps and camera poses, more recent works enforce multi-view geometry constraints through optimization embedded in the learning framework. This paper presents a novel optimization method based on recurrent neural networks to further exploit the potential of neural networks in V2D. Specifically, our neural optimizer alternately updates the depth and camera poses through iterations to minimize a feature-metric cost, and two gated recurrent units iteratively improve the results by tracing historical information. In this way, our network is a gradient-free zeroth-order optimizer designed for V2D and can be applied to both supervised and self-supervised V2D. Extensive experimental results demonstrate that our method outperforms previous methods and is more efficient in computation and memory consumption than cost-volume-based methods. In particular, our self-supervised method outperforms previous supervised methods on the KITTI and ScanNet datasets. Our source code will be made public.
|
|
10:30-12:00, Paper ThAT5-CC.5 | Add to My Program |
Doduo: Learning Dense Visual Correspondence from Unsupervised Semantic-Aware Flow |
|
Jiang, Zhenyu | The University of Texas at Austin |
Jiang, Hanwen | UT Austin |
Zhu, Yuke | The University of Texas at Austin |
Keywords: Deep Learning for Visual Perception, Perception for Grasping and Manipulation, Visual Learning
Abstract: Dense visual correspondence plays a vital role in robotic perception. This work focuses on establishing dense correspondence between a pair of images capturing a dynamic scene undergoing substantial transformations. We introduce Doduo to learn general dense visual correspondence from in-the-wild images and videos without ground-truth supervision. Given a pair of images, it estimates the dense flow field encoding the displacement of each pixel in one image to its corresponding pixel in the other image. Doduo uses flow-based warping to acquire supervisory signals for training. Incorporating semantic priors with self-supervised flow training, Doduo produces accurate dense correspondence robust to dynamic changes in the scene. Trained on an in-the-wild video dataset, Doduo demonstrates superior performance on point-level correspondence estimation over existing self-supervised correspondence learning baselines. We also apply Doduo to articulation estimation and zero-shot goal-conditioned manipulation, underlining its practical applications in robotics.
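Flow-based warping supervision of this kind is commonly implemented by resampling one image with the predicted flow and penalizing the photometric difference, so no ground-truth correspondence is needed. A minimal sketch assuming a dense pixel-space flow (names are illustrative; Doduo's actual objectives are richer):

```python
import torch
import torch.nn.functional as F

def flow_warp_loss(img_a, img_b, flow_ab):
    """L1 photometric loss after warping B into A's frame with the flow.

    img_a, img_b: (B, 3, H, W) image pair
    flow_ab:      (B, 2, H, W) predicted (dx, dy) flow from A to B, in pixels
    """
    b, _, h, w = img_a.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32),
        torch.arange(w, dtype=torch.float32),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0)[None] + flow_ab    # target pixel coords
    # normalize coordinates to [-1, 1] as expected by grid_sample
    gx = 2.0 * grid[:, 0] / (w - 1) - 1.0
    gy = 2.0 * grid[:, 1] / (h - 1) - 1.0
    warped_b = F.grid_sample(img_b, torch.stack((gx, gy), dim=-1),
                             align_corners=True)
    return (img_a - warped_b).abs().mean()
```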
|
|
10:30-12:00, Paper ThAT5-CC.6 | Add to My Program |
RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models |
|
Long, Zijun | University of Glasgow |
Killick, George | School of Computing Science, University of Glasgow |
McCreadie, Richard | University of Glasgow |
Aragon-Camarasa, Gerardo | University of Glasgow |
Keywords: Deep Learning for Visual Perception, Recognition
Abstract: Robotic vision applications often necessitate a wide range of visual perception tasks, such as object detection, segmentation, and identification. While there have been substantial advances in these individual tasks, integrating specialized models into a unified vision pipeline presents significant engineering challenges and costs. Recently, Multimodal Large Language Models (MLLMs) have emerged as novel backbones for various downstream tasks. We argue that leveraging the pre-training capabilities of MLLMs enables the creation of a simplified framework, thus mitigating the need for task-specific encoders. Specifically, the large-scale pretrained knowledge in MLLMs allows for easier fine-tuning to downstream robotic vision tasks and yields superior performance. We introduce the RoboLLM framework, equipped with a BEiT-3 backbone, to address all visual perception tasks in the ARMBench challenge, a large-scale robotic manipulation dataset of real-world warehouse scenarios. RoboLLM not only outperforms existing baselines but also substantially reduces the engineering burden associated with model selection and tuning. All the code used in this paper can be found at https://github.com/longkukuhi/RoboLLM.
|
|
10:30-12:00, Paper ThAT5-CC.7 | Add to My Program |
CrossVideo: Self-Supervised Cross-Modal Contrastive Learning for Point Cloud Video Understanding |
|
Liu, Yunze | Tsinghua University |
Chen, Changxi | Tsinghua University |
Wang, Zifan | Tsinghua University |
Yi, Li | Tsinghua University |
Keywords: Deep Learning for Visual Perception, RGB-D Perception, Visual Learning
Abstract: This paper introduces a novel approach named CrossVideo, which aims to enhance self-supervised cross-modal contrastive learning in the field of point cloud video understanding. Traditional supervised learning methods encounter limitations due to data scarcity and challenges in label acquisition. To address these issues, we propose a self-supervised learning method that leverages the cross-modal relationship between point cloud videos and image videos to acquire meaningful feature representations. Intra-modal and cross-modal contrastive learning techniques are employed to facilitate effective comprehension of point cloud video. We also propose a multi-level contrastive approach for both modalities. Through extensive experiments, we demonstrate that our method significantly surpasses previous state-of-the-art approaches, and we conduct comprehensive ablation studies to validate the effectiveness of our proposed designs.
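Intra- and cross-modal contrastive learning between paired clips is typically realized as a symmetric InfoNCE objective. A minimal sketch of the cross-modal term, assuming batch-aligned positives (a simplification of the paper's multi-level formulation):

```python
import torch
import torch.nn.functional as F

def cross_modal_nce(pc_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE between paired point-cloud and image-video clips.

    pc_emb, img_emb: (B, D) clip embeddings; row i of each is a positive
    pair, and all other rows in the batch serve as negatives.
    """
    pc = F.normalize(pc_emb, dim=-1)
    im = F.normalize(img_emb, dim=-1)
    logits = pc @ im.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(pc.shape[0])          # diagonal = positive pairs
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```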
|
|
10:30-12:00, Paper ThAT5-CC.8 | Add to My Program |
FSNet: Redesign Self-Supervised MonoDepth for Full-Scale Depth Prediction for Autonomous Driving (I) |
|
Liu, Yuxuan | Hong Kong University of Science and Technology |
Xu, Zhenhua | The Hong Kong University of Science and Technology |
Huang, Huaiyang | The Hong Kong University of Science and Technology |
Wang, Lujia | The Hong Kong University of Science and Technology (Guangzhou) |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Deep Learning for Visual Perception, Visual Learning, RGB-D Perception
Abstract: Predicting accurate depth from monocular images is important for low-cost robotic applications and autonomous driving. This study proposes a comprehensive self-supervised framework for accurate scale-aware depth prediction on autonomous driving scenes utilizing inter-frame poses obtained from inertial measurements. In particular, we introduce a Full-Scale depth prediction network named FSNet. FSNet contains four important improvements over existing self-supervised models: (1) a multichannel output representation for stable training of depth prediction in driving scenarios, (2) an optical-flow-based mask designed for dynamic object removal, (3) a self-distillation training strategy to augment the training process, and (4) an optimization-based post-processing algorithm at test time, fusing the results from visual odometry. With this framework, robots and vehicles with only one well-calibrated camera can collect sequences of training image frames and camera poses, and infer accurate 3D depths of the environment without extra labeling work or 3D data. Extensive experiments on the KITTI, KITTI-360, and nuScenes datasets demonstrate the potential of FSNet. More visualizations are presented at https://sites.google.com/view/fsnet/home
|
|
10:30-12:00, Paper ThAT5-CC.9 | Add to My Program |
V2CE: Video to Continuous Events Simulator |
|
Zhang, Zhongyang | University of California San Diego |
Cui, Shuyang | University of California, San Diego |
Chai, Kaidong | University of Massachusetts Amherst |
Yu, Haowen | University of Massachusetts Amherst |
Dasgupta, Subhasis | University of California San Diego |
Mahbub, Upal | Qualcomm |
Rahman, Tauhidur | University of California San Diego |
Keywords: Deep Learning for Visual Perception, Visual Learning, Simulation and Animation
Abstract: Dynamic Vision Sensor (DVS)-based solutions have recently garnered significant interest across various computer vision tasks, offering notable benefits in terms of dynamic range, temporal resolution, and inference speed. However, as a relatively nascent vision sensor compared to Active Pixel Sensor (APS) devices such as RGB cameras, DVS suffers from a dearth of ample labeled datasets. Prior efforts to convert APS data into events often grapple with issues such as a considerable domain shift from real events, the absence of quantified validation, and layering problems within the time axis. In this paper, we present a novel method for video-to-events stream conversion from multiple perspectives, considering the specific characteristics of DVS. A series of carefully designed losses helps enhance the quality of generated event voxels significantly. We also propose a novel local dynamic-aware timestamp inference strategy to accurately recover event timestamps from event voxels in a continuous fashion and eliminate the temporal layering problem. Results from rigorous validation through quantified metrics at all stages of the pipeline establish our method unquestionably as the current state-of-the-art (SOTA). The code can be found at bit.ly/v2ce.
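For orientation, the classic baseline that video-to-event converters build on fires an event whenever the log intensity at a pixel changes by more than a contrast threshold. A toy sketch of that baseline model (V2CE's learned pipeline is considerably more elaborate; this only illustrates the underlying idea):

```python
import numpy as np

def frames_to_events(frames, timestamps, c_thresh=0.2, eps=1e-3):
    """Threshold model for video-to-event conversion.

    frames:     (T, H, W) grayscale video
    timestamps: (T,) frame times
    Emits (x, y, t, polarity) tuples whenever the log intensity at a
    pixel changes by at least c_thresh since its last event.
    """
    events = []
    log_ref = np.log(frames[0].astype(np.float64) + eps)
    for frame, t in zip(frames[1:], timestamps[1:]):
        log_cur = np.log(frame.astype(np.float64) + eps)
        diff = log_cur - log_ref
        ys, xs = np.nonzero(np.abs(diff) >= c_thresh)
        for x, y in zip(xs, ys):
            events.append((x, y, t, 1 if diff[y, x] > 0 else -1))
            log_ref[y, x] = log_cur[y, x]  # reset reference at fired pixels
    return events
```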
|
|
ThAT6-CC Oral Session, CC-414 |
Add to My Program |
Computer Vision for Automation |
|
|
Chair: Bennewitz, Maren | University of Bonn |
Co-Chair: Yamakawa, Yuji | The University of Tokyo |
|
10:30-12:00, Paper ThAT6-CC.1 | Add to My Program |
Physically Grounded Vision-Language Models for Robotic Manipulation |
|
Gao, Jensen | Stanford University |
Sarkar, Bidipta | Stanford University |
Xia, Fei | Google Inc |
Xiao, Ted | Google |
Wu, Jiajun | Stanford University |
Ichter, Brian | Google Brain |
Majumdar, Anirudha | Princeton University |
Sadigh, Dorsa | Stanford University |
Keywords: Data Sets for Robotic Vision, Deep Learning for Visual Perception, Big Data in Robotics and Automation
Abstract: Recent advances in vision-language models (VLMs) have led to improved performance on tasks such as visual question answering and image captioning. Consequently, these models are now well-positioned to reason about the physical world, particularly within domains such as robotic manipulation. However, current VLMs are limited in their understanding of the physical concepts (e.g., material, fragility) of common objects, which restricts their usefulness for robotic manipulation tasks that involve interaction and physical reasoning about such objects. To address this limitation, we propose PhysObjects, an object-centric dataset of 39.6K crowd-sourced and 417K automated physical concept annotations of common household objects. We demonstrate that fine-tuning a VLM on PhysObjects improves its understanding of physical object concepts, including generalization to held-out concepts, by capturing human priors of these concepts from visual appearance. We incorporate this physically grounded VLM in an interactive framework with a large language model-based robotic planner, and show improved planning performance on tasks that require reasoning about physical object concepts, compared to baselines that do not leverage physically grounded VLMs. We additionally illustrate the benefits of our physically grounded VLM on a real robot, where it improves task success rates. We release our dataset and provide further details and visualizations of our results at https://iliad.stanford.edu/pg-vlm/.
|
|
10:30-12:00, Paper ThAT6-CC.2 | Add to My Program |
How Many Views Are Needed to Reconstruct an Unknown Object Using NeRF? |
|
Pan, Sicong | University of Bonn |
Jin, Liren | University of Bonn |
Hu, Hao | Fudan University |
Popovic, Marija | University of Bonn |
Bennewitz, Maren | University of Bonn |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Neural Radiance Fields (NeRFs) are gaining significant interest for online active object reconstruction due to their exceptional memory efficiency and requirement for only posed RGB inputs. Previous NeRF-based view planning methods exhibit computational inefficiency since they rely on an iterative paradigm, consisting of (1) retraining the NeRF when new images arrive; and (2) planning a path to the next best view only. To address these limitations, we propose a non-iterative pipeline based on the Prediction of the Required number of Views (PRV). The key idea behind our approach is that the required number of views to reconstruct an object depends on its complexity. Therefore, we design a deep neural network, named PRVNet, to predict the required number of views, allowing us to tailor the data acquisition based on the object complexity and plan a globally shortest path. To train our PRVNet, we generate supervision labels using the ShapeNet dataset. Simulated experiments show that our PRV-based view planning method outperforms baselines, achieving good reconstruction quality while significantly reducing movement cost and planning time. We further justify the generalization ability of our approach in a real-world experiment.
|
|
10:30-12:00, Paper ThAT6-CC.3 | Add to My Program |
Active Implicit Reconstruction Using One-Shot View Planning |
|
Hu, Hao | Fudan University |
Pan, Sicong | University of Bonn |
Jin, Liren | University of Bonn |
Popovic, Marija | University of Bonn |
Bennewitz, Maren | University of Bonn |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Active object reconstruction using autonomous robots is gaining great interest. A primary goal in this task is to maximize the information about the object to be reconstructed, given limited on-board resources. Previous view planning methods exhibit inefficiency since they rely on an iterative paradigm based on explicit representations, consisting of (1) planning a path to the next-best view only; and (2) requiring a considerable number of less-gain views in terms of surface coverage. To address these limitations, we propose to integrate implicit representations into One-Shot View Planning (OSVP). The key idea behind our approach is to use implicit representations to obtain the small missing surface areas instead of observing them with extra views. Therefore, we design a deep neural network, named OSVP, to directly predict a set of views given a dense point cloud refined from an initial sparse observation. To train our OSVP network, we generate supervision labels using dense point clouds refined by implicit representations and set-covering optimization problems. Simulated experiments show that our method achieves sufficient reconstruction quality, outperforming several baselines under limited view and movement budgets. We further demonstrate the applicability of our approach in a real-world object reconstruction scenario.
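The set-covering optimization used for label generation is commonly approximated greedily: repeatedly pick the view that covers the most still-uncovered surface. A minimal sketch with a hypothetical data layout (not the authors' solver):

```python
def greedy_view_cover(view_coverage, surface_ids):
    """Greedy approximation to set-covering view selection.

    view_coverage: dict mapping view id -> set of surface patch ids it sees
    surface_ids:   set of all surface patches to cover
    Returns a small list of views whose union covers every patch it can.
    """
    uncovered = set(surface_ids)
    chosen = []
    while uncovered:
        # pick the view covering the most still-uncovered patches
        best = max(view_coverage, key=lambda v: len(view_coverage[v] & uncovered))
        gain = view_coverage[best] & uncovered
        if not gain:          # nothing coverable remains; stop
            break
        chosen.append(best)
        uncovered -= gain
    return chosen

# toy usage: 3 candidate views over 5 surface patches
views = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5}}
print(greedy_view_cover(views, {1, 2, 3, 4, 5}))  # -> [0, 2]
```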
|
|
10:30-12:00, Paper ThAT6-CC.4 | Add to My Program |
Multimodal Object Query Initialization for 3D Object Detection |
|
van Geerenstein, Mathijs Ruben | Delft University of Technology |
Ruppel, Felicia | Bosch Research and Ulm University |
Dietmayer, Klaus | University of Ulm |
Gavrila, Dariu | Delft University of Technology |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Sensor Fusion
Abstract: 3D object detection models that exploit both LiDAR and camera sensor features are top performers in large-scale autonomous driving benchmarks. A transformer is a popular network architecture used for this task, in which so-called object queries act as candidate objects. Initializing these object queries based on current sensor inputs is a common practice. However, existing methods rely strongly on LiDAR data for this and do not fully exploit image features; besides, they introduce significant latency. To overcome these limitations we propose EfficientQ3M, an efficient, modular, and multimodal solution for object query initialization for transformer-based 3D object detection models. The proposed initialization method is combined with a "modality-balanced" transformer decoder where the queries can access all sensor modalities throughout the decoder. In experiments, we outperform the state of the art in transformer-based LiDAR object detection on the competitive nuScenes benchmark and showcase the benefits of input-dependent multimodal query initialization, while being more efficient than the available alternatives for LiDAR-camera initialization. The proposed method can be applied with any combination of sensor modalities as input, demonstrating its modularity.
|
|
10:30-12:00, Paper ThAT6-CC.5 | Add to My Program |
SM^3: Self-Supervised Multi-Task Modeling with Multi-View 2D Images for Articulated Objects |
|
Wang, Haowen | Beijing University of Posts and Telecommunications |
Zhao, Zhen | Midea Group |
Jin, Zhao | Midea Group |
Che, Zhengping | Midea Group |
Qiao, Liang | Beijing University of Posts and Telecommunications |
Huang, Yakun | Beijing University of Posts and Telecommunications |
Fan, Zhipeng | Beijing University of Posts and Telecommunications |
Qiao, XiuQuan | Beijing University of Posts and Telecommunications |
Tang, Jian | Midea Group (Shanghai) Co., Ltd |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Simulation and Animation
Abstract: Reconstructing real-world objects and estimating their movable joint structures are pivotal technologies within the field of robotics. Previous research has predominantly focused on supervised approaches, relying on extensively annotated datasets to model articulated objects within limited categories. However, this approach falls short of effectively addressing the diversity present in the real world. To tackle this issue, we propose a self-supervised interaction perception method, referred to as SM3, which leverages multi-view RGB images captured before and after interaction to model articulated objects, identify movable parts, and infer the parameters of their rotating joints. By constructing 3D geometries and textures from the captured 2D images, SM3 achieves integrated optimization of movable parts and joint parameters during the reconstruction process, obviating the need for annotations. Furthermore, we introduce the MMArt dataset, an extension of PartNet-Mobility, encompassing multi-view and multi-modal data of articulated objects spanning diverse categories. Evaluations demonstrate that SM3 surpasses existing benchmarks across various categories and objects, while its adaptability in real-world scenarios has been duly validated.
|
|
10:30-12:00, Paper ThAT6-CC.6 | Add to My Program |
MF-MOS: A Motion-Focused Model for Moving Object Segmentation |
|
Cheng, Jintao | South China Normal University |
Zeng, Kang | South China Normal University |
Huang, Zhuoxu | Aberystwyth University |
Tang, Xiaoyu | South China Normal University |
Jin, Wu | UESTC |
Zhang, Chengxi | Jiangnan University |
Chen, Xieyuanli | National University of Defense Technology |
Fan, Rui | Tongji University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, SLAM
Abstract: Moving object segmentation (MOS) provides a reliable solution for detecting traffic participants and is thus of great interest in the autonomous driving field. Dynamic capture is always critical in the MOS problem. While previous methods capture motion features from range images directly, we argue that residual maps provide greater potential for motion information, while range images contain rich semantic guidance. Based on this intuition, we propose MF-MOS, a novel motion-focused model with a dual-branch structure for LiDAR moving object segmentation. As a novel contribution, we decouple the spatial-temporal information by capturing motion from residual maps and generating semantic features from range images, which serve as movable-object guidance for the motion branch. Our straightforward yet distinctive solution makes the most of both range images and residual maps, thus greatly improving the performance of the LiDAR-based MOS task. Remarkably, our MF-MOS achieved a leading IoU of 76.7% on the MOS leaderboard of the SemanticKITTI dataset upon submission, demonstrating state-of-the-art performance. The implementation of our MF-MOS has been released at https://github.com/SCNU-RISLAB/MF-MOS.
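The residual maps feeding the motion branch are, in essence, normalized range differences between the current scan and past scans that have been ego-motion-compensated into the current frame. A minimal sketch of that computation (illustrative; pose compensation is assumed done upstream):

```python
import numpy as np

def residual_maps(range_images, num_residuals=3, eps=1e-6):
    """Build residual maps from a sequence of aligned range images.

    range_images: (T, H, W) array; range_images[-1] is the current scan,
    earlier entries are past scans reprojected into the current frame.
    Large normalized range differences highlight moving objects.
    """
    current = range_images[-1]
    valid_cur = current > 0
    residuals = []
    for k in range(1, num_residuals + 1):
        past = range_images[-1 - k]
        valid = valid_cur & (past > 0)       # both pixels must hold returns
        res = np.zeros_like(current)
        res[valid] = np.abs(current[valid] - past[valid]) / (current[valid] + eps)
        residuals.append(res)
    return np.stack(residuals)               # (num_residuals, H, W)
```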
|
|
10:30-12:00, Paper ThAT6-CC.7 | Add to My Program |
Improving Neural Indoor Surface Reconstruction with Mask-Guided Adaptive Consistency Constraints |
|
Yu, Xinyi | Zhejiang University of Technology |
Lu, Liqin | Zhejiang University of Technology |
Rong, Jintao | Zhejiang University of Technology |
Xu, Guangkai | Zhejiang University |
Ou, Linlin | Zhejiang University of Technology |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Visual Learning
Abstract: 3D scene reconstruction from 2D images has been a long-standing task. Instead of estimating per-frame depth maps and fusing them in 3D, recent research leverages the neural implicit surface as a unified representation for 3D reconstruction. Equipped with data-driven pre-trained geometric cues, these methods have demonstrated promising performance. However, inaccurate prior estimation, which is usually inevitable, can lead to suboptimal reconstruction quality, particularly in some geometrically complex regions. In this paper, we propose a two-stage training process, decouple view-dependent and view-independent colors, and leverage two novel consistency constraints to enhance detail reconstruction performance without requiring extra priors. Additionally, we introduce an essential mask scheme to adaptively influence the selection of supervision constraints, thereby improving performance in a self-supervised paradigm. Experiments on synthetic and real-world datasets show the capability of reducing the interference from prior estimation errors and achieving high-quality scene reconstruction with rich geometric details.
|
|
10:30-12:00, Paper ThAT6-CC.8 | Add to My Program |
NFL: Normal Field Learning for 6-DoF Grasping of Transparent Objects |
|
Lee, Junho | Seoul National University |
Kim, Sang Min | Seoul National University |
Lee, Yonghyeon | Korea Institute for Advanced Study |
Kim, Young Min | Seoul National University |
Keywords: Deep Learning for Visual Perception, Deep Learning in Grasping and Manipulation, Grasping
Abstract: We present Normal Field Learning (NFL), a robust yet practical solution to perceive 3D layouts of transparent objects and grasp them quickly. Conventional input modalities for vision-based grasping do not provide sufficient information for transparent objects. However, with recent advances in datasets and algorithms for transparent objects, we can at least obtain noisy estimates of normals and masks under various real-world conditions. Instead of directly using the RGB images, we propose to use these estimates to train a neural volume, which serves as an intermediate representation agnostic to challenging appearance variations. We formulate the training objective to account for inherent uncertainty in individual estimates, and together with the volumetric aggregation, we can reliably extract useful geometric information for grasping. Our neural volume deploys a voxel-grid based representation, motivated by acceleration techniques for neural radiance fields. However, we directly store the normal and density values in the grid cells instead of latent features. Our modification allows direct access to the geometric values without additional inference or volume rendering, further enhancing efficiency. Our results show over 85% success rates for grasping in cluttered scenes with only 40 seconds of training time.
|
|
10:30-12:00, Paper ThAT6-CC.9 | Add to My Program |
TRTM: Template-Based Reconstruction and Target-Oriented Manipulation of Crumpled Cloths |
|
Wang, Wenbo | ETH Zurich |
Li, Gen | ETH Zurich |
Zamora Mora, Miguel Angel | ETH Zurich |
Coros, Stelian | ETH Zurich |
Keywords: Deep Learning for Visual Perception, Perception for Grasping and Manipulation, Data Sets for Robotic Vision
Abstract: Precise reconstruction and manipulation of crumpled cloths is challenging due to the high dimensionality of cloth models and the limited observation of self-occluded regions. We leverage recent progress in single-view human reconstruction to reconstruct crumpled cloths in a template-based manner from only their top-view depth observations, using our proposed sim-real registration protocols. In contrast to previous implicit cloth representations, our reconstructed mesh explicitly describes the positions and visibilities of all cloth mesh vertices, enabling more efficient dual-arm and single-arm target-oriented manipulation. Experiments demonstrate that our TRTM system can be applied to daily cloths that share the topology of our template mesh but differ in shape, size, pattern, and physical properties. Videos, datasets, pre-trained models, and code can be downloaded from our project website: https://wenbwa.github.io/TRTM/.
|
|
ThAT7-CC Oral Session, CC-416 |
Add to My Program |
Model Learning for Control |
|
|
Chair: Shin, Hyo-Sang | Cranfield University |
Co-Chair: Tilbury, Dawn | University of Michigan |
|
10:30-12:00, Paper ThAT7-CC.1 | Add to My Program |
Active Learning of Discrete-Time Dynamics for Uncertainty-Aware Model Predictive Control |
|
Saviolo, Alessandro | New York University |
Frey, Jonathan | University of Freiburg |
Rathod, Abhishek | University of Idaho |
Diehl, Moritz | Univ. of Heidelberg |
Loianno, Giuseppe | New York University |
Keywords: Model Learning for Control, Aerial Systems: Mechanics and Control, Learning and Adaptive Systems, Optimization and Optimal Control
Abstract: Model-based control requires an accurate model of the system dynamics for precisely and safely controlling the robot in complex and dynamic environments. Moreover, in the presence of variations in the operating conditions, the model should be continuously refined to compensate for dynamics changes. In this paper, we present a self-supervised learning approach that actively models the dynamics of nonlinear robotic systems. We combine offline learning from past experience and online learning from current robot interaction with the unknown environment. These two ingredients enable a highly sample-efficient and adaptive learning process, capable of accurately inferring model dynamics in real-time even in operating regimes that greatly differ from the training distribution. Moreover, we design an uncertainty-aware model predictive controller that is heuristically conditioned on the aleatoric (data) uncertainty of the learned dynamics. This controller actively chooses the optimal control actions that (i) optimize the control performance and (ii) improve the efficiency of online learning sample collection. We demonstrate the effectiveness of our method through a series of challenging experiments.
|
|
10:30-12:00, Paper ThAT7-CC.2 | Add to My Program |
SculptBot: Pre-Trained Models for 3D Deformable Object Manipulation |
|
Bartsch, Alison | Carnegie Mellon University |
Avra, Charlotte | Carnegie Mellon University |
Barati Farimani, Amir | Carnegie Mellon University |
Keywords: Model Learning for Control, AI-Based Methods, Dexterous Manipulation
Abstract: Deformable object manipulation presents a unique set of challenges in robotics, as such objects exhibit high degrees of freedom and severe self-occlusion. State representation for materials that exhibit plastic behavior, like modeling clay or bread dough, is also difficult because they permanently deform under stress and are constantly changing shape. In this work, we investigate each of these challenges using the task of robotic sculpting with a parallel gripper. We propose a system that uses point clouds as the state representation and leverages a pre-trained point cloud reconstruction Transformer to learn a latent dynamics model to predict material deformations given a grasp action. We design a novel action sampling algorithm that reasons about geometrical differences between point clouds to further improve the efficiency of model-based planners. All data and experiments are conducted entirely in the real world. Our experiments show the proposed system is able to successfully capture the dynamics of clay and create a variety of simple shapes. Videos and additional figures are available on our project page at: https://sites.google.com/andrew.cmu.edu/sculptbot
|
|
10:30-12:00, Paper ThAT7-CC.3 | Add to My Program |
Learning-Based Model Predictive Control for an Autonomous Formula Student Racing Car |
|
Gomes, David | Instituto Superior Técnico, University of Lisbon |
Botto, Miguel | DEM/IST |
Lima, Pedro U. | Instituto Superior Técnico - Institute for Systems and Robotics |
Keywords: Model Learning for Control, Autonomous Agents, Machine Learning for Robot Control
Abstract: Advancements in Automated Driving Systems (ADSs) have enabled a certain level of autonomy while commuting in a car. However, emergency and high-speed maneuvers still pose significant challenges for ADSs due to the intrinsic nonlinearity and fast-paced behavior of such events. These maneuvers are a distinctive feature within the recently established motorsport discipline of Autonomous Racing (AR). In this work, we explore the use of Learning-based Model Predictive Control (LMPC) to address possible mismatches of the first-principles model in high-speed racing. To this end, a Model Predictive Contouring Control (MPCC) scheme (a specific formulation of the standard Model Predictive Control, MPC) is formulated, and a Neural Network (NN) that leverages feedforward and recurrent layers is employed to learn the errors of the first-principles model. Combining the NN with the first-principles model yields the LMPC, capable of accurately predicting the future with a computational effort compatible with real-time feasibility, effectively handling the vehicle at its limits. Furthermore, the controller can adapt to changing environments by training the NN during the race. The MPCC (the formulation without the NN) is deployed on a real autonomous Formula Student racing car, showing a 16% improvement in mean lap times on the same track compared to a common geometric controller. The LMPC is analyzed in a high-fidelity simulator, achieving an 8.9% improvement in mean lap times compared to the MPCC.
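The combination of a first-principles model with a learned error term can be pictured as a residual network wrapped around a nominal dynamics step. A minimal sketch, substituting a plain MLP for the paper's feedforward-recurrent network (all names are illustrative):

```python
import torch
import torch.nn as nn

class HybridDynamics(nn.Module):
    """Nominal physics step plus a learned residual correction."""

    def __init__(self, physics_step, state_dim, ctrl_dim, hidden=64):
        super().__init__()
        self.physics_step = physics_step      # x_{k+1} = f(x_k, u_k)
        self.residual = nn.Sequential(        # learns only the model mismatch
            nn.Linear(state_dim + ctrl_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, x, u):
        return self.physics_step(x, u) + self.residual(torch.cat([x, u], dim=-1))

# toy nominal model: a double integrator with dt = 0.05
def double_integrator(x, u, dt=0.05):
    pos, vel = x[..., :1], x[..., 1:]
    return torch.cat([pos + vel * dt, vel + u * dt], dim=-1)

model = HybridDynamics(double_integrator, state_dim=2, ctrl_dim=1)
x_next = model(torch.zeros(1, 2), torch.ones(1, 1))
```

Learning only the residual is usually much easier than learning the full dynamics, since the network never has to reproduce the parts of the behavior that first principles already capture.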
|
|
10:30-12:00, Paper ThAT7-CC.4 | Add to My Program |
Learning Terrain-Aware Kinodynamic Model for Autonomous Off-Road Rally Driving with Model Predictive Path Integral Control |
|
Lee, Hojin | Agency for Defense Development |
Kim, Taekyung | Agency for Defense Development |
Mun, Jungwi | Agency for Defense Development |
Lee, Wonsuk | Agency for Defense Development |
Keywords: Model Learning for Control, Autonomous Vehicle Navigation, Field Robots
Abstract: High-speed autonomous driving in off-road environments has immense potential for various applications, but it also presents challenges due to the complexity of vehicle-terrain interactions. In such environments, it is crucial for the vehicle to predict its motion and adjust its controls proactively in response to environmental changes, such as variations in terrain elevation. To this end, we propose a method for learning terrain-aware kinodynamic model which is conditioned on both proprioceptive and exteroceptive information. The proposed model generates reliable predictions of 6-degree-of-freedom motion and can even estimate contact interactions without requiring ground truth force data during training. This enables the design of a safe and robust model predictive controller through appropriate cost function design which penalizes sampled trajectories with unstable motion, unsafe interactions, and high levels of uncertainty derived from the model. We demonstrate the effectiveness of our approach through experiments on a simulated off-road track, showing that our proposed model-controller pair outperforms the baseline and ensures robust high-speed driving performance without control failure.
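Model Predictive Path Integral (MPPI) control updates a nominal control sequence with an exponentially weighted average of sampled perturbations, each rolled out through the (here, learned) dynamics model. A minimal single-iteration sketch (generic MPPI; the paper's cost additionally penalizes unstable motion, unsafe interactions, and model uncertainty):

```python
import numpy as np

def mppi_step(x0, dynamics, cost, u_nominal,
              num_samples=256, noise_std=0.5, lam=1.0, seed=0):
    """One MPPI update of the control sequence.

    x0:        initial state
    dynamics:  f(x, u) -> next state (e.g. a learned kinodynamic model)
    cost:      c(x, u) -> scalar running cost
    u_nominal: (H, du) nominal control sequence to perturb
    """
    rng = np.random.default_rng(seed)
    H, du = u_nominal.shape
    noise = rng.normal(0.0, noise_std, size=(num_samples, H, du))
    total_cost = np.zeros(num_samples)
    for k in range(num_samples):              # roll out each perturbed sequence
        x = x0
        for t in range(H):
            u = u_nominal[t] + noise[k, t]
            total_cost[k] += cost(x, u)
            x = dynamics(x, u)
    beta = total_cost.min()                   # shift for numerical stability
    weights = np.exp(-(total_cost - beta) / lam)
    weights /= weights.sum()
    # exponentially weighted average of the sampled perturbations
    return u_nominal + np.einsum("k,khd->hd", weights, noise)
```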
|
|
10:30-12:00, Paper ThAT7-CC.5 | Add to My Program |
Adaptive Gait Modeling and Optimization for Principally Kinematic Systems |
|
Deng, Siming | Johns Hopkins University |
Cowan, Noah J. | Johns Hopkins University |
Bittner, Brian | JHUAPL |
Keywords: Model Learning for Control, Calibration and Identification, Optimization and Optimal Control
Abstract: Robotic adaptation to unanticipated operating conditions is crucial to achieving persistence and robustness in complex real world settings. For a wide range of cutting-edge robotic systems, such as micro- and nano-scale robots, soft robots, medical robots, and bio-hybrid robots, it is infeasible to anticipate the operating environment a priori due to complexities that arise from numerous factors including imprecision in manufacturing, chemo-mechanical forces, and poorly understood contact mechanics. Drawing inspiration from data-driven modeling, geometric mechanics (or gauge theory), and adaptive control, we employ an adaptive system identification framework and demonstrate its efficacy in enhancing the performance of principally kinematic locomotors (those governed by Rayleigh dissipation or zero momentum conservation). We showcase the capability of the adaptive model to efficiently accommodate varying terrains and iteratively modified behaviors within a behavior optimization framework. This provides both the ability to improve fundamental behaviors and perform motion tracking to precision. Notably, we are capable of optimizing the gaits of the Purcell swimmer using approximately 10 cycles per link, which for the nine-link Purcell swimmer provides a factor of ten improvement in optimization speed over the state of the art. Beyond simply a computational speed up, this ten-fold improvement may enable this method to be successfully deployed for in-situ behavior refinement, injury recovery, and terrain adaptation, particularly in domains where simulations provide poor guides for the real world.
|
|
10:30-12:00, Paper ThAT7-CC.6 | Add to My Program |
Recursive Least Squares with Log-Determinant Divergence Regularisation for Online Inertia Identification |
|
Cho, Namhoon | Cranfield University |
Lee, Taeyoon | Naver Labs |
Shin, Hyo-Sang | Cranfield University |
Keywords: Model Learning for Control, Calibration and Identification, Robust/Adaptive Control
Abstract: This study presents a recursive algorithm for solving the regularised least squares problem for online identification of rigid body dynamic model parameters, with emphasis on the physical consistency of the estimated inertial parameters. One geometric approach is to use a regulariser that represents how close the pseudo-inertia matrix is to a given reference on the feasible manifold in the regression problem. The proposed extension enables memory-efficient online learning in addition to the benefits of geometry-aware convex regularisation using the log-determinant divergence of the pseudo-inertia matrix. Also, the recursive version endows the estimator with the capability to deal with time-variation of parameters by introducing an optional forgetting mechanism. The characteristics of the recursive regularised least squares algorithm are demonstrated using the MIT Cheetah 3 leg swinging experiment dataset and compared to the existing batch optimisation method.
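The recursive backbone of such an estimator is ordinary recursive least squares with exponential forgetting; the paper's contribution layers a log-determinant-divergence regulariser on top to keep the implied pseudo-inertia matrix physically consistent. A sketch of just that backbone (the regularisation itself is omitted):

```python
import numpy as np

class RecursiveLeastSquares:
    """Plain RLS with forgetting for y_k = phi_k^T theta + noise."""

    def __init__(self, dim, forgetting=0.99, p0=1e3):
        self.theta = np.zeros(dim)       # parameter estimate
        self.P = np.eye(dim) * p0        # (scaled) inverse information matrix
        self.lam = forgetting            # lam < 1 discounts old data

    def update(self, phi, y):
        Pphi = self.P @ phi
        gain = Pphi / (self.lam + phi @ Pphi)                 # Kalman-style gain
        self.theta = self.theta + gain * (y - phi @ self.theta)
        self.P = (self.P - np.outer(gain, Pphi)) / self.lam   # covariance update
        return self.theta
```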
|
|
10:30-12:00, Paper ThAT7-CC.7 | Add to My Program |
Sequential Manipulation of Deformable Linear Object Networks with Endpoint Pose Measurements Using Adaptive Model Predictive Control |
|
Toner, Tyler | University of Michigan |
Molazadeh, Vahidreza | Carnegie Mellon University |
Saez, Miguel | General Motors |
Tilbury, Dawn | University of Michigan |
Barton, Kira | University of Michigan at Ann Arbor |
Keywords: Model Learning for Control, Compliant Assembly, Robust/Adaptive Control
Abstract: Robotic manipulation of deformable linear objects (DLOs) is an active area of research, though emerging applications, like automotive wire harness installation, introduce constraints that have not been considered in prior work. Confined workspaces and limited visibility complicate prior assumptions of multi-robot manipulation and direct measurement of DLO configuration (state). This work focuses on single-arm manipulation of stiff DLOs (StDLOs) connected to form a DLO network (DLON), for which the measurements (output) are the endpoint poses of the DLON, which are subject to unknown dynamics during manipulation. To demonstrate feasibility of output-based control without state estimation, direct input-output dynamics are shown to exist by training neural network models on simulated trajectories. Output dynamics are then approximated with polynomials and found to contain well-known rigid body dynamics terms. A composite model consisting of a rigid body model and an online data-driven residual is developed, which predicts output dynamics more accurately than either model alone, and without prior experience with the system. An adaptive model predictive controller is developed with the composite model for DLON manipulation, which completes DLON installation tasks, both in simulation and with a physical automotive wire harness.
|
|
10:30-12:00, Paper ThAT7-CC.8 | Add to My Program |
Physics-Informed Neural Network for Multirotor Slung Load Systems Modeling |
|
Serrano, Gil | Instituto Superior Técnico, University of Lisbon |
Jacinto, Marcelo | Instituto Superior Técnico |
Ribeiro-Gomes, Jose | Instituto Superior Tecnico, University of Lisbon |
Pinto, Joao | Instituto Superior Tecnico, Universidade De Lisboa |
Guerreiro, Bruno J. N. | NOVA School of Science and Technology |
Bernardino, Alexandre | IST - Técnico Lisboa |
Cunha, Rita | Instituto Superior Tecnico |
Keywords: Model Learning for Control, Deep Learning Methods, Aerial Systems: Applications
Abstract: Recent advances in aerial robotics have enabled the use of multirotor vehicles for autonomous payload transportation. Resorting only to classical methods to reliably model a quadrotor carrying a cable-slung load poses significant challenges. On the other hand, purely data-driven learning methods do not comply by design with the problem's physical constraints, especially in states that are not densely represented in the training data. In this work, we explore the use of physics-informed neural networks to learn an end-to-end model of the multirotor-slung-load system and, at a given time, estimate a sequence of future system states. An LSTM encoder-decoder with an attention mechanism is used to capture the dynamics of the system. To guarantee cohesiveness between the multiple predicted states of the system, we propose the use of a physics-based term in the loss function, which includes a discretized physical model derived from first principles together with slack variables that allow for a small mismatch between expected and predicted values. To train the model, a dataset using a real-world quadrotor carrying a slung load was curated and is made available. Prediction results are presented and corroborate the feasibility of the approach. The proposed method outperforms both the first-principles physical model and a comparable neural network model trained without the proposed physics regularization.
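A physics-based loss term of this kind typically sums a data-fit term, a penalty on the discretized physics residual, and a penalty keeping the slack variables small. A schematic sketch (all names and weights are illustrative assumptions, not the paper's exact formulation):

```python
import torch

def physics_informed_loss(pred_states, target_states, dyn_residual, slack,
                          w_phys=1.0, w_slack=0.1):
    """Composite loss: data fit + discretized-physics residual + slack penalty.

    pred_states, target_states: (B, T, n) predicted / measured trajectories
    dyn_residual: callable returning the violation of a discretized
                  first-principles model over the predictions, (B, T-1, n)
    slack: learnable slack variables (same shape as the residual) allowing
           a small, penalized mismatch between model and prediction
    """
    data_loss = torch.mean((pred_states - target_states) ** 2)
    phys_loss = torch.mean((dyn_residual(pred_states) - slack) ** 2)
    slack_loss = torch.mean(slack ** 2)      # keep the permitted mismatch small
    return data_loss + w_phys * phys_loss + w_slack * slack_loss
```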
|
|
10:30-12:00, Paper ThAT7-CC.9 | Add to My Program |
A Probabilistic Motion Model for Skid-Steer Wheeled Mobile Robot Navigation on Off-Road Terrains |
|
Trivedi, Ananya | Northeastern University |
Zolotas, Mark | Northeastern University |
Abbas, Adeeb | Northeastern University |
Prajapati, Sarvesh | Northeastern University |
Bazzi, Salah | Northeastern University |
Padir, Taskin | Northeastern University |
Keywords: Model Learning for Control, Dynamics, Machine Learning for Robot Control
Abstract: Skid-Steer Wheeled Mobile Robots (SSWMRs) are increasingly being used for off-road autonomy applications. When turning at high speeds, these robots tend to undergo significant skidding and slipping. In this work, using Gaussian Process Regression (GPR) and Sigma-Point Transforms, we estimate the non-linear effects of tire-terrain interaction on robot velocities in a probabilistic fashion. Using the mean estimates from GPR, we propose a data-driven dynamic motion model that is more accurate at predicting future robot poses than conventional kinematic motion models. By efficiently solving a convex optimization problem based on the history of past robot motion, the GPR augmented motion model generalizes to previously unseen terrain conditions. The output distribution from the proposed motion model can be used for local motion planning approaches, such as stochastic model predictive control, leveraging model uncertainty to make safe decisions. We validate our work on a benchmark real-world multi-terrain SSWMR dataset. Our results show that the model generalizes to three different terrains while significantly reducing errors in linear and angular motion predictions. As shown in the attached video, we perform a separate set of experiments on a physical robot to demonstrate the robustness of the proposed algorithm.
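Gaussian Process Regression over commanded velocities is a standard way to capture skid and slip probabilistically, with the predictive variance available for downstream risk-aware planning. A minimal sketch with scikit-learn on synthetic stand-in data (the paper's inputs, outputs, and kernel choices may differ):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# synthetic stand-in data: commanded (v, omega) -> measured velocity error.
# A real model would be fit on logged robot runs over each terrain.
rng = np.random.default_rng(0)
X = rng.uniform([-1.0, -2.0], [2.0, 2.0], size=(200, 2))  # (v_cmd, w_cmd)
y = 0.1 * X[:, 0] * np.abs(X[:, 1]) + 0.01 * rng.normal(size=200)

gpr = GaussianProcessRegressor(
    kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-4),
    normalize_y=True,
).fit(X, y)

# mean and std of the predicted slip for a new command; the std can feed
# a stochastic MPC that trades speed against model uncertainty
mean, std = gpr.predict(np.array([[1.5, 1.0]]), return_std=True)
print(mean, std)
```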
|
|
ThAT8-CC Oral Session, CC-418 |
Add to My Program |
Learning in Field Robotics |
|
|
Chair: Gao, Zhi | Temasek Laboratories @ NUS |
Co-Chair: Sugiura, Hisashi | Yanmar Co., Ltd |
|
10:30-12:00, Paper ThAT8-CC.1 | Add to My Program |
TartanDrive 2.0: More Modalities and Better Infrastructure to Further Self-Supervised Learning Research in Off-Road Driving Tasks |
|
Sivaprakasam, Matthew | Carnegie Mellon University |
Maheshwari, Parv | Indian Institute of Technology Kharagpur |
Guaman Castro, Mateo | Carnegie Mellon University |
Triest, Samuel | Carnegie Mellon University |
Nye, Micah | University of Pittsburgh |
Willits, Steven | Carnegie Mellon University |
Saba, Andrew | Carnegie Mellon University |
Wang, Wenshan | Carnegie Mellon University |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Data Sets for Robot Learning, Field Robots, Big Data in Robotics and Automation
Abstract: We present TartanDrive 2.0, a large-scale off-road driving dataset for self-supervised learning tasks. In 2021 we released TartanDrive 1.0, which is one of the largest datasets for off-road terrain. As a follow-up to our original dataset, we collected seven hours of data at speeds of up to 15 m/s with the addition of three new LiDAR sensors alongside the original camera, inertial, GPS, and proprioceptive sensors. We also release the tools we use for collecting, processing, and querying the data, including our metadata system designed to further the utility of our data. Custom infrastructure allows end users to reconfigure the data to cater to their own platforms. These tools and infrastructure alongside the dataset are useful for a variety of tasks in the field of off-road autonomy and, by releasing them, we encourage collaborative data aggregation. These resources lower the barrier to entry to utilizing large-scale datasets, thereby helping facilitate the advancement of robotics in areas such as self-supervised learning, multi-modal perception, inverse reinforcement learning, and representation learning. The dataset is available at https://theAirLab.org/TartanDrive2.
|
|
10:30-12:00, Paper ThAT8-CC.2 | Add to My Program |
EnYOLO: A Real-Time Framework for Domain-Adaptive Underwater Object Detection with Image Enhancement |
|
Wen, Junjie | The Chinese University of Hong Kong |
Cui, Jinqiang | Peng Cheng Laboratory |
Zhao, Benyun | The Chinese University of Hong Kong |
Han, Bingxin | The Chinese University of Hong Kong |
Liu, Xuchen | The Chinese University of Hong Kong |
Gao, Zhi | Temasek Laboratories @ NUS |
Chen, Ben M. | Chinese University of Hong Kong |
Keywords: Deep Learning Methods, Marine Robotics, Object Detection, Segmentation and Categorization
Abstract: In recent years, significant progress has been made in the field of underwater image enhancement (UIE). However, its practical utility for high-level vision tasks, such as underwater object detection (UOD) in Autonomous Underwater Vehicles (AUVs), remains relatively unexplored. It may be attributed to several factors: (1) Existing methods typically employ UIE as a pre-processing step, which inevitably introduces considerable computational overhead and latency. (2) The process of enhancing images prior to training object detectors may not necessarily yield performance improvements. (3) The complex underwater environments can induce significant domain shifts across different scenarios, seriously deteriorating the UOD performance. To address these challenges, we introduce EnYOLO, an integrated real-time framework designed for simultaneous UIE and UOD with domain-adaptation capabilities. Specifically, both the UIE and UOD task heads share the same network backbone and utilize a lightweight design. Furthermore, to ensure balanced training for both tasks, we present a multi-stage training strategy aimed at consistently enhancing the performance of both functions. Additionally, we propose a novel domain-adaptation strategy to align feature embeddings originating from diverse underwater environments. Comprehensive experiments demonstrate that our framework not only achieves state-of-the-art (SOTA) performance in both UIE and UOD tasks, but also shows superior adaptability when applied to different underwater scenarios. Our efficiency analysis further highlights the substantial potential of our framework for onboard deployment.
|
|
10:30-12:00, Paper ThAT8-CC.3 | Add to My Program |
GS-PKNN: An Efficient and High-Fidelity Mobility Prediction Method for Unmanned Ground Vehicles |
|
Hua, Chen | University of Science and Technology of China |
Jiang, Chunmao | University of Science and Technology of China |
Niu, Runxin | Hefei Institutes of Physical Science, Chinese Academy of Science |
Yu, Biao | Hefei Institutes of Physical Science, Chinese Academy of Science |
Zhu, Hui | Hefei Institutes of Physical Science, Chinese Academy of Science |
Li, Bichun | Hefei Institutes of Physical Science, Chinese Academy of Science |
Keywords: Field Robots, Deep Learning Methods, Simulation and Animation
Abstract: To avoid unmanned ground vehicles being obstructed by deformable terrain in off-road environments, effective vehicle mobility analysis is required. However, the computational complexity of existing mobility analysis methods, such as discrete element analysis, poses significant challenges when applied to large-scale terrains. To address this problem, we propose an efficient and high-fidelity vehicle mobility prediction method for large-scale terrain. Initially, precise terrain models are constructed employing Gaussian sampling (GS), thereby serving as optimal inputs for the mobility simulation. Subsequently, we introduce a co-simulation method based on a multi-body dynamics model and discrete element analysis to obtain high-fidelity vehicle mobility data on the sampled terrains. Following that, the mobility data is utilized to train a PSO-kriging neural network (PKNN), enabling accurate predictions of the global mobility map. Through rigorous simulation experiments, the proposed method (GS-PKNN) demonstrates its remarkable effectiveness.
|
|
10:30-12:00, Paper ThAT8-CC.4 | Add to My Program |
UNRealNet: Learning Uncertainty-Aware Navigation Features from High-Fidelity Scans of Real Environments |
|
Triest, Samuel | Carnegie Mellon University |
Fan, David D. | Jet Propulsion Laboratory, California Institute of Technology, P |
Scherer, Sebastian | Carnegie Mellon University |
Agha-mohammadi, Ali-akbar | NASA-JPL, Caltech |
Keywords: Field Robots, Legged Robots, Deep Learning Methods
Abstract: Traversability estimation in rugged, unstructured environments remains a challenging problem in field robotics. Often, the need for precise, accurate traversability estimation is in direct opposition to the limited sensing and compute capability present on affordable, small-scale mobile robots. To address this issue, we present a novel method to learn [u]ncertainty-aware [n]avigation features from high-fidelity scans of [real]-world environments (UNRealNet). This network can be deployed on-robot to predict these high-fidelity features using input from lower-quality sensors. UNRealNet predicts dense, metric-space features directly from single-frame lidar scans, thus reducing the effects of occlusion and odometry error. Our approach is label-free, and is able to produce traversability estimates that are robot-agnostic. Additionally, we can leverage UNRealNet’s predictive uncertainty to both produce risk-aware traversability estimates, and refine our feature predictions over time. We find that our method outperforms traditional local mapping and inpainting baselines by up to 40%, and demonstrate its efficacy on multiple legged platforms.
|
|
10:30-12:00, Paper ThAT8-CC.5 | Add to My Program |
Robot-Dependent Traversability Estimation for Outdoor Environments Using Deep Multimodal Variational Autoencoders |
|
Eder, Matthias | Graz University of Technology |
Steinbauer-Wagner, Gerald | Graz University of Technology |
Keywords: Field Robots, Motion and Path Planning, Data Sets for Robot Learning
Abstract: Efficient and reliable navigation in off-road environments poses a significant challenge for robotics, especially when factoring in the varying capabilities of robots across different terrains. To achieve this, the robot system's traversability is usually estimated to plan traversable routes through an environment. This paper presents a new approach that utilizes Deep Multimodal Variational Autoencoders (DMVAEs) for estimating the traversability of different robots in complex off-road terrains. Our method utilizes DMVAEs to capture essential environmental information and robot properties, effectively modeling factors that influence robotic traversability. The key contribution of this research is a two-stage traversability estimation framework for various robots in diverse off-road conditions that integrates robot properties in addition to environmental information to predict the traversability for various robots in a single model. We validate our method through real-world experiments involving four ground robots navigating an alpine environment. Comparative evaluations against state-of-the-art traversability estimation methods demonstrate the superior accuracy and robustness of our approach. Additionally, we investigate the transfer of trained models to new robots, enhancing their traversability estimation and extending the applicability of our framework.
|
|
10:30-12:00, Paper ThAT8-CC.6 | Add to My Program |
F3DMP: Foresighted 3D Motion Planning of Mobile Robots in Wild Environments |
|
Yang, Andong | Institute of Computing Technology, Chinese Academy of Sciences |
Li, Wei | Institute of Computing Technology, Chinese Academy of Sciences |
Hu, Yu | Institute of Computing Technology Chinese Academy of Sciences |
Keywords: Field Robots, Motion and Path Planning, Deep Learning Methods
Abstract: In wild environments, motion planning for mobile robots faces the challenge of locally optimal path traps due to limited sensor perception range and lack of spatial awareness. Existing approaches that avoid local optima by designing heuristic functions or high-quality global paths in wild environments are time-consuming and unstable. This work proposes F3DMP, which consists of two parts to mitigate locally optimal solutions and better utilize distant terrain information. First, the entire planning framework is adapted to three-dimensional space so that the planning result conforms to the geometric characteristics of the terrain. Second, a time allocation function based on offline reinforcement learning is proposed. This function can anticipate potential challenges or opportunities based on semantic information from the image and proactively determine a time allocation. Our planner is integrated into a complete mobile robot system and deployed on a real robot. Experiments in simulation and the real world demonstrate that our method can improve the success rate by 28% and the trajectory smoothness by 27% compared with traditional methods.
|
|
10:30-12:00, Paper ThAT8-CC.7 | Add to My Program |
MATRIX: Multi-Agent Trajectory Generation with Diverse Contexts |
|
Xu, Zhuo | UC Berkeley |
Zhou, Rui | University of California, Berkeley |
Yin, Yida | University of California, Berkeley |
Gao, Huidong | University of California, Berkeley |
Tomizuka, Masayoshi | University of California |
Li, Jiachen | University of California, Riverside |
Keywords: Intelligent Transportation Systems, Path Planning for Multiple Mobile Robots or Agents, Datasets for Human Motion
Abstract: Data-driven methods have great advantages in modeling complicated human behavioral dynamics and dealing with many human-robot interaction applications. However, collecting massive and annotated real-world human datasets has been a laborious task, especially for highly interactive scenarios. On the other hand, algorithmic data generation methods are usually limited by their model capacities, making them unable to offer realistic and diverse data needed by various application users. In this work, we study trajectory-level data generation for multi-human or human-robot interaction scenarios and propose a learning-based automatic trajectory generation model, which we call Multi-Agent TRajectory generation with dIverse conteXts (MATRIX). MATRIX is capable of generating interactive human behaviors in realistic diverse contexts. We achieve this goal by modeling the explicit and interpretable objectives so that MATRIX can generate human motions based on diverse destinations and heterogeneous behaviors. We carried out extensive comparison and ablation studies to illustrate the effectiveness of our approach across various metrics. We also presented experiments that demonstrate the capability of MATRIX to serve as data augmentation for imitation-based motion planning.
|
|
ThAT9-CC Oral Session, CC-419 |
Add to My Program |
Task Planning |
|
|
Chair: Tazaki, Yuichi | Kobe University |
Co-Chair: Manjanna, Sandeep | Plaksha University |
|
10:30-12:00, Paper ThAT9-CC.1 | Add to My Program |
Distributed Multi-Robot Online Sampling with Budget Constraints |
|
Shamshirgaran, Azin | University of California, Merced |
Manjanna, Sandeep | Plaksha University |
Carpin, Stefano | University of California, Merced |
Keywords: Agricultural Automation, Planning, Scheduling and Coordination
Abstract: In multi-robot informative path planning, the problem is to find a route for each robot in a team to visit a set of locations that provides the most useful data for reconstructing an unknown scalar field. In the budgeted version, each robot is subject to a travel budget limiting the distance it can travel. Our interest in this problem is motivated by applications in precision agriculture, where robots are used to collect measurements to estimate domain-relevant scalar parameters such as soil moisture or nitrate concentrations. In this paper, we propose an online, distributed multi-robot sampling algorithm based on Monte Carlo Tree Search (MCTS) in which each robot iteratively selects its next sampling location through communication with the other robots and consideration of its remaining budget. We evaluate our proposed method for varying team sizes and in different environments, and we compare our solution with four baseline methods. Our experiments show that our solution outperforms the baselines when the budget is tight, collecting measurements that lead to smaller reconstruction errors.
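To make the budget-constrained MCTS selection above concrete, here is a minimal Python sketch; it is an illustration under assumptions rather than the authors' implementation (the `Node` interface with `children`, `untried`, `expand`, `parent`, `state`, `visits`, and `value`, as well as `travel_cost` and `info_gain`, are hypothetical):

```python
import math
import random

def ucb_select(node, c=1.4):
    """Pick the child maximizing the UCB1 score."""
    return max(node.children,
               key=lambda ch: ch.value / (ch.visits + 1e-9)
               + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))

def mcts_next_sample(root, budget_left, travel_cost, info_gain, n_iter=500):
    """One planning round: return the next sampling location.

    Children whose travel cost exceeds the remaining budget are never
    expanded, which is how the budget constraint enters the search.
    """
    for _ in range(n_iter):
        node = root
        # Selection: descend while the node is fully expanded.
        while node.children and not node.untried:
            node = ucb_select(node)
        # Expansion: add one feasible (within-budget) child, if any.
        feasible = [a for a in node.untried
                    if travel_cost(node.state, a) <= budget_left]
        if feasible:
            node = node.expand(random.choice(feasible))
        # Evaluation + backpropagation of the information-gain reward.
        reward = info_gain(node.state)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).state
```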
|
|
10:30-12:00, Paper ThAT9-CC.2 | Add to My Program |
Coupled Active Perception and Manipulation Planning for a Mobile Manipulator in Precision Agriculture Applications |
|
Xie, Shuangyu | Texas A&M University |
Hu, Chengsong | Texas A&M University |
Wang, Di | Texas A&M University |
Johnson, Joe | Texas A&M University |
Bagavathiannan, Muthukumar | Texas A&M University |
Song, Dezhen | Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) |
Keywords: Agricultural Automation, Planning, Scheduling and Coordination, Task Planning
Abstract: A mobile manipulator often finds itself in an application where it needs to take a close-up view before performing a manipulation task. Naming this the coupled active perception and manipulation (CAPM) problem, we model the uncertainty in the perception process and devise a key state/task planning approach that considers reachability conditions as task constraints of both the perception and manipulation tasks for the mobile platform. By minimizing the expected energy usage in body key state planning while satisfying the task constraints, our algorithm achieves the best balance between task success rate and energy usage. We have implemented the algorithm and tested it in both simulation and physical experiments. The results confirm that our algorithm consumes less energy than a two-stage decoupled approach, while still maintaining a 100% task success rate.
|
|
10:30-12:00, Paper ThAT9-CC.3 | Add to My Program |
RAMP: A Benchmark for Evaluating Robotic Assembly Manipulation and Planning |
|
Collins, Jack | University of Oxford |
Robson, Mark | University of Birmingham |
Yamada, Jun | University of Oxford |
Sridharan, Mohan | University of Edinburgh |
Janik, Karol | Manufacturing Technology Centre |
Posner, Ingmar | Oxford University |
Keywords: Performance Evaluation and Benchmarking, Assembly, Task Planning
Abstract: We introduce RAMP, an open-source robotics benchmark inspired by real-world industrial assembly tasks. RAMP consists of beams that a robot must assemble into specified goal configurations using pegs as fasteners. As such, it assesses planning and execution capabilities, and poses challenges in perception, reasoning, manipulation, diagnostics, fault recovery, and goal parsing. RAMP has been designed to be accessible and extensible. Parts are either 3D printed or otherwise constructed from materials that are readily obtainable. The part designs and detailed instructions are publicly available. In order to broaden community engagement, RAMP incorporates fixtures such as April Tags which enable researchers to focus on individual sub-tasks of the assembly challenge if desired. We provide a full digital twin as well as rudimentary baselines to enable rapid progress. Our vision is for RAMP to form the substrate for a community-driven endeavour that evolves as capability matures.
|
|
10:30-12:00, Paper ThAT9-CC.4 | Add to My Program |
Towards Safe Robot Use with Edged or Pointed Objects: A Surrogate Study Assembling a Human Hand Injury Protection Database |
|
Kirschner, Robin Jeanne | TU Munich, Institute for Robotics and Systems Intelligence |
Micheler, Carina M. | Technical University of Munich, TUM School of Medicine, Klinikum |
Zhou, Yangcan | Technical University of Munich |
Siegner, Sebastian Julian | TU Munich |
Hamad, Mazin | Technical University of Munich (TUM) |
Glowalla, Claudio | Department of Orthopaedics and Sports Orthopaedics, Klinikum Rec |
Neumann, Jan | TU Munich |
Rajaei, Nader | Technical University of Munich |
Burgkart, Rainer | Technische Universität München |
Haddadin, Sami | Technical University of Munich |
Keywords: Safety in HRI, Human-Centered Robotics, Task Planning
Abstract: The use of pointed or edged tools or objects is one of the most challenging aspects of today's application of physical human-robot interaction (pHRI). One reason for this is that the severity of harm caused by such edged or pointed impactors is less well studied than for blunt impactors. Consequently, the standards specify well-reasoned force and pressure thresholds for blunt impactors and advise avoiding any edges and corners in contacts. Nevertheless, pointed or edged impactor geometries cannot be completely ruled out in real pHRI applications. For example, to allow edged or pointed tools such as screwdrivers near human operators, the knowledge of injury severity needs to be extended so that robot integrators can perform well-reasoned, time-efficient risk assessments. In this paper, we provide initial datasets on injury prevention for the human hand based on drop tests with surrogates for the human hand, namely pig claws and chicken drumsticks. We then demonstrate on two examples how the dataset eases and speeds up the risk assessment of robot contacts. Finally, our experiments provide a set of injuries that may also be expected for human subjects under certain robot mass-velocity combinations in collisions. To extend this work, testing on human samples and a collaborative effort from research institutes worldwide are needed to create a comprehensive human injury avoidance database for any pHRI scenario and thus for safe pHRI applications including edged and pointed geometries.
|
|
10:30-12:00, Paper ThAT9-CC.5 | Add to My Program |
Accelerating Long-Horizon Planning with Affordance-Directed Dynamic Grounding of Abstract Strategies |
|
Elimelech, Khen | Rice University |
Kingston, Zachary | Rice University |
Thomason, Wil | Rice University |
Vardi, Moshe | Rice University |
Kavraki, Lydia | Rice University |
Keywords: Task Planning, AI-Enabled Robotics, Learning from Experience
Abstract: Long-horizon task planning is important for robot autonomy, especially as a subroutine for frameworks such as Integrated Task and Motion Planning. However, task planning is computationally challenging and struggles to scale to realistic problem settings. We propose to accelerate task planning over an agent's lifetime by integrating abstract strategies: a generalizable planning experience encoding introduced in earlier work. In this work, we contribute a practical approach to planning with strategies by introducing a novel formalism of planning in a strategy-augmented domain. We also introduce and formulate the notion of a strategy's affordance, which indicates its predicted benefit to the solution, and use it to guide the planning and strategy grounding processes. Together, our observations yield an affordance-directed, lazy-search planning algorithm, which can seamlessly compose strategies and actions to solve long-horizon planning problems. We evaluate our planner in an object rearrangement domain, where we demonstrate performance benefits relative to a state-of-the-art task planner.
|
|
10:30-12:00, Paper ThAT9-CC.6 | Add to My Program |
Long-Horizon Planning and Execution with Functional Object-Oriented Networks |
|
Paulius, David | Brown University |
Agostini, Alejandro | University of Innsbruck |
Lee, Dongheui | Technische Universität Wien (TU Wien) |
Keywords: Task Planning, Learning from Demonstration
Abstract: Following work on joint object-action representations, functional object-oriented networks (FOON) were introduced as a knowledge graph representation for robots. A FOON contains symbolic concepts useful to a robot's understanding of tasks and its environment for object-level planning. Prior to this work, little had been done to show how plans acquired from a FOON can be executed by a robot, as the concepts in a FOON are too abstract for execution. We therefore introduce the idea of exploiting object-level knowledge as a FOON for task planning and execution. Our approach automatically transforms FOON into PDDL and leverages off-the-shelf planners, action contexts, and robot skills in a hierarchical planning pipeline to generate executable task plans. We demonstrate our entire approach on long-horizon tasks in CoppeliaSim and show how learned action contexts can be extended to never-before-seen scenarios.
|
|
10:30-12:00, Paper ThAT9-CC.7 | Add to My Program |
From Cooking Recipes to Robot Task Trees – Improving Planning Correctness and Task Efficiency by Leveraging LLMs with a Knowledge Network |
|
Sakib, Md Sadman | University of South Florida |
Sun, Yu | University of South Florida |
Keywords: Task Planning, Manipulation Planning, AI-Enabled Robotics
Abstract: Task planning for robotic cooking involves generating a sequence of actions for a robot to prepare a meal successfully. This paper introduces a novel task tree generation pipeline that produces correct plans and efficient execution for cooking tasks. Our method first uses a large language model (LLM) to retrieve recipe instructions and then utilizes a fine-tuned GPT-3 to convert them into a task tree, capturing sequential and parallel dependencies among subtasks. The pipeline then mitigates the uncertainty and unreliability of LLM outputs via task tree retrieval: we combine multiple LLM task tree outputs into a graph and perform a retrieval that avoids questionable and high-cost nodes, improving both planning correctness and execution efficiency. Our evaluation results show superior performance compared to previous works in task planning accuracy and efficiency.
|
|
10:30-12:00, Paper ThAT9-CC.8 | Add to My Program |
Stepwise Large-Scale Multi-Agent Task Planning Using Neighborhood Search |
|
Zeng, Fan | The University of Tokyo |
Shirafuji, Shouhei | Kansai University |
Fan, Changxiang | Institute of Facility Agriculture, Guangdong Academy of Agricult |
Nishio, Masahiro | Toyota Motor Corporation |
Ota, Jun | The University of Tokyo |
Keywords: Task Planning, Multi-Robot Systems, Autonomous Agents
Abstract: This paper presents a novel stepwise multi-agent task planning method that incorporates neighborhood search to address large-scale problems, thereby reducing computation time. As the number of agents increases, the search space for task planning expands exponentially. Hence, conventional methods aiming to find globally optimal solutions, especially for large-scale problems, incur extremely high computational costs and may even fail. The proposed method first achieves the goals of multi-agent task planning by solving an initial problem using a minimal number of agents. Subsequently, tasks are reallocated among all agents based on this solution, and the solutions are iteratively optimized using a neighborhood search. By aiming for a near-optimal solution rather than an optimal one, the method substantially reduces the time complexity of the search to a polynomial level. The effectiveness of the proposed method is demonstrated by solving benchmark problems and comparing its results with those of other state-of-the-art methods.
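The reallocate-and-improve loop described above can be pictured with a minimal sketch; the single-task move operator, the makespan objective, and the `cost` model are simplifying assumptions, not the paper's exact neighborhood definition:

```python
import random

def makespan(assignment, cost):
    """Team completion time: the longest single-agent workload."""
    return max(sum(cost(t) for t in tasks) for tasks in assignment.values())

def neighborhood_search(assignment, cost, iters=2000):
    """Iteratively improve an initial solution by local task moves.

    A neighbor moves one task to another agent; the move is kept only if
    it shortens the makespan, so solution quality is monotone in time.
    """
    best = makespan(assignment, cost)
    agents = list(assignment)
    for _ in range(iters):
        src, dst = random.sample(agents, 2)
        if not assignment[src]:
            continue
        task = random.choice(assignment[src])
        assignment[src].remove(task)
        assignment[dst].append(task)
        cand = makespan(assignment, cost)
        if cand < best:
            best = cand                      # keep the improving move
        else:
            assignment[dst].remove(task)     # undo the non-improving move
            assignment[src].append(task)
    return assignment, best

# Usage: assignment maps each agent to its task list, e.g.
# neighborhood_search({"r1": [1, 2, 3], "r2": []}, cost=lambda t: t)
```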
|
|
10:30-12:00, Paper ThAT9-CC.9 | Add to My Program |
Bayesian-Guided Evolutionary Strategy with RRT for Multi-Robot Exploration |
|
Wu, Shuge | Beihang University |
Wang, Chunzheng | Beihang University |
Pan, Jiayi | Beihang University |
Han, Dongming | Beihang University |
Zhao, Zhongliang | Beihang University |
Keywords: Task Planning, Multi-Robot Systems, Multi-Robot SLAM
Abstract: With the increasing demand for multi-robot exploration of unknown environments, how to accomplish this task efficiently has become a focus of research. In such tasks, the strategies for frontier point detection and task allocation largely determine the overall efficiency of the system. Most existing methods implement frontier point detection based on the Rapidly-Exploring Random Tree (RRT) and use greedy algorithms for task allocation. However, the classical RRT algorithm uses a fixed growth step, which makes it difficult to grow branches in narrow environments and lowers the efficiency and correctness of frontier point detection. Meanwhile, the greedy allocation strategy causes each robot to consider only the exploration area with the largest gain for itself, which easily leads to repeated exploration and reduces the overall efficiency of the system. To solve these problems, we propose an adaptive RRT growth strategy for frontier point detection, which adjusts the step size according to the known map information and thus improves detection efficiency and accuracy, and we introduce a Bayesian-guided evolutionary strategy (BGE) for efficient task allocation, which uses current and historical information to find the optimal allocation scheme from a global perspective. We conduct comprehensive tests of the proposed strategy in ROS as well as in the real world, which demonstrate its efficiency.
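A minimal sketch of a map-adaptive growth step of this flavor is given below; the density heuristic and parameter values are illustrative assumptions, not the authors' exact rule:

```python
import numpy as np

def adaptive_step(p, occupancy, base_step=1.0, radius=5, min_step=0.2):
    """Shrink the RRT growth step in cluttered regions of the known map.

    `occupancy` is a 2D grid (1 = occupied); the local obstacle density
    around grid point `p` scales the step, so branches can still grow
    through narrow passages instead of being rejected outright.
    """
    r, c = int(p[0]), int(p[1])
    patch = occupancy[max(r - radius, 0): r + radius + 1,
                      max(c - radius, 0): c + radius + 1]
    density = patch.mean() if patch.size else 0.0
    return max(min_step, base_step * (1.0 - density))

def steer(nearest, sampled, occupancy):
    """Standard RRT steering, but with a map-dependent step length."""
    nearest = np.asarray(nearest, float)
    sampled = np.asarray(sampled, float)
    d = sampled - nearest
    dist = np.linalg.norm(d)
    step = adaptive_step(nearest, occupancy)
    return sampled if dist <= step else nearest + d / dist * step
```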
|
|
ThAT10-CC Oral Session, CC-501 |
Add to My Program |
Modeling, Control, and Learning for Soft Robots I |
|
|
Chair: Mochiyama, Hiromi | University of Tsukuba |
Co-Chair: Della Santina, Cosimo | TU Delft |
|
10:30-12:00, Paper ThAT10-CC.1 | Add to My Program |
A Novel Model for Layer Jamming-Based Continuum Robots |
|
Yi, Bowen | Polytechnique Montreal |
Fan, Yeman | University of Technology Sydney |
Liu, Dikai | University of Technology, Sydney |
Keywords: Modeling, Control, and Learning for Soft Robots, Compliance and Impedance Control
Abstract: Continuum robots with variable stiffness have gained wide popularity in the last decade. Layer jamming (LJ) has emerged as a simple and efficient technique to achieve tunable stiffness for continuum robots. Despite its merits, the development of a control-oriented dynamical model tailored for this specific class of robots remains an open problem in the literature. This paper aims to present the first solution, to the best of our knowledge, to close the gap. We propose an energy-based model that is integrated with the LuGre frictional model for LJ-based continuum robots. We then conduct a comprehensive theoretical analysis of this model, focusing on two fundamental characteristics of LJ-based continuum robots: shape locking and adjustable stiffness. To validate the modeling approach and theoretical results, a series of experiments was conducted on our OctRobot-I continuum robotic platform. The results show that the proposed model is capable of interpreting and predicting the dynamical behaviors of LJ-based continuum robots.
|
|
10:30-12:00, Paper ThAT10-CC.2 | Add to My Program |
Lumped Parameter Dynamic Model of an Eversion Growing Robot: Analysis, Simulation and Experimental Validation |
|
Vartholomeos, Panagiotis | University of Thessaly |
Wu, Zicong | King's College London |
Sadati, S.M.Hadi | King's College London |
Bergeles, Christos | King's College London |
Keywords: Modeling, Control, and Learning for Soft Robots, Dynamics, Medical Robots and Systems
Abstract: This paper presents a lumped-parameter dynamic model of a pressure-driven eversion robot carrying a catheter through its hollow core. A simulation framework based on the model is developed in MATLAB and is used for understanding the underlying physics, for identifying the regions of operation, and for demonstrating that, for a range of input commands, the catheter can be used as an actuation mechanism for propelling eversion; an approach especially useful for miniaturised systems. Simulations are experimentally validated on the MAMMOBOT system, a miniature steerable soft growing robot for early breast cancer detection. For most regions of operation, experimental results compare well with simulation, exhibiting an error of less than 4%. Only one region of operation showed larger deviations, possibly due to unmodeled dynamics, which will be investigated in future work.
|
|
10:30-12:00, Paper ThAT10-CC.3 | Add to My Program |
Trajectory Tracking Control of Dual-PAM Soft Actuator with Hysteresis Compensator |
|
Shen, Junyi | The University of Tokyo |
Miyazaki, Tetsuro | The University of Tokyo |
Ohno, Shingo | Bridgestone Corporation |
Sogabe, Maina | The University of Tokyo |
Kawashima, Kenji | The University of Tokyo |
Keywords: Modeling, Control, and Learning for Soft Robots, Hydraulic/Pneumatic Actuators, Robust/Adaptive Control
Abstract: Soft robotics is an emergent and swiftly evolving field. Pneumatic actuators are suitable for driving soft robots because of their superior performance. However, their control is not easy due to their hysteresis characteristics. In response to these challenges, we propose an adaptive control method to compensate for the hysteresis of a soft actuator. Employing a novel dual pneumatic artificial muscle (PAM) bending actuator, the control strategy abates hysteresis effects by dynamically modulating the gains of a traditional PID controller according to the predicted motion of the reference trajectory. Through comparative experimental evaluation, we found that the new control method outperforms its conventional counterparts in tracking accuracy and response speed. Our work reveals a new direction for advancing control in soft actuators.
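The idea of modulating PID gains from the predicted reference motion can be sketched as follows; the scheduling rule, threshold, and boost factor are hypothetical, chosen only to illustrate the mechanism:

```python
class GainScheduledPID:
    """PID controller whose gains track the predicted reference motion.

    Near reference turnaround points, where hysteresis bites hardest,
    the proportional gain is boosted by `boost`.
    """
    def __init__(self, kp, ki, kd, boost=1.5, dt=0.01):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.boost, self.dt = boost, dt
        self.int_e, self.prev_e = 0.0, 0.0

    def update(self, ref, ref_next, meas):
        e = ref - meas
        self.int_e += e * self.dt
        de = (e - self.prev_e) / self.dt
        self.prev_e = e
        # Schedule: larger kp when the predicted reference velocity is
        # small, i.e., near a direction reversal of the trajectory.
        ref_vel = (ref_next - ref) / self.dt
        kp = self.kp * (self.boost if abs(ref_vel) < 1e-2 else 1.0)
        return kp * e + self.ki * self.int_e + self.kd * de
```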
|
|
10:30-12:00, Paper ThAT10-CC.4 | Add to My Program |
Kinematic Modeling and Control of a Soft Robotic Arm with Non-Constant Curvature Deformation |
|
Wang, Zhanchi | University of Science and Technology of China |
Wang, Gaotian | University of Science and Technology of China |
Chen, Xiaoping | University of Science and Technology of China |
Freris, Nikolaos | University of Science and Technology of China |
Keywords: Modeling, Control, and Learning for Soft Robots, Kinematics, Motion Control
Abstract: The passive compliance of soft robotic arms renders the development of accurate kinematic models and model-based controllers challenging. The most widely used model in soft robotic kinematics assumes Piecewise Constant Curvature (PCC). However, PCC introduces errors when the robot is subject to external forces or even gravity. In this paper, we establish a three-dimensional (3D) kinematic representation of a soft robotic arm with pseudo universal and prismatic joints that is capable of capturing non-constant curvature deformations of the soft segments. We theoretically demonstrate that this constitutes a more general methodology than PCC. Simulations and experiments on the real robot attest to the superior modeling accuracy of our approach in 3D motions with unknown load: the maximum position/rotation error of the proposed model is verified to be 6.7/4.6 times lower than that of the PCC model under gravity and external forces. Furthermore, we devise an inverse kinematic controller that is capable of positioning the tip, tracking trajectories, and performing interactive tasks in 3D space.
|
|
10:30-12:00, Paper ThAT10-CC.5 | Add to My Program |
CIDGIKc: Distance-Geometric Inverse Kinematics for Continuum Robots |
|
Zhang, Hanna | University of Toronto |
Giamou, Matthew | University of Toronto |
Maric, Filip | University of Toronto Institute for Aerospace Studies |
Kelly, Jonathan | University of Toronto |
Burgner-Kahrs, Jessica | University of Toronto |
Keywords: Modeling, Control, and Learning for Soft Robots, Kinematics
Abstract: The small size, high dexterity, and intrinsic compliance of continuum robots (CRs) make them well suited for constrained environments. Solving the inverse kinematics (IK), that is, finding robot joint configurations that satisfy desired position or pose queries, is a fundamental challenge in motion planning, control, and calibration for any robot structure. For CRs, the need to avoid obstacles in tightly confined workspaces greatly complicates the search for feasible IK solutions. Without an accurate initialization or multiple restarts, existing algorithms often fail to find a solution. We present CIDGIKc (Convex Iteration for Distance-Geometric Inverse Kinematics for Continuum Robots), an algorithm that solves these nonconvex feasibility problems with a sequence of semidefinite programs whose objectives are designed to encourage low-rank minimizers. CIDGIKc is enabled by a novel distance-geometric parameterization of constant-curvature segment geometry for CRs with extensible segments. The resulting IK formulation involves only quadratic expressions and can efficiently incorporate a large number of collision avoidance constraints. Our experimental results demonstrate >98% solve success rates within complex, highly cluttered environments that existing algorithms cannot handle.
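The convex-iteration idea behind CIDGIKc, a sequence of SDPs whose linear objectives drive the solution toward low rank, can be sketched generically (assuming cvxpy with an SDP-capable solver; `constraints_fn` stands in for the distance-geometric and collision constraints, which are not reproduced here):

```python
import numpy as np
import cvxpy as cp

def convex_iteration(constraints_fn, n, target_rank, iters=15):
    """Encourage a low-rank PSD solution via a sequence of SDPs.

    Each round minimizes <W, X>; W is then rebuilt from the eigenvectors
    of the (n - target_rank) smallest eigenvalues of the solution, which
    pushes those eigenvalues toward zero on the next round.
    """
    W = np.eye(n)
    X = cp.Variable((n, n), PSD=True)
    for _ in range(iters):
        prob = cp.Problem(cp.Minimize(cp.trace(W @ X)), constraints_fn(X))
        prob.solve()
        eigval, eigvec = np.linalg.eigh(X.value)    # ascending order
        U = eigvec[:, : n - target_rank]            # smallest-eigenvalue span
        W = U @ U.T
        if eigval[: n - target_rank].sum() < 1e-8:  # effectively low rank
            break
    return X.value
```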
|
|
10:30-12:00, Paper ThAT10-CC.6 | Add to My Program |
Automatically-Tuned Model Predictive Control for an Underwater Soft Robot |
|
Null, W. David | University of Illinois Urbana-Champaign |
Edwards, William | University of Illinois at Urbana-Champaign |
Jeong, Dohun | University of Illinois at Urbana-Champaign |
Tchalakov, Teodor | University of Illinois Urbana Champaign |
Menezes, James | University of Illinois at Urbana Champaign |
Hauser, Kris | University of Illinois at Urbana-Champaign |
Z, Y | University of Illinois at Urbana-Champaign |
Keywords: Modeling, Control, and Learning for Soft Robots, Machine Learning for Robot Control
Abstract: Soft robots have desirable qualities for use in underwater environments thanks to their inherent compliance and lack of need for exposed hardware. Nevertheless, these advantages come at the cost of considerable control challenges. Data-driven model predictive control (MPC) is an approach that has shown promise in controlling soft robots. However, manually tuning the many hyperparameters in the learned dynamics model and the optimizer can be extremely tedious. In this work, we explore using data-driven MPC to control an underwater soft robot, and employ the AutoMPC method to automatically tune the hyperparameters and generate the controller. In the process, we extend AutoMPC’s capabilities to handle multi-task tuning and we add a barrier cost function to enforce actuator constraints. Our experiments show that the AutoMPC controller reaches targets with significantly higher accuracy and reliability than state-of-the-art baselines both in- and out-of-distribution of the training data.
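A barrier cost for actuator constraints, as mentioned above, is typically a log-barrier added to the tracking objective; the following is a generic illustration, not necessarily the exact form used in the paper's AutoMPC extension:

```python
import numpy as np

def barrier_cost(u, u_min, u_max, weight=1e-2, margin=1e-6):
    """Log-barrier penalty keeping optimized commands inside limits.

    The cost grows without bound as any actuator command approaches its
    bound, so the MPC optimizer stays strictly feasible; `weight` trades
    off constraint sharpness against tracking performance.
    """
    u = np.asarray(u, float)
    below = np.clip(u - u_min, margin, None)   # distance to lower bound
    above = np.clip(u_max - u, margin, None)   # distance to upper bound
    return -weight * (np.log(below).sum() + np.log(above).sum())
```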
|
|
10:30-12:00, Paper ThAT10-CC.7 | Add to My Program |
Soft Acoustic End-Effector |
|
Zhang, Zhiyuan | Acoustic Robotics Systems Laboratory, Institute of Robotics And |
Koch, Michael | Acoustic Robotics Systems Laboratory, Institute of Robotics And |
Ahmed, Daniel | ETH Zurich |
Keywords: Automation at Micro-Nano Scales, Micro/Nano Robots, Soft Robot Applications
Abstract: Acoustic techniques have been developed as multifunctional tools for various microscale manipulations. In prevalent design paradigms, a position-fixed piezoelectric transducer (PZT) is utilized to generate ultrasound waves. However, the immobility of the PZT restricts the modulation of the acoustic field's position and orientation, consequently diminishing the adaptability and effectiveness of subsequent acoustic micromanipulation tasks. Here, we propose a miniaturized soft acoustic end-effector and demonstrate acoustic field modulation and microparticle manipulation by adjusting the PZT's position and orientation. The PZT is mounted on the end of a soft robotic arm that has three individual degrees of freedom and can be deformed in 3D space by inflating or deflating each chamber. Experiments show that the soft acoustic end-effector can change the traveling direction of microparticles and modulate the location of a standing wave field. Our approach is simple, flexible, and controllable. We envision that the soft acoustic end-effector will facilitate multiscale acoustic manipulation in interdisciplinary applications, especially for in vivo acoustic therapies.
|
|
10:30-12:00, Paper ThAT10-CC.8 | Add to My Program |
A Provably Stable Iterative Learning Controller for Continuum Soft Robots |
|
Pierallini, Michele | Centro Di Ricerca E. Piaggio - Università Di Pisa |
Stella, Francesco | EPFL |
Angelini, Franco | University of Pisa |
Deutschmann, Bastian | German Aerospace Center |
Hughes, Josie | EPFL |
Bicchi, Antonio | Fondazione Istituto Italiano Di Tecnologia |
Garabini, Manolo | Università Di Pisa |
Della Santina, Cosimo | TU Delft |
Keywords: Modeling, Control, and Learning for Soft Robots, Motion Control
Abstract: Fully exploiting soft robots' capabilities requires devising strategies that can accurately control their movements with the limited number of control sources available. This task is challenging for reasons including the hard-to-model dynamics, the system's underactuation, and the need for a prominent feedforward control action to preserve the soft and safe robot behavior. To tackle this challenge, this letter proposes a purely feedforward iterative learning control algorithm that refines the torque action by leveraging both knowledge of the model and data obtained from past experience. After presenting a 3D polynomial description of soft robots, we study their intrinsic properties, e.g., input-to-state stability, and we prove the convergence of the controller in the presence of locally Lipschitz nonlinearities. Finally, we validate the proposed approach through simulations and experiments involving multiple systems and trajectories, including cases with external disturbances and model mismatches.
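For readers unfamiliar with iterative learning control, the feedforward refinement it builds on has the classic form u_{k+1} = u_k + L(e_k); a minimal sketch with a PD-type learning operator (an assumption for illustration, not the paper's convergence-proven design) is:

```python
import numpy as np

def ilc_update(u, q_ref, q_meas, kp=5.0, kd=0.5, dt=0.01):
    """One iteration of a feedforward ILC torque refinement.

    u, q_ref, q_meas are arrays over the trajectory horizon (shape (T,)
    or (T, n_joints)); the next iterate adds a PD-shaped correction of
    this trial's tracking error: u_{k+1} = u_k + L(e_k).
    """
    e = q_ref - q_meas
    de = np.gradient(e, dt, axis=0)
    return u + kp * e + kd * de

# Typical loop: run the trajectory, record q_meas, then call
# u = ilc_update(u, q_ref, q_meas) before the next trial.
```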
|
|
ThAT11-CC Oral Session, CC-502 |
Add to My Program |
Learning from Demonstration I |
|
|
Chair: Tsuji, Toshiaki | Saitama University |
Co-Chair: You, Mingyu | Tongji |
|
10:30-12:00, Paper ThAT11-CC.1 | Add to My Program |
Stable Motion Primitives Via Imitation and Contrastive Learning |
|
Pérez-Dattari, Rodrigo | Delft University of Technology |
Kober, Jens | TU Delft |
Keywords: Learning from Demonstration, Motion Control of Manipulators, Deep Learning in Robotics and Automation, Dynamical Systems as Movement Primitives
Abstract: Learning from humans allows non-experts to program robots with ease, lowering the resources required to build complex robotic solutions. Nevertheless, such data-driven approaches often lack the ability to provide guarantees regarding their learned behaviors, which is critical for avoiding failures and/or accidents. In this work, we focus on reaching/point-to-point motions, where robots must always reach their goal, independently of their initial state. This can be achieved by modeling motions as dynamical systems and ensuring that they are globally asymptotically stable. Hence, we introduce a novel Contrastive Learning loss for training Deep Neural Networks (DNN) that, when used together with an Imitation Learning loss, enforces the aforementioned stability in the learned motions. Differently from previous work, our method does not restrict the structure of its function approximator, enabling its use with arbitrary DNNs and allowing it to learn complex motions with high accuracy. We validate it using datasets and a real robot. In the former case, motions are 2 and 4 dimensional, modeled as first- and second-order dynamical systems. In the latter, motions are 3, 4, and 6 dimensional, of first and second order, and are used to control a 7DoF robot manipulator in its end effector space and joint space. More details regarding the real-world experiments are presented in: https://youtu.be/OM-2edHBRfc.
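One way to picture an imitation loss combined with a contrastive term that enforces goal convergence is the sketch below; `model.encode`, `model.dynamics`, and the hinge formulation are hypothetical stand-ins, not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def training_loss(model, x, x_next, goal, alpha=0.1, margin=1.0):
    """Imitation loss plus a contrastive stability surrogate.

    The imitation term regresses the demonstrated transition; the
    contrastive term requires each latent state to end up at least
    `margin` closer to the goal's latent than its predecessor, a proxy
    for the convergence a stable dynamical system must exhibit.
    """
    pred_next = model.dynamics(x)                # learned transition
    imitation = F.mse_loss(pred_next, x_next)

    z, z_next = model.encode(x), model.encode(x_next)
    z_goal = model.encode(goal)
    pos = (z_next - z_goal).norm(dim=-1)         # distance after the step
    neg = (z - z_goal).norm(dim=-1)              # distance before the step
    contrastive = F.relu(pos - neg + margin).mean()
    return imitation + alpha * contrastive
```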
|
|
10:30-12:00, Paper ThAT11-CC.2 | Add to My Program |
GAN-Based Editable Movement Primitive from High-Variance Demonstrations |
|
Xu, Xuanhui | TongJi University |
You, Mingyu | Tongji |
Zhou, Hongjun | Tongji University |
Qian, Zhifeng | Tongji University |
Xu, Weisheng | Tongji University |
He, Bin | Tongji University |
Keywords: Learning from Demonstration, Deep Learning Methods, Machine Learning for Robot Control
Abstract: Movement Primitives (MPs) are a promising Learning from Demonstration (LfD) framework, commonly used to learn movements from human demonstrations and adapt the learned movements to new task scenes. A major goal of MP research is to improve the adaptability of MPs to various target positions and obstacles. MPs enable this adaptability by capturing the variability of demonstrations. However, current MPs can only learn from low-variance demonstrations, which include varied target positions but do not vary the obstacles. Such MPs cannot adapt the learned movements to task scenes with different obstacles, which limits their adaptability, since obstacles are everywhere in daily life. In this paper, we propose a novel transformer- and GAN-based Editable Movement Primitive (EditMP), which can learn movements from high-variance demonstrations. These demonstrations include movements in task scenes with various target positions and obstacles. After movement learning, EditMP can controllably and interpretably edit the learned movements for new task scenes. Notably, EditMP enables all robot joints, rather than only the robot end-effector, to avoid hitting complex obstacles. The proposed method is evaluated on three tasks and deployed to a real-world robot. We compare EditMP with probabilistic MPs and empirically demonstrate its state-of-the-art adaptability.
|
|
10:30-12:00, Paper ThAT11-CC.3 | Add to My Program |
One-Shot Imitation Learning with Graph Neural Networks for Pick-And-Place Manipulation Tasks |
|
Di Felice, Francesco | Mechanical Intelligence Institute, Sant'Anna School of Advanced |
D'Avella, Salvatore | Sant'Anna School of Advanced Studies |
Remus, Alberto | Sant'Anna School of Advanced Studies |
Tripicchio, Paolo | Scuola Superiore Sant'Anna |
Avizzano, Carlo Alberto | Scuola Superiore Sant'Anna |
Keywords: Learning from Demonstration, Imitation Learning, Task and Motion Planning
Abstract: The proposed work presents a framework based on Graph Neural Networks (GNNs) that abstracts the task to be executed and directly allows the robot to learn task-specific rules from synthetic demonstrations given through imitation learning. A graph representation of the state space is considered to encode the task-relevant entities as nodes for a Pick-and-Place task instantiated at different levels of difficulty. During training, the GNN-based policy learns the underlying rules of the manipulation task, focusing on the structural relevance and the types of objects and goals, and relying on an external primitive to move the robot to accomplish the task. The GNN policy is trained as a node-classification approach, looking at the different configurations of the objects and goals present in the scene and learning the associations between them with respect to their type for the Pick-and-Place task. The experimental results show the high generalization capability of the proposed model in terms of the number, positions, height distributions, and even configurations of the objects/goals. Thanks to this generalization, only a single image of the desired goal configuration is required at inference time.
|
|
10:30-12:00, Paper ThAT11-CC.4 | Add to My Program |
Unsupervised Human Motion Segmentation Based on Characteristic Force Signals of Contact Events |
|
Sugawara, Keito | Saitama University |
Sakaino, Sho | University of Tsukuba |
Tsuji, Toshiaki | Saitama University |
Keywords: Learning from Demonstration, Compliance and Impedance Control
Abstract: Humans perform complex tasks involving force interactions daily. Learning from demonstration, a method for transferring such human manipulation skills to robots, requires techniques for segmenting the demonstrations into movement primitives. Therefore, we propose an unsupervised motion segmentation method that utilizes small characteristic fluctuations of 6-axis force/torque signals as features for motion segmentation. This method includes a feature extraction phase using a differentiation process and detects segmentation points based on the differentiated 6-axis force/torque signals obtained in the task demonstrations. The segmentation method was evaluated using a peg-in-hole task and a bottle-lid opening task. The experimental results demonstrate the validity of using differentiated forces and torques for motion segmentation.
|
|
10:30-12:00, Paper ThAT11-CC.5 | Add to My Program |
Robotic Skill Mutation in Robot-To-Robot Propagation During a Physically Collaborative Sawing Task |
|
Maessen, Rosa Enna Sophia | Delft University of Technology |
Prendergast, J. Micah | Delft University of Technology |
Peternel, Luka | Delft University of Technology |
Keywords: Learning from Demonstration, Physical Human-Robot Interaction, Compliance and Impedance Control
Abstract: Skill propagation among robots without human involvement can be crucial in quickly spreading new physical skills to many robots. In this respect, it is a good alternative to pure reinforcement learning, which can be time-consuming, or learning from human demonstration, which requires human involvement; in the latter case, there may not be enough humans to quickly spread skills to many robots. However, propagation among robots without direct human supervision can result in robotic skills mutating from the original source. This can be beneficial when better skills emerge or when a new skill is obtained that can be used for other similar tasks. However, it can also be dangerous in terms of task execution safety. This letter studies the mutation of a robotic skill when it is propagated from one robot to another during a physically collaborative task. We chose the collaborative sawing task as a study case since it involves complex two-agent physical interaction/coordination and because its periodic nature can facilitate repetitive learning. The study employs periodic Dynamic Movement Primitives and Locally Weighted Regression to encode and learn the motion and impedance required to execute the task. To explore what influences mutation, we varied several control and environment conditions, such as the maximum stiffness, robot base position, friction coefficient of the sawed object, and movement period.
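The encoding used in this study, periodic Dynamic Movement Primitives fit by Locally Weighted Regression, can be sketched as follows (a textbook-style simplification; the gains, basis widths, and the separate impedance channel are illustrative assumptions):

```python
import numpy as np

class PeriodicDMP:
    """Rhythmic primitive: a phase oscillator drives a learned periodic
    forcing term built from von Mises basis functions."""

    def __init__(self, n_basis=20, omega=2 * np.pi, alpha=25.0, beta=6.25):
        self.c = np.linspace(0, 2 * np.pi, n_basis, endpoint=False)
        self.h = np.full(n_basis, 2.5 * n_basis)  # basis widths
        self.w = np.zeros(n_basis)                # fit from a demonstration
        self.omega, self.alpha, self.beta = omega, alpha, beta
        self.phi = 0.0                            # oscillator phase

    def fit(self, phi, f_target):
        """Locally weighted regression of the forcing term: one weight
        per basis function (diagonal LWR solution)."""
        for i in range(len(self.w)):
            psi = np.exp(self.h[i] * (np.cos(phi - self.c[i]) - 1.0))
            self.w[i] = (psi * f_target).sum() / (psi.sum() + 1e-9)

    def step(self, y, dy, g, dt):
        """Integrate one Euler step of the transformation system."""
        psi = np.exp(self.h * (np.cos(self.phi - self.c) - 1.0))
        f = psi @ self.w / (psi.sum() + 1e-9)
        ddy = self.alpha * (self.beta * (g - y) - dy) + f
        self.phi = (self.phi + self.omega * dt) % (2 * np.pi)
        return y + dy * dt, dy + ddy * dt
```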
|
|
10:30-12:00, Paper ThAT11-CC.6 | Add to My Program |
Few-Shot Learning of Force-Based Motions from Demonstration through Pre-Training of Haptic Representation |
|
Aoyama, Marina Y. | The University of Edinburgh |
Moura, Joao | The University of Edinburgh |
Saito, Namiko | The University of Edinburgh |
Vijayakumar, Sethu | University of Edinburgh |
Keywords: Learning from Demonstration, Representation Learning, Force and Tactile Sensing
Abstract: In many contact-rich tasks, force sensing plays an essential role in adapting the motion to the physical properties of the manipulated object. To enable robots to capture the underlying distribution of object properties necessary for generalising learnt manipulation tasks to unseen objects, existing Learning from Demonstration (LfD) approaches require a large number of costly human demonstrations. Our proposed semi-supervised LfD approach decouples the learnt model into a haptic representation encoder and a motion generation decoder. This enables us to pre-train the first using a large amount of unsupervised data, easily accessible, while using few-shot LfD to train the second, leveraging the benefits of learning skills from humans. We validate the approach on the wiping task using sponges with different stiffness and surface friction. Our results demonstrate that pre-training significantly improves the ability of the LfD model to recognise physical properties and generate desired wiping motions for unseen sponges, outperforming the LfD method without pre-training. We validate the motion generated by our semi-supervised LfD model on the physical robot hardware using the KUKA iiwa robot arm. We also validate that the haptic representation encoder, pre-trained in simulation, captures the properties of real objects, explaining its contribution to improving the generalisation of the downstream task.
|
|
10:30-12:00, Paper ThAT11-CC.7 | Add to My Program |
Learning-Based Risk-Bounded Path Planning under Environmental Uncertainty (I) |
|
Meng, Fei | The Chinese University of Hong Kong |
Chen, Liangliang | Georgia Institute of Technology |
Ma, Han | The Chinese University of Hong Kong |
Wang, Jiankun | Southern University of Science and Technology |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Keywords: Learning from Demonstration, Planning under Uncertainty, Simulation and Animation
Abstract: Building a general and efficient path planning framework for uncertain nonconvex environments is challenging due to safety constraints and complex configurations. Traditional avenues usually involve convexifying obstacles and presuming Gaussian distributions, neither of which is universal. Meanwhile, fast convergence to high-quality solutions is not guaranteed. Therefore, we develop a novel neural risk-bounded path planner that quickly finds near-optimal solutions with an acceptable collision probability in complex environments. First, we represent the nonconvex obstacles, with arbitrary probabilistic uncertainties, as a deterministic point cloud map. A neural network sampler encodes the map into a latent embedding and, trained with sufficient expert demonstrations, predicts states in the promising subspace. We construct a neural cost estimator to select the best informed state from those samples. Then, we recursively use these simple yet effective neural networks to march toward the start and goal bidirectionally. The collision risk of the intermediate connections is verified based on sum-of-squares optimization. Simulation results show that our approach significantly saves time and resources in finding comparable solutions relative to state-of-the-art methods in seen and unseen challenging environments.
|
|
10:30-12:00, Paper ThAT11-CC.8 | Add to My Program |
ConBaT: Control Barrier Transformer for Safe Robot Learning from Demonstrations |
|
Meng, Yue | Massachusetts Institute of Technology |
Vemprala, Sai | Scaled Foundations |
Bonatti, Rogerio | Microsoft |
Fan, Chuchu | Massachusetts Institute of Technology |
Kapoor, Ashish | Microsoft |
Keywords: Learning from Demonstration, Sensorimotor Learning, Machine Learning for Robot Control
Abstract: Large-scale self-supervised models have recently revolutionized our ability to perform a variety of tasks within the vision and language domains. However, using such models for autonomous systems is challenging because of safety requirements: besides executing correct actions, an autonomous agent must also avoid the high cost and potentially fatal critical mistakes. Traditionally, self-supervised training mainly focuses on imitating previously observed behaviors, and the training demonstrations carry no notion of which behaviors should be explicitly avoided. In this work, we propose Control Barrier Transformer (ConBaT), an approach that learns safe behaviors from demonstrations in a self-supervised fashion. ConBaT is inspired by the concept of control barrier functions in control theory and uses a causal transformer that learns to predict safe robot actions autoregressively using a critic that requires minimal safety data labeling. During deployment, we employ a lightweight online optimization to find actions that ensure future states lie within the learned safe set. We apply our approach to different simulated control tasks and show that our method results in safer control policies compared to other classical and learning-based methods such as imitation learning, reinforcement learning, and model predictive control.
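At deployment time, the lightweight online step searches for an action that keeps the predicted next state inside the learned safe set; a naive sampling-based sketch of such a filter is shown below (the `dynamics` and `barrier` callables are assumptions, and the paper uses a learned critic with optimization rather than this rejection scheme):

```python
import numpy as np

def safe_action(state, u_nominal, dynamics, barrier,
                n_samples=256, scale=0.1):
    """Return the action closest to the nominal one that the learned
    barrier scores as safe (barrier(s) >= 0 means inside the safe set)."""
    u_nominal = np.asarray(u_nominal, float)
    candidates = u_nominal + scale * np.random.randn(n_samples,
                                                     u_nominal.size)
    candidates = np.vstack([u_nominal, candidates])  # try nominal first
    safe = [u for u in candidates
            if barrier(dynamics(state, u)) >= 0.0]
    if not safe:  # fall back to the least-unsafe candidate
        return max(candidates, key=lambda u: barrier(dynamics(state, u)))
    return min(safe, key=lambda u: np.linalg.norm(u - u_nominal))
```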
|
|
10:30-12:00, Paper ThAT11-CC.9 | Add to My Program |
Unsupervised Learning of Neuro-Symbolic Rules for Generalizable Context-Aware Planning in Object Arrangement Tasks |
|
Sharma, Siddhant | Indian Institute of Technology Delhi |
Tuli, Shreshth | Indian Institute of Technology Delhi |
Paul, Rohan | Indian Institute of Technology Delhi |
Keywords: Learning from Demonstration, Task and Motion Planning
Abstract: As robots tackle complex object arrangement tasks, it becomes imperative for them to generalize to complex worlds and scale with the number of objects. This work postulates that extracting action primitives, such as push operations, together with their pre-conditions and effects, enables strong generalization to unseen worlds. Hence, we factorize policy learning as the inference of such generic rules, which act as strong priors for predicting actions given the world state. The learnt rules act as propositional knowledge and enable robots to reach goals in a zero-shot manner by applying the rules independently and incrementally. However, obtaining hand-engineered rules, such as PDDL descriptions, is hard, especially for unseen worlds. This work aims to learn generic, sparse, and context-aware rules that govern action primitives in robotic worlds through human demonstrations in simple domains. We demonstrate that our approach, namely RLAP, is able to extract rules without explicit supervision of rule labels and generate goal-reaching plans in complex Sokoban-styled domains that scale with the number of objects. RLAP furnishes a significantly higher goal-reaching rate and shorter planning times compared to state-of-the-art techniques. The code, dataset, and videos are hosted at https://rule-learning-rlap.github.io/.
|
|
ThAT12-CC Oral Session, CC-503 |
Add to My Program |
Deep Learning I |
|
|
Chair: Valada, Abhinav | University of Freiburg |
Co-Chair: Barfoot, Timothy | University of Toronto |
|
10:30-12:00, Paper ThAT12-CC.1 | Add to My Program |
PCPNet: An Efficient and Semantic-Enhanced Transformer Network for Point Cloud Prediction |
|
Luo, Zhen | Beijing Institute of Technology |
Ma, Junyi | Beijing Institute of Technology |
Zhou, Zijie | Beijing Institute of Technology |
Xiong, Guangming | Beijing Institute of Technology |
Keywords: Deep Learning Methods, Deep Learning for Visual Perception
Abstract: The ability to predict future structural features of the environment from past perception is critical for autonomous vehicles, as it makes subsequent decision-making and path planning more reliable. Recently, point cloud prediction (PCP) has been utilized to predict and describe future environmental structures in point cloud form. In this letter, we propose a novel, efficient Transformer-based network to predict future LiDAR point clouds by exploiting past point cloud sequences. We also design a semantic auxiliary training strategy to make the predicted LiDAR point cloud sequence semantically similar to the ground truth, thus improving its usefulness for further tasks in real-vehicle applications. Our approach is completely self-supervised, which means it does not require any manual labeling and has a solid generalization ability toward different environments. The experimental results show that our method outperforms state-of-the-art PCP methods in prediction quality and semantic similarity, and has good real-time performance. Our open-source code and pre-trained models are available at https://github.com/Blurryface0814/PCPNet.
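Self-supervised point cloud prediction is typically trained against future scans with a Chamfer-style distance, since the future frames themselves serve as labels; a minimal PyTorch sketch (illustrative, not necessarily PCPNet's exact loss) is:

```python
import torch

def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between two point sets.

    pred: (N, 3) predicted points; gt: (M, 3) points of the actually
    observed future scan. No manual labels are required, which is what
    makes the training self-supervised.
    """
    d = torch.cdist(pred, gt)  # (N, M) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```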
|
|
10:30-12:00, Paper ThAT12-CC.2 | Add to My Program |
N2M2: Learning Navigation for Arbitrary Mobile Manipulation Motions in Unseen and Dynamic Environments |
|
Honerkamp, Daniel | Albert Ludwigs Universität Freiburg |
Welschehold, Tim | Albert-Ludwigs-Universität Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Learning and Adaptive Systems, Mobile Manipulation, Service Robots, Robot Learning
Abstract: Despite its importance in both industrial and service robotics, mobile manipulation remains a significant challenge, as it requires seamless integration of end-effector trajectory generation with navigation skills as well as reasoning over long horizons. Existing methods struggle to control the large configuration space and to navigate dynamic and unknown environments. In previous work, we proposed to decompose mobile manipulation tasks into a simplified motion generator for the end-effector in task space and a trained reinforcement learning agent for the mobile base that accounts for the kinematic feasibility of the motion. In this work, we introduce Neural Navigation for Mobile Manipulation (N2M2), which extends this decomposition to complex obstacle environments, extends the agent's control to the torso joint and the norm of the end-effector motion velocities, and uses a more general reward function, thereby enabling robots to tackle a much broader range of tasks in real-world settings. The resulting approach can perform unseen, long-horizon tasks in unexplored environments while instantly reacting to dynamic obstacles and environmental changes. At the same time, it provides a simple way to define new mobile manipulation tasks. We demonstrate the capabilities of our proposed approach in extensive simulation and real-world experiments on multiple kinematically diverse mobile manipulators.
|
|
10:30-12:00, Paper ThAT12-CC.3 | Add to My Program |
The Foreseeable Future: Self-Supervised Learning to Predict Dynamic Scenes for Indoor Navigation |
|
Thomas, Hugues | University of Toronto |
Zhang, Jian | Purdue University |
Barfoot, Timothy | University of Toronto |
Keywords: Learning and Adaptive Systems, Reactive and Sensor-Based Planning, Deep Learning in Robotics and Automation, Indoor Navigation
Abstract: We present a method for generating, predicting, and using Spatiotemporal Occupancy Grid Maps (SOGM), which embed future semantic information of real dynamic scenes. We present an auto-labeling process that creates SOGMs from noisy real navigation data. We use a 3D-2D feedforward architecture, trained to predict the future time steps of SOGMs, given 3D lidar frames as input. Our pipeline is entirely self-supervised, thus enabling lifelong learning for real robots. The network is composed of a 3D back-end that extracts rich features and enables the semantic segmentation of the lidar frames, and a 2D front-end that predicts the future information embedded in the SOGM representation, potentially capturing the complexities and uncertainties of real-world multi-agent interactions. We also design a navigation system that uses these predicted SOGMs within planning, after they have been transformed into Spatiotemporal Risk Maps (SRMs). We verify our navigation system's abilities in simulation, validate it on a real robot, study SOGM predictions on real data in various circumstances, and provide a novel indoor 3D lidar dataset, collected during our experiments, which includes our automated annotations.
|
|
10:30-12:00, Paper ThAT12-CC.4 | Add to My Program |
The Treachery of Images: Bayesian Scene Keypoints for Deep Policy Learning in Robotic Manipulation |
|
von Hartz, Jan Ole | University of Freiburg |
Chisari, Eugenio | University of Freiburg |
Welschehold, Tim | Albert-Ludwigs-Universität Freiburg |
Burgard, Wolfram | University of Technology Nuremberg |
Boedecker, Joschka | University of Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Deep Learning Methods, Imitation Learning, Representation Learning
Abstract: In policy learning for robotic manipulation, sample efficiency is of paramount importance. Thus, learning and extracting more compact representations from camera observations is a promising avenue. However, current methods often assume full observability of the scene and struggle with scale invariance. In many tasks and settings, this assumption does not hold, as objects in the scene are often occluded or lie outside the field of view of the camera, rendering the camera observation ambiguous with regard to their location. To tackle this problem, we present BASK, a Bayesian approach to tracking scale-invariant keypoints over time. Our approach successfully resolves inherent ambiguities in images, enabling keypoint tracking on symmetrical objects and on occluded and out-of-view objects. We employ our method to learn challenging multi-object robot manipulation tasks from wrist camera observations and demonstrate superior utility for policy learning compared to other representation learning techniques. Furthermore, we show outstanding robustness towards disturbances such as clutter, occlusions, and noisy depth measurements, as well as generalization to unseen objects, both in simulation and in real-world robotic experiments.
|
|
10:30-12:00, Paper ThAT12-CC.5 | Add to My Program |
PBP: Path-Based Trajectory Prediction for Autonomous Driving |
|
Afshar, Sepideh | Motional |
Deo, Nachiket | UC San Diego |
Bhagat, Akshay | Motional AD Inc |
Chakraborty, Titas | Motional |
Shao, Yunming | ZOOX |
Buddharaju, Balarama Raju | Motional AD |
Deshpande, Adwait | Georgia Institute of Technology |
Cui, Henggang | Motional |
Keywords: Deep Learning Methods, AI-Based Methods, Intelligent Transportation Systems
Abstract: Trajectory prediction plays a crucial role in the autonomous driving stack by enabling autonomous vehicles to anticipate the motion of surrounding agents. Goal-based prediction models have gained traction in recent years for addressing the multimodal nature of future trajectories. Goal-based prediction models simplify multimodal prediction by first predicting 2D goal locations of agents and then predicting trajectories conditioned on each goal. However, a single 2D goal location serves as a weak inductive bias for predicting the whole trajectory, often leading to poor map compliance, i.e., part of the trajectory going off-road or breaking traffic rules. In this paper, we improve upon goal-based prediction by proposing the Path-based prediction (PBP) approach. PBP predicts a discrete probability distribution over reference paths in the HD map using the path features and predicts trajectories in the path-relative Frenet frame. We apply the PBP trajectory decoder on top of the HiVT scene encoder and report results on the Argoverse dataset. Our experiments show that PBP achieves competitive performance on the standard trajectory prediction metrics, while significantly outperforming state-of-the-art baselines in terms of map compliance.
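The path-relative Frenet representation at the heart of PBP can be illustrated with a simple projection routine (a sketch under a dense-polyline assumption; the paper's implementation is not reproduced here):

```python
import numpy as np

def to_frenet(traj, path):
    """Project Cartesian waypoints onto a reference path's Frenet frame.

    Returns (s, d) per waypoint: arc length along the path and signed
    lateral offset, computed against the nearest path vertex (a dense
    polyline is assumed; a production version would interpolate).
    """
    path = np.asarray(path, float)
    seg = np.diff(path, axis=0)
    arclen = np.concatenate([[0.0],
                             np.cumsum(np.linalg.norm(seg, axis=1))])
    out = []
    for p in np.asarray(traj, float):
        i = np.argmin(np.linalg.norm(path - p, axis=1))
        j = min(i, len(seg) - 1)
        tangent = seg[j] / (np.linalg.norm(seg[j]) + 1e-9)
        offset = p - path[i]
        # Signed lateral distance via the 2D cross product with the tangent.
        d = tangent[0] * offset[1] - tangent[1] * offset[0]
        out.append((arclen[i] + offset @ tangent, d))
    return np.asarray(out)
```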
|
|
10:30-12:00, Paper ThAT12-CC.6 | Add to My Program |
Sensorless Estimation of Contact Using Deep-Learning for Human-Robot Interaction |
|
Shan, Shilin | Nanyang Technological University |
Pham, Quang-Cuong | NTU Singapore |
Keywords: Deep Learning Methods, Physical Human-Robot Interaction, Industrial Robots
Abstract: Physical human-robot interaction has been an area of interest for decades. Collaborative tasks, such as joint compliance, demand high-quality joint torque sensing. While external torque sensors are reliable, they come with the drawbacks of being expensive and vulnerable to impacts. To address these issues, studies have been conducted to estimate external torques using only internal signals, such as joint states and current measurements. However, insufficient attention has been given to friction hysteresis approximation, which is crucial for tasks involving extensive dynamic to static state transitions. In this paper, we propose a deep-learning-based method that leverages a novel long-term memory scheme to achieve dynamics identification, accurately approximating the static hysteresis. We also introduce modifications to the well-known Residual Learning architecture, retaining high accuracy while reducing inference time. The robustness of the proposed method is illustrated through a joint compliance and task compliance experiment.
|
|
10:30-12:00, Paper ThAT12-CC.7 | Add to My Program |
Merging Decision Transformers: Weight Averaging for Forming Multi-Task Policies |
|
Lawson, Daniel | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Keywords: Deep Learning Methods, Transfer Learning, Reinforcement Learning
Abstract: Recent work has shown the promise of creating generalist, transformer-based models for language, vision, and sequential decision-making problems. Creating such models generally requires centralized training objectives, data, and compute. It is of interest whether we can more flexibly create generalist policies by merging together multiple, task-specific, individually trained policies. In this work, we take a preliminary step in this direction by merging, or averaging, subsets of Decision Transformers in parameter space trained on different MuJoCo locomotion problems, forming multi-task models without centralized training. We also demonstrate the importance of various methodological choices when merging policies, such as utilizing common pre-trained initializations, increasing model capacity, and utilizing Fisher information for weighting parameter importance. In general, we believe research in this direction could help democratize and distribute the process of forming multi-task robotics policies. Our implementation is available at https://github.com/daniellawson9999/merging-decision-transformers.
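Parameter-space merging with Fisher weighting reduces to a per-parameter weighted average; a minimal PyTorch sketch over checkpoints (`state_dicts`) and matching diagonal Fisher estimates (`fishers`, assumed precomputed) is:

```python
import torch

def fisher_weighted_merge(state_dicts, fishers, eps=1e-8):
    """Merge task-specific checkpoints by Fisher-weighted averaging.

    Each floating-point parameter is averaged across models with
    per-element weights given by its diagonal Fisher information, so
    entries a task relies on heavily dominate the merged value there.
    """
    merged = {}
    for name in state_dicts[0]:
        num = sum(f[name] * sd[name]
                  for sd, f in zip(state_dicts, fishers))
        den = sum(f[name] for f in fishers) + eps
        merged[name] = num / den
    return merged

# Plain (unweighted) averaging is the special case of identical Fishers:
# fishers = [{n: torch.ones_like(p) for n, p in sd.items()}
#            for sd in state_dicts]
```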
|
|
ThAT13-AX Oral Session, AX-201 |
Add to My Program |
Human-Robot Collaboration IV |
|
|
Chair: Morimoto, Jun | ATR Computational Neuroscience Labs |
Co-Chair: Xiao, Jing | Worcester Polytechnic Institute (WPI) |
|
10:30-12:00, Paper ThAT13-AX.1 | Add to My Program |
Trust-Aware Motion Planning for Human-Robot Collaboration under Distribution Temporal Logic Specifications |
|
Yu, Pian | University of Oxford |
Dong, Shuyang | University of Virginia |
Sheng, Shili | University of Virginia |
Feng, Lu | University of Virginia |
Kwiatkowska, Marta | University of Oxford |
Keywords: Human-Robot Collaboration, Formal Methods in Robotics and Automation, Task Planning
Abstract: Recent work has considered trust-aware decision making for human-robot collaboration (HRC) with a focus on model learning. In this paper, we are interested in enabling the HRC system to complete complex tasks specified using temporal logic formulas that involve human trust. Since accurately observing human trust in robots is challenging, we adopt the widely used partially observable Markov decision process (POMDP) framework for modelling the interactions between humans and robots. To specify the desired behaviour, we propose to use syntactically co-safe linear distribution temporal logic (scLDTL), a logic that is defined over predicates of states as well as belief states of partially observable systems. The incorporation of belief predicates in scLDTL enhances its expressiveness while introducing added complexity: the belief predicates must be evaluated over the continuous (infinite) belief space. To address this challenge, we present an algorithm for solving the optimal policy synthesis problem. First, we enhance the belief MDP (derived by reformulating the POMDP) with a probabilistic labelling function. Then a product belief MDP is constructed between the probabilistically labelled belief MDP and the automaton translation of the scLDTL formula. Finally, we show that the optimal policy can be obtained by leveraging existing point-based value iteration algorithms with essential modifications. Human subject experiments with 21 participants on a driving simulator demonstrate the effectiveness of the proposed approach.
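Since the approach evaluates belief predicates over a POMDP, the underlying operation is a Bayes filter over hidden trust states; a minimal discrete sketch (the trust levels and matrices below are invented for illustration) is:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """One Bayes-filter step over discrete hidden trust levels.

    b: belief over states; T[a]: state-transition matrix under action a;
    O[a]: observation-likelihood matrix (rows index the next state).
    A belief predicate such as P(trust = high) > 0.8 in an scLDTL-style
    formula would be checked against the returned belief vector.
    """
    pred = b @ T[a]              # prediction through the dynamics
    post = pred * O[a][:, o]     # weight by the observation likelihood
    return post / post.sum()

# Two trust levels (low, high), one action, a binary observation:
T = {0: np.array([[0.9, 0.1], [0.2, 0.8]])}
O = {0: np.array([[0.7, 0.3], [0.2, 0.8]])}
b = belief_update(np.array([0.5, 0.5]), 0, 1, T, O)
```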
|
|
10:30-12:00, Paper ThAT13-AX.2 | Add to My Program |
Towards Proactive Safe Human-Robot Collaborations Via Data-Efficient Conditional Behavior Prediction |
|
Pandya, Ravi | Carnegie Mellon University |
Wang, Zhuoyuan | Carnegie Mellon University |
Nakahira, Yorie | CMU |
Liu, Changliu | Carnegie Mellon University |
Keywords: Human-Robot Collaboration, Human-Robot Teaming, Safety in HRI
Abstract: We focus on the problem of how we can enable a robot to collaborate seamlessly with a human partner, specifically in scenarios where preexisting data is sparse. Much prior work in human-robot collaboration uses observational models of humans (i.e. models that treat the robot purely as an observer) to choose the robot's behavior, but such models do not account for the influence the robot has on the human's actions, which may lead to inefficient interactions. We instead formulate the problem of optimally choosing a collaborative robot's behavior based on a conditional model of the human that depends on the robot's future behavior. First, we propose a novel model-based formulation of conditional behavior prediction that allows the robot to infer the human's intentions based on its future plan in data-sparse environments. We then show how to utilize a conditional model for proactive goal selection and safe trajectory generation around human collaborators. Finally, we use our proposed proactive controller in a collaborative task with real users to show that it can improve users' interactions with a robot collaborator quantitatively and qualitatively.
|
|
10:30-12:00, Paper ThAT13-AX.3 | Add to My Program |
Risk-Bounded Online Team Interventions Via Theory of Mind |
|
Zhang, Yuening | Massachusetts Institute of Technology |
Robertson, Paul | Dynamic Object Language Labs Inc. (DOLL Inc.) |
Shu, Tianmin | Massachusetts Institute of Technology |
Hong, Sungkweon | Massachusetts Institute of Technology |
Williams, Brian | MIT |
Keywords: Human-Robot Collaboration, Human-Robot Teaming, Social HRI
Abstract: Despite advancements in human-robot teamwork, limited progress has been made in developing AI assistants capable of advising teams online during task time, due to the challenges of modeling both individual and collective beliefs of the team members. Dynamic epistemic logic has proved to be a viable tool for representing a machine Theory of Mind and for modeling communication in epistemic planning, with applications to human-robot teamwork. However, this approach has yet to be applied in an online teaming assistance context and fails to account for the real-life probabilities of potential team beliefs. We propose a novel blend of epistemic planning and POMDP techniques to create a risk-bounded AI team assistant that intervenes only when the team's expected likelihood of failure exceeds a predefined risk threshold or in the case of potential execution deadlocks. Our experiments and simulated demonstration on the VirtualHome testbed show that the assistant can effectively improve team performance.
|
|
10:30-12:00, Paper ThAT13-AX.4 | Add to My Program |
Human-Robot Complementary Collaboration for Flexible and Precision Assembly |
|
Cao, Shichen | Worcester Polytechnic Institute |
Xiao, Jing | Worcester Polytechnic Institute (WPI) |
Keywords: Human-Robot Collaboration, Compliant Assembly, Human-Centered Automation
Abstract: This paper addresses human-robot collaborative (HRC) precision assembly that complements natural human ability and the strength of an autonomous robot system. Our approach enables both flexibility and efficiency of tight-clearance assembly of various complex-shaped parts in the presence of uncertainty without requiring assembly skills and knowledge of robotics from the human operator. We demonstrated the effectiveness of our approach in a variety of experiments and comparisons with other HRC assembly approaches.
|
|
10:30-12:00, Paper ThAT13-AX.5 | Add to My Program |
HAC-SLAM: Human Assisted Collaborative 3D-SLAM through Augmented Reality |
|
Sayour, Malak | American University of Beirut |
Yassine, Mohammad Karim | American University of Beirut |
Dib, Nadim | American University of Beirut |
Elhajj, Imad | American University of Beirut |
Asmar, Boulos | Idealworks |
Khoury, Elie | Idealworks |
Asmar, Daniel | American University of Beirut |
Keywords: Human-Robot Collaboration, Human Factors and Human-in-the-Loop, SLAM
Abstract: Simultaneous Localization and Mapping (SLAM) has emerged as a prime autonomous mobile agent localization algorithm. Despite the global research effort to improve SLAM, its mapping component remains limited and serves little more than to satisfy the coupled localization problem. We present a collaborative 3D SLAM approach leveraging the power of augmented reality (AR). The system introduces a trio of diverse agents, each with its unique capability to become an active member in the mapping process: mobile robots, human operators, and AR head-mounted displays (AR-HMD). A 3D complementary mapping pipeline is developed to utilize the built-in SLAM capabilities of the AR-HMD as shareable data. Our system aligns and merges the AR-HMD’s and the robot’s local maps automatically, triggered by a human-dictated initial guess. The created merged map proves advantageous in scenarios where the robot is restricted from navigating in certain areas. To correct map imperfections resulting from problematic objects such as transparent or reflective surfaces, the fused map is overlaid onto the environment, and hand gestures are used to add or delete 3D map features in real-time. Our system is implemented in both a lab setting and a real industrial warehouse. The results show a significant improvement in map quality and mapping duration.
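The automatic map alignment step triggered by the human-dictated initial guess ultimately reduces to estimating a rigid transform between corresponding 3D points in the two maps. A minimal closed-form sketch of that step (the Kabsch/Umeyama solution, assuming correspondences between the maps are already available; the full pipeline in the paper does much more):

```python
import numpy as np

def rigid_align(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q.
    P, Q: (N, 3) arrays of corresponding points from the two maps."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                  # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = cQ - R @ cP
    return R, t
```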
|
|
10:30-12:00, Paper ThAT13-AX.6 | Add to My Program |
GAN-Based Semi-Supervised Training of LSTM Nets for Intention Recognition in Cooperative Tasks |
|
Mavsar, Matija | Jozef Stefan Institute |
Morimoto, Jun | ATR Computational Neuroscience Labs |
Ude, Ales | Jozef Stefan Institute |
Keywords: Human-Robot Collaboration, Deep Learning Methods, Machine Learning for Robot Control
Abstract: The accumulation of a sufficient amount of data for training deep neural networks is a major hindrance in the application of deep learning in robotics. Acquiring real-world data requires considerable time and effort, yet it might still not capture the full range of potential environmental variations. The generation of new synthetic data based on existing training data has been enabled with the development of generative adversarial networks (GANs). In this paper, we introduce a training methodology based on GANs that utilizes a recurrent, LSTM-based architecture for intention recognition in robotics. The resulting networks predict the intention of the observed human or robot based on input RGB videos. They are trained in a semi-supervised manner, with the output classification networks predicting one of the possible labels for the observed motion, while the recurrent generator networks produce fake RGB videos that are leveraged in the training process. We show that utilization of the generated data during the network training process increases the accuracy and generality of motion classification compared to using only real training data. The proposed method can be applied to a variety of dynamic tasks and different LSTM-based classification networks to supplement real data.
|
|
10:30-12:00, Paper ThAT13-AX.7 | Add to My Program |
CoBT: Collaborative Programming of Behaviour Trees from One Demonstration for Robot Manipulation |
|
Jain, Aayush | Irish Manufacturing Research and Technological University Dublin |
Long, Philip | Atlantic Technological University |
Villani, Valeria | University of Modena and Reggio Emilia |
Kelleher, John D. | Trinity College Dublin |
Leva, Maria Chiara | Technological University Dublin |
Keywords: Human-Robot Collaboration, Human-Centered Automation, Learning from Demonstration
Abstract: Mass customization and shorter manufacturing cycles are becoming more important among small and medium-sized companies. However, classical industrial robots struggle to cope with product variation and dynamic environments. In this paper, we present CoBT, a collaborative programming by demonstration framework for generating reactive and modular behavior trees. CoBT relies on a single demonstration and a combination of data-driven machine learning methods with logic-based declarative learning to learn a task, thus eliminating the need for programming expertise or long development times. The proposed framework is experimentally validated on 7 manipulation tasks and we show that CoBT achieves approximately 93% success rate overall with an average programming time of 7.5 s. We conduct a pilot study with non-expert users to provide feedback regarding the usability of CoBT. More videos and generated behavior trees are available at: https://github.com/jainaayush2006/CoBT.git.
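For readers unfamiliar with behavior trees, the reactive, modular structure CoBT generates can be illustrated with a minimal pure-Python sequence/fallback implementation. The pick-and-place tree at the end is hypothetical, not one produced by CoBT.

```python
class Node:
    def tick(self):
        raise NotImplementedError

class Action(Node):
    def __init__(self, fn): self.fn = fn
    def tick(self): return self.fn()          # returns "SUCCESS" or "FAILURE"

class Sequence(Node):
    """Succeeds only if all children succeed, ticked left to right."""
    def __init__(self, *children): self.children = children
    def tick(self):
        for c in self.children:
            if c.tick() != "SUCCESS":
                return "FAILURE"
        return "SUCCESS"

class Fallback(Node):
    """Succeeds as soon as any child succeeds."""
    def __init__(self, *children): self.children = children
    def tick(self):
        for c in self.children:
            if c.tick() == "SUCCESS":
                return "SUCCESS"
        return "FAILURE"

# Hypothetical pick-and-place tree; each lambda stands in for a real robot skill.
tree = Sequence(Action(lambda: "SUCCESS"),    # move_to_object
                Action(lambda: "SUCCESS"),    # grasp
                Action(lambda: "SUCCESS"))    # place
print(tree.tick())
```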
|
|
10:30-12:00, Paper ThAT13-AX.8 | Add to My Program |
A Dynamic Planner for Safe and Predictable Human-Robot Collaboration |
|
Pupa, Andrea | University of Modena and Reggio Emilia |
Minelli, Marco | University of Modena and Reggio Emilia |
Secchi, Cristian | Univ. of Modena & Reggio Emilia |
Keywords: Human-Robot Collaboration, Human-Aware Motion Planning, Safety in HRI
Abstract: The new face of modern industrial scenarios involves shared workspaces where humans and robots work closely together. To ensure safe human-robot collaboration (HRC), regulations have been updated with the introduction of ISO/TS 15066. However, complying with these regulations often leads to inefficient behavior, such as unnecessarily reducing robot speed or unpredictably changing the robot path, which may negatively affect the operator's perception of the robot. In this work, an optimal approach that addresses these two issues together is proposed. Starting from a desired final configuration, the framework plans a collision-free trajectory for the robot. Subsequently, predictability is taken into account and a set of virtual tubes within which the path of the robot may vary is built. Lastly, an optimization problem is solved online to ensure that the robot stays within these tubes and that the velocities comply with ISO/TS 15066. The proposed approach has been experimentally validated in two different scenarios: one composed of a mobile manipulator, i.e. a UR10e mounted on a Neobotix MPO-500, and one composed of only a collaborative manipulator, i.e. a UR5e.
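The velocity compliance the framework enforces follows the spirit of ISO/TS 15066 speed-and-separation monitoring. A much-simplified sketch of such a velocity cap is given below; it is a coarse approximation of the standard's protective separation distance, with hypothetical parameters, and is not the paper's optimization problem.

```python
def ssm_velocity_limit(d_sep, v_human, t_react, t_stop, c_margin):
    """Largest robot speed v_r such that human and robot travel during reaction
    and stopping stays below the current separation d_sep (simplified):
        d_sep >= v_human*(t_react + t_stop) + v_r*t_react + v_r*t_stop/2 + c_margin
    assuming the robot decelerates linearly to rest over t_stop."""
    budget = d_sep - c_margin - v_human * (t_react + t_stop)
    return max(0.0, budget / (t_react + 0.5 * t_stop))

# Example: 2 m separation, human at 1.6 m/s, 0.1 s reaction, 0.3 s stop, 0.2 m margin.
print(ssm_velocity_limit(2.0, 1.6, 0.1, 0.3, 0.2))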
|
|
10:30-12:00, Paper ThAT13-AX.9 | Add to My Program |
Wait, That Feels Familiar: Learning to Extrapolate Human Preferences for Preference-Aligned Path Planning |
|
Karnan, Haresh | The University of Texas at Austin |
Yang, Elvin | University of Michigan, Ann Arbor |
Warnell, Garrett | U.S. Army Research Laboratory |
Stone, Peter | University of Texas at Austin |
Biswas, Joydeep | University of Texas at Austin |
Keywords: Vision-Based Navigation, Representation Learning, Autonomous Vehicle Navigation
Abstract: Autonomous mobility tasks such as last-mile delivery require reasoning about operator-indicated preferences over terrains on which the robot should navigate to ensure both robot safety and mission success. However, coping with out-of-distribution data from novel terrains or appearance changes due to lighting variations remains a fundamental problem in visual terrain-adaptive navigation. Existing solutions either require labor-intensive manual data re-collection and labeling or use hand-coded reward functions that may not align with operator preferences. In this work, we posit that operator preferences for visually novel terrains, which the robot should adhere to, can often be extrapolated from established terrain preferences within the inertial-proprioceptive-tactile domain. Leveraging this insight, we introduce Preference extrApolation for Terrain-awarE Robot Navigation (PATERN), a novel framework for extrapolating operator terrain preferences for visual navigation. PATERN learns to map inertial-proprioceptive-tactile measurements from the robot’s observations to a representation space and performs a nearest-neighbor search in this space to estimate operator preferences over novel terrains. Through physical robot experiments in outdoor environments, we assess PATERN’s capability to extrapolate preferences and generalize to novel terrains and challenging lighting conditions. Compared to baseline approaches, our findings indicate that PATERN robustly generalizes to diverse terrains and varied lighting conditions, while navigating in a preference-aligned manner.
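The nearest-neighbour preference extrapolation at the core of PATERN can be sketched in a few lines, assuming the inertial-proprioceptive-tactile embeddings have already been learned; the array names are hypothetical.

```python
import numpy as np

def extrapolate_preference(z_query, Z_known, prefs, k=5):
    """Nearest-neighbour preference lookup in the learned representation space.
    Z_known: (N, d) embeddings of terrains with known operator preference scores
    prefs:   (N,) preference scores aligned with Z_known
    z_query: (d,) embedding of the visually novel terrain."""
    d = np.linalg.norm(Z_known - z_query, axis=1)
    idx = np.argsort(d)[:k]
    return prefs[idx].mean()   # preference estimate for the novel terrain
```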
|
|
ThAT15-AX Oral Session, AX-203 |
Add to My Program |
Modeling and Simulating Humans |
|
|
Chair: Maruyama, Hisataka | Nagoya University |
Co-Chair: Sui, Yanan | Tsinghua University |
|
10:30-12:00, Paper ThAT15-AX.1 | Add to My Program |
Towards Unifying Human Likeness: Evaluating Metrics for Human-Like Motion Retargeting on Bimanual Manipulation Tasks |
|
Meixner, Andre | Karlsruhe Institute of Technology (KIT) |
Carl, Mischa | Karlsruhe Institute of Technology (KIT) |
Krebs, Franziska | Karlsruhe Institute of Technology (KIT) |
Jaquier, Noémie | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Natural Machine Motion, Bimanual Manipulation
Abstract: Generating human-like robot motions is pivotal for achieving smooth human-robot interactions. Such motions contribute to better predictions of robot motions by humans, thus leading to more intuitive interaction and increased acceptability. Human likeness in robot motions has been conventionally measured and realized via the optimization of human-likeness metrics. However, the abundance of such metrics and the absence of standardized criteria impede their usage in novel contexts. In this work, we introduce a unified human-likeness metric built from a hierarchically weighted sum of individual metrics. The proposed metric is derived from a thorough analysis of eleven existing human-likeness criteria and is applicable across various tasks and robot models. We evaluate its performance in the context of motion retargeting of bimanual tasks with three different humanoid robots.
|
|
10:30-12:00, Paper ThAT15-AX.2 | Add to My Program |
MiBOT: A Head-Worn Robot That Modulates Cardiovascular Responses through Human-Like Soft Massage |
|
Mylaeus, Alice | ETH Zurich |
Vogt, Stephanie | ETH Zurich |
Demirel, Berken Utku | ETH Zurich |
Gort, Marcel | ETH Zurich |
Meboldt, Mirko | ETH Zurich |
Meier, Manuel | ETH Zurich |
Holz, Christian | ETH Zürich |
Keywords: Health Care Management, Medical Robots and Systems, Rehabilitation Robotics
Abstract: Massage therapy is helpful for the rehabilitation of various conditions, such as headaches caused by migraines and stress. Existing robotic systems have focused on massage therapy for the torso and limbs, but performing massage motions through suitable actuation on a person’s head has remained a challenge. In this paper, we present MiBOT, a head-worn massage robot that actuates two soft tactors to produce touch motions mimicking human massage. A key design principle behind MiBOT is its silent actuation, which we achieve through pneumatic artificial muscles in conjunction with a controller loop to respond to contact pressure. We evaluated the effectiveness of MiBOT in a controlled study and assessed subjects’ blood pressure and heart rate levels while applying MiBOT. We found that our mechanical system produced positive and conclusive quantitative outcomes similar to those of human-administered massage, decreasing participants’ mean systolic and diastolic blood pressure by 2.8 mmHg and 1.7 mmHg, respectively, as well as calming their heart rate by 8–10% on average.
|
|
10:30-12:00, Paper ThAT15-AX.3 | Add to My Program |
ESP: Extro-Spective Prediction for Long-Term Behavior Reasoning in Emergency Scenarios |
|
Wang, Dingrui | Technical University of Munich |
Lai, Zheyuan | Inceptio |
Li, Yuda | Inceptio Technology |
Wu, Yi | Nanjing University of Posts and Telecommunications |
Ma, Yuexin | ShanghaiTech University |
Betz, Johannes | Technical University of Munich |
Yang, Ruigang | University of Kentucky |
Li, Wei | Inceptio |
Keywords: Intelligent Transportation Systems, Datasets for Human Motion, Data Sets for Robot Learning
Abstract: Emergent-scene safety is the key milestone for fully autonomous driving, and reliable on-time prediction is essential to maintain safety in emergency scenarios. However, these emergency scenarios are long-tailed and hard to collect, which restricts the system from getting reliable predictions. In this paper, we build a new dataset, which aims at long-term prediction with inconspicuous state variation in history for the emergency event, named the Extro-Spective Prediction (ESP) problem. Based on the proposed dataset, a flexible feature encoder for ESP is introduced to various prediction methods as a seamless plug-in, and its consistent performance improvement underscores its efficacy. Furthermore, a new metric named clamped temporal error (CTE) is proposed to give a more comprehensive evaluation of prediction performance, especially in time-sensitive emergency events of subseconds. Interestingly, as our ESP features can naturally be described in human-readable language, integrating them into ChatGPT also shows great potential. The ESP dataset and all benchmarks are released at https://dingrui-wang.github.io/ESP-Dataset/.
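The abstract does not reproduce the exact CTE definition; the sketch below is one plausible reading, offered purely as a labelled assumption, in which per-timestep displacement error is clamped before averaging so that a few very large deviations do not dominate.

```python
import numpy as np

def clamped_temporal_error(pred, gt, clamp=2.0):
    """One plausible reading of CTE (assumption; see the paper for the exact
    definition): Euclidean error per timestep, clamped at `clamp` metres,
    then averaged over the horizon. pred, gt: (T, 2) trajectories."""
    err = np.linalg.norm(pred - gt, axis=-1)
    return np.minimum(err, clamp).mean()
```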
|
|
10:30-12:00, Paper ThAT15-AX.4 | Add to My Program |
SynthAct: Towards Generalizable Human Action Recognition Based on Synthetic Data |
|
Schneider, David | Karlsruhe Institute of Technology |
Keller, Marco | Karlsruhe Institute of Technology (KIT) |
Zhong, Zeyun | Karlsruhe Institute of Technology |
Peng, Kunyu | Karlsruhe Institute of Technology |
Roitberg, Alina | University of Stuttgart |
Beyerer, Jürgen | Fraunhofer Gesellschaft |
Stiefelhagen, Rainer | Karlsruhe Institute of Technology |
Keywords: Modeling and Simulating Humans, Datasets for Human Motion, Human and Humanoid Motion Analysis and Synthesis
Abstract: Synthetic data generation is a proven method for augmenting training sets without the need for extensive setups, yet its application in human activity recognition is underexplored. This is particularly crucial for human-robot collaboration in household settings, where data collection is often privacy-sensitive. In this paper, we introduce SynthAct, a synthetic data generation pipeline designed to significantly minimize the reliance on real-world data. Leveraging modern 3D pose estimation techniques, SynthAct can be applied to arbitrary 2D or 3D video action recordings, making it applicable for uncontrolled in-the-field recordings by robotic agents or smarthome monitoring systems. We present two SynthAct datasets: AMARV, a large synthetic collection with over 800k multi-view action clips, and Synthetic Smarthome, mirroring the Toyota Smarthome dataset. SynthAct generates a rich set of data, including RGB videos and depth maps from four synchronized views, 3D body poses, normal maps, segmentation masks and bounding boxes. We validate the efficacy of our datasets through extensive synthetic-to-real experiments on NTU RGB+D and Toyota Smarthome.
|
|
10:30-12:00, Paper ThAT15-AX.5 | Add to My Program |
Visual-Tactile Robot Grasping Based on Human Skill Learning from Demonstrations Using a Wearable Parallel Hand Exoskeleton |
|
Lu, Zhenyu | Bristol Robotics Laboratory |
Chen, Lu | Shanxi University |
Dai, Hengtai | University of Bristol |
Li, Haoran | University of Bristol |
Zhao, Zhou | Central China Normal University |
Zheng, Bofang | University of Bristol |
Lepora, Nathan | University of Bristol |
Yang, Chenguang | University of Liverpool |
Keywords: Modeling and Simulating Humans, Deep Learning in Grasping and Manipulation, Force and Tactile Sensing
Abstract: Soft fingers and strategic grasping skills enable human hands to grasp objects stably. This paper aims to model human grasping skills and transfer the learned skills to robots to improve grasping quality and success rate. First, we designed a wearable tool-like parallel hand exoskeleton equipped with optical tactile sensors to acquire multimodal information, including hand positions and postures, the relative distance of the exoskeleton claws, and tactile images. Using the demonstration data, we summarized three characteristics observed from human demonstrations: varying-speed actions, the grasping effect read from tactile images, and grasping strategies for different positions. These characteristics were then utilized in the robot skill modelling to achieve a more human-like grasp. Since no force sensors are fixed to the claws, we introduced a new variable, called "grasp depth", to represent the grasping effect on the object. The robot grasping strategy is constructed as follows: First, grasp quality is predicted using a linear array network (LAN) with global visual images as inputs. Conditions such as grasp width, depth, position, and angle are also predicted. Second, with the grasp width and depth of the object determined, dynamic movement primitives (DMPs) are employed to mimic human grasp actions with varying velocities. A final action adjustment based on tactile detection is performed near grasp time to further enhance grasp quality.
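Movement reproduction with DMPs, as used in the second step above, integrates a spring-damper system with a learned forcing term. A minimal one-dimensional discrete-DMP rollout is sketched below (standard textbook formulation with heuristic basis widths; the weights w would come from demonstration fitting, and this is not the authors' parameterization).

```python
import numpy as np

def dmp_rollout(y0, g, w, tau=1.0, alpha=25.0, beta=6.25, alpha_x=8.0,
                dt=0.001, T=1.0):
    """Discrete DMP: tau^2*ydd = alpha*(beta*(g - y) - tau*yd) + f(x),
    with Gaussian-basis forcing f shaped by weights w learned from a demo."""
    n = len(w)
    c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n))   # basis centres along phase
    h = n / c                                          # heuristic basis widths
    y, yd, x = y0, 0.0, 1.0
    traj = []
    for _ in range(int(T / dt)):
        psi = np.exp(-h * (x - c) ** 2)
        f = (psi @ w) / psi.sum() * x * (g - y0)       # forcing term
        ydd = (alpha * (beta * (g - y) - tau * yd) + f) / tau**2
        yd += ydd * dt
        y += yd * dt
        x += (-alpha_x * x / tau) * dt                 # canonical (phase) system
        traj.append(y)
    return np.array(traj)
```

Scaling time via tau is what lets the same primitive reproduce the varying-speed actions observed in the demonstrations.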
|
|
10:30-12:00, Paper ThAT15-AX.6 | Add to My Program |
Workstation Suitability Maps: Generating Ergonomic Behaviors on a Population of Virtual Humans with Multi-Task Optimization |
|
Zhong, Jacques | CEA-List |
Weistroffer, Vincent | CEA LIST |
Mouret, Jean-Baptiste | Inria |
Colas, Francis | Inria Nancy Grand Est |
Maurice, Pauline | Cnrs - Loria |
Keywords: Modeling and Simulating Humans, Human and Humanoid Motion Analysis and Synthesis
Abstract: In industrial workstations, the morphology of the worker is a key factor for the feasibility and the ergonomics of an activity. Existing digital human modeling tools can simulate different morphologies at work, but hardly scale to a large population of workers because of limited consideration of morphology-specific behaviors and computational cost. This paper presents a framework to efficiently evaluate the suitability of a workstation over a large population of workers in a physics-based simulation. Activities are simulated through a two-step optimization process, involving a quadratic-programming based whole-body controller and a multi-task optimizer for behavioral adaptation. On a screwdriving scenario, we demonstrate how our framework can help ergonomists improve workstation designs thanks to the resulting suitability maps where generated behaviors are optimized for each morphology w.r.t. ergonomics and performance.
|
|
10:30-12:00, Paper ThAT15-AX.7 | Add to My Program |
Self Model for Embodied Intelligence: Modeling Full-Body Human Musculoskeletal System and Locomotion Control with Hierarchical Low-Dimensional Representation |
|
He, Kaibo | Tsinghua University |
Zuo, Chenhui | Tsinghua University |
Shao, Jing | Tsinghua University |
Sui, Yanan | Tsinghua University |
Keywords: Modeling and Simulating Humans, Human Factors and Human-in-the-Loop, Humanoid and Bipedal Locomotion
Abstract: Modeling and control of the human musculoskeletal system is important for understanding human motor functions, developing embodied intelligence, and optimizing human-robot interaction systems. However, current open-source models are restricted to a limited range of body parts and often with a reduced number of muscles. There is also a lack of algorithms capable of controlling over 600 muscles to generate reasonable human movements. To fill this gap, we build a musculoskeletal model with 90 body segments, 206 joints, and 700 muscle-tendon units, allowing simulation of full-body dynamics and interaction with various devices. We develop a new algorithm using low-dimensional representation and hierarchical deep reinforcement learning to achieve state-of-the-art full-body control. We validate the effectiveness of our model and algorithm in simulations with real human locomotion data. The musculoskeletal model, along with its control algorithm, will be made available to the research community to promote a deeper understanding of human motion control and better design of interactive robots.
|
|
10:30-12:00, Paper ThAT15-AX.8 | Add to My Program |
Keypoints-Guided Lightweight Network for Single-View 3D Human Reconstruction |
|
Chen, Yuhang | Southeast University |
Wang, Chenxing | Southeast University |
Keywords: Modeling and Simulating Humans, Human-Centered Automation, Deep Learning for Visual Perception
Abstract: Single-view 3D human reconstruction has been a hot topic due to its potential for wide application. To achieve high accuracy, existing works usually take computationally intensive models as the backbone to extract exhaustive underlying features and then directly estimate human mesh vertices. These factors lead to redundant parameters, large computations, and low efficiency, while lightweight solutions to address these challenges are relatively scarce. In this work, based on the problems studied above, we propose a keypoints-guided lightweight network with an encoding-decoding framework. As the input is an image, a lightweight backbone named the multi-stage and global feature enhanced network is designed for 2D encoding, where operations such as multi-scale fusion and frequency-domain filtering are performed to extract more informative but low-resolution features. As the output is a mesh of the human body, we construct a keypoints-based 3D human template, with which the 2D low-resolution features can be mapped to 3D space to guide the 3D decoding with high efficiency and high accuracy. Extensive experiments on the popular benchmarks 3DPW and Human3.6M illustrate the favorable trade-off between the accuracy and complexity of our method. Our code is publicly available at https://github.com/ChrisChenYh/EfficientHuman.git.
|
|
10:30-12:00, Paper ThAT15-AX.9 | Add to My Program |
Moving Horizon Estimation of Human Kinematics and Muscle Forces |
|
Ceglia, Amedeo | University of Montreal |
Bailly, François | INRIA, Université De Montpellier |
Begon, Mickael | University of Montreal |
Keywords: Modeling and Simulating Humans, Physical Human-Robot Interaction, Sensor-based Control
Abstract: Human-robot interaction based on real-time kinematics or electromyography (EMG) feedback improves rehabilitation using assist-as-needed strategies. Muscle forces are expected to provide even more comprehensive information than EMG to control these assistive rehabilitation devices. Measuring in vivo muscle force is challenging, leading to the development of numerical methods to estimate them. Due to their high computational cost, forward dynamics-based optimization algorithms were not viable for real-time estimation until recently. To achieve muscle forces estimation in real time, a moving horizon estimator (MHE) algorithm was used to track experimental biosignals. Two participants were equipped with EMG sensors and skin markers that were streamed in real time and used as targets for the MHE. The upper-limb musculoskeletal (MSK) model was composed of 10 degrees-of-freedom actuated by 31 muscles. The MHE relies on a series of overlapping trajectory optimization subproblems of which the following parameters have been adjusted: the fixed duration and the frame to export. We based this adjustment on the estimation delay, the muscle saturation, the joint kinematic mean power frequency, and errors to experimental data. Our algorithm provided consistent estimates of muscle forces and kinematics with visual feedback at 30 Hz with a 110 ms delay. This method is promising to guide rehabilitation and enrich assistive device control laws with personalized force estimations.
|
|
ThAT16-AX Oral Session, AX-204 |
Add to My Program |
Humanoid and Bipedal Locomotion |
|
|
Chair: Yi, Jingang | Rutgers University |
Co-Chair: Cheng, Gordon | Technical University of Munich |
|
10:30-12:00, Paper ThAT16-AX.1 | Add to My Program |
Reinforcement Learning with Energy-Exchange Dynamics for Spring-Loaded Biped Robot Walking |
|
Kuo, Cheng-Yu | Nara Institute of Science and Technology |
Shin, Hirofumi | Honda R&D Co., Ltd |
Matsubara, Takamitsu | Nara Institute of Science and Technology |
Keywords: Humanoid and Bipedal Locomotion, Model Learning for Control, Reinforcement Learning
Abstract: This paper presents a probabilistic Model-based Reinforcement Learning (MBRL) approach for learning the Energy-exchange Dynamics (EED) of a spring-loaded biped robot. Our approach enables on-site walking acquisition with high sample efficiency, real-time planning capability and generalizability across skill conditions. Specifically, we learn the data-driven state transition dynamics of the robot in the formulation of energy-states, with their interaction characterized as energy-exchange to reduce dimensionality. To improve planning reliability with the learned EED, we design a control space based on a walking trajectory that follows the law of conservation of energy and is formulated by energy-states. We evaluated our approach using a four-degree-of-freedom spring-loaded biped robot in simulation and hardware, and generalizability is validated by using the same learning framework for different walking speeds and terrains in simulation and walking acquisition with hardware. All results showed successful on-site walking acquisition with a compact nine-dimension dynamics model, 40 Hz real-time planning, and on-site learning within a few minutes.
|
|
10:30-12:00, Paper ThAT16-AX.2 | Add to My Program |
Foot Shape-Dependent Resistive Force Model for Bipedal Walkers on Granular Terrains |
|
Chen, Xunjie | Rutgers University |
Aditya, Anikode | Rutgers University |
Yi, Jingang | Rutgers University |
Liu, Tao | Zhejiang University |
Keywords: Humanoid and Bipedal Locomotion, Modeling and Simulating Humans
Abstract: Legged robots have demonstrated high efficiency and effectiveness in unstructured and dynamic environments. However, it is still challenging for legged robots to achieve rapid and efficient locomotion on deformable, yielding substrates, such as granular terrains. We present an enhanced resistive force model for bipedal walkers on soft granular terrains by introducing effective intrusion depth correction. The enhanced force model captures fundamental kinetic results considering the robot foot shape, walking gait speed variation, and energy expense. The model is validated by extensive foot intrusion experiments with a bipedal robot. The results confirm the model accuracy on the given type of granular terrains. The model can be further integrated with the motion control of bipedal robotic walkers.
|
|
10:30-12:00, Paper ThAT16-AX.3 | Add to My Program |
Adaptive Passive Biped Dynamic Walking on Unknown Uneven Terrain |
|
Pu, Lishen | Tongji University |
Liu, Yixuan | Tongji University |
Zheng, Aiqun | Shanghai New Tobacco Product Research Institute Co., Ltd |
Qi, Bofeng | Department of Control Science & Engineering, Tongji University |
Xu, Chunquan | Tongji University |
Keywords: Humanoid and Bipedal Locomotion, Passive Walking, Robust/Adaptive Control
Abstract: In this paper, we propose an adaptive controller for virtual passive biped dynamic walking on unknown uneven terrain. The adaptive controller consists of a trajectory tracking control law, developed via the backstepping method to mimic a reference passive gait, and a slope estimator for the inclination angle of the terrain. In addition, a re-planning approach is introduced to correct the robot state when it drifts off-track from the reference gait due to terrain changes. The controller is validated through simulations on mixed uneven terrain consisting of varying slopes and steps. The results suggest that the controller achieves a comparable cost of transport and greater adaptability to terrain changes compared with certain existing methods.
|
|
10:30-12:00, Paper ThAT16-AX.4 | Add to My Program |
HumanMimic: Learning Natural Locomotion and Transitions for Humanoid Robot Via Wasserstein Adversarial Imitation |
|
Tang, Annan | The University of Tokyo |
Hiraoka, Takuma | The University of Tokyo |
Hiraoka, Naoki | The University of Tokyo |
Shi, Fan | ETH Zürich |
Kawaharazuka, Kento | The University of Tokyo |
Kojima, Kunio | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Humanoid and Bipedal Locomotion, Reinforcement Learning, Human and Humanoid Motion Analysis and Synthesis
Abstract: Transferring human motion skills to humanoid robots remains a significant challenge. In this study, we introduce a Wasserstein adversarial imitation learning system, allowing humanoid robots to replicate natural whole-body locomotion patterns and execute seamless transitions by mimicking human motions. First, we present a unified primitive-skeleton motion retargeting to mitigate morphological differences between arbitrary human demonstrators and humanoid robots. An adversarial critic component is integrated with Reinforcement Learning (RL) to guide the control policy to produce behaviors aligned with the data distribution of mixed reference motions. Additionally, we employ a specific Integral Probability Metric (IPM), namely the Wasserstein-1 distance with a novel soft boundary constraint, to stabilize the training process and prevent model collapse. Our system is evaluated on a full-sized humanoid JAXON in the simulator. The resulting control policy demonstrates a wide range of locomotion patterns, including standing, push-recovery, squat walking, human-like straight-leg walking, and dynamic running. Notably, even in the absence of transition motions in the demonstration dataset, the robot showcases an emerging ability to transition naturally between distinct locomotion patterns as the desired speed changes.
|
|
10:30-12:00, Paper ThAT16-AX.5 | Add to My Program |
Online Adaptive Motion Generation for Humanoid Locomotion on Non-Flat Terrain Via Template Behavior Extension (I) |
|
Meng, Xiang | Beijing Institute of Technology |
Yu, Zhangguo | Beijing Institute of Technology |
Chen, Xuechao | Beijing Insititute of Technology |
Huang, Zelin | Beijing Institute of Technology |
Meng, Fei | Beijing Institute of Technology |
Huang, Qiang | Beijing Institute of Technology |
Keywords: Humanoid and Bipedal Locomotion, Whole-Body Motion Planning and Control
Abstract: For humanoid robots, online motion generation on non-flat terrain remains an ongoing research challenge. Computational complexity is one of the primary restrictions that preclude motion planners from generating adaptive behaviors online. In this paper, we investigate this problem and decompose it into two sequential components: an Efficient Behavior Generator (EBG) and a Nonlinear Centroidal Model Predictive Controller (NC-MPC). The EBG is responsible for optimizing the physically feasible whole-body template behaviors, which can provide reliable warm-starts for NC-MPC, thereby greatly reducing the computational effort of online planning. With tailored objective function and feet complementary constraints, the EBG can search for a near-optimal solution after several iterations within seconds for different behaviors including walking, running, and jumping, even with intuitive initial guesses. To make the template behaviors extensible when the robot encounters possible different scenarios, the NC-MPC is proposed to regenerate the reactive motion online to adapt it to the real local environment. Finally, we validate the effectiveness of synthesizing EBG and NC-MPC for humanoid locomotion on non-flat terrain in simulation and on the real humanoid robot BHR7P.
|
|
10:30-12:00, Paper ThAT16-AX.6 | Add to My Program |
A Bio-Plausible Approach to Realizing Heat-Evoked Nociceptive Withdrawal Reflex on the Upper Limb of a Humanoid Robot |
|
Wang, Fengyi | Technical University of Munich |
Guadarrama-Olvera, Julio Rogelio | Technical University of Munich |
Thakor, Nitish V. | Johns Hopkins University, Baltimore, USA |
Cheng, Gordon | Technical University of Munich |
Keywords: Humanoid Robot Systems, Biomimetics
Abstract: In this letter, we present a method for realizing the heat-evoked nociceptive withdrawal reflex (NWR) in the upper limb of a humanoid robot so that it can avoid the potential damage caused by noxious heat. We use a spiking neuron network whose structure, encoding scheme, and form of information transmission mimic the reflex arc in humans to improve bio-plausibility. The proper synaptic strengths between the sensory neurons and the interneurons in the first two layers are learned using the bio-plausible reward-modulated spike timing-dependent plasticity learning algorithm. By monitoring the spikes from the motor neuron in the third layer, a reflex matching the intensity of the stimulation can be evoked. Experimental evaluations show that noxious heat stimulation can be detected online and evoke the NWR. The experiments on a full-size humanoid robot show that the method enables robots to avoid potential damage robustly with a proper NWR, depending on the site and intensity of the stimulation. We also verify that the method takes advantage of the intrinsic characteristics of its neuromorphic encoding scheme to reproduce essential features of the NWR, e.g., the spatial summation effect and temporal summation effect in humans. The improved bio-plausibility and the capability to reproduce the human-like features make the proposed method suitable for devices that provide perceptual feedback to human users and allow local processing with low energy consumption, such as cognitive prostheses.
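The building block of such a spiking reflex arc is the leaky integrate-and-fire neuron. A minimal sketch is given below (generic LIF dynamics; the paper's specific heat encoding and learned synapses are not reproduced).

```python
import numpy as np

def lif_spikes(current, dt=1e-3, tau=0.02, v_th=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron: the membrane potential integrates the
    (heat-encoded) input current, leaks with time constant tau, and emits a
    spike whenever it crosses threshold, after which it is reset."""
    v, spikes = v_reset, []
    for I in current:
        v += dt * (-v / tau + I)
        if v >= v_th:
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return np.array(spikes)

# Stronger (more "noxious") input drives a higher firing rate.
print(lif_spikes(np.full(1000, 120.0)).sum(), lif_spikes(np.full(1000, 60.0)).sum())
```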
|
|
10:30-12:00, Paper ThAT16-AX.7 | Add to My Program |
Fall Prediction for Bipedal Robots: The Standing Phase |
|
Mungai, M. Eva | University of Michigan |
Grizzle, J.W | University of Michigan |
Prabhakaran, Gokul | University of Michigan |
Keywords: Humanoid Robot Systems, Failure Detection and Recovery, Humanoid and Bipedal Locomotion
Abstract: This paper presents a novel approach to fall prediction for bipedal robots, specifically targeting the detection of potential falls while standing caused by abrupt, incipient, and intermittent faults. Leveraging a 1D convolutional neural network (CNN), our method aims to maximize lead time for fall prediction while minimizing false positive rates. The proposed algorithm uniquely integrates the detection of various fault types and estimates the lead time for potential falls. Our contributions include the development of an algorithm capable of detecting abrupt, incipient, and intermittent faults in full-sized robots, its implementation using both simulation and hardware data for a humanoid robot, and a method for estimating lead time. Evaluation metrics, including false positive rate, lead time, and response time, demonstrate the efficacy of our approach. In particular, our model achieves long lead times and fast response times across different fault scenarios with a false positive rate of zero. The findings of this study hold significant implications for enhancing the safety and reliability of bipedal robotic systems.
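A minimal sketch of the kind of 1D convolutional classifier described above, operating on a sliding window of sensor channels; the layer sizes, channel count, and window length are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FallPredictor(nn.Module):
    """1D CNN over a window of proprioceptive channels, emitting one logit:
    fall imminent vs. nominal (illustrative architecture only)."""
    def __init__(self, n_channels=12, window=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, 1))

    def forward(self, x):                 # x: (batch, channels, window)
        return self.net(x)

model = FallPredictor()
logits = model(torch.randn(8, 12, 100))   # 8 windows of sensor history
```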
|
|
10:30-12:00, Paper ThAT16-AX.8 | Add to My Program |
Shape-Changing Robotic Mannequin Shoulder with Bio-Inspired Layered Structure |
|
Long, Juncai | Zhejiang University |
Li, Jituo | Zhejiang University |
Lu, Yiwen | Zhejiang University |
Zhou, Chengdi | Zhejiang University |
Lu, GuoDong | Zhejiang University |
Feng, Yixiong | Zhejiang University |
Keywords: Humanoid Robot Systems, Modeling, Control, and Learning for Soft Robots
Abstract: A shape-changing robotic mannequin is a humanoid robot for imitating the shapes of human bodies. The diversity of human bodies makes it difficult to imitate various body shapes, especially the shoulders. This paper proposes a rigid-flexible-soft coupled three-layered robotic mannequin shoulder inspired by human body anatomy. The robotic mannequin can adjust the anisotropic deformation of its human-like skin to imitate body dimensions, shape details and surface curvatures of target bodies. Structurally, the inner skeleton layer is composed of a rigid framework and linear actuators for changing the global body dimensions. The middle muscle layer consists of flexible patches and layer-jamming bars with tunable stiffness for controlling the surface curvatures. The outer soft skin layer envelops the patches, forming a human-like surface of the robotic mannequin. To imitate a human body, the linear actuators drive the patches forward, which deforms the elastic skin layer. The tensioned skin layer inversely drives the bending deformation of the patches, which can be controlled by the layer-jamming bars. We design the three-layered structure by analyzing the shape differences of hundreds of scanned human models. An energy-based method is proposed to predict and control the coupled deformation of the layered structure. A physical robotic shoulder prototype has been built to verify the effectiveness of our method.
|
|
10:30-12:00, Paper ThAT16-AX.9 | Add to My Program |
UKF-Based Sensor Fusion for Joint-Torque Sensorless Humanoid Robots |
|
Sorrentino, Ines | Istituto Italiano Di Tecnologia |
Romualdi, Giulio | Istituto Italiano Di Tecnologia |
Pucci, Daniele | Italian Institute of Technology |
Keywords: Humanoid Robot Systems, Physical Human-Robot Interaction, Sensor Fusion
Abstract: This paper proposes a novel sensor fusion based on Unscented Kalman Filtering for the online estimation of joint torques of humanoid robots without joint-torque sensors. At the feature level, the proposed approach considers multi-modal measurements (e.g. currents, accelerations, etc.) and non-directly measurable effects, such as external contacts, thus leading to joint torques readily usable in control architectures for human-robot interaction. The proposed sensor fusion can also integrate distributed, non-collocated force/torque sensors, thus being a flexible framework with respect to the underlying robot sensor suite. To validate the approach, we show how the proposed sensor fusion can be integrated into a two-level torque control architecture aiming at task-space torque control. The performance of the proposed approach is shown through extensive tests on the new humanoid robot ergoCub, currently being developed at Istituto Italiano di Tecnologia. We also compare our strategy with the existing state-of-the-art approach based on the recursive Newton-Euler algorithm. Results demonstrate that our method achieves low root mean square errors in torque tracking, ranging from 0.05 Nm to 2.5 Nm, even in the presence of external contacts.
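At the heart of any Unscented Kalman Filter is the unscented transform, which propagates a mean and covariance through a nonlinearity via sigma points. A minimal numpy sketch with classic Julier weights follows; a full filter wraps prediction and measurement-update steps around this, and nothing here reproduces the paper's specific process or measurement models.

```python
import numpy as np

def unscented_transform(mu, P, f, kappa=0.0):
    """Propagate mean mu and covariance P through nonlinearity f via 2n+1
    sigma points (Julier weights with parameter kappa)."""
    n = len(mu)
    S = np.linalg.cholesky((n + kappa) * P)            # columns are the offsets
    sigmas = np.vstack([mu, mu + S.T, mu - S.T])       # (2n+1, n) sigma points
    w = np.full(2 * n + 1, 0.5 / (n + kappa))
    w[0] = kappa / (n + kappa)
    Y = np.array([f(s) for s in sigmas])               # transformed points
    mu_y = w @ Y
    Py = (w[:, None] * (Y - mu_y)).T @ (Y - mu_y)
    return mu_y, Py
```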
|
|
ThAT17-AX Oral Session, AX-205 |
Add to My Program |
Reactive and Sensor-Based Planning |
|
|
Chair: Akai, Naoki | Nagoya University |
Co-Chair: Pantic, Michael | ETH Zürich |
|
10:30-12:00, Paper ThAT17-AX.1 | Add to My Program |
Waverider: Leveraging Hierarchical, Multi-Resolution Maps for Efficient and Reactive Obstacle Avoidance |
|
Reijgwart, Victor | ETH Zurich |
Pantic, Michael | ETH Zürich |
Siegwart, Roland | ETH Zurich |
Ott, Lionel | ETH Zurich |
Keywords: Reactive and Sensor-Based Planning, Collision Avoidance, Aerial Systems: Applications
Abstract: Fast and reliable obstacle avoidance is an important task for mobile robots. In this work, we propose an efficient reactive system that provides high-quality obstacle avoidance while running at hundreds of hertz with minimal resource usage. Our approach combines wavemap, a hierarchical volumetric map representation, with a novel hierarchical and parallelizable obstacle avoidance algorithm formulated through Riemannian Motion Policies (RMP). Leveraging multi-resolution obstacle avoidance policies, the proposed navigation system facilitates precise, low-latency (36ms), and extremely efficient obstacle avoidance with a very large perceptive radius (30m). We perform extensive statistical evaluations on indoor and outdoor maps, verifying that the proposed system compares favorably to fixed-resolution RMP variants and CHOMP. Finally, the RMP formulation allows the seamless fusion of obstacle avoidance with additional objectives, such as goal-seeking, to obtain a fully-fledged navigation system that is versatile and robust. We deploy the system on a Micro Aerial Vehicle and show how it navigates through an indoor obstacle course. Our complete implementation, called waverider, is made available as open source.
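The RMP machinery referenced above resolves multiple policies by metric-weighted averaging: each policy contributes an acceleration f_i with an importance metric A_i, and the combined acceleration is (sum A_i)^+ (sum A_i f_i). A minimal sketch of that resolve step (generic RMP algebra, not the wavemap-specific hierarchy):

```python
import numpy as np

def combine_rmps(policies):
    """Metric-weighted combination of Riemannian Motion Policies.
    policies: list of (f_i, A_i) pairs, f_i an acceleration vector and
    A_i its (positive semi-definite) importance metric."""
    A_sum = sum(A for _, A in policies)
    fA_sum = sum(A @ f for f, A in policies)
    return np.linalg.pinv(A_sum) @ fA_sum
```

This is what lets obstacle-avoidance and goal-seeking policies be fused seamlessly: each term simply contributes another (f, A) pair.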
|
|
10:30-12:00, Paper ThAT17-AX.2 | Add to My Program |
Whisker-Based Tactile Navigation Algorithm for Underground Robots |
|
Kossas, Tanel | Tallinn University of Technology |
Remmas, Walid | Tallinn University of Technology / Université De Montpellier |
Gkliva, Roza | Tallinn University of Technology |
Ristolainen, Asko | Tallinn University of Technology |
Kruusmaa, Maarja | Tallinn University of Technology (TalTech) |
Keywords: Reactive and Sensor-Based Planning, Autonomous Vehicle Navigation, Biologically-Inspired Robots
Abstract: This work explores the use of artificial whiskers as tactile sensors for enhancing the perception and navigation capabilities of mobile robots in challenging settings such as caves and underground mines. These environments exhibit inconsistent lighting conditions, locally self-similar textures, and general poor visibility conditions, that can cause the performance of state-of-the-art vision-based methods to decline. In order to evaluate the efficacy of tactile sensing in this context, three algorithms were developed and tested with simulated and physical experiments: a wall-follower, a navigation algorithm based on Theta*, and a hybrid approach that combines the two. The obtained results highlight the efficacy of tactile sensing for wall-following in intricate environments. When paired with an external method for pose estimation, it further aids in navigating unknown environments. Moreover, by integrating navigation with wall-following, the third, hybrid algorithm enhanced the map traversal speed by roughly 26–43% compared to standard navigation methods without wall-following.
|
|
10:30-12:00, Paper ThAT17-AX.3 | Add to My Program |
Spline-Interpolated Model Predictive Path Integral Control with Stein Variational Inference for Reactive Navigation |
|
Miura, Takato | Nagoya University |
Akai, Naoki | Nagoya University |
Honda, Kohei | Nagoya University |
Hara, Susumu | Nagoya University |
Keywords: Reactive and Sensor-Based Planning, Motion and Path Planning, Autonomous Vehicle Navigation
Abstract: This paper presents a reactive navigation method based on model predictive path integral (MPPI) control with spline interpolation of the control input sequence and Stein variational gradient descent (SVGD). MPPI formulates a non-linear optimization problem that determines an optimal control input sequence and solves it using a sampling-based method. Results obtained with MPPI depend significantly on the sampling noise. To quickly find paths that avoid large and/or newly detected obstacles, the sampling noise should be set large. However, large noise yields non-smooth control input sequences, resulting in non-smooth paths. To prevent this problem, we first introduce spline interpolation of the control input sequence into the MPPI process. Owing to the interpolation, smooth control input sequences can be obtained even when large sampling noise is used. However, a vanilla MPPI algorithm still does not work in cases where there are optimal and near-optimal solutions in one scene, e.g., several paths that avoid obstacles, because MPPI assumes that the distribution over the optimal control input sequence can be approximated by a Gaussian distribution. To overcome this problem, we further apply SVGD to MPPI with spline interpolation. SVGD is based on optimal transport and has the property of concentrating samples around an optimal sample. As a result, we achieve robust reactive navigation that quickly finds a path to avoid obstacles while keeping the control input sequence smooth. We validate our proposals in a quadrotor simulator. Results show that our proposal, involving both spline interpolation and SVGD, outperforms other baseline methods.
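A compact sketch of the spline-interpolated MPPI update described above: noise is injected at a few spline knots, dense controls come from cubic-spline interpolation, and the knots are updated with the usual exponentially weighted average. The SVGD component and the dynamics rollout are omitted; rollout_cost is a hypothetical stand-in for simulating the system and scoring the trajectory.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def mppi_spline_step(u_knots, rollout_cost, n_samples=256, sigma=0.5,
                     lam=1.0, horizon=50):
    """One MPPI update for a scalar control parameterized by spline knots.
    Large sigma explores aggressively, yet the interpolated dense control
    sequence stays smooth."""
    n_knots = len(u_knots)
    t_knots = np.linspace(0.0, 1.0, n_knots)
    t_dense = np.linspace(0.0, 1.0, horizon)
    eps = sigma * np.random.randn(n_samples, n_knots)   # knot-level noise
    costs = np.empty(n_samples)
    for i in range(n_samples):
        u_dense = CubicSpline(t_knots, u_knots + eps[i])(t_dense)
        costs[i] = rollout_cost(u_dense)                # simulate and score
    w = np.exp(-(costs - costs.min()) / lam)            # path-integral weights
    w /= w.sum()
    return u_knots + w @ eps                            # weighted knot update
```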
|
|
10:30-12:00, Paper ThAT17-AX.4 | Add to My Program |
RETOM: Leveraging Maneuverability for Reactive Tool Manipulation Using Wrench-Fields |
|
Eberle, Felix | Technical University of Munich |
Laha, Riddhiman | Technical University of Munich |
Yao, Haowen | Technical Univerity of Munich |
Naceri, Abdeldjallil | Technical University of Munich |
Figueredo, Luis | Technical University of Munich (TUM) |
Haddadin, Sami | Technical University of Munich |
Keywords: Reactive and Sensor-Based Planning, Manipulation Planning, Motion and Path Planning
Abstract: This paper investigates the problem of effective tool manipulation for motion planning in complex human-like scenarios. Vector-field-based real-time strategies, although widely used, usually do not account for unwieldy tools or incorporate systematic methods to handle the extra maneuvers needed. Instead, we formalize the problem and propose a novel field-based reactive planner that explicitly accounts for rotational forces for seamless maneuvers based on the tool’s geometry and featured points. Furthermore, we capture and encode robot performance through capability metrics and improve them using an additional quality distribution method. This enables seamless integration of the robot’s embodiment with the reactive force-torque (wrench) field, giving rise to flexible tool usage in non-stationary environments. Extensive simulation analysis on a 7 DoF collaborative robot manipulating a common tool in an unorganized table-top layout reinforces our claim of robustness in stationary and non-stationary scenarios.
|
|
10:30-12:00, Paper ThAT17-AX.5 | Add to My Program |
Optimal Prescribed-Time Control Based Reactive Planning System for Quadruped Robot Navigation |
|
Xu, Shaohang | Huazhong University of Science and Technology |
Zhang, Wentao | Huazhong University of Science and Technology |
Ho, Chin Pang | City University of Hong Kong |
Zhu, Lijun | Huazhong University of Science and Technology |
Keywords: Reactive and Sensor-Based Planning, Motion and Path Planning, Collision Avoidance
Abstract: In this paper, we propose a reactive planning system for quadruped robots based on prescribed-time control. The navigation of the quadruped robot is fundamentally depicted as omnidirectional movements, while a feedback control law is formulated to address any deviations the robot may encounter. In particular, our proposed feedback control system is theoretically proven to achieve convergence within a predefined finite time that is specified by the user. To further compute the optimal convergent time and the local goal state, we present a high-level planning node encompassing terrain-aware kinodynamic search and spatiotemporal trajectory optimization, which can generate collision-free, smooth, and efficient trajectories. The effectiveness of our proposed framework is validated through both numerical simulation and real-robot experiments in indoor and outdoor environments, including scenarios with cluttered obstacles, slopes, and external disturbances.
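Prescribed-time control typically relies on a time-varying gain that grows unbounded as the user-specified deadline T approaches. The single-integrator toy example below (not the paper's control law) shows an error e being driven to zero exactly within the prescribed time.

```python
import numpy as np

def prescribed_time_gain(t, T, k=2.0):
    """Gain that blows up as t -> T; with e' = -gain * e the closed-form
    solution is e(t) = e0 * ((T - t)/T)**k, so e(T) = 0 for any e0."""
    return k / max(T - t, 1e-6)

e, dt, T = 1.0, 0.001, 2.0
for i in range(int(T / dt) - 1):
    t = i * dt
    e += -prescribed_time_gain(t, T) * e * dt   # forward-Euler integration
print(abs(e))   # ~0 just before t = T, independent of the initial error
```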
|
|
10:30-12:00, Paper ThAT17-AX.6 | Add to My Program |
On the Fly Robotic-Assisted Medical Instrument Planning and Execution Using Mixed Reality |
|
Ai, Letian | Johns Hopkins University |
Liu, Yihao | Johns Hopkins University |
Armand, Mehran | Johns Hopkins University |
Kheradmand, Amir | Johns Hopkins University |
Martin-Gomez, Alejandro | Johns Hopkins University |
Keywords: Virtual Reality and Interfaces, Medical Robots and Systems, Surgical Robotics: Planning
Abstract: Robotic-assisted medical systems (RAMS) have gained significant attention for their advantages in alleviating surgeons' fatigue and improving patients' outcomes. These systems comprise a range of human-computer interactions, including medical scene monitoring, anatomical target planning, and robot manipulation. However, despite its versatility and effectiveness, RAMS demands expertise in robotics, leading to a high learning cost for the operator. In this work, we introduce a novel framework using mixed reality technologies to ease the use of RAMS. The proposed framework achieves real-time planning and execution of medical instruments by providing 3D anatomical image overlay, human-robot collision detection, and robot programming interface. These features, integrated with an easy-to-use calibration method for head-mounted display, improve the effectiveness of human-robot interactions. To assess the feasibility of the framework, two medical applications are presented in this work: 1) coil placement during transcranial magnetic stimulation and 2) drill and injector device positioning during femoroplasty. Results from these use cases demonstrate its potential to extend to a wider range of medical scenarios.
|
|
10:30-12:00, Paper ThAT17-AX.7 | Add to My Program |
K-VIL: Keypoints-Based Visual Imitation Learning |
|
Gao, Jianfeng | Karlsruhe Institute of Technology (KIT) |
Tao, Zhi | Karlsruhe Institute of Technology |
Jaquier, Noémie | Karlsruhe Institute of Technology |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Learning from Demonstration, Visual Learning, Manipulation Planning, Learning of Geometric Constraints
Abstract: Visual imitation learning provides efficient and intuitive solutions for robotic systems to acquire novel manipulation skills. However, simultaneously learning geometric task constraints and control policies from visual inputs alone remains a challenging problem. In this paper, we propose the keypoint-based visual imitation learning (K-VIL) approach that automatically extracts sparse, object-centric, and embodiment-independent task representations from a small number of human demonstration videos. The task representation is composed of keypoint-based geometric constraints on principal manifolds, their associated local frames, and the movement primitives that are then needed for the task execution. Our approach is capable of extracting such task representations from a single demonstration video, and of incrementally updating them when new demonstrations are available. To reproduce manipulation skills using the learned set of prioritized geometric constraints in novel scenes, we introduce a novel keypoint-based admittance controller. We evaluate our approach in several real-world applications, showcasing its ability to deal with cluttered scenes, viewpoint mismatch, new instances of categorical objects, and large object pose and shape variations.
|
|
10:30-12:00, Paper ThAT17-AX.8 | Add to My Program |
Circular Field Motion Planning for Highly-Dynamic Multi-Robot Systems with Application to Robot Soccer |
|
Zeug, Fabrice | Gottfried Wilhelm Leibniz Universität Hannover |
Becker, Marvin | Gottfried Wilhelm Leibniz Universität Hannover |
Müller, Matthias A. | Gottfried Wilhelm Leibniz Universität Hannover |
Keywords: Reactive and Sensor-Based Planning, Collision Avoidance, Path Planning for Multiple Mobile Robots or Agents
Abstract: The rise of autonomous driving in everyday life makes efficient and collision-free motion planning more important than ever. However, multi-robot applications in highly dynamic environments still pose hard challenges for state-of-the-art motion planners. In this paper, we present a new iteration of a reactive circular fields motion planner, focusing on the simultaneous control of multiple robots in robotic soccer games, which is able to operate omnidirectional robots safely and efficiently despite high measurement delays and inaccuracies. Our extension enables the definition and effective execution of complex tasks in soccer-specific problems. We extensively evaluated our planner in several complex simulation environments and experimentally verified the approach in realistic scenarios on real soccer robots. Furthermore, we demonstrated the capabilities of our motion planner during successful participation in RoboCup 2022 and 2023.
|
|
10:30-12:00, Paper ThAT17-AX.9 | Add to My Program |
Autonomous Mapless Navigation on Uneven Terrains |
|
Jardali, Hassan | Indiana University |
Ali, Mahmoud | Indiana University |
Liu, Lantao | Indiana University |
Keywords: Reactive and Sensor-Based Planning, Autonomous Vehicle Navigation
Abstract: We propose a new method for autonomous navigation on uneven terrains by utilizing a sparse Gaussian Process (SGP) based local perception model. The SGP local perception model is trained on local ranging observations (point clouds) to learn the terrain elevation profile and extract feasible navigation subgoals around the robot. Subsequently, a cost function, which prioritizes the safety of the robot by keeping the robot's roll and pitch angles bounded within a specified range, is used to select a safety-aware subgoal that leads the robot to its final destination. The algorithm is designed to run in real-time and is intensively evaluated in simulation and real-world experiments. The results compellingly demonstrate that our proposed algorithm consistently navigates uneven terrains with high efficiency and surpasses the performance of other planners. The implementation of our method, including the supplementary video showing the experimental and real-world results, is available at https://rb.gy/3ov2r8.
|
|
ThAT18-AX Oral Session, AX-206 |
Add to My Program |
Optimization and Optimal Control III |
|
|
Chair: Del Prete, Andrea | University of Trento |
Co-Chair: Calinon, Sylvain | Idiap Research Institute |
|
10:30-12:00, Paper ThAT18-AX.1 | Add to My Program |
Optimal Control of Granular Material |
|
Aoyama, Yuichiro | Georgia Institute of Technology |
Haeri, Amin | Concordia University |
Theodorou, Evangelos | Georgia Institute of Technology |
Keywords: Optimization and Optimal Control, Machine Learning for Robot Control, Industrial Robots
Abstract: The control of granular materials, which are found in many industrial applications, is a challenging open research problem. Granular material systems exhibit complex behavior (solid-, fluid-, and gas-like regimes) and are high-dimensional (many grains/particles, each with at least 3 DOF in 3D). Recently, a machine learning-based Graph Neural Network (GNN) simulator has been proposed to learn the underlying dynamics. In this paper, we perform optimal control of a rigid body-driven granular material system whose dynamics is learned by a GNN model trained on reduced data generated via a physics-based simulator and Principal Component Analysis (PCA). We use Differential Dynamic Programming (DDP) to obtain optimal control commands that can form granular particles into a target shape. The model and results are shown to be relatively fast and accurate. The control commands are also applied to the ground truth model, i.e., the physics-based simulator, to further validate the approach.
|
|
10:30-12:00, Paper ThAT18-AX.2 | Add to My Program |
CACTO: Continuous Actor-Critic with Trajectory Optimization---Towards Global Optimality |
|
Grandesso, Gianluigi | University of Trento |
Alboni, Elisa | University of Trento |
Rosati Papini, Gastone Pietro | University of Trento |
Wensing, Patrick M. | University of Notre Dame |
Del Prete, Andrea | University of Trento |
Keywords: Optimization and Optimal Control, Machine Learning for Robot Control, Reinforcement Learning
Abstract: This paper presents a novel algorithm for the continuous control of dynamical systems that combines Trajectory Optimization (TO) and Reinforcement Learning (RL) in a single framework. The motivations behind this algorithm are the two main limitations of TO and RL when applied to continuous nonlinear systems to minimize a non-convex cost function. Specifically, TO can get stuck in poor local minima when the search is not initialized close to a “good” minimum. On the other hand, when dealing with continuous state and control spaces, the RL training process may be excessively long and strongly dependent on the exploration strategy. Thus, our algorithm learns a “good” control policy via TO-guided RL policy search that, when used as an initial-guess provider for TO, makes the trajectory optimization process less prone to converging to poor local optima. Our method is validated on several reaching problems featuring non-convex obstacle avoidance with different dynamical systems, including a car model with a 6D state and a 3-joint planar manipulator. Our results show the great capability of CACTO to escape local minima while being more computationally efficient than the Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO) RL algorithms.
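As a rough numerical illustration of this interplay, here is a toy sketch: a gradient-based trajectory optimizer stands in for TO, a scalar linear policy stands in for the actor network, TO solutions pull the policy toward optimal actions, and the policy in turn warm-starts the next TO problem. Every model and name here is a hypothetical simplification of CACTO, not the authors' algorithm.

```python
import numpy as np

# Toy sketch of the TO/RL loop: TO solves single-integrator reaching
# problems, its solutions nudge the policy gain k (u = -k x), and the policy
# provides the initial guess for the next TO problem.

rng = np.random.default_rng(0)
T, dt = 20, 0.1

def rollout_cost(x0, U):                 # reach the origin at low control effort
    x, c = x0, 0.0
    for u in U:
        c += (x**2 + 0.1 * u**2) * dt
        x = x + u * dt                   # single-integrator dynamics
    return c + 10.0 * x**2               # terminal cost

def trajectory_optimization(x0, U, lr=0.2, iters=100):
    U = U.copy()                         # finite-difference gradient descent
    for _ in range(iters):
        g = np.zeros_like(U)
        for i in range(T):
            dU = np.zeros_like(U); dU[i] = 1e-4
            g[i] = (rollout_cost(x0, U + dU) - rollout_cost(x0, U - dU)) / 2e-4
        U -= lr * g
    return U, rollout_cost(x0, U)

k = 0.0                                  # policy gain, u = -k x
for episode in range(10):
    x0 = rng.choice([-1.0, 1.0]) * rng.uniform(0.5, 2.0)
    U_warm = np.array([-k * x0] * T)     # policy warm-starts TO
    U_star, J = trajectory_optimization(x0, U_warm)
    k = 0.9 * k + 0.1 * (-U_star[0] / x0)  # nudge the policy toward TO's action
    print(f"episode {episode}: x0={x0:+.2f} cost={J:.3f} gain k={k:.3f}")
```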
|
|
10:30-12:00, Paper ThAT18-AX.3 | Add to My Program |
Learning Model Predictive Control with Error Dynamics Regression for Autonomous Racing |
|
Xue, Haoru | Carnegie Mellon University |
Zhu, Edward | University of California, Berkeley |
Dolan, John M. | Carnegie Mellon University |
Borrelli, Francesco | University of California, Berkeley |
Keywords: Optimization and Optimal Control, Model Learning for Control, Autonomous Vehicle Navigation
Abstract: This work presents a novel Learning Model Predictive Control (LMPC) strategy for autonomous racing at the handling limit that can iteratively explore and learn unknown dynamics in high-speed operational domains. We start from existing LMPC formulations and modify the system dynamics learning method. In particular, our approach uses a nominal, global, nonlinear, physics-based model with a local, linear, data-driven learning of the error dynamics. We conducted experiments in simulation and on 1/10th-scale hardware, and deployed the proposed LMPC on a full-scale autonomous race car used in the Indy Autonomous Challenge (IAC), with closed-loop experiments at the Putnam Park Road Course in Indiana, USA. The results show that the proposed control policy exhibits improved robustness to parameter tuning and data scarcity. Incremental and safety-aware exploration toward the limit of handling and iterative learning of the vehicle dynamics in high-speed domains are observed both in simulations and experiments.
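The error-dynamics regression idea can be sketched compactly: predictions of a nominal physics model are corrected by a linear model of the prediction error, fit by least squares on collected data. The toy "true" and nominal models below are illustrative stand-ins (the paper fits locally, weighting data near the current operating point; the sketch fits globally for brevity).

```python
import numpy as np

# Sketch: learn e = x_next - f_nominal(x, u) as a linear function of (x, u)
# and use it to correct the nominal model. Dynamics below are placeholders.

rng = np.random.default_rng(1)
n, m, N = 3, 2, 200
A_true = np.array([[1.0, 0.1, 0.0], [0.0, 0.95, 0.1], [0.0, 0.0, 0.9]])
B_true = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])

def f_nominal(x, u):                    # simplified physics-based model
    return x + 0.1 * np.array([x[1], u[0], u[1]])

X, U = rng.standard_normal((N, n)), rng.standard_normal((N, m))
E, Phi = [], []
for x, u in zip(X, U):
    x_next = A_true @ x + B_true @ u    # "measured" next state
    E.append(x_next - f_nominal(x, u))  # nominal one-step prediction error
    Phi.append(np.concatenate([x, u, [1.0]]))
E, Phi = np.array(E), np.array(Phi)

Theta, *_ = np.linalg.lstsq(Phi, E, rcond=None)   # linear error model

def f_learned(x, u):                    # nominal model + learned correction
    return f_nominal(x, u) + np.concatenate([x, u, [1.0]]) @ Theta

x, u = rng.standard_normal(n), rng.standard_normal(m)
x_next = A_true @ x + B_true @ u
print("nominal error  :", np.linalg.norm(x_next - f_nominal(x, u)))
print("corrected error:", np.linalg.norm(x_next - f_learned(x, u)))
```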
|
|
10:30-12:00, Paper ThAT18-AX.4 | Add to My Program |
Robust Balancing Control of Biped Robots for External Forces |
|
Park, Hae Yeon | POSTECH |
Kim, Jung Hoon | Pohang University of Science and Technology |
Keywords: Optimization and Optimal Control, Robust/Adaptive Control, Humanoid and Bipedal Locomotion
Abstract: This paper develops a controller synthesis method for keeping the admissible bound on external forces applied to biped robots at a desired level. We first introduce the authors' preceding results on a norm-based stability criterion for biped walking constructed on the linear inverted pendulum model (LIPM). More precisely, an induced norm can be taken to formulate the fact that balance for a biped robot is achieved if its zero moment point (ZMP) always stays in the supporting region at each step. Based on this norm-based criterion, we aim to make the maximum energy of external forces admissible for balancing the biped robot equal a pre-given desired bound gamma (> 0). To achieve this objective, a robust controller is designed through a linear matrix inequality (LMI)-based approach. More importantly, a necessary and sufficient condition for the existence of a robust controller attaining the desired bound is characterized by some LMI conditions. The effectiveness of the overall arguments is validated through comparative simulation results of a biped walking robot subject to external forces.
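For a flavor of the LMI machinery involved, the sketch below solves a standard state-feedback stabilization LMI with CVXPY. The toy plant and the specific inequality are illustrative textbook choices, not the paper's force-bound synthesis condition.

```python
import numpy as np
import cvxpy as cp

# Illustrative LMI-based synthesis sketch: with the change of variables
# Y = K P, feasibility of  A P + P A' + B Y + Y' B' < 0,  P > 0  yields a
# stabilizing gain K = Y P^{-1}. Plant matrices are hypothetical.

A = np.array([[0.0, 1.0], [2.0, -0.5]])     # toy open-loop unstable plant
B = np.array([[0.0], [1.0]])
n, m = A.shape[0], B.shape[1]

P = cp.Variable((n, n), symmetric=True)
Y = cp.Variable((m, n))
lmi = A @ P + P @ A.T + B @ Y + Y.T @ B.T
lmi = (lmi + lmi.T) / 2                      # symmetrize for the PSD constraint
eps = 1e-3
prob = cp.Problem(cp.Minimize(0),
                  [P >> eps * np.eye(n), lmi << -eps * np.eye(n)])
prob.solve()

K = Y.value @ np.linalg.inv(P.value)         # stabilizing feedback u = K x
print("closed-loop eigenvalues:", np.linalg.eigvals(A + B @ K))
```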
|
|
10:30-12:00, Paper ThAT18-AX.5 | Add to My Program |
Robust Policy Iteration of Uncertain Interconnected Systems with Imperfect Data (I) |
|
Qasem, Omar | American International University |
Gao, Weinan | Florida Institute of Technology |
Keywords: Optimization and Optimal Control, Robust/Adaptive Control, Reinforcement Learning
Abstract: This paper investigates the robust optimal control problem for a class of continuous-time, partially linear, interconnected systems. In addition to the dynamic uncertainties resulting from the interconnected dynamic system, unknown bounded disturbances and computational errors are taken into account throughout the learning process, wherein the system’s dynamics are also assumed unknown. These challenges cause the collected online data to be imperfect. In this scenario, traditional data-driven control techniques, such as adaptive dynamic programming (ADP) and robust ADP, encounter difficulty in learning the optimal control policy precisely due to imperfect data. In this paper, a novel data-driven robust policy iteration method is proposed to solve the robust optimal control problem. Without relying on knowledge of the system’s dynamics, the external disturbances or the complete state, the implementation of the proposed method only needs access to the input and partial state information. Based on the small-gain theorem and the notions of strong unboundedness observability and input-to-output stability, it is guaranteed that the learned robust optimal control gain is stabilizing and that the solution of the closed-loop system is uniformly ultimately bounded despite the existence of dynamic uncertainties and unknown external disturbances. The simulation results reveal the efficiency and practicality of the proposed data-driven control method.
|
|
10:30-12:00, Paper ThAT18-AX.6 | Add to My Program |
Robust Co-Design of Canonical Underactuated Systems for Increased Certifiable Stability |
|
Girlanda, Federico | University of Padua |
Kumar, Shivesh | DFKI GmbH |
Shala, Lasse | Deutsches Forschungszentrum Für Künstliche Intelligenz |
Kirchner, Frank | University of Bremen |
Keywords: Optimization and Optimal Control, Robust/Adaptive Control, Underactuated Robots
Abstract: Optimal behaviours of a system to perform a specific task can be achieved by leveraging the coupling between trajectory optimization, stabilization and design optimization. This approach proves particularly advantageous for underactuated systems, which have fewer actuators than degrees of freedom and thus require more elaborate control systems. This paper proposes a novel co-design algorithm, namely Robust Trajectory Control with Design optimization (RTC-D). An inner optimization layer (RTC) simultaneously performs direct transcription (DIRTRAN) to find a nominal trajectory while computing optimal hyperparameters for a stabilizing time-varying linear quadratic regulator (TVLQR). RTC-D augments RTC with a design optimization layer, maximizing the system’s robustness through a time-varying Lyapunov-based region of attraction (ROA) analysis. This analysis provides a formal guarantee of stability for a set of off-nominal states. The proposed algorithm has been tested on two different underactuated systems: the torque-limited simple pendulum and the cart-pole. Extensive simulations of off-nominal initial conditions demonstrate improved robustness, while real-system experiments show increased insensitivity to torque disturbances.
|
|
10:30-12:00, Paper ThAT18-AX.7 | Add to My Program |
Whole-Body Ergodic Exploration with a Manipulator Using Diffusion |
|
Bilaloglu, Cem | Idiap Research Institute, École Polytechnique Fédérale De Lausanne |
Löw, Tobias | Idiap Research Institute, EPFL |
Calinon, Sylvain | Idiap Research Institute |
Keywords: Optimization and Optimal Control, Sensorimotor Learning, Whole-Body Motion Planning and Control
Abstract: This paper presents a whole-body robot control method for exploring and probing a given region of interest. The ergodic control formalism behind such an exploration behavior consists of matching the time-averaged statistics of a robot trajectory with the spatial statistics of the target distribution. Most existing ergodic control approaches model the robots/sensors as individual point agents moving in space. We introduce an approach that decomposes the whole body of a robotic manipulator into multiple kinematically constrained agents. Then, we generate control actions by calculating a consensus among the agents. To do so, we use an ergodic control formulation called heat equation-driven area coverage (HEDAC) and slow the diffusion using the non-stationary heat equation. Our approach extends HEDAC to applications where robots have multiple sensors on the whole body (such as tactile skin) and use all sensors to optimally explore the given region. We show that our approach increases exploration performance in terms of ergodicity and scales well to real-world problems. We compare our method with the state-of-the-art in kinematic simulations and demonstrate its applicability in an online exploration task with a 7-axis Franka Emika robot.
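A minimal grid-world sketch of the HEDAC idea follows: a coverage deficit acts as a heat source, the non-stationary heat equation diffuses it into a potential field, and several point agents (standing in for kinematically constrained body-point agents) ascend the field's gradient. Grid resolution, gains, and the Gaussian target are arbitrary illustrative choices, not the paper's setup.

```python
import numpy as np

# HEDAC-style coverage sketch: diffuse the un-covered part of a target
# density and let point agents follow the resulting potential's gradient.

n, dt, alpha = 64, 0.1, 1.0
xs = np.linspace(0, 1, n)
XX, YY = np.meshgrid(xs, xs, indexing="ij")
target = np.exp(-((XX - 0.7) ** 2 + (YY - 0.3) ** 2) / 0.02)
target /= target.sum()                                    # target distribution

coverage = np.zeros((n, n))
agents = np.array([[0.1, 0.1], [0.2, 0.8], [0.8, 0.9]])   # multiple body points

def laplacian(f):                                         # periodic 5-point stencil
    return (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
            np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4 * f)

u = np.zeros((n, n))
for step in range(1, 501):
    source = np.maximum(step * dt * target - coverage, 0.0)  # coverage deficit
    u += dt * (alpha * laplacian(u) + source)                # heat equation step
    for a in agents:
        i, j = np.clip((a * (n - 1)).astype(int), 1, n - 2)
        grad = np.array([u[i + 1, j] - u[i - 1, j], u[i, j + 1] - u[i, j - 1]])
        a += 0.02 * grad / (np.linalg.norm(grad) + 1e-9)     # gradient ascent
        np.clip(a, 0.0, 1.0, out=a)
        coverage[i, j] += dt                                  # record visitation

print("remaining deficit:", np.maximum(500 * dt * target - coverage, 0.0).sum())
```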
|
|
10:30-12:00, Paper ThAT18-AX.8 | Add to My Program |
ReLU-QP: A GPU-Accelerated Quadratic Programming Solver for Model-Predictive Control |
|
Bishop, Arun | Carnegie Mellon University |
Zhang, John | Carnegie Mellon University |
Gurumurthy, Swaminathan | Carnegie Mellon University |
Tracy, Kevin | Carnegie Mellon University |
Manchester, Zachary | Carnegie Mellon University |
Keywords: Optimization and Optimal Control, Whole-Body Motion Planning and Control
Abstract: We present ReLU-QP, a GPU-accelerated solver for quadratic programs (QPs) that is capable of solving high-dimensional control problems at real-time rates. ReLU-QP is derived by exactly reformulating the Alternating Direction Method of Multipliers (ADMM) algorithm for solving QPs as a deep, weight-tied neural network with rectified linear unit (ReLU) activations. This reformulation enables the deployment of ReLU-QP on GPUs using standard machine-learning toolboxes. We evaluate the performance of ReLU-QP across three model-predictive control (MPC) benchmarks: stabilizing random linear dynamical systems with control limits, balancing an Atlas humanoid robot on a single foot, and performing a whole-body pick-up motion on a quadruped equipped with a six-degree-of-freedom arm. These benchmarks indicate that ReLU-QP is competitive with state-of-the-art CPU-based solvers for small-to-medium-scale problems and offers order-of-magnitude speed improvements for larger-scale problems.
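The reformulation can be illustrated on a box-constrained QP: one ADMM iteration is an affine map followed by a clamp, and the clamp itself decomposes into two ReLUs, so the iteration is a weight-tied ReLU layer. This NumPy sketch with random placeholder data illustrates that observation only; it is not the ReLU-QP solver.

```python
import numpy as np

# ADMM for  min 0.5 x'Hx + g'x  s.t.  l <= x <= u,  written as an affine map
# plus a clamp, where clip(v, l, u) = l + relu(v - l) - relu(v - u).

rng = np.random.default_rng(2)
n, rho = 10, 1.0
M = rng.standard_normal((n, n))
H = M @ M.T + np.eye(n)                   # positive definite Hessian
g = rng.standard_normal(n)
l, u = -np.ones(n), np.ones(n)

K = np.linalg.inv(H + rho * np.eye(n))    # fixed "weights", factored once

x = z = lam = np.zeros(n)
for _ in range(200):
    x = K @ (rho * (z - lam) - g)         # affine "layer"
    v = x + lam
    z = l + np.maximum(v - l, 0.0) - np.maximum(v - u, 0.0)  # clamp via ReLUs
    lam = lam + x - z                     # scaled dual update

print("within bounds:", bool(np.all(z >= l - 1e-6) and np.all(z <= u + 1e-6)))
print("stationarity residual:", np.linalg.norm(H @ z + g + rho * lam))
```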
|
|
ThAT19-NT Oral Session, NT-G301 |
Add to My Program |
Surgical Robotics I |
|
|
Chair: Jones, Dominic | University of Leeds |
Co-Chair: Kanoulas, Dimitrios | University College London |
|
10:30-12:00, Paper ThAT19-NT.1 | Add to My Program |
Markerless Ultrasound Probe Pose Estimation in Mini-Invasive Surgery |
|
Kalantari, Mohammad Mahdi | Clermont Auvergne INP, CNRS, SIGMA Clermont, Institut Pascal |
Ozgur, Erol | SIGMA-Clermont / Institut Pascal |
Alkhatib, Mohammad | Université Clermont Auvergne |
Buc, Emmanuel | University Hospital of Clermont-Ferrand, France |
Le Roy, Bertrand | University Hospital of Saint-Etienne, France |
Modrzejewski, Richard | SurgAR |
Mezouar, Youcef | Clermont Auvergne INP - SIGMA Clermont |
Bartoli, Adrien | UCA |
Keywords: Surgical Robotics: Laparoscopy, Computer Vision for Medical Robotics
Abstract: In mini-invasive surgery, the laparoscopic ultrasound probe is visible in the laparoscopic image. We address the problem of estimating the probe pose with respect to the laparoscope without using markers and additional sensors. We propose the first method using a single standard laparoscopic monocular RGB image. It is robust, initialization-free and runs at 10 fps, thus forming a promising tool to improve robotic and augmented reality-based surgery.
|
|
10:30-12:00, Paper ThAT19-NT.2 | Add to My Program |
Occlusion-Robust Autonomous Robotic Manipulation of Human Soft Tissues with 3D Surface Feedback |
|
Hu, Junlei | University of Leeds |
Jones, Dominic | University of Leeds |
Dogar, Mehmet R | University of Leeds |
Valdastri, Pietro | University of Leeds |
Keywords: Surgical Robotics: Laparoscopy, Dual Arm Manipulation, Manipulation Planning, Soft Object Manipulation
Abstract: Robotic manipulation of 3D soft objects remains challenging in both the industrial and medical fields. Various methods based on mechanical modelling, data-driven approaches or explicit feature tracking have been proposed. A unifying disadvantage of these methods is the high computational cost of simultaneous image processing, identification of mechanical properties, and motion planning, leading to a need for less computationally intensive methods. We propose a method for autonomous robotic manipulation with 3D surface feedback to solve these issues. First, we produce a deformation model of the manipulated object, which estimates the robots’ movements by monitoring the displacement of surface points surrounding the manipulators. Then we develop a 6-degree-of-freedom velocity controller to manipulate the grasped object to achieve a desired shape. To validate our approach, we conduct comparative simulations with existing methods and perform experiments using phantom and cadaveric soft tissues with the da Vinci Research Kit. The results demonstrate the robustness of the method to occlusions and various materials. Compared to state-of-the-art linear and data-driven methods, our approach is 46.5% and 15.9% more precise and saves 55.2% and 25.7% of manipulation time, respectively.
|
|
10:30-12:00, Paper ThAT19-NT.3 | Add to My Program |
Sensorless Transparency Optimized Haptic Teleoperation on the Da Vinci Research Kit |
|
Yilmaz, Nural | Marmara University |
Burkhart, Brendan | Johns Hopkins University |
Deguet, Anton | Johns Hopkins University |
Kazanzides, Peter | Johns Hopkins University |
Tumerdem, Ugur | Marmara University |
Keywords: Surgical Robotics: Laparoscopy, Haptics and Haptic Interfaces, Telerobotics and Teleoperation
Abstract: The da Vinci surgical robot introduced remote control of instruments, providing surgeons with increased dexterity and precision. A major drawback, however, is the loss of sense of touch due to a lack of kinesthetic coupling between the surgical field and the surgeon. This paper presents a framework for sensorless transparency optimized four channel teleoperation. It is sensorless because forces are estimated from existing actuator feedback, with a deep network for dynamics identification. Performance is further optimized by introducing robust acceleration control, with disturbance observers. Experiments performed on the da Vinci Research Kit (dVRK), an open research platform based on the clinically deployed robotic hardware, show improvements in control, force estimation and reflection. The significance is that we demonstrate that high-performance bilateral teleoperation is feasible in clinical systems, without hardware changes, and is available to the dVRK community through a software update.
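For intuition about sensorless force estimation, below is a classical single-joint momentum-observer sketch in which external torque is recovered from commanded torque and measured velocity alone. The first-order plant, gains, and step disturbance are toy assumptions; the paper instead identifies the dynamics of the full dVRK arms with a deep network.

```python
import numpy as np

# Momentum observer: the residual r converges to the external torque without
# any force sensor. All model parameters here are hypothetical.

dt, K_obs = 1e-3, 50.0                         # step size, observer bandwidth
inertia, damping = 0.1, 0.05                   # assumed joint model
tau_ext = lambda t: 0.2 if t > 0.5 else 0.0    # unknown step disturbance

q_dot, p_hat, r = 0.0, 0.0, 0.0                # velocity, est. momentum, residual
for k in range(2000):
    t = k * dt
    tau_cmd = -0.5 * q_dot                     # commanded (known) torque
    # ground-truth plant
    q_ddot = (tau_cmd - damping * q_dot + tau_ext(t)) / inertia
    q_dot += q_ddot * dt
    # observer update
    p = inertia * q_dot                        # measured generalized momentum
    p_hat += (tau_cmd - damping * q_dot + r) * dt
    r = K_obs * (p - p_hat)

print(f"estimated external torque: {r:.3f} N·m (true: 0.200 N·m)")
```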
|
|
10:30-12:00, Paper ThAT19-NT.4 | Add to My Program |
Multimodal Transformers for Real-Time Surgical Activity Prediction |
|
Weerasinghe, Keshara | University of Virginia |
Roodabeh, Seyed HamidReza | University of Virginia |
Hutchinson, Kay | University of Virginia |
Alemzadeh, Homa | University of Virginia |
Keywords: Surgical Robotics: Laparoscopy, Recognition, Kinematics
Abstract: Real-time recognition and prediction of surgical activities are fundamental to advancing safety and autonomy in robot-assisted surgery. This paper presents a multimodal transformer architecture for real-time recognition and prediction of surgical gestures and trajectories based on short segments of kinematic and video data. We conduct an ablation study to evaluate the impact of fusing different input modalities and their representations on gesture recognition and prediction performance. We perform an end-to-end assessment of the proposed architecture using the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) dataset. Our model outperforms the state-of-the-art (SOTA) with 89.5% accuracy for gesture prediction through effective fusion of kinematic features with spatial and contextual video features. It achieves real-time performance of 1.1-1.3 ms for processing a 1-second input window by relying on a computationally efficient model.
|
|
10:30-12:00, Paper ThAT19-NT.5 | Add to My Program |
Learning Needle Pick-And-Place without Expert Demonstrations |
|
Bendikas, Rokas | UCL |
Modugno, Valerio | University College London |
Kanoulas, Dimitrios | University College London |
Vasconcelos, Francisco | University College London |
Stoyanov, Danail | University College London |
Keywords: Surgical Robotics: Laparoscopy, Reinforcement Learning, Autonomous Agents
Abstract: We introduce a novel approach for learning a complex multi-stage needle pick-and-place manipulation task for surgical applications using Reinforcement Learning without expert demonstrations or explicit curriculum. The proposed method is based on a recursive decomposition of the original task into a sequence of sub-tasks with increasing complexity and utilizes an actor-critic algorithm with deterministic policy output. In this work, exploratory bottlenecks have been used by a human expert as convenient boundary points for partitioning complex tasks into simpler subunits. Our method has successfully learnt a policy for the needle pick-and-place task, whereas the state-of-the-art TD3+HER method is unable to achieve success without the help of expert demonstrations. Comparison results show that our method achieves the highest performance with a 91% average success rate.
|
|
10:30-12:00, Paper ThAT19-NT.6 | Add to My Program |
Learning Nonprehensile Dynamic Manipulation: Sim2real Vision-Based Policy with a Surgical Robot |
|
Gondokaryono, Radian | University of Toronto |
Haiderbhai, Mustafa | University of Toronto |
Suryadevara, Sai Aneesh | Indian Institute of Technology Bombay |
Kahrs, Lueder Alexander | University of Toronto Mississauga |
Keywords: Surgical Robotics: Laparoscopy, Reinforcement Learning, Visual Servoing
Abstract: Surgical tasks such as tissue retraction, tissue exposure, and needle suturing remain challenging in autonomous surgical robotics. One challenge in these tasks is nonprehensile manipulation such as pushing tissue, pressing cloth, and needle threading. In this work, we isolate the problem of nonprehensile manipulation by implementing a vision-based reinforcement learning agent for rolling a block, a task that involves complex dynamic interactions, small-scale objects, and a narrow field of view. We train agents in simulation with a reward formulation that encourages efficient and safe learning, domain randomization that allows for robust sim2real transfer, and a recurrent memory layer that enables reasoning about randomized dynamics parameters. We successfully transfer our agents from simulation to the real system and show robust execution of our vision-based policy with a 96.3% success rate. We analyze and discuss the success rates, trajectories, and recovery behaviours for various models that either use the recurrent memory layer or are trained in a difficult physics environment. Further project information is available at https://medcvr.utm.utoronto.ca/ral2023-rollblock.html.
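To illustrate what per-episode domain randomization typically looks like in such a pipeline, here is a hypothetical parameter sampler; all parameter names and ranges are invented for illustration and are not the paper's configuration.

```python
import numpy as np

# Hypothetical domain-randomization sampler. Randomizing dynamics and visual
# parameters per episode, while a recurrent memory layer observes their
# consequences over time, is the mechanism behind robust sim2real transfer.

rng = np.random.default_rng(4)

def sample_episode_params():
    return {
        "block_mass_kg": rng.uniform(0.005, 0.05),
        "surface_friction": rng.uniform(0.2, 1.2),
        "camera_offset_mm": rng.normal(0.0, 2.0, size=3).round(2).tolist(),
        "light_intensity": rng.uniform(0.5, 1.5),
        "action_latency_ms": rng.uniform(0.0, 40.0),
    }

for episode in range(3):
    params = sample_episode_params()
    # env.reset(**params)  # hypothetical environment hook; the recurrent
    # layer must infer these hidden parameters from observation history
    print(f"episode {episode}: {params}")
```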
|
|
10:30-12:00, Paper ThAT19-NT.7 | Add to My Program |
Realistic Data Generation for 6D Pose Estimation of Surgical Instruments |
|
Barragan, Juan Antonio | Johns Hopkins University |
Zhang, Jintan | Johns Hopkins University |
Zhou, Haoying | Worcester Polytechnic Institute |
Munawar, Adnan | Johns Hopkins University |
Kazanzides, Peter | Johns Hopkins University |
Keywords: Surgical Robotics: Laparoscopy, Simulation and Animation, Deep Learning Methods
Abstract: Automation in surgical robotics has the potential to improve patient safety and surgical efficiency, but it is difficult to achieve due to the need for robust perception algorithms. In particular, 6D pose estimation of surgical instruments is critical to enable the automatic execution of surgical maneuvers based on visual feedback. In recent years, supervised deep learning algorithms have shown increasingly better performance at 6D pose estimation tasks; yet, their success depends on the availability of large amounts of annotated data. In household and industrial settings, synthetic data, generated with 3D computer graphics software, has been shown to be an alternative that minimizes the annotation costs of 6D pose datasets. However, this strategy does not translate well to surgical domains, as commercial graphics software has limited tools to generate images depicting realistic instrument-tissue interactions. To address these limitations, we propose an improved simulation environment for surgical robotics that enables the automatic generation of large and diverse datasets for 6D pose estimation of surgical instruments. Among the improvements, we developed an automated data generation pipeline and an improved surgical scene. To show the applicability of our system, we generated a dataset of 7.5k images with pose annotations of a surgical needle that was used to evaluate a state-of-the-art pose estimation network. The trained model obtained a mean translational error of 2.59 mm on a challenging dataset that presented varying levels of occlusion. These results highlight our pipeline's success in training and evaluating novel vision algorithms for surgical robotics applications.
|
|
10:30-12:00, Paper ThAT19-NT.8 | Add to My Program |
Surgical Gym: A High-Performance GPU-Based Platform for Reinforcement Learning with Surgical Robots |
|
Schmidgall, Samuel | Johns Hopkins University |
Krieger, Axel | Johns Hopkins University |
Eshraghian, Jason | University of California, Santa Cruz |
Keywords: Surgical Robotics: Laparoscopy, Software Architecture for Robotic and Automation, Reinforcement Learning
Abstract: Recent advances in robot-assisted surgery have resulted in progressively more precise, efficient, and minimally invasive procedures, sparking a new era of robotic surgical intervention. This enables doctors, in collaborative interaction with robots, to perform traditional or minimally invasive surgeries with improved outcomes through smaller incisions. Recent efforts are working toward making robotic surgery more autonomous, which has the potential to reduce the variability of surgical outcomes and reduce complication rates. Deep reinforcement learning methodologies offer scalable solutions for surgical automation, but their effectiveness relies on extensive data acquisition due to the absence of prior knowledge in successfully accomplishing tasks. Due to the intensive nature of simulated data collection, previous works have focused on making existing algorithms more efficient. In this work, we focus on making the simulator more efficient, making training data much more accessible than previously possible. We introduce Surgical Gym, an open-source, high-performance platform for surgical robot learning where both the physics simulation and reinforcement learning occur directly on the GPU. We demonstrate between 100-5000x faster training times compared with previous surgical learning platforms. The code is available at: https://github.com/SamuelSchmidgall/SurgicalGym.
|
|
10:30-12:00, Paper ThAT19-NT.9 | Add to My Program |
Multi-Objective Cross-Task Learning Via Goal-Conditioned GPT-Based Decision Transformers for Surgical Robot Task Automation |
|
Fu, Jiawei | Institute of Artificial Intelligence and Robotics |
Long, Yonghao | The Chinese University of Hong Kong |
Chen, Kai | The Chinese University of Hong Kong |
Wei, Wang | The Chinese University of Hong Kong |
Dou, Qi | The Chinese University of Hong Kong |
Keywords: Surgical Robotics: Laparoscopy, Surgical Robotics: Planning
Abstract: Surgical robot task automation has been a promising research topic for improving surgical efficiency and quality. Learning-based methods have been recognized as an interesting paradigm and have been increasingly investigated. However, existing approaches encounter difficulties in long-horizon goal-conditioned tasks due to their intricate compositional structure, which requires decision-making over a sequence of sub-steps and an understanding of the dynamics inherent in goal-reaching tasks. In this paper, we propose a new learning-based framework that leverages the strong reasoning capability of the GPT-based architecture to automate surgical robotic tasks. The key to our approach is developing a goal-conditioned decision transformer that achieves sequential representations with goal-aware future indicators in order to enhance temporal reasoning. Moreover, to exploit a general understanding of the dynamics inherent in manipulation and thus make the model's reasoning ability task-agnostic, we also design a cross-task pretraining paradigm that uses multiple training objectives associated with data from diverse tasks. We have conducted extensive experiments on 10 tasks using the surgical robot learning simulator SurRoL. The results show that our new approach achieves promising performance and task versatility compared to existing methods. The learned trajectories can be deployed on the da Vinci Research Kit (dVRK), validating its practicality in real surgical robot settings. Our project website is at: https://med-air.github.io/SurRoL.
|
|
ThAT20-NT Oral Session, NT-G302 |
Add to My Program |
Safety in HRI |
|
|
Chair: Haddadin, Sami | Technical University of Munich |
Co-Chair: Kamezaki, Mitsuhiro | The University of Tokyo |
|
10:30-12:00, Paper ThAT20-NT.1 | Add to My Program |
Safe-By-Design Digital Twins for Human-Robot Interaction: A Use Case for Humanoid Service Robots |
|
Škerlj, Jon | Technical University of Munich |
Hamad, Mazin | Technical University of Munich (TUM) |
Elsner, Jean | Technical University of Munich |
Naceri, Abdeldjallil | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Keywords: Safety in HRI, Service Robotics, Software-Hardware Integration for Robot Systems
Abstract: Integrating humanoid service mobile robots into human environments presents numerous challenges, primarily concerning the safety of interactions between robots and humans. To address these safety concerns, we propose a novel approach that leverages the capabilities of digital twin technology by tailoring it to incorporate comprehensive and robust safety concepts. This paper introduces a "safe-by-design" digital twin that operates alongside the real twin robot in the loop, engaging a real-time safety framework during physical interactions with the surrounding environment, including humans. To validate the effectiveness of our proposed safe-by-design digital twin framework, we conducted experiments using a humanoid service mobile robot alongside simulated human counterparts. Our results demonstrate the capability of the integrated impact safety module within the proposed digital twin approach to limit the velocities of both the robot's base and arms, adhering to injury-biomechanics-based safety thresholds. These findings emphasize the promise of our proposed approach for ensuring the physical safety of humanoid service mobile robots operating in dynamic human environments. It enables the digital twin to preemptively identify potential safety hazards and formulate safe intervention actions to ensure the robot's compliance with safety regulations, paving the way for safer and more widespread adoption of robotic systems in various service domains.
|
|
10:30-12:00, Paper ThAT20-NT.2 | Add to My Program |
Safe Execution of Learned Orientation Skills with Conic Control Barrier Functions |
|
Shen, Zheng | TU Munich |
Saveriano, Matteo | University of Trento |
Abu-Dakka, Fares | Mondragon University |
Haddadin, Sami | Technical University of Munich |
Keywords: Safety in HRI, Learning from Demonstration, Telerobotics and Teleoperation
Abstract: In the field of Learning from Demonstration (LfD), Dynamical Systems (DSs) have gained significant attention due to their ability to generate real-time motions and reach predefined targets. However, the conventional convergence-centric behavior exhibited by DSs may fall short in safety-critical tasks, specifically, those requiring precise replication of demonstrated trajectories or strict adherence to constrained regions even in the presence of perturbations or human intervention. Moreover, existing DS research often assumes demonstrations solely in Euclidean space, overlooking the crucial aspect of orientation in various applications. To alleviate these shortcomings, we present an innovative approach geared toward ensuring the safe execution of learned orientation skills within constrained regions surrounding a reference trajectory. This involves learning a stable DS on SO(3), extracting time-varying conic constraints from the variability observed in expert demonstrations, and bounding the evolution of the DS with Conic Control Barrier Function (CCBF) to fulfill the constraints. We validated our approach through extensive evaluation in simulation and showcased its effectiveness for a cutting skill in the context of assisted teleoperation.
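The core safety-filtering step can be sketched in closed form for a fixed cone: a barrier keeps a body axis within a cone around a reference direction, and a minimum-norm correction enforces the CBF inequality. The fixed cone, gains, and nominal input below are illustrative stand-ins for the paper's time-varying conic constraints extracted from demonstrations.

```python
import numpy as np

# Conic CBF filter on SO(3): keep R @ e_tool inside a cone of half-angle
# `alpha` around e_ref. Barrier h(R) = e_ref . (R e_tool) - cos(alpha), with
# body-frame kinematics R_dot = R hat(w) giving dh/dt = a . w.

def hat(w):
    return np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])

e_tool, e_ref = np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0])
alpha, gamma, dt = np.deg2rad(20), 5.0, 1e-2
R = np.eye(3)

for _ in range(300):
    w_nom = np.array([1.0, 0.0, 0.0])            # nominal motion tilts the axis
    h = e_ref @ (R @ e_tool) - np.cos(alpha)     # barrier: >= 0 inside the cone
    a = np.cross(e_tool, R.T @ e_ref)            # dh/dt = a . w
    if a @ w_nom + gamma * h < 0:                # CBF condition violated:
        w = w_nom + (-(gamma * h) - a @ w_nom) / (a @ a + 1e-12) * a
    else:
        w = w_nom                                # nominal input is already safe
    R = R @ (np.eye(3) + hat(w) * dt)            # first-order integration step
    U, _, Vt = np.linalg.svd(R); R = U @ Vt      # re-orthonormalize

angle = np.arccos(np.clip(e_ref @ (R @ e_tool), -1.0, 1.0))
print("final cone angle (deg):", np.rad2deg(angle))   # held near 20 degrees
```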
|
|
10:30-12:00, Paper ThAT20-NT.3 | Add to My Program |
Real-Time Batched Distance Computation for Time-Optimal Safe Path Tracking |
|
Fujii, Shohei | Nanyang Technological University, DENSO Corporation |
Pham, Quang-Cuong | NTU Singapore |
Keywords: Safety in HRI, Human-Aware Motion Planning, Industrial Robots
Abstract: In human-robot collaboration, there is a trade-off between the speed of collaborative robots and the safety of human workers. In our previous paper, we introduced a time-optimal path tracking algorithm designed to maximize speed while ensuring safety for human workers. This algorithm runs in real-time and provides the safe and fastest control input for every cycle with respect to ISO standards. However, true optimality has not been achieved due to inaccurate distance computation resulting from conservative model simplification. To attain true optimality, we require a method that can compute distances (1) at the many robot configurations to be examined along a trajectory, (2) in real-time for online robot control, and (3) as precisely as possible for optimal control. In this paper, we propose a batched, fast and precise distance checking method based on precomputed link-local SDFs. Our method can check distances for 500 waypoints along a trajectory within less than 1 millisecond using a GPU at runtime, making it well suited for time-critical robotic control. Additionally, a neural approximation is proposed to accelerate preprocessing by a factor of 2. Finally, we experimentally demonstrate that our method allows a 6-DoF robot to react earlier than with a geometric-primitives-based distance checker in a dynamic and collaborative environment.
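The batching idea can be sketched as follows: once an SDF is precomputed on a grid, distance queries for hundreds of waypoints reduce to array lookups that vectorize trivially (plain NumPy here; a GPU tensor library in practice). The planar 2-link arm, disc obstacle, and nearest-cell lookup are illustrative simplifications of the paper's link-local SDFs.

```python
import numpy as np

# Batched distance checking against a precomputed 2D SDF: clearance for 500
# waypoints is obtained with a single vectorized lookup. All geometry is toy.

rng = np.random.default_rng(3)
n, lo, hi = 64, -2.0, 2.0
xs = np.linspace(lo, hi, n)
X, Y = np.meshgrid(xs, xs, indexing="ij")
sdf = np.hypot(X - 1.0, Y - 1.0) - 0.3          # SDF of a disc at (1,1), r=0.3

def lookup(points):                              # batched nearest-cell SDF query
    idx = ((points - lo) / (hi - lo) * (n - 1)).round().astype(int)
    idx = np.clip(idx, 0, n - 1)
    return sdf[idx[..., 0], idx[..., 1]]

def link_points(q, samples=8):                   # sample points on a 2-link arm
    j1 = np.stack([np.cos(q[:, 0]), np.sin(q[:, 0])], -1)        # elbow
    j2 = j1 + np.stack([np.cos(q.sum(1)), np.sin(q.sum(1))], -1)  # wrist
    t = np.linspace(0, 1, samples)[:, None, None]
    return np.concatenate([t * j1, j1 + t * (j2 - j1)], axis=0)

Q = rng.uniform(-np.pi, np.pi, size=(500, 2))    # 500 waypoints at once
d = lookup(link_points(Q))                       # (2*samples, 500) distances
min_dist = d.min(axis=0)                         # clearance per waypoint
print("waypoints in collision:", int((min_dist < 0).sum()))
```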
|
|
10:30-12:00, Paper ThAT20-NT.4 | Add to My Program |
Overcoming Hand and Arm Occlusion in Human-To-Robot Handovers: Predicting Safe Poses with a Multimodal DNN Regression Model |
|
Lollett, Catherine | Waseda University |
Sriram, Advaith | Waseda University |
Kamezaki, Mitsuhiro | The University of Tokyo |
Sugano, Shigeki | Waseda University |
Keywords: Safety in HRI, Human-Aware Motion Planning, Deep Learning for Visual Perception
Abstract: Handovers play a key role in human-robot interactions. However, current research focuses on visible-hand handovers, thereby relying heavily on hand detection. Large objects in human-robot interactions present a unique challenge: they inherently block the person's hands and arms from the robot's view. This occlusion raises the robot's risk of unintended physical contact with the person, leading to discomfort and safety concerns. This study aims to develop a model that can determine a robot pose that ensures a handover avoiding physical contact with the person, especially in scenarios where hands and arms are occluded. Toward this goal, a three-branch multimodal Deep Neural Network (DNN) regression model was implemented. First, robust human-pose keypoint detection is applied to calculate shoulder-elbow angles. Second, we extract a refined segmentation mask of the object. Third, we compute two intrinsic object properties. The concatenated outputs from these branches pass through additional dense layers, resulting in the prediction of the robot's 14 arm-joint angles. Compared to a model based only on processed keypoint data, our multimodal approach improves accuracy by 17.7%. The experiments highlight the significance of each pipeline step, showing strong results even when hands and arms were heavily occluded and across different variations.
|
|
10:30-12:00, Paper ThAT20-NT.5 | Add to My Program |
Legible and Proactive Robot Planning for Prosocial Human-Robot Interactions |
|
Geldenbott, Jasper | University of Washington |
Leung, Karen | University of Washington |
Keywords: Human-Aware Motion Planning, Safety in HRI
Abstract: Humans have a remarkable ability to fluently engage in joint collision avoidance in crowded navigation tasks despite the complexities and uncertainties inherent in human behavior. Underlying these interactions is a mutual understanding that (i) individuals are prosocial, that is, there is equitable responsibility in avoiding collisions, and (ii) individuals should behave legibly, that is, move in a way that clearly conveys their intent to reduce ambiguity in how they intend to avoid others. Toward building robots that can safely and seamlessly interact with humans, we propose a general robot trajectory planning framework for synthesizing legible and proactive behaviors and demonstrate that our robot planner naturally leads to prosocial interactions. Specifically, we introduce the notion of a markup factor to incentivize legible and proactive behaviors and an inconvenience budget constraint to ensure equitable collision avoidance responsibility. We evaluate our approach against well-established multi-agent planning algorithms and show that using our approach produces safe, fluent, and prosocial interactions. We demonstrate the real-time feasibility of our approach with human-in-the-loop simulations. Project page can be found at https://uw-ctrl.github.io/phri/.
|
|
10:30-12:00, Paper ThAT20-NT.6 | Add to My Program |
Integrated Data-Driven Inference and Planning-Based Human Motion Prediction for Safe Human-Robot Interaction |
|
Nam, Youngim | Ulsan National Institute of Science and Technology |
Kwon, Cheolhyeon | Ulsan National Institute of Science and Technology |
Keywords: Human-Aware Motion Planning, Safety in HRI, Planning under Uncertainty
Abstract: This paper presents a unified prediction and planning algorithm for an autonomous vehicle to interact with an uncertain human-driven vehicle. Predicting human motion is challenging due to inherent uncertainties in diverse human internal states, i.e., driving styles and rationality. To address these complexities, we propose a hierarchical prediction strategy that combines data-driven internal state inference and planning-based human motion prediction. First, we employ Long Short-Term Memory (LSTM) network-based inference modules to capture both driving style and rationality from the observed motion of the human driver. With these inferred internal states, we predict the future trajectories of the human-driven vehicle by formulating a human planning model as an optimization problem. Lastly, we present a Stochastic Model Predictive Control (SMPC) scheme for the autonomous vehicle to safely interact with the human-driven vehicle while actively inferring human internal states. The simulation results, covering lane-change scenarios, indicate that the proposed method outperforms existing work in both predicting the human motion and achieving the robot's goal.
|
|
10:30-12:00, Paper ThAT20-NT.7 | Add to My Program |
How Does Perception Affect Safety: New Metrics and Strategy |
|
Zhang, Xiaotong | Massachusetts Institute of Technology |
Chong, Jinger | Massachusetts Institute of Technology |
Youcef-Toumi, Kamal | Massachusetts Institute of Technology |
Keywords: Safety in HRI, Human-Robot Collaboration, Object Detection, Segmentation and Categorization
Abstract: Perception plays a pivotal role in enhancing the functionality of autonomous agents. However, the intricate relationship between robotic perception metrics and actuation metrics remains unclear, leading to ambiguity in the development and fine-tuning of perception algorithms. In this paper, we introduce a methodology for quantifying this relationship, taking into account factors such as detection rate, detection quality, and latency. Furthermore, we introduce two novel perception metrics for Human-Robot Collaboration safety predicated upon basic perception metrics: Critical Collision Probability (CCP) and Average Collision Probability (ACP). To validate the utility of these metrics in facilitating algorithm development and tuning, we develop an attentive processing strategy that focuses exclusively on key input features. This approach significantly reduces computational time while preserving a similar level of accuracy. Experimental findings demonstrate that integrating this strategy into an object detector results in a notable maximum reduction of 30.09% in inference time and 26.53% in total time per frame. Additionally, the strategy lowers the CCP and ACP in a baseline model by 11.25% and 13.50%, respectively.
|
|
10:30-12:00, Paper ThAT20-NT.8 | Add to My Program |
Constrained Passive Interaction Control: Leveraging Passivity and Safety for Robot Manipulators |
|
Zhang, Zhiquan | University of Pennsylvania |
Li, Tianyu | University of Pennsylvania |
Figueroa, Nadia | University of Pennsylvania |
Keywords: Safety in HRI, Collision Avoidance, Machine Learning for Robot Control
Abstract: Passivity is necessary for robots to fluidly collaborate and interact with humans physically. Nevertheless, due to the unconstrained nature of passivity-based impedance control laws, the robot is vulnerable to infeasible and unsafe configurations upon physical perturbations. In this paper, we propose a novel control architecture that allows a torque-controlled robot to guarantee safety constraints such as kinematic limits, self-collisions, external collisions and singularities and is passive only when feasible. This is achieved by constraining a dynamical system based impedance control law with a relaxed hierarchical control barrier function quadratic program subject to multiple concurrent, possibly contradicting, constraints. Joint space constraints are formulated from efficient data-driven self- and external C^2 collision boundary functions. We theoretically prove constraint satisfaction and show that the robot is passive when feasible. Our approach is validated in simulation and real robot experiments on a 7DoF Franka Research 3 manipulator.
|
|
10:30-12:00, Paper ThAT20-NT.9 | Add to My Program |
Boosting Adversarial Training in Safety-Critical Systems through Boundary Data Selection |
|
Jia, Yifan | Zhejiang University |
Poskitt, Christopher M. | Singapore Management University |
Zhang, Peixin | Singapore Management University |
Wang, Jingyi | Zhejiang University |
Sun, Jun | Singapore Management University |
Chattopadhyay, Sudipta | Singapore University of Technology and Design |
Keywords: Safety in HRI, AI-Enabled Robotics, Big Data in Robotics and Automation
Abstract: AI-enabled collaborative robots are designed to be used in close collaboration with humans, thus demanding stringent safety standards and quick response times. However, adversarial attacks pose a significant threat to the deep learning models of AI-enabled industrial systems, making it crucial to develop methods that improve the models' robustness against them. Adversarial training is one approach to achieve this, but its effectiveness heavily relies on the quality of the training data, which can be expensive to acquire. In this work, we try to balance the need for quality data with the goal of minimizing cost by selecting the most 'important' data for adversarial training. In particular, we propose a task-based robust fast (RAST) learning method that selects training data near the decision boundary by considering adversarial samples. Our method improves the speed of model training on CIFAR-10 by 68.67% and, compared to other data selection methods, achieves 10% higher accuracy with 10% of the training data selected and 7% higher robustness with 4% of the training data selected. Our method also significantly improves efficiency, by at least 25%, on adversarial training with the same performance. Finally, we evaluate our method on a physical robotic-arm system with object detection, generating adversarial patches as the attack and adopting our method as the defense. We find that RAST can defend against 60% of untargeted attacks and 20% of targeted attacks. Therefore, our work highlights the benefits of boundary data selection for efficient and robust adversarial training in safety-critical systems.
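A minimal sketch of boundary-proximal data selection is given below, using the classifier's top-two score margin as the proximity proxy. RAST additionally considers adversarial samples, so this margin criterion is a simplified stand-in, and the random "logits" are placeholders for a real model's outputs.

```python
import numpy as np

# Margin-based boundary selection sketch: samples whose top-1 minus top-2
# score gap is smallest lie near the decision boundary and are prioritized
# for (adversarial) training. The 10% budget mirrors the paper's setting.

rng = np.random.default_rng(5)
N, C = 1000, 10
logits = rng.standard_normal((N, C))          # placeholder model outputs

top2 = np.sort(logits, axis=1)[:, -2:]        # two largest scores per sample
margin = top2[:, 1] - top2[:, 0]              # small margin => near boundary

budget = int(0.10 * N)                        # select 10% of the data
selected = np.argsort(margin)[:budget]        # most boundary-proximal samples
print("selected", len(selected), "samples; median margin",
      float(np.median(margin[selected]).round(3)))
```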
|
|
ThAT21-NT Oral Session, NT-G303 |
Add to My Program |
Micro/Nano Robots I |
|
|
Chair: Xu, Qingsong | University of Macau |
Co-Chair: Cappelleri, David | Purdue University |
|
10:30-12:00, Paper ThAT21-NT.1 | Add to My Program |
Automated Assembly by Two-Fingered Microhand for Fabrication of Soft Magnetic Microrobots |
|
Zhao, Yue | Beijing Institute of Technology |
Liu, Xiaoming | Beijing Institute of Technology |
Wang, Ruixi | Tsinghua University |
Liu, Dan | Beijing Institute of Technology |
Kojima, Masaru | Osaka University |
Huang, Qiang | Beijing Institute of Technology |
Arai, Tatsuo | University of Electro-Communications |
Keywords: Micro/Nano Robots
Abstract: Micro-assembly is an emerging method to fabricate microrobots with multiple modules or particles. However, there is still a lack of a flexible and efficient method to freely create desired magnetic soft microrobots. In this paper, an automated assembly system based on a two-fingered microhand is presented for fabricating magnetic soft microrobots. Our proposed system can automatically pick and place components to assemble microrobots with a two-fingered micromanipulator, and orient these components through an external magnetic field. The automated assembly has the advantages of high accuracy, high speed, and a high success rate. It can endow magnetic microrobots with flexible material selection, arbitrary geometry design, and a programmable magnetization profile. We make full use of this system to fabricate multiple magnetic soft microrobots. The experimental results demonstrate that this system can efficiently fabricate microrobots with excellent mechanical properties, which have application potential in robotics, biomedical engineering, and environmental governance.
|
|
10:30-12:00, Paper ThAT21-NT.2 | Add to My Program |
Thin-Film NiTi Microactuator with a Magnetic Spring for a Tiny Launcher Mechanism |
|
Kim, Sukjun | University of California, San Diego |
Bergbreiter, Sarah | Carnegie Mellon University |
Keywords: Micro/Nano Robots, Additive Manufacturing
Abstract: In this work, we present a thin-film shape memory alloy (NiTi) microactuator with a magnetic spring. This novel actuator design utilizes two permanent magnets and 3D-printed magnet holders to effectively apply a tensile strain on the NiTi thin-film. This actuator is expected to generate 8.7 mN of blocking force, and a free displacement of 30 μm is experimentally characterized. The actuator leverages bare NiTi film (∼ 1 μm thick) for actuation, enabling a high actuator bandwidth up to 50 Hz. A comprehensive analytical model is also studied, which was then validated by comparing to the experimental results. A launcher mechanism was designed and integrated with the NiTi actuator, and this mechanism was used to launch a microscale projectile (a salt grain) thereby demonstrating the relative high power actuation achievable with thin-film NiTi.
|
|
10:30-12:00, Paper ThAT21-NT.3 | Add to My Program |
Nature-Inspired Bubble Magnetic Microrobots for Multimode Locomotion, Cargo Delivery, Imaging, and Biosensing |
|
Xu, Zichen | University of Macau |
Xu, Qingsong | University of Macau |
Yu, Hon Ho | Kiang Wu Hospital |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Biologically-Inspired Robots
Abstract: Wirelessly actuated magnetic microrobots are promising tools in medical applications due to their tiny sizes and attractive robotic properties. However, it remains a huge challenge to integrate sufficient functionalities in a limited volume. Microscopic natural phenomena are a great reference for current microrobot design, where the underlying intelligence and subtlety spur related modern artificial systems. Inspired by air bubbles in nature, herein we report a kind of novel magnetic air-bubble microrobot. The air bubble-based structure enables multiple functionalities including cargo delivery, multimode locomotion, micromanipulation, medical imaging, and biosensing. The proposed microrobots are essentially Pickering bubbles composed of magnetic particles and air bubbles. Their hollow structures help produce lighter microrobots with density less than 1 g/cm^3, enabling buoyancy-based self-propulsion. Buoyancy- and magnetic-force actuation enables flexible 3D locomotion in fluidic environments. Experimental results show that the microrobots can be controlled properly for designated assignments. Furthermore, the introduction of air bubbles enhances ultrasound imaging, facilitating further in vivo applications. These findings offer a significant microrobot design paradigm by exploiting natural physical intelligence at the small scale.
|
|
10:30-12:00, Paper ThAT21-NT.4 | Add to My Program |
Magnetic Mobile Micro-Gripping MicroRobots (MMμGRs) with Two Independent Magnetic Actuation Modes |
|
Davis, Aaron C. | Purdue University |
Freeman, Emmett | Purdue University |
Cappelleri, David | Purdue University |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Mechanism Design
Abstract: In this paper, we introduce magnetic mobile micro-gripping microrobots with two independent actuation modes. By aligning two magnets with slight variations in magnetic moment orientations, we create a net magnetic moment for precise position and orientation control through external fields, while harnessing opposing torques on the magnets to induce internal stresses needed for gripping. Our microrobot design features a compliant spring-like structure for significant deflection, enabling a gripping motion under specific magnetic field conditions. Magnet rotation allows precise control over gripper actions, returning to a default state (normally open or closed) when the magnetic field diminishes. This work advances magnetic field-controlled microrobotics, bridging the millimeter-to-micrometer gap. It holds promise for applications in microsurgery, micro-assembly, and microscale exploration.
|
|
10:30-12:00, Paper ThAT21-NT.6 | Add to My Program |
Electroosmotic Self-Propelled Microswimmer with Magnetic Steering |
|
Yamanaka, Toshiro | The University of Tokyo |
Arai, Fumihito | The University of Tokyo |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Motion Control
Abstract: Microswimmers have significant potential for medical applications such as long-term drug administration, precise surgery, and so on. The outstanding challenge is to realize power supply, propulsion, and steering mechanisms suitable for operation within the human body and microscale fluids. We propose a microswimmer composed of a self-propulsive disk-shaped module with multiple channels using a biofuel cell (BFC) and electroosmotic propulsion (EOP), and a magnetic rod for magnetic steering (MS). The BFC produces an open-circuit potential (OCP) between a bioanode and a biocathode by redox reactions. The EOP generates a self-propulsive velocity due to the counteracting forces of electroosmotic flows produced by the OCP in the channels arranged between the electrodes. The MS works by aligning the magnetic rod with a controlled magnetic field direction. The prototype was designed and fabricated using an insulating polymer layer, two conductive layers incorporating silver nanoparticles with anodic/cathodic enzymes, and a magnetic layer containing magnetic nanoparticles. Fast self-propulsion of continuously rotating 30 µm prototypes under magnetic steering in a glucose solution was demonstrated, as expected theoretically. This concept has the potential to be used in microrobots for future medical applications, such as a pulling mechanism to assist guidewire insertion or agents delivering drugs.
|
|
10:30-12:00, Paper ThAT21-NT.7 | Add to My Program |
A Flat Tendon-Driven Continuum Microrobot for Brain Interventions |
|
Noseda, Lorenzo | EPFL |
Liu, Addison | Harvard University |
Pancaldi, Lucio | EPFL |
Sakar, Mahmut Selman | EPFL |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Surgical Robotics: Steerable Catheters/Needles
Abstract: Navigating biomedical instruments inside the brain remains challenging and high-risk. The delicate nature of the tissues involved requires the development of cutting-edge robotic technologies to enhance precision and safety. In response to these demands, this paper presents a novel ribbon-shaped, tendon-driven continuum microrobot designed explicitly to navigate through brain tissues. The microrobot has a cross-sectional area of 1 mm^2, and its design is readily compatible with conventional microfabrication techniques for further miniaturization. The flat geometry aims to provide superior maneuverability and opens up new challenges for modeling and control. We detail the design methodology and fabrication, followed by in vitro characterization and testing within brain tissue phantoms.
|
|
10:30-12:00, Paper ThAT21-NT.8 | Add to My Program |
A Selectively Controllable Triple-Helical Micromotor |
|
Zhao, Hongyu | Shenzhen Institute of Artificial Intelligence and Robotics for Society |
Ye, Min | Shenzhen Institute of Artificial Intelligence and Robotics for Society |
Nelson, Bradley J. | ETH Zurich |
Wang, Xiaopu | Shenzhen Institute of Artificial Intelligence and Robotics for Society |
Keywords: Micro/Nano Robots, Biologically-Inspired Robots, Additive Manufacturing
Abstract: Selective control mechanisms for microrobots have attracted significant attention from researchers. So far, selective control within multiple/swarm magnetic microrobots has been achieved with many strategies, such as utilizing locally specified magnetic fields, applying electrostatic anchoring, and taking advantage of the geometry/wettability heterogeneity of the microrobots. Using the step-out behavior of helical microrobots driven by a rotating magnetic field, researchers have proposed a mathematical model for multi-helical motors that can be selectively controlled. Based on this model, we developed a micromotor that consists of three geometrically heterogeneous helices, each of which can be selectively driven within a specific narrow frequency range. This type of micromotor shows bidirectional motion capability and has the potential to be used as an actuation unit for multiple types of functional micromechanisms.
|
|
ThAT22-NT Oral Session, NT-G304 |
Add to My Program |
Space Robotics II |
|
|
Chair: Abiko, Satoko | Shibaura Institute of Technology |
Co-Chair: Hirano, Daichi | Japan Aerospace Exploration Agency |
|
10:30-12:00, Paper ThAT22-NT.1 | Add to My Program |
System Identification of Space Manipulator Systems and Its Implications on Robust Control Performance |
|
Rekleitis, Georgios | National Technical University of Athens |
Papadopoulos, Evangelos | National Technical University of Athens |
Keywords: Space Robotics and Automation, Calibration and Identification
Abstract: Space manipulator system (SMS) maneuvers can excite flexible appendages, while fuel sloshing effects impact its dynamics and performance. To predict this behavior and control such systems, sloshing and flexible appendages are modeled. A novel system identification (SYSID) scheme is developed, which identifies all parameters required for the reconstruction of the system dynamics despite unmeasurable sloshing and modal states. This is achieved by two identification experiments. In Exp. 1, all unmeasurable states are eliminated, while in Exp. 2, the unmeasurable sloshing states are eliminated and a novel estimator is used for the unmeasurable modal states. The significance of accurate SYSID for controller design and performance is demonstrated by simulating a 3D SMS controlled by model-based and robust controllers. In both cases, using the identified parameters results in a significant enhancement of robust control performance.
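For readers unfamiliar with the mechanics, identification of this kind is commonly reduced to linear least squares, since manipulator dynamics are linear in the unknown inertial parameters. The single-joint regressor below is a toy stand-in for the paper's two-experiment SMS scheme, with all values hypothetical.

```python
import numpy as np

# Generic least-squares SYSID sketch: unknown parameters theta enter the
# dynamics linearly, tau = Phi(q, q_dot, q_ddot) @ theta, so stacking
# regressors from an excitation experiment recovers theta.

rng = np.random.default_rng(6)
theta_true = np.array([1.3, 0.4])      # [inertia, viscous friction]
N = 400

q_ddot = rng.standard_normal(N)        # "measured" accelerations
q_dot = rng.standard_normal(N)         # "measured" velocities
Phi = np.stack([q_ddot, q_dot], axis=1)            # linear-in-parameters regressor
tau = Phi @ theta_true + 0.01 * rng.standard_normal(N)  # noisy joint torques

theta_hat, *_ = np.linalg.lstsq(Phi, tau, rcond=None)
print("identified parameters:", theta_hat, "true:", theta_true)
```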
|
|
10:30-12:00, Paper ThAT22-NT.2 | Add to My Program |
Model Design and Concept of Operations of Standard Interface for On-Orbit Construction |
|
Zhao, Jingdong | Harbin Institute of Technology |
Wang, Zirui | Harbin Institute of Technology |
Liu, Ziyi | Harbin Institute of Technology |
Zhao, Liangliang | Harbin Institute of Technology |
Duan, Qifan | Harbin Institute of Technology |
Liu, Hong | Harbin Institute of Technology |
Keywords: Space Robotics and Automation, Compliance and Impedance Control, Mechanism Design
Abstract: The construction of large-scale space facilities requires the use of on-orbit construction technology. However, several of its key components, such as standard interface design, compliant control methods, and path planning for multi-branch robots, still need improvement before practical application. This paper presents a comprehensive solution for on-orbit construction tasks, encompassing a novel standard interface, a docking control method, and a path planning method for space multi-branch robots. Firstly, a novel standard interface is introduced, which features multiple mating modes and a lightweight design. Additionally, a compliant docking method is provided to generate lower contact forces along the Z-direction. Furthermore, for four-armed space robots, a hierarchical planning method is proposed, which innovates in environment map construction and locomotion planning. Specifically, the closed-form Minkowski sum method is employed to solve for the robot’s free space, and a concise locomotion method is elucidated based on transition support points. Finally, simulations and experiments are conducted.
|
|
10:30-12:00, Paper ThAT22-NT.3 | Add to My Program |
Enabling Faster Locomotion of Planetary Rovers with a Mechanically-Hybrid Suspension |
|
Rodríguez-Martínez, David | École Polytechnique Fédérale De Lausanne (EPFL) |
Uno, Kentaro | Tohoku University |
Sawa, Kenta | Tohoku University |
Uda, Masahiro | Tohoku University |
Kudo, Gen | Tohoku University |
Diaz Huenupan, Gustavo Hernan | Tohoku University |
Umemura, Ayumi | Tohoku University |
Santra, Shreya | Tohoku University |
Yoshida, Kazuya | Tohoku University |
Keywords: Space Robotics and Automation, Compliant Joints and Mechanisms, Mechanism Design
Abstract: The exploration of the lunar poles and the collection of samples from the martian surface are characterized by shorter time windows demanding increased autonomy and speeds. Autonomous mobile robots must intrinsically cope with a wider range of disturbances. Faster off-road navigation has been explored for terrestrial applications, but the combined effects of increased speeds and reduced gravity fields are yet to be fully studied. In this paper, we design and demonstrate a novel fully passive suspension design for wheeled planetary robots, which couples for the first time a high-range passive rocker with elastic in-wheel coil-over shock absorbers. The design was initially conceived and verified in a reduced-gravity (1.625 m/s^2) simulated environment, where three different passive suspension configurations were evaluated against steep slopes and unexpected obstacles, and later prototyped and validated in a series of field tests. The proposed mechanically-hybrid suspension proves to mitigate more effectively the negative effects (high-frequency/high-amplitude vibrations and impact loads) of faster locomotion (~1 m/s) over unstructured terrains under varied gravity fields.
|
|
10:30-12:00, Paper ThAT22-NT.4 | Add to My Program |
RoboBall: An All-Terrain Spherical Robot with a Pressurized Shell |
|
Oevermann, Micah | Texas A&M University |
Pravecek, Derek | Texas A&M University |
Jibrail, Joseph Garrett | Texas A&M University |
Jangale, Rishi | Texas A&M University |
Ambrose, Robert | Texas A&M University |
Keywords: Space Robotics and Automation, Compliant Joints and Mechanisms, Soft Robot Materials and Design
Abstract: Spherical robots are a distinctive class of mobility platform. A spherical robot is self-contained within its shell rather than relying on a chassis with wheels to navigate, and inside this shell it is completely shielded from dust and the environment. This geometric simplicity has made the spherical robot an advantageous option for all-terrain exploration and surveying. This paper focuses on a novel iteration of such a robot with a pressurized pneumatic shell design. A soft robot of this type brings the benefit of a passive, compliant contact surface that can improve its performance. However, the added softness of its shell introduces new unmodeled dynamics into the system that impair commonly used control schemes. This paper outlines the design and manufacture of a soft, inflatable, spherical shell designed for a robot driven by an internal 2-DOF pendulum. In addition, it presents models for controlling the pendulum and understanding the shell dynamics. The paper concludes with experimental validations of these models and field tests of the system on slopes, gravel, rough grass, and water.
|
|
10:30-12:00, Paper ThAT22-NT.5 | Add to My Program |
Enhanced Multifunctional Interface for Reconfigurability of Robotic Teams in Planetary Applications |
|
Yüksel, Mehmed | DFKI GmbH |
Brinkmann, Wiebke | DFKI Robotics Innovation Center Bremen |
Jankovic, Marko | German Research Center for Artificial Intelligence GmbH (DFKI) |
Kücüker, Hilmi Dogu | DFKI-RIC |
Kirchner, Frank | University of Bremen |
Keywords: Space Robotics and Automation, Mechanism Design, Engineering for Robotic Systems
Abstract: Exploration missions on extraterrestrial celestial bodies are to date performed by complex and heavy robotic systems. The trend is towards lighter modular systems that can be (re)configured in situ according to mission-specific requirements. To facilitate flexible configurability, a multifunctional interconnect is used to mechanically couple the involved systems while providing electrical power and data transmission. The paper presents the further development of the reliable electro-mechanical interface (EMI) from the TransTerrA project, which has been proven in several field tests and reached TRL 4. Docking under loads of up to 550 N has been successfully tested with the new design. The experiments presented include undocking at various inclinations with different loads expected for the application scenario. The maximum determined static load that can be carried by the further developed EMI is 2000 N. In further experiments, new contact blocks responsible for the transfer of electrical power and data were tested for water resistance and resilience to environmental factors, as well as for power and data transfer. The obtained results will be helpful in the development of a multifunctional interface suitable for lunar applications and missions with similarly challenging environmental conditions.
|
|
10:30-12:00, Paper ThAT22-NT.6 | Add to My Program |
Autonomous Perching on Flat Surfaces for Free-Flying Robots with Gecko Adhesive Gripper |
|
Hirano, Daichi | Japan Aerospace Exploration Agency |
Tanishima, Nobutaka | JAXA |
Chen, Tony G. | Stanford University |
Keywords: Space Robotics and Automation, Grippers and Other End-Effectors, Compliant Joints and Mechanisms
Abstract: Gecko-inspired adhesives have the advantage of being able to grasp and release flat surfaces in a vacuum using their microwedge structures. This makes them an especially attractive solution for perching on and grasping flat objects in space for free-flying robots. To grasp and anchor onto these flat surfaces, the gripper must ensure contact between the gecko adhesives and the surface before applying the appropriate forces to activate their adhesion. However, in the case of a free-flying robot in microgravity, physical contact with the surface induces reaction forces, causing the robot to quickly bounce away from the surface. To solve this issue, we propose a simple passive mechanism and a control method for a robotic arm on a free-flying robot with a gecko adhesive gripper. The gripper utilizes a single-motor-controlled tendon-driven mechanism mounted at the end of a robotic arm equipped with controllable-stiffness joints and a linear spring-damper system. A free-flying robot on an air-bearing platform can successfully perch on a flat surface with a velocity of up to 72.5 mm/s and with an approach angle misalignment of up to 33.0 degrees.
|
|
10:30-12:00, Paper ThAT22-NT.7 | Add to My Program |
VWDER: A Variable Wheel-Diameter Ellipsoidal Robot |
|
Qin, Ziao | Beijing University of Posts and Telecommunications |
Song, Jingzhou | Beijing University of Posts and Telecommunications |
Gong, Xinglong | Beijing University of Posts and Telecommunications |
Liu, Changrui | Beijing University of Posts and Telecommunications |
Keywords: Space Robotics and Automation, Product Design, Development and Prototyping, Surveillance Robotic Systems
Abstract: In recent years, many researchers have conducted extensive research on spherical robots due to their high flexibility and anti-overturning capabilities. Nevertheless, compared with legged and traditional wheeled robots, spherical robots face certain limitations. A spherical robot is composed of a closed spherical shell, which limits its capacity to carry payloads. At the same time, its single-point contact with the ground produces only a small friction force, making it difficult to climb obstacles such as steps and doorsills. Therefore, we propose a new solution: the variable wheel-diameter ellipsoidal robot (VWDER), which combines the characteristics of a two-wheel differential-drive robot and a spherical robot driven by an equivalent pendulum. The VWDER is equipped with six retractable shell-shaped legs on each side, and this innovative design allows both wheels to independently change diameter while rolling. Under the action of the equivalent pendulum, the main frame of the VWDER keeps its top facing up during locomotion, which makes it possible to carry payloads such as manipulator arms, cameras, and IMUs. The VWDER can climb steps or doorsills using its two adjacent shell-shaped legs. This paper introduces the design of the VWDER and analyzes its kinematics and dynamics. The experimental results verify the performance of the VWDER, including its autonomous opening and closing, obstacle crossing, automatic reorientation, and slope climbing.
|
|
10:30-12:00, Paper ThAT22-NT.8 | Add to My Program |
On Robust Control Laws Trade-Off Analysis for Space Manipulators with Uncertain Parameters and Flexible Appendages |
|
Nanos, Kostas | National Technical University of Athens |
Chachamis, Efstathios | National Technical University of Athens |
Papadopoulos, Evangelos | National Technical University of Athens |
Keywords: Space Robotics and Automation, Motion Control, Dynamics
Abstract: To accurately accomplish on-orbit tasks using Space Manipulator Systems (SMS), advanced model-based controllers, dependent on the knowledge of SMS parameters, can be employed. However, these parameters may change on orbit for several reasons. Also, during an SMS task, excitation of flexible appendages, such as solar panels, or fuel sloshing may introduce significant end-effector errors. Therefore, controllers robust to parametric uncertainty and disturbances are needed. A robust controller attractive due to its small computational effort is the Linear Parameter Varying (LPV) gain-scheduled controller. However, its design for spatial SMS is not trivial and has not been studied yet. Therefore, the aim of this work is to study and compare robust controllers and examine their applicability to SMS. An LPV plus H∞ controller is compared with a Model-Based PD and a Model-Based PD plus H∞ controller, in the presence of parametric uncertainty, noisy measurements and disturbances, using a planar example. The criteria considered include: (i) Design Complexity, (ii) Trajectory Errors, (iii) Required Torques, and (iv) Computational Effort.
|
|
ThAT23-NT Oral Session, NT-G401 |
Add to My Program |
Aerial Systems: Perception and Autonomy I |
|
|
Chair: Scherer, Sebastian | Carnegie Mellon University |
Co-Chair: Kyriakopoulos, Kostas | New York University - Abu Dhabi |
|
10:30-12:00, Paper ThAT23-NT.1 | Add to My Program |
N-MPC for Deep Neural Network-Based Collision Avoidance Exploiting Depth Images |
|
Jacquet, Martin | NTNU |
Alexis, Kostas | NTNU - Norwegian University of Science and Technology |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy
Abstract: This paper introduces a Nonlinear Model Predictive Control (N-MPC) framework exploiting a Deep Neural Network for processing onboard-captured depth images for collision avoidance in trajectory-tracking tasks with UAVs. The network is trained on simulated depth images to output a collision score for queried 3D points within the sensor field of view. This network is then translated into an algebraic symbolic equation and included in the N-MPC, explicitly constraining predicted positions to be collision-free throughout the receding horizon. The N-MPC achieves real-time control of a UAV at a control frequency of 100 Hz. The proposed framework is validated through statistical analysis of the collision classifier network, as well as Gazebo simulations and real experiments, to assess the resulting capabilities of the N-MPC to effectively avoid collisions in cluttered environments. The associated code is released open-source along with the training images.
|
|
10:30-12:00, Paper ThAT23-NT.2 | Add to My Program |
Aerial Transportation of Cable-Suspended Loads with an Event Camera |
|
Panetsos, Fotis | National Technical University of Athens |
Karras, George | University of Thessaly |
Kyriakopoulos, Kostas | New York University - Abu Dhabi |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy
Abstract: In this work, we investigate the integration of a Dynamic Vision Sensor (DVS) into an Unmanned Aerial Vehicle (UAV) with a cable-suspended load in order to achieve a robust and fast estimation of the cable's state during the transportation of the load. Based on the advantageous properties of event cameras, our ultimate goal is to design a computationally lightweight event processing method that persistently identifies the cable and estimates its complete state – required for any controller with feedback of the cable's state – within a much shorter time period compared to frame-based algorithms. Using a point cloud representation for the incoming event streams, the proposed method achieves the fast detection of the cable while the respective measurements are afterward fitted to a Bézier curve in order to approximate both the cable angle and angular velocity. Our method is initially validated in an indoor environment, where ground truth is available from a motion capture system, and is subsequently deployed in an outdoor one in order to evaluate its robustness against noise. Throughout the outdoor experiment, the feedback provided by the DVS is incorporated into a Nonlinear Model Predictive Control (NMPC) scheme which drives an octorotor towards reference setpoint positions while minimizing the cable angular motion.
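As a rough illustration of the curve-fitting step described above, the sketch below fits a quadratic Bézier curve to 2D event points by linear least squares and reads off the cable angle from the tangent at the attachment point. The helper names, the uniform parameterization, and the downward-vertical angle convention are illustrative assumptions, not the paper's implementation.

import numpy as np

def fit_quadratic_bezier(points):
    # Least-squares fit of B(t) = (1-t)^2 P0 + 2t(1-t) P1 + t^2 P2
    # to ordered 2D points (hypothetical helper; uniform t for simplicity).
    n = len(points)
    t = np.linspace(0.0, 1.0, n)
    A = np.stack([(1 - t)**2, 2 * t * (1 - t), t**2], axis=1)  # (n, 3) basis
    ctrl, *_ = np.linalg.lstsq(A, points, rcond=None)          # (3, 2) control points
    return ctrl

def cable_angle(ctrl):
    # Tangent at the attachment point t = 0 is B'(0) = 2 (P1 - P0);
    # the angle is measured from the downward vertical (an assumption).
    d = 2.0 * (ctrl[1] - ctrl[0])
    return np.arctan2(d[0], -d[1])

# Toy usage: a few noisy points along a slightly bent hanging cable.
pts = np.array([[0.0, 0.0], [0.1, -0.5], [0.25, -1.0], [0.45, -1.5]])
print(cable_angle(fit_quadratic_bezier(pts)))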
|
|
10:30-12:00, Paper ThAT23-NT.3 | Add to My Program |
Autonomous Overhead Powerline Recharging for Uninterrupted Drone Operations |
|
Duong Hoang, Viet | University of Southern Denmark |
Falk Nyboe, Frederik | University of Southern Denmark |
Malle, Nicolaj | University of Southern Denmark |
Ebeid, Emad | University of Southern Denmark |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy, Aerial Systems: Mechanics and Control
Abstract: We present a fully autonomous self-recharging drone system capable of long-duration sustained operations near powerlines. The drone is equipped with a robust onboard perception and navigation system that enables it to locate powerlines and approach them for landing. A passively actuated gripping mechanism grasps the powerline cable during landing after which a control circuit regulates the magnetic field inside a split-core current transformer to provide sufficient holding force as well as battery recharging. The system is evaluated in an active outdoor three-phase powerline environment. We demonstrate multiple contiguous hours of fully autonomous uninterrupted drone operations composed of several cycles of flying, landing, recharging, and takeoff, validating the capability of extended, essentially unlimited, operational endurance.
|
|
10:30-12:00, Paper ThAT23-NT.4 | Add to My Program |
LPNet: A Reaction-Based Local Planner for Autonomous Collision Avoidance Using Imitation Learning |
|
Lu, Junjie | Tianjin University |
Tian, Bailing | Tianjin University |
Shen, Hongming | Nanyang Technological University |
Zhang, Xuewei | Tianjin University |
Hui, Yulin | Tianjin University |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy, Integrated Planning and Learning
Abstract: In this work, we propose a reaction-based local planner for autonomous collision avoidance of a quadrotor in obstacle-cluttered environments without relying on an explicit map. Our approach searches for a feasible trajectory using a set of motion primitives in a state lattice and represents the optimal one as a polynomial by solving an optimal control problem. A modified Q-network, termed LPNet, is presented to predict the action-values of motion primitives directly from the current depth image and the state estimate of the quadrotor. To train the proposed LPNet, a primitive-based expert policy with privileged information about the surroundings and an unconstrained computational budget is developed to provide demonstrations for imitation learning. Finally, a series of experiments are conducted to demonstrate the effectiveness and time-efficiency of the proposed method in both simulation and the real world.
|
|
10:30-12:00, Paper ThAT23-NT.5 | Add to My Program |
Autonomous Exploration of Unknown 3D Environments Using a Frontier-Based Collector Strategy |
|
Changoluisa Caiza, Ivan David | University of Zagreb |
Milas, Ana | University of Zagreb, Faculty of Electrical Engineering and Comp |
Montes Grova, Marco Antonio | Center for Advanced Aerospace Technologies (CATEC) |
Perez Grau, Francisco Javier | Center for Advanced Aerospace Technologies |
Petrovic, Tamara | Univ. of Zagreb |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy, Mapping
Abstract: Autonomous exploration using unmanned aerial vehicles (UAVs) is essential for various tasks such as building inspections, rescue operations, deliveries, and warehousing. However, there are two main limitations to previous approaches: they may not be able to provide a complete map of the environment and assume that the map built during exploration is accurate enough for safe navigation, which is usually not the case. To address these limitations, a novel exploration method is proposed that combines frontier-based exploration with a collector strategy that achieves global exploration and complete map creation. In each iteration, the collector strategy stores and validates frontiers detected during exploration and selects the next best frontier to navigate to. The collector strategy ensures global exploration by balancing the exploitation of a known map with the exploration of unknown areas. In addition, the online path replanning ensures safe navigation through the map created during motion. The performance of the proposed method is verified by exploring 3D simulation environments in comparison with the state-of-the-art methods. Finally, the proposed approach is validated in a real-world experiment.
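The collector strategy can be pictured with a short sketch of its store/validate/select loop. Everything below (the helper callables and the gain-minus-cost selection rule) is hypothetical and only meant to convey the idea in the abstract, not the paper's implementation.

def explore_step(octomap, pose, collected, fns):
    # One iteration of the collector strategy (sketch). `fns` bundles
    # hypothetical helpers: detect_frontiers, still_frontier,
    # is_reachable, information_gain, travel_cost.
    collected |= set(fns.detect_frontiers(octomap))          # 1) store new frontiers
    valid = {f for f in collected                            # 2) re-validate stored ones
             if fns.still_frontier(f, octomap)
             and fns.is_reachable(f, octomap, pose)}
    if not valid:
        return None                                          # map complete, done
    # 3) next-best frontier: trade exploration gain against travel cost
    return max(valid, key=lambda f: fns.information_gain(f, octomap)
                                    - fns.travel_cost(pose, f))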
|
|
10:30-12:00, Paper ThAT23-NT.6 | Add to My Program |
Learning Multi-Scale Context Mask-RCNN Network for Slant Angled Aerial Imagery in Instance Segmentation Using Sim2Real |
|
Saadiyean, Qiranul | Indian Institute of Science, Bangalore |
S P, Samprithi | PES University |
Sundaram, Suresh | Indian Institute of Science |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy, Surveillance Robotic Systems
Abstract: While instance segmentation models excel at object detection in satellite imagery, their performance drops when applied to slant-angled aerial images due to occlusion and scale variation. This is mainly caused by a lack of training data for such diverse viewpoints and scales. To address this limitation, we propose the Sim2Real-based Multi-Scale Context Mask-RCNN (MSC-RCNN) network, specifically designed for slant-angled aerial imagery. Sim2Real-based transfer learning is adopted to compensate for the limited availability of real-world slant-angle training data. A synthetic dataset is generated using Unreal Engine, detailing the methodology for replicating real-world scenes to produce diverse slant-angle drone datasets with various weather conditions and backgrounds. The model leverages two distinct feature pyramid backbones, one incorporating dilated convolutions to address large-scale objects and the other optimized for regular convolutions. Their outputs are fused to effectively detect objects across various scales and angles. Experiments demonstrate that incorporating this synthetic data significantly reduces reliance on real data while maintaining high mean Average Precision (mAP) scores. Compared to the baseline Mask R-CNN, the proposed approach with Sim2Real adaptation and the MSC-RCNN architecture achieves a remarkable 7.6% improvement in instance segmentation accuracy with only a 6% increase in model size.
|
|
10:30-12:00, Paper ThAT23-NT.7 | Add to My Program |
Real-Time Multi-Modal Active Vision for Object Detection on UAVs Equipped with Limited Field of View LiDAR and Camera |
|
Shi, Chuanbeibei | University of Bristol |
Lai, Ganghua | Beijing Institute of Technology |
Yu, Yushu | Beijing Institute of Technology |
Bellone, Mauro | Tallinn University of Technology |
Lippiello, Vincenzo | University of Naples FEDERICO II |
Keywords: Aerial Systems: Applications, Perception-Action Coupling, Sensor Fusion
Abstract: This paper aims to solve the challenging problems in multi-modal active vision for object detection on unmanned aerial vehicles (UAVs) with a monocular camera and a limited Field of View (FoV) LiDAR. The point cloud acquired from the low-cost LiDAR is first converted into a 3-channel tensor via motion compensation, accumulation, projection, and upsampling processes. The generated 3-channel point cloud tensor and the RGB image are fused into a 6-channel tensor using an early fusion strategy for object detection based on a Gaussian YOLO network structure. To address the limited onboard computational resources and improve real-time performance, the velocity information of the UAV is further fused with the detection results based on an extended Kalman filter (EKF). A perception-aware model predictive control (MPC) is designed to achieve active vision on our UAV. According to our performance evaluation, our pre-processing step improves the running time of other methods in the literature by a factor of 10 while maintaining acceptable detection performance. Furthermore, our fusion architecture reaches 94.6 mAP on the test set, outperforming the individual sensor networks by roughly 5%. We also describe an implementation of the overall algorithm on a UAV platform and validate it in real-world experiments.
|
|
10:30-12:00, Paper ThAT23-NT.8 | Add to My Program |
High-Speed Stereo Visual SLAM for Low-Powered Computing Devices |
|
Kumar, Ashish | Indian Institute of Technology, Kanpur |
Park, Jaesik | Seoul National University |
Behera, Laxmidhar | IIT Kanpur |
Keywords: Aerial Systems: Applications, SLAM, Embedded Systems for Robotic and Automation
Abstract: We present an accurate and GPU-accelerated Stereo Visual SLAM design called Jetson-SLAM. It exhibits frame-processing rates above 60 FPS on NVIDIA's low-powered 10 W Jetson-NX embedded computer and above 200 FPS on desktop-grade 200 W GPUs, even in stereo configuration and in the multiscale setting. Our contributions are threefold: (i) a Bounded Rectification technique to prevent tagging many non-corner points as corners in FAST detection, improving SLAM accuracy; (ii) a novel Pyramidal Culling and Aggregation (PyCA) technique that yields robust features while suppressing redundant ones at high speeds by harnessing a GPU device. PyCA uses our new Multi-Location Per Thread (MLPT) culling strategy and Thread-Efficient Warp-Allocation (TEWA) scheme for the GPU to enable Jetson-SLAM to achieve high accuracy and speed on embedded devices; (iii) the Jetson-SLAM library achieves resource efficiency through a data-sharing mechanism. Our experiments on three challenging datasets (KITTI, EuRoC, and KAIST-VIO) and two highly accurate SLAM backends (Full-BA and ICE-BA) show that Jetson-SLAM is the fastest available accurate and GPU-accelerated SLAM system (Fig. 1).
|
|
ThAT24-NT Oral Session, NT-G402 |
Add to My Program |
Robotics and Automation in Agriculture and Forestry II |
|
|
Chair: Karydis, Konstantinos | University of California, Riverside |
Co-Chair: Singh, Arun Kumar | University of Tartu |
|
10:30-12:00, Paper ThAT24-NT.1 | Add to My Program |
Robotic Assessment of a Crop's Need for Watering |
|
Dechemi, Amel | University of California, Riverside |
Chatziparaschis, Dimitrios | UC Riverside |
Chen, Joshua | University of California, Riverside |
Campbell, Merrick | University of California, Riverside |
Shamshirgaran, Azin | University of California, Merced |
Mucchiani, Caio | University of California Riverside |
Roy-Chowdhury, Amit | University of California, Riverside |
Carpin, Stefano | University of California, Merced |
Karydis, Konstantinos | University of California, Riverside |
Keywords: Agricultural Automation, Mechanism Design, Motion and Path Planning
Abstract: This paper focuses on developing a robot-assisted system for stem water potential (SWP) measurement in orchards. SWP is a metric frequently used by agronomists and growers to optimize irrigation schedules for crops. However, such measurements are currently made via a time- and labor-intensive procedure that faces the challenges of sparse sampling and human variability in determining SWP. In response to these challenges, our proposed robotic system aims to automate time-consuming and difficult-to-perform tasks by collecting multiple leaves and automating parts of the overall SWP analysis process. To achieve this, this work considers three core components: 1) informed planning, to determine where to collect leaves to obtain the most informative readings; 2) system design and integration for autonomous leaf retrieval with a mobile manipulator and a custom-made end-effector; and 3) learning-based machine vision for automated visual identification of leaf xylem wetness during SWP analysis. Taken together, these constitute the core building blocks toward enabling complete robot autonomy in physical specimen sampling and transport in the field.
|
|
10:30-12:00, Paper ThAT24-NT.2 | Add to My Program |
Enhancement on Target-Gripper Alignment: A Tomato Harvesting Robot with Dual-Camera Image-Based Visual Servoing |
|
Lian, Feng-Li | National Taiwan University |
Wang, Lu-Ching | National Taiwan University |
Chu, Yen-Cheng | National Taiwan University |
Keywords: Agricultural Automation, Mobile Manipulation, Field Robots
Abstract: The application of automation to crop harvesting has increased over the past decades, and various types of harvesting robots are emerging in both commercial and research settings. One of the main challenges is the precise alignment of the gripper and the target crop: an undesired dislocation can harm both gripper and crop, and is mainly caused by uncertainties from the sensors and the manipulator. To solve this problem, a dual-camera setup is designed and implemented on a self-built robot. The perception of the tomato is done by a fixed depth camera and a depth-less camera on the gripper. The proposed dual-camera IBVS (Image-Based Visual Servoing) controller is designed to handle the image feedback from both cameras, and a proof of asymptotic convergence is provided. Furthermore, cumulative error compensation reduces the time required for the harvesting process. The experiments were conducted in a greenhouse and tested under various conditions. The time cost is formulated as a function, and the success rate of picking tomatoes is 68.4%.
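For readers unfamiliar with IBVS, a minimal single-camera sketch of the underlying control law is given below; the paper's dual-camera controller stacks feedback from both cameras and is more involved. The gain and the point-feature interaction matrix follow the standard textbook formulation, not the paper's exact design.

import numpy as np

def point_interaction_matrix(x, y, Z):
    # Standard interaction matrix of a normalized image point (x, y)
    # at depth Z, mapping the 6-DOF camera twist to feature velocity.
    return np.array([
        [-1/Z,    0, x/Z,     x*y, -(1 + x*x),  y],
        [   0, -1/Z, y/Z, 1 + y*y,       -x*y, -x],
    ])

def ibvs_velocity(s, s_star, L, lam=0.5):
    # Classic IBVS law v = -lambda * pinv(L) @ (s - s*): drive the
    # stacked image-feature error e to zero via the camera twist.
    e = s - s_star
    return -lam * np.linalg.pinv(L) @ e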
|
|
10:30-12:00, Paper ThAT24-NT.3 | Add to My Program |
Few-Shot Fruit Segmentation Via Transfer Learning |
|
James, Jordan | University of Texas at Arlington |
Manching, Heather K. | North Carolina State University |
Hulse-Kemp, Amanda M. | North Carolina State University |
Beksi, William J. | The University of Texas at Arlington |
Keywords: Agricultural Automation, Computer Vision for Automation, Object Detection, Segmentation and Categorization
Abstract: Advancements in machine learning, computer vision, and robotics have paved the way for transformative solutions in various domains, particularly in agriculture. For example, accurate identification and segmentation of fruits from field images plays a crucial role in automating jobs such as harvesting, disease detection, and yield estimation. However, achieving robust and precise infield fruit segmentation remains a challenging task since large amounts of labeled data are required to handle variations in fruit size, shape, color, and occlusion. In this paper, we develop a few-shot semantic segmentation framework for infield fruits using transfer learning. Concretely, our work is aimed at addressing agricultural domains that lack publicly available labeled data. Motivated by similar success in urban scene parsing, we propose specialized pre-training using a public benchmark dataset for fruit transfer learning. By leveraging pre-trained neural networks, accurate semantic segmentation of fruit in the field is achieved with only a few labeled images. Furthermore, we show that models with pre-training learn to distinguish between fruit still on the trees and fruit that have fallen on the ground, and they can effectively transfer the knowledge to the target fruit dataset.
|
|
10:30-12:00, Paper ThAT24-NT.4 | Add to My Program |
Spatio-Temporal Correspondence Estimation of Growing Plants by Hausdorff Distance Based Skeletonization for Organ Tracking |
|
Pandey, Sharmistha B | ISRO |
Colliaux, David | Sony CSL Paris |
Chaudhury, Ayan | Indian Institute of Technology Kharagpur |
Keywords: Agricultural Automation, Computer Vision for Automation, Visual Tracking
Abstract: Tracking of plant organs over spatio-temporal sequences of point cloud data is one of the demanding tasks of agricultural robotics for automated plant monitoring and growth analysis. Due to the complex geometry of plants, it is extremely difficult to identify and track the individual organs in different growth stages of plants. In this paper, we present an approach to perform correspondence estimation of different plant organs over a series of spatio-temporal data. The approach is based on two stages. In the first stage, we develop a robust skeleton extraction method from unstructured plant point cloud data by adopting the Hausdorff distance metric and a modified breadth-first search algorithm. The proposed skeletonization method is shown to perform better than the state of the art, especially in handling very thin and delicate branches. We also address an overlooked problem of connecting skeleton points in the form of a graph, and demonstrate that different types of plant phenotype parameters can be obtained in a fully automatic manner from the skeleton graph. In the second stage, we exploit the skeleton graphs in developing an algorithm to perform correspondence estimation among the skeleton nodes using a cosine similarity based approach. We demonstrate the effectiveness of the proposed skeletonization technique in tracking different organs of the plant by finding good quality correspondences. Experiments are performed on three datasets of real and synthetic sequences of spatio-temporal plant point cloud data to demonstrate the effectiveness of the proposed method.
|
|
10:30-12:00, Paper ThAT24-NT.5 | Add to My Program |
Field Robot for High-Throughput and High-Resolution 3D Plant Phenotyping |
|
Esser, Felix | University of Bonn |
Rosu, Radu Alexandru | University of Bonn |
Cornelissen, Andre | University of Bonn |
Klingbeil, Lasse | University of Bonn |
Kuhlmann, Heiner | University of Bonn |
Behnke, Sven | University of Bonn |
Keywords: Agricultural Automation, Field Robots, Robotics and Automation in Agriculture and Forestry
Abstract: With the need to feed a growing world population, the efficiency of crop production is of paramount importance. To support breeding and field management, various characteristics of the plant phenotype need to be measured, a time-consuming process when performed manually. We present a robotic platform equipped with multiple laser and camera sensors for high-throughput, high-resolution in-field plant scanning. We create digital twins of the plants through 3D reconstruction. This allows the estimation of phenotypic traits such as leaf area, leaf angle, and plant height. We validate our system on a real field, where we reconstruct accurate point clouds and meshes of sugar beet, soybean, and maize.
|
|
10:30-12:00, Paper ThAT24-NT.6 | Add to My Program |
A Hybrid Controller Enhancing Transient Performance for an Aerial Manipulator Extracting a Wedged Object (I) |
|
Byun, Jeonghyun | Seoul National University |
Jang, Inkyu | Seoul National University |
Lee, Dongjae | Seoul National University |
Kim, H. Jin | Seoul National University |
Keywords: Aerial Systems: Applications, Hybrid Logical/Dynamical Planning and Verification, Robust/Adaptive Control
Abstract: Autonomous aerial manipulation requires the capability to handle inevitable dynamic changes during physical interaction. Previously, very few studies have addressed the stability and transient performance of the scenarios involving abrupt changes in dynamics. This paper proposes a hybrid controller enhancing transient performance for an aerial manipulator extracting an object wedged in a static structure. This task incurs a significant jump in the interaction force on the end-effector so that the analysis using the concept of hybrid dynamical systems is required. To demonstrate the dynamic characteristics of the object-extracting aerial manipulator, we derive the dynamic equations for two flight modes, i.e., free-flight and object-extracting, and the rule of state jumps. Also, we design control strategies which enhance the transient performance during flight mode transition. Then, the stability of the proposed control law is proven, and the overshoot reduction after the object extraction is analyzed. To show the improved performance, we conduct plug-pulling experiments with a quadrotor-based aerial manipulator using the proposed controller and two different existing controllers. The comparative results confirm that our controller enables the aerial manipulator to maintain its stability after the flight mode transition and shows the best transient performance in overshoot minimization among three controllers.
|
|
10:30-12:00, Paper ThAT24-NT.7 | Add to My Program |
VACNA: Visibility-Aware Cooperative Navigation with Application in Inventory Management |
|
Masnavi, Houman | Institute of Technology, University of Tartu |
Shrestha, Jatan | University of Tartu |
Kruusamäe, Karl | University of Tartu |
Singh, Arun Kumar | University of Tartu |
Keywords: Aerial Systems: Applications, Machine Learning for Robot Control, Optimization and Optimal Control
Abstract: This paper presents an online trajectory planning algorithm for an Unmanned Aerial Vehicle (UAV) to autonomously scan warehouse racks for inventory management. Our main motivation is to make small-sized UAVs with limited computing and sensing hardware capable of reliably performing the scanning task in cluttered environments. To this end, we propose a cooperative system where an Unmanned Ground Vehicle (UGV) guides the UAV using the novel template of visibility-aware cooperative navigation (VACNA). We propose a Cross-Entropy Method (CEM) based approach for solving the trajectory optimization underpinning VACNA. In particular, our CEM projects sampled vehicle trajectories onto the constraint sets before evaluating the cost functions. We further learn a deep generative model in the form of a Conditional Variational Autoencoder (CVAE) from expert demonstrations to warm-start our optimizer. We improve the state-of-the-art in the following respects. First, we present a detailed analysis of the role of our proposed cost and constraint functions for cooperative occlusion-free navigation. Second, we compare our custom CEM optimizer with conventional variants and show significantly reduced collision and occlusion rates. Finally, our CVAE initialization allows our optimizer to operate with smaller batch sizes and achieve real-time performance even on embedded hardware devices like NVIDIA Jetson Xavier.
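A minimal sketch of a projection-augmented CEM loop like the one the abstract describes is given below; the cost and projection functions are problem-specific placeholders, and the elite-refit update is the generic textbook variant rather than the paper's custom optimizer.

import numpy as np

def cem_plan(cost, project, mu, sigma, iters=20, batch=100, elite_frac=0.1):
    # Cross-Entropy Method over flattened trajectory parameters; each
    # sample is projected onto the constraint set *before* costing.
    n_elite = max(1, int(batch * elite_frac))
    for _ in range(iters):
        samples = np.random.randn(batch, mu.size) * sigma + mu
        samples = np.array([project(s) for s in samples])   # constraint projection
        elites = samples[np.argsort([cost(s) for s in samples])[:n_elite]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu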
|
|
10:30-12:00, Paper ThAT24-NT.8 | Add to My Program |
A CBF-Adaptive Control Architecture for Visual Navigation for UAV in the Presence of Uncertainties |
|
Sankaranarayanan, Viswa Narayanan | Luleå University of Technology |
Saradagi, Akshit | Luleå University of Technology, Luleå, Sweden |
Satpute, Sumeet | Luleå University of Technology |
Nikolakopoulos, George | Luleå University of Technology |
Keywords: Aerial Systems: Applications, Robot Safety
Abstract: In this article, we propose a control solution for the safe transfer of a quadrotor UAV between two surface robots, positioning itself using only the visual features on the surface robots, which enforces safety constraints for precise landing and visual locking in the presence of modeling uncertainties and external disturbances. The controller handles the ascending and descending phases of the navigation using a visual-locking control barrier function (VCBF) and a parametrizable switching descending CBF (DCBF), respectively, eliminating the need for an external planner. The control scheme uses a backstepping approach for the position controller, with the CBF filter acting on the position kinematics to produce a filtered virtual velocity control input, which is tracked by an adaptive controller to overcome modeling uncertainties and external disturbances. The experimental validation is carried out with a UAV that navigates from the base to the target using an RGB camera.
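As background, a CBF acts as a quadratic-program safety filter on a nominal control input. With a single affine constraint the QP has a closed form, sketched below; the paper's VCBF/DCBF construction and the adaptive tracking layer go well beyond this illustration.

import numpy as np

def cbf_filter(u_des, a, b):
    # Minimal safety filter: min ||u - u_des||^2  s.t.  a @ u >= b,
    # where the constraint encodes a CBF condition
    # Lf_h + Lg_h @ u >= -alpha(h), i.e. a = Lg_h, b = -alpha(h) - Lf_h.
    # A single affine constraint admits a closed-form QP solution.
    slack = a @ u_des - b
    if slack >= 0:
        return u_des                        # nominal input already safe
    return u_des - slack * a / (a @ a)      # minimal correction onto the boundary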
|
|
ThAT25-NT Oral Session, NT-G403 |
Add to My Program |
Localization VII |
|
|
Chair: Lu, Chris Xiaoxuan | University College London |
Co-Chair: Schwertfeger, Sören | ShanghaiTech University |
|
10:30-12:00, Paper ThAT25-NT.1 | Add to My Program |
Multimodal Indoor Localization Using Crowdsourced Radio Maps |
|
Yi, Zhaoguang | University of Edinburgh |
Wen, Xiangyu | The University of Edinburgh |
Xia, Qiyue | University of Edinburgh |
Li, Peize | University of Edinburgh |
Zampella, Francisco | Edinburgh Research Center, Huawei Technologies Co., Ltd |
Alsehly, Firas | Huawei Technologies R&D UK |
Lu, Chris Xiaoxuan | University of Edinburgh |
Keywords: Localization
Abstract: Indoor Positioning Systems (IPS) traditionally rely on odometry and building infrastructures like WiFi, often supplemented by building floor plans for increased accuracy. However, the limitation of floor plans in terms of availability and timeliness of updates challenges their wide applicability. In contrast, the proliferation of smartphones and WiFi-enabled robots has made crowdsourced radio maps – databases pairing locations with their corresponding Received Signal Strengths (RSS) – increasingly accessible. These radio maps not only provide WiFi fingerprint-location pairs but encode movement regularities akin to the constraints imposed by floor plans. This work investigates the possibility of leveraging these radio maps as a substitute for floor plans in multimodal IPS. We introduce a new framework to address the challenges of radio map inaccuracies and sparse coverage. Our proposed system integrates an uncertainty-aware neural network model for WiFi localization and a bespoke Bayesian fusion technique for optimal fusion. Extensive evaluations on multiple real-world sites indicate a significant performance enhancement, with results showing approximately 25% improvement over the best baseline.
|
|
10:30-12:00, Paper ThAT25-NT.2 | Add to My Program |
CLIP-Loc: Multi-Modal Landmark Association for Global Localization in Object-Based Maps |
|
Matsuzaki, Shigemichi | Toyota Motor Corporation |
Sugino, Takuma | Toyota Motor Corporation |
Tanaka, Kazuhito | Toyota Motor Corporation |
Sha, Zijun | Toyota Motor Corporation |
Nakaoka, Shintaro | Keio University |
Yoshizawa, Shintaro | Toyota Motor Corporation |
Shintani, Kazuhiro | Toyota Motor Corporation |
Keywords: Localization
Abstract: This paper describes a multi-modal data association method for global localization using object-based maps and camera images. In global localization, or relocalization, using object-based maps, existing methods typically resort to matching all possible combinations of detected objects and landmarks with the same object category, followed by inlier extraction using RANSAC or brute-force search. This approach becomes infeasible as the number of landmarks increases due to the exponential growth of correspondence candidates. In this paper, we propose labeling landmarks with natural language descriptions and extracting correspondences based on conceptual similarity with image observations using a Vision Language Model (VLM). By leveraging detailed text information, our approach efficiently extracts correspondences compared to methods using only object categories. Through experiments, we demonstrate that the proposed method enables more accurate global localization with fewer iterations compared to baseline methods, exhibiting its efficiency.
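A generic sketch of the similarity-based association idea follows, assuming landmark-description and image-crop embeddings (e.g., from a VLM such as CLIP) have already been computed; the similarity threshold and one-to-one assignment are illustrative choices, not the paper's exact procedure.

import numpy as np
from scipy.optimize import linear_sum_assignment

def match_landmarks(det_emb, lm_emb, thresh=0.25):
    # Associate detected objects with text-labelled map landmarks by
    # cosine similarity of embeddings, then solve a one-to-one assignment.
    det = det_emb / np.linalg.norm(det_emb, axis=1, keepdims=True)
    lm = lm_emb / np.linalg.norm(lm_emb, axis=1, keepdims=True)
    sim = det @ lm.T                           # cosine similarity matrix
    rows, cols = linear_sum_assignment(-sim)   # maximize total similarity
    return [(r, c) for r, c in zip(rows, cols) if sim[r, c] > thresh]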
|
|
10:30-12:00, Paper ThAT25-NT.3 | Add to My Program |
Multiple Update Particle Filter: Position Estimation by Combining GNSS Pseudorange and Carrier Phase Observations |
|
Suzuki, Taro | Chiba Institute of Technology |
Keywords: Localization
Abstract: This paper presents an efficient method for updating particles in a particle filter (PF) to address the position estimation problem when dealing with sharp-peaked likelihood functions derived from multiple observations. Sharp-peaked likelihood functions commonly arise from millimeter-accurate distance observations of carrier phases in the global navigation satellite system (GNSS). However, when such likelihood functions are used for particle weight updates, the absence of particles within the peaks leads to all particle weights becoming zero. To overcome this problem, this study introduces a straightforward and effective approach for updating particles when dealing with sharp-peaked likelihood functions obtained from multiple observations. The proposed method, termed the multiple update PF, leverages prior knowledge regarding the spread of distribution of each likelihood function and conducts weight updates and resampling iteratively in the particle update process, prioritizing the likelihood function spreads. Experimental results demonstrate the efficacy of our proposed method, particularly when applied to position estimation utilizing GNSS pseudorange and carrier phase observations. The multiple update PF exhibits faster convergence with fewer particles when compared to the conventional PF. Moreover, vehicle position estimation experiments conducted in urban environments reveal that the proposed method outperforms conventional GNSS positioning techniques, yielding more accurate position estimates.
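The core idea, applying broad likelihoods before sharp ones with resampling in between, can be sketched in a few lines. The (eval_fn, spread) interface, the ordering by spread, and the resampling jitter are assumptions made for this illustration.

import numpy as np

def multiple_update(particles, weights, likelihoods):
    # Apply each observation's likelihood in order of decreasing spread
    # (broad pseudorange first, sharp carrier phase last), resampling
    # in between so particles survive inside ever narrower peaks.
    # `likelihoods` is a list of (eval_fn, spread) pairs (hypothetical).
    for eval_fn, _ in sorted(likelihoods, key=lambda lf: -lf[1]):
        weights = weights * eval_fn(particles)
        weights /= weights.sum()
        idx = np.random.choice(len(particles), len(particles), p=weights)
        particles = particles[idx] + np.random.randn(*particles.shape) * 1e-3
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights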
|
|
10:30-12:00, Paper ThAT25-NT.4 | Add to My Program |
Robust Lifelong Indoor LiDAR Localization Using the Area Graph |
|
Xie, Fujing | Shanghaitech University |
Schwertfeger, Sören | ShanghaiTech University |
Keywords: Localization, Autonomous Vehicle Navigation, Semantic Scene Understanding
Abstract: Lifelong indoor localization in a given map is the basis for navigation of autonomous mobile robots. In this letter, we address the problem of robust localization in cluttered indoor environments like office spaces and corridors, using 3D LiDAR point clouds in a given Area Graph, which is a hierarchical, topometric semantic map representation that uses polygons to demark areas such as rooms, corridors or buildings. This representation is very compact, can represent different floors of buildings through its hierarchy, and provides semantic information that helps with localization, like the poses of doors and glass. In contrast, commonly used map representations, such as occupancy grid maps or point clouds, lack these features and require frequent updates in response to environmental changes (e.g. moved furniture). Our approach, AGLoc, instead matches against lifelong architectural features such as walls and doors. For that, we apply filtering to remove clutter from the 3D input point cloud and then employ further scoring and weight functions for localization. Given a broad initial guess from WiFi and barometer localization, our experiments show that our global localization and the weighted point-to-line ICP pose tracking perform very well, even when compared to localization and SLAM algorithms that use the current, feature-rich cluttered map for localization.
|
|
10:30-12:00, Paper ThAT25-NT.5 | Add to My Program |
Multi-Camera Asynchronous Ball Localization and Trajectory Prediction with Factor Graphs and Human Poses |
|
Xiao, Qingyu | Georgia Institute of Technology |
Zaidi, Zulfiqar | Georgia Institute of Technology |
Gombolay, Matthew | Georgia Institute of Technology |
Keywords: Localization, Probabilistic Inference
Abstract: The rapid and precise localization and prediction of a ball are critical for developing agile robots in ball sports, particularly in sports like tennis characterized by high-speed ball movements and powerful spins. The Magnus effect induced by spin adds complexity to trajectory prediction during flight and to bounce dynamics upon contact with the ground. In this study, we introduce an innovative approach that combines a multi-camera system with factor graphs for real-time and asynchronous 3D tennis ball localization. Additionally, we estimate hidden states like velocity and spin for trajectory prediction. Furthermore, to enhance spin inference early in the ball's flight, where limited observations are available, we integrate human pose data using a temporal convolutional network (TCN) to compute spin priors within the factor graph. This refinement provides more accurate spin priors at the beginning of the factor graph, leading to improved early-stage hidden-state inference for prediction. Our results show the trained TCN can predict the spin priors with an RMSE of 5.27 Hz. Integrating the TCN into the factor graph reduces the prediction error of landing positions by over 63.6% compared to a baseline method that utilizes an adaptive extended Kalman filter.
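For intuition, flight prediction with spin amounts to integrating a point-mass model with drag and a Magnus term; a minimal Euler-step sketch with illustrative coefficients (not the paper's identified values, and with bounce dynamics omitted) is:

import numpy as np

def step_ball(p, v, w, dt=0.005, kd=0.1, km=0.02):
    # One Euler step of a spinning-ball flight model:
    #   a = g - kd * |v| * v + km * (w x v)
    # where w is the spin vector (rad/s) producing the Magnus force.
    g = np.array([0.0, 0.0, -9.81])
    a = g - kd * np.linalg.norm(v) * v + km * np.cross(w, v)
    return p + v * dt, v + a * dt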
|
|
10:30-12:00, Paper ThAT25-NT.6 | Add to My Program |
Doppler-Only Single-Scan 3D Vehicle Odometry |
|
Galeote-Luque, Andres | Universidad De Málaga |
Kubelka, Vladimir | Örebro University |
Magnusson, Martin | Örebro University |
Ruiz-Sarmiento, J.R. | University of Malaga |
Gonzalez-Jimenez, Javier | University of Malaga |
Keywords: Localization, Range Sensing, Autonomous Vehicle Navigation
Abstract: We present a novel 3D odometry method that recovers the full motion of a vehicle from a Doppler-capable range sensor alone. It leverages the radial velocities measured from the scene, estimating the sensor's velocity from a single scan. The vehicle's 3D motion, defined by its linear and angular velocities, is calculated taking into consideration its kinematic model, which provides a constraint between the velocity measured at the sensor frame and the vehicle frame. Experiments carried out prove the viability of our single-sensor method compared to mounting an additional IMU: our method provides a more reliable translation estimate, avoiding the errors that IMUs accumulate due to noise and biases. Its short-term accuracy and fast operation (~5 ms) make it a proper candidate to supply the initialization to more complex localization algorithms or mapping pipelines. Not only does it reduce the error of the mapper, but it does so at a level of accuracy comparable to an IMU, all without the need to mount and calibrate an extra sensor on the vehicle.
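The single-scan velocity estimate at the heart of such methods reduces to a small least-squares problem; a minimal sketch follows (static-scene assumption, outlier rejection of moving objects omitted):

import numpy as np

def sensor_velocity(dirs, v_radial):
    # Estimate the sensor's linear velocity from one Doppler scan.
    # For a static scene, each return with unit direction d_i measures
    # v_r,i = -d_i @ v_sensor, so v_sensor is the least-squares solution
    # of dirs @ v = -v_radial.
    v, *_ = np.linalg.lstsq(dirs, -v_radial, rcond=None)
    return v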
|
|
10:30-12:00, Paper ThAT25-NT.7 | Add to My Program |
Do We Need Scan-Matching in Radar Odometry? |
|
Kubelka, Vladimir | Örebro University |
Fritz, Emil | Örebro University |
Magnusson, Martin | Örebro University |
Keywords: Localization, Range Sensing, Data Sets for SLAM
Abstract: There is a current increase in the development of “4D” Doppler-capable radar and lidar range sensors that produce 3D point clouds where all points also have information about the radial velocity relative to the sensor. 4D radars in particular are interesting for object perception and navigation in low-visibility conditions (dust, smoke) where lidars and cameras typically fail. With the advent of high-resolution Doppler-capable radars comes the possibility of estimating odometry from single point clouds, foregoing the need for scan registration which is error-prone in feature-sparse field environments. We compare several odometry estimation methods, from direct integration of Doppler/IMU data and Kalman filter sensor fusion to 3D scan-to-scan and scan-to-map registration, on three datasets with data from two recent 4D radars and two IMUs. Surprisingly, our results show that the odometry from Doppler and IMU data alone give similar or better results than 3D point cloud registration. In our experiments, the position drift can be as low as 0.9% over 1.8 and 4.5 km trajectories. That allows accurate estimation of 6-DOF ego-motion over long distances also in feature-sparse mine environments. These results are useful not least for applications of navigation with resource-constrained robot platforms in feature-sparse and low-visibility conditions such as mining, construction, and search & rescue operations.
|
|
10:30-12:00, Paper ThAT25-NT.8 | Add to My Program |
Outram: One-Shot Global Localization Via Triangulated Scene Graph and Global Outlier Pruning |
|
Yin, Pengyu | Nanyang Technological University |
Cao, Haozhi | Nanyang Technological University |
Nguyen, Thien-Minh | Nanyang Technological University |
Yuan, Shenghai | Nanyang Technological University |
Zhang, Shuyang | The Hong Kong University of Science and Technology |
Liu, Kangcheng | ETH Zurich |
Xie, Lihua | Nanyang Technological University |
Keywords: Localization, Semantic Scene Understanding, SLAM
Abstract: One-shot LiDAR localization refers to the ability to estimate the robot pose from one single point cloud, which yields significant advantages in initialization and relocalization processes. In the point cloud domain, the topic has been extensively studied as a global descriptor retrieval (i.e., loop closure detection) and pose refinement (i.e., point cloud registration) problem both in isolation or combined. However, few have explicitly considered the relationship between candidate retrieval and correspondence generation in pose estimation, leaving them brittle to substructure ambiguities. To this end, we propose a hierarchical one-shot localization algorithm called Outram that leverages substructures of 3D scene graphs for locally consistent correspondence searching and global substructure-wise outlier pruning. Such a hierarchical process couples the feature retrieval and the correspondence extraction to resolve the substructure ambiguities by conducting a local-to-global consistency refinement. We demonstrate the capability of Outram in a variety of scenarios in multiple large-scale outdoor datasets. Our implementation is open-sourced: https://github.com/Pamphlett/Outram.
|
|
10:30-12:00, Paper ThAT25-NT.9 | Add to My Program |
Fast and Consistent Covariance Recovery for Sliding-Window Optimization-Based VINS |
|
Chen, Chuchu | University of Delaware |
Peng, Yuxiang | University of Delaware |
Huang, Guoquan | University of Delaware |
Keywords: Localization, Visual-Inertial SLAM, SLAM
Abstract: In this paper, we introduce a novel and efficient technique for consistent covariance recovery in nonlinear optimization-based Visual-Inertial Navigation Systems (VINS). Estimating uncertainty in real time is crucial for evaluating system performance and enhancing downstream operations such as data association. However, accessing the marginal covariance of the state variables of interest in optimization-based VINS presents a significant challenge: a computational bottleneck due to the need to invert the high-dimensional information (Hessian) matrix. In our recent work [1], the First-Estimates Jacobian (FEJ) methodology was used to properly fix state linearization points in optimization-based VINS, which seems counter-intuitive but improves estimation performance in both consistency and accuracy. Capitalizing on this unique aspect of the FEJ strategy, in this work we carefully design the covariance recovery algorithm to improve efficiency by avoiding redundant computation. Remarkably, our approach achieves a computational speed that is 4-10 times faster than existing methods. Through comprehensive numerical evaluations across four state-of-the-art marginalization archetypes, we not only affirm the consistency of our covariance estimates but also underscore their superior computational efficiency.
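For context, the textbook way to recover a marginal covariance block from the information matrix without forming the full inverse is sketched below (dense for clarity; the paper's contribution lies in avoiding redundant computation beyond this baseline by exploiting FEJ):

import numpy as np

def marginal_covariance(H, idx):
    # Recover the marginal covariance of the state block `idx` from the
    # information (Hessian) matrix H: solve H X = E for the selection
    # columns via a Cholesky factor, then read off the block of H^{-1}.
    L = np.linalg.cholesky(H)
    E = np.zeros((H.shape[0], len(idx)))
    E[idx, np.arange(len(idx))] = 1.0
    X = np.linalg.solve(L.T, np.linalg.solve(L, E))   # X = H^{-1} E
    return X[idx, :]                                  # marginal block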
|
|
ThAT26-NT Oral Session, NT-G404 |
Add to My Program |
SLAM IV |
|
|
Chair: Pu, Jian | Fudan University |
Co-Chair: Rosen, David | Northeastern University |
|
10:30-12:00, Paper ThAT26-NT.1 | Add to My Program |
LONER: LiDAR Only Neural Representations for Real-Time SLAM |
|
Isaacson, Seth | University of Michigan |
Kung, Pou-Chun | University of Michigan, Ann Arbor |
Srinivasan Ramanagopal, Manikandasriram | Carnegie Mellon University |
Vasudevan, Ram | University of Michigan |
Skinner, Katherine | University of Michigan |
Keywords: SLAM, Mapping, Deep Learning Methods
Abstract: This paper proposes LONER, the first real-time LiDAR SLAM algorithm that uses a neural implicit scene representation. Existing implicit mapping methods for LiDAR show promising results in large-scale reconstruction, but either require groundtruth poses or run slower than real-time. In contrast, LONER uses LiDAR data to train an MLP to estimate a dense map in real-time, while simultaneously estimating the trajectory of the sensor. To achieve real-time performance, this paper proposes a novel information-theoretic loss function that accounts for the fact that different regions of the map may be learned to varying degrees throughout online training. The proposed method is evaluated qualitatively and quantitatively on two open-source datasets. This evaluation illustrates that the proposed loss function converges faster and leads to more accurate geometry reconstruction than other loss functions used in depth-supervised neural implicit frameworks. Finally, this paper shows that LONER estimates trajectories competitively with state-of-the-art LiDAR SLAM methods, while also producing dense maps competitive with existing real-time implicit mapping methods that use groundtruth poses.
|
|
10:30-12:00, Paper ThAT26-NT.2 | Add to My Program |
LIO-EKF: High Frequency LiDAR-Inertial Odometry Using Extended Kalman Filters |
|
Wu, Yibin | University of Bonn |
Guadagnino, Tiziano | University of Bonn |
Wiesmann, Louis | University of Bonn |
Klingbeil, Lasse | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Kuhlmann, Heiner | University of Bonn |
Keywords: SLAM, Mapping, Localization
Abstract: Odometry estimation is crucial for every autonomous system requiring navigation in an unknown environment. In modern mobile robots, 3D LiDAR-inertial systems are often used for this task. By fusing LiDAR scans and IMU measurements, these systems can reduce the accumulated drift caused by sequentially registering individual LiDAR scans and provide a robust pose estimate. Although effective, LiDAR-inertial odometry systems require proper parameter tuning to be deployed. In this paper, we propose LIO-EKF, a tightly-coupled LiDAR-inertial odometry system based on point-to-point registration and the classical extended Kalman filter scheme. We propose an adaptive data association that considers the relative pose uncertainty, the map discretization errors, and the LiDAR noise. In this way, we can substantially reduce the parameters to tune for a given type of environment. The experimental evaluation suggests that the proposed system performs on par with the state-of-the-art LiDAR-inertial odometry pipelines but is significantly faster in computing the odometry. The source code of our implementation is publicly available (https://github.com/YibinWu/LIO-EKF).
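For reference, the measurement-update step of any such tightly-coupled filter has the standard EKF form; a generic dense sketch is given below (the paper's contribution lies in the adaptive data association that feeds r, H, and R, not in the update itself):

import numpy as np

def ekf_update(x, P, r, H, R):
    # Standard EKF measurement update: r is the stacked point-to-point
    # residual, H its Jacobian w.r.t. the (error) state, R the
    # measurement covariance.
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x + K @ r                            # state correction
    P = (np.eye(len(x)) - K @ H) @ P         # covariance update
    return x, P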
|
|
10:30-12:00, Paper ThAT26-NT.3 | Add to My Program |
Multi-LIO: A Lightweight Multiple LiDAR-Inertial Odometry System |
|
Chen, Qi | Fudan University |
Li, Guanghao | Fudan University |
Xue, Xiangyang | Fudan University |
Pu, Jian | Fudan University |
Keywords: SLAM, Mapping, Localization
Abstract: The integration of multiple LiDAR sensors has the potential to significantly enhance odometry systems by providing comprehensive environmental measurements. However, current multiple LiDAR-inertial odometry frameworks face challenges in real-time processing due to the voluminous data generated. This paper introduces a real-time, computationally efficient multiple LiDAR-inertial odometry system (Multi-LIO) that outperforms existing state-of-the-art solutions in accuracy and scalability. Utilizing a novel parallel strategy for state updates and a voxelized map format, Multi-LIO optimizes computational efficiency. Furthermore, we introduce a point-wise uncertainty estimation method to augment the accuracy of scan-to-map registration, particularly in large-scale and complex scenarios. We validate our system's performance through extensive experiments on various challenging sequences. Multi-LIO emerges as a robust, scalable, and extensible solution, adaptable to various LiDAR configurations.
|
|
10:30-12:00, Paper ThAT26-NT.4 | Add to My Program |
The Importance of Coordinate Frames in Dynamic SLAM |
|
Morris, Jesse | University of Sydney |
Wang, Yiduo | University of Sydney |
Ila, Viorela | The University of Sydney |
Keywords: SLAM, Mapping, Localization
Abstract: Most simultaneous localisation and mapping (SLAM) systems have traditionally assumed a static world, which does not align with real-world scenarios. To enable robots to safely navigate and plan in dynamic environments, it is essential to employ representations capable of handling moving objects. Dynamic SLAM is an emerging field in SLAM research as it improves the overall system accuracy while providing additional estimates of object motions. The state-of-the-art literature presents two main formulations for Dynamic SLAM, representing dynamic object points in either the world or the object coordinate frame. While expressing object points in their local reference frame may seem intuitive, it does not necessarily lead to the most accurate and robust solutions. This paper conducts and presents a thorough analysis of various Dynamic SLAM formulations, identifying the best approach to address the problem. To this end, we introduce a front-end agnostic framework using GTSAM that can be used to evaluate various Dynamic SLAM formulations.
|
|
10:30-12:00, Paper ThAT26-NT.5 | Add to My Program |
VoxelMap++: Mergeable Voxel Mapping Method for Online LiDAR(-Inertial) Odometry |
|
Wu, Chang | University of Electronic Science and Technology of China (UESTC) |
You, Yuan | University of Electronic Science and Technology of China |
Yuan, Yifei | University of Electronic Science and Technology of China |
Kong, Xiaotong | University of Electronic Science and Technology of China (UESTC) |
Zhang, Ying | University of Electronic Science and Technology of China |
Li, Qiyan | University of Electronic Science and Technology of China |
Zhao, Kaiyong | Hong Kong Baptist University |
Keywords: SLAM, Mapping, Localization
Abstract: This paper presents VoxelMap++, a voxel mapping method with plane merging that can effectively improve the accuracy and efficiency of LiDAR(-inertial) based simultaneous localization and mapping (SLAM). The map is a collection of voxels, each containing one plane feature with a 3DOF representation and a corresponding covariance estimate. Since the full map contains a large number of coplanar features (child planes), the 3DOF estimates of these child planes can be regarded as measurements, with covariance, of a larger plane (the parent plane). We therefore design a plane merging module based on union-find which saves resources and further improves the accuracy of plane fitting. This module distinguishes the child planes in different voxels and merges them to estimate the parent plane. After merging, the parent plane's 3DOF representation is more accurate than those of the child planes and its uncertainty decreases significantly, which further improves the performance of LiDAR(-inertial) odometry. Experiments in challenging environments such as corridors and forests demonstrate the high accuracy and efficiency of our method compared to other state-of-the-art methods (see our attached video). Our implementation of VoxelMap++ is open-sourced on GitHub and is applicable to both non-repetitive scanning LiDARs and traditional scanning LiDARs.
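As a concrete illustration of the data structure behind the merging module, the sketch below implements a minimal union-find and a naive pairwise merging pass. The coplanar predicate is a hypothetical stand-in for the paper's statistical coplanarity test, and the quadratic loop is for clarity only.

    class UnionFind:
        def __init__(self, n):
            self.parent = list(range(n))

        def find(self, i):
            while self.parent[i] != i:
                self.parent[i] = self.parent[self.parent[i]]  # path halving
                i = self.parent[i]
            return i

        def union(self, i, j):
            ri, rj = self.find(i), self.find(j)
            if ri != rj:
                self.parent[ri] = rj

    def merge_planes(planes, coplanar):
        # After this pass, each union-find root stands for one parent plane
        # aggregating the 3DOF estimates of its child planes.
        uf = UnionFind(len(planes))
        for i in range(len(planes)):
            for j in range(i + 1, len(planes)):
                if coplanar(planes[i], planes[j]):
                    uf.union(i, j)
        return uf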
|
|
10:30-12:00, Paper ThAT26-NT.6 | Add to My Program |
Efficient and Consistent Bundle Adjustment on Lidar Point Clouds |
|
Liu, Zheng | University of Hong Kong |
Liu, Xiyuan | The University of Hong Kong |
Zhang, Fu | University of Hong Kong |
Keywords: SLAM, Mapping, Localization, Lidar bundle adjustment
Abstract: Simultaneous determination of sensor poses and scene geometry is a fundamental problem for robot vision that is often achieved by Bundle Adjustment (BA). This paper presents an efficient and consistent bundle adjustment method for lidar sensors. The method employs edge and plane features to represent the scene geometry, and directly minimizes the natural Euclidean distance from each raw point to the respective geometry feature. A nice property of this formulation is that the geometry features can be analytically solved, drastically reducing the dimension of the numerical optimization. To represent and solve the resultant optimization problem more efficiently, this paper then adopts and formalises the concept of point cluster, which encodes all raw points associated to the same feature by a compact set of parameters, the point cluster coordinates. We derive the closed-form derivatives, up to the second order, of the BA optimization based on the point cluster coordinates and show their theoretical properties such as the null spaces and sparsity. Based on these theoretical results, this paper develops an efficient second-order BA solver. Besides estimating the lidar
|
|
10:30-12:00, Paper ThAT26-NT.7 | Add to My Program |
DORF: A Dynamic Object Removal Framework for Robust Static LiDAR Mapping in Urban Environments |
|
Chen, Zhiming | Hong Kong University of Science and Technology |
Zhang, Kun | Hong Kong University of Science and Technology |
Chen, Hua | Southern University of Science and Technology |
Wang, Michael Yu | Great Bay University |
Zhang, Wei | Southern University of Science and Technology |
Yu, Hongyu | The Hong Kong University of Science and Technology |
Keywords: SLAM, Mapping, Range Sensing
Abstract: 3D point cloud maps are widely used in robotic tasks like localization and planning. However, dynamic objects such as cars and pedestrians can introduce ghost artifacts during the map generation process, leading to reduced map quality and hindering normal robot navigation. Online dynamic object removal methods are restricted to local scope information and have limited performance. To address this challenge, we propose DORF (Dynamic Object Removal Framework), a novel coarse-to-fine offline framework that exploits global 4D spatial-temporal LiDAR information to generate clean static point cloud maps, reaching state-of-the-art performance among existing offline methods. DORF first conservatively preserves the definitely static points leveraging our proposed Receding Horizon Sampling (RHS) mechanism. Then DORF gradually recovers more ambiguous static points, guided by the inherent characteristic of dynamic objects in urban environments: they must interact with the ground. We validate the effectiveness and robustness of DORF across various types of highly dynamic datasets.
|
|
10:30-12:00, Paper ThAT26-NT.8 | Add to My Program |
ImMesh: An Immediate LiDAR Localization and Meshing Framework |
|
Lin, Jiarong | The University of Hong Kong |
Yuan, Chongjian | The University of Hong Kong |
Cai, Yixi | University of Hong Kong |
Li, Haotian | The University of Hong Kong |
Ren, Yunfan | The University of Hong Kong |
Zou, Yuying | The University of Hong Kong |
Hong, Xiaoping | Southern University of Science and Technology |
Zhang, Fu | University of Hong Kong |
Keywords: SLAM, Mapping, Sensor Fusion, 3D Reconstruction
Abstract: In this paper, we propose a novel LiDAR(-inertial) odometry and mapping framework to achieve the goal of simultaneous localization and meshing in real time. This proposed framework, termed ImMesh, comprises four tightly-coupled modules: receiver, localization, meshing, and broadcaster. The localization module first utilizes the preprocessed sensor data from the receiver, estimates the sensor pose online by registering LiDAR scans to maps, and dynamically grows the map. Then, our meshing module takes the registered LiDAR scan for incrementally reconstructing the triangle mesh on the fly. Finally, the real-time odometry, map, and mesh are published via our broadcaster. The primary contribution of this work is the meshing module, which represents a scene by an efficient voxel structure, performs fast finding of voxels observed by new scans, and incrementally reconstructs triangle facets in each voxel. To the best of our knowledge, this is the first work in the literature to reconstruct the triangle mesh of large-scale scenes online, relying only on a standard CPU without GPU acceleration.
|
|
10:30-12:00, Paper ThAT26-NT.9 | Add to My Program |
OASIS: Optimal Arrangements for Sensing in SLAM |
|
Kaveti, Pushyami | Northeastern University |
Giamou, Matthew | McMaster University |
Singh, Hanumant | Northeastern University |
Rosen, David | Northeastern University |
Keywords: SLAM, Methods and Tools for Robot System Design, Probability and Statistical Methods
Abstract: The number and arrangement of sensors on a mobile robot dramatically influence its perception capabilities. Ensuring that sensors are mounted in a manner that enables accurate detection, localization, and mapping is essential for the success of downstream control tasks. However, when designing a new robotic platform, researchers and practitioners alike usually mimic standard configurations or maximize simple heuristics like field-of-view (FOV) coverage to decide where to place exteroceptive sensors. In this work, we conduct an information-theoretic investigation of this overlooked element of robotic perception in the context of simultaneous localization and mapping (SLAM). We show how to formalize the sensor arrangement problem as a form of subset selection under the E-optimality performance criterion. While this formulation is NP-hard in general, we show that a combination of greedy sensor selection and fast convex relaxation-based post-hoc verification enables the efficient recovery of certifiably optimal sensor designs in practice. Results from synthetic experiments reveal that sensors placed with OASIS outperform benchmarks in terms of mean squared error of visual SLAM estimates.
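The selection step described above can be made concrete with a few lines of code. The sketch below greedily picks sensor candidates to maximize the minimum eigenvalue of the accumulated information matrix (the E-optimality criterion); the candidate matrices are synthetic placeholders, and the paper's convex relaxation-based verification stage is omitted.

    import numpy as np

    def greedy_e_optimal(infos, k):
        """Greedily pick k candidate information matrices so that the
        smallest eigenvalue of their sum is as large as possible."""
        d = infos[0].shape[0]
        chosen, total = [], np.zeros((d, d))
        for _ in range(k):
            best, best_val = None, -np.inf
            for i, Mi in enumerate(infos):
                if i in chosen:
                    continue
                val = np.linalg.eigvalsh(total + Mi)[0]  # smallest eigenvalue
                if val > best_val:
                    best, best_val = i, val
            chosen.append(best)
            total += infos[best]
        return chosen, np.linalg.eigvalsh(total)[0]

    # Toy usage with random positive semidefinite candidates:
    rng = np.random.default_rng(0)
    cands = []
    for _ in range(10):
        A = rng.normal(size=(3, 3))
        cands.append(A @ A.T)
    print(greedy_e_optimal(cands, k=4))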
|
|
ThAT27-NT Oral Session, NT-G2 |
Add to My Program |
Dexterous Manipulation I |
|
|
Chair: Pinto, Lerrel | New York University |
Co-Chair: Kasaei, Mohammadreza | University of Edinburgh |
|
10:30-12:00, Paper ThAT27-NT.1 | Add to My Program |
See to Touch: Learning Tactile Dexterity through Visual Incentives |
|
Guzey, Irmak | New York University |
Dai, Yinlong | NYU |
Evans, Ben | New York University |
Chintala, Soumith | Facebook AI Research |
Pinto, Lerrel | New York University |
Keywords: Dexterous Manipulation, Force and Tactile Sensing, Imitation Learning
Abstract: Equipping multi-fingered robots with tactile sensing is crucial for achieving the precise, contact-rich, and dexterous manipulation that humans excel at. However, relying solely on tactile sensing fails to provide adequate cues for reasoning about objects' spatial configurations, limiting the ability to correct errors and adapt to changing situations. In this paper, we present Tactile Adaptation from Visual Incentives (TAVI), a new framework that enhances tactile-based dexterity by optimizing dexterous policies using vision-based rewards. First, we use a contrastive-based objective to learn visual representations. Next, we construct a reward function from these visual representations through optimal-transport based matching on one human demonstration. Finally, we use online reinforcement learning on our robot to optimize tactile-based policies that maximize the visual reward. On six challenging tasks, such as peg pick-and-place, unstacking bowls, and flipping slender objects, TAVI achieves a success rate of 73% using our four-fingered Allegro robot hand. This success rate is 108% higher than that of policies using tactile and vision-based rewards, and 135% higher than that of policies without tactile observational input. Robot videos are best viewed on our project website: https://see-to-touch.github.io/.
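To make the reward construction more concrete, here is a minimal sketch of an optimal-transport matching reward between robot and demonstration feature sequences. The entropic (Sinkhorn) solver, the cosine cost, and all dimensions are generic assumptions rather than the authors' exact design.

    import numpy as np

    def sinkhorn(C, eps=0.1, iters=200):
        """Entropic optimal-transport plan for cost matrix C, uniform marginals."""
        n, m = C.shape
        K = np.exp(-C / eps)
        v = np.ones(m)
        for _ in range(iters):
            u = (1.0 / n) / (K @ v)
            v = (1.0 / m) / (K.T @ u)
        return u[:, None] * K * v[None, :]

    def ot_reward(robot_feats, demo_feats):
        a = robot_feats / np.linalg.norm(robot_feats, axis=1, keepdims=True)
        b = demo_feats / np.linalg.norm(demo_feats, axis=1, keepdims=True)
        C = 1.0 - a @ b.T                  # cosine distance cost
        P = sinkhorn(C)
        return -np.sum(P * C)              # higher reward = closer to the demo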
|
|
10:30-12:00, Paper ThAT27-NT.2 | Add to My Program |
Real-Time Contact State Estimation in Shape Control of Deformable Linear Objects under Small Environmental Constraints |
|
Chen, Kejia | Technical University of Munich |
Bing, Zhenshan | Technical University of Munich |
Wu, Yansong | Technische Universität München |
Wu, Fan | Technical University of Munich |
Zhang, Liding | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Dual Arm Manipulation, Assembly, Force and Tactile Sensing
Abstract: Controlling the shape of deformable linear objects using robots and constraints provided by environmental fixtures has diverse industrial applications. In order to establish robust contacts with these fixtures, accurate estimation of the contact state is essential for preventing and rectifying potential anomalies. However, this task is challenging due to the small sizes of the fixtures, the requirement for real-time performance, and the infinite degrees of freedom of deformable linear objects. In this paper, we propose a real-time approach for estimating both contact establishment and subsequent changes by leveraging the dependency between the applied and detected contact force on the deformable linear objects. We seamlessly integrate this method into the robot control loop and achieve an adaptive shape control framework which avoids, detects, and corrects anomalies automatically. Real-world experiments validate the robustness and effectiveness of our contact estimation approach across various scenarios, significantly increasing the success rate of shape control processes.
|
|
10:30-12:00, Paper ThAT27-NT.3 | Add to My Program |
Self-Supervised Learning for Joint Pushing and Grasping Policies in Highly Cluttered Environments |
|
Wang, Yongliang | University of Groningen |
Mokhtar, Kamal | University of Groningen |
Heemskerk, Cock | Heemskerk Innovative Technology |
Kasaei, Hamidreza | University of Groningen |
Keywords: Dual Arm Manipulation, Reinforcement Learning
Abstract: Robotic systems often face challenges when attempting to grasp a target object due to interference from surrounding items. We propose a Deep Reinforcement Learning (DRL) method that develops joint policies for grasping and pushing, enabling effective manipulation of target objects within untrained, densely cluttered environments. In particular, a dual RL model is introduced, which presents high resilience in handling complicated scenes, reaching an average of 98% task completion in simulation and real-world scenes. To evaluate the proposed method, we conduct comprehensive simulation experiments in three distinct environments: densely packed building blocks, randomly positioned building blocks, and common household objects. Further, real-world tests are conducted using actual robots to confirm the robustness of our approach in various untrained and highly cluttered environments. The results from experiments underscore the superior efficacy of our method in both simulated and real-world scenarios, outperforming recent state-of-the-art methods. To ensure reproducibility and further the academic discourse, we make available a demonstration video, the trained models, and the source code for public access.
|
|
10:30-12:00, Paper ThAT27-NT.4 | Add to My Program |
A Robust Model Predictive Controller for Tactile Servoing |
|
Wang, Shuai | Tencent |
Huang, Yihao | The University of Manchester |
Lee, Wang Wei | Tencent |
Liu, Tianliang | Harbin Institute of Technology |
Teng, Xiao | Keppel-NUS Corporate Lab, National University of Singapore |
Zheng, Yu | Tencent |
Li, Qiang | Shenzhen Technology University |
Keywords: Dexterous Manipulation, Grasping, Sensor-based Control
Abstract: Tactile servoing is an effective approach to enabling robots to safely interact with unknown environments. One of the core problems in tactile servoing is to robustly converge the contact features to the desired ones via a dedicated controller. This paper proposes a Data-Driven Model Predictive Controller (DDMPC) to compute the motion command given the previous interaction experience and feature deviations in tactile space. Compared with manually designed PID-based controllers, the proposed controller is grounded in sound control theory and its convergence is guaranteed from a computational perspective. It is applied to the balancing control of a rolling bottle on a robotic forearm covered by a custom tactile sensor array. The real-world experiment demonstrates the superior robustness of the proposed approach and shows its great potential for other tactile servoing scenarios with measurement noise, which is inevitable for current tactile sensors.
|
|
10:30-12:00, Paper ThAT27-NT.5 | Add to My Program |
Harnessing the Synergy between Pushing, Grasping, and Throwing to Enhance Object Manipulation in Cluttered Scenarios |
|
Kasaei, Hamidreza | University of Groningen |
Kasaei, Mohammadreza | University of Edinburgh |
Keywords: Reinforcement Learning, Machine Learning for Robot Control, Sensorimotor Learning
Abstract: In this work, we delve into the intricate synergy among non-prehensile actions like pushing, and prehensile actions such as grasping and throwing, within the domain of robotic manipulation. We introduce an innovative approach to learning these synergies by leveraging model-free deep reinforcement learning. The robot's workflow involves detecting the pose of the target object and the basket at each time step, predicting the optimal push configuration to isolate the target object, determining the appropriate grasp configuration, and inferring the necessary parameters for an accurate throw into the basket. This empowers robots to skillfully reconfigure cluttered scenarios through pushing, creating space for collision-free grasping actions. Simultaneously, we integrate throwing behavior, showcasing how this action significantly extends the robot's operational reach. Ensuring safety, we developed a simulation environment in Gazebo for robot training, applying the learned policy directly to our real robot. Notably, this work represents a pioneering effort to learn the synergy between pushing, grasping, and throwing actions. Extensive experimentation in both simulated and real-robot scenarios substantiates the effectiveness of our approach across diverse settings. Our approach achieves a success rate exceeding 80% in both simulated and real-world scenarios. A video showcasing our experiments is available online at: https://youtu.be/q1l4BJVDbRw
|
|
10:30-12:00, Paper ThAT27-NT.6 | Add to My Program |
Direct Self-Identification of Inverse Jacobians for Dexterous Manipulation through Particle Filtering |
|
Grace, Joshua | Yale University |
Chanrungmaneekul, Podshara | Rice University |
Hang, Kaiyu | Rice University |
Dollar, Aaron | Yale University |
Keywords: Dexterous Manipulation, In-Hand Manipulation, Model Learning for Control
Abstract: The ability to plan and control robotic in-hand manipulation is challenged by several issues, including the required amount of prior knowledge of the system and the sophisticated physics that varies across different robot hands or even grasp instances. One of the most direct models of in-hand manipulation is the inverse Jacobian, which can directly map from the desired in-hand object motions to the required hand actuator controls. However, acquiring such inverse Jacobians without complex hand-object system models is typically infeasible. We present a method for controlling in-hand manipulation using inverse Jacobians that are self-identified by a particle filter-based estimation scheme that leverages the ability of underactuated hands to maintain a passively stable grasp during self-identification movements. This method requires no a priori knowledge of the specific hand-object system and learns the system's inverse Jacobian through small exploratory motions. Our system approximates the underlying inverse Jacobian closely, which can be used to perform manipulation tasks across a range of objects successfully. With extensive experiments on a Yale Model O hand, we show that the proposed system can provide accurate in-hand manipulation of sub-millimeter precision and that the inverse Jacobian-based controller can support real-time manipulation control of up to 900Hz.
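The self-identification idea lends itself to a compact toy example. Below, a particle filter estimates an unknown linear action-to-motion map from small exploratory motions; every number here (dimensions, noise scales, jitter) is an illustrative assumption, and the real system estimates a local inverse Jacobian of a far richer hand-object model.

    import numpy as np

    rng = np.random.default_rng(1)
    n_particles, du, dx = 500, 3, 2
    J_true = rng.normal(size=(dx, du))             # unknown hand-object map
    particles = rng.normal(size=(n_particles, dx, du))

    for _ in range(50):                            # small exploratory motions
        u = 0.01 * rng.normal(size=du)             # probing action
        x_obs = J_true @ u + 1e-4 * rng.normal(size=dx)
        pred = particles @ u                       # each particle's prediction
        err = np.sum((pred - x_obs) ** 2, axis=1)
        logw = -err / (2 * 1e-3 ** 2)              # soft measurement likelihood
        w = np.exp(logw - logw.max())              # stabilized weights
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)   # resample
        particles = particles[idx] + 1e-2 * rng.normal(size=particles.shape)

    J_est = particles.mean(axis=0)                 # point estimate of the map
    print(np.linalg.norm(J_est - J_true))          # should shrink toward zero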
|
|
10:30-12:00, Paper ThAT27-NT.7 | Add to My Program |
Masked Visual-Tactile Pre-Training for Robot Manipulation |
|
Liu, Qingtao | Zhejiang University |
Ye, Qi | Zhejiang University |
Sun, Zhengnan | Zhejiang University |
Cui, Yu | Zhejiang University |
Li, Gaofeng | Zhejiang University |
Chen, Jiming | Zhejiang University |
Keywords: Dexterous Manipulation, Sensor Fusion, Visual Learning
Abstract: Recent works on pre-training for robot manipulation have demonstrated that representations learned from large-scale human manipulation data can generalize well to new manipulation tasks and environments. However, these approaches mainly focus on human vision or natural language, neglecting tactile feedback. In this article, we explore how to pre-train a representation model for robotic manipulation using both visual and tactile data from human manipulation. We develop a system for collecting visual and tactile data, featuring a cost-effective tactile glove to capture human tactile data and a HoloLens 2 to capture visual data. With this system, we collect a dataset of turning bottle caps. Furthermore, we introduce a novel visual-tactile fusion network and learning strategy, with one key module tokenizing 20 sparse binary tactile signals that sense touch states for learning tactile context, and another applying attention and masking to the interaction of visual and tactile tokens for visual-tactile representation learning. We use our dataset to pre-train the fusion model and embed the pre-trained model into a reinforcement learning framework for downstream tasks. Experimental results demonstrate that our pre-trained model significantly aids in learning manipulation skills. Compared to methods without pre-training, our approach achieves a success rate increase of over 60%. Additionally, our success rate exceeds that of current visual pre-training methods by more than 50%.
|
|
10:30-12:00, Paper ThAT27-NT.8 | Add to My Program |
Tactile Estimation of Extrinsic Contact Patch for Stable Placement |
|
Ota, Kei | Tokyo Institute of Technology |
Jha, Devesh | Mitsubishi Electric Research Laboratories |
Jatavallabhula, Krishna Murthy | MIT |
Kanezaki, Asako | Tokyo Institute of Technology |
Tenenbaum, Joshua | Massachusetts Institute of Technology |
Keywords: Dexterous Manipulation, Force and Tactile Sensing, Deep Learning in Grasping and Manipulation
Abstract: Precise perception of contact interactions is essential for fine-grained robot manipulation skills. In this paper, we present the design of feedback skills for robots that must learn to stack complex-shaped objects on top of each other (see Fig. 1). To design such a system, a robot should be able to reason about the stability of placement from very gentle contact interactions. Our results demonstrate that it is possible to infer the stability of object placement from tactile readings during contact formation between the object and its environment. In particular, we estimate the contact patch between a grasped object and its environment using force and tactile observations during contact formation; this patch can then be used to estimate the stability of the object upon release of the grasp. The proposed method is demonstrated on various pairs of objects that are used in a very popular board game.
|
|
10:30-12:00, Paper ThAT27-NT.9 | Add to My Program |
Robotic Manipulation of Hand Tools: The Case of Screwdriving |
|
Tang, Ling | Iowa State University |
Jia, Yan-Bin | Iowa State University |
Xue, Yuechuan | Amazon.com |
Keywords: Dexterous Manipulation, In-Hand Manipulation, Force Control
Abstract: Despite decades of steady research progress, the robotic hand is still far behind the human hand in terms of dexterity and versatility. A milestone in this quest for human-level performance will be possessing the skills of manipulating hand tools, for their non-trivial geometries and for the intricacies of controlling their contact-based interactions with objects, which are the final targets of manipulation. This paper investigates screwdriving by a robotic arm/hand pair, dealing with the chain of contacts connecting the substrate, screw, screwdriver, and fingertips. Considering rolling contacts and finger gaits, our force control scheme is derived through backward chaining to leverage the dynamics of the screwdriver and arm/hand. To maintain the fastening effort, estimations are carried out sequentially for the screwdriver's pose via optimization under visual and kinematic constraints, and for its applied wrench on the screw via solution drawing upon dynamics. This wrench, adjusted based on position/force feedback, is mapped by the grasp matrix to the desired fingertip forces, which are then used for computing torques to be exerted by the arm and hand to close the loop. Simulation and experiments with a Shadow Hand have been conducted for validations.
|
|
ThAT28-NT Oral Session, NT-G4 |
Add to My Program |
Perception for Grasping and Manipulation I |
|
|
Chair: Jin, Long | Lanzhou University |
Co-Chair: Lee, Dongjun | Seoul National University |
|
10:30-12:00, Paper ThAT28-NT.1 | Add to My Program |
You Only Scan Once: A Dynamic Scene Reconstruction Pipeline for 6-DoF Robotic Grasping of Novel Objects |
|
Zhou, Lei | National University of Singapore |
Wang, Haozhe | National University of Singapore |
Zhang, Zhengshen | National University of Singapore |
Liu, Zhiyang | National University of Singapore |
Tay, Francis | NUS |
Ang Jr, Marcelo H | National University of Singapore |
Keywords: Perception for Grasping and Manipulation
Abstract: In the realm of robotic grasping, achieving accurate and reliable interactions with the environment is a pivotal challenge. Traditional grasp planning methods that utilize partial point clouds derived from depth images often suffer from reduced scene understanding due to occlusion, ultimately impeding their grasping accuracy. Furthermore, scene reconstruction methods have primarily relied upon static techniques, which are susceptible to environment changes during the manipulation process, limiting their efficacy in real-time grasping tasks. To address these limitations, this paper introduces a novel two-stage pipeline for dynamic scene reconstruction. In the first stage, our approach takes a scene scan as input and registers each target object via mesh reconstruction and novel object pose tracking. In the second stage, pose tracking continues to provide object poses in real time, enabling our approach to transform the reconstructed object point clouds back into the scene. Unlike conventional methodologies that rely on static scene snapshots, our method continuously captures the evolving scene geometry, resulting in a comprehensive and up-to-date point cloud representation. By circumventing the constraints posed by occlusion, our method enhances the overall grasp planning process and empowers state-of-the-art 6-DoF robotic grasping algorithms to exhibit markedly improved accuracy.
|
|
10:30-12:00, Paper ThAT28-NT.2 | Add to My Program |
Thermoformed Electronic Skins for Conformal Tactile Sensor Arrays |
|
Lu, Peng | Tencent |
Liang, Jiaming | Tencent |
Huang, Bidan | Tencent |
Yang, Sicheng | Tencent |
Lee, Wang Wei | Tencent |
Keywords: Perception for Grasping and Manipulation
Abstract: Robots and prostheses are increasingly designed with curvilinear surfaces for functional, aesthetic, aerodynamic, and safety reasons. Electronic skins (e-skins) capable of sensing contact location and pressure across complex, non-developable surfaces are essential for empowering next-generation robots with tactile awareness. This will facilitate safe and natural human-machine interactions while enhancing object manipulation capabilities. Despite the evident advantages of conformal e-skins, current fabrication methods face significant challenges in realizing their full potential. In this paper, we introduce thermoforming as a technique to efficiently fabricate tactile sensitive e-skins that conform to curvilinear surfaces. The performance, repeatability and uniformity of the sensors are characterized in detail. We also present a custom calibration pipeline where accurate digital replicas of conformal e-skins are generated for use in simulations. Finally, we demonstrate the benefits of 3D e-skins in a tool manipulation task.
|
|
10:30-12:00, Paper ThAT28-NT.3 | Add to My Program |
The GEM-C Controller for Load Compensation in Object Manipulation |
|
Papadakis, Emmanouil | Foundation for Research and Technology - Hellas |
Sigalas, Markos | Foundation for Research and Technology - Hellas |
Vangos, Michail | Foundation for Research and Technology-Hellas |
Trahanias, Panos | Foundation for Research and Technology – Hellas (FORTH) |
Keywords: Perception for Grasping and Manipulation, Control Architectures and Programming, Industrial Robots
Abstract: Nowadays, robotic arms are ubiquitously employed for object manipulation across a spectrum of applications, spanning from production lines to warehouses, and encompassing both stationary and mobile robotic systems. Among the most prevalent end-effectors used for the majority of these applications are suction cups. The rudimentary act of grasping an object and relocating it, without awareness of the forces stemming from the object's motion and grip, can result in suboptimal and inefficient robot movements. In more dire circumstances, such negligent handling may cause the object to detach from the end-effector, potentially damaging either the object or the arm. In this paper, we build upon the advanced sensing and attaching capabilities of our suction cup MIGHTY and introduce GEM-C, a novel Gravity, External forces and Motion Compensation controller that constantly adapts the orientation of the suction cup so as to enhance the quality of attachment. Throughout all examined scenarios and experiments, our approach remarkably improved the robot's performance by providing the optimal end-effector pose while also reducing the stress on the motors and the overall power consumption. The derived results clearly demonstrate the potential of the MIGHTY and GEM-C schema for a wide range of long-term and complex robotic manipulation applications.
|
|
10:30-12:00, Paper ThAT28-NT.4 | Add to My Program |
Sim-To-Real Grasp Detection with Global-To-Local RGB-D Adaptation |
|
Ma, Haoxiang | Beihang University |
Qin, Ran | Beihang University |
Shi, Modi | Beihang University |
Gao, Boyang | Geometry Robotics Ltd. Harbin Institute of Technology |
Huang, Di | Beihang University |
Keywords: Perception for Grasping and Manipulation
Abstract: This paper focuses on the sim-to-real issue of RGB-D grasp detection and formulates it as a domain adaptation problem. In this case, we present a global-to-local method to address hybrid domain gaps in RGB and depth data and insufficient multi-modal feature alignment. First, a self-supervised rotation pre-training strategy is adopted to deliver robust initialization for RGB and depth networks. We then propose a global-to-local alignment pipeline with individual global domain classifiers for scene features of RGB and depth images as well as a local one specifically working for grasp features in the two modalities. In particular, we propose a grasp prototype adaptation module, which aims to facilitate fine-grained local feature alignment by dynamically updating and matching the grasp prototypes from the simulation and real-world scenarios throughout the training process. Due to such designs, the proposed method substantially reduces the domain shift and thus leads to consistent performance improvements. Extensive experiments are conducted on the GraspNet-Planar benchmark and physical environment, and superior results are achieved which demonstrate the effectiveness of our method.
|
|
10:30-12:00, Paper ThAT28-NT.5 | Add to My Program |
Residual-NeRF: Learning Residual NeRFs for Transparent Object Manipulation |
|
Duisterhof, Bart | Carnegie Mellon University |
Mao, Yuemin | Carnegie Mellon University |
Teng, Si Heng | DSTA |
Ichnowski, Jeffrey | Carnegie Mellon University |
Keywords: Perception for Grasping and Manipulation, Computer Vision for Automation, RGB-D Perception
Abstract: Transparent objects are ubiquitous in industry, pharmaceuticals, and households. Grasping and manipulating these objects is a significant challenge for robots. Existing methods have difficulty reconstructing complete depth maps for challenging transparent objects, leaving holes in the depth reconstruction. Recent work has shown neural radiance fields (NeRFs) work well for depth perception in scenes with transparent objects, and these depth maps can be used to grasp transparent objects with high accuracy. NeRF-based depth reconstruction can still struggle with especially challenging transparent objects and lighting conditions. In this work, we propose Residual-NeRF, a method to improve depth perception and training speed for transparent objects. Robots often operate in the same area, such as a kitchen. By first learning a background NeRF of the scene without transparent objects to be manipulated, we reduce the ambiguity faced by learning the changes with the new object. We propose training two additional networks: a residual NeRF learns to infer residual RGB values and densities, and a Mixnet learns how to combine background and residual NeRFs. We contribute synthetic and real experiments that suggest Residual-NeRF improves depth perception of transparent objects. The results on synthetic data suggest Residual-NeRF outperforms the baselines with a 46.1% lower RMSE and a 29.5% lower MAE. Real-world qualitative experiments suggest Residual-NeRF leads to more robust depth maps with less noise and fewer holes. Website: https://residual-nerf.github.io
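The compositing idea can be illustrated with standard volume rendering over one ray, where a per-sample blend weight stands in for the Mixnet described above. This is a rough sketch under assumed interfaces, not the authors' network.

    import numpy as np

    def render_ray(sigma_bg, rgb_bg, sigma_res, rgb_res, w, deltas):
        """Volume-render one ray of N samples whose density/color is a
        weighted mix of a background field and a residual field.
        w: per-sample blend weights in [0, 1] (Mixnet stand-in)."""
        sigma = w * sigma_bg + (1.0 - w) * sigma_res
        rgb = w[:, None] * rgb_bg + (1.0 - w)[:, None] * rgb_res
        alpha = 1.0 - np.exp(-sigma * deltas)                # per-sample opacity
        trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
        weights = alpha * trans                              # rendering weights
        color = (weights[:, None] * rgb).sum(axis=0)
        depth = (weights * np.cumsum(deltas)).sum()          # expected depth
        return color, depth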
|
|
10:30-12:00, Paper ThAT28-NT.6 | Add to My Program |
Online 3D Edge Reconstruction of Wiry Structures from Monocular Image Sequences |
|
Choi, Hyelim | Seoul National University |
Lee, Minji | Seoul National University |
Kang, Jiseock | Seoul National University |
Lee, Dongjun | Seoul National University |
Keywords: Perception for Grasping and Manipulation, Computer Vision for Automation, RGB-D Perception
Abstract: Three-dimensional (3D) reconstruction of wiry structures from vision suffers from thin geometry, lack of texture, and severe self-occlusions. We propose an online 3D edge reconstruction framework that uses monocular image sequences to reconstruct the wiry structures whose skeletons are mainly straight as commonly found in the real world. To reconstruct such structures in an efficient manner, we employ straight edges constructed from points as underlying primitives of the representation. This is to address the harsh geometric nature of wiry objects (e.g., severe self-occlusion) and also to avoid a typically expensive line matching process. Specifically, we first construct sparse 3D points by tracking feature points, while simultaneously refining the camera poses via a robust maximum a posteriori (MAP) inference. These sparse points are then used to generate edge candidates and the belief of each candidate is updated in a Bayesian fashion using a likelihood evaluated on the image observation. Finally, we take the set of 3D edges with beliefs greater than a threshold and apply a post-processing step to reject false edges. We experimentally validate our framework using real-world wiry objects and demonstrate a manipulation task using the reconstruction. The proposed framework exhibits superior performance over state-of-the-art algorithms for the class of wiry structures and the potential to be easily used for subsequent robotic tasks.
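The per-candidate belief maintenance reduces to a scalar Bayes update, sketched below with made-up likelihood values; in the actual system the likelihoods come from evaluating each edge hypothesis against the image observations.

    def update_belief(prior, p_obs_given_edge, p_obs_given_no_edge):
        # One Bayes step for the hypothesis "this candidate is a real edge".
        num = p_obs_given_edge * prior
        den = num + p_obs_given_no_edge * (1.0 - prior)
        return num / den

    belief = 0.5                      # uninformative prior for a new candidate
    for like_e, like_ne in [(0.8, 0.3), (0.7, 0.4), (0.9, 0.2)]:
        belief = update_belief(belief, like_e, like_ne)
    print(belief > 0.9)               # accept once belief passes a threshold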
|
|
10:30-12:00, Paper ThAT28-NT.7 | Add to My Program |
Kitchen Artist: Precise Control of Liquid Dispensing for Gourmet Plating |
|
Huang, Hung-Jui | Carnegie Mellon University |
Xiang, Jingyi | University of Illinois at Urbana-Champaign |
Yuan, Wenzhen | University of Illinois |
Keywords: Perception for Grasping and Manipulation, Art and Entertainment Robotics, Perception-Action Coupling
Abstract: Manipulating liquid is widely required for many tasks, especially in cooking. A common way to address this is extruding viscous liquid from a squeeze bottle. In this work, our goal is to create a sauce plating robot, which requires precise control of the thickness of squeezed liquids on a surface. Different liquids demand different manipulation policies. We command the robot to rotate the container and monitor the liquid response using a force sensor to identify liquid properties. Based on the liquid properties, we predict the liquid behavior with fixed squeezing motions in a data-driven way and calculate the required drawing speed for the desired stroke size. This open-loop system works effectively even without sensor feedback. Our experiments demonstrate accurate stroke size control across different liquids and fill levels. We show that understanding liquid properties can facilitate effective liquid manipulation. More importantly, our dish garnishing robot has a wide range of applications and holds significant commercialization potential.
|
|
10:30-12:00, Paper ThAT28-NT.8 | Add to My Program |
HandNeRF: Learning to Reconstruct Hand-Object Interaction Scene from a Single RGB Image |
|
Choi, Hongsuk | Samsung AI Center, New York |
Chavan-Dafle, Nikhil | Samsung Research America |
Yuan, Jiacheng | University of Minnesota |
Isler, Volkan | University of Minnesota |
Park, Hyun Soo | University of Minnesota |
Keywords: Perception for Grasping and Manipulation, Deep Learning for Visual Perception
Abstract: This paper presents a method to learn hand-object interaction prior for reconstructing a 3D hand-object scene from a single RGB image. The inference as well as training-data generation for 3D hand-object scene reconstruction is challenging due to the depth ambiguity of a single image and occlusions by the hand and object. We turn this challenge into an opportunity by utilizing the hand shape to constrain the possible relative configuration of the hand and object geometry. We design a generalizable implicit function, HandNeRF, that explicitly encodes the correlation of the 3D hand shape features and 2D object features to predict the hand and object scene geometry. With experiments on real-world datasets, we show that HandNeRF is able to reconstruct hand-object scenes of novel grasp configurations more accurately than comparable methods. Moreover, we demonstrate that object reconstruction from HandNeRF ensures more accurate execution of downstream tasks, such as grasping for robotic hand-over.
|
|
10:30-12:00, Paper ThAT28-NT.9 | Add to My Program |
Robotic Grasping of Harvested Tomato Trusses Using Vision and Online Learning |
|
van den Bent, Luuk | Technical University Delft |
Coleman, Tomás | TU Delft |
Babuska, Robert | Delft University of Technology |
Keywords: Perception for Grasping and Manipulation, Agricultural Automation, Continual Learning
Abstract: Currently, truss tomato weighing and packaging require significant manual work. The main obstacle to automation lies in the difficulty of developing a reliable robotic grasping system for already harvested trusses. We propose a method to grasp trusses that are stacked in a crate with considerable clutter, which is how they are commonly stored and transported after harvest. The method consists of a deep learning-based vision system to first identify the individual trusses in the crate and then determine a suitable grasping location on the stem. To this end, we have introduced a grasp pose ranking algorithm with online learning capabilities. After selecting the most promising grasp pose, the robot executes a pinch grasp without needing touch sensors or geometric models. Lab experiments with a robotic manipulator equipped with an eye-in-hand RGB-D camera showed a 100% clearance rate when tasked to pick all trusses from a pile. 93% of the trusses were successfully grasped on the first try, while the remaining 7% required more attempts.
|
|
ThAT29-NT Oral Session, NT-G5 |
Add to My Program |
Object Detection III |
|
|
Chair: Nguyen, Anh | University of Liverpool |
Co-Chair: Xiang, Yu | University of Texas at Dallas |
|
10:30-12:00, Paper ThAT29-NT.1 | Add to My Program |
RISeg: Robot Interactive Object Segmentation Via Body Frame-Invariant Features |
|
Qian, Howard H. | Rice University |
Lu, Yangxiao | The University of Texas at Dallas |
Ren, Kejia | Rice University |
Wang, Gaotian | Rice University |
Khargonkar, Ninad | University of Texas at Dallas |
Xiang, Yu | University of Texas at Dallas |
Hang, Kaiyu | Rice University |
Keywords: Object Detection, Segmentation and Categorization, Perception for Grasping and Manipulation
Abstract: In order to successfully perform manipulation tasks in new environments, such as grasping, robots must be proficient in segmenting unseen objects from the background and/or other objects. Previous works perform unseen object instance segmentation (UOIS) by training deep neural networks on large-scale data to learn RGB/RGB-D feature embeddings, where cluttered environments often result in inaccurate segmentations. We build upon these methods and introduce a novel approach to correct inaccurate segmentation, such as under-segmentation, of static image-based UOIS masks by using robot interaction and a designed body frame-invariant feature. We demonstrate that the relative linear and rotational velocities of frames randomly attached to rigid bodies due to robot interactions can be used to identify objects and accumulate corrected object-level segmentation masks. By introducing motion to regions of segmentation uncertainty, we are able to drastically improve segmentation accuracy in an uncertainty-driven manner with minimal, non-disruptive interactions (ca. 2-3 per scene). We demonstrate the effectiveness of our proposed interactive perception pipeline in accurately segmenting cluttered scenes by achieving an average object segmentation accuracy rate of 80.7%, an increase of 28.2% when compared with other state-of-the-art UOIS methods.
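The invariant behind the designed feature is classical rigid-body kinematics: frames attached to the same rigid body share an angular velocity, and their linear velocities differ only by omega x (p2 - p1). The toy check below encodes that consistency test; the tolerance is illustrative.

    import numpy as np

    def same_rigid_body(p1, v1, w1, p2, v2, w2, tol=1e-3):
        """True if two frames (position p, linear velocity v, angular
        velocity w) are kinematically consistent with one rigid body."""
        if np.linalg.norm(w1 - w2) > tol:
            return False
        return np.linalg.norm(v2 - (v1 + np.cross(w1, p2 - p1))) < tol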
|
|
10:30-12:00, Paper ThAT29-NT.2 | Add to My Program |
PUMA: Fully Decentralized Uncertainty-Aware Multiagent Trajectory Planner with Real-Time Image Segmentation-Based Frame Alignment |
|
Kondo, Kota | Massachusetts Institute of Technology |
Tewari, Claudius Taroon | Massachusetts Institute of Technology |
Peterson, Mason B. | Massachusetts Institute of Technology |
Thomas, Annika | Massachusetts Institute of Technology |
Kinnari, Jouko | Saab Finland Oy |
Tagliabue, Andrea | Massachusetts Institute of Technology |
How, Jonathan | Massachusetts Institute of Technology |
Keywords: Object Detection, Segmentation and Categorization, Planning under Uncertainty, Distributed Robot Systems
Abstract: Fully decentralized, multiagent trajectory planners enable complex tasks like search and rescue or package delivery by ensuring safe navigation in unknown environments. However, deconflicting trajectories with other agents and ensuring collision-free paths in a fully decentralized setting is complicated by dynamic elements and localization uncertainty. To this end, this paper presents (1) an uncertainty-aware multiagent trajectory planner and (2) an image segmentation-based frame alignment pipeline. The uncertainty-aware planner propagates uncertainty associated with the future motion of detected obstacles, and by incorporating this propagated uncertainty into optimization constraints, the planner effectively navigates around obstacles. Unlike conventional methods that emphasize explicit obstacle tracking, our approach integrates implicit tracking. Sharing trajectories between agents can cause potential collisions due to frame misalignment. Addressing this, we introduce a novel frame alignment pipeline that rectifies inter-agent frame misalignment. This method leverages a zero-shot image segmentation model for detecting objects in the environment and a data association framework based on geometric consistency for map alignment. Our approach accurately aligns frames with only 0.18 m and 2.7 degrees of mean frame alignment error in our most challenging simulation scenario. In addition, we conducted hardware experiments and successfully achieved 0.29 m and 2.59 degrees of frame alignment error. Together with the alignment framework, our planner ensures safe navigation in unknown environments and collision avoidance in decentralized settings.
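Once the data association stage has produced matched object locations between two agents' maps, the frame correction itself is the classical least-squares rigid alignment (Kabsch/Umeyama), sketched below. The inputs A and B are assumed to be 3 x N arrays of matched points; the segmentation and geometric-consistency stages are not shown.

    import numpy as np

    def align_frames(A, B):
        """Find the rotation R and translation t minimizing ||R @ A + t - B||."""
        ca, cb = A.mean(axis=1, keepdims=True), B.mean(axis=1, keepdims=True)
        H = (A - ca) @ (B - cb).T                 # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                        # proper rotation (det = +1)
        t = cb - R @ ca
        return R, t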
|
|
10:30-12:00, Paper ThAT29-NT.3 | Add to My Program |
Open-Vocabulary Affordance Detection Using Knowledge Distillation and Text-Point Correlation |
|
Van Vo, Tuan | FPT Software |
Vu, Minh Nhat | TU Wien, Austria |
Huang, Baoru | Imperial College London |
Nguyen, Tien Toan | FPT Software |
Le, Ngan | University of Arkansas |
Vo, Thieu | Ton Duc Thang University |
Nguyen, Anh | University of Liverpool |
Keywords: Object Detection, Segmentation and Categorization, Perception for Grasping and Manipulation
Abstract: Affordance detection presents intricate challenges and has a wide range of robotic applications. Previous works have faced limitations such as the complexities of 3D object shapes, the wide range of potential affordances on real-world objects, and the lack of open-vocabulary support for affordance understanding. In this paper, we introduce a new open-vocabulary affordance detection method for 3D point clouds, leveraging knowledge distillation and text-point correlation. Our approach employs pre-trained 3D models through knowledge distillation to enhance feature extraction and semantic understanding in 3D point clouds. We further introduce a new text-point correlation method to learn the semantic links between point cloud features and open-vocabulary labels. Intensive experiments show that our approach outperforms previous works and adapts to new affordance labels and unseen objects. Notably, our method achieves an improvement of 7.96% in mIoU over the baselines. Furthermore, it offers real-time inference, which is well-suited to robotic manipulation applications.
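The text-point correlation step can be pictured as a cosine-similarity matrix between per-point features and label embeddings, as in the toy sketch below; both feature sets are random placeholders, and the paper's learned correlation is more elaborate than a plain dot product.

    import numpy as np

    rng = np.random.default_rng(0)
    point_feats = rng.normal(size=(2048, 512))   # features from a 3D backbone
    text_feats = rng.normal(size=(5, 512))       # embeddings of affordance labels

    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    scores = p @ t.T                             # (num_points, num_labels)
    labels = scores.argmax(axis=1)               # per-point open-vocabulary label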
|
|
10:30-12:00, Paper ThAT29-NT.4 | Add to My Program |
MEDL-U: Uncertainty-Aware 3D Automatic Annotation Based on Evidential Deep Learning |
|
Paat, Helbert | HKUST |
Lian, Qing | HKUST |
Yao, Weilong | Autowise.AI |
Zhang, Tong | Hong Kong University of Science and Technology |
Keywords: Object Detection, Segmentation and Categorization, Intelligent Transportation Systems, Probabilistic Inference
Abstract: Advancements in deep learning-based 3D object detection necessitate the availability of large-scale datasets. However, this requirement introduces the challenge of manual annotation, which is often both burdensome and time-consuming. To tackle this issue, the literature has seen the emergence of several weakly supervised frameworks for 3D object detection which can automatically generate pseudo labels for unlabeled data. Nevertheless, these generated pseudo labels contain noise and are not as accurate as those labeled by humans. In this paper, we present the first approach that addresses the inherent ambiguities present in pseudo labels by introducing an Evidential Deep Learning (EDL) based uncertainty estimation framework. Specifically, we propose MEDL-U, an EDL framework based on MTrans, which not only generates pseudo labels but also quantifies the associated uncertainties. However, applying EDL to 3D object detection presents three key challenges: (1) lower pseudo label quality in comparison to other autolabelers; (2) high evidential uncertainty estimates; and (3) lack of clear interpretability and effective utilization of uncertainties for downstream tasks. We tackle these issues through the introduction of an uncertainty-aware IoU-based loss, an evidence-aware multi-task loss, and the implementation of a post-processing stage for uncertainty refinement. Our experimental results demonstrate that probabilistic detectors trained using the outputs of MEDL-U surpass deterministic detectors trained using outputs from previous 3D annotators on the KITTI val set for all difficulty levels. Moreover, MEDL-U achieves state-of-the-art results on the KITTI official test set compared to existing 3D automatic annotators. Code is publicly available at https://github.com/paathelb/MEDL-U.
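For reference, the evidential regression machinery that EDL-based frameworks of this kind build on (Amini et al., 2020) places a Normal-Inverse-Gamma head on each regression target; its negative log-likelihood and uncertainty decomposition are sketched below. This is the generic formulation, not MEDL-U's full multi-task loss.

    import numpy as np
    from scipy.special import gammaln

    def nig_nll(y, gamma, nu, alpha, beta):
        """NLL of target y under NIG parameters (gamma, nu, alpha, beta)."""
        omega = 2.0 * beta * (1.0 + nu)
        return (0.5 * np.log(np.pi / nu)
                - alpha * np.log(omega)
                + (alpha + 0.5) * np.log(nu * (y - gamma) ** 2 + omega)
                + gammaln(alpha) - gammaln(alpha + 0.5))

    def uncertainties(nu, alpha, beta):
        aleatoric = beta / (alpha - 1.0)          # E[sigma^2], needs alpha > 1
        epistemic = beta / (nu * (alpha - 1.0))   # Var[mu]
        return aleatoric, epistemic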
|
|
10:30-12:00, Paper ThAT29-NT.5 | Add to My Program |
HandyPriors: Physically Consistent Perception of Hand-Object Interactions with Differentiable Priors |
|
Zhang, Shutong | University of Toronto |
Qiao, Yi-Ling | University of Maryland, College Park |
Zhu, Guanglei | University of Toronto, Carnegie Mellon University |
Heiden, Eric | NVIDIA |
Turpin, Dylan | University of Toronto |
Liu, Jingzhou | University of Toronto, NVIDIA |
Lin, Ming C. | University of Maryland at College Park |
Macklin, Miles | University of Copenhagen, NVIDIA |
Garg, Animesh | Georgia Institute of Technology |
Keywords: Object Detection, Segmentation and Categorization, Multifingered Hands
Abstract: Various heuristic objectives for modeling hand-object interaction have been proposed in past work. However, due to the lack of a cohesive framework, these objectives often possess a narrow scope of applicability and are limited by their efficiency or accuracy. In this paper, we propose HandyPriors, a unified and general pipeline for human-object interaction scenes by leveraging recent advances in differentiable physics and rendering. Our approach employs rendering priors to align with input images and segmentation masks along with physics priors to mitigate penetration and relative-sliding across frames. Furthermore, we present two alternatives for hand and object pose estimation. The optimization-based pose estimation achieves higher accuracy, while the filtering-based tracking, which utilizes the differentiable priors as dynamics and observation models, executes faster. We demonstrate that HandyPriors attains comparable or superior results in the pose estimation task, and that the differentiable physics module can predict contact information for pose refinement. We also show that our approach generalizes to perception tasks, including robotic hand manipulation and human-object pose estimation in the wild.
|
|
10:30-12:00, Paper ThAT29-NT.6 | Add to My Program |
Dynamic Occupancy Grids for Object Detection: A Radar-Centric Approach |
|
Ronecker, Max Peter | SETLabs Research GmbH |
Schratter, Markus | Virtual Vehicle Research GmbH |
Kuschnig, Lukas | Virtual Vehicle Research GmbH |
Watzenig, Daniel | TU Graz |
Keywords: Object Detection, Segmentation and Categorization, Intelligent Transportation Systems, Sensor Fusion
Abstract: Dynamic Occupancy Grid Mapping is a technique used to generate a local map of the environment, containing both static and dynamic information. Typically, these maps are primarily generated using lidar measurements. However, with improvements in radar sensing, resulting in better accuracy and higher resolution, radar is emerging as a viable alternative to lidar as the primary sensor for mapping. In this paper, we propose a radar-centric dynamic occupancy grid mapping algorithm with adaptations to the state computation, inverse sensor model, and field-of-view computation tailored to the specifics of radar measurements. We extensively evaluate our approach with real data to demonstrate its effectiveness and establish the first benchmark for radar-based dynamic occupancy grid mapping using the publicly available Radarscenes dataset.
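Underneath any (dynamic) occupancy grid sits a log-odds update driven by an inverse sensor model. The 1D toy below updates cells along a single ray; the probabilities are placeholders, and the dynamic-state estimation that distinguishes this paper is omitted.

    import numpy as np

    L_FREE, L_OCC = np.log(0.3 / 0.7), np.log(0.9 / 0.1)

    def update_ray(log_odds, cells_before_hit, hit_cell):
        log_odds[cells_before_hit] += L_FREE     # space traversed by the ray
        log_odds[hit_cell] += L_OCC              # cell containing the return
        return log_odds

    grid = np.zeros(100)                         # log-odds 0 = p(occ) = 0.5
    grid = update_ray(grid, np.arange(0, 42), 42)
    p_occ = 1.0 - 1.0 / (1.0 + np.exp(grid))     # back to probabilities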
|
|
10:30-12:00, Paper ThAT29-NT.7 | Add to My Program |
Dynamic Object Classification of Low-Resolution Point Clouds: An LSTM-Based Ensemble Learning Approach |
|
Zhang, Shaoming | Tongji University |
Yao, Tangjun | Tongji University |
Wang, Jianmei | Tongji University |
Feng, Tiantian | Tongji University |
Wang, Zhong | Tongji University |
Keywords: Object Detection, Segmentation and Categorization, Recognition, Computer Vision for Automation
Abstract: In unmanned vehicle perception, dynamic object classification is applied to classify objects accurately and in a timely manner, providing decision-making input for obstacle avoidance and planning. Low-resolution LiDAR is one of the most important sensors for this task. Unfortunately, existing approaches perform unsatisfactorily due to the huge domain gap between low-resolution and high-resolution point cloud classification. Some schemes try to reduce this gap by fusing multi-scan information through SLAM or by completing single-scan point clouds. However, these methods rely on high positioning accuracy or on complete object data. We instead propose a dynamic object classification method for low-resolution data from the perspective of time-series fusion. By modeling time series of sparse data, we capture how the outputs of separate classification models change over an object's track. Subsequently, based on ensemble learning, our method performs feature-level fusion of multiple networks to exploit their different expressive capabilities. Finally, we use long short-term memory to progressively classify dynamic objects. We also propose a dataset of low-resolution point clouds with manually annotated ground truth, containing abundant samples of cars, pedestrians, and motorcycles. On actual low-resolution data, our method is verified to achieve substantially higher accuracy than state-of-the-art approaches.
|
|
10:30-12:00, Paper ThAT29-NT.8 | Add to My Program |
Dynablox: Real-Time Detection of Diverse Dynamic Objects in Complex Environments |
|
Schmid, Lukas M. | Massachusetts Institute of Technology (MIT) |
Andersson, Olov | KTH Royal Institute of Technology |
Sulser, Aurelio | ETH Zurich |
Pfreundschuh, Patrick | ETH Zurich |
Siegwart, Roland | ETH Zurich |
Keywords: Object Detection, Segmentation and Categorization, Mapping, Range Sensing
Abstract: Real-time detection of moving objects is an essential capability for robots acting autonomously in dynamic environments. We thus propose Dynablox, a novel online mapping-based approach for robust moving object detection in complex unstructured environments. The central idea of our approach is to incrementally estimate high confidence free-space areas by modeling and accounting for sensing, state estimation, and mapping limitations during online robot operation. The spatio-temporally conservative free space estimate enables robust detection of moving objects without making any assumptions on the appearance of objects or environments. This allows deployment in complex scenes such as multi-storied buildings or staircases, and for diverse moving objects such as people carrying various items, doors swinging or even balls rolling around. We thoroughly evaluate our approach on real-world data sets, achieving 86% IoU at 17 FPS in typical robotic settings. The method outperforms a recent appearance-based classifier and approaches the performance of offline methods. We demonstrate its generality on a novel data set with rare moving objects in complex environments. We make our efficient implementation and the novel data set available as open-source.
|
|
10:30-12:00, Paper ThAT29-NT.9 | Add to My Program |
3D Object Detection with VI-SLAM Point Clouds: The Impact of Object and Environment Characteristics on Model Performance |
|
Duan, Lin | Duke University |
Scargill, Tim | Duke University |
Chen, Ying | Duke University |
Gorlatova, Maria | Duke University |
Keywords: Object Detection, Segmentation and Categorization, Environment Monitoring and Management, Semantic Scene Understanding
Abstract: 3D object detection (OD) is a crucial element in scene understanding. However, most existing 3D OD models have been tailored to work with light detection and ranging (LiDAR) and RGB-D point cloud data, leaving their performance on commonly available visual-inertial simultaneous localization and mapping (VI-SLAM) point clouds unexamined. In this paper, we create and release two datasets: VIP500, 4772 VI-SLAM point clouds covering 500 different object and environment configurations, and VIP500-D, an accompanying set of 20 RGB-D point clouds for the object classes and shapes in VIP500. We then use these datasets to quantify the differences between VI-SLAM point clouds and dense RGB-D point clouds, as well as the discrepancies between VI-SLAM point clouds generated with different object and environment characteristics. Finally, we evaluate the performance of three leading OD models on the diverse data in our VIP500 dataset, revealing the promise of OD models trained on VI-SLAM data; we examine the extent to which both object and environment characteristics impact performance, along with the underlying causes.
|
|
ThAT30-NT Oral Session, NT-G6 |
Add to My Program |
Robotics with Large Language Models |
|
|
Chair: Hori, Chiori | Mitsubishi Electric Research Laboratories (MERL) |
Co-Chair: Sukhatme, Gaurav | University of Southern California |
|
10:30-12:00, Paper ThAT30-NT.1 | Add to My Program |
ERRA: An Embodied Representation and Reasoning Architecture for Long-Horizon Language-Conditioned Manipulation Tasks |
|
Zhao, Chao | Hong Kong University of Science and Technology |
Yuan, Shuai | The Hong Kong University of Science and Technology |
Jiang, Chunli | The Hong Kong University of Science and Technology |
Cai, Junhao | Hong Kong University of Science and Technology |
Yu, Hongyu | The Hong Kong University of Science and Technology |
Wang, Michael Yu | Great Bay University |
Chen, Qifeng | HKUST |
Keywords: Manipulation Planning, Integrated Planning and Learning
Abstract: This letter introduces ERRA, an embodied learning architecture that enables robots to jointly obtain three fundamental capabilities (reasoning, planning, and interaction) for solving long-horizon language-conditioned manipulation tasks. ERRA is based on tightly-coupled probabilistic inferences at two granularity levels. A coarse-resolution inference is formulated as sequence generation through a large language model, which infers action language from natural language instruction and environment state. The robot then zooms to the fine-resolution inference part to perform the concrete action corresponding to the action language. Fine-resolution inference is constructed as a Markov decision process, which takes action language and environmental sensing as observations and outputs the action. The results of action execution in environments provide feedback for subsequent coarse-resolution reasoning. Such coarse-to-fine inference allows the robot to decompose and achieve long-horizon tasks interactively. In extensive experiments, we show that ERRA can complete various long-horizon manipulation tasks specified by abstract language instructions. We also demonstrate successful generalization to the novel but similar natural language instructions.
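The two-level loop reads naturally as code. The sketch below shows its high-level shape with all components stubbed out as callables; the names and interfaces are assumptions, not the authors' API.

    def erra_loop(instruction, env, llm, policy, max_steps=20):
        """Coarse-to-fine loop: an LLM proposes the next action phrase, a
        learned policy executes it, and the outcome feeds back upward."""
        state_summary = env.describe()
        for _ in range(max_steps):
            action_language = llm(instruction, state_summary)   # coarse inference
            if action_language == "done":
                break
            obs = env.observe()
            action = policy(action_language, obs)               # fine inference
            state_summary = env.step(action)                    # feedback upward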
|
|
10:30-12:00, Paper ThAT30-NT.2 | Add to My Program |
Grasp-Anything: Large-Scale Grasp Dataset from Foundation Models |
|
Vuong, An Dinh | FPT Software |
Vu, Minh Nhat | TU Wien, Austria |
Hieu, Le Trung | FPT Software |
Huang, Baoru | Imperial College London |
Binh, Huynh Thi Thanh | School of Information and Communication Technology, Hanoi University of Science and Technology |
Vo, Thieu | Ton Duc Thang University |
Kugi, Andreas | TU Wien |
Nguyen, Anh | University of Liverpool |
Keywords: Big Data in Robotics and Automation, Data Sets for Robot Learning
Abstract: Foundation models such as ChatGPT have made significant strides in robotic tasks due to their universal representation of real-world domains. In this paper, we leverage foundation models to tackle grasp detection, a persistent challenge in robotics with broad industrial applications. Despite numerous grasp datasets, their object diversity remains limited compared to real-world figures. Fortunately, foundation models possess an extensive repository of real-world knowledge, including objects we encounter in our daily lives. As a consequence, a promising solution to the limited representation in previous grasp datasets is to harness the universal knowledge embedded in these foundation models. We present Grasp-Anything, a new large-scale grasp dataset synthesized from foundation models to implement this solution. Grasp-Anything excels in diversity and magnitude, boasting 1M samples with text descriptions and more than 3M objects, surpassing prior datasets. Empirically, we show that Grasp-Anything successfully facilitates zero-shot grasp detection on vision-based tasks and real-world robotic experiments. Our dataset and code are available at https://grasp-anything-2023.github.io.
|
|
10:30-12:00, Paper ThAT30-NT.3 | Add to My Program |
Anticipate & Act : Integrating LLMs and Classical Planning for Efficient Task Execution in Household Environments |
|
Arora, Raghav | IIIT Hyderabad |
Singh, Shivam | International Institute of Information Technology Hyderabad |
Swaminathan, Karthik | International Institute of Information Technology, Hyderabad (IIIT-H) |
Datta, Ahana | International Institute of Information Technology, Hyderabad |
Banerjee, Snehasis | IIIT-H / TCS |
Bhowmick, Brojeshwar | Tata Consultancy Services |
Jatavallabhula, Krishna Murthy | MIT |
Sridharan, Mohan | University of Edinburgh |
Krishna, Madhava | IIIT Hyderabad |
Keywords: Integrated Planning and Learning, Task Planning, AI-Enabled Robotics
Abstract: Assistive agents performing household tasks such as making the bed or cooking breakfast often compute and execute actions that accomplish one task at a time. However, efficiency can be improved by anticipating upcoming tasks and computing an action sequence that jointly achieves these tasks. State-of-the-art methods for task anticipation use data-driven deep networks and Large Language Models (LLMs), but they do so at the level of high-level tasks and/or require many training examples. Our framework leverages the generic knowledge of LLMs through a small number of prompts to perform high-level task anticipation, using the anticipated tasks as goals in a classical planning system to compute a sequence of finer-granularity actions that jointly achieve these goals. We ground and evaluate our framework's abilities in realistic scenarios in the VirtualHome environment and demonstrate a 31% reduction in execution time compared with a system that does not consider upcoming tasks.
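A minimal sketch of the anticipation-to-planning hand-off described above, under the assumption that anticipated tasks map to PDDL goal literals (all names and the mapping are hypothetical; the framework's actual prompts and planner interface are not shown):

```python
# Anticipated tasks (here hard-coded; in the framework they would come from
# a few-shot LLM prompt) become extra goal literals, so one classical plan
# jointly achieves the current and upcoming tasks.
CURRENT_TASK = "make breakfast"
ANTICIPATED = ["wash dishes", "wipe counter"]

TASK_TO_GOAL = {  # hypothetical mapping from task names to PDDL goal literals
    "make breakfast": "(served breakfast)",
    "wash dishes": "(clean dishes)",
    "wipe counter": "(clean counter)",
}

goals = [TASK_TO_GOAL[t] for t in [CURRENT_TASK] + ANTICIPATED]
pddl_goal = "(:goal (and {}))".format(" ".join(goals))
print(pddl_goal)  # joint goal handed to the classical planner
```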
|
|
10:30-12:00, Paper ThAT30-NT.4 | Add to My Program |
Conditionally Combining Robot Skills Using Large Language Models |
|
Zentner, K.R. | University of Southern California |
Julian, Ryan | Google |
Ichter, Brian | Google Brain |
Sukhatme, Gaurav | University of Southern California |
Keywords: Deep Learning Methods, Software Tools for Benchmarking and Reproducibility, Imitation Learning
Abstract: This paper combines two contributions. First, we introduce an extension of the Meta-World benchmark, which we call "Language-World," which allows a large language model to operate in a simulated robotic environment using semi-structured natural language queries and scripted skills described using natural language. By using the same set of tasks as Meta-World, Language-World results can be easily compared to Meta-World results, allowing for a point of comparison between recent methods using Large Language Models (LLMs) and those using Deep Reinforcement Learning. Second, we introduce a method we call Plan Conditioned Behavioral Cloning (PCBC), which allows fine-tuning the behavior of high-level plans using end-to-end demonstrations. Using Language-World, we show that PCBC is able to achieve strong performance in a variety of few-shot regimes, often achieving task generalization with as little as a single demonstration. We have made Language-World available as open-source software at https://github.com/krzentner/language-world/
|
|
10:30-12:00, Paper ThAT30-NT.5 | Add to My Program |
Interactive Planning Using Large Language Models for Partially Observable Robotic Tasks |
|
Sun, Lingfeng | University of California, Berkeley |
Jha, Devesh | Mitsubishi Electric Research Laboratories |
Hori, Chiori | Mitsubishi Electric Research Laboratories (MERL) |
Jain, Siddarth | Mitsubishi Electric Research Laboratories (MERL) |
Corcodel, Radu | Mitsubishi Electric Research Laboratories |
Zhu, Xinghao | University of California, Berkeley |
Tomizuka, Masayoshi | University of California |
Romeres, Diego | Mitsubishi Electric Research Laboratories |
Keywords: Planning under Uncertainty, Deep Learning Methods, Task and Motion Planning
Abstract: Designing robotic agents to perform open vocabulary tasks has been a long-standing goal in robotics and AI. Recently, Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks. However, planning for these tasks in the presence of uncertainties is challenging as it requires "chain-of-thought" reasoning, aggregating information from the environment, updating state estimates, and generating actions based on the updated state estimates. In this paper, we present an interactive planning technique for partially observable tasks using LLMs. In the proposed method, an LLM is used to collect missing information from the environment using a robot, and infer the state of the underlying problem from collected observations while guiding the robot to perform the required actions. We also use a fine-tuned Llama 2 model via self-instruct and compare its performance against a pre-trained LLM like GPT-4. Results are demonstrated on several tasks in simulation as well as real-world environments.
|
|
10:30-12:00, Paper ThAT30-NT.6 | Add to My Program |
Optimal Scene Graph Planning with Large Language Model Guidance |
|
Dai, Zhirui | UC San Diego |
Asgharivaskasi, Arash | University of California, San Diego |
Duong, Thai | University of California, San Diego |
Lin, Shusen | University of California, San Diego |
Tzes, Mariliza | University of Pennsylvania |
Pappas, George J. | University of Pennsylvania |
Atanasov, Nikolay | University of California, San Diego |
Keywords: Task and Motion Planning, Semantic Scene Understanding, Formal Methods in Robotics and Automation
Abstract: Recent advances in metric, semantic, and topological mapping have equipped autonomous robots with concept grounding capabilities to interpret natural language tasks. Leveraging these capabilities, this work develops an efficient task planning algorithm for hierarchical metric-semantic models. We consider a scene graph model of the environment and utilize a large language model (LLM) to convert a natural language task into a linear temporal logic (LTL) automaton. Our main contribution is to enable optimal hierarchical LTL planning with LLM guidance over scene graphs. To achieve efficiency, we construct a hierarchical planning domain that captures the attributes and connectivity of the scene graph and the task automaton, and provide semantic guidance via an LLM heuristic function. To guarantee optimality, we design an LTL heuristic function that is provably consistent and supplements the potentially inadmissible LLM guidance in multi-heuristic planning. We demonstrate efficient planning of complex natural language tasks in scene graphs of virtualized real environments.
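The interplay between a provably consistent heuristic and potentially inadmissible LLM guidance can be illustrated with a toy A* in which the admissible term alone determines the f-value (preserving optimality) while the guidance term only orders ties. This is a simplification of the paper's multi-heuristic planning, on a toy grid rather than a scene graph; all names below are illustrative:

```python
import heapq

def astar(start, goal, neighbors, h_adm, h_guide):
    g = {start: 0.0}
    openq = [(h_adm(start), h_guide(start), start)]
    while openq:
        _, _, s = heapq.heappop(openq)
        if s == goal:
            return g[s]
        for t, c in neighbors(s):
            if g[s] + c < g.get(t, float("inf")):
                g[t] = g[s] + c
                # the admissible term sets f and preserves optimality;
                # the guidance term only breaks ties in expansion order
                heapq.heappush(openq, (g[t] + h_adm(t), h_guide(t), t))
    return None

neighbors = lambda s: [((s[0] + 1, s[1]), 1.0), ((s[0], s[1] + 1), 1.0)]
h_adm = lambda s: max(0, (4 - s[0]) + (3 - s[1]))  # Manhattan: consistent
h_llm = lambda s: 0.0 if s[0] >= s[1] else 5.0     # stand-in "LLM" guidance
print(astar((0, 0), (4, 3), neighbors, h_adm, h_llm))  # 7.0, still optimal
```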
|
|
10:30-12:00, Paper ThAT30-NT.7 | Add to My Program |
CAPE: Corrective Actions from Precondition Errors Using Large Language Models |
|
Sundara Raman, Shreyas | Brown University |
Cohen, Vanya | Brown University |
Idrees, Ifrah | Brown University |
Rosen, Eric | Brown University |
Mooney, Raymond | University of Texas at Austin |
Tellex, Stefanie | Brown |
Paulius, David | Brown University |
Keywords: Task and Motion Planning, AI-Enabled Robotics, Human-Centered Robotics
Abstract: Extracting knowledge and reasoning from large language models (LLMs) offers a path to designing intelligent robots. Common approaches that leverage LLMs for planning are unable to recover when actions fail and resort to retrying failed actions without resolving the underlying cause. We propose a novel approach (CAPE) that generates corrective actions to resolve precondition errors during planning. CAPE improves the quality of generated plans through few-shot reasoning on action preconditions. Our approach enables embodied agents to execute more tasks than baseline methods while maintaining semantic correctness and minimizing re-prompting. In VirtualHome, CAPE improves a human-annotated plan correctness metric from 28.89% to 49.63% over SayCan, whilst achieving competitive executability. Our improvements transfer to a Boston Dynamics Spot robot initialized with a set of skills (specified in language) and associated preconditions, where CAPE improves correctness by 76.49% with higher executability compared to SayCan. Our approach enables embodied agents to follow natural language commands and robustly recover from failures.
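A minimal sketch of the corrective re-prompting idea (hypothetical actions, with a lookup table standing in for the LLM): instead of retrying a failed action, the precondition error is resolved first:

```python
def precondition_error(action, state):
    if action == "open fridge" and not state["near_fridge"]:
        return "robot is not near the fridge"
    return None

def corrective_step(error):            # stand-in for re-prompting the LLM
    return {"robot is not near the fridge": "walk to fridge"}[error]

def apply_action(action, state):       # toy action effects
    if action == "walk to fridge":
        state["near_fridge"] = True

state = {"near_fridge": False}
plan, executed = ["open fridge", "grab milk"], []
while plan:
    action = plan[0]
    err = precondition_error(action, state)
    if err:
        plan.insert(0, corrective_step(err))  # resolve the cause, don't retry
        continue
    apply_action(action, state)
    executed.append(plan.pop(0))
print(executed)  # ['walk to fridge', 'open fridge', 'grab milk']
```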
|
|
10:30-12:00, Paper ThAT30-NT.8 | Add to My Program |
GraspGPT: Leveraging Semantic Knowledge from a Large Language Model for Task-Oriented Grasping |
|
Tang, Chao | Southern University of Science and Technology |
Huang, Dehao | Southern University of Science and Technology |
Ge, Wenqi | Southern University of Science and Technology |
Liu, Weiyu | Stanford University |
Zhang, Hong | SUSTech |
Keywords: Grasping, Perception for Grasping and Manipulation, Deep Learning in Grasping and Manipulation
Abstract: Task-oriented grasping (TOG) refers to the problem of predicting grasps on an object that enable subsequent manipulation tasks. To model the complex relationships between objects, tasks, and grasps, existing methods incorporate semantic knowledge as priors into TOG pipelines. However, the existing semantic knowledge is typically constructed based on closed-world concept sets, restricting generalization to novel concepts outside the pre-defined sets. To address this issue, we propose GraspGPT, a large language model (LLM) based TOG framework that leverages the open-ended semantic knowledge from an LLM to achieve zero-shot generalization to novel concepts. We conduct experiments on the Language Augmented TaskGrasp (LA-TaskGrasp) dataset and demonstrate that GraspGPT outperforms existing TOG methods on different held-out settings when generalizing to novel concepts outside the training set. The effectiveness of GraspGPT is further validated in real-robot experiments. Our code, data, appendix, and video are publicly available at https://sites.google.com/view/graspgpt.
|
|
10:30-12:00, Paper ThAT30-NT.9 | Add to My Program |
DOS: A Deployment Operating System for RoboOps |
|
Ye, Guo | Northwestern University |
Lin, Qinjie | Northwestern University |
Luo, Zening | Northwestern University |
Liu, Han | Northwestern University |
Keywords: Computer Architecture for Robotic and Automation, Software Architecture for Robotic and Automation, Software Tools for Benchmarking and Reproducibility
Abstract: We propose a new system named DOS (Deployment Operating System for RoboOps) for reliably deploying data-driven robots in both production and simulation environments. Compared to existing systems, DOS features a unique CI/CD (continuous integration and continuous deployment) architecture which allows us to seamlessly integrate agile development and reliable operation in a fully automated fashion. With this CI/CD architecture, this paper mainly introduces three essential components that uniquely differentiate DOS from existing robotic systems: (i) a cloud orchestrator that builds and schedules pipeline components; (ii) a bridge tool named DOS Connect that makes cloud-edge bidirectional communication feasible; (iii) an analytical profiler that collects any set of user-defined performance metrics for system optimization. DOS significantly increases the reliability and maintainability of the deployed robotic systems. To illustrate this point, we demonstrate the performance of DOS by training and deploying deep reinforcement learning based methods in a challenging real-world environment with a swarm of robots.
|
|
ThAT31-NT Oral Session, NT-G7 |
Add to My Program |
Autonomous Vehicle Navigation I |
|
|
Chair: Ding, Wenchao | Fudan University |
Co-Chair: Yang, Ming | Shanghai Jiao Tong University |
|
10:30-12:00, Paper ThAT31-NT.1 | Add to My Program |
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving |
|
Chen, Long | Wayve |
Sinavski, Oleg | Wayve Technologies Ltd |
Hünermann, Jan | Wayve |
Karnsund, Alice | Wayve |
Willmott, Andrew | Wayve |
Birch, Danny | Wayve |
Maund, Daniel | Wayve |
Shotton, Jamie | Wayve Technologies Ltd |
Keywords: Autonomous Vehicle Navigation, AI-Enabled Robotics, Machine Learning for Robot Control
Abstract: Large Language Models (LLMs) have shown promise in the autonomous driving sector, particularly in generalization and interpretability. We introduce a unique object-level multimodal LLM architecture that merges vectorized numeric modalities with a pre-trained LLM to improve context understanding in driving situations. We also present a new dataset of 160k QA pairs derived from 10k driving scenarios, paired with high-quality control commands collected with an RL agent and question-answer pairs generated by a teacher LLM (GPT-3.5). A distinct pretraining strategy is devised to align numeric vector modalities with static LLM representations using vector captioning language data. We also introduce an evaluation metric for Driving QA and demonstrate our LLM-driver's proficiency in interpreting driving scenarios, answering questions, and decision-making. Our findings highlight the potential of LLM-based driving action generation in comparison to traditional behavioral cloning. We make our benchmark, datasets, and model available for further exploration.
|
|
10:30-12:00, Paper ThAT31-NT.2 | Add to My Program |
HHGNN: Heterogeneous Hypergraph Neural Network for Traffic Agents Trajectory Prediction in Grouping Scenarios |
|
Guo, Hetian | Southern University of Science and Technology |
Peng, Yingzhi | Southern University of Science and Technology |
Fan, Zipei | University of Tokyo |
Zhu, He | Southern University of Science and Technology |
Song, Xuan | University of Tokyo |
Keywords: Autonomous Vehicle Navigation, Autonomous Agents, AI-Based Methods
Abstract: In many intelligent transportation systems, predicting the future motion of heterogeneous traffic participants is a fundamental but challenging task due to various factors encompassing the agents' dynamic states, interactions with neighboring agents and surrounding traffic infrastructures, and their stochastic and multi-modal natural behavior tendencies. However, existing approaches have limitations as they either focus solely on static, pairwise interactions, ignoring interactions of varied granularity, or fail to tackle agents' heterogeneity. In this paper, instead of focusing solely on pairwise interactions, we propose a Heterogeneous Hypergraph Neural Network (HHGNN) based motion prediction model that leverages the nature of hypergraphs to encode the groupwise interactions among traffic participants. Moreover, we propose the type-aware two-level hypergraph message passing module (TTHMS) with learnable hyperedge-type embeddings to model the intra-group and inter-group level interactions among heterogeneous traffic agents (e.g., vehicles, pedestrians, and cyclists). We also integrate a scene context fusion layer in TTHMS to incorporate the scene context. Comparison and ablation experiments on the Waymo Open Motion Dataset (WOMD) demonstrate HHGNN's effectiveness within the motion prediction task.
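The two-level (intra-group, then inter-group) message passing can be sketched with a hypergraph incidence matrix; the sketch below omits the paper's type-aware hyperedge embeddings and attention, and all shapes and weights are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))            # 5 agents, 8-dim features
H = np.array([[1, 0],                  # incidence: agent i belongs to group j
              [1, 0],
              [1, 1],
              [0, 1],
              [0, 1]], dtype=float)

W_in, W_out = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))

# intra-group level: mean of member features per hyperedge (group)
E = (H.T @ X) / H.sum(axis=0, keepdims=True).T
# inter-group level: each agent receives the mean of its groups' messages
M = (H @ E) / H.sum(axis=1, keepdims=True)
X_next = np.tanh(X @ W_in + M @ W_out)
print(X_next.shape)  # (5, 8): updated agent features
```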
|
|
10:30-12:00, Paper ThAT31-NT.3 | Add to My Program |
Odometry Estimation by Fusing Multiple Radar Sensors and an Inertial Measurement Unit |
|
Brühl, Tim | Dr. Ing. H.c. F. Porsche AG |
Eberhardt, Tim Dieter | HTW Berlin |
Schwager, Robin | Porsche AG |
Ewecker, Lukas | Dr. Ing. H.c. F. Porsche AG |
Sohn, Tin Stribor | Dr. Ing H.c. F. Porsche AG |
Hohmann, Sören | Institute of Control Systems, Karlsruhe Institute of Technology |
Keywords: Autonomous Vehicle Navigation, Autonomous Agents, Intelligent Transportation Systems
Abstract: This paper presents a framework for odometry estimation in automotive applications using six asynchronously operating millimeter wave radar sensors and a combination of gyroscope and accelerometer. Two different motion models are combined to estimate motion with three degrees of freedom. For this purpose, we propose a novel three-part radar filtering method for outlier detection: By analyzing uncertainties and system limits, sensor-specific outliers are detected and removed in the first filter. We introduce knowledge about the previous motion state by a status-quo-ante filter and hereby identify further false-positive raw targets in the current measurement which are not accessible from the previous state. Moreover, we suggest employing a downstream, resampling-based algorithm for additional outlier detection. Based on the filtered data, radar motion state estimation is performed by use of curve fitting methods. To fuse the radar odometry estimation with the acceleration and yaw rate measurements while handling non-linearities, an Unscented Kalman Filter is used. The developed framework is evaluated with reference data in various scenarios. The results demonstrate that it accurately and robustly determines motion and position states even in radar-challenging scenes, such as environments with few radar targets or with heavy metal structures. Our method keeps up with common approaches such as wheel speed sensor odometry while outperforming them in terms of drift.
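One standard building block behind radar ego-motion estimation of this kind is a least-squares fit of ego velocity to the radial velocities of static targets, with residual-based outlier rejection; the sketch below shows that generic idea (synthetic data, simplified rejection rule), not the authors' three-part filter:

```python
# For static targets, the radial velocity measured at azimuth theta satisfies
# v_r = -(vx*cos(theta) + vy*sin(theta)), which is linear in the ego velocity.
import numpy as np

rng = np.random.default_rng(1)
vx_true, vy_true = 8.0, 0.5
theta = rng.uniform(-np.pi / 2, np.pi / 2, size=40)
v_r = -(vx_true * np.cos(theta) + vy_true * np.sin(theta)) + rng.normal(0, 0.05, 40)
v_r[:3] += 5.0                          # three moving targets act as outliers

A = -np.column_stack([np.cos(theta), np.sin(theta)])
for _ in range(3):                      # simple iterative outlier rejection
    v_est, *_ = np.linalg.lstsq(A, v_r, rcond=None)
    resid = np.abs(A @ v_est - v_r)
    keep = resid < 3 * np.median(resid)
    A, v_r = A[keep], v_r[keep]
print(v_est)                            # close to [8.0, 0.5]
```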
|
|
10:30-12:00, Paper ThAT31-NT.4 | Add to My Program |
Thermal Voyager: A Comparative Study of RGB and Thermal Cameras for Night-Time Autonomous Navigation |
|
Ng, Aditya | PES University |
Pb, Dhruval | PES University |
Shalabi, Jehan | Purdue University |
Jape, Shubhankar | Purdue University |
Wang, Xueji | Purdue University |
Jacob, Zubin | Purdue University |
Keywords: Autonomous Vehicle Navigation, Data Sets for Robotic Vision, Deep Learning for Visual Perception
Abstract: Achieving reliable autonomous navigation during nighttime remains a substantial obstacle in the field of robotics. Although systems utilizing Light Detection and Ranging (LiDAR) and Radio Detection and Ranging (RADAR) enable environmental perception regardless of lighting conditions, they face significant challenges in environments with a high density of agents due to their dependence on active emissions. Cameras operating in the visible spectrum represent a quasi-passive alternative, yet they see a substantial drop in efficiency in low-light conditions, consequently hindering both scene perception and path planning. Here, we introduce a novel end-to-end navigation system, the "Thermal Voyager", which leverages infrared thermal vision to achieve true passive perception in autonomous entities. The system utilizes our architecture, TrajNet, to interpret thermal visual inputs to produce desired trajectories and employs a model predictive control strategy to determine the optimal steering angles needed to actualize those trajectories. We train our TrajNet on a comprehensive video dataset incorporating visible and thermal footage alongside Controller Area Network (CAN) frames. We demonstrate that nighttime navigation facilitated by Long-Wave Infrared (LWIR) thermal cameras can rival the performance of daytime navigation systems using RGB cameras. Our work paves the way for scene perception and trajectory prediction empowered entirely by passive thermal sensing technology, heralding a new era where autonomous navigation is both feasible and reliable irrespective of the time of day. We make our code and thermal trajectory dataset public.
|
|
10:30-12:00, Paper ThAT31-NT.5 | Add to My Program |
Rethinking Imitation-Based Planners for Autonomous Driving |
|
Cheng, Jie | Hong Kong University of Science and Technology |
Chen, Yingbing | The Hong Kong University of Science and Technology |
Mei, Xiaodong | HKUST |
Yang, Bowen | The Hong Kong University of Science and Technology, Robotics Institute |
Li, Bo | Lotus Technology Ltd |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Autonomous Vehicle Navigation, Imitation Learning
Abstract: In recent years, imitation-based driving planners have reported considerable success. However, due to the absence of a standardized benchmark, the effectiveness of various designs remains unclear. The newly released nuPlan addresses this issue by offering a large-scale real-world dataset and a standardized closed-loop benchmark for equitable comparisons. Utilizing this platform, we conduct a comprehensive study on two fundamental yet underexplored aspects of imitation-based planners: the essential features for ego planning and the effective data augmentation techniques to reduce compounding errors. Furthermore, we highlight an imitation gap that has been overlooked by current learning systems. Finally, integrating our findings, we propose a strong baseline model—PlanTF. Our results demonstrate that a well-designed, purely imitation-based planner can achieve highly competitive performance compared to state-of-the-art methods involving hand-crafted rules and exhibit superior generalization capabilities in long-tail cases. Our models and benchmarks are publicly available. Project website https://jchengai.github.io/planTF.
|
|
10:30-12:00, Paper ThAT31-NT.6 | Add to My Program |
Traffic Flow-Based Crowdsourced Mapping in Complex Urban Scenario |
|
Qin, Tong | Shanghai Jiao Tong University |
Huang, Haihui | Zhejiang University |
Wang, Ziqiang | Autonomous Driving Solution, IAS BU, Huawei |
Chen, Tongqing | Huawei Technology |
Ding, Wenchao | Fudan University |
Keywords: Autonomous Vehicle Navigation, Intelligent Transportation Systems
Abstract: An accurate road topological structure is of great importance for autonomous driving in complex urban environments. Currently, most autonomous vehicles rely heavily on High-Definition maps (HD maps) to cruise across the city. Without a prior map, it is hard for vehicles to find right-turning and left-turning paths through large intersections. However, due to the complexity of intersections, producing such a map by human labor is time-consuming and error-prone. In this paper, we propose a framework to automatically produce topological maps of complicated intersections. This framework adopts a crowdsourcing approach to collect semantic information about the environment and traffic flows. The topological structure is inferred from traffic flows correctly and automatically. We highlight that this framework is highly automatic and scalable, which can greatly speed up HD map production and decrease its cost. The proposed system is validated with real-world crowdsourcing data and the result is comparable to traditional HD maps.
|
|
10:30-12:00, Paper ThAT31-NT.7 | Add to My Program |
Scene Informer: Anchor-Based Occlusion Inference and Trajectory Prediction in Partially Observable Environments |
|
Lange, Bernard | Stanford University |
Li, Jiachen | University of California, Riverside |
Kochenderfer, Mykel | Stanford University |
Keywords: Autonomous Vehicle Navigation, Intelligent Transportation Systems, Deep Learning Methods
Abstract: Navigating complex and dynamic environments requires autonomous vehicles (AVs) to reason about both visible and occluded regions. This involves predicting the future motion of observed agents, inferring occluded ones, and modeling their interactions based on vectorized scene representations of the partially observable environment. However, prior work on occlusion inference and trajectory prediction has developed in isolation, with the former based on simplified rasterized methods and the latter assuming full environment observability. We introduce the Scene Informer, a unified approach for predicting both observed agent trajectories and inferring occlusions in a partially observable setting. It uses a transformer to aggregate various input modalities and facilitate selective queries on occlusions that might intersect with the AV's planned path. The framework estimates occupancy probabilities and likely trajectories for occlusions, as well as forecasting motion for observed agents. We explore common observability assumptions in both domains and their performance impact. Our approach outperforms existing methods in both occupancy prediction and trajectory prediction in a partially observable setting on the Waymo Open Motion Dataset.
|
|
10:30-12:00, Paper ThAT31-NT.8 | Add to My Program |
Monocular Localization with Semantics Map for Autonomous Vehicles |
|
Wan, Jixiang | OPPO Research Institute |
Zhang, Xudong | OPPO Research Institute |
Dong, Shuzhou | OPPO |
Zhang, Yuwei | OPPO Research Institute |
Yang, Yuchen | OPPO Research Institute |
Wu, Ruoxi | OPPO Research Institute, Shanghai, China |
Jiang, Ye | OPPO Research Institute |
Li, Jijunnan | OPPO Research Institute |
Lin, Jinquan | OPPO Research Institute |
Yang, Ming | Shanghai Jiao Tong University |
Keywords: Autonomous Vehicle Navigation, Localization, Intelligent Transportation Systems
Abstract: Accurate and robust localization remains a significant challenge for autonomous vehicles. The cost of sensors and limitations in local computational efficiency make it difficult to scale to large commercial applications. Traditional vision-based approaches focus on texture features that are susceptible to changes in lighting, season, perspective, and appearance. Additionally, the large storage size of maps with descriptors and complex optimization processes hinder system performance. To balance efficiency and accuracy, we propose a novel lightweight visual semantic localization algorithm that employs stable semantic features instead of low-level texture features. First, semantic maps are constructed offline by detecting semantic objects, such as ground markers, lane lines, and poles, using cameras or LiDAR sensors. Then, online visual localization is performed through data association of semantic features and map objects. We evaluated our proposed localization framework on the publicly available KAIST Urban dataset and in scenarios we recorded ourselves. The experimental results demonstrate that our method is a reliable and practical localization solution in various autonomous driving localization tasks.
|
|
10:30-12:00, Paper ThAT31-NT.9 | Add to My Program |
DiPA: Probabilistic Multi-Modal Interactive Prediction for Autonomous Driving |
|
Knittel, Anthony | Five AI |
Hawasly, Majd | FiveAI Ltd |
Albrecht, Stefano V. | University of Edinburgh |
Redford, John | Five AI Ltd |
Ramamoorthy, Subramanian | The University of Edinburgh |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, Deep Learning Methods
Abstract: Accurate prediction is important for operating an autonomous vehicle in interactive scenarios. Prediction must be fast, to support multiple requests from a planner exploring a range of possible futures. The generated predictions must accurately represent the probabilities of predicted trajectories, while also capturing different modes of behaviour (such as turning left vs continuing straight at a junction). To this end, we present DiPA, an interactive predictor that addresses these challenging requirements. Previous interactive prediction methods use an encoding of k-mode-samples, which under-represents the full distribution. Other methods optimise closest-mode evaluations, which test whether one of the predictions is similar to the ground-truth, but allow additional unlikely predictions to occur, over-representing unlikely predictions. DiPA addresses these limitations by using a Gaussian-Mixture-Model to encode the full distribution, and optimising predictions using both probabilistic and closest-mode measures. These objectives respectively optimise probabilistic accuracy and the ability to capture distinct behaviours, and there is a challenging trade-off between them. We are able to solve both together using a novel training regime. DiPA achieves new state-of-the-art performance on the INTERACTION and NGSIM datasets, and improves over the baseline (MFP) when both closest-mode and probabilistic evaluations are used. This demonstrates effective prediction for supporting a planner.
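The trade-off between the two objectives can be written down compactly; the following simplified sketch (1-D endpoints, unit variances, illustrative alpha weighting, not the paper's training regime) combines a GMM negative log-likelihood with a min-over-modes error:

```python
import numpy as np

def combined_loss(means, log_weights, target, alpha=0.5):
    # probabilistic term: GMM NLL, -log sum_k w_k N(target; mu_k, 1)
    log_comp = log_weights - 0.5 * (target - means) ** 2 - 0.5 * np.log(2 * np.pi)
    nll = -np.logaddexp.reduce(log_comp)
    # closest-mode term: error of the best-matching component only
    closest = np.min((target - means) ** 2)
    return alpha * nll + (1 - alpha) * closest

means = np.array([-2.0, 0.1, 3.0])       # predicted mode endpoints
log_w = np.log(np.array([0.2, 0.5, 0.3]))
print(combined_loss(means, log_w, target=0.0))
```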
|
|
ThAT32-NT Oral Session, NT-G8 |
Add to My Program |
Intelligent Transportation Systems IV |
|
|
Chair: Xue, Jianru | Xi'an Jiaotong University |
Co-Chair: Ma, Jun | The Hong Kong University of Science and Technology |
|
10:30-12:00, Paper ThAT32-NT.1 | Add to My Program |
MacFormer: Map-Agent Coupled Transformer for Real-Time and Robust Trajectory Prediction |
|
Feng, Chen | Hong Kong University of Science and Technology |
Zhou, Hangning | Megvii |
Lin, Huadong | Beihang University |
Zhang, Zhigang | Megvii.Inc |
Xu, Ziyao | Megvii |
Zhang, Chi | Mach |
Zhou, Boyu | Sun Yat-Sen University |
Shen, Shaojie | Hong Kong University of Science and Technology |
Keywords: Deep Learning Methods, Representation Learning, Autonomous Vehicle Navigation
Abstract: Predicting the future behavior of agents is a fundamental task in autonomous vehicle domains. Accurate prediction relies on comprehending the surrounding map, which significantly regularizes agent behaviors. However, existing methods have limitations in exploiting the map and exhibit a strong dependence on historical trajectories, yielding unsatisfactory prediction performance and robustness. Additionally, their heavy network architectures impede real-time applications. To tackle these problems, we propose the Map-Agent Coupled Transformer (MacFormer) for real-time and robust trajectory prediction. Our framework explicitly incorporates map constraints into the network via two carefully designed modules named coupled map and reference extractor. A novel multi-task optimization strategy (MTOS) is presented to enhance learning of topology and rule constraints. We also devise a bilateral query scheme in context fusion for a more efficient and lightweight network. We evaluated our approach on the Argoverse 1, Argoverse 2, and nuScenes real-world benchmarks, achieving state-of-the-art performance on all of them with the lowest inference latency and smallest model size. Experiments also demonstrate that our framework is resilient to imperfect tracklet inputs. Furthermore, we show that by combining with our proposed strategies, classical models outperform their baselines, further validating the versatility of our framework.
|
|
10:30-12:00, Paper ThAT32-NT.2 | Add to My Program |
Real-Time Capable Decision Making for Autonomous Driving Using Reachable Sets |
|
Kochdumper, Niklas | Stony Brook University |
Bak, Stanley | Stony Brook University |
Keywords: Intelligent Transportation Systems, Motion and Path Planning, Dynamics
Abstract: Despite large advances in recent years, real-time capable motion planning for autonomous road vehicles remains a huge challenge. In this work, we present a decision module that is based on set-based reachability analysis: First, we identify all possible driving corridors by computing the reachable set for the longitudinal position of the vehicle along the lanelets of the road network, where lane changes are modeled as discrete events. Next, we select the best driving corridor based on a cost function that penalizes lane changes and deviations from a desired velocity profile. Finally, we generate a reference trajectory inside the selected driving corridor, which can be used to guide or warm start low-level trajectory planners. For the numerical evaluation we combine our decision module with a motion-primitive-based and an optimization-based planner and evaluate the performance on 2000 challenging CommonRoad traffic scenarios as well as in the realistic CARLA simulator. The results demonstrate that our decision module is real-time capable and yields significant speed-ups compared to executing a motion planner standalone without a decision module.
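The driving-corridor computation can be illustrated with interval propagation of the longitudinal position under bounded acceleration, intersected with free space at each step; this toy double-integrator sketch omits the lanelet network and the discrete lane-change events:

```python
dt, a_max = 0.5, 3.0
s_lo, s_hi = 0.0, 0.0          # longitudinal position interval [m]
v_lo, v_hi = 10.0, 10.0        # velocity interval [m/s]
free = (0.0, 120.0)            # free range, e.g., up to a lead vehicle

corridor = []
for _ in range(10):
    # interval arithmetic for s' = s + v*dt + 0.5*a*dt^2, v' = v + a*dt
    s_lo = s_lo + v_lo * dt - 0.5 * a_max * dt**2
    s_hi = s_hi + v_hi * dt + 0.5 * a_max * dt**2
    v_lo, v_hi = max(v_lo - a_max * dt, 0.0), v_hi + a_max * dt
    # intersect with the drivable region to keep only collision-free states
    s_lo, s_hi = max(s_lo, free[0]), min(s_hi, free[1])
    corridor.append((round(s_lo, 1), round(s_hi, 1)))
print(corridor)                # per-step position bounds = driving corridor
```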
|
|
10:30-12:00, Paper ThAT32-NT.3 | Add to My Program |
Vehicle Behavior Prediction by Episodic-Memory Implanted NDT |
|
Shen, Peining | Chang'an University |
Fang, Jianwu | Xian Jiaotong University |
Yu, Hongkai | Cleveland State University |
Xue, Jianru | Xi'an Jiaotong University |
Keywords: Learning Categories and Concepts, Intelligent Transportation Systems, Representation Learning
Abstract: In autonomous driving, predicting the behavior (turning left, stopping, etc.) of target vehicles is crucial for the self-driving vehicle to make safe decisions and avoid accidents. Existing deep learning-based methods have shown excellent and accurate performance, but the black-box nature makes it untrustworthy to apply them in practical use. In this work, we explore the interpretability of behavior prediction of target vehicles by an Episodic Memory implanted Neural Decision Tree (abbrev. eMem-NDT). The structure of eMem-NDT is constructed by hierarchically clustering the text embedding of vehicle behavior descriptions. eMem-NDT is a neural-backed part of a pre-trained deep learning model by changing the soft-max layer of the deep model to eMem-NDT, for grouping and aligning the memory prototypes of the historical vehicle behavior features in training data on a neural decision tree. Each leaf node of eMem-NDT is modeled by a neural network for aligning the behavior memory prototypes. By eMem-NDT, we infer each instance in behavior prediction of vehicles by bottom-up Memory Prototype Matching (MPM) (searching the appropriate leaf node and the links to the root node) and top-down Leaf Link Aggregation (LLA) (obtaining the probability of future behaviors of vehicles for certain instances). We validate eMem-NDT on BLVD and LOKI datasets, and the results show that our model can obtain a superior performance to other methods with clear explainability. The code is available in https://github.com/JWFangit/eMem-NDT.
|
|
10:30-12:00, Paper ThAT32-NT.4 | Add to My Program |
Optimal Driver Warning Generation in Dynamic Driving Environment |
|
Li, Chenran | University of California, Berkeley |
Xu, Aolin | HRI |
Sachdeva, Enna | Honda Research Institute |
Misu, Teruhisa | Honda Research Institute USA, Inc |
Dariush, Behzad | Honda Research Institute USA |
Keywords: Intelligent Transportation Systems, Collision Avoidance
Abstract: The driver warning system that alerts the human driver about potential risks during driving is a key feature of an advanced driver assistance system. Existing driver warning technologies, mainly the forward collision warning and unsafe lane change warning, can reduce the risk of collision caused by human errors. However, the current design methods have several major limitations. Firstly, the warnings are mainly generated in a one-shot manner without modeling the ego driver's reactions and surrounding objects, which reduces the flexibility and generality of the system over different scenarios. Additionally, the triggering conditions of warnings are mostly rule-based threshold checks on the current state, which lack prediction of the potential risk over a sufficiently long future horizon. In this work, we study the problem of optimally generating driver warnings by considering the interactions among the generated warning, the driver behavior, and the states of ego and surrounding vehicles on a long horizon. The warning generation problem is formulated as a partially observed Markov decision process (POMDP). An optimal warning generation framework is proposed as a solution to the proposed POMDP. The simulation experiments demonstrate the superiority of the proposed solution over existing warning generation methods.
|
|
10:30-12:00, Paper ThAT32-NT.5 | Add to My Program |
Active Learning with Dual Model Predictive Path-Integral Control for Interaction-Aware Autonomous Highway On-Ramp Merging |
|
Knaup, Jacob | Georgia Institute of Technology |
D'sa, Jovin | Honda Research Institute, USA |
Chalaki, Behdad | Honda Research Institute USA, Inc |
Naes, Tyler | Honda Research Institute, USA |
Nourkhiz Mahjoub, Hossein | Honda Research Institute US |
Moradi-Pari, Ehsan | Honda Research Institute |
Tsiotras, Panagiotis | Georgia Tech |
Keywords: Planning under Uncertainty, Integrated Planning and Control, Intelligent Transportation Systems
Abstract: Merging into dense highway traffic for an autonomous vehicle is a complex decision-making task, wherein the vehicle must identify a potential gap and coordinate with surrounding human drivers, each of whom may exhibit diverse driving behaviors. Many existing methods consider other drivers to be dynamic obstacles and, as a result, they are incapable of capturing the full intent of the human drivers through this passive planning. In this paper, we propose a novel dual control framework based on Model Predictive Path-Integral control to generate interactive trajectories. This framework incorporates a Bayesian inference approach to actively learn the agents’ parameters, i.e., other drivers’ model parameters. The proposed framework employs a sampling-based approach that is suitable for real-time implementation through the utilization of GPUs. We illustrate the effectiveness of our proposed methodology through comprehensive numerical simulations conducted in both high and low-fidelity simulation scenarios focusing on autonomous on-ramp merging.
|
|
10:30-12:00, Paper ThAT32-NT.6 | Add to My Program |
Informed Reinforcement Learning for Situation-Aware Traffic Rule Exceptions |
|
Bogdoll, Daniel | FZI Research Center for Information Technology |
Qin, Jing | Karlsruhe Institute of Technology |
Nekolla, Moritz | FZI Research Center for Information Technology |
Abouelazm, Ahmed | FZI Forschungszentrum Informatik |
Joseph, Tim | FZI Research Center for Information Technology |
Zöllner, Johann Marius | FZI Forschungszentrum Informatik |
Keywords: Reinforcement Learning, Intelligent Transportation Systems, AI-Enabled Robotics
Abstract: Reinforcement Learning is a highly active research field with promising advancements. In the field of autonomous driving, however, often only very simple scenarios are examined. Common approaches use non-interpretable control commands as the action space and unstructured reward designs, which are unsuitable for complex scenarios. In this work, we introduce Informed Reinforcement Learning, where a structured rulebook is integrated as a knowledge source. We learn trajectories and assess them with a situation-aware reward design, leading to a dynamic reward that allows the agent to learn situations that require controlled traffic rule exceptions. Our method is applicable to arbitrary RL models. We successfully demonstrate high completion rates of complex scenarios with recent model-based agents.
|
|
10:30-12:00, Paper ThAT32-NT.7 | Add to My Program |
Chance-Aware Lane Change with High-Level Model Predictive Control through Curriculum Reinforcement Learning |
|
Wang, Yubin | The Hong Kong University of Science and Technology (Guangzhou) |
Li, Yulin | Hong Kong University of Science and Technology (HKUST) |
Peng, Zengqi | The Hong Kong University of Science and Technology (Guangzhou) |
Ghazzai, Hakim | King Abdullah University of Science and Technology |
Ma, Jun | The Hong Kong University of Science and Technology |
Keywords: Intelligent Transportation Systems, Integrated Planning and Learning, Motion and Path Planning
Abstract: Lane change in dense traffic typically requires the recognition of an appropriate opportunity for maneuvers, which remains a challenging problem in self-driving. In this work, we propose a chance-aware lane-change strategy with high-level model predictive control (MPC) through curriculum reinforcement learning (CRL). In our proposed framework, full-state references and regulatory factors concerning the relative importance of each cost term in the embodied MPC are generated by a neural policy. Furthermore, effective curricula are designed and integrated into an episodic reinforcement learning (RL) framework with policy transfer and enhancement, to improve the convergence speed and ensure a high-quality policy. The proposed framework is deployed and evaluated in numerical simulations of dense and dynamic traffic. It is noteworthy that, given a narrow chance, the proposed approach generates high-quality lane-change maneuvers such that the vehicle merges into the traffic flow with a high success rate of 96%. Finally, our framework is validated in the high-fidelity simulator under dense traffic, demonstrating satisfactory practicality and generalizability.
|
|
10:30-12:00, Paper ThAT32-NT.8 | Add to My Program |
Human Observation-Inspired Trajectory Prediction for Autonomous Driving in Mixed-Autonomy Traffic Environments |
|
Liao, Haicheng | University of Macau |
Liu, Shangqian | University of Macau |
Li, Yong Kang | University of Electronic Science and Technology of China |
Li, Zhenning | University of Macau |
Wang, Chengyue | University of Macau |
Li, Yunjian | Macau University of Science and Technology |
Li, Shengbo Eben | Tsinghua University |
Xu, Chengzhong | University of Macau |
Keywords: Intelligent Transportation Systems, Motion and Path Planning, Deep Learning Methods
Abstract: In the burgeoning field of autonomous vehicles (AVs), trajectory prediction remains a formidable challenge, especially in mixed autonomy environments. Traditional approaches often rely on computational methods such as time-series analysis. Our research diverges significantly by adopting an interdisciplinary approach that integrates principles of human cognition and observational behavior into trajectory prediction models for AVs. We introduce a novel "adaptive visual sector" mechanism that mimics the dynamic allocation of attention human drivers exhibit based on factors like spatial orientation, proximity, and driving speed. Additionally, we develop a "dynamic traffic graph" using Convolutional Neural Networks (CNN) and Graph Attention Networks (GAT) to capture spatio-temporal dependencies among agents. Benchmark tests on the NGSIM, HighD, and MoCAD datasets reveal that our model (GAVA) outperforms state-of-the-art baselines by at least 15.2%, 19.4%, and 12.0%, respectively. Our findings underscore the potential of leveraging human cognition principles to enhance the proficiency and adaptability of trajectory prediction algorithms in AVs.
|
|
10:30-12:00, Paper ThAT32-NT.9 | Add to My Program |
Context-Aware Timewise VAEs for Real-Time Vehicle Trajectory Prediction |
|
Xu, Pei | Stanford University |
Hayet, Jean-Bernard | CIMAT |
Karamouzas, Ioannis | Clemson University |
Keywords: Motion and Path Planning, Computer Vision for Transportation, Deep Learning Methods
Abstract: Real-time, accurate prediction of human steering behaviors has wide applications, from developing intelligent traffic systems to deploying autonomous driving systems in both real and simulated worlds. In this paper, we present ContextVAE, a context-aware approach for multi-modal vehicle trajectory prediction. Built upon the backbone architecture of a timewise variational autoencoder, ContextVAE employs a dual attention mechanism for observation encoding that accounts for the environmental context information and the dynamic agents' states in a unified way. By utilizing features extracted from semantic maps during agent state encoding, our approach takes into account both the social features exhibited by agents on the scene and the physical environment constraints to generate map-compliant and socially-aware trajectories. We perform extensive testing on the nuScenes prediction challenge, Lyft Level 5 dataset and Waymo Open Motion Dataset to show the effectiveness of our approach and its state-of-the-art performance. In all tested datasets, ContextVAE models are fast to train and provide high-quality multi-modal predictions in real-time.
|
|
ThAT33-CC Oral Session, CC-301 |
Add to My Program |
Integrated Planning and Control |
|
|
Chair: Lehnert, Christopher | Queensland University of Technology |
Co-Chair: Devasia, Santosh | University of Washington |
|
10:30-12:00, Paper ThAT33-CC.1 | Add to My Program |
Probably Approximately Correct Nonlinear Model Predictive Control (PAC-NMPC) |
|
Polevoy, Adam | Johns Hopkins University Applied Physics Lab |
Kobilarov, Marin | Johns Hopkins University |
Moore, Joseph | Johns Hopkins University Applied Physics Lab |
Keywords: Planning under Uncertainty, Integrated Planning and Control, Robot Safety
Abstract: Approaches for stochastic nonlinear model predictive control (SNMPC) typically make restrictive assumptions about the system dynamics and rely on approximations to characterize the evolution of the underlying uncertainty distributions. For this reason, they are often unable to capture more complex distributions (e.g., non-Gaussian or multi-modal) and cannot provide accurate guarantees of performance. In this paper, we present a sampling-based SNMPC approach that leverages recently derived sample complexity bounds to certify the performance of a feedback policy without making assumptions about the system dynamics or underlying uncertainty distributions. By parallelizing our approach, we are able to demonstrate real-time receding-horizon SNMPC with statistical safety guarantees in simulation and on hardware using a 1/10th scale rally car and a 24-inch wingspan fixed-wing UAV.
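The flavor of a sample-based performance certificate can be conveyed with a simple one-sided Hoeffding bound; the paper derives tighter PAC-style bounds, so this sketch only shows the generic estimate-plus-confidence-radius pattern on stand-in rollouts:

```python
import math
import random

random.seed(0)
N, delta = 2000, 0.01
# stand-in rollouts: each "fails" with true probability 0.03
failures = sum(random.random() < 0.03 for _ in range(N))

p_hat = failures / N
eps = math.sqrt(math.log(1.0 / delta) / (2.0 * N))   # Hoeffding radius
print(f"P(failure) <= {p_hat + eps:.4f} with confidence {1 - delta:.2%}")
```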
|
|
10:30-12:00, Paper ThAT33-CC.2 | Add to My Program |
QuAD: Query-Based Interpretable Neural Motion Planning for Autonomous Driving |
|
Biswas, Sourav | Waabi, University of Toronto |
Casas Romero, Sergio | University of Toronto |
Sykora, Quin | University of Toronto |
Agro, Ben | UofT, Waabi |
Sadat, Abbas | Waabi |
Urtasun, Raquel | University of Toronto |
Keywords: Motion and Path Planning, Deep Learning Methods, Imitation Learning
Abstract: A self-driving vehicle must understand its environment to determine the appropriate action. Traditional autonomy systems rely on object detection to find the agents in the scene. However, object detection assumes a discrete set of objects and loses information about uncertainty, so any errors compound when predicting the future behavior of those agents. Alternatively, dense occupancy grid maps have been utilized to understand free-space. However, predicting a grid for the entire scene is wasteful since only certain spatio-temporal regions are reachable and relevant to the self-driving vehicle. We present a unified, interpretable, and efficient autonomy framework that moves away from cascading modules that first perceive, then predict, and finally plan. Instead, we shift the paradigm to have the planner query occupancy at relevant spatio-temporal points, restricting the computation to those regions of interest. Exploiting this representation, we evaluate a candidate trajectory around key factors such as collision avoidance, comfort, and progress for safety and interpretability. Our approach achieves better highway driving quality than the state-of-the-art on high-fidelity closed-loop simulations.
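The query paradigm can be sketched as an implicit occupancy function evaluated only at the spatio-temporal points a candidate trajectory visits (toy occupancy field and hypothetical names, not the learned model):

```python
import numpy as np

def occupancy(x, y, t):                 # stand-in for the learned implicit field
    ox, oy = 5.0 - 1.0 * t, 0.0         # one agent approaching along -x
    return float(np.exp(-((x - ox) ** 2 + (y - oy) ** 2)))

def trajectory_risk(traj):
    # query occupancy only at the points the candidate trajectory reaches,
    # instead of predicting a dense grid for the whole scene
    return max(occupancy(x, y, t) for t, (x, y) in enumerate(traj))

candidate = [(0.5 * t, 0.0) for t in range(8)]   # slow ego trajectory
print(round(trajectory_risk(candidate), 3))      # peak collision risk queried
```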
|
|
10:30-12:00, Paper ThAT33-CC.3 | Add to My Program |
Safe Receding Horizon Motion Planning with Infinitesimal Update Interval |
|
Jang, Inkyu | Seoul National University |
Hwang, Sunwoo | Seoul National University |
Byun, Jeonghyun | Seoul National University |
Kim, H. Jin | Seoul National University |
Keywords: Integrated Planning and Control, Robot Safety, Optimization and Optimal Control
Abstract: Safety verification in motion planning is known to be computationally burdensome, despite its importance in robotics. In this paper, we investigate the behavior of safe receding horizon motion planners when the update interval becomes infinitesimal. By requiring the trajectory parameters to evolve continuously in time, the trajectory optimization problem is reformulated into a time-derivative form, whose decision variables are their rate of change. This results in a quadratic programming problem which directly provides safe input, and can be regarded as a real-time safety filter. The input expressivity is also enhanced by leveraging the differentiable structure of the parameter space. The proposed safety filter is experimentally validated using a wheeled ground robot in obstacle-cluttered environments. The result shows that the safety filter is capable of generating safe inputs in real-time, while addressing hundreds of constraints simultaneously.
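The resulting quadratic program is easy to illustrate in the single-constraint case, where it reduces to a closed-form projection of the desired input onto a safe half-space; this is a generic safety-filter sketch, not the paper's full time-derivative formulation:

```python
import numpy as np

def safety_filter(u_des, a, b):
    """Minimize ||u - u_des||^2 subject to a^T u <= b (one linear constraint)."""
    if a @ u_des <= b:
        return u_des                                  # already safe
    return u_des - ((a @ u_des - b) / (a @ a)) * a    # project onto boundary

u_des = np.array([1.0, 0.5])           # planner's desired input
a, b = np.array([1.0, 1.0]), 1.0       # safety half-space a^T u <= b
print(safety_filter(u_des, a, b))      # safe input closest to u_des
```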
|
|
10:30-12:00, Paper ThAT33-CC.4 | Add to My Program |
NPC: Neural Predictive Control for Fuel-Efficient Autonomous Trucks |
|
Ren, Jiaping | Inceptio Technology |
Xiang, Jiahao | Tongji University, Inceptio Technology |
Gao, Hongfei | Inceptio Technology |
Zhang, Jinchuan | Inceptio Technology |
Ren, Yiming | ShanghaiTech University |
Ma, Yuexin | ShanghaiTech University |
Wu, Yi | Nanjing University of Posts and Telecommunications |
Yang, Ruigang | University of Kentucky |
Li, Wei | Inceptio |
Keywords: Integrated Planning and Control, Planning under Uncertainty, Integrated Planning and Learning
Abstract: Fuel efficiency is a crucial aspect of long-distance cargo transportation by oil-powered trucks, as it economizes on costs and decreases carbon emissions. Current predictive control methods depend on an accurate model of vehicle dynamics and engine, including weight, drag coefficient, and the Brake-specific Fuel Consumption (BSFC) map of the engine. We propose a purely data-driven method, Neural Predictive Control (NPC), which does not use any physical model for the vehicle. After training with over 20,000 km of historical data, the proposed NVFormer implicitly models the relationship between vehicle dynamics, road slope, fuel consumption, and control commands using the attention mechanism. Based on primitives sampled online from the past of the current freight trip and anchor-based future data synthesis, the NVFormer can infer optimal control commands for reasonable fuel consumption. The physical-model-free NPC outperforms the baseline PCC method with 2.41% and 3.45% more fuel saving in simulation and open-road highway testing, respectively.
|
|
10:30-12:00, Paper ThAT33-CC.5 | Add to My Program |
Robustified Time-Optimal Collision-Free Motion Planning for Autonomous Mobile Robots under Disturbance Conditions |
|
Zhang, Shuhao | KU Leuven |
Bos, Mathias | KU Leuven |
Vandewal, Bastiaan | KU Leuven |
Decré, Wilm | Katholieke Universiteit Leuven |
Gillis, Joris | KU Leuven |
Swevers, Jan | KU Leuven |
Keywords: Planning under Uncertainty, Collision Avoidance, Integrated Planning and Control
Abstract: This paper presents a robustified time-optimal motion planning approach for navigating an Autonomous Mobile Robot (AMR) from an initial state to a terminal state without colliding with obstacles, even when subjected to disturbances, which are modeled as random process noise and measurement noise. The approach iteratively solves the robustified problem by incorporating updated state-dependent safety margins for collision avoidance, the evolution of which is derived separately from the robustified problem. Additionally, a strategy for selecting an alternative terminal state to reach is introduced, which comes into play when the desired terminal state becomes infeasible considering the disturbances. Both of these contributions are integrated into a robustified motion planning and control pipeline, the efficacy of which is validated through simulation experiments.
|
|
10:30-12:00, Paper ThAT33-CC.6 | Add to My Program |
Learning-Aided Warmstart of Model Predictive Control in Uncertain Fast-Changing Traffic |
|
Bouzidi, Mohamed-Khalil | Continental, FU Berlin |
Yao, Yue | Freie Universität Berlin & Continental AG |
Reichardt, Joerg | Continental AG |
Goehring, Daniel | Freie Universität Berlin |
Keywords: Integrated Planning and Learning, Integrated Planning and Control, Constrained Motion Planning
Abstract: Model Predictive Control lacks the ability to escape local minima in nonconvex problems. Furthermore, in fast-changing, uncertain environments, the conventional warmstart, using the optimal trajectory from the last timestep, often falls short of providing an adequately close initial guess for the current optimal trajectory. This can potentially result in convergence failures and safety issues. Therefore, this paper proposes a framework for learning-aided warmstarts of Model Predictive Control algorithms. Our method leverages a neural network based multimodal predictor to generate multiple trajectory proposals for the autonomous vehicle, which are further refined by a sampling-based technique. This combined approach enables us to identify multiple distinct local minima and provide an improved initial guess. We validate our approach with Monte Carlo simulations of traffic scenarios.
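The selection step of such a warmstart can be sketched as scoring multiple network proposals with the planner's objective and seeding the solver with the cheapest feasible one (toy costs and constraints; the sampling-based refinement stage is omitted, and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
# stand-in for the multimodal predictor: five candidate trajectories (T x 2)
proposals = [rng.normal(0, 1, size=(20, 2)) for _ in range(5)]

def cost(traj):                        # toy surrogate of the MPC objective
    return float(np.sum(traj ** 2))

def feasible(traj):                    # toy constraint check
    return bool(np.all(np.abs(traj) < 3.0))

candidates = [t for t in proposals if feasible(t)]
warmstart = min(candidates, key=cost)  # initial guess handed to the NLP solver
print(cost(warmstart))
```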
|
|
10:30-12:00, Paper ThAT33-CC.7 | Add to My Program |
Co-Learning Planning and Control Policies Constrained by Differentiable Logic Specifications |
|
Xiong, Zikang | Purdue University |
Lawson, Daniel | Purdue University |
Eappen, Joe Kurian | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Jagannathan, Suresh | Purdue University |
Keywords: Reinforcement Learning, Integrated Planning and Control, Deep Learning Methods
Abstract: Synthesizing planning and control policies in robotics is a fundamental task, further complicated by factors such as complex logic specifications and high-dimensional robot dynamics. This paper presents a novel reinforcement learning approach to solving high-dimensional robot navigation tasks with complex logic specifications by co-learning planning and control policies. Notably, this approach significantly reduces the sample complexity in training, allowing us to train high-quality policies with much fewer samples compared to existing reinforcement learning algorithms. In addition, our methodology streamlines complex specification extraction from map images and enables the efficient generation of long-horizon robot motion paths across different map layouts. Moreover, our approach also demonstrates capabilities for high-dimensional control and avoiding suboptimal policies via policy alignment. The efficacy of our approach is demonstrated through experiments involving simulated high-dimensional quadruped robot dynamics and a real-world differential drive robot (TurtleBot3) under different types of task specifications.
|
|
10:30-12:00, Paper ThAT33-CC.8 | Add to My Program |
Output-Sampled Model Predictive Path Integral Control (o-MPPI) for Increased Efficiency |
|
Yan, Leon | University of Washington |
Devasia, Santosh | University of Washington |
Keywords: Integrated Planning and Control, Optimization and Optimal Control
Abstract: The success of the model predictive path integral control (MPPI) approach depends on the appropriate selection of the input distribution used for sampling. However, it can be challenging to select inputs that satisfy output constraints in dynamic environments. The main contribution of this paper is to propose an output-sampling-based MPPI (o-MPPI), which improves the ability of samples to satisfy output constraints and thereby increases MPPI efficiency. Comparative simulations and experiments of dynamic autonomous driving of bots around a track show that the proposed o-MPPI is more efficient and requires a substantially (20 times) smaller number of rollouts and a (4 times) shorter prediction horizon when compared with the standard MPPI for similar success rates.
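For reference, the standard MPPI update that o-MPPI builds on turns rollout costs into softmax weights over sampled perturbations; o-MPPI's change is to draw samples in output space so rollouts satisfy output constraints more often. The sketch below shows the standard input-space update on toy dynamics:

```python
import numpy as np

rng = np.random.default_rng(2)
K, T, lam = 64, 15, 1.0
u = np.zeros(T)                            # nominal control sequence

def rollout_cost(u_seq):                   # toy cost: track a velocity of 1.0
    v = np.cumsum(u_seq) * 0.1
    return np.sum((v - 1.0) ** 2)

eps = rng.normal(0.0, 0.5, size=(K, T))    # input-space samples (standard MPPI)
costs = np.array([rollout_cost(u + e) for e in eps])
w = np.exp(-(costs - costs.min()) / lam)   # information-theoretic weights
w /= w.sum()
u = u + w @ eps                            # weighted-average update
print(rollout_cost(u) <= np.median(costs)) # usually True: updated u improves
```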
|
|
10:30-12:00, Paper ThAT33-CC.9 | Add to My Program |
The Virtues of Laziness: Multi-Query Kinodynamic Motion Planning with Lazy Methods |
|
Pasricha, Anuj | University of Colorado Boulder |
Roncone, Alessandro | University of Colorado Boulder |
Keywords: Motion and Path Planning, Integrated Planning and Control, Manipulation Planning
Abstract: In this work, we introduce LazyBoE, a multi-query method for kinodynamic motion planning with forward propagation. This algorithm allows for the simultaneous exploration of a robot's state and control spaces, thereby enabling a wider suite of dynamic tasks in real-world applications. Our contributions are three-fold: i) a method for discretizing the state and control spaces to amortize planning times across multiple queries; ii) lazy approaches to collision checking and propagation of control sequences that decrease the cost of physics-based simulation; and iii) LazyBoE, a robust kinodynamic planner that leverages these two contributions to produce dynamically-feasible trajectories. The proposed framework not only reduces planning time but also increases success rate in comparison to previous approaches.
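The "lazy" ingredient can be illustrated with a generic lazy shortest-path loop (a sketch of the general technique, not the authors' exact LazyBoE procedure): search as if every edge were valid, and spend the expensive collision check only on edges of candidate paths.

    import heapq

    def lazy_shortest_path(edges, start, goal, edge_valid):
        # edges: dict node -> list of (neighbor, cost)
        # edge_valid(u, v): expensive collision/propagation check, done lazily
        # (a real implementation would also cache edges that pass the check)
        invalid = set()

        def dijkstra():
            dist, prev = {start: 0.0}, {}
            pq = [(0.0, start)]
            while pq:
                d, u = heapq.heappop(pq)
                if u == goal:                       # reconstruct candidate path
                    path = [u]
                    while u in prev:
                        u = prev[u]
                        path.append(u)
                    return path[::-1]
                if d > dist.get(u, float('inf')):
                    continue
                for v, c in edges.get(u, []):
                    if (u, v) in invalid:
                        continue
                    if d + c < dist.get(v, float('inf')):
                        dist[v], prev[v] = d + c, u
                        heapq.heappush(pq, (d + c, v))
            return None

        while True:
            path = dijkstra()
            if path is None:
                return None                         # nothing survives validation
            bad = next(((u, v) for u, v in zip(path, path[1:])
                        if not edge_valid(u, v)), None)
            if bad is None:
                return path                         # candidate fully validated
            invalid.add(bad)                        # invalidate and replan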
|
|
ThAL-EX Poster Session, Exhibition Hall |
Add to My Program |
Late Breaking Results Poster VII |
|
|
|
10:30-12:00, Paper ThAL-EX.1 | Add to My Program |
Development of Control Systems for Autonomous Mobile Robots in Holonic Manufacturing Systems |
|
Yang, Dayeon | Korea Institute of Industrial Technology |
Ju, Chanyoung | Korea Institute of Industrial Technology |
Keywords: Industrial Robots, Autonomous Agents, SLAM
Abstract: An innovative manufacturing system called the holonic manufacturing system was created to increase production process efficiency. To automate manufacturing, autonomous mobile robots must be included in the holonic manufacturing system. Thus, this study presents control methods for an autonomous mobile manipulator that can operate independently in machining operations such as welding and drilling. The developed robot consists of a manipulator for machining tasks and a mobile robot that can navigate the manufacturing facility. An enhanced bacteria foraging optimization algorithm-based controller is proposed to optimize performance for a class of nonlinear process models. This work outlines the robot's mechanical design, analyzes it, and suggests future research.
|
|
10:30-12:00, Paper ThAL-EX.2 | Add to My Program |
Development of a Printable Joint Structure Using Multi-Material 4D Printing |
|
Park, Jong Hoo | Seoul National University |
Lee, Haemin | Seoul National University |
Cho, Kyu-Jin | Seoul National University, Biorobotics Laboratory |
Keywords: Methods and Tools for Robot System Design, Assembly, Compliant Joints and Mechanisms
Abstract: 3D printing, or additive manufacturing, enables the creation of complex structures that conventional manufacturing technology cannot replicate. Such advantages have prompted researchers to create custom articulated mechanisms without manual assembly using various 3D printing techniques, an approach that can be called Single-step 3D Printing. However, in 3D printing, joint clearance between bodies must be considered; otherwise, undesired fusion between bodies may occur. Current research therefore focuses on minimizing joint clearance through geometric design; even so, the robustness of 3D-printable articulated mechanisms remains far inferior to that of mechanisms made with machining and manual assembly. In this paper, 4D printing technology is utilized to enhance the robustness of the motion of printable joint structures. The joint structure designed by the proposed approach is programmed to minimize its joint clearances, as 4D printing implies shape change after 3D printing under a stimulus. Multi-material 3D printing is also incorporated to achieve appropriate material properties and to realize selective shape change for components of the joint. As a result, the joint performance is compared with that of conventional 3D-printable joints. The proposed approach can generate robust joint motions by realizing a joint clearance that is unachievable with conventional 3D printing technologies and is expected to be applied in various robotics applications in the future.
|
|
10:30-12:00, Paper ThAL-EX.3 | Add to My Program |
A Machine Vision for the Automation Process of Slaughtering Ducks |
|
Ko, KwangEun | Korea Institute of Industrial Technology |
Yoon, Chanyoung | Korea Institute of Industrial Technology |
Yang, Gi-Hun | KITECH |
Kang, Jaehyeon | Korea Institute of Industrial Technology |
Han, Sang Kuy | Korea Institute of Industrial Technology |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Visual Learning
Abstract: The duck slaughter process consists of a total of 16 stages from arrival to shipping. Most of the process has been automated. However, the bloodletting stage still requires repetitive and dangerous labor from workers who are constantly exposed to physical and psychological hazards, including blood and filth. To automate the bloodletting process for duck slaughtering, artificial intelligence (A.I.) and robot technologies have attracted attention. In this study, we propose a machine vision based bloodletting area recognition technology to automate the bloodletting process during duck slaughter. A Mask-RCNN, the most general deep learning model for background segmentation, is utilized for object-background segmentation. Following the background segmentation, an integrated pipeline was implemented to find the bloodletting area based on the cervical vertebrae estimated from the neck area of each duck. For evaluating the proposed method, a training dataset of 1,789 RGB images was collected from the duck slaughter process, and the deep learning model for duck neck-background segmentation was trained. The binary mask for the duck neck region estimated by the trained model is applied to the input images. For the automatic detection of the neck area, i.e., the target bloodletting area, the testing results show high accuracy. Two cases of non-faint ducks, i.e., ducks with a bent neck or excessive body movement, were also successfully detected.
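For context, segmentation inference with a stock Mask R-CNN looks roughly like the following (torchvision's pretrained model as a stand-in; the authors' network is trained on their own duck dataset, and the bloodletting-area localization built on top of the mask is not shown):

    import torch
    import torchvision

    def segment_foreground(image_chw, score_thresh=0.7):
        # image_chw: float tensor (3, H, W) scaled to [0, 1]
        model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights='DEFAULT')
        model.eval()
        with torch.no_grad():
            out = model([image_chw])[0]            # dict with boxes/scores/masks
        keep = out['scores'] > score_thresh
        return out['masks'][keep] > 0.5            # (N, 1, H, W) boolean masks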
|
|
10:30-12:00, Paper ThAL-EX.4 | Add to My Program |
Vision-Based UAV Geo-Localization Using Satellite Images in GNSS-Denied Environments |
|
Choi, Euncheol | Inha University |
Jung, Sungwook | KETI (Korea Electronics Technology Institute) |
Cho, Younggun | Inha University |
Keywords: Aerial Systems: Perception and Autonomy, Field Robots, Localization
Abstract: Robust and accurate localization is an essential technology for the autonomous flight of unmanned aerial vehicles (UAVs). Currently, many UAV localization methodologies rely on GNSS signals, which can be easily blocked in urban environments and can be neutralized by jamming and spoofing. Therefore, geo-localization through matching UAV images and satellite images has been actively studied recently, but difficulties remain due to the large appearance gap between the two types of images. In this paper, we present a geo-localization method that matches UAV images to satellite images using a foundation vision model in a GNSS-denied environment and estimates the global position of the UAV through integration with a Kalman filter. To validate the effectiveness of the presented geo-localization methodology, experiments were conducted on real aerial datasets.
|
|
10:30-12:00, Paper ThAL-EX.5 | Add to My Program |
Visual Affordance Model for Apple Harvesting Based on Hybrid Egocentric Dataset Collection |
|
Kim, Geonkuk | Korea University |
Park, Juyoun | Korea Institute of Science and Technology |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, AI-Enabled Robotics
Abstract: This poster introduces a pioneering approach applying a visual affordance model to apple harvesting tasks, estimating the contact points and post-contact trajectories necessary for robotic manipulation. Affordance, in this context, refers to the potential actions that a robot can perform with a given object. By leveraging this concept of affordance embedded within an action induction model, we propose a robot that automatically generates motions to harvest apples. Video datasets of apple harvesting are collected from both the real world and simulation, and they are used to train the visual affordance model. The trained visual affordance model is qualitatively evaluated through visualizations of contact point heatmaps and generated trajectories for a given apple image. Additionally, for further quantitative evaluation, we apply the proposed model to various tasks in kitchen environments. We refine the algorithm into a task-conditioned affordance model, which assigns multiple tasks to a single object based on the verb. Evaluations are conducted against a baseline in terms of metrics such as ADE, FDE, SIM, AUC-J, and NSS. The evaluation results confirm that the task-conditioned affordance model outperforms the baseline across all five metrics. Future plans include utilizing the Isaac Sim and O3DE simulators for a range of tasks to validate and verify the practicality of the proposed apple harvesting affordance model, with further tests to be conducted in the real world.
|
|
10:30-12:00, Paper ThAL-EX.6 | Add to My Program |
Trajectory Planning Based on Time-Space Network with Dijkstra Algorithm for Security Screening Using THz Sensor-Equipped UGV |
|
Uchida, Yuki | National Defense Academy of Japan |
Tsujita, Teppei | National Defense Academy of Japan |
Sakuma, Yutaka | National Defense Academy of Japan |
Abiko, Satoko | Shibaura Institute of Technology |
Sato, Daisuke | Tokyo City University |
Keywords: Surveillance Robotic Systems, Motion and Path Planning
Abstract: This poster presents a trajectory planning algorithm for security screening by a robot equipped with a terahertz sensor. The algorithm represents the robot and pedestrians in a two-dimensional grid and predicts the number of pedestrians that can be inspected at a given time and location. It then formulates the inspection problem as a shortest-path problem on a time-space network, in which trajectories that allow more pedestrians to be inspected incur lower cost. Dijkstra's algorithm is used to find the shortest path.
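A sketch of the time-space-network formulation as we read it (Python; the edge cost below, c_max minus the predicted inspection count, is an illustrative choice that keeps costs nonnegative for Dijkstra, and the exact cost definition is the authors'):

    import heapq

    def plan_inspection_cost(moves, inspectable, T, start, c_max):
        # Nodes are (cell, t); moves(cell) yields neighbor cells (incl. waiting);
        # inspectable(cell, t) is the predicted number of inspectable pedestrians.
        dist = {(start, 0): 0.0}
        pq = [(0.0, start, 0)]
        best = None
        while pq:
            d, cell, t = heapq.heappop(pq)
            if t == T:                              # reached the planning horizon
                best = d if best is None else min(best, d)
                continue
            for nxt in moves(cell):
                c = c_max - inspectable(nxt, t + 1) # more inspections -> lower cost
                key = (nxt, t + 1)
                if d + c < dist.get(key, float('inf')):
                    dist[key] = d + c
                    heapq.heappush(pq, (d + c, nxt, t + 1))
        return best    # path reconstruction via predecessors is omitted here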
|
|
10:30-12:00, Paper ThAL-EX.7 | Add to My Program |
Walking in Constrained Environment Using Model-Based Reinforcement Learning for Virtual Constraint-Based Gait |
|
Jin, Takanori | National Institute of Informatics/SOKENDAI |
Kobayashi, Taisuke | National Institute of Informatics |
Matsubara, Takamitsu | Nara Institute of Science and Technology |
Keywords: Humanoid and Bipedal Locomotion, Legged Robots, Machine Learning for Robot Control
Abstract: Virtual constraint-based gait control can produce a variety of gait motions depending on the virtual constraints provided. However, this method suffers from footstep constraints because the center of mass (CoM) motion cannot be predicted analytically. The method therefore requires a numerical solution of the forward problem, which is vulnerable to errors. To solve this issue, we propose a footstep planning method using model-based reinforcement learning for virtual constraint-based walking. In the proposed method, model predictive control (MPC) evaluates the stability of each step while adjusting the footstep and manipulating the conserved quantities. To simplify the optimal control problem, passive dynamic autonomous control (PDAC), which compresses the CoM motion to the lowest dimension, is employed for walking control. The entire transition model used to predict the future is decomposed into three segments to improve learning speed by utilizing knowledge of the gait phases. The three decomposed models and a stability cost, which evaluates footstep stability, are trained with ensemble learning to reduce modeling error and enable efficient exploration. Simulation results showed that the proposed method achieved a goal achievement rate nearly twice as high as the simplified baseline. Furthermore, the proposed method successfully maintains more than 70% of the constraints even in constrained environments.
|
|
10:30-12:00, Paper ThAL-EX.8 | Add to My Program |
Action Recognition Based Variable Scaling Teleoperation System for Robotic Spine Surgery |
|
Lee, Hunjo | Korea University of Science and Technology, Korea Institute of I |
Kim, Donghyun | University of Science and Technology |
Yang, Gi-Hun | KITECH |
Keywords: Telerobotics and Teleoperation, Surgical Robotics: Laparoscopy
Abstract: This paper introduces a variable scaling teleoperation framework designed to enhance the intuitiveness of robotic spine surgery. Challenges such as a limited field of view, asymmetry between master devices and surgical robots, and the absence of haptic feedback complicate teleoperated surgery, making it less intuitive for surgeons. To address these issues, we propose a variable scaling framework capable of real-time adjustments to the motion scale or stiffness scale of a slave robot based on the surgeon's current actions. This framework uses grip force as an indicator of the operator's intended motion, adjusting scale factors accordingly. The relationships between grip force and scale factors are grounded in innate human skill, making the system intuitive to use. The scaling modes are switched based on outputs from an action recognition system, which infers surgeons' actions through surgical instruments. For this method, we employ an object detection algorithm to identify the current action during surgery. Our proposed framework thus facilitates the intuitive execution of robotic surgeries. Future studies will focus on implementing this system in actual surgical environments and verifying its effectiveness.
|
|
10:30-12:00, Paper ThAL-EX.9 | Add to My Program |
Legged Robot State Estimation within Non-Inertial Environments |
|
Zijian, He | Purdue University |
Teng, Sangli | University of Michigan, Ann Arbor |
Lin, Tzu-Yuan | University of Michigan |
Ghaffari, Maani | University of Michigan |
Gu, Yan | Purdue University |
Keywords: Sensor Fusion, Legged Robots, Humanoid and Bipedal Locomotion
Abstract: This work investigates the robot state estimation problem within a non-inertial environment. The proposed state estimation approach relaxes the common assumption of static ground in the system modeling. The process and measurement models explicitly treat the movement of the non-inertial environment without requiring knowledge of its motion in the inertial frame or relying on GPS or sensing environmental landmarks. Further, the proposed state estimator is formulated as an invariant extended Kalman filter (InEKF) with the deterministic part of its process model obeying the group-affine property, leading to log-linear error dynamics. The observability analysis confirms that the robot's pose (i.e., position and orientation) and velocity relative to the non-inertial environment are observable under the proposed InEKF.
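For reference, the group-affine property invoked above is standard in the InEKF literature: the deterministic dynamics f must satisfy, for all group elements X_1, X_2,

    f(X_1 X_2) = f(X_1)\,X_2 + X_1\,f(X_2) - X_1\,f(\mathrm{Id})\,X_2,

which is precisely the condition under which the logarithmic (invariant) error \xi_t evolves as the trajectory-independent linear system \dot{\xi}_t = A_t\,\xi_t.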
|
|
10:30-12:00, Paper ThAL-EX.10 | Add to My Program |
Continual Skill Learning with Vision-Language Model for Robotic Manipulation |
|
Tan, Runjia | Nanyang Technological University |
Lou, Shanhe | Nanyang Technological University |
Huang, Wenhui | NanYang Technological University |
Lv, Chen | Nanyang Technological University |
Keywords: Manipulation Planning, AI-Enabled Robotics, Intelligent and Flexible Manufacturing
Abstract: The advent of Large Language Models (LLMs) has significantly enhanced the capability of robots to perform tasks based on human instructions. However, a persistent challenge remains: enabling robots to autonomously learn and improve from their past experiences. Addressing this, our paper introduces a systematic approach that assists robots in acquiring new skills through their interaction history. At the heart of our methodology is the development of a meta-task framework, which conceptualizes all tasks as sequences of meta-tasks within a hierarchical skill library that categorizes tasks based on their complexity. To ensure the applicability of acquired skills across different settings, we incorporate a scene understanding module that maintains skill consistency across diverse environments. Moreover, our system is designed to allow human operators to effortlessly invoke these newly acquired skills through direct instructions. We have conducted extensive testing of our system in both simulated and real-world environments to validate its effectiveness and versatility.
|
|
10:30-12:00, Paper ThAL-EX.11 | Add to My Program |
Characterizing Robot Vision Solutions for Anomaly Detection in Confined Spaces |
|
Patil, Apoorva | University of Washington |
Lee, Ryan | University of Washington |
Balasubramaniyam, Shankruth | Indian Institute of Technology Madras |
Zhang, Tommy | University of Washington |
Wong, Benjamin | University of Washington |
Banerjee, Ashis | University of Washington |
Keywords: Robotics in Hazardous Fields, RGB-D Perception
Abstract: We present a framework to experimentally characterize robot vision systems for anomaly detection during inspection of confined spaces, containing key structural and functional elements, such as pipes, cables, columns, and I-beams. The anomalies typically comprise rust patches and corroded sections, and foreign object debris of different kinds such as industrial tools. We first collect our own dataset in a realistic confined space resembling a large ballast tank, using two commodity RGB-D cameras with a large collection of FODs and rust patches of different shapes and sizes. We then employ fine-tuned object detection models to classify the anomalies in the RGB-D images, and the classification performance is used to characterize the effectiveness of the two depth cameras. The results indicate that one of the cameras tends to capture higher-quality depth images for anomaly detection purposes.
|
|
10:30-12:00, Paper ThAL-EX.12 | Add to My Program |
Integration of Active Object Search and Object Manipulation in Constrained Robot Operational Spaces through Real-Time Efficient |
|
Sakamaki, Arata | Tamagawa University |
Contreras-Toledo, Luis Angel | Tamagawa University |
Inamura, Tetsunari | Tamagawa University |
Okada, Hiroyuki | Tamagawa University |
Keywords: Domestic Robotics, Service Robotics, Mobile Manipulation
Abstract: In recent years, various methods and algorithms have been researched and proposed, leading to remarkable advancements in the performance of autonomous domestic service robots. One example task is manipulating objects within cluttered, occluded shelves, where the robot's workspace is constrained, in order to discover specific items. Despite the numerous methods proposed for such challenging problems, many studies have focused on evaluating the number of actions and success rates, with few addressing real-time efficiency. Therefore, in this study, we focus on real-time efficiency under conditions with a constrained feasible workspace, such as shelves in domestic settings. Initially, operating on the assumption that the robot's actions succeed, we employ a combination of simple methods based on human empirical knowledge to operate the robot reliably. This approach aims to clarify interactions between the robot and objects, highlighting where the robot fails at tasks and incurs time losses, aspects that cannot be measured by success rates alone.
|
|
10:30-12:00, Paper ThAL-EX.13 | Add to My Program |
Cardiac Assistive Device Based on Electroactive Polymer |
|
Kim, Jiyeop | Seoul National University |
Lee, Junheon | Seoul National University |
Song, Sein | Seoul National University |
Kang, Si-Hyuck | Seoul National University Bundang Hospital |
Han, Amy Kyungwon | Seoul National University |
Keywords: Medical Robots and Systems, Soft Robot Applications, Soft Sensors and Actuators
Abstract: Over 6 million heart failure with reduced ejection fraction (HFrEF) patients face a 70% 5-year mortality rate. These patients rely on a continuous flow pump, a left ventricular assist device (LVAD), while waiting for a heart donor. Research on direct ventricular compression assist devices utilizing soft robotics, such as pneumatic and shape memory alloy systems, has been conducted to overcome the limitations of LVADs. Still, these devices face challenges with driveline removal, high power consumption, and portability. We present a ventricular assistive device based on electroactive polymers (EAPs) that mechanically assists ventricular contraction with low power consumption and the potential to be wireless. The device utilizes compact EAPs with negative-bias springs (NBS) to effectively assist ventricular contraction by mimicking the systolic pressure-volume curve. The EAP-NBS units are arranged on a flexible sleeve that wraps the ventricle and can be adjusted to be patient-specific. The device was tested on an HFrEF-model silicone phantom and demonstrated an 18.8% increase in cardiac output and a 1.19x higher ejection fraction with <0.3 W of power consumption compared to the scenario without the device. To the best of our knowledge, this is the first EAP-based direct ventricular compression assistive device. Unlike LVADs, our device is blood-contact free, minimizing the risk of thrombosis, and does not require a large battery or a driveline, significantly increasing quality of life.
|
|
10:30-12:00, Paper ThAL-EX.14 | Add to My Program |
Human-In-The-Loop Physics Simulation of a Mobile Robotic Balance Assistant |
|
Chan, Sherwin Stephen | Nanyang Technological University |
Wang, Yifan | Nanyang Technological University |
Lei, Mingyuan | Nanyang Technological University |
Johan, Henry | Nanyang Technological University |
Ang, Wei Tech | Nanyang Technological University |
Keywords: Human Factors and Human-in-the-Loop, Simulation and Animation
Abstract: As the world ages, rehabilitation and assistive devices will be pivotal in meeting mobility-related challenges, enhancing rehabilitation outcomes, and alleviating demands on caregivers. The Mobile Robotic Balance Assistant (MRBA) is a balance assistive robot used during gait training and activities of daily living. As with all other healthcare robots, extensive design iterations and clinical trials were needed to refine the robot system design, resulting in a slow and expensive development process. Our work explores the use of Human-in-the-Loop physics simulation with accurate Human-Robot Interaction to allow for a more efficient and cost-effective way to optimise the robot system design. Our simulation uses a personalized digital human model with various trained walking control policies as a digital guinea pig, and our digital robot model replicates the kinematic, dynamic and control properties of its real counterpart. We model the interaction between the human and robot as a six-degree-of-freedom constrained mass-spring-damper model to generate the interaction forces. Our results show that the digital human's reaction when using the robot is similar to experimental observations, which highlights the potential of using simulation as a tool to customise the robot to the user and enhance its effectiveness for each user.
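The coupling model admits a very short sketch (the gain matrices here are illustrative placeholders, not the authors' identified parameters):

    import numpy as np

    def interaction_wrench(pose_err, vel_err, K, C):
        # pose_err, vel_err: 6-vectors (translational + rotational error and rates)
        # K, C: 6x6 stiffness and damping matrices of the constrained
        # mass-spring-damper coupling between digital human and robot
        return K @ pose_err + C @ vel_err   # 6-DoF wrench: force + torque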
|
|
10:30-12:00, Paper ThAL-EX.15 | Add to My Program |
DART: Directed Action Realtime Tracking |
|
Lin, Huai-Ti | Imperial College London |
Zhou, Rui | Imperial College London |
Hadfield, Charles | Imperial College London |
Keywords: Mechanism Design, Visual Tracking, Biomimetics
Abstract: Visual tracking is an important, ubiquitous and challenging task for both animals and robots. Precise localization of moving objects in the world using vision allows safe path planning, efficient guidance, and better situation awareness. However, visually detecting and tracking a fast-moving target with erratic trajectories is computationally expensive and often requires a model of the world. Here, we take inspiration from our study of dragonfly vision and visual tracking behavior to develop an energy-efficient, model-free, high-speed visual tracking system. The system consists of two parts: 1) a pan-tilt mirror system for steering the line-of-sight of a capture camera, and 2) a bioinspired visual tracking algorithm for commanding the system in the real world to follow a moving target. The system was first developed to investigate visual guidance and flight control in insects by tracking freely flying insects such as the dragonfly in the lab. The initial system performance evaluation through motion capture control showed robust results and demonstrated promise for bioinspired visual tracking control in the outdoor environment. Our late-breaking results demonstrate how the dragonfly-inspired visual tracking strategies can be implemented to track fast-flying dragonflies in their habitat. The system can be further adapted for many other machine vision applications for detecting and tracking small fast-moving objects with minimal computation overhead.
|
|
10:30-12:00, Paper ThAL-EX.16 | Add to My Program |
Approximate Multiagent Reinforcement Learning for Large-Scale On-Demand Urban Mobility |
|
Bhattacharya, Sushmita | Harvard University |
Garces, Daniel | Harvard University |
Bertsekas, Dimitri | MIT |
Gil, Stephanie | Harvard University |
Keywords: Intelligent Transportation Systems, Multi-Robot Systems, Reinforcement Learning
Abstract: We focus on the autonomous multiagent taxi routing problem for a large urban environment with realistic travel time constraints, where the location and number of future ride requests are unknown a priori but can be estimated by an empirical distribution. Motivated by recent theory showing that a rollout algorithm with a stable base policy produces a near-optimal stable policy, our previous work proposed a hierarchical RL approach that scales sub-linearly with the number of agents. In this paper, we provide a provably stable hierarchical two-phase rollout-based RL approach, which keeps the number of outstanding requests uniformly bounded over time and improves the run time compared to our previous work. Our preliminary results show that the new two-phase approach outperforms our previous work for unit-travel-time cases. The new two-phase policy also outperforms a stable base policy for a fleet size five times larger than the one considered in our previous work, with realistic travel time constraints.
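The rollout principle the paper builds on can be sketched as a one-step lookahead over a Monte Carlo estimate of the base policy's cost-to-go (a generic sketch under simplifying assumptions; the paper's provably stable two-phase hierarchy is substantially richer):

    def rollout_action(state, actions, simulate, base_policy, horizon, n_samples=32):
        # simulate(s, a) -> (next_state, stage_cost), possibly stochastic

        def base_cost_to_go(s):
            total = 0.0
            for _ in range(horizon):               # follow the base policy
                s, c = simulate(s, base_policy(s))
                total += c
            return total

        def q_value(a):
            total = 0.0
            for _ in range(n_samples):             # Monte Carlo over request arrivals
                s1, c = simulate(state, a)
                total += c + base_cost_to_go(s1)
            return total / n_samples

        return min(actions, key=q_value)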
|
|
10:30-12:00, Paper ThAL-EX.17 | Add to My Program |
Pedestrian Trajectory Prediction with Pose Estimation and Monte Carlo Dropout |
|
Tadano, Shunya | Tohoku University |
Tamura, Yusuke | Tohoku University |
Hirata, Yasuhisa | Tohoku University |
Keywords: RGB-D Perception
Abstract: Pedestrian movement prediction is essential for autonomous mobile robots (AMRs) and social robots that share the same environment with humans. A key challenge is the accurate prediction of pedestrian trajectories, due to uncertainties from the environment, sensor factors, and unpredictable pedestrian behaviors. In this study, we propose a dual approach to enhance prediction accuracy by managing these uncertainties. First, we employ Monte Carlo Dropout in our predictive model to produce multiple prediction outputs. This method accounts for uncertainty in pedestrian movements, addressing the inherent unpredictability of human behavior. Second, we incorporate pedestrian pose information, using a posture estimation model with data from a first-person RGB-D camera. This integration provides critical cues for prediction, improving the model's performance on the Average Displacement Error (ADE) and Final Displacement Error (FDE) metrics. In future work, we will validate our model in more complex scenarios and real-world applications.
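A minimal sketch of the Monte Carlo Dropout step (a generic PyTorch pattern, not the authors' network): keep dropout stochastic at inference time and aggregate repeated forward passes.

    import torch

    def mc_dropout_predict(model, obs, n_samples=30):
        # Enable stochastic dropout at inference. (In practice one would set only
        # the nn.Dropout modules to train mode so batch-norm statistics stay frozen.)
        model.train()
        with torch.no_grad():
            preds = torch.stack([model(obs) for _ in range(n_samples)])
        # Mean trajectory plus a per-step spread usable as an uncertainty estimate
        return preds.mean(dim=0), preds.std(dim=0)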
|
|
10:30-12:00, Paper ThAL-EX.18 | Add to My Program |
An Enhanced Shopping Cart Guidance System for Visually Impaired Users with Passive Control Interface |
|
Shi, Zhan | TOHOKU University |
Tamura, Yusuke | Tohoku University |
Liao, Zhenyu | Tohoku University |
He, Weizan | Tohoku University |
Hirata, Yasuhisa | Tohoku University |
Keywords: Human-Centered Robotics
Abstract: To address the challenges faced by visually impaired individuals in supermarkets, this paper introduces a shopping cart-type robot equipped with a camera and environmental markers for intuitive navigation. It employs a passive control strategy for safety and provides verbal and haptic feedback to guide users to their destinations. In ten validation experiments, the system effectively led blindfolded users to desired locations, demonstrating its potential as an assistive tool for visually impaired shoppers.
|
|
10:30-12:00, Paper ThAL-EX.19 | Add to My Program |
Robust Side Following Robotic Wheelchair by Using Homotopy Class of Human Intention |
|
Tan, Kuan Yuee | Nanyang Technological University |
Garg, Neha Priyadarshini | NUS |
Ramanathan, Manoj | Nanyang Technological University |
Ang, Wei Tech | Nanyang Technological University |
Keywords: Human-Aware Motion Planning, Intention Recognition, Human-Centered Automation
Abstract: A side-by-side following robot can alleviate the need for wheelchair pushing, thereby reducing the burden on caregivers and porters. Existing works cannot be easily adapted to different environments, as they either need prior knowledge of the person's path or may take a different path than the human when moving around obstacles. In this work, we propose a side-by-side following system that ensures the wheelchair takes the same path as the human around obstacles, while avoiding them, by leveraging the homotopy class of the human's intended path. Our system can also be easily deployed in various real-world environments, as it only needs prior knowledge of the human's final goal. Through experiments in simulation, we show that our method performs significantly better than a baseline approach that does not leverage the homotopy class of the human's intended path. For further validation, we are in the process of testing our method with real human subjects using an actual wheelchair.
|
|
10:30-12:00, Paper ThAL-EX.20 | Add to My Program |
Avoidance Behavior Selection of Autonomous Mobile Robot Based on the Feasibility of Model Predictive Control |
|
Kada, Aiki | Nagoya University |
Suzuki, Kosuke | Nagoya University |
Honda, Kohei | Nagoya University |
Okuda, Hiroyuki | Nagoya University |
Suzuki, Tatsuya | Nagoya University |
Keywords: Human-Aware Motion Planning, Human-Robot Collaboration, Intention Recognition
Abstract: This study focuses on considerate behavior between AMRs and humans in shared spaces. In such shared environments, adjusting the balance between efficient navigation and yielding to humans is crucial. To switch the behavior appropriately, a framework based on the feasibility of Model Predictive Control (MPC) is proposed. In this framework, variation in behavior is realized by changing parameters in the MPC. To avoid redundancy between the behavior planner and the controller, the feasibility of the MPC itself is used to switch the behavior, and its usefulness was confirmed.
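The switching rule reduces to a few lines (solver interfaces are hypothetical placeholders):

    def select_behavior(mpc, x, params_navigate, params_yield):
        # Try the efficient-navigation parameterization first; the solver's own
        # feasibility flag is the switching signal, so no separate behavior
        # planner is needed.
        sol = mpc.solve(x, params_navigate)
        if sol.feasible:
            return sol, 'navigate'
        # Infeasible -> relax to the yielding parameterization
        return mpc.solve(x, params_yield), 'yield'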
|
|
10:30-12:00, Paper ThAL-EX.21 | Add to My Program |
Two-Dimensional Airflow Analysis of Enclosed Space with an Opening Based on PINNs for Real-Time Control of UAVs |
|
Abiko, Satoko | Shibaura Institute of Technology |
Seki, Misato | Shibaura Institute of Technology |
Tsujita, Teppei | National Defense Academy of Japan |
Sato, Daisuke | Tokyo City University |
Keywords: Aerial Systems: Perception and Autonomy, Aerial Systems: Applications, Aerial Systems: Mechanics and Control
Abstract: Recently, UAV delivery in urban areas has attracted significant attention. The authors focus on the concept of delivering packages to verandas. However, navigating drones in urban environments is considerably more complex than in open spaces due to the intricate airflow patterns between buildings and other structures. Consequently, accurate airflow prediction becomes essential for ensuring safe UAV operations. Typically, fluid dynamics is governed by the Navier-Stokes equation, a complex partial differential equation. Solving this precisely requires computational methods, as direct solutions are challenging to obtain. Computational Fluid Dynamics (CFD), using methods like the Finite Difference Method (FDM), divides the space into grids to approximate equations. Smaller grids improve accuracy but increase computation time, making real-time prediction impractical for UAV use due to this accuracy-computation time trade-off. This poster describes the application of Physics-Informed Neural Networks (PINNs) to predict airflow in real-time. PINNs are specialized neural networks trained to adhere to the underlying physical laws of fluid dynamics. The poster presents a study on two-dimensional airflow analysis within an enclosed space with an opening using PINNs. It compares the airflow predictions obtained with PINNs against those calculated using the FDM. The comparison focuses on the performance of real-time airflow analysis and the accuracy of these methods.
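The physics-residual loss at the heart of such a PINN can be sketched as follows (generic 2D incompressible Navier-Stokes form with an assumed kinematic viscosity nu; boundary-condition and data losses are omitted):

    import torch

    def ns_residual_loss(net, xyt, nu=0.01):
        # net maps (x, y, t) -> (u, v, p); xyt is an (N, 3) batch of
        # collocation points inside the enclosed space
        xyt = xyt.clone().requires_grad_(True)
        u, v, p = net(xyt).unbind(dim=1)

        def grad(f):
            g = torch.autograd.grad(f.sum(), xyt, create_graph=True)[0]
            return g[:, 0], g[:, 1], g[:, 2]        # d/dx, d/dy, d/dt

        u_x, u_y, u_t = grad(u)
        v_x, v_y, v_t = grad(v)
        p_x, p_y, _ = grad(p)
        u_xx, _, _ = grad(u_x)
        _, u_yy, _ = grad(u_y)
        v_xx, _, _ = grad(v_x)
        _, v_yy, _ = grad(v_y)

        r_u = u_t + u * u_x + v * u_y + p_x - nu * (u_xx + u_yy)  # x-momentum
        r_v = v_t + u * v_x + v * v_y + p_y - nu * (v_xx + v_yy)  # y-momentum
        r_c = u_x + v_y                                           # continuity
        return (r_u**2 + r_v**2 + r_c**2).mean()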
|
|
10:30-12:00, Paper ThAL-EX.22 | Add to My Program |
Pass through the Bottleneck: MEMS-Mirror LiDAR SLAM on Degraded Tunnels with Dynamic Noise |
|
Ruan, Jianyuan | Hong Kong Polytechnic University |
Keywords: SLAM, Range Sensing, Field Robots
Abstract: MEMS-mirror LiDAR has been deployed on commercial automotive vehicles as a low-cost, reliable, and compact choice. The MEMS-mirror technique enables dense range data and vibration resistance compared with conventional spinning LiDARs. This work explores the advantages of this new type of sensor in SLAM (Simultaneous Localization and Mapping). The system can achieve precise, fast, and robust localization and mapping and overcome one of the bottlenecks of the SLAM system: degenerated scenarios with dynamic noise. First, we use an image-based method leveraging the high-resolution range data for segmentation and feature extraction. This dramatically enhances the utilization of information from the current frame. Then, to address dynamic and degraded scenes, we propose a method to remove dynamic objects during scan registration optimization by utilizing object size information. We improve the signal-to-noise ratio in extreme environments by integrating the earlier techniques. Extensive experiments demonstrate that our method performs robustly in city tunnels with heavy traffic, while most state-of-the-art methods fail.
|
|
10:30-12:00, Paper ThAL-EX.23 | Add to My Program |
Optical-Based Slip Sensing for Reliable Tissue Handling in Robotic Minimally Invasive Surgery |
|
Lee, Minjae | Seoul National University |
Han, Amy Kyungwon | Seoul National University |
Keywords: Surgical Robotics: Laparoscopy, Medical Robots and Systems, Grasping
Abstract: In robotic minimally invasive surgery, grasping and manipulating biological tissue are the most frequent tasks, affecting the accuracy, safety, and duration of surgery. Grasping delicate, deformable, and moist tissue presents a challenge: excessive force leads to tissue damage and bleeding, while insufficient force causes tissue slippage, resulting in critical accidents and delays. The optimal gripping force is precisely enough to prevent slippage. Therefore, detecting slippage and adjusting gripping force accordingly is crucial for enhancing the safety, accuracy, and duration of surgery. Several groups have developed slip-sensing surgical graspers based on temperature [1], coil inductance [2], and micro-vibration [3]. However, their ability to detect slip speed or displacement is limited with respect to directions, response time, and minimum displacement threshold. Here, we present a surgical grasper equipped with optical slip-sensing technology that precisely measures slippage across a wide range of slip speeds (0.33 to 60 mm/s) in a 2D plane. The sensor accurately estimates slip distance under various gripping forces with a mean error of <0.32 mm. By incorporating two sensors on a surgical grasper, we identified various slippage cases, including translational, rotational, and combined slippage, as well as the stretch amount and direction of the tissue during pulling. The detection of tissue slippage and deformation will enhance tissue handling in robotic surgery.
|
|
10:30-12:00, Paper ThAL-EX.24 | Add to My Program |
Zero-Shot Safety Prediction for Autonomous Robots with Foundation World Models |
|
Mao, Zhenjiang | University of Florida |
Dai, Siqi | University of Florida |
Geng, Yuang | University of Florida |
Ruchkin, Ivan | University of Florida |
Keywords: Robot Safety, AI-Based Methods, Representation Learning
Abstract: A world model creates a surrogate world to train a controller and predict safety violations by learning the internal dynamic model of systems. However, the existing world models rely solely on statistical learning of how observations change in response to actions, lacking precise quantification of how accurate the surrogate dynamics are, which poses a significant challenge in safety-critical systems. To address this challenge, we propose foundation world models that embed observations into meaningful and interpretable latent representations. This enables the surrogate dynamics to directly predict interpretable future states by leveraging a training-free large language model. In two common benchmarks, this novel model outperforms standard world models in the safety prediction task and has a performance comparable to supervised learning despite not using any data. We evaluate its performance with a more specialized and system-relevant metric by comparing estimated states instead of aggregating observation-wide error.
|
|
10:30-12:00, Paper ThAL-EX.25 | Add to My Program |
Cell Segmentation of the Carnivorous Plant and Hydrogel-Based Actuator Inspired by the Sundew |
|
Zeng, Xiangli | Osaka University |
Wang, Yingzhe | Osaka University |
Morishima, Keisuke | Osaka University |
Keywords: Biologically-Inspired Robots, Biomimetics, Micro/Nano Robots
Abstract: Plants can respond to stimuli (light, humidity, sound, touch, etc.), and they have thus inspired functional materials and soft robot design. Plant-inspired robots typically feature simple structures, simple control strategies, and pre-determined deformation. The sundew is a carnivorous plant that catches insects with its sticky tentacles and the curling of its leaf. Here, to explore the motion mechanism of this plant, dissections of the leaf were obtained and cell segmentation was conducted to measure cell area and cell distribution. To quantitatively analyze the morphology, the data were imported into MATLAB and processed. Based on the statistical results, an actuating principle was proposed and tested with a simulation model. In addition, a physical model was fabricated from hydrogel, which could be composed of several units to amplify deformation. In conclusion, an actuation method in which the structure is actuated according to cell distribution was proposed and examined with both a simulation model and a physical model. The mechanism identified in this study could inspire soft robot design.
|
|
10:30-12:00, Paper ThAL-EX.26 | Add to My Program |
Late Breaking Results on End-To-End Generation of Factorized Scene Graphs |
|
Millan Romera, Jose Andres | University of Luxembourg |
Bavle, Hriday | University of Luxembourg |
Shaheer, Muhammad | University of Luxembourg |
Oswald, Martin R. | ETH Zurich |
Voos, Holger | University of Luxembourg |
Sanchez-Lopez, Jose Luis | University of Luxembourg |
Keywords: Cognitive Modeling, SLAM, Deep Learning Methods
Abstract: Scene Graphs (SGs) model the geometric-semantic information of the robot's environment, enabling it for any downstream task. However, SG generation has been limited to classifying the edge type between observable objects or to ad-hoc algorithms that generate one specific type of semantic entity [1]. We overcome this with a GNN-based diffusion model which generates the SG independently of node type (G-GNN). At each denoising step, one node, its edges, and its node/edge features are generated based on GraphARM [2]. Furthermore, our work S-Graphs+ [1] includes a factor on every edge, tightly coupling the optimization of the SG with the SLAM graph. These factors are manually defined by different functions depending on the related node types. We present a novel factor definition based on a GNN common to every edge (F-GNN). A unique architecture encodes the geometrical relationship between the entities of every generated edge, with a different model for each combination of connected node types. G-GNN and F-GNN are trained on our synthetic dataset containing planes, walls, rooms and floors and tested in simulated and real scenarios.
|
|
ThBT1-CC Oral Session, CC-303 |
Add to My Program |
Motion Planning II |
|
|
Chair: Hutter, Marco | ETH Zurich |
Co-Chair: Halperin, Dan | Tel Aviv University |
|
13:30-15:00, Paper ThBT1-CC.1 | Add to My Program |
Characterizing Physical Adversarial Attacks on Robot Motion Planners |
|
Wu, Wenxi | King's College London |
Pierazzi, Fabio | King's College London |
Du, Yali | King's College London |
Brandao, Martim | King's College London |
Keywords: Motion and Path Planning, Robot Safety
Abstract: As the adoption of robots across society increases, so does the importance of considering cybersecurity issues such as vulnerability to adversarial attacks. In this paper we investigate the vulnerability of an important component of autonomous robots to adversarial attacks - robot motion planning algorithms. We particularly focus on attacks on the physical environment, and propose the first such attacks to motion planners: "planner failure" and "blindspot" attacks. Planner failure attacks make changes to the physical environment so as to make planners fail to find a solution. Blindspot attacks exploit occlusions and sensor field-of-view to make planners return a trajectory which is thought to be collision-free, but is actually in collision with unperceived parts of the environment. Our experimental results show that successful attacks need only to make subtle changes to the real world, in order to obtain a drastic increase in failure rates and collision rates - leading the planner to fail 95% of the time and collide 90% of the time in problems generated with an existing planner benchmark tool. We also analyze the transferability of attacks to different planners, and discuss underlying assumptions and future research directions. Overall, the paper shows that physical adversarial attacks on motion planning algorithms pose a serious threat to robotics, which should be taken into account in future research and development.
|
|
13:30-15:00, Paper ThBT1-CC.2 | Add to My Program |
Eclares: Energy-Aware Clarity-Driven Ergodic Search |
|
Naveed, Kaleb Ben | University of Michigan, Ann Arbor |
Agrawal, Devansh | University of Michigan |
Vermillion, Christopher | University of Michigan |
Panagou, Dimitra | University of Michigan, Ann Arbor |
Keywords: Constrained Motion Planning, Motion and Path Planning, Integrated Planning and Control
Abstract: Planning informative trajectories while considering the spatial distribution of the information over the environment, as well as constraints such as the robot's limited battery capacity, makes the long-time horizon persistent coverage problem complex. Ergodic search methods consider the spatial distribution of environmental information while optimizing robot trajectories; however, current methods lack the ability to construct the target information spatial distribution for environments that vary stochastically across space and time. Moreover, current coverage methods dealing with battery capacity constraints either assume simple robot and battery models or are computationally expensive. To address these problems, we propose a framework called Eclares, in which our contribution is two-fold. 1) First, we propose a method to construct the target information spatial distribution for ergodic trajectory optimization using clarity, an information measure bounded between [0,1]. The clarity dynamics allow us to capture information decay due to a lack of measurements and to quantify the maximum attainable information in stochastic spatiotemporal environments. 2) Second, instead of directly tracking the ergodic trajectory, we introduce the energy-aware (eware) filter, which iteratively validates the ergodic trajectory to ensure that the robot has enough energy to return to the charging station when needed. The proposed eware filter is applicable to nonlinear robot models and is computationally lightweight. We demonstrate the working of the framework through a simulation case study.
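The eware filter's gate can be caricatured in a few lines (our paraphrase of the abstract; the energy models and margin are placeholders, not the authors' formulation):

    def eware_gate(x, battery, next_on_plan, step_energy, return_energy, margin):
        # Commit to the next step of the ergodic trajectory only if, afterwards,
        # the remaining battery still covers the trip back to the charger.
        x_next = next_on_plan(x)
        needed = step_energy(x, x_next) + return_energy(x_next)
        if battery - needed >= margin:
            return x_next, 'track_ergodic_trajectory'
        return x, 'return_to_charging_station'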
|
|
13:30-15:00, Paper ThBT1-CC.3 | Add to My Program |
Tight Motion Planning by Riemannian Optimization for Sliding and Rolling with Finite Number of Contact Points |
|
Livnat, Dror | Tel Aviv University |
Bilevich, Michael M. | Tel Aviv University |
Halperin, Dan | Tel Aviv University |
Keywords: Constrained Motion Planning, Disassembly
Abstract: We address a challenging problem in motion planning where robots must navigate through narrow passages in their configuration space. Our novel approach leverages optimization techniques to facilitate sliding and rolling movements across critical regions, which represent semi-free configurations, where the robot and the obstacles are in contact. Our algorithm seamlessly traverses widely free regions, follows semi-free paths in narrow passages, and smoothly transitions between the two types. We specifically focus on scenarios resembling 3D puzzles, intentionally designed to be complex for humans by requiring intricate simultaneous translations and rotations. Remarkably, these complexities also present computational challenges. Our contributions are threefold: First, we solve previously unsolved problems; second, we outperform state-of-the-art algorithms on certain problem types; and third, we present a rigorous analysis supporting the consistency of the algorithm. In the Supplementary Material we provide theoretical foundations for our approach. The Supplementary Material and our open source software are available at https://github.com/TAU-CGL/tr-rrt-public. This research sheds light on effective approaches to address motion planning difficulties in intricate 3D puzzle-like scenarios.
|
|
13:30-15:00, Paper ThBT1-CC.4 | Add to My Program |
Online Trajectory Deformation and Tracking for Self-Entanglement-Free Differential-Driven Robots |
|
Liu, Jiangpin | Zhejiang University |
Yang, Tong | Zhejiang University |
Lu, Wangtao | Zhejiang University |
Wang, Yue | Zhejiang University |
Xiong, Rong | Zhejiang University |
Keywords: Constrained Motion Planning, Nonholonomic Motion Planning, Planning under Uncertainty
Abstract: This paper introduces an optimisation-based trajectory deformation and tracking algorithm for tethered differential-driven mobile robots. The motivation of this work is to generate self-entanglement-free (SEF) commands for a tethered differential-driven robot to track a path. Whilst existing path planners are capable of generating SEF paths for tethered differential-driven robots lacking an omni-directional tether-retracting mechanism, no trajectory planner can handle the unavoidable movement errors that cause the robot pose to deviate from the pre-defined path. Trajectory deformation and tracking is challenging because the admissible heading direction of the robot is highly constrained by the SEF constraint. As a result, even with an SEF path, the robot can still encounter self-entanglement during execution. This paper fills this gap by formulating the trajectory deforming and tracking (TDT) problem of a tethered robot as a multi-objective optimisation framework. The framework explicitly requires that the relative angle between the tether stretching direction and the robot's heading direction remain admissible throughout the robot's movement. The proposed algorithm repeatedly deforms the pre-defined path for easier tracking, whilst generating a suitable velocity profile for robot execution. Compared to directly applying commonly used untethered trajectory deformation and tracking algorithms to tethered cases, the proposed algorithm demonstrates improved performance in terms of minimising the risk of self-entanglement and maximising robot safety. These results are validated in both simulated and real scenarios. An open-source implementation has also been provided for the benefit of the robotics community.
|
|
13:30-15:00, Paper ThBT1-CC.5 | Add to My Program |
Efficient Motion Planning for Manipulators with Control Barrier Function-Induced Neural Controller |
|
Yu, Mingxin | Massachusetts Institute of Technology |
Yu, Chenning | University of California San Diego |
Naddaf Shargh, Mohammad Mahdi | Ford Motor Company |
Upadhyay, Devesh | Saab |
Gao, Sicun | UCSD |
Fan, Chuchu | Massachusetts Institute of Technology |
Keywords: Motion and Path Planning, Robot Safety, Deep Learning in Grasping and Manipulation
Abstract: Sampling-based motion planning methods for manipulators in crowded environments often suffer from expensive collision checking and high sampling complexity, which make them difficult to use in real time. To address this issue, we propose a new generalizable control barrier function (CBF)-based steering controller to reduce the number of samples needed in the sampling-based motion planner RRT. Our method combines the strength of CBFs for real-time collision-avoidance control and of RRT for long-horizon motion planning, by using a CBF-induced neural controller (CBF-INC) to generate control signals that steer the system towards configurations sampled by RRT. CBF-INC is learned as a neural network and has two variants handling different inputs: state (signed distance) input and point-cloud input from LiDAR. In the latter case, we also study two settings: fully and partially observed environmental information. Compared to a manually crafted CBF, which suffers from over-approximating the robot geometry, CBF-INC can better balance safety and goal-reaching without being over-conservative. Given state-based input, our CBF-INC-enhanced RRT (CBF-INC-RRT) can increase the success rate by 14% while reducing the number of nodes explored by 30%, compared with vanilla RRT on hard test cases. Given LiDAR input, where vanilla RRT is not directly applicable, we demonstrate that CBF-INC-RRT can improve the success rate by 10% compared with planning with other steering controllers. Our project page with supplementary material is at https://mit-realm.github.io/CBF-INC-RRT-website/.
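The kind of real-time safety filter a CBF-based steering controller builds on has a closed form in the single-constraint case (a generic sketch: CBF-INC replaces the hand-crafted barrier function h with a learned neural one):

    import numpy as np

    def cbf_safety_filter(u_nom, h, Lfh, Lgh, alpha=1.0):
        # Solves  min ||u - u_nom||^2  s.t.  Lfh + Lgh @ u >= -alpha * h
        # in closed form (projection onto the safe half-space).
        a = np.asarray(Lgh, dtype=float)
        b = Lfh + alpha * h                  # constraint reads a @ u + b >= 0
        if a @ u_nom + b >= 0.0:
            return u_nom                     # nominal control is already safe
        # Minimal-norm correction onto the constraint boundary (assumes a != 0)
        return u_nom + a * (-(b + a @ u_nom)) / (a @ a)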
|
|
13:30-15:00, Paper ThBT1-CC.6 | Add to My Program |
Planning Optimal Trajectories for Mobile Manipulators under End-Effector Trajectory Continuity Constraint |
|
Nguyen, Quang-Nam | Nanyang Technological University |
Pham, Quang-Cuong | NTU Singapore |
Keywords: Constrained Motion Planning, Task and Motion Planning, Mobile Manipulation
Abstract: Mobile manipulators have been employed in many applications traditionally performed by either multiple fixed-base robots or a large robotic system. This capability is enabled by the mobility of the mobile base. However, the mobile base also brings redundancy to the system, which makes mobile manipulator motion planning more challenging. In this paper, we tackle the mobile manipulator motion planning problem under the end-effector trajectory continuity constraint, in which the end-effector is required to traverse a continuous task-space trajectory (time-parametrized path), such as in mobile printing or spraying applications. Our method decouples the problem: (1) plan an optimal base trajectory subject to geometric task constraints, the end-effector trajectory continuity constraint, collision avoidance, and a base velocity constraint, which ensures that (2) a manipulator trajectory can subsequently be computed from the obtained base trajectory. To validate our method, we propose a discrete optimal base trajectory planning algorithm and solve several mobile printing tasks in hardware experiments and simulations.
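The discrete base-trajectory search lends itself to a dynamic-programming sketch (our reading of "discrete optimal"; the interfaces and discretization below are hypothetical, and base poses are assumed hashable):

    def plan_base_trajectory(ee_traj, base_candidates, reachable, move_cost):
        # For each timed end-effector waypoint, keep only base poses from which
        # the waypoint is reachable, then find the minimum-cost base sequence.
        layers = [[b for b in base_candidates if reachable(b, ee)] for ee in ee_traj]
        cost = {b: 0.0 for b in layers[0]}   # assumes every layer is non-empty
        back = [{}]
        for t in range(1, len(layers)):
            new_cost, back_t = {}, {}
            for b in layers[t]:
                p = min(cost, key=lambda q: cost[q] + move_cost(q, b))
                new_cost[b] = cost[p] + move_cost(p, b)
                back_t[b] = p
            cost, back = new_cost, back + [back_t]
        b = min(cost, key=cost.get)          # best terminal base pose
        seq = [b]
        for t in range(len(layers) - 1, 0, -1):
            b = back[t][b]
            seq.append(b)
        return seq[::-1]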
|
|
13:30-15:00, Paper ThBT1-CC.7 | Add to My Program |
Zero-Shot Constrained Motion Planning Transformers Using Learned Sampling Dictionaries |
|
Johnson, Jacob | UCSD |
Qureshi, Ahmed H. | Purdue University |
Yip, Michael C. | University of California, San Diego |
Keywords: Constrained Motion Planning, Deep Learning Methods, Manipulation Planning
Abstract: In this work, we explore using a transformer-based model for motion planning with task-space constraints for manipulation systems. The Vector Quantized-Motion Planning Transformer (VQ-MPT) is a recent learning-based model that reduces the search space of sampling-based motion planners for unconstrained planning. We propose to adapt a pre-trained VQ-MPT model to reduce the search space for constrained planning without retraining or finetuning the model. We also propose updating the neural network output to move sampling regions closer to the constraint manifold. Our experiments show how VQ-MPT improves planning times and accuracy compared to traditional planners in simulated and real-world environments. Unlike previous learning methods, which require task-related data, our method uses pre-trained neural network models and requires no additional data for training and finetuning. We also tested our method on a physical Franka Panda robot with real-world sensor data, demonstrating the generalizability of our algorithm.
|
|
13:30-15:00, Paper ThBT1-CC.8 | Add to My Program |
LSTP: Long Short-Term Motion Planning for Legged and Legged-Wheeled Systems |
|
Jelavic, Edo | Swiss Federal Institute of Technology Zurich |
Qu, Kaixian | ETH Zürich |
Farshidian, Farbod | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Motion and Path Planning, Robotics in Construction, Optimization and Optimal Control, Legged-wheeled Robots
Abstract: This article presents a hybrid motion planning and control approach applicable to various ground robot types and morphologies. Our two-step approach uses a sampling-based planner to compute an approximate motion which is then fed to numerical optimization for refinement. The sampling-based stage finds a long-term global plan consisting of a contact schedule and a sequence of stable whole-body configurations. Subsequently, the optimization refines the solution with a short-term planning horizon to satisfy all dynamics constraints. The proposed planner can compute plans for scenarios that would be difficult for trajectory optimization or a sampling-based planner alone. We present tasks of traversing challenging terrain that require discovering a contact schedule, navigating non-convex obstacles, and coordinating many degrees of freedom. Our hybrid planner has been applied to three different robots: a quadruped, a wheeled quadruped, and a legged excavator. We validate our hybrid planner in the real world and simulation, generating behaviors we could not achieve with previous methods. The results show that computing and executing hybrid locomotion plans is possible on hardware.
|
|
13:30-15:00, Paper ThBT1-CC.9 | Add to My Program |
Risk-Inspired Aerial Active Exploration for Enhancing Autonomous Driving of UGV in Unknown Off-Road Environments |
|
Wang, Rongchuan | Beijing Institute of Technology |
Fu, Mengyin | Beijing Institute of Technology |
Yu, Jing | Beijing Institute of Technology |
Yang, Yi | Beijing Institute of Technology |
Song, Wenjie | Beijing Institute of Technology |
Keywords: Search and Rescue Robots, Planning under Uncertainty, Field Robots
Abstract: Unknown-area exploration is a crucial but challenging task for the autonomous driving of unmanned ground vehicles (UGVs) in unknown off-road environments. However, the exploration efficiency of a single UGV is low due to its limited sensing range. To solve this problem, this paper proposes a risk-inspired aerial active exploration system, which utilizes the flexibility and field-of-view advantages of Unmanned Aerial Vehicles (UAVs) to guide the UGV in unknown off-road environments. First, a fast terrain risk mapping method that can be used by both the UAV and the UGV is developed. This method efficiently combines quadtree and hash table data structures to enable the UAV to analyze large-scale terrain point clouds in real time. Based on the risk mapping result, a risk-inspired active exploration method is proposed to actively search for a safe reference path for the UGV, introducing terrain risk information into the process of travel point selection. Finally, the reference path is gradually generated and optimized, so that the UGV can safely and smoothly follow the path to the target location. Compared with a single-UGV exploration system, our approach reduces the overall path risk by 26.8% in simulated experiments, showing that the proposed system can enhance autonomous driving of the UGV and help it effectively avoid high-risk areas in unknown off-road environments.
|
|
ThBT2-CC Oral Session, CC-311 |
Add to My Program |
Autonomous Agents I |
|
|
Co-Chair: Wang, Shenlong | University of Illinois at Urbana-Champaign |
|
13:30-15:00, Paper ThBT2-CC.1 | Add to My Program |
Sim-On-Wheels: Physical World in the Loop Simulation for Self-Driving |
|
Shen, Yuan | UIUC |
Chandaka, Bhargav | University of Illinois at Urbana-Champaign |
Lin, Zhi-Hao | University of Illinois at Urbana-Champaign |
Zhai, Albert | UIUC |
Cui, Hang | University of Illinois at Urbana-Champaign |
Forsyth, David | University of Illinois at Urbana-Champaign |
Wang, Shenlong | University of Illinois at Urbana-Champaign |
Keywords: Autonomous Agents, Simulation and Animation, Robot Safety
Abstract: We present Sim-on-Wheels, a safe, realistic, vehicle-in-the-loop framework to test autonomous vehicles’ performance in the real world under safety-critical scenarios. Sim-on-Wheels runs on a self-driving vehicle operating in the physical world. It creates virtual traffic participants with risky behaviors and seamlessly inserts the virtual events into images perceived from the physical world in real time. The manipulated images are fed into the autonomy stack, allowing the self-driving vehicle to react to such virtual events. The full pipeline runs on the actual vehicle and interacts with the physical world, but the safety-critical events it sees are virtual. Sim-on-Wheels is safe, interactive, realistic, and easy to use. The experiments demonstrate the potential of Sim-on-Wheels to facilitate the process of testing autonomous driving in challenging real-world scenes with high fidelity and low risk.
|
|
13:30-15:00, Paper ThBT2-CC.2 | Add to My Program |
Safety-Critical Scenario Generation Via Reinforcement Learning Based Editing |
|
Liu, Haolan | University of California San Diego |
Zhang, Liangjun | Baidu |
Hari, Siva Kumar Sastry | NVIDIA |
Zhao, Jishen | UC San Diego |
Keywords: Autonomous Agents, Reinforcement Learning, AI-Based Methods
Abstract: Generating safety-critical scenarios is essential for testing and verifying the safety of autonomous vehicles. Traditional optimization techniques suffer from the curse of dimensionality and limit the search space to fixed parameter spaces. To address these challenges, we propose a deep reinforcement learning approach that generates scenarios by sequential editing, such as adding new agents or modifying the trajectories of existing agents. Our framework employs a reward function consisting of both risk and plausibility objectives. The plausibility objective leverages generative models, such as a variational autoencoder, to learn the likelihood of the generated parameters from the training datasets; it penalizes the generation of unlikely scenarios. Our approach overcomes the dimensionality challenge and explores a wide range of safety-critical scenarios. Our evaluation demonstrates that the proposed method generates safety-critical scenarios of higher quality compared with previous approaches.
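A minimal sketch of a reward that combines a risk term with a VAE-based plausibility term, as outlined above; the `vae` interface (`encode`/`decode`) and all parameters are assumed for illustration, and the ELBO stands in for the intractable log-likelihood:

```python
import torch

def scenario_reward(risk_score, vae, scenario_params, beta=0.1):
    """Reward = risk + beta * ELBO, so risky but implausible edits are
    penalized. Assumes a Gaussian decoder for the reconstruction term."""
    mu, logvar = vae.encode(scenario_params)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
    recon = vae.decode(z)
    recon_ll = -((recon - scenario_params) ** 2).sum(dim=-1)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1)
    elbo = recon_ll - kl  # lower bound on log p(scenario)
    return risk_score + beta * elbo
```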
|
|
13:30-15:00, Paper ThBT2-CC.3 | Add to My Program |
Robust Autonomous Vehicle Pursuit without Expert Steering Labels |
|
Pan, Jiaxin | Technical University of Munich |
Zhou, Changyao | Technical University of Munich |
Gladkova, Mariia | Technical University of Munich |
Khan, Qadeer | Technical University of Munich |
Cremers, Daniel | Technical University of Munich |
Keywords: Autonomous Agents, Intelligent Transportation Systems, Motion Control
Abstract: In this work, we present a learning method for lateral and longitudinal motion control of an ego-vehicle for vehicle pursuit. The car being controlled does not have a pre-defined route; rather, it adapts to follow a target vehicle while maintaining a safety distance. To train our model, we do not rely on steering labels recorded by an expert driver but effectively leverage a classical controller as an offline label generation tool. In addition, we account for the errors in the predicted control values, which can lead to a loss of tracking and catastrophic crashes of the controlled vehicle. To this end, we propose an effective data augmentation approach, which allows the training of a network capable of handling different views of the target vehicle. During the pursuit, the target vehicle is first localized using a Convolutional Neural Network. The network takes a single RGB image along with the cars' velocities and estimates the target vehicle's pose with respect to the ego-vehicle. This information is then fed to a Multi-Layer Perceptron, which regresses the control commands for the ego-vehicle, namely throttle and steering angle. We extensively validate our approach using the CARLA simulator on various terrains. Our method demonstrates real-time performance, robustness to different scenarios including unseen trajectories, and high route completion. Our project page can be found at https://changyaozhou.github.io/Autonomous-Vehicle-Pursuit/.
|
|
13:30-15:00, Paper ThBT2-CC.4 | Add to My Program |
Risk-Aware Trajectory Prediction by Incorporating Spatio-Temporal Traffic Interaction Analysis |
|
Thuremella, Divya | University of Oxford, Robotics Institute |
Ince, Lewis | University of Oxford, Robotics Institute |
Kunze, Lars | University of Oxford |
Keywords: Autonomous Agents, Intelligent Transportation Systems, Computer Vision for Transportation
Abstract: To operate in open-ended environments where humans interact in complex, diverse ways, autonomous robots must learn to predict human behavior, especially when that behavior is potentially dangerous to other agents or to the robot. However, reducing the risk of accidents requires prior knowledge of where potential collisions may occur and how. Therefore, we propose to gain this information by analyzing the locations and speeds that commonly correspond to high-risk interactions within the dataset, and to use it during training to generate better predictions in high-risk situations. Through these location-based and speed-based re-weighting techniques, we achieve improved overall performance, as measured by most-likely FDE and KDE, as well as improved performance on high-speed vehicles and on vehicles within high-risk locations.
|
|
13:30-15:00, Paper ThBT2-CC.5 | Add to My Program |
Reinforcement Learning with Human Feedback for Realistic Traffic Simulation |
|
Cao, Yulong | NVIDIA |
Ivanovic, Boris | NVIDIA |
Xiao, Chaowei | University of Michigan |
Pavone, Marco | Stanford University |
Keywords: Autonomous Agents, Deep Learning Methods, Reinforcement Learning
Abstract: In light of the challenges and costs of real-world testing, autonomous vehicle developers often rely on testing in simulation for the creation of reliable systems. A key element of effective simulation is the incorporation of realistic traffic models that align with human knowledge, an aspect that has proven challenging due to the need to balance realism and diversity. Towards this end, in this work we develop a framework that employs reinforcement learning from human feedback (RLHF) to enhance the realism of existing traffic models. This work also identifies two main challenges: capturing the nuances of human preferences on realism and unifying diverse traffic simulation models. To tackle these issues, we propose using human feedback for alignment and employ RLHF due to its sample efficiency. We also introduce the first dataset for realism alignment in traffic modeling to support such research. Our framework, named TrafficRLHF, demonstrates its proficiency in generating realistic traffic scenarios that are well-aligned with human preferences through comprehensive evaluations on the nuScenes dataset.
|
|
13:30-15:00, Paper ThBT2-CC.6 | Add to My Program |
Plug in the Safety Chip: Enforcing Constraints for LLM-Driven Robot Agents |
|
Yang, Ziyi | Brown University |
Sundara Raman, Shreyas | Brown University |
Shah, Ankit Jayesh | Massachusetts Institute of Technology |
Tellex, Stefanie | Brown |
Keywords: Agent-Based Systems, Formal Methods in Robotics and Automation, Safety in HRI
Abstract: Recent advancements in large language models (LLMs) have enabled a new research domain, LLM agents, for solving robotics and planning tasks by leveraging the world knowledge and general reasoning abilities of LLMs obtained during pretraining. However, while considerable effort has been made to teach the robot the "dos", the "don'ts" have received relatively little attention. We argue that, for any practical usage, it is as crucial to teach the robot the "don'ts": conveying explicit instructions about prohibited actions, assessing the robot's comprehension of these restrictions, and, most importantly, ensuring compliance. Moreover, verifiable safe operation is essential for deployments that must satisfy international standards such as ISO 61508, which defines requirements for safely deploying robots in industrial factory environments. Aiming at deploying LLM agents in a collaborative environment, we propose a queryable safety constraint module based on linear temporal logic (LTL) that simultaneously enables encoding natural language (NL) into temporal constraints, reasoning about and explaining safety violations, and pruning unsafe actions. To demonstrate the effectiveness of our system, we conducted experiments in the VirtualHome environment and on a real robot. The experimental results show that our system strictly adheres to the safety constraints and scales well with complex temporal constraints, highlighting its potential for practical utility.
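One way to picture the unsafe-action pruning described above is to run candidate actions through an automaton compiled from the LTL constraint; the `dfa`/`label_of` interface below is purely illustrative, not the paper's API:

```python
def prune_unsafe(actions, dfa_state, dfa, label_of):
    """Keep only actions whose induced proposition labels have a valid,
    non-trap transition in the safety automaton; return the surviving
    actions paired with the automaton state they lead to."""
    safe = []
    for a in actions:
        props = label_of(a)               # e.g., {"enter_kitchen"}
        nxt = dfa.step(dfa_state, props)  # None = no valid transition
        if nxt is not None and not dfa.is_trap(nxt):
            safe.append((a, nxt))
    return safe
```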
|
|
13:30-15:00, Paper ThBT2-CC.7 | Add to My Program |
InterCoop: Spatio-Temporal Interaction Aware Cooperative Perception for Networked Vehicles |
|
Wang, Wentao | Sun Yat-Sen University |
Xu, Haoran | Sun Yat-Sen University |
Tan, Guang | Sun Yat-Sen University |
Keywords: Autonomous Agents, Computer Vision for Automation, Networked Robots
Abstract: In autonomous driving, leveraging cooperative perception through vehicle-to-vehicle (V2V) communication is considered crucial for enhancing traffic safety and efficiency. However, existing methods often simplify the handling of perception data from multiple vehicles. In these approaches, the ego-vehicle aggregates observations from all neighboring connected cooperative vehicles (CCV), without considering the interactions between agents or making differentiated use of the acquired sensing data. This approach can result in suboptimal system performance due to the amplification of noise and the large transmission delay. In this paper, we introduce a novel approach to cooperative perception. By fusing both the road topology and trajectory histories of neighboring CCVs, our model learns an interaction score for each CCV. These scores prioritize vehicles that are most relevant to the current driving scenario, offering valuable guidance for selective fusion of sensor data, thereby enhancing driving decision-making. The proposed method is validated through extensive experiments conducted on the CARLA simulator. Results demonstrate that our approach surpasses existing methods in terms of performance and robustness.
|
|
13:30-15:00, Paper ThBT2-CC.8 | Add to My Program |
Improving Autonomous Driving Safety with POP: A Framework for Accurate Partially Observed Trajectory Predictions |
|
Wang, Sheng | Hong Kong University of Science and Technology |
Chen, Yingbing | The Hong Kong University of Science and Technology |
Cheng, Jie | Hong Kong University of Science and Technology |
Mei, Xiaodong | HKUST |
Xin, Ren | The Hong Kong University of Science and Technology |
Song, Yongkang | Ningbo Lotus Robotics Co., Ltd |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Autonomous Agents, Deep Learning Methods, Human Detection and Tracking
Abstract: Accurate trajectory prediction is crucial for safe and efficient autonomous driving, but handling partial observations presents significant challenges. To address this, we propose a novel trajectory prediction framework called Partial Observations Prediction (POP) for congested urban road scenarios. The framework consists of two key stages: self-supervised learning (SSL) and feature distillation. POP first employs SSL to help the model learn to reconstruct history representations, and then utilizes feature distillation as the fine-tuning task to transfer knowledge from the teacher model, which has been pre-trained with complete observations, to the student model, which has only a few observations. POP achieves comparable results to top-performing methods in open-loop experiments and outperforms the baseline method in closed-loop simulations, including safety metrics. Qualitative results illustrate the superiority of POP in providing reasonable and safe trajectory predictions.
|
|
13:30-15:00, Paper ThBT2-CC.9 | Add to My Program |
FIMP: Future Interaction Modeling for Multi-Agent Motion Prediction |
|
Woo, Sungmin | Yonsei University |
Kim, Minjung | Yonsei University |
Kim, Donghyeong | Yonsei University |
Jang, Sungjun | Yonsei University |
Lee, Sangyoun | Yonsei University |
Keywords: Autonomous Agents, AI-Based Methods, Computer Vision for Transportation
Abstract: Multi-agent motion prediction is a crucial concern in autonomous driving, yet it remains a challenge owing to the ambiguous intentions of dynamic agents and their intricate interactions. Existing studies have attempted to capture interactions between road entities by using only the definite data from past timesteps, since future information is not available and involves high uncertainty. However, without sufficient guidance for capturing future states of interacting agents, they frequently produce unrealistic trajectory overlaps. In this work, we propose Future Interaction modeling for Motion Prediction (FIMP), which captures potential future interactions in an end-to-end manner. FIMP adopts a future decoder that implicitly extracts the potential future information at an intermediate feature level, and identifies the interacting entity pairs through future affinity learning and a top-k filtering strategy. Experiments show that our future interaction modeling improves performance remarkably, leading to superior performance on the Argoverse motion forecasting benchmark.
|
|
ThBT3-CC Oral Session, CC-313 |
Add to My Program |
Calibration and Identification I |
|
|
Chair: Gossard, Thomas | University of Tübingen |
Co-Chair: Su, Hao | UCSD |
|
13:30-15:00, Paper ThBT3-CC.1 | Add to My Program |
EasyHeC: Accurate and Automatic Hand-Eye Calibration Via Differentiable Rendering and Space Exploration |
|
Chen, Linghao | Zhejiang University |
Qin, Yuzhe | UC San Diego |
Zhou, Xiaowei | Zhejiang University |
Su, Hao | UCSD |
Keywords: Calibration and Identification, Computer Vision for Automation, Recognition
Abstract: Hand-eye calibration is a critical task in robotics, as it directly affects the efficacy of critical operations such as manipulation and grasping. Traditional methods for achieving this objective necessitate the careful design of joint poses and the use of specialized calibration markers, while most recent learning-based approaches using solely pose regression are limited in their abilities to diagnose inaccuracies. In this work, we introduce a new approach to hand-eye calibration called EasyHeC, which is markerless, white-box, and delivers superior accuracy and robustness. We propose to use two key technologies: differentiable rendering-based camera pose optimization and consistency-based joint space exploration, which together enable accurate end-to-end optimization of the calibration process and eliminate the need for the laborious manual design of robot joint poses. Our evaluation demonstrates superior performance on synthetic and real-world datasets, enhancing downstream manipulation tasks by providing precise camera poses for locating and interacting with objects. The code is available at the project page: https://ootts.github.io/easyhec.
|
|
13:30-15:00, Paper ThBT3-CC.2 | Add to My Program |
Zero-Training LiDAR-Camera Extrinsic Calibration Method Using Segment Anything Model |
|
Luo, Zhaotong | Tsinghua University |
Yan, Guohang | Shanghai AI Laboratory |
Cai, Xinyu | Shanghai AI Laboratory |
Shi, Botian | Shanghai AI Laboratory |
Keywords: Calibration and Identification, Computer Vision for Automation, Sensor Fusion
Abstract: Extrinsic calibration for LiDAR and camera is an essential prerequisite for sensor fusion. Recently, automatic and target-less extrinsic calibration has become the mainstream of academic research. However, geometric feature-based methods still place requirements on the scene. Deep learning methods, while achieving high accuracy and good adaptability, rely on large annotated datasets and need additional training. We propose a novel LiDAR-camera calibration method using the Segment Anything Model (SAM) without additional training. With the automatically generated masks, we optimize the extrinsic parameters by maximizing the consistency score of the point attributes that fall on each mask. The point cloud attributes include intensity, normal vector, and segmentation class. Experiments on different real-world datasets demonstrate the accuracy and robustness of our proposed method. The code is available at https://github.com/OpenCalib/CalibAnything.
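A simplified sketch of the per-mask consistency idea described above, using intensity variance as a stand-in for the paper's consistency score; the exact objective and all interfaces are assumptions:

```python
import numpy as np

def consistency_score(points, intensities, masks, K, T):
    """Project LiDAR points with candidate extrinsics T (4x4) and intrinsics
    K (3x3), then reward low intensity variance inside each SAM mask."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    cam = (T @ pts_h.T).T[:, :3]
    front = cam[:, 2] > 0.1                      # keep points in front of camera
    uv = (K @ cam[front].T).T
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)
    vals_front = intensities[front]
    score = 0.0
    for m in masks:                              # m: HxW boolean mask
        h, w = m.shape
        inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
        sel = inside.copy()
        sel[inside] = m[uv[inside, 1], uv[inside, 0]]
        vals = vals_front[sel]
        if len(vals) > 5:
            score += 1.0 / (1.0 + np.var(vals))  # consistent masks score high
    return score
```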
|
|
13:30-15:00, Paper ThBT3-CC.3 | Add to My Program |
Dive Deeper into Rectifying Homography for Stereo Camera Online Self-Calibration |
|
Zhao, Hongbo | Tongji University |
Zhang, Yikang | Tongji University |
Chen, Qijun | Tongji University |
Fan, Rui | Tongji University |
Keywords: Calibration and Identification, Computer Vision for Automation, SLAM
Abstract: Accurate estimation of stereo camera extrinsic parameters is crucial to guarantee the performance of stereo matching algorithms. In prior art, the online self-calibration of stereo cameras has commonly been formulated as a specialized visual odometry problem, without taking into account the principles of stereo rectification. In this paper, we first delve deeply into the concept of rectifying homography, which serves as the cornerstone for the development of our novel stereo camera online self-calibration algorithm, for cases where only a single pair of images is available. Furthermore, we introduce a simple yet effective solution for globally optimal extrinsic parameter estimation in the presence of stereo video sequences. Additionally, we emphasize the impracticality of using three Euler angles and three components in the translation vectors for performance quantification. Instead, we introduce four new evaluation metrics to quantify the robustness and accuracy of extrinsic parameter estimation, applicable to both single-pair and multi-pair cases. Extensive experiments conducted across indoor and outdoor environments using various experimental setups validate the effectiveness of our proposed algorithm. The comprehensive evaluation results demonstrate its superior performance in comparison to the baseline algorithm. Our source code, demo video, and supplement are publicly available at mias.group/StereoCalibrator.
|
|
13:30-15:00, Paper ThBT3-CC.4 | Add to My Program |
Online Camera-LiDAR Calibration Monitoring and Rotational Drift Tracking |
|
Moravec, Jaroslav | Faculty of Electrical Engineering, Czech Technical University in Prague |
Šára, Radim | Faculty of Electrical Engineering, Czech Technical University in Prague |
Keywords: Calibration and Identification, Computer Vision for Transportation, Sensor Fusion, LiDAR-Camera Systems
Abstract: The relative poses of visual perception sensors distributed over a vehicle’s body may vary due to dynamic forces, thermal dilation, or minor accidents. This paper proposes two methods, OCAMO and LTO, that monitor and track the LiDAR-camera extrinsic calibration parameters online. Calibration monitoring provides a certificate of validity for the reference calibration parameters. Tracking follows the drift of the calibration parameters over time. OCAMO is based on adaptive online stochastic optimization with a memory of past evolution. LTO uses a fixed-grid search for the optimal parameters per frame, without memory. Both methods use low-level point-like features and a robust kernel-based loss function, and work with a small memory footprint and computational overhead. Both include a preselection of informative data that limits their divergence. The statistical accuracy of both calibration monitoring methods is over 98%; OCAMO monitoring detects small decalibrations better, whereas LTO monitoring reacts faster to abrupt decalibrations. The tracking variants of both methods follow random calibration drift with an accuracy of about 0.03 degrees in the yaw angle.
|
|
13:30-15:00, Paper ThBT3-CC.5 | Add to My Program |
PeLiCal: Targetless Extrinsic Calibration Via Penetrating Lines for RGB-D Cameras with Limited Co-Visibility |
|
Shin, Jaeho | SNU |
Yun, Seungsang | Seoul National University, SNU |
Kim, Ayoung | Seoul National University |
Keywords: Calibration and Identification, RGB-D Perception, SLAM
Abstract: RGB-D cameras are crucial in robotic perception, given their ability to produce images augmented with depth data. However, their limited field of view (FOV) often requires multiple cameras to cover a broader area. In multi-camera RGB-D setups, the goal is typically to reduce camera overlap, optimizing spatial coverage with as few cameras as possible. The extrinsic calibration of these systems introduces additional complexities. Existing methods for extrinsic calibration either necessitate specific tools or depend heavily on the accuracy of camera motion estimation. To address these issues, we present PeLiCal, a novel line-based calibration approach for RGB-D camera systems exhibiting limited overlap. Our method leverages long line features from the surroundings and filters out outliers with a novel convergence voting algorithm, achieving targetless, real-time, and outlier-robust performance compared to existing methods. We open-source our implementation at https://github.com/joomeok/PeLiCal.git.
|
|
13:30-15:00, Paper ThBT3-CC.6 | Add to My Program |
A Novel, Efficient and Accurate Method for Lidar Camera Calibration |
|
Huang, Zhanhong | Worcester Polytechnic Institute |
Zhang, Xiao | Worcester Polytechnic Institute |
Garcia, Antony | Worcester Polytechnic Institute |
Huang, Xinming | Worcester Polytechnic Institute |
Keywords: Calibration and Identification, Sensor Fusion
Abstract: As autonomous systems evolve, the precise calibration of lidar and camera sensors remains a pivotal concern. Among the myriad of available techniques, target-based calibration methods, which employ planar boards with distinct geometry and image patterns, have been a popular choice. These methods simplify the task of extracting corresponding features between the image and the lidar point cloud. However, many of these approaches are sensitive to lidar resolution and field of view (FOV), which may degrade the reliability of the calibration results. Therefore, our research introduces a novel calibration method using a uniquely designed acrylic checkerboard that allows the lidar beam to pass through the white grids and reflect back from the black grids. This innovative technique sidesteps the common challenges associated with lidar feature extraction. Our method's distinct advantage lies in its ability to perform accurate calibrations at close distances, owing to efficient feature extraction from both lidar and camera sensors. This novel, efficient, and accurate method provides state-of-the-art results for camera-lidar calibration in the field.
|
|
13:30-15:00, Paper ThBT3-CC.7 | Add to My Program |
An Extrinsic Calibration Method between LiDAR and GNSS/INS for Autonomous Driving |
|
Pi, Jiahao | Shanghai AI Laboratory |
Yan, Guohang | Shanghai AI Laboratory |
Wang, Chengjie | Fudan University |
Cai, Xinyu | Shanghai AI Laboratory |
Shi, Botian | Shanghai AI Laboratory |
Keywords: Calibration and Identification, Sensor Fusion, Mapping
Abstract: Accurate and reliable sensor calibration is critical for fusing LiDAR and inertial measurements in autonomous driving. This paper proposes a novel three-stage extrinsic calibration method between LiDAR and GNSS/INS for autonomous driving. The first stage quickly calibrates the extrinsic parameters between the sensors through point cloud surface features, so that the extrinsics can be narrowed from a large initial error to a small error range in little time. The second stage further calibrates the extrinsic parameters based on LiDAR-mapping space occupancy while removing motion distortion. In the final stage, the z-axis (the vertical direction relative to the ground plane) errors caused by the planar motion of the autonomous vehicle are corrected, and accurate extrinsic parameters are finally obtained. Specifically, this method utilizes the planar features in the environment, making it possible to carry out calibration quickly. Experimental results on real-world datasets demonstrate the reliability and accuracy of our method. The code is open-sourced at https://github.com/OpenCalib/LiDAR2INS.
|
|
13:30-15:00, Paper ThBT3-CC.8 | Add to My Program |
SGCalib: A Two-Stage Camera-LiDAR Calibration Method Using Semantic Information and Geometric Features |
|
Lin, Zhipeng | The Chinese University of Hong Kong |
Gao, Zhi | Temasek Laboratories @ NUS |
Liu, Xinyi | Wuhan University |
Wang, Jialiang | Wuhan University |
Song, Weiwei | Peng Cheng Laboratory |
Chen, Ben M. | Chinese University of Hong Kong |
Li, Chenyang | Wuhan University |
Huang, Yue | Wuhan University |
Zhu, Yuhan | Wuhan University |
Keywords: Calibration and Identification, Sensor Fusion, Object Detection, Segmentation and Categorization
Abstract: Extrinsic calibration is an essential prerequisite for the applications of camera-LiDAR fusion. Existing methods either suffer from the complex offline setting of man-made targets or tend to produce suboptimal and non-robust results. In this paper, we propose an online two-stage calibration method that estimates robust and accurate extrinsic parameters between camera and LiDAR. This work jointly uses semantic information and geometric features in calibration to promote accuracy and robustness. In the first stage, we detect objects in the image and point cloud and build graphs on the objects using Delaunay triangulation. We then design a novel graph matching algorithm to associate the objects in the two data domains and extract pairs of 2D-3D points. Using a PnP solver, we obtain robust initial extrinsic parameters. In the second stage, we design a new optimization formulation with semantic information and geometric features to generate accurate extrinsic parameters from the initial value of the first stage. Extensive experiments on solid-state LiDAR, conventional spinning LiDAR, and the KITTI datasets have verified the robustness and accuracy of our method, which outperforms existing works. We will share the code publicly to benefit the community (after the review stages).
|
|
13:30-15:00, Paper ThBT3-CC.9 | Add to My Program |
EWand: An Extrinsic Calibration Framework for Wide Baseline Frame-Based and Event-Based Camera Systems |
|
Gossard, Thomas | University of Tübingen |
Ziegler, Andreas | University of Tuebingen |
Kolmar, Levin | University of Tübingen |
Tebbe, Jonas | University of Tübingen |
Zell, Andreas | University of Tübingen |
Keywords: Calibration and Identification, Visual Tracking
Abstract: Accurate calibration is crucial for using multiple cameras to triangulate the position of objects precisely. However, it is also a time-consuming process that needs to be repeated for every displacement of the cameras. The standard approach is to use a printed pattern with known geometry to estimate the intrinsic and extrinsic parameters of the cameras. The same idea can be applied to event-based cameras, though it requires extra work. By using frame reconstruction from events, a printed pattern can be detected. A blinking pattern can also be displayed on a screen. Then, the pattern can be directly detected from the events. Such calibration methods can provide accurate intrinsic calibration for both frame- and event-based cameras. However, using 2D patterns has several limitations for multi-camera extrinsic calibration, with cameras possessing highly different points of view and a wide baseline. The 2D pattern can only be detected from one direction and needs to be of significant size to compensate for its distance to the camera. This makes the extrinsic calibration time-consuming and cumbersome. To overcome these limitations, we propose eWand, a new method that uses blinking LEDs inside opaque spheres instead of a printed or displayed pattern. Our method provides a faster, easier-to-use extrinsic calibration approach that maintains high accuracy for both event- and frame-based cameras.
|
|
ThBT4-CC Oral Session, CC-315 |
Add to My Program |
Path Planning for Multiple Mobile Robots or Agents II |
|
|
Chair: Johnson, Aaron M. | Carnegie Mellon University |
Co-Chair: Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
|
13:30-15:00, Paper ThBT4-CC.1 | Add to My Program |
Benchmarking Multi-Robot Coordination in Realistic, Unstructured Human-Shared Environments |
|
Heuer, Lukas | Örebro University, Robert Bosch GmbH |
Palmieri, Luigi | Robert Bosch GmbH |
Mannucci, Anna | Robert Bosch GmbH Corporate Research |
Koenig, Sven | University of Southern California |
Magnusson, Martin | Örebro University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Human-Aware Motion Planning, Multi-Robot Systems
Abstract: Coordinating a fleet of robots in unstructured, human-shared environments is challenging. Human behavior is hard to predict, and its uncertainty impacts the performance of the robotic fleet. Various multi-robot planning and coordination algorithms have been proposed, ranging from Multi-Agent Path Finding (MAPF) methods to precedence-based algorithms. However, it is still unclear how human presence impacts different coordination strategies in both simulated environments and the real world. With the goal of studying and further improving multi-robot planning capabilities in these settings, we propose a method to develop and benchmark different multi-robot coordination algorithms in realistic, unstructured, human-shared environments. To this end, we introduce a multi-robot benchmark framework that is based on state-of-the-art open-source navigation and simulation frameworks and can use different types of robots, environments, and human motion models. We show a possible application of the benchmark framework with two different environments and three centralized coordination methods (two MAPF algorithms and a loosely coupled coordination method based on precedence constraints). We evaluate each environment for different human densities to investigate their impact on each coordination method. We also present preliminary results that show how informing each coordination method about human presence can help it find faster paths for the robots.
|
|
13:30-15:00, Paper ThBT4-CC.2 | Add to My Program |
Conflict Area Prediction for Boosting Search-Based Multi-Agent Pathfinding Algorithms |
|
Ryu, Jaesung | Chung-Ang University |
Kwon, Youngjoon | Chung-Ang University |
Yoon, Sangho | Chung-Ang University |
Lee, Kyungjae | Chung-Ang University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Integrated Planning and Learning, Multi-Robot Systems
Abstract: We address the challenge of efficiently controlling multi-agent systems, which is crucial in fields like logistics and traffic management. We propose a novel approach that combines learning-based techniques with search-based methods, focusing on enhancing conflict-based search (CBS). CBS ensures optimality but suffers from increasing complexity as the number of agents or the map size grows. To tackle this, we leverage learning-based approaches to enhance computational efficiency. By training a conflict area prediction (CAP) network, we anticipate potential conflict zones, allowing low-level path planners to explore conflict-free paths. Our experiments demonstrate the effectiveness of our method in reducing computational demands compared to existing approaches.
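The way a conflict-area prediction can steer the low-level search is easy to sketch: fold the predicted heat map into the edge cost so planners prefer cold cells. Everything below (the heat-map interface, the `astar` signature) is illustrative, not the paper's implementation:

```python
def cap_weighted_cost(conflict_heat, alpha=2.0):
    """Edge cost with a penalty proportional to the predicted conflict
    probability of the destination cell (conflict_heat: cell -> [0, 1])."""
    def cost(u, v):
        return 1.0 + alpha * conflict_heat[v]
    return cost

def plan_agent(astar, grid, start, goal, conflict_heat, constraints):
    # CBS constraints from the high level still apply unchanged; the CAP
    # term only biases the search away from likely conflict zones.
    return astar(grid, start, goal,
                 cost_fn=cap_weighted_cost(conflict_heat),
                 constraints=constraints)
```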
|
|
13:30-15:00, Paper ThBT4-CC.3 | Add to My Program |
AMSwarmX: Safe Swarm Coordination in CompleX Environments Via Implicit Non-Convex Decomposition of the Obstacle-Free Space |
|
Adajania, Vivek Kantilal | University of Toronto |
Zhou, Siqi | Technical University of Munich |
Singh, Arun Kumar | University of Tartu |
Schoellig, Angela P. | TU Munich |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Motion and Path Planning, Collision Avoidance
Abstract: Quadrotor motion planning in complex environments leverages the concept of a safe flight corridor (SFC) to facilitate static obstacle avoidance. Typically, SFCs are constructed through convex decomposition of the environment's free space into cuboids, convex polyhedra, or spheres. However, such SFCs can be overly conservative when dealing with a quadrotor swarm, substantially limiting the free space available for the quadrotors to coordinate. This paper presents an Alternating Minimization-based approach that does not require building a conservative free-space approximation. Instead, both static and dynamic collision constraints are treated in a unified manner. Dynamic collisions are handled based on shared position trajectories of the quadrotors. Static obstacle avoidance is coupled with distance queries from the Octomap, providing an implicit non-convex decomposition of free space. As a result, our approach is scalable to arbitrarily complex environments. Through extensive comparisons in simulation, we demonstrate a 60% improvement in success rate, an average 1.8× reduction in mission completion time, and an average 23× reduction in per-agent computation time compared to SFC-based approaches. We also experimentally validated our approach using a Crazyflie quadrotor swarm of up to 12 quadrotors in obstacle-rich environments. The code, supplementary materials, and videos are released for reference.
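The implicit, decomposition-free obstacle handling described above reduces, in sketch form, to penalizing waypoints whose queried obstacle distance falls below a safety radius; `distance_query` (e.g., backed by an OctoMap/EDT) and the penalty shape are assumptions:

```python
import numpy as np

def static_collision_cost(traj, distance_query, r_safe=0.3):
    """Smooth hinge penalty on nearest-obstacle distance: zero when every
    waypoint keeps at least r_safe clearance, quadratic otherwise."""
    d = np.array([distance_query(p) for p in traj])  # one query per waypoint
    viol = np.maximum(0.0, r_safe - d)
    return float((viol ** 2).sum())
```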
|
|
13:30-15:00, Paper ThBT4-CC.4 | Add to My Program |
Conflict-Based Model Predictive Control for Scalable Multi-Robot Motion Planning |
|
Tajbakhsh, Amirardalan | Carnegie Mellon University |
Biegler, Lorenz | Carnegie Mellon University |
Johnson, Aaron M. | Carnegie Mellon University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Motion and Path Planning, Multi-Robot Systems
Abstract: This paper presents a scalable multi-robot motion planning algorithm called Conflict-Based Model Predictive Control (CB-MPC). Inspired by Conflict-Based Search (CBS), the planner leverages a modified high-level conflict tree to efficiently resolve robot-robot conflicts in the continuous space, while reasoning about each agent's kinematic and dynamic constraints and actuation limits using MPC as the low-level planner. We show that tracking high-level multi-robot plans with a vanilla MPC controller is insufficient, and results in unexpected collisions in tight navigation scenarios under realistic execution. Compared to other variations of multi-robot MPC like joint, prioritized, and distributed, we demonstrate that CB-MPC improves the executability and success rate, allows for closer robot-robot interactions, and scales better with higher numbers of robots without compromising the solution quality across a variety of environments.
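A hypothetical skeleton of the high-level loop, with MPC as the low-level planner as described above; `mpc_solve`, `find_conflict`, and the cost proxy are placeholders, not the paper's implementation:

```python
import heapq, itertools

def cb_mpc(starts, goals, mpc_solve, find_conflict):
    """CBS-style conflict tree: each node stores per-robot constraints and
    MPC-generated trajectories; conflicts spawn two children, one per robot."""
    tie = itertools.count()  # tiebreaker so heapq never compares dicts
    root = {r: [] for r in range(len(starts))}
    plans = {r: mpc_solve(r, starts[r], goals[r], root[r]) for r in root}
    openq = [(sum(len(p) for p in plans.values()), next(tie), root, plans)]
    while openq:
        _, _, constraints, plans = heapq.heappop(openq)
        conflict = find_conflict(plans)       # e.g., (r1, r2, t, region)
        if conflict is None:
            return plans                      # collision-free under execution
        r1, r2, t, region = conflict
        for r in (r1, r2):
            child = {k: list(v) for k, v in constraints.items()}
            child[r].append((t, region))      # forbid that region at time t
            new_plans = dict(plans)
            new_plans[r] = mpc_solve(r, starts[r], goals[r], child[r])
            cost = sum(len(p) for p in new_plans.values())  # crude cost proxy
            heapq.heappush(openq, (cost, next(tie), child, new_plans))
    return None
```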
|
|
13:30-15:00, Paper ThBT4-CC.5 | Add to My Program |
Db-CBS: Discontinuity-Bounded Conflict-Based Search for Multi-Robot Kinodynamic Motion Planning |
|
Moldagalieva, Akmaral | Technical University of Berlin |
Ortiz-Haro, Joaquim | TU Berlin |
Toussaint, Marc | TU Berlin |
Hoenig, Wolfgang | TU Berlin |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Motion and Path Planning, Nonholonomic Motion Planning
Abstract: This paper presents a multi-robot kinodynamic motion planner that enables a team of robots with different dynamics, actuation limits, and shapes to reach their goals in challenging environments. We solve this problem by combining Conflict-Based Search (CBS), a multi-agent path finding method, and discontinuity-bounded A*, a single-robot kinodynamic motion planner. Our method, db-CBS, operates in three levels. Initially, we compute trajectories for individual robots using a graph search that allows bounded discontinuities between precomputed motion primitives. The second level identifies inter-robot collisions and resolves them by imposing constraints on the first level. The third and final level uses the resulting solution with discontinuities as an initial guess for a joint space trajectory optimization. The procedure is repeated with a reduced discontinuity bound. Our approach is anytime, probabilistically complete, asymptotically optimal, and finds near-optimal solutions quickly. Experimental results with robot dynamics such as unicycle, double integrator, and car with trailer in different settings show that our method is capable of solving challenging tasks with a higher success rate and lower cost than the existing state-of-the-art.
|
|
13:30-15:00, Paper ThBT4-CC.6 | Add to My Program |
ALPHA: Attention-Based Long-Horizon Pathfinding in Highly-Structured Areas |
|
He, Chengyang | National University Singapore |
Yang, Tianze | National University of Singapore |
Duhan, Tanishq Harish | National University of Singapore |
Wang, Yutong | National University of Singapore |
Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems, Motion and Path Planning
Abstract: The multi-agent pathfinding (MAPF) problem seeks collision-free paths for a team of agents from their current positions to their pre-set goals in a known environment, and is an essential problem found at the core of many logistics, transportation, and general robotics applications. Existing learning-based MAPF approaches typically only let each agent make decisions based on a limited field-of-view (FOV) around its position, as a natural means to fix the input dimensions of its policy network. However, this often makes policies short-sighted, since agents lack the ability to perceive and plan for obstacles/agents beyond their FOV. To address this challenge, we propose ALPHA, a new framework combining ground-truth proximal (local) information and fuzzy distal (global) information to let agents sequence local decisions based on the full current state of the system and avoid such myopia. We further allow agents to make short-term predictions about each other's paths, as a means to reason about each other's path intentions, thereby enhancing the level of cooperation among agents at the whole-system level. Our neural structure relies on a Graph Transformer architecture to allow agents to selectively combine these different sources of information and reason about their inter-dependencies at different spatial scales. Our simulation experiments demonstrate that ALPHA outperforms both globally-guided MAPF solvers and communication-learning based ones, showcasing its potential for scalability in realistic deployments.
|
|
13:30-15:00, Paper ThBT4-CC.7 | Add to My Program |
Online On-Demand Multi-Robot Coverage Path Planning |
|
Mitra, Ratijit | IIT Kanpur |
Saha, Indranil | IIT Kanpur |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems, Motion and Path Planning
Abstract: We present an online centralized path planning algorithm to cover a large, complex, unknown workspace with multiple homogeneous mobile robots. Our algorithm is horizon-based, synchronous, and on-demand. The recently proposed horizon-based synchronous algorithms compute the paths for all the robots in each horizon, significantly increasing the computation burden in large workspaces with many robots. As a remedy, we propose an algorithm that computes the paths for a subset of robots that have traversed previously computed paths entirely (thus on-demand) and reuses the remaining paths for the other robots. We formally prove that the algorithm guarantees the complete coverage of the unknown workspace. Experimental results on several standard benchmark workspaces show that our algorithm scales to hundreds of robots in large complex workspaces and consistently outperforms a state-of-the-art online centralized multi-robot coverage path planning algorithm in terms of the time required to achieve complete coverage. For validation, we perform ROS+Gazebo simulations in five 2D grid benchmark workspaces with 10 Quadcopters and 10 TurtleBots, respectively. In addition, to establish its practical feasibility, we conduct one indoor experiment with two real TurtleBot2 robots and one outdoor experiment with three real Quadcopters.
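The on-demand aspect described above amounts to replanning only for robots that have exhausted their paths; a minimal loop under an assumed robot/workspace interface:

```python
def on_demand_coverage(robots, plan_paths, workspace):
    """Each horizon, only robots with fully traversed paths receive new ones;
    the remaining robots keep executing their previously computed paths."""
    while not workspace.fully_covered():
        finished = [r for r in robots if r.path_exhausted()]
        if finished:  # plan on demand instead of for the whole fleet
            new_paths = plan_paths(finished, workspace.known_map())
            for r, p in zip(finished, new_paths):
                r.assign(p)
        for r in robots:  # one synchronous motion step for every robot
            workspace.mark_covered(r.step())
```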
|
|
13:30-15:00, Paper ThBT4-CC.8 | Add to My Program |
Scalable Multi-Robot Motion Planning for Congested Environments with Topological Guidance |
|
McBeth, Courtney | University of Illinois Urbana-Champaign |
Motes, James | University of Illinois Urbana-Champaign |
Uwacu, Diane | Texas A&M University |
Morales, Marco | University of Illinois at Urbana-Champaign & Instituto Tecnológico Autónomo de México |
Amato, Nancy | University of Illinois |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems, Motion and Path Planning
Abstract: Multi-robot motion planning (MRMP) is the problem of finding collision-free paths for a set of robots in a continuous state space. The difficulty of MRMP increases with the number of robots and is exacerbated in environments with narrow passages that robots must pass through, like warehouse aisles where coordination between robots is required. In single-robot settings, topology-guided motion planning methods have shown improved performance in these constricted environments. In this work, we extend an existing topology-guided single-robot motion planning method to the multi-robot domain to leverage the improved efficiency provided by topological guidance. We demonstrate our method’s ability to efficiently plan paths in complex environments with many narrow passages, scaling to robot teams up to 25 times larger than those handled by existing methods in this class of problems. By leveraging knowledge of the topology of the environment, we also find higher-quality solutions than other methods.
|
|
13:30-15:00, Paper ThBT4-CC.9 | Add to My Program |
Mixed Integer Programming for Time-Optimal Multi-Robot Coverage Path Planning with Efficient Heuristics |
|
Tang, Jingtao | Simon Fraser University |
Ma, Hang | Simon Fraser University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems, Motion and Path Planning
Abstract: We investigate the time-optimal Multi-Robot Coverage Path Planning (MCPP) problem for both unweighted and weighted terrains, which aims to minimize the coverage time, defined as the maximum travel time of all robots. Specifically, we focus on a reduction from the MCPP problem to the Rooted Min-Max Tree Cover (RMMTC) problem. For the first time, we propose a Mixed Integer Programming (MIP) model to optimally solve the RMMTC problem, resulting in an MCPP solution with a coverage time that is provably at most four times the optimal. Moreover, we propose two suboptimal yet effective heuristics that reduce the number of variables in the MIP model, thus improving its efficiency for large-scale MCPP problems. We show that both heuristics result in reduced-size MIP models that remain complete for RMMTC problems. Additionally, we explore the use of model optimization warm-startup to further improve the efficiency of both the original MIP model and the reduced-size MIP models. We validate the effectiveness of our MIP-based MCPP planner through experiments that compare it with two state-of-the-art MCPP planners on various instances, demonstrating a reduction in the coverage time by an average of 42.42% and 39.16% over them, respectively.
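The min-max objective at the core of such a model is linearized in the standard way: introduce a variable tau, bound every robot's tree weight by it, and minimize tau. A PuLP skeleton under that assumption, with the RMMTC-specific constraints elided:

```python
import pulp

def minmax_skeleton(num_robots, edge_weights):
    """edge_weights: list of edge costs. Only the min-max linearization is
    shown; tree-structure, rooting, and coverage constraints are omitted."""
    prob = pulp.LpProblem("rmmtc", pulp.LpMinimize)
    tau = pulp.LpVariable("tau", lowBound=0)
    x = {(i, j): pulp.LpVariable(f"x_{i}_{j}", cat="Binary")
         for i in range(num_robots) for j in range(len(edge_weights))}
    prob += tau  # objective: minimize the maximum per-robot tree weight
    for i in range(num_robots):
        prob += pulp.lpSum(w * x[i, j]
                           for j, w in enumerate(edge_weights)) <= tau
    # ... tree-structure, rooting, and full-coverage constraints go here ...
    return prob, x, tau
```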
|
|
ThBT5-CC Oral Session, CC-411 |
Add to My Program |
Visual Learning II |
|
|
Chair: Li, Zhibin (Alex) | University College London |
Co-Chair: Zeng, Long | Tsinghua University |
|
13:30-15:00, Paper ThBT5-CC.1 | Add to My Program |
MOTPose: Multi-Object 6D Pose Estimation for Dynamic Video Sequences Using Attention-Based Temporal Fusion |
|
Periyasamy, Arul Selvam | University of Bonn |
Behnke, Sven | University of Bonn |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Data Sets for Robotic Vision
Abstract: Cluttered bin-picking environments are challenging for pose estimation models. Despite the impressive progress enabled by deep learning, single-view RGB pose estimation models perform poorly in cluttered dynamic environments. Incorporating the rich temporal information contained in videos of such scenes has the potential to enhance a model's ability to deal with the adverse effects of occlusion and the dynamic nature of the environment. Moreover, joint object detection and pose estimation models are better suited to leverage the co-dependent nature of the two tasks, improving the accuracy of both. To this end, we propose attention-based temporal fusion for multi-object 6D pose estimation that accumulates information across multiple frames of a video sequence. Our MOTPose method takes a sequence of images as input and performs joint object detection and pose estimation for all objects in one forward pass. It learns to aggregate both object embeddings and object parameters over multiple time steps using cross-attention-based fusion modules. We evaluate our method on the physically realistic cluttered bin-picking dataset SynPick and the YCB-Video dataset and demonstrate improved pose estimation accuracy as well as better object detection accuracy.
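The cross-attention-based accumulation described above can be sketched as a standard PyTorch module in which current-frame object embeddings attend to embeddings from earlier frames; dimensions and names are illustrative, not the authors' exact module:

```python
import torch.nn as nn

class TemporalFusion(nn.Module):
    """Current-frame object queries attend over a memory of embeddings
    aggregated from previous frames; a residual keeps per-frame cues."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, current, memory):
        # current: (B, N_obj, dim); memory: (B, T * N_obj, dim)
        fused, _ = self.attn(query=current, key=memory, value=memory)
        return self.norm(current + fused)
```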
|
|
13:30-15:00, Paper ThBT5-CC.2 | Add to My Program |
Generalizable Thermal-Based Depth Estimation Via Pre-Trained Visual Foundation Model |
|
Fan, Ruoyu | Tsinghua University |
Zhao, Wang | Tsinghua University |
Lin, Matthieu | Tsinghua University |
Wang, Qi | Guizhou University |
Liu, Yong-Jin | Tsinghua University |
Wang, Wenping | The University of Hong Kong |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Sensor Fusion
Abstract: Depth estimation is a crucial task in computer vision, applicable to various domains such as 3D reconstruction, robotics, and autonomous driving. In particular, thermal-based depth estimation has unique advantages, including night-time vision. However, existing depth estimation methods still struggle to generalize robustly due to limited data resources and the spectral differences between thermal and RGB images. In this paper, we present a self-supervised approach that enhances thermal-based depth estimation by leveraging pre-trained visual models initially designed for RGB data. In detail, we design a novel two-stage training strategy, incorporating Low-rank Adapters and Convolutional Adapters, which not only significantly improves accuracy and robustness but also enables impressive zero-shot generalization capabilities. Our method outperforms existing thermal-based depth estimation models, opening new possibilities for cross-modal applications in computer vision and robotics research.
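A generic low-rank adapter of the kind mentioned above wraps a frozen pre-trained layer and learns only a small low-rank update; the rank, scaling, and initialization below are common conventions, not the paper's exact settings:

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank residual;
    zero-initializing the up-projection makes the adapter start as identity."""
    def __init__(self, base: nn.Linear, rank=4, alpha=8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep RGB-pretrained weights fixed
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))
```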
|
|
13:30-15:00, Paper ThBT5-CC.3 | Add to My Program |
OSSAR: Towards Open-Set Surgical Activity Recognition in Robot-Assisted Surgery |
|
Bai, Long | The Chinese University of Hong Kong |
Wang, Guankun | The Chinese University of Hong Kong |
Wang, Jie | Beijing Institute of Technology |
Yang, Xiaoxiao | Qilu Hospital of Shandong University |
Gao, Huxin | National University of Singapore |
Liang, Xin | Tongji University |
Wang, An | The Chinese University of Hong Kong |
Islam, Mobarakol | University College London |
Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore (NUS) |
Keywords: Deep Learning for Visual Perception, Computer Vision for Medical Robotics, Surgical Robotics: Planning
Abstract: In the realm of automated robotic surgery and computer-assisted interventions, understanding robotic surgical activities stands paramount. Existing algorithms dedicated to surgical activity recognition predominantly cater to pre-defined closed-set paradigms, ignoring the challenges of real-world open-set scenarios. Such algorithms often falter in the presence of test samples originating from classes unseen during the training phase. To tackle this problem, we introduce an innovative Open-Set Surgical Activity Recognition (OSSAR) framework. Our solution leverages the hyperspherical reciprocal point strategy to enhance the distinction between known and unknown classes in the feature space. Additionally, we address the issue of over-confidence in the closed set by refining model calibration, avoiding misclassification of unknown classes as known ones. To support our assertions, we establish an open-set surgical activity benchmark utilizing the public JIGSAWS dataset. We also collect a novel dataset on endoscopic submucosal dissection for surgical activity tasks. Extensive comparisons and ablation experiments on these datasets demonstrate that our method significantly outperforms existing state-of-the-art approaches. Our proposed solution can effectively address the challenges of real-world surgical scenarios. Our code is publicly accessible at github.com/longbai1006/OSSAR.
|
|
13:30-15:00, Paper ThBT5-CC.4 | Add to My Program |
FSD: Fast Self-Supervised Single RGB-D to Categorical 3D Objects |
|
Lunayach, Mayank | Georgia Institute of Technology |
Zakharov, Sergey | Toyota Research Institute |
Chen, Dian | Toyota Research Institute |
Ambrus, Rares | Toyota Research Institute |
Kira, Zsolt | Georgia Institute of Technology |
Irshad, Muhammad Zubair | Georgia Institute of Technology |
Keywords: Deep Learning for Visual Perception, RGB-D Perception, Recognition
Abstract: In this work, we address the challenging task of 3D object recognition without the reliance on real-world 3D labeled data. Our goal is to predict the 3D shape, size, and 6D pose of objects within a single RGB-D image, operating at the category level and eliminating the need for CAD models during inference. While existing self-supervised methods have made strides in this field, they often suffer from inefficiencies arising from non-end-to-end processing, reliance on separate models for different object categories, and slow surface extraction during the training of implicit reconstruction models; thus hindering both the speed and real-world applicability of the 3D recognition process. Our proposed method leverages a multi-stage training pipeline, designed to efficiently transfer synthetic performance to the real-world domain. This approach is achieved through a combination of 2D and 3D supervised losses during the synthetic domain training, followed by the incorporation of 2D supervised and 3D self-supervised losses on real-world data in two additional learning stages. By adopting this comprehensive strategy, our method successfully overcomes the aforementioned limitations and outperforms existing self-supervised 6D pose and size estimation baselines on the NOCS test-set with a 16.4% absolute improvement in mAP for 6D pose estimation while running in near real-time at 5 Hz. Project page: https://fsd6d.github.io/
|
|
13:30-15:00, Paper ThBT5-CC.5 | Add to My Program |
Keypoint Detection and Tracking in Low-Quality Image Frames with Events |
|
Wang, Xiangyuan | Wuhan University |
Chen, Kuangyi | Wuhan University |
Yang, Wen | Wuhan University |
Yu, Lei | Wuhan University |
Xing, Yannan | SynSense Co. Ltd |
Yu, Huai | Wuhan University |
Keywords: Deep Learning for Visual Perception, Sensor Fusion
Abstract: Keypoint detection and tracking in traditional image frames are often compromised by image quality issues such as motion blur and extreme lighting conditions. Event cameras offer potential solutions to these challenges by virtue of their high temporal resolution and high dynamic range. However, their performance in practical applications is limited by the inherent noise in event data. This paper advocates fusing the complementary information from image frames and event streams to achieve more robust keypoint detection and tracking. Specifically, we propose FE-DeTr, a novel keypoint detection network that fuses the textural and structural information from image frames with the high-temporal-resolution motion information from event streams. The network leverages temporal response consistency for supervision, ensuring stable and efficient keypoint detection. Moreover, we use a spatio-temporal nearest-neighbor search strategy for robust keypoint tracking. Extensive experiments are conducted on a new dataset featuring both image frames and event data captured under extreme conditions. The experimental results confirm the superior performance of our method over both existing frame-based and event-based methods. Our code, pre-trained models, and dataset are available at https://github.com/yuyangpoi/FE-DeTr.
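A spatio-temporal nearest-neighbor association of the kind mentioned above can be sketched as greedy matching within a time-scaled radius; the speed bound and the greedy one-to-one policy are assumptions for illustration:

```python
import numpy as np

def track_keypoints(prev_kps, curr_kps, dt, max_speed_px=120.0):
    """Match each previous keypoint (N,2) to the nearest current detection
    (M,2) within a radius that grows with the elapsed time dt (seconds)."""
    radius = max_speed_px * dt
    matches, taken = [], set()
    for i, p in enumerate(prev_kps):
        d = np.linalg.norm(curr_kps - p, axis=1)
        j = int(np.argmin(d))
        if d[j] <= radius and j not in taken:  # greedy one-to-one matching
            matches.append((i, j))
            taken.add(j)
    return matches
```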
|
|
13:30-15:00, Paper ThBT5-CC.6 | Add to My Program |
TiV-ODE: A Neural ODE-Based Approach for Controllable Video Generation from Text-Image Pairs |
|
Xu, Yucheng | University of Edinburgh |
Li, Nanbo | University of Edinburgh |
Goel, Arushi | University of Edinburgh |
Yao, Zonghai | UMass Amherst |
Guo, Zijian | Boston University |
Kasaei, Mohammadreza | University of Edinburgh |
Kasaei, Hamidreza | University of Groningen |
Li, Zhibin (Alex) | University College London |
Keywords: Deep Learning for Visual Perception, Visual Learning, Data Sets for Robotic Vision
Abstract: Videos capture the evolution of continuous dynamical systems over time in the form of discrete image sequences. Recently, video generation models have been widely used in robotic research. However, generating controllable videos from image-text pairs is an important yet underexplored research topic in both robotic and computer vision communities. This paper introduces an innovative and elegant framework named TiV-ODE, formulating this task as modeling the dynamical system in a continuous space. Specifically, our framework leverages the ability of Neural Ordinary Differential Equations (Neural ODEs) to model the complex dynamical system depicted by videos as a nonlinear ordinary differential equation. The resulting framework offers control over the generated videos' dynamics, content, and frame rate, a feature not provided by previous methods. Experiments demonstrate the ability of the proposed method to generate highly controllable and visually consistent videos and its capability of modeling dynamical systems. Overall, this work is a significant step towards developing advanced controllable video generation models that can handle complex and dynamic scenes.
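The continuous-time formulation above implies that frame rate is just the choice of evaluation times handed to the ODE solver; a sketch assuming the torchdiffeq package, with the latent-to-frame decoder omitted:

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # assumed third-party dependency

class LatentDynamics(nn.Module):
    """Latent vector field f(t, z) integrated by the ODE solver."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                 nn.Linear(dim, dim))

    def forward(self, t, z):
        return self.net(z)

def rollout(z0, dynamics, n_frames, fps):
    """Frame-rate control falls out of the evaluation grid: denser time
    stamps yield more frames from the same latent trajectory."""
    ts = torch.arange(n_frames, dtype=torch.float32) / fps
    return odeint(dynamics, z0, ts)  # (n_frames, B, dim) latent states
```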
|
|
13:30-15:00, Paper ThBT5-CC.7 | Add to My Program |
TVFusionGAN: Thermal-Visible Image Fusion Based on Multi-Level Adversarial Network Strategy |
|
Lu, Guoyu | University of Georgia |
Keywords: Deep Learning for Visual Perception, Visual Learning, Object Detection, Segmentation and Categorization
Abstract: Thermal images excel at distinguishing objects from the background in low-light or nighttime conditions due to thermal radiation differences. However, they lack texture compared to visible images. Conversely, visible images retain more texture information at higher resolutions, particularly during the daytime, but perform poorly at night. To address the limitations of both image modalities, recent methods have employed traditional fusion techniques or fusion networks to generate fused images that combine thermal and visible properties. This paper introduces an end-to-end fusion network that leverages generative adversarial networks (GANs) to fuse salient image components from the two modalities. Our network comprises a generator and two discriminators. The generator aims to produce fusion images with salient objects using a specially designed CIoU loss. The two adversarial networks ensure that the fused images are salient both in a holistic sense and at a local scale. One discriminator encourages the fused images to resemble visible images holistically, while the other ensures that the targeted objects in the fused images are as salient as in thermal images. Our method effectively preserves the thermal radiation of salient objects in infrared images while incorporating the textures of visible images.
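The two-critic setup described above yields, on the generator side, one adversarial term per discriminator; a hedged sketch with illustrative module and tensor names, not the paper's exact losses:

```python
import torch
import torch.nn.functional as F

def generator_adv_loss(fused, saliency_masks, d_global, d_local):
    """d_global judges the whole fused image against visible-image statistics;
    d_local judges the masked salient regions for thermal-like saliency."""
    logits_g = d_global(fused)                  # holistic realism
    logits_l = d_local(fused * saliency_masks)  # local saliency
    loss_g = F.binary_cross_entropy_with_logits(logits_g, torch.ones_like(logits_g))
    loss_l = F.binary_cross_entropy_with_logits(logits_l, torch.ones_like(logits_l))
    return loss_g + loss_l
```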
|
|
13:30-15:00, Paper ThBT5-CC.8 | Add to My Program |
You Only Label Once: 3D Box Adaptation from Point Cloud to Image with Semi-Supervised Learning |
|
Shi, Jieqi | The Hong Kong University of Science and Technology |
Li, Peiliang | HKUST, Robotics Institute |
Chen, Xiaozhi | DJI |
Shen, Shaojie | Hong Kong University of Science and Technology |
Keywords: Deep Learning for Visual Perception, Visual Learning, RGB-D Perception
Abstract: The image-based 3D object detection task expects that the predicted 3D bounding box has a “tightness” projection (also referred to as a cuboid) to facilitate 2D-based training, which fits the object contour well on the image while still remaining reasonable in 3D space. These requirements bring significant challenges to the annotation. Simply projecting the Lidar-labeled 3D boxes onto the image leads to non-trivial misalignment, while directly drawing a cuboid on the image cannot access the original 3D information. In this work, we propose a learning-based 3D box adaptation approach that automatically adjusts minimal parameters of the 360° Lidar 3D bounding box to perfectly fit the image appearance of panoramic cameras. With only a few 2D box annotations as guidance during the training phase, our network can produce accurate image-level cuboid annotations with 3D properties from Lidar boxes. We call our method “you only label once”, which means labeling on the point cloud once and automatically adapting to all surrounding cameras. Our refinement balances accuracy and efficiency well and dramatically reduces the labeling effort for accurate cuboid annotation. Extensive experiments on the public Waymo and NuScenes datasets show that our method can produce human-level cuboid annotations on the image without manual adjustment, and can accelerate monocular 3D training tasks.
|
|
13:30-15:00, Paper ThBT5-CC.9 | Add to My Program |
Self-Supervised Pretraining and Finetuning for Monocular Depth and Visual Odometry |
|
Chidlovskii, Boris | Naver Labs Europe |
Antsfeld, Leonid | Naver Labs Europe |
Keywords: Deep Learning for Visual Perception, Visual Learning, Vision-Based Navigation
Abstract: For the task of simultaneous monocular depth and visual odometry estimation, we propose two-step learning of self-supervised transformer-based models. We first benefit from generic pretrained models oriented towards understanding 3D geometry, followed by self-supervised finetuning on non-annotated videos. We show that our self-supervised models can reach state-of-the-art performance 'without bells and whistles', using standard components such as visual transformers, dense prediction transformers, and adapters. We demonstrate the effectiveness of the proposed method by running evaluations on six benchmark datasets, both static and dynamic, indoor and outdoor, with synthetic and real images. For all datasets, our method outperforms state-of-the-art methods, in particular for the depth prediction task.
|
|
ThBT6-CC Oral Session, CC-414 |
Add to My Program |
Computer Vision for Transportation |
|
|
Chair: de Croon, Guido | TU Delft |
Co-Chair: Valada, Abhinav | University of Freiburg |
|
13:30-15:00, Paper ThBT6-CC.1 | Add to My Program |
Amodal Optical Flow |
|
Luz, Maximilian | University of Freiburg |
Mohan, Rohit | University of Freiburg |
Sekkat, Ahmed Rida | IAV GmbH |
Sawade, Oliver | IAV GmbH |
Matthes, Elmar | IAV GmbH |
Brox, Thomas | University of Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Data Sets for Robotic Vision, Computer Vision for Transportation, Deep Learning for Visual Perception
Abstract: Optical flow estimation is very challenging in situations with transparent or occluded objects. In this work, we address these challenges at the task level by introducing Amodal Optical Flow, which integrates optical flow with amodal perception. Instead of only representing the visible regions, we define amodal optical flow as a multi-layered pixel-level motion field that encompasses both visible and occluded regions of the scene. To facilitate research on this new task, we extend the AmodalSynthDrive dataset to include pixel-level labels for amodal optical flow estimation. We present several strong baselines, along with the Amodal Flow Quality metric to quantify the performance in an interpretable manner. Furthermore, we propose the novel AmodalFlowNet as an initial step toward addressing this task. AmodalFlowNet consists of a transformer-based cost-volume encoder paired with a recurrent transformer decoder which facilitates recurrent hierarchical feature propagation and amodal semantic grounding. We demonstrate the tractability of amodal optical flow in extensive experiments and show its utility for downstream tasks such as panoptic tracking. We make the dataset, code, and trained models publicly available at http://amodal-flow.cs.uni-freiburg.de.
|
|
13:30-15:00, Paper ThBT6-CC.2 | Add to My Program |
AutoGraph: Predicting Lane Graphs from Traffic Observations |
|
Zürn, Jannik | University of Freiburg |
Posner, Ingmar | Oxford University |
Burgard, Wolfram | University of Technology Nuremberg |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: Lane graph estimation is a long-standing problem in the context of autonomous driving. Previous works aimed at solving this problem by relying on large-scale, hand-annotated lane graphs, introducing a data bottleneck for training models to solve this task. To overcome this limitation, we propose to use the motion patterns of traffic participants as lane graph annotations. In our AutoGraph approach, we employ a pre-trained object tracker to collect the tracklets of traffic participants such as vehicles and trucks. Based on the location of these tracklets, we predict the successor lane graph from an initial position using overhead RGB images only, not requiring any human supervision. In a subsequent stage, we show how the individual successor predictions can be aggregated into a consistent lane graph. We demonstrate the efficacy of our approach on the UrbanLaneGraph dataset and perform extensive quantitative and qualitative evaluations, indicating that AutoGraph is on par with models trained on hand-annotated graph data. Model and dataset will be made available at http://autograph.cs.uni-freiburg.de/.
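The aggregation step, turning raw tracklets into a consistent successor graph, can be approximated by snapping tracklet points to grid cells and counting observed cell-to-cell transitions. The grid size and the support threshold below are illustrative assumptions, not AutoGraph's actual aggregation procedure.

```python
import numpy as np

def tracklets_to_lane_graph(tracklets, cell=2.0, min_support=1):
    """Aggregate tracklets into a directed successor graph over grid cells."""
    def to_cell(p):
        return (int(np.floor(p[0] / cell)), int(np.floor(p[1] / cell)))

    edges = {}
    for track in tracklets:
        cells = [to_cell(p) for p in track]
        for u, v in zip(cells[:-1], cells[1:]):
            if u != v:                              # record only cell transitions
                edges[(u, v)] = edges.get((u, v), 0) + 1
    # drop weakly supported edges to suppress tracker noise
    return {e: n for e, n in edges.items() if n >= min_support}

tracklets = [np.array([[0.0, 0.0], [2.5, 0.2], [5.1, 0.4], [7.8, 1.9]]),
             np.array([[0.2, 0.1], [2.6, 0.3], [5.0, 0.5]])]
for (u, v), n in tracklets_to_lane_graph(tracklets).items():
    print(u, "->", v, f"(observed {n}x)")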
|
|
13:30-15:00, Paper ThBT6-CC.3 | Add to My Program |
3DSF-MixNet: Mixer-Based Symmetric Scene Flow Estimation from 3D Point Clouds |
|
Wang, Shuaijun | Southern University of Science and Technology |
Gao, Rui | Southern University of Science and Technology |
Han, Ruihua | University of Hong Kong |
Hao, Qi | Southern University of Science and Technology |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: Scene flow estimation aims to accurately recover the motion of 3D points, which poses challenges such as mis-registration, object occlusions, and non-uniform upsampling. This paper introduces a scene flow estimation framework featuring a unified scene flow estimator, a symmetric cost volume approach, and a geometric/semantic feature based upsampling strategy. The novelty of this work is threefold: (1) developing a novel progressive framework which integrates the cost volume module and the scene flow estimator, enhancing scene flow estimation; (2) developing a symmetric inter-frame correlation feature extraction method through cost volume estimation using MLP-Mixer operations; (3) developing an upsampling strategy based on both the semantic and geometric feature similarities between sparse and dense samples. Experimental results show that our method outperforms state-of-the-art baselines, especially under challenging conditions, with improvements of up to 0.1094 m / 0.089 m / 0.091 m in EPE3D, 54.23% / 53.67% / 74.1% in AS, 32.75% / 21.87% / 40.25% in AR, and 70.981% / 58.06% / 43.56% in outliers, when tested on the FlyingThings3D (FT3D_S, FT3D_H) and KITTI_H datasets, respectively.
|
|
13:30-15:00, Paper ThBT6-CC.4 | Add to My Program |
CVFormer: Learning Circum-View Representation and Consistency Constraints for Vision-Based Occupancy Prediction Via Transformers |
|
Bai, Zhengqi | Shanghai Institute of Microsystem and Information Technology |
Shi, Wenjun | Shanghai Institute of Microsystem and Information Technology |
Zhu, Dongchen | Shanghai Institute of Microsystem and Information Technology, Chi |
Kang, HanLong | Lotus Robotics |
Zhang, Guanghui | Shanghai Institute of Microsystem and Information Technology, Ch |
Ye, Gang | Lotus Robotics |
Xiao, Yang | Lotus Technology Ltd |
Wang, Lei | Shanghai Institute of Microsystem and Information Technology, Ch |
Zhang, Xiaolin | Shanghai Institute of Microsystem and Information Technology, Chi |
Li, Bo | Lotus Technology Ltd |
Li, Jiamao | Shanghai Institute of Microsystem and Information Technology, Chi |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Computer Vision for Automation
Abstract: With increasing demands for perception accuracy in autonomous driving, there is a growing focus on fine-grained 3D semantic occupancy prediction. Effectively representing detailed three-dimensional scenes has become a significant challenge in the development of this task. In this paper, we present a novel transformer-based framework named CVFormer, which leverages two-dimensional circum-views from the ego vehicle to extract three-dimensional features of the surrounding environment. Circum-views provide a novel solution for effectively representing dense and fine-grained scenes. Specifically, a multi-attention module, CTMA, is designed to fuse temporal features from circum-views, fully exploiting the spatiotemporal correlations between frames and capturing more comprehensive cues. Furthermore, a novel 2D projection constraint is established by observing objects from different perspective directions, and multiple 3D constraints based on object invariance and semantic consistency are also imposed to supervise the network, which enhances its scene understanding. Experimental results on the nuScenes dataset demonstrate that the proposed CVFormer clearly outperforms existing methods for occupancy prediction.
|
|
13:30-15:00, Paper ThBT6-CC.5 | Add to My Program |
Lightweight Event-Based Optical Flow Estimation Via Iterative Deblurring |
|
Wu, Yilun | TU Delft |
Paredes-Valles, Federico | Delft University of Technology |
de Croon, Guido | TU Delft |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Computer Vision for Automation
Abstract: Inspired by frame-based methods, state-of-the-art event-based optical flow networks rely on the explicit construction of correlation volumes, which are expensive to compute and store, rendering them unsuitable for robotic applications with limited compute and energy budgets. Moreover, correlation volumes scale poorly with resolution, prohibiting them from estimating high-resolution flow. We observe that the spatiotemporally continuous traces of events provide a natural search direction for seeking pixel correspondences, obviating the need to rely on gradients of explicit correlation volumes as such search directions. We introduce IDNet (Iterative Deblurring Network), a lightweight yet high-performing event-based optical flow network that directly estimates flow from event traces without using correlation volumes. We further propose two iterative update schemes: "ID", which iterates over the same batch of events, and "TID", which iterates over time with streaming events in an online fashion. Our top-performing model (ID) sets a new state of the art on the DSEC benchmark. Meanwhile, the base model (TID) is competitive with prior art while using 80% fewer parameters, consuming a 20x smaller memory footprint, and running 40% faster on the NVIDIA Jetson Xavier NX. Furthermore, the "TID" scheme is even more efficient, offering an additional 5x faster inference speed and 8 ms ultra-low latency at the cost of only a 9% performance drop, making it the only model in the current literature capable of real-time operation while maintaining decent performance.
|
|
13:30-15:00, Paper ThBT6-CC.6 | Add to My Program |
ActFormer: Scalable Collaborative Perception Via Active Queries |
|
Huang, Suozhi | Tsinghua University |
Zhang, Juexiao | New York University |
Li, Yiming | New York University |
Feng, Chen | New York University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Computer Vision for Automation
Abstract: Collaborative perception leverages rich visual observations from multiple robots to extend a single robot's perception ability beyond its field of view. Many prior works receive messages broadcast from all collaborators, leading to a scalability challenge when dealing with a large number of robots and sensors. In this work, we aim to address scalable camera-based collaborative perception with a Transformer-based architecture. Our key idea is to enable a single robot to intelligently discern the relevance of the collaborators and their associated cameras according to a learned spatial prior. This proactive understanding of the visual features' relevance does not require the transmission of the features themselves, enhancing both communication and computation efficiency. Specifically, we present ActFormer, a Transformer that learns bird's eye view (BEV) representations by using predefined BEV queries to interact with multi-robot multi-camera inputs. Each BEV query can actively select relevant cameras for information aggregation based on pose information, instead of interacting with all cameras indiscriminately. Experiments on the V2X-Sim dataset demonstrate that ActFormer improves the detection performance from 29.89% to 45.15% in terms of AP@0.7 with about 50% fewer queries, showcasing the effectiveness of ActFormer in multi-agent collaborative 3D object detection.
|
|
13:30-15:00, Paper ThBT6-CC.7 | Add to My Program |
LiDAR Data Synthesis with Denoising Diffusion Probabilistic Models |
|
Nakashima, Kazuto | Kyushu University |
Kurazume, Ryo | Kyushu University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Representation Learning
Abstract: Generative modeling of 3D LiDAR data is an emerging task with promising applications for autonomous mobile robots, such as scalable simulation, scene manipulation, and sparse-to-dense completion of LiDAR point clouds. While existing approaches have demonstrated the feasibility of image-based LiDAR data generation using deep generative models, they still struggle with fidelity and training stability. In this work, we present R2DM, a novel generative model for LiDAR data that can generate diverse and high-fidelity 3D scene point clouds based on the image representation of range and reflectance intensity. Our method is built upon denoising diffusion probabilistic models (DDPMs), which have shown impressive results among generative model frameworks in recent years. To effectively train DDPMs in the LiDAR domain, we first conduct an in-depth analysis of data representation, loss functions, and spatial inductive biases. Leveraging our R2DM model, we also introduce a flexible LiDAR completion pipeline based on the powerful capabilities of DDPMs. We demonstrate that our method surpasses existing methods in generation tasks on the KITTI-360 and KITTI-Raw datasets, as well as in the completion task on the KITTI-360 dataset. Our project page can be found at https://kazuto1011.github.io/r2dm.
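The training objective behind DDPMs is compact enough to sketch: noise a clean sample with the closed-form forward process and regress the injected noise. Below, a 2-channel tensor stands in for the range/reflectance image representation, and a toy convolutional net (which, unlike a real DDPM backbone, ignores the timestep) stands in for the denoiser; the schedule is the common linear one, not necessarily R2DM's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # linear noise schedule (assumed)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

eps_model = nn.Sequential(                   # toy stand-in for the denoising network
    nn.Conv2d(2, 16, 3, padding=1), nn.SiLU(),
    nn.Conv2d(16, 2, 3, padding=1))

x0 = torch.randn(4, 2, 64, 256)              # batch of range + reflectance "images"
t = torch.randint(0, T, (4,))                # random timestep per sample
eps = torch.randn_like(x0)
ab = alphas_bar[t].view(-1, 1, 1, 1)
x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps   # forward process q(x_t | x_0)
loss = F.mse_loss(eps_model(x_t), eps)           # simple denoising (epsilon) loss
loss.backward()
print(loss.item())
```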
|
|
13:30-15:00, Paper ThBT6-CC.8 | Add to My Program |
Multi-Task Learning for Real-Time Autonomous Driving Leveraging Task-Adaptive Attention Generator |
|
Choi, Wonhyeok | DGIST |
Shin, Mingyu | Daegu Gyeongbuk Institute of Science and Technology |
Lee, Hyukzae | Hyundai Motor Company |
Cho, Jaehoon | Hyundai Motor Company R&D Division |
Park, Jaehyeon | Hyundai Motor |
Im, Sunghoon | DGIST |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Vision-Based Navigation
Abstract: Real-time processing is crucial in autonomous driving systems due to the imperative of instantaneous decision-making and rapid response. In real-world scenarios, autonomous vehicles are continuously tasked with interpreting their surroundings, analyzing intricate sensor data, and making decisions within split seconds to ensure safety through numerous computer vision tasks. In this paper, we present a new real-time multi-task network adept at three vital autonomous driving tasks: monocular 3D object detection, semantic segmentation, and dense depth estimation. To counter the challenge of negative transfer, the prevalent issue in multi-task learning, we introduce a task-adaptive attention generator. This generator is designed to automatically discern interrelations across the three tasks and arrange the task-sharing pattern, all while leveraging the efficiency of the hard-parameter sharing approach. To the best of our knowledge, the proposed model is pioneering in its capability to concurrently handle multiple tasks, notably 3D object detection, while maintaining real-time processing speeds. Our rigorously optimized network, when tested on the Cityscapes-3D dataset, consistently outperforms various baseline models. Moreover, an in-depth ablation study substantiates the efficacy of the methodologies integrated into our framework.
|
|
13:30-15:00, Paper ThBT6-CC.9 | Add to My Program |
LiDARFormer: A Unified Transformer-Based Multi-Task Network for LiDAR Perception |
|
Zhou, Zixiang | University of Central Florida |
Ye, Dongqiangzi | Tusimple |
Chen, Weijia | N/A |
Xie, Yufei | Tusimple |
Wang, Yu | N/A |
Wang, Panqu | ZERON |
Foroosh, Hassan | University of Central Florida |
Keywords: Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization, Computer Vision for Transportation
Abstract: There is a recent need in the LiDAR perception field for unifying multiple tasks in a single strong network with improved performance, as opposed to using separate networks for each task. In this paper, we introduce a new LiDAR multi-task learning paradigm based on the transformer. The proposed LiDARFormer utilizes cross-space global contextual feature information and exploits cross-task synergy to boost the performance of LiDAR perception tasks across multiple large-scale datasets and benchmarks. Our novel transformer-based framework includes a cross-space transformer module that learns attentive features between the 2D dense Bird's Eye View (BEV) and 3D sparse voxel feature maps. Additionally, we propose a transformer decoder for the segmentation task to dynamically adjust the learned features by leveraging the categorical feature representations. Furthermore, we combine the segmentation and detection features in a shared transformer decoder with cross-task attention layers to enhance and integrate the object-level and class-level features. LiDARFormer is evaluated on the large-scale nuScenes and the Waymo Open datasets for both 3D detection and semantic segmentation tasks, and it achieves state-of-the-art performance on both tasks.
|
|
ThBT7-CC Oral Session, CC-416 |
Add to My Program |
Machine Learning for Robot Control I |
|
|
Chair: Falotico, Egidio | Scuola Superiore Sant'Anna |
Co-Chair: Controzzi, Marco | Scuola Superiore Sant'Anna |
|
13:30-15:00, Paper ThBT7-CC.1 | Add to My Program |
Adaptive Robot-Human Handovers with Preference Learning |
|
Perovic, Gojko | Scuola Superiore Sant'Anna |
Iori, Francesco | Scuola Superiore Sant'Anna |
Mazzeo, Angela | Scuola Superiore Sant'Anna |
Controzzi, Marco | Scuola Superiore Sant'Anna |
Falotico, Egidio | Scuola Superiore Sant'Anna |
Keywords: Machine Learning for Robot Control, Human-Robot Collaboration, Human-Aware Motion Planning
Abstract: This paper proposes an adaptive method for robot-to-human handovers under different scenarios. The method combines Dynamic Movement Primitives (DMP) with Preference Learning (PL) to generate online trajectories that are reactive to human motion, modulating the speed of the robot. PL tunes the coupling parameters of the DMP, tailoring the interaction to each participant and enabling qualitative analysis of user preferences. A simulation of an interaction-constrained learning task with different optimization techniques is performed to determine an appropriate learning approach for the handover task. The validity of the approach is demonstrated through experiments with participants on two handover tasks, with results indicating that the proposed method leads to seamless and pleasurable interactions.
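A DMP reduces each degree of freedom to a stable second-order system plus a learned forcing term, and the speed modulation tuned by preference learning enters through the temporal scaling (here the single parameter tau). The sketch below is a generic 1-D discrete DMP rollout with standard gains, not the paper's coupled formulation.

```python
import numpy as np

def dmp_rollout(y0, g, forcing, tau=1.0, dt=0.01, alpha=25.0, beta=6.25,
                alpha_x=3.0, T=2.0):
    """Euler rollout of a 1-D discrete DMP from y0 toward goal g.

    forcing(x) is the learned shape term over the canonical phase x;
    tau > 1 slows execution, tau < 1 speeds it up.
    """
    y, z, x = y0, 0.0, 1.0
    traj = [y]
    for _ in range(int(T / dt)):
        f = forcing(x) * x * (g - y0)                 # standard forcing scaling
        dz = (alpha * (beta * (g - y) - z) + f) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z, y, x = z + dz * dt, y + dy * dt, x + dx * dt
        traj.append(y)
    return np.array(traj)

traj = dmp_rollout(y0=0.0, g=0.3, forcing=lambda x: 0.0, tau=1.2)
print(traj[0], traj[-1])   # the goal is approached as the phase x decays
```

In the paper, the DMP coupling parameters that react to human motion are the quantities tuned per participant by preference learning.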
|
|
13:30-15:00, Paper ThBT7-CC.2 | Add to My Program |
NaviFormer: A Data-Driven Robot Navigation Approach Via Sequence Modeling and Path Planning with Safety Verification |
|
Zhang, Xuyang | University of Science and Technology of China |
Feng, Ziyang | University of Science and Technology of China |
Qiu, Quecheng | School of Data Science, USTC, Hefei 230026, China |
Chen, Yu'an | University of Science and Technology of China |
Hua, Bei | University of Science and Technology of China |
Ji, Jianmin | University of Science and Technology of China |
Keywords: Machine Learning for Robot Control, Reinforcement Learning
Abstract: Reinforcement learning has shown great potential for improving the performance of robot navigation. In response to the increasing deployment of mobile robots across various scenarios, a data-driven navigation paradigm with safety verification is preferred, in which one can train RL algorithms with large amounts of prior data, keep learning continuously, and ensure safe navigation in applications. Conventional end-to-end reinforcement learning navigation paradigms have encountered multiple challenges in meeting these demands. In this work, we introduce a novel robot navigation approach termed NaviFormer. This approach handles navigation tasks based on sequence modeling to obtain data-driven ability, and integrates rule-based verification for safety assurance. We conduct a series of experiments to validate the data-driven ability of our approach and to compare it with existing navigation methods. We also perform quantitative tests on a real-world robot platform, TurtleBot. The experimental results show our method's strong data-driven ability and highlight its superior arrival rate and generalization compared to other state-of-the-art methods such as the PPO-based navigation method.
|
|
13:30-15:00, Paper ThBT7-CC.3 | Add to My Program |
Real-Time Adaptive Safety-Critical Control with Gaussian Processes in High-Order Uncertain Models |
|
Zhang, Yu | Technical University of Munich |
Wen, Long | Technical University of Munich |
Yao, Xiangtong | Technical University of Munich |
Bing, Zhenshan | Technical University of Munich |
Kong, Linghuan | University of Macau |
He, Wei | University of Science and Technology Beijing |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Machine Learning for Robot Control, Robot Safety, Collision Avoidance
Abstract: This paper presents an adaptive online learning framework for systems with uncertain parameters to ensure safety-critical control in non-stationary environments. Our approach consists of two phases. The initial phase is centered on a novel sparse Gaussian process (GP) framework. We first integrate a forgetting factor to refine a variational sparse GP algorithm, thus enhancing its adaptability. Subsequently, the hyperparameters of the Gaussian model are trained with a specially designed compound kernel, and the Gaussian model's online inference capability and computational efficiency are strengthened by updating a solitary inducing point derived from new samples, in conjunction with the learned hyperparameters. In the second phase, we propose a safety filter based on high-order control barrier functions (HOCBFs), synergized with the previously trained learning model. By leveraging the compound kernel from the first phase, we effectively address the inherent limitations of GPs in handling high-dimensional problems for real-time applications. The derived controller ensures a rigorous lower bound on the probability of satisfying the safety specification. Finally, the efficacy of our proposed algorithm is demonstrated through real-time obstacle-avoidance experiments executed on both a simulation platform and a real-world 7-DOF robot.
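The safety-filter idea, minimally modifying a nominal command so a barrier condition holds, is easiest to see in the first-order, known-dynamics case, where the one-constraint QP has a closed form. The sketch below uses a plain (not high-order) CBF for a single integrator avoiding a disc; the GP-learned dynamics and the HOCBF machinery of the paper are deliberately left out.

```python
import numpy as np

def cbf_filter(x, u_des, x_obs, r, alpha=1.0):
    """Minimal-intervention safety filter for x_dot = u with h(x) = |x - x_obs|^2 - r^2.

    Enforces grad_h(x) . u >= -alpha * h(x); for a single affine constraint the
    QP solution is a projection of u_des onto the constraint half-space.
    """
    h = np.dot(x - x_obs, x - x_obs) - r ** 2
    a = 2.0 * (x - x_obs)                 # gradient of h at x
    b = -alpha * h
    if a @ u_des >= b:                    # nominal command already satisfies the CBF
        return u_des
    return u_des + (b - a @ u_des) / (a @ a) * a

x = np.array([1.5, 0.0])                  # robot state; obstacle of radius 1 at origin
u = cbf_filter(x, u_des=np.array([-1.0, 0.0]), x_obs=np.zeros(2), r=1.0)
print(u)   # the command toward the obstacle is deflected to the constraint boundary
```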
|
|
13:30-15:00, Paper ThBT7-CC.4 | Add to My Program |
DeformNet: Latent Space Modeling and Dynamics Prediction for Deformable Object Manipulation |
|
Li, Chenchang | Tsinghua University |
Ai, Zihao | Tsinghua University |
Wu, Tong | Tsinghua University |
Li, Xiaosa | Tsinghua University |
Ding, Wenbo | Tsinghua University |
Xu, Huazhe | Tsinghua University |
Keywords: Machine Learning for Robot Control, Representation Learning, AI-Based Methods
Abstract: Manipulating deformable objects is a ubiquitous task in household environments, demanding adequate representation and accurate dynamics prediction due to the objects' infinite degrees of freedom. This work proposes DeformNet, which utilizes latent space modeling with a learned 3D representation model to tackle these challenges effectively. The proposed representation model combines a PointNet encoder and a conditional neural radiance field (NeRF), facilitating a thorough acquisition of object deformations and variations in lighting conditions. To model the complex dynamics, we employ a recurrent state-space model (RSSM) that accurately predicts the transformation of the latent representation over time. Extensive simulation experiments with diverse objectives demonstrate the generalization capabilities of DeformNet for various deformable object manipulation tasks, even in the presence of previously unseen goals. Finally, we deploy DeformNet on an actual UR5 robotic arm to demonstrate its capability in real-world scenarios.
|
|
13:30-15:00, Paper ThBT7-CC.5 | Add to My Program |
Actor-Critic Model Predictive Control |
|
Romero, Angel | University of Zurich |
Song, Yunlong | University of Zurich |
Scaramuzza, Davide | University of Zurich |
Keywords: Machine Learning for Robot Control, Reinforcement Learning, Aerial Systems: Applications
Abstract: An open research question in robotics is how to combine the benefits of model-free reinforcement learning (RL) - known for its strong task performance and flexibility in optimizing general reward formulations - with the robustness and online replanning capabilities of model predictive control (MPC). This paper provides an answer by introducing a new framework called Actor-Critic Model Predictive Control. The key idea is to embed a differentiable MPC within an actor-critic RL framework. The proposed approach leverages the short-term predictive optimization capabilities of MPC with the exploratory and end-to-end training properties of RL. The resulting policy effectively manages both short-term decisions through the MPC-based actor and long-term prediction via the critic network, unifying the benefits of both model-based control and end-to-end learning. We validate our method in both simulation and the real world with a quadcopter platform across various high-level tasks. We show that the proposed architecture can achieve real-time control performance, learn complex behaviors via trial and error, and retain the predictive properties of the MPC to better handle out-of-distribution behavior.
|
|
13:30-15:00, Paper ThBT7-CC.6 | Add to My Program |
Tractable Joint Prediction and Planning Over Discrete Behavior Modes for Urban Driving |
|
Villaflor, Adam | CMU |
Yang, Brian | University of California, Berkeley |
Su, Huangyuan | Harvard University |
Fragkiadaki, Aikaterini | Carnegie Mellon University |
Dolan, John M. | Carnegie Mellon University |
Schneider, Jeff | Carnegie Mellon University |
Keywords: Machine Learning for Robot Control, Deep Learning Methods
Abstract: Significant progress has been made in training multimodal trajectory forecasting models for autonomous driving. However, effectively integrating these models with downstream planners and model-based control approaches is still an open problem. Although these models have conventionally been evaluated for open-loop prediction, we show that they can be used to parameterize autoregressive closed-loop models without retraining. We consider recent trajectory prediction approaches which leverage learned anchor embeddings to predict multiple trajectories, finding that these anchor embeddings can parameterize discrete and distinct modes representing high-level driving behaviors. We propose to perform fully reactive closed-loop planning over these discrete latent modes, allowing us to tractably model the causal interactions between agents at each step. We validate our approach on a suite of more dynamic merging scenarios, finding that our approach avoids the frozen robot problem which is pervasive in conventional planners. Our approach also outperforms the previous state-of-the-art in CARLA on challenging dense traffic scenarios when evaluated at realistic speeds.
|
|
13:30-15:00, Paper ThBT7-CC.7 | Add to My Program |
GenDOM: Generalizable One-Shot Deformable Object Manipulation with Parameter-Aware Policy |
|
Kuroki, So | The University of Tokyo |
Guo, Jiaxian | The University of Tokyo |
Matsushima, Tatsuya | The University of Tokyo |
Okubo, Takuya | University of Tokyo |
Kobayashi, Masato | Osaka University |
Ikeda, Yuya | University of Tokyo |
Takanami, Ryosuke | The University of Tokyo |
Yoo, Paul | The University of Tokyo |
Matsuo, Yutaka | The University of Tokyo |
Iwasawa, Yusuke | The University of Tokyo |
Keywords: Machine Learning for Robot Control, Transfer Learning, Deep Learning in Grasping and Manipulation
Abstract: Due to the inherent uncertainty in their deformability during motion, previous methods for manipulating deformable objects, such as ropes and cloth, often required hundreds of real-world demonstrations to train a manipulation policy for each object, which hinders their application in our ever-changing world. To address this issue, we introduce GenDOM, a framework that allows the manipulation policy to handle different deformable objects with only a single real-world demonstration. To achieve this, we augment the policy by conditioning it on deformable object parameters and training it with a diverse range of simulated deformable objects so that the policy can adjust actions based on different object parameters. At inference time, given a new object, GenDOM can estimate the deformable object parameters with only a single real-world demonstration by minimizing the disparity between the grid density of point clouds of real-world demonstrations and simulations in a differentiable physics simulator. Empirical validations on both simulated and real-world object manipulation setups clearly show that our method can manipulate different objects with a single demonstration and significantly outperforms the baseline in both environments (a 62% improvement for in-domain ropes and a 15% improvement for out-of-distribution ropes in simulation, as well as a 26% improvement for ropes and a 50% improvement for cloths in the real world), demonstrating the effectiveness of our approach in one-shot deformable object manipulation.
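The parameter-estimation objective, matching grid densities of real and simulated point clouds, can be written down directly. The numpy version below only illustrates the disparity being minimized; in the paper the simulator is differentiable so the parameters can be optimized by gradient descent, whereas this sketch is not differentiable and all sizes are assumptions.

```python
import numpy as np

def grid_density(points, lo, hi, bins=16):
    """Normalized occupancy histogram of a point cloud over a fixed 3D grid."""
    hist, _ = np.histogramdd(points, bins=(bins, bins, bins),
                             range=list(zip(lo, hi)))
    return hist / max(points.shape[0], 1)

def density_disparity(real_pts, sim_pts, lo, hi):
    """L1 disparity between real and simulated grid densities."""
    return np.abs(grid_density(real_pts, lo, hi)
                  - grid_density(sim_pts, lo, hi)).sum()

lo, hi = np.zeros(3), np.ones(3)             # workspace bounds (assumed)
real = np.random.rand(500, 3)                # points from the real demonstration
sim = np.random.rand(500, 3) * 0.9           # points from one simulated rollout
print(density_disparity(real, sim, lo, hi))  # minimized over object parameters
```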
|
|
ThBT8-CC Oral Session, CC-418 |
Add to My Program |
Datasets for Robotic Vision |
|
|
Co-Chair: Chen, Yi-Ting | National Yang Ming Chiao Tung University |
|
13:30-15:00, Paper ThBT8-CC.1 | Add to My Program |
RiskBench: A Scenario-Based Benchmark for Risk Identification |
|
Kung, Chi-Hsi | National Tsing Hua University |
Pao, Pang-Yuan | National Yang Ming Chiao Tung University |
Chen, Pin-Lun | National Yang Ming Chiao Tung University |
Lu, Hsin-Cheng | National Taiwan University |
Yang, Chieh Chi | National Yang Ming Chiao Tung University |
Lu, Shu-Wei | National Yang Ming Chiao Tung University |
Chen, Yi-Ting | National Yang Ming Chiao Tung University |
Keywords: Data Sets for Robotic Vision, Collision Avoidance, Motion and Path Planning
Abstract: Intelligent driving systems aim to achieve a zero-collision mobility experience, requiring interdisciplinary efforts to enhance safety performance. This work focuses on risk identification, the process of identifying and analyzing risks stemming from dynamic traffic participants and unexpected events. While significant advances have been made in the community, the current evaluation of different risk identification algorithms uses independent datasets, leading to difficulty in direct comparison and hindering collective progress toward safety performance enhancement. To address this limitation, we introduce RiskBench, a large-scale scenario-based benchmark for risk identification. Our benchmark is created using a scenario-based approach, which is widely accepted in the automotive industry. We design a scenario taxonomy and augmentation pipeline to enable systematic collection of ground-truth risks under different scenarios. We assess the ability of ten algorithms to (1) detect and locate risks, (2) anticipate risks, and (3) facilitate decision-making. We conduct extensive experiments and summarize future research directions on risk identification. Our aim is to encourage collaborative endeavors toward a society with zero collisions. To facilitate this, we have made our dataset and benchmark toolkit publicly available at https://hcis-lab.github.io/RiskBench/
|
|
13:30-15:00, Paper ThBT8-CC.2 | Add to My Program |
Enhancing Inland Water Safety: The Lake Constance Obstacle Detection Benchmark |
|
Griesser, Dennis | University of Applied Sciences Konstanz, Institute for Optical S |
Franz, Matthias | University of Applied Sciences Konstanz, Institute for Optical S |
Umlauf, Georg | University of Applied Sciences Konstanz, Institute for Optical S |
Keywords: Data Sets for Robotic Vision, Computer Vision for Automation, Recognition
Abstract: Autonomous navigation on inland waters requires an accurate understanding of the environment in order to react to possible obstacles. Deep learning is a promising technique to detect obstacles robustly. However, supervised deep learning models require large datasets to adjust their weights and to generalize to unseen data. Therefore, we equipped our research vessel with a laser scanner and a stereo camera to record a novel obstacle detection dataset for inland waters. We annotated 1974 stereo images and lidar point clouds with 3D bounding boxes. Furthermore, we provide an initial approach and a suitable metric to compare results on the test set. The dataset is publicly available and seeks to contribute towards increasing safety on inland waters.
|
|
13:30-15:00, Paper ThBT8-CC.3 | Add to My Program |
IDD-X: A Multi-View Dataset for Ego-Relative Important Object Localization and Explanation in Dense and Unstructured Traffic |
|
Parikh, Chirag | International Institute of Information Technology, Hyderabad |
Saluja, Rohit | IIT Mandi |
Jawahar, C.V. | IIIT, Hyderabad |
Sarvadevabhatla, Ravi Kiran | IIIT Hyderabad |
Keywords: Data Sets for Robotic Vision, Computer Vision for Transportation, Deep Learning for Visual Perception
Abstract: Intelligent vehicle systems require a deep understanding of the interplay between road conditions, surrounding entities, and the ego vehicle's driving behavior for safe and efficient navigation. This is particularly critical in developing countries where traffic situations are often dense and unstructured with heterogeneous road occupants. Existing datasets, predominantly geared towards structured and sparse traffic scenarios, fall short of capturing the complexity of driving in such environments. To fill this gap, we present IDD-X, a large-scale dual-view driving video dataset. With 697K bounding boxes, 9K important object tracks, and 1-12 objects per video, IDD-X offers comprehensive ego-relative annotations for multiple important road objects covering 10 categories and 19 explanation label categories. The dataset also incorporates rearview information to provide a more complete representation of the driving environment. We also introduce custom-designed deep networks aimed at multiple important object localization and per-object explanation prediction. Overall, our dataset and introduced prediction models form the foundation for studying how road conditions and surrounding entities affect driving behavior in complex traffic situations.
|
|
13:30-15:00, Paper ThBT8-CC.4 | Add to My Program |
LiDAR-CS Dataset: LiDAR Point Cloud Dataset with Cross-Sensors for 3D Object Detection |
|
Fang, Jin | Baidu |
Zhou, Dingfu | Baidu |
Zhao, Jingjing | Tusimple |
Wu, Chenming | Baidu Research |
Tang, Chulin | University of California, Irvine |
Xu, Chengzhong | University of Macau |
Zhang, Liangjun | Baidu |
Keywords: Data Sets for Robotic Vision, Deep Learning for Visual Perception
Abstract: Over the past few years, there has been remarkable progress in research on 3D point clouds, and their use in autonomous driving scenarios has become widespread. However, deep learning methods heavily rely on annotated data and often face domain generalization issues. Unlike 2D images, whose domains usually pertain to the texture information present in them, the features derived from a 3D point cloud are affected by the distribution of the points. The lack of a 3D domain adaptation benchmark leads to the common practice of training a model on one benchmark (e.g., Waymo) and then assessing it on another dataset (e.g., KITTI). This setting results in two distinct domain gaps: scenarios and sensors, making it difficult to analyze and evaluate methods accurately. To tackle this problem, this paper presents the LiDAR Dataset with Cross Sensors (LiDAR-CS Dataset), which contains large-scale annotated LiDAR point clouds captured under six groups of different sensors but with the same corresponding scenarios, generated from a hybrid realistic LiDAR simulator. To our knowledge, the LiDAR-CS Dataset is the first dataset that addresses sensor-related gaps in the domain of 3D object detection in real traffic. Furthermore, we evaluate and analyze performance using various baseline detectors and demonstrate its potential applications. Project page: https://opendriving.github.io/lidar-cs.
|
|
13:30-15:00, Paper ThBT8-CC.5 | Add to My Program |
ROV6D: 6D Pose Estimation Benchmark Dataset for Underwater Remotely Operated Vehicles |
|
Tang, Jingyi | Tsinghua University |
Chen, Zeyu | Tsinghua University |
Fu, Bowen | Tsinghua University |
Lu, Wenjie | Harbin Institute of Technology (Shenzhen) |
Li, Shengquan | Pengcheng Lab |
Li, Xiu | Tsinghua University |
Ji, Xiangyang | Tsinghua University |
Keywords: Data Sets for Robotic Vision, Deep Learning for Visual Perception
Abstract: Accurate localization between multiple robots is crucial for many underwater applications, such as tracking, convoying, and subsea intervention tasks. 6D pose estimation is a fundamental task that enables precise object localization in 3D space with full six degrees of freedom. However, one critical challenge is the lack of available large-scale datasets due to the prohibitive cost of labelled data collection. To overcome this difficulty, we propose a benchmark dataset, ROV6D, for 6D pose estimation of remotely operated vehicles (ROVs). The training subset consists of a large number of synthetic images with 6D pose ground truth for ROVs. These synthetic images are generated using BlenderProc and further rendered with the underwater neural rendering (UWNR) strategy to enhance their realism. The testing subsets cover different real-world scenarios, including the Pool subset and the Maoming subset, focusing on challenging cases that involve partial occlusion and low visibility. Diverse recent methods are evaluated on the constructed dataset. The results show that methods based on dense coordinates currently perform best, outperforming both keypoint-based and refinement-based methods. Our dataset will be made publicly available soon.
|
|
13:30-15:00, Paper ThBT8-CC.6 | Add to My Program |
The GOOSE Dataset for Perception in Unstructured Environments |
|
Mortimer, Peter | Universität Der Bundeswehr München |
Hagmanns, Raphael | Karlsruhe Institute of Technology |
Granero, Miguel | Fraunhofer IOSB |
Luettel, Thorsten | Universität Der Bundeswehr München |
Petereit, Janko | Fraunhofer IOSB |
Wuensche, Hans Joachim Joe | Universität Der Bundeswehr München |
Keywords: Data Sets for Robotic Vision, Deep Learning for Visual Perception, Sensor Fusion
Abstract: The potential for deploying autonomous systems can be significantly increased by improving the perception and interpretation of the environment. However, the development of deep learning-based techniques for autonomous systems in unstructured outdoor environments poses challenges due to limited data availability for training and testing. To address this gap, we present the German Outdoor and Offroad Dataset (GOOSE), a comprehensive dataset specifically designed for unstructured outdoor environments. The GOOSE dataset incorporates 10,000 labeled pairs of images and point clouds, which are utilized to train a range of state-of-the-art segmentation models on both image and point cloud data. We open-source the dataset, along with an ontology for unstructured terrain, as well as dataset standards and guidelines. This initiative aims to establish a common framework, enabling the seamless inclusion of existing datasets and a fast way to enhance the perception capabilities of various robots operating in unstructured environments. This framework also makes it possible to query data for specific weather conditions or sensor setups from a database in the future. The dataset, pre-trained models for offroad perception, and additional documentation can be found at https://goose-dataset.de/.
|
|
13:30-15:00, Paper ThBT8-CC.7 | Add to My Program |
CoAS-Net: Context-Aware Suction Network with a Large-Scale Domain Randomized Synthetic Dataset |
|
Son, Yeong Gwang | SungKyunKwan University |
Bui, Tat Hieu | Sungkyunkwan University |
Hong, Juyong | Sungkyunkwan Univ |
Kim, Yong Hyeon | SungKyunKwan University |
Moon, Seung Jae | Sungkyunkwan, Mechanical Engineering, Robottory |
Kim, ChunSoo | SKKU |
Rhee, Issac | Sungkyunkwan University |
Kang, Hansol | Sungkyunkwan University |
Choi, Hyouk Ryeol | Sungkyunkwan University |
Keywords: Data Sets for Robotic Vision, Deep Learning in Grasping and Manipulation, Computer Vision for Automation
Abstract: Robotic grasping is one of the essential skills in robotics. From industrial settings to housework, robots are required to handle objects, enabling them to interact with their surroundings. Among the various tasks in robotic grasping, bin-picking is considered one of the most challenging because of the cluttered bin filled with objects. Also, for next-level automation, robots need to handle unseen objects and distinguish target objects from outliers. This paper proposes a novel dataset generation pipeline for suction grasping in bin-picking tasks. This pipeline consists of a series of methods that progressively transition from single-object evaluation to whole-scene evaluation and lower the dimension of the labels to the image space. We trained a suction prediction FCN (Fully Convolutional Network) on a dataset generated from the pipeline and conducted bin-picking experiments. Our large-scale collision-free annotation enables the network to understand the context of a bin-picking task, where collisions between the gripper and the bin or objects are a concern and distinguishing the background is crucial. The results show that our solution surpasses existing methods, and the network demonstrates context-aware grasping of objects with a loosely defined RoI (Region of Interest). Our dataset and the grasp detection model are available at https://github.com/SonYeongGwang/CoAS-Net.git.
|
|
13:30-15:00, Paper ThBT8-CC.8 | Add to My Program |
NYC-Indoor-VPR: A Long-Term Indoor Visual Place Recognition Dataset with Semi-Automatic Annotation |
|
Sheng, Diwei | New York University |
Yang, Anbang | New York University |
Rizzo, John-Ross | NYU School of Medicine / NYU Tandon School of Engineering |
Feng, Chen | New York University |
Keywords: Data Sets for Robotic Vision, Localization, Deep Learning for Visual Perception
Abstract: Visual Place Recognition (VPR) in indoor environments is beneficial to humans and robots for better localization and navigation. It is challenging due to appearance changes at various frequencies and the difficulty of obtaining ground-truth metric trajectories for training and evaluation. This paper introduces the NYC-Indoor-VPR dataset, a unique and rich collection of over 36,000 images compiled from 13 distinct crowded scenes in New York City, taken under varying lighting conditions and with appearance changes. Each scene has multiple revisits across a year. To establish the ground truth for VPR, we propose a semi-automatic annotation approach that computes the positional information of each image. Our method takes pairs of videos as input and yields matched pairs of images along with their estimated relative locations. The accuracy of this matching is refined by human annotators, who use our annotation software to correlate the selected keyframes. Finally, we present a benchmark evaluation of several state-of-the-art VPR algorithms using our annotated dataset, revealing its challenges and thus its value for VPR research.
|
|
13:30-15:00, Paper ThBT8-CC.9 | Add to My Program |
TreeScope: An Agricultural Robotics Dataset for LiDAR-Based Mapping of Trees in Forests and Orchards |
|
Cheng, Derek | University of Pennsylvania |
Cladera, Fernando | University of Pennsylvania |
Prabhu, Ankit | University of Pennsylvania |
Liu, Xu | University of Pennsylvania |
Zhu, Alan | University of Pennsylvania |
Green, Patrick Corey | Virginia Polytechnic Institute and State University |
Ehsani, Reza | UC Merced |
Chaudhari, Pratik | University of Pennsylvania |
Kumar, Vijay | University of Pennsylvania |
Keywords: Data Sets for Robotic Vision, Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: Data collection for forestry, timber, and agriculture currently relies on manual techniques which are labor-intensive and time-consuming. We seek to demonstrate that robotics offers improvements over these techniques and accelerate agricultural research, beginning with semantic segmentation and diameter estimation of trees in forests and orchards. We present TreeScope v1.0, the first robotics dataset for precision agriculture and forestry addressing the counting and mapping of trees in forestry and orchards. TreeScope provides LiDAR data from agricultural environments collected with robotics platforms, such as UAV and mobile robot platforms carried by vehicles and human operators. In the first release of this dataset, we provide ground-truth data with over 1,800 manually annotated semantic labels for tree stems and field-measured tree diameters. We share benchmark scripts for these tasks that researchers may use to evaluate the accuracy of their algorithms. Finally, we run our open-source diameter estimation and off-the-shelf semantic segmentation algorithms and share our baseline results.
|
|
ThBT9-CC Oral Session, CC-419 |
Add to My Program |
Task and Motion Planning I |
|
|
Chair: Song, Dezhen | Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) and Texas A&M University (TAMU) |
Co-Chair: Chandraker, Manmohan | University of California, San Diego |
|
13:30-15:00, Paper ThBT9-CC.1 | Add to My Program |
Long-HOT: A Modular Hierarchical Approach for Long-Horizon Object Transport |
|
N N, Sriram | NEC Labs America |
Jayaraman, Dinesh | University of Pennsylvania |
Chandraker, Manmohan | University of California, San Diego |
Keywords: Task and Motion Planning, Semantic Scene Understanding, Autonomous Agents
Abstract: We aim to address key challenges in long-horizon embodied exploration and navigation by proposing a long-horizon object transport task called Long-HOT and a novel modular framework for temporally extended navigation. Agents in Long-HOT need to efficiently find and pick up target objects that are scattered in the environment, carry them to a goal location with load constraints, and optionally have access to a container. We propose a modular topological graph-based transport policy (HTP) that explores efficiently with the help of weighted frontiers. Our hierarchical approach uses a combination of motion planning algorithms to reach point goals within explored locations and object navigation policies for moving towards semantic targets at unknown locations. Experiments on both our proposed Habitat transport task and the MultiOn benchmarks show that our method outperforms baselines and prior works. Further, we analyze the agent's behavior regarding the usage of the container and demonstrate meaningful generalization to harder transport scenes when training only on simpler versions of the task.
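The weighted-frontier idea, trading estimated information gain against travel cost when choosing where to explore next, reduces to a small scoring rule. The weights and utility values below are illustrative assumptions rather than the HTP policy's actual quantities.

```python
import numpy as np

def pick_frontier(frontiers, agent_pos, info_gain, w_info=2.0, w_dist=1.0):
    """Pick the frontier maximizing weighted utility minus travel cost."""
    dists = np.linalg.norm(frontiers - agent_pos, axis=1)
    scores = w_info * np.asarray(info_gain) - w_dist * dists
    return frontiers[int(np.argmax(scores))]

frontiers = np.array([[5.0, 0.0], [2.0, 2.0], [-4.0, 1.0]])
best = pick_frontier(frontiers, agent_pos=np.zeros(2), info_gain=[0.6, 0.4, 0.9])
print(best)   # a nearby, informative frontier wins over a distant one
```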
|
|
13:30-15:00, Paper ThBT9-CC.2 | Add to My Program |
COAST: Constraints and Streams for Task and Motion Planning |
|
Vu, Brandon | Stanford University |
Migimatsu, Toki | Stanford University |
Bohg, Jeannette | Stanford University |
Keywords: Task and Motion Planning, Integrated Planning and Control
Abstract: Task and Motion Planning (TAMP) algorithms solve long-horizon robotics tasks by integrating task planning with motion planning; the task planner proposes a sequence of actions towards a goal state and the motion planner verifies whether this action sequence is geometrically feasible for the robot. However, state-of-the-art TAMP algorithms do not scale well with the difficulty of the task and require an impractical amount of time to solve relatively small problems. We propose Constraints and Streams for Task and Motion Planning (COAST), a probabilistically-complete, sampling-based TAMP algorithm that combines stream-based motion planning with an efficient, constrained task planning strategy. We validate COAST on three challenging TAMP domains and demonstrate that our method outperforms baselines in terms of cumulative task planning time by an order of magnitude. You can find more supplementary materials on our project website at https://branvu.github.io/coast.github.io.
|
|
13:30-15:00, Paper ThBT9-CC.3 | Add to My Program |
Extending the Cooperative Dual-Task Space in Conformal Geometric Algebra |
|
Löw, Tobias | Idiap Research Institute, EPFL |
Calinon, Sylvain | Idiap Research Institute |
Keywords: Dual Arm Manipulation, Optimization and Optimal Control
Abstract: In this work, we present an extension of the cooperative dual-task space (CDTS) in conformal geometric algebra. The CDTS was first defined using dual quaternion algebra and is a well-established framework for the simplified definition of tasks using two manipulators. By integrating conformal geometric algebra, we aim to further enhance the geometric expressiveness and thus simplify the modeling of various tasks. We show this formulation by first presenting the CDTS and then its extension, which is based around a cooperative point pair. This extension keeps all the benefits of the original dual-quaternion formulation but adds more tools for geometric modeling of dual-arm tasks. We also present how this CGA-CDTS can be seamlessly integrated with an optimal control framework in geometric algebra that was derived in previous work. In the experiments, we demonstrate how to model different objectives and constraints using the CGA-CDTS. Using a setup of two Franka Emika robots, we then show the effectiveness of our approach using model predictive control in real-world experiments.
|
|
13:30-15:00, Paper ThBT9-CC.4 | Add to My Program |
D-LGP: Dynamic Logic-Geometric Program for Reactive Task and Motion Planning |
|
Xue, Teng | Idiap/EPFL |
Razmjoo, Amirreza | Idiap Research Institute |
Calinon, Sylvain | Idiap Research Institute |
Keywords: Task and Motion Planning
Abstract: Many real-world sequential manipulation tasks involve a combination of discrete symbolic search and continuous motion planning, collectively known as combined task and motion planning (TAMP). However, prevailing methods often struggle with the computational burden and intricate combinatorial challenges, limiting their applications for online replanning in the real world. To address this, we propose Dynamic Logic-Geometric Program (D-LGP), a novel approach integrating Dynamic Tree Search and global optimization for efficient hybrid planning. Through empirical evaluation on three benchmarks, we demonstrate the efficacy of our approach, showcasing superior performance in comparison to state-of-the-art techniques. We validate our approach through simulation and demonstrate its reactive capability to cope with online uncertainty and external disturbances in the real world.
|
|
13:30-15:00, Paper ThBT9-CC.5 | Add to My Program |
Indoor Exploration and Simultaneous Trolley Collection through Task-Oriented Environment Partitioning |
|
Gao, Junjie | Harbin Institute of Technology |
Xie, Peijia | Southern University of Science and Technology |
Gao, Xuheng | Southern University of Science and Technology |
Sun, Zhirui | Southern University of Science and Technology |
Wang, Jiankun | Southern University of Science and Technology |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Keywords: Motion and Path Planning, Search and Rescue Robots, Mapping
Abstract: In this paper, we present a simultaneous exploration and object search framework for the application of autonomous trolley collection. For environment representation, a task-oriented environment partitioning algorithm is presented to extract diverse information for each sub-task. First, LiDAR data is classified as potential objects, walls, and obstacles after outlier removal. Segmented point clouds are then transformed into a hybrid map with the following functional components: object proposals to avoid missing trolleys during exploration; room layouts for semantic space segmentation; and polygonal obstacles containing geometry information for efficient motion planning. For exploration and simultaneous trolley collection, we propose an efficient exploration-based object search method. First, a traveling salesman problem with precedence constraints (TSP-PC) is formulated by grouping frontiers and object proposals. The next target is selected by prioritizing object search while avoiding excessive robot backtracking. Then, feasible trajectories with adequate obstacle clearance are generated by topological graph search. We validate the proposed framework through simulations and demonstrate the system with real-world autonomous trolley collection tasks.
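The TSP-PC formulation, ordering targets while respecting which ones must be visited first, can be illustrated with a nearest-feasible-first greedy pass. A real solver would optimize the tour globally; this sketch (with an assumed acyclic precedence map) only shows how precedence constraints prune the candidate set at each step.

```python
import numpy as np

def greedy_tsp_pc(points, precedes, start):
    """Greedy visit order over targets with precedence constraints.

    precedes[j] lists target indices that must be visited before target j.
    """
    n, visited, order = len(points), set(), []
    pos = np.asarray(start, dtype=float)
    while len(visited) < n:
        feasible = [i for i in range(n) if i not in visited
                    and all(p in visited for p in precedes.get(i, []))]
        nxt = min(feasible, key=lambda i: np.linalg.norm(points[i] - pos))
        visited.add(nxt)
        order.append(nxt)
        pos = points[nxt]
    return order

pts = np.array([[1.0, 0.0], [4.0, 0.0], [2.0, 3.0]])
print(greedy_tsp_pc(pts, precedes={1: [0]}, start=[0.0, 0.0]))   # e.g. [0, 1, 2]
```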
|
|
13:30-15:00, Paper ThBT9-CC.6 | Add to My Program |
Effort Level Search in Infinite Completion Trees with Application to Task-And-Motion Planning |
|
Toussaint, Marc | TU Berlin |
Ortiz-Haro, Joaquim | TU Berlin |
Hartmann, Valentin | ETH Zürich |
Karpas, Erez | Technion |
Hoenig, Wolfgang | TU Berlin |
Keywords: Task and Motion Planning, Planning, Scheduling and Coordination, Manipulation Planning
Abstract: Solving a Task-and-Motion Planning (TAMP) problem can be represented as a sequential (meta-) decision process, where early decisions concern the skeleton (sequence of logic actions) and later decisions concern what to compute for such skeletons (e.g., action parameters, bounds, RRT paths, or full optimal manipulation trajectories). We consider the general problem of how to schedule compute effort in such hierarchical solution processes. More specifically, we introduce infinite completion trees as a problem formalization, where before we can expand or evaluate a node, we have to solve a preemptible computational sub-problem of a priori unknown compute effort. Infinite branchings represent an infinite choice of random initializations of computational sub-problems. Decision making in such trees means to decide on where to invest compute or where to widen a branch. We propose a heuristic to balance branching width and compute depth using polynomial level sets. We show completeness of the resulting solver and that a round robin baseline strategy used previously for TAMP becomes a special case. Experiments confirm the robustness and efficiency of the method on problems including stochastic bandits and a suite of TAMP problems, and compare our approach to a round robin baseline. An appendix comparing the framework to bandit methods and proposing a corresponding tree policy version is found on the supplementary webpage.
|
|
13:30-15:00, Paper ThBT9-CC.7 | Add to My Program |
Sense in Motion with Belief Clustering: Efficient Gas Source Localization with Mobile Robots |
|
Jin, Wanting | EPFL |
Martinoli, Alcherio | EPFL |
Keywords: Environment Monitoring and Management, Task and Motion Planning, Probabilistic Inference
Abstract: Given the patchy nature of gas plumes and the slow response of conventional gas sensors, the use of mobile robots for Gas Source Localization (GSL) tasks presents significant challenges. These factors increase the difficulty of obtaining gas measurements, in both qualitative and quantitative terms. Most existing model-based GSL algorithms rely on lengthy stops at each sampling point to ensure accurate gas measurements. However, this approach not only prolongs the time required for a single measurement but also prevents sampling during robot motion, thus exacerbating the scarcity of available gas measurements. In this work, our goal is to push the boundaries in terms of continuity in sampling to enhance system efficiency. Firstly, we decouple and comprehensively evaluate the impact of both plume dynamics and gas sensor properties on GSL performance. Secondly, we demonstrate that adopting a continuous sampling strategy, which has been generally overlooked in prior research, markedly enhances system efficiency by obviating prolonged measurement pauses and leveraging all the data gathered during robot motion. Thirdly, we further expand the capabilities of continuous sampling by introducing a novel informative path-planning strategy that takes into account all the information gathered along the robot's movement. The proposed method is evaluated in both simulation and reality under different scenarios emulating indoor environmental conditions.
|
|
13:30-15:00, Paper ThBT9-CC.8 | Add to My Program |
R-LGP: A Reachability-Guided Logic-Geometric Programming Framework for Optimal Task and Motion Planning on Mobile Manipulators |
|
Ly, Kim Tien | University of Oxford |
Semenov, Valeriy | Oxford University |
Risiglione, Mattia | Italian Institute of Technology |
Merkt, Wolfgang Xaver | University of Oxford |
Havoutis, Ioannis | University of Oxford |
Keywords: Task and Motion Planning
Abstract: This paper presents an optimization-based solution to task and motion planning (TAMP) on mobile manipulators. Logic-geometric programming (LGP) has shown promising capabilities for optimally dealing with hybrid TAMP problems that involve abstract and geometric constraints. However, LGP does not scale well to high-dimensional systems (e.g. mobile manipulators) and can suffer from obstacle avoidance issues due to local minima. In this work, we extend LGP with a sampling-based reachability graph to enable solving optimal TAMP on high-DoF mobile manipulators. The proposed reachability graph can incorporate environmental information (obstacles) to provide the planner with sufficient geometric constraints. This reachability-aware heuristic efficiently prunes infeasible sequences of actions in the continuous domain, hence reducing replanning by securing feasibility at the final full-path trajectory optimization. Our framework proves to be time-efficient in computing optimal and collision-free solutions, while outperforming the current state of the art on metrics of success rate, planning time, path length and number of steps. We validate our framework on the physical Toyota HSR robot and report comparisons on a series of mobile manipulation tasks of increasing difficulty.
|
|
13:30-15:00, Paper ThBT9-CC.9 | Add to My Program |
Solving Sequential Manipulation Puzzles by Finding Easier Subproblems |
|
Levit, Svetlana | TU Berlin |
Ortiz-Haro, Joaquim | TU Berlin |
Toussaint, Marc | TU Berlin |
Keywords: Task and Motion Planning, Manipulation Planning
Abstract: We consider a set of challenging sequential manipulation puzzles, where an agent has to interact with multiple movable objects and navigate narrow passages. Such settings are notoriously difficult for Task-and-Motion Planners, as they require interdependent regrasps and solving hard motion planning problems. In this paper, we propose to search over sequences of easier pick-and-place subproblems, which can lead to the solution of the manipulation puzzle. Our method combines a heuristic-driven forward subproblem search with an optimization-based Task-and-Motion Planning solver. To guide the search, we introduce heuristics to generate and prioritize subgoals. We evaluate our approach on various manually designed and automatically generated scenes, demonstrating the benefits of auxiliary subproblems in sequential manipulation planning.
|
|
ThBT10-CC Oral Session, CC-501 |
Add to My Program |
Modeling, Control, and Learning for Soft Robots II |
|
|
Chair: Hughes, Josie | EPFL |
Co-Chair: Sadati, S.M.Hadi | King's College London |
|
13:30-15:00, Paper ThBT10-CC.1 | Add to My Program |
Control and Implementation of a Fluidic Elastomer Actuator for Active Suppression of Hand Tremor |
|
Wang, Yixin | Tsinghua University |
Liu, Xin-Jun | Tsinghua University |
Zhao, Huichan | Tsinghua University |
Keywords: Modeling, Control, and Learning for Soft Robots, Neural and Fuzzy Control, Prosthetics and Exoskeletons
Abstract: Active exoskeletons for tremor suppression show potential for the treatment of pathological tremor thanks to their non-invasive nature. However, in previous work the active force was used only to follow the voluntary movement. As a potential alternative, fluidic elastomer actuators (FEAs) possess the compliance and flexibility that are important for wearable devices. In this study, we introduce a control implementation that applies a FEA to the active suppression of hand tremor, allowing a wearable FEA to actively exert force on the finger against the tremor while following the voluntary motion. The proposed pressure control algorithm pushes the closed-loop pressure control to a 19 Hz cutoff frequency. A combined GRU-MLP (Gated Recurrent Unit-Multilayer Perceptron) neural network is proposed to identify and control a fiber-reinforced FEA following the voluntary movement of the hand. The active tremor suppression effectiveness of the proposed method was tested on a bench-top tremor simulator, where it suppressed hand tremor from an original amplitude of more than 5° to less than 1°. The proposed method paves a new way for tremor suppression exoskeletons.
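For orientation, a minimal PyTorch sketch of a GRU-MLP model of the kind described above is shown below: a recurrent encoder over a window of sensor samples followed by an MLP head. The layer sizes and the 4-dimensional input are illustrative assumptions, not the paper's reported architecture.

```python
import torch
import torch.nn as nn

class GRUMLP(nn.Module):
    """GRU encoder over a time window, MLP regression head."""
    def __init__(self, in_dim=4, hidden=64, out_dim=1):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden, 32), nn.ReLU(), nn.Linear(32, out_dim))

    def forward(self, x):            # x: (batch, time, in_dim)
        h, _ = self.gru(x)           # (batch, time, hidden)
        return self.mlp(h[:, -1])    # predict from the last time step

model = GRUMLP()
window = torch.randn(8, 50, 4)       # 8 windows of 50 sensor samples
print(model(window).shape)           # torch.Size([8, 1])
```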
|
|
13:30-15:00, Paper ThBT10-CC.2 | Add to My Program |
Adaptive State Estimation with Constant-Curvature Dynamics Using Force-Torque Sensors with Application to a Soft Pneumatic Actuator |
|
Mehl, Phillip Maximilian | Leibniz University Hannover |
Bartholdt, Max | Institute of Mechatronic Systems, Leibniz Universität Hannover |
Ehlers, Simon F. G. | Leibniz University Hannover |
Seel, Thomas | Leibniz Universität Hannover |
Schappler, Moritz | Institute of Mechatronic Systems, Leibniz Universitaet Hannover |
Keywords: Modeling, Control, and Learning for Soft Robots, Probability and Statistical Methods, Dynamics
Abstract: Using compliant materials leads to continuum robots undergoing large deformations. Their nonlinear behavior motivates the use of model-based controllers, which require state estimation as an essential step before deployment. Available sensors are usually realized by introducing rigid bodies to the soft robot or inserting soft sensors made of materials different from the robot itself. Both approaches result in changes in the system’s dynamics. Optical measurements are problematic, especially in confined spaces. This can be avoided when the sensor is located at the robot’s base. This paper studies the state estimation of a pneumatically actuated soft robot using the measured forces and torques at its base. For the first time, this is done using an unscented Kalman filter without restraining the dynamics to a planar or quasi-static motion while applying it to a real system. Real-time capability is achieved with our implementation. The state estimation is tested in a Cosserat rod simulation and on the physical system. The position is estimated with an accuracy of three to five millimeters for a 130-millimeter-long pneumatic robot.
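For readers unfamiliar with the filtering step, here is a minimal unscented Kalman filter driven by base torque readings, using the filterpy library. The 1D curvature state and the linear measurement model are illustrative assumptions, far simpler than the paper's full 3D dynamics.

```python
import numpy as np
from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

dt = 0.01

def fx(x, dt):
    """Toy process model: x = [kappa, kappa_dot], constant curvature rate."""
    return np.array([x[0] + dt * x[1], x[1]])

def hx(x):
    """Hypothetical measurement model: base bending moment assumed
    proportional to curvature (stiffness k_b is an invented constant)."""
    k_b = 0.8
    return np.array([k_b * x[0]])

points = MerweScaledSigmaPoints(n=2, alpha=1e-3, beta=2.0, kappa=0.0)
ukf = UnscentedKalmanFilter(dim_x=2, dim_z=1, dt=dt, hx=hx, fx=fx, points=points)
ukf.x = np.zeros(2)
ukf.P *= 0.1
ukf.R *= 1e-4          # force/torque sensor noise (assumed)
ukf.Q *= 1e-5          # process noise (assumed)

for torque_meas in [0.01, 0.02, 0.03]:   # stand-in F/T readings
    ukf.predict()
    ukf.update(np.array([torque_meas]))
print(ukf.x)   # estimated [kappa, kappa_dot]
```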
|
|
13:30-15:00, Paper ThBT10-CC.3 | Add to My Program |
SofToss: Learning to Throw Objects with a Soft Robot |
|
Bianchi, Diego | Scuola Superiore Sant'Anna, Pisa, Italy |
Antonelli, Michele Gabrio Ernesto | University of L’Aquila |
Laschi, Cecilia | National University of Singapore |
Sabatini, Angelo Maria | Scuola Superiore Sant'Anna |
Falotico, Egidio | Scuola Superiore Sant'Anna |
Keywords: Modeling, Control, and Learning for Soft Robots, Reinforcement Learning, Machine Learning for Robot Control
Abstract: In this paper, we present, for the first time, a soft robot control system (SofToss) capable of throwing life-size objects toward target positions. SofToss is an open-loop controller based on deep reinforcement learning that generates, given the target position, an actuation pattern for the tossing task. To deal with the high non-linearity of the dynamics of soft robots, we deployed a neural network to learn the relationship between the actuation pattern and the target landing position, i.e., the direct model of the task. Then, a reinforcement learning method is used to predict the actuation pattern given the goal position. The proposed controller was tested on a modular soft robotic arm, I-Support, by tossing four objects of different shapes and weights into 140 mm square target boxes. We registered a success rate of almost 65% of the throws in two actuation modalities (i.e., partial, keeping one module of the soft arm passive, and complete, with both modules active). This performance rises to 85% if one can choose the number of modules to actuate for each throwing direction. Furthermore, the results show that the proposed learning-based real-time controller achieves performance comparable to an optimization-based non-real-time controller. Our study contributes to the foundations for bringing soft robots into everyday life and industry, by performing more complex, dynamic tasks.
|
|
13:30-15:00, Paper ThBT10-CC.4 | Add to My Program |
A Soft Robot Inverse Kinematics for Virtual Reality |
|
Bern, James | Williams College |
May, William | Williams College |
Osborn, Austin | Williams College |
Stella, Francesco | EPFL |
Zargarzadeh, Sadra | University of Alberta |
Hughes, Josie | EPFL |
Keywords: Modeling, Control, and Learning for Soft Robots, Simulation and Animation, Virtual Reality and Interfaces
Abstract: We show how a variety of techniques from Computer Graphics can be leveraged to intuitively control the shape (configuration) of arbitrary 3D Soft Robots in VR. Our pipeline, Virtual Reality Soft Robot Inverse Kinematics (VR-Soft IK), overcomes fundamental limitations of general-purpose drag-and-drop soft robot control interfaces by leaving the 2D computer screen for 3D Virtual Reality (VR). VR-Soft IK uses a simulation based on the Finite Element Method (FEM) and a control method based on sensitivity analysis. Additionally, we show that our general control pipeline can be fused with techniques from 3D character animation to skin our simulation with a high-resolution surface mesh, pointing a way toward Mixed Reality Soft Robots. This full Skinned VR-Soft IK pipeline uses skeletal animation and GPU picking. We demonstrate the utility of our pipeline by doing real-time, open-loop control of the real-world 3D soft robotic arm Helix.
|
|
13:30-15:00, Paper ThBT10-CC.5 | Add to My Program |
Toward the Use of Proxies for Efficient Learning Manipulation and Locomotion Strategies on Soft Robots |
|
Ménager, Etienne | Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL |
Peyron, Quentin | Inria and CRIStAL UMR CNRS 9189, University of Lille |
Duriez, Christian | INRIA |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Applications
Abstract: Soft robots are naturally designed to perform safe interactions with their environment, like locomotion and manipulation. The literature now offers many concepts, often bio-inspired, that propose new modes of locomotion or grasping. However, a methodology for implementing motion planning for these tasks, as exists for rigid robots, is still lacking. One of the difficulties comes from the modeling of these robots, which is very different, as it is based on the mechanics of deformable bodies. These models, whose dimension is often very large, make learning and optimization methods very costly. In this paper, we propose a proxy approach, as used in humanoid robotics. This proxy is a simplified model of the robot that enables frugal learning of a motion strategy. This strategy is then transferred to the complete model to obtain the corresponding actuation inputs. Our methodology is illustrated and analyzed on two classical designs of soft robots performing manipulation and locomotion tasks.
|
|
13:30-15:00, Paper ThBT10-CC.6 | Add to My Program |
Vision-Based Autonomous Steering of a Miniature Eversion Growing Robot |
|
Wu, Zicong | King's College London |
Sadati, S.M.Hadi | King's College London |
Rhode, Kawal | King's College London |
Bergeles, Christos | King's College London |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Applications
Abstract: This paper presents vision-based autonomous navigation of a steerable soft growing robot. Our experimental platform is the previously presented MAMMOBOT, which is a small-diameter eversion growing robot with an embedded steerable catheter. The current manuscript first models the robot using kinematics (constant curvature) and mechanics (virtual work). Modelling considers the potential misalignment between the everting sheath and the embedded catheter. Second, a switching control architecture is proposed, wherein a model-based controller is employed for rapid convergence to a target position, followed by a closed-loop proportional controller that minimises the system’s steady-state error. Feedback is visually provided from a calibrated stereo vision system. Target-positioning and trajectory-tracking experiments are conducted to evaluate the performance of the control architecture. Experimental results demonstrate the superiority of the mechanics-based modelling and control approach, showing an average accuracy of 0.67 mm (0.66% arclength) in target positioning experiments, and an accuracy of 0.72 mm (1.11% arclength) and 0.72 mm (1.01% arclength) for tracking a square trajectory and a circular trajectory, respectively. The autonomous steering framework is showcased within a 3D-printed mammary duct phantom. This work sets the stage for endoscope-based autonomous navigation of MAMMOBOT and similar soft growing steerable robots.
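The constant-curvature kinematics mentioned above follows the standard arc parameterization (e.g. Webster & Jones, 2010). The sketch below computes a single segment's tip position; it is a generic textbook model that omits the sheath/catheter misalignment the paper explicitly handles.

```python
import numpy as np

def cc_tip_position(kappa, phi, length):
    """Tip position of one constant-curvature segment: curvature kappa,
    bending-plane angle phi, arc length `length`. The arc lies in the
    plane rotated by phi about the base z-axis."""
    if abs(kappa) < 1e-9:                     # straight-segment limit
        return np.array([0.0, 0.0, length])
    r = 1.0 / kappa                           # radius of the bending arc
    x = r * (1 - np.cos(kappa * length))      # in-plane deflection
    return np.array([x * np.cos(phi), x * np.sin(phi),
                     r * np.sin(kappa * length)])

# Toy numbers: a 65 mm segment bent at curvature 10 1/m.
print(cc_tip_position(kappa=10.0, phi=0.0, length=0.065))
```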
|
|
13:30-15:00, Paper ThBT10-CC.7 | Add to My Program |
POE: Acoustic Soft Robotic Proprioception for Omnidirectional End-Effectors |
|
Yoo, Uksang | Carnegie Mellon University |
Lopez, Ziven | Northeastern University |
Ichnowski, Jeffrey | Carnegie Mellon University |
Oh, Jean | Carnegie Mellon University |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Materials and Design, Grippers and Other End-Effectors
Abstract: Shape estimation is crucial for precise control of soft robots. However, soft robot shape estimation and proprioception are challenging due to their complex deformation behaviors and infinite degrees of freedom. Their continuously deforming bodies complicate integrating rigid sensors and reliably estimating their shape. In this work, we present Proprioceptive Omnidirectional End-effector (POE), a tendon-driven soft robot with six embedded microphones. We first introduce novel applications of 3D reconstruction methods to acoustic signals from the microphones for soft robot shape proprioception. To improve the proprioception pipeline's training efficiency and model prediction consistency, we present POE-M. POE-M predicts key point positions from acoustic signal observations and uses an energy-minimization method to reconstruct a physically admissible high-resolution mesh of POE. We evaluate mesh reconstruction on simulated data and the POE-M pipeline with real-world experiments. Ablation studies suggest POE-M's guidance of the key points during the mesh reconstruction process provides robustness and stability to the pipeline. POE-M reduced the maximum Chamfer distance error by 23.1% compared to the state-of-the-art end-to-end soft robot proprioception models and achieved 4.91 mm average Chamfer distance error during evaluation. Supplemental materials, experiment data, and visualizations are available at sites.google.com/view/acoustic-poe.
|
|
13:30-15:00, Paper ThBT10-CC.8 | Add to My Program |
A Data-Driven Approach to Geometric Modeling of Systems with Low-Bandwidth Actuator Dynamics |
|
Deng, Siming | Johns Hopkins University |
Liu, Junning | Department of Mechanics and Engineering Sciences, College of Eng |
Datta, Bibekananda | Johns Hopkins University |
Pantula, Aishwarya | Johns Hopkins University |
Gracias, David H. | The Johns Hopkins University, Baltimore |
Nguyen, Thao | Johns Hopkins University |
Bittner, Brian | JHUAPL |
Cowan, Noah J. | Johns Hopkins University |
Keywords: Modeling, Control, and Learning for Soft Robots, Underactuated Robots, Biologically-Inspired Robots
Abstract: It is challenging to perform system identification on soft robots due to their underactuated, high-dimensional dynamics. In this work, we present a data-driven modeling framework, based on geometric mechanics (also known as gauge theory) that can be applied to systems with low-bandwidth control of the system's internal configuration. This method constructs a series of connected models comprising actuator and locomotor dynamics based on data points from stochastically perturbed, repeated behaviors. By deriving these connected models from general formulations of dissipative Lagrangian systems with symmetry, we offer a method that can be applied broadly to robots with first-order, low-pass actuator dynamics, including swelling-driven actuators used in hydrogel crawlers. These models accurately capture the dynamics of the system shape and body movements of a simplified swimming robot model. We further apply our approach to a stimulus-responsive hydrogel simulator that captures the complexity of chemo-mechanical interactions that drive shape changes in biomedically relevant micromachines. Finally, we propose an approach of numerically optimizing control signals by iteratively refining models, which is applied to optimize the input waveform for the hydrogel crawler. This transfer to realistic environments provides promise for applications in locomotor design and biomedical engineering.
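The first-order, low-pass actuator dynamics targeted by the framework can be written as x_dot = (u - x)/tau. The sketch below integrates this model with forward Euler; the time constant and test input are chosen purely for illustration.

```python
import numpy as np

def first_order_lowpass(u, x0, tau, dt):
    """Discretized first-order (low-pass) actuator dynamics
    x_dot = (u - x) / tau, the class of dynamics the framework targets
    (e.g. slow swelling of hydrogel actuators)."""
    x, out = x0, []
    for u_t in u:
        x = x + dt * (u_t - x) / tau
        out.append(x)
    return np.array(out)

# A square-wave command gets smoothed and lagged by the actuator.
t = np.arange(0, 10, 0.01)
u = np.sign(np.sin(2 * np.pi * 0.2 * t))
x = first_order_lowpass(u, x0=0.0, tau=1.5, dt=0.01)
print(x[:5])
```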
|
|
13:30-15:00, Paper ThBT10-CC.9 | Add to My Program |
Modeling and Control of Intrinsically Elasticity Coupled Soft-Rigid Robots |
|
Patterson, Zachary | MIT |
Della Santina, Cosimo | TU Delft |
Rus, Daniela | MIT |
Keywords: Modeling, Control, and Learning for Soft Robots
Abstract: While much work has been done recently in the realm of model-based control of soft robots and soft-rigid hybrids, most works examine robots that have an inherently serial structure. While these systems have been prevalent in the literature, there is an increasing trend toward designing soft-rigid hybrids with intrinsically coupled elasticity between various degrees of freedom. In this work, we seek to address the issues of modeling and controlling such structures, particularly when underactuated. We introduce several simple models for elastic coupling, typical of those seen in these systems. We then propose a controller that compensates for the elasticity, and we prove its stability with Lyapunov methods without relying on the elastic dominance assumption. This controller is applicable to the general class of underactuated soft robots. After evaluating the controller in simulated cases, we then develop a simple hardware platform to evaluate both the models and the controller. Finally, using the hardware, we demonstrate a novel use case for underactuated, elastically coupled systems in "sensorless" force control.
|
|
ThBT11-CC Oral Session, CC-502 |
Add to My Program |
Learning from Demonstration II |
|
|
Chair: Schmitz, Alexander | Waseda University |
Co-Chair: Lin, Hsiu-Chin | McGill University |
|
13:30-15:00, Paper ThBT11-CC.1 | Add to My Program |
A Combination of a Controllable Clutch and an Oscillating Slider Crank Mechanism for Ease of Direct-Teaching with Various Payloads |
|
Arifin, Muhammad | Waseda University |
Kage, Yuta | Waseda University |
Yang, Yuchen | Waseda University |
Schmitz, Alexander | Waseda University |
Sugano, Shigeki | Waseda University |
Keywords: Learning from Demonstration, Physical Human-Robot Interaction, Compliant Joints and Mechanisms
Abstract: Direct teaching is a straightforward way of teaching new motions to robots. Active methods with torque sensors, for example, can be used so that the robot follows the movements of the human, but such methods introduce delays. Alternatively, series clutch actuators are easily backdrivable without delay. However, vertical joints are subject to gravity torques, which need to be compensated when disengaging the clutch. We implemented passive gravity compensation to counteract the robot’s weight, but this mechanism cannot compensate for varying payloads, as adjustable passive gravity compensation is relatively slow and mechanically complex. A varying payload causes unintended joint movement, i.e., the arm falls on its own, which is unacceptable during direct teaching. Therefore, this paper demonstrates how the controlled torque output of series clutch actuators can be used to compensate for varying payloads while maintaining high backdrivability. The proposed method is evaluated on a collaborative robot with a clutch in series with each actuator. Real-world experiments with payloads from 0 to 3 kg are conducted. During the experiments, the operator force is measured to evaluate the proposed method.
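A minimal sketch of the compensation idea, under the assumption that the passive mechanism cancels the link's own gravity torque and the clutch only transmits the payload term; all masses and lengths below are invented for illustration.

```python
import numpy as np

def holding_torque(theta, m_link, l_com, m_payload, l_link, g=9.81):
    """Gravity torque a vertical joint must resist at angle `theta`
    (0 = horizontal): a link term plus a payload term at the link tip."""
    tau_link = m_link * g * l_com * np.cos(theta)
    tau_payload = m_payload * g * l_link * np.cos(theta)
    return tau_link, tau_payload

tau_link, tau_payload = holding_torque(theta=0.3, m_link=4.0, l_com=0.2,
                                       m_payload=3.0, l_link=0.45)
# Command the series clutch to transmit only the uncompensated payload
# part, keeping the joint backdrivable for direct teaching.
clutch_torque_cmd = tau_payload
print(clutch_torque_cmd)
```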
|
|
13:30-15:00, Paper ThBT11-CC.2 | Add to My Program |
WayEx: Waypoint Exploration Using a Single Demonstration |
|
Levy, Mara | University of Maryland, College Park |
Saini, Nirat | University of Maryland, College Park |
Shrivastava, Abhinav | University of Maryland, College Park |
Keywords: Learning from Demonstration, Imitation Learning, Reinforcement Learning
Abstract: We propose WayEx, a new method for learning complex goal-conditioned robotics tasks from a single demonstration. Our approach distinguishes itself from existing imitation learning methods by demanding fewer expert examples and eliminating the need for information about the actions taken during the demonstration. This is accomplished by introducing a new reward function and employing a knowledge expansion technique. We demonstrate the effectiveness of WayEx, our waypoint exploration strategy, across six diverse tasks, showcasing its applicability in various environments. Notably, our method reduces training time by 50% compared to traditional reinforcement learning methods. WayEx obtains a higher reward than existing imitation learning methods given only a single demonstration. Furthermore, we demonstrate its success in tackling complex environments where standard approaches fall short.
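As a hedged illustration of the core idea, a single action-free demonstration can be treated as an ordered list of waypoints that defines a reward. The sketch below is an illustrative reduction, not WayEx's exact reward or its knowledge expansion step.

```python
import numpy as np

def waypoint_reward(state, waypoints, idx, eps=0.05):
    """+1 for reaching the next demonstration waypoint (then advance
    the index), 0 otherwise. Needs only demo states, never demo actions."""
    if np.linalg.norm(state - waypoints[idx]) < eps:
        return 1.0, min(idx + 1, len(waypoints) - 1)
    return 0.0, idx

demo = np.array([[0.0, 0.0], [0.5, 0.1], [1.0, 0.2]])  # states only
idx, total = 0, 0.0
for s in [np.array([0.01, 0.0]), np.array([0.5, 0.1]), np.array([1.0, 0.2])]:
    r, idx = waypoint_reward(s, demo, idx)
    total += r
print(total)  # 3.0: all waypoints visited in order
```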
|
|
13:30-15:00, Paper ThBT11-CC.3 | Add to My Program |
Generating Robotic Elliptical Excisions with Human-Like Tool-Tissue Interactions |
|
Straizys, Arturas | University of Edinburgh |
Burke, Michael | Monash University |
Ramamoorthy, Subramanian | The University of Edinburgh |
Keywords: Learning from Demonstration, Imitation Learning, Force and Tactile Sensing
Abstract: In surgery, the application of appropriate force levels is critical for the success and safety of a given procedure. While many studies are focused on measuring in situ forces, little attention has been devoted to relating these observed forces to surgical techniques. Answering questions like “Can certain changes to a surgical technique result in lower forces and increased safety margins?” could lead to improved surgical practice, and importantly, patient outcomes. However, such studies would require a large number of trials and professional surgeons, which is generally impractical to arrange. Instead, we show how robots can learn several variations of a surgical technique from a smaller number of surgical demonstrations and interpolate learnt behaviour via a parameterised skill model. This enables a large number of trials to be performed by a robotic system and the analysis of surgical techniques and their downstream effects on tissue. Here, we introduce a parameterised model of the elliptical excision skill and apply a Bayesian optimisation scheme to optimise the excision behaviour with respect to expert ratings, as well as individual characteristics of excision forces. Results show that the proposed framework can successfully align the generated robot behaviour with subjects across varying levels of proficiency in terms of excision forces.
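As a hedged illustration of optimising a parameterised skill against expert ratings, the sketch below runs a basic Bayesian optimisation loop (Gaussian process surrogate plus expected improvement) over a single made-up excision parameter; the toy objective stands in for real trial ratings and is not the paper's skill model.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def rating(theta):
    """Stand-in for an expert rating of one excision parameter
    (e.g. a normalized blade angle); the real objective comes from trials."""
    return -(theta - 0.6) ** 2

X = np.array([[0.1], [0.5], [0.9]])          # parameters tried so far
y = np.array([rating(x[0]) for x in X])

for _ in range(10):
    gp = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-4).fit(X, y)
    cand = np.linspace(0, 1, 200).reshape(-1, 1)
    mu, sd = gp.predict(cand, return_std=True)
    best = y.max()
    z = (mu - best) / np.maximum(sd, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement
    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, rating(x_next[0]))

print(X[np.argmax(y)])   # parameter with the best observed rating
```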
|
|
13:30-15:00, Paper ThBT11-CC.4 | Add to My Program |
Fitting Parameters of Linear Dynamical Systems to Regularize Forcing Terms in Dynamical Movement Primitives |
|
Stulp, Freek | DLR - Deutsches Zentrum Für Luft Und Raumfahrt E.V |
Colomé, Adrià | Institut De Robòtica I Informàtica Industrial (CSIC-UPC), Q28180 |
Torras, Carme | CSIC - UPC |
Keywords: Learning from Demonstration
Abstract: Due to their flexibility and ease of use, Dynamical Movement Primitives (DMPs) are widely used in robotics applications and research. DMPs combine linear dynamical systems to achieve robustness to perturbations and adaptation to moving targets with non-linear function approximators to fit a wide range of demonstrated trajectories. We propose a novel DMP formulation with a generalized logistic function as a delayed goal system. This formulation inherently has low initial jerk, and generates the bell-shaped velocity profiles that are typical of human movement. As the novel formulation is more expressive, it is able to fit a wide range of human demonstrations well, even without a non-linear forcing term. We exploit this increased expressiveness by automating the fitting of the dynamical system parameters through optimization. Our experimental evaluation demonstrates that this optimization regularizes the forcing term, and improves the interpolation accuracy of parametric DMPs.
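A minimal sketch of a logistic delayed goal system driving a standard DMP transformation system (forcing term omitted). The logistic rate and delay below are illustrative; the paper fits such parameters to demonstrations by optimization.

```python
import numpy as np

def logistic_goal(t, g0, g, t0=0.5, rate=20.0):
    """Delayed goal: moves smoothly from start g0 to goal g around t0."""
    return g0 + (g - g0) / (1.0 + np.exp(-rate * (t - t0)))

# Spring-damper transformation system driven by the moving goal.
dt, tau = 0.002, 1.0
alpha_z, beta_z = 25.0, 25.0 / 4.0      # critically damped gains
y, yd, g0, g = 0.0, 0.0, 0.0, 1.0
traj = []
for t in np.arange(0, 1.5, dt):
    gt = logistic_goal(t, g0, g)
    ydd = (alpha_z * (beta_z * (gt - y) - tau * yd)) / tau**2
    yd += ydd * dt
    y += yd * dt
    traj.append(y)
print(traj[-1])   # ~1.0: converges to the goal with a bell-shaped velocity
```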
|
|
13:30-15:00, Paper ThBT11-CC.5 | Add to My Program |
AirExo: Low-Cost Exoskeletons for Learning Whole-Arm Manipulation in the Wild |
|
Fang, Hongjie | Shanghai Jiao Tong University |
Fang, Hao-Shu | Shanghai Jiao Tong University |
Wang, Yiming | Shanghai Jiao Tong University |
Ren, Jieji | Shanghai Jiao Tong University |
Chen, Jingjing | Shanghai Jiao Tong University |
Zhang, Ruo | University College London |
Wang, Weiming | Shanghai Jiao Tong University |
Lu, Cewu | ShangHai Jiao Tong University |
Keywords: Bimanual Manipulation, Learning from Demonstration, Imitation Learning
Abstract: While humans can use parts of their arms other than the hands for manipulations like gathering and supporting, whether robots can effectively learn and perform the same type of operations remains relatively unexplored. As these manipulations require joint-level control to regulate the complete poses of the robots, we develop AirExo, a low-cost, adaptable, and portable dual-arm exoskeleton, for teleoperation and demonstration collection. As collecting teleoperated data is expensive and time-consuming, we further leverage AirExo to collect cheap in-the-wild demonstrations at scale. Under our in-the-wild learning framework, we show that with only 3 minutes of the teleoperated demonstrations, augmented by diverse and extensive in-the-wild data collected by AirExo, robots can learn a policy that is comparable to or even better than one learned from teleoperated demonstrations lasting over 20 minutes. Experiments demonstrate that our approach enables the model to learn a more general and robust policy across the various stages of the task, enhancing the success rates in task completion even with the presence of disturbances. Project website: https://airexo.github.io/
|
|
13:30-15:00, Paper ThBT11-CC.6 | Add to My Program |
Bayesian Constraint Inference from User Demonstrations Based on Margin-Respecting Preference Models |
|
Papadimitriou, Dimitris | UC Berkeley |
Brown, Daniel | University of Utah |
Keywords: Learning from Demonstration, Reinforcement Learning, Probabilistic Inference
Abstract: It is crucial for robots to be aware of the presence of constraints in order to acquire safe policies. However, explicitly specifying all constraints in an environment can be a challenging task. State-of-the-art constraint inference algorithms learn constraints from demonstrations, but tend to be computationally expensive and prone to instability issues. In this paper, we propose a novel Bayesian method that infers constraints based on preferences over demonstrations. The main advantages of our proposed approach are that it 1) infers constraints without calculating a new policy at each iteration, 2) uses a simple and more realistic ranking of groups of demonstrations, without requiring pairwise comparisons over all demonstrations, and 3) adapts to cases where there are varying levels of constraint violation. Our empirical results demonstrate that our proposed Bayesian approach infers constraints of varying severity, more accurately than state-of-the-art constraint inference methods.
|
|
13:30-15:00, Paper ThBT11-CC.7 | Add to My Program |
Instructing Robots by Sketching: Learning from Demonstration Via Probabilistic Diagrammatic Teaching |
|
Zhi, Weiming | Carnegie Mellon University |
Zhang, Tianyi | Carnegie Mellon University |
Johnson-Roberson, Matthew | Carnegie Mellon University |
Keywords: Learning from Demonstration, Probabilistic Inference
Abstract: Learning from Demonstration (LfD) enables robots to acquire new skills by imitating expert demonstrations, allowing users to communicate their instructions in an intuitive manner. Recent progress in LfD often relies on kinesthetic teaching or teleoperation as the medium for users to specify the demonstrations. Kinesthetic teaching requires physical handling of the robot, while teleoperation demands proficiency with additional hardware. This paper introduces an alternative paradigm for LfD called Diagrammatic Teaching. Diagrammatic Teaching aims to teach robots novel skills by prompting the user to sketch out demonstration trajectories on 2D images of the scene, which are then synthesised as a generative model of motion trajectories in 3D task space. Additionally, we present the Ray-tracing Probabilistic Trajectory Learning (RPTL) framework for Diagrammatic Teaching. RPTL extracts time-varying probability densities from the 2D sketches, applies ray-tracing to find corresponding regions in 3D Cartesian space and fits a probabilistic model of motion trajectories to these regions. New motion trajectories, which mimic those sketched by the user, can then be generated from the probabilistic model. We empirically validate our framework both in simulation and on real robots, which include a fixed-base manipulator and a quadruped-mounted manipulator.
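The ray-tracing step above starts from standard pinhole back-projection of a sketched pixel into a 3D viewing ray. The sketch below shows this step only; the intrinsics and camera pose are illustrative values, and density fitting is omitted.

```python
import numpy as np

def pixel_to_ray(u, v, K, R, cam_origin):
    """Back-project a sketched 2D pixel (u, v) into a 3D viewing ray.
    K: camera intrinsic matrix; R: camera-to-world rotation."""
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    d_world = R @ d_cam
    return cam_origin, d_world / np.linalg.norm(d_world)

K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
origin, direction = pixel_to_ray(400, 260, K, np.eye(3), np.zeros(3))
# Points along the ray: candidate 3D regions for fitting the
# trajectory density in task space.
samples = origin + np.outer(np.linspace(0.2, 2.0, 5), direction)
print(samples)
```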
|
|
13:30-15:00, Paper ThBT11-CC.8 | Add to My Program |
Learning Distributional Demonstration Spaces for Task-Specific Cross-Pose Estimation |
|
Wang, Jenny | Carnegie Mellon University |
Donca, Octavian | Carnegie Mellon University |
Held, David | Carnegie Mellon University |
Keywords: Learning from Demonstration, Deep Learning Methods, Representation Learning
Abstract: Relative placement tasks are an important category of tasks in which one object needs to be placed in a desired pose relative to another object. Previous work has shown success in learning relative placement tasks from just a small number of demonstrations, when using relational reasoning networks with geometric inductive biases. However, such methods fail to consider that demonstrations for the same task can be fundamentally multimodal, like a mug hanging on any of n racks. We propose a method that retains the provably translation-invariant and relational properties of prior work but incorporates additional properties that enable learning multimodal, distributional examples. We show that our method is able to learn precise relative placement tasks with a small number of multimodal demonstrations with no human annotations across a diverse set of objects within a category. Supplementary information can be found on the website: https://sites.google.com/view/tax-posed/home.
|
|
13:30-15:00, Paper ThBT11-CC.9 | Add to My Program |
Globally Stable Neural Imitation Policies |
|
Abyaneh, Amin | McGill University |
Sosa Guzmán, Mariana | Universidad Veracruzana |
Lin, Hsiu-Chin | McGill University |
Keywords: Learning from Demonstration, Imitation Learning, Manipulation Planning
Abstract: Imitation learning mitigates the resource-intensive nature of learning policies from scratch by mimicking expert behavior. While existing methods can accurately replicate expert demonstrations, they often exhibit unpredictability in unexplored regions of the state space, thereby raising major safety concerns when facing perturbations. We propose SNDS, an imitation learning approach aimed at efficient training of scalable neural policies while formally ensuring global stability. SNDS leverages a neural architecture that enables the joint training of the policy and its associated Lyapunov candidate to ensure global stability throughout the learning process. We validate our approach through extensive simulations and deploy the trained policies on a real-world manipulator arm. The results confirm SNDS's ability to address instability, accuracy, and computational intensity challenges highlighted in the literature, positioning it as a promising solution for scalable and stable policy learning in complex environments.
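A common stability-by-construction trick that matches the joint policy/Lyapunov idea above is to project a nominal learned vector field so that a Lyapunov candidate strictly decreases (in the style of Kolter & Manek, 2019). The sketch below is one plausible reading of such a construction, not necessarily SNDS's exact architecture.

```python
import numpy as np

def stable_dynamics(x, f_nominal, alpha=1.0):
    """Project f_nominal(x) onto the half-space where the quadratic
    Lyapunov candidate V(x) = 0.5 * ||x||^2 decreases at rate alpha."""
    f = f_nominal(x)
    grad_v = x                                    # gradient of V
    violation = grad_v @ f + alpha * 0.5 * (x @ x)  # want this <= 0
    if violation > 0:
        f = f - violation * grad_v / (grad_v @ grad_v + 1e-9)
    return f

f_nominal = lambda x: np.array([x[1], x[0]])      # an unstable "network"
x = np.array([1.0, 0.5])
for _ in range(2000):
    x = x + 0.01 * stable_dynamics(x, f_nominal)  # forward Euler rollout
print(np.linalg.norm(x))   # decays toward the equilibrium at the origin
```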
|
|
ThBT12-CC Oral Session, CC-503 |
Add to My Program |
Deep Learning II |
|
|
Chair: Huang, Rui | The Chinese University of Hong Kong, Shenzhen |
Co-Chair: Natale, Lorenzo | Istituto Italiano Di Tecnologia |
|
13:30-15:00, Paper ThBT12-CC.1 | Add to My Program |
Pedestrian Trajectory Prediction Using Dynamics-Based Deep Learning |
|
Wang, Honghui | University of New South Wales |
Zhi, Weiming | Carnegie Mellon University |
Batista, Gustavo | UNSW |
Chandra, Rohitash | UNSW Sydney |
Keywords: Deep Learning Methods, Dynamics, Collision Avoidance
Abstract: Pedestrian trajectory prediction plays an important role in autonomous driving systems and robotics. Recent work utilizing prominent deep learning models for pedestrian motion prediction makes limited a priori assumptions about human movements, resulting in a lack of explainability and of explicit constraints enforced on predicted trajectories. We present a dynamics-based deep learning framework with a novel asymptotically stable dynamical system integrated into a Transformer-based model. We use an asymptotically stable dynamical system to model human goal-targeted motion by enforcing that the predicted walking trajectory converges to a predicted goal position, and to provide the Transformer model with prior knowledge and explainability. Our framework features a Transformer model that works with a goal estimator and the dynamical system to learn features from pedestrian motion history. The results show that our framework outperforms prominent models on five benchmark human motion datasets.
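A minimal sketch of the constraint such a stable system imposes: every rollout of x_dot = -k (x - goal) converges to the goal regardless of the start. The gain below is illustrative; in the paper the goal comes from the learned goal estimator.

```python
import numpy as np

def goal_dynamics(x, goal, k=1.5):
    """Asymptotically stable prior: velocity points at the goal and
    vanishes there, so predicted trajectories cannot diverge."""
    return -k * (x - goal)

goal = np.array([5.0, 2.0])     # e.g. output of the goal estimator
x, dt, path = np.array([0.0, 0.0]), 0.1, []
for _ in range(60):
    x = x + dt * goal_dynamics(x, goal)
    path.append(x.copy())
print(path[-1])   # ~[5, 2]: the rollout approaches the predicted goal
```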
|
|
13:30-15:00, Paper ThBT12-CC.2 | Add to My Program |
POAQL: A Partially Observable Altruistic Q-Learning Method for Cooperative Multi-Agent Reinforcement Learning |
|
Tao, Lesong | Xi'an Jiaotong University |
Kang, Miao | Xi’an Jiaotong University |
Dong, Jinpeng | Xi'an Jiaotong University |
Zhang, Songyi | Xi'an Jiaotong University |
Ye, Ke | Xi'an Jiaotong University |
Chen, Shitao | Xi'an Jiaotong University |
Zheng, Nanning | Xi'an Jiaotong University |
Keywords: Reinforcement Learning, Path Planning for Multiple Mobile Robots or Agents, Cooperating Robots
Abstract: Multi-Agent Path Finding (MAPF) is an important issue in multi-agent cooperation. Many studies apply Multi-Agent Reinforcement Learning (MARL) to solve MAPF in partially observable settings. The objective of cooperative MARL is to maximize the cumulative team reward. Nevertheless, in partially observable settings, the team reward is misleading due to unpredictable factors from the behavior and state of unobserved agents. To address this issue, we propose a Partially Observable Altruistic Q-learning (POAQL) method. POAQL considers the cumulative reward of the observed subteam instead of the whole team, where Altruistic Q-learning plays an important role in learning the subteam action value. In addition, we design a new conflict-resolution mechanism that requires no additional guidance, emphasizing the cooperative nature of MARL frameworks. Experimental results show that POAQL outperforms existing reinforcement learning methods in terms of efficiency and performance.
|
|
13:30-15:00, Paper ThBT12-CC.3 | Add to My Program |
Statler: State-Maintaining Language Models for Embodied Reasoning |
|
Yoneda, Takuma | Toyota Technological Institute at Chicago |
Fang, Jiading | Toyota Technological Institute at Chicago |
Li, Peng | Fudan University |
Zhang, Huanyu | The University of Chicago |
Jiang, Tianchong | University of Chicago |
Lin, Shengjie | TTI-Chicago |
Picker, Benjamin | University of Chicago |
Yunis, David | Toyota Technological Institute at Chicago |
Mei, Hongyuan | Toyota Technological Institute at Chicago |
Walter, Matthew | Toyota Technological Institute at Chicago |
Keywords: Deep Learning Methods, Manipulation Planning, Task and Motion Planning
Abstract: There has been a significant research interest in employing large language models to empower intelligent robots with complex reasoning. Existing work focuses on harnessing their abilities to reason about the histories of their actions and observations. In this paper, we explore a new dimension in which large language models may benefit robotics planning. In particular, we propose Statler, a framework in which large language models are prompted to maintain an estimate of the world state, which is often unobservable, and track its transition as new actions are taken. Our framework then conditions each action on the estimate of the current world state. Despite being conceptually simple, our Statler framework significantly outperforms strong competing methods (e.g., Code-as-Policies) on several robot planning tasks. Additionally, it has the potential advantage of scaling up to more challenging long-horizon planning tasks.
|
|
13:30-15:00, Paper ThBT12-CC.4 | Add to My Program |
Multi-Granular Transformer for Motion Prediction with LiDAR |
|
Gan, Yiqian | Tusimple, Inc |
Xiao, Hao | TuSimple Inc |
Zhao, Yizhe | TuSimple |
Zhang, Ethan | University of Michigan |
Huang, Zhe | TuSimple Inc |
Ye, Xin | Arizona State University |
Ge, Lingting | TuSimple Inc |
Keywords: Deep Learning Methods, AI-Based Methods, Autonomous Agents
Abstract: Motion prediction has been an essential component of autonomous driving systems since it handles highly uncertain and complex scenarios involving moving agents of different types. In this paper, we propose a Multi-Granular TRansformer (MGTR) framework, an encoder-decoder network that exploits context features in different granularities for different kinds of traffic agents. To further enhance MGTR’s capabilities, we leverage LiDAR point cloud data by incorporating LiDAR semantic features from an off-the-shelf LiDAR feature extractor. We evaluate MGTR on Waymo Open Dataset motion prediction benchmark and show that the proposed method achieved state-of-the-art performance, ranking 1st on its leaderboard (https://waymo.com/open/challenges/2023/motion-prediction/) .
|
|
13:30-15:00, Paper ThBT12-CC.5 | Add to My Program |
What Matters for Active Texture Recognition with Vision-Based Tactile Sensors |
|
Böhm, Alina | TU Darmstadt |
Schneider, Tim | Technical University Darmstadt |
Belousov, Boris | German Research Center for Artificial Intelligence - DFKI |
Kshirsagar, Alap | Technische Universität Darmstadt |
Lin, Lisa | Justus-Liebig-Universität Gießen |
Doerschner, Katja | Justus Liebig University Giessen |
Drewing, Knut | Giessen University |
Rothkopf, Constantin | Frankfurt Institute for Advanced Studies |
Peters, Jan | Technische Universität Darmstadt |
Keywords: Probabilistic Inference, Deep Learning Methods, Force and Tactile Sensing
Abstract: This paper explores active sensing strategies that employ vision-based tactile sensors for robotic perception and classification of fabric textures. We formalize the active sampling problem in the context of tactile fabric recognition and provide an implementation of information-theoretic exploration strategies based on minimizing predictive entropy and variance of probabilistic models. Through ablation studies and human experiments, we investigate which components are crucial for quick and reliable texture recognition. Along with the active sampling strategies, we evaluate neural network architectures, representations of uncertainty, influence of data augmentation, and dataset variability. By evaluating our method on a previously published Active Clothing Perception Dataset and on the real system, we establish that the choice of the exploration strategy has only a minor influence on the recognition accuracy, whereas data augmentation and dropout rate play a significantly larger role. In a comparison study, while humans achieve 66.9% recognition accuracy, our best approach reaches 90.0% in under 5 touches, highlighting that vision-based tactile sensors are highly effective for fabric texture recognition.
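For reference, the predictive entropy used by such information-theoretic strategies is the entropy of the mean class distribution over stochastic forward passes (e.g. MC dropout). The sketch below computes it for two hypothetical candidate touches; Dirichlet samples stand in for real model outputs.

```python
import numpy as np

def predictive_entropy(probs):
    """Entropy of the mean class distribution; probs has shape
    (passes, classes). High entropy marks fabrics worth touching again."""
    p_mean = probs.mean(axis=0)
    return -np.sum(p_mean * np.log(p_mean + 1e-12))

# Toy: 10 dropout passes over 4 fabric classes for two candidate touches.
rng = np.random.default_rng(0)
confident = rng.dirichlet([20, 1, 1, 1], size=10)
uncertain = rng.dirichlet([2, 2, 2, 2], size=10)
print(predictive_entropy(confident), predictive_entropy(uncertain))
# An entropy-minimizing strategy would sample where entropy is larger.
```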
|
|
13:30-15:00, Paper ThBT12-CC.6 | Add to My Program |
Learning with Chemical versus Electrical Synapses - Does It Make a Difference? |
|
Farsang, Monika | TU Wien |
Lechner, Mathias | Massachusetts Institute of Technology |
Lung, David | TU Wien |
Hasani, Ramin | Massachusetts Institute of Technology (MIT) |
Rus, Daniela | MIT |
Grosu, Radu | TU Wien |
Keywords: Bioinspired Robot Learning, Deep Learning Methods
Abstract: Bio-inspired neural networks have the potential to advance our understanding of neural computation and improve the state-of-the-art of AI systems. Bio-electrical synapses directly transmit neural signals by enabling fast current flow between neurons. In contrast, bio-chemical synapses transmit neural signals indirectly, through neurotransmitters. Prior work showed that interpretable dynamics for complex robotic control can be achieved by using chemical synapses within a sparse, bio-inspired architecture called Neural Circuit Policies (NCPs). However, a comparison of these two synaptic models, within the same architecture, remains an unexplored area. In this work we aim to determine the impact of using chemical synapses compared to electrical synapses, in both sparse and all-to-all connected networks. We conduct experiments with autonomous lane-keeping through a photorealistic autonomous driving simulator to evaluate their performance under diverse conditions and in the presence of noise. The experiments highlight the substantial influence of the architectural and synaptic-model choices, respectively. Our results show that employing chemical synapses yields noticeable improvements compared to electrical synapses, and that NCPs lead to better results in both synaptic models.
|
|
13:30-15:00, Paper ThBT12-CC.7 | Add to My Program |
Distill-Then-Prune: An Efficient Compression Framework for Real-Time Stereo Matching Network on Edge Devices |
|
Pan, Baiyu | University of Macau |
Jiao, Jichao | Beijing University of Posts and Telecommunications |
Pang, Jianxin | UBtech Robotics Corp |
Cheng, Jun | Shenzhen Institutes of Advanced Technology |
Keywords: Deep Learning Methods, Transfer Learning, Computer Vision for Transportation
Abstract: In recent years, numerous real-time stereo matching methods have been introduced, but they often lack accuracy. These methods attempt to improve accuracy by introducing new modules or integrating traditional methods. However, the improvements are only modest. In this paper, we propose a novel strategy by incorporating knowledge distillation and model pruning to overcome the inherent trade-off between speed and accuracy. As a result, we obtained a model that maintains real-time performance while delivering high accuracy on edge devices. Our proposed method involves three key steps. Firstly, we review state-of-the-art methods and design our lightweight model by removing redundant modules from those efficient models through a comparison of their contributions. Next, we leverage the efficient model as the teacher to distill knowledge into the lightweight model. Finally, we systematically prune the lightweight model to obtain the final model. Through extensive experiments conducted on two widely-used benchmarks, Sceneflow and KITTI, we perform ablation studies to analyze the effectiveness of each module and present our state-of-the-art results.
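For orientation, the distillation step in such a pipeline typically uses the standard response-based objective (Hinton et al.): a KL term between temperature-softened student and teacher distributions plus a hard cross-entropy term. The PyTorch sketch below is generic; the temperature, weighting, and bin count are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T=4.0, alpha=0.7):
    """KL(student_T || teacher_T) * T^2 plus cross-entropy on the
    ground-truth classes (e.g. discretized disparity bins)."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s = torch.randn(8, 48)            # lightweight student logits
t = torch.randn(8, 48)            # frozen efficient-teacher logits
y = torch.randint(0, 48, (8,))
print(distillation_loss(s, t, y))
```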
|
|
13:30-15:00, Paper ThBT12-CC.8 | Add to My Program |
Incremental 3D Reconstruction through a Hybrid Explicit-And-Implicit Representation |
|
Li, Feifei | The Chinese University of Hong Kong Shenzhen |
Hu, Panwen | The Chinese University of Hong Kong, Shenzhen |
Song, Qi | The Chinese University of Hong Kong, Shenzhen |
Huang, Rui | The Chinese University of Hong Kong, Shenzhen |
Keywords: Incremental Learning, Visual Learning, Deep Learning Methods
Abstract: 3D reconstruction is an important task in computer vision and is widely used in robotics and autonomous driving. When building large-scale scenes, limitations in computing resources and the difficulty of accessing the entire dataset in a single task are inevitable. Therefore, an incremental reconstruction approach is desired. On the one hand, traditional explicit 3D reconstruction methods such as SLAM and SFM require global optimization, which means that time and space resources increase dramatically with the growth of training data. On the other hand, implicit methods like Neural Radiance Fields (NeRF) suffer from catastrophic forgetting if trained incrementally. In this paper, we incrementally reconstruct 3D models in a hybrid representation, where the density of the radiance field is formulated by a voxel grid, and the view-dependent color information of the points is inferred by a shallow MLP. The expansion of the voxel grid and the distillation of the shallow MLP are efficient in this case. Experimental results demonstrate that our incremental method achieves a level of accuracy on par with approaches employing global optimization techniques.
|
|
13:30-15:00, Paper ThBT12-CC.9 | Add to My Program |
Sim2Real Bilevel Adaptation for Object Surface Classification Using Vision-Based Tactile Sensors |
|
Caddeo, Gabriele Mario | Istituto Italiano Di Tecnologia |
Maracani, Andrea | Istituto Italiano Di Tecnologia and University of Genoa |
Alfano, Paolo Didier | Istituto Italiano Di Tecnologia |
Piga, Nicola Agostino | Istituto Italiano Di Tecnologia |
Rosasco, Lorenzo | Istituto Italiano Di Tecnologia & Massachusetts Institute of Technology |
Natale, Lorenzo | Istituto Italiano Di Tecnologia |
Keywords: Deep Learning Methods, Force and Tactile Sensing
Abstract: In this paper, we address the Sim2Real gap in the field of vision-based tactile sensors for classifying object surfaces. We train a Diffusion Model to bridge this gap using a relatively small dataset of real-world images randomly collected from unlabeled everyday objects via the DIGIT sensor. Subsequently, we employ a simulator to generate images by uniformly sampling the surface of objects from the YCB Model Set. These simulated images are then translated into the real domain using the Diffusion Model and automatically labeled to train a classifier. During this training, we further align features of the two domains using an adversarial procedure. Our evaluation is conducted on a dataset of tactile images obtained from a set of ten 3D printed YCB objects. The results reveal a total accuracy of 81.9%, a significant improvement compared to the 34.7% achieved by the classifier trained solely on simulated images. This demonstrates the effectiveness of our approach. We further validate our approach using the classifier on a 6D object pose estimation task from tactile data.
|
|
ThBT13-AX Oral Session, AX-201 |
Add to My Program |
Human-Robot Collaboration V |
|
|
Chair: Osa, Takayuki | University of Tokyo |
Co-Chair: Wang, Ziwei | Lancaster University |
|
13:30-15:00, Paper ThBT13-AX.1 | Add to My Program |
Towards Human-Robot Collaborative Surgery: Trajectory and Strategy Learning in Bimanual Peg Transfer |
|
Hu, Zhaoyang Jacopo | Imperial College London |
Wang, Ziwei | Lancaster University |
Huang, Yanpei | Imperial College London |
Sena, Aran | Foster + Partners |
Rodriguez y Baena, Ferdinando | Imperial College, London, UK |
Burdet, Etienne | Imperial College London |
Keywords: Human-Robot Collaboration, Medical Robots and Systems, Autonomous Agents
Abstract: While the traditional control of surgical robots relies on fully manual teleoperation, human-robot collaborative systems promise to address issues such as workspace constraints and laborious tasks. In particular, shared control between human and robot can reduce the surgeon's workload and improve overall surgical performance by supporting the surgeon's effort during movements while keeping them in charge of complex control phases. In this letter, we propose a task segmentation of the bimanual peg transfer procedure that alternates manual and autonomous control correspondingly. The authority allocation in this shared control framework considers both the limitations of learning-based methods and the higher dexterity of humans during physical interaction. The human motion and strategies are transferred from an expert human to a da Vinci Research Kit (dVRK) using an epsilon-greedy maximum entropy inverse reinforcement learning algorithm. The learned model enables training an intelligent agent that can skillfully collaborate with the human operator during the surgical task. The proposed shared control framework is verified first on a virtual platform and then on a real dVRK to assess its usability and robustness. The results show that, compared to traditional manual teleoperation, our method can achieve faster and more consistent peg transfers. An analysis of the participants' effort also reveals a significantly lower perception of the workload.
|
|
13:30-15:00, Paper ThBT13-AX.2 | Add to My Program |
SIREN: Underwater Robot-To-Human Communication Using Audio |
|
Fulton, Michael | University of Minnesota |
Sattar, Junaed | University of Minnesota |
Absar, Rafa | Metro State University |
Keywords: Human-Robot Collaboration, Marine Robotics, Field Robots
Abstract: In this paper we present SIREN: a novel audio-based communication system for underwater human-robot interaction. SIREN utilizes a surface transducer to produce sound by vibrating the frame of an underwater robot, essentially turning the robot's outer surface into the vibrating membrane of a speaker. We employ this hardware in two forms of robot-to-human communication: synthesized text-to-speech (TTS-Sonemes) and synthesized musical indicators (Tonal-Sonemes). To profile the system's capabilities with respect to underwater communication, we perform a substantial in-person human study with 12 participants. In this study, participants were trained on the use of one of the previously mentioned audio communication systems. Participants were then asked to identify the communication from their system in a pool at various distances. This study's results demonstrate that sound is a viable method of underwater communication. TTS-Sonemes outperform Tonal-Sonemes at close distances but fail at greater distances, while Tonal-Sonemes remain recognizable as the distance to the robot increases.
|
|
13:30-15:00, Paper ThBT13-AX.3 | Add to My Program |
Shared Autonomy Via Variable Impedance Control and Virtual Potential Fields for Encoding Human Demonstrations |
|
Jadav, Shail | IIT Gandhinagar |
Heidersberger, Johannes | Technische Universität Wien |
Ott, Christian | TU Wien |
Lee, Dongheui | Technische Universität Wien (TU Wien) |
Keywords: Human-Robot Collaboration, Physical Human-Robot Interaction, Compliance and Impedance Control
Abstract: This article introduces a framework for complex human-robot collaboration tasks, such as the co-manufacturing of furniture. For these tasks, it is essential to encode tasks from human demonstration and reproduce these skills in a compliant and safe manner. Therefore, two key components are addressed in this work: motion generation and shared autonomy. We propose a motion generator based on a time-invariant potential field, capable of encoding wrench profiles and complex, closed-loop trajectories, and additionally incorporating obstacle avoidance. The paper also addresses shared autonomy (SA), which enables synergetic collaboration between human operators and robots by dynamically allocating authority. Variable impedance control (VIC) and force control are employed, where impedance and wrench are adapted based on the human-robot autonomy factor derived from interaction forces. System passivity is ensured by an energy-tank-based task passivation strategy. The framework's efficacy is validated through simulations and an experimental study employing a Franka Emika Research 3 robot.
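A generic potential-field sketch for context: an attractive quadratic well at the goal plus classic repulsive terms inside an influence radius around each obstacle. All gains below are illustrative, and the paper's generator additionally encodes wrench profiles and closed-loop paths, which this sketch does not.

```python
import numpy as np

def potential_force(x, goal, obstacles, k_att=1.0, k_rep=0.5, rho0=0.4):
    """Command from a virtual potential field: gradient of an attractive
    well at `goal` plus repulsion within radius rho0 of each obstacle."""
    f = -k_att * (x - goal)
    for o in obstacles:
        d = np.linalg.norm(x - o)
        if 1e-6 < d < rho0:
            f += k_rep * (1.0 / d - 1.0 / rho0) / d**2 * (x - o) / d
    return f

x, goal = np.array([0.0, 0.0]), np.array([1.0, 1.0])
obstacles = [np.array([0.5, 0.55])]
for _ in range(800):                      # simple kinematic rollout
    x = x + 0.01 * potential_force(x, goal, obstacles)
print(x)   # close to the goal, having skirted the obstacle
```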
|
|
13:30-15:00, Paper ThBT13-AX.4 | Add to My Program |
Robustifying a Policy in Multi-Agent RL with Diverse Cooperative Behaviors and Adversarial Style Sampling for Assistive Tasks |
|
Osa, Takayuki | University of Tokyo |
Harada, Tatsuya | The University of Tokyo |
Keywords: Human-Robot Collaboration, Multi-Robot Systems, Reinforcement Learning
Abstract: Autonomous assistance of people with motor impairments is one of the most promising applications of autonomous robotic systems. Recent studies have reported encouraging results using deep reinforcement learning (RL) in the healthcare domain. Previous studies showed that assistive tasks can be formulated as multi-agent RL, wherein there are two agents: a caregiver and a care-receiver. However, policies trained in multi-agent RL are often sensitive to the policies of other agents. In such a case, a trained caregiver’s policy may not work for different care-receivers. To alleviate this issue, we propose a framework that learns a robust caregiver’s policy by training it for diverse care-receiver responses. In our framework, diverse care-receiver responses are autonomously learned through trial and error. In addition, to robustify the caregiver’s policy, we propose a strategy for sampling a care-receiver’s response in an adversarial manner during training. We evaluated the proposed method using tasks in Assistive Gym. We demonstrate that policies trained with a popular deep RL method are vulnerable to changes in the policies of other agents and that the proposed framework improves robustness against such changes.
|
|
13:30-15:00, Paper ThBT13-AX.5 | Add to My Program |
Human Robot Shared Control in Surgery: A Performance Assessment |
|
Chen, Longrui | Imperial College London |
Hu, Zhaoyang Jacopo | Imperial College London |
Huang, Yanpei | Imperial College London |
Burdet, Etienne | Imperial College London |
Rodriguez y Baena, Ferdinando | Imperial College, London, UK |
Keywords: Human-Robot Collaboration, Medical Robots and Systems, Physical Human-Robot Interaction
Abstract: While surgical robots, such as the da Vinci Surgical System, have become prevalent in minimally invasive surgeries, they are predominantly used by the human operator to directly teleoperate the tools. This paper aims to analyse the different methods of human robot shared control in the surgical domain. We propose a reinforcement learning algorithm, transverse generative adversarial imitation learning (tGAIL), which is employed to train the robot from the expert's demonstration and shows competitive generalization ability compared to inverse reinforcement learning and conventional GAIL. We then propose a priority-changing shared control method to effectively combine the surgeon's and the robot's strengths by dynamically adjusting control priority based on the deviation distance. We show that using this method in a supervision framework boosts the performance of the human operator when completing the peg transfer task. By learning from the expert and collaborating with the human during the task, the intelligent agent helps to reduce surgery time by 31.7% and the human input by 60.5% compared to direct teleoperation.
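One plausible reading of "priority based on deviation distance" is a smooth blend between human and autonomous commands via a sigmoid authority factor. The blend form and constants below are illustrative assumptions, not the paper's exact arbitration law.

```python
import numpy as np

def blended_command(u_human, u_robot, deviation, d0=0.02, s=200.0):
    """Priority-changing arbitration: authority shifts smoothly to the
    autonomous agent as the deviation distance grows.
    alpha = 0 -> pure human command, alpha = 1 -> pure robot command."""
    alpha = 1.0 / (1.0 + np.exp(-s * (deviation - d0)))
    return (1 - alpha) * u_human + alpha * u_robot

u_h = np.array([0.01, 0.0, 0.0])       # surgeon teleoperation velocity
u_r = np.array([0.0, 0.01, 0.0])       # learned-policy velocity
print(blended_command(u_h, u_r, deviation=0.005))  # mostly human
print(blended_command(u_h, u_r, deviation=0.050))  # mostly robot
```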
|
|
13:30-15:00, Paper ThBT13-AX.6 | Add to My Program |
Distilling and Retrieving Generalizable Knowledge for Robot Manipulation Via Language Corrections |
|
Zha, Lihan | Stanford University |
Cui, Yuchen | Stanford University |
Lin, Li-Heng | Stanford University |
Kwon, Minae | Stanford University |
Gonzalez Arenas, Montserrat | Google Inc |
Zeng, Andy | Google DeepMind |
Xia, Fei | Google Inc |
Sadigh, Dorsa | Stanford University |
Keywords: Human-Robot Collaboration, Long term Interaction, Incremental Learning
Abstract: Today's robot policies exhibit subpar performance when faced with the challenge of generalizing to novel environments. Adapting to and learning from online human corrections is therefore essential but non-trivial: not only do robots need to remember human feedback over time to retrieve the right information in new settings and reduce the intervention rate, but they also need to respond to feedback ranging from high-level corrections of human preferences to low-level adjustments of skill parameters. In this work, we present Distillation and Retrieval of Online Corrections (DROC), an LLM-based system that can respond to arbitrary forms of language feedback, distill generalizable knowledge from corrections, and retrieve relevant past experiences based on textual and visual similarity to improve performance in novel settings. DROC is able to respond to a sequence of online language corrections that address failures in both high-level task plans and low-level skill primitives. We demonstrate that DROC effectively distills the relevant information from the sequence of online corrections into a knowledge base and retrieves that knowledge in settings with new task or object instances. DROC outperforms the Code as Policies baseline, using only half of the total number of corrections needed in the first round and requiring little to no corrections after 2 iterations.
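The retrieval half of such a system is typically nearest-neighbor search over embeddings of stored corrections. A minimal sketch, assuming a hypothetical upstream embed() produces the vectors; the entries and query are invented for illustration.

```python
import numpy as np

def retrieve(query_vec, knowledge, k=2):
    """Return the k most relevant distilled corrections by cosine
    similarity between a query embedding and stored embeddings."""
    keys = np.stack([e["vec"] for e in knowledge])
    sims = keys @ query_vec / (
        np.linalg.norm(keys, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [knowledge[i]["text"] for i in np.argsort(sims)[::-1][:k]]

knowledge = [
    {"text": "grasp mugs by the rim, not the handle", "vec": np.array([0.9, 0.1, 0.0])},
    {"text": "slow down near the shelf edge", "vec": np.array([0.1, 0.9, 0.2])},
    {"text": "prefer the left bin for soft objects", "vec": np.array([0.0, 0.2, 0.9])},
]
query = np.array([0.85, 0.2, 0.05])   # embedding of the new task context
print(retrieve(query, knowledge))
```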
|
|
13:30-15:00, Paper ThBT13-AX.7 | Add to My Program |
An Intuitive Manual Guidance Scheme to Operate Rotation and Translation Simultaneously |
|
Shao, Fan | University of Naples Federico II |
Ficuciello, Fanny | Universitŕ Di Napoli Federico II |
Keywords: Human-Robot Collaboration, Physical Human-Robot Interaction
Abstract: During certain human-robot collaboration tasks, the operator interacts with the robot by hand guidance to adjust the end-effector pose for spatial operations. Rotation is less intuitive to humans than translation, since imagining the path to a target orientation is more challenging. In the literature on control strategies for robot manual guidance, translation and rotation are usually controlled independently. Our research explores and quantifies the factors that influence operational intuition. A Virtual Fixture spatial guidance framework with intuition maintenance is proposed. This novel guidance scheme enables operators to effortlessly and simultaneously control both orientation and position in an intuitive way. High operation precision and efficiency can be achieved without interfering with the main task by exploring the null space with constraint optimization.
|
|
13:30-15:00, Paper ThBT13-AX.8 | Add to My Program |
An Ergo-Interactive Framework for Human-Robot Collaboration Via Learning from Demonstration |
|
Liao, Zhiwei | Xi'an Jiaotong University |
Lorenzini, Marta | Istituto Italiano Di Tecnologia |
Leonori, Mattia | Istituto Italiano Di Tecnologia |
Zhao, Fei | Xi'an Jiaotong University |
Jiang, Gedong | State Key Laboratory for Manufacturing Systems Engineering Xi'an |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Keywords: Human-Robot Collaboration, Learning from Demonstration, Human Factors and Human-in-the-Loop
Abstract: This work presents an ergonomic and interactive human-robot collaboration (HRC) framework, through which new collaborative skills are extracted from a one-shot human demonstration and learned through Riemannian dynamic movement primitives (DMP). The proposed framework responds to human-robot interaction forces to adapt to the task requirements, while generating virtual “ergonomic forces” that guide the human toward more ergonomic postures, based on online monitoring of a kinematics-based index. The resulting motion is then integrated into the learned task trajectories. The framework is implemented on a mobile manipulator with a weighted whole-body Cartesian velocity controller, which meets the needs of large-scale HRC. To evaluate the proposed framework, a multi-subject experiment involving a human-robot co-carrying task is conducted. The performance of the ergo-interactive control in terms of task performance and ergonomics adaptation is verified under different experimental conditions. This is followed by a comparative statistical analysis. The experimental results show that the learned trajectory can be reproduced and generalized to several targets and adjusted online according to human preferences and ergonomics.
|
|
13:30-15:00, Paper ThBT13-AX.9 | Add to My Program |
A User-Centered Shared Control Scheme with Learning from Demonstration for Robotic Surgery |
|
Zheng, Haoyi | Imperial College London |
Hu, Zhaoyang Jacopo | Imperial College London |
Huang, Yanpei | Imperial College London |
Cheng, Xiaoxiao | Imperial College of Science, Technology and Medicine, London UK |
Wang, Ziwei | Lancaster University |
Burdet, Etienne | Imperial College London |
Keywords: Human-Robot Collaboration, Learning from Demonstration, Surgical Robotics: Laparoscopy
Abstract: The utilization of shared control in the realm of surgical robotics augments precision and safety by amalgamating human expertise with autonomous assistance. This paper proposes a user-centered shared control framework enabling a robot to learn from expert demonstration, predict the operator's intent, and modulate control authority to provide natural assistance when needed. We employ deep inverse reinforcement learning (IRL) to enable the robot to learn path planning from expert demonstrations with fast convergence, subsequently enhancing the policy with a potential field method. The control authority is allocated seamlessly between the human operator and the autonomous agent based on the prediction of the operator's movement from an adaptive filter and fuzzy logic inference. The proposed method is executed using the da Vinci Research Kit (dVRK) robot in a simulation environment, and its effectiveness is assessed through user performance evaluation in a trajectory tracking task. Compared to direct control and simple shared control, the proposed shared control scheme exhibits superior tracking accuracy and trajectory smoothness under external disturbances. Subjective responses underscore users' perception of the method's efficacy in enhancing their performance.
|
|
ThBT15-AX Oral Session, AX-203 |
Add to My Program |
Human-Aware Motion Planning |
|
|
Co-Chair: Bera, Aniket | Purdue University |
|
13:30-15:00, Paper ThBT15-AX.1 | Add to My Program |
Virtual Borders in 3D: Defining a Drone’s Movement Space Using Augmented Reality |
|
Riechmann, Malte | Bielefeld University of Applied Sciences |
Kirsch, André | Bielefeld University of Applied Sciences and Arts |
König, Matthias | Bielefeld University of Applied Sciences |
Rexilius, Jan | Bielefeld University of Applied Sciences and Arts |
Keywords: Human-Aware Motion Planning, Virtual Reality and Interfaces, Aerial Systems: Perception and Autonomy
Abstract: Robots are increasingly finding their way into home environments, where they can assist with household tasks like vacuuming or surveillance. While the robots can navigate on their own, users might not want them to go everywhere, or not in a specific way. For example, users might not want a drone to fly over a table where important letters and the newspaper are stored, even though it is the shortest path to the goal. Therefore, an application is required that is easy to learn and apply, even for inexperienced users. In this paper, we present a framework that uses a tablet as an augmented reality (AR) device to modify a robot's movement space in 3D. A user can define virtual borders in the real world with the tablet and add them to a map, changing the navigational behavior of the robot. The framework is evaluated in a user study with inexperienced participants that verifies our approach. Further analyses show that even complex scenarios can be covered with our framework.
|
|
13:30-15:00, Paper ThBT15-AX.2 | Add to My Program |
Trajectory Prediction for Robot Navigation Using Flow-Guided Markov Neural Operator |
|
Bhaskara, Rashmi | Purdue University |
Viswanath, Hrishikesh | Purdue University |
Bera, Aniket | Purdue University |
Keywords: Human-Aware Motion Planning, Deep Learning Methods
Abstract: Predicting pedestrian movements remains a complex and persistent challenge in robot navigation research. Accurate predictions must account for several factors, such as pedestrian interactions, the environment, crowd density, and social and cultural norms. Accurate prediction of pedestrian paths is vital for ensuring safe human-robot interaction, especially in robot navigation. Furthermore, this research has potential applications in autonomous vehicles, pedestrian tracking, and human-robot collaboration. In this paper, we therefore introduce FlowMNO, an Optical Flow-Integrated Markov Neural Operator designed to capture pedestrian behavior across diverse scenarios. We model trajectory prediction as a Markovian process, where future pedestrian coordinates depend solely on the current state. This problem formulation eliminates the need to store previous states. We conducted experiments using standard benchmark datasets like ETH, HOTEL, ZARA1, ZARA2, UCY, and RGB-D pedestrian datasets. Our study demonstrates that FlowMNO outperforms state-of-the-art deep learning methods, such as LSTM-, GAN-, and CNN-based approaches, by approximately 86.46% when predicting pedestrian trajectories. Thus, we show that FlowMNO can seamlessly integrate into robot navigation systems, enhancing their ability to navigate crowded areas smoothly.
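The Markovian formulation implies that a rollout only ever conditions on the current state. A toy stand-in (not the paper's neural-operator architecture; the residual MLP and the feature sizes are assumptions) might look like this:

```python
import torch
import torch.nn as nn

class MarkovStep(nn.Module):
    """Toy stand-in for the Markov formulation: the next position depends
    only on the current position and an optical-flow feature, so no
    trajectory history is stored."""
    def __init__(self, flow_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 + flow_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, pos, flow):
        return pos + self.net(torch.cat([pos, flow], dim=-1))  # residual step

def rollout(model, pos, flow, steps=8):
    traj = [pos]
    for _ in range(steps):
        pos = model(pos, flow)  # Markov property: current state only
        traj.append(pos)
    return torch.stack(traj)    # (steps + 1, batch, 2)
```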
|
|
13:30-15:00, Paper ThBT15-AX.3 | Add to My Program |
Stranger Danger! Identifying and Avoiding Unpredictable Pedestrians in RL-Based Social Robot Navigation |
|
Pohland, Sara | University of California, Berkeley |
Tan, Alvin | University of California, Berkeley |
Dutta, Prabal | University of California, Berkeley |
Tomlin, Claire | UC Berkeley |
Keywords: Human-Aware Motion Planning, Reinforcement Learning
Abstract: Reinforcement learning (RL) methods for social robot navigation have shown great success in navigating robots through large crowds of people, but the performance of these learning-based methods tends to degrade in particularly challenging or unfamiliar situations due to the models' dependency on representative training data. To ensure human safety and comfort, it is critical that these algorithms handle uncommon cases appropriately, but the low frequency and high diversity of such situations present a significant challenge for these data-driven methods. To overcome this issue, we propose modifications to the learning process that encourage these RL policies to maintain additional caution in unfamiliar situations. Specifically, we improve the Socially Attentive Reinforcement Learning (SARL) policy by (1) modifying the training process to systematically introduce deviations into a pedestrian model, (2) updating the value network to estimate and utilize pedestrian-unpredictability features, and (3) implementing a reward function to learn an effective response to pedestrian unpredictability. Compared to the original SARL policy, our modified policy maintains similar navigation times and path lengths, while reducing the number of collisions by 82% and reducing the proportion of time spent in the pedestrians' personal space by up to 19 percentage points for the most difficult cases. We also describe how to apply these modifications to other RL policies and demonstrate that key high-level behaviors of our approach transfer to a physical robot.
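Point (3) can be illustrated with a hedged sketch of the reward shaping: penalize proximity to pedestrians in proportion to their estimated unpredictability. The margin, weight, and hinge form are illustrative assumptions, not the paper's exact reward.

```python
def shaped_reward(base_reward, distances, unpredictabilities, margin=0.45, w=0.1):
    """Penalize getting within `margin` meters of a pedestrian, weighted by
    that pedestrian's estimated unpredictability (both lists are per-pedestrian)."""
    penalty = sum(u * max(0.0, margin - d)
                  for d, u in zip(distances, unpredictabilities))
    return base_reward - w * penalty
```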
|
|
13:30-15:00, Paper ThBT15-AX.4 | Add to My Program |
Robot Navigation in Risky, Crowded Environments: Understanding Human Preferences |
|
Suresh, Aamodh | US Army Research Laboratory |
Taylor, Angelique | Cornell Tech |
Riek, Laurel D. | University of California San Diego |
Martinez, Sonia | UC San Diego |
Keywords: Human-Aware Motion Planning, Social HRI, Human-Centered Robotics
Abstract: The effective deployment of robots in risky and crowded environments (RCE) requires the specification of robot plans that are consistent with humans' behaviors. As is well known, humans perceive uncertainty and risk in a biased way, which can lead to a diversity of actions and expectations when interacting with others. To gain a better understanding of these behaviors, this work presents new data that aims to verify how these biases translate into a human navigational setting. More precisely, we conduct a novel study that recreates a COVID-19 pandemic grocery shopping scenario and asks participants to select among various paths with different levels of time-risk tradeoffs. The data shows that participants exhibit a variety of path preferences: from risky and urgent to safe and relaxed. To model users' decision making, we evaluate three popular risk models and find that CPT captures people's decisions most accurately, corroborating previous theoretical results that CPT is more expressive and inclusive. We also find that people's self-assessments of risk and time-urgency do not correlate with their path preferences in RCEs. Finally, we conduct a thematic analysis of custom open-ended questions to gauge interest in and preferences for navigational Explainable AI (XAI) in robots. A large majority showed interest in understanding the robot's intention (path plans and decisions) through various modalities like speech, touchscreen, and gestures. We provide crucial XAI design insights.
|
|
13:30-15:00, Paper ThBT15-AX.5 | Add to My Program |
MAC-ID: Multi-Agent Reinforcement Learning with Local Coordination for Individual Diversity |
|
Chung, Hojun | Seoul National University |
Oh, Jeongwoo | Seoul National University |
Heo, Jae Seok | Seoul National University |
Lee, Gunmin | Seoul National University |
Oh, Songhwai | Seoul National University |
Keywords: Human-Aware Motion Planning, Data Sets for Robot Learning, Motion and Path Planning
Abstract: As robots increasingly navigate crowded environments in our daily lives, the demand for socially-aware navigation methods that consider human-robot interaction has risen. When developing and assessing socially-aware navigation methods, pedestrian motion modeling plays a significant role. However, existing pedestrian models often struggle in complex environments and lack the capacity to generate diverse pedestrian styles. In this paper, we propose multi-agent reinforcement learning with local coordination for individual diversity (MAC-ID), which can synthesize diverse pedestrian motions via a local coordination factor (LCF). Our experiments demonstrate that manipulating the LCF induces interpretable changes in pedestrian behaviors, along with superior performance compared to existing pedestrian motion models. For evaluating socially-aware navigation methods using MAC-ID, we present a novel benchmark called BSON. It offers realistic and diverse social environments with pedestrians modeled via MAC-ID. We have trained and compared various navigation methods in BSON using a newly proposed metric called the socially-aware navigation score. Through BSON, users can evaluate their socially-aware navigation methods and compare them to baselines.
|
|
13:30-15:00, Paper ThBT15-AX.6 | Add to My Program |
Interactive Joint Planning for Autonomous Vehicles |
|
Chen, Yuxiao | Nvidia Research |
Veer, Sushant | NVIDIA |
Karkus, Peter | NVIDIA |
Pavone, Marco | Stanford University |
Keywords: Human-Aware Motion Planning, Integrated Planning and Learning, Motion and Path Planning
Abstract: In highly interactive driving scenarios, the actions of one agent greatly influence those of its neighbors. Planning safe motions for autonomous vehicles in such interactive environments, therefore, requires reasoning about the impact of the ego's intended motion plan on nearby agents' behavior. Deep-learning-based models have recently achieved considerable success in trajectory prediction, and many models in the literature allow for ego-conditioned prediction. However, leveraging ego-conditioned prediction remains challenging in downstream planning due to the complex nature of neural networks, limiting the planner structure to simple ones, e.g., sampling-based planners. Despite the ability of gradient-based planning algorithms, such as model predictive control (MPC), to generate fine-grained high-quality motion plans, it is difficult for them to leverage ego-conditioned prediction due to their iterative nature and need for gradients. We present Interactive Joint Planning (IJP), which bridges MPC with learned prediction models in a computationally scalable manner to provide us with the best of both worlds. In particular, IJP jointly optimizes over the behavior of the ego and the surrounding agents and leverages deep-learned prediction models as prediction priors that the joint trajectory optimization tries to stay close to. Furthermore, by leveraging free-end homotopy classes—a novel concept we introduce in this paper—IJP efficiently searches over diverse motion plans. Closed-
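One way to picture the "prediction as prior" idea is a joint cost that keeps the optimized agent trajectories near the learned predictions while penalizing ego-agent proximity. This is a hedged sketch only: the weights, the quadratic prior term, and the hinge collision penalty are assumptions, and the real IJP solves such an objective inside an MPC with homotopy-class search.

```python
import numpy as np

def ijp_cost(ego_traj, agent_trajs, pred_trajs, w_prior=1.0, w_col=10.0, d_safe=2.0):
    """Joint objective sketch: keep each optimized agent trajectory close to
    its learned prediction (the prior) and penalize ego-agent proximity.
    All trajectories are (T, 2) arrays of planar positions."""
    prior = sum(np.sum((a - p) ** 2) for a, p in zip(agent_trajs, pred_trajs))
    collision = sum(
        np.sum(np.maximum(0.0, d_safe - np.linalg.norm(ego_traj - a, axis=-1)) ** 2)
        for a in agent_trajs)
    return w_prior * prior + w_col * collision
```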
|
|
13:30-15:00, Paper ThBT15-AX.7 | Add to My Program |
Improve Computing Efficiency and Motion Safety by Analyzing Environment with Graphics (I) |
|
Zhang, Qianyi | Nankai University |
Wu, Shichao | Nankai University |
Jia, Yuhang | Nankai University |
Xu, Yuang | Nankai University |
Liu, Jingtai | Nankai University |
Keywords: Human-Aware Motion Planning, Nonholonomic Motion Planning, Collision Avoidance
Abstract: Exploring topologically distinctive trajectories provides more options for robot motion planning. Since computing time grows greatly with environment complexity, improving exploration efficiency and picking the optimal trajectory in complex environments are critical issues. To this end, this paper proposes a Graphic- and Timed-Elastic-Band-based approach (GraphicTEB) with spatial completeness and high computing efficiency. The environment is analyzed using computer graphics, where obstacles are extracted as nodes and their relationships are built as edges. Three contributions are presented. 1) By assembling directed detours formed by nodes and segmented paths formed by edges, a generalized path consisting of nodes and edges derives various normal paths efficiently. 2) By multiplying two vectors starting from the obstacle point closest to the waypoint and the boundary point farthest from the waypoint, a novel obstacle gradient is introduced to guide safer optimization. 3) By modeling edges with an asymmetric Gaussian model, a trajectory evaluation strategy is designed to reflect the motion tendency and motion uncertainty of dynamic obstacles. Qualitative and quantitative simulations demonstrate that the proposed GraphicTEB achieves spatial completeness, a higher scene pass rate, and the fastest computing efficiency. Experiments are conducted in long-corridor and broad-room scenarios, where the robot passes through gaps safely, finds trajectories quickly, and passes pedestrians politely.
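Contribution 3) can be sketched as follows: the cost field around a moving pedestrian is inflated along the walking direction, so trajectories that cut in front of a person score worse. The specific sigma values and the exponential form are illustrative assumptions rather than the paper's exact model.

```python
import numpy as np

def asymmetric_gaussian_cost(p, ped_pos, ped_heading,
                             sigma_front=1.2, sigma_side=0.5, sigma_back=0.5):
    """Cost of waypoint `p` near a pedestrian, inflated along the
    pedestrian's motion direction (sigma values are illustrative)."""
    d = np.asarray(p) - np.asarray(ped_pos)
    c, s = np.cos(ped_heading), np.sin(ped_heading)
    lon, lat = c * d[0] + s * d[1], -s * d[0] + c * d[1]  # rotate into ped frame
    sigma_lon = sigma_front if lon >= 0 else sigma_back
    return np.exp(-0.5 * ((lon / sigma_lon) ** 2 + (lat / sigma_side) ** 2))
```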
|
|
13:30-15:00, Paper ThBT15-AX.8 | Add to My Program |
Generating Environment-Based Explanations of Motion Planner Failure: Evolutionary and Joint-Optimization Algorithms |
|
Liu, Qishuai | University of Nebraska-Lincoln |
Brandao, Martim | King's College London |
Keywords: Human-Aware Motion Planning, Human-Centered Robotics
Abstract: Motion planning algorithms are important components of autonomous robots, and they are difficult to understand and debug when they fail to find a solution to a problem. In this paper, we propose a solution to the failure-explanation problem: automatically generated, environment-based explanations. These explanations reveal the objects in the environment that are responsible for the failure, and how their location in the world should change so as to make the planning problem feasible. Concretely, we propose two methods - one based on evolutionary optimization and another on joint trajectory-and-environment continuous optimization. We show that the evolutionary method is well-suited to explaining sampling-based motion planners, or even optimization-based motion planners in situations where computation speed is not a concern (e.g. post-hoc debugging). However, the optimization-based method is 4000 times faster and thus more attractive for interactive applications, albeit at the cost of a slightly lower success rate. We demonstrate the capabilities of the methods through concrete examples and quantitative evaluation.
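A minimal, hedged stand-in for the evolutionary variant: randomly perturb the obstacle positions, keep the feasible layout with the smallest total displacement, and report it as the explanation. Real implementations use proper evolutionary operators with the planner in the loop; `plan_feasible` is a placeholder for such a planner call.

```python
import random

def explain_failure(obstacles, plan_feasible, iters=200, step=0.1):
    """Random-search stand-in for the evolutionary method: perturb the 2D
    obstacle positions and keep the feasible layout that moves them least.
    `plan_feasible(layout)` wraps a call to the motion planner."""
    best, best_cost = None, float("inf")
    for _ in range(iters):
        trial = [(x + random.gauss(0.0, step), y + random.gauss(0.0, step))
                 for x, y in obstacles]
        if plan_feasible(trial):
            cost = sum(abs(tx - x) + abs(ty - y)
                       for (tx, ty), (x, y) in zip(trial, obstacles))
            if cost < best_cost:
                best, best_cost = trial, cost
    return best  # None if no feasible rearrangement was found
```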
|
|
ThBT16-AX Oral Session, AX-204 |
Add to My Program |
Wearable Robotics I |
|
|
Chair: Zhu, Yaonan | Nagoya University |
Co-Chair: Hassan, Modar | University of Tsukuba |
|
13:30-15:00, Paper ThBT16-AX.1 | Add to My Program |
Vision-Based Wearable Steering Assistance for People with Impaired Vision in Jogging |
|
Liu, Xiaotong | University of Science and Technology of China |
Wang, Binglu | Northwestern Polytechnical University |
Li, Zhijun | University of Science and Technology of China |
Keywords: Wearable Robotics, Human Performance Augmentation, Data Sets for Robotic Vision
Abstract: Outdoor sports pose a challenge for people with impaired vision. The demand for higher-speed mobility inspired us to develop a vision-based wearable steering assistance system. To ensure broad applicability, we focused on a representative sports environment, the athletics track. Our efforts centered on improving the speed and accuracy of perception, enhancing planning adaptability for the real world, and providing swift and safe assistance for people with impaired vision. For perception, we engineered a lightweight multitask network capable of simultaneously detecting track lines and obstacles. Additionally, because existing datasets do not support multi-task detection on athletics tracks, we collected and annotated a new dataset (MAT) containing 1000 images. For planning, we integrated sampling and spline-curve methods, addressing the planning challenges of curves. Meanwhile, we used the positions of the track lines and obstacles as constraints to guide people with impaired vision safely along the current track. Our system is deployed on an embedded device, the Jetson Orin NX. Through outdoor experiments, it demonstrated adaptability in different sports scenarios, assisting users in moving freely over 400 meters at an average speed of 1.34 m/s, comparable to the jogging pace of sighted people. Our MAT dataset is publicly available at https://github.com/snoopy-l/MAT
|
|
13:30-15:00, Paper ThBT16-AX.2 | Add to My Program |
Variable Grounding Flexible Limb Tracking Center of Gravity for Sit-To-Stand Transfer Assistance |
|
Sugiura, Sojiro | Nagoya University |
Unde, Jayant | Nagoya University |
Zhu, Yaonan | Nagoya University |
Hasegawa, Yasuhisa | Nagoya University |
Keywords: Wearable Robotics, Physically Assistive Devices, Mechanism Design
Abstract: Wearable robotic limbs support sit-to-stand (STS) transfer with increased stability while maintaining a compact size, as well as providing body weight support. This paper proposes a new robotic limb, the Variable Grounding Flexible Limb (VGFL). The VGFL implements a novel strategy in which its grounding point tracks the forward shift of the wearer's center of gravity (CoG) during STS. Since the strategy keeps the grounding point close to the CoG, an upward force can be applied efficiently for STS assistance. To implement the strategy, this paper utilizes the High-Strength & Flexible Mechanism (HSFM). The HSFM can change its grounding point while being capable of lifting body weight with one motor. Owing to these unique characteristics, the VGFL can realize the strategy without multiple actuators and complex controllers. Furthermore, real-world experiments confirmed that the grounding point of the VGFL accurately tracked the forward shift of the CoG during STS assistance. Moreover, experiments conducted with three healthy subjects showed that the VGFL reduced the surface myoelectricity of the lower limbs during STS transfer. The VGFL thus demonstrates high support performance in STS with the CoG tracking strategy.
|
|
13:30-15:00, Paper ThBT16-AX.3 | Add to My Program |
Towards Enhanced Stability of Human Stance with a Supernumerary Robotic Tail |
|
Abeywardena, Sajeeva | University of Surrey |
Farkhatdinov, Ildar | Queen Mary University of London |
Keywords: Wearable Robotics, Human Performance Augmentation, Physical Human-Robot Interaction
Abstract: Neural control is paramount in maintaining the upright stance of a human; however, the associated time delay affects stability. In the design and control of wearable robots to augment human stance, the neural delay dynamics are often overly simplified or ignored, leading to over-specified systems. In this letter, the neural delay dynamics of human stance are modelled and embedded in the control of a supernumerary robotic tail to augment human balance. The actuation, geometric, and inertial parameters of the tail are examined. Through simulations, it is shown that by incorporating the delay dynamics, the requirements of the tail can be greatly reduced. Further, it is shown that the robustness of stance is significantly enhanced with a supernumerary tail and that there is a positive impact on muscle fatigue.
|
|
13:30-15:00, Paper ThBT16-AX.4 | Add to My Program |
Active, Quasi-Passive, Pneumatic, and Portable Knee Exoskeleton with Bidirectional Energy Flow for Efficient Air Recovery in Sit-Stand Tasks |
|
Miskovic, Luka | Jožef Stefan Institute |
Brecelj, Tilen | Jozef Stefan Institute |
Dezman, Miha | Karlsruhe Institute of Technology |
Petric, Tadej | Jozef Stefan Institute |
Keywords: Wearable Robotics, Hydraulic/Pneumatic Actuators, Prosthetics and Exoskeletons
Abstract: While existing literature encompasses exoskeleton-assisted sit-stand tasks, the integration of energy recovery mechanisms remains unexplored. To push the boundaries further, this study introduces a portable pneumatic knee exoskeleton that operates in both quasi-passive and active modes. Active mode aids in standing up (power generation), with energy flowing from the exoskeleton to the user; quasi-passive mode aids in sitting down (power absorption), during which the device absorbs and can store energy in the form of compressed air, leading to energy savings in active mode. The absorbed energy can be stored and later reused without compromising exoskeleton transparency in the meantime. In active mode, an air pump inflates the pneumatic artificial muscle (PAM), which stores the compressed air that can then be released into a pneumatic cylinder to generate torque. All electronic and pneumatic components are integrated into the system, and the exoskeleton weighs 3.9 kg with a maximum torque of 20 Nm at the knee joint. The paper describes the mechatronic design and mathematical model, and includes a pilot study with an able-bodied subject performing sit-to-stand tasks. The results show that the exoskeleton can recover energy while assisting the subject and reducing mean muscle activity by ∼31%. Further results highlight air regeneration's potential for energy saving in portable pneumatic exoskeletons, showing that the proposed device extends exoskeleton operation by ∼27%.
|
|
13:30-15:00, Paper ThBT16-AX.5 | Add to My Program |
Task-Based Human-Robot Collaboration Control of Supernumerary Robotic Limbs for Overhead Tasks |
|
Tu, Zhixin | Southern University of Science and Technology |
Fang, Yijun | Southern University of Science and Technology |
Leng, Yuquan | Southern University of Science and Technology |
Fu, Chenglong | Southern University of Science and Technology (SUSTech) |
Keywords: Wearable Robotics, Human-Robot Collaboration, Human Performance Augmentation
Abstract: Supernumerary robotic limbs (SRLs) are novel wearable robots that can augment human operating ability in completing difficult and complex tasks. In this work, a task-based human-SRLs collaboration control method for overhead tasks is developed. It is autonomous and safe, requiring neither active commands nor prior data. A task model is proposed to model the human-SRLs collaboration process, featuring different task states and state-transition conditions. Specifically, the overhead task process is modeled as a finite state machine (FSM) with four task states, three trigger events, and three SRLs actions. Real-time measured human motion data is used to trigger the task state transitions and estimate the task parameters, which serve as constraints for SRLs motion planning. The proposed admittance control with adjustable parameters allows the SRLs to behave like spring-damping systems with different characteristics in different states and actions, which enhances the safety and reliability of the human-SRLs interaction. Finally, the effectiveness of the proposed control method for overhead tasks is validated on a prototype of the human-SRLs system with two subjects under different installation heights. Trigger events and task parameters are successfully detected and estimated during the task process to trigger the coordination actions of the SRLs. The results demonstrate that the task-based collaboration method is useful for overhead tasks with different installation heights.
|
|
13:30-15:00, Paper ThBT16-AX.6 | Add to My Program |
Safe and Individualized Motion Planning for Upper-Limb Exoskeleton Robots Using Human Demonstration and Interactive Learning |
|
Chen, Yu | Tsinghua University |
Chen, Gong | Shenzhen MileBot Robotics |
Ye, Jing | Shenzhen MileBot Robotics Co. Ltd |
Qiu, Xiangjun | Tsinghua University |
Li, Xiang | Tsinghua University |
Keywords: Wearable Robotics, Safety in HRI, Physical Human-Robot Interaction
Abstract: A typical application of upper-limb exoskeleton robots is deployment in rehabilitation training, helping patients to regain manipulative abilities. However, as the patient is not always capable of following the robot, safety issues may arise during the training. Due to the bias in different patients, an individualized scheme is also important to ensure that the robot suits the specific conditions (e.g., movement habits) of a patient, hence guaranteeing effectiveness. To fulfill this requirement, this paper proposes a new motion planning scheme for upper-limb exoskeleton robots, which drives the robot to provide customized, safe, and individualized assistance using both human demonstration and interactive learning. Specifically, the robot first learns from a group of healthy subjects to generate a reference motion trajectory via probabilistic movement primitives (ProMP). It then learns from the patient during the training process to further shape the trajectory inside a moving safe region. The interactive data is fed back into the ProMP iteratively to enhance the individualized features for as long as the training process continues. The robot tracks the individualized trajectory under a variable impedance model to realize the assistance. Finally, the experimental results are presented in this paper to validate the proposed control scheme.
|
|
ThBT17-AX Oral Session, AX-205 |
Add to My Program |
Whole-Body Motion Planning and Control |
|
|
Chair: Semini, Claudio | Istituto Italiano Di Tecnologia |
Co-Chair: Okada, Kei | The University of Tokyo |
|
13:30-15:00, Paper ThBT17-AX.1 | Add to My Program |
Reactive Landing Controller for Quadruped Robots |
|
Roscia, Francesco | Istituto Italiano Di Tecnologia |
Focchi, Michele | Universitŕ Di Trento |
Del Prete, Andrea | University of Trento |
Caldwell, Darwin G. | Istituto Italiano Di Tecnologia |
Semini, Claudio | Istituto Italiano Di Tecnologia |
Keywords: Legged Robots, Whole-Body Motion Planning and Control
Abstract: Quadruped robots are machines intended for challenging and harsh environments. Despite progress in locomotion strategies, safely recovering from unexpected falls or planned drops is still an open problem, and it becomes even more difficult when high horizontal velocities are involved. In this work, we propose an optimization-based reactive Landing Controller that uses only proprioceptive measures for torque-controlled quadruped robots that free-fall onto flat horizontal ground, knowing neither the distance to the landing surface nor the flight time. Based on an estimate of the Center of Mass horizontal velocity, the method uses the Variable Height Springy Inverted Pendulum model to continuously recompute the feet position while the robot is falling. In this way, the quadruped is ready to attain a successful landing in all directions, even in the presence of significant horizontal velocities. Compared to a naive approach that keeps the feet still during the airborne stage, the method is demonstrated to dramatically enlarge the region of horizontal velocities that can be handled. To the best of our knowledge, this is the first time that a quadruped robot successfully recovers from falls with horizontal velocities up to 3 m/s in simulation. Experiments prove that the platform used, the Go1, can attain a stable standing configuration from falls with various horizontal velocities and different angular perturbations.
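As a rough intuition (not the paper's VHS-IP computation), foot placement during flight can be sketched as a capture-point-like offset along the estimated CoM horizontal velocity. The gain and omega values below are placeholder assumptions.

```python
import numpy as np

def landing_foot_offset(com_vel_xy, omega=3.0, gain=1.0):
    """Capture-point-like heuristic: shift the nominal footholds along the
    estimated CoM horizontal velocity by gain * v / omega. The gain and
    omega values are placeholders, not the paper's VHS-IP solution."""
    v = np.asarray(com_vel_xy, dtype=float)
    return gain * v / omega  # (x, y) offset applied to each nominal foothold
```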
|
|
13:30-15:00, Paper ThBT17-AX.2 | Add to My Program |
Hierarchical Optimization-Based Control for Whole-Body Loco-Manipulation of Heavy Objects |
|
Rigo, Alberto | University of Southern California |
Hu, Muqun | University of Southern California |
Gupta, Satyandra K. | University of Southern California |
Nguyen, Quan | University of Southern California |
Keywords: Legged Robots, Whole-Body Motion Planning and Control, Mobile Manipulation
Abstract: In recent years, the field of legged robotics has seen growing interest in enhancing the capabilities of these robots through the integration of articulated robotic arms. However, achieving successful loco-manipulation, especially involving interaction with heavy objects, is far from straightforward, as object manipulation can introduce substantial disturbances that impact the robot's locomotion. This paper presents a novel framework for legged loco-manipulation that considers whole-body coordination through a hierarchical optimization-based control framework. First, an online manipulation planner computes the manipulation forces and manipulated object task-based reference trajectory. Then, pose optimization aligns the robot's trajectory with kinematic constraints. The resultant robot reference trajectory is executed via a linear MPC controller incorporating the desired manipulation forces into its prediction model. Our approach has been validated in simulation and hardware experiments, highlighting the necessity of whole-body optimization compared to the baseline locomotion MPC when interacting with heavy objects. Experimental results with Unitree Aliengo, equipped with a custom-made robotic arm, showcase its ability to successfully lift and carry an 8kg payload and manipulate doors.
|
|
13:30-15:00, Paper ThBT17-AX.3 | Add to My Program |
Toward Self-Righting and Recovery in the Wild: Challenges and Benchmarks |
|
Scalise, Rosario | University of Washington |
Caglar, Ege | University of Washington - Seattle |
Boots, Byron | University of Washington |
Kessens, Chad C. | United States Army Research Laboratory |
Keywords: Multi-Contact Whole-Body Motion Planning and Control, Failure Detection and Recovery, Performance Evaluation and Benchmarking
Abstract: Self-recovery is a critical capability for robust, agile robots operating in the real world. Given truly challenging terrain, it is nearly inevitable that, at some point, the robot will fail and subsequently need to recover if it is to continue its task. One critical subset of recovery is standing back up after falling down (aka “self-righting”), an essential early milestone for babies learning to walk, and an existential capability for animals. While some robots can be designed with multiple orientations for mobility, most robots that seek to affect the world would significantly benefit from planners/policies that facilitate self-righting whenever possible. In this work, we present a series of challenges that outline why recovery in the wild is difficult. We then present a set of benchmark policies trained in simulation using deep reinforcement learning (RL) and the Student-Teacher approach. Finally, we evaluate the performance of these policies on a set of benchmark contexts in simulation, and provide baseline validation on a physical robot.
|
|
13:30-15:00, Paper ThBT17-AX.4 | Add to My Program |
Design of Morphable StateNet Based on Pseudo-Generalization of Standing up Motions for Humanoid with Variable Body Structure |
|
Makabe, Tasuku | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Multi-Contact Whole-Body Motion Planning and Control, Humanoid Robot Systems, Whole-Body Motion Planning and Control
Abstract: In this paper, we present Morphable StateNet, a StateNet with pseudo-generalized behaviors for robots with various degree-of-freedom arrangements and link lengths. Pseudo-generalization is performed by analytically calculating joint angles that satisfy the desired support conditions, focusing on the link lengths and antigravity joints that contribute to the motion, with constraints placed on the contact conditions between the environment and the robot body. We apply Morphable StateNet to the standing-up motion of humanoids with variable body structures and conduct evaluation experiments. We demonstrate the usefulness of the proposed method in environments with low friction coefficients using both a simulator and an actual humanoid.
|
|
13:30-15:00, Paper ThBT17-AX.5 | Add to My Program |
Agile and Dynamic Standing-Up Control for Humanoids Using 3D Divergent Component of Motion in Multi-Contact Scenario |
|
Zambella, Grazia | TU Wien |
Schuller, Robert | German Aerospace Center (DLR) |
Mesesan, George | German Aerospace Center (DLR) |
Bicchi, Antonio | Fondazione Istituto Italiano Di Tecnologia |
Ott, Christian | TU Wien |
Lee, Jinoh | German Aerospace Center (DLR) |
Keywords: Multi-Contact Whole-Body Motion Planning and Control, Whole-Body Motion Planning and Control
Abstract: Standing up is a task that humanoids need to be able to perform in order to be employed in real-world scenarios. This paper proposes a new robust strategy for a humanoid to stand up in challenging scenarios where no completely preplanned motion can accomplish the task. This strategy exploits the concept of the three-dimensional divergent component of motion and passivity-based whole-body control. The latter first maximizes the push forces applied to the robot's center of mass to generate an agile whole-body recovery motion. Then, during the rising phase, it reduces these forces to zero and stabilizes the robot in an upright position. Optimization of the centroidal angular momentum is fully integrated into the proposed whole-body standing-up control to create the trajectories of the hip and the upper-body joints online. The effectiveness of the proposed method is validated in simulations and experiments on the humanoid TORO.
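For reference, the divergent component of motion used in such controllers is commonly defined as xi = x + xdot/omega. A minimal sketch follows, with omega treated as a given scalar (the 3D-DCM formulation also admits a time-varying omega).

```python
import numpy as np

def divergent_component_of_motion(com_pos, com_vel, omega):
    """3D DCM: xi = x + xdot / omega, where x and xdot are the CoM position
    and velocity and omega is the natural frequency of the underlying
    inverted-pendulum dynamics."""
    return np.asarray(com_pos) + np.asarray(com_vel) / omega

# Example with omega = sqrt(g / z) for a 1 m pendulum height:
xi = divergent_component_of_motion([0.0, 0.0, 1.0], [0.5, 0.0, 0.0],
                                   omega=np.sqrt(9.81 / 1.0))
```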
|
|
13:30-15:00, Paper ThBT17-AX.6 | Add to My Program |
Representing Robot Geometry As Distance Fields: Applications to Whole-Body Manipulation |
|
Li, Yiming | Idiap Research Institute, École Polytechnique Fédérale De Lausan |
Zhang, Yan | EPFL |
Razmjoo, Amirreza | Idiap Research Institute |
Calinon, Sylvain | Idiap Research Institute |
Keywords: Representation Learning, Whole-Body Motion Planning and Control, Collision Avoidance
Abstract: In this work, we propose a novel approach to represent robot geometry as distance fields (RDF) that extends the principle of signed distance fields (SDFs) to articulated kinematic chains. Our method employs a combination of Bernstein polynomials to encode the signed distance for each robot link with high accuracy and efficiency, while ensuring the mathematical continuity and differentiability of SDFs. We further leverage the kinematic chain of the robot to produce the SDF representation in joint space, allowing robust distance queries in arbitrary joint configurations. The proposed RDF representation is differentiable and smooth in both task and joint spaces, enabling its direct integration into optimization problems. Additionally, the 0-level set of the robot corresponds to the robot surface, which can be seamlessly integrated into whole-body manipulation tasks. We conduct various experiments both in simulation and with 7-axis Franka Emika robots, comparing against baseline methods and demonstrating effectiveness in collision avoidance and whole-body manipulation tasks. Project page: https://sites.google.com/view/lrdf/home
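The joint-space query can be sketched as: express the task-space point in each link frame via forward kinematics, evaluate each link's distance model, and take the minimum. Here `link_sdfs` stands in for the paper's Bernstein-polynomial models, and the (rotation, translation) pose format is an assumption.

```python
import torch

def robot_sdf(point, link_sdfs, link_poses):
    """RDF-style query sketch: express the task-space `point` in every link
    frame (poses (R, t) come from forward kinematics at the current joint
    configuration) and take the minimum per-link signed distance.
    `link_sdfs[i]` stands in for the paper's Bernstein-polynomial model."""
    dists = []
    for sdf, (R, t) in zip(link_sdfs, link_poses):
        local = R.T @ (point - t)  # world frame -> link frame
        dists.append(sdf(local))
    return torch.min(torch.stack(dists))
```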
|
|
13:30-15:00, Paper ThBT17-AX.7 | Add to My Program |
Singularity-Robust Prioritized Whole-Body Tracking and Interaction Control with Smooth Task Transitions |
|
Wu, Xuwei | German Aerospace Center (DLR) |
Albu-Schäffer, Alin | DLR - German Aerospace Center |
Dietrich, Alexander | German Aerospace Center (DLR) |
Keywords: Whole-Body Motion Planning and Control, Compliance and Impedance Control, Redundant Robots
Abstract: In this work, we propose a singularity-robust whole-body control framework that ensures smooth task transitions while maintaining strict priorities. The weighted generalized inverse is adopted to derive a hierarchical control law compatible with singular and redundant tasks. Moreover, a smooth activation matrix is proposed to continuously shape both null-space projectors and task-level control actions. Validation has been conducted in MATLAB/Simulink and MuJoCo simulations with Rollin’ Justin.
|
|
13:30-15:00, Paper ThBT17-AX.8 | Add to My Program |
Learning Force Control for Legged Manipulation |
|
Portela, Tifanny | EPFL |
Margolis, Gabriel | Massachusetts Institute of Technology |
Ji, Yandong | UCSD |
Agrawal, Pulkit | MIT |
Keywords: Whole-Body Motion Planning and Control, Reinforcement Learning, Legged Robots
Abstract: Controlling the contact force during interactions is an inherent requirement for locomotion and manipulation tasks. Current reinforcement learning approaches to locomotion and manipulation rely implicitly on forceful interaction to accomplish tasks but do not explicitly regulate it. This paper proposes a reinforcement learning task specification that focuses on matching desired contact force levels. Integrating force control with the coordination of a robot's body and arm, we present an end-to-end policy for legged manipulator control. Force control enables us to realize compliant gripper and whole-body pulling movements that have not been previously demonstrated using a learned policy. It also facilitates a characterization of the force-tracking performance of learned policies in simulation and the real world, indicating their performance potential for force-critical tasks.
|
|
13:30-15:00, Paper ThBT17-AX.9 | Add to My Program |
A Study of Shared-Control with Bilateral Feedback for Obstacle Avoidance in Whole-Body Telelocomotion of a Wheeled Humanoid |
|
Baek, DongHoon | University of Illinois Urbana-Champaign |
Chang, Yu-Chen (Johnny) | University of Illinois, Urbana-Champaign |
Ramos, Joao | University of Illinois at Urbana-Champaign |
Keywords: Whole-Body Motion Planning and Control, Telerobotics and Teleoperation, Humanoid Robot Systems
Abstract: Teleoperation has emerged as an alternative to fully-autonomous systems for achieving human-level capabilities on humanoids. Specifically, teleoperation with whole-body control is a promising hands-free strategy for commanding humanoids, but it imposes greater physical and mental demands. To mitigate this limitation, researchers have proposed shared-control methods that incorporate robot decision-making to aid humans in low-level tasks, further reducing operation effort. However, shared-control methods for wheeled humanoid telelocomotion at the whole-body level have yet to be explored. In this work, we explore how whole-body bilateral feedback with haptics affects the performance of different shared-control methods for obstacle avoidance in diverse environments. A time-derivative Sigmoid function (TDSF) is implemented to generate more intuitive haptic feedback from obstacles. Comprehensive human experiments were conducted, and the results show that bilateral feedback enhances whole-body telelocomotion performance in unfamiliar environments but can reduce performance in familiar environments. Conveying the robot's intention through haptics showed further improvements, since the operator can use the haptic feedback for reactive short-distance planning and the visual feedback for long-distance planning.
|
|
ThBT18-AX Oral Session, AX-206 |
Add to My Program |
Representation Learning I |
|
|
Chair: Joho, Dominik | KUKA Deutschland GmbH |
Co-Chair: Moghadam, Peyman | CSIRO |
|
13:30-15:00, Paper ThBT18-AX.1 | Add to My Program |
MOSAIC: Learning Unified Multi-Sensory Object Property Representations for Robot Learning Via Interactive Perception |
|
Tatiya, Gyan | Tufts University |
Francis, Jonathan | Bosch Center for Artificial Intelligence |
Wu, Ho-Hsiang | Bosch Research |
Bisk, Yonatan | Carnegie Mellon University |
Sinapov, Jivko | Tufts University |
Keywords: Representation Learning, Sensorimotor Learning, Learning Categories and Concepts
Abstract: A holistic understanding of object properties across diverse sensory modalities (e.g., visual, audio, and haptic) is essential for tasks ranging from object categorization to complex manipulation. Drawing inspiration from cognitive science studies that emphasize the significance of multi-sensory integration in human perception, we introduce MOSAIC (Multimodal Object property learning with Self-Attention and Interactive Comprehension), a novel framework designed to facilitate the learning of unified multi-sensory object property representations. While it is undeniable that visual information plays a prominent role, we acknowledge that many fundamental object properties extend beyond the visual domain to encompass attributes like texture, mass distribution, or sounds, which significantly influence how we interact with objects. In MOSAIC, we leverage this profound insight by distilling knowledge from multimodal foundation models and aligning these representations not only across vision but also haptic and auditory sensory modalities. Through extensive experiments on a dataset where a humanoid robot interacts with 100 objects across 10 exploratory behaviors, we demonstrate the versatility of MOSAIC in two task families: object categorization and object-fetching tasks. Our results underscore the efficacy of MOSAIC's unified representations, showing competitive performance in category recognition through a simple linear probe setup and excelling in the fetch object task under zero-shot transfer conditions. This work pioneers the application of sensory grounding in foundation models for robotics, promising a significant leap in multi-sensory perception capabilities for autonomous systems. We have released the code, datasets, and additional results: https://github.com/gtatiya/MOSAIC.
|
|
13:30-15:00, Paper ThBT18-AX.2 | Add to My Program |
Neural Rearrangement Planning for Object Retrieval from Confined Spaces Perceivable by Robot's In-Hand RGB-D Sensor |
|
Ren, Hanwen | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Keywords: Representation Learning, Reactive and Sensor-Based Planning, Task Planning
Abstract: Rearrangement planning for object retrieval tasks from confined spaces is a challenging problem, primarily due to the lack of open space for robot motion and limited perception. Several traditional methods exist to solve object retrieval tasks, but they require overhead cameras for perception, involve a time-consuming exhaustive search to find a solution, and often make unrealistic assumptions, such as having identical, simple-geometry objects in the environment. This paper presents a neural object retrieval framework that efficiently performs rearrangement planning of unknown, arbitrary objects in confined spaces to retrieve the desired object using a given robot grasp. Our method actively senses the environment with the robot's in-hand camera. It then selects and relocates the non-target objects such that they do not block the robot's path homotopy to the target object, thus also aiding an underlying path planner in quickly finding robot motion sequences. Furthermore, we demonstrate our framework in challenging scenarios, including real-world cabinet-like environments with arbitrary household objects. The results show that our framework achieves the best performance among all presented methods and is, on average, two orders of magnitude computationally faster than the best-performing baselines.
|
|
13:30-15:00, Paper ThBT18-AX.3 | Add to My Program |
MMPI: A Flexible Radiance Field Representation by Multiple Multi-Plane Images Blending |
|
He, Yuze | Tsinghua University |
Wang, Peng | The University of Hong Kong |
Hu, Yubin | Tsinghua University |
Zhao, Wang | Tsinghua University |
Yi, Ran | Shanghai Jiao Tong University |
Liu, Yong-Jin | Tsinghua University |
Wang, Wenping | The University of Hong Kong |
Keywords: Representation Learning, Computer Vision for Automation, Deep Learning for Visual Perception
Abstract: This paper presents a flexible representation of neural radiance fields based on multi-plane images (MPI) for high-quality view synthesis of complex scenes. MPI with Normalized Device Coordinate (NDC) parameterization is widely used in NeRF learning for its simple definition, easy calculation, and powerful ability to represent unbounded scenes. However, existing NeRF works that adopt the MPI representation for novel view synthesis can only handle simple forward-facing unbounded scenes (e.g., the scenes in the LLFF dataset), where the input cameras all observe in similar directions with small relative translations. Hence, extending these MPI-based methods to more complex scenes, such as large-range or even 360-degree scenes, is very challenging. In this paper, we explore the potential of MPI and show that MPI can synthesize high-quality novel views of complex scenes with diverse camera distributions and view directions, not limited to simple forward-facing scenes. Our key idea is to encode the neural radiance field with multiple MPIs facing different directions and blend them with an adaptive blending operation. For each region of the scene, the blending operation gives larger blending weights to those advantaged MPIs with stronger local representation abilities and lower weights to those with weaker representation abilities. This blending operation automatically modulates the multiple MPIs to appropriately represent the diverse local density and color information. Experiments on the KITTI and ScanNet datasets demonstrate that our proposed MMPI synthesizes high-quality images from diverse camera pose distributions and is fast to train, outperforming previous fast-training NeRF methods for novel view synthesis. Moreover, we show that MMPI can encode extremely long trajectories and produce novel view renderings, demonstrating its potential in applications like autonomous driving.
|
|
13:30-15:00, Paper ThBT18-AX.4 | Add to My Program |
Neural Implicit Swept Volume Models for Fast Collision Detection |
|
Joho, Dominik | KUKA Deutschland GmbH |
Schwinn, Jonas | Kuka Deutschland GmbH |
Safronov, Kirill | KUKA Deutschland GmbH |
Keywords: Representation Learning, Integrated Planning and Learning, Deep Learning in Grasping and Manipulation
Abstract: Collision detection is one of the most time-consuming operations during motion planning. Thus, there is increasing interest in exploring machine learning techniques to speed up collision detection and sampling-based motion planning. A recent line of research focuses on utilizing neural signed distance functions of either the robot geometry or the swept volume of the robot motion. Building on this, we present a novel neural implicit swept volume model to continuously represent arbitrary motions parameterized by their start and goal configurations. This allows signed distances from any point in the task space to the robot motion to be computed quickly. Further, we present an algorithm combining the speed of deep-learning-based signed distance computations with the strong accuracy guarantees of geometric collision checkers. We validate our approach in simulated and real-world robotic experiments, and demonstrate that it is able to speed up a commercial bin-picking application.
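A hedged sketch of the two ideas: a network conditioned on (point, start configuration, goal configuration) that outputs a signed distance, and a hybrid check that trusts the network only outside a conservative margin, deferring to a geometric checker otherwise. Architecture, margin logic, and names are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SweptVolumeSDF(nn.Module):
    """Toy stand-in: signed distance from a task-space point to the volume
    swept by a motion parameterized by its start/goal configurations."""
    def __init__(self, dof=7, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 2 * dof, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, points, q_start, q_goal):
        # points: (N, 3); q_start, q_goal: (N, dof)
        return self.net(torch.cat([points, q_start, q_goal], dim=-1))

def is_motion_safe(model, points, q_start, q_goal, margin, exact_check):
    """Hybrid check: trust the learned distances only outside a conservative
    margin; fall back to the geometric collision checker near the surface."""
    n = points.shape[0]
    qs = q_start.unsqueeze(0).expand(n, -1)
    qg = q_goal.unsqueeze(0).expand(n, -1)
    d = model(points, qs, qg).squeeze(-1)
    if bool((d > margin).all()):
        return True                      # clearly collision-free
    return exact_check(q_start, q_goal)  # authoritative but slower
```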
|
|
13:30-15:00, Paper ThBT18-AX.5 | Add to My Program |
Reg-NF: Efficient Registration of Implicit Surfaces within Neural Fields |
|
Hausler, Stephen | CSIRO |
Hall, David | Commonwealth Scientific and Industrial Research Organisation |
Mahendren, Sutharsan | Queensland University of Technology |
Moghadam, Peyman | CSIRO |
Keywords: Representation Learning, Deep Learning for Visual Perception
Abstract: Neural fields, coordinate-based neural networks, have recently gained popularity for implicitly representing a scene. In contrast to classical methods that are based on explicit representations such as point clouds, neural fields provide a continuous scene representation able to represent 3D geometry and appearance in a way which is compact and ideal for robotics applications. However, few prior methods have investigated registering multiple neural fields by directly utilising these continuous implicit representations. In this paper, we present Reg-NF, a neural fields-based registration that optimises for the relative 6-DoF transformation between two arbitrary neural fields, even if those two fields have different scale factors. Key components of Reg-NF include a bidirectional registration loss, multi-view surface sampling, and utilisation of volumetric signed distance functions (SDFs). We showcase our approach on a new neural field dataset for evaluating registration problems. We provide an exhaustive set of experiments and ablation studies to identify the performance of our approach, while also discussing limitations to provide future direction to the research community on open challenges in utilizing neural fields in unconstrained environments.
|
|
13:30-15:00, Paper ThBT18-AX.6 | Add to My Program |
3D-OAE: Occlusion Auto-Encoders for Self-Supervised Learning on Point Clouds |
|
Zhou, Junsheng | Tsinghua University |
Wen, Xin | NVIDIA |
Ma, Baorui | Beijing Academy of Artificial Intelligence |
Liu, Yu-Shen | Tsinghua University |
Gao, Yue | Tsinghua University |
Fang, Yi | New York University |
Han, Zhizhong | Wayne State University |
Keywords: Representation Learning
Abstract: Manual annotation of large-scale point clouds is tedious and unavailable for many harsh real-world tasks. Self-supervised learning, which pre-trains deep neural networks on raw, unlabeled data, is a promising approach to address this issue. Existing works usually rely on auto-encoders, establishing self-supervision through a self-reconstruction scheme. However, these auto-encoders merely focus on global shapes and do not distinguish local from global geometric features. To address this problem, we present a novel and efficient self-supervised point cloud representation learning framework, named 3D Occlusion Auto-Encoder (3D-OAE), to facilitate the detailed supervision inherent in local regions and global shapes. We propose to randomly occlude some local patches of point clouds and establish supervision by inpainting the occluded patches using the remaining ones. Specifically, we design an asymmetrical encoder-decoder architecture based on the standard Transformer, where the encoder operates only on the visible subset of patches to learn local patterns, and a lightweight decoder leverages these visible patterns to infer the missing geometries via self-attention. We find that occluding a very high proportion of the input point cloud (e.g. 75%) still yields nontrivial self-supervisory performance, which enables 3-4 times faster training while also improving accuracy. Experimental results show that our approach outperforms the state-of-the-art on a diverse range of downstream discriminative and generative tasks. Code is available at https://github.com/junshengzhou/3D-OAE.
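The occlusion step itself is simple to sketch: given a point cloud already grouped into patches, hide a high fraction of them and feed only the visible subset to the encoder. The (P, K, 3) patch layout below is an assumption made for illustration.

```python
import torch

def occlude_patches(patches, occlusion_ratio=0.75):
    """Split pre-grouped point-cloud patches of shape (P, K, 3) into visible
    and occluded subsets; only the visible patches go to the encoder, and
    the decoder must inpaint the occluded ones."""
    num_patches = patches.shape[0]
    n_occ = int(num_patches * occlusion_ratio)
    perm = torch.randperm(num_patches)
    occluded_idx, visible_idx = perm[:n_occ], perm[n_occ:]
    return patches[visible_idx], patches[occluded_idx]
```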
|
|
13:30-15:00, Paper ThBT18-AX.7 | Add to My Program |
Composing Pre-Trained Object-Centric Representations for Robotics from "What" and "Where" Foundation Models |
|
Shi, Junyao | University of Pennsylvania |
Qian, Jianing | University of Pennsylvania |
Ma, Yecheng Jason | University of Pennsylvania |
Jayaraman, Dinesh | University of Pennsylvania |
Keywords: Representation Learning, Imitation Learning, Sensorimotor Learning
Abstract: There have recently been large advances both in pre-training visual representations for robotic control and in segmenting unknown-category objects in general images. To leverage these for improved robot learning, we propose POCR, a new framework for building pre-trained object-centric representations for robotic control. Building on theories of "what-where" representations in psychology and computer vision, we use segmentations from a pre-trained model to stably locate, across timesteps, the various entities in the scene, capturing "where" information. To each such segmented entity, we apply other pre-trained models that build vector descriptions suitable for robotic control tasks, thus capturing "what" the entity is. Our pre-trained object-centric representations for control are thus constructed by appropriately combining the outputs of off-the-shelf pre-trained models, with no new training. On various simulated and real robotic tasks, we show that imitation policies for robotic manipulators trained on POCR achieve better performance and systematic generalization than state-of-the-art pre-trained representations for robotics, as well as prior object-centric representations that are typically trained from scratch.
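The "what-where" composition can be sketched as: for each mask from the pre-trained segmenter, pool a frozen descriptor over the masked region ("what") and append the mask centroid ("where"). The pooling choice and the centroid as the "where" code are illustrative assumptions, not necessarily POCR's exact design.

```python
import torch

def pocr_state(image_feats, masks, descriptor):
    """Compose per-entity 'what-where' slots: pool a frozen descriptor over
    each pre-computed segmentation mask ('what') and append the mask
    centroid in pixel coordinates ('where'). `descriptor` is a placeholder
    callable wrapping an off-the-shelf, frozen model."""
    slots = []
    for m in masks:                        # m: (H, W) boolean mask
        what = descriptor(image_feats, m)  # e.g., masked average pooling -> 1D tensor
        ys, xs = torch.nonzero(m, as_tuple=True)
        where = torch.stack([xs.float().mean(), ys.float().mean()])
        slots.append(torch.cat([what, where]))
    return torch.stack(slots)              # (num_entities, d_what + 2)
```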
|
|
13:30-15:00, Paper ThBT18-AX.8 | Add to My Program |
SKT-Hang: Hanging Everyday Objects Via Object-Agnostic Semantic Keypoint Trajectory Generation |
|
Kuo, Chia-Liang | National Yang Ming Chiao Tung University |
Chao, Yu-Wei | NVIDIA |
Chen, Yi-Ting | National Yang Ming Chiao Tung University |
Keywords: Representation Learning, Perception for Grasping and Manipulation, Deep Learning for Visual Perception
Abstract: We study the problem of hanging a wide range of grasped objects on diverse supporting items. Hanging objects is a ubiquitous task encountered in numerous aspects of our everyday lives. However, both the objects and the supporting items can exhibit substantial variations in shape and structure, raising two challenging issues: (1) determining the task-relevant geometric structures across different objects and supporting items, and (2) identifying a robust action sequence to accommodate the shape variations of supporting items. To this end, we propose Semantic Keypoint Trajectory (SKT), an object-agnostic representation that is highly versatile and applicable to various everyday objects. We also propose Shape-conditioned Trajectory Deformation Network (SCTDN), a model that learns to generate SKT by deforming a template trajectory based on the task-relevant geometric structure features of the supporting items. We conduct extensive experiments and demonstrate substantial improvements of our framework over existing robot hanging methods in both success rate and inference time. Finally, our simulation-trained framework shows promising hanging results in the real world. For videos and supplementary materials, please visit our project webpage: https://hcis-lab.github.io/SKTHang/.
|
|
13:30-15:00, Paper ThBT18-AX.9 | Add to My Program |
Object-Centric Cross-Modal Feature Distillation for Event-Based Object Detection |
|
Li, Lei | ETH Zurich |
Liniger, Alexander | ETH Zurich |
Millhaeusler, Mario | Huawei Zurich |
Tsiminaki, Vagia | Huawei Zurich |
Li, Yuanyou | Huawei |
Dai, Dengxin | ETH Zurich |
Keywords: Object Detection, Segmentation and Categorization, Computer Vision for Transportation
Abstract: Event cameras are gaining popularity due to their unique properties, such as their low latency and high dynamic range. One task where these benefits can be crucial is real-time object detection. However, RGB detectors still outperform event-based detectors due to the sparsity of the event data and missing visual details. In this paper, we propose a cross-modality feature distillation method that can focus on regions where the knowledge distillation works best to shrink the detection performance gap between these two modalities. We achieve this by using an object-centric slot attention mechanism that can iteratively decouple feature maps into object-centric features and corresponding pixel features used for distillation. We evaluate our novel distillation approach on a synthetic and a real event dataset with aligned grayscale images as a teacher modality. We show that object-centric distillation significantly improves the performance of the event-based student object detector, nearly halving the performance gap with respect to the teacher.
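The slot-wise distillation step can be sketched as follows (a hypothetical PyTorch fragment under our own assumptions; the paper's slot attention module and loss design are more involved): the slot masks softly assign pixels to objects, and student and teacher features are pooled per slot before matching.

import torch
import torch.nn.functional as F

def object_centric_distill_loss(student_feat, teacher_feat, slot_masks):
    # student_feat, teacher_feat: (B, C, H, W); slot_masks: (B, S, H, W),
    # softmax-normalized over slots at every pixel.
    m = slot_masks.flatten(2)                                   # (B, S, HW)
    m = m / (m.sum(dim=-1, keepdim=True) + 1e-6)                # normalize per slot
    s = torch.bmm(m, student_feat.flatten(2).transpose(1, 2))   # (B, S, C)
    t = torch.bmm(m, teacher_feat.flatten(2).transpose(1, 2))   # (B, S, C)
    return F.mse_loss(s, t)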
|
|
ThBT19-NT Oral Session, NT-G301 |
Add to My Program |
Surgical Robotics II |
|
|
Chair: Yip, Michael C. | University of California, San Diego |
Co-Chair: Rucker, Caleb | University of Tennessee |
|
13:30-15:00, Paper ThBT19-NT.1 | Add to My Program |
A Novel Robotic Bronchoscope with a Spring-Based Extensible Segment for Improving Steering Ability |
|
Wang, Jie | Tsinghua University |
Hu, Chengquan | Tsinghua University |
Kang, Jingyi | Tsinghua University |
Liu, Jiayuan | Tsinghua University |
Ma, Longfei | Tsinghua University |
Liao, Hongen | Tsinghua University |
Keywords: Surgical Robotics: Laparoscopy, Tendon/Wire Mechanism, Medical Robots and Systems
Abstract: Bronchoscopy, as an essential minimally invasive diagnostic and therapeutic modality, assumes a pivotal role in the early detection of lung cancer. However, the complex anatomy of the airway and the fixed length of the bronchoscope’s bending segment, along with its external propulsion property, pose challenges, including the risk of bleeding. This paper introduces a 4 mm diameter robot-assisted bronchoscope with a spring-based extensible segment. By manipulating two driven rods, the segment can be lengthened or shortened. The advantages of the extensible segment are discussed in two main aspects through theoretical analysis and experimentation. Firstly, the extensible segment enables the bronchoscope to move in a follow-the-leader motion mode or fixed-angle motion mode, navigating through narrow corners that are inaccessible to fixed-length bronchoscopes. It can also be shortened to increase its stiffness when it reaches the target position, creating a stable surgical platform for procedures like biopsies. In addition, a tailored master device has been developed to control the extensible bronchoscope in an isotropic manner. Phantom experiments confirm the feasibility and effectiveness of the extensible bronchoscope.
|
|
13:30-15:00, Paper ThBT19-NT.2 | Add to My Program |
Robust Surgical Tool Tracking with Pixel-Based Probabilities for Projected Geometric Primitives |
|
D'Ambrosia, Christopher | University of California, San Diego |
Richter, Florian | University of California, San Diego |
Chiu, Zih-Yun | University of California, San Diego |
Shinde, Nikhil | University of California San Diego |
Liu, Fei | UCSD |
Christensen, Henrik | University of California, San Diego |
Yip, Michael C. | University of California, San Diego |
Keywords: Surgical Robotics: Laparoscopy, Visual Servoing, Computer Vision for Medical Robotics
Abstract: Controlling robotic manipulators via visual feedback requires a known coordinate frame transformation between the robot and the camera. Uncertainties in mechanical systems as well as camera calibration create errors in this coordinate frame transformation. These errors result in poor localization of robotic manipulators and create a significant challenge for applications that rely on precise interactions between manipulators and the environment. In this work, we estimate the camera-to-base transform and joint angle measurement errors for surgical robotic tools using an image-based insertion-shaft detection algorithm and probabilistic models. We apply our proposed approach in both a structured and an unstructured environment to demonstrate the efficacy of our methods.
|
|
13:30-15:00, Paper ThBT19-NT.3 | Add to My Program |
Ada-Tracker: Soft Tissue Tracking Via Inter-Frame and Adaptive-Template Matching |
|
Guo, Jiaxin | The Chinese University of Hong Kong |
Wang, Jiangliu | The Chinese University of Hong Kong |
Li, Zhaoshuo | Johns Hopkins University |
Jia, Tongyu | Faculty of Urology, Third Medical Center, Chinese PLA General Hospital |
Dou, Qi | The Chinese University of Hong Kong |
Liu, Yunhui | Chinese University of Hong Kong |
Keywords: Surgical Robotics: Laparoscopy, Visual Tracking
Abstract: Soft tissue tracking is crucial for computer-assisted interventions. Existing approaches mainly rely on extracting discriminative features from the template and videos to recover corresponding matches. However, it is difficult to adopt these techniques in surgical scenes, where tissues change in shape and appearance throughout the surgery. To address this problem, we exploit optical flow to naturally capture the pixel-wise tissue deformations and adaptively correct the tracked template. Specifically, we first implement an inter-frame matching mechanism to extract a coarse region of interest based on optical flow from consecutive frames. To accommodate appearance change and alleviate drift, we then propose an adaptive-template matching method, which updates the tracked template based on the reliability of the estimates. Our approach, Ada-Tracker, enjoys both short-term dynamics modeling, by capturing local deformations, and long-term dynamics modeling, by introducing global temporal compensation. We evaluate our approach on the public SurgT benchmark, which is generated from the Hamlyn, SCARED, and Kidney boundary datasets. The experimental results show that Ada-Tracker achieves superior accuracy and performs more robustly than prior works. Code is available at https://github.com/wrld/Ada-Tracker.
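A rough sketch of the two matching stages (our reading of the mechanism, with OpenCV's Farneback flow as a stand-in; function names, parameters, and the reliability test are assumptions, not the released Ada-Tracker code):

import cv2

def track_step(prev_gray, cur_gray, box, template, reliability_thresh=0.8):
    # Inter-frame matching: propagate the box with dense optical flow.
    x, y, w, h = box
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    x = int(round(x + flow[y:y+h, x:x+w, 0].mean()))
    y = int(round(y + flow[y:y+h, x:x+w, 1].mean()))
    # Adaptive-template matching: refresh the template only when the match
    # is reliable (image bounds checks omitted for brevity).
    patch = cur_gray[y:y+h, x:x+w]
    score = cv2.matchTemplate(patch, template, cv2.TM_CCOEFF_NORMED).max()
    if score > reliability_thresh:
        template = patch.copy()
    return (x, y, w, h), template, score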
|
|
13:30-15:00, Paper ThBT19-NT.4 | Add to My Program |
Real-To-Sim Deformable Object Manipulation: Optimizing Physics Models with Residual Mappings for Robotic Surgery |
|
Liang, Xiao | University of California San Diego |
Liu, Fei | UCSD |
Zhang, Yutong | University of California San Diego |
Li, Yuelei | University of California, San Diego |
Lin, Shan | University of California, San Diego |
Yip, Michael C. | University of California, San Diego |
Keywords: Surgical Robotics: Planning, Computational Geometry, Computer Vision for Medical Robotics
Abstract: Accurate deformable object manipulation (DOM) is essential for achieving autonomy in robotic surgery, where soft tissues are displaced, stretched, and dissected. Many DOM methods can be powered by simulation, which ensures realistic deformation by adhering to the governing physical constraints and allowing for model prediction and control. However, real soft objects in robotic surgery, such as membranes and soft tissues, have complex, anisotropic physical parameters that a simulation with simple initialization from cameras may not fully capture. To use simulation techniques in real surgical tasks, the real-to-sim gap needs to be properly compensated. In this work, we propose an online, adaptive parameter tuning approach for simulation optimization that (1) bridges the real-to-sim gap between a physics simulation and observations obtained from 3D perception by estimating a residual mapping and (2) optimizes the simulation's stiffness parameters online. Our method ensures a small residual gap between the simulation and observation and improves the simulation's predictive capabilities. The effectiveness of the proposed mechanism is evaluated in the manipulation of both thin-shell and volumetric tissue, representative of most tissue scenarios. This work contributes to the advancement of simulation-based deformable tissue manipulation and holds potential for improving surgical autonomy.
|
|
13:30-15:00, Paper ThBT19-NT.5 | Add to My Program |
Efficient and Accurate Mapping of Subsurface Anatomy Via Online Trajectory Optimization for Robot Assisted Surgery |
|
Cho, Brian Y | University of Utah |
Kuntz, Alan | University of Utah |
Keywords: Surgical Robotics: Planning, Mapping, Medical Robots and Systems
Abstract: Robotic surgical subtask automation has the potential to reduce the per-patient workload of human surgeons. There are a variety of surgical subtasks that require geometric information of subsurface anatomy, such as the location of tumors, which necessitates accurate and efficient surgical sensing. In this work, we propose an automated sensing method that maps 3D subsurface anatomy to provide such geometric knowledge. We model the anatomy via a Bayesian Hilbert map-based probabilistic 3D occupancy map. Using the 3D occupancy map, we plan sensing paths on the surface of the anatomy via a graph search algorithm, A* search, with a cost function that enables the trajectories generated to balance between exploration of unsensed regions and refining the existing probabilistic understanding. We demonstrate the performance of our proposed method by comparing it against 3 different methods in several anatomical environments including a real-life CT scan dataset. The experimental results show that our method efficiently detects relevant subsurface anatomy with shorter trajectories than the comparison methods, and the resulting occupancy map achieves high accuracy.
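The exploration/refinement trade-off in the path cost can be illustrated with a toy graph search (hypothetical cost terms; with the heuristic omitted, this reduces to Dijkstra rather than A*): each surface node carries an occupancy probability from the Bayesian Hilbert map, and uncertain cells, with probability near 0.5, are made cheaper to visit.

import heapq

def sensing_path(graph, probs, start, goal, alpha=2.0, beta=1.5):
    # graph: node -> [(neighbor, edge_length)]; probs: node -> P(occupied).
    # Nodes are assumed to be ints or strings.
    def cell_cost(n):
        uncertainty = 1.0 - 2.0 * abs(probs[n] - 0.5)   # 1 at p=0.5, 0 at p in {0, 1}
        return alpha - beta * uncertainty                # stays positive for alpha > beta

    frontier, seen = [(0.0, start, [start])], set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, cost
        if node in seen:
            continue
        seen.add(node)
        for nbr, length in graph[node]:
            if nbr not in seen:
                heapq.heappush(frontier, (cost + length + cell_cost(nbr), nbr, path + [nbr]))
    return None, float("inf")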
|
|
13:30-15:00, Paper ThBT19-NT.6 | Add to My Program |
A Cross-Entropy Motion Planning Framework for Hybrid Continuum Robots |
|
Chen, Jibiao | The Chinese University of Hong Kong |
Yan, Junyan | The Chinese University of Hong Kong |
Qiu, Yufu | The Chinese University of Hong Kong |
Fang, Haiyang | The Chinese University of Hong Kong |
Chen, Jianghua | The Chinese University of Hong Kong |
Cheng, Shing Shin | The Chinese University of Hong Kong |
Keywords: Surgical Robotics: Planning, Motion and Path Planning
Abstract: Sampling-based motion planners, including the Rapidly-exploring Random Tree (RRT) algorithms, are widely utilized in continuum robots, enabling efficient search for feasible motion plans in constrained environments. In surgical robotics, the complex mapping among the high-dimensional kinematics of continuum robots, trajectory parameterization, and path redundancy may lead to non-optimal motion paths, which in turn affect efficiency and surgical task performance (e.g., path following), and ultimately the patient outcome. In this letter, a cross-entropy (CE) motion planning framework is proposed for continuum robots, wherein the RRT* planner is equipped with a CE estimation method serving as a probabilistic model to sample elite trajectories at optimal computation cost. It can asymptotically optimize the sampling distributions among individuals in terms of either robot states or parameterized trajectories. The presented CE motion planners were implemented on a hybrid continuum robot to enable obstacle avoidance, approximate follow-the-leader (FTL) motion, and navigation in a clinical scenario. They are shown to offer lower sampling cost and higher computational efficiency compared to existing approaches.
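The core cross-entropy estimation loop can be sketched generically (our simplification, assuming a Gaussian over parameterized trajectories; in the paper this distribution is coupled with an RRT*-style planner over robot states or trajectories):

import numpy as np

def ce_optimize(cost_fn, dim, iters=30, pop=200, elite_frac=0.1, rng=None):
    rng = rng or np.random.default_rng(0)
    mu, sigma = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = rng.normal(mu, sigma, size=(pop, dim))   # candidate trajectories
        order = np.argsort([cost_fn(s) for s in samples])
        elites = samples[order[:n_elite]]                  # lowest-cost samples
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu

# Toy usage: recover an 8-dimensional target waypoint vector.
target = np.linspace(0.0, 1.0, 8)
best = ce_optimize(lambda s: float(np.sum((s - target) ** 2)), dim=8)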
|
|
13:30-15:00, Paper ThBT19-NT.7 | Add to My Program |
Evaluating the Task Generalization of Temporal Convolutional Networks for Surgical Gesture and Motion Recognition Using Kinematic Data |
|
Hutchinson, Kay | University of Virginia |
Reyes, Ian | IBM |
Li, Zongyu | The University of Virginia |
Alemzadeh, Homa | University of Virginia |
Keywords: Surgical Robotics: Planning, Recognition, Deep Learning Methods
Abstract: Fine-grained activity recognition enables explainable analysis of procedures for skill assessment, autonomy, and error detection in robot-assisted surgery. However, existing recognition models suffer from the limited availability of annotated datasets with both kinematic and video data and an inability to generalize to unseen subjects and tasks. Kinematic data from the surgical robot is particularly critical for safety monitoring and autonomy, as it is unaffected by common camera issues such as occlusions and lens contamination. We leverage an aggregated dataset of six dry-lab surgical tasks from a total of 28 subjects to train activity recognition models at the gesture and motion primitive (MP) levels and for separate robotic arms using only kinematic data. The models are evaluated using the LOUO (Leave-One-User-Out) and our proposed LOTO (Leave-One-Task-Out) cross-validation methods to assess their ability to generalize to unseen users and tasks, respectively. Gesture recognition models achieve higher accuracies and edit scores than MP recognition models. However, using MPs enables the training of models that generalize better to unseen tasks. Also, higher MP recognition accuracy can be achieved by training separate models for the left and right robot arms. For task generalization, MP recognition models perform best if trained on similar tasks and/or tasks from the same dataset.
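The proposed LOTO protocol amounts to the following split logic (an illustrative sketch with an assumed data layout, not the authors' evaluation code):

def loto_splits(samples):
    # samples: list of (features, label, task_id); yields one train/test
    # split per task, holding that task's data out entirely.
    tasks = sorted({task for _, _, task in samples})
    for held_out in tasks:
        train = [s for s in samples if s[2] != held_out]
        test = [s for s in samples if s[2] == held_out]
        yield held_out, train, test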
|
|
13:30-15:00, Paper ThBT19-NT.8 | Add to My Program |
Lens Capsule Tearing in Cataract Surgery Using Reinforcement Learning |
|
Peter, Rebekka Charlotte | Carl Zeiss AG |
Peikert, Steffen | Friedrich-Alexander-University (FAU) Erlangen-Nuremberg |
Haide, Ludwig | Carl Zeiss AG |
Pham, Doan Xuan Viet | Carl Zeiss AG |
Chettaoui, Tahar | Carl Zeiss AG, Karlsruhe Institute of Technology - KIT (master T |
Tagliabue, Eleonora | Carl Zeiss AG |
Scheikl, Paul Maria | Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) |
Fauser, Johannes | Carl Zeiss AG |
Hillenbrand, Matthias | Carl Zeiss AG |
Neumann, Gerhard | Karlsruhe Institute of Technology |
Mathis-Ullrich, Franziska | Friedrich-Alexander-University Erlangen-Nurnberg (FAU) |
Keywords: Surgical Robotics: Planning, Reinforcement Learning, Simulation and Animation
Abstract: Cataract is the leading cause of blindness worldwide, with an increasing number of patients due to changing demographics, making automation an important part of future surgical treatment. In this work, we focus on a substep of cataract surgery, the Continuous Curvilinear Capsulorhexis (CCC). Given its high complexity, this task is an ideal candidate for Reinforcement Learning (RL) in simulation. First, we present an interactive and physically realistic simulation based on the Finite Element Method (FEM) that mimics the tearing behavior of soft tissue during CCC. Then, we train and evaluate RL models in simulation, demonstrating that the trained policies can complete the CCC in 85% of cases. We also show that applying domain randomization techniques makes the policy more robust against changes in geometrical and biomechanical boundary conditions.
|
|
13:30-15:00, Paper ThBT19-NT.9 | Add to My Program |
ORBIT-Surgical: An Open-Simulation Framework for Learning Surgical Augmented Dexterity |
|
Yu, Qinxi | University of Toronto |
Moghani, Masoud | University of Toronto |
Dharmarajan, Karthik | UC Berkeley |
Schorp, Vincent | UC Berkeley, AUTOLab |
Panitch, William | University of California, Berkeley |
Liu, Jingzhou | University of Toronto, NVIDIA |
Hari, Kush | UC Berkeley |
Huang, Huang | University of California at Berkeley |
Mittal, Mayank | ETH Zurich |
Goldberg, Ken | UC Berkeley |
Garg, Animesh | Georgia Institute of Technology |
Keywords: Surgical Robotics: Planning, Simulation and Animation
Abstract: Physics-based simulations have accelerated progress in robot learning for driving, manipulation, and locomotion. Yet, a fast, accurate, and robust surgical simulation environment remains a challenge. In this paper, we present ORBIT-Surgical, a physics-based surgical robot simulation framework with photorealistic rendering in NVIDIA Omniverse. We provide 14 benchmark surgical tasks for the da Vinci Research Kit (dVRK) and Smart Tissue Autonomous Robot (STAR) which represent common subtasks in surgical training. ORBIT-Surgical leverages GPU parallelization to train reinforcement learning and imitation learning algorithms to facilitate study of robot learning to augment human surgical skills. ORBIT-Surgical also facilitates realistic synthetic data generation for active perception tasks. We demonstrate ORBIT-Surgical sim-to-real transfer of learned policies onto a physical dVRK robot. Project website: orbit-surgical.github.io
|
|
ThBT20-NT Oral Session, NT-G302 |
Add to My Program |
Failure Detection and Recovery |
|
|
Chair: Haddadin, Sami | Technical University of Munich |
Co-Chair: Aksoy, Eren Erdal | Halmstad University |
|
13:30-15:00, Paper ThBT20-NT.1 | Add to My Program |
Relaxed Hover Solution Based Control for a Bi-Copter with Rotor and Servo Stuck Failure |
|
Zhao, Haixin | Beihang University |
Li, Ruifeng | Beihang University |
Quan, Quan | Beihang University |
Keywords: Failure Detection and Recovery, Aerial Systems: Mechanics and Control, Robust/Adaptive Control
Abstract: As the usage of bi-copters increases in military and civilian fields, the demand for reliable bi-copters is on the rise. This study focuses on controlling a bi-copter under rotor or servo stuck failure. A relaxed hover solution is derived for the bi-copter by solving an optimization problem subject to rotor and servo stuck failures. The solution is used for designing a reduced attitude controller based on a linear quadratic regulator (LQR). To ensure hover capability, we introduce a position controller based on a cascaded PID. Numerical simulations are conducted to demonstrate that position control is possible, even with complete rotor or servo stuck failure, by driving the bi-copter into the relaxed hover state through the abandonment of the yaw channel. Meanwhile, the fault-tolerant control (FTC) scheme is examined under constant wind disturbances and uncertainties in the rotational damping parameters.
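For reference, a generic continuous-time LQR gain of the kind used above can be computed as follows (the paper's linearization about the relaxed hover solution is not reproduced here; the double integrator is only a stand-in for the reduced attitude dynamics):

import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_gain(A, B, Q, R):
    P = solve_continuous_are(A, B, Q, R)   # solve the algebraic Riccati equation
    return np.linalg.solve(R, B.T @ P)     # K = R^{-1} B^T P

A = np.array([[0.0, 1.0], [0.0, 0.0]])    # toy double-integrator dynamics
B = np.array([[0.0], [1.0]])
K = lqr_gain(A, B, np.eye(2), np.eye(1))  # state feedback u = -K x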
|
|
13:30-15:00, Paper ThBT20-NT.2 | Add to My Program |
Aim-Aware Collision Monitoring: Discriminating between Expected and Unexpected Post-Impact Behaviors |
|
Proper, Benn | Eindhoven University of Technology |
Kurdas, Alexander Andreas | Technical University of Munich |
Abdolshah, Saeed | KUKA Deutschland GmbH |
Haddadin, Sami | Technical University of Munich |
Saccon, Alessandro | Eindhoven University of Technology |
Keywords: Failure Detection and Recovery, Contact Modeling, Perception for Grasping and Manipulation
Abstract: To speed up and reduce power consumption per cycle in robotic manipulation, one option is to exploit intentional collisions with the surrounding environment and objects, an approach referred to as impact-aware manipulation. Within this context, this paper focuses on developing an online collision monitoring framework for distinguishing between expected and unexpected post-impact behaviors. The classification is based on a desired post-impact motion created via an idealized rigid robot-object-environment model. To generate a classification error bound, it employs a causal envelope filter that is needed due to the unavoidable joint and environment flexibility. In this way, it becomes possible to compare a desired idealized rigid response, which is straightforward to obtain with existing tools, with a measured impact response, which is affected by difficult-to-model post-impact oscillations. The classifier can be used for single-contact as well as multi-contact impact scenarios, such as those occurring in surface-to-surface impacts, and allows for tuning of the sensitivity between expected and unexpected post-impact behaviors. The monitoring framework fuses a (bandpass) momentum observer with impact-aware control to extend the classical collision event pipeline. As a proof of concept, we show the effectiveness of the approach through numerical simulations as well as preliminary experimental results.
|
|
13:30-15:00, Paper ThBT20-NT.3 | Add to My Program |
The Voraus-AD Dataset for Anomaly Detection in Robot Applications |
|
Brockmann, Jan Thieß | Voraus Robotik GmbH |
Rudolph, Marco | Leibniz University Hannover |
Rosenhahn, Bodo | Institute of Information Processing, Leibniz Universität Hannover |
Wandt, Bastian | Linköping University |
Keywords: Failure Detection and Recovery, Datasets for Anomaly Detection, Deep Learning in Robotics and Automation, Probability and Statistical Methods
Abstract: During the operation of industrial robots, unusual events may endanger the safety of humans and the quality of production. When collecting data to detect such cases, it is not ensured that data from all potentially occurring errors is included, as unforeseeable events may happen over time. Therefore, anomaly detection (AD) delivers a practical solution, using only normal data to learn to detect unusual events. We introduce a dataset that allows training and benchmarking of anomaly detection methods for robotic applications based on machine data, and it will be made publicly available to the research community. As a typical robot task, the dataset includes a pick-and-place application which involves movement, actions of the end effector, and interactions with the objects of the environment. Since several of the contained anomalies are not task-specific but general, evaluations on our dataset are transferable to other robotics applications as well. Additionally, we present MVT-Flow as a new baseline method for anomaly detection: it relies on deep-learning-based density estimation with normalizing flows, tailored to the data domain by taking its structure into account.
|
|
13:30-15:00, Paper ThBT20-NT.4 | Add to My Program |
Multimodal Detection and Classification of Robot Manipulation Failures |
|
Inceoglu, Arda | Istanbul Technical University |
Aksoy, Eren Erdal | Halmstad University |
Sariel, Sanem | Istanbul Technical University |
Keywords: Failure Detection and Recovery, Deep Learning in Grasping and Manipulation, Sensor Fusion
Abstract: An autonomous service robot should be able to interact with its environment safely and robustly without requiring human assistance. Unstructured environments are challenging for robots since the exact prediction of outcomes is not always possible. Even when robot behaviors are well-designed, the unpredictable nature of physical robot-object interaction may prevent success in object manipulation. Therefore, the execution of a manipulation action may result in an undesirable outcome involving accidents or damage to the objects or environment. Situation awareness becomes important in such cases to enable the robot to (i) maintain the integrity of both itself and the environment, (ii) recover from failed tasks in the short term, and (iii) learn to avoid failures in the long term. For this purpose, robot executions should be continuously monitored, and failures should be detected and classified appropriately. In this work, we focus on detecting and classifying both manipulation and post-manipulation phase failures using the same exteroception setup. We cover a diverse set of failure types for primary tabletop manipulation actions. In order to detect these failures, we propose FINO-Net [1], a deep multimodal sensor-fusion-based classifier network. The proposed network accurately detects and classifies failures from raw sensory data without any prior knowledge. In this work, we use our extended FAILURE dataset [1] with 99 new multimodal manipulation recordings and annotate them with failure labels.
|
|
13:30-15:00, Paper ThBT20-NT.5 | Add to My Program |
FT-Net: Learning Failure Recovery and Fault-Tolerant Locomotion for Quadruped Robots |
|
Luo, Zeren | The University of Hong Kong |
Xiao, Erdong | The University of Hong Kong |
Lu, Peng | The University of Hong Kong |
Keywords: Failure Detection and Recovery, Legged Robots
Abstract: Quadruped robots have in recent years been increasingly used in extremely harsh and dangerous conditions. Consequently, diverse severe hardware failures may occur at any time during the working cycle of the robots. In this work, we propose a fault-tolerant (FT) control pipeline based on model-free reinforcement learning -- FT-Net, which is guided by an inverted pendulum model and the support polygon. This pipeline allows the robot to dynamically and autonomously adapt to both partial and complete motor failures. Unlike conventional FT control methods that need to pinpoint the failed location, our controller identifies the fault implicitly with a neural network-based adaptor. Furthermore, we achieve a unified policy that is capable of switching from four-legged to three-legged walking mode when a complete motor failure occurs. Both extensive simulation and hardware experiments show that FT-Net learns to effectively perform recovery behaviors. The fault-tolerant locomotion can even be executed in various dynamic tasks and terrains.
|
|
13:30-15:00, Paper ThBT20-NT.6 | Add to My Program |
Utilizing a Malfunctioning 3D Printer by Modeling Its Dynamics with Machine Learning |
|
Caballero, Renzo | King Abdullah University of Science and Technology |
Piękos, Piotr | King Abdullah University of Science and Technology |
Feron, Eric | King Abdullah University of Science and Technology |
Schmidhuber, Jurgen | Technische Universität München |
Keywords: Failure Detection and Recovery, Model Learning for Control, Robust/Adaptive Control
Abstract: To create a self-repairing 3D printer, it must continue operating even after experiencing corruption. This work focuses on developing a method to effectively utilize a malfunctioning printer for reliable printing. This method can be applied by the printer itself for self-repair and can enhance the reliability of commercial 3D printers. We achieve this by modeling the dynamics of the corrupted printer with a machine learning model that infers the corrupted printer's dynamics from a single observed trajectory to improve its accuracy. Our method is evaluated on a digital twin of the 3D printer, demonstrating its capability to enable the printer to operate reliably, even when encountering new corruptions not seen during training.
|
|
13:30-15:00, Paper ThBT20-NT.7 | Add to My Program |
A Novel Metric for Detecting Quadrotor Loss-Of-Control |
|
van Beers, Jasper | Delft University of Technology |
Solanki, Prashant | Delft University of Technology |
de Visser, Coen | TU Delft |
Keywords: Failure Detection and Recovery, Robot Safety, Aerial Systems: Mechanics and Control
Abstract: Unmanned aerial vehicles (UAVs) are becoming an integral part of both industry and society. In particular, the quadrotor is now invaluable across a plethora of fields, and recent developments, such as the inclusion of aerial manipulators, only extend their versatility. As UAVs become more widespread, preventing loss-of-control (LOC) is an ever-growing concern. Unfortunately, LOC is not clearly defined for quadrotors, or indeed, many other autonomous systems. Moreover, any existing definitions are often incomplete and restrictive. A novel metric, based on actuator capabilities, is introduced to detect LOC in quadrotors. The potential of this metric for LOC detection is demonstrated through both simulated and real quadrotor flight data. It is able to detect LOC induced by actuator faults without explicit knowledge of the occurrence and nature of the failure. The proposed metric is also sensitive enough to detect LOC in more nuanced cases, where the quadrotor remains undamaged but nevertheless loses control through an aggressive yawing manoeuvre. As the metric depends only on system and actuator models, it is sufficiently general to be applied to other systems.
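One simple actuator-capability indicator in this spirit (our illustrative reading, not the paper's metric) measures how much headroom the commanded rotor inputs leave before saturation; sustained near-zero headroom would flag LOC risk.

import numpy as np

def actuator_headroom(u_cmd, u_min, u_max):
    # u_cmd: commanded rotor inputs; returns a margin in [0, 1], 0 = saturated.
    margin = np.minimum(u_cmd - u_min, u_max - u_cmd) / (0.5 * (u_max - u_min))
    return float(np.clip(margin, 0.0, 1.0).min())

# Flag LOC risk when most samples in a window sit close to saturation.
window = [actuator_headroom(u, 0.0, 1.0) for u in np.random.rand(100, 4)]
loc_suspected = np.mean(np.array(window) < 0.05) > 0.5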
|
|
13:30-15:00, Paper ThBT20-NT.8 | Add to My Program |
Specifying and Monitoring Safe Driving Properties with Scene Graphs |
|
Toledo, Felipe | University of Virginia |
Woodlief, Trey | University of Virginia |
Elbaum, Sebastian | University of Virginia |
Dwyer, Matthew | University of Virginia |
Keywords: Failure Detection and Recovery, Robot Safety, Semantic Scene Understanding
Abstract: With the proliferation of autonomous vehicles (AVs) comes the need to ensure they abide by safe driving properties. Specifying and monitoring such properties, however, is challenging because of the mismatch between the semantic space over which typical driving properties are asserted (e.g., vehicles, pedestrians, intersections) and the sensed inputs of AVs. Existing efforts either assume for such semantic data to be available or develop bespoke methods for capturing it. Instead, this work introduces a framework that can extract scene graphs (SGs) from sensor inputs to capture the entities related to the AV, and a domain-specific language that enables building propositions over those graphs and composing them through temporal logic. We implemented the framework to monitor for specification violations of 3 top AVs from the CARLA Autonomous Driving Leaderboard, and found that the AVs violated 71% of properties during at least one test. Artifact available at https://github.com/less-lab-uva/SGSM
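The flavor of propositions over scene graphs composed through temporal logic can be sketched as follows (the graph encoding and names are ours, not the artifact's DSL):

def prop_ped_ahead(sg):
    return any(e == ("pedestrian", "ahead_of", "ego") for e in sg["edges"])

def prop_braking(sg):
    return sg["ego"]["decel"] > 0.5

def always_implies(trace, antecedent, consequent):
    # G(antecedent -> consequent) over a finite trace of scene graphs.
    return all((not antecedent(sg)) or consequent(sg) for sg in trace)

trace = [{"edges": [("pedestrian", "ahead_of", "ego")], "ego": {"decel": 0.9}}]
print(always_implies(trace, prop_ped_ahead, prop_braking))   # True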
|
|
13:30-15:00, Paper ThBT20-NT.9 | Add to My Program |
Using Large Language Models to Generate and Apply Contingency Handling Procedures in Collaborative Assembly Applications |
|
Kang, Jeon Ho | University of Southern California |
Dhanaraj, Neel | University of Southern California |
Wadaskar, Siddhant Ravindra | University of Southern California |
Gupta, Satyandra K. | University of Southern California |
Keywords: Failure Detection and Recovery, Task Planning, Intelligent and Flexible Manufacturing
Abstract: In manufacturing, minimizing operational delays is crucial for efficiency and resilience. Therefore, efficiently handling contingencies is an important capability for human-robot teams working on assembly (i.e., collaborative assembly) applications. This paper introduces a novel approach to generating contingency handling procedures by leveraging recent advances in Large Language Models (LLMs). Our approach uses LLMs to update the required tasks in hierarchical task networks (HTNs) to handle contingencies. The results demonstrate that our approach is able to handle a wide variety of contingencies in assembly applications and minimizes the impact on assembly completion time.
|
|
ThBT21-NT Oral Session, NT-G303 |
Add to My Program |
Micro/Nano Robots II |
|
|
Chair: Yamanishi, Yoko | Kyushu University |
Co-Chair: Zhang, Jiachen | City University of Hong Kong |
|
13:30-15:00, Paper ThBT21-NT.1 | Add to My Program |
A Robotic Surgery Platform for Automated Tissue Micromanipulation in Zebrafish Embryos |
|
Ozelci, Ece | EPFL |
Etesami, Erfan | EPFL |
Rohde, Laurel | EPFL |
Oates, Andrew | EPFL |
Sakar, Mahmut Selman | EPFL |
Keywords: Automation at Micro-Nano Scales, Biological Cell Manipulation
Abstract: Microsurgical manipulations are key experimental techniques in life science research, particularly in embryology. These techniques are most often performed manually by highly skilled scientists, posing limitations on speed, precision, and reproducibility. Here we introduce a fully automated robotic microsurgery platform that generates explants of specific tail tissue from growing zebrafish embryos, a popular model organism for vertebrate development. Our work leverages both classical and deep learning-based image-processing techniques to perform robotic micromanipulation on biological specimens. Using two example experimental cases as proof of concept, we show that our automated platform is more precise, accurate, and efficient than teleoperated and manual microsurgery conducted by experienced scientists. Moreover, we demonstrate the usefulness of our platform for inexperienced experimentalists, supporting an important role for robotic microsurgery in broadening the use of such techniques in experimental research.
|
|
13:30-15:00, Paper ThBT21-NT.2 | Add to My Program |
Skill Learning in Robot-Assisted Micro-Manipulation through Human Demonstrations with Attention Guidance |
|
An, Yujian | Shanghai Jiao Tong University |
Yang, Jianxin | Shanghai Jiao Tong University |
Li, Jinkai | Shanghai Jiao Tong University |
He, Bingze | Shanghai Jiao Tong University |
Guo, Yao | Shanghai Jiao Tong University |
Yang, Guang-Zhong | Shanghai Jiao Tong University |
Keywords: Automation at Micro-Nano Scales, Learning from Experience, Learning from Demonstration
Abstract: For the development of robotic systems for micro-manipulation, it is challenging to design appropriate control strategies due to either the lack of sufficient information for feedback or the difficulty in extracting subtle yet critical visual features. With the same system under the teleoperated mode, however, human operators seem to be able to complete the task more successfully with an inherent motion and control strategy. The extraction of implicit human attention during the task and its integration with robot control could provide crucial guidance in the design of feature extraction and motion control algorithms. In this paper, a micro-assembly task of miniature thin membrane sensors is considered. For human demonstrations, we collected data from repeated tests performed by ten operators following three motion strategies. The human attention during the task is explored according to the coordinates of the eye gaze, and then a neural network with gaze-guided attention is trained to segment the visual Region of Interest (ROI). After quantitative evaluation of operator results in terms of success rate, efficiency, reset time, and the Index of Pupillary Activity (IPA), an optimized motion strategy based on the "palpation" framework was derived. Consequently, we apply this strategy to automated tasks and achieve results superior to those of human operators, with an average task completion time of 34.8±5.9 s and a success rate of over 90%.
|
|
13:30-15:00, Paper ThBT21-NT.3 | Add to My Program |
Automated Surgical Knot Tying on Mini-Incision with Micro-Suture Based on Dual-Arm Nanorobot under Stereo Microscope |
|
Jiang, Yujie | ShanghaiTech University |
Fu, Xiang | ShanghaiTech University |
Zhong, Chengxi | ShanghaiTech University |
Li, Teng | ShanghaiTech University |
Lu, Haojian | Zhejiang University |
Liu, Song | ShanghaiTech University |
Keywords: Automation at Micro-Nano Scales, Micro/Nano Robots, Dual Arm Manipulation
Abstract: Knot tying is an essential task for robotic surgery, which is routinely realized by dual-arm robotic manipulation. Despite the well-established protocols and progress at the macro scale so far, challenges remain in further advancing robotic knot-tying techniques, particularly in terms of decreasing space consumption while offering better dexterity, higher precision, and good biomechanical compatibility. In this paper, we propose a novel dual-arm nanorobotic system for automated knot tying performed on a mini-incision under a stereo microscope, featuring an additional rotational degree of freedom mounted on each arm. With this setup, an optimized motion trajectory planning under the standard knot-tying protocol is also presented in order to support tying knots with shorter and thinner suture. Leveraging the natural advantages of nanorobotics and microscopy, the proposed system is capable of tying consecutive throws with micro-suture on a mini-incision, as in vascular anastomosis or microsurgery. We successfully evaluated the knot-tying system on a 2.0 mm wide bionic blood vessel with 30 mm long #8-0 micro-suture. We finally tested the mechanical strength of the knots for potential medical assessment.
|
|
13:30-15:00, Paper ThBT21-NT.4 | Add to My Program |
Weakly-Supervised Depth Completion During Robotic Micromanipulation from a Monocular Microscopic Image |
|
Yang, Han | The Chinese University of Hong Kong, Shenzhen |
Jin, Yufei | The Chinese University of Hong Kong (Shenzhen) |
Shan, Guanqiao | University of Toronto |
Wang, Yibin | The Chinese University of Hong Kong, Shenzhen |
Zheng, YongBin | The Chinese University of Hong Kong, Shenzhen |
Yu, Jiangfan | Chinese University of Hong Kong, Shenzhen |
Sun, Yu | University of Toronto |
Zhang, Zhuoran | The Chinese University of Hong Kong, Shenzhen |
Keywords: Biological Cell Manipulation, Automation at Micro-Nano Scales, Deep Learning Methods
Abstract: Obtaining three-dimensional information, especially z-axis depth information, is crucial for robotic micromanipulation. Due to the unavailability of depth sensors such as lidars in micromanipulation setups, traditional depth acquisition methods such as depth from focus or depth from defocus infer depth directly from microscopic images and suffer from poor resolution. Alternatively, micromanipulation tasks obtain accurate depth information by detecting the contact between an end-effector and an object (e.g., a cell). Despite its high accuracy, only sparse depth data can be obtained due to its low efficiency. This paper aims to address the challenge of acquiring dense depth information during robotic cell micromanipulation. A weakly-supervised depth completion network is proposed that takes cell images and sparse depth data obtained by contact detection as input to generate a dense depth map. A two-stage data augmentation method is proposed to augment the sparse depth data, and the depth map is optimized by a network refinement method. The experimental results show that the MAE of the depth prediction error is less than 0.3 µm, which proves the accuracy and effectiveness of the method. This deep learning pipeline can be seamlessly integrated with robotic micromanipulation tasks to provide accurate depth information.
|
|
13:30-15:00, Paper ThBT21-NT.5 | Add to My Program |
Dynamic Adaptive Imaging System on Optoelectronic Tweezers Platform |
|
Wang, Ao | Beihang University |
Gan, Chunyuan | Beihang University |
Han, Haocheng | Beihang University |
Xiong, Hongyi | Beihang University |
Zhao, Jiawei | Beihang University, School of Mechanical Engineering and Automation |
Wang, Chutian | Beihang University |
Feng, Lin | Beihang University |
Keywords: Biological Cell Manipulation, Automation at Micro-Nano Scales, Micro/Nano Robots
Abstract: Optoelectronic tweezers (OET) has shown great promise in various applications, especially in the precise manipulation of microparticles and microorganisms on a micron and nanometer scale. This technology significantly enhances the efficiency of single-cell sorting and the development of antibody-based drugs. However, conventional OET platforms are limited by issues such as low autofocusing accuracy, restricted imaging field of view, and uneven illumination. To overcome these limitations, we have innovatively developed a dynamic adaptive imaging system. By incorporating peak-finding and in situ Gaussian blur compensation algorithms, we achieved rapid automatic focusing and illumination shadow compensation across an expanded field of view. At the same time, the system can also dynamically adjust compensation parameters under different lighting conditions. Our system has successfully completed comprehensive scanning of the optoelectronic tweezers chip, achieving a 60% reduction in autofocus time and a 15.8% improvement in lighting uniformity. Moreover, this imaging system demonstrates robust versatility and can serve as a reference for other optical systems.
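The peak-finding autofocus step can be illustrated with a standard sharpness sweep (variance of the Laplacian is an assumed focus metric; the paper's peak-finding and compensation algorithms may differ):

import cv2
import numpy as np

def sharpness(gray):
    return cv2.Laplacian(gray, cv2.CV_64F).var()   # variance-of-Laplacian metric

def autofocus(stack):
    # stack: list of grayscale frames captured while sweeping the focus axis;
    # returns the index of the sharpest slice and all scores.
    scores = np.array([sharpness(frame) for frame in stack])
    return int(np.argmax(scores)), scores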
|
|
13:30-15:00, Paper ThBT21-NT.6 | Add to My Program |
Automated Dissection of Intact Single Cell from Tissue Using Robotic Micromanipulation System |
|
Zhang, Youchao | Zhejiang University |
Guo, Xiangyu | Zhejiang University |
Wang, Qingyu | Zhejiang University |
Wang, Fanghao | Zhejiang University |
Liu, Chuanjie | Zhejiang University |
Zhou, Mingchuan | Zhejiang University |
Ying, Yibin | Zhejiang University |
Keywords: Biological Cell Manipulation, Automation at Micro-Nano Scales, Visual Tracking
Abstract: Obtaining single cells from tissue is important for research at the intersection of information science and bioscience. In this article, a robotic framework based on a micromanipulation system is proposed, which can automatically and intelligently cut intact single cells from tissue sections. The proposed method consists of several steps. An attention-mechanism-improved (AMI) tip localization neural network is proposed to detect and track the needle tip of the micro-scale displacement end-effector within the limited field of view under microscopy. Then, the transformation matrix between the camera and the coordinate system of the robot is calculated, and the cutting trajectory is generated and optimized. Finally, the end-effector is controlled to obtain intact single cells from tissue by model predictive control (MPC). The performance of the framework is verified in a paraffin tissue section dissection experiment, which shows the proposed framework is robust and precise enough to obtain an intact single cell. The error of autonomous single-cell dissection is no more than 0.61 µm.
|
|
13:30-15:00, Paper ThBT21-NT.7 | Add to My Program |
Development of a 3-RRS Micromanipulator Based on Origami-Inspired Spherical Joint |
|
Han, Haoqi | Shanghai Jiao Tong University |
Liu, Xiaoming | Beijing Institute of Technology |
Chen, Yan | Beijing Institute of Technology |
Pang, Hao | Beijing Institute of Technology |
Tang, Xiaoqing | Beijing Institute of Technology |
Liu, Dan | Beijing Institute of Technology |
Huang, Qiang | Beijing Institute of Technology |
Arai, Tatsuo | University of Electro-Communications |
Keywords: Micro/Nano Robots, Compliant Joints and Mechanisms, Parallel Robots
Abstract: In recent years, micromanipulation technology has achieved extensive application in industry and life science. Improving the precision and bandwidth of micromanipulators while simultaneously reducing size, weight, and cost poses significant challenges to existing micromanipulator design and fabrication methods. Here, we propose a 3-RRS micromanipulator with an origami-inspired spherical joint based on the PC-MEMS process, aiming for miniaturization and cost-effectiveness. The spherical joint allows rotations of approximately 140° around the x-axis, 140° around the y-axis, and 20° around the z-axis. The micromanipulator weighs 0.8 g, measures 16 mm × 16 mm × 22 mm, and has a workspace of 0.7 mm³. The end platform of the micromanipulator can be equipped with various effectors to accomplish different kinds of tasks. Experimental results validated its high precision and bandwidth, exhibiting its potential to perform intricate micromanipulation tasks.
|
|
13:30-15:00, Paper ThBT21-NT.8 | Add to My Program |
Singularity Analysis and Solutions for the Origami Transmission Mechanism of Fast-Moving Untethered Insect-Scale Robot |
|
Liu, Yide | Zhejiang University |
Feng, Bo | Zhejiang University |
Cheng, Tianlun | Zhejiang University |
Chen, Yanhong | Zhejiang University |
Liu, Xiyan | Zhejiang University |
Zhang, Jiahang | Zhejiang University |
Qu, Shaoxing | Zhejiang University |
Yang, Wei | Zhejiang University |
Keywords: Micro/Nano Robots, Mechanism Design, Parallel Robots, Grassmann-Cayley algebra
Abstract: Designing insect-scale robots with high mobility is becoming an essential challenge in the field of robotics research. Among the methods for fabricating the transmission mechanism of an insect-scale robot, the smart composite microstructure (SCM) method is attracting increasing attention. This method can construct compact and functional miniature origami mechanisms through planarized fabrication and folding assembly processes. Our previous work proposed an untethered robot, S2worm, equipped with a novel 2-DoF origami transmission mechanism. The S2worm is fabricated through SCM and holds a top speed of 27.4 cm/s. In this work, we propose a novel strategy for designing insect-scale robots with high mobility: applying Grassmann-Cayley algebra to avoid singularities of the transmission mechanism. The experimental results prove that the singularity of the previous work has been solved. The new robot prototype, S2worm-G, weighs 4.71 g, measures 4.0 cm, and achieves a top speed of 75.0 cm/s and a relative speed of 18.8 bodylengths/s. To the best of our knowledge, the 2-DoF origami transmission mechanism is the first parallel mechanism designed for an insect-scale robot, and its singularity is identified and solved here. The experimental results prove that the refined S2worm-G robot is among the best insect-scale robots for its size, mass, and mobility.
|
|
13:30-15:00, Paper ThBT21-NT.9 | Add to My Program |
A Theoretical Investigation of the Ability of Magnetic Miniature Robots to Exert Forces and Torques for Biomedical Functionalities |
|
Xiang, Yuxuan | City University of Hong Kong |
Zhang, Jiachen | City University of Hong Kong |
Keywords: Micro/Nano Robots, Medical Robots and Systems
Abstract: Magnetic miniature robots exert forces and torques on the environment to conduct minimally invasive diagnostic and therapeutic tasks. The orders of magnitude of these forces and torques determine what functionalities the robots can achieve. Although some studies have been reported in scattered form, the forces and torques have yet to be systematically investigated within a biomedical context from underlying physical principles, leaving their theoretical limits elusive. This work constructs a theoretical framework from governing equations to calculate the forces and torques exerted by magnetic miniature robots in their respective targeted workspaces to achieve functionalities. It reports that existing miniature robots with a maximum characteristic length of 10^-2 m can exert a force and a torque up to the order of 10^-1 N and 10^-2 Nm, respectively, considering realistic actuation paradigms and constraints. The attainable force and torque magnitudes are on par with the requirements of surgeries at the human head (e.g., brain, eye, and ear surgeries) or within regions adjacent to human skin (e.g., surgeries in the bladder and some blood vessels), as well as surgeries on small animals. However, they are insufficient for operations in deep-buried regions of large animals and humans (e.g., implant therapy, biopsy, and tissue removal). Hence, potential strategies to raise the ceiling of these ranges are examined to extend the functionality catalog and expand the operating scope of these robots.
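The quoted orders of magnitude can be sanity-checked with a back-of-the-envelope dipole calculation (material and field values below are our own assumptions, not the paper's framework): torque |m × B| ≤ mB and gradient-pulling force |(m·∇)B| ≤ m|∇B|, with dipole moment m = MV.

M = 1e6            # magnetization of an NdFeB-like material, A/m (assumed)
L = 1e-2           # characteristic length from the abstract, m
V = L ** 3         # volume, m^3
B = 1e-2           # assumed typical field magnitude, T
gradB = 1e-1       # assumed typical field gradient, T/m

m = M * V                      # dipole moment: 1.0 A*m^2
print("torque ~", m * B)       # ~1e-2 Nm, matching the abstract
print("force  ~", m * gradB)   # ~1e-1 N, matching the abstract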
|
|
ThBT22-NT Oral Session, NT-G304 |
Add to My Program |
Telerobotics and Teleoperation I |
|
|
Chair: Yokokohji, Yasuyoshi | Kobe University |
Co-Chair: Muratore, Luca | Istituto Italiano Di Tecnologia |
|
13:30-15:00, Paper ThBT22-NT.1 | Add to My Program |
Wearable Haptics for a Marionette-Inspired Teleoperation of Highly Redundant Robotic Systems |
|
Torielli, Davide | Humanoids and Human Centered Mechatronics (HHCM), Istituto Italiano di Tecnologia |
Franco, Leonardo | University of Siena |
Pozzi, Maria | University of Siena |
Muratore, Luca | Istituto Italiano Di Tecnologia |
Malvezzi, Monica | University of Siena |
Tsagarakis, Nikos | Istituto Italiano Di Tecnologia |
Prattichizzo, Domenico | University of Siena |
Keywords: Telerobotics and Teleoperation, Haptics and Haptic Interfaces, Human-Centered Robotics
Abstract: The teleoperation of complex, kinematically redundant robots with loco-manipulation capabilities represents a challenge for human operators, who have to learn how to operate the many degrees of freedom of the robot to accomplish a desired task. In this context, developing an easy-to-learn and easy-to-use human-robot interface is paramount. Recent works introduced a novel teleoperation concept that relies on a virtual physical interaction interface between the human operator and the remote robot, equivalent to "Marionette" control, but whose feedback on the human side was limited to vision. In this paper, we propose extending the "Marionette" interface by adding a wearable haptic interface to cope with the limitations of the previous work. Leveraging the additional haptic feedback modality, the human operator gains full sensorimotor control over the robot, and awareness of the robot's response and interactions with the environment is greatly improved. We evaluated the proposed interface and the related teleoperation framework with naive users, assessing the teleoperation performance and the user experience with and without haptic feedback. The conducted experiments consisted of a loco-manipulation mission with the CENTAURO robot, a hybrid leg-wheel quadruped with a humanoid dual-arm upper body.
|
|
13:30-15:00, Paper ThBT22-NT.2 | Add to My Program |
NetLfD: Network-Aware Learning from Demonstration for In-Contact Skills Via Teleoperation |
|
Güleçyüz, Başak | Technical University of Munich |
von Büren, Vincent | Technical University of Munich |
Xu, Xiao | Technical University of Munich |
Steinbach, Eckehard | Technical University of Munich |
Keywords: Learning from Demonstration, Telerobotics and Teleoperation
Abstract: When providing task demonstrations to a remote robot over the network via bilateral teleoperation, communication impairments are unavoidable, hindering the human operator from delivering high-quality demonstrations. Poor-quality demonstrations can negatively impact the robot's ability to learn and generalize. In this work, we propose to enhance learning performance by introducing a network-aware confidence weighting strategy for remote learning from demonstration. Our approach extends the Hidden Semi-Markov Model (HSMM) and its task-parameterized version (TP-HSMM) to their confidence-weighted versions, WHSMM and WTP-HSMM. We evaluated various weight metrics that serve as teleoperation transparency measures and demonstration quality indicators under varying communication delays. We validated the proposed approach in two different in-contact tasks using data collected from 18 participants. The results show that weighting improves task performance in reproduction by up to 42% in force precision and 63% in success rate, demonstrating the potential of the proposed approach to enhance the effectiveness of robot learning from remote demonstrations.
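The confidence-weighting idea can be sketched at its simplest (the WHSMM/WTP-HSMM formulation is in the paper; the delay-to-weight mapping below is purely illustrative):

import numpy as np

def demo_weight(delays_ms, d0=50.0):
    # Map the per-sample round-trip delays of one demonstration to a
    # confidence in (0, 1]; demonstrations recorded over slower links
    # contribute less to the learned model.
    return float(np.mean(np.exp(-np.asarray(delays_ms) / d0)))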
|
|
13:30-15:00, Paper ThBT22-NT.3 | Add to My Program |
Lightweight and Compliant Bilateral Teleoperation System with Anthropomorphic Arms for Aerial and Ground Service Operations |
|
Suarez, Alejandro | University of Seville |
Gonzalez-Morgado, Antonio | Universidad De Sevilla |
Ollero, Anibal | AICIA. G41099946 |
Keywords: Telerobotics and Teleoperation, Dual Arm Manipulation, Aerial Systems: Applications
Abstract: This paper presents a bilateral teleoperation system based on smart servos for the realization of dexterous manipulation tasks with aerial robots or in ground service applications, facilitating the transfer of the cognitive capabilities of human workers to robots operating remotely or in high-altitude workspaces. The system consists of a pair of lightweight and compliant anthropomorphic dual-arm manipulators (LiCAS) in a leader-follower configuration. The leader dual arm (LDA) captures the movements of the operator's arms to obtain the desired joint references, which are sent to the follower dual arm (FDA) to reproduce the manipulation task in a natural and intuitive way. A model of the smart servos is derived, exploiting the feedback from the FDA actuators to provide kinesthetic feedback to the LDA, using the pulse-width modulation (PWM) signal along with the joint speed to estimate the interaction torque. The mechanical joint compliance of the FDA allows the passive accommodation of the arms to physical interactions with the manipulated objects or the environment, whereas the very low weight of the arms (1.0 kg LDA, 2.5 kg FDA) and the human-size, human-like kinematics facilitate their use in a wide variety of applications. The performance of the system is evaluated using an industrial task board for benchmarking, and in two illustrative bimanual aerial manipulation tasks.
|
|
13:30-15:00, Paper ThBT22-NT.4 | Add to My Program |
Intelligent Mode-Switching Framework for Teleoperation |
|
Kizilkaya, Burak | University of Glasgow |
She, Changyang | University of Sydney |
Zhao, Guodong | University of Glasgow, UK |
Imran, Muhammad Ali | University of Glasgow |
Keywords: Telerobotics and Teleoperation, AI-Enabled Robotics, Reinforcement Learning
Abstract: Teleoperation can be very difficult due to limited perception, high communication latency, and limited degrees of freedom (DoFs) at the operator side. Autonomous teleoperation is proposed to overcome this difficulty by predicting user intentions and performing some parts of the task autonomously to decrease the demand on the operator and increase the task completion rate. However, decision-making for mode-switching is generally assumed to be done by the operator, which brings an extra DoF to be controlled by the operator and introduces extra mental demand. On the other hand, the communication perspective is not investigated in the current literature, although communication imperfections and resource limitations are the main bottlenecks for teleoperation. In this study, we propose an intelligent mode-switching framework by jointly considering mode-switching and communication systems. User intention recognition is done at the operator side. Based on user intention recognition, a deep reinforcement learning (DRL) agent is trained and deployed at the operator side to seamlessly switch between autonomous and teleoperation modes. A real-world data set is collected from our teleoperation testbed to train both user intention recognition and DRL algorithms. Our results show that the proposed framework can achieve up to 50% communication load reduction with improved task completion probability.
|
|
13:30-15:00, Paper ThBT22-NT.5 | Add to My Program |
Digital Twin-Driven Mixed Reality Framework for Immersive Teleoperation with Haptic Rendering |
|
Fan, Wen | University of Bristol |
Guo, Xiaoqing | University of Bristol |
Feng, Enyang | University of Bristol |
Lin, Jialin | University of Bristol |
Wang, Yuanyi | City University of Hong Kong |
Liang, Jiaming | Tencent |
Garrad, Martin | University of Bristol |
Rossiter, Jonathan | University of Bristol |
Zhang, Zhengyou | Tencent |
Lepora, Nathan | University of Bristol |
Wei, Lei | Deakin University |
Zhang, Dandan | Imperial College London |
Keywords: Telerobotics and Teleoperation, Haptics and Haptic Interfaces
Abstract: Teleoperation has contributed widely to many applications. Consequently, the design of intuitive and ergonomic control interfaces for teleoperation has become crucial. The rapid advancement of Mixed Reality (MR) has yielded tangible benefits in human-robot interaction. MR provides an immersive environment for interacting with robots, effectively reducing the mental and physical workload of operators during teleoperation. Additionally, the incorporation of haptic rendering, including kinaesthetic and tactile rendering, could further amplify the intuitiveness and efficiency of MR-based immersive teleoperation. In this study, we developed an immersive, bilateral teleoperation system, integrating Digital Twin-driven Mixed Reality (DTMR) manipulation with haptic rendering. This system comprises a commercial remote controller with a kinaesthetic rendering feature and a wearable, cost-effective tactile rendering interface called the Soft Pneumatic Tactile Array (SPTA). We carried out two user studies to assess the system's effectiveness, including a performance evaluation of key components within DTMR and a quantitative assessment of the newly developed SPTA. The results demonstrate an enhancement in both the human-robot interaction experience and teleoperation performance.
|
|
13:30-15:00, Paper ThBT22-NT.6 | Add to My Program |
Design Octree-Based Method to Improve Model-Mediated Teleoperation in Tactile Internet |
|
Antonsen, Mads Mørch | Aarhus University |
Chinello, Francesco | Aarhus University |
Zhang, Qi | Aarhus University |
Keywords: Telerobotics and Teleoperation, Haptics and Haptic Interfaces
Abstract: In this paper, we propose a model-mediated teleoperation (MMT) system using an octree-based model (OBM) to spatially map the environment impedance for emerging Tactile Internet use cases. Unlike existing just-noticeable-difference (JND) based MMT, our method avoids continuous transmission of the environment impedance. Moreover, it allows the local model to generate accurate force feedback and reduces the number of model updates. Furthermore, the OBM can be deployed with or without prior knowledge of the environment. An online estimation of the OBM is proposed using a JND and a rate-of-change threshold. An offline estimation method is also proposed for when the geometry and impedance parameters of the remote environment are known. In addition, a point cloud-based force rendering algorithm is tailored to the OBM, thereby allowing the generation of force feedback for complex environments. An experiment without a human in the loop showed that, for an online estimated OBM, the accuracy of the force feedback was improved by up to 44 percent while using less than half the number of model updates compared to JND-based MMT. Another experiment with a human operator interacting with a virtual environment showed that using an offline estimated OBM improves the accuracy of the force feedback and is reliable against packet loss and short temporal breakdowns of the communication link.
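To make the JND-gated update idea tangible, here is a minimal sketch of an octree-style spatial impedance map that only commits an update when the relative change exceeds a Weber-type JND threshold. The flat quantization (standing in for true hierarchical subdivision), the leaf size, and the 10% JND fraction are assumptions for illustration, not the paper's parameters.

```python
# Minimal octree-style impedance map with a JND-gated update rule.
# Fixed leaf size and the 10% JND fraction are illustrative
# assumptions; the paper's OBM and thresholds may differ.
import math

class ImpedanceOctree:
    def __init__(self, leaf_size=0.05, jnd_fraction=0.10):
        self.leaf_size = leaf_size        # edge length of a leaf cell [m]
        self.jnd = jnd_fraction           # Weber fraction for stiffness JND
        self.cells = {}                   # leaf key -> stiffness estimate
        self.updates = 0                  # number of committed model updates

    def _key(self, p):
        # Quantize a 3D contact point to its leaf cell (a flat stand-in
        # for true hierarchical subdivision, kept short for clarity).
        return tuple(math.floor(c / self.leaf_size) for c in p)

    def update(self, p, stiffness):
        """Commit a new stiffness only if it is just-noticeably different."""
        k = self._key(p)
        old = self.cells.get(k)
        if old is None or abs(stiffness - old) > self.jnd * abs(old):
            self.cells[k] = stiffness
            self.updates += 1

    def force(self, p, penetration):
        """Local model renders feedback without new transmissions."""
        return self.cells.get(self._key(p), 0.0) * penetration

tree = ImpedanceOctree()
for s in [1000, 1020, 1015, 1200, 1210]:   # noisy stiffness samples [N/m]
    tree.update((0.10, 0.20, 0.00), s)
print(tree.updates, "updates committed;",
      "rendered force:", tree.force((0.10, 0.20, 0.00), 0.002), "N")
```

Only two of the five samples are committed here, illustrating how the JND gate suppresses transmissions while the local model keeps rendering force.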
|
|
13:30-15:00, Paper ThBT22-NT.7 | Add to My Program |
Autonomous and Teleoperation Control of a Drawing Robot Avatar |
|
Chen, Lingyun | Technical University of Munich |
Naceri, Abdeldjallil | Technical University of Munich |
Swikir, Abdalla | Technical University of Munich |
Hirche, Sandra | Technische Universität München |
Haddadin, Sami | Technical University of Munich |
Keywords: Telerobotics and Teleoperation, Art and Entertainment Robotics
Abstract: A drawing robot avatar is a robotic system that enables telepresence-based drawing, allowing users to remotely control a robotic arm and create drawings in real time from a remote location. The proposed control framework aims to improve bimanual robot telepresence quality by reducing the user workload and required prior knowledge through the automation of secondary or auxiliary tasks. The introduced method calculates the near-optimal Cartesian end-effector pose in terms of visual feedback quality for the attached eye-to-hand camera, taking motion constraints into consideration. Its effectiveness is demonstrated through user studies of drawing reference shapes using the implemented robot avatar, compared to stationary and teleoperated camera pose conditions. Our results demonstrate that the proposed control framework offers improved visual feedback quality and drawing performance.
|
|
13:30-15:00, Paper ThBT22-NT.8 | Add to My Program |
Adaptive Haptic Control Interface for Safeguarding Robotic Teleoperation in Hazardous Steelmaking Environments |
|
Park, Jaehyun | Pohang University of Science and Technology |
Choi, Il Seop | POSCO |
Choi, Sang-Woo | PoscoHoldings |
Kim, Keehoon | POSTECH, Pohang University of Science and Technology |
Keywords: Telerobotics and Teleoperation, Haptics and Haptic Interfaces, Industrial Robots
Abstract: Steel mills are among the most extreme and hazardous working environments due to molten iron erupting from the blast furnace. Removing lump iron near the outlet, which is essential to prevent it from scattering or blocking the flow of molten iron, is currently performed manually by protectively equipped workers using a long stick-shaped tool. A robotic teleoperation system is therefore in demand to ensure worker safety. However, the conventional command interface is not intuitive for tool manipulation (i.e., pivoting, sweeping). In addition, haptic interfaces, which are used to render interaction results efficiently, still limit performance due to their narrow workspace and kinesthetic feedback output that falls short of the task requirements. This paper proposes a novel haptic command interface (POstick) tailored to the lump-iron removal task, in two variants (KF and VF). Both POsticks have a rod-shaped end tip identical to the actual tool, which accelerates operator training. POstick-KF offers a large workspace and high kinesthetic feedback output that satisfies the requirements, while POstick-VF offers an unlimited workspace at the expense of the amount of haptic information, providing only simple vibrotactile feedback. A user study comparing the POsticks with a conventional interface reveals that POstick-KF and POstick-VF show superior interaction and tracking ability, respectively, and that these two properties stand in a trade-off that cannot be reconciled in a single device. Finally, we propose a seamless, automatic conversion mechanism between POstick-VF and POstick-KF to overcome the inherent limits of each haptic device.
|
|
13:30-15:00, Paper ThBT22-NT.9 | Add to My Program |
A Probabilistic Approach for Learning and Adapting Shared Control Skills with the Human in the Loop |
|
Quere, Gabriel | DLR |
Stulp, Freek | DLR - Deutsches Zentrum Für Luft Und Raumfahrt E.V |
Filliat, David | ENSTA ParisTech |
Silvério, João | German Aerospace Center (DLR) |
Keywords: Learning from Demonstration, Telerobotics and Teleoperation, Incremental Learning
Abstract: Assistive robots promise to be of great help to wheelchair users with motor impairments, for example for activities of daily living. Using shared control to provide task-specific assistance -- for instance with the Shared Control Templates (SCT) framework -- facilitates user control, even with low-dimensional input signals. However, designing SCTs is a laborious task requiring robotic expertise. To facilitate their design, we propose a method to learn one of their core components -- active constraints -- from demonstrated end-effector trajectories. We use a probabilistic model, Kernelized Movement Primitives, which additionally allows adaptation from user commands to improve the shared control skills, during both design and execution. We demonstrate that the SCTs so acquired can be successfully used to pick up an object, as well as adjusted for new environmental constraints, with our assistive robot EDAN.
|
|
ThBT23-NT Oral Session, NT-G401 |
Add to My Program |
Aerial Systems: Perception and Autonomy II |
|
|
Chair: Okada, Yoshito | Tohoku University |
Co-Chair: Chli, Margarita | ETH Zurich & University of Cyprus |
|
13:30-15:00, Paper ThBT23-NT.1 | Add to My Program |
Air Bumper: A Collision Detection and Reaction Framework for Autonomous MAV Navigation |
|
Wang, Ruoyu | The Chinese University of Hong Kong |
Guo, Zixuan | The Chinese University of Hong Kong |
Chen, Yizhou | Chinese University of Hong Kong |
Wang, Xinyi | The Chinese University of Hong Kong |
Chen, Ben M. | Chinese University of Hong Kong |
Keywords: Aerial Systems: Perception and Autonomy, Collision Avoidance, Aerial Systems: Applications
Abstract: Autonomous navigation in unknown environments with obstacles remains challenging for micro aerial vehicles (MAVs) due to their limited onboard computing and sensing resources. Although various collision avoidance methods have been developed, it is still possible for drones to collide with unobserved obstacles due to unpredictable disturbances, sensor limitations, and control uncertainty. Instead of attempting to avoid all collisions, this article proposes Air Bumper, a collision detection and reaction framework for fully autonomous flight in 3D environments that improves flight safety. Our framework utilizes only the onboard inertial measurement unit (IMU) to detect and estimate collisions. We further design a collision recovery controller for rapid recovery and a collision-aware mapping module that integrates collision information into general LiDAR-based sensing and planning frameworks. Our simulation and experimental results show that the drone can rapidly detect, estimate, and recover from collisions with obstacles in 3D space and continue the flight smoothly with the help of the collision-aware map. In addition, we will open-source the implementation of Air Bumper on GitHub.
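A minimal sketch of IMU-only collision detection in the spirit described above: flag a collision when the high-pass-filtered accelerometer magnitude exceeds a threshold, and take the impact direction from the filtered acceleration vector. The filter constant and the 8 m/s^2 threshold are assumed values, not those of Air Bumper.

```python
# Sketch of IMU-only collision detection: a collision is declared when the
# high-pass-filtered accelerometer magnitude exceeds a threshold, and the
# impact direction is read from the filtered vector. The cutoff and the
# 8 m/s^2 threshold are illustrative assumptions.
import numpy as np

def detect_collisions(accel, dt=0.005, tau=0.05, threshold=8.0):
    """accel: (N,3) body-frame accelerometer samples [m/s^2]."""
    alpha = tau / (tau + dt)              # 1st-order high-pass coefficient
    hp = np.zeros(3)
    prev = accel[0]
    events = []
    for i in range(1, len(accel)):
        hp = alpha * (hp + accel[i] - prev)   # high-pass filter step
        prev = accel[i]
        mag = np.linalg.norm(hp)
        if mag > threshold:
            events.append((i * dt, hp / mag)) # (time, unit impact direction)
    return events

# Synthetic hover data with one impulsive lateral bump at t = 0.5 s.
rng = np.random.default_rng(1)
acc = rng.normal([0.0, 0.0, 9.81], 0.2, size=(200, 3))
acc[100] += np.array([-30.0, 5.0, 0.0])       # simulated collision impulse
for t, d in detect_collisions(acc):
    print(f"collision at t={t:.3f}s, direction~{np.round(d, 2)}")
```

The high-pass filter removes gravity and slow maneuvering accelerations, so only impulsive contacts trip the threshold; a recovery controller would then be triggered from the estimated direction.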
|
|
13:30-15:00, Paper ThBT23-NT.2 | Add to My Program |
Safety-Aware Perception for Autonomous Collision Avoidance in Dynamic Environments |
|
Bena, Ryan | University of Southern California |
Zhao, Chongbo | University of Southern California |
Nguyen, Quan | University of Southern California |
Keywords: Aerial Systems: Perception and Autonomy, Collision Avoidance, Robot Safety
Abstract: Autonomous collision avoidance requires accurate environmental perception; however, flight systems often possess limited sensing capabilities with field-of-view (FOV) restrictions. To navigate this challenge, we present a safety-aware approach for online determination of the optimal sensor-pointing direction, ψ_d, which utilizes control barrier functions (CBFs). First, we generate a spatial density function, Φ, which leverages CBF constraints to map the collision risk of all local coordinates. Then, we convolve Φ with an attitude-dependent sensor FOV quality function to produce the objective function, Γ, which quantifies the total observed risk for a given pointing direction. Finally, by finding the global optimizer of Γ, we identify the value of ψ_d which maximizes the perception of risk within the FOV. We incorporate ψ_d into a safety-critical flight architecture and conduct a numerical analysis using multiple simulated mission profiles. Our algorithm achieves a success rate of 88-96%, constituting a 16-29% improvement compared to the best heuristic methods. We demonstrate the functionality of our approach via a flight demonstration using the Crazyflie 2.1 micro-quadrotor. Without a priori obstacle knowledge, the quadrotor follows a dynamic flight path while simultaneously calculating and tracking ψ_d to perceive and avoid two static obstacles, with an average computation time of 371 μs.
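A numerical sketch of the pointing objective described above: score each candidate yaw by the CBF-derived risk it keeps inside an assumed wedge-shaped FOV, then pick the maximizer. The CBF form h = d - r, the risk density Φ ~ 1/h, the grid resolution, and the 90-degree FOV are illustrative assumptions, not the paper's formulation.

```python
# Sketch of risk-aware sensor pointing: score each candidate yaw by the
# CBF-derived collision risk it keeps inside an assumed field of view,
# then pick the maximizer. The CBF form h = d - r, the risk map Phi ~
# 1/h, and the 90-degree FOV are illustrative assumptions.
import numpy as np

obstacles = np.array([[2.0, 1.0], [1.5, -2.0]])   # local obstacle centers
r_safe = 0.5                                       # safety radius [m]

# Risk density Phi over a local grid: higher where the CBF margin is small.
xs, ys = np.meshgrid(np.linspace(-4, 4, 81), np.linspace(-4, 4, 81))
pts = np.stack([xs, ys], axis=-1)
h = np.min(np.linalg.norm(pts[..., None, :] - obstacles, axis=-1),
           axis=-1) - r_safe                       # CBF margin per cell
phi = 1.0 / np.maximum(h, 0.05)                    # risk density (assumed)

def observed_risk(psi, half_fov=np.deg2rad(45)):
    """Total risk Phi inside a wedge-shaped FOV pointed along yaw psi."""
    ang = np.arctan2(ys, xs)
    diff = np.arctan2(np.sin(ang - psi), np.cos(ang - psi))  # wrap to [-pi,pi]
    return phi[np.abs(diff) < half_fov].sum()

cands = np.linspace(-np.pi, np.pi, 72, endpoint=False)
psi_d = cands[np.argmax([observed_risk(p) for p in cands])]
print(f"optimal pointing direction psi_d = {np.degrees(psi_d):.1f} deg")
```

Here the wedge sum plays the role of the convolution of Φ with the FOV quality function, and the argmax over candidate yaws stands in for the global optimization of Γ.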
|
|
13:30-15:00, Paper ThBT23-NT.3 | Add to My Program |
Incremental Multimodal Surface Mapping Via Self-Organizing Gaussian Mixture Models |
|
Goel, Kshitij | Carnegie Mellon University |
Tabib, Wennie | Carnegie Mellon University |
Keywords: Aerial Systems: Perception and Autonomy, Field Robots, Multi-Robot Systems
Abstract: This letter describes an incremental multimodal surface mapping methodology, which represents the environment as a continuous probabilistic model. This model enables high-resolution reconstruction while simultaneously compressing spatial and intensity point cloud data. The strategy employed in this work utilizes Gaussian mixture models (GMMs) to represent the environment. While prior GMM-based mapping works have developed methodologies to determine the number of mixture components using information-theoretic techniques, these approaches either operate on individual sensor observations, making them unsuitable for incremental mapping, or are not real-time viable, especially for applications where high-fidelity modeling is required. To bridge this gap, this letter introduces a spatial hash map for rapid GMM submap extraction combined with an approach to determine relevant and redundant data in a point cloud. These contributions increase computational speed by an order of magnitude compared to state-of-the-art incremental GMM-based mapping. In addition, the proposed approach yields a superior tradeoff in map accuracy and size when compared to state-of-the-art mapping methodologies (both GMM- and not GMM-based). Evaluations are conducted using both simulated and real-world data. The software is released open-source to benefit the robotics community.
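The spatial-hash idea above can be illustrated with a short sketch: points are hashed into fixed-size blocks, one Gaussian is fit per block, and a submap query gathers the Gaussians of the blocks it touches. The block size and the single-Gaussian-per-block fit are simplifying assumptions; the paper fits full mixtures and also models intensity.

```python
# Sketch of a spatial hash map for GMM submap extraction: points are
# hashed into fixed-size blocks, one Gaussian is fit per block, and a
# submap query simply gathers the Gaussians of the blocks it touches.
# Block size and the per-block single-Gaussian fit are assumptions.
import numpy as np
from collections import defaultdict

BLOCK = 1.0  # block edge length [m] (assumed)

def build_hash_map(points):
    buckets = defaultdict(list)
    for p in points:
        buckets[tuple(np.floor(p / BLOCK).astype(int))].append(p)
    gmm = {}
    for key, pts in buckets.items():
        pts = np.asarray(pts)
        if len(pts) >= 4:                      # need enough support to fit
            gmm[key] = (pts.mean(0), np.cov(pts.T))
    return gmm

def query_submap(gmm, lo, hi):
    """Return (mean, cov) components whose block overlaps [lo, hi]."""
    lo_k = np.floor(np.asarray(lo) / BLOCK).astype(int)
    hi_k = np.floor(np.asarray(hi) / BLOCK).astype(int)
    return [g for k, g in gmm.items()
            if all(lo_k[i] <= k[i] <= hi_k[i] for i in range(3))]

cloud = np.random.default_rng(2).uniform(0, 5, size=(5000, 3))
gmap = build_hash_map(cloud)
sub = query_submap(gmap, [0, 0, 0], [2, 2, 2])
print(len(gmap), "blocks total;", len(sub), "components in submap")
```

The hash lookup makes submap extraction O(number of touched blocks) rather than a pass over the whole model, which is the speed-up the abstract attributes to the spatial hash map.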
|
|
13:30-15:00, Paper ThBT23-NT.4 | Add to My Program |
Learning to Explore Indoor Environments Using Autonomous Micro Aerial Vehicles |
|
Tao, Yuezhan | University of Pennsylvania |
Iceland, Eran | Hebrew University Jerusalem Israel |
Li, Beiming | University of Pennsylvania |
Zwecher, Elchanan | Hebrew University |
Heinemann, Uri | Hebrew University of Jerusalem |
Cohen, Avraham | Technion |
Avni, Amir | Technion |
Gal, Oren | Technion - Israel Institute of Technology |
Barel, Ariel | Technion - Israel Institute of Technology |
Kumar, Vijay | University of Pennsylvania |
Keywords: Aerial Systems: Perception and Autonomy, Mapping, Reinforcement Learning
Abstract: In this paper, we address the challenge of exploring unknown indoor environments using autonomous aerial robots with Size, Weight, and Power (SWaP) constraints. The SWaP constraints induce limits on mission time, requiring efficient exploration. We present a novel exploration framework that uses Deep Learning (DL) to predict the most likely indoor map given the previous observations, and Deep Reinforcement Learning (DRL) for exploration, designed to run on modern SWaP-constrained neural processors. The DL-based map predictor provides a prediction of the occupancy of the unseen environment, while the DRL-based planner determines the best navigation goals that can be safely reached to provide the most information. The two modules are tightly coupled and run onboard, allowing the vehicle to safely map an unknown environment. Extensive experimental and simulation results show that our approach surpasses state-of-the-art methods by 50-60% in efficiency, which we measure by the fraction of the explored space as a function of trajectory length.
|
|
13:30-15:00, Paper ThBT23-NT.5 | Add to My Program |
Multi-Robot Multi-Room Exploration with Geometric Cue Extraction and Circular Decomposition |
|
Kim, Seungchan | Carnegie Mellon University |
Corah, Micah | Colorado School of Mines |
Keller, John | Carnegie Mellon University |
Best, Graeme | University of Technology Sydney |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Aerial Systems: Perception and Autonomy, Multi-Robot Systems, Vision-Based Navigation
Abstract: This work proposes an autonomous multi-robot exploration pipeline that coordinates the behaviors of robots in an indoor environment composed of multiple rooms. Contrary to simple frontier-based exploration approaches, we aim to enable robots to methodically explore and observe an unknown set of rooms in a structured building, keeping track of which rooms are already explored and sharing this information among robots to coordinate their behaviors in a distributed manner. To this end, we propose (1) a geometric cue extraction method that processes 3D point cloud data and detects the locations of potential cues such as doors and rooms, and (2) a circular decomposition of free space used for target assignment. Using these two components, our pipeline effectively assigns tasks among robots and enables a methodical exploration of rooms. We evaluate the performance of our pipeline using a team of up to 3 aerial robots, and show that our method outperforms the baseline by 33.4% in simulation and 26.4% in real-world experiments.
|
|
13:30-15:00, Paper ThBT23-NT.6 | Add to My Program |
Fast Multi-UAV Decentralized Exploration of Forests |
|
Bartolomei, Luca | ETH Zurich |
Teixeira, Lucas | ETH Zurich |
Chli, Margarita | ETH Zurich & University of Cyprus |
Keywords: Aerial Systems: Perception and Autonomy, Path Planning for Multiple Mobile Robots or Agents
Abstract: Efficient exploration strategies are vital in tasks such as search-and-rescue missions and disaster surveying. Unmanned Aerial Vehicles (UAVs) have become particularly popular in such applications, promising to cover large areas at high speeds. Moreover, with the increasing maturity of onboard UAV perception, research focus has been shifting toward higher-level reasoning for multi-robot missions. However, autonomous navigation and exploration of previously unknown large spaces still constitute an open challenge, especially when the environment is cluttered and exhibits large and frequent occlusions due to high obstacle density, as is the case of forests. Moreover, the problem of long-distance wireless communication in such scenes can become a limiting factor, especially when automating the navigation of a UAV fleet. In this spirit, this work proposes an exploration strategy that enables multiple UAVs to quickly explore complex scenes in a decentralized fashion. By providing the decision-making capabilities to each UAV to switch between different execution modes, the proposed strategy is shown to strike a great balance between cautious exploration of yet completely unknown regions and more aggressive exploration of smaller areas of unknown space. This results in full coverage of forest areas in multi-UAV setups up to 30% faster than the state of the art.
|
|
13:30-15:00, Paper ThBT23-NT.7 | Add to My Program |
Reinforcement Learning for Collision-Free Flight Exploiting Deep Collision Encoding |
|
Kulkarni, Mihir | NTNU: Norwegian University of Science and Technology |
Alexis, Kostas | NTNU - Norwegian University of Science and Technology |
Keywords: Aerial Systems: Perception and Autonomy, Reinforcement Learning
Abstract: This work contributes a novel deep navigation policy that enables collision-free flight of aerial robots based on a modular approach exploiting deep collision encoding and reinforcement learning. The proposed solution builds upon a deep collision encoder that is trained on both simulated and real depth images using supervised learning, such that it compresses the high-dimensional depth data to a low-dimensional latent space encoding collision information while accounting for the robot size. This compressed encoding is combined with an estimate of the robot's odometry and the desired target location to train a deep reinforcement learning navigation policy that offers low-latency computation and robust sim2real performance. A set of simulation and experimental studies in diverse environments is conducted and demonstrates the efficiency of the emergent behavior and its resilience in real-life deployments.
|
|
13:30-15:00, Paper ThBT23-NT.8 | Add to My Program |
Learning Agile Flights through Narrow Gaps with Varying Angles Using Onboard Sensing |
|
Xie, Yuhan | The University of Hong Kong |
Lu, Minghao | The University of Hong Kong |
Peng, Rui | The University of Hong Kong |
Lu, Peng | The University of Hong Kong |
Keywords: Aerial Systems: Perception and Autonomy, Reinforcement Learning, Integrated Planning and Learning
Abstract: This paper addresses the problem of traversing unknown, tilted, and narrow gaps with quadrotors using Deep Reinforcement Learning (DRL). Previous learning-based methods relied on accurate knowledge of the environment, including the gap's pose and size. In contrast, we integrate onboard sensing and detect the gap from a single onboard camera. The training problem is challenging for two reasons: a precise and robust whole-body planning and control policy is required for variably tilted and narrow gaps, and an effective Sim2Real method is needed to successfully conduct real-world experiments. To this end, we propose a learning framework for agile gap-traversal flight, which successfully trains the vehicle to traverse through the center of the gap with an attitude approximately aligned to the gap, even at aggressive tilt angles. The policy, trained only in a simulation environment, can be transferred to different domains with fine-tuning while maintaining the success rate. Our proposed framework, which integrates onboard sensing and a neural network controller, achieves a success rate of 87.36% in real-world experiments, with gap orientations up to 60°. To the best of our knowledge, this is the first paper to perform learning-based, variably tilted narrow gap traversal flight in the real world without prior knowledge of the environment.
|
|
ThBT24-NT Oral Session, NT-G402 |
Add to My Program |
Robotics and Automation in Agriculture and Forestry III |
|
|
Chair: Fukao, Takanori | University of Tokyo |
Co-Chair: Sugiura, Hisashi | Yanmar Co., Ltd |
|
13:30-15:00, Paper ThBT24-NT.1 | Add to My Program |
Osiris: Building Hierarchical Representations for Agricultural Environments |
|
Mukuddem, Adam | University of Cape Town |
Amayo, Paul | University of Cape Town |
Keywords: Robotics and Automation in Agriculture and Forestry
Abstract: 3D scene graphs have recently emerged as a powerful and human-understandable way of representing complex 3D environments. They describe environments through a layered, hierarchical graph in which nodes represent spatial concepts (from low-level geometry to higher-level, scene-scale reasoning) and edges represent the relationships between them. While these representations have shown great promise in well-structured indoor environments, their use in structured outdoor environments such as agriculture has been under-explored. A key challenge is that the concepts and structure observed in urban indoor environments cannot be easily transferred to these novel scenes. Motivated by this challenge, this paper presents Osiris, a 3D scene graph builder for agricultural environments. We first propose a hierarchical graph structure for agricultural environments consisting of rowed crops, and through Osiris we incrementally construct a 3D scene graph from data taken onboard a mobile robot. We validate and evaluate the performance of Osiris using real-world data collected at several farms, and show that the system accurately recovers the underlying structure of these agricultural environments while providing a metrically accurate and human-understandable representation.
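A minimal sketch of a layered agricultural scene graph in the spirit of the hierarchy described above. The layer names (farm -> row -> plant) and the attributes are assumptions inferred from the abstract, not the actual Osiris schema.

```python
# Minimal sketch of a layered agricultural scene graph. The layer names
# (farm -> row -> plant) and attributes are assumptions inferred from
# the abstract, not the actual Osiris schema.
from dataclasses import dataclass, field

@dataclass
class Node:
    layer: str                      # e.g. "farm", "row", "plant"
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # hierarchical edges

    def add(self, child):
        self.children.append(child)
        return child

    def walk(self, depth=0):
        yield depth, self
        for c in self.children:
            yield from c.walk(depth + 1)

farm = Node("farm", {"name": "demo_farm"})
for r in range(2):                               # incremental insertion of
    row = farm.add(Node("row", {"index": r}))    # rows as they are observed
    for p in range(3):
        row.add(Node("plant", {"row": r, "idx": p,
                               "xyz": (r * 2.0, p * 0.5, 0.0)}))

for depth, node in farm.walk():
    print("  " * depth + f"{node.layer} {node.attrs}")
```

Incremental construction then amounts to appending row and plant nodes as new observations arrive, while the hierarchy keeps the representation human-readable.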
|
|
13:30-15:00, Paper ThBT24-NT.2 | Add to My Program |
Streamlined Acquisition of Large Sensor Data for Autonomous Mobile Robots to Enable Efficient Creation and Analysis of Datasets |
|
Niemeyer, Mark | DFKI |
Arkenau, Julian | German Research Center for Artificial Intelligence |
Pütz, Sebastian | German Research Center for Artificial Intelligence |
Hertzberg, Joachim | University of Osnabrueck |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation
Abstract: The increasing usage of modern AI techniques represents a transformative shift in the robotics domain. Training and assessing new models requires substantial amounts of application-specific data, but the limited resources onboard mobile robots (processing power, network bandwidth, etc.) pose a challenge for the development of efficient data recording and provisioning pipelines. Furthermore, accessing specific information based on a combination of spatial, temporal, and semantic criteria is generally not supported by currently available tools. In this paper, we present a methodology for the efficient recording of robotic sensor data streams. We show that our approach reduces the overall time needed until the data can be served via the spatio-temporal-semantic query interface of the semantic environment representation SEEREP. We further show that it increases the maximum sensor data rate that can be stored to disk in real time for large robotic data types such as images and point clouds, compared to frequently employed solutions within the ROS ecosystem.
|
|
13:30-15:00, Paper ThBT24-NT.3 | Add to My Program |
Development of an Automatic Sweet Pepper Harvesting Robot and Experimental Evaluation |
|
Pan, Qinghui | Dalian University of Technology |
Wang, Dong | Dalian University of Technology |
Lian, Jie | Dalian University of Technology |
Dong, Yongxiang | Dalian University of Technology |
Qiu, Chaochao | Dalian University of Technology |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation
Abstract: The aging population and diminishing agricultural workforce motivate the development of autonomous harvesting robots. Although autonomous harvesting is expanding rapidly, the commercial application of sweet pepper harvesting robots still faces challenges. This paper presents the development of a sweet pepper harvesting robot and reports its experimental verification, covering end-effector design, visual perception, and grasping pose control. The electrically controlled end-effector is mainly composed of a servo-electric two-finger parallel clamping module, a swing-cutting module, and a fruit recovery device. Equipped with a tactile sensor array, it can accurately sense the sweet pepper peduncle position and the end-effector state (e.g., harvesting failure) to complete a precise cut. A grasping pose control algorithm for the manipulator's end-effector is proposed, which, by estimating the pose of the sweet pepper peduncle, controls the end-effector to grasp along the direction of the peduncle and perpendicular to the tangent direction at the picking point. Finally, the robot and the proposed method were verified in a plant factory. The experimental findings demonstrate that the developed harvesting robot achieves robust detection of fruit peduncles and non-destructive picking of sweet peppers, with an average picking time of about 15 seconds.
|
|
13:30-15:00, Paper ThBT24-NT.4 | Add to My Program |
LiDAR-Based Robot Transplanter |
|
Asano, Masaki | University of Tokyo, Graduate School of Information Science And |
Fukao, Takanori | University of Tokyo |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, Field Robots
Abstract: In Japan, the agricultural labor shortage is becoming increasingly severe due to the declining and aging farmer population. The automation of vegetable production tasks such as transplanting, harvesting, and transporting is therefore required. In this paper, a LiDAR-based self-localization method and a robust control method for a transplanter are proposed for accurate transplanting. In this system, the transplanter's path is generated from 3D point cloud data, and the transplanting unit follows it to plant cabbage seedlings accurately. Path generation takes into account the vehicle's tilt in the roll direction, which depends on the groove environment. An accurate calculation of the lateral and angular position of the transplanting unit is also proposed. For path-following control, sliding-mode control and inverse optimal control are applied to the transplanter. The experimental results demonstrated the effectiveness of the proposed methods and revealed remaining problems to tackle. Automated transplanting was generally performed accurately, but an occasional offset error from zero was observed. It was confirmed that inverse optimal control is superior to sliding-mode control and is more robust to environmental changes.
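For readers unfamiliar with the sliding-mode path-following control mentioned above, the following sketch tracks a straight row with a kinematic bicycle model. The sliding surface s = de + lambda*e, the gains, and the boundary-layer saturation are textbook choices assumed for illustration, not the paper's tuned controller.

```python
# Sketch of sliding-mode lateral control for straight-row path following
# with a kinematic bicycle model. The sliding surface s = de + lambda*e,
# the gains, and the boundary-layer saturation are textbook choices
# assumed for illustration.
import numpy as np

v, L, dt = 0.5, 1.0, 0.02        # speed [m/s], wheelbase [m], step [s]
lam, k, bl = 1.0, 0.8, 0.1       # surface slope, switching gain, boundary layer

def sat(x):                       # smooth sign() to reduce chattering
    return np.clip(x / bl, -1.0, 1.0)

e, psi = 0.3, 0.2                # initial lateral error [m] and heading [rad]
for i in range(600):
    de = v * np.sin(psi)         # lateral error rate
    s = de + lam * e             # sliding surface
    # choose heading rate so the surface is driven to zero: ds = -k*sat(s)
    psi_dot = (-k * sat(s) - lam * de) / (v * np.cos(psi))
    delta = np.arctan(L * psi_dot / v)          # steering from bicycle model
    psi += (v / L) * np.tan(delta) * dt
    e += v * np.sin(psi) * dt
    if i % 150 == 0:
        print(f"t={i*dt:4.1f}s  e={e:+.3f} m  psi={psi:+.3f} rad")
```

The boundary layer in sat() trades a small steady-state band for reduced chattering, one reason the paper's inverse optimal controller can behave more smoothly under environmental changes.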
|
|
13:30-15:00, Paper ThBT24-NT.5 | Add to My Program |
EdgeSoil 2.0 – Soil Analyzer Using Convolutional Neural Network and Camera Imaging for Agricultural Robotics |
|
Kasemi, Roni | UBT |
Lammer, Lara | ACIN, TU Wien |
Thalhammer, Stefan | TU Wien |
Vincze, Markus | Vienna University of Technology |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, Field Robots
Abstract: Soil is the most important building element of agriculture, and its analysis is crucial for healthy plants and a high crop yield. Despite its importance, soil analysis is a tedious and time-consuming task. This paper presents EdgeSoil 2.0, a non-invasive, accurate, and real-time robotic system for predicting soil pH, a key parameter of soil status for farmers. EdgeSoil 2.0 predicts soil pH in real time from a live webcam video stream at an average of 7 FPS. The method is suitable for the edge devices required by the application: we use a mobile robot with an NVIDIA Jetson Nano module running a pH estimator trained with a Convolutional Neural Network (CNN) on a novel dataset we built for this purpose. Predictions are performed while the robot moves over the plowed field before the planting process starts. To achieve the best performance, we train the pH estimator with different input modalities and validate each result using Mean Squared Error (MSE) and Standard Deviation (SD). We achieve accurate results with an MSE of 0.08 and an SD of 0.15, with field tests showing at most ±0.3 deviation from the ground-truth value during prediction, which is sufficient to comply with agricultural standards.
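As a rough illustration of a CNN regression head for image-based pH prediction, consider the sketch below. The architecture, input size, and synthetic data are invented for illustration; the paper trains on a purpose-built soil dataset with tuned modalities.

```python
# Tiny CNN regression head for image-based soil pH prediction, in the
# spirit of the approach above. The architecture, input size, and
# synthetic data are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 1),                 # single scalar: predicted pH
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Synthetic stand-in batch: 8 RGB crops with random pH labels in [4, 9].
x = torch.rand(8, 3, 64, 64)
y = 4.0 + 5.0 * torch.rand(8, 1)

for step in range(50):                # overfit the toy batch as a smoke test
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print("final MSE on toy batch:", float(loss))
```

A small regression head like this is the kind of model that fits the compute budget of a Jetson-class edge device while still producing a continuous pH estimate per frame.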
|
|
13:30-15:00, Paper ThBT24-NT.6 | Add to My Program |
Semiautonomous Precision Pruning of Upright Fruiting Offshoot Orchard Systems: An Integrated Approach |
|
You, Alexander | Oregon State University |
Parayil, Nidhi | Oregon State University |
Josyula, Gopala Krishna | Oregon State University |
Bhattarai, Uddhav | Washington State University |
Sapkota, Ranjan | Washington State University |
Ahmed, Dawood | Washington State University |
Whiting, Matthew | Washington State University |
Karkee, Manoj | Washington State University |
Grimm, Cindy | Oregon State University |
Davidson, Joseph | Oregon State University |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, Field Robots
Abstract: Dormant pruning is an important orchard activity for maintaining tree health and producing high-quality fruit. Due to decreasing worker availability, pruning is a prime candidate for robotics. However, pruning also represents a uniquely difficult problem, requiring robust systems for perception, pruning point determination, and manipulation that must operate under variable lighting conditions and in complex, highly unstructured environments. In this article, we introduce a system for pruning modern, planar orchard architectures with simple pruning rules that combines various subsystems from our previous work on perception and manipulation. The integrated system demonstrates the ability to autonomously detect and cut pruning targets with minimal control of the environment, laying the groundwork for a fully autonomous system in the future. We validate the performance of our system through field trials in a sweet cherry orchard, ultimately achieving a cutting success rate of 58% across ten trees. Though not fully robust and requiring improvements in throughput, our system is the first to operate on fruit trees and represents a useful base platform to be improved in the future.
|
|
13:30-15:00, Paper ThBT24-NT.7 | Add to My Program |
On-The-Go Tree Detection and Geometric Traits Estimation with Ground Mobile Robots in Fruit Tree Groves |
|
Chatziparaschis, Dimitrios | UC Riverside |
Teng, Hanzhe | University of California, Riverside |
Wang, Yipeng | University of California, Riverside |
Peiris, Pamodya | University of California, Riverside |
Scudiero, Elia | University of California, Riverside |
Karydis, Konstantinos | University of California, Riverside |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, Object Detection, Segmentation and Categorization
Abstract: By-tree information gathering is an essential task in precision agriculture achieved by ground mobile sensors, but it can be time- and labor-intensive. In this paper we present an algorithmic framework to perform real-time and on-the-go detection of trees and key geometric characteristics (namely, width and height) with wheeled mobile robots in the field. Our method is based on the fusion of 2D domain-specific data (normalized difference vegetation index [NDVI] acquired via a red-green-near-infrared [RGN] camera) and 3D LiDAR point clouds, via a customized tree landmark association and parameter estimation algorithm. The proposed system features a multi-modal, entropy-based landmark correspondence approach, integrated into an underlying Kalman filter system to recognize the surrounding trees and jointly estimate their spatial and vegetation-based characteristics. Realistic simulated tests are used to evaluate our proposed algorithm's behavior in a variety of settings. Physical experiments in agricultural fields help validate our method's efficacy in acquiring accurate by-tree information on-the-go and in real-time by employing only onboard computational and sensing resources.
|
|
13:30-15:00, Paper ThBT24-NT.8 | Add to My Program |
Autonomous Apple Fruitlet Sizing with Next Best View Planning |
|
Freeman, Harry | Carnegie Mellon University |
Kantor, George | Carnegie Mellon University |
Keywords: Robotics and Automation in Agriculture and Forestry, Computer Vision for Automation, Agricultural Automation
Abstract: In this paper, we present a next-best-view planning approach to autonomously size apple fruitlets. State-of-the-art viewpoint planners in agriculture are designed to size large and more sparsely populated fruit. They rely on lower resolution maps and sizing methods that do not generalize to smaller fruit sizes. To overcome these limitations, our method combines viewpoint sampling around semantically labeled regions of interest, along with an attention-guided information gain mechanism to more strategically select viewpoints that target the small fruits' volume. Additionally, we integrate a dual-map representation of the environment that is able to both speed up expensive ray casting operations and maintain the high occupancy resolution required to informatively plan around the fruit. When sizing, a robust estimation and graph clustering approach is introduced to associate fruit detections across images. Through simulated experiments, we demonstrate that our viewpoint planner improves sizing accuracy compared to state of the art and ablations. We also provide quantitative results on data collected by a real robotic system in the field.
|
|
13:30-15:00, Paper ThBT24-NT.9 | Add to My Program |
Gradient-Based Local Next-Best-View Planning for Improved Perception of Targeted Plant Nodes |
|
Burusa, Akshay Kumar | Wageningen University and Research |
Van Henten, Eldert J. | Wageningen University |
Kootstra, Gert | Wageningen University |
Keywords: Robotics and Automation in Agriculture and Forestry, Computer Vision for Automation, Autonomous Agents
Abstract: Robots are increasingly used in tomato greenhouses to automate labour-intensive tasks such as selective harvesting and de-leafing. To perform these tasks, robots must be able to accurately and efficiently perceive the plant nodes that need to be cut, despite the high levels of occlusion from other plant parts. We formulate this problem as a local next-best-view (NBV) planning task where the robot has to plan an efficient set of camera viewpoints to overcome occlusion and improve the quality of perception. Our formulation focuses on quickly improving the perception accuracy of a single target node to maximise its chances of being cut. Previous methods of NBV planning mostly focused on global view planning and used random sampling of candidate viewpoints for exploration, which could suffer from high computational costs, ineffective view selection due to poor candidates, or non-smooth trajectories due to inefficient sampling. We propose a gradient-based NBV planner using differential ray sampling, which directly estimates the local gradient direction for viewpoint planning to overcome occlusion and improve perception. Through simulation experiments, we showed that our planner can handle occlusions and improve the 3D reconstruction and position estimation of nodes equally well as a sampling-based NBV planner, while requiring one-tenth of the computation and generating 28% more efficient trajectories.
|
|
ThBT25-NT Oral Session, NT-G403 |
Add to My Program |
Localization and Mapping I |
|
|
Chair: Garg, Sourav | University of Adelaide |
Co-Chair: Sui, Wei | Soochow University |
|
13:30-15:00, Paper ThBT25-NT.1 | Add to My Program |
A Vision-Centric Approach for Static Map Element Annotation |
|
Zhang, Jiaxin | Soochow University |
Shiyuan, Chen | Soochow University |
Yin, Haoran | Soochow University |
Mei, Ruohong | Soochow University |
Liu, Xuan | Northeast Normal University |
Yang, Cong | Soochow University |
Zhang, Qian | Horizon Robotics |
Sui, Wei | Soochow University |
Keywords: Data Sets for Robotic Vision, Mapping, Computer Vision for Transportation
Abstract: The recent development of online static map element (a.k.a. HD Map) construction algorithms has raised a vast demand for data with ground truth annotations. However, available public datasets currently cannot provide high-quality training data with respect to consistency and accuracy. To this end, we present CAMA: a vision-centric approach for Consistent and Accurate Map Annotation. Without LiDAR inputs, our proposed framework can still generate high-quality 3D annotations of static map elements. Specifically, the annotation achieves high reprojection accuracy across all surrounding cameras and is spatio-temporally consistent across the whole sequence. We apply our proposed framework to the popular nuScenes dataset to provide efficient and highly accurate annotations. Compared with the original nuScenes static map elements, models trained with annotations from CAMA achieve lower reprojection errors (e.g., 4.73 vs. 8.03 pixels).
|
|
13:30-15:00, Paper ThBT25-NT.2 | Add to My Program |
VBR: A Vision Benchmark in Rome |
|
Brizi, Leonardo | Sapienza University of Rome |
Giacomini, Emanuele | Sapienza University of Rome |
Di Giammarino, Luca | Sapienza University of Rome |
Ferrari, Simone | Sapienza University of Rome |
Salem, Omar Ashraf Ahmed Khairy | Sapienza University of Rome |
De Rebotti, Lorenzo | Sapienza University of Rome |
Grisetti, Giorgio | Sapienza University of Rome |
Keywords: Data Sets for SLAM, Mapping, Range Sensing
Abstract: This paper presents a robotics perception research dataset collected in Rome, featuring RGB data, 3D point clouds, IMU, and GPS data. We introduce a new benchmark targeting visual odometry and SLAM to advance research in autonomous robotics. This work complements existing datasets by simultaneously addressing several issues, such as environment diversity, motion patterns, and sensor frequency. It uses up-to-date devices and presents effective procedures to accurately calibrate the intrinsics and extrinsics of the sensors while addressing temporal synchronization. During recording, we cover multi-floor buildings, gardens, and urban and highway scenarios. Combining handheld and car-based data collection, our setup can simulate any robot (quadrupeds, quadrotors, autonomous vehicles). The dataset includes an accurate 6-DoF ground truth based on a novel methodology that refines the RTK-GPS estimate with LiDAR point clouds through Bundle Adjustment (BA). All sequences, divided into training and validation sets, are accessible at www.rvp-group.net/datasets/slam.
|
|
13:30-15:00, Paper ThBT25-NT.3 | Add to My Program |
Spatial-Aware Dynamic Lightweight Self-Supervised Monocular Depth Estimation |
|
Song, Linna | National University of Defense Technology |
Shi, Dianxi | Defense Innovation Institute |
Xia, Jianqiang | National Innovation Institute of Defense Technology |
Ouyang, Qianying | Intelligent Game and Decision Lab; Tianjin Artificial Intelligence |
Qiao, Ziteng | National Innovation Institute of Defense Technology |
Jin, Songchang | Defense Innovation Institute |
Yang, Shaowu | National University of Defense Technology |
Keywords: Deep Learning for Visual Perception, Deep Learning Methods, Mapping
Abstract: Self-supervised monocular depth estimation has attracted extensive attention in recent years. Lightweight depth estimation methods are crucial for resource-constrained edge devices. However, existing lightweight methods often encounter the challenge of limited representation capacity and increased computational resource consumption for image reconstruction. To alleviate these issues, we propose a novel spatial-aware dynamic lightweight monocular depth estimation method (SAD-Depth). Specifically, we propose a spatial-aware dynamic encoder, which can capture spatial information of the input and generate input-adaptive dynamic convolutions, thereby significantly enhancing the model's adaptability to complex scenes. Meanwhile, we propose a multi-scale sub-pixel lightweight decoder that generates high-quality depth maps while maintaining a lightweight design. Experimental results demonstrate that our proposed SAD-Depth exhibits superiority in both model size and inference speed, achieving state-of-the-art performance on the KITTI benchmark.
|
|
13:30-15:00, Paper ThBT25-NT.4 | Add to My Program |
VDNA-PR: Using General Dataset Representations for Robust Sequential Visual Place Recognition |
|
Ramtoula, Benjamin | University of Oxford |
De Martini, Daniele | University of Oxford |
Gadd, Matthew | University of Oxford |
Newman, Paul | Oxford University |
Keywords: Deep Learning for Visual Perception, Localization
Abstract: This paper adapts a general dataset representation technique to produce robust Visual Place Recognition (VPR) descriptors, crucial to enable real-world mobile robot localisation. Two parallel lines of work on VPR have shown, on one side, that general-purpose off-the-shelf feature representations can provide robustness to domain shifts, and, on the other, that fused information from sequences of images improves performance. In our recent work on measuring domain gaps between image datasets, we proposed a Visual Distribution of Neuron Activations (VDNA) representation to represent datasets of images. This representation can naturally handle image sequences and provides a general and granular feature representation derived from a general-purpose model. Moreover, our representation is based on tracking neuron activation values over the list of images being represented, and is not limited to a particular neural network layer, therefore having access to high- and low-level concepts. This work shows how VDNAs can be used for VPR by learning a very lightweight and simple encoder to generate task-specific descriptors. Our experiments show that our representation can allow for better robustness than current solutions to serious domain shifts away from the training data distribution, such as to indoor environments and aerial imagery.
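A minimal sketch of the VDNA idea described above: represent an image list by per-neuron histograms of activation values, then compare two representations with a mean per-neuron L1 distance. Random activations stand in for a real feature extractor, and the bin count and distance choice are assumptions.

```python
# Sketch of the VDNA idea: represent an image list by per-neuron
# histograms of activation values, then compare representations with a
# mean per-neuron L1 distance. Random activations stand in for a real
# feature extractor; bin count and distance are assumptions.
import numpy as np

def vdna(activations, bins=32, rng=(0.0, 1.0)):
    """activations: (num_images, num_neurons) -> (num_neurons, bins)."""
    hists = [np.histogram(activations[:, j], bins=bins, range=rng,
                          density=True)[0]
             for j in range(activations.shape[1])]
    return np.asarray(hists)

def vdna_distance(a, b):
    return np.mean(np.abs(a - b).sum(axis=1))   # mean per-neuron L1

rng_ = np.random.default_rng(3)
seq_a = rng_.beta(2, 5, size=(100, 64))   # one image sequence / domain
seq_b = rng_.beta(2, 5, size=(100, 64))   # similar distribution
seq_c = rng_.beta(5, 2, size=(100, 64))   # shifted domain
da, db, dc = vdna(seq_a), vdna(seq_b), vdna(seq_c)
print("similar :", round(vdna_distance(da, db), 3))
print("shifted :", round(vdna_distance(da, dc), 3))
```

Because the histograms are built over a whole image list, the representation handles sequences natively, which is the property the VPR descriptor above exploits.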
|
|
13:30-15:00, Paper ThBT25-NT.5 | Add to My Program |
NISB-Map: Scalable Mapping with Neural Implicit Spatial Block |
|
Xiang, Beichen | Nanjing University of Science and Technology |
Sun, Yuxin | Shanghai Jiao Tong University |
Xie, Zhongqu | Nanjing University of Science and Technology |
Yang, Xiaolong | Nanjing University of Science and Technology |
Wang, Yulin | Nanjing University of Science and Technology |
Keywords: Deep Learning for Visual Perception, Mapping
Abstract: Recently, neural implicit representations have been applied to the mapping process of simultaneous localization and mapping (SLAM), offering lower storage overhead and a continuous representation. Nevertheless, related methods use a single neural network to represent the whole scene, resulting in forgetting of observed regions due to the limited capacity of a single network in large-scale scenes. Several methods encode the scene into implicit voxels to avoid parameter forgetting, but at the cost of memory. In this paper, we introduce a scalable mapping framework that utilizes extensible, fixed-size Neural Implicit Spatial Blocks (NISB) to cover the entire scene by incrementally creating multiple Multi-Layer Perceptron (MLP) networks. In evaluations against alternative methods on 3 indoor datasets, our method avoids forgetting the observed areas during the mapping process with a small memory footprint and smoothly updates the global map at 2 Hz.
|
|
13:30-15:00, Paper ThBT25-NT.6 | Add to My Program |
Regressing Transformers for Data-Efficient Visual Place Recognition |
|
Leyva-Vallina, Maria | University of Groningen |
Strisciuglio, Nicola | University of Twente |
Petkov, Nicolai | University of Groningen |
Keywords: Deep Learning for Visual Perception, Recognition, Visual Learning
Abstract: Visual place recognition is a critical task in computer vision, especially for localization and navigation systems. Existing methods often rely on contrastive learning: image descriptors are trained to have small distances for similar images and larger distances for dissimilar ones in a latent space. However, this approach struggles to ensure an accurate distance-based representation of image similarity, particularly when training with binary pairwise labels, and complex re-ranking strategies are required. This work introduces a fresh perspective by framing place recognition as a regression problem, using camera field-of-view overlap as the similarity ground truth for learning. By optimizing image descriptors to align directly with graded similarity labels, this approach enhances ranking capabilities without expensive re-ranking, offering data-efficient training and strong generalization across several benchmark datasets.
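The regression framing can be illustrated with a short training-loop sketch in which descriptor cosine similarity is regressed directly onto a graded FOV-overlap label. The encoder and the synthetic overlap labels are stand-ins assumed for illustration, not the paper's architecture or data.

```python
# Sketch of place recognition as regression: descriptor similarity is
# trained to match a graded FOV-overlap label directly, instead of a
# binary contrastive objective. The encoder and the synthetic overlap
# labels are stand-ins assumed for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

def overlap_regression_loss(feat_a, feat_b, overlap):
    """overlap in [0,1]: camera FOV overlap used as similarity ground truth."""
    da = F.normalize(encoder(feat_a), dim=-1)
    db = F.normalize(encoder(feat_b), dim=-1)
    sim = (da * db).sum(-1)               # cosine similarity in [-1, 1]
    return F.mse_loss((sim + 1) / 2, overlap)

# Toy batch: raw image features and their graded overlap labels.
fa, fb = torch.randn(16, 128), torch.randn(16, 128)
ov = torch.rand(16)
for step in range(100):
    opt.zero_grad()
    loss = overlap_regression_loss(fa, fb, ov)
    loss.backward()
    opt.step()
print("toy regression loss:", float(loss))
```

Because the descriptor distance itself is trained to carry the graded similarity, retrieval can rank candidates directly, which is why no separate re-ranking stage is needed.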
|
|
13:30-15:00, Paper ThBT25-NT.7 | Add to My Program |
On the Study of Data Augmentation for Visual Place Recognition |
|
Jang, Suji | Gwangju Institute of Science and Technology |
Kim, Ue-Hwan | Gwangju Institute of Science and Technology (GIST) |
Keywords: Deep Learning for Visual Perception, Recognition, Visual Learning
Abstract: In the field of robotics engineering and autonomous driving vehicles, precise estimation of positions through visual place recognition (VPR) is crucial not only for reducing localization errors caused by visual odometry but also for preventing the creation of ambiguous maps in unfamiliar environments. Despite numerous research efforts aimed at improving VPR performance by addressing challenges such as illumination variation, occlusions, and dynamic objects, contemporary approaches have primarily focused on model-based methods, with limited attention given to data augmentation (DA) methods. Therefore, there is a need to investigate the impact of DA on the generalization ability of VPR. To achieve this objective, this study compares VPR learning approaches, conducts a comprehensive empirical analysis, and presents crucial insights. The results of this study can provide useful guidance for the design of future VPR systems and contribute to the advancement of computer vision and robotics research.
|
|
13:30-15:00, Paper ThBT25-NT.8 | Add to My Program |
Enhancing Visual Place Recognition with Multi-Modal Features and Time-Constrained Graph Attention Aggregation |
|
Wang, Zhuo | Northeastern University |
Zhang, Yunzhou | Northeastern University |
Zhao, Xinge | Northeastern University |
Ning, Jian | Northeastern University |
Zou, Dehao | Northeastern University |
Pei, Meiqi | Northeastern University |
Keywords: Deep Learning for Visual Perception, RGB-D Perception
Abstract: Visual place recognition (VPR) is a crucial technology for autonomous driving and robotic navigation. However, severe appearance and perspective changes often degrade algorithm performance. Current methods mainly utilize single-modality RGB images, which are sensitive to environmental changes. To address this challenge, we propose a novel multi-modal visual place recognition method that incorporates depth information as auxiliary data to enhance the robustness of the VPR algorithm. The pipeline involves dual-branch feature extraction and a transformer-based shared multi-modal feature fusion module (SFFM) to enable full interaction between semantic and structural information. Furthermore, we introduce a time-constrained graph attention aggregation (TC-GAT) that propagates node information across time and space to deal with perceptual aliasing. Extensive experiments on the Oxford RobotCar and MSLS datasets demonstrate that the proposed algorithm is not only effective under appearance changes but also competitive under opposing viewpoints.
|
|
13:30-15:00, Paper ThBT25-NT.9 | Add to My Program |
MBFusion: A New Multi-Modal BEV Feature Fusion Method for HD Map Construction |
|
Hao, Xiaoshuai | Samsung Research China - Beijing (SRC-B) |
Zhang, Hui | Samsung Research China - Beijing (SRC-B) |
Yang, Yifan | Samsung Research China - Beijing (SRC-B) |
Zhou, Yi | Samsung Research |
Jung, Sangil | Samsung Advanced Institute of Technology |
Park, Seung-In | Samsung Advanced Institute of Technology |
Yoo, ByungIn | Samsung Advanced Institute of Technology |
Keywords: Deep Learning for Visual Perception, Sensor Fusion, Computer Vision for Automation
Abstract: HD map construction is a fundamental and challenging task in autonomous driving for understanding the surrounding environment. Recently, camera-LiDAR BEV feature fusion methods have attracted increasing attention in the HD map construction task, as they can significantly boost benchmark performance. However, existing fusion methods ignore modal interaction and use very simple fusion strategies, which suffer from misalignment and information loss. To tackle this, we propose a novel multi-modal BEV feature fusion method named MBFusion. Specifically, to solve the semantic misalignment problem between camera and LiDAR features, we design a Cross-modal Interaction Transform (CIT) module that lets the two feature spaces exchange knowledge with each other, enhancing the feature representation via a cross-attention mechanism. Then, we propose a Dual Dynamic Fusion (DDF) module to automatically select valuable information from different modalities for better feature fusion. Moreover, MBFusion is simple and can be plugged into existing pipelines. We evaluate MBFusion on three architectures, including HDMapNet, VectorMapNet, and MapTR, to show its versatility and effectiveness. Compared with state-of-the-art methods, MBFusion achieves 3.6% and 4.1% absolute mAP improvements on the nuScenes and Argoverse2 datasets, respectively, demonstrating the superiority of our method.
|
|
ThBT26-NT Oral Session, NT-G404 |
Add to My Program |
SLAM V |
|
|
Chair: Mordohai, Philippos | Stevens Institute of Technology |
Co-Chair: Date, Hisashi | University of Tsukuba |
|
13:30-15:00, Paper ThBT26-NT.1 | Add to My Program |
HPF-SLAM: An Efficient Visual SLAM System Leveraging Hybrid Point Features |
|
Su, Xin | Technical University of Munich |
Eger, Sebastian | TUM |
Misik, Adam | Siemens Technology, Technical University Munich |
Yang, Dong | Technical University of Munich |
Pries, Rastin | Nokia |
Steinbach, Eckehard | Technical University of Munich |
Keywords: SLAM, Multi-Robot SLAM, Autonomous Agents
Abstract: Visual SLAM is an essential tool in diverse applications such as robot perception and extended reality, where feature-based methods are prevalent due to their accuracy and robustness. However, existing methods employ either hand-crafted or solely learnable point features and are thus limited by those features' attributes. In this paper, we propose efficiently incorporating hybrid point features into a single system. By integrating hand-crafted and learnable features, we seek to capitalize on their complementary attributes in both key-point identification and descriptor expressiveness. To this end, we design a pre-processing module, which includes extraction, inter-class processing, and post-processing of hybrid point features. We present an efficient matching approach that performs data association exclusively within the same class of features. Moreover, we design a Hybrid Bag-of-Words (H-BoW) model to handle hybrid point features in matching and loop closure detection. By integrating the proposed framework into a modern feature-based system, we introduce HPF-SLAM. We evaluate the system on the EuRoC-MAV and TUM-RGBD benchmarks. The experimental results show that our method consistently surpasses the baseline at comparable speed.
|
|
13:30-15:00, Paper ThBT26-NT.2 | Add to My Program |
2D-3D Object Shape Alignment for Camera-Object Pose Compensation in Object-Visual SLAM |
|
Lee, Hanyeol | Seoul National University |
Jung, Jaehyung | Technical University of Munich |
Park, Chan Gook | Seoul National University |
Keywords: SLAM, Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: In this study, we propose an object shape alignment method with a robust optimization scheme for 6-degrees-of-freedom (DOF) object pose compensation. Although camera-based 3D object pose estimation has improved rapidly in recent years with the development of deep learning, the estimates still contain errors due to several factors. To compensate for this, we perform a shape alignment between the 2D segmentation of the object and the projection of the 3D object in the image plane. To avoid convergence to a local minimum in the nonlinear optimization, we separate the pose into translation and rotation. This approach yields a linear optimization in the translation with reduced computational cost. For the rotation, a parallel optimization is performed with multiple initial values, reflecting the uncertainty of the initial value. We formulate an invariant extended Kalman filter (EKF)-based object-visual simultaneous localization and mapping (SLAM) with the camera-object relative pose as the measurement model. To verify the performance of the proposed algorithm, we present improved camera-object relative pose accuracy as well as localization and mapping accuracy on several sequences of the YCB-Video dataset.
|
|
13:30-15:00, Paper ThBT26-NT.3 | Add to My Program |
Spectral Trade-Off for Measurement Sparsification of Pose-Graph SLAM |
|
Nam, Jiyeon | ASRI, Seoul National University |
Hyeon, Soojeong | Seoul National University |
Joo, Youngjun | Sookmyung Women's University |
Noh, DongKi | LG Electronics Inc |
Shim, Hyungbo | Seoul National University |
Keywords: SLAM, Optimization and Optimal Control, Mapping
Abstract: In this paper, we propose a trade-off optimization algorithm to compute an appropriate number of edges for measurement (edge) sparsification in pose-graph SLAM. The greater the amount of measurement data, the larger the computational burden. To reduce this burden, one can remove a portion of the measurements; however, reliable data such as odometric measurements can be lost if measurements are removed without any principle. To remove redundant measurements, we propose a trade-off optimization between maximizing the Fiedler value and minimizing the largest eigenvalue of the adjacency matrix of the measurement graph. This formulation offers two virtues. First, it is scalable: for any dataset, once the trade-off weight is given, the algorithm determines the appropriate number of edges. Second, the edges of the measurement graph can be distributed evenly: since the algorithm minimizes the largest eigenvalue of the adjacency matrix, it suppresses the upper bound on the maximum degree of the measurement graph, removing redundant information concentrated on a few nodes and improving the estimation accuracy of the sparsified graph. To validate the performance of the proposed algorithm, we apply our approach to the CSAIL, Intel, and Manhattan datasets.
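The spectral trade-off can be illustrated numerically: for a candidate edge set, score = Fiedler value of the Laplacian minus a weight times the largest adjacency eigenvalue, and edges are removed greedily while the score improves. The greedy scheme, the weight, and the protected odometry chain are illustrative assumptions; the paper solves a proper trade-off optimization.

```python
# Sketch of the spectral trade-off objective: for a candidate edge set,
# score = Fiedler value of the Laplacian minus a weight times the largest
# adjacency eigenvalue, and edges are removed greedily while the score
# improves. The greedy scheme and weight are illustrative assumptions.
import numpy as np

def spectral_score(n, edges, w=0.3):
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    L = np.diag(A.sum(1)) - A
    fiedler = np.sort(np.linalg.eigvalsh(L))[1]       # 2nd-smallest of L
    lam_max = np.linalg.eigvalsh(A)[-1]               # largest of A
    return fiedler - w * lam_max

n = 6
chain = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)}      # "odometry" edges
edges = chain | {(0, 2), (0, 3), (0, 4), (0, 5), (2, 5)}  # + loop closures
improved = True
while improved:                                       # greedy sparsification
    improved = False
    base = spectral_score(n, edges)
    for e in sorted(edges - chain):                   # never touch odometry
        if spectral_score(n, edges - {e}) > base:
            edges = edges - {e}; improved = True; break
print(len(edges), "edges kept, score:", round(spectral_score(n, edges), 3))
```

Protecting the odometry chain mirrors the paper's motivation that reliable odometric measurements should not be lost, while the lam_max term discourages loop closures piling onto a single node.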
|
|
13:30-15:00, Paper ThBT26-NT.4 | Add to My Program |
Learning Covariances for Estimation with Constrained Bilevel Optimization |
|
Qadri, Mohamad | Carnegie Mellon University |
Manchester, Zachary | Carnegie Mellon University |
Kaess, Michael | Carnegie Mellon University |
Keywords: SLAM, Probabilistic Inference, Probability and Statistical Methods
Abstract: We consider the problem of learning error covariance matrices for robotic state estimation. The convergence of a state estimator to the correct belief over the robot state is dependent on the proper tuning of noise models. During inference, these models are used to weigh different blocks of the Jacobian and error vector resulting from linearization and hence, additionally affect the stability and convergence of the non-linear system. We propose a gradient-based method to estimate well-conditioned covariance matrices by formulating the learning process as a constrained bilevel optimization problem over factor graphs. We evaluate our method against baselines across a range of simulated and real-world tasks and demonstrate that our technique converges to model estimates that lead to better solutions as evidenced by the improved tracking accuracy on unseen test trajectories.
|
|
13:30-15:00, Paper ThBT26-NT.5 | Add to My Program |
UWB Radar SLAM: An Anchorless Approach in Vision Denied Indoor Environments |
|
Hanchapola Appuhamilage, Gihan Charith Premachandra | Singapore University of Technology and Design |
Liu, Ran | Southwest University of Science and Technology |
Yuen, Chau | Nanyang Technological University |
Tan, U-Xuan | Singapore University of Technology and Design |
Keywords: SLAM, Range Sensing
Abstract: LiDAR and cameras are frequently used as sensors for simultaneous localization and mapping (SLAM). However, these sensors are prone to failure under low visibility (e.g., smoke) or in places with reflective surfaces (e.g., mirrors). On the other hand, electromagnetic waves exhibit better penetration properties as the wavelength increases, and are therefore unaffected by low visibility. Hence, this letter presents ultra-wideband (UWB) radar as an alternative to the existing sensors. UWB is generally used in anchor-tag SLAM systems: one or more anchors are installed in the environment and tags are attached to the robots. Although this method performs well under low visibility, modifying the existing infrastructure is not always feasible. UWB has also been used in peer-to-peer ranging collaborative SLAM systems. However, this requires more than a single robot and does not include mapping in the aforementioned low-visibility environments. Therefore, the approach presented in this letter depends solely on UWB transceivers mounted on-board. In addition, an extended Kalman filter (EKF) SLAM is used to solve the SLAM problem at the back-end. Experiments were conducted and demonstrated that the proposed UWB-based radar SLAM is able to map natural point landmarks inside an indoor environment while improving robot localization.
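For reference, a minimal EKF measurement update for a single range-only landmark observation, the kind of back-end step the letter relies on; the state layout and noise handling here are assumptions, not the authors' implementation.

```python
# Minimal EKF range-only landmark update (illustrative state layout).
import numpy as np

def ekf_range_update(mu, Sigma, lm_idx, z, R_noise):
    # State mu = [x, y, theta, lx1, ly1, lx2, ly2, ...]; z is a UWB range.
    px, py = mu[0], mu[1]
    lx, ly = mu[3 + 2 * lm_idx], mu[4 + 2 * lm_idx]
    dx, dy = lx - px, ly - py
    q = np.hypot(dx, dy)                            # predicted range
    H = np.zeros((1, len(mu)))                      # measurement Jacobian
    H[0, 0], H[0, 1] = -dx / q, -dy / q             # w.r.t. robot position
    H[0, 3 + 2 * lm_idx], H[0, 4 + 2 * lm_idx] = dx / q, dy / q
    S = H @ Sigma @ H.T + R_noise                   # innovation covariance
    K = Sigma @ H.T @ np.linalg.inv(S)              # Kalman gain
    mu = mu + (K @ np.array([[z - q]])).ravel()
    Sigma = (np.eye(len(mu)) - K @ H) @ Sigma
    return mu, Sigma
```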
|
|
13:30-15:00, Paper ThBT26-NT.6 | Add to My Program |
Less Is More: Physical-Enhanced Radar-Inertial Odometry |
|
Huang, Qiucan | Hong Kong University of Science and Technology |
Liang, Yuchen | The Hong Kong University of Science and Technology |
Qiao, Zhijian | Hong Kong University of Science and Technology |
Shen, Shaojie | Hong Kong University of Science and Technology |
Yin, Huan | Hong Kong University of Science and Technology |
Keywords: SLAM, Range Sensing, Localization
Abstract: Radar offers the advantage of providing additional physical properties related to observed objects. In this study, we design a physical-enhanced radar-inertial odometry system that capitalizes on the Doppler velocities and radar cross-section information. The filter for static radar points, correspondence estimation, and residual functions are all strengthened by integrating the physical properties. We conduct experiments on both public datasets and our self-collected data, with different mobile platforms and sensor types. Our quantitative results demonstrate that the proposed radar-inertial odometry system outperforms alternative methods using the physical-enhanced components. Our findings also reveal that using the physical properties results in fewer radar points for odometry estimation, but the performance is still guaranteed and even improved, thus aligning with the "less is more" principle.
|
|
13:30-15:00, Paper ThBT26-NT.7 | Add to My Program |
Linear Four-Point LiDAR SLAM for Manhattan World Environments |
|
Jeong, Eunju | Sookmyung Women's University |
Lee, Jina | Sookmyung Women's University |
Kang, Suyoung | Sookmyung Women's University |
Kim, Pyojin | Gwangju Institute of Science and Technology (GIST) |
Keywords: SLAM, Range Sensing, Sensor Fusion
Abstract: We present a new SLAM algorithm that utilizes an inexpensive four-point LiDAR to compensate for the short range and limited viewing angle of RGB-D cameras. The four-point LiDAR can detect distances up to 40 m but senses only four distance measurements per scan. In open spaces, RGB-D SLAM approaches, such as L-SLAM, fail to estimate robust 6-DoF camera poses due to the limitations of the RGB-D camera. We detect walls beyond the range of RGB-D cameras using the four-point LiDAR and subsequently build a reliable global Manhattan world (MW) map while simultaneously estimating 6-DoF camera poses. By leveraging the structural regularities of indoor MW environments, we overcome the challenge of SLAM with the sparse sensing of four-point LiDARs. We expand the application range of L-SLAM while preserving its strong performance, even in low-textured environments, using a linear Kalman filter (KF) framework. Our experiments in various indoor MW spaces, including open spaces, demonstrate that the performance of the proposed method is comparable to that of other state-of-the-art SLAM methods.
|
|
13:30-15:00, Paper ThBT26-NT.8 | Add to My Program |
IBoW3D: Place Recognition Based on Incremental and General Bag of Words in 3D Scans |
|
Lin, Yuxiaotong | ZJU |
Chen, Jiming | Zhejiang University |
Li, Liang | Zhejiang University |
Keywords: SLAM, Recognition, Localization
Abstract: Existing methods for place recognition in 3D point clouds either ignore partial structure information by converting 3D scans to 2D images or construct constrained bag-of-words (BoW) representations reliant on specific feature extraction algorithms. In this paper, we propose a novel method based on an incremental and general bag of words. Incorporating an adaptable keypoint and 3D local feature extraction method, we employ an incremental BoW model that is updated regularly. This enables coarse-to-fine candidate selection from the database, and a revisit is then identified following geometric verification. In addition, we propose a new supplementary metric that addresses the leaving-out issue of the conventional metric, enhancing the identification of true loops. Employing a state-of-the-art (SOTA) keypoint and feature extraction algorithm, we evaluate our method as well as SOTA place recognition methods on diverse datasets of varying quality. Experimental results demonstrate that our method outperforms the baselines across all three datasets, showcasing robust performance and notable generalization capabilities.
|
|
13:30-15:00, Paper ThBT26-NT.9 | Add to My Program |
Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-Time Visual Scene Understanding |
|
Kassab, Christina | University of Oxford |
Mattamala, Matias | University of Oxford |
Zhang, Lintong | University of Oxford |
Fallon, Maurice | University of Oxford |
Keywords: SLAM, Semantic Scene Understanding, Deep Learning for Visual Perception
Abstract: Versatile and adaptive semantic understanding would enable autonomous systems to comprehend and interact with their surroundings. Existing fixed-class models limit the adaptability of indoor mobile and assistive autonomous systems. In this work, we introduce LEXIS, a real-time indoor Simultaneous Localization and Mapping (SLAM) system that harnesses the open-vocabulary nature of Large Language Models (LLMs) to create a unified approach to scene understanding and place recognition. The approach first builds a topological SLAM graph of the environment (using visual-inertial odometry) and embeds Contrastive Language-Image Pretraining (CLIP) features in the graph nodes. We use this representation for flexible room classification and segmentation, serving as a basis for room-centric place recognition. This allows loop closure searches to be concentrated on semantically relevant places. Our proposed system is evaluated using both public simulated data and real-world data, covering office and home environments. It successfully categorizes rooms with varying layouts and dimensions and outperforms the state-of-the-art (SOTA). For place recognition and trajectory estimation tasks, we achieve performance equivalent to the SOTA, all while utilizing the same pre-trained model. Lastly, we demonstrate the system's potential for planning.
|
|
ThBT27-NT Oral Session, NT-G2 |
Add to My Program |
Dexterous Manipulation II |
|
|
Chair: Temel, Zeynep | Carnegie Mellon University |
Co-Chair: Taniguchi, Tadahiro | Ritsumeikan University |
|
13:30-15:00, Paper ThBT27-NT.1 | Add to My Program |
Helical Control in Latent Space: Enhancing Robotic Craniotomy Precision in Uncertain Environments |
|
Jia, Yuanyuan | Ritsumeikan University |
Qu, Jessica | Canadian Academy |
Taniguchi, Tadahiro | Ritsumeikan University |
Keywords: Dexterous Manipulation, Medical Robots and Systems, Machine Learning for Robot Control
Abstract: In this paper, we introduce a double-stage transfer learning framework based on expert data. It employs probabilistic graphical models to effectively capture helical periodic features in the latent space, integrating Bayesian variational inference and neural networks for implementation. Compared to traditional methods, it achieves high precision and stable control even in environments with limited observation signals and high noise levels. We have successfully applied this method to a biomedical task, a simulated cranial window procedure. Preliminary results show promising performance comparable to that of human experts using only image information, further validating the efficacy of the proposed method.
|
|
13:30-15:00, Paper ThBT27-NT.2 | Add to My Program |
1 kHz Behavior Tree for Self-Adaptable Tactile Insertion |
|
Wu, Yansong | Technische Universität München |
Wu, Fan | Technical University of Munich |
Chen, Lingyun | Technical University of Munich |
Chen, Kejia | Technical University of Munich |
Schneider, Samuel | TUM |
Johannsmeier, Lars | Franka Robotics GmbH |
Bing, Zhenshan | Technical University of Munich |
Abu-Dakka, Fares | Mondragon University |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Haddadin, Sami | Technical University of Munich |
Keywords: Dexterous Manipulation, Assembly, Force Control
Abstract: Insertion is an essential skill for robots in both modern manufacturing and service robotics. In a previous study, we proposed an insertion skill framework based on force-domain wiggle motion. The main limitation of that method lies in the robot's inability to adjust its behavior according to the changing contact state during interaction. In this paper, we extend the skill formalism by incorporating a behavior tree-based primitive switching mechanism that leverages high-frequency tactile data to estimate the contact state. The efficacy of our proposed framework is validated with a series of experiments involving the execution of tightly constrained peg-in-hole tasks. The experimental results demonstrate a significant improvement in performance, characterized by reduced execution time, heightened robustness, and superior adaptability when confronted with unknown tasks. Moreover, in the context of transfer learning, our paper provides empirical evidence that the proposed skill framework enhances transferability across distinct operational contexts and tasks.
|
|
13:30-15:00, Paper ThBT27-NT.3 | Add to My Program |
DexDLO: Learning Goal-Conditioned Dexterous Policy for Dynamic Manipulation of Deformable Linear Objects |
|
Zhaole, Sun | Tsinghua University, the University of Edinburgh, Intel Lab China |
Zhu, Jihong | University of York |
Fisher, Robert | University of Edinburgh |
Keywords: Dexterous Manipulation, Reinforcement Learning
Abstract: Deformable linear object (DLO) manipulation is needed in many fields. Previous research on DLO manipulation has primarily involved parallel-jaw gripper manipulation with fixed grasping positions. However, the potential for dexterous manipulation of DLOs using an anthropomorphic hand is under-explored. We present DexDLO, a model-free framework that learns dexterous dynamic manipulation policies for DLOs with a fixed-base dexterous hand in an end-to-end way. By abstracting several common DLO manipulation tasks into goal-conditioned tasks, DexDLO can perform tasks such as DLO grabbing, DLO pulling, and DLO end-tip position control. Using the MuJoCo physics simulator, we demonstrate that our framework can efficiently and effectively learn five different DLO manipulation tasks with the same framework parameters. We further provide a thorough analysis of the learned policies, reward functions, and reduced observations for a comprehensive understanding of the framework.
|
|
13:30-15:00, Paper ThBT27-NT.4 | Add to My Program |
Everyday Finger: A Robotic Finger That Meets the Needs of Everyday Interactive Manipulation |
|
Castro Ornelas, Ruben | Massachusetts Institute of Technology |
Cantu, Tomas | Massachusetts Institute of Technology |
Sperandio, Isabel | Massachusetts Institute of Technology |
Slocum, Alexander | Massachusetts Institute of Technology |
Agrawal, Pulkit | MIT |
Keywords: Dexterous Manipulation, Actuation and Joint Mechanisms, Compliant Joints and Mechanisms
Abstract: We provide the mechanical and dynamical requirements for a robotic finger capable of performing a large number of everyday tasks. To meet these requirements, we present a novel actuator and finger design, the Everyday Finger, that comes close to many characteristics of human fingers. In particular, we focus on minimizing the size of components to achieve proper performance without sacrificing compactness. A robotic hand that uses two Everyday Fingers demonstrated an 80% success rate in picking up and placing dishes in a rack, and the ability to pick up flat objects like napkins and delicate ones like strawberries. Videos are available at the project website: https://sites.google.com/view/everydayfinger.
|
|
13:30-15:00, Paper ThBT27-NT.5 | Add to My Program |
Quadratic Programming Based Inverse Kinematics for Precise Bimanual Manipulation |
|
Chaki, Tomohiro | Honda R&D Co., Ltd |
Kawakami, Tomohiro | Honda R&D Co., Ltd |
Keywords: Dual Arm Manipulation, Bimanual Manipulation, Telerobotics and Teleoperation
Abstract: We discuss the precise cooperative motion of a dual-arm manipulator. For the inverse kinematics of cooperative redundant manipulators, a hierarchical method using the null space and an optimization method prioritizing the end-effectors' relative position in the objective function have been proposed. However, there is no guarantee that the relative position will be maintained in regions subject to joint limits and task-space reachability constraints. As a result, unacceptable errors may occur, and some tasks cannot be accomplished. We propose designing the maximum permissible errors in advance by expressing the target relative position as inequality constraints in a Quadratic Programming (QP) problem. By extending the formulation to include a virtual spring, we also achieve subtle force application by two cooperating manipulators. The proposed method was verified in simulation and experiments.
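A hedged sketch of the core idea: track both end-effector targets in the objective while bounding the relative-position error with inequality constraints, so the maximum permissible error eps is designed in advance. The SLSQP solver and all names are illustrative assumptions; the paper's exact QP formulation and its virtual-spring extension are not reproduced here.

```python
# Illustrative QP-style IK step with a relative-position error band.
import numpy as np
from scipy.optimize import minimize

def bimanual_qp_step(J1, J2, dx1, dx2, eps, dq_max):
    n = J1.shape[1]
    J_rel = J2 - J1                                 # relative end-effector Jacobian
    dx_rel = dx2 - dx1

    def cost(dq):                                   # task tracking + damping
        e1, e2 = J1 @ dq - dx1, J2 @ dq - dx2
        return e1 @ e1 + e2 @ e2 + 1e-3 * dq @ dq

    # |J_rel dq - dx_rel| <= eps, written as two linear inequality constraints.
    cons = [
        {"type": "ineq", "fun": lambda dq: (dx_rel + eps) - J_rel @ dq},
        {"type": "ineq", "fun": lambda dq: J_rel @ dq - (dx_rel - eps)},
    ]
    res = minimize(cost, np.zeros(n), method="SLSQP",
                   bounds=[(-dq_max, dq_max)] * n,   # joint-rate limits
                   constraints=cons)
    return res.x
```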
|
|
13:30-15:00, Paper ThBT27-NT.6 | Add to My Program |
Model-Free 3D Shape Control of Deformable Objects Using Novel Features Based on Modal Analysis |
|
Yang, Bohan | The Chinese University of Hong Kong |
Lu, Bo | Soochow University |
Chen, Wei | The Chinese University of Hong Kong |
Zhong, Fangxun | The Chinese University of Hong Kong |
Liu, Yunhui | Chinese University of Hong Kong |
Keywords: Deformable Object Manipulation, Visual Servoing, Learning and Adaptive Systems, Sensor-based Control
Abstract: Shape control of deformable objects is challenging and important. This paper proposes a model-free controller using novel 3D global deformation features based on modal analysis. Unlike most existing controllers using geometric features, our controller employs physically based deformation features designed by decoupling global deformation into low-frequency modes. Although modal analysis is widely adopted in computer vision and simulation, its usage in robotic deformation control is still an open topic. We develop a new model-free framework for the modal-based deformation control. Physical interpretation of the modes enables us to formulate an analytical deformation Jacobian matrix mapping the robot manipulation onto changes of the modal features. In the Jacobian matrix, unknown geometric and physical models of the object are treated as low-dimensional modal parameters which can be used to linearly parameterize the closed-loop system. Thus, an adaptive controller with proven stability can be designed to deform the object while online estimating the modal parameters. Simulations, experiments, and comparative studies are conducted for validation.
|
|
13:30-15:00, Paper ThBT27-NT.7 | Add to My Program |
Global Planning for Contact-Rich Manipulation Via Local Smoothing of Quasi-Dynamic Contact Models |
|
Pang, Tao | Boston Dynamics AI Institute |
Suh, Hyung Ju Terry | Massachusetts Institute of Technology |
Yang, Lujie | MIT |
Tedrake, Russ | Massachusetts Institute of Technology |
Keywords: Manipulation Planning, Dexterous Manipulation, Motion and Path Planning, Contact Modeling
Abstract: The empirical success of Reinforcement Learning (RL) in contact-rich manipulation leaves much to be understood from a model-based perspective, where key difficulties are often attributed to (i) the explosion of contact modes, (ii) stiff, non-smooth contact dynamics and the resulting exploding / discontinuous gradients, and (iii) the non-convexity of the planning problem. The stochastic nature of RL addresses (i) and (ii) by sampling and averaging contact modes. In contrast, model-based methods smooth contact dynamics analytically. Our first contribution establishes the theoretical equivalence of the two methods for simple systems, and shows empirical equivalence on several complex examples. To further alleviate (ii), our second contribution is a convex, differentiable and quasi-dynamic formulation of contact dynamics, which is amenable to both smoothing schemes. Our final contribution resolves (iii), where we show that with smoothing, classical sampling-based motion planning can be effective in global planning. Applying our method on challenging contact-rich manipulation tasks, we show that model-based motion planning can perform comparably to RL with dramatically less computation.
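The smoothing idea the abstract contrasts with RL can be illustrated with a zeroth-order estimator: the gradient of the noise-averaged dynamics E_w[f(x+w)] is estimated by Monte Carlo sampling, which stays informative even where f itself is stiff or non-smooth. A purely illustrative sketch:

```python
# Randomized smoothing sketch: score-function estimate of d/dx E[f(x + w)]
# with w ~ N(0, sigma^2 I), i.e. E[f(x + w) w] / sigma^2 for Gaussian noise.
import numpy as np

def smoothed_gradient(f, x, sigma=0.1, n_samples=256, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=sigma, size=(n_samples, x.size))
    fx = np.array([f(x + wi) for wi in w])
    return (fx[:, None] * w).mean(axis=0) / sigma**2

# Example: a stiff, non-smooth 1D "contact" map with a hard wall at 0.
wall = lambda x: np.maximum(x, 0.0).sum()
g = smoothed_gradient(wall, np.array([-0.05]))  # nonzero even behind the wall
```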
|
|
13:30-15:00, Paper ThBT27-NT.8 | Add to My Program |
Enhancing Dexterity in Robotic Manipulation Via Hierarchical Contact Exploration |
|
Cheng, Xianyi | Carnegie Mellon University |
Patil, Sarvesh | Carnegie Mellon University School of Computer Science |
Temel, Zeynep | Carnegie Mellon University |
Kroemer, Oliver | Carnegie Mellon University |
Mason, Matthew T. | Carnegie Mellon University |
Keywords: Dexterous Manipulation, Manipulation Planning, In-Hand Manipulation
Abstract: Planning robot dexterity is challenging due to the non-smoothness introduced by contacts, intricate fine motions, and ever-changing scenarios. We present a hierarchical planning framework for dexterous robotic manipulation (HiDex). This framework explores in-hand and extrinsic dexterity by leveraging contacts. It generates rigid-body motions and complex contact sequences. Our framework is based on Monte-Carlo Tree Search and has three levels: 1) planning object motions and environment contact modes; 2) planning robot contacts; 3) path evaluation and control optimization. This framework offers two main advantages. First, it allows efficient global reasoning over high-dimensional complex space created by contacts. It solves a diverse set of manipulation tasks that require dexterity, both intrinsic (using the fingers) and extrinsic (also using the environment), mostly in seconds. Second, our framework allows the incorporation of expert knowledge and customizable setups in task mechanics and models. It requires minor modifications to accommodate different scenarios and robots. Hence, it provides a flexible and generalizable solution for various manipulation tasks. As examples, we analyze the results on 7 hand configurations and 15 scenarios. We demonstrate 8 tasks on two robot platforms.
|
|
13:30-15:00, Paper ThBT27-NT.9 | Add to My Program |
Inter-Finger Small Object Manipulation with DenseTact Optical Tactile Sensor |
|
Do, Won Kyung | Stanford University |
Aumann, Bianca | Stanford University |
Chungyoun, Camille | Stanford University |
Kennedy, Monroe | Stanford University |
Keywords: Dexterous Manipulation, In-Hand Manipulation, Grasping
Abstract: The ability to grasp and manipulate small objects in cluttered environments remains a significant challenge. This letter introduces a novel approach that utilizes a tactile sensor-equipped gripper with eight degrees of freedom to overcome these limitations. We employ DenseTact 2.0 for the gripper, enabling precise control and improved grasp success rates, particularly for small objects ranging from 5 mm to 25 mm. Our integrated strategy incorporates the robot arm, gripper, and sensor to effectively manipulate and orient small objects for subsequent classification. We contribute a specialized dataset designed for classifying these objects based on tactile sensor output and a new control algorithm for in-hand orientation tasks. Our system demonstrates an 88% grasp success rate and successfully classifies small objects in cluttered scenarios.
|
|
ThBT28-NT Oral Session, NT-G4 |
Add to My Program |
Perception for Grasping and Manipulation II |
|
|
Chair: Yang, Chenguang | University of Liverpool |
Co-Chair: Iba, Soshi | Honda Research Institute USA |
|
13:30-15:00, Paper ThBT28-NT.1 | Add to My Program |
ViHOPE: Visuotactile In-Hand Object 6D Pose Estimation with Shape Completion |
|
Li, Hongyu | Brown University |
Dikhale, Snehal | Honda Research Institute USA |
Iba, Soshi | Honda Research Institute USA |
Jamali, Nawid | Honda Research Institute USA, Inc |
Keywords: Perception for Grasping and Manipulation, Deep Learning for Visual Perception, Force and Tactile Sensing
Abstract: In this letter, we introduce ViHOPE, a novel framework for estimating the 6D pose of an in-hand object using visuotactile perception. Our key insight is that the accuracy of the 6D object pose estimate can be improved by explicitly completing the shape of the object. To this end, we introduce a novel visuotactile shape completion module that uses a conditional Generative Adversarial Network to complete the shape of an in-hand object based on a volumetric representation. This approach improves over prior works that directly regress visuotactile observations to a 6D pose. By explicitly completing the shape of the in-hand object and jointly optimizing the shape completion and pose estimation tasks, we improve the accuracy of the 6D object pose estimate. We train and test our model on a synthetic dataset and compare it with the state-of-the-art. In the visuotactile shape completion task, we outperform the state-of-the-art by 265% using the Intersection over Union metric and achieve 88% lower Chamfer Distance. In the visuotactile pose estimation task, we present results that suggest our framework reduces position and angular errors by 35% and 64%, respectively. Furthermore, we ablate our framework to confirm the gain on the 6D object pose estimate from explicitly completing the shape. Ultimately, we show that our framework produces models that are robust to sim-to-real transfer on a real-world robot platform.
|
|
13:30-15:00, Paper ThBT28-NT.2 | Add to My Program |
VERGNet: Visual Enhancement Guided Robotic Grasp Detection under Low-Light Condition |
|
Niu, Mingdi | Shanxi University |
Lu, Zhenyu | Bristol Robotics Laboratory |
Chen, Lu | Shanxi University |
Yang, Jing | Shanxi University |
Yang, Chenguang | University of Liverpool |
Keywords: Perception for Grasping and Manipulation, Deep Learning in Grasping and Manipulation
Abstract: Although existing grasp detection methods have achieved encouraging performance under well-lit conditions, repeated experiments show that detection performance deteriorates drastically under low-light conditions. While supplementary information can be provided by additional sensors, such as depth cameras, the sparse and weak visual features still hinder improvements in detection accuracy. To address these issues, we propose a visual enhancement guided grasp detection model (VERGNet) to improve the robustness of robotic grasping in low-light conditions. First, a simultaneous grasp detection and low-light feature enhancement framework is designed, which integrates residual blocks with coordinate attention to re-optimize grasping features. Then, an unsupervised low-light feature enhancement strategy is adopted to reduce the dependence on paired data and improve the algorithm's robustness to low-light conditions. Extensive experiments are conducted on two newly constructed low-light grasp datasets, and the proposed method achieves 98.9% and 91.2% detection accuracy respectively, surpassing comparative methods. The effectiveness of our method has also been validated in real-world low-light imaging scenarios.
|
|
13:30-15:00, Paper ThBT28-NT.3 | Add to My Program |
TactileAR: Active Tactile Pattern Reconstruction |
|
Wu, Bing | Dalian University of Technology, China |
Liu, Qian | Dalian University of Technology |
Keywords: Perception for Grasping and Manipulation, Force and Tactile Sensing, Contact Modeling
Abstract: High-resolution (HR) contact surface information is essential for robotic grasping and precise manipulation tasks. However, it remains a challenge for current taxel-based sensors to obtain HR tactile information. In this paper, we focus on utilizing low-resolution (LR) tactile sensors to reconstruct a localized, dense, and HR representation of contact surfaces. In particular, we build a Gaussian triaxial tactile sensor degradation model and propose a tactile pattern reconstruction framework based on the Kalman filter. This framework enables the reconstruction of 2D HR contact surface shapes from collected LR tactile sequences. In addition, we present an active exploration strategy to enhance reconstruction efficiency. We evaluate the proposed method in real-world scenarios in comparison with existing prior-information-based approaches. Experimental results confirm the efficiency of the proposed approach and demonstrate satisfactory reconstructions of complex contact surface shapes.
|
|
13:30-15:00, Paper ThBT28-NT.4 | Add to My Program |
Online Estimation of Articulated Objects with Factor Graphs Using Vision and Proprioceptive Sensing |
|
Buchanan, Russell | University of Edinburgh |
Röfer, Adrian | University of Freiburg |
Moura, Joao | The University of Edinburgh |
Valada, Abhinav | University of Freiburg |
Vijayakumar, Sethu | University of Edinburgh |
Keywords: Perception for Grasping and Manipulation, Deep Learning in Grasping and Manipulation, Autonomous Agents
Abstract: From dishwashers to cabinets, humans interact with articulated objects every day, and for a robot to assist in common manipulation tasks, it must learn a representation of articulation. Recent deep learning methods can provide powerful vision-based priors on the affordance of articulated objects from previous, possibly simulated, experiences. In contrast, many other works estimate articulation by observing the object in motion, requiring the robot to already be interacting with the object. In this work, we propose to combine the best of both worlds by introducing an online estimation method that merges vision-based affordance predictions from a neural network with interactive kinematic sensing in an analytical model. Our approach uses vision to predict an articulation model before touching the object, while also being able to update the model quickly from kinematic sensing during the interaction. In this paper, we implement a full system using shared autonomy for robotic opening of articulated objects, in particular objects whose articulation is not apparent from vision alone. We implemented our system on a real robot and performed several autonomous closed-loop experiments in which the robot had to open a door with an unknown joint while estimating the articulation online. Our system achieved an 80% success rate for autonomous opening of unknown articulated objects.
|
|
13:30-15:00, Paper ThBT28-NT.5 | Add to My Program |
Learning to Estimate Incipient Slip with Tactile Sensing to Gently Grasp Objects |
|
Boonstra, Dirk-Jan | Delft University of Technology |
Willemet, Laurence | TU Delft |
Luijkx, Jelle Douwe | Delft University of Technology |
Wiertlewski, Michael | TU Delft |
Keywords: Perception for Grasping and Manipulation, Force and Tactile Sensing, Machine Learning for Robot Control
Abstract: To gently grasp objects, robots need to generate enough friction while avoiding forces large enough to damage the object. In practice, force regulation is challenging to implement since it requires knowledge of the friction coefficient, which can vary from object to object and even from grasp to grasp. Tactile sensing offers a window into the contact mechanics and provides information about friction. Notably, touch can detect the precursor of the object slipping away from the grasp. To find this information, tactile sensors measure the deformation field of an artificial skin in both the normal and tangential directions. However, current approaches only react to slip and therefore respond too late to perturbations: the object slips, causing the grasp to fail and damaging the object. In this study, we introduce a method that uses machine learning to anticipate slip by computing the so-called safety margin of the grasp. This safety margin represents the extra lateral force that keeps the contact away from the frictional limit. To find this value, we use a high-density camera-based tactile sensor that measures the 3D deformation of the surface via the movement of 82 colored markers, and we train a Convolutional Neural Network (CNN) to estimate the safety margin from the tactile images. Because it gives a distance to slip, the safety margin is a powerful metric for regulating grasp forces. As a testament to this effectiveness, we show that a simple proportional controller can robustly grasp a wide variety of objects. The results show that this control method outperforms slip detection methods, reducing regrasp reaction times while decreasing grasping forces to 1-3 N.
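The control law described above can be sketched in a few lines: a proportional controller on the CNN-estimated safety margin (the extra lateral force the contact can sustain before slipping). The CNN itself is omitted, and all gains, targets, and limits below are illustrative assumptions:

```python
# Proportional grip-force regulation on an estimated safety margin.
def grip_force_update(f_normal, safety_margin, margin_target=0.5,
                      k_p=2.0, f_min=1.0, f_max=3.0):
    # Tighten the grip when the margin falls below target, relax otherwise.
    f_new = f_normal + k_p * (margin_target - safety_margin)
    return min(max(f_new, f_min), f_max)  # clamp to the 1-3 N range reported
```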
|
|
13:30-15:00, Paper ThBT28-NT.6 | Add to My Program |
Learning Interaction Constraints for Robot Manipulation Via Set Correspondences |
|
Nan, Junyu | Carnegie Mellon University |
Hodgins, Jessica | Carnegie Mellon University |
Okorn, Brian | Boston Dynamics AI Institute |
Keywords: Perception for Grasping and Manipulation, Deep Learning for Visual Perception
Abstract: Cross-pose estimation between rigid objects is a fundamental building block for robotic applications. In this paper, we propose a new cross-pose estimation method that predicts correspondences on a set level as opposed to a point level. This contrasts with methods that predict cross-pose from per-point correspondences, which can encounter optimization problems for objects with symmetries, since each point may have multiple valid correspondences. Our method, SCAlign, consists of a Set Correspondence Network (SCN), which predicts these sets and their correspondences, and an alignment module that computes their relative cross-pose. Taking point clouds of two objects as input, SCN predicts a set label for each point such that points sharing a set label form a cross-object correspondence. The alignment module then computes the cross-pose as the SE(3) transformation that aligns these set correspondences. We compare SCAlign against other cross-pose estimation baselines on a synthetically generated dataset, SynWidth, which contains randomly generated width-mate objects with symmetric or near-symmetric intercepts. SCAlign significantly outperforms the baselines on this challenging dataset. Additionally, we show that set correspondences can be leveraged to distinguish positive and negative matches between pegs and holes. Robot experiments further validate the practical application of this approach.
|
|
13:30-15:00, Paper ThBT28-NT.7 | Add to My Program |
Joint-Loss Enhanced Self-Supervised Learning for Refinement-Coupled Object 6D Pose Estimation |
|
Mu, Fengjun | University of Electronic Science and Technology of China |
Sun, Shixiang | University of Electronic Science and Technology of China |
Huang, Rui | University of Electronic Science and Technology of China |
Zou, Chaobin | University of Electronic Science and Technology of China |
Li, Wenjiang | University of Electronic Science and Technology of China |
Zhan, Huayi | Changhong AI Lab (CHAIR), Sichuan Changhong Electronics Holding |
Cheng, Hong | University of Electronic Science and Technology |
Keywords: Perception for Grasping and Manipulation, Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: 6D object pose estimation plays a crucial role in robot grasping and manipulation. However, the prevalent methods for 6D object pose estimation heavily rely on 6D annotated data to train deep neural networks, which poses challenges due to the difficulty in obtaining sufficient pose annotations. To address this limitation, this paper presents a self-supervised pose estimation method based on a novel pixel-wise weighted dense fusion architecture. This method allows for direct learning from unannotated RGB-D data facilitated by an Iterative Annotation Resolver. Furthermore, a self-supervised pose refinement method based on joint loss is proposed to enhance the pose estimation accuracy. This refinement method employs a differentiable renderer to construct joint optimization constraints. The experimental results demonstrate that our approach achieves a level of pose estimation accuracy that closely rivals that of supervised methods.
|
|
13:30-15:00, Paper ThBT28-NT.8 | Add to My Program |
Force-Based Semantic Representation and Estimation of Feature Points for Robotic Cable Manipulation with Environmental Contacts |
|
Monguzzi, Andrea | Politecnico Di Milano |
Karayiannidis, Yiannis | Lund University |
Rocco, Paolo | Politecnico Di Milano |
Zanchettin, Andrea Maria | Politecnico Di Milano |
Keywords: Perception for Grasping and Manipulation, Dual Arm Manipulation, Force and Tactile Sensing
Abstract: This work demonstrates the utility of dual-arm robots with dual-wrist force-torque sensors in manipulating a Deformable Linear Object (DLO) within an unknown environment that imposes constraints on the DLO's movement through contacts and fixtures. We propose a strategy to estimate the pose of unknown environmental contacts encountered during the manipulation of a DLO, classifying the induced constraints as unilateral, bilateral and fully constrained, exploiting the redundancy of force sensors. A semantic approach to define environmental constraints is introduced and incorporated into a graph-based model of the DLO. This model remains accurate as long as the DLO is under tension and is dynamically updated throughout the manipulation process, built by sequencing a set of primitives. The estimation strategy is validated through simulations and real-world experiments, demonstrating its potential in handling DLOs under various, possibly uncertain, constraints.
|
|
13:30-15:00, Paper ThBT28-NT.9 | Add to My Program |
AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains |
|
Fang, Hao-Shu | Shanghai Jiao Tong University |
Wang, Chenxi | Shanghai Jiao Tong University |
Fang, Hongjie | Shanghai Jiao Tong University |
Gou, Minghao | Shanghai Jiao Tong University |
Liu, Jirong | Shanghai Jiaotong University |
Yan, Hengxu | Shanghai Jiao Tong University |
Liu, Wenhai | Shanghai Jiao Tong University |
Xie, Yichen | University of California, Berkeley |
Lu, Cewu | ShangHai Jiao Tong University |
Keywords: Perception for Grasping and Manipulation, Grasping, Computer Vision for Automation, Grasp Tracking
Abstract: As the basis for prehensile manipulation, it is vital to enable robots to grasp as robustly as humans. Our innate grasping system is prompt, accurate, flexible, and continuous across spatial and temporal domains. Few existing methods cover all these properties for robot grasping. In this paper, we propose AnyGrasp for grasp perception, endowing robots with these abilities using a parallel gripper. Specifically, we develop a dense supervision strategy with real perception and analytic labels in the spatial-temporal domain. Additional awareness of objects' center-of-mass is incorporated into the learning process to help improve grasping stability. Utilization of grasp correspondence across observations enables dynamic grasp tracking. Our model can efficiently generate accurate, 7-DoF, dense, and temporally-smooth grasp poses and works robustly against large depth-sensing noise. Using AnyGrasp, we achieve a 93.3% success rate when clearing bins with over 300 unseen objects, which is on par with human subjects under controlled conditions. Over 900 mean-picks-per-hour is reported on a single-arm system. For dynamic grasping, we demonstrate catching swimming robot fish in the water.
|
|
ThBT29-NT Oral Session, NT-G5 |
Add to My Program |
Object Detection IV |
|
|
Chair: Yue, Yufeng | Beijing Institute of Technology |
Co-Chair: Feng, Chen | New York University |
|
13:30-15:00, Paper ThBT29-NT.1 | Add to My Program |
Uplifting Range-View-Based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion |
|
Tan, Shiqi | University of Toronto, Huawei Technologies Canada, Co., Ltd |
Fazlali, Hamidreza | Noah's Ark Lab |
Xu, Yixuan | Huawei Technologies Canada Co., Ltd |
Ren, Yuan | Noah's Ark Lab, Huawei Technologies Canada Inc |
Liu, Bingbing | Huawei Technologies |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding, Deep Learning for Visual Perception
Abstract: Range-View (RV)-based 3D point cloud segmentation is widely adopted due to its compact data form. However, RV-based methods fall short in providing robust segmentation for occluded points and suffer from distortion of projected RGB images due to the sparse nature of 3D point clouds. To alleviate these problems, we propose a new LiDAR and Camera Range-view-based 3D point cloud semantic segmentation method (LaCRange). Specifically, a distortion-compensating knowledge distillation (DCKD) strategy is designed to remedy the adverse effect of RV projection of RGB images. Moreover, a context-based feature fusion module is introduced for robust and preservative sensor fusion. Finally, in order to address the limited resolution of RV and its insufficient 3D topology, a new point refinement scheme is devised for proper aggregation of features in 2D and augmentation of point features in 3D. We evaluated the proposed method on large-scale autonomous driving datasets, i.e., SemanticKITTI and nuScenes. In addition to being real-time, the proposed method achieves state-of-the-art results on the nuScenes benchmark.
|
|
13:30-15:00, Paper ThBT29-NT.2 | Add to My Program |
Radar Tracker: Moving Instance Tracking in Sparse and Noisy Radar Point Clouds |
|
Zeller, Matthias | CARIAD SE |
Casado Herraez, Daniel | University of Bonn & CARIAD SE |
Behley, Jens | University of Bonn |
Heidingsfeld, Michael | CARIAD SE |
Stachniss, Cyrill | University of Bonn |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding, Deep Learning Methods
Abstract: Robots and autonomous vehicles should be aware of what happens in their surroundings. The segmentation and tracking of moving objects are essential for reliable path planning, including collision avoidance. We investigate this estimation task for vehicles using radar sensing. We address moving instance tracking in sparse radar point clouds to enhance scene interpretation. We propose a learning-based radar tracker incorporating temporal offset predictions to enable direct center-based association and enhance segmentation performance by including additional motion cues. We implement attention-based tracking for sparse radar scans to include appearance features and enhance performance. The final association combines geometric and appearance features to overcome the limitations of center-based tracking to associate instances reliably. Our approach shows an improved performance on the moving instance tracking benchmark of the RadarScenes dataset compared to the current state of the art.
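A minimal sketch of center-based association with predicted temporal offsets: each detected instance center is shifted back by its predicted offset and matched to the nearest existing track. Greedy nearest-neighbor matching is an assumption here; the paper's final association additionally combines geometric and appearance features.

```python
# Illustrative center-offset association for instance tracking.
import numpy as np

def associate(centers, offsets, track_centers, max_dist=2.0):
    # centers, offsets: (N, d) arrays; track_centers: (M, d) array.
    matches, used = [], set()
    for i, (c, o) in enumerate(zip(centers, offsets)):
        c_prev = c - o                              # project center back in time
        if len(track_centers) == 0:
            matches.append((i, None)); continue
        d = np.linalg.norm(track_centers - c_prev, axis=1)
        j = int(np.argmin(d))
        if d[j] < max_dist and j not in used:
            matches.append((i, j)); used.add(j)     # extend existing track j
        else:
            matches.append((i, None))               # spawn a new track
    return matches
```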
|
|
13:30-15:00, Paper ThBT29-NT.3 | Add to My Program |
ProEqBEV: Product Group Equivariant BEV Network for 3D Object Detection in Road Scenes of Autonomous Driving |
|
Liu, Hongwei | Fuzhou University |
Yang, Jian | Information Engineering University |
Li, Zhengyu | East China Normal University |
Li, Ke | Information Engineering University |
Zheng, Jianzhang | Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences |
Wang, Xihao | Technische Universität München |
Tang, Xuan | East China Normal University |
Chen, Mingsong | East China Normal University |
Wei, Xian | East China Normal University |
You, Xiong | Information Engineering University |
Keywords: Object Detection, Segmentation and Categorization, Recognition, Deep Learning for Visual Perception
Abstract: With the rapid development of autonomous driving systems, 3D object detection based on Bird's Eye View (BEV) in road scenes has witnessed great progress over the past few years. As a road scene exhibits a part-whole hierarchy between the objects within it and the scene itself, simple parts (e.g., roads, lane lines, vehicles, and pedestrians) can be assembled into progressively more complex shapes to form a BEV representation of the whole road scene. A BEV therefore has multiple levels of freedom of motion: the rotation and translation of the whole BEV, and the random movements of objects (e.g., pedestrians and vehicles) inside the BEV. However, most current single-sensor or multi-sensor-fusion-based BEV object detection methods do not capture such multi-level motion in a BEV. To address this problem, we propose a product group equivariant object detection network framework, based on multi-sensor fusion, that is equivariant with respect to multiple levels of symmetry groups. The proposed framework extracts local equivariant features of objects in point clouds, while global equivariant features are extracted in both point clouds and images. Furthermore, the network learns diverse rotation-equivariant features and mitigates a significant amount of detection errors caused by rotations of the BEV and of objects inside it, thereby further enhancing detection performance. The experimental results show that the network architecture significantly improves object detection in terms of mAP and NDS. In addition, to demonstrate the effectiveness of the proposed local-multi-global equivariant components, we conduct extensive ablation experiments; the results show that the individual components are indispensable to the object detection performance of the overall architecture.
|
|
13:30-15:00, Paper ThBT29-NT.4 | Add to My Program |
Orientation-Aware Multi-Modal Learning for Road Intersection Identification and Mapping |
|
He, Qibin | University of Chinese Academy of Sciences |
Xiao, Zhongyang | Autonomous Driving Division of NIO Inc., China |
Huang, Ze | Fudan University |
Yuan, Hongyuan | Autonomous Driving Division of NIO Inc |
Sun, Li | University of Sheffield |
Keywords: Object Detection, Segmentation and Categorization, Representation Learning, Mapping
Abstract: Accurate identification of road intersections is the pivotal task for automatic construction of high-definition maps, particularly in unstructured scenes. Existing methods predominantly rely on single-modal data and thus show an obvious unimodal limitation, i.e., lack of contextual information. Moreover, these approaches overlook the benefits of leveraging multi-modal data fusion and representation learning that is crucial for generalizability. To this end, we propose a novel orientation-aware multi-modal learning paradigm, which formulates intersection identification as an oriented object detection task. Specifically, heterogeneous fusion is introduced to harmonize disparate data modalities, i.e., vector maps, point clouds, and vehicle trajectories, into a unified feature space. Concurrently, we present trigonometric-induced adaptive regression to elevate orientation estimation, while mitigating issues related to scale imbalance and boundary confusion through dual-objective matching with spatial adaptation. To evaluate our methodology, we assemble the first-of-its-kind multi-modal benchmark tailored for complex low-speed environments, complete with fine-grained semantic annotations for intersections. Comprehensive empirical analyses, including ablation studies, affirm both the superior performance of our proposed framework and the efficacy of its constituent modules.
|
|
13:30-15:00, Paper ThBT29-NT.5 | Add to My Program |
Masked Gamma-SSL: Learning Uncertainty Estimation Via Masked Image Modeling |
|
Williams, David | University of Oxford |
Gadd, Matthew | University of Oxford |
Newman, Paul | Oxford University |
De Martini, Daniele | University of Oxford |
Keywords: Object Detection, Segmentation and Categorization, Representation Learning, Deep Learning for Visual Perception
Abstract: This work proposes a semantic segmentation network that produces high-quality uncertainty estimates in a single forward pass. We exploit general representations from foundation models and unlabelled datasets through a Masked Image Modeling (MIM) approach, which is robust to augmentation hyper-parameters and simpler than previous techniques. For neural networks used in safety-critical applications, bias in the training data can lead to errors; therefore it is crucial to understand a network’s limitations at run time and act accordingly. To this end, we test our proposed method on a number of test domains including the SAX Segmentation benchmark, which includes labelled test data from dense urban, rural and off-road driving domains. The proposed method consistently outperforms uncertainty estimation and Out-of-Distribution (OoD) techniques on this difficult benchmark.
|
|
13:30-15:00, Paper ThBT29-NT.6 | Add to My Program |
EfficientDPS: Efficient and End-To-End Depth-Aware Panoptic Segmentation |
|
Wu, Shengkai | CVTE |
Ren, Liangliang | CVTE |
Gao, Linfeng | CVTE |
Li, Yupeng | CVTE |
Liu, Wenyu | Huazhong University of Science and Technology |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding, Deep Learning for Visual Perception
Abstract: Depth-aware panoptic segmentation (DPS) combines image segmentation and monocular depth estimation in a single model to achieve semantic and geometric perception simultaneously. The DPS task has important applications in robotics, but previous DPS models are too computationally heavy to deploy. Thus, we propose EfficientDPS, an efficient, end-to-end, and unified model for DPS. In our method, query features extracted with convolutional networks are used to represent things and stuff. In this way, different vision tasks such as classification, segmentation, and depth estimation can be realized in a unified manner, leading to a compact and efficient model. EfficientDPS can be trained and tested end-to-end via bipartite matching, and no complex post-processing is needed at inference. To enhance the supervision signal, a group query representation is proposed, leading to better performance without affecting inference speed. Extensive experiments on Cityscapes-DPS and SemKITTI-DPS show that EfficientDPS achieves a better trade-off between speed and accuracy than state-of-the-art methods.
|
|
13:30-15:00, Paper ThBT29-NT.7 | Add to My Program |
Robust Collaborative Perception against Temporal Information Disturbance |
|
He, Xunjie | Beijing Institute of Technology |
Li, Yiming | Beijing Institute of Technology |
Cui, Te | Beijing Institute of Technology |
Wang, Meiling | Beijing Institute of Technology |
Liu, Tong | Beijing Institute of Technology |
Yue, Yufeng | Beijing Institute of Technology |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding, AI-Based Methods
Abstract: Collaborative perception facilitates a more comprehensive representation of the environment by leveraging complementary information shared among various agents and sensors. However, practical applications often encounter temporal information disturbance, including perception packet loss and time delays, and a comprehensive framework that can address such issues simultaneously has been absent. In addition, the feature extraction process prior to fusion is often insufficient, as it lacks exploration of the local semantics and context dependencies of individual features. To enhance both accuracy and robustness, this paper introduces a novel framework named Robust Collaborative Perception against Temporal Information Disturbance, which predicts perception information when disturbance occurs. Specifically, the Historical Frame Prediction (HFP) module is introduced to compensate for information loss by mining the temporal associations of historical features. Based on the predicted features generated by the HFP module, the Pyramid Attention Integration (PAI) module is introduced to augment local semantics and incorporate global long-range dependencies through multi-scale window attention. Compared with existing methods on the publicly available OPV2V dataset, our approach exhibits superior performance and improved robustness in the 3D object detection task. The code will be publicly available at https://github.com/hexunjie/Ro-temd.
|
|
13:30-15:00, Paper ThBT29-NT.8 | Add to My Program |
Concavity-Induced Distance for Unoriented Point Cloud Decomposition |
|
Wang, Ruoyu | New York University |
Xue, Yanfei | New York University |
Surianarayanan, Bharath | New York University |
Tian, Dong | InterDigital |
Feng, Chen | New York University |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding, Computational Geometry
Abstract: We propose Concavity-induced Distance (CID) as a novel way to measure the dissimilarity between a pair of points in an unoriented point cloud. CID indicates the likelihood of two points or two sets of points belonging to different convex parts of an underlying shape represented as a point cloud. After analyzing its properties, we demonstrate how CID can benefit point cloud analysis without the need for meshing or normal estimation, which is beneficial for robotics applications when dealing with raw point cloud observations. By randomly selecting very few points for manual labeling, a CID-based point cloud instance segmentation via label propagation achieves comparable average precision as recent supervised deep learning approaches, on S3DIS and ScanNet datasets. Moreover, CID can be used to group points into approximately convex parts whose convex hulls can be used as compact scene representations in robotics, and it outperforms the baseline method in terms of grouping quality. Our project website is available at: https://ai4ce.github.io/CID/
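One plausible reading of the construction, offered purely as a hedged sketch (the paper's exact definition may differ): two points are likely to sit in different convex parts if the straight segment between them strays far from the point cloud, so a concavity score can be taken as the maximum nearest-cloud distance over sampled segment points.

```python
# Illustrative concavity score between two points of an unoriented point cloud.
import numpy as np
from scipy.spatial import cKDTree

def concavity_score(p, q, cloud_tree, n_samples=32):
    ts = np.linspace(0.0, 1.0, n_samples)
    segment = p[None, :] * (1 - ts[:, None]) + q[None, :] * ts[:, None]
    dists, _ = cloud_tree.query(segment)            # nearest-cloud distances
    return dists.max()                              # large => likely concave gap

# Usage: cloud_tree = cKDTree(points)  # build once per point cloud
```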
|
|
13:30-15:00, Paper ThBT29-NT.9 | Add to My Program |
FocoTrack: Multi Object Tracking by Focusing on Overlap at Low Frame Rate |
|
Lee, Jae-Hyeok | Korea Advanced Institute of Science Technology |
Park, Jae-Hyeon | Korea Advanced Institute of Science and Technology (KAIST) |
Chang, Dong Eui | KAIST |
Keywords: Human Detection and Tracking
Abstract: Multi-object tracking (MOT) presents a crucial challenge in robotics. Due to the limited computational resources embedded in robots, the processing time of an algorithm per time step can be considerably large. This scenario necessitates operating MOT at a low frame rate. However, algorithms in the MOT research field have been built around datasets running at 10-30 frames per second (fps), which can be difficult to sustain on such limited resources. In response, we introduce a new algorithm, called FocoTrack, which maintains tracking ability in four situations, one of which is when objects overlap each other. Our algorithm exhibits remarkable performance without using any deep appearance descriptor, surpassing existing MOT methods that do use deep appearance descriptors on a 2.5 fps dataset. We also demonstrate strong results with our algorithm on the DanceTrack dataset at 20 fps and provide comprehensive insights through a detailed analysis of our tracking model.
|
|
ThBT30-NT Oral Session, NT-G6 |
Add to My Program |
AI-Enabled Robotics and Learning |
|
|
Chair: Lu, Peng | The University of Hong Kong |
Co-Chair: Inamura, Tetsunari | Tamagawa University |
|
13:30-15:00, Paper ThBT30-NT.1 | Add to My Program |
Ethically Compliant Autonomous Systems under Partial Observability |
|
Lu, Qingyuan | Massachusetts Institute of Technology |
Svegliato, Justin | University of California Berkeley |
Nashed, Samer | University of Massachusetts Amherst |
Zilberstein, Shlomo | University of Massachusetts |
Russell, Stuart Jonathan | University of California, Berkeley |
Keywords: Ethics and Philosophy, Planning under Uncertainty
Abstract: Ethically compliant autonomous systems (ECAS) are the prevailing approach to building robotic systems that perform sequential decision making subject to ethical theories in fully observable environments. However, in real-world robotics settings, these systems often operate under partial observability because of sensor limitations, environmental conditions, or limited inference due to bounded computational resources. Therefore, this paper proposes a partially observable ECAS (PO-ECAS), bringing this work one step closer to being a practical and useful tool for roboticists. First, we formally introduce the PO-ECAS framework and a MILP-based solution method for approximating an optimal ethically compliant policy. Next, we extend an existing ethical framework for prima facie duties to belief space and offer an ethical framework for virtue ethics inspired by Aristotle's Doctrine of the Mean. Finally, we demonstrate that our approach is effective in a simulated campus patrol robot domain.
|
|
13:30-15:00, Paper ThBT30-NT.2 | Add to My Program |
Prompt, Plan, Perform: LLM-Based Humanoid Control Via Quantized Imitation Learning |
|
Sun, Jingkai | The Hong Kong University of Science and Technology (GZ) |
Zhang, Qiang | The Hong Kong University of Science and Technology (Guangzhou) |
Duan, Yiqun | University of Technology Sydney |
Jiang, Xiaoyang | Northeastern University |
Cheng, Chong | HKUST(GZ) |
Xu, Renjing | The Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Imitation Learning, Whole-Body Motion Planning and Control, Human and Humanoid Motion Analysis and Synthesis
Abstract: In recent years, reinforcement learning and imitation learning have shown great potential for controlling humanoid robots' motion. However, these methods typically create simulation environments and rewards for specific tasks, resulting in the requirements of multiple policies and limited capabilities for tackling complex and unknown tasks. To overcome these issues, we present a novel approach that combines adversarial imitation learning with large language models (LLMs). This innovative method enables the agent to learn reusable skills with a single policy and solve zero-shot tasks under the guidance of LLMs. In particular, we utilize the LLM as a strategic planner for applying previously learned skills to novel tasks through the comprehension of task-specific prompts. This empowers the robot to perform the specified actions in a sequence. To improve our model, we incorporate codebook-based vector quantization, allowing the agent to generate suitable actions in response to unseen textual commands from LLMs. Furthermore, we design general reward functions that consider the distinct motion features of humanoid robots, ensuring the agent imitates the motion data while maintaining goal orientation without additional guiding direction approaches or policies. To the best of our knowledge, this is the first framework that controls humanoid robots using a single learning policy network and LLM as a planner. Extensive experiments demonstrate that our method exhibits efficient and adaptive ability in complicated motion tasks.
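The codebook-based vector quantization step can be sketched as snapping an LLM-derived command embedding to its nearest learned skill code before it conditions the policy; shapes and names below are illustrative assumptions, not the paper's architecture.

```python
# Nearest-codebook lookup for quantizing a command embedding into a skill code.
import numpy as np

def quantize(z, codebook):
    # z: (d,) embedding; codebook: (K, d) learned skill codes.
    idx = int(np.argmin(np.linalg.norm(codebook - z, axis=1)))
    return codebook[idx], idx  # quantized code fed to the policy, and its index
```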
|
|
13:30-15:00, Paper ThBT30-NT.3 | Add to My Program |
Infer and Adapt: Bipedal Locomotion Reward Learning from Demonstrations Via Inverse Reinforcement Learning |
|
Wu, Feiyang | Georgia Institute of Technology |
Gu, Zhaoyuan | Georgia Institute of Technology |
Wu, Hanran | Georgia Institute of Technology |
Wu, Anqi | Georgia Institute of Technology |
Zhao, Ye | Georgia Institute of Technology |
Keywords: Learning from Demonstration, Humanoid and Bipedal Locomotion, Reinforcement Learning
Abstract: Enabling bipedal walking robots to learn how to maneuver over highly uneven, dynamically changing terrains is challenging due to the complexity of robot dynamics and interacted environments. Recent advancements in learning from demonstrations have shown promising results for robot learning in complex environments. While imitation learning of expert policies has been well-explored, the study of learning expert reward functions is largely under-explored in legged locomotion. This paper brings state-of-the-art Inverse Reinforcement Learning (IRL) techniques to solving bipedal locomotion problems over complex terrains. We propose algorithms for learning expert reward functions, and we subsequently analyze the learned functions. Through nonlinear function approximation, we uncover meaningful insights into the expert's locomotion strategies. Furthermore, we empirically demonstrate that training a bipedal locomotion policy with the inferred reward functions enhances its walking performance on unseen terrains, highlighting the adaptability offered by reward learning.
|
|
13:30-15:00, Paper ThBT30-NT.4 | Add to My Program |
Online Distribution Shift Detection Via Recency Prediction |
|
Luo, Rachel | Stanford University |
Sinha, Rohan | Stanford University |
Sun, Yixiao | Stanford University |
Hindy, Ali | Stanford University |
Zhao, Shengjia | Stanford University |
Savarese, Silvio | Stanford University |
Schmerling, Edward | Stanford University |
Pavone, Marco | Stanford University |
Keywords: Probability and Statistical Methods, AI-Enabled Robotics, Methods and Tools for Robot System Design
Abstract: When deploying modern machine learning-enabled robotic systems in high-stakes applications, detecting distribution shift is critical. However, most existing methods for detecting distribution shift are not well-suited to robotics settings, where data often arrives in a streaming fashion and may be very high-dimensional. In this work, we present an online method for detecting distribution shift with guarantees on the false positive rate --- i.e., when there is no distribution shift, our system is very unlikely (with probability < epsilon) to falsely issue an alert; any alerts that are issued should therefore be heeded. Our method is specifically designed for efficient detection even with high dimensional data, and it empirically achieves up to 11x faster detection on realistic robotics settings compared to prior work while maintaining a low false negative rate in practice (whenever there is a distribution shift in our experiments, our method indeed emits an alert). We demonstrate our approach in both simulation and hardware for a visual servoing task, and show that our method indeed issues an alert before a failure occurs.
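To make the "guaranteed false positive rate" idea concrete, here is a loose stand-in for detection via recency prediction: if a classifier can tell recent samples from older ones better than chance, flag a shift. The classifier, the Hoeffding-bound test, and all thresholds below are simplified assumptions; the paper's guarantee is derived from a different, stronger construction.

```python
# Simplified shift detection by "recency prediction": under no shift, a
# recency classifier should be at chance accuracy; a conservative Hoeffding
# bound keeps the false positive rate below epsilon. Not the paper's method.
import numpy as np

rng = np.random.default_rng(1)

def detect_shift(old, new, epsilon=0.01):
    """old, new: (n, d) arrays. True if a shift is detected."""
    n = min(len(old), len(new)) // 2
    # Hypothetical minimal classifier: nearest centroid, fit on first halves.
    c_old, c_new = old[:n].mean(0), new[:n].mean(0)
    test = np.vstack([old[n:2 * n], new[n:2 * n]])
    labels = np.r_[np.zeros(n), np.ones(n)]
    pred = (np.linalg.norm(test - c_new, axis=1)
            < np.linalg.norm(test - c_old, axis=1)).astype(float)
    acc = float((pred == labels).mean())
    m = 2 * n
    # Under no shift, accuracy is ~1/2: P(acc >= 1/2 + t) <= exp(-2 m t^2).
    p_bound = np.exp(-2 * m * max(acc - 0.5, 0.0) ** 2)
    return p_bound < epsilon

old = rng.normal(0.0, 1.0, size=(200, 8))
new = rng.normal(0.8, 1.0, size=(200, 8))   # shifted mean
print("shift detected:", detect_shift(old, new))
```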
|
|
13:30-15:00, Paper ThBT30-NT.5 | Add to My Program |
Simplified Continuous High Dimensional Belief Space Planning with Adaptive Probabilistic Belief-Dependent Constraints |
|
Zhitnikov, Andrey | Technion – Israel Institute of Technology |
Indelman, Vadim | Technion - Israel Institute of Technology |
Keywords: Probability and Statistical Methods, Autonomous Agents, SLAM, Belief Space Planning
Abstract: Online decision making under uncertainty in partially observable domains, also known as Belief Space Planning, is a fundamental problem in Robotics and Artificial Intelligence. Due to an abundance of plausible future unravelings, calculating an optimal course of action inflicts an enormous computational burden on the agent. Moreover, in many scenarios, e.g., information gathering, it is required to introduce a belief-dependent constraint. Prompted by this demand, in this paper, we consider a recently introduced probabilistic belief-dependent constrained POMDP. We present a technique to adaptively accept or discard a candidate action sequence with respect to a probabilistic belief-dependent constraint, before expanding a complete set of sampled future observation episodes and without any loss in accuracy. Moreover, using our proposed framework, we contribute an adaptive method to find a maximal feasible return (e.g., information gain) in terms of Value at Risk and a corresponding action sequence, given a set of candidate action sequences, with substantial acceleration. On top of that, we introduce an adaptive simplification technique for the probabilistically constrained setting. Such an approach provably returns an identical-quality solution while dramatically accelerating online decision making. Our universal framework applies to any belief-dependent constrained continuous POMDP with parametric beliefs, as well as nonparametric beliefs represented by particles.
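As a small illustration of the quantities the abstract works with, the sketch below checks a probabilistic belief-dependent constraint and a Value-at-Risk objective from sampled episodes; the distribution, thresholds, and names are hypothetical, and this does not show the paper's adaptive acceptance/rejection scheme.

```python
# Minimal sketch: estimate P(information gain >= threshold) from sampled
# future episodes and compute a Value-at-Risk objective. Illustrative only.
import numpy as np

rng = np.random.default_rng(2)

# Suppose each sampled observation episode yields an information gain for a
# candidate action sequence (hypothetical distribution).
info_gain = rng.normal(loc=1.2, scale=0.5, size=500)

delta, g_min = 0.9, 0.8   # require P(gain >= g_min) >= delta
p_hat = float((info_gain >= g_min).mean())
feasible = p_hat >= delta

alpha = 0.1               # VaR level: worst-case gain ignoring the alpha tail
var_alpha = float(np.quantile(info_gain, alpha))

print(f"P(gain >= {g_min}) = {p_hat:.2f}, feasible = {feasible}")
print(f"VaR at level {alpha}: {var_alpha:.2f}")
```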
|
|
13:30-15:00, Paper ThBT30-NT.6 | Add to My Program |
Conformal Policy Learning for Sensorimotor Control under Distribution Shifts |
|
Huang, Huang | University of California at Berkeley |
Sharma, Satvik | University of California, Berkeley |
Loquercio, Antonio | UC Berkeley |
Angelopoulos, Anastasios | University of California, Berkeley |
Goldberg, Ken | UC Berkeley |
Malik, Jitendra | UC Berkeley |
Keywords: Probability and Statistical Methods, Machine Learning for Robot Control, Sensorimotor Learning
Abstract: This paper focuses on the problem of detecting and reacting to changes in the distribution of a sensorimotor controller’s observables. The key idea is the design of policies that can take conformal quantiles as input to detect distribution shifts with formal statistical guarantees, which we define as conformal policy learning. We show how to design such policies by using conformal quantiles to switch between base policies with different characteristics, e.g., safety or speed, or by directly augmenting a policy observation with a quantile and training it with reinforcement learning. Theoretically, we show that such policies achieve formal convergence guarantees in finite time. In addition, we thoroughly evaluate their advantages and limitations on two use cases: simulated autonomous driving and active perception with a physical quadruped. Empirical results demonstrate that our approach outperforms five baselines and is the simplest of the baseline strategies besides one ablation. Being easy to use, flexible, and equipped with formal guarantees, our work demonstrates how conformal prediction can be an effective tool for sensorimotor learning under uncertainty.
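The policy-switching pattern the abstract describes can be sketched in a few lines: compute a finite-sample conformal quantile on calibration nonconformity scores, then fall back to a conservative base policy whenever a new score exceeds it. The scores, policies, and alpha below are assumptions for illustration, not the paper's setup.

```python
# Minimal split-conformal sketch driving a policy switch. Illustrative only.
import numpy as np

rng = np.random.default_rng(3)

cal_scores = rng.exponential(1.0, size=200)   # calibration nonconformity scores
alpha = 0.1
n = len(cal_scores)
# Finite-sample-corrected (1 - alpha) conformal quantile.
q = float(np.quantile(cal_scores, np.ceil((n + 1) * (1 - alpha)) / n,
                      method="higher"))

def act(obs_score, fast_policy, safe_policy):
    """Switch between base policies using the conformal quantile."""
    return safe_policy if obs_score > q else fast_policy

for s in [0.3, 2.9]:
    print(f"score {s:.1f} ->", act(s, "fast", "safe"))
```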
|
|
13:30-15:00, Paper ThBT30-NT.7 | Add to My Program |
Resampling-Free Particle Filters in High-Dimensions |
|
Boopathy, Akhilan | Massachusetts Institute of Technology |
Muppidi, Aneesh | Harvard |
Yang, Peggy | MIT |
Iyer, Abhiram | MIT |
Yue, William | MIT |
Fiete, Ila | MIT |
Keywords: Probability and Statistical Methods, Probabilistic Inference, Localization
Abstract: State estimation is crucial for the performance and safety of numerous robotic applications. Among the suite of estimation techniques, particle filters have been identified as a powerful solution due to their non-parametric nature. Yet, in high-dimensional state spaces, these filters face challenges such as 'particle deprivation' which hinders accurate representation of the true posterior distribution. This paper introduces a novel resampling-free particle filter designed to mitigate particle deprivation by forgoing the traditional resampling step. This ensures a broader and more diverse particle set, especially vital in high-dimensional scenarios. Theoretically, our proposed filter is shown to offer a near-accurate representation of the desired posterior distribution in high-dimensional contexts. Empirically, the effectiveness of our approach is underscored through a high-dimensional synthetic state estimation task and a 6D pose estimation derived from videos. We posit that as robotic systems evolve with greater degrees of freedom, particle filters tailored for high-dimensional state spaces will be indispensable.
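To see what "forgoing the resampling step" changes, the toy below runs sequential importance sampling with no resampling on a 1D linear-Gaussian model; note how the effective sample size shrinks, which is the degeneracy a plain unresampled filter suffers and which the paper's actual update is designed to avoid. This is a generic contrast sketch, not the paper's filter.

```python
# Sequential importance sampling WITHOUT resampling on a toy 1D model.
import numpy as np

rng = np.random.default_rng(4)
N, T = 1000, 30
q_std, r_std = 0.5, 0.7          # process / observation noise

x_true = 0.0
particles = rng.normal(0.0, 1.0, size=N)
logw = np.zeros(N)               # log weights, never resampled

for t in range(T):
    x_true = x_true + rng.normal(0, q_std)
    y = x_true + rng.normal(0, r_std)
    # Propagate and reweight every particle; weights accumulate over time.
    particles = particles + rng.normal(0, q_std, size=N)
    logw += -0.5 * ((y - particles) / r_std) ** 2
    logw -= logw.max()           # numerical stabilization

w = np.exp(logw); w /= w.sum()
est = float((w * particles).sum())
ess = float(1.0 / (w ** 2).sum())
print(f"estimate {est:.2f} vs true {x_true:.2f}, effective sample size {ess:.0f}")
```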
|
|
13:30-15:00, Paper ThBT30-NT.8 | Add to My Program |
A New Perspective of DL Testing Framework: Human-Computer Interaction Based Neural Network Testing |
|
Kong, Wei | National Key Laboratory on Science and Technology of Information |
Hu, Li | National Key Laboratory on Science and Technology of Information |
Du, Qianjin | Tsinghua University |
Cao, Huayang | National Key Laboratory on Science and Technology of Information |
Kuang, Xiaohui | National Key Laboratory on Science and Technology of Information |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Safety in HRI, Humanoid Robot Systems
Abstract: Deep learning models have revolutionized various domains but have also raised concerns regarding their security and reliability. Adversarial attacks and coverage-based testing have been extensively studied to assess and enhance the dependability of deep neural networks. However, current research in this area has reached a state of stagnation. Adversarial attacks focus on exploiting vulnerabilities in models, while coverage-based testing aims to achieve comprehensive testing but overlooks application scenarios. Moreover, evaluating test cases solely based on their fault-revealing capability is insufficient. To address these limitations, we propose an innovative interdisciplinary framework that incorporates human-computer interaction methods in deep learning security testing. By considering the attributes of model application scenarios, we can design more effective test suites. Additionally, we establish a comprehensive evaluation metric for test suite quality, considering factors such as diversity and naturalness. This framework promotes reliable and secure deployment of deep learning models, fostering interdisciplinary collaboration between artificial intelligence and human-computer interaction.
|
|
ThBT31-NT Oral Session, NT-G7 |
Add to My Program |
Autonomous Vehicle Navigation II |
|
|
Chair: Xiao, Xuesu | George Mason University |
|
13:30-15:00, Paper ThBT31-NT.1 | Add to My Program |
MARC: Multipolicy and Risk-Aware Contingency Planning for Autonomous Driving |
|
Li, Tong | Hong Kong University of Science and Technology |
Zhang, Lu | Hong Kong University of Science and Technology |
Liu, Sikang | DJI |
Shen, Shaojie | Hong Kong University of Science and Technology |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, Intelligent Transportation Systems
Abstract: Generating safe and non-conservative behaviors in dense, dynamic environments remains challenging for automated vehicles due to the stochastic nature of traffic participants' behaviors and their implicit interaction with the ego vehicle. This paper presents a novel planning framework, Multipolicy And Risk-aware Contingency planning (MARC), that systematically addresses these challenges by enhancing the multipolicy-based pipelines from both behavior and motion planning aspects. Specifically, MARC realizes a critical scenario set that reflects multiple possible futures conditioned on each semantic-level ego policy. Then, the generated policy-conditioned scenarios are further formulated into a tree-structured representation with a dynamic branchpoint based on the scene-level divergence. Moreover, to generate diverse driving maneuvers, we introduce risk-aware contingency planning, a bi-level optimization algorithm that simultaneously considers multiple future scenarios and user-defined risk tolerance levels. Owing to the more unified combination of behavior and motion planning layers, our framework achieves efficient decision-making and human-like driving maneuvers. Comprehensive experimental results demonstrate superior performance to other strong baselines in various driving environments.
|
|
13:30-15:00, Paper ThBT31-NT.2 | Add to My Program |
Graph-Based Scenario-Adaptive Lane-Changing Trajectory Planning for Autonomous Driving |
|
Dong, Qing | Tsinghua University |
Yan, Zhanhong | The University of Tokyo |
Nakano, Kimihiko | The University of Tokyo |
Ji, Xuewu | Tsinghua University |
Liu, Yahui | Tsinghua University |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, Learning from Demonstration
Abstract: Trajectory planning is one of the key challenges to the rapid and large-scale deployment of autonomous driving. The lane-changing trajectory planning algorithm for autonomous driving is typically formulated as an optimization of a cost function, which can be challenging to tune manually for different traffic scenarios. This paper presents a graph-based scenario-adaptive lane-changing trajectory planning approach that overcomes this challenge. Specifically, a cost function recovery method based on maximum entropy inverse reinforcement learning (IRL) is proposed to recover the cost functions of all the demonstrated lane-changing trajectories, and a cost function database is constructed. Then, a scenario matching model based on a spatial-temporal graph convolutional network (ST-GCN) is proposed to match the recovered cost functions with traffic scenarios, making the lane-changing trajectory planning method scenario-adaptive. Our proposed method is evaluated through simulations on the well-known NGSIM dataset and experiments on two typical lane-changing scenarios on our autonomous driving platform. The results show that our method is capable of learning the lane-changing cost function from demonstration and performing scenario-adaptive lane-changing trajectory planning.
|
|
13:30-15:00, Paper ThBT31-NT.3 | Add to My Program |
Toward Wheeled Mobility on Vertically Challenging Terrain: Platforms, Datasets, and Algorithms |
|
Datar, Aniket | George Mason University |
Pan, Chenhui | George Mason University |
Nazeri, Mohammad | George Mason University |
Xiao, Xuesu | George Mason University |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, Machine Learning for Robot Control
Abstract: Most conventional wheeled robots can only move in flat environments and simply divide their planar workspaces into free spaces and obstacles. Deeming obstacles as non-traversable significantly limits wheeled robots’ mobility in real-world, extremely rugged, off-road environments, where part of the terrain (e.g., irregular boulders and fallen trees) will be treated as non-traversable obstacles. To improve wheeled mobility in those environments with vertically challenging terrain, we present two wheeled platforms with little hardware modification compared to conventional wheeled robots; we collect datasets of our wheeled robots crawling over previously non-traversable, vertically challenging terrain to facilitate data-driven mobility; we also present algorithms and their experimental results to show that conventional wheeled robots have previously unrealized potential of moving through vertically challenging terrain. We make our platforms, datasets, and algorithms publicly available to facilitate future research on wheeled mobility.
|
|
13:30-15:00, Paper ThBT31-NT.4 | Add to My Program |
Rethinking Social Robot Navigation: Leveraging the Best of Two Worlds |
|
Raj, Amir Hossain | George Mason University |
Hu, Zichao | University of Texas at Austin |
Karnan, Haresh | The University of Texas at Austin |
Chandra, Rohan | UT Austin |
Payandeh, Amirreza | George Mason University |
Mao, Luisa | University of Texas Austin |
Stone, Peter | University of Texas at Austin |
Biswas, Joydeep | University of Texas at Austin |
Xiao, Xuesu | George Mason University |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, Machine Learning for Robot Control
Abstract: Empowering robots to navigate in a socially compliant manner is essential for the acceptance of robots moving in human-inhabited environments. Previously, roboticists have developed geometric navigation systems with decades of empirical validation to achieve safety and efficiency. However, the many complex factors of social compliance make geometric navigation systems hard to adapt to social situations, where no amount of tuning enables them to be both safe (people are too unpredictable) and efficient (the frozen robot problem). With recent advances in deep learning approaches, the common reaction has been to entirely discard classical navigation systems and start from scratch, building a completely new learning-based social navigation planner. In this work, we find that this reaction is unnecessarily extreme: using a large-scale real-world social navigation dataset, SCAND, we find that geometric systems can produce trajectory plans that align with the human demonstrations in a large number of social situations. We, therefore, ask if we can rethink the social robot navigation problem by leveraging the advantages of both geometric and learning-based methods. We validate this hybrid paradigm through a proof-of-concept experiment, in which we develop a hybrid planner that switches between geometric and learning-based planning. Our experiments on both SCAND and two physical robots show that the hybrid planner can achieve better social compliance compared to using either the geometric or learning-based approach alone.
|
|
13:30-15:00, Paper ThBT31-NT.5 | Add to My Program |
Continuous Robotic Tracking of Dynamic Targets in Complex Environments Based on Detectability |
|
Wang, Zhihao | Harbin Institute of Technology, Shenzhen |
Huang, Shixing | Harbin Institute of Technology, Shenzhen |
Li, Minghang | Harbin Institute of Technology, Shenzhen |
Ouyang, Junyuan | Harbin Institute of Technology, ShenZhen |
Wang, Yu | Harbin Institute of Technology, Shenzhen |
Chen, Haoyao | Harbin Institute of Technology, Shenzhen |
Keywords: Autonomous Vehicle Navigation, Search and Rescue Robots, Task and Motion Planning
Abstract: Target tracking is a fundamental task in the domain of robotics. The effectiveness of target tracking hinges upon various factors, such as tracking distance, occlusions, collision avoidance, etc. However, few existing works can simultaneously tackle these considerations of tracking single and multiple targets in complex environments. In this study, the interaction mechanism of target tracking between the robot, the environment and the targets is analyzed, and a general measure named detectability is introduced to correlate the tracking performance for guiding robotic motion planning. Based on the detectability measure, a robotic motion planning framework based on Model Predictive Control (MPC) is proposed to achieve continuous and robust tracking of single, two and three targets in complex environments. Simulations and experiments verify that our method performs better than state-of-the-art methods.
|
|
13:30-15:00, Paper ThBT31-NT.6 | Add to My Program |
Talk2BEV: Language-Enhanced Bird’s-Eye View Maps for Autonomous Driving |
|
Choudhary, Tushar | International Institute of Information Technology, Hyderabad |
Dewangan, Vikrant | International Institute of Information Technology, Hyderabad |
Chandhok, Shivam | University of British Columbia |
Priyadarshan, Shubham | International Institute of Information Technology Hyderabad |
Jain, Anushka | International Institute of Information Technology, Hyderabad |
Singh, Arun Kumar | University of Tartu |
Srivastava, Siddharth | TensorTour Inc |
Jatavallabhula, Krishna Murthy | MIT |
Krishna, Madhava | IIIT Hyderabad |
Keywords: Autonomous Vehicle Navigation, Semantic Scene Understanding, Vision-Based Navigation
Abstract: This work introduces Talk2BEV, a large vision-language model (LVLM) interface for bird’s-eye view (BEV) maps commonly used in autonomous driving. While existing perception systems for autonomous driving scenarios have largely focused on a pre-defined (closed) set of object categories and driving scenarios, Talk2BEV eliminates the need for BEV-specific training, relying instead on well-performing pre-trained LVLMs. This enables a single system to cater to a variety of autonomous driving tasks encompassing visual and spatial reasoning, predicting the intents of traffic actors, and decision-making based on visual cues. We extensively evaluate Talk2BEV on a large number of scene understanding tasks that rely on both the ability to interpret freeform natural language queries, and in grounding these queries to the visual context embedded into the language-enhanced BEV map. To enable further research in LVLMs for autonomous driving scenarios, we develop and release Talk2BEV-Bench, a benchmark encompassing 1000 human-annotated BEV scenarios, with more than 20,000 questions and ground-truth responses from the NuScenes dataset.
|
|
13:30-15:00, Paper ThBT31-NT.7 | Add to My Program |
DualAT: Dual Attention Transformer for End-To-End Autonomous Driving |
|
Chen, Zesong | Sun Yat-Sen University |
Yu, Ze | Sun Yat-Sen University |
Li, Jun | Sun Yat-Sen University |
You, Linlin | Sun Yat-Sen University |
Tan, Xiaojun | Sun Yat-Sen University |
Keywords: Autonomous Vehicle Navigation, Sensor Fusion, Intelligent Transportation Systems
Abstract: The effective reasoning of integrated multimodal perception information is crucial for achieving enhanced end-to-end autonomous driving performance. In this paper, we introduce a novel multitask imitation learning framework for end-to-end autonomous driving that leverages a dual attention transformer (DualAT) to enhance the multimodal fusion and waypoint prediction processes. A self-attention mechanism captures global context information and models the long-term temporal dependencies of waypoints for multiple time steps. On the other hand, a cross-attention mechanism implicitly associates the latent feature representations derived from different modalities through a learnable geometrically linked positional embedding. Specifically, the DualAT excels at processing and fusing information from multiple camera views and LiDAR sensors, enabling comprehensive scene understanding for multitask learning. Furthermore, the DualAT introduces a novel waypoint prediction architecture that combines the temporal relationships between waypoints with the spatial features extracted from sensor inputs. We evaluate our approach on both the Town05 and Longest6 benchmarks using the closed-loop CARLA urban driving simulator and provide extensive ablation studies. The experimental results demonstrate that our approach significantly outperforms the state-of-the-art methods.
|
|
13:30-15:00, Paper ThBT31-NT.8 | Add to My Program |
SCALE: Self-Correcting Visual Navigation for Mobile Robots Via Anti-Novelty Estimation |
|
Chen, Chang | The Chinese University of Hong Kong, Shenzhen |
Liu, Yuecheng | Huawei Noah's Ark Lab |
Zhuang, Yuzheng | Huawei Technologies Company |
Mao, Sitong | ShenZhen Huawei Cloud Computing Technologies Co., Ltd |
Xue, Kaiwen | The Chinese University of Hong Kong, Shenzhen |
Zhou, Shunbo | The Chinese University of Hong Kong |
Keywords: Autonomous Vehicle Navigation, Service Robotics
Abstract: Although visual navigation has been extensively studied using deep reinforcement learning, online learning for real-world robots remains a challenging task. Recent work learns directly from offline datasets to achieve broader generalization in real-world tasks, which, however, faces the out-of-distribution (OOD) issue and potential robot localization failures in a given map for unseen observations. This significantly drops the success rate and can even induce collisions. In this paper, we present a self-correcting visual navigation method, SCALE, that can autonomously prevent the robot from entering OOD situations without human intervention. Specifically, we develop an image-goal conditioned offline reinforcement learning method based on implicit Q-learning (IQL). When facing an OOD observation, our novel localization recovery method generates potential future trajectories by learning from the navigation affordance, and estimates the future novelty via random network distillation (RND). A tailored cost function searches for the candidates with the least novelty that can lead the robot to familiar places. We collect offline data and conduct evaluation experiments in three real-world urban scenarios. Experiment results show that SCALE outperforms the previous state-of-the-art methods for open-world navigation with a unique capability of localization recovery, significantly reducing the need for human intervention. Code is available at https://github.com/KubeEdge4Robotics/ScaleNav.
|
|
13:30-15:00, Paper ThBT31-NT.9 | Add to My Program |
UniGen: Unified Modeling of Initial Agent States and Trajectories for Generating Autonomous Driving Scenarios |
|
Mahjourian, Reza | Waymo |
Mu, Rongbing | Waymo |
Likhosherstov, Valerii | Waymo UK Ltd |
Mougin, Paul | Waymo |
Huang, Xiukun | Waymo |
Teixeira de Sousa Messias, João Vicente | Waymo UK |
Whiteson, Shimon | Waymo |
Keywords: Autonomous Vehicle Navigation, Simulation and Animation, Deep Learning Methods
Abstract: This paper introduces UniGen, a novel approach to generating new traffic scenarios for evaluating and improving autonomous driving software through simulation. Our approach models all driving scenario elements in a unified model: the position of new agents, their initial state, and their future motion trajectories. By predicting the distributions of all these variables from a shared global scenario embedding, we ensure that the final generated scenario is fully conditioned on all available context in the existing scene. Our unified modeling approach, combined with autoregressive agent injection, conditions the placement and motion trajectory of every new agent on all existing agents and their trajectories, leading to realistic scenarios with low collision rates. Our experimental results show that UniGen outperforms prior state of the art on the Waymo Open Motion Dataset.
|
|
ThBT32-NT Oral Session, NT-G8 |
Add to My Program |
Image-Based Navigation I |
|
|
Chair: Ding, Wenbo | Tsinghua University |
|
13:30-15:00, Paper ThBT32-NT.1 | Add to My Program |
S2R-ViT for Multi-Agent Cooperative Perception: Bridging the Gap from Simulation to Reality |
|
Li, Jinlong | Cleveland State University |
Xu, Runsheng | UCLA |
Liu, Xinyu | Cleveland State University |
Li, Baolu | Cleveland State University |
Zou, Qin | Wuhan University |
Ma, Jiaqi | University of California, Los Angeles |
Yu, Hongkai | Cleveland State University |
Keywords: Computer Vision for Automation, Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: Due to the lack of sufficient real multi-agent data and the time-consuming nature of labeling, existing multi-agent cooperative perception algorithms usually use simulated sensor data for training and validation. However, the perception performance is degraded when these simulation-trained models are deployed to the real world, due to the significant domain gap between the simulated and real data. In this paper, we propose the first Simulation-to-Reality transfer learning framework for multi-agent cooperative perception using a novel Vision Transformer, named S2R-ViT, which considers both the Deployment Gap and Feature Gap between simulated and real data. We investigate the effects of these two types of domain gaps and propose a novel uncertainty-aware vision transformer to effectively relieve the Deployment Gap and an agent-based feature adaptation module with inter-agent and ego-agent discriminators to reduce the Feature Gap. Our intensive experiments on the public multi-agent cooperative perception datasets OPV2V and V2V4Real demonstrate that the proposed S2R-ViT can effectively bridge the gap from simulation to reality and outperform other methods significantly for point cloud-based 3D object detection.
|
|
13:30-15:00, Paper ThBT32-NT.2 | Add to My Program |
Eliminating Cross-Modal Conflicts in BEV Space for LiDAR-Camera 3D Object Detection |
|
Fu, Jiahui | Beihang University |
Gao, Chen | Beihang University |
Wang, Zitian | Beihang University |
Yang, Lirong | Meituan |
Wang, Xiaofei | Meituan |
Mu, Beipeng | Meituan |
Liu, Si | Beihang University |
Keywords: Computer Vision for Automation, Deep Learning for Visual Perception, Visual Learning
Abstract: Recent 3D object detectors typically utilize multi-sensor data and unify multi-modal features in the shared bird’s-eye view (BEV) representation space. However, our empirical findings indicate that previous methods have limitations in generating fusion BEV features free from cross-modal conflicts. These conflicts encompass extrinsic conflicts caused by BEV feature construction and inherent conflicts stemming from heterogeneous sensor signals. Therefore, we propose a novel Eliminating Conflicts Fusion (ECFusion) method to explicitly eliminate the extrinsic/inherent conflicts in BEV space and produce improved multi-modal BEV features. Specifically, we devise a Semantic-guided Flow-based Alignment (SFA) module to resolve extrinsic conflicts via unifying spatial distribution in BEV space before fusion. Moreover, we design a Dissolved Query Recovering (DQR) mechanism to remedy inherent conflicts by preserving objectness clues that are lost in the fusion BEV feature. In general, our method maximizes the effective information utilization of each modality and leverages inter-modal complementarity. Our method achieves state-of-the-art performance in the highly competitive nuScenes 3D object detection dataset.
|
|
13:30-15:00, Paper ThBT32-NT.3 | Add to My Program |
EMIFF: Enhanced Multi-Scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection |
|
Wang, Zhe | Institute for AI Industry Research, Tsinghua University |
Fan, Siqi | Tsinghua University |
Huo, Xiaoliang | Beihang University |
Xu, Tongda | Tsinghua University |
Wang, Yan | Tsinghua University |
Liu, Jingjing | Institute for AI Industry Research (AIR), Tsinghua University |
Chen, Yilun | Tsinghua University |
Zhang, Ya-Qin | Institute for AI Industry Research(AIR), Tsinghua University |
Keywords: Computer Vision for Transportation, Computer Vision for Automation, Deep Learning for Visual Perception
Abstract: In autonomous driving, cooperative perception makes use of multi-view cameras from both vehicles and infrastructure, providing a global vantage point with rich semantic context of road conditions beyond a single vehicle viewpoint. Currently, two major challenges persist in vehicle-infrastructure cooperative 3D (VIC3D) object detection: 1) inherent pose errors when fusing multi-view images, caused by time asynchrony across cameras; 2) information loss in the transmission process resulting from limited communication bandwidth. To address these issues, we propose a novel camera-based 3D detection framework for the VIC3D task, Enhanced Multi-scale Image Feature Fusion (EMIFF). To fully exploit holistic perspectives from both vehicles and infrastructure, we propose Multi-scale Cross Attention (MCA) and Camera-aware Channel Masking (CCM) modules to enhance infrastructure and vehicle features at scale, spatial, and channel levels to correct the pose error introduced by camera asynchrony. We also introduce a Feature Compression (FC) module with channel and spatial compression blocks for transmission efficiency. Experiments show that EMIFF achieves SOTA on DAIR-V2X-C datasets, significantly outperforming previous early-fusion and late-fusion methods with comparable transmission costs.
|
|
13:30-15:00, Paper ThBT32-NT.4 | Add to My Program |
Vehicle Intention Classification Using Visual Clues |
|
Klemp, Marvin | Karlsruhe Institute of Technology - KIT |
Wagner, Royden | KIT |
Rösch, Kevin | FZI Forschungszentrum Informatik |
Lauer, Martin | Karlsruhe Institute of Technology |
Stiller, Christoph | Karlsruhe Institute of Technology |
Keywords: Computer Vision for Transportation, Data Sets for Robotic Vision, Deep Learning for Visual Perception
Abstract: Classifying intentions of other traffic agents is an essential task for intelligent transportation systems. To simplify this task, vehicles are equipped with various illumination systems, including turn indicators, emergency lights, rear lights, and brake lights. We extend the Waymo open perception dataset with ground truth annotations for different visual intentions to develop methods designed to classify the state of such systems. Furthermore, we propose the visual intention former, a two-step transformer-based architecture to classify visual intentions in image sequences of tracked traffic participants. We use a vision transformer to extract image features, which are passed into a transformer encoder that reasons about temporal dependencies among them. We evaluate against different baseline architectures where our proposed method achieves state-of-the-art results. Additionally, we conduct an in-depth performance analysis of our method regarding different input sequence lengths, vehicle headings, and daytime conditions.
|
|
13:30-15:00, Paper ThBT32-NT.5 | Add to My Program |
CFDNet: A Generalizable Foggy Stereo Matching Network with Contrastive Feature Distillation |
|
Liu, Zihua | Tokyo Institute of Technology |
Li, Yizhou | Tokyo Institute of Technology |
Okutomi, Masatoshi | Tokyo Institute of Technology |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception
Abstract: Stereo matching under foggy scenes remains a challenging task since the scattering effect degrades the visibility and results in less distinctive features for dense correspondence matching. While some previous learning-based methods integrated a physical scattering function for simultaneous stereo-matching and dehazing, simply removing fog might not aid depth estimation because the fog itself can provide crucial depth cues. In this work, we introduce a framework based on contrastive feature distillation (CFD). This strategy combines feature distillation from merged clean-fog features with contrastive learning, ensuring balanced dependence on fog depth hints and clean matching features. This framework helps enhance model generalization across both clean and hazy environments. Comprehensive experiments on synthetic and real-world datasets affirm the superior strength and adaptability of our method.
|
|
13:30-15:00, Paper ThBT32-NT.6 | Add to My Program |
Multi-Class Road Defect Detection and Segmentation Using Spatial and Channel-Wise Attention for Autonomous Road Repairing |
|
Yu, Jongmin | King's College London |
Chen, Chi Bene | King's College London |
Fichera, Sebastiano | University of Liverpool |
Paoletti, Paolo | University of Liverpool |
Mehta, Devansh | Robotiz3d |
Luo, Shan | King's College London |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Intelligent Transportation Systems
Abstract: Road pavement defect detection and segmentation are critical for developing autonomous road repair systems. However, developing an instance segmentation method that simultaneously performs multi-class defect detection and segmentation is challenging due to the textural simplicity of road pavement images, the diversity of defect geometries, and the morphological ambiguity between classes. We propose a novel end-to-end method for multi-class road defect detection and segmentation. The proposed method comprises multiple spatial and channel-wise attention blocks that learn global representations across spatial and channel-wise dimensions. Through these attention blocks, more globally generalised representations of the morphological information (spatial characteristics) of road defects and the colour and depth information of images can be learned. To demonstrate the effectiveness of our framework, we conducted various ablation studies and comparisons with prior methods on a newly collected dataset annotated with nine road defect classes. The experiments show that our proposed method outperforms existing state-of-the-art methods for multi-class road defect detection and segmentation.
|
|
13:30-15:00, Paper ThBT32-NT.7 | Add to My Program |
HPL-ViT: A Unified Perception Framework for Heterogeneous Parallel LiDARs in V2V |
|
Liu, Yuhang | Chinese Academy of Sciences |
Sun, Boyi | University of Chinese Academy of Sciences |
Li, Yuke | Waytous Co. Ltd |
Hu, Yuzheng | University of Illinois Urbana-Champaign |
Wang, Feiyue | Institute of Automation, Chinese Academy of Sciences |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization
Abstract: To develop the next generation of intelligent LiDARs, we propose a novel framework of parallel LiDARs and construct a hardware prototype in our experimental platform, DAWN (Digital Artificial World for Natural). It emphasizes the tight integration of physical and digital space in LiDAR systems, with networking being one of its supported core features. In the context of autonomous driving, V2V (Vehicle-to-Vehicle) technology enables efficient information sharing between different agents which significantly promotes the development of LiDAR networks. However, current research operates under an ideal situation where all vehicles are equipped with identical LiDAR, ignoring the diversity of LiDAR categories and operating frequencies. In this paper, we first utilize OpenCDA and RLS (Realistic LiDAR Simulation) to construct a novel heterogeneous LiDAR dataset named OPV2V-HPL. Additionally, we present HPL-ViT, a pioneering architecture designed for robust feature fusion in heterogeneous and dynamic scenarios. It uses a graph-attention Transformer to extract domain-specific features for each agent, coupled with a cross-attention mechanism for the final fusion. Extensive experiments on OPV2V-HPL demonstrate that HPL-ViT achieves SOTA (state-of-the-art) performance in all settings and exhibits outstanding generalization capabilities.
|
|
13:30-15:00, Paper ThBT32-NT.8 | Add to My Program |
FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View |
|
Hou, Jiawei | Fudan University |
Li, Xiaoyan | Chinese Academy of Sciences |
Guan, Wenhao | Fudan University |
Zhang, Gang | Damo Academy, Alibaba Group |
Feng, Di | Mogo Auto Intelligence and Telematics Information Technology Co |
Du, Yuheng | Fudan University |
Xue, Xiangyang | Fudan University |
Pu, Jian | Fudan University |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Visual Learning
Abstract: In autonomous driving, 3D occupancy prediction outputs voxel-wise status and semantic labels for a more comprehensive understanding of 3D scenes compared with traditional perception tasks, such as 3D object detection and bird's-eye view (BEV) semantic segmentation. Recent researchers have extensively explored various aspects of this task, including view transformation techniques, ground-truth label generation, and elaborate network design, aiming to achieve superior performance. However, the inference speed, crucial for running on an autonomous vehicle, is neglected. To this end, a new method, dubbed FastOcc, is proposed. By carefully analyzing the network effect and latency from four parts, including the input image resolution, image backbone, view transformation, and occupancy prediction head, it is found that the occupancy prediction head holds considerable potential for accelerating the model while keeping its accuracy. Targeted at improving this component, the time-consuming 3D convolution network is replaced with a novel residual-like architecture, where features are mainly digested by a lightweight 2D BEV convolution network and compensated by integrating the 3D voxel features interpolated from the original image features. Experiments on the Occ3D-nuScenes benchmark demonstrate that our FastOcc achieves state-of-the-art results with a fast inference speed.
|
|
13:30-15:00, Paper ThBT32-NT.9 | Add to My Program |
LPFormer: LiDAR Pose Estimation Transformer with Multi-Task Network |
|
Ye, Dongqiangzi | Tusimple |
Xie, Yufei | Tusimple |
Chen, Weijia | N/A |
Zhou, Zixiang | University of Central Florida |
Ge, Lingting | TuSimple Inc |
Foroosh, Hassan | University of Central Florida |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Visual Learning
Abstract: Due to the difficulty of acquiring large-scale 3D human keypoint annotations, previous methods for 3D human pose estimation (HPE) have often relied on 2D image features and sequential 2D annotations. Furthermore, the training of these networks typically assumes the prediction of a human bounding box and the accurate alignment of 3D point clouds with 2D images, making direct application in real-world scenarios challenging. In this paper, we present the first framework for end-to-end 3D human pose estimation, named LPFormer, which uses only LiDAR as its input along with its corresponding 3D annotations. LPFormer consists of two stages: first, it identifies the human bounding box and extracts multi-level feature representations; second, it utilizes a transformer-based network to predict human keypoints based on these features. Our method demonstrates that 3D HPE can be seamlessly integrated into a strong LiDAR perception network and benefit from the features extracted by the network. Experimental results on the Waymo Open Dataset demonstrate state-of-the-art performance, with improvements even over previous multi-modal solutions.
|
|
ThBT33-CC Oral Session, CC-301 |
Add to My Program |
Integrated Planning and Learning |
|
|
Chair: Pu, Jian | Fudan University |
Co-Chair: Kyriakopoulos, Kostas | New York University - Abu Dhabi |
|
13:30-15:00, Paper ThBT33-CC.1 | Add to My Program |
A Tube-Based Reinforcement Learning Approach for Optimal Motion Planning in Unknown Workspaces |
|
Rousseas, Panagiotis | National Technical University of Athens |
Bechlioulis, Charalampos | University of Patras |
Kyriakopoulos, Kostas | New York University - Abu Dhabi |
Keywords: Motion and Path Planning, Optimization and Optimal Control
Abstract: In this work, a tube-based nearly optimal solution to motion planning in unknown workspaces is presented. The advantages of reactive motion planning are combined with a Policy Iteration Reinforcement Learning scheme to yield a novel solution for unknown workspaces that inherits provable safety, convergence and optimality. Moreover, in simply-connected workspaces, our method is proven to asymptotically provide the globally optimal path. Our method is compared against a provably asymptotically optimal RRT* method, as well as a relevant reactive method and provides satisfactory performance, closely matching or outperforming the former.
|
|
13:30-15:00, Paper ThBT33-CC.2 | Add to My Program |
Task-Oriented Active Learning of Model Preconditions for Inaccurate Dynamics Models |
|
LaGrassa, Alex | Carnegie Mellon University |
Lee, Moonyoung | Carnegie Mellon University |
Kroemer, Oliver | Carnegie Mellon University |
Keywords: Integrated Planning and Learning, Learning from Experience, Manipulation Planning
Abstract: When planning with an inaccurate dynamics model, a practical strategy is to restrict planning to regions of state-action space where the model is accurate: also known as a model precondition. Empirical real-world trajectory data is valuable for defining data-driven model preconditions regardless of the model form (analytical, simulator, learned, etc.). However, real-world data is often expensive and dangerous to collect. In order to achieve data efficiency, this paper presents an algorithm for actively selecting trajectories to learn a model precondition for an inaccurate pre-specified dynamics model. Our proposed techniques address challenges arising from the sequential nature of trajectories and the potential benefit of prioritizing task-relevant data. The experimental analysis shows how algorithmic properties affect performance in three planning scenarios: icy gridworld, simulated plant watering, and real-world plant watering. Results demonstrate an improvement of approximately 80% after only four real-world trajectories when using our proposed techniques. More material can be found on our project website: https://sites.google.com/view/active-mde
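A minimal sketch of the two ingredients the abstract names, labeling executed transitions by model accuracy and actively choosing the next trajectory to query, is shown below; the feature encoding, tolerance, and distance-based uncertainty proxy are assumptions, not the paper's acquisition function.

```python
# Sketch: label state-action pairs by whether the model matched reality within
# a tolerance (the precondition data), then query the candidate trajectory
# farthest from labeled data as a simple uncertainty proxy. Illustrative only.
import numpy as np

rng = np.random.default_rng(5)
tau = 0.2

# Labeled data from executed trajectories: (state-action features, model error)
X = rng.uniform(-1, 1, size=(50, 4))
err = rng.exponential(0.15, size=50)
accurate = err < tau                         # precondition labels so far

def trajectory_uncertainty(traj):
    """Mean distance from each trajectory point to its nearest labeled point."""
    d = np.linalg.norm(traj[:, None, :] - X[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

candidates = [rng.uniform(-1, 1, size=(10, 4)) for _ in range(5)]
scores = [trajectory_uncertainty(t) for t in candidates]
print("query trajectory:", int(np.argmax(scores)),
      "| fraction of data inside precondition:", float(accurate.mean()))
```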
|
|
13:30-15:00, Paper ThBT33-CC.3 | Add to My Program |
Risk-Predictive Planning for Off-Road Autonomy |
|
Lao Beyer, Lukas | Massachusetts Institute of Technology |
Ryou, Gilhyun | Massachusetts Institute of Technology |
Spieler, Patrick | JPL |
Karaman, Sertac | Massachusetts Institute of Technology |
Keywords: Integrated Planning and Learning, Planning under Uncertainty, Representation Learning
Abstract: Efficiently navigating off-road environments presents a number of challenges arising from their unstructured nature. In the absence of high-fidelity maps, occlusions from obstacles and terrain lead to limited information being available to inform planning decisions. Furthermore, resolution and latency limitations of real-world perception systems can lead to degraded perception performance when traversing such environments at high speeds. We address these problems by proposing an algorithm which plans trajectories while anticipating future observations. In particular, we introduce a model which learns to predict the evolution of future riskmaps conditioned on the future path and speed profile of the vehicle. The model is trained in a self-supervised fashion using recordings of vehicle trajectories. We then present an algorithm which efficiently queries the model along candidate paths and speed profiles to produce time-optimal trajectories while maintaining a bound on the expected future risk. We assess the predictive performance of our risk model through a comparison with real vehicle driving logs. Furthermore, our closed-loop simulations of several benchmark scenarios demonstrate how the behavior of our planner leads to qualitatively distinct trajectories, with improvements in both success rate and speed of up to 60%.
|
|
13:30-15:00, Paper ThBT33-CC.4 | Add to My Program |
BeBOP -- Combining Reactive Planning and Bayesian Optimization to Solve Robotic Manipulation Tasks |
|
Styrud, Jonathan | ABB |
Mayr, Matthias | Lund University |
Hellsten, Erik | Lund University |
Krueger, Volker | Lund University |
Smith, Claes Christian | KTH Royal Institute of Technology |
Keywords: Integrated Planning and Learning, Reinforcement Learning, Behavior-Based Systems
Abstract: Robotic systems for manipulation tasks are increasingly expected to be easy to configure for new tasks. While in the past, robot programs were often written statically and tuned manually, the current, faster transition times call for robust, modular and interpretable solutions that also allow a robotic system to learn how to perform a task. We propose the method Behavior-based Bayesian optimization and Planning (BeBOP) that combines two approaches for generating behavior trees: we build the structure using a reactive planner and learn specific parameters with Bayesian optimization. The method is evaluated on a set of robotic manipulation benchmarks and is shown to outperform state-of-the-art reinforcement learning algorithms by being up to 46 times faster while simultaneously being less dependent on reward shaping. We also propose a modification to the uncertainty estimate for the Random Forest surrogate models that drastically improves the results.
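To illustrate the Bayesian-optimization half of the approach, here is a generic BO loop with a random-forest surrogate, where uncertainty is taken as the spread across trees; that spread heuristic is a common default and not necessarily the paper's modified uncertainty estimate, and the objective and parameter range are hypothetical.

```python
# Generic Bayesian optimization with a random-forest surrogate and UCB
# acquisition over a 1D behavior-tree parameter. Illustrative sketch only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)
f = lambda x: -np.sin(3 * x) - x ** 2 + 0.7 * x   # hidden task objective

X = rng.uniform(-2, 2, size=(5, 1))               # initial random evaluations
y = f(X).ravel()

for it in range(15):
    forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    cand = rng.uniform(-2, 2, size=(256, 1))
    per_tree = np.stack([t.predict(cand) for t in forest.estimators_])
    mu, sigma = per_tree.mean(0), per_tree.std(0)  # spread across trees
    x_next = cand[np.argmax(mu + 2.0 * sigma)]     # UCB acquisition
    X = np.vstack([X, x_next[None, :]])
    y = np.append(y, f(x_next)[0])

print(f"best parameter {X[np.argmax(y)].item():.3f}, value {y.max():.3f}")
```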
|
|
13:30-15:00, Paper ThBT33-CC.5 | Add to My Program |
Motion Memory: Leveraging past Experiences to Accelerate Future Motion Planning |
|
Das, Dibyendu | George Mason University |
Lu, Yuanjie | George Mason University |
Plaku, Erion | George Mason University |
Xiao, Xuesu | George Mason University |
Keywords: Integrated Planning and Learning, Motion and Path Planning
Abstract: When facing a new motion-planning problem, most motion planners solve it from scratch, e.g., via sampling and exploration or by starting optimization from a straight-line path. However, most motion planners have to experience a variety of planning problems throughout their lifetimes, which are yet to be leveraged for future planning. In this paper, we present a simple but efficient method called Motion Memory, which allows different motion planners to accelerate future planning using past experiences. Treating existing motion planners as either closed or open boxes, we present a variety of ways that Motion Memory can contribute to reducing the planning time when facing a new planning problem. We provide extensive experiment results with three different motion planners on three classes of planning problems with over 30,000 problem instances and show that planning time can be reduced by up to 89% with the proposed Motion Memory technique, with larger gains as past planning experience accumulates.
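The core "remember and reuse" idea can be sketched as a nearest-neighbor store of past problems and their solution paths that warm-starts a new query; the problem encoding and planner interface below are hypothetical placeholders, not the paper's representation.

```python
# Minimal motion-memory sketch: store past (problem, path) pairs and retrieve
# the path of the most similar past problem as a warm start. Illustrative only.
import numpy as np

rng = np.random.default_rng(7)

class MotionMemory:
    def __init__(self):
        self.problems, self.paths = [], []

    def add(self, problem_vec, path):
        self.problems.append(problem_vec)
        self.paths.append(path)

    def retrieve(self, problem_vec):
        """Return the stored path of the most similar past problem."""
        P = np.asarray(self.problems)
        i = int(np.argmin(np.linalg.norm(P - problem_vec, axis=1)))
        return self.paths[i]

mem = MotionMemory()
for _ in range(100):                             # past planning experiences
    p = rng.uniform(0, 1, size=6)                # e.g., encoded start/goal/obstacles
    mem.add(p, rng.uniform(0, 1, size=(20, 2)))  # a previously solved path

query = rng.uniform(0, 1, size=6)
warm_start_path = mem.retrieve(query)            # seed the planner, not scratch
print("warm-start path shape:", warm_start_path.shape)
```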
|
|
13:30-15:00, Paper ThBT33-CC.6 | Add to My Program |
Mitigating Causal Confusion in Vector-Based Behavior Cloning for Safer Autonomous Planning |
|
Guo, Jiayu | Fudan University |
Feng, Mingyue | Mogo Auto Intelligence and Telematics Information Technology Com |
Zhu, Pengfei | Mogo Ai |
Dou, Jinsheng | Mogo Auto Intelligence and Telematics Information Technology Co |
Feng, Di | Mogo Auto Intelligence and Telematics Information Technology Co |
Li, Chengjun | Mogo Auto Intelligence and Telematics Information Technology Com |
Wan, Ru | Mogo Ai |
Pu, Jian | Fudan University |
Keywords: Integrated Planning and Learning, Learning from Demonstration, Deep Learning Methods
Abstract: The utilization of vector-based deep learning techniques has great prospects in the realm of autonomous driving, particularly in the domains of prediction and planning tasks. However, the application of vector-based backbones for prediction and planning tasks may lead to the occurrence of causal confusion. Previous studies have explored the phenomenon of causal confusion, with a specific emphasis on the context of visual imitation learning. As for the vector-based model, we observe that the states of surrounding vehicles can be a nuisance shortcut. In our work, an off-policy approach is proposed to alleviate the issue by incorporating de-confounding supervision. Additionally, to better capture the environmental cues, such as route and traffic lights, in vectorized representation, a decoder utilizing iterative route fusion is devised. By incorporating auxiliary supervision and employing a dedicated decoder, we demonstrate the effectiveness of our methods in reducing causal confusion and improving performance in planning tasks through reactive and nonreactive closed-loop simulations on the nuPlan dataset.
|
|
13:30-15:00, Paper ThBT33-CC.7 | Add to My Program |
Learning-Based Motion Planning with Mixture Density Networks |
|
Wang, Yinghan | Shanghai Jiao Tong University |
He, Jianping | Shanghai Jiao Tong University |
Duan, Xiaoming | Shanghai Jiao Tong University |
Keywords: Motion and Path Planning, Imitation Learning, Deep Learning Methods
Abstract: The trade-off between computation time and path optimality is a key consideration in motion planning algorithms. While classical sampling-based algorithms fall short of computational efficiency in high-dimensional planning, learning-based methods have shown great potential in achieving time-efficient and optimal motion planning. The state-of-the-art learning-based motion planning algorithms utilize paths generated by sampling-based methods as expert supervision data and train networks via regression techniques. However, these methods often overlook the important multimodal property of the optimal paths in the training set, making them incapable of finding good paths in some scenarios. In this paper, we propose a Multimodal Neuron Planner (MNP) based on mixture density networks that explicitly takes into account the multimodality of the training data and simultaneously achieves time efficiency and path optimality. For environments represented by point clouds, MNP first efficiently compresses point clouds into a latent vector via encoding networks that are suitable for processing point clouds. We then design multimodal planning networks which enable MNP to learn and predict multiple optimal solutions. Simulation results show that our method outperforms the state-of-the-art learning-based method MPNet and the advanced sampling-based methods IRRT* and BIT*.
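The mixture-density mechanism the abstract relies on is shown in the PyTorch sketch below: a head that predicts K Gaussian modes over a next waypoint, trained with the mixture negative log-likelihood. The dimensions are invented and there is no point-cloud encoder here, so this is the generic multimodal-regression building block, not MNP's architecture.

```python
# Minimal mixture-density-network head and its NLL loss. Illustrative only.
import math
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    def __init__(self, in_dim=32, out_dim=2, K=5):
        super().__init__()
        self.K, self.out_dim = K, out_dim
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.pi = nn.Linear(64, K)                   # mixture logits
        self.mu = nn.Linear(64, K * out_dim)         # component means
        self.log_sigma = nn.Linear(64, K * out_dim)  # component log-stddevs

    def forward(self, x):
        h = self.net(x)
        return (self.pi(h),
                self.mu(h).view(-1, self.K, self.out_dim),
                self.log_sigma(h).view(-1, self.K, self.out_dim))

def mdn_nll(logits, mu, log_sigma, y):
    """Negative log-likelihood of y under the predicted Gaussian mixture."""
    y = y.unsqueeze(1)                               # (B, 1, D)
    comp_logp = (-0.5 * ((y - mu) / log_sigma.exp()) ** 2
                 - log_sigma - 0.5 * math.log(2 * math.pi)).sum(-1)  # (B, K)
    log_mix = torch.log_softmax(logits, dim=-1) + comp_logp
    return -torch.logsumexp(log_mix, dim=-1).mean()

model = MDNHead()
x, y = torch.randn(16, 32), torch.randn(16, 2)       # latent env vec, waypoint
loss = mdn_nll(*model(x), y)
loss.backward()
print(f"mixture NLL: {loss.item():.3f}")
```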
|
|
13:30-15:00, Paper ThBT33-CC.8 | Add to My Program |
Subgoal Diffuser: Coarse-To-Fine Subgoal Generation to Guide Model Predictive Control for Robot Manipulation |
|
Huang, Zixuan | University of Michigan |
Lin, Yating | University of Michigan |
Yang, Fan | University of Michigan |
Berenson, Dmitry | University of Michigan |
Keywords: Integrated Planning and Learning, Deep Learning in Grasping and Manipulation
Abstract: Manipulation of articulated and deformable objects can be difficult due to their compliant and under-actuated nature. Unexpected disturbances can cause the object to deviate from a predicted state, making it necessary to use Model-Predictive Control (MPC) methods to plan motion. However, these methods need a short planning horizon to be practical. Thus, MPC is ill-suited for long-horizon manipulation tasks due to local minima. In this paper, we present a diffusion-based method that guides an MPC method to accomplish long-horizon manipulation tasks by dynamically specifying sequences of subgoals for the MPC to follow. Our method, called Subgoal Diffuser, generates subgoals in a coarse-to-fine manner, producing sparse subgoals when the task is easily accomplished by MPC and more dense subgoals when the MPC method needs more guidance. The density of subgoals is determined dynamically based on a learned estimate of reachability, and subgoals are distributed to focus on challenging parts of the task. We evaluate our method on two robot manipulation tasks and find it improves the planning performance of an MPC method, and also outperforms prior diffusion-based methods.
|
|
13:30-15:00, Paper ThBT33-CC.9 | Add to My Program |
Motion Planning As Online Learning: A Multi-Armed Bandit Approach to Kinodynamic Sampling-Based Planning |
|
Faroni, Marco | Politecnico Di Milano |
Berenson, Dmitry | University of Michigan |
Keywords: Motion and Path Planning, Integrated Planning and Learning
Abstract: Kinodynamic motion planners allow robots to perform complex manipulation tasks under dynamics constraints or with black-box models. However, they struggle to find high-quality solutions, especially when a steering function is unavailable. This paper presents a novel approach that adaptively biases the sampling distribution to improve the planner's performance. The key contribution is to formulate the sampling bias problem as a non-stationary multi-armed bandit problem, where the arms of the bandit correspond to sets of possible transitions. High-reward regions are identified by clustering transitions from sequential runs of kinodynamic RRT and a bandit algorithm decides what region to sample at each timestep. The paper demonstrates the approach on several simulated examples as well as a 7-degree-of-freedom manipulation task with dynamics uncertainty, suggesting that the approach finds better solutions faster and leads to a higher success rate in execution.
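The bandit framing can be previewed with the toy below: candidate sampling regions act as arms, and a sliding-window UCB rule adapts when the rewarding region changes. The fixed arms, window size, and reward model are placeholders; the paper instead clusters transitions from sequential RRT runs.

```python
# Sliding-window UCB on a non-stationary bandit over sampling regions.
import numpy as np
from collections import deque

rng = np.random.default_rng(8)
n_arms, window, c = 4, 50, 1.0
history = [deque(maxlen=window) for _ in range(n_arms)]  # recent rewards only

def true_reward(arm, t):
    # Non-stationary: the most promising region changes mid-run.
    best = 0 if t < 300 else 2
    return rng.normal(1.0 if arm == best else 0.3, 0.2)

choices = []
for t in range(600):
    counts = np.array([max(len(h), 1) for h in history])
    means = np.array([np.mean(list(h)) if h else np.inf for h in history])
    ucb = means + c * np.sqrt(np.log(t + 1) / counts)    # inf forces exploration
    arm = int(np.argmax(ucb))
    history[arm].append(true_reward(arm, t))
    choices.append(arm)

print("most-sampled arm, first half:", np.bincount(choices[:300]).argmax(),
      "| second half:", np.bincount(choices[300:]).argmax())
```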
|
|
ThBL-EX Poster Session, Exhibition Hall |
Add to My Program |
Late Breaking Results Poster VIII |
|
|
|
13:30-15:00, Paper ThBL-EX.1 | Add to My Program |
Visual-Inertial Pose Estimation of Externally-Actuated Modular Manipulators |
|
Cho, Jaehwi | Seoul National University |
Kang, Jiseock | Seoul National University |
Kong, Doyoon | Seoul National University |
Lee, Dongjun | Seoul National University |
Keywords: Aerial Systems: Perception and Autonomy, Sensor Fusion, Localization
Abstract: In this poster session, we present a visual-inertial pose estimation method for externally-actuated modular manipulators (EAMMs), a new type of manipulator consisting of multiple rotor-actuated links connected via joints. While this structure enables the realization of a large and dexterous manipulator, it necessitates accurate pose estimation for all links to ensure stable flight and precise manipulation. However, directly applying visual-inertial odometry (VIO) to all links may lead to excessive computation and hardware complexity. To overcome this challenge, we employ factor graph optimization, defining the robot state as a combination of joint angles, joint speeds, and IMU biases. This approach leverages the kinematic structure of EAMMs to enhance the speed and robustness of the framework. We conduct experiments to validate the real-time capability and high accuracy of the proposed framework.
|
|
13:30-15:00, Paper ThBL-EX.2 | Add to My Program |
Influence of Pole Assignment Constraint in P-PI Vibration Suppression Control for Motion Control Systems |
|
Urakawa, Yoshiyuki | Nippon Institute of Technology |
Ngamlamai, Sirichai | Nippon Institute of Technology |
Keywords: Motion Control, Actuation and Joint Mechanisms, Automation at Micro-Nano Scales
Abstract: Motor-driven positioning systems are widely used in industry. Their controllers are typically proportional-to-proportional-integral (P-PI) controllers, with a velocity PI loop inside and a position P loop outside; they are easy to tune and show small overshoot. However, the authors previously showed that the P-PI controller has a pole-assignment constraint and that a feedforward controller for the velocity loop improves the disturbance response. Vibration-suppressing PI velocity controllers have also been widely investigated, and they should be applied to positioning systems as P-PI vibration suppression controllers. The authors also showed that the pole-assignment constraint exists in P-PI vibration suppression control as well, but the influence of the constraint was not obvious. In this poster, we analyze the condition of the constraint, show through simulation that the constraint depends on the plant, and confirm that as the closed-loop poles are assigned farther from the origin, the constrained pole approaches the origin.
|
|
13:30-15:00, Paper ThBL-EX.3 | Add to My Program |
Posture Dependent Variable Transmission Mechanism for Prosthetic Hand Inspired by Human Grasping Characteristics |
|
Chang, Mun Hyeok | Seoul National University Biorobotics Lab |
Jeong, Inchul | Seoul National University |
Park, Jong Hoo | Seoul National University |
Choi, Hyungmin | Seoul National University |
Cho, Kyu-Jin | Seoul National University, Biorobotics Laboratory |
Keywords: Actuation and Joint Mechanisms, Prosthetics and Exoskeletons, Mechanism Design
Abstract: Gripping objects firmly and quickly is an important function of the human hand for everyday life. Prosthetic devices face significant challenges in replicating these capabilities, particularly in achieving a delicate balance between swift grasping and substantial grip strength while adhering to weight and form-factor constraints. To address these challenges, this study introduces a novel Posture Dependent Variable Transmission (PDVT) that mimics the human hand's behavior by employing a spiral-shaped spool. The PDVT's spiral-shaped spool replicates the human hand's quick and gentle pre-contact movements followed by a stronger force application after contact with the object. Additionally, a compressive series elastic spring enhances tendon tension across a wide range of finger postures. The manufacturing method of PDVT, utilizing both 3D printing and metal processing, enables the creation of complex spiral shapes. The PDVT demonstrates improvements in both speed and grip strength compared to conventional rigid spool mechanisms. The PDVT has the potential to be applied to various robotic grasping systems.
|
|
13:30-15:00, Paper ThBL-EX.4 | Add to My Program |
APF-MRPPO: Navigating Graph-Based Maps with APF and Multi-Reward Reinforcement Learning |
|
O'Hara, Christopher | The University of Tokyo |
Yairi, Takehisa | University of Tokyo |
Keywords: Reinforcement Learning, Legged Robots, Motion and Path Planning
Abstract: This research develops an adaptive navigation algorithm for robots operating in hazardous environments, specifically leveraging Boston Dynamics' Spot, equipped with a graph-based navigation system incorporating cameras, depth sensors, and laser scanning. The algorithm utilizes Reinforcement Learning with Artificial Potential Fields (APF), forming a Multi-Reward Proximal Policy Optimization (PPO) variant that interprets waypoints as attractive forces and hazards as repulsive forces for pathfinding. The MRPPO algorithm inherits the potential field characteristics as part of an environment map reward strategy. This APF-MRPPO approach enables Spot to navigate complex terrains, such as nuclear reactors and disaster sites, optimizing paths and avoiding hazards in real-time by balancing local and global navigational objectives. Initial testing in simulated environments (Unity and ROS) showcases the algorithm's potential, maintaining high success rates despite increased obstacle proximity. The ongoing work focuses on improving algorithmic efficacy with enhanced reward functions, more sophisticated potential fields, and integration of Graph Neural Networks for dynamic, context-aware path planning—aiming for robust adaptability in changeable environments.
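A minimal numpy sketch of the APF term that shapes such rewards follows; the gains, influence radius, and positions are illustrative assumptions, not values from the study.

import numpy as np

# Artificial potential field: attraction toward the active waypoint plus
# repulsion from hazards inside an influence radius d0.
def apf_force(pos, goal, hazards, k_att=1.0, k_rep=2.0, d0=2.0):
    force = k_att * (goal - pos)                      # attractive term
    for h in hazards:
        diff = pos - h
        d = np.linalg.norm(diff)
        if 1e-9 < d < d0:                             # inside influence radius
            force += k_rep * (1.0 / d - 1.0 / d0) / d**3 * diff
    return force

pos, goal = np.array([0.0, 0.0]), np.array([5.0, 5.0])
hazards = [np.array([2.0, 2.5])]
print(apf_force(pos, goal, hazards))  # e.g., one term of a shaped RL reward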
|
|
13:30-15:00, Paper ThBL-EX.5 | Add to My Program |
Evaluation of Robust Visual Servoing for Cucumis Melo Harvesting Robot |
|
Park, Woojun | Chonnam National University |
Park, Yonghyun | Chonnam National University |
Kim, Changjo | Chonnam National University |
Son, Hyoung Il | Chonnam National University |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, Perception for Grasping and Manipulation
Abstract: This study introduces robust visual servoing using computer vision for fast and stable fruit pedicel estimation. The key focus of this research is to perform Video Stabilization (ViS) to maintain stable detection even under unstable conditions. Additionally, the Fast Point Feature Histogram (FPFH) is utilized for pedicel estimation. By leveraging FPFH, the system detects stems and estimates the pose of harvested crops through curvature vector differences in point clouds. This robust visual servoing system is capable of swift and stable detection. The performance of this system was evaluated by measuring Position Error (PE) using motion capture cameras in a laboratory environment and calculating the Root Mean Square Error (RMSE) based on it. Furthermore, experimentation was conducted targeting cucumber, Korean melon, and tomato to explore its practical applicability. In the future, based on these results, we will investigate the system's performance in the field, identify issues, and make improvements accordingly.
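FPFH descriptors of the kind used here are available in, for example, the Open3D library; the sketch below shows the basic computation, with the file name, voxel size, and search radii as placeholder assumptions rather than the authors' pipeline.

import open3d as o3d

# FPFH features on a downsampled crop scan (file name is a placeholder).
pcd = o3d.io.read_point_cloud("crop_scan.pcd")
pcd = pcd.voxel_down_sample(voxel_size=0.005)
pcd.estimate_normals(
    o3d.geometry.KDTreeSearchParamHybrid(radius=0.01, max_nn=30))
fpfh = o3d.pipelines.registration.compute_fpfh_feature(
    pcd, o3d.geometry.KDTreeSearchParamHybrid(radius=0.025, max_nn=100))
print(fpfh.data.shape)   # a 33-bin histogram per point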
|
|
13:30-15:00, Paper ThBL-EX.6 | Add to My Program |
Spatiotemporal Analysis System for Precision Aerial Spraying |
|
Ju, Eunji | Chonnam National University |
Seol, Jaehwi | Chonnam National University |
Kim, Changjo | Chonnam National University |
Son, Hyoung Il | Chonnam National University |
Keywords: Aerial Systems: Applications, Sensor-based Control, Deep Learning Methods
Abstract: This study proposes a perception and analysis method for precise aerial spraying based on three-dimensional deep learning. Point cloud data for water droplets are acquired using 3D LiDAR, and the PointNet++ deep learning model is trained to classify and segment the spraying pattern. Then, spatial-temporal data are processed for the segmented point cloud data. Spatial data processing is accomplished by employing k-means clustering and the Voronoi diagram. The k-means step calculates four centroids to distinguish the nozzles in the spatial data. After the clustering task is completed, the spray shape emitted from each nozzle is classified and visualized through the Voronoi diagram. In this way, the spraying form at each nozzle is clustered through spatial data processing, allowing each nozzle to be distinguished and mapped. Temporal data processing is accomplished by employing the Kalman filter. Processing temporal data compensates for unsensed or noisy data points and predicts the trajectory of the water droplets, enhancing the spraying data. This method enables a more accurate measurement of the water droplet shape. Experiments varying the flight conditions of the UAV were conducted to assess the proposed framework, demonstrating that processing is feasible in the onboard system of the UAV. These results indicate the potential applicability of the proposed methods in control systems for precise spraying in the future.
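A compact sketch of the spatial-processing step, k-means to separate the four nozzles followed by a Voronoi partition over the centroids, is shown below with synthetic points; the temporal step would pass each nozzle's droplet track through a Kalman filter.

import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial import Voronoi

# Spatial step: cluster segmented droplet points into four nozzle groups,
# then partition the plane with a Voronoi diagram over the centroids.
points = np.random.rand(400, 2)                 # placeholder droplet (x, y)
km = KMeans(n_clusters=4, n_init=10).fit(points)
vor = Voronoi(km.cluster_centers_)              # one cell per nozzle
labels = km.labels_                             # droplet-to-nozzle assignment
print(km.cluster_centers_)
# Temporal step (not shown): a per-nozzle Kalman filter smooths the track
# and fills in unsensed or noisy frames.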
|
|
13:30-15:00, Paper ThBL-EX.7 | Add to My Program |
Development of a Mobile Robotic System for Remote Autonomous Inspection and Digitalizing of Industrial Plants with Critical Infrastructure |
|
Zinanyuca Yábar, Miguel Andrés | Pontificia Universidad Católica Del Perú |
Hilario Poma, Javier Alfredo | Pontificia Universidad Católica Del Perú |
Jara Rios, Jose Alonso | Pontificia Universidad Católica Del Perú |
Cabrera Yi, Eduardo Augusto | Pontificia Universidad Católica Del Perú |
Leiva, Martin | PUCP |
Miyahira Yagui, Alessandro | Pontificia Universidad Catolica Del Peru |
Rivadeneira, Franco | Pontificia Universidad Catolica Del Peru |
Cuellar, Francisco | Pontificia Universidad Catolica Del Peru |
Rivera Farfan, Juan Diego | Universidad Católica San Pablo |
Castillo-Araníbar, Patricia | Universidad Católica San Pablo, Departamento De Ingeniería Eléct |
Keywords: Autonomous Agents, Industrial Robots, Deep Learning for Visual Perception
Abstract: Monitoring industrial plants usually requires routine inspection of different plant elements, where delay in detecting the failure of any component can be critical. In this context, the proposed system consists of a mobile platform with a modular articulated arm with 6 degrees of freedom mounted on it, allowing it to perform inspection tasks autonomously. The system can also detect gauges in industrial settings, aiding data gathering using the YOLO v8 architecture. The mobile platform and the robotic arm were implemented and are able to travel over irregular surfaces such as stairs and slopes [5]. Furthermore, the autonomous navigation was validated using the ROS framework. Finally, a predictive model was integrated, achieving an F1-score of 87.72%.
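Gauge detection with a YOLOv8 model typically takes only a few lines through the ultralytics package; the weight file and image path below are hypothetical placeholders, not the authors' trained model.

from ultralytics import YOLO

# Detect gauges in a plant image (weights and image path are placeholders).
model = YOLO("gauge_detector.pt")
results = model("plant_corridor.jpg", conf=0.5)
for box in results[0].boxes:
    print(box.cls, box.conf, box.xyxy)   # class id, confidence, pixel bbox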
|
|
13:30-15:00, Paper ThBL-EX.8 | Add to My Program |
Design of a Low Backlash and Backdrivable Gearbox for Robot Actuators |
|
Shin, Wonseok | KITECH |
Park, Seungtae | Korea National University of Science and Technology |
Kang, Jihun | UST Graduate School |
Ahn, Bummo | Korea Institute of Industrial Technology |
Kwon, Suncheol | KITECH |
Keywords: Actuation and Joint Mechanisms, Compliant Joints and Mechanisms, Mechanism Design
Abstract: This paper provides a modeling- and simulation-based design framework for a low-backlash, highly backdrivable gearbox. Robot actuators generally combine an electric motor with a high-gear-ratio gearbox to achieve high torque capacity. The 3K compound planetary gearbox with a high gear ratio is a promising candidate for robot actuators deployed in physically interacting environments because its high transmission efficiency endows high backdrivability. However, compared with high-ratio gearboxes such as the harmonic drive, the backlash of existing 3K compound planetary gearboxes is known to be high. The proposed design framework therefore includes 1) forward and backward transmission efficiency modeling, 2) derivation of the backlash, and 3) maximization of efficiency under bounded backlash constraints. The simulation results indicate that the backlash could be improved from 24 arcmin to 2 arcmin, with a forward drive efficiency of approximately 86% and a backward drive efficiency of approximately 84%. We expect this knowledge to provide intuition for designing low-backlash, high-efficiency gears for physically interacting robot actuators.
|
|
13:30-15:00, Paper ThBL-EX.9 | Add to My Program |
Video-Based Human-Object Interaction Detection with Pairwise Tracking |
|
Shao, Zhanpeng | Hunan Normal University |
Peng, Yuxiao | Hunan Normal University |
Wang, Xueping | Hunan Normal University |
Li, You-Fu | City University of Hong Kong |
Keywords: Recognition, Human-Robot Collaboration, Human Detection and Tracking
Abstract: Human-Object Interaction (HOI) detection aims to locate all human and object instances in an image or video and infer their interactions. HOI can help robots understand human activities at a high semantic level in the context of human-robot interaction. Currently, most work is dedicated to image-based HOI detection, while video-based HOI detection is still under-explored. This work integrates multi-object tracking and HOI detection into one framework, Human-Object Pairwise Tracking with tRansformer (HOPTR), where human-object pairs are treated as tracking pairs. During tracking, HOPTR predicts HOIs at each frame while maintaining a temporal trajectory for each human-object pair. This strategy enforces temporal consistency, allowing the model to learn more effective spatio-temporal features and to keep a set of consistent HOI detections in video streams.
|
|
13:30-15:00, Paper ThBL-EX.10 | Add to My Program |
Fast Image Quality Assessment for Semantic Segmentation |
|
Farahani, Mohammad | The Chinese University of Hong Kong |
Lau, Darwin | The Chinese University of Hong Kong |
Keywords: Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization
Abstract: Semantic segmentation is essential for robots to understand their surroundings by labeling every part of an image. While current methods are getting better, they still struggle with unfamiliar scenes and conditions like changes in illumination, unexpected objects, shadows, noise, etc. Our work introduces SSIMA (Semantic Segmentation IMage Assessment), a compact, quick model designed to evaluate an image's quality for semantic segmentation before actual processing, akin to a human's intuition. It works without needing the original model or any correct answers. SSIMA was developed using a unique training approach based on synthetic changes to images the main model has not seen, assessing errors in its predictions. Early tests on a Jetson Xavier NX showed promising results, achieving a 0.103 Root Mean Square Error (RMSE) and running at 71 frames per second (14 ms per frame).
|
|
13:30-15:00, Paper ThBL-EX.11 | Add to My Program |
A General Purpose Service Robot System Capable of Handling Commands Containing Abstract Nouns |
|
Yamao, Kosei | Kyushu Institute of Technology |
Kanaoka, Daiju | Kyushu Institute of Technology |
Isomoto, Kosei | Kyushu Institute of Technology |
Tamukoh, Hakaru | Kyushu Institute of Technology |
Keywords: Task and Motion Planning, Computer Vision for Automation, Intention Recognition
Abstract: A general-purpose service robot system must understand and execute various types of human commands, such as "Bring me the object behind the lemon." in a real-world environment. However, generating task plans from complex commands and interpreting the positional relationships of objects such as "the object behind the lemon" is challenging. In this study, we propose 1) a task planning system using only a large language model with rule-based constraints and 2) a system to interpret the relevance of target objects using object detection and visual language model (VLM). Our task planning system allows robots to plan tasks more accurately, with rule-based constraints excluding inappropriate skills. Next, our object relevance interpretation system overlays marks on objects in the image based on the result of object detection. After that, VLM interprets the relevance of the marked object in the image based on the question and outputs an answer. In the experiments, we confirmed that the task planning system showed its practical applicability and effectiveness by achieving high results in tests from RoboCup@Home2023. Furthermore, the object relevance interpreting system was effective in experiments focused on interpreting relevant object locations compared to previous methods.
|
|
13:30-15:00, Paper ThBL-EX.12 | Add to My Program |
Robotic Manipulation and Multi-Physical Characterization with a Picometer-Scale Resolution |
|
Zhang, Wenqi | City University of Hong Kong |
Hou, Chaojian | City University of Hong Kong |
Chen, Donglei | City University of Hong Kong |
Wang, Shuideng | City University of Hong Kong |
Qu, Zhi | City University of Hong Kong |
Yu, Zejie | City University of Hong Kong |
Dong, Lixin | City University of Hong Kong |
Keywords: Micro/Nano Robots, Nanomanufacturing
Abstract: Nanorobotic manipulation (NRM) enables precise handling of micro/nano-objects for observation and assembly under microscopes. Achieving only nanometer-scale precision, its accuracy has been bottlenecked by limited dynamic microscope imaging. Spherical aberration correction (Cs) in transmission electron microscopes (TEM) breaks this limit, offering sub-angstrom resolution, which is expected to boost NRM precision. Besides, the TEM's confined space restricts the application of various physical environments, which is crucial for maximizing TEM-based NRM insights into performance-mechanism correlation and device-level prototyping. Therefore, advancements in positioning accuracy and versatile nanorobotic manipulators, compatible across diverse physical environments, are key to evolving TEM-based NRM. Here, we propose an advanced NRM system built inside a Cs-TEM that achieves precise manipulation and multi-physical stimuli simultaneously. An ultra-high resolution of 204 pm, 171 pm, and 140 pm in the X, Y, and Z directions is achieved by the manipulator of our NRM system inside a Cs-TEM. Diverse physical environments, including electrical, thermal, optical, liquid, and magnetic, are successfully integrated for operando characterization. This NRM system is shaped into a powerful platform for the investigation of performance-mechanism correlations and device-level prototyping.
|
|
13:30-15:00, Paper ThBL-EX.13 | Add to My Program |
Occlusion-Aware Contactless Surface Tracking Control Using 3-D Point Cloud Registration for Body Search |
|
Kitahara, Tadamasa | National Defense Academy of Japan |
Tsujita, Teppei | National Defense Academy of Japan |
Abiko, Satoko | Shibaura Institute of Technology |
Sato, Daisuke | Tokyo City University |
Keywords: Sensor-based Control, Surveillance Robotic Systems, Telerobotics and Teleoperation
Abstract: In this research, we aim to perform body searches with a robot. We discuss contactless body search, such as the detection of hazardous materials with a portable metal detector. Research on contactless surface tracking with manipulators, e.g., for vehicle painting, requires that the shape of the object be known in advance or that the object be stationary; a person's shape, however, is neither known nor stationary. Therefore, we use 3D LiDAR to obtain the surface shape in real time. When obtaining the shape in real time, the surface point cloud cannot be acquired directly because of occlusion by the end-effector, so distance control cannot be performed. To solve this problem, we propose a method that keeps the distance between the end-effector and the unknown moving object surface below a certain value while recognizing the shape from the point cloud even in the presence of occlusion. The surface shape, free of occlusion and deficient points, is acquired as a point cloud before occlusion occurs; when occlusion occurs, the partially deficient point cloud of the target is superimposed on the previously acquired complete point cloud. In this way, the point cloud of the deficient area can be estimated. Experimental results show that the distance between the end-effector and the unknown moving target surface can be kept below a certain value even in the presence of occlusion, clarifying the usefulness of the proposed method.
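The superposition step can be prototyped with standard point-cloud registration, for instance ICP in Open3D; in this sketch the file names and correspondence distance are assumptions, and the real system must additionally track the motion of the target.

import open3d as o3d

# Align the occluded live scan to the previously captured complete surface,
# then borrow the missing points from the earlier cloud.
complete = o3d.io.read_point_cloud("surface_before_occlusion.pcd")
partial = o3d.io.read_point_cloud("surface_with_occlusion.pcd")
reg = o3d.pipelines.registration.registration_icp(
    partial, complete, 0.02,    # source, target, max correspondence distance
    estimation_method=o3d.pipelines.registration
        .TransformationEstimationPointToPoint())
merged = partial.transform(reg.transformation) + complete  # filled-in cloud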
|
|
13:30-15:00, Paper ThBL-EX.14 | Add to My Program |
Information-Driven Search Strategy for Multiple Hazardous Gas Sources in Turbulent Environments |
|
Jang, Hongro | UNIST |
Seo, Jaemin | UNIST |
Oh, Hyondong | UNIST |
Keywords: Environment Monitoring and Management, Planning under Uncertainty
Abstract: This study proposes an information-driven two-stage parallel particle filter method to address the challenge of estimating and searching for the unknown number of hazardous gas sources using a mobile sensor in turbulent environments. In the first stage, the estimation particle filter is designed to accurately estimate the number and state of the sources. The second stage, the decision-making particle filter, focuses on refining the search strategy by incorporating the fusion range and potential field concept to effectively differentiate between overlapping sources and avoid possible local minima. Subsequently, information gained from the decision-making particle filter enables the mobile sensor to efficiently explore the search area while obtaining more informative measurements, ensuring the balanced exploitation and exploration search behavior. The proposed algorithm also provides a clear criterion for determining when to conclude the search, leveraging the coverage information from the occupancy grid map and the high confidence in source estimation from the estimation particle filter. Extensive numerical simulations validate the efficacy of this approach in simultaneously estimating and searching for multiple sources in turbulent environments.
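For intuition, a bare-bones particle-filter update for a single source is sketched below; the state layout, Gaussian likelihood, and noise levels are placeholder assumptions standing in for the paper's turbulent-dispersion model and two-stage design, and plume_model is a hypothetical helper.

import numpy as np

# Particle filter over one source state (x, y, release rate).
rng = np.random.default_rng(1)
N = 2000
particles = rng.uniform([0.0, 0.0, 0.1], [50.0, 50.0, 5.0], size=(N, 3))
weights = np.full(N, 1.0 / N)

def pf_update(measured, expected, noise=0.2):
    """Reweight on one concentration measurement; resample if degenerate."""
    global particles, weights
    weights *= np.exp(-0.5 * ((measured - expected) / noise) ** 2)
    weights /= weights.sum()
    if 1.0 / np.sum(weights ** 2) < N / 2:      # effective sample size low
        idx = rng.choice(N, size=N, p=weights)  # resample and jitter
        particles = particles[idx] + rng.normal(0.0, 0.1, size=(N, 3))
        weights = np.full(N, 1.0 / N)

# 'expected' comes from evaluating a plume model at every particle, e.g.:
# pf_update(measured=0.8, expected=plume_model(particles, sensor_pos))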
|
|
13:30-15:00, Paper ThBL-EX.15 | Add to My Program |
Enhancing Active SLAM with Illuminating the Occluded Area |
|
Lee, Handong | Korea Advanced Institute of Science and Technology |
Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
Keywords: Deep Learning for Visual Perception, Vision-Based Navigation, AI-Based Methods
Abstract: Navigating dynamic environments like underground parking lots, with their ever-changing and occluded spaces, challenges traditional navigation methods. In this context, Active Simultaneous Localization and Mapping (Active SLAM) provides an innovative solution. Its adaptability makes it well suited to underground navigation, where environments frequently change and areas are often obscured. Our research focuses on enhancing navigation accuracy through the creation of Bird-Eye-View (BEV) local maps that incorporate recovered occluded spaces. To address the dynamic obstacles of occluded environments, we employed Isaac Sim, a cutting-edge simulation platform. By simulating various floor plans, we have accumulated a global map from odometry point clouds. Portions of this global map serve as ground truth, against which we validate the BEV images recovered from occlusion. Our proposed method and subsequent validation demonstrate its potential to significantly improve the autonomy and reliability of Active SLAM systems, moving autonomous platforms toward independent, robust, and self-sufficient navigation in places they have never visited before or that are surrounded by occluded areas.
|
|
13:30-15:00, Paper ThBL-EX.16 | Add to My Program |
Sim2real: Robust Humanoid Walking on Compliant and Uneven Terrain with Deep Reinforcement Learning |
|
Singh, Rohan Pratap | University of Tsukuba, National Institute of Advanced Industrial |
Morisawa, Mitsuharu | National Inst. of AIST |
Benallegue, Mehdi | AIST Japan |
Xie, Zhaoming | Stanford University |
Kanehiro, Fumio | National Inst. of AIST |
Keywords: Humanoid and Bipedal Locomotion, Humanoid Robot Systems, Sensorimotor Learning
Abstract: For the deployment of legged robots in real-world environments, it is essential to develop robust locomotion control methods for challenging terrains that may exhibit unexpected deformability and irregularity. In this paper, we explore the application of sim2real deep reinforcement learning (RL) for the design of locomotion controllers for large-sized humanoid robots on compliant and uneven terrains. Our key contribution is to show that a simple training curriculum for exposing the RL agent to randomized terrains in simulation can achieve robust walking on the real humanoid robot using only proprioceptive feedback. We train an end-to-end omnidirectional locomotion policy using the proposed approach and show extensive real robot demonstration on the HRP-5P humanoid over several difficult terrains inside and outside the lab environment. Additionally, we propose a new control policy to enable modification of the observed clock signal, leading to adaptive gait frequencies depending on the terrain and command velocity. In simulation experiments, we show the effectiveness of this policy specifically for walking over challenging terrains by controlling swing and stance durations.
|
|
13:30-15:00, Paper ThBL-EX.17 | Add to My Program |
Collision-Free Path for Real Time Vehicle Teleoperation |
|
Kashwani, Fatima | Khalifa University |
Hassan, Bilal | Khalifa University, Abu Dhabi |
Kong, Peng-Yong | Khalifa University |
Khonji, Majid | Khalifa University |
Dias, Jorge | Khalifa University |
Keywords: Telerobotics and Teleoperation, Motion and Path Planning, Computer Vision for Transportation
Abstract: Remote driving, known as vehicle teleoperation, requires predictive methods to maintain a real-time system because of the delay between operator and vehicle. To this end, we propose a predictive display based on a segmentation-detection framework. We showcase our dual-transformer network (DTNet) and its performance both on image datasets and in real-time video applications. Our model achieves a mIoU score of 83.89% for road free-space segmentation and a mAP score of 34.20% for road object detection, in addition to smooth and robust results in real-time applications.
|
|
13:30-15:00, Paper ThBL-EX.18 | Add to My Program |
Development of an Aerial Search System for Habitat Candidates in Unknown Environments |
|
Son, Hyoung Il | Chonnam National University |
Kim, Bosung | Chonnam National University |
Pak, Jeonghyeon | Chonnam National University |
Keywords: Agricultural Automation, Field Robots
Abstract: From an ecosystem management and protection perspective, tracking harmful species such as the black-backed wasp (Vespa velutina) is an important task. Recently, sensor-network-based tracking research for ecosystem management through population control has been active, but the small sensors attached to insects cause problems in habitat exploration, including inevitable tracking errors. This study proposes an aerial search system for habitat candidates in unknown environments that can compensate for these errors. The search-and-tracking approach attaches and releases a sensor onto the tracking target, after which an unmanned aerial vehicle equipped with a directional antenna tracks the target. The antenna rotates through 360 degrees in 10-degree increments, collects signals, and estimates the direction of the strongest signal as the target's location. At the end of tracking, a drone equipped with a camera takes aerial images within a 100 m radius of the tracking end point. A 3D map is then created from the captured images, and the habitat is explored and GPS information extracted through deep-learning-based object recognition of the results.
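The bearing-estimation step reduces to an argmax over a 36-bin signal sweep; a toy sketch with synthetic RSSI values follows.

import numpy as np

# 360-degree directional-antenna sweep in 10-degree increments.
angles = np.arange(0, 360, 10)                  # 36 measurement bins
rssi = (-70 + 8 * np.cos(np.radians(angles - 130))
        + np.random.randn(angles.size))         # toy signal plus noise
bearing = angles[np.argmax(rssi)]               # strongest-signal direction
print(f"estimated bearing: {bearing} deg")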
|
|
13:30-15:00, Paper ThBL-EX.19 | Add to My Program |
Object Orientation Estimation Using TRIAD and Oriented Bounding Box Based Object Detection |
|
Won, Seungjae | University of Science and Technology |
Lee, Hunjo | Korea University of Science and Technology, Korea Institute of I |
Han, JiWoong | KITECH, University of Science & Technology |
Pyo, Dongbum | Korea Institute of Industrial Technology |
Yang, Gi-Hun | KITECH |
Keywords: Computer Vision for Automation, RGB-D Perception
Abstract: Robots often need to interact with objects, especially in the field of driving pegs into holes, which requires pose estimation algorithms to mate pegs with holes. To successfully mate a peg and a hole, it is important to know not only the position but also the orientation. Various 6D pose estimation algorithms have been developed in the field of robot vision; however, they have been demonstrated on relatively large objects and require object model information or a series of images, masks, etc. Socket 6D pose estimation requires knowledge of the various shapes of the objects and the locations of the sockets on them. Sockets and connectors are embedded in objects, and it takes a lot of data to learn them all, but the sockets and connectors themselves are already well structured and known. Therefore, we propose an object orientation estimation algorithm for ports that can be applied to any object without end-to-end 6D pose estimation. We use the TRIAD method and an oriented object detection model for port orientation estimation, and experiment on objects rotated around the X-axis or the X and Z axes.
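The TRIAD method itself is a classical closed-form attitude solution from two vector observations; a self-contained sketch follows (the yaw-rotation check at the end is our toy example, not from the poster).

import numpy as np

# TRIAD: build orthonormal triads from two non-parallel directions seen in
# the reference and observed frames, then compose them into a rotation.
def triad(v1_ref, v2_ref, v1_obs, v2_obs):
    def make_triad(a, b):
        t1 = a / np.linalg.norm(a)
        t2 = np.cross(a, b); t2 /= np.linalg.norm(t2)
        return np.column_stack((t1, t2, np.cross(t1, t2)))
    M_ref = make_triad(v1_ref, v2_ref)
    M_obs = make_triad(v1_obs, v2_obs)
    return M_ref @ M_obs.T     # rotation taking observed frame to reference

c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)     # 30-degree yaw to recover
R_true = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
v1, v2 = np.array([1.0, 0, 0]), np.array([0, 1.0, 0.5])
print(triad(v1, v2, R_true.T @ v1, R_true.T @ v2))   # approx. R_true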
|
|
13:30-15:00, Paper ThBL-EX.20 | Add to My Program |
Abraded Optical Fibre-Based Dynamic Range Force Sensor for Tissue Palpation |
|
Dawood, Abu Bakar | Queen Mary University of London |
Althoefer, Kaspar | Queen Mary University of London |
Keywords: Surgical Robotics: Laparoscopy, Soft Sensors and Actuators, Soft Robot Materials and Design
Abstract: Tactile information gleaned through palpation plays a crucial role in relation to surface characterization and tissue differentiation - an essential clinical requirement during surgery. In the case of Minimally Invasive Surgery, access is restricted, and tactile feedback available to surgeons is therefore reduced. This poster presents a stiffness controllable, dynamic force range sensor that can provide remote haptic feedback. The sensor has an abraded optical fibre integrated into a silicone dome. Forces applied to the dome change the curvature of the optical fibres, resulting in light attenuation. By changing the pressure within the dome and thereby adjusting the sensor’s stiffness, we are able to modify the force measurement range.
|
|
13:30-15:00, Paper ThBL-EX.21 | Add to My Program |
Energy Efficient Legged Robot Structure Using Pneumatic-Electric Hybrid Actuator |
|
Kim, Yongjun | KAIST |
Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
Keywords: Hydraulic/Pneumatic Actuators, Legged Robots, Force Control
Abstract: Due to their capability to navigate complex terrains unattainable by wheeled robots, interest in bipedal and quadrupedal robots is increasing. However, the continuous energy consumption of motors, even when idle, poses challenges for long-term operations. Integrating pneumatic actuators with electric motors can reduce energy consumption in static phases, potentially extending operational durations. Notably, the inherent compliance of pneumatic actuators permits movement within a limited range using motors, even when the valves are locked. In this paper, we integrate Pneumatic Electric Hybrid Actuators (PEHA) in parallel with the structure of the MIT mini cheetah. Experimental results have shown that pneumatic cylinders provide stable payload compensation for a static load of 1 kg. Trajectory tracking was achievable both when controlling the pressure of the cylinders and after initial pressurization followed by valve locking. Future research will explore control methods linked with high-level motion controllers.
|
|
13:30-15:00, Paper ThBL-EX.22 | Add to My Program |
Underactuated Robotic Finger with SMA Spring for Switching between Adaptive Grasp Mode and Finger-Tip Force Mode |
|
Jeon, Hyerim | KITECH, Incheon Univ |
Kim, Yeongjin | Incheon National University |
Yang, Gi-Hun | KITECH |
Choi, Hyeunseok | KITECH |
Keywords: Grippers and Other End-Effectors, Agricultural Automation, Grasping
Abstract: Under-constrained systems are widely utilized in robotics due to their lower control complexity despite their high degree of freedom. In an under-constrained linkage drive system, springs are used to prevent undesired motion caused by the weight and inertia of the links. In this case, since the deformation energy of the spring equals the work done by the actuator, the stiffness of the spring is proportional to the resistance to rotation of the link. Therefore, it is desirable to select an appropriate spring stiffness based on the task. For instance, low stiffness is effective when grasping fragile objects, while high stiffness is efficient for tasks requiring high fingertip force. In this work, we propose a robotic finger that can change its grasp mode by controlling the stiffness of a spring made of shape memory alloy, which has a different stiffness depending on temperature. The experimental results showed that the spring stiffness is inversely proportional to the driving efficiency of the gripper and proportional to the maximum contact force. It was also confirmed that the operation mode of the linkage system can be switched by controlling the current applied to the SMA spring. This work can contribute to fruit harvesting and food packaging, which require not only adaptive grasping but also high fingertip force.
|
|
13:30-15:00, Paper ThBL-EX.23 | Add to My Program |
Pompeii Robotic Vision: RINGHIO and DDB for Heritage Preservation |
|
Marchello, Gabriele | Istituto Italiano Di Tecnologia |
Galdelli, Alessandro | Università Politecnica Delle Marche |
Baizid, Khelifa | Italian Institute of Technology |
D'Imperio, Mariapaola | Istituto Italiano Di Tecnologia |
Martini, Michele | Italian Institute of Technology |
Zambrano, Alessandra | Parco Archeologico Di Pompei |
Mancini, Adriano | Università Politecnica Delle Marche |
Frontoni, Emanuele | Università Politecnica Delle Marche |
Traviglia, Arianna | Istituto Italiano Di Tecnologia |
Cannella, Ferdinando | Istituto Italiano Di Tecnologia |
Keywords: Art and Entertainment Robotics, Mechanism Design, Data Sets for Robotic Vision
Abstract: The inspection of archaeological sites is a multifaceted task that combines technology, tradition, and interdisciplinary knowledge. In this context, we present a robotic system designed to autonomously detect defects and damage on the walls of the ancient buildings that make up the Pompeii Archaeological Park. Our system consists of an autonomous rover (RINGHIO: Robot for Inspection and Navigation to Generate Heritage and Infrastructures Observation) equipped with a vision system mounted on a vibration compensation device (DDB: Defect Detection Box). We conducted rigorous tests on a carefully selected section of the ancient road leading to Porta Stabia and on an insula strategically chosen for its varied building-wall defects and complex topography. This study highlights the fundamental role of automated survey techniques in preserving our shared cultural heritage. The preservation of archaeological sites is of great importance, and this work demonstrates the value of collaborative efforts to ensure the longevity of these priceless monuments for future generations.
|
|
13:30-15:00, Paper ThBL-EX.24 | Add to My Program |
Modelling and Control of Multiple Coupled Dielectric Elastomer Actuators |
|
Li, Jisen | Shenzhen Institute of Artificial Intelligence and Robotics for S |
Cheng, Anjing | The Chinese University of Hong Kong, Shenzhen |
Wang, Hao | The Chinese University of Hong Kong, Shenzhen |
Zhu, Jian | Chinese University of Hong Kong, Shenzhen |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Sensors and Actuators, Soft Robot Applications
Abstract: Dielectric elastomer actuators (DEAs) have emerged as promising artificial muscles for soft robots due to their ability to undergo large voltage-induced deformation and exhibit bioinspired muscle-like motion. However, the nonlinear response of DEAs, arising from rate-dependent viscoelasticity, poses challenges in their modeling and control. While previous studies have mainly focused on analyzing single-degree-of-freedom DEAs, the modeling and control of multiple coupled DEAs remain unexplored and challenging. This paper presents a novel framework for modeling and control of multiple DEAs, leveraging the sparse identification method. This method enables the development of parsimonious governing equations that accurately capture the hysteresis, creep, and cross-coupling effects of DEAs. The model incorporates prior knowledge and/or experience on DEAs and can be further validated using experimental data. By utilizing the identified explicit dynamic equations, various model predictive controllers can be designed to enable DEAs to track desired trajectories with specific task requirements. Furthermore, the proposed method can be extended to model and analyze other soft actuators, such as pneumatic actuators and shape memory polymers, thereby unlocking the full potential of soft actuators and soft robots.
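Sparse identification of this kind is implemented in, for example, the pysindy package; the sketch below assumes logged state data and placeholder library/optimizer settings, not the authors' identified model.

import numpy as np
import pysindy as ps

# Fit parsimonious governing equations to logged actuator states
# (the data file and hyperparameters are placeholders).
data = np.load("dea_log.npz")    # assumed to hold states X and timestep dt
model = ps.SINDy(
    optimizer=ps.STLSQ(threshold=0.05),              # promotes sparsity
    feature_library=ps.PolynomialLibrary(degree=2))  # candidate terms
model.fit(data["X"], t=float(data["dt"]))
model.print()                    # identified dynamic equations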
|
|
13:30-15:00, Paper ThBL-EX.25 | Add to My Program |
Explainable Decision Making for Autonomous Driving with LLMs |
|
Feng, Yuchao | The Hong Kong Polytechnic University |
Chu, Henry | The Hong Kong Polytechnic University |
Keywords: Computer Vision for Transportation, Intelligent Transportation Systems, Autonomous Agents
Abstract: Recently, Large Language Models (LLMs) have received widespread attention in the field of autonomous driving. It is believed that utilizing the common-sense ability of LLMs in autonomous driving may be a promising solution to alleviate the long-tail effect, addressing rare and unpredictable driving scenarios. Moreover, LLMs may contribute to increased explainability in autonomous driving by generating natural-language explanations for driving actions, fostering trust and transparency in autonomous driving systems. Within this context, we propose an explainable decision-making network with LLMs to investigate the effect of LLMs on driving performance and explainability.
|
|
13:30-15:00, Paper ThBL-EX.26 | Add to My Program |
Study of an Autonomous Mobile Robot for Cultivation Management of Pleurotus Eryngii Mushroom |
|
Jou, Rong-Yuan | National Formosa University |
Chi, Jin-Chuan | National Formosa University, Department of Mechanical Design Eng |
Shih, Hsin-Der | Plant Pathology Division, Taiwan Agricultural Research Institute, |
Keywords: Agricultural Automation, Environment Monitoring and Management, Product Design, Development and Prototyping
Abstract: Mushroom cultivation relies on precise environmental control to ensure optimal growth conditions. Manual inspections for data collection and analysis in mushroom houses have traditionally been labor-intensive and expertise-dependent. To address these challenges, this study presents an innovative Autonomous Mobile Robot (AMR) platform designed for autonomous fixed-point inspections, data acquisition, pollution detection, and navigation within mushroom cultivation facilities. Integrated with the Robot Operating System (ROS) and powered by Raspberry Pi 4, the platform incorporates a vertical positioning system equipped with sensors for comprehensive environmental data collection. The platform achieves pollution detection on mushroom bags with precision exceeding 90% using the YOLO algorithm. All inspection data is securely stored in a remote PostgreSQL database, facilitating seamless access and further analysis. Successful implementation trials in real-world mushroom houses underscore the platform's potential to revolutionize inspection processes, offering substantial efficiency gains and advancements for the mushroom cultivation industry.
|
|
13:30-15:00, Paper ThBL-EX.27 | Add to My Program |
Exploring Human's Gender Perception and Bias Toward Non-Humanoid Robots |
|
Ramezani, Mahya | University of Luxembourg |
Sanchez-Lopez, Jose Luis | University of Luxembourg |
Keywords: Social HRI, Acceptability and Trust
Abstract: As non-humanoid robots increasingly permeate various sectors, understanding their design implications for human acceptance becomes paramount. Despite their ubiquity, studies on how to improve human-robot interaction are sparse. Our investigation, conducted through two surveys, addresses this gap. The first survey focuses on non-humanoid robots and human perceptions of gender attributions, suggesting that both design and perceived gender influence acceptance. The second survey investigates the effects of varying gender cues in robot designs and their consequent impacts on human-robot interactions. Our findings highlight that distinct gender cues can bolster or impede interaction comfort.
|
|
ThCT1-CC Oral Session, CC-303 |
Add to My Program |
Planning, Scheduling and Coordination |
|
|
Chair: Kim, Hyun-Jung | Korea Advanced Institute of Science and Technology |
Co-Chair: Xiao, Xuesu | George Mason University |
|
16:30-18:00, Paper ThCT1-CC.1 | Add to My Program |
Quadcopter Trajectory Time Minimization and Robust Collision Avoidance Via Optimal Time Allocation |
|
Xu, Zhefan | Carnegie Mellon University |
Shimada, Kenji | Carnegie Mellon University |
Keywords: Planning, Scheduling and Coordination, Collision Avoidance, Aerial Systems: Perception and Autonomy
Abstract: Autonomous navigation requires robots to generate trajectories for collision avoidance efficiently. Although many previous works have proven successful in generating smooth and spatially collision-free trajectories, their solutions often suffer from suboptimal time efficiency and potential safety violations, particularly when accounting for uncertainties in robot perception and control. To address this issue, this paper presents the Robust Optimal Time Allocation (ROTA) framework, which optimizes the time progression of trajectories and serves as a post-processing tool to enhance trajectory time efficiency and safety under uncertainties. In this study, we begin by formulating a non-convex optimization problem aimed at minimizing trajectory execution time while incorporating constraints on collision probability as the robot approaches obstacles. Subsequently, we introduce the concept of the trajectory braking zone and adopt the chance-constrained formulation for robust collision avoidance in the braking zones. Finally, the non-convex optimization problem is reformulated into a second-order cone programming problem to achieve real-time performance. Through simulations and physical flight experiments, we demonstrate that the proposed approach effectively reduces trajectory execution time while enabling robust collision avoidance in complex environments. Our software is available on GitHub, along with the developed autonomy framework, as open-source ROS packages.
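For intuition on the chance-constrained step: under a Gaussian uncertainty model, bounding the collision probability is equivalent to inflating the safety distance by a quantile-scaled margin, as in the sketch below (the numbers are illustrative, not from the paper).

import numpy as np
from scipy.stats import norm

# P(distance < d_safe) <= eps with Gaussian clearance is equivalent to
# requiring the mean clearance to exceed d_safe by norm.ppf(1 - eps) * sigma.
eps = 0.05        # allowed collision probability
sigma = 0.10      # 1-sigma position uncertainty [m]
d_safe = 0.30     # hard safety distance [m]
margin = norm.ppf(1.0 - eps) * sigma
print(f"plan with mean clearance >= {d_safe + margin:.3f} m")

This deterministic reformulation is the standard route by which such probabilistic constraints become second-order cone constraints.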
|
|
16:30-18:00, Paper ThCT1-CC.2 | Add to My Program |
Scaling Team Coordination on Graphs with Reinforcement Learning |
|
Limbu, Manshi | George Mason University |
Hu, Zechen | George Mason University |
Wang, Xuan | George Mason University |
Shishika, Daigo | George Mason University |
Xiao, Xuesu | George Mason University |
Keywords: Planning, Scheduling and Coordination, Cooperating Robots, Multi-Robot Systems
Abstract: This paper studies Reinforcement Learning (RL) techniques to enable team coordination behaviors in graph environments with support actions among teammates to reduce the costs of traversing certain risky edges in a centralized manner. While classical approaches can solve this non-standard multi-agent path planning problem by converting the original Environment Graph (EG) into a Joint State Graph (JSG) to implicitly incorporate the support actions, those methods do not scale well to large graphs and teams. To address this curse of dimensionality, we propose to use RL to enable agents to learn such graph traversal and teammate supporting behaviors in a data-driven manner. Specifically, through a new formulation of the team coordination on graphs with risky edges problem into Markov Decision Processes (MDPs) with a novel state and action space, we investigate how RL can solve it in two paradigms: First, we use RL for a team of agents to learn how to coordinate and reach the goal with minimal cost on a single EG. We show that RL efficiently solves problems with up to 20/4 or 25/3 nodes/agents, using a fraction of the time needed for JSG to solve such complex problems; Second, we learn a general RL policy for any N-node EGs to produce efficient supporting behaviors. We present extensive experiments and compare our RL approaches against their classical counterparts.
|
|
16:30-18:00, Paper ThCT1-CC.3 | Add to My Program |
Dynamic Crane Scheduling with Reinforcement Learning for a Steel Coil Warehouse |
|
Cho, Sang-Hyun | Korea Advanced Institute of Science and Technology |
Shin, Woo-Jin | Korea Advanced Institute of Science & Technology |
Ahn, Jeongsun | KAIST |
Joo, Sanghyun | Korea Advanced Institute of Science and Technology(KAIST) |
Kim, Hyun-Jung | Korea Advanced Institute of Science and Technology |
Keywords: Planning, Scheduling and Coordination, Intelligent and Flexible Manufacturing, Factory Automation
Abstract: This paper tackles the dynamic crane scheduling problem in a steel coil warehouse, involving tasks such as coil storage, retrieval, and shuffling. Tasks arrive dynamically with precedence relations, while multiple cranes share a track, necessitating collision avoidance. Our goal is to minimize the average task waiting time by allocating tasks to cranes and optimizing their execution sequence. Unlike prior research focusing on static scenarios or rule-based heuristics, we introduce a real-time, reinforcement-learning-based algorithm. To effectively handle precedence relations and global information, we propose a policy network based on graph neural networks. Experimental results demonstrate its superiority over traditional heuristics such as dispatching rules in dynamic scenarios.
|
|
16:30-18:00, Paper ThCT1-CC.4 | Add to My Program |
Tree-Based Representation of Locally Shortest Paths for 2D K-Shortest Non-Homotopic Path Planning |
|
Yang, Tong | Zhejiang University |
Huang, Li | Zhejiang University |
Wang, Yue | Zhejiang University |
Xiong, Rong | Zhejiang University |
Keywords: Planning, Scheduling and Coordination, Motion and Path Planning
Abstract: A novel algorithm to solve the 2D k-shortest non-homotopic path planning (k-SNPP) task is proposed in this paper. The task is of practical significance as a sub-module for higher-level planning and scheduling tasks, and has been gaining increasing attention in recent years. Existing algorithms explicitly characterise non-homotopic paths using topological invariants such as the h-signature and the winding number. However, these algorithms are inefficient due to their separate treatment of topology and geometry: topological invariants are used solely to distinguish non-homotopic paths, which significantly increases the volume of the robot configuration space, while distance-optimal path planners must search for locally shortest paths in this augmented space, which becomes extremely time-consuming. In this paper, a topological tree is proposed to simultaneously leverage topology and geometry. The tree grows from the starting location and explores all topological routes until the best k of its leaves reach the goal. It is proven that different branches of the tree explore different homotopy classes of paths, and all the branches are locally shortest. Comparative experiments for k-SNPP are conducted in challenging grid-based simulated environments to validate the performance of the proposed algorithm. The C++ implementation of the proposed algorithm is released for the benefit of the robotics community.
|
|
16:30-18:00, Paper ThCT1-CC.5 | Add to My Program |
Well-Connected Set and Its Application to Multi-Robot Path Planning |
|
Guo, Teng | Rutgers University |
Yu, Jingjin | Rutgers University |
Keywords: Planning, Scheduling and Coordination, Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents
Abstract: Parking lots and autonomous warehouses for accommodating many vehicles/robots adopt designs in which the underlying graphs are well-connected to simplify planning and reduce congestion. In this study, we formulate and delve into the largest well-connected set (LWCS) problem and explore its applications in layout design for multi-robot path planning. Roughly speaking, a well-connected set over a connected graph is a set of vertices such that there is a path on the graph connecting any pair of vertices in the set without passing through any additional vertices of the set. Identifying an LWCS has many potential high-utility applications, e.g., for determining parking garage layout and capacity, as prioritized planning can be shown to be complete when start/goal configurations belong to an LWCS. In this work, we establish that computing an LWCS is NP-complete. We further develop optimal and near-optimal LWCS algorithms, with the near-optimal algorithm targeting large maps. A complete prioritized planning method is given for planning paths for multiple robots residing on an LWCS.
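The definition can be checked directly on small graphs; the sketch below tests a vertex set against it using networkx (the toy grid and set are our example, and this brute-force check is unrelated to the paper's optimal and near-optimal LWCS algorithms).

import itertools
import networkx as nx

# S is well-connected in G iff every pair in S is joined by a path that
# avoids all *other* vertices of S.
def is_well_connected(G, S):
    S = set(S)
    for u, v in itertools.combinations(S, 2):
        keep = (set(G.nodes) - S) | {u, v}
        if not nx.has_path(G.subgraph(keep), u, v):
            return False
    return True

G = nx.grid_2d_graph(4, 4)                # toy parking-lot grid
S = [(0, 0), (0, 3), (3, 0), (3, 3)]      # the four corner cells
print(is_well_connected(G, S))            # True on this grid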
|
|
16:30-18:00, Paper ThCT1-CC.6 | Add to My Program |
Dynamic Coalition Formation and Routing for Multirobot Task Allocation Via Reinforcement Learning |
|
Dai, Weiheng | National University of Singapore |
Bidwai, Aditya | National University of Singapore |
Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
Keywords: Planning, Scheduling and Coordination, Path Planning for Multiple Mobile Robots or Agents, Reinforcement Learning
Abstract: Many multi-robot deployments, such as automated construction of buildings, distributed search, or cooperative mapping, often require agents to intelligently coordinate their trajectories and form coalitions over a large domain, to complete spatially distributed tasks as quickly as possible. We focus on scenarios involving homogeneous robots, but where tasks vary in the number of agents required to start them. For example, construction robots may need to collaboratively air-lift heavy objects at different locations (e.g., prefabricated rooms, crates of material/equipment), where the weight of each payload defines the required coalition size. To balance the total travel time of the agents and their waiting time (before task initiation), agents need to carefully sequence tasks but also dynamically form/disband coalitions. While simpler problems can be approached using heuristics or optimization, these methods struggle with more complex instances involving large task-to-agent ratios, where frequent coalition changes are needed. In this work, we propose to let agents learn to iteratively build cooperative schedules to solve such problems, by casting the problem in the reinforcement learning framework. Our approach relies on an attention-based neural network, allowing agents to reason about the current state of the system to sequence movement decisions that optimize short-term coalition formation and long-term task scheduling. We further propose a novel leader-follower technique to boost cooperation learning and compare our performance to conventional baselines in a wide variety of scenarios. There, our method closely matches or outperforms the baselines; in particular, it yields higher-quality solutions and is at least two orders of magnitude faster than an exact solver in cases where frequent coalition updates are required.
|
|
16:30-18:00, Paper ThCT1-CC.7 | Add to My Program |
Multi-Robot Task Allocation under Uncertainty Via Hindsight Optimization |
|
Dhanaraj, Neel | University of Southern California |
Kang, Jeon Ho | University of Southern California |
Mukherjee, Anirban | University of Southern California |
Nemlekar, Heramb | University of Southern California |
Nikolaidis, Stefanos | University of Southern California |
Gupta, Satyandra K. | University of Southern California |
Keywords: Planning, Scheduling and Coordination, Task Planning, Intelligent and Flexible Manufacturing
Abstract: Multi-robot systems are becoming increasingly prevalent in various real-world applications, such as manufacturing and warehouse logistics. These systems face complex challenges in 1) task allocation, due to factors like time-extended tasks and agent specialization, and 2) uncertainties in task execution. Potential task failures can add contingency tasks to recover from the failure, thereby causing delays. This paper addresses the problem of Multi-Robot Task Allocation under Uncertainty by proposing a hierarchical approach that decouples the problem into two layers. We use a low-level optimization formulation to find the optimal solution for a deterministic multi-robot task allocation problem with known task outcomes. The higher-level search intelligently generates the more likely combinations of failures and calls the low-level search repeatedly to find the optimal task allocation sequence, given the known outcomes. We validate our results in simulation for a manufacturing domain and demonstrate that our method can reduce the effect of potential delays from contingencies. We show that our algorithm is computationally efficient while improving the average makespan compared to other baselines.
|
|
16:30-18:00, Paper ThCT1-CC.8 | Add to My Program |
Traffic Flow Learning Enhanced Large-Scale Multi-Robot Cooperative Path Planning under Uncertainties |
|
Han, Xingyao | Shanghai Jiao Tong University |
Chen, Siyuan | Shanghai JiaoTong University |
Xiong, Xinye | Shanghai Jiao Tong University |
Liu, Qiming | Shanghai Jiao Tong University |
Zhou, Shunbo | Huawei |
Zhang, Heng | Shanghai Jiao Tong University |
Liu, Zhe | University of Cambridge |
Keywords: Planning, Scheduling and Coordination, Task Planning, Logistics
Abstract: Robotic systems with hundreds or even thousands of robots are widely implemented in logistics and industrial applications. In such systems, cooperative path planning is of great importance, as local congestion and motion conflicts may greatly degrade system performance, especially in the presence of uncertainties. Our idea is to consider traffic flow equilibrium in path planning to relieve potential congestion and increase efficiency. In this paper, we propose a hierarchical framework, which includes a traffic flow prediction layer, a sector-level planning layer, and a road-level coordination layer. In traffic flow prediction, we propose a spatio-temporal graph neural network that integrates local information to predict the evolution of the future robot density distribution. In sector-level planning, we generate sector-level paths that consider travel distance and traffic flow equilibrium simultaneously. In road-level coordination, we implement the conflict-based search algorithm within each sector to ensure conflict-free local paths. In addition, we explicitly consider the motion/communication uncertainties that are unavoidable in practical systems. We validate the effectiveness of our approach in simulations with over 1000 robots, and real-world experiments are also provided.
|
|
16:30-18:00, Paper ThCT1-CC.9 | Add to My Program |
Accounting for Travel Time and Arrival Time Coordination During Task Allocations in Legged-Robot Teams |
|
Chen, Shengqiang | University of Southern California |
Chen, Yiyu | University of Southern California |
Jain, Ronak | University of Southern California |
Zhang, Xiaopan | University of Southern California |
Nguyen, Quan | University of Southern California |
Gupta, Satyandra K. | University of Southern California |
Keywords: Planning, Scheduling and Coordination, Task Planning, Multi-Robot Systems
Abstract: Many applications require the deployment of legged-robot teams to effectively and efficiently carry out missions. The use of multiple robots allows tasks to be executed concurrently, expediting mission completion. It also enhances resilience by enabling task transfer in case of a robot failure. This paper presents a formulation based on Mixed Integer Linear Programming (MILP) for allocating tasks to robots by taking into account travel time and ensuring efficient execution of collaborative tasks. We extended the MILP formulation to account for complexities with legged robot teams. Our results demonstrate that this approach leads to improved performance in terms of the makespan of the mission. We demonstrate the usefulness of this approach using a case study involving the disinfection of a building consisting of multiple rooms.
|
|
ThCT2-CC Oral Session, CC-311 |
Add to My Program |
Autonomous Agents II |
|
|
Chair: Sun, Liang | New Mexico State University |
Co-Chair: Dillmann, Rüdiger | FZI - Forschungszentrum Informatik - Karlsruhe |
|
16:30-18:00, Paper ThCT2-CC.1 | Add to My Program |
Adaptive Pedestrian Agent Modeling for Scenario-Based Testing of Autonomous Vehicles through Behavior Retargeting |
|
Muktadir, Golam Md | University of California, Santa Cruz |
Whitehead, Jim | University of California, Santa Cruz |
Keywords: Behavior-Based Systems, Modeling and Simulating Humans, Task and Motion Planning
Abstract: This work proposes a new representation of pedestrian crossing scenarios and a hybrid modeling approach, RePed, that facilitates transferring microscopic behavior models from behavior research to higher-level trajectories. With this, real-world trajectory-based scenarios can be augmented with a diverse set of human crossing maneuvers, producing a wealth of new scenarios and addressing the scarcity of rare case data that existing works struggle to deal with. Leveraging the controllability of this modeling approach, perturbation-based augmentation can be applied to enrich scenarios further. In addition, the representation is rooted in the Ego vehicle's coordinate system with a logical representation of roads. This design enables scenario retargeting to various road structures, traffic conditions, and ego vehicle behaviors. Thus, it strongly supports scenario-based testing by forcing pedestrians to produce certain situations in simulation even when the Ego Vehicle tries to evade them.
|
|
16:30-18:00, Paper ThCT2-CC.2 | Add to My Program |
KT-BT: A Framework for Knowledge Transfer through Behavior Trees in Multi-Robot Systems |
|
Oruganti Venkata, Sanjay Sarma | Rensselaer Polytechnic Institute |
Parasuraman, Ramviyas | University of Georgia |
Pidaparti, Ramana | University of Georgia |
Keywords: Behavior-Based Systems, Multi-Robot Systems, Behavior Trees, Cooperating Robots
Abstract: Multi-Robot and Multi-Agent Systems demonstrate collective (swarm) intelligence through systematic and distributed integration of local behaviors in a group. Agents sharing knowledge about the mission and environment can enhance performance at the individual and mission levels. However, this is difficult to achieve, partly due to the lack of a generic framework for transferring part of the known knowledge (behaviors) between agents. This paper presents a new knowledge representation framework and a transfer strategy called KT-BT: Knowledge Transfer through Behavior Trees. The KT-BT framework follows a query-response-update mechanism through an online Behavior Tree framework, where agents broadcast queries for unknown conditions and respond with appropriate knowledge using a condition-action-control sub-flow. We embed a novel grammar structure called stringBT that encodes knowledge, enabling behavior sharing. We theoretically investigate the properties of the KT-BT framework in achieving homogeneity of knowledge across the entire group, compared with a heterogeneous system without the capability of sharing knowledge. We extensively verify our framework in a simulated multi-robot environment.
|
|
16:30-18:00, Paper ThCT2-CC.3 | Add to My Program |
Distributed Matching-By-Clone Hungarian-Based Algorithm for Task Allocation of Multi-Agent Systems |
|
Samiei, Arezoo | USA |
Sun, Liang | New Mexico State University |
Keywords: Autonomous Agents, Distributed Robot Systems, Multi-Robot Systems, Task Planning
Abstract: In this article, we present a novel approach, namely the distributed matching-by-clone Hungarian-based algorithm (DMCHBA), to multi-agent task-allocation problems in which the number of agents is smaller than the number of tasks. The proposed DMCHBA assumes that agents employ an implicit coordination mechanism and consists of two iterative phases, i.e., the communication phase and the assignment phase. In the communication phase, agents communicate with their connected neighbors and exchange their local knowledge base until they converge on the global knowledge base. In the assignment phase, each agent builds a square cost matrix by cloning agents, adding pseudo-tasks when necessary, and applying the Hungarian method for task allocation. A local planning algorithm is then applied to identify the order of task execution for an agent. The proposed DMCHBA is proven to produce conflict-free assignments among agents in finite time. We compare the performance of DMCHBA with the consensus-based bundle algorithm, the distributed recursive Hungarian-based algorithms, and the cluster-based Hungarian algorithm (CBHA) in Monte-Carlo simulations with different numbers of agents and tasks. The numerical results reveal the superior convergence and optimality of DMCHBA over all other selected algorithms.
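To make the matrix-squaring step concrete, here is a toy sketch (not the authors' code; SciPy's Hungarian solver stands in for the assignment step, and the costs are random placeholders): with fewer agents than tasks, each agent's row is cloned and zero-cost pseudo-tasks pad the matrix until it is square.

```python
# Square-by-cloning sketch for the assignment phase; costs are hypothetical.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n_agents, n_tasks = 2, 5
cost = rng.uniform(1, 10, size=(n_agents, n_tasks))   # assumed travel costs

clones = -(-n_tasks // n_agents)                      # ceil(n_tasks / n_agents)
square = np.repeat(cost, clones, axis=0)              # clone each agent's row
n_pseudo = square.shape[0] - n_tasks
square = np.hstack([square, np.zeros((square.shape[0], n_pseudo))])  # pseudo-tasks

rows, cols = linear_sum_assignment(square)            # Hungarian method
for r, c in zip(rows, cols):
    if c < n_tasks:                                   # skip pseudo-task slots
        print(f"agent {r // clones} -> task {c}")
```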
|
|
16:30-18:00, Paper ThCT2-CC.4 | Add to My Program |
Convolutional Vision Transformer As a Path Following Controller for Omnidirectional Robots |
|
Athni Hiremath, Sandesh | TU Kaiserslautern |
Huang, ChengYi | Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau |
Tika, Argtim | Technische Universität Kaiserslautern |
Bajcinca, Naim | TU Kaiserslautern |
Keywords: Autonomous Agents, Motion Control, Deep Learning Methods
Abstract: A novel deep neural network (DNN) based controller for omnidirectional robots is proposed. The controller decomposes the prescribed reference path, corresponding to a fixed prediction horizon, into multiple shorter paths corresponding to shorter prediction horizons. This implicitly enforces a Hankel structure on the input and consequently also on the output. Taking advantage of this, a convolutional vision transformer model is used to realize the controller, which is then trained to predict states and controls over multiple prediction horizons. Model training is performed in a self-supervised manner using a synthetic dataset. The proposed controller is shown to be more efficient than a model designed for a single prediction horizon. In comparison to a model predictive controller, the proposed approach exhibits competitive performance in path-following tasks and is 3 times faster on average for the same prediction length.
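As a small illustration of the Hankel structure mentioned above (a generic sketch, not the paper's model; the path and window length are arbitrary): decomposing one long reference path into overlapping shorter horizons stacks shifted sub-paths into a matrix whose anti-diagonals are constant.

```python
# Hankel-structured decomposition of a long horizon into short horizons.
import numpy as np

path = np.arange(10.0)      # hypothetical 1D reference path, horizon N = 10
short_horizon = 4
hankel = np.lib.stride_tricks.sliding_window_view(path, short_horizon)
print(hankel)               # each row is a shifted short-horizon sub-path
```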
|
|
16:30-18:00, Paper ThCT2-CC.5 | Add to My Program |
Can an Embodied Agent Find Your “Cat-Shaped Mug”? LLM-Based Zero-Shot Object Navigation |
|
Dorbala, Vishnu Sashank | University of Maryland, College Park |
Mullen, James | University of Maryland |
Manocha, Dinesh | University of Maryland |
Keywords: Autonomous Agents, Domestic Robotics, AI-Enabled Robotics
Abstract: We present LGX (Language-guided Exploration), a novel algorithm for Language-Driven Zero-Shot Object Goal Navigation (L-ZSON), where an embodied agent navigates to a uniquely described target object in a previously unseen environment. Our approach makes use of Large Language Models (LLMs) for this task by leveraging the LLM’s commonsense-reasoning capabilities for making sequential navigational decisions. Simultaneously, we perform generalized target object detection using a pre-trained Vision-Language grounding model. We achieve state-of-the-art zero-shot object navigation results on RoboTHOR with a success rate (SR) improvement of over 27% over the current baseline, OWL-ViT CLIP on Wheels (OWL CoW). Furthermore, we study the usage of LLMs for robot navigation and present an analysis of various prompting strategies affecting the model output. Finally, we showcase the benefits of our approach via real-world experiments that indicate the superior performance of LGX in detecting and navigating to visually unique objects.
|
|
16:30-18:00, Paper ThCT2-CC.6 | Add to My Program |
AutoExplorers: Autoencoder-Based Strategies for High-Entropy Exploration in Unknown Environments for Mobile Robots |
|
Puck, Lennart | FZI Forschungszentrum Informatik |
Schik, Maximilian | FZI Forschungszentrum Informatik |
Schnell, Tristan | FZI Forschungszentrum Informatik |
Buettner, Timothee | FZI Research Center for Information Technology |
Roennau, Arne | FZI Forschungszentrum Informatik, Karlsruhe |
Dillmann, Rüdiger | FZI - Forschungszentrum Informatik - Karlsruhe |
Keywords: Autonomous Agents, AI-Enabled Robotics, Space Robotics and Automation
Abstract: Deciding where to go next is a challenging task for humans. However, for robots in unknown environments, this becomes even more demanding. In planetary exploration, robots are continuously challenged with the task of exploring novel areas, yet so far, humans decide for the robots where to go. Even then, prioritizing the next target based on previous knowledge is complex. In our proposed work, the robot utilizes data about its surroundings from drone or satellite images. Alternatively, a volumetric representation can be reduced to form a suitable input. From the input, tiles are selected and embedded by different autoencoder variants. The robot can then select the most promising next exploration goal based on the distance in the embedding space to previously seen samples. In this work, a variational autoencoder, a Wasserstein autoencoder, and a spherical autoencoder are evaluated against each other. The latter two variants yield a high information gain when evaluated on satellite data from the Netherlands. Additionally, the framework was employed on data from an analog mission in the Tabernas desert. Through the framework, the robots gain an understanding of which goals yield the most information and can therefore quickly improve their knowledge about their surroundings.
|
|
16:30-18:00, Paper ThCT2-CC.7 | Add to My Program |
LLM-BT: Performing Robotic Adaptive Tasks Based on Large Language Models and Behavior Trees |
|
Zhou, Haotian | Wuhan University of Science and Technology |
Lin, Yunhan | Wuhan University of Science and Technology |
Yan, Longwu | Wuhan University of Science and Technology |
Zhu, Jihong | University of York |
Min, Huasong | Robotics Institute of Beihang University of China |
Keywords: Behavior-Based Systems, AI-Based Methods, Control Architectures and Programming
Abstract: Large Language Models (LLMs) have been widely utilized to perform complex robotic tasks. However, handling external disturbances during tasks is still an open challenge. This paper proposes a novel method to achieve robotic adaptive tasks based on LLMs and Behavior Trees (BTs). It utilizes ChatGPT to reason about the descriptive steps of tasks. In order to enable ChatGPT to understand the environment, semantic maps are constructed by an object recognition algorithm. Then, we design a Parser module based on Bidirectional Encoder Representations from Transformers (BERT) to parse these steps into initial BTs. Subsequently, a BTs Update algorithm is proposed to expand the initial BTs dynamically to control robots to perform adaptive tasks. Different from other LLM-based methods for complex robotic tasks, our method outputs variable BTs that can add and execute new actions according to environmental changes, which is robust to external disturbances. Our method is validated in simulation across different practical scenarios.
|
|
16:30-18:00, Paper ThCT2-CC.8 | Add to My Program |
DyHGDAT: Dynamic Hypergraph Dual Attention Network for Multi-Agent Trajectory Prediction |
|
Lin, Weilong | Fudan University |
Zeng, Xinhua | Fudan University |
Teng, Jing | North China Electric Power University |
Chengxin, Pang | Shanghai University of Electric Power |
Liu, Jing | Fudan University |
Keywords: AI-Based Methods, Autonomous Agents, Agent-Based Systems
Abstract: Modeling the interactions among agents based on their historical trajectories is key to precise multi-agent trajectory prediction. Hypergraph Convolutional Networks (HGCN) have become a proper choice for capturing high-order interactions among agents in this field. However, most existing works only consider static hypergraphs and ignore that, in a hypergraph, the power of influence varies between vertices (or hyperedges). Therefore, we propose DyHGDAT, a dynamic hypergraph dual attention network to capture the high-order interactions among agents, which not only models the evolution of the hypergraph over time but also highlights the vertices and hyperedges with larger impacts. We apply DyHGDAT to a CVAE-based prediction system for predicting plausible trajectories. To validate the effectiveness of prediction, we evaluate our proposed method on two well-established trajectory prediction datasets: the ETH/UCY datasets and the Stanford Drone Dataset (SDD). The experimental results show that with DyHGDAT, the CVAE-based prediction system outperforms state-of-the-art methods by 12.5%/5.3% in ADE/FDE on ETH/UCY, and the improvement on SDD is 6.4%/7.4%.
|
|
ThCT3-CC Oral Session, CC-313 |
Add to My Program |
Calibration and Identification II |
|
|
Chair: Zhang, Jun | Nanyang Technological University |
Co-Chair: Chen, Kuan-Wen | National Yang Ming Chiao Tung University |
|
16:30-18:00, Paper ThCT3-CC.1 | Add to My Program |
LCCRAFT: LiDAR and Camera Calibration Using Recurrent All-Pairs Field Transforms without Precise Initial Guess |
|
Lee, Yu-Chen | National Yang Ming Chiao Tung University |
Chen, Kuan-Wen | National Yang Ming Chiao Tung University |
Keywords: Calibration and Identification, Deep Learning for Visual Perception, Visual Learning
Abstract: LiDAR-camera fusion plays a pivotal role in 3D reconstruction for self-driving applications. A fundamental prerequisite for effective fusion is the precise calibration between LiDAR and camera systems. Many existing calibration methods are constrained by predefined mis-calibration ranges in the training data, essentially tying the network to a specific data distribution. However, if the range of evaluation data differs from what the network has been trained on, the resulting estimates may not meet expectations. Moreover, most methods require a precise initial guess for calibration to succeed. In this paper, we introduce LCCRAFT, an online calibration network designed for LiDAR and camera systems. Leveraging the 4D correlation volume and correlation lookup techniques inherited from RAFT, we apply them to correlate RGB images and depth maps derived from the projection of point clouds. Through weight sharing between update iterations and by enabling the update operator to learn from data with varying degrees of error, LCCRAFT demonstrates adaptability to diverse mis-calibration scenarios. This includes cases where the initial mis-calibration is even more severe than what the system encountered during training, demonstrating the robustness of the model. The calibration process executes in 93 ms on a single GPU, meeting real-time requirements. Despite the modest 9M model parameters, LCCRAFT achieves competitive performance as compared to the state-of-the-art method, which entails 69M parameters.
|
|
16:30-18:00, Paper ThCT3-CC.2 | Add to My Program |
Physics-Informed Neural Network for Model Prediction and Dynamics Parameter Identification of Collaborative Robot Joints |
|
Yang, Xingyu | Aarhus University |
Du, Yixiong | Aarhus University |
Li, Leihui | Aarhus University |
Zhou, Zhengxue | University of Liverpool |
Zhang, Xuping | Aarhus University |
Keywords: Calibration and Identification, Deep Learning Methods, Dynamics
Abstract: Collaborative robots have promising potential for widespread use in small-and-medium-sized enterprise (SME) manufacturing and production due to the development of increasingly sophisticated Human-Robot Collaboration technologies. However, predicting and identifying the behavior of collaborative robots remains a challenging problem due to the significant non-linear properties of their unique gearbox, the harmonic drive. To tackle the engineering problem, this work proposes a physics-informed neural network (PINN) to predict and identify collaborative robot joint dynamics. The procedure involves deriving the state-space dynamic model, embedding the system's dynamics into a recurrent neural network (RNN) with customized Runge-Kutta cells, obtaining labeled training data, predicting system responses, and estimating dynamic parameters. The proposed method is applied to predict and identify collaborative robot joint dynamics, and the results are verified and validated through numerical simulations and experimental testing, respectively. The obtained results demonstrate a high level of agreement with the ground truth and exhibit superior performance compared to the conventional PINN and the non-linear grey-box state-space estimation algorithm when confronted with non-linearity and dynamic coupling. Moreover, the PINN exhibits the potential for extension to various dynamic systems.
|
|
16:30-18:00, Paper ThCT3-CC.3 | Add to My Program |
Estimating Material Properties of Interacting Objects Using Sum-GP-UCB |
|
Seker, Muhammet Yunus | Carnegie Mellon University |
Kroemer, Oliver | Carnegie Mellon University |
Keywords: Calibration and Identification, Incremental Learning, Perception for Grasping and Manipulation
Abstract: Robots need to estimate the material and dynamic properties of objects from observations in order to simulate them accurately. We present a Bayesian optimization approach to identifying the material property parameters of objects based on a set of observations. Our focus is on estimating these properties based on observations of scenes with different sets of interacting objects. We propose an approach that exploits the structure of the reward function by modeling the reward for each observation separately and using only the parameters of the objects in that scene as inputs. The resulting lower-dimensional models generalize better over the parameter space, which in turn results in faster optimization. To speed up the optimization process further, and to reduce the number of simulation runs needed to find good parameter values, we also propose partial evaluations of the reward function, wherein the selected parameters are only evaluated on a subset of real-world evaluations. The approach was successfully evaluated on a set of scenes with a wide range of object interactions, and we showed that our method can effectively perform incremental learning without resetting the rewards of the gathered observations.
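A toy rendering of the factored-reward idea (our reading of the abstract, not the authors' implementation; scenes, rewards, and the exploration weight kappa are made up): each scene gets its own GP over only the parameters of the objects it contains, and candidate parameter vectors are scored by the sum of per-scene UCB values.

```python
# Sum of per-scene GP-UCB scores over a shared parameter vector (toy example).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)
n_params = 3                                  # one material parameter per object
scenes = [[0, 1], [1, 2]]                     # object indices interacting per scene

X = rng.uniform(0, 1, size=(8, n_params))     # hypothetical past evaluations
gps = []
for objs in scenes:
    y = -np.sum((X[:, objs] - 0.5) ** 2, axis=1)   # stand-in per-scene reward
    gps.append(GaussianProcessRegressor().fit(X[:, objs], y))

def sum_ucb(x, kappa=2.0):
    total = 0.0
    for gp, objs in zip(gps, scenes):
        mu, sigma = gp.predict(x[None, objs], return_std=True)
        total += mu[0] + kappa * sigma[0]     # per-scene UCB, summed
    return total

candidates = rng.uniform(0, 1, size=(100, n_params))
print(max(candidates, key=sum_ucb))           # next parameters to simulate
```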
|
|
16:30-18:00, Paper ThCT3-CC.4 | Add to My Program |
LiDAR-Camera Extrinsic Calibration with Hierarchical and Iterative Feature Matching |
|
Hu, Xuzhong | Huazhong University of Science and Technology |
Duan, ZaiPeng | Huazhong University of Science and Technology |
Ding, Junfeng | Huazhong University of Science and Technology |
Zhang, Zhe | Huazhong University of Science and Technology |
Huang, Xiao | China Ship Development and Design Center |
Ma, Jie | Huazhong University of Science and Technology |
Keywords: Calibration and Identification, Intelligent Transportation Systems, AI-Based Methods
Abstract: In autonomous driving, the LiDAR-Camera system plays a crucial role in a vehicle's perception of 3D environments. To effectively fuse information from both camera and LiDAR, extrinsic calibration is indispensable. Recently, some researchers have proposed deep learning-based methods that utilize convolutional networks to automatically extract features from LiDAR depth images and RGB images for calibration. However, these features do not sufficiently interact during feature matching, which limits the calibration accuracy. To this end, we introduce a novel extrinsic calibration network (HIFMNet) in this paper. It establishes a comprehensive connection between camera and LiDAR features by calculating a globally-aware map-to-map cost volume and hierarchical point-to-map cost volumes. The former is used to regress large extrinsic offsets. The latter is employed to iteratively fine-tune extrinsic parameters, while the rigidity of LiDAR points is considered in each iteration to enhance regression robustness. Extensive experiments on the KITTI-odometry dataset demonstrate the superior performance of our HIFMNet compared to other state-of-the-art learning-based methods.
|
|
16:30-18:00, Paper ThCT3-CC.5 | Add to My Program |
GBEC: Geometry-Based Hand-Eye Calibration |
|
Liu, Yihao | Johns Hopkins University |
Zhang, Jiaming | Johns Hopkins University |
She, Zhangcong | Johns Hopkins University |
Kheradmand, Amir | Johns Hopkins University |
Armand, Mehran | Johns Hopkins University |
Keywords: Calibration and Identification, Medical Robots and Systems, Kinematics
Abstract: Hand-eye calibration is the problem of solving the transformation from the end-effector of a robot to the sensor attached to it. Commonly employed techniques, such as AX = XB or AX = ZB formulations, rely on regression methods that require collecting pose data from different robot configurations, which can produce low accuracy and repeatability. However, the derived transformation should solely depend on the geometry of the end-effector and the sensor attachment. We propose Geometry-Based End-Effector Calibration (GBEC) that enhances the repeatability and accuracy of the derived transformation compared to traditional hand-eye calibrations. To demonstrate improvements, we apply the approach to two different robot-assisted procedures: Transcranial Magnetic Stimulation (TMS) and femoroplasty. We also discuss the generalizability of GBEC for camera-in-hand and marker-in-hand sensor mounting methods. In the experiments, we perform GBEC between the robot end-effector and an optical tracker's rigid body marker attached to the TMS coil or femoroplasty drill guide. Previous research documents low repeatability and accuracy of the conventional methods for robot-assisted TMS hand-eye calibration. Applying GBEC to repeated calibrations, we obtain transformations with standard deviations of 0.37 mm, 0.65 mm, and 0.40 mm (translation) along the x, y, and z axes of the end-effector, respectively. The tool alignment experiments after using GBEC achieve a mean accuracy of around 0.2 mm in Euclidean distance. When compared to some existing methods, the proposed method relies solely on the geometry of the flange and the pose of the rigid-body marker, making it independent of workspace constraints or robot accuracy, without sacrificing the orthogonality of the rotation matrix. Our results validate the accuracy and applicability of the approach, providing a new and generalizable methodology for obtaining the transformation from the end-effector to a sensor.
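For context, the regression-style formulation that GBEC moves away from can be sketched as follows (a generic illustration with random stand-in motions, not the paper's method): hand-eye calibration seeks the transform X minimizing the residual of A_i X = X B_i over recorded motion pairs.

```python
# Residual of the classical AX = XB hand-eye formulation (illustrative only).
import numpy as np

def residual(X, motions):
    """Sum of Frobenius-norm errors ||A @ X - X @ B|| over motion pairs."""
    return sum(np.linalg.norm(A @ X - X @ B) for A, B in motions)

rng = np.random.default_rng(0)
X_true = np.eye(4); X_true[:3, 3] = [0.1, 0.0, 0.05]   # hypothetical hand-eye offset
motions = []
for _ in range(5):
    A = np.eye(4); A[:3, 3] = rng.normal(size=3)       # end-effector motion
    B = np.linalg.inv(X_true) @ A @ X_true             # consistent sensor motion
    motions.append((A, B))

print(residual(X_true, motions))   # ~0 at the true transform
```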
|
|
16:30-18:00, Paper ThCT3-CC.6 | Add to My Program |
A Learning-Based Approach for Estimating Inertial Properties of Unknown Objects from Encoder Discrepancies |
|
Lao, Zizhou | National University of Singapore |
Han, Yuanfeng | Johns Hopkins University |
Ma, Yunshan | National University of Singapore |
Chirikjian, Gregory | National University of Singapore |
Keywords: Calibration and Identification, Representation Learning
Abstract: Many robots utilize commercial force/torque sensors to identify inertial properties of unknown objects. However, such sensors can be difficult to apply to small-sized robots due to their weight, size, and cost. In this paper, we propose a learning-based approach for estimating the mass and center of mass (COM) of unknown objects without using force/torque sensors at the end effector or on the joints. In our method, a robot arm carries an unknown object as it moves through multiple discrete configurations. Measurements are collected when the robot reaches each discrete configuration and stops. A neural network then estimates joint torques from encoder discrepancies. Given multiple samples, we derive the closed-form relation between joint torques and the object’s inertial properties. Based on the derivation, the mass and COM of the object are identified by weighted least squares. In order to improve the accuracy of the inferred inertial properties, an attention model is designed to generate the weights used in least squares, which indicate the relative importance of each joint. Our framework requires only encoder measurements without using any force/torque sensors, but still maintains accurate estimation capability. The proposed approach has been demonstrated on a 4-degree-of-freedom (DOF) robot arm.
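The weighted least-squares identification step can be sketched as below (the regressor, weights, and data are illustrative; the paper's closed-form relation and attention-generated weights are not reproduced): stacking per-configuration gravity regressors relates the estimated torques linearly to phi = [m, m*cx, m*cy, m*cz].

```python
# Weighted least squares for mass and COM from stacked torque measurements.
import numpy as np

rng = np.random.default_rng(2)
phi_true = np.array([1.2, 0.12, -0.06, 0.24])     # [m, m*cx, m*cy, m*cz]

A = rng.normal(size=(20, 4))                      # stacked regressors A(q_k), assumed
tau = A @ phi_true + 0.01 * rng.normal(size=20)   # torques estimated from encoders
w = rng.uniform(0.5, 1.0, size=20)                # per-sample weights (attention stand-in)

W = np.diag(w)
phi_hat = np.linalg.solve(A.T @ W @ A, A.T @ W @ tau)
mass, com = phi_hat[0], phi_hat[1:] / phi_hat[0]
print(mass, com)
```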
|
|
16:30-18:00, Paper ThCT3-CC.7 | Add to My Program |
CalibFormer: A Transformer-Based Automatic LiDAR-Camera Calibration Network |
|
Xiao, Yuxuan | University of Science and Technology of China |
Li, Yao | University of Science and Technology of China |
Meng, Chengzhen | University of Science and Technology of China |
Li, XingChen | University of Science and Technology of China |
Ji, Jianmin | University of Science and Technology of China |
Zhang, Yanyong | University of Science and Technology of China |
Keywords: Calibration and Identification, Sensor Fusion, Deep Learning Methods
Abstract: The fusion of LiDARs and cameras has been increasingly adopted in autonomous driving for perception tasks. The performance of such fusion-based algorithms largely depends on the accuracy of sensor calibration, which is challenging due to the difficulty of identifying common features across different data modalities. Previously, many calibration methods involved specific targets and/or manual intervention, which has proven to be cumbersome and costly. Learning-based online calibration methods have been proposed, but their performance is barely satisfactory in most cases. These methods usually suffer from issues such as sparse feature maps, unreliable cross-modality association, inaccurate calibration parameter regression, etc. In this paper, to address these issues, we propose CalibFormer, an end-to-end network for automatic LiDAR-camera calibration. We aggregate multiple layers of camera and LiDAR image features to achieve high-resolution representations. A multi-head correlation module is utilized to identify correlations between features more accurately. Lastly, we employ transformer architectures to estimate accurate calibration parameters from the correlation information. Our method achieved a mean translation error of 0.8751 cm and a mean rotation error of 0.0562° on the KITTI dataset, surpassing existing state-of-the-art methods and demonstrating strong robustness, accuracy, and generalization capabilities.
|
|
16:30-18:00, Paper ThCT3-CC.8 | Add to My Program |
Target-Free Extrinsic Calibration of Event-LiDAR Dyad Using Edge Correspondences |
|
Xing, Wanli | The University of Hong Kong |
Lin, Shijie | The University of Hong Kong |
Yang, Lei | The University of Hong Kong |
Pan, Jia | University of Hong Kong |
Keywords: Calibration and Identification, Sensor Fusion, Range Sensing
Abstract: Calibrating the extrinsic parameters of sensory devices is crucial for fusing multi-modal data. Recently, event cameras have emerged as a promising type of neuromorphic sensors, with many potential applications in fields such as mobile robotics and autonomous driving. When combined with LiDAR, they can provide more comprehensive information about the surrounding environment. Nonetheless, due to the distinctive representation of event cameras compared to traditional frame-based cameras, calibrating them with LiDAR presents a significant challenge. In this paper, we propose a novel method to calibrate the extrinsic parameters between a dyad of an event camera and a LiDAR without the need for a calibration board or other equipment. Our approach takes advantage of the fact that when an event camera is in motion, changes in reflectivity and geometric edges in the environment trigger numerous events, which can also be captured by LiDAR. Our proposed method leverages the edges extracted from events and point clouds and correlates them to estimate extrinsic parameters. Experimental results demonstrate that our proposed method is highly robust and effective in various scenes.
|
|
16:30-18:00, Paper ThCT3-CC.9 | Add to My Program |
LB-R2R-Calib: Accurate and Robust Extrinsic Calibration of Multiple Long Baseline 4D Imaging Radars for V2X |
|
Zhang, Jun | Nanyang Technological University |
Yang, Zihan | Nanyang Technological University |
Zhang, Fangwei | Nanyang Technological University |
Wu, Zhenyu | Nanyang Technological University |
Peng, Guohao | Nanyang Technological University |
Liu, Yiyao | Nanyang Technological University |
Lyu, Qiyang | Nanyang Technological University |
Wen, Mingxing | China-Singapore International Joint Research Center |
Wang, Danwei | Nanyang Technological University |
Keywords: Calibration and Identification, Sensor Networks, Intelligent Transportation Systems
Abstract: As a new sensor, 4D radar (x, y, z, velocity) has great potential for V2X, due to its 3D point cloud, direct Doppler velocity output, long-distance ranging, low cost, and, more importantly, robust perception in all weathers. However, the extrinsic calibration of multiple long-baseline 4D radars, which is the key to fusing multiple radars, is rarely researched in V2X. The main reasons are threefold: (1) New sensor. Thus, it is not surprising that little related work can be found. (2) Long baseline and large viewpoint-difference. Current works mainly focus on unmanned vehicles, which involve short baselines and small viewpoint-differences. (3) Sparse, noisy, and very cluttered 4D radar point clouds. Thus, it is challenging to rapidly and accurately locate the target and extract the feature. In this paper, LB-R2R-Calib (Long Baseline Radar to Radar extrinsic Calibration) is proposed to address these problems. The novelties are: (1) A new target is introduced: an eight-quadrant corner reflector enclosed by a foam sphere. The benefit is that the target center is a viewpoint-invariant feature; thus, it is ideal for large viewpoint-difference calibration. (2) A new feature extraction algorithm is proposed to rapidly locate the target and extract the target center from a very cluttered point cloud, based on some important characteristics of 4D radar we observed. Experiments with two 4D radars in real environments with four configurations demonstrate that our method is highly accurate and robust.
|
|
ThCT4-CC Oral Session, CC-315 |
Add to My Program |
Cooperating Cellular Robots |
|
|
Co-Chair: Wen, John | Rensselaer Polytechnic Institute |
|
16:30-18:00, Paper ThCT4-CC.1 | Add to My Program |
AirTwins: Modular Bi-Copters Capable of Splitting from Their Combined Quadcopter in Midair |
|
Li, Song | Beihang University |
Liu, Fangyuan | Beihang University |
Gao, Yuzhe | Beihang University |
Xiang, Jinwu | Beihang University |
Tu, Zhan | Beihang University |
Li, Daochun | Beihang University |
Keywords: Cellular and Modular Robots, Aerial Systems: Mechanics and Control, Mechanism Design
Abstract: Micro tandem bi-copters are capable of passing through narrow gaps owing to their particularly slender shape. However, the introduction of the tilting servo motors leads to non-minimum-phase roll dynamics, which affects their flight stability when exploring environments with unpredictable disturbances. In this paper, we propose and design a re-configurable aerial platform consisting of two modular bi-copters with an undocking mechanism. In the combined configuration, a crossover docking approach is employed to compensate for the poor stability of each bi-copter's servo-controlled attitude. In the bi-copter configuration, the minimum size of the system (equal to the width of the ideal passable gap) is reduced by 58% through mid-air separation. In detail, to compare the attitude responses of the two configurations, a dynamic model considering servo response and non-minimum-phase behavior is established and simulated, and in-flight poking experiments were also conducted on each configuration. In addition, the performance of the single bi-copter, including trajectory tracking and passing through narrow gaps, was demonstrated through flight tests. Finally, the feasibility of the undocking mechanism was verified by mid-air separation experiments. The proposed system is promising for scenarios containing both complex perturbations and confined spaces, while also having the potential to improve exploration efficiency through collaborative work.
|
|
16:30-18:00, Paper ThCT4-CC.2 | Add to My Program |
ArrayBot: Reinforcement Learning for Generalizable Distributed Manipulation through Touch |
|
Xue, Zhengrong | Tsinghua University |
Zhang, Han | Tsinghua University, Shanghai Qi Zhi Institute |
Jingwen, Cheng | Tsinghua University |
He, Zhengmao | Shanghai Qi Zhi Institute |
Ju, Yuanchen | Southwest University |
Lin, Changyi | Carnegie Mellon University |
Zhang, Gu | Shanghai Jiaotong University |
Xu, Huazhe | Tsinghua University |
Keywords: Cellular and Modular Robots, Deep Learning in Grasping and Manipulation, Force and Tactile Sensing
Abstract: We present ArrayBot, a distributed manipulation system consisting of a 16x16 array of vertically sliding pillars integrated with tactile sensors. Functionally, ArrayBot is designed to simultaneously support, perceive, and manipulate the tabletop objects. Towards generalizable distributed manipulation, we leverage reinforcement learning (RL) algorithms for the automatic discovery of control policies. In the face of the massively redundant actions, we propose to reshape the action space by considering the spatially local action patch and the low-frequency actions in the frequency domain. With this reshaped action space, we train RL agents that can relocate diverse objects through tactile observations only. Intriguingly, we find that the discovered policy can not only generalize to unseen object shapes in the simulator but also have the ability to transfer to the physical robot without any sim-to-real fine-tuning. Leveraging the deployed policy, we derive more real-world manipulation skills on ArrayBot to further illustrate the distinctive merits of our proposed system.
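A sketch of the frequency-domain action reshaping described above (not the released code; the number of retained frequencies is an assumption): the policy outputs only a small block of low-frequency DCT coefficients, which is inverse-transformed into the full 16x16 pillar command.

```python
# Low-frequency DCT action space for a 16x16 actuator array (illustrative).
import numpy as np
from scipy.fft import idctn

k = 4                                                       # assumed low frequencies kept
coeffs_low = np.random.default_rng(0).normal(size=(k, k))   # raw policy output
coeffs = np.zeros((16, 16))
coeffs[:k, :k] = coeffs_low                                 # embed into full spectrum
action = idctn(coeffs, norm="ortho")                        # smooth 16x16 pattern
print(action.shape)
```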
|
|
16:30-18:00, Paper ThCT4-CC.3 | Add to My Program |
Optimizing Modular Robot Composition: A Lexicographic Genetic Algorithm Approach |
|
Külz, Jonathan | Technical University of Munich |
Althoff, Matthias | Technische Universität München |
Keywords: Cellular and Modular Robots, Methods and Tools for Robot System Design, Mechanism Design
Abstract: Industrial robots are designed as general-purpose hardware with limited ability to adapt to changing task requirements or environments. Modular robots, on the other hand, offer flexibility and can be easily customized to suit diverse needs. The morphology, i.e., the form and structure of a robot, significantly impacts the primary performance metrics: acquisition cost, cycle time, and energy efficiency. However, identifying an optimal module composition for a specific task remains an open problem, presenting a substantial hurdle in developing task-tailored modular robots. Previous approaches either lack adequate exploration of the design space or the ability to adapt to complex tasks. We propose combining a genetic algorithm with a lexicographic evaluation of solution candidates to overcome this problem and navigate search spaces that exceed those in prior work by orders of magnitude in the number of possible compositions. We demonstrate that our approach outperforms a state-of-the-art baseline and is able to synthesize modular robots for industrial tasks in cluttered environments.
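The lexicographic evaluation can be pictured as follows (a minimal sketch with made-up objectives, not the paper's fitness function): candidates are compared objective by objective, so cycle time only breaks ties among equally cheap, feasible designs.

```python
# Lexicographic ranking of module compositions in a GA selection step.
def objectives(robot):
    # hypothetical (task_violations, acquisition_cost, cycle_time) per candidate
    return (robot["violations"], robot["cost"], robot["cycle_time"])

population = [
    {"violations": 0, "cost": 5, "cycle_time": 2.0},
    {"violations": 0, "cost": 5, "cycle_time": 1.5},   # wins the tie on cycle time
    {"violations": 1, "cost": 1, "cycle_time": 0.5},   # cheap but infeasible: ranked last
]
parents = sorted(population, key=objectives)[:2]       # tuple comparison = lexicographic
print(parents)
```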
|
|
16:30-18:00, Paper ThCT4-CC.4 | Add to My Program |
WiBot 1.0: A Modular Reconfigurable Glass Cleaning Robot for High-Rise Buildings |
|
Akalanka, Sudheera | University of Moratuwa |
Sandeepa, Harith | University of Moratuwa |
Athauda Pathirana, Manu | University of Moratuwa |
Amarasinghe, Ranjith | University of Moratuwa |
Jayasekara, A.G.B.P. | University of Moratuwa |
Hanchapola Appuhamilage, Gihan Charith Premachandra | Singapore University of Technology and Design |
Tan, U-Xuan | Singapore University of Technology and Design |
Keywords: Cellular and Modular Robots, Task and Motion Planning, Climbing Robots
Abstract: Cleaning glass surfaces is a prevailing maintenance problem in high-rise buildings. In the traditional methods of cleaning windows, hanging on ropes poses significant occupational hazards to workers. Furthermore, most glass facades feature window frames to securely fasten the glass panels to the building structure, ensuring durability and elegance. In this context, existing robotic cleaning methods are limited in their capability to move over window frames and lack the flexibility to access tight corners and curved surfaces. This paper presents a novel reconfigurable glass cleaning robot called "WiBot" to address these limitations. WiBot is a kinematic chain comprising modular linkages with a prismatic joint and two revolute joints at each end. Each revolute joint has a suction unit that enables locomotion and adhesion. Window frames are detected using image processing with an onboard camera, and design optimizations were performed to improve the robot's capabilities. The prototype WiBot 1.0 was developed, and several experiments were conducted to evaluate the feasibility of the proposed system, focusing on robot motion, window-frame detection, and the move-over mechanism. The results show that WiBot can overcome the limitations of existing window cleaning solutions. Finally, several promising research directions involving the proposed reconfigurable robot architecture in cleaning operations are outlined.
|
|
16:30-18:00, Paper ThCT4-CC.5 | Add to My Program |
Collaborative Manipulation of Deformable Objects with Predictive Obstacle Avoidance |
|
Aksoy, Burak | Rensselaer Polytechnic Institute |
Wen, John | Rensselaer Polytechnic Institute |
Keywords: Cooperating Robots, Collision Avoidance, Simulation and Animation
Abstract: The manipulation of deformable objects arises in daily life and in numerous applications. Despite phenomenal advances in industrial robotics, manipulation of deformable objects remains mostly a manual task, because of their high number of internal degrees of freedom and the complexity of predicting their motion. In this paper, we apply the computationally efficient position-based dynamics method to predict object motion and distance to obstacles. This distance is incorporated in a control barrier function for the resolved-motion kinematic control of one or more robots, which adjust their motion to avoid colliding with the obstacles. The controller has been applied in simulations to 1D and 2D deformable objects with varying numbers of assistant agents, demonstrating its versatility across different object types and multi-agent systems. Results indicate the feasibility of real-time collision avoidance through deformable object simulation, minimizing path tracking error while maintaining a predefined minimum distance from obstacles and preventing overstretching of the deformable object. The implementation is performed in ROS, allowing ready portability to different applications.
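The barrier-constrained velocity adjustment can be sketched as a small QP (a toy 2D example, not the paper's controller; the barrier gain and geometry are assumptions): the commanded velocity is minimally modified so the obstacle distance h satisfies dh/dt >= -alpha * h.

```python
# Control-barrier-function QP deflecting a nominal velocity away from an obstacle.
import numpy as np
import cvxpy as cp

p = np.array([0.0, 0.0])               # controlled point (e.g., on the object mesh)
obstacle = np.array([0.5, 0.0])
d_min = 0.2
u_des = np.array([1.0, 0.0])           # nominal velocity, heading at the obstacle

h = np.linalg.norm(p - obstacle) - d_min          # barrier value (from simulation)
grad_h = (p - obstacle) / np.linalg.norm(p - obstacle)
alpha = 2.0

u = cp.Variable(2)
prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_des)),
                  [grad_h @ u >= -alpha * h])     # CBF condition: h stays nonnegative
prob.solve()
print(u.value)                                    # velocity capped toward the obstacle
```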
|
|
16:30-18:00, Paper ThCT4-CC.6 | Add to My Program |
D-Lite: Navigation-Oriented Compression of 3D Scene Graphs for Multi-Robot Collaboration |
|
Chang, Yun | MIT |
Ballotta, Luca | Delft University of Technology |
Carlone, Luca | Massachusetts Institute of Technology |
Keywords: Cooperating Robots, Multi-Robot Systems, Motion and Path Planning
Abstract: For a multi-robot team that collaboratively explores an unknown environment, it is of vital importance that the collected information is efficiently shared among robots in order to support exploration and navigation tasks. Practical constraints of wireless channels, such as limited bandwidth, urge robots to carefully select information to be transmitted. In this paper, we consider the case where environmental information is modeled using a 3D Scene Graph, a hierarchical map representation that describes both geometric and semantic aspects of the environment. Then, we leverage graph-theoretic tools, namely graph spanners, to design greedy algorithms that efficiently compress 3D Scene Graphs with the aim of enabling communication between robots under bandwidth constraints. Our compression algorithms are navigation-oriented in that they are designed to approximately preserve shortest paths between locations of interest, while meeting a user-specified communication budget constraint. The effectiveness of the proposed algorithms is demonstrated in synthetic robot navigation experiments in a realistic simulator.
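The classical greedy spanner that such graph-theoretic tools build on can be sketched compactly (the budget-aware, navigation-oriented variants in the paper add more machinery; graph and stretch factor t here are arbitrary):

```python
# Greedy t-spanner: keep an edge only if the spanner's current detour is too long.
import networkx as nx

G = nx.gnm_random_graph(12, 40, seed=0)
for u, v in G.edges:
    G[u][v]["weight"] = 1.0 + ((u * v) % 5)       # arbitrary edge weights

t = 2.0                                           # assumed stretch factor
spanner = nx.Graph()
spanner.add_nodes_from(G)
for u, v, data in sorted(G.edges(data=True), key=lambda e: e[2]["weight"]):
    try:
        d = nx.shortest_path_length(spanner, u, v, weight="weight")
    except nx.NetworkXNoPath:
        d = float("inf")
    if d > t * data["weight"]:                    # path too long: edge is needed
        spanner.add_edge(u, v, weight=data["weight"])

print(G.number_of_edges(), "->", spanner.number_of_edges())
```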
|
|
16:30-18:00, Paper ThCT4-CC.7 | Add to My Program |
ColAG: A Collaborative Air-Ground Framework for Perception-Limited UGVs' Navigation |
|
Li, Zhehan | Zhejiang University |
Mao, Rui | Sun Yat-Sen University |
Chen, Nanhe | Zhejiang University |
Xu, Chao | Zhejiang University |
Gao, Fei | Zhejiang University |
Cao, Yanjun | Zhejiang University, Huzhou Institute of Zhejiang University |
Keywords: Cooperating Robots, Multi-Robot Systems, Planning, Scheduling and Coordination
Abstract: Perception is necessary for autonomous navigation in an unknown area crowded with obstacles. It is challenging for a robot without any sensors that can perceive the environment, i.e., a blind robot, to navigate safely, and the problem becomes even more difficult for a group of such robots. However, it could be costly to equip all robots with expensive perception or SLAM systems. In this paper, we propose a novel system named ColAG to solve the problem of autonomous navigation for a group of blind UGVs by introducing cooperation with one UAV, which is the only robot in the group with full perception capabilities. The UAV uses SLAM for its odometry and mapping while sharing this information with the UGVs via limited relative pose estimation. The UGVs plan their trajectories in the received map and predict possible failures caused by the uncertainty of their wheel odometry and unknown risky areas. The UAV dynamically schedules waypoints to prevent UGVs from collisions, formulated as a Vehicle Routing Problem with Time Windows to optimize the UAV’s trajectories and minimize the time UGVs have to wait to guarantee safety. We validate our system through extensive simulations with 7 UGVs and real-world experiments with 3 UGVs.
|
|
16:30-18:00, Paper ThCT4-CC.8 | Add to My Program |
GRF-Based Predictive Flocking Control with Dynamic Pattern Formation |
|
Yu, Chenghao | Sun Yat-Sen University |
Zhang, Dengyu | Sun Yat-Sen University |
Zhang, Qingrui | Sun Yat-Sen University |
Keywords: Cooperating Robots, Swarm Robotics, Multi-Robot Systems
Abstract: It is promising but challenging to design flocking control for a robot swarm to autonomously follow changing patterns or shapes in an optimal distributed manner. The optimal flocking control with dynamic pattern formation is, therefore, investigated in this paper. A predictive flocking control algorithm is proposed based on a Gibbs random field (GRF), where bio-inspired potential energies are used to characterize "robot-robot" and "robot-environment" interactions. Specialized performance-related energies, e.g., motion smoothness, are introduced in the proposed design to improve the flocking behaviors. The optimal control is obtained by maximizing a posterior distribution of a GRF. A region-based shape control is accomplished for pattern formation in light of a mean-shift technique. The proposed algorithm is evaluated via comparison with two state-of-the-art flocking control methods in an environment with obstacles. Both numerical simulations and real-world experiments are conducted to demonstrate the efficiency of the proposed design.
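A toy rendering of GRF-style action selection (our simplified reading, not the paper's algorithm; all potentials and weights are made up): candidate velocities are scored by a sum of potential energies, and the minimizer, i.e., the MAP action under the Gibbs posterior, is executed.

```python
# Energy-minimizing (MAP) velocity selection under a toy Gibbs random field.
import numpy as np

rng = np.random.default_rng(3)
p_self, p_neighbor, p_obst = np.zeros(2), np.array([0.4, 0.0]), np.array([0.0, 0.6])
v_prev = np.array([0.2, 0.2])
dt, d0 = 0.1, 0.5                                  # step size, desired spacing

def energy(v):
    p_next = p_self + dt * v
    e_rr = (np.linalg.norm(p_next - p_neighbor) - d0) ** 2   # robot-robot spacing
    e_ro = 1.0 / max(np.linalg.norm(p_next - p_obst), 1e-3)  # robot-environment repulsion
    e_smooth = np.linalg.norm(v - v_prev) ** 2               # motion smoothness
    return e_rr + 0.1 * e_ro + 0.5 * e_smooth

candidates = rng.uniform(-1, 1, size=(200, 2))
print(min(candidates, key=energy))                 # maximizes exp(-energy)
```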
|
|
ThCT5-CC Oral Session, CC-411 |
Add to My Program |
Visual Learning III |
|
|
Chair: Ryoo, Michael S. | Google, Stony Brook University |
Co-Chair: Chen, Yingcong | The Hong Kong University of Science and Technology (Guangzhou) |
|
16:30-18:00, Paper ThCT5-CC.1 | Add to My Program |
From Bird’s-Eye to Street View: Crafting Diverse and Condition-Aligned Images with Latent Diffusion Model |
|
Xu, Xiaojie | The Hong Kong University of Science and Technology (Guangzhou) |
Xu, Tianshuo | Hong Kong University of Science and Technology (Guangzhou) |
Ma, Fulong | The Hong Kong University of Science and Technology |
Chen, Yingcong | The Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Visual Learning, Computer Vision for Automation, Computer Vision for Transportation
Abstract: We explore Bird’s-Eye View (BEV) generation, converting a BEV map into its corresponding multi-view street images. Valued for its unified spatial representation aiding multi-sensor fusion, BEV is pivotal for various autonomous driving applications. Creating accurate street-view images from BEV maps is essential for portraying complex traffic scenarios and enhancing driving algorithms. Concurrently, diffusion-based conditional image generation models have demonstrated remarkable outcomes, adept at producing diverse, high-quality, and condition-aligned results. Nonetheless, the training of these models demands substantial data and computational resources. Hence, exploring methods to fine-tune these advanced models, like Stable Diffusion, for specific conditional generation tasks emerges as a promising avenue. In this paper, we introduce a practical framework for generating images from a BEV layout. Our approach comprises two main components: the Neural View Transformation and the Street Image Generation. The Neural View Transformation phase converts the BEV map into aligned multi-view semantic segmentation maps by learning the shape correspondence between the BEV and perspective views. Subsequently, the Street Image Generation phase utilizes these segmentations as a condition to guide a fine-tuned latent diffusion model. This fine-tuning process ensures both view and style consistency. Our model leverages the generative capacity of large pretrained diffusion models within traffic contexts, effectively yielding diverse and condition-coherent street view images.
|
|
16:30-18:00, Paper ThCT5-CC.2 | Add to My Program |
Lightning NeRF: Efficient Hybrid Scene Representation for Autonomous Driving |
|
Cao, Junyi | Shanghai Jiao Tong University |
Li, Zhichao | Tusimple.ai |
Wang, Naiyan | TuSimple |
Ma, Chao | Shanghai Jiao Tong University |
Keywords: Visual Learning, Computer Vision for Automation, Simulation and Animation
Abstract: Recent studies have highlighted the promising application of NeRF in autonomous driving contexts. However, the complexity of outdoor environments, combined with the restricted viewpoints in driving scenarios, complicates the task of precisely reconstructing scene geometry. Such challenges often lead to diminished quality in reconstructions and extended durations for both training and rendering. To tackle these challenges, we present Lightning NeRF. It uses an efficient hybrid scene representation that effectively utilizes the geometry prior from LiDAR in autonomous driving scenarios. Lightning NeRF significantly improves the novel view synthesis performance of NeRF and reduces computational overheads. Through evaluations on real-world datasets, such as KITTI-360, Argoverse2, and our private dataset, we demonstrate that our approach not only exceeds the current state-of-the-art in novel view synthesis quality but also achieves a five-fold increase in training speed and a ten-fold improvement in rendering speed. Codes are available at https://github.com/VISION-SJTU/Lightning-NeRF.
|
|
16:30-18:00, Paper ThCT5-CC.3 | Add to My Program |
Physical Priors Augmented Event-Based 3D Reconstruction |
|
Wang, Jiaxu | Hong Kong University of Science and Technology (Guangzhou) |
He, Junhao | The Hong Kong University of Science and Technology (Guangzhou) |
Zhang, Ziyi | Hong Kong University of Science and Technology (Guangzhou) |
Xu, Renjing | The Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Visual Learning, Data Sets for Robotic Vision, Representation Learning
Abstract: 3D neural implicit representations play a significant role in many robotic applications. However, reconstructing neural radiance fields (NeRF) from realistic event data remains a challenge due to the sparsity and the lack of information when only event streams are available. In this paper, we utilize motion, geometry, and density priors behind event data to impose strong physical constraints to augment NeRF training. The proposed novel pipeline can directly benefit from those priors to reconstruct 3D scenes without additional inputs. Moreover, we present a novel density-guided patch-based sampling strategy for robust and efficient learning, which not only accelerates training procedures but also improves the expression of local geometries. More importantly, we establish the first large dataset for event-based 3D reconstruction, which contains 101 objects with various materials and geometries, along with ground-truth images and depth maps for all camera viewpoints, significantly facilitating other research in the related fields. The code and dataset will be publicly available at https://github.com/Zerory1/Ev3D.
|
|
16:30-18:00, Paper ThCT5-CC.4 | Add to My Program |
SLAM Based on Camera-2D LiDAR Fusion |
|
Lu, Guoyu | University of Georgia |
Keywords: Visual Learning, Deep Learning for Visual Perception, Deep Learning Methods
Abstract: The SLAM system plays a pivotal role in robotic mapping and localization, leveraging various sensor technologies to achieve precision. Traditional passive sensors, such as RGB cameras, offer high-resolution imagery at a lower cost for SLAM applications, yet they fall short in accurately estimating 3D positions and camera orientations. On the other hand, LiDARs excel in generating accurate 3D maps but often come at a higher price and lower resolution. While active illumination sensors like LiDAR provide precise depth estimation, the prohibitive cost of high-resolution LiDAR systems restricts their widespread adoption across diverse applications. Although single-beam LiDAR is more affordable, its limited depth sensing capability hampers comprehensive environmental perception. Addressing these limitations, this study introduces a deep learning framework aimed at enhancing SLAM performance through the strategic fusion of camera and 2D LiDAR data. Our approach employs a novel self-supervised network alongside an economical single-beam LiDAR, striving to achieve or surpass the performance of more expensive LiDAR systems. The integration of single-beam LiDAR with our system allows for dynamic adjustment of scale uncertainty in depth maps generated by monocular camera systems within SLAM. Consequently, this fusion method enjoys the high-resolution and accuracy benefits of advanced LiDAR systems with the cost-effectiveness of single-beam LiDAR technology. Through this innovative combination, we demonstrate a SLAM system that not only maintains high fidelity in mapping and localization but also ensures affordability and broad applicability.
|
|
16:30-18:00, Paper ThCT5-CC.5 | Add to My Program |
NeRF-Enhanced Outpainting for Faithful Field-Of-View Extrapolation |
|
Yu, Rui | University of Louisville |
Liu, Jiachen | Pennsylvania State University |
Zhou, Zihan | Manycore Tech Inc |
Huang, Sharon X. | The Pennsylvania State University |
Keywords: Visual Learning, Deep Learning for Visual Perception, Vision-Based Navigation
Abstract: In various applications, such as robotic navigation and remote visual assistance, expanding the field of view (FOV) of the camera proves beneficial for enhancing environmental perception. Unlike image outpainting techniques aimed solely at generating aesthetically pleasing visuals, these applications demand an extended view that faithfully represents the scene. To achieve this, we formulate a new problem of faithful FOV extrapolation that utilizes a set of pre-captured images as prior knowledge of the scene. To address this problem, we present a simple yet effective solution called NeRF-Enhanced Outpainting (NEO) that uses extended-FOV images generated through NeRF to train a scene-specific image outpainting model. To assess the performance of NEO, we conduct comprehensive evaluations on three photorealistic datasets and one real-world dataset. Extensive experiments on the benchmark datasets showcase the robustness and potential of our method in addressing this challenge. We believe our work lays a strong foundation for future exploration within the research community.
|
|
16:30-18:00, Paper ThCT5-CC.6 | Add to My Program |
DL-PoseNet: A Differential Lightweight Network for Pose Regression Over SE(3) |
|
Li, Wenjie | Nanjing University |
Liu, Jia | Nanjing University |
Wang, Yanyan | Hohai University |
Ren, Dayong | Nanjing University |
Hao, Wei | Nanjing University |
Chen, Lijun | Nanjing University |
Keywords: Visual Learning, Deep Learning Methods, Localization
Abstract: Accurate pose estimation over SE(3) is fundamentally crucial for numerous perception tasks, including camera re-localization. While existing learning-based methods that estimate pose from a series of RGB images have significantly improved accuracy, the majority of models still face one or two limitations. First, few representations of SE(3) are smooth and differentiable, making them difficult to apply in deep learning frameworks. Second, they often require high computational resources due to complex deep network designs. In this paper, we propose DL-PoseNet to address these issues. Specifically, we present a novel representation of SE(3) which preserves the smoothness of the pose. We then design a lightweight neural network to regress the pose by developing a differential pose layer. Finally, we introduce a novel loss function and gradient descent method to better supervise the proposed lightweight pose network. Extensive experiments on the camera re-localization task on the Cambridge Landmarks and 7-Scenes datasets demonstrate the superior predictive accuracy and benefits of our method in comparison with the state-of-the-art.
|
|
16:30-18:00, Paper ThCT5-CC.7 | Add to My Program |
Crossway Diffusion: Improving Diffusion-Based Visuomotor Policy Via Self-Supervised Learning |
|
Li, Xiang | Stony Brook University |
Belagali, Varun | Stony Brook University |
Shang, Jinghuan | Stony Brook University |
Ryoo, Michael S. | Google, Stony Brook University |
Keywords: Visual Learning, Imitation Learning, Representation Learning
Abstract: Diffusion models have been adopted for behavioral cloning in a sequence modeling fashion, benefiting from their exceptional capabilities in modeling complex data distributions. The standard diffusion-based policy iteratively denoises action sequences from random noise conditioned on the input states and the model is typically trained with a singular diffusion loss. This paper explores the potential enhancements in such models when the denoising process is informed by a better visual representation. We study the scenario where the model is jointly optimized using the standard diffusion loss alongside an auxiliary objective based on self-supervised learning. After experimenting with various objectives, we introduce Crossway Diffusion, a simple yet effective way to enhance diffusion-based visuomotor policy learning via a state decoder and an auxiliary reconstruction objective. During training, the state decoder reconstructs raw image pixels and other states from the intermediate representations of the model. Experiments demonstrate the effectiveness of our method in various simulated and real-world tasks, confirming its consistent advantages over the standard diffusion-based policy and other baselines.
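The joint objective can be schematized as follows (shapes and modules are placeholders, not the released Crossway Diffusion code; the 0.1 weight is an assumption): the standard denoising loss is combined with an auxiliary reconstruction loss from a state decoder applied to the model's intermediate representation.

```python
# Diffusion loss plus auxiliary state-reconstruction loss (schematic stand-ins).
import torch
import torch.nn as nn

enc = nn.Linear(32, 16)            # stand-in visual encoder -> intermediate repr.
denoiser = nn.Linear(16 + 8, 8)    # stand-in noise predictor for 8-D actions
decoder = nn.Linear(16, 32)        # state decoder reconstructing the observation

obs = torch.randn(4, 32)
noisy_action, noise = torch.randn(4, 8), torch.randn(4, 8)

z = enc(obs)
noise_pred = denoiser(torch.cat([z, noisy_action], dim=-1))
loss_diff = nn.functional.mse_loss(noise_pred, noise)      # standard diffusion loss
loss_recon = nn.functional.mse_loss(decoder(z), obs)       # auxiliary SSL objective
loss = loss_diff + 0.1 * loss_recon                        # assumed weighting
loss.backward()
print(float(loss))
```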
|
|
16:30-18:00, Paper ThCT5-CC.8 | Add to My Program |
Bi-KVIL: Keypoints-Based Visual Imitation Learning of Bimanual Manipulation Tasks |
|
Gao, Jianfeng | Karlsruhe Institute of Technology (KIT) |
Jin, Xiaoshu | Karlrsuhe Institute of Technology |
Krebs, Franziska | Karlsruhe Institute of Technology (KIT) |
Jaquier, Noémie | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Visual Learning, Learning from Demonstration, Bimanual Manipulation
Abstract: Visual imitation learning has achieved impressive progress in learning unimanual manipulation tasks from a small set of visual observations, thanks to the latest advances in computer vision. However, learning bimanual coordination strategies and complex object relations from bimanual visual demonstrations, as well as generalizing them to categorical objects in novel cluttered scenes remain unsolved challenges. In this paper, we extend our previous work on keypoints-based visual imitation learning (K-VIL) to bimanual manipulation tasks. The proposed Bi-KVIL jointly extracts so-called Hybrid Master-Slave Relationships (HMSR) among objects and hands, bimanual coordination strategies, and sub-symbolic task representations. Our bimanual task representation is object-centric, embodiment-independent, and viewpoint-invariant, thus generalizing well to categorical objects in novel scenes. We evaluate our approach in various real-world applications, showcasing its ability to learn fine-grained bimanual manipulation tasks from a small number of human demonstration videos. Videos and source code are available at https://sites.google.com/view/bi-kvil.
|
|
16:30-18:00, Paper ThCT5-CC.9 | Add to My Program |
Neural Radiance Fields for Unbounded Lunar Surface Scene |
|
Zhang, Xu | Beihang University |
Cui, Linyan | Beihang Universitity |
Yin, Jihao | Beihang University |
Keywords: Visual Learning, Mapping
Abstract: Accurate understanding of lunar surface topography is vital for effective decision-making and remote control of lunar rovers during exploration missions. Conventional sensing methods often struggle to capture the intricate details of the lunar landscape. In response, we propose an innovative approach that leverages NeRF to synthesize new viewpoints within the expansive lunar environment. By blending 3D hash-grid and 2D plane-grid representations, our approach provides a comprehensive scene representation. We employ spiral sampling and feature rendering to enhance rendering quality while simultaneously reducing training time. Additionally, we leverage sparse point clouds to help the model better learn the geometric structure of the lunar environment. Through experimentation, we have demonstrated that our method is capable of synthesizing realistic images of lunar environments.
|
|
ThCT6-CC Oral Session, CC-414 |
Add to My Program |
Simulation and Animation |
|
|
Chair: Urbann, Oliver | Fraunhofer IML |
Co-Chair: Uno, Kentaro | Tohoku University |
|
16:30-18:00, Paper ThCT6-CC.1 | Add to My Program |
A Convex Formulation of Frictional Contact between Rigid and Deformable Bodies |
|
Han, Xuchen | Toyota Research Institute |
Masterjohn, Joseph | Toyota Research Institute |
Castro, Alejandro | Toyota Research Institute |
Keywords: Simulation and Animation, Contact Modeling, Modeling, Control, and Learning for Soft Robots
Abstract: We present a novel convex formulation that models rigid and deformable bodies coupled through frictional contact. The formulation incorporates a new corotational material model with positive semi-definite Hessian, which allows us to extend our previous work on the convex formulation of compliant contact to model large body deformations. We rigorously characterize our approximations and present implementation details. With proven global convergence, effective warm-start, the ability to take large time steps, and specialized sparse algebra, our method runs robustly at interactive rates. We provide validation results and performance metrics on challenging simulations relevant to robotics applications. Our method is made available in the open-source robotics toolkit Drake.
|
|
16:30-18:00, Paper ThCT6-CC.2 | Add to My Program |
SocialGAIL: Faithful Crowd Simulation for Social Robot Navigation |
|
Ling, Bo | Southeast University |
Lyu, Yan | Southeast University |
Li, Dongxiao | Southeast University |
Gao, Guanyu | Nanjing University of Science and Technology |
Shi, Yi | Southeast University |
Xu, Xueyong | North Information Control Research Academy Group Co., Ltd |
Wu, Weiwei | Southeast University |
Keywords: Imitation Learning, Motion and Path Planning
Abstract: Navigation through crowded human environments is challenging for social robots. While reinforcement learning has been adopted for its capacity to capture complex interactions, the training process often relies on simulators to replicate realistic crowd behaviors, ensuring cost-efficiency. Existing crowd simulation methods typically rely on either handcrafted rules, which may lead to overly aggressive navigation, or learning from human trajectory demonstrations, which can be challenging to generalize effectively. In this paper, we introduce a data-driven crowd simulation method called SocialGAIL, which leverages Generative Adversarial Imitation Learning (GAIL) to emulate real pedestrian navigation in crowded environments. SocialGAIL utilizes an attention-based graph neural network to encode observations and employs a generator-discriminator architecture to closely mimic pedestrian behavior. We also propose a set of metrics to evaluate the faithfulness of crowd simulation. Experimental results demonstrate that SocialGAIL outperforms baseline methods in terms of goal-reaching, intermediate state faithfulness, trajectory faithfulness, and adherence to global trajectory patterns.
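For background, the saddle-point objective that this family of methods optimizes (following Ho and Ermon, 2016) is shown below, with policy π, expert demonstrations π_E, discriminator D, and causal-entropy regularizer H; the abstract does not give SocialGAIL's exact losses, so this is generic GAIL background rather than the paper's formulation:

$$ \min_{\pi}\,\max_{D}\; \mathbb{E}_{\pi}\big[\log D(s,a)\big] + \mathbb{E}_{\pi_E}\big[\log\big(1-D(s,a)\big)\big] - \lambda H(\pi) $$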
|
|
16:30-18:00, Paper ThCT6-CC.3 | Add to My Program |
MuRoSim – a Fast and Efficient Multi-Robot Simulation for Learning-Based Navigation |
|
Jestel, Christian | Fraunhofer IML |
Rösner, Karol | Fraunhofer IML |
Dietz, Niklas | Fraunhofer IML |
Bach, Nicolas | Fraunhofer IML |
Eßer, Julian | Fraunhofer IML |
Finke, Jan | Fraunhofer IML |
Urbann, Oliver | Fraunhofer IML |
Keywords: Simulation and Animation, Reinforcement Learning, Multi-Robot Systems
Abstract: Multi-robot navigation and dynamic obstacle avoidance are challenging problems in robot learning. Recent advancements in Deep Reinforcement Learning (DRL) have demonstrated great potential in this area. Nonetheless, they often face challenges related to low sample efficiency. To overcome this challenge, some research proposes simulators that incorporate hardware acceleration. Although these simulators improve efficiency, they often lack the flexibility to generate the diverse learning scenarios needed in multi-robot settings, where different environments have varying numbers of agents. In this paper, we introduce MuRoSim, a multi-robot simulation for lidar-based navigation specifically designed for DRL applications. Due to its high level of abstraction, complete implementation in C++, and rigorous thread-pool utilization, MuRoSim achieves high computational performance. We apply MuRoSim to train navigation policies for omnidirectional mobile robots equipped with lidar sensors using DRL. Finally, we conduct extensive sim-to-real experiments to confirm the realism of the simulator, deploying the learned policy for dynamic navigation with up to six robots in numerous real-world experiments.
|
|
16:30-18:00, Paper ThCT6-CC.4 | Add to My Program |
STARK: A Unified Framework for Strongly Coupled Simulation of Rigid and Deformable Bodies with Frictional Contact |
|
Fernández-Fernández, José Antonio | RWTH Aachen University |
Lange, Ralph | Robert Bosch GmbH |
Laible, Stefan | University of Tuebingen |
Arras, Kai Oliver | University of Stuttgart |
Bender, Jan | RWTH Aachen University |
Keywords: Simulation and Animation
Abstract: The use of simulation in robotics is increasingly widespread for the purposes of testing, synthetic data generation and skill learning. A relevant aspect of simulation for a variety of robot applications is physics-based simulation of robot-object interactions. This involves the challenge of accurately modeling and implementing different mechanical systems such as rigid and deformable bodies, as well as their interactions via constraints, contact or friction. Most state-of-the-art physics engines commonly used in robotics either cannot couple deformable and rigid bodies in the same framework, lack important systems such as cloth or shells, have stability issues in complex friction-dominated setups, or cannot robustly prevent penetrations. In this paper, we propose a framework for strongly coupled simulation of rigid and deformable bodies with a focus on usability, stability, robustness and easy access to state-of-the-art deformation and frictional contact models. Our system uses the Finite Element Method (FEM) to model deformable solids, the Incremental Potential Contact (IPC) approach for frictional contact, and a robust second-order optimizer to ensure stable and penetration-free solutions to tight tolerances. It is a general-purpose framework, not tied to a particular use case such as grasping or learning; it is written in C++ and comes with a Python interface. We demonstrate our system’s ability to reproduce complex real-world experiments where a mobile vacuum robot interacts with a towel on different floor types and towel geometries. Our system is able to reproduce 100% of the qualitative outcomes observed in the laboratory environment. The simulation pipeline, named Stark (the German word for strong, as in strong coupling), is made open-source.
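For context, the IPC approach referenced above replaces hard non-penetration constraints with a smooth log-barrier energy on each contact distance d, active only below a threshold d̂; this is the canonical form from the IPC literature, and Stark's exact scaling may differ:

$$ b(d,\hat{d}) = \begin{cases} -\big(d-\hat{d}\big)^{2}\,\ln\!\big(d/\hat{d}\big), & 0 < d < \hat{d},\\ 0, & d \ge \hat{d}. \end{cases} $$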
|
|
16:30-18:00, Paper ThCT6-CC.5 | Add to My Program |
Hydrodynamic Interactions in Schooling Fish: Prioritizing Real Fish Kinematics Over Travelling-Wavy Undulation |
|
Chao, Li-Ming | Max Planck Institute of Animal Behavior |
Li, Liang | Max-Planck Institute of Animal Behavior |
Keywords: Simulation and Animation, Biomimetics, Biologically-Inspired Robots
Abstract: Hydrodynamic interactions are crucial for understanding fish movement, particularly within the realm of robotic applications. Traditionally, many studies have favoured simplified travelling-wavy undulations derived from observed real fish kinematics. This approach often neglects higher-order undulations, thereby missing the subtleties of authentic fish movements. In this study, we utilised Computational Fluid Dynamics (CFD) to investigate the implications of using real fish kinematics in hydrodynamic interactions among schooling fish. We analysed two scenarios: one driven by real fish kinematics in spatiotemporal formations, and the other by travelling-wavy undulations inferred from the same real fish kinematics. Our results highlight the advantages of using real fish body kinematics for a more accurate representation of hydrodynamics in fish swimming. In contrast, the idealised travelling-wavy undulations tend to apply excessive force, displacing real fish more than expected. Additionally, the vortices and corresponding flow fields generated by real fish kinematics were found to be more stable than those arising from simplified travelling-wavy undulations. Our study underscores the significance of integrating real fish kinematics into robotic fish design and hydrodynamic studies in schooling fish.
|
|
16:30-18:00, Paper ThCT6-CC.6 | Add to My Program |
OmniLRS: A Photorealistic Simulator for Lunar Robotics |
|
Richard, Antoine | University of Luxembourg |
Kamohara, Junnosuke | Tohoku University |
Uno, Kentaro | Tohoku University |
Santra, Shreya | Tohoku University |
van der Meer, Dave | Interdisciplinary Centre for Security, Reliability and Trust - U |
Olivares-Mendez, Miguel A. | Interdisciplinary Centre for Security, Reliability and Trust - U |
Yoshida, Kazuya | Tohoku University |
Keywords: Simulation and Animation, Space Robotics and Automation, Object Detection, Segmentation and Categorization
Abstract: Developing algorithms for extra-terrestrial robotic exploration has always been challenging. Along with the complexity associated with these environments, one of the main issues remains the evaluation of said algorithms. With the regained interest in lunar exploration, there is also a demand for quality simulators that will enable the development of lunar robots. In this paper, we propose the Omniverse Lunar Robotic-Sim (OmniLRS), a photorealistic lunar simulator based on Nvidia's robotic simulator. This simulation provides fast procedural environment generation and multi-robot capabilities, along with a synthetic data pipeline for machine-learning applications. It comes with ROS1 and ROS2 bindings to control not only the robots, but also the environments. This work also performs sim-to-real rock instance segmentation to show the effectiveness of our simulator for image-based perception. Trained on our synthetic data, a YOLOv8 model achieves performance close to a model trained on real-world data, with a 5% performance gap. When finetuned with real data, the model achieves 14% higher average precision than the model trained on real-world data, demonstrating our simulator's photorealism. The code is fully open-source, accessible here: https://github.com/AntoineRichard/LunarSim, and comes with demonstrations.
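As an illustration of the kind of sim-to-real experiment described above, fine-tuning a YOLOv8 model on synthetic data takes only a few lines with the ultralytics package; the dataset config names below are hypothetical placeholders, and the paper's actual training setup is not specified in the abstract:

```python
from ultralytics import YOLO  # pip install ultralytics

# Fine-tune a pretrained segmentation checkpoint on synthetic lunar imagery
# (hypothetical dataset config "lunar_rocks_synthetic.yaml").
model = YOLO("yolov8n-seg.pt")
model.train(data="lunar_rocks_synthetic.yaml", epochs=100, imgsz=640)

# Validate on a real-image split to estimate the sim-to-real gap.
metrics = model.val(data="lunar_rocks_real.yaml")
print(metrics.seg.map)  # mask mAP50-95, per ultralytics segmentation metrics
```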
|
|
16:30-18:00, Paper ThCT6-CC.7 | Add to My Program |
SceneControl: Diffusion for Controllable Traffic Scene Generation |
|
Lu, Jack | University of Waterloo |
Wong, Kelvin | University of Toronto |
Zhang, Chris | Waabi / University of Toronto |
Suo, Simon | Waabi |
Urtasun, Raquel | University of Toronto |
Keywords: Simulation and Animation, Deep Learning Methods, Probabilistic Inference
Abstract: We consider the task of traffic scene generation. A common approach in the self-driving industry is to use manual creation to generate scenes with specific characteristics and automatic generation to generate canonical scenes at scale. However, manual creation is not scalable, and automatic generation typically uses rule-based algorithms that lack realism. In this paper, we propose SceneControl, a framework for controllable traffic scene generation. To capture the complexity of real traffic, SceneControl learns an expressive diffusion model from data. Then, using guided sampling, we can flexibly control the sampling process to generate scenes that exhibit desired characteristics. Our experiments show that SceneControl achieves greater realism and controllability than the existing state-of-the-art. We also illustrate how SceneControl can be used as a tool for interactive traffic scene generation.
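The guided-sampling idea admits a compact sketch: at each reverse-diffusion step, the denoiser's predicted mean is nudged down the gradient of a differentiable cost encoding the desired scene characteristics (e.g., collision avoidance among placed agents). This is a generic classifier-guidance-style sketch, not SceneControl's actual implementation:

```python
import torch

@torch.no_grad()
def guided_reverse_step(denoiser, x_t, t, cost_fn, scale, sigma_t):
    # denoiser(x_t, t) is assumed to return the mean of p(x_{t-1} | x_t).
    mean = denoiser(x_t, t)
    # Guidance: differentiate a scene-level cost w.r.t. the noisy sample.
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        grad = torch.autograd.grad(cost_fn(x_in).sum(), x_in)[0]
    # Shift the mean against the cost gradient, then add sampling noise.
    mean = mean - scale * sigma_t**2 * grad
    return mean + sigma_t * torch.randn_like(x_t)
```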
|
|
16:30-18:00, Paper ThCT6-CC.8 | Add to My Program |
Jade: A Differentiable Physics Engine for Articulated Rigid Bodies with Intersection-Free Frictional Contact |
|
Yang, Gang | National University of Singapore |
Luo, Siyuan | Xi'an Jiaotong University |
Feng, Yunhai | University of California, San Diego |
Sun, Zhixin | Nanjing University |
Tie, Chenrui | Peking University |
Shao, Lin | National University of Singapore |
Keywords: Simulation and Animation, Contact Modeling
Abstract: We present Jade, a differentiable physics engine for articulated rigid bodies. Jade models contacts as a Linear Complementarity Problem (LCP). Compared to existing differentiable simulations, Jade offers features including intersection-free collision simulation and stable LCP solutions for multiple frictional contacts. We use continuous collision detection to detect the time of impact and adopt a backtracking strategy to prevent intersection between bodies with complex geometric shapes. We derive the gradient calculation to ensure the whole simulation process is differentiable under the backtracking mechanism. We modify the popular Dantzig algorithm to obtain valid solutions under multiple frictional contacts. We conduct extensive experiments to demonstrate the effectiveness of our differentiable physics simulation over a variety of contact-rich tasks. Supplemental materials and videos are available on our project webpage.
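For reference, the LCP that such engines solve at each step couples the unknown contact impulses λ with the post-step separating velocities w through a matrix A (the Delassus operator) and a bias term b:

$$ \mathbf{w} = \mathbf{A}\boldsymbol{\lambda} + \mathbf{b}, \qquad \mathbf{w} \ge \mathbf{0}, \quad \boldsymbol{\lambda} \ge \mathbf{0}, \quad \mathbf{w}^{\top}\boldsymbol{\lambda} = 0, $$

which the (modified) Dantzig pivoting algorithm mentioned in the abstract solves.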
|
|
16:30-18:00, Paper ThCT6-CC.9 | Add to My Program |
Simulation Modeling of Highly Dynamic Omnidirectional Mobile Robots Based on Real-World Data |
|
Wiedemann, Marvin | Fraunhofer Institute for Material Flow and Logistics |
Ahmed, Ossama | Fraunhofer IML |
Dieckhoefer, Anna | Fraunhofer Institute for Material Flow and Logistics |
Gasoto, Renato | Worcester Polytechnic Institute, NVIDIA |
Kerner, Sören | Fraunhofer IML |
Keywords: Simulation and Animation
Abstract: Simulation is a key technology in robotics as it enables the generation of environmental data and testing scenarios for development and maintenance purposes. However, simulations are an imperfect representation of the real world, and the so-called sim-to-real gap between simulation and reality hinders the deployment of virtually developed solutions without additional effort. Modeling complex systems like highly dynamic and holonomic mobile robots presents additional complexities in simulation. This paper addresses these challenges through a case study on creating a model for a highly dynamic logistics robot. The study breaks the modeling of the whole system down to creating appropriate colliders for the rollers of a Mecanum wheel. Additionally, the impact of significant physics parameters is presented. To bridge the sim-to-real gap, a pipeline is developed that utilizes a Motion Capture system to compare the behavior of a real robot with its simulated counterpart across various motions. By leveraging expert knowledge gained from the real-world data, the simulation model is manually tuned to replicate complex system behaviors, such as sliding effects.
|
|
ThCT7-CC Oral Session, CC-416 |
Add to My Program |
Machine Learning for Robot Control II |
|
|
Chair: Berenson, Dmitry | University of Michigan |
Co-Chair: Atanasov, Nikolay | University of California, San Diego |
|
16:30-18:00, Paper ThCT7-CC.1 | Add to My Program |
Sim-To-Real Learning for Humanoid Box Loco-Manipulation |
|
Dao, Jeremy | Oregon State University |
Duan, Helei | Oregon State University |
Fern, Alan | Oregon State University |
Keywords: Machine Learning for Robot Control, Bimanual Manipulation, Legged Robots
Abstract: In this work we propose a learning-based approach to box loco-manipulation for a humanoid robot. This is a particularly challenging problem due to the need for whole-body coordination in order to lift boxes of varying weight, position, and orientation while maintaining balance. To address this challenge, we present a sim-to-real reinforcement learning approach for training general box pickup and carrying skills for the bipedal robot Digit. Our reward functions are designed to produce the desired interactions with the box while also valuing balance and gait quality. We combine the learned skills into a full system for box loco-manipulation to achieve the task of moving boxes from one table to another with a variety of sizes, weights, and initial configurations. In addition to quantitative simulation results, we demonstrate successful sim-to-real transfer on the humanoid robot Digit. To our knowledge this is the first demonstration of a learned controller for such a task on real world hardware.
|
|
16:30-18:00, Paper ThCT7-CC.2 | Add to My Program |
Hamiltonian Dynamics Learning from Point Cloud Observations for Nonholonomic Mobile Robot Control |
|
Altawaitan, Abdullah | University of California San Diego |
Stanley, Jason | University of California, San Diego |
Ghosal, Sambaran | University of California San Diego |
Duong, Thai | University of California, San Diego |
Atanasov, Nikolay | University of California, San Diego |
Keywords: Machine Learning for Robot Control, Model Learning for Control, Wheeled Robots
Abstract: Reliable autonomous navigation requires adapting the control policy of a mobile robot in response to dynamics changes in different operational conditions. Hand-designed dynamics models may struggle to capture model variations due to a limited set of parameters. Data-driven dynamics learning approaches offer higher model capacity and better generalization but require large amounts of state-labeled data. This paper develops an approach for learning robot dynamics directly from point-cloud observations, removing the need for state estimation and its associated errors, while embedding Hamiltonian structure in the dynamics model to improve data efficiency. We design an observation-space loss that relates motion prediction from the dynamics model with motion prediction from point-cloud registration to train a Hamiltonian neural ordinary differential equation. The learned Hamiltonian model enables the design of an energy-shaping model-based tracking controller for rigid-body robots. We demonstrate dynamics learning and tracking control on a real nonholonomic wheeled robot.
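As background, embedding Hamiltonian structure means the learned model is constrained to the canonical form below, here with an input matrix g(q) for actuation; the paper's exact parameterization and point-cloud loss are not reproduced here:

$$ \dot{\mathbf{q}} = \frac{\partial H}{\partial \mathbf{p}}, \qquad \dot{\mathbf{p}} = -\frac{\partial H}{\partial \mathbf{q}} + \mathbf{g}(\mathbf{q})\,\mathbf{u}, \qquad H(\mathbf{q},\mathbf{p}) = \tfrac{1}{2}\,\mathbf{p}^{\top}\mathbf{M}(\mathbf{q})^{-1}\mathbf{p} + V(\mathbf{q}). $$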
|
|
16:30-18:00, Paper ThCT7-CC.3 | Add to My Program |
Deep Model Predictive Optimization |
|
Sacks, Jacob | University of Washington |
Rana, Rwik | University of Washington |
Huang, Kevin | University of Washington |
Spitzer, Alexander | University of Washington |
Shi, Guanya | Carnegie Mellon University |
Boots, Byron | University of Washington |
Keywords: Machine Learning for Robot Control, Reinforcement Learning, Optimization and Optimal Control
Abstract: A major challenge in robotics is to design robust policies which enable complex and agile behaviors in the real world. On one end of the spectrum, we have model-free reinforcement learning (MFRL), which is incredibly flexible and general but often results in brittle policies. In contrast, model predictive control (MPC) continually re-plans at each time step to remain robust to perturbations and model inaccuracies. However, despite its real-world successes, MPC often under-performs the optimal strategy. This is due to model quality, myopic behavior from short planning horizons, and approximations due to computational constraints. Even with a perfect model and enough compute, MPC can get stuck in bad local optima, depending heavily on the quality of the optimization algorithm. To this end, we propose Deep Model Predictive Optimization (DMPO), which learns the inner loop of an MPC optimization algorithm directly via experience, specifically tailored to the needs of the control problem. We evaluate DMPO on a real quadrotor agile trajectory tracking task, on which it improves performance over a baseline MPC algorithm for a given computational budget. It can outperform the best MPC algorithm by up to 27% with fewer samples and an end-to-end policy trained with MFRL by 19%. Moreover, because DMPO requires fewer samples, it can also achieve these benefits with 4.3X less memory. When we subject the quadrotor to turbulent wind fields with an attached drag plate, DMPO can adapt zero-shot while still outperforming all baselines. Additional results can be found at https://tinyurl.com/mr2ywmnw.
|
|
16:30-18:00, Paper ThCT7-CC.4 | Add to My Program |
Pay Attention to How You Drive: Safe and Adaptive Model-Based Reinforcement Learning for Off-Road Driving |
|
Wang, Sean J. | Carnegie Mellon University |
Zhu, Honghao | CMU |
Johnson, Aaron M. | Carnegie Mellon University |
Keywords: Reinforcement Learning, Wheeled Robots, Machine Learning for Robot Control
Abstract: Autonomous off-road driving is challenging as unsafe actions may lead to catastrophic damage. As such, developing controllers in simulation is often desirable. However, robot dynamics in unstructured off-road environments can be highly complex and difficult to simulate accurately. Domain randomization addresses this problem by randomizing simulation dynamics to train policies that are robust towards modeling errors. While these policies are robust across a range of dynamics, they are sub-optimal for any particular system dynamics. We introduce a novel model-based reinforcement learning approach that aims to balance robustness with adaptability. We train a System Identification Transformer (SIT) and an Adaptive Dynamics Model (ADM) under a variety of simulated dynamics. The SIT uses attention mechanisms to distill target system state-transition observations into a context vector, which provides an abstraction for the target dynamics. Conditioned on this, the ADM probabilistically models the system's dynamics. Online, we use a Risk-Aware Model Predictive Path Integral controller to safely control the robot under its current understanding of dynamics. We demonstrate in simulation and in the real world that this approach enables safer behaviors upon initialization and becomes less conservative (i.e. faster) as its understanding of the target system dynamics improves with more observations. In particular, our approach results in an approximately 41% improvement in lap-time over the non-adaptive baseline while remaining safe across different environments.
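The Model Predictive Path Integral (MPPI) update at the core of such controllers is an exponentially weighted average over sampled rollouts; below is a minimal NumPy sketch with illustrative shapes and temperature lam (the paper's risk-aware variant additionally reweights costs by a risk measure):

```python
import numpy as np

def mppi_update(nominal_u, noise, costs, lam):
    """nominal_u: (T, m) control plan; noise: (K, T, m) sampled perturbations;
    costs: (K,) rollout costs; lam: temperature of the softmin weighting."""
    beta = costs.min()                      # subtract min cost for stability
    w = np.exp(-(costs - beta) / lam)
    w /= w.sum()                            # normalized rollout weights
    return nominal_u + (w[:, None, None] * noise).sum(axis=0)
```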
|
|
16:30-18:00, Paper ThCT7-CC.5 | Add to My Program |
SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning |
|
Luo, Jianlan | UC Berkeley |
Hu, Zheyuan | University of California, Berkeley |
Xu, Charles | University of California, Berkeley |
Tan, You Liang | Georgia Institute of Technology |
Herman Berg, Jacob | University of Washington |
Sharma, Archit | Stanford University |
Schaal, Stefan | Google X |
Finn, Chelsea | Stanford University |
Gupta, Abhishek | University of Washington |
Levine, Sergey | UC Berkeley |
Keywords: Machine Learning for Robot Control, Reinforcement Learning, Assembly
Abstract: Recent years have seen the development of many methods for robotic reinforcement learning (RL), some of which can even operate on complex image observations, run directly in the real world, and incorporate auxiliary data, such as demonstrations and prior experience. However, despite these advances, robotic RL remains hard to use. It is acknowledged among practitioners that, separately from the fundamental technical ideas, the particular implementation details of these algorithms are often just as important (if not more so) for performance as the choice of algorithm that is actually used. We posit that a significant challenge to widespread adoption of robotic RL, as well as further development of robotic RL methods, is the comparative inaccessibility of such methods. To address this challenge, we developed a carefully implemented library containing a sample-efficient modern deep RL method, together with frameworks for computing rewards and resetting the environment, high-quality controllers for a few common robots, and a number of challenging example tasks. We provide this library as a resource for the community, describe its design choices, and present experimental results. Perhaps surprisingly, we find that our implementation can achieve very efficient learning, acquiring policies for PCB assembly, cable routing, and object relocation in less than an hour of training per policy, comparing very favorably to state-of-the-art results reported for similar tasks in the literature. We hope that these promising results and our high-quality open-source implementation will provide a tool for the robotics community to study new developments in robotic RL. Our code and videos can be found at https://serl-robot.github.io.
|
|
16:30-18:00, Paper ThCT7-CC.6 | Add to My Program |
Improving Out-Of-Distribution Generalization of Learned Dynamics by Learning Pseudometrics and Constraint Manifolds |
|
Lin, Yating | University of Michigan |
Chou, Glen | MIT |
Berenson, Dmitry | University of Michigan |
Keywords: Machine Learning for Robot Control
Abstract: We propose a method for improving the prediction accuracy of learned robot dynamics models on out-of-distribution (OOD) states. We achieve this by leveraging two key sources of structure often present in robot dynamics: 1) sparsity, i.e., some components of the state may not affect the dynamics, and 2) physical limits on the set of possible motions, in the form of nonholonomic constraints. Crucially, we do not assume this structure is known a priori, and instead learn it from data. We use contrastive learning to obtain a distance pseudometric that uncovers the sparsity pattern in the dynamics, and use it to reduce the input space when learning the dynamics. We then learn the unknown constraint manifold by approximating the normal space of possible motions from the data, which we use to train a Gaussian process (GP) representation of the constraint manifold. We evaluate our approach on a physical differential-drive robot and a simulated quadrotor, showing improved prediction accuracy on OOD data relative to baselines.
|
|
16:30-18:00, Paper ThCT7-CC.7 | Add to My Program |
Robotic Offline RL from Internet Videos Via Value-Function Learning |
|
Bhateja, Chethan | Stanford University |
Guo, Derek | UC Berkeley |
Ghosh, Dibya | UC Berkeley |
Singh, Anikait | Stanford University |
Tomar, Manan | University of Alberta |
Vuong, Quan | UC San Diego |
Chebotar, Yevgen | Google |
Levine, Sergey | UC Berkeley |
Kumar, Aviral | CMU |
Keywords: Machine Learning for Robot Control, Reinforcement Learning, Representation Learning
Abstract: Pre-training on Internet data has proven to be a key ingredient for broad generalization in many modern ML systems. What would it take to enable such capabilities in robotic reinforcement learning (RL)? Offline RL methods, which learn from datasets of robot experience, offer one way to leverage prior data into the robotic learning pipeline. However, these methods have a "type mismatch" with video data (such as Ego4D), which are the largest prior datasets available for robotics, since video offers observation-only experience without the action or reward annotations needed for RL methods. In this paper, we develop a system for leveraging large-scale human video datasets in robotic offline RL, based entirely on learning value functions via temporal-difference learning. We show that value learning on video datasets learns representations that are more conducive to downstream robotic offline RL than other approaches for learning from video data. Our system, called V-PTR, combines the benefits of pre-training on video data with robotic offline RL approaches that train on diverse robot data, resulting in value functions and policies for manipulation tasks that perform better, act robustly, and generalize broadly. On several manipulation tasks on a real WidowX robot and in simulated settings, our framework produces policies that greatly improve over other prior methods. Our video and additional details can be found at https://dibyaghosh.com/vptr/.
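The temporal-difference backbone of such value learning minimizes the squared Bellman error below, where V̄ is a slowly updated target network; since raw video lacks reward annotations, r_t is typically a surrogate such as a goal-reaching indicator (the paper's intent-conditioned variant adds further structure):

$$ \mathcal{L}(\theta) = \mathbb{E}_{(s_t,\, s_{t+1})}\Big[\big(V_{\theta}(s_t) - r_t - \gamma\,\bar{V}(s_{t+1})\big)^{2}\Big]. $$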
|
|
16:30-18:00, Paper ThCT7-CC.8 | Add to My Program |
Learning Manipulation of Steep Granular Slopes for Fast Mini Rover Turning |
|
Kerimoglu, Deniz | Georgia Institute of Technology |
Soto, Daniel | Georgia Institute of Technology |
Hemsley, Malone Lincoln | Morehouse College |
Brunner, Joseph | Georgia Institute of Technology |
Ha, Sehoon | Georgia Institute of Technology |
Zhang, Tingnan | Google |
Goldman, Daniel | Georgia Institute of Technology |
Keywords: Machine Learning for Robot Control, Space Robotics and Automation, Wheeled Robots
Abstract: Future planetary exploration missions will require reaching challenging regions such as craters and steep slopes. Such regions are ubiquitous and present science-rich targets potentially containing information regarding the planet’s internal structure. Steep slopes consisting of low-cohesion regolith are prone to flow downward under small disturbances, making it challenging for autonomous rovers to traverse. Moreover, the navigation trajectories of rovers are heavily limited by the terrain topology, and future systems will need to maneuver on flowable surfaces without getting trapped, allowing them to further expand their reach and increase mission efficiency. In this work, we used a robophysical rover model and performed maneuvering experiments on a steep granular slope of poppy seeds to explore the rover's turning capabilities. The rover is capable of lifting, sweeping, and spinning its wheels, allowing it to execute leg-like gait patterns. The high-dimensional actuation capabilities of the rover facilitate effective manipulation of the underlying granular surface. We used Bayesian Optimization (BO) to gain insight into successful turning gaits in a high-dimensional search space and found strategies such as differential wheel spinning and pivoting around a single sweeping wheel. We then used these insights to further fine-tune the turning gait, enabling the rover to turn nearly 90 degrees in just over 4 seconds with minimal downhill slip. Combining gait optimization and human-tuning approaches, we found that fast turning is empowered by creating anisotropic torques with the sweeping wheel.
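A gait search of this kind can be driven by off-the-shelf Bayesian optimization; the sketch below uses scikit-optimize with invented gait parameters and a stand-in objective, since the paper's exact parameterization is not detailed in the abstract:

```python
from skopt import gp_minimize  # pip install scikit-optimize

# Hypothetical gait parameters: sweep amplitude, wheel-spin differential, phase.
space = [(0.0, 1.0), (-1.0, 1.0), (0.0, 6.28)]

def turning_cost(params):
    amp, diff, phase = params
    # Stand-in analytic objective for illustration only; a real trial would
    # execute the gait on the rover and return -turn_angle plus a slip penalty.
    return -(amp * abs(diff)) + 0.1 * abs(phase - 3.14)

result = gp_minimize(turning_cost, space, n_calls=50, random_state=0)
print(result.x, result.fun)  # best gait parameters and best cost found
```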
|
|
16:30-18:00, Paper ThCT7-CC.9 | Add to My Program |
Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery |
|
Zhang, Xiao | Tongji University |
Zhang, Hai | Tongji University |
Zhou, Hongtu | Tongji University |
Huang, Chang | Tongji University |
Zhang, Di | TongJi University |
Ye, Chen | Tongji University |
Zhao, Junqiao | Tongji University |
Keywords: Machine Learning for Robot Control, AI-Based Methods, Robot Safety
Abstract: Safety is one of the main challenges in applying reinforcement learning to realistic environmental tasks. To ensure safety during and after the training process, existing methods tend to adopt overly conservative policies to avoid unsafe situations. However, overly conservative policies severely hinder exploration and make the algorithms substantially less rewarding. In this paper, we propose a method to construct a boundary that discriminates between safe and unsafe states. The boundary we construct is equivalent to distinguishing dead-end states, indicating the maximum extent to which safe exploration is guaranteed, and thus imposes minimal limitations on exploration. Similar to Recovery Reinforcement Learning, we utilize a decoupled RL framework to learn two policies: (1) a task policy that only considers improving the task performance, and (2) a recovery policy that maximizes safety. The recovery policy and a corresponding safety critic are pretrained on an offline dataset, in which the safety critic evaluates the upper bound of safety in each state, serving as the agent's awareness of environmental safety. During online training, a behavior correction mechanism is adopted, ensuring that the agent interacts with the environment using only safe actions. Finally, experiments on continuous control tasks demonstrate that our approach achieves better task performance with fewer safety violations than state-of-the-art algorithms.
|
|
ThCT8-CC Oral Session, CC-418 |
Add to My Program |
Data Sets for Robotic Vision II |
|
|
Chair: Nasseri, M. Ali | Technische Universitaet Muenchen |
Co-Chair: Sommersperger, Michael | Technical University of Munich |
|
16:30-18:00, Paper ThCT8-CC.1 | Add to My Program |
Exploring the Needle Tip Interaction Force with Retinal Tissue Deformation in Vitreoretinal Surgery |
|
Pannek, Simon Marc | TUM |
Dehghani, Shervin | TUM |
Sommersperger, Michael | Technical University of Munich |
Zhang, Peiyao | Johns Hopkins University |
Gehlbach, Peter | Johns Hopkins Medical Institute |
Nasseri, M. Ali | Technische Universitaet Muenchen |
Iordachita, Ioan Iulian | Johns Hopkins University |
Navab, Nassir | TU Munich |
Keywords: Data Sets for Robotic Vision, Medical Robots and Systems, Data Sets for Robot Learning
Abstract: Recent advancements in age-related macular degeneration treatments necessitate precision delivery into the subretinal space, emphasizing minimally invasive procedures targeting the retinal pigment epithelium (RPE)-Bruch’s membrane complex without causing trauma. Even for skilled surgeons, the inherent hand tremors during manual surgery can jeopardize the safety of these critical interventions. This has fostered the evolution of robotic systems designed to prevent such tremors. These robots are enhanced by fiber Bragg grating (FBG) sensors, which sense the small interaction forces between the surgical instruments and retinal tissue. To enable the community to design algorithms that take advantage of such force feedback data, this paper addresses the need for a specialized dataset integrating optical coherence tomography (OCT) imaging together with the aforementioned force data. We introduce a unique dataset, integrating force sensing data synchronized with OCT B-scan images, derived from a sophisticated setup involving robotic assistance and OCT integrated microscopes. Furthermore, we present a neural network model for image-based force estimation to demonstrate the dataset’s applicability.
|
|
16:30-18:00, Paper ThCT8-CC.2 | Add to My Program |
PanNote: An Automatic Tool for Panoramic Image Annotation of People's Positions |
|
Bacchin, Alberto | University of Padua |
Barcellona, Leonardo | University of Padova |
Shamsizadeh, Sepideh | University of Padova |
Olivastri, Emilio | University of Padua |
Pretto, Alberto | University of Padova |
Menegatti, Emanuele | The University of Padua |
Keywords: Data Sets for Robotic Vision, Omnidirectional Vision, Human Detection and Tracking
Abstract: Panoramic cameras offer a 4π steradian field of view, which is desirable for tasks like people detection and tracking since nobody can exit the field of view. Despite the recent diffusion of low-cost panoramic cameras, their usage in robotics remains constrained by the limited availability of datasets featuring annotations in the robot space, including people's 2D or 3D positions. To tackle this issue, we introduce PanNote, an automatic annotation tool for people's positions in panoramic videos. Our tool is designed to be cost-effective and straightforward to use, requiring no human intervention during the labeling process and enabling the training of machine learning models with low effort. The proposed method introduces a calibration model and a data association algorithm to fuse data from panoramic images and 2D LiDAR readings. We validate the capabilities of PanNote by collecting a real-world dataset. On these data, we compare manual labels, automatic labels, and the predictions of a baseline deep neural network. Results clearly show the advantage of using our method, with a 15-fold speed-up in labeling time and a considerable gain in performance when training deep neural models on automatically labeled data.
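The data-association step between camera and LiDAR detections can be posed as a linear assignment problem and solved with the Hungarian algorithm; this sketch assumes both detection sets are reduced to azimuth bearings in the panorama frame, which is an illustrative simplification of the tool's actual algorithm:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(cam_bearings, lidar_bearings, max_err=0.1):
    """cam_bearings, lidar_bearings: 1-D arrays of azimuths in radians."""
    diff = np.abs(cam_bearings[:, None] - lidar_bearings[None, :])
    cost = np.minimum(diff, 2 * np.pi - diff)    # wrap around the panorama
    rows, cols = linear_sum_assignment(cost)     # Hungarian matching
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_err]
```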
|
|
16:30-18:00, Paper ThCT8-CC.3 | Add to My Program |
A Multimodal Handover Failure Detection Dataset and Baselines |
|
Thoduka, Santosh | Hochschule Bonn-Rhein-Sieg |
Hochgeschwender, Nico | University of Bremen |
Gall, Juergen | University of Bonn |
Plöger, Paul G. | Hochschule Bonn Rhein Sieg |
Keywords: Data Sets for Robotic Vision, Performance Evaluation and Benchmarking, Human-Robot Collaboration
Abstract: An object handover between a robot and a human is a coordinated action which is prone to failure for reasons such as miscommunication, incorrect actions and unexpected object properties. Existing works on handover failure detection and prevention focus on preventing failures due to object slip or external disturbances. However, there is a lack of datasets and evaluation methods that consider unpreventable failures caused by the human participant. To address this deficit, we present the multimodal Handover Failure Detection dataset, which consists of failures induced by the human participant, such as ignoring the robot or not releasing the object. We also present two baseline methods for handover failure detection: (i) a video classification method using 3D CNNs and (ii) a temporal action segmentation approach which jointly classifies the human action, robot action and overall outcome of the action. The results show that video is an important modality, but using force-torque data and gripper position help improve failure detection and action segmentation accuracy.
|
|
16:30-18:00, Paper ThCT8-CC.4 | Add to My Program |
Introducing CEA-IMSOLD: An Industrial Multi-Scale Object Localization Dataset |
|
Meden, Boris | Université Paris Saclay, CEA, LIST, F-91120 Palaiseau, France |
Vega, Emanuel Pablo | CEA, LIST, F-91120 Gif-Sur-Yvette Cedex |
Mayran de Chamisso, Fabrice | CEA, LIST, F-91120 Gif-Sur-Yvette Cedex |
Bourgeois, Steve | CEA LIST |
Keywords: Data Sets for Robotic Vision, RGB-D Perception, Object Detection, Segmentation and Categorization
Abstract: We introduce the CEA Industrial Multi-Scale Object Localization Dataset (CEA-IMSOLD), a new BOP format dataset for 6-DoF object localization, crucial for robotics. This dataset aims to evaluate the current localization methods with respect to a new difficulty: large variations in observation distance and, consequently, large variations in image appearance. Compared to the other publicly available datasets, our dataset provides both images with objects small and completely visible in the image, and images where objects are observed close enough so they appear larger than the field of view of the camera. We also propose to consider the observation distance in the evaluation process and introduce new metrics to do so. Finally, our dataset contains a large variety of industrial objects, from small and simple objects such as bolts to sizable and complex ones such as large car parts. We provide baseline results and the dataset is made publicly available to support the community at https://cea-list.github.io/CEA-IMSOLD/.
|
|
16:30-18:00, Paper ThCT8-CC.5 | Add to My Program |
PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark for Semantic Scene Completion |
|
Yan, Yuxiang | Fudan University |
Liu, Boda | Fudan University |
Ai, Jianfei | Moo Auto Intelligence and Telematics Information Technology Comp |
Li, Qinbu | MOGO |
Wan, Ru | Mogo Ai |
Pu, Jian | Fudan University |
Keywords: Data Sets for Robotic Vision, Semantic Scene Understanding, Computer Vision for Transportation
Abstract: Semantic Scene Completion (SSC) aims to jointly generate space occupancies and semantic labels for complex 3D scenes. Most existing SSC models focus on volumetric representations, which are memory-inefficient for large outdoor spaces. Point clouds provide a lightweight alternative, but existing benchmarks lack outdoor point cloud scenes with semantic labels. To address this, we introduce PointSSC, the first cooperative vehicle-infrastructure point cloud benchmark for semantic scene completion. These scenes offer long perception range and minimal occlusion. We develop an automated annotation pipeline leveraging Semantic Segment Anything to efficiently assign semantics. To benchmark progress, we propose a LiDAR-based model with a Spatial-Aware Transformer for global and local feature extraction and a Completion and Segmentation Cooperative Module for joint completion and segmentation. PointSSC provides a challenging testbed to drive advances in semantic point cloud completion for real-world navigation. The code and datasets are available at https://github.com/yyxssm/PointSSC.
|
|
16:30-18:00, Paper ThCT8-CC.6 | Add to My Program |
Close the Sim2real Gap Via Physically-Based Structured Light Synthetic Data Simulation |
|
Bai, Kaixin | University of Hamburg |
Zhang, Lei | University of Hamburg |
Chen, Zhaopeng | University of Hamburg |
Wan, Fang | Southern University of Science and Technology |
Zhang, Jianwei | University of Hamburg |
Keywords: Data Sets for Robotic Vision, Transfer Learning, Deep Learning for Visual Perception
Abstract: Despite the substantial progress in deep learning, its adoption in industrial robotics projects remains limited, primarily due to challenges in data acquisition and labeling. Previous sim2real approaches using domain randomization require extensive scene and model optimization. To address these issues, we introduce an innovative physically-based structured light simulation system, generating both RGB and physically realistic depth images, surpassing previous dataset generation tools. We create an RGBD dataset tailored for robotic industrial grasping scenarios and evaluate it across various tasks, including object detection, instance segmentation, and embedding sim2real visual perception in industrial robotic grasping. By reducing the sim2real gap and enhancing deep learning training, we facilitate the application of deep learning models in industrial settings. Project details are available at https://baikaixin.github.io/structured_light_3D_synthesizer/.
|
|
16:30-18:00, Paper ThCT8-CC.7 | Add to My Program |
Interacting Objects: A Dataset of Object-Object Interactions for Richer Dynamic Scene Representations |
|
Unmesh, Asim | Purdue University |
Jain, Rahul | Purdue University |
Shi, Jingyu | Purdue University |
Manam, V. K. Chaithanya | Purdue University |
Chi, Hyung-gun | Purdue University |
Chidambaram, Subramanian | Purdue University |
Quinn, Alexander | Purdue University |
Ramani, Karthik | Purdue University |
Keywords: Data Sets for Robotic Vision, Visual Learning
Abstract: Dynamic environments in factories, surgical robotics, and warehouses increasingly involve humans, machines, robots, and various other objects such as tools, fixtures, conveyors, and assemblies. In these environments, numerous interactions occur not just between humans and objects but also between objects themselves. However, current scene-graph datasets predominantly focus on human-object interactions (HOI) and overlook object-object interactions (OOIs) despite the necessity of OOIs in effectively representing dynamic environments. This oversight creates a significant gap in the coverage of interactive elements in dynamic scenes. We address this gap by proposing, to the best of our knowledge, the first dataset annotating for OOIs in dynamic scenes. To model OOIs, we establish a classification taxonomy for spatio-temporal interactions. We use our taxonomy to annotate OOIs in video clips of dynamic scenes. Then, we introduce a spatio-temporal OOI classification task which aims at identifying interaction categories between two given objects in a video clip. Further, we benchmark our dataset for the spatio-temporal OOI classification task by adopting state-of-the-art approaches from the related areas of Human-Object Interaction Classification, Visual Relationship Classification, and Scene-Graph Generation. Additionally, we utilize our dataset to examine the effectiveness of OOI and HOI-based features in the context of Action Recognition. Notably, our experimental results show th
|
|
16:30-18:00, Paper ThCT8-CC.8 | Add to My Program |
RTS-GT: Robotic Total Stations Ground Truthing Dataset |
|
Vaidis, Maxime | Université Laval |
Hassanzadeh Shahraji, Mohsen | Université Laval |
Daum, Effie | Université Laval |
Dubois, William | Université Laval |
Giguère, Philippe | Université Laval |
Pomerleau, Francois | Université Laval |
Keywords: Data Sets for SLAM, Localization, Field Robots
Abstract: Numerous datasets and benchmarks exist to assess and compare Simultaneous Localization and Mapping (SLAM) algorithms. Nevertheless, their precision must keep pace with the rate at which SLAM algorithms have improved in recent years. Moreover, current datasets fall short of a comprehensive data-collection protocol for reproducibility and for evaluating the precision or accuracy of the recorded trajectories. With this objective in mind, we propose the Robotic Total Stations Ground Truthing (RTS-GT) dataset to support localization research with the generation of six-degrees-of-freedom (DOF) ground truth trajectories. This novel dataset includes six-DOF ground truth trajectories generated using a system of three Robotic Total Stations (RTSs) tracking moving robotic platforms. Furthermore, we compare the performance of the RTS-based system to a Global Navigation Satellite System (GNSS)-based setup. The dataset comprises around sixty experiments conducted in various conditions over a period of 17 months, and encompasses over 49 kilometers of trajectories, making it the most extensive dataset of RTS-based measurements to date. Additionally, we provide the precision of all poses for each experiment, a feature not found in the current state-of-the-art datasets. Our results demonstrate that RTSs provide measurements that are 22 times more stable than GNSS in various environmental settings, making them a valuable resource for SLAM benchmark development.
|
|
16:30-18:00, Paper ThCT8-CC.9 | Add to My Program |
RaSim: A Range-Aware High-Fidelity RGB-D Data Simulation Pipeline for Real-World Applications |
|
Liu, Xingyu | Tsinghua University |
Zhang, Chenyangguang | Tsinghua University |
Wang, Gu | Tsinghua University |
Zhang, Ruida | Tsinghua University |
Ji, Xiangyang | Tsinghua University |
Keywords: Deep Learning for Visual Perception, Data Sets for Robotic Vision, RGB-D Perception
Abstract: In robotic vision, a de-facto paradigm is to learn in simulated environments and then transfer to real-world applications, which poses an essential challenge in bridging the sim-to-real domain gap. While mainstream works tackle this problem in the RGB domain, we focus on depth data synthesis and develop a Range-aware RGB-D data Simulation pipeline (RaSim). In particular, high-fidelity depth data is generated by imitating the imaging principle of real-world sensors. A range-aware rendering strategy is further introduced to enrich data diversity. Extensive experiments show that models trained with RaSim can be directly applied to real-world scenarios without any finetuning and excel at downstream RGB-D perception tasks. Data and code are available at https://github.com/shanice-l/RaSim.
|
|
ThCT9-CC Oral Session, CC-419 |
Add to My Program |
Task and Motion Planning II |
|
|
Chair: Kessens, Chad C. | United States Army Research Laboratory |
Co-Chair: Domae, Yukiyasu | The National Institute of Advanced Industrial Science and Technology (AIST) |
|
16:30-18:00, Paper ThCT9-CC.1 | Add to My Program |
When Prolog Meets Generative Models: A New Approach for Managing Knowledge and Planning in Robotic Applications |
|
Saccon, Enrico | University of Trento |
Tikna, Ahmet | University of Trento |
De Martini, Davide | Università Degli Studi Di Trento |
Lamon, Edoardo | University of Trento |
Palopoli, Luigi | University of Trento |
Roveri, Marco | University of Trento |
Keywords: Task and Motion Planning, Multi-Robot Systems, Human-Robot Collaboration
Abstract: In this paper, we propose a robot-oriented knowledge representation system based on the use of the Prolog language. Our framework hinges on a special organisation of the knowledge base that enables: 1) its efficient population from natural language texts using semi-automated procedures based on Large Language Models (LLMs); 2) the seamless generation of temporal parallel plans for multi-robot systems through a sequence of transformations; 3) the automated translation of the plan into an executable formalism. The framework is supported by a set of open-source tools and its functionality is shown with a realistic application.
|
|
16:30-18:00, Paper ThCT9-CC.2 | Add to My Program |
HAPFI: History-Aware Planning Based on Fused Information |
|
Jeon, Sujin | Seoul National University |
Shin, Suyeon | Seoul National University |
Zhang, Byoung-Tak | Seoul National University |
Keywords: Deep Learning Methods, AI-Based Methods, Task Planning
Abstract: Embodied Instruction Following (EIF) is the task of planning a long sequence of sub-goals given high-level natural language instructions, such as "Rinse a slice of lettuce and place on the white table next to the fork". To successfully execute these long-horizon tasks, we argue that an agent must consider its past, i.e., historical data, when making decisions at each step. Nevertheless, recent approaches in EIF often neglect the knowledge from historical data and also do not effectively utilize information across the modalities. To this end, we propose History-Aware Planning based on Fused Information (HAPFI), effectively leveraging the historical data from diverse modalities that agents collect while interacting with the environment. Specifically, HAPFI integrates multiple modalities, including historical RGB observations, bounding boxes, sub-goals, and high-level instructions, by effectively fusing modalities via our Mutually Attentive Fusion method. Through experiments with diverse comparisons, we show that an agent utilizing historical multi-modal information surpasses all the compared methods that neglect the historical data in terms of action planning capability, enabling the generation of well-informed action plans for the next step. Moreover, we provide qualitative evidence highlighting the significance of leveraging historical multi-modal data, particularly in scenarios where the agent encounters intermediate failures, showcasing its robust re-planning capabilities.
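The abstract does not detail the Mutually Attentive Fusion method; a generic two-way cross-attention block of the kind such multimodal planners build on looks like this in PyTorch, where the dimensions and pooling are assumptions for illustration:

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Two-way cross-attention between token sequences of two modalities,
    e.g. visual history and instruction tokens; illustrative only."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.a2b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.b2a = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, a, b):             # a: (B, Na, dim), b: (B, Nb, dim)
        a_fused, _ = self.a2b(a, b, b)   # tokens of A attend over B
        b_fused, _ = self.b2a(b, a, a)   # tokens of B attend over A
        return torch.cat([a_fused.mean(1), b_fused.mean(1)], dim=-1)
```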
|
|
16:30-18:00, Paper ThCT9-CC.3 | Add to My Program |
Non-Axiomatic Reasoning for an Autonomous Mobile Robot |
|
Hammer, Patrick | KTH Royal Institute of Technology |
Isaev, Peter | Temple University |
Feng, Lei | KTH Royal Institute of Technology |
Johansson, Robert | Stockholm University |
Tumova, Jana | KTH Royal Institute of Technology |
Keywords: Sensorimotor Learning, Learning from Experience, AI-Based Methods
Abstract: We present the integration of a Non-Axiomatic Reasoning System (NARS) with mobile robots for planning and decision making. NARS enables robots to effectively handle uncertainty in real time with complete sensor and actuator integration, thereby ensuring adaptability to evolving scenarios. We discuss essential parts of the logic, the architecture and working principles of NARS, and the integration of NARS as a ROS node. A case study demonstrates the system's proficiency in carrying out a garbage collection task in an open-air environment by operating a mobile robot with a manipulator arm, and we demonstrate its ability to learn about the place-dependent accumulation of garbage items. The case study also reveals that our approach performs more effectively on the overall task than the Belief-Desire-Intention model we compared it with.
|
|
16:30-18:00, Paper ThCT9-CC.4 | Add to My Program |
Asynchronous Task Plan Refinement for Multi-Robot Task and Motion Planning |
|
Sung, Yoonchang | The University of Texas at Austin |
Shome, Rahul | The Australian National University |
Stone, Peter | University of Texas at Austin |
Keywords: Task and Motion Planning, Motion and Path Planning
Abstract: This paper explores general multi-robot task and motion planning, where multiple robots in close proximity manipulate objects while satisfying constraints and a given goal. In particular, we formulate the plan refinement problem—which, given a task plan, finds valid assignments of variables corresponding to solution trajectories—as a hybrid constraint satisfaction problem. The proposed algorithm follows several design principles that yield the following features: (1) efficient solution finding due to sequential heuristics and implicit time and roadmap representations, and (2) maximized feasible solution space obtained by introducing minimally necessary coordination-induced constraints and not relying on prevalent simplifications that exist in the literature. The evaluation results demonstrate the planning efficiency of the proposed algorithm, outperforming the synchronous approach in terms of makespan.
|
|
16:30-18:00, Paper ThCT9-CC.5 | Add to My Program |
Optimal Planning for Timed Partial Order Specifications |
|
Watanabe, Kandai | University of Colorado Boulder |
Fainekos, Georgios | Toyota NA-R&D |
Hoxha, Bardh | Southern Illinois University |
Lahijanian, Morteza | University of Colorado Boulder |
Okamoto, Hideki | Toyota Motor North America |
Sankaranarayanan, Sriram | University of Colorado, Boulder |
Keywords: Task and Motion Planning, Formal Methods in Robotics and Automation
Abstract: This paper addresses the challenge of planning a sequence of tasks to be performed by multiple robots while minimizing the overall completion time subject to timing and precedence constraints. Our approach uses the Timed Partial Orders (TPO) model to specify these constraints. We translate this problem into a Traveling Salesman Problem (TSP) variant with timing and precedence constraints, and we solve it as a Mixed Integer Linear Programming (MILP) problem. Our contributions include a general planning framework for TPO specifications, a MILP formulation accommodating time windows and precedence constraints, its extension to multi-robot scenarios, and a method to quantify plan robustness. We demonstrate our framework on several case studies, including an aircraft turnaround task involving three Jackal robots, highlighting the approach's potential applicability to important real-world problems. Our benchmark results show that our MILP method outperforms the state-of-the-art open-source TSP solver OR-Tools.
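To make the MILP encoding concrete, the toy model below schedules two tasks with one precedence edge and one deadline while minimizing makespan, using OR-Tools' linear solver wrapper; the durations and bounds are invented for illustration:

```python
from ortools.linear_solver import pywraplp

solver = pywraplp.Solver.CreateSolver("SCIP")
t1 = solver.NumVar(0, 1e4, "t1")        # start time of task 1
t2 = solver.NumVar(0, 1e4, "t2")        # start time of task 2
mk = solver.NumVar(0, 1e4, "makespan")
d1, d2 = 5.0, 3.0                        # known task durations
solver.Add(t2 >= t1 + d1)                # precedence: task 1 before task 2
solver.Add(t2 + d2 <= 20.0)              # time-window deadline on task 2
solver.Add(mk >= t2 + d2)
solver.Minimize(mk)
assert solver.Solve() == pywraplp.Solver.OPTIMAL
print(t1.solution_value(), t2.solution_value(), mk.solution_value())
```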
|
|
16:30-18:00, Paper ThCT9-CC.6 | Add to My Program |
On the Convergence of a Closed-Loop Inverse Kinematics Solver with Time-Varying Task Functions |
|
Fiore, Mario Daniele | Università Degli Studi Della Campania "Luigi Vanvitelli" |
Natale, Ciro | Università Degli Studi Della Campania "Luigi Vanvitelli" |
Keywords: Constrained Motion Planning, Motion Control, Formal Methods in Robotics and Automation
Abstract: Many control algorithms devised to allow redundant robots to execute complex multiple tasks with priorities require a numerical inverse kinematics (IK) solver. The present letter investigates the conditions that, if satisfied, guarantee that a specific module of closed-loop numerical IK solvers, which lies at the kernel of some of the aforementioned algorithms, converges to a feasible solution. The aim is to prove convergence in cases where the task function is time-varying. The conditions found to ensure convergence involve not only the initial task error and the loop gain - as happens for stationary task functions - but also the maximum sampling time to be used in the computation of the solution.
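The solver module in question is the classic closed-loop IK iteration, shown here in continuous time and in its sampled form with period T, whose upper bound is part of the convergence conditions discussed above (J† is the Jacobian pseudoinverse, K the gain matrix):

$$ \dot{\mathbf{q}} = \mathbf{J}^{\dagger}(\mathbf{q})\big(\dot{\mathbf{x}}_{d} + \mathbf{K}\mathbf{e}\big), \qquad \mathbf{e} = \mathbf{x}_{d}(t) - \mathbf{f}(\mathbf{q}), \qquad \mathbf{q}_{k+1} = \mathbf{q}_{k} + T\,\dot{\mathbf{q}}_{k}. $$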
|
|
16:30-18:00, Paper ThCT9-CC.7 | Add to My Program |
PROTAMP-RRT: A Probabilistic Integrated Task and Motion Planner Based on RRT |
|
Saccuti, Alessio | University of Parma |
Monica, Riccardo | University of Parma |
Aleotti, Jacopo | University of Parma |
Keywords: Task and Motion Planning, Manipulation Planning, Motion and Path Planning
Abstract: Solving complex robot manipulation tasks requires a Task and Motion Planner (TAMP) that searches for a sequence of symbolic actions, i.e. a task plan, and also computes collision-free motion paths. As the task planner and the motion planner are closely interconnected TAMP is considered a challenging problem. In this paper, a Probabilistic Integrated Task and Motion Planner (PROTAMP-RRT) is presented. The proposed method is based on a unified Rapidly-exploring Random Tree (RRT) that operates on both the geometric space and the symbolic space. The RRT is guided by the task plan and it is enhanced with a probabilistic model that estimates the probability of sampling a new robot configuration towards the next sub-goal of the task plan. When the RRT is extended, the probabilistic model is updated alongside. The probabilistic model is used to generate a new task plan if the feasibility of the previous one is unlikely. The performance of PROTAMP-RRT was assessed in simulated pick-and-place tasks, and it was compared against state-of-the-art approaches TM-RRT and Planet, showing better performance.
|
|
ThCT10-CC Oral Session, CC-501 |
Add to My Program |
Contact Modeling |
|
|
Chair: Oh, Sehoon | DGIST |
|
16:30-18:00, Paper ThCT10-CC.1 | Add to My Program |
Intrinsic Contact Sensing and Object Perception of an Adaptive Fin-Ray Gripper Integrating Compact Deflection Sensors |
|
Chen, Genliang | Shanghai Jiao Tong University |
Tang, Shujie | Shanghai Jiao Tong University |
Xu, Shaoqiu | Shanghai Jiao Tong University |
Guan, Tong | Shanghai Jiao Tong University |
Xun, Yuanhao | Shanghai Jiao Tong University |
Zhang, Zhuang | Westlake University |
Wang, Hao | Shanghai Jiao Tong University |
Lin, Zhongqin | SJTU |
Keywords: Contact Modeling, Force and Tactile Sensing, Perception for Grasping and Manipulation, Adaptive Fin-ray Gripper
Abstract: Owing to their tremendous adaptability to free-form objects, soft grippers with fin-ray structures have a wide range of applications. However, kinetostatic analysis and contact sensing for such grippers are still challenging due to large structural deformations. In this paper, a model-based method for intrinsic contact sensing, object perception, and interactive manipulation is proposed for these adaptive grippers. The contributions arise from the integration of compact deflection sensors, which are specifically fabricated for the large deformations of flexible beams. Using a discretization-based approach, the contact condition can be identified via the local deformations from the deflection sensors. Prototypes are developed using simple materials and manufacturing methods, on which various validation experiments are conducted. Based on contact sensing, the developed adaptive gripper can perceive the boundary geometry and structural compliance of unstructured objects. Moreover, sensor-based feedback control can be accomplished to perform interactive manipulation, in which the contact force between the finger and object can be regulated precisely (5% RMS error) in real time.
|
|
16:30-18:00, Paper ThCT10-CC.2 | Add to My Program |
Incipient Slip-Based Rotation Measurement Via Visuotactile Sensing During In-Hand Object Pivoting |
|
Li, Mingxuan | Tsinghua University |
Zhou, Yen Hang | Tsinghua University |
Li, Tiemin | Tsinghua University |
Jiang, Yao | Tsinghua University |
Keywords: Contact Modeling, Perception for Grasping and Manipulation, Force and Tactile Sensing
Abstract: In typical in-hand manipulation tasks represented by object pivoting, the real-time perception of rotational slippage has been proven beneficial for improving the dexterity and stability of robotic hands. An effective strategy is to obtain the contact properties for measuring the rotation angle through visuotactile sensing. However, existing methods for rotation estimation do not consider the impact of incipient slip during the pivoting process, which introduces measurement errors and makes it hard to determine the boundary between stable contact and macro slip. This paper describes a generalized 2-D contact model under pivoting and proposes a rotation measurement method based on the line features in the stick region. The proposed method was applied to the Tac3D vision-based tactile sensors using continuous marker patterns. Experiments show that the rotation measurement system achieves an average static measurement error of 0.17°±0.15° and an average dynamic measurement error of 1.34°±0.48°. Moreover, the proposed method requires no training data and can achieve real-time sensing during in-hand object pivoting.
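For orientation, a standard least-squares estimate of in-plane rotation from tracked markers looks as follows; it is a generic stand-in restricted to stick-region markers, not the paper's line-feature method, and the names are ours.

import numpy as np

def rotation_from_markers(p0, p1):
    # p0, p1: (N, 2) marker positions before/after motion, stick region only.
    c0, c1 = p0.mean(axis=0), p1.mean(axis=0)
    H = (p0 - c0).T @ (p1 - c1)               # 2x2 cross-covariance of centered points
    theta = np.arctan2(H[0, 1] - H[1, 0], H[0, 0] + H[1, 1])
    return np.degrees(theta)                  # least-squares rotation angle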
|
|
16:30-18:00, Paper ThCT10-CC.3 | Add to My Program |
Leveraging Compliant Tactile Perception for Haptic Blind Surface Reconstruction |
|
Emile Ramos Cheret, Laurent Yves | Lakehead University |
Prado da Fonseca, Vinicius | Memorial University of Newfoundland |
Alves de Oliveira, Thiago Eustaquio | Lakehead University |
Keywords: Contact Modeling, Soft Sensors and Actuators, Haptics and Haptic Interfaces
Abstract: Non-flat surfaces pose difficulties for robots operating in unstructured environments. Reconstructions of uneven surfaces may only be partially possible due to non-compliant end-effectors and limitations of vision systems, such as transparency, reflections, and occlusions. This study achieves blind surface reconstruction by harnessing the robotic manipulator's kinematic data and a compliant tactile sensing module, which incorporates inertial, magnetic, and pressure sensors. The module's flexibility enables us to estimate contact positions and surface normals by analyzing its deformation during interactions with unknown objects. While previous works collect only positional information, we include the local normals in a geometrical approach to estimate curvatures between adjacent contact points. These parameters then guide a spline-based patch generation, which allows us to recreate larger surfaces without an increase in complexity while reducing the time-consuming step of probing the surface. Experimental validation demonstrates that this approach outperforms an off-the-shelf vision system in estimation accuracy. Moreover, this compliant haptic method works effectively even when the manipulator's approach angle is not aligned with the surface normals, which is ideal for unknown non-flat surfaces.
|
|
16:30-18:00, Paper ThCT10-CC.4 | Add to My Program |
Differentiable Compliant Contact Primitives for Estimation and Model Predictive Control |
|
Haninger, Kevin | Fraunhofer IPK |
Samuel, Kangwagye | DGIST |
Rozzi, Filippo | Politecnico Di Milano |
Oh, Sehoon | DGIST |
Roveda, Loris | SUPSI-IDSIA |
Keywords: Contact Modeling, Compliance and Impedance Control, Model Learning for Control
Abstract: Control techniques like MPC can realize contact-rich manipulation that exploits dynamic information while maintaining friction limits and safety constraints. However, the contact geometry and dynamics must be known. This information is often extracted from CAD, limiting scalability and the ability to handle tasks with varying geometry. To reduce the need for a priori models, we propose a framework for estimating contact models online based on torque and position measurements. To do this, compliant contact models are used, connected in parallel to model multi-point contact and constraints such as a hinge. They are parameterized to be differentiable with respect to all of their parameters (rest position, stiffness, contact location), allowing the coupled robot/environment dynamics to be linearized or used efficiently in gradient-based optimization. These models are then applied to: offline gradient-based parameter fitting, online estimation via an extended Kalman filter, and online gradient-based MPC. The proposed approach is validated on two robots, showing the efficacy of sensorless contact estimation and the effects of online estimation on MPC performance.
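As a minimal illustration of such a primitive (our simplified, scalar notation; the paper's primitives also parameterize contact location and are embedded in the full robot/environment dynamics):

import numpy as np

def primitive_force(x, rest, stiffness):
    # One compliant contact primitive: a linear spring active only in
    # compression. The force is an explicit function of (rest, stiffness),
    # so gradients for fitting, EKF updates, and MPC linearization are
    # direct; the kink at zero penetration can be softened (e.g., with a
    # softplus) when strict smoothness is required.
    return stiffness * np.maximum(rest - x, 0.0)

def parallel_contact_force(x, primitives):
    # Primitives connected in parallel simply sum their forces, modeling
    # multi-point contact or hinge-like constraints, per the abstract.
    return sum(primitive_force(x, r, k) for r, k in primitives)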
|
|
16:30-18:00, Paper ThCT10-CC.5 | Add to My Program |
TacShade: A New 3D-Printed Soft Optical Tactile Sensor Based on Light, Shadow and Grey Scale for Shape Reconstruction |
|
Lu, Zhenyu | Bristol Robotics Laboratory |
Yang, Jialong | South China University of Technology; Peng Cheng Laboratory |
Li, Haoran | University of Bristol |
Li, Yifan | University of Bristol |
Si, Weiyong | University of Essex |
Lepora, Nathan | University of Bristol |
Yang, Chenguang | University of Liverpool |
Keywords: Contact Modeling, Force and Tactile Sensing, Soft Sensors and Actuators
Abstract: In this paper, we present the TacShade, a newly designed 3D-printed soft optical tactile sensor. The sensor is developed for shape reconstruction, inspired by sketch drawing, which uses the density of sketch lines to render light and shadow and thereby create a 3D-view effect. TacShade builds upon the strengths of the TacTip, a single-camera tactile sensor with large in-depth deformation that is well suited to edge and surface following, and improves the structure by distributing the markers within the gaps between papillae pins. Variations in light, dark, and grey effects are generated inside the sensor under external contact interactions. The contours of the contacting objects are outlined by white markers, while the contact depth characteristics can be indirectly obtained from the distribution of black pins and white markers, creating a 2.5D visualization. Based on this imaging effect, we adapt the Shape from Shading (SFS) algorithm to process tactile images, enabling a coarse but fast reconstruction of the contacted objects. Two experiments are performed. The first verifies TacShade's ability to reconstruct the shape of contacted objects from a single image for object distinction, avoiding a lengthy deep-learning process. The second experiment shows the shape reconstruction capability of TacShade for a large panel with ridged patterns, based on robot localization and image stitching.
|
|
16:30-18:00, Paper ThCT10-CC.6 | Add to My Program |
Physics-Encoded Graph Neural Networks for Deformation Prediction under Contact |
|
Saleh, Mahdi | Technical University Munich |
Sommersperger, Michael | Technical University of Munich |
Navab, Nassir | TU Munich |
Tombari, Federico | Technische Universität München |
Keywords: Contact Modeling, Simulation and Animation, Deep Learning in Grasping and Manipulation
Abstract: In robotics, it is crucial to understand object deformation during tactile interactions. A precise understanding of deformation can improve robotic simulations and has broad implications across different industries. We introduce a method using Physics-Encoded Graph Neural Networks (GNNs) for such predictions. As in robotic grasping and manipulation scenarios, we focus on modeling the dynamics between a rigid mesh contacting a deformable mesh under external forces. Our approach represents both the soft body and the rigid body within graph structures, where nodes hold the physical states of the meshes. We also incorporate cross-attention mechanisms to capture the interplay between the objects. By jointly learning geometry and physics, our model reconstructs consistent and detailed deformations. We have made our code and dataset public to advance research in robotic simulation and grasping.
|
|
16:30-18:00, Paper ThCT10-CC.7 | Add to My Program |
Unwieldy Object Delivery with Nonholonomic Mobile Base: A Stable Pushing Approach |
|
Tang, Yujie | Delft University of Technology |
Zhu, Hai | Defense Innovation Institute |
Potters, Susan | Delft University of Technology |
Wisse, Martijn | Delft University of Technology |
Pan, Wei | The University of Manchester |
Keywords: Contact Modeling, Motion and Path Planning, Manipulation Planning
Abstract: This paper addresses the problem of pushing manipulation with nonholonomic mobile robots. Pushing is a fundamental skill that enables robots to move unwieldy objects that cannot be grasped. We propose a stable pushing method that maintains stiff contact between the robot and the object to avoid consuming repositioning actions. We prove that a line contact, rather than a single point contact, is necessary for nonholonomic robots to achieve stable pushing. We also show that the stable pushing constraint and the nonholonomic constraint of the robot can be simplified as a concise linear motion constraint. Then, the pushing planning problem can be formulated as a constrained optimization problem using nonlinear model predictive control (NMPC). According to the experiments, our NMPC-based planner outperforms a reactive pushing strategy in terms of efficiency, reducing the robot’s travelled distance by 23.8% and time by 77.4%. Furthermore, our method requires four fewer hyperparameters and decision variables than the Linear Time-Varying (LTV) MPC approach, making it easier to implement. Real-world experiments are carried out to validate the proposed method with two differential-drive robots, Husky and Boxer, under different friction conditions.
|
|
16:30-18:00, Paper ThCT10-CC.8 | Add to My Program |
Robotic Contact Juggling |
|
Woodruff, James | Northwestern University |
Lynch, Kevin | Northwestern University |
Keywords: Contact Modeling, Nonholonomic Motion Planning, Dynamics, Manipulation Planning
Abstract: In this article, we define "robotic contact juggling" to be the purposeful control of the motion of a 3-D smooth object as it rolls freely on a motion-controlled robot manipulator, or "hand." While specific examples of robotic contact juggling have been studied before, in this article, we provide the first general formulation and solution method for the case of an arbitrary smooth object in a single-point rolling contact on an arbitrary smooth hand. Our formulation splits the problem into four subproblems: deriving the second-order rolling kinematics; deriving the 3-D rolling dynamics; planning rolling motions that satisfy the rolling dynamics and achieve the desired goal; and stabilization of planned rolling trajectories. The theoretical results are demonstrated in 3-D simulations and 2-D experiments using feedback from a high-speed vision system.
|
|
16:30-18:00, Paper ThCT10-CC.9 | Add to My Program |
Beyond Coulomb: Stochastic Friction Models for Practical Grasping and Manipulation |
|
Liu, Zixi | Harvard University |
Howe, Robert D. | Harvard University |
Keywords: Contact Modeling, Grasping, In-Hand Manipulation
Abstract: Reliable grasping and manipulation in daily tasks and unstructured environments require accurate contact modeling and grasp stability estimation. One key component is the coefficient of friction, which in robotics applications is typically treated, following Coulomb's law, as a constant taken from the literature, even though actual friction behavior is variable and depends on many factors. In this work, we conducted sliding experiments with robot fingers and a hand, and show that rubber friction varies strongly with normal force Fn and contact velocity v, and includes a significant stochastic component. We present a framework for modeling the coefficient of friction as a distribution rather than a single constant, and show how this distribution can be narrowed given a prior on Fn or v. For a given distribution, the likelihood of slipping is a continuous function of the tangential-to-normal force ratio, instead of a step function according to Coulomb's law. By modeling friction as a function of Fn and v, we demonstrate that friction parameters can be estimated using regression models from a single sliding stroke of the fingertip against the object surface, and that strokes spanning a larger range of the Fn-v space provide better friction estimates. These results can be applied to grasp control, enabling a quantitative trade-off between the likelihood of slipping and grasp force levels, and to sliding manipulation planning.
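The continuous slip likelihood such a model yields can be sketched as follows; the Gaussian form is our illustrative assumption, while the paper's actual distribution is fit from data and conditioned on Fn and v.

import numpy as np
from scipy.stats import norm

def slip_probability(ft, fn, v, mu_model):
    # mu_model(fn, v) returns (mean, std) of the friction coefficient,
    # narrowing the distribution when a prior on fn or v is available.
    # The result is a smooth function of the force ratio ft/fn, replacing
    # Coulomb's step function ft/fn > mu.
    mean, std = mu_model(fn, v)
    return norm.cdf((ft / fn - mean) / std)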
|
|
ThCT11-CC Oral Session, CC-502 |
Add to My Program |
Learning from Demonstration III |
|
|
Chair: Si, Weiyong | University of Essex |
Co-Chair: Ariki, Yuka | Sony Group Corporation |
|
16:30-18:00, Paper ThCT11-CC.1 | Add to My Program |
Learning Barrier-Certified Polynomial Dynamical Systems for Obstacle Avoidance with Robots |
|
Schonger, Martin | Technical University of Munich |
Kussaba, Hugo Tadashi | Technical University of Munich |
Chen, Lingyun | Technical University of Munich |
Figueredo, Luis | Technical University of Munich (TUM) |
Swikir, Abdalla | Technical University of Munich |
Billard, Aude | EPFL |
Haddadin, Sami | Technical University of Munich |
Keywords: Learning from Demonstration, Formal Methods in Robotics and Automation, Robot Safety
Abstract: Established techniques that enable robots to learn from demonstrations are based on learning a stable dynamical system (DS). To increase the robots' resilience to perturbations during tasks that involve static obstacle avoidance, we propose incorporating barrier certificates into an optimization problem to learn a stable and barrier-certified DS. Such an optimization problem can be very complex or extremely conservative when the traditional linear parameter-varying formulation is used. Thus, in contrast to previous approaches in the literature, we propose to use polynomial representations for DSs, which yields an optimization problem that can be tackled by sum-of-squares techniques. Finally, our approach can handle obstacle shapes that fall outside the scope of assumptions typically found in the literature concerning obstacle avoidance within the DS learning framework.
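In generic sum-of-squares form (our notation; the paper's exact formulation may differ in sign conventions and multiplier classes), certifying a polynomial DS \dot{x} = f(x) reduces to memberships in the SOS cone \Sigma[x]:

V(x) - \epsilon\,\lVert x\rVert^{2} \in \Sigma[x], \qquad -\nabla V(x)^{\top} f(x) \in \Sigma[x], \qquad \nabla h(x)^{\top} f(x) - \lambda(x)\,h(x) \in \Sigma[x],

where V is a Lyapunov candidate, {x : h(x) >= 0} is the safe (obstacle-free) set, and \lambda is a polynomial multiplier. Each membership becomes a semidefinite constraint on polynomial coefficients, which is what makes the polynomial representation tractable.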
|
|
16:30-18:00, Paper ThCT11-CC.2 | Add to My Program |
Domain Adaptation of Visual Policies with a Single Demonstration |
|
Wang, Weiyao | The Johns Hopkins University |
Hager, Gregory | Johns Hopkins University |
Keywords: Transfer Learning, Learning from Demonstration, Deep Learning Methods
Abstract: Deploying machine learning algorithms for robot tasks in real-world applications presents a core challenge: overcoming the domain gap between the training and the deployment environment. This is particularly difficult for visuomotor policies that utilize high-dimensional images as input, especially when those images are generated via simulation. A common method to tackle this issue is domain randomization, which aims to broaden the span of the training distribution to cover the test-time distribution. However, this approach is only effective when the domain randomization encompasses the actual shifts in the test-time distribution. We take a different approach, where we make use of a single demonstration (a prompt) to learn a policy that adapts to the testing target environment. Our proposed framework, PromptAdapt, leverages the Transformer architecture's capacity to model sequential data to learn demonstration-conditioned visual policies, allowing for in-context adaptation to a target domain that is distinct from training. Our experiments in both simulation and real-world settings show that PromptAdapt is a strong domain-adapting policy that outperforms baseline methods by a large margin under a range of domain shifts, including variations in lighting, color, texture, and camera pose.
|
|
16:30-18:00, Paper ThCT11-CC.3 | Add to My Program |
Learning Complex Motion Plans Using Neural ODEs with Safety and Stability Guarantees |
|
Savvas Sadiq Ali, Farhad Nawaz | University of Pennsylvania |
Li, Tianyu | University of Pennsylvania |
Matni, Nikolai | University of Pennsylvania |
Figueroa, Nadia | University of Pennsylvania |
Keywords: Learning from Demonstration, Safety in HRI, Robust/Adaptive Control
Abstract: We propose a Dynamical System (DS) approach to learn complex, possibly periodic motion plans from kinesthetic demonstrations using Neural Ordinary Differential Equations (NODE). To ensure reactivity and robustness to disturbances, we propose a novel approach that selects a target point at each time step for the robot to follow, by combining tools from control theory and the target trajectory generated by the learned NODE. A correction term to the NODE model is computed online by solving a quadratic program that guarantees stability and safety using control Lyapunov functions and control barrier functions, respectively. Our approach outperforms baseline DS learning techniques on the LASA handwriting dataset and complex periodic trajectories. It is also validated on the Franka Emika robot arm to produce stable motions for wiping and stirring tasks that do not have a single attractor, while being robust to perturbations and safe around humans and obstacles.
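The online correction described above is, in its generic shape, a min-norm quadratic program; the cvxpy sketch below is illustrative (our names and simplified scalar rates, not the authors' exact formulation), and in practice a slack on the Lyapunov constraint is common to guarantee feasibility.

import cvxpy as cp
import numpy as np

def corrected_velocity(f_node, grad_V, grad_h, clf_rate, cbf_rate):
    # Min-norm correction u to the NODE vector field f_node(x) so that a
    # control Lyapunov (stability) and a control barrier (safety) condition
    # hold. clf_rate / cbf_rate stand in for the class-K terms, e.g. c*V(x)
    # and a*h(x) evaluated at the current state.
    u = cp.Variable(f_node.shape[0])
    xdot = f_node + u
    constraints = [grad_V @ xdot <= -clf_rate,   # Lyapunov decrease
                   grad_h @ xdot >= -cbf_rate]   # barrier forward invariance
    cp.Problem(cp.Minimize(cp.sum_squares(u)), constraints).solve()
    return f_node + u.value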
|
|
16:30-18:00, Paper ThCT11-CC.4 | Add to My Program |
Learning a Stable Dynamic System with a Lyapunov Energy Function for Demonstratives Using Neural Networks |
|
Zhang, Yu | University of Chinese Academy of Sciences |
Zou, Yongxiang | Institute of Automation, Chinese Academy of Sciences |
Zhang, Haoyu | Institute of Automation, Chinese Academy of Sciences |
Xia, Xiuze | Institute of Automation, Chinese Academy of Sciences |
Cheng, Long | Chinese Academy of Sciences |
Keywords: Learning from Demonstration, Imitation Learning, Motion and Path Planning
Abstract: Autonomous Dynamic System (DS)-based algorithms hold a pivotal and foundational role in the field of Learning from Demonstration (LfD). Nevertheless, they confront the formidable challenge of striking a delicate balance between achieving precision in learning and ensuring the overall stability of the system. In response to this challenge, this paper introduces a novel DS algorithm rooted in neural network technology. This algorithm not only extracts critical insights from demonstration data but also learns a candidate Lyapunov energy function that is consistent with the provided demonstrations. The model presented in this paper employs a simple neural network architecture that fulfills a dual objective: optimizing accuracy while preserving global stability. To comprehensively evaluate the effectiveness of the proposed algorithm, rigorous assessments are conducted using the LASA dataset, further reinforced by empirical validation through a robotic experiment.
|
|
16:30-18:00, Paper ThCT11-CC.5 | Add to My Program |
Learning a Flexible Neural Energy Function with a Unique Minimum for Globally Stable and Accurate Demonstration Learning |
|
Jin, Zhehao | Zhejiang University of Technology |
Si, Weiyong | University of Essex |
Liu, Andong | Zhejiang University of Technology |
Zhang, Wen-An | Zhejiang University of Technology, China |
Yu, Li | Zhejiang University of Technology |
Yang, Chenguang | University of Liverpool |
Keywords: Learning from Demonstration, Learning and Adaptive Systems, Cooperating Robots, dynamic system learning
Abstract: Learning a stable autonomous dynamic system (ADS) encoding human motion rules has been shown as an effective way for demonstration learning. However, the stability guarantee may sacrifice the demonstration learning accuracy. This article solves the issue by learning a stability certificate, represented by a neural energy function, on the demonstration set. We propose a Polar-like space analysis approach to derive parameter constraints to guarantee the unique-minimum property of the neural energy function, which is essential for it to be a cogent stability certificate. Then, the neural energy function is learned to capture the demonstration preferences via constrained optimization algorithms. With the learned neural energy function, a globally asymptotically stable ADS with predefined position constraint is further formulated. We also quantitatively analyze the generalization ability of the learned ADS by utilizing the substantial flexibility of the neural energy function. The effectiveness of the proposed approach is validated on the LASA data set and two representative robotic experiments.
|
|
16:30-18:00, Paper ThCT11-CC.6 | Add to My Program |
Inverse Constraint Learning and Generalization by Transferable Reward Decomposition |
|
Jang, Jaehwi | Korea Advanced Institute of Science and Technology |
Song, Minjae | KAIST |
Park, Daehyung | Korea Advanced Institute of Science and Technology, KAIST |
Keywords: Learning from Demonstration, Constrained Motion Planning
Abstract: We present the problem of inverse constraint learning (ICL), which recovers constraints from demonstrations to autonomously reproduce constrained skills in new scenarios. However, ICL suffers from an ill-posed nature, leading to inaccurate inference of constraints from demonstrations. To address this, we introduce a transferable constraint learning (TCL) algorithm that jointly infers a task-oriented reward and a task-agnostic constraint, enabling the generalization of learned skills. Our method, TCL, additively decomposes the overall reward into a task reward and its residual as soft constraints, maximizing policy divergence between task- and constraint-oriented policies to obtain a transferable constraint. Evaluating our method and five baselines in three simulated environments, we show that TCL outperforms state-of-the-art IRL and ICL algorithms, achieving up to 72% higher task-success rates with accurate decomposition compared to the next best approach in novel scenarios. Further, we demonstrate the robustness of TCL on two real-world robotic tasks.
|
|
16:30-18:00, Paper ThCT11-CC.7 | Add to My Program |
Learning Robot Motion in a Cluttered Environment Using Unreliable Human Skeleton Data Collected by a Single RGB Camera |
|
Takamido, Ryota | Research into Artifacts, Center for Engineering (RACE), School O |
Ota, Jun | The University of Tokyo |
Keywords: Learning from Demonstration, Motion and Path Planning, Collision Avoidance
Abstract: Current learning from demonstration (LfD) frameworks have difficulty dealing with an unreliable, limited number of demonstrations. To address this issue, we propose a novel motion planning framework referred to as experience-driven random tree connect with human demonstration (ERTC-HD), which can identify valid motions in cluttered environments using only human skeleton information extracted from a single red, green, and blue (RGB) camera. The key idea of this framework is to extract only the comprehensive features of human motion from unreliable demonstrations and use them as a rough guide for solving complex planning problems rather than as a strict solution. During the ERTC-HD process, robot motions generated from the extracted features of human motion are saved as a path experience and modified through the path adaptation process of an existing ERTC planner when transferred to a new problem. The results of three simulation experiments revealed that ERTC-HD can identify valid motions in cluttered environments in less time than other state-of-the-art planners, even when using unreliable demonstration data collected by a single RGB camera. Lowering the required accuracy of the original information sources can extend the range of applications of this LfD framework.
|
|
ThCT12-CC Oral Session, CC-503 |
Add to My Program |
Deep Learning III |
|
|
Co-Chair: Dolan, John M. | Carnegie Mellon University |
|
16:30-18:00, Paper ThCT12-CC.1 | Add to My Program |
Robot Interaction Behavior Generation Based on Social Motion Forecasting for Human-Robot Interaction |
|
Valls Mascaro, Esteve | Technische Universitat Wien |
Yan, Yashuai | Vienna University of Technology |
Lee, Dongheui | Technische Universität Wien (TU Wien) |
Keywords: Deep Learning Methods, Gesture, Posture and Facial Expressions, Imitation Learning
Abstract: Integrating robots into populated environments is a complex challenge that requires an understanding of human social dynamics. In this work, we propose to model social motion forecasting in a shared human-robot representation space, which enables us to synthesize robot motions that interact with humans in social scenarios even though no robot motions are observed during training. We develop a transformer-based architecture called ECHO, which operates in this shared space to predict the future motions of the agents encountered in social scenarios. Contrary to prior works, we reformulate the social motion problem as the refinement of the predicted individual motions based on the surrounding agents, which facilitates training while allowing for single-motion forecasting when only one human is in the scene. We evaluate our model on multi-person and human-robot motion forecasting tasks and obtain state-of-the-art performance by a large margin while being efficient and running in real time. Additionally, our qualitative results showcase the effectiveness of our approach in generating human-robot interaction behaviors that can be controlled via text commands.
|
|
16:30-18:00, Paper ThCT12-CC.2 | Add to My Program |
SPCGC: Scalable Point Cloud Geometry Compression for Machine Vision |
|
Xie, Liang | Peking University |
Gao, Wei | Peking University Shenzhen Graduate School |
Zheng, Huiming | Peking University |
Li, Ge | Peking University Shenzhen Graduate School |
Keywords: Deep Learning Methods
Abstract: With the proliferation of sensor devices, the extensive utilization of three-dimensional data in multimedia continues to grow. Point clouds are widely adopted within this domain because they are one of the most intuitive representations of three-dimensional data. However, the substantial volume of point cloud data poses significant challenges for storage and transmission. Moreover, a considerable portion of the data loses its semantic information during transmission. Consequently, how can we ensure both the perceptual quality for humans and the performance of downstream tasks during transmission? To address this issue, we propose a scalable point cloud geometry compression framework (SPCGC) for machine perception. This framework tackles the fidelity issues associated with point cloud compression and preserves more semantic information, enhancing the performance of machine vision tasks. Our solution consists of a base layer bitstream and an enhancement layer bitstream. The base layer bitstream contains geometry data, while the enhancement layer bitstream carries semantic-guided residual data. Additionally, we introduce two modules for extracting and coding residual features, and we incorporate classification and segmentation losses from downstream tasks into the Rate-Distortion (RD) optimization. Our approach outperforms existing learning-based lossy point cloud coding methods through empirical validation on downstream tasks without sacrificing point cloud compression performance.
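A task-augmented RD objective of this kind has, in generic form (our notation and weights, not necessarily the paper's),

\mathcal{L} \;=\; R_{\text{base}} + R_{\text{enh}} \;+\; \lambda\,D \;+\; \mu_{\text{cls}}\,\mathcal{L}_{\text{cls}} \;+\; \mu_{\text{seg}}\,\mathcal{L}_{\text{seg}},

where the R terms are the two bitstream rates, D is the geometry distortion, and the \mu-weighted terms are the downstream classification and segmentation losses that pull the enhancement-layer residuals toward semantically useful content.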
|
|
16:30-18:00, Paper ThCT12-CC.3 | Add to My Program |
CppFlow: Generative Inverse Kinematics for Efficient and Robust Cartesian Path Planning |
|
Morgan, Jeremy | University of Southern California |
Millard, David | University of Southern California |
Sukhatme, Gaurav | University of Southern California |
Keywords: Motion and Path Planning, Constrained Motion Planning
Abstract: In this work we present CppFlow - a novel and performant planner for the Cartesian Path Planning problem, which finds valid trajectories up to 129x faster than current methods, while also succeeding on more difficult problems where others fail. At the core of the proposed algorithm is the use of a learned, generative Inverse Kinematics solver, which is able to efficiently produce promising entire candidate solution trajectories on the GPU. Precise, valid solutions are then found through classical approaches such as differentiable programming, global search, and optimization. By combining approaches from these two paradigms we get the best of both worlds - efficient approximate solutions from generative AI which are made exact using the guarantees of traditional planning and optimization. We evaluate our system against other state-of-the-art methods on a set of established baselines as well as new ones introduced in this work and find that our method significantly outperforms others in terms of the time to find a valid solution and planning success rate, and performs comparably in terms of trajectory length over time. Additional results and an open-source implementation are available at https://jstmn.github.io/cppflow-website
|
|
16:30-18:00, Paper ThCT12-CC.4 | Add to My Program |
Safe Deep Policy Adaptation |
|
Xiao, Wenli | Carnegie Mellon University |
He, Tairan | Carnegie Mellon University |
Dolan, John M. | Carnegie Mellon University |
Shi, Guanya | Carnegie Mellon University |
Keywords: Reinforcement Learning, Robot Safety, Model Learning for Control
Abstract: A critical goal of autonomy and artificial intelligence is enabling autonomous robots to rapidly adapt in dynamic and uncertain environments. Classic adaptive control and safe control provide stability and safety guarantees but are limited to specific system classes. In contrast, policy adaptation based on reinforcement learning (RL) offers versatility and generalizability but presents safety and robustness challenges. We propose SafeDPA, a novel RL and control framework that simultaneously tackles the problems of policy adaptation and safe reinforcement learning. SafeDPA jointly learns an adaptive policy and dynamics models in simulation, predicts environment configurations, and fine-tunes the dynamics models with few-shot real-world data. A safety filter based on a Control Barrier Function (CBF), placed on top of the RL policy, ensures safety during real-world deployment. We provide theoretical safety guarantees for SafeDPA and show its robustness against learning errors and extra perturbations. Comprehensive experiments on (1) classic control problems (Inverted Pendulum), (2) simulation benchmarks (Safety Gym), and (3) a real-world agile robotics platform (RC Car) demonstrate the clear superiority of SafeDPA in both safety and task performance over state-of-the-art baselines. In particular, SafeDPA demonstrates notable generalizability, achieving a 300% increase in safety rate compared to the baselines under unseen disturbances in real-world experiments.
|
|
16:30-18:00, Paper ThCT12-CC.5 | Add to My Program |
Physics-Informed Neural Networks for Continuum Robots: Towards Fast Approximation of Static Cosserat Rod Theory |
|
Bensch, Martin | Leibniz University Hanover |
Job, Tim-David | Leibniz University Hanover |
Habich, Tim-Lukas | Leibniz University Hannover |
Seel, Thomas | Leibniz Universität Hannover |
Schappler, Moritz | Institute of Mechatronic Systems, Leibniz Universitaet Hannover |
Keywords: Deep Learning Methods, Motion and Path Planning
Abstract: Sophisticated models can accurately describe deformations of continuum robots while being computationally demanding, which limits their application. Especially when considering sampling-based path planning, the model has to be evaluated frequently, which can lead to substantially increased computation times. We present a new approach to compute the entire shape of a tendon-driven continuum robot by a physics-informed neural network (PINN). The underlying physics is modelled with the Cosserat rod theory and incorporated into the PINN’s loss function. The boundary values for the training are obtained from a reference model, solved by the shooting method. Our approach allows for a computation of the learned Cosserat rod model multiple orders of magnitude faster than a publicly available reference model. The median position deviation from the reference model lies below 1mm (0.5% of the simulated robot length) for each of the robot’s 20 disks.
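Schematically, a physics-informed loss for a static rod combines an ODE residual at collocation points with boundary terms from the reference (shooting-method) solution; the sketch below is a generic PyTorch rendering with illustrative names, not the authors' implementation.

import torch

def pinn_loss(model, s_colloc, s_bc, y_bc, residual_fn, w_bc=10.0):
    # model: maps arc length s (and, in the paper, tendon actuation) to the
    # rod state y(s). residual_fn(s, y, dy) evaluates the Cosserat rod ODE
    # residual; boundary values y_bc come from the reference model.
    s = s_colloc.clone().requires_grad_(True)
    y = model(s)
    dy = torch.stack([  # d y_i / d s at each collocation point
        torch.autograd.grad(y[:, i].sum(), s, create_graph=True)[0].squeeze(-1)
        for i in range(y.shape[1])], dim=1)
    loss_pde = residual_fn(s, y, dy).pow(2).mean()   # physics residual term
    loss_bc = (model(s_bc) - y_bc).pow(2).mean()     # boundary-value term
    return loss_pde + w_bc * loss_bc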
|
|
16:30-18:00, Paper ThCT12-CC.6 | Add to My Program |
Fast Kinodynamic Planning on the Constraint Manifold with Deep Neural Networks |
|
Kicki, Piotr | Poznan University of Technology |
Liu, Puze | Technische Universität Darmstadt |
Tateo, Davide | Technische Universität Darmstadt |
Bou Ammar, Haitham | Huawei |
Walas, Krzysztof Tadeusz | Poznan University of Technology |
Skrzypczynski, Piotr | Poznan University of Technology |
Peters, Jan | Technische Universität Darmstadt |
Keywords: Motion and Path Planning, Deep Learning in Robotics and Automation, Learning to plan, Manipulation Planning
Abstract: Motion planning is a mature area of research in robotics with many well-established methods based on optimization or sampling the state space, suitable for solving kinematic motion planning. However, when dynamic motions under constraints are needed and computation time is limited, fast kinodynamic planning on the constraint manifold is indispensable. In recent years, learning-based solutions have become alternatives to classical approaches, but they still lack comprehensive handling of complex constraints, such as planning on a lower-dimensional manifold of the task space while considering the robot's dynamics. This paper introduces a novel learning-to-plan framework that exploits the concept of constraint manifold, including dynamics, and neural planning methods. Our approach generates plans satisfying an arbitrary set of constraints and computes them in a short constant time, namely the inference time of a neural network. This allows the robot to plan and replan reactively, making our approach suitable for dynamic environments. We validate our approach on two simulated tasks and in a demanding real-world scenario, where we use a Kuka LBR Iiwa 14 robotic arm to perform the hitting movement in robotic Air Hockey.
|
|
16:30-18:00, Paper ThCT12-CC.7 | Add to My Program |
MANER: Multi-Agent Neural Rearrangement Planning of Objects in Cluttered Environments |
|
Gupta, Vivek | Purdue University, West Lafayette |
Dhir, Prabhpreet | Purdue University |
Dani, Jeegn | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Keywords: Task and Motion Planning, Multi-Robot Systems, Deep Learning Methods
Abstract: Object rearrangement is a fundamental problem in robotics with various practical applications ranging from managing warehouses to cleaning and organizing home kitchens. While existing research has primarily focused on single-agent solutions, real-world scenarios often require multiple robots to work together on rearrangement tasks. This paper proposes a comprehensive learning-based framework for multi-agent object rearrangement planning, addressing the challenges of task sequencing and path planning in complex environments. The proposed method iteratively selects objects, determines their relocation regions, and pairs them with available robots under kinematic feasibility and task reachability for execution to achieve the target arrangement. Our experiments on a diverse range of simulated and real-world environments demonstrate the effectiveness and robustness of the proposed framework. Furthermore, results indicate improved performance in terms of traversal time and success rate compared to baseline approaches.
|
|
16:30-18:00, Paper ThCT12-CC.8 | Add to My Program |
Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints |
|
Kasaura, Kazumi | Omron Sinic X |
Miura, Shuwa | University of Massachusetts, Amherst |
Kozuno, Tadashi | Omron Sinic X |
Yonetani, Ryo | CyberAgent |
Hoshino, Kenta | Kyoto University |
Hosoe, Yohei | Kyoto University |
Keywords: Reinforcement Learning, Machine Learning for Robot Control, Performance Evaluation and Benchmarking
Abstract: This study presents a benchmark for evaluating action-constrained reinforcement learning (RL) algorithms. In action-constrained RL, each action taken by the learning system must comply with certain constraints. These constraints are crucial for ensuring the feasibility and safety of actions in real-world systems. We evaluate existing algorithms and their novel variants across multiple robotics control environments, encompassing multiple action constraint types. Our evaluation provides the first in-depth perspective of the field, revealing surprising insights, including the effectiveness of a straightforward baseline approach. The benchmark problems and associated code utilized in our experiments are made available online at github.com/omron-sinicx/action-constrained-RL-benchmark for further research and development.
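One common straightforward baseline in action-constrained RL is to project the policy's raw action onto the feasible set before execution; whether this is the exact baseline the authors highlight is not stated in the abstract. A minimal sketch for linear constraints (our assumption, using cvxpy):

import cvxpy as cp
import numpy as np

def project_action(a_raw, A, b):
    # Project the raw action onto {a : A a <= b} in the least-squares
    # sense. Linear constraints are an assumption of this sketch; the
    # benchmark covers several constraint types.
    a = cp.Variable(a_raw.shape[0])
    cp.Problem(cp.Minimize(cp.sum_squares(a - a_raw)), [A @ a <= b]).solve()
    return a.value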
|
|
ThCT13-AX Oral Session, AX-201 |
Add to My Program |
Human-Robot Collaboration VI |
|
|
Chair: Vijayakumar, Sethu | University of Edinburgh |
Co-Chair: Secchi, Cristian | Univ. of Modena & Reggio Emilia |
|
16:30-18:00, Paper ThCT13-AX.1 | Add to My Program |
Robust and Dexterous Dual-Arm Tele-Cooperation Using Adaptable Impedance Control |
|
Kouhkiloui Babarahmati, Keyhan | University of Edinburgh |
Kasaei, Mohammadreza | University of Edinburgh |
Tiseo, Carlo | University of Sussex |
Mistry, Michael | University of Edinburgh |
Vijayakumar, Sethu | University of Edinburgh |
Keywords: Human-Robot Collaboration, Physical Human-Robot Interaction, Telerobotics and Teleoperation
Abstract: In recent years, the need for robots to transition from isolated industrial tasks to shared environments, including human-robot collaboration and teleoperation, has become increasingly evident. Building on the foundation of Fractal Impedance Control (FIC) introduced in our previous work, this paper presents a novel extension to dual-arm tele-cooperation, leveraging the non-linear stiffness and passivity of FIC to adapt to diverse cooperative scenarios. Unlike traditional impedance controllers, our approach ensures stability without relying on energy tanks, as demonstrated in our prior research. In this paper, we further extend the FIC framework to bimanual operations, allowing for stable and smooth switching between different dynamic tasks without gain tuning. We also introduce a telemanipulation architecture that offers higher transparency and dexterity, addressing the challenges of signal latency and low-bandwidth communication. Through extensive experiments, we validate the robustness of our method and the results confirm the advantages of the FIC approach over traditional impedance controllers, showcasing its potential for applications in planetary exploration and other scenarios requiring dexterous telemanipulation. This paper's contributions include the seamless integration of FIC into multi-arm systems, the ability to perform robust interactions in highly variable environments, and the provision of a comprehensive comparison with competing approaches, thereby significantly enhancing the robustness and adaptability of robotic systems.
|
|
16:30-18:00, Paper ThCT13-AX.2 | Add to My Program |
PlanCollabNL: Leveraging Large Language Models for Adaptive Plan Generation in Human-Robot Collaboration |
|
Izquierdo-Badiola, Silvia | Eurecat |
Canal, Gerard | King's College London |
Rizzo, Carlos | University of Zaragoza |
Alenyà, Guillem | CSIC-UPC |
Keywords: Human-Robot Collaboration, Task Planning, AI-Enabled Robotics
Abstract: "Hey, robot. Let's tidy up the kitchen. By the way, I have back pain today". How can a robotic system devise a shared plan with an appropriate task allocation from this abstract goal and agent condition? Classical AI task planning has been explored for this purpose, but it involves a tedious definition of an inflexible planning problem. Large Language Models (LLMs) have shown promising generalisation capabilities in robotics decision-making through knowledge extraction from Natural Language (NL). However, the translation of NL information into constrained robotics domains remains a challenge. In this paper, we use LLMs as translators between NL information and a structured AI task planning problem, targeting human-robot collaborative plans. The LLM generates information that is encoded in the planning problem, including specific subgoals derived from an NL abstract goal, as well as recommendations for subgoal allocation based on NL agent conditions. The framework, PlanCollabNL, is evaluated for a number of goals and agent conditions, and the results show that correct and executable plans are found in most cases. With this framework, we intend to add flexibility and generalisation to HRC plan generation, eliminating the need for a manual and laborious definition of restricted planning problems and agent models.
|
|
16:30-18:00, Paper ThCT13-AX.3 | Add to My Program |
Multi-Agent Strategy Explanations for Human-Robot Collaboration |
|
Pandya, Ravi | Carnegie Mellon University |
Zhao, Michelle | Carnegie Mellon University |
Liu, Changliu | Carnegie Mellon University |
Simmons, Reid | Carnegie Mellon University |
Admoni, Henny | Carnegie Mellon University |
Keywords: Human-Robot Teaming, Human-Robot Collaboration
Abstract: As robots are deployed in human spaces, it is important that they are able to coordinate their actions with the people around them. Part of such coordination involves ensuring that people have a good understanding of how a robot will act in the environment. This can be achieved through explanations of the robot's policy. Much prior work in explainable AI and RL focuses on generating explanations for single-agent policies, but little has been explored in generating explanations for collaborative policies. In this work, we investigate how to generate multi-agent strategy explanations for human-robot collaboration. We formulate the problem using a generic multi-agent planner, show how to generate visual explanations through strategy-conditioned landmark states and generate textual explanations by giving the landmarks to an LLM. Through a user study, we find that when presented with explanations from our proposed framework, users are able to better explore the full space of strategies and collaborate more efficiently with new robot partners.
|
|
16:30-18:00, Paper ThCT13-AX.4 | Add to My Program |
Efficient ISO/TS 15066 Compliance through Model Predictive Control |
|
Pupa, Andrea | University of Modena and Reggio Emilia |
Secchi, Cristian | Univ. of Modena & Reggio Emilia |
Keywords: Human-Robot Collaboration, Safety in HRI, Human-Aware Motion Planning
Abstract: In actual industrial scenarios, human operators and robots work together, sharing the workspace. Such proximity requires special attention to ensuring safety for the human operator, which is often translated into collision-avoidance behaviour or drastic speed reduction. Adhering to safety requirements, however, is not the only aspect that must be taken into account. For many tasks, such as welding, it is crucial to ensure that the robot follows exactly the planned path. To optimize robot performance while complying with safety regulations, this work introduces a novel optimal nonlinear control problem. It prioritizes path preservation, exploiting redundancy to minimize task execution time while explicitly adhering to the constraints imposed by ISO/TS 15066. To achieve high performance, the control problem is addressed using the Model Predictive Control (MPC) approach. The proposed strategy has been experimentally validated both in simulation and in a real-world industrial task involving a Kuka LWR4+ robot.
|
|
16:30-18:00, Paper ThCT13-AX.5 | Add to My Program |
Dual-Mode Human-Robot Collaboration with Guaranteed Safety Using Time-Varying Zeroing Control Barrier Functions and Quadratic Program |
|
Shi, Kaige | Nanyang Technological University |
Hu, Guoqiang | Nanyang Technological University |
Keywords: Human-Robot Collaboration, Physical Human-Robot Interaction, Safety in HRI
Abstract: Safety and efficiency are two important aspects of human-robot collaboration (HRC). Most existing control methods for HRC consider either contactless HRC or physical HRC, hindering more efficient HRC. The proposed control framework enables dual-mode HRC, filling the gap between contactless and physical HRC. With the framework, the robot can perform contactless HRC under safety regulations with respect to the co-working human. Meanwhile, the human can safely interrupt the robot via physical contact to enter physical HRC, in which he/she can hand-guide the robot or take over its gripped object. First, human safety is defined as bounded approach velocities between the human and multiple robot links based on ISO/TS 15066, allowing the gradual establishment of physical contact. Then, a time-varying zeroing control barrier function is proposed and defined to guarantee the bounded approach velocities through a safety control set. Second, a unified task control set is designed to achieve different robot tasks for the different HRC modes in a unified manner. The unified task control set enables the robot to switch smoothly between the two HRC modes. An optimal final control input is determined by a quadratic program (QP) based on the different control sets. Experiments were conducted to verify the proposed framework and compare it with existing methods. An application example is presented to show the versatility of the proposed framework.
|
|
16:30-18:00, Paper ThCT13-AX.6 | Add to My Program |
A Time-Optimal Energy Planner for Safe Human-Robot Collaboration |
|
Pupa, Andrea | University of Modena and Reggio Emilia |
Minelli, Marco | University of Modena and Reggio Emilia |
Secchi, Cristian | Univ. of Modena & Reggio Emilia |
Keywords: Human-Robot Collaboration, Safety in HRI, Human-Aware Motion Planning
Abstract: Human-robot collaboration scenarios are characterized by the presence of human operators and robots that work in close contact with each other. As a consequence, safety regulations have been updated to provide guidelines on how to assess safety in these new scenarios. In particular, the Power and Force Limiting (PFL) collaborative mode describes how energy should be regulated during the collaboration. Based on these guidelines, we propose a new optimal trajectory planner which, by exploiting the variability of the robot's inertia as a function of its configuration, is able to return trajectories that can be travelled at greater speed and in less time while guaranteeing the safety limits according to the standard. The proposed planner was validated first in simulation, comparing completion times with other state-of-the-art planning algorithms, and then experimentally, demonstrating the performance of the planned trajectories during physical interaction with the environment. Both validations confirm the effectiveness of the proposed planner, which returns shorter completion times while ensuring safe interaction.
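For context, the PFL limit of ISO/TS 15066 is energy-based: the kinetic energy transferable in a human-robot contact must stay below a body-region-dependent limit E_max. Restated in our notation (the standard relation from the specification, not the paper's formulation),

\frac{1}{2}\,\mu\,v_{\mathrm{rel}}^{2} \le E_{\max}, \qquad \mu = \left(\frac{1}{m_H} + \frac{1}{m_R(q)}\right)^{-1} \quad\Longrightarrow\quad v_{\mathrm{rel,max}}(q) = \sqrt{\frac{2\,E_{\max}}{\mu}},

where m_H is the effective mass of the contacted human body part and m_R(q) is the configuration-dependent effective robot mass. Because m_R(q) varies along the trajectory, a configuration with lower apparent inertia permits a higher safe speed - the leverage the proposed planner exploits.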
|
|
16:30-18:00, Paper ThCT13-AX.7 | Add to My Program |
Discuss before Moving: Visual Language Navigation Via Multi-Expert Discussions |
|
Long, Yuxing | Peking University |
Li, Xiaoqi | Peking University |
Cai, Wenzhe | Southeast University |
Dong, Hao | Peking University |
Keywords: Vision-Based Navigation, Natural Dialog for HRI, Task Planning
Abstract: Visual language navigation (VLN) is an embodied task demanding a wide range of skills encompassing understanding, perception, and planning. For such a multifaceted challenge, previous VLN methods rely entirely on a single model's own reasoning to make predictions within one round. However, existing models, even the most advanced large language model GPT4, still struggle to handle multiple tasks through single-round self-thinking. In this work, drawing inspiration from the expert consultation meeting, we introduce a novel zero-shot VLN framework. Within this framework, large models possessing distinct abilities serve as domain experts. Our proposed navigation agent, namely DiscussNav, actively discusses with these experts to collect essential information before moving at every step. These discussions cover critical navigation subtasks such as instruction understanding, environment perception, and completion estimation. Through comprehensive experiments, we demonstrate that discussions with domain experts can effectively facilitate navigation by perceiving instruction-relevant information, correcting inadvertent errors, and sifting through inconsistent movement decisions. The performance on the representative VLN task R2R shows that our method surpasses the leading zero-shot VLN model by a large margin on all metrics. Additionally, real-robot experiments demonstrate the clear advantages of our method over single-round self-thinking.
|
|
16:30-18:00, Paper ThCT13-AX.8 | Add to My Program |
Personality and Memory-Based Software Framework for Human-Robot Interaction |
|
Nardelli, Alice | University of Genoa |
Sgorbissa, Antonio | University of Genova |
Recchiuto, Carmine Tommaso | University of Genova |
Keywords: Emotional Robotics, Cognitive Modeling, Human-Robot Collaboration
Abstract: The synergistic orchestration of cognitive and psychological dimensions characterizes human intelligence. Accordingly, carefully designing this mechanism in artificial intelligence can be a successful strategy for increasing human likeness in a robot, enhancing mutual understanding and building a more natural and intuitive interaction. For this purpose, the main contribution of this work is a psychological and cognitive architecture tailored for HRI based on the interplay between robotic personality and memory-based cognitive processes. Indeed, the artificial personality manifests itself not only in various aspects of behavior but also within the action selection process, which is closely intertwined with personality-dependent hedonic experiences linked to memories. In this paper, we propose a task- and platform-independent framework, evaluated in a multiparty collaborative scenario. The obtained results show that a robot connected to our proposed framework is perceived as a cognitive agent capable of manifesting perceivable and distinguishable personality traits.
|
|
ThCT15-AX Oral Session, AX-203 |
Add to My Program |
Service Robots |
|
|
Chair: Iwasawa, Yusuke | The University of Tokyo |
Co-Chair: Alami, Rachid | CNRS |
|
16:30-18:00, Paper ThCT15-AX.1 | Add to My Program |
Self-Recovery Prompting: Promptable General Purpose Service Robot System with Foundation Models and Self-Recovery |
|
Shirasaka, Mimo | The University of Tokyo |
Matsushima, Tatsuya | The University of Tokyo |
Tsunashima, Soshi | The University of Tokyo |
Ikeda, Yuya | University of Tokyo |
Horo, Aoi | The University of Tokyo |
Ikoma, So | The University of Tokyo |
Tsuji, Chikaha | The University of Tokyo |
Wada, Hikaru | The University of Tokyo |
Omija, Tsunekazu | The University of Tokyo |
Komukai, Dai | The University of Tokyo |
Matsuo, Yutaka | The University of Tokyo |
Iwasawa, Yusuke | The University of Tokyo |
Keywords: Service Robotics
Abstract: A general-purpose service robot (GPSR), which can execute diverse tasks in various environments, requires a system with high generalizability and adaptability to tasks and environments. In this paper, we first developed a top-level GPSR system for the worldwide competition RoboCup@Home 2023 based on multiple foundation models. This system is both generalizable to task variations and adaptable through prompting of each model. Then, by analyzing the performance of the developed system, we identified three types of failure in more realistic GPSR application settings: insufficient information, incorrect plan generation, and plan execution failure. We then propose the self-recovery prompting pipeline, which explores the necessary information and modifies its prompts to recover from failure. We experimentally confirm that the system with the self-recovery mechanism can accomplish tasks by resolving various failure cases.
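The pipeline can be pictured as a plan-execute-diagnose loop; the following is hypothetical glue code (all names are ours) showing where prompt modification enters, not the authors' implementation.

def run_with_self_recovery(task_prompt, plan_fn, execute_fn, diagnose_fn, max_retries=3):
    # plan_fn: foundation-model planner; execute_fn: runs the plan and
    # returns (success, feedback); diagnose_fn: classifies the failure
    # (insufficient information, bad plan, execution failure) and rewrites
    # the prompt accordingly before re-planning.
    prompt = task_prompt
    for _ in range(max_retries):
        plan = plan_fn(prompt)
        ok, feedback = execute_fn(plan)
        if ok:
            return plan
        prompt = diagnose_fn(prompt, feedback)
    return None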
|
|
16:30-18:00, Paper ThCT15-AX.2 | Add to My Program |
Autonomous Quilt Spreading for Caregiving Robots |
|
Guo, Yuchun | Harbin Institute of Technology, Shenzhen |
Lu, Zhiqing | Harbin Institute of Technology, Shenzhen |
Zhou, Yanling | Harbin Institute of Technology (Shenzhen) |
Jiang, Xin | Harbin Institute of Technology, Shenzhen |
Keywords: Service Robotics
Abstract: In this work, we propose a novel strategy to ensure that infants who inadvertently displace their quilts during sleep are promptly and accurately re-covered. Our approach is formulated in two subsequent steps: interference resolution and quilt spreading. By leveraging the DWPose human skeletal detection and the Segment Anything instance segmentation models, the proposed method can accurately recognize the states of the infant and the quilt over her, which involves addressing the interferences resulting from an infant's limbs lying on part of the quilt. Building upon prior research, the EM*D deep learning model is employed to forecast quilt state transitions before and after quilt spreading actions. To improve the sensitivity of the network in distinguishing state variations of the handled quilt, we introduce an enhanced loss function that translates the voxelized quilt state into a more representative one. Both simulation and real-world experiments validate the efficacy of our method in spreading and recovering a quilt over an infant.
|
|
16:30-18:00, Paper ThCT15-AX.3 | Add to My Program |
CNS: Correspondence Encoded Neural Image Servo Policy |
|
Chen, Anzhe | Zhejiang University |
Yu, Hongxiang | Zhejiang University |
Wang, Yue | Zhejiang University |
Xiong, Rong | Zhejiang University |
Keywords: Service Robotics, Deep Learning in Grasping and Manipulation, Visual Servoing
Abstract: Image servoing is an indispensable technique in robotic applications that helps to achieve high-precision positioning. The intermediate representation of an image servo policy is important for sensor input abstraction and policy output guidance. Classical approaches achieve high precision but require clean keypoint correspondence and suffer from limited convergence basins or weak robustness to feature errors. Recent learning-based methods achieve moderate precision and large convergence basins on specific scenes but face issues when generalizing to novel environments. In this paper, we encode keypoints and correspondences into a graph and use a graph neural network as the controller architecture. This design combines the advantages of both: a generalizable intermediate representation from keypoint correspondence and the strong modeling ability of neural networks. Other techniques, including realistic data generation, feature clustering, and distance decoupling, are proposed to further improve efficiency, precision, and generalization. Experiments in simulation and the real world verify the effectiveness of our method in speed (maximum 40 fps, observer included), precision (<0.3° and sub-millimeter accuracy), and generalization (sim-to-real without fine-tuning). Project homepage (full paper with supplementary text, video and code): https://github.com/hhcaz/CNS
|
|
16:30-18:00, Paper ThCT15-AX.4 | Add to My Program |
Adapting for Calibration Disturbances: A Neural Uncalibrated Visual Servoing Policy |
|
Yu, Hongxiang | Zhejiang University |
Chen, Anzhe | Zhejiang University |
Xu, Kechun | Zhejiang University |
Guo, Dashun | Zhejiang University |
Zhou, Zhongxiang | Zhejiang University |
Wei, Yufei | Zhejiang University |
Zhang, Xuebo | Nankai University |
Wang, Yue | Zhejiang University |
Xiong, Rong | Zhejiang University |
Keywords: Service Robotics, Deep Learning in Grasping and Manipulation, Visual Servoing
Abstract: Visual servoing (VS) is a widely used technique in industries with hundreds of robots, but it requires accurate camera calibration, including the camera's intrinsic and extrinsic parameters. However, calibrating robots one by one is labour-intensive in practical use. In this paper, we propose a neural uncalibrated VS policy (NUVS) that can adapt to calibration disturbances through an adaptation mechanism and control-oriented guidance. It bridges the disturbance adaptation of classical VS methods and the large convergence basin of learning-based VS methods. NUVS estimates the calibration embedding from past observations and servos to the desired pose under the supervision of a PBVS controller that can access the ground truth in simulation. With this adaptation mechanism, NUVS outperforms the classical IBUVS algorithm when facing large initial camera pose offsets under calibration disturbance. Supplementary material: https://sites.google.com/view/neural-uncalibrated-vs
|
|
16:30-18:00, Paper ThCT15-AX.5 | Add to My Program |
ARIS 1.0: An Autonomous Multitasking Medical Service Robot for Hospital Environments |
|
Dunuwila, Anurisha Piyathma | University of Moratuwa |
Gunawardhana, Lahiru | University of Moratuwa |
Basnayake, Hirantha | University of Moratuwa, Sri Lanka |
Amarasinghe, Ranjith | University of Moratuwa |
Jayasekara, A.G.B.P. | University of Moratuwa |
Hanchapola Appuhamilage, Gihan Charith Premachandra | Singapore University of Technology and Design |
Tamura, Hiroki | University of Miyazaki |
Tan, U-Xuan | Singapore University of Technology and Design |
Keywords: Service Robotics, Human-Robot Collaboration, Medical Robots and Systems
Abstract: Introducing robotics into the healthcare sector revolutionizes medical services by providing advanced treatments, medication management, and robotic assistance while overcoming resource limitations. In the current healthcare domain, an intermediate robotic communication platform is essential for distributing medical services equitably, facilitating remote consultations, and maintaining the integrity of medical education, especially in rural areas and during pandemics. This work introduces ARIS, a multitasking medical service robot designed for telemedicine and for facilitating remote medical education activities such as ward rounds. A prototype called ARIS 1.0 was developed, comprising a three-wheeled omnidirectional mobile platform, a torso, and a novel movable neck mechanism with a face. The prototype robot can generate an online summarized report using its integrated language interaction and IoT-based vital sign extraction modules. ROS-based semi-autonomous navigation enables the robot to act as an assistive agent, allowing it to either accompany doctors or visit patients individually. Ultimately, ARIS 1.0 offers telepresence and regional-language capabilities, specifically Sinhala-based communication features. This enables inter-party communication among doctors, medical students, and patients. The functionalities of ARIS 1.0 were validated in an emulated indoor environment to evaluate their feasibility. The results indicate that ARIS 1.0 is feasible for providing remote medical services. Furthermore, the paper discusses several promising research directions related to the proposed concept.
|
|
16:30-18:00, Paper ThCT15-AX.6 | Add to My Program |
Design, Modeling and Analysis of a Spherical Parallel Continuum Manipulator for Nursing Robots |
|
Gong, Zhenhua | Soochow University |
Ning, Chuanxin | Soochow University |
Liang, Jiejunyi | Huazhong University of Science and Technology |
Zhang, Ting | Soochow University |
Keywords: Service Robotics, Medical Robots and Systems, Redundant Robots
Abstract: In the healthcare industry, nursing robots have made great contributions, assisting in the delivery of food and medicine as well as the movement and transfer of patients. However, traditional continuum manipulators often suffer from a limited workspace and weak carrying capacity. Compared with traditional rigid-link manipulators, continuum manipulators have the advantages of a small moment of inertia and high dexterity. This paper proposes an original cable-driven parallel continuum manipulator with spherical parallel mechanisms as the continuous segments. Due to the characteristics of spherical parallel mechanisms, the proposed cable-driven spherical parallel continuum manipulator offers many inherent advantages for nursing robots. The prototype is tested and analyzed, and the kinematics and statics are verified. The results show that the cable-driven spherical parallel continuum manipulator for nursing robots has low workspace requirements, is suitable for complex spaces, and can have a large carrying capacity.
|
|
16:30-18:00, Paper ThCT15-AX.7 | Add to My Program |
LeagTag: An Elongated High-Accuracy Fiducial Marker for Tight Spaces |
|
Tanaka, Hideyuki | National Institute of AIST |
Ogata, Kunihiro | National Institute of Advanced Industrial Science and Technology |
Keywords: Service Robotics, Mobile Manipulation, Sensor-based Control
Abstract: Fiducial markers enable reliable service robot control. In human-robot coexistence environments, efficient placement of square or circular markers can be challenging due to limited space. In this study, we developed a world-first elongated fiducial marker, capable of high-accuracy 6-DoF measurements and designed to be installable in tight spaces. We introduced two types of lenticular angle gauges to enhance pose estimation and developed new marker patterns and measurement algorithms to maintain recognition distance and accuracy. The proposed marker achieved a measurement accuracy of 0.1% position error and 0.5 deg orientation error. This technology will enhance the practicality and applicability of fiducial markers, contributing to the creation of robot-friendly spaces for future service robots.
|
|
16:30-18:00, Paper ThCT15-AX.8 | Add to My Program |
An Open and Flexible Robot Perception Framework for Mobile Manipulation Tasks |
|
Mania, Patrick | University of Bremen |
Stelter, Simon | Universität Bremen |
Kazhoyan, Gayane | University of Bremen |
Beetz, Michael | University of Bremen |
Keywords: Service Robotics, Perception for Grasping and Manipulation, Software, Middleware and Programming Environments
Abstract: Over the last years, powerful methods for solving specific perception problems such as object detection, pose estimation or scene understanding have been developed. While performing mobile manipulation actions, a robot's perception framework needs to execute a series of these methods in a specific sequence each time it receives a new perception task. Generating proficient combinations of vision methods to solve individual perception tasks remains a challenge, as the combination depends on the requirements of the task and the capabilities of the robot's hardware. In this paper, we propose RoboKudo, an open-source knowledge-enabled perception framework that leverages the strengths of the Unstructured Information Management (UIM) principle and the flexibility of Behavior Trees to model task-specific perception processes. The framework can combine state-of-the-art computer vision methods to satisfy the requirements of each perception task and scales to different robot platforms. The generality and effectiveness of the framework are evaluated in real world experiments where it solves various perception tasks in the context of mobile manipulation actions in a household domain. Code and additional material are available at https://robokudo.ai.uni-bremen.de/rkop.
|
|
16:30-18:00, Paper ThCT15-AX.9 | Add to My Program |
Toward Mass Customization of a Robot's Morphology Design for Improving Area Coverage |
|
Muthugala Arachchige, Viraj Jagathpriya Muthugala | Singapore University of Technology and Design |
Samarakoon Mudiyanselage, Bhagya Prasangi Samarakoon | Singapore University of Technology and Design |
Enjikalayil Abdulkader, Raihan | Singapore University of Technology and Design |
Elara, Mohan Rajesh | Singapore University of Technology and Design |
Keywords: Service Robotics, Product Design, Development and Prototyping, Mechanism Design
Abstract: Floor cleaning robots have been developed to cater to building maintenance needs. Complete area coverage is crucial for a floor cleaning robot, and its morphology design plays a vital role in realizing complete area coverage. However, floor cleaning robots with fixed morphologies have difficulty in achieving high area coverage performance. Mass customization of a robot's morphology would improve its productivity in terms of area coverage. This paper proposes a novel system that can be used for mass customizing the morphology of a robot to improve area coverage performance in an environment of interest. The customized morphology is determined through an optimization technique by considering the environment of interest and design constraints. The area coverage of a candidate morphology design is evaluated by simulating the robot's navigation in the environment of interest. Generalized pattern search, particle swarm optimization, and surrogate optimization are independently considered as optimization techniques. Experiments have been conducted considering realistic robot deployment cases. The statistical conclusions on the experimental results validate that the proposed system can synthesize a morphology that significantly improves area coverage performance in an environment of interest.
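To make the evaluate-and-optimize loop concrete, here is a deliberately simple sketch under stated assumptions (the coverage simulator is a hypothetical placeholder, and plain random search stands in for the pattern search, PSO, and surrogate optimizers the paper actually uses):

```python
# Hypothetical sketch: search morphology parameters for best simulated coverage.
import random

def simulate_coverage(width, length):
    # Placeholder for the navigation-simulation coverage evaluation.
    return 1.0 - abs(width - 0.35) - abs(length - 0.45)

def optimize_morphology(samples=200, bounds=((0.2, 0.6), (0.2, 0.6))):
    best, best_cov = None, float("-inf")
    for _ in range(samples):
        w = random.uniform(*bounds[0])     # candidate robot width [m]
        d = random.uniform(*bounds[1])     # candidate robot length [m]
        cov = simulate_coverage(w, d)      # score this candidate design
        if cov > best_cov:
            best, best_cov = (w, d), cov
    return best, best_cov

print(optimize_morphology())
```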
|
|
ThCT16-AX Oral Session, AX-204 |
Add to My Program |
Wearable Robotics II |
|
|
Chair: Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
Co-Chair: Mohammed, Samer | University of Paris Est Créteil - (UPEC) |
|
16:30-18:00, Paper ThCT16-AX.1 | Add to My Program |
Leaf-Inspired FSR Array and Insole-Type Sensor Module for Mobile Three-Dimensional Ground Reaction Force Estimation |
|
Kim, Taeyeon | Korea Advanced Institute of Science and Technology |
Song, Eunseok | Korea Advanced Institute of Science and Technology (KAIST) |
An, Seongbin | KAIST |
Choi, Hyunjin | Sangmyung University |
Kong, Kyoungchul | Korea Advanced Institute of Science and Technology |
Keywords: Wearable Robotics, Soft Sensors and Actuators
Abstract: This paper presents an insole-type sensor module with a novel leaf-inspired force-sensitive resistor (FSR) array for accurate three-dimensional ground reaction force (GRF) estimation during various human motions. Joint torque analysis, essential for numerous applications in biomechanics and wearable robotics, necessitates the measurement of three-dimensional GRF vector information, traditionally achieved in indoor environments using costly force plates. To overcome these limitations, this study proposes an alternative method by incorporating FSRs on three inclined planes within the insole. A vector scaling process transforms the force values from the FSRs into the three-dimensional force vector, enabling continuous and user-independent estimation of the GRF. The sensor module is integrated with machine learning, demonstrating its accuracy and usability in various motion scenarios. The results confirm the effectiveness of the leaf-inspired FSR array, opening possibilities for portable and cost-effective motion analysis systems.
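The vector-scaling idea admits a small sketch (the plane normals and scale below are hypothetical, not the paper's calibrated geometry): each FSR on an inclined plane measures force along that plane's normal, and the 3D GRF is approximated by summing the scaled per-plane vectors:

```python
# Hypothetical sketch of combining three inclined-plane FSR readings into a GRF.
import numpy as np

PLANE_NORMALS = np.array([        # assumed unit normals of the three planes
    [0.00,  0.50, 0.866],
    [0.43, -0.25, 0.866],
    [-0.43, -0.25, 0.866],
])

def estimate_grf(fsr_forces, scale=1.0):
    """fsr_forces: (3,) per-plane force magnitudes [N] -> 3D GRF vector."""
    f = np.asarray(fsr_forces, dtype=float)
    return scale * (f[:, None] * PLANE_NORMALS).sum(axis=0)

# Equal loading on all three planes yields a mostly vertical GRF.
print(estimate_grf([100.0, 100.0, 100.0]))
```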
|
|
16:30-18:00, Paper ThCT16-AX.2 | Add to My Program |
Human-Exoskeleton Locomotion Interaction Experience Transfer: Speeding up and Improving the Performance of Preference-Based Optimizations of Exoskeleton Assistance During Walking |
|
Li, Hongwu | Harbin Institute of Technology |
Liu, Junchen | Harbin Institute of Technology |
Wang, Ziqi | Harbin Institute of Technology |
Ju, Haotian | Harbin Institute of Technology |
Zheng, Tianjiao | Harbin Institute of Technology |
Gao, Yongsheng | Harbin Institute of Technology |
Zhao, Jie | Harbin Institute of Technology |
Zhu, Yanhe | Harbin Institute of Technology |
Keywords: Wearable Robotics
Abstract: Preference-based optimization methods have shown their advantages and potential in exploring individualized, comfortable, and effective control strategies and assistance parameters for exoskeletons during locomotion. Research indicates that, compared with naive wearers, knowledgeable wearers with abundant exoskeleton assistance experience have obvious advantages in speeding up the parameter exploration process and improving assistance performance. However, no existing method can utilize the human-exoskeleton locomotion interaction experience (HELIE) to assist naive wearers during the exploration process. In this work, we propose a novel preference-based human-exoskeleton locomotion interaction experience transfer (LIET) framework, which can speed up the exploration of human-preferred parameters and acquire more satisfying results for naive wearers via the HELIE acquired from knowledgeable wearers. In addition, based on the proposed LIET framework, we establish a mathematical expression of HELIE transfer during exoskeleton assistance. This will promote future research on utilizing HELIE for exoskeleton control parameter optimization. Finally, experiments demonstrate that the proposed LIET framework can speed up the exploration process and acquire more satisfying optimized results for naive wearers.
|
|
16:30-18:00, Paper ThCT16-AX.3 | Add to My Program |
Design of a Knee-Joint Exoskeleton to Reduce Misalignment in Both the Sagittal and Coronal Planes |
|
Sengupta, Shubhranil | Korea Advanced Institute of Science and Technology |
Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
Keywords: Wearable Robotics, Rehabilitation Robotics, Prosthetics and Exoskeletons
Abstract: Many individuals experience knee dysfunctions attributed to the natural aging process and degenerative conditions. To aid individuals in regaining knee functionality, supportive exoskeletons were designed to be affixed to both the shin and thigh. However, a common issue encountered in knee exoskeletons involves the misalignment of joints between the exoskeleton and the user, resulting in discomfort and potential injuries. To reduce misalignment with the knee joint, it is essential for the thigh and shin harnesses of the exoskeleton to replicate the natural trajectories of the knee. However, achieving this is a complex task due to the shifting center of rotation of the knee in both the Sagittal and Coronal planes. Previous knee exoskeletons primarily focus on aligning the joint in the Sagittal plane, neglecting alignment in the other dimension due to inherent design constraints. For the first time, this study introduces a knee-joint exoskeleton capable of conforming to the natural movement of the knee in both the Sagittal and Coronal planes, with the aim of minimizing joint misalignment without the use of inherently soft materials. A spherical scissor linkage mechanism (SSLM) was utilized in conjunction with a customized guide rail to adjust the center of rotation of the SSLM. This configuration facilitates knee flexion/extension while accommodating the knee joint's center of rotation in both the Sagittal and Coronal planes. The experimental outcomes demonstrated a substantial reduction in misalignment with the knee when compared to a commercial knee-support brace with a one-degree-of-freedom revolute joint.
|
|
16:30-18:00, Paper ThCT16-AX.4 | Add to My Program |
Adaptive Active Disturbance Rejection Control of an Actuated Ankle Foot Orthosis for Ankle Movement Assistance |
|
Jradi, Rami | UPEC |
Rifai, Hala | University of Paris Est Créteil |
Mohammed, Samer | University of Paris Est Créteil - (UPEC) |
Keywords: Wearable Robotics, Robust/Adaptive Control, Motion Control
Abstract: Foot-drop (FD) is a post-stroke gait disorder characterized by impaired foot lifting during the swing phase. This paper focuses on providing continuous ankle joint assistance throughout the gait cycle using an actuated ankle foot orthosis (AAFO). The control strategy is based on an adaptive active disturbance rejection controller (AADRC), such that the orthosis provides only the required amount of assistance to complement the human effort needed to perform the walking activity. The proposed controller exhibits adaptability, making it suitable for various subjects without the need for prior parameter identification. To demonstrate its effectiveness, the control strategy is experimentally validated with five healthy subjects and compared to state-of-the-art controllers.
|
|
16:30-18:00, Paper ThCT16-AX.5 | Add to My Program |
A Novel Funnel-Based L1 Adaptive Fuzzy Approach for the Control of an Actuated Ankle Foot Orthosis |
|
Bey, Oussama | University Paris-Est Créteil - UPEC |
Jradi, Rami | UPEC |
Moon, Huiseok | LISSI-Lab, Université de Paris-Est Créteil (UPEC) |
Rifai, Hala | University of Paris Est Créteil |
Das Sharma, Kaushik | University of Calcutta |
Amirat, Yacine | University of Paris Est Créteil (UPEC) |
Mohammed, Samer | University of Paris Est Créteil - (UPEC) |
Keywords: Wearable Robotics, Prosthetics and Exoskeletons, Physically Assistive Devices
Abstract: This paper introduces a novel funnel-based adaptive L1 fuzzy control strategy for assisting ankle joint movement during walking using an actuated ankle foot orthosis (AAFO). A projection-based adaptation mechanism employing a fuzzy system is used to estimate the unknown time-varying parameters of the L1 control law, ensuring precise tracking of the AAFO-wearer system by the state estimator. The projection operator guarantees the convergence of the parameters while keeping the assistance torque bounded. Funnel-based feedback control is used to mitigate the time lag typically seen in L1-based approaches due to the low-pass filter they commonly employ. The effectiveness of the proposed control strategy is demonstrated through experiments involving five healthy subjects.
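For background, one standard form of the projection operator used in L1-type adaptive laws is sketched below; this is textbook material from the adaptive control literature, not an equation reproduced from the paper. Here theta is the parameter estimate, y the unconstrained adaptation signal, and f a smooth convex function bounding the admissible parameter set:

```latex
\mathrm{Proj}(\theta, y) =
\begin{cases}
  y - \dfrac{\nabla f(\theta)\, \nabla f(\theta)^{\top}}
            {\lVert \nabla f(\theta) \rVert^{2}}\, y\, f(\theta),
    & \text{if } f(\theta) > 0 \ \text{and}\ y^{\top} \nabla f(\theta) > 0,\\[6pt]
  y, & \text{otherwise.}
\end{cases}
```

The deflection term removes the component of y that would push theta outside the bound, which is how the operator keeps the estimated parameters (and hence the assistance torque) bounded.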
|
|
16:30-18:00, Paper ThCT16-AX.6 | Add to My Program |
Pneumatic Back Exoskeleton for Lifting Posture Detection and Correction |
|
Chen, Yu | Nanyang Technological University |
Wang, Minda | Nanyang Technological University |
Wang, Yifan | Nanyang Technological University |
Keywords: Wearable Robotics, Prosthetics and Exoskeletons, Soft Robot Applications
Abstract: Low back pain is a widespread issue that affects people worldwide and can lead to serious conditions such as herniated discs, spinal stenosis, or lumbar radiculopathy. Improper posture while lifting heavy weights is a common cause of back pain, especially among laborers. However, current back exoskeletons are often bulky and require electric motors, making them challenging to use and consuming significant power. Some passive exoskeletons don’t require power, but their fixed stiffness constrains normal motion. This paper presents a novel solution: a pneumatic back exoskeleton made of structured fabrics that can adjust stiffness under various air pressures. Additionally, it includes IMU sensors to detect lifting posture and correct it in real time. The exoskeleton’s effectiveness was tested through lifting experiments, demonstrating that it significantly corrects lifting posture, reduces stress on the lumbar spine, and mitigates back muscle stress. This pneumatic back exoskeleton offers a promising solution to prevent low back pain during weight-lifting tasks and provides guidance for future back exoskeleton designs.
|
|
ThCT18-AX Oral Session, AX-206 |
Add to My Program |
Representation Learning II |
|
|
Chair: So, Peter | Technical University of Munich |
Co-Chair: Fuxin, Li | Oregon State University |
|
16:30-18:00, Paper ThCT18-AX.1 | Add to My Program |
CITR: A Coordinate-Invariant Task Representation for Robotic Manipulation |
|
So, Peter | Technical University of Munich |
Cabral Muchacho, Rafael Ignacio | KTH Royal Institute of Technology |
Kirschner, Robin Jeanne | TU Munich, Institute for Robotics and Systems Intelligence |
Swikir, Abdalla | Technical University of Munich |
Figueredo, Luis | Technical University of Munich (TUM) |
Abu-Dakka, Fares | Mondragon University |
Haddadin, Sami | Technical University of Munich |
Keywords: Representation Learning, Learning Categories and Concepts, Dexterous Manipulation
Abstract: The basis for robotics skill learning is an adequate representation of manipulation tasks based on their physical properties. As manipulation tasks are inherently invariant to the choice of reference frame, an ideal task representation would also exhibit this property. Nevertheless, most robotic learning approaches use unprocessed, coordinate-dependent robot state data for learning new skills, thus inducing challenges regarding the interpretability and transferability of the learned models. In this paper, we propose a transformation from spatial measurements to a coordinate-invariant feature space, based on the pairwise inner product of the input measurements. We describe and mathematically deduce the concept, establish the task fingerprints as an intuitive image-based representation, experimentally collect task fingerprints, and demonstrate the usage of the representation for task classification. This representation motivates further research on data-efficient and transferable learning methods for online manipulation task classification and task-level perception.
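The coordinate-invariance property is easy to verify numerically: the matrix of pairwise inner products (a Gram matrix) is unchanged by any orthonormal change of reference frame. A minimal sketch with illustrative variable names, not the paper's code:

```python
# The Gram matrix of spatial measurements is invariant to frame rotations.
import numpy as np

def task_fingerprint(measurements):
    """Pairwise inner products of spatial measurements (T, 3) -> (T, T)."""
    X = np.asarray(measurements)
    return X @ X.T

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                    # e.g., force/position samples
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthonormal frame change
assert np.allclose(task_fingerprint(X), task_fingerprint(X @ Q.T))
print(task_fingerprint(X).shape)               # (5, 5) image-like fingerprint
```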
|
|
16:30-18:00, Paper ThCT18-AX.2 | Add to My Program |
SlotGNN: Unsupervised Discovery of Multi-Object Representations and Visual Dynamics |
|
Rezazadeh, Alireza | University of Minnesota |
Badithela, Athreyi | University of Minnesota - Twin Cities |
Desingh, Karthik | University of Minnesota |
Choi, Changhyun | University of Minnesota, Twin Cities |
Keywords: Representation Learning, Deep Learning for Visual Perception, Visual Learning
Abstract: Learning multi-object dynamics from visual data using unsupervised techniques is challenging due to the need for robust object representations that can be learned through robot interactions. This paper presents a novel framework with two new architectures: SlotTransport for discovering object representations from RGB images and SlotGNN for predicting their collective dynamics from RGB images and robot interactions. Our SlotTransport architecture is based on slot attention for unsupervised object discovery and uses a feature transport mechanism to maintain temporal alignment in object-centric representations. This enables the discovery of slots that consistently reflect the composition of multi-object scenes. These slots robustly bind to distinct objects, even under heavy occlusion or absence. Our SlotGNN, a novel unsupervised graph-based dynamics model, predicts the future state of multi-object scenes. SlotGNN learns a graph representation of the scene using the discovered slots from SlotTransport and performs relational and spatial reasoning to predict the future appearance of each slot conditioned on robot actions. We demonstrate the effectiveness of SlotTransport in learning object-centric features that accurately encode both visual and positional information. Further, we highlight the accuracy of SlotGNN in downstream robotic tasks, including challenging multi-object rearrangement and long-horizon prediction. Finally, our unsupervised approach proves effective in the real world. With only minimal additional data, our framework robustly predicts slots and their corresponding dynamics in real-world control tasks.
|
|
16:30-18:00, Paper ThCT18-AX.3 | Add to My Program |
What Do We Learn from a Large-Scale Study of Pre-Trained Visual Representations in Sim and Real Environments? |
|
Silwal, Sneha | Meta |
Yadav, Karmesh | Georgia Tech |
Wu, Tingfan | Meta AI |
Vakil, Jay | Meta |
Majumdar, Arjun | Georgia Institute of Technology |
Arnaud, Sergio | Meta |
Chen, Claire | Stanford University |
Berges, Vincent-Pierre | Meta AI Research |
Batra, Dhruv | Georgia Tech / Facebook AI Research |
Rajeswaran, Aravind | Meta AI |
Kalakrishnan, Mrinal | Meta |
Meier, Franziska | Facebook |
Maksymets, Oleksandr | Facebook AI Research |
Keywords: Representation Learning, Deep Learning for Visual Perception, Vision-Based Navigation
Abstract: We present a large empirical investigation on the use of pre-trained visual representations (PVRs) for training downstream policies that execute real-world tasks. Our study involves five different PVRs, each trained for five distinct manipulation or indoor navigation tasks. We performed this evaluation using three different robots and two different policy learning paradigms. From this effort, we arrive at three insights: 1) the performance trends of PVRs in simulation are generally indicative of their trends in the real world, 2) the use of PVRs enables a first-of-its-kind result with indoor ImageNav (zero-shot transfer to a held-out scene in the real world), and 3) the benefits from variations in PVRs, primarily data-augmentation and fine-tuning, also transfer to real-world performance.
|
|
16:30-18:00, Paper ThCT18-AX.4 | Add to My Program |
L-DYNO: Framework to Learn Consistent Visual Features Using Robot’s Motion |
|
Singh, Kartikeya | University at Buffalo |
Adhivarahan, Charuvahan | University at Buffalo, State University of New York |
Dantu, Karthik | University at Buffalo |
Keywords: Representation Learning, Deep Learning for Visual Perception, Localization
Abstract: Historically, feature-based approaches have been used extensively for camera-based robot perception tasks such as localization, mapping, tracking, and others. Several of these approaches also combine other sensors (inertial sensing, for example) to perform combined state estimation. Our work rethinks this approach; we present a representation learning mechanism that identifies visual features that best correspond to robot motion as estimated by an external signal. Specifically, we utilize the robot's transformations through an external signal (inertial sensing, for example) and give attention to the image space that is most consistent with the external signal. We use a pairwise consistency metric as a representation to keep the visual features consistent through a sequence with the robot's relative pose transformations. This approach enables us to incorporate information from the robot's perspective instead of solely relying on image attributes. We evaluate our approach on real-world datasets such as KITTI and EuRoC and compare the refined features with existing feature descriptors. We also evaluate our method in a real-robot experiment. We observe an average 49% reduction in the image search space without compromising trajectory estimation accuracy. Our method reduces the execution time of visual odometry by 4.3% and also reduces reprojection errors. We demonstrate the need to select only the most important features and show competitive performance against various feature detection baselines.
|
|
16:30-18:00, Paper ThCT18-AX.5 | Add to My Program |
Point Cloud Models Improve Visual Robustness in Robotic Learners |
|
Peri, Skand | Oregon State University |
Lee, Iain | University of Utah |
Kim, Chanho | Oregon State University |
Fuxin, Li | Oregon State University |
Hermans, Tucker | University of Utah |
Lee, Stefan | Oregon State University |
Keywords: Representation Learning, Reinforcement Learning
Abstract: Visual control policies can encounter significant performance degradation when visual conditions like lighting or camera position differ from those seen during training -- often exhibiting sharp declines in capability even for minor differences. In this work, we examine robustness to a suite of these types of visual changes for RGB-D and point cloud based visual control policies. To perform these experiments on both model-free and model-based reinforcement learners, we introduce a novel Point Cloud World Model (PCWM) and point cloud based control policies. Our experiments show that policies that explicitly encode point clouds are significantly more robust than their RGB-D counterparts. Further, we find our proposed PCWM significantly outperforms prior works in terms of sample efficiency during training. Taken together, these results suggest reasoning about the 3D scene through point clouds can improve performance, reduce learning time, and increase robustness for robotic learners.
|
|
16:30-18:00, Paper ThCT18-AX.6 | Add to My Program |
HIO-SDF: Hierarchical Incremental Online Signed Distance Fields |
|
Vasilopoulos, Vasileios | Samsung Research America |
Garg, Suveer | University of Pennsylvania |
Huh, Jinwook | Samsung |
Lee, Bhoram | SRI International |
Isler, Volkan | University of Minnesota |
Keywords: Representation Learning, Incremental Learning
Abstract: A good representation of a large, complex mobile robot workspace must be space-efficient yet capable of encoding relevant geometric details. When exploring unknown environments, it needs to be updatable incrementally in an online fashion. We introduce HIO-SDF, a new method that represents the environment as a Signed Distance Field (SDF). State-of-the-art representations of SDFs are based on either neural networks or voxel grids. Neural networks are capable of representing the SDF continuously. However, they are hard to update incrementally, as neural networks tend to forget previously observed parts of the environment unless an extensive sensor history is stored for training. Voxel-based representations do not have this problem, but they are not space-efficient, especially in large environments with fine details. HIO-SDF combines the advantages of these representations using a hierarchical approach which employs a coarse voxel grid that captures the observed parts of the environment together with high-resolution local information to train a neural network. HIO-SDF achieves a 46% lower mean global SDF error across all test scenes than a state-of-the-art continuous representation, and a 30% lower error than a discrete representation at the same resolution as our coarse global SDF grid. Videos and code are available at: https://samsunglabs.github.io/HIO-SDF-project-page/
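The hierarchy's coarse half is easy to make concrete: a trilinear lookup into the global voxel grid, whose values (together with high-resolution local samples) would supervise the continuous network. A minimal sketch, not the released HIO-SDF code:

```python
# Trilinear interpolation of a coarse SDF voxel grid at a query point.
import numpy as np

def trilinear_sdf(grid, origin, voxel, p):
    u = (np.asarray(p, dtype=float) - origin) / voxel  # continuous voxel coords
    i = np.floor(u).astype(int)                        # lower corner index
    f = u - i                                          # fractional offsets
    val = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((f[0] if dx else 1 - f[0]) *
                     (f[1] if dy else 1 - f[1]) *
                     (f[2] if dz else 1 - f[2]))
                val += w * grid[i[0] + dx, i[1] + dy, i[2] + dz]
    return val

# Toy grid: SDF of a sphere of radius 2 voxels centered in an 8^3 grid.
ax = np.arange(8)
X, Y, Z = np.meshgrid(ax, ax, ax, indexing="ij")
grid = np.sqrt((X - 3.5)**2 + (Y - 3.5)**2 + (Z - 3.5)**2) - 2.0
print(trilinear_sdf(grid, origin=np.zeros(3), voxel=1.0, p=[3.5, 3.5, 3.5]))
```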
|
|
16:30-18:00, Paper ThCT18-AX.7 | Add to My Program |
Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders for Manipulation Policies |
|
Qian, Jianing | University of Pennsylvania |
Panagopoulos, Anastasios | University of Pennsylvania |
Jayaraman, Dinesh | University of Pennsylvania |
Keywords: Representation Learning, Imitation Learning, Sensorimotor Learning
Abstract: Generic re-usable pre-trained image representation encoders have become a standard component of methods for many computer vision tasks. As visual representations for robots however, their utility has been limited, leading to a recent wave of efforts to pre-train robotics-specific image encoders that are better suited to robotic tasks than their generic counterparts. We propose Scene Objects From Transformers, abbreviated as SOFT, a wrapper around pre-trained vision transformer (PVT) models that bridges this gap without any further training. Rather than construct representations out of only the final layer activations, SOFT individuates and locates object-like entities from PVT attentions, and describes them with PVT activations, producing an object-centric representation. Across standard choices of generic pre-trained vision transformers PVT, we demonstrate in each case that policies trained on SOFT(PVT) far outstrip standard PVT representations for manipulation tasks in simulated and real settings, approaching the state-of-the-art robotics-aware representations. Appendix and videos: https://sites.google.com/view/robot-soft/
|
|
16:30-18:00, Paper ThCT18-AX.8 | Add to My Program |
NeRF-Loc: Transformer-Based Object Localization within Neural Radiance Fields |
|
Sun, Jiankai | Stanford University |
Xu, Yan | The Chinese University of Hong Kong |
Ding, Mingyu | UC Berkeley |
Yi, Hongwei | Max Planck Institute for Intelligent Systems |
Wang, Chen | Stanford University |
Wang, Jingdong | Baidu |
Zhang, Liangjun | Baidu |
Schwager, Mac | Stanford University |
Keywords: Representation Learning, Semantic Scene Understanding, Computer Vision for Automation
Abstract: Neural Radiance Fields (NeRFs) have become a widely-applied scene representation technique in recent years, showing advantages for robot navigation and manipulation tasks. To further advance the utility of NeRFs for robotics, we propose a transformer-based framework, NeRF-Loc, to extract 3D bounding boxes of objects in NeRF scenes. NeRF-Loc takes a pre-trained NeRF model and camera view as input and produces labeled, oriented 3D bounding boxes of objects as output. Using current NeRF training tools, a robot can train a NeRF environment model in real-time and, using our algorithm, identify 3D bounding boxes of objects of interest within the NeRF for downstream navigation or manipulation tasks. Concretely, we design a pair of parallel transformer encoder branches, namely the coarse stream and the fine stream, to encode both the context and details of target objects. The encoded features are then fused together with attention layers to alleviate ambiguities for accurate object localization. We have compared our method with conventional RGB(-D) based methods that take rendered RGB images and depths from NeRFs as inputs. Our method achieves state-of-the-art performance. In addition, we also present the first NeRF samples-based object localization benchmark, NeRFLocBench. We will make the benchmark and code publicly available.
|
|
ThCT19-NT Oral Session, NT-G301 |
Add to My Program |
Surgical Robotics III |
|
|
Chair: Huang, Huang | University of California at Berkeley |
Co-Chair: Iordachita, Ioan Iulian | Johns Hopkins University |
|
16:30-18:00, Paper ThCT19-NT.1 | Add to My Program |
Iterative PnP and Its Application in 3D-2D Vascular Image Registration for Robot Navigation |
|
Song, Jingwei | University of Michigan |
Yang, Keke | United Imaging |
Zhang, Zheng | Institute of Medical Imaging Technology, School of Biomed |
Li, Meng | Shanghai United Imaging Healthcare Co., Ltd |
Cao, Tuoyu | United Imaging Healthcare |
Ghaffari, Maani | University of Michigan |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Localization, Computer Vision for Medical Robotics
Abstract: This paper reports on a new real-time robot-centered 3D-2D vascular image alignment algorithm, which is robust to outliers and can align nonrigid shapes. Few works have managed to achieve both real-time and accurate performance for vascular intervention robots. This work bridges high-accuracy 3D-2D registration techniques and the computational efficiency requirements of intervention robot applications. We cast centerline-based vascular 3D-2D image registration as an iterative Perspective-n-Point (PnP) problem and propose using the Levenberg-Marquardt solver on the Lie manifold. Then, the recently developed Reproducing Kernel Hilbert Space (RKHS) algorithm is introduced to overcome the "big-to-small" problem in typical robotic scenarios. Finally, iteratively reweighted least squares is applied to solve the RKHS-based formulation efficiently. Experiments indicate that the proposed algorithm performs registration at over 50 Hz (rigid) and 20 Hz (nonrigid) and obtains registration accuracy competitive with other works. The results indicate that our Iterative PnP is suitable for future vascular intervention robot applications.
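For background, the iteratively reweighted least squares step the abstract mentions follows a standard pattern, sketched below with Huber weights on a generic linear model r = Ax - b; the paper's actual formulation is RKHS-based and solved on the Lie manifold, which this toy version does not reproduce:

```python
# Generic IRLS with Huber weights (illustrative, not the paper's solver).
import numpy as np

def irls(A, b, delta=1.0, iters=20):
    x = np.linalg.lstsq(A, b, rcond=None)[0]   # ordinary LS initialisation
    for _ in range(iters):
        r = A @ x - b                          # current residuals
        absr = np.maximum(np.abs(r), 1e-12)    # avoid division by zero
        w = np.where(absr <= delta, 1.0, delta / absr)  # Huber weights
        sw = np.sqrt(w)                        # solve the reweighted LS problem
        x = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)[0]
    return x

# Outliers in b barely move the IRLS estimate, unlike plain least squares.
A = np.vstack([np.ones(10), np.arange(10.0)]).T
b = A @ np.array([1.0, 2.0]); b[3] += 50.0     # one gross outlier
print(irls(A, b))                              # close to [1, 2]
```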
|
|
16:30-18:00, Paper ThCT19-NT.2 | Add to My Program |
Sim2Real Transfer of Reinforcement Learning for Concentric Tube Robots |
|
Iyengar, Keshav Kannan | University College London |
Sadati, S.M.Hadi | King's College London |
Bergeles, Christos | King's College London |
Spurgeon, Sarah | University College London |
Stoyanov, Danail | University College London |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Machine Learning for Robot Control, Reinforcement Learning
Abstract: Concentric Tube Robots (CTRs) are promising for minimally invasive interventions due to their miniature diameter, high dexterity, and compliance with soft tissue. CTRs comprise individual pre-curved tubes, usually composed of NiTi, arranged concentrically. As each tube is relatively rotated and translated, the backbone elongates, twists, and bends with a dexterity that is advantageous in confined spaces. Tube interactions, unmodelled phenomena, and inaccurate tube parameter estimation make physical modeling of CTRs challenging, complicating in turn kinematics and control. Deep reinforcement learning (RL) has been investigated as a solution. However, hardware validation has remained a challenge due to differences between the simulation and hardware domains. In this work, domain randomization is proposed as a strategy for transferring a policy trained only in simulation to hardware, with no additionally acquired physical training data. The differences in simulation and hardware forward kinematics accuracy and precision are characterized by errors of 14.74 +/- 8.87 mm, or 26.61 +/- 17.00% of robot length. We show that the proposed domain randomization approach reduces mean errors by 56% compared to no domain randomization. Furthermore, we demonstrate path-following capability in hardware on a line path, with resulting errors of 4.37 +/- 2.39 mm, or 5.61 +/- 3.11% of robot length.
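Domain randomization itself is a small loop (parameter names, ranges, and the simulator call below are illustrative assumptions, not the paper's values): each training episode perturbs the simulated tube parameters so the learned policy cannot overfit to one parameter set:

```python
# Hypothetical sketch of per-episode CTR parameter randomization.
import numpy as np

rng = np.random.default_rng(0)

def randomize_tube_params(nominal, spread=0.1):
    """Perturb nominal tube parameters uniformly within +/-spread."""
    return {k: v * rng.uniform(1 - spread, 1 + spread)
            for k, v in nominal.items()}

nominal = {"curvature": 5.0, "length": 0.15, "stiffness": 1.0}
for episode in range(3):
    params = randomize_tube_params(nominal)
    # env.reset(tube_params=params)        # hypothetical simulator call
    print(episode, params)
```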
|
|
16:30-18:00, Paper ThCT19-NT.4 | Add to My Program |
A Kinetostatic Model for Concentric Push-Pull Robots |
|
Childs, Jake | EndoTheia, Inc |
Rucker, Caleb | University of Tennessee |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Medical Robots and Systems, Kinematics, Continuum Robots
Abstract: Concentric push-pull robots (CPPR) operate through the mechanical interactions of concentrically nested, laser-cut tubes with offset stiffness centers. The distal tips of the tubes are attached to each other, and relative displacement of the tube bases generates bending in the CPPR. Previous CPPR kinematic models assumed two tubes, planar shapes, no torsion, and no external loads. In this paper, we develop a new, more general CPPR model accounting for any number of tubes, describing their variable-curvature 3D shape when actuated, including the effects of torsion and external loads. To accomplish this, we employ a modified Kirchhoff rod model for each tube (with offset stiffness center) and embed the constraints of concentricity. We use an energy method to determine robot shape as a function of actuation and external loading. We experimentally validate this kinetostatic model on prototype CPPRs with two tubes and three tubes and non-constant laser-cut patterns that create variable curvature and stiffness. Experimental results agree with the model, paving the way for use of this model in design optimization, planning, and control of CPPRs.
|
|
16:30-18:00, Paper ThCT19-NT.5 | Add to My Program |
Fully Distributed Shape Sensing of a Flexible Surgical Needle Using Optical Frequency Domain Reflectometry for Prostate Interventions |
|
Francoeur, Jacynthe | Polytechnique Montréal |
Lezcano, Dimitri A. | Johns Hopkins University |
Zhetpissov, Yernar | Johns Hopkins University |
Kashyap, Raman | Polytechnique Montreal |
Iordachita, Ioan Iulian | Johns Hopkins University |
Kadoury, Samuel | Polytechnique Montréal |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Medical Robots and Systems, Visual Tracking
Abstract: In minimally invasive procedures such as biopsies and prostate cancer brachytherapy, accurate needle placement remains challenging due to limitations in current tracking methods related to interference, reliability, resolution or image contrast. This often leads to frequent needle adjustments and reinsertions. To address these shortcomings, we introduce an optimized needle shape-sensing method using a fully distributed grating-based sensor. The proposed method uses simple trigonometric and geometric modeling of the fiber using optical frequency domain reflectometry (OFDR), without requiring prior knowledge of tissue properties or needle deflection shape and amplitude. Our optimization process includes a reproducible calibration process and a novel tip curvature compensation method. We validate our approach through experiments in artificial isotropic and inhomogeneous animal tissues, establishing ground truth using 3D stereo vision and cone beam computed tomography (CBCT) acquisitions, respectively. Our results yield an average RMSE ranging from 0.58 ± 0.21 mm to 0.66 ± 0.20 mm depending on the chosen spatial resolution, achieving the submillimeter accuracy required for interventional procedures.
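The underlying geometric idea is compact enough to sketch in the planar case (the paper's method is 3D and OFDR-specific; this toy version only illustrates integrating curvature into shape):

```python
# Planar sketch: integrate distributed curvature samples into a needle shape.
import numpy as np

def shape_from_curvature(kappa, ds):
    """kappa: (N,) curvature samples [1/m]; ds: arclength step [m]."""
    theta = np.cumsum(kappa) * ds          # tangent angle along the fiber
    x = np.cumsum(np.sin(theta)) * ds      # lateral deflection
    z = np.cumsum(np.cos(theta)) * ds      # insertion depth
    return np.stack([x, z], axis=1)

# Constant curvature of 2 1/m over 120 mm reconstructs a circular arc.
points = shape_from_curvature(np.full(120, 2.0), ds=1e-3)
print(points[-1])                          # approximate tip position [m]
```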
|
|
16:30-18:00, Paper ThCT19-NT.6 | Add to My Program |
Integrated Magnetic Location Sensing and Actuation of Steerable Robotic Catheters for Peripheral Arterial Disease Treatment |
|
Wu, Jingjie | The University of Texas at Austin |
Yu, Kevin | University of Texas at Austin |
Lopez, Ithza | University of Texas at Austin |
Aguilar Izquierdo, Alexa | The University of Texas at Austin |
Saber, Hamidreza | The University of Texas at Austin |
Alambeigi, Farshid | University of Texas at Austin |
Zhou, Lei | University of Wisconsin-Madison |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Medical Robots and Systems
Abstract: Magnetically steerable robotic catheters (MSRC) are a promising technology for percutaneous endovascular intervention procedures to treat peripheral arterial diseases, where magnetic actuation is used to steer the catheter tip during navigation. However, today's MSRC systems require fluoroscopic imaging for catheter localization during navigation, which risks creating radiation-induced injuries to both the patient and the surgeon. Aiming to reduce the duration of x-ray radiation in interventions using MSRCs, this letter introduces a new steerable robotic catheter system that integrates magnetic location sensing and magnetic actuation. The proposed catheter uses a magnetic tip to enable magnetic steering. In addition, a cylindrical array of magnetic sensors is used to measure the field from the catheter tip to enable real-time catheter localization. To improve localization accuracy, a novel nested calibration algorithm for sensor positions and magnet dipole strength is introduced. This letter further proposes a novel integration of magnetic actuation and magnetic localization in MSRC systems, where fluoroscopic imaging is only required during catheter steering at bifurcations in the vasculature. The proposed methodology is tested with an MSRC prototype, where the magnet location estimation algorithm is implemented for real-time visual feedback to the operator with a low latency of 400 ms. Experiments show that an average localization error of 0.95 mm can be achieved.
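Localization from such a cylindrical sensor array typically inverts the standard point-dipole field model for the tip magnet; the textbook form below is background material, not an equation taken from the letter. Here m is the tip's dipole moment and p_tip its position:

```latex
\mathbf{B}(\mathbf{p}_{\text{sensor}})
  = \frac{\mu_0}{4\pi \lVert \mathbf{r} \rVert^{3}}
    \left( 3\,\hat{\mathbf{r}}\hat{\mathbf{r}}^{\top} - \mathbf{I} \right)\mathbf{m},
\qquad
\mathbf{r} = \mathbf{p}_{\text{sensor}} - \mathbf{p}_{\text{tip}},
\quad
\hat{\mathbf{r}} = \mathbf{r} / \lVert \mathbf{r} \rVert .
```

Estimating p_tip then amounts to a nonlinear least-squares fit of this model to the array's field measurements.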
|
|
16:30-18:00, Paper ThCT19-NT.7 | Add to My Program |
Semi-Autonomous Robotic Manipulator for Minimally Invasive Aortic Valve Replacement |
|
Tamadon, Izadyar | University of Twente |
Sadati, S.M.Hadi | King's College London |
Mamone, Virginia | University of Pisa, EndoCAS |
Ferrari, Vincenzo | Università di Pisa |
Bergeles, Christos | King's College London |
Menciassi, Arianna | Scuola Superiore Sant'Anna - SSSA |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Motion Control of Manipulators, Force and Tactile Sensing, Visual Servoing
Abstract: Aortic valve surgery is the preferred procedure for replacing a damaged valve with an artificial one. The ValveTech robotic platform comprises a flexible articulated manipulator and a surgical interface supporting the effective delivery of an artificial valve by tele-operation and endoscopic vision. This manuscript presents our recent work on force-perceptive, safe, semi-autonomous navigation of the ValveTech platform prior to valve implantation. First, we present a force observer that transfers forces from the manipulator body and tip to a haptic interface. Second, we demonstrate how hybrid forward/inverse mechanics together with endoscopic visual servoing lead to autonomous valve positioning. Benchtop experiments and an artificial phantom quantify the performance of the developed robot controller and navigator. Valves can be autonomously delivered with a 2.0±0.5 mm position error and minimal misalignment of 3.4±0.9°. The hybrid Force/Shape Observer (FSO) algorithm was able to predict distributed external forces on the articulated manipulator body with an average error of 0.09 N. FSO can also estimate loads on the tip with an average accuracy of 3.3%. The presented system can lead to better
|
|
16:30-18:00, Paper ThCT19-NT.8 | Add to My Program |
Robotic Needle Insertion with 2D Ultrasound – 3D CT Fusion Guidance (I) |
|
Lei, Long | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences |
Zhao, Baoliang | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences |
Qi, Xiaozhi | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences |
Mi, Rui | Department of Radiology, Shenzhen University General Hospital, Shenzhen |
Ye, Hai | Department of Radiology, Shenzhen University General Hospital, Shenzhen |
Zhang, Peng | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences |
Wang, Qiong | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences |
Heng, Pheng Ann | The Chinese University of Hong Kong |
Hu, Ying | Shenzhen Institute of Advanced Technology, Shenzhen, China |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Software Architecture for Robotic and Automation, Medical Robots and Systems
Abstract: Puncture robots pave a new way for stable, accurate, and safe percutaneous liver tumor puncture operations. However, affected by respiratory motion, accurate intraoperative localization of the tumor and its surrounding anatomical structures remains a difficult problem in existing robot-assisted puncture operations. In this paper, a dual-arm robotic needle insertion system guided by the fusion of intraoperative 2D ultrasound (US) and preoperative 3D computed tomography (CT) is proposed, addressing the shortcomings of existing puncture robots. To deal with the challenge of cross-modal and cross-dimensional registration between 2D US and 3D CT, a decoupled two-stage registration approach is proposed, combining initial vessel-structure-based 3D US – 3D CT registration with intraoperative intensity-based 2D US – 3D US registration. To achieve fast and robust ultrasound probe calibration, a method based on an improved N-wire phantom is proposed. Twenty puncture experiments are performed at different breath-holding positions on a respiratory motion simulation platform, and the experimental results show that the mean puncture error is 2.48 mm, which can meet the requirements of a wide range of clinical scenarios.
|
|
ThCT20-NT Oral Session, NT-G302 |
Add to My Program |
Software for Robotic and Automation |
|
|
Chair: Oldemeyer, Carsten | German Aerospace Center |
Co-Chair: Reiher, Lennart | RWTH Aachen University |
|
16:30-18:00, Paper ThCT20-NT.1 | Add to My Program |
MoRC - a Modular Robot Controller |
|
Oldemeyer, Carsten | German Aerospace Center |
Hellerer, Matthias | German Aerospace Center |
Reiner, Matthias | German Aerospace Center |
Thiele, Bernhard | German Aerospace Center |
Weber, Patrick | German Aerospace Center (DLR) |
Bellmann, Tobias | German Aerospace Center |
Keywords: Software Architecture for Robotic and Automation, Control Architectures and Programming, Industrial Robots
Abstract: MoRC is a high-performance modular robot controller based on the Functional Mock-up Interface (FMI) standard. The goal is to control any (industrial) robot with electrical drives using a customizable vendor-agnostic control cabinet and an innovative, self-developed software architecture based on exchangeable multi-rate real-time control components with standardized interfaces. On the hardware side, the use of EtherCAT (Ethernet for Control Automation Technology) allows connecting a freely selectable number of COTS (commercial off-the-shelf) electrical drives and sensors. On the software side, this is matched with exchangeable control software modules based on the FMI standard. These can be interconnected to form user-defined multi-rate control structures, which can be executed as synchronized real-time threads on a central Linux-based multi-core computing unit. This unlocks additional computational potential for advanced high-frequency control algorithms. Control structures can be switched at runtime to handle highly diverse control tasks. This paper presents the architectural concepts as well as first experiments on an industrial robot testbed.
|
|
16:30-18:00, Paper ThCT20-NT.2 | Add to My Program |
Enabling the Deployment of Any-Scale Robotic Applications in Microservice Architectures through Automated Containerization |
|
Busch, Jean-Pierre | RWTH Aachen University |
Reiher, Lennart | RWTH Aachen University |
Eckstein, Lutz | Institute for Automotive Engineering, RWTH Aachen University |
Keywords: Software Architecture for Robotic and Automation, Software Tools for Robot Programming, Intelligent Transportation Systems
Abstract: In an increasingly automated world – from warehouse robots to self-driving cars – streamlining the development and deployment process and operations of robotic applications becomes ever more important. Automated DevOps processes and microservice architectures have already proven successful in other domains such as large-scale customer-oriented web services (e.g., Netflix). We recommend employing similar microservice architectures for the deployment of small- to large-scale robotic applications in order to accelerate development cycles, loosen functional dependence, and improve resiliency and elasticity. In order to facilitate the involved DevOps processes, we present and release a tooling suite for automating the development of microservices for robotic applications based on the Robot Operating System (ROS). Our tooling suite covers the automated minimal containerization of ROS applications, a collection of useful machine learning-enabled base container images, as well as a CLI tool for simplified interaction with container images during the development phase. Within the scope of this paper, we embed our tooling suite into the overall context of streamlined robotics deployment and compare it to alternative solutions. We release our tools as open-source software at https://github.com/ika-rwth-aachen/dorotos.
|
|
16:30-18:00, Paper ThCT20-NT.3 | Add to My Program |
Plug’n Play Task-Level Autonomy for Robotics Using POMDPs and Probabilistic Programs |
|
Wertheim, Or | Ben Gurion University of the Negev |
Suissa, Dan Rouven | Ben-Gurion University of the Negev |
Brafman, Ronen | Ben-Gurion University |
Keywords: Software Architecture for Robotic and Automation, Task Planning
Abstract: We describe AOS, the first general-purpose system for model-based control of autonomous robots using AI planning that fully supports partial observability and noisy sensing. The AOS provides a code-based language for specifying a generative model of the system, making model specification easier and model sampling efficient. It also provides a language for specifying the relationship between the model and the actual code, using which it auto-generates all required integration code. This allows Plug'n Play behavior, which facilitates incremental and modular system design. Extensive experiments on real and simulated robotic platforms demonstrate these advantages.
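As a concrete illustration of the generative-model idea (plain Python, not the AOS specification language; the toy domain is invented for this sketch), a single sampling function maps a state and action to a next state, observation, and reward, and a POMDP solver simply calls it repeatedly:

```python
# Toy noisy-corridor POMDP expressed as a code-based generative model.
import random

def generative_model(state, action):
    """Sample (next_state, observation, reward) for a 1D corridor."""
    noise = random.choice([-1, 0, 0, 1])                  # stochastic transition
    next_state = max(0, min(10, state + action + noise))  # stay within corridor
    observation = next_state + random.choice([-1, 0, 1])  # noisy position sensor
    reward = 10.0 if next_state == 10 else -1.0           # goal at cell 10
    return next_state, observation, reward

state = 0
for _ in range(5):
    state, obs, rew = generative_model(state, +1)
    print(state, obs, rew)
```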
|
|
16:30-18:00, Paper ThCT20-NT.4 | Add to My Program |
CoBRA: A Composable Benchmark for Robotics Applications |
|
Mayer, Matthias | Technical University of Munich |
Külz, Jonathan | Technical University of Munich |
Althoff, Matthias | Technische Universität München |
Keywords: Software Tools for Benchmarking and Reproducibility, Cellular and Modular Robots, Industrial Robots
Abstract: Selecting an optimal robot, its base pose, and trajectory for a given task is currently mainly done by human expertise or trial and error. To evaluate automatic approaches to this combined optimization problem, we introduce a benchmark suite encompassing a unified format for robots, environments, and task descriptions. Our benchmark suite is especially useful for modular robots, where the multitude of robots that can be assembled creates a host of additional parameters to optimize. We include tasks such as machine tending and welding in synthetic environments and 3D scans of real-world machine shops. All benchmarks are accessible through cobra.cps.cit.tum.de, a platform to conveniently share, reference, and compare tasks, robot models, and solutions.
|
|
16:30-18:00, Paper ThCT20-NT.5 | Add to My Program |
GSL-Bench: High Fidelity Gas Source Localization Benchmarking Tool |
|
Erwich, Hajo Henricus | Delft University of Technology |
Duisterhof, Bart | Carnegie Mellon University |
de Croon, Guido | Delft University of Technology |
Keywords: Software Tools for Benchmarking and Reproducibility, Localization, Data Sets for Robot Learning
Abstract: Gas Source Localization (GSL) is a challenging field of research within the robotics community, with high-stakes search-and-rescue applications. Existing methods vary widely, and each has its strengths and weaknesses. Comparisons of different methods are limited due to the lack of a broadly adopted and standardized testing methodology. Existing GSL evaluations vary in environment size, wind conditions, and gas simulation fidelity. They also lack photo-realistic rendering for the integration of obstacle avoidance. In this paper, we propose GSL-Bench, a benchmarking tool that can evaluate the performance of existing GSL algorithms. GSL-Bench features high-fidelity graphics and gas simulation, building on NVIDIA's Isaac Sim and the OpenFOAM computational fluid dynamics (CFD) software. Realism is further increased by simulating relevant gas and wind sensors. Scene generation is simplified with the introduction of AutoGDM+, capable of procedural environment generation, CFD, and particle-based gas dispersion simulation. To illustrate GSL-Bench's capabilities, three algorithms are compared in six warehouse settings of increasing complexity: E. coli, dung beetle, and a random walker. Our results demonstrate GSL-Bench's ability to provide valuable insights into algorithm performance.
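Of the three baselines, the random walker is simple enough to sketch in full (grid size and step count are illustrative; the benchmark itself runs agents in Isaac Sim, not on a toy grid):

```python
# Toy random-walker GSL baseline: moves uniformly at random, ignoring gas cues.
import random

def random_walker(start, steps=500, size=(50, 50)):
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        dx, dy = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x = max(0, min(size[0] - 1, x + dx))   # clamp to the arena
        y = max(0, min(size[1] - 1, y + dy))
        path.append((x, y))
    return path

print(random_walker((25, 25))[-5:])            # last few visited cells
```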
|
|
16:30-18:00, Paper ThCT20-NT.6 | Add to My Program |
Cook2LTL: Translating Cooking Recipes to LTL Formulae Using Large Language Models |
|
Mavrogiannis, Angelos | University of Maryland, College Park |
Mavrogiannis, Christoforos | University of Michigan |
Aloimonos, Yiannis | University of Maryland |
Keywords: Software Tools for Robot Programming, AI-Enabled Robotics, AI-Based Methods
Abstract: Cooking recipes are challenging to translate to robot plans as they feature rich linguistic complexity, temporally-extended interconnected tasks, and an almost infinite space of possible actions. Our key insight is that combining a source of cooking domain knowledge with a formalism that captures the temporal richness of cooking recipes could enable the extraction of unambiguous, robot-executable plans. In this work, we use Linear Temporal Logic (LTL) as a formal language expressive enough to model the temporal nature of cooking recipes. Leveraging a pretrained Large Language Model (LLM), we present Cook2LTL, a system that translates instruction steps from an arbitrary cooking recipe found on the internet to a set of LTL formulae, grounding high-level cooking actions to a set of primitive actions that are executable by a manipulator in a kitchen environment. Cook2LTL makes use of a caching scheme that dynamically builds a queryable action library at runtime. We instantiate Cook2LTL in a realistic simulation environment (AI2-THOR), and evaluate its performance across a series of cooking recipes. We demonstrate that our system significantly decreases LLM API calls (-51%), latency (-59%), and cost (-42%) compared to a baseline that queries the LLM for every newly encountered action at runtime.
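The caching scheme lends itself to a very small sketch (the `query_llm` stand-in is hypothetical, and the real system grounds actions to LTL formulae rather than the toy strings used here): the action library is consulted first, and only cache misses trigger an API call, which is where the reported savings in calls, latency, and cost come from:

```python
# Hypothetical sketch of a runtime action library with LLM-backed misses.
action_library = {}  # high-level action name -> grounded primitive sequence

def query_llm(action):
    # Stand-in for the expensive LLM API call.
    return ["grab(obj)", action + "(obj)", "release(obj)"]

def ground_action(action):
    if action not in action_library:       # cache miss: one LLM query
        action_library[action] = query_llm(action)
    return action_library[action]          # cache hit: no query, no latency

for step in ["chop", "stir", "chop"]:      # the second "chop" hits the cache
    print(step, "->", ground_action(step))
```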
|
|
16:30-18:00, Paper ThCT20-NT.7 | Add to My Program |
Toward Automated Programming for Robotic Assembly Using ChatGPT |
|
Cote, Nicholas | Autodesk, Inc |
Macaluso, Annabella | University of California, San Diego |
Chitta, Sachin | Autodesk Inc |
Keywords: Software Tools for Robot Programming, Assembly, Process Control
Abstract: Despite significant technological advancements, the process of programming robots for adaptive assembly remains labor-intensive, demanding expertise in multiple domains and often resulting in task-specific, inflexible code. This work explores the potential of Large Language Models (LLMs), like ChatGPT, to automate this process, leveraging their ability to understand natural language instructions, generalize examples to new tasks, and write code. In this paper, we suggest how these abilities can be harnessed and applied to real-world challenges in the manufacturing industry. We present a novel system that uses ChatGPT to automate the process of programming robots for adaptive assembly by decomposing complex tasks into simpler subtasks, generating robot control code, executing the code in a simulated workcell, and debugging syntax and control errors, such as collisions. We outline the architecture of this system and strategies for task decomposition and code generation. Finally, we demonstrate how our system can autonomously program robots for various assembly tasks in a real-world project.
|
|
16:30-18:00, Paper ThCT20-NT.8 | Add to My Program |
A Method for Multi-Robot Asynchronous Trajectory Execution in MoveIt2 |
|
Stoop, Pascal | OST |
Ratnayake, Tharaka | Zurich Applied Science University |
Toffetti, Giovanni | Zurich University of Applied Sciences (ZHAW) |
Keywords: Software Tools for Robot Programming, Collision Avoidance, Dual Arm Manipulation
Abstract: This paper introduces a method that enables the parallel, independent execution of trajectories for multi-robot multi-arm systems in a shared workspace in MoveIt2. The proposed method leverages a centralized scheduler in a distributed setup to prevent collisions while the robots move independently. We argue that this approach is better suited than the state of the art (i.e., synchronous execution) for flexible/adaptive robotic tasks where the actions to be performed may vary in planning and execution time depending on sensor data (e.g., pick and place with inspection, assembly), as it is able to reduce the total execution time w.r.t. current approaches leveraging a single arm or multiple arms with synchronous motion planning.
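The centralized-scheduler idea can be sketched compactly (the collision test and volume representation below are placeholders, not MoveIt2's actual planning-scene API): each arm asks for clearance before executing, and clearance is granted only if the trajectory's swept volume is disjoint from every volume currently in execution:

```python
# Hypothetical sketch of a centralized clearance scheduler for multiple arms.
import threading

class CentralScheduler:
    def __init__(self):
        self._lock = threading.Lock()
        self._active = {}                     # arm name -> swept volume in use

    def try_start(self, arm, swept_volume, collide):
        with self._lock:
            for other, vol in self._active.items():
                if other != arm and collide(swept_volume, vol):
                    return False              # defer: would intersect a peer
            self._active[arm] = swept_volume
            return True                       # clearance granted; execute now

    def finish(self, arm):
        with self._lock:
            self._active.pop(arm, None)       # release the volume when done

def boxes_collide(a, b):
    # Toy swept volumes as axis-aligned boxes ((xmin, xmax), (ymin, ymax), (zmin, zmax)).
    return all(a[i][0] <= b[i][1] and b[i][0] <= a[i][1] for i in range(3))
```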
|
|
16:30-18:00, Paper ThCT20-NT.9 | Add to My Program |
Improving the ROS 2 Navigation Stack with Real-Time Local Costmap Updates for Agricultural Applications |
|
Sani, Ettore | University of Genova |
Sgorbissa, Antonio | University of Genova |
Carpin, Stefano | University of California, Merced |
Keywords: Software Tools for Robot Programming, Software, Middleware and Programming Environments, Agricultural Automation
Abstract: The ROS 2 Navigation Stack (Nav2) has emerged as a widely used software component providing the underlying basis to develop a variety of high-level functionalities. However, when used in outdoor environments such as orchards and vineyards, its functionality is notably limited by the presence of obstacles and/or situations not commonly found in indoor settings. One such example is given by tall grass and weeds that can be safely traversed by a robot, but that are perceived as obstacles by LiDAR sensors, forcing the robot to take longer paths to avoid them or to abort navigation altogether. To overcome these limitations, domain-specific extensions must be developed and integrated into the software pipeline. This paper presents a new, lightweight approach to address this challenge and improve outdoor robot navigation. Leveraging the multi-scale nature of the costmaps supporting Nav2, we developed a system that uses a depth camera to perform pixel-level classification on the images and, in real time, injects corrections into the local costmap, thus enabling the robot to traverse areas that would otherwise be avoided by Nav2. Our approach has been implemented and validated on a Clearpath Husky, and we demonstrate that with this extension the robot is able to perform navigation tasks that would otherwise be impractical with the standard components.
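Conceptually, the correction step amounts to downgrading the cost of cells that the LiDAR layer marked lethal but the image classifier labeled as traversable vegetation. A numpy stand-in is sketched below; it is an assumption-level illustration, not the actual Nav2 costmap plugin.

    # Illustrative costmap correction (numpy stand-in, Nav2-style costs).
    import numpy as np

    FREE, LETHAL = 0, 254                 # conventional Nav2 cost values

    def inject_corrections(costmap: np.ndarray, traversable_cells):
        """Clear cells flagged as obstacles by LiDAR but classified as
        traversable (e.g., tall grass) from the depth-camera pixels."""
        corrected = costmap.copy()
        for i, j in traversable_cells:    # cells hit by projected pixels
            if corrected[i, j] == LETHAL:
                corrected[i, j] = FREE    # planner may now cut through
        return corrected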
|
|
ThCT21-NT Oral Session, NT-G303 |
Add to My Program |
Microrobotics for Biology |
|
|
Chair: Arai, Fumihito | The University of Tokyo |
Co-Chair: Boudaoud, Mokrane | Sorbonne Université |
|
16:30-18:00, Paper ThCT21-NT.1 | Add to My Program |
Automated Non-Invasive Analysis of Motile Sperms Using Cross-Scale Guidance Network |
|
Dai, Wei | City University of Hong Kong |
Wu, Zixuan | City University of Hong Kong |
Wang, Jiaqi | The Chinese University of Hong Kong, Shenzhen |
Liu, Rui | City University of Hong Kong |
Wang, Min | City University of Hong Kong |
Wu, Tianyi | City University of Hong Kong |
Zhou, Junxian | City University of Hong Kong |
Zhang, Zhuoran | The Chinese University of Hong Kong, Shenzhen |
Liu, Jun | City University of Hong Kong |
Keywords: Automation at Micro-Nano Scales, Biological Cell Manipulation, Micro/Nano Robots
Abstract: Unbiased measurement of sperm morphometric and motility parameters is essential for assessing fertility potential and guiding visual feedback for microrobotic manipulation. Automated analysis of multiple sperms and selection of an optimal sperm are crucial for in vitro fertilisation treatments such as robotic intracytoplasmic sperm injection. However, conventional image processing methods have limitations in analysing small sperm objects under microscopic imaging. The emergence of convolutional neural networks (CNNs) has offered promising advancements in microscopic image analysis. However, previous CNN methods have struggled to accurately segment tiny objects, requiring staining or fluorescence techniques to enhance the visual contrast between sperm and culture medium, which makes them clinically impractical. To address these limitations, we introduce a novel segmentation network named the cross-scale guidance (CSG) network for accurate and efficient segmentation of minute sperm objects. The CSG network employs innovative modules, including collateral multi-scale convolution, cross-scale feature map guidance, and multi-scale feature fusion, to preserve essential sperm details despite their small size. Experimental results indicate that the CSG network surpassed state-of-the-art models designed for small object segmentation, achieving up to 18.62% higher mean intersection over union (mIoU). Additionally, the CSG network excelled in sperm morphometric analysis, achieving errors below 20%. Moreover, sperm motility parameters were further derived from the segmentation results for comprehensive sperm fertility analysis.
|
|
16:30-18:00, Paper ThCT21-NT.2 | Add to My Program |
Multi-Scale Visual Servoing Framework for Optical Microscopy Based on SIFT Matching |
|
Zhang, Yameng | The Chinese University of Hong Kong |
Xu, Ao | The Chinese University of Hong Kong |
Chen, Yuhan | Southern University of Science and Technology |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Liu, Li | The Chinese University of Hong Kong |
Keywords: Automation at Micro-Nano Scales, Calibration and Identification, Visual Servoing
Abstract: This paper introduces an innovative multi-scale visual servoing framework for optical microscopy, engineered to automatically reposition the microscope for high-magnification target view across multiple magnifications, thereby facilitating repetitive and accurate histologic biopsies. The framework encompasses an active microscope-camera system equipped with both auto-calibration and multi-scale visual servoing capabilities. The auto-calibration technique addresses the challenges posed by the limited depth of field and pattern requirements of the microscope-camera system, and determines its intrinsic and hand-eye parameters through a two-step algorithm. The calibration data is then utilized to execute a SIFT matching-based visual servoing control at progressively increasing magnifications, using only a single high-magnification target view as a reference, ultimately enabling rapid and precise repositioning of the microscope. Experimental results demonstrate the precision and stability of the auto-calibration method, as well as the robustness of the visual servoing method against occlusion, blur, and low illumination.
|
|
16:30-18:00, Paper ThCT21-NT.3 | Add to My Program |
Robotic Capillary Insertion to the Xenopus Oocyte Using Microscopic Image Analysis and QCR Force Sensor |
|
Otani, Kazusa | The University of Tokyo |
Sugiura, Hirotaka | The University of Tokyo |
Watanabe, Shiro | The University of Tokyo |
Turan, Bilal | Nagoya University |
Amaya, Satoshi | The University of Tokyo |
Arai, Fumihito | The University of Tokyo |
Keywords: Automation at Micro-Nano Scales, Force Control, Biological Cell Manipulation
Abstract: This paper presents a three-dimensional oocyte manipulation system for the two-electrode voltage clamp (TEVC) experiment under stereomicroscopy. We first developed a sequential calibration method to correlate the workspace of the stereomicroscope with the image and the micromanipulator. Even though the focal depth of the microscope is limited, the proposed method enables three-dimensional position detection and calculates the homogeneous transformation matrix. We then employed the hybrid use of image-based manipulation and a quartz crystal resonator (QCR) force sensor. The imaging technique was used to detect the tip of the glass capillary and its contact with the cell membrane, whereas the QCR force sensor was incorporated to detect the force interaction between the sample and the glass capillary. Using this system and the proposed techniques, we demonstrated automatic capillary insertion for the TEVC experiment, for which a low insertion depth is preferable. The results indicate that the coordinate calibration technique provides a positioning accuracy of the capillary tip on the order of 10 μm, and that the imaging technique can detect contact with elastic objects and the cell membrane. The QCR force sensor achieved very small force measurements and feedback control at a control frequency of 100 Hz without latency.
|
|
16:30-18:00, Paper ThCT21-NT.4 | Add to My Program |
Robotic Mosaic Atomic Force Microscopy through Sequential Imaging and Multiview Iterative Closest Points Method |
|
Romero Leiro, Freddy | Sorbonne Université - Institut des Systèmes Intelligents et de Robotique (ISIR) |
Régnier, Stéphane | Sorbonne University |
Delarue, Frederic | Sorbonne University |
Boudaoud, Mokrane | Sorbonne Université |
Keywords: Automation at Micro-Nano Scales, Micro/Nano Robots
Abstract: This paper presents a functionality developed for the home-made AFM-in-SEM robotic system at the ISIR laboratory. The method extends the range of an Atomic Force Microscope (AFM) and deals with drift issues by fusing multiple individually acquired AFM topography patches. The merging of the patches into a single image is done through a Generalized Procrustes Analysis Iterative Closest Point (GPA-ICP) algorithm. To validate the effectiveness of the approach, an AFM image of a TGX1 calibration grid and a 3.4-billion-year-old organic-walled microfossil are reconstructed by automatically merging 50 elementary AFM topography patches of dimension 0.9 μm × 1.2 μm based on feature matching. The overlap between two adjacent patches is 50% and 33% in the X and Y axes, respectively. The result is a coherent 3.2 μm × 3.0 μm drift-free long-range AFM topography without significant artifacts. The method is tested using an AFM-in-SEM system based on a 3-DOF Cartesian robot equipped with inertial piezoelectric actuators and can be used to extend the range of any type of AFM with a dual XY-stage setup. It thus opens the door to high-resolution long-range AFM by adding a long-range coarse-resolution stage to a preexisting AFM system, all without needing to actuate both stages simultaneously.
|
|
16:30-18:00, Paper ThCT21-NT.5 | Add to My Program |
Automated Sperm Immobilization with a Clinically-Compatible and Compact XYZ Stage |
|
Song, Haocong | University of Toronto |
Chen, Wenyuan | University of Toronto |
Dai, Changsheng | Dalian University of Technology |
Shan, Guanqiao | University of Toronto |
Yang, Steven | University of Toronto |
Jiang, Aojun | University of Toronto |
Zhang, Zhuoran | The Chinese University of Hong Kong, Shenzhen |
Sun, Yu | University of Toronto |
Keywords: Biological Cell Manipulation, Automation at Micro-Nano Scales
Abstract: Automated positioning systems play a pivotal role in micro-scale cell manipulation. In clinical intracytoplasmic sperm injection (ICSI) for in vitro fertilization (IVF) treatment, a motile sperm needs to be immobilized by glass-micropipette tapping for subsequent surgical steps. The process requires accurate tracking of the target sperm and precise alignment between the sperm tail and the micropipette. Manual sperm immobilization suffers from inconsistent success rates, and current robotic systems developed for the task fail to comply with the standard clinical setup. Instead of using a motorized micromanipulator as in existing robotic systems, this paper presents an automated, compact three-dimensional positioning stage for sperm immobilization that can be seamlessly integrated into standard clinical platforms. Based on an analysis of the sperm head orientation, an adaptive tail-tapping planning strategy is established to avoid the risk of touching the sperm head, where the DNA is contained. A visual servo controller equipped with a dynamic sperm motion observer is employed to achieve precise three-dimensional tracking and positioning of the target sperm. Experimental results revealed that the system achieved a success rate of 93.5% and a time cost of 5.5 s for automated sperm immobilization.
|
|
16:30-18:00, Paper ThCT21-NT.6 | Add to My Program |
Automated Sperm Morphology Analysis Based on Instance-Aware Part Segmentation |
|
Chen, Wenyuan | University of Toronto |
Song, Haocong | University of Toronto |
Dai, Changsheng | Dalian University of Technology |
Jiang, Aojun | University of Toronto |
Shan, Guanqiao | University of Toronto |
Liu, Hang | University of Toronto |
Zhou, Yanlong | Henan University |
Abdalla, Khaled | CReATe Fertility Centre |
Dhanani, Shivani N | CReATe Fertility Centre |
Moosavi, Katy Fatemeh | CReATe Fertility Centre |
Pathak, Shruti | CReATe Fertility Centre |
Librach, Clifford | University of Toronto |
Zhang, Zhuoran | The Chinese University of Hong Kong, Shenzhen |
Sun, Yu | University of Toronto |
Keywords: Computer Vision for Automation
Abstract: Traditional sperm morphology analysis is based on tedious manual annotation. Automated morphology analysis of a high number of sperm requires accurate segmentation of each sperm part and quantitative morphology evaluation. State-of-the-art instance-aware part segmentation networks follow a “detect-then-segment” paradigm. However, due to sperm’s slim shape, their segmentation suffers from large context loss and feature distortion caused by bounding-box cropping and resizing during ROI Align. Moreover, morphology measurement of the sperm tail is demanding because of its long, curved shape and uneven width. This paper presents automated techniques to measure sperm morphology parameters automatically and quantitatively. A novel attention-based instance-aware part segmentation network is designed to reconstruct lost contexts outside bounding boxes and to fix distorted features by refining preliminary segmented masks through merging features extracted by a feature pyramid network. An automated centerline-based tail morphology measurement method is also proposed, in which an outlier-filtering method and an endpoint detection algorithm are designed to accurately reconstruct tail endpoints. Experimental results demonstrate that the proposed network outperformed the state-of-the-art top-down RP-R-CNN by 9.2% AP_vol^p, and the proposed automated tail morphology measurement method achieved high measurement accuracies of 95.34%, 96.39%, and 91.20% for length, width, and curvature, respectively.
|
|
16:30-18:00, Paper ThCT21-NT.7 | Add to My Program |
Fast Photoacoustic Microscopy with Robot Controlled Microtrajectory Optimization |
|
Luo, Yating | Shanghai Jiao Tong University |
Liu, Yuxuan | Shanghai Jiao Tong University |
Zhou, Jiasheng | Shanghai Jiao Tong University |
Chen, Sung-Liang | Shanghai Jiao Tong University |
Guo, Yao | Shanghai Jiao Tong University |
Yang, Guang-Zhong | Shanghai Jiao Tong University |
Keywords: Computer Vision for Medical Robotics, Surgical Robotics: Planning, Optimization and Optimal Control
Abstract: Photoacoustic Microscopy (PAM) is a relatively new imaging modality in biomedicine. However, point-by-point raster scanning in PAM suffers from low imaging speed. Sparse sampling has been studied in recent years and, with the development of deep learning algorithms, extensive efforts have been devoted to sparse image reconstruction, while little attention has been paid to the sparse sampling trajectory design required for actual implementation. Real-time adaptive, robotically controlled sampling with micro-scale accuracy, under due consideration of physical constraints, can pave the way for using PAM in robot-assisted microsurgery. This work proposes a fast PAM scheme with robot-controlled microtrajectory optimization. The proposed method adapts to the imaging details of different regions of interest (ROI), and detailed experiments have been conducted in both simulation and in-vivo settings. Results show that our proposed method achieves faster scanning than traditional raster scanning and better image quality in the ROI than the standard spiral trajectory, demonstrating the effectiveness of our proposed method and its potential for deployment in other point-by-point scanning systems.
|
|
16:30-18:00, Paper ThCT21-NT.8 | Add to My Program |
Acoustically Driven Micropipette for Hydrodynamic Manipulation of Mouse Oocytes |
|
Zuo, Zhaofeng | Beijing Institute of Technology |
Liu, Xiaoming | Beijing Institute of Technology |
Chen, Zhuo | Beijing Institute of Technology |
Li, Yuyang | Beijing Institute of Technology |
Tang, Xiaoqing | Beijing Institute of Technology |
Liu, Dan | Beijing Institute of Technology |
Huang, Qiang | Beijing Institute of Technology |
Arai, Tatsuo | University of Electro-Communications |
Keywords: Micro/Nano Robots, Biological Cell Manipulation
Abstract: Micromanipulation techniques that can achieve controlled fine operations at the micro scale play an important role in biomedical fields including embryo engineering, gene engineering, drug screening, and cell analysis. However, micromanipulation of biological micro-objects, such as cells and micro-tissues, suffers from mechanical damage and low efficiency. Several techniques have been introduced to manipulate cells more easily, but most are restricted by expensive devices, limited working areas, and potential damage to cellular structures. Here we develop a hydrodynamic manipulation method to rotate and transport mouse oocytes, which utilizes acoustic waves and a micropipette to generate acoustic radiation force and excite microstreaming. This method can accomplish rotational and translational operations precisely and controllably. We tested the process of trapping, rotating, and transporting mouse oocytes, and measured rotational and translational speeds over a range of applied voltages. The method was able to shorten the time required for delivery and posture adjustment before oocyte injection. Our study provides an easy-to-use technique for contactless oocyte manipulation, with the potential to be applied universally in many cellular studies.
|
|
16:30-18:00, Paper ThCT21-NT.9 | Add to My Program |
Remote Control of Untethered Magnetic Robots within a Lumen Using X-Ray-Guided Robotic Platform |
|
Ligtenberg, Leendert-Jan Wouter | University of Twente |
Rabou, Nicole Christina Antoinetta | University of Twente |
Peters, Sander Lars | Universiteit Twente |
Vengetela, Trishal Sai Srinivas | University of Twente |
Schut, Vincent | Saxion University |
Liefers, Herman Remco | University of Twente |
Warle, Michiel | Radboud University Medical Center |
Khalil, Islam S.M. | University of Twente |
Keywords: Micro/Nano Robots, Medical Robots and Systems
Abstract: Until now, the potential of untethered magnetic robots (UMRs), propelled by external time-periodic magnetic fields, has been hindered by the combined limitations of wireless manipulation systems and noninvasive imaging techniques. The need for simultaneous actuation and noninvasive localization imposes a strict constraint on both functionalities. This study addresses this challenge through experimental validation, showcasing the direct teleoperation of UMRs within a fluid-filled lumen. This teleoperation capability is facilitated by a scalable X-ray-guided robotic platform, extendable to match the dimensions required for in vivo applications, marking a noteworthy advancement. Our methodology is demonstrated by teleoperating a 12-mm-long screw-shaped UMR (5 mm in diameter) within a bifurcated lumen filled with blood. This navigation is achieved using controlled rotating magnetic fields, guided by real-time X-ray fluoroscopy images. Incorporating a two-degree-of-freedom control system, we demonstrate the operator's capability to use X-ray fluoroscopy images to keep the UMR coupled with the external field during wireless teleoperation, resulting in a success rate of 76.6% when moving along the intended pathways, with a mean absolute position error of 1.6 ± 2.1 mm.
|
|
ThCT22-NT Oral Session, NT-G304 |
Add to My Program |
Telerobotics and Teleoperation II |
|
|
Chair: Aragon-Camarasa, Gerardo | University of Glasgow |
Co-Chair: Li, Songpo | Honda Research Institute |
|
16:30-18:00, Paper ThCT22-NT.1 | Add to My Program |
TELESIM: A Modular and Plug-And-Play Framework for Robotic Arm Teleoperation Using a Digital Twin |
|
Audonnet, Florent | University of Glasgow |
Grizou, Jonathan | University of Glasgow |
Hamilton, Andrew | School of Computing Science, University of Glasgow |
Aragon-Camarasa, Gerardo | University of Glasgow |
Keywords: Telerobotics and Teleoperation, Virtual Reality and Interfaces, Human-Robot Collaboration
Abstract: Teleoperating robotic arms can be a challenging task for non-experts, particularly when using complex control devices or interfaces. To address the limitations and challenges of existing teleoperation frameworks, such as cognitive strain, control complexity, robot compatibility, and user evaluation, we propose TELESIM, a modular and plug-and-play framework that enables direct teleoperation of any robotic arm using a digital twin as the interface between users and the robotic system. Due to TELESIM's modular design, it is possible to control the digital twin using any device that outputs a 3D pose, such as a virtual reality controller or a finger-mapping hardware controller. To evaluate the efficacy and user-friendliness of TELESIM, we conducted a user study with 37 participants. The study involved a simple pick-and-place task, performed using two different robots equipped with two different control modalities. Our experimental results show that most users succeeded in building a tower of at least 3 cubes within 10 minutes, after only 5 minutes of training beforehand, regardless of the control modality or robot used, demonstrating the usability and user-friendliness of TELESIM.
|
|
16:30-18:00, Paper ThCT22-NT.2 | Add to My Program |
Synchronized Human-Humanoid Motion Imitation |
|
Dallard, Antonin | LIRMM |
Benallegue, Mehdi | AIST Japan |
Kanehiro, Fumio | National Inst. of AIST |
Kheddar, Abderrahmane | CNRS-AIST |
Keywords: Telerobotics and Teleoperation, Intention Recognition, Human and Humanoid Motion Analysis and Synthesis
Abstract: We present a tele-operation control framework that (i) enhances upper-body motion synchrony between a user and a robot using the minimum-jerk model coupled with a recursive least-squares filter, and (ii) synchronizes the walking pace by predicting the user's stepping frequency using motion capture data and a deep learning model. By integrating (i) and (ii) in a task-space whole-body controller, we achieve full-body synchronization. We assess our humanoid-to-human whole-body synchronized motion model on the HRP-4 humanoid robot in experiments with forward, lateral, and backward walks and concurrent upper-limb motions.
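The minimum-jerk model referenced in (i) is the classical fifth-order polynomial profile with zero velocity and acceleration at both endpoints; the sketch below is a generic reference implementation, not the authors' controller.

    # Classical minimum-jerk point-to-point profile.
    def min_jerk(x0, xf, t, T):
        """Position at time t on a minimum-jerk move from x0 to xf over
        duration T (zero boundary velocity and acceleration)."""
        s = min(max(t / T, 0.0), 1.0)                        # normalized time
        return x0 + (xf - x0) * (10*s**3 - 15*s**4 + 6*s**5)

In the paper's setting, the recursive least-squares filter would adapt such a model online to the user's observed motion; that adaptation step is omitted here.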
|
|
16:30-18:00, Paper ThCT22-NT.3 | Add to My Program |
SPOTS: Stable Placement of Objects with Reasoning in Semi-Autonomous Teleoperation Systems |
|
Lee, Joonhyung | Korea University |
Park, Sangbeom | Korea University |
Park, Jeongeun | Korea University |
Lee, Kyungjae | Chung-Ang University |
Choi, Sungjoon | Korea University |
Keywords: Telerobotics and Teleoperation, Simulation and Animation, Semantic Scene Understanding
Abstract: Pick-and-place is one of the fundamental tasks in robotics research. However, attention has mostly been focused on the "pick" task, leaving the "place" task relatively unexplored. In this paper, we address the problem of placing objects in the context of a teleoperation framework. In particular, we focus on two aspects of the place task: stability robustness and contextual reasonableness of object placements. Our proposed method combines simulation-driven physical stability verification via real-to-sim with the semantic reasoning capability of large language models. In other words, given place context information (e.g., user preferences, the object to place, and current scene information), our proposed method outputs a probability distribution over possible placement candidates, considering the robustness and reasonableness of the place task. Our proposed method is extensively evaluated in two simulation environments and one real-world environment, and we show that it can greatly increase the physical plausibility of placements as well as their contextual soundness while accounting for user preferences. Code, video, and details are available at: https://joonhyung-lee.github.io/spots/
|
|
16:30-18:00, Paper ThCT22-NT.4 | Add to My Program |
Online Minimization of the Robot Silhouette Viewed from Eye-To-Hand Camera |
|
Cortigiani, Giovanni | University of Siena |
Brogi, Bernardo | University of Siena |
Villani, Alberto | University of Siena |
Lisini Baldi, Tommaso | University of Siena |
D'Aurizio, Nicole | University of Siena |
Prattichizzo, Domenico | University of Siena |
Keywords: Telerobotics and Teleoperation, Virtual Reality and Interfaces, Human-Centered Robotics
Abstract: Redundant robots can perform internal joint motions without modifying the pose of the end-effector by exploiting the null space of the Jacobian matrix. Capitalizing on this feature, we developed a control technique for minimizing the robot's visual appearance when observed from an eye-to-hand camera. Such an algorithm is instrumental in contexts where quickly adjusting the perspective to see objects obstructed by the robot is impractical (e.g., teleoperation in narrow environments). Diminished reality techniques are frequently employed in these cases to mitigate the robot's intrusion into the environment, although they may sometimes compromise the perceived realism. The experimental evaluation confirmed the effectiveness of our control algorithm, demonstrating an average reduction of 4.67% in the area covered by the robot within the frame when compared to the case without the optimization action.
|
|
16:30-18:00, Paper ThCT22-NT.5 | Add to My Program |
IRoCo: Intuitive Robot Control from Anywhere Using a Smartwatch |
|
Weigend, Fabian Clemens | Arizona State University |
Liu, Xiao | Arizona State University |
Sonawani, Shubham | Arizona State University |
Kumar, Neelesh | Procter and Gamble |
Vasudevan, Venugopal | Procter & Gamble |
Ben Amor, Heni | Arizona State University |
Keywords: Wearable Robotics, Multi-Modal Perception for HRI, Telerobotics and Teleoperation
Abstract: This paper introduces iRoCo (intuitive Robot Control) - a framework for ubiquitous human-robot collaboration using a single smartwatch and smartphone. By integrating probabilistic differentiable filters, iRoCo optimizes a combination of precise robot control and unrestricted user movement from ubiquitous devices. We demonstrate and evaluate the effectiveness of iRoCo in practical teleoperation and drone piloting applications. Comparative analysis shows no significant difference between task performance with iRoCo and gold-standard control systems in teleoperation tasks. Additionally, iRoCo users complete drone piloting tasks 32% faster than with a traditional remote control and report less frustration in a subjective load index questionnaire. Our findings strongly suggest that iRoCo is a promising new approach for intuitive robot control through smartwatches and smartphones from anywhere, at any time. The code is available at www.github.com/wearable-motion-capture
|
|
16:30-18:00, Paper ThCT22-NT.6 | Add to My Program |
Integrating Open-World Shared Control in Immersive Avatars |
|
Naughton, Patrick | University of Illinois at Urbana-Champaign |
Nam, James Seungbum | University of Illinois at Urbana-Champaign |
Stratton, Andrew | University of Michigan |
Hauser, Kris | University of Illinois at Urbana-Champaign |
Keywords: Telerobotics and Teleoperation, Virtual Reality and Interfaces, Intention Recognition
Abstract: Teleoperated avatar robots allow people to transport their manipulation skills to environments that may be difficult or dangerous to work in. Current systems are able to give operators direct control of many components of the robot to immerse them in the remote environment, but operators still struggle to complete tasks as competently as they could in person. We present a framework for incorporating open-world shared control into avatar robots to combine the benefits of direct and shared control. This framework preserves the fluency of our avatar interface by minimizing obstructions to the operator's view and using the same interface for direct, shared, and fully autonomous control. In a human subjects study (N=19), we find that operators using this framework complete a range of tasks significantly more quickly and reliably than those that do not.
|
|
16:30-18:00, Paper ThCT22-NT.7 | Add to My Program |
Hierarchical Deep Learning for Intention Estimation of Teleoperation Manipulation in Assembly Tasks |
|
Cai, Mingyu | University of California Riverside |
Patel, Karankumar | Honda Research Institute |
Iba, Soshi | Honda Research Institute USA |
Li, Songpo | Honda Research Institute |
Keywords: Telerobotics and Teleoperation, Human-Robot Collaboration
Abstract: In human-robot collaboration, shared control presents an opportunity to teleoperate robotic manipulation and improve the efficiency of manufacturing and assembly processes. Robots are expected to assist in executing the user's intentions. To this end, robust and prompt intention estimation is needed, relying on behavioral observations. Our framework presents an intention estimation technique at two hierarchical levels, i.e., low-level actions and high-level tasks, by incorporating multi-scale hierarchical information in neural networks. Technically, we employ a hierarchical dependency loss to boost overall accuracy. Furthermore, we propose a multi-window method that assigns proper hierarchical prediction windows to the input data. An analysis of the predictive power with various inputs demonstrates the predominance of the deep hierarchical model in terms of prediction accuracy and early intention identification. We implement the algorithm on a virtual reality (VR) setup to teleoperate robotic hands in a simulation with various assembly tasks, showing the effectiveness of online estimation.
|
|
16:30-18:00, Paper ThCT22-NT.8 | Add to My Program |
Dynamic Mobile Manipulation Via Whole-Body Bilateral Teleoperation of a Wheeled Humanoid |
|
Purushottam, Amartya | University of Illinois, Urbana-Champaign |
Xu, Christopher | University of Illinois Urbana-Champaign |
Jung, Yeongtae | Jeonbuk National University |
Ramos, Joao | University of Illinois at Urbana-Champaign |
Keywords: Telerobotics and Teleoperation, Whole-Body Motion Planning and Control, Human and Humanoid Motion Analysis and Synthesis
Abstract: Humanoid robots have the potential to help human workers by realizing physically demanding manipulation tasks such as moving large boxes within warehouses. We define such tasks as Dynamic Mobile Manipulation (DMM). This paper presents a framework for DMM via whole-body teleoperation, built upon three key contributions: Firstly, a teleoperation framework employing a Human Machine Interface (HMI) and a bi-wheeled humanoid, SATYRR, is proposed. Secondly, the study introduces a dynamic locomotion mapping, utilizing human-robot reduced order models, and a kinematic retargeting strategy for manipulation tasks. Additionally, the paper discusses the role of whole-body haptic feedback for wheeled humanoid control. Finally, the system's effectiveness and mappings for DMM are validated through locomanipulation experiments and heavy box pushing tasks. Here we show two forms of DMM: grasping a target moving at an average speed of 0.4 m/s, and pushing boxes weighing up to 105% of the robot's weight. By simultaneously adjusting their pitch and using their arms, the pilot adjusts the robot pose to apply larger contact forces and move a heavy box at a constant velocity of 0.2 m/s.
|
|
16:30-18:00, Paper ThCT22-NT.9 | Add to My Program |
3D Autocomplete: Enhancing UAV Teleoperation with AI in the Loop |
|
Ibrahim, Batool | American University of Beirut AUB |
Elhajj, Imad | American University of Beirut |
Asmar, Daniel | American University of Beirut |
Keywords: Telerobotics and Teleoperation, Human-Robot Collaboration, Virtual Reality and Interfaces
Abstract: Manually teleoperating a flying robot can be a demanding task, especially for users with limited experience. This is primarily due to the non-linear dynamics of such robots, in addition to the difficulty of controlling several degrees of freedom at the same time. 3D Autocomplete helps mitigate these limitations by assisting users in teleoperation. It aids in teleoperating 3D motions, such as helical motions, which are more challenging for users. The proposed framework uses Artificial Intelligence (AI) to predict the user's intended motion just in time and then, if the user accepts, completes it autonomously in 3D. The AI component of 3D Autocomplete was presented in our previous work, where we introduced a deep learning model and an algorithm to predict the user's desired motion as early as possible. Moving forward in this work, we focus on synthesizing and completing the user-intended motion autonomously. We also introduce a Mixed Reality (MR) user interface for better human-robot interaction. Finally, we evaluate our system subjectively and objectively through human-subject experiments. Autocomplete outperformed the traditional method on all criteria, with at least 30% improvement in all objective measures.
|
|
ThCT23-NT Oral Session, NT-G401 |
Add to My Program |
Aerial Systems: Perception and Autonomy III |
|
|
Chair: Schoellig, Angela P. | TU Munich |
Co-Chair: Scherer, Sebastian | Carnegie Mellon University |
|
16:30-18:00, Paper ThCT23-NT.1 | Add to My Program |
Control-Barrier-Aided Teleoperation with Visual-Inertial SLAM for Safe MAV Navigation in Complex Environments |
|
Zhou, Siqi | Technical University of Munich |
Papatheodorou, Sotiris | Technical University of Munich |
Leutenegger, Stefan | Technical University of Munich |
Schoellig, Angela P. | TU Munich |
Keywords: Aerial Systems: Perception and Autonomy, Robot Safety, Motion Control
Abstract: In this paper, we consider a Micro Aerial Vehicle (MAV) system teleoperated by a non-expert and introduce a perceptive safety filter that leverages Control Barrier Functions (CBFs) in conjunction with Visual-Inertial Simultaneous Localization and Mapping (VI-SLAM) and dense 3D occupancy mapping to guarantee safe navigation in complex and unstructured environments. Our system relies solely on onboard IMU measurements, stereo infrared images, and depth images and autonomously corrects teleoperated inputs when they are deemed unsafe. We define a point in 3D space as unsafe if it satisfies either of two conditions: (i) it is occupied by an obstacle, or (ii) it remains unmapped. At each time step, an occupancy map of the environment is updated by the VI-SLAM by fusing the onboard measurements, and a CBF is constructed to parameterize the (un)safe region in the 3D space. Given the CBF and state feedback from the VI-SLAM module, a safety filter computes a certified reference that best matches the teleoperation input while satisfying the safety constraint encoded by the CBF. In contrast to existing perception-based safe control frameworks, we directly close the perception-action loop and demonstrate the full capability of safe control in combination with real-time VI-SLAM without any external infrastructure or prior knowledge of the environment. We verify the efficacy of the perceptive safety filter in real-time MAV experiments using exclusively onboard sensing and computation and show that the teleoperated MAV is able to safely navigate through unknown environments despite arbitrary inputs sent by the teleoperator.
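At its core, a safety filter of this kind solves a small quadratic program at every control step: stay as close as possible to the teleoperated input while satisfying the CBF condition. For control-affine dynamics xdot = f(x) + g(x)u and a single constraint, the projection has a closed form, sketched below as an illustrative simplification of the paper's filter.

    # Closed-form CBF safety filter for one affine constraint:
    #   min ||u - u_tele||^2   s.t.   grad_h . (f + g u) >= -alpha * h
    import numpy as np

    def safety_filter(u_tele, grad_h, f, g, h, alpha):
        a = g.T @ grad_h                 # constraint coefficients on u
        b = -alpha * h - grad_h @ f      # required lower bound on a . u
        slack = a @ u_tele - b
        if slack >= 0.0:
            return u_tele                # teleop input is already safe
        return u_tele - (slack / (a @ a)) * a   # project onto the boundary

Treating unmapped space as unsafe corresponds, roughly, to letting h take non-positive values in regions the occupancy map has not yet observed as free.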
|
|
16:30-18:00, Paper ThCT23-NT.2 | Add to My Program |
Flow Shadowing: A Method to Detect Multiple Flow Headings Using an Array of Densely Packed Whisker-Inspired Sensors |
|
Kent, Teresa | Carnegie Mellon University |
Bergbreiter, Sarah | Carnegie Mellon University |
Keywords: Aerial Systems: Perception and Autonomy, Sensor Fusion, Soft Sensors and Actuators
Abstract: Understanding airflow around a drone is critical for performing advanced maneuvers while maintaining flight stability. Recent research has worked to understand this flow by employing 2D and 3D flow sensors to measure flow from a single source like wind or the drone’s relative motion. Our current work advances flow detection by introducing a strategy to distinguish between two flow sources applied simultaneously from different directions. By densely packing an array of flow sensors (or whiskers), we alter the path of airflow as it moves through the array. We have named this technique “flow shadowing” because we take advantage of the fact that a downstream whisker shadowed (or occluded) by an upstream whisker receives less incident flow. We show that this relationship is predictable for two whiskers based on the percent of occlusion. We then show that a 2x2 spatial array of whiskers responds asymmetrically when multiple flow sources from different headings are applied to the array. This asymmetry is direction-dependent, allowing us to predict the headings of flow from two different sources, like wind and a drone’s relative motion.
|
|
16:30-18:00, Paper ThCT23-NT.3 | Add to My Program |
Onboard Dynamic-Object Detection and Tracking for Autonomous Robot Navigation with RGB-D Camera |
|
Xu, Zhefan | Carnegie Mellon University |
Zhan, Xiaoyang | Carnegie Mellon University |
Xiu, Yumeng | Carnegie Mellon University |
Suzuki, Christopher | Carnegie Mellon University |
Shimada, Kenji | Carnegie Mellon University |
Keywords: Aerial Systems: Perception and Autonomy, Collision Avoidance, Vision-Based Navigation
Abstract: Deploying autonomous robots in crowded indoor environments usually requires them to have accurate dynamic obstacle perception. Although plenty of previous works in the autonomous driving field have investigated the 3D object detection problem, the use of dense point clouds from a heavy Light Detection and Ranging (LiDAR) sensor and the high computation cost of learning-based data processing make those methods inapplicable to small robots, such as vision-based UAVs with small onboard computers. To address this issue, we propose a lightweight 3D dynamic obstacle detection and tracking (DODT) method based on an RGB-D camera, designed for low-power robots with limited computing power. Our method adopts a novel ensemble detection strategy, combining multiple computationally efficient but low-accuracy detectors to achieve real-time, high-accuracy obstacle detection. Besides, we introduce a new feature-based data association and tracking method that prevents mismatches by utilizing the point clouds' statistical features. In addition, our system includes an optional, auxiliary learning-based module to enhance the obstacle detection range and dynamic obstacle identification. The proposed method is implemented on a small quadcopter, and the results show that it achieves the lowest position error (0.11 m) and a comparable velocity error (0.23 m/s) across the benchmarked algorithms running on the robot's onboard computer.
|
|
16:30-18:00, Paper ThCT23-NT.4 | Add to My Program |
APACE: Agile and Perception-Aware Trajectory Generation for Quadrotor Flights |
|
Chen, Xinyi | The Hong Kong University of Science and Technology |
Zhang, Yichen | The Hong Kong University of Science and Technology |
Zhou, Boyu | Sun Yat-Sen University |
Shen, Shaojie | Hong Kong University of Science and Technology |
Keywords: Aerial Systems: Perception and Autonomy, View Planning for SLAM, Motion and Path Planning
Abstract: Various perception-aware planning approaches have attempted to enhance state estimation accuracy during maneuvers, while the feature matchability among frames, a crucial factor influencing estimation accuracy, has often been overlooked. In this paper, we present APACE, an Agile and Perception-Aware trajeCtory gEneration framework for aggressive quadrotor flight that takes feature matchability into account during trajectory planning. We seek to generate a perception-aware trajectory that reduces the error of the visual-based estimator while satisfying constraints on smoothness, safety, agility, and the quadrotor dynamics. The perception objective is achieved by maximizing the number of covisible features while ensuring small enough parallax angles. Additionally, we propose a differentiable and accurate visibility model that allows decomposition of the trajectory planning problem for efficient optimization. Through validations conducted in both a photorealistic simulator and real-world experiments, we demonstrate that the trajectories generated by our method significantly improve state estimation accuracy, with the root mean square error (RMSE) reduced by up to an order of magnitude. The source code will be released to benefit the community.
|
|
16:30-18:00, Paper ThCT23-NT.5 | Add to My Program |
SpECULARIA: Towards Fully Autonomous Robotic Indoor Farming System |
|
Car, Marsela | University of Zagreb |
Arbanas Ferreira, Barbara | University of Zagreb, Faculty of Electrical Engineering and Computing |
Vuletic, Jelena | University of Zagreb, Faculty of Electrical Engineering and Computing |
Orsag, Matko | University of Zagreb, Faculty of Electrical Engineering and Computing |
Keywords: Agricultural Automation, AI-Enabled Robotics, Multi-Robot Systems
Abstract: To support the hypothesis that embracing robotics has the potential to address farming challenges while replacing large and complex farm machinery, this paper proposes designing a farm around a heterogeneous robotic system dubbed SpECULARIA. Within this multi-robot system, mobile robots are deployed to work just like in a warehouse, moving plants grown in containers to make sure every plant receives optimal care and ideal growing conditions. By structuring the work-cell environment around a stationary dual-arm manipulator, the system can plan and execute procedures to control every plant's growth and hygiene, from seed to harvest. Such a system surpasses current farming robots in scalability and versatility. We showcase compliance control algorithms combined with artificial intelligence that help us build a functional model of the plant. The same approach is used to program different plant treatments. Finally, we benchmark the proposed setup against a classical mobile manipulation approach, demonstrating its feasibility.
|
|
16:30-18:00, Paper ThCT23-NT.6 | Add to My Program |
End-To-End Thermal Updraft Detection and Estimation for Autonomous Soaring Using Temporal Convolutional Networks |
|
Gall, Christian | University of Stuttgart |
Fichter, Walter | University of Stuttgart |
Ahmad, Aamir | University of Stuttgart |
Keywords: Aerial Systems: Perception and Autonomy, AI-Based Methods, Energy and Environment-Aware Automation
Abstract: Exploiting thermal updrafts to gain altitude can significantly extend the endurance of fixed-wing aircraft, as human glider pilots have demonstrated for decades. In this work, we present a novel end-to-end deep learning approach for the simultaneous detection of multiple thermal updrafts and the estimation of their properties - a key capability for letting autonomous unmanned aerial vehicles soar as well. In contrast to previous works, our approach does not require separate algorithms for the detection of individual updrafts. Instead, a sequence of sensor measurements from a time window of interest can be fed directly into our temporal convolutional network, which estimates the position, strength, and spread of the encountered updrafts. We demonstrate in simulations that our approach can reliably detect updrafts based solely on measurements of the aircraft's position and the local vertical wind velocity. Moreover, our method can additionally make use of measurements of the roll moment induced by updrafts, which further improves precision. Compared with a particle-filter-based method, we can determine the correct number of encountered updrafts with an accuracy of 99.99% instead of 79.50%, significantly improve the precision of strength and spread estimates, and reduce the computational demand.
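For readers unfamiliar with temporal convolutional networks, the core ingredient is a stack of causal, dilated 1D convolutions whose receptive field grows exponentially with depth. The PyTorch skeleton below is a generic sketch, not the authors' architecture.

    # Generic causal dilated TCN block (PyTorch).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TCNBlock(nn.Module):
        def __init__(self, channels, dilation, kernel_size=3):
            super().__init__()
            self.left_pad = (kernel_size - 1) * dilation   # causal padding
            self.conv = nn.Conv1d(channels, channels,
                                  kernel_size, dilation=dilation)

        def forward(self, x):                   # x: (batch, channels, time)
            x = F.pad(x, (self.left_pad, 0))    # pad the past only
            return torch.relu(self.conv(x))

    # Exponentially growing dilations; a regression head on the final
    # features would output position, strength, and spread per updraft.
    tcn = nn.Sequential(*[TCNBlock(32, 2 ** i) for i in range(4)])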
|
|
16:30-18:00, Paper ThCT23-NT.7 | Add to My Program |
SANet: Small but Accurate Detector for Aerial Flying Object |
|
Zhou, Xunkuai | Tongji University |
Zhao, Benyun | The Chinese University of Hong Kong |
Yang, Guidong | The Chinese University of Hong Kong |
Zhang, Jihan | Chinese University of Hong Kong |
Li, Li | Tongji University |
Chen, Ben M. | Chinese University of Hong Kong |
Keywords: Aerial Systems: Applications, AI-Based Methods, Object Detection, Segmentation and Categorization
Abstract: This paper proposes SANet, a small but accurate detector for aerial flying objects. The detector introduces an attention module into the feature extraction module (FEM) to enhance accuracy. This FEM, with fewer convolutional kernel channels, reduces the parameter count, speeds up inference, and mitigates the computational burden. Furthermore, we optimize the Spatial Pyramid Pooling (SPP) module to enhance both accuracy and speed. By analyzing the structural characteristics of the ResNet and RepVGG networks that are usually utilized to extract features, a feature fusion module named RepNeck is designed to comprehensively fuse features extracted by the FEM, further enhancing speed and accuracy. Eventually, we develop a neural network with an impressively small model size of only 4.5M. This network achieves state-of-the-art performance on three challenging datasets. Apart from its superior performance, our approach enjoys a real-time detection speed of 14.8 frames per second (fps) and a power consumption of only 2.9 W, while the CPU and GPU temperatures are maintained below 50 °C even on an edge-computing device, highlighting the practicality of our approach for long-duration flying object detection and monitoring tasks.
|
|
16:30-18:00, Paper ThCT23-NT.8 | Add to My Program |
N-QR: Natural Quick Response Codes for Multi-Robot Instance Correspondence |
|
Glaser, Nathaniel | Georgia Institute of Technology |
Ravi, Rajashree | Bowery Farming |
Kira, Zsolt | Georgia Institute of Technology |
Keywords: Agricultural Automation, Multi-Robot Systems, Deep Learning for Visual Perception
Abstract: Image correspondence serves as the backbone for many tasks in robotics, such as visual fusion, localization, and mapping. However, existing correspondence methods do not scale to large multi-robot systems, and they struggle when image features are weak, ambiguous, or evolving. In response, we propose Natural Quick Response codes, or N-QR, which enables rapid and reliable correspondence between large-scale teams of heterogeneous robots. Our method works like a QR code, using keypoint-based alignment, rapid encoding, and error correction via ensembles of image patches of natural patterns. We deploy our algorithm in a production-scale robotic farm, where groups of growing plants must be matched across many robots. We demonstrate superior performance compared to several baselines, obtaining a retrieval accuracy of 88.2%. Our method generalizes to a farm with 100 robots, achieving a 12.5x reduction in bandwidth and a 20.5x speedup. We leverage our method to establish correspondences across 700k plants and confirm a link between a robotic seeding policy and germination.
|
|
ThCT24-NT Oral Session, NT-G402 |
Add to My Program |
Robotics and Automation in Agriculture and Forestry IV |
|
|
Chair: Valada, Abhinav | University of Freiburg |
Co-Chair: Stachniss, Cyrill | University of Bonn |
|
16:30-18:00, Paper ThCT24-NT.1 | Add to My Program |
Containerized Vertical Farming Using Cobots |
|
Mahalingam, Dasharadhan | Stony Brook University |
Patankar, Aditya | Stony Brook University |
Phi, Khiem | Stony Brook University |
Chakraborty, Nilanjan | Stony Brook University |
McGann, Ryan | CubicAcres LLC |
Ramakrishnan, Iv | Stony Brook University |
Keywords: Robotics and Automation in Agriculture and Forestry, Constrained Motion Planning, Motion and Path Planning
Abstract: Containerized vertical farming is a type of vertical farming practice using hydroponics in which plants are grown in vertical layers within a mobile shipping container. Space limitations within shipping containers make the automation of different farming operations challenging. In this paper, we explore the use of cobots (i.e., collaborative robots) to automate two key farming operations, namely, the transplantation of saplings and the harvesting of grown plants. Our method uses a single demonstration from a farmer to extract the motion constraints associated with the tasks, namely, transplanting and harvesting, and can then generalize to different instances of the same task. For transplantation, the motion constraint arises during insertion of the sapling within the growing tube, whereas for harvesting, it arises during extraction from the growing tube. We present experimental results to show that using RGBD camera images (obtained from an eye-in-hand configuration) and one demonstration for each task, it is feasible to perform transplantation of saplings and harvesting of leafy greens using a cobot, without task-specific programming.
|
|
16:30-18:00, Paper ThCT24-NT.2 | Add to My Program |
INoD: Injected Noise Discriminator for Self-Supervised Representation Learning in Agricultural Fields |
|
Hindel, Julia | University of Freiburg |
Gosala, Nikhil | University of Freiburg |
Bregler, Kevin | Fraunhofer IPA |
Valada, Abhinav | University of Freiburg |
Keywords: Robotics and Automation in Agriculture and Forestry, Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Perception datasets for agriculture are limited in both quantity and diversity, which hinders effective training of supervised learning approaches. Self-supervised learning techniques alleviate this problem; however, existing methods are not optimized for dense prediction tasks in agricultural domains, which results in degraded performance. In this work, we address this limitation with our proposed Injected Noise Discriminator (INoD), which exploits principles of feature replacement and dataset discrimination for self-supervised representation learning. INoD interleaves feature maps from two disjoint datasets during their convolutional encoding and predicts the dataset affiliation of the resultant feature map as a pretext task. Our approach enables the network to learn unequivocal representations of objects seen in one dataset while observing them in conjunction with similar features from the disjoint dataset. This allows the network to reason about higher-level semantics of the entailed objects, thus improving its performance on various downstream tasks. Additionally, we introduce the novel Fraunhofer Potato 2022 dataset consisting of over 16,800 images for object detection in potato fields. Extensive evaluations of our proposed INoD pretraining strategy on the tasks of object detection, semantic segmentation, and instance segmentation on the Sugar Beets 2016 and our potato dataset demonstrate that it achieves state-of-the-art performance.
|
|
16:30-18:00, Paper ThCT24-NT.3 | Add to My Program |
Unsupervised Generation of Labeled Training Images for Crop-Weed Segmentation in New Fields and on Different Robotic Platforms |
|
Chong, Yue Linn | University of Bonn |
Weyler, Jan | University of Bonn |
Lottes, Philipp | University of Bonn |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: Robotics and Automation in Agriculture and Forestry, Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization
Abstract: Agricultural robots have the potential to improve the efficiency and sustainability of existing agricultural practices. Most autonomous agricultural robots rely on machine vision systems. Such systems, however, often perform worse in new fields or when the robotic platform changes. While we can alleviate the performance degradation by manually labeling more data obtained in the new setup, this procedure is labor- and cost-intensive. Therefore, we propose an approach to improve the performance of machine vision systems for new fields and different robotic platforms without additional manual labeling. In an unsupervised manner, our approach can generate images and corresponding labels to train machine vision systems. We use StyleGAN2 to generate images that appear as if they came from the desired new field or robotic platform. Additionally, we propose a label refinement method to generate labels corresponding to the generated images. We show that our approach can improve the performance of the crop-weed segmentation task in new fields and on different robotic platforms without additional manual labeling.
|
|
16:30-18:00, Paper ThCT24-NT.4 | Add to My Program |
Log Loading Automation for Timber-Harvesting Industry |
|
Ayoub, Elie | FPInnovations |
Fernando, Heshan | FP Innovations |
Larrivée-Hardy, William | Laval University |
Lemieux, Nicolas | FPInnovations |
Giguère, Philippe | Université Laval |
Sharf, Inna | McGill University |
Keywords: Robotics and Automation in Agriculture and Forestry, Field Robots, Perception for Grasping and Manipulation
Abstract: The timber-harvesting industry lags behind its peer industries, such as mining and agriculture, with respect to the deployment of robotic, AI, and autonomous technologies. In this paper, we tackle the automation of a critical task that arises in transporting logs from the forest to the sawmill: the log loading operation. This work is motivated by acute shortages of human operators and the need to improve the efficiency of timber-harvesting processes. To this end, we demonstrate the full autonomy pipeline for the log loading operation with a fixed-base manipulator (a.k.a. the crane), starting with perception of logs around the machine, then grasp planning for where to grasp logs, through motion planning and control of the log loading maneuver. Our main contribution is the full integration of the elements necessary to achieve a completely autonomous loading cycle, where the crane picks up and loads all logs within its reach onto a trailer. Notable features of our implementation are a generalizable perception stack, a grasp planner that picks up multiple logs at a time, and an extensive experimental campaign conducted outdoors on a commercial log loader retrofitted for autonomy. Our results demonstrate an overall 87% success rate for the log loading operation, with the primary failure cases due to log segmentation errors and deficiencies in the final height-adjustment algorithm for grasping logs. We also present detailed timing results for the main parts of the autonomy pipeline, which support the feasibility of deployment in an operational environment.
|
|
16:30-18:00, Paper ThCT24-NT.5 | Add to My Program |
Region-Determined Localization Method for Unmanned Ground Vehicle under Pole-Like Feature Environment |
|
Lai, Yu-Hsiang | National Taiwan University |
Chuang, Chia-Yun | National Taiwan University |
Chen, Yu-Qiang | National Taiwan University |
Lian, Feng-Li | National Taiwan University |
Keywords: Robotics and Automation in Agriculture and Forestry, Localization
Abstract: In this paper, a region-determined navigation method for unmanned ground vehicles (UGVs) is presented. The method aims to solve the GNSS-denied localization problem using pole-like features such as trees or street lights. The approach comprises three parts: mapping, bounding, and localization. To map and reconstruct the environment, the Hector mapping approach and a circle-fitting method are adopted for occupancy mapping and feature mapping. To bound the available working region, we define the intersection of the features' enlarged radii and the desired operating area as negative and positive virtual boundaries. While the robot is cruising, a likelihood detection method is adopted for obstacle searching and comparison. Using the detection results as a feedback reference, an Extended Kalman Filter (EKF) corrects the drift between the GNSS signal and the true waypoints of the mowing robot. Three cruising demonstrations are presented to show the mapping and optimization results; the different demonstration cases represent different situations and potential issues.
|
|
16:30-18:00, Paper ThCT24-NT.6 | Add to My Program |
Tree Instance Segmentation and Traits Estimation for Forestry Environments Exploiting LiDAR Data Collected by Mobile Robots |
|
Malladi, Meher Venkata Ramakrishna | University of Bonn |
Guadagnino, Tiziano | University of Bonn |
Lobefaro, Luca | University of Bonn |
Mattamala, Matias | University of Oxford |
Griess, Holger | Swiss Federal Institute for Forest, Snow and Landscape Research |
Schweier, Janine | Swiss Federal Institute for Forest, Snow and Landscape Research |
Chebrolu, Nived | University of Oxford |
Fallon, Maurice | University of Oxford |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: Robotics and Automation in Agriculture and Forestry, Mapping
Abstract: Forests play a crucial role in our ecosystems, functioning as carbon sinks, climate stabilizers, biodiversity hubs, and sources of wood. By the very nature of their scale, monitoring and maintaining forests is a challenging task. Robotics in forestry can have the potential for substantial automation toward efficient and sustainable foresting practices. In this paper, we address the problem of automatically producing a forest inventory by exploiting LiDAR data collected by a mobile platform. To construct an inventory, we first extract tree instances from point clouds. Then, we process each instance to extract forestry inventory information. Our approach provides per-tree geometric traits such as diameter at breast height together with the individual tree locations in a plot. We validate our results against manual measurements collected by foresters during field trials. Our experiments show strong segmentation and tree trait estimation performance, underlining the potential for automating forestry services.
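A standard way to obtain diameter at breast height (DBH) from a segmented tree instance is to slice the stem point cloud at breast height and fit a circle to the cross-section. The least-squares (Kasa) fit below is a common textbook choice, shown here as an illustrative sketch rather than the authors' exact estimator.

    # Algebraic (Kasa) circle fit to a 2D stem cross-section slice.
    import numpy as np

    def dbh_from_slice(points_xy: np.ndarray) -> float:
        """Estimate stem diameter from an (N, 2) array of slice points."""
        x, y = points_xy[:, 0], points_xy[:, 1]
        A = np.column_stack([2 * x, 2 * y, np.ones(len(x))])
        b = x**2 + y**2
        cx, cy, c = np.linalg.lstsq(A, b, rcond=None)[0]  # center + offset
        radius = np.sqrt(c + cx**2 + cy**2)
        return 2.0 * radius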
|
|
16:30-18:00, Paper ThCT24-NT.7 | Add to My Program |
Automated Testing of Spatially-Dependent Environmental Hypotheses through Active Transfer Learning |
|
Harrison, Nicholas | The University of Sydney: The Australian Centre for Field Roboti |
Wallace, Nathan Daniel | University of Sydney |
Sukkarieh, Salah | The University of Sydney: The Australian Centre for Field Roboti |
Keywords: Robotics and Automation in Agriculture and Forestry, Reactive and Sensor-Based Planning, Transfer Learning
Abstract: The efficient collection of samples is an important factor in outdoor information-gathering applications because of high sampling costs such as time, energy, and potential destruction to the environment. Utilizing available a priori data can be a powerful way to increase efficiency. However, the relationships between this data and the quantity of interest are often not known ahead of time, limiting the ability to leverage this knowledge for improved planning efficiency. To this end, this work combines transfer learning and active learning through a Multi-Task Gaussian Process and an information-based objective function. This combination allows the planner to explore the space of hypothetical inter-quantity relationships and to evaluate these hypotheses in real time, so that new knowledge is immediately exploited in future plans. The performance of the proposed method is evaluated against synthetic data and is shown to evaluate multiple hypotheses correctly. Its effectiveness is also demonstrated on real datasets. The technique is able to identify and leverage hypotheses showing a medium or strong correlation to reduce prediction error by a factor of 1.4--3.4 within the first 7 samples, while poor hypotheses are quickly identified and rejected, eventually having no adverse effect.
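A minimal sketch of the active-sampling loop this abstract describes, with two loud simplifications: the a priori layer is folded in as an extra GP input rather than through a true Multi-Task GP, and the information objective is reduced to picking the highest predictive uncertainty. All function names are hypothetical.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def next_sample(X_obs, y_obs, X_cand, prior_layer):
    """Pick the candidate location with the highest predictive uncertainty.

    prior_layer(X) returns the co-located a-priori quantity at locations X;
    it is appended as an extra input feature (stand-in for a Multi-Task GP).
    """
    F = lambda X: np.column_stack([X, prior_layer(X)])
    gp = GaussianProcessRegressor(RBF(1.0) + WhiteKernel(1e-3))
    gp.fit(F(X_obs), y_obs)
    _, std = gp.predict(F(X_cand), return_std=True)
    return X_cand[np.argmax(std)]   # maximum-entropy choice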
|
|
16:30-18:00, Paper ThCT24-NT.8 | Add to My Program |
Decentralized Multi-Phase Formation Control for Cattle Herding |
|
Nguyen, Dac Dang Khoa | University of Technology Sydney |
Paul, Gavin | University of Technology Sydney |
Alempijevic, Alen | University of Technology Sydney |
Keywords: Swarm Robotics, Agricultural Automation, Multi-Robot Systems
Abstract: Herding is performed by people or trained animals to control the movement of livestock in a direction desired by an operator. This paper presents a novel decentralized control strategy for a group of robots to herd animals, consisting of two phases: a surrounding phase and a driving phase. In the surrounding phase, a custom artificial potential field is employed to simultaneously guide the robots to encircle the herd by tracking the outermost animals while maintaining a safe distance from neighboring robots. Once the encirclement is complete, the robots transition to driving the animals toward a designated goal by simply maintaining their initial formation and traversing toward it. Unlike existing works on herding using flocking control, local observations of the nearest animals and communication with other robots within sensing range are the only requirements for the robots to effectively surround and herd the animals. Moreover, the animal-robot behavior model resembles the interaction of livestock in the presence of an external predatory threat, with the robots acting as predators. An analytical proof and empirical results collected from different simulators demonstrate that the proposed control enables the robots to converge around the boundary of the herd and guide it toward the designated goal.
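A toy sketch of the surrounding-phase control law, assuming a standard attractive/repulsive artificial potential field; the gains, safety distance, and exact field shape are illustrative, not the paper's.

import numpy as np

def surround_velocity(robot, outmost_animal, neighbors,
                      k_att=1.0, k_rep=2.0, d_safe=2.0):
    """Velocity command for one robot from a simple potential field:
    attraction toward the tracked outermost animal, repulsion from any
    neighboring robot inside the safety radius d_safe."""
    v = k_att * (outmost_animal - robot)          # track the herd boundary
    for n in neighbors:
        d = np.linalg.norm(robot - n)
        if 1e-6 < d < d_safe:
            # Repulsion grows as the neighbor enters the safety radius.
            v += k_rep * (1.0 / d - 1.0 / d_safe) * (robot - n) / d ** 2
    return v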
|
|
ThCT25-NT Oral Session, NT-G403 |
Add to My Program |
Localization and Mapping II |
|
|
Chair: Huang, Guoquan | University of Delaware |
Co-Chair: Fallon, Maurice | University of Oxford |
|
16:30-18:00, Paper ThCT25-NT.1 | Add to My Program |
Quantized Visual-Inertial Odometry |
|
Peng, Yuxiang | University of Delaware |
Chen, Chuchu | University of Delaware |
Huang, Guoquan | University of Delaware |
Keywords: Localization, Visual-Inertial SLAM, SLAM
Abstract: As edge devices equipped with cameras and inertial measurement units (IMUs) emerge, endowing these mobile devices with spatial computing capability has huge implications. However, ultra-efficient visual-inertial estimation that provides accurate 3D motion tracking on size, weight, and power (SWAP)-constrained edge devices remains challenging. This is exacerbated by data transfer (between different processors and memory), which consumes significantly more energy than computing itself. To push the state of the art, this paper proposes the first-of-its-kind quantized visual-inertial odometry (QVIO) to offer energy-efficient 3D motion tracking. In particular, we first quantize raw visual measurements in an intuitive way with a given small number of bits and then perform an EKF update with these quantized measurements (termed zQVIO). To improve upon this ad-hoc quantizer (although it works well in practice), we systematically quantize each measurement residual into a single bit and perform maximum-a-posteriori (MAP) estimation. Thanks to these quantizers, the proposed QVIO estimators significantly reduce data transfer and thus improve energy efficiency. As shown in our extensive experiments, the proposed residual-quantized VIO (rQVIO) achieves remarkably competitive performance even when using an average of only 3.7 bits per measurement, equivalent to a data reduction of 8.6 times compared to transmitting single-precision measurements.
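A minimal sketch of what a zQVIO-style uniform quantizer could look like before the EKF update; the measurement range, bit width, and rounding scheme are assumptions for illustration, not the paper's design.

import numpy as np

def quantize(z, n_bits=4, z_min=-1.0, z_max=1.0):
    """Uniformly quantize measurements to n_bits and return grid values,
    emulating what would be reconstructed after a low-bit transfer."""
    levels = 2 ** n_bits - 1
    z_clipped = np.clip(z, z_min, z_max)
    idx = np.round((z_clipped - z_min) / (z_max - z_min) * levels)
    return z_min + idx * (z_max - z_min) / levels   # de-quantized value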
|
|
16:30-18:00, Paper ThCT25-NT.2 | Add to My Program |
OCC-VO: Dense Mapping Via 3D Occupancy-Based Visual Odometry for Autonomous Driving |
|
Li, Heng | University of Science and Technology of China |
Duan, Yifan | University of Science and Technology of China |
Zhang, Xinran | University of Science and Technology of China |
Liu, Haiyi | University of Science and Technology of China |
Ji, Jianmin | University of Science and Technology of China |
Zhang, Yanyong | University of Science and Technology of China |
Keywords: Mapping, SLAM
Abstract: Visual Odometry (VO) plays a pivotal role in autonomous systems, with a principal challenge being the lack of depth information in camera images. This paper introduces OCC-VO, a novel framework that capitalizes on recent advances in deep learning to transform 2D camera images into 3D semantic occupancy, thereby circumventing the traditional need for concurrent estimation of ego poses and landmark locations. Within this framework, we utilize the TPV-Former to convert images from surround-view cameras into 3D semantic occupancy. Addressing the challenges presented by this transformation, we have specifically tailored a pose estimation and mapping algorithm that incorporates a Semantic Label Filter and a Dynamic Object Filter and, finally, utilizes a Voxel PFilter to maintain a consistent global semantic map. Evaluations on the Occ3D-nuScenes benchmark not only showcase a 20.6% improvement in Success Ratio and a 29.6% improvement in trajectory accuracy against ORB-SLAM3, but also emphasize our ability to construct a comprehensive map. Our implementation is open-sourced and available at: https://github.com/USTCLH/OCC-VO.
|
|
16:30-18:00, Paper ThCT25-NT.3 | Add to My Program |
NF-Atlas: Multi-Volume Neural Feature Fields for Large Scale LiDAR Mapping |
|
Yu, Xuan | Zhejiang University |
Liu, Yili | Zhejiang University |
Mao, Sitong | ShenZhen Huawei Cloud Computing Technologies Co., Ltd |
Zhou, Shunbo | The Chinese University of Hong Kong |
Xiong, Rong | Zhejiang University |
Liao, Yiyi | Zhejiang University |
Wang, Yue | Zhejiang University |
Keywords: Mapping, SLAM
Abstract: LiDAR mapping has been a long-standing problem in robotics. Recent progress in neural implicit representation has brought new opportunities to robotic mapping. In this paper, we propose multi-volume neural feature fields, called NF-Atlas, which bridge neural feature volumes with pose graph optimization. By regarding each neural feature volume as a pose graph node and the relative pose between volumes as a pose graph edge, the entire neural feature field becomes both locally rigid and globally elastic. Locally, each neural feature volume employs a sparse feature octree and a small MLP to encode the signed distance function (SDF) of the submap, optionally with semantics. Learning the map with this structure allows for end-to-end solving of maximum a posteriori (MAP) based probabilistic mapping. Globally, the map is built volume by volume independently, avoiding catastrophic forgetting when mapping incrementally. Furthermore, when a loop closure occurs, thanks to the elastic pose-graph-based representation, only the origins of the neural volumes need to be updated, without remapping. Finally, we validate these functionalities of NF-Atlas. Thanks to the sparsity and the optimization-based formulation, NF-Atlas shows competitive performance in terms of accuracy, efficiency, and memory usage on both simulated and real-world datasets. The project page is: https://yuxuan1206.github.io/NFAtlas/.
|
|
16:30-18:00, Paper ThCT25-NT.4 | Add to My Program |
Dusk Till Dawn: Self-Supervised Nighttime Stereo Depth Estimation Using Visual Foundation Models |
|
Vankadari, Madhu | University of Oxford |
Hodgson, Samuel | University of Oxford |
Shin, Sangyun | University of Oxford |
Zhou, Kaichen | University of Oxford |
Markham, Andrew | Oxford University |
Trigoni, Niki | University of Oxford |
Keywords: Mapping, SLAM, Deep Learning for Visual Perception
Abstract: Self-supervised depth estimation algorithms rely heavily on frame-warping relationships and exhibit substantial performance degradation when applied in challenging circumstances, such as low-visibility and nighttime scenarios with varying illumination conditions. Addressing this challenge, we introduce an algorithm designed to achieve accurate self-supervised stereo depth estimation focusing on nighttime conditions. Specifically, we use pretrained visual foundation models to extract generalised features across challenging scenes and present an efficient method for matching and integrating these features from stereo frames. Moreover, to prevent pixels that violate the photometric consistency assumption from negatively affecting the depth predictions, we propose a novel masking approach designed to filter out such pixels. Lastly, addressing weaknesses in the evaluation of current depth estimation algorithms, we present novel evaluation metrics. Our experiments, conducted on challenging datasets including Oxford RobotCar and Multi-Spectral Stereo, demonstrate the robust improvements realized by our approach.
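As a hedged illustration of the masking idea, the snippet below shows a common auto-masking-style criterion from the self-supervised depth literature (keep a pixel only when warping explains it better than not warping); the paper's actual mask may be defined differently.

import numpy as np

def consistency_mask(err_warped, err_identity):
    """Per-pixel boolean mask: True where the warped source frame matches
    the target better than the unwarped source frame does, so stationary
    pixels, occlusions, and non-Lambertian regions drop out of the loss."""
    return err_warped < err_identity

# Usage sketch: loss = (err_warped * consistency_mask(err_warped, err_identity)).mean()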
|
|
16:30-18:00, Paper ThCT25-NT.5 | Add to My Program |
SiLVR: Scalable Lidar-Visual Reconstruction with Neural Radiance Fields for Robotic Inspection |
|
Tao, Yifu | University of Oxford |
Bhalgat, Yash Sanjay | University of Oxford |
Fu, Lanke Frank Tarimo | University of Oxford |
Mattamala, Matias | University of Oxford |
Chebrolu, Nived | University of Oxford |
Fallon, Maurice | University of Oxford |
Keywords: Mapping, SLAM, Deep Learning for Visual Perception
Abstract: We present a neural-field-based large-scale reconstruction system that fuses lidar and vision data to generate high-quality reconstructions that are geometrically accurate and capture photo-realistic textures. The system adapts the state-of-the-art neural radiance field (NeRF) representation to also incorporate lidar data, which adds strong geometric constraints on depth and surface normals. We exploit the trajectory from a real-time lidar SLAM system to bootstrap a Structure-from-Motion (SfM) procedure, both to significantly reduce the computation time and to provide metric scale, which is crucial for the lidar depth loss. We use submapping to scale the system to large-scale environments captured over long trajectories. We demonstrate the reconstruction system with data from a multi-camera, lidar sensor suite carried onboard a legged robot, hand-held while scanning building scenes for 600 metres, and onboard an aerial robot surveying a multi-storey mock disaster site building. Website: https://ori.ox.ac.uk/labs/drs/nerf-mapping/
|
|
16:30-18:00, Paper ThCT25-NT.6 | Add to My Program |
LESS-Map: Lightweight and Evolving Semantic Map in Parking Lots for Long-Term Self-Localization |
|
MingRui, Liu | Zhejiang University |
Tang, Xinyang | Shanghai Jiao Tong University |
Qian, Yeqiang | Shanghai Jiao Tong University |
Chen, Jiming | Zhejiang University |
Li, Liang | Zhejiang University |
Keywords: Mapping, SLAM, Omnidirectional Vision
Abstract: Precise and long-term stable localization is essential in parking lots for tasks such as autonomous driving and autonomous valet parking. Existing methods rely on a fixed and memory-inefficient map and lack robust data association approaches, making them unsuitable for precise localization or long-term map maintenance. In this paper, we propose a novel mapping, localization, and map update system based on ground semantic features, utilizing low-cost cameras. We present a precise and lightweight parameterization method to establish improved data association and achieve accurate localization at the centimeter level. Furthermore, we propose a novel map update approach that implements high-quality data association for the parameterized semantic features, allowing continuous map update and refinement during re-localization while maintaining centimeter-level accuracy. We validate the performance of the proposed method in real-world experiments and compare it against state-of-the-art algorithms. The proposed method achieves an average accuracy improvement of 5 cm during the registration process. The generated maps consume only 450 KB/km and remain adaptable to evolving environments through continuous updates.
|
|
16:30-18:00, Paper ThCT25-NT.7 | Add to My Program |
Observation Time Difference: An Online Dynamic Objects Removal Method for Ground Vehicles |
|
Wu, Rongguang | Northeastern University |
Pang, Chenglin | Northeastern University |
Wu, Xuankang | Northeastern University |
Fang, Zheng | Northeastern University |
Keywords: Mapping, SLAM, Range Sensing
Abstract: In the process of urban environment mapping, the sequential accumulation of dynamic objects leaves a large number of traces in the map. These traces usually degrade the localization accuracy and navigation performance of the robot. Therefore, dynamic object removal plays an important role in creating a clean map. However, conventional dynamic object removal methods usually run offline; that is, the map is reprocessed after it is constructed, which undoubtedly incurs additional time costs. To tackle this problem, this paper proposes a novel method for online dynamic object removal for ground vehicles. According to the observation time difference between an object and the ground where it is located, dynamic objects are classified into two types: suddenly appearing and suddenly disappearing. For these two kinds of dynamic objects, we propose downward retrieval and upward retrieval methods to eliminate them, respectively. We validate our method on the SemanticKITTI dataset and an author-collected dataset with highly dynamic objects. Compared with other state-of-the-art methods, our method is more efficient and robust, reducing the running time per frame by more than 60% on average. Our method is open-sourced on GitHub.
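A hedged sketch of the core classification rule suggested by the abstract: compare when a point was first observed with when the ground beneath it was first observed. The sign convention and tolerance below are assumptions, not taken from the paper.

def classify_dynamic(obj_first_seen, ground_first_seen, tol=0.5):
    """Label an object column by its observation time difference (seconds).
    Positive difference: the ground was mapped before the object appeared;
    negative: the object was mapped before the ground became visible."""
    dt = obj_first_seen - ground_first_seen
    if dt > tol:
        return "suddenly_appear"
    if dt < -tol:
        return "suddenly_disappear"
    return "static"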
|
|
16:30-18:00, Paper ThCT25-NT.8 | Add to My Program |
RO-MAP: Real-Time Multi-Object Mapping with Neural Radiance Fields |
|
Han, Xiao | University of Electronic Science and Technology of China |
Liu, Houxuan | University of Electronic Science and Technology of China |
Ding, Yunchao | University of Electronic Science and Technology of China |
Yang, Lu | University of Electronic Science and Technology of China |
Keywords: Mapping, SLAM, Semantic Scene Understanding
Abstract: Accurate perception of objects in the environment is important for improving the scene understanding capability of SLAM systems. In robotic and augmented reality applications, object maps with semantic and metric information show attractive advantages. In this paper, we present RO-MAP, a novel multi-object mapping pipeline that does not rely on 3D priors. Given only monocular input, we use neural radiance fields to represent objects and couple them with a lightweight object SLAM based on multi-view geometry to simultaneously localize objects and implicitly learn their dense geometry. We create a separate implicit model for each detected object and train the models dynamically and in parallel as new observations are added. Experiments on synthetic and real-world datasets demonstrate that our method can generate semantic object maps with shape reconstruction, and is competitive with offline methods while achieving real-time performance (25 Hz). The code and dataset will be available at: https://github.com/XiaoHan-Git/RO-MAP
|
|
16:30-18:00, Paper ThCT25-NT.9 | Add to My Program |
OctoMap-RT: Fast Probabilistic Volumetric Mapping Using Ray-Tracing GPUs |
|
Min, Heajung | Ewha Womans University |
Han, Kyungmin | Ewha Womans University |
Kim, Young J. | Ewha Womans University |
Keywords: Mapping, Simulation and Animation, Hardware-Software Integration in Robotics
Abstract: A 3D occupancy map that is accurately modeled after real-world environments is essential for reliably performing robotic tasks. Probabilistic volumetric mapping (PVM) is a well-known environment mapping method using volumetric voxel grids that represent the probability of occupancy. The main bottleneck of current CPU-based PVM, such as OctoMap, is determining voxel grids with occupied and free states using ray-shooting. In this paper, we propose an octree-based PVM, called OctoMap-RT, using a hybrid of off-the-shelf ray-tracing GPUs and CPUs to substantially improve CPU-based PVM. OctoMap-RT employs massively parallel ray-shooting using GPUs to generate occupied and free voxel grids and to update their occupancy states in parallel, and it exploits CPUs to restructure the PVM using the updated voxels. Our experiments using various large-scale real-world benchmarking environments with dense and high-resolution sensor measurements demonstrate that OctoMap-RT builds maps up to 41.2 times faster than OctoMap and 9.3 times faster than the recent SuperRay CPU implementation. Moreover, OctoMap-RT constructs a map with 0.52% higher accuracy, in terms of the number of occupancy grids, than both OctoMap and SuperRay.
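For context, the per-voxel update that a PVM applies once ray-shooting has produced the occupied and free voxel sets is the classic clamped log-odds rule sketched below (in plain Python; OctoMap-RT's contribution is running the ray-shooting itself on RT GPUs). The probabilities and clamping bounds are typical OctoMap defaults, not values from the paper.

import math

L_OCC, L_FREE = math.log(0.7 / 0.3), math.log(0.4 / 0.6)  # hit / miss updates
L_MIN, L_MAX = -2.0, 3.5                                   # clamping bounds

def update_logodds(logodds, occupied_voxels, free_voxels):
    """Clamped log-odds occupancy update for one sensor scan.
    logodds maps voxel keys to their current log-odds value."""
    for v in occupied_voxels:
        logodds[v] = min(logodds.get(v, 0.0) + L_OCC, L_MAX)
    for v in free_voxels:
        logodds[v] = max(logodds.get(v, 0.0) + L_FREE, L_MIN)
    return logodds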
|
|
ThCT26-NT Oral Session, NT-G404 |
Add to My Program |
SLAM VI |
|
|
Chair: Leutenegger, Stefan | Technical University of Munich |
Co-Chair: Xu, Yang | Zhejiang University |
|
16:30-18:00, Paper ThCT26-NT.1 | Add to My Program |
VICAN: Very Efficient Calibration Algorithm for Large Camera Networks |
|
Moreira, Gabriel | Carnegie Mellon University |
Marques, Manuel | Instituto Superior Técnico |
Costeira, Joao Paulo | Insituto Superior Tecnico |
Hauptmann, Alexander | Carnegie Mellon University |
Keywords: SLAM, Sensor Networks, Multi-Robot SLAM
Abstract: The precise estimation of camera poses within large camera networks is a foundational problem in computer vision and robotics, with broad applications spanning autonomous navigation, surveillance, and augmented reality. In this paper, we introduce a novel methodology that extends state-of-the-art Pose Graph Optimization (PGO) techniques. Departing from the conventional PGO paradigm, which primarily relies on camera-camera edges, our approach centers on the introduction of a dynamic element (any rigid object free to move in the scene) whose pose can be reliably inferred from a single image. Specifically, we consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step. This shift not only offers a solution to the challenges encountered in directly estimating relative poses between cameras, particularly in adverse environments, but also leverages the inclusion of numerous object poses to ameliorate and integrate errors, resulting in accurate camera pose estimates. Though our framework retains compatibility with traditional PGO solvers, its efficacy benefits from a custom-tailored optimization scheme. To this end, we introduce an iterative primal-dual algorithm capable of handling large graphs. Empirical benchmarks, conducted on a new dataset of simulated indoor environments, substantiate the efficacy and efficiency of our approach.
|
|
16:30-18:00, Paper ThCT26-NT.2 | Add to My Program |
Tightly-Coupled LiDAR-Visual-Inertial SLAM and Large-Scale Volumetric-Occupancy Mapping |
|
Boche, Simon | Technical University of Munich |
Barbas Laina, Sebastián | TU Munich |
Leutenegger, Stefan | Technical University of Munich |
Keywords: SLAM, Visual-Inertial SLAM, Mapping
Abstract: Autonomous navigation is one of the key requirements for every potential application of mobile robots in the real world. Besides high-accuracy state estimation, a suitable and globally consistent representation of the 3D environment is indispensable. We present a fully tightly-coupled LiDAR-Visual-Inertial SLAM system and 3D mapping framework that applies local submapping strategies to achieve scalability to large-scale environments. A novel, correspondence-free, and inherently probabilistic formulation of LiDAR residuals is introduced, expressed only in terms of the occupancy fields and their respective gradients. These residuals can be added to a factor graph optimisation problem, either as frame-to-map factors for the live estimates or as map-to-map factors aligning the submaps with respect to one another. Experimental validation demonstrates that the approach achieves state-of-the-art pose accuracy and furthermore produces globally consistent volumetric occupancy submaps which can be directly used in downstream tasks such as navigation or exploration.
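A loose sketch of what a correspondence-free occupancy residual of this kind can look like for a single LiDAR endpoint: evaluate the occupancy field at the transformed point and use its spatial gradient for the Jacobian. The field accessors and the translation-only Jacobian are simplifying assumptions, not the paper's formulation.

import numpy as np

def occupancy_residual(T_WS, p_S, occ_field, occ_grad, occ_surface=0.0):
    """Residual and translation Jacobian for one LiDAR endpoint.

    T_WS: 4x4 submap-from-sensor transform; p_S: endpoint in sensor frame.
    occ_field(p) returns a scalar occupancy value at a 3D point and
    occ_grad(p) its spatial gradient (both hypothetical accessors)."""
    p_W = T_WS[:3, :3] @ p_S + T_WS[:3, 3]
    r = occ_field(p_W) - occ_surface   # endpoint should lie on the surface
    J_t = occ_grad(p_W)                # chain rule w.r.t. translation
    return r, J_t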
|
|
16:30-18:00, Paper ThCT26-NT.3 | Add to My Program |
Active Visual Localization for Multi-Agent Collaboration: A Data-Driven Approach |
|
Hanlon, Matthew | ETH Zurich |
Sun, Boyang | ETH Zurich |
Pollefeys, Marc | ETH Zurich |
Blum, Hermann | ETH Zurich |
Keywords: View Planning for SLAM, Deep Learning for Visual Perception
Abstract: Rather than having each newly deployed robot create its own map of its surroundings, the growing availability of SLAM-enabled devices provides the option of simply localizing in a map from another robot or device. In cases such as multi-robot or human-robot collaboration, localizing all agents in the same map is even necessary. However, localizing, e.g., a ground robot in the map of a drone or a head-mounted MR headset presents unique challenges due to viewpoint changes. This work investigates how active visual localization can be used to overcome such viewpoint changes. Specifically, we focus on the problem of selecting the optimal viewpoint at a given location. We compare existing approaches from the literature with additional proposed baselines and propose a novel data-driven approach. The results demonstrate the superior performance of our data-driven approach when compared to existing methods, both in controlled simulation experiments and in real-world deployment.
|
|
16:30-18:00, Paper ThCT26-NT.4 | Add to My Program |
Autonomous Implicit Indoor Scene Reconstruction with Frontier Exploration |
|
Zeng, Jing | Zhejiang University |
Li, Yanxu | Zhejiang University |
Sun, Jiahao | Zhejiang University |
Ye, Qi | Zhejiang University |
Ran, Yunlong | Zhejiang University |
Chen, Jiming | Zhejiang University |
Keywords: View Planning for SLAM, Mapping, Continual Learning
Abstract: Implicit neural representations have demonstrated significant promise for 3D scene reconstruction. Recent works have extended their application to autonomous implicit reconstruction through Next Best View (NBV) based methods. However, the NBV method cannot guarantee complete scene coverage and often necessitates extensive viewpoint sampling, particularly in complex scenes. In this paper, we propose to 1) incorporate frontier-based exploration tasks for global coverage with implicit surface uncertainty-based reconstruction tasks to achieve high-quality reconstruction, and 2) introduce a method to compute implicit surface uncertainty from color uncertainty, which reduces the time needed for view selection. With these two tasks, we further propose an adaptive strategy for switching modes in view path planning, reducing planning time while maintaining superior reconstruction quality. Our method exhibits the highest reconstruction quality among all planning methods and superior planning efficiency among methods involving reconstruction tasks. We deploy our method on a UAV, and the results show that our method can plan multi-task views and reconstruct a scene with high quality.
|
|
16:30-18:00, Paper ThCT26-NT.5 | Add to My Program |
Probabilistic Active Loop Closure for Autonomous Exploration |
|
Yin, He | Amazon.com, Inc |
Park, Jong Jin | Amazon Lab126 |
Mendes de Almeida Neto, Marcelino | University of Texas at Austin |
Labrie, Martin | Amazon |
Zamiska, James | Amazon |
Kim, Richard | Amazon, Lab126 |
Keywords: View Planning for SLAM, SLAM, Motion and Path Planning
Abstract: When a mobile robot autonomously explores an indoor space to produce a localization and navigation map, it is important to create both a stable pose graph and a high-quality occupancy map that covers all the navigable areas. In this work, we propose a novel probabilistic active loop closure framework which attempts to maximally reduce pose graph uncertainty during exploration and improves occupancy map quality. We calculate a probabilistic reward of getting a loop closure at any pose on a pose graph, which considers both how much pose graph uncertainty would be reduced by getting a loop closure there, and the robot’s travel cost to navigate to that pose. By choosing poses that provide the largest rewards, we can maximally reduce pose graph uncertainty while avoiding long travel times. The effectiveness of the method is illustrated through on-device testing in various floor plans.
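A compact sketch of the reward trade-off the abstract describes, with the loop-closure probability model, uncertainty-reduction estimate, and cost weighting left as hypothetical callables.

import numpy as np

def best_loop_closure_pose(candidates, p_closure, uncertainty_drop,
                           travel_cost, alpha=0.1):
    """Score each candidate pose on the graph: expected pose-graph
    uncertainty reduction from closing a loop there, discounted by the
    cost of navigating to it; higher is better."""
    rewards = [p_closure(c) * uncertainty_drop(c) - alpha * travel_cost(c)
               for c in candidates]
    return candidates[int(np.argmax(rewards))]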
|
|
16:30-18:00, Paper ThCT26-NT.6 | Add to My Program |
CARE: Confidence-Rich Autonomous Robot Exploration Using Bayesian Kernel Inference and Optimization |
|
Xu, Yang | Zhejiang University |
Zheng, Ronghao | Zhejiang University |
Zhang, Senlin | Zhejiang University |
Liu, Meiqin | Zhejiang University |
Huang, Shoudong | University of Technology, Sydney |
Keywords: View Planning for SLAM, SLAM, Motion and Path Planning
Abstract: In this paper, we consider improving the efficiency of information-based autonomous robot exploration in unknown and complex environments. We first utilize Gaussian process (GP) regression to learn a surrogate model to infer the confidence-rich mutual information (CRMI) of querying control actions, then adopt an objective function consisting of predicted CRMI values and prediction uncertainties to conduct Bayesian optimization (BO), i.e., GP-based BO (GPBO). The trade-off between the best action with the highest CRMI value (exploitation) and the action with high prediction variance (exploration) can be realized. To further improve the efficiency of GPBO, we propose a novel lightweight information gain inference method based on Bayesian kernel inference and optimization (BKIO), achieving an approximate logarithmic complexity without the need for training. BKIO can also infer the CRMI and generate the best action using BO with bounded cumulative regret, which ensures its comparable accuracy to GPBO with much higher efficiency. Extensive numerical and real-world experiments show the desired efficiency of our proposed methods without losing exploration performance in different unstructured, cluttered environments.
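A minimal sketch of the GPBO action-selection step, using a UCB-style combination of predicted CRMI and predictive uncertainty; the surrogate kernel and the weight beta are illustrative assumptions, and the BKIO speed-up is not reproduced here.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def select_action(actions_tried, crmi_values, candidate_actions, beta=2.0):
    """Fit the GP surrogate on evaluated (action, CRMI) pairs, then pick
    the candidate maximizing predicted CRMI plus an exploration bonus."""
    gp = GaussianProcessRegressor()
    gp.fit(actions_tried, crmi_values)
    mu, std = gp.predict(candidate_actions, return_std=True)
    return candidate_actions[np.argmax(mu + beta * std)]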
|
|
16:30-18:00, Paper ThCT26-NT.7 | Add to My Program |
Event-Based Stereo Visual Odometry with Native Temporal Resolution Via Continuous-Time Gaussian Process Regression |
|
Wang, Jianeng | University of Oxford |
Gammell, Jonathan | University of Oxford |
Keywords: Vision-Based Navigation, Localization, SLAM
Abstract: Event-based cameras asynchronously capture individual visual changes in a scene. This makes them more robust than traditional frame-based cameras to highly dynamic motions and poor illumination. It also means that every measurement in a scene can occur at a unique time. Handling these different measurement times is a major challenge of using event-based cameras. It is often addressed in visual odometry (VO) pipelines by approximating temporally close measurements as occurring at one common time. This grouping simplifies the estimation problem but, absent additional sensors, sacrifices the inherent temporal resolution of event-based cameras. This paper instead presents a complete stereo VO pipeline that estimates directly with individual event-measurement times without requiring any grouping or approximation in the estimation state. It uses continuous-time trajectory estimation to maintain the temporal fidelity and asynchronous nature of event-based cameras through Gaussian process regression with a physically motivated prior. Its performance is evaluated on the MVSEC dataset, where it achieves 7.9·10⁻³ and 5.9·10⁻³ RMS relative error on two independent sequences, outperforming the existing publicly available event-based stereo VO pipeline by two and four times, respectively.
|
|
16:30-18:00, Paper ThCT26-NT.8 | Add to My Program |
MSCEqF: A Multi State Constraint Equivariant Filter for Vision-Aided Inertial Navigation |
|
Fornasier, Alessandro | University of Klagenfurt |
van Goor, Pieter | The Australian National University |
Allak, Eren | University of Klagenfurt |
Mahony, Robert | Australian National University |
Weiss, Stephan | Universität Klagenfurt |
Keywords: Vision-Based Navigation, Visual-Inertial SLAM, Localization
Abstract: This letter revisits the problem of visual-inertial navigation systems (VINS) and presents a novel filter design we dub the multi state constraint equivariant filter (MSCEqF, in analogy to the well-known MSCKF). We define a symmetry group and a corresponding group action that specifically allow the design of an equivariant filter for the problem of visual-inertial odometry (VIO), including IMU bias and camera intrinsic and extrinsic calibration states. In contrast to state-of-the-art invariant extended Kalman filter (IEKF) approaches that simply tack IMU bias and other states onto the SE2(3) group, our filter builds upon a symmetry that properly includes all the states in the group structure. Thus, we achieve improved behavior, particularly when linearization points deviate largely from the truth (i.e., on transients upon state disturbances). Our approach is inherently consistent, even during convergence phases from significant errors, without the need for error uncertainty adaptation, observability constraints (OC), or other consistency-enforcing techniques. This leads to greatly improved estimator behavior under significant errors and unexpected state changes during, e.g., long-duration missions. We evaluate our approach in a multitude of different experiments using three prominent real-world datasets.
|
|
16:30-18:00, Paper ThCT26-NT.9 | Add to My Program |
L-VIWO: Visual-Inertial-Wheel Odometry Based on Lane Lines |
|
Zhao, Bin | Northeastern University |
Zhang, Yunzhou | Northeastern University |
Huang, Junjie | Northeastern University |
Zhang, Xichen | Northeastern University |
Long, Zeyu | Northeastern University |
Li, Yulong | Northeastern University |
Keywords: Localization, SLAM, Visual-Inertial SLAM
Abstract: To achieve precise localization for autonomous vehicles and mitigate the accumulated drift error in odometry, this paper proposes L-VIWO, a visual-inertial-wheel odometry based on lane lines. The method effectively utilizes the lateral constraints provided by lane lines to eliminate and mitigate incrementally accumulated pose errors. First, we introduce a lane line tracking method that enables multi-frame tracking of the same lane line, thereby obtaining multi-frame data for each lane line. Then, we utilize the multi-frame lane line data and the curvature characteristics of adjacent lane lines to optimize the positions of the lane line sample points, thus building a reliable lane line map. Finally, we use the built local lane line map to correct the position of the vehicle. Based on the corrected position and the prior pose from the odometry, we build a graph optimization model to optimize the pose of the vehicle. Localization experiments on the KAIST dataset demonstrate that the proposed method effectively enhances the localization accuracy of odometry, confirming the effectiveness of the method.
|
|
ThCT27-NT Oral Session, NT-G2 |
Add to My Program |
Multifingered Hands |
|
|
Chair: Katzschmann, Robert Kevin | ETH Zurich |
Co-Chair: Liarokapis, Minas | The University of Auckland |
|
16:30-18:00, Paper ThCT27-NT.1 | Add to My Program |
Fast Force-Closure Grasp Synthesis with Learning-Based Sampling |
|
Xu, Wei | Shanghai Jiao Tong University |
Guo, Weichao | Shanghai Jiao Tong University |
Shi, Xu | Shanghai Jiao Tong University |
Sheng, Xinjun | Shanghai Jiao Tong University |
Zhu, Xiangyang | Shanghai Jiao Tong University |
Keywords: Multifingered Hands, Grasping, Deep Learning in Grasping and Manipulation
Abstract: Anthropomorphic robotic hands have been widely investigated for dexterous object manipulation because of their anatomical similarity to the human hand. However, the large dimension of the configuration space challenges the real-time performance of existing grasp planning methods and drastically limits the application of anthropomorphic hands. In this letter, we propose a fast force-closure grasp synthesis (FFCGS) method for anthropomorphic hands to efficiently grasp unknown objects. FFCGS takes a signed distance field (SDF) as input. First, a network that samples feasible 6D wrist poses is trained end-to-end to reduce the dimension of the search space. Furthermore, a fast optimization algorithm is presented to find finger configurations for force-closure precision grasps based on the differentiable Q-distance metric. We validate our method in both simulated and real-world environments. Experimental results show that the proposed FFCGS achieves significantly improved performance in terms of time efficiency (5 times faster), grasp quality metrics, and success rate (5%-10% improvement) over benchmark methods. The outcomes of this study have great significance in promoting motion planning for robot hand-arm systems and upper-limb prostheses.
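To illustrate how an SDF input can drive grasp optimization, here is a toy contact objective: fingertip sample points are pulled onto the zero level set while other hand links are pushed outside a clearance margin. This is a generic construction for illustration, not the paper's differentiable Q-distance metric.

def contact_cost(sdf, fingertips, links, margin=0.005):
    """Penalty that pulls fingertips onto the object surface while keeping
    the remaining hand links at least `margin` metres clear of it.
    sdf(p) returns the signed distance of a 3D point to the object."""
    touch = sum(sdf(p) ** 2 for p in fingertips)             # want SDF ~ 0
    collide = sum(max(0.0, margin - sdf(p)) ** 2 for p in links)
    return touch + collide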
|
|
16:30-18:00, Paper ThCT27-NT.2 | Add to My Program |
The New Dexterity Modular, Dexterous, Anthropomorphic, Open-Source, Bimanual Manipulation Platform: Combining Adaptive and Hybrid Actuation Systems with Lockable Joints |
|
Chang, Che-Ming | University of Auckland |
Sanches, Felipe Padula | University of Auckland |
Gao, Geng | Acumino Inc |
Liarokapis, Minas | The University of Auckland |
Keywords: Dual Arm Manipulation, Multifingered Hands, Dexterous Manipulation
Abstract: This work introduces the New Dexterity modular, dexterous, anthropomorphic, open-source, bimanual manipulation platform (OpenBMP), designed for research and rapid experimentation in robot grasping, dexterous manipulation, and bimanual manipulation. The platform combines adaptive and hybrid actuation systems with lockable joints, facilitating transitions between the execution of delicate and forceful tasks. Antagonistic tendon-driven elbows and inline actuator transmissions reduce the system's inertial mass while enhancing energy efficiency and overall performance. Leveraging 3D printing and carbon-fiber-reinforced manufacturing of core parts, the platform is easy to replicate and highly modular. This paper presents the details of the design, the actuation principles, and experimental validation of the platform's efficiency through the execution of complex teleoperation and telemanipulation tasks. The designs, electronics, and code are open-sourced to allow replication by others.
|
|
16:30-18:00, Paper ThCT27-NT.3 | Add to My Program |
Fully 3D Printable Robot Hand and Soft Tactile Sensor Based on Air-Pressure and Capacitive Proximity Sensing |
|
Taylor, Sean | University of Illinois at Urbana Champaign |
Park, Kyungseo | Daegu Gyeongbuk Institute of Science and Technology (DGIST) |
Yamsani, Sankalp | University of Illinois Urbana-Champaign |
Kim, Joohyung | University of Illinois at Urbana-Champaign |
Keywords: Multifingered Hands, Force and Tactile Sensing, Grasping
Abstract: Soft tactile sensors can enable robots to grasp objects easily and stably by simultaneously providing tactile data and mechanical compliance to robotic hands. If there are low-cost and easy-to-build robotic hands equipped with soft tactile sensors, they would be highly accessible and facilitate many robotics projects. To this end, we propose an accessible robot hand capable of tactile sensing, which can be produced through digital fabrication. We made the robot hand using commercial servo motors as well as components 3D printed from PETG, TPU, and conductive TPU. These materials allow the robot hand to have a soft, durable, and even functional structure. Specifically, the soft fingertip was crafted from TPU and conductive TPU, and their mechanical and electrical properties enable easy implementation of tactile sensing capabilities, such as force and capacitive touch, simply by adding off-the-shelf sensors (air-pressure and capacitance). The proposed robot hand could effectively sense interaction forces and proximity to conductive objects, and its utilization in various tasks was also demonstrated successfully.
|
|
16:30-18:00, Paper ThCT27-NT.4 | Add to My Program |
TPGP: Temporal-Parametric Optimization with Deep Grasp Prior for Dexterous Motion Planning |
|
Li, Haoming | Zhejiang University |
Ye, Qi | Zhejiang University |
Huo, Yuchi | Zhejiang University |
Liu, Qingtao | Zhejiang University |
Jiang, Shijian | Zhejiang University |
Zhou, Tao | Zhejiang University |
Li, Xiang | Oppo Us Research Center |
Zhou, Yang | OPPO |
Chen, Jiming | Zhejiang University |
Keywords: Constrained Motion Planning, Grasping, Multifingered Hands
Abstract: Grasping motion planning aims to find a feasible grasping trajectory in the configuration space given an input target grasp. While optimizing grasp motion for two- or three-fingered grippers has been well studied, natural grasp motion planning with a dexterous hand remains a very challenging problem due to the high-dimensional working space. In this work, we propose a novel temporal-parametric grasp prior (TPGP) optimization method that simplifies grasping trajectory optimization for the dexterous hand while maintaining the smooth and natural properties of the grasping motion. Specifically, we reformulate the discrete trajectory parameters into a temporal-based parameterization, where a prior constraint provided by a hand poser network is introduced to ensure that the hand pose is natural and reasonable throughout the trajectory. Finally, we present a joint target optimization strategy that enhances the target pose for more feasible trajectories. Extensive validations on two public datasets show that our method outperforms state-of-the-art methods regarding grasp motion on various metrics.
|
|
16:30-18:00, Paper ThCT27-NT.5 | Add to My Program |
A Wearable Robotic Hand for Hand-Over-Hand Imitation Learning |
|
Wei, Dehao | Tsinghua University |
Xu, Huazhe | Tsinghua University |
Keywords: Multifingered Hands, Learning from Demonstration, In-Hand Manipulation
Abstract: Dexterous manipulation through imitation learning has gained significant attention in robotics research. The collection of high-quality expert data is of paramount importance when using imitation learning. Existing approaches for acquiring expert data commonly use a data glove to capture hand motion information. However, this method suffers from limitations, as the collected information cannot be directly mapped to the robotic hand due to discrepancies in their degrees of freedom or structures. Furthermore, it fails to accurately capture force feedback between the hand and objects during the demonstration process. To overcome these challenges, this paper presents a novel solution in the form of a wearable dexterous hand, the Hand-over-hand Imitation learning wearable Robotic Hand (HIRO Hand), which integrates expert data collection and enables the implementation of dexterous operations. The HIRO Hand empowers the operator to utilize their own tactile feedback to determine appropriate force, position, and actions, resulting in more accurate imitation of the expert's actions. We develop both non-learning and visual behavior cloning-based controllers, allowing the HIRO Hand to successfully achieve grasping and in-hand manipulation.
|
|
16:30-18:00, Paper ThCT27-NT.6 | Add to My Program |
WARABI Hand: Five-Fingered Robotic Hand with Flexible Skin and Force Sensors for Social Interaction |
|
Nakane, Aoi | The University of Tokyo |
Yanokura, Iori | University of Tokyo |
Hasegawa, Shun | The University of Tokyo |
Yamaguchi, Naoya | The University of Tokyo |
Kojima, Kunio | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Physical Human-Robot Interaction, Touch in HRI, Multifingered Hands
Abstract: A robotic hand for social interaction should be capable of comfortable touch with humans. However, it is difficult to mount the skin, tactile sensors, and driving mechanisms required for human contact, especially holding hands, on a slender finger. In addition, to unitize the hand for easy use with any robot and for maintainability, the mechanism must be contained within the small space of the fingers and palm. In this paper, we propose a human-sized five-fingered robotic hand named WARABI Hand. It is covered with multi-layered rubber skin to realize a human-like soft and pleasant feel. Force sensors on each finger link detect contact with humans and adjust the gripping force. We conducted experiments in which a humanoid equipped with WARABI Hand grasped a forearm, held hands, and interlocked fingers with a person. The performance of object grasping was also evaluated. We demonstrate that our proposed hand is useful for interaction with humans, including receiving and handing over objects.
|
|
16:30-18:00, Paper ThCT27-NT.7 | Add to My Program |
Sensorized Soft Skin for Dexterous Robotic Hands |
|
Egli, Jana | ETHZ |
Forrai, Benedek | ETH Zürich |
Buchner, Thomas Jakob Konrad | ETH Zurich |
Su, Jiangtao | Nanyang Technological University |
Chen, Xiaodong | Nanyang Technological University |
Katzschmann, Robert Kevin | ETH Zurich |
Keywords: Multifingered Hands, Force and Tactile Sensing, Dexterous Manipulation
Abstract: Conventional industrial robots often use two-fingered grippers or suction cups to manipulate objects or interact with the world. Because of their simplified design, they are unable to reproduce the dexterity of human hands when manipulating a wide range of objects. While the control of humanoid hands has evolved greatly, hardware platforms still lack capabilities, particularly in tactile sensing and providing soft contact surfaces. In this work, we present a method that equips the skeleton of a tendon-driven humanoid hand with a soft and sensorized tactile skin. Multi-material 3D printing allows us to iteratively approach a cast skin design which preserves the robot's dexterity in terms of range of motion and speed. We demonstrate that the soft skin enables firmer grasps and that piezoresistive sensor integration enhances the hand's tactile sensing capabilities.
|
|
16:30-18:00, Paper ThCT27-NT.8 | Add to My Program |
Identifying Expert Behavior in Offline Training Datasets Improves Behavioral Cloning of Robotic Manipulation Policies |
|
Wang, Qiang | University College Dublin |
McCarthy, Robert | CeADAR - Ireland’s Centre for Applied AI, University College Dub |
Cordova Bulens, David | University College Dublin |
Sanchez, Francisco Roldan | Dublin City University |
McGuinness, Kevin | Dublin City University |
O’Connor, Noel E. | Dublin City University |
Redmond, Stephen J. | University College Dublin |
Keywords: Imitation Learning, Deep Learning Methods, Dexterous Manipulation
Abstract: This paper presents our solution for the Real Robot Challenge III, aiming to address dexterous robotic manipulation tasks through learning from offline data. In this competition, participants were given two types of datasets for each task: expert and mixed. Each expert dataset is collected by a high-skill policy, whereas the mixed dataset is collected using both expert and non-expert policies. We found that vanilla behavioural cloning (BC) can learn a very proficient policy with minimal human intervention when trained on expert datasets. Notably, BC outperformed even the most advanced offline reinforcement learning (RL) algorithms. However, when applied to mixed datasets, the performance of BC deteriorates; the performance of offline RL algorithms is also less than satisfactory. Upon examining the provided datasets, it was apparent that each mixed dataset contained a significant proportion of expert data, which should enable the training of a proficient BC agent. However, the expert data is not labelled in the datasets. As a result, we propose a classifier to identify the pattern of the expert behaviour within a mixed dataset and then utilize it to isolate the expert data. To further boost the BC performance, we take advantage of the geometric symmetry of the arena to augment the training dataset through mathematical transformations. Our submission outperformed that of other participants. Site: https://github.com/wq13552463699/Real-Robot-Challenge-III-Winning-Solution
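A rough sketch of the expert-isolation idea: score transitions in the mixed dataset with a classifier trained to separate expert from mixed data (a positive-unlabelled-style shortcut), then keep only high-confidence transitions for behavioural cloning. The classifier choice and threshold are assumptions, not the authors' design.

import numpy as np
from sklearn.linear_model import LogisticRegression

def isolate_expert(expert_X, mixed_X, keep_above=0.9):
    """Return the rows of mixed_X most likely to be expert behaviour.
    expert_X / mixed_X are (N, d) arrays of flattened transitions."""
    X = np.vstack([expert_X, mixed_X])
    y = np.r_[np.ones(len(expert_X)), np.zeros(len(mixed_X))]
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    scores = clf.predict_proba(mixed_X)[:, 1]   # P(expert-like)
    return mixed_X[scores > keep_above]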
|
|
16:30-18:00, Paper ThCT27-NT.9 | Add to My Program |
Development of a Versatile Robotic Hand Toward Jig-Less Assembly of a Shaft-Shaped Part |
|
Shibata, Kohei | Wakayama University |
Dobashi, Hiroki | Wakayama University |
Keywords: Multifingered Hands, Grasping
Abstract: Jig-less assembly of a shaft-shaped part with a single versatile robotic hand requires several functions of the hand to achieve a series of operations such as alignment, picking, reorientation, and positioning of the part. In this research, we propose a novel robotic hand with these functions and corresponding finger mechanisms. Moreover, we propose a manipulation strategy for grasping shaft-shaped parts with the proposed hand, and experimentally verify the feasibility of desired operations with the proposed method as well as the versatility of the hand for several different parts.
|
|
ThCT28-NT Oral Session, NT-G4 |
Add to My Program |
Perception for Grasping and Manipulation III |
|
|
Chair: Alzugaray, Ignacio | Imperial College London |
Co-Chair: Han, Yiheng | Beijing University of Technology |
|
16:30-18:00, Paper ThCT28-NT.1 | Add to My Program |
AHPPEBot: Autonomous Robot for Tomato Harvesting Based on Phenotyping and Pose Estimation |
|
Li, Xingxu | Beijing University of Technology, Beijing, China |
Ma, Nan | Beijing University of Technology, Beijing, China |
Han, Yiheng | Beijing University of Technology |
Yang, Shun | Beijing AIForceTech Technology Co., Ltd |
Zheng, Siyi | Beijing AIForce Technology Co., Ltd |
Keywords: Perception for Grasping and Manipulation, Robotics and Automation in Agriculture and Forestry, Agricultural Automation
Abstract: To address the limitations inherent to conventional automated harvesting robots, specifically their suboptimal success rates and risk of crop damage, we design a novel robot named AHPPEBot, which is capable of autonomous harvesting based on crop phenotyping and pose estimation. Specifically, in phenotyping, the detection, association, and maturity estimation of tomato trusses and individual fruits are accomplished through a multi-task YOLOv5 model coupled with a detection-based adaptive DBSCAN clustering algorithm. In pose estimation, we employ a deep learning model to predict seven semantic keypoints on the pedicel. These keypoints assist in the robot's path planning, minimize target contact, and facilitate the use of our specialized end effector for harvesting. In autonomous tomato harvesting experiments conducted in commercial greenhouses, our proposed robot achieved a harvesting success rate of 86.67%, with an average successful harvest time of 32.46 s, showcasing its continuous and robust harvesting capabilities. These results underscore the potential of harvesting robots to bridge the labor gap in agriculture.
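A small sketch of how detections could be grouped into trusses with an adaptive DBSCAN, where the neighbourhood radius scales with detected fruit size; the parameter values and the adaptation rule are illustrative guesses at the abstract's "detection-based adaptive" variant, not the paper's algorithm.

import numpy as np
from sklearn.cluster import DBSCAN

def group_fruits_into_trusses(centers, box_sizes, k=1.5):
    """Cluster fruit detection centers (N, d) into trusses.
    The DBSCAN radius adapts to the median detected fruit size, so the
    grouping scales with how close the camera is to the plant."""
    eps = k * float(np.median(box_sizes))
    return DBSCAN(eps=eps, min_samples=2).fit_predict(centers)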
|
|
16:30-18:00, Paper ThCT28-NT.2 | Add to My Program |
Unknown Object Grasping for Assistive Robotics |
|
Miller, Elle | University of Edinburgh |
Durner, Maximilian | German Aerospace Center DLR |
Humt, Matthias | German Aerospace Center (DLR), Technical University Munich (TUM) |
Quere, Gabriel | DLR |
Boerdijk, Wout | German Aerospace Center (DLR) |
Sundaram, Ashok M. | German Aerospace Center (DLR) |
Stulp, Freek | DLR - Deutsches Zentrum Für Luft Und Raumfahrt E.V |
Vogel, Jörn | German Aerospace Center (DLR) |
Keywords: Perception for Grasping and Manipulation, Physically Assistive Devices, Grasping
Abstract: We propose a novel pipeline for unknown object grasping in shared robotic autonomy scenarios. State-of-the-art methods for fully autonomous scenarios are typically learning-based approaches optimised for a specific end-effector that generate grasp poses directly from sensor input. In the domain of assistive robotics, we seek instead to utilise the user's cognitive abilities for enhanced satisfaction, grasping performance, and alignment with their high-level task-specific goals. Given a pair of stereo images, we perform unknown object instance segmentation and generate a 3D reconstruction of the object of interest. In shared control, the user then guides the robot end-effector across a virtual hemisphere centered around the object to their desired approach direction. A physics-based grasp planner finds the most stable local grasp on the reconstruction, and finally the user is guided by shared control to this grasp. In experiments on the DLR EDAN platform, we report a grasp success rate of 87% for 10 unknown objects and demonstrate the method's capability to grasp objects in structured clutter and from shelves.
|
|
16:30-18:00, Paper ThCT28-NT.3 | Add to My Program |
Liquids Identification and Manipulation Via Digitally Fabricated Impedance Sensors |
|
Zhu, Junyi | Massachusetts Institute of Technology |
Lee, Young Joong | Massachusetts Institute of Technology |
Luo, Yiyue | Massachusetts Institute of Technology |
Xu, Tianyu | Massachusetts Institute of Technology, Google |
Liu, Chao | Massachusetts Institute of Technology |
Rus, Daniela | MIT |
Mueller, Stefanie | MIT CSAIL |
Matusik, Wojciech | MIT |
Keywords: Perception for Grasping and Manipulation, Grippers and Other End-Effectors, Intelligent and Flexible Manufacturing
Abstract: Despite recent exponential advancements in computer vision and reinforcement learning, it remains challenging for robots to interact with liquids. These challenges are particularly pronounced due to the limitations imposed by opaque containers, transparent liquids, fine-grained splashes, and visual obstructions arising from the robot's own manipulation activities. Yet, there exists a substantial opportunity for robotics to excel in liquid identification and manipulation, given its potential role in chemical handling in laboratories and various manufacturing sectors such as pharmaceuticals or beverages. In this work, we present a novel approach for liquid class identification and state estimation leveraging electrical impedance sensing. We design and mount a digitally embroidered electrode array to a commercial robot gripper. Coupled with a customized impedance sensing board, we collect data on liquid manipulation with a swept frequency sensing mode and a frequency-specific impedance measuring mode. Our developed learning-based model achieves an accuracy of 93.33% in classifying 9 different types of liquids (8 liquids + air), and 97.65% in estimating the liquid state. We investigate the effectiveness of our system with a series of ablation studies. These findings highlight our work as a promising solution for enhancing robotic manipulation in liquid-related tasks.
|
|
16:30-18:00, Paper ThCT28-NT.4 | Add to My Program |
Learning to Grasp in Clutter with Interactive Visual Failure Prediction |
|
Murray, Michael | University of Washington |
Gupta, Abhishek | University of Washington |
Cakmak, Maya | University of Washington |
Keywords: Perception for Grasping and Manipulation, Learning from Experience, Grasping
Abstract: Modern warehouses process millions of unique objects which are often stored in densely packed containers. To automate tasks in this environment, a robot must be able to pick diverse objects from highly cluttered scenes. Real-world learning is a promising approach, but executing picks in the real world is time-consuming, can induce costly failures, and often requires extensive human intervention, which causes operational burden and limits the scope of data collection and deployments. In this work, we leverage interactive probes to visually evaluate grasps in clutter without fully executing picks, a capability we refer to as Interactive Visual Failure Prediction (IVFP). This enables autonomous verification of grasps during execution to avoid costly downstream failures as well as autonomous reward assignment, providing supervision to continuously shape and improve grasping behavior as the robot gathers experience in the real world, without constantly requiring human intervention. Through experiments on a Stretch RE1 robot, we study the effect that IVFP has on performance - both in terms of effective data throughput and success rate, and show that this approach leads to grasping policies that outperform policies trained with human supervision alone, while requiring significantly less human intervention.
|
|
16:30-18:00, Paper ThCT28-NT.5 | Add to My Program |
Kinesthetic-Based In-Hand Object Recognition with an Underactuated Robotic Hand |
|
Arolovitch, Julius | Carnegie Mellon University |
Azulay, Osher | Tel Aviv University |
Sintov, Avishai | Tel-Aviv University |
Keywords: Perception for Grasping and Manipulation, Tendon/Wire Mechanism
Abstract: Tendon-based underactuated hands are intended to be simple, compliant, and affordable. Often, they are 3D printed and do not include tactile sensors, so performing in-hand object recognition with direct touch sensing is not feasible. Adding tactile sensors can complicate the hardware and introduce extra costs to the robotic hand, and the common alternative of visual perception may not be available due to occlusions. In this paper, we explore whether kinesthetic haptics can provide indirect information regarding the geometry of a grasped object during in-hand manipulation with an underactuated hand. By solely sensing actuator positions and torques over a period of time during motion, we show that a classifier can recognize an object from a set of trained ones with a high success rate of almost 95%. The implementation of a real-time majority vote during manipulation further improves recognition. A trained classifier is also shown to be successful in distinguishing between shape categories rather than just specific objects.
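A minimal sketch of the real-time majority vote over per-window classifications; the predict interface is a hypothetical stand-in for the trained classifier described above.

from collections import Counter

def majority_vote(predict, windows):
    """predict(window) -> class label (hypothetical interface).
    Classify each sliding window of actuator positions/torques collected
    during manipulation, then return the modal label."""
    votes = [predict(w) for w in windows]
    return Counter(votes).most_common(1)[0][0]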
|
|
16:30-18:00, Paper ThCT28-NT.6 | Add to My Program |
Fit-NGP: Fitting Object Models to Neural Graphics Primitives |
|
Taher, Marwan | Imperial College London |
Alzugaray, Ignacio | Imperial College London |
Davison, Andrew J | Imperial College London |
Keywords: Perception for Grasping and Manipulation, Semantic Scene Understanding, Object Detection, Segmentation and Categorization
Abstract: Accurate 3D object pose estimation is key to enabling many robotic applications that involve challenging object interactions. In this work, we show that the density field created by a state-of-the-art, efficient radiance field reconstruction method is suitable for highly accurate and robust pose estimation of objects with known 3D models, even when they are very small and have challenging reflective surfaces. We present a fully automatic object pose estimation system based on a robot arm with a single wrist-mounted camera, which can scan a scene from scratch and detect and estimate the 6-Degrees-of-Freedom (DoF) poses of multiple objects within a couple of minutes of operation. Small objects such as bolts and nuts are estimated with accuracy on the order of 1 mm.
|
|
16:30-18:00, Paper ThCT28-NT.7 | Add to My Program |
Efficient Object Rearrangement Via Multi-View Fusion |
|
Huang, Dehao | Southern University of Science and Technology |
Tang, Chao | Southern University of Science and Technology |
Zhang, Hong | SUSTech |
Keywords: Perception for Grasping and Manipulation, Service Robotics
Abstract: The prospect of assistive robots aiding in object organization has always been compelling. In an image-goal setting, the robot rearranges the current scene to match a single image captured from the goal scene. The key to an image-goal rearrangement system is estimating the desired placement pose of each object based on the single goal image and observations from the current scene. In order to establish sufficient associations for accurate estimation, the system should observe an object from a viewpoint similar to that in the goal image. Existing image-goal rearrangement systems, due to their reliance on a fixed viewpoint for perception, often require redundant manipulations to randomly adjust an object's pose for a better perspective. Addressing this inefficiency, we introduce a novel object rearrangement system that employs multi-view fusion. By observing the current scene from multiple viewpoints before manipulating objects, our approach can estimate a more accurate pose without redundant manipulations. A standard visual localization pipeline at the object level is developed to capitalize on the advantages of multi-view observations. Simulation results demonstrate that our system outperforms existing single-view systems in efficiency. The effectiveness of our system is further validated in a physical experiment. For videos, please visit https://sites.google.com/view/multi-view-rearr.
|
|
16:30-18:00, Paper ThCT28-NT.8 | Add to My Program |
EDOPT: Event-Camera 6-DoF Dynamic Object Pose Tracking |
|
Glover, Arren | Istituto Italiano Di Tecnologia |
Gava, Luna | University of Genova |
Li, Zhichao | Istituto Italiano Di Tecnologia |
Bartolozzi, Chiara | Istituto Italiano Di Tecnologia |
Keywords: Perception for Grasping and Manipulation, Visual Tracking, Object Detection, Segmentation and Categorization
Abstract: High-frequency, low-latency, 6-DoF object tracking is useful for grasping objects in motion, taking robots beyond pick-and-place tasks. We propose using an event-camera for tracking objects, leveraging its low-latency and continuous (i.e. not fixed-rate) data capture for high-frequency tracking. We propose the EDOPT algorithm, which maintains real-time operation under a variable event-rate (which occurs due to variation in camera velocity and scene texture) and avoids the frame-jumps and motion-blur that are problematic in traditional computer vision solutions. EDOPT uses a strong object prior, leading to a novel solution possible only with an event-camera. To our knowledge, this is the first method for 6-DoF object pose estimation using only an event-camera. The proposed method achieves comparable results to a state-of-the-art DNN technique that fuses frames, depth, and events. We demonstrate smooth, online object pose tracking with a live camera feed at >300 Hz.
|
|
16:30-18:00, Paper ThCT28-NT.9 | Add to My Program |
Attention-Based Cloth Manipulation from Model-Free Topological Representation |
|
Galassi, Kevin | Università di Bologna |
Wu, Bingbing | Naver Labs Europe |
Perez, Julien | Naver Labs Europe |
Palli, Gianluca | University of Bologna |
Renders, Jean-Michel | Naver Labs Europe |
Keywords: Perception for Grasping and Manipulation, Imitation Learning, Manipulation Planning
Abstract: The robotic manipulation of deformable objects, such as clothes and fabric, is known to be a complex task from both the perception and planning perspectives. Indeed, the stochastic nature of the underlying environment dynamics makes it an interesting research field for statistical learning approaches and neural policies. In this work, we introduce a novel attention-based neural architecture capable of solving a smoothing task for such objects by means of a single robotic arm. To train our network, we leverage an oracle policy, executed in simulation, which uses the topological description of a mesh of points to represent the object to smooth. In a second step, we transfer the resulting behavior to the real world with imitation learning, using the cloth point cloud as decision support, captured from a single RGBD camera placed egocentrically on the wrist of the arm. This approach allows fast training of the real-world manipulation neural policy while not requiring scene reconstruction at test time, but solely a point cloud acquired from a single RGBD camera. Our resulting policy first predicts the desired point to choose from the given point cloud and then the correct displacement to achieve a smoothed cloth. Experimentally, we first assess our results in a simulation environment by comparing them with an existing heuristic policy, as well as several baseline attention architectures. Then, we validate the performance of our approach in a real-world scenario.
|
|
ThCT29-NT Oral Session, NT-G5 |
Add to My Program |
Object Detection V |
|
|
Chair: Shi, Qing | Beijing Institute of Technology |
Co-Chair: Dias, Jorge | Khalifa University |
|
16:30-18:00, Paper ThCT29-NT.1 | Add to My Program |
WLST: Weak Labels Guided Self-Training for Weakly-Supervised Domain Adaptation on 3D Object Detection |
|
Tsou, Tsung Lin | National Taiwan University |
Wu, Tsung-Han | National Taiwan University |
Hsu, Winston | National Taiwan University |
Keywords: Object Detection, Segmentation and Categorization, Sensor Fusion, Transfer Learning
Abstract: In the field of domain adaptation (DA) for 3D object detection, most of the work is dedicated to unsupervised domain adaptation (UDA). Yet, without any target annotations, the performance gap between the UDA approaches and the fully-supervised approach is still noticeable, which is impractical for real-world applications. On the other hand, weakly-supervised domain adaptation (WDA) is an underexplored yet practical task that requires only a small labeling effort on the target domain. To improve the DA performance in a cost-effective way, we propose a general weak-labels-guided self-training framework, WLST, designed for WDA on 3D object detection. By incorporating an autolabeler, which can generate 3D pseudo labels from 2D bounding boxes, into the existing self-training pipeline, our method is able to generate more robust and consistent pseudo labels that benefit the training process on the target domain. Extensive experiments demonstrate the effectiveness, robustness, and detector-agnosticism of our WLST framework. Notably, it outperforms previous state-of-the-art methods on all evaluation tasks.
|
|
16:30-18:00, Paper ThCT29-NT.2 | Add to My Program |
Towards a Robust Sensor Fusion Step for 3D Object Detection on Corrupted Data |
|
Wozniak, Maciej Kazimierz | KTH Royal Institute of Technology |
Karefjard, Viktor | KTH Royal Institute of Technology |
Thiel, Marko | Hamburg University of Technology (TUHH) |
Jensfelt, Patric | KTH - Royal Institute of Technology |
Keywords: Object Detection, Segmentation and Categorization, Sensor Fusion, Deep Learning for Visual Perception
Abstract: Multimodal sensor fusion methods for 3D object detection have been revolutionizing the autonomous driving research field. Nevertheless, most of these methods heavily rely on dense LiDAR data and accurately calibrated sensors, which is often not the case in real-world scenarios. Data from LiDAR and cameras often come misaligned due to miscalibration, decalibration, or different sensor frequencies. Additionally, some parts of the LiDAR data may be occluded and parts of the data may be missing due to hardware malfunction or weather conditions. This work presents a novel fusion step that addresses data corruptions and makes sensor fusion for 3D object detection more robust. Through extensive experiments, we demonstrate that our method performs on par with state-of-the-art approaches on normal data and outperforms them on misaligned data.
|
|
16:30-18:00, Paper ThCT29-NT.3 | Add to My Program |
TerrainSense: Vision-Driven Mapless Navigation for Unstructured Off-Road Environments |
|
Hassan, Bilal | Khalifa University, Abu Dhabi |
Sharma, Arjun | Khalifa University |
Abdel Madjid, Nadya | Khalifa University |
Khonji, Majid | Khalifa University |
Dias, Jorge | Khalifa University |
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, AI-Based Methods
Abstract: Navigating autonomous vehicles efficiently across unstructured and off-road terrains remains a formidable challenge, often requiring intricate mapping or multi-step pipelines. However, these conventional approaches struggle to adapt to dynamic environments. This paper presents TerrainSense, an end-to-end framework that overcomes these limitations. By utilizing a transformer, TerrainSense detects lane semantics and topology from camera images, enabling mapless path planning without reliance on highly detailed maps. TerrainSense was rigorously assessed on six diverse datasets, evaluating its detection, segmentation, and path-prediction performance using various metrics. Notably, it outperforms other state-of-the-art methods by 9.32% in path-prediction precision, with an 18.28% faster inference time.
|
|
16:30-18:00, Paper ThCT29-NT.4 | Add to My Program |
RCM-Fusion: Radar-Camera Multi-Level Fusion for 3D Object Detection |
|
Kim, Jisong | Hanyang University |
Seong, Minjae | Hanyang University |
Bang, Geonho | Hanyang University |
Kum, Dongsuk | KAIST |
Choi, Jun Won | Seoul National University |
Keywords: Object Detection, Segmentation and Categorization, Sensor Fusion, Deep Learning for Visual Perception
Abstract: While LiDAR sensors have been successfully applied to 3D object detection, the affordability of radar and camera sensors has led to a growing interest in fusing radars and cameras for 3D object detection. However, previous radar-camera fusion models could not fully utilize the potential of radar information. In this paper, we propose Radar-Camera Multi-level fusion (RCM-Fusion), which attempts to fuse both modalities at the feature and instance levels. For feature-level fusion, we propose a Radar Guided BEV Encoder which transforms camera features into precise BEV representations using the guidance of radar Bird's-Eye-View (BEV) features and combines the radar and camera BEV features. For instance-level fusion, we propose a Radar Grid Point Refinement module that reduces localization error by accounting for the characteristics of the radar point clouds. The experiments on the public nuScenes dataset demonstrate that our proposed RCM-Fusion achieves state-of-the-art performance among single frame-based radar-camera fusion methods in the nuScenes 3D object detection benchmark. The code will be made publicly available.
|
|
16:30-18:00, Paper ThCT29-NT.5 | Add to My Program |
One-Vs-All Semi-Automatic Labeling Tool for Semantic Segmentation in Autonomous Driving |
|
Jing, Gu | Expleo Germany GmbH |
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, Computer Vision for Automation
Abstract: Semantic image segmentation plays a pivotal role in creating High-Definition (HD) maps for autonomous driving, where every pixel in an image is assigned a label from a specific semantic class. However, obtaining dense pixel-level annotations for model training is a laborious and expensive process. Active learning holds promise as a method to reduce the human annotation effort needed for semantic segmentation. However, existing active learning methods often perform well on the majority classes but struggle with the minority classes, negatively impacting segmentation performance. To tackle this challenge, we propose a novel One-vs-All (OVA) active learning framework, known as OVAAL. This paper explains how OVAAL can shift more attention towards the minority classes and thoroughly analyzes its contributions to performance enhancement. Additionally, we introduce an OVA-based semi-supervised learning method for post-processing, referred to as OVAAL+. Our results demonstrate that both OVAAL and OVAAL+ lead to significant improvements, with mean Intersection over Union (mIoU) gains of 4.55% and 6.38%, respectively, compared to the state-of-the-art active learning method Pixelpick on the Cityscapes semantic segmentation benchmark. These improvements are achieved while maintaining an economical annotation budget of 1.44% of the training data. We foresee further research exploring the potential of OVA-based active selection to address challenges in cold start scenarios and resource-constrained training environments.
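A minimal sketch of the one-vs-all selection idea: train one binary classifier per class and query the samples whose highest OVA confidence is lowest, which tends to surface minority-class samples. The synthetic data, logistic-regression classifiers, and query budget are assumptions for illustration, not OVAAL's actual per-class heads or scoring:

    # Schematic OVA active selection on synthetic, imbalanced data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8))
    y = rng.choice(3, size=500, p=[0.8, 0.15, 0.05])   # imbalanced classes

    ova = [LogisticRegression(max_iter=200).fit(X, (y == k).astype(int))
           for k in range(3)]                          # one binary head per class
    scores = np.stack([clf.predict_proba(X)[:, 1] for clf in ova], axis=1)
    uncertainty = 1.0 - scores.max(axis=1)             # low best OVA confidence
    query = np.argsort(uncertainty)[-20:]              # query the 20 most uncertain
    print("class mix of queried samples:", np.bincount(y[query], minlength=3))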
|
|
16:30-18:00, Paper ThCT29-NT.6 | Add to My Program |
LiRaFusion: Deep Adaptive LiDAR-Radar Fusion for 3D Object Detection |
|
Song, Jingyu | University of Michigan |
Zhao, Lingjun | University of Michigan - Ann Arbor |
Skinner, Katherine | University of Michigan |
Keywords: Object Detection, Segmentation and Categorization, Sensor Fusion, Deep Learning for Visual Perception
Abstract: We propose LiRaFusion to tackle LiDAR-radar fusion for 3D object detection to fill the performance gap of existing LiDAR-radar detectors. To improve the feature extraction capabilities from these two modalities, we design an early fusion module for joint voxel feature encoding, and a middle fusion module to adaptively fuse feature maps via a gated network. We perform extensive evaluation on nuScenes to demonstrate that LiRaFusion leverages the complementary information of LiDAR and radar effectively and achieves notable improvement over existing methods.
|
|
16:30-18:00, Paper ThCT29-NT.7 | Add to My Program |
BaSAL: Size Balanced Warm Start Active Learning for LiDAR Semantic Segmentation |
|
Wei, Jiarong | Delft University of Technology |
Lin, Yancong | Delft University of Technology (TU Delft) |
Caesar, Holger | Delft University of Technology |
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: Active learning strives to reduce the need for costly data annotation by repeatedly querying an annotator to label the most informative samples from a pool of unlabeled data, and then training a model from these samples. We identify two problems with existing active learning methods for LiDAR semantic segmentation. First, they overlook the severe class imbalance inherent in LiDAR semantic segmentation datasets. Second, to bootstrap the active learning loop when there is no labeled data available, they train their initial model from randomly selected data samples, leading to low performance. This situation is referred to as the cold start problem. To address these problems, we propose BaSAL, a size-balanced warm start active learning model, based on the observation that each object class has a characteristic size. By sampling object clusters according to their size, we can thus create a size-balanced dataset that is also more class-balanced. Furthermore, in contrast to existing information measures like entropy or CoreSet, size-based sampling does not require a pretrained model, thus addressing the cold start problem effectively. Results show that we are able to improve the performance of the initial model by a large margin. Combining warm start and size-balanced sampling with established information measures, our approach achieves comparable performance to training on the entire SemanticKITTI dataset, despite using only 5% of the annotations, outperforming existing active learning methods. We also match the existing state-of-the-art in active learning on nuScenes.
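The core size-balanced sampling step might be sketched as follows, assuming DBSCAN clustering of the unlabeled points and hand-picked size-bin edges; none of these constants come from the paper:

    # Toy size-balanced cluster selection; clustering parameters and bin edges
    # are illustrative guesses.
    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(1)
    points = np.vstack([rng.normal(c, 0.2, size=(n, 3))                # toy scene:
                        for c, n in [(0, 40), (5, 400), (10, 1000)]])  # three blobs

    labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(points)
    clusters = [np.where(labels == k)[0] for k in set(labels) if k != -1]
    sizes = np.array([len(c) for c in clusters])

    bins = np.digitize(sizes, [100, 600])        # small / medium / large clusters
    queries = []
    for b in range(3):                           # spend the budget evenly per bin
        pool = [i for i, bb in enumerate(bins) if bb == b]
        if pool:
            queries.append(int(rng.choice(pool)))
    print("clusters chosen for annotation:", queries)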
|
|
16:30-18:00, Paper ThCT29-NT.8 | Add to My Program |
Trajectory-Prediction-Based Dynamic Tracking of a UGV to a Moving Target under Multi-Disturbed Conditions |
|
Si, Jinge | Beijing Institute of Technology |
Li, Bin | Beijing Institute of Technology |
Xu, Yongkang | Beijing Institute of Technology |
Wang, Liang | Beijing Institute of Technology |
Deng, ChenCheng | Beijing Institute of Technology |
Wang, Shoukun | Beijing Institute of Technology |
Wang, Junzheng | Beijing Institute of Technology |
Keywords: Robust/Adaptive Control, Wheeled Robots, Motion Control
Abstract: Tracking dynamic targets poses a significant challenge for Unmanned Ground Vehicles (UGVs). Existing methods rarely address multi-disturbed conditions. To address this issue, we propose a trajectory-prediction-based dynamic tracking scheme, which includes target localization, trajectory prediction, and UGV control. Firstly, an estimation algorithm based on the Extended Kalman Filter (EKF) is employed to mitigate noise and accurately estimate the absolute states of the target. To enhance robustness, we present an Adaptive Trajectory Prediction (ATP) algorithm based on prediction anchors, in which a quantization standard for trajectory disturbance is designed for adaptive control. Subsequently, we iteratively solve prediction anchor points based on two motion models to robustly predict the target trajectory even in the presence of unknown disturbances. Finally, Linear Time-Varying Model Predictive Control (LTV-MPC) is utilized in the UGV controller for dynamic tracking. Experimental results demonstrate that the ATP exhibits superior prediction robustness and accuracy in perturbed environments compared to other prediction algorithms. In addition, the proposed scheme effectively achieves dynamic tracking of an Unmanned Aerial Vehicle (UAV) by the UGV under multi-disturbed conditions. Specifically, when the target moves at a speed of 1.0 m/s, the UGV can maintain a tracking error within 0.346 m.
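For the EKF-based estimation step, a minimal sketch with an assumed 2D constant-velocity model and position-only measurements is shown below; with this linear model the EKF reduces to the standard Kalman filter equations, and the paper's actual state vector and noise settings are not specified here:

    # Kalman-filter target-state estimation under a constant-velocity model.
    import numpy as np

    dt = 0.1
    F = np.block([[np.eye(2), dt * np.eye(2)],
                  [np.zeros((2, 2)), np.eye(2)]])    # state: [x, y, vx, vy]
    H = np.hstack([np.eye(2), np.zeros((2, 2))])     # position-only measurement
    Q, R = 0.01 * np.eye(4), 0.05 * np.eye(2)        # illustrative noise levels

    x, P = np.zeros(4), np.eye(4)
    for z in [np.array([0.1, 0.0]), np.array([0.2, 0.01])]:   # toy UAV fixes
        x, P = F @ x, F @ P @ F.T + Q                # predict
        S = H @ P @ H.T + R                          # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
        x = x + K @ (z - H @ x)                      # update state
        P = (np.eye(4) - K @ H) @ P                  # update covariance
    print("estimated target state:", x.round(3))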
|
|
ThCT30-NT Oral Session, NT-G6 |
Add to My Program |
AI Robotics |
|
|
Co-Chair: Görner, Michael | University of Hamburg |
|
16:30-18:00, Paper ThCT30-NT.1 | Add to My Program |
Sim-To-Real Robotic Sketching Using Behavior Cloning and Reinforcement Learning |
|
Jia, Biao | University of Maryland at College Park |
Manocha, Dinesh | University of Maryland |
Keywords: Art and Entertainment Robotics, AI-Enabled Robotics
Abstract: Robotic sketching in real-world scenarios poses a challenging problem with diverse applications in art, robotics, and digital design. This paper introduces an approach aimed at bridging the gap between simulated and real-world robotic sketching through the integration of behavior cloning and reinforcement learning techniques. Our approach trains painting policies that operate effectively in both virtual environments and real-world robotic sketching systems. We have implemented a robotic sketching system featuring an UltraArm robot equipped with a RealSense D415 camera, closely emulating the MyPaint virtual environment. Our system can perceive its environment and adapt painting policies to natural painting media. Our results highlight the effectiveness of our agent in terms of acquiring policies for high-dimensional continuous action spaces, enabling the seamless transfer of brush manipulation techniques from simulation to practical robotic sketching. Furthermore, we demonstrate our robotic sketching system's capability to generate complex images and strokes using various configurations.
|
|
16:30-18:00, Paper ThCT30-NT.2 | Add to My Program |
Safe Table Tennis Swing Stroke with Low-Cost Hardware |
|
Cursi, Francesco | Imperial College London |
Kalander, Marcus | Huawei Technologies |
Wu, Shuang | Huawei |
Xue, Xidi | Huawei |
Tian, Yu | The Chinese University of Hong Kong |
Tian, Guangjian | Huawei |
Quan, Xingyue | Huawei |
Hao, Jianye | Noah's Ark Lab |
Keywords: Art and Entertainment Robotics, Constrained Motion Planning, Reinforcement Learning
Abstract: Playing table tennis with a human player is a challenging robotic task due to its dynamic nature. Despite considerable research devoted to developing robotic table tennis systems, most works have demanding hardware requirements and ignore safety measures when generating the swing stroke. To address these issues, we propose a safe motion planning framework that fully exploits the robotic hardware's performance limits to play table tennis. In particular, we propose a pipeline to generate manipulator joint trajectories under environmental safety constraints and scale the trajectories to satisfy joint movement limitations. We use three different agents to validate the planning algorithm with our handmade robot platform in both simulation and real-world environments.
|
|
16:30-18:00, Paper ThCT30-NT.3 | Add to My Program |
Pluck and Play: Self-Supervised Exploration of Chordophones for Robotic Playing |
|
Görner, Michael | University of Hamburg |
Hendrich, Norman | University of Hamburg |
Zhang, Jianwei | University of Hamburg |
Keywords: Art and Entertainment Robotics, Representation Learning, Incremental Learning
Abstract: Existing robotic musicians utilize detailed handcrafted instrument models to generate or learn policies for playing because model-free or inaccurate policy rollouts might easily damage or wear out fragile instruments. We introduce an approach to characterize geometric models of chordophones and their audio onset responses directly through audio-tactile exploration with a physical robot arm. Initially, the system refines prior estimates of string positions, provided by kinesthetic teaching or visual estimation, through repeated attempts to pluck individual strings. A subsequent stage implements a Safe Active Exploration paradigm based on Gaussian Processes to explore and characterize the audio onset response of feasible plucking motions while minimizing invalid attempts. The resulting models can be used to actuate an imprecise robotic arm to play sequences of notes with varying loudness on a Chinese Guzheng.
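The Safe Active Exploration stage might be sketched roughly as follows: fit a Gaussian Process to the observed onset responses and pick the most uncertain candidate whose lower confidence bound still looks valid. The 1-D plucking parameter, the validity threshold, and the toy response function are assumptions for illustration, not the paper's setup:

    # GP-based safe active exploration of a single plucking parameter.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(0)
    def pluck(depth):                      # hypothetical audio-onset response
        return float(depth > 0.3) * depth + 0.02 * rng.normal()

    X = np.array([[0.2], [0.5]])           # seed attempts (e.g., from teaching)
    y = np.array([pluck(x[0]) for x in X])
    candidates = np.linspace(0.0, 1.0, 101).reshape(-1, 1)

    for _ in range(10):
        gp = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-3).fit(X, y)
        mu, std = gp.predict(candidates, return_std=True)
        safe = mu - std > -0.05            # lower confidence bound stays valid
        if not safe.any():
            break
        pick = candidates[safe][np.argmax(std[safe])]  # most informative safe try
        X = np.vstack([X, [pick]])
        y = np.append(y, pluck(pick[0]))
    print("explored depths:", X.ravel().round(2))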
|
|
16:30-18:00, Paper ThCT30-NT.4 | Add to My Program |
MBot: A Modular Ecosystem for Scalable Robotics Education |
|
Gaskell, Peter | University of Michigan |
Pavlasek, Jana | University of Michigan |
Gao, Tom | University of Michigan |
Narula, Abhishek | University of Michigan |
Lewis, Stanley | University of Michigan |
Jenkins, Odest Chadwicke | University of Michigan |
Keywords: Education Robotics, Hardware-Software Integration in Robotics
Abstract: The Michigan Robotics MBot is a low-cost mobile robot platform that has been used to train over 1,400 students in autonomous navigation since 2014 at the University of Michigan and our collaborating colleges. The MBot platform was designed to meet the needs of teaching robotics at scale to match the growth of robotics as a field and an academic discipline, spanning all levels of undergraduate and graduate experiences. Transformative advancements in robot navigation over the past decades have led to a significant demand for skilled roboticists across industry and academia. This demand has sparked a need for robotics courses in higher education. Incorporating real robot platforms into such courses and curricula is effective for sparking student motivation and conveying the unique challenges of programming embodied agents in real-world environments. However, teaching with real robots remains challenging due to the cost of hardware and the development effort involved in adapting existing hardware for a new course. In this paper, we describe the design and evolution of the MBot platform, and the keys to its success in terms of scalability and flexibility.
|
|
16:30-18:00, Paper ThCT30-NT.5 | Add to My Program |
SO(2)-Equivariant Downwash Models for Close Proximity Flight |
|
Smith, Henry | University of Cambridge |
Shankar, Ajay | University of Cambridge, UK |
Gielis, Jennifer | University of Cambridge |
Blumenkamp, Jan | University of Cambridge |
Prorok, Amanda | University of Cambridge |
Keywords: Machine Learning for Robot Control, Aerial Systems: Applications, Multi-Robot Systems
Abstract: Multirotors flying in close proximity induce aerodynamic wake effects on each other through propeller downwash. Conventional methods have fallen short of providing adequate 3D force-based models that can be incorporated into robust control paradigms for deploying dense formations. Thus, learning a model for these downwash patterns presents an attractive solution. In this paper, we present a novel learning-based approach for modelling the downwash forces that exploits the latent geometries (i.e. symmetries) present in the problem. We demonstrate that when trained with only 5 minutes of real world flight data, our geometry-aware model outperforms state-of-the-art baseline models trained with more than 15 minutes of data. In dense real-world flights with two vehicles, deploying our model online improves 3D trajectory tracking by nearly 36% on average (and vertical tracking by 56%).
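The SO(2) symmetry the paper exploits can be illustrated in a few lines: canonicalize the relative position by a yaw rotation, predict in that frame, and rotate the force back, which makes the predictor equivariant by construction. The per-frame predictor here is a hypothetical stand-in for the learned downwash model:

    # Constructing an SO(2)-equivariant force predictor by canonicalization.
    import numpy as np

    def yaw_rot(theta):
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

    def canonical_predictor(p):            # toy downwash: downward, decaying
        r = np.hypot(p[0], p[1])
        return np.array([0.0, 0.0, -np.exp(-r - abs(p[2]))])

    def downwash_force(rel_pos):
        theta = np.arctan2(rel_pos[1], rel_pos[0])
        aligned = yaw_rot(-theta) @ rel_pos            # canonicalize about z
        return yaw_rot(theta) @ canonical_predictor(aligned)

    # equivariance check: rotating the input rotates the output identically
    p, R = np.array([0.5, 0.2, -0.3]), yaw_rot(1.1)
    print(np.allclose(downwash_force(R @ p), R @ downwash_force(p)))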
|
|
16:30-18:00, Paper ThCT30-NT.6 | Add to My Program |
Hierarchical Meta-Learning-Based Adaptive Controller |
|
Xie, Fengze | California Institute of Technology |
Shi, Guanya | Carnegie Mellon University |
O'Connell, Michael | California Institute of Technology |
Yue, Yisong | California Institute of Technology |
Chung, Soon-Jo | Caltech |
Keywords: Machine Learning for Robot Control, Aerial Systems: Mechanics and Control, Robust/Adaptive Control
Abstract: We study how to design learning-based adaptive controllers that enable fast and accurate online adaptation in changing environments. In these settings, learning is typically done during an initial (offline) design phase, where the vehicle is exposed to different environmental conditions and disturbances (e.g., a drone exposed to different winds) to collect training data. Our work is motivated by the observation that real-world disturbances fall into two categories: 1) those that can be directly monitored or controlled during training, which we call "manageable"; and 2) those that cannot be directly measured or controlled (e.g., nominal model mismatch, air plate effects, and unpredictable wind), which we call "latent". Imprecise modeling of these effects can result in degraded control performance, particularly when latent disturbances continuously vary. This paper presents the Hierarchical Meta-learning-based Adaptive Controller (HMAC) to learn and adapt to such multi-source disturbances. Within HMAC, we develop two techniques: 1) Hierarchical Iterative Learning, which jointly trains representations to capture the various sources of disturbances, and 2) Smoothed Streaming Meta-Learning, which learns to capture the evolving structure of latent disturbances over time (in addition to standard meta-learning on the manageable disturbances). Experimental results demonstrate that HMAC exhibits more precise and rapid adaptation to multi-source disturbances than other adaptive controllers.
|
|
16:30-18:00, Paper ThCT30-NT.7 | Add to My Program |
A Novel Wide-Area Multiobject Detection System with High-Probability Region Searching |
|
Long, Xianlei | Chongqing University |
Zhao, Hui | College of Computer Science, China University of Geoscience |
Chen, Chao | Chongqing University |
Gu, Fuqiang | Chongqing University |
Gu, Qingyi | Institute of Automation, Chinese Academy of Sciences |
Keywords: Surveillance Robotic Systems, Hardware-Software Integration in Robotics, Computer Vision for Transportation
Abstract: In recent years, wide-area visual detection systems have been widely applied in various industrial and security sectors. These systems, however, face significant challenges when implementing multi-object detection due to conflicts arising from the need for high-resolution imaging, efficient object searching, and accurate localization. To address these challenges, this paper presents a hybrid system that incorporates a wide-angle camera, a high-speed search camera, and a galvano-mirror. In this system, the wide-angle camera offers panoramic images as prior information, which helps the search camera capture detailed images of the targeted objects. This integrated approach enhances the overall efficiency and effectiveness of wide-area visual detection systems. Specifically, in this study, we introduce a wide-angle camera-based method to generate a panoramic probability map (PPM) for estimating high-probability regions of target object presence. Then, we propose a probability searching module that uses the PPM-generated prior information to dynamically adjust the sampling range and refine target coordinates based on the uncertainty variance computed by the object detector. Finally, the integration of the PPM and the probability searching module yields an efficient hybrid vision system capable of 120 fps multi-object search and detection. Extensive experiments are conducted to verify the system's effectiveness and robustness.
|
|
ThCT31-NT Oral Session, NT-G7 |
Add to My Program |
Autonomous Vehicle Navigation III |
|
|
Chair: Yang, Ming | Shanghai Jiao Tong University |
Co-Chair: Song, Ran | Shandong University |
|
16:30-18:00, Paper ThCT31-NT.1 | Add to My Program |
Implicit Point Function for LiDAR Super-Resolution in Autonomous Driving |
|
Park, Minseong | Yonsei University |
Son, Haengseon | Korea Electronics Technology Institute |
Kim, Euntai | Yonsei University |
Keywords: Autonomous Vehicle Navigation, Deep Learning for Visual Perception
Abstract: LiDAR super-resolution is a relatively new problem in which we seek to fill in the blanks between measured points when a low-resolution LiDAR is given, producing a high-resolution or even resolution-free LiDAR. Recently, several research works have been reported regarding LiDAR super-resolution. However, most works on LiDAR super-resolution have the drawback that they first transform the 3D LiDAR point cloud into a 2D depth map and upsample the LiDAR output by applying image super-resolution methods, ignoring the 3D geometric information of the point cloud obtained from a LiDAR. To solve the above problem, we propose a new deep learning network named implicit point function (IPF). The basic idea of IPF is that, given a low-resolution point cloud and a query ray, we generate 3D target point embeddings on the query ray using on-the-ray positional embedding and local features, preserving the 3D geometric information of the given point cloud. Then, we aggregate them into one target point via the attention mechanism. IPF enables us to learn a continuous representation of 3D space from low-resolution LiDAR and upsample a small number of layers to any number that we want. Finally, our IPF is applied to a large-scale synthetic dataset and a real dataset, and its validity is demonstrated by comparison with previous methods.
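A tiny numpy sketch of the aggregation idea: embed candidate points along the query ray with an on-the-ray positional encoding and pool them with softmax attention into one target-point feature. The dimensions, the sinusoidal encoding, and the random "learned" query are illustrative assumptions, not the paper's architecture:

    # Attention pooling of on-the-ray point embeddings into one target point.
    import numpy as np

    def pos_embed(t, dim=8):                # on-the-ray positional encoding
        freqs = 2.0 ** np.arange(dim // 2)
        return np.concatenate([np.sin(freqs * t), np.cos(freqs * t)])

    rng = np.random.default_rng(0)
    ray_ts = np.array([0.2, 0.5, 0.9])      # depths of nearby points on the ray
    feats = rng.normal(size=(3, 8))         # local features of those points
    tokens = feats + np.stack([pos_embed(t) for t in ray_ts])

    query = rng.normal(size=8)              # stands in for a learned query
    logits = tokens @ query / np.sqrt(8)
    attn = np.exp(logits - logits.max()); attn /= attn.sum()   # softmax weights
    target_feature = attn @ tokens          # one upsampled point embedding
    print(target_feature.round(3))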
|
|
16:30-18:00, Paper ThCT31-NT.2 | Add to My Program |
Circular Accessible Depth: A Robust Traversability Representation for UGV Navigation |
|
Xie, Shikuan | Shandong University |
Song, Ran | Shandong University |
Zhao, Yuenan | Shandong University |
Huang, Xueqin | Shandong University |
Li, Yibin | Shandong University |
Zhang, Wei | Shandong University |
Keywords: Autonomous Vehicle Navigation, Deep Learning in Robotics and Automation, AI-Based Methods, Traversability Representation
Abstract: In this paper, we present the Circular Accessible Depth (CAD), a robust traversability representation for an unmanned ground vehicle (UGV) to learn traversability in various scenarios containing irregular obstacles. To predict CAD, we propose a neural network, namely CADNet, with an attention-based multi-frame point cloud fusion module, the Stability-Attention Module (SAM), to encode spatial features from point clouds captured by LiDAR. CAD is designed based on the polar coordinate system and focuses on predicting the border of the traversable area. CAD encodes the spatial information of the surrounding environment, which enables semi-supervised learning of CADNet and thus desirably avoids annotating a large amount of data. Extensive experiments demonstrate that CAD outperforms baselines in terms of robustness and precision. We also implement our method on a real UGV and show that it performs well in real-world scenarios.
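The polar border representation itself can be sketched geometrically as follows: bin obstacle points into angular sectors around the UGV and keep the nearest obstacle range per sector. The sector count, sensing range, and the naive height-based obstacle test are illustrative assumptions, unrelated to CADNet's learned prediction:

    # Toy polar "accessible depth" border from a point cloud.
    import numpy as np

    rng = np.random.default_rng(0)
    pts = rng.uniform(-10, 10, size=(2000, 3))     # toy LiDAR points (x, y, z)
    obstacles = pts[pts[:, 2] > 0.3]               # naive height-based test

    n_sectors = 72                                  # 5-degree angular bins
    theta = np.arctan2(obstacles[:, 1], obstacles[:, 0])
    r = np.hypot(obstacles[:, 0], obstacles[:, 1])
    sector = ((theta + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors

    border = np.full(n_sectors, 10.0)               # fall back to max range
    for s, rr in zip(sector, r):
        border[s] = min(border[s], rr)              # nearest obstacle per sector
    print("accessible depth per sector:", border.round(1)[:8], "...")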
|
|
16:30-18:00, Paper ThCT31-NT.3 | Add to My Program |
Robots That Can See: Leveraging Human Pose for Trajectory Prediction |
|
Salzmann, Tim | Technical University Munich |
Chiang, Hao-Tien | Google Deepmind |
Ryll, Markus | Technical University Munich |
Sadigh, Dorsa | Stanford University |
Parada, Carolina | Google |
Bewley, Alex | Google |
Keywords: Autonomous Vehicle Navigation, Deep Learning Methods, Human-Aware Motion Planning
Abstract: Anticipating the motion of all humans in dynamic environments such as homes and offices is critical to enable safe and effective robot navigation. Such spaces remain challenging as humans do not follow strict rules of motion and there are often multiple occluded entry points such as corners and doors that create opportunities for sudden encounters. In this work, we present a Transformer-based architecture to predict human future trajectories in human-centric environments from input features including human positions, head orientations, and 3D skeletal keypoints from onboard in-the-wild sensory information. The resulting model captures the inherent uncertainty of future human trajectory prediction and achieves state-of-the-art performance on common prediction benchmarks and a human tracking dataset captured from a mobile robot adapted for the prediction task. Furthermore, we identify new agents with limited historical data as a major contributor to error and demonstrate the complementary nature of 3D skeletal poses in reducing prediction error in such challenging scenarios.
|
|
16:30-18:00, Paper ThCT31-NT.4 | Add to My Program |
Uncertainty-Aware Reinforcement Learning for Autonomous Driving with Multimodal Digital Driver Guidance |
|
Huang, Wenhui | NanYang Technological University |
Shan, Zitong | Jilin University |
Lou, Shanhe | Nanyang Technological University |
Lv, Chen | Nanyang Technological University |
Keywords: Autonomous Vehicle Navigation, Reinforcement Learning, Human Factors and Human-in-the-Loop
Abstract: While existing Learning-from-Intervention (LfI) methods within the human-in-the-loop reinforcement learning (HiL-RL) paradigm mainly operate on the assumption that human policies are homogeneous and deterministic with low variance, natural human driving behaviors are multimodal with intrinsic uncertainties; hence, accommodating diverse human capabilities is significant for practical applications. This work proposes an enhanced LfI approach for learning the optimal RL policy by leveraging multimodal human behaviors in the setting of N-driver concurrent interventions. Specifically, we first learn N human digital drivers from a multi-human demonstration dataset, wherein each driver possesses its own policy distribution. Then, the post-trained drivers are kept in the training loop of the RL algorithms, providing diverse driving guidance whenever intervention is required. Additionally, to better utilize the provided guidance, we augment the RL architecture and optimization objectives to facilitate the proposed uncertainty-aware reinforcement learning (UnaRL) algorithm. The proposed approach, which won 2nd place in the Alibaba Future Car Innovation Challenge 2022, is solidly compared in two challenging autonomous driving scenarios against state-of-the-art (SOTA) LfI baselines, and results of both simulation and real-world experiments confirm the superiority of our method in terms of learning robustness and driving performance. Videos and source code are provided.
|
|
16:30-18:00, Paper ThCT31-NT.5 | Add to My Program |
Boosting Offline Reinforcement Learning for Autonomous Driving with Hierarchical Latent Skills |
|
Li, Zenan | Tsinghua University |
Nie, Fan | Shanghai Jiao Tong University |
Sun, Qiao | Shanghai QiZhi Institute |
Da, Fang | QCraft |
Zhao, Hang | Tsinghua University |
Keywords: Autonomous Vehicle Navigation, Reinforcement Learning, Integrated Planning and Learning
Abstract: Vehicle planning is receiving increasing attention with the emergence of diverse driving simulators and large-scale driving datasets. While offline reinforcement learning (RL) is well suited for these safety-critical tasks, it still struggles to plan over extended periods. In this work, we present a skill-based framework that enhances offline RL to overcome the long-horizon vehicle planning challenge. Specifically, we design a variational autoencoder (VAE) to learn skills from offline demonstrations. To mitigate the posterior collapse problem, we introduce a two-branch sequence encoder to capture both discrete options and continuous variations of the complex driving skills. The final policy treats learned skills as actions and can be trained by any off-the-shelf offline RL algorithm. This facilitates a shift in focus from per-step actions to temporally extended skills, thereby enabling long-term reasoning into the future. Extensive results on CARLA demonstrate that our model consistently outperforms strong baselines in both training and new scenarios. Additional visualizations and experiments demonstrate the interpretability and transferability of extracted skills.
|
|
16:30-18:00, Paper ThCT31-NT.6 | Add to My Program |
A Framework for Real-Time Generation of Multi-Directional Traversability Maps in Unstructured Environments |
|
Huang, Tao | Chongqing University |
Wang, Gang | Chongqing University |
Liu, Hongliang | Chongqing University |
Luo, Jun | Chongqing University |
Wu, Lang | Huazhong University of Science and Technology |
Zhu, Tao | Chongqing University |
Pu, Huayan | Shanghai University |
Luo, Jun | Chongqing University |
Wang, Shuxin | Tianjin University |
Keywords: Foundations of Automation, Autonomous Vehicle Navigation, Task Planning
Abstract: In complex unstructured environments, accurate terrain traversability analysis is a fundamental requirement for the successful execution of any movements of ground robots, especially given that terrain traversability often exhibits anisotropy. However, the difficulty in obtaining multi-directional terrain labels hinders the emergence of end-to-end multi-directional traversability networks. This paper introduces a framework for real-time generation of multi-directional traversability maps (MTraMap) tailored for unstructured environments. It involves pre-training a uni-directional traversability classifier, termed UniTraT, through self-supervised learning using ground robot travel simulation. Furthermore, it employs Uni-directional to Multi-directional Traversability Distillation (UMTraDistill) to distill a multi-directional traversability network, termed MultiTCNN, which is capable of directly generating MTraMap. We evaluated both networks on our traversability dataset, achieving 89% accuracy in terrain traversability classification with UniTraT. Compared to UniTraT, the accuracy of the MultiTCNN distilled via UMTraDistill decreases by only 1.8%, and it can process a 10 m × 10 m elevation map at a speed of 74 fps. Field robotics experiments were also conducted and showed that MultiTCNN can generate an MTraMap of the surrounding 20 m × 20 m environment at a rate of 9.39 fps, a slight reduction of 0.61 fps compared to the lidar data publishing rate, and the generated MTraMap can clearly delineate the multi-directional traversability of the surrounding environments.
|
|
16:30-18:00, Paper ThCT31-NT.7 | Add to My Program |
Cross-Modal Registration Using Adaptive Modeling in Infrastructure-Based Vehicle Localization |
|
Wang, Fei | Shanghai JiaoTong University |
He, Yuesheng | Shanghai Jiao Tong University |
Zhuang, Hanyang | Shanghai Jiao Tong University |
Yang, Chenxi | Shanghai Jiao Tong University |
Yang, Ming | Shanghai Jiao Tong University |
Keywords: Intelligent Transportation Systems, Autonomous Vehicle Navigation
Abstract: Infrastructure-based vehicle localization, in comparison to single-agent approaches, offers several advantages including reduced system cost, extended perception range, enhanced data fusion capabilities, and energy savings. Many conventional approaches impose limitations on the types of objects due to the need for specific object-end modifications, such as applying perceptual markers like color-labeled plates and reflective balls. LiDAR presents a solution in terms of object arbitrariness, as it addresses the challenges of feature-free object modeling and continuous registration. However, achieving complete environmental coverage with LiDAR remains prohibitively expensive, particularly in extensive areas. Hence, this study proposes a cross-modal localization approach using adaptive modeling, employing LiDAR for object modeling and cost-effective cameras for object tracking through image-point-cloud registration. Accurate correspondence between the model and observation can be estimated in real-time. The experiments are conducted in a typical scenario that requires adaptive modeling: Autonomous Valet Parking (AVP). Results demonstrate that the proposed system achieves comparable performance with significantly reduced system costs, highlighting its potential for large-scale deployment.
|
|
16:30-18:00, Paper ThCT31-NT.8 | Add to My Program |
Efficient Gas Source Active Search in Unfamiliar Environments |
|
Zhai, Yu | China University of Mining and Technology |
Miao, Yanzi | China University of Mining and Technology |
Keywords: Service Robotics, Autonomous Vehicle Navigation
Abstract: Actively and efficiently searching for gas sources in unknown hazardous environments is an important but challenging problem. Using mobile robots to autonomously search for and navigate to the gas source location provides a promising way forward. Existing methods are mostly based on a modularized framework that investigates the gas-source search and robot navigation tasks independently, leading to a decoupled approach that results in higher collision risks and lower navigation efficiency. Moreover, existing robot navigation techniques grapple with the intricacies of navigating through unknown environments. To tackle these complexities, we introduce an integrated framework that merges gas source localization with robot navigation. This unified structure, underpinned by an end-to-end learning approach, resolves the inherent conflicts between gas exploration and collision avoidance. Our approach aggregates the local observations (raw 3D-LiDAR data) and the expert guidance information (gas distribution), and directly generates navigation actions by implementing reinforcement learning with a novel reward function based on region-dynamic guidance, thus effectively addressing the challenges of active gas source searching in unknown environments. Simulation results underscore the adaptability of our method to diverse unknown environments, along with its superior gas source searching capabilities compared to conventional approaches. Finally, we conduct real-world experiments to demonstrate the feasibility of our approach.
|
|
16:30-18:00, Paper ThCT31-NT.9 | Add to My Program |
RGBD-Based Image Goal Navigation with Pose Drift: A Topo-Metric Graph Based Approach |
|
Ye, Shuhao | Zhejiang University |
Cui, Yuxiang | Zhejiang University |
Sha, Hao | Zhejiang University |
Lu, Sha | Zhejiang University |
Zhang, Yu | Zhejiang University |
Xiong, Rong | Zhejiang University |
Wang, Yue | Zhejiang University |
Keywords: Service Robotics, Autonomous Vehicle Navigation, Domestic Robotics
Abstract: Image-goal navigation in unknown environments with sensor error is of considerable difficulty for autonomous robots. In this paper, we propose a drift-resisting topo-metric graph to map the environment and localize the robot using only relative poses. The error-sharing mechanism under this representation effectively reduces the impact of accumulated drifts commonly encountered in navigation tasks. A reinforcement-learning-based policy is proposed for sub-goal selection on this topo-metric graph, which improves navigation efficiency by handling task-driven features that take both image correlation and topological layout into account. We adopt a modular system design with this map representation and graph policy, leaving the low-level motion planning problems to classical controllers for better stability and generalizability. Experimental results demonstrate that our method achieves robust navigation performance in a variety of unknown environments and up to a 50% higher success rate than existing methods in complex environments with odometry drift.
|
|
ThCT32-NT Oral Session, NT-G8 |
Add to My Program |
Image-Based Navigation II |
|
|
Chair: Zhao, Hao | Tsinghua University |
Co-Chair: Yu, Hongkai | Cleveland State University |
|
16:30-18:00, Paper ThCT32-NT.1 | Add to My Program |
MonoOcc: Digging into Monocular Semantic Occupancy Prediction |
|
Zheng, Yupeng | School of Artificial Intelligence, University of Chinese Academy of Sciences |
Li, Xiang | Department of Computer Science and Technology, Tsinghua University |
Li, Pengfei | Institute for AI Industry Research (AIR), Tsinghua University |
Zheng, Yuhang | Beihang University |
Jin, Bu | Institute of Automation, Chinese Academy of Sciences |
Zhong, Chengliang | Tsinghua University |
Long, Xiaoxiao | The University of Hong Kong |
Zhao, Hao | Tsinghua University |
Zhang, Qichao | Institute of Automation, Chinese Academy of Sciences |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Visual Learning
Abstract: Monocular Semantic Occupancy Prediction aims to infer the complete 3D geometry and semantic information of scenes from only 2D images. It has garnered significant attention, particularly due to its potential to enhance the 3D perception of autonomous vehicles. However, existing methods rely on a complex cascaded framework with relatively limited information to restore 3D scenes, including a dependency on supervision solely on the whole network's output, single-frame input, and the utilization of a small backbone. These challenges, in turn, hinder the optimization of the framework and yield inferior prediction results, particularly concerning smaller and long-tailed objects. To address these issues, we propose MonoOcc. In particular, we (i) improve the monocular occupancy prediction framework by proposing an auxiliary semantic loss as supervision to the shallow layers of the framework and an image-conditioned cross-attention module to refine voxel features with visual clues, and (ii) employ a distillation module that transfers temporal information and richer knowledge from a larger image backbone to the monocular semantic occupancy prediction framework with low hardware cost. With these advantages, our method yields state-of-the-art performance on the camera-based SemanticKITTI Scene Completion benchmark. Codes and models can be accessed at https://github.com/ucaszyp/MonoOcc.
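The distillation module's general pattern, transferring features from a larger frozen backbone to a smaller trainable one via a projection layer, might look roughly like the sketch below; the stand-in conv stems, the 1x1 projector, and the plain MSE loss are assumptions, not MonoOcc's actual design:

    # Schematic feature distillation from a frozen teacher to a student.
    import torch
    import torch.nn as nn

    student = nn.Conv2d(3, 32, 3, padding=1)       # small trainable backbone stem
    teacher = nn.Conv2d(3, 128, 3, padding=1)      # larger, frozen backbone stem
    proj = nn.Conv2d(32, 128, 1)                   # match channel widths
    for p in teacher.parameters():
        p.requires_grad_(False)

    img = torch.randn(2, 3, 64, 64)
    loss = nn.functional.mse_loss(proj(student(img)), teacher(img))
    loss.backward()                                # trains student and projector
    print(float(loss))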
|
|
16:30-18:00, Paper ThCT32-NT.2 | Add to My Program |
ShaSTA: Modeling Shape and Spatio-Temporal Affinities for 3D Multi-Object Tracking |
|
Sadjadpour, Tara | Stanford University |
Li, Jie | Toyota Research Institute |
Ambrus, Rares | Toyota Research Institute |
Bohg, Jeannette | Stanford University |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Visual Tracking
Abstract: Multi-object tracking (MOT) is a cornerstone capability of any robotic system. Tracking quality is largely dependent on the quality of input detections. In many applications, such as autonomous driving, it is preferable to over-detect objects to avoid catastrophic outcomes due to missed detections. As a result, current state-of-the-art 3D detectors produce high rates of false-positives to ensure a low number of false-negatives. This can negatively affect tracking by making data association and track lifecycle management more challenging. Additionally, occasional false-negative detections due to difficult scenarios like occlusions can harm tracking performance. To address these issues in a unified framework, we propose ShaSTA, which learns shape and spatio-temporal affinities between tracks and detections in consecutive frames. The affinity is a probabilistic matching that leads to robust data association, track lifecycle management, false-positive elimination, false-negative propagation, and sequential track confidence refinement. We offer the first self-contained framework that addresses all aspects of the 3D MOT problem. We quantitatively evaluate ShaSTA on the nuScenes tracking benchmark with 5 metrics, including the most common tracking accuracy metric, AMOTA, to demonstrate how ShaSTA may impact the ultimate goal of an autonomous mobile agent. ShaSTA achieves 1st place amongst LiDAR-only trackers that use CenterPoint detections. The open-source code for reproducing our results is publicly available.
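The affinity-then-assignment step at the heart of such trackers can be sketched as below, using a simple distance kernel in place of ShaSTA's learned shape and spatio-temporal affinity, and a single threshold in place of its lifecycle management:

    # Affinity scoring plus one-to-one track-detection assignment.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    tracks = np.array([[0.0, 0.0], [5.0, 5.0]])        # previous-frame centers
    dets = np.array([[0.2, -0.1], [5.1, 5.2], [9.0, 0.0]])

    d = np.linalg.norm(tracks[:, None] - dets[None], axis=-1)
    affinity = np.exp(-d)                              # pseudo-probabilistic score
    rows, cols = linear_sum_assignment(-affinity)      # maximize total affinity
    for t, j in zip(rows, cols):
        if affinity[t, j] > 0.1:                       # gate out weak matches
            print(f"track {t} <- detection {j} (affinity {affinity[t, j]:.2f})")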
|
|
16:30-18:00, Paper ThCT32-NT.3 | Add to My Program |
Breaking Data Silos: Cross-Domain Learning for Multi-Agent Perception from Independent Private Sources |
|
Li, Jinlong | Cleveland State University |
Li, Baolu | Cleveland State University |
Liu, Xinyu | Cleveland State University |
Xu, Runsheng | UCLA |
Ma, Jiaqi | University of California, Los Angeles |
Yu, Hongkai | Cleveland State University |
Keywords: Computer Vision for Transportation, Intelligent Transportation Systems
Abstract: The diverse agents in multi-agent perception systems may be from different companies. Each company might use an identical classic neural network architecture as the encoder for feature extraction. However, the data source used to train the various agents is independent and private to each company, leading to a Distribution Gap between the different private datasets used for training distinct agents in a multi-agent perception system. The data silos created by the above Distribution Gap could result in a significant performance decline in multi-agent perception. In this paper, we thoroughly examine the impact of the distribution gap on existing multi-agent perception systems. To break the data silos, we introduce the Feature Distribution-aware Aggregation (FDA) framework for cross-domain learning to mitigate the above Distribution Gap in multi-agent perception. FDA comprises two key components: a Learnable Feature Compensation Module and a Distribution-aware Statistical Consistency Module, both aimed at enhancing intermediate features to minimize the distribution gap among multi-agent features. Intensive experiments on the public OPV2V and V2XSet datasets underscore FDA's effectiveness in point-cloud-based 3D object detection, presenting it as an invaluable augmentation to existing multi-agent perception systems. The code is available at https://github.com/jinlong17/BDS-V2V.
|
|
16:30-18:00, Paper ThCT32-NT.4 | Add to My Program |
AdvGPS: Adversarial GPS for Multi-Agent Perception Attack |
|
Li, Jinlong | Cleveland State University |
Li, Baolu | Cleveland State University |
Liu, Xinyu | Cleveland State University |
Fang, Jianwu | Xian Jiaotong University |
Juefei-Xu, Felix | Meta AI |
Guo, Qing | Agency for Science, Technology and Research (A*STAR) |
Yu, Hongkai | Cleveland State University |
Keywords: Computer Vision for Transportation, Intelligent Transportation Systems, Autonomous Agents
Abstract: The multi-agent perception system collects visual data from sensors located on various agents and leverages their relative poses determined by GPS signals to effectively fuse information, mitigating the limitations of single-agent sensing, such as occlusion. However, the precision of GPS signals can be influenced by a range of factors, including wireless transmission and obstructions like buildings. Given the pivotal role of GPS signals in perception fusion and the potential for various interference, it becomes imperative to investigate whether specific GPS signals can easily mislead the multi-agent perception system. To address this concern, we frame the task as an adversarial attack challenge and introduce ADVGPS, a method capable of generating adversarial GPS signals which are also stealthy for individual agents within the system, significantly reducing object detection accuracy. To enhance the success rates of these attacks in a black-box scenario, we introduce three types of statistically sensitive natural discrepancies: appearance-based discrepancy, distribution-based discrepancy, and task-aware discrepancy. Our extensive experiments on the OPV2V dataset demonstrate that these attacks substantially undermine the performance of state-of-the-art methods, showcasing remarkable transferability across different point-cloud-based 3D detection systems. This alarming revelation underscores the pressing need to address security implications within multi-agent perception systems, thereby underscoring a critical area of research. The code is available at https://github.com/jinlong17/AdvGPS.
|
|
16:30-18:00, Paper ThCT32-NT.5 | Add to My Program |
Towards Motion Forecasting with Real-World Perception Inputs: Are End-To-End Approaches Competitive? |
|
Xu, Yihong | Valeo.ai |
Chambon, Loick | Valeo |
Zablocki, Eloi | Valeo |
Chen, Mickaël | Valeo |
Alahi, Alexandre | EPFL |
Cord, Matthieu | Sorbonne Université, Valeo.ai |
Perez, Patrick | Valeo |
Keywords: Computer Vision for Transportation, Intelligent Transportation Systems, Computer Vision for Automation
Abstract: Motion forecasting is crucial in enabling autonomous vehicles to anticipate the future trajectories of surrounding agents. To do so, it requires solving mapping, detection, tracking, and then forecasting problems, in a multi-step pipeline. In this complex system, advances in conventional forecasting methods have been made using curated data, i.e., with the assumption of perfect maps, detection, and tracking. This paradigm, however, ignores any errors from upstream modules. Meanwhile, an emerging end-to-end paradigm, which tightly integrates the perception and forecasting architectures into joint training, promises to solve this issue. So far, however, the evaluation protocols of the two approaches were incompatible and their comparison was not possible. In fact, and perhaps surprisingly, conventional forecasting methods are usually neither trained nor tested in real-world pipelines (e.g., with upstream detection, tracking, and mapping modules). In this work, we aim to bring forecasting models closer to real-world deployment. First, we propose a unified evaluation pipeline for forecasting methods with real-world perception inputs, allowing us to compare the performance of conventional and end-to-end methods for the first time. Second, our in-depth study uncovers a substantial performance gap when transitioning from curated to perception-based data. In particular, we show that this gap (1) stems not only from differences in precision but also from the nature of imperfect inputs provided by perception modules, and that (2) is not trivially reduced by simply finetuning on perception outputs. Based on extensive experiments, we provide recommendations for critical areas that require improvement and guidance towards more robust motion forecasting in the real world. We will release an evaluation library to benchmark models under standardized and practical conditions.
|
|
16:30-18:00, Paper ThCT32-NT.6 | Add to My Program |
QUEST: Query Stream for Practical Cooperative Perception |
|
Fan, Siqi | Tsinghua University |
Yu, Haibao | The University of Hong Kong |
Yang, Wenxian | Tsinghua University |
Yuan, Jirui | Tsinghua University |
Nie, Zaiqing | Tsinghua University |
Keywords: Computer Vision for Transportation, Intelligent Transportation Systems, Multi-Robot Systems
Abstract: Cooperative perception can effectively enhance individual perception performance by providing additional viewpoints and expanding the sensing field. Existing cooperation paradigms are either interpretable (result cooperation) or flexible (feature cooperation). In this paper, we propose the concept of query cooperation to enable interpretable instance-level flexible feature interaction. To make the concept concrete, we propose a cooperative perception framework, termed QUEST, which lets a query stream flow among agents. Cross-agent queries interact via fusion for co-aware instances and via complementation for individually unaware instances. Taking camera-based vehicle-infrastructure perception as a typical practical application scenario, the experimental results on the real-world dataset DAIR-V2X-Seq demonstrate the effectiveness of QUEST and further reveal the advantage of the query cooperation paradigm in transmission flexibility and robustness to packet dropout. We hope our work can further facilitate cross-agent representation interaction for better cooperative perception in practice.
|
|
16:30-18:00, Paper ThCT32-NT.7 | Add to My Program |
Towards Visibility Estimation and Noise-Distribution-Based Defogging for LiDAR in Autonomous Driving |
|
Zhan, Jie | Huazhong University of Science and Technology |
Duan, Yucong | Huazhong University of Science and Technology |
Ding, Junfeng | Huazhong University of Science and Technology |
Hu, Xuzhong | Huazhong University of Science and Technology |
Huang, Xiao | China Ship Development and Design Center |
Ma, Jie | Huazhong University of Science and Technology |
Keywords: Computer Vision for Transportation, Range Sensing, Intelligent Transportation Systems
Abstract: Point clouds play a crucial role in robots and intelligent vehicles. Noise caused by fog droplets seriously degrades the quality of point clouds. Previous research has shown that the extent of degradation is correlated with visibility, and the fog attenuation coefficient is in turn associated with visibility. Against this background, this paper proposes a noise-distribution-based defogging method for point clouds. Our approach hinges on the estimation of the fog attenuation coefficient, facilitated by road-based prior knowledge. Our method then integrates the fog-induced noise distribution inferred from the LiDAR imaging model with the spatially non-uniform distribution of point clouds caused by the LiDAR structure. The fused results are input to a statistical filter based on the relative sparsity of noise to achieve defogging. This paper is one of the early works focusing on point cloud defogging. Its core insight lies in the estimation of the attenuation coefficient and the use of the fog-induced noise distribution for defogging. Experiments demonstrate that our method can accurately mitigate the impact of fog while enhancing the performance of a 3D object detection network.
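As a rough illustration of the final filtering stage, the sketch below applies Open3D's statistical outlier removal to a point cloud, with the attenuation-coefficient estimate reduced to a hypothetical placeholder; the paper's LiDAR imaging model and road-based prior are not reproduced.

import numpy as np
import open3d as o3d

def estimate_attenuation_coefficient(points):
    # Hypothetical placeholder: the paper estimates this coefficient from
    # road-based prior knowledge; a fixed value is returned here.
    return 0.05

def defog(points):
    # Fog noise is relatively sparse compared to true surface returns, so a
    # statistical filter over neighbor distances can suppress it. Denser fog
    # (larger coefficient) tightens the outlier threshold in this toy rule.
    alpha = estimate_attenuation_coefficient(points)
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    filtered, _ = pcd.remove_statistical_outlier(
        nb_neighbors=20, std_ratio=max(0.5, 2.0 - 10.0 * alpha))
    return np.asarray(filtered.points)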
|
|
16:30-18:00, Paper ThCT32-NT.8 | Add to My Program |
CenterCoop: Center-Based Feature Aggregation for Communication-Efficient Vehicle-Infrastructure Cooperative 3D Object Detection |
|
Zhou, Linyi | Fudan University |
Gan, Zhongxue | Fudan University |
Fan, Jiayuan | Fudan University |
Keywords: Computer Vision for Transportation, Sensor Fusion, Object Detection, Segmentation and Categorization
Abstract: Vehicle-Infrastructure Cooperative (VIC) 3D object detection is a challenging task that requires balancing communication bandwidth and detection performance. Intermediate fusion has recently been studied to reach a better balance by transferring feature maps. Existing works mainly perform spatial-wise fusion and adopt feature compression to alleviate the bandwidth cost of high-resolution feature maps, which inevitably leads to information loss. Besides, overlapping observations between the two sensors lead to near-duplicate detections, bringing only trivial improvement to the cooperative task while incurring unnecessary bandwidth cost. To mitigate these problems, we propose a novel feature aggregation framework called CenterCoop, which first encodes the informative clues from the whole Bird's Eye View (BEV) context into compact center representations, enabling feature aggregation at the sequence level to significantly reduce the communication cost. Furthermore, to tackle the redundancy of transmitted data, we incorporate communication-aware regularization, which drives the network to extract complementary and beneficial cues for the collaboration task. From an information-theoretic perspective, the proposed auxiliary constraints facilitate cooperative-view independence mining, resulting in an enlarged perception range within the limited bandwidth. Extensive experiments on the DAIR-V2X dataset demonstrate the superior performance-bandwidth trade-off of CenterCoop, which achieves state-of-the-art performance.
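The core compression step, encoding a dense BEV feature map into a handful of center vectors before transmission, can be sketched as a top-k gather. Everything here (tensor shapes, the use of a detection heatmap to rank centers, k=100) is an assumption for illustration, not the CenterCoop code.

import torch

def bev_to_center_queries(bev_feat, heatmap, k=100):
    # bev_feat: (B, C, H, W) dense BEV features; heatmap: (B, H, W) center
    # confidences. Returns (B, C, k) compact center representations, far
    # cheaper to transmit than the full feature map.
    B, C, H, W = bev_feat.shape
    _, idx = heatmap.flatten(1).topk(k, dim=1)      # top-k center locations
    flat = bev_feat.flatten(2)                      # (B, C, H*W)
    return flat.gather(2, idx.unsqueeze(1).expand(B, C, k))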
|
|
16:30-18:00, Paper ThCT32-NT.9 | Add to My Program |
Probabilistic 3D Multi-Object Cooperative Tracking for Autonomous Driving Via Differentiable Multi-Sensor Kalman Filter |
|
Chiu, Hsu-kuang | Carnegie Mellon University |
Wang, Chien-Yi | NVIDIA |
Chen, Min-Hung | NVIDIA |
Smith, Stephen F. | Carnegie Mellon University |
Keywords: Computer Vision for Transportation, Visual Tracking, Deep Learning for Visual Perception
Abstract: Current state-of-the-art autonomous driving vehicles mainly rely on each individual sensor system to perform perception tasks. Such a framework's reliability could be limited by occlusion or sensor failure. To address this issue, more recent research proposes using vehicle-to-vehicle (V2V) communication to share perception information with others. However, most relevant works focus only on cooperative detection and leave cooperative tracking an underexplored research field. A few recent datasets, such as V2V4Real, provide 3D multi-object cooperative tracking benchmarks. However, their proposed methods mainly use cooperative detection results as input to a standard single-sensor Kalman Filter-based tracking algorithm. In their approach, the measurement uncertainty of different sensors from different connected autonomous vehicles (CAVs) may not be properly estimated to utilize the theoretical optimality property of Kalman Filter-based tracking algorithms. In this paper, we propose a novel 3D multi-object cooperative tracking algorithm for autonomous driving via a differentiable multi-sensor Kalman Filter. Our algorithm learns to estimate measurement uncertainty for each detection that can better utilize the theoretical property of Kalman Filter-based tracking methods. The experiment results show that our algorithm improves the tracking accuracy by 17% with only 0.037x communication costs compared with the state-of-the-art method in V2V4Real.
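The key idea, making the Kalman measurement update differentiable so the per-detection measurement noise can be learned, can be sketched as follows. The network shape, feature input, and position-only observation model are assumptions for illustration; this is not the authors' model.

import torch
import torch.nn as nn

class LearnedNoiseKalmanUpdate(nn.Module):
    """One Kalman measurement update with a learned, per-detection
    measurement covariance R. Every operation is differentiable, so the
    noise network can be trained end-to-end from a tracking loss."""
    def __init__(self, state_dim=4, meas_dim=2, feat_dim=16):
        super().__init__()
        # Hypothetical small MLP mapping detection features to log-variances.
        self.noise_net = nn.Sequential(
            nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, meas_dim))
        H = torch.zeros(meas_dim, state_dim)
        H[:, :meas_dim] = torch.eye(meas_dim)       # observe positions only
        self.register_buffer('H', H)

    def forward(self, x, P, z, feat):
        R = torch.diag(self.noise_net(feat).exp())  # learned, positive R
        S = self.H @ P @ self.H.T + R               # innovation covariance
        K = P @ self.H.T @ torch.linalg.inv(S)      # Kalman gain
        x_new = x + K @ (z - self.H @ x)
        P_new = (torch.eye(x.shape[0]) - K @ self.H) @ P
        return x_new, P_new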
|
|
ThCT33-CC Oral Session, CC-301 |
Add to My Program |
Motion Analysis and Planning |
|
|
Chair: Wilhelm, Nikolas Jakob | Technical University of Munich |
Co-Chair: Johnson, Aaron M. | Carnegie Mellon University |
|
16:30-18:00, Paper ThCT33-CC.1 | Add to My Program |
Design and Implementation of a Robotic Testbench for Analyzing Pincer Grip Execution in Human Specimen Hands |
|
Wilhelm, Nikolas Jakob | Technical University of Munich |
Glowalla, Claudio | Department of Orthopaedics and Sports Orthopaedics, Klinikum rechts der Isar |
Haddadin, Sami | Technical University of Munich |
Schote, Julian | Department of Orthopaedics and Sports Orthopaedics, Klinikum rechts der Isar |
Hoeppner, Hannes | Berliner Hochschule Für Technik, BHT |
van der Smagt, Patrick | Volkswagen Group |
Karl, Maximilian | Volkswagen AG |
Burgkart, Rainer | Technische Universität München |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Datasets for Human Motion
Abstract: This study presents an innovative test rig engineered to explore the kinematic and viscoelastic characteristics of human specimen hands. The rig features eight force-controlled motors linked to muscle tendons, enabling precise stimulation of hand specimens. Hand movements are monitored through an optical tracking system, while a force-torque sensor quantifies the resultant fingertip loads. Employing this setup, we successfully demonstrated a pincer grip using a cadaver hand and measured both muscle forces and grip strength. Our results reveal a nonlinear relationship between tendon forces and grip strength, which can be modeled by an exponential fit. This investigation serves as a nexus between biomechanical and robotics-focused research, providing critical insights for the advancement of robotic hand actuation and therapeutic interventions.
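The exponential relationship between tendon force and grip strength reported above can be fitted with a standard least-squares routine; the sketch below uses synthetic numbers, since the paper's measurements are not reproduced here.

import numpy as np
from scipy.optimize import curve_fit

def model(f, a, b, c):
    # Exponential model of grip strength as a function of tendon force.
    return a * np.exp(b * f) + c

tendon_force = np.linspace(0.0, 60.0, 20)             # N, synthetic
grip_strength = model(tendon_force, 2.0, 0.04, -1.5)  # synthetic data
params, _ = curve_fit(model, tendon_force, grip_strength, p0=(1.0, 0.01, 0.0))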
|
|
16:30-18:00, Paper ThCT33-CC.2 | Add to My Program |
Human Gait Cost Function Varies with Walking Speed: An Inverse Optimal Control Study |
|
Weng, Jiacheng | University of Waterloo |
Hashemi, Ehsan | University of Alberta |
Arami, Arash | University of Waterloo |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Optimization and Optimal Control, Human-Centered Robotics
Abstract: This work investigates the optimal cost function composition for human gait at different walking speeds. Kinematic and kinetic data for walking at four walking speeds were collected from five able-bodied individuals. The data were then used to recover optimal cost functions in a predictive simulation environment with musculoskeletal models. Twenty inverse optimal control (IOC) problems were solved for cost function weight tuning using the previously developed and validated Adaptive Reference IOC (AR-IOC) algorithm. Over the walking speed range examined (0.6–1.5 m/s), the converged cost function weights suggest that increasing walking speed is associated with a reduction of the foot-sliding penalty weight and increased weights for center of mass (CoM) acceleration and stability, as confirmed by several experiments. Furthermore, we did not observe any significant weight shift in effort reduction between the upper and the lower body with respect to walking speed. The results of this study can be used in a toolbox for obtaining subject- and task-specific cost functions and for assisting the development of personalized rehabilitation technologies.
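In standard IOC form, the recovered objective is a weighted sum of basis terms, and the inverse problem tunes the weights so the predicted gait matches the observed one. The basis terms below are generic placeholders consistent with the abstract (foot sliding, CoM acceleration, effort), not the paper's exact definitions:

\[
J(w) = \int_0^T \Big( w_{\mathrm{slide}}\,\phi_{\mathrm{slide}}(x) + w_{\mathrm{CoM}}\,\lVert \ddot{x}_{\mathrm{CoM}} \rVert^2 + w_{\mathrm{effort}}\,\lVert u \rVert^2 + \cdots \Big)\,dt,
\qquad
w^\star = \arg\min_{w}\; d\big(x^\star(w),\, x_{\mathrm{obs}}\big),
\]

where \(x^\star(w)\) is the gait produced by the predictive simulation under weights \(w\) and \(x_{\mathrm{obs}}\) is the measured gait.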
|
|
16:30-18:00, Paper ThCT33-CC.3 | Add to My Program |
LORIS: A Lightweight Free-Climbing Robot for Extreme Terrain Exploration |
|
Nadan, Paul | Carnegie Mellon University |
Backus, Spencer | NASA Jet Propulsion Lab |
Johnson, Aaron M. | Carnegie Mellon University |
Keywords: Climbing Robots, Grippers and Other End-Effectors, Compliant Joints and Mechanisms
Abstract: Climbing robots can investigate scientifically valuable sites that conventional rovers cannot access due to steep terrain features. Robots equipped with microspine grippers are particularly well-suited to ascending rocky cliff faces, but most existing designs are either large and slow or limited to relatively flat surfaces such as walls. We present a novel free-climbing robot to bridge this gap through innovations in gripper design and force control. Fully passive grippers and wrist joints allow secure grasping while reducing mass and complexity. Forces are distributed among the robot's grippers using an optimization-based control strategy to minimize the risk of unexpected detachment. The robot prototype has demonstrated vertical climbing on both flat cinder block walls and uneven rock surfaces in full Earth gravity.
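The optimization-based force distribution can be pictured as a small convex program; the sketch below is a toy planar version with made-up numbers and a simplified detachment-risk objective, not the LORIS controller.

import cvxpy as cp
import numpy as np

n = 4                                   # grippers in contact (assumed)
f = cp.Variable((n, 2))                 # [tangential, normal] per gripper
load = np.array([5.0, -19.6])           # toy net load: lateral + gravity (N)
margin = cp.Variable()                  # worst-case detachment load

constraints = [cp.sum(f, axis=0) + load == 0]     # static force balance
for i in range(n):
    constraints += [cp.abs(f[i, 0]) <= margin,    # limit shear at each grip
                    -f[i, 1] <= margin]           # limit pull-off at each grip

# Minimizing the worst-case load spreads forces across grippers,
# reducing the risk that any single microspine grasp detaches.
cp.Problem(cp.Minimize(margin), constraints).solve()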
|
|
16:30-18:00, Paper ThCT33-CC.4 | Add to My Program |
Floating-Base Manipulation on Zero-Perturbation Manifolds |
|
Bittner, Brian | JHUAPL |
Reid, Jason | Jet Propulsion Laboratory |
Wolfe, Kevin | Johns Hopkins University Applied Physics Laboratory |
Keywords: Nonholonomic Motion Planning, Bimanual Manipulation, Mobile Manipulation
Abstract: To achieve high-dexterity motion planning on floating-base systems, the base dynamics induced by arm motions must be treated carefully. In general, it is a significant challenge to establish a fixed-base frame during tasking due to forces and torques on the base that arise directly from arm motions (e.g., arm drag in low Reynolds environments and arm momentum in high Reynolds environments). While thrusters can in theory be used to regulate the vehicle pose, this is often insufficient to establish a stable pose for precise tasking, whether due to underactuation, modeling inaccuracy, suboptimal control parameters, or insufficient power. We propose a solution that asks the thrusters to do less high-bandwidth perturbation correction by planning arm motions that induce zero perturbation on the base. We are able to cast our motion planner as a nonholonomic rapidly-exploring random tree (RRT) by representing the floating-base dynamics as Pfaffian constraints on joint velocity. These constraints guide the manipulators to move on zero-perturbation manifolds (which inhabit a subspace of the tangent space of the internal configuration space). To invoke this representation (termed a perturbation map), we assume the body velocity (perturbation) of the base to be a joint-defined linear mapping of joint velocity and describe situations where this assumption is realistic (including underwater, aerial, and orbital environments). The core insight of this work is that when the perturbation of the floating base has affine structure with respect to joint velocity, it provides the system a class of kinematic reduction that permits the use of sample-based motion planners (specifically a nonholonomic RRT). We show that this allows rapid, exploration-geared motion planning for high degree-of-freedom systems in obstacle-rich environments, even on floating-base systems with nontrivial dynamics.
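In the notation the abstract alludes to (symbols generic, not necessarily the paper's), the assumed linear perturbation map and the zero-perturbation condition read:

\[
\xi_{\mathrm{base}} = A(q)\,\dot{q},
\qquad
A(q)\,\dot{q} = 0 \;\Longleftrightarrow\; \dot{q} \in \ker A(q),
\]

so the nonholonomic RRT extends nodes by integrating \(\dot{q} = N(q)\,u\), where the columns of \(N(q)\) span \(\ker A(q)\) and \(u\) is a sampled input; arm motions built this way leave the base unperturbed by construction.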
|
|
16:30-18:00, Paper ThCT33-CC.5 | Add to My Program |
Towards Geometric Motion Planning for High-Dimensional Systems: Gait-Based Coordinate Optimization and Local Metrics |
|
Yang, Yanhao | Oregon State University |
Bass, Capprin | Oregon State University |
Hatton, Ross | Oregon State University |
Keywords: Nonholonomic Motion Planning, Whole-Body Motion Planning and Control, Biologically-Inspired Robots
Abstract: Geometric motion planning offers effective and interpretable gait analysis and optimization tools for locomoting systems. However, due to the curse of dimensionality in coordinate optimization, a key component of geometric motion planning, it is almost infeasible to apply current geometric motion planning to high-dimensional systems. In this paper, we propose a gait-based coordinate optimization method that overcomes the curse of dimensionality. We also identify a unified geometric representation of locomotion by generalizing various nonholonomic constraints into local metrics. By combining these two approaches, we take a step towards geometric motion planning for high-dimensional systems. We test our method in two classes of high-dimensional systems - low Reynolds number swimmers and free-falling Cassie - with up to 11-dimensional shape variables. The resulting optimal gait in the high-dimensional system shows better efficiency compared to that of the reduced-order model. Furthermore, we provide a geometric optimality interpretation of the optimal gait.
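In the usual geometric-mechanics notation (again generic, not necessarily the paper's exact formulation), the generalized nonholonomic constraints form a local connection, and the local metric supplies a pathlength cost over shape space:

\[
g^{-1}\dot{g} = -\mathbf{A}(r)\,\dot{r},
\qquad
\mathrm{cost}(\phi) = \oint_{\phi} \sqrt{\dot{r}^{\top} M(r)\,\dot{r}}\;dt,
\]

where \(r\) denotes the shape variables, \(g\) the body frame position, \(\mathbf{A}\) the local connection, and \(M\) the local metric; coordinate optimization chooses the body frame so that the connection-based estimate of net displacement over a gait \(\phi\) is as accurate as possible.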
|
|
ThCL-EX Poster Session, Exhibition Hall |
Add to My Program |
Late Breaking Results Poster IX |
|
|
|
16:30-18:00, Paper ThCL-EX.1 | Add to My Program |
To Help or Not to Help: LLM-Based Attentive Support for Human-Robot Group Interactions |
|
Tanneberg, Daniel | Honda Research Institute |
Ocker, Felix | Honda Research Institute Europe |
Hasler, Stephan | Honda Research Institute Europe |
Deigmoeller, Joerg | Honda Research Institute Europe |
Belardinelli, Anna | Honda Research Institute Europe |
Wang, Chao | Honda Research Institute Europe GmbH |
Wersing, Heiko | Honda Research Institute Europe |
Sendhoff, Bernhard | Honda Research Institute Europe GmbH |
Gienger, Michael | Honda Research Institute Europe |
Keywords: Robot Companions, Physical Human-Robot Interaction, Social HRI
Abstract: How can a robot provide unobtrusive physical support within a group of humans? We present Attentive Support, a novel interaction concept for robots to support a group of humans. It combines scene perception, dialogue acquisition, situation understanding, and behavior generation with the common-sense reasoning capabilities of Large Language Models. In addition to following user instructions, Attentive Support is capable of deciding when and how to support the humans, and when to remain silent to not disturb the group. With a diverse set of scenarios, we show and evaluate the robot's attentive behavior, which supports and helps the humans when required, while not disturbing if no help is needed.
|
|
16:30-18:00, Paper ThCL-EX.2 | Add to My Program |
Disaster Robotics Category: Standard Disaster Robotics Challenge in World Robot Summit 2025 |
|
Kinugasa, Tetsuya | Okayama University of Science |
Kimura, Tetsuya | Nagaoka University of Technology |
Sato, Noritaka | Nagoya Institute of Technology |
Suzuki, Soichiro | JAEA |
Keywords: Competition, Aerial Systems: Applications, Field Robots
Abstract: Drone technology is a highly competitive field and essential for future industrial development. Its applications include infrastructure and plant inspection as well as disaster prevention and response. It is important to clearly define the direction of technology development by understanding social implementation needs, and to develop and popularize standard test methods (STMs) for objectively evaluating performance. Regarding search and rescue procedures in disasters, INSARAG has issued guidelines as international standards, and the NFPA is establishing standards for the operation of drones in police, firefighting, and emergency services. STMs for drones, which are rapidly being deployed for infrastructure inspection and disaster response, are therefore required. Against this backdrop, F-REI will host the Disaster Robotics Category as one of the World Robot Summit 2025 competitions. We are organizing the Standard Disaster Robotics Challenge, a featured competition in this category. This report presents the STMs for drones used in the challenge, developed based on the NIST STM apparatus and the ReAMo specimens. It also provides an overview of the evaluation criteria under severe environmental conditions, including information gathering and processing capabilities and autonomy, which are crucial during disaster response.
|
|
16:30-18:00, Paper ThCL-EX.3 | Add to My Program |
Novel Three-Fingered Gripper Designs for Bussing Table Service |
|
Choi, Jeongseok | Hanyang University |
Shin, Jeongpil | Hanyang University |
Kim, YoungHwan | Hanyang University |
Lee, Wonhyoung | Hanyang University |
Won, Jeeho | Hanyang University |
Lee, Minsu | Hanyang University |
Seo, TaeWon | Hanyang University |
Keywords: Grippers and Other End-Effectors, Compliant Assembly, Service Robotics
Abstract: The performance of a robot differs depending on the design of its grippers. In this paper, we present novel three-fingered gripper designs for bussing table service. The main objective of this study is to grasp and release plates, dishes, and cups vertically with three fingers. To accommodate dishes of various sizes and weights, we adopted compliant fingers. The first proposed gripper design focuses on the kinematic structure, utilizing springs, linkages, and stoppers. Because the dishes vary in shape, the configuration of the fingers adjusts to grasp them firmly. The second proposed gripper design is based on belts and a kinematic structure. Unlike the first design, the second is thinner and lighter, achieved by replacing some of the kinematic structures with a soft material, namely a belt. Both proposed grippers were demonstrated to grasp and release dishes of various sizes and weights effectively. Moreover, the first design, based on a kinematic structure, can handle heavier dishes owing to the reliability of its kinematic structure compared to the second. The second gripper, based on soft materials such as belts, is suitable for grasping dishes of various sizes and shapes. Through the design and demonstration of these novel three-fingered grippers for bussing table service, we study their effectiveness in handling dishes of various sizes and weights.
|
|
16:30-18:00, Paper ThCL-EX.4 | Add to My Program |
Enhancing Clarity for Sky-High Insights: Drone-Enhanced Aerial Object Detection with YOLOv5 and Super Resolution |
|
Nihal, Md Ragib Amin | Tokyo Institute of Technology |
Yen, Benjamin | Tokyo Institute of Technology |
Itoyama, Katsutoshi | Tokyo Institute of Technology |
Nakadai, Kazuhiro | Tokyo Institute of Technology |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Aerial Systems: Applications
Abstract: The rise in drone and satellite usage has spiked the demand for precise object detection in aerial images. Traditional models falter with small, clustered objects common in these scenarios. To tackle this, we introduce a novel method merging super-resolution with a modified YOLOv5 design, optimized for lightweight operation and real-time use. Our model, tested on datasets like VisDrone, incorporates Transformer encoder blocks for enhanced global and local context awareness, significantly boosting detection in cluttered environments. This approach not only heightens accuracy but also maintains resource efficiency, ideal for real-time scenarios. Results reveal its prowess in pinpointing small, grouped objects, with a notable 52.5% mAP on VisDrone, surpassing existing standards. This strategy is poised to refine aerial object detection, ensuring more precise and dependable outcomes across various applications.
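The two-stage inference path described above (upscale, then detect) can be sketched as follows. A bicubic upsample stands in for the authors' super-resolution network, and the stock yolov5s hub model stands in for their Transformer-augmented variant; only the pipeline shape is illustrated.

import torch
import torch.nn.functional as F

# Load the public YOLOv5s weights from the ultralytics hub entry.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

def detect_aerial(image_bchw):
    # Placeholder for the learned SR stage: small objects cover more
    # pixels after upsampling, which helps the detector resolve them.
    sr = F.interpolate(image_bchw, scale_factor=2, mode='bicubic',
                       align_corners=False)
    return model(sr)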
|
|
16:30-18:00, Paper ThCL-EX.5 | Add to My Program |
Linear Parameter Estimation Using Physics-Informed Machine Learning Algorithm for Leader Follower Tracking |
|
Lee, Sangmoon | Kyungpook National University |
Shin, Woosang | KITECH |
Jin, Yongsik | Electronics and Telecommunications Research Institute |
Keywords: AI-Based Methods, Machine Learning for Robot Control, Wheeled Robots
Abstract: Mobile robots are playing an increasingly important role in modern industries and services. These robots appear in various forms, such as automobiles, drones, and robot arms, and are actively used in applications such as automation at industrial sites, environmental exploration, structure inspection, logistics, and transportation. In particular, the leader-follower problem of mobile robots is recognized as a key problem when robots perform tasks cooperatively. The leader-follower problem aims to accurately estimate the movement of the leader robot while it guides the follower, and to perform tracking control for the follower robot. In this paper, we propose a linear parameter estimation system that can estimate the dynamic characteristics of the real model given only a bounded uncertainty range, in order to overcome the model uncertainty that can arise in the leader-follower problem. The characteristics of a state observer are incorporated into the learning algorithm so that the state estimation error converges to zero at a faster rate, and an algorithm that estimates the uncertain model from the estimated state is presented.
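The state-observer structure the abstract refers to is, in its standard linear form (generic symbols, not the paper's notation):

\[
\dot{\hat{x}} = A\hat{x} + Bu + L\,(y - C\hat{x}),
\qquad
\dot{e} = (A - LC)\,e, \quad e = x - \hat{x},
\]

so choosing the gain \(L\) to place the eigenvalues of \(A - LC\) deep in the left half-plane drives the estimation error to zero quickly; the uncertain model parameters are then estimated from \(\hat{x}\).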
|
|
16:30-18:00, Paper ThCL-EX.6 | Add to My Program |
FROG: A New People Detection Dataset for Knee-High 2D Range Finders |
|
Amodeo, Fernando | Universidad Pablo De Olavide |
Perez-Higueras, Noe | University Pablo De Olavide |
Merino, Luis | Universidad Pablo De Olavide |
Caballero, Fernando | Universidad De Sevilla |
Keywords: Data Sets for Robot Learning, Human Detection and Tracking, Deep Learning Methods
Abstract: Mobile robots require knowledge of the environment, especially of humans located in their vicinity. While the most common approaches for detecting humans involve computer vision, an often overlooked hardware feature for people detection is the robot's 2D range finder. In most robots, it is conveniently located at a height approximately between the ankle and the knee, so it can also be used for detecting people, with a larger field of view and better depth resolution than cameras. In this paper, we present FROG, a new dataset for people detection using knee-high 2D range finders. This dataset has greater laser resolution, higher scanning frequency, and more complete annotation data than existing datasets such as DROW. In particular, the FROG dataset contains annotations for 100% of its laser scans (unlike DROW, which annotates only 5%), 17x more annotated scans, 100x more people annotations, and over twice the distance traveled by the robot. We propose a benchmark based on the FROG dataset and analyze a collection of state-of-the-art people detectors based on 2D range finder data. We also propose and evaluate a new end-to-end deep learning approach for people detection. Our solution works directly with the raw sensor data (without hand-crafted input features). Experimental results show that the proposed people detector attains results comparable to the state of the art, while an optimized implementation for ROS can operate at more than 500 Hz.
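A minimal end-to-end detector over raw scans might look like the 1D CNN below; the architecture and channel counts are purely illustrative assumptions, not the paper's network.

import torch
import torch.nn as nn

class ScanPeopleDetector(nn.Module):
    """Toy 1D-CNN producing a per-beam person logit directly from a raw
    2D range scan, without hand-crafted input features."""
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size=1))   # per-beam logit

    def forward(self, ranges):        # ranges: (batch, n_beams)
        return self.net(ranges.unsqueeze(1)).squeeze(1)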
|
|
16:30-18:00, Paper ThCL-EX.7 | Add to My Program |
Construction of the AR-Haptic Feedback System for Hyper-Elastic Materials Using RFEA |
|
Kang, Hyeseon | Seoul National University of Science and Technology |
Bae, Jaehyoung | Seoul National University of Science and Technology |
Kim, Jinhyun | Seoul National University of Science and Technology |
Keywords: Haptics and Haptic Interfaces, Human Factors and Human-in-the-Loop
Abstract: Augmented reality (AR) shows promise for innovative technological advancements in the design and diagnosis of mechanical, construction, and medical systems. In particular, utilizing AR for real-time deformation and stress analysis of structures and providing realistic user feedback plays an important role in shaping the metaverse of the future. Rubber, which is widely used in engineering, is a representative hyper-elastic material with a nonlinear strain-stress relationship. It is therefore difficult to implement hyper-elastic materials as spring-damper systems, but real-time deformation and stress analysis can be implemented by utilizing pre-built RFEA (Real-time Finite Element Analysis). In this study, we used RFEA data to build an augmented reality system that allows users to interact with hyper-elastic materials in AR and provides visualization of real-time finite element analysis as well as haptic feedback to simulate their nonlinear elasticity.
|
|
16:30-18:00, Paper ThCL-EX.8 | Add to My Program |
Do Humans Retaliate against Immoral Robots? |
|
Rezaei Khavas, Zahra | Umass Lowell |
Kotturu, Monish Reddy | University of Massachusetts Lowell |
Azadeh, Reza | University of Massachusetts Lowell |
Robinette, Paul | University of Massachusetts Lowell |
Keywords: Human-Centered Robotics, Human-Centered Automation, Human Factors and Human-in-the-Loop
Abstract: The growing implementation of robots in societal contexts necessitates a deeper exploration of the dynamics of trust between humans and robots. This exploration should expand beyond traditional viewpoints that primarily emphasize the influence of robot performance. In the burgeoning area of social robotics, fine-tuning a robot's personality traits is increasingly recognized as a crucial element in shaping users' experiences during human-robot interaction (HRI). Research in this field has led to the creation of trust scales that encompass various trust dimensions in HRI. These scales include aspects related to performance as well as moral dimensions. Our study investigates how these trust aspects affect human trust in robots, particularly examining if breaches of moral trust by robots impact human trust more negatively than performance trust breaches. We also explore if trust loss and retaliation tendencies differ between human and robotic teammates following the violations of these different trust aspects. Through multiple versions of an online search task, we examined our research questions and found that moral trust violations by robotic teammates, like with human teammates, damage human trust more severely than performance violations, and also violations of moral trust by a teammate increase retaliation tendencies. These findings highlight the importance of moral trust in determining how humans view a robot's trustworthiness.
|
|
16:30-18:00, Paper ThCL-EX.9 | Add to My Program |
Wearable Robotic Tail to Support Balance |
|
Anwar, Eisa | Queen Mary University of London |
Abeywardena, Sajeeva | University of Surrey |
Miller, Stuart | Queen Mary University of London |
Farkhatdinov, Ildar | Queen Mary University of London |
Keywords: Physically Assistive Devices, Body Balancing, Wearable Robotics
Abstract: Workers in many industries frequently have to manipulate heavy loads as part of their work. This comes in many forms, for example, loading and unloading vehicles as part of last-mile delivery or packing shelves in warehouses or supermarkets. These actions can move the body's centre of mass further away from the middle of the body, and subsequently the body may have to strain itself to maintain balance, which can potentially lead to chronic pain. Inspired by nature, where animals use tails for balance, we have built a supernumerary robotic limb in the form of a robotic tail. Its position is controlled based on the distance of a carried object from the user's body, with actuating motors serving the dual purpose of moving the tail and acting as its counterbalance. This characteristic gives a higher counterweight-to-overall-weight ratio, maximising its effectiveness whilst minimising its weight. Tests with a human participant have shown that the tail can keep the centre of mass positioned more closely to the middle of the base of support.
|
|
16:30-18:00, Paper ThCL-EX.10 | Add to My Program |
Robotic Grasping of Small-Sized Industrial Components Via Multi-Stage Visual Servoing |
|
Qian, Kun | Heriot-Watt University |
Erden, Mustafa Suphi | Heriot-Watt University |
Kong, Xianwen | Heriot-Watt University |
Keywords: Perception for Grasping and Manipulation, Deep Learning in Grasping and Manipulation, Assembly
Abstract: Manual assembly of industrial components with complex geometries and small dimensions, such as concentrator photovoltaics solar panel units, often results in suboptimal accuracy, efficiency, and throughput. Our ultimate goal is to develop a robotic assembly system capable of delicately manipulating and precisely assembling industrial objects. To address the challenge of localization and grasping of small sized components, this paper introduces a multi-stage visual servoing framework. The initial positioning of the object is conducted using deep learning-based 6D pose estimation, followed by feature matching to accomplish the final grasping. Leveraging the established robotic system, experiments are conducted to validate the efficacy of the proposed framework, utilizing the smallest component within a photovoltaic unit, known as the solar cell (length: 12mm, width: 10mm, height: 3mm).
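The feature-matching refinement stage can be sketched with standard OpenCV calls; the deep-learning 6D pose initialization is assumed to have already coarsely aligned the views, and the detector choice (ORB) and match count are assumptions for illustration.

import cv2

def match_features(template_gray, live_gray, max_matches=50):
    # Detect and describe keypoints in the template and live camera views.
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(template_gray, None)
    kp2, des2 = orb.detectAndCompute(live_gray, None)
    # Brute-force Hamming matching with cross-checking for reliability.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    # The best correspondences feed the final pose refinement for grasping.
    return kp1, kp2, matches[:max_matches]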
|
|
16:30-18:00, Paper ThCL-EX.11 | Add to My Program |
Energy-Aware Hierarchical Reinforcement Learning for CubeSat Task Scheduling |
|
Ramezani, Mahya | University of Luxembourg |
Amiri Atashgah, M.A. | University of Tehran |
Rezaee, Alireza | University of Tehran |
Alandihallaj, Mohammadamin | University of Luxembourg |
Sanchez-Lopez, Jose Luis | University of Luxembourg |
Hein, Andreas | University of Luxembourg |
Keywords: AI-Based Methods, Energy and Environment-Aware Automation
Abstract: This work presents an energy-aware Hierarchical Reinforcement Learning (HierRL) methodology tailored for optimizing CubeSat task scheduling in Low Earth Orbit (LEO). Incorporating a high-level policy for global task distribution and a low-level policy for real-time adaptation as a safety mechanism, our approach integrates the Similarity Attention-based Encoder (SABE) for task prioritization and an MLP estimator for energy consumption forecasting. Together, these mechanisms create a safe and fault-tolerant system for CubeSat task scheduling. Simulation results validate HierRL's superior convergence and task success rate, outperforming both the MADDPG model and traditional random scheduling across multiple CubeSat configurations.
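The two-level loop described above can be sketched as follows; all names are illustrative, and the SABE encoder, energy model, and MADDPG baseline are not reproduced.

def run_episode(high_policy, low_policy, energy_mlp, env):
    # High level: one global task-distribution decision per episode.
    obs = env.reset()
    plan = high_policy(obs)
    done = False
    while not done:
        # Low level: real-time adaptation acting as a safety mechanism,
        # conditioned on a forecast of energy consumption.
        predicted_energy = energy_mlp(obs)
        action = low_policy(obs, plan, predicted_energy)
        obs, reward, done, info = env.step(action)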
|
|
16:30-18:00, Paper ThCL-EX.12 | Add to My Program |
Development of a Standard Process Model for Robotic Equipment for Cell Culture Inspection and Media Change Processes |
|
Choo, Sungwon | KITECH |
Baek, Sunhyuk | Hanyang University |
Koo, Taehoon | KITECH |
Nam, Kyung-Tae | KITECH |
Keywords: Industrial Robots, Grasping, Process Control
Abstract: The bio industry, reliant on skilled professionals, encompasses various processes such as cell line selection, cultivation, purification, and final product processing. Cell cultivation, pivotal in producing therapeutics, vaccines, and bio-products, often relies on manual labor, leading to inconsistent results due to operator proficiency and environmental factors. Minimizing cell damage during critical operations such as harvesting and splitting is crucial. Automation systems are being developed to quantify operator skills and streamline tasks, aiming to enhance consistency and efficiency in bio-manufacturing processes, particularly in cell culture. In this paper, we propose a cell culture automation system that utilizes a single robotic arm to manipulate various objects within a confined space, aiming to address these challenges. Analyzing the cell culture process, we design an Automatic Tool Changer (ATC) that allows the robotic arm to easily interchange grippers and dispensing modules. This ATC, along with the gripper, facilitates operations such as consumable handling, pick-and-place, suction, and dispensing by seamlessly swapping tools during the cell culture process.
|
|
16:30-18:00, Paper ThCL-EX.13 | Add to My Program |
Design Considerations for Bioinspired Multi-Morphing Omnidirectional Robot |
|
S P S, Paramesh | Department of Mechanical Engineering, Amrita School of Engineering |
T, Ruvanthika | Department of Mechanical Engineering, Amrita School of Engineering |
S, Sanjeev | Department of Mechanical Engineering, Amrita School of Engineering |
A, Barathwaj | Department of Mechanical Engineering, Amrita School of Engineering |
V, Shourya | Department of Mechanical Engineering, Amrita School of Engineering |
R, Sharan | Department of Mechanical Engineering, Amrita School of Engineering |
S, Anil | SISC, Bengaluru |
S, Rammohan | Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham |
Keywords: Mechanism Design, Kinematics, Wheeled Robots
Abstract: Bioinspired multi-morphing omnidirectional robots are attracting increasing research attention. The ability to navigate confined spaces by morphing body shape is seen in creatures such as spiders and crabs. The present work aims to identify design strategies for a morphing robot mathematically and to propose an effective scheme for controlling the robot via an API. We propose a novel quadruped robot configuration with joints similar to the major axes of an articulated manipulator. The selected joint architecture is inspired by the motion of creatures such as crabs and spiders. This configuration not only facilitates bioinspired morphing but also simplifies the mathematical modeling of the robot. The values of wheel radius, wheel offset, and steering angle for which the condition number of the Jacobian matrix equals 1 are determined. A simplified API-based control scheme is also proposed so that any end user can maneuver the robot.
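The isotropy condition (Jacobian condition number equal to 1) can be searched numerically; the Jacobian below is a toy stand-in for the paper's kinematic model, and the parameter ranges are made up.

import numpy as np

def jacobian(r, d, a):
    # Hypothetical toy Jacobian in wheel radius r, offset d, steering angle a.
    return np.array([[r * np.cos(a), -d * np.sin(a)],
                     [r * np.sin(a),  d * np.cos(a)]])

candidates = [(r, d, a)
              for r in np.linspace(0.02, 0.10, 20)
              for d in np.linspace(0.01, 0.05, 20)
              for a in np.linspace(0.0, np.pi / 2, 20)]
# Isotropic design: parameters whose Jacobian condition number is closest to 1.
best = min(candidates, key=lambda p: abs(np.linalg.cond(jacobian(*p)) - 1.0))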
|
|
16:30-18:00, Paper ThCL-EX.16 | Add to My Program |
A Soft Wearable Robot for Upper Limb Assistance Using Layer Jamming Mechanisms |
|
Kim, Namho | Chung-Ang University |
Park, Jong Hoon | Yonsei University |
Shin, Dongjun | Yonsei University |
Keywords: Wearable Robotics, Soft Sensors and Actuators, Soft Robot Materials and Design
Abstract: Wearable robotic systems, especially those integrating soft materials, are increasingly capturing attention due to their comfort, ease of use, and versatility in supporting various tasks. Striking a balance between ensuring wearer comfort with low impedance and delivering sufficient assistive force presents a significant design challenge in wearable robotics. In this study, we propose the utilization of variable impedance tailored to different types of muscle contractions in the human body. By specifically addressing eccentric muscle contractions, adjusting impedance can alleviate muscular strain as both forces act in the same direction. To realize this concept, we introduce layer jamming mechanisms capable of adjusting impedance across multiple directions. This mechanism not only allows for a broad range of impedance variation in multi-degree-of-freedom (DoF) rotations but also facilitates customized directional torque in human multi-DoF joints. Through the development of a wearable robot prototype equipped with the proposed layer jamming mechanisms, experimental validation confirms the effectiveness of this impedance-based assistance strategy. The findings of this study unveil new possibilities in wearable robot design, showcasing how finely tuned impedance can enhance human motion, potentially boosting task efficiency and minimizing injury risks. Therefore, this work presents a fresh perspective for researchers involved in the field of wearable robotics.
|
|
16:30-18:00, Paper ThCL-EX.17 | Add to My Program |
Development and Fundamental Experiment of Bladeless Fan Propulsions for Small Unmanned Aerial Vehicles |
|
Saito, Tatsuki | Shibaura Institute of Technology |
Hamane, Hiroto | Kogakuin University |
Abiko, Satoko | Shibaura Institute of Technology |
Keywords: Aerial Systems: Mechanics and Control, Aerial Systems: Applications
Abstract: Unmanned aerial vehicles (UAVs) are generally equipped with propellers to achieve flight. However, the exposed propellers of conventional UAVs can cause human injury, bird strikes, and other problems that lead to severe accidents involving human lives. Previous research developed multicopter-type UAVs with bladeless fans, but there has been no research on fixed-wing bladeless UAVs. This presentation therefore describes the development of a small fixed-wing UAV propelled by a bladeless fan instead of a propeller. The bladeless fans are designed and developed to be installed on the small UAV. Computational fluid dynamics is used to analyze three different bladeless-fan shapes and visualize their airflow. Based on the results of this analysis, the second design, which generated the highest airflow velocity, was selected for the airframe. In addition, a model with two bladeless fans was built to provide additional thrust to the fuselage. Flight tests showed a flight time of only about 9 seconds. In future work, the bladeless fan will be redesigned and a weight-reduction study conducted to reduce the total weight of the fuselage and realize long-distance flight.
|
|
16:30-18:00, Paper ThCL-EX.18 | Add to My Program |
PatLink: Patella-Inspired Linkage Joint for Force Transmission Mechanism |
|
Lee, Sinyoung | Chungang University |
Lee, Dongun | Yonsei University |
Shin, Dongjun | Yonsei University |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Materials and Design, Soft Robot Applications
Abstract: Pneumatic artificial muscles (PAMs) are promising actuators for safe human-robot interaction due to their high force density and compliance. However, challenges like force reduction and poor control performance hinder their practical use. To address this, we propose the Patella-Inspired Linkage Joint (PatLink), amplifying joint rotation with muscle contraction. This enhances joint speed and torque output. Experimental results demonstrate a 104% increase in range of motion and a 108% enhancement in mean work rate, validating PatLink's effectiveness. Furthermore, PatLink can be extended to similar actuators like shape memory alloy and twisted string actuators.
|
|
16:30-18:00, Paper ThCL-EX.19 | Add to My Program |
ADAPT Hand: Robust Robotic Manipulation through Biomimetic Distributed Compliance |
|
Junge, Kai | École Polytechnique Fédérale De Lausanne |
Hughes, Josie | EPFL |
Keywords: Multifingered Hands, Biomimetics, Dexterous Manipulation
Abstract: The impressive capability of humans to robustly perform manipulation relies on compliant interactions, enabled through the structure and materials spatially distributed in our hands. We propose that by mimicking this distributed compliance in an anthropomorphic robotic hand, open-loop manipulation robustness increases, and we observe the emergence of human-like behaviours. To achieve this, we introduce the ADAPT Hand, equipped with tunable compliance throughout the skin, fingers, and wrist. Through extensive automated pick-and-place tests, we show that the grasping robustness closely mirrors an estimated geometric theoretical limit. In a grasping task in a constrained environment, we demonstrate hand-object self-organization behavior, where the hand automatically exhibits different grasp types depending on object geometry. Furthermore, the robot grasp type mimics a natural human grasp with a direct similarity of 68%.
|
|
16:30-18:00, Paper ThCL-EX.20 | Add to My Program |
An Agile Monopedal Hopping Quadcopter with Synergistic Hybrid Locomotion |
|
Bai, Songnan | City University of Hong Kong |
Pan, Qiqi | Hong Kong University of Science and Technology |
Ding, Runze | City University of Hongkong |
Jia, Huaiyuan | City University of Hong Kong |
Yang, Zhengbao | Hong Kong University of Science and Technology |
Chirarattananon, Pakpong | City University of Hong Kong |
Keywords: Aerial Systems: Mechanics and Control, Legged Robots, Biologically-Inspired Robots
Abstract: Nature abounds with examples of superior mobility through the fusion of aerial and ground movement. Drawing inspiration from such multimodal locomotion, we introduce a high-performance hybrid hopping and flying robot. The proposed robot seamlessly integrates a nano quadcopter with a passive telescopic leg, overcoming limitations of previous jumping mechanisms that rely on stance phase leg actuation. Based on the identified dynamics, a thrust-based control method and detachable active aerodynamic surfaces were devised for the robot to perform continuous jumps with and without position feedback. This unique design and actuation strategy enable tuning of jump height and reduced stance phase duration, leading to agile hopping locomotion. The robot recorded an average vertical hopping speed of 2.38 meters per second at a jump height of 1.63 meters. By harnessing multimodal locomotion, the robot is capable of intermittent midflight jumps that result in substantial instantaneous accelerations and rapid changes in flight direction, offering enhanced agility and versatility in complex environments. The passive leg design holds potential for direct integration with conventional rotorcraft, unlocking seamless hybrid hopping and flying locomotion.
|
|
16:30-18:00, Paper ThCL-EX.21 | Add to My Program |
From Imitation Learning to Instruction Learning: A New Paradigm for Efficient Motion Learning |
|
Ye, Linqi | Shanghai University |
Li, Jiayi | Tsinghua University |
Cheng, Yi | Tsinghua University |
Xianglong, Li | Qiyuan Lab |
Liu, Houde | Shenzhen Graduate School, Tsinghua University |
Liang, Bin | Center for Artificial Intelligence and Robotics, Graduate School |
Keywords: Reinforcement Learning, Legged Robots, Humanoid and Bipedal Locomotion
Abstract: Recent years have witnessed many successful trials in the robot learning field. For contact-rich robotic tasks, it is challenging to learn coordinated motor skills by reinforcement learning. Imitation learning solves this problem by using a mimic reward to encourage the robot to track a given reference trajectory. However, imitation learning is not very efficient and may constrain the learned motion. We propose instruction learning, which is inspired by the human learning process and is highly efficient, flexible, and versatile for robot motion learning. Instead of using a reference signal in the reward, instruction learning applies the reference signal directly as a feedforward action, which is combined with a feedback action learned by reinforcement learning to control the robot. In addition, we propose an action bounding technique and remove the mimic reward, which is shown to be crucial for efficient and flexible learning. We compare the performance of instruction learning with imitation learning, showing that instruction learning can greatly speed up the training process and guarantee that the desired motion is learned correctly. The effectiveness of instruction learning is validated through a range of motion learning examples for a biped robot and a quadruped robot, where skills can typically be learned within several million steps.
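The action composition the abstract describes can be written in a few lines; the bound value and names are illustrative assumptions, not the paper's hyperparameters.

import numpy as np

def instruction_action(reference, policy, obs, t, bound=0.3):
    # Feedforward: the reference signal applied directly as an action,
    # rather than entering the reward as in imitation learning.
    feedforward = reference[t]
    # Feedback: an RL-learned correction, kept small by action bounding.
    feedback = np.clip(policy(obs), -bound, bound)
    return feedforward + feedback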
|
|
16:30-18:00, Paper ThCL-EX.22 | Add to My Program |
Development of Quad Nozzle FFF 3D Printer for Multi-Material Single-Step 3D Printing |
|
Lee, Haemin | Seoul National University |
Park, Jong Hoo | Seoul National University |
Cho, Kyu-Jin | Seoul National University, Biorobotics Laboratory |
Keywords: Methods and Tools for Robot System Design, Embedded Systems for Robotic and Automation, Product Design, Development and Prototyping
Abstract: Building robots involves the assembly of numerous components such as mechanical joints, links, sensors, actuators, and power sources. There has long been a focus on minimizing the number of components and reducing the resources required for assembling them, particularly in categories of systems where specific factors are crucial, such as small-scale robots, point-of-use manufacturing, and highly customizable designs. Traditional fabrication processes often involve dedicated manufacturing for individual components, followed by subsequent assembly steps requiring serial, complex, and occasionally manual interventions. In contrast, the single-step 3D printing approach, which refers to fabricating a whole object with monolithically integrated components in a single printing process without requiring assembly, challenges the long-standing tradition of treating components as independent modules. Although single-step 3D printing offers an opportunity to revolutionize the process by simultaneously creating entire robots with integrated components, the capacity to concurrently print multiple materials with distinct characteristics has been a significant challenge. This study presents the development of a Quad Nozzle 3D Printer equipped with four nozzles to enable single-step 3D printing of up to four different materials simultaneously.
|
|
ThE-EX Expo Session, Exhibition Hall |
Add to My Program |
ICRA EXPO Day 3 |
|
|
Chair: Ravankar, Ankit A. | Tohoku University |
Co-Chair: Salazar Luces, Jose Victorio | Tohoku University |
|
13:30-18:00, Paper ThE-EX.1 | Add to My Program |
Nezha-F: Design and Demonstration of a Foldable and Self-Deployable HAUV |
|
Bi, Yuanbo | Shanghai Jiao Tong University |
Xu, Zhuxiu | Shanghai Jiao Tong University |
Bai, YuLin | Shanghai Jiao Tong University |
Zhou, Hexiong | Shanghai Jiao Tong University |
Zeng, Zheng | Shanghai Jiao Tong University |
|
13:30-18:00, Paper ThE-EX.2 | Add to My Program |
STAR: Swarm Technology for Aerial Robotics Research |
|
Chiun, Jimmy | National University of Singapore |
Leong, Wai Lun | National University of Singapore |
Cao, Yuhong | National University of Singapore |
Tan, Yan Rui | National University of Singapore |
Teo, Rodney | Defense Science Organization |
Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
|
13:30-18:00, Paper ThE-EX.3 | Add to My Program |
Demonstration of Multi-Modal Aerial Robot System for Forest Canopy Research: Perching and Disentangling Maneuvers |
|
Romanello, Luca | TUM |
Lan, Tian | Technical University of Munich |
Kovac, Mirko | Imperial College London |
Armanini, Sophie Franziska | Technical University of Munich |
Kocer, Basaran Bahadir | Imperial College London |
|
13:30-18:00, Paper ThE-EX.4 | Add to My Program |
Surface Normal Estimation As Always-On Perception for Vision-Based Robots |
|
Bae, Gwangbin | Imperial College London |
Davison, Andrew J | Imperial College London |
|
13:30-18:00, Paper ThE-EX.5 | Add to My Program |
SCALER-B: A Multi-Modal Versatile Robot for Simultaneous Locomotion and Grasping |
|
Tanaka, Yusuke | University of California, Los Angeles |
Schperberg, Alexander | University of California Los Angeles |
Hong, Dennis | UCLA |
|
13:30-18:00, Paper ThE-EX.6 | Add to My Program |
Remote Control of a Water Hydraulically Driven Robot through Thin and Long 20 m Tubes |
|
Nakamura, Yuki | The University of Electro-Communications |
Noda, Tomoyuki | ATR Computational Neuroscience Laboratories |
Yoshimura, Shuto | The University of Electro-Communications |
Nakata, Yoshihiro | The University of Electro-Communications |
|
13:30-18:00, Paper ThE-EX.7 | Add to My Program |
A Sturdy, Two-Body Robot for Handlebar Placement in Any Location |
|
Bolli, Roberto | MIT |
Asada, Harry | MIT |
|
13:30-18:00, Paper ThE-EX.8 | Add to My Program |
Multi-Platform Mixed-Reality Robotics Teaching and Learning System |
|
Zhao, Xinyan | The Chinese University of Hong Kong |
Shang, Siqi | Columbia University |
Lau, Darwin | The Chinese University of Hong Kong |
Lee, Jimmy H.M. | The Chinese University of Hong Kong |
|
13:30-18:00, Paper ThE-EX.9 | Add to My Program |
Tac3D: Robot Fingertip Multimodal Tactile Sensor |
|
Zhang, Lunwei | Tsinghua University |
Zhou, Yen Hang | Tsinghua University |
Sui, Ruomin | Tsinghua University |
Jiang, Yao | Tsinghua University |
|
13:30-18:00, Paper ThE-EX.10 | Add to My Program |
Minimalistic Grasping Automation - Multimodal Soft Gripper |
|
Dontu, Saikrishna | Singapore University of Technology and Design |
Kanhere, Elgar | Singapore University of Technology and Design |
Valdivia y Alvarado, Pablo | Singapore University of Technology and Design, MIT |
Stalin, Thileepan | Singapore University of Technology and Design |
|
13:30-18:00, Paper ThE-EX.11 | Add to My Program |
Robot Teleoperation in Constrained Spaces with a SLIM End Effector |
|
Thomasson, Rachel | Stanford University |
Bernardini, Alessandra | University of Bologna |
Zhu, Peizhang | Flexiv Ltd. |
Zhang, Zhipeng | Flexiv Ltd. |
Cutkosky, Mark | Stanford University |
|
13:30-18:00, Paper ThE-EX.12 | Add to My Program |
Bidirectional Haptic Transmission through a Bracelet Device Using a Sensory Equivalence Conversion of High-Frequency Vibration |
|
Nida, Takaya | Tohoku University |
Matsubara, Toru | Tohoku University |
Waga, Masamune | Tohoku University |
Konyo, Masashi | Tohoku University |
Tadokoro, Satoshi | Tohoku University |
|
13:30-18:00, Paper ThE-EX.13 | Add to My Program |
Cyber-Physical Interactions through a Human Coincident Robot |
|
Sasaki, Tomoya | Tokyo University of Science |
Watanabe, Takafumi | Preferred Robotics Inc. |
Inami, Masahiko | The University of Tokyo |
Yoshida, Eiichi | Tokyo University of Science |
|
13:30-18:00, Paper ThE-EX.14 | Add to My Program |
Human Support Robot |
|
Okada, Hiroyuki | Tamagawa University |
Contreras-Toledo, Luis Angel | Tamagawa University |
Mizutani, Akinobu | Kyushu Institute of Technology |
Tamukoh, Hakaru | Kyushu Institute of Technology |
|
13:30-18:00, Paper ThE-EX.15 | Add to My Program |
Design of a Self-Righting Shell for a Robotic Hexapod |
|
King, Katelyn | University of Michigan |
Revzen, Shai | University of Michigan |
|
13:30-18:00, Paper ThE-EX.16 | Add to My Program |
On Bringing Robots Home |
|
Shafiullah, Nur Muhammad (Mahi) | New York University |
Etukuru, Haritheja | New York University |
Pinto, Lerrel | New York University |
|
13:30-18:00, Paper ThE-EX.17 | Add to My Program |
Contact-Safe Lead Screw Actuator with Simple-Shaped Magnets |
|
Heya, Akira | Nagoya University |
Nakata, Yoshihiro | The University of Electro-Communications |
|
13:30-18:00, Paper ThE-EX.18 | Add to My Program |
Omnidirectional Crawler Mechanisms with Circular Cross Section |
|
Tadakuma, Kenjiro | Osaka University |
Sano, Shunsuke | Tohoku University |
Kayawake, Ryotaro | Tohoku University |
Abe, Kazuki | Osaka University |
Watanabe, Masahiro | Osaka University |
Tadakuma, Riichiro | Yamagata University |
Tadokoro, Satoshi | Tohoku University |
|
13:30-18:00, Paper ThE-EX.19 | Add to My Program |
Demonstration of Fully 3D Printable Robot Hand and Soft Tactile Sensor Based on Air-Pressure and Capacitive Proximity Sensing |
|
Taylor, Sean | University of Illinois at Urbana-Champaign |
Park, Kyungseo | Daegu Gyeongbuk Institute of Science and Technology (DGIST) |
Yamsani, Sankalp | University of Illinois Urbana-Champaign |
Kim, Joohyung | University of Illinois at Urbana-Champaign |
|
13:30-18:00, Paper ThE-EX.20 | Add to My Program |
Adaptive Speed Control of Treadmill Device for Enhanced Exercise in Virtual and Mixed Reality Experiences |
|
Manríquez-Cisterna, Ricardo | Tohoku University |
Breuss, Alexander | Sensory-Motor Systems Lab, Institute of Robotics and Intelligent Systems, ETH Zurich |
Gnarra, Oriella | ETH Zurich |
Peña Queralta, Jorge | ETH Zürich |
Ravankar, Ankit A. | Tohoku University |
Salazar Luces, Jose Victorio | Tohoku University |
Paez Granados, Diego Felipe | ETH Zurich |
Riener, Robert | Eidgenössische Technische Hochschule (ETH) Zürich |
Hirata, Yasuhisa | Tohoku University |
|
13:30-18:00, Paper ThE-EX.21 | Add to My Program |
A Rigid-Flexible Coupled Robotic Arm for Efficient and Accurate Harvest |
|
Chen, BinHao | Shanghai Jiao Tong University |
Gong, Liang | Shanghai Jiao Tong University |
Luo, Cheng | Shanghai Jiao Tong University |
Chen, Jiayu | Shanghai Jiao Tong University |
Sun, Yefeng | Shanghai Jiao Tong University |
Bishu, Gao | Shanghai Jiao Tong University |
Chen, Feifei | Shanghai Jiao Tong University |
Li, Yanming | Shanghai Jiao Tong University |
Huang, Yixiang | Shanghai Jiao Tong University |
Liu, Chengliang | Shanghai Jiao Tong University |
|
13:30-18:00, Paper ThE-EX.22 | Add to My Program |
Onix: Rescue Robotics for Power Plant Inspection and Complex Terrain Navigation |
|
Kojima, Shotaro | Tohoku University |
|
13:30-18:00, Paper ThE-EX.23 | Add to My Program |
Recognition of Coffee Roasting Degree Using an Intelligent Multi-Spectral Vision System |
|
Lin, Ming-Yi | Yuan Ze University |
|
13:30-18:00, Paper ThE-EX.24 | Add to My Program |
Gaussian Splatting SLAM |
|
Matsuki, Hidenobu | Imperial College London |
Murai, Riku | Imperial College London |
Kelly, Paul H J | Imperial College London |
Davison, Andrew J | Imperial College London |
|
13:30-18:00, Paper ThE-EX.25 | Add to My Program |
Bimanual Torque Research Reference Platform Demo |
|
Bien, Seongjin | Technical University of Munich |
Eberle, Felix | Technical University of Munich |
Vorndamme, Jonathan | Chair of Robotics and Systems Intelligence, Technical University of Munich |
Škerlj, Jon | Technical University of Munich |
Figueredo, Luis | University of Nottingham (UoN) |
Naceri, Abdeldjallil | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
|
13:30-18:00, Paper ThE-EX.26 | Add to My Program |
Demonstration of Novel TRUST-Based Upper Limb Support Exoskeleton |
|
Seong, Hyeonseok | Korea Advanced Institute of Science and Technology |
Farag, Seif | Korea Advanced Institute of Science and Technology |
Lee, Sang Wook | Samsung Heavy Industries Co., Ltd. |
Gaponov, Igor | University College London |
Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
|
13:30-18:00, Paper ThE-EX.27 | Add to My Program |
CoFRIDA: A Human-Robot Collaborative Drawing Demonstration |
|
Schaldenbrand, Peter | Carnegie Mellon University |
Parmar, Gaurav | Carnegie Mellon University |
Zhu, Jun-Yan | Carnegie Mellon University |
McCann, James | Carnegie Mellon University |
Oh, Jean | Carnegie Mellon University |
|
13:30-18:00, Paper ThE-EX.28 | Add to My Program |
Automotive Workloads Based on Autoware's Open AD Kit and PIXKit 3.0 Autonomous Developer Chassis |
|
Carballo, Alexander | Gifu University |
Walmroth, David | PIX Moving Inc. |
Wong, David | Nagoya University |
Kütük, Samet | LeoDrive |
| |