| |
Last updated on October 19, 2022. This conference program is tentative and subject to change.
Technical Program for Monday October 24, 2022
|
MoA-1 |
Rm1 (Room A) |
Award Session I |
Regular session |
Chair: Yoshida, Eiichi | Tokyo University of Science |
Co-Chair: Pettersen, Kristin Y. | Norwegian University of Science and Technology |
|
10:00-10:15, Paper MoA-1.1 | |
SpeedFolding: Learning Efficient Bimanual Folding of Garments (Finalist for IROS Best Paper Award and ABB Best Student Paper Award) (Finalist for IROS Best RoboCup Paper Award Sponsored by RoboCup Federation) |
|
Avigal, Yahav | UC Berkeley |
Berscheid, Lars | Karlsruhe Institute of Technology |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Kroeger, Torsten | Karlsruher Institut Für Technologie (KIT) |
Goldberg, Ken | UC Berkeley |
Keywords: Bimanual Manipulation, Deep Learning in Grasping and Manipulation, Dual Arm Manipulation
Abstract: Folding garments reliably and efficiently is a long-standing challenge in robotic manipulation due to the complex dynamics and high-dimensional configuration space of garments. An intuitive approach is to initially manipulate the garment to a canonical smooth configuration before folding. In this work, we develop SpeedFolding, a reliable and efficient bimanual system, which, given user-defined instructions as folding lines, manipulates an initially crumpled garment to (1) a smoothed and (2) a folded configuration. Our primary contribution is a novel neural network architecture that is able to predict pairs of gripper poses to parameterize a diverse set of bimanual action primitives. After learning from 4300 human-annotated and self-supervised actions, the robot is able to fold garments from a random initial configuration in under 120s on average with a success rate of 93%. Real-world experiments show that the system is able to generalize to unseen garments of different color, shape, and stiffness. While prior work achieved 3-6 Folds Per Hour (FPH), SpeedFolding achieves 30-40 FPH.
|
|
10:15-10:30, Paper MoA-1.2 | |
FAR Planner: Fast, Attemptable Route Planner Using Dynamic Visibility Update (Finalist for IROS Best Paper Award and ABB Best Student Paper Award) |
|
Yang, Fan | Carnegie Mellon University |
Cao, Chao | Carnegie Mellon University |
Zhu, Hongbiao | Harbin Institute of Technology |
Oh, Jean | Carnegie Mellon University |
Zhang, Ji | Carnegie Mellon University |
Keywords: Reactive and Sensor-Based Planning, Motion and Path Planning
Abstract: We present our work on a visibility graph-based planning framework. The planner constructs a polygonal representation of the environment by extracting edge points around obstacles to form enclosed polygons. With that, the method dynamically updates a global visibility graph using a two-layered data structure, expanding the visibility edges along with the navigation and removing edges that become occluded by dynamic obstacles. The planner is capable of dealing with navigation tasks in both known and unknown environments. In the latter case, the method is attemptable in discovering a way to the goal by picking up the environment layout on the fly and fast re-planning to account for the newly observed environment. We evaluate the method in both simulated and real-world settings. The method shows the capability to navigate through unknown environments and reduces travel time by 12-47% compared to search-based methods (A*, D* Lite) and by 24-35% compared to sampling-based methods (RRT*, BIT*, and SPARS).
|
|
10:30-10:45, Paper MoA-1.3 | |
Learning-Based Localizability Estimation for Robust LiDAR Localization (Finalist for IROS Best Paper Award and ABB Best Student Paper Award) |
|
Nubert, Julian | ETH Zürich |
Walther, Etienne | ETH Zürich |
Khattak, Shehryar | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Field Robots, Localization, Failure Detection and Recovery
Abstract: LiDAR-based localization and mapping is one of the core components in many modern robotic systems due to the direct integration of range and geometry, allowing for precise motion estimation and generation of high quality maps in real-time. Yet, as a consequence of insufficient environmental constraints present in the scene, this dependence on geometry can result in localization failure, happening in self-symmetric surroundings such as tunnels. This work addresses precisely this issue by proposing a neural network-based estimation approach for detecting (non-)localizability during robot operation. Special attention is given to the localizability of scan-to-scan registration, as it is a crucial component in many LiDAR odometry estimation pipelines. In contrast to previous, mostly traditional detection approaches, the proposed method enables early detection of failure by estimating the localizability on raw sensor measurements without evaluating the underlying registration optimization. Moreover, previous approaches remain limited in their ability to generalize across environments and sensor types, as heuristic-tuning of degeneracy detection thresholds is required. The proposed approach avoids this problem by learning from a collection of different environments, allowing the network to function over various scenarios. Furthermore, the network is trained exclusively on simulated data, avoiding arduous data collection in challenging and degenerate, often hard-to-access, environments. The presented method is tested during field experiments conducted across challenging environments and on two different sensor types without any modifications. The observed detection performance is on par with state-of-the-art methods after environment-specific threshold tuning.
|
|
10:45-11:00, Paper MoA-1.4 | |
Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions (Finalist for IROS Best Paper Award and ABB Best Student Paper Award) |
|
Escontrela, Alejandro | UC Berkeley |
Peng, Xue Bin | University of California, Berkeley |
Yu, Wenhao | Google |
Zhang, Tingnan | Google |
Iscen, Atil | Google |
Goldberg, Ken | UC Berkeley |
Abbeel, Pieter | UC Berkeley |
Keywords: Legged Robots, Reinforcement Learning, Art and Entertainment Robotics
Abstract: Training a high-dimensional simulated agent with an under-specified reward function often leads the agent to learn physically infeasible strategies that are ineffective when deployed in the real world. To mitigate these unnatural behaviors, reinforcement learning practitioners often utilize complex reward functions that encourage physically plausible behaviors. However, a tedious, labor-intensive tuning process is often required to create hand-designed rewards, which might not easily generalize across platforms and tasks. We propose substituting complex reward functions with "style rewards" learned from a dataset of motion capture demonstrations. A learned style reward can be combined with an arbitrary task reward to train policies that perform tasks using naturalistic strategies. These natural strategies can also facilitate transfer to the real world. We build upon Adversarial Motion Priors -- an approach from the computer graphics domain that encodes a style reward from a dataset of reference motions -- to demonstrate that an adversarial approach to training policies can produce behaviors that transfer to a real quadrupedal robot without requiring complex reward functions. We also demonstrate that an effective style reward can be learned from a few seconds of motion capture data gathered from a German Shepherd and leads to energy-efficient locomotion strategies with natural gait transitions.
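As a rough illustration of combining a learned style reward with a task reward, the sketch below shows a motion discriminator over state transitions and the least-squares-GAN-style squashing used in the Adversarial Motion Priors line of work. The network architecture, reward weights, and function names are illustrative assumptions, not the authors' implementation.
```python
# Hedged sketch: task reward plus discriminator-based "style reward".
import torch
import torch.nn as nn

class MotionDiscriminator(nn.Module):
    """Scores state transitions (s, s') against reference motion-capture data."""
    def __init__(self, obs_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, s_next], dim=-1)).squeeze(-1)

def style_reward(disc: MotionDiscriminator, s, s_next) -> torch.Tensor:
    # Squashing of the discriminator score into [0, 1], as in the AMP paper:
    # r_style = max(0, 1 - 0.25 * (d - 1)^2).
    d = disc(s, s_next)
    return torch.clamp(1.0 - 0.25 * (d - 1.0) ** 2, min=0.0)

def total_reward(r_task, r_style, w_task: float = 0.5, w_style: float = 0.5):
    # The policy is trained on a weighted sum of task and style rewards, so
    # natural-looking motion substitutes for hand-tuned shaping terms.
    return w_task * r_task + w_style * r_style
```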
|
|
11:00-11:15, Paper MoA-1.5 | |
RCareWorld: A Human-Centric Simulation World for Caregiving Robots (Finalist for IROS Best Paper Award and ABB Best Student Paper Award) (Finalist for IROS Best RoboCup Paper Award Sponsored by RoboCup Federation) |
|
Ye, Ruolin | Cornell University |
Xu, Wenqiang | Shanghai Jiaotong University |
Fu, Haoyuan | Shanghai Jiao Tong University |
Jenamani, Rajat Kumar | Cornell University |
Nguyen, Vy | Cornell University |
Lu, Cewu | Shanghai Jiao Tong University |
Dimitropoulou, Katherine | Columbia University |
Bhattacharjee, Tapomayukh | Cornell University |
Keywords: Simulation and Animation, Human-Centered Robotics, Robot Companions
Abstract: In this paper, we present RCareWorld, a human-centric simulation world for physical and social robot caregiving with support for realistic human modeling, home environments with multiple levels of accessibility, and robots used for assistance. This simulation is designed using inputs from stakeholders such as expert occupational therapists, care-recipients, and caregivers. It provides a variety of benchmark ADL tasks in realistic settings. It interfaces with various physics engines to model rigid, articulated, and deformable objects. It provides the capability to plan, control, and learn both human and robot control policies by interfacing it with state-of-the-art external planning and learning libraries. We performed experiments on a subset of these ADL tasks using reinforcement learning methods. We performed a representative real-world physical robotic caregiving experiment by transferring policies learned in RCareWorld directly to a real robot. Additionally, we performed a real-world social caregiving experiment using behaviors modeled in RCareWorld. Robotic caregiving, though potentially impactful towards enhancing the quality-of-life of care-recipients and caregivers, is a field with many barriers to entry due to its interdisciplinary facets. RCareWorld takes the first step towards building a realistic simulation environment for robotic caregiving research to democratize this field and enable robotics researchers around the world to contribute to it. Demo videos and supplementary materials can be found here: https://emprise.cs.cornell.edu/rcareworld/
|
|
11:15-11:30, Paper MoA-1.6 | |
Design and Modeling of a Spring-Like Continuum Joint with Variable Pitch for Endoluminal Surgery (Finalist for IROS Best Paper Award and ABB Best Student Paper Award) |
|
Li, Wei | Imperial College London |
Zhang, Dandan | Imperial College London |
Yang, Guang-Zhong | Shanghai Jiao Tong University |
Lo, Benny Ping Lai | Imperial College London |
Keywords: Medical Robots and Systems
Abstract: In endoluminal surgery, miniature instruments must offer high accuracy and flexibility for minimally invasive diagnosis and surgical intervention. To this end, continuum robots with flexible joints have been proposed as the mechanism of endoscopic instruments. The compliance and deformability of the continuum joints enable access into the curved lumen. However, manufacturing tolerances are not normally considered in the design procedure, leading to inaccuracy in robotic control. To improve the control accuracy and flexibility of endoluminal surgical robots, we propose a novel design of a metal-printed continuum joint in this paper, which incorporates a variable pitch design into the spring-like structure. The design can reduce the position errors accumulated on the distal tip of the joint, especially at large bending angles. The specification of variable pitch is investigated and determined with a friction model. In addition, to eliminate the distortion of the joint induced during the metal printing process, an extensive experiment was conducted to assess the effect of the variables in the design (pitch, thickness, width and number of coils), with the aim of determining optimal parameters for reducing discrepancy caused by manufacturing variations. The final results indicated that the bending error of a single joint can be reduced from 18.10% to 4.63%, and a multi-segment prototype was developed to verify its effectiveness for potential surgical applications.
|
|
MoA-2 |
Rm2 (Room B-1) |
Learning 1 |
Regular session |
Chair: Stachniss, Cyrill | University of Bonn |
Co-Chair: Hughes, Josie | EPFL |
|
10:00-10:10, Paper MoA-2.1 | |
CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks |
|
Mees, Oier | Albert-Ludwigs-Universität |
Hermann, Lukas | University of Freiburg |
Rosete-Beas, Erick | University of Freiburg |
Burgard, Wolfram | University of Freiburg |
Keywords: Data Sets for Robot Learning, Machine Learning for Robot Control, Imitation Learning
Abstract: General-purpose robots coexisting with humans in their environment must learn to relate human language to their perceptions and actions to be useful in a range of daily tasks. Moreover, they need to acquire a diverse repertoire of general-purpose skills that allow composing long-horizon tasks by following unconstrained language instructions. In this paper, we present CALVIN (Composing Actions from Language and Vision), an open-source simulated benchmark to learn long-horizon language-conditioned tasks. Our aim is to make it possible to develop agents that can solve many robotic manipulation tasks over a long horizon, from onboard sensors, and specified only via human language. CALVIN tasks are more complex in terms of sequence length, action space, and language than existing vision-and-language task datasets, and the benchmark supports flexible specification of sensor suites. We evaluate agents in a zero-shot setting on novel language instructions and on novel environments and objects. We show that a baseline model based on multi-context imitation learning performs poorly on CALVIN, suggesting that there is significant room for developing innovative agents that learn to relate human language to their world models with this benchmark.
|
|
10:10-10:20, Paper MoA-2.2 | |
Bio-Inspired Reflex System for Learning Visual Information for Resilient Robotic Manipulation |
|
Junge, Kai | École Polytechnique Fédérale De Lausanne |
Qiu, Kevin | École Polytechnique Fédérale De Lausanne |
Hughes, Josie | EPFL |
Keywords: Bioinspired Robot Learning, Visual Learning, Biologically-Inspired Robots
Abstract: Humans have an incredible sense of self-preservation that is both instilled and learned through experience. One system which contributes to this is the pain and reflex system, which both minimizes damage through involuntary reflex actions and also serves as a means of 'negative reinforcement' to allow learning of poor actions or decisions. Equipping robots with a reflex system and parallel learning architecture could help to prolong their useful life and allow for continued learning of safe actions. Focusing on a specific mock-up scenario of cubes on a 'stove'-like setup, we investigate the hardware and learning approaches for a robotic manipulator to learn the presence of 'hot' objects and their contextual relationship to the environment. By creating a reflex arc using analog electronics that bypasses the 'brain' of the system, we show an increase in the speed of release by at least two-fold. In parallel, we have a learning procedure which combines visual information of the scene with this 'pain signal' to learn and predict when an object may be hot, utilizing an object detection neural network. Finally, we are able to extract the learned contextual information of the environment by introducing a method inspired by 'thought experiments' to generate heatmaps that indicate the probability of the environment being hot.
|
|
10:20-10:30, Paper MoA-2.3 | |
RECALL: Rehearsal-Free Continual Learning for Object Classification |
|
Knauer, Markus | German Aerospace Center (DLR) |
Denninger, Maximilian | German Aerospace Center (DLR) |
Triebel, Rudolph | German Aerospace Center (DLR) |
Keywords: Continual Learning, Data Sets for Robotic Vision, Incremental Learning
Abstract: Convolutional neural networks show remarkable results in classification but struggle with learning new things on the fly. We present a novel rehearsal-free approach, where a deep neural network is continually learning new unseen object categories without saving any data of prior sequences. Our approach is called RECALL, as the network recalls categories by calculating logits for old categories before training new ones. These are then used during training to avoid changing the old categories. For each new sequence, a new head is added to accommodate the new categories. To mitigate forgetting, we present a regularization strategy where we replace the classification with a regression. Moreover, for the known categories, we propose a Mahalanobis loss that includes the variances to account for the changing densities between known and unknown categories. Finally, we present a novel dataset for continual learning (HOWS-CL-25), especially suited for object recognition on a mobile robot, including 150,795 synthetic images of 25 household object categories. Our approach RECALL outperforms the current state of the art on CORe50 and iCIFAR-100 and reaches the best performance on HOWS-CL-25.
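As a rough, hedged illustration of the "recall" idea described above (recording the old heads' logits before training on a new sequence and regressing onto them), the following PyTorch sketch uses illustrative names such as `model.backbone` and `old_heads`; it is not the authors' code.
```python
# Hedged sketch of rehearsal-free logit recall for continual learning.
import torch
import torch.nn.functional as F

@torch.no_grad()
def recall_old_logits(model, old_heads, images):
    """Compute logits of previously learned heads for the new training images."""
    feats = model.backbone(images)
    return [head(feats) for head in old_heads]

def recall_loss(model, old_heads, new_head, images, new_labels, recalled, alpha=1.0):
    feats = model.backbone(images)
    # Loss on the newly added head (cross-entropy here for simplicity; the
    # paper replaces classification with a regression formulation).
    loss_new = F.cross_entropy(new_head(feats), new_labels)
    # Keep the old heads close to their recalled (pre-training) logits.
    loss_old = sum(F.mse_loss(head(feats), target)
                   for head, target in zip(old_heads, recalled))
    return loss_new + alpha * loss_old
```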
|
|
10:30-10:40, Paper MoA-2.4 | |
PoseIt: A Visual-Tactile Dataset of Holding Poses for Grasp Stability Analysis |
|
Kanitkar, Shubham Satish | Carnegie Mellon University |
Jiang, Helen | Carnegie Mellon University |
Yuan, Wenzhen | Carnegie Mellon University |
Keywords: Data Sets for Robot Learning, Deep Learning in Grasping and Manipulation
Abstract: When humans grasp objects in the real world, we often move our arms to hold the object in a different pose where we can use it. In contrast, typical lab settings only study the stability of the grasp immediately after lifting, without any subsequent re-positioning of the arm. However, the grasp stability could vary widely based on the object’s holding pose, as the gravitational torque and gripper contact forces could change completely. To facilitate the study of how holding poses affect grasp stability, we present PoseIt, a novel multi-modal dataset that contains visual and tactile data collected from a full cycle of grasping an object, re-positioning the arm to one of the sampled poses, and shaking the object. Using data from PoseIt, we can formulate and tackle the task of predicting whether a grasped object is stable in a particular held pose. We train an LSTM classifier that achieves 85% accuracy on the proposed task. Our experimental results show that multi-modal models trained on PoseIt achieve higher accuracy than using solely vision or tactile data and that our classifiers can also generalize to unseen objects and poses. The PoseIt dataset is publicly released here: https://github.com/CMURoboTouch/PoseIt.
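As a minimal sketch of the kind of sequence classifier mentioned above, the snippet below feeds a multi-modal feature sequence (e.g., flattened tactile and force features per time step) into an LSTM and predicts grasp stability from the final hidden state. Feature dimensions and the two-class output are assumptions for illustration.
```python
# Hedged sketch: LSTM-based grasp-stability classifier over a time series.
import torch
import torch.nn as nn

class GraspStabilityLSTM(nn.Module):
    def __init__(self, feat_dim: int = 128, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # stable vs. unstable

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feat_dim) covering grasp, re-positioning, and shaking
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])          # logits from the final hidden state

model = GraspStabilityLSTM()
logits = model(torch.randn(8, 50, 128))    # 8 sequences of 50 time steps
```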
|
|
10:40-10:50, Paper MoA-2.5 | |
LaneSNNs: Spiking Neural Networks for Lane Detection on the Loihi Neuromorphic Processor |
|
Viale, Alberto | TU Wien |
Marchisio, Alberto | TU Wien |
Martina, Maurizio | Politecnico Di Torino |
Masera, Guido | Politecnico Di Torino |
Shafique, Muhammad | New York University Abu Dhabi |
Keywords: Bioinspired Robot Learning, Embedded Systems for Robotic and Automation, Autonomous Vehicle Navigation
Abstract: Autonomous Driving (AD) related features represent important elements for the next generation of mobile robots and autonomous vehicles focused on increasingly intelligent, autonomous, and interconnected systems. The applications involving the use of these features must provide, by definition, real-time decisions, and this property is key to avoid catastrophic accidents. Moreover, all the decision processes must require low power consumption, to increase the lifetime and autonomy of battery-driven systems. These challenges can be addressed through efficient implementations of Spiking Neural Networks (SNNs) on Neuromorphic Chips and the use of event-based cameras instead of traditional frame-based cameras. In this paper, we present a new SNN-based approach, called LaneSNN, for detecting the lanes marked on the streets using the event-based camera input. We develop four novel SNN models characterized by low complexity and fast response, and train them using an offline supervised learning rule. Afterward, we implement and map the learned SNN models onto the Intel Loihi Neuromorphic Research Chip. For the loss function, we develop a novel method based on the linear composition of Weighted binary Cross Entropy (WCE) and Mean Squared Error (MSE) measures. Our experimental results show a maximum Intersection over Union (IoU) measure of about 0.62 and very low power consumption of about 1 W. The best IoU is achieved with an SNN implementation that occupies only 36 neurocores on the Loihi processor while providing a low latency of less than 8 ms to recognize an image, thereby enabling real-time performance. The IoU measures provided by our networks are comparable with the state-of-the-art, but at a much lower power consumption of 1 W.
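The loss described above, a linear composition of weighted binary cross-entropy and mean squared error, can be sketched as below; the weights, pos_weight value, and tensor shapes are illustrative assumptions rather than the paper's tuned values.
```python
# Hedged sketch of a WCE + MSE lane-segmentation loss.
import torch
import torch.nn.functional as F

def lane_loss(logits, target, pos_weight=5.0, w_ce=0.7, w_mse=0.3):
    """logits, target: (batch, H, W); target is a binary lane mask."""
    wce = F.binary_cross_entropy_with_logits(
        logits, target, pos_weight=torch.tensor(pos_weight))
    mse = F.mse_loss(torch.sigmoid(logits), target)
    return w_ce * wce + w_mse * mse
```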
|
|
10:50-11:00, Paper MoA-2.6 | |
Striving for Less: Minimally-Supervised Pseudo-Label Generation for Monocular Road Segmentation |
|
Robinet, François | University of Luxembourg |
Akl, Yussef | University of Luxembourg |
Ullah, Kaleem | Saarland University |
Nozarian, Farzad | DFKI |
Müller, Christian | DFKI |
Frank, Raphael | University of Luxembourg |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, AI-Based Methods
Abstract: Identifying traversable space is one of the most important problems in autonomous robot navigation and is primarily tackled using learning-based methods. To alleviate the prohibitively high annotation cost associated with labeling large and diverse datasets, research has recently shifted from traditional supervised methods to focus on unsupervised and semi-supervised approaches. This work focuses on monocular road segmentation and proposes a practical, generic, and minimally-supervised approach based on task-specific feature extraction and pseudo-labeling. Building on recent advances in monocular depth estimation models, we process approximate dense depth maps to estimate pixel-wise road-plane distance maps. These maps are then used in both unsupervised and semi-supervised road segmentation scenarios. In the unsupervised case, we propose a pseudo-labeling pipeline that reaches state-of-the-art Intersection-over-Union (IoU), while reducing complexity and computations compared to existing approaches. We also investigate a semi-supervised extension to our method and find that even minimal labeling efforts can greatly improve results. Our semi-supervised experiments, using as little as 1% and 10% of the ground-truth data, yield models scoring 0.9063 and 0.9332 on the IoU metric, respectively. These results correspond to a comparative performance of 95.9% and 98.7% of a fully-supervised model’s IoU score, which motivates a pragmatic approach to labeling.
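To illustrate the pseudo-labeling idea in the abstract above (back-projecting dense depth, fitting a road plane, and thresholding the point-to-plane distance), here is a hedged numpy sketch. The camera intrinsics, the bottom-of-image seed region, and the distance threshold are illustrative assumptions, not the paper's pipeline.
```python
# Hedged sketch: road pseudo-labels from a dense depth map.
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)                 # (H, W, 3)

def road_pseudo_labels(depth, fx, fy, cx, cy, dist_thresh=0.15):
    pts = backproject(depth, fx, fy, cx, cy)
    seed = pts[int(0.8 * depth.shape[0]):].reshape(-1, 3)   # bottom 20% assumed road
    # Least-squares plane fit via SVD on centered seed points.
    centroid = seed.mean(axis=0)
    _, _, vt = np.linalg.svd(seed - centroid, full_matrices=False)
    normal = vt[-1]
    d = -normal @ centroid
    dist = np.abs(pts.reshape(-1, 3) @ normal + d).reshape(depth.shape)
    return (dist < dist_thresh).astype(np.uint8)            # 1 = road, 0 = not road
```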
|
|
11:00-11:10, Paper MoA-2.7 | |
Learning Sequential Descriptors for Sequence-Based Visual Place Recognition |
|
Mereu, Riccardo | Politecnico Di Torino |
Trivigno, Gabriele | Polytechnic of Turin |
Berton, Gabriele | Politecnico Di Torino |
Masone, Carlo | Politecnico Di Torino |
Caputo, Barbara | Sapienza University |
Keywords: Deep Learning for Visual Perception, Localization, Visual Learning
Abstract: In robotics, visual place recognition (VPR) is a continuous process that receives as input a video stream to produce a hypothesis of the robot's current position within a map of known places. This work proposes a taxonomy of the architectures used to learn sequential descriptors for VPR, highlighting different mechanisms to fuse the information from the individual images. This categorization is supported by a complete benchmark of experimental results that provides evidence of the strengths and weaknesses of these different architectural choices. The analysis is not limited to existing sequential descriptors, but we extend it further to investigate the viability of Transformers instead of CNN backbones. We further propose a new ad-hoc sequence-level aggregator called SeqVLAD, which outperforms prior state of the art on different datasets. The code is available at https://github.com/vandal-vpr/vg-transformers
|
|
11:10-11:20, Paper MoA-2.8 | |
DCPCR: Deep Compressed Point Cloud Registration in Large-Scale Outdoor Environments |
|
Wiesmann, Louis | University of Bonn |
Guadagnino, Tiziano | Sapienza University of Rome |
Vizzo, Ignacio | University of Bonn |
Grisetti, Giorgio | Sapienza University of Rome |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: Deep Learning Methods, Localization, SLAM
Abstract: Reliable and accurate registration of point clouds is a challenging problem in robotics as well as in the domain of autonomous driving. In this paper, we address the task of aligning point clouds with low overlap, containing moving objects, and without prior information about the initial guess. We enhance classical ICP-based registration with neural feature-based matching to reliably find point correspondences. Our novel 3D convolutional and attention-based network is trained in an end-to-end fashion to learn features, which are well suited for matching and to rate the quality of the point correspondences. By utilizing a compression encoder, we can directly operate on a compressed map representation, making our approach well suited for operation under memory constraints. We evaluate our approach on point clouds obtained at completely different points in time, showing that our approach is able to reliably register point clouds even under those challenging conditions. The implementation of our approach and the preprocessed data can be accessed at https://github.com/PRBonn/DCPCR.
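For context, the snippet below is a minimal point-to-point ICP loop, the classical registration that the approach above augments with learned feature matching and a compressed map representation. It is a generic baseline sketch, not the DCPCR implementation.
```python
# Hedged sketch: classical point-to-point ICP with SVD alignment.
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares R, t aligning src to dst (Kabsch/SVD)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(source, target, iters=30):
    tree = cKDTree(target)
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(iters):
        _, idx = tree.query(src)                  # nearest-neighbor correspondences
        R, t = best_rigid_transform(src, target[idx])
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```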
|
|
11:20-11:30, Paper MoA-2.9 | |
Deep Koopman Operator with Control for Nonlinear Systems |
|
Shi, Haojie | Chinese University of Hong Kong |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Keywords: Deep Learning Methods, Model Learning for Control, Machine Learning for Robot Control
Abstract: Recently, the Koopman operator has become a promising data-driven tool to facilitate real-time control for unknown nonlinear systems. It maps nonlinear systems into equivalent linear systems in embedding space, ready for real-time linear control methods. However, designing an appropriate Koopman embedding function remains a challenging task. Furthermore, most Koopman-based algorithms only consider nonlinear systems with linear control input, resulting in poor prediction and control performance when the system is fully nonlinear with the control input. In this work, we propose an end-to-end deep learning framework to learn the Koopman embedding function and Koopman Operator together to alleviate such difficulties. We first parameterize the embedding function and Koopman Operator with a neural network and train them end-to-end with a K-step loss function. We then design an auxiliary control network to encode the nonlinear state-dependent control term to model the nonlinearity in control input. For linear control, this encoded term is considered the new control variable instead, ensuring the linearity of the embedding space. Then we deploy a Linear Quadratic Regulator (LQR) on the linear embedding space to derive the optimal control policy and decode the actual control input from the control net. Experimental results demonstrate that our approach outperforms other existing methods, reducing the prediction error by an order of magnitude and achieving superior control performance in several nonlinear dynamic systems such as a damped pendulum, CartPole, and a 7-DoF robotic manipulator.
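A hedged sketch of such a deep Koopman model is shown below: an embedding network g(x), linear matrices A and B in embedding space, a state-dependent control encoder, and a K-step prediction loss. Layer sizes, dimensions, and the teacher-forced rollout are illustrative assumptions, not the paper's architecture.
```python
# Hedged sketch of a deep Koopman operator with an auxiliary control encoder.
import torch
import torch.nn as nn

class DeepKoopman(nn.Module):
    def __init__(self, x_dim=4, u_dim=1, z_dim=16, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, z_dim))
        self.A = nn.Linear(z_dim, z_dim, bias=False)
        self.B = nn.Linear(u_dim, z_dim, bias=False)
        # Encodes the nonlinear, state-dependent control term; its output is
        # treated as the new (linear) control variable in embedding space.
        self.control_enc = nn.Sequential(
            nn.Linear(x_dim + u_dim, hidden), nn.ReLU(), nn.Linear(hidden, u_dim))

    def step(self, z, x, u):
        return self.A(z) + self.B(self.control_enc(torch.cat([x, u], dim=-1)))

def k_step_loss(model, xs, us, K=10):
    """xs: (batch, K+1, x_dim), us: (batch, K, u_dim)."""
    z = model.encoder(xs[:, 0])
    loss = 0.0
    for k in range(K):
        z = model.step(z, xs[:, k], us[:, k])
        loss = loss + torch.mean((z - model.encoder(xs[:, k + 1])) ** 2)
    return loss / K
```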
|
|
MoA-3 |
Rm3 (Room B-2) |
Service Robotics |
Regular session |
Chair: Wada, Kazuyoshi | Tokyo Metropolitan University |
Co-Chair: Song, Sichao | CyberAgent Inc |
|
10:00-10:10, Paper MoA-3.1 | |
An Autonomous Descending-Stair Cleaning Robot with RGB-D Based Detection, Approaching, and Area Coverage Process |
|
Prabakaran, Veerajagadheswar | Singapore University of Technology and Design |
Le, Anh Vu | Optoelectronics Research Group Faculty of Electricals and Electr |
Kyaw, Phone Thiha | Yangon Technological University |
Mohan, Rajesh Elara | Singapore University of Technology and Design |
Aung Paing, Aung | Yangon Technological University |
Keywords: Domestic Robotics, Service Robotics, Robotics and Automation in Construction
Abstract: Cleaning robots are one of the market dominators in the commercialized robot space. So far, numerous robots have been introduced that can perform cleaning tasks in various settings, including floors, pavements, pools, lawns, windows, etc. However, none of the existing commercial cleaning robots targets the staircase, commonly found in multi-story buildings. Even though a few works in the literature introduced robotic solutions for staircase cleaning, they primarily focused on cleaning ascending staircases and offered only limited means of accessing descending staircases. In this paper, we propose a novel autonomous reconfigurable robotic platform called sTetro-D that can autonomously detect the descending staircase, approach the step, and perform area coverage in an unknown environment. The developed autonomy framework consists of two modes: search mode and clean mode. In search mode, we implemented an RGB-D camera-based fusion technique wherein we combined the image bounding box from a DCNN (Deep Convolutional Neural Network) with the depth information to find the 3D pose of the first step, which assists the robot in approaching it precisely. After the successful stair approach, the cleaning mode enables the staircase area coverage process. We describe all these aspects and conclude with an experimental analysis of the proposed robotic system in a real-world scenario. The results demonstrate that the robot performs well in descending-staircase detection, staircase approach, and area coverage.
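The bounding-box and depth fusion mentioned above boils down to back-projecting a pixel with known depth through the pinhole camera model. A hedged sketch, with assumed intrinsics and box format, is shown below; the actual system estimates a full 3D pose of the first step, not just a point.
```python
# Hedged sketch: localize the detected step from a bounding box and depth.
import numpy as np

def first_step_position(bbox, depth_image, fx, fy, cx, cy):
    """bbox = (u_min, v_min, u_max, v_max) from the DCNN detector."""
    u = int(0.5 * (bbox[0] + bbox[2]))
    v = int(0.5 * (bbox[1] + bbox[3]))
    z = float(depth_image[v, u])                 # depth at the box center (meters)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])                   # 3D point in the camera frame
```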
|
|
10:10-10:20, Paper MoA-3.2 | |
Non-Parametric Modeling of Spatio-Temporal Human Activity Based on Mobile Robot Observations |
|
Stuede, Marvin | Leibniz University Hannover |
Schappler, Moritz | Institute of Mechatronic Systems, Leibniz Universitaet Hannover |
Keywords: Service Robotics, Modeling and Simulating Humans, Probabilistic Inference
Abstract: This work presents a non-parametric spatio-temporal model for mapping human activity by mobile autonomous robots in a long-term context. Based on Variational Gaussian Process Regression, the model incorporates prior information of spatial and temporal-periodic dependencies to create a continuous representation of human occurrences. The inhomogeneous data distribution resulting from movements of the robot is included in the model via a heteroscedastic likelihood function and can be accounted for as predictive uncertainty. Using a sparse formulation, data sets over multiple weeks and several hundred square meters can be used for model building. The experimental evaluation, based on multi-week data sets, demonstrates that the proposed approach outperforms the state of the art both in terms of predictive quality and subsequent path planning.
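As a small, hedged illustration of the temporal-periodic modeling idea, the sketch below fits a Gaussian process with a daily-periodic kernel to toy occupancy counts. The full method uses sparse variational GP regression with a heteroscedastic likelihood over space and time, which this example does not reproduce; the data and kernel hyperparameters are purely illustrative.
```python
# Hedged sketch: periodic GP over time of day for human occurrence rates.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ExpSineSquared, RBF, WhiteKernel

# Illustrative data: hour of observation vs. number of people detected.
hours = np.array([[8.0], [9.0], [12.0], [13.0], [18.0], [19.0], [23.0]])
counts = np.array([2.0, 5.0, 9.0, 8.0, 6.0, 4.0, 0.0])

kernel = (ExpSineSquared(length_scale=2.0, periodicity=24.0)   # daily cycle
          * RBF(length_scale=5.0)                              # slow drift
          + WhiteKernel(noise_level=1.0))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(hours, counts)

query = np.linspace(0, 24, 49).reshape(-1, 1)
mean, std = gp.predict(query, return_std=True)   # predictive mean and uncertainty
```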
|
|
10:20-10:30, Paper MoA-3.3 | |
Service Robots in a Bakery Shop: A Field Study |
|
Song, Sichao | CyberAgent Inc |
Baba, Jun | CyberAgent, Inc |
Nakanishi, Junya | Osaka Univ |
Yoshikawa, Yuichiro | Osaka University |
Ishiguro, Hiroshi | Osaka University |
Keywords: Service Robotics, Social HRI
Abstract: In this paper, we report on a field study in which we employed two service robots in a bakery store as a sales promotion. Previous studies have explored applications of service robots in public spaces such as shopping malls. However, more evidence is needed that service robots can contribute to sales in real stores. Moreover, the behaviors of customers and service robots in the context of sales promotions have not been examined well. Hence, the types of robot behavior that can be considered effective and the customers’ responses to these robots remain unclear. To address these issues, we installed two tele-operated service robots in a bakery store for nearly 2 weeks, one at the entrance as a greeter and the other one inside the store to recommend products. The results show a dramatic increase in sales during the days when the robots were deployed. Furthermore, we annotated the video recordings of both the robots’ and customers' behavior. We found that although the robot placed at the entrance successfully attracted the interest of the passersby, no apparent increase in the number of customers visiting the store was observed. However, we confirmed that the recommendations of the robot operating inside the store did have a positive impact. We discuss our findings in detail and provide both theoretical and practical recommendations for future research and applications.
|
|
10:30-10:40, Paper MoA-3.4 | |
Shared Autonomy for Safety between a Self-Reconfigurable Robot and a Teleoperator Using Multi-Layer Fuzzy Logic |
|
Garcia Azcarate, Raul Fernando | SUTD |
Sanchez Cruz, Daniela | SUTD |
Hayat, Abdullah Aamir | Singapore University of Technology and Design |
Lim, Yi | Singapore University of Technology and Design |
Muthugala Arachchige, Viraj Jagathpriya Muthugala | Singapore University of Technology and Design |
Qinrui, Tang | SUTD |
Palanisamy, Povendhan | Singapore University of Technology and Design |
Leong, Kristor Leong Jie Kai | Singapore University of Technology and Design |
Elara, Mohan Rajesh | Singapore University of Technology and Design |
Keywords: Service Robotics, Safety in HRI, Robot Safety
Abstract: Autonomous vehicles are designed to elevate the efficiency of assigned tasks and ensure the safety of the environment in which they operate. This paper presents a research study focused on shared autonomy using a multi-layer fuzzy logic framework to build a relationship between an autonomous self-reconfigurable robot and a human user: control is switched to the teleoperator to assist the robot when it faces challenging scenarios, while keeping good performance and maintaining a safe environment. A novel multi-layer fuzzy logic decision process with shared autonomy for a safety framework is proposed. It evaluates safety based on the robot's multi-sensor inputs, the teleoperator's attention level, and the configuration state of the self-reconfigurable robot, and it switches the operation mode, robot speed gain, and configuration state to maintain performance and safety without compromise. The experimental outcome successfully demonstrates the self-reconfigurable robot’s capability to navigate safely using shared autonomy in real-world pavement scenarios with the proposed algorithm during autonomous navigation.
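For readers unfamiliar with fuzzy inference, the hedged sketch below shows one toy layer: shoulder-shaped membership functions over obstacle distance and operator attention, a few hand-written rules combined with min as fuzzy AND, and weighted-average defuzzification into a speed gain. The membership shapes, rules, and gains are illustrative assumptions, not the paper's rule base.
```python
# Hedged sketch of a single fuzzy-logic layer producing a robot speed gain.
import numpy as np

def ramp_down(x, a, b):
    """Membership that is 1 below a and falls to 0 at b (e.g., 'near', 'low')."""
    return float(np.clip((b - x) / (b - a), 0.0, 1.0))

def ramp_up(x, a, b):
    """Membership that is 0 below a and rises to 1 at b (e.g., 'far', 'high')."""
    return float(np.clip((x - a) / (b - a), 0.0, 1.0))

def speed_gain(obstacle_dist_m, attention_level):
    near, far = ramp_down(obstacle_dist_m, 0.5, 1.5), ramp_up(obstacle_dist_m, 0.5, 3.0)
    low_att, high_att = ramp_down(attention_level, 0.3, 0.7), ramp_up(attention_level, 0.3, 0.7)
    # Rule firing strengths (min as fuzzy AND) paired with output speed gains.
    rules = [
        (min(near, low_att),  0.0),   # close obstacle, inattentive operator -> stop
        (min(near, high_att), 0.3),   # close obstacle, attentive operator  -> creep
        (min(far,  low_att),  0.5),
        (min(far,  high_att), 1.0),   # clear path, attentive operator      -> full speed
    ]
    total = sum(strength for strength, _ in rules)
    return sum(strength * gain for strength, gain in rules) / (total + 1e-9)

print(speed_gain(obstacle_dist_m=0.8, attention_level=0.9))
```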
|
|
10:40-10:50, Paper MoA-3.5 | |
Pedestrian-Robot Interactions on Autonomous Crowd Navigation: Reactive Control Methods and Evaluation Metrics |
|
Paez-Granados, Diego | ETH Zurich |
He, Yujie | EPFL |
Gonon, David Julian | École Polytechnique Fédérale De Lausanne |
Jia, Dan | RWTH Aachen |
Leibe, Bastian | RWTH Aachen University |
Suzuki, Kenji | University of Tsukuba |
Billard, Aude | EPFL |
Keywords: Service Robotics, Reactive and Sensor-Based Planning, Autonomous Vehicle Navigation
Abstract: Autonomous navigation in highly populated areas remains a challenging task for robots because of the difficulty in guaranteeing safe interactions with pedestrians in unstructured situations. In this work, we present a crowd navigation control framework that delivers continuous obstacle avoidance and post-contact control, evaluated on an autonomous personal mobility vehicle. We propose evaluation metrics that account for efficiency, controller response, and crowd interactions in natural crowds. We report the results of over 110 trials in different crowd types: sparse, flows, and mixed traffic, with low- (< 0.15 ppsm), mid- (< 0.65 ppsm), and high- (< 1 ppsm) pedestrian densities. We present comparative results between two low-level obstacle avoidance methods and a baseline of shared control. Results show a 10% drop in relative time to goal on the highest-density tests, with no decrease in the other efficiency metrics. Moreover, autonomous navigation proved comparable to shared-control navigation, with lower relative jerk and significantly higher command fluency, indicating high compatibility with the crowd. We conclude that the reactive controller fulfils a necessary task of fast and continuous adaptation to crowd navigation, and it should be coupled with high-level planners for environmental and situational awareness.
|
|
10:50-11:00, Paper MoA-3.6 | |
Design of a Reconfigurable Robot with Size-Adaptive Path Planner |
|
Samarakoon Mudiyanselage, Bhagya Prasangi Samarakoon | Singapore University of Technology and Design |
Muthugala Arachchige, Viraj Jagathpriya Muthugala | Singapore University of Technology and Design |
Kalimuthu, Manivannan | Singapore University of Technology and Design |
Chandrasekaran, Sathis Kumar | Singapore University of Technology and Design |
Elara, Mohan Rajesh | Singapore University of Technology and Design |
Keywords: Service Robotics, Product Design, Development and Prototyping, Motion and Path Planning
Abstract: Area coverage is demanded from robots utilized in application domains such as floor cleaning. Even though many advanced coverage algorithms have been developed, the area coverage performance is limited due to the inaccessibility of narrow spaces caused by physical constraints. Reconfigurable robots have been introduced to overcome this limitation, where reconfigurability could help in accessing narrow spaces. Nevertheless, the state-of-the-art reconfigurable robots are not capable of changing the morphology size and shape as a single entity. Therefore, this paper proposes a novel design of a reconfigurable robot with a size-adaptive coverage strategy. The reconfiguration mechanism is designed in such a way that the robot can independently expand or shrink its size along the principal planar axes, where the behavior allows the change of size and shape. The coverage strategy is based on boustrophedon motion and the A* algorithm modified for accessing narrow areas using the size adaptability. The design of the robot is detailed in the paper, including electro-mechanical aspects, design considerations, and the coverage path planning method. Experiments have been conducted using a prototype of the proposed design to analyze and evaluate the characteristics and the performance of the robot. The results show that the proposed robot design can improve the productivity of a floor cleaning robot in terms of area coverage and coverage time.
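The size-adaptive planning idea can be illustrated with plain grid A*, where a cell is traversable only if the current robot footprint fits around it; shrinking the footprint then opens narrow passages. The sketch below is a generic, hedged example with an assumed square footprint and Manhattan heuristic, not the paper's modified planner.
```python
# Hedged sketch: footprint-aware A* on an occupancy grid.
import heapq
import numpy as np

def traversable(grid, r, c, half_size):
    """Cell is free if the square footprint around it contains no obstacles."""
    r0, r1 = max(r - half_size, 0), min(r + half_size + 1, grid.shape[0])
    c0, c1 = max(c - half_size, 0), min(c + half_size + 1, grid.shape[1])
    return not grid[r0:r1, c0:c1].any()

def astar(grid, start, goal, half_size):
    open_set = [(0, start)]
    g, parent = {start: 0}, {}
    while open_set:
        _, cur = heapq.heappop(open_set)
        if cur == goal:
            path = [cur]
            while cur in parent:
                cur = parent[cur]
                path.append(cur)
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if not (0 <= nxt[0] < grid.shape[0] and 0 <= nxt[1] < grid.shape[1]):
                continue
            if not traversable(grid, nxt[0], nxt[1], half_size):
                continue
            if g[cur] + 1 < g.get(nxt, np.inf):
                g[nxt] = g[cur] + 1
                parent[nxt] = cur
                h = abs(goal[0] - nxt[0]) + abs(goal[1] - nxt[1])   # Manhattan heuristic
                heapq.heappush(open_set, (g[nxt] + h, nxt))
    return None   # no path for this footprint; a smaller footprint may succeed

grid = np.zeros((10, 10), dtype=bool)
grid[3:7, 5] = True                       # wall segment
print(astar(grid, (0, 0), (9, 9), half_size=1))
```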
|
|
11:00-11:10, Paper MoA-3.7 | |
Testing Service Robots in the Field: An Experience Report |
|
Ortega, Argentina | Hochschule Bonn-Rhein-Sieg |
Hochgeschwender, Nico | Bonn-Rhein-Sieg University |
Berger, Thorsten | Chalmers University of Technology / University of Gothenburg |
Keywords: Engineering for Robotic Systems, Performance Evaluation and Benchmarking
Abstract: Service robots are mobile autonomous robots, often operating in uncertain and difficult environments. While being increasingly popular, engineering service robots is challenging. Especially, evolving them from prototype to deployable product requires effective validation and verification, assuring the robot's correct and safe operation in the target environment. While testing is the most common validation and verification technique used in practice, surprisingly little is known about the actual testing practices and technologies used in the service robotics domain. We present an experience report on field testing of an industrial-strength service robot, as it transitions from lab experiments to an operational environment. We report challenges and solutions, and reflect on their effectiveness. Our long-term goal is to establish empirically-validated testing techniques for service robots. This experience report constitutes a necessary, but self-contained first step, exploring field testing practices in detail. Our data sources are detailed test artifacts and developer interviews. We model the field testing process and describe test-case design practices. We discuss experiences from performing these field tests over a 10-month test campaign.
|
|
11:10-11:20, Paper MoA-3.8 | |
Approximate Task Tree Retrieval in a Knowledge Network for Robotic Cooking |
|
Sakib, Md Sadman | University of South Florida |
Paulius Ramos, David | Brown University |
Sun, Yu | University of South Florida |
Keywords: Service Robotics, Task Planning, Planning under Uncertainty
Abstract: Flexible task planning continues to pose a difficult challenge for robots, where a robot is unable to creatively adapt its task plans to new or unseen problems, mainly due to the limited knowledge it may have about its actions and world. Because of this, robots typically cannot exploit knowledge or concepts in a way that mimics human creativity. Motivated by our ability as humans to adapt, we explore how task plans from a knowledge graph, known as the Functional Object-Oriented Network (FOON), can be generated for novel problems requiring concepts that are not readily available in the robot's knowledge base. Knowledge from 140 cooking recipes is structured in a FOON knowledge graph, which is used for acquiring task plan sequences known as task trees. Task trees can be modified to replicate recipes in a FOON knowledge graph format, which can be useful for expanding FOON with new recipes containing unknown object and state combinations, by relying upon semantic similarity. We demonstrate the power of task tree generation to create task trees with never-before-seen ingredient and state combinations as seen in recipes from the Recipe1M+ dataset, with which we evaluate the quality of the trees based on how accurately they depict newly added ingredients. Our experimental results show that our framework is able to provide task sequences with 76% accuracy.
|
|
11:20-11:30, Paper MoA-3.9 | |
Robotic Depowdering for Additive Manufacturing Via Pose Tracking |
|
Liu, Zhenwei | Carnegie Mellon University |
Geng, Junyi | Carnegie Mellon University |
Dai, Xikai | Carnegie Mellon University |
Swierzewski, Tomasz | Carnegie Mellon |
Shimada, Kenji | Carnegie Mellon University |
Keywords: Service Robotics, Industrial Robots, Computer Vision for Manufacturing
Abstract: With the rapid development of powder-based additive manufacturing, depowdering, a process of removing unfused powder that covers 3D-printed parts, has become a major bottleneck to further improve its productiveness. Traditional manual depowdering is extremely time-consuming and costly, and some prior automated systems either require pre-depowdering or lack adaptability to different 3D-printed parts. To solve these problems, we introduce a robotic system that automatically removes unfused powder from the surface of 3D-printed parts. The key component is a visual perception system, which consists of a pose-tracking module that tracks the 6D pose of powder-occluded parts in real-time, and a progress estimation module that estimates the depowdering completion percentage. The tracking module can be run efficiently on a laptop CPU at up to 60 FPS. Experiments show that our system can remove unfused powder from the surface of various 3D-printed parts without causing any damage. To the best of our knowledge, this is one of the first vision-based depowdering systems that adapt to parts with various shapes without the need for pre-depowdering.
|
|
MoA-4 |
Rm4 (Room C-1) |
Manipulation Systems 1 |
Regular session |
Chair: Tanaka, Kazutoshi | OMRON SINIC X Corporation |
Co-Chair: Keipour, Azarakhsh | Amazon |
|
10:00-10:10, Paper MoA-4.1 | |
A Hierarchical Framework for Long Horizon Planning of Object-Contact Trajectories |
|
Aceituno, Bernardo | Massachusetts Institute of Technology (MIT) |
Rodriguez, Alberto | Massachusetts Institute of Technology |
Keywords: Manipulation Planning, Dexterous Manipulation, Optimization and Optimal Control
Abstract: Given an object, an environment, and a goal pose, how should a robot make contact to move it? Solving this problem requires reasoning about rigid-body dynamics, object and environment geometries, and hybrid contact mechanics. This paper proposes a hierarchical framework that solves this problem in 2D worlds, with polygonal objects and point fingers. To achieve this, we decouple the problem into three stages: 1) a high-level graph search over regions of free space, 2) a medium-level randomized motion planner for the object motion, and 3) a low-level contact-trajectory optimization for the robot and environment contacts. In contrast to the state of the art, this approach does not rely on handcrafted primitives and can still be solved efficiently. This algorithm does not require seeding and can be applied to complex object shapes and environments. We validate this framework with extensive simulated experiments showcasing long-horizon and contact-rich interactions. We demonstrate how our algorithm can reliably solve complex planar manipulation problems on the order of seconds.
|
|
10:10-10:20, Paper MoA-4.2 | |
Constraint-Based Task Specification and Trajectory Optimization for Sequential Manipulation |
|
Phoon, Mun Seng | Technical University of Munich |
Schmitt, Philipp Sebastian | Siemens Corporate Technology |
v. Wichert, Georg | Siemens AG |
Keywords: Manipulation Planning, Motion and Path Planning
Abstract: To economically deploy robotic manipulators, the programming and execution of robot motions must be swift. To this end, we propose a novel, constraint-based method to intuitively specify sequential manipulation tasks and to compute time-optimal robot motions for such a task specification. Our approach follows the ideas of constraint-based task specification by aiming for a minimal and object-centric task description that is largely independent of the underlying robot kinematics. We transform this task description into a non-linear optimization problem. By solving this problem, we obtain a (locally) time-optimal robot motion, not just for a single motion, but for an entire manipulation sequence. We demonstrate the capabilities of our approach in a series of experiments involving five distinct robot models, including a highly redundant mobile manipulator.
|
|
10:20-10:30, Paper MoA-4.3 | |
Quasistatic Contact-Rich Manipulation Via Linear Complementarity Quadratic Programming |
|
Katayama, Sotaro | Kyoto University |
Taniai, Tatsunori | OMRON SINIC X Corporation |
Tanaka, Kazutoshi | OMRON SINIC X Corporation |
Keywords: Manipulation Planning, Motion and Path Planning, Dexterous Manipulation
Abstract: Contact-rich manipulation is challenging due to dynamically changing physical constraints caused by the contact mode changes undergone during manipulation. This paper proposes a versatile local planning and control framework for contact-rich manipulation that determines the continuous control action under variable contact modes online. We model the physical characteristics of contact-rich manipulation by quasistatic dynamics and complementarity constraints. We then propose a linear complementarity quadratic program (LCQP) to efficiently determine the control action that implicitly includes the decisions on the contact modes under these constraints. In the LCQP, we relax the complementarity constraints to alleviate ill-conditioned problems that are typically caused by measurement noise or model mismatches. We conduct dynamic simulations in a 3D physics simulator and demonstrate that the proposed method can achieve various contact-rich manipulation tasks by determining the control action, including the contact modes, in real time.
|
|
10:30-10:40, Paper MoA-4.4 | |
Efficient Spatial Representation and Routing of Deformable One-Dimensional Objects for Manipulation |
|
Keipour, Azarakhsh | Amazon |
Bandari, Maryam | X |
Schaal, Stefan | Google X |
Keywords: Manipulation Planning, Task Planning
Abstract: With the field of rigid-body robotics having matured in the last fifty years, routing, planning, and manipulation of deformable objects have recently emerged as a more untouched research area in many fields ranging from surgical robotics to industrial assembly and construction. Routing approaches for deformable objects which rely on learned implicit spatial representations (e.g., Learning-from-Demonstration methods) make them vulnerable to changes in the environment and the specific setup. On the other hand, algorithms that entirely separate the spatial representation of the deformable object from the routing and manipulation, often using a representation approach independent of planning, result in slow planning in high dimensional space. This paper proposes a novel approach to routing deformable one-dimensional objects (e.g., wires, cables, ropes, sutures, threads). This approach utilizes a compact representation for the object, allowing efficient and fast online routing. The spatial representation is based on the geometrical decomposition of the space into convex subspaces, resulting in a discrete coding of the deformable object configuration as a sequence. With such a configuration, the routing problem can be solved using a fast dynamic programming sequence matching method that calculates the next routing move. The proposed method couples the routing and efficient configuration for improved planning time. Our simulation and real experiments show the method correctly computing the next manipulation action in sub-millisecond time and accomplishing various routing and manipulation tasks.
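The routing step described above matches the object's current subspace-sequence code against a goal code with dynamic programming. As a loose, hedged illustration of such sub-millisecond sequence matching, the sketch below uses a generic Levenshtein alignment over short symbol sequences and a trivial first-mismatch helper; it is not the paper's specific DP formulation.
```python
# Hedged sketch: dynamic-programming alignment of subspace-ID sequences.
def edit_alignment(current, goal):
    n, m = len(current), len(goal)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if current[i - 1] == goal[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # remove the object from a subspace
                           dp[i][j - 1] + 1,        # insert it into a new subspace
                           dp[i - 1][j - 1] + cost) # keep / substitute
    return dp

def first_mismatch(current, goal):
    """Return the first position where the two subspace codes disagree."""
    for k, (a, b) in enumerate(zip(current, goal)):
        if a != b:
            return k, a, b
    return None

dp = edit_alignment(["A", "B", "C"], ["A", "D", "C", "E"])
print(dp[-1][-1], first_mismatch(["A", "B", "C"], ["A", "D", "C", "E"]))
```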
|
|
10:40-10:50, Paper MoA-4.5 | |
Learning and Generalizing Cooperative Manipulation Skills Using Parametric Dynamic Movement Primitives (I) |
|
Kim, Hyoin | Seoul National University |
Oh, Changsuk | Seoul National University |
Jang, Inkyu | Seoul National University |
Park, Sungyong | Seoul National University |
Seo, Hoseong | Seoul National University |
Kim, H. Jin | Seoul National University |
Keywords: Learning from Demonstration, Manipulation Planning, Imitation Learning
Abstract: This paper presents an approach that generates the overall trajectory of mobile manipulators for a complex mission consisting of several sub-tasks. Parametric dynamic movement primitives (PDMPs) can quickly generalize the online motion of robot manipulation by learning from multiple demonstrations offline. However, regarding complex missions consisting of multiple sub-tasks, a large number of demonstrations are required for full generalization, which is impractical. In this paper, we propose a framework that reduces the number of demonstrations for a complex mission. In the proposed method, complex demonstrations are segmented into multiple unit motions representing sub-tasks, and one PDMP is formed per segment, resulting in multiple PDMPs. The phase decision process determines which sub-task and associated PDMP should be executed online, allowing multiple PDMPs to be autonomously configured within an integrated framework. In order to generalize the execution time and regional goal in each phase, Gaussian process regression (GPR) is applied. Simulation results from two different scenarios confirm that the proposed framework not only effectively reduces the number of demonstrations but also improves generalization performance. The actual experiments also demonstrate that the mobile manipulators effectively perform complex missions through the proposed framework.
|
|
10:50-11:00, Paper MoA-4.6 | |
A Solution to Slosh-Free Robot Trajectory Optimization |
|
Cabral Muchacho, Rafael Ignacio | Munich Institute of Robotics & Machine Intelligence, Technische |
Laha, Riddhiman | Technical University of Munich |
Figueredo, Luis Felipe Cruz | Technical University of Munich (TUM) |
Haddadin, Sami | Technical University of Munich |
Keywords: Manipulation Planning, Optimization and Optimal Control, Nonholonomic Motion Planning
Abstract: This paper is about fast slosh-free fluid transportation. Existing approaches are either computationally heavy or only suitable for specific robots and container shapes. We model the end effector as a point mass suspended by a spherical pendulum and study the requirements for slosh-free motion and the validity of the point mass model. In this approach, slosh-free trajectories are generated by controlling the pendulum’s pivot and simulating the motion of the point mass. We cast the trajectory optimization problem as a quadratic program—this strategy can be used to obtain valid control inputs. Through simulations and experiments on a 7 DoF Franka Emika Panda robot we validate the effectiveness of the proposed approach.
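A hedged illustration of the pendulum-based slosh-free condition: the liquid stays level in the container frame when the container (pendulum) axis is aligned with the total specific force, i.e., the commanded acceleration minus gravity. The toy function below returns that direction for a given end-effector acceleration; the paper additionally optimizes whole trajectories as a quadratic program, which is not reproduced here.
```python
# Hedged sketch: slosh-free container-axis direction from the pendulum model.
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])

def slosh_free_axis(ee_acceleration):
    """Unit vector the container (pendulum) axis should point along."""
    specific_force = np.asarray(ee_acceleration, dtype=float) - GRAVITY
    return specific_force / np.linalg.norm(specific_force)

# At rest the axis is vertical; braking at 3 m/s^2 tilts it against the motion.
print(slosh_free_axis([0.0, 0.0, 0.0]))     # -> [0, 0, 1]
print(slosh_free_axis([-3.0, 0.0, 0.0]))
```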
|
|
11:00-11:10, Paper MoA-4.7 | |
Uncertainty-Aware Manipulation Planning Using Gravity and Environment Geometry |
|
von Drigalski, Felix Wolf Hans Erich | Mujin Inc |
Kasaura, Kazumi | Omron Sinic X |
Beltran-Hernandez, Cristian Camilo | Osaka University |
Hamaya, Masashi | OMRON SINIC X Corporation |
Tanaka, Kazutoshi | OMRON SINIC X Corporation |
Matsubara, Takamitsu | Nara Institute of Science and Technology |
Keywords: Manipulation Planning, Planning under Uncertainty, Assembly
Abstract: Factory automation robot systems often depend on specially-made jigs that precisely position each part, which increases the system's cost and limits flexibility. We propose a method to determine the 3D pose of an object with high precision and confidence, using only parallel robotic grippers and no parts-specific jigs. Our method automatically generates a sequence of actions that ensures that the real-world position of the physical object matches the system's assumed pose to sub-mm precision. Furthermore, we propose the use of "extrinsic" actions, which use gravity, the environment and the gripper geometry to significantly reduce or even eliminate the uncertainty about an object's pose. We show in simulated and real-robot experiments that our method outperforms our previous work, at success rates over 95%. The source code will be made public at github.com/omron-sinicx.
|
|
11:10-11:20, Paper MoA-4.8 | |
Goal-Driven Robotic Pushing Using Tactile and Proprioceptive Feedback (I) |
|
Lloyd, John | University of Bristol |
Lepora, Nathan | University of Bristol |
Keywords: Force and Tactile Sensing, Dexterous Manipulation
Abstract: In robots, nonprehensile manipulation operations such as pushing are a useful way of moving large, heavy, or unwieldy objects, moving multiple objects at once, or reducing uncertainty in the location or pose of objects. In this study, we propose a reactive and adaptive method for robotic pushing that uses rich feedback from a high-resolution optical tactile sensor to control push movements instead of relying on analytical or data-driven models of push interactions. Specifically, we use goal-driven tactile exploration to actively search for stable pushing configurations that cause the object to maintain its pose relative to the pusher while incrementally moving the pusher and object toward the target. We evaluate our method by pushing objects across planar and curved surfaces. For planar surfaces, we show that the method is accurate and robust to variations in initial contact position/angle, object shape, and start position; for curved surfaces, the performance is degraded slightly. An immediate consequence of our work is that it shows that explicit models of push interactions might be sufficient but are not necessary for this type of task. It also raises the interesting question of which aspects of the system should be modeled to achieve the best performance and generalization across a wide range of scenarios. Finally, it highlights the importance of testing on nonplanar surfaces and in other more complex environments when developing new methods for robotic pushing.
|
|
11:20-11:30, Paper MoA-4.9 | |
Learning to Fold Real Garments with One Arm: A Case Study in Cloud-Based Robotics Research |
|
Hoque, Ryan | University of California, Berkeley |
Shivakumar, Kaushik | University of California Berkeley |
Aeron, Shrey | University of California, Berkeley |
Deza, Gabriel | University of California Berkeley |
Ganapathi, Aditya | University of California, Berkeley |
Wong, Adrian | Cornell University, Sandia National Labs |
Lee, Johnny | Google |
Zeng, Andy | Google |
Vanhoucke, Vincent | Google Research |
Goldberg, Ken | UC Berkeley |
Keywords: Deep Learning in Grasping and Manipulation, Performance Evaluation and Benchmarking, Imitation Learning
Abstract: Autonomous fabric manipulation is a longstanding challenge in robotics, but evaluating progress is difficult due to the cost and diversity of robot hardware. Using Reach, a cloud robotics platform that enables low-latency remote execution of control policies on physical robots, we present the first systematic benchmarking of fabric manipulation algorithms on physical hardware. We develop 4 novel learning-based algorithms that model expert actions, keypoints, reward functions, and dynamic motions, and we compare these against 4 learning-free and inverse dynamics algorithms on the task of folding a crumpled T-shirt with a single robot arm. The entire lifecycle of data collection, model training, and policy evaluation was performed remotely without physical access to the robot workcell. Results suggest a new algorithm combining imitation learning with analytic methods achieves human-level performance on the flattening task and 93% of human-level performance on the folding task. See https://sites.google.com/berkeley.edu/cloudfolding for all data, code, models, and supplemental material.
|
|
MoA-5 |
Rm5 (Room C-2) |
Computer Vision for Transportation |
Regular session |
Chair: Burgard, Wolfram | University of Freiburg |
Co-Chair: Triebel, Rudolph | German Aerospace Center (DLR) |
|
10:00-10:10, Paper MoA-5.1 | |
Multi-Sensor Data Annotation Using Sequence-Based Active Learning |
|
Denzler, Patrick | DB Netz AG |
Ziegler, Markus | DB Netz AG |
Jacobs, Arne | DB Netz AG |
Eiselein, Volker | DB Netz AG |
Neumaier, Philipp | DB Netz AG |
Koeppel, Martin | DB Netz AG |
Keywords: Computer Vision for Automation, Computer Vision for Transportation, Intelligent Transportation Systems
Abstract: Neural networks are the state-of-the-art technology for environmental perception in applications such as autonomous driving. However, they require a large amount of training data in order to perform well, making the selection and annotation of sensor data a time-consuming and expensive task. Active learning is a promising approach to reduce the required amount of training data by selecting samples for annotation that are expected to improve the neural network the most. In this work, we propose a sequence-based active learning approach that selects sequences of consecutive frames instead of individual images. This makes it possible to evaluate tracking algorithms and to reduce the annotation effort by interpolating labels between frames. Our approach is compared to a random sampling strategy as a baseline. Over 15 iterations, both approaches select 1000 additional images for training in each iteration. The performance of the neural network trained on the data selected by our sequence-based active learning approach is compared to the performance of the network trained on the data selected by the baseline approach. The results show that sequence-based active learning can reduce the required amount of training data by up to 25% while reaching similar performance. Furthermore, sequence-based active learning can improve the neural network's overall performance by 2% compared to a random sampling strategy. The proposed method was evaluated on a new dataset consisting of 15 scenes in railway environments. The dataset contains 45,888 frames in total, of which 14,513 frames contain persons at railway stations or close to tracks.
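As a rough illustration of the sequence-level selection idea described above — not the authors' implementation — the sketch below ranks fixed-length windows of consecutive frames by a per-frame informativeness score; the scoring function, window length, and names are assumptions.

```python
import numpy as np

def select_sequences(frame_scores, seq_len=20, budget=1000):
    """Pick whole sequences of consecutive frames with the highest mean
    acquisition score, instead of ranking individual frames.

    frame_scores: 1-D array of per-frame informativeness scores
                  (e.g., detector uncertainty), in temporal order.
    Returns the sorted start indices of the chosen sequences.
    """
    n_seqs = budget // seq_len
    starts = np.arange(0, len(frame_scores) - seq_len + 1, seq_len)
    seq_scores = np.array([frame_scores[s:s + seq_len].mean() for s in starts])
    best = starts[np.argsort(seq_scores)[::-1][:n_seqs]]
    return np.sort(best)

# Toy usage: 10,000 frames, select 1000 frames as 50 sequences of 20 frames.
scores = np.random.rand(10_000)
print(select_sequences(scores, seq_len=20, budget=1000))
```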
|
|
10:10-10:20, Paper MoA-5.2 | |
3D Single-Object Tracking with Spatial-Temporal Data Association |
|
Zhang, Yongchang | Institute of Automation, Chinese Academy of Sciences, Beijing, C |
Niu, Hanbing | University of Electronic Science and Technology of China |
Guo, Yue | Chinese Academy of Sciences |
He, Wenhao | University of Chinese Academy of Sciences |
Keywords: Computer Vision for Transportation, RGB-D Perception, Autonomous Vehicle Navigation
Abstract: This paper proposes a novel 3D single-object tracker that tracks objects more stably, accurately, and quickly, even if they are temporarily missed. Our idea is to utilize spatial-temporal data association to achieve robust object tracking, and the method consists of two main parts. First, we employ a temporal motion model across frames to estimate the object's temporal information and update the region of interest (ROI). The advanced detector only focuses on the ROI rather than the whole scene to generate the spatial position. Second, we introduce a new pairwise evaluation system to exploit spatial-temporal data association in point clouds. The proposed evaluation system considers detection confidence, orientation offset, and object distance to achieve more stable object matching. Then, we update the predicted state based on the pairwise spatial-temporal data. Finally, we utilize the previous trajectory to enhance the accuracy of static tracking in the refinement scheme. Experiments on the KITTI and nuScenes tracking datasets demonstrate that our method outperforms other state-of-the-art methods by a large margin (a 10% improvement and 280 FPS on a single NVIDIA 1080Ti GPU). Our tracker also compares favorably with multi-object tracking approaches.
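The pairwise evaluation can be pictured as a weighted score combining detection confidence, orientation offset, and centre distance between a new detection and the temporally predicted state. The form and weights below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pairwise_score(det, pred, w_conf=1.0, w_yaw=0.5, w_dist=1.0):
    """Score how well a new detection matches the temporally predicted state.

    det, pred: dicts with 'center' (3,), 'yaw' (rad); det also has 'score' in [0, 1].
    Higher is better; the best-scoring detection is associated with the track.
    """
    dist = np.linalg.norm(np.asarray(det["center"]) - np.asarray(pred["center"]))
    yaw_off = np.abs((det["yaw"] - pred["yaw"] + np.pi) % (2 * np.pi) - np.pi)
    return w_conf * det["score"] - w_yaw * yaw_off - w_dist * dist

det = {"center": [1.2, 0.1, 0.0], "yaw": 0.05, "score": 0.9}
pred = {"center": [1.0, 0.0, 0.0], "yaw": 0.0}
print(pairwise_score(det, pred))
```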
|
|
10:20-10:30, Paper MoA-5.3 | |
Instance-Aware Multi-Object Self-Supervision for Monocular Depth Prediction |
|
Boulahbal, Houssem Eddine | University Cote d'Azur, Renault Software Labs |
Voicila, Adrian | Renault Software Labs |
Comport, Andrew Ian | CNRS-I3S/UNS |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, RGB-D Perception
Abstract: This paper proposes a self-supervised monocular image-to-depth prediction framework that is trained with an end-to-end photometric loss that handles not only 6-DOF camera motion but also 6-DOF moving object instances. Self-supervision is performed by warping the images across a video sequence using depth and scene motion including object instances. One novelty of the proposed method is the use of the multi-head attention of the transformer network that matches moving objects across time and models their interaction and dynamics. This enables accurate and robust pose estimation for each object instance. Most image-to-depth prediction frameworks make the assumption of rigid scenes, which largely degrades their performance with respect to dynamic objects. Only a few SOTA papers have accounted for dynamic objects. The proposed method is shown to outperform these methods on standard benchmarks and the impact of the dynamic motion on these benchmarks is exposed. Furthermore, the proposed image-to-depth prediction framework is also shown to be competitive with SOTA video-to-depth prediction frameworks.
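For context, the rigid-scene photometric self-supervision that this style of framework builds on warps a target pixel into the source frame through the predicted depth and relative pose; the notation below is generic, not this paper's exact loss, and the paper additionally models per-instance 6-DOF object motion.

```latex
p_{s} \sim K \, T_{t \to s} \, D_t(p_t) \, K^{-1} p_t ,
\qquad
\mathcal{L}_{\text{photo}} = \sum_{p_t} \bigl\| I_t(p_t) - I_s(p_{s}) \bigr\|_1
```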
|
|
10:30-10:40, Paper MoA-5.4 | |
TransDARC: Transformer-Based Driver Activity Recognition with Latent Space Feature Calibration |
|
Peng, Kunyu | Karlsruhe Institute of Technology |
Roitberg, Alina | Karlsruhe Institute of Technology (KIT) |
Yang, Kailun | Karlsruhe Institute of Technology |
Zhang, Jiaming | Karlsruhe Institute of Technology |
Stiefelhagen, Rainer | Karlsruhe Institute of Technology |
Keywords: Computer Vision for Transportation, Recognition, Gesture, Posture and Facial Expressions
Abstract: Traditional video-based human activity recognition has experienced remarkable progress linked to the rise of deep learning, but this progress has been slower for the downstream task of driver behavior understanding. Understanding the situation inside the vehicle cabin is essential for Advanced Driver Assistance Systems (ADAS), as it enables identifying distraction, predicting the driver's intent, and leads to more convenient human-vehicle interaction. At the same time, driver observation systems face substantial obstacles, as they need to capture different granularities of driver states while the complexity of such secondary activities grows with rising automation and increased driver freedom. Furthermore, a model is rarely deployed under conditions identical to the ones in the training set, as sensor placements and types vary from vehicle to vehicle, constituting a substantial obstacle for real-life deployment of data-driven models. In this work, we present a novel vision-based framework for recognizing secondary driver behaviours based on visual transformers and an additional augmented feature distribution calibration module. This module operates in the latent feature space, enriching and diversifying the training set at the feature level in order to improve (1) generalization to novel data appearances (e.g., sensor changes) and (2) recognition of driver behaviours underrepresented in the training set. Our framework consistently leads to better recognition rates, surpassing previous state-of-the-art results on the public Drive&Act benchmark on all granularity levels. Our code will be made publicly available at https://github.com/KPeng9510/TransDARC.
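One common recipe for latent-space distribution calibration — borrowed from few-shot learning and given here only as a hedged sketch, not the paper's exact module — fits a Gaussian to the features of an under-represented class and samples synthetic features to enrich training.

```python
import numpy as np

def calibrate_and_sample(feats, n_new=64, shrink=1e-3):
    """Fit a Gaussian to the latent features of one (under-represented) class
    and sample synthetic features to augment the training set at feature level.

    feats: (N, D) real feature vectors of that class.
    """
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + shrink * np.eye(feats.shape[1])  # shrinkage keeps cov positive definite
    return np.random.multivariate_normal(mu, cov, size=n_new)

real = np.random.randn(20, 32)             # toy latent features
print(calibrate_and_sample(real).shape)    # (64, 32)
```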
|
|
10:40-10:50, Paper MoA-5.5 | |
Attention-Based Deep Driving Model for Autonomous Vehicles with Surround-View Cameras |
|
Zhao, Yang | University of Electronic Science and Technology of China |
Li, Jie | University of Electronic Science and Technology of China |
Huang, Rui | University of Electronic Science and Technology of China |
Li, Boqi | University of Michigan, Ann Arbor |
Luo, Ao | Megvii Technology |
Li, Yaochen | Xi'an Jiaotong University |
Cheng, Hong | University of Electronic Science and Technology |
Keywords: Computer Vision for Transportation, Autonomous Vehicle Navigation, Intelligent Transportation Systems
Abstract: Experienced human drivers make safe driving decisions by selectively observing the front, rear, and side-view mirrors. Several end-to-end methods have been proposed to learn driving models from multi-view visual information. However, these benchmark methods lack semantic understanding of the multi-view image contents, whereas human drivers usually reason over this information for decision making using different visual regions of interest. In this paper, we propose an attention-based deep learning method to learn a driving model that takes surround-view visual information and the route planner as input, in which a multi-view attention module is designed to obtain the regions of interest used by human drivers. We evaluate our model on the Drive360 dataset in comparison with benchmark deep driving models. Results demonstrate that our model achieves competitive accuracy in both steering angle and speed prediction compared with the benchmark methods. Code is available at https://github.com/jet-uestc/MVA-Net.
|
|
10:50-11:00, Paper MoA-5.6 | |
Towards Safety-Aware Pedestrian Detection in Autonomous Systems |
|
Lyssenko, Maria | Robert Bosch GmbH, University of Munich |
Gladisch, Christoph David | Robert Bosch GmbH |
Heinzemann, Christian | Robert Bosch GmbH |
Woehrle, Matthias | Robert Bosch GmbH |
Triebel, Rudolph | German Aerospace Center (DLR) |
Keywords: Computer Vision for Transportation, Motion and Path Planning, Deep Learning for Visual Perception
Abstract: In this paper, we present a framework to assess the quality of a pedestrian detector in an autonomous driving scenario. To do this, we exploit performance metrics from the domain of computer vision on one side and so-called threat metrics from the motion planning domain on the other side. Based on a reachability analysis that accounts for the uncertainty in future motions of other traffic participants, we can determine the worst-case threat from the planning domain and relate it to the corresponding detection from the visual input. Our evaluation results for a RetinaNet on the Argoverse 1.1 [1] dataset show that even a rather simple threat metric such as time-to-collision (TTC) makes it possible to identify potentially dangerous interactions between the ego vehicle and a pedestrian when purely vision-based detections fail, even if they are passed to a subsequent object tracker. In addition, our results show that two different DNNs (Deep Neural Networks) with comparable performance differ significantly in the number of critical scenarios that we can identify with our method.
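For reference, the time-to-collision threat metric has a simple closed form when the ego vehicle and the pedestrian are treated as points approaching each other at a constant closing rate; this is the textbook definition and not necessarily the exact variant used in the paper.

```latex
\mathrm{TTC} = \frac{d(t)}{v_{\text{closing}}(t)},
\qquad
v_{\text{closing}}(t) = -\frac{\mathrm{d}}{\mathrm{d}t}\, d(t) > 0
```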
|
|
11:00-11:10, Paper MoA-5.7 | |
Self-Supervised Moving Vehicle Detection from Audio-Visual Cues |
|
Zürn, Jannik | University of Freiburg |
Burgard, Wolfram | University of Freiburg |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Representation Learning
Abstract: Robust detection of moving vehicles is a critical task for any autonomously operating outdoor robot or self-driving vehicle. Most modern approaches for solving this task rely on training image-based detectors using large-scale vehicle detection datasets such as nuScenes or the Waymo Open Dataset. Providing manual annotations is an expensive and laborious exercise that does not scale well in practice. To tackle this problem, we propose a self-supervised approach that leverages audio-visual cues to detect moving vehicles in videos. Our approach employs contrastive learning for localizing vehicles in images from corresponding pairs of images and recorded audio. In extensive experiments carried out with a real-world dataset, we demonstrate that our approach provides accurate detections of moving vehicles and does not require manual annotations. We furthermore show that our model can be used as a teacher to supervise an audio-only detection model. This student model is invariant to illumination changes and thus effectively bridges the domain gap inherent to models leveraging exclusively vision as the predominant modality.
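A minimal numpy sketch of the contrastive objective described above, in which image and audio embeddings from the same clip are pulled together and mismatched pairs are pushed apart. This is a generic InfoNCE-style formulation; the names and the temperature value are assumptions, not the paper's code.

```python
import numpy as np

def info_nce(img_emb, aud_emb, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss between L2-normalized image
    and audio embeddings of one batch (row i of each matrix comes from the
    same clip, so the positives lie on the diagonal of the similarity matrix).
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    aud = aud_emb / np.linalg.norm(aud_emb, axis=1, keepdims=True)
    logits = img @ aud.T / temperature                 # (B, B) similarities
    labels = np.arange(len(logits))
    log_sm = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss_i2a = -log_sm[labels, labels].mean()          # image -> audio direction
    log_sm_t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    loss_a2i = -log_sm_t[labels, labels].mean()        # audio -> image direction
    return 0.5 * (loss_i2a + loss_a2i)

print(info_nce(np.random.randn(8, 128), np.random.randn(8, 128)))
```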
|
|
11:10-11:20, Paper MoA-5.8 | |
Multi-Source Domain Alignment for Robust Segmentation in Unknown Targets |
|
Shyam, Pranjay | Korea Advanced Institute of Science and Technology |
Yoon, Kuk-Jin | KAIST |
Kim, Kyung-Soo | KAIST(Korea Advanced Institute of Science and Technology) |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Semantic Scene Understanding
Abstract: Semantic segmentation provides scene understanding capability by performing pixel-wise classification of objects within an image. However, the sensitivity of such algorithms towards domain changes requires fine-tuning using an annotated dataset for each novel domain, which is expensive to construct and inefficient. We highlight that, irrespective of the training dataset, the structural properties of scenes remain the same; hence, domain sensitivity arises from the training methodology. Thus, in this paper, we propose a domain alignment approach wherein multiple synthetic source domains are used to train an underlying segmentation network such that it performs consistently in unknown real target domains. Towards this end, we propose a pixel-wise supervised contrastive learning framework that enforces constraints in the latent space, resulting in features belonging to the same class being clustered closely and away from different classes. This approach allows for better capturing of global and local semantics while providing domain-invariant properties. Our approach can be easily incorporated into prior semantic segmentation approaches without significant computational overhead. We empirically demonstrate the efficacy of the proposed approach on GTAV to Cityscapes, GTAV+Synthia to Cityscapes, and GTAV+Synthia+Synscapes to Cityscapes scenarios and report state-of-the-art (SoTA) performance without requiring access to images from the target domain.
|
|
11:20-11:30, Paper MoA-5.9 | |
Depth360: Self-Supervised Learning for Monocular Depth Estimation Using Learnable Camera Distortion Model |
|
Hirose, Noriaki | TOYOTA Central R&D Labs., INC |
Tahara, Kosuke | Toyota Central R&D Labs., Inc |
Keywords: Deep Learning for Visual Perception, Omnidirectional Vision, Computer Vision for Transportation
Abstract: Self-supervised monocular depth estimation has been widely investigated to estimate depth images and relative poses from RGB images. This framework is promising because the depth and pose networks can be trained from time-sequence images alone, without the need for ground-truth depth and poses. In this work, we estimate the depth around a robot (360 degree view) using time-sequence spherical camera images, from a camera whose parameters are unknown. We propose a learnable axisymmetric camera model which accepts distorted spherical camera images with two fisheye camera images as well as pinhole camera images. In addition, we trained our models with a photo-realistic simulator to generate ground-truth depth images that provide supervision. Moreover, we introduced loss functions that provide floor constraints to reduce artifacts that can result from reflective floor surfaces. We demonstrate the efficacy of our method using spherical camera images from the GO Stanford dataset and pinhole camera images from the KITTI dataset, comparing our method's performance with that of a baseline method in learning the camera parameters.
|
|
MoA-6 |
Rm6 (Room D) |
Aerial Systems 1 |
Regular session |
Chair: Katzschmann, Robert Kevin | ETH Zurich |
Co-Chair: Suzuki, Satoshi | Chiba University |
|
10:00-10:10, Paper MoA-6.1 | |
Real-Time Hybrid Mapping of Populated Indoor Scenes Using a Low-Cost Monocular UAV |
|
Golodetz, Stuart | University of Oxford |
Vankadari, Madhu | University of Oxford |
Everitt, Aluna | University of Oxford |
Shin, Sangyun | University of Oxford |
Markham, Andrew | Oxford University |
Trigoni, Niki | University of Oxford |
Keywords: Aerial Systems: Applications, SLAM, Human Detection and Tracking
Abstract: Unmanned aerial vehicles (UAVs) have been used for many applications in recent years, from urban search and rescue, to agricultural surveying, to autonomous underground mine exploration. However, deploying UAVs in tight, indoor spaces, especially close to humans, remains a challenge. One solution, when limited payload is required, is to use micro-UAVs, which pose less risk to humans and typically cost less to replace after a crash. However, micro-UAVs can only carry a limited sensor suite, e.g. a monocular camera instead of a stereo pair or LiDAR, complicating tasks like dense mapping and markerless multi-person 3D human pose estimation, which are needed to operate in tight environments around people. Monocular approaches to such tasks exist, and dense monocular mapping approaches have been successfully deployed for UAV applications. However, despite many recent works on both marker-based and markerless multi-UAV single-person motion capture, markerless single-camera multi-person 3D human pose estimation remains a much earlier-stage technology, and we are not aware of existing attempts to deploy it in an aerial context. In this paper, we present what is thus, to our knowledge, the first system to perform simultaneous mapping and multi-person 3D human pose estimation from a monocular camera mounted on a single UAV. In particular, we show how to loosely couple state-of-the-art monocular depth estimation and monocular 3D human pose estimation approaches to reconstruct a hybrid map of a populated indoor scene in real time. We validate our component-level design choices via extensive experiments on the large-scale ScanNet and GTA-IM datasets. To evaluate our system-level performance, we also construct a new Oxford Hybrid Mapping dataset of populated indoor scenes.
|
|
10:10-10:20, Paper MoA-6.2 | |
GaSLAM: An Algorithm for Simultaneous Gas Source Localization and Gas Distribution Mapping in 3D |
|
Ercolani, Chiara | EPFL |
Tang, Lixuan | EPFL |
Martinoli, Alcherio | EPFL |
Keywords: Aerial Systems: Applications, Environment Monitoring and Management, Probabilistic Inference
Abstract: Chemical gas dispersion poses a considerable threat to humans, animals, and the environment. The research areas of gas source localization and gas distribution mapping aim to localize the source of gas leaks and map the gas plume, respectively, in order to help the coordination of swift rescue missions. Although very similar, these two areas are often treated separately in the literature. In some cases, inferences on the gas distribution are made a posteriori from the source location, or vice-versa. In this paper, we introduce GaSLAM, a methodology that couples the estimation of the gas map and the source location using two state-of-the-art algorithms with a novel navigation strategy based on informative quantities. The synergistic approach allows our algorithm to achieve a good estimation of both objectives and push the navigation strategies towards informative areas of the experimental volume. We validate the algorithm in simulation and with physical experiments in varying environmental conditions. We show that the algorithm improves on the source location estimate compared to a similar approach found in the literature, and is able to deliver good quality maps of the gas distribution.
|
|
10:20-10:30, Paper MoA-6.3 | |
An Aerial Parallel Manipulator with Shared Compliance |
|
Stephens, Brett | Imperial College London |
Orr, Lachlan | Imperial College London |
Kocer, Basaran Bahadir | Imperial College London |
Nguyen, Hai-Nguyen | Imperial College London |
Kovac, Mirko | Imperial College London |
Keywords: Aerial Systems: Mechanics and Control, Aerial Systems: Applications, Dynamics
Abstract: Accessing and interacting with difficult-to-reach surfaces at various orientations is of interest within a variety of industrial contexts. Thus far, the predominant robotic solution to such a problem has been to leverage the maneuverability of a fully actuated, omnidirectional aerial manipulator. Such an approach, however, requires a specialised system with a high relative degree of complexity, thus reducing platform endurance and real-world applicability. The work here presents a new aerial system composed of a parallel manipulator and a conventional, underactuated multirotor flying base to demonstrate interaction with vertical and non-vertical surfaces. Our solution enables compliance to external disturbance on both subsystems, the manipulator and the flying base, independently, with the goal of improved overall system performance when interacting with surfaces. To achieve this behaviour, an admittance control strategy is implemented on various layers of the flying base's dynamics together with torque limits imposed on the manipulator actuators. Experimental evaluations show that the proposed system is compliant to external perturbations while allowing for differing interaction behaviours as the compliance parameters of each subsystem are altered. Such capabilities enable an adjustable form of dexterity in completing sensor installation, inspection, and aerial physical interaction tasks. A video of our system interacting with various surfaces can be found here: https://youtu.be/38neGb8-lXg.
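The admittance control strategy mentioned above can be summarized by a virtual mass-spring-damper that turns a measured external force into a compliant position offset. Below is a minimal discrete-time sketch with illustrative gains, not the paper's implementation or parameters.

```python
class Admittance1D:
    """Compliant offset x from a measured external force via M*a + D*v + K*x = F."""
    def __init__(self, M=1.0, D=8.0, K=20.0, dt=0.005):
        self.M, self.D, self.K, self.dt = M, D, K, dt
        self.x, self.v = 0.0, 0.0

    def step(self, f_ext):
        a = (f_ext - self.D * self.v - self.K * self.x) / self.M
        self.v += a * self.dt
        self.x += self.v * self.dt
        return self.x  # offset added to the nominal position reference

adm = Admittance1D()
for _ in range(400):            # 2 s of a constant 5 N push
    offset = adm.step(5.0)
print(round(offset, 3))         # settles near F/K = 0.25 m
```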
|
|
10:30-10:40, Paper MoA-6.4 | |
RAPTOR: Rapid Aerial Pickup and Transport of Objects by Robots |
|
Appius, Aurel Xaver | ETH Zürich |
Bauer, Erik | ETH Zürich |
Blöchlinger, Marc | ETHZ |
Kalra, Aashi | ETHZ |
Oberson, Robin | ETHZ |
Raayatsanati, Arman | ETH Zurich |
Strauch, Pascal | ETH Zurich |
S Menon, Sarath | PSG College of Technology |
von Salis, Marco | ETH Zurich |
Katzschmann, Robert Kevin | ETH Zurich |
Keywords: Aerial Systems: Applications, Soft Robot Applications, Grippers and Other End-Effectors
Abstract: Rapid aerial grasping through robots can lead to many applications that utilize fast and dynamic picking and placing of objects. Rigid grippers traditionally used in aerial manipulators require high precision and specific object geometries for successful grasping. We propose RAPTOR, a quadcopter platform combined with a custom Fin Ray® gripper to enable more flexible grasping of objects with different geometries, leveraging the properties of soft materials to increase the contact surface between the gripper and the objects. To reduce the communication latency, we present a new lightweight middleware solution based on Fast DDS (Data Distribution Service) as an alternative to ROS (Robot Operating System). We show that RAPTOR achieves an average of 83% grasping efficacy in a real-world setting for four different object geometries while moving at an average velocity of 1 m/s during grasping. In a high-velocity setting, RAPTOR supports up to four times the payload compared to previous works. Our results highlight the potential of aerial drones in automated warehouses and other manipulation applications where speed, swiftness, and robustness are essential while operating in hard-to-reach places.
|
|
10:40-10:50, Paper MoA-6.5 | |
Side-Pull Maneuver: A Novel Control Strategy for Dragging a Cable-Tethered Load of Unknown Weight Using a UAV |
|
Brandao, Alexandre Santos | Federal University of Viçosa |
Smrcka, Daniel | Czech Technical University in Prague |
Pairet Artau, Èric | Technology Innovation Institute |
Nascimento, Tiago | Universidade Federal Da Paraiba |
Saska, Martin | Czech Technical University in Prague |
Keywords: Aerial Systems: Applications, Intelligent Transportation Systems, Control Architectures and Programming
Abstract: This work presents an approach for dealing with suspended-cable load transportation using unmanned aerial vehicles (UAVs), specifically when the cargo exceeds the UAV's lifting capacity. Herein, this approach is referred to as the Side-Pull Maneuver (SPM). This maneuver is an alternative and viable strategy for cases where there is no impediment or restriction to dragging the load along a surface, such as with pastures or marine environments. The proposal is based on a joint observation of the thrust and altitude of the UAV. To make this possible, the high-level rigid-body dynamics model is described and represented as an underactuated system. Its altitude-rate control input is then analyzed during flight. A flight state supervisor decides whether the cargo should be carried by lifting or by side-pulling, or whether it should be labeled as non-transportable. Comparative real-world experiments validate the proposal, showing which maneuver (lifting or dragging) is performed for transport.
|
|
10:50-11:00, Paper MoA-6.6 | |
On-Board Physical Battery Replacement System and Procedure for Drones During Flight |
|
Guetta, Yoad | Ben Gurion University of the Negev |
Shapiro, Amir | Ben Gurion University of the Negev |
Keywords: Aerial Systems: Applications, Product Design, Development and Prototyping, Mechanism Design
Abstract: One of the major disadvantages of drones is their limited flight time. This paper introduces a new concept and mechanism for an onboard system that physically replaces batteries during flight, analogous to “aerial refueling”. This capability allows drones to remain in mid-air indefinitely while pursuing their mission without forcing them to change flight paths for logistical needs. The concept is composed of an additional UAV array that delivers new batteries from various ground points. We first describe the Flying Hot-Swap Battery (FHSB) system’s conceptual design. This unique design uses a FIFO logical process and the force of gravity to replace the energy source. The main innovation involves combining the ability to receive a battery from an external source, connect mechanically, hot-swap between the batteries, and dispose of the discharged battery. We report on the design of a dedicated battery cartridge for reception and connection in any spatial variation or orientation. Each component has a duplicate that works independently to increase system redundancy. Finally, we present the multiple experiments conducted to test the FHSB. The prototype successfully hovered and connected the battery in various reception orientations and hot-swapped the battery, thus maintaining the drone’s continuous power supply. This proof of concept of a complete battery replacement process during flight took an average replacement time of 15.2 seconds, which is only a 0.81% energy loss, thus enabling the drone to continue flying indefinitely without needing to modify its flight path (see attached video).
|
|
11:00-11:10, Paper MoA-6.7 | |
Frequency-Based Wind Gust Estimation for Quadrotors Using a Nonlinear Disturbance Observer |
|
Asignacion, Abner Jr | Chiba University |
Suzuki, Satoshi | Chiba University |
Noda, Ryusuke | Kyoto University |
Nakata, Toshiyuki | Chiba University |
Liu, Hao | Chiba University |
Keywords: Aerial Systems: Mechanics and Control, Automation Technologies for Smart Cities, Environment Monitoring and Management
Abstract: In city-wide weather prediction, wind gust information can be obtained using unmanned aerial vehicles (UAVs). Although wind sensors are available, an algorithm-based active estimation can be helpful not only as a weightless substitute but also as feedback for robust control. This paper aims to estimate the wind gusts affecting quadrotors (a type of UAV), treated as input disturbances, by using a frequency-based nonlinear disturbance observer (NDOB). To obtain highly accurate estimations, frequency is considered the main design parameter, thereby focusing the estimation on the frequency range of the wind gusts. The NDOB is developed using the Takagi-Sugeno (T-S) fuzzy framework. In this approach, the twelfth-order nonlinear model is approximated by a sixth-order T-S fuzzy model to reduce computational cost. A two-step verification method is presented, which includes MATLAB/Simulink simulations and experiments performed using a 2.5 kg quadrotor.
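For reference, a standard nonlinear disturbance observer for a system \dot{x} = f(x) + g_1(x)u + g_2(x)d estimates the disturbance through an internal state z and a design function p(x); this is the generic textbook form, shown only for orientation, not the paper's frequency-shaped T-S fuzzy design.

```latex
\dot{z} = -\,l(x)\,g_2(x)\bigl(z + p(x)\bigr) \;-\; l(x)\bigl(f(x) + g_1(x)\,u\bigr),
\qquad
\hat{d} = z + p(x), \qquad l(x) = \frac{\partial p(x)}{\partial x}
```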
|
|
11:10-11:20, Paper MoA-6.8 | |
Vision-Based Relative Detection and Tracking for Teams of Micro Aerial Vehicles |
|
Ge, Rundong | New York University |
Lee, Moonyoung | Carnegie Mellon University |
Radhakrishnan, Vivek | Technology Innovation Institute, New York University |
Zhou, Yang | New York University |
Li, Guanrui | New York University |
Loianno, Giuseppe | New York University |
Keywords: Aerial Systems: Applications
Abstract: In this paper, we address the vision-based detection and tracking problems of multiple aerial vehicles using a single camera and Inertial Measurement Unit (IMU) as well as the corresponding perception consensus problem (i.e., uniqueness and identical IDs across all observing agents). We design several vision-based decentralized Bayesian multi-tracking filtering strategies to resolve the association between the incoming unsorted measurements obtained by a visual detector algorithm and the tracked agents. We compare their accuracy in different operating conditions as well as their scalability according to the number of agents in the team. This analysis provides useful insights about the most appropriate design choice for the given task. We further show that the proposed perception and inference pipeline which includes a Deep Neural Network (DNN) as visual target detector is lightweight and capable of concurrently running control and planning with Size, Weight, and Power (SWaP) constrained robots on-board. Experimental results show the effective tracking of multiple drones in various challenging scenarios such as heavy occlusions.
|
|
11:20-11:30, Paper MoA-6.9 | |
Efficient Concurrent Design of the Morphology of Unmanned Aerial Systems and Their Collective-Search Behavior |
|
Zeng, Chen | University at Buffalo |
Krisshna Kumar, Prajit | University at Buffalo |
Witter, Jhoel | University at Buffalo |
Chowdhury, Souma | University at Buffalo, State University of New York |
Keywords: Optimization and Optimal Control, Swarm Robotics, Methods and Tools for Robot System Design
Abstract: The collective operation of robots (such as unmanned aerial vehicles or UAVs) operating as a team or swarm is affected by their individual capabilities, which in turn depend on their physical design, aka morphology. However, with the exception of a few (albeit ad hoc) evolutionary robotics methods, there has been very little work on understanding the interplay of morphology and collective behavior, especially given the lack of computational frameworks to concurrently search for the robot morphology and the hyper-parameters of their behavior model that jointly optimize the collective (team) performance. To address this gap, this paper proposes a new co-design framework. Here, the exploding computational cost of an otherwise nested morphology/behavior co-design is effectively alleviated through the novel concept of "talent" metrics, while also allowing significantly better solutions compared to the typically sub-optimal sequential morphology-to-behavior design approach. This framework comprises four major steps: talent metrics selection, talent Pareto exploration (a multi-objective morphology optimization process), behavior optimization, and morphology finalization. This co-design concept is demonstrated by applying it to design UAVs that operate as a team to localize signal sources (e.g., in victim search and hazard localization), where the collective behavior is driven by a recently reported batch Bayesian search algorithm called Bayes-Swarm. Our case studies show that the outcome of co-design provides significantly higher success rates in signal source localization compared to a baseline design, across a variety of signal environments and teams with 6 to 15 UAVs. Moreover, this co-design process provides two orders of magnitude reduction in computing time compared to a projected nested design approach.
|
|
MoA-7 |
Rm7 (Room E) |
Medical Robots and Systems 1 |
Regular session |
Chair: Schlenk, Christopher | German Aerospace Center (DLR) |
Co-Chair: Chen, Cheng-Wei | National Taiwan University |
|
10:00-10:10, Paper MoA-7.1 | |
Automatic Laser Steering for Middle Ear Surgery |
|
So, Jae-Hun | Institut Des Systèmes Intelligents Et De Robotique |
Szewczyk, Jérôme | Université Pierre Et Marie Curie-Paris 6 |
Tamadazte, Brahim | CNRS |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics
Abstract: This paper deals with the control of a laser spot in the context of minimally invasive surgery of the middle ear, e.g., cholesteatoma removal. More precisely, our work is concerned with the exhaustive burring of residual infected cells after primary mechanical resection of the pathological tissues. Since the latter cannot guarantee the treatment of all infected tissue, the remaining infected cells cause regeneration of the disease in 20%-25% of cases, requiring a second surgery 12-18 months later. To tackle such a complex surgery, we have developed a robotic platform that combines a macro-scale system (a 7 degrees of freedom (DoF) robotic arm) and a micro-scale flexible system (2 DoF) which operates inside the middle ear cavity. To treat the residual cholesteatoma regions, we propose a method to automatically generate optimal laser scanning trajectories inside the affected areas and between them. The trajectories are tracked using an image-based control scheme. The proposed method and materials were validated experimentally using the lab-made robotic platform. The obtained results in terms of accuracy and behaviour fully meet the requirements of laser surgery.
|
|
10:10-10:20, Paper MoA-7.2 | |
Virtual Reality Simulator for Fetoscopic Spina Bifida Repair Surgery |
|
Korzeniowski, Przemyslaw | Sano Centre for Computational Medicine |
Plotka, Szymon | Sano Centre for Computational Medicine |
Brawura-Biskupski-Samaha, Robert | II Department of Obstetrics and Gynaecology, the Medical Centre |
Sitek, Arkadiusz | Sano Centre for Computational Medicine |
Keywords: Medical Robots and Systems, Virtual Reality and Interfaces, Haptics and Haptic Interfaces
Abstract: Spina Bifida (SB) is a birth defect developed during the early stage of pregnancy in which there is an incomplete closing of the spine around the spinal cord. The growing interest in fetoscopic Spina Bifida repair, which is performed in fetuses who are still in the pregnant uterus, prompts the need for appropriate training. The learning curve for such procedures is steep and requires excellent procedural skills. Computer-based virtual reality (VR) simulation systems offer a safe, cost-effective, and configurable training environment free from ethical and patient safety issues. However, to the best of our knowledge, there are currently no commercial or experimental VR training simulation systems available for fetoscopic SB-repair procedures. In this paper, we propose a novel VR simulator for core manual skills training for SB-repair. An initial simulation realism validation study was carried out by obtaining subjective feedback (face and content validity) from 14 clinicians. The overall simulation realism was on average marked 4.07 on a 5-point Likert scale (1 - ‘very unrealistic’, 5 - ‘very realistic’). Its usefulness as a training tool for SB-repair as well as in learning fundamental laparoscopic skills was marked 4.63 and 4.80, respectively. These results indicate that VR simulation of fetoscopic procedures may contribute to surgical training without putting fetuses and their mothers at risk. It could also facilitate wider adaptation of fetoscopic procedures in place of much more invasive open fetal surgeries.
|
|
10:20-10:30, Paper MoA-7.3 | |
A Pneumatic MR-Conditional Guidewire Delivery Mechanism with Decoupled Rotary Linear Actuation for Endovascular Intervention |
|
Huang, Shaoping | Shanghai Jiao Tong University |
Lou, Chuqian | Imperial College London |
Xuan, Lian | Shanghai Jiao Tong University |
Gao, Hongyan | Shanghai Jiao Tong University |
Gao, Anzhu | Shanghai Jiao Tong University |
Yang, Guang-Zhong | Shanghai Jiao Tong University |
Keywords: Medical Robots and Systems
Abstract: Percutaneous coronary intervention (PCI) involves the delivery of a flexible submillimeter guidewire, and existing x-ray based approaches impose significant ionizing radiation. The use of magnetic resonance imaging (MRI) for intraoperative guidance has the advantages of not only being safe but also having high positioning accuracy and excellent tissue contrast. This paper develops a pneumatically driven MR-conditional delivery mechanism for the ease of manipulation of the guidewire in vivo. It incorporates newly developed rotary pneumatic step motors and a pneumatic slip ring for actuation and decoupling of translational and rotational motions. An effective clamping mechanism for locking and releasing the guidewire is also incorporated. The proposed pneumatic slip ring mechanism decouples six gas lines, where four are used to supply a pneumatic step motor for translational motion, and two for the clamping mechanism. A high-friction silicone sleeve is used to hold the guidewire firmly, and the rotary pneumatic motor has excellent sealing and stability, providing an output torque of 15.75 Nm/MPa. Experiments show that the average error of translational motion is 0.37 mm. Real-time MRI-guided endovascular intervention is performed in a vascular phantom with pulsatile flows to validate its potential clinical use. The imaging artifact test under MRI shows no noticeable distortion, and the loss of Signal-to-Noise Ratio (SNR) is less than 2%.
|
|
10:30-10:40, Paper MoA-7.4 | |
Design and Evaluation of the Infant Cardiac Robotic Surgical System (iCROSS) |
|
Chen, Po-Chih | National Taiwan University |
Hsieh, Pei-An | National Taiwan University |
Huang, Jing-Yuan | National Taiwan University |
Huang, Shu-Chien | National Taiwan University Hospital |
Chen, Cheng-Wei | National Taiwan University |
Keywords: Medical Robots and Systems, Dual Arm Manipulation, Mechanism Design
Abstract: In this study, the infant Cardiac Robotic Surgical System (iCROSS) is developed to assist a surgeon in performing the patent ductus arteriosus (PDA) closure and other infant cardiac surgeries. The iCROSS is a dual-arm robot allowing two surgical instruments to collaborate in a narrow space while keeping a sufficiently large workspace. Compared with the existing surgical robotic systems, the iCROSS meets the specific requirements of infant cardiac surgeries. Its feasibility has been validated through several teleoperated tasks performed in the experiment. In particular, the iCROSS is able to perform surgical ligation successfully within one minute.
|
|
10:40-10:50, Paper MoA-7.5 | |
A Robotic System for Solo Surgery in Flexible Ureterorenoscopy |
|
Schlenk, Christopher | German Aerospace Center (DLR) |
Klodmann, Julian | German Aerospace Center |
Hagmann, Katharina | German Aerospace Center |
Kolb, Alexander | DLR |
Hellings-Kuß, Anja | DLR |
Steidle, Florian | German Aerospace Center |
Schoeb, Dominik Stefan | University of Freiburg |
Jürgens, Thorsten | Olympus Winter & Ibe GmbH |
Miernik, Arkadiusz | University of Freiburg |
Albu-Schäffer, Alin | DLR - German Aerospace Center |
Keywords: Medical Robots and Systems, Physical Human-Robot Interaction, Performance Evaluation and Benchmarking
Abstract: Urolithiasis is a common disease with increasing prevalence across all ages. A common treatment option for smaller kidney stones is flexible ureterorenoscopy (fURS), where a flexible ureteroscope (FU) is used for stone removal and to inspect the renal collecting system. The handling of the flexible ureteroscope and end effectors (EEs), however, is challenging and requires two surgeons. In this paper, we introduce a modular robotic system for endoscope manipulation, which enables solo surgery (SSU) and is adaptable to various hand-held FUs. Both the developed hardware components and the proposed workflow and its representation in software are described. We then present and discuss the results of an initial user study. Finally, we describe subsequent developmental steps towards more extensive testing by clinical staff.
|
|
10:50-11:00, Paper MoA-7.6 | |
Light in the Larynx: A Miniaturized Robotic Optical Fiber for In-Office Laser Surgery of the Vocal Folds |
|
Chiluisa, Alex | Worcester Polytechnic Institute |
Pacheco, Nicholas | Worcester Polytechnic Institute |
Do, Hoang | Worcester Polytechnic Institute |
Tougas, Ryan | Worcester Polytechnic Institute |
Minch, Emily | Worcester Polytechnic Institute |
Mihaleva, Rositsa | Worcester Polytechnic Institute |
Shen, Yao | Worcester Polytechnic Institute |
Liu, Yuxiang | Worcester Polytechnic Institute |
Carroll, Thomas | Harvard Medical School |
Fichera, Loris | Worcester Polytechnic Institute |
Keywords: Medical Robots and Systems, Surgical Robotics: Steerable Catheters/Needles
Abstract: This letter reports the design, construction, and experimental validation of a novel hand-held robot for in-office laser surgery of the vocal folds. In-office endoscopic laser surgery is an emerging trend in Laryngology: It promises to deliver the same patient outcomes of traditional surgical treatment (i.e., in the operating room), at a fraction of the cost. Unfortunately, office procedures can be challenging to perform; the optical fibers used for laser delivery can only emit light forward in a line-of-sight fashion, which severely limits anatomical access. The robot we present in this letter aims to overcome these challenges. The end effector of the robot is a steerable laser fiber, created through the combination of a thin optical fiber (0.225 mm) with a tendon-actuated Nickel-Titanium notched sheath that provides bending. This device can be seamlessly used with most commercially available endoscopes, as it is sufficiently small (1.1 mm) to pass through a working channel. To control the fiber, we propose a compact actuation unit that can be mounted on top of the endoscope handle, so that, during a procedure, the operating physician can operate both the endoscope and the steerable fiber with a single hand. We report simulation and phantom experiments demonstrating that the proposed device substantially enhances surgical access compared to current clinical fibers.
|
|
11:00-11:10, Paper MoA-7.7 | |
A 5-DOFs Robot for Posterior Segment Eye Microsurgery |
|
Wang, Ning | Xi'an Jiaotong University |
Zhang, Xiaodong | Xi’an Jiaotong University |
Li, Mingyang | Xi'an Jiaotong University |
Zhang, Hongbing | The First Affiliated Hospital of Northwestern University |
Stoyanov, Danail | University College London |
Stilli, Agostino | University College London |
Keywords: Medical Robots and Systems, Surgical Robotics: Steerable Catheters/Needles, Parallel Robots
Abstract: In retinal surgery, clinicians access the internal volume of the eyeball through small-scale trocar ports, typically 0.65 mm in diameter, to treat vitreoretinal disorders such as idiopathic epiretinal membrane and age-related macular holes. The treatment of these conditions involves the removal of thin layers of diseased tissue, namely the epiretinal membrane and the internal limiting membrane. These membranes have an average thickness of only 60 μm and 2 μm respectively, making it extremely challenging even for expert clinicians to peel them without damaging the surrounding tissue. In this work we present a novel Ophthalmic microsurgery Robot (OmSR) designed to operate a standard surgical forceps used in these procedures with micrometric precision, overcoming the limitations of current robotic systems associated with the offsetting of the remote centre of motion of the end effector when accessing the sclera. The design of the proposed system is presented, and its performance evaluated. The results show that the end effector can be controlled with an accuracy of less than 30 μm and that the positional error in opening and closing the surgical forceps is less than 4.3 μm. Trajectory-following experiments and membrane peeling experiments are also presented, showing promising results in both scenarios.
|
|
MoA-8 |
Rm8 (Room F) |
Mechanism Design 1 |
Regular session |
Chair: Mizuuchi, Ikuo | Tokyo University of Agriculture and Technology |
Co-Chair: Park, Hae-Won | Korea Advanced Institute of Science and Technology |
|
10:00-10:10, Paper MoA-8.1 | |
DRPD, Compact Dual Reduction Ratio Planetary Drive for Actuators of Articulated Robots |
|
Song, Tae-Gyu | Korea Advanced Institute of Science and Technology, KAIST |
Shin, Young-Ha | KAIST |
Hong, Seungwoo | Korea Advanced Institute of Science and Technology |
Choi, Hyungho Chris | Korea Advanced Institute of Science and Technology |
Kim, Joon-Ha | Korea Advanced Institute of Science and Technology(KAIST) |
Park, Hae-Won | Korea Advanced Institute of Science and Technology |
Keywords: Mechanism Design, Actuation and Joint Mechanisms
Abstract: This paper presents a reduction mechanism for robot actuators that can switch between two reduction ratios. By fixing either the carrier or the ring gear of the proposed actuator, which is based on the 3K compound planetary drive, the actuator can shift its reduction ratio. For a compact, lightweight actuator design, a unique pawl brake mechanism interacting with cams and micro servos is designed for the switching mechanism. The resulting prototype module has reduction ratios of 6.91 and 44.93 for the ‘low-reduction’ and ‘high-reduction’ modes, respectively. The reduction ratios can be easily adjusted by modifying the pitch diameters of the gears. Experimental results demonstrate that the proposed actuator can extend its operating region via the two reduction modes, which are interchangeable through gear shifting.
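For intuition only, the sketch below evaluates the classic single-stage planetary relations, showing how fixing the ring gear versus the carrier yields two different reduction ratios. The tooth counts are made up, and the paper's 3K compound drive combines additional gear meshes to reach its 6.91 and 44.93 ratios, so this is not the actual DRPD design.

```python
def simple_planetary_ratios(z_sun, z_ring):
    """Reduction ratios of a single-stage planetary gear set, driven at the sun.

    Ring fixed    -> output at carrier: i = 1 + z_ring / z_sun
    Carrier fixed -> output at ring:    i = -z_ring / z_sun  (sign = reversal)
    Follows from the Willis equation (w_s - w_c)/(w_r - w_c) = -z_r/z_s.
    """
    return 1 + z_ring / z_sun, -z_ring / z_sun

print(simple_planetary_ratios(z_sun=20, z_ring=80))   # (5.0, -4.0)
```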
|
|
10:10-10:20, Paper MoA-8.2 | |
Single-Rod Brachiation Robot |
|
Akahane, Hijiri | Tokyo University of Agriculture and Technology |
Mizuuchi, Ikuo | Tokyo University of Agriculture and Technology |
Keywords: Mechanism Design, Biologically-Inspired Robots, Motion Control
Abstract: In this paper, we propose a new brachiation robot, a single-rod brachiation robot. Brachiation is a method of locomotion that makes clever use of gravity, and researchers have tried to apply it to robots. Conventional brachiation robots are multiple-pendulum-like robots that mimic a gibbon. Although a multiple-pendulum-like robot can easily change the length of one brachiation step with its joints, it has a complex structure and generates aperiodic motions such as chaos. In contrast, the single-rod brachiation robot has the advantages of a simple structure and the ability to suppress complex multiple-pendulum trajectories. The single-rod brachiation robot has the disadvantage that it is difficult to adjust the distance to the next bar; however, this can be solved by aerial brachiation, which includes an aerial phase before grasping the next bar. Using the actual robot, we showed that the swinging amplitude could be increased by appropriately moving its center of gravity, as in a trapeze motion. In addition, using this, we achieved continuous brachiation across three bars and brachiation including an aerial phase with a flight distance of 140 mm.
|
|
10:20-10:30, Paper MoA-8.3 | |
A Compact, Lightweight and Singularity-Free Wrist Joint Mechanism for Humanoid Robots |
|
Klas, Cornelius | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Mechanism Design, Humanoid Robot Systems, Kinematics
Abstract: Building humanoid robots with properties similar to those of humans in terms of strength and agility is a great and unsolved challenge. This work introduces a compact and lightweight wrist joint mechanism that is singularity-free and has a large range of motion. The mechanism has two degrees of freedom (DoF) and has been designed to be integrated into a human-scale humanoid robot arm. It is based on a parallel mechanism with rolling contact joint behaviour and remote actuation, which facilitates a compact design with low mass and inertia. The mechanism's kinematics, together with a solution of the inverse kinematics problem for the specific design and a manipulability analysis, are presented. The first prototype of the proposed mechanism shows the possible integration of actuation, sensing, and electronics in a small and narrow space. Experimental evaluations show that the design features unique performance regarding weight, speed, payload, and accuracy.
|
|
10:30-10:40, Paper MoA-8.4 | |
Wirelessly Magnetically Actuated Robotic Implant for Tissue Regeneration |
|
Duffield, Cameron | The University of Sheffield |
Smith, Abigail Florence | University of Sheffield |
Rus, Daniela | MIT |
Damian, Dana | University of Sheffield |
Miyashita, Shuhei | University of Sheffield |
Keywords: Mechanism Design, Medical Robots and Systems
Abstract: In biomedical engineering, robotic implants provide new methods to restore and improve bodily function and regenerate tissue. A significant challenge with the design of these devices is to safely actuate them for weeks or months while they are residing in a patient's body. Magnetic, and other force-at-a-distance actuation methods, allow mechanisms to be controlled remotely and without contact or line of sight to the device. In this paper, we present a novel magnetic-field-driven wireless motor. The motor drives a robotic implant for the treatment of long-gap esophageal atresia and short bowel syndrome. The motor is equipped with two oppositely oriented permanent magnets which experience forces in opposite directions when a magnetic field is applied tangential to the magnets' directions. The implant can produce a force of 2 N. It is demonstrated with an ex vivo porcine esophagus.
|
|
10:40-10:50, Paper MoA-8.5 | |
Multiple Curvatures in a Tendon-Driven Continuum Robot Using a Novel Magnetic Locking Mechanism |
|
Pogue, Chloe | University of Toronto |
Rao, Priyanka | University of Toronto |
Peyron, Quentin | Inria Lille-Nord Europe and CRIStAL UMR CNRS 9189, University Of |
Kim, Jongwoo | Kyung Hee University |
Burgner-Kahrs, Jessica | University of Toronto |
Diller, Eric D. | University of Toronto |
Keywords: Mechanism Design, Soft Robot Materials and Design, Medical Robots and Systems
Abstract: Tendon-driven continuum robots show promise for use in surgical applications as they can assume complex configurations to navigate along tortuous paths. However, to achieve these complex robot shapes, multiple segments are required as each robot segment can bend only with a single constant curvature. To actuate these additional robot segments, multiple tendons must typically be added on-board the robot, complicating their integration, robot control, and actuation. This work presents a method of achieving two curvatures in a single tendon-driven continuum robot segment through use of a novel magnetic locking mechanism. Thus, the need for additional robot segments and actuating tendons is eliminated. The resulting two curvatures in a single segment are demonstrated in two and three dimensions. Furthermore, the maximum magnetic field required to actuate the locking mechanism for different robot bending angles is experimentally measured to be 6.1 mT. Additionally, the locking mechanism resists unintentional unlocking unless the robot assumes a 0° bending angle and a magnetic field of 18.1 mT is applied, conditions which are not typically reached during routine use of the system. Finally, addressable actuation of two locking mechanisms is achieved, demonstrating the capability of producing multiple curvatures in a single robot segment.
|
|
10:50-11:00, Paper MoA-8.6 | |
Toroidal Origami Monotrack: Mechanism to Realize Smooth Driving and Bending for Closed-Skin-Drive Robots |
|
Watanabe, Masahiro | Tohoku University |
Kemmotsu, Yuto | Tohoku University |
Tadakuma, Kenjiro | Tohoku University |
Abe, Kazuki | Tohoku University |
Konyo, Masashi | Tohoku University |
Tadokoro, Satoshi | Tohoku University |
Keywords: Mechanism Design, Soft Robot Materials and Design, Search and Rescue Robots
Abstract: We propose a novel toroidal origami monotrack capable of smooth skin driving and bending for closed-skin-drive robots. Monotracks are a promising solution for achieving high mobility in unstructured environments. Toroidal-drive mechanisms enable whole-skin drive; however, conventional methods suffer from unexpected wrinkling and buckling that lead to large resistance. In this study, we propose an origami bellows structure with multiple rollers that can maintain skin tension and address the cause of the large friction between the skin and the body. The origami structure design method is presented, and the bending angle range and required drive force are derived through a theoretical analysis. The effectiveness of the concept was verified through prototype testing.
|
|
11:00-11:10, Paper MoA-8.7 | |
A Modified Rocker-Bogie Mechanism with Fewer Actuators and High Mobility |
|
Lim, Kyeongtae | Hanyang University Mechanical Engineering, RoDEL |
Ryu, Sijun | Hanyang University |
Won, Jee Ho | Hanyang University |
Seo, TaeWon | Hanyang University |
Keywords: Mechanism Design, Actuation and Joint Mechanisms, Climbing Robots
Abstract: In this study, a modified rocker-bogie mechanism that improves mobility using only two actuators and a damper is proposed. Previous mechanisms use a large number of motors or complex controls to overcome obstacles such as rough terrain or stairs. To address this, the damper-driven rocker-bogie (DDRB) is devised. The effectiveness of the damper was verified using quasi-static analysis, and the optimal parameters of the prototype were determined using multibody dynamics. Finally, the stair-climbing performance was successfully demonstrated through experiments with the prototype. The mobility performance is expected to be improved further through additional active systems and will be applied not only to stairs but also to various rough terrains in the future.
|
|
11:10-11:20, Paper MoA-8.8 | |
Planar Multi-Closed-Loop Hyper-Redundant Manipulator Using Extendable Tape Springs: Design, Modeling, and Experiments |
|
Ding, Tonghuan | Shanghai University |
Li, Bo | Shanghai Aerospace System Engineering Institute |
Liu, Hu | Shanghai University |
Peng, Yan | Shanghai University |
Yang, Yi | Shanghai University |
Keywords: Mechanism Design, Redundant Robots, Manipulation Planning
Abstract: This paper describes the development of a planar multi-closed-loop hyper-redundant manipulator. The proposed manipulator consists of two extendable tape springs and several inner tube rods. As the tape springs can be wound, the manipulator achieves a large workspace with a small footprint. The inner tube rods are arranged between the two tape springs, providing the manipulator with a triangular-lattice structure in which the inner tube rods and tape springs are subjected to purely axial loads. By controlling the two fixed drive components and eight mobile drive components in the manipulator, various redundant configurations can be produced. The kinematic model of this manipulator is established, and a configuration planning approach based on optimal stiffness is proposed. Simulations are conducted for three different cases, and a prototype is fabricated and tested to validate the proposed design and method.
|
|
11:20-11:30, Paper MoA-8.9 | |
Autonomous State-Based Flipper Control for Articulated Tracked Robots in Urban Environments |
|
Azayev, Teymur | Czech Technical University in Prague |
Zimmermann, Karel | Czech Technical University Prague |
Keywords: Machine Learning for Robot Control, Deep Learning Methods, Imitation Learning
Abstract: We demonstrate a hybrid approach to autonomous flipper control, focusing on a fusion of hard-coded and learned knowledge. The result is a sample-efficient and modifiable control structure that can be used in conjunction with a mapping/navigation stack. The backbone of the control policy is formulated as a state machine whose states define various flipper action templates and local control behaviors. It is also used as an interface that facilitates the gathering of demonstrations to train the transitions of the state machine. We propose a soft-differentiable state machine neural network that mitigates the shortcomings of its naively implemented counterpart and improves over a multi-layer perceptron baseline in the task of state-transition classification. We show that by training on several minutes of user-gathered demonstrations in simulation, our approach is capable of a zero-shot domain transfer to a wide range of obstacles on a similar real robotic platform. Our results show a considerable increase in performance over a previous competing approach in several essential criteria. A subset of this work was successfully used in the Defense Advanced Research Projects Agency (DARPA) Subterranean Challenge to alleviate the operator of manual flipper control. We autonomously traversed stairs and other obstacles, improving map coverage.
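A soft-differentiable state machine of the kind described above can be pictured as maintaining a probability distribution over discrete flipper states and blending learned, per-state transition logits conditioned on the current observation. The sketch below is a generic illustration with made-up sizes and names, not the authors' architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def soft_transition(belief, obs, W):
    """One soft state-machine step.

    belief: (S,) current soft distribution over discrete flipper states.
    obs:    (F,) observation features (e.g., a terrain/heightmap summary).
    W:      (S, F, S) per-state transition logits as a linear map of obs.
    """
    logits = np.einsum("s,sfn,f->n", belief, W, obs)  # blend per-state logits by belief
    return softmax(logits)

belief = np.array([1.0, 0.0, 0.0])           # start fully in state 0
W = np.random.randn(3, 8, 3) * 0.1
obs = np.random.randn(8)
print(soft_transition(belief, obs, W))
```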
|
|
MoA-9 |
Rm9 (Room G) |
Object Detection, Segmentation and Categorization 1 |
Regular session |
Chair: Li, Bing | Clemson University |
Co-Chair: Sawada, Yoshihide | AISIN Corporation |
|
10:00-10:10, Paper MoA-9.1 | |
Ensemble Based Anomaly Detection for Legged Robots to Explore Unknown Environments |
|
Puck, Lennart | FZI Forschungszentrum Informatik |
Schik, Maximilian | FZI Forschungszentrum Informatik |
Schnell, Tristan | FZI Forschungszentrum Informatik |
Buettner, Timothee | FZI Research Center for Information Technology |
Roennau, Arne | FZI Forschungszentrum Informatik, Karlsruhe |
Dillmann, Rüdiger | FZI - Forschungszentrum Informatik - Karlsruhe |
Keywords: Object Detection, Segmentation and Categorization, AI-Enabled Robotics, Space Robotics and Automation
Abstract: Exploring unknown environments, such as caves or planetary surfaces, requires a quick understanding of the surroundings. Beforehand, only aerial footage from satellites or images from previous missions might be available. The proposed ensemble based anomaly detection framework utilizes previously gained knowledge and incorporates it with insights gained during the mission. The modular system consists of different networks which are combined to determine anomalies in the current surroundings. By utilizing data from other missions, simulations or aerial photos, precise anomaly detection can be achieved at the start of a mission. The system can further be improved by training new networks during the mission, which can be incorporated into the ensemble at runtime. This allows for simultaneous execution of the mission and training of models on a base station. The proposed system is tested and evaluated on an ANYmal C walking robot in different scenarios; however, the approach is applicable to other kinds of mobile robots. The results show a clear improvement of ensembles compared to individual networks, while keeping a small memory footprint and low inference time on the mobile system.
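As a rough sketch of the ensemble mechanism described above (the interface, the mean-score fusion, and the anomaly threshold are illustrative assumptions rather than details from the paper), the runtime-extensible combination of networks could look like:

import numpy as np

class AnomalyEnsemble:
    # Each member is any callable mapping an observation to an anomaly
    # score in [0, 1]; members trained during the mission can be added
    # to the ensemble at runtime without interrupting execution.
    def __init__(self, members=None):
        self.members = list(members) if members else []

    def add_member(self, model):
        # incorporate a newly trained network (e.g. trained on the base station)
        self.members.append(model)

    def score(self, observation):
        # fuse the individual scores; the mean is one simple fusion choice
        return float(np.mean([m(observation) for m in self.members]))

    def is_anomaly(self, observation, threshold=0.5):
        return self.score(observation) > threshold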
|
|
10:10-10:20, Paper MoA-9.2 | |
FocusTR: Focusing on Valuable Feature by Multiple Transformers for Fusing Feature Pyramid on Object Detection |
|
Xie, Bangquan | South China University of Technology |
Liang, Yang | City University of New York, City College |
Yang, Zongming | Clemson University |
Wei, Ailin | Clemson University |
Weng, Xiaoxiong | South China University of Technology |
Li, Bing | Clemson University |
Keywords: Object Detection, Segmentation and Categorization, Computer Vision for Transportation, Sensor Fusion
Abstract: The feature pyramid, a vital component of convolutional neural networks, plays a significant role in several perception tasks, including object detection for autonomous driving. However, how to better fuse multi-level and multi-sensor feature pyramids is still a significant challenge, especially for object detection. This paper presents FocusTR (Focusing on valuable features by multiple Transformers), a simple yet effective architecture for fusing feature pyramids in a single-stream 2D detector and a two-stream 3D detector. Specifically, FocusTR encompasses several novel self-attention mechanisms, including spatial-wise boxAlign attention (SB) for low-level spatial locations, context-wise affinity attention (CA) for high-level context information, and level-wise attention for multi-level features. To alleviate self-attention's computational complexity and slow training convergence, FocusTR introduces a low- and high-level fusion (LHF) to reduce the computational parameters, and Pre-LN to accelerate training convergence. Comparative experiments on public benchmarks and datasets show that FocusTR achieves higher detection accuracy than the baseline methods, especially for small object detection. Our method shows a 2.1-point higher detection accuracy on the APs index for small objects on MS-COCO 2017 with a ResNeXt-101 backbone, a 2.18-point higher 3D detection accuracy (moderate difficulty category) for the small object class (pedestrian) on KITTI, and a 6.85-point higher RC index (Town05 Long) on the CARLA urban driving simulator.
|
|
10:20-10:30, Paper MoA-9.3 | |
DeepShapeKit: Accurate 4D Shape Reconstruction of Swimming Fish |
|
Wu, Ruiheng | University of Konstanz |
Deussen, Oliver | University of Konstanz |
Li, Liang | Max-Planck Institute of Animal Behavior |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning Methods
Abstract: In this paper, we present methods for capturing 4D body shapes of swimming fish with affordable small training datasets and textureless 2D videos. Automated capture of spatiotemporal animal movements and postures is revolutionizing the study of collective animal behavior. 4D (including 3D space + time) shape data from animals like schooling fish contains a rich array of social and non-social information that can be used to shed light on the fundamental mechanisms underlying collective behavior. However, unlike the large datasets used for 4D shape reconstructions of the human body, there are no large amounts of labeled training datasets for reconstructing fish bodies in 4D, due to the difficulty of underwater data collection. We created a template mesh model using 3D scan data from a real fish, then extracted silhouettes (segmentation masks) and key-points of the fish body using Mask R-CNN and DeepLabCut, respectively. Next, using the Adam optimizer, we optimized the 3D template mesh model for each frame by minimizing the difference between the projected 3D model and the detected silhouettes as well as the key-points. Finally, using an LSTM-based smoother, we generated accurate 4D shapes of schooling fish based on the 3D shapes over each frame. Our results show that the method is effective for 4D shape reconstructions of swimming fish, with greater fidelity than other state-of-the-art algorithms.
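A minimal sketch of the per-frame fitting step described above, assuming PyTorch and user-supplied differentiable project and render_silhouette functions; the free per-vertex parameterization, loss weights, and optimizer settings are illustrative assumptions, not the authors' implementation:

import torch

def fit_frame(template_vertices, kp_indices, detected_keypoints, detected_mask,
              project, render_silhouette, steps=200, lr=0.01, w_kp=1.0, w_sil=1.0):
    # Optimize offsets on the template mesh with Adam so that the projected
    # model matches the detected key-points and silhouette for one frame.
    offsets = torch.zeros_like(template_vertices, requires_grad=True)
    opt = torch.optim.Adam([offsets], lr=lr)
    for _ in range(steps):
        verts = template_vertices + offsets
        kp_loss = ((project(verts[kp_indices]) - detected_keypoints) ** 2).mean()
        sil_loss = ((render_silhouette(verts) - detected_mask) ** 2).mean()
        loss = w_kp * kp_loss + w_sil * sil_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (template_vertices + offsets).detach()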
|
|
10:30-10:40, Paper MoA-9.4 | |
E2Pose: Fully Convolutional Networks for End-To-End Multi-Person Pose Estimation |
|
Tobeta, Masakazu | AISIN Corporation |
Sawada, Yoshihide | AISIN Corporation |
Zheng, Ze | AISIN Corporation |
Takamuku, Sawa | AISIN Corporation |
Natori, Naotake | AISIN Corporation |
Keywords: Object Detection, Segmentation and Categorization, Gesture, Posture and Facial Expressions, AI-Based Methods
Abstract: Highly accurate multi-person pose estimation at a high framerate is a fundamental problem in autonomous driving. Solving the problem could aid in preventing pedestrian-car accidents. The present study tackles this problem by proposing a new model that adds a feature pyramid and an original head to a general backbone. The original head is built using lightweight CNNs and directly estimates multi-person pose coordinates. This configuration avoids the complex post-processing and two-stage estimation adopted by other models and allows for a lightweight model. Our model can be trained end-to-end and runs in real time on a resource-limited platform (low-cost edge device) during inference. Experimental results using the COCO and CrowdPose datasets showed that our model can achieve a higher framerate (approx. 20 frames/sec with NVIDIA Jetson AGX Xavier) than other state-of-the-art models while maintaining sufficient accuracy for practical use.
|
|
10:40-10:50, Paper MoA-9.5 | |
Fast Detection of Moving Traffic Participants in LiDAR Point Clouds by Using Particles Augmented with Free Space Information |
|
Reich, Andreas | Universität Der Bundeswehr München |
Wuensche, Hans Joachim Joe | Universität Der Bundeswehr München |
Keywords: Object Detection, Segmentation and Categorization, Intelligent Transportation Systems
Abstract: To navigate safely, it is essential for a robot to detect all kinds of moving objects that could possibly interfere with its own trajectory. For common object classes, like cars, regular pedestrians, and trucks, there are large-scale datasets as well as corresponding machine learning techniques, which provide remarkable results in commonly available detection benchmarks. A big remaining challenge is less frequent classes, which are not represented in datasets in sufficient number and variation. Dynamic occupancy grids are a promising approach for the detection of moving objects in point clouds since they impose only a few assumptions about the objects' appearance and shape. Typically, they use particle filters to detect motion of occupancy in the grid. Existing approaches, however, often generate false positives at long obstacles because particles move along them. Therefore, we propose a highly efficient approach, which performs the classification in a more structured and conservative way by making extensive use of available free space information. As a result, far fewer false positives are generated while the number of false negatives remains low. Our approach can be used to complement CNN-based object detections in order to detect both frequent and uncommon object classes reliably. By using polar data structures that match the polar measurement principle, we are able to process even large point clouds of modern LiDARs with 128 lasers efficiently.
|
|
10:50-11:00, Paper MoA-9.6 | |
RPG: Learning Recursive Point Cloud Generation |
|
Ko, Wei Jan | National Chiao Tung University |
Chiu, Chen-Yi | National Yang Ming Chiao Tung University |
Kuo, Yu-Liang | National Yang Ming Chiao Tung University |
Chiu, Wei-Chen | National Chiao Tung University |
Keywords: Object Detection, Segmentation and Categorization, Representation Learning, Deep Learning for Visual Perception
Abstract: In this paper we propose a novel point cloud generator that is able to reconstruct and generate 3D point clouds composed of semantic parts. Given a latent representation of the target 3D model, the generation starts from a single point and gets expanded recursively to produce the high-resolution point cloud via a sequence of point expansion stages. During the recursive procedure of generation, we not only obtain the coarse-to-fine point clouds for the target 3D model from every expansion stage, but also discover, in an unsupervised manner, the semantic segmentation of the target model according to the hierarchical/parent-child relation between the points across expansion stages. Moreover, the expansion modules and other elements used in our recursive generator mostly share weights, thus making the overall framework light and efficient. Extensive experiments are conducted to show that our point cloud generator has comparable or even superior performance on both generation and reconstruction tasks in comparison to various baselines, and provides consistent co-segmentation among instances of the same object class.
|
|
11:00-11:10, Paper MoA-9.7 | |
Fully Convolutional Transformer with Local–Global Attention |
|
Lee, Sihaeng | LG AI Research |
Yi, Eojindl | KAIST |
Lee, Janghyeon | LG AI Research |
Yoo, Jinsu | Hanyang University |
Lee, Honglak | University of Michigan |
Kim, Seung Hwan | LG AI Research |
Keywords: Object Detection, Segmentation and Categorization, RGB-D Perception, Deep Learning for Visual Perception
Abstract: In an attempt to transfer the success of transformers in natural language processing to computer vision tasks, vision transformers (ViTs) have recently gained attention. Performance breakthroughs have been achieved in coarse-grained tasks like classification. However, dense prediction tasks, such as detection, segmentation, and depth estimation, require additional modifications and have been tackled only in an ad-hoc manner, by replacing the convolutional neural network encoder backbone of an existing architecture with a ViT. This study proposes a fully convolutional transformer that can perform both coarse and dense prediction tasks. The proposed architecture is, to the best of our knowledge, the first architecture composed of attention layers, even in the decoder part of the network. This is because our newly proposed local-global attention (LGA) can flexibly perform both downsampling and upsampling of spatial features, which are key operations required for dense prediction. Against existing ViTs on classification tasks, our architecture shows a reasonable trade-off between performance and efficiency. In the depth estimation task, our architecture achieves performance comparable to that of state-of-the-art transformer-based methods.
|
|
11:10-11:20, Paper MoA-9.8 | |
DeepFusion: A Robust and Modular 3D Object Detector for Lidars, Cameras and Radars |
|
Drews, Florian | Robert Bosch GmbH |
Feng, Di | Robert Bosch GmbH |
Faion, Florian | Robert Bosch GmbH |
Rosenbaum, Lars | Robert Bosch GmbH |
Ulrich, Michael | Robert Bosch GmbH |
Glaeser, Claudius | Robert Bosch GmbH |
Keywords: Autonomous Vehicle Navigation, Object Detection, Segmentation and Categorization, Intelligent Transportation Systems
Abstract: We propose DeepFusion, a modular multi-modal architecture to fuse lidars, cameras and radars in different combinations for 3D object detection. Specialized feature extractors take advantage of each modality and can be exchanged easily, making the approach simple and flexible. Extracted features are transformed into bird's-eye-view as a common representation for fusion. Spatial and semantic alignment is performed prior to fusing modalities in the feature space. Finally, a detection head exploits rich multi-modal features for improved 3D detection performance. Experimental results for lidar-camera, lidar-camera-radar and camera-radar fusion show the flexibility and effectiveness of our fusion approach. In the process, we study the largely unexplored task of faraway car detection up to 225 meters, showing the benefits of our lidar-camera fusion. Furthermore, we investigate the required density of lidar points for 3D object detection and illustrate the implications using the example of robustness against adverse weather conditions. Moreover, ablation studies on our camera-radar fusion highlight the importance of accurate depth estimation.
|
|
11:20-11:30, Paper MoA-9.9 | |
CVFNet: Real-Time 3D Object Detection by Learning Cross View Features |
|
Gu, Jiaqi | Zhejiang University |
Xiang, Zhiyu | Zhejiang University |
Zhao, Pan | Zhejiang University |
Bai, Tingming | Zhejiang University |
Wang, Lingxuan | Zhejiang University |
Zhao, Xijun | China North Vehicle Research Institute, China North Artificial I |
Zhang, Zhiyuan | Zhejiang University |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning Methods, Intelligent Transportation Systems
Abstract: In recent years, 3D object detection from LiDAR point clouds has made great progress thanks to the development of deep learning technologies. Although voxel- or point-based methods are popular in 3D object detection, they usually involve time-consuming operations such as 3D convolutions on voxels or ball query among points, making the resulting network inappropriate for time-critical applications. On the other hand, 2D view-based methods feature high computing efficiency while usually obtaining inferior performance compared to the voxel- or point-based methods. In this work, we present a real-time view-based single-stage 3D object detector, namely CVFNet, to fulfill this task. To strengthen the cross-view feature learning under the condition of demanding efficiency, our framework extracts the features of different views and fuses them in an efficient progressive way. We first propose a novel Point-Range feature fusion module that deeply integrates point and range view features in multiple stages. Then, a special Slice Pillar is designed to well maintain the 3D geometry when transforming the obtained deep point-view features into bird's eye view. To better balance the ratio of samples, a sparse pillar detection head is presented to focus the detection on the nonempty grids. We conduct experiments on the popular KITTI and NuScenes benchmarks, and state-of-the-art performances are achieved in terms of both accuracy and speed.
|
|
MoA-10 |
Rm10 (Room H) |
Haptics |
Regular session |
Chair: Hasegawa, Yasuhisa | Nagoya University |
Co-Chair: Yamane, Katsu | Path Robotics Inc |
|
10:00-10:10, Paper MoA-10.1 | |
Reduced Interface Models for Haptic Interfacing with Virtual Environments |
|
Kerr, Liam | McGill University |
Kovecses, Jozsef | McGill University |
Keywords: Haptics and Haptic Interfaces, Dynamics, Physical Human-Robot Interaction
Abstract: Haptic interfacing typically requires high communication frequencies in order to render realistic interactions between a user and a virtual environment. In this work, we introduce reduced interface modelling (RIM) as a method to bridge the discrepancy in frequency requirements between haptic devices and virtual environment simulators. The method offers a model-based approach to approximate the environment behaviour between integration time steps, without relying on time history extrapolations of the environment state with no physical basis. Using a vehicle dynamics simulation interfacing with a haptic steering wheel, we show that the proposed method results in a drastic reduction in the computation time required for numerical integration and updates to the virtual environment compared to the common zero-order-hold method. T-tests on numerical ratings by participants in a multi-user study also confirm that the RIM yields better uncoupled stability and haptic rendering smoothness compared to a common time-history-based multi-rate sampling method.
|
|
10:10-10:20, Paper MoA-10.2 | |
A Soft Robotic Haptic Feedback Glove for Colonoscopy Procedures |
|
Gerald, Arincheyan | Boston University |
Batliwala, Rukaiya | Boston University |
Ye, Jonathan | Boston University |
Hsu, Patra | Boston University |
Aihara, Hiroyuki | Brigham and Women's Hospital |
Russo, Sheila | Boston University |
Keywords: Haptics and Haptic Interfaces, Force and Tactile Sensing, Wearable Robotics
Abstract: This paper presents a proof-of-concept soft robotic glove that provides haptic feedback to the surgeon’s hand during interventional endoscopy procedures, specifically colonoscopy. The glove is connected to a force-sensing soft robotic sleeve that is mounted onto a colonoscope. The glove consists of pneumatic actuators that inflate in proportion to the incident forces on the soft robotic sleeve. Thus, the glove is capable of alerting the surgeon to potentially dangerous forces exerted on the colon wall by the colonoscope during navigation. The proposed glove is adaptable to a variety of hand sizes. It features modular actuators that facilitate convenient and rapid assembly and attachment before the procedure and removal afterward. The glove is calibrated to respond to incident forces on the soft robotic sleeve ranging from 0 to 3 N. The glove’s actuators are able to reach an internal pressure of 53 kPa and exert forces up to 20 N, thereby relaying and amplifying the force exerted by the colonoscope on the colon to the surgeon’s hand.
|
|
10:20-10:30, Paper MoA-10.3 | |
A Large-Area Wearable Soft Haptic Device Using Stacked Pneumatic Pouch Actuation |
|
Nunez, Cara M. | Stanford University |
Do, Brian | Stanford University |
Low, Andrew K. | Stanford University |
Blumenschein, Laura | Purdue University |
Yamane, Katsu | Path Robotics Inc |
Okamura, Allison M. | Stanford University |
Keywords: Haptics and Haptic Interfaces, Soft Sensors and Actuators
Abstract: While haptics research has traditionally focused on the fingertips and hands, other locations on the body provide large areas of skin that could be utilized to relay large-area haptic sensations. Researchers have thus developed wearable devices that use distributed vibrotactile actuators and distributed pneumatic force displays, but these methods have limitations. In prior work, we presented a novel actuation technique involving stacking pneumatic pouches and evaluated the actuator output. In this work, we developed a wearable haptic device using this actuation technique and evaluated how the actuator output is perceived. We conducted a user study with 20 participants to evaluate users’ perception thresholds, ability to localize, and ability to detect differences in contact area and compare their perception using the stacked pneumatic pouch actuation to traditional single-layer pouch actuation. We also used our device with stacked pneumatic actuation in a demonstration of a haptic hug that replicates the dynamics, pressure profile, and mapping to the human back, showcasing how this actuation technique can be used to create novel haptic stimuli.
|
|
10:30-10:40, Paper MoA-10.4 | |
EMG-Based Feedback Modulation for Increased Transparency in Teleoperation |
|
Schoot Uiterkamp, Luc | University of Twente |
Porcini, Francesco | PERCRO Laboratory, TeCIP Institute, Sant’Anna School of Advanced |
Englebienne, Gwenn | University of Twente |
Frisoli, Antonio | Scuola Superiore Sant'Anna |
Dresscher, Douwe | University of Twente |
Keywords: Haptics and Haptic Interfaces, Telerobotics and Teleoperation
Abstract: In interacting with stiff environments through teleoperated systems, time delays cause a mismatch between haptic feedback and the feedback expected by the operator. This mismatch causes artefacts in the feedback, which decrease transparency, but so does filtering these artefacts. Through modelling of operator stiffness and the expected feedback force with EMG, the artefacts can be selectively filtered without loss of transparency. We developed several feedback modulation techniques to bring the feedback force closer to the expected force: 1) the average between the modelled operator force and the feedback force, 2) a low-pass filter and 3) a scaling modulation. To control for overdamping, a transparency check is included. We show that the averaging approach yields significantly better contacts than unmodulated feedback. None of the modulation algorithms differ significantly from the unmodulated feedback in transparency.
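A minimal numerical sketch of the three modulation ideas listed above (the exact formulations, gains, and filter constants here are assumptions for illustration, not the formulations evaluated in the paper):

def average_modulation(f_feedback, f_expected):
    # 1) average of the raw feedback force and the EMG-modelled expected force
    return 0.5 * (f_feedback + f_expected)

def lowpass_modulation(f_feedback, f_filtered_prev, alpha=0.1):
    # 2) first-order low-pass filter on the raw feedback force
    return alpha * f_feedback + (1.0 - alpha) * f_filtered_prev

def scaling_modulation(f_feedback, f_expected, gain=0.5):
    # 3) scale the feedback toward the expected force by a fixed gain
    return f_feedback + gain * (f_expected - f_feedback)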
|
|
10:40-10:50, Paper MoA-10.5 | |
Cutaneous Feedback Interface for Teleoperated In-Hand Manipulation |
|
Zhu, Yaonan | Nagoya University |
Colan, Jacinto | Nagoya University |
Aoyama, Tadayoshi | Nagoya University |
Hasegawa, Yasuhisa | Nagoya University |
Keywords: Haptics and Haptic Interfaces, Telerobotics and Teleoperation, In-Hand Manipulation
Abstract: In-hand pivoting is one of the important manipulation skills that leverage robot grippers’ extrinsic dexterity to perform repositioning tasks to compensate for environmental uncertainties and imprecise motion execution. Although many researchers have been trying to solve pivoting problems using mathematical modeling or learning-based approaches, the problems remain open challenges. On the other hand, humans perform in-hand manipulation with remarkable precision and speed. Hence, the solution could be provided by making full use of this intrinsic human skill through dexterous teleoperation. For dexterous teleoperation to be successful, interfaces that enhance and complement haptic feedback are of great necessity. In this paper, we propose a cutaneous feedback interface that complements the somatosensory information humans rely on when performing dexterous skills. The interface is designed based on five-bar link mechanisms and provides two contact points at the index finger and thumb for cutaneous feedback. By integrating the interface with a commercially available haptic device, the system can display information such as grasping force, shear force, friction, and grasped object’s pose. Passive pivoting tasks inside the numerical simulator Isaac Sim are conducted to evaluate the effect of the proposed cutaneous feedback interface.
|
|
10:50-11:00, Paper MoA-10.6 | |
Sensorimotor Control Sharing with Vibrotactile Feedback for Body Integration through Avatar Robot |
|
Tanaka, Yoshihiro | Nagoya Institute of Technology |
Katagiri, Takumi | Nagoya Institute of Technology |
Yukawa, Hikari | Nagoya Institute of Technology |
Nishimura, Takumi | Nagoya Institute of Technology |
Tanada, Ryohei | Nagoya Institute of Technology |
Ogura, Itsuki | Nagoya Institute of Technology |
Hagiwara, Takayoshi | Keio University |
Minamizawa, Kouta | Keio University |
Keywords: Haptics and Haptic Interfaces, Telerobotics and Teleoperation, Human Performance Augmentation
Abstract: An avatar robot can be operated by multiple users, augmenting the operation of a single user. In this study, we assembled a collaborative operation system for a 7-DoF robotic arm with a gripper controlled by two users. The users share the role of controlling the robot, with one user controlling the arm and the other controlling the gripper. The actions of the two users must be seamlessly coordinated to ensure smooth and precise operations. Therefore, this study aimed to investigate vibrotactile feedback to promote recognition of the actions of a partner. A pick-and-place task was considered as a basic operation. First, an intensity adjustment was developed for vibrotactile feedback to ensure linear sensitivity, and its usefulness was verified. An estimation test for the reaching motion showed that the target position was estimated accurately based on velocity feedback, indicating that the participants imagined the arm motion. Subsequently, collaboration tests with and without vibrotactile velocity feedback revealed a potential effect of vibrotactile feedback in terms of reducing the time required to complete the target task. Our experimental results indicate that vibrotactile feedback of a partner's motion can be applied to achieve smooth collaborative operations using an avatar robot.
|
|
11:00-11:10, Paper MoA-10.7 | |
Perception of Mechanical Properties Via Wrist Haptics: Effects of Feedback Congruence |
|
Sarac, Mine | Kadir Has University |
Di Luca, Massimiliano | University of Birmingham |
Okamura, Allison M. | Stanford University |
Keywords: Haptics and Haptic Interfaces, Virtual Reality and Interfaces, Wearable Robotics
Abstract: Despite non-co-location, haptic stimulation at the wrist can potentially provide feedback regarding interactions at the fingertips without encumbering the user's hand. Here we investigate how two types of skin deformation at the wrist (normal and shear) relate to the perception of the mechanical properties of virtual objects. We hypothesized that a congruent mapping (i.e., when the most relevant interaction forces during a virtual interaction spatially match the haptic feedback at the wrist) would result in better perception than other mappings. We performed an experiment where haptic devices at the wrist rendered either normal or shear feedback during manipulation of virtual objects with varying stiffness, mass, or friction properties. Perception of mechanical properties was more accurate with congruent skin stimulation than with noncongruent stimulation. In addition, discrimination performance and subjective reports were positively influenced by congruence. This study demonstrates that users can perceive mechanical properties via haptic feedback provided at the wrist with a consistent mapping between haptic feedback and interaction forces at the fingertips, regardless of congruence.
|
|
11:10-11:20, Paper MoA-10.8 | |
Haptic Feedback Relocation from the Fingertips to the Wrist for Two-Finger Manipulation in Virtual Reality |
|
Palmer, Jasmin | Stanford University |
Sarac, Mine | Kadir Has University |
Garza, Aaron | Stanford University |
Okamura, Allison M. | Stanford University |
Keywords: Haptics and Haptic Interfaces, Wearable Robotics, Virtual Reality and Interfaces
Abstract: Relocation of haptic feedback from the fingertips to the wrist has been considered as a way to enable haptic interaction with mixed reality virtual environments while leaving the fingers free for other tasks. We present a pair of wrist-worn tactile haptic devices and a virtual environment to study how various mappings between fingers and tactors affect task performance. The haptic feedback rendered to the wrist reflects the interaction forces occurring between a virtual object and virtual avatars controlled by the index finger and thumb. We performed a user study comparing four different finger-to-tactor haptic feedback mappings and one no-feedback condition as a control. We evaluated users’ ability to perform a simple pick-and-place task via the metrics of task completion time, path length of the fingers and virtual cube, and magnitudes of normal and shear forces at the fingertips. We found that multiple mappings were effective, and there was a greater impact when visual cues were limited. We discuss the limitations of our approach and describe next steps toward multi-degree-of-freedom haptic rendering for wrist-worn devices to improve task performance in virtual environments.
|
|
11:20-11:30, Paper MoA-10.9 | |
Feeling the Pressure: The Influence of Vibrotactile Patterns on Feedback Perception |
|
Smith, Alex | Bristol Robotics Laboratory |
Ward-Cherrier, Benjamin | University of Bristol |
Etoundi, Appolinaire C. | University of the West of England |
Pearson, Martin | Bristol Robotics Laboratory |
Keywords: Haptics and Haptic Interfaces, Prosthetics and Exoskeletons, Telerobotics and Teleoperation
Abstract: Tactile feedback is necessary for closing the sensorimotor loop in prosthetic and tele-operable control, which would allow for more precise manipulation and increased acceptance of such devices. Pressure stimuli are commonly presented to users in haptic devices through a sensory substitution to vibration. The precise nature of this substitution affects pressure sensitivity, as well as the comfort and intuitiveness of the device for the user. This study determines the effects of different vibrational encodings for pressure on user preference and performance in a 4-alternative absolute identification task. Four different encoding patterns for pressure were examined: short-pulse and long-pulse amplitude modulation along with sine- and square-wave frequency modulation. Of the methods examined, frequency modulation methods led to the best discrimination of stimuli (p < 0.001). There was a notable difference in user preference between the two frequency-modulated systems, with sinusoidal stimulation being ranked highest across all the preference metrics and square-wave being ranked lowest in two of the three. This difference trended towards, but did not achieve, statistical significance (TLX rankings, p = 0.098). This suggests that prostheses or teleoperated devices utilising vibrotactile feedback may benefit from implementing a discrete frequency-based sinusoidal pattern to indicate changes in grip force.
|
|
MoA-11 |
Rm11 (Room I) |
Human Factors and Human-In-The-Loop |
Regular session |
Chair: Rossi, Silvia | Universita' Di Napoli Federico II |
Co-Chair: Oyama, Eimei | AIST |
|
10:00-10:10, Paper MoA-11.1 | |
SPARCS: Structuring Physically Assistive Robotics for Caregiving with Stakeholders-In-The-Loop |
|
Madan, Rishabh | Cornell University |
Jenamani, Rajat Kumar | Cornell University |
Nguyen, Vy | Cornell University |
Moustafa, Ahmed | Cornell University |
Hu, Xuefeng | Cornell University |
Dimitropoulou, Katherine | Columbia University |
Bhattacharjee, Tapomayukh | Cornell University |
Keywords: Human Factors and Human-in-the-Loop, Physical Human-Robot Interaction, Human-Centered Robotics
Abstract: Existing work in physical robot caregiving is limited in its ability to provide long-term assistance. This is due to (i) lack of well-defined problems, (ii) diversity of tasks, and (iii) limited access to stakeholders from the caregiving community. We propose Structuring Physically Assistive Robotics for Caregiving with Stakeholders-in-the-loop (SPARCS) to address these challenges. SPARCS is a framework for physical robot caregiving comprising (i) Building Blocks that define physical robot caregiving scenarios, (ii) Structured Workflows - hierarchical workflows that enable us to answer the Whats and Hows of physical robot caregiving, and (iii) SPARCS-box, a web-based platform to facilitate dialogue between all stakeholders. We collect clinical data for six care recipients with varying disabilities and demonstrate the use of SPARCS in designing well-defined caregiving scenarios and identifying their care requirements. All the data and workflows from this study are available on SPARCS-box. We demonstrate the utility of SPARCS in building a robot-assisted feeding system for one of the care recipients. We also perform experiments to show the adaptability of this system to different caregiving scenarios. Finally, we identify open challenges in physical robot caregiving by consulting care recipients and caregivers. Supplementary material can be found at emprise.cs.cornell.edu/sparcs/.
|
|
10:10-10:20, Paper MoA-11.2 | |
To Ask for Help or Not to Ask: A Predictive Approach to Human-In-The-Loop Motion Planning for Robot Manipulation Tasks |
|
Papallas, Rafael | University of Leeds |
Dogar, Mehmet R | University of Leeds |
Keywords: Human Factors and Human-in-the-Loop, Manipulation Planning, Motion and Path Planning
Abstract: We present a predictive system for non-prehensile, physics-based motion planning in clutter with a human-in-the-loop. Recent shared-autonomous systems present motion planning performance improvements when high-level reasoning is provided by a human. Humans are usually good at quickly identifying high-level actions in high-dimensional spaces, and robots are good at converting high-level actions into valid robot trajectories. In this paper, we present a novel framework that permits a single human operator to effectively guide a fleet of robots in a virtual warehouse. The robots are tackling the problem of Reaching Through Clutter (RTC), where they are reaching onto cluttered shelves to grasp a goal object while pushing other obstacles out of the way. We exploit information from the motion planning algorithm to predict which robot requires human help the most and assign that robot to the human. With twenty virtual robots and a single human-operator, the results suggest that this approach improves the system’s overall performance compared to a baseline with no predictions. The results also show that there is a cap on how many robots can effectively be guided simultaneously by a single human operator.
|
|
10:20-10:30, Paper MoA-11.3 | |
You Are in My Way: Non-Verbal Social Cues for Legible Robot Navigation Behaviors |
|
Angelopoulos, Georgios | Interdepartmental Center for Advances in Robotic Surgery - ICARO |
Rossi, Alessandra | University of Naples Federico II |
Di Napoli, Claudia | ICAR - National Research Council |
Rossi, Silvia | Universita' Di Napoli Federico II |
Keywords: Robot Companions, Gesture, Posture and Facial Expressions, Design and Human Factors
Abstract: People and robots may need to cross each other in narrow spaces when they are sharing environments. It is expected that autonomous robots will behave in these contexts safely but also show social behaviors. Therefore, developing acceptable behaviors for autonomous robots in the areas mentioned above is a foreseeable problem for the Human-Robot Interaction (HRI) field. Our current work focuses on integrating legible non-verbal behaviors into the robot's social navigation to make nearby humans aware of its intended trajectory. Results from a within-subjects study involving 33 participants show that deictic gestures as navigational cues for humanoid robots result in fewer navigation conflicts than the use of a simulated gaze. Additionally, an increase in perceived anthropomorphism is found when the robot uses the deictic gesture as a cue. These findings show the importance of social behaviors for people avoidance and suggest a paradigm of such behaviors in future humanoid robotic applications.
|
|
10:30-10:40, Paper MoA-11.4 | |
Robot Trajectory Adaptation to Optimise the Trade-Off between Human Cognitive Ergonomics and Workplace Productivity in Collaborative Tasks |
|
Lagomarsino, Marta | Istituto Italiano Di Tecnologia |
Lorenzini, Marta | Istituto Italiano Di Tecnologia |
De Momi, Elena | Politecnico Di Milano |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Keywords: Human Factors and Human-in-the-Loop, Human-Centered Robotics, Human-Robot Collaboration
Abstract: In hybrid industrial environments, workers' comfort and positive perception of safety are essential requirements for successful acceptance and usage of collaborative robots. This paper proposes a novel human-robot interaction framework in which the robot behaviour is adapted online according to the operator's cognitive workload and stress. The method exploits the generation of B-spline trajectories in the joint space and formulation of a multi-objective optimisation problem to online adjust the total execution time and smoothness of the robot trajectories. The former ensures human efficiency and productivity of the workplace, while the latter contributes to safeguarding the user's comfort and cognitive ergonomics. The performance of the proposed framework was evaluated in a typical industrial task. Results demonstrated its capability to enhance the productivity of the human-robot dyad while mitigating the cognitive workload induced in the worker.
|
|
10:40-10:50, Paper MoA-11.5 | |
EMG-Based Hybrid Impedance-Force Control for Human-Robot Collaboration on Ultrasound Imaging |
|
Li, Teng | University of Alberta |
Xing, Hongjun | Nanjing University of Aeronautics and Astronautics |
Taghirad, Hamid D. | K.N.Toosi University of Technology |
Tavakoli, Mahdi | University of Alberta |
Keywords: Physical Human-Robot Interaction, Compliance and Impedance Control, Medical Robots and Systems
Abstract: Ultrasound (US) imaging is a common but physically demanding task in the medical field, and sonographers may need to put in considerable physical effort to produce high-quality US images. During physical human-robot interaction for US imaging, robot compliance is a critical feature that can ensure human user safety, while automatic force regulation can help to improve task performance. However, higher robot compliance may mean lower force regulation accuracy, and vice versa. In particular, the contact/non-contact status transition can largely affect the stability of the control system. In this paper, a novel electromyography (EMG)-based hybrid impedance-force control system is developed for the US imaging task. The proposed control system incorporates robot compliance and force regulation via a hybrid controller, while the EMG channel enables the user to modulate the trade-off between the two features online as necessary. Two experiments are conducted to examine the hybrid controller and show the necessity of involving an EMG-based modulator. A proof-of-concept study on US imaging is performed by implementing the proposed EMG-based control system, and its effectiveness is demonstrated. The proposed control system is promising for ensuring the robot's stability and the patient's safety, and thus obtaining high-quality US images, while monitoring and reducing sonographer fatigue. Furthermore, it can be easily adapted to other physically demanding tasks in the field of medicine.
|
|
10:50-11:00, Paper MoA-11.6 | |
Modeling Human Response to Robot Errors for Timely Error Detection |
|
Stiber, Maia | Johns Hopkins University |
Taylor, Russell H. | The Johns Hopkins University |
Huang, Chien-Ming | Johns Hopkins University |
Keywords: Human Factors and Human-in-the-Loop, Physical Human-Robot Interaction, Failure Detection and Recovery
Abstract: In human-robot collaboration, robot errors are inevitable---damaging user trust, willingness to work together, and task performance. Prior work has shown that people naturally respond to robot errors socially and that in social interactions it is possible to use human responses to detect errors. However, there is little exploration in the domain of non-social, physical human-robot collaboration such as assembly and tool retrieval. In this work, we investigate how people's organic, social responses to robot errors may be used to enable timely automatic detection of errors in physical human-robot interactions. We conducted a data collection study to obtain facial responses to train a real-time detection algorithm and a case study to explore the generalizability of our method with different task settings and errors. Our results show that natural social responses are effective signals for timely detection and localization of robot errors even in non-social contexts and that our method is robust across a variety of task contexts, robot errors, and user responses. This work contributes to robust error detection without detailed task specifications.
|
|
11:00-11:10, Paper MoA-11.7 | |
Effects of Multiple Avatar Images Presented Consecutively with Temporal Delays on Self-Body Recognition |
|
Oyama, Eimei | AIST |
Ioka, Yuya | Meiji University |
Agah, Arvin | University of Kansas |
Okada, Hiroyuki | Tamagawa University |
Shimada, Sotaro | Meiji University |
Keywords: Human Factors and Human-in-the-Loop, Virtual Reality and Interfaces, Telerobotics and Teleoperation
Abstract: Self-body awareness refers to the recognition of one's body as one's own and consists of two senses: "sense of body ownership" and "sense of agency." In telexistence/telepresence robot operation, time delays in the robot's motion degrade self-body awareness of the robot body. We investigated how self-body recognition is affected during telexistence robot operation in a VR space when the operator is presented with a real robot arm that simulates a delayed real robot, together with one or several virtual robot arms rendered with less delay than the real robot. These experimental conditions include a 'Predictive Display,' which is well known as a time delay countermeasure. The results suggest that virtual robot arms presented consecutively with less delay than a real robot arm do not induce a sense of body ownership over the real robot arm, but they enhance the sense of agency over the real robot arm, and that the sense of agency is stronger when the task requires precision.
|
|
11:10-11:20, Paper MoA-11.8 | |
Interactive Reinforcement Learning with Bayesian Fusion of Multimodal Advice |
|
Trick, Susanne | Technische Universität Darmstadt |
Herbert, Franziska | TU Darmstadt |
Rothkopf, Constantin | Frankfurt Institute for Advanced Studies |
Koert, Dorothea | Technische Universitaet Darmstadt |
Keywords: Human Factors and Human-in-the-Loop, Multi-Modal Perception for HRI, Reinforcement Learning
Abstract: Interactive Reinforcement Learning (IRL) has shown promising results in decreasing the learning times of Reinforcement Learning algorithms by incorporating human feedback and advice. In particular, the integration of multimodal feedback channels such as speech and gestures into IRL systems can enable more versatile and natural interaction of everyday users. In this paper, we propose a novel approach to integrate human advice from multiple modalities into IRL algorithms. For each advice modality we assume an individual base classifier that outputs a categorical probability distribution and fuse these distributions using the Bayesian fusion method Independent Opinion Pool. While existing approaches rely on heuristic fusion, our Bayesian approach is theoretically founded and fully exploits the uncertainty represented in the distributions. Experimental evaluations in a simulated grid world scenario and on a real-world human-robot interaction task with a 7-DoF robot arm show that our method clearly outperforms the closest related approach for multimodal IRL. In particular, our novel approach is more robust against misclassifications of the modalities' individual base classifiers.
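The Independent Opinion Pool fusion named above can be illustrated with a short sketch, in which the per-modality categorical distributions are multiplied elementwise and renormalized; the two-modality example values below are hypothetical, not from the paper.

import numpy as np

def independent_opinion_pool(distributions):
    # Fuse categorical advice distributions from independent modalities by
    # taking the normalized elementwise product.
    fused = np.ones_like(distributions[0], dtype=float)
    for p in distributions:
        fused = fused * p
    return fused / fused.sum()

# Hypothetical example: speech and gesture classifiers advising three actions.
speech = np.array([0.7, 0.2, 0.1])    # confident advice
gesture = np.array([0.4, 0.3, 0.3])   # uncertain advice
print(independent_opinion_pool([speech, gesture]))  # approx. [0.757, 0.162, 0.081]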
|
|
11:20-11:30, Paper MoA-11.9 | |
A Camera-Based Deep-Learning Solution for Visual Attention Zone Recognition in Maritime Navigational Operations |
|
Wu, Baiheng | NTNU |
Han, Peihua | Norwegian University of Science and Technology |
Kanazawa, Motoyasu | NTNU |
Hildre, Hans Petter | Aalesund University College |
Zhao, Luman | Norwegian University of Science and Technology |
Zhang, Houxiang | Norwegian University of Science and Technology |
Li, Guoyuan | Norwegian University of Science and Technology |
Keywords: Human Factors and Human-in-the-Loop, Computer Vision for Transportation, Deep Learning for Visual Perception
Abstract: The visual attention of navigators is imperative to understanding the logic of navigation and to the surveillance of the navigators' status and operations. Currently, existing studies rely on wearable eye-tracker glasses, but the high cost of the equipment and service and the limitations on usability have prevented such research from being performed extensively. In this letter, the authors propose a framework that is the first attempt in the maritime domain to provide a camera-based deep-learning (CaBDeeL) visual attention recognition solution that overcomes these shortcomings of intrusive eye trackers. A wide-angle camera is configured in front of the navigator in the advanced ship-bridge simulator so that visual attention reflected by facial and head movements is captured from the front view. A pair of eye-tracker glasses is used to classify the captured visual attention images and establish the primary database. Once the camera-captured images are classified, a convolutional neural network (CNN) is built as an automatic classifier. The CNN is applied to two scenarios, and it scores an overall precision of 95%.
|
|
MoA-12 |
Rm12 (Room J) |
Visual Learning |
Regular session |
Chair: Lee, Chun-Yi | National Tsing Hua University |
Co-Chair: Zhang, Jianwei | University of Hamburg |
|
10:00-10:10, Paper MoA-12.1 | |
Spatiotemporally Enhanced Photometric Loss for Self-Supervised Monocular Depth Estimation |
|
Zhang, Tianyu | Chinese Academy of Sciences |
Zhu, Dongchen | Shanghai Institute of Microsystem and Information Technology, Chi |
Zhang, Guanghui | Shanghai Institute of Microsystem and Information Technology, Ch |
Shi, Wenjun | Shanghai Institute of Microsystem and Information Technology |
Liu, Yanqing | Shanghai Institute of Microsystem and Information Technology, Ch |
Zhang, Xiaolin | Shanghai Institute of Microsystem and Information Technology, Chi |
Li, Jiamao | Shanghai Institute of Microsystem and Information Technology, Chi |
Keywords: Deep Learning for Visual Perception, Visual Learning, Deep Learning Methods
Abstract: Recovering depth information from a single image is a long-standing challenge, and self-supervised depth estimation methods have gradually attracted attention because they do not rely on high-cost ground truth. Constructing an accurate photometric loss based on photometric consistency is crucial for these self-supervised methods to obtain high-quality depth maps. However, the photometric loss in most studies treats all pixels indiscriminately, resulting in poor performance. In this paper, we propose two modules based on spatial and temporal cues to refine the photometric loss. Delving into the geometric model of photometric consistency, we introduce a depth-aware pixel correspondence module (DPC) inside the monocular depth estimation pipeline. It reduces the uncertainty of photometric errors by applying the homography matrix, instead of the fundamental matrix, to the projection of corresponding pixels in far regions. Furthermore, we design an omnidirectional auto-masking module (OA) to boost the robustness of our model, which utilizes temporal sequences to generate disturbance poses and hypothetical views to distinguish dynamic objects moving in different directions that violate photometric consistency. Experiments on the KITTI and Make3D datasets reveal that our framework achieves state-of-the-art performance.
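For context, a common form of the photometric loss that such modules refine combines SSIM and L1 terms between the target frame and the source frame warped by the predicted depth and pose; the sketch below (PyTorch assumed) shows that standard baseline term, not the paper's DPC or OA modules.

import torch
import torch.nn.functional as F

def photometric_error(target, warped, alpha=0.85):
    # target, warped: (B, 3, H, W); 'warped' is the source image reprojected
    # into the target view using the predicted depth and relative pose.
    l1 = (target - warped).abs().mean(1, keepdim=True)
    mu_t = F.avg_pool2d(target, 3, 1, 1)
    mu_w = F.avg_pool2d(warped, 3, 1, 1)
    sigma_t = F.avg_pool2d(target * target, 3, 1, 1) - mu_t * mu_t
    sigma_w = F.avg_pool2d(warped * warped, 3, 1, 1) - mu_w * mu_w
    sigma_tw = F.avg_pool2d(target * warped, 3, 1, 1) - mu_t * mu_w
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_t * mu_w + c1) * (2 * sigma_tw + c2)) / (
        (mu_t * mu_t + mu_w * mu_w + c1) * (sigma_t + sigma_w + c2))
    ssim_err = ((1 - ssim) / 2).clamp(0, 1).mean(1, keepdim=True)
    # per-pixel error map; typically minimized over source views and averaged
    return alpha * ssim_err + (1 - alpha) * l1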
|
|
10:10-10:20, Paper MoA-12.2 | |
J-RR: Joint Monocular Depth Estimation and Semantic Edge Detection Exploiting Reciprocal Relations |
|
Wu, Deming | Shanghaitech |
Zhu, Dongchen | Shanghai Institute of Microsystem and Information Technology, Chi |
Zhang, Guanghui | Shanghai Institute of Microsystem and Information Technology, Ch |
Shi, Wenjun | Shanghai Institute of Microsystem and Information Technology |
Zhang, Xiaolin | Shanghai Institute of Microsystem and Information Technology, Chi |
Li, Jiamao | Shanghai Institute of Microsystem and Information Technology, Chi |
Keywords: Deep Learning for Visual Perception, Visual Learning
Abstract: Depth estimation and semantic edge detection are two key tasks in computer vision that have made great progress. To date, how to jointly predict depth and semantic edges has rarely been explored. In this work, we first propose a flexible two-branch framework that lets the two tasks take advantage of each other, achieving a win-win situation. Specifically, for the semantic edge detection branch, an Enhanced Edge Weighting strategy (EEW) is designed, which learns weight information from the by-product of the depth branch, the depth edge, to enhance edge perception in features. Meanwhile, we make depth estimation benefit from semantic edge detection by introducing a Depth Edge Semantic Classification module (DESC). Furthermore, a double reconstruction (D-reconstruction) approach is presented, together with a semantic edge-guided disparity smoothing loss, to mitigate the ambiguities of self-supervised depth estimation. Experiments on the Cityscapes dataset demonstrate that our framework outperforms the state-of-the-art method in depth estimation along with a significant improvement in semantic edge detection.
|
|
10:20-10:30, Paper MoA-12.3 | |
Towards Two-View 6D Object Pose Estimation: A Comparative Study on Fusion Strategy |
|
Wu, Jun | Zhejiang University |
Liu, Lilu | Zhejiang University |
Wang, Yue | Zhejiang University |
Xiong, Rong | Zhejiang University |
Keywords: Deep Learning for Visual Perception, Visual Learning, Perception for Grasping and Manipulation
Abstract: Current RGB-based 6D object pose estimation methods have achieved noticeable performance on datasets and in real-world applications. However, predicting 6D pose from single-image 2D features is susceptible to disturbance from changes in the environment and from textureless or resemblant object surfaces. Hence, RGB-based methods generally achieve less competitive results than RGBD-based methods, which deploy both image features and 3D structure features. To narrow down this performance gap, this paper proposes a framework for 6D object pose estimation that learns implicit 3D information from two RGB images. Combining the learned 3D information and 2D image features, we establish more stable correspondence between the scene and the object models. To seek the method that best utilizes 3D information from RGB inputs, we conduct an investigation of three different approaches: Early-Fusion, Mid-Fusion, and Late-Fusion. We ascertain that the Mid-Fusion approach restores the most precise 3D keypoints useful for object pose estimation. The experiments show that our method outperforms state-of-the-art RGB-based methods and achieves comparable results with RGBD-based methods.
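A conceptual sketch of the three fusion points compared above, assuming PyTorch; the module boundaries, channel handling, and the averaging used for Late-Fusion are simplifying assumptions, not the authors' architecture.

import torch
import torch.nn as nn

class TwoViewFusion(nn.Module):
    # backbone_a/backbone_b: per-view feature extractors; head: pose predictor.
    # For "early" fusion, backbone_a is assumed to accept the 6-channel input
    # formed by concatenating both views.
    def __init__(self, backbone_a, backbone_b, head, mode="mid"):
        super().__init__()
        self.backbone_a, self.backbone_b = backbone_a, backbone_b
        self.head, self.mode = head, mode

    def forward(self, img_a, img_b):
        if self.mode == "early":
            # fuse the raw inputs, then run a single network
            return self.head(self.backbone_a(torch.cat([img_a, img_b], dim=1)))
        feat_a, feat_b = self.backbone_a(img_a), self.backbone_b(img_b)
        if self.mode == "mid":
            # fuse intermediate features before the prediction head
            return self.head(torch.cat([feat_a, feat_b], dim=1))
        # "late": fuse the two independent per-view predictions
        return 0.5 * (self.head(feat_a) + self.head(feat_b))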
|
|
10:30-10:40, Paper MoA-12.4 | |
Maximizing Self-Supervision from Thermal Image for Effective Self-Supervised Learning of Depth and Ego-Motion |
|
Shin, Ukcheol | KAIST(Korea Advanced Institute of Science and Technology) |
Lee, Kyunghyun | KAIST |
Lee, Byeong-Uk | KAIST |
Kweon, In So | KAIST |
Keywords: Deep Learning for Visual Perception, Visual Learning, Computer Vision for Transportation
Abstract: Recently, self-supervised learning of depth and ego-motion from thermal images has shown strong robustness and reliability under challenging lighting and weather conditions. However, inherent thermal image properties such as weak contrast, blurry edges, and noise hinder the generation of effective self-supervision from thermal images. Therefore, most previous research relies on additional self-supervisory sources such as RGB video, generative models, and Lidar information. In this paper, we conduct an in-depth analysis of the thermal image characteristics that degrade self-supervision from thermal images. Based on the analysis, we propose an effective thermal image mapping method that significantly increases image information, such as overall structure, contrast, and details, while preserving temporal consistency. By resolving the fundamental problem of the thermal image, our depth and pose networks trained only with thermal images achieve state-of-the-art results without utilizing any extra self-supervisory source. To the best of our knowledge, this work is the first self-supervised learning approach to train monocular depth and relative pose networks with only thermal images.
|
|
10:40-10:50, Paper MoA-12.5 | |
Domain Invariant Siamese Attention Mask for Small Object Change Detection Via Everyday Indoor Robot Navigation |
|
Takeda, Koji | Tokyo Metropolitan Industrial Technology Research Institute |
Tanaka, Kanji | University of Fukui |
Nakamura, Yoshimasa | Tokyo Metropolitan Industrial Technology Research Institute |
Keywords: Visual Learning, Object Detection, Segmentation and Categorization, Data Sets for Robotic Vision
Abstract: The problem of image change detection via everyday indoor robot navigation is explored from a novel perspective of the self-attention technique. Detecting semantically non-distinctive and visually small changes remains a key challenge in the robotics community. Intuitively, these small non-distinctive changes may be better handled by the recent paradigm of the attention mechanism, which is the basic idea of this work. However, existing self-attention models require significant retraining cost per domain, so it is not directly applicable to robotics applications. We propose a new self-attention technique with an ability of unsupervised on-the-fly domain adaptation, which introduces an attention mask into the intermediate layer of an image change detection model, without modifying the input and output layers of the model. Experiments, in which an indoor robot aims to detect visually small changes in everyday navigation, demonstrate that our attention technique significantly boosts the state-of-the-art image change detection model. We will make this new dataset publicly available for future research in this area.
|
|
10:50-11:00, Paper MoA-12.6 | |
Investigation of Factorized Optical Flows As Mid-Level Representations |
|
Yang, Hsuan-Kung | National Tsing Hua University |
Hsiao, Tsu-Ching | National Tsing Hua University |
Liao, Ting-Hsuan | National Tsing Hua University |
Liu, Hsu-Shen | National Tsing Hua University |
Tsao, Li-Yuan | National Tsing Hua University |
Wang, Tzu-Wen | National Tsing Hua University |
Yang, Shan-Ya | National Tsing Hua University |
Chen, Yu-Wen | National Tsing Hua University |
Liao, HuangRu | National Tsing Hua University |
Lee, Chun-Yi | National Tsing Hua University |
Keywords: Visual Learning, Machine Learning for Robot Control, Collision Avoidance
Abstract: In this paper, we introduce a new concept of incorporating factorized flow maps as mid-level representations, for bridging the perception and the control modules in modular learning based robotic frameworks. To investigate the advantages of factorized flow maps and examine their interplay with the other types of mid-level representations, we further develop a configurable framework, along with four different environments that contain both static and dynamic objects, for analyzing the impacts of factorized optical flow maps on the performance of deep reinforcement learning agents. Based on this framework, we report our experimental results on various scenarios, and offer a set of analyses to justify our hypothesis. Finally, we validate flow factorization in real world scenarios.
|
|
11:00-11:10, Paper MoA-12.7 | |
Augment-Connect-Explore: A Paradigm for Visual Action Planning with Data Scarcity |
|
Lippi, Martina | University of Roma Tre |
Welle, Michael C. | KTH Royal Institute of Technology |
Poklukar, Petra | KTH Royal Institute of Technology |
Marino, Alessandro | University of Cassino and Southern Lazio |
Kragic, Danica | KTH |
Keywords: Visual Learning, Representation Learning, Task Planning
Abstract: Visual action planning particularly excels in applications where the state of the system cannot be computed explicitly, such as manipulation of deformable objects, as it enables planning directly from raw images. Even though the field has been significantly accelerated by deep learning techniques, a crucial requirement for their success is the availability of a large amount of data. In this work, we propose the Augment-Connect-Explore (ACE) paradigm to enable visual action planning in cases of data scarcity. We build upon the Latent Space Roadmap (LSR) framework, which performs planning with a graph built in a low-dimensional latent space. In particular, ACE is used to i) Augment the available training dataset by autonomously creating new pairs of datapoints, ii) create new unobserved Connections among representations of states in the latent graph, and iii) Explore new regions of the latent space in a targeted manner. We validate the proposed approach on both a simulated box stacking task and a real-world folding task, showing its applicability to rigid and deformable object manipulation, respectively.
|
|
11:10-11:20, Paper MoA-12.8 | |
Learning 6-DoF Task-Oriented Grasp Detection Via Implicit Estimation and Visual Affordance |
|
Chen, Wenkai | University of Hamburg |
Liang, Hongzhuo | University of Hamburg |
Chen, Zhaopeng | University of Hamburg |
Sun, Fuchun | Tsinghua University |
Zhang, Jianwei | University of Hamburg |
Keywords: Visual Learning, Deep Learning in Grasping and Manipulation
Abstract: Currently, task-oriented grasp detection approaches are mostly based on pixel-level affordance detection and semantic segmentation. These pixel-level approaches rely heavily on the accuracy of the 2D affordance mask, and the generated grasp candidates are restricted to a small workspace. To mitigate these limitations, we propose a novel 6-DoF task-oriented grasp detection framework, which takes the observed object point cloud as input and predicts diverse 6-DoF grasp poses for different tasks. Specifically, the implicit estimation network and visual affordance network in this framework directly predict coarse grasp candidates and a corresponding 3D affordance heatmap for each potential task, respectively. Furthermore, the grasping scores from the coarse grasps are combined with the heatmap values to generate more accurate and finer candidates. Our proposed framework shows significant improvements over baselines for existing and novel objects on our self-constructed simulation dataset. Although our framework is trained on simulated objects and environments, the final generated grasp candidates can be accurately and stably executed in real robot experiments when the object is randomly placed on a support surface.
|
|
11:20-11:30, Paper MoA-12.9 | |
NARF22: Neural Articulated Radiance Fields for Configuration-Aware Rendering |
|
Lewis, Stanley | University of Michigan |
Pavlasek, Jana | University of Michigan |
Jenkins, Odest Chadwicke | University of Michigan |
Keywords: Visual Learning, RGB-D Perception, Deep Learning for Visual Perception
Abstract: Articulated objects pose a unique challenge for robotic perception and manipulation. Their increased number of degrees-of-freedom makes tasks such as localization computationally difficult, while also making the process of real-world dataset collection unscalable. With the aim of addressing these scalability issues, we propose Neural Articulated Radiance Fields (NARF22), a pipeline which uses a fully-differentiable, configuration-parameterized Neural Radiance Field (NeRF) as a means of providing high quality renderings of articulated objects. NARF22 requires no explicit knowledge of the object structure at inference time. We propose a two-stage parts-based training mechanism which allows the object rendering models to generalize well across the configuration space even if the underlying training data has as few as one configuration represented. We demonstrate the efficacy of NARF22 by training configurable renderers on a real-world articulated tool dataset collected via a Fetch mobile manipulation robot. We show the applicability of the model to gradient-based inference methods through a configuration estimation and 6 degree-of-freedom pose refinement task.
|
|
MoA-13 |
Rm13 (Room K) |
Mapping 1 |
Regular session |
Chair: Tan, U-Xuan | Singapore University of Technology and Design |
Co-Chair: Rajvanshi, Abhinav | SRI International |
|
10:00-10:10, Paper MoA-13.1 | |
Multi-Agent Relative Pose Estimation with UWB and Constrained Communications |
|
Fishberg, Andrew | MIT |
How, Jonathan | Massachusetts Institute of Technology |
Keywords: Localization, Multi-Robot Systems, Range Sensing
Abstract: Inter-agent relative localization is critical for any multi-robot system operating in the absence of external positioning infrastructure or prior environmental knowledge. We propose a novel inter-agent relative 2D pose estimation system where each participating agent is equipped with several ultra-wideband (UWB) ranging tags. Prior work typically supplements noisy UWB range measurements with additional continuously transmitted data, such as odometry, making these approaches scale poorly with increased swarm size or decreased communication throughput. This approach addresses these concerns by using only locally collected UWB measurements with no additionally transmitted data. By modeling observed ranging biases and systematic antenna obstructions in our proposed optimization solution, our experimental results demonstrate an improved mean position error (while remaining competitive in other metrics) over a similar state-of-the-art approach that additionally relies on continuously transmitted odometry.
|
|
10:10-10:20, Paper MoA-13.2 | |
Ranging-Aided Ground Robot Navigation Using UWB Nodes at Unknown Locations |
|
Rajvanshi, Abhinav | SRI International |
Chiu, Han-Pang | SRI International |
Krasner, Alex | SRI International |
Sizintsev, Mikhail | SRI International |
Murray, Glenn | SRI International |
Samarasekera, Supun | SRI Sarnoff |
Keywords: Localization, Range Sensing, Cooperating Robots
Abstract: Ranging information from ultra-wideband (UWB) ranging radios can be used to improve estimated navigation accuracy of a ground robot with other on-board sensors. However, all ranging-aided navigation methods demand the locations of ranging nodes to be known, which is not suitable for time-pressed situations, dynamic cluttered environments, or collaborative navigation applications. This paper describes a new ranging-aided navigation approach that does not require the locations of ranging radios. Our approach formulates relative pose constraints using ranging readings. The formulation is based on geometric relationships between each stationary ranging node and two ranging antennas on the moving robot across time. Our experiments show that estimated navigation accuracy of the ground robot is substantially enhanced with ranging information using our approach under a variety of scenarios, when ranging nodes are placed at unknown locations. We analyze and compare our performance with a traditional ranging-aided method, which requires mapping the positions of ranging nodes. We also demonstrate the applicability of our approach for collaborative navigation in large-scale unknown environments, by using ranging information from one mobile robot to improve navigation estimation of the other robot. This application does not require the installation of ranging nodes at fixed locations.
|
|
10:20-10:30, Paper MoA-13.3 | |
Efficient and Probabilistic Adaptive Voxel Mapping for Accurate Online LiDAR Odometry |
|
Chongjian, Yuan | The University of Hong Kong |
Xu, Wei | University of Hong Kong |
Liu, Xiyuan | The University of Hong Kong |
Hong, Xiaoping | Southern University of Science and Technology |
Zhang, Fu | University of Hong Kong |
Keywords: Mapping, Localization, SLAM
Abstract: This paper proposes an efficient and probabilistic adaptive voxel mapping method for LiDAR odometry. The map is a collection of voxels; each contains one plane (or edge) feature that enables the probabilistic representation of the environment and accurate registration of a new LiDAR scan. We further analyze the need for coarse-to-fine voxel mapping and then use a novel voxel map organized by a Hash table and octrees to build and update the map efficiently. We apply the proposed voxel map to an iterated extended Kalman filter and construct a maximum a posteriori probability problem for pose estimation. Experiments on the open KITTI dataset show the high accuracy and efficiency of our method compared to other state-of-the-art methods. Outdoor experiments on unstructured environments with non-repetitive scanning LiDARs further verify the adaptability of our mapping method to different environments and LiDAR scanning patterns (see our attached video). Our codes and dataset are open-sourced on GitHub.
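To make the coarse layer of such a voxel map concrete, the following is a minimal Python sketch of a hash-indexed voxel map in which each voxel fits a plane to its points via an eigen-decomposition of the point covariance. This is only an illustration under assumptions: the octree subdivision, plane uncertainty, and the iterated Kalman filter described in the abstract are omitted, and the names (VoxelPlaneMap, insert, plane) are illustrative rather than taken from the authors' code.

```python
import numpy as np
from collections import defaultdict

class VoxelPlaneMap:
    """Hash-indexed voxel map; each voxel keeps a plane fit of its points (coarse layer only)."""

    def __init__(self, voxel_size=1.0):
        self.voxel_size = voxel_size
        self.points = defaultdict(list)

    def insert(self, pts):
        """Hash each 3D point into its voxel key and store it."""
        keys = np.floor(np.asarray(pts) / self.voxel_size).astype(int)
        for p, k in zip(pts, map(tuple, keys)):
            self.points[k].append(np.asarray(p, dtype=float))

    def plane(self, key):
        """Fit a plane (centroid, normal) to the points stored in one voxel."""
        pts = np.asarray(self.points[key])
        centroid = pts.mean(axis=0)
        cov = np.cov((pts - centroid).T)
        eigval, eigvec = np.linalg.eigh(cov)
        normal = eigvec[:, 0]                           # direction of least variance
        planarity = eigval[0] / (eigval.sum() + 1e-12)  # small value -> planar voxel
        return centroid, normal, planarity

# Toy usage: points on the z=0 plane fall into one voxel and yield a vertical normal.
vmap = VoxelPlaneMap(voxel_size=2.0)
rng = np.random.default_rng(0)
pts = np.column_stack([rng.uniform(0, 1.5, 50), rng.uniform(0, 1.5, 50), np.zeros(50)])
vmap.insert(pts)
print(vmap.plane((0, 0, 0)))
```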
|
|
10:30-10:40, Paper MoA-13.4 | |
360ST-Mapping: An Online Semantics-Guided Topological Mapping Module for Omnidirectional Visual SLAM |
|
Liu, Hongji | The Hong Kong University of Science and Technology |
Huang, Huajian | The Hong Kong University of Science and Technology |
Yeung, Sai-Kit | Hong Kong University of Science and Technology |
Liu, Ming | Hong Kong University of Science and Technology |
Keywords: Mapping, Omnidirectional Vision, SLAM
Abstract: As an abstract representation of the environment structure, a topological map has advantageous properties for path-planning and navigation. Here we propose an online topological mapping method, 360ST-Mapping, using omnidirectional vision. The 360-degree field-of-view allows the agent to obtain consistent observations and incrementally extract topological environment information. Moreover, we leverage semantic information to guide topological place recognition, further improving performance. The topological map possessing semantic information has the potential to support semantics-related advanced tasks. After integrating the topological mapping module into the omnidirectional visual SLAM system, we conducted extensive experiments in several large-scale indoor scenes and validated the method's effectiveness.
|
|
10:40-10:50, Paper MoA-13.5 | |
Online Distance Field Priors for Gaussian Process Implicit Surfaces |
|
Ivan, Jean-Paul André | Örebro University |
Stoyanov, Todor | Örebro University |
Stork, Johannes A. | Orebro University |
Keywords: Mapping
Abstract: Gaussian process (GP) implicit surface models provide environment and object representations which elegantly address noise and uncertainty while remaining sufficiently flexible to capture complex geometry. However, GP models quickly become intractable as the size of the observation set grows - a trait which is difficult to reconcile with the rate at which modern range sensors produce data. Furthermore, naive applications of GPs to implicit surface models allocate model resources uniformly, thus using precious resources to capture simple geometry. In contrast to prior work addressing these challenges through model sparsification, spatial partitioning, or ad-hoc filtering, we propose introducing model bias online through the GP's mean function. We achieve more accurate distance fields using smaller models by creating a distance field prior from features which are easy to extract and have analytic distance fields. In particular, we demonstrate this approach using linear features. We show the proposed distance field halves model size in a 2D mapping task using data from a SICK S300 sensor. When applied to a single 3D scene from the TUM RGB-D SLAM dataset we achieve a fivefold reduction in model size. Our proposed prior results in more accurate GP implicit surfaces, while allowing existing models to function in larger environments or with larger spatial partitions due to reduced model size.
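As a rough illustration of biasing a GP distance field through its mean function, the sketch below performs plain GP regression where the prior mean is the analytic distance to a line segment (a "linear feature"), so the GP only has to model the residual. The kernel, hyperparameters, and the helper names (segment_distance, gp_distance_field) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def segment_distance(points, a, b):
    """Analytic (unsigned) distance from 2D points (N, 2) to the segment a-b."""
    ab = b - a
    t = np.clip(((points - a) @ ab) / (ab @ ab), 0.0, 1.0)
    proj = a + t[:, None] * ab
    return np.linalg.norm(points - proj, axis=1)

def rbf(x1, x2, length=0.5, signal=0.2):
    """Squared-exponential covariance between two sets of 2D points."""
    d2 = ((x1[:, None, :] - x2[None, :, :]) ** 2).sum(-1)
    return signal**2 * np.exp(-0.5 * d2 / length**2)

def gp_distance_field(x_train, y_train, x_query, prior_mean, noise=1e-2):
    """GP regression of a distance field with a non-zero prior mean.

    With the analytic prior doing most of the work, the GP only models the
    residual, which is what allows a much smaller model.
    """
    K = rbf(x_train, x_train) + noise**2 * np.eye(len(x_train))
    ks = rbf(x_query, x_train)
    alpha = np.linalg.solve(K, y_train - prior_mean(x_train))
    return prior_mean(x_query) + ks @ alpha

# Toy usage: a wall modelled as a line segment supplies the prior mean.
a, b = np.array([0.0, 0.0]), np.array([4.0, 0.0])
prior = lambda p: segment_distance(p, a, b)
x_train = np.array([[1.0, 0.5], [2.0, 1.0], [3.0, 0.2]])
y_train = prior(x_train) + np.array([0.05, -0.03, 0.02])   # noisy observed distances
print(gp_distance_field(x_train, y_train, np.array([[2.5, 0.7]]), prior))
```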
|
|
10:50-11:00, Paper MoA-13.6 | |
An Algorithm for the SE(3)-Transformation on Neural Implicit Maps for Remapping Functions |
|
Yuan, Yijun | University of Wuerzburg |
Nuechter, Andreas | University of Würzburg |
Keywords: Mapping, SLAM
Abstract: Implicit representations are widely used for object reconstruction due to their efficiency and flexibility. In 2021, a novel structure named the neural implicit map was introduced for incremental reconstruction. A neural implicit map alleviates the inefficient memory cost of previous online 3D dense reconstruction while producing better quality. However, the neural implicit map suffers from the limitation that it does not support remapping, as the frames of scans are encoded into a deep prior after generating the neural implicit map. This means that neither the generation process is invertible nor the deep prior transformable. This non-remappable property makes it impossible to apply loop-closure techniques. We present a neural implicit map based transformation algorithm to fill this gap. As our neural implicit map is transformable, our model supports remapping for this special map of latent features. Experiments show that our remapping module is capable of transforming neural implicit maps to new poses well. Embedded into a SLAM framework, our mapping model is able to tackle the remapping of loop closures and demonstrates high-quality surface reconstruction. Our implementation is available on GitHub at https://github.com/Jarrome/IMT_Mapping for the research community.
|
|
11:00-11:10, Paper MoA-13.7 | |
PlaneSDF-Based Change Detection for Long-Term Dense Mapping |
|
Fu, Jiahui | Massachusetts Institute of Technology |
Lin, Chengyuan | Meta |
Taguchi, Yuichi | Meta |
Cohen, Andrea | Meta Inc |
Zhang, Yifu | Meta |
Mylabathula, Stephen | Meta (Facebook) |
Leonard, John | MIT |
Keywords: Mapping, SLAM, Range Sensing
Abstract: The ability to process environment maps across multiple sessions is critical for robots operating over extended periods of time. Specifically, it is desirable for autonomous agents to detect changes amongst maps of different sessions so as to gain a conflict-free understanding of the current environment. In this paper, we look into the problem of change detection based on a novel map representation, dubbed Plane Signed Distance Fields (PlaneSDF), where dense maps are represented as a collection of planes and their associated geometric components in SDF volumes. Given point clouds of the source and target scenes, we propose a three-step PlaneSDF-based change detection approach: (1) PlaneSDF volumes are instantiated within each scene and registered across scenes using plane poses; 2D height maps and object maps are extracted per volume via height projection and connected component analysis. (2) Height maps are compared and intersected with the object map to produce a 2D change location mask for changed object candidates in the source scene. (3) 3D geometric validation is performed using SDF-derived features per object candidate for change mask refinement. We evaluate our approach on both synthetic and real-world datasets and demonstrate its effectiveness via the task of changed object detection. Supplementary video is available at: https://youtu.be/oh-MQPWTwZI
|
|
11:10-11:20, Paper MoA-13.8 | |
Efficient 2D LIDAR Based Map Updating for Long Term Operations in Dynamic Environments |
|
Stefanini, Elisa | Centro Di Ricerca E. Piaggio - Università Di Pisa |
Ciancolini, Enrico | Università Di Pisa |
Settimi, Alessandro | Proxima Robotics S.r.l |
Pallottino, Lucia | Università Di Pisa |
Keywords: Mapping, Localization, SLAM
Abstract: Long-term operations of autonomous vehicles and mobile robots in logistics and service applications are still a challenge. To avoid continuous re-mapping, the map can be updated to obtain a consistent representation of the current environment. In this paper, we propose a novel LIDAR-based occupancy grid map updating algorithm for dynamic environments that takes into account possible localisation and measurement errors. The proposed approach allows robust long-term operations as it can detect changes in the working area even in the presence of moving elements. Results highlighting map quality and localisation performance, both in simulation and experiments, are reported. We will extend the localisation performance analysis to real-world experiments employing an external tracking system.
|
|
11:20-11:30, Paper MoA-13.9 | |
Monocular UAV Localisation with Deep Learning and Uncertainty Propagation |
|
Oh, Xueyan | Singapore University of Technology and Design |
Lim, Ryan Jon Hui | Singapore University of Technology & Design |
Loh, Leonard | Singapore University of Technology and Design |
Tan, Chee How | Singapore University of Technology & Design |
Foong, Shaohui | Singapore University of Technology and Design |
Tan, U-Xuan | Singapore University of Technology and Design |
Keywords: Localization, Aerial Systems: Applications, Visual Tracking
Abstract: In this paper, we propose a ground-based monocular UAV localisation system that detects and localises an LED marker attached to the underside of a UAV. Our system removes the need for extensive infrastructure and calibration, unlike existing technologies such as UWB, radio frequency and multi-camera systems often used for localisation in GPS-denied environments. To improve deployability for real-world applications without the need to collect an extensive real dataset, we train a CNN on synthetic binary images, as opposed to using real images as in existing monocular UAV localisation methods, and factor in the camera's zoom to allow tracking of UAVs flying at further distances. We propose the NoisyCutout algorithm for augmenting synthetic binary images to simulate binary images processed from real images and show that it improves localisation accuracy compared to the existing salt-and-pepper and Cutout augmentation methods. We also leverage uncertainty propagation to modify the CNN's loss function and show that this also improves localisation accuracy. Real-world experiments are conducted to evaluate our methods, and we achieve an overall 3D RMSE of approximately 0.41 m.
|
|
MoA-14 |
Rm14 (Room 501) |
Human-Centered Robotics 1 |
Regular session |
Chair: Cruz, Francisco | UNSW Sydney |
Co-Chair: Itadera, Shunki | National Institute of Advanced Industrial Science and Technology |
|
10:00-10:10, Paper MoA-14.1 | |
Task Decoupling in Preference-Based Reinforcement Learning for Personalized Human-Robot Interaction |
|
Liu, Mingjiang | Nanjing University |
Chen, Chunlin | Nanjing University |
Keywords: Human-Centered Robotics, Reinforcement Learning, Human-Robot Collaboration
Abstract: Intelligent robots designed to interact with humans in the real world need to adapt to the preferences of different individuals. Preference-based reinforcement learning (RL) has shown great potential for teaching robots to learn personalized behaviors from interacting with humans without a meticulous, hand-crafted reward function, instead learning a reward from a human's preferences between two robot trajectories. However, poor feedback efficiency and poor exploration in the state and reward spaces make current preference-based RL algorithms perform poorly in complex interactive tasks. To improve the performance of preference-based RL, we incorporate prior knowledge of the task into preference-based RL. Specifically, we decouple the task from preference in human-robot interaction. We utilize a sketchy task reward derived from task priors to guide the robot toward more effective task exploration. Then a reward learned from preference-based RL is used to optimize the robot's policy to align with human preferences. In addition, these two parts are combined organically via reward shaping. The experimental results show that our method is a practical and effective solution for personalized human-robot interaction. Code is available at https://github.com/Wenminggong/PbRL_for_PHRI.
|
|
10:10-10:20, Paper MoA-14.2 | |
Holo-SpoK: Affordance-Aware Augmented Reality Control of Legged Manipulators |
|
Chacon Quesada, Rodrigo | Imperial College London |
Demiris, Yiannis | Imperial College London |
Keywords: Human-Centered Robotics, Virtual Reality and Interfaces, Human-Robot Collaboration
Abstract: Although there is extensive research regarding legged manipulators, comparatively little focuses on their User Interfaces (UIs). Towards extending the state of the art in this domain, in this work, we integrate a Boston Dynamics (BD) Spot with a lightweight 7 DoF Kinova robot arm and a Robotiq 2F-85 gripper into a legged manipulator. Furthermore, we jointly control the robotic platform using an affordance-aware Augmented Reality (AR) Head-Mounted Display (HMD) UI developed for the Microsoft HoloLens 2. We named the combined platform Holo-SpoK. Moreover, we explain how this manipulator colocalises with the HoloLens 2 for its control through AR. In addition, we present the details of our algorithms for autonomously detecting grasp-ability affordances and for the refinement of the positions obtained via vision-based colocalisation. We validate the suitability of our proposed methods with multiple navigation and manipulation experiments. To the best of our knowledge, this is the first demonstration of an AR HMD UI for controlling legged manipulators.
|
|
10:20-10:30, Paper MoA-14.3 | |
Keeping Humans in the Loop: Teaching Via Feedback in Continuous Action Space Environments |
|
Sheidlower, Isaac | Tufts University |
Moore, Allison | Tufts University |
Short, Elaine Schaertl | Tufts University |
Keywords: Human-Centered Robotics, Reinforcement Learning, Human-Robot Collaboration
Abstract: Interactive Reinforcement Learning (IntRL) allows human teachers to accelerate the learning process of Reinforcement Learning (RL) robots. However, IntRL has largely been limited to tasks with discrete-action spaces in which actions are relatively slow. This limits IntRL’s application to more complicated and challenging robotic tasks, the very tasks that modern RL is particularly well-suited for. We seek to bridge this gap by presenting Continuous Action-space Interactive Reinforcement learning (CAIR): the first continuous action-space IntRL algorithm that is capable of using teacher feedback to out-perform state-of-the-art RL algorithms in those tasks. CAIR combines policies learned from the environment and the teacher into a single policy that proportionally weights the two policies based on their agreement. This allows a CAIR agent to learn a relatively stable policy despite potentially noisy or coarse teacher feedback. We validate our approach in two simulated robotics tasks with easy-to-design and -understand heuristic oracle teachers. Furthermore, we validate our approach in a human subjects study through Amazon Mechanical Turk and show CAIR out-performs the prior state-of-the-art in Interactive RL.
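To illustrate the general idea of combining an environment-learned policy with a teacher-derived policy by weighting them according to their agreement, here is a minimal sketch. The exponential agreement measure and the blending rule are assumptions chosen for illustration only; they are not the CAIR update described in the paper.

```python
import numpy as np

def agreement_weighted_action(a_env, a_teacher, temperature=1.0):
    """Blend two continuous actions, weighting the teacher by how much the policies agree.

    a_env / a_teacher: action vectors from the environment-learned policy and
    the teacher-feedback policy. The agreement measure below is a hypothetical
    choice (exponential in the action distance), not the paper's formulation.
    """
    agreement = np.exp(-np.linalg.norm(a_env - a_teacher) / temperature)  # in (0, 1]
    return (1.0 - agreement) * a_env + agreement * a_teacher

# Toy usage: nearly agreeing actions yield a blend close to the teacher's action.
print(agreement_weighted_action(np.array([0.2, 0.0]), np.array([0.25, 0.05])))
```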
|
|
10:30-10:40, Paper MoA-14.4 | |
An Adaptive, Affordable, Humanlike Arm Hand System for Deaf and Deaf Blind Communication with the American Sign Language |
|
Chang, Che-Ming | University of Auckland |
Sanches, Felipe Padula | University of Auckland |
Gao, Geng | University of Auckland |
Johnson, Samantha | Northeastern University |
Liarokapis, Minas | The University of Auckland |
Keywords: Human-Centered Robotics, Physical Human-Robot Interaction
Abstract: To communicate, the approximately 1.5 million Americans living with deafblindness use tactile American Sign Language (t-ASL). To provide DeafBlind (DB) individuals with a means of using their primary communication language without the use of an interpreter, we developed an assistive technology that promotes their autonomy. The TATUM (Tactile ASL Translational User Mechanism) anthropomorphic arm hand system leverages previous developments of a fingerspelling hand to sign more complex ASL words and phrases. The TATUM hand-wrist system is attached onto a 4 DOF robot arm, and a human motion recognition and human-to-robot gesture transfer framework is used for signing recognition and replication. In particular, signing trajectories based on vision-based motion capture data from a sign demonstrator were used to control the robot's actuators. The performance of the system was evaluated through tactile-based sign recognition performed by a blinded user and for its accuracy with novice, sighted users.
|
|
10:40-10:50, Paper MoA-14.5 | |
Examining Distance in UAV Gesture Perception |
|
Jelonek, Karissa | University of Nebraska-Lincoln |
Fletcher, Paul | University of Nebraska |
Duncan, Brittany | University of Nebraska, Lincoln |
Detweiler, Carrick | University of Nebraska-Lincoln |
Keywords: Human-Centered Robotics, Gesture, Posture and Facial Expressions, Aerial Systems: Applications
Abstract: Unmanned aerial vehicles (UAVs) are becoming more common, presenting the need for effective human-robot communication strategies that address the unique nature of unmanned aerial flight. Visual communication via drone flight paths, also called gestures, may prove to be an ideal method. However, the effectiveness of visual communication techniques is dependent on several factors including an observer's position relative to a UAV. Previous work has studied the maximum line-of-sight at which observers can identify a small UAV. However, this work did not consider how changes in distance may affect an observer's ability to perceive the shape of a UAV's motion. In this study, we conduct a series of online surveys to evaluate how changes in line-of-sight distance and gesture size affect observers' ability to identify and distinguish between UAV gestures. We first examine observers' ability to accurately identify gestures when adjusting a gesture's size relative to the size of a UAV. We then measure how observers' ability to identify gestures changes with respect to varying line-of-sight distances. Lastly, we consider how altering the size of a UAV gesture may improve an observer's ability to identify drone gestures from varying distances. Our results show that increasing the gesture size across varying UAV to gesture ratios did not have a significant effect on participant response accuracy. We found that between 17 m and 75 m from the observer, their ability to accurately identify a drone gesture was inversely proportional to the distance between the observer and the drone. Finally, we found that maintaining a gesture's apparent size improves participant response accuracy over changing line-of-sight distances.
|
|
10:50-11:00, Paper MoA-14.6 | |
GA-STT: Human Trajectory Prediction with Group Aware Spatial-Temporal Transformer |
|
Zhou, Lei | Institute of Robotics and Automatic Information System, Nankai U |
Yang, Dingye | Nankai University |
Zhai, Xiaolin | Nankai University |
Wu, Shichao | Nankai University |
Hu, Zhengxi | Nankai University |
Liu, Jingtai | Nankai University |
Keywords: Human-Centered Robotics, Long term Interaction, Modeling and Simulating Humans
Abstract: Human trajectory prediction is a crucial yet challenging problem, which is of fundamental importance to robotics and autonomous driving vehicles. The core challenge lies in effectively modelling the socially aware spatial interaction and complex temporal dependencies among crowds. However, previous methods either model spatial and temporal information separately or only use individual features without exploring the intra-structure of the crowds. We propose a novel trajectory prediction framework termed GA-STT, a group aware spatial-temporal transformer network to address these issues. Specifically, we first get the individual representations supervised by group-based annotations. Then, we model the complex spatial-temporal interactions with spatial and temporal transformers separately and fuse spatial-temporal embedding through the cross-attention mechanism. Results on the publicly available ETH/UCY datasets show that our model outperforms the state-of-the-art method by 19.4% in ADE and 16.9% in FDE and successfully predicts complex spatial-temporal interactions.
|
|
11:00-11:10, Paper MoA-14.7 | |
Evaluating Human-Like Explanations for Robot Actions in Reinforcement Learning Scenarios |
|
Cruz, Francisco | UNSW Sydney |
Young, Charlotte | Federation University |
Dazeley, Richard | Deakin University |
Vamplew, Peter | Federation University |
Keywords: Human-Centered Robotics, Reinforcement Learning, Acceptability and Trust
Abstract: Explainable artificial intelligence is a research field that tries to provide more transparency for autonomous intelligent systems. Explainability has been used, particularly in reinforcement learning and robotic scenarios, to better understand the robot decision-making process. Previous work, however, has been widely focused on providing technical explanations that can be better understood by AI practitioners than non-expert end-users. In this work, we make use of human-like explanations built from the probability of success to complete the goal that an autonomous robot shows after performing an action. These explanations are intended to be understood by people who have no or very little experience with artificial intelligence methods. This paper presents a user trial to study whether these explanations that focus on the probability an action has of succeeding in its goal constitute a suitable explanation for non-expert end-users. The results obtained show that non-expert participants rate robot explanations that focus on the probability of success higher and with less variance than technical explanations generated from Q-values, and also favor counterfactual explanations over standalone explanations.
|
|
11:10-11:20, Paper MoA-14.8 | |
A Modular and Portable Black Box Recorder for Increased Transparency of Autonomous Service Robots |
|
Schmidt, Max | Simulation, Systems Optimization and Robotics Group, Technical U |
Kirchhoff, Jérôme | Technische Universität Darmstadt |
von Stryk, Oskar | Technische Universität Darmstadt |
Keywords: Human-Centered Robotics, Acceptability and Trust
Abstract: Autonomous service robots have great potential to support humans in tasks they cannot perform due to, amongst others, time constraints, work overload, or staff shortages. An important step for such service robots to be trusted or accepted by society is the provision of transparency. Its purpose is not only to communicate what a robot is doing according to the human interaction partners' needs, it should also regard social and legal requirements. A black box recorder (inspired by flight recorders) increases the system's transparency by facilitating the investigation of the cause of an incident, clarifying responsibilities, or improving the user's understanding about the robot. In this work we propose the needed requirements of such a black box recorder for increased transparency of autonomous service robots, based on the related work. Further, we present a new modular and portable black box recorder design meeting these requirements. The applicability of the system is evaluated based on real-world robot data, using the realized open-source reference implementation.
|
|
MoA-15 |
Rm15 (Room 509) |
Localization 1 |
Regular session |
Chair: Behley, Jens | University of Bonn |
Co-Chair: Pages, Gaël | ISAE-SUPAERO |
|
10:00-10:10, Paper MoA-15.1 | |
LiDAR-Aided Visual-Inertial Localization with Semantic Maps |
|
Li, Hao | Tusimple |
Pan, Liangliang | Tusimple |
Zhao, Ji | TuSimple |
Keywords: Localization
Abstract: Accurate and robust localization is an essential task for autonomous driving systems. In this paper, we propose a novel 3D LiDAR-aided visual-inertial localization method. Our method fully explores the complementarity of visual and LiDAR observations. On the one hand, the association between semantic features in images and a given semantic map provides constraints for the absolute pose. On the other hand, LiDAR odometry (LO) can provide an accurate and robust 6DOF relative pose. The Error State Kalman Filter (ESKF) framework is exploited to estimate the vehicle pose relative to the semantic map, which fuses the global constraints between the image and the semantic map, the relative pose from the LO, and the raw IMU data. The method achieves centimeter-level localization accuracy in a variety of challenging scenarios. We validate the robustness and accuracy of our method in real-world scenes over 50 km. The experimental results show that the proposed method is able to achieve an average lateral accuracy of 0.059 m and longitudinal accuracy of 0.158 m, which demonstrates the practicality of the proposed system in autonomous driving applications.
|
|
10:10-10:20, Paper MoA-15.2 | |
Robust Onboard Localization in Changing Environments Exploiting Text Spotting |
|
Zimmerman, Nicky | University of Bonn |
Wiesmann, Louis | University of Bonn |
Guadagnino, Tiziano | Sapienza University of Rome |
Läbe, Thomas | University of Bonn |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: Localization
Abstract: Robust localization in a given map is a crucial component of most autonomous robots. In this paper, we address the problem of localizing in an indoor environment that changes and where prominent structures have no correspondence in the map built at a different point in time. To overcome the discrepancy between the map and the observed environment caused by such changes, we exploit human-readable localization cues to assist localization. These cues are readily available in most facilities and can be detected using RGB camera images by utilizing text spotting. We integrate these cues into a Monte Carlo localization framework using a particle filter that operates on 2D LiDAR scans and camera data. By this, we provide a robust localization solution for environments with structural changes and dynamics by humans walking. We evaluate our localization framework on multiple challenging indoor scenarios in an office environment. The experiments suggest that our approach is robust to structural changes and can run on an onboard computer. We release an open source implementation of our approach (upon paper acceptance), which uses off-the-shelf text spotting, written in C++ with a ROS wrapper.
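As a rough illustration of how a text-spotting cue could be folded into a Monte Carlo localization update, the sketch below multiplies the usual scan likelihood with a Gaussian "visibility" term around the mapped position of each recognized label. The Gaussian model, the sigma parameter, and the function names (mcl_update, text_map) are assumptions for illustration; the paper's observation model may differ.

```python
import numpy as np

def mcl_update(particles, weights, lidar_likelihood, text_detections, text_map, sigma=1.5):
    """One measurement update of a particle filter that also fuses a text-spotting cue.

    particles: (N, 3) array of (x, y, theta) pose hypotheses.
    lidar_likelihood: callable mapping particles to per-particle scan likelihoods.
    text_detections: list of recognized strings (e.g. door labels).
    text_map: dict from a known label to its (x, y) position in the map.
    """
    w = weights * lidar_likelihood(particles)
    for label in text_detections:
        if label not in text_map:
            continue
        # Down-weight particles far from where this label could have been read.
        d = np.linalg.norm(particles[:, :2] - np.asarray(text_map[label]), axis=1)
        w *= np.exp(-0.5 * (d / sigma) ** 2)
    w += 1e-300          # guard against total weight collapse
    return w / w.sum()

# Toy usage: spotting a hypothetical "Room 2.054" label concentrates belief near that room.
rng = np.random.default_rng(1)
particles = np.column_stack([rng.uniform(0, 20, 500),
                             rng.uniform(0, 10, 500),
                             rng.uniform(-np.pi, np.pi, 500)])
weights = np.full(500, 1.0 / 500)
weights = mcl_update(particles, weights,
                     lidar_likelihood=lambda p: np.ones(len(p)),   # placeholder scan model
                     text_detections=["Room 2.054"],
                     text_map={"Room 2.054": (12.0, 4.0)})
```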
|
|
10:20-10:30, Paper MoA-15.3 | |
Fast Scan Context Matching for Omnidirectional 3D Scan |
|
Kihara, Hikaru | Kumamoto University |
Kumon, Makoto | Kumamoto University |
Nakatsuma, Kei | Kumamoto University |
Furukawa, Tomonari | University of Virginia |
Keywords: Localization
Abstract: Autonomous robots need to recognize the environment by identifying the scene. Scan context is a global descriptor that encodes the three-dimensional scan data of a scene in a matrix form for identification. The matrix form is simple to store, but matching scan contexts can require considerable computational effort because the descriptor is orientation-dependent. Because the scan context of an omnidirectional LiDAR scan is periodic in azimuth, this paper proposes to compute scan context matching efficiently by incorporating cross-correlation with the fast Fourier transform; hence, the method is named fast scan context matching. The effectiveness of the proposed method in terms of computation time, accuracy, and robustness is reported in this paper. The method was also tested as a loop closure detector within a SLAM package as a practical application, and it outperformed conventional scan context matching.
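The core trick of evaluating all azimuth shifts at once with an FFT can be sketched in a few lines. This is a minimal illustration under assumptions (a standard ring-by-sector scan context matrix and a column-wise cosine distance); it is not the authors' implementation.

```python
import numpy as np

def fast_scan_context_shift(sc_query, sc_map):
    """Estimate the best column (azimuth) shift between two scan contexts.

    Both inputs are (n_rings, n_sectors) matrices. Because an omnidirectional
    scan context is periodic in azimuth, the circular cross-correlation over
    all column shifts is computed at once with an FFT along the sector axis.
    """
    fq = np.fft.fft(sc_query, axis=1)
    fm = np.fft.fft(sc_map, axis=1)
    corr = np.fft.ifft(fq * np.conj(fm), axis=1).real.sum(axis=0)  # sum over rings
    return int(np.argmax(corr))

def scan_context_distance(sc_query, sc_map):
    """Column-wise cosine distance after aligning with the FFT-estimated shift."""
    shift = fast_scan_context_shift(sc_query, sc_map)
    aligned = np.roll(sc_map, shift, axis=1)
    num = (sc_query * aligned).sum(axis=0)
    den = np.linalg.norm(sc_query, axis=0) * np.linalg.norm(aligned, axis=0) + 1e-12
    return 1.0 - float(np.mean(num / den))

# Toy usage: a rotated copy of the same scene should give a distance close to zero.
rng = np.random.default_rng(0)
sc = rng.random((20, 60))
print(scan_context_distance(sc, np.roll(sc, 7, axis=1)))
```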
|
|
10:30-10:40, Paper MoA-15.4 | |
Probabilistic Object Maps for Long-Term Robot Localization |
|
Adkins, Amanda | University of Texas at Austin |
Chen, Taijing | University of Texas at Austin |
Biswas, Joydeep | University of Texas at Austin |
Keywords: Localization, Mapping
Abstract: Robots deployed in settings such as warehouses and parking lots must cope with frequent and substantial changes when localizing in their environments. While many previous localization and mapping algorithms have explored methods of identifying and focusing on long-term features to handle change in such environments, we propose a different approach – can a robot understand the distribution of movable objects and relate it to observations of such objects to reason about global localization? In this paper, we present probabilistic object maps (POMs), which represent the distributions of movable objects using pose-likelihood sample pairs derived from prior trajectories through the environment and use a Gaussian process classifier to generate the likelihood of an object at a query pose. We also introduce POM-Localization, which uses an observation model based on POMs to perform inference on a factor graph for globally consistent long-term localization. We present empirical results showing that POM-Localization is indeed effective at producing globally consistent localization estimates in challenging real-world environments and that POM-Localization improves trajectory estimates even when the POM is formed from partially incorrect data.
|
|
10:40-10:50, Paper MoA-15.5 | |
Level Set-Based Camera Pose Estimation from Multiple 2D/3D Ellipse-Ellipsoid Correspondences |
|
Zins, Matthieu | Inria |
Simon, Gilles | Loria |
Berger, Marie-Odile | INRIA |
Keywords: Localization, SLAM
Abstract: In this paper, we propose an object-based camera pose estimation from a single RGB image and a pre-built map of objects, represented with ellipsoidal models. We show that contrary to point correspondences, the definition of a cost function characterizing the projection of a 3D object onto a 2D object detection is not straightforward. We develop an ellipse-ellipse cost based on level sets sampling, demonstrate its nice properties for handling partially visible objects and compare its performance with other common metrics. Finally, we show that the use of a predictive uncertainty on the detected ellipses allows a fair weighting of the contribution of the correspondences which improves the computed pose. The code is released at gitlab.inria.fr/tangram/level-set-based-camera-pose-estimation.
|
|
10:50-11:00, Paper MoA-15.6 | |
Optimal Localizability Criterion for Positioning with Distance-Deteriorated Relative Measurements |
|
Cano, Justin | Polytechnique Montréal |
Pages, Gaël | ISAE-SUPAERO |
Chaumette, Eric | University of Toulouse/Isae-Supaero |
Le Ny, Jerome | Polytechnique Montreal |
Keywords: Localization, Multi-Robot Systems
Abstract: Position estimation in Multi-Robot Systems (MRS) relies on relative angle or distance measurements between the robots, which generally deteriorate as distances increase. Moreover, the localization accuracy is strongly influenced both by the quality of the raw measurements but also by the overall geometry of the network. In this paper, we design a cost function that accounts for these two issues and can be used to develop motion planning algorithms that optimize the localizability in MRS, i.e., the ability of individual robots to localize themselves accurately. This cost function is based on computing new Cramer Rao Lower Bounds characterizing the achievable positioning performance with range and angle measurements that deteriorate with increasing distances. We describe a gradient-based motion-planning algorithm for MRS deployment that can be implemented in a distributed manner, as well as a non-myopic strategy to escape local minima. Finally, we test the proposed methodology experimentally for range measurements obtained using ultra-wide band transceivers and illustrate the improvements resulting from leveraging the more accurate measurement model in the robot placement algorithms.
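The following sketch illustrates one way a localizability cost of this kind could be evaluated for range measurements whose noise grows with distance: build a Fisher information matrix and take the trace of its inverse as a Cramer-Rao-style bound. The linear noise model sigma(d) = sigma0 + gamma*d and the omission of the noise-variance gradient term are simplifying assumptions; the paper derives its own, more complete bounds.

```python
import numpy as np

def crlb_cost(x, anchors, sigma0=0.1, gamma=0.05):
    """Trace of an approximate position CRLB for distance-deteriorated range measurements.

    x: (2,) robot position; anchors: (M, 2) positions of the ranging nodes.
    sigma(d) = sigma0 + gamma * d is an assumed noise model; the Fisher
    information below keeps only the standard range-bearing term.
    """
    fim = np.zeros((2, 2))
    for a in anchors:
        diff = x - a
        d = np.linalg.norm(diff)
        u = diff / d                       # unit direction to the anchor
        sigma = sigma0 + gamma * d
        fim += np.outer(u, u) / sigma**2
    return float(np.trace(np.linalg.inv(fim)))

# Toy usage: the cost (worse localizability) grows far away from the anchors.
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])
print(crlb_cost(np.array([5.0, 3.0]), anchors))
print(crlb_cost(np.array([50.0, 50.0]), anchors))
```

A gradient-based deployment strategy, as in the abstract, would move each robot along the negative gradient of such a cost.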
|
|
11:00-11:10, Paper MoA-15.7 | |
Improving Worst Case Visual Localization Coverage Via Place-Specific Sub-Selection in Multi-Camera Systems |
|
Hausler, Stephen | Queensland University of Technology |
Xu, Ming | Queensland University of Technology |
Garg, Sourav | Queensland University of Technology |
Chakravarty, Punarjay | Ford Motor Company |
Shrivastava, Shubham | Ford Greenfield Labs |
Vora, Ankit | Ford AV LLC |
Milford, Michael J | Queensland University of Technology |
Keywords: Localization, Autonomous Vehicle Navigation
Abstract: 6-DoF visual localization systems utilize principled approaches rooted in 3D geometry to perform accurate camera pose estimation of images to a map. Current techniques use hierarchical pipelines and learned 2D feature extractors to improve scalability and increase performance. However, despite gains in typical recall@0.25m type metrics, these systems still have limited utility for real-world applications like autonomous vehicles because of their worst areas of performance - the locations where they provide insufficient recall at a certain required error tolerance. Here we investigate the utility of using place specific configurations, where a map is segmented into a number of places, each with its own configuration for modulating the pose estimation step, in this case selecting a camera within a multi-camera system. On the Ford AV benchmark dataset, we demonstrate substantially improved worst-case localization performance compared to using off-the-shelf pipelines - minimizing the percentage of the dataset which has low recall at a certain error tolerance, as well as improved overall localization performance. Our proposed approach is particularly applicable to the crowdsharing model of autonomous vehicle deployment, where a fleet of AVs are regularly traversing a known route.
|
|
11:10-11:20, Paper MoA-15.8 | |
Hybrid Interval-Probabilistic Localization in Building Maps |
|
Ehambram, Aaronkumar | Institute of Systems Engineering, Leibniz Universität Hannover |
Jaulin, Luc | ENSTA-Bretagne |
Wagner, Bernardo | Leibniz Universität Hannover |
Keywords: Localization
Abstract: We present a novel online capable hybrid interval-probabilistic localization method using publicly available 2D building maps. Given an initially large uncertainty for the orientation and position derived from GNSS data, our novel interval-based approach first narrows down the orientation to a smaller interval and provides a set described by a minimal polygon for the position of the vehicle that encloses the feasible set of poses by taking the building geometry into account using 3D Light Detection and Ranging (LiDAR) sensor data. Second, we perform a probabilistic Maximum Likelihood Estimation (MLE) to determine the best solution within the determined feasible set. The MLE is converted into a least-squares problem that is solved by an optimization approach that takes the bounds of the solution set into account so that only a solution within the feasible set is selected as the most likely one. We experimentally show with real data that the novel interval-based localization provides sets of poses that contain the true pose for more than 99% of the frames and that the bounded optimization provides more reliable results compared to a classical unbounded optimization and a Monte Carlo Localization approach.
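A minimal sketch of the second stage, solving the MLE as a bounded least-squares problem restricted to the interval-derived feasible set, is given below. For simplicity the feasible set is taken as an axis-aligned box (the paper uses a minimal polygon), and the residual function is a hypothetical placeholder.

```python
import numpy as np
from scipy.optimize import least_squares

def bounded_pose_mle(residuals, x0, lower, upper):
    """Refine a pose by least squares, constrained to the interval-derived feasible box.

    residuals: callable mapping a pose (x, y, theta) to a residual vector.
    lower/upper: per-component bounds from the interval step, so the returned
    pose cannot leave the set that is guaranteed to contain the true pose.
    """
    return least_squares(residuals, x0, bounds=(lower, upper)).x

# Toy usage with a hypothetical residual pulling toward (2.0, 1.0, 0.1);
# the x-bound of 1.8 keeps the solution inside the feasible box.
res = lambda p: p - np.array([2.0, 1.0, 0.1])
print(bounded_pose_mle(res, x0=np.array([1.5, 0.5, 0.0]),
                       lower=[1.0, 0.0, -0.2], upper=[1.8, 2.0, 0.2]))
```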
|
|
11:20-11:30, Paper MoA-15.9 | |
Online Target Localization Using Adaptive Belief Propagation in the HMM Framework |
|
Seo, Min-Won | University of California, Irvine |
Kia, Solmaz | University of California, Irvine |
Keywords: Localization, Sensor Networks, Range Sensing
Abstract: This paper proposes a novel adaptive sample space-based Viterbi algorithm for online target localization. The method relies on discretizing the target's motion space into cells representing a finite number of hidden states. Then, the most probable trajectory of the tracked target is computed via dynamic programming in a Hidden Markov Model (HMM) framework. The proposed method uses a Bayesian estimation framework which is neither limited to Gaussian noise models nor requires a linearized target motion model or sensor measurement models. However, an HMM-based approach to localization can suffer from poor computational complexity in scenarios where the number of hidden states increases due to high-resolution modeling or target localization in a large space. To improve this computational complexity, this paper proposes propagating beliefs in the most probable belief space sequentially from low to high resolution, reducing the required resources significantly. The proposed method is inspired by k-d tree algorithms (e.g., the quadtree) commonly used in the computer vision field. Experimental tests using an ultra-wideband (UWB) sensor network demonstrate our results.
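For reference, the basic dynamic-programming recursion underlying such an HMM-based tracker is the Viterbi update over the discretized cells, sketched below in log space. The adaptive, coarse-to-fine belief-space refinement that the paper adds on top is not shown; the transition and likelihood models in the toy usage are placeholders.

```python
import numpy as np

def viterbi_step(log_delta, log_trans, log_like):
    """One online Viterbi update over the discretized target state space.

    log_delta: (S,) best log-probability of each cell given past measurements.
    log_trans: (S, S) log transition probabilities between cells (motion model).
    log_like:  (S,) log-likelihood of the new range measurement in each cell.
    Returns the updated scores and back-pointers for recovering the trajectory.
    """
    scores = log_delta[:, None] + log_trans      # scores[i, j]: best path ending with i -> j
    back = scores.argmax(axis=0)
    return scores.max(axis=0) + log_like, back

# Toy usage with four cells, a uniform prior, and placeholder models.
S = 4
log_delta = np.full(S, -np.log(S))
log_trans = np.log(np.full((S, S), 1.0 / S))            # placeholder motion model
log_like = np.log(np.array([0.1, 0.6, 0.2, 0.1]))       # placeholder UWB likelihood
log_delta, back = viterbi_step(log_delta, log_trans, log_like)
```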
|
|
MoA-16 |
Rm16 (Room 510) |
Representation Learning |
Regular session |
Chair: Taniguchi, Tadahiro | Ritsumeikan University |
Co-Chair: Haddadin, Sami | Technical University of Munich |
|
10:00-10:10, Paper MoA-16.1 | |
Reshaping Robot Trajectories Using Natural Language Commands: A Study of Multi-Modal Data Alignment Using Transformers |
|
Bucker, Arthur Fender Coelho | Universidade De São Paulo |
Figueredo, Luis Felipe Cruz | Technical University of Munich (TUM) |
Haddadin, Sami | Technical University of Munich |
Kapoor, Ashish | MicroSoft |
Ma, Shuang | Microsoft |
Bonatti, Rogerio | Microsoft |
Keywords: Representation Learning, Human Factors and Human-in-the-Loop, Multi-Modal Perception for HRI
Abstract: Natural language is the most intuitive medium for us to interact with other people when expressing commands and instructions. However, using language is seldom an easy task when humans need to express their intent towards robots, since most of the current language interfaces require rigid templates with a static set of action targets and commands. In this work, we provide a flexible language-based interface for human-robot collaboration, which allows a user to reshape existing trajectories for an autonomous agent. We take advantage of recent advancements in the field of large language models (BERT and CLIP) to encode the user command, and then combine these features with trajectory information using multi-modal attention transformers. We train the model using imitation learning over a dataset containing robot trajectories modified by language commands, and treat the trajectory generation process as a sequence prediction problem, analogously to how language generation architectures operate. We evaluate the system in multiple simulated trajectory scenarios, and show a significant performance increase of our model over baseline approaches. In addition, our real-world experiments with a robot arm show that users significantly prefer our natural language interface over traditional methods such as kinesthetic teaching or cost-function programming. Our study shows how the field of robotics can take advantage of large pre-trained language models towards creating more intuitive interfaces between robots and machines. Project webpage: https://arthurfenderbucker.github.io/NL_trajectory_reshaper
|
|
10:10-10:20, Paper MoA-16.2 | |
DreamingV2: Reinforcement Learning with Discrete World Models without Reconstruction |
|
Okada, Masashi | Panasonic Corporation |
Taniguchi, Tadahiro | Ritsumeikan University |
Keywords: Reinforcement Learning, Representation Learning, Machine Learning for Robot Control
Abstract: The present paper proposes a novel reinforcement learning method with world models, DreamingV2, a collaborative extension of DreamerV2 and Dreaming. DreamerV2 is a cutting-edge model-based reinforcement learning method from pixels that uses discrete world models to represent latent states with categorical variables. Dreaming is also a form of reinforcement learning from pixels that attempts to avoid the autoencoding process in general world model training by involving a reconstruction-free contrastive learning objective. The proposed DreamingV2 is a novel approach that adopts both the discrete representation of DreamerV2 and the reconstruction-free objective of Dreaming. Compared to DreamerV2 and other recent model-based methods without reconstruction, DreamingV2 achieves the best scores on five simulated challenging 3D robot arm tasks. We believe that DreamingV2 will be a reliable solution for robot learning since its discrete representation is suitable for describing discontinuous environments, and the reconstruction-free fashion handles complex vision observations well.
|
|
10:20-10:30, Paper MoA-16.3 | |
Playful Interactions for Representation Learning |
|
Young, Sarah | UC Berkeley |
Pari, Jyothish | New York University |
Abbeel, Pieter | UC Berkeley |
Pinto, Lerrel | New York University |
Keywords: Representation Learning, Bioinspired Robot Learning, Imitation Learning
Abstract: One of the key challenges in visual imitation learning is collecting large amounts of expert demonstrations for a given task. While methods for collecting human demonstrations are becoming easier with teleoperation methods and the use of low-cost assistive tools, we often still require 100-1000 demonstrations for every task to learn a visual representation and policy. To address this, we turn to an alternate form of data that does not require task-specific demonstrations -- play. Playing is a fundamental method children use to learn a set of skills and behaviors and visual representations in early learning. Importantly, play data is diverse, task-agnostic, and relatively cheap to obtain. In this work, we propose to use playful interactions in a self-supervised manner to learn visual representations for downstream tasks. We collect 2 hours of playful data in 19 diverse environments and use self-predictive learning to extract visual representations. Given these representations, we train policies using imitation learning for two downstream tasks: Pushing and Stacking. We demonstrate that our visual representations generalize better than standard behavior cloning and can achieve similar performance with only half the number of required demonstrations. Our representations, which are trained from scratch, compare favorably against ImageNet pretrained representations. Finally, we provide an experimental analysis on the effects of different pretraining modes on downstream task learning.
|
|
10:30-10:40, Paper MoA-16.4 | |
COMPASS: Contrastive Multimodal Pretraining for Autonomous Systems |
|
Ma, Shuang | Microsoft |
Vemprala, Sai | Microsoft Corporation |
Wang, Wenshan | Carnegie Mellon University |
Gupta, Jayesh | Stanford University |
Song, Yale | Meta AI |
McDuff, Daniel | Microsoft |
Kapoor, Ashish | MicroSoft |
Keywords: Representation Learning, AI-Enabled Robotics, Deep Learning for Visual Perception
Abstract: Learning representations that generalize across tasks and domains is challenging yet necessary for autonomous systems. Although task-driven approaches are appealing, designing models specific to each application can be difficult in the face of limited data, especially when dealing with highly variable multimodal input spaces arising from different tasks in different environments. We introduce the first general-purpose pretraining pipeline, COntrastive Multimodal Pretraining for AutonomouS Systems (COMPASS), to overcome the limitations of task-specific models and existing pretraining approaches. COMPASS constructs a multimodal graph by considering the essential information for autonomous systems and the properties of different modalities. Through this graph, multimodal signals are connected and mapped into two factorized spatio-temporal latent spaces: a 'motion pattern space' and a 'current state space'. By learning from multimodal correspondences in each latent space, COMPASS creates state representations that models necessary information such as temporal dynamics, geometry, and semantics. We pretrain COMPASS on a large-scale multimodal simulation dataset TartanAir [tartanair2020iros] and evaluate it on drone navigation, vehicle racing, and visual odometry tasks. The experiments indicate that COMPASS can tackle all three scenarios and can also generalize to unseen environments and real-world data.
|
|
10:40-10:50, Paper MoA-16.5 | |
Explainable Knowledge Graph Embedding: Inference Reconciliation for Knowledge Inferences Supporting Robot Actions |
|
Daruna, Angel | Georgia Institute of Technology, Atlanta, GA 30332 |
Das, Devleena | Georgia Institute of Technology |
Chernova, Sonia | Georgia Institute of Technology |
Keywords: AI-Enabled Robotics, Representation Learning, Human-Centered Robotics
Abstract: Learned knowledge graph representations supporting robots contain a wealth of domain knowledge that drives robot behavior. However, there does not exist an inference reconciliation framework that expresses how a knowledge graph representation affects a robot's sequential decision making. We use a pedagogical approach to explain the inferences of a learned, black-box knowledge graph representation, a knowledge graph embedding. Our interpretable model uses a decision tree classifier to locally approximate the predictions of the black-box model and provides natural language explanations interpretable by non-experts. Results from our algorithmic evaluation affirm our model design choices, and the results of our user studies with non-experts support the need for the proposed inference reconciliation framework. Critically, results from our simulated robot evaluation indicate that our explanations enable non-experts to correct erratic robot behaviors due to nonsensical beliefs within the black-box.
|
|
10:50-11:00, Paper MoA-16.6 | |
Neural Scene Representation for Locomotion on Structured Terrain |
|
Hoeller, David | ETH Zurich, NVIDIA |
Rudin, Nikita | ETH Zurich, NVIDIA |
Choy, Christopher | NVIDIA |
Anandkumar, Anima | Caltech |
Hutter, Marco | ETH Zurich |
Keywords: Representation Learning, Deep Learning for Visual Perception, Legged Robots
Abstract: We propose a learning-based method to reconstruct the local terrain for locomotion with a mobile robot traversing urban environments. Using a stream of depth measurements from the onboard cameras and the robot's trajectory, the algorithm estimates the topography in the robot's vicinity. The raw measurements from these cameras are noisy and only provide partial and occluded observations that in many cases do not show the terrain the robot stands on. Therefore, we propose a 3D reconstruction model that faithfully reconstructs the scene, despite the noisy measurements and large amounts of missing data coming from the blind spots of the camera arrangement. The model consists of a 4D fully convolutional network on point clouds that learns the geometric priors to complete the scene from the context and an auto-regressive feedback to leverage spatio-temporal consistency and use evidence from the past. The network can be solely trained with synthetic data, and due to extensive augmentation, it is robust in the real world, as shown in the validation on a quadrupedal robot, ANYmal, traversing challenging settings. We run the pipeline on the robot's onboard low-power computer using an efficient sparse tensor implementation and show that the proposed method outperforms classical map representations.
|
|
11:00-11:10, Paper MoA-16.7 | |
MO-Transformer: A Transformer-Based Multi-Object Point Cloud Reconstruction Network |
|
Lyu, Erli | Harbin Institute of Technology Shenzhen |
Zhang, Zhengyan | Harbin Institute of Technology, Shenzhen |
Liu, Wei | Southern University of Science and Technology |
Wang, Jiaole | Harbin Institute of Technology, Shenzhen |
Song, Shuang | Harbin Institute of Technology (Shenzhen) |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Keywords: Representation Learning, Deep Learning for Visual Perception
Abstract: This paper proposes a new network for reconstructing multi-object point clouds. Different from previous networks which reconstruct a multi-object point cloud as a whole, our network iteratively reconstructs each individual object point cloud from a frame of multi-object point cloud. To achieve this goal, we have designed MO-Transformer, a transformer-based autoregressive network. During training, MO-Transformer takes a frame of multi-object point cloud and individual object point clouds as input. During testing, MO-Transformer iteratively reconstructs individual object point clouds based only on the input multi-object point cloud. To train the proposed MO-Transformer, we design a new loss function called the separate Chamfer distance (SCD). In addition, we prove that SCD is an upper bound of the traditional Chamfer distance calculated based on the entire multi-object point cloud. The reconstruction experiment verifies the efficacy of our network in multi-object point cloud reconstruction and also investigates the effect of different dimensions using a series of datasets. The ablation study verifies the necessity of SCD in training MO-Transformer.
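To make the loss concrete, the sketch below computes a standard symmetric Chamfer distance and then evaluates it object by object, averaging over objects. The exact aggregation (mean versus sum) and point-cloud shapes are assumptions for illustration; the intuition that matching each object only against its own ground truth cannot exploit closer points from other objects is what underlies SCD bounding the whole-scene Chamfer distance from above.

```python
import numpy as np

def chamfer(p, q):
    """Symmetric Chamfer distance between two point sets of shape (N, 3) and (M, 3)."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def separate_chamfer(pred_objects, gt_objects):
    """Chamfer distance evaluated per object, then averaged (assumed aggregation).

    pred_objects / gt_objects: lists of (N_i, 3) arrays, one per object,
    in corresponding order.
    """
    return float(np.mean([chamfer(p, g) for p, g in zip(pred_objects, gt_objects)]))

# Toy usage with two objects, the second slightly perturbed.
rng = np.random.default_rng(0)
gt = [rng.random((40, 3)), rng.random((40, 3)) + 2.0]
pred = [gt[0], gt[1] + 0.05]
print(separate_chamfer(pred, gt))
```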
|
|
11:10-11:20, Paper MoA-16.8 | |
Self-Supervised Feature Learning from Partial Point Clouds Via Pose Disentanglement |
|
Tsai, Meng-Shiun | National Yang Ming Chiao Tung University |
Chiang, Pei-Ze | National Yang Ming Chiao Tung University |
Tsai, Yi-Hsuan | NEC Labs America |
Chiu, Wei-Chen | National Chiao Tung University |
Keywords: Representation Learning, Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: Self-supervised learning on point clouds has gained a lot of attention recently, since it addresses the label-efficiency and domain-gap problems on point cloud tasks. In this paper, we propose a novel self-supervised framework to learn informative features from partial point clouds. We leverage partial point clouds scanned by LiDAR that contain both content and pose attributes, and we show that disentangling such two factors from partial point clouds enhances feature learning. To this end, our framework consists of three main parts: 1) a completion network to capture holistic semantics of point clouds; 2) a pose regression network to understand the viewing angle where partial data is scanned from; 3) a partial reconstruction network to encourage the model to learn content and pose features. To demonstrate the robustness of the learnt feature representations, we conduct several downstream tasks including classification, part segmentation, and registration, with comparisons against state-of-the-art methods. Our method not only outperforms existing self-supervised methods, but also shows a better generalizability across synthetic and real-world datasets.
|
|
11:20-11:30, Paper MoA-16.9 | |
Efficiently Learning Manipulations by Selecting Structured Skill Representations |
|
Sharma, Mohit | Carnegie Mellon University |
Kroemer, Oliver | Carnegie Mellon University |
Keywords: Deep Learning in Grasping and Manipulation, Learning from Demonstration, Reinforcement Learning
Abstract: A key challenge in learning to perform manipulation tasks is selecting a suitable skill representation. While specific skill representations are often easier to learn, they are typically suitable only for a narrow set of tasks. In most prior works, roboticists manually provide the robot with a suitable skill representation to use, e.g., a neural network or DMPs. By contrast, we propose to allow the robot to select the most appropriate skill representation for the underlying task. Given the large space of skill representations, we utilize a single demonstration to select a small set of potentially task-relevant representations. This set is then further refined using reinforcement learning to select the most suitable skill representation. Experiments in both simulation and the real world show how our proposed approach leads to improved sample efficiency and enables directly learning on the real robot.
|
|
MoA-17 |
Rm17 (Room 553) |
Automation and Robotics at Micro-Nano Scales |
Regular session |
Chair: Kojima, Masaru | Osaka University |
Co-Chair: Sugiura, Hirotaka | The University of Tokyo |
|
10:00-10:10, Paper MoA-17.1 | |
Hierarchical Learning and Control for In-Hand Micromanipulation Using Multiple Laser-Driven Micro-Tools |
|
Jia, Yongyi | Tsinghua University |
Chen, Yu | Tsinghua University |
Liu, Hao | Tsinghua University |
Li, Xiu | Tsinghua University |
Li, Xiang | Tsinghua University |
Keywords: Automation at Micro-Nano Scales, Reinforcement Learning, Robust/Adaptive Control
Abstract: Laser-driven micro-tools use highly focused laser beams as actuators to control a tool's motion so that it can contact and then manipulate a micro-object, which allows them to manipulate opaque micro-objects, or large cells, without causing photodamage. However, most existing laser-driven tools are limited to relatively simple tasks, such as moving and caging, and cannot carry out in-hand dexterous tasks. This is mainly because in-hand manipulation involves continuously coordinating multiple laser beams, micro-tools, and the object itself, which has many degrees of freedom (DoF) and poses a challenge for planner and controller design. This paper presents a new hierarchical formulation for the grasping and manipulation of micro-objects using multiple laser-driven micro-tools. In hardware, multiple laser-driven tools are assembled to act as a robotic hand to carry out in-hand tasks (e.g., rotating); in software, a hierarchical scheme is developed to shrink the action space and coordinate the motion of multiple tools, subject to both the parametric uncertainty in the tool and the unknown dynamic model of the object. Such a formulation provides a potential path toward robotic in-hand manipulation at the micro scale. The performance of the proposed system is validated in simulation studies under different scenarios.
|
|
10:10-10:20, Paper MoA-17.2 | |
A Micro-Robotic Approach for the Correction of Angular Deviations in AFM Samples from Generic Topographic Data |
|
Romero Leiro, Freddy | Sorbonne Université - Institut Des Systèmes Intélligents Et Robo |
Bazaei, Ali | University of Newcastle, Australia |
Régnier, Stéphane | Sorbonne University |
Boudaoud, Mokrane | Sorbonne Université |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Motion Control
Abstract: This article proposes a method for correcting angular deviations caused during the fixing process of samples prepared for Atomic Force Microscopy (AFM). The correction is done using the angular control of a 6-DOF PPPS parallel platform where the sample is placed, while the AFM scan is performed by a 3-DOF serial Cartesian robot with a tuning-fork probe designed to perform FM-AFM. The method uses the generic x, y, and z data provided by the AFM after performing a scan on a free surface of the sample substrate. These data are used to calculate the plane that most closely approximates the points by solving a system of linear equations. This plane is then used to estimate the angular corrections that the 6-DOF parallel robot has to make in order to compensate for the deviations. The proposed algorithm can be applied iteratively in order to refine the correction. The method does not require any special preparation of the substrate; it only requires a free surface to scan. Experiments are performed using this algorithm to correct the orientation deviation of a substrate of V1 high-grade mica. The results show that the method is able to correct the angular deviation of the sample relative to the AFM probe with an error of 0.2° after only two iterations of the algorithm.
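The plane-fitting step, finding the plane z = ax + by + c that best approximates the scanned free-surface points in the least-squares sense and then reading off angular corrections, can be illustrated with the sketch below. The synthetic data, sign conventions, and small-angle treatment are assumptions for illustration; the paper's exact platform kinematics are not reproduced.

import numpy as np

def fit_plane(points):
    """Fit z = a*x + b*y + c to (N,3) topography points by linear least squares."""
    A = np.column_stack([points[:, 0], points[:, 1], np.ones(len(points))])
    (a, b, c), *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return a, b, c

def tilt_angles(a, b):
    """Angular corrections (rad) implied by the plane slopes.

    Signs depend on the platform frame convention; these are illustrative."""
    theta_x = np.arctan(b)    # rotation about x compensates the slope along y
    theta_y = -np.arctan(a)   # rotation about y compensates the slope along x
    return theta_x, theta_y

# Synthetic scan of a slightly tilted flat substrate with measurement noise.
rng = np.random.default_rng(2)
xy = rng.uniform(0, 10e-6, size=(200, 2))
z = 0.02 * xy[:, 0] - 0.01 * xy[:, 1] + 1e-9 * rng.standard_normal(200)
a, b, c = fit_plane(np.column_stack([xy, z]))
print(np.degrees(tilt_angles(a, b)))   # estimated corrections in degrees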
|
|
10:20-10:30, Paper MoA-17.3 | |
Modeling and Characterization of Artificial Bacteria Flagella with Micro-Structured Soft-Magnetic Teeth |
|
Yu, Zejie | City University of Hong Kong |
Hou, Chaojian | City University of Hong Kong |
Wang, Shuideng | City University of Hongkong |
Wang, Kun | City University of Hong Kong |
Chen, Donglei | City University of HongKong |
Zhang, Wenqi | City University of HongKong |
Qu, Zhi | City University of Hong Kong |
Sun, Zhiyong | The University of Hong Kong |
Song, Bo | Hefei Institutes of Physical Science, Chinese Academy of Science |
Zhou, Chao | Institute of Plasma Physics, Hefei Institute of Physical Science |
Dong, Lixin | City University of Hong Kong |
Keywords: Micro/Nano Robots, Nanomanufacturing, Automation at Micro-Nano Scales
Abstract: Sub-structures such as micro-structured magnetic teeth fabricated with an artificial bacteria flagellum (ABF) are designed to achieve more motion modes, higher precision, and better controllability. To this end, a more precise model that accounts for the non-circular cross-sectional features is set up, rather than simplifying the structure as a helical filament with a circular cross-section as in previous investigations, making it possible to include the effects of the sub-structures in the motion equation. Analyses and experiments verified its correctness. Besides the geometric effects, our experimental observations also show an anomalous step-out frequency in an ABF. This asynchronous motion is attributed to the lag of magnetization with respect to the external rotating magnetic field due to the geometries and the soft-magnetic materials of the ribbons, which is different from the regular asynchronous motion caused solely by the low Reynolds number of the fluid around microscopic swimmers. While the lag of magnetization can be further attributed to the soft-magnetic materials adopted, the ability to arrange the easy axis will enable many new possibilities, which is of particular interest for generating more modes for swarms, such as cascaded stepping-out of ABFs with the same nominal overall sizes, and for more precise positioning using stepping motion.
|
|
10:30-10:40, Paper MoA-17.4 | |
Deep Learning-Based 3D Magnetic Microrobot Tracking Using 2D MR Images |
|
Tiryaki, Mehmet Efe | Max Plank Institute for Intelligent Systems |
Demir, Sinan Ozgun | Max Planck Institute for Intelligent Systems |
Sitti, Metin | Max-Planck Institute for Intelligent Systems |
Keywords: Micro/Nano Robots, Deep Learning Methods, Visual Servoing
Abstract: Magnetic resonance imaging (MRI)-guided robots have emerged as a promising tool for minimally invasive medical operations. Recently, MRI scanners have been proposed for actuating magnetic microrobots and localizing them in the patient's body using two-dimensional (2D) MR images. However, three-dimensional (3D) magnetic microrobot tracking during motion is still an untackled issue in MRI-powered microrobotics. Here, we present a deep learning-based 3D magnetic microrobot tracking method that uses 2D MR images acquired during microrobot motion. The proposed method comprises a convolutional neural network (CNN) and a complementary particle filter for 3D microrobot tracking. The CNN localizes the microrobot position relative to the 2D MRI slice and classifies the microrobot's visibility in the MR images. First, we create an ultrasound (US) imaging-mentored, MRI-based microrobot imaging and actuation system to train the CNN. Then, we train the CNN using MRI data generated by automated experiments with US image-based visual servoing of a microrobot with a 500 μm-diameter magnetic core. We show that the proposed CNN can localize the microrobot and classify its visibility in an in vitro environment with ±0.56 mm and 87.5% accuracy, respectively, in 2D MR images. Furthermore, we demonstrate ex vivo 3D microrobot tracking with ±1.43 mm accuracy, improving tracking accuracy by 60% compared to previous studies. The presented tracking strategy will enable MRI-powered microrobots to be used in high-precision targeted medical applications in the future.
|
|
10:40-10:50, Paper MoA-17.5 | |
A PZT-Driven 6-DOF High-Speed Micromanipulator for Circular Vibration Simulation and Whirling Flow Generation |
|
Luo, Weikun | Beijing Institute of Technology |
Liu, Xiaoming | Beijing Institute of Technology |
Tang, Xiaoqing | Beijing Institute of Technology |
Liu, Dan | Beijing Institute of Technology |
Kojima, Masaru | Osaka University |
Huang, Qiang | Beijing Institute of Technology |
Arai, Tatsuo | University of Electro-Communications |
Keywords: Dexterous Manipulation, Micro/Nano Robots, Biological Cell Manipulation
Abstract: Existing micromanipulation methods, whether contact or non-contact, can hardly meet the requirements of low damage and multiple functions in the biomedical field. This study presents a high-speed micromanipulator that can simulate circular vibrations and generate microflow, which could be utilized to actuate non-invasive multiple operations on biological targets at the microscale. We design a PZT-driven 6-DOF micromanipulator with a hybrid structure and flexible hinges. Two 3-PRS parallel modules are serially connected in a mirrored arrangement to achieve high speed, high accuracy, and a large workspace simultaneously, which enables highly controllable circular vibration simulation and strong whirling flow generation. The static, dynamic, and trajectory tracking performances were evaluated. The experimental results showed a 324 × 331 × 40 μm³ workspace, which was also the range of trajectories that could be generated. Trajectory tracking evaluation showed that the device could realize ~200 Hz circular vibration with an error of about 2.4% through open-loop control. Finally, the microflow generation experiment indicated the great potential of the proposed micromanipulator and the whirling flow generation method for operating on biological targets at the microscale.
|
|
10:50-11:00, Paper MoA-17.6 | |
Real-Time Acoustic Holography with Physics-Based Deep Learning for Acoustic Robotic Manipulation |
|
Zhong, Chengxi | ShanghaiTech University |
Sun, Zhenhuan | Shanghaitech University |
Lyu, Kunyong | Shanghaitech University |
Guo, Yao | Shanghai Jiao Tong University |
Liu, Song | ShanghaiTech University |
Keywords: Automation at Micro-Nano Scales, Micro/Nano Robots, Grippers and Other End-Effectors
Abstract: Acoustic holography is a newly emerging and promising technique for dynamically generating arbitrary desired holographic acoustic fields in 3D space for contactless robotic manipulation. The latest technology supporting complex dynamic holographic acoustic field reconstruction is the phased transducer array (PTA), where the phase profile of the acoustic wave emitted from discrete transducers is controlled independently by sophisticated circuits to modulate the acoustic interference field. While the forward kinematics of a phased-array-based robotic manipulation system is simple and straightforward, the inverse kinematics of the required holographic acoustic field is mathematically non-linear and analytically unsolvable, which substantially limits the application of dynamic holographic acoustic fields for robot manipulation. In this work, we propose a physics-based deep learning framework for this phase-retrieval inverse kinematics problem so that the target complex hologram can be reconstructed precisely, with an average MAE of 0.022, and in real time, with a prediction time of 47 milliseconds on a GPU. The accuracy and real-time performance of the proposed method for dynamic holographic acoustic field reconstruction from a PTA are demonstrated experimentally.
|
|
11:00-11:10, Paper MoA-17.7 | |
On-Chip Fabrication of Micro-Chain Robot with Selective Magnetization Using Magnetically Guided Arraying Microfluidic Devices |
|
Tang, Xiaoqing | Beijing Institute of Technology |
Liu, Xiaoming | Beijing Institute of Technology |
Li, Yuyang | Beijing Institute of Technology |
Liu, Dan | Beijing Institute of Technology |
Li, Yuke | Beijing Institute of Technology |
Kojima, Masaru | Osaka University |
Huang, Qiang | Beijing Institute of Technology |
Arai, Tatsuo | University of Electro-Communications |
Keywords: Soft Robot Materials and Design, Micro/Nano Robots, Assembly
Abstract: Magnetic microrobots have become a promising approach in many biomedical applications due to their small volume, flexible motion, and untethered operation. The micro-chain robot is one of the most popular magnetic microrobots. However, the uncontrollable magnetic moment direction and quantity of the magnetic beads comprising existing self-assembled micro-chain robots limit their locomotion and applications. This paper proposes an on-chip micro-chain robot fabrication method that assembles magnetic beads with controllable magnetic moment direction and quantity. The bead quantity can be controlled by the structural limits of the microchannel, and the direction of the magnetic moment can be adjusted by the integrated external magnetic field. The assembled magnetic beads are then glued by a hydrogel under UV exposure. Micro-chain robots with different quantities and magnetic moment directions of the magnetic beads were successfully fabricated and tested in experiments. Due to the array structure of the microfluidic device, batch manufacturing of low-cost magnetic robots was achieved with our method. The movement of dual-bead microrobots with two orthogonal magnetic moment directions was analyzed and compared. One of the dual-bead microrobots was applied to the transportation of a hydrogel module using pushing and pulling modes. This indicates that the proposed controllable on-chip fabrication of magnetic micro-chain robots has the potential to enhance the microrobot's ability in biomedical applications.
|
|
11:10-11:20, Paper MoA-17.8 | |
On-Chip Automatic Trapping and Rotating for Zebrafish Embryo Injection |
|
Chen, Zhuo | Beijing Institute of Technology |
Liu, Xiaoming | Beijing Institute of Technology |
Tang, Xiaoqing | Beijing Institute of Technology |
Li, Yuyang | Beijing Institute of Technology |
Liu, Dan | Beijing Institute of Technology |
Li, Yuke | Beijing Institute of Technology |
Huang, Qiang | Beijing Institute of Technology |
Arai, Tatsuo | University of Electro-Communications |
Keywords: Automation at Micro-Nano Scales, Biological Cell Manipulation, Robotics and Automation in Life Sciences
Abstract: Zebrafish embryo injection is often required in biomedical research using zebrafish. In the injection operation, trapping and rotating the zebrafish embryo to achieve a proper posture is essential for a high success rate. We propose an on-chip platform capable of efficient and automatic trapping and rotating for the injection of zebrafish embryos. A low-cost 3D-printed microchannel is designed to trap zebrafish embryos in a cavity array. The blind-hole design at each cavity can generate microbubbles, and the bubbles, when exposed to an acoustic wave of a specific frequency, can trap and rotate the zebrafish embryos. The process, including trapping and rotating, can be monitored and executed automatically with computer vision assistance. Experimental results show that on-chip trapping and rotating operations were realized successfully. The success rate of trapping zebrafish embryos was up to 99%, and the time to trap a single embryo was as low as 0.2 s. Embryo rotation could be achieved in two different modes: continuous rotation and intermittent rotation. The accuracy and maximal velocity of rotating the embryo reached 5° and 3.5 r/s, respectively. We therefore believe the proposed efficient, automatic, on-chip trapping and rotating platform can support zebrafish embryo injection well.
|
|
11:20-11:30, Paper MoA-17.9 | |
Independent Control Strategy of Multiple Magnetic Flexible Millirobots for Position Control and Path Following (I) |
|
Xu, Tiantian | Chinese Academy of Sciences |
Huang, Chenyang | Shenzhen Institutes of Advanced Technology, Chinese Academy of S |
Lai, Zhengyu | Chinese Academy of Science, Shenzhen Institude of Advanced Techn |
Wu, Xinyu | CAS |
Keywords: Micro/Nano Robots, Multi-Robot Systems, Motion Control
Abstract: Magnetically actuated small-scale robots have great potential for numerous applications in remote, confined, or enclosed environments. Multiple small-scale robots enable cooperation and increase operating efficiency. However, independent control of multiple magnetic small-scale robots is a great challenge, because the robots receive identical control inputs from the same external magnetic field. In this article, we propose a novel strategy for completely decoupled independent control of magnetically actuated flexible swimming millirobots. A flexible millirobot shows a crawling motion on a flat plane within an oscillating magnetic field. Millirobots with different magnetization directions have the same velocity response curve to the oscillating magnetic field but with a difference of phase. We designed and fabricated a group of up to four heterogeneous millirobots with identical geometries and different magnetization directions. According to their velocity response curves, an optimal direction of the oscillating magnetic field is calculated to induce a desired set of velocities for the millirobot group, in which one millirobot's velocity is nonzero and the others' are approximately zero. The strategy is verified by experiments on independent position control of up to four millirobots and independent path following control of up to three millirobots with small errors. We further expect that, with this independent control strategy, the millirobots will be able to cooperate to finish complicated tasks.
|
|
MoA-18 |
Rm18 (Room 554) |
Motion and Path Planning 1 |
Regular session |
Chair: Moosmann, Marius | Fraunhofer IPA |
Co-Chair: Yu, Jingjin | Rutgers University |
|
10:00-10:10, Paper MoA-18.1 | |
Learning Goal-Oriented Non-Prehensile Pushing in Cluttered Scenes |
|
Dengler, Nils | University of Bonn |
Großklaus, David | Rheinische Friedrich-Wilhelms-Universität Bonn |
Bennewitz, Maren | University of Bonn |
Keywords: Manipulation Planning, Machine Learning for Robot Control, Reinforcement Learning
Abstract: Pushing objects through cluttered scenes is a challenging task, especially when the objects to be pushed have initially unknown dynamics and touching other entities has to be avoided to reduce the risk of damage. In this paper, we approach this problem by applying deep reinforcement learning to generate pushing actions for a robotic manipulator acting on a planar surface where objects have to be pushed to goal locations while avoiding other items in the same workspace. With the latent space learned from a depth image of the scene and other observations of the environment, such as contact information between the end effector and the object as well as distance to the goal, our framework is able to learn contact-rich pushing actions that avoid collisions with other objects. As the experimental results with a six degrees of freedom robotic arm show, our system is able to successfully push objects from start to end positions while avoiding nearby objects. Furthermore, we evaluate our learned policy in comparison to a state-of-the-art pushing controller for mobile robots and show that our agent performs better in terms of success rate, collisions with other objects, and continuous object contact in various scenarios.
|
|
10:10-10:20, Paper MoA-18.2 | |
Transfer Learning for Machine Learning-Based Detection and Separation of Entanglements in Bin-Picking Applications |
|
Moosmann, Marius | Fraunhofer IPA |
Spenrath, Felix | Fraunhofer Institute for Manufacturing Engineering and Automatio |
Rosport, Johannes | Fraunhofer-Institut Für Produktionstechnik Und Automatisierung |
Melzer, Philipp | Fraunhofer-Institut Für Produktionstechnik Und Automatisierung I |
Kraus, Werner | Fraunhofer IPA |
Bormann, Richard | Fraunhofer IPA |
Huber, Marco F. | University of Stuttgart |
Keywords: Manipulation Planning, Deep Learning in Grasping and Manipulation, Grasping
Abstract: In this paper, we present a Domain Randomization and a Domain Adaptation approach to transfer experience for entanglement detection and separation from simulation into a real-world bin-picking application. We investigate the influence of different randomization options in image processing and use a CycleGAN as a further Domain Adaptation method to synthesize simulation data as realistically as possible. On the basis of this adapted data we re-train our detection and separation methods and validate the usefulness of these Sim-to-Real methods. In numerous real-world experiments we show that we achieve a significant increase of up to 71.74% in the performance of the overall system by using the Sim-to-Real approaches as opposed to the direct transfer.
|
|
10:20-10:30, Paper MoA-18.3 | |
Deep Reinforcement Learning Based on Local GNN for Goal-Conditioned Deformable Object Rearranging |
|
Deng, Yuhong | Tsinghua Univerisity |
Xia, Chongkun | Tsinghua University |
Wang, Xueqian | Center for Artificial Intelligence and Robotics, Graduate School |
Chen, Lipeng | Tencent |
Keywords: Manipulation Planning, Reinforcement Learning, Perception for Grasping and Manipulation
Abstract: Object rearranging is one of the most common deformable manipulation tasks, where the robot needs to rearrange a deformable object into a goal configuration. Previous studies focus on designing an expert system for each specific task by model-based or data-driven approaches, and the application scenarios are therefore limited. Some research has attempted to design a general framework to obtain more advanced manipulation capabilities for deformable rearranging tasks, with substantial progress achieved in simulation. However, transferring from simulation to reality is difficult due to the limitations of the end-to-end CNN architecture. To address these challenges, we design a local GNN (graph neural network) based learning method, which utilizes two representation graphs to encode keypoints detected from images. Self-attention is applied for graph updating and cross-attention is applied for generating manipulation actions. Extensive experiments have been conducted to demonstrate that our framework is effective in multiple 1-D (rope, rope ring) and 2-D (cloth) rearranging tasks in simulation and can be easily transferred to a real robot by fine-tuning a keypoint detector.
|
|
10:30-10:40, Paper MoA-18.4 | |
Controlling the Cascade: Kinematic Planning for N-Ball Toss Juggling |
|
Ploeger, Kai | Technische Universität Darmstadt |
Peters, Jan | Technische Universität Darmstadt |
Keywords: Manipulation Planning, Dynamics, Kinematics
Abstract: Dynamic movements are ubiquitous in human motor behavior as they tend to be more efficient and can solve a broader range of skill domains than their quasi-static counterparts. For decades, robotic juggling tasks have been among the most frequently studied dynamic manipulation problems since the required dynamic dexterity can be scaled to arbitrarily high difficulty. However, successful approaches have been limited to basic juggling skills, indicating a lack of understanding of the required constraints for dexterous toss juggling. We present a detailed analysis of the toss juggling task, identifying the key challenges of the switching contacts task and formalizing it as a trajectory optimization problem. Building on our state-of-the-art, real-world toss juggling platform, we reach the theoretical limits of toss juggling in simulation, evaluate a resulting real-time controller in environments of varying difficulty and achieve robust toss juggling of up to 17 balls on two anthropomorphic manipulators. https://sites.google.com/view/controlling-the-cascade
|
|
10:40-10:50, Paper MoA-18.5 | |
Rearrangement-Based Manipulation Via Kinodynamic Planning and Dynamic Planning Horizons |
|
Ren, Kejia | Rice University |
Kavraki, Lydia | Rice University |
Hang, Kaiyu | Rice University |
Keywords: Manipulation Planning
Abstract: Robot manipulation in cluttered environments often requires complex and sequential rearrangement of multiple objects in order to achieve the desired reconfiguration of the target objects. Due to the sophisticated physical interactions involved in such scenarios, rearrangement-based manipulation is still limited to a small range of tasks and is especially vulnerable to physical uncertainties and perception noise. This paper presents a planning framework that leverages the efficiency of sampling-based planning approaches, and closes the manipulation loop by dynamically controlling the planning horizon. Our approach interleaves planning and execution to progressively approach the manipulation goal while correcting any errors or path deviations along the process. Meanwhile, our framework allows the definition of manipulation goals without requiring explicit goal configurations, enabling the robot to flexibly interact with all objects to facilitate the manipulation of the target ones. With extensive experiments both in simulation and on a real robot, we evaluate our framework on three manipulation tasks in cluttered environments: grasping, relocating, and sorting. In comparison with two baseline approaches, we show that our framework can significantly improve planning efficiency, robustness against physical uncertainties, and task success rate under limited time budgets.
|
|
10:50-11:00, Paper MoA-18.6 | |
Parallel Monte Carlo Tree Search with Batched Rigid-Body Simulations for Speeding up Long-Horizon Episodic Robot Planning |
|
Huang, Baichuan | Rutgers University |
Boularias, Abdeslam | Rutgers University |
Yu, Jingjin | Rutgers University |
Keywords: Manipulation Planning, Deep Learning in Grasping and Manipulation, Grasping
Abstract: We propose a novel Parallel Monte Carlo tree search with Batched Simulations (PMBS) algorithm for accelerating long-horizon, episodic robotic planning tasks. Monte Carlo tree search (MCTS) is an effective heuristic search algorithm for solving episodic decision-making problems whose underlying search spaces are expansive. Leveraging a GPU-based large-scale simulator, PMBS introduces massive parallelism into MCTS for solving planning tasks through the batched execution of a large number of concurrent simulations, which allows for more efficient and accurate evaluations of the expected cost-to-go over large action spaces. When applied to the challenging manipulation tasks of object retrieval from clutter, PMBS achieves a speedup of over 30x with an improved solution quality, in comparison to a serial MCTS implementation. We show that PMBS can be directly applied to real robot hardware with negligible sim-to-real differences. Supplementary material, including video, can be found at https://github.com/arc-l/pmbs.
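The key idea of evaluating many concurrent simulations in a single batched call can be illustrated with a much-simplified root-level sketch: each candidate first action is scored by a batch of random rollouts, with one vectorized simulator call per time step. This is only a stand-in for the full PMBS tree search, and the toy point-mass simulator below is an assumption in place of a GPU rigid-body engine.

import numpy as np

def batched_rollout_values(simulate_batch, state, actions, n_rollouts, horizon, rng):
    """Estimate each candidate action's value from random rollouts, using one
    batched simulator call per time step. simulate_batch(states, actions) must
    accept arrays of shape (B, ...) and return (next_states, rewards)."""
    n_actions = len(actions)
    B = n_actions * n_rollouts
    states = np.repeat(state[None, :], B, axis=0)
    first = np.repeat(actions, n_rollouts, axis=0)   # each action gets a block of rollouts
    states, returns = simulate_batch(states, first)
    for _ in range(horizon - 1):
        random_actions = rng.uniform(-1.0, 1.0, size=first.shape)
        states, rewards = simulate_batch(states, random_actions)
        returns = returns + rewards
    return returns.reshape(n_actions, n_rollouts).mean(axis=1)

# Toy stand-in simulator: reward is the negative distance of a point mass to the origin.
def toy_simulate(states, actions):
    next_states = states + 0.1 * actions
    return next_states, -np.linalg.norm(next_states, axis=1)

rng = np.random.default_rng(3)
candidate_actions = rng.uniform(-1.0, 1.0, size=(8, 2))
values = batched_rollout_values(toy_simulate, np.array([1.0, 1.0]),
                                candidate_actions, n_rollouts=64, horizon=10, rng=rng)
print(candidate_actions[np.argmax(values)])   # best first action under the estimates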
|
|
11:00-11:10, Paper MoA-18.7 | |
Adapting Rapid Motor Adaptation for Bipedal Robots |
|
Kumar, Ashish | UC Berkeley |
Li, Zhongyu | University of California, Berkeley |
Zeng, Jun | University of California, Berkeley |
Pathak, Deepak | Carnegie Mellon University |
Sreenath, Koushil | University of California, Berkeley |
Malik, Jitendra | UC Berkeley |
Keywords: Humanoid and Bipedal Locomotion, Robust/Adaptive Control, Reinforcement Learning
Abstract: Recent advances in legged locomotion have enabled quadrupeds to walk on challenging terrain. However, bipedal robots are inherently more unstable, and hence it is harder to design walking controllers for them. In this work, we leverage recent advances in rapid adaptation for walking and extend them to bipedal robots. Similar to existing works, we start with a base policy which produces actions while taking as input an estimated extrinsics vector from an adaptation module. This extrinsics vector contains information about the environment and enables the walking controller to rapidly adapt online. However, the extrinsics estimator could be imperfect, which might lead to poor performance of the base policy, which expects a perfect estimator. We propose A-RMA (Adapting RMA), which additionally adapts the base policy to the imperfect extrinsics estimator by fine-tuning it using model-free RL. We demonstrate that A-RMA outperforms a number of RL-based baseline controllers and model-based controllers in simulation, and show zero-shot deployment of a single A-RMA policy to enable Cassie (a bipedal robot) to walk in a variety of scenarios beyond what it has seen during training. Videos at https://ashish-kmr.github.io/a-rma/
|
|
11:10-11:20, Paper MoA-18.8 | |
Three-Dimensional Dynamic Running with a Point-Foot Biped Based on Differentially Flat SLIP |
|
Hong, Zejun | Southern University of Science and Technology |
Chen, Hua | Southern University of Science and Technology |
Zhang, Wei | Southern University of Science and Technology |
Keywords: Humanoid and Bipedal Locomotion, Whole-Body Motion Planning and Control, Underactuated Robots
Abstract: This paper presents a novel framework for point-foot biped running in three-dimensional space. The proposed approach generates center of mass (CoM) reference trajectories based on a differentially flat spring-loaded inverted pendulum (SLIP) model. A foothold planner selects, in real time, a touchdown location that renders an optimal CoM trajectory for the upcoming step. Dynamically feasible trajectories of the CoM and orientation are subsequently generated by model predictive control (MPC) based on a simplified single rigid body (SRB) model. A task-space controller is then applied online to compute whole-body joint torques that embed these target dynamics into the robot. The proposed approach is evaluated in physics simulation of a 12 degree-of-freedom (DoF), 7.95 kg point-foot bipedal robot. The robot achieves stable running at varying speeds with a maximum of 1.1 m/s. The proposed scheme is shown to be able to reject vertical disturbances of 8 Ns and lateral disturbances of 6.5 Ns applied at the robot base.
|
|
11:20-11:30, Paper MoA-18.9 | |
A Passive Control Framework for a Bilateral Leader-Follower Robotic Surgical Setup Imposing RCM and Active Constraints |
|
Kastritsi, Theodora | Aristotle University of Thessaloniki |
Doulgeri, Zoe | Aristotle University of Thessaloniki |
Keywords: Telerobotics and Teleoperation
Abstract: We consider the problem of controlling a bilateral leader-follower robotic surgical setup to provide kinesthetic haptic feedback to the user when the instrument approaches a forbidden area, such as sensitive organs, arteries, or veins that should be protected from injury during surgery. The leader is a haptic device, while the follower is a general-purpose manipulator holding an elongated tool with an articulated instrument that must be manipulated through an entry port. We propose a control framework that is proved passive, incorporating a target admittance model for the follower that is designed to impose a remote center of motion (RCM) while being subject to repulsive forces generated by properly designed artificial potentials associated with the forbidden areas. Simulation and experimental results utilizing a virtual intraoperative environment, provided as a point cloud of a kidney and its surrounding vessels characterized as forbidden areas, validate and demonstrate the performance of the proposed control scheme.
|
|
MoA-19 |
Rm19 (Room 555) |
Biologically-Inspired Robots 1 |
Regular session |
Chair: Nansai, Shunsuke | Tokyo Denki University |
Co-Chair: Umedachi, Takuya | Shinshu University |
|
10:00-10:10, Paper MoA-19.1 | |
The Flatworm-Like Mesh Robot WORMESH-II: Steering Control of Pedal Wave Locomotion |
|
Ganegoda Vidanage, Charaka Rasanga | Saitama University |
Hiraishi, Kengo | Saitama University |
Hodoshima, Ryuichi | Saitama University |
Kotosaka, Shinya | Saitama University |
Keywords: Biologically-Inspired Robots, Motion Control, Field Robots
Abstract: WORMESH is a unique robot concept inspired by flatworm locomotion, and its key feature is the use of multiple traveling waves for locomotion. This paper presents a steering method for the anisotropic module configuration (AMC) of WORMESH-II based on the kinematics of skid steering for mobile robots. The AMC of WORMESH-II uses two parallel pedal waves to generate locomotion. The kinematic model of WORMESH-II shows that its longitudinal and angular velocities depend on the sum and the difference of the amplitudes A_l and A_r of the two synchronous left and right pedal waves, respectively. When both pedal waves have the same amplitude, the robot moves in a straight line, whereas the trajectory becomes a curve for different wave amplitudes. The radius of the curved trajectory is inversely proportional to |A_l - A_r|. The proposed method is ineffective when A_i = 0 (i = l, r). The proposed method was confirmed by dynamic simulation of WORMESH-II using a physics engine. Moreover, the skid-steering method was tested and verified using the prototype.
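Under the standard skid-steering kinematics the abstract builds on, the mapping from the two pedal-wave amplitudes to the body twist can be sketched as below. The gain constants and track width are hypothetical, but the structure shows why the turning radius scales inversely with the amplitude difference.

import numpy as np

def pedal_wave_twist(A_l, A_r, k_v=1.0, k_w=1.0, track=0.2):
    """Skid-steering-style mapping from left/right pedal-wave amplitudes to a
    body twist: forward speed from the sum, yaw rate from the difference.
    The turning radius v/omega is proportional to (A_l + A_r) / (A_l - A_r),
    hence inversely proportional to the amplitude difference."""
    v = k_v * (A_l + A_r) / 2.0
    omega = k_w * (A_l - A_r) / track
    radius = np.inf if np.isclose(omega, 0.0) else v / omega
    return v, omega, radius

print(pedal_wave_twist(0.03, 0.03))   # equal amplitudes -> straight line
print(pedal_wave_twist(0.03, 0.02))   # unequal amplitudes -> finite turning radius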
|
|
10:10-10:20, Paper MoA-19.2 | |
In-Hand Manipulation Exploiting Bending and Compression Deformations of Caterpillar-Locomotion-Inspired Fingers |
|
Onodera, Tomoya | Shinshu University |
Iwamoto, Noriyasu | Shinshu Univ |
Umedachi, Takuya | Shinshu University |
Keywords: Soft Robot Applications, Biologically-Inspired Robots, Tendon/Wire Mechanism
Abstract: This paper presents a novel way to realize in-hand manipulation, inspired by the peristaltic motion used in the locomotion of large caterpillars. In sharp contrast to traditional hard/rigid robotic fingers, the proposed soft-bodied finger produces peristaltic motion with both compression and bending deformations. The design is based on the biological fact that large caterpillars (e.g., Bombyx mori) utilize bending and compression/extension of the body to produce crawling locomotion. We demonstrate that, by exploiting these multi-modal deformations, a prototype hand consisting of two of the proposed fingers can rotate and transport a grasped object. We also found that a time gap between the two fingers' motions is needed to stabilize in-hand manipulation. The design can shed new light on designing grippers inspired by soft-bodied creatures.
|
|
10:20-10:30, Paper MoA-19.3 | |
Embodying Rather Than Encoding: Undulation with Binary Input |
|
Li, Longchuan | Beijing University of Chemical Technology |
Ma, Shugen | Ritsumeikan University |
Tokuda, Isao | Ritsumeikan University |
Tian, Yang | Ritsumeikan University |
Cao, Yiming | Ritsumeikan University |
Nokata, Makoto | Ritsumeikan University |
Li, Zhiqing | Beijing University of Chemical Technology |
Keywords: Biologically-Inspired Robots, Biomimetics
Abstract: Undulation is the most common gait generated by legless creatures, enabling robust and efficient locomotion in various environments. These advantages have inspired the control design of many kinds of locomotion robots. Despite differences in technical detail, most of them realize the undulation gait by tracking predetermined trajectories called serpenoid curves, which are a group of sinusoidal waveforms with specified phase differences. This technique, however, is quite redundant in terms of sensing and control. Here, we investigate the following research question: is it necessary to encode the sinusoidal waveform in the control signal to make the whole body assume an "S-shape"? We use a 4-link rigid-body dynamics model as a simple example, on which numerical simulations are conducted. Together with theoretical analysis, we show that the undulation gait emerges naturally from an embodied position controller and filter, where only binary actuation torques are required. Our results not only reveal locomotion mechanisms that significantly reduce the sensing and control requirements for generating an artificial undulation gait, but also provide additional understanding of biological systems from a mechanical engineering point of view.
|
|
10:30-10:40, Paper MoA-19.4 | |
A Creeping Snake-Like Robot with Partial Actuation |
|
Cao, Yiming | Ritsumeikan University |
Li, Longchuan | Beijing University of Chemical Technology |
Ma, Shugen | Ritsumeikan University |
Keywords: Biologically-Inspired Robots, Biomimetics
Abstract: Inspired by the creeping gait of natural snakes, snake-like robots swing their joints from side to side along similar tracks to generate propelling forces. However, it is not always essential to control all joints of a snake-like robot to realize the creeping gait. Therefore, in this paper, a creeping snake-like robot with partially actuated joints is investigated, with the aim of reducing the redundancy caused by full actuation. Essentially, this approach is composed of two concepts: 1) a joint equipped with a torsion-spring mechanism bridges the passive joints to generate rhythmic oscillation, and 2) harmonic joint trajectories assist the robot in generating more efficient locomotion. We demonstrate that the actuated joint dominates the passive dynamics of the system, which contributes to the overall motion, and that different spring stiffnesses affect the motion performance. Additionally, the interaction between the robot and the environment through Coulomb friction is considered to reveal the factors that help the snake-like robot achieve better locomotion performance.
|
|
10:40-10:50, Paper MoA-19.5 | |
Design and Experiments of Snake Robots with Docking Function |
|
Qin, Fatao | TianGong University |
Duan, Xiaojie | TianGong University |
Ma, Shihao | Tiangong University |
Yuan, Jinglun | TIANGONG University |
Wang, Xiangyu | Tiangong University |
Wang, Jianming | Tiangong University |
Xiao, Xuan | Tiangong University |
Keywords: Biologically-Inspired Robots, Mechanism Design, Computer Vision for Manufacturing
Abstract: This paper presents a novel snake robot with a docking function, which allows snake robots to connect with each other to form a stronger robot with double the length and double the degrees of freedom. First, the mechanical design of the snake robot with the docking function is introduced, including the body link and the head-tail passive docking mechanism. Second, the control system is built, and control strategies for locomotion and docking are proposed separately. Then, a visual perception function is implemented for target recognition during the docking process. Finally, a prototype is developed. The mobility and the docking function are fully verified and analyzed through physical experiments.
|
|
10:50-11:00, Paper MoA-19.6 | |
Modeling and Control of a Lizard Inspired Single Actuated Robot |
|
Nansai, Shunsuke | Tokyo Denki University |
Kamamichi, Norihiro | Tokyo Denki University |
Itoh, Hiroshi | Tokyo Denki University |
Noji, Shohei | Tokyo Denki University |
Keywords: Biologically-Inspired Robots, Mechanism Design, Legged Robots
Abstract: In this study, trajectory-tracking control for a lizard-inspired single-actuated robot (LISA) is implemented using a novel morphology. A high degree of reproducibility is obtained between the results of kinematic analysis and the behavior of the actual robot. Several 1-degree-of-freedom (DOF)-driven robots have been proposed. However, the application of these robots is severely restricted because of their inherent morphological limitations. LISA is a multilegged robot that can overcome these disadvantages and propel itself and turn with 1 DOF. In this study, we formulate the kinematics of LISA, such as the turning angle and stride length. Furthermore, a unique robot coordinate frame is defined to enable the symmetrical representation of critical robot state quantities. Next, a trajectory-tracking control system based on proportional-integral-derivative (PID) control is designed and verified by both experiments and numerical simulations. Finally, the results of the experiments and numerical simulations are quantitatively compared using the root mean square error, and the reproducibility between the kinematic analysis and the behavior of the actual system is discussed. This study makes the following contributions: (1) the morphology of LISA yields analytical results using only kinematic analysis, which is considerably simpler than dynamic analysis; (2) LISA achieves sufficiently good kinematic performance for the required motions. Experiments revealed that the designed trajectory-tracking control system allows LISA to appropriately track several types of trajectories.
|
|
11:00-11:10, Paper MoA-19.7 | |
Optimal Path Following Control with Efficient Computation for Snake Robots Subject to Multiple Constraints and Unknown Frictions |
|
Li, Xiaohan | Tianjin University |
Ren, Chao | Tianjin University |
Ma, Shugen | Ritsumeikan University |
Keywords: Biologically-Inspired Robots, Motion Control
Abstract: This letter proposes a real-time optimal robust path following control scheme, based on model predictive control (MPC), for planar snake robots without sideslip constraints. One of its features is that a linear double-integrator model, rather than the complex dynamic model of snake robots, is used for the MPC design to improve computational efficiency. Moreover, in addition to constraints on joint angles, velocity, and acceleration for safety, constraints on the joint offset and velocity are also considered to keep the snake robot moving forward efficiently. The multiple inequality constraints are handled by a novel constraint translator. Furthermore, to achieve robust path following control subject to unknown and varying friction forces and disturbances, two reduced-order, structure-improved extended state observers are designed to avoid complex environmental modeling. Extensive comparative experiments were conducted to verify the constraint design and the effectiveness of the proposed control scheme.
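The modeling choice highlighted above, replacing the full snake dynamics with a linear double integrator inside the MPC, amounts to predicting with the discrete-time model sketched below. The sampling period, state ordering, and horizon are illustrative assumptions, and the cost, constraints, and observers of the actual controller are not reproduced.

import numpy as np

dt = 0.05   # assumed control period

# Discrete double integrator per planar axis: state [position, velocity],
# input is acceleration. Stacked for x and y via a block-diagonal structure.
A1 = np.array([[1.0, dt],
               [0.0, 1.0]])
B1 = np.array([[0.5 * dt ** 2],
               [dt]])
A = np.kron(np.eye(2), A1)    # 4x4 for state (x, vx, y, vy)
B = np.kron(np.eye(2), B1)    # 4x2 for input (ax, ay)

def predict(x0, u_seq):
    """Roll the linear model forward over a horizon of accelerations (N, 2)."""
    xs = [x0]
    for u in u_seq:
        xs.append(A @ xs[-1] + B @ u)
    return np.array(xs)

x0 = np.array([0.0, 0.1, 0.0, 0.0])       # initially moving along x
u_seq = np.tile([0.0, 0.05], (20, 1))     # constant lateral acceleration
print(predict(x0, u_seq)[-1])             # predicted terminal state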
|
|
11:10-11:20, Paper MoA-19.8 | |
A Distributed Coach-Based Reinforcement Learning Controller for Snake Robot Locomotion |
|
Jia, Yuanyuan | Ritsumeikan University |
Ma, Shugen | Ritsumeikan University |
Keywords: Biologically-Inspired Robots, Redundant Robots, Probability and Statistical Methods
Abstract: Reinforcement learning commonly suffers from slow convergence and requires thousands of episodes, which makes it hard to apply to physical robotic systems. Because of the complexity introduced by a large number of degrees of freedom, little work has been done on snake robot control using reinforcement learning. Existing methods either adopt an asynchronous A3C structure or a joint state representation. In this paper, a fully distributed coach-based reinforcement learning method is proposed for snake robot control that can considerably accelerate training while using fewer episodes. The major contributions comprise: 1) we extend our previously presented graphical coach-based RL control method into a completely distributed framework; 2) an explicit stochastic density propagation rule for each robot link is mathematically derived; 3) the various interactions with uncertainty are modeled and estimated to achieve efficient and robust control of snake robots. Preliminary results from both simulation and real-world experiments demonstrate promising performance compared with other recent methods.
|
|
11:20-11:30, Paper MoA-19.9 | |
Bio-Inspired 2D Vertical Climbing with a Novel Tripedal Robot |
|
Webster, Clyde | University of Technology Sydney |
Kong, Felix Honglim | The University of Technology Sydney |
Fitch, Robert | University of Technology Sydney |
Keywords: Biologically-Inspired Robots, Climbing Robots, Legged Robots
Abstract: Climbing robots have the potential to revolutionize the maintenance and inspection of many types of vertical structures. In nature, parrots exhibit a remarkable capacity for manipulation during climbing, which robotics can benefit from studying. In this paper we present a novel tripedal robot inspired by the morphology of these impressive birds, which use their legs and beak in a tripedal fashion when climbing. We propose several foot placement, trajectory generation, and control methods for this system, along with a performance evaluation in simulation. A video of selected simulations and live bird data is included in the supplementary material, and can also be found at https://youtu.be/vRVGraIyQgQ.
|
|
MoA-20 |
Rm20 (Room 104) |
Formal Methods in Robotics and Automation |
Regular session |
Chair: Kress-Gazit, Hadas | Cornell University |
Co-Chair: Suleiman, Wael | University of Sherbrooke |
|
10:00-10:10, Paper MoA-20.1 | |
Event-Based Signal Temporal Logic Tasks: Execution and Feedback in Complex Environments |
|
Gundana, David | Cornell University |
Kress-Gazit, Hadas | Cornell University |
Keywords: Formal Methods in Robotics and Automation, Hybrid Logical/Dynamical Planning and Verification, Multi-Robot Systems
Abstract: In this work, we synthesize control for high-level, reactive robot tasks that include timing constraints and choices over goals and constraints. We enrich Event-based Signal Temporal Logic by adding disjunctions, and propose a framework for synthesizing controllers that satisfy such specifications. If there are multiple ways to satisfy a specification, we choose, at run-time, a controller that instantaneously maximizes robustness. During execution, we automatically generate feedback in the form of pre-failure warnings that give users insight as to why a specification may be violated in the future. We demonstrate our work through physical and simulated multi-robot systems operating in complex environments.
|
|
10:10-10:20, Paper MoA-20.2 | |
Optimizing Demonstrated Robot Manipulation Skills for Temporal Logic Constraints |
|
Dhonthi, Akshay | University of Twente |
Schillinger, Philipp | Bosch Center for Artificial Intelligence |
Rozo, Leonel | Bosch Center for Artificial Intelligence |
Nardi, Daniele | Sapienza University of Rome |
Keywords: Formal Methods in Robotics and Automation, Manipulation Planning, Learning from Demonstration
Abstract: For performing robotic manipulation tasks, the core problem is determining suitable trajectories that fulfill the task requirements. Various approaches to compute such trajectories exist, with learning and optimization being the main driving techniques. Our work builds on the learning-from-demonstration (LfD) paradigm, where an expert demonstrates motions and the robot learns to imitate them. However, expert demonstrations are not sufficient to capture all sorts of task specifications, such as the timing to grasp an object. In this paper, we propose a new method that considers formal task specifications within LfD skills. Precisely, we leverage Signal Temporal Logic (STL), an expressive formalism for temporal properties of systems, to formulate task specifications and use black-box optimization (BBO) to adapt an LfD skill accordingly. We demonstrate our approach in simulation and in a real industrial setting using several tasks that showcase how our approach addresses the LfD limitations using STL and BBO.
|
|
10:20-10:30, Paper MoA-20.3 | |
Classification of Time-Series Data Using Boosted Decision Trees |
|
Aasi, Erfan | Boston University |
Vasile, Cristian Ioan | Lehigh University |
Bahreinian, Mahroo | Boston University |
Belta, Calin | Boston University |
Keywords: Formal Methods in Robotics and Automation
Abstract: Time-series data classification is central to the analysis and control of autonomous systems, such as robots and self-driving cars. Temporal logic-based learning algorithms have been proposed recently as classifiers of such data. However, current frameworks are either inaccurate for real-world applications, such as autonomous driving, or they generate long and complicated formulae that lack interpretability. To address these limitations, we introduce a novel learning method, called Boosted Concise Decision Trees (BCDTs), to generate binary classifiers that are represented as Signal Temporal Logic (STL) formulae. Our algorithm leverages an ensemble of Concise Decision Trees (CDTs) to improve the classification performance, where each CDT is a decision tree that is empowered by a set of techniques to generate simpler formulae and improve interpretability. The effectiveness and classification performance of our algorithm are evaluated on naval surveillance and urban-driving case studies.
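For readers unfamiliar with quantitative STL semantics, the sign of a formula's robustness on a trace is what turns an STL formula into a binary classifier. A minimal sketch for two simple fragments on a sampled 1-D signal is given below; the example trace and thresholds are arbitrary, and the BCDT learning procedure itself is not shown.

import numpy as np

# Quantitative (robustness) semantics for two simple STL fragments over a
# sampled 1-D signal x[t]: the sign of the robustness gives the class label.

def rob_always_less(x, c):
    """Robustness of G (x < c): worst-case margin over the whole trace."""
    return np.min(c - x)

def rob_eventually_greater(x, c, t0, t1):
    """Robustness of F_[t0,t1] (x > c): best margin inside the time window."""
    return np.max(x[t0:t1 + 1] - c)

trace = np.array([0.1, 0.4, 0.9, 1.3, 0.8, 0.2])
print(rob_always_less(trace, 1.0))               # negative -> formula violated
print(rob_eventually_greater(trace, 1.0, 1, 4))  # positive -> formula satisfied
label = 1 if rob_eventually_greater(trace, 1.0, 1, 4) > 0 else 0
print(label)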
|
|
10:30-10:40, Paper MoA-20.4 | |
Robustness-Based Synthesis for Stochastic Systems under Signal Temporal Logic Tasks |
|
Scher, Guy | Cornell University |
Sadraddini, Sadra | MIT |
Kress-Gazit, Hadas | Cornell University |
Keywords: Formal Methods in Robotics and Automation
Abstract: We develop a method for synthesizing control policies for stochastic, linear, time-varying systems that must perform tasks specified in signal temporal logic. We build upon an efficient, sampling-based framework that computes the probability of the system satisfying its specification. By exploiting the properties of linear systems and robustness score in temporal logic specifications, we obtain sample-efficient gradients of the satisfaction probability with respect to controller parameters. Therefore, by applying gradient descent we obtain locally optimized controllers that maximize the chances of satisfying the specification. We demonstrate our approach through examples of a mobile robot and a mobile manipulator in simulation.
|
|
10:40-10:50, Paper MoA-20.5 | |
Sensor Observability Index: Evaluating Sensor Alignment for Task-Space Observability in Robotic Manipulators |
|
Wong, Christopher Yee | University of Sherbrooke |
Suleiman, Wael | University of Sherbrooke |
Keywords: Formal Methods in Robotics and Automation, Kinematics, Robot Safety
Abstract: In this paper, we propose a preliminary definition and analysis of the novel concept of sensor observability index. The goal is to analyse and evaluate the performance of distributed directional or axial-based sensors to observe specific axes in task space as a function of joint configuration in serial robot manipulators. For example, joint torque sensors are often used in serial robot manipulators and assumed to be perfectly capable of estimating end effector forces, but certain joint configurations may cause one or more task-space axes to be unobservable as a result of how the joint torque sensors are aligned. The proposed sensor observability provides a method to analyse the quality of the current robot configuration to observe the task space. Parallels are drawn between sensor observability and the traditional kinematic Jacobian for the particular case of joint torque sensors in serial robot manipulators. Although similar information can be retrieved from kinematic analysis of the Jacobian transpose in serial manipulators, sensor observability is shown to be more generalizable in terms of analysing non-joint-mounted sensors and other sensor types. In addition, null-space analysis of the Jacobian transpose is susceptible to false observability singularities. Simulations and experiments using the robot Baxter demonstrate the importance of maintaining proper sensor observability in physical interactions.
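The parallel with the Jacobian transpose drawn in the abstract can be made concrete for the joint-torque-sensor case: since tau = J^T f, force directions in the null space of J^T produce no measurable torque. The sketch below counts the observable task-space force directions for a planar two-link arm; the link lengths and configurations are arbitrary examples, and this illustrates the Jacobian-transpose special case rather than the proposed sensor observability index in general.

import numpy as np

def jacobian_2link(q1, q2, l1=0.4, l2=0.3):
    """Planar 2-link position Jacobian (2x2), mapping joint rates to (x, y) velocity."""
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

def observable_force_axes(J, tol=1e-6):
    """Joint torques are tau = J.T @ f; force components in the null space of J.T
    produce no torque and are therefore unobservable from joint torque sensors."""
    _, s, _ = np.linalg.svd(J.T)
    return int(np.sum(s > tol))   # number of observable task-space force directions

# Generic configuration: both Cartesian force directions are observable.
print(observable_force_axes(jacobian_2link(0.3, 1.2)))   # -> 2
# Singular (fully stretched) configuration: one direction becomes unobservable.
print(observable_force_axes(jacobian_2link(0.3, 0.0)))   # -> 1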
|
|
10:50-11:00, Paper MoA-20.6 | |
Fair Planning for Mobility-On-Demand with Temporal Logic Requests |
|
Liang, Kaier | Lehigh University |
Vasile, Cristian Ioan | Lehigh University |
Keywords: Formal Methods in Robotics and Automation, Optimization and Optimal Control, Hybrid Logical/Dynamical Planning and Verification
Abstract: Mobility-on-demand systems are transforming the way we think about the transportation of people and goods. Most research effort has been placed on scalability issues for systems with a large number of agents and simple pick-up/drop-off demands. In this paper, we consider fair multi-vehicle route planning with streams of complex, temporal logic transportation demands. We consider the envy-free fair allocation of demands to limited-capacity vehicles based on agents' accumulated utility over a finite time horizon, representing for example monetary reward or utilization level. We propose a scalable approach based on the construction of assignment graphs that relate agents to routes and demands and pose the problem as an Integer Linear Program (ILP). Routes for assignments are computed using automata-based methods for each vehicle and demand sets of size at most the vehicle's capacity, while taking into account pick-up wait times and delay tolerances. In addition, we integrate utility-based weights in the assignment graph and ILP to ensure approximately fair allocation. We demonstrate the computational and operational performance of our methods in ride-sharing case studies over a large environment in mid-Manhattan and Linear Temporal Logic demands with stochastic arrival times. We show that our method significantly decreases the utility deviation between agents and the vacancy rate.
|
|
11:00-11:10, Paper MoA-20.7 | |
Incremental Path Planning Algorithm Via Topological Mapping with Metric Gluing |
|
Upadhyay, Aakriti | University at Albany, SUNY |
Goldfarb, Boris | University at Albany, SUNY |
Ekenna, Chinwe | University at Albany |
Keywords: Formal Methods in Robotics and Automation, Motion and Path Planning, Computational Geometry
Abstract: We present an incremental topology-based motion planner that, while planning paths in the configuration space, performs metric gluing on the constructed Vietoris-Rips simplicial complex of each sub-space (voxel). By incrementally capturing topological and geometric information in batches of voxel graphs, our algorithm avoids the time overhead of analyzing the properties of the entire configuration space. We theoretically prove in this paper that the simplices of all voxel graphs joined together are homotopy-equivalent to the union of the simplices in the configuration space. Experiments were carried out in seven different environments using various robots, including the articulated linkage robot, the Kuka YouBot, and the PR2 robot. In all environments, the results show that our algorithm achieves better convergence for path cost and computation time with a memory-efficient roadmap than state-of-the-art methods.
|
|
11:10-11:20, Paper MoA-20.8 | |
UAV-miniUGV Hybrid System for Hidden Area Exploration and Manipulation |
|
Pushp, Durgakant | Indiana University Bloomington |
Kalhapure, Swapnil | The University of Sheffield |
Das, Kaushik | TATA Consultancy Service |
Liu, Lantao | Indiana University |
Keywords: Hardware-Software Integration in Robotics, Engineering for Robotic Systems, Software-Hardware Integration for Robot Systems
Abstract: We propose a novel hybrid system (both hardware and software) of an Unmanned Aerial Vehicle (UAV) carrying a miniature Unmanned Ground Vehicle (miniUGV) to perform a complex search and manipulation task. This system leverages heterogeneous robots to accomplish a task that cannot be done by a single robot system. It enables the UAV to explore a hidden space with a narrow opening through which the miniUGV can easily enter and escape. The hidden space is assumed to be navigable for the miniUGV. The miniUGV uses Infrared (IR) sensors and a monocular camera to search for an object in the hidden space. The proposed system takes advantage of the camera's wide field of view (FOV) as well as the stochastic nature of object detection algorithms to guide the robot in the hidden space to find the object. Upon finding the object, the miniUGV grabs it using visual servoing and then returns to its start point, from where the UAV retracts it and transports the object to a safe place. If no object is found in the hidden space, the UAV continues the aerial search. The tethered miniUGV gives the UAV the ability to act beyond its reach and perform a search and manipulation task that neither robot could accomplish individually. The system has a wide range of applications, and we have demonstrated its feasibility through repeated experiments.
|
|
11:20-11:30, Paper MoA-20.9 | |
Inference of Multi-Class STL Specifications for Multi-Label Human-Robot Encounters |
|
Linard, Alexis | KTH Royal Institute of Technology |
Torre, Ilaria | KTH Royal Institute of Technology |
Leite, Iolanda | KTH Royal Institute of Technology |
Tumova, Jana | KTH Royal Institute of Technology |
Keywords: Formal Methods in Robotics and Automation
Abstract: This paper addresses the formalization of human trajectories in human-robot encounters. Inspired by robot navigation tasks in human-crowded environments, we consider the case where a human and a robot walk towards each other, and where humans have to avoid colliding with the oncoming robot. Further, humans may exhibit different behaviors, ranging from being in a hurry/minimizing completion time to maximizing safety. We propose a decision tree-based algorithm to extract STL formulae from multi-label data. Our inference algorithm learns STL specifications from data containing multiple classes, where instances can be labelled by one or many classes. We base our evaluation on a dataset of trajectories collected through an online study reproducing human-robot encounters.
|
|
MoA-OL1 |
Rm21 (on-line) |
SLAM 1 |
Regular session |
Chair: Biber, Peter | Robert Bosch GmbH |
Co-Chair: He, Li | Southern University of Science and Technology |
|
10:00-10:10, Paper MoA-OL1.1 | |
Making Parameterization and Constraints of Object Landmark Globally Consistent Via SPD(3) Manifold and Improved Cost Functions |
|
Hu, Yutong | Beihang University |
Wang, Wei | Beihang University |
Keywords: SLAM, Mapping, Computer Vision for Automation
Abstract: Object-level SLAM introduces semantically meaningful and compact object landmarks that help both indoor robot applications and outdoor autonomous driving tasks. However, the back end of object-level SLAM suffers from singularity problems because existing methods parameterize object landmarks separately by their scale and pose. Under that parameterization, the same abstract object can be represented by rotating the object coordinate frame by 90 degrees and swapping its length and width values, so the pose of the same object landmark is not globally consistent. To avoid the singularity problem, we first introduce the symmetric positive-definite (SPD) matrix manifold as an improved object-level landmark representation and further improve the cost functions in the back end to make them compatible with it. Our method demonstrates a faster convergence rate and greater robustness in simulation experiments. Experiments on real datasets further reveal that, using the same front-end data, our strategy improves mapping accuracy by 22% on average.
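The ambiguity described above is easy to reproduce; the short numpy sketch below (illustrative values only, not the authors' code) shows that two scale/yaw parameterizations of the same box-like object, related by a 90-degree rotation and a length/width swap, map to the identical SPD shape matrix Q = R diag(s^2) R^T, which is one way to see why an SPD-matrix landmark is globally consistent.

# Sketch of the ambiguity the SPD representation removes (illustrative only).
import numpy as np

def spd_from_scale_pose(scale, yaw):
    # Q = R diag(scale^2) R^T is the shape matrix of an ellipsoidal landmark.
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return R @ np.diag(np.square(scale)) @ R.T

# Same physical object, two scale/pose parameterizations:
Q1 = spd_from_scale_pose(np.array([2.0, 1.0, 0.5]), yaw=0.0)
Q2 = spd_from_scale_pose(np.array([1.0, 2.0, 0.5]), yaw=np.pi / 2)  # rotate 90 deg, swap length/width
print(np.allclose(Q1, Q2))   # True: the SPD matrix is globally consistent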
|
|
10:10-10:20, Paper MoA-OL1.2 | |
ULSM: Underground Localization and Semantic Mapping with Salient Region Loop Closure under Perceptually-Degraded Environment |
|
Wang, Junhui | Institute of Automation, Chinese Academy of Sciences |
Bin, Tian | Institute of Automation, Chinese Academy of Sciences |
Zhang, Rui | Waytous Inc, Beijing |
Chen, Long | Sun Yat-Sen University |
Keywords: SLAM, Mapping, Localization
Abstract: Simultaneous Localization and Mapping (SLAM) has greatly assisted in exploring perceptually-degraded underground environments, such as human-made tunnels, mine tunnels, and caves. However, the recurring sensor failures and spurious loop closures in these scenes bring significant challenges to applying SLAM. This paper proposes an architecture for underground localization and semantic mapping (ULSM) that promotes the robustness of odometry estimation and map-building. In this architecture, a two-stage robust motion compensation method is proposed to adapt to sensor-failure situations. The proposed salient region loop closure detection contributes to avoiding spurious loop closures. Meanwhile, the 2D pose as the initial value for point cloud registration is estimated without additional input. We also design a multi-robot cooperative mapping scheme based on descriptors of the salient region. Extensive experiments are conducted on datasets collected in the Tunnel Circuit of DARPA Subterranean Challenge.
|
|
10:20-10:30, Paper MoA-OL1.3 | |
NDD: A 3D Point Cloud Descriptor Based on Normal Distribution for Loop Closure Detection |
|
Zhou, Ruihao | Guangdong University of Technology |
He, Li | Southern University of Science and Technology |
Zhang, Hong | SUSTech |
Lin, Xubin | Guangdong University of Technology |
Guan, Yisheng | Guangdong University of Technology |
Keywords: SLAM, Range Sensing
Abstract: Loop closure detection is a key technology for long-term robot navigation in complex environments. In this paper, we present a global descriptor, named Normal Distribution Descriptor (NDD), for 3D point cloud loop closure detection. The descriptor encodes both the probability density score and the entropy of a point cloud. We also propose a fast rotation alignment process and use the correlation coefficient as the similarity measure between descriptors. Experimental results show that our approach outperforms the state-of-the-art point cloud descriptors in both accuracy and efficiency. The source code will be available and can be integrated into existing LiDAR odometry and mapping (LOAM) systems.
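The general recipe, a rotation-invariant comparison of per-bin statistics, can be sketched as follows (Python; the per-bin statistic and bin count are simplifications and not the actual NDD encoding, which uses normal-distribution density scores and entropy): two descriptors are aligned by the circular shift that maximizes the correlation coefficient.

# Hedged sketch of the overall recipe, not the authors' exact descriptor.
import numpy as np

def descriptor(cloud, n_bins=60):
    az = np.mod(np.arctan2(cloud[:, 1], cloud[:, 0]), 2 * np.pi)
    idx = (az / (2 * np.pi) * n_bins).astype(int) % n_bins
    return np.bincount(idx, minlength=n_bins).astype(float)   # per-bin count as a stand-in score

def similarity(d1, d2):
    # Rotation alignment: best correlation coefficient over circular shifts.
    return max(np.corrcoef(d1, np.roll(d2, k))[0, 1] for k in range(len(d2)))

scan = np.random.randn(2000, 3)
rotated = scan @ np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]).T
print(similarity(descriptor(scan), descriptor(rotated)))       # close to 1 despite the 90 deg yaw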
|
|
10:30-10:40, Paper MoA-OL1.4 | |
FEJ-VIRO: A Consistent First-Estimate Jacobian Visual-Inertial-Ranging Odometry |
|
Jia, Shenhan | Zhejiang University |
Jiao, Yanmei | Zhejiang University |
Zhang, Zhuqing | Zhejiang University |
Xiong, Rong | Zhejiang University |
Wang, Yue | Zhejiang University |
Keywords: SLAM, Sensor Fusion, Visual-Inertial SLAM
Abstract: In recent years, Visual-Inertial Odometry (VIO) has made significant progress. However, VIO methods suffer from localization drift over long trajectories. In this paper, we propose a First-Estimates Jacobian Visual-Inertial-Ranging Odometry (FEJ-VIRO) to reduce the localization drift of VIO by consistently incorporating ultra-wideband (UWB) ranging measurements into the VIO framework. Considering that the initial positions of UWB anchors are usually unavailable, we propose a long-short window structure to initialize the UWB anchors' positions as well as the covariance for state augmentation. After initialization, FEJ-VIRO estimates the UWB anchors' positions simultaneously along with the robot poses. We further analyze the observability of visual-inertial-ranging estimators and prove that there are four unobservable directions in the ideal case, while only three exist in the actual case. Based on these analyses, we propose to leverage the FEJ technique to fuse ranging measurements consistently. Finally, we validate our analysis and the proposed FEJ-VIRO with both simulation and real-world experiments.
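For readers unfamiliar with the FEJ idea referenced above, the toy sketch below (an assumed example, not the paper's estimator) shows the operational rule: Jacobians with respect to an anchor position are always evaluated at the anchor's first estimate, even after the estimate is updated, which keeps the linearized system's unobservable directions consistent.

# Toy illustration of the FEJ principle (assumed example).
import numpy as np

def range_jacobian(robot, anchor):
    d = robot - anchor
    return -d / np.linalg.norm(d)        # d(range)/d(anchor)

anchor_first_estimate = np.array([2.0, 0.1])      # stored once, at initialization
anchor_current        = anchor_first_estimate.copy()

for robot in [np.array([0.0, 0.0]), np.array([0.5, 0.2]), np.array([1.0, 0.1])]:
    J_fej   = range_jacobian(robot, anchor_first_estimate)   # FEJ: frozen linearization point
    J_naive = range_jacobian(robot, anchor_current)          # naive: moves with the estimate
    anchor_current += 0.05 * np.random.randn(2)               # estimate drifts between updates
    print(J_fej, J_naive)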
|
|
10:40-10:50, Paper MoA-OL1.5 | |
RGB-D SLAM in Indoor Planar Environments with Multiple Large Dynamic Objects |
|
Long, Ran | University of Edinburgh |
Rauch, Christian | University of Edinburgh |
Zhang, Tianwei | The University of Tokyo |
Ivan, Vladimir | Touchlab Limited |
Lam, Tin Lun | The Chinese University of Hong Kong, Shenzhen |
Vijayakumar, Sethu | University of Edinburgh |
Keywords: SLAM, Visual Tracking, Sensor Fusion
Abstract: This work presents a novel dense RGB-D SLAM approach for dynamic planar environments that enables simultaneous multi-object tracking, camera localisation and background reconstruction. Previous dynamic SLAM methods either rely on semantic segmentation to directly detect dynamic objects, or assume that dynamic objects occupy a smaller proportion of the camera view than the static background and can, therefore, be removed as outliers. With the aid of a camera motion prior, our approach enables dense SLAM when the camera view is largely occluded by multiple dynamic objects. The dynamic planar objects are separated by their different rigid motions and tracked independently. The remaining dynamic non-planar areas are removed as outliers and not mapped into the background. The evaluation demonstrates that our approach outperforms the state-of-the-art methods in terms of localisation, mapping, dynamic segmentation and object tracking. We also demonstrate its robustness to large drift in the camera motion prior.
|
|
10:50-11:00, Paper MoA-OL1.6 | |
DRG-SLAM: A Semantic RGB-D SLAM Using Geometric Features for Indoor Dynamic Scene |
|
Wang, Yanan | Beihang University |
Xu, Kun | Beijing University of Aeronautics and Astronautics |
Tian, Yaobin | Beihang University |
Ding, Xilun | Beijing Univerisity of Aeronautics and Astronautics |
Keywords: SLAM, Localization, RGB-D Perception
Abstract: Visual SLAM methods based on point features have achieved acceptable results in texture-rich static scenes, but they often suffer from a scarcity of texture and the presence of dynamic objects in real indoor scenes, which limits their application. In this paper, we present DRG-SLAM, which combines line and plane features with point features to improve the robustness of the system. We test the proposed algorithm on publicly available datasets, and the results demonstrate that the algorithm has superior accuracy and robustness in indoor dynamic scenes compared with the state-of-the-art methods.
|
|
11:00-11:10, Paper MoA-OL1.7 | |
Scale-Aware Direct Monocular Odometry |
|
Campos, Carlos | Universidad De Zaragoza |
Tardos, Juan D. | Universidad De Zaragoza |
Keywords: SLAM, Localization, Mapping
Abstract: We present a generic framework for scale-aware direct monocular odometry based on depth prediction from a deep neural network. In contrast with previous methods where depth information is only partially exploited, we formulate a novel depth prediction residual which allows us to incorporate multi-view depth information. In addition, we propose to use a truncated robust cost function which prevents considering inconsistent depth estimations. The photometric and depth-prediction measurements are integrated into a tightly-coupled optimization, leading to a scale-aware monocular system which does not accumulate scale drift. Our proposal is not tied to a specific neural network and can work with the vast majority of existing depth prediction solutions. We demonstrate the validity and generality of our proposal by evaluating it on the KITTI odometry dataset, using two publicly available neural networks and comparing it with similar approaches and the state-of-the-art for monocular and stereo SLAM. Experiments show that our proposal largely outperforms classic monocular SLAM, being 5 to 9 times more precise, beating similar approaches and having an accuracy which is closer to that of stereo systems.
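A minimal sketch of what a truncated robust cost on depth-prediction residuals might look like is given below (numpy; the residual form, thresholds, and values are assumptions, not the paper's exact formulation): residuals beyond the truncation threshold contribute only a constant cost and therefore cannot drag the scale estimate toward inconsistent network depths.

# Hedged sketch of a truncated robust cost on depth-prediction residuals.
import numpy as np

def truncated_huber(r, delta=0.5, trunc=2.0):
    r = np.abs(r)
    cost = np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta))   # Huber part
    return np.where(r <= trunc, cost, delta * (trunc - 0.5 * delta))     # truncation part

inv_depth_estimate  = np.array([0.50, 0.25, 0.10, 0.33])   # from the odometry (up to scale)
inv_depth_predicted = np.array([0.48, 0.26, 0.45, 0.32])   # from the network; third value inconsistent
print(truncated_huber(inv_depth_estimate - inv_depth_predicted))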
|
|
11:10-11:20, Paper MoA-OL1.8 | |
IMU Preintegration for 2D SLAM Problems Using Lie Groups |
|
Geer Cousté, Idril Tadzio | Universitat Politècnica De Catalunya |
Vallvé, Joan | CSIC-UPC |
Solà, Joan | Institut De Robòtica I Informàtica Industrial |
Keywords: SLAM, Localization
Abstract: 2D SLAM is useful for mobile robots that are constrained to a 2D plane, for example in a warehouse, simplifying calculations with respect to the 3D case. The use of an IMU in such a context can enrich the estimation and make it more robust. In this paper we reformulate the IMU preintegration widely used in 3D problems for the 2D case, making use of Lie theory. The Lie-theory-based formalization, first derived for a perfectly horizontal plane, is then easily extended to problems where the plane is not orthogonal to the gravity vector. We implement the theory in a factor-graph-based estimation library and carry out experiments to validate it on a mobile platform. Two experiments are carried out, in a horizontal and in a sloped environment, and the sensor data are processed using our two 2D methods and a state-of-the-art 3D method.
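A simplified planar preintegration scheme in this spirit can be sketched as follows (Python; biases, noise, and gravity handling are omitted, so this is only a sketch of the idea, not the paper's formulation): body-frame measurements are integrated once into relative increments (dtheta, dv, dp) that are independent of the start pose, so keyframe factors can be built without re-integrating raw IMU data.

# Simplified 2D (SE(2)) preintegration sketch under the stated assumptions.
import numpy as np

def rot2(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def preintegrate(acc, gyro, dt):
    dtheta, dv, dp = 0.0, np.zeros(2), np.zeros(2)
    for a, w in zip(acc, gyro):
        dR = rot2(dtheta)
        dp += dv * dt + 0.5 * dR @ a * dt**2
        dv += dR @ a * dt
        dtheta += w * dt
    return dtheta, dv, dp

acc  = [np.array([0.2, 0.0])] * 100          # 1 s of constant forward acceleration
gyro = [0.1] * 100                           # gentle constant yaw rate
print(preintegrate(acc, gyro, dt=0.01))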
|
|
11:20-11:30, Paper MoA-OL1.9 | |
Scale Estimation with Dual Quadrics for Monocular Object SLAM |
|
Song, Shuangfu | Tongji University |
Zhao, Junqiao | Tongji University |
Feng, Tiantian | Tongji University |
Ye, Chen | Tongji University |
Xiong, Lu | Tongji University |
Keywords: SLAM, Mapping, Localization
Abstract: The scale ambiguity problem is inherently unsolvable for monocular SLAM without a metric baseline between moving cameras. In this paper, we present a novel scale estimation approach based on an object-level SLAM system. To obtain the absolute scale of the reconstructed map, we formulate an optimization problem that makes the scaled dimensions of objects conform to the distribution of their sizes in the physical world, without relying on any prior information about the gravity direction. The dual quadric is adopted to represent objects for its ability to describe objects compactly and accurately, thus providing reliable dimensions for scale estimation. In the proposed monocular object-level SLAM system, semantic objects are initialized first from fitted 3-D oriented bounding boxes and then further optimized under constraints of 2-D detections and 3-D map points. Experiments on indoor and outdoor public datasets show that our approach outperforms existing methods in terms of accuracy and robustness.
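The core idea of matching reconstructed object dimensions to a size prior admits a very small sketch (Python; the class priors and reconstructed dimensions are invented, and the paper's optimization is richer than this): if scaled sizes are matched to prior sizes in log space, the best map scale has a closed form.

# Toy version of the scale idea with assumed prior values.
import numpy as np

size_prior = {"chair": 0.8, "monitor": 0.5, "cup": 0.1}            # prior mean size [m], assumed
reconstructed = {"chair": 2.6, "monitor": 1.7, "cup": 0.35}         # dimensions in the up-to-scale map

classes = list(size_prior)
log_ratio = np.array([np.log(size_prior[c]) - np.log(reconstructed[c]) for c in classes])
s = np.exp(log_ratio.mean())          # closed-form minimizer of sum (log(s*dim) - log(prior))^2
print("estimated metric scale:", round(s, 3))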
|
|
MoA-OL3 |
Rm23 (on-line) |
Grasping 1 |
Regular session |
Chair: Makita, Satoshi | Fukuoka Institute of Technology |
Co-Chair: Chung, Jen Jen | ETH Zürich |
|
10:00-10:10, Paper MoA-OL3.1 | |
Learn from Interaction: Learning to Pick Via Reinforcement Learning in Challenging Clutter |
|
Zhao, Chao | Hong Kong University of Science and Technology |
Seo, Jungwon | The Hong Kong University of Science and Technology |
Keywords: Grasping, Grippers and Other End-Effectors
Abstract: Bin picking is a challenging problem in robotics due to the high-dimensional action space, partially visible objects, and contact-rich environments. State-of-the-art methods for bin picking are often simplified to planar manipulation or learn policies based on human demonstrations and motion primitives. The designs have escalated in complexity while still failing to reach the generality and robustness of human picking ability. Here, we present an end-to-end reinforcement learning (RL) framework to produce an adaptable and robust policy for picking objects in diverse real-world environments, including but not limited to tilted bins and corner objects. We present a novel solution to incorporate object interaction in policy learning. The object interaction is represented by the poses of objects. The policy learning is based on two neural networks with asymmetric state inputs. One acts on the object interaction information, while the other acts on the depth observation and proprioceptive signals of the robot. The resulting policy demonstrates remarkable zero-shot generalization from simulation to the real world, and extensive real-world picking experiments show the effectiveness of the approach.
|
|
10:10-10:20, Paper MoA-OL3.2 | |
GE-Grasp: Efficient Target-Oriented Grasping in Dense Clutters |
|
Liu, Zhan | Tsinghua University |
Wang, Ziwei | Tsinghua University |
Huang, Sichao | Tsinghua University |
Zhou, Jie | Tsinghua University |
Lu, Jiwen | Tsinghua University |
Keywords: Grasping, Deep Learning in Grasping and Manipulation, RGB-D Perception
Abstract: Grasping in dense clutters is a fundamental skill for autonomous robots. However, the crowdedness and occlusions in cluttered scenes make it difficult to generate valid, collision-free grasp poses, which results in low efficiency and high failure rates. To address these issues, we present a generic framework called GE-Grasp for robotic motion planning in dense clutters, where we leverage diverse action primitives for occluded object removal and present a generator-evaluator architecture to avoid spatial collisions. Therefore, our GE-Grasp is capable of grasping objects in dense clutters efficiently with promising success rates. Specifically, we define three action primitives: target-oriented grasping for picking up the target directly, and pushing and nontarget-oriented grasping to reduce the crowdedness and occlusions. The generators select a preferred action primitive set via a spatial correlation test (SCT), which effectively provides various motion candidates for target grasping in clutters. Meanwhile, the evaluators assess the selected action primitive candidates, and the optimal action is executed by the robot. Extensive experiments in simulated and real-world environments show that our approach outperforms the state-of-the-art methods of grasping in clutters with respect to motion efficiency and success rates. Moreover, we achieve comparable performance in the real world to that in the simulation environment, which indicates the strong generalization ability of our GE-Grasp. A video demo is provided in the supplementary material.
|
|
10:20-10:30, Paper MoA-OL3.3 | |
Simultaneous Object Reconstruction and Grasp Prediction Using a Camera-Centric Object Shell Representation |
|
Chavan-Dafle, Nikhil | Samsung Research America |
Popovych, Sergiy | Princeton University |
Agrawal, Shubham | Columbia University |
Lee, Daniel | Cornell Tech |
Isler, Volkan | University of Minnesota |
Keywords: Perception for Grasping and Manipulation, Grasping, Deep Learning in Grasping and Manipulation
Abstract: Being able to grasp objects is a fundamental component of most robotic manipulation systems. In this paper, we present a new approach to simultaneously reconstruct a mesh and a dense grasp quality map of an object from a depth image. At the core of our approach is a novel camera-centric object representation called the ''object shell" which is composed of an observed ''entry image" and a predicted ''exit image." We present an image-to-image residual ConvNet architecture in which the object shell and a grasp-quality map are predicted as separate output channels. The main advantage of the shell representation and the corresponding neural network architecture, ShellGrasp-Net, is that the input-output pixel correspondences in the shell representation are explicitly represented in the architecture. We show that this coupling yields superior generalization capabilities for object reconstruction and accurate grasp quality estimation implicitly considering the object geometry. Our approach yields an efficient dense grasp quality map and an object geometry estimate in a single forward pass. Both of these outputs can be used in a wide range of robotic manipulation applications. With rigorous experimental validation, both in simulation and on a real setup, we show that our shell-based method can be used to generate precise grasps and the associated grasp quality with over 90% accuracy. Diverse grasps computed on shell reconstructions allow the robot to select and execute grasps in cluttered scenes with more than 93% success rate.
|
|
10:30-10:40, Paper MoA-OL3.4 | |
Visual Manipulation Relationship Detection Based on Gated Graph Neural Network for Robotic Grasping |
|
Ding, Mengyuan | Xi'an Jiaotong University |
Liu, YaXin | Xi'an Jiaotong University |
Yang, Chenjie | Xi'an Jiaotong University |
Lan, Xuguang | Xi'an Jiaotong University |
Keywords: Grasping, Deep Learning in Grasping and Manipulation
Abstract: Exploring the relationships among objects and giving the correct operation sequence is vital for robotic manipulation. However, most previous algorithms only model the relationship between pairs of objects independently, ignoring the interaction effects between them, which may generate redundant or missing relations in complex scenes, such as multi-object stacking and partial occlusion. To solve this problem, a Gated Graph Neural Network (GGNN) is designed for visual manipulation relationship detection, which can help robots detect targets in complex scenes and obtain the appropriate grasping order. First, the robot extracts features from the input image and estimates object categories. Then GGNN is used to effectively capture the dependencies between objects in the whole scene, update the relevant features, and output the grasping sequence. In addition, by embedding positional encoding into paired object features, accurate context information is obtained to reduce the adverse effects of complex scenes. Finally, the constructed algorithm is applied to a physical robot for grasping. Experimental results on the Visual Manipulation Relationship Dataset (VMRD) and the large-scale relational grasp dataset named REGRAD show that our method significantly improves the accuracy of relationship detection in complex scenes and generalizes well to the real world.
|
|
10:40-10:50, Paper MoA-OL3.5 | |
Closed-Loop Next-Best-View Planning for Target-Driven Grasping |
|
Breyer, Michel | Autonomous Systems Lab, ETH Zurich |
Ott, Lionel | ETH Zurich |
Siegwart, Roland | ETH Zurich |
Chung, Jen Jen | ETH Zürich |
Keywords: Perception for Grasping and Manipulation
Abstract: Picking a specific object from clutter is an essential component of many manipulation tasks. Partial observations often require the robot to collect additional views of the scene before attempting a grasp. This paper proposes a closed-loop next-best-view planner that drives exploration based on occluded object parts. By continuously predicting grasps from an up-to-date scene reconstruction, our policy can decide online to finalize a grasp execution or to adapt the robot’s trajectory for further exploration. We show that our reactive approach decreases execution times without loss of grasp success rates compared to common camera placements and handles situations where the fixed baselines fail. Code is available at https://github.com/ethz-asl/active_grasp.
|
|
10:50-11:00, Paper MoA-OL3.6 | |
The DressGripper: A Collaborative Gripper with Electromagnetic Fingertips for Dressing Assistance |
|
Dragusanu, Mihai | University of Siena |
Marullo, Sara | University of Siena |
Malvezzi, Monica | University of Siena |
Achilli, Gabriele Maria | University of Perugia, Polo Scientifico Didattico Di Terni |
Valigi, Maria Cristina | University of Perugia |
Prattichizzo, Domenico | Università Di Siena |
Salvietti, Gionata | University of Siena |
Keywords: Grippers and Other End-Effectors, Safety in HRI, Soft Robot Applications
Abstract: This paper introduces a gripper designed for safe interaction while handling clothes in dressing operations. If we consider a robotic system helping people to get dressed, we have two main goals to achieve: i) the gripper that comes in contact with the person has to be intrinsically safe, and ii) the gripper should be able to hold the clothes during dressing, e.g., while passing the arm inside the sleeve of a jacket. The gripper proposed in this work addresses these issues by combining a compliant and safe structure with additional magnetic actuation at the fingertips. This combination enables a soft interaction with the robot while guaranteeing the necessary grasping tightness. We report experiments with the proposed prototype that demonstrate its applicability in robotic dressing assistance scenarios.
|
|
11:00-11:10, Paper MoA-OL3.7 | |
A New Gripper That Acts As an Active and Passive Joint to Facilitate Prehensile Grasping and Locomotion |
|
Govindan, Nagamanikandan | International Institute of Information Technology Hyderabad |
Ramesh, Shashank | Indian Institute of Technology, Madras |
Thondiyath, Asokan | IIT Madras |
Keywords: Grippers and Other End-Effectors, Mechanism Design, Actuation and Joint Mechanisms
Abstract: Among primates, the prehensile nature of the hand is vital for greater adaptability and a secure grip over the substrate/branches, particularly for arm-swinging motion or brachiation. Though various brachiation mechanisms that are mechanically equivalent to underactuated pendulum models are reported in the literature, not much attention has been given to the hand design that facilitates both locomotion and within-hand manipulation. In this paper, we propose a new robotic gripper design, equipped with shape conformable active gripping surfaces that can act as an active or passive joint and adapt to substrates with different shapes and sizes. A floating base serial chain, named GraspMaM, equipped with two such grippers, increases the versatility by performing a range of locomotion and manipulation modes without using dedicated systems. The unique gripper design allows the robot to estimate the passive joint state while arm-swinging and exhibits a dual relationship between manipulation and locomotion. We report the design details of the multimodal gripper and how it can be adapted for the brachiation motion assuming it as an articulated suspended pendulum model. Further, the system parameters of the physical prototype are estimated, and experimental results for the brachiation mode are discussed to validate and show the effectiveness of the proposed design.
|
|
11:10-11:20, Paper MoA-OL3.8 | |
Design and Control of a Quasi-Direct Drive Robotic Gripper for Collision Tolerant Picking at High Speed |
|
Ostyn, Frederik | Ghent University |
Vanderborght, Bram | Vrije Universiteit Brussel |
Crevecoeur, Guillaume | Ghent University |
Keywords: Grippers and Other End-Effectors, Industrial Robots, Failure Detection and Recovery
Abstract: Faster robotic picking of objects can improve industrial production throughput. A gripper design with an appropriate control strategy is demonstrated to pick objects at high speed while being tolerant to unintentional collisions. The design has a quasi-direct drive with a backdrivable, rigid finger mechanism to transfer collision forces with minimal delay towards the motor side. These forces are detected based on band-pass filtered momentum and motor speed monitoring. The motor velocity is observed through Kalman filtering in order to reduce noise in the low-level control loop and the band-pass momentum observer, resulting in a higher collision detection sensitivity. On a higher level, the path of the gripper is planned with respect to the trade-off between picking speed and collision tolerance, quantified by the remaining gripper stroke upon impact. The gripper's collision tolerance was experimentally verified in several high-speed object picking scenarios, where it mitigated collisions at speeds up to 0.6 m/s.
|
|
MoB-1 |
Rm1 (Room A) |
Award Session II |
Regular session |
Chair: Yoshida, Eiichi | Tokyo University of Science |
Co-Chair: Hutchinson, Seth | Georgia Institute of Technology |
|
14:10-14:25, Paper MoB-1.1 | |
Going in Blind: Object Motion Classification Using Distributed Tactile Sensing for Safe Reaching in Clutter (Finalist for IROS Best Paper Award and ABB Best Student Paper Award) |
|
Thomasson, Rachel | Stanford University |
Roberge, Etienne | École De Technologie Supérieure |
Cutkosky, Mark | Stanford University |
Roberge, Jean-Philippe | École De Technologie Supérieure |
Keywords: Perception for Grasping and Manipulation, Force and Tactile Sensing, Robot Safety
Abstract: Robotic manipulators navigating cluttered shelves or cabinets may find it challenging to avoid contact with obstacles. Indeed, rearranging obstacles may be necessary to access a target. Rather than planning explicit motions that place obstacles into a desired pose, we suggest allowing incidental contacts to rearrange obstacles while monitoring contacts for safety. Bypassing object identification, we present a method for categorizing object motions from tactile data collected from incidental contacts with a capacitive tactile skin on an Allegro Hand. We formalize tactile cues associated with categories of object motion, demonstrating that they can determine with >90% accuracy whether an object is movable and whether a contact is causing the object to slide stably (safe contact) or tip (unsafe).
|
|
14:25-14:40, Paper MoB-1.2 | |
PI-ARS: Accelerating Evolution-Learned Visual-Locomotion with Predictive Information Representations (Finalist for IROS Best Paper Award and ABB Best Student Paper Award) |
|
Lee, Kuang-Huei | Google Research |
Nachum, Ofir | Google |
Zhang, Tingnan | Google |
Guadarrama, Sergio | Google |
Tan, Jie | Google |
Yu, Wenhao | Google |
Keywords: Representation Learning, Reinforcement Learning, Legged Robots
Abstract: Evolution Strategy (ES) algorithms have shown promising results in training complex robotic control policies due to their massive parallelism capability, simple implementation, effective parameter-space exploration, and fast training time. However, a key limitation of ES is its scalability to large capacity models, including modern neural network architectures. In this work, we develop Predictive Information Augmented Random Search (PI-ARS) to mitigate this limitation by leveraging recent advancements in representation learning to reduce the parameter search space for ES. Namely, PI-ARS combines a gradient-based representation learning technique, Predictive Information (PI), with a gradient-free ES algorithm, Augmented Random Search (ARS), to train policies that can process complex robot sensory inputs and handle highly nonlinear robot dynamics. We evaluate PI-ARS on a set of challenging visual-locomotion tasks where a quadruped robot needs to walk on uneven stepping stones, quincuncial piles, and moving platforms, as well as to complete an indoor navigation task. Across all tasks, PI-ARS demonstrates significantly better learning efficiency and performance compared to the ARS baseline. We further validate our algorithm by demonstrating that the learned policies can successfully transfer to a real quadruped robot, for example, achieving a 100% success rate on the real-world stepping stone environment, dramatically improving prior results achieving 40% success.
|
|
14:40-14:55, Paper MoB-1.3 | |
Learning Visual Feedback Control for Dynamic Cloth Folding (Finalist for IROS Best Paper Award and ABB Best Student Paper Award) (Finalist for IROS Best RoboCup Paper Award Sponsored by RoboCup Federation) |
|
Hietala, Julius | Aalto University |
Blanco-Mulero, David | Aalto University |
Alcan, Gokhan | Aalto University |
Kyrki, Ville | Aalto University |
Keywords: Machine Learning for Robot Control, Deep Learning in Grasping and Manipulation, Reinforcement Learning
Abstract: Robotic manipulation of cloth is a challenging task due to the high dimensionality of the configuration space and the complexity of dynamics affected by various material properties. The effect of complex dynamics is even more pronounced in dynamic folding, for example, when a square piece of fabric is folded in two by a single manipulator. To account for the complexity and uncertainties, feedback of the cloth state using e.g. vision is typically needed. However, construction of visual feedback policies for dynamic cloth folding is an open problem. In this paper, we present a solution that learns policies in simulation using Reinforcement Learning (RL) and transfers the learned policies directly to the real world. In addition, to learn a single policy that manipulates multiple materials, we randomize the material properties in simulation. We evaluate the contributions of visual feedback and material randomization in real-world experiments. The experimental results demonstrate that the proposed solution can fold successfully different fabric types using dynamic manipulation in the real world. Code, data, and videos are available at https://sites.google.com/view/dynamic-cloth-folding.
|
|
14:55-15:10, Paper MoB-1.4 | |
PCBot: A Minimalist Robot Designed for Swarm Applications (Finalist for IROS Best Paper Award and ABB Best Student Paper Award) (Finalist for IROS Best Paper Award on Robot Mechanisms and Design Sponsored by ROBOTIS) |
|
Wang, Jingxian | Northwestern University |
Rubenstein, Michael | Northwestern University |
Keywords: Swarm Robotics, Mechanism Design, Multi-Robot Systems
Abstract: Complexity, cost, and power requirements for the actuation of individual robots can play a large factor in limiting the size of robotic swarms. Here we present PCBot, a minimalist robot that can precisely move on an orbital shake table using a bi-stable solenoid actuator built directly into its PCB. This allows the actuator to be built as part of the automated PCB manufacturing process, greatly reducing the impact it has on manual assembly. Thanks to this novel actuator design, PCBot has merely five major components and can be assembled in under 20 seconds, potentially enabling them to be easily mass-manufactured. Here we present the electro-magnetic and mechanical design of PCBot. Additionally, a prototype robot is used to demonstrate its ability to move in a straight line as well as follow given paths.
|
|
15:10-15:25, Paper MoB-1.5 | |
Characterization of Real-Time Haptic Feedback from Multimodal Neural Network-Based Force Estimates During Teleoperation (Finalist for IROS Best Paper Award and ABB Best Student Paper Award) |
|
Chua, Zonghe | Stanford University |
Okamura, Allison M. | Stanford University |
Keywords: Haptics and Haptic Interfaces, Telerobotics and Teleoperation, Computer Vision for Medical Robotics
Abstract: Force estimation using neural networks is a promising approach to enable haptic feedback in minimally invasive surgical robots without end-effector force sensors. Various network architectures have been proposed, but none have been tested in real time with surgical-like manipulations. Thus, questions remain about the real-time transparency and stability of force feedback from neural network-based force estimates. We characterize the real-time impedance transparency and stability of force feedback rendered on a da Vinci Research Kit teleoperated surgical robot using neural networks with vision-only, state-only, and state and vision inputs. Networks were trained on an existing dataset of teleoperated manipulations without force feedback. To measure real-time stability and transparency during teleoperation with force feedback to the operator, we modeled a one-degree-of-freedom human and surgeon-side manipulandum that moved the patient-side robot to perform manipulations on silicone artificial tissue over various robot and camera configurations, and tools. We found that the networks using state inputs displayed more transparent impedance than a vision-only network. However, state-based networks displayed large instability when used to provide force feedback during lateral manipulation of the silicone. In contrast, the vision-only network showed consistent stability in all the evaluated directions. We confirmed the performance of the vision-only network for real-time force feedback in a demonstration with a human teleoperator.
|
|
15:25-15:40, Paper MoB-1.6 | |
Hierarchical Reinforcement Learning for Precise Soccer Shooting Skills Using a Quadrupedal Robot (Finalist for IROS Best RoboCup Paper Award Sponsored by RoboCup Federation) |
|
Ji, Yandong | MIT |
Li, Zhongyu | University of California, Berkeley |
Sun, Yinan | University of California, Berkeley |
Peng, Xue Bin | University of California, Berkeley |
Levine, Sergey | UC Berkeley |
Berseth, Glen | Université De Montréal |
Sreenath, Koushil | University of California, Berkeley |
Keywords: Legged Robots, Reinforcement Learning, Manipulation Planning
Abstract: We address the problem of enabling quadrupedal robots to perform precise shooting skills in the real world using reinforcement learning. Developing algorithms to enable a legged robot to shoot a soccer ball to a given target is a challenging problem that combines robot motion control and planning into one task. To solve this problem, we need to consider the dynamics limitation and motion stability during the control of a dynamic legged robot. Moreover, we need to consider motion planning to shoot the hard-to-model deformable ball rolling on the ground with uncertain friction to a desired location. In this paper, we propose a hierarchical framework that leverages deep reinforcement learning to train (a) a robust motion control policy that can track arbitrary motions and (b) a planning policy to decide the desired kicking motion to shoot a soccer ball to a target. We deploy the proposed framework on an A1 quadrupedal robot and enable it to accurately shoot the ball to random targets in the real world.
|
|
MoB-2 |
Rm2 (Room B-1) |
Learning 2 |
Regular session |
Chair: Rosendo, Andre | Worcester Polytechnic Institute |
Co-Chair: Tran, Dinh Tuan | College of Information Science and Engineering, Ritsumeikan University, Japan |
|
14:10-14:20, Paper MoB-2.1 | |
Multi-Modal Legged Locomotion Framework with Automated Residual Reinforcement Learning |
|
Yu, Chen | ShanghaiTech University |
Rosendo, Andre | ShanghaiTech University |
Keywords: Evolutionary Robotics, Humanoid and Bipedal Locomotion, Legged Robots
Abstract: While quadruped robots usually have good stability and load capacity, bipedal robots offer a higher level of flexibility / adaptability to different tasks and environments. A multi-modal legged robot can take the best of both worlds. In this paper, we propose a multi-modal locomotion framework that is composed of a hand-crafted transition motion and a learning-based bipedal controller—learnt by a novel algorithm called Automated Residual Reinforcement Learning. This framework aims to endow arbitrary quadruped robots with the ability to walk bipedally. In particular, we 1) design an additional supporting structure for a quadruped robot and a sequential multi-modal transition strategy; 2) propose a novel class of Reinforcement Learning algorithms for bipedal control and evaluate their performances in both simulation and the real world. Experimental results show that our proposed algorithms have the best performance in simulation and maintain a good performance in a real-world robot. Overall, our multi-modal robot could successfully switch between biped and quadruped, and walk in both modes. Experiment videos and code are available at https://chenaah.github.io/multimodal/.
|
|
14:20-14:30, Paper MoB-2.2 | |
Learning to Climb: Constrained Contextual Bayesian Optimisation on a Multi-Modal Legged Robot |
|
Yu, Chen | ShanghaiTech University |
Cao, Jinyue | Shanghaitech University |
Rosendo, Andre | ShanghaiTech University |
Keywords: Evolutionary Robotics, Climbing Robots, Bioinspired Robot Learning
Abstract: Controlling a legged robot to climb obstacles with different heights is challenging, but important for an autonomous robot to work in an unstructured environment. In this paper, we model this problem as a novel contextual constrained multi-armed bandit framework. We further propose a learning-based Constrained Contextual Bayesian Optimisation (CoCoBo) algorithm that can solve this class of problems efficiently. CoCoBo models both the reward function and the constraints as Gaussian processes, incorporates continuous context and action spaces into each Gaussian process, and finds the next training samples through excursion search. The experimental results show that CoCoBo is more data-efficient and safer than other related state-of-the-art optimisation methods on both synthetic test functions and real-world experiments. Our real-world results, in which our robot successfully learned to climb an obstacle higher than itself, reveal that our method has enormous potential to allow self-adaptive robots to work in various terrains. Experiment videos and code are available at the project website https://chenaah.github.io/coco/.
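A minimal sketch of the ingredients named above is given below (Python with scikit-learn; the toy reward, constraint, and acquisition rule are assumptions, and the real method uses excursion search rather than this plain UCB rule): reward and constraint are modeled as Gaussian processes over (context, action), and the next action is proposed only where the constraint is likely satisfied.

# Hedged sketch: GPs over (context, action) with a conservative feasibility test.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(20, 2))                  # columns: context (e.g. obstacle height), action (gait parameter)
reward     = -np.square(X[:, 1] - X[:, 0])           # toy objective: action should match context
constraint = 0.6 - X[:, 1]                           # toy safety margin, must stay >= 0

gp_r = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-4).fit(X, reward)
gp_c = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-4).fit(X, constraint)

context = 0.4                                         # current obstacle height
actions = np.linspace(0, 1, 200)
Xq = np.column_stack([np.full_like(actions, context), actions])
mu_r, sd_r = gp_r.predict(Xq, return_std=True)
mu_c, sd_c = gp_c.predict(Xq, return_std=True)
safe = mu_c - 2 * sd_c >= 0                           # conservative feasibility test
ucb = mu_r + 2 * sd_r                                 # optimistic reward estimate
print("next action:", actions[np.argmax(np.where(safe, ucb, -np.inf))])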
|
|
14:30-14:40, Paper MoB-2.3 | |
Class-Incremental Gesture Recognition Learning with Out-Of-Distribution Detection |
|
Li, Mingxue | University of Chinese Academy of Sciences, Shenyang Institute Of |
Cong, Yang | Chinese Academy of Science, China |
Liu, Yuyang | State Key Laboratory of Robotics, Shenyang Institute of Automati |
Gan, Sun | State Key Laboratory of Robotics, Shenyang Institute of Automati |
Keywords: Gesture, Posture and Facial Expressions, Continual Learning, AI-Based Methods
Abstract: Gesture recognition is a popular human-computer interaction technology, which has been widely applied in many fields (e.g., autonomous driving, medical care, VR and AR). However, 1) most existing gesture recognition methods focus on the fixed recognition scenarios with several gestures, which could lead to memory consumption and computational effort when continuously learning new gestures; 2) Meanwhile, the performance of popular class-incremental methods degrades significantly for previously learned classes (i.e., catastrophic forgetting) due to the ambiguity and variability of gestures. To tackle these challenges, we propose a novel class-incremental gesture recognition method with out-of-distribution (OOD) detection, which can continuously adapt to new gesture classes and achieve high performance for both learned and new gestures. Specifically, we construct an episodic memory with a subset of learned training samples to preserve the previous knowledge from forgetting. Moreover, the OOD detection-based memory management is developed for exploring the most representative and informative core set from the learned datasets. When a new gesture recognition task with strange classes comes, rehearsal enhancement is adopted to increase the diversity of memory exemplars for better fitting the real characteristics of gesture recognition. After deriving an effective class-incremental gesture recognition strategy, we perform experiments on two representative datasets to validate the superiority of our method. Evaluation experiments demonstrate that our proposed method substantially outperforms the state-of-the-art methods with about 2.17%-3.81% improvement under different class-incremental learning scenarios.
|
|
14:40-14:50, Paper MoB-2.4 | |
PIMNet: Physics-Infused Neural Network for Human Motion Prediction |
|
Zhang, Zhibo | University at Buffalo |
Zhu, Yanjun | University at Buffalo |
Rai, Rahul | Clemson University |
Doermann, David | University at Buffalo |
Keywords: Human Detection and Tracking, Deep Learning Methods, AI-Based Methods
Abstract: Human motion prediction has recently attracted attention in computer vision and the robotics domain. Research on human motion prediction helps machines understand human behavior, plan target actions, and optimize interaction strategies. Existing prediction methods are either purely first-principles-based or purely data-driven, and the limitations of each approach restrict performance. To overcome these limitations, we propose a hybrid model, termed PIMNet, which combines a physics-based model with a statistical model. In PIMNet, an Encoder-LSTM-Decoder based statistical model is applied. Thus, as a physics-infused machine learning model, PIMNet obtains computational efficiency and physical consistency simultaneously. With the help of a simplified human full-body dynamic model, our LSTM-based model can accurately predict human motion not only in the short term but also in the long term. By comparing our proposed model with several state-of-the-art approaches, we conclude that infusing physics keeps the predictions physically plausible while advancing performance.
|
|
14:50-15:00, Paper MoB-2.5 | |
What Matters in Language Conditioned Robotic Imitation Learning Over Unstructured Data |
|
Mees, Oier | Albert-Ludwigs-Universität |
Hermann, Lukas | University of Freiburg |
Burgard, Wolfram | University Fo Freiburg |
Keywords: Learning Categories and Concepts, Machine Learning for Robot Control, Representation Learning
Abstract: A long-standing goal in robotics is to build robots that can perform a wide range of daily tasks from perceptions obtained with their onboard sensors and specified only via natural language. While recently substantial advances have been achieved in language-driven robotics by leveraging end-to-end learning from pixels, there is no clear and well-understood process for making various design choices due to the underlying variation in setups. In this paper, we conduct an extensive study of the most critical challenges in learning language conditioned policies from offline free-form imitation datasets. We further identify architectural and algorithmic techniques that improve performance, such as a hierarchical decomposition of the robot control learning, a multimodal transformer encoder, discrete latent plans and a self-supervised contrastive loss that aligns video and language representations. By combining the results of our investigation with our improved model components, we are able to present a novel approach that significantly outperforms the state of the art on the challenging language conditioned long-horizon robot manipulation CALVIN benchmark. We have open-sourced our implementation to facilitate future research in learning to perform many complex manipulation skills in a row specified with natural language. Codebase and trained models available at http://hulc.cs.uni-freiburg.de
|
|
15:00-15:10, Paper MoB-2.6 | |
Learning Solution Manifolds for Control Problems Via Energy Minimization |
|
Zamora, Miguel | ETH Zurich |
Poranne, Roi | ETHZ |
Coros, Stelian | ETH Zurich |
Keywords: Machine Learning for Robot Control, Optimization and Optimal Control, Imitation Learning
Abstract: A variety of control tasks such as inverse kinematics (IK), trajectory optimization (TO), and model predictive control (MPC) are commonly formulated as energy minimization problems. Numerical solutions to such problems are well-established. However, these are often too slow to be used directly in real-time applications. The alternative is to learn solution manifolds for control problems in an offline stage. Although this distillation process can be trivially formulated as a behavioral cloning (BC) problem in an imitation learning setting, our experiments highlight a number of significant shortcomings arising due to incompatible local minima, interpolation artifacts, and insufficient coverage of the state space. In this paper, we propose an alternative to BC that is efficient and numerically robust. We formulate the learning of solution manifolds as a minimization of the energy terms of a control objective integrated over the space of problems of interest. We minimize this energy integral with a novel method that combines Monte Carlo-inspired adaptive sampling strategies with the derivatives used to solve individual instances of the control task. We evaluate the performance of our formulation on a series of robotic control problems of increasing complexity, and we highlight its benefits through comparisons against traditional methods such as behavioral cloning and Dataset aggregation (Dagger).
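The contrast with behavioral cloning can be sketched in a few lines of PyTorch (a toy 2-link inverse-kinematics energy, not the paper's tasks or its adaptive sampling strategy): the network is trained by minimizing the control energy itself, averaged over sampled problems, so no precomputed ground-truth solutions are needed.

# Hedged sketch of training a solution manifold by direct energy minimization.
import torch
import torch.nn as nn

def energy(problem, solution):
    # problem: desired end-effector xy; solution: two joint angles (unit link lengths).
    q1, q2 = solution[:, 0], solution[:, 1]
    ee = torch.stack([torch.cos(q1) + torch.cos(q1 + q2),
                      torch.sin(q1) + torch.sin(q1 + q2)], dim=1)
    return ((ee - problem) ** 2).sum(dim=1).mean()

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(500):
    targets = torch.rand(256, 2) * 1.5            # Monte Carlo samples over the space of problems
    loss = energy(targets, net(targets))          # no ground-truth solutions needed
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))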
|
|
15:10-15:20, Paper MoB-2.7 | |
First Do Not Fall: Learning to Exploit a Wall with a Damaged Humanoid Robot |
|
Anne, Timothée | Université De Lorraine |
Dalin, Eloise | INRIA |
Bergonzani, Ivan | INRIA |
Ivaldi, Serena | INRIA |
Mouret, Jean-Baptiste | Inria |
Keywords: Machine Learning for Robot Control, Humanoid Robot Systems
Abstract: Humanoid robots could replace humans in hazardous situations but most of such situations are equally dangerous for them, which means that they have a high chance of being damaged and falling. We hypothesize that humanoid robots would be mostly used in buildings, which makes them likely to be close to a wall. To avoid a fall, they can therefore lean on the closest wall, as a human would do, provided that they find in a few milliseconds where to put the hand(s). This article introduces a method, called D-Reflex, that learns a neural network that chooses this contact position given the wall orientation, the wall distance, and the posture of the robot. This contact position is then used by a whole-body controller to reach a stable posture. We show that D-Reflex allows a simulated TALOS robot (1.75m, 100kg, 30 degrees of freedom) to avoid more than 75% of the avoidable falls and can work on the real robot.
|
|
15:20-15:30, Paper MoB-2.8 | |
Learning Suction Cup Dynamics from Motion Capture: Accurate Prediction of an Object's Vertical Motion During Release |
|
Lubbers, Menno | Eindhoven University of Technology (TU/e) |
van Voorst, Job | Eindhoven University of Technology (TU/e) |
Jongeneel, Maarten | Eindhoven University of Technology |
Saccon, Alessandro | Eindhoven University of Technology |
Keywords: Machine Learning for Robot Control, Contact Modeling, Logistics
Abstract: Suction grippers are the most common pick-and-place end effectors used in industry. However, there is little literature on creating and validating models to predict their force interaction with objects in dynamic conditions. In this paper, we study the interaction dynamics of an active vacuum suction gripper during the vertical release of an object. Object and suction cup motions are recorded using a motion capture system. As the object’s mass is known and can be changed for each experiment, a study of the object’s motion can lead to an estimate of the interaction force generated by the suction gripper. We show that, by learning this interaction force, it is possible to accurately predict the object’s vertical motion as a function of time. This result is the first step toward 3D motion prediction when releasing an object from a suction gripper.
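The force estimate that underlies the learning step can be sketched simply (numpy; the mass, capture rate, and trajectory values are invented for illustration): with the object's mass known and its vertical motion tracked by motion capture, Newton's second law recovers the force the suction cup still exerts during release.

# Simple sketch of estimating the suction interaction force from tracked motion.
import numpy as np

dt, m, g = 0.005, 0.35, 9.81                        # 200 Hz capture, 0.35 kg object
t = np.arange(0, 0.2, dt)
z = 0.5 - 2.0 * t**2                                 # illustrative vertical position during release [m]

a = np.gradient(np.gradient(z, dt), dt)              # vertical acceleration from tracked positions
f_suction = m * (a + g)                              # force the cup still exerts on the object
print(f_suction[:5])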
|
|
15:30-15:40, Paper MoB-2.9 | |
Embodied Active Domain Adaptation for Semantic Segmentation Via Informative Path Planning |
|
Zurbrügg, René | ETH Zürich |
Blum, Hermann | ETH Zurich |
Cadena Lerma, Cesar | ETH Zurich |
Siegwart, Roland | ETH Zurich |
Schmid, Lukas Maximilian | ETH Zurich |
Keywords: Perception-Action Coupling, Integrated Planning and Learning, Object Detection, Segmentation and Categorization
Abstract: This work presents an embodied agent that can adapt its semantic segmentation network to new indoor environments in a fully autonomous way. Because semantic segmentation networks fail to generalize well to unseen environments, the agent collects images of the new environment which are then used for self-supervised domain adaptation. We formulate this as an informative path planning problem, and present a novel information gain that leverages uncertainty extracted from the semantic model to safely collect relevant data. As domain adaptation progresses, these uncertainties change over time and the rapid learning feedback of our system drives the agent to collect different data. Experiments show that our method adapts to new environments faster and with higher final performance compared to an exploration objective, and can successfully be deployed to real-world environments on physical robots.
|
|
MoB-3 |
Rm3 (Room B-2) |
Grasping 2 |
Regular session |
Chair: Ozawa, Ryuta | Meiji University |
Co-Chair: Park, Jaeheung | Seoul National University |
|
14:10-14:20, Paper MoB-3.1 | |
ICK-Track: A Category-Level 6-DoF Pose Tracker Using Inter-Frame Consistent Keypoints for Aerial Manipulation |
|
Sun, Jingtao | Hunan University, College of Electrical and Information Engineer |
Wang, Yaonan | Hunan University |
Feng, Mingtao | Xidian University |
Wang, Danwei | Nanyang Technological University |
Zhao, Jiawen | College of Electrical and Information Engineering, Hunan Univers |
Stachniss, Cyrill | University of Bonn |
Chen, Xieyuanli | N/A |
Keywords: Perception for Grasping and Manipulation, Aerial Systems: Perception and Autonomy, Object Detection, Segmentation and Categorization
Abstract: Robots that are supposed to interact with or manipulate objects in the world must be able to track the poses of objects in their sensor data. Thus, detecting and tracking the 6-DoF poses of target objects is important for aerial manipulation, yet it is still at an early stage due to the high dynamics and limited onboard capacity of such systems. In this paper, we propose ICK-Track, a novel method for onboard category-level object 6-DoF pose tracking that can be applied to aerial manipulation without using any pre-defined object CAD models. It first utilizes semi-supervised video segmentation to detect objects in the eye-in-hand RGB-D camera stream and segment their 3D points. Then, canonical keypoints are extracted using iterative farthest point sampling. We propose a novel inter-frame consistent keypoint generation network to generate the corresponding keypoint pairs, which are used together with ICP to estimate the pose changes of objects for tracking. Experimental results show that our method is more robust to viewpoint changes and runs faster than the state-of-the-art methods on category-level pose tracking. We further test our proposed method on a real aerial manipulator. A demo video showing the use of our method on a real aerial manipulator and the implementation of our method are available at: https://github.com/S-JingTao/ICK-Track.
|
|
14:20-14:30, Paper MoB-3.2 | |
Multi-Finger Grasping Like Humans |
|
Du, Yuming | Ecole Des Ponts Paristech |
Weinzaepfel, Philippe | NAVER LABS Europe |
Lepetit, Vincent | ENPC ParisTech |
Brégier, Romain | NAVER LABS Europe |
Keywords: Grasping, Multifingered Hands
Abstract: Robots with multi-fingered grippers could perform advanced manipulation tasks for us if we were able to properly specify to them what to do. In this study, we take a step in that direction by making a robot grasp an object like a grasping demonstration performed by a human. We propose a novel optimization-based approach for transferring human grasp demonstrations to any multi-fingered grippers, which produces robotic grasps that mimic the human hand orientation and the contact area with the object, while alleviating interpenetration. Extensive experiments with the Allegro and BarrettHand grippers show that our method leads to grasps more similar to the human demonstration than existing approaches, without requiring any gripper-specific tuning. We confirm these findings through a user study and validate the applicability of our approach on a real robot.
|
|
14:30-14:40, Paper MoB-3.3 | |
Computation and Selection of Secure Gravity Based Caging Grasps of Planar Objects |
|
Shirizly, Alon | Technion - Israel Institute of Technology |
Rimon, Elon | Technion - Israel Institute of Technology |
Keywords: Grasping, Manipulation Planning, Computational Geometry
Abstract: Gravity based caging grasps are robotic grasps where the robot hand passively supports an object against gravity. When a robot hand supports an object at a local minimum of the object's gravitational energy, the robot hand forms a basket-like grasp of the object. Any object movement in a basket grasp requires an increase of the object's gravitational energy, thus allowing secure object pickup and transport with robot hands that use a small number of fingers. The basket grasp depth measures the minimal additional energy the object must acquire to escape the basket grasp. This paper describes a computation scheme that determines the depth of entire sets of candidate basket grasps associated with alternative finger placements on the object boundary before pickup. The computation relies on a categorization of escape stances that mark the basket grasp depth: double-support escapes are first analyzed and computed, then single-support escapes are analyzed and computed. The minimum-energy combination of both types of escape stances defines the depth of entire sets of candidate basket grasps, which is then used to identify the deepest and hence most secure basket grasp. The computation scheme is fully implemented and demonstrated on several examples with reported run-times.
|
|
14:40-14:50, Paper MoB-3.4 | |
F1 Hand: A Versatile Fixed-Finger Gripper for Delicate Teleoperation and Autonomous Grasping |
|
Maeda, Guilherme Jorge | Sony AI |
Fukaya, Naoki | Preferred Networks, Inc |
Maeda, Shin-ichi | Preferred Networks |
Keywords: Grippers and Other End-Effectors, Telerobotics and Teleoperation, Grasping
Abstract: Teleoperation is often limited by the ability of an operator to not only react to but also predict the behavior of the robot as it interacts with the environment. For example, to grasp small objects on a table by teleoperating a multiple-linkage adaptive finger, we need to predict the position of the fingertips before the fingers are closed, to avoid them hitting the table while successfully grasping the objects. For that reason, we developed the F1 Hand, a single-motor gripper that makes teleoperation intuitive due to the presence of a fixed finger. As a result, it is possible to grasp objects as thin as a paper clip or a toothpick, and as heavy and large as a cordless drill. Moreover, the applicability of the hand can be expanded by replacing the fixed finger with different shapes according to different requirements. This flexibility makes the hand highly versatile and easy to extend without sacrificing much time and money. However, the asymmetric design of the two-finger gripper is uncommon compared with symmetric grippers, so it is not obvious how to control the F1 Hand for autonomous grasping. Thus, we propose a controller that approximates actuation symmetry by using the motion of the whole arm. The F1 Hand and its controller are compared side-by-side with the original Toyota Human Support Robot (HSR) gripper in teleoperation using 22 objects from the YCB dataset in addition to small objects. Grasping time and peak contact forces decreased by 20% and 70%, respectively, while success rates increased by 5%. Using an off-the-shelf grasp pose estimator for autonomous grasping, the system achieved success rates similar to the original HSR gripper, on the order of 80%.
|
|
14:50-15:00, Paper MoB-3.5 | |
EfficientGrasp: A Unified Data-Efficient Learning to Grasp Method for Multi-Fingered Robot Hands |
|
Li, Kelin | Imperial College London |
Baron, Nicholas | Ocado Technology |
Zhang, Xian | Imperial College London |
Rojas, Nicolas | Imperial College London |
Keywords: Grasping, Deep Learning in Grasping and Manipulation, Multifingered Hands
Abstract: Autonomous grasping of novel objects previously unseen by a robot is an ongoing challenge in robotic manipulation. In recent decades, many approaches have been presented to address this problem for specific robot hands. The UniGrasp framework, introduced recently, has the ability to generalize to different types of robotic grippers; however, this method does not work on grippers with closed-loop constraints and is data-inefficient when applied to robot hands with multigrasp configurations. In this paper, we present EfficientGrasp, a generalized grasp synthesis and gripper control method that is independent of gripper model specifications. EfficientGrasp utilizes a gripper workspace feature rather than UniGrasp’s gripper attribute inputs. This reduces memory use by 81.7% during training and makes it possible to generalize to more types of grippers, such as grippers with closed-loop constraints. The effectiveness of EfficientGrasp is evaluated by conducting object grasping experiments in both simulation and the real world; results show that the proposed method also outperforms UniGrasp when considering only grippers without closed-loop constraints. In these cases, EfficientGrasp shows 9.85% higher accuracy in generating contact points and a 3.10% higher grasping success rate in simulation. The real-world experiments are conducted with a gripper with closed-loop constraints, which UniGrasp fails to handle, while EfficientGrasp achieves a success rate of 83.3%. The main causes of grasping failures of the proposed method are analyzed, highlighting ways of enhancing grasp performance.
|
|
15:00-15:10, Paper MoB-3.6 | |
BRL/Pisa/IIT SoftHand: A Low-Cost, 3D-Printed, Underactuated, Tendon-Driven Hand with Soft and Adaptive Synergies |
|
Li, Haoran | University of Bristol |
Ford, Christopher | University of Bristol |
Bianchi, Matteo | University of Pisa |
Catalano, Manuel Giuseppe | Istituto Italiano Di Tecnologia |
Psomopoulou, Efi | University of Bristol |
Lepora, Nathan | University of Bristol |
Keywords: Multifingered Hands, Underactuated Robots
Abstract: This paper introduces the BRL/Pisa/IIT (BPI) SoftHand: a single actuator-driven, low-cost, 3D-printed, tendon-driven, underactuated robot hand that can be used to perform a range of grasping tasks. Based on the adaptive synergies of the Pisa/IIT SoftHand, we design a new joint system and tendon routing to facilitate the inclusion of both soft and adaptive synergies, which helps us balance durability, affordability and moderate dexterity of the hand. The focus of this work is on the design, simulation, synergies and grasping tests of this SoftHand. The novel phalanges are designed and printed based on linkages, gear pairs and geometric restraint mechanisms, and can be applied to most tendon-driven robotic hands. We show that the robot hand can successfully grasp and lift various target objects and adapt to hold complex geometric shapes, reflecting the successful adoption of the soft and adaptive synergies. We intend to open-source the design of the hand so that it can be built cheaply on a home 3D-printer.
|
|
15:10-15:20, Paper MoB-3.7 | |
On Robotic Manipulation of Flexible Flat Cables: Employing a Multi-Modal Gripper with Dexterous Tips, Active Nails, and a Reconfigurable Suction Cup Module |
|
Buzzatto, Joao | The University of Auckland |
Chapman, Jayden | The University of Auckland |
Shahmohammadi, Mojtaba | University of Auckland |
Sanches, Felipe Padula | University of Auckland |
Nejati, Mahla | The University of Auckland |
Matsunaga, Saori | Mitsubishi Electric Corporation |
Haraguchi, Rintaro | Mitsubishi Electric Corp |
Mariyama, Toshisada | Mitsubishi Electric Corporation |
MacDonald, Bruce | University of Auckland |
Liarokapis, Minas | The University of Auckland |
Keywords: Grippers and Other End-Effectors, In-Hand Manipulation
Abstract: A popular solution for connecting different components in modern electronics, such as mobile phones, laptops, and tablets, is the use of flexible flat cables (FFCs). Typically, it takes hours of repetition from a highly trained worker, or a high-precision autonomous robot with specialised end-effectors, to reliably manage the installation of these cables. Human workers are prone to error and cannot work endlessly without a break, while the robots often come at a significant expense and require a substantial amount of time to program and reprogram. Additionally, the use of sophisticated sensing elements further increases the complexity of the required control system. As a result, the performance and robustness of such systems are far from sufficient, hindering their mass adoption. The manipulation of FFCs is also quite challenging. In this work, we focus on the robotic manipulation of a wide range of flexible cables, proposing a multi-modal gripper with locally-dexterous tips and active fingernails. The fingers of the gripper are equipped with: i) locally-dexterous fingertips that accommodate manipulation-capable degrees of freedom, ii) a combination of Nitinol-based active fingernails and suction cups that allow picking up and handling of cables that rest on flat surfaces, and iii) compliant finger-pads that conform to the object surface to increase grasping stability. The proposed robotic gripper is equipped with a camera and a perception system that allow for the execution of complex cable manipulation and assembly tasks in dynamic environments.
|
|
15:20-15:30, Paper MoB-3.8 | |
Automated Design of Embedded Constraints for Soft Hands Enabling New Grasp Strategies |
|
Bo, Valerio | Istituto Italiano Di Tecnologia |
Turco, Enrico | Istituto Italiano Di Tecnologia |
Pozzi, Maria | University of Siena |
Malvezzi, Monica | University of Siena |
Prattichizzo, Domenico | University of Siena |
Keywords: Grasping, Grippers and Other End-Effectors, Soft Robot Materials and Design
Abstract: Soft robotic hands make it possible to fully exploit hand-object-environment interactions to complete grasping tasks. However, their usability can still be limited in some scenarios (e.g., restricted or cluttered spaces). In this paper, we propose to enhance the versatility of soft grippers by adding special passive components to their structure, without completely altering their design or their control. A method for the automated design of soft-rigid scoop-shaped add-ons acting as “embedded constraints” is presented. Given a certain gripper and a large set of objects, the design parameters of the optimal scoop for each object are derived by solving an optimization problem. The object-environment relative pose is also considered in the optimization. The obtained “optimal scoops” are clustered to get a limited set of representative scoop designs, which can be prototyped and used in grasping tasks. In this work, we also introduce a data-driven method allowing a grasp planner to select the most suitable scoop to be added to the used hand, given a certain object and its configuration with respect to the surrounding environment. Experiments with two different hands validate the proposed approach.
|
|
15:30-15:40, Paper MoB-3.9 | |
Scalable Learned Geometric Feasibility for Cooperative Grasp and Motion Planning |
|
Park, Suhan | Seoul National University |
Kim, Hyoung Cheol | Seoul National University |
Baek, Jiyeong | Seoul National University |
Park, Jaeheung | Seoul National University |
Keywords: Grasping, Manipulation Planning, Deep Learning in Grasping and Manipulation
Abstract: This study proposes a novel learned feasibility estimator that considers multi-modal grasp poses for grasp and motion planning. Grasp poses have inherently multi-modal structures: continuous and discrete parameters. Mixed-integer programming (MIP) is one method that solves these multi-modal problems. However, searching all discrete parameters costs considerable time. Therefore, by learning the feasibility of each mode from geometric variables, the problem can be solved efficiently within the given time limit. The feasibility of grasp poses is related to the pose of the object and of the obstacles near the object. Utilizing this information, we introduce learned geometric feasibility (LGF), which prioritizes the integer search of the MIP. LGF is scalable to multiple robots and environments because it learns the feasibility using object-oriented information. The approach is demonstrated to increase the number of MIP problems solved within the time limit, and LGF is applied to various environmental settings.
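One way to read the role of the learned feasibility is as a priority ordering over the discrete grasp modes that a MIP solver would otherwise enumerate blindly. A hedged, solver-free sketch of that idea follows (the predictor, mode structure, and sub-solver below are placeholders, not the paper's network or formulation):

```python
import time

def prioritized_mode_search(modes, predict_feasibility, solve_continuous, time_limit_s):
    """Try discrete grasp modes in order of predicted feasibility; return the
    first mode whose continuous sub-problem is solvable within the time budget."""
    ranked = sorted(modes, key=predict_feasibility, reverse=True)
    deadline = time.monotonic() + time_limit_s
    for mode in ranked:
        if time.monotonic() > deadline:
            break
        solution = solve_continuous(mode)   # e.g., trajectory optimization for this mode
        if solution is not None:
            return mode, solution
    return None, None

# Placeholder components for illustration only.
modes = [{"robot": r, "grasp_id": g} for r in range(2) for g in range(4)]
predict_feasibility = lambda m: 0.9 if m["grasp_id"] == 2 else 0.1   # stands in for LGF
solve_continuous = lambda m: "plan" if m["grasp_id"] == 2 else None  # stands in for the MIP/NLP
print(prioritized_mode_search(modes, predict_feasibility, solve_continuous, 1.0))
```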
|
|
MoB-4 |
Rm4 (Room C-1) |
Manipulation Systems 2 |
Regular session |
Chair: Harada, Kensuke | Osaka University |
Co-Chair: Yamanobe, Natsuki | Advanced Industrial Science and Technology |
|
14:10-14:20, Paper MoB-4.1 | |
A Solution to Adaptive Mobile Manipulator Throwing |
|
Liu, Yang | EPFL |
Nayak, Aradhana | EPFL |
Billard, Aude | EPFL |
Keywords: Mobile Manipulation, Task and Motion Planning, Representation Learning
Abstract: Mobile manipulator throwing is a promising method to increase the flexibility and efficiency of dynamic manipulation in factories. Its major challenge is to efficiently plan a feasible throw under a wide set of task specifications. We show that the mobile manipulator throwing problem can be simplified to a planar problem, hence greatly reducing the computational costs. Using machine learning approaches, we build a model of the object's inverted flying dynamics and the robot's kinematic feasibility, which enables throwing motion generation within 1 ms for a given target-position query. Thanks to the computational efficiency of our method, we show that the system adapts to disturbances by replanning alternative solutions on the fly instead of sticking to the original throwing plan.
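The planar simplification mentioned above can be illustrated with elementary ballistics: given a release point and a fixed release angle, the required release speed follows in closed form. This sketch ignores aerodynamic drag, which the paper instead captures with a learned model of the object's flying dynamics; all symbols and numbers are illustrative.

```python
import math

def planar_release_speed(d, h, theta, g=9.81):
    """Speed needed to hit a target at horizontal distance d and height h
    (relative to the release point) when released at elevation angle theta.
    Derived from d = v*cos(theta)*t and h = v*sin(theta)*t - 0.5*g*t^2."""
    denom = 2.0 * math.cos(theta) ** 2 * (d * math.tan(theta) - h)
    if denom <= 0.0:
        return None  # target not reachable at this release angle
    return math.sqrt(g * d * d / denom)

# Example: target 2.0 m ahead and 0.5 m below the release point, 45 deg release.
v = planar_release_speed(2.0, -0.5, math.radians(45.0))
print(f"release speed ~ {v:.2f} m/s")
```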
|
|
14:20-14:30, Paper MoB-4.2 | |
A Novel Design and Evaluation of a Dactylus-Equipped Quadruped Robot for Mobile Manipulation |
|
Tsvetkov, Yordan | University of Edinburgh |
Ramamoorthy, Subramanian | The University of Edinburgh |
Keywords: Biologically-Inspired Robots, Mobile Manipulation, Legged Robots
Abstract: Quadruped robots are usually equipped with additional arms for manipulation, negatively impacting price and weight. On the other hand, the requirements of legged locomotion mean that the legs of such robots often possess the needed torque and precision to perform manipulation. In this paper, we present a novel design for a small-scale quadruped robot equipped with two leg-mounted manipulators inspired by crustacean chelipeds and knuckle-walker forelimbs. By making use of the actuators already present in the legs, we can achieve manipulation using only 3 additional motors per limb. The design enables the use of small and inexpensive actuators relative to the leg motors, further reducing cost and weight. The moment of inertia impact on the leg is small thanks to an integrated cable/pulley system. As we show in a suite of tele-operation experiments, the robot is capable of performing single- and dual-limb manipulation, as well as transitioning between manipulation modes. The proposed design performs similarly to an additional arm while weighing and costing 5 times less per manipulator and enabling the completion of tasks requiring 2 manipulators.
|
|
14:30-14:40, Paper MoB-4.3 | |
Task-Oriented Contact Optimization for Pushing Manipulation with Mobile Robots |
|
Bertoncelli, Filippo | University of Modena and Reggio Emilia |
Selvaggio, Mario | Università Degli Studi Di Napoli Federico II |
Ruggiero, Fabio | Università Di Napoli Federico II |
Sabattini, Lorenzo | University of Modena and Reggio Emilia |
Keywords: Dexterous Manipulation, Manipulation Planning, Multi-Robot Systems
Abstract: This work addresses the problem of transporting an object along a desired planar trajectory by pushing with mobile robots. More specifically, we concentrate on establishing optimal contacts between the object and the robots to execute the given task with minimum effort. We present a task-oriented contact placement optimization strategy for object pushing that allows calculating optimal contact points that minimize the amplitude of the forces required to execute the task. Exploiting the optimized contact configuration, a motion controller uses the computed contact forces in feed-forward and position-error feedback terms to realize the desired trajectory tracking task. Simulation and real-world experimental results confirm the validity of our approach.
|
|
14:40-14:50, Paper MoB-4.4 | |
Articulated Object Interaction in Unknown Scenes with Whole-Body Mobile Manipulation |
|
Mittal, Mayank | ETH Zurich |
Hoeller, David | ETH Zurich, NVIDIA |
Farshidian, Farbod | ETH Zurich |
Hutter, Marco | ETH Zurich |
Garg, Animesh | University of Toronto |
Keywords: Mobile Manipulation, Whole-Body Motion Planning and Control, Collision Avoidance
Abstract: A kitchen assistant needs to operate human-scale objects, such as cabinets and ovens, in unmapped environments with dynamic obstacles. Autonomous interactions in such environments require integrating dexterous manipulation and fluid mobility. While mobile manipulators in different form factors provide an extended workspace by combining the dexterity of a manipulator arm with the extended reach of a mobile base, their real-world adoption has been limited. This limitation is in part due to two main reasons: 1) inability to interact with unknown human-scale objects and 2) inefficient coordination between the arm and the mobile base. Executing a high-level task for general objects requires a perceptual understanding of the object as well as adaptive whole-body control among dynamic obstacles. In this paper, we propose a two-stage architecture for autonomous interaction with large articulated objects in unknown environments. The first stage, the object-centric planner, focuses only on the object to provide an action-conditional sequence of states for manipulation using RGB-D data. The second stage, the agent-centric planner, formulates the whole-body motion control as an optimal control problem that ensures safe tracking of the generated plan, even in scenes with moving obstacles. We show that the proposed pipeline can handle complex static and dynamic kitchen settings for both wheeled-base and legged mobile manipulators. Compared to other agent-centric planners, our proposed planner achieves a higher success rate and a lower execution time. We perform hardware tests on a legged mobile manipulator to interact with various articulated objects in a kitchen. For additional material, please check: https://www.pair.toronto.edu/articulated-mm/ .
|
|
14:50-15:00, Paper MoB-4.5 | |
A Versatile Affordance Modeling Framework Using Screw Primitives to Increase Autonomy During Manipulation Contact Tasks |
|
Pettinger, Adam | The University of Texas at Austin |
Alambeigi, Farshid | University of Texas at Austin |
Pryor, Mitchell | University of Texas |
Keywords: Mobile Manipulation, Telerobotics and Teleoperation, Reactive and Sensor-Based Planning
Abstract: Recent studies utilizing Affordance Templates to perform remote contact manipulation tasks with mobile manipulators have demonstrated their usefulness for modeling complex tasks, allowing robots to work in uncertain environments. These efforts largely fall into the “supervised autonomy” paradigm, where a user commands high-level actions and supervises the robot’s execution, but is not responsible for the low-level execution details such as controlling the endpoint/joints, managing contact forces, or avoiding collisions. In this work, we present a novel formulation for modeling affordances that features the versatility and generality of screw theory. We also propose a generic framework and algorithm for executing manipulation contact tasks. To thoroughly evaluate the performance of the proposed formulation and framework, we performed various sets of experiments using our mobile manipulator. We showed that this framework is generic enough to successfully manipulate a variety of articulated objects. Moreover, by performing a set of manipulation tasks on a wheel valve as our case study, we demonstrated that our approach significantly lowers task duration (by about 5 times) and the applied forces and torques (by about 2.5 and 3 times, respectively) when compared to direct teleoperation. Further, by performing 90 trials, we show robust performance of the proposed screw-based framework even when there is significant error in the estimated position and orientation (i.e., about 10 cm and 0.35 rad) of the task objects.
|
|
15:00-15:10, Paper MoB-4.6 | |
Combining Navigation and Manipulation Costs for Time-Efficient Robot Placement in Mobile Manipulation Tasks |
|
Reister, Fabian | Karlsruhe Institute of Technology (KIT) |
Grotz, Markus | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Mobile Manipulation, Motion and Path Planning, Integrated Planning and Control
Abstract: Mobile manipulation tasks require a seamless integration of navigation and manipulation capabilities. Finding suitable robot placements to pick up and place objects in such tasks is crucial for time-efficient task execution. Sub-optimal robot placements result in infeasible solutions or require larger repositioning of the mobile base to reach target objects, increasing the overall time to complete the task. In this work, we propose an approach that, given a set of objects, autonomously selects the optimal placements of a humanoid robot in conjunction with the best grasp candidate and corresponding arm. In contrast to previous approaches, our method considers both the navigation costs between consecutive robot placements and the manipulation costs to reduce the time needed to complete the task. We evaluate our method on a simulated table clearing task that requires the robot to move between pickup and discard locations and demonstrate the applicability in a real-world experiment on the humanoid robot ARMAR-6. In addition, we perform a run-time analysis and show that our approach can integrate sensory feedback to update the optimal placement in dynamic environments.
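A schematic of the combined cost described above: each candidate base placement is scored by the navigation cost from the previous placement plus a manipulation cost for the best grasp reachable from it. The cost terms, weights, and values below are illustrative placeholders, not the authors' formulation.

```python
import math

def navigation_cost(p_from, p_to):
    """Placeholder: straight-line travel distance between base placements."""
    return math.dist(p_from, p_to)

def manipulation_cost(placement, grasp):
    """Placeholder: penalize grasps far from a comfortable reach distance."""
    reach = math.dist(placement, grasp["position"])
    return abs(reach - 0.6) + grasp.get("quality_penalty", 0.0)

def best_placement(current_base, candidate_placements, grasp_candidates, w_nav=1.0, w_manip=1.0):
    """Return (cost, placement, grasp) minimizing the weighted sum of both costs."""
    best = None
    for placement in candidate_placements:
        for grasp in grasp_candidates:
            cost = (w_nav * navigation_cost(current_base, placement)
                    + w_manip * manipulation_cost(placement, grasp))
            if best is None or cost < best[0]:
                best = (cost, placement, grasp)
    return best

current_base = (0.0, 0.0)
placements = [(1.0, 0.2), (1.5, -0.3), (0.8, 0.9)]
grasps = [{"position": (1.6, 0.3), "quality_penalty": 0.0},
          {"position": (1.4, 0.8), "quality_penalty": 0.1}]
print(best_placement(current_base, placements, grasps))
```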
|
|
15:10-15:20, Paper MoB-4.7 | |
An Open-Source Motion Planning Framework for Mobile Manipulators Using Constraint-Based Task Space Control with Linear MPC |
|
Stelter, Simon | Universität Bremen |
Bartels, Georg | Universität Bremen |
Beetz, Michael | University of Bremen |
Keywords: Mobile Manipulation, Service Robotics
Abstract: We present an open-source motion planning framework for ROS that uses constraint- and optimization-based task-space control to generate trajectories for the whole body of mobile manipulators. Motion goals are defined as constraints that are enforced on task-space functions. They map the controllable degrees of freedom of a system onto custom task spaces, which can, but do not have to be, the Cartesian space. We use this expressive tool from motion control to pre-compute trajectories in order to utilize the fact that most robots offer controllers to follow such trajectories. As a result, our framework only requires a kinematic model of the robot to control it. In addition, we extend the constraint-based motion control approach with linear MPC to explicitly optimize for velocity, acceleration, and jerk simultaneously, which allows us to enforce constraints on all derivatives in both joint and task space at the same time. As a result, we can reuse pre-defined motion goals on any robot without modifications. Our framework was tested on four different robots to show its generality.
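To make the "constraints on all derivatives" point concrete, here is a minimal single-joint linear MPC that bounds velocity, acceleration, and jerk simultaneously while tracking a goal position. This is an independent sketch of the general idea, not the framework's own formulation; it assumes the cvxpy package is available, and all bounds and weights are illustrative.

```python
import cvxpy as cp
import numpy as np

dt, N = 0.05, 40                 # control period [s] and horizon length
goal = 0.5                       # joint-position goal [rad]
v_max, a_max, j_max = 1.0, 4.0, 30.0

q = cp.Variable(N + 1)           # position
v = cp.Variable(N + 1)           # velocity
a = cp.Variable(N)               # acceleration (decision input)

constraints = [q[0] == 0.0, v[0] == 0.0]
for k in range(N):
    constraints += [q[k + 1] == q[k] + dt * v[k],          # kinematic integration
                    v[k + 1] == v[k] + dt * a[k],
                    cp.abs(v[k + 1]) <= v_max,
                    cp.abs(a[k]) <= a_max]
# Jerk is limited through finite differences of consecutive accelerations.
constraints += [cp.abs(a[k + 1] - a[k]) <= j_max * dt for k in range(N - 1)]

cost = cp.sum_squares(q - goal) + 1e-3 * cp.sum_squares(a)
cp.Problem(cp.Minimize(cost), constraints).solve()
print("first acceleration command:", float(a.value[0]))
```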
|
|
15:20-15:30, Paper MoB-4.8 | |
View Planning for Object Pose Estimation Using Point Clouds: An Active Robot Perception Approach |
|
Hu, Jie | Texas A&M University |
Pagilla, Prabhakar Reddy | Texas A&M University |
Keywords: Computer Vision for Manufacturing, Reactive and Sensor-Based Planning, Motion and Path Planning
Abstract: This paper considers the object pose estimation problem for robotic tasks where a vision sensor is mounted on the robot end-effector to collect pose data from different views. The data quality, which directly affects how accurately the object's pose can be estimated, depends on view planning, which is the process of sequentially placing the vision sensor in the robot workspace to continuously acquire useful data (pose information). While the pose estimation is done through point cloud registration using existing methods, the focus of this work is on planning sensor views based on point cloud analysis to improve point cloud quality and quantity for pose estimation. To this end, we propose an active robot perception framework based on the intrinsic connection between data collection and pose estimation for a robotic task: robot motions should be guided by the task goal, and the task is better completed with improved data collected through robot motion. The proposed active perception framework includes the following components: evaluating the quality and quantity of the point clouds, finding a minimum number of optimal sensor views, and calculating robot poses corresponding to the sensor views. The minimum number of sensor views required to improve the point cloud quality is found by solving a set cover problem, while the optimal sensor views are obtained through mixed-integer programming. The robot poses corresponding to the sensor views are calculated by solving the robot inverse kinematics via nonlinear programming. Extensive experiments were conducted with different objects to evaluate the proposed framework, and a representative sample of the results from those experiments is provided. The overall effectiveness of the active perception strategy is shown through improvements in pose estimation.
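The "minimum number of sensor views" step mentioned above is a set-cover problem; a standard greedy approximation conveys the idea (the surface-patch and view-coverage data below are illustrative, and this is not the paper's mixed-integer formulation):

```python
def greedy_view_selection(patches, view_coverage):
    """Pick views until every surface patch is covered, greedily choosing the
    view that covers the most still-uncovered patches at each step."""
    uncovered = set(patches)
    selected = []
    while uncovered:
        best_view = max(view_coverage, key=lambda v: len(view_coverage[v] & uncovered))
        gained = view_coverage[best_view] & uncovered
        if not gained:
            raise ValueError("remaining patches cannot be covered by any view")
        selected.append(best_view)
        uncovered -= gained
    return selected

# Illustrative example: 6 surface patches, 4 candidate sensor views.
patches = range(6)
view_coverage = {
    "view_A": {0, 1, 2},
    "view_B": {2, 3},
    "view_C": {3, 4, 5},
    "view_D": {1, 4},
}
print(greedy_view_selection(patches, view_coverage))
```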
|
|
15:30-15:40, Paper MoB-4.9 | |
Planning to Build Block Structures with Unstable Intermediate States Using Two Manipulators (I) |
|
Chen, Hao | Osaka University |
Wan, Weiwei | Osaka University |
Harada, Kensuke | Osaka University |
Koyama, Keisuke | Osaka University |
Keywords: Assembly, Manipulation Planning, Task and Motion Planning
Abstract: The work is inspired by the assembly of Soma block puzzles. Soma block puzzles usually include unstable intermediate states that require additional support to maintain stability temporarily. In the puzzles’ solution manuals, we can observe that designers exploit the fact that humans have two hands and can avoid an unstable intermediate state by using one hand to support the finished component while using the other hand to assemble an upcoming workpiece. Motivated by this human behavior, this paper develops a planner that automatically finds an optimal assembly sequence for a dual-arm robot to build a woodblock structure while considering various constraints and supporting grasps from a second hand. It uses the mesh model of the wood blocks and the final assembly state to generate possible assembly sequences and evaluate the optimal assembly sequence by considering the stability, graspability, assemblability, and the need for a second hand. In particular, a second hand is needed when support from worktables and other workpieces is not enough to produce a stable assembly. A second hand can hold and support the unstable components so that the robot can further assemble new workpieces until the structure state becomes stable again. The output of the planner includes the optimal assembly order, candidate grasps, assembly directions, and the supporting grasps (if needed). The output can help guide a dual-arm robot to perform motion planning and thus generate assembly motion. Experiments using various blocks and structures show the effectiveness of the proposed planner.
|
|
MoB-5 |
Rm5 (Room C-2) |
Navigation Systems 1 |
Regular session |
Chair: Stoyanov, Danail | University College London |
Co-Chair: Taguchi, Shun | Toyota Central R&D Labs., Inc |
|
14:10-14:20, Paper MoB-5.1 | |
RARA: Zero-Shot Sim2Real Visual Navigation with Following Foreground Cues |
|
Kelchtermans, Klaas | KU Leuven |
Tuytelaars, Tinne | KU Leuven |
Keywords: Vision-Based Navigation, AI-Enabled Robotics, Deep Learning for Visual Perception
Abstract: The gap between simulation and the real world keeps many machine learning breakthroughs in computer vision and reinforcement learning from being applied in the real world. In this work, we tackle this gap for the specific case of camera-based navigation, formulating it as following a visual cue in the foreground against arbitrary backgrounds. The visual cue in the foreground can often be simulated realistically, such as a line, gate, or cone. The challenge then lies in coping with the unknown backgrounds and integrating both. As such, the goal is to train a visual agent on data captured in a simulated environment that is empty except for this foreground cue, and to test this model directly in a visually diverse real world. In order to bridge this large gap, we show that it is crucial to combine the following techniques: randomized augmentation of the fore- and background, regularization with both deep supervision and a triplet loss, and finally abstraction of the dynamics by using waypoints rather than direct velocity commands. The various techniques are ablated in our experimental results both qualitatively and quantitatively, finally demonstrating a successful transfer from simulation to the real world.
|
|
14:20-14:30, Paper MoB-5.2 | |
Navigation among Movable Obstacles with Object Localization Using Photorealistic Simulation |
|
Ellis, Kirsty | University College London |
Zhang, Henry, Ziheng | UCL |
Stoyanov, Danail | University College London |
Kanoulas, Dimitrios | University College London |
Keywords: Vision-Based Navigation, Motion and Path Planning, Autonomous Vehicle Navigation
Abstract: While mobile navigation research has focused on obstacle avoidance, Navigation Among Movable Obstacles (NAMO), via interaction with the environment, is a problem that remains open and challenging. This paper presents a novel system integration to handle NAMO using visual feedback. To explore the capabilities of the introduced system, we solve the problem via graph-based path planning in a photorealistic simulator (NVIDIA Isaac Sim), in order to identify whether the simulation-to-reality (sim2real) problem in robot navigation can be resolved. We consider the case where a wheeled robot navigates in a warehouse in which movable boxes are common obstacles. We enable online real-time object localization and obstacle movability detection, to either avoid objects or, if that is not possible, to clear them out of the robot's planned path using pushing actions. We first test the integrated system in photorealistic environments, and we then validate the method on a real-world mobile wheeled robot (UCL MPPL) and its on-board sensory and computing system.
|
|
14:30-14:40, Paper MoB-5.3 | |
Navigating Underground Environments Using Simple Topological Representations |
|
Cano, Lorenzo | Universidad De Zaragoza |
Mosteo, Alejandro R. | Centro Universitario De La Defensa De Zaragoza |
Tardioli, Danilo | Centro Universitario De La Defensa |
Keywords: Autonomous Vehicle Navigation, Field Robots
Abstract: Underground environments are some of the most challenging for autonomous navigation. Long, featureless corridors, loose and slippery soils, poor illumination, and the unavailability of global localization make many traditional approaches struggle. In this work, a topology-based navigation system is presented that enables autonomous navigation of a ground robot in mine-like environments, relying exclusively on a high-level topological representation of the tunnel network. The topological representation is used to generate high-level instructions that the agent uses to navigate through corridors and intersections. A Convolutional Neural Network (CNN) is used to detect all the galleries accessible to a robot from its current position. The use of a CNN proves to be a reliable approach to this problem, capable of detecting the galleries correctly in a wide variety of situations. The CNN is also able to detect galleries even in the presence of obstacles, which motivates the development of a reactive navigation system that can effectively exploit the predictions of the gallery detector.
|
|
14:40-14:50, Paper MoB-5.4 | |
Teaching Agents How to Map: Spatial Reasoning for Multi-Object Navigation |
|
Marza, Pierre | INSA Lyon |
Matignon, Laetitia | Université Lyon Claude Bernard |
Simonin, Olivier | INSA De Lyon |
Wolf, Christian | Naver Labs Europe |
Keywords: Vision-Based Navigation, Deep Learning Methods, Deep Learning for Visual Perception
Abstract: In the context of visual navigation, the capacity to map a novel environment is necessary for an agent to exploit its observation history in the considered place and efficiently reach known goals. This ability can be associated with spatial reasoning, where an agent is able to perceive spatial relationships and regularities, and discover object characteristics. Recent work introduces learnable policies parametrized by deep neural networks and trained with Reinforcement Learning (RL). In classical RL setups, the capacity to map and reason spatially is learned end-to-end, from reward alone. In this setting, we introduce supplementary supervision in the form of auxiliary tasks designed to favor the emergence of spatial perception capabilities in agents trained for a goal-reaching downstream objective. We show that learning to estimate metrics quantifying the spatial relationships between an agent at a given location and a goal to reach has a high positive impact in Multi-Object Navigation settings. Our method significantly improves the performance of different baseline agents, that either build an explicit or implicit representation of the environment, even matching the performance of incomparable oracle agents taking ground-truth maps as input. A learning-based agent from the literature trained with the proposed auxiliary losses was the winning entry to the Multi-Object Navigation Challenge, part of the CVPR 2021 Embodied AI Workshop.
|
|
14:50-15:00, Paper MoB-5.5 | |
NAUTS: Negotiation for Adaptation to Unstructured Terrain Surfaces |
|
Siva, Sriram | Colorado School of Mines |
Wigness, Maggie | U.S. Army Research Laboratory |
Rogers III, John G. | US Army Research Laboratory |
Quang, Long | U.S. Army Research Laboratory |
Zhang, Hao | Colorado School of Mines |
Keywords: Vision-Based Navigation, Autonomous Vehicle Navigation, Field Robots
Abstract: When robots operate in real-world off-road environments with unstructured terrains, the ability to adapt their navigational policy is critical for effective and safe navigation. However, off-road terrains introduce several challenges to robot navigation, including dynamic obstacles and terrain uncertainty, leading to inefficient traversal or navigation failures. To address these challenges, we introduce a novel approach for adaptation by negotiation that enables a ground robot to adjust its navigational behaviors through a negotiation process. Our approach first learns prediction models for various navigational policies to function as a terrain-aware joint local controller and planner. Then, through a new negotiation process, our approach learns from various policies' interactions with the environment to agree on the optimal combination of policies in an online fashion to adapt robot navigation to unstructured off-road terrains on the fly. Additionally, we implement a new optimization algorithm that offers the optimal solution for robot negotiation in real-time during execution. Experimental results have validated that our method for adaptation by negotiation outperforms previous methods for robot navigation, especially over unseen and uncertain dynamic terrains.
|
|
15:00-15:10, Paper MoB-5.6 | |
Resilient Detection and Recovery of Autonomous Systems Operating under On-Board Controller Cyber Attacks |
|
Bonczek, Paul | University of Virginia |
Bezzo, Nicola | University of Virginia |
Keywords: Autonomous Vehicle Navigation, Failure Detection and Recovery, Motion and Path Planning
Abstract: Cyber-attacks, failures, and implementation errors inside the controller of an autonomous system can affect its correct behavior, leading to unsafe states and degraded performance. In this paper, we focus on such problems, specifically on cyber-attacks that manipulate controller parameters, such as the gains of a feedback controller, or that trigger different behaviors or block inputs based on specific values of the state and tracking error. If such attacks go undetected, they can lead to the partial or complete loss of the system's control authority, resulting in a hijacking and leading the autonomous system towards unforeseen states. To deal with this problem, we propose a runtime monitoring and recovery scheme in which: 1) we leverage the residual between the expected and the received measurements to detect inconsistencies in the generated inputs and 2) we provide a recovery method for counteracting the malicious effects to allow for resilient operations by manipulating the reference signal and state vector provided to the system to avoid the affected regions in the state and error space. We validate our approach with Matlab simulations and experiments on unmanned ground vehicles resiliently performing operations in the presence of malicious attacks on on-board controllers.
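A minimal sketch of the residual check described above: compare the received measurement with the output predicted by the nominal model and raise an alarm when the normalized residual exceeds a threshold. The matrices, threshold, and example values are illustrative; the paper's detector and recovery logic are more involved.

```python
import numpy as np

def residual_alarm(x_pred, P_pred, z_meas, H, R, threshold=9.21):
    """Chi-square-style test on the measurement residual.
    threshold ~ 99% quantile of a chi-square with 2 DOF (illustrative)."""
    r = z_meas - H @ x_pred                      # residual between expected and received
    S = H @ P_pred @ H.T + R                     # residual covariance
    d2 = float(r.T @ np.linalg.inv(S) @ r)       # squared Mahalanobis distance
    return d2 > threshold, d2

# Illustrative 2D position measurement of a 4D state [x, y, vx, vy].
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
R = 0.05 * np.eye(2)
P_pred = 0.1 * np.eye(4)
x_pred = np.array([1.0, 2.0, 0.3, 0.0])
z_nominal = np.array([1.02, 1.97])               # consistent measurement
z_attacked = np.array([1.9, 3.1])                # measurement under a compromised controller
print(residual_alarm(x_pred, P_pred, z_nominal, H, R))
print(residual_alarm(x_pred, P_pred, z_attacked, H, R))
```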
|
|
15:10-15:20, Paper MoB-5.7 | |
Benchmarking Augmentation Methods for Learning Robust Navigation Agents: The Winning Entry of the 2021 iGibson Challenge |
|
Yokoyama, Naoki | Georgia Institute of Technology |
Luo, Qian | Georgia Institute of Technology |
Batra, Dhruv | Georgia Tech / Facebook AI Research |
Ha, Sehoon | Georgia Institute of Technology |
Keywords: Vision-Based Navigation, Autonomous Agents, Reinforcement Learning
Abstract: Recent advances in deep reinforcement learning and scalable photorealistic simulation have led to increasingly mature embodied AI for various visual tasks, including navigation. However, while impressive progress has been made for teaching embodied agents to navigate static environments, much less progress has been made on more dynamic environments that may include moving pedestrians or movable obstacles. In this study, we aim to benchmark different augmentation techniques for improving the agent's performance in these challenging environments. We show that adding several dynamic obstacles into the scene during training confers significant improvements in test-time generalization, achieving much higher success rates than baseline agents. We find that this approach can also be combined with image augmentation methods to achieve even higher success rates. Additionally, we show that this approach is also more robust to sim-to-sim transfer than image augmentation methods. Finally, we demonstrate the effectiveness of this dynamic obstacle augmentation approach by using it to train an agent for the 2021 iGibson Challenge at CVPR, where it achieved 1st place for Interactive Navigation.
|
|
15:20-15:30, Paper MoB-5.8 | |
Robotic Interestingness Via Human-Informed Few-Shot Object Detection |
|
Kim, Seungchan | Carnegie Mellon University |
Wang, Chen | Carnegie Mellon University |
Li, Bowen | Tongji University |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Vision-Based Navigation, Object Detection, Segmentation and Categorization, Human Factors and Human-in-the-Loop
Abstract: Interestingness recognition is crucial for decision making in autonomous exploration for mobile robots. Previous work proposed an unsupervised online learning approach that can adapt to environments and detect interesting scenes quickly, but it lacks the ability to adapt to human-informed interesting objects. To solve this problem, we introduce a human-interactive framework, AirInteraction, that can detect human-informed objects via few-shot online learning. To reduce the communication bandwidth, we first apply an online unsupervised learning algorithm on the unmanned vehicle for interestingness recognition and then send only the potentially interesting scenes to a base station for human inspection. The human operator is able to draw and provide bounding box annotations for particular interesting objects, which are sent back to the robot to detect similar objects via few-shot learning. Using only a few human-labeled examples, the robot can learn novel interesting object categories during the mission and detect interesting scenes that contain the objects. We evaluate our method using various interesting scene recognition datasets. To the best of our knowledge, it is the first human-informed few-shot object detection framework for autonomous exploration.
|
|
15:30-15:40, Paper MoB-5.9 | |
Map-Free Lidar Odometry (MFLO) Using a Range Flow Constraint and Point Patch Covariances |
|
Lesak, Mark C. | United States Military Academy |
Gonzalez, Daniel | Aurora Flight Sciences |
Zhou, Michelle W. | Cornell University |
Petruska, Andrew J. | Colorado School of Mines |
Keywords: Localization, Range Sensing, Mining Robotics
Abstract: We present a lightweight real-time method to extract 3D ego-motion using a range flow constraint equation, point patch covariance, and a least squares solution. Our method exploits the structured data provided by range sensors, like rotating LiDARs, to attain 6 DOF odometry without building a map or scan-matching. To evaluate the performance of MFLO, a quadrotor was flown in various environments, and results indicate that MFLO matches and sometimes exceeds the performance of other LiDAR-based odometry methods while using fewer computational resources. In underground environments, MFLO captured 95.7% of the total vertical displacement for a 17.5m translation upwards through a missile silo, the most of any LiDAR algorithm evaluated in this study, and captured 92.8% of the total translation for a 42m translation through an underground mine. In a motion capture lab, MFLO achieved only 0.89-2.87% displacement error and 1.03-2.97% final-position error compared to ground truth, making it the most consistent LiDAR odometry algorithm without mapping.
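The overall structure of a range-flow-style ego-motion solve can be summarized as weighted linear least squares over a 6-DOF twist: each point (or point patch) contributes one linear constraint, and the patch covariance sets its weight. The sketch below shows only that generic structure with placeholder Jacobians and synthetic data; it is not the paper's actual range flow constraint.

```python
import numpy as np

def solve_twist(jacobians, residuals, weights):
    """Weighted least-squares estimate of a 6-DOF twist [vx, vy, vz, wx, wy, wz].
    jacobians: (N, 6) per-patch constraint rows; residuals: (N,); weights: (N,)."""
    sw = np.sqrt(weights)
    A = jacobians * sw[:, None]              # scale each constraint by sqrt of its weight
    b = residuals * sw
    twist, *_ = np.linalg.lstsq(A, b, rcond=None)
    return twist

# Synthetic example: 100 constraints generated around a known twist.
rng = np.random.default_rng(0)
true_twist = np.array([0.10, 0.00, -0.05, 0.00, 0.02, 0.00])
J = rng.normal(size=(100, 6))
w = 1.0 / (0.01 + rng.uniform(size=100))      # stands in for inverse patch covariance
b = J @ true_twist + 0.001 * rng.normal(size=100)
print(solve_twist(J, b, w))                   # should be close to true_twist
```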
|
|
MoB-6 |
Rm6 (Room D) |
Aerial Systems 2 |
Regular session |
Chair: Floreano, Dario | Ecole Polytechnique Federal, Lausanne |
Co-Chair: Shimonomura, Kazuhiro | Ritsumeikan University |
|
14:10-14:20, Paper MoB-6.1 | |
Rotor Array Synergies for Aerial Modular Reconfigurable Robots |
|
Moshirian, Benjamin Nabil | Queensland University of Technology |
Pounds, Pauline | The University of Queensland |
Keywords: Aerial Systems: Applications, Cellular and Modular Robots, Aerial Systems: Mechanics and Control
Abstract: Aerial Modular Reconfigurable Robots (AMRRs) are scalable systems consisting of rotor modules capable of rearrangement during flight. The potential to dynamically change to any shape for a given task poses the question: what arrangements offer the most aerodynamic benefit for the task of flying? Answering this requires understanding how adjacent rotors in various configurations influence one another. Intuitively, aerodynamic models such as momentum theory suggest that close rotor proximity decreases performance due to the upstream rotor flow fields interacting. However, effects such as vortex interaction or viscous flow entrainment (used by the Dyson bladeless fan) may offer benefits not captured by the modelling assumptions of computational analysis or simulation. Thus, this work takes an experimental approach, testing the thrust performance of rotors in independent configurations of lines, square lattices, and hexagons with various inter-rotor spacings. It was found that inter-rotor spacing did not significantly change thrust performance, but that hexagonal arrangements outperformed line and grid lattices. Smoke tests indicated that hexagon configurations entrained air in the central cavity, resulting in a thrust improvement. An inter-rotor spacing of 1.51 rotor diameters gave the best performance increase, roughly equal to that of an additional rotor. This suggests that by placing rotors in an array of six hollow hexagonal honeycombs, thrust performance could theoretically be increased by up to 27.3 per cent for no additional mass.
|
|
14:20-14:30, Paper MoB-6.2 | |
Precise Position Control of a Multi-Rotor UAV with a Cable-Suspended Mechanism During Water Sampling |
|
Panetsos, Fotis | National Technical University of Athens |
Karras, George | National Technical University of Athens |
Aspragkathos, Sotiris | NTUA |
Kyriakopoulos, Kostas | National Technical Univ. of Athens |
Keywords: Aerial Systems: Applications, Sensor Fusion, Sensor-based Control
Abstract: This paper addresses the problem of water sampling using a multirotor UAV with a cable-suspended mechanism. In order to ensure the safe execution of the sampling procedure and the stabilization of the vehicle, the disturbances induced by the water flow and transferred through the cable have to be identified. Specifically, an estimate of the disturbances is extracted by integrating a depth sensor, a load cell, an ultrasonic sensor, and a downward-looking camera into the UAV’s sensor suite and fusing the respective measurements. Gaussian Processes are then employed to learn the uncertain disturbances in real time in a non-parametric manner. The predicted disturbances are incorporated into a geometric control scheme that is capable of stabilizing the UAV above the desired sampling position while compensating for the aforementioned disturbances. The performance of the proposed control strategy is demonstrated through both simulation and experimental results.
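A compact illustration of the non-parametric disturbance learning described above, using scikit-learn's Gaussian-process regressor to predict a cable-transmitted disturbance force from measured features and feed it forward to the controller. The features, kernel, and data are placeholders, not the authors' setup.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Training data: features could be, e.g., [cable tension, probe depth, flow proxy].
rng = np.random.default_rng(1)
X = rng.uniform(low=[0.0, 0.0, 0.0], high=[10.0, 0.5, 1.0], size=(200, 3))
true_force = 0.8 * X[:, 0] + 4.0 * X[:, 1] * X[:, 2]          # unknown to the robot
y = true_force + 0.1 * rng.normal(size=200)                    # noisy load-cell readings

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(),
                              normalize_y=True)
gp.fit(X, y)

# At control time: the predicted disturbance (with uncertainty) is fed forward.
x_now = np.array([[6.0, 0.3, 0.7]])
f_hat, f_std = gp.predict(x_now, return_std=True)
print(f"feed-forward disturbance estimate: {f_hat[0]:.2f} N (std {f_std[0]:.2f})")
```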
|
|
14:30-14:40, Paper MoB-6.3 | |
Multirotor Long-Reach Aerial Pruning with Wire-Suspended Saber Saw |
|
Miyazaki, Ryo | Ritsumeikan University |
Matori, Wataru | Ritsumeikan University |
Kominami, Takamasa | Ritsumeikan University |
Paul, Hannibal | Ritsumeikan University |
Shimonomura, Kazuhiro | Ritsumeikan University |
Keywords: Aerial Systems: Applications, Field Robots, Mobile Manipulation
Abstract: Pruning at high altitude is dangerous work with a high risk of accidents for human workers. In this research, we propose a multirotor flying robot that is equipped with a wire-suspended device and performs pruning tasks. We use a saber saw as the cutting tool. If the cutting tool is installed on the body of the multirotor platform, it is difficult for the flying robot to approach the desired work point when there are obstacles, such as other branches, around the target branch to be pruned. Therefore, in this study, we propose a saber saw suspended from the body of the multirotor platform with two wires. The wire-suspended device is equipped with a saber saw and four ducted fans that produce thrust in any direction in the horizontal plane. This ducted-fan system can be used to suppress the swing of the wire-suspended device, making it easier to position the saber saw blade at the target point, and to improve the efficiency of the cutting and reduce the cutting time by providing a pushing force to the saber saw. As a result, the pruning work can be performed efficiently. Experiments have demonstrated that aerial pruning is possible using the long-reach wire-suspended saber saw.
|
|
14:40-14:50, Paper MoB-6.4 | |
Unmanned Aircraft System-Based Radiological Mapping of Buildings |
|
Lazna, Tomas | Brno University of Technology |
Gabrlik, Petr | Brno University of Technology |
Sladek, Petr | Nuclear Science and Instrumentation Laboratory, International Atomic Energy Agency |
Jilek, Tomas | Brno University of Technology |
Zalud, Ludek | Brno University of Technology |
Keywords: Aerial Systems: Applications, Robotics in Hazardous Fields, Simulation and Animation
Abstract: The article focuses on acquiring a 3D radiation map of a building via a two-phase survey performed with an unmanned aircraft system (UAS). First, a model of the studied building is created by means of photogrammetry. Then, radiation data are collected using a 2-inch NaI(Tl) detector in a regular grid at a distance of 2 m from all accessible surfaces of the building (i.e., the walls and the roof). The data are then georeferenced, filtered, projected onto the building model, and interpolated to yield the detailed radiation map. A method to estimate the parameters of the radiation sources located inside is introduced and successfully tested, providing a localization accuracy on the order of meters. This task aims to deliver a proof of concept for employing such a mapping technique within nuclear safeguards. The acquisition of the radiation data was performed via a manual flight to ensure an appropriate safety level; in this context, it should be noted that the autonomous flight mode still requires major improvements in terms of safety.
|
|
14:50-15:00, Paper MoB-6.5 | |
Towards Edible Drones for Rescue Missions: Design and Flight of Nutritional Wings |
|
Kwak, Bokeon | EPFL |
Shintake, Jun | University of Electro-Communications |
Zhang, Lu | Wageningen University & Research |
Floreano, Dario | Ecole Polytechnique Federal, Lausanne |
Keywords: Aerial Systems: Applications, Search and Rescue Robots
Abstract: Drones have proven to be useful aerial vehicles for unmanned transport missions such as food and medical supply delivery. This can be leveraged to deliver life-saving nutrition and medicine to people in emergency situations. However, commercial drones can generally only carry 10%-30% of their own mass as payload, which limits the amount of food delivered in a single flight. One novel solution to noticeably increase the food-carrying ratio of a drone is to recreate some of its structures, such as the wings, with edible materials. We thus propose a drone that is no longer only a food-transporting aircraft but is itself partially edible, increasing its food-carrying mass ratio to 50% owing to its edible wings. Furthermore, should the edible drone be left behind in the environment after performing its task in an emergency situation, it will be more biodegradable than its non-edible counterpart, leaving less waste in the environment. Here we describe the choice of materials and the scalable design of edible wings, and validate the method in a flight-capable prototype that can provide 300 kcal and carry a payload of 80 g of water.
|
|
15:00-15:10, Paper MoB-6.6 | |
Data-Efficient Collaborative Decentralized Thermal-Inertial Odometry |
|
Polizzi, Vincenzo | University of Zurich and ETH Zurich |
Hewitt, Robert | Jet Propulsion Laboratory |
Hidalgo Carrio, Javier | University of Zurich and ETH Zurich |
Delaune, Jeff | Jet Propulsion Laboratory |
Scaramuzza, Davide | University of Zurich |
Keywords: Aerial Systems: Applications, Space Robotics and Automation, Multi-Robot SLAM
Abstract: We propose a system solution to achieve data-efficient, decentralized state estimation for a team of flying robots using thermal images and inertial measurements. Each robot can fly independently and exchange data when possible to refine its state estimate. Our system front-end applies an online photometric calibration to refine the thermal images so as to enhance feature tracking and place recognition. Our system back-end uses a covariance-intersection fusion strategy to neglect the cross-correlation between agents so as to lower memory usage and computational cost. The communication pipeline uses Vector of Locally Aggregated Descriptors (VLAD) to construct a request-response policy that requires low bandwidth usage. We test our collaborative method on both synthetic and real-world data. Our results show that the proposed method improves trajectory estimation by up to 46% with respect to an individual-agent approach, while reducing communication exchange by up to 89%. Datasets and code are released to the public, extending the already-public JPL xVIO library.
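The covariance-intersection fusion rule the abstract relies on is a standard formula for fusing estimates with unknown cross-correlation; a minimal numpy sketch follows (the weight is chosen by a simple grid search over the trace criterion, and this is not the authors' implementation):

```python
import numpy as np

def covariance_intersection(x_a, P_a, x_b, P_b, n_grid=101):
    """Fuse two estimates with unknown cross-correlation:
    P_ci^-1 = w*P_a^-1 + (1-w)*P_b^-1, with w chosen to minimize trace(P_ci)."""
    Pa_inv, Pb_inv = np.linalg.inv(P_a), np.linalg.inv(P_b)
    best = None
    for w in np.linspace(0.0, 1.0, n_grid):
        info = w * Pa_inv + (1.0 - w) * Pb_inv
        P = np.linalg.inv(info)
        if best is None or np.trace(P) < best[0]:
            x = P @ (w * Pa_inv @ x_a + (1.0 - w) * Pb_inv @ x_b)
            best = (np.trace(P), x, P, w)
    return best[1], best[2], best[3]

# Illustrative 2D estimates from two agents.
x_a, P_a = np.array([1.0, 2.0]), np.diag([0.5, 2.0])
x_b, P_b = np.array([1.2, 1.8]), np.diag([2.0, 0.4])
x_ci, P_ci, w = covariance_intersection(x_a, P_a, x_b, P_b)
print(x_ci, np.diag(P_ci), w)
```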
|
|
15:10-15:20, Paper MoB-6.7 | |
Enforcing Vision-Based Localization Using Perception Constrained N-MPC for Multi-Rotor Aerial Vehicles |
|
Jacquet, Martin | LAAS, CNRS |
Franchi, Antonio | University of Twente |
Keywords: Aerial Systems: Perception and Autonomy, Aerial Systems: Applications, Aerial Systems: Mechanics and Control
Abstract: This work introduces a Nonlinear Model Predictive Control (N-MPC) scheme for camera-equipped Unmanned Aerial Vehicles (UAVs), which controls the UAV motion at the motor level to ensure the quality of vision-based state estimation while performing other tasks. The controller ensures visibility over a sufficient number of features, while optimizing their coverage, based on an assessment of the estimation quality. The controller works for the very broad class of generic multi-rotor UAVs, including platforms with any number of propellers, which can be either collinear, as in the quadrotor, or fixedly tilted. The low-level inputs are computed in real time and realistically constrained in terms of maximum motor torque. This allows the platform to exploit its full actuation capabilities to maintain visibility over the set of points of interest. Our implementation is tested in Gazebo simulations and in mocap-free real experiments, and features visual-inertial state estimation based on a Kalman filter. The software is provided open-source.
|
|
15:20-15:30, Paper MoB-6.8 | |
Visual Loop Closure Detection for a Future Mars Science Helicopter |
|
Dietsche, Alexander | ETH Zurich |
Ott, Lionel | ETH Zurich |
Siegwart, Roland | ETH Zurich |
Brockers, Roland | California Institute of Technology |
Keywords: Aerial Systems: Perception and Autonomy, Space Robotics and Automation
Abstract: Future Mars rotorcraft will require the ability to precisely navigate to previously visited locations in order to return to a safe landing site or execute precise scientific measurements such as sample acquisition or targeted sensing. To enable a future Mars Science Helicopter to perform in-flight loop closures, we present an on-board visual loop closure detection system based on a Bag-of-Words (BoW) approach that is efficient enough to run in real time on the anticipated computationally constrained flight avionics. Our system establishes image-to-image associations between incoming images of a downward-looking navigation camera and previously observed geo-tagged keyframes stored in a database. The system extracts ORB features, which are quantized into a BoW histogram using a custom one-million-word hierarchical vocabulary trained on synthetic images from a Mars simulation. An efficient database query using an inverted index produces a set of candidate frames which we check for geometrical consistency. For efficient feature matching, we leverage the vocabulary to perform fast approximate nearest neighbor search. The geometrical check accepts loop closure pairs whose essential matrix is supported by a minimum number of feature matches, which are pre-selected based on a rotational consistency check. The vocabulary and the methods used for the geometrical consistency checks were chosen to maximize performance while allowing real-time execution on a computationally constrained embedded processor. We demonstrate and evaluate the proposed system both on simulated and real-world data, including flight data from the Mars Helicopter Ingenuity.
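The database query described above can be illustrated with a toy TF-IDF bag-of-words index: each keyframe stores a sparse histogram of visual-word IDs, an inverted index maps words to keyframes, and a query only touches keyframes sharing at least one word. This is a pure-Python sketch with made-up word IDs, not the flight software; candidate frames returned by such a query would then pass a geometric check.

```python
import math
from collections import Counter, defaultdict

class BowDatabase:
    def __init__(self):
        self.keyframes = {}                    # kf_id -> Counter of visual-word ids
        self.inverted = defaultdict(set)       # word id -> set of kf_ids

    def add_keyframe(self, kf_id, words):
        self.keyframes[kf_id] = Counter(words)
        for w in set(words):
            self.inverted[w].add(kf_id)

    def query(self, words, top_k=3):
        q = Counter(words)
        n = len(self.keyframes)
        scores = defaultdict(float)
        for w, qcount in q.items():
            idf = math.log(1.0 + n / len(self.inverted[w]))     # rare words weigh more
            for kf_id in self.inverted[w]:                      # inverted-index lookup
                scores[kf_id] += qcount * self.keyframes[kf_id][w] * idf * idf
        return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

db = BowDatabase()
db.add_keyframe("kf_001", [3, 3, 7, 42, 99])
db.add_keyframe("kf_002", [7, 8, 8, 15])
db.add_keyframe("kf_003", [42, 42, 99, 99, 3])
print(db.query([3, 42, 99, 99]))
```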
|
|
15:30-15:40, Paper MoB-6.9 | |
PencilNet: Zero-Shot Sim-To-Real Transfer Learning for Robust Gate Perception in Autonomous Drone Racing |
|
Pham, Huy | Aarhus University |
Sarabakha, Andriy | Nanyang Technological University |
Odnoshyvkin, Mykola | Technical University of Munich |
Kayacan, Erdal | Aarhus University |
Keywords: Aerial Systems: Perception and Autonomy, Aerial Systems: Applications, Aerial Systems: Mechanics and Control
Abstract: In autonomous and mobile robotics, one of the main challenges is the robust on-the-fly perception of the environment, which is often unknown and dynamic, as in autonomous drone racing. In this work, we propose a novel deep neural network-based perception method for racing gate detection -- PencilNet -- which relies on a lightweight neural network backbone on top of a pencil filter. This approach unifies predictions of the gates' 2D position, distance, and orientation in a single pose tuple. We show that our method is effective for zero-shot sim-to-real transfer learning that does not need any real-world training samples. Moreover, our framework is highly robust to the illumination changes commonly seen under rapid flight compared to state-of-the-art methods. A thorough set of experiments demonstrates the effectiveness of this approach in multiple challenging scenarios, where the drone completes various tracks under different lighting conditions.
|
|
MoB-7 |
Rm7 (Room E) |
Medical Robots and Systems 2 |
Regular session |
Chair: Haddadin, Sami | Technical University of Munich |
Co-Chair: Chen, Cheng-Wei | National Taiwan University |
|
14:10-14:20, Paper MoB-7.1 | |
Force-Guided Alignment and File Feedrate Control for Robot-Assisted Endodontic Treatment |
|
Cheng, Hao-Fang | National Taiwan University |
Li, Yi-Chan | National Taiwan University |
Ho, Yi-Ching | Taipei Veterans General Hospital |
Chen, Cheng-Wei | National Taiwan University |
Keywords: Medical Robots and Systems, Compliance and Impedance Control, Force and Tactile Sensing
Abstract: Due to the precise manipulations required in dental surgery, robotic technologies have been applied to dentistry. So far, most dental robots are designed for implant surgery, helping dentists accurately place the implant at the desired position and depth. This paper presents the DentiBot, the first robot designed for dental endodontic treatment. Without visual feedback, the DentiBot is integrated with a force and torque sensor to monitor the contact between the root canal and the endodontic file. Additionally, the DentiBot implements force-guided alignment and file feedrate control to autonomously adjust the surgical path and compensate for patient movement in real time while protecting against endodontic file fracture. The feasibility of robot-assisted endodontic treatment is verified by a pre-clinical evaluation performed on acrylic root canal models.
|
|
14:20-14:30, Paper MoB-7.2 | |
A Training-Evaluation Method for Nursing Telerobot Operator with Unsupervised Trajectory Segmentation |
|
Xie, Jiexin | Fudan University |
Zhu, DeLiang | Hebei University of Technology |
Wang, Jiaxin | Hebei University of Technology |
Guo, Shijie | Hebei University of Technology |
Keywords: Telerobotics and Teleoperation, Service Robotics, Learning from Demonstration
Abstract: To cope with the difficulty of training and evaluating nursing telerobot operators, this paper proposes a training-evaluation method for operators based on unsupervised trajectory segmentation. To evaluate the dexterity and procedural knowledge of the operators objectively, we propose a new unsupervised model, TSC-CRP, that can automatically segment trajectories from nursing robotic training sessions. By comparing the segmented sub-trajectories with the standard sub-trajectory process, the method can provide objective evaluation and meaningful feedback without intervention from experts. Experiments show that TSC-CRP has higher segmentation accuracy than other unsupervised methods, and it can identify operators with different skill levels. In practice, the proposed training-evaluation system provides an in-depth analysis of operator actions to assess their skills precisely.
|
|
14:30-14:40, Paper MoB-7.3 | |
Visual Servo Control of COVID-19 Nasopharyngeal Swab Sampling Robot |
|
Hwang, Guebin | KIST |
Lee, Jongwon | Korea Institute of Science and Technology |
Yang, Sungwook | Korea Institute of Science and Technology |
Keywords: Medical Robots and Systems, Visual Servoing, Computer Vision for Automation
Abstract: In this study, we present a visual servo control framework for fully automated nasopharyngeal swab robots. The proposed framework incorporates a deep learning-based nostril detection with a cascade approach to reliably identify the nostrils with high accuracy in real time. In addition, a partitioned visual servoing scheme that combines image-based visual servoing with axial control is formulated for accurately positioning the sampling swabs at the nostril with a multi-DOF robot arm. As the visual servoing is designed to minimize an error between the detected nostril and the swab, it can compensate for potential errors in real operation, such as positioning error due to inaccurate camera-robot calibration and kinematic error due to unavoidable swab deflection. The performance of the visual servo control was tested on a head phantom model for 30 unused swabs, and then compared with a method referring only to the 3D nostril target for control. Consequently, the swabs reached the nostril target with an average error of 1.2±0.5 mm and a maximum error of 2.0 mm via the visual servo control, while the operation without visual feedback yielded an average error of 10.6±2.3 mm and a maximum error of 16.2 mm. The partitioned visual servoing allows the swab to rapidly converge to the nostril target within 1.0 s without control instability. Finally, swab placement at the nostril, within the entire procedure of a fully automated NP swab, was successfully demonstrated on a human subject via the visual servo control.
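As a rough illustration of the partitioned scheme described above, the following sketch separates a proportional image-based correction for the lateral axes from an independent axial (approach) control term; the gains, units, and interfaces are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def partitioned_servo_step(nostril_px, swab_px, depth_error, k_image=0.002, k_axial=0.5):
    """One illustrative control step of a partitioned visual servoing scheme.

    nostril_px, swab_px : (u, v) pixel coordinates of the detected nostril and swab tip
    depth_error         : remaining distance to the nostril along the approach axis [m]
    Returns a commanded end-effector velocity (vx, vy, vz) in the camera frame.
    """
    pixel_error = np.asarray(nostril_px, float) - np.asarray(swab_px, float)
    v_lateral = k_image * pixel_error   # image-based term: drive the swab tip onto the nostril
    v_axial = k_axial * depth_error     # axial term: advance along the approach axis independently
    return np.array([v_lateral[0], v_lateral[1], v_axial])
```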
|
|
14:40-14:50, Paper MoB-7.4 | |
Development of a Cable-Driven Growing Sling to Assist Patient Transfer |
|
Lee, MyungJoong | Korea Institute of Science and Technology, University of Science |
Moon, Yonghwan | Korea Institute of Science and Technology |
Kim, Jeongryul | Korea Institute of Science and Technology |
Lee, Seungjun | Korea Institute of Science and Technology |
Kim, Keri | Korea Institute of Science and Technology |
In, HyunKi | Korea Institute of Science and Technology |
Keywords: Medical Robots and Systems, Physically Assistive Devices, Tendon/Wire Mechanism
Abstract: As the aging of society continues to accelerate, the number of elderly patients is increasing, as is the demand for manpower to care for them. In particular, there is an urgent need for bedridden patient care. However, limitations in the supply of human resources have caused an increase in the burden for care. In particular, nursing personnel often experience inconvenience and difficulties owing to the great deal of effort required to transfer a patient from bed to wheelchair, or vice versa. The most difficult process during the patient transfer is inserting the sling under the patient. Aiming to solve this problem, a mechanical Growing Sling was devised. The proposed sling adapts a growing mechanism comprising a low-friction fabric and steel shafts, and the sling is inserted under the patient by towing the steel shafts with cables connected to a motor. For the comfort and safety of the sling insertion, the required towing force was analyzed to find the minimum diameter of the shaft. The results from experimental evaluations using the proposed sling verified that it can be inserted under the patient without moving the patient, and with an acceptable level of pressure being applied to the patient.
|
|
14:50-15:00, Paper MoB-7.5 | |
Development of an Inherently Safe Nasopharyngeal Swab Sampling Robot Using a Force Restriction Mechanism |
|
Maeng, Chan-Young | Korea University of Technology and Education (KOREATECH) |
Yoon, JongJun | Koreatech |
Kim, Do-yun | KOREATECH University |
Lee, Jongwon | Korea Institute of Science and Technology |
Kim, Yong-Jae | Korea University of Technology and Education |
Keywords: Medical Robots and Systems, Mechanism Design, Compliant Joints and Mechanisms
Abstract: The demand for autonomous nasopharyngeal swab sampling robot systems is increasing owing to the recent outbreak of the respiratory virus pandemic. To protect the medical staff from infection, automatic upper respiratory sample collecting robotic systems are needed. The nasopharyngeal swab sampling robot proposed in this study is composed of a unique force restriction mechanism, a precise 3-axis force sensor, and a compact remote center of motion (RCM) mechanism, which are designed to enhance safety and efficiency. The proposed force restriction mechanism has a constant repulsive force finely adjustable using specially designed leaf spring structures. The force sensor is a capacitance-type 3-axis force sensor based on flexure mechanisms. Owing to the RCM mechanism, the distal end of the swab remains stationary. The effectiveness of the proposed robot was verified by various experiments, including restriction force measurement and a sampling success rate test.
|
|
15:00-15:10, Paper MoB-7.6 | |
On the Design of Integrated Tele-Monitoring/Operation System for Therapeutic Devices in Isolation Intensive Care Unit |
|
Song, ChangSeob | Korea Institute of Science and Technology |
Yang, Gyungtak | Korea Institute of Science and Technology |
Park, Sungwoo | Korea University, Korea Institute of Science and Technology |
Jang, Namseon | Korea Institute of Science and Technology |
Jeon, Soobin | Korea Advanced Institute of Science and Technology (KAIST) |
Oh, Sang-Rok | KIST |
Hwang, Donghyun | Korea Institute of Science and Technology |
Keywords: Telerobotics and Teleoperation, Medical Robots and Systems, Mechanism Design
Abstract: We design a central controller system (CCS) and a tele-controlled system (TCS) with the aim of developing an integrated tele-monitoring/operation system that enables the medical staff to tele-monitor the state of therapeutic devices utilized in the isolation intensive care unit (ICU) and to tele-operate their user interfaces. To achieve this aim, we first survey the medical staff for medical requirements and define the design guideline for tele-monitoring/operation functionality and field applicability. In designing the CCS, we focus on realizing a device having intuitive and user-friendly interfaces so that the medical staff can use it conveniently without pre-training. Further, we attempt to implement a TCS capable of manipulating various types of user interfaces of the therapeutic device (e.g., touch screen, buttons, and knobs) without failure. As two core components of the TCS, a precision XY-positioner having a maximum positioning error of about 0.695 mm and an end-effector having three-degrees-of-freedom motion (i.e., pressing, gripping, and rotating) are applied to the system. In the experiment conducted to assess functionality, we verified that the time taken to complete the tele-operation after logging into the CCS is less than 1 minute. Furthermore, the results of a field demonstration with a focus group show that the proposed system could be applied practically in medical settings once its functional reliability is improved.
|
|
15:10-15:20, Paper MoB-7.7 | |
Tactile Robotic Telemedicine for Safe Remote Diagnostics in Times of Corona: System Design, Feasibility and Usability Study |
|
Naceri, Abdeldjallil | TUM |
Elsner, Jean | Technical University of Munich |
Troebinger, Mario | Technical University of Munich |
Sadeghian, Hamid | Technical University of Munich |
Johannsmeier, Lars | Technical University Munich |
Voigt, Florian | Technical University of Munich |
Chen, Xiao | Technical University of Munich |
Macari, Daniela | Franka Emika |
Jaehne, Christoph | Franka Emika GmbH |
Berlet, Maximilian | TUM |
Fuchtmann, Jonas | TUM |
Figueredo, Luis Felipe Cruz | Technical University of Munich (TUM) |
Feussner, Hubertus | Klinikum Rechts Der Isar Der TUM |
Wilhelm, Dirk | Klinikum Rechts Der Isar Der TUM |
Haddadin, Sami | Technical University of Munich |
Keywords: Telerobotics and Teleoperation, Medical Robots and Systems, Human Factors and Human-in-the-Loop
Abstract: The current crisis surrounding the COVID-19 pandemic demonstrates the amount of responsibility and the workload on our healthcare system and, above all, on the medical staff around the world. In this work, we propose a promising approach to overcome this problem using robot-assisted telediagnostics, which allows medical experts to examine patients from a distance. The designed telediagnostic system consists of two robotic arms, one located at the doctor's site and one at the patient's site. Such a system enables the doctor to have a direct conversation via telepresence and to examine patients through robot-assisted inspection (guided tactile and audiovisual contact). The proposed bilateral teleoperation system is redundant in terms of teleoperation control algorithms and visual feedback. Specifically, we implemented two main control modes: joint-based and displacement-based teleoperation. The joint-based mode was implemented due to its high transparency and ease of mapping between Leader and Follower, whereas the displacement-based mode is highly flexible in terms of relative pose mapping and null-space control. Tracking tests between Leader and Follower were conducted on our system using both wired and wireless connections. Moreover, our system was tested by seven medical doctors in two experiments. User studies demonstrated the system's usability, and it was successfully validated by the medical experts.
|
|
15:20-15:30, Paper MoB-7.8 | |
Novel Supernumerary Robotic Limb Based on Variable Stiffness Actuators for Hemiplegic Patients Assistance |
|
Hasanen, Basma | Khalifa University of Science and Technology |
Awad, Mohammad I. | Khalifa University |
Boushaki, Mohamed | Scuola Superiore S'Anna Pisa |
Niu, Zhenwei | Khalifa University Robotics and Autonomous Robotics Center |
Ramadan, Mohammed | Khalifa University |
Hussain, Irfan | Khalifa University |
Keywords: Medical Robots and Systems, Compliant Joints and Mechanisms, Safety in HRI
Abstract: Loss of upper extremity motor control and function is an unremitting symptom in post-stroke patients. This imposes hardships in accomplishing their Daily-Life-Activities. Supernumerary-Robotic-Limbs (SRLs) were introduced as a solution to regain the lost degrees-of-freedom (DoFs) by introducing an independent new limb. The actuation systems in SRLs can be categorized into rigid and soft actuators. Soft actuators have proven advantageous over their rigid counterparts through their intrinsic safety, cost and energy efficiency. However, they suffer from low stiffness, which jeopardizes their accuracy. Variable-Stiffness-Actuators (VSAs) are newly developed technologies that have been proven to ensure both accuracy and safety. In this paper, we introduce a novel Supernumerary-Robotic-Limb based on Variable-Stiffness-Actuators. To the best of our knowledge, the proposed proof-of-concept SRL is the first to utilize Variable Stiffness Actuators. The developed SRL would assist post-stroke patients in bi-manual tasks, e.g. eating with fork and knife. The modeling, design and realization of the system are illustrated. The proposed SRL was evaluated and verified for its accuracy via predefined trajectories. The safety was verified through the utilization of the momentum observer for collision detection, and several post-collision reaction strategies were evaluated through the Soft-Tissue-Injury Test. The assistance process is qualitatively verified through standard user-satisfaction questionnaires.
|
|
MoB-8 |
Rm8 (Room F) |
Mechanism Design 2 |
Regular session |
Chair: Kiguchi, Kazuo | Kyushu University |
Co-Chair: Watanabe, Masahiro | Tohoku University |
|
14:10-14:20, Paper MoB-8.1 | |
Plate Harmonic Reducer with a Profiled Groove Wave Generator |
|
You, Seungbin | Seoul National University |
Jung, Jaesug | Technical University of Munich |
Sung, Eunho | Seoul National University |
Park, Jaeheung | Seoul National University |
Keywords: Mechanism Design, Actuation and Joint Mechanisms, Engineering for Robotic Systems
Abstract: In this study, a mechanism that realizes a novel structural form of the harmonic reducer is introduced. Conventional robots often use various mechanical reducers owing to the low-torque, high-speed characteristics of electric motors. Among them, harmonic reducers are frequently used because of their compact size and backlash-free precision. The plate harmonic reducer, which uses the same topological geometry and reducing mechanism as the conventional harmonic reducer, is a novel type of strain gear that changes its shape to a plate form for axial deformation. It has unique differences in terms of axial thickness, torsional stiffness, and efficiency due to its morphological characteristics. This study introduces and analyzes the reducing principle of the plate harmonic reducer and describes the methodological solutions for realization. Finally, the theoretical performance improvement and operating feasibility of the plate harmonic reducer are analyzed using the finite element method and a 3D-printed prototype model.
|
|
14:20-14:30, Paper MoB-8.2 | |
Experimental Study of the Mechanical Properties of a Spherical Parallel Link Mechanism with Arc Prismatic Pairs |
|
Saiki, Naoto | Tohoku University |
Tadakuma, Kenjiro | Tohoku University |
Watanabe, Masahiro | Tohoku University |
Abe, Kazuki | Tohoku University |
Konyo, Masashi | Tohoku University |
Tadokoro, Satoshi | Tohoku University |
Keywords: Mechanism Design, Actuation and Joint Mechanisms, Parallel Robots
Abstract: A two-degrees-of-freedom spherical parallel link mechanism (2-DOF SPM) was designed to ensure that it only has rotational degrees of freedom in two directions around a fixed center. In general, a 2-DOF SPM includes passive rotating pairs, and at least two actuators are needed to change the end-effector posture. The arrangement of the links and pairs determines the characteristics and performance of an SPM, so 2-DOF SPMs have been designed considering various requirements, such as output torque, accuracy, and space constraints for applications. To satisfy these requirements, arc prismatic pairs can be used in SPMs, and concrete configurations and design methods for such pairs have been studied. Furthermore, in order to compensate for the influence of friction on the positioning error, a control model considering friction has been proposed by constructing a feedback loop containing experimentally found parameters. However, the conventional model is not a mechanical model of friction, and is therefore not suitable for calculating the friction force or for understanding how the limit of the workspace changes under the influence of friction. In this study, we construct a mechanical friction model considering the change of the intersection angle between the input and the rail slide direction. In addition, using the friction model, we clarify the influence of friction on the workspace and on driving the SPM, toward realizing a high-performance 2-DOF SPM. First, we theoretically clarified the influence of friction on the workspace by considering the case of a slider-type differential-drive 2-DOF SPM. Second, the driving torque was experimentally measured, and the influence of friction on driving was examined.
|
|
14:30-14:40, Paper MoB-8.3 | |
Manipulator Equipped with Differential Wrist Mechanism to Combine the Torque of Two Motors to Increase Fingertip Force and Wrist Torque |
|
Bao, Yuanhao | Hiroshima University |
Takaki, Takeshi | Hiroshima University |
Keywords: Mechanism Design, Actuation and Joint Mechanisms, Multifingered Hands
Abstract: A lightweight and high-output manipulator is introduced in this study. The manipulator comprises two parts: a robot hand part and a wrist part. The total weight of the robot hand part is approximately 450 g, and its size is almost the same as that of a human hand. Furthermore, a design concept of a differential mechanism is presented in this study. In contrast to traditional mechanisms, in which the movement of a single motor corresponds to a single module, the proposed differential mechanism can superimpose the output forces of multiple motors and act on the movement of single or multiple modules. Consequently, the output force is doubled, the low-power requirement of the motor that drives the robot hand as the end effector to rotate around the wrist is met, and the driving force amplification mechanism amplifies the gripping force of the fingers. The proposed differential mechanism is incorporated in the wrist part. The robot hand part is responsible for the fast grasping of objects. The wrist part is responsible for increasing the output force of the fingers to ensure a firm grasp. In addition, it enables the robot hand to rotate around the wrist. It takes 0.32 s for the fingers to transition from moving to touching the object, and approximately 0.9 s to grasp the object firmly; the output force of the fingertips can reach 25.5 N. The robot hand can rotate 135° around the wrist, and the rotation speed is 118 deg/s.
|
|
14:40-14:50, Paper MoB-8.4 | |
Design of a Modular Continuum Robot with Alterable Compliance Using Tubular-Actuation |
|
Wang, Mingyuan | Shanghai Robotics Institute, School of Mechatronic Engineering An |
Du, Liang | Shanghai University |
Bao, Sheng | Shanghai University |
Yuan, Jianjun | Shanghai University, China |
Zhou, Jinshu | Shanghai University |
Ma, Shugen | Ritsumeikan University |
Keywords: Mechanism Design, Compliant Joints and Mechanisms, Soft Robot Applications
Abstract: Compliance is good. However, it is challenging for one compliant continuum robot to finish both high precision manipulation and environmental-adapted motions. In this paper, a modular continuum robot with the alterable compliance characteristic is proposed. Besides, an actuation module is also proposed using a tubular-screw mechanism for non-slippage transmission. Kinematic analyses and dynamic co-simulation are performed to study the continuum robot. Furthermore, two potential application scenarios of pick-and-place manipulation and confined space navigating are carried out to demonstrate the advantages of the alterable compliance design. This study presents a capable continuum robotic solution for non-structural inspection tasks, with potential for in-situ applications in restricted and hazardous environments.
|
|
14:50-15:00, Paper MoB-8.5 | |
3D-Printable Low-Reduction Cycloidal Gearing for Robotics |
|
Roozing, Wesley | University of Twente |
Roozing, Glenn | Auto Elect B.V |
Keywords: Actuation and Joint Mechanisms, Mechanism Design
Abstract: The recent trend towards low reduction gearing in robotic actuation has revitalised the need for high-performance gearing concepts. In this work we propose compact low-reduction cycloidal gearing that is 3D-printable and combined with off-the-shelf components. This approach presents an enormous potential for high performance-to-cost implementations. After discussing parameter selection and design considerations, we present a prototype that is combined with a low-cost brushless motor to demonstrate its potential. Extensive experimental results demonstrate high performance, including >40 Nm torque, low friction and play, and high impact robustness. The results show that the proposed approach can yield viable gearbox designs.
|
|
15:00-15:10, Paper MoB-8.6 | |
Fold-Based Complex Joints for a 3 DoF 3R Parallel Robot Design |
|
Merz, Judith U. | RWTH Aachen University |
Huber, Markus | Technical University of Munich |
Irlinger, Franz | Technische Universität München |
Lueth, Tim C. | Technical University of Munich |
Pfitzner, Janik | RWTH Aachen University |
Corves, Burkhard | RWTH Aachen University |
Keywords: Actuation and Joint Mechanisms, Parallel Robots, Kinematics
Abstract: This contribution demonstrates the usage of fold-based joints to create a novel 3 DoF 3R(RPaR) parallel robot design. Multiple folding mechanisms are introduced, fulfilling the function of revolute, prismatic, and spherical joints. The folding mechanisms are tested here regarding their applicability in parallel kinematic robots, taking advantage of beneficial properties such as increased stiffness, flat-foldability and compressed states, easy cleaning, and lightweight designs. The designed delta robot structure is then analysed for its motion behaviour and workspace dimensions, and validated by a 3D printed model. Further, scalability possibilities are presented.
|
|
15:10-15:20, Paper MoB-8.7 | |
A 4-DoF Parallel Robot with a Built-In Gripper for Waste Sorting |
|
Leveziel, Maxence | FEMTO-ST Institute |
Laurent, Guillaume J. | Univ. Bourgogne Franche-Comté, ENSMM |
Haouas, Wissem | Femto-St |
Gauthier, Michael | FEMTO-ST Institute |
Dahmouche, Redwan | Université De Franche Comté |
Keywords: Mechanism Design, Parallel Robots, Industrial Robots
Abstract: This article presents a new robot concept dedicated to fast and energy-efficient waste sorting applications. This parallel robot can provide at the same time the three translations in space (3-DoF) and the opening/closing of a built-in gripper (1 additional DoF). The movement of the clamp is enabled by a configurable platform at the end of the parallel structure. This platform is composed of a two-gear-train gripper which is directly controlled by the 4 actuators attached to the base of the manipulator. The inverse kinematic and differential models have been developed. A first prototype has been realized to validate this new parallel architecture for pick-and-toss tasks.
|
|
15:20-15:30, Paper MoB-8.8 | |
Adjustable Lever Mechanism with Double Parallel Link Platforms for Robotic Limbs |
|
Nishikawa, Satoshi | Kyushu University |
Tokunaga, Daigo | Kyushu University |
Kiguchi, Kazuo | Kyushu University |
Keywords: Mechanism Design, Parallel Robots
Abstract: For universal robotic limbs, having a large workspace with high stiffness and adjustable output properties is important to adapt to various situations. A combination of parallel mechanisms that can change output characteristics is promising to meet these demands. As such, we propose a lever mechanism with double parallel link platforms. This mechanism is composed of a lever mechanism with an effort point and a pivot point, each supported by a parallel link mechanism. First, we calculated the differential kinematics of this mechanism. Next, we investigated the workspace of the mechanism. Thanks to the three-dimensionally movable effort point, the proposed mechanism can reach positions nearer than the posture in which the actuators are most contracted. Then, we confirmed that this mechanism could change the output force profile at the end-effector by changing the lever ratio. The main change is the directional change of the maximum output force. The change range is larger when the squatting depth is larger. The changing tendency of the shape of the maximum output force profile with the position of the pivot plate depends on the force balance of the actuators. These analytical results show the potential of the proposed mechanism and would aid in the design of this mechanism for robotic limbs.
|
|
15:30-15:40, Paper MoB-8.9 | |
Design and Characterization of 3D Printed, Open-Source Actuators for Legged Locomotion |
|
Urs, Karthik | University of Michigan |
Enninful Adu, Challen | University of Michigan |
Rouse, Elliott | University of Michigan / (Google) X |
Moore, Talia | University of Michigan |
Keywords: Mechanism Design, Methods and Tools for Robot System Design
Abstract: Impressive animal locomotion capabilities are mediated by the co-evolution of the skeletal morphology and muscular properties. Legged robot performance would also likely benefit from the co-optimization of actuators and leg morphology. However, development of custom actuators for legged robots is expensive and time-consuming, discouraging application-specific actuator optimization. This paper presents open-source designs for two quasi-direct-drive actuators with performance regimes appropriate for an 8-15 kg robot, built from off-the-shelf and 3D-printed components for less than 200 USD each. The mechanical, electrical, and thermal properties of each actuator are characterized and compared to benchmark data. Actuators subjected to 420k strides of gait data experienced only a 2% reduction in efficiency and 26 mrad of backlash growth, demonstrating viability for rigorous and sustained research applications. We present a thermal solution that nearly doubles the thermally-driven torque limits of our plastic actuator design. The performance results are comparable to traditional metallic actuators for use in high-speed legged robots of the same scale. These 3D printed designs demonstrate an approach for designing and characterizing low-cost, highly customizable and reproducible actuators, democratizing the field of actuator design and enabling co-design and optimization of actuators and robot legs.
|
|
MoB-9 |
Rm9 (Room G) |
Object Detection, Segmentation and Categorization 2 |
Regular session |
Chair: Haddadin, Sami | Technical University of Munich |
Co-Chair: Ani, Mohammad | University of Birmingham |
|
14:10-14:20, Paper MoB-9.1 | |
Real-Time IMU-Based Learning: A Classification of Contact Materials |
|
Valle, Carlos Magno | Technical University of Munich |
Kurdas, Alexander Andreas | Technical University of Munich |
Pozo Fortunić, Edmundo | Technische Universität München |
Abdolshah, Saeed | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Keywords: Object Detection, Segmentation and Categorization, AI-Enabled Robotics
Abstract: In modern highly dynamic robot manipulation, collisions between a robot and objects may be intentionally executed to improve performance. To distinguish between these deliberate contacts and accidental collisions beyond the limit of state-of-the-art human-robot interactions, new sensing approaches are required. This work seeks an easy-to-implement and real-time capable solution to detect the identity of the impacted material. We developed an inertial measurement unit (IMU) based setup that records vibration signals occurring after collisions. Furthermore, a data-set was generated in an unsupervised learning manner using the measurements of collision experiments with several materials commonly used in realistic applications. The data-set was used to train an artificial neural network to classify the type of material involved. Our results show that the neural net detects collisions and a detailed distinction between materials is achieved, even when estimating different human body parts. The unsupervised data-set generation allows for a simple integration of new classes, which provides broader applicability of our approach. As the calculations run faster than the control cycle of the robot, the output of our classifier can be used in real-time to decide on the robot's reaction behavior.
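A minimal sketch of the inference side of such a pipeline is given below, assuming a pretrained classifier is available as a plain Python callable; the thresholding, spectral features, and parameter values are illustrative assumptions, not the authors' trained network.

```python
import numpy as np

def classify_contact(imu_window, classifier, impact_threshold=2.0):
    """Illustrative post-collision material classification from an IMU window.

    imu_window : (N, 3) array of accelerometer samples recorded after a contact
    classifier : any callable mapping a feature vector to per-class scores (assumed pretrained)
    Returns the index of the predicted material class, or None if no impact-like
    transient is present in the window.
    """
    energy = np.linalg.norm(imu_window, axis=1)
    if energy.max() < impact_threshold:              # no collision transient detected
        return None
    spectrum = np.abs(np.fft.rfft(energy - energy.mean()))
    features = spectrum / max(spectrum.max(), 1e-8)  # simple normalized vibration spectrum
    return int(np.argmax(classifier(features)))
```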
|
|
14:20-14:30, Paper MoB-9.2 | |
New Objects on the Road? No Problem, We’ll Learn Them Too |
|
Singh, Deepak Kumar | IIITH |
Rai, Shyam Nandan | IIIT Hyderabad |
K J, Joseph | Indian Institute of Technology Hyderabad |
Saluja, Rohit | IIIT Hyderabad |
Balasubramanian, Vineeth | Indian Institute of Technology, Hyderabad |
Arora, Chetan | Indian Institute of Technology, Delhi |
Subramanian, Anbumani | Intel |
Jawahar, C.V. | IIIT, Hyderabad |
Keywords: Object Detection, Segmentation and Categorization, Autonomous Vehicle Navigation, Vision-Based Navigation
Abstract: Object detection plays an essential role in providing localization, path planning, and decision making capabilities in autonomous navigation systems. However, existing object detection models are trained and tested on a fixed number of known classes. This setting makes it difficult for the object detection model to generalize well in real-world road scenarios when encountering an unknown object. We address this problem by introducing our framework that handles the issue of unknown object detection and updates the model when unknown object labels are available. Next, our solution includes three major components that address the inherent problems present in road scene datasets. The novel components are a) Feature-Mix, which improves unknown object detection by widening the gap between known and unknown classes in latent feature space, b) Focal regression loss, which handles the problem of small object detection and intra-class scale variation, and c) Curriculum learning, which further enhances the detection of small objects. We use the Indian Driving Dataset (IDD) and the Berkeley Deep Drive (BDD) dataset for evaluation. Our solution provides state-of-the-art performance on open-world evaluation metrics. We hope this work will create new directions for open-world object detection for road scenes, making autonomous systems more reliable and robust.
|
|
14:30-14:40, Paper MoB-9.3 | |
Conditional Patch-Based Domain Randomization: Improving Texture Domain Randomization Using Natural Image Patches |
|
Ani, Mohammad | University of Birmingham |
Basevi, Hector | University of Birmingham |
Leonardis, Ales | University of Birmingham |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Data Sets for Robotic Vision
Abstract: Using Domain Randomized synthetic data for training deep learning systems is a promising approach for addressing the data and the labeling requirements for supervised techniques to bridge the gap between simulation and the real world. We propose a novel approach for generating and applying class-specific Domain Randomization textures by using randomly cropped image patches from real-world data. In evaluation against the current Domain Randomization texture application techniques, our approach outperforms the highest performing technique by 4.94 AP and 6.71 AP when solving object detection and semantic segmentation tasks on the YCB-M real-world robotics dataset. Our approach is a fast and inexpensive way of generating Domain Randomized textures while avoiding the need to handcraft texture distributions currently being used.
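A minimal sketch of the patch-based texture generation idea is given below, assuming a pool of real-world images is available as NumPy arrays; the patch size, tiling step, and function names are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np

def sample_class_texture(real_images, rng, patch_size=64, out_size=(256, 256)):
    """Illustrative class-specific texture from a random real-world image patch.

    real_images : list of (H, W, 3) uint8 arrays of real-world photographs
    rng         : a numpy Generator, e.g. np.random.default_rng(0)
    Returns an (out_size[0], out_size[1], 3) texture built by tiling one random patch.
    """
    img = real_images[rng.integers(len(real_images))]
    h, w = img.shape[:2]
    y = rng.integers(0, h - patch_size + 1)
    x = rng.integers(0, w - patch_size + 1)
    patch = img[y:y + patch_size, x:x + patch_size]
    reps = (out_size[0] // patch_size + 1, out_size[1] // patch_size + 1, 1)
    return np.tile(patch, reps)[:out_size[0], :out_size[1]]
```

In use, one such texture would be sampled per object class and applied to all synthetic instances of that class during rendering, rather than handcrafting a texture distribution.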
|
|
14:40-14:50, Paper MoB-9.4 | |
CloudAttention: Efficient Multi-Scale Attention Scheme for 3D Point Cloud Learning |
|
Saleh, Mahdi | Technical University Munich |
Wang, Yige | Technical University of Munich |
Navab, Nassir | TU Munich |
Busam, Benjamin | Technical University of Munich |
Tombari, Federico | Technische Universität München |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, RGB-D Perception
Abstract: Processing 3D data efficiently has always been a challenge. Spatial operations on large-scale point clouds, stored as sparse data, require extra cost. Attracted by the success of transformers, researchers are using multi-head attention for vision tasks. However, attention calculations in transformers come with quadratic complexity in the number of inputs and miss spatial intuition on sets like point clouds. We redesign set transformers in this work and incorporate them into a hierarchical framework for shape classification and part and scene segmentation. We propose our local attention unit, which captures features in a spatial neighborhood. We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration. Finally, to mitigate the non-heterogeneity of point clouds, we propose an efficient Multi-Scale Tokenization (MST), which extracts scale-invariant tokens for attention operations. The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods while requiring significantly fewer computations. Our proposed architecture predicts segmentation labels with around half the latency and parameter count of the previous most efficient method with comparable performance. The code is available at https://github.com/YigeWang-WHU/CloudAttention.
|
|
14:50-15:00, Paper MoB-9.5 | |
Fast Hierarchical Learning for Few-Shot Object Detection |
|
She, Yihang | ETH Zurich |
Bhat, Goutam | ETH |
Danelljan, Martin | ETH Zurich |
Yu, Fisher | ETH Zurich |
Keywords: Object Detection, Segmentation and Categorization, Incremental Learning
Abstract: Transfer learning based approaches have recently achieved promising results on the few-shot detection task. These approaches, however, suffer from the "catastrophic forgetting" issue due to finetuning of the base detector, leading to sub-optimal performance on the base classes. Furthermore, the slow convergence rate of stochastic gradient descent (SGD) results in high latency and consequently restricts real-time applications. We tackle the aforementioned issues in this work. We pose few-shot detection as a hierarchical learning problem, where the novel classes are treated as the child classes of existing base classes and the background class. The detection heads for the novel classes are then trained using a specialized optimization strategy, leading to significantly lower training times compared to SGD. Our approach obtains competitive novel class performance on the few-shot MS-COCO benchmark, while completely retaining the performance of the initial model on the base classes. We further demonstrate the application of our approach to a new class-refined few-shot detection task.
|
|
15:00-15:10, Paper MoB-9.6 | |
Improving Single-View Mesh Reconstruction for Unseen Categories Via Primitive-Based Representation and Mesh Augmentation |
|
Kuo, Yu-Liang | National Yang Ming Chiao Tung University |
Ko, Wei Jan | National Chiao Tung University |
Chiu, Chen-Yi | National Yang Ming Chiao Tung University |
Chiu, Wei-Chen | National Chiao Tung University |
Keywords: Object Detection, Segmentation and Categorization, Representation Learning, Deep Learning for Visual Perception
Abstract: As most existing works on single-view 3D reconstruction aim at learning better mapping functions to directly transform the 2D observation into the corresponding 3D shape for achieving state-of-the-art performance, there often comes a potential concern of having an implicit bias towards the seen classes learnt in their models (i.e. reconstruction intertwined with classification), thus leading to poor generalizability for unseen object categories. Moreover, such implicit bias typically stems from adopting the object-centered coordinate in their model designs, in which the reconstructed 3D shapes of the same class are all aligned to the same canonical pose regardless of different view-angles in the 2D observations. To this end, we propose an end-to-end framework to reconstruct the 3D mesh from a single image, where the reconstructed mesh is not only view-centered (i.e. its 3D pose respects the viewpoint of the 2D observation) but also preliminarily represented as a composition of volumetric 3D primitives before being further deformed into the fine-grained mesh to capture the shape details. In particular, the usage of volumetric primitives is motivated by the assumption that there generally exist some similar shape parts shared across various object categories; learning to estimate the primitive-based 3D model thus becomes more generalizable to unseen categories. Furthermore, we propose a novel mesh augmentation strategy, CvxRearrangement, to enrich the distribution of training shapes, which contributes to increasing the robustness of our proposed model and achieves better generalization. Extensive experiments demonstrate that our proposed method provides superior performance on both unseen and seen classes in comparison to several representative baselines of single-view 3D reconstruction.
|
|
15:10-15:20, Paper MoB-9.7 | |
Instance Segmentation with Cross-Modal Consistency |
|
Zhu, Alex Zihao | Waymo LLC |
Casser, Vincent Michael | Waymo LLC |
Mahjourian, Reza | Waymo |
Kretzschmar, Henrik | Waymo |
Pirk, Soren | Robotics at Google |
Keywords: Object Detection, Segmentation and Categorization, Visual Learning, Deep Learning Methods
Abstract: Segmenting object instances is a key task in machine perception, with safety-critical applications in robotics and autonomous driving. We introduce a novel approach to instance segmentation that jointly leverages measurements from multiple sensor modalities, such as cameras and LiDAR. Our method learns to predict embeddings for each pixel or point that give rise to a dense segmentation of the scene. Specifically, our technique applies contrastive learning to points in the scene both across sensor modalities and the temporal domain. We demonstrate that this formulation encourages the models to learn embeddings that are invariant to viewpoint variations and consistent across sensor modalities. We further demonstrate that the embeddings are stable over time as objects move around the scene. This not only provides stable instance masks, but can also provide valuable signals to downstream tasks, such as object tracking. We evaluate our method on the Cityscapes and KITTI-360 datasets. We further conduct a number of ablation studies, demonstrating benefits when applying additional inputs for the contrastive loss.
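The contrastive formulation described in the abstract can be illustrated with a standard InfoNCE-style loss between per-point embeddings from two modalities; the PyTorch sketch below assumes row-aligned camera and LiDAR embeddings and a fixed temperature, both of which are assumptions rather than details taken from the paper.

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(cam_emb, lidar_emb, temperature=0.1):
    """Illustrative InfoNCE-style loss between embeddings of corresponding points.

    cam_emb, lidar_emb : (N, D) tensors where row i of each tensor describes the
    same physical point as seen by the camera branch and the LiDAR branch.
    """
    cam = F.normalize(cam_emb, dim=1)
    lidar = F.normalize(lidar_emb, dim=1)
    logits = cam @ lidar.t() / temperature                 # (N, N) cross-modal similarities
    targets = torch.arange(cam.size(0), device=cam.device)
    # Matching pairs (the diagonal) should score higher than all mismatched pairs,
    # symmetrized over both directions of the cross-modal comparison.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```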
|
|
15:20-15:30, Paper MoB-9.8 | |
Early Recall, Late Precision: Multi-Robot Semantic Object Mapping under Operational Constraints in Perceptually-Degraded Environments |
|
Lei, Xianmei | NASA JPL |
Taeyeon, Kim | Korea Advanced Institute of Science and Technology |
Marchal, Nicolas Paul | ETH Zurich |
Pastor, Daniel | Caltech |
Ridge, Barry | NASA Jet Propulsion Laboratory, California Institute of Technolo |
Terry, Edward | NASA JPL |
Touma, Thomas | Caltech |
Chavez, Fernando | Jet Propulsion Laboratory |
Otsu, Kyohei | California Institute of Technology |
Agha-mohammadi, Ali-akbar | NASA-JPL, Caltech |
Scholler, Frederik | Technical University of Denmark |
Morrell, Benjamin | Jet Propulsion Laboratory, California Institute of Technology |
Keywords: Object Detection, Segmentation and Categorization
Abstract: Semantic object mapping in uncertain, perceptually degraded environments during long-range multi-robot autonomous exploration tasks such as search-and-rescue is important and challenging. During such missions, high recall is desirable to avoid missing true target objects and high precision is also critical to avoid wasting valuable operational time on false positives. Given recent advancements in visual perception algorithms, the former is largely solvable autonomously, but the latter is difficult to address without the supervision of a human operator. However, operational constraints such as mission time, computational requirements and mesh network bandwidth can make the operator's task infeasible unless properly managed. We propose the Early Recall, Late Precision (EaRLaP) semantic object mapping pipeline to solve this problem. EaRLaP was used by Team CoSTAR in DARPA Subterranean Challenge, where it successfully detected all the artifacts encountered by the team of robots. We will discuss these results and the performance of the EaRLaP on various datasets.
|
|
15:30-15:40, Paper MoB-9.9 | |
Sparse PointPillars: Maintaining and Exploiting Input Sparsity to Improve Runtime on Embedded Systems |
|
Vedder, Kyle | University of Pennsylvania |
Eaton, Eric | University of Pennsylvania |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Deep Learning Methods
Abstract: Bird's Eye View (BEV) is a popular representation for processing 3D point clouds, and by its nature is fundamentally sparse. Motivated by the computational limitations of mobile robot platforms, we create a fast, high-performance BEV 3D object detector that maintains and exploits this input sparsity to decrease runtimes over non-sparse baselines and avoids the tradeoff between pseudoimage area and runtime. We present results on KITTI, a canonical 3D detection dataset, and Matterport-Chair, a novel Matterport3D-derived chair detection dataset from scenes in real furnished homes. We evaluate runtime characteristics using a desktop GPU, an embedded ML accelerator, and a robot CPU, demonstrating that our method results in significant detection speedups (2X or more) for embedded systems with only a modest decrease in detection quality. Our work represents a new approach for practitioners to optimize models for embedded systems by maintaining and exploiting input sparsity throughout their entire pipeline to reduce runtime and resource usage while preserving detection performance. All models, weights, experimental configurations, and datasets used are publicly available at https://vedder.io/sparse_point_pillars.
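To illustrate what maintaining input sparsity can mean in a pillar-based BEV pipeline, the sketch below keeps only occupied pillars together with their grid coordinates instead of scattering features into a dense pseudoimage; the data layout and parameter values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def pillarize_sparse(points, pillar_size=0.16, max_pts_per_pillar=32):
    """Illustrative sparse pillarization: only occupied BEV pillars are kept.

    points : (N, 3) array of LiDAR points (x, y, z)
    Returns (coords, pillars): integer (row, col) indices of occupied pillars and
    a per-pillar buffer of points, instead of a dense BEV pseudoimage.
    """
    cols = np.floor(points[:, 0] / pillar_size).astype(int)
    rows = np.floor(points[:, 1] / pillar_size).astype(int)
    buckets = {}
    for p, r, c in zip(points, rows, cols):
        buckets.setdefault((r, c), []).append(p)
    coords = np.array(list(buckets.keys()), dtype=int)
    pillars = [np.array(v[:max_pts_per_pillar]) for v in buckets.values()]
    return coords, pillars
```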
|
|
MoB-10 |
Rm10 (Room H) |
Force and Tactile Sensing |
Regular session |
Chair: Santos, Veronica J. | University of California, Los Angeles |
Co-Chair: Lepora, Nathan | University of Bristol |
|
14:10-14:20, Paper MoB-10.1 | |
Pose-Based Tactile Servoing: Controlled Soft Touch Using Deep Learning (I) |
|
Lepora, Nathan | University of Bristol |
Lloyd, John | University of Bristol |
Keywords: Force and Tactile Sensing
Abstract: This article describes a new way of controlling robots using soft tactile sensors: pose-based tactile servo (PBTS) control. The basic idea is to embed a tactile perception model for estimating the sensor pose within a servo control loop that is applied to local object features, such as edges and surfaces. PBTS control is implemented with a soft, curved optical tactile sensor (the BRL TacTip) using a convolutional neural network trained to be insensitive to shear. As a consequence, robust and accurate controlled motion over various complex 3D objects is attained.
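A minimal sketch of one pose-based tactile servo step is given below, assuming the trained perception model is available as a callable that maps a tactile image to a pose estimate; the pose parameterization and gain are illustrative assumptions, not the authors' controller.

```python
import numpy as np

def pbts_step(tactile_image, pose_model, target_pose, gain=0.5):
    """One illustrative pose-based tactile servoing step.

    pose_model  : callable mapping a tactile image to the estimated sensor pose
                  relative to the local object feature (e.g. contact depth and angle)
    target_pose : desired sensor pose relative to the feature, same parameterization
    Returns a proportional correction to be sent to the robot's Cartesian controller.
    """
    estimated_pose = np.asarray(pose_model(tactile_image), float)
    pose_error = np.asarray(target_pose, float) - estimated_pose
    return gain * pose_error
```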
|
|
14:20-14:30, Paper MoB-10.2 | |
Mapping Mid-Air Haptics with a Low-Cost Tactile Robot |
|
Alakhawand, Noor | University of Bristol |
Frier, William | Ultraleap |
Lepora, Nathan | University of Bristol |
Keywords: Force and Tactile Sensing, Haptics and Haptic Interfaces
Abstract: Mid-air haptics create a new mode of feedback to allow people to feel tactile sensations in the air. Ultrasonic arrays focus acoustic radiation pressure in space, to induce tactile sensation from the resulting skin deflection. In this work, we present a low-cost tactile robot to test mid-air haptics. By combining a desktop robot arm with a 3D-printed biomimetic tactile sensor, we developed a system that can sense, map, and visualize mid-air haptic sensations created by an ultrasonic transducer array. We evaluate our tactile robot by testing it on a variety of mid-air haptic stimuli, including unmodulated and modulated focal points that create a range of haptic shapes. We compare the mapping of the stimuli to another method used to test mid-air haptics: Laser Doppler Vibrometry, highlighting the advantages of the tactile robot including far lower cost, a small lightweight form-factor, and ease-of-use. Overall, these findings indicate our method has multiple benefits for sensing mid-air haptics and opens up new possibilities for expanding the testing to better emulate human haptic perception.
|
|
14:30-14:40, Paper MoB-10.3 | |
Autonomous Tactile Localization and Mapping of Objects Buried in Granular Materials |
|
Jia, Shengxin | University of California, Los Angeles |
Zhang, Lionel | University of California, Los Angeles |
Santos, Veronica J. | University of California, Los Angeles |
Keywords: Force and Tactile Sensing, Mapping, Contact Modeling
Abstract: Robots are expected to operate autonomously in unstructured, real-world environments, for tasks such as locating buried objects in search and rescue applications. When robots operate within opaque granular materials, tactile and proprioceptive feedback can be more informative than visual feedback. However, since tactile measurements are local and sparse, it can be difficult to efficiently build a global, tactile-based model of a search area. In this work, we developed a framework for tactile perception, mapping, and haptic exploration for the autonomous localization of objects buried in granular materials. Haptic exploration was performed within a densely packed sand mixture using a sensor model that accounts for granular material characteristics and aids in the interpretation of interaction forces between the robot and its environment. The haptic exploration strategy was designed to efficiently locate a buried object and refine its outline while simultaneously minimizing potentially damaging physical interactions with the buried object. Coverage path planning techniques were used to select haptic exploration movements from candidates that aimed to reduce map uncertainty. A continuous occupancy map was generated that fused local, sparse tactile information into a global Bayesian Hilbert Map. We demonstrated our framework in simulation and with a real robot with granular materials.
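The paper fuses tactile measurements into a continuous Bayesian Hilbert Map; as a much simpler stand-in that conveys the flavor of fusing sparse probes into a global map, the sketch below applies a plain log-odds occupancy update on a grid. The cell indexing and log-odds increments are illustrative assumptions.

```python
import numpy as np

def update_occupancy(log_odds_map, probed_cells, contact, l_occ=0.85, l_free=-0.4):
    """Greatly simplified log-odds update from one tactile probe (not a Bayesian Hilbert Map).

    log_odds_map : 2D array of per-cell log-odds over the search area
    probed_cells : ordered list of (i, j) grid cells swept by the probe
    contact      : True if the probe ended in contact with a buried object
    """
    for n, (i, j) in enumerate(probed_cells):
        is_last = (n == len(probed_cells) - 1)
        # The final cell is evidence of occupancy only if contact occurred;
        # every other swept cell is evidence of free granular material.
        log_odds_map[i, j] += l_occ if (contact and is_last) else l_free
    return log_odds_map
```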
|
|
14:40-14:50, Paper MoB-10.4 | |
Tactile Gym 2.0: Sim-To-Real Deep Reinforcement Learning for Comparing Low-Cost High-Resolution Robot Touch |
|
Lin, Yijiong | University of Bristol |
Lloyd, John | University of Bristol |
Church, Alex | University of Bristol |
Lepora, Nathan | University of Bristol |
Keywords: Force and Tactile Sensing, Reinforcement Learning
Abstract: High-resolution optical tactile sensors are increasingly used in robotic learning environments due to their ability to capture large amounts of data directly relating to agent-environment interaction. However, there is a high barrier of entry to research in this area due to the high cost of tactile robot platforms, specialised simulation software, and sim-to-real methods that lack generality across different sensors. In this letter we extend the Tactile Gym simulator to include three new optical tactile sensors (TacTip, DIGIT and DigiTac) of the two most popular types, Gelsight-style (image-shading based) and TacTip-style (marker based). We demonstrate that a single sim-to-real approach can be used with these three different sensors to achieve strong real-world performance despite the significant differences between real tactile images. Additionally, we lower the barrier of entry to the proposed tasks by adapting them to an inexpensive 4-DoF robot arm, further enabling the dissemination of this benchmark. We validate the extended environment on three physically-interactive tasks requiring a sense of touch: object pushing, edge following and surface following. The results of our experimental validation highlight some differences between these sensors, which may help future researchers select and customize the physical characteristics of tactile sensors for different manipulation scenarios. Code and videos are available at https://sites.google.com/my.bristol.ac.uk/tactilegym2.
|
|
14:50-15:00, Paper MoB-10.5 | |
Bioinspired, Multifunctional, Active Whisker Sensors for Tactile Sensing of Mobile Robots |
|
Yu, Zhiqiang | Beijing Institute of Technology |
Guo, Yue | Beijing Institute of Technology |
Su, Jiaji | Case Western Reserve University |
Huang, Qiang | Beijing Institute of Technology |
Fukuda, Toshio | Beijing Institute of Technology |
Cao, Changyong (Chase) | Case Western Reserve University |
Shi, Qing | Beijing Institute of Technology |
Keywords: Force and Tactile Sensing, Sensor-based Control, Collision Avoidance
Abstract: Whiskers of some animals, such as rats and cats, can actively sense stimuli from their surrounding environment. Such a capability is attractive for intelligent mobile robots. However, an artificial whisker with similar abilities, one that would allow robots to acquire information about their surroundings in an active manner as rats do, has not been fully developed. In this paper, we propose a new bioinspired active whisking tactile sensor (MAWS) capable of sensing the distance, shape, size, and orientation of caves and environmental conditions. Two orthogonally distributed linear Hall sensors are mounted beneath a circular permanent magnet for spatial localization of the whisker. The whisker is then actuated and controlled by an array of nine electromagnetic coils by tuning the excitation current and phase sequence. Conical pendulum and bidirectional sweeping strategies were designed to mimic the simultaneous perception behavior of rats. A reactive obstacle avoidance experiment was also conducted to evaluate the performance of the proposed MAWS installed on a mobile robot.
|
|
15:00-15:10, Paper MoB-10.6 | |
Deep Active Cross-Modal Visuo-Tactile Transfer Learning for Robotic Object Recognition |
|
Murali, Prajval Kumar | BMW Group and University of Glasgow |
Wang, Cong | Technical University of Munich |
Lee, Dongheui | Technische Universität Wien (TU Wien) |
Dahiya, Ravinder | University of Glasgow |
Kaboli, Mohsen | BMW Group and Radboud University, Donders Institute for Brain An |
Keywords: Force and Tactile Sensing, Transfer Learning, Recognition
Abstract: We propose, for the first time, a novel deep active visuo-tactile cross-modal full-fledged framework for object recognition by autonomous robotic systems. Our proposed network xAVTNet is actively trained with labelled point clouds from a vision sensor with one robot and tested with an active tactile perception strategy to recognise objects never touched before using another robot. We propose a novel visuo-tactile loss (VTLoss) to minimise the discrepancy between the visual and tactile domains for unsupervised domain adaptation. Our framework leverages the strengths of deep neural networks for cross-modal recognition along with active perception and active learning strategies for increased efficiency by minimising redundant data collection. Our method is extensively evaluated on a real robotic system and compared against baselines and other state-of-the-art approaches. We demonstrate clear outperformance in recognition accuracy compared to the state-of-the-art visuo-tactile cross-modal recognition method.
|
|
15:10-15:20, Paper MoB-10.7 | |
DigiTac: A DIGIT-TacTip Hybrid Tactile Sensor for Comparing Low-Cost High-Resolution Robot Touch |
|
Lepora, Nathan | University of Bristol |
Lin, Yijiong | University of Bristol |
Money-Coomes, Ben | University of Bristol |
Lloyd, John | University of Bristol |
Keywords: Force and Tactile Sensing
Abstract: Deep learning combined with high-resolution tactile sensing could lead to highly capable dexterous robots. However, progress is slow because of the specialist equipment and expertise. The DIGIT tactile sensor offers low-cost entry to high-resolution touch using GelSight-type sensors. Here we customize the DIGIT to have a 3D-printed sensing surface based on the TacTip family of soft biomimetic optical tactile sensors. The DIGIT-TacTip (DigiTac) enables direct comparison between these distinct tactile sensor types. For this comparison, we introduce a tactile robot system comprising a desktop arm, mounts and 3D-printed test objects. We use tactile servo control with a PoseNet deep learning model to compare the DIGIT, DigiTac and TacTip for edge- and surface-following over 3D-shapes. All three sensors performed similarly at pose prediction, but their constructions led to differing performances at servo control, offering guidance for researchers selecting or innovating tactile sensors. All hardware and software for reproducing this study will be openly released.
|
|
15:20-15:30, Paper MoB-10.8 | |
Semi-Supervised Disentanglement of Tactile Contact Geometry from Sliding-Induced Shear |
|
Gupta, Anupam K. | University of Bristol |
Church, Alex | University of Bristol |
Lepora, Nathan | University of Bristol |
Keywords: Force and Tactile Sensing
Abstract: The sense of touch is fundamental to human dexterity. When mimicked in robotic touch, particularly by use of soft optical tactile sensors, it suffers from distortion due to motion-dependent shear. This complicates tactile tasks like shape reconstruction and exploration that require information about contact geometry. In this work, we pursue a semi-supervised approach to remove shear while preserving contact-only information. We validate our approach by showing a match between the model-generated unsheared images and their counterparts from vertically tapping onto the object. The model-generated unsheared images give faithful reconstruction of contact geometry otherwise masked by shear, along with robust estimation of object pose then used for sliding exploration and full reconstruction of several planar shapes. We show that our semi-supervised approach achieves comparable performance to its fully supervised counterpart across all validation tasks with an order of magnitude less supervision. The semi-supervised method is thus more computationally efficient and more label-efficient. We expect it will have broad applicability to a wide range of complex tactile exploration and manipulation tasks performed via a shear-sensitive sense of touch.
|
|
15:30-15:40, Paper MoB-10.9 | |
Multi-Purpose Tactile Perception Based on Deep Learning in a New Tendon-Driven Optical Tactile Sensor |
|
Zhao, Zhou | EPITA Research and Development Laboratory (LRDE) |
Lu, Zhenyu | Bristol Robotics Laboratory |
Keywords: Force and Tactile Sensing, Deep Learning Methods, Tendon/Wire Mechanism
Abstract: In this paper, we create a new tendon-connected multi-functional optical tactile sensor, MechTac, for object perception in the field of view (TacTip) and location of touching points in the blind area of vision (TacSide). In a multi-point touch task, the information from the TacSide and the TacTip overlaps and jointly affects the distribution of papillae pins on the TacTip. Since the effects of the TacSide are much less obvious than those on the TacTip, a perceiving-out-of-view neural network (O2VNet) is created to separate the mixed information with unequal influence. To reduce the dependence of the O2VNet on the grayscale information of the image, we create one new binarized convolutional (BConv) layer in front of the backbone of the O2VNet. The O2VNet can not only achieve real-time temporal sequence prediction (34 ms per image), but also attain an average classification accuracy of 99.06%. The experimental results show that the O2VNet can maintain a high classification accuracy even under image contrast changes.
|
|
MoB-11 |
Rm11 (Room I) |
Human-Robot Collaboration |
Regular session |
Chair: Ohara, Kenichi | Meijo University |
Co-Chair: Suzuki, Takuo | Aichi Prefectural University |
|
14:10-14:20, Paper MoB-11.1 | |
A Multi-Granularity Scene Segmentation Network for Human-Robot Collaboration Environment Perception |
|
Fan, Junming | The Hong Kong Polytechnic University |
Zheng, Pai | The Hong Kong Polytechnic University |
Lee, Carman K.M. | The Hong Kong Polytechnic University - Dept of Industrial and Sy |
Keywords: Human-Robot Collaboration, Computer Vision for Manufacturing, Deep Learning for Visual Perception
Abstract: Human-robot collaboration (HRC) has been considered as a promising paradigm towards futuristic human-centric smart manufacturing, to meet the thriving needs of mass personalization. In this context, existing robotic systems normally adopt a single-granularity semantic segmentation scheme for environment perception, which lacks the flexibility to be implemented in various HRC situations. To fill the gap, this study proposes a multi-granularity scene segmentation network. Inspired by some recent network designs, we construct an encoder network with two ConvNext-T backbones for RGB and depth respectively, and a decoder network consisting of multi-scale supervision and multi-granularity segmentation branches. The proposed model is demonstrated in a human-robot collaborative battery disassembly scenario and further evaluated in comparison with state-of-the-art RGB-D semantic segmentation methods on the NYU-Depth V2 dataset.
|
|
14:20-14:30, Paper MoB-11.2 | |
Controller Design of a Robotic Assistant for the Transport of Large and Fragile Objects |
|
Dumora, Julie | CEA (French Alternative Energies AndAtomicEnergyCommission) |
Nicolas, Julien | Commissariat à l'Energie Atomique |
Geffard, Franck | Atomic Energy Commissariat (CEA) |
Keywords: Human-Robot Collaboration, Physical Human-Robot Interaction, Control Architectures and Programming
Abstract: This paper deals with the design of a robotic assistant for the transport of large and fragile objects. We propose a new collaborative robotic controller that fulfills the main requirements of co-transportation tasks involving large and fragile objects: executing any trajectory in a collaborative mode while minimizing the stress applied on the object by both partners in order to avoid damaging it. This controller prevents the robot from applying torques on the object while maintaining a desired orientation of the object along the transport trajectory in order to follow the operator. An original feature of our approach is that it accounts for the torques applied by both partners (not only by the operator) during any co-manipulation trajectory execution. This leads to a novel outcome: the minimization of the stress applied by both partners on a large and fragile object during its transport along any trajectory. We demonstrate the effectiveness of this approach in a collaborative transportation task.
|
|
14:30-14:40, Paper MoB-11.3 | |
Safety-Based Dynamic Task Offloading for Human-Robot Collaboration Using Deep Reinforcement Learning |
|
Ruggeri, Franco | Ericsson AB |
Terra, Ahmad | Ericsson AB |
Hata, Alberto Yukinobu | Ericsson Telecomunicações S/A |
Inam, Rafia | Ericsson AB |
Leite, Iolanda | KTH Royal Institute of Technology |
Keywords: Human-Robot Collaboration, Networked Robots, Reinforcement Learning
Abstract: Robots with constrained hardware resources usually rely on Multi-access Edge Computing infrastructures to offload computationally expensive tasks to meet real-time and safety requirements. Offloading every task might not be the best option due to dynamic changes in the network conditions and can result in network congestion or failures. This work proposes a task offloading strategy for mobile robots in a Human-Robot Collaboration scenario that optimizes the edge resource usage and reduces network delays, leading to safety enhancement. The solution utilizes a Deep Reinforcement Learning (DRL) agent that observes safety and network metrics to dynamically decide at runtime if (i) a less accurate model should run on the robot; (ii) a more complex model should run on the edge; or (iii) the previous output should be reused through temporal coherence verification. Experiments are performed in a simulated warehouse where humans and robots have close interactions and safety needs are high. Our results show that the proposed DRL solution outperforms the baselines in several aspects. The edge is used only when the network performance is reliable, reducing the number of failures (up to 47%). The latency is not only decreased (up to 68%) but also adapted to the safety requirements (risk×latency reduced up to 48%), avoiding unnecessary network congestion in safe situations and letting other devices use the network. Overall, the safety metrics get improved, such as the increased time in the safe zone by up to 3.1%.
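The three-way runtime decision described in the abstract can be pictured with the minimal sketch below: an observation of safety and network metrics, an epsilon-greedy choice among the three actions, and an illustrative reward that trades risk-weighted latency against reusing stale outputs. All names, metric ranges, the fixed local-inference latency, and the reward shape are placeholders for illustration only; the paper trains a DRL agent rather than using hand-tuned values.

```python
import random
from dataclasses import dataclass

# Actions described in the abstract: run locally, offload to the edge, or reuse the previous output.
LOCAL, EDGE, REUSE = 0, 1, 2

@dataclass
class Observation:
    risk: float          # safety metric, e.g. proximity to humans (assumed to be in [0, 1])
    latency_ms: float    # recent round-trip latency to the edge
    scene_change: float  # how much the scene moved since the last output (temporal coherence)

def select_action(q_values, epsilon=0.1):
    """Epsilon-greedy action selection over the three offloading choices."""
    if random.random() < epsilon:
        return random.choice([LOCAL, EDGE, REUSE])
    return max(range(3), key=lambda a: q_values[a])

def reward(obs, action):
    """Illustrative reward: penalize risk-weighted latency; penalize reuse when the scene changed."""
    latency = {LOCAL: 30.0, EDGE: obs.latency_ms, REUSE: 1.0}[action]
    stale_penalty = obs.scene_change * 50.0 if action == REUSE else 0.0
    return -(obs.risk * latency) - stale_penalty

if __name__ == "__main__":
    obs = Observation(risk=0.8, latency_ms=12.0, scene_change=0.05)
    a = select_action(q_values=[0.2, 0.9, 0.1])
    print(a, reward(obs, a))
```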
|
|
14:40-14:50, Paper MoB-11.4 | |
"I'm Confident This Will End Poorly": Robot Proficiency Self-Assessment in Human-Robot Teaming |
|
Conlon, Nicholas | University of Colorado Boulder |
Szafir, Daniel J. | University of North Carolina at Chapel Hill |
Ahmed, Nisar | University of Colorado Boulder |
Keywords: Human-Robot Teaming, Human-Robot Collaboration, Human Factors and Human-in-the-Loop
Abstract: Human-robot teams are expected to accomplish complex tasks in high-risk and uncertain environments. In domains such as space exploration or search & rescue, a human operator may not be a robotics expert, but will need to establish a baseline understanding of the robot's capabilities with respect to a given task in order to appropriately utilize and rely on the robot. This willingness to rely, also known as trust, is based partly on the operator's belief in the robot's task proficiency. If trust is too high, the operator may unknowingly push the robot beyond its capabilities. If trust is too low, the operator may not utilize the robot when they otherwise could have, wasting precious time and resources. In this work, we discuss results from an online human-subjects study investigating how a robot-communicated report of its task proficiency, relative to an operator's expectations, affects trust and performance in a navigation task. Our results show that communication of a robot self-assessment helped operators understand when reliance on the robot was appropriate given the task and conditions. This led to improvements in task performance, informed choices of autonomy level, and increased trust.
|
|
14:50-15:00, Paper MoB-11.5 | |
RILI: Robustly Influencing Latent Intent |
|
Parekh, Sagar | Virginia Tech |
Habibian, Soheil | Virginia Tech |
Losey, Dylan | Virginia Tech |
Keywords: Human-Robot Collaboration, Representation Learning, Reinforcement Learning
Abstract: When robots interact with human partners, these partners often change their behavior in response to the robot. On the one hand, this is challenging because the robot must learn to coordinate with a dynamic partner. But on the other hand, if the robot understands these dynamics, it can harness its own behavior, influence the human, and guide the team towards effective collaboration. Prior research enables robots to learn to influence other robots or simulated agents. In this paper we extend these learning approaches to influence humans. What makes humans especially hard to influence is that not only do humans react to the robot, but the way a single user reacts to the robot may change over time, and different humans will respond to the same robot behavior in different ways. We therefore propose a robust approach that learns to influence changing partner dynamics. Our method first trains with a set of partners across repeated interactions, and learns to predict the current partner's behavior based on the previous states, actions, and rewards. Next, we rapidly adapt to new partners by sampling trajectories the robot learned with the original partners, and then leveraging those existing behaviors to influence the new partner dynamics. We compare our resulting algorithm to state-of-the-art baselines across simulated environments and a user study where the robot and participants collaborate to build towers. We find that our approach outperforms the alternatives, even when the partner follows new or unexpected dynamics.
|
|
15:00-15:10, Paper MoB-11.6 | |
Multimodal Object Categorization with Reduced User Load through Human-Robot Interaction in Mixed Reality |
|
Nakamura, Hitoshi | Ritsumeikan University |
El Hafi, Lotfi | Ritsumeikan University |
Taniguchi, Akira | Ritsumeikan University |
Hagiwara, Yoshinobu | Ritsumeikan University |
Taniguchi, Tadahiro | Ritsumeikan University |
Keywords: Human-Robot Collaboration, Virtual Reality and Interfaces
Abstract: Enabling robots to learn from interactions with users is essential to perform service tasks. However, as a robot categorizes objects from multimodal information obtained by its sensors during interactive onsite teaching, the inferred names of unknown objects do not always match the human user's expectation, especially when the robot is introduced to new environments. Confirming the learning results through natural speech interaction with the robot often puts an additional burden on the user who can only listen to the robot to validate the results. Therefore, we propose a human-robot interface to reduce the burden on the user by visualizing the inferred results in mixed reality (MR). In particular, we evaluate the proposed interface on the system usability scale (SUS) and the NASA task load index (NASA-TLX) with three experimental object categorization scenarios based on multimodal latent Dirichlet allocation (MLDA) in which the robot: 1) does not share the inferred results with the user at all, 2) shares the inferred results through speech interaction with the user (baseline), and 3) shares the inferred results with the user through an MR interface (proposed). We show that providing feedback through an MR interface significantly reduces the temporal, physical, and mental burden on the human user compared to speech interaction with the robot.
|
|
15:10-15:20, Paper MoB-11.7 | |
Learning and Executing Re-Usable Behaviour Trees from Natural Language Instruction |
|
Suddrey, Gavin | Queensland University of Technology |
Talbot, Ben | Queensland University of Technology |
Maire, Frederic | Queensland University of Technology |
Keywords: Human-Robot Collaboration, Learning from Demonstration, Task Planning
Abstract: Domestic and service robots have the potential to transform industries such as health care and small-scale manufacturing, as well as the homes in which we live. However, due to the overwhelming variety of tasks these robots will be expected to complete, providing generic out-of-the-box solutions that meet the needs of every possible user is clearly intractable. To address this problem, robots must not only be capable of learning how to complete novel tasks at run-time, but the solutions to these tasks must also be informed by the needs of the user. In this paper we demonstrate how behaviour trees, a well-established control architecture in the fields of gaming and robotics, can be used in conjunction with natural language instruction to provide a robust and modular control architecture for instructing autonomous agents to learn and perform novel complex tasks. We also show how behaviour trees generated using our approach can be generalised to novel scenarios and re-used in future learning episodes to create increasingly complex behaviours. We validate this work against an existing corpus of natural language instructions, and demonstrate the application of our approach on a simulated robot solving a toy problem as well as on two distinct real-world robot platforms which, respectively, complete a block-sorting scenario and a patrol scenario.
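For readers unfamiliar with behaviour trees, the following minimal Python sketch shows the Sequence/Selector composition that makes a learned subtree re-usable as a child of larger trees. The node names and the toy task are illustrative and are not taken from the paper's instruction corpus.

```python
SUCCESS, FAILURE, RUNNING = "SUCCESS", "FAILURE", "RUNNING"

class Action:
    """Leaf node wrapping a callable that returns a status when ticked."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def tick(self):
        return self.fn()

class Sequence:
    """Succeeds only if all children succeed, in order."""
    def __init__(self, children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != SUCCESS:
                return status
        return SUCCESS

class Selector:
    """Succeeds as soon as any child succeeds."""
    def __init__(self, children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != FAILURE:
                return status
        return FAILURE

# "Put the block in the box": a subtree learned once can later be reused inside larger trees.
tree = Sequence([
    Selector([Action("already_holding_block", lambda: FAILURE),
              Action("pick_up_block", lambda: SUCCESS)]),
    Action("move_to_box", lambda: SUCCESS),
    Action("place_block", lambda: SUCCESS),
])
print(tree.tick())  # SUCCESS
```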
|
|
15:20-15:30, Paper MoB-11.8 | |
ProTAMP: Probabilistic Task and Motion Planning Considering Human Action for Harmonious Collaboration |
|
Mochizuki, Shunsuke | Keio University |
Kawasaki, Yosuke | Keio University |
Takahashi, Masaki | Keio University |
Keywords: Human-Robot Collaboration, Task and Motion Planning, Modeling and Simulating Humans
Abstract: For a mobile manipulator-type autonomous robot to perform complicated tasks properly in a human-robot coexistence environment, tasks and motions must be planned simultaneously. In such environments, the human and the robot should collaborate with each other; the robot must therefore act in accordance with the human and avoid redundant actions that duplicate those of the human. However, any action undertaken by a human carries uncertainty, and thus predicting human actions correctly is challenging. This study proposes probabilistic task and motion planning that considers both deterministic environment changes caused by robot actions and probabilistic, temporally and spatially uncertain changes caused by human actions. First, the environmental changes are modeled so that the robot can recognize the possibility of environmental changes. Second, in task planning, the probability of each environmental change owing to human actions is minimized. Finally, in motion planning, a movement path connecting the tasks in the planned order is computed, enabling the robot to perform actions that are not duplicated by the human. The generated plans are compared with plans that do not consider the possibility of human actions, and the effectiveness of the proposed method is verified. Consequently, the proposed method is confirmed to reduce the time required to finish the tasks.
|
|
15:30-15:40, Paper MoB-11.9 | |
VR Facial Animation for Immersive Telepresence Avatars |
|
Rochow, Andre | University of Bonn |
Schwarz, Max | University of Bonn |
Schreiber, Michael | University of Bonn |
Behnke, Sven | University of Bonn |
Keywords: Gesture, Posture and Facial Expressions, Telerobotics and Teleoperation, Virtual Reality and Interfaces
Abstract: VR facial animation is necessary in applications requiring a clear view of the face even though a VR headset is worn. In our case, we aim to animate the face of an operator who is controlling our robotic avatar system. We propose a real-time capable pipeline with very fast adaptation for specific operators. In a quick enrollment step, we capture a sequence of source images from the operator without the VR headset which contain all the important operator-specific appearance information. During inference, we then use the operator keypoint information extracted from a mouth camera and two eye cameras to estimate the target expression and head pose, to which we map the appearance of a source still image. In order to enhance the mouth expression accuracy, we dynamically select an auxiliary expression frame from the captured sequence. This selection is done by learning to transform the current mouth keypoints into the source camera space, where the alignment can be determined accurately. We furthermore demonstrate an eye-tracking pipeline that can be trained in less than a minute, describe a time-efficient way to train the whole pipeline given a dataset that includes only complete faces, show exemplary results generated by our method, and discuss performance at the ANA Avatar XPRIZE semifinals.
|
|
MoB-12 |
Rm12 (Room J) |
Visual Servoing |
Regular session |
Chair: Chaumette, Francois | Inria Rennes-Bretagne Atlantique |
Co-Chair: Aranda, Miguel | Universidad De Zaragoza |
|
14:10-14:20, Paper MoB-12.1 | |
An Offline Geometric Model for Controlling the Shape of Elastic Linear Objects |
|
Aghajanzadeh, Omid | Universite Clermont Auvergne, Institut Pascal |
Aranda, Miguel | Universidad De Zaragoza |
Lopez-Nicolas, Gonzalo | Universidad De Zaragoza |
Lenain, Roland | INRAE |
Mezouar, Youcef | Clermont Auvergne INP - SIGMA Clermont |
Keywords: Visual Servoing, Sensor-based Control
Abstract: We propose a new approach to control the shape of deformable objects with robots. Specifically, we consider a fixed-length elastic linear object lying on a 2D workspace. Our main idea is to encode the object’s deformation behavior in an offline constant Jacobian matrix. To derive this Jacobian, we use geometric deformation modeling and combine recent work from the fields of deformable object control and multirobot systems. Based on this Jacobian, we then propose a robotic control law that is capable of driving a set of shape features on the object toward prescribed values. Our contribution relative to existing approaches is that at run-time we do not need to measure the full shape of the object or to estimate/simulate a deformation model. This simplification is achieved thanks to having abstracted the deformation behavior as an offline model. We illustrate the proposed approach in simulation and in experiments with real deformable linear objects.
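A minimal sketch of a control law built on an offline, constant Jacobian is shown below: the commanded motion is the Jacobian pseudoinverse applied to the shape-feature error. The toy closed loop assumes the simulated object responds exactly through the same Jacobian, which is of course not guaranteed in practice; the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def shape_servo_step(J, s, s_star, gain=0.5):
    """One control step of the feature-space law u = -gain * pinv(J) @ (s - s*),
    where J is the offline, constant Jacobian relating robot motion to shape-feature changes."""
    error = s - s_star
    u = -gain * np.linalg.pinv(J) @ error
    return u, np.linalg.norm(error)

# Toy closed loop: 4 shape features, 4 actuated DoF, plant assumed to share the same Jacobian.
rng = np.random.default_rng(0)
J = rng.standard_normal((4, 4))
s, s_star = rng.standard_normal(4), np.zeros(4)
for _ in range(50):
    u, err = shape_servo_step(J, s, s_star)
    s = s + J @ u          # simulated object response
print(f"final shape-feature error: {err:.2e}")
```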
|
|
14:20-14:30, Paper MoB-12.2 | |
Skeleton-Based Adaptive Visual Servoing for Control of Robotic Manipulators in Configuration Space |
|
Gandhi, Abhinav | Worcester Polytechnic Institute |
Chatterjee, Sreejani | Worcester Polytechnic Institute |
Calli, Berk | Worcester Polytechnic Institute |
Keywords: Visual Servoing, Robust/Adaptive Control, Sensor-based Control
Abstract: This paper presents a novel visual servoing method that controls a robotic manipulator in the configuration space, as opposed to classical vision-based control methods that focus solely on the end-effector pose. We first extract the robot's shape from depth images using a skeletonization algorithm and represent it using parametric curves. We then adopt an adaptive visual servoing scheme that estimates online the Jacobian relating the changes of the curve parameters to the joint velocities. The proposed scheme not only enables controlling a manipulator in the configuration space, but also demonstrates a better transient response while converging to the goal configuration compared to classical adaptive visual servoing methods. We present simulations and real robot experiments that demonstrate the capabilities of the proposed method and analyze its performance, robustness, and repeatability compared to the classical algorithms.
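Adaptive visual servoing schemes of this kind maintain an online estimate of the Jacobian; a standard choice is a Broyden-style rank-one update, sketched below together with the resulting velocity command. The paper's actual estimator may differ; the update rule, gain, and names here are assumptions for illustration.

```python
import numpy as np

def broyden_update(J_hat, ds, dq, alpha=0.3, eps=1e-9):
    """Rank-one Broyden update of an estimated Jacobian so that J_hat @ dq approximates ds,
    where ds is the observed change in visual features (curve parameters) and dq the joint motion."""
    denom = float(dq @ dq) + eps
    return J_hat + alpha * np.outer(ds - J_hat @ dq, dq) / denom

def servo_velocity(J_hat, s, s_star, gain=1.0):
    """Joint-velocity command driving the feature error to zero with the current estimate."""
    return -gain * np.linalg.pinv(J_hat) @ (s - s_star)

# Toy usage: the estimate converges toward the true mapping as motions are observed.
rng = np.random.default_rng(1)
J_true = rng.standard_normal((3, 4))
J_hat = np.eye(3, 4)
for _ in range(500):
    dq = 0.01 * rng.standard_normal(4)
    J_hat = broyden_update(J_hat, J_true @ dq, dq)
print(np.round(np.abs(J_hat - J_true).max(), 3))
```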
|
|
14:30-14:40, Paper MoB-12.3 | |
Conditional Visual Servoing for Multi-Step Tasks |
|
Izquierdo, Sergio | University of Freiburg |
Argus, Maximilian | University of Freiburg |
Brox, Thomas | University of Freiburg |
Keywords: Visual Servoing, Learning from Demonstration, Perception for Grasping and Manipulation
Abstract: Visual Servoing has been effectively used to move a robot into specific target locations or to track a recorded demonstration. It does not require manual programming, but it is typically limited to settings where one demonstration maps to one environment state. We propose a modular approach to extend visual servoing to scenarios with multiple demonstration sequences. We call this conditional servoing, as we choose the next demonstration conditioned on the observation of the robot. This method presents an appealing strategy to tackle multi-step problems, as individual demonstrations can be combined flexibly into a control policy. We propose different selection functions and compare them on a shape-sorting task in simulation. With the reprojection error yielding the best overall results, we implement this selection function on a real robot and show the efficacy of the proposed conditional servoing.
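Since the reprojection error gave the best overall results as a selection function, the sketch below shows one way such a selector could look: each candidate demonstration carries an estimated relative pose, and the one whose projected 3-D points best match the currently observed keypoints is chosen. The data layout (candidates as dicts with "R" and "t") and the exact error definition are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def reprojection_error(K, R, t, points_3d, keypoints_2d):
    """Mean pixel error between observed keypoints and demo 3-D points projected with pose (R, t)."""
    cam = R @ points_3d.T + t.reshape(3, 1)          # 3 x N, points in the camera frame
    proj = (K @ cam)[:2] / cam[2]                    # 2 x N, pixel coordinates
    return float(np.mean(np.linalg.norm(proj.T - keypoints_2d, axis=1)))

def select_demo(candidates, K, points_3d, keypoints_2d):
    """Pick the demonstration whose estimated relative pose reprojects the scene best.
    Each candidate is assumed to be a dict with keys "R" (3x3) and "t" (3,)."""
    return min(candidates,
               key=lambda c: reprojection_error(K, c["R"], c["t"], points_3d, keypoints_2d))
```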
|
|
14:40-14:50, Paper MoB-12.4 | |
Optimal Shape Servoing with Task-Focused Convergence Constraints |
|
Giraud, Victor Henri | SIGMA-Clermont / Institut Pascal |
Padrin, Maxime Lino | INP Clermont Auvergne - Sigma Clermont |
Shetab-Bushehri, Mohammadreza | Université Clermont Auvergne, Institut Pascal |
Bouzgarrou, Chedli | Institut Pascal UMR 6602 - UCA/CNRS/SIGMA |
Mezouar, Youcef | Clermont Auvergne INP - SIGMA Clermont |
Ozgur, Erol | SIGMA-Clermont / Institut Pascal |
Keywords: Visual Servoing, Optimization and Optimal Control, Industrial Robots
Abstract: Most deformable object manipulation tasks still rely on skillful human operators. To automate such tasks, a robotic system should not only be able to deform an object to a desired shape but also servo its deformation along a specific path towards the desired shape. We propose a shape servoing control scheme to automate such tasks. Our scheme controls the deformation trajectory towards the desired shape by imposing task-focused convergence constraints. The constraints impose how fast the different regions of the object converge to the desired shape. Integrating such a behavior in shape servoing forms our main contribution. Experiments, carried out on rubber layer assembly tasks, show that our control scheme outperforms a state-of-the-art shape servoing scheme.
|
|
14:50-15:00, Paper MoB-12.5 | |
An Event-Triggered Visual Servoing Predictive Control Strategy for the Surveillance of Contour-Based Areas Using Multirotor Aerial Vehicles |
|
Aspragkathos, Sotiris | NTUA |
Sinani, Mario | National Technical University of Athens - Control Systems Labora |
Karras, George | National Technical University of Athens |
Panetsos, Fotis | National Technical University of Athens |
Kyriakopoulos, Kostas | National Technical Univ. of Athens |
Keywords: Visual Servoing, Visual Tracking, Sensor-based Control
Abstract: In this paper, an Event-triggered Image-based Visual Servoing Nonlinear Model Predictive Controller (ET-IBVS-NMPC) for multirotor aerial vehicles is presented. The proposed scheme is developed for the autonomous surveillance of contour-based areas with different characteristics (e.g. forest paths, coastlines, road pavements). For this purpose, an appropriately trained Deep Neural Network (DNN) is employed for the accurate detection of the contours. In an effort to reduce the remarkably large computational cost required by an IBVS-NMPC algorithm, a triggering condition is designed to define when the Optimal Control Problem (OCP) should be re-solved and new control inputs calculated. Between two successive triggering instants, the control input trajectory is applied to the robot in an open-loop manner, meaning that no visual measurements or control input computations are required during that period. As a result, the system's computational effort and energy consumption are lowered, while its autonomy and flight duration are increased. The visibility and input constraints, as well as the external disturbances, are all taken into account throughout the control design. The efficacy of the proposed strategy is demonstrated through a series of real-time experiments using a quadrotor and an octorotor, both equipped with a monocular downward-looking camera.
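The event-triggered structure can be summarized by the loop sketched below: the OCP is re-solved only when the trigger fires (or the stored plan runs out); otherwise the remaining open-loop inputs are applied without new control computations. The callable interfaces, the scalar feature error, and the drift-based trigger are assumptions for illustration, not the paper's actual triggering condition.

```python
def event_triggered_control(get_error, solve_ocp, apply_input, threshold=0.1, max_iters=1000):
    """Re-solve the OCP only when the triggering condition fires; otherwise replay the
    remaining open-loop input trajectory, saving computation between triggering instants.

    get_error()      -> current scalar visual feature error (illustrative interface)
    solve_ocp(err)   -> (plan, predicted_errors), the optimal input sequence and its predictions
    apply_input(u)   -> sends one control input to the vehicle
    """
    plan, predicted_errors = solve_ocp(get_error())
    k = 0
    for _ in range(max_iters):
        measured = get_error()
        # Triggering condition (illustrative): the measured error drifts too far from what
        # the last plan predicted at this step, or the stored plan is exhausted.
        if k >= len(plan) or abs(measured - predicted_errors[k]) > threshold:
            plan, predicted_errors = solve_ocp(measured)
            k = 0
        apply_input(plan[k])     # open-loop input between triggering instants
        k += 1
```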
|
|
15:00-15:10, Paper MoB-12.6 | |
Vision-Based Rotational Control of an Agile Observation Satellite |
|
Robic, Maxime | INRIA |
Fraisse, Renaud | Airbus Defence & Space |
Marchand, Eric | Univ Rennes, Inria, CNRS, IRISA |
Chaumette, Francois | Inria Rennes-Bretagne Atlantique |
Keywords: Visual Servoing, Space Robotics and Automation
Abstract: Recent Earth observation satellites are equipped with new instruments that allow image feedback in real time. Problems such as ground target tracking, whether the target is moving or not, can now be addressed by precisely controlling the satellite attitude. In this paper, we propose to address this problem using a visual servoing (VS) approach. While focusing on the target, the control scheme also has to take into account the satellite motion induced by its orbit, Earth rotational velocities, and potential motion of the target itself, as well as the rotational velocity and acceleration constraints of the system. We show the efficiency of our approach using both simulation (with real Earth imagery) and experiments on a robot that replicates actual high-resolution satellite constraints.
|
|
15:10-15:20, Paper MoB-12.7 | |
DIJE: Dense Image Jacobian Estimation for Robust Robotic Self-Recognition and Visual Servoing |
|
Toshimitsu, Yasunori | University of Tokyo |
Kawaharazuka, Kento | The University of Tokyo |
Miki, Akihiro | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Visual Servoing, Model Learning for Control, Semantic Scene Understanding
Abstract: For robots to move in the real world, they must first correctly understand the state of their own body and the tools that they hold. In this research, we propose DIJE, an algorithm to estimate the image Jacobian for every pixel. It is based on an optical flow calculation and a simplified Kalman filter that can be efficiently run on the whole image in real time. It does not rely on markers or knowledge of the robotic structure. We use DIJE in a self-recognition process which can robustly distinguish between movement by the robot and by external entities, even when the motions overlap. We also propose a visual servoing controller based on DIJE, which can learn to control the robot's body to conduct reaching movements or bimanual tool-tip control. The proposed algorithms were implemented on a physical musculoskeletal robot and their performance was verified. We believe that such global estimation of the visuomotor policy has the potential to be extended into a more general framework for manipulation.
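A minimal, vectorized sketch of a per-pixel Jacobian estimator in this spirit is shown below: each pixel keeps a 2-by-n Jacobian relating joint motion to optical flow, refreshed with a simplified (diagonal-covariance) Kalman-style update over the whole image at once. The diagonal approximation, the covariance shared between both flow channels, and all names are assumptions; the paper's filter may be formulated differently.

```python
import numpy as np

class DenseJacobianEstimator:
    """Per-pixel recursive estimate of the image Jacobian J (flow ~= J @ dq)."""
    def __init__(self, h, w, n_joints, q_noise=1e-4, r_noise=1e-2):
        self.J = np.zeros((h, w, 2, n_joints))   # per-pixel Jacobian estimate
        self.P = np.ones((h, w, n_joints))       # per-pixel covariance (diagonal approximation)
        self.q, self.r = q_noise, r_noise

    def update(self, flow, dq):
        """flow: (h, w, 2) optical flow between frames; dq: (n_joints,) joint displacement."""
        self.P += self.q                                  # prediction step (random-walk model)
        s = (self.P * dq**2).sum(-1) + self.r             # innovation variance, (h, w)
        k = (self.P * dq) / s[..., None]                  # Kalman gain, (h, w, n_joints)
        innov = flow - self.J @ dq                        # flow innovation, (h, w, 2)
        self.J += innov[..., None] * k[..., None, :]      # broadcast update per flow channel
        self.P -= k * dq * self.P                         # diagonal covariance update
        return self.J

if __name__ == "__main__":
    est = DenseJacobianEstimator(h=4, w=4, n_joints=3)
    J = est.update(flow=np.zeros((4, 4, 2)), dq=np.array([0.01, 0.0, -0.02]))
    print(J.shape)  # (4, 4, 2, 3)
```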
|
|
15:20-15:30, Paper MoB-12.8 | |
Self-Supervised Wide Baseline Visual Servoing Via 3D Equivariance |
|
Huh, Jinwook | Samsung |
Hong, Jungseok | University of Minnesota |
Garg, Suveer | University of Pennsylvania |
Park, Hyun Soo | University of Minnesota |
Isler, Volkan | University of Minnesota |
Keywords: Visual Servoing, Representation Learning, Perception-Action Coupling
Abstract: One of the challenging input settings for visual servoing is when the initial and goal camera views are far apart. Such settings are difficult because the wide baseline can cause drastic changes in object appearance and cause occlusions. This paper presents a novel self-supervised visual servoing method for wide-baseline images which does not require 3D ground-truth supervision. Existing approaches that regress absolute camera pose with respect to an object require 3D ground-truth data of the object in the form of 3D bounding boxes or meshes. We learn a coherent visual representation by leveraging a geometric property called 3D equivariance: the representation is transformed in a predictable way as a function of the 3D transformation. To ensure that the feature space is faithful to the underlying geodesic space, a geodesic-preserving constraint is applied in conjunction with the equivariance. We design a Siamese network that can effectively enforce these two geometric properties without requiring 3D supervision. With the learned model, the relative transformation can be inferred simply by following the gradient in the learned space and used as feedback for closed-loop visual servoing. Our method is evaluated on objects from the YCB dataset, showing meaningful outperformance on a visual servoing (object alignment) task with respect to state-of-the-art approaches that use 3D supervision. Ours yields more than 35% average distance error reduction and more than 90% success rate with 3 cm error tolerance.
|
|
15:30-15:40, Paper MoB-12.9 | |
Visibility Maximization Controller for Robotic Manipulation |
|
He, Kerry | Monash University |
Newbury, Rhys | Monash University |
Tran, Tin | Monash University |
Haviland, Jesse | Queensland University of Technology |
Burgess-Limerick, Ben | Queensland University of Technology |
Kulic, Dana | Monash University |
Corke, Peter | Queensland University of Technology |
Cosgun, Akansel | Monash University |
Keywords: Visual Servoing, Sensor-based Control, Mobile Manipulation
Abstract: Occlusions caused by a robot's own body are a common problem for closed-loop control methods employed in eye-to-hand camera setups. We propose an optimization-based reactive controller that minimizes self-occlusions while achieving a desired goal pose. The approach allows coordinated control between the robot's base, arm and head by encoding the line-of-sight visibility to the target as a soft constraint along with other task-related constraints, and solving for feasible joint and base velocities. The generalizability of the approach is demonstrated in simulated and real-world experiments, on robots with fixed or mobile bases, with moving or fixed objects, and with multiple objects. The experiments revealed a trade-off between occlusion rates and other task metrics. While a planning-based baseline achieved lower occlusion rates than the proposed controller, it came at the expense of highly inefficient paths and a significant drop in task success. On the other hand, the proposed controller is shown to improve line-of-sight visibility to the target object(s) without sacrificing much task success or efficiency. Videos and code can be found at: rhys-newbury.github.io/projects/vmc/.
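One way to picture the soft visibility constraint is the small optimization sketch below: velocities are chosen to track the desired task velocity while penalizing motion along a direction that increases self-occlusion. This uses a generic nonlinear solver and a hand-made penalty purely for illustration; the paper formulates a different (velocity-level, constrained) program over base, arm and head.

```python
import numpy as np
from scipy.optimize import minimize

def solve_velocity(J_task, v_task_des, occlusion_grad, w_occ=5.0, qd_max=1.0):
    """Choose joint velocities that track the desired task-space velocity while softly
    penalizing motion that increases self-occlusion of the line of sight to the target."""
    n = J_task.shape[1]

    def cost(qd):
        track = np.sum((J_task @ qd - v_task_des) ** 2)
        occ = w_occ * max(0.0, float(occlusion_grad @ qd))  # penalize occlusion-increasing motion only
        return track + occ

    bounds = [(-qd_max, qd_max)] * n
    return minimize(cost, np.zeros(n), bounds=bounds).x

if __name__ == "__main__":
    J = np.array([[1.0, 0.0, 0.2],
                  [0.0, 1.0, -0.1]])
    qd = solve_velocity(J, np.array([0.1, -0.05]), occlusion_grad=np.array([0.0, 1.0, 0.0]))
    print(np.round(qd, 3))
```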
|
|
MoB-13 |
Rm13 (Room K) |
Mapping 2 |
Regular session |
Chair: Stachniss, Cyrill | University of Bonn |
Co-Chair: Leutenegger, Stefan | Technical University of Munich |
|
14:10-14:20, Paper MoB-13.1 | |
Multi-Camera-LiDAR Auto-Calibration by Joint Structure-From-Motion |
|
Tu, Diantao | Institute of Automation, Chinese Academy of Sciences |
Wang, Baoyu | Institute of Automation, Chinese Academy of Sciences |
Cui, Hainan | Institute of Automation, Chinese Academy of Sciences |
Liu, Yuqian | SenseTime |
Shen, Shuhan | Institute of Automation, Chinese Academy of Sciences |
Keywords: Mapping
Abstract: Multiple sensors, especially cameras and LiDARs, are widely used in autonomous vehicles. In order to fuse data from different sensors accurately, precise calibrations are required, including camera intrinsic parameters and the relative poses between multiple cameras and LiDARs. However, most existing camera-LiDAR calibration methods need to place manually designed calibration objects in multiple locations and multiple times, which is time-consuming and labor-intensive, and not suitable for frequent use. To address this, in this paper we propose a novel calibration pipeline that can automatically calibrate multiple cameras and multiple LiDARs within a Structure-from-Motion (SfM) process. In our pipeline, we first perform a global SfM on all images with the help of rough LiDAR data to get the initial poses of all sensors. Then, feature points on lines and planes are extracted from both the SfM point cloud and the LiDARs. With these features, a global Bundle Adjustment is performed to minimize the point reprojection errors, point-to-line errors, and point-to-plane errors together. During this minimization process, camera intrinsic parameters, camera and LiDAR poses, and the SfM point cloud are refined jointly. The proposed method uses the characteristics of natural scenes, does not require manually designed calibration objects, and incorporates all calibration parameters into a unified optimization framework. Experiments on autonomous vehicles with different sensor configurations demonstrate the effectiveness and robustness of the proposed method.
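The three residual types combined in the global Bundle Adjustment (point reprojection, point-to-line, and point-to-plane) can be written compactly as below. These are textbook definitions offered as an illustrative sketch; the authors' parameterization, weighting, and optimization backend are not reproduced here.

```python
import numpy as np

def reprojection_residual(K, R, t, P_world, uv):
    """Pixel error of a 3-D point P_world observed at pixel uv with pose (R, t) and intrinsics K."""
    p_cam = R @ P_world + t
    proj = (K @ p_cam)[:2] / p_cam[2]
    return proj - uv

def point_to_line_residual(p, line_point, line_dir):
    """Distance from point p to the infinite line through line_point with direction line_dir."""
    d = line_dir / np.linalg.norm(line_dir)
    v = p - line_point
    return float(np.linalg.norm(v - np.dot(v, d) * d))

def point_to_plane_residual(p, plane_point, plane_normal):
    """Signed distance from point p to the plane defined by plane_point and plane_normal."""
    n = plane_normal / np.linalg.norm(plane_normal)
    return float(np.dot(p - plane_point, n))
```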
|
|
14:20-14:30, Paper MoB-13.2 | |
GeoROS: Georeferenced Real-Time Orthophoto Stitching with Unmanned Aerial Vehicle |
|
Gao, Guangze | Institute of Automation, Chinese Academy of Sciences |
Yuan, Mengke | Institute of Automation, Chinese Academy of Sciences |
Ma, Zhihao | Institute of Automation, Chinese Academy of Sciences |
Gu, Jiaming | Institute of Automation, Chinese Academy of Sciences |
Meng, Weiliang | Institute of Automation, Chinese Academy of Sciences |
Xu, Shibiao | Beijing University of Posts and Telecommunications |
Zhang, Xiaopeng | National Laboratory of Pattern Recognition, Institute of Automat |
Keywords: Mapping, SLAM, Aerial Systems: Perception and Autonomy
Abstract: Simultaneous orthophoto stitching during the flight of Unmanned Aerial Vehicles (UAV) can greatly promote the practicability and instantaneity of diverse applications such as emergency disaster rescue, digital agriculture, and cadastral survey, which is of remarkable interest in aerial photogrammetry. However, the inaccurately estimated camera poses and intuitive fusion strategy of existing methods lead to misalignment and distortion artifacts in orthophoto mosaics. To address these issues, we propose a Georeferenced Real-time Orthophoto Stitching method (GeoROS), which can achieve efficient and accurate camera pose estimation through exploiting geolocation information in monocular visual simultaneous localization and mapping (SLAM) and fuse transformed images via orthogonality-preserving criterion. Specifically, in the SLAM process, georeferenced tracking is employed to acquire high-quality initial camera poses with a geolocation based motion model and facilitate non-linear pose optimization. Meanwhile, we design a georeferenced mapping scheme by introducing robust geolocation constraints in joint optimization of camera poses and the position of landmarks. Finally, aerial images warped with localized cameras are fused by considering both the orthogonality of camera orientation relative to the ground plane and the pixel centrality to fulfill global orthorectification. Besides, we construct two datasets with global navigation satellite system (GNSS) information of different scenarios and validate the superiority of our GeoROS method compared with state-of-the-art methods in accuracy and efficiency.
|
|
14:30-14:40, Paper MoB-13.3 | |
Learning to Complete Object Shapes for Object-Level Mapping in Dynamic Scenes |
|
Xu, Binbin | Imperial College London |
Davison, Andrew J | Imperial College London |
Leutenegger, Stefan | Technical University of Munich |
Keywords: Mapping, RGB-D Perception, SLAM
Abstract: In this paper, we propose a novel object-level mapping system that can simultaneously segment, track, and reconstruct objects in dynamic scenes. It can further predict and complete their full geometries by conditioning on reconstructions from depth inputs and a category-level shape prior with the aim that completed object geometry leads to better object reconstruction and tracking accuracy. For each incoming RGB-D frame, we perform instance segmentation to detect objects and build data associations between the detection and the existing object maps. A new object map will be created for each unmatched detection. For each matched object, we jointly optimise its pose and latent geometry representations using geometric residual and differential rendering residual towards its shape prior and completed geometry. Our approach shows better tracking and reconstruction performance compared to methods using traditional volumetric mapping or learned shape prior approaches. We evaluate its effectiveness by quantitatively and qualitatively testing it in both synthetic and real-world sequences.
|
|
14:40-14:50, Paper MoB-13.4 | |
Robot-Aided Microbial Density Estimation and Mapping |
|
Pey, Javier Jia Jie | Singapore University of Technology and Design |
Palanisamy, Povendhan | Singapore University of Technology and Design |
Pathmakumar, Thejus | Singapore University of Technology and Design |
Elara, Mohan Rajesh | Singapore University of Technology and Design |
Keywords: Mapping, Engineering for Robotic Systems, Service Robotics
Abstract: Estimating the microbial infestation profile of an area is essential for an effective cleaning process. However, current methods used to inspect the microbial infestation within a spatial region are manual and laborious. For large regions that require automated cleaning, conventional methods of microbial examination are not practical. We propose a novel robot-aided microbial density estimation and mapping framework using an in-house developed biosensor payload onboard a mobile robot. The biosensor estimates the degree of microbial infestation in Relative Light Units (RLU) using the natural bioluminescence reaction. The global distribution of microbial infestation is approximated through the Radial Basis Function (RBF) and Nearest Neighbour (NN) interpolation algorithms. The proposed method is implemented on an in-house developed mobile robot called Beluga. The framework's validity and usefulness are demonstrated quantitatively through real-world experimental trials.
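Interpolating sparse biosensor readings into a dense infestation map with RBF and nearest-neighbour interpolation can be sketched with SciPy as below. The sample coordinates, RLU values, kernel choice, and grid extent are made-up placeholders for illustration; they do not come from the paper's experiments.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator, NearestNDInterpolator

# Sparse biosensor readings (x, y in metres; value in Relative Light Units) collected by the robot.
xy = np.array([[0.5, 0.5], [2.0, 1.0], [3.5, 3.0], [1.0, 3.5], [4.0, 0.5]])
rlu = np.array([120.0, 800.0, 300.0, 1500.0, 90.0])

rbf = RBFInterpolator(xy, rlu, kernel="thin_plate_spline")
nn = NearestNDInterpolator(xy, rlu)

# Interpolate onto a dense grid covering the area to be cleaned.
gx, gy = np.meshgrid(np.linspace(0.0, 4.5, 50), np.linspace(0.0, 4.0, 50))
grid = np.column_stack([gx.ravel(), gy.ravel()])
rbf_map = rbf(grid).reshape(gx.shape)
nn_map = nn(grid).reshape(gx.shape)
print(rbf_map.shape, nn_map.shape)  # (50, 50) (50, 50)
```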
|
|
14:50-15:00, Paper MoB-13.5 | |
Elevation Mapping for Locomotion and Navigation Using GPU |
|
Miki, Takahiro | ETH Zurich |
Wellhausen, Lorenz | ETH Zürich |
Grandia, Ruben | ETH Zurich |
Jenelten, Fabian | ETH Zurich |
Homberger, Timon | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Mapping, Legged Robots, Autonomous Vehicle Navigation
Abstract: Perceiving the surrounding environment is crucial for autonomous mobile robots. An elevation map provides a memory-efficient and simple yet powerful geometric representation of the terrain for ground robots. The robots can use this information for navigation in an unknown environment or perceptive locomotion control over rough terrain. Depending on the application, various post-processing steps may be incorporated, such as smoothing, inpainting, or plane segmentation. In this work, we present an elevation mapping pipeline that leverages the GPU for fast and efficient processing, with additional features for both navigation and locomotion. We demonstrate our mapping framework through extensive hardware experiments. Our mapping software was successfully deployed for underground exploration during the DARPA Subterranean Challenge and for various quadrupedal locomotion experiments.
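The core of a CPU-side elevation map update, fusing a point cloud into per-cell height estimates with a one-dimensional Kalman update, can be sketched as follows. The paper's contribution is running this kind of processing (plus smoothing, inpainting, and other filters) efficiently on the GPU; the cell indexing, fixed measurement variance, and loop-based fusion here are simplifying assumptions.

```python
import numpy as np

def update_elevation_map(height, variance, points, origin, resolution, meas_var=0.01):
    """Fuse a point cloud (N x 3, already in the map frame) into a 2.5-D elevation map using a
    per-cell one-dimensional Kalman update on the height estimate."""
    idx = np.floor((points[:, :2] - origin) / resolution).astype(int)
    valid = (idx[:, 0] >= 0) & (idx[:, 0] < height.shape[0]) & \
            (idx[:, 1] >= 0) & (idx[:, 1] < height.shape[1])
    for (i, j), z in zip(idx[valid], points[valid, 2]):
        if np.isnan(height[i, j]):
            height[i, j], variance[i, j] = z, meas_var     # first observation of this cell
        else:
            k = variance[i, j] / (variance[i, j] + meas_var)
            height[i, j] += k * (z - height[i, j])
            variance[i, j] *= (1.0 - k)
    return height, variance

if __name__ == "__main__":
    h = np.full((50, 50), np.nan)
    v = np.full((50, 50), np.nan)
    cloud = np.array([[0.12, 0.33, 0.05], [0.12, 0.34, 0.07], [1.20, 0.90, 0.30]])
    h, v = update_elevation_map(h, v, cloud, origin=np.array([0.0, 0.0]), resolution=0.05)
    print(np.nanmax(h))
```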
|
|
15:00-15:10, Paper MoB-13.6 | |
LODM: Large-Scale Online Dense Mapping for UAV |
|
Huang, Jianxin | Zhejiang University |
Li, Laijian | Zhejiang University |
Zhao, Xiangrui | Zhejiang University |
Lang, Xiaolei | Zhejiang University |
Zhu, Deye | Zhejiang University |
Liu, Yong | Zhejiang University |
Keywords: Mapping, Aerial Systems: Perception and Autonomy
Abstract: This paper proposes a method for online large-scale dense mapping. The UAV operates within a range of 150-250 meters, combining GPS and visual odometry to estimate the scaled pose and sparse points. In order to exploit the depth of sparse points for depth-map estimation, we propose Sparse Confidence Cascade View-Aggregation MVSNet (SCCVA-MVSNet), which projects the depth-converged points in the sliding window onto keyframes to obtain a sparse depth map, with a sparse confidence map constructed from the photometric error. Coarse depth and confidence maps are obtained through normalized convolution; the images of all keyframes, the coarse depth, and the confidence are then used as the input of CVA-MVSNet to extract features and construct 3D cost volumes with adaptive view aggregation that balances the different stereo baselines between the keyframes. Our proposed network utilizes sparse feature-point information, so its output better maintains scale consistency. Our experiments show that MVSNet using sparse feature-point information outperforms image-only MVSNet, and our online reconstruction results are comparable to offline reconstruction methods. To benefit the research community, we open-source our code at https://github.com/hjxwhy/LODM.git
|
|
15:10-15:20, Paper MoB-13.7 | |
Roadside HD Map Object Reconstruction Using Monocular Camera |
| |