|
WeAT1 |
Room T1 |
3D Model Learning |
Regular session |
Chair: Lee, Sang Uk | Massachusetts Institute of Technology |
Co-Chair: Inaba, Masayuki | The University of Tokyo |
|
10:00-10:15, Paper WeAT1.1 | |
>Indoor Scene Recognition in 3D |
|
Huang, Shengyu | ETH Zurich |
Usvyatsov, Mikhail | ETHZ |
Schindler, Konrad | ETH Zurich |
Keywords: Deep Learning for Visual Perception, Recognition, Semantic Scene Understanding
Abstract: Recognising in what type of environment one is located is an important perception task. For instance, for a robot operating indoors it is helpful to be aware whether it is in a kitchen, a hallway or a bedroom. Existing approaches attempt to classify the scene based on 2D images or 2.5D range images. Here, we study scene recognition from 3D point cloud (or voxel) data, and show that it greatly outperforms methods based on 2D birds-eye views. Moreover, we advocate multi-task learning as a way to improve scene recognition, building on the fact that the scene type is highly correlated with the objects in the scene, and therefore with its semantic segmentation into different object classes. In a series of ablation studies, we show that successful scene recognition is not just the recognition of individual objects unique to some scene type (such as a bathtub), but depends on several different cues, including coarse 3D geometry, colour, and the (implicit) distribution of object categories. Moreover, we demonstrate that surprisingly sparse 3D data is sufficient to classify indoor scenes with good accuracy.
|
|
10:15-10:30, Paper WeAT1.2 | |
>Edge Enhanced Implicit Orientation Learning with Geometric Prior for 6D Pose Estimation |
|
Wen, Yilin | The University of Hong Kong |
Pan, Hao | Microsoft Research |
Yang, Lei | The University of Hong Kong |
Wang, Wenping | The University of Hong Kong |
Keywords: Deep Learning for Visual Perception, Representation Learning
Abstract: Estimating 6D poses of rigid objects from RGB images is an important but challenging task. This is especially true for textureless objects with strong symmetry, since they have only sparse visual features to be leveraged for the task and their symmetry leads to pose ambiguity. The implicit encoding of orientations learned by autoencoders [31], [32] has demonstrated its effectiveness in handling such objects without requiring explicit pose labeling. In this paper, we further improve this methodology with two key technical contributions. First, we use edge cues to complement the color images with more discriminative features and reduce the domain gap between the real images for testing and the synthetic ones for training. Second, we enhance the regularity of the implicitly learned pose representations by a self-supervision scheme to enforce the geometric prior that the latent representations of two images presenting nearby rotations should be close too. Our approach achieves the state-of-the-art performance on the T-LESS benchmark in the RGB domain; its evaluation on the LINEMOD dataset also outperforms other synthetically trained approaches. Extensive ablation tests demonstrate the improvements enabled by our technical designs. Our code is publicly available for research use.
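
To make the geometric prior concrete, here is a minimal sketch (not the authors' code; the encoder outputs, weighting, and parameter values are assumptions) of a self-supervision loss that pulls the latent codes of two views with nearby rotations together:

    import torch

    def geodesic_distance(R1, R2):
        # Rotation angle of R1^T R2 for batched (B, 3, 3) rotation matrices.
        R = torch.matmul(R1.transpose(1, 2), R2)
        cos = (R.diagonal(dim1=1, dim2=2).sum(-1) - 1.0) / 2.0
        return torch.acos(torch.clamp(cos, -1.0 + 1e-6, 1.0 - 1e-6))

    def latent_regularity_loss(z1, z2, R1, R2, sigma=0.2):
        # Weight each image pair by how close its rotations are, so that nearby
        # rotations are pushed to nearby latent codes (assumed form of the prior).
        w = torch.exp(-geodesic_distance(R1, R2) ** 2 / (2 * sigma ** 2))
        return (w * (z1 - z2).pow(2).sum(dim=1)).mean()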
|
|
10:30-10:45, Paper WeAT1.3 | |
>QSRNet: Estimating Qualitative Spatial Representations from RGB-D Images |
> Video Attachment
|
|
Lee, Sang Uk | Massachusetts Institute of Technology |
Hong, Sungkweon | Massachusetts Institute of Technology |
Hofmann, Andreas | MIT |
Williams, Brian | MIT |
Keywords: AI-Based Methods, Cognitive Human-Robot Interaction, Computer Vision for Other Robotic Applications
Abstract: Humans perceive and describe their surroundings with qualitative statements (e.g., "Alice's hand is in contact with a bottle."), rather than quantitative values (e.g., 6-D poses of Alice's hand and a bottle). Qualitative spatial representation (QSR) is a framework that represents the spatial information of objects in a qualitative manner. Region connection calculus (RCC), qualitative trajectory calculus (QTC), and qualitative distance calculus (QDC) are some popular QSR calculi. With the recent development of computer vision, it is important to compute QSR calculi from the visual inputs (e.g., RGB-D images). In fact, many QSR application domains (e.g., human activity recognition (HAR) in robotics) involve visual inputs. We propose a qualitative spatial representation network (QSRNet) that computes the three QSR calculi (i.e., RCC, QTC, and QDC) from the RGB-D images. QSRNet has the following novel contributions. First, QSRNet models the dependencies among the three QSR calculi. We introduce the dependencies as kinematics for QSR because they are analogous to the kinematics in classical mechanics. Second, QSRNet applies the 3-D point cloud instance segmentation to compute the QSR calculi. The experimental results show that QSRNet improves the accuracy in comparison to the other state-of-the-art techniques.
|
|
10:45-11:00, Paper WeAT1.4 | |
>Acquiring Mechanical Knowledge from 3D Point Clouds |
> Video Attachment
|
|
Li, Zijia | University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Deep Learning for Visual Perception, Representation Learning, Deep Learning in Grasping and Manipulation
Abstract: We consider the problem of acquiring mechanical knowledge through visual cues to help robots use objects in new situations. In this work, we propose a novel deep learning approach that allows a robot to acquire mechanical knowledge from 3D point clouds. This presents two main challenges. The first challenge is that a robot needs to infer novel objects’ functions from its experience. Second, the robot also needs to know how to manipulate these novel objects. To solve these problems, we present a two-branch deep neural network. The first branch detects function parts from the point clouds, while the second branch predicts offset poses. Fusing the results from these two branches, our approach can not only detect what functions the novel objects may have but also generate key object states which can be used to guide a robot to manipulate these objects. We show that even though most of the training samples are synthetic data, our model still learns useful features and outputs proper results. Finally, we evaluate our approach on a real robot to run a series of tasks. The experimental results show that our approach has the capability to transfer mechanical knowledge to new situations.
|
|
11:00-11:15, Paper WeAT1.5 | |
>Learning Visual Policies for Building 3D Shape Categories |
> Video Attachment
|
|
Pashevich, Alexander | INRIA Grenoble Rhone-Alpes |
Kalevatykh, Igor | INRIA |
Laptev, Ivan | INRIA |
Schmid, Cordelia | Inria |
Keywords: Deep Learning in Grasping and Manipulation, Visual Learning, Assembly
Abstract: Manipulation and assembly tasks require non-trivial planning of actions depending on the environment and the final goal. Previous work in this domain often assembles particular instances of objects from known sets of primitives. In contrast, we aim to handle varying sets of primitives and to construct different objects of a shape category. Given a single object instance of a category, e.g. an arch, and a binary shape classifier, we learn a visual policy to assemble other instances of the same category. In particular, we propose a disassembly procedure and learn a state policy that discovers new object instances and their assembly plans in state space. We then render simulated states in the observation space and learn a heatmap representation to predict alternative actions from a given input image. To validate our approach, we first demonstrate its efficiency for building object categories in state space. We then show the success of our visual policies for building arches from different primitives. Moreover, we demonstrate (i) the reactive ability of our method to re-assemble objects using additional primitives and (ii) the robust performance of our policy for unseen primitives resembling building blocks used during training. Our visual assembly policies are trained with no real images and reach up to 95% success rate when evaluated on a real robot.
|
|
WeAT2 |
Room T2 |
Cognitive Control Architectures |
Regular session |
Chair: Ramirez-Amaro, Karinne | Chalmers University of Technology |
Co-Chair: Boularias, Abdeslam | Rutgers University |
|
10:00-10:15, Paper WeAT2.1 | |
>The Robot As Scientist: Using Mental Simulation to Test Causal Hypotheses Extracted from Human Activities in Virtual Reality |
> Video Attachment
|
|
Uhde, Constantin | Technical University of Munich |
Berberich, Nicolas | Technical University of Munich |
Ramirez-Amaro, Karinne | Chalmers University of Technology |
Cheng, Gordon | Technical University of Munich |
Keywords: Cognitive Control Architectures, AI-Based Methods, Learning from Demonstration
Abstract: To act effectively in its environment, a cognitive robot needs to understand the causal dependencies of all intermediate actions leading up to its goal. For example, the system has to infer that it is instrumental to open a cupboard door before trying to grasp an object inside the cupboard. In this paper, we introduce a novel learning method for extracting instrumental dependencies by following the scientific cycle of observations, generation of causal hypotheses and testing through experiments. Our method uses a virtual reality dataset containing observations from human activities to generate hypotheses about causal dependencies between actions. It detects pairs of actions with a high temporal co-occurrence and verifies if one action is instrumental in executing the other action through mental simulation in a virtual reality environment which represents the system's mental model. Our approach is able to extract all present instrumental action dependencies while significantly reducing the search space for mental simulation, resulting in a 6-fold reduction in computational time.
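
As a rough illustration of the hypothesis-generation step (the data format and window size below are assumptions, not the paper's), pairs of actions that frequently co-occur within a time window can be counted and proposed as candidate instrumental dependencies to be tested in mental simulation:

    from collections import Counter
    from itertools import combinations

    def cooccurrence_counts(episodes, window=10.0):
        # episodes: list of time-stamped [(t_start, action_label), ...] sequences
        counts = Counter()
        for events in episodes:
            for (t1, a1), (t2, a2) in combinations(sorted(events), 2):
                if a1 != a2 and (t2 - t1) <= window:
                    counts[(a1, a2)] += 1   # a1 was observed shortly before a2
        return counts

    # Frequent pairs become hypotheses "a1 is instrumental for a2", which are then
    # verified by simulating a2 with and without first executing a1.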
|
|
10:15-10:30, Paper WeAT2.2 | |
>Learning Transition Models with Time-Delayed Causal Relations |
> Video Attachment
|
|
Liang, Junchi | Rutgers University |
Boularias, Abdeslam | Rutgers University |
Keywords: Cognitive Control Architectures, Representation Learning, Reinforcement Learning
Abstract: This paper introduces an algorithm for discovering implicit and delayed causal relations between events observed by a robot at arbitrary times, with the objective of improving data-efficiency and interpretability of model-based reinforcement learning (RL) techniques. The proposed algorithm initially predicts observations with the Markov assumption, and incrementally introduces new hidden variables to explain and reduce the stochasticity of the observations. The hidden variables are memory units that keep track of pertinent past events. Such events are systematically identified by their information gains. The learned transition and reward models are then used for planning. Experiments on simulated and real robotic tasks show that this method significantly improves over current RL techniques.
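
A minimal sketch of the event-selection criterion, assuming discrete observations (the variable names and exact scoring are illustrative, not taken from the paper): a candidate past event is scored by how much knowing it reduces the entropy of the next observation:

    import numpy as np

    def entropy(p):
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    def information_gain(next_obs, past_event):
        # next_obs, past_event: integer arrays with one entry per time step
        _, counts = np.unique(next_obs, return_counts=True)
        h = entropy(counts / counts.sum())
        h_cond = 0.0
        for v in np.unique(past_event):
            mask = past_event == v
            _, c = np.unique(next_obs[mask], return_counts=True)
            h_cond += mask.mean() * entropy(c / c.sum())
        return h - h_cond   # high-gain events are promoted to memory units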
|
|
10:30-10:45, Paper WeAT2.3 | |
>Manipulation Planning Using Object-Centered Predicates and Hierarchical Decomposition of Contextual Actions |
> Video Attachment
|
|
Agostini, Alejandro | Technical University of Munich |
Saveriano, Matteo | University of Innsbruck |
Lee, Dongheui | Technical University of Munich |
Piater, Justus | University of Innsbruck |
Keywords: AI-Based Methods, Cognitive Control Architectures, Manipulation Planning
Abstract: Current approaches combining task and motion planning require intensive geometric and symbolic reasoning to find feasible motions for task execution. The poor expressiveness of task planning domains for characterizing geometric changes with actions, and the difficulties current approaches face in efficiently identifying motion dependencies for plan execution, produce expensive calls to motion planning for infeasible actions and intensive reasoning to find realizable plans. In this work, we combine two recent approaches to address these problems. Task planning is carried out using an object-centered description of geometric relations that consistently characterizes changes in the object configuration space. Plan execution is implemented using a symbol-to-motion hierarchical decomposition that depends on consecutive actions in the plan, rather than on single actions, which permits considering motion dependencies across plan actions for a successful execution.
|
|
10:45-11:00, Paper WeAT2.4 | |
>Convergence Analysis of Hybrid Control Systems in the Form of Backward Chained Behavior Trees |
|
Ogren, Petter | Royal Institute of Technology (KTH) |
Keywords: Behavior-Based Systems, Hybrid Logical/Dynamical Planning and Verification, Cognitive Control Architectures
Abstract: A robot control system is often composed of a set of low level continuous controllers and a switching policy that decides which of those continuous controllers to apply at each time instant. The switching policy can be either a Finite State Machine (FSM), or a Behavior Tree (BT). In previous work we have shown how to create BTs using a backward chained approach that results in a reactive goal directed policy. This policy can be thought of as providing disturbance rejection at the task level in the sense that if a disturbance changes the state in such a way that the currently running continuous controller cannot handle it, the policy will switch to the appropriate continuous controller. In this paper we show how to provide convergence guarantees for such policies.
|
|
11:00-11:15, Paper WeAT2.5 | |
>Going Cognitive: A Demonstration of the Utility of Task-General Cognitive Architectures for Adaptive Robotic Task Performance |
> Video Attachment
|
|
Frasca, Tyler | Tufts University |
Han, Zhao | UMass Lowell |
Allspaw, Jordan | University of Massachusetts Lowell |
Yanco, Holly | UMass Lowell |
Scheutz, Matthias | Tufts University |
Keywords: Cognitive Control Architectures, Control Architectures and Programming, Distributed Robot Systems
Abstract: It has been claimed that a main advantage of cognitive architectures (compared to other types of specialized robotic architectures) is that they are task-general and can thus learn to perform any task as long as they have the right perceptual and action primitives. In this paper, we provide empirical evidence for this claim by directly comparing a high-performing custom robotic architecture developed for the standardized robotic “FetchIt!” challenge task to a hybrid cognitive robotic architecture that allows for online one-shot task learning and task modifications through natural language instructions. The results show that there is no disadvantage of running the hybrid architecture (i.e., no significant difference in overall performance or computational overhead compared to the custom architecture) while adding the flexibility of online one-shot task instruction and modification not available in the custom architecture.
|
|
11:15-11:30, Paper WeAT2.6 | |
>Robotic Episodic Cognitive Learning Inspired by Hippocampal Spatial Cells |
|
Zou, Qiang | Dalian University of Technology |
Cong, Ming | Dalian University of Technology |
Liu, Dong | Dalian University of Technology |
Du, Yu | Dalian Dahuazhongtian Technology Co., Ltd |
Lyu, Zhi | Dalian University of Technology |
Keywords: Biomimetics, Mapping, Cognitive Control Architectures
Abstract: This paper presents a robotic episodic cognitive learning framework based on the biological cognitive mechanism of hippocampal spatial cells. By emphasizing the cognition process and episodic memory in the brain, the framework adopts velocity-modulated grid cells and place cells to afford the robot position cognition, abstracts state neurons to represent the robotic state, and uses the state neurons’ activity and connections to simulate the episodic memory construction process. The episodic memory is formed by a sequence of particular events consisting of visual features, state neuron, phase, and pose information. Besides, an episodic-cognitive map building approach based on this framework is proposed, which performs closed-loop correction by resetting the spatial cells’ phase to keep the map accurate. The episodic-cognitive map built in this paper is a topological metric map describing the topological relations of the particular events’ coordinates in the unknown environment. The framework is applied on a mobile robot platform, and the robotic episodic cognitive learning and episodic-cognitive map building approach are investigated. The robotic experiments demonstrate that the framework can effectively achieve incremental accumulative learning, update the robot’s spatial cognition of the environment, and construct the episodic-cognitive map.
|
|
WeAT3 |
Room T3 |
Visual Learning I |
Regular session |
Chair: Admoni, Henny | Carnegie Mellon University |
Co-Chair: Verbelen, Tim | Ghent University - Imec |
|
10:00-10:15, Paper WeAT3.1 | |
>Uncertainty-Aware Self-Supervised 3D Data Association |
|
Wang, Jianren | Carnegie Mellon University |
Ancha, Siddharth | Carnegie Mellon University |
Chen, Yi-Ting | Honda Research Institute USA |
Held, David | Carnegie Mellon University |
Keywords: Visual Learning, Visual Tracking
Abstract: 3D object trackers usually require training on large amounts of annotated data that is expensive and time-consuming to collect. Instead, we propose leveraging vast unlabeled datasets by self-supervised metric learning of 3D object trackers, with a focus on data association. Large scale annotations for unlabeled data are cheaply obtained by automatic object detection and association across frames. We show how these self-supervised annotations can be used in a principled manner to learn point-cloud embeddings that are effective for 3D tracking. We estimate and incorporate uncertainty in self-supervised tracking to learn more robust embeddings, without needing any labeled data. We design embeddings to differentiate objects across frames, and learn them using uncertainty-aware self-supervised training. Finally, we demonstrate their ability to perform accurate data association across frames, towards effective and accurate 3D tracking.
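
The data-association step can be pictured with a short sketch (shapes and threshold are assumptions; the paper additionally weights matches by estimated uncertainty): detections in consecutive frames are matched by maximizing embedding similarity:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def associate(emb_prev, emb_curr, threshold=0.5):
        # emb_prev: (M, D), emb_curr: (N, D) L2-normalized detection embeddings
        sim = emb_prev @ emb_curr.T                 # cosine similarity matrix
        rows, cols = linear_sum_assignment(-sim)    # maximize total similarity
        return [(r, c) for r, c in zip(rows, cols) if sim[r, c] > threshold]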
|
|
10:15-10:30, Paper WeAT3.2 | |
>F-Siamese Tracker: A Frustum-Based Double Siamese Network for 3D Single Object Tracking |
> Video Attachment
|
|
Zou, Hao | Zhejiang University |
Cui, Jinhao | Zhejiang University |
Kong, Xin | Zhejiang University |
Zhang, Chujuan | Zhejiang University |
Liu, Yong | Zhejiang University |
Wen, Feng | Huawei Technologies Co., Ltd |
Li, Wanlong | Beijing Huawei Digital Technologies Co., Ltd |
Keywords: Visual Tracking, Deep Learning for Visual Perception, Sensor Fusion
Abstract: This paper presents F-Siamese Tracker, a novel approach for single object tracking prominently characterized by more robustly integrating 2D and 3D information to reduce redundant search space. A main challenge in 3D single object tracking is how to reduce the search space for generating appropriate 3D candidates. Instead of solely relying on 3D proposals, our method first leverages the Siamese network applied on RGB images to produce 2D region proposals, which are then extruded into 3D viewing frustums. Besides, we perform an online accuracy validation on the 3D frustum to generate a refined point cloud search space, which can be embedded directly into the existing 3D tracking backbone. For efficiency, our approach gains better performance with fewer candidates by reducing the search space. In addition, benefiting from the online accuracy validation, our approach can still achieve high precision in occasional cases with strong occlusions or very sparse points, even when the 2D Siamese tracker loses the target. This approach allows us to set a new state-of-the-art in 3D single object tracking by a significant margin on a sparse outdoor dataset (KITTI tracking). Moreover, experiments on 2D single object tracking show that our framework boosts 2D tracking performance as well.
|
|
10:30-10:45, Paper WeAT3.3 | |
>Segmenting the Future |
|
Chiu, Hsu-kuang | Stanford University |
Adeli, Ehsan | Stanford University |
Niebles, Juan Carlos | Stanford University |
Keywords: Visual Learning, Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: Predicting the future is an important aspect for decision-making in robotics or autonomous driving systems, which heavily rely upon visual scene understanding. While prior work attempts to predict future video pixels, anticipate activities or forecast future scene semantic segments from segmentation of the preceding frames, methods that predict future semantic segmentation solely from the previous frame RGB data in a single end-to-end trainable model do not exist. In this paper, we propose a temporal encoder-decoder network architecture that encodes RGB frames from the past and decodes the future semantic segmentation. The network is coupled with a new knowledge distillation training framework specific for the forecasting task. Our method, only seeing preceding video frames, implicitly models the scene segments while simultaneously accounting for the object dynamics to infer the future scene semantic segments. Our results on Cityscapes and Apolloscape outperform the baseline and current state-of-the-art methods. Code will be available soon.
|
|
10:45-11:00, Paper WeAT3.4 | |
>Anomaly Detection for Autonomous Guided Vehicles Using Bayesian Surprise |
> Video Attachment
|
|
Catal, Ozan | Ghent University |
Leroux, Sam | Ghent University |
De Boom, Cedric | Ghent University |
Verbelen, Tim | Ghent University - Imec |
Dhoedt, Bart | Ghent University - Imec |
Keywords: Visual Learning, Representation Learning, Deep Learning for Visual Perception
Abstract: As warehouses, storage facilities and factories become more expanded and equipped with smart devices, there is a substantial need for rapid, intelligent and autonomous detection of unusual and potentially hazardous situations, also called anomalies. In particular for Autonomous Guided Vehicles (AGVs) that drive around these premises independently, unforeseen obstructions along their path---e.g.~a cardboard box in the middle of a corridor or bumps in the floor---and sudden or unexpected actions executed by personnel---e.g.~someone walking in a restricted area---make it hard for AGVs to navigate safely. We therefore propose a novel approach to detect such anomalies in an unsupervised manner by measuring Bayesian surprise: whenever an event is observed that does not align with the agent's prior knowledge of the world, this event is deemed surprising and could indicate an anomaly. This paper lays out the details on how to learn both the prior and posterior models of an AGV that drives around a warehouse and observes the environment through an RGBD camera. In the experiments we show that our Bayesian surprise approach outperforms a baseline that is traditionally used to detect anomalies in sequences of images.
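
The core quantity can be sketched as follows (a generic formulation under the assumption of diagonal-Gaussian prior and posterior beliefs; the actual models in the paper are learned from the RGBD stream):

    import numpy as np

    def gaussian_kl(mu_q, var_q, mu_p, var_p):
        # KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal covariances
        return 0.5 * np.sum(np.log(var_p / var_q)
                            + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

    def is_anomaly(posterior, prior, threshold):
        # posterior, prior: (mean, variance) tuples from the learned models
        surprise = gaussian_kl(*posterior, *prior)
        return surprise > threshold   # threshold calibrated on anomaly-free runs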
|
|
11:00-11:15, Paper WeAT3.5 | |
>3DMotion-Net: Learning Continuous Flow Function for 3D Motion Prediction |
|
Yuan, Shuaihang | New York University |
Li, Xiang | New York University, Abu Dhabi |
Tzes, Anthony | New York University Abu Dhabi |
Fang, Yi | New York University |
Keywords: Visual Tracking, Deep Learning for Visual Perception, Deep Learning in Grasping and Manipulation
Abstract: This paper deals with predicting future 3D motions of 3D object scans from the previous two consecutive frames. Previous methods mostly focus on sparse motion prediction in the form of skeletons. While in this paper, we focus on predicting dense 3D motions in the form of 3D point clouds. To approach this problem, we propose a self-supervised approach that leverages the power of the deep neural network to learn a continuous flow function of 3D point clouds that can predict temporally consistent future motions and naturally bring out the correspondences among consecutive point clouds at the same time. More specifically, in our approach, to eliminate the unsolved and challenging process of defining a discrete point convolution on 3D point cloud sequences to encode spatial and temporal information, we introduce a learnable latent code to represent the temporal-aware shape descriptor, which is optimized during the model training. Moreover, a temporally consistent motion Morpher is proposed to learn a continuous flow field which deforms a 3D scan from the current frame to the next frame. We perform extensive experiments on D-FAUST, SCAPE, and TOSCA benchmark data sets. The results demonstrate that our approach is capable of handling temporally inconsistent input and produces consistent future 3D motion while requiring no ground truth supervision.
|
|
11:15-11:30, Paper WeAT3.6 | |
>Learning Vision-Based Physics Intuition Models for Non-Disruptive Object Extraction |
|
Ahuja, Sarthak | Carnegie Mellon University |
Admoni, Henny | Carnegie Mellon University |
Steinfeld, Aaron | Carnegie Mellon University |
Keywords: Visual Learning, Robot Safety, Deep Learning for Visual Perception
Abstract: Robots operating in human environments must be careful, when executing their manipulation skills, not to disturb nearby objects. This requires robots to reason about the effect of their manipulation choices by accounting for the support relationships among objects in the scene. Humans do this in part by visually assessing their surroundings and using physics intuition for how likely it is that a particular object can be safely manipulated (i.e., cause no disruption in the rest of the scene). Existing work has shown that deep convolutional neural networks can learn intuitive physics over images generated in simulation and determine the stability of a scene in the real world. In this paper, we extend these physics intuition models to the task of assessing safe object extraction by conditioning the visual images on specific objects in the scene. Our results, in both simulation and real-world settings, show that with our proposed method, physics intuition models can be used to inform a robot of which objects can be safely extracted and from which direction to extract them.
|
|
WeAT4 |
Room T4 |
Visual Learning II |
Regular session |
Chair: Frintrop, Simone | University of Hamburg |
|
10:00-10:15, Paper WeAT4.1 | |
>Spectral-GANs for High-Resolution 3D Point-Cloud Generation |
|
Ramasinghe, Sameera Chandimal | The Australian National University |
Khan, Salman | CSIRO |
Barnes, Nick | National ICT Australia |
Gould, Stephen | Australian National University |
Keywords: Computer Vision for Other Robotic Applications, Novel Deep Learning Methods, Computer Vision for Automation
Abstract: Point-clouds are a popular choice for robotics and computer vision tasks due to their accurate shape description and direct acquisition from range-scanners. This demands the ability to synthesize and reconstruct high-quality point-clouds. Current deep generative models for 3D data generally work on simplified representations (e.g., voxelized objects) and cannot deal with the inherent redundancy and irregularity in point-clouds. A few recent efforts on 3D point-cloud generation offer limited resolution, and their complexity grows with the increase in output resolution. In this paper, we develop a principled approach to synthesize 3D point-clouds using a spectral-domain Generative Adversarial Network (GAN). Our spectral representation is highly structured and allows us to disentangle various frequency bands such that the learning task is simplified for a GAN model. As compared to spatial-domain generative approaches, our formulation allows us to generate high-resolution point-clouds with minimal computational overhead. Furthermore, we propose a fully differentiable block to transform from the spectral to the spatial domain and back, thereby allowing us to integrate knowledge from well-established spatial models. We demonstrate that Spectral-GAN performs well for the point-cloud generation task. Additionally, it can learn a highly discriminative representation in an unsupervised fashion and can be used to accurately reconstruct 3D objects. Our code is available at https://github.com/samgregoost/Spectral-GAN/
|
|
10:15-10:30, Paper WeAT4.2 | |
>UnRectDepthNet: Self-Supervised Monocular Depth Estimation Using a Generic Framework for Handling Common Camera Distortion Models |
> Video Attachment
|
|
Ravi Kumar, Varun | Valeo |
Yogamani, Senthil | Home |
Bach, Markus | Valeo |
Witt, Christian | Valeo |
Milz, Stefan | Valeo Schalter Und Sensoren GmbH |
Mäder, Patrick | Technische Universität Ilmenau |
Keywords: Omnidirectional Vision, Computer Vision for Automation, Deep Learning for Visual Perception
Abstract: In classical computer vision, rectification is an integral part of multi-view depth estimation. It typically includes epipolar rectification and lens distortion correction. This process simplifies the depth estimation significantly, and thus it has been adopted in CNN approaches. However, rectification has several side effects, including a reduced field-of-view (FOV), resampling distortion, and sensitivity to calibration errors. The effects are particularly pronounced in case of significant distortion (e.g., wide-angle fisheye cameras). In this paper, we propose a generic scale-aware self-supervised pipeline for estimating depth, Euclidean distance, and visual odometry from unrectified monocular videos. We demonstrate a level of precision on the unrectified KITTI dataset with barrel distortion comparable to that on the rectified KITTI dataset. The intuition is that the rectification step can be implicitly absorbed within the CNN model, which learns the distortion model without increasing complexity. Our approach does not suffer from a reduced field of view and avoids computational costs for rectification at inference time. To further illustrate the general applicability of the proposed framework, we apply it to wide-angle fisheye cameras with a 190-degree horizontal field-of-view (FOV). The training framework UnRectDepthNet takes in the camera distortion model as an argument and adapts the projection and unprojection functions accordingly. The proposed algorithm is evaluated further on the KITTI dataset, and we achieve state-of-the-art results that improve upon our previous work FisheyeDistanceNet. Qualitative results on a distorted test scene video sequence indicate excellent performance: https://youtu.be/K6pbx3bU4Ss.
|
|
10:30-10:45, Paper WeAT4.3 | |
>AutoLay: Benchmarking Amodal Layout Estimation for Autonomous Driving |
> Video Attachment
|
|
Mani, Kaustubh | IIIT-Hyderabad |
Narasimhan, Sai Shankar | IIIT Hyderabad |
Jatavallabhula, Krishna | Mila, Universite De Montreal |
Krishna, Madhava | IIIT Hyderabad |
Keywords: Computer Vision for Other Robotic Applications, Deep Learning for Visual Perception, Autonomous Vehicle Navigation
Abstract: Given an image or a video captured from a monocular camera, amodal layout estimation is the task of predicting semantics and occupancy in bird’s eye view. The term amodal implies we also reason about entities in the scene that are occluded or truncated in image space. While several recent efforts have tackled this problem, there is a lack of standardization in task specification, datasets, and evaluation protocols. We address these gaps with AutoLay, a dataset and benchmark for amodal layout estimation from monocular images. AutoLay encompasses driving imagery from two popular datasets: KITTI Tracking [1] and Argoverse [2]. In addition to fine-grained attributes such as lanes, sidewalks, and vehicles, we also provide semantically annotated 3D pointclouds. We implement several baselines and bleeding-edge approaches, and release our data and code.
|
|
10:45-11:00, Paper WeAT4.4 | |
>Data-Driven Distributed State Estimation and Behavior Modeling in Sensor Networks |
|
Yu, Rui | Pennsylvania State University |
Yuan, Zhenyuan | The Pennsylvania State University |
Zhu, Minghui | Pennsylvania State University |
Zhou, Zihan | The Pennsylvania State University |
Keywords: Sensor Networks, Behavior-Based Systems
Abstract: Nowadays, the prevalence of sensor networks has enabled tracking of the states of dynamic objects for a wide spectrum of applications from autonomous driving to environmental monitoring and urban planning. However, tracking real-world objects often faces two key challenges: First, due to the limitation of individual sensors, state estimation needs to be solved in a collaborative and distributed manner. Second, the objects’ movement behavior model is unknown, and needs to be learned using sensor observations. In this work, for the first time, we formally formulate the problem of simultaneous state estimation and behavior learning in a sensor network. We then propose a simple yet effective solution to this new problem by extending the Gaussian process-based Bayes filters (GP-BayesFilters) to an online, distributed setting. The effectiveness of the proposed method is evaluated on tracking objects with unknown movement behaviors using both synthetic data and data collected from a multi-robot platform.
|
|
11:00-11:15, Paper WeAT4.5 | |
>ProxEmo: Gait-Based Emotion Learning and Multi-View Proxemic Fusion for Socially-Aware Robot Navigation |
> Video Attachment
|
|
Narayanan, Venkatraman | UMD |
Sai Sudhakar, Bala Murali Manoghar | University of Maryland, College Park |
Dorbala, Vishnu Sashank | University of Maryland, College Park |
Manocha, Dinesh | University of Maryland |
Bera, Aniket | University of Maryland |
Keywords: Computer Vision for Other Robotic Applications, Gesture, Posture and Facial Expressions, Social Human-Robot Interaction
Abstract: We present ProxEmo, a novel end-to-end emotion prediction algorithm for socially aware robot navigation among pedestrians. Our approach predicts the perceived emotions of a pedestrian from walking gaits, which are then used for emotion-guided navigation taking into account social and proxemic constraints. To classify emotions, we propose a multi-view skeleton graph convolution-based model that works on a commodity camera mounted onto a moving robot. Our emotion recognition is integrated into a mapless navigation scheme and makes no assumptions about the environment of pedestrian motion. It achieves a mean average emotion prediction precision of 82.47% on the Emotion-Gait benchmark dataset. We outperform current state-of-the-art algorithms for emotion recognition from 3D gaits. We highlight its benefits in terms of navigation in indoor scenes using a Clearpath Jackal robot.
|
|
11:15-11:30, Paper WeAT4.6 | |
>Gaussian Process Online Learning with a Sparse Data Stream |
|
Park, Jehyun | Yonsei University |
Choi, Jongeun | Yonsei University |
Keywords: Sensor-based Control, Probability and Statistical Methods, Optimization and Optimal Control
Abstract: Gaussian processes (GPs) have been exploited for various applications even including online learning. To learn time-varying hyperparameters from an information-limited sparse data stream, we consider the infinite-horizon Gaussian process (IHGP) with a low computational complexity for online learning. For example, the IHGP framework could provide efficient GP online learning with a sparse data stream in mobile devices. However, we show that the originally proposed IHGP has difficulty in learning time-varying hyperparameters online from the sparse data stream due to the numerically approximated gradient of the marginal likelihood function. In this paper, we show how to extend the IHGP in order to learn time-varying hyperparameters using a sparse and non-stationary data stream. In particular, our solution approach offers the exact gradient as the solution of a Lyapunov equation. Therefore, our approach achieves better performance with a sparse data stream while still keeping the computational complexity low. Finally, we present the comparison results to demonstrate that our approach outperforms the originally proposed IHGP on practical applications with sparse data streams. To demonstrate the effectiveness of our approach, we consider a multi-rate sensor fusion or an interpolation problem where slow vision systems need to be combined with other fast sensory units for feedback control in the field of autonomous driving. In particular, we apply our approach to vehicle lateral position error estimation together with a deep learning model for autonomous driving using non-stationary lateral position error signals in a model-free and data-driven fashion.
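
For readers unfamiliar with the state-space view of GPs, the kind of Lyapunov computation involved can be sketched as below (a generic example for a Matern-3/2 prior; it illustrates only the type of equation and is not the paper's gradient derivation):

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    def stationary_covariance(F, Qc):
        # Solve F P + P F^T + Qc = 0 for the stationary state covariance P.
        return solve_continuous_lyapunov(F, -Qc)

    # Matern-3/2 kernel with lengthscale l and variance s2 in state-space form
    l, s2 = 1.0, 1.0
    lam = np.sqrt(3.0) / l
    F = np.array([[0.0, 1.0], [-lam**2, -2.0 * lam]])
    Qc = np.array([[0.0, 0.0], [0.0, 4.0 * lam**3 * s2]])
    P_inf = stationary_covariance(F, Qc)   # equals diag(s2, lam**2 * s2)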
|
|
WeAT5 |
Room T5 |
Semantic Scene Understanding I |
Regular session |
Chair: Indelman, Vadim | Technion - Israel Institute of Technology |
Co-Chair: Murillo, Ana Cristina | University of Zaragoza |
|
10:00-10:15, Paper WeAT5.1 | |
>Semantic Graph Based Place Recognition for 3D Point Clouds |
> Video Attachment
|
|
Kong, Xin | Zhejiang University |
Yang, Xuemeng | Zhejiang University |
Zhai, Guangyao | Zhejiang University |
Zhao, Xiangrui | Zhejiang University |
Zeng, Xianfang | Zhejiang University |
Wang, Mengmeng | Zhejiang University |
Liu, Yong | Zhejiang University |
Li, Wanlong | Beijing Huawei Digital Technologies Co., Ltd |
Wen, Feng | Huawei Technologies Co., Ltd |
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception, SLAM
Abstract: Due to the difficulty in generating the effective descriptors which are robust to occlusion and viewpoint changes, place recognition for 3D point cloud remains an open issue. Unlike most of the existing methods that focus on extracting local, global, and statistical features of raw point clouds, our method aims at the semantic level that can be superior in terms of robustness to environmental changes. Inspired by the perspective of humans, who recognize scenes through identifying semantic objects and capturing their relations, this paper presents a novel semantic graph based approach for place recognition. First, we propose a novel semantic graph representation for the point cloud scenes by reserving the semantic and topological information of the raw point cloud. Thus, place recognition is modeled as a graph matching problem. Then we design a fast and effective graph similarity network to compute the similarity. Exhaustive evaluations on the KITTI dataset show that our approach is robust to the occlusion as well as viewpoint changes and outperforms the state-of-the-art methods with a large margin. Our code is available at: https://github.com/kxhit/SG_PR.
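
The graph construction can be pictured with a short sketch (it assumes per-point semantic and instance labels and a simple distance rule for edges; the paper's representation and the learned similarity network are richer than this):

    import numpy as np

    def build_semantic_graph(points, sem_labels, inst_labels, radius=5.0):
        # points: (N, 3); sem_labels, inst_labels: (N,) integer label arrays
        nodes = []              # one (semantic class, centroid) node per instance
        for inst in np.unique(inst_labels):
            mask = inst_labels == inst
            centroid = points[mask].mean(axis=0)
            cls = np.bincount(sem_labels[mask]).argmax()
            nodes.append((cls, centroid))
        edges = [(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))
                 if np.linalg.norm(nodes[i][1] - nodes[j][1]) < radius]
        return nodes, edges     # place recognition then compares two such graphs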
|
|
10:15-10:30, Paper WeAT5.2 | |
>A Bottom-Up Framework for Construction of Structured Semantic 3D Scene Graph |
> Video Attachment
|
|
Yu, Bangguo | Shandong University |
Chen, Chongyu | Sun Yat-Sen University |
Zhou, Fengyu | Shandong University |
Wan, Fang | Shandong University |
Zhuang, Wenmi | Shandong University |
Zhao, Yang | School of Electrical Engineering and Automation, Qilu University |
Keywords: Semantic Scene Understanding, Cognitive Human-Robot Interaction
Abstract: For high-level human-robot interaction tasks, 3D scene understanding is important but non-trivial for autonomous robots. Parsing and utilizing effective environment information of the 3D scene is difficult due to the complexity of the 3D environment and the limited ability to reason about our visual world. Although there have been great efforts on semantic detection and scene analysis, the existing solutions for parsing and representing the 3D scene still fail to preserve accurate semantic information and lack sufficient applicability. This study proposes a bottom-up construction framework for structured 3D scene graph generation, which efficiently describes the objects, relations, and attributes of the 3D indoor environment with a structured representation. In the proposed method, we adopt visual perception to capture the semantic information and inference from scene priors to calculate the optimal parse graph. Afterwards, an improved probabilistic grammar model is used to represent the scene priors. Experimental results demonstrate that the proposed framework significantly outperforms existing methods in terms of accuracy, and a demonstration is provided to verify its applicability to high-level human-robot interaction tasks. The supplementary video can be accessed at the following link: https://youtu.be/vEWNxnSwmKI.
|
|
10:30-10:45, Paper WeAT5.3 | |
>Autonomous Detection and Assessment with Moving Sensors |
> Video Attachment
|
|
Spencer, Steven J. | Sandia National Laboratories |
Parikh, Anup | Sandia National Laboratories |
McArthur, Daniel | Sandia National Laboratories |
Young, Carol | Sandia National Laboratories |
Blada, Timothy | Sandia National Laboratories |
Slightam, Jonathon E. | Sandia National Laboratories |
Buerger, Stephen P. | Sandia National Laboratories |
Keywords: Surveillance Systems, Perception-Action Coupling, Reactive and Sensor-Based Planning
Abstract: Current approaches to physical security suffer from high false alarm rates and frequent human operator involvement, despite the relative rarity of real-world threats. We present a novel architecture for autonomous adaptive physical security called autonomous detection and assessment with moving sensors (ADAMS). ADAMS is a framework for reducing nuisance and false alarms by placing mobile robotic platforms equipped with sensors outside the normal asset perimeter. These robotic agents integrate sensor data from multiple perspectives over time, and autonomously move to obtain the best new data to reduce uncertainty in the threat scene. Inferences drawn from data fused over time provide ultimate decisions regarding whether to alert human operators. This paper describes the framework and algorithms used in a prototype ADAMS implementation. We describe the results of simulations comparing this framework to alternate paradigms. These simulations show ADAMS has a 4x increase in the range at which threats are identified versus traditional static sensors, and a 5x reduction in false alarms triggered versus frameworks where all sensor detections become alarms, leading to reduced operator load. Further, these simulations show this framework for reacting to new potential threats significantly outperforms methods which merely patrol the site. We also present the results of preliminary hardware trials of an exemplar prototype system, providing limited validation of the simulations in a real-time physical demonstration.
|
|
10:45-11:00, Paper WeAT5.4 | |
>Distributed Consistent Multi-Robot Semantic Localization and Mapping |
> Video Attachment
|
|
Tchuiev, Vladimir | Technion Israel Institute of Technology |
Indelman, Vadim | Technion - Israel Institute of Technology |
Keywords: Semantic Scene Understanding, Distributed Robot Systems, SLAM
Abstract: We present an approach for multi-robot consistent distributed localization and semantic mapping in an unknown environment, considering scenarios with classification ambiguity, where objects' visual appearance generally varies with viewpoint. Our approach addresses such a setting by maintaining a distributed posterior hybrid belief over continuous localization and discrete classification variables. In particular, we utilize a viewpoint-dependent classifier model to leverage the coupling between semantics and geometry. Moreover, our approach yields a consistent estimation of both continuous and discrete variables, with the latter being addressed for the first time, to the best of our knowledge. We evaluate the performance of our approach in a multi-robot semantic SLAM simulation and in a real-world experiment, demonstrating an increase in both classification and localization accuracy compared to maintaining a hybrid belief using local information only.
|
|
11:00-11:15, Paper WeAT5.5 | |
>RegionNet: Region-Feature-Enhanced 3D Scene Understanding Network with Dual Spatial-Aware Discriminative Loss |
|
Zhang, Guanghui | Shanghai Institute of Microsystem and Information Technology, Ch |
Zhu, Dongchen | Shanghai Institute of Microsystem and Information Technology, Chi |
Ye, Xiaoqing | Baidu Inc |
Shi, Wenjun | Shanghai Institute of Microsystem and Information Technology |
Chen, Minghong | Bionic Vision System Laboratory, State Key Laboratory of Transdu |
Li, Jiamao | Shanghai Institute of Microsystem and Information Technology, Chi |
Zhang, Xiaolin | Shanghai Institute of Microsystem and Information Technology, Chi |
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization
Abstract: Neural networks have recently achieved impressive success in semantic and instance segmentation on 2D images. However, their capabilities have not been fully explored to address semantic instance segmentation on unstructured 3D point cloud data. Digging into the regional feature representation to boost point cloud comprehension, we propose a region-feature-enhanced structure consisting of an adaptive regional feature complementary (ARFC) module and an affinity-based regional relational reasoning (AR^3) module. The ARFC module aims to complement low-level features of sparse regions adaptively. The AR^3 module emphasizes mining the potential reasoning relationships between high-level features based on affinity. Both the ARFC and AR^3 modules are plug-and-play. Besides, a novel dual spatial-aware discriminative loss is proposed to improve the discrimination of instance embeddings by incorporating location information. Our proposal-free point cloud instance segmentation network (RegionNet), equipped with the region-feature-enhanced structure and dual spatial-aware discriminative loss, achieves state-of-the-art performance on the S3DIS and ScanNet-v2 datasets.
|
|
11:15-11:30, Paper WeAT5.6 | |
>3D-MiniNet: Learning a 2D Representation from Point Clouds for Fast and Efficient 3D LIDAR Semantic Segmentation |
|
Alonso, Iñigo | University of Zaragoza |
Riazuelo, Luis | Instituto de Investigación en Ingeniería de Aragón, University of Zaragoza |
Montesano, Luis | Universidad De Zaragoza |
Murillo, Ana Cristina | University of Zaragoza |
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: LIDAR semantic segmentation is an essential task that provides 3D semantic information about the environment to robots. Fast and efficient semantic segmentation methods are needed to match the strong computational and temporal restrictions of many real-world robotic applications. This work presents 3D-MiniNet, a novel approach for LIDAR semantic segmentation that combines 3D and 2D learning layers. It first learns a 2D representation from the raw points through a novel projection which extracts local and global information from the 3D data. This representation is fed to an efficient 2D Fully Convolutional Neural Network (FCNN) that produces a 2D semantic segmentation. These 2D semantic labels are re-projected back to the 3D space and enhanced through a post-processing module. The main novelty in our strategy relies on the projection learning module. Our detailed ablation study shows how each component contributes to the final performance of 3D-MiniNet. We validate our approach on well known public benchmarks (SemanticKITTI and KITTI), where 3D-MiniNet gets state-of-the-art results while being faster and more parameter-efficient than previous methods.
|
|
WeAT6 |
Room T6 |
Semantic Scene Understanding II |
Regular session |
Chair: Lu, Peng | The Hong Kong Polytechnic University |
Co-Chair: Behley, Jens | University of Bonn |
|
10:00-10:15, Paper WeAT6.1 | |
>Domain Transfer for Semantic Segmentation of LiDAR Data Using Deep Neural Networks |
> Video Attachment
|
|
Langer, Ferdinand | University of Bonn |
Milioto, Andres | University of Bonn |
Haag, Alexandre | AID Driving |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: Computer Vision for Transportation, Object Detection, Segmentation and Categorization, Transfer Learning
Abstract: Inferring semantic information towards an understanding of the surrounding environment is crucial for autonomous vehicles to drive safely. Deep learning-based segmentation methods can infer semantic information directly from laser range data, even in the absence of other sensor modalities such as cameras. In this paper, we address improving the generalization capabilities of such deep learning models to range data that was captured using a different sensor and in situations where no labeled data is available for the new sensor setup. Our approach assists the domain transfer of a LiDAR-only semantic segmentation model to a different sensor and environment, exploiting existing geometric mapping systems. To this end, we fuse sequential scans in the source dataset into a dense mesh and render semi-synthetic scans that match those of the target sensor setup. Unlike simulation, this approach provides a real-to-real transfer of geometric information and additionally delivers more accurate remission information. We implemented and thoroughly tested our approach by transferring semantic scans between two different real-world datasets with different sensor setups. Our experiments show that we can improve the segmentation performance substantially with zero manual re-labeling. This approach solves the number one feature request since we released our semantic segmentation library LiDAR-bonnetal.
|
|
10:15-10:30, Paper WeAT6.2 | |
>On a Videoing Control System Based on Object Detection and Tracking |
> Video Attachment
|
|
Ren, Yanhao | Fudan University |
Wang, Yi | Fantasy Power (Shanghai) Culture Communication Co., Ltd |
Qi, Tang | Fantasy Power (Shanghai) Culture Communication Co., Ltd., Shangh |
Jiang, Haijun | Fantasy Power (Shanghai) Culture Communication Co., Ltd |
Lu, Wenlian | Fudan University |
Keywords: Planning, Scheduling and Coordination, Object Detection, Segmentation and Categorization
Abstract: In this paper, we propose a camera control system for occasionally videoing preassigned objects. Based on real-time visual detection and tracking, using the Kalman filter and re-identification (ReID), we propose continuous lens composition based on the atomic rules of shots and derive the trajectory planning of the camera, which drives a PID controller for the pan-tilt. Through both simulation and emulation via frame-wise cropping of video clips, we illustrate the efficiency of this method. Based on this model, we design and produce an AI automatic camera for lively photography and clip videoing.
|
|
10:30-10:45, Paper WeAT6.3 | |
>Understanding Dynamic Scenes Using Graph Convolution Networks |
> Video Attachment
|
|
Mylavarapu, Sravan | International Institute of Information Technology |
Sandhu, Mahtab | International Institute of Information Technology, Hyderabad |
Vijayan, Priyesh | McGill University |
Krishna, Madhava | IIIT Hyderabad |
Ravindran, Balaraman | IIT Madras |
Namboodiri, Anoop M. | International Institute of Information Technology |
Keywords: Computer Vision for Other Robotic Applications, Semantic Scene Understanding
Abstract: We present a novel Multi-Relational Graph Convolutional Network (MRGCN) based framework to model on-road vehicle behaviors from a sequence of temporally ordered frames captured by a moving monocular camera. The input to MRGCN is a multi-relational graph where the graph's nodes represent the active and passive agents/objects in the scene, and the bidirectional edges that connect every pair of nodes are encodings of their spatio-temporal relations. We show that this proposed explicit encoding and usage of an intermediate spatio-temporal interaction graph is well suited for our tasks, compared to learning end-to-end directly on a set of temporally ordered spatial relations. We also propose an attention mechanism for MRGCNs that, conditioned on the scene, dynamically scores the importance of information from different interaction types. The proposed framework achieves significant performance gains over prior methods on vehicle-behavior classification tasks on four datasets. We also show a seamless transfer of learning to multiple datasets without resorting to fine-tuning. Such behavior prediction methods find immediate relevance in a variety of navigation tasks such as behavior planning, state estimation, and applications relating to the detection of traffic violations over videos.
|
|
10:45-11:00, Paper WeAT6.4 | |
>Quadrotor-Enabled Autonomous Parking Occupancy Detection |
> Video Attachment
|
|
Wang, Yafeng | Texas Tech University |
Ren, Beibei | Texas Tech University |
Keywords: Surveillance Systems, Aerial Systems: Mechanics and Control, Computer Vision for Other Robotic Applications
Abstract: Large special-events parking involves various parking scenarios, e.g., temporary parking and on-street parking. Their occupancy detection is challenging as it is unrealistic to construct gates/stations for temporary parking areas or build a sensor-based detection system to cover every single street. To address this issue, this study develops a quadrotor-enabled autonomous parking occupancy detection system. A camera-equipped quadrotor is flying over the parking lot first; then the images are captured by the on-board camera of the quadrotor and transferred to the ground station; finally, the ground station will process and release the occupancy information to the driver’s mobile devices. The decision tree learning algorithm is adopted to determine the optimal flying speed for the quadrotor to balance the trade-off between the detection efficiency and accuracy. In order to tackle the complex environment in real-life parking, a convolutional neural network (CNN)-based vehicle detection model has been trained and implemented, where the realistic factors, e.g., passing pedestrians and tree blocking, are considered. Experiments are conducted to illustrate the effectiveness of the proposed system.
|
|
11:00-11:15, Paper WeAT6.5 | |
>Learning Consistency Pursued Correlation Filters for Real-Time UAV Tracking |
|
Fu, Changhong | Tongji University |
Yang, Xiaoxiao | Tongji University |
Li, Fan | Tongji University |
Xu, Juntao | Tongji University |
Liu, Changjing | Tongji University |
Lu, Peng | The Hong Kong Polytechnic University |
Keywords: Surveillance Systems, Visual-Based Navigation, Aerial Systems: Perception and Autonomy
Abstract: Correlation filter (CF)-based methods have demonstrated exceptional performance in visual object tracking for unmanned aerial vehicle (UAV) applications, but suffer from the undesirable boundary effect. To solve this issue, spatially regularized correlation filters (SRDCF) proposes the spatial regularization to penalize filter coefficients, thereby significantly improving the tracking performance. However, the temporal information hidden in the response maps is not considered in SRDCF, which limits the discriminative power and the robustness for accurate tracking. This work proposes a novel approach with dynamic consistency pursued correlation filters, i.e., the CPCF tracker. Specifically, through a correlation operation between adjacent response maps, a practical consistency map is generated to represent the consistency level across frames. By minimizing the difference between the practical and the scheduled ideal consistency map, the consistency level is constrained to maintain temporal smoothness, and rich temporal information contained in response maps is introduced. Besides, a dynamic constraint strategy is proposed to further improve the adaptability of the proposed tracker in complex situations. Comprehensive experiments are conducted on three challenging UAV benchmarks, i.e., UAV123@10FPS, UAVDT, and DTB70. Based on the experimental results, the proposed tracker favorably surpasses the other 25 state-of-the-art trackers with real-time running speed (~43FPS) on a single CPU.
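
The consistency idea can be illustrated with a minimal sketch (frequency-domain correlation of two response maps; the ideal consistency map, its scheduling, and the dynamic constraint strategy are specific to the paper and omitted here):

    import numpy as np

    def consistency_map(resp_prev, resp_curr):
        # resp_prev, resp_curr: (H, W) correlation-filter response maps
        F_prev = np.fft.fft2(resp_prev)
        F_curr = np.fft.fft2(resp_curr)
        corr = np.fft.ifft2(F_curr * np.conj(F_prev)).real
        return np.fft.fftshift(corr)   # peaked and centered when responses agree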
|
|
11:15-11:30, Paper WeAT6.6 | |
>Deep Context Maps: Agent Trajectory Prediction Using Location-Specific Latent Maps |
|
Gilitschenski, Igor | Massachusetts Institute of Technology |
Rosman, Guy | Massachusetts Institute of Technology |
Gupta, Arjun | MIT |
Karaman, Sertac | Massachusetts Institute of Technology |
Rus, Daniela | MIT |
Keywords: Semantic Scene Understanding, Autonomous Vehicle Navigation, Autonomous Agents
Abstract: In this paper, we propose a novel approach for agent motion prediction in cluttered environments. One of the main challenges in predicting agent motion is accounting for location and context-specific information. Our main contribution is the concept of learning context maps to improve the prediction task. Context maps are a set of location-specific latent maps that are trained alongside the predictor. Thus, the proposed maps are capable of capturing location context beyond visual context cues (e.g. usual average speeds and typical trajectories) or predefined map primitives (such as lanes and stop lines). We pose context map learning as a multi-task training problem and describe our map model and its incorporation into a state-of-the-art trajectory predictor. In extensive experiments, it is shown that use of learned maps can significantly improve predictor accuracy. Furthermore, the performance can be additionally boosted by providing partial knowledge of map semantics.
|
|
WeAT7 |
Room T7 |
Learning from Motion and Touch |
Regular session |
Chair: Begum, Momotaz | University of New Hampshire |
|
10:00-10:15, Paper WeAT7.1 | |
>Learning Soft Robotic Assembly Strategies from Successful and Failed Demonstrations |
> Video Attachment
|
|
Hamaya, Masashi | OMRON SINIC X Corporation |
von Drigalski, Felix Wolf Hans Erich | OMRON SINIC X Corporation |
Matsubara, Takamitsu | Nara Institute of Science and Technology |
Tanaka, Kazutoshi | OMRON SINIC X Corporation |
Lee, Robert | Australian Centre for Robotic Vision |
Nakashima, Chisato | OMRON Corp |
Shibata, Yoshiya | OMRON Corporation |
Ijiri, Yoshihisa | OMRON Corp |
Keywords: Soft Robot Applications, Modeling, Control, and Learning for Soft Robots, Learning from Demonstration
Abstract: Physically soft robots are promising for robotic assembly tasks as they allow stable contacts with the environment. In this study, we propose a novel learning system for soft robotic assembly strategies. We formulate this problem as a reinforcement learning task and design the reward function from human demonstrations. Our key insight is that failed demonstrations can be used as constraints to avoid failed behaviors. To this end, we developed a teaching device with which humans can intuitively provide various demonstrations. Moreover, we leverage Physically-Consistent Gaussian Mixture Models to clearly assign Gaussian components to the successful and failed trials. We then create the reference trajectories via Gaussian Mixture Regression, which fits the successful demonstrations while considering the failed ones. Finally, we apply a sample-efficient deep model-based reinforcement learning method to obtain robust strategies with a few interactions. To validate our method, we developed a real-robot experimental system composed of a rigid collaborative robot arm with a compliant wrist and the teaching device. Our results demonstrate that our method learned the assembly strategies with a higher success rate than when using only successful demonstrations.
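As a rough illustration of the Gaussian Mixture Regression step mentioned above (generic GMR over a joint GMM p(t, y), not the paper's Physically-Consistent GMM variant, and without the failed-demonstration constraints), a hedged sketch follows; the function `gmr` and its arguments are assumptions for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def gmr(gmm: GaussianMixture, t_query, in_dim: int = 1) -> np.ndarray:
    """Gaussian Mixture Regression: condition a joint GMM p(t, y) on t.

    gmm must be fit with covariance_type='full' on joint samples [t | y];
    in_dim is the number of input dimensions. Returns the conditional mean
    y(t) for each query input (e.g. a reference trajectory over time).
    """
    means, covs, weights = gmm.means_, gmm.covariances_, gmm.weights_
    out = []
    for t in np.asarray(t_query, dtype=float).reshape(-1, in_dim):
        # responsibilities of each component for this input
        h = np.array([w * multivariate_normal.pdf(t, m[:in_dim], c[:in_dim, :in_dim])
                      for w, m, c in zip(weights, means, covs)])
        h = h / (h.sum() + 1e-12)
        y = np.zeros(means.shape[1] - in_dim)
        for k in range(len(weights)):
            m, c = means[k], covs[k]
            gain = c[in_dim:, :in_dim] @ np.linalg.inv(c[:in_dim, :in_dim])
            y += h[k] * (m[in_dim:] + gain @ (t - m[:in_dim]))  # conditional mean
        out.append(y)
    return np.array(out)
```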
|
|
10:15-10:30, Paper WeAT7.2 | |
>Pattern Analysis and Parameters Optimization of Dynamic Movement Primitives for Learning Unknown Trajectories |
|
Li, Mantian | Harbin Institute of Technology |
Yang, Zeguo | Harbin Institute of Technology |
Zha, Fusheng | Harbin Institute of Technology |
Wang, Xin | Alibaba (China) Co., Ltd |
Wang, Pengfei | Harbin Institute of Technology, State Key Laboratory of Robotics |
Guo, Wei | Harbin Institute of Technology |
Caldwell, Darwin G. | Istituto Italiano Di Tecnologia |
Chen, Fei | Istituto Italiano Di Tecnologia |
Keywords: Learning from Demonstration, Motion and Path Planning, Optimization and Optimal Control
Abstract: A robot in the future may initially have a good learning capability but an empty library of movements, which it gradually enriches through human demonstrations. Dynamic Movement Primitives (DMPs) have been proven to be an effective way to represent trajectories. Trajectories are classified into discrete and rhythmic ones, and parameters are set for each demonstrated trajectory. However, what kind of trajectory will be provided by robot users is sometimes unknown to robot developers, so the trajectory pattern and the parameters cannot be determined in advance. It is also impossible for non-technical robot users to set these parameters and determine the pattern of the movements they are going to demonstrate. To make it easier for non-expert robot users to program their robots by demonstration, this work presents an efficient way to deal with these two problems. The effectiveness of the proposed methodology is demonstrated by teaching a robot to clean a whiteboard in different ways and to stack a set of cubic boxes in a specific order.
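For readers unfamiliar with DMPs, a minimal sketch of one Euler integration step of a standard discrete DMP is given below (the textbook formulation, not the paper's pattern-analysis or parameter-optimization contribution); the gains and the omission of amplitude scaling are simplifying assumptions.

```python
import numpy as np

def dmp_step(y, yd, x, goal, w, centers, widths, dt,
             alpha=25.0, beta=6.25, alpha_x=4.0, tau=1.0):
    """One Euler step of a discrete DMP transformation + canonical system.

    y, yd: current position and velocity; x: canonical phase in (0, 1];
    w, centers, widths: forcing-term basis weights learned from a demonstration.
    Amplitude scaling and goal switching are omitted for brevity.
    """
    psi = np.exp(-widths * (x - centers) ** 2)                 # Gaussian basis
    forcing = x * np.dot(psi, w) / (np.sum(psi) + 1e-10)
    ydd = alpha * (beta * (goal - y) - yd) + forcing           # transformation system
    yd_next = yd + (ydd / tau) * dt
    y_next = y + (yd_next / tau) * dt
    x_next = x - (alpha_x * x / tau) * dt                      # canonical system
    return y_next, yd_next, x_next
```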
|
|
10:30-10:45, Paper WeAT7.3 | |
>Robot Learning from Demonstration with Tactile Signals for Geometry-Dependent Tasks |
|
Huang, Isabella | UC Berkeley |
Bajcsy, Ruzena | Univ of California, Berkeley |
Keywords: Learning from Demonstration, Force and Tactile Sensing, Imitation Learning
Abstract: Deploying robot learning frameworks in unconstrained environments requires robustness and tractability. We must not only equip the robot with a sufficient range of sensing capabilities, but also provide training data in a sample-efficient manner. To this end, we identify and address a need in the robot learning from demonstration (LfD) literature to account for not only end-effector pose and wrench signals, but also tactile signals for contact. While traditional pose and wrench signals have proven to be sufficient for robots to learn basic position and force-control behaviors, they are inherently too constraining for the learning of general manipulation tasks. In particular, useful manipulation tasks often rely on the geometry of the contact interaction. To explore the value of geometry-based tactile signals, we utilize an LfD framework built upon hidden Markov models and Gaussian mixture regression, adapt it to our robotic system equipped with a soft tactile sensor, and validate its performance with an edge-following task and a manipulation task involving different object geometries.
|
|
10:45-11:00, Paper WeAT7.4 | |
>Learning Optimized Human Motion Via Phase Space Analysis |
> Video Attachment
|
|
Gesel, Paul | University of New Hampshire |
Mikulis-Borsoi, Francesco | University of New Hampshire |
LaRoche, Dain | University of New Hampshire |
Arthanat, Sajay | University of New Hampshire |
Begum, Momotaz | University of New Hampshire |
Keywords: Learning from Demonstration, Optimization and Optimal Control, Imitation Learning
Abstract: This paper proposes a dynamic system based learning from demonstration approach to teach a robot activities of daily living. The approach takes inspiration from human movement literature to formulate trajectory learning as an optimal control problem. We assume a weighted combination of basis objective functions is the true objective function for a demonstrated motion. We derive basis objective functions analogous to those in human movement literature to optimize the robot's motion. This method aims to naturally adapt the learned motion in different situations. To validate our approach, we learn motions from two categories: 1) commonly prescribed therapeutic exercises and 2) tea making. We show the reproduction accuracy of our method and compare torque requirements to the dynamic motion primitive for each motion, with and without an added load.
|
|
11:00-11:15, Paper WeAT7.5 | |
>Learning Robust Manipulation Tasks Involving Contact Using Trajectory Parameterized Probabilistic Principal Component Analysis |
> Video Attachment
|
|
Vergara Perico, Cristian Alejandro | KU Leuven |
De Schutter, Joris | KU Leuven |
Aertbelien, Erwin | KU Leuven |
Keywords: Learning from Demonstration, Reactive and Sensor-Based Planning, Force Control
Abstract: In this paper, we aim to expedite the deployment of challenging manipulation tasks involving both motion and contact wrenches (forces and moments). To this end, we acquire motion and wrench signals from a small set of demonstrations using passive observation. To learn these tasks, we introduce Trajectory parameterized Probabilistic Principal Component Analysis (traPPCA) which compactly re-parameterizes the acquired signals using trajectory information and encodes the signal correlations using Probabilistic Principal Component Analysis (PPCA). Finally, the task is transferred to a robot setup by specifying the robot behavior using a constraint-based task specification and control approach. This framework results in increased robustness of the system against different sources of uncertainty: imprecise sensors, adaptation of the tool, and changes in the execution speed.
|
|
WeAT8 |
Room T8 |
Learning Categories and Concepts |
Regular session |
Chair: Wagner, Alan Richard | Penn State University |
Co-Chair: Zhang, Hao | Colorado School of Mines |
|
10:00-10:15, Paper WeAT8.1 | |
>Tell Me What This Is: Few-Shot Incremental Object Learning by a Robot |
|
Ayub, Ali | Penn State University |
Wagner, Alan Richard | Penn State University |
Keywords: Learning Categories and Concepts, Recognition, Visual Learning
Abstract: For many applications, robots will need to be incrementally trained to recognize the specific objects needed for an application. This paper presents a practical system for incrementally training a robot to recognize different object categories using only a small set of visual examples provided by a human. The paper evaluates and verifies a recently developed state-of-the-art method for few-shot incremental learning. After learning the object classes incrementally, the robot performs a table cleaning task organizing objects into categories specified by the human. We also demonstrate the system's ability to learn arrangements of objects and predict missing or incorrectly placed objects. Experimental evaluations demonstrate that our approach achieves nearly the same performance as a system trained with all examples at one time (batch training), which constitutes a theoretical upper bound.
|
|
10:15-10:30, Paper WeAT8.2 | |
>Voxel-Based Representation Learning for Place Recognition Based on 3D Point Clouds |
|
Siva, Sriram | Colorado School of Mines |
Nahman, Zachary | Colorado School of Mines |
Zhang, Hao | Colorado School of Mines |
Keywords: Representation Learning, SLAM
Abstract: Place recognition is a critical component towards addressing the key problem of Simultaneous Localization and Mapping (SLAM). Most existing methods use visual images, whereas place recognition from 3D point clouds, especially based on voxel representations, has not yet been well addressed. In this paper, we introduce the novel approach of voxel-based representation learning (VBRL) that uses 3D point clouds to recognize places with long-term environment variations. VBRL splits a 3D point cloud input into voxels and uses multi-modal features extracted from these voxels to perform place recognition. Additionally, VBRL uses structured sparsity-inducing norms to learn representative voxels and feature modalities that are important to match places under long-term changes. Both place recognition and voxel and feature learning are integrated into a unified regularized optimization formulation. As the sparsity-inducing norms are non-smooth, the formulated optimization problem is hard to solve. Thus, we design a new iterative optimization algorithm, which has a theoretical convergence guarantee. Experimental results have shown that VBRL performs place recognition well using 3D point cloud data and is capable of learning the importance of voxels and feature modalities.
|
|
10:30-10:45, Paper WeAT8.3 | |
>Robotic Understanding of Spatial Relationships Using Neural-Logic Learning |
|
Yan, Fujian | Wichita State University |
Wang, Dali | Oak Ridge National Lab |
He, Hongsheng | Wichita State University |
Keywords: Cognitive Human-Robot Interaction, Semantic Scene Understanding
Abstract: Understanding the spatial relations of objects is critical in many robotic applications such as grasping, manipulation, and obstacle avoidance. Humans can readily reason about objects' spatial relations from a glimpse of a scene based on prior knowledge of spatial constraints. The proposed method enables a robot to comprehend spatial relationships among objects from RGB-D data. This paper proposes a neural-logic learning framework to learn and reason about spatial relations from raw data by following logic rules on spatial constraints. The neural-logic network consists of three blocks: a grounding block, a spatial logic block, and an inference block. The grounding block extracts high-level features from the raw sensory data. The spatial logic block predicts fundamental spatial relations by training a neural network with spatial constraints. The inference block infers complex spatial relations based on the predicted fundamental spatial relations. Simulations and robotic experiments evaluate the performance of the proposed method.
|
|
10:45-11:00, Paper WeAT8.4 | |
>Understanding Contexts Inside Robot and Human Manipulation Tasks through Vision-Language Model and Ontology System in Video Streams |
> Video Attachment
|
|
Jiang, Chen | University of Alberta |
Dehghan, Masood | University of Alberta |
Jagersand, Martin | University of Alberta |
Keywords: Learning Categories and Concepts, Visual Learning, Deep Learning for Visual Perception
Abstract: Manipulation tasks in daily life, such as pouring water, unfold through human intentions. Being able to process contextual knowledge from these Activities of Daily Living (ADLs) over time can help us understand manipulation intentions, which are essential for an intelligent robot to transition smoothly between various manipulation actions. In this paper, to model the intended concepts of manipulation, we present a vision dataset under a strictly constrained knowledge domain for both robot and human manipulations, where manipulation concepts and relations are stored by an ontology system in a taxonomic manner. Furthermore, we propose a scheme to generate a combination of visual attentions and an evolving knowledge graph filled with commonsense knowledge. Our scheme works with real-world camera streams and fuses an attention-based Vision-Language model with the ontology system. The experimental results demonstrate that the proposed scheme can successfully represent the evolution of an intended object manipulation procedure for both robots and humans. The proposed scheme allows the robot to mimic human-like intentional behaviors by watching real-time videos. We aim to develop this scheme further for real-world robot intelligence in Human-Robot Interaction.
|
|
11:00-11:15, Paper WeAT8.5 | |
>Representing Spatial Object Relations As Parametric Polar Distribution for Scene Manipulation Based on Verbal Commands |
> Video Attachment
|
|
Kartmann, Rainer | Karlsruhe Institute of Technology (KIT) |
Zhou, You | Karlsruhe Institute of Technology (KIT) |
Liu, Danqing | Karlsruhe Institute of Technology (KIT) |
Paus, Fabian | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Learning Categories and Concepts, Cognitive Human-Robot Interaction, Representation Learning
Abstract: Understanding spatial relations is a key element for natural human-robot interaction. Especially, a robot must be able to manipulate a given scene according to a human verbal command specifying desired spatial relations between objects. To endow robots with this ability, a suitable representation of spatial relations is necessary, which should be derivable from human demonstrations. We claim that polar coordinates can capture the underlying structure of spatial relations better than Cartesian coordinates and propose a parametric probability distribution defined in polar coordinates to represent spatial relations. We consider static spatial relations such as left of, behind, and near, as well as dynamic ones such as closer to and other side of, and take into account verbal modifiers such as roughly and a lot. We show that adequate distributions can be derived for various combinations of spatial relations and modifiers in a sample-efficient way using Maximum Likelihood Estimation, evaluate the effects of modifiers on the distribution parameters, and demonstrate our representation's usefulness in a pick-and-place task on a real robot.
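A minimal sketch of the general idea, fitting a parametric polar distribution to demonstrated relative object positions via maximum likelihood, is shown below using a von Mises model for the angle and a Gaussian model for the radius; this is an assumption-laden stand-in, not the authors' exact parameterization or their handling of verbal modifiers.

```python
import numpy as np
from scipy import stats

def fit_polar_relation(deltas_xy: np.ndarray):
    """Fit a polar distribution for one spatial relation from demonstrations.

    deltas_xy: (N, 2) target-object positions relative to the reference object,
    e.g. collected from demonstrations of "left of".
    """
    angles = np.arctan2(deltas_xy[:, 1], deltas_xy[:, 0])
    radii = np.linalg.norm(deltas_xy, axis=1)
    kappa, ang_loc, _ = stats.vonmises.fit(angles, fscale=1)   # circular MLE
    r_mu, r_sigma = stats.norm.fit(radii)                      # radial MLE
    return (kappa, ang_loc), (r_mu, r_sigma)

def sample_placement(angle_params, radius_params, n=1):
    """Draw candidate relative placements that satisfy the learned relation."""
    kappa, ang_loc = angle_params
    r_mu, r_sigma = radius_params
    theta = stats.vonmises.rvs(kappa, loc=ang_loc, size=n)
    r = np.abs(stats.norm.rvs(r_mu, r_sigma, size=n))
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)
```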
|
|
11:15-11:30, Paper WeAT8.6 | |
>Weakly-Supervised Learning for Multimodal Human Activity Recognition in Human-Robot Collaboration Scenarios |
|
Pohlt, Clemens | OTH Regensburg |
Schlegl, Thomas | OTH Regensburg |
Wachsmuth, Sven | Bielefeld University |
Keywords: Gesture, Posture and Facial Expressions, Multi-Modal Perception, Computer Vision for Other Robotic Applications
Abstract: The ability to synchronize expectations among human-robot teams and understand discrepancies between expectations and reality is essential for human-robot collaboration scenarios. To ensure this, human activities and intentions must be interpreted quickly and reliably by the robot using various modalities. In this paper we propose a multimodal recognition system designed to detect physical interactions as well as nonverbal gestures. Existing approaches feature high post-transfer recognition rates which, however, can only be achieved based on well-prepared and large datasets. Unfortunately, the acquisition and preparation of domain-specific samples, especially in an industrial context, is time consuming and expensive. To reduce this effort we introduce a weakly-supervised classification approach. To this end, we learn a latent representation of the human activities with a variational autoencoder network. Additional modalities and unlabeled samples are incorporated by a scalable product-of-experts sampling approach. The applicability in an industrial context is evaluated on two domain-specific collaborative robot datasets. Our results demonstrate that we can keep the number of labeled samples constant while increasing the network performance by providing additional unprocessed information.
|
|
WeAT9 |
Room T9 |
Learning about Objects/affordances |
Regular session |
Chair: Matuszek, Cynthia | University of Maryland, Baltimore County |
Co-Chair: Hermans, Tucker | University of Utah |
|
10:00-10:15, Paper WeAT9.1 | |
>Tool Shape Optimization through Backpropagation of Neural Network |
> Video Attachment
|
|
Kawaharazuka, Kento | The University of Tokyo |
Ogawa, Toru | Preferred Networks, Inc |
Nabeshima, Cota | Preferred Networks, Inc |
Keywords: Model Learning for Control, AI-Based Methods, Deep Learning in Grasping and Manipulation
Abstract: When executing a certain task, human beings can choose or make an appropriate tool to achieve the task. This research especially addresses the optimization of tool shape for robotic tool-use. We propose a method in which a robot obtains an optimized tool shape, tool trajectory, or both, depending on a given task. The feature of our method is that a transition of the task state when the robot moves a certain tool along a certain trajectory is represented by a deep neural network. We applied this method to object manipulation tasks on a 2D plane, and verified that appropriate tool shapes are generated by using this novel method.
|
|
10:15-10:30, Paper WeAT9.2 | |
>A Causal Approach to Tool Affordance Learning |
> Video Attachment
|
|
Brawer, Jake | Yale University |
Qin, Meiying | Yale University |
Scassellati, Brian | Yale |
Keywords: AI-Based Methods, Learning Categories and Concepts
Abstract: While abstract knowledge like cause-and-effect relations enables robots to problem-solve in new environments, acquiring such knowledge remains out of reach for many traditional machine learning techniques. In this work, we introduce a method for a robot to learn an explicit model of cause-and-effect by constructing a structural causal model through a mix of observation and self-supervised experimentation, allowing a robot to reason from causes to effects and from effects to causes. We demonstrate our method on tool affordance learning tasks, where a humanoid robot must leverage its prior learning to utilize novel tools effectively. Our results suggest that after minimal training examples, our system can preferentially choose new tools based on the context, and can use these tools for goal-directed object manipulation.
|
|
10:30-10:45, Paper WeAT9.3 | |
>Learning Object Attributes with Category-Free Grounded Language from Deep Featurization |
|
Richards, Luke | University of Maryland, Baltimore County |
Darvish, Kasra | University of Maryland, Baltimore County |
Matuszek, Cynthia | University of Maryland, Baltimore County |
Keywords: Cognitive Human-Robot Interaction, Multi-Modal Perception, Human-Centered Robotics
Abstract: While grounded language learning, or learning the meaning of language with respect to the physical world in which a robot operates, is a major area in human-robot interaction studies, most research occurs in closed worlds or domain-constrained settings. We present a system in which language is grounded in visual percepts without using categorical constraints by combining CNN-based visual featurization with natural language labels. We demonstrate results comparable to those achieved using handcrafted features for specific traits, a step towards moving language grounding into the space of fully open world recognition.
|
|
10:45-11:00, Paper WeAT9.4 | |
>Visuomotor Mechanical Search: Learning to Retrieve Target Objects in Clutter |
> Video Attachment
|
|
Kurenkov, Andrey | Stanford University |
Taglic, Joseph | Stanford University |
Kulkarni, Rohun | Stanford University |
Dominguez-Kuhne, Marcus | California Institute of Technology |
Garg, Animesh | University of Toronto |
Martín-Martín, Roberto | Stanford University |
Savarese, Silvio | Stanford University |
Keywords: Deep Learning in Grasping and Manipulation, Reinforcement Learning, RGB-D Perception
Abstract: When searching for objects in cluttered environments, it is often necessary to perform complex interactions in order to move occluding objects out of the way, fully reveal the object of interest, and make it graspable. Due to the complexity of the physics involved and the lack of accurate models of the clutter, planning and controlling precise predefined interactions with accurate outcomes is extremely hard, if not impossible. In problems where accurate (forward) models are lacking, Deep Reinforcement Learning (RL) has been shown to be a viable solution for mapping observations (e.g. images) to good interactions in the form of closed-loop visuomotor policies. However, Deep RL is sample inefficient and fails when applied directly to the problem of unoccluding objects based on images. In this work we present a novel Deep RL procedure that combines i) teacher-aided exploration, ii) a critic with privileged information, and iii) mid-level representations, resulting in sample efficient and effective learning for the problem of uncovering a target object occluded by a heap of unknown objects. Our experiments show that our approach trains faster and converges to more efficient uncovering solutions than baselines and ablations, and that our uncovering policies lead to an average improvement in the graspability of the target object, facilitating downstream retrieval applications.
|
|
11:00-11:15, Paper WeAT9.5 | |
>Multi-Fingered Active Grasp Learning |
> Video Attachment
|
|
Lu, Qingkai | University of Utah |
Van der Merwe, Mark | University of Utah |
Hermans, Tucker | University of Utah |
Keywords: Deep Learning in Grasping and Manipulation
Abstract: Learning-based approaches to grasp planning are preferred over analytical methods due to their ability to better generalize to new, partially observed objects. However, data collection remains one of the biggest bottlenecks for grasp learning methods, particularly for multi-fingered hands. The relatively high dimensional configuration space of the hands coupled with the diversity of objects common in daily life requires a significant number of samples to produce robust and confident grasp success classifiers. In this paper, we present the first active deep learning approach to grasping that searches over the grasp configuration space and classifier confidence in a unified manner. We base our approach on recent success in planning multi-fingered grasps as probabilistic inference with a learned neural network likelihood function. We embed this within a multi-armed bandit formulation of sample selection. We show that our active grasp learning approach uses fewer training samples to produce grasp success rates comparable with the passive supervised learning method trained with grasping data generated by an analytical planner. We additionally show that grasps generated by the active learner have greater qualitative and quantitative diversity in shape.
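The multi-armed-bandit flavor of the sample-selection step can be illustrated with a generic upper-confidence-bound rule (a hedged sketch only; the paper's formulation additionally folds in the learned grasp-success classifier's confidence, which is not shown here).

```python
import numpy as np

def ucb_select(success_counts: np.ndarray, trial_counts: np.ndarray,
               exploration: float = 1.0) -> int:
    """Pick the next candidate arm with an upper-confidence-bound rule.

    success_counts / trial_counts: per-arm statistics (arms could be grasp
    configuration clusters); untried arms get an infinite bonus and are
    therefore explored first.
    """
    total = trial_counts.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        means = np.where(trial_counts > 0, success_counts / trial_counts, 0.0)
        bonus = np.where(trial_counts > 0,
                         exploration * np.sqrt(2 * np.log(max(total, 1)) / trial_counts),
                         np.inf)
    return int(np.argmax(means + bonus))
```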
|
|
11:15-11:30, Paper WeAT9.6 | |
>Incorporating Object Intrinsic Features within Deep Grasp Affordance Prediction |
|
Veres, Matthew | University of Guelph |
Cabral, Ian | University of Guelph |
Moussa, Medhat | Guelph |
Keywords: Deep Learning in Grasping and Manipulation, Multi-Modal Perception, Grasping
Abstract: Robotic grasping systems often rely on visual observations to drive the grasping process, where the robot must be able to detect and localize an object, extract features relevant to the task, and then combine this information to plan a manipulation strategy. But what happens when some of the most impactful features are not observed by the robot? Without context on an object's center-of-mass, for example, a robot may make assumptions such as uniform density that do not hold, which may in turn guide the robot into perceiving a sub-optimal set of grasping configurations. In this work, we examine how having prior knowledge of an object's intrinsic properties influences the task of dense grasp affordance prediction. We investigate a simple, constrained grasping task where object properties heavily regulate the space of successful grasps, and further evaluate how learning is affected when generalizing across unseen weight configurations and unseen object shapes.
|
|
WeAT10 |
Room T10 |
Segmentation I |
Regular session |
Chair: Vincze, Markus | Vienna University of Technology |
Co-Chair: Zürn, Jannik | University of Freiburg |
|
10:00-10:15, Paper WeAT10.1 | |
>Invisible Marker: Automatic Annotation of Segmentation Masks for Object Manipulation |
> Video Attachment
|
|
Takahashi, Kuniyuki | Preferred Networks |
Yonekura, Kenta | Preferred Networks, Inc |
Keywords: Perception for Grasping and Manipulation, Big Data in Robotics and Automation, Object Detection, Segmentation and Categorization
Abstract: We propose a method to annotate segmentation masks accurately and automatically using invisible marker for object manipulation. Invisible marker is invisible under visible (regular) light conditions, but becomes visible under invisible light, such as ultraviolet (UV) light. By painting objects with the invisible marker, and by capturing images while alternately switching between regular and UV light at high speed, massive annotated datasets are created quickly and inexpensively. We show a comparison between our proposed method and manual annotations. We demonstrate semantic segmentation for deformable objects including clothes, liquids, and powders under controlled environmental light conditions. In addition, we show demonstrations of liquid pouring tasks under uncontrolled environmental light conditions in complex environments such as inside the office, house, and outdoors. Furthermore, it is possible to capture data while the camera is in motion so it becomes easier to capture large datasets, as shown in our demonstration.
|
|
10:15-10:30, Paper WeAT10.2 | |
>Meta Learning with Differentiable Closed-Form Solver for Fast Video Object Segmentation |
> Video Attachment
|
|
Liu, Yu | The University of Adelaide |
Liu, Lingqiao | University of Adelaide |
Zhang, Haokui | Northwestern Polytechnical University |
Rezatofighi, S. Hamid | The University of Adelaide |
Yan, Qingsen | Northwestern Polytechnical University |
Reid, Ian | University of Adelaide |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Recognition
Abstract: Video object segmentation plays a vital role in many robotic tasks; beyond achieving satisfactory accuracy, it is also important to adapt quickly to a new scenario with very limited annotations and to perform fast inference. In this paper, we are specifically concerned with the task of quickly segmenting all pixels of a target object in all frames, given the annotation mask in the first frame. Even when such annotation is available, this remains a challenging problem because of the changing appearance and shape of the object over time. We tackle this task by formulating it as a meta-learning problem, where the base learner grasps semantic scene understanding for a general type of object, and the meta learner quickly adapts to the appearance of the target object from a few examples. Our proposed meta-learning method uses a closed-form optimizer, the so-called ``ridge regression", which has been shown to be conducive to fast and better training convergence. Moreover, we propose a mechanism, named ``block splitting", to further speed up the training process as well as to reduce the number of learned parameters. In comparison with state-of-the-art methods, our proposed framework achieves a significant boost in processing speed, while having highly comparable performance to the best performing methods on the widely used datasets.
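The differentiable closed-form ridge solver at the heart of such meta-learning heads can be sketched in a few lines of PyTorch; the tensor shapes and the `lam` regularizer below are illustrative assumptions, not the authors' exact head.

```python
import torch

def ridge_regression(features: torch.Tensor, labels: torch.Tensor,
                     lam: float = 1.0) -> torch.Tensor:
    """Differentiable closed-form ridge solver, as used in meta-learning heads.

    features: (N, D) per-pixel embeddings from the annotated (support) frame.
    labels:   (N, C) one-hot (or soft) foreground/background targets.
    Returns W of shape (D, C); predictions for a query frame are X_q @ W.
    Every operation is differentiable, so gradients flow back into the
    feature extractor during meta-training.
    """
    D = features.shape[1]
    A = features.t() @ features + lam * torch.eye(D, device=features.device)
    W = torch.linalg.solve(A, features.t() @ labels)
    return W
```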
|
|
10:30-10:45, Paper WeAT10.3 | |
>Cascaded Non-Local Neural Network for Point Cloud Semantic Segmentation |
|
Cheng, Mingmei | Nanjing University of Science and Technology |
Hui, Le | Nanjing University of Science and Technology |
Xie, Jin | Nanjing University of Science and Technology |
Yang, Jian | Nanjing University of Science & Technology |
Kong, Hui | Nanjing University of Science and Technology |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding, Visual Learning
Abstract: In this paper, we propose a cascaded non-local neural network for point cloud segmentation. The proposed network aims to build the long-range dependencies of point clouds for the accurate segmentation. Specifically, we develop a novel cascaded non-local module, which consists of the neighborhood-level, superpoint-level and global-level non-local blocks. First, in the neighborhood-level block, we extract the local features of the centroid points of point clouds by assigning different weights to the neighboring points. The extracted local features of the centroid points are then used to encode the superpoint-level block with the non-local operation. Finally, the global-level block aggregates the non-local features of the superpoints for semantic segmentation in an encoder-decoder framework. Benefiting from the cascaded structure, geometric structure information of different neighborhoods with the same label can be propagated. In addition, the cascaded structure can largely reduce the computational cost of the original non-local operation on point clouds. Experiments on different indoor and outdoor datasets show that our method achieves state-of-the-art performance and effectively reduces the time consumption and memory occupation.
|
|
10:45-11:00, Paper WeAT10.4 | |
>Robust and Efficient Object Change Detection by Combining Global Semantic Information and Local Geometric Verification |
> Video Attachment
|
|
Langer, Edith | TU Wien |
Patten, Timothy | TU Wien |
Vincze, Markus | Vienna University of Technology |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding, RGB-D Perception
Abstract: Identifying new, moved or missing objects is an important capability for robot tasks such as surveillance or maintaining order in homes, offices and industrial settings. However, current approaches do not distinguish between novel objects or simple scene rearrangements nor do they sufficiently deal with localization errors and sensor noise. To overcome these limitations, we combine the strengths of global and local methods for efficient detection of novel objects in 3D reconstructions of indoor environments. Global structure, determined from 3D semantic information, is exploited to establish object candidates. These are then locally verified by comparing isolated geometry to a reference reconstruction provided by the task. We evaluate our approach on a novel dataset containing different types of rooms with 31 scenes and 260 annotated objects. Experiments show that our proposed approach significantly outperforms baseline methods.
|
|
11:00-11:15, Paper WeAT10.5 | |
>HeatNet: Bridging the Day-Night Domain Gap in Semantic Segmentation with Thermal Images |
> Video Attachment
|
|
Vertens, Johan | University of Freiburg |
Zürn, Jannik | University of Freiburg |
Burgard, Wolfram | Toyota Research Institute |
Keywords: Computer Vision for Other Robotic Applications, Semantic Scene Understanding, Intelligent Transportation Systems
Abstract: The majority of learning-based semantic segmentation methods are optimized for daytime scenarios and favorable lighting conditions. Real-world driving scenarios, however, entail adverse environmental conditions such as nighttime illumination or glare which remain a challenge for existing approaches. In this work, we propose a multimodal semantic segmentation model that can be applied during daytime and nighttime. To this end, besides RGB images, we leverage thermal images, making our network significantly more robust. We avoid the expensive annotation of nighttime images by leveraging an existing daytime RGB-dataset and propose a teacher-student training approach that transfers the dataset's knowledge to the nighttime domain. We further adopt a domain adaptation method to align the learned feature spaces across the domains and propose a novel two-stage training scheme. Furthermore, due to a lack of thermal data for autonomous driving, we present a new dataset comprising over 20,000 time-synchronized and aligned RGB-thermal image pairs. In this context, we also present a novel target-less calibration method that allows for automatic robust extrinsic and intrinsic thermal camera calibration. Among others, we use our new dataset to show state-of-the-art results for nighttime semantic segmentation.
|
|
11:15-11:30, Paper WeAT10.6 | |
>PBP-Net: Point Projection and Back-Projection Network for 3D Point Cloud Segmentation |
|
Yang, JuYoung | Korea Advanced Institute of Science and Technology |
Lee, Chanho | KAIST |
Ahn, Pyunghwan | KAIST |
Lee, Haeil | KAIST |
Yi, Eojindl | KAIST |
Kim, Junmo | KAIST |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding
Abstract: Following considerable development in 3D scanning technologies, many studies have recently been proposed with various approaches for 3D vision tasks, including some methods that utilize 2D convolutional neural networks (CNNs). However, even though 2D CNNs have achieved high performance in many 2D vision tasks, existing works have not effectively applied them to 3D vision tasks. In particular, segmentation has not been well studied because of the difficulty of dense prediction for each point, which requires rich feature representation. In this paper, we propose a simple and efficient architecture named point projection and back-projection network (PBP-Net), which leverages 2D CNNs for 3D point cloud segmentation. Three modules are introduced, which project the 3D point cloud onto 2D planes, extract features using a 2D CNN backbone, and back-project the features onto the original 3D point cloud. To demonstrate effective 3D feature extraction using 2D CNNs, we perform various experiments including comparison to recent methods. We analyze the proposed modules through ablation studies and perform experiments on object part segmentation (ShapeNet-Part dataset) and indoor scene semantic segmentation (S3DIS dataset). The experimental results show that the proposed PBP-Net achieves comparable performance to existing state-of-the-art methods.
|
|
WeAT11 |
Room T11 |
Segmentation II |
Regular session |
Chair: Atanasov, Nikolay | University of California, San Diego |
Co-Chair: Stachniss, Cyrill | University of Bonn |
|
10:00-10:15, Paper WeAT11.1 | |
>Single-Shot Panoptic Segmentation |
|
Weber, Mark | RWTH Aachen University |
Luiten, Jonathon | RWTH Aachen University |
Leibe, Bastian | RWTH Aachen University |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding, Deep Learning for Visual Perception
Abstract: We present a novel end-to-end single-shot method that segments countable object instances (things) as well as background regions (stuff) into a non-overlapping panoptic segmentation at almost video frame rate. Current state-of-the-art methods are far from reaching video frame rate and mostly rely on merging instance segmentation with semantic background segmentation, making them impractical to use in many applications such as robotics. Our approach relaxes this requirement by using an object detector but is still able to resolve inter- and intra-class overlaps to achieve a non-overlapping segmentation. On top of a shared encoder-decoder backbone, we utilize multiple branches for semantic segmentation, object detection, and instance center prediction. Finally, our panoptic head combines all outputs into a panoptic segmentation and can even handle conflicting predictions between branches as well as certain false predictions. Our network achieves 32.6% PQ on MS-COCO at 23.5 FPS, opening up panoptic segmentation to a broader field of applications.
|
|
10:15-10:30, Paper WeAT11.2 | |
>Meta-Learning Deep Visual Words for Fast Video Object Segmentation |
> Video Attachment
|
|
Behl, Harkirat Singh | University of Oxford |
Najafi, Mohammad | University of Oxford |
Arnab, Anurag | University of Oxford |
Torr, Philip | University of Oxford |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Recognition
Abstract: Accurate video object segmentation methods finetune a model using the first annotated frame, and/or use additional inputs such as optical flow and complex post-processing. In contrast, we develop a fast algorithm that requires no finetuning, auxiliary inputs or post-processing, and segments a variable number of objects in a single forward-pass. We represent an object with clusters, or ''visual words'', in the embedding space, which correspond to object parts in the image space. This allows us to robustly match to the reference objects throughout the video, because although the global appearance of an object changes as it undergoes occlusions and deformations, the appearance of more local parts may stay consistent. We learn these visual words in an unsupervised manner, using meta-learning to ensure that our training objective matches our inference procedure. We achieve comparable accuracy to finetuning based methods (whilst being 1 to 2 orders of magnitude faster), and state-of-the-art in terms of speed/accuracy trade-offs on four video segmentation datasets.
|
|
10:30-10:45, Paper WeAT11.3 | |
>Fully Convolutional Geometric Features for Category-Level Object Alignment |
> Video Attachment
|
|
Feng, Qiaojun | University of California, San Diego |
Atanasov, Nikolay | University of California, San Diego |
Keywords: RGB-D Perception, Object Detection, Segmentation and Categorization, Semantic Scene Understanding
Abstract: This paper focuses on pose registration of different object instances from the same category. This is required in online object mapping because object instances detected at test time usually differ from the training instances. Our approach transforms instances of the same category to a normalized canonical coordinate frame and uses metric learning to train fully convolutional geometric features. The resulting model is able to generate pairs of matching points between the instances, allowing category-level registration. Evaluation on both synthetic and real-world data shows that our method provides robust features, leading to accurate alignment of instances with different shapes.
|
|
10:45-11:00, Paper WeAT11.4 | |
>Towards Unsupervised Learning for Instrument Segmentation in Robotic Surgery with Cycle-Consistent Adversarial Networks |
|
Pakhomov, Daniil | Johns Hopkins University |
Shen, Wei | Beijing Institute of Technology |
Navab, Nassir | TU Munich |
Keywords: Object Detection, Segmentation and Categorization, Surgical Robotics: Laparoscopy
Abstract: Surgical tool segmentation in endoscopic images is an important problem: it is a crucial step towards full instrument pose estimation and it is used for integration of pre- and intra-operative images into the endoscopic view. While many recent approaches based on convolutional neural networks have shown great results, a key barrier to progress lies in the acquisition of a large number of manually-annotated images which is necessary for an algorithm to generalize and work well in diverse surgical scenarios. Unlike the surgical image data itself, annotations are difficult to acquire and may be of variable quality. On the other hand, synthetic annotations can be automatically generated by using the forward kinematic model of the robot and CAD models of the tools, projecting them onto the image plane. Unfortunately, this model is very inaccurate and cannot be used for supervised learning of image segmentation models. Since the generated annotations will not directly correspond to the endoscopic images due to errors, we formulate the problem as an unpaired image-to-image translation where the goal is to learn the mapping between an input endoscopic image and a corresponding annotation using an adversarial model. Our approach allows training image segmentation models without the need to acquire expensive annotations and can potentially exploit large unlabeled endoscopic image collections outside the annotated distributions of image/annotation data. We test our proposed method on the EndoVis 2017 challenge dataset and show that it is competitive with supervised segmentation methods.
|
|
11:00-11:15, Paper WeAT11.5 | |
>LiDAR Panoptic Segmentation for Autonomous Driving |
|
Milioto, Andres | University of Bonn |
Behley, Jens | University of Bonn |
McCool, Christopher Steven | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding
Abstract: Truly autonomous driving without the need for human intervention can only be attained when self-driving cars fully understand their surroundings. Most of these vehicles rely on a suite of active and passive sensors. LiDAR sensors are a cornerstone in most of these hardware stacks, and leveraging them as a complement to other passive sensors such as RGB cameras is an enticing goal. Understanding the semantic class of each point in a LiDAR sweep is important, as well as knowing to which instance of that class it belongs to. To this end, we present a novel, single-stage, and real-time capable panoptic segmentation approach using a shared encoder with a semantic and instance decoder. We leverage the geometric information of the LiDAR scan to perform a distance-aware trilinear upsampling, which allows our approach to use larger output strides than using transpose convolutions leading to substantial savings in computation time. Our experimental evaluation and ablation studies for each module show that combining our geometric and semantic embeddings with our learned, variable instance thresholds, a category-specific loss, and the novel trilinear upsampling module leads to higher panoptic quality. We will release the code of our approach in our LiDAR processing library LiDAR-Bonnetal.
|
|
11:15-11:30, Paper WeAT11.6 | |
>LiDAR Guided Small Obstacle Segmentation |
> Video Attachment
|
|
Singh, Aasheesh | International Institute of Information Technology, Hyderabad |
Kamireddypalli, Aditya | IIIT Hyderabad |
Gandhi, Vineet | IIIT Hyderabad |
Krishna, Madhava | IIIT Hyderabad |
Keywords: Visual-Based Navigation, Computer Vision for Automation, Computer Vision for Other Robotic Applications
Abstract: Detecting small obstacles on the road is critical for autonomous driving. In this paper, we present a method to reliably detect such obstacles through a multi-modal framework of sparse LiDAR (VLP-16) and monocular vision. LiDAR is employed to provide additional context in the form of confidence maps to monocular segmentation networks. We show significant performance gains when the context is fed as an additional input to monocular semantic segmentation frameworks. We further present a new semantic segmentation dataset to the community, comprising over 3,000 image frames with corresponding LiDAR observations. The images come with pixel-wise annotations of three classes: off-road, road, and small obstacle. We stress that precise calibration between LiDAR and camera is crucial for this task and thus propose a novel Hausdorff-distance-based calibration refinement method over the extrinsic parameters. As a first benchmark over this dataset, we report our results with a 73% instance detection rate up to a distance of 50 meters in challenging scenarios. Qualitatively, by showcasing accurate segmentation of obstacles smaller than 15 cm at 50 m depth, and quantitatively, through favourable comparisons vis-à-vis prior art, we vindicate the method's efficacy. Our project and dataset are hosted at https://small-obstacle-dataset.github.io/.
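A hedged sketch of a Hausdorff-distance cost that such a calibration refinement might minimize is given below, using SciPy's directed Hausdorff distance on 2D edge-point sets; the function name and the symmetric max are illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def calibration_cost(projected_lidar_edges: np.ndarray,
                     image_edges: np.ndarray) -> float:
    """Symmetric Hausdorff distance between projected LiDAR edge points and
    image edge pixels, both given as (N, 2) arrays of pixel coordinates.

    A refinement loop would perturb the extrinsic parameters, re-project the
    LiDAR points, and keep the parameters that minimise this cost.
    """
    d_ab = directed_hausdorff(projected_lidar_edges, image_edges)[0]
    d_ba = directed_hausdorff(image_edges, projected_lidar_edges)[0]
    return max(d_ab, d_ba)
```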
|
|
WeAT12 |
Room T12 |
Scene Understanding I |
Regular session |
Chair: Vieira, Marcos | Universidade Federal De Minas Gerais |
Co-Chair: Berger, Marie-Odile | INRIA |
|
10:00-10:15, Paper WeAT12.1 | |
>Localizing against Drawn Maps Via Spline-Based Registration |
|
Chen, Kevin | Stanford University |
Vázquez, Marynel | Yale University |
Savarese, Silvio | Stanford University |
Keywords: Localization, Deep Learning for Visual Perception
Abstract: We propose a method to facilitate robot navigation relative to sketched maps of human environments. Our main contribution centers around using thin plate splines for registering the robot's LIDAR observation with the hand-drawn maps. Thin plate splines are particularly effective for this task because they are able to handle many of the non-rigid deformations commonly seen in sketches of maps, which render traditional rigid transformations inappropriate. Our proposed approach uses a convolutional neural network to efficiently predict the control points which define the spline transform, from which we then compute the pose of the robot on the hand drawn map for navigation purposes. Our systematic evaluations in simulation using a synthetic dataset and real, hand-drawn sketches show that the proposed spline-based registration approach outperforms baseline methods.
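The thin-plate-spline registration step can be illustrated with SciPy's RBF interpolator, which supports a thin_plate_spline kernel; this hedged sketch assumes the CNN has already predicted matched control points and is not the authors' code.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def tps_warp(source_ctrl: np.ndarray, target_ctrl: np.ndarray,
             points: np.ndarray) -> np.ndarray:
    """Warp 2D points with a thin plate spline defined by control-point pairs.

    source_ctrl, target_ctrl: (K, 2) matched control points, e.g. predicted by
    a network between the LIDAR occupancy image and the sketched map.
    points: (N, 2) points to transform, e.g. the robot's scan endpoints.
    """
    # one smooth non-rigid mapping, fit on the control points and applied to
    # arbitrary query points
    warp = RBFInterpolator(source_ctrl, target_ctrl, kernel="thin_plate_spline")
    return warp(points)
```

The warped scan can then be compared against the sketch (e.g. via an occupancy overlap score) to estimate the robot pose on the hand-drawn map.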
|
|
10:15-10:30, Paper WeAT12.2 | |
>KR-Net: A Dependable Visual Kidnap Recovery Network for Indoor Spaces |
|
Hyeon, Janghun | Korea University |
Kim, Dongwoo | Korea University |
Jang, Bumchul | Korea University |
Choi, Hyunga | Korea University |
Yi, Dong-Hoon | LG Electronics |
Yoo, Kyungho | LG Electronics |
Choi, Jeongae | LG Electronics |
Doh, Nakju | Korea University |
Keywords: Localization, Visual Learning, Service Robots
Abstract: In this paper, we propose a dependable visual kidnap recovery (KR) framework that pinpoints a unique pose in a given 3D map when a device is turned on. For this framework, we first develop indoor-GeM (i-GeM), which is an extension of GeM but considerably more robust than other global descriptors, including GeM itself. Then, we propose a convolutional neural network (CNN)-based system called KR-Net, which is based on a coarse-to-fine paradigm as in Inloc and HFNet. To our knowledge, KR-Net is the first network that can pinpoint a wake-up pose with a confidence level near 100% within a 1.0m translational error boundary. This dependable success rate is enabled not only by i-GeM, but also by a combinatorial pooling approach that uses multiple images around the wake-up spot, whereas previous implementations were constrained to a single image. Experiments were conducted in two challenging datasets: a large-scale (12,557m^2) area with frequent featureless or repetitive places and a place with significant view changes due to a one-year gap between prior modeling and query acquisition. Given 59 test query sets (eight images per pose), KR-Net successfully found all wakeup poses, with average and maximum errors of 0.246m and 0.983m, respectively.
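For context, standard GeM pooling (which i-GeM extends) reduces a CNN feature map to a global descriptor with a generalized p-mean. A minimal PyTorch sketch is shown below; the fixed p is a simplifying assumption (in practice p can also be learned), and this is not the paper's i-GeM.

```python
import torch

def gem_pool(feature_map: torch.Tensor, p: float = 3.0, eps: float = 1e-6) -> torch.Tensor:
    """Generalized-mean (GeM) pooling over the spatial dimensions.

    feature_map: (B, C, H, W) activations from a CNN backbone.
    Returns a (B, C) global descriptor; p=1 recovers average pooling and
    large p approaches max pooling.
    """
    x = feature_map.clamp(min=eps).pow(p)            # element-wise x^p
    pooled = x.mean(dim=(-2, -1)).pow(1.0 / p)       # ((1/HW) * sum x^p)^(1/p)
    return torch.nn.functional.normalize(pooled, dim=-1)  # L2-normalize descriptor
```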
|
|
10:30-10:45, Paper WeAT12.3 | |
>Mobile Robot Localization under Non-Gaussian Noise Using Correntropy Similarity Metric |
|
Da Silva Santos, Elerson Rubens | Universidade Federal De Minas Gerais |
Vieira, Marcos | Universidade Federal De Minas Gerais |
Sukhatme, Gaurav | University of Southern California |
Keywords: Localization, Range Sensing, Sensor Fusion
Abstract: In this paper, we study the localization problem under non-Gaussian noise. In particular, we consider systems that can be represented by a state transition and a measurement component. The state transition indicates how the system evolves given a control variable. The measurement component compares, for a given state, the received and predicted measurements. Here we consider a radio-based range sensor, which is the primary source of non-Gaussian noise in the system. We solve the problem using a MHE (Maximum Horizon Estimator) with a correntropy similarity metric. The MHE seeks the best set of states over a window of time that explains the system dynamics and the received measurements. Moreover, the main advantage of a MHE is that it allows the re-estimation of past states. Additionally, correntropy is a similarity metric that behaves like the L2, L1, or L0 norm depending on the magnitude of the estimation error, and it has been successfully used in many applications under non-Gaussian noise. We evaluate our proposed method using both simulated and real data. The results show that correntropy works well in comparison with other methods in the presence of impulsive noise.
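The correntropy similarity metric itself is simple to state: a Gaussian kernel applied to the residuals and averaged. A hedged sketch follows; the kernel bandwidth sigma and its use over range residuals are illustrative assumptions rather than the paper's exact cost.

```python
import numpy as np

def correntropy(predicted: np.ndarray, measured: np.ndarray, sigma: float = 1.0) -> float:
    """Sample estimate of correntropy between predicted and measured ranges.

    A Gaussian kernel is applied to the residuals; large (impulsive) errors
    are suppressed because the kernel saturates, which is why the metric
    behaves like an L2 norm for small errors and like L0 for gross outliers.
    """
    residuals = predicted - measured
    return float(np.mean(np.exp(-residuals ** 2 / (2.0 * sigma ** 2))))
```

In an MHE-style cost, one would maximize the summed correntropy of the range residuals over the estimation window instead of minimizing their squared sum.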
|
|
10:45-11:00, Paper WeAT12.4 | |
>Semantic Localization Considering Uncertainty of Object Recognition |
> Video Attachment
|
|
Akai, Naoki | Nagoya University |
Hirayama, Takatsugu | Nagoya University |
Murase, Hiroshi | Nagoya University |
Keywords: Localization, Semantic Scene Understanding, Range Sensing
Abstract: Semantics can be leveraged in ego-vehicle localization to improve robustness and accuracy because objects with the same labels can be correctly matched with each other. Object recognition has significantly improved owing to advances in machine learning algorithms. However, perfect object recognition is still challenging in real environments. Hence, the uncertainty of object recognition must be considered in localization. This paper proposes a novel localization method that integrates a supervised object recognition method, which predicts probabilistic distributions over object classes for individual sensor measurements, into the Bayesian network for localization. The proposed method uses the estimated probabilities and Dirichlet distribution to calculate the likelihood for estimating an ego-vehicle pose. Consequently, the uncertainty can be handled in localization. We present an implementation example of the proposed method using a particle filter and deep-neural-network-based point cloud semantic segmentation and evaluate it by simulation and the SemanticKITTI dataset. Experimental results show that the proposed method can accurately generate likelihood distribution even when object recognition accuracy is degraded, and its estimation accuracy is the highest compared to that of two conventional methods.
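One way to realize a Dirichlet-based measurement likelihood of this kind is sketched below (a hedged illustration, not the authors' exact formulation): the network's per-point class distribution is scored against a Dirichlet concentrated on the map's label, so that uncertain predictions are penalized less harshly than confidently wrong ones.

```python
import numpy as np
from scipy.stats import dirichlet

def semantic_measurement_likelihood(class_probs: np.ndarray, map_label: int,
                                    n_classes: int, concentration: float = 5.0) -> float:
    """Likelihood of a network's per-point class distribution given the map label.

    The map label defines a Dirichlet whose concentration parameter is biased
    towards that class; the value can be used to weight a particle in a filter.
    """
    alpha = np.ones(n_classes)
    alpha[map_label] = concentration
    probs = np.clip(class_probs, 1e-6, 1.0)
    probs = probs / probs.sum()            # project back onto the simplex
    return float(dirichlet.pdf(probs, alpha))
```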
|
|
11:00-11:15, Paper WeAT12.5 | |
>Perspective-2-Ellipsoid: Bridging the Gap between Object Detections and 6-DoF Camera Pose |
|
Gaudilliere, Vincent | Inria Nancy Grand-Est |
Simon, Gilles | Loria |
Berger, Marie-Odile | INRIA |
Keywords: Localization, Visual-Based Navigation, Recognition
Abstract: Recent years have seen the emergence of very effective ConvNet-based object detectors that have reconfigured the computer vision landscape. As a consequence, new approaches that propose object-based reasoning to solve traditional problems, such as camera pose estimation, have appeared. In particular, these methods have shown that modelling 3D objects by ellipsoids and 2D detections by ellipses offers a convenient manner to link 2D and 3D data. Following that promising direction, we propose here a novel object-based pose estimation algorithm that does not require any sensor but an RGB camera. Our method operates from at least two object detections, and is based on a new paradigm that makes it possible to decrease the Degrees of Freedom (DoF) of the pose estimation problem from six to three, while two simplifying yet realistic assumptions reduce the remaining DoF to only one. Exhaustive search is performed over the unique unknown parameter to recover the full camera pose. Robust algorithms designed to deal with any number of objects as well as a refinement step are introduced. The effectiveness of the method has been assessed on the challenging T-LESS and Freiburg datasets.
|
|
11:15-11:30, Paper WeAT12.6 | |
>Delta Descriptors: Change-Based Place Representation for Robust Visual Localization |
|
Garg, Sourav | Queensland University of Technology |
Harwood, Ben | Monash University |
Anand, Gaurangi | Queensland University of Technology |
Milford, Michael J | Queensland University of Technology |
Keywords: Localization
Abstract: Visual place recognition is challenging because there are so many factors that can cause the appearance of a place to change, from day-night cycles to seasonal change to atmospheric conditions. In recent years a large range of approaches have been developed to address this challenge including deep-learnt image descriptors, domain translation, and sequential filtering, all with shortcomings including generality and velocity-sensitivity. In this paper we propose a novel descriptor derived from tracking changes in any learned global descriptor over time, dubbed Delta Descriptors. Delta Descriptors mitigate the offsets induced in the original descriptor matching space in an unsupervised manner by considering temporal differences across places observed along a route. Like all other approaches, Delta Descriptors have a shortcoming - volatility on a frame to frame basis - which can be overcome by combining them with sequential filtering methods. Using two benchmark datasets, we first demonstrate the high performance of Delta Descriptors in isolation, before showing new state-of-the-art performance when combined with sequence-based matching. We also present results demonstrating the approach working with four different underlying descriptor types, and two other beneficial properties of Delta Descriptors in comparison to existing techniques: their increased inherent robustness to variations in camera motion and a reduced rate of performance degradation as dimensional reduction is applied. Source code is made available at https://github.com/oravus/DeltaDescriptors.
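The underlying idea, differencing smoothed global descriptors along the route so that slowly varying appearance offsets cancel, can be sketched as follows; the window length, box-filter smoothing, and normalization are illustrative assumptions rather than the released implementation linked above.

```python
import numpy as np

def delta_descriptors(descriptors: np.ndarray, window: int = 8) -> np.ndarray:
    """Hedged sketch of change-based descriptors along a traversed route.

    descriptors: (T, D) sequence of global image descriptors (e.g. NetVLAD).
    Each output row is the difference between the local temporal average and
    the average one window earlier, so offsets common to both windows
    (illumination, season) largely cancel out.
    """
    T, D = descriptors.shape
    kernel = np.ones(window) / window
    smoothed = np.vstack([np.convolve(descriptors[:, d], kernel, mode="same")
                          for d in range(D)]).T          # rolling mean along time
    delta = np.zeros_like(smoothed)
    delta[window:] = smoothed[window:] - smoothed[:-window]
    norms = np.linalg.norm(delta, axis=1, keepdims=True) + 1e-12
    return delta / norms                                  # L2-normalize for cosine matching
```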
|
|
WeAT13 |
Room T13 |
Scene Understanding II |
Regular session |
Chair: Lee, Donghwan | Naverlabs |
Co-Chair: Wang, Jingchuan | Shanghai Jiao Tong University |
|
10:00-10:15, Paper WeAT13.1 | |
>SpoxelNet: Spherical Voxel-Based Deep Place Recognition for 3D Point Clouds of Crowded Indoor Spaces |
> Video Attachment
|
|
Chang, Min Young | NAVER LABS |
Yeon, Suyong | Naver Labs |
Ryu, Soohyun | NAVER LABS |
Lee, Donghwan | Naverlabs |
Keywords: Localization, SLAM, Deep Learning for Visual Perception
Abstract: With its essential role in achieving full autonomy of robot navigation, place recognition has been widely studied with various approaches. Recently, numerous point cloud-based methods with deep learning implementation have been proposed with promising results for their application in outdoor environments. However, their performances are not as promising in indoor spaces because of the high level of occlusion caused by structures and moving objects. In this paper, we propose a point cloud-based place recognition method for crowded indoor spaces. The method consists of voxelizing point clouds in spherical coordinates and defining the occupancy of each voxel in ternary values. We also present SpoxelNet, a neural network architecture that encodes input voxels into global descriptor vectors by extracting the structural features in both fine and coarse scales. It also reinforces its performance in occluded places by concatenating feature vectors from multiple directions. Our method is evaluated in various indoor datasets and outperforms existing methods with a large margin.
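A hedged sketch of the spherical voxelization step is given below; the grid resolution, the hit threshold, and the reduction to a binary grid are illustrative assumptions (the paper's ternary encoding additionally distinguishes unobserved voxels, which requires ray-based occlusion reasoning not shown here).

```python
import numpy as np

def spherical_voxelize(points: np.ndarray, n_r=32, n_theta=64, n_phi=16,
                       r_max=25.0, min_hits=5) -> np.ndarray:
    """Spherical-voxel occupancy from a 3D point cloud (simplified sketch).

    points: (N, 3) sensor-centred coordinates. Returns a (n_r, n_theta, n_phi)
    grid with 1 for occupied voxels and 0 otherwise.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    theta = np.arctan2(y, x)                                   # azimuth in [-pi, pi]
    phi = np.arccos(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))  # polar angle

    valid = r < r_max
    idx_r = (r[valid] / r_max * n_r).astype(int).clip(0, n_r - 1)
    idx_t = ((theta[valid] + np.pi) / (2 * np.pi) * n_theta).astype(int).clip(0, n_theta - 1)
    idx_p = (phi[valid] / np.pi * n_phi).astype(int).clip(0, n_phi - 1)

    counts = np.zeros((n_r, n_theta, n_phi), dtype=int)
    np.add.at(counts, (idx_r, idx_t, idx_p), 1)                # accumulate hits per voxel
    return (counts >= min_hits).astype(np.int8)
```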
|
|
10:15-10:30, Paper WeAT13.2 | |
>Online Localization with Imprecise Floor Space Maps Using Stochastic Gradient Descent |
> Video Attachment
|
|
Li, Zhikai | National University of Singapore |
Ang Jr, Marcelo H | National University of Singapore |
Rus, Daniela | MIT |
Keywords: Localization, Autonomous Vehicle Navigation, SLAM
Abstract: Many indoor spaces have constantly changing layouts and may not be mapped by an autonomous vehicle, yet maps such as floor plans or evacuation maps of these places are common. We propose a method for an autonomous robot to localize itself on such maps with inconsistent scale using Stochastic Gradient Descent (SGD) with scan matching from a 2D LiDAR. We also introduce a new scale state in 2D localization to manage the possibly inconsistent scale of the input map. Experiments are conducted in an indoor corridor using three different input maps: a point cloud, a floor plan, and a hand-drawn map. The SGD localization algorithm is benchmarked against Adaptive Monte Carlo Localization (AMCL). In a point-cloud-mapped environment, our algorithm achieves 0.264 m and 5.26 degrees average position and heading errors, respectively. On the hand-drawn map, our SGD localization algorithm remains robust while AMCL fails. The role of the scale state in our SGD localization algorithm is demonstrated on poorly scaled maps.
|
|
10:30-10:45, Paper WeAT13.3 | |
>BIT-VO: Visual Odometry at 300 FPS Using Binary Features from the Focal Plane |
> Video Attachment
|
|
Murai, Riku | Imperial College London |
Saeedi, Sajad | Ryerson University |
Kelly, Paul H J | Imperial College London |
Keywords: Localization, Mapping, Visual-Based Navigation
Abstract: The Focal-plane Sensor-processor (FPSP) is a next-generation camera technology which enables every pixel on the sensor chip to perform computation in parallel, on the focal plane where the light intensity is captured. SCAMP-5 is a general-purpose FPSP used in this work and it carries out computations in the analog domain before analog-to-digital conversion. By extracting features from the image on the focal plane, the amount of data that is digitised and transferred is reduced. As a consequence, SCAMP-5 offers a high frame rate while maintaining low energy consumption. Here, we present BIT-VO, the first 6-degrees-of-freedom visual odometry algorithm which utilises the FPSP. Our entire system operates at 300 FPS in a natural environment, using binary edges and corner features detected by the SCAMP-5.
|
|
10:45-11:00, Paper WeAT13.4 | |
>Long-Term Localization with Time Series Map Prediction for Mobile Robots in Dynamic Environments |
> Video Attachment
|
|
Wang, Lisai | Shanghai Jiao Tong University |
Chen, Weidong | Shanghai Jiao Tong University |
Wang, Jingchuan | Shanghai Jiao Tong University |
Keywords: Localization
Abstract: In many mobile robot applications, the environment is constantly changing. Using historical information to analyse environmental changes and generate a map consistent with the current environment is important for achieving high-precision localization. Inspired by the predictive mechanism of the brain, this paper presents a long-term localization approach named ArmMPU (ARMA-based Map Prediction and Update) based on time series modeling and prediction. The autoregressive moving average (ARMA) model, a kind of time series modeling method, is employed for environmental map modeling and prediction; the predicted map and the filtered observation are then fused to correct the prediction error. Simulation and experiment results show that the proposed method improves long-term localization performance in dynamic environments.
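The per-cell time-series idea can be sketched as below. The paper fits a full ARMA model; to keep the sketch dependency-free it fits only the autoregressive part by least squares, and the representation of the map history as a stack of occupancy grids is an assumption.

import numpy as np

def fit_ar(series, p=2):
    """Fit an AR(p) model to one cell's occupancy history by least squares.
    The paper uses a full ARMA model; a pure AR fit keeps this sketch simple."""
    y = np.asarray(series, dtype=float)
    X = np.column_stack([y[p - k - 1:len(y) - k - 1] for k in range(p)])
    coeffs, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coeffs                                   # coeff for lag 1, lag 2, ...

def predict_next_map(history, p=2):
    """history: (T, H, W) stack of occupancy maps over time.
    Returns a one-step-ahead predicted map from the per-cell AR model."""
    T, H, W = history.shape
    pred = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            c = fit_ar(history[:, i, j], p)
            pred[i, j] = c @ history[-1:-p - 1:-1, i, j]   # newest sample first
    return np.clip(pred, 0.0, 1.0)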
|
|
11:00-11:15, Paper WeAT13.5 | |
>Improved Data Association Using Buffered Pose Adjustment for Map-Aided Localization |
|
Welte, Anthony | Université De Technologie De Compiègne |
Xu, Philippe | Université De Technologique De Compiègne |
Bonnifait, Philippe | Univ. of Technology of Compiegne |
Zinoune, Clément | University of Technologie of Compiègne, Renault SAS |
Keywords: Localization, Intelligent Transportation Systems
Abstract: Maps provide an important source of information for autonomous vehicles. They can be used along with cameras and lidars to localize the vehicle. This requires the ability to correctly associate observations to features referenced in the map. The problem is all the more difficult because not all observations are necessarily referenced, and all map features might not be detectable with the embedded sensors. This paper presents an adjustment technique that increases the number of associations that can be made while limiting the chance of obtaining wrong associations. This is achieved by accumulating observations in a buffer and matching them regularly in batches. Periodically, the observation buffer is used to adjust the trajectory used to match observations. This is done without making any assumption on the association between observations and map features, through a likelihood maximization process. The adjusted trajectory then provides the best associations, which are used for real-time localization. The method was tested with data recorded on public roads using an experimental vehicle. The results show that, thanks to the trajectory adjustment step and the use of an observation buffer, the number of associations that can be made is increased. This also results in greater localization accuracy and consistency, with an average error of 0.7 meters at 50 Hz using road markings and traffic signs.
|
|
WeAT14 |
Room T14 |
Modeling, Control, and Learning for Soft Robots 2 |
Regular session |
Chair: Yeow, Chen-Hua | National University of Singapore |
Co-Chair: Kuntz, Alan | University of Utah |
|
10:00-10:15, Paper WeAT14.1 | |
>Utilizing Sacrificial Molding for Embedding Motion Controlling Endostructures in Soft Pneumatic Actuators |
|
Bhat, Ajinkya | National University of Singapore |
Yeow, Chen-Hua | National University of Singapore |
Keywords: Soft Robot Materials and Design, Soft Sensors and Actuators, Modeling, Control, and Learning for Soft Robots
Abstract: The field of soft robotics has evolved as a domain for developing light, compliant and safe actuators. However, one of the challenges in the field is the lack of repeatable fabrication techniques as well as of customizability, which restricts the application of soft robots. We present a fabrication technique using sacrificial molding to fabricate pneumatic channels that are repeatable and less prone to variability. This technique enables the monolithic fabrication of actuators, which eliminates conventional failure modes such as delamination. We then use embedded endostructures manufactured using Fused Deposition Modelling (FDM) 3D printers to customize the behavior of the bending actuator by altering local mechanical characteristics. Finite element analysis (FEA) was used as a tool to tune the choice of materials and the geometry of the 3D printed layers based on the required application. We analyze the effect of the mechanical properties of the endostructures on actuator behavior and its utility in improving customizability. We analyzed the behavior of actuators with a variety of endostructures using visual markers. As predicted by the FEA and Euler-Bernoulli beam theory, the behavior of the actuators was seen to be influenced by the mechanical properties of the endostructure. Thus, we present a new methodology for tuning the mechanical properties of Soft Pneumatic Actuators (SPAs), which is simple and efficient to predict as well as easy to execute.
|
|
10:15-10:30, Paper WeAT14.2 | |
>Simultaneous Position-Stiffness Control of Antagonistically Driven Twisted-Coiled Polymer Actuators Using Model Predictive Control |
|
Moon, Hyungpil | Sungkyunkwan University |
Luong, Anh Tuan | Sungkyunkwan University |
Seo, Sungwon | SungKyunKwan Univ |
Kim, Kihyeon | Sungkyunkwan University |
Choi, Hyouk Ryeol | Sungkyunkwan University |
Koo, Ja Choon | Sungkyunkwan University |
Jeon, Jeongmin | Sungkyunkwan University |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Applications, Soft Sensors and Actuators
Abstract: Super-coiled polymer (SCP) artificial muscles have many interesting properties that show potential for making high performance bionic devices. To realize human-like robotic devices from this type of actuator, it is important for SCP-driven mechanisms to achieve human-like performance, such as compliant behaviors through antagonistic mechanisms. This paper presents the simultaneous position-stiffness control of an antagonistic joint driven by hybrid twisted-coiled polymer actuation bundles made from Spandex and nylon fibers, a compliant behavior commonly found in humans. Based on a linear model of the system, which is identified and verified experimentally, a controller based on model predictive control (MPC) is designed. The MPC performance is enhanced by the incorporation of time delay estimation to estimate model variations and external disturbances. The controlled system is verified through simulations and experiments. The results show the controller's ability to control the joint angle with a maximum position error of 0.6 degrees while changing joint stiffness, verified with both step commands and sinusoidal references with composite frequencies from 0.01 Hz to 0.1 Hz.
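As a rough illustration of the receding-horizon structure (not the authors' controller), the sketch below solves an unconstrained linear MPC problem in closed form for a generic discrete-time model; input constraints, the antagonistic actuator pairing, stiffness regulation, and the time-delay estimation described in the abstract are all omitted.

import numpy as np

def mpc_step(A, B, x0, x_ref, horizon=20, q=1.0, r=0.01):
    """One step of unconstrained linear MPC for x[k+1] = A x[k] + B u[k].
    Stacks the predictions over the horizon and solves the resulting
    least-squares problem in closed form; this is a generic illustration,
    not the paper's constrained MPC with time-delay estimation.
    """
    n, m = A.shape[0], B.shape[1]
    # prediction matrices: X = F x0 + G U
    F = np.vstack([np.linalg.matrix_power(A, k + 1) for k in range(horizon)])
    G = np.zeros((horizon * n, horizon * m))
    for i in range(horizon):
        for j in range(i + 1):
            G[i*n:(i+1)*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, i - j) @ B
    Xref = np.tile(x_ref, horizon)
    Q = q * np.eye(horizon * n)
    R = r * np.eye(horizon * m)
    # argmin_U ||F x0 + G U - Xref||_Q^2 + ||U||_R^2
    U = np.linalg.solve(G.T @ Q @ G + R, G.T @ Q @ (Xref - F @ x0))
    return U[:m]        # apply only the first control move (receding horizon)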
|
|
10:30-10:45, Paper WeAT14.3 | |
>Teleoperation and Contact Detection of a Waterjet-Actuated Soft Continuum Manipulator for Low-Cost Gastroscopy |
|
Campisano, Federico | Vanderbilt University |
Remirez, Andria | Vanderbilt University |
Landewee, Claire Ann | Vanderbilt University, STORM Lab |
Calò, Simone | University of Leeds |
Obstein, Keith | Vanderbilt University |
Webster III, Robert James | Vanderbilt University |
Valdastri, Pietro | University of Leeds |
Keywords: Modeling, Control, and Learning for Soft Robots, Telerobotics and Teleoperation, Contact Modeling
Abstract: Gastric cancer is the third leading cause of cancer deaths worldwide, with most new cases occurring in low and middle income countries, where access to screening programs is hindered by the high cost of conventional endoscopy. The waterjet-actuated HydroJet endoscopic platform was developed as a low-cost, disposable alternative for inspection of the gastric cavity in low-resource settings. In this work, we present a teleoperation scheme and contact detection algorithm that work together to enable intuitive teleoperation of the HydroJet within the confined space of the stomach. Using a geometrically accurate stomach model and realistic anatomical inspection targets, we demonstrate that, using these methods, a novice user can complete a gastroscopy in approximately the same amount of time with the HydroJet as with a conventional endoscope.
|
|
10:45-11:00, Paper WeAT14.4 | |
>Data-Driven Disturbance Observers for Estimating External Forces on Soft Robots |
|
Della Santina, Cosimo | Massachusetts Institute of Technology |
Truby, Ryan Landon | Massachusetts Institute of Technology |
Rus, Daniela | MIT |
Keywords: Modeling, Control, and Learning for Soft Robots, Dynamics, Soft Robot Applications
Abstract: Unlike traditional robots, soft robots can intrinsically interact with their environment in a continuous, robust, and safe manner. These abilities - and the new opportunities they open - motivate the development of algorithms that provide reliable information on the nature of environmental interactions and, thereby, enable soft robots to reason on and properly react to external contact events. However, directly extracting such information with integrated sensors remains an arduous task that is further complicated by also needing to sense the soft robot's configuration. As an alternative to direct sensing, this paper addresses the challenge of estimating contact forces directly from the robot's posture. We propose a new technique that merges a nominal disturbance observer, a model-based component, with corrections learned from data. The result is an algorithm that is accurate yet sample efficient, and one that can reliably estimate external contact events with the environment. We prove the convergence of our proposed method analytically, and we demonstrate its performance with simulations and physical experiments.
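The hybrid structure (a nominal model-based estimate plus a correction learned from data) can be sketched as follows. The feature basis, the ridge regression, and the interface of nominal_model are illustrative assumptions; the paper's observer and learning component are more elaborate.

import numpy as np

class HybridDisturbanceObserver:
    """Nominal (model-based) external-force estimate plus a data-driven
    correction learned from logged data, mirroring the structure described in
    the abstract.  `nominal_model` is any callable mapping the measured
    posture/configuration vector to a force estimate; the polynomial feature
    basis and ridge-regression correction are illustrative assumptions.
    """
    def __init__(self, nominal_model, reg=1e-3):
        self.nominal = nominal_model
        self.reg = reg
        self.W = None

    def _features(self, q):
        q = np.atleast_2d(q)
        return np.hstack([q, q**2, np.ones((len(q), 1))])   # simple basis

    def fit(self, q_log, f_true):
        """Learn the correction from residuals between ground-truth forces
        and the nominal observer output (q_log: (N, d), f_true: (N, dof))."""
        residual = f_true - np.array([self.nominal(q) for q in q_log])
        Phi = self._features(q_log)
        self.W = np.linalg.solve(Phi.T @ Phi + self.reg * np.eye(Phi.shape[1]),
                                 Phi.T @ residual)

    def estimate(self, q):
        f = self.nominal(q)
        if self.W is not None:
            f = f + (self._features(q) @ self.W).ravel()
        return f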
|
|
WeAT15 |
Room T15 |
Modeling, Control, and Learning for Soft Robots 1 |
Regular session |
Chair: Simaan, Nabil | Vanderbilt University |
Co-Chair: Misra, Sarthak | University of Twente |
|
10:00-10:15, Paper WeAT15.1 | |
>Towards Gradient-Based Actuation of Magnetic Soft Robots Using a Six-Coil Electromagnetic System |
> Video Attachment
|
|
Kalpathy Venkiteswaran, Venkatasubramanian | University of Twente |
Misra, Sarthak | University of Twente |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Applications, Biologically-Inspired Robots
Abstract: Soft materials with embedded magnetic properties can be actuated in a contactless manner for dexterous motion in restricted and unstructured environments. Magnetic soft robots have been demonstrated to be capable of versatile and programmable untethered motion. However, magnetic soft robots reported in the literature are typically actuated by utilizing magnetic fields to generate torques that produce deformation. By contrast, this work investigates the utilization of field gradients to produce tethering forces for anchoring soft robots to the working surface, in conjunction with the use of magnetic fields to generate torques for deformation. The methodology applied here uses a six-coil electromagnetic system for field generation. The approach to achieve the magnetic field and gradients desired for soft robot motion is described, along with the restrictions imposed by Maxwell's equations. The design and fabrication of the soft robots are explained together with calculations to assess the capabilities of the actuation system. Proof-of-concept demonstrations of soft robot motion show Hexapede robots with the ability to 'walk' untethered on the ceiling of the workspace, working against gravity; and lightweight Worm robots made of thin strips of material are demonstrated to locomote while staying in contact with the ground.
|
|
10:15-10:30, Paper WeAT15.2 | |
>Model Identification of a Soft Robotic Neck |
|
Quevedo Vallejo, Fernando | Universidad Carlos III De Madrid (UC3M) |
Muñoz, Jorge | University |
Castano, Juan Alejandro | Carlos III University |
Monje, Concepción A. | University Carlos III of Madrid |
Balaguer, Carlos | Universidad Carlos III De Madrid |
Keywords: Modeling, Control, and Learning for Soft Robots, Calibration and Identification, Flexible Robots
Abstract: Soft links and actuators are emerging technologies aiming to overcome problems in robotics such as weight, cost or human interaction. However, the nonlinear nature of their elements can make their characterization challenging and hinder the use of standard control engineering tools. In this paper, we explore different state-of-the-art identification methods for the soft neck, in order to find a reliable plant model. Even though the neck has three degrees of freedom, in this work we only consider the planar deflection of the link as a starting point for future analysis. Given the nonlinear nature of the soft neck, we consider two identification strategies, i.e., set membership, which is a data-driven, nonlinear and nonparametric identification strategy, and Recursive Least Squares at selected linearization points. A neural network identification is also given for comparison purposes. Results show that the explored methods offer a suitable alternative for identifying the dynamics of the neck, allowing their implementation for simulation and future control.
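Of the identification strategies mentioned, Recursive Least Squares is the most compact to illustrate. The sketch below shows a standard RLS update for a linear-in-parameters model; the ARX regressor and the forgetting factor are illustrative assumptions, not the paper's exact configuration.

import numpy as np

class RecursiveLeastSquares:
    """Standard RLS estimator for a linear-in-parameters model y = phi^T theta,
    usable to identify a local (linearised) model around an operating point.
    The ARX regressor below and the forgetting factor are illustrative."""
    def __init__(self, n_params, forgetting=0.98):
        self.theta = np.zeros(n_params)
        self.P = 1e3 * np.eye(n_params)
        self.lam = forgetting

    def update(self, phi, y):
        phi = np.asarray(phi, dtype=float)
        Pphi = self.P @ phi
        k = Pphi / (self.lam + phi @ Pphi)                    # gain vector
        self.theta = self.theta + k * (y - phi @ self.theta)  # parameter update
        self.P = (self.P - np.outer(k, Pphi)) / self.lam      # covariance update
        return self.theta

# ARX(2,2) example: y[k] = a1*y[k-1] + a2*y[k-2] + b1*u[k-1] + b2*u[k-2]
rls = RecursiveLeastSquares(4)
# at each new sample: rls.update([y_km1, y_km2, u_km1, u_km2], y_k)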
|
|
10:30-10:45, Paper WeAT15.3 | |
>Observer-Based Control of Inflatable Robot with Variable Stiffness |
> Video Attachment
|
|
Ataka, Ahmad | Queen Mary University of London |
Abrar, Taqi | Queen Mary University of London |
Putzu, Fabrizio | Queen Mary University of London |
Godaba, Hareesh | Queen Mary University of London |
Althoefer, Kaspar | Queen Mary University of London |
Keywords: Modeling, Control, and Learning for Soft Robots, Motion Control, Soft Robot Applications
Abstract: In the last decade, soft robots have been at the forefront of a robotic revolution. Due to the flexibility of the soft materials employed, soft robots are equipped with a capability to execute new tasks in new application areas - beyond what can be achieved using classical rigid-link robots. Despite these promising properties, many soft robots nowadays lack the capability to exert sufficient force to perform various real-life tasks. This has led to the development of stiffness-controllable inflatable robots instilled with the ability to modify their stiffness during motion. This new capability, however, poses an even greater challenge for robot control. In this paper, we propose a model-based kinematic control strategy to guide the tip of an inflatable robot arm in its environment. The bending of the robot is modelled using Euler-Bernoulli beam theory, which takes into account the variation of the robot's structural stiffness. The parameters of the model are estimated online using an observer based on the Extended Kalman Filter (EKF). The parameter estimates are used to approximate the Jacobian matrix online and to control the robot's tip while considering variations in the robot's stiffness. Simulation results and experiments using a fabric-based planar 3-degree-of-freedom (DOF) inflatable manipulator demonstrate the promising performance of the proposed control algorithm.
|
|
10:45-11:00, Paper WeAT15.4 | |
>Solving Cosserat Rod Models Via Collocation and the Magnus Expansion |
> Video Attachment
|
|
Orekhov, Andrew | Vanderbilt University |
Simaan, Nabil | Vanderbilt University |
Keywords: Flexible Robots, Kinematics, Modeling, Control, and Learning for Soft Robots
Abstract: Choosing a kinematic model for a continuum robot typically involves making a tradeoff between accuracy and computational complexity. One common modeling approach is to use the Cosserat rod equations, which have been shown to be accurate for many types of continuum robots. This approach, however, still presents significant computational cost, particularly when many Cosserat rods are coupled via kinematic constraints. In this work, we propose a numerical method that combines orthogonal collocation on the local rod curvature and forward integration of the Cosserat rod kinematic equations via the Magnus expansion, allowing the equilibrium shape to be written as a product of matrix exponentials. We provide a bound on the maximum step size to guarantee convergence of the Magnus expansion for the case of Cosserat rods, compare in simulation against other approaches, and demonstrate the tradeoffs between speed and accuracy for the fourth and sixth order Magnus expansions as well as for different numbers of collocation points. Our results show that the proposed method can find accurate solutions to the Cosserat rod equations and can potentially be competitive in computation speed.
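The "product of matrix exponentials" idea can be illustrated with a short forward-integration routine. The sketch below propagates the backbone frame with a fourth-order, two-point Magnus step given a known strain field; the collocation fit of the curvature, boundary conditions, and coupling constraints from the paper are not included.

import numpy as np
from scipy.linalg import expm

def hat(xi):
    """se(3) hat operator: xi = [wx, wy, wz, vx, vy, vz] -> 4x4 matrix."""
    wx, wy, wz, vx, vy, vz = xi
    return np.array([[0.0, -wz,  wy, vx],
                     [ wz, 0.0, -wx, vy],
                     [-wy,  wx, 0.0, vz],
                     [0.0, 0.0, 0.0, 0.0]])

def integrate_rod(xi_of_s, L, n_steps=50, T0=np.eye(4)):
    """Integrate the rod kinematics T'(s) = T(s) * hat(xi(s)) with a
    fourth-order, two-point Magnus step, so the backbone shape is a product
    of matrix exponentials.  xi_of_s(s) returns the 6-vector of strains at
    arc length s (e.g., from a collocation fit); this sketch covers only the
    forward integration, not the full equilibrium/collocation solver.
    """
    h = L / n_steps
    T = T0.copy()
    frames = [T.copy()]
    c1, c2 = 0.5 - np.sqrt(3) / 6, 0.5 + np.sqrt(3) / 6      # Gauss points
    for k in range(n_steps):
        s0 = k * h
        A1, A2 = hat(xi_of_s(s0 + c1 * h)), hat(xi_of_s(s0 + c2 * h))
        # commutator sign matches the right-multiplication form T' = T A(s)
        Omega = (h / 2) * (A1 + A2) + (np.sqrt(3) * h**2 / 12) * (A1 @ A2 - A2 @ A1)
        T = T @ expm(Omega)
        frames.append(T.copy())
    return frames

# example: constant curvature about the body x-axis with unit stretch
# frames = integrate_rod(lambda s: np.array([1.0, 0, 0, 0, 0, 1.0]), L=0.2)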
|
|
WeAT16 |
Room T16 |
Soft Actuators |
Regular session |
Chair: Mazumdar, Yi | Georgia Institute of Technology |
Co-Chair: Swensen, John | Washington State University |
|
10:00-10:15, Paper WeAT16.1 | |
>Design of Fully Soft Actuator with Double-Helix Tendon Routing Path for Twisting Motion |
> Video Attachment
|
|
Choi, Joonmyeong | University of Ulsan College of Medicine |
Ahn, Se Hyeok | Seoul National University |
Cho, Kyu-Jin | Seoul National University, Biorobotics Laboratory |
Keywords: Soft Sensors and Actuators, Soft Robot Materials and Design, Surgical Robotics: Steerable Catheters/Needles
Abstract: Soft actuators have been widely studied in recent years because of their ability to adapt to diverse environments and safely interact with humans. Their softness broadens their potential range of medical applications since they can provide inherent safety. Among the various motions a soft robot can perform, “torsion” can maximize the efficiency of motion in confined spaces like the human abdominal cavity. This paper presents a fully soft actuator with a double-helix tendon routing path for large-angle torsional motions. The double-helix tendon routing enables the actuator to generate large twisting deformations, while also avoiding buckling generally associated with the torque imbalance in small diameter soft cylinder structures. A sequential casting method was developed for cylindrical structures with internal double-helix pathing. A parametric study of the actuator’s twisting angle and the axial contraction with respect to different design parameters was conducted, including the wire tension and path pitch. From the results, when the tendon was pulled with 40 N after the pitch was decreased, the axial contraction of the soft actuator was reduced by half and the torsional angle was doubled up to 600 degrees without buckling.
|
|
10:15-10:30, Paper WeAT16.2 | |
>Design of a Highly-Maneuverable Pneumatic Soft Actuator Driven by Intrinsic SMA Coils (PneuSMA Actuator) |
|
Allen, Emily | Washington State University |
Swensen, John | Washington State University |
Keywords: Soft Robot Materials and Design, Biologically-Inspired Robots, Flexible Robots
Abstract: This paper presents the design of a new soft pneumatic actuator whose direction and magnitude of bending may be precisely controlled via activation of different shape memory alloy (SMA) springs within the actuator, in conjunction with pneumatic actuation. This design is inspired by examples seen in nature such as the human tongue, where the combination of hydrostatic pressure and contraction of intrinsic muscle groups enables precise maneuverability and morphing capabilities. Here, SMA springs are embedded in the walls of the actuator, serving as intrinsic muscles that may be selectively activated to constrain the device. The pneumatic SMA (PneuSMA) actuator demonstrates remarkable spatial controllability evidenced by testing under different pressures and SMA activation combinations. A baseline finite element model is also developed to predict the actuator deformation under different pressure and activation conditions.
|
|
10:30-10:45, Paper WeAT16.3 | |
>Multi-Modal Pneumatic Actuator for Twisting, Extension, and Bending |
> Video Attachment
|
|
Balak, Roman | Georgia Institute of Technology |
Mazumdar, Yi | Georgia Institute of Technology |
Keywords: Soft Sensors and Actuators, Soft Robot Materials and Design, Soft Robot Applications
Abstract: Soft pneumatic actuators are commonly used in robotics for creating single-axis compression, extension, or bending motions. If these actuators are composed of compliant materials, they can also have low off-axis stiffnesses, making it difficult to restrict off-axis motions. In this work, we exploit the low off-axis stiffnesses of pneumatic actuators to design a modular actuator system that is capable of multi-modal extension, compression, two-axis bending, and twisting motions. By combining physical constraint mechanisms and motion planning, we demonstrate closed loop control with up to 24 mm of compression, 70 mm of extension, 115 degrees of bending, and 240 degrees of twisting. This actuator system is then used to illustrate several unique applications including twisting for unscrewing bottle caps and peristaltic crawling for locomotion.
|
|
10:45-11:00, Paper WeAT16.4 | |
>Wireless Soft Actuator Based on Liquid-Gas Phase Transition Controlled by Millimeter-Wave Irradiation |
|
Ueno, Soichiro | Keio University |
Monnai, Yasuaki | Keio University |
Keywords: Soft Robot Applications, Force Control
Abstract: We propose a wireless soft actuator controlled thermally by millimeter-wave irradiation. The actuator is composed of low boiling point liquid sealed in a soft bellows. By irradiating high-power millimeter-waves, the liquid can be evaporated to generate a strong mechanical force. We characterize the force and work extracted from the bellows as a function of the liquid volume and temperature. We then demonstrate the wireless actuation by irradiating a millimeter-wave on the bellows. We also evaluate its dynamic response by modulating the millimeter-wave. Our approach provides novel usage and design space of soft actuators.
|
|
11:00-11:15, Paper WeAT16.5 | |
>An Ionic Polymer Metal Composite (IPMC)-Driven Linear Peristaltic Microfluidic Pump |
|
Sideris, Eva Ann | Eindhoven University of Technology |
de Lange, Hendrik Cornelis | Eindhoven University of Technology |
Hunt, Andres | Delft University of Technology |
Keywords: Soft Sensors and Actuators, Medical Robots and Systems, Micro/Nano Robots
Abstract: Microfluidic devices and micro-pumps are increasingly necessitated in many fields ranging from untethered soft robots, to pharmaceutical and biomedical technology. While realization of such devices is limited by miniaturization constraints of conventional actuators, these restrictions can be resolved by using smart material transducers instead. This paper proposes and investigates the first ionic polymer metal composite (IPMC) actuator-driven linear peristaltic pump. With the aim of designing a monolithic device, our concept is based on a single IPMC actuator that is etched on both sides and cut with kirigami-inspired slits by laser ablation. Our pump has a planar configuration, operates with low activation voltages (< 5 V) and is simple to manufacture and thus miniaturize. We build proof-of-principle prototypes of an open and closed design of our proposed pump concept, model the closed design, and evaluate both configurations experimentally. Results show the feasibility of the proposed IPMC-driven pump. Without any optimization, the open pump achieved pumping rates of 669 pL · s−1, while the closed pump configuration attained a 4.57 Pa pressure buildup and 9.18 nL · s−1 pumping rate. These results indicate feasibility of the concept and future work will focus on design optimization.
|
|
11:15-11:30, Paper WeAT16.6 | |
>The Multi-Material Actuator for Variable Stiffness (MAVS): Design, Modeling, and Characterization of a Soft Actuator for Lateral Ankle Support |
> Video Attachment
|
|
Thalman, Carly | Arizona State University |
Hertzell, Tiffany | Arizona State University |
Debeurre, Marielle Prescott | Arizona State University |
Lee, Hyunglae | Arizona State University |
Keywords: Soft Robot Materials and Design, Soft Sensors and Actuators, Soft Robot Applications
Abstract: This paper presents the design of the Multi-material Actuator for Variable Stiffness (MAVS), which consists of an inflatable soft fabric actuator fixed between two layers of rigid retainer pieces. The MAVS is designed to be integrated with a soft robotic ankle-foot orthosis (SR-AFO) exosuit to aid in supporting the human ankle in the inversion/eversion directions. This design aims to assist individuals affected with chronic ankle instability (CAI) or other impairments to the ankle joint. The MAVS design is made from compliant fabric materials, layered and constrained by thin rigid retainers to prevent volume increase during actuation. The design was optimized to provide the greatest stiffness and least deflection for a beam positioned as a cantilever with a point load. Geometric programming of materials was used to maximize stiffness when inflated and minimize stiffness when passive. An analytic model of the MAVS was created to evaluate the effects in stiffness observed by varying the ratio in length between the rigid pieces and the soft actuator. A finite element analysis (FEA) was generated to analyze and predict the behavior of the MAVS prior to fabrication. The results from the analytic model and FEA study were compared to experimentally obtained results of the MAVS. The MAVS with the greatest stiffness was observed when the gap between the rigid retainers was smallest and the rigid retainer length was smallest. The MAVS design with the highest stiffness at 100 kPa was determined, which required 26.71 +/- 0.06 N to deflect the actuator 20 mm, and a resulting stiffness of 1,335.5 N/m and 9.1% margin of error from the model predictions.
|
|
11:30-11:45, Paper WeAT16.7 | |
>Hybrid Fluidic Actuation for a Foam-Based Soft Actuator |
|
Peters, Jan | Leibniz Universität Hannover |
Anvari, Bani | Lecturer (Assistant Professor) |
Chen, Cheng | University College London |
Lim, Zara Timothea Yue Xin | King's College London |
Wurdemann, Helge Arne | University College London |
Keywords: Soft Robot Applications, Soft Robot Materials and Design, Soft Sensors and Actuators
Abstract: Actuation means for soft robotic structures are manifold: besides actuation mechanisms such as tendon-driven manipulators or shape memory alloys, the majority of soft robotic actuators are fluidically actuated - either purely by positive or negative air pressure or by hydraulic actuation only. This paper presents the novel idea of employing hybrid fluidic - hydraulic and pneumatic - actuation for soft robotic systems. The concept and design of the hybrid actuation system as well as the fabrication of the soft actuator are presented: Polyvinyl Alcohol (PVA) foam is embedded inside a casted, reinforced silicone chamber. A hydraulic and a pneumatic robotic syringe pump are connected to the base and top of the soft actuator. We found that a higher percentage of hydraulics resulted in a higher output force. Hydraulic actuation is further able to change displacements at a higher rate compared to pneumatic actuation. Changing between Hydraulic:Pneumatic (HP) ratios shows how the stiffness properties of a soft actuator can be varied.
|
|
WeAT17 |
Room T17 |
Soft Grippers |
Regular session |
Chair: Park, Yong-Lae | Seoul National University |
Co-Chair: Spenko, Matthew | Illinois Institute of Technology |
|
10:00-10:15, Paper WeAT17.1 | |
>Laminar Jamming Flexure Joints for the Development of Variable Stiffness Robot Grippers and Hands |
> Video Attachment
|
|
Gerez, Lucas | The University of Auckland |
Gao, Geng | University of Auckland |
Liarokapis, Minas | The University of Auckland |
Keywords: Soft Robot Materials and Design
Abstract: Although soft robots are a good alternative to rigid, traditional robots due to their intrinsic compliance and environmental adaptability, there are several drawbacks that limit their impact, such as low force exertion capability and low resistance to deformation. For this reason, soft structures of variable stiffness have become a popular solution in the field to combine the benefits of both soft and rigid designs. In this paper, we develop laminar jamming flexure joints that facilitate the development of adaptive robot grippers with variable stiffness. Initially, we propose a mathematical model of the laminar jamming structures. Then, the model is experimentally validated through bending tests using different materials, pressures, and numbers of layers. Finally, the soft laminar jamming structures are employed to develop variable stiffness flexure joints for two different adaptive robot grippers. Bending profile analysis and grasping tests have demonstrated the benefits of the proposed jamming structures and the capabilities of the designed grippers.
|
|
10:15-10:30, Paper WeAT17.2 | |
>An Underactuated Gripper Using Origami-Folding Inspired Variable Stiffness Flexure Hinges |
> Video Attachment
|
|
Godaba, Hareesh | Queen Mary University of London |
Sajad, Aqeel | Queen Mary University of London |
Patel, Navin | Queen Mary University of London |
Althoefer, Kaspar | Queen Mary University of London |
Zhang, Ketao | Queen Mary University of London |
Keywords: Soft Robot Materials and Design, Grippers and Other End-Effectors, Underactuated Robots
Abstract: This paper presents a novel approach for developing robotic grippers with variable stiffness hinges for dexterous grasps. This approach for the first time uses pneumatically actuated pouch actuators to fold and unfold the morphable flaps of flexure hinges, thus changing the stiffness of the hinge. By varying the air pressure in the pouch actuators, the flexure hinge morphs into a beam with various open sections while the flaps bend, enabling stiffness variation of the flexure hinge. This design allows 3D printing of the flexure hinge using printable soft filaments. Utilizing the variable stiffness flexure hinges as the joints of robotic fingers, a light-weight and low-cost two-fingered tendon-driven robotic gripper is developed. The stiffness variation caused by the shape morphing of flexure hinges is studied by conducting static tests on fabricated hinges with different flap angles and on a flexure hinge with flaps that are erected by pouch actuators subjected to various pressures. Multiple grasp modes of the two-fingered gripper are demonstrated by grasping objects with various geometric shapes. The gripper is then integrated with a robot manipulator in a teleoperation setup for conducting a pick-and-place operation in a confined environment.
|
|
10:30-10:45, Paper WeAT17.3 | |
>A Soft Humanoid Hand with In-Finger Visual Perception |
> Video Attachment
|
|
Hundhausen, Felix | Karlsruhe Institute of Technology |
Starke, Julia | Karlsruhe Institute of Technology |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Multifingered Hands, Soft Robot Materials and Design, Humanoid Robot Systems
Abstract: We present a novel underactuated humanoid five-finger soft hand, the KIT softhand, which is equipped with cameras inside the fingertips and integrates a high performance embedded system for visual processing and control. We describe the actuation mechanism of the hand and the tendon-driven soft finger design with internally routed high-bandwidth flat-flex cables. For efficient on-board parallel processing of visual data from multiple fingertip cameras, we present a hybrid embedded architecture consisting of a field-programmable gate array (FPGA) and a microcontroller that allows the realization of visual object segmentation based on convolutional neural networks. We evaluate the hand design by conducting durability experiments with one finger and quantify the grasp performance in terms of grasping force, speed and grasp success. The results show that the hand exhibits a grasp force of 31.8 N and a mechanical durability of the finger of more than 15,000 closing cycles. Finally, we evaluate the accuracy of visual object segmentation during the different phases of the grasping process using five different objects. Hereby, an accuracy above 90% can be achieved.
|
|
10:45-11:00, Paper WeAT17.4 | |
>Exploring the Role of Palm Concavity and Adaptability in Soft Synergistic Robotic Hands |
> Video Attachment
|
|
Capsi Morales, Patricia | University of Pisa |
Grioli, Giorgio | Istituto Italiano Di Tecnologia |
Piazza, Cristina | Northwestern University |
Bicchi, Antonio | Università Di Pisa |
Catalano, Manuel Giuseppe | Istituto Italiano Di Tecnologia |
Keywords: Multifingered Hands, Mechanism Design, Natural Machine Motion
Abstract: Robotic hand engineers usually focus on finger capabilities, often disregarding the palm contribution. Inspired by human anatomy, this paper explores the advantages of including a flexible concave palm into the design of a robotic hand actuated by soft synergies. We analyse how the inclusion of an articulated palm improves finger workspace and manipulability. We propose a mechanical design of a modular palm with two elastic rolling-contact palmar joints, that can be integrated on the Pisa/IIT SoftHand, without introducing additional motors. With this prototype, we evaluate experimentally the grasping capabilities of a robotic palm. We compare its performance to that of the same robotic hand with the palm fixed, and to that of a human hand. To assess the effective grasp quality achieved by the three systems, we measure the contact area using paint-transfer patterns in different grasping actions. Preliminary grasping experiments show a closer resemblance of the soft-palm robotic hand to the human hand. Results evidence a higher adaptive capability and a larger involvement of all fingers in grasping.
|
|
11:00-11:15, Paper WeAT17.5 | |
>Delicate Fabric Handling Using a Soft Robotic Gripper with Embedded Microneedles |
> Video Attachment
|
|
Ku, Subyeong | Seoul National University |
Myeong, Jihye | Seoul National University |
Kim, Ho-Young | Seoul National University |
Park, Yong-Lae | Seoul National University |
Keywords: Grippers and Other End-Effectors, Biologically-Inspired Robots, Dexterous Manipulation
Abstract: We propose a soft robotic gripper that can handle various types of fabrics with delicacy for applications in the field of garment manufacturing. The design was inspired by the adhesion mechanism of a parasitic fish called the 'lamprey.' The proposed gripper not only is able to pick up and hold a single sheet of fabric from a stack but also does not cause any damage to it. In this work, we first modeled the holding force of the gripper and then experimentally evaluated its performance with different types of fabrics, in terms of the holding force and the response time. The experimental data showed a reasonable agreement with the values predicted by the model. The actuation time and the maximum holding force measured in the experiments were 0.32 seconds and 1.12 N, respectively. The gripper showed high success rates in picking up a single sheet of air-permeable fabric, which was not possible with a commercial vacuum pad. It also showed durability over 20,000 cycles of repeated gripping motions. We believe the proposed gripper has a high potential in realizing smart manufacturing in the garment industry.
|
|
11:15-11:30, Paper WeAT17.6 | |
>An Electrostatic/Gecko-Inspired Adhesives Soft Robotic Gripper |
> Video Attachment
|
|
Alizadehyazdi, Vahid | The University of Chicago |
Bonthron, Michael | Illinois Institute of Technology |
Spenko, Matthew | Illinois Institute of Technology |
Keywords: Grippers and Other End-Effectors, Soft Robot Materials and Design, Biologically-Inspired Robots
Abstract: Compared to traditional grippers, soft grippers can typically grasp a wider range of objects, including ones that are soft, fragile, or irregularly shaped, but at the cost of a relatively low gripping force. To increase gripping force for soft grippers, this research presents a gripper with an integrated electrostatic and gecko-inspired adhesive. Synthetic gecko-inspired, microstructured adhesives are controllable (i.e. they can be turned on and off) and work on a wide range of substrates and materials; however, they are not typically effective on rough surfaces. In contrast, electrostatic adhesives, also controllable, have a higher tolerance to rough surfaces. By combining the two, it is possible to create an adhesive that is effective on a wider range of materials and roughness, including fabric. To increase the gripping force, parameters that affect electrostatic adhesion, including the electrode gap, electrode width, relative permittivity of gecko-inspired layer, and air gap between the adhesive and substrate were studied with Comsol Multiphysics software and experimentally validated. Results show that adding the two adhesives improves the gripping capabilities across acrylic, Tyvek fabric, and Kapton hemispheres of different diameters on an average of 100, 39, and 168%, respectively.
|
|
WeAT18 |
Room T18 |
Soft Robot Applications I |
Regular session |
Chair: Blumenschein, Laura | Purdue University |
|
10:00-10:15, Paper WeAT18.1 | |
>A Compact, Cable-Driven, Activatable Soft Wrist with Six Degrees of Freedom for Assembly Tasks |
> Video Attachment
|
|
von Drigalski, Felix Wolf Hans Erich | OMRON SINIC X Corporation |
Tanaka, Kazutoshi | OMRON SINIC X Corporation |
Hamaya, Masashi | OMRON SINIC X Corporation |
Lee, Robert | Australian Centre for Robotic Vision |
Nakashima, Chisato | OMRON Corp |
Shibata, Yoshiya | OMRON Corporation |
Ijiri, Yoshihisa | OMRON Corp |
Keywords: Factory Automation, Soft Robot Applications, Soft Robot Materials and Design
Abstract: Physical softness has been proposed to absorb impacts when establishing contact with a robot or its workpiece, to relax control requirements and improve performance in assembly and insertion tasks. Previous work has focused on special end effector solutions for isolated tasks, such as the peg-in-hole task. However, as many robot tasks require the precision of rigid robots, and their performance would degrade when simply adding compliance, it has been difficult to take advantage of physical softness in real applications. A wrist that could switch between soft and rigid modes could solve this problem, but actuators with sufficient strength for this state transition would increase the size and weight of the module and decrease the payload of the robot. To solve this problem, we propose a novel design of a soft module consisting of a cable-driven mechanism, which allows the robot end effector to change between soft and rigid mode while being very compact and light. The module effectively combines the advantages of soft and rigid robots, and can be retrofitted to existing robots and grippers while preserving the characteristics of the robotic system. We evaluate the effectiveness of our proposed design through experiments modeling assembly tasks, and investigate design parameters quantitatively.
|
|
10:15-10:30, Paper WeAT18.2 | |
>An Untethered Brittle Star-Inspired Soft Robot for Closed-Loop Underwater Locomotion |
> Video Attachment
|
|
Patterson, Zach J. | Carnegie Mellon University |
Sabelhaus, Andrew P. | Carnegie Mellon University |
Chin, Keene | Carnegie Mellon University |
Hellebrekers, Tess | Carnegie Mellon University |
Majidi, Carmel | Carnegie Mellon University |
Keywords: Soft Robot Applications, Biologically-Inspired Robots, Motion and Path Planning
Abstract: Soft robots are capable of inherently safer interactions with their environment than rigid robots since they can mechanically deform in response to unanticipated stimuli. However, their complex mechanics can make planning and control difficult, particularly with tasks such as locomotion. In this work, we present a mobile and untethered underwater crawling soft robot, PATRICK, paired with a testbed that demonstrates closed-loop locomotion planning. PATRICK is inspired by the brittle star, with five flexible legs actuated by a total of 20 shape-memory alloy (SMA) wires, providing a rich variety of possible motions via its large input space. We propose a motion planning infrastructure based on a simple set of PATRICK's motion primitives, and provide experiments showing that the planner can command the robot to locomote to a goal state. These experiments contribute the first examples of closed-loop, state-space goal seeking of an underwater, untethered, soft crawling robot, and make progress towards full autonomy of soft mobile robotic systems.
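A minimal version of planning over a small set of motion primitives might look like the sketch below. The primitive names and their displacement models are hypothetical placeholders (the abstract does not specify them), and the greedy selection is a simplification of the paper's planner.

import numpy as np

# hypothetical per-primitive displacement models (dx, dy, dtheta), e.g.
# measured once per primitive; the actual primitives and their effects are
# not given in the abstract.
PRIMITIVES = {
    "crawl_fwd":  (0.03, 0.00,  0.0),
    "turn_left":  (0.01, 0.00,  0.3),
    "turn_right": (0.01, 0.00, -0.3),
}

def apply(state, prim):
    x, y, th = state
    dx, dy, dth = PRIMITIVES[prim]
    # displacement is expressed in the robot frame, rotate into world frame
    return (x + dx * np.cos(th) - dy * np.sin(th),
            y + dx * np.sin(th) + dy * np.cos(th),
            th + dth)

def plan(start, goal, max_steps=200, tol=0.02):
    """Greedy receding-horizon planner: at each step pick the primitive whose
    predicted outcome is closest to the goal position."""
    state, plan_out = start, []
    for _ in range(max_steps):
        if np.hypot(goal[0] - state[0], goal[1] - state[1]) < tol:
            break
        best = min(PRIMITIVES, key=lambda p: np.hypot(goal[0] - apply(state, p)[0],
                                                      goal[1] - apply(state, p)[1]))
        state = apply(state, best)
        plan_out.append(best)
    return plan_out, state

# usage: primitives, final = plan((0.0, 0.0, 0.0), (0.3, 0.1))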
|
|
10:30-10:45, Paper WeAT18.3 | |
>A Multigait Stringy Robot with Bi-Stable Soft-Bodied Structures in Multiple Viscous Environments |
> Video Attachment
|
|
Ta, Tung D. | The University of Tokyo |
Umedachi, Takuya | Shinshu University |
Kawahara, Yoshihiro | The University of Tokyo |
Keywords: Soft Robot Materials and Design, Biologically-Inspired Robots, Marine Robotics
Abstract: The exploration of spatially limited terrestrial or aquatic environments requires miniature and lightweight robots. Soft-bodied robot research is paving ways for a new class of small-scale robots that can navigate a variety of environments with minimum influence on the environment itself. However, it is generally challenging to design miniature soft-bodied robots that efficiently adapt to the change between viscous environments. A small-scale soft-bodied robot, which could slowly move on dry land, will need rapid motions to be able to swim in a wet environment. Although using snap-through buckling of a deformable body could help to create swift motions of the robot, merely applying the snap-through buckling does not improve the swimming speed of the robot so much. Here we propose a design of a stringy soft-bodied robot that can crawl on dry surfaces and swim in liquid environments. Besides taking advantage of the snap-through buckling using coil shape memory alloys (SMAs), we design the body of the robot with a geometrical overlapping of the active body segments and control the frequency of the undulation movement, which is crucial for the swimming locomotion. We evaluate the performance of the robot in liquids of different density and viscosity such as cooking oil and glycerin solution. We found that the robot needs to drastically change its undulation from low to high frequency when it moves from high to low viscosity environments. Our robot can swim at a speed of 3.37 body-lengths per minute (BL/min) and crawl at a speed of 1.74 BL/min. We anticipate our findings will help shed light on the design of soft-bodied robots that adapt to changing environments efficiently.
|
|
10:45-11:00, Paper WeAT18.4 | |
>Development of a Pneumatically Driven Growing Sling to Assist Patient Transfer |
> Video Attachment
|
|
Choi, Jonggyu | KIST |
Lee, Seungjun | Korea Institute of Science and Technology |
Kim, Jeongryul | Korea Institute of Science and Technology |
Lee, MyungJoong | Korea Institute of Science and Technology, University of Science |
Kim, Keri | Korea Institute of Science and Technology |
In, HyunKi | Korea Institute of Science and Technology |
Keywords: Soft Robot Materials and Design, Soft Robot Applications, Service Robots
Abstract: In this study, a new type of sling for assisting bedridden patients is developed using a pneumatic growing mechanism. Growing Sling focuses on minimizing the labor input of the caregivers by automating the sling insertion and retraction process while maintaining safety and comfort. Improvements over the typical growing mechanism were made by reinforcing the sling with shafts and filament tape for restricting the height of the sling to ensure its design purpose. Analysis of forces exerted on the structure was made to interpret the driving power of the automated insertion process and to ensure the structural integrity of components. Experiments on materials and prototype devices were conducted to determine the quantitative load that the sling needs to endure and what type of material is suitable for fabrication. Further, we propose a fabrication process for the Growing Sling, including its dimensions, and validate the performance of the fabricated prototype.
|
|
11:00-11:15, Paper WeAT18.5 | |
>A Tip Mount for Transporting Sensors and Tools Using Soft Growing Robots |
> Video Attachment
|
|
Jeong, Sang-Goo | KOREATECH |
Coad, Margaret M. | Stanford University |
Blumenschein, Laura | Purdue University |
Luo, Ming | Stanford University |
Mehmood, Usman | Korea University of Technology and Education |
Kim, Ji-Hun | Korea University of Technology and Education |
Okamura, Allison M. | Stanford University |
Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
Keywords: Soft Robot Materials and Design, Soft Robot Applications
Abstract: Pneumatically operated soft growing robots that extend via tip eversion are well-suited for navigation in confined spaces. Adding the ability to interact with the environment using sensors and tools attached to the robot tip would greatly enhance the usefulness of these robots for exploration in the field. However, because the material at the tip of the robot body continually changes as the robot grows and retracts, it is challenging to keep sensors and tools attached to the robot tip during actuation and environment interaction. In this paper, we analyze previous designs for mounting to the tip of soft growing robots, and we present a novel device that successfully remains attached to the robot tip while providing a mounting point for sensors and tools. Our tip mount incorporates and builds on our previous work on a device to retract the robot without undesired buckling of its body. Using our tip mount, we demonstrate two new soft growing robot capabilities: (1) pulling on the environment while retracting, and (2) retrieving and delivering objects. Finally, we discuss the limitations of our design and opportunities for improvement in future soft growing robot tip mounts.
|
|
WeAT19 |
Room T19 |
Soft Robot Applications II |
Regular session |
Chair: Ollero, Anibal | University of Seville |
Co-Chair: Zhao, Jianguo | Colorado State University |
|
10:00-10:15, Paper WeAT19.1 | |
>Novel Design of a Soft Pump Driven by Super-Coiled Polymer Artificial Muscles |
> Video Attachment
|
|
Tse, Yu Alexander | HKUST Robotics Institute |
Wong, Kiwan | The Hong Kong University of Science and Technology |
Yang, Yang | Nanjing University of Information Science and Technology |
Wang, Michael Yu | Hong Kong University of Science & Technology |
Keywords: Soft Robot Materials and Design, Soft Sensors and Actuators, Soft Robot Applications
Abstract: The widespread use of fluidic actuation for soft robots creates a high demand for soft pumps and compressors. However, current off-the-shelf pumps are usually rigid, noisy, and cumbersome. As a result, it is hard to integrate most commercial pumps into soft robotic systems, which restricts the autonomy and portability of soft robots. This paper presents the novel design of a soft pump based on bellow structure and super-coiled polymer (SCP) artificial muscles. The pump is flexible, lightweight, modular, scalable, quiet, and low cost. The pumping mechanism and fabrication process of the proposed soft pump is demonstrated. A pump prototype is fabricated to verify the proposed design and characterize its performance. From the characterization results, the pump can reach an output flow rate of up to 54 ml/min and delivers pressure up to 2.63 kPa. The pump has potential applications in untethered soft robots and wearable devices.
|
|
10:15-10:30, Paper WeAT19.2 | |
>Integrated Actuation and Self-Sensing for Twisted-And-Coiled Actuators with Applications to Innervated Soft Robots |
> Video Attachment
|
|
Sun, Jiefeng | Colorado State University |
Zhao, Jianguo | Colorado State University |
Keywords: Soft Robot Applications, Soft Sensors and Actuators
Abstract: Traditional soft robots require separate sensors and actuators to precisely control their motion. A twisted-and-coiled actuator (TCA) is a new artificial muscle with both actuation and self-sensing capability; it can simultaneously serve as a sensor and an actuator, allowing the motion of TCAs to be controlled without external sensors. This paper investigates integrated sensing and actuation for TCAs, where the self-sensing function is realized by measuring only the TCA's electrical resistance change. Closed-loop control of a single TCA is realized, and an innervated soft finger that can respond to external loads without extra sensors is demonstrated. Our results lay a foundation for integrated sensing and control by directly using the actuator, paving the way for self-contained smart robotic systems (e.g., untethered soft robots).
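The self-sensing loop (map the measured resistance to an estimated contraction, then close the loop on that estimate) can be sketched as follows. The linear resistance-to-contraction calibration and the PI gains are illustrative assumptions, not values from the paper.

import numpy as np

class SelfSensingTCA:
    """Self-sensing control sketch for a twisted-and-coiled actuator: the
    measured electrical resistance is mapped to an estimated contraction via
    a calibration fit, and a PI loop drives the heating power.  The linear
    resistance-contraction map and the gains are illustrative assumptions.
    """
    def __init__(self, kp=5.0, ki=1.0, dt=0.02):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0
        self.a, self.b = 0.0, 0.0                 # contraction ~ a * R + b

    def calibrate(self, resistances, contractions):
        # least-squares line fit of contraction versus resistance
        self.a, self.b = np.polyfit(resistances, contractions, 1)

    def sensed_contraction(self, resistance):
        return self.a * resistance + self.b

    def control(self, resistance, target_contraction):
        err = target_contraction - self.sensed_contraction(resistance)
        self.integral += err * self.dt
        power = self.kp * err + self.ki * self.integral
        return float(np.clip(power, 0.0, 1.0))    # normalised heating command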
|
|
10:30-10:45, Paper WeAT19.3 | |
>Exploiting the Morphology of a Shape Memory Spring As the Active Backbone of a Highly Dexterous Tendril Robot (ATBR) |
> Video Attachment
|
|
Sonaike, Kayode | University of Bristol |
Sadati, Seyedmohammadhadi | King's College London |
Bergeles, Christos | King's College London |
Walker, Ian | Clemson University |
Keywords: Soft Robot Materials and Design, Soft Sensors and Actuators, Flexible Robots
Abstract: Tendrils are common stable structures in nature and are used for sensing, actuation, and geometrical stiffness modulation. In this paper, for the first time we exploit the helical geometry of a shape memory alloy (SMA) tendril as a simple-to-fabricate, highly dexterous robotic continuum tentacle that we call the Active Tendril-Backbone Robot (ATBR). This is achieved via partial (120 deg) activation of single helix turns, resulting in directional bending of the backbone. A 141.5 mm prototype (130 mm when fully compressed) has been fabricated, and a simple theoretical framework is proposed and experimentally validated for modeling the tentacle configuration. The manipulator has five 2-DOF joints capable of reaching bending angles of up to 54.5 deg and an angular speed of up to 6.8 deg/s. The dexterity of the manipulator is showcased empirically in reaching complex configurations and in simple navigation through the confined space of a curving path.
|
|
10:45-11:00, Paper WeAT19.4 | |
>SMA Actuated Low-Weight Bio-Inspired Claws for Grasping and Perching Using Flapping Wing Aerial Systems |
> Video Attachment
|
|
Gómez Tamm, Alejandro Ernesto | Universidad De Sevilla |
Perez Sanchez, Vicente | University of Seville, GRVC |
Arrue, Begoña C. | Universidad De Sevilla |
Ollero, Anibal | University of Seville |
Keywords: Aerial Systems: Applications, Soft Robot Materials and Design, Grasping
Abstract: Taking inspiration from nature, the work presented in this paper aims to develop bio-inspired claws to be used for grasping and perching with flapping wing aerial systems. These claws can be 3D printed out of two different materials and are capable of adapting to any shape. They are also soft, avoiding undesired damage to objects during manipulation. The claws are actuated by shape memory alloy (SMA) springs to get rid of the weight of traditional servos. The design of all the components is explained in this work, along with the challenges of controlling SMA using only a LiPo battery on an aerial vehicle. The solutions applied and the electronics used are also described. Lastly, experiments carried out both on a test bench and in flight are summarized.
|
|
11:00-11:15, Paper WeAT19.5 | |
>Simultaneous 3D Forming and Patterning Method of Realizing Soft IPMC Robots |
> Video Attachment
|
|
Kubo, Keita | Tokyo Institute of Technology |
Nabae, Hiroyuki | Tokyo Institute of Technology |
Horiuchi, Tetsuya | National Institute of Advanced Industrial Science and Technology |
Asaka, Kinji | National Institute of Advanced Industrial Science and Technology |
Endo, Gen | Tokyo Institute of Technology |
Suzumori, Koichi | Tokyo Institute of Technology |
Keywords: Soft Robot Applications, Soft Robot Materials and Design, Soft Sensors and Actuators
Abstract: Ionic polymer-metal composites (IPMC) actuators are popular because they can be driven at a low voltage, possess excellent responsiveness, and can perform soft motions similar to that of living creatures. Conventional IPMC soft robots are manufactured by cutting and assembling IPMC sheets. However, using this conventional process to stably manufacture three-dimensional (3D)-shaped soft robots is difficult. To mitigate this problem, we propose a new method for fabricating 3D IPMC actuators in which several surface electrodes are separately fabricated from a single ion-exchange membrane. We refer to our proposal as the simultaneous 3D forming and patterning (SFP) method. Unlike the conventional IPMC fabrication process, the SFP method requires only one step to fix the ion-exchange membrane to contact masks. First, we briefly describe IPMC actuators, before introducing the proposed SFP method in detail. Next, we describe our investigations of the patterning resolution for the surface electrode using the proposed method. We fabricated two soft robot prototypes using the proposed method. The first robot is a starfish-type soft robot. Its surface electrode can be patterned in a plane using the proposed method, and independent driving is possible by applying voltage individually to the divided electrodes. The second prototype is a sea anemone-type soft robot, wherein surface electrodes can be patterned on a 3D curved surface to form a 3D shape.
|
|
WeAT20 |
Room T20 |
Soft Robot Design and Modelling |
Regular session |
Chair: Kramer-Bottiglio, Rebecca | Yale University |
Co-Chair: Aukes, Daniel | Arizona State University |
|
10:00-10:15, Paper WeAT20.1 | |
>Toward Analytical Modeling and Evaluation of Curvature-Dependent Distributed Friction Force in Tendon-Driven Continuum Manipulators |
> Video Attachment
|
|
Liu, Yang | The University of Texas at Austin |
Ahn, Seong Hyo | University of Texas at Austin |
Yoo, Uksang | The University of Texas at Austin |
Cohen, Alexander Ross | University of Texas at Austin |
Alambeigi, Farshid | University of Texas at Austin |
Keywords: Medical Robots and Systems, Modeling, Control, and Learning for Soft Robots, Surgical Robotics: Laparoscopy
Abstract: In this paper, we present an analytical modeling approach to address the problem of tension loss in a generic variable-curvature tendon-driven continuum manipulator (TD-CM), occurring due to the distributed tendon-sheath friction force. Unlike previous approaches in the literature, our model and the iterative solution algorithm do not rely on an a priori known curvature/shape of the TD-CM and can be implemented on any TD-CM with constant or variable curvature with a continuous neutral axis function. The performance of the proposed modeling approach in predicting the distributed tendon tension and tension loss has been evaluated via simulation and experimental studies on a TD-CM with planar bending. Results demonstrate the outstanding and accurate performance of our novel modeling approach and the proposed solution algorithm.
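For intuition, the classical capstan-style tension decay along a curved sheath can be integrated numerically as below. This assumes the backbone curvature is already known, whereas the paper's contribution is precisely an iterative scheme for the case where curvature and tension are coupled; the friction coefficient and discretisation are placeholders.

import numpy as np

def tendon_tension_profile(T_in, kappa_of_s, L, mu=0.1, n=200):
    """Integrate the capstan-like tension loss dT/ds = -mu * |kappa(s)| * T
    along a tendon routed through a curved sheath of length L.
    kappa_of_s returns the local curvature; the fixed-point iteration needed
    when the curvature itself depends on tension (the paper's case) is
    omitted from this sketch.
    """
    s = np.linspace(0.0, L, n)
    T = np.empty(n)
    T[0] = T_in
    for i in range(1, n):
        ds = s[i] - s[i - 1]
        T[i] = T[i - 1] * np.exp(-mu * abs(kappa_of_s(s[i - 1])) * ds)
    return s, T

# example: constant-curvature segment bending 90 degrees over 0.1 m
# s, T = tendon_tension_profile(10.0, lambda s: (np.pi / 2) / 0.1, L=0.1)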
|
|
10:15-10:30, Paper WeAT20.2 | |
>Vacuum Driven Auxetic Switching Structure and Its Application on a Gripper and Quadruped |
> Video Attachment
|
|
Liu, Shuai | Hong Kong University of Science and Technology |
Athar, Sheeraz | M.Phil Candidate (The Hong Kong University of Science and Technology) |
Wang, Michael Yu | Hong Kong University of Science & Technology |
Keywords: Soft Robot Applications
Abstract: The properties and applications of auxetics have been widely explored in the past years. Through proper utilization of auxetic structures, designs with unprecedented mechanical and structural behaviors can be produced. Taking advantage of this, we present the development of novel and low-cost 3D structures inspired by a simple auxetic unit. The core part, which we call the body in this paper, is a 3D realization of 2D rotating squares. This body structure was formed by joining four similar structures through softer material at the vertices. A monolithic structure of this kind is accomplished through a custom-built multi-material 3D printer. The model works in a way that, when torque is applied along the face of the rotational squares, they tend to bend at the vertex of the softer material, and due to the connectedness of the design, a proper opening and closing motion is achieved. To demonstrate the potential of this part as an important component for robots, two applications are presented: a soft gripper and a crawling robot. Vacuum-driven actuators move both the applications. The proposed gripper combines the benefits of two types of grippers whose fingers are placed parallel and equally spaced to each other, in a single design. This gripper is adaptable to the size of the object and can grasp objects with large and small cross-sections alike. A novel bending actuator, which is made of soft material and bends in curvature when vacuumed, provides the grasping nature of the gripper. Crawling robots, in addition to their versatile nature, provide a better interaction with humans. The designed crawling robot employs negative pressure-driven actuators to highlight linear and turning locomotion.
|
|
10:30-10:45, Paper WeAT20.3 | |
>A Model-Based Sensor Fusion Approach for Force and Shape Estimation in Soft Robotics |
> Video Attachment
|
|
Escaida Navarro, Stefan | Inria |
Nagels, Steven | Hasselt University, Institute for Materials Research (IMO) |
Alagi, Hosam | Karlsruhe Institute of Technology |
Faller, Lisa-Marie | FH Kärnten |
Goury, Olivier | Inria - Lille Nord Europe |
Morales Bieze, Thor Enrique | University of Lille |
Zangl, Hubert | Alpen-Adria-Universitaet Klagenfurt |
Hein, Björn | Karlsruhe Institute of Technology |
Ramakers, Raf | University Hasselt |
Deferme, Wim | University Hasselt |
Zheng, Gang | INRIA |
Duriez, Christian | INRIA |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Sensors and Actuators, Soft Robot Materials and Design
Abstract: In this paper, we address the challenge of sensor fusion in Soft Robotics for estimating forces and deformations. In the context of intrinsic sensing, we propose the use of a soft capacitive sensor to find a contact’s location, and the use of pneumatic sensing to estimate the force intensity and the deformation. Using a FEM-based numerical approach, we integrate both sensing streams and model two Soft Robotics devices we have conceived. These devices are a Soft Pad and a Soft Finger. We show in an evaluation that external forces on the Soft Pad can be estimated and that the shape of the Soft Finger can be reconstructed.
|
|
10:45-11:00, Paper WeAT20.4 | |
>Reconfigurable Soft Flexure Hinges Via Pinched Tubes |
> Video Attachment
|
|
Jiang, Yuhao | Arizona State University |
Sharifzadeh, Mohammad | Arizona State University |
Aukes, Daniel | Arizona State University |
Keywords: Soft Robot Materials and Design, Soft Robot Applications, Mechanism Design
Abstract: Tuning the stiffness of soft robots is essential in order to extend their usability and control their maneuverability. In this paper, we propose a novel mechanism that can reconfigure the stiffness of tubular structures, using pinching to induce highly directional changes in stiffness. When pinched, these tubes can then be utilized as flexure hinges to create virtual joints on demand; the orientation of the hinge axis can additionally be selected via control of the distribution of pinch forces on the surface of the tube. Through proper material and geometry selection, passive shape recovery is observed when pinching forces are removed; a proposed active shape recovery technique can further assist the tube in recovering its initial shape in order to re-configure the hinge in a new orientation. The proposed mechanism has been validated in FEA as well as experimentally, looking specifically at the relation between pinching force and curvature change, as well as comparing tube stiffness between pinched and unpinched configurations. The experimental prototype detailed in this paper, and demonstrated in the associated video, is capable of controlling the generation and recovery of flexure hinges at multiple orientations around the radial axis of tubes on demand.
|
|
11:00-11:15, Paper WeAT20.5 | |
>Rolling Soft Membrane-Driven Tensegrity Robots |
> Video Attachment
|
|
Baines, Robert Lawrence | Yale University |
Booth, Joran | Yale University |
Kramer-Bottiglio, Rebecca | Yale University |
Keywords: Soft Robot Applications, Soft Robot Materials and Design, Motion and Path Planning
Abstract: We present a methodology for designing, fabricating, and controlling rolling membrane-driven tensegrity robots. This methodology is enabled by pneumatic membrane actuators and a generalized path planning algorithm for rolling polyhedra. Membrane actuators are planar, assembled in a scalable fashion, and amenable to arbitrary geometries. Their deformation trajectories can be tuned by varying the stacking sequence and orientation of layers of unidirectional lamina placed on their surfaces. We demonstrate the application of the same set of membrane actuators consisting of polygonal faces of Platonic Solids to create polyhedral tensegrity variants. Three specific tensegrities in the forms of cube, dodecahedron, and rhombicuboctahedron are chosen to demonstrate the path planning algorithm, though the algorithm is generalizable to any uniform or non-uniform n-sided polyhedra. The membrane-driven tensegrities are able to roll in unique trajectories and circumvent obstacles contingent on the distribution and types of polygons which constitute their faces.
|
|
11:15-11:30, Paper WeAT20.6 | |
>Retraction Mechanism of Soft Torus Robot with a Hydrostatic Skeleton |
> Video Attachment
|
|
Takahashi, Tomoya | Tohoku University |
Watanabe, Masahiro | Tohoku University |
Tadakuma, Kenjiro | Tohoku University |
Konyo, Masashi | Tohoku University |
Tadokoro, Satoshi | Tohoku University |
Keywords: Soft Robot Materials and Design, Mechanism Design
Abstract: Soft robots have attracted much attention in recent years owing to their high adaptability. Long articulated soft robots enable diverse operations, and tip-extending robots that navigate their environment through growth are highly effective in robotic search applications. Because the robot membrane extends from the tip, these robots can lengthen without friction from the environment. However, the flexibility of the membrane inhibits tip retraction. Two methods have been proposed to resolve this issue: increasing the pressure of the internal fluid to reinforce rigidity, and mounting an actuator at the tip. The disadvantage of the former is that the increase is limited by the membrane's pressure resistance, while the second method adds to the robot's complexity. In this paper, we present a tip-retraction mechanism without bending motion that takes advantage of the friction from the external environment. Water is used as the internal fluid to increase ground pressure with the environment. We explore the failure patterns of the retraction motion and propose plausible solutions using a hydrostatic skeleton robot. Additionally, we develop a prototype robot that successfully retracts using the proposed methodology. Our solution can contribute to the advancement of mechanical design in the soft robotics field, with applications to soft snakes and manipulators.
|
|
WeAT21 |
Room T21 |
Soft Sensors I |
Regular session |
Chair: Markvicka, Eric | University of Nebraska-Lincoln |
Co-Chair: Jamone, Lorenzo | Queen Mary University London |
|
10:00-10:15, Paper WeAT21.1 | |
>Localization and Force-Feedback with Soft Magnetic Stickers for Precise Robot Manipulation |
> Video Attachment
|
|
Hellebrekers, Tess | Carnegie Mellon University |
Zhang, Kevin | Carnegie Mellon University |
Veloso, Manuela | Carnegie Mellon University |
Kroemer, Oliver | Carnegie Mellon University |
Majidi, Carmel | Carnegie Mellon University |
Keywords: Force and Tactile Sensing, Grasping, Soft Sensors and Actuators
Abstract: Tactile sensors are used in robot manipulation to reduce uncertainty regarding hand-object pose estimation. However, existing sensor technologies tend to be bulky and provide signals that are difficult to interpret into actionable changes. Here, we achieve wireless tactile sensing with soft and conformable magnetic stickers that can be easily placed on objects within the robot's workspace. We embed a small magnetometer within the robot's fingertip that can localize to a magnetic sticker with sub-mm accuracy and enable the robot to pick up objects in the same place, in the same way, every time. In addition, we utilize the soft magnets' ability to exhibit magnetic field changes upon contact forces. We demonstrate the localization and force-feedback features with a 7-DOF Franka arm on deformable tool use and a key insertion task for applications in home, medical, and food robotics. By increasing the reliability of interaction with common tools, this approach to object localization and force sensing can improve robot manipulation performance for delicate, high-precision tasks.
|
|
10:15-10:30, Paper WeAT21.2 | |
>Fruit Quality Control by Surface Analysis Using a Bio-Inspired Soft Tactile Sensor |
> Video Attachment
|
|
Ribeiro, Pedro | Instituto Superior Tecnico |
Cardoso, Susana | INESC-Microsistemas E Nanotecnologias and In |
Bernardino, Alexandre | IST - Técnico Lisboa |
Jamone, Lorenzo | Queen Mary University London |
Keywords: Soft Sensors and Actuators, Biomimetics, Agricultural Automation
Abstract: The growing consumer demand for large volumes of high-quality fruit has generated an increasing need for automated fruit quality control during production. Optical methods have proved successful in a few cases, but with limitations related to the variability of fruit colors and lighting conditions during tests. Tactile sensing provides a valuable alternative, although it requires a physical interaction that could damage the fruit. To overcome these limitations, we propose to use a recently developed soft tactile sensor for non-invasive fruit quality control. The ability of the sensor to detect very small forces and to finely characterize surfaces allows relevant information about the fruit to be collected through a very delicate physical interaction that does not cause any damage. We report experiments in which such information is used to determine whether apples and strawberries are ripe or senescent. We test different configurations of the sensor and different classification algorithms, achieving very good accuracy for both apples (96%) and strawberries (83%).
|
|
10:30-10:45, Paper WeAT21.3 | |
>Wireless Electronic Skin with Integrated Pressure and Optical Proximity Sensing |
> Video Attachment
|
|
Markvicka, Eric | University of Nebraska-Lincoln |
Rogers, Jonathan | NASA Johnson Space Center |
Majidi, Carmel | Carnegie Mellon University |
Keywords: Soft Sensors and Actuators, Force and Tactile Sensing, Perception for Grasping and Manipulation
Abstract: Electronic skins and tactile sensors can provide the sense of touch to robotic manipulators. These sensing modalities complement existing long-range optical sensors and can provide detailed information before and after contact. However, integration with existing systems can be challenging due to size constraints, the interface geometry, and restrictions on the external wiring used to interface with the sensor. Here, we introduce a low-profile, wireless electronic skin for direct integration with existing robotic manipulators. The flexible electronic skin combines pressure sensing, optical proximity sensing, and a micro-LIDAR device in a small, low-profile package. Each of the sensors is characterized individually, and the system is demonstrated on Robonaut 2, an anthropomorphic robot built to work in environments designed for humans. We demonstrate that the sensor can be used for contact sensing, mapping of local unknown environments, and medical monitoring during an emergency in a remote area.
|
|
10:45-11:00, Paper WeAT21.4 | |
>Vision-Based Proprioceptive Sensing: Tip Position Estimation for a Soft Inflatable Bellow Actuator |
> Video Attachment
|
|
Werner, Peter | ETH Zürich |
Hofer, Matthias | ETH Zurich |
Sferrazza, Carmelo | ETH Zurich |
D'Andrea, Raffaello | ETHZ |
Keywords: Soft Sensors and Actuators, Modeling, Control, and Learning for Soft Robots
Abstract: This paper presents a vision-based sensing approach for a soft linear actuator, which is equipped with an internal camera. The proposed vision-based sensing pipeline predicts the three-dimensional tip position of the actuator. To train and evaluate the algorithm, predictions are compared to ground truth data from an external motion capture system. An off-the-shelf distance sensor is integrated in a second actuator of the same type, providing only the vertical component of the tip position and used as a baseline for comparison. The camera-based sensing pipeline runs at 40 Hz in real-time on a standard laptop and is additionally used for closed loop elongation control of the actuator. It is shown that the approach can achieve comparable accuracy to the distance sensor for measuring the linear expansion of the actuator, but additionally provide the full three-dimensional tip position.
|
|
WeAT22 |
Room T22 |
Soft and Flexible Robotics |
Regular session |
Chair: Yang, Xingbang | Beihang University |
|
10:00-10:15, Paper WeAT22.1 | |
>A Minimalistic Hyper Flexible Manipulator: Modeling and Control |
> Video Attachment
|
|
Prigozin, Amit | Technion - Israel Institute of Technology |
Degani, Amir | Technion - Israel Institute of Technology |
Keywords: Flexible Robots, Underactuated Robots, Motion and Path Planning
Abstract: Robotic manipulators can be found today in most industries, from autonomous warehouses to advanced assembly lines in factories. Most of these industrial robots are characterized by non-flexible, highly rigid links. In dense and complex environments these manipulators require many degrees of freedom (DOFs), which complicates the mechanical structure of the manipulator as well as the control and path planning algorithms. In this work we present a minimalistic approach to reduce the number of active DOFs by using non-rigid, Hyper-Flexible Manipulators (HFM). We introduce a dynamic model of the HFM as well as a control scheme to bring the end-effector to a desired position from a known initial configuration. Finally, we present experiments that support the analytical derivations and simulation results of this paper.
|
|
10:15-10:30, Paper WeAT22.2 | |
>Joint-Level Control of the DLR Lightweight Robot SARA |
> Video Attachment
|
|
Iskandar, Maged | German Aerospace Center - DLR |
Ott, Christian | German Aerospace Center (DLR) |
Eiberger, Oliver | DLR - German Aerospace Center |
Keppler, Manuel | German Aerospace Center (DLR) |
Albu-Schäffer, Alin | DLR - German Aerospace Center |
Dietrich, Alexander | German Aerospace Center (DLR) |
Keywords: Flexible Robots, Compliance and Impedance Control, Industrial Robots
Abstract: Lightweight robots are known to be intrinsically elastic in their joints. The established classical approaches to control such systems are mostly based on motor-side coordinates, since the joints are comparatively stiff. However, that inevitably introduces errors in the coordinates that actually matter: the ones on the link side. Here we present a new joint-torque controller that uses feedback of the link-side positions. Passivity during interaction with the environment is formally shown, as well as asymptotic stability of the desired equilibrium in the regulation case. The performance of the control approach is experimentally validated on DLR's new generation of lightweight robots, namely the SARA robot, which enables this step from motor-side-based to link-side-based control thanks to sensors with higher resolution and an improved sampling rate.
|
|
10:30-10:45, Paper WeAT22.3 | |
>Contact Point Estimation Along Air Tube Based on Acoustic Sensing of Pneumatic System Noise |
> Video Attachment
|
|
Mikogai, Shinichi | Tokai University |
Bulathgaha Dewage Chandrasiri, Kazumi Randika | Tokai University |
Takemura, Kentaro | Tokai University |
Keywords: Soft Sensors and Actuators, Soft Robot Applications
Abstract: Active acoustic sensing is widely used in various fields, with applications including shape estimation of soft pneumatic actuators. In a pneumatic system, air tubes are frequently adopted, and thus it is essential to detect failures along the air path. Although acoustic sensing has been used for detecting contact and identifying the contact position along a tube, it has not been applied to pneumatic systems. We devised an acoustic sensing method to this end for air tubes in a pneumatic system. As pneumatic system noise propagates through the air tube, we employ this noise instead of the conventional approach of using a sound source or emitting vibration with an additional oscillator. We conducted several experiments that confirm the feasibility of the proposed method, succeeding in estimating the contact point along a 16 m air tube.
|
|
10:45-11:00, Paper WeAT22.4 | |
>Self-Sensing and Feedback Control for a Twin Coil Spring-Based Flexible Ultrasonic Motor |
> Video Attachment
|
|
Sato, Yunosuke | Toyohashi University of Technology |
Kanada, Ayato | Kyushu University |
Mashimo, Tomoaki | Toyohashi University of Technology |
Keywords: Soft Sensors and Actuators, Soft Robot Applications
Abstract: We propose a twin coil spring-based soft actuator that can move forward and backward with extensibility and can bend left and right with flexibility. It is driven by two flexible ultrasonic motors, each consisting of a compact metallic stator and an elastic elongated coil spring. The position of the end effector is determined by the positional relationship of the two coils and can be kinematically controlled with a constant curvature model. In our design, the coil springs act not only as flexible sliders but also as resistive position sensors. Changes in the resistance between the stator and the coil spring end are converted to a voltage and used for position detection. Each flexible ultrasonic motor with self-sensing is experimentally evaluated and shows good response characteristics, high sensor linearity, and robustness, without losing flexibility or controllability. We build a twin coil spring-based flexible ultrasonic motor prototype and demonstrate feedback control of planar motion based on the constant curvature model.
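For readers unfamiliar with the constant curvature model referenced above, the sketch below gives textbook planar constant-curvature forward kinematics for a two-actuator segment; the actuator separation d and the length-to-angle mapping are generic assumptions, not the authors' exact formulation.

```python
import numpy as np

def planar_cc_tip(l1, l2, d):
    """Planar constant-curvature forward kinematics for a two-actuator segment.
    l1, l2: lengths of the two coil springs [m]; d: distance between them [m].
    Returns the tip position (x, y) and the bending angle theta."""
    L = 0.5 * (l1 + l2)            # neutral-axis arc length
    theta = (l2 - l1) / d          # total bending angle
    if abs(theta) < 1e-9:          # straight configuration
        return np.array([L, 0.0]), 0.0
    r = L / theta                  # radius of curvature of the neutral axis
    return np.array([r * np.sin(theta), r * (1.0 - np.cos(theta))]), theta

tip, theta = planar_cc_tip(l1=0.10, l2=0.11, d=0.02)
print(f"tip = {tip}, bending angle = {np.degrees(theta):.1f} deg")
```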
|
|
11:00-11:15, Paper WeAT22.5 | |
>Fluid-Structure Interaction Hydrodynamics Analysis on a Deformed Bionic Flipper with Non-Uniformly Distributed Stiffness |
> Video Attachment
|
|
Huang, Jinguo | Beihang University |
Sun, Yilun | Technical University of Munich |
Wang, Tianmiao | Beihang University |
Lueth, Tim C. | Technical University of Munich |
Liang, Jianhong | Beihang University |
Yang, Xingbang | Beihang University |
Keywords: Biomimetics, Simulation and Animation, Dynamics
Abstract: Although the biologically flexible flippers of the cormorant (Phalacrocorax) are believed to be one of the most important features for achieving optimal swimming performance before take-off, studies on a deformable bionic flipper with non-uniformly distributed stiffness are rare. In this paper, we present a fully coupled fluid-structure interaction solver based on computational fluid dynamics (FSI-CFD), which can deal with the dynamic interplay between flexible aquatic animals and the ambient medium. The numerical solutions of the physical models are presented to obtain the unsteady hydrodynamic distribution during the power stroke and recovery stroke. We quantified the three-axis component distribution, and the results show that the horizontal force of the fluid does not provide positive thrust during the initial take-off stage. Greater lift and forerake moment are generated, which bring the body off the water as soon as possible and reduce the angle of attack. As the angle of attack decreases, positive thrust is generated, and the forward velocity and lift are further increased. As the draft area of the cormorant is reduced, the wings start flapping, further increasing lift and thrust, and thus enabling take-off. This solver will serve as a framework for future bio-inspired studies involving active and passive control associated with complex structural materials.
|
|
11:15-11:30, Paper WeAT22.6 | |
>Self-Healing Cell Tactile Sensor by Ultraflexible Printed Electrodes |
|
Shimizu, Masahiro | Osaka University |
Fujie, Toshinori | Tokyo Institute of Technology |
Umedachi, Takuya | Shinshu University |
Shigaki, Shunsuke | Osaka University |
Kawashima, Hiroki | Osaka University |
Saito, Masato | Tokyo Institute of Technology |
Ohashi, Hirono | Osaka University |
Hosoda, Koh | Osaka University |
Keywords: Soft Sensors and Actuators, Soft Robot Materials and Design, Biological Cell Manipulation
Abstract: We used cells, which are the units that make up a living body, as building blocks to design a biomachine hybrid system and develop a tactile sensor that uses living cells as sensor receptors. We fabricated a novel cell tactile sensor with the electrodes formed using printed electronics technology. This sensor comprises elastic electrodes mounted on a soft material to acquire tactile information; similar to a conventional cell tactile sensor, it acquires signals through mechanical stimulation. Further, self-organization of cells can be induced, and logical processing such as selective responses to stimuli can be performed directly by the physical system, without any coding using programming languages. The proposed novel cell tactile sensor that uses printed electrodes is small enough to mount on robots. Interestingly, we confirmed the self-healing properties of the proposed sensor after cells were injured mechanically.
|
|
WeAT23 |
Room T23 |
Soft Sensors II |
Regular session |
Chair: Althoefer, Kaspar | Queen Mary University of London |
Co-Chair: Kyung, Ki-Uk | Korea Advanced Institute of Science & Technology (KAIST) |
|
10:00-10:15, Paper WeAT23.1 | |
>Self-Sensing Soft Tactile Actuator for Fingertip Interface |
|
Youn, Jung-Hwan | Korea Advanced Institute of Science and Technology (KAIST) |
Yasir, Ibrahim Bin | Korea Advanced Institute of Science and Technology (KAIST) |
Kyung, Ki-Uk | Korea Advanced Institute of Science & Technology (KAIST) |
Keywords: Haptics and Haptic Interfaces, Soft Sensors and Actuators, Soft Robot Applications
Abstract: In this paper, we report a self-sensing soft tactile actuator based on a dielectric elastomer actuator (DEA) for a wearable haptic interface. DEAs are a class of electroactive polymer actuators reported to have large area strain and fast response speed. The soft tactile actuator is constructed of a multi-layered DEA membrane layer, a passive membrane layer, and an inner circular pillar. The soft actuator was optimized by varying its geometry, and force and displacement tests were conducted over a frequency range of 0 to 30 Hz. The selected actuator produces an output force of up to 0.9 N, with a displacement of 1.43 mm. To provide accurate physical force feedback to the user, the actuator is integrated with a 1.1 mm thick film-type soft force sensor that enables feedback control. Under pressure, the touch layer contacts the core, and the light inside the core scatters into the touch layer. The fabricated soft force sensor can measure force in a range of 0 to 1.25 N over various frequency ranges. Our wearable prototype exhibits a high output force of 0.9 N, as well as flexibility, conformity, and a light-weight structure (3.2 g).
|
|
10:15-10:30, Paper WeAT23.2 | |
>3D Printed Bio-Inspired Hair Sensor for Directional Airflow Sensing |
|
Rajasekaran, Keshav | University of Maryland |
Bae, Hyung Dae | Howard University |
Bergbreiter, Sarah | Carnegie Mellon University |
Yu, Miao | University of Maryland |
Keywords: Additive Manufacturing, Biomimetics, Nanomanufacturing
Abstract: With reduction in the scale of unmanned air vehicles, there is an increasing need for lightweight, compact, low-power sensors and alternate sensing modalities to facilitate flight control and navigation. This paper presents a novel method to fabricate a micro-scale artificial hair sensor that is capable of directional airflow sensing. The sensor consists of a high-aspect ratio hair structure attached to a thin flexible membrane. When subjected to airflow, the hair deflection induces a deformation of the membrane. Two pairs of perpendicular electrodes are attached to the membrane, which allow the sensing of airflow amplitude and direction through the measurement of differential capacitance. The sensor structure is fabricated by using two photon polymerization, which is integrated onto a miniature PCB circuit board to allow simple measurement. The sensor's responses to static displacement loading from different directions were characterized, and are in good agreement with the simulation results. Finally, the sensor's capability for directional airflow measurement was demonstrated with a clear correlation between flow speed and sensor output.
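As a rough illustration of the sensing principle (assumed here to be linear, which real devices only approximate), the differential capacitances of the two perpendicular electrode pairs can be combined into a flow direction and a relative magnitude:

```python
import numpy as np

def flow_from_capacitance(c_xp, c_xm, c_yp, c_ym, k=1.0):
    """Estimate airflow direction [deg] and relative magnitude from two pairs of
    perpendicular electrodes (+x/-x and +y/-y capacitances). The gain k and the
    linear-response assumption are illustrative, not the paper's calibration."""
    dx = c_xp - c_xm                      # differential capacitance along x
    dy = c_yp - c_ym                      # differential capacitance along y
    direction = np.degrees(np.arctan2(dy, dx))
    magnitude = k * np.hypot(dx, dy)
    return direction, magnitude

print(flow_from_capacitance(1.02, 0.98, 1.00, 1.00))  # flow roughly along +x
```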
|
|
10:30-10:45, Paper WeAT23.3 | |
>Silicone-Based Capacitive E-Skin for Exteroception and Proprioception |
|
Dawood, Abu Bakar | Queen Mary University of London |
Godaba, Hareesh | Queen Mary University of London |
Ataka, Ahmad | Queen Mary University of London |
Althoefer, Kaspar | Queen Mary University of London |
Keywords: Soft Sensors and Actuators, Soft Robot Materials and Design, Sensor Networks
Abstract: Thin and imperceptible soft skins that can detect internal deformations as well as external forces can go a long way to address perception and control challenges in soft robots. However, decoupling proprioceptive and exteroceptive stimuli is a challenging task. In this paper, we present a silicone-based, capacitive E-skin for exteroception and proprioception (SCEEP). This soft and stretchable sensor can perceive stretch along with touch at 100 different points via its 100 tactels. We also present a novel algorithm that decouples global strain from local indentations due to external forces. The soft skin is 10.1 cm in length and 10 cm in width and can be used to accurately measure global strain of up to 25% with an error of under 3%, while at the same time determining the amplitude and position of local indentations. This is a step towards a fully soft electronic skin that can act as a proprioceptive sensor to measure internal states while measuring external forces.
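A toy sketch of one way such decoupling could work, assuming global stretch produces a common-mode change on all tactels while a touch produces a localized deviation; the median baseline and the strain gain are illustrative assumptions, not SCEEP's actual algorithm.

```python
import numpy as np

def decouple(tactels, strain_gain):
    """Separate a common-mode (global strain) component from local indentations on a
    tactel grid. tactels: 2D array of relative capacitance changes; strain_gain:
    hypothetical calibration factor mapping common-mode change to strain."""
    common = np.median(tactels)               # global stretch shifts all tactels
    strain = common / strain_gain
    local = tactels - common                  # residual local indentations
    idx = np.unravel_index(np.argmax(np.abs(local)), local.shape)
    return strain, idx, local[idx]

readings = np.full((10, 10), 0.05)
readings[3, 7] += 0.12                        # one simulated touch
print(decouple(readings, strain_gain=0.2))    # ~25% strain, touch at (3, 7)
```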
|
|
10:45-11:00, Paper WeAT23.4 | |
>Shape Reconstruction of CCD Camera-Based Soft Tactile Sensors |
|
Soter, Gabor | University of Bristol |
Hauser, Helmut | University of Bristol |
Conn, Andrew | University of Bristol |
Rossiter, Jonathan | University of Bristol |
Nakajima, Kohei | University of Tokyo |
Keywords: Soft Robot Applications, Soft Sensors and Actuators, Novel Deep Learning Methods
Abstract: CCD camera-based tactile sensors provide high-resolution information about the deformation of soft and elastic interfaces. However, they have poor scalability, as it is difficult to sense a large surface area without increasing the distance between the camera and the interface or using multiple processing chips. For example, using such tactile sensors for a whole robotic arm is not yet possible. In this work, we demonstrate a data-driven method that can reconstruct high-resolution information about the deformation of the soft interface while keeping the space requirements and power consumption relatively low. Our modified tactile sensor incorporates two independent sensing techniques, one low- and one high-resolution, and we learn to map from the former to the latter. As a low-resolution sensor, we use liquid-filled channels that transmit information from the location of the tactile interaction to a rigid display, where the liquid displacements are tracked by a CCD camera. Simultaneously, the same interaction is measured by tracking the markers on the bottom of the sensor using a second CCD camera. After data collection, we train two different machine learning models to reconstruct the time series of the high-resolution sensor. By training a convolutional autoencoder (CAE) and attaching it to a recurrent neural network (RNN), we demonstrate the reconstruction of high-resolution video frames using only the time series of the low-resolution sensor.
|
|
WeBT1 |
Room T1 |
Activity Recognition |
Regular session |
Chair: Wu, Weili | University of Texas at Dallas |
Co-Chair: Mason, Celeste | University of Bremen |
|
11:45-12:00, Paper WeBT1.1 | |
>Personalized Online Learning with Pseudo-Ground Truth |
|
Losing, Viktor | Honda Research Institute Europe GmbH |
Hasenjaeger, Martina | Honda Research Institute Europe GmbH |
Yoshikawa, Taizo | Honda R&D Japan |
Keywords: AI-Based Methods
Abstract: Personalized online machine learning allows very accurate modelling of individual behavior and demands. In particular, a system that dynamically adapts during runtime can initiate a continuous collaboration with its user, where both alternately adjust to each other to maximize the system's utility. However, in application scenarios based on supervised learning, it is often unclear how to obtain the required ground truth for such dynamic systems. In this paper, we focus on applications where real-time classification of sequential data is crucial. Concretely, we propose to adapt an online personalized model solely based on pseudo-ground-truth information provided by another machine learning model. This model has the advantage of classifying sequences in retrospect with a small delay and is thus able to achieve higher performance than real-time systems. In particular, it is a pre-trained offline model, which means that no ground-truth information is necessary during runtime. We apply the proposed approach to the task of online action classification, for which the benefits of personalization have recently been emphasized.
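A minimal sketch of the pseudo-ground-truth idea, under the assumption that a pre-trained offline model can label each sample a few steps in retrospect; the classifier, delay, and interfaces are illustrative, not the authors' implementation.

```python
from collections import deque
import numpy as np
from sklearn.linear_model import SGDClassifier

class PseudoLabelOnlineLearner:
    """Online classifier adapted with labels produced in retrospect by an offline model
    (here a stand-in callable). Hypothetical sketch of the general scheme."""
    def __init__(self, offline_model, delay=5, classes=(0, 1)):
        self.offline_model = offline_model
        self.delay = delay
        self.buffer = deque()
        self.online = SGDClassifier()
        self.classes = np.array(classes)

    def step(self, x):
        pred = None
        if hasattr(self.online, "coef_"):                      # real-time prediction
            pred = self.online.predict(x.reshape(1, -1))[0]
        self.buffer.append(x)
        if len(self.buffer) > self.delay:                      # label the oldest sample in retrospect
            old = self.buffer.popleft()
            pseudo = self.offline_model(old)
            self.online.partial_fit(old.reshape(1, -1), [pseudo], classes=self.classes)
        return pred

offline = lambda x: int(x.sum() > 0)                           # stand-in offline labeler
learner = PseudoLabelOnlineLearner(offline, delay=3)
for _ in range(20):
    learner.step(np.random.randn(4))
print(learner.online.predict(np.zeros((1, 4))))
```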
|
|
12:00-12:15, Paper WeBT1.2 | |
>Explainable and Efficient Sequential Correlation Network for 3D Single Person Concurrent Activity Detection |
> Video Attachment
|
|
Wei, Yi | University at Albany, State University of New York |
Li, Wenbo | Samsung Research America |
Chang, Ming-Ching | University at Albany - SUNY |
Jin, Hongxia | Samsung Research America |
Lyu, Siwei | SUNY Albany |
Keywords: Deep Learning for Visual Perception, Surveillance Systems, RGB-D Perception
Abstract: We present the sequential correlation network (SCN) to improve concurrent activity detection. SCN combines a recurrent neural network and a correlation model hierarchically to model the complex correlations and temporal dynamics of concurrent activities. SCN has several advantages that enable effective learning even from a small dataset for real-world deployment. Unlike the majority of approaches, which assume that each subject performs one activity at a time, SCN is end-to-end trainable, i.e., it can automatically learn the inclusive or exclusive relations of concurrent activities. SCN is lightweight in design, using only a small set of learnable parameters to model the spatio-temporal correlations of activities. This also enhances the explainability of the learned parameters. Furthermore, the learning of SCN can benefit from initialization using semantically meaningful priors. We evaluate the proposed method against the state-of-the-art method on two benchmark datasets with human skeletal data; SCN achieves comparable performance to the SOTA but with much faster inference speed and less memory usage.
|
|
12:15-12:30, Paper WeBT1.3 | |
>Faster Healthcare Time Series Classification for Boosting Mortality Early Warning System |
|
Hu, Yanke | University of Texas at Dallas |
Subramanian, Raj | Humana |
An, Wangpeng | Tsinghua University |
Zhao, Na | Peking University School and Hospital of Stomatology |
Wu, Weili | University of Texas at Dallas |
Keywords: Health Care Management, Novel Deep Learning Methods
Abstract: Electronic Health Record (EHR) and healthcare claim data provide rich clinical information for time series analysis. In this work, we offer a different angle on healthcare multivariate time series classification by turning it into a computer vision problem. We propose a Convolutional Feature Engineering (CFE) methodology that can effectively extract long-sequence-dependency time series features. Combined with LightGBM, it achieves state-of-the-art results with a 35X speedup compared with LSTM-based approaches on the MIMIC-III In-Hospital Mortality benchmark task. We deploy CFE-based LightGBM into our Mortality Early Warning System at Humana and train it on 1 million member samples. The offline metrics show that this new approach generates better-quality predictions than the previous LSTM-based approach while greatly decreasing training and inference time.
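A rough sketch of the general pattern of convolutional feature engineering feeding a gradient-boosted classifier, on synthetic data; the kernels, pooling, and data are illustrative assumptions and not the paper's CFE design.

```python
import numpy as np
import lightgbm as lgb

def conv_features(series, kernels):
    """Turn a multivariate time series of shape (T, C) into a fixed-length feature
    vector by convolving each channel with each kernel and pooling (max and mean)."""
    feats = []
    for c in range(series.shape[1]):
        for k in kernels:
            r = np.convolve(series[:, c], k, mode="valid")
            feats.extend([r.max(), r.mean()])
    return np.array(feats)

rng = np.random.default_rng(0)
kernels = [rng.standard_normal(w) for w in (3, 7, 15)]       # multi-scale kernels
X = np.stack([conv_features(rng.standard_normal((48, 4)), kernels) for _ in range(200)])
y = rng.integers(0, 2, size=200)                             # toy binary labels
model = lgb.LGBMClassifier(n_estimators=50).fit(X, y)
print(model.predict(X[:5]))
```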
|
|
12:30-12:45, Paper WeBT1.4 | |
>Action Sequence Predictions of Vehicles in Urban Environments Using Map and Social Context |
|
Zaech, Jan-Nico | ETH Zurich |
Dai, Dengxin | ETH Zurich |
Liniger, Alexander | ETH Zurich |
Van Gool, Luc | ETH Zurich |
Keywords: Big Data in Robotics and Automation, Computer Vision for Transportation
Abstract: This work studies the problem of predicting the sequence of future actions for surrounding vehicles in real-world driving scenarios. To this aim, we make three main contributions. The first contribution is an automatic method to convert the trajectories recorded in real-world driving scenarios to action sequences with the help of HD maps. The method enables automatic dataset creation for this task from large-scale driving data. Our second contribution lies in applying the method to the well-known traffic agent tracking and prediction dataset Argoverse, resulting in 228,000 action sequences. We also manually annotated 2,245 action sequences for testing. The third contribution is to propose a novel action sequence prediction method by integrating past positions and velocities of the traffic agents, map information, and social context into a single end-to-end trainable neural network. Our experiments prove the merit of the data creation method and the value of the created dataset: prediction performance improves consistently with the size of the dataset, and our action prediction method outperforms competing models.
|
|
12:45-13:00, Paper WeBT1.5 | |
>Multi-Label Long Short-Term Memory for Construction Vehicle Activity Recognition with Imbalanced Supervision |
|
Abe, Haruka | Tokyo Institute of Technology |
Hino, Takuya | Komatsu Ltd |
Sugihara, Motohide | Komatsu Ltd |
Ikeya, Hiroki | Komatsu Ltd |
Shimosaka, Masamichi | Tokyo Institute of Technology |
Keywords: Novel Deep Learning Methods, Representation Learning, Big Data in Robotics and Automation
Abstract: Sensor-based activity recognition for construction vehicles is useful for evaluating the skills of the operator, measuring work efficiency, and many other use cases. Therefore, much research has explored robust activity-recognition models. However, it remains a challenge to apply such models to many construction sites because of dataset imbalance. While it is natural to employ a multi-label representation on imbalanced data with a large number of activity categories, robust multi-label classification for activity recognition has yet to be resolved because of the time-series nature of the data. In this work, we propose a novel multi-label long short-term memory (LSTM) model, which is effective for the sequence multi-labeling problem. The proposed model has connections along the temporal direction and the attribute direction, which exploit both the temporal pattern and the co-occurrence among attributes. In addition, by providing a bidirectional connection structure in the attribute direction, the model enables us to alleviate the dependency on chain order in what we call the ``classifier chain``, a classical approach to multi-label classification. To validate our method, we conduct experiments using a real-world construction-vehicle dataset.
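For orientation only, the sketch below is a bare-bones multi-label sequence model in PyTorch: an LSTM over time with per-time-step sigmoid outputs, one per activity attribute. The attribute-direction (bidirectional, classifier-chain-style) connections that are central to the paper are deliberately omitted, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class MultiLabelLSTM(nn.Module):
    """Plain multi-label sequence tagger: LSTM over time, independent sigmoid output
    per attribute at each time step. Simplified stand-in, not the proposed model."""
    def __init__(self, n_features, n_labels, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_labels)

    def forward(self, x):                   # x: (batch, time, n_features)
        h, _ = self.lstm(x)
        return torch.sigmoid(self.head(h))  # (batch, time, n_labels), values in [0, 1]

model = MultiLabelLSTM(n_features=12, n_labels=5)
print(model(torch.randn(2, 100, 12)).shape)  # torch.Size([2, 100, 5])
```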
|
|
13:00-13:15, Paper WeBT1.6 | |
>From Human to Robot Everyday Activity |
|
Mason, Celeste | University of Bremen |
Gadzicki, Konrad | University of Bremen |
Meier, Moritz | University of Bremen |
Ahrens, Florian | University of Bremen |
Kluss, Thorsten | University of Bremen, Cognitive Neuroinformatics |
Maldonado, Jaime | University of Bremen |
Putze, Felix | Karlsruhe Institute of Technology |
Fehr, Thorsten | University of Bremen |
Zetzsche, Christoph | University of Bremen |
Herrmann, Manfred | University of Bremen |
Schill, Kerstin | University of Bremen |
Schultz, Tanja | University of Bremen |
Keywords: Big Data in Robotics and Automation, Humanoid Robot Systems, Cognitive Human-Robot Interaction
Abstract: The Everyday Activities Science and Engineering (EASE) Collaborative Research Consortium's mission to enhance the performance of cognition-enabled robots establishes its foundation in the EASE Human Activities Data Analysis Pipeline. Through collection of diverse human activity information resources, enrichment with contextually relevant annotations, and subsequent multimodal analysis of the combined data sources, the pipeline described here will provide a rich resource for robot planning researchers through incorporation into the OPENEASE cloud platform.
|
|
WeBT2 |
Room T2 |
Calibration and Identification I |
Regular session |
Chair: Lim, Jongwoo | Hanyang University |
Co-Chair: Saripalli, Srikanth | Texas A&M |
|
11:45-12:00, Paper WeBT2.1 | |
>Non-Overlapping RGB-D Camera Network Calibration with Monocular Visual Odometry |
> Video Attachment
|
|
Koide, Kenji | National Institute of Advanced Industrial Science and Technology |
Menegatti, Emanuele | The University of Padua |
Keywords: Calibration and Identification, RGB-D Perception
Abstract: This paper describes a calibration method for RGB-D camera networks consisting of not only static overlapping, but also dynamic and non-overlapping cameras. The proposed method consists of two steps: online visual odometry-based calibration and depth image-based calibration refinement. It first estimates the transformations between overlapping cameras using fiducial tags, and bridges non-overlapping camera views through visual odometry that runs on a dynamic monocular camera. Parameters such as poses of the static cameras and tags, as well as dynamic camera trajectory, are estimated in the form of the pose graph-based online landmark SLAM. Then, depth-based ICP and floor constraints are added to the pose graph to compensate for the visual odometry error and refine the calibration result. The proposed method is validated through evaluation in simulated and real environments, and a person tracking experiment is conducted to demonstrate the data integration of static and dynamic cameras.
|
|
12:00-12:15, Paper WeBT2.2 | |
>Set-Membership Extrinsic Calibration of a 3D LiDAR and a Camera |
|
Voges, Raphael | Leibniz Universität Hannover |
Wagner, Bernardo | Leibniz Universität Hannover |
Keywords: Calibration and Identification, Formal Methods in Robotics and Automation, Multi-Modal Perception
Abstract: To fuse information from a 3D Light Detection and Ranging (LiDAR) sensor and a camera, the extrinsic transformation between the sensor coordinate systems needs to be known. Therefore, an extrinsic calibration must be performed, which is usually based on features extracted from sensor data. Naturally, sensor errors can affect the feature extraction process, and thus distort the calibration result. Unlike previous works, which do not consider the uncertainties of the sensors, we propose a set-membership approach that takes all sensor errors into account. Since the actual error distribution of off-the-shelf sensors is often unknown, we assume to only know bounds (or intervals) enclosing the sensor errors and accordingly introduce novel error models for both sensors. Next, we introduce interval-based approaches to extract corresponding features from images and point clouds. Due to the unknown but bounded sensor errors, we cannot determine the features exactly, but compute intervals guaranteed to enclose them. Subsequently, these feature intervals enable us to formulate a Constraint Satisfaction Problem (CSP). Finally, the CSP is solved to find a set of solutions that is guaranteed to contain the true solution and simultaneously reflects the accuracy of the calibration. Experiments using simulated and real data validate our approach and show its advantages over existing methods.
|
|
12:15-12:30, Paper WeBT2.3 | |
>Experimental Evaluation of 3D-LIDAR Camera Extrinsic Calibration |
|
Mishra, Subodh | Texas A&M University |
Osteen, Philip | U.S. Army Research Laboratory |
Pandey, Gaurav | Ford Motor Company |
Saripalli, Srikanth | Texas A&M |
Keywords: Multi-Modal Perception, Sensor Fusion, Calibration and Identification
Abstract: In this paper we perform an extensive experimental evaluation of three planar-target-based 3D-LIDAR camera calibration algorithms on a sensor suite consisting of multiple 3D-LIDARs and cameras, assessing their robustness to random initialization using metrics like Mean Line Re-projection Error (MLRE) and Factory Stereo Calibration Error. We briefly describe each method and provide insights into practical aspects like ease of data collection. We also show the effect of a noisy sensor on the calibration result and conclude with a note on which calibration algorithm should be used under what circumstances.
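The Mean Line Re-projection Error metric mentioned above can be sketched generically as follows: project LiDAR edge points into the image with the candidate extrinsics and average their distances to the associated image lines. Frame conventions, the normalized line parameterization, and the toy data are assumptions.

```python
import numpy as np

def mean_line_reprojection_error(points_lidar, lines_img, R, t, K):
    """Average distance between LiDAR points projected with extrinsics (R, t) and
    intrinsics K, and their associated image lines (a, b, c) with a^2 + b^2 = 1."""
    errors = []
    for p, line in zip(points_lidar, lines_img):
        pc = R @ p + t                           # LiDAR point in the camera frame
        uvw = K @ pc
        u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]  # pixel coordinates
        errors.append(abs(line @ np.array([u, v, 1.0])))
    return float(np.mean(errors))

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
pts = [np.array([0.1, 0.0, 2.0])]
lines = [np.array([1.0, 0.0, -345.0])]           # vertical image line u = 345
print(mean_line_reprojection_error(pts, lines, np.eye(3), np.zeros(3), K))  # 0.0
```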
|
|
12:30-12:45, Paper WeBT2.4 | |
>Kalman Filter Based Range Estimation and Clock Synchronization for Ultra Wide Band Networks |
> Video Attachment
|
|
Senevirathna, Nushen Manithya | University of Moratuwa, Sri Lanka |
De Silva, Oscar | Memorial University of Newfoundland |
Mann, George K. I. | Memorial University of Newfoundland |
Gosine, Raymond G. | Memorial University of Newfoundland |
Keywords: Range Sensing, Sensor Networks, Localization
Abstract: This paper presents the development of a Kalman filter-based range estimation technique to precisely calculate the inter-node ranges of Ultra-Wide Band (UWB) modules. Relative clock tracking filters running between every anchor pair track the relative clock dynamics while estimating the time of flight as a filter state. Both inbound and outbound message timestamps are used to update the filter to make the time of flight observable in the chosen state-space design. Faster relative clock filter convergence is achieved by including the clock offset ratio as a measurement in addition to the timestamps. Furthermore, a modified gradient clock synchronization algorithm is used to achieve global clock synchronization throughout the network. A correction term is used in the gradient clock synchronization algorithm to force the global clock rate to converge to the average of the individual clock rates while achieving asymptotic stability of the clock rate error state. Experiments are conducted to evaluate the synchronization and ranging accuracy of the proposed range estimation approach.
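A toy Kalman filter in the spirit of the abstract, tracking relative clock offset, skew, and time of flight from one-way timestamp differences in both directions; the state-space design, noise levels, and measurement model are illustrative assumptions rather than the paper's filter.

```python
import numpy as np

class UwbClockTofKF:
    """State x = [clock offset, clock skew, time of flight] between two nodes.
    A forward message observes offset + tof, a reply observes tof - offset."""
    def __init__(self):
        self.x = np.zeros(3)
        self.P = np.diag([1e-6, 1e-9, 1e-12])
        self.Q = np.diag([1e-16, 1e-18, 1e-20])
        self.R = (0.3e-9) ** 2                   # ~0.3 ns timestamp noise

    def predict(self, dt):
        F = np.array([[1.0, dt, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + self.Q

    def update(self, z, forward):
        H = np.array([1.0, 0.0, 1.0]) if forward else np.array([-1.0, 0.0, 1.0])
        y = z - H @ self.x
        S = H @ self.P @ H + self.R
        K = self.P @ H / S
        self.x = self.x + K * y
        self.P = (np.eye(3) - np.outer(K, H)) @ self.P

kf = UwbClockTofKF()
kf.predict(0.01)
kf.update(z=103.3e-9, forward=True)              # rx - tx timestamp difference [s]
kf.update(z=-96.7e-9, forward=False)
print(f"ToF ~ {kf.x[2] * 1e9:.2f} ns, offset ~ {kf.x[0] * 1e9:.2f} ns")
```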
|
|
12:45-13:00, Paper WeBT2.5 | |
>Unified Calibration for Multi-Camera Multi-LiDAR Systems Using a Single Checkerboard
> Video Attachment
|
|
Lee, Wonmyung | Hanyang University |
Won, Changhee | Hanyang University |
Lim, Jongwoo | Hanyang University |
Keywords: Sensor Fusion, Calibration and Identification, Omnidirectional Vision
Abstract: In this paper, we propose a unified calibration method for multi-camera multi-LiDAR systems. Using only a single planar checkerboard, the checkerboard frames captured by each sensor are classified as either global frames, if observed by at least two sensors, or local frames, if observed by a single camera. Both global and local frames of each camera are used to estimate its intrinsic parameters, whereas the global frames are used to compute the relative poses between sensors; the local frames additionally prevent the camera intrinsics from over-fitting to the global frames. In contrast to previous methods that separately estimate pairwise poses (e.g., camera-to-camera or camera-to-LiDAR) and simply compose them, we jointly optimize the intrinsic parameters of the cameras and the global poses of the sensors and checkerboards via bundle adjustment, using all observations as constraints in a single optimization problem. We find that point-to-plane distances are effective as camera-to-LiDAR constraints, where the points are the 3D positions of the checkerboard corners and the planes are estimated from the LiDAR point cloud. This unified global optimization maximizes the utilization of meaningful sensor observations and significantly reduces the cumulative error caused by composing separately estimated relative sensor poses. We extensively evaluate the proposed algorithm qualitatively and quantitatively using real and synthetic datasets. We plan to make the implementation publicly available with the paper publication.
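The point-to-plane camera-to-LiDAR constraint described above can be sketched as a nonlinear least-squares residual: checkerboard corners known in the camera frame, transformed by candidate extrinsics, should lie on the plane fitted to the LiDAR point cloud. The rotation-vector parameterization and toy data are assumptions; in practice many board poses and camera-to-camera constraints are combined, since a single plane leaves the extrinsics underdetermined.

```python
import numpy as np
from scipy.optimize import least_squares

def point_to_plane_residuals(x, corners_cam, n_lidar, d_lidar):
    """Residuals n . (R p + t) + d for checkerboard corners p (camera frame) mapped by
    the candidate camera-to-LiDAR extrinsics x = [rotation vector, translation]."""
    rvec, t = x[:3], x[3:]
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        R = np.eye(3)
    else:
        k = rvec / theta
        K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
        R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K  # Rodrigues
    pts_lidar = (R @ corners_cam.T).T + t
    return pts_lidar @ n_lidar + d_lidar

# Hypothetical data: corners at z = 1.4 in the camera frame, LiDAR plane z = 1.5.
corners = np.array([[0.1, 0.2, 1.4], [0.3, -0.1, 1.4], [-0.2, 0.0, 1.4],
                    [0.25, 0.15, 1.4], [-0.1, -0.2, 1.4], [0.0, 0.1, 1.4]])
sol = least_squares(point_to_plane_residuals, x0=np.zeros(6),
                    args=(corners, np.array([0.0, 0.0, 1.0]), -1.5))
print(sol.x)  # one of many extrinsics consistent with this single plane constraint
```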
|
|
WeBT4 |
Room T4 |
Assembly and Picking |
Regular session |
Chair: Liarokapis, Minas | The University of Auckland |
Co-Chair: Bürger, Mathias | Bosch Center for Artificial Intelligence |
|
11:45-12:00, Paper WeBT4.1 | |
>A Learning-Based Robotic Bin-Picking with Flexibly Customizable Grasping Conditions |
> Video Attachment
|
|
Tachikake, Hiroki | YASKAWA Electric Corporation and AI Cube Inc |
Watanabe, Wataru | AI Cube Inc |
Keywords: Industrial Robots, Deep Learning in Grasping and Manipulation
Abstract: A practical robotic bin-picking system requires a high grasp success rate for various objects. Also, the system must be capable of coping flexibly with various constraints and their changes. To resolve these issues, this study proposes a novel deep learning-based method that exploits a simulator to generate desired grasping actions. The features of this method are as follows: (1) Grasping conditions for any object can be flexibly customized in the simulated environment to improve real-world grasping actions. (2) Sensor input (an RGB image) is directly regressed to grasping actions using convolutional processing. Owing to these features, the system using the proposed method can grasp objects with geometric variations, semi-transparent objects, and objects with a biased center of gravity. Experimental results on a real robot system show that the proposed method achieves a high grasp success rate for four different types of objects and adapts to two additional grasping-condition constraints.
|
|
12:00-12:15, Paper WeBT4.2 | |
>Precision Assembly of Heavy Objects Suspended with Multiple Cables from a Crane |
> Video Attachment
|
|
Hoffman, Rachel | Massachusetts Institute of Technology |
Asada, Harry | MIT |
Keywords: Compliant Assembly, Factory Automation, Assembly
Abstract: A new approach to precision mating of heavy objects suspended from overhead cranes is presented. We have found through experiments that heavy shafts suspended with multiple cables at specific positions and orientations can be inserted into a chamfered hole despite a small clearance. This will allow an overhead crane, although limited in positioning accuracy, to execute precision assembly of a heavy shaft simply by holding it with multiple cables and lowering it onto the chamfered hole of a fixed object. Unlike the well-known Remote Center Compliance (RCC) hand, this method does not use a two-layer elastic structure but exploits the physical properties of cables. Specifically, cables go slack when a compressive load is applied. This unidirectional load bearing property is exploited to suspend a heavy shaft such that it is not over-constrained during insertion. Conditions for the cable attachment position and orientation for successful insertion are obtained. A proof-of-concept prototype is developed, and experimental verification of the principle and analysis is presented.
|
|
12:15-12:30, Paper WeBT4.3 | |
>Sim-To-Real Transfer of Bolting Tasks with Tight Tolerance |
> Video Attachment
|
|
Son, Dongwon | Seoul National University |
Yang, Hyunsoo | Seoul National University |
Lee, Dongjun | Seoul National University |
Keywords: Assembly, Contact Modeling, Reinforecment Learning
Abstract: In this paper, we propose a novel sim-to-real framework to solve bolting tasks with tight tolerance and complex contact geometry that are hard to model. Sim-to-real transfer has desirable features in terms of cost and safety; however, its use for assembly tasks is rare due to the lack of simulators that can robustly render multi-contact assembly. We implement sim-to-real transfer of a nut-tightening policy that is adaptive to uncertain bolt positions. This is realized by developing a novel contact model, which is fast and robust to complex assembly geometry, and a novel hierarchical controller with reinforcement learning (RL), which can perform tasks along narrow and complicated paths. The fast and robust contact model is achieved by utilizing configuration-space abstraction and a passive midpoint integrator (PMI), which render the simulator robust even under high-stiffness contact conditions. We use sampling-based motion planning to construct a path library and design a linear quadratic tracking controller as a compliant low-level controller that avoids local optima. Additionally, we use the RL agent as a high-level controller to adapt to bolt position uncertainty, thereby realizing sim-to-real transfer. Experiments are performed to verify the proposed sim-to-real framework.
|
|
12:30-12:45, Paper WeBT4.4 | |
>Combining Compliance Control, CAD Based Localization, and a Multi-Modal Gripper for Rapid and Robust Programming of Assembly Tasks |
> Video Attachment
|
|
Gorjup, Gal | The University of Auckland |
Gao, Geng | University of Auckland |
Dwivedi, Anany | University of Auckland |
Liarokapis, Minas | The University of Auckland |
Keywords: Assembly, Compliant Assembly
Abstract: Current trends in industrial automation favor agile systems that allow adaptation to rapidly changing task requirements and facilitate customized production in smaller batches. This work presents a flexible manufacturing system relying on compliance control, CAD based localization, and a multi-modal gripper to enable fast and efficient task programming for assembly operations. CAD file processing is employed to extract component pose data from 3D assembly models, while the system's active compliance compensates for errors in calibration or positioning. To minimize retooling delays, a novel gripper design incorporating both a parallel jaw element and a rotating module is proposed. The developed system placed first in the manufacturing track of the Robotic Grasping and Manipulation Competition of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2019, experimentally validating its efficiency.
|
|
12:45-13:00, Paper WeBT4.5 | |
>Learning and Sequencing of Object-Centric Manipulation Skills for Industrial Tasks |
> Video Attachment
|
|
Rozo, Leonel | Bosch Center for Artificial Intelligence |
Guo, Meng | Bosch Group |
Kupcsik, Andras | Bosch Center for AI |
Todescato, Marco | Robert Bosch GmbH |
Schillinger, Philipp | Bosch Center for Artificial Intelligence |
Giftthaler, Markus | Google |
Ochs, Matthias | Robert Bosch GmbH, Corporate Research |
Spies, Markus | Bosch Center for Artificial Intelligence |
Waniek, Nicolai | Robert Bosch GmbH |
Kesper, Patrick | Bosch Center for Artificial Intelligence |
Bürger, Mathias | Bosch Center for Artificial Intelligence |
Keywords: Manipulation Planning, Learning from Demonstration, Assembly
Abstract: Enabling robots to quickly learn manipulation skills is an important, yet challenging problem. Such manipulation skills should be flexible, e.g., be able to adapt to the current workspace configuration. Furthermore, to accomplish complex manipulation tasks, robots should be able to sequence several skills and adapt them to changing situations. In this work, we propose a rapid robot skill-sequencing algorithm, where the skills are encoded by object-centric hidden semi-Markov models. The learned skill models can encode multimodal (temporal and spatial) trajectory distributions. This approach significantly reduces manual modeling efforts, while ensuring a high degree of flexibility and re-usability of learned skills. Given a task goal and a set of generic skills, our framework computes smooth transitions between skill instances. To compute the corresponding optimal end-effector trajectory in task space, we rely on a Riemannian optimal controller. We demonstrate this approach on a 7-DoF robot arm for industrial assembly tasks.
|
|
13:00-13:15, Paper WeBT4.6 | |
>Sample-Efficient Learning for Industrial Assembly Using Qgraph-Bounded DDPG |
> Video Attachment
|
|
Hoppe, Sabrina | Robert Bosch, Corporate Research |
Giftthaler, Markus | Google |
Krug, Robert | Bosch Corporate Research |
Toussaint, Marc | Tu Berlin |
Keywords: Factory Automation, Reinforecment Learning, Assembly
Abstract: Recent progress in deep reinforcement learning has enabled agents to autonomously learn complex control strategies from scratch. Model-free approaches like Deep Deterministic Policy Gradients (DDPG) seem promising for applications with intricate dynamics, such as contact-rich manipulation tasks. However, these methods typically require large amounts of training data or meticulous hyperparameter tuning, limiting their usefulness for real-world robotics applications. In this paper, we evaluate and benchmark our recently proposed approach for improving model-free reinforcement learning with DDPG through Qgraph-based bounds in temporal difference learning. We directly apply the algorithm to a challenging real-world industrial insertion task and assess its performance (see https://youtu.be/Z_GcNbCWE-E). Empirical results show that the insertion task can be learned despite significant frictional forces and uncertainty, even in sparse-reward settings. We present an in-depth comparison based on a large number of experiments and demonstrate the advantages and performance of Qgraph-bounded DDPG: the learning process can be significantly sped up, made more robust against bad choices of hyperparameters, and run with lower memory requirements. Lastly, the presented results extend the current theoretical understanding of the link between data graph structure and soft divergence in DDPG.
|
|
WeBT5 |
Room T5 |
Cooperative Manipulation |
Regular session |
Chair: Antonelli, Gianluca | University of Cassino and Southern Lazio |
Co-Chair: Shorinwa, Ola | Stanford University |
|
11:45-12:00, Paper WeBT5.1 | |
>LegoBot: Automated Planning for Coordinated Multi-Robot Assembly of LEGO Structures |
|
Nägele, Ludwig | University of Augsburg |
Hoffmann, Alwin | University of Augsburg |
Schierl, Andreas | University of Augsburg |
Reif, Wolfgang | University of Augsburg |
Keywords: Intelligent and Flexible Manufacturing, Cooperating Robots, Planning, Scheduling and Coordination
Abstract: Multi-functional cells with cooperating teams of robots promise to be flexible, robust, and efficient and, thus, are a key to future factories. However, their programming is tedious and AI-based planning for multiple robots is computationally expensive. In this work, we present a modular and efficient two-layer planning approach for multi-robot assembly. The goal is to generate the program for coordinated teams of robots from an (enriched) 3D model of the target assembly. Although the approach is both motivated and evaluated with LEGO, which is a challenging variant of blocks world, it can be customized to different kinds of assembly domains.
|
|
12:00-12:15, Paper WeBT5.2 | |
>Experiments on Whole-Body Control of a Dual-Arm Mobile Robot with the Set-Based Task-Priority Inverse Kinematics Algorithm |
> Video Attachment
|
|
Di Lillo, Paolo Augusto | University of Cassino and Southern Lazio |
Pierri, Francesco | Università Della Basilicata |
Caccavale, Fabrizio | Università Degli Studi Della Basilicata |
Antonelli, Gianluca | Univ. of Cassino and Southern Lazio |
Keywords: Redundant Robots, Whole-Body Motion Planning and Control, Service Robots
Abstract: In this paper an experimental study of set-based task-priority kinematic control for a dual-arm mobile robot is developed. The control strategy for the coordination of the two manipulators and the mobile base relies on the definition of a set of elementary tasks to be properly handled depending on their functional role. In particular, the tasks have been grouped into three categories: safety, operational and optimization tasks. The effectiveness of the resulting task hierarchy has been validated through experiments on a Kinova Movo robot, in a domestic use case scenario.
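As background for the task-priority structure used here (omitting the set-based activation and deactivation of inequality tasks such as joint limits), a generic null-space-projection sketch for prioritized equality tasks; the toy Jacobians and errors are arbitrary.

```python
import numpy as np

def task_priority_dq(jacobians, errors):
    """Prioritized inverse kinematics: each lower-priority task contributes only within
    the null space of all higher-priority tasks (recursive Siciliano-Slotine scheme)."""
    n = jacobians[0].shape[1]
    dq = np.zeros(n)
    P = np.eye(n)                              # null-space projector of tasks so far
    for J, e in zip(jacobians, errors):
        Jp = J @ P
        Jp_pinv = np.linalg.pinv(Jp)
        dq = dq + Jp_pinv @ (e - J @ dq)       # correction within the remaining null space
        P = P @ (np.eye(n) - Jp_pinv @ Jp)     # shrink the null space
    return dq

# Toy example: 4-DoF system, primary 2D position task, secondary 1D posture task.
J1, e1 = np.random.randn(2, 4), np.array([0.1, -0.05])
J2, e2 = np.array([[0.0, 0.0, 1.0, 0.0]]), np.array([0.02])
print(task_priority_dq([J1, J2], [e1, e2]))
```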
|
|
12:15-12:30, Paper WeBT5.3 | |
>Collision Reaction through Internal Stress Loading in Cooperative Manipulation |
> Video Attachment
|
|
Aladele, Victor | Georgia Institute of Technology |
Hutchinson, Seth | Georgia Institute of Technology |
Keywords: Cooperating Robots, Dual Arm Manipulation, Multi-Robot Systems
Abstract: Cooperative manipulation offers many advantages over single-arm manipulation. However, this comes at the cost of added complexity, both in modeling and control of multi-arm systems. Much research has focused on determining optimal load distribution strategies based on several objective functions, including manipulability, energy consumption and joint torque minimization. This paper presents an internal loading strategy that is driven by an estimate of the external disturbances acting along the body of one or more of the arms involved in the manipulation process. The authors propose a reaction strategy to external disturbances that transforms the disturbance forces into internal forces on the object through appropriate load distribution over the cooperative arms. The goal is to have a set-point on the object track a given trajectory while compensating for external disturbances along the links of some of the robot arms involved in the cooperative manipulation.
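The standard building block behind such load distribution is the grasp matrix, whose pseudoinverse gives a wrench-resisting force component and whose null space carries internal (squeezing) forces. The Python sketch below shows only this textbook decomposition with assumed planar geometry; the paper's disturbance estimation along the arm links and its specific internal-loading strategy are not reproduced.

import numpy as np

# Planar object held at two contacts; the grasp matrix maps stacked contact
# forces (fx, fy per contact) to the object wrench (Fx, Fy, Mz).
p1, p2 = np.array([0.1, 0.0]), np.array([-0.1, 0.0])   # contact positions w.r.t. the object CoM

def contact_map(p):
    # Effect of a unit contact force (fx, fy) at position p on the object wrench.
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [-p[1], p[0]]])

G = np.hstack([contact_map(p1), contact_map(p2)])       # 3 x 4 grasp matrix

# Wrench to be resisted: gravity on a 1 kg object plus a lateral disturbance.
w_ext = np.array([2.0, -9.81, 0.3])

f_particular = np.linalg.pinv(G) @ (-w_ext)             # minimum-norm load distribution
_, _, Vt = np.linalg.svd(G)
null_basis = Vt[3:].T                                    # internal-force directions (null space of G)
f = f_particular + null_basis @ np.array([5.0])          # add squeeze without changing the wrench

print(np.allclose(G @ f, -w_ext))                        # True: the internal term is wrench-free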
|
|
12:30-12:45, Paper WeBT5.4 | |
>Scalable Collaborative Manipulation with Distributed Trajectory Planning |
> Video Attachment
|
|
Shorinwa, Ola | Stanford University |
Schwager, Mac | Stanford University |
Keywords: Mobile Manipulation, Optimization and Optimal Control, Distributed Robot Systems
Abstract: We present a distributed algorithm to enable a group of robots to collaboratively manipulate an object to a desired configuration while avoiding obstacles. Each robot solves a local optimization problem iteratively and communicates with its local neighbors, ultimately converging to the optimal trajectory of the object over a receding horizon. The algorithm scales efficiently to large groups, with a convergence rate constant in the number of robots, and can enforce constraints that are only known to a subset of the robots, such as for collision avoidance using local online sensing. We show that the algorithm converges many orders of magnitude faster, and results in a tracking error two orders of magnitude lower, than competing distributed collaborative manipulation algorithms based on Consensus alternating direction method of multipliers (ADMM).
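The general pattern of such distributed solvers, a local gradient step followed by averaging with communication neighbors, can be sketched as follows. This is a generic consensus-plus-gradient toy on a single waypoint, not the authors' algorithm (which operates over a receding horizon and is reported to outperform consensus ADMM).

import numpy as np

# Each robot keeps a local copy of the (here, single-waypoint) object trajectory,
# takes a gradient step on its local cost, and averages with its neighbors.
n_robots, dim = 4, 2
target = np.array([1.0, 0.5])                       # shared optimum known only via local costs
x = np.zeros((n_robots, dim))                       # local trajectory copies
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # line-shaped communication graph
step, mix = 0.2, 0.5

for _ in range(100):
    grads = x - target                              # gradient of local cost 0.5*||x_i - target||^2
    x_next = np.zeros_like(x)
    for i in range(n_robots):
        avg = np.mean(x[[i] + neighbors[i]], axis=0)        # consensus with neighbors
        x_next[i] = (1 - mix) * x[i] + mix * avg - step * grads[i]
    x = x_next

print(np.round(x, 3))    # all local copies agree near the shared optimum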
|
|
12:45-13:00, Paper WeBT5.5 | |
>Distributed Control for Cooperative Manipulation with Event-Triggered Communication (I) |
|
Budde genannt Dohmann, Pablo | Technical University Munich |
Hirche, Sandra | Technische Universität München |
|
|
WeBT6 |
Room T6 |
Dexterous Manipulation |
Regular session |
Chair: Rodriguez, Alberto | Massachusetts Institute of Technology |
Co-Chair: Namiki, Akio | Chiba University |
|
11:45-12:00, Paper WeBT6.1 | |
>High-Speed Catching by Multi-Vision Robot Hand |
> Video Attachment
|
|
Sato, Masaki | Chiba University |
Takahashi, Akira | Chiba University |
Namiki, Akio | Chiba University |
Keywords: Multifingered Hands, Visual Servoing, Manipulation Planning
Abstract: In this paper, we propose a "multi-vision hand" system that has a number of small high-speed cameras arranged on its surface, and we propose visual servoing control using this multi-eye system. Ball catching is chosen as the target motion control task for the multi-vision hand. In the proposed catching control, the estimated catch position of the ball is corrected by the multi-vision hand in real time. In experiments, the catching operation succeeded thanks to the correction of the catch position, confirming the effectiveness of the multi-vision hand system.
|
|
12:00-12:15, Paper WeBT6.2 | |
>High-Speed Hitting Grasping with Magripper, a Highly Backdrivable Gripper Using Magnetic Gear and Plastic Deformation Control |
> Video Attachment
|
|
Tanaka, Satoshi | The University of Tokyo |
Koyama, Keisuke | Osaka University |
Senoo, Taku | Hiroshima University |
Shimojo, Makoto | University of Electro-Communications |
Ishikawa, Masatoshi | University of Tokyo |
Keywords: Grippers and Other End-Effectors, Grasping, Compliance and Impedance Control
Abstract: In this study, Magripper, a highly backdrivable gripper, is developed to achieve high-speed hitting grasping executed seamlessly from reaching. The gripper is designed to achieve both high speed and environmental adaptability. The key element is backdrivability in terms of both hardware and control. In Magripper, a magnetic gear is introduced to passively absorb shock at the moment of contact as a means of hardware backdrivability, and backdrive control is implemented based on the Zener model. After developing a hitting grasping framework, high-speed hitting grasping tasks with a wood block, a wood cylinder, and a plastic coin are conducted using only servo control without sensors such as cameras and tactile sensors. In particular, coin grasping with high-speed movement is very difficult because collisions with environmental objects, such as the floor and the desk, are likely, which may break a robot.
|
|
12:15-12:30, Paper WeBT6.3 | |
>PnuGrip: An Active Two-Phase Gripper for Dexterous Manipulation |
> Video Attachment
|
|
Taylor, Ian | Massachusetts Institute of Technology |
Chavan-Dafle, Nikhil | Massachusetts Institute of Technology |
Li, Godric | MIT |
Doshi, Neel | MIT |
Rodriguez, Alberto | Massachusetts Institute of Technology |
Keywords: Dexterous Manipulation, Mechanism Design, Factory Automation
Abstract: We present the design of an active two-phase finger for mechanically mediated dexterous manipulation. The finger enables re-orientation of a grasped object by using a pneumatic braking mechanism to transition between free-rotating and fixed (i.e., braked) phases. Our design allows controlled high-bandwidth (5 Hz) phase transitions independent of the grasping force for manipulation of a variety of objects. Moreover, its thin profile (1 cm) facilitates picking and placing in clutter. Finally, the design features a sensor for measuring fingertip rotation to support feedback control. We experimentally characterize the finger's load handling capacity in the brake phase and rotational resistance in the free phase. We also demonstrate several pick-and-place manipulations common to industrial and laboratory automation settings that are simplified by our design.
|
|
12:30-12:45, Paper WeBT6.4 | |
>Design and Control of Roller Grasper V2 for In-Hand Manipulation |
> Video Attachment
|
|
Yuan, Shenli | Stanford University |
Shao, Lin | Stanford University |
Yako, Connor | Stanford University |
Gruebele, Alexander | Stanford University |
Salisbury, Kenneth | Stanford University |
Keywords: Grippers and Other End-Effectors, In-Hand Manipulation, Deep Learning in Grasping and Manipulation
Abstract: The ability to perform in-hand manipulation still remains an unsolved problem; having this capability would allow robots to perform sophisticated tasks requiring repositioning and reorienting of grasped objects. In this work, we present a novel non-anthropomorphic robot grasper with the ability to manipulate objects by means of active surfaces at the fingertips. Active surfaces are achieved by spherical rolling fingertips with two degrees of freedom (DoF): a pivoting motion for surface reorientation and a continuous rolling motion for moving the object. A further DoF is in the base of each finger, allowing the fingers to grasp objects over a range of sizes and shapes. The instantaneous kinematics was derived, and objects were successfully manipulated both with a custom handcrafted control scheme and with one learned through imitation learning, in simulation and experimentally on the hardware.
|
|
12:45-13:00, Paper WeBT6.5 | |
>50 Benchmarks for Anthropomorphic Hand Function-Based Dexterity Classification and Kinematics-Based Hand Design |
> Video Attachment
|
|
Zhou, Jianshu | The Chinese University of Hong Kong |
Chen, Yonghua | The University of Hong Kong |
Li, Dickson Chun Fung | The Chinese University of Hong Kong |
Gao, Yuan | The Chinese University of Hong Kong |
Li, Yunquan | The University of Hong Kong |
Cheng, Shing Shin | The Chinese University of Hong Kong |
Chen, Fei | Istituto Italiano Di Tecnologia |
Liu, Yunhui | Chinese University of Hong Kong |
Keywords: Grippers and Other End-Effectors, Grasping, In-Hand Manipulation
Abstract: Robotic hands designed with anthropomorphism in mind are highly popular in human-centered environments. Existing anthropomorphic robotic hands, which achieve part or most of the dexterity of the human hand, have been applied as robotic end-effectors and prosthetics. However, two deficiencies are evident: the design of a dexterous anthropomorphic hand is largely based on the intuition of the designer, and the dexterity of a robotic hand is hard to evaluate. To tackle these two challenges, this paper first summarizes 50 hand dexterity benchmarks (HD-marks) to evaluate hand dexterity comprehensively from three perspectives. Second, a novel 22-DoF soft robotic hand (S-22) that replicates human hand kinematics is used to demonstrate all 50 HD-marks. Third, 7 critical joint-based kinematic motions (K-motions) and their correlation with the 50 HD-marks are established. Therefore, a clear robotic hand design guideline is built by mapping hand functional dexterity to the required joint kinematics.
|
|
13:00-13:15, Paper WeBT6.6 | |
>Stable In-Grasp Manipulation with a Low-Cost Robot Hand by Using 3-Axis Tactile Sensors with a CNN |
> Video Attachment
|
|
Funabashi, Satoshi | Waseda University, Sugano Lab |
Isobe, Tomoki | Waseda University |
Ogasa, Shun | Waseda University, Graduate School of Creative Science, Engineer |
Ogata, Tetsuya | Waseda University |
Schmitz, Alexander | Waseda University |
Tomo, Tito Pradhono | Waseda University |
Sugano, Shigeki | Waseda University |
Keywords: In-Hand Manipulation, Force and Tactile Sensing, Deep Learning in Grasping and Manipulation
Abstract: The use of tactile information is one of the most important factors for achieving stable in-grasp manipulation. Especially with low-cost robotic hands that provide low-precision control, robust in-grasp manipulation is challenging. Abundant tactile information could provide the required feedback to achieve reliable in-grasp manipulation also in such cases. In this research, soft distributed 3-axis skin sensors (``uSkin'') and 6-axis F/T (force/torque) sensors were mounted on each fingertip of an Allegro Hand to provide rich tactile information. These sensors yielded 78 measurements for each fingertip (72 measurements from the uSkin and 6 measurements from the 6-axis F/T sensor). However, such high-dimensional tactile information can be difficult to process because of the complex contact states between the grasped object and the fingertips. Therefore, a convolutional neural network (CNN) was employed to process the tactile information. In this paper, we explored the importance of the different sensors for achieving in-grasp manipulation. Successful in-grasp manipulation with untrained daily objects was achieved when both 3-axis uSkin and 6-axis F/T information was provided and when the information was processed using a CNN.
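A minimal version of processing a 3-axis tactile grid together with a 6-axis F/T vector in a CNN can be sketched in PyTorch as below. The 4x6 taxel layout, the fusion by concatenation, and the 3-dimensional output are assumptions for illustration only, not the architecture used in the paper.

import torch
import torch.nn as nn

class TactileNet(nn.Module):
    # Toy fusion of a 3-axis tactile grid (assumed 4x6 taxels) with a 6-axis F/T vector.
    def __init__(self, n_outputs=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(32 * 4 * 6 + 6, 64), nn.ReLU(),
            nn.Linear(64, n_outputs),       # e.g. a fingertip motion command
        )

    def forward(self, skin, ft):
        feat = self.conv(skin).flatten(1)            # (B, 32*4*6)
        return self.head(torch.cat([feat, ft], dim=1))

net = TactileNet()
skin = torch.randn(8, 3, 4, 6)   # batch of 3-axis taxel grids
ft = torch.randn(8, 6)           # batch of force/torque readings
print(net(skin, ft).shape)       # torch.Size([8, 3])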
|
|
WeBT7 |
Room T7 |
Dexterous Manipulation and Grasping I |
Regular session |
Chair: Malvezzi, Monica | University of Siena |
Co-Chair: Chakraborty, Nilanjan | Stony Brook University |
|
11:45-12:00, Paper WeBT7.1 | |
>Diabolo Orientation Stabilization by Learning Predictive Model for Unstable Unknown-Dynamics Juggling Manipulation |
> Video Attachment
|
|
Murooka, Takayuki | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Entertainment Robotics, Dexterous Manipulation, Model Learning for Control
Abstract: Juggling manipulation is difficult to acquire because some of these manipulations are unstable and their physical models are unknown, owing to the complexity of nonprehensile manipulation. To acquire such unstable, unknown-dynamics juggling manipulation, we propose a method for learning a predictive model of the manipulation with a deep neural network, together with a real-time optimal control law that gains robustness and adaptability by using backpropagation through the network. In this study, we applied this method to diabolo orientation stabilization, which is one such unstable, unknown-dynamics juggling manipulation. We verify the effectiveness of the proposed method by comparing it with basic controllers such as P and PID controllers, and we also confirm the adaptability of the proposed controller through experiments with a real life-sized humanoid robot.
|
|
12:00-12:15, Paper WeBT7.2 | |
>Hand-Object Contact Force Synthesis for Manipulating Objects by Exploiting Environment |
|
Patankar, Aditya | Stony Brook University |
Fakhari, Amin | State University of New York, Korea |
Chakraborty, Nilanjan | Stony Brook University |
Keywords: Grasping, Manipulation Planning, Optimization and Optimal Control
Abstract: In this paper, we study the problem of computing grasping forces for quasi-static manipulation of large and heavy objects, by exploiting object-environment contacts. We present a general formulation of this problem as a Second-Order Cone Program (SOCP) that considers (i) contact friction constraints at the object-manipulator contacts and object-environment contacts, (ii) force/moment equilibrium constraints, and (iii) manipulator joint torque constraints. The SOCP formulation implies that the optimal grasping forces for manipulating objects with the help of the environment can be computed efficiently. Different optimization objectives like minimizing contact forces at the object-manipulator contacts or minimizing joint torques of manipulators can be considered. We evaluated our method by simulations in two different scenarios.
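A toy instance of such an SOCP can be written directly in cvxpy, as sketched below: friction cones become second-order cone constraints and the environment contact is left free to carry load. The geometry, masses, and the omission of moment equilibrium and joint-torque constraints are simplifications for illustration, not the paper's full formulation.

import numpy as np
import cvxpy as cp

# Object of mass m supported by two manipulator contacts (0, 1) and one
# environment contact (2); all contact normals are taken along +z for brevity.
m, g, mu = 5.0, 9.81, 0.5
f = [cp.Variable(3) for _ in range(3)]
gravity = np.array([0.0, 0.0, -m * g])

constraints = [f[0] + f[1] + f[2] + gravity == 0]          # force equilibrium
for fi in f:
    constraints.append(cp.SOC(mu * fi[2], fi[:2]))          # Coulomb friction cone

# Minimize manipulator contact effort; the environment contact carries the rest.
objective = cp.Minimize(cp.norm(f[0]) + cp.norm(f[1]))
cp.Problem(objective, constraints).solve()
print([np.round(fi.value, 2) for fi in f])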
|
|
12:15-12:30, Paper WeBT7.3 | |
>Functionally Divided Manipulation Synergy for Controlling Multi-Fingered Hands |
> Video Attachment
|
|
Higashi, Kazuki | Osaka University |
Koyama, Keisuke | Osaka University |
Ozawa, Ryuta | Meiji University |
Nagata, Kazuyuki | National Inst. of AIST |
Wan, Weiwei | Osaka University |
Harada, Kensuke | Osaka University |
Keywords: Multifingered Hands
Abstract: Synergy provides a practical approach for expressing various postures of a multi-fingered hand. However, a conventional synergy defined for reproducing grasping postures cannot perform in-hand manipulation, e.g., tasks that involve simultaneously grasping and manipulating an object. Locking the position of particular fingers of a multi-fingered hand is essential for in-hand manipulation tasks, either to hold an object or to fix unnecessary fingers. When using conventional synergy-based control to manipulate an object, which requires locking some fingers, the coordination of joints is heavily restricted, decreasing the dexterity of the hand. We propose a functionally divided manipulation synergy (FDMS) method, which provides synergy-based control that achieves both dimensionality reduction and in-hand manipulation. In FDMS, first, we define the function of each finger of the hand as either "manipulation" or "fixed." Then, we apply synergy control only to the fingers having the manipulation function, so that dexterous manipulations can be realized with a few control inputs. Furthermore, we propose the Synergy Switching Framework as a method for applying a finely defined FDMS to sequential task changes. The effectiveness of our method is experimentally verified.
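The flavor of applying synergies only to the fingers assigned the manipulation function can be sketched with a PCA-style synergy extraction in Python. The joint split, dimensions, and random demonstration data below are placeholders, not the FDMS formulation of the paper.

import numpy as np

# Toy joint-posture demonstrations for a 16-DoF hand (rows: samples, cols: joints).
rng = np.random.default_rng(1)
demos = rng.normal(size=(100, 16))

# Assumed functional division: joints 0-7 "manipulation", joints 8-15 "fixed".
manip_idx = np.arange(8)
fixed_idx = np.arange(8, 16)

# Synergies = principal directions of the manipulation joints only.
mean = demos[:, manip_idx].mean(axis=0)
_, _, Vt = np.linalg.svd(demos[:, manip_idx] - mean, full_matrices=False)
synergies = Vt[:2]                      # keep the 2 dominant synergies

def posture(z, fixed_pose):
    # Map a 2-D synergy input z to a full hand posture; fixed fingers stay locked.
    q = np.zeros(16)
    q[manip_idx] = mean + z @ synergies
    q[fixed_idx] = fixed_pose
    return q

print(posture(np.array([0.5, -0.2]), np.zeros(8)).round(2))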
|
|
12:30-12:45, Paper WeBT7.4 | |
>Maintaining Stable Grasps During Highly Dynamic Robot Trajectories |
> Video Attachment
|
|
Martucci, Giandomenico | Università Degli Studi Di Siena |
Bimbo, Joao | Yale University |
Prattichizzo, Domenico | University of Siena |
Malvezzi, Monica | University of Siena |
Keywords: Grasping, Dynamics, Contact Modeling
Abstract: One of the key advantages of robots is the high speed at which they can operate. In industrial settings, increased velocities can lead to higher throughput and improved efficiency. Some manipulation tasks might require the robot to perform highly dynamic operations, such as shaking or swinging, while grasping an object. These fast movements may produce high accelerations and thus give rise to inertial forces that can cause a grasped object to slip. In this paper, a method is proposed to determine the inertial forces that arise on a grasped object during a trajectory, find the instants at which the object might slip, and avoid these slippages by changing the trajectory, namely the orientation of the object. To exemplify the usage of this approach, two grasping tasks are realised: a prehensile and a non-prehensile grasp, and strategies to successfully perform these tasks without changing the overall duration of the trajectory are defined and evaluated.
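A minimal version of the slip check, comparing the tangential force demanded by inertia and gravity against the friction cone of a pinch grasp, might look as follows. All numbers and the contact geometry are assumptions; the paper's remedy of reorienting the object along the trajectory is not shown.

import numpy as np

# Check, along a shaking trajectory, whether inertial plus gravity forces exceed the
# friction cone of a two-finger pinch squeezing along y with total normal force f_n.
m, mu, f_n, g = 0.5, 0.6, 10.0, 9.81
t = np.linspace(0.0, 1.0, 200)
acc = np.stack([8.0 * np.sin(6 * np.pi * t),    # shaking acceleration along x
                np.zeros_like(t),
                np.zeros_like(t)], axis=1)

# Force the contacts must transmit (object frame aligned with the world frame here).
required = m * (acc + np.array([0.0, 0.0, g]))
tangential = np.linalg.norm(required[:, [0, 2]], axis=1)   # components in the contact tangent plane
slips = tangential > mu * f_n

print("slip predicted at t =", np.round(t[slips], 2) if slips.any() else "never")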
|
|
12:45-13:00, Paper WeBT7.5 | |
>Object-Agnostic Dexterous Manipulation of Partially Constrained Trajectories |
> Video Attachment
|
|
Morgan, Andrew | Yale University |
Hang, Kaiyu | Yale University |
Dollar, Aaron | Yale University |
Keywords: Dexterous Manipulation, In-Hand Manipulation, Manipulation Planning
Abstract: We address the problem of controlling a partially constrained trajectory of the manipulation frame, an arbitrary frame of reference rigidly attached to the object, as the desired motion about this frame is often underdefined. This may be apparent, for example, when the task requires control only over the translational dimensions of the manipulation frame, with disregard to the rotational dimensions. This scenario complicates the computation of the grasp frame trajectory, as the mobility of the mechanism is likely limited due to the constraints imposed by the closed kinematic chain. In this letter, we address this problem by combining a learned, object-agnostic manipulation model of the gripper with Model Predictive Control (MPC). This combination facilitates an approach to simple vision-based control of robotic hands with generalized models, enabling a single manipulation model to extend to different task requirements. By tracking the hand-object configuration through vision, the proposed framework is able to accurately control the trajectory of the manipulation frame along translational, rotational, or mixed trajectories. We provide experiments quantifying the utility of this framework, analyzing its ability to control different objects over varied horizon lengths and optimization iterations, and finally, we implement the controller on a physical system.
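The combination of a learned forward model with sampling-based MPC that tracks only the constrained (here, translational) dimensions can be sketched as below. The linear stand-in model, horizon, and cost are illustrative assumptions, not the learned model or controller of the paper.

import numpy as np

def learned_model(x, u):
    # Stand-in for a learned hand-object model: here a toy linear map.
    return x + 0.1 * u

def shooting_mpc(x0, ref_xy, horizon=10, samples=256):
    # Random-shooting MPC tracking only the translational part of the frame.
    rng = np.random.default_rng(0)
    best_u, best_cost = None, np.inf
    for _ in range(samples):
        u_seq = rng.uniform(-1, 1, size=(horizon, 3))
        x, cost = x0.copy(), 0.0
        for u in u_seq:
            x = learned_model(x, u)
            cost += np.sum((x[:2] - ref_xy) ** 2)   # rotation (x[2]) left unconstrained
        if cost < best_cost:
            best_u, best_cost = u_seq[0], cost
    return best_u                                    # apply the first action, then replan

x0 = np.array([0.0, 0.0, 0.3])                       # (x, y, theta) of the manipulation frame
print(shooting_mpc(x0, ref_xy=np.array([0.05, -0.02])))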
|
|
13:00-13:15, Paper WeBT7.6 | |
>Wet Adhesion of Micro-Patterned Interfaces for Stable Grasping of Deformable Objects |
> Video Attachment
|
|
Nguyen, Van Pho | Japan Advanced Institute of Science and Technology (JAIST) |
Luu, Quan | Japan Advanced Institute of Science and Technology |
Takamura, Yuzuru | Japan Advanced Institute of Science and Technology |
Ho, Van | Japan Advanced Institute of Science and Technology |
Keywords: Grasping, Soft Robot Materials and Design, Biomimetics
Abstract: Stable grip of wet, deformable objects is a challenging task for robotic grasping and manipulation, especially in food handling. The wet, slippery interface between the object and the robotic fingers may require a larger gripping force, resulting in a higher risk of damaging the grasped object. This research aims to evaluate the role of a micro-patterned soft pad in enhancing wet adhesion when grasping a food sample in a wet environment. We showcase this scenario with a soft, deformable tofu block (19.6 mm × 19.6 mm × 15 mm) gripped by a soft robotic gripper with two fingers. Each fingertip surface, which directly makes contact with the tofu, was covered with soft pads of two kinds: normal pads (flat surface) and micro-patterned pads. The micro-patterned pad comprises 14,400 square cells, each with four 85 μm edges, surrounded by a channel network 44 μm in depth. We estimated the grasping force generated by the pads in both cases and then verified the estimates with an actual setup gripping the tofu block. Both estimated and experimental results reveal that the micro-patterned pad required a load on the tofu's surface 2.2 times lower than that required by the normal pad, while maintaining the stability of the grasped tofu. The showcase in this paper supports the potential of micro-patterns on soft fingertips for grasping deformable objects in wet environments without complicated control strategies, promising wider applications for robots in the service sector or the food industry.
|
|
13:00-13:15, Paper WeBT7.7 | |
>Identification of a Human Hand Kinematics by Measuring and Merging of Nail-Based Finger Motions |
|
Tani, Hidenori | Kawasaki Heavy Industries, Ltd |
Nozawa, Ryo | Kawasaki Heavy Industry |
Sugihara, Tomomichi | Preferred Networks, Inc |
Keywords: Multifingered Hands, Kinematics, In-Hand Manipulation
Abstract: A method to identify the kinematic model of a human hand that suffers less from skin artifacts is proposed, based on the fact that the movements of the nails with respect to the corresponding fingertip bones are much smaller than those of the skin. It consists of two stages. In the first (individual) stage, the most likely combination of joint assignments and angles of each finger is identified through a dual-phase least squares method (LSM), where the joint angles are estimated in the inner LSM and the joint assignments in the outer LSM, from the movement of the hand dorsum with respect to the base coordinate frame attached to each nail. In the second (merging) stage, the kinematic models of the individual fingers are merged so as to reconcile the movements of the hand dorsum that they each estimate, again through the dual-phase LSM. It is shown that the identified joint assignments have an advantage over several existing anthropomorphic robot hands based on the distribution of pinchability (DOP), which is also proposed in this paper as a novel index to evaluate the ability of in-hand manipulation.
|
|
WeBT8 |
Room T8 |
Dexterous Manipulation and Grasping II |
Regular session |
Chair: Padir, Taskin | Northeastern University |
Co-Chair: Salvietti, Gionata | University of Siena |
|
11:45-12:00, Paper WeBT8.1 | |
>Gripping a Kitchen Knife on the Cutting Board |
> Video Attachment
|
|
Xue, Yuechuan | Iowa State University |
Jia, Yan-Bin | Iowa State University |
Keywords: Dexterous Manipulation, In-Hand Manipulation, Motion Control
Abstract: Despite more than three decades of grasping research, many tools in our everyday life still pose a serious challenge for a robotic hand to grip. The level of dexterity required for such a maneuver is surprisingly high, to the extent that its execution may require a combination of closed-loop controls and finger gaits. This paper studies the task of an anthropomorphic hand driven by a robotic arm to pick up and firmly hold a kitchen knife initially resting on the cutting board. In the first phase, the hand grasps the knife's handle at two antipodal points and then pivots it about the knife's point in contact with the board to leverage the latter's support. Desired contact forces exerted by the two holding soft fingers are calculated and used for dynamic control of both the hand and the arm. In the second phase, a sequence of gaits for all five fingers is performed quasi-statically to reach a power grasp on the knife's handle, which remains still during this period. Simulation has been performed using models of the Shadow Hand and the UR10 Arm.
|
|
12:00-12:15, Paper WeBT8.2 | |
>A Grasping-Centered Analysis for Cloth Manipulation (I) |
|
Borràs, Júlia | CSIC-UPC |
Alenyà, Guillem | CSIC-UPC |
Torras, Carme | Csic - Upc |
Keywords: Dexterous Manipulation, Grasping, Grippers and Other End-Effectors
Abstract: Compliant and soft hands have gained a lot of attention in the past decade because of their ability to adapt to the shape of objects, increasing their effectiveness for grasping. However, when it comes to grasping highly flexible objects such as textiles, we face the dual problem: it is the object that will adapt to the shape of the hand or gripper. In this context, classic grasp analysis or grasping taxonomies are not suitable for describing grasps of textile objects. This article proposes a novel definition of textile object grasps that abstracts from the robotic embodiment or hand shape and recovers concepts from the early neuroscience literature on hand prehension skills. This framework enables us to identify what grasps have been used in the literature until now to perform robotic cloth manipulation, and allows for a precise definition of all the tasks that have been tackled in terms of manipulation primitives based on regrasps. In addition, we also review what grippers have been used. Our analysis shows how the vast majority of cloth manipulations have relied on only one type of grasp, and at the same time we identify several tasks that need a greater variety of grasp types to be executed successfully. Our framework is generic, provides a classification of cloth manipulation primitives and can inspire gripper design and benchmark construction for cloth manipulation.
|
|
12:15-12:30, Paper WeBT8.3 | |
>Modular, Accessible, Sensorized Objects for Evaluating the Grasping and Manipulation Capabilities of Grippers and Hands |
> Video Attachment
|
|
Gao, Geng | University of Auckland |
Gorjup, Gal | The University of Auckland |
Yu, Ruobing | Ai Data Innovations |
Jarvis, Patrick | Humanoid Artificial Intelligence |
Liarokapis, Minas | The University of Auckland |
Keywords: Performance Evaluation and Benchmarking
Abstract: The human hand is Nature's most versatile and dexterous end-effector, and it has been a source of inspiration for roboticists for over 50 years. Recently, significant industrial and research effort has been put into the development of dexterous robot hands and grippers. Such end-effectors offer robust grasping and dexterous, in-hand manipulation capabilities that increase the efficiency, precision, and adaptability of the overall robotic platform. This work focuses on the development of modular, sensorized objects that can facilitate benchmarking of the dexterity and performance of hands and grippers. The proposed objects aim to offer a minimal, sufficiently diverse solution; efficient pose tracking; and accessibility. The object manufacturing instructions, 3D models, and assembly information are made publicly available through the creation of a corresponding repository.
|
|
12:30-12:45, Paper WeBT8.4 | |
>Learning Bayes Filter Models for Tactile Localization |
|
Kelestemur, Tarik | Northeastern University |
Keil, Colin | Northeastern University |
Whitney, John Peter | Northeastern University |
Platt, Robert | Northeastern University |
Padir, Taskin | Northeastern University |
Keywords: Deep Learning in Grasping and Manipulation, Multi-Modal Perception, Novel Deep Learning Methods
Abstract: Localizing and tracking the pose of robotic grippers are necessary skills for manipulation tasks. However, manipulators with imprecise kinematic models (e.g. low-cost arms) or with unknown world coordinates (e.g. poor camera-arm calibration) cannot locate the gripper with respect to the world. In these circumstances, we can leverage tactile feedback between the gripper and the environment. In this paper, we present learnable Bayes filter models that can localize robotic grippers using tactile feedback. We propose a novel observation model that conditions the tactile feedback on visual maps of the environment, along with a motion model, to recursively estimate the gripper's location. Our models are trained in simulation with self-supervision and transferred to the real world. Our method is evaluated on a tabletop localization task in which the gripper interacts with objects. We report results in simulation and on a real robot, generalizing over different sizes, shapes, and configurations of the objects.
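The recursive structure of such a filter, alternating a motion update and a measurement update over a belief, can be sketched with a simple histogram filter. The hand-crafted contact likelihood below stands in for the learned, map-conditioned observation model of the paper; cell counts and noise levels are assumptions.

import numpy as np

# Histogram Bayes filter over a 1-D gripper position.
n_cells = 50
belief = np.full(n_cells, 1.0 / n_cells)
cells = np.arange(n_cells)
contact_cell = 30                      # cell at which the environment would be felt

def motion_update(belief, shift=1):
    # Commanded shift of one cell plus a little diffusion for motion noise.
    b = np.roll(belief, shift)
    return 0.8 * b + 0.1 * np.roll(b, 1) + 0.1 * np.roll(b, -1)

def measurement_update(belief, contact):
    near = np.exp(-0.5 * ((cells - contact_cell) / 2.0) ** 2)
    lik = near if contact else (1.0 - 0.9 * near)   # P(observation | gripper cell)
    post = belief * lik
    return post / post.sum()

for step in range(10):
    belief = motion_update(belief)
    belief = measurement_update(belief, contact=(step >= 7))

print("MAP gripper cell:", int(np.argmax(belief)))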
|
|
12:45-13:00, Paper WeBT8.5 | |
>A Versatile Gripper for Cloth Manipulation |
> Video Attachment
|
|
Donaire, Sònia | CSIC-UPC |
Borràs, Júlia | CSIC-UPC |
Alenyà, Guillem | CSIC-UPC |
Torras, Carme | Csic - Upc |
Keywords: Grippers and Other End-Effectors, Mechanism Design, Grasping
Abstract: Cloth manipulation research has mostly advanced in perception and modeling methods for cloth state estimation and grasping point detection. In comparison, less attention has been paid to end-effector design. Indeed, most implementations use a parallel gripper that can only perform pinch grasps. Instead, a more versatile set of possible grasp types could ease many tasks by providing more support to certain parts of the object, as well as make feasible tasks that become very complex when executed with only pinch grasps. We present a versatile gripper design which, besides the common open-close thumb+finger(s) feature, has two reconfiguration degrees of freedom that offer, first, a wide base plane to provide support and, second, a variable-friction surface on the thumb tip. Our gripper can execute a versatile set of grasps that ease some complex tasks, such as picking and placing folded clothes or folding in the air. In addition, the variable-friction mechanism enables a more robust pinch-and-slide manipulation to trace cloth edges. Our evaluation shows the gripper's potential to execute a wide variety of cloth manipulation tasks.
|
|
13:00-13:15, Paper WeBT8.6 | |
>The Mag-Gripper: A Soft-Rigid Gripper Augmented with an Electromagnet to Precisely Handle Clothes |
> Video Attachment
|
|
Marullo, Sara | University of Siena |
Bartoccini, Simone | Università Degli Studi Di Siena |
Salvietti, Gionata | University of Siena |
Iqbal, Muhammad Zubair | University of Siena |
Prattichizzo, Domenico | University of Siena |
Keywords: Grippers and Other End-Effectors, Grasping, Dexterous Manipulation
Abstract: This paper introduces Mag-Gripper, a novel robotic gripper specifically designed for autonomous clothing manipulation. It is capable of improving grasp repeatability and precision, compensating for uncertainties in the target grasping locations. We propose to approach the autonomous clothing manipulation challenge by involving a suitable magnetic force. For this reason, Mag-Gripper is equipped with an electromagnet capable of interacting with small metal parts properly placed on the garment to be grasped. Exploiting an electromagnet is not a novelty in the literature, but our design innovation consists in embedding the electromagnet in the structure of a jaw gripper. In so doing, we revisit a classic end-effector type, corresponding to the simplest representation of a hand capable of opposability, allowing easily controllable devices to perform grasps similar to the human pinch grasp. Mag-Gripper can find applications in research labs investigating machine learning-based clothing manipulation techniques, in companies that have to manage a large amount of returns, or in home setting scenarios.
|
|
WeBT9 |
Room T9 |
Manipulation and Grasping: Design |
Regular session |
Chair: Stuart, Hannah | UC Berkeley |
Co-Chair: Catalano, Manuel Giuseppe | Istituto Italiano Di Tecnologia |
|
11:45-12:00, Paper WeBT9.1 | |
>A Thermoplastic Elastomer Belt Based Robotic Gripper |
> Video Attachment
|
|
Zheng, Xingwen | Peking University |
Hou, Ningzhe | The University of Manchester |
Dinjens, Pascal Johannes Daniël | Fontys University of Applied Sciences |
Wang, Ruifeng | The University of Manchester |
Dong, Chengyang | The University of Manchester |
Xie, Guangming | Peking University |
Keywords: Grippers and Other End-Effectors, Grasping, Soft Robot Applications
Abstract: Novel robotic grippers have attracted increasing interest recently because of their ability to adapt to a variety of circumstances and their powerful functionality. Differing from traditional grippers whose fingers are made of mechanical components, novel robotic grippers are typically made of novel structures and materials, using novel manufacturing processes. In this paper, a novel robotic gripper with an external frame and an internal net made of a thermoplastic elastomer belt is proposed. The gripper grasps objects using the friction between the net and the objects, and achieves adaptive gripping through its flexible contact surface. Stress simulation has been used to explore the relationship between the normal stress on the net and the deformation of the net. Experiments are conducted on a variety of objects to measure the force needed to reliably grip and hold each object. Test results show that the gripper can successfully grip objects of varying shapes, dimensions, and textures. The gripper is promising for grasping fragile objects in industry or out in the field, and also for grasping marine organisms without hurting them.
|
|
12:00-12:15, Paper WeBT9.2 | |
>A Variable-Structure Robot Hand That Uses the Environment to Achieve General Purpose Grasps |
> Video Attachment
|
|
Golan, Yoav | Ben Gurion University |
Shapiro, Amir | Ben Gurion University of the Negev |
Rimon, Elon | Technion - Israel Institute of Technology |
Keywords: Grippers and Other End-Effectors, Manipulation Planning, Mechanism Design
Abstract: Modern robotic grippers are either specialized and simple, complex and general purpose, or soft and compliant hands. The first provide high reliability but are limited in the range of objects they can grasp. Compliant or soft grippers can grasp a wide range of objects, but are not yet reliable enough for real-world applications. This paper presents a novel variable-structure general-purpose robotic hand. The planar-acting hand has a minimalistic structure that can adapt itself against the environment to fit a wide range of objects. The single-motor, multi-finger hand utilizes a novel principle of re-arranging its structure prior to grasping. This is achieved by pressing the hand against the environment and performing a series of adjustment moves to best suit the hand to the object to be grasped. The design and method of operation of the hand are explained. Then, a technique for determining the set of adjustments needed to re-arrange the hand according to the desired grasp is sketched. Real-world experiments are performed with the hand, showing its ability to re-arrange itself and grasp previously unseen objects. Source code for the simulations and experiments is provided as supplementary material, as well as a video clip of the variable-structure hand in action.
|
|
12:15-12:30, Paper WeBT9.3 | |
>Milliscale Features Increase Friction of Soft Skin in Lubricated Contact |
|
Li, Monica | Embodied Dexterity Group, University of California, Berkeley |
Melville, Dominic | UC Berkeley |
Chung, Ethan | University of California, Berkeley |
Stuart, Hannah | UC Berkeley |
Keywords: Grasping, Contact Modeling, Soft Robot Materials and Design
Abstract: Real-world environments, such as kitchens, present objects covered in viscous fluids: soap, oil, water, etc. Understanding and designing for slippery and submerged contact, where fluid lubrication is present, is a continuing challenge in the robotics community. Contact area, bending stiffness, and the presence of a viscous fluid affect friction. This work focuses on milliscale features (3 to 20 mm in size) of soft urethane skin on smooth, flat surfaces. We characterize the friction of soft skins with cylindrical features of varying size, and therefore bending stiffness, all with the same nominal contact area. In addition, a new method of frustrated total internal reflection with dye is introduced to visualize lubricated contact. We find that a small number of milliscale fingertip features maximizes friction force in the presence of lubrication, as compared both to un-patterned and many-featured skin designs. This holds true in a robotic gripper test, when pinching glass submerged in oil.
|
|
12:30-12:45, Paper WeBT9.4 | |
>A Dexterous Soft Robotic Hand for Delicate In-Hand Manipulation |
> Video Attachment
|
|
Abondance, Sylvain | Harvard, EPFL |
Teeple, Clark | Harvard University |
Wood, Robert | Harvard University |
Keywords: In-Hand Manipulation, Soft Robot Applications, Dexterous Manipulation
Abstract: In this work, we show that soft robotic hands provide a robust means of performing basic primitives of in-hand manipulation in the presence of uncertainty. We first discuss the design of a prototype hand with dexterous soft fingers capable of moving objects within the hand using several basic motion primitives. We then empirically validate the ability of the hand to perform the desired object motion primitives while still maintaining strong grasping capabilities. Based on these primitives, we examine a simple, heuristic finger gait which enables continuous object rotation for a wide variety of object shapes and sizes. Finally, we demonstrate the utility of our dexterous soft robotic hand in three real-world cases: unscrewing the cap of a jar, orienting food items for packaging, and gravity compensation during grasping. Overall, we show that even for complex tasks such as in-hand manipulation, soft robots can perform robustly without the need for local sensing or complex control.
|
|
12:45-13:00, Paper WeBT9.5 | |
>SoftHandler: An Integrated Soft Robotic System for the Handling of Heterogeneous Objects (I) |
> Video Attachment
|
|
Angelini, Franco | University of Pisa |
Petrocelli, Cristiano | Istituto Italiano Di Tecnologia |
Catalano, Manuel Giuseppe | Istituto Italiano Di Tecnologia |
Garabini, Manolo | Università Di Pisa |
Grioli, Giorgio | Istituto Italiano Di Tecnologia |
Bicchi, Antonio | Università Di Pisa |
Keywords: Compliant Assembly, Multifingered Hands, Grasping
Abstract: The picking performance of a robot can be severely affected by measurement errors, especially when handling objects that are fragile or irregular in shape and size. This is one of the main reasons why the problem of autonomously picking and placing objects is still open. In this work, we exploit the "embodied" intelligence of soft robotic technologies to propose an integrated system, named SoftHandler, capable of overcoming some of the limitations of traditional pick-and-place industrial robots. The SoftHandler integrates a novel parallel soft manipulator, the SoftDelta, and a novel soft end-effector, the Pisa/IIT SoftGripper. We describe the mechatronic design and control architectures of the system, including a learning technique able to preserve the natural compliance of the system. After that, we design a benchmarking method, and we experimentally compare the developed soft manipulator with its rigid counterpart. Experimental results with reference objects show that the proposed system has a higher grasping success rate than the rigid one and is subject to smaller interaction forces during the picking phase. Finally, we report an extensive experimental validation of the grasping capability of the SoftHandler with real objects in realistic, physically simulated scenarios, such as bin picking, grocery handling, and raw food handling.
|
|
WeBT10 |
Room T10 |
Manipulation and Grasping: Learning |
Regular session |
Chair: Leitner, Jurgen | LYRO Robotics / Australian Centre for Robotic Vision / QUT |
Co-Chair: Sun, Yu | University of South Florida |
|
11:45-12:00, Paper WeBT10.1 | |
>Generalizing Learned Manipulation Skills in Practice |
> Video Attachment
|
|
Wilches, Juan | University of South Florida |
Huang, Yongqiang | University of South Florida |
Sun, Yu | University of South Florida |
Keywords: Sensorimotor Learning, Dynamics, Sensor-based Control
Abstract: Robots should be able to learn and perform a manipulation task across different settings. This paper presents an approach that learns an RNN-based manipulation skill model from demonstrations and then generalizes the learned skill to new settings. The manipulation skill model learned from demonstrations in an initial set of settings performs well in those settings and similar ones. However, the model may perform poorly in a novel setting that is significantly different from the learned settings. Therefore, a novel approach called generalization in practice (GiP) is developed to tackle this critical problem. In this approach, the robot practices in the new setting to obtain new training data and refines the learned skill using the new data, gradually improving the learned skill model. The proposed approach has been implemented for one type of manipulation task, pouring, which is the most frequently performed manipulation in cooking applications. The presented approach enables a pouring robot to pour gracefully like a person, in terms of speed and accuracy, in learned setups and to gradually improve its pouring performance in novel setups after several practice runs.
|
|
12:00-12:15, Paper WeBT10.2 | |
>Robot Learning in Mixed Adversarial and Collaborative Settings |
|
Yoon, Sean Hee | University of Southern California |
Nikolaidis, Stefanos | University of Southern California |
Keywords: Grasping, Deep Learning in Grasping and Manipulation
Abstract: Previous work has shown that interacting with a human adversary can significantly improve the efficiency of the learning process in robot grasping. However, people are not consistent in applying adversarial forces; instead, they may alternate between acting antagonistically toward the robot and helping it achieve its tasks. We propose a physical framework for robot learning in a mixed adversarial/collaborative setting, where a second agent may act as a collaborator or as an antagonist, unbeknownst to the robot. The framework leverages prior estimates of the reward function to infer whether the actions of the second agent are collaborative or adversarial. Integrating this inference in an adversarial learning algorithm can significantly improve the robustness of learned grasps in a manipulation task.
|
|
12:15-12:30, Paper WeBT10.3 | |
>Blind Bin Picking of Small Screws through In-Finger Manipulation with Compliant Robotic Fingers |
> Video Attachment
|
|
Ishige, Matthew | The University of Tokyo |
Umedachi, Takuya | Shinshu University |
Ijiri, Yoshihisa | OMRON Corp |
Taniguchi, Tadahiro | Ritsumeikan University |
Kawahara, Yoshihiro | The University of Tokyo |
Keywords: Soft Robot Applications, Modeling, Control, and Learning for Soft Robots, Deep Learning in Grasping and Manipulation
Abstract: Although picking up objects a few centimeters in size is a common task, achieving such ability with a robot manipulator remains challenging. We take a step toward solving this problem by focusing on the task of picking a 1.0-cm screw from a bulk bin using only tactile information. Inspired by how humans pick up small objects from a bin, we propose a "grasp-separate" strategy for robotic picking, which involves grasping many objects first and then separating a single object through manipulation in the fingers. Based on this strategy, we developed a tactile-based screw bin-picking system. We trained a convolutional neural network to estimate the number of screws in the fingers and built a controller that generates manipulation behaviors to separate a single screw using reinforcement learning. To compensate for the low resolution of off-the-shelf tactile sensor arrays, we adopted active sensing, which uses observations obtained during a predefined simple movement. We show that this approach enhances the estimation accuracy and manipulation performance. Furthermore, to enable flexible finger motion, such as between the thumb and the index finger of a human hand, we propose a soft robot finger structure that leverages compliant materials. A soft actor-critic algorithm successfully found dexterous screw separation behaviors with the compliant soft robotic fingers. In the evaluation, the system obtained an average success rate of 80%, which was difficult to achieve without the grasp-separate manipulation technique.
|
|
12:30-12:45, Paper WeBT10.4 | |
>Object Recognition, Dynamic Contact Simulation, Detection, and Control of the Flexible Musculoskeletal Hand Using a Recurrent Neural Network with Parametric Bias |
> Video Attachment
|
|
Kawaharazuka, Kento | The University of Tokyo |
Tsuzuki, Kei | University of Tokyo |
Onitsuka, Moritaka | The University of Tokyo |
Asano, Yuki | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Kawasaki, Koji | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Multifingered Hands, Biomimetics, Deep Learning in Grasping and Manipulation
Abstract: The flexible musculoskeletal hand is difficult to model, and its model can change constantly due to deterioration over time, irreproducibility of initialization, etc. Also, for object recognition, contact detection, and contact control using the hand, it is desirable not to use a neural network trained for each task, but to use only one integrated network. Therefore, we develop a method to acquire a sensor state equation of the musculoskeletal hand using a recurrent neural network with parametric bias. By using this network, the hand can realize recognition of the grasped object, contact simulation, detection, and control, and can cope with deterioration over time, irreproducibility of initialization, etc. by updating the parametric bias. We apply this study to the hand of the musculoskeletal humanoid Musashi and show its effectiveness.
|
|
12:45-13:00, Paper WeBT10.5 | |
>EGAD! an Evolved Grasping Analysis Dataset for Diversity and Reproducibility in Robotic Manipulation |
> Video Attachment
|
|
Morrison, Douglas | Australian Centre for Robotic Vision |
Corke, Peter | Queensland University of Technology |
Leitner, Jurgen | LYRO Robotics / Australian Centre for Robotic Vision / QUT |
Keywords: Grasping, Deep Learning in Grasping and Manipulation, Manipulation Planning
Abstract: We present the Evolved Grasping Analysis Dataset (EGAD), comprising over 2000 generated objects aimed at training and evaluating robotic visual grasp detection algorithms. The objects in EGAD are geometrically diverse, filling a space ranging from simple to complex shapes and from easy to difficult to grasp, compared to other datasets for robotic grasping, which may be limited in size or contain only a small number of object classes. Additionally, we specify a set of 49 diverse 3D-printable evaluation objects to encourage reproducible testing of robotic grasping systems across a range of complexity and difficulty. The dataset, code and videos can be found at https://dougsm.github.io/egad/
|
|
13:00-13:15, Paper WeBT10.6 | |
>Deep Gated Multi-Modal Learning: In-Hand Object Pose Changes Estimation Using Tactile and Image Data |
> Video Attachment
|
|
Anzai, Tomoki | The University of Tokyo |
Takahashi, Kuniyuki | Preferred Networks |
Keywords: In-Hand Manipulation, Sensor Fusion, Deep Learning in Grasping and Manipulation
Abstract: For in-hand manipulation, estimation of the object pose inside the hand is one of the important functions needed to manipulate objects to a target pose. Since in-hand manipulation tends to cause occlusions by the hand or the object itself, image information alone is not sufficient for in-hand object pose estimation. Multiple modalities can be used in this case; the advantage is that other modalities can compensate for occlusion, noise, and sensor malfunctions. Although it is important to decide the utilization rate of each modality (referred to as its reliability value) according to the situation, manually designing such models is difficult, especially across varied situations. In this paper, we propose deep gated multi-modal learning, which self-determines the reliability value of each modality through end-to-end deep learning. For the experiments, an RGB camera and a GelSight tactile sensor were attached to the parallel gripper of a Sawyer robot, and object pose changes were estimated during grasping. A total of 15 objects were used in the experiments. In the proposed model, the reliability values of the modalities were determined according to the noise level and failure of each modality, and it was confirmed that the pose change was estimated even for unknown objects.
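A gating mechanism that outputs per-modality reliability weights and fuses features accordingly can be sketched in PyTorch as below. The encoders, feature sizes, and output dimension are placeholders; the paper's actual network, inputs, and training objective are not reproduced.

import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    # Toy gated fusion: a gate network outputs per-modality reliability weights.
    def __init__(self, img_dim=128, tac_dim=64, feat_dim=32):
        super().__init__()
        self.img_enc = nn.Sequential(nn.Linear(img_dim, feat_dim), nn.ReLU())
        self.tac_enc = nn.Sequential(nn.Linear(tac_dim, feat_dim), nn.ReLU())
        self.gate = nn.Sequential(nn.Linear(2 * feat_dim, 2), nn.Softmax(dim=1))
        self.head = nn.Linear(feat_dim, 3)     # e.g. planar pose change (dx, dy, dtheta)

    def forward(self, img, tac):
        fi, ft = self.img_enc(img), self.tac_enc(tac)
        w = self.gate(torch.cat([fi, ft], dim=1))        # reliability values per modality
        fused = w[:, 0:1] * fi + w[:, 1:2] * ft
        return self.head(fused), w

model = GatedFusion()
pose, weights = model(torch.randn(4, 128), torch.randn(4, 64))
print(pose.shape, weights[0])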
|
|
WeBT11 |
Room T11 |
Manipulation and Grasping: Planning |
Regular session |
Chair: Lohan, Katrin Solveig | Heriot-Watt University |
Co-Chair: Dogar, Mehmet R | University of Leeds |
|
11:45-12:00, Paper WeBT11.1 | |
>Knowledge-Based Grasp Planning Using Dynamic Self-Organizing Network |
|
Yang, Shiyi | University of Waterloo |
Jeon, Soo | University of Waterloo |
Keywords: Grasping, Learning Categories and Concepts, Service Robotics
Abstract: Category-based methods for task-specified grasp planning have recently been proposed in the literature. Such methods, however, are normally time-consuming in both the training and grasp determination processes, and they lack the capability to improve grasping skills because of their fixed training data sets. This paper presents an improved approach for knowledge-based grasp planning by developing a multi-layer network based on self-organizing maps. A number of grasp candidates are learned in experiments, and the information associated with these grasp candidates is clustered based on different criteria on each network layer. A codebook, composed of a small number of generalized models and the corresponding task-oriented grasps, is generated from the network. In addition, the proposed network is capable of automatically adjusting its size so that the codebook can be continuously updated from each interaction with novel objects. In order to increase the accuracy and convergence rate of the clustering process, a new initialization method is also proposed. Simulation results show the advantages of the proposed initialization method and the auto-growing algorithm in terms of accuracy and efficiency over some conventional methods. Experimental results demonstrate that novel objects can be successfully grasped in accordance with desired tasks using the proposed approach.
|
|
12:00-12:15, Paper WeBT11.2 | |
>Environment-Aware Grasp Strategy Planning in Clutter for a Variable Stiffness Hand |
> Video Attachment
|
|
Sundaram, Ashok M. | German Aerospace Center (DLR) |
Friedl, Werner | German AerospaceCenter (DLR) |
Roa, Maximo A. | DLR - German Aerospace Center |
Keywords: Manipulation Planning, Grasping, Dexterous Manipulation
Abstract: This paper deals with the problem of planning grasp strategies in constrained and cluttered scenarios. The planner sequences the objects for grasping by considering multiple factors: (i) possible environmental constraints that can be exploited to grasp an object, (ii) object neighborhood, (iii) capability of the arm, and (iv) confidence score of the vision algorithm. To successfully exploit the environmental constraints, this work uses the CLASH hand, a compliant hand that can vary its passive stiffness. The hand can be softened such that it complies with the object shape, or stiffened to pierce between the objects in clutter. A stiffness decision tree is introduced to choose the best stiffness setting for each particular scenario. In highly cluttered scenarios, a finger position planner is used to find a suitable orientation for the hand such that the fingers can slide into the free regions around the object. Thus, the grasp strategy planner predicts not only the sequence in which the objects can be grasped, but also the required stiffness of the end effector and the appropriate positions for the fingers around the object. Different experiments are carried out in the context of grocery handling to test the performance of the planner in scenarios that require different grasping strategies.
|
|
12:15-12:30, Paper WeBT11.3 | |
>Self-Assessment of Grasp Affordance Transfer |
> Video Attachment
|
|
Ardón, Paola | Edinburgh Centre for Robotics |
Pairet, Èric | Edinburgh Centre for Robotics |
Petillot, Yvan R. | Heriot-Watt University |
Petrick, Ron | Heriot-Watt University |
Ramamoorthy, Subramanian | The University of Edinburgh |
Lohan, Katrin Solveig | Heriot-Watt University |
Keywords: Humanoid Robot Systems, Behavior-Based Systems, Perception for Grasping and Manipulation
Abstract: Reasoning about object grasp affordances allows an autonomous agent to estimate the most suitable grasp to execute a task. While current approaches for estimating grasp affordances are effective, their prediction is driven by hypotheses on visual features rather than an indicator of a proposal's suitability for an affordance task. Consequently, these works cannot guarantee any level of performance when executing a task and, in fact, cannot even ensure successful task completion. In this work, we present a pipeline for self-assessment of grasp affordance transfer (SAGAT) based on prior experiences. We visually detect a grasp affordance region to extract multiple grasp affordance configuration candidates. Using these candidates, we forward-simulate the outcome of executing the affordance task to analyse the relation between task outcome and grasp candidates. The relations are ranked by performance success with a heuristic confidence function and used to build a library of affordance task experiences. The library is later queried to perform one-shot transfer estimation of the best grasp configuration on new objects. Experimental evaluation shows that our method exhibits a significant performance improvement of up to 11.7% over current state-of-the-art methods on grasp affordance detection. Experiments on a PR2 robotic platform demonstrate that our method can be reliably deployed to deal with real-world task affordance problems.
|
|
12:30-12:45, Paper WeBT11.4 | |
>Describing Physics for Physical Reasoning: Force-Based Sequential Manipulation Planning |
> Video Attachment
|
|
Toussaint, Marc | TU Berlin |
Ha, Jung-Su | University of Stuttgart |
Driess, Danny | University of Stuttgart |
Keywords: Manipulation Planning, Motion and Path Planning
Abstract: Physical reasoning is a core aspect of intelligence in animals and humans. A central question is what model should be used as a basis for reasoning. Existing work has considered models ranging from intuitive physics and physical simulators to contact dynamics models used in robotic manipulation and locomotion. In this work, we propose descriptions of physics which directly allow us to leverage optimization methods for physical reasoning and sequential manipulation planning. The proposed multi-physics formulation enables the solver to mix various levels of abstraction and simplifications for different objects and phases of the solution. As an essential ingredient, we propose a specific parameterization of wrench exchange between object surfaces in a path optimization framework, introducing the point-of-attack as a decision variable. We demonstrate the approach on various robot manipulation planning problems, such as grasping a stick in order to push or lift another object to a target, shifting and grasping a book from a shelf, and throwing an object so that it bounces towards a target.
|
|
12:45-13:00, Paper WeBT11.5 | |
>Online Replanning with Human-In-The-Loop for Non-Prehensile Manipulation in Clutter — a Trajectory Optimization Based Approach |
> Video Attachment
|
|
Papallas, Rafael | The University of Leeds |
Cohn, Anthony | University of Leeds |
Dogar, Mehmet R | University of Leeds |
Keywords: Manipulation Planning, Human Factors and Human-in-the-Loop
Abstract: We are interested in the problem where a number of robots, in parallel, are trying to solve reaching-through-clutter problems in a simulated warehouse setting. In such a setting, we investigate the performance increase that can be achieved by using a human-in-the-loop providing guidance to robot planners. These manipulation problems are challenging for autonomous planners, as they have to search for a solution in a high-dimensional space. In addition, physics simulators suffer from the uncertainty problem, where a trajectory valid in simulation can be invalid when executed in the real world. To tackle these problems, we propose an online replanning method with a human-in-the-loop. This system enables a robot to plan and execute a trajectory autonomously, but also to seek high-level suggestions from a human operator if required at any point during execution. This method aims to minimize the human effort required, thereby increasing the number of robots that can be guided in parallel by a single human operator. We performed experiments in simulation and on a real robot, using an experienced and a novice operator. Our results show a significant increase in performance when using our approach in a simulated warehouse scenario with six robots.
|
|
13:00-13:15, Paper WeBT11.6 | |
>Neural Manipulation Planning on Constraint Manifolds |
|
Qureshi, Ahmed Hussain | University of California San Diego |
Dong, Jiangeng | UCSD |
Choe, Austin | UCSD |
Yip, Michael C. | University of California, San Diego |
Keywords: Manipulation Planning, Learning from Demonstration, Motion and Path Planning
Abstract: The presence of task constraints imposes a significant challenge to motion planning. Despite all recent advancements, existing algorithms are still computationally expensive for most planning problems. In this paper, we present Constrained Motion Planning Networks (CoMPNet), the first neural planner for multimodal kinematic constraints. Our approach comprises the following components: i) constraint and environment perception encoders; ii) a neural robot configuration generator that outputs configurations on/near the constraint manifold(s); and iii) a bidirectional planning algorithm that takes the generated configurations to create a feasible robot motion trajectory. We show that CoMPNet solves practical motion planning tasks involving both unconstrained and constrained problems. Furthermore, it generalizes with high success rates to new object locations, not seen during training, in the given environments. When compared to state-of-the-art constrained motion planning algorithms, CoMPNet outperforms them by an order of magnitude in computational speed, with significantly lower variance.
|
|
WeBT12 |
Room T12 |
Manipulation Planning |
Regular session |
Chair: Bohg, Jeannette | Stanford University |
Co-Chair: Yoon, Sung-eui | KAIST |
|
11:45-12:00, Paper WeBT12.1 | |
>TORM: Fast and Accurate Trajectory Optimization of Redundant Manipulator Given an End-Effector Path |
> Video Attachment
|
|
Kang, Mincheul | KAIST |
Shin, Heechan | KAIST |
Kim, Donghyuk | KAIST |
Yoon, Sung-eui | KAIST |
Keywords: Manipulation Planning, Kinematics, Redundant Robots
Abstract: A redundant manipulator has multiple inverse kinematics solutions per end-effector pose. Accordingly, there can be many trajectories for joints that follow a given end-effector path in the Cartesian space. In this paper, we present a trajectory optimization of a redundant manipulator (TORM) to synthesize a trajectory that follows a given end-effector path accurately, while achieving smoothness and collision-free manipulation. Our method holistically incorporates three desired properties into the trajectory optimization process by integrating the Jacobian-based inverse kinematics solving method and an optimization-based motion planning approach. Specifically, we optimize a trajectory using two-stage gradient descent to reduce potential competition between different properties during the update. To avoid falling into local minima, we iteratively explore different candidate trajectories with our local update. We compare our method with state-of-the-art methods in test scenes including external obstacles and two non-obstacle problems. Our method robustly minimizes the pose error in a progressive manner while satisfying various desirable properties.
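As an illustration of the two-stage update idea described above (a minimal sketch, not the authors' implementation), the following Python snippet alternates between a Jacobian-based step that reduces end-effector pose error and a step that smooths the joint trajectory. The `pose_error` and `jacobian` callables are hypothetical stand-ins for a robot kinematics model; collision costs are omitted.

```python
import numpy as np

def two_stage_update(traj, targets, pose_error, jacobian,
                     alpha=0.1, beta=0.05, iters=100):
    """Alternate between (1) pose-accuracy updates and (2) smoothing updates.

    traj:    (T, N) array of joint configurations
    targets: length-T list of desired end-effector poses
    pose_error(q, target) -> 6-vector twist error (current minus target)
    jacobian(q)           -> (6, N) manipulator Jacobian
    """
    traj = traj.copy()
    for _ in range(iters):
        # Stage 1: reduce end-effector pose error with a Jacobian-transpose step.
        for t in range(len(traj)):
            e = pose_error(traj[t], targets[t])
            traj[t] -= alpha * jacobian(traj[t]).T @ e
        # Stage 2: smooth the joint trajectory.
        # Gradient of sum_t ||q_{t+1} - q_t||^2 for interior waypoints.
        smooth_grad = np.zeros_like(traj)
        smooth_grad[1:-1] = 2.0 * (2.0 * traj[1:-1] - traj[:-2] - traj[2:])
        traj -= beta * smooth_grad
    return traj
```

Separating the two updates, as in this sketch, keeps the accuracy and smoothness terms from directly competing within a single gradient step.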
|
|
12:00-12:15, Paper WeBT12.2 | |
>Multi-Mode Trajectory Optimization for Impact-Aware Manipulation |
> Video Attachment
|
|
Stouraitis, Theodoros | The University of Edinburgh and Honda Research Institute EU |
Yan, Lei | The University of Edinburgh |
Moura, Joao | University of Edinburgh |
Gienger, Michael | Honda Research Institute Europe |
Vijayakumar, Sethu | University of Edinburgh |
Keywords: Manipulation Planning, Optimization and Optimal Control, Compliance and Impedance Control
Abstract: The transition from free motion to contact is a challenging problem in robotics, in part due to its hybrid nature. Additionally, disregarding the effects of impacts at the motion planning level often results in intractable impulsive contact forces. In this paper, we introduce an impact-aware multi-mode trajectory optimization (TO) method that combines hybrid dynamics and hybrid control in a coherent fashion. A key concept is the incorporation of an explicit contact force transmission model in the TO method. This allows the simultaneous optimization of the contact forces, contact timings, continuous motion trajectories and compliance, while satisfying task constraints. We compare our method against standard compliance control and an impact-agnostic TO method in physical simulations. Further, we experimentally validate the proposed method with a robot manipulator on the task of halting a large-momentum object.
|
|
12:15-12:30, Paper WeBT12.3 | |
>Multi-Object Rearrangement with Monte Carlo Tree Search: A Case Study on Planar Nonprehensile Sorting |
> Video Attachment
|
|
Song, Haoran | Hong Kong University of Science and Technology |
Haustein, Joshua Alexander | KTH Royal Institute of Technology |
Yuan, Weihao | Hong Kong University of Science and Technology |
Hang, Kaiyu | Yale University |
Wang, Michael Yu | Hong Kong University of Science & Technology |
Kragic, Danica | KTH |
Stork, Johannes A. | Orebro University |
Keywords: Manipulation Planning, Deep Learning in Grasping and Manipulation, Motion and Path Planning
Abstract: In this work, we address a planar non-prehensile sorting task. Here, a robot needs to push many densely packed objects belonging to different classes into a configuration where these classes are clearly separated from each other. To achieve this, we propose to employ Monte Carlo tree search equipped with a task-specific heuristic function. We evaluate the algorithm on various simulated and real-world sorting tasks. We observe that the algorithm is capable of reliably sorting large numbers of convex and non-convex objects, as well as convex objects in the presence of immovable obstacles.
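In the spirit of the approach above, here is a minimal, generic Monte Carlo tree search skeleton with a task-specific heuristic used to score rollouts. It is an illustrative sketch only: `actions`, `step`, and `heuristic` are hypothetical placeholders for the push sampler, the physics model, and the class-separation score, and actions are assumed hashable (e.g., tuples).

```python
import math
import random

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children, self.visits, self.value = [], 0, 0.0

def uct(node, c=1.4):
    # Upper confidence bound for trees: exploitation + exploration bonus.
    return (node.value / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts(root_state, actions, step, heuristic, n_iter=500, depth=5):
    """actions(state) -> candidate pushes; step(state, a) -> next state;
    heuristic(state) -> scalar score of how well the classes are separated."""
    root = Node(root_state)
    for _ in range(n_iter):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while node.children and len(node.children) == len(actions(node.state)):
            node = max(node.children, key=uct)
        # 2. Expansion: add one untried action as a child.
        tried = {c.action for c in node.children}
        untried = [a for a in actions(node.state) if a not in tried]
        if untried:
            a = random.choice(untried)
            node.children.append(Node(step(node.state, a), node, a))
            node = node.children[-1]
        # 3. Evaluation: short random rollout scored by the heuristic.
        state = node.state
        for _ in range(depth):
            state = step(state, random.choice(actions(state)))
        reward = heuristic(state)
        # 4. Backpropagation.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda c: c.visits).action
```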
|
|
12:30-12:45, Paper WeBT12.4 | |
>Learning Skills to Patch Plans Based on Inaccurate Models |
> Video Attachment
|
|
Lagrassa, Alex | Carnegie Mellon University |
Lee, Steven | Carnegie Mellon University |
Kroemer, Oliver | Carnegie Mellon University |
Keywords: Manipulation Planning, Learning from Demonstration, Failure Detection and Recovery
Abstract: Planners using accurate models can be effective for accomplishing manipulation tasks in the real world, but are typically highly specialized and require significant fine-tuning to be reliable. Meanwhile, learning is useful for adaptation, but can require a substantial amount of data collection. In this paper, we propose a method that improves the efficiency of sub-optimal planners with approximate but simple and fast models by switching to a model-free policy when unexpected transitions are observed. Unlike previous work, our method specifically addresses when the planner fails due to transition model error by patching with a local policy only where needed. First, we use a sub-optimal model-based planner to perform a task until model failure is detected. Next, we learn a local model-free policy from expert demonstrations to complete the task in regions where the model failed. To show the efficacy of our method, we perform experiments with a shape insertion puzzle and compare our results to both pure planning and imitation learning approaches. We then apply our method to a door opening task. Our experiments demonstrate that our patch-enhanced planner performs more reliably than pure planning and with lower overall sample complexity than pure imitation learning.
|
|
12:45-13:00, Paper WeBT12.5 | |
>Motion Planning for Dual-Arm Manipulation of Elastic Rods |
> Video Attachment
|
|
Sintov, Avishai | Tel-Aviv University |
Macenski, Steven | University of Illinois at Urbana-Champaign |
Borum, Andy | Cornell University |
Bretl, Timothy | University of Illinois at Urbana-Champaign |
Keywords: Dual Arm Manipulation, Manipulation Planning, Motion and Path Planning
Abstract: We present a novel motion planning strategy for the manipulation of elastic rods with two robotic arms. In previous work, it has been shown that the free configuration space of an elastic rod, i.e., the set of equilibrium shapes of the rod, is a smooth manifold of finite dimension that can be parameterized by one chart. Thus, a sampling-based planning algorithm is straightforward to implement in the product space of the joint angles and the equilibrium configuration space of the elastic rod. Preliminary results show that planning directly in this product space is feasible. However, solving for the elastic rod's shape requires the numerical solution of differential equations, resulting in an excessive and impractical runtime. Hence, we propose to pre-compute a descriptor of the rod, i.e., a roadmap in the free configuration space of the rod that captures its main connectivity. By doing so, we can plan the motion of any dual-arm robotic system over this roadmap with dramatically fewer solutions of the differential equations. Experiments using the Open Motion Planning Library (OMPL) show a significant runtime reduction of an order of magnitude.
|
|
13:00-13:15, Paper WeBT12.6 | |
>Learning Topological Motion Primitives for Knot Planning |
> Video Attachment
|
|
Yan, Mengyuan | Stanford University |
Li, Gen | Tsinghua University |
Zhu, Yilin | Stanford University |
Bohg, Jeannette | Stanford University |
Keywords: Manipulation Planning, Deep Learning in Grasping and Manipulation, Reinforcement Learning
Abstract: In this paper, we approach the challenging problem of motion planning for knot tying. We propose a hierarchical approach in which the top layer produces a topological plan and the bottom layer translates this plan into continuous robot motion. The top layer decomposes a knotting task into sequences of abstract topological actions based on knot theory. The bottom layer translates each of these abstract actions into robot motion trajectories through learned topological motion primitives. To adapt each topological action to the specific rope geometry, the motion primitives take the observed rope configuration as input. We train the motion primitives by imitating human demonstrations and reinforcement learning in simulation. To generalize human demonstrations of simple knots into more complex knots, we observe similarities in the motion strategies of different topological actions and design the neural network structure to exploit such similarities. We demonstrate that our learned motion primitives can be used to efficiently generate motion plans for tying the overhand knot. The motion plan can then be executed on a real robot using visual tracking and Model Predictive Control. We also demonstrate that our learned motion primitives can be composed to tie a more complex pentagram-like knot despite being only trained on human demonstrations of simpler knots.
|
|
13:00-13:15, Paper WeBT12.7 | |
>Objective Functions for Principal Contact Estimation from Motion Based on the Geometrical Singular Condition |
|
Ishikawa, Seiya | The University of Tokyo |
Shirafuji, Shouhei | The University of Tokyo |
Ota, Jun | The University of Tokyo |
Keywords: Kinematics, Contact Modeling, Computational Geometry
Abstract: In this paper, we propose objective functions to estimate the principal contact between an unknown manipulated target object and its unknown surroundings from the motion of the object. We derive the objective functions from the fact that a contact involves a pair of geometric primitives (a vertex point, an edge line, and a face plane) that satisfy a singular condition in the calculation of their intersection or their spanned space, from the point of view of geometric algebra. Minimizing the proposed objective functions, which are differential quadratic forms of the Kronecker product of geometrical parameters, efficiently provides the contact geometries that constrain the object's movements. Additionally, the proposed objective functions are fundamental to identifying contacts during compliant manipulation, and we show experimentally that they provide a clue for contact identification.
|
|
WeBT13 |
Room T13 |
Manipulation and Grasping |
Regular session |
Chair: Chakraborty, Nilanjan | Stony Brook University |
Co-Chair: Hang, Kaiyu | Yale University |
|
11:45-12:00, Paper WeBT13.1 | |
>Variable In-Hand Manipulations for Tactile-Driven Robot Hand Via CNN-LSTM |
> Video Attachment
|
|
Funabashi, Satoshi | Waseda University, Sugano Lab |
Ogasa, Shun | Waseda University, Graduate School of Creative Science, Engineer |
Isobe, Tomoki | Waseda University |
Ogata, Tetsuya | Waseda University |
Schmitz, Alexander | Waseda University |
Tomo, Tito Pradhono | Waseda University |
Sugano, Shigeki | Waseda University |
Keywords: In-Hand Manipulation, Force and Tactile Sensing, Deep Learning in Grasping and Manipulation
Abstract: Performing various in-hand manipulation tasks, without learning each individual task, would enable robots to act in a more versatile manner while reducing the training effort. However, in general it is difficult to achieve stable in-hand manipulation, because the contact state between the fingertips becomes difficult to model, especially for a robot hand with anthropomorphically shaped fingertips. Rich tactile feedback can aid robust task execution, but it is challenging to process high-dimensional tactile information. In the current paper we use two fingers of the Allegro hand, and each fingertip is anthropomorphically shaped and equipped not only with 6-axis force-torque (F/T) sensors, but also with uSkin tactile sensors, which provide 24 tri-axial measurements per fingertip. A convolutional neural network is used to process the high-dimensional uSkin information, and a long short-term memory (LSTM) handles the time-series information. The network is trained to generate two different motions (``twist'' and ``push''). The desired motion is provided as a task parameter to the network, with twist defined as -1 and push as +1. When values between -1 and +1 are used as the task parameter, the network is able to generate untrained motions in between the two trained motions. Thereby, we achieve multiple untrained manipulations and robustness through high-dimensional tactile feedback.
|
|
12:00-12:15, Paper WeBT13.2 | |
>On Screw Linear Interpolation for Point-To-Point Path Planning |
> Video Attachment
|
|
Sarker, Anik | Stony Brook University |
Sinha, Anirban | Stony Brook University |
Chakraborty, Nilanjan | Stony Brook University |
Keywords: Motion and Path Planning, Manipulation Planning, Industrial Robots
Abstract: Robot motion is controlled in the joint space, whereas robots have to perform tasks in their task space. Many tasks, such as carrying a glass of liquid, pouring liquid, or opening a drawer, require constraints on the end-effector during the motion. The forward and inverse kinematic mappings between joint space and task space are highly nonlinear and multi-valued (for inverse kinematics). Consequently, modeling task space constraints, like keeping the orientation of the end-effector fixed while changing its position (which is required for carrying a cup of liquid without dropping it), is quite complex in the joint space. In this paper, we show that the use of screw linear interpolation to plan motions in the task space, combined with resolved motion rate control to compute the corresponding joint space path, allows one to satisfy many common task space motion constraints in motion planning without explicitly modeling them. In particular, any motion constraint that forms a subgroup of the group of rigid body motions can be incorporated in our planning scheme without explicit modeling. We present simulation and experimental results on the Baxter robot for different tasks with task space constraints that demonstrate the usefulness of our approach.
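Screw linear interpolation can be sketched in SE(3) via the matrix exponential and logarithm: the interpolated pose follows the constant screw motion connecting the two end poses, which is the property the abstract above exploits. This is an illustrative sketch under that assumption, not the authors' code; the dual-quaternion formulation of ScLERP is equivalent in effect.

```python
import numpy as np
from scipy.linalg import expm, logm

def screw_interp(T0, T1, s):
    """Pose at fraction s in [0, 1] along the constant screw motion from T0 to T1.

    T0, T1: 4x4 homogeneous transforms. The interpolant rotates and translates
    about a single screw axis, so subgroup constraints of SE(3) are preserved.
    """
    rel = np.linalg.inv(T0) @ T1      # relative transform T0^-1 T1
    xi = np.real(logm(rel))           # 4x4 twist (se(3) element); real part guards numerics
    return T0 @ expm(s * xi)

# Example: densify the screw path into waypoints for a joint-space tracker.
# waypoints = [screw_interp(T_start, T_goal, s) for s in np.linspace(0, 1, 50)]
```

Resolved motion rate control would then track these dense task-space waypoints with joint velocities computed from the Jacobian pseudo-inverse.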
|
|
12:15-12:30, Paper WeBT13.3 | |
>New Formulation of Mixed-Integer Conic Programming for Globally Optimal Grasp Planning |
> Video Attachment
|
|
Liu, Min | School of Computer, National University of Defense Technology |
Pan, Zherong | The University of North Carolina at Chapel Hill |
Xu, Kai | National University of Defense Technology |
Manocha, Dinesh | University of Maryland |
Keywords: Grasping, Optimization and Optimal Control
Abstract: We present a two-level branch-and-bound (BB) algorithm to compute the optimal gripper pose that maximizes a grasp metric in a restricted search space. Our method can take the gripper’s kinematics feasibility into consideration to ensure that a given gripper can reach the set of grasp points without collisions or predict infeasibility with finite-time termination when no pose exists for a given set of grasp points. Our main technical contribution is a novel mixed-integer conic programming (MICP) formulation for the inverse kinematics of the gripper that uses a small number of binary variables and tightened constraints, which can be efficiently solved via a low-level BB algorithm. Our experiments show that optimal gripper poses for various target objects can be computed taking 20-180 minutes of computation on a desktop machine and the computed grasp quality, in terms of the Q1 metric, is better than those generated using sampling-based planners.
|
|
12:30-12:45, Paper WeBT13.4 | |
>Calculating the Support Function of Complex Continuous Surfaces with Applications to Minimum Distance Computation and Optimal Grasp Planning (I) |
|
Zheng, Yu | Tencent |
Hang, Kaiyu | Yale University |
Keywords: Grasping, Collision Avoidance, Computational Geometry
Abstract: The support function of a surface is a fundamental concept in mathematics and a crucial operation for algorithms in robotics, such as those for collision detection and grasp planning. It is possible to calculate the support function of a convex body in a closed form. For complex continuous, especially nonconvex, surfaces, however, this calculation can be far more difficult and no general solution is available so far, which limits the applicability of those related algorithms. This article first presents a branch-and-bound (B&B) algorithm to calculate the support function of complex continuous surfaces. An upper bound of the support function over a surface domain is derived. While a surface domain is divided into subdomains, the upper bound of the support function over any subdomain is proved to be not greater than the one over the original domain. Then, as the B&B algorithm sequentially divides the surface domain by dividing its subdomain having a greater upper bound than the others, the maximum upper bound over all subdomains is monotonically decreasing and converges to the exact value of the desired support function. Furthermore, with the aid of the B&B algorithm, this article derives new algorithms for the minimum distance between complex continuous surfaces and for globally optimal grasps on objects with continuous surfaces. A number of numerical examples are provided to demonstrate the effectiveness of the proposed algorithms.
|
|
WeBT14 |
Room T14 |
Perception for Grasping and Manipulation I |
Regular session |
Chair: Held, David | Carnegie Mellon University |
Co-Chair: Liarokapis, Minas | The University of Auckland |
|
11:45-12:00, Paper WeBT14.1 | |
>Model-Free, Vision-Based Object Identification and Contact Force Estimation with a Hyper-Adaptive Robotic Gripper |
|
Hasan, Waris | University of Auckland |
Gerez, Lucas | The University of Auckland |
Liarokapis, Minas | The University of Auckland |
Keywords: Perception for Grasping and Manipulation
Abstract: Robots and intelligent industrial systems that focus on sorting or inspection of products require end-effectors that can grasp and manipulate the objects surrounding them. The capability of such systems largely depends on their ability to efficiently identify the objects and estimate the forces exerted on them. This paper presents an underactuated, compliant, and lightweight hyper-adaptive robot gripper that can efficiently discriminate between different everyday objects and estimate the contact forces exerted on them during a single grasp, using vision-based techniques. The hyper-adaptive mechanism consists of an array of movable steel rods that reconfigure to conform to the geometry of the grasped object. The proposed object identification and force estimation techniques are model-free and do not rely on time-consuming object exploration. A series of experiments was carried out to discriminate between 12 different everyday objects and estimate the forces exerted on a dynamometer. During each grasp, a series of images is captured to record the reconfiguration of the hyper-adaptive grasping mechanism. These images are then used by an image processing algorithm to extract the required information about the gripper reconfiguration, classify the grasped object using a Random Forests (RF) classifier, and estimate the amount of force being exerted. The employed RF classifier gives a prediction accuracy of 100%, while the accuracies of the force estimation techniques (Neural Networks, Random Forests, and a 3rd-order polynomial) range from 94.7% to 99.1%.
|
|
12:00-12:15, Paper WeBT14.2 | |
>Online Acquisition of Close-Range Proximity Sensor Models for Precise Object Grasping and Verification |
> Video Attachment
|
|
Hasegawa, Shun | The University of Tokyo |
Yamaguchi, Naoya | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Perception for Grasping and Manipulation, Sensor Fusion, Grippers and Other End-Effectors
Abstract: This study presents an approach for acquiring model parameters of close-range approximate proximity sensors on a robot hand using long-range distance sensors while that hand is grasping an object. The acquired models are used to generate a precise close-range distance output. We aim herein to acquire proximity sensors that have little dependence on object properties and that can sense a wide range (i.e., both close and long ranges). Simple close-range sensors strongly depend on object properties such as reflectance, material, volume, and/or conductivity, whereas long-range sensors cannot precisely measure the close range. To accomplish our goal, we fused close- and long-range sensors. Simple fusion remains object dependent at the close range. Hence, we acquired an object-dependent parameter in the close-range sensor model using the distance output of the long-range sensor at the overlap of the two sensor types. Through real robot experiments, we evaluated the precision of the generated distance output at the close range and found it useful for the precise grasping of compliant objects. We also confirmed that the acquired object-dependent parameter can verify ultra-thin object grasping.
|
|
12:15-12:30, Paper WeBT14.3 | |
>Acoustic Collision Detection and Localization for Robot Manipulators |
> Video Attachment
|
|
Fan, Xiaoran | Rutgers University |
Lee, Daewon | Samsung AI Center New York |
Chen, Yuan | Samsung AI Center New York |
Prepscius, Colin | Samsung |
Isler, Volkan | University of Minnesota |
Jackel, Larry | NVIDIA Corp |
Seung, Sebastian | Samsung AI Center NY |
Lee, Daniel | Cornell Tech |
Keywords: Perception for Grasping and Manipulation, Sensor Fusion, Force and Tactile Sensing
Abstract: Collision detection is critical for safe robot operation in the presence of humans. Acoustic information originating from collisions between robots and objects provides opportunities for fast collision detection and localization; however, audio information from microphones on robot manipulators needs to be robustly differentiated from motors and external noise sources. In this paper, we present Panotti, the first system to efficiently detect and localize on-robot collisions using low-cost microphones. We present a novel algorithm that can localize the source of a collision with centimeter-level accuracy and is also able to reject false detections using a robust spectral filtering scheme. Our method is scalable, easy to deploy, and enables safe and efficient control for robot manipulator applications. We implement and demonstrate a prototype that consists of 8 miniature microphones on a 7-degree-of-freedom (DOF) manipulator to validate our design. Extensive experiments show that Panotti realizes a near-perfect on-robot true-positive collision detection rate with almost zero false detections, even in high-noise environments. In terms of accuracy, it achieves an average localization error of less than 3.8 cm under various experimental settings.
|
|
12:30-12:45, Paper WeBT14.4 | |
>Estimating an Object’s Inertial Parameters by Robotic Pushing: A Data-Driven Approach |
|
Mavrakis, Nikos | University of Surrey |
Ghalamzan Esfahani, Amir Masoud | University of Lincoln |
Stolkin, Rustam | University of Birmingham |
Keywords: Perception for Grasping and Manipulation, Robotics in Hazardous Fields, Big Data in Robotics and Automation
Abstract: Estimating the inertial properties of an object can make robotic manipulations more efficient, especially in extreme environments. This paper presents a novel method of estimating the 2D inertial parameters of an object by having a robot apply a push to it. We draw inspiration from previous analyses of quasi-static pushing mechanics and introduce a data-driven model that can accurately represent these mechanics and provide a prediction for the object's inertial parameters. We evaluate the model with two datasets. For the first dataset, we set up a V-REP simulation of seven robots pushing objects with a large range of inertial parameters, acquiring 48000 pushes in total. For the second dataset, we use the object pushes from the MIT M-Cube lab pushing dataset. We extract features from force, moment and velocity measurements of the pushes, and train a Multi-Output Regression Random Forest. The experimental results show that we can accurately predict the 2D inertial parameters from a single push, and that our method retains this robust performance across various surface types.
|
|
12:45-13:00, Paper WeBT14.5 | |
>Kinematic Multibody Model Generation of Deformable Linear Objects from Point Clouds |
|
Wnuk, Markus | Institute for Control Engineering of Machine Tools and Manufactu |
Hinze, Christoph | Institute for Control Engineering of Machine Tools and Manufactu |
Lechler, Armin | University Stuttgart |
Verl, Alexander | University of Stuttgart |
Keywords: Perception for Grasping and Manipulation, Object Detection, Segmentation and Categorization, Kinematics
Abstract: Control and localization of deformable linear objects (DLOs) require models to handle their deformation. This paper proposes an approach to automatically generate a model from available visual sensor information. Based on point cloud data obtained from a 3D stereo camera, the kinematics of a multibody model formulation are derived. The approach aims to balance the tradeoff between computational complexity and model accuracy. This is achieved with a geometric error criterion that reduces the introduced degrees of freedom of the model to a necessary minimum, representing the continuous shape with as few bodies as possible. The approach is evaluated analytically and validated with an experimental scenario of DLO manipulation.
|
|
13:00-13:15, Paper WeBT14.6 | |
>Cloth Region Segmentation for Robust Grasp Selection |
> Video Attachment
|
|
Qian, Jianing | Carnegie Mellon University |
Weng, Thomas | Carnegie Mellon University |
Zhang, Luxin | Carnegie Mellon University |
Okorn, Brian | Carnegie Mellon University |
Held, David | Carnegie Mellon University |
Keywords: Perception for Grasping and Manipulation, Object Detection, Segmentation and Categorization, Computer Vision for Other Robotic Applications
Abstract: Cloth detection and manipulation is a common task in domestic and industrial settings, yet such tasks remain a challenge for robots due to cloth deformability. Furthermore, in many cloth-related tasks like laundry folding and bed making, it is crucial to manipulate specific regions like edges and corners, as opposed to folds. In this work, we focus on the problem of segmenting and grasping these key regions. Our approach trains a network to segment the edges and corners of a cloth from a depth image, distinguishing such regions from wrinkles or folds. We also provide a novel algorithm for estimating the grasp location, direction, and directional uncertainty from the segmentation. We demonstrate our method on a real robot system and show that it outperforms baseline methods on grasping success. Video and other supplementary materials are available at: https://sites.google.com/view/cloth-segmentation.
|
|
WeBT15 |
Room T15 |
Perception for Grasping and Manipulation II |
Regular session |
Chair: Angelova, Anelia | Google Research |
Co-Chair: Bekris, Kostas E. | Rutgers, the State University of New Jersey |
|
11:45-12:00, Paper WeBT15.1 | |
>Physics-Based Dexterous Manipulations with Estimated Hand Poses and Residual Reinforcement Learning |
> Video Attachment
|
|
Garcia-Hernando, Guillermo | Imperial College London |
Johns, Edward | Imperial College London |
Kim, Tae-Kyun | Imperial College London |
Keywords: Virtual Reality and Interfaces, Computer Vision for Other Robotic Applications, Dexterous Manipulation
Abstract: Dexterous manipulation of objects in virtual environments with our bare hands, by using only a depth sensor and a state-of-the-art 3D hand pose estimator (HPE), is challenging. While virtual environments are ruled by physics, e.g. object weights and surface frictions, the absence of force feedback makes the task challenging, as even slight inaccuracies on finger tips or contact points from HPE may make the interactions fail. Prior art simply generates contact forces in the direction of the fingers' closures when finger joints penetrate virtual objects. Although useful for simple grasping scenarios, this cannot be applied to dexterous manipulations such as in-hand manipulation. Existing reinforcement learning (RL) and imitation learning (IL) approaches train agents that learn skills by using task-specific rewards, without considering any online user input. In this work, we propose to learn a model that maps noisy input hand poses to target virtual poses, which introduces the needed contacts to accomplish the tasks on a physics simulator. The agent is trained in a residual setting by using a model-free hybrid RL+IL approach. A 3D hand pose estimation reward is introduced, leading to an improvement in HPE accuracy when the physics-guided corrected target poses are remapped to the input space. As the model corrects HPE errors by applying minor but crucial joint displacements for contacts, this helps to keep the generated motion visually close to the user input. Since HPE sequences performing successful virtual interactions do not exist, a data generation scheme to train and evaluate the system is proposed. We test our framework in two applications that use hand pose estimates for dexterous manipulations: hand-object interactions in VR and hand-object motion reconstruction in the wild. Experiments show that the proposed method outperforms various RL/IL baselines and the simple prior art of enforcing hand closure, both in task success and hand pose accuracy.
|
|
12:00-12:15, Paper WeBT15.2 | |
>Affordance-Based Grasping and Manipulation in Real World Applications |
> Video Attachment
|
|
Pohl, Christoph | Karlsruhe Institute of Technology (KIT) |
Hitzler, Kevin | Karlsruhe Institute of Technology (KIT) |
Grimm, Raphael | Karlsruhe Institute of Technology (KIT) |
Zea, Antonio | Karlsruhe Institute of Technology |
Hanebeck, Uwe D. | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Humanoid Robot Systems, Perception for Grasping and Manipulation, Robotics in Hazardous Fields
Abstract: In real world applications, robotic solutions remain impractical due to the challenges that arise in unknown and unstructured environments. To perform complex manipulation tasks in complex and cluttered situations, robots need to be able to identify the interaction possibilities with the scene, i.e. the affordances of the objects encountered. In unstructured environments with noisy perception, insufficient scene understanding and limited prior knowledge, this is a challenging task. In this work, we present an approach for grasping unknown objects in cluttered scenes with a humanoid robot in the context of a nuclear decommissioning task. Our approach combines the convenience and reliability of autonomous robot control with the precision and adaptability of teleoperation in a semi-autonomous selection of grasp affordances. Additionally, this allows exploiting the expert knowledge of an experienced human worker. To evaluate our approach, we conducted 75 real world experiments with more than 660 grasp executions on the humanoid robot ARMAR-6. The results demonstrate that high-level decisions made by the human operator, supported by autonomous robot control, contribute significantly to successful task execution.
|
|
12:15-12:30, Paper WeBT15.3 | |
>X-Ray: Mechanical Search for an Occluded Object by Minimizing Support of Learned Occupancy Distributions |
> Video Attachment
|
|
Danielczuk, Michael | UC Berkeley |
Angelova, Anelia | Google Research |
Vanhoucke, Vincent | Google Research |
Goldberg, Ken | UC Berkeley |
Keywords: Perception for Grasping and Manipulation, RGB-D Perception, Deep Learning in Grasping and Manipulation
Abstract: For applications in e-commerce, warehouses, healthcare, and home service, robots are often required to search through heaps of objects to grasp a specific target object. For mechanical search, we introduce X-Ray, an algorithm based on learned occupancy distributions. We train a neural network using a synthetic dataset of RGBD heap images labeled for a set of standard bounding box targets with varying aspect ratios. X-Ray minimizes support of the learned distribution as part of a mechanical search policy in both simulated and real environments. We benchmark these policies against two baseline policies on 1,000 heaps of 15 objects in simulation where the target object is partially or fully occluded. Results suggest that X-Ray is significantly more efficient, as it succeeds in extracting the target object 82% of the time, 15% more often than the best-performing baseline. Experiments on an ABB YuMi robot with 20 heaps of 25 household objects suggest that the learned policy transfers easily to a physical system, where it outperforms baseline policies by 15% in success rate with 17% fewer actions. Datasets, videos, and experiments are available at https://sites.google.com/berkeley.edu/x-ray.
|
|
12:30-12:45, Paper WeBT15.4 | |
>Making Robots Draw a Vivid Portrait in Two Minutes |
|
Gao, Fei | Advanced Institute of Information Technology (AIIT), Peking Univ |
Zhu, Jingjie | Advanced Institute of Information Technology Peking University |
Yu, Zeyuan | Advanced Institute of Information Technology Peking University |
Li, Peng | Advanced Institute of Information Technology, Peking University |
Wang, Tao | Peking University |
Keywords: Entertainment Robotics, Computer Vision for Other Robotic Applications, Deep Learning for Visual Perception
Abstract: Significant progress has been made with artistic robots. However, existing robots fail to produce high-quality portraits in a short time. In this work, we present a drawing robot that can automatically transfer a facial picture to a vivid portrait and then draw it on paper within two minutes on average. At the heart of our system is a novel portrait synthesis algorithm based on deep learning. Innovatively, we employ a self-consistency loss, which makes the algorithm capable of generating continuous and smooth brush-strokes. Besides, we propose a componential sparsity constraint to reduce the number of brush-strokes over insignificant areas. We also implement a local sketch synthesis algorithm, and several pre- and post-processing techniques to deal with the background and details. The portrait produced by our algorithm successfully captures individual characteristics by using a sparse set of continuous brush-strokes. Finally, the portrait is converted to a sequence of trajectories and reproduced by a 3-degree-of-freedom robotic arm. The whole portrait drawing robotic system is named AiSketcher. Extensive experiments show that AiSketcher can produce high-quality sketches for a wide range of pictures, including faces in the wild and universal images of arbitrary content. To the best of our knowledge, AiSketcher is the first portrait drawing robot that uses neural style transfer techniques. AiSketcher has been presented at a number of exhibitions and has shown remarkable performance under diverse circumstances.
|
|
12:45-13:00, Paper WeBT15.5 | |
>Task-Driven Perception and Manipulation for Constrained Placement of Unknown Objects |
> Video Attachment
|
|
Mitash, Chaitanya | Rutgers University |
Shome, Rahul | Rice University |
Wen, Bowen | Rutgers University |
Boularias, Abdeslam | Rutgers University |
Bekris, Kostas E. | Rutgers, the State University of New Jersey |
Keywords: Perception for Grasping and Manipulation, Manipulation Planning, Dual Arm Manipulation
Abstract: Recent progress in robotic manipulation has dealt with the case of previously unknown objects in the context of relatively simple tasks, such as bin-picking. Existing methods for more constrained problems, however, such as deliberate placement in a tight region, depend more critically on shape information to achieve safe execution. This work deals with pick-and-constrained placement of objects without access to geometric models. The objective is to pick an object and place it safely inside a desired goal region without any collisions, while minimizing the time and the sensing operations required to complete the task. An algorithmic framework is proposed for this purpose, which performs manipulation planning simultaneously over a conservative and an optimistic estimate of the object’s volume. The conservative estimate ensures that the manipulation is safe while the optimistic estimate guides the sensor-based manipulation process when no solution can be found for the conservative estimate. To maintain these estimates and dynamically update them during manipulation, objects are represented by a simple volumetric representation, which stores sets of occupied and unseen voxels. The effectiveness of the proposed approach is demonstrated by developing a robotic system that picks a previously unseen object from a table-top and places it in a constrained space. The system comprises of a dual-arm manipulator with heterogeneous end-effectors and leverages hand-offs as a re-grasping strategy. Real-world experiments show that straightforward pick-sense-and-place alternatives frequently fail to solve pick-and-constrained placement problems. The proposed pipeline, however, achieves more than 95% success rate and faster execution times as evaluated over multiple physical experiments.
|
|
13:00-13:15, Paper WeBT15.6 | |
>Point Cloud Projective Analysis for Part-Based Grasp Planning |
> Video Attachment
|
|
Monica, Riccardo | University of Parma |
Aleotti, Jacopo | University of Parma |
Keywords: Perception for Grasping and Manipulation, Object Detection, Segmentation and Categorization, Semantic Scene Understanding
Abstract: This work presents an approach for part-based grasp planning in point clouds. A complete pipeline is proposed that allows a robot manipulator equipped with a range camera to perform object detection, categorization, segmentation into meaningful parts, and part-based semantic grasping. A supervised image-space technique is adopted for point cloud segmentation based on projective analysis. Projective analysis generates a set of 2D projections from the input object point cloud, labels each object projection by transferring knowledge from existing labeled images, and then fuses the labels by back-projection on the object point cloud. We introduce an algorithm for point cloud categorization based on 2D projections. We also propose a viewpoint aware algorithm that filters 2D projections according to the scanning path of the robot. Object categorization and segmentation experiments were carried out with both synthetic and real datasets. Results indicate that the proposed approach performs better than a CNN-based method for a training set of limited size. Finally, we show part-based grasping tasks in a real robotic setup.
|
|
WeBT16 |
Room T16 |
Learning for Grasping |
Regular session |
Chair: Hermans, Tucker | University of Utah |
Co-Chair: Sahin, Ferat | Rochester Institute of Technology |
|
11:45-12:00, Paper WeBT16.1 | |
>Grasping Detection Network with Uncertainty Estimation for Confidence-Driven Semi-Supervised Domain Adaptation |
|
Zhu, Haiyue | Singapore Institute of Manufacturing Technology |
Yiting, Li | National University of Singapore |
Bai, Fengjun | Advanced Remanufacturing and Technology Center |
Chen, Wenjie | Singapore Inst. of Manufacturing Technology |
Li, Xiaocong | A*STAR |
Ma, Jun | National University of Singapore |
Teo, Chek Sing | SIMTech |
Tao, Pey Yuen | SIMTech |
Lin, Wei | SIMTech, A*STAR |
Keywords: Deep Learning in Grasping and Manipulation, Novel Deep Learning Methods, Perception for Grasping and Manipulation
Abstract: Data-efficient domain adaptation with only a few labelled samples is desirable for many robotic applications; e.g., in grasping detection, the inference skill learned from a grasping dataset is not universal enough to apply directly to various other daily/industrial applications. This paper presents an approach that enables easy domain adaptation through a novel grasping detection network with confidence-driven semi-supervised learning, where these two components deeply interact with each other. The proposed grasping detection network provides a prediction uncertainty estimation mechanism by leveraging a Feature Pyramid Network (FPN), and the mean-teacher semi-supervised learning utilizes this uncertainty information to emphasize the consistency loss only for unlabelled data with high confidence, which we refer to as the confidence-driven mean teacher. This approach largely prevents the student model from learning incorrect/harmful information through the consistency loss, which speeds up learning and improves model accuracy. Our results show that the proposed network achieves a high success rate on the Cornell grasping dataset, and for domain adaptation with very limited data, the confidence-driven mean teacher outperforms the original mean teacher and direct training by more than 10% in evaluation loss, especially in avoiding overfitting and model divergence.
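A minimal PyTorch-style sketch of the confidence-driven consistency idea (an assumption-laden illustration, not the paper's code): the consistency loss on unlabelled data is masked wherever the teacher's estimated uncertainty is high, and the teacher is the usual exponential moving average of the student. The `threshold` and the shapes of the tensors are hypothetical.

```python
import torch
import torch.nn.functional as F

def confidence_masked_consistency(student_out, teacher_out, teacher_uncertainty,
                                  threshold=0.2):
    """Consistency loss between student and teacher predictions on unlabelled
    inputs, applied only where the teacher's estimated uncertainty is low."""
    mask = (teacher_uncertainty < threshold).float()
    per_elem = F.mse_loss(student_out, teacher_out.detach(), reduction="none")
    return (per_elem * mask).sum() / mask.sum().clamp(min=1.0)

@torch.no_grad()
def ema_update(teacher, student, decay=0.99):
    """Mean-teacher update: teacher weights are an exponential moving average
    of the student weights."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)
```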
|
|
12:00-12:15, Paper WeBT16.2 | |
>Batch Normalization Masked Sparse Autoencoder for Robotic Grasping Detection |
|
Shao, Zhenzhou | Capital Normal University |
Qu, Ying | The University of Tennessee, Knoxville |
Ren, Guangli | Institute of Automation, Chinese Academy of Sciences |
Wang, Guohui | Capital Normal University |
Guan, Yong | Capital Normal University |
Shi, Zhiping | Capital Normal University |
Tan, Jindong | University of Tennessee, Knoxville |
Keywords: Deep Learning in Grasping and Manipulation, RGB-D Perception, Perception for Grasping and Manipulation
Abstract: To improve the accuracy of grasping detection, this paper proposes a novel detector with a batch-normalization-masked evaluation model. It is designed with a two-layer sparse autoencoder, and a Batch Normalization based mask is incorporated into the second layer of the model to effectively reduce features with weak correlation. The features extracted from this model are more distinctive, which guarantees higher accuracy of grasping detection. Extensive experiments show that the proposed evaluation model outperforms the state of the art, and the recognition accuracy reaches 95.51% for robotic grasping detection.
|
|
12:15-12:30, Paper WeBT16.3 | |
>No-Regret Shannon Entropy Regularized Neural Contextual Bandit Online Learning for Robotic Grasping |
> Video Attachment
|
|
Lee, Kyungjae | Seoul National University |
Choy, JaeGoo | Seoul National University |
Choi, Yunho | Seoul National University |
Kee, Hogun | Seoul National University |
Oh, Songhwai | Seoul National University |
Keywords: Deep Learning in Grasping and Manipulation, Robust/Adaptive Control of Robotic Systems, Reinforcement Learning
Abstract: In this paper, we propose a novel contextual bandit algorithm that employs a neural network as a reward estimator and utilizes Shannon entropy regularization to encourage exploration, which we call Shannon entropy regularized neural contextual bandits (SERN). In many learning-based algorithms for robotic grasping, the lack of real-world data hampers the generalization performance of a model and makes it difficult to apply a trained model to real-world problems. To handle this issue, the proposed method utilizes the benefit of online learning. The proposed method trains a neural network to predict the success probability of a given grasp pose based on a depth image, which is called a grasp quality. The policy is defined as a softmax distribution over grasp qualities, which is induced by the Shannon entropy regularization. The proposed method explores diverse grasp poses due to the softmax distribution, but promising grasp poses with high estimated grasp quality are explored more frequently. We also theoretically show that the SERN has a no-regret property. We empirically demonstrate that the SERN outperforms epsilon-greedy in terms of sample efficiency.
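The softmax policy induced by Shannon entropy regularization can be written down directly: maximizing the expected quality plus alpha times the policy entropy over distributions yields probabilities proportional to exp(quality / alpha). A small sketch follows (illustrative only; the grasp-quality network is abstracted into a vector of scores, and `alpha` is a hypothetical regularization weight).

```python
import numpy as np

def shannon_softmax_policy(grasp_qualities, alpha=0.1):
    """Sample a grasp-pose index from the softmax of estimated grasp qualities.

    This is the maximizer of E[q] + alpha * H(pi) over distributions pi,
    i.e. pi_i proportional to exp(q_i / alpha).
    """
    q = np.asarray(grasp_qualities, dtype=float) / alpha
    q -= q.max()                    # numerical stability before exponentiation
    p = np.exp(q)
    p /= p.sum()
    return np.random.choice(len(p), p=p), p
```

Smaller values of `alpha` concentrate the policy on the best-scoring grasps; larger values trade exploitation for exploration.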
|
|
12:30-12:45, Paper WeBT16.4 | |
>Antipodal Robotic Grasping Using Generative Residual Convolutional Neural Network |
> Video Attachment
|
|
Kumra, Sulabh | Rochester Institute of Technology |
Joshi, Shirin | RIT |
Sahin, Ferat | Rochester Institute of Technology |
Keywords: Deep Learning in Grasping and Manipulation, Manipulation Planning, Grasping
Abstract: In this paper, we present a modular robotic system to tackle the problem of generating and performing antipodal robotic grasps for unknown objects from n-channel image of the scene. We propose a novel Generative Residual Convolutional Neural Network (GR-ConvNet) model that can generate robust antipodal grasps from n-channel input at real-time speeds (~20ms). We evaluate the proposed model architecture on standard datasets and a diverse set of household objects. We achieved state-of-the-art accuracy of 97.7% and 94.6% on Cornell and Jacquard grasping datasets respectively. We also demonstrate a grasp success rate of 95.4% and 93% on household and adversarial objects respectively using a 7 DoF robotic arm.
|
|
12:45-13:00, Paper WeBT16.5 | |
>Grasping in the Wild: Learning 6DoF Closed-Loop Grasping from Low-Cost Demonstrations |
> Video Attachment
|
|
Song, Shuran | Columbia University |
Zeng, Andy | Google |
Lee, Johnny | Google |
Funkhouser, Thomas A. | Princeton University |
Keywords: Deep Learning in Grasping and Manipulation, Deep Learning for Visual Perception, Perception for Grasping and Manipulation
Abstract: Intelligent manipulation benefits from the capacity to flexibly control an end-effector with high degrees of freedom (DoF) and dynamically react to the environment. However, due to the challenges of collecting effective training data and learning efficiently, most grasping algorithms today are limited to top-down movements and open-loop execution. In this work, we propose a new low-cost hardware interface for collecting grasping demonstrations by people in diverse environments. This data makes it possible to train a robust end-to-end 6DoF closed-loop grasping model with reinforcement learning that transfers to real robots. A key aspect of our grasping model is that it uses ``action-view'' based rendering to simulate future states with respect to different possible actions. By evaluating these states using a learned value function (e.g., Q-function), our method is able to better select corresponding actions that maximize total rewards (i.e., grasping success). Our final grasping system is able to achieve reliable 6DoF closed-loop grasping of novel objects across various scene configurations, as well as in dynamic scenes with moving objects.
|
|
13:00-13:15, Paper WeBT16.6 | |
>Multi-Fingered Grasp Planning Via Inference in Deep Neural Networks (I) |
> Video Attachment
|
|
Lu, Qingkai | University of Utah |
Van der Merwe, Mark | University of Utah |
Sundaralingam, Balakumar | University of Utah |
Hermans, Tucker | University of Utah |
Keywords: Deep Learning in Grasping and Manipulation, Grasping, Perception for Grasping and Manipulation
Abstract: We propose a novel approach to multi-fingered grasp planning leveraging learned deep neural network models. We train a voxel-based 3D convolutional neural network to predict grasp success probability as a function of both visual information of an object and grasp configuration. We can then formulate grasp planning as inferring the grasp configuration which maximizes the probability of grasp success. In addition, we learn a prior over grasp configurations as a mixture density network conditioned on our voxel-based object representation. We show that this object-conditional prior improves grasp inference when used with the learned grasp success prediction network, compared to a learned object-agnostic prior or an uninformed uniform prior. Our work is the first to directly plan high-quality multi-fingered grasps in configuration space using a deep neural network without the need for an external planner. We validate our inference method by performing multi-fingered grasping on a physical robot. Our experimental results show that our planning method outperforms existing grasp planning methods for neural networks.
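A hedged sketch of grasp planning as inference, in the spirit of the abstract above: treat the grasp configuration as a decision variable and ascend the learned log success probability plus the learned log prior by gradient ascent. `success_net` and `log_prior` are hypothetical stand-ins for the trained networks described in the paper.

```python
import torch

def plan_grasp(success_net, log_prior, voxels, theta_init, steps=200, lr=1e-2):
    """Infer a grasp configuration by maximizing
    log p(success | voxels, theta) + log p(theta | voxels) via gradient ascent.

    success_net(voxels, theta) -> scalar success probability in (0, 1)
    log_prior(voxels, theta)   -> scalar log density of the learned grasp prior
    """
    theta = theta_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([theta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        objective = torch.log(success_net(voxels, theta)) + log_prior(voxels, theta)
        (-objective).backward()      # ascend the objective by descending its negative
        opt.step()
    return theta.detach()
```

In practice one would restart this inference from several prior samples and keep the configuration with the highest predicted success.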
|
|
WeBT17 |
Room T17 |
Learning for Manipulation |
Regular session |
Chair: Bhattacharjee, Tapomayukh | University of Washington |
Co-Chair: Kroeger, Torsten | Karlsruher Institut Für Technologie (KIT) |
|
11:45-12:00, Paper WeBT17.1 | |
>Deep Imitation Learning of Sequential Fabric Smoothing from an Algorithmic Supervisor |
> Video Attachment
|
|
Seita, Daniel | University of California, Berkeley |
Ganapathi, Aditya | University of California, Berkeley |
Hoque, Ryan | University of California, Berkeley |
Hwang, Minho | University of California Berkeley |
Cen, Edward | University of California, Berkeley |
Tanwani, Ajay Kumar | UC Berkeley |
Balakrishna, Ashwin | University of California, Berkeley |
Thananjeyan, Brijen | UC Berkeley |
Ichnowski, Jeffrey | UC Berkeley |
Jamali, Nawid | Honda Research Institute USA |
Yamane, Katsu | Honda |
Iba, Soshi | Honda Research Institute USA |
Canny, John F. | University of California, Berkeley |
Goldberg, Ken | UC Berkeley |
Keywords: Learning from Demonstration, RGB-D Perception, Deep Learning in Grasping and Manipulation
Abstract: Sequential pulling policies to flatten and smooth fabrics have applications from surgery to manufacturing to home tasks such as bed making and folding clothes. Due to the complexity of fabric states and dynamics, we apply deep imitation learning to learn policies that, given color (RGB), depth (D), or combined color-depth (RGBD) images of a rectangular fabric sample, estimate pick points and pull vectors to spread the fabric to maximize coverage. To generate data, we develop a fabric simulator and an algorithmic supervisor that has access to complete state information. We train policies in simulation using domain randomization and dataset aggregation (DAgger) on three tiers of difficulty in the initial randomized configuration. We present results comparing five baseline policies to learned policies and report systematic comparisons of RGB vs D vs RGBD images as inputs. In simulation, learned policies achieve comparable or superior performance to analytic baselines. In 180 physical experiments with the da Vinci Research Kit (dVRK) surgical robot, RGBD policies trained in simulation attain coverage of 83% to 95% depending on difficulty tier, suggesting that effective fabric smoothing policies can be learned from an algorithmic supervisor and that depth sensing is a valuable addition to color alone. Supplementary material is available at https://sites.google.com/view/fabric-smoothing.
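For reference, a compact sketch of the DAgger loop used to train such policies (illustrative only, with a hypothetical environment and supervisor interface): the learner's own rollouts are relabelled by the algorithmic supervisor, and the policy is retrained on the aggregated dataset.

```python
def dagger(env, expert, policy, n_rounds=10, horizon=50):
    """Dataset aggregation: roll out the learner, relabel visited observations
    with the supervisor's actions, and retrain on the growing dataset.

    env.reset() -> obs; env.step(a) -> (obs, done); env.state is the full state
    expert.action(state) -> supervisor action (has privileged state access)
    policy.action(obs)   -> learner action; policy.fit(dataset) retrains it
    """
    dataset = []
    for _ in range(n_rounds):
        obs = env.reset()
        for _ in range(horizon):
            dataset.append((obs, expert.action(env.state)))   # supervisor labels
            obs, done = env.step(policy.action(obs))           # learner drives the rollout
            if done:
                break
        policy.fit(dataset)      # supervised learning on all aggregated pairs
    return policy
```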
|
|
12:00-12:15, Paper WeBT17.2 | |
>Adaptive Robot-Assisted Feeding: An Online Learning Framework for Acquiring Previously Unseen Food Items |
> Video Attachment
|
|
Gordon, Ethan Kroll | University of Washington |
Meng, Xiang | University of Washington |
Bhattacharjee, Tapomayukh | University of Washington |
Barnes, Matt | University of Washington |
Srinivasa, Siddhartha | University of Washington |
Keywords: Physically Assistive Devices, AI-Based Methods, Perception for Grasping and Manipulation
Abstract: A successful robot-assisted feeding system requires bite acquisition of a wide variety of food items. It must adapt to changing user food preferences under uncertain visual and physical environments. Different food items in different environmental conditions require different manipulation strategies for successful bite acquisition. Therefore, a key challenge is how to handle previously unseen food items with very different success rate distributions over strategy. Combining low-level controllers and planners into discrete action trajectories, we show that the problem can be represented using a linear contextual bandit setting. We construct a simulated environment using a doubly robust loss estimate from previously seen food items, which we use to tune the parameters of off-the-shelf contextual bandit algorithms. Finally, we demonstrate empirically on a robot-assisted feeding system that, even starting with a model trained on thousands of skewering attempts on dissimilar previously seen food items, epsilon-greedy and LinUCB algorithms can quickly converge to the most successful manipulation strategy.
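For reference, a minimal disjoint LinUCB implementation of the kind evaluated above (a sketch, not the authors' system): one linear model per bite-acquisition strategy, with an optimism bonus computed from the per-arm design matrix. The context vector `x` would be the visual/haptic features of the food item.

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: one linear reward model per arm (strategy)."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]      # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]    # per-arm reward vectors

    def choose(self, x):
        """Pick the arm with the highest optimistic reward estimate for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Rank-one update of the chosen arm's statistics after observing reward."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```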
|
|
12:15-12:30, Paper WeBT17.3 | |
>RobotVQA — a Scene-Graph and Deep-Learning-Based Visual Question Answering System for Robot Manipulation |
> Video Attachment
|
|
Kenghagho Kenfack, Franklin | University of Bremen |
Siddiky, Feroz Ahmed | 1986 |
Balint-Benczedi, Ferenc | University of Bremen |
Beetz, Michael | University of Bremen |
Keywords: Deep Learning for Visual Perception, Semantic Scene Understanding, Perception for Grasping and Manipulation
Abstract: Visual robot perception has been a challenge for successful robot manipulation in noisy, cluttered and dynamic environments. While some perception systems fail to provide an adequate semantics of the scene, others fail to present appropriate learning models and training data. Another major issue encountered in some robot perception systems is their inability to promptly respond to robot control programs whose real-time requirements are crucial. This paper proposes an architecture for robot vision for manipulation tasks that addresses the three issues mentioned above. The architecture encompasses a generator of training datasets and a learnable scene describer, coined RobotVQA for Robot Visual Question Answering. The architecture leverages the power of deep learning for prediction and of photo-realistic virtual worlds for training. RobotVQA takes as input a robot scene's RGB or RGBD image, detects all relevant objects in it, then describes in realtime each object in terms of category, color, material, shape, openability, 6D-pose and segmentation mask. Moreover, RobotVQA computes the qualitative spatial relations among those objects. We refer to such a scene description in this paper as the scene graph or semantic graph of the scene. In RobotVQA, prediction and training take place in a unified manner. Finally, we demonstrate how RobotVQA is suitable for robot control systems that interpret perception as a question answering process.
|
|
12:30-12:45, Paper WeBT17.4 | |
>Model-Based Quality-Diversity Search for Efficient Robot Learning |
|
Keller, Leon | TU Darmstadt |
Tanneberg, Daniel | Technische Universitaet Darmstadt |
Stark, Svenja | Technical University Darmstadt |
Peters, Jan | Technische Universität Darmstadt |
Keywords: AI-Based Methods, Autonomous Agents, Transfer Learning
Abstract: Despite recent progress in robot learning, it remains a challenge to program a robot to deal with open-ended object manipulation tasks. One approach that was recently used to autonomously generate a repertoire of diverse skills is a novelty-based Quality-Diversity (QD) algorithm. However, like most evolutionary algorithms, QD suffers from sample inefficiency and, thus, it is challenging to apply it in real-world scenarios. This paper tackles this problem by integrating a neural network that predicts the behavior of the perturbed parameters into a novelty-based QD algorithm. In the proposed Model-based Quality-Diversity search (M-QD), the network is trained concurrently with the repertoire and is used to avoid executing unpromising actions in the novelty search process. Furthermore, it is used to adapt the skills of the final repertoire in order to generalize them to different scenarios. Our experiments show that enhancing a QD algorithm with such a forward model improves the sample efficiency and performance of the evolutionary process and the skill adaptation.
|
|
12:45-13:00, Paper WeBT17.5 | |
>Transferring Experience from Simulation to the Real World for Precise Pick-And-Place Tasks in Highly Cluttered Scenes |
> Video Attachment
|
|
Kleeberger, Kilian | Fraunhofer IPA |
Völk, Markus | Fraunhofer IPA |
Moosmann, Marius | Fraunhofer IPA |
Thiessenhusen, Erik | Fraunhofer IPA |
Roth, Florian | Fraunhofer IPA |
Bormann, Richard | Fraunhofer IPA |
Huber, Marco F. | University of Stuttgart |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation, Computer Vision for Automation
Abstract: In this paper, we introduce a novel learning-based approach for grasping known rigid objects in highly cluttered scenes and precisely placing them based on depth images. Our Placement Quality Network (PQ-Net) estimates the object pose and the quality for each automatically generated grasp pose for multiple objects simultaneously at 92 fps in a single forward pass of a neural network. All grasping and placement trials are executed in a physics simulation and the gained experience is transferred to the real world using domain randomization. We demonstrate that our policy successfully transfers to the real world. PQ-Net outperforms other model-free approaches in terms of grasping success rate and automatically scales to new objects of arbitrary symmetry without any human intervention.
|
|
13:00-13:15, Paper WeBT17.6 | |
>Self-Supervised Learning for Precise Pick-And-Place without Object Model |
> Video Attachment
|
|
Berscheid, Lars | Karlsruhe Institute of Technology |
Meißner, Pascal | University of Aberdeen |
Kroeger, Torsten | Karlsruher Institut Für Technologie (KIT) |
Keywords: Deep Learning in Grasping and Manipulation, Grasping
Abstract: Flexible pick-and-place is a fundamental yet challenging task within robotics, in particular due to the need for an object model for a simple target pose definition. In this work, the robot instead learns to pick and place objects using planar manipulation according to a single, demonstrated goal state. Our primary contribution lies in combining robot learning of primitives, commonly estimated by fully-convolutional neural networks, with one-shot imitation learning. To this end, we define the place reward as a contrastive loss between real-world measurements and a task-specific noise distribution. Furthermore, we design our system to learn in a self-supervised manner, enabling real-world experiments with up to 25000 pick-and-place actions. Our robot is then able to place trained objects with an average placement error of 2.7±0.2 mm and 2.6±0.8°. As our approach does not require an object model, the robot is able to generalize to unknown objects while keeping a precision of 5.9±1.1 mm and 4.1±1.2°. We further show a range of emerging behaviors: the robot naturally learns to select the correct object in the presence of multiple object types, precisely inserts objects within a peg game, picks screws out of dense clutter, and infers multiple pick-and-place actions from a single goal state.
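The contrastive-style place reward can be illustrated as scoring how much more a real-world measurement resembles the demonstrated goal state than samples drawn from a task-specific noise distribution. The sketch below is a minimal illustration under that assumption; the score function, distance metric and constants are not taken from the paper.

```python
import numpy as np

def place_reward(measurement, goal, noise_samples, sigma=1.0):
    """Contrastive-style reward: log-probability that the measurement matches the
    demonstrated goal rather than the task-specific noise distribution
    (illustrative form only, not the paper's exact loss)."""
    def score(x):
        return np.exp(-np.linalg.norm(x - goal) ** 2 / (2 * sigma ** 2))
    pos = score(measurement)
    neg = np.sum([score(n) for n in noise_samples])
    return float(np.log(pos / (pos + neg + 1e-12)))

goal = np.array([0.0, 0.0])
print(place_reward(np.array([0.05, -0.02]), goal,
                   noise_samples=np.random.uniform(-0.5, 0.5, size=(32, 2))))
```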
|
|
13:00-13:15, Paper WeBT17.7 | |
>Generating Reactive Approach Motions towards Allowable Manifolds Using Generalized Trajectories from Demonstrations |
> Video Attachment
|
|
Vergara Perico, Cristian Alejandro | KU Leuven |
Iregui, Santiago | KU Leuven |
De Schutter, Joris | KU Leuven |
Aertbelien, Erwin | KU Leuven |
Keywords: Reactive and Sensor-Based Planning, Motion and Path Planning, Learning from Demonstration
Abstract: There is a high cost associated with the time and expertise required to program complex robot applications with high variability. This is one of the main barriers that inhibit the entry of robotic automation in small and medium-sized enterprises. To tackle the high level of task uncertainty associated with changing conditions of the environment, we propose a framework that leverages a combination of learning from demonstration (LfD) and constraint-based task specification and control. This synergy enables our framework to use LfD to generalize reactive approach motions (RAMo) not only towards a single pose but towards an allowable manifold defined with respect to the object to interact with. As a result, the robot executes the task by following a feasible approach motion generalized from the learned information. This approach motion is generated based on an initial representation of the environment, and it can be reactively adapted as a function of current updates of the environment using sensor information. The proposed framework enables the system to deal with applications that involve a high level of uncertainty, increasing flexibility and robustness compared to traditional sense-plan-act paradigms.
|
|
WeBT18 |
Room T18 |
Reinforcement Learning for Manipulation |
Regular session |
Chair: Beetz, Michael | University of Bremen |
Co-Chair: Solowjow, Eugen | Siemens Corporation |
|
11:45-12:00, Paper WeBT18.1 | |
>Simultaneous Planning for Item Picking and Placing by Deep Reinforcement Learning |
|
Tanaka, Tatsuya | Toshiba Corporation |
Kaneko, Toshimitsu | Toshiba Corporation |
Sekine, Masahiro | Toshiba Corporation |
Tangkaratt, Voot | RIKEN |
Sugiyama, Masashi | The University of Tokyo |
Keywords: Reinforcement Learning, Deep Learning in Grasping and Manipulation, Factory Automation
Abstract: Container loading by a picking robot is an important challenge in the logistics industry. When designing such a robotic system, item picking and placing have been planned individually thus far. However, since the condition of picking an item affects the possible candidates for placing, it is preferable to plan picking and placing simultaneously. In this paper, we propose a deep reinforcement learning (DRL) method for simultaneously planning item picking and placing. A technical challenge in the simultaneous planning is its scalability: even for a practical container size, DRL can be computationally intractable due to large action spaces. To overcome the intractability, we adopt a fully convolutional network for policy approximation and determine the action based only on local information. This enables us to produce a shared policy which can be applied to larger action spaces than the one used for training. We experimentally demonstrate that our method can successfully solve the simultaneous planning problem and achieve a higher occupancy rate than conventional methods.
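The scalability argument rests on the policy being fully convolutional: each cell of the container heightmap receives an action value from shared local filters, so the same weights transfer to larger containers than seen in training. A minimal PyTorch sketch of that property is shown below; the architecture and dimensions are my own placeholders, not the paper's network.

```python
import torch
import torch.nn as nn

class FCNPolicy(nn.Module):
    """Fully convolutional value map: one score per placement cell.
    With no fully connected layers, the same weights apply to container
    heightmaps larger than those used for training."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),
        )

    def forward(self, heightmap):
        return self.net(heightmap)               # (B, 1, H, W) action-value map

policy = FCNPolicy()
small = torch.rand(1, 1, 10, 10)                 # training-sized container
large = torch.rand(1, 1, 20, 30)                 # larger container at deployment
best_cell = policy(large).flatten(1).argmax(dim=1)   # greedy placement cell
```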
|
|
12:00-12:15, Paper WeBT18.2 | |
>Distributed Reinforcement Learning of Targeted Grasping with Active Vision for Mobile Manipulators |
> Video Attachment
|
|
Fujita, Yasuhiro | Preferred Networks, Inc |
Uenishi, Kota | Preferred Networks, Inc |
Ummadisingu, Avinash | Preferred Networks, Inc |
Nagarajan, Prabhat | Preferred Networks |
Masuda, Shimpei | Preferred Networks |
Ynocente Castro, Mario | Preferred Networks, Inc |
Keywords: Reinforcement Learning, Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation
Abstract: Developing personal robots that can perform a diverse range of manipulation tasks in unstructured environments necessitates solving several challenges for robotic grasping systems. We take a step towards this broader goal by presenting the first RL-based system, to our knowledge, for a mobile manipulator that can (a) achieve targeted grasping generalizing to unseen target objects, (b) learn complex grasping strategies to succeed in cluttered scenes with occluded objects, and (c) perform active vision through its movable wrist camera to better locate and grasp occluded objects. The system is informed of the target object to grasp in the form of a single, arbitrary-pose RGB image of that object, enabling it to generalize to unseen objects without retraining. To achieve such a system, we combine several advances in deep reinforcement learning and present a large-scale distributed training system using synchronous SGD that seamlessly scales to multi-node, multi-GPU infrastructure to make rapid prototyping easier. We train and evaluate our system in the simulated environment, identify key components for improving performance, analyze its behaviors, and transfer to a real-world setup.
|
|
12:15-12:30, Paper WeBT18.3 | |
>SQUIRL: Robust and Efficient Learning from Video Demonstration of Long-Horizon Robotic Manipulation Tasks |
|
Wu, Bohan | Columbia University |
Xu, Feng | Columbia University |
He, Zhanpeng | University of Southern California |
Gupta, Abhi | Columbia University |
Allen, Peter | Columbia University |
Keywords: Learning from Demonstration, Imitation Learning, Reinforcement Learning
Abstract: Recent advances in deep reinforcement learning (RL) have demonstrated its potential to learn complex robotic manipulation tasks. However, RL still requires the robot to collect a large amount of real-world experience. To address this problem, recent works have proposed learning from expert demonstrations (LfD), particularly via inverse reinforcement learning (IRL), given its ability to achieve robust performance with only a small number of expert demonstrations. Nevertheless, deploying IRL on real robots is still challenging due to the large number of robot experiences it requires. This paper aims to address this scalability challenge with a robust, sample-efficient, and general meta-IRL algorithm, SQUIRL, that performs a new but related long-horizon task robustly given only a single video demonstration. First, this algorithm bootstraps the learning of a task encoder and a task-conditioned policy using behavioral cloning (BC). It then collects real-robot experiences and bypasses reward learning by directly recovering a Q-function from the combined robot and expert trajectories. Next, this algorithm uses the learned Q-function to re-evaluate all cumulative experiences collected by the robot to improve the policy quickly. In the end, the policy performs more robustly (90%+ success) than BC on new tasks while requiring no experiences at test time. Finally, our real-robot and simulated experiments demonstrate our algorithm's generality across different state spaces, action spaces, and vision-based manipulation tasks, e.g., pick-pour-place and pick-carry-drop.
|
|
12:30-12:45, Paper WeBT18.4 | |
>Meta-Reinforcement Learning for Robotic Industrial Insertion Tasks |
> Video Attachment
|
|
Schoettler, Gerrit | Siemens Corporation |
Nair, Ashvin | UC Berkeley |
Aparicio Ojea, Juan | Siemens |
Levine, Sergey | UC Berkeley |
Solowjow, Eugen | Siemens Corporation |
Keywords: Reinforcement Learning, Deep Learning in Grasping and Manipulation, Industrial Robots
Abstract: Robotic insertion tasks are characterized by contact and friction mechanics, making them challenging for conventional feedback control methods due to unmodeled physical effects. Reinforcement learning (RL) is a promising approach for learning control policies in such settings. However, RL can be unsafe during exploration and might require a large amount of real-world training data, which is expensive to collect. In this paper, we study how to use meta-reinforcement learning to solve the bulk of the problem in simulation by solving a family of simulated industrial insertion tasks and then adapt policies quickly in the real world. We demonstrate our approach by training an agent to successfully perform challenging real-world insertion tasks using less than 20 trials of real-world experience.
|
|
12:45-13:00, Paper WeBT18.5 | |
>Learning Motion Parameterizations of Mobile Pick and Place Actions from Observing Humans in Virtual Environments |
> Video Attachment
|
|
Kazhoyan, Gayane | University of Bremen |
Hawkin, Alina | University of Bremen |
Koralewski, Sebastian | University of Bremen |
Haidu, Andrei | University Bremen |
Beetz, Michael | University of Bremen |
Keywords: Mobile Manipulation, Virtual Reality and Interfaces, Learning from Demonstration
Abstract: In this paper, we present an approach and an implemented pipeline for transferring data acquired from observing humans in virtual environments onto robots acting in the real world, and adapting the data accordingly to achieve successful task execution. We demonstrate our pipeline by inferring seven different symbolic and subsymbolic motion parameters of mobile pick and place actions, which allows the robot to set a simple breakfast table. We propose an approach to learn general motion parameter models and discuss which parameters can be learned at which abstraction level.
|
|
13:00-13:15, Paper WeBT18.6 | |
>Learning Variable Impedance Control for Contact Sensitive Tasks |
> Video Attachment
|
|
Bogdanovic, Miroslav | Max Planck Institute for Intelligent Systems |
Khadiv, Majid | Max Planck Institute for Intelligent Systems |
Righetti, Ludovic | New York University |
Keywords: Reinforcement Learning, Compliance and Impedance Control, Motion Control
Abstract: Reinforcement learning algorithms have shown great success in solving different problems ranging from playing video games to robotics. However, they struggle to solve delicate robotic problems, especially those involving contact interactions. Though in principle a policy directly outputting joint torques should be able to learn to perform these tasks, in practice we see that it has difficulty robustly solving the problem without any given structure in the action space. In this paper, we investigate how the choice of action space can give robust performance in the presence of contact uncertainties. We propose learning a policy giving as output impedance and desired position in joint space, and compare the performance of that approach to torque and position control under different contact uncertainties. Furthermore, we propose an additional reward term designed to regularize these variable impedance control policies, giving them interpretability and facilitating their transfer to real systems. We present extensive experiments in simulation of both floating and fixed-base systems in tasks involving contact uncertainties, as well as results for running the learned policies on a real system (accompanying videos can be seen here: https://youtu.be/AQuuQ-h4dBM).
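With an impedance action space, a common way to turn the policy output (desired joint position plus joint-space stiffness) into torques is a PD-like impedance law, sketched below. The gain coupling, constants and regularizer form are assumptions for illustration; the paper's exact parameterization and reward term may differ.

```python
import numpy as np

def impedance_torque(q, dq, q_des, kp, kd_scale=0.1):
    """Map a (q_des, kp) action to joint torques with a simple impedance law.
    Damping is tied to stiffness here for brevity; a policy could output it too."""
    kd = kd_scale * np.sqrt(kp)
    return kp * (q_des - q) - kd * dq

def impedance_regularizer(kp, dq, w_gain=1e-3, w_vel=1e-2):
    """Illustrative penalty discouraging unnecessarily stiff, jittery policies."""
    return -(w_gain * np.sum(kp) + w_vel * np.sum(dq ** 2))
```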
|
|
WeBT19 |
Room T19 |
Haptics I |
Regular session |
Chair: Gorlewicz, Jenna | Saint Louis University |
Co-Chair: Morimoto, Tania | University of California San Diego |
|
11:45-12:00, Paper WeBT19.1 | |
>A Control Scheme for Haptic Inspection and Partial Modification of Kinematic Behaviors |
> Video Attachment
|
|
Papageorgiou, Dimitrios | Aristotle University of Thessaloniki |
Doulgeri, Zoe | Aristotle University of Thessaloniki |
Keywords: Haptics and Haptic Interfaces
Abstract: Over the last decades, Learning from Demonstration (LfD) has become a widely accepted solution for the problem of robot programming. According to LfD, the kinematic behavior is "taught" to the robot based on a set of motion demonstrations performed by the human teacher. The demonstrations can be captured either via kinesthetic teaching or external sensors, e.g., a camera. In this work, a controller for providing haptic cues of the robot's kinematic behavior to the human teacher is proposed. Guidance is provided during kinesthetic coaching procedures for inspection and partial modification of encoded motions. The proposed controller is based on an artificial potential field, designed to adjust the intensity of the haptic communication automatically according to the human's intentions. The control scheme is proved to be passive with respect to the robot's velocity, and its effectiveness is experimentally evaluated on a KUKA LWR4+ robotic manipulator.
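The guidance force can be pictured as the negative gradient of an artificial potential centred on the encoded motion, attenuated by a gain that reflects the estimated human intention. The sketch below is a minimal illustration under that assumption; the field shape, gain and names are not taken from the paper.

```python
import numpy as np

def guidance_force(x, x_ref, intention_gain, k=50.0):
    """Haptic cue as the negative gradient of a quadratic potential centred on the
    encoded motion; `intention_gain` in [0, 1] attenuates the cue when the teacher
    deliberately deviates (illustrative field and gain, not the paper's design)."""
    grad = k * (x - x_ref)            # gradient of 0.5*k*||x - x_ref||^2
    return -intention_gain * grad
```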
|
|
12:00-12:15, Paper WeBT19.2 | |
>Goal-Driven Variable Admittance Control for Robot Manual Guidance |
|
Bazzi, Davide | Politecnico Di Milano |
Lapertosa, Miriam | Politecnico Di Milano |
Zanchettin, Andrea Maria | Politecnico Di Milano |
Rocco, Paolo | Politecnico Di Milano |
Keywords: Physical Human-Robot Interaction, Compliance and Impedance Control, Industrial Robots
Abstract: In this paper we address variable admittance control for human-robot physical interaction in manual guidance applications. In the proposed solution, the parameters of the admittance filter can change not only as a function of the current state of motion (i.e. whether the human guiding the robot is accelerating or decelerating) but also with reference to a predefined goal position. The human is in fact gently guided towards the goal along curved paths, where the damping is conveniently scaled in order to accommodate the motion towards the goal position. The algorithm also allows the human to reach goals that he/she cannot directly see, for example because the transported object is bulky and obstructs the worker's view. The performance of the proposed controller is evaluated by means of point-to-point cooperative motions with multiple volunteers using an ABB IRB140 robot.
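An admittance filter maps the measured human force to a reference velocity; in the variable version, the damping term is modulated online. Below is a minimal discrete-time sketch with a goal-distance-dependent damping law; the modulation law and all constants are assumptions for illustration, not the paper's scheme.

```python
import numpy as np

def admittance_step(v, f_h, x, x_goal, dt=0.002, m=5.0,
                    d_min=10.0, d_max=60.0):
    """One step of M*dv/dt + D(x)*v = f_h with goal-dependent damping.
    Damping grows as the end effector approaches the goal, so the robot is easy
    to move far from the goal and settles gently near it (illustrative law)."""
    dist = np.linalg.norm(x_goal - x)
    damping = d_max - (d_max - d_min) * np.tanh(dist)
    dv = (f_h - damping * v) / m
    return v + dv * dt
```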
|
|
12:15-12:30, Paper WeBT19.3 | |
>Physical Human-Robot Interaction with Real Active Surfaces Using Haptic Rendering on Point Clouds |
> Video Attachment
|
|
Sommerhalder, Michael | ETH Zürich |
Zimmermann, Yves Dominic | ETH Zurich |
Cizmeci, Burak | ETH Zurich |
Riener, Robert | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Physical Human-Robot Interaction, Collision Avoidance, Haptics and Haptic Interfaces
Abstract: During robot-assisted therapy of hemiplegic patients, interaction with the patient must be intrinsically safe. Straightforward collision avoidance solutions can satisfy this safety requirement only with conservative margins. These margins heavily reduce the robot's workspace and make interaction with the patient's unguided body parts impossible. However, interaction with one's own body is highly beneficial from a therapeutic point of view. We tackle this problem by combining haptic rendering techniques with classical computer vision methods. Our proposed solution consists of a pipeline that builds collision objects from point clouds in real time and a controller that renders haptic interaction. The raw sensor data is processed to overcome noise and occlusion problems. Our proposed approach is validated on the 6 DoF exoskeleton ANYexo for direct impacts, sliding scenarios, and dynamic collision surfaces. The results show that this method has the potential to successfully prevent collisions and allow haptic interaction in highly dynamic environments. We believe that this work significantly adds to the usability of current exoskeletons by enabling virtual haptic interaction with the patient's body parts in human-robot therapy.
|
|
12:30-12:45, Paper WeBT19.4 | |
>Human-Drone Interaction for Aerially Manipulated Drilling Using Haptic Feedback |
> Video Attachment
|
|
Kim, Dongbin | University of Nevada, Las Vegas |
Oh, Paul Y. | University of Nevada, Las Vegas (UNLV) |
Keywords: Aerial Systems: Applications, Haptics and Haptic Interfaces, Mobile Manipulation
Abstract: This paper presents a concept for haptic-based human-in-the-loop aerial manipulation for drilling. The concept serves as a case study for designing the human-drone interface to remotely drill with a mobile-manipulating drone. The motivation of the work stems from using drones to perform dangerous tasks, such as material assembly and sensor insertion, while vertically elevated near bridges, wind turbines, and power lines. Presented are the aerial manipulator, the customized haptic drill press, the gantry-based test-and-evaluation platform design, material drilling results in the gantry, and validation-and-verification results for indoor flight trials.
|
|
12:45-13:00, Paper WeBT19.5 | |
>Design and Implementation of a Haptic Measurement Glove to Create Realistic Human-Telerobot Interactions |
|
Capelle, Evan | Saint Louis University |
Benson, William | Saint Louis University |
Anderson, Zach | Southern Illinois University Edwardsville |
Weinberg, Jerry | Southern Illinois University Edwardsville |
Gorlewicz, Jenna | Saint Louis University |
Keywords: Physical Human-Robot Interaction, Haptics and Haptic Interfaces, Telerobotics and Teleoperation
Abstract: Although research indicates that telepresence robots offer a more socially telepresent alternative to conventional forms of remote communication, the lack of touch-based interactions presents challenges for both remote and local users. In order to address these challenges, we have designed and implemented a robotic manipulator emulating a human arm. However, that too presents difficulties, as contact interactions like handshakes may feel awkward and unnatural to local users. In this work, we present the design of a wearable haptic measurement glove (HMG) and use it to collect force and inertial data on handshakes in human-human and human-robot interactions, in the interest of developing intelligent shared control algorithms for natural, human-like contact in human-robot interactions.
|
|
WeBT20 |
Room T20 |
Haptics II |
Regular session |
Chair: Abbott, Jake | University of Utah |
Co-Chair: Haddadin, Sami | Technical University of Munich |
|
11:45-12:00, Paper WeBT20.1 | |
>Feeling the True Force in Haptic Telepresence for Flying Robots |
> Video Attachment
|
|
Moortgat-Pick, Alexander | Technical University of Munich (TUM) |
Adamczyk, Anna | Technical University of Munich (TUM) |
Tomic, Teodor | Skydio |
Haddadin, Sami | Technical University of Munich |
Keywords: Haptics and Haptic Interfaces, Aerial Systems: Mechanics and Control, Telerobotics and Teleoperation
Abstract: Haptic feedback in teleoperation of flying robots can enable safe flight in unknown and densely cluttered environments. It is typically part of the robot's control scheme and used to aid navigation and collision avoidance using artificial force fields displayed to the operator. However, to achieve fully immersive embodiment in this context, high-fidelity force feedback is needed. In this paper we present a telepresence scheme that provides haptic feedback of the external forces or wind speed acting on the robot, leveraging the ability of a state-of-the-art flying robot to estimate these values online. As a result, we achieve true force feedback telepresence in flying robots by rendering the actual forces acting on the system. To the authors' knowledge, this is the first telepresence scheme for flying robots that is able to feed back real contact forces and does not depend on artificial representations of forces. The proposed event-based teleoperation scheme is stable under varying latency conditions. We also present a fundamental haptic interface design such that any haptic interface with at least as many force-sensitive and active degrees of freedom as the flying robot can implement this telepresence architecture. The approach is validated experimentally using a Skydio R1 autonomous flying robot in combination with a ForceDimension sigma.7 and a FrankaEmika Panda as haptic devices.
|
|
12:00-12:15, Paper WeBT20.2 | |
>Hybrid Force-Moment Braking Pulse: A Haptic Illusion to Increase the Perceived Hardness of Virtual Surfaces |
|
Pourkand, Ashkan | University of Utah |
Abbott, Jake | University of Utah |
Keywords: Haptics and Haptic Interfaces, Virtual Reality and Interfaces
Abstract: A perennial challenge when rendering a virtual surface with an impedance-type haptic interface is making the surface feel hard without destroying its realism, since simply increasing its stiffness can lead to instability. One way to increase the perceived hardness without increasing stiffness is to implement a braking pulse or other high-frequency haptic contact event. Traditionally, such events are implemented as a force along the surface normal, which may leave some of the actuators of the haptic device underutilized. We propose a hybrid force-moment braking pulse, which includes a nonrealistic rendered moment to exploit a haptic illusion. We describe how to implement such a hybrid force-moment braking pulse in general, considering the saturation of the haptic device's actuators. In a human-subject study, we find that a virtual surface rendered with these hybrid force-moment braking pulses is perceived as harder than the same virtual surface rendered with a traditional braking pulse, without harming the surface's realism, for the majority of users. The moment-based haptic illusion also has the potential to be superimposed on other types of haptic contact events to improve the perceived hardness.
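A braking pulse is a brief, large wrench rendered at the instant of surface contact; in the hybrid variant, a non-physical moment is added alongside the force and both are limited by actuator saturation. The sketch below is only a schematic illustration of that idea; the gains, clipping limits and moment-axis choice are my own placeholders, not the paper's implementation.

```python
import numpy as np

def hybrid_braking_pulse(v_pen, normal, moment_axis, b_force=20.0, b_moment=0.5,
                         f_max=12.0, m_max=0.4):
    """Brief braking wrench at contact: a damping-like force along the surface
    normal opposing penetration velocity, plus a non-physical rendered moment
    about a chosen axis; both are clipped to the device's actuator limits.
    All constants and the axis choice are placeholders."""
    f = np.clip(b_force * v_pen, 0.0, f_max) * np.asarray(normal)
    m = np.clip(b_moment * v_pen, 0.0, m_max) * np.asarray(moment_axis)
    return f, m

force, moment = hybrid_braking_pulse(0.8, normal=[0.0, 0.0, 1.0],
                                     moment_axis=[1.0, 0.0, 0.0])
```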
|
|
12:15-12:30, Paper WeBT20.3 | |
>End-To-End Tactile Feedback Loop: From Soft Sensor Skin Over Deep GRU-Autoencoders to Tactile Stimulation |
|
Geier, Andreas | Waseda University |
Tucker, Rawleigh C. Y. | Waseda University |
Somlor, Sophon | Waseda University |
Sawada, Hideyuki | Waseda University |
Sugano, Shigeki | Waseda University |
Keywords: Haptics and Haptic Interfaces, Soft Sensors and Actuators, AI-Based Methods
Abstract: Tactile feedback is a key sensory channel that contributes to our ability to perform precise manipulations. In this regard, sensor skin provides robots with the sense of touch making them increasingly capable of dexterous object manipulation. However, in applications like teleoperation, the complex sensory input of an infinite number of different textures must be projected to the human user’s skin in a meaningful manner. In addressing this issue, a deep gated recurrent unit-based autoencoder (GRU-AE) that captured the perceptual dimensions of tactile textures in latent space was deployed to implicitly understand unseen textures. The expression of unknown textures in this latent space allowed for the definition of a control law to effectively drive tactile displays and to convey tactile feedback in a psycho-physically meaningful manner. The approach was experimentally verified by evaluating the prediction performance of the GRU-AE on seen and unseen data that were gathered during active tactile exploration of objects commonly encountered in daily living. A user study on a custom-made tactile display was conducted in which real tactile perceptions in response to active tactile object exploration were compared to the emulated tactile feedback using the proposed tactile feedback loop. The results suggest that the deep GRU-AE for tactile display control offers an efficient and intuitive method for efficient end-to-end tactile feedback during active tactile texture exploration.
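A GRU-based sequence autoencoder of the kind described compresses a tactile time series into a latent vector and reconstructs the sequence from it. The PyTorch sketch below illustrates the structure under assumed dimensions (16 taxels, 8-dimensional latent space); it is not the paper's network.

```python
import torch
import torch.nn as nn

class GRUAutoencoder(nn.Module):
    """Sequence-to-sequence GRU autoencoder: a tactile time series is compressed
    into a latent vector and reconstructed from it (dimensions are placeholders)."""
    def __init__(self, n_taxels=16, latent=8):
        super().__init__()
        self.encoder = nn.GRU(n_taxels, latent, batch_first=True)
        self.decoder = nn.GRU(latent, n_taxels, batch_first=True)

    def forward(self, x):                       # x: (B, T, n_taxels)
        _, h = self.encoder(x)                  # h: (1, B, latent)
        z = h.squeeze(0)                        # latent code, e.g. to drive a display
        z_seq = z.unsqueeze(1).repeat(1, x.size(1), 1)
        recon, _ = self.decoder(z_seq)
        return recon, z

model = GRUAutoencoder()
recon, z = model(torch.rand(4, 100, 16))        # batch of 100-step tactile sequences
```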
|
|
12:30-12:45, Paper WeBT20.4 | |
>Human Navigation Using Phantom Tactile Sensation Based Vibrotactile Feedback |
> Video Attachment
|
|
Liao, Zhenyu | Tohoku University |
Salazar Luces, Jose Victorio | Tohoku University |
Hirata, Yasuhisa | Tohoku University |
Keywords: Human-Centered Automation, Physically Assistive Devices, Wearable Robots
Abstract: In recent years, multiple navigation systems using vibrotactile feedback have been studied, due to their ability to convey information while keeping the visual and auditory channels free, besides eliciting rapid responses from users. At the current stage, most navigation systems with vibrotactile feedback in the literature focus on guiding users around space using a fixed number of vibrotactile cues, which are limited by the number of vibrators. Achieving more precise guidance with a limited number of conveyable directions is difficult, as users cannot be directly guided to the desired position. In this paper, we present an approach to guide people around space using multidirectional vibrotactile feedback (MVF). The MVF can produce vibratory cues on the user's left lower leg with an average directional resolution of 15.35° for cases when the user is not moving, using only six vibration motors, by exploiting a vibrotactile illusion called Phantom Tactile Sensation (PTS). In a preliminary dynamic direction recognition experiment, users reported that they tended to become less sensitive to vibration under long-lasting continuous vibration. As a result, besides offering users a continuous vibration to indicate directions, we also considered producing the cues during either the swing or the stance phase of gait. The results of a direction recognition experiment while walking show that the average recognition error for cues produced in the swing or stance phase is lower than the recognition error when the cues are produced continuously. We carried out a navigation experiment to test the feasibility of using the proposed direction display to guide people around an open area in real time. In this experiment, users were able to reach the goal within the time limit, guided only by the proposed feedback, around 90% of the time for both gait phases.
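Phantom Tactile Sensation is commonly rendered by splitting a target vibration amplitude between the two physical motors adjacent to the desired direction. The sketch below uses the energy-summation weighting (A1 = sqrt(1-b)·A, A2 = sqrt(b)·A) that is standard in the haptics literature; it is offered as background, not as the exact law used in this paper, and the motor layout is an assumption.

```python
import numpy as np

def phantom_amplitudes(target_deg, motor_deg, amplitude=1.0):
    """Split a vibration amplitude between the two motors adjacent to the target
    direction using the energy-summation model A1=sqrt(1-b)*A, A2=sqrt(b)*A."""
    motor_deg = np.asarray(sorted(motor_deg))
    idx = np.searchsorted(motor_deg, target_deg) % len(motor_deg)
    lo, hi = motor_deg[idx - 1], motor_deg[idx]
    span = (hi - lo) % 360 or 360
    b = ((target_deg - lo) % 360) / span
    amps = np.zeros(len(motor_deg))
    amps[idx - 1] = np.sqrt(1 - b) * amplitude
    amps[idx] = np.sqrt(b) * amplitude
    return amps

print(phantom_amplitudes(45.0, motor_deg=[0, 60, 120, 180, 240, 300]))
```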
|
|
WeBT21 |
Room T21 |
Tactile Sensing I |
Regular session |
Chair: Ritter, Helge Joachim | Bielefeld University |
Co-Chair: Yamamoto, Akio | The University of Tokyo |
|
11:45-12:00, Paper WeBT21.1 | |
>Barometer-Based Tactile Skin for Anthropomorphic Robot Hand |
> Video Attachment
|
|
Kõiva, Risto | Bielefeld University |
Schwank, Tobias | Bielefeld University |
Walck, Guillaume | Bielefeld University |
Meier, Martin | Bielefeld University |
Haschke, Robert | Bielefeld University |
Ritter, Helge Joachim | Bielefeld University |
Keywords: Force and Tactile Sensing, In-Hand Manipulation, Object Detection, Segmentation and Categorization
Abstract: We present our second generation tactile sensor for the Shadow Dexterous Hand's palm. We were able to significantly improve the tactile sensor characteristics by utilizing our latest barometer-based tactile sensing technology with linear (R² ≥ 0.9996) sensor output and no noticeable hysteresis. The sensitivity threshold of the tactile cells and the spatial density were both dramatically increased. We demonstrate the benefits of the new sensor by re-running an experiment to estimate the stiffness of different objects that we originally used to test our first generation palm sensor. The results underline a considerable performance boost in estimation accuracy, just due to the improved tactile skin. We also propose a revised neural network architecture that even further improves the average classification accuracy to 96% in a 5-fold cross-validation.
|
|
12:00-12:15, Paper WeBT21.2 | |
>Adaptive Potential Scanning for a Tomographic Tactile Sensor with High Spatio-Temporal Resolution |
> Video Attachment
|
|
Mitsubayashi, Hiroki | The University of Tokyo |
Yoshimoto, Shunsuke | The University of Tokyo |
Yamamoto, Akio | The University of Tokyo |
Keywords: Force and Tactile Sensing, Object Detection, Segmentation and Categorization, Haptics and Haptic Interfaces
Abstract: A tactile sensor with high spatio-temporal resolution will greatly contribute to improving the performance of object recognition and human interaction in robots. In addition, being able to switch between higher spatial and higher temporal resolution will allow for more versatile sensing. To realize such a sensor, this paper introduces a method of increasing the sensing electrodes and adaptively selecting the grounding conditions in a tomography based tactile sensor. Several types of grounding conditions are proposed and evaluated using spatio-temporal metrics. As a result, the grounding method based on the location of contact had a good balance of temporal resolution (1 ms) and spatial resolution (only 1.55 times larger than using all electrodes as grounding conditions). When reconstructing dynamic contact data, the proposed method was able to obtain a much higher detailed waveform compared to the conventional method. By using the proposed method as default and switching to other grounding methods depending on the purpose of sensing, a versatile tactile sensor with high spatio-temporal resolution can be made.
|
|
12:15-12:30, Paper WeBT21.3 | |
>A Biomimetic Tactile Fingerprint Induces Incipient Slip |
|
James, Jasper Wollaston | University of Bristol |
Redmond, Stephen | University of New South Wales |
Lepora, Nathan | University of Bristol |
Keywords: Force and Tactile Sensing, Biomimetics, Soft Sensors and Actuators
Abstract: We present a modified TacTip biomimetic optical tactile sensor design which demonstrates the ability to induce and detect incipient slip, as confirmed by recording the movement of markers on the sensor's external surface. Incipient slip is defined as slippage of part, but not all, of the contact surface between the sensor and object. The addition of ridges - which mimic the friction ridges in the human fingertip - in a concentric ring pattern allowed localised shear deformation to occur on the sensor surface for a significant duration prior to the onset of gross slip. By detecting incipient slip we were able to predict when several differently shaped objects were at risk of falling and prevent them from doing so. Detecting incipient slip is useful because a corrective action can be taken before slippage occurs across the entire contact area, thus minimising the risk of objects being dropped.
|
|
12:30-12:45, Paper WeBT21.4 | |
>Noncontact Estimation of Stiffness Based on Optical Coherence Elastography under Acoustic Radiation Pressure |
|
Hashimoto, Yuki | Keio Univerisity |
Monnai, Yasuaki | Keio University |
Keywords: Force and Tactile Sensing, Medical Robots and Systems
Abstract: In this study, we propose a method of noncontact elastography, which allows us to investigate the stiffness of soft structures by combining optical and acoustic modalities. We use optical coherence tomography (OCT) as a means of detecting the internal deformation of a sample appearing in response to a mechanical force applied by acoustic radiation pressure. Unlike most other stiffness sensing methods, this method can be performed without any contact between the sample and the actuator that generates the pressure. To demonstrate the method, we measure the vibration velocity of a uniform phantom made of polyurethane and characterize its mechanical parameters. We then confirm that the measured and calculated attenuation of the vibration over depth agree well, which is inaccessible with a conventional laser Doppler vibrometer. This result paves the way to characterizing more complex internal structures of soft materials.
|
|
12:45-13:00, Paper WeBT21.5 | |
>Deep Tactile Experience: Estimating Tactile Sensor Output from Depth Sensor Data |
> Video Attachment
|
|
Patel, Karankumar | Honda Research Institute |
Iba, Soshi | Honda Research Institute USA |
Jamali, Nawid | Honda Research Institute USA |
Keywords: Force and Tactile Sensing
Abstract: Tactile sensing is inherently contact based. To use tactile data, robots need to make contact with the surface of an object. This is inefficient in applications where an agent needs to make a decision between multiple alternatives that depend on the physical properties of the contact location. We propose a method to obtain tactile data in a non-invasive manner. The proposed method estimates the output of a tactile sensor from the depth data of the surface of the object based on past experiences. An experience dataset is built by allowing the robot to interact with various objects, collecting tactile data and the corresponding object surface depth data. We use the experience dataset to train a neural network to estimate the tactile output from depth data alone. We use GelSight tactile sensors, an image-based sensor, to generate images that capture detailed surface features at the contact location. We train a network with a dataset containing 578 tactile-image to depth-map correspondences. Given a depth map of the surface of an object, the network outputs an estimate of the response of the tactile sensor, should it make contact with the object. We evaluate the method with the structural similarity index measure (SSIM), a similarity metric between two images commonly used in the image processing community. We present experimental results that show the proposed method outperforms a baseline that uses random images with statistical significance, achieving SSIM scores of 0.84±0.0056 and 0.80±0.0036, respectively.
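The SSIM evaluation mentioned above can be reproduced with scikit-image's implementation; the snippet below is a minimal sketch assuming grayscale tactile images normalized to [0, 1], which is an assumption rather than the paper's exact evaluation pipeline.

```python
import numpy as np
from skimage.metrics import structural_similarity

def tactile_ssim(predicted, ground_truth):
    """SSIM between a predicted and a measured tactile image (grayscale, values
    in [0, 1]); higher means the estimate is closer to the real sensor output."""
    return structural_similarity(predicted, ground_truth, data_range=1.0)

pred = np.random.rand(240, 320)
real = np.clip(pred + 0.05 * np.random.randn(240, 320), 0, 1)
print(tactile_ssim(pred, real))
```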
|
|
13:00-13:15, Paper WeBT21.6 | |
>Learning to Live Life on the Edge: Online Learning for Data-Efficient Tactile Contour Following |
> Video Attachment
|
|
Stone, Elizabeth Anne | Bristol Robotics Laboratory |
Lepora, Nathan | University of Bristol |
Barton, David A. W. | University of Bristol |
Keywords: Force and Tactile Sensing, Reactive and Sensor-Based Planning, Model Learning for Control
Abstract: Tactile sensing has been used for a variety of robotic exploration and manipulation tasks but a common constraint is a requirement for a large amount of training data. This paper addresses the issue of data-efficiency by proposing a novel method for online learning based on a Gaussian Process Latent Variable Model (GP-LVM), whereby the robot learns from tactile data whilst performing a contour following task thus enabling generalisation to a wide variety of tactile stimuli. The results show that contour following is successful with comparatively little data and is robust to novel stimuli. This work highlights that even with a simple learning architecture there are significant advantages to be gained in efficient and robust task performance by using latent variable models and online learning for tactile sensing tasks. This paves the way for a new generation of robust, fast, and data-efficient tactile systems.
|
|
13:00-13:15, Paper WeBT21.7 | |
>Interactive Tactile Perception for Classification of Novel Object Instances |
> Video Attachment
|
|
Corcodel, Radu Ioan | Mitsubishi Electric Research Laboratories |
Jain, Siddarth | Northwestern University, Shirley Ryan AbilityLab, Mitsubishi Ele |
Vanbaar, Jeroen | MERL |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation, Multi-Modal Perception
Abstract: In this paper, we present a novel approach for classification of unseen object instances from interactive tactile feedback. Furthermore, we demonstrate the utility of a low resolution tactile sensor array for tactile perception that can potentially close the gap between vision and physical contact for manipulation. We contrast our sensor to high-resolution camera-based tactile sensors. Our proposed approach interactively learns a one-class classification model using 3D tactile descriptors, and thus demonstrates an advantage over the existing approaches, which require pre-training on objects. We describe how we derive 3D features from the tactile sensor inputs, and exploit them for learning one-class classifiers. In addition, since our proposed method uses unsupervised learning, we do not require ground truth labels. This makes our proposed method flexible and more practical for deployment on robotic systems. We validate our proposed method on a set of household objects and results indicate good classification performance in real-world experiments.
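One-class classification on tactile descriptors, as described, can be illustrated with scikit-learn's OneClassSVM: the model is fit only on touches of a single object instance and then flags new touches as the same instance or novel. The descriptor contents and dimensions below are placeholders, not the paper's actual 3D features.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical 3D tactile descriptors (e.g. pooled contact-geometry statistics)
# collected while interacting with a single object instance.
descriptors = np.random.randn(50, 12)

clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(descriptors)

new_touch = np.random.randn(1, 12)
is_same_instance = clf.predict(new_touch)[0] == 1   # +1 inlier, -1 novel object
```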
|
|
WeBT22 |
Room T22 |
Tactile Sensing II |
Regular session |
Chair: Barton, David A. W. | University of Bristol |
Co-Chair: Luo, Shan | University of Liverpool |
|
11:45-12:00, Paper WeBT22.1 | |
>Walking on TacTip Toes: A Tactile Sensing Foot for Walking Robots |
> Video Attachment
|
|
Stone, Elizabeth Anne | Bristol Robotics Laboratory |
Lepora, Nathan | University of Bristol |
Barton, David A. W. | University of Bristol |
Keywords: Force and Tactile Sensing, Multi-legged Robots, Sensor-based Control
Abstract: Little research into tactile feet has been done for walking robots despite the benefits such feedback could give when walking on uneven terrain. This paper describes the development of a simple, robust and inexpensive tactile foot for legged robots based on a high-resolution biomimetic TacTip tactile sensor. Several design improvements were made to facilitate tactile sensing while walking, including the use of phosphorescent markers to remove the need for internal LED lighting. The usefulness of the foot is verified on a quadrupedal robot performing a beam walking task and it is found the sensor prevents the robot falling off the beam. Further, this capability also enables the robot to walk along the edge of a curved table. This tactile foot design can be easily modified for use with any legged robot, including much larger walking robots, enabling stable walking in challenging terrain.
|
|
12:00-12:15, Paper WeBT22.2 | |
>TactileSGNet: A Spiking Graph Neural Network for Event-Based Tactile Object Recognition |
|
Gu, Fuqiang | National University of Singapore |
Sng, Weicong | National University of Singapore |
Taunyazov, Tasbolat | National University of Singapore |
Soh, Harold | National Universtiy of Singapore |
Keywords: Force and Tactile Sensing, Biologically-Inspired Robots, Novel Deep Learning Methods
Abstract: Tactile perception is crucial for a variety of robot tasks including grasping and in-hand manipulation. New advances in flexible, event-driven, electronic skins may soon endow robots with touch perception capabilities similar to humans. These electronic skins respond asynchronously to changes (e.g., in pressure, temperature), and can be laid out irregularly on the robot's body or end-effector. However, these unique features may render current deep learning approaches such as convolutional feature extractors unsuitable for tactile learning. In this paper, we propose a novel spiking graph neural network for event-based tactile object recognition. To make use of local connectivity of taxels, we present several methods for organizing the tactile data in a graph structure. Based on the constructed graphs, we develop a spiking graph convolutional network. The event-driven nature of spiking neural network makes it arguably more suitable for processing the event-based data. Experimental results on two tactile datasets show that the proposed method outperforms other state-of-the-art spiking methods, achieving high accuracies of approximately 90% when classifying a variety of different household objects.
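One way to organize irregularly laid-out taxels into a graph, as the paper discusses, is to connect each taxel to its spatially nearest neighbours. The sketch below builds a simple symmetric k-NN adjacency matrix from taxel coordinates; the value of k and the coordinate layout are assumptions, and this is only one of several possible constructions.

```python
import numpy as np

def knn_taxel_graph(coords, k=4):
    """Symmetric k-NN adjacency matrix over taxel coordinates; each taxel is
    connected to its k spatially nearest neighbours (one possible construction)."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        neighbours = np.argsort(d[i])[1:k + 1]      # skip the taxel itself
        adj[i, neighbours] = 1
    return np.maximum(adj, adj.T)                   # make the graph undirected

coords = np.random.rand(39, 2)                      # e.g. 39 irregularly placed taxels
A = knn_taxel_graph(coords)
```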
|
|
12:15-12:30, Paper WeBT22.3 | |
>A Miniaturised Neuromorphic Tactile Sensor Integrated with an Anthropomorphic Robot Hand |
|
Ward-Cherrier, Benjamin | University of Bristol |
Conradt, Jorg | KTH Royal Institute of Technology |
Catalano, Manuel Giuseppe | Istituto Italiano Di Tecnologia |
Bianchi, Matteo | University of Pisa |
Lepora, Nathan | University of Bristol |
Keywords: Force and Tactile Sensing, Grippers and Other End-Effectors
Abstract: Restoring tactile sensation is essential to enable in-hand manipulation and the smooth, natural control of upper-limb prosthetic devices. Here we present a platform to contribute to that long-term vision, combining an anthropomorphic robot hand (QB SoftHand) with a neuromorphic optical tactile sensor (neuroTac). Neuromorphic sensors aim to produce efficient, spike-based representations of information for bio-inspired processing. The development of this 5-fingered, sensorized hardware platform is validated with a customized mount allowing manual control of the hand. The platform is demonstrated to successfully identify 4 objects from the YCB object set, and accurately discriminate between 4 directions of shear during stable grasps. This platform could lead to wide-ranging developments in the areas of haptics, prosthetics and telerobotics.
|
|
12:30-12:45, Paper WeBT22.4 | |
>Fast Texture Classification Using Tactile Neural Coding and Spiking Neural Network |
|
Taunyazov, Tasbolat | National University of Singapore |
Chua, Yansong | Huawei Technologies Co |
Gao, Ruihan | Nanyang Technological University |
Soh, Harold | National Universtiy of Singapore |
Wu, Yan | A*STAR Institute for Infocomm Research |
Keywords: Force and Tactile Sensing, Sensor Fusion, Neurorobotics
Abstract: Touch is arguably the most important sensing modality in physical interactions. However, tactile sensing has been largely under-explored in robotics applications, owing to the complexity of making perceptual inferences, until the recent advancements in machine learning and deep learning in particular. Touch perception is strongly influenced by both its temporal dimension, similar to audition, and its spatial dimension, similar to vision. While spatial cues can be learned episodically, temporal cues compete against the system's response/reaction time to provide accurate inferences. In this paper, we propose a fast tactile-based texture classification framework which makes use of a spiking neural network to learn from the neural coding of conventional tactile sensor readings. The framework is implemented and tested on two independent tactile datasets collected in sliding motion on 20 material textures. Our results show that the framework is able to make much more accurate inferences ahead of time compared to state-of-the-art learning approaches.
|
|
12:45-13:00, Paper WeBT22.5 | |
>Spatio-Temporal Attention Model for Tactile Texture Recognition |
|
Cao, Guanqun | University of Liverpool |
Zhou, Yi | University of Liverpool |
Bollegala, Danushka | University of Liverpool |
Luo, Shan | University of Liverpool |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation, Grasping
Abstract: Recently, tactile sensing has attracted great interest in robotics, especially for facilitating exploration of unstructured environments and effective manipulation. A detailed understanding of surface textures via tactile sensing is essential for many of these tasks. Previous works on texture recognition using camera-based tactile sensors have been limited to treating all regions in one tactile image or all samples in one tactile sequence equally, which includes much irrelevant or redundant information. In this paper, we propose a novel Spatio-Temporal Attention Model (STAM) for tactile texture recognition, which, to the best of our knowledge, is the first of its kind. The proposed STAM pays attention to both the spatial focus of each single tactile texture and the temporal correlation of a tactile sequence. In the experiments to discriminate 100 different fabric textures, the spatially and temporally selective attention has resulted in a significant improvement of the recognition accuracy, by up to 18.8%, compared to the non-attention based models. Specifically, after introducing noisy data that is collected before the contact happens, our proposed STAM can learn the salient features efficiently and the accuracy can increase by 15.23% on average compared with the CNN based baseline approach. The improved tactile texture perception can be applied to facilitate robot tasks like grasping and manipulation.
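The general idea of attending first over locations within each tactile frame and then over frames in the sequence can be sketched with simple soft attention, as below. The architecture, pooling choices and dimensions are my own placeholders for illustration and do not reproduce STAM itself.

```python
import torch
import torch.nn as nn

class SpatioTemporalAttention(nn.Module):
    """Toy spatio-temporal attention: spatial attention reweights locations within
    each tactile frame, temporal attention reweights frames in the sequence
    (dimensions and pooling choices are placeholders, not the paper's model)."""
    def __init__(self, channels=32):
        super().__init__()
        self.spatial = nn.Conv2d(channels, 1, kernel_size=1)
        self.temporal = nn.Linear(channels, 1)

    def forward(self, feats):                        # feats: (B, T, C, H, W)
        b, t, c, h, w = feats.shape
        x = feats.view(b * t, c, h, w)
        s = torch.softmax(self.spatial(x).view(b * t, -1), dim=-1).view(b * t, 1, h, w)
        frame = (x * s).sum(dim=(2, 3)).view(b, t, c)     # spatially pooled frames
        a = torch.softmax(self.temporal(frame), dim=1)    # (B, T, 1) frame weights
        return (frame * a).sum(dim=1)                     # (B, C) sequence feature

out = SpatioTemporalAttention()(torch.rand(2, 10, 32, 16, 16))
```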
|
|
13:00-13:15, Paper WeBT22.6 | |
>GelTip: A Finger-Shaped Optical Tactile Sensor for Robotic Manipulation |
> Video Attachment
|
|
Fernandes Gomes, Daniel | University of Liverpool |
Lin, Zhonglin | Fuzhou University |
Luo, Shan | University of Liverpool |
Keywords: Force and Tactile Sensing, Soft Sensors and Actuators, Industrial Robots
Abstract: Sensing contacts throughout the fingers is an essential capability for a robot to perform manipulation tasks in cluttered environments. However, existing tactile sensors either only have a flat sensing surface or a compliant tip with a limited sensing area. In this paper, we propose a novel optical tactile sensor, the GelTip, that is shaped as a finger and can sense contacts on any location of its surface. The sensor captures high-resolution, color-invariant tactile images that can be exploited to extract detailed information about the end-effector's interactions with manipulated objects. Our extensive experiments show that the GelTip sensor can effectively localise contacts at different locations on its finger-shaped body, with a small localisation error of approximately 5 mm on average, and under 1 mm in the best cases. The obtained results show the potential of the GelTip sensor in facilitating dynamic manipulation tasks with its all-round tactile sensing capability. The sensor models and further information about the GelTip sensor can be found at http://danfergo.github.io/geltip.
|
|
WeBT23 |
Room T23 |
Grippers & Other End Effectors |
Regular session |
Chair: Zhao, Jianguo | Colorado State University |
Co-Chair: Dollar, Aaron | Yale University |
|
11:45-12:00, Paper WeBT23.1 | |
>Highly Underactuated Radial Gripper for Automated Planar Grasping and Part Fixturing |
> Video Attachment
|
|
Patel, Vatsal | Yale University |
Morgan, Andrew | Yale University |
Dollar, Aaron | Yale University |
Keywords: Mechanism Design, Grippers and Other End-Effectors, Grasping
Abstract: Grasping can be conceptualized as the ability of an end-effector to temporarily attach or fixture an object to a manipulator-constraining all motion of the workpiece with respect to the end-effector's base frame. This seemingly simplistic action often requires excessive sensing, computation, or control to achieve with multi-fingered hands, which can be mitigated with underactuated mechanisms. In this work, we present the analysis of radial graspers for automated part fixturing and grasping in the plane with a design implementation of a single-actuator, 8-finger gripper. By leveraging a passively adaptable mechanism that is under-constrained pre-contact, the gripper conforms to arbitrary object geometries and locks post-contact as to provide form closure around the object. We also justify that 8 radially symmetric fingers with passive locking are sufficient to create robust form closure grasps on arbitrary planar objects. The underlying mechanism of the gripper is described in detail, with analysis of its highly underactuated nature, and the resulting form closure ability. We show with a wide variety of objects that the gripper is able to acquire robust grasps on all of them, and maintain maximal quality form closure on most objects, with each finger exerting equal grasp force within ±2.48 N.
|
|
12:00-12:15, Paper WeBT23.2 | |
>Soft-Bubble Grippers for Robust and Perceptive Manipulation |
> Video Attachment
|
|
Kuppuswamy, Naveen | Toyota Research Institute |
Alspach, Alex | Toyota Research Institute |
Uttamchandani, Avinash | Toyota Research Institute |
Creasey, Sam | Toyota Research Institute |
Ikeda, Takuya | Kyoto University, Toyota Motor Corporation |
Tedrake, Russ | Massachusetts Institute of Technology |
Keywords: Perception for Grasping and Manipulation, Force and Tactile Sensing, Soft Robot Materials and Design
Abstract: Manipulation in cluttered environments like homes requires stable grasps, precise placement and robustness against external contact. Towards addressing these challenges, we present the soft-bubble gripper system that combines highly compliant gripping surfaces with dense-geometry visuotactile sensing and facilitates multiple kinds of tactile perception. We first present several mechanical design advances on the soft-bubble sensors including a fabrication technique to deposit custom patterns to the internal surface of the sensor membrane that enables tracking of shear-induced displacement of the grasped object. The depth maps output by the internal imaging sensor are used in an in-hand proximity pose estimation framework - the method better captures distances to corners or edges on the object geometry. We also extend our previous work on tactile classification and integrate the system within a robust manipulation pipeline for cluttered home environments. The capabilities of the proposed system are demonstrated through robust execution of multiple real-world manipulation tasks.
|
|
12:15-12:30, Paper WeBT23.3 | |
>Design and Experimentation of a Variable Stiffness Bistable Gripper |
> Video Attachment
|
|
Lerner, Elisha | Colorado State University |
Zhang, Haijie | Colorado State University |
Zhao, Jianguo | Colorado State University |
Keywords: Grippers and Other End-Effectors, Soft Robot Materials and Design, Soft Robot Applications
Abstract: Grasping and manipulating objects is an integral part of many robotic systems. Both soft and rigid grippers have been investigated for manipulating objects in a multitude of different roles. Rigid grippers can hold heavy objects and apply large amounts of force, while soft grippers can conform to the size and shape of objects as well as protect fragile objects from excess stress. However, grippers that possess the qualities of both rigid and soft grippers are under-explored. In this paper, we present a novel gripper with two distinct properties: 1) it can vary its stiffness to become either a soft gripper that can conform its shape to fit complex objects or a rigid gripper that can hold a large weight; 2) when the gripper is soft, it has two stable states (i.e., it is bistable), open and closed, allowing it to be closed without an actuator but through contact force with a target object. The variable stiffness is accomplished by heating a shape memory polymer (SMP) material through its glass transition temperature. The bistability is achieved by shaping the gripper's energy landscape through two elastic elements. This paper details the design and fabrication process of this gripper, and quantifies the influence of temperature variations on it. The capability of the gripper is experimentally verified by grasping different objects with various shapes and weights. We expect such a gripper to be suitable for many applications that traditionally require either a rigid or a soft gripper.
|
|
12:30-12:45, Paper WeBT23.4 | |
>A Tendon-Driven Robot Gripper with Passively Switchable Underactuated Surface and Its Physics Simulation Based Parameter Optimization |
> Video Attachment
|
|
Ko, Tianyi | Preferred Networks, Inc |
Keywords: Grippers and Other End-Effectors, Mechanism Design, Grasping
Abstract: We propose a single-actuator gripper that can lift thin objects lying on a flat surface, in addition to the ability as a standard parallel gripper. The key is a crawler on the fingertip, which is underactuated together with other finger joints and switched with a passive and spring-loaded mechanism. The gripper can passively change the mode from the parallel approach mode to the pull-in mode, then finally to the power grasp mode, according to the grasping state. To optimize the highly underactuated system, we take a combination of black-box optimization and physics simulation of the whole grasp process. We show that this simulation-based approach can effectively consider the pre-contact motion, in-hand manipulation, power grasp stability, and even failure mode, which is difficult for the static equilibrium analysis based approaches. In the last part of the paper, we demonstrate that a prototype gripper with the proposed structure and design parameters optimized under the proposed process could successfully power grasp a thin sheet, a softcover book, and a cylinder lying on a flat surface.
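Optimizing design parameters against a full physics-simulation rollout is a black-box optimization problem. The sketch below shows a simple (mu, lambda) evolution strategy around a placeholder `simulate_grasp` objective; it is only an illustration of the general workflow, and the paper's actual optimizer, simulator and parameterization differ.

```python
import numpy as np

def optimize_gripper(simulate_grasp, theta0, iters=50, pop=16, elite=4, sigma=0.05):
    """Simple (mu, lambda) evolution strategy over design parameters.
    `simulate_grasp(theta)` stands in for a full physics-simulation rollout that
    returns a scalar grasp-success score for parameter vector `theta`."""
    theta = np.array(theta0, dtype=float)
    for _ in range(iters):
        candidates = theta + sigma * np.random.randn(pop, theta.size)
        scores = np.array([simulate_grasp(c) for c in candidates])
        best = candidates[np.argsort(scores)[-elite:]]
        theta = best.mean(axis=0)                 # recombine the elite candidates
    return theta

# Toy stand-in objective for demonstration only.
best = optimize_gripper(lambda t: -np.sum((t - 0.3) ** 2), theta0=np.zeros(5))
```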
|
|
12:45-13:00, Paper WeBT23.5 | |
>BLT Gripper: An Adaptive Gripper with Active Transition Capability between Precise Pinch and Compliant Grasp |
> Video Attachment
|
|
Kim, Yong-Jae | Korea University of Technology and Education |
Song, Hansol | Koreatech |
Maeng, Chan-Young | Korea University of Technology and Education (KOREATECH) |
Keywords: Mechanism Design, Grippers and Other End-Effectors, In-Hand Manipulation
Abstract: Achieving both the precise pinching and compliant grasping capabilities is a challenging goal for most robotic hands and grippers. Moreover, an active transition between the pinching pose and the grasping pose is difficult for limited degrees-of-freedom (DOF) hands or grippers. Even when using high DOF robotic hands, it requires a substantial amount of control effort and information from tactile sensors. In this paper, a 3-finger, 5-DOF adaptive gripper with active transition capability is presented. Each finger is composed of a minimum number of components using one rigid link, one belt, one fingertip frame and one motor for flexion motion. This simple and unique finger structure enables precise parallel pinching and highly compliant stable grasping with evenly distributed pressure. The other two motors are used for fingertip angle adjustment and change of the finger orientation respectively. Thorough kinematic and force analysis with detailed descriptions of the mechanical design clearly shows controllable transition property and stable grasping performance. The experiments including the grasping force and pressure measurement verify the performance of the proposed gripper and prove the practical usefulness for real-world applications.
|
|
13:00-13:15, Paper WeBT23.6 | |
>Friction Identification in a Pneumatic Gripper |
|
Romeo, Rocco Antonio | Istituto Italiano Di Tecnologia |
Maggiali, Marco | Italian Institute of Technology |
Pucci, Daniele | Italian Institute of Technology |
Fiorio, Luca | Istituto Italiano Di Tecnologia |
Keywords: Grippers and Other End-Effectors, Hydraulic/Pneumatic Actuators, Calibration and Identification
Abstract: Mechanical systems are typically composed of a number of contacting surfaces that move against each other. Such surfaces are subject to friction forces. These dissipate part of the actuation energy and cause an undesired effect on the overall system functioning. Therefore, a suitable model of friction is needed to elide its action. The choice of such a model is not always straightforward, as it is influenced by the system properties and dynamics. In this paper, we show the identification of different friction models and evaluate their prediction capability on an experimental dataset. Despite being state-of-the-art models, some modifications were introduced to improve their performance. A pneumatic gripper was used to collect the data for the models' evaluation. Two experimental setups were mounted to execute the experiments: information from two pressure sensors, a load cell and a position sensor was employed for the identification. During the experiments, the gripper was actuated at different constant velocities. Results indicate that all the identified models offer a proper prediction of the real friction force.
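With constant-velocity trials, the simplest friction models can be fit by linear least squares. The sketch below fits a Coulomb-plus-viscous model F = Fc·sign(v) + b·v from velocity/force pairs; it is a minimal example of the identification workflow, not one of the specific models evaluated in the paper.

```python
import numpy as np

def fit_coulomb_viscous(velocities, friction_forces):
    """Least-squares fit of F = Fc*sign(v) + b*v from constant-velocity trials
    (one of the simplest friction models; the paper compares several)."""
    v = np.asarray(velocities)
    F = np.asarray(friction_forces)
    A = np.column_stack([np.sign(v), v])
    (Fc, b), *_ = np.linalg.lstsq(A, F, rcond=None)
    return Fc, b

v = np.array([0.01, 0.02, 0.05, 0.1, -0.01, -0.05])
F = 1.2 * np.sign(v) + 8.0 * v + 0.01 * np.random.randn(v.size)
print(fit_coulomb_viscous(v, F))
```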
|
|
13:00-13:15, Paper WeBT23.7 | |
>Vision and Force Based Autonomous Robotic Coating with Rollers |
> Video Attachment
|
|
Du, Yayun | University of California, Los Angeles |
Deng, Zhaoxing | University of California, Los Angeles |
Fang, Zicheng | University of California, Los Angeles |
Wang, Yunbo | University of California, Los Angeles |
Nagata, Taiki | University of California, Los Angeles |
Bansal, Karan | North Dakota State University |
Quadir, Mohiuddin | North Dakota State University |
Khalid Jawed, Mohammad | University of California, Los Angeles |
Keywords: Grippers and Other End-Effectors, Domestic Robots, Mechanism Design
Abstract: Coating rollers are widely popular in structural painting, compared with brushes and sprayers, due to the thicker paint layer, better color consistency, and effortless customizability of the holder frame and naps. In this paper, we introduce a cost-effective method to employ a general-purpose robot (Sawyer, Rethink Robotics) for autonomous coating. To sense the position and shape of the target object to be coated, the robot is combined with an RGB-Depth camera. The combined system autonomously recognizes the number of faces of the object as well as their position and surface normal. Unlike related work based on two-dimensional RGB image processing, all the analyses and algorithms here employ three-dimensional point cloud data (PCD). The object model learned from the PCD is then autonomously analyzed to achieve optimal motion planning and to avoid collisions between the robot arm and the object. To achieve human-level coating quality with the bare minimum of ingredients, a combination of our own passive impedance control and the built-in active impedance control is implemented. The former is realized by installing an ultrasonic sensor at the end-effector of the robot, working with a customized compliant mass-spring-damper roller to keep a precise distance between the end-effector and the surface to be coated while maintaining a fixed force. Altogether, the control approach mimics human painting, as evidenced by experimental measurements of the coating thickness. Coating of two different polyhedral objects is also demonstrated to test the overall method.
|
|
WeCT1 |
Room T1 |
Calibration and Identification II |
Regular session |
Chair: Vidal-Calleja, Teresa A. | University of Technology Sydney |
Co-Chair: Lin, Wen-Chieh | National Chiao Tung University |
|
14:00-14:15, Paper WeCT1.1 | |
>Information Driven Self-Calibration for Lidar-Inertial Systems |
|
Usayiwevu, Mitchell | University of Technology Sydney |
Le Gentil, Cedric | University of Technology Sydney |
Mehami, Jasprabhjit | The University of Technology Sydney |
Yoo, Chanyeol | University of Technology Sydney |
Fitch, Robert | University of Technology Sydney |
Vidal-Calleja, Teresa A. | University of Technology Sydney |
Keywords: Calibration and Identification, Sensor Fusion
Abstract: Multi-modal estimation systems have the advantage of increased accuracy and robustness. To achieve accurate sensor fusion with these types of systems, a reliable extrinsic calibration between each sensor pair is critical. This paper presents a novel self-calibration framework for lidar-inertial systems. The key idea of this work is to use an informative path planner to find the admissible path that produces the most accurate calibration of such systems in an unknown environment within a given time budget. This is embedded into a simultaneous localization, mapping and calibration lidar-inertial system, which involves challenges in dealing with agile motions for excitation and a large amount of data. Our approach has two stages: first, the environment is explored and mapped following a pre-defined path; second, the map is exploited to find a continuous and differentiable path that maximises the information gain within a sampling-based planner. We evaluate the proposed self-calibration method in a simulated environment and benchmark it against standard predefined paths to show its performance.
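As a rough illustration of the information-driven criterion, candidate paths can be scored by the expected reduction in uncertainty of the calibration states, e.g. the log-determinant of their covariance. The Python sketch below is a hypothetical simplification; predict_posterior stands in for whatever predicts the post-execution covariance of a candidate path and is not part of the paper:

    import numpy as np

    def info_gain(cov_prior, cov_posterior):
        # 0.5 * (log det(P_prior) - log det(P_posterior)), in nats
        _, logdet_prior = np.linalg.slogdet(cov_prior)
        _, logdet_post = np.linalg.slogdet(cov_posterior)
        return 0.5 * (logdet_prior - logdet_post)

    def best_path(candidate_paths, cov_prior, predict_posterior):
        # pick the admissible path with the largest expected information gain
        return max(candidate_paths,
                   key=lambda path: info_gain(cov_prior, predict_posterior(path)))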
|
|
14:15-14:30, Paper WeCT1.2 | |
>Targetless Calibration of LiDAR-IMU System Based on Continuous-Time Batch Estimation |
|
Lv, Jiajun | Zhejiang University |
Xu, Jinhong | University |
Hu, Kewei | Zhejiang University |
Liu, Yong | Zhejiang University |
Zuo, Xingxing | Zhejiang University |
Keywords: Calibration and Identification, Sensor Fusion
Abstract: Sensor calibration is the fundamental building block of a multi-sensor fusion system. This paper presents an accurate and repeatable LiDAR-IMU calibration method (termed LI-Calib) to calibrate the 6-DOF extrinsic transformation between the 3D LiDAR and the Inertial Measurement Unit (IMU). Given the high data capture rates of LiDAR and IMU sensors, LI-Calib adopts a continuous-time trajectory formulation based on B-splines, which is more suitable for fusing high-rate or asynchronous measurements than discrete-time approaches. Additionally, LI-Calib decomposes the space into cells and identifies the planar segments for data association, which renders the calibration problem well-constrained in usual scenarios without any artificial targets. We validate the proposed calibration approach in both simulated and real-world experiments. The results demonstrate the high accuracy and good repeatability of the proposed method in common human-made scenarios. To benefit the research community, we open-source our code at https://github.com/APRIL-ZJU/lidar_IMU_calib.
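The continuous-time formulation can be pictured as a spline whose control points are optimized and which can be queried at any IMU or LiDAR timestamp. Below is a minimal, hypothetical Python sketch for the position part only, assuming a uniform cubic B-spline; the method also parameterizes orientation, which requires splines on SO(3):

    import numpy as np
    from scipy.interpolate import BSpline

    degree, knot_dt = 3, 0.1
    ctrl_pts = np.random.randn(20, 3)                         # control points to be optimized
    knots = np.arange(-degree, len(ctrl_pts) + 1) * knot_dt   # uniform knot vector
    traj = BSpline(knots, ctrl_pts, degree)

    t = 0.42                            # any asynchronous measurement timestamp
    position = traj(t)                  # continuous-time position query
    accel = traj.derivative(2)(t)       # compared against IMU readings during optimization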
|
|
14:30-14:45, Paper WeCT1.3 | |
>Extrinsic and Temporal Calibration of Automotive Radar and 3D LiDAR |
> Video Attachment
|
|
Lee, Chia-Le | National Chiao Tung University |
Hsueh, Yu-Han | National Chiao Tung University |
Wang, Chieh-Chih | National Chiao Tung University |
Lin, Wen-Chieh | National Chiao Tung University |
Keywords: Calibration and Identification, Sensor Fusion
Abstract: While automotive radars are widely used in most assisted and autonomous driving systems, only a few works have been proposed to tackle the calibration of automotive radars with other perception sensors. One of the key challenges in calibrating automotive planar radars with other sensors is the missing elevation angle in 3D space. In this paper, extrinsic calibration is accomplished based on the observation that radar cross section (RCS) measurements have different value distributions across the radar's vertical field of view. An approach to accurately and efficiently estimate the time delay between radars and LiDARs based on the spatial-temporal relationships of calibration target positions is proposed. In addition, a method for detecting and localizing calibration targets in pre-built maps is proposed to cope with insufficient LiDAR measurements on the calibration targets. The experimental results show the feasibility and effectiveness of the proposed Radar-LiDAR extrinsic and temporal calibration approaches.
|
|
14:45-15:00, Paper WeCT1.4 | |
>Spatiotemporal Calibration of Camera and 3D Laser Scanner |
> Video Attachment
|
|
Nowicki, Michal | Poznan University of Technology |
Keywords: Sensor Fusion, Field Robots, Range Sensing
Abstract: Multi-sensor setups consisting of laser scanners and cameras are popular, as the measurements complement each other and provide the robustness needed in applications. Under dynamic conditions or when in motion, the direct transformation between sensors (spatial calibration) and their time offset (temporal calibration) are needed to determine the correspondence between measurements. We propose an open-source spatiotemporal calibration framework for a camera and a 3D laser scanner. Our solution is based on commonly available chessboard markers, requires a one-minute calibration before operation, and offers accurate and repeatable results. The framework is based on batch optimization of point-to-plane constraints, with time offset calibration made possible by a novel minimal, continuous representation of the plane equations in the Lie algebra and the use of B-splines. The framework's properties are evaluated in simulation, while performance is verified with two distinct sensory setups with Velodyne VLP-16 and SICK MRS6124 3D laser scanners.
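For intuition, each constraint ties a laser point to the chessboard plane observed by the camera. A single residual of this kind might look like the hypothetical Python sketch below; the actual framework stacks many such residuals in a batch optimization and represents the plane continuously over time to expose the time offset:

    import numpy as np

    def point_to_plane_residual(p_laser, T_cam_laser, plane_n, plane_d):
        # p_laser: 3D point in the laser frame
        # T_cam_laser: 4x4 extrinsic transform from laser to camera frame
        # (plane_n, plane_d): chessboard plane in the camera frame, n.x + d = 0
        p_cam = T_cam_laser[:3, :3] @ p_laser + T_cam_laser[:3, 3]
        return float(plane_n @ p_cam + plane_d)   # signed point-to-plane distance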
|
|
WeCT2 |
Room T2 |
Crowd Modeling |
Regular session |
Chair: Engel, Jakob | Facebook |
Co-Chair: Savva, Manolis | Simon Fraser University |
|
14:00-14:15, Paper WeCT2.1 | |
>Robust Pedestrian Tracking in Crowd Scenarios Using an Adaptive GMM-Based Framework |
> Video Attachment
|
|
Zhang, Shuyang | Shenzhen Unity Drive Innovation Technology Co. Ltd., |
Wang, Di | Xi'an Jiaotong University |
Ma, Fulong | The Hong Kong University of Science and Technology |
Qin, Chao | Shenzhen Yiqing Innovation Co., Ltd |
Chen, Zhengyong | The Hong Kong University of Science and Technology |
Liu, Ming | Hong Kong University of Science and Technology |
Keywords: Probability and Statistical Methods, Logistics
Abstract: In this paper, we address the issue of pedestrian tracking in crowd scenarios. People in close social relationships tend to act as a group, which makes it very challenging to individually discriminate and track pedestrians with a LiDAR system. We integrally model groups of people and track them in a recursive framework based on a Gaussian Mixture Model (GMM). The model is optimized by an extended Expectation-Maximization (EM) algorithm which can adaptively vary the number of mixture components over scans. Experimental results, both qualitative and quantitative, indicate the reliability and accuracy of our tracker in populated scenarios.
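As a rough picture of the underlying machinery, a group of LiDAR returns can be modeled by a Gaussian mixture whose component count is chosen per scan. The sketch below is a hypothetical simplification using scikit-learn and BIC model selection; the paper's extended EM adapts the number of components recursively across scans rather than by exhaustive search:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fit_group(points_xy, max_components=5):
        # points_xy: (N, 2) projected LiDAR points belonging to one group
        best, best_bic = None, np.inf
        for k in range(1, max_components + 1):
            gmm = GaussianMixture(n_components=k, covariance_type="full").fit(points_xy)
            if gmm.bic(points_xy) < best_bic:
                best, best_bic = gmm, gmm.bic(points_xy)
        return best   # component means act as per-pedestrian position estimates

    scan = np.random.randn(120, 2) * 0.2 + np.array([3.0, 1.0])
    print(fit_group(scan).means_)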
|
|
14:15-14:30, Paper WeCT2.2 | |
>Towards Understanding and Inferring the Crowd: Guided Second Order Attention Networks and Re-Identification for Multi-Object Tracking |
|
Bhujel, Niraj | Nanyang Technological University |
Li, Jun | Institute for Infocomm Research |
Yau, Wei-Yun | I2R |
Wang, Han | Nanyang Technological University |
Keywords: Deep Learning for Visual Perception, Visual Tracking, Autonomous Vehicle Navigation
Abstract: Multi-human tracking in crowded environments is a challenging problem due to occlusions, pose changes, viewpoint variations and cluttered backgrounds. In this work, we propose a robust local feature learning method based on a second-order attention network that can capture higher-order relationships between salient features at the early stages of a Convolutional Neural Network (CNN). Unlike existing attention learning methods, which are weakly supervised, the Guided Second-Order Attention Network (GSAN) uses a supervisory signal based on the quality of the learned attention maps. More specifically, GSAN looks into the attention maps of the person instance with the highest confidence and supervises itself to attend to the correct regions in different images of the same person. Attention maps learned this way are spatially aligned and thus robust to camera-view changes and body pose variations. We verify the effectiveness of our approach by comparing with state-of-the-art methods on challenging person re-identification and multi-object tracking (MOT) datasets.
|
|
14:30-14:45, Paper WeCT2.3 | |
>Relational Graph Learning for Crowd Navigation |
> Video Attachment
|
|
Chen, Changan | UT Austin |
Hu, Sha | Simon Fraser University |
Nikdel, Payam | Simon Fraser University |
Mori, Greg | Simon Fraser University |
Savva, Manolis | Simon Fraser University |
Keywords: Novel Deep Learning Methods, Reinforcement Learning, Representation Learning
Abstract: We present a relational graph learning approach for robotic crowd navigation using model-based deep reinforcement learning that plans actions by looking into the future. Our approach reasons about the relations between all agents based on their latent features and uses a Graph Convolutional Network to encode higher-order interactions in each agent's state representation, which is subsequently leveraged for state prediction and value estimation. The ability to predict human motion allows us to perform multi-step lookahead planning, taking into account the temporal evolution of human crowds. We evaluate our approach against a state-of-the-art baseline for crowd navigation and ablations of our model to demonstrate that navigation with our approach is more efficient, results in fewer collisions, and avoids failure cases involving oscillatory and freezing behaviors.
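To make the graph-encoding step concrete, the sketch below shows a hypothetical single graph-convolution layer over agent states in PyTorch; the relation (adjacency) matrix is assumed given here, whereas the paper infers it from latent pairwise features:

    import torch
    import torch.nn as nn

    class GCNLayer(nn.Module):
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.lin = nn.Linear(in_dim, out_dim)

        def forward(self, X, A):
            # X: (N, in_dim) agent states, A: (N, N) relation matrix
            A_hat = A + torch.eye(A.size(0))             # add self-loops
            D_inv = torch.diag(1.0 / A_hat.sum(dim=1))   # row-normalize by degree
            return torch.relu(self.lin(D_inv @ A_hat @ X))

    states = torch.randn(6, 16)      # robot + 5 humans
    relations = torch.rand(6, 6)
    encoded = GCNLayer(16, 32)(states, relations)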
|
|
14:45-15:00, Paper WeCT2.4 | |
>Domain Adaptation for Outdoor Robot Traversability Estimation from RGB Data with Safety-Preserving Loss |
|
Palazzo, Simone | University of Catania |
Guastella, Dario Calogero | Università Degli Studi Di Catania |
Cantelli, Luciano | Università Degli Studi Di Catania |
Spadaro, Paolo | University of Catania |
Rundo, Francesco | ST Microelectronics |
Muscato, Giovanni | Universita' Di Catania |
Giordano, Daniela | University of Catania |
Spampinato, Concetto | University of Catania |
Keywords: Deep Learning for Visual Perception, AI-Based Methods, Collision Avoidance
Abstract: Being able to estimate the traversability of the area surrounding a mobile robot is a fundamental task in the design of a navigation algorithm. However, the task is often complex, since it requires evaluating distances from obstacles, type and slope of terrain, and dealing with non-obvious discontinuities in detected distances due to perspective. In this paper, we present an approach based on deep learning to estimate and anticipate the traversing score of different routes in the field of view of an on-board RGB camera. The backbone of the proposed model is based on a state-of-the-art deep segmentation model, which is fine-tuned on the task of predicting route traversability. We then enhance the model’s capabilities by a) addressing domain shifts through gradient-reversal unsupervised adaptation, and b) accounting for the specific safety requirements of a mobile robot, by encouraging the model to err on the safe side, i.e., penalizing errors that would cause collisions with obstacles more than those that would cause the robot to stop in advance. Experimental results show that our approach is able to satisfactorily identify traversable areas and to generalize to unseen locations.
|
|
15:00-15:15, Paper WeCT2.5 | |
>SideGuide: A Large-Scale Sidewalk Dataset for Guiding Impaired People |
> Video Attachment
|
|
Park, Kibaek | KAIST |
Oh, Youngtaek | KAIST |
Ham, Soomin | KAIST |
Joo, Kyungdon | Carnegie Mellon University (CMU) |
Kim, Hyokyoung | Testworks |
Kum, HyoYoung | Testworks |
Kweon, In So | KAIST |
Keywords: Big Data in Robotics and Automation, Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: In this paper, we introduce a new large-scale sidewalk dataset called SideGuide that could potentially help impaired people. Unlike most previous datasets, which are focused on road environments, we paid attention to sidewalks, where understanding the environment could provide the potential for improved walking of humans, especially impaired people. Concretely, we interviewed impaired people and carefully selected target objects from the interviewees' feedback (objects they encounter on sidewalks). We then acquired two different types of data: crowd-sourced data and stereo data. We labeled target objects at instance-level (i.e., bounding box and polygon mask) and generated a ground-truth disparity map for the stereo data. SideGuide consists of 350K images with bounding box annotation, 100K images with a polygon mask, and 180K stereo pairs with the ground-truth disparity. We analyzed our dataset by performing baseline analysis for object detection, instance segmentation, and stereo matching tasks. In addition, we developed a prototype that recognizes the target objects and measures distances, which could potentially assist people with disabilities. The prototype suggests the possibility of practical application of our dataset in real life.
|
|
15:15-15:30, Paper WeCT2.6 | |
>TLIO: Tight Learned Inertial Odometry |
|
Liu, Wenxin | University of Pennsylvania |
Caruso, David | Facebook Reality Lab |
Ilg, Eddy | University of Freiburg |
Dong, Jing | Georgia Institute of Technology |
Mourikis, Anastasios | University of California, Riverside |
Daniilidis, Kostas | University of Pennsylvania |
Kumar, Vijay | University of Pennsylvania |
Engel, Jakob | Facebook |
Keywords: AI-Based Methods, Localization, Human-Centered Automation
Abstract: In this work we propose a tightly-coupled Extended Kalman Filter framework for IMU-only state estimation. Strap-down IMU measurements provide relative state estimates based on the IMU kinematic motion model. However, the integration of measurements is sensitive to sensor bias and noise, causing significant drift within seconds. Recent research by Yan et al. (RoNIN) and Chen et al. (IONet) showed that trained neural networks can obtain accurate 2D displacement estimates from segments of IMU data and produce good position estimates by concatenating them. This paper demonstrates a network that regresses 3D displacement estimates and their uncertainty, giving us the ability to tightly fuse the relative state measurement into a stochastic cloning EKF to solve for pose, velocity and sensor biases. We show that our network, trained with pedestrian data from a headset, produces statistically consistent measurements and uncertainties that can be used as the update step in the filter, and that the tightly-coupled system outperforms velocity integration approaches in position estimation and an AHRS attitude filter in orientation estimation.
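The fusion step can be pictured as a standard EKF measurement update in which the network supplies both the 3D displacement measurement and its covariance. The Python sketch below is a hypothetical, heavily reduced illustration with a position-only state; the actual filter also carries velocity, orientation and bias states and uses stochastic cloning:

    import numpy as np

    def ekf_update(x, P, d_hat, R, H):
        # d_hat: network-predicted 3D displacement, R: its predicted covariance
        y = d_hat - H @ x                        # innovation
        S = H @ P @ H.T + R                      # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
        x_new = x + K @ y
        P_new = (np.eye(len(x)) - K @ H) @ P
        return x_new, P_new

    # toy state [p_prev(3), p_curr(3)]; the measurement is p_curr - p_prev
    H = np.hstack([-np.eye(3), np.eye(3)])
    x, P = np.zeros(6), np.eye(6) * 0.1
    x, P = ekf_update(x, P, np.array([0.3, 0.0, 0.01]), np.eye(3) * 0.01, H)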
|
|
WeCT3 |
Room T3 |
Depth Estimation |
Regular session |
Chair: Huang, Rui | The Chinese University of Hong Kong, Shenzhen |
|
14:00-14:15, Paper WeCT3.1 | |
>Deep Depth Estimation from Visual-Inertial SLAM |
> Video Attachment
|
|
Sartipi, Kourosh | University of Minnesota |
Do, Tien | University of Minnesota |
Ke, Tong | University of Minnesota |
Vuong, Khiem | University of Minnesota |
Roumeliotis, Stergios | University of Minnesota |
Keywords: Deep Learning for Visual Perception, Visual Learning, SLAM
Abstract: This paper addresses the problem of learning to complete a scene's depth from sparse depth points and images of indoor scenes. Specifically, we study the case in which the sparse depth is computed from a visual-inertial simultaneous localization and mapping (VI-SLAM) system. The resulting point cloud has low density, is noisy, and has a non-uniform spatial distribution, as compared to the input from active depth sensors, e.g., LiDAR or Kinect. Since VI-SLAM produces point clouds only over textured areas, we compensate for the missing depth of the low-texture surfaces by leveraging their planar structures and their surface normals, which are an important intermediate representation. The pre-trained surface normal network, however, suffers from large performance degradation when there is a significant difference in the viewing direction (especially the roll angle) of the test image as compared to the training images. To address this limitation, we use the available gravity estimate from the VI-SLAM to warp the input image to the orientation prevailing in the training dataset. This results in a significant performance gain for the surface normal estimates, and thus the dense depth estimates. Finally, we show that our method outperforms other state-of-the-art approaches on both training (ScanNet and NYUv2) and testing (collected with Azure Kinect) datasets.
|
|
14:15-14:30, Paper WeCT3.2 | |
>Learning Depth with Very Sparse Supervision |
|
Loquercio, Antonio | UZH, University of Zurich |
Dosovitskiy, Alexey | Google |
Scaramuzza, Davide | University of Zurich |
Keywords: Deep Learning for Visual Perception, AI-Based Methods, Autonomous Agents
Abstract: Motivated by the astonishing capabilities of natural intelligent agents and inspired by theories from psychology, this paper explores the idea that perception gets coupled to 3D properties of the world via interaction with the environment. Existing works for depth estimation require either massive amounts of annotated training data or some form of hard-coded geometrical constraint. This paper explores a new approach to learning depth perception requiring neither of those. Specifically, we train a specialized global-local network architecture with what would be available to a robot interacting with the environment: from extremely sparse depth measurements down to even a single pixel per image. From a pair of consecutive images, our proposed network outputs a latent representation of the observer's motion between the images and a dense depth map. Experiments on several datasets show that, when ground truth is available even for just one of the image pixels, the proposed network can learn monocular dense depth estimation up to 22.5% more accurately than state-of-the-art approaches. We believe that this work, despite its scientific interest, lays the foundations to learn depth from extremely sparse supervision, which can be valuable to all robotic systems acting under severe bandwidth or sensing constraints.
|
|
14:30-14:45, Paper WeCT3.3 | |
>Self-Supervised Attention Learning for Depth and Ego-Motion Estimation |
> Video Attachment
|
|
Sadek, Assem | Naver Labs Europe |
Chidlovskii, Boris | Naver Labs Europe |
Keywords: Novel Deep Learning Methods, SLAM, Deep Learning for Visual Perception
Abstract: We address the problem of depth and ego-motion estimation from image sequences. Recent advances in the domain propose to train a deep learning model for both tasks using image reconstruction in a self-supervised manner. We revise the assumptions and limitations of the current approaches and propose two improvements to boost the performance of depth and ego-motion estimation. We first use Lie group properties to enforce the geometric consistency between images in the sequence and their reconstructions. We then propose a mechanism to pay attention to image regions where the image reconstruction gets corrupted. We show how to integrate the attention mechanism in the pipeline in the form of attention gates and use the attention coefficients as a mask. We evaluate the new architecture on the KITTI dataset and compare it to previous techniques. We show that our approach improves the state-of-the-art results for ego-motion estimation and achieves comparable results for depth estimation.
|
|
14:45-15:00, Paper WeCT3.4 | |
>DiPE: Deeper into Photometric Errors for Unsupervised Learning of Depth and Ego-Motion from Monocular Videos |
> Video Attachment
|
|
Jiang, Hualie | The Chinese University of Hong Kong, Shenzhen |
Ding, Laiyan | The Chinese University of Hong Kong, Shenzhen |
Sun, Zhenglong | Chinese University of Hong Kong, Shenzhen |
Huang, Rui | The Chinese University of Hong Kong, Shenzhen |
Keywords: Deep Learning for Visual Perception, SLAM, Visual Learning
Abstract: Unsupervised learning of depth and ego-motion from unlabelled monocular videos has recently drawn great attention, as it avoids the expensive ground truth required by supervised approaches. It achieves this by using, as the loss, the photometric errors between the target view and the views synthesized from its adjacent source views. Despite significant progress, the learning still suffers from occlusion and scene dynamics. This paper shows that carefully manipulating the photometric errors can tackle these difficulties better. The primary improvement is achieved by a statistical technique that can mask out the invisible or non-stationary pixels in the photometric error map and thus prevent them from misleading the networks. With this outlier masking approach, the depth of objects moving in the opposite direction to the camera can be estimated more accurately. To the best of our knowledge, such scenarios have not been seriously considered in previous works, even though they pose a higher risk in applications like autonomous driving. We also propose an efficient weighted multi-scale scheme to reduce the artifacts in the predicted depth maps. Extensive experiments on the KITTI dataset show the effectiveness of the proposed approaches. The overall system achieves state-of-the-art performance on both depth and ego-motion estimation.
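The outlier-masking idea can be illustrated with a small per-image statistic over the photometric error map. The PyTorch sketch below is a hypothetical simplification, keeping pixels whose error is below the per-image mean plus one standard deviation; the exact statistic used in the paper may differ:

    import torch

    def outlier_mask(photo_error):
        # photo_error: (B, 1, H, W) per-pixel photometric error
        mu = photo_error.mean(dim=(2, 3), keepdim=True)
        sigma = photo_error.std(dim=(2, 3), keepdim=True)
        return (photo_error < mu + sigma).float()   # 0 = likely occluded or moving

    err = torch.rand(2, 1, 128, 416)
    mask = outlier_mask(err)
    loss = (err * mask).sum() / mask.sum().clamp(min=1.0)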
|
|
15:00-15:15, Paper WeCT3.5 | |
>Don’t Forget the Past: Recurrent Depth Estimation from Monocular Video |
|
Patil, Vaishakh | CVL ETH Zurich |
Van Gansbeke, Wouter | KU Leuven |
Dai, Dengxin | ETH Zurich |
Van Gool, Luc | ETH Zurich |
Keywords: Deep Learning for Visual Perception, RGB-D Perception, Sensor Fusion
Abstract: Autonomous cars need continuously updated depth information. Thus far, depth has mostly been estimated independently for a single frame at a time, even when the method starts from video input. Our method produces a time series of depth maps, which makes it an ideal candidate for online learning approaches. In particular, we put three different types of depth estimation (supervised depth prediction, self-supervised depth prediction, and self-supervised depth completion) into a common framework. We integrate the corresponding networks with a ConvLSTM such that the spatiotemporal structure of depth across frames can be exploited to yield a more accurate depth estimation. Our method is flexible: it can be applied to monocular videos only or be combined with different types of sparse depth patterns. We carefully study the architecture of the recurrent network and its training strategy. We are the first to successfully exploit recurrent networks for real-time self-supervised monocular depth estimation and completion. Extensive experiments show that our recurrent method outperforms its image-based counterpart consistently and significantly in both self-supervised scenarios. It also outperforms previous depth estimation methods of the three popular groups.
|
|
WeCT4 |
Room T4 |
Depth Perception |
Regular session |
Chair: Ferrer, Gonzalo | Skolkovo Institute of Science and Technology |
Co-Chair: Majumder, Anima | Tata Consultancy Services |
|
14:00-14:15, Paper WeCT4.1 | |
>NBVC: A Benchmark for Depth Estimation from Narrow-Baseline Video Clips |
|
Mordohai, Philippos | Stevens Institute of Technology |
Batsos, Konstantinos | Stevens Institute of Technology |
Makadia, Ameesh | University of Pennsylvania |
Snavely, Noah | Cornell |
Keywords: Range Sensing, Visual-Based Navigation
Abstract: We present a benchmark for online, video-based depth estimation, a problem that is not covered by the current set of benchmarks for evaluating 3D reconstruction, which focus on offline, batch reconstruction. Online depth estimation from video captured by a moving camera is a key enabling technology for compelling applications in robotics and augmented reality. Inspired by progress in many aspects of robotics due to benchmarks and datasets, we propose a new benchmark called NBVC for evaluating methods for online depth estimation from video. Our benchmark is composed of short video sequences with corresponding high-quality ground truth depth maps, derived from the recent Tanks and Temples dataset. We are hopeful that our work will be instrumental in the development of learning-based algorithms for online depth estimation from video clips, and will also lead to improvements in conventional approaches. In addition to the benchmark, we present a superpixel-based plane sweeping stereo algorithm and use it to investigate various aspects of the problem. The paper contains our initial findings and conclusions.
|
|
14:15-14:30, Paper WeCT4.2 | |
>LaNoising: A Data-Driven Approach for 903nm ToF LiDAR Performance Modeling under Fog |
> Video Attachment
|
|
Yang, Tao | Northwestern Polytechnical University |
Li, You | RENAULT S.A.S |
Ruichek, Yassine | University of Technology of Belfort-Montbeliard - France |
Yan, Zhi | University of Technology of Belfort-Montbéliard (UTBM) |
Keywords: Range Sensing, Autonomous Vehicle Navigation, Deep Learning for Visual Perception
Abstract: As a critical sensor for high-level autonomous vehicles, LiDAR's limitations in adverse weather (e.g. rain, fog, snow, etc.) impede the deployment of self-driving cars in all weather conditions. In this paper, we model the performance of a popular 903nm ToF LiDAR under various fog conditions based on a LiDAR dataset collected in a well-controlled artificial fog chamber. Specifically, a two-stage data-driven method, called LaNoising (la for laser), is proposed for generating LiDAR measurements under fog conditions. In the first stage, the Gaussian Process Regression (GPR) model is established to predict whether a laser can successfully output a true detection range or not, given certain fog visibility values. If not, then in the second stage, the Mixture Density Network (MDN) is used to provide a probability prediction of the noisy measurement range. The performance of the proposed method has been quantitatively and qualitatively evaluated. Experimental results show that our approach can provide a promising description of 903nm ToF LiDAR performance under fog.
|
|
14:30-14:45, Paper WeCT4.3 | |
>360° Depth Estimation from Multiple Fisheye Images with Origami Crown Representation of Icosahedron |
> Video Attachment
|
|
Komatsu, Ren | The University of Tokyo |
Fujii, Hiromitsu | Chiba Institute of Technology |
Tamura, Yusuke | Tohoku University |
Yamashita, Atsushi | The University of Tokyo |
Asama, Hajime | The University of Tokyo |
Keywords: Omnidirectional Vision
Abstract: In this study, we present a method for all-around depth estimation from multiple omnidirectional images for indoor environments. In particular, we focus on plane-sweeping stereo as the method for depth estimation from the images. We propose a new icosahedron-based representation and ConvNets for omnidirectional images, which we name “CrownConv” because the representation resembles a crown made of origami. CrownConv can be applied to both fisheye images and equirectangular images to extract features. Furthermore, we propose icosahedron-based spherical sweeping for generating the cost volume on an icosahedron from the extracted features. The cost volume is regularized using the three-dimensional CrownConv, and the final depth is obtained by depth regression from the cost volume. Our proposed method is robust to camera alignments by using the extrinsic camera parameters; therefore, it can achieve precise depth estimation even when the camera alignment differs from that in the training dataset. We evaluate the proposed model on synthetic datasets and demonstrate its effectiveness. As our proposed method is computationally efficient, the depth is estimated from four fisheye images in less than a second using a laptop with a GPU. Therefore, it is suitable for real-world robotics applications. Our source code is available at https://github.com/matsuren/crownconv360depth.
|
|
14:45-15:00, Paper WeCT4.4 | |
>Video Depth Estimation by Fusing Flow-To-Depth Proposals |
> Video Attachment
|
|
Xie, Jiaxin | HKUST |
Lei, Chenyang | HKUST |
Li, Zhuwen | Nuro, Inc |
Li, Li | Columbia University |
Chen, Qifeng | HKUST |
Keywords: RGB-D Perception, Computer Vision for Other Robotic Applications, Deep Learning for Visual Perception
Abstract: Depth from a monocular video can enable billions of devices and robots with a single camera to see the world in 3D. In this paper, we present a model for video depth estimation, which consists of a flow-to-depth layer, a camera pose refinement module, and a depth fusion network. Given optical flow and camera poses, our flow-to-depth layer generates depth proposals and their corresponding confidence maps by explicitly solving an epipolar geometry optimization problem. Our flow-to-depth layer is differentiable, and thus we can refine camera poses by maximizing the aggregated confidence in the camera pose refinement module. Our depth fusion network can utilize the target frame, depth proposals, and confidence maps inferred from different neighboring frames to produce the final depth map. Furthermore, the depth fusion network can additionally take the depth proposals generated by other methods to further improve the results. The experiments on three public datasets show that our approach outperforms state-of-the-art depth estimation methods, and has reasonable cross-dataset generalization ability: our model trained on KITTI still performs well on the unseen Waymo dataset.
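The flow-to-depth layer can be understood as a per-pixel triangulation: given the relative camera pose and the flow-induced correspondence, the depth is the value that best explains the reprojection. The Python sketch below is a hypothetical single-pixel, closed-form version of that idea; the paper solves it densely and differentiably and also produces confidence maps:

    import numpy as np

    def depth_from_flow(x1, x2, R, t):
        # x1, x2: normalized homogeneous pixel coords [x, y, 1] in frames 1 and 2
        # R, t: relative pose mapping points from frame 1 to frame 2
        r = R @ x1
        # d * r + t must project onto x2  ->  two linear equations in the depth d
        A = np.array([r[0] - x2[0] * r[2], r[1] - x2[1] * r[2]])
        b = np.array([x2[0] * t[2] - t[0], x2[1] * t[2] - t[1]])
        return float(A @ b) / float(A @ A)   # least-squares depth

    R, t = np.eye(3), np.array([0.1, 0.0, 0.0])
    x1 = np.array([0.2, -0.1, 1.0])
    x2 = x1 + np.array([0.05, 0.0, 0.0])      # correspondence given by optical flow
    print(depth_from_flow(x1, x2, R, t))      # approximately 2.0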
|
|
15:00-15:15, Paper WeCT4.5 | |
>Unsupervised Depth and Confidence Prediction from Monocular Images Using Bayesian Inference |
> Video Attachment
|
|
Bhutani, Vishal | TCS Research and Innovation |
Vankadari, Madhu Babu | TCS |
Jha, Omprakash | Tata Consultancy Services |
Majumder, Anima | Tata Consultancy Services |
Swagat, Kumar | Tata Consultancy Services |
Dutta, Samrat | TCS Research and Innovation |
Keywords: Autonomous Vehicle Navigation, RGB-D Perception, Perception for Grasping and Manipulation
Abstract: In this paper, we propose an unsupervised deep learning framework with Bayesian inference for improving the accuracy of per-pixel depth prediction from monocular RGB images. The proposed framework predicts a confidence map along with the depth and pose information for a given input image. The depth hypotheses from previous frames are propagated forward and fused with the depth hypothesis of the current frame using a Bayesian inference mechanism. The ground truth information required for training the confidence map prediction is constructed using the image reconstruction loss, thereby obviating the need for the explicit ground-truth depth used in supervised methods. The resulting unsupervised framework is shown to outperform the existing state-of-the-art methods for depth prediction on the publicly available KITTI outdoor dataset. The usefulness of the proposed framework is further established by demonstrating a real-world robotic pick-and-place application where the pose of the robot end-effector is computed using the depth predicted from an eye-in-hand monocular camera. The design choices made for the proposed framework are justified through extensive ablation studies.
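For intuition, the per-pixel fusion of a propagated depth hypothesis with the current one can be viewed, in the Gaussian special case, as precision-weighted averaging with the confidence maps acting as precisions. The sketch below is a hypothetical simplification of that step; the paper's Bayesian inference mechanism is richer than this:

    import numpy as np

    def fuse_depth(d_prev, c_prev, d_curr, c_curr):
        # d_*: (H, W) depth hypotheses, c_*: (H, W) confidences used as precisions
        c_fused = c_prev + c_curr
        d_fused = (c_prev * d_prev + c_curr * d_curr) / np.maximum(c_fused, 1e-6)
        return d_fused, c_fused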
|
|
15:15-15:30, Paper WeCT4.6 | |
>TT-TSDF: Memory-Efficient TSDF with Low-Rank Tensor Train Decomposition |
|
Boyko, Alexey | Skolkovo Institute of Science and Technology |
Matrosov, Mikhail | Skolkovo Institute of Science and Technology |
Oseledets, Ivan | Skolkovo Institute of Science and Technology |
Tsetserukou, Dzmitry | Skolkovo Institute of Science and Technology |
Ferrer, Gonzalo | Skolkovo Institute of Science and Technology |
Keywords: Range Sensing, RGB-D Perception
Abstract: In this paper we apply the low-rank Tensor Train decomposition for compression of, and operations on, 3D objects and scenes represented by volumetric distance functions. Our study shows that not only does it allow very efficient compression of high-resolution TSDF maps (up to three orders of magnitude relative to the original memory footprint at a resolution of 512^3), but it also allows TSDF-Fusion to be performed directly in the low-rank form. This can potentially enable much more efficient 3D mapping on low-power mobile and consumer robot platforms.
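As a rough illustration of the representation, a (small) TSDF volume can be decomposed into a chain of three small cores by the standard TT-SVD procedure, as in the hypothetical NumPy sketch below; the paper works at 512^3 resolution and additionally performs fusion directly on the compressed form:

    import numpy as np

    def tt_svd(tensor, max_rank=8):
        # Decompose an N-way tensor into a list of TT cores of shape (r_prev, n_k, r_k).
        shape = tensor.shape
        cores, rank = [], 1
        mat = tensor.reshape(shape[0], -1)
        for k in range(len(shape) - 1):
            U, S, Vt = np.linalg.svd(mat, full_matrices=False)
            r = min(max_rank, len(S))
            cores.append(U[:, :r].reshape(rank, shape[k], r))
            mat = (np.diag(S[:r]) @ Vt[:r]).reshape(r * shape[k + 1], -1)
            rank = r
        cores.append(mat.reshape(rank, shape[-1], 1))
        return cores

    tsdf = np.random.rand(32, 32, 32)       # stand-in for a truncated SDF volume
    cores = tt_svd(tsdf, max_rank=8)
    print([c.shape for c in cores])         # e.g. (1, 32, 8), (8, 32, 8), (8, 32, 1)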
|
|
WeCT5 |
Room T5 |
DL for Visual Perception I |
Regular session |
Chair: Chung, Soon-Jo | Caltech |
Co-Chair: Kataoka, Hirokatsu | National Institute of Advanced Industrial Science and Technology (AIST) |
|
14:00-14:15, Paper WeCT5.1 | |
>3D-Aware Scene Change Captioning from Multiview Images |
|
Qiu, Yue | University of Tsukuba; National Institute of Advanced Industrial |
Satoh, Yutaka | AIST |
Suzuki, Ryota | National Institute of Advanced Industrial Science and Technology |
Iwata, Kenji | AIST |
Kataoka, Hirokatsu | National Institute of Advanced Industrial Science and Technology |
Keywords: Multi-Modal Perception, Deep Learning for Visual Perception, Computer Vision for Other Robotic Applications
Abstract: In this paper, we propose a framework that recognizes and describes, in natural language text, the change that occurred in a scene observed from multiple viewpoints. The ability to recognize and describe changes that occurred in a 3D scene plays an essential role in a variety of human-robot interaction (HRI) applications. However, most current 3D vision studies focus on static 3D scene understanding, and existing scene change captioning approaches recognize and generate change captions from single-view images. Thus, those methods have limited ability to deal with camera movements and object occlusions, which are common in real-world applications. To resolve these problems, we propose a framework that observes every scene from multiple viewpoints and describes the scene change based on an understanding of the underlying 3D structure of the scenes. We build two synthetic datasets consisting of primitive 3D object models and scanned real object models for evaluation. The results indicate that our method outperforms the previous state-of-the-art 2D-based method by a large margin in terms of sentence generation and change-understanding correctness. In addition, our method is more robust to camera movements than the previous method and also performs better in occluded scene settings. Moreover, our method shows encouraging results in a more realistic scene setting, which makes it promising to adapt our framework to more complicated and extensive scene settings.
|
|
14:15-14:30, Paper WeCT5.2 | |
>Loop-Net: Joint Unsupervised Disparity and Optical Flow Estimation of Stereo Videos with Spatiotemporal Loop Consistency |
|
Kim, Taewoo | KAIST |
Song, Kyeongseob | KAIST |
Ryu, Kwonyoung | KAIST |
Yoon, Kuk-Jin | KAIST |
Keywords: Deep Learning for Visual Perception, Visual Learning, Computer Vision for Other Robotic Applications
Abstract: Most existing deep learning-based depth and optical flow estimation methods require the supervision of a large amount of ground truth data, and hardly generalize to video frames, resulting in temporal inconsistency (flickering). In this paper, we propose a joint framework that estimates the disparity and optical flow of stereo videos and generalizes across various video frames by considering the spatiotemporal relation between the disparity and flow without supervision. To improve both accuracy and consistency, we propose a loop consistency loss which enforces the spatiotemporal consistency of the estimated disparity and optical flow. Furthermore, we introduce a video-based training scheme using a c-LSTM to reinforce the temporal consistency. Extensive experiments show that our proposed methods not only estimate disparity and optical flow accurately but also further improve spatiotemporal consistency. Our framework outperforms the state-of-the-art unsupervised depth and optical flow estimation models on the KITTI benchmark dataset.
|
|
14:30-14:45, Paper WeCT5.3 | |
>Fast Uncertainty Estimation for Deep Learning Based Optical Flow |
|
Lee, Serin | California Institute of Technology |
Capuano, Vincenzo | California Institute of Technology |
Harvard, Alexei | California Institute of Technology |
Chung, Soon-Jo | Caltech |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Space Robotics and Automation
Abstract: We present a novel approach to reduce the processing time required to derive the estimation uncertainty map in deep learning-based optical flow determination methods. Without uncertainty-aware reasoning, an optical flow model, especially when it is used in mission-critical fields such as robotics and aerospace, can cause catastrophic failures. Although several approaches, such as those based on Bayesian neural networks, have been proposed to handle this issue, they are computationally expensive. Thus, to speed up the processing time, our approach applies a generative model, which is trained with input images and an uncertainty map derived through a Bayesian approach. By using synthetically generated images of spacecraft, we demonstrate that the trained generative model can produce the uncertainty map 100∼700 times faster than the conventional uncertainty estimation method used for training the generative model itself. We also show that the quality of the uncertainty map derived by the generative model is close to that of the original uncertainty map. By applying the proposed approach, a deep learning model operated in real time can avoid disastrous failures by considering the uncertainty, as well as achieve better performance by removing uncertain portions of the prediction result.
|
|
14:45-15:00, Paper WeCT5.4 | |
>Diagnose Like a Clinician: Third-Order Attention Guided Lesion Amplification Network for WCE Image Classification |
|
Xing, Xiaohan | The Chinese University of Hong Kong |
Yuan, Yixuan | City University of Hong Kong |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Keywords: Deep Learning for Visual Perception, Computer Vision for Medical Robotics, Recognition
Abstract: Wireless capsule endoscopy (WCE) is a novel imaging tool that allows noninvasive visualization of the entire gastrointestinal (GI) tract without causing discomfort to the patients. Although convolutional neural networks (CNNs) have obtained promising performance for the automatic lesion recognition, the results of the current approaches are still limited due to the small lesions and the background interference in the WCE images. To overcome these limits, we propose a Third-order Attention guided Lesion Amplification Network (TALA-Net) for WCE image classification. The TALA-Net consists of two branches, including a global branch and an attention-aware branch. Specifically, taking the high-level features in the global branch as the input, we propose a Third-order Attention (ToA) module to generate attention maps that can indicate potential lesion regions. Then, an Attention Guided Lesion Amplification (AGLA) module is proposed to deform multiple level features in the global branch, so as to zoom in the potential lesion features. The deformed features are fused into the attention-aware branch to achieve finer-scale lesion recognition. Finally, predictions from the global and attention-aware branches are averaged to obtain the classification results. Extensive experiments show that the proposed TALA-Net outperforms state-of-the-art methods with an overall classification accuracy of 94.72% on the WCE dataset.
|
|
15:00-15:15, Paper WeCT5.5 | |
>Wiping 3D-Objects Using Deep Learning Model Based on Image/Force/Joint Information |
> Video Attachment
|
|
Saito, Namiko | Waseda University |
Wang, Danyang | Waseda University |
Ogata, Tetsuya | Waseda University |
Mori, Hiroki | Waseda University |
Sugano, Shigeki | Waseda University |
Keywords: Deep Learning for Visual Perception, Deep Learning in Grasping and Manipulation, Model Learning for Control
Abstract: We propose a deep learning model for a robot to wipe 3D objects. Wiping 3D objects requires recognizing the shapes of the objects and planning the motor angle adjustments for tracing them. Unlike previous research, our learning model does not require pre-designed computational models of the target objects. The robot is able to wipe the objects placed in front of it by using image, force, and arm joint information. We evaluate the generalization ability of the model by confirming that the robot handles cube- and bowl-shaped objects that were not included in training. By comparing changes in the input sensor data to the model, we also find that both image and force information are necessary to recognize the shape of, and consistently wipe, 3D objects. To our knowledge, this is the first work enabling a robot to trace various unknown 3D shapes using learned sensorimotor information alone.
|
|
15:15-15:30, Paper WeCT5.6 | |
>D2VO: Monocular Deep Direct Visual Odometry |
|
Jia, Qizeng | Huazhong University of Science and Technology |
Pu, Yuechuan | Huazhong University of Science and Technology |
Chen, Jingyu | Huazhong University of Science and Technology |
Cheng, Junda | Huazhong University of Science and Technology |
Liao, Chunyuan | HiScene Information Technologies |
Yang, Xin | Huazhong University of Science and Technology |
Keywords: Deep Learning for Visual Perception, SLAM, Mapping
Abstract: In this paper, we present D2VO, a novel monocular visual odometry system based on deep learning and direct methods. Our system reconstructs the dense depth map of each keyframe and tracks camera poses based on these keyframes. For each input frame, a feature pyramid is built and shared by both the tracking and mapping processes. By calculating the cost volume on each pyramid level, the depth map of the keyframe is estimated from coarse to fine by the subsequent multi-view hierarchical depth estimation network. We optimize the camera pose by minimizing the photometric error between the re-projected features of each frame and its reference keyframe with bundle adjustment. Experimental results on the TUM dataset demonstrate that our approach outperforms the state-of-the-art methods in both tracking and mapping.
|
|
WeCT6 |
Room T6 |
DL for Visual Perception II |
Regular session |
Chair: Johnson-Roberson, Matthew | University of Michigan |
Co-Chair: Zhou, Bolei | The Chinese University of Hong Kong |
|
14:00-14:15, Paper WeCT6.1 | |
>Cross-View Semantic Segmentation for Sensing Surroundings |
> Video Attachment
|
|
Pan, Bowen | MIT |
Sun, Jiankai | The Chinese University of Hong Kong |
Leung, Ho Yin Tiga | The Chinese University of Hong Kong |
Andonian, Alex | MIT |
Zhou, Bolei | The Chinese University of Hong Kong |
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception, Computer Vision for Other Robotic Applications
Abstract: Sensing surroundings plays a crucial role in human spatial perception, as it extracts the spatial configuration of objects as well as the free space from the observations. To equip robot perception with such a surrounding-sensing capability, we introduce a novel visual task called Cross-view Semantic Segmentation as well as a framework named View Parsing Network (VPN) to address it. In the cross-view semantic segmentation task, the agent is trained to parse the first-view observations into a top-down-view semantic map indicating the spatial location of all the objects at pixel level. The main issue of this task is that we lack real-world annotations of top-down-view data. To mitigate this, we train the VPN in a 3D graphics environment and utilize domain adaptation techniques to transfer it to handle real-world data. We evaluate our VPN on both synthetic and real-world agents. The experimental results show that our model can effectively make use of the information from different views and multiple modalities to understand spatial information. A further experiment on a LoCoBot robot shows that our model enables the surrounding-sensing capability from 2D image input. Code and demo videos can be found at https://view-parsing-network.github.io.
|
|
14:15-14:30, Paper WeCT6.2 | |
>Real-Time Fusion Network for RGB-D Semantic Segmentation Incorporating Unexpected Obstacle Detection for Road-Driving Images |
|
Sun, Lei | Zhejiang University |
Yang, Kailun | Karlsruhe Institute of Technology |
Hu, Xinxin | Zhejiang University |
Hu, Weijian | Zhejiang University |
Wang, Kaiwei | Zhejiang University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Autonomous Vehicle Navigation
Abstract: Semantic segmentation has made striking progress due to the success of deep convolutional neural networks. Considering the demands of autonomous driving, real-time semantic segmentation has become a research hotspot in recent years. However, few real-time RGB-D fusion semantic segmentation studies have been carried out, despite the readily accessible depth information available nowadays. In this paper, we propose a real-time fusion semantic segmentation network termed RFNet that efficiently exploits complementary features from depth information to enhance performance in an attention-augmented way, while running swiftly, which is a necessity for autonomous vehicle applications. Multi-dataset training is leveraged to incorporate unexpected small obstacle detection, enriching the recognizable classes required to face unforeseen hazards in the real world. A comprehensive set of experiments demonstrates the effectiveness of our framework. On Cityscapes, our method outperforms previous state-of-the-art semantic segmenters, with excellent accuracy and 22 Hz inference speed at the full 2048×1024 resolution, outperforming most existing RGB-D networks.
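To give a concrete flavour of attention-augmented fusion, the PyTorch sketch below shows a hypothetical squeeze-and-excitation style gate that reweights depth features before they are added to the RGB branch; RFNet's actual fusion module may differ in its details:

    import torch
    import torch.nn as nn

    class AttentionFusion(nn.Module):
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
                nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

        def forward(self, rgb_feat, depth_feat):
            # channel-wise gate computed from the depth branch
            return rgb_feat + depth_feat * self.gate(depth_feat)

    fused = AttentionFusion(64)(torch.randn(1, 64, 60, 80), torch.randn(1, 64, 60, 80))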
|
|
14:30-14:45, Paper WeCT6.3 | |
>SilhoNet-Fisheye: Adaptation of a ROI-Based Object Pose Estimation Network to Monocular Fisheye Images |
> Video Attachment
|
|
Billings, Gideon | University of Michigan |
Johnson-Roberson, Matthew | University of Michigan |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Marine Robotics
Abstract: There has been much recent interest in deep learning methods for monocular image based object pose estimation. While object pose estimation is an important problem for autonomous robot interaction with the physical world, and the application space for monocular-based methods is expansive, there has been little work on applying these methods with fisheye imaging systems. Also, little exists in the way of annotated fisheye image datasets on which these methods can be developed and tested. The research landscape is even more sparse for object detection methods applied in the underwater domain, fisheye image based or otherwise. In this work, we present a novel framework for adapting a ROI-based 6D object pose estimation method to work on full fisheye images. The method incorporates the gnomonic projection of regions of interest from an intermediate spherical image representation to correct for the fisheye distortions. Further, we contribute a fisheye image dataset, called UWHandles, collected in natural underwater environments, with 6D object pose and 2D bounding box annotations.
|
|
14:45-15:00, Paper WeCT6.4 | |
>Alleviating the Burden of Labeling: Sentence Generation by Attention Branch Encoder-Decoder Network |
> Video Attachment
|
|
Ogura, Tadashi | National Institute of Information and Communications Technology |
Magassouba, Aly | NICT |
Sugiura, Komei | Keio University |
Hirakawa, Tsubasa | Chubu University |
Yamashita, Takayoshi | Chubu University |
Fujiyoshi, Hironobu | Chubu University |
Kawai, Hisashi | National Institute of Information and Communications Technology |
Keywords: Novel Deep Learning Methods, Deep Learning for Visual Perception
Abstract: Domestic service robots (DSRs) are a promising solution to the shortage of home care workers. Nonetheless, one of the main limitations of DSRs is their inability to naturally interact through language. Recently, data-driven approaches have been shown to be effective for tackling this limitation, however, they often require large-scale datasets, which is costly. Based on this background, we aim to perform automatic sentence generation for fetching instructions, e.g., "Bring me a green tea bottle on the table." This is particularly challenging because appropriate expressions depend on the target object, as well as its surroundings. In this paper, we propose the attention branch encoder-decoder network (ABEN) that generates sentences from visual inputs. Unlike other approaches, the ABEN has multimodal attention branches that utilize subword-level attention and generate sentences based on subword embeddings. In the experiment, we compared the ABEN with a baseline method using four standard metrics in image captioning. Experimental results show that the ABEN outperformed the baseline in terms of these metrics.
|
|
15:00-15:15, Paper WeCT6.5 | |
>CalibRCNN: Calibrating Camera and LiDAR by Recurrent Convolutional Neural Network and Geometric Constraints |
|
Shi, Jieying | Zhejiang University of Technology |
Zhu, Ziheng | Zhejiang University of Techonology |
Zhang, Jianhua | Zhejiang University of Technology |
Liu, Ruyu | The Universität Hamburg |
Wang, Zhenhua | Zhejiang University of Technology |
Chen, Shengyong | Tianjin University of Technology |
Liu, Honghai | Portsmouth University |
Keywords: Deep Learning for Visual Perception, Sensor Fusion, Multi-Modal Perception
Abstract: In this paper, we present Calibration Recurrent Convolutional Neural Network (CalibRCNN) to infer a 6 degrees of freedom (DOF) rigid body transformation between 3D LiDAR and 2D camera. Different from the existing methods, our 3D-2D CalibRCNN not only uses the LSTM network to extract the temporal features between 3D point clouds and RGB images of consecutive frames, but also uses the geometric loss and photometric loss obtained by the interframe constraint to refine the calibration accuracy of the predicted transformation parameters. The CalibRCNN aims at inferring the correspondence between projected depth image and RGB image to learn the underlying geometry of 2D-3D calibration. Thus, the proposed calibration model achieves a good generalization ability to adapt to unknown initial calibration error ranges, and other 3D LiDAR and 2D camera pairs with different intrinsic parameters from the training dataset. Extensive experiments have demonstrated that our CalibRCNN can achieve state-of-the-art accuracy by comparison with other CNN based methods.
|
|
15:15-15:30, Paper WeCT6.6 | |
>Latent Replay for Real-Time Continual Learning |
|
Pellegrini, Lorenzo | University of Bologna |
Graffieti, Gabriele | University of Bologna |
Lomonaco, Vincenzo | University of Bologna |
Maltoni, Davide | University of Bologna |
Keywords: Deep Learning for Visual Perception, Visual Learning, Computer Vision for Other Robotic Applications
Abstract: Training deep neural networks at the edge on light computational devices, embedded systems and robotic platforms is nowadays very challenging. Continual learning techniques, where complex models are incrementally trained on small batches of new data, can make the learning problem tractable even for CPU-only embedded devices, enabling remarkable levels of adaptiveness and autonomy. However, a number of practical problems need to be solved: catastrophic forgetting before anything else. In this paper we introduce an original technique named "Latent Replay" where, instead of storing a portion of past data in the input space, we store activation volumes at some intermediate layer. This can significantly reduce the computation and storage required by native rehearsal. To keep the representation stable and the stored activations valid we propose to slow down learning at all the layers below the latent replay one, leaving the layers above free to learn at full pace. In our experiments we show that Latent Replay, combined with existing continual learning techniques, achieves state-of-the-art performance on complex video benchmarks such as CORe50 NICv2 (with nearly 400 small and highly non-i.i.d. batches) and OpenLORIS. Finally, we demonstrate the feasibility of nearly real-time continual learning on the edge through the deployment of the proposed technique on a smartphone device.
|
|
WeCT7 |
Room T7 |
DL for Visual Perception III |
Regular session |
Chair: Belter, Dominik | Poznan University of Technology |
Co-Chair: Gandhi, Vineet | IIIT Hyderabad |
|
14:00-14:15, Paper WeCT7.1 | |
>Learning to Switch CNNs with Model Agnostic Meta Learning for Fine Precision Visual Servoing |
> Video Attachment
|
|
Raj, Prem | IIT KANPUR |
Namboodiri, Vinay | Indian Institute of Technology Kanpur |
Behera, Laxmidhar | IIT Kanpur |
Keywords: Visual Servoing, Deep Learning for Visual Perception
Abstract: Convolutional Neural Networks (CNNs) have been successfully applied to relative camera pose estimation from labeled image-pair data, without requiring any hand-engineered features, camera intrinsic parameters or depth information. The trained CNN can be utilized for performing pose-based visual servo control (PBVS). One way to improve the quality of the visual servo output is to improve the accuracy of the CNN's relative pose estimates. Given a state-of-the-art CNN for relative pose regression, how can we achieve improved performance for visual servo control? In this paper, we explore switching of CNNs to improve the precision of visual servo control. The idea of switching CNNs stems from the fact that the dataset for training a relative camera pose regressor for visual servo control must contain variations in relative pose ranging from a very small scale to eventually a larger scale. We found that training two different instances of the CNN, one for large-scale displacements (LSD) and another for small-scale displacements (SSD), and switching between them during visual servo execution yields better results than training a single CNN with the combined LSD+SSD data. However, this causes extra storage overhead, and the switching decision is taken by a manually set threshold, which may not be optimal for all scenes. To eliminate these drawbacks, we propose an efficient switching strategy based on the model-agnostic meta learning (MAML) algorithm. Here, a single model is trained to learn parameters which are simultaneously good for multiple tasks, namely binary classification for the switching decision, 6DOF pose regression for LSD data, and 6DOF pose regression for SSD data. The proposed approach performs far better than the naive approach, while the storage and run-time overheads are almost negligible.
|
|
14:15-14:30, Paper WeCT7.2 | |
>HD Map Change Detection with Cross-Domain Deep Metric Learning |
> Video Attachment
|
|
Heo, Minhyeok | NAVER LABS |
Kim, Jiwon | NAVER LABS |
Kim, Sujung | NAVER LABS |
Keywords: Deep Learning for Visual Perception, Computer Vision for Other Robotic Applications
Abstract: High-definition (HD) maps are emerging as an essential tool for autonomous driving since they provide high-precision semantic information about the physical environment. To function as a reliable source of map information, HD maps must be constantly updated with changes that occur to the state of the road. In this paper, we propose a novel framework for HD map change detection that can be used to maintain an up-to-date HD map. More specifically, we design our HD map change detection algorithm based on deep metric learning, providing a unified framework that directly maps an input image to estimated probabilities of HD map changes. To reduce the discrepancy between input domains, i.e., camera image and HD map, we propose an effective learning scheme for the metric space based on adversarial learning. Finally, we augment our framework with a pixel-level local change detector that specifies the region of changes in the image. We verify the effectiveness of our framework by evaluating it on a city-scale urban HD map dataset. Experimental results show that our method can robustly detect changes despite noise caused by dynamic objects and errors in vehicle poses.
|
|
14:30-14:45, Paper WeCT7.3 | |
>CNN-Based Foothold Selection for Mechanically Adaptive Soft Foot |
> Video Attachment
|
|
Bednarek, Jakub | Poznań University of Technology |
Maalouf, Noel | American University of Beirut |
Pollayil, Mathew, Jose | University of Pisa |
Garabini, Manolo | Università Di Pisa |
Catalano, Manuel Giuseppe | Istituto Italiano Di Tecnologia |
Grioli, Giorgio | Istituto Italiano Di Tecnologia |
Belter, Dominik | Poznan University of Technology |
Keywords: Deep Learning for Visual Perception, Multi-legged Robots, Task Planning
Abstract: In this paper, we consider the problem of foothold selection for quadrupedal robots equipped with compliant adaptive feet. Starting from a model of the foot, we compute the quality of the potential footholds, also considering kinematic constraints and collisions during evaluation. Since terrain assessment and constraint checking are computationally expensive, we apply a Convolutional Neural Network (CNN) to evaluate the potential footholds on the elevation map. We propose an efficient strategy for data clustering and segmentation with the CNN. The data for training the neural network is collected off-line, but the inference works on-line while the robot walks on rough terrain, allowing for efficient adaptation to the terrain and exploitation of the properties of the soft adaptive feet.
|
|
14:45-15:00, Paper WeCT7.4 | |
>Depth Estimation from Monocular Images and Sparse Radar Data |
> Video Attachment
|
|
Lin, Juan-Ting | ETH Zurich |
Dai, Dengxin | ETH Zurich |
Van Gool, Luc | ETH Zurich |
Keywords: Deep Learning for Visual Perception, Sensor Fusion, RGB-D Perception
Abstract: In this paper, we explore the possibility of achieving more accurate depth estimation by fusing monocular images and Radar points using a deep neural network. We give a comprehensive study of the fusion between RGB images and Radar measurements from different aspects and propose a working solution based on our observations. We find that the noise in Radar measurements is one of the main reasons that prevents one from applying the existing fusion methods developed for LiDAR data and images to the new fusion problem between Radar data and images. The experiments are conducted on the nuScenes dataset, one of the first datasets that features camera, Radar, and LiDAR recordings in diverse scenes and weather conditions. Extensive experiments demonstrate that our method outperforms existing fusion methods. We also provide detailed ablation studies to show the effectiveness of each component in our method.
|
|
15:00-15:15, Paper WeCT7.5 | |
>Tidying Deep Saliency Prediction Architectures |
> Video Attachment
|
|
Reddy, Navyasri | IIIT Hyderabad |
Jain, Samyak | IIIT Hyderabad |
Yarlagadda, Sree Ram Sai Pradeep | IIIT Hyderabad |
Gandhi, Vineet | IIIT Hyderabad |
Keywords: Deep Learning for Visual Perception, Computer Vision for Other Robotic Applications, Visual Learning
Abstract: Learning computational models for visual attention (saliency estimation) is an effort to inch machines/robots closer to human visual cognitive abilities. Data-driven efforts have dominated the landscape since the introduction of deep neural network architectures. In deep learning research, the choices in architecture design are often empirical and frequently lead to more complex models than necessary. The complexity, in turn, hinders the application requirements. In this paper, we identify four key components of saliency models, i.e., input features, multi-level integration, readout architecture, and loss functions. We review the existing state-of-the-art models on these four components and propose novel and simpler alternatives. As a result, we propose two novel end-to-end architectures called SimpleNet and MDNSal, which are neater, minimal, more interpretable and achieve state-of-the-art performance on public saliency benchmarks. SimpleNet is an optimized encoder-decoder architecture and brings notable performance gains on the SALICON dataset (the largest saliency benchmark). MDNSal is a parametric model that directly predicts the parameters of a GMM distribution and is aimed at bringing more interpretability to the prediction maps. The proposed saliency models can be inferred at 25 fps, making them suitable for real-time applications. Code and pre-trained models are available at https://github.com/samyak0210/saliency.
|
|
15:15-15:30, Paper WeCT7.6 | |
>Simultaneously Learning Corrections and Error Models for Geometry-Based Visual Odometry Methods |
|
De Maio, Andrea | LAAS-CNRS |
Lacroix, Simon | LAAS/CNRS |
Keywords: Deep Learning for Visual Perception, Visual-Based Navigation, RGB-D Perception
Abstract: This paper fosters the idea that deep learning methods can be used to complement classical visual odometry pipelines to improve their accuracy and to associate uncertainty models with their estimations. We show that the biases inherent to the visual odometry process can be faithfully learned and compensated for, and that a learning architecture associated with a probabilistic loss function can jointly estimate a full covariance matrix of the residual errors, defining an error model capturing the heteroscedasticity of the process. Experiments on autonomous driving image sequences assess the possibility to concurrently improve visual odometry and estimate an error model associated with its outputs.
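A minimal sketch of a full-covariance, heteroscedastic Gaussian negative log-likelihood of the kind mentioned above (assuming PyTorch; the parameterisation via a predicted Cholesky factor and all names are assumptions, not the authors' loss):

import torch

def heteroscedastic_nll(residual, chol_params):
    # residual: (B, d) visual odometry errors after the learned correction;
    # chol_params: (B, d*(d+1)//2) raw network outputs turned into a valid
    # lower-triangular Cholesky factor L of the full covariance (Sigma = L L^T).
    B, d = residual.shape
    L = residual.new_zeros(B, d, d)
    rows, cols = torch.tril_indices(d, d)
    L[:, rows, cols] = chol_params
    diag = torch.arange(d)
    # enforce a positive diagonal so Sigma is positive definite
    L[:, diag, diag] = torch.nn.functional.softplus(L[:, diag, diag]) + 1e-6
    dist = torch.distributions.MultivariateNormal(
        loc=torch.zeros_like(residual), scale_tril=L)
    return -dist.log_prob(residual).mean()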
|
|
WeCT8 |
Room T8 |
Human Detection and Pose |
Regular session |
Chair: Iqbal, Tariq | University of Virginia |
Co-Chair: Leibe, Bastian | RWTH Aachen University |
|
14:00-14:15, Paper WeCT8.1 | |
>Whole-Game Motion Capturing of Team Sports: System Architecture and Integrated Calibration |
|
Ikegami, Yosuke | University of Tokyo |
Nikolić, Milutin | University of Novi Sad, Faculty of Technical Sciences |
Yamada, Ayaka | The University of Tokyo |
Zhang, Lei | Bejing University of Civil Engineering and Architecture |
Ooke, Natsu | Nakamura-Yamamoto Lab, School of Mechano-Informatics, University |
Nakamura, Yoshihiko | University of Tokyo |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Visual Tracking, Behavior-Based Systems
Abstract: This paper discusses the application of video motion capturing technology (VMocap) to a competitive team sports game. The setting introduces a specific set of constraints: large-scale markerless motion capturing, a big recording volume, transmitting and processing gigabytes of data, operation without interfering with players or distracting spectators and staff, etc. In this paper, we present how we tackled and successfully solved all of these constraints. This enabled us to analyze the athletes without any intrusion while they gave their peak performance, opening a new field of application for Mocap. An international volleyball game was recorded in full length with the described system. During the course of the event, we compressed 54 TB of RAW image data in real time, capturing 6 hours of high-framerate video per camera, without disturbing any of the game operations. Using the data, we were able to reconstruct the motion, muscle activity and behavior of the athletes present on the court.
|
|
14:15-14:30, Paper WeCT8.2 | |
>A Particle Filter Technique for Human Pose Estimation in Case of Occlusion Exploiting Holographic Human Model and Virtualized Environment |
> Video Attachment
|
|
Messeri, Costanza | Politecnico Di Milano |
Rebecchi, Lorenzo | Politecnico Di Milano |
Zanchettin, Andrea Maria | Politecnico Di Milano |
Rocco, Paolo | Politecnico Di Milano |
Keywords: Virtual Reality and Interfaces, Sensor Fusion, RGB-D Perception
Abstract: In a collaborative scenario, robots working side by side with humans might rely on vision sensors to monitor the activity of the other agent. When occlusions of the human body occur, both the safety of the cooperation and the performance of the team can be penalized, since the robot could receive incorrect information about the ongoing cooperation. In this work, we propose a novel particle filter algorithm that, by merging the data acquired through an RGB-D camera and an MR headset, estimates the human wrist position online. This algorithm significantly reduces the uncertainty of the human pose estimation in case of both static and dynamic occlusions. To this purpose, the proposed particle filter is integrated with a detailed virtual model of the real workspace. Moreover, additional constraints describing the boundaries of the motion of the human upper body are included in a virtualized framework. The results showed that the proposed technique yields significant improvements, with a considerable reduction of the estimation error and of the uncertainty of the estimate.
|
|
14:30-14:45, Paper WeCT8.3 | |
>DR-SPAAM: A Spatial-Attention and Auto-Regressive Model for Person Detection in 2D Range Data |
|
Jia, Dan | RWTH Aachen |
Hermans, Alexander | RWTH Aachen University |
Leibe, Bastian | RWTH Aachen University |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Visual Learning
Abstract: Detecting persons using a 2D LiDAR is a challenging task due to the low information content of 2D range data. To alleviate the problem caused by the sparsity of the LiDAR points, current state-of-the-art methods fuse multiple previous scans and perform detection using the combined scans. The downside of such a backward looking fusion is that all the scans need to be aligned explicitly, and the necessary alignment operation makes the whole pipeline more expensive -- often too expensive for real-world applications. In this paper, we propose a person detection network which uses an alternative strategy to combine scans obtained at different times. Our method, Distance Robust SPatial Attention and Auto-regressive Model (DR-SPAAM), follows a forward looking paradigm. It keeps the intermediate features from the backbone network as a template and recurrently updates the template when a new scan becomes available. The updated feature template is in turn used for detecting persons currently in the scene. On the DROW dataset, our method outperforms the existing state-of-the-art, while being approximately four times faster, running at 87.2 FPS on a laptop with a dedicated GPU and at 22.6 FPS on an NVIDIA Jetson AGX embedded GPU. We release our code in PyTorch and a ROS node including pre-trained models.
|
|
14:45-15:00, Paper WeCT8.4 | |
>Vision-Based Gesture Recognition in Human-Robot Teams Using Synthetic Data |
|
de Melo, Celso | CCDC US Army Research Laboratory |
Rothrock, Brandon | Jet Propulsion Laboratory, California Institute of Technology |
Gurram, Prudhvi | CCDC US Army Research Laboratory |
Ulutan, Oytun | UC Santa Barbara |
Manjunath, B.S. | University of California Santa Barbara |
Keywords: Computer Vision for Other Robotic Applications, Gesture, Posture and Facial Expressions, Deep Learning for Visual Perception
Abstract: Building successful collaboration between humans and robots requires efficient, effective, and natural communication. Here we study an RGB-based deep learning approach for controlling robots through gestures (e.g., "follow me"). To address the challenge of collecting high-quality annotated data from human subjects, synthetic data is considered for this domain. We contribute a dataset of gestures that includes real videos with human subjects and synthetic videos from our custom simulator. A solution is presented for gesture recognition based on the state-of-the-art I3D model. Comprehensive testing was conducted to optimize the parameters for this model. Finally, to gather insight on the value of synthetic data, several experiments are described that systematically study the properties of synthetic data (e.g., gesture variations, character variety, generalization to new gestures). We discuss practical implications for the design of effective human-robot collaboration and the usefulness of synthetic data for deep learning.
|
|
15:00-15:15, Paper WeCT8.5 | |
>HAMLET: A Hierarchical Multimodal Attention-Based Human Activity Recognition Algorithm |
|
Islam, Md Mofijul | University of Virginia |
Iqbal, Tariq | University of Virginia |
Keywords: Multi-Modal Perception, Novel Deep Learning Methods, Deep Learning for Visual Perception
Abstract: To fluently collaborate with people, robots need the ability to recognize human activities accurately. Although modern robots are equipped with various sensors, robust human activity recognition (HAR) still remains a challenging task for robots due to difficulties related to multimodal data fusion. To address these challenges, in this work, we introduce a deep neural network-based multimodal HAR algorithm, HAMLET. HAMLET incorporates a hierarchical architecture, where the lower layer encodes spatio-temporal features from unimodal data by adopting a multi-head self-attention mechanism. We develop a novel multimodal attention mechanism for disentangling and fusing the salient unimodal features to compute the multimodal features in the upper layer. Finally, the multimodal features are used in a fully connected neural network to recognize human activities. We evaluated our algorithm by comparing its performance to several state-of-the-art activity recognition algorithms on three human activity datasets. The results suggest that HAMLET outperformed all other evaluated baselines across all datasets and metrics tested, with the highest top-1 accuracy of 95.12% and 97.45% on the UTD-MHAD [1] and the UT-Kinect [2] datasets respectively, and an F1-score of 81.52% on the UCSD-MIT [3] dataset. We further visualize the unimodal and multimodal attention maps, which provide us with a tool to interpret the impact of attention mechanisms concerning HAR.
|
|
15:15-15:30, Paper WeCT8.6 | |
>Collision Avoidance in Human-Robot Interaction Using Kinect Vision System Combined with Robot’s Model and Data |
> Video Attachment
|
|
Nacimento, Hugo | Polytechnic School, Automation and Control Department, Universit |
Mujica, Martin | INP-ENI of Tarbes |
Benoussaad, Mourad | INP-ENI of Tarbes |
Keywords: Collision Avoidance, Sensor-based Control, Perception-Action Coupling
Abstract: Human-Robot Interaction (HRI) is a widely addressed subject today. Collision avoidance is one of the main strategies that allow space sharing and contact-free interaction between human and robot. It is thus usual to use a 3D depth camera sensor, which may involve issues related to the robot being occluded in the camera view. While several works overcame this issue by applying an infinite depth principle or increasing the number of cameras, in the current work we develop a new and original approach based on the combination of a 3D depth sensor (Microsoft Kinect V2) and the proprioceptive robot position sensors. This method uses a limited safety contour around the obstacle to dynamically estimate the robot-obstacle distance, and then generates the repulsive force that controls the robot. For validation, our approach is applied in real time to avoid collisions between dynamic obstacles (humans or objects) and the end-effector of a real 7-DOF Kuka LBR iiwa collaborative robot. Several strategies based on distancing and its combination with dodging were tested. Results have shown reactive and efficient collision avoidance, ensuring a minimum obstacle-robot distance (of about 240 mm) even when the robot is in an occluded zone of the Kinect camera view.
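For illustration only, a classical artificial-potential repulsive term of the kind this distance-based control relies on (assuming NumPy; the gain, the 0.24 m safety distance and the function names are assumptions, not the authors' controller):

import numpy as np

def repulsive_force(robot_pos, obstacle_pos, d_safe=0.24, gain=1.0):
    # Pushes the end-effector away from the closest obstacle point once it
    # enters the safety contour of radius d_safe; zero outside the contour.
    diff = robot_pos - obstacle_pos
    d = np.linalg.norm(diff)
    if d >= d_safe or d < 1e-9:
        return np.zeros(3)
    return gain * (1.0 / d - 1.0 / d_safe) * (1.0 / d**2) * (diff / d)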
|
|
WeCT9 |
Room T9 |
Human Pose Estimation |
Regular session |
Chair: Johnson-Roberson, Matthew | University of Michigan |
Co-Chair: Odobez, Jean-Marc | IDIAP |
|
14:00-14:15, Paper WeCT9.1 | |
>Human Gait Phase Recognition Using a Hidden Markov Model Framework |
|
Attal, Ferhat | University Paris-Est Créteil (UPEC) |
Amirat, Yacine | University of Paris Est Créteil (UPEC) |
Chibani, Abdelghani | Lissi Lab Paris EST University |
Mohammed, Samer | University of Paris Est Créteil - (UPEC) |
Keywords: AI-Based Methods, Recognition, Probability and Statistical Methods
Abstract: Analysis of human daily living activities, particularly walking activity, is essential for health-care applications such as fall prevention, physical rehabilitation exercises, and gait monitoring. Studying the evolution of the gait cycle using wearable sensors is beneficial for the detection of any abnormal walking pattern. This paper proposes a novel discrete/continuous unsupervised Hidden Markov Model method that is able to recognize six gait phases of a typical human walking cycle through the use of two wearable Inertial Measurement Units (IMUs) mounted on both feet of the subject. The results obtained with the proposed approach were compared to those of well-known supervised and unsupervised segmentation approaches and show its efficiency in accurately recognizing the different gait phases of a human gait cycle. The proposed model allows the consideration of the sequential aspect of the walking gait phases while operating in an unsupervised context that avoids the process of data labeling, which is often tedious and time-consuming, particularly within a massive-data context.
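A small sketch of the unsupervised HMM segmentation idea (assuming the third-party hmmlearn package and a pre-extracted feature file; the paper's discrete/continuous model differs, this only illustrates fitting a six-state HMM to unlabeled IMU data):

import numpy as np
from hmmlearn import hmm  # third-party package, used here purely for illustration

# imu_features: (T, d) array of IMU-derived features from both feet (assumed file)
imu_features = np.load("imu_features.npy")

model = hmm.GaussianHMM(n_components=6, covariance_type="diag", n_iter=100)
model.fit(imu_features)               # Baum-Welch training, no labels required
phases = model.predict(imu_features)  # most likely gait-phase index per sample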
|
|
14:15-14:30, Paper WeCT9.2 | |
>Using Diverse Neural Networks for Safer Human Pose Estimation: Towards Making Neural Networks Know When They Don’t Know |
|
Schlosser, Patrick | Karlsruhe Institute of Technology |
Ledermann, Christoph | Karlsruhe Institute of Technology |
Keywords: Gesture, Posture and Facial Expressions, Robot Safety, Computer Vision for Other Robotic Applications
Abstract: In recent years, human pose estimation has seen great improvements through the use of neural networks. However, these approaches are unsuitable for safety-critical applications such as human-robot interaction (HRI), as no guarantees are given whether a produced detection is correct or not, and false detections with high confidence scores are produced on a regular basis. In this work, we propose a method to identify and eliminate false detections by comparing keypoint detections from different neural networks and assigning a 'Don't know' label in the case of a mismatch. Our approach is driven by the principle of software diversity, a technique recommended by the safety standard IEC 61508-7 for dealing with software implementation faults. We evaluate our general concept on the MPII human pose dataset using available ground truth data to calculate a suitable threshold for our keypoint comparison, reducing the number of false detections by approx. 61%. For the application at runtime, where no ground truth data is available, we introduce a method to calculate the needed threshold directly from keypoint detections. In further experiments, it was possible to reduce the number of false detections by approx. 75%. Eliminating keypoints by comparison also lowers the correct detection rate, which we maintained above 75% in all experiments. As this effect is limited and non-critical regarding safety, we believe that the proposed approach can lead the way to a safe use of neural networks for human pose estimation in the future.
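A minimal sketch of the comparison step (assuming NumPy; the averaging of agreeing detections and all names are assumptions): keypoints from two diversely trained networks are accepted only when they agree within a threshold, otherwise the keypoint is rejected as "Don't know".

import numpy as np

def fuse_keypoints(kpts_a, kpts_b, threshold):
    # kpts_a, kpts_b: (K, 2) keypoint coordinates from two independently
    # trained networks; threshold: maximum tolerated distance in pixels.
    fused = []
    for a, b in zip(kpts_a, kpts_b):
        if np.linalg.norm(a - b) <= threshold:
            fused.append((a + b) / 2.0)   # agreement: keep the detection
        else:
            fused.append(None)            # mismatch: 'Don't know'
    return fused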
|
|
14:30-14:45, Paper WeCT9.3 | |
>Residual Pose: A Decoupled Approach for Depth-Based 3D Human Pose Estimation |
|
Martinez-Gonzalez, Angel | EPFL, IDIAP |
Villamizar, Michael | IDIAP |
Canevet, Olivier | Idiap Research Institute |
Odobez, Jean-Marc | IDIAP |
Keywords: Social Human-Robot Interaction, Deep Learning for Visual Perception, Human-Centered Robotics
Abstract: We propose to leverage recent advances in reliable 2D pose estimation with Convolutional Neural Networks (CNN) to estimate the 3D pose of people from depth images in multi-person Human-Robot Interaction (HRI) scenarios. Our method is based on the observation that using the depth information to obtain 3D lifted points from 2D body landmark detections provides a rough estimate of the true 3D human pose, thus requiring only a refinement step. Along this line, our contributions are threefold: (i) we propose to perform 3D pose estimation from depth images by decoupling 2D pose estimation and 3D pose refinement; (ii) we propose a deep-learning approach that regresses the residual pose between the lifted 3D pose and the true 3D pose; (iii) we show that despite its simplicity, our approach achieves very competitive results both in accuracy and speed on two public datasets and is therefore appealing for multi-person HRI compared to recent state-of-the-art methods.
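The lifting step mentioned above amounts to back-projecting 2D landmarks through the depth image; a small sketch (assuming NumPy and a pinhole model with intrinsics fx, fy, cx, cy; names are illustrative):

import numpy as np

def lift_keypoints(kpts_2d, depth, fx, fy, cx, cy):
    # kpts_2d: (K, 2) pixel coordinates of body landmarks; depth: (H, W) in metres.
    # Returns the rough 3D pose later refined by the residual-pose network.
    pts_3d = []
    for u, v in kpts_2d:
        z = depth[int(round(v)), int(round(u))]  # depth at the landmark
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        pts_3d.append([x, y, z])
    return np.asarray(pts_3d)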
|
|
14:45-15:00, Paper WeCT9.4 | |
>Simple Means Faster: Real-Time Human Motion Forecasting in Monocular First Person Videos on CPU |
> Video Attachment
|
|
Ansari, Junaid Ahmed | Tata Consultancy Services |
Bhowmick, Brojeshwar | Tata Consultancy Services |
Keywords: Deep Learning for Visual Perception, Computer Vision for Other Robotic Applications
Abstract: We present a simple, fast, and lightweight RNN-based framework for forecasting future locations of humans in first-person monocular videos. The primary motivation for this work was to design a network which could accurately predict future trajectories at a very high rate on a CPU. Typical applications of such a system would be a social robot or a visual assistance system "for all", since neither can afford high compute power without becoming heavier, less power efficient, and costlier. In contrast to many previous methods which rely on multiple types of cues such as camera ego-motion or the 2D pose of the human, we show that a carefully designed network model which relies solely on bounding boxes can not only perform better but also predict trajectories at a very high rate while being quite small, approximately 17 MB in size. Specifically, we demonstrate that having an auto-encoder in the encoding phase of the past information and a regularizing layer in the end boosts the accuracy of predictions with negligible overhead. We experiment with three first-person video datasets: CityWalks, FPL and JAAD. Our simple method trained on CityWalks surpasses the prediction accuracy of the state-of-the-art method (STED) while being 9.6x faster on a CPU (STED runs on a GPU). We also demonstrate that our model can transfer zero-shot or after just 15% fine-tuning to other similar datasets and perform on par with the state-of-the-art methods on such datasets (FPL and DTP). To the best of our knowledge, we are the first to accurately forecast trajectories at a very high prediction rate of 78 trajectories per second on CPU.
|
|
15:00-15:15, Paper WeCT9.5 | |
>Unsupervised Pedestrian Pose Prediction -- a Deep Predictive Coding Network Approach for Autonomous Vehicle Perception (I) |
|
Du, Xiaoxiao | University of Michigan |
Vasudevan, Ram | University of Michigan |
Johnson-Roberson, Matthew | University of Michigan |
|
|
WeCT10 |
Room T10 |
Visual Tracking |
Regular session |
Chair: Chen, Qifeng | HKUST |
Co-Chair: Bekris, Kostas E. | Rutgers, the State University of New Jersey |
|
14:00-14:15, Paper WeCT10.1 | |
>JRMOT: A Real-Time 3D Multi-Object Tracker and a New Large-Scale Dataset |
> Video Attachment
|
|
Shenoi, Abhijeet | Stanford University |
Patel, Mihir | Stanford University |
Gwak, JunYoung | Stanford University |
Goebel, Patrick | Stanford University |
Sadeghian, Amir | Stanford University |
Rezatofighi, S. Hamid | The University of Adelaide |
Martín-Martín, Roberto | Stanford University |
Savarese, Silvio | Stanford University |
Keywords: Visual Tracking, RGB-D Perception, Social Human-Robot Interaction
Abstract: Robots navigating autonomously need to perceive and track the motion of objects and other agents in their surroundings. This information enables planning and executing robust and safe trajectories. To facilitate these processes, the motion should be perceived in 3D Cartesian space. However, most recent multi-object tracking (MOT) research has focused on tracking people and moving objects in 2D RGB video sequences. Our system, JRMOT, is built with recent neural networks for re-identification, 2D and 3D detection and track description, combined into a joint probabilistic data-association framework within a multi-modal recursive Kalman architecture. As part of our work, we release the JRDB dataset, a novel large-scale 2D+3D dataset and benchmark, annotated with over 2 million boxes and 3500 time-consistent 2D+3D trajectories across 54 indoor and outdoor scenes. JRDB contains over 60 minutes of data including 360 degree cylindrical RGB video and 3D pointclouds in social settings that we use to develop, train and evaluate JRMOT. The presented 3D MOT system demonstrates state-of-the-art performance against competing methods on the popular 2D tracking KITTI benchmark and serves as the first 3D tracking solution for our benchmark. Real-robot tests on our social robot JackRabbot indicate that the system is capable of tracking multiple pedestrians fast and reliably. We provide the ROS code of our tracker at https://sites.google.com/view/jrmot
|
|
14:15-14:30, Paper WeCT10.2 | |
>Factor Graph Based 3D Multi-Object Tracking in Point Clouds |
> Video Attachment
|
|
Pöschmann, Johannes | Chemnitz University of Technology |
Pfeifer, Tim | Chemnitz University of Technology |
Protzel, Peter | Chemnitz University of Technology |
Keywords: Visual Tracking
Abstract: Accurate and reliable tracking of multiple moving objects in 3D space is an essential component of urban scene understanding. This is a challenging task because it requires the assignment of detections in the current frame to the predicted objects from the previous one. Existing filter-based approaches tend to struggle if this initial assignment is not correct, which can happen easily. We propose a novel optimization-based approach that does not rely on explicit and fixed assignments. Instead, we represent the result of an off-the-shelf 3D object detector as Gaussian mixture model, which is incorporated in a factor graph framework. This gives us the flexibility to assign all detections to all objects simultaneously. As a result, the assignment problem is solved implicitly and jointly with the 3D spatial multi-object state estimation using non-linear least squares optimization. Despite its simplicity, the proposed algorithm achieves robust and reliable tracking results and can be applied for offline as well as online tracking. We demonstrate its performance on the real world KITTI tracking dataset and achieve better results than many state-of-the-art algorithms. Especially the consistency of the estimated tracks is superior offline as well as online.
|
|
14:30-14:45, Paper WeCT10.3 | |
>Self-Supervised Object Tracking with Cycle-Consistent Siamese Networks |
> Video Attachment
|
|
Yuan, Weihao | Hong Kong University of Science and Technology |
Wang, Michael Yu | Hong Kong University of Science & Technology |
Chen, Qifeng | HKUST |
Keywords: Visual Tracking, Visual Learning, Object Detection, Segmentation and Categorization
Abstract: Self-supervised learning for visual object tracking possesses valuable advantages compared to supervised learning, such as the non-necessity of laborious human annotations and online training. In this work, we exploit an end-to-end Siamese network in a cycle-consistent self-supervised framework for object tracking. Self-supervision can be performed by taking advantage of the cycle consistency in the forward and backward tracking. To better leverage the end-to-end learning of deep networks, we propose to integrate a Siamese region proposal and mask regression network in our tracking framework so that a fast and more accurate tracker can be learned without the annotation of each frame. The experiments on the VOT dataset for visual object tracking and on the DAVIS dataset for video object segmentation propagation show that our method outperforms prior approaches on both tasks.
|
|
14:45-15:00, Paper WeCT10.4 | |
>3D Multi-Object Tracking: A Baseline and New Evaluation Metrics |
|
Weng, Xinshuo | Carnegie Mellon University |
Wang, Jianren | Carnegie Mellon University |
Held, David | Carnegie Mellon University |
Kitani, Kris | Carnegie Mellon University |
Keywords: Visual Tracking, Deep Learning for Visual Perception
Abstract: 3D multi-object tracking (MOT) is an essential component for many applications such as autonomous driving and assistive robotics. Recent work on 3D MOT focuses on developing accurate systems, giving less attention to practical considerations such as computational cost and system complexity. In contrast, this work proposes a simple real-time 3D MOT system. Our system first obtains 3D detections from a LiDAR point cloud. Then, a straightforward combination of a 3D Kalman filter and the Hungarian algorithm is used for state estimation and data association. Additionally, 3D MOT datasets such as KITTI evaluate MOT methods in the 2D space, and standardized 3D MOT evaluation tools are missing for a fair comparison of 3D MOT methods. Therefore, we propose a new 3D MOT evaluation tool along with three new metrics to comprehensively evaluate 3D MOT methods. We show that, although our system employs a combination of classical MOT modules, we achieve state-of-the-art 3D MOT performance on two 3D MOT benchmarks (KITTI and nuScenes). Surprisingly, although our system does not use any 2D data as inputs, we achieve competitive performance on the KITTI 2D MOT leaderboard. Our proposed system runs at a rate of 207.4 FPS on the KITTI dataset, achieving the fastest speed among all modern MOT systems. To encourage standardized 3D MOT evaluation, our code is publicly available at http://www.xinshuoweng.com/projects/AB3DMOT.
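A minimal sketch of the Hungarian data-association step used between predicted tracks and new detections (assuming NumPy and SciPy; a simple centroid-distance cost is used here purely for illustration, not necessarily the cost used in the paper):

import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_centroids, det_centroids, max_dist=2.0):
    # track_centroids: (N, 3) predicted box centres from the Kalman filters;
    # det_centroids: (M, 3) centres of the new 3D detections.
    cost = np.linalg.norm(track_centroids[:, None, :] - det_centroids[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
    unmatched_tracks = set(range(len(track_centroids))) - {r for r, _ in matches}
    unmatched_dets = set(range(len(det_centroids))) - {c for _, c in matches}
    return matches, unmatched_tracks, unmatched_dets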
|
|
15:00-15:15, Paper WeCT10.5 | |
>Se(3)-TrackNet: Data-Driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains |
> Video Attachment
|
|
Wen, Bowen | Rutgers University |
Mitash, Chaitanya | Rutgers University |
Ren, Baozhang | Rutgers University |
Bekris, Kostas E. | Rutgers, the State University of New Jersey |
Keywords: Visual Tracking, Deep Learning for Visual Perception, Perception for Grasping and Manipulation
Abstract: Tracking the 6D pose of objects in video sequences is important for robot manipulation. This task, however, introduces multiple challenges: (i) robot manipulation involves significant occlusions; (ii) data and annotations are troublesome and difficult to collect for 6D poses, which complicates machine learning solutions, and (iii) incremental error drift often accumulates in long-term tracking and necessitates re-initialization of the object's pose. This work proposes a data-driven optimization approach for long-term, 6D pose tracking. It aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object's model. The key contributions in this context are a novel neural network architecture, which appropriately disentangles the feature encoding to help reduce domain shift, and an effective 3D orientation representation via the Lie algebra. Consequently, even when the network is trained only with synthetic data, it can work effectively over real images. Comprehensive experiments over benchmarks - existing ones as well as a new dataset with significant occlusions related to object manipulation - show that the proposed approach achieves consistently robust estimates and outperforms alternatives, even though they have been trained with real images. The approach is also the most computationally efficient among the alternatives and achieves a tracking frequency of 90.9Hz.
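When a network regresses the relative rotation in the Lie algebra, the predicted so(3) vector is mapped back to a rotation matrix with the Rodrigues exponential; a small sketch (assuming NumPy, shown only to illustrate the representation, not the authors' code):

import numpy as np

def so3_exp(w):
    # w: (3,) axis-angle vector in so(3); returns the corresponding 3x3 rotation.
    theta = np.linalg.norm(w)
    if theta < 1e-8:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)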
|
|
15:15-15:30, Paper WeCT10.6 | |
>Motion Prediction in Visual Object Tracking |
|
Wang, Jianren | Carnegie Mellon University |
He, Yihui | Carnegie Mellon University |
Keywords: Visual Tracking
Abstract: Visual object tracking (VOT) is an essential component for many applications, such as autonomous driving or assistive robotics. However, recent works tend to develop accurate systems based on more computationally expensive feature extractors for better instance matching. In contrast, this work addresses the importance of motion prediction in VOT. We use an off-the-shelf object detector to obtain instance bounding boxes. Then, a combination of camera motion decoupling and a Kalman filter is used for state estimation. Although our baseline system is a straightforward combination of standard methods, we obtain state-of-the-art results. Our method establishes new state-of-the-art performance on VOT (VOT-2016 and VOT-2018), improving the EAO on VOT-2016 from 0.472 of prior art to 0.505, and from 0.410 to 0.431 on VOT-2018. To show the generalizability, we also test our method on video object segmentation (VOS: DAVIS-2016 and DAVIS-2017) and observe consistent improvement.
|
|
WeCT11 |
Room T11 |
Multi-Modal Perception I |
Regular session |
Chair: Radha, Hayder | Michigan State University |
Co-Chair: Vogiatzis, George | Aston University |
|
14:00-14:15, Paper WeCT11.1 | |
>Look and Listen: A Multi-Modality Late Fusion Approach to Scene Classification for Autonomous Machines |
|
Bird, Jordan J. | Aston University |
Faria, Diego | Aston University |
Premebida, Cristiano | University of Coimbra |
Ekart, Aniko | Aston University |
Vogiatzis, George | Aston University |
Keywords: Multi-Modal Perception, Visual Learning, Visual-Based Navigation
Abstract: The novelty of this study consists in a multi-modality approach to scene classification, where image and audio complement each other in a process of deep late fusion. The approach is demonstrated on a difficult classification problem, consisting of two synchronised and balanced datasets of 16,000 data objects, encompassing 4.4 hours of video of 8 environments with varying degrees of similarity. We first extract video frames and accompanying audio at one-second intervals. The image and the audio datasets are first classified independently, using a fine-tuned VGG16 and an evolutionary optimised deep neural network, with accuracies of 89.27% and 93.72%, respectively. This is followed by late fusion of the two neural networks to enable a higher-order function, leading to an accuracy of 96.81% for this multi-modality classifier with synchronised video frames and audio clips. The tertiary neural network implemented for late fusion outperforms classical state-of-the-art classifiers by around 3% when the two primary networks are considered as feature generators. We show that situations where a single modality may be confused by anomalous data points are now corrected through an emerging higher-order integration. Prominent examples include a water feature in a city misclassified as a river by the audio classifier alone and a densely crowded street misclassified as a forest by the image classifier alone. Both are examples which are correctly classified by our multi-modality approach.
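An illustrative late-fusion head of the kind described (assuming PyTorch; the layer sizes and names are assumptions): class scores from the image network and the audio network for a synchronised one-second clip are concatenated and re-classified by a small tertiary network.

import torch
import torch.nn as nn

class LateFusionHead(nn.Module):
    def __init__(self, n_classes=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * n_classes, 64), nn.ReLU(),
            nn.Linear(64, n_classes))

    def forward(self, image_scores, audio_scores):
        # image_scores, audio_scores: (B, n_classes) outputs of the two primary networks
        return self.mlp(torch.cat([image_scores, audio_scores], dim=1))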
|
|
14:15-14:30, Paper WeCT11.2 | |
>CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection |
> Video Attachment
|
|
Pang, Su | Michigan State University |
Morris, Daniel | Michigan State University |
Radha, Hayder | Michigan State University |
Keywords: Multi-Modal Perception, Object Detection, Segmentation and Categorization, Sensor Fusion
Abstract: There have been significant advances in neural networks for both 3D object detection using LiDAR and 2D object detection using video. However, it has been surprisingly difficult to train networks to effectively use both modalities in a way that demonstrates gain over single-modality networks. In this paper, we propose a novel Camera-LiDAR Object Candidates (CLOCs) fusion network. CLOCs fusion provides a low-complexity multi-modal fusion framework that significantly improves the performance of single-modality detectors. CLOCs operates on the combined output candidates before Non-Maximum Suppression (NMS) of any 2D and any 3D detector, and is trained to leverage their geometric and semantic consistencies to produce more accurate final 3D and 2D detection results. Our experimental evaluation on the challenging KITTI object detection benchmark, including 3D and bird's eye view metrics, shows significant improvements, especially at long distance, over the state-of-the-art fusion based methods. At time of submission, CLOCs ranks the highest among all the fusion-based methods in the official KITTI leaderboard. We will release our code upon acceptance.
|
|
14:30-14:45, Paper WeCT11.3 | |
>Gimme Signals: Discriminative Signal Encoding for Multimodal Activity Recognition |
> Video Attachment
|
|
Memmesheimer, Raphael | University of Koblenz-Landau |
Theisen, Nick | University Koblenz-Landau |
Paulus, Dietrich | Universtät Koblenz-Landau |
Keywords: Multi-Modal Perception, Deep Learning for Visual Perception, Sensor Fusion
Abstract: We present a simple, yet effective and flexible method for action recognition supporting multiple sensor modalities. Multivariate signal sequences are encoded in an image and are classified using a recently proposed EfficientNet CNN architecture. Our focus was to find an approach that generalizes well across different sensor modalities without data-specific adaptions while still achieving good results. We apply our method to 4 action recognition datasets containing skeleton sequences, inertial and motion capturing measurements as well as Wi-Fi fingerprints that range up to 120 classes. Our method defines the current best CNN-based approach on the NTU RGB+D 120 dataset, lifts the state of the art on the ARIL Wi-Fi dataset by +6.78%, improves the UTD-MHAD inertial baseline by +14.43%, the UTD-MHAD skeleton baseline by +0.5% and achieves 96.11% on the Simitate motion capturing data (80/20 split). We further demonstrate experiments on both, modality fusion on a signal level and signal reduction to prevent the representation from overloading.
|
|
14:45-15:00, Paper WeCT11.4 | |
>3D Localization of a Sound Source Using Mobile Microphone Arrays Referenced by SLAM |
> Video Attachment
|
|
Michaud, Simon | Université De Sherbrooke |
Faucher, Samuel | Université De Sherbrooke |
Grondin, Francois | Massachusetts Institute of Technology |
Lauzon, Jean-Samuel | Université De Sherbrooke |
Labbé, Mathieu | Université De Sherbrooke |
Létourneau, Dominic | Université De Sherbrooke |
Ferland, François | Université De Sherbrooke |
Michaud, Francois | Universite De Sherbrooke |
Keywords: Multi-Modal Perception, Robot Audition, SLAM
Abstract: A microphone array can provide a mobile robot with the capability of localizing, tracking and separating distant sound sources in 2D, i.e., estimating their relative elevation and azimuth. To combine acoustic data with visual information in real-world settings, spatial correlation must be established. The approach explored in this paper consists of having two robots, each equipped with a microphone array, localize themselves in a shared reference map using SLAM. Based on their locations, data from the microphone arrays are used to triangulate the 3D location of a sound source in relation to the same map. This strategy results in a novel cooperative sound mapping approach using mobile microphone arrays. Trials are conducted using two mobile robots localizing a static or a moving sound source to examine under which conditions this is possible. Results suggest that errors under 0.3 m are observed when the relative angle between the two robots is above 30° for a static sound source, while errors under 0.3 m are observed for angles between 40° and 140° with a moving sound source.
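A small sketch of the triangulation step (assuming NumPy; the least-squares closest-point formulation and all names are illustrative, not the authors' code): each robot contributes a bearing ray to the sound source, expressed in the shared SLAM map frame, and the 3D source position minimises the squared distance to both rays.

import numpy as np

def triangulate(p1, d1, p2, d2):
    # p_i: (3,) robot position in the map frame; d_i: (3,) unit bearing direction
    # to the sound source estimated by that robot's microphone array.
    P1 = np.eye(3) - np.outer(d1, d1)   # projector orthogonal to ray 1
    P2 = np.eye(3) - np.outer(d2, d2)   # projector orthogonal to ray 2
    A = np.vstack([P1, P2])
    b = np.concatenate([P1 @ p1, P2 @ p2])
    return np.linalg.lstsq(A, b, rcond=None)[0]  # 3D point closest to both rays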
|
|
15:00-15:15, Paper WeCT11.5 | |
>When We First Met: Visual-Inertial Person Localization for Co-Robot Rendezvous |
|
Sun, Xi | Carnegie Mellon University |
Weng, Xinshuo | Carnegie Mellon University |
Kitani, Kris | Carnegie Mellon University |
Keywords: Multi-Modal Perception, Computer Vision for Automation, Representation Learning
Abstract: We aim to enable robots to visually localize a target person through the aid of an additional sensing modality -- the target person's 3D inertial measurements. The need for such technology may arise when a robot is to meet a person in a crowd for the first time or when an autonomous vehicle must rendezvous with a rider amongst a crowd without knowing the appearance of the person in advance. A person's inertial information can be measured with a wearable device such as a smartphone and can be shared selectively with an autonomous system during the rendezvous. We propose a method to learn a visual-inertial feature space in which the motion of a person in video can be easily matched to the motion measured by a wearable inertial measurement unit (IMU). The transformation of the two modalities into the joint feature space is learned through the use of a triplet loss which forces inertial motion features and video motion features generated by the same person to lie close in the joint feature space. To validate our approach, we compose a dataset of over 3,000 video segments of moving people along with wearable IMU data. We show that our method is able to localize a target person with 80.7% accuracy averaged over testing data with various numbers of candidates, using only 5 seconds of IMU data and video.
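A minimal sketch of the cross-modal triplet objective (assuming PyTorch; the encoders, margin and function names are assumptions): video motion features and IMU features from the same person are pulled together, while IMU features from a different person are pushed apart.

import torch

triplet = torch.nn.TripletMarginLoss(margin=0.2)

def cross_modal_triplet_loss(video_encoder, imu_encoder, video_clip, imu_same, imu_other):
    anchor = video_encoder(video_clip)   # embedding of a candidate's video motion
    positive = imu_encoder(imu_same)     # IMU embedding of the same person
    negative = imu_encoder(imu_other)    # IMU embedding of a different person
    return triplet(anchor, positive, negative)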
|
|
15:15-15:30, Paper WeCT11.6 | |
>Self-Attention Based Visual-Tactile Fusion Learning for Predicting Grasp Outcomes |
|
Cui, Shaowei | Institute of Automation, Chinese Academy of Sciences |
Wang, Rui | Institute of Automation, Chinese Academy of Sciences |
Wei, Junhang | Institute of Automation, Chinese Academy of Sciences |
Hu, Jingyi | University of Chinese Academy of Sciences |
Wang, Shuo | Chinese Academy of Sciences |
Keywords: Multi-Modal Perception, Perception for Grasping and Manipulation, Grasping
Abstract: Predicting whether a particular grasp will succeed is critical to performing stable grasping and manipulation tasks. Robots need to combine vision and touch as humans do to accomplish this prediction. The primary problem to be solved in this process is how to learn effective visual-tactile fusion features. In this paper, we propose a novel Visual-Tactile Fusion learning method based on the Self-Attention mechanism (VTFSA) to address this problem. We compare the proposed method with traditional methods on two public multimodal grasping datasets, and the experimental results show that the VTFSA model outperforms traditional methods by margins of 5% and 7%. Furthermore, visualization analysis indicates that the VTFSA model can further capture some position-related visual-tactile fusion features that are beneficial to this task and is more robust than traditional methods.
|
|
WeCT12 |
Room T12 |
Multi-Modal Perception II |
Regular session |
Chair: Chernova, Sonia | Georgia Institute of Technology |
Co-Chair: Thomas, Ulrike | Chemnitz University of Technology |
|
14:00-14:15, Paper WeCT12.1 | |
>Using Machine Learning for Material Detection with Capacitive Proximity Sensors |
|
Ding, Yitao | Chemnitz University of Technology |
Kisner, Hannes | TU Chemnitz |
Kong, Tianlin | TU Chemnitz |
Thomas, Ulrike | Chemnitz University of Technology |
Keywords: Multi-Modal Perception, Object Detection, Segmentation and Categorization, Perception for Grasping and Manipulation
Abstract: The ability to detect materials plays an important role in robotic applications. The robot can incorporate the information from contactless material detection and adapt its behavior in how it grasps an object or how it walks on specific surfaces. In this paper, we apply machine learning to impedance spectra from capacitive proximity sensors for material detection. The unique spectra of certain materials only differ slightly and are subject to noise and scaling effects during each measurement. A best-fit classification approach on pre-recorded data is therefore inaccurate. We perform classification on ten different materials and evaluate different classification algorithms, ranging from simple k-NN approaches to artificial neural networks, which are able to extract the material-specific information from the impedance spectra.
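An illustrative k-NN baseline of the kind evaluated (assuming scikit-learn and pre-recorded spectra stored in files; file names, feature layout and the neighbour count are assumptions):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# spectra: (n_samples, n_frequencies) impedance features; labels: one of ten materials
spectra = np.load("impedance_spectra.npy")
labels = np.load("material_labels.npy")

X_train, X_test, y_train, y_test = train_test_split(spectra, labels, test_size=0.3)
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))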
|
|
14:15-14:30, Paper WeCT12.2 | |
>Exploiting Visual-Outer Shape for Tactile-Inner Shape Estimation of Objects Covered with Soft Materials |
> Video Attachment
|
|
Miyamoto, Tomoya | Nara Institute of Science and Technology |
Sasaki, Hikaru | Nara Institute of Science and Technology |
Matsubara, Takamitsu | Nara Institute of Science and Technology |
Keywords: Multi-Modal Perception, Perception-Action Coupling
Abstract: In this paper, we consider the problem of inner-shape estimation of objects covered with soft materials, e.g., pastries wrapped in paper or vinyl, water bottles covered with shock-absorbing fabrics, or human bodies dressed in clothes. Due to the softness of the covering materials, tactile information obtained through physical touches can be useful to estimate such inner shape; however, using only tactile information is inefficient since it only collects local information around the touchpoint. Another approach would be to take visual information obtained by cameras into account; however, this is not straightforward since the visual information only captures the outer shape of the covering materials, and it is unknown how similar or dissimilar such a visual-outer shape is to the tactile-inner shape. We propose an active tactile exploration framework that can utilize the visual-outer shape to efficiently estimate the inner shape of objects covered with soft materials. To this end, we propose the Gaussian Process Inner-Outer Implicit Surface model (GPIOIS) that jointly models the implicit surfaces of inner-outer shapes with their similarity by Gaussian processes. Simulation and real-robot experimental results demonstrated the effectiveness of our method.
|
|
14:30-14:45, Paper WeCT12.3 | |
>Tactile Event Based Grasping Algorithm Using Memorized Triggers and Mechanoreceptive Sensors |
> Video Attachment
|
|
Kim, Won Dong | KAIST |
Kim, Jung | KAIST |
Keywords: Perception for Grasping and Manipulation, Grasping, Force and Tactile Sensing
Abstract: Humans perform grasping by breaking down the task into a series of action phases, where the transitions between the action phases are based on the comparison between the predicted tactile events and the actual tactile events. The dependency on tactile sensation in grasping allows humans to grasp objects without the need to locate the object precisely, which is a feature desirable in robot grasping to successfully grasp objects when there are uncertainties in localizing the target object. In this paper, we propose a method of implementing a tactile event based grasping algorithm using memorized predicted tactile events as state transition triggers, inspired by the human grasping. First, a simulated robotic manipulator mounted with pressure and vibration sensors on each finger, analogous to the different mechanoreceptors in humans, performed ideal grasping tasks, from which the tactile signals between consecutive states were extracted. The extracted tactile signals were processed and stored as predicted tactile events. Secondly, a grasping algorithm composed of eight discrete states, Reach, Re-Reach, Load, Lift, Hold, Avoid, Place, and Unload was built. The transition between consecutive states is triggered when the actual tactile events match the predicted tactile events, otherwise, triggering the corrective actions. Our algorithm was implemented on an actual robot, equipped with capacitive and piezoelectric transducers on the fingertips. Lastly, grasping experiments were conducted, where the target objects were deliberately misplaced from their expected positions, to investigate the robustness of the tactile event based grasping algorithm to object localization errors.
|
|
14:45-15:00, Paper WeCT12.4 | |
>Multimodal Sensor Fusion with Differentiable Filters |
> Video Attachment
|
|
Lee, Michelle | Stanford University |
Yi, Brent | University of California, Berkeley |
Martín-Martín, Roberto | Stanford University |
Savarese, Silvio | Stanford University |
Bohg, Jeannette | Stanford University |
Keywords: Multi-Modal Perception, Perception for Grasping and Manipulation, Deep Learning in Grasping and Manipulation
Abstract: Leveraging multimodal information with recursive Bayesian filters improves performance and robustness of state estimation, as recursive filters can combine different modalities according to their uncertainties. Prior work has studied how to optimally fuse different sensor modalities with analytical state estimation algorithms. However, deriving the dynamics and measurement models along with their noise profile can be difficult or lead to intractable models. Differentiable filters provide a way to learn these models end-to-end while retaining the algorithmic structure of recursive filters. This can be especially helpful when working with sensor modalities that are high dimensional and have very different characteristics. In contact-rich manipulation, we want to combine visual sensing (which gives us global information) with tactile sensing (which gives us local information). In this paper, we study new differentiable filtering architectures to fuse heterogeneous sensor information. As case studies, we evaluate three tasks: two in planar pushing (simulated and real) and one in manipulating a kinematically constrained door (simulated). In extensive evaluations, we find that differentiable filters that leverage crossmodal sensor information reach the same accuracy as unstructured LSTM models, while retaining interpretability of the state representation that may be important for safety-critical systems. We also release an open-source library for creating and training differentiable Bayesian filters in PyTorch, which can be found on our project website: https://sites.google.com/view/multimodalfilter.
|
|
15:00-15:15, Paper WeCT12.5 | |
>Multimodal Material Classification for Robots Using Spectroscopy and High Resolution Texture Imaging |
> Video Attachment
|
|
Erickson, Zackory | Georgia Institute of Technology |
Xing, Eliot | Georgia Institute of Technology |
Srirangam, Bharat | Georgia Institute of Technology |
Chernova, Sonia | Georgia Institute of Technology |
Kemp, Charlie | Georgia Institute of Technology |
Keywords: Perception for Grasping and Manipulation, Mobile Manipulation, Service Robots
Abstract: Material recognition can help inform robots about how to properly interact with and manipulate real-world objects. In this paper, we present a multimodal sensing technique, leveraging near-infrared spectroscopy and close-range high resolution texture imaging, that enables robots to estimate the materials of household objects. We release a dataset of high resolution texture images and spectral measurements collected from a mobile manipulator that interacted with 144 household objects. We then present a neural network architecture that learns a compact multimodal representation of spectral measurements and texture images. When generalizing material classification to new objects, we show that this multimodal representation enables a robot to recognize materials with greater performance as compared to prior state-of-the-art approaches. Finally, we present how a robot can combine this high resolution local sensing with images from the robot's head-mounted camera to achieve accurate material classification over a scene of objects on a table.
|
|
15:15-15:30, Paper WeCT12.6 | |
>DeepLiDARFlow: A Deep Learning Architecture for Scene Flow Estimation Using Monocular Camera and Sparse LiDAR |
|
Rishav, Rishav | Birla Institute of Technology and Science, Pilani; DFKI Kaisersl |
Battrawy, Ramy | DFKI |
Schuster, René | DFKI |
Wasenmüller, Oliver | German Research Center for Artificial Intelligence (DFKI) |
Stricker, Didier | German Research Center for Artificial Intelligence |
Keywords: Deep Learning for Visual Perception, Multi-Modal Perception, Computer Vision for Transportation
Abstract: Scene flow is the dense 3D reconstruction of motion and geometry of a scene. Most state-of-the-art methods use a pair of stereo images as input for full scene reconstruction. These methods depend heavily on the quality of the RGB images and perform poorly in regions with reflective objects, shadows, ill-conditioned lighting, and so on. LiDAR measurements are much less sensitive to the aforementioned conditions, but LiDAR features are in general unsuitable for matching tasks due to their sparse nature. Hence, using both LiDAR and RGB can potentially overcome the individual disadvantages of each sensor by mutual improvement and yield robust features which can improve the matching process. In this paper, we present DeepLiDARFlow, a novel deep learning architecture which fuses high-level RGB and LiDAR features at multiple scales in a monocular setup to predict dense scene flow. Its performance is much better in the critical regions where image-only and LiDAR-only methods are inaccurate. We verify our DeepLiDARFlow using the established datasets KITTI and FlyingThings3D and we show strong robustness compared to several state-of-the-art methods which used other input modalities. The code of our paper is available at https://github.com/dfki-av/DeepLiDARFlow.
|
|
15:15-15:30, Paper WeCT12.7 | |
>Balanced Depth Completion between Dense Depth Inference and Sparse Range Measurements Via KISS-GP |
> Video Attachment
|
|
Yoon, SungHo | KAIST (Korea Advanced Institute of Science and Technology) |
Kim, Ayoung | Korea Advanced Institute of Science Technology |
Keywords: Multi-Modal Perception, Sensor Fusion, Computer Vision for Other Robotic Applications
Abstract: Estimating a dense and accurate depth map is the key requirement for autonomous driving and robotics. Recent advances in deep learning have allowed depth estimation in full resolution from a single image. Despite this impressive result, many deep-learning-based monocular depth estimation (MDE) algorithms have failed to keep their accuracy, yielding a meter-level estimation error. In many robotics applications, accurate but sparse measurements are readily available from Light Detection and Ranging (LiDAR). Although they are highly accurate, the sparsity limits full-resolution depth map reconstruction. Targeting the problem of dense and accurate depth map recovery, this paper introduces the fusion of these two modalities as a depth completion (DC) problem by dividing the role of depth inference and depth regression. Utilizing the state-of-the-art MDE and our Gaussian process (GP) based depth-regression method, we propose a general solution that can flexibly work with various MDE modules by enhancing its depth with sparse range measurements, as shown in Fig. 1. To overcome the major limitation of GP, we adopt Kernel Interpolation for Scalable Structured (KISS)-GP and mitigate the computational complexity from O(N³) to O(N). Our experiments demonstrate that the accuracy and robustness of our method outperform state-of-the-art unsupervised methods for sparse and biased measurements.
|
|
WeCT13 |
Room T13 |
RGB-D Perception |
Regular session |
Chair: Bennewitz, Maren | University of Bonn |
Co-Chair: Duckett, Tom | University of Lincoln |
|
14:00-14:15, Paper WeCT13.1 | |
>Polygonal Perception for Mobile Robots |
> Video Attachment
|
|
Missura, Marcell | University of Bonn |
Roychoudhury, Arindam | University of Bonn |
Bennewitz, Maren | University of Bonn |
Keywords: RGB-D Perception, Range Sensing
Abstract: Geometric primitives are a compact and versatile representation of the environment and the objects within. From a motion planning perspective, the geometric structure can be leveraged in order to implement potentially faster and smoother motion control algorithms than it has been possible with grid-based occupancy maps so far. In this paper, we introduce a novel perception pipeline that efficiently processes the point cloud obtained from an RGB-D sensor in order to produce a floor-projected 2D map in the field-of-view of the robot where obstacles are represented as polygons rather than cells. These polygons can then be processed by path planning algorithms and obstacle avoidance controllers. Our pipeline includes a ground floor plane detector that performs significantly faster than other contemporary solutions and a grid segmentation algorithm that uses image processing techniques to identify the contours of obstacles in order to convert them to polygons. We demonstrate the performance of our approach in experiments with a wheeled and a humanoid robot and show that our polygonal perception pipeline works robustly even in the presence of the disturbances caused by the shaking of a walking robot.
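A small sketch of the contour-to-polygon conversion described above (assuming OpenCV 4 and NumPy; the paper uses its own grid segmentation, so the specific calls, the Douglas-Peucker tolerance and the function name are assumptions):

import cv2
import numpy as np

def grid_to_polygons(occupancy, epsilon=2.0):
    # occupancy: (H, W) uint8 floor-projected grid with obstacle cells set to 255.
    contours, _ = cv2.findContours(occupancy, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    polygons = []
    for c in contours:
        approx = cv2.approxPolyDP(c, epsilon, closed=True)  # simplify the contour
        polygons.append(approx.reshape(-1, 2))               # (n_vertices, 2) grid coords
    return polygons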
|
|
14:15-14:30, Paper WeCT13.2 | |
>Real-Time Detection of Broccoli Crops in 3D Point Clouds for Autonomous Robotic Harvesting |
|
Montes, Hector | University of Lincoln |
Le Louedec, Justin | University of Lincoln |
Cielniak, Grzegorz | University of Lincoln |
Duckett, Tom | University of Lincoln |
Keywords: RGB-D Perception, Robotics in Agriculture and Forestry, AI-Based Methods
Abstract: Real-time 3D perception of the environment is crucial for the adoption and deployment of reliable autonomous harvesting robots in agriculture. Using data collected with RGB-D cameras under farm field conditions, we present two methods for processing 3D data that reliably detect mature broccoli heads. The proposed systems are efficient and enable real-time detection on depth data of broccoli crops using the organised structure of the point clouds delivered by a depth sensor. The systems are tested with datasets of two broccoli varieties collected in planted fields from two different countries. Our evaluation shows the new methods outperform state-of-the-art approaches for broccoli detection based on both 2D vision-based segmentation techniques and depth clustering using the Euclidean proximity of neighbouring points. The results show the systems are capable of accurately detecting the 3D locations of broccoli heads relative to the vehicle at high frame rates.
|
|
14:30-14:45, Paper WeCT13.3 | |
>SGM-MDE: Semi-Global Optimization for Classification-Based Monocular Depth Estimation |
|
Miclea, Vlad | Technical University of Cluj-Napoca |
Nedevschi, Sergiu | Technical University of Cluj |
Keywords: RGB-D Perception, Range Sensing, Deep Learning for Visual Perception
Abstract: Depth estimation plays a crucial role in robotic applications that require environment perception. With the introduction of convolutional neural networks, monocular depth estimation (MDE) methods have become viable alternatives to LiDAR and stereo reconstruction-based solutions. Such methods require less equipment, fewer resources and no additional sensor alignment. However, due to the ill-posed formulation of MDE, such algorithms can only rely on learning mechanisms, which makes them less reliable and less robust. In this work we propose a novel method to cope with the lack of geometric constraints inherent to monocular depth computation. Towards this goal, we first mathematically transform the feature vectors from the last layer of an MDE CNN such that a 3D stereo-like cost volume is generated. We then adapt semi-global stereo optimization to this volume, ensuring global consistency of the map. Furthermore, we enhance the results by adding a sub-pixel stereo post-processing step by means of interpolation functions, which yields a larger range of depth values. Our method can be applied to any classification-based MDE, with experiments showing an increase in accuracy at an additional time cost of only 8 ms on a regular GPU, making the technique usable for real-time applications.
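For context, the per-direction cost aggregation of standard semi-global matching, which this work adapts from disparity to depth classes, has the textbook form (not necessarily the authors' exact notation)
L_r(p, d) = C(p, d) + min( L_r(p - r, d), L_r(p - r, d - 1) + P_1, L_r(p - r, d + 1) + P_1, min_k L_r(p - r, k) + P_2 ) - min_k L_r(p - r, k),
where C(p, d) is the cost volume built from the network outputs, P_1 and P_2 penalize small and large depth-class changes between neighboring pixels, and the final cost at each pixel is the sum of L_r over the aggregation directions r.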
|
|
14:45-15:00, Paper WeCT13.4 | |
>Multi-Task Deep Learning for Depth-Based Person Perception in Mobile Robotics |
> Video Attachment
|
|
Seichter, Daniel | Ilmenau University of Technology |
Lewandowski, Benjamin | Ilmenau University of Technology |
Hoechemer, Dominik | Ilmenau University of Technology |
Wengefeld, Tim | Ilmenau University of Technology |
Gross, Horst-Michael | Ilmenau University of Technology |
Keywords: RGB-D Perception, Deep Learning for Visual Perception, Service Robotics
Abstract: Efficient and robust person perception is one of the most basic skills a mobile robot must have to ensure intuitive human-machine interaction. In addition to person detection, this also includes estimating various attributes, like posture or body orientation, in order to achieve user-adaptive behavior. However, given limited computing and battery capabilities on a mobile robot, it is inefficient to solve all perception tasks separately, especially when using computationally expensive deep neural networks. Therefore, we propose a multi-task system for person perception, comprising a fast, depth-based region proposal and an efficient, lightweight deep neural network. Using a single network forward pass, the system simultaneously detects persons, classifies their body postures, and estimates the upper body orientations while retaining almost the same computation time as a single-task network. We describe how to handle a real-world multi-task scenario and conduct an extensive series of experiments in order to compare various network architectures and task weightings. We further show that multi-task learning improves the networks’ performance compared to their single-task baselines. For training and evaluation, we combine an existing dataset for orientation estimation with a new, self-recorded dataset consisting of more than 235,000 depth patches, which is made publicly available to the research community.
|
|
15:00-15:15, Paper WeCT13.5 | |
>Unsupervised Domain Adaptation through Inter-Modal Rotation for RGB-D Object Recognition |
|
Loghmani, Mohammad Reza | Vienna University of Technology |
Robbiano, Luca | Politecnico Di Torino |
Planamente, Mirco | Italian Institute of Technology |
Park, Kiru | TU Wien |
Caputo, Barbara | Sapienza University |
Vincze, Markus | Vienna University of Technology |
Keywords: RGB-D Perception, Recognition, Deep Learning for Visual Perception
Abstract: Unsupervised Domain Adaptation (DA) exploits the supervision of a label-rich source dataset to make predictions on an unlabeled target dataset by aligning the two data distributions. In robotics, DA is used to take advantage of automatically generated synthetic data that come with "free" annotation, to make effective predictions on real data. However, existing DA methods are not designed to cope with the multi-modal nature of RGB-D data, which are widely used in robotic vision. We propose a novel RGB-D DA method that reduces the synthetic-to-real domain shift by exploiting the inter-modal relation between the RGB and depth image. Our method consists of training a convolutional neural network to solve, in addition to the main recognition task, the pretext task of predicting the relative rotation between the RGB and depth image. To evaluate our method and encourage further research in this area, we define two benchmark datasets for object categorization and instance recognition. With extensive experiments, we show the benefits of leveraging the inter-modal relations for RGB-D DA.
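A minimal sketch of such a rotation pretext task, assuming rotations restricted to multiples of 90 degrees (the function name and sampling scheme are illustrative, not the authors' code):

import numpy as np

def make_rotation_pretext_sample(rgb, depth):
    # Rotate the RGB and depth crops independently by k * 90 degrees.
    k_rgb = np.random.randint(4)
    k_depth = np.random.randint(4)
    rgb_rot = np.rot90(rgb, k_rgb, axes=(0, 1)).copy()
    depth_rot = np.rot90(depth, k_depth, axes=(0, 1)).copy()
    # The self-supervised label is the relative rotation between the two modalities.
    label = (k_depth - k_rgb) % 4
    return rgb_rot, depth_rot, label

Training a network to predict this relative rotation forces it to relate appearance and geometry across modalities, which is the inter-modal cue exploited for domain adaptation.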
|
|
15:15-15:30, Paper WeCT13.6 | |
>3D Instance Embedding Learning with a Structure-Aware Loss Function for Point Cloud Segmentation |
|
Liang, Zhidong | Shanghai Jiao Tong University |
Yang, Ming | Shanghai Jiao Tong University |
Li, Hao | Shanghai Jiao Tong University |
Wang, Chunxiang | Shanghai Jiaotong University |
Keywords: Semantic Scene Understanding, RGB-D Perception, AI-Based Methods
Abstract: This paper presents a framework for 3D instance segmentation on point clouds. A 3D convolutional neural network is used as the backbone to generate semantic predictions and instance embeddings simultaneously. In addition to the embedding information, point clouds also provide 3D geometric information which reflects the relation between points. Considering both types of information, a structure-aware loss function is proposed to achieve discriminative embeddings for each 3D instance. To eliminate the quantization error caused by 3D voxelization, an attention-based k-nearest neighbor (kNN) scheme is proposed. Different from the average strategy, it learns different weights for different neighbors to aggregate and update the instance embeddings. Our network can be trained in an end-to-end manner. Experiments show that our approach achieves state-of-the-art performance on two challenging datasets for instance segmentation.
|
|
WeCT14 |
Room T14 |
Object Detection |
Regular session |
Chair: Murillo, Ana Cristina | University of Zaragoza |
Co-Chair: Meyer, Gregory P. | Uber Advanced Technologies Group |
|
14:00-14:15, Paper WeCT14.1 | |
>Learning an Uncertainty-Aware Object Detector for Autonomous Driving |
|
Meyer, Gregory P. | Uber Advanced Technologies Group |
Thakurdesai, Niranjan | Georgia Institute of Technology |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: The capability to detect objects is a core part of autonomous driving. Due to sensor noise and incomplete data, perfectly detecting and localizing every object is infeasible. Therefore, it is important for a detector to provide the amount of uncertainty in each prediction. Providing the autonomous system with reliable uncertainties enables the vehicle to react differently based on the level of uncertainty. Previous work has estimated the uncertainty in a detection by predicting a probability distribution over object bounding boxes. In this work, we propose a method to improve the ability to learn the probability distribution by considering the potential noise in the ground-truth labeled data. Our proposed approach improves not only the accuracy of the learned distribution but also the object detection performance.
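A common way to realize such a distribution over box coordinates, given here only as a generic sketch (the paper additionally models noise in the ground-truth labels), is to predict a mean and variance per coordinate and train with the Gaussian negative log-likelihood
L(x, y) = (y - \mu(x))^2 / (2 \sigma(x)^2) + (1/2) \log \sigma(x)^2,
so that the network can report a large \sigma(x) wherever the data or the labels are ambiguous.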
|
|
14:15-14:30, Paper WeCT14.2 | |
>Leveraging Stereo-Camera Data for Real-Time Dynamic Obstacle Detection and Tracking |
> Video Attachment
|
|
Eppenberger, Thomas | ETH Zurich |
Cesari, Gianluca | ETH Zurich |
Dymczyk, Marcin Tomasz | ETH Zurich, Autonomous Systems Lab |
Siegwart, Roland | ETH Zurich |
Dubé, Renaud | ETH Zürich |
Keywords: RGB-D Perception, Object Detection, Segmentation and Categorization
Abstract: Dynamic obstacle avoidance is one crucial component for compliant navigation in crowded environments. In this paper we present a system for accurate and reliable detection and tracking of dynamic objects using noisy point cloud data generated by stereo cameras. Our solution is real-time capable and specifically designed for deployment on computationally-constrained unmanned ground vehicles. The proposed approach identifies individual objects in the robot’s surroundings and classifies them as either static or dynamic. The dynamic objects are labeled as either a person or a generic dynamic object. We then estimate their velocities to generate a 2D occupancy grid that is suitable for performing obstacle avoidance. We evaluate the system in indoor and outdoor scenarios and achieve real-time performance on a consumer-grade computer. On our test dataset, we reach a MOTP of 0.07 ± 0.07 m and a MOTA of 85.3% for the detection and tracking of dynamic objects. We reach a precision of 96.9% for the detection of static objects.
|
|
14:30-14:45, Paper WeCT14.3 | |
>Robust and Efficient Post-Processing for Video Object Detection |
> Video Attachment
|
|
Sabater, Alberto | Universidad De Zaragoza |
Montesano, Luis | Universidad De Zaragoza |
Murillo, Ana Cristina | University of Zaragoza |
Keywords: Object Detection, Segmentation and Categorization, Visual Tracking, Failure Detection and Recovery
Abstract: Object recognition in video is an important task for many applications, including autonomous driving perception, surveillance tasks, wearable devices and IoT networks. Object recognition using video data is more challenging than using still images due to blur, occlusions or rare object poses. Specific video detectors with high computational cost, or standard image detectors together with a fast post-processing algorithm, achieve the current state of the art. This work introduces a novel post-processing pipeline that overcomes some of the limitations of previous post-processing methods by introducing a learning-based similarity evaluation between detections across frames. Our method improves the results of state-of-the-art specific video detectors, especially regarding fast-moving objects, and presents low resource requirements. When applied to efficient still-image detectors such as YOLO, it provides results comparable to much more computationally intensive detectors.
|
|
14:45-15:00, Paper WeCT14.4 | |
>Modality-Buffet for Real-Time Object Detection |
|
Dorka, Nicolai | University of Freiburg |
Meyer, Johannes | University of Freiburg |
Burgard, Wolfram | Toyota Research Institute |
Keywords: Object Detection, Segmentation and Categorization, RGB-D Perception, Reinforcement Learning
Abstract: Real-time object detection in videos using lightweight hardware is a crucial component of many robotic tasks. Detectors using different modalities and with varying computational complexities offer different tradeoffs. One option is to have a very lightweight model that can predict from all modalities at once for each frame. However, in some situations (e.g. in static scenes) it might be better to have a more complex but more accurate model and to extrapolate from previous predictions for the frames coming in during the processing time. We formulate this as a sequential decision making problem and use reinforcement learning (RL) to generate a policy that decides from the RGB input which detector out of a portfolio of different object detectors to take for the next prediction. The objective of the RL agent is to maximize the accuracy of the predictions per image. We evaluate the approach on the Waymo Open Dataset and show that it exceeds the performance of each single detector.
|
|
15:00-15:15, Paper WeCT14.5 | |
>Deep Mixture Density Network for Probabilistic Object Detection |
|
He, Yihui | Carnegie Mellon University |
Wang, Jianren | Carnegie Mellon University |
Keywords: Object Detection, Segmentation and Categorization
Abstract: Mistakes/uncertainties in object detection could lead to catastrophes when deploying robots in the real world. In this paper, we measure the uncertainties of object localization to minimize this kind of risk. Uncertainties emerge in challenging cases such as occlusion. The bounding box borders of an occluded object can have multiple plausible configurations. We propose a deep multivariate mixture of Gaussians model for probabilistic object detection. The covariances help to learn the relationship between the borders, and the mixture components potentially learn different configurations of an occluded part. Quantitatively, our model improves the AP of the baselines by 3.9% and 1.4% on CrowdHuman and MSCOCO respectively, with almost no computational or memory overhead. Qualitatively, our model enjoys explainability since the resulting covariance matrices and the mixture components help to measure uncertainties.
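In generic mixture-density-network form (a sketch, not necessarily the exact parameterization used here), the model predicts mixture weights, means, and covariances so that
p(y | x) = \sum_{k=1}^{K} \pi_k(x) N(y; \mu_k(x), \Sigma_k(x)),
with y the vector of bounding box borders, and is trained by minimizing the negative log-likelihood -\log p(y | x); the full covariances \Sigma_k are what capture the correlations between borders mentioned in the abstract.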
|
|
15:15-15:30, Paper WeCT14.6 | |
>MLOD: Awareness of Extrinsic Perturbation in Multi-LiDAR 3D Object Detection for Autonomous Driving |
> Video Attachment
|
|
Jiao, Jianhao | The Hong Kong University of Science and Technology |
Yun, Peng | The Hong Kong University of Science and Technology |
Tai, Lei | Huawei Technologies |
Liu, Ming | Hong Kong University of Science and Technology |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Sensor Fusion
Abstract: Extrinsic perturbation always exists in multiple sensors. In this paper, we focus on the extrinsic uncertainty in multi-LiDAR systems for 3D object detection. We first analyze the influence of extrinsic perturbation on geometric tasks with two basic examples. To minimize the detrimental effect of extrinsic perturbation, we propagate an uncertainty prior on each point of input point clouds, and use this information to boost an approach for 3D geometric tasks. Then we extend our findings to propose a multi-LiDAR 3D object detector called MLOD. MLOD is a two-stage network where the multi-LiDAR information is fused through various schemes in stage one, and the extrinsic perturbation is handled in stage two. We conduct extensive experiments on a real-world dataset, and demonstrate both the accuracy and robustness improvement of MLOD. The code, data and supplementary materials are available at: https://ram-lab.com/file/site/mlod.
|
|
WeCT15 |
Room T15 |
Object Pose Estimation I |
Regular session |
Chair: Knoll, Alois | Tech. Univ. Muenchen TUM |
Co-Chair: Jenkins, Odest Chadwicke | University of Michigan |
|
14:00-14:15, Paper WeCT15.1 | |
>Active 6D Multi-Object Pose Estimation in Cluttered Scenarios with Deep Reinforcement Learning |
|
Sock, Juil | Imperial College London |
Garcia-Hernando, Guillermo | Imperial College London |
Kim, Tae-Kyun | Imperial College London |
Keywords: Object Detection, Segmentation and Categorization, Computer Vision for Other Robotic Applications, Visual-Based Navigation
Abstract: In this work, we explore how a strategic selection of camera movements can facilitate the task of 6D multi-object pose estimation in cluttered scenarios while respecting real-world constraints important in robotics and augmented reality applications, such as time and distance travelled. In the proposed framework, a set of multiple object hypotheses is given to an agent, which is inferred by an object pose estimator and subsequently spatio-temporally selected by a fusion function that makes use of a verification score that circumvents the need for ground-truth annotations. The agent reasons about these hypotheses, directing its attention to the object it is most uncertain about and moving the camera towards that object. Unlike previous works that propose short-sighted policies, our agent is trained in simulated scenarios using reinforcement learning, attempting to learn the camera moves that produce the most accurate object pose hypotheses for a given temporal and spatial budget, without the need for viewpoint rendering during inference. Our experiments show that the proposed approach successfully estimates the 6D object pose of a stack of objects in both challenging cluttered synthetic and real scenarios, showing superior performance compared to other baselines.
|
|
14:15-14:30, Paper WeCT15.2 | |
>6D Pose Estimation for Flexible Production with Small Lot Sizes Based on CAD Models Using Gaussian Process Implicit Surfaces |
|
Lin, Jianjie | Fortiss, An-Institut Technische Universität München |
Rickert, Markus | Fortiss, An-Institut Technische Universität München |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Perception for Grasping and Manipulation
Abstract: We propose a surface-to-surface (S2S) point registration algorithm that exploits Gaussian Process Implicit Surfaces for partially overlapping 3D surfaces to estimate the 6D pose transformation. Unlike traditional approaches that separate the correspondence search and update steps in the inner loop, we formulate point registration as a nonlinear unconstrained optimization problem which does not explicitly use any corresponding points between the two point sets. According to the implicit function theorem, we form one point set as a Gaussian Process Implicit Surface utilizing the signed distance function, which implicitly creates three manifolds. Points on the same manifold share the same function value, indicated as {1, 0, -1}. The problem is thus converted into finding a rigid transformation that minimizes the inherent function value. This can be solved by using a Gauss-Newton (GN) or Levenberg-Marquardt (LM) solver. In the case of partially overlapping 3D surfaces, the Fast Point Feature Histogram (FPFH) algorithm is applied to both point sets and a Principal Component Analysis (PCA) is performed on the result. Based on this, the initial transformation can then be computed. We conduct experiments on multiple point sets to evaluate the effectiveness of our proposed approach against existing state-of-the-art methods.
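Schematically, the registration described above amounts to (a condensed form, with notation chosen here rather than taken from the paper)
\min_{R, t} \sum_i \mu( R x_i + t )^2,
where \mu(\cdot) is the posterior mean of the Gaussian Process Implicit Surface fitted to the target set with function values in {1, 0, -1} on the offset and surface manifolds, and the sum runs over the source points; the resulting nonlinear least-squares problem is then solved with a GN or LM iteration starting from the FPFH/PCA-based initial transformation.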
|
|
14:30-14:45, Paper WeCT15.3 | |
>Learning Orientation Distributions for Object Pose Estimation |
> Video Attachment
|
|
Okorn, Brian | Carnegie Mellon University |
Xu, Mengyun | Carnegie Mellon University |
Hebert, Martial | CMU |
Held, David | Carnegie Mellon University |
Keywords: Object Detection, Segmentation and Categorization, RGB-D Perception, Deep Learning for Visual Perception
Abstract: For robots to operate robustly in the real world, they should be aware of their uncertainty. However, most methods for object pose estimation return a single point estimate of the object's pose. In this work, we propose two learned methods for estimating a distribution over an object's orientation. Our methods take into account both the inaccuracies in the pose estimation as well as the object symmetries. Our first method, which regresses from deep learned features to an isotropic Bingham distribution, gives the best performance for orientation distribution estimation for non-symmetric objects. Our second method learns to compare deep features and generates a non-parametric histogram distribution. This method gives the best performance on objects with unknown symmetries, accurately modeling both symmetric and non-symmetric objects, without any requirement of symmetry annotation. We show that both of these methods can be used to augment an existing pose estimator. Our evaluation compares our methods to a large number of baseline approaches for uncertainty estimation across a variety of different types of objects. Code available at https://bokorn.github.io/orientation-distributions/
|
|
14:45-15:00, Paper WeCT15.4 | |
>Estimation of Object Class and Orientation from Multiple Viewpoints and Relative Camera Orientation Constraints |
|
Ogawara, Koichi | Wakayama University |
Iseki, Keita | Wakayama University |
Keywords: Object Detection, Segmentation and Categorization, Visual Learning, Service Robots
Abstract: In this research, we propose a method of estimating object class and orientation given multiple input images assuming the relative camera orientations are known. Input images are transformed to descriptors on 2-D manifolds defined for each class of object through a CNN, and the object class and orientation that minimize the distance between input descriptors and the descriptors associated with the estimated object class and orientation are selected. The object orientation is further optimized by interpolating the viewpoints in the database. The usefulness of the proposed method is demonstrated by comparative evaluation with other methods using publicly available datasets. The usefulness of the proposed method is also demonstrated by recognizing images taken from the cameras on our humanoid robot using our own dataset.
|
|
15:00-15:15, Paper WeCT15.5 | |
>Parts-Based Articulated Object Localization in Clutter Using Belief Propagation |
> Video Attachment
|
|
Pavlasek, Jana | University of Michigan |
Lewis, Stanley | University of Michigan |
Desingh, Karthik | University of Michigan |
Jenkins, Odest Chadwicke | University of Michigan |
Keywords: Perception for Grasping and Manipulation, RGB-D Perception
Abstract: Robots working in human environments must be able to perceive and act on challenging objects with articulations, such as a pile of tools. Articulated objects increase the dimensionality of the pose estimation problem, and partial observations under clutter create additional challenges. To address this problem, we present a generative-discriminative parts-based recognition and localization method for articulated objects in clutter. We formulate the problem of articulated object pose estimation as a Markov Random Field (MRF). Hidden nodes in this MRF express the pose of the object parts, and edges express the articulation constraints between parts. Localization is performed within the MRF using an efficient belief propagation method. The method is informed by both part segmentation heatmaps over the observation, generated by a neural network, and the articulation constraints between object parts. Our generative-discriminative approach allows the proposed method to function in cluttered environments by inferring the pose of occluded parts using hypotheses from the visible parts. We demonstrate the efficacy of our methods in a tabletop environment for recognizing and localizing hand tools in uncluttered and cluttered configurations.
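In a generic pairwise form (a sketch of the standard formulation, not the authors' exact potentials), such an MRF factorizes as
p(X | Z) \propto \prod_i \phi_i(x_i, Z) \prod_{(i,j)} \psi_{ij}(x_i, x_j),
where x_i is the pose of part i, the unary potentials \phi_i come from the part segmentation heatmaps, and the pairwise potentials \psi_{ij} encode the articulation constraints; belief propagation then iterates messages of the form m_{i \to j}(x_j) \propto \int \psi_{ij}(x_i, x_j) \phi_i(x_i, Z) \prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(x_i) dx_i to obtain the per-part beliefs, so that visible parts can pull occluded parts toward consistent poses.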
|
|
15:15-15:30, Paper WeCT15.6 | |
>PrimA6D: Rotational Primitive Reconstruction for Enhanced and Robust 6D Pose Estimation |
> Video Attachment
|
|
Jeon, Myung-Hwan | KAIST |
Kim, Ayoung | Korea Advanced Institute of Science Technology |
Keywords: Perception for Grasping and Manipulation, Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: In this paper, we introduce a rotational primitive prediction based 6D object pose estimation method that uses a single image as input. We solve for the 6D object pose of a known object relative to the camera using a single image with occlusion. Many recent state-of-the-art (SOTA) two-step approaches have exploited image keypoint extraction followed by PnP regression for pose estimation. Instead of relying on bounding boxes or keypoints on the object, we propose to learn an orientation-induced primitive so as to achieve accurate pose estimation regardless of the object size. We leverage a Variational AutoEncoder (VAE) to learn this underlying primitive and its associated keypoints. The keypoints inferred from the reconstructed primitive image are then used to regress the rotation using PnP. Lastly, we compute the translation in a separate localization module to complete the entire 6D pose estimation. When evaluated on public datasets, the proposed method yields a notable improvement on the LINEMOD, the Occlusion LINEMOD, and the YCB-Video datasets. We further provide a synthetic-only trained case presenting comparable performance to the existing methods which require real images in the training phase.
|
|
WeCT16 |
Room T16 |
Object Pose Estimation II |
Regular session |
Chair: Likhachev, Maxim | Carnegie Mellon University |
Co-Chair: Behnke, Sven | University of Bonn |
|
14:00-14:15, Paper WeCT16.1 | |
>3D Gaze Estimation for Head-Mounted Devices Based on Visual Saliency |
|
Liu, Meng | City University of Hong Kong |
Li, You-Fu | City University of Hong Kong |
Liu, Hai | City University of Hong Kong |
Keywords: Visual Tracking, Computer Vision for Other Robotic Applications, Wearable Robots
Abstract: Compared with the maturity of 2D gaze tracking technology, 3D gaze tracking has gradually become a research hotspot in recent years. The head-mounted gaze tracker has shown great potential for gaze estimation in 3D space due to its appealing flexibility and portability. The general challenge for 3D gaze tracking algorithms is that calibration is necessary before use, and calibration targets cannot be easily applied in some situations or might be blocked by moving humans and objects. Besides, accuracy along the depth direction has always been a crucial problem. Regarding the issues mentioned above, a 3D gaze estimation method with auto-calibration is proposed in this study. We use an RGBD camera as the scene camera to acquire an accurate 3D structure of the environment. The automatic calibration is achieved by uniting gaze vectors with saliency maps of the scene aligned with depth information. Finally, we determine the 3D gaze point through a point cloud generated from the RGBD camera. The experimental results demonstrate that our proposed method achieves an average angular error of 4.34° in the range from 0.5 m to 3 m and an average depth error of 23.22 mm, which is sufficient for 3D gaze estimation in real scenes.
|
|
14:15-14:30, Paper WeCT16.2 | |
>Category-Level 3D Non-Rigid Registration from Single-View RGB Images |
> Video Attachment
|
|
Rodriguez, Diego | University of Bonn |
Huber, Florian | University of Bonn |
Behnke, Sven | University of Bonn |
Keywords: Perception for Grasping and Manipulation, Deep Learning in Grasping and Manipulation, Object Detection, Segmentation and Categorization
Abstract: In this paper, we propose a novel approach to solve the 3D non-rigid registration problem from RGB images using Convolutional Neural Networks (CNNs). The goal is to deform a given 3D canonical model in a non-rigid manner such that it matches a novel instance observed by a single-view RGB image. As a result of the registration, the observed model is reconstructed. This is done by training a CNN that infers a deformation field for the canonical model and by employing a shape (latent) space for inferring the deformations of the occluded parts of the object. Our method not only reconstructs a novel observed object, but also provides a deformation field from a canonical model, which can be used to transfer knowledge between instances, e.g., grasping skills. Because our method does not need depth information, it can register objects that are typically hard to perceive with RGB-D sensors (e.g., transparent bottles). Even without depth data, our approach outperforms the Coherent Point Drift (CPD) registration method for the evaluated object categories.
|
|
14:30-14:45, Paper WeCT16.3 | |
>Relative Pose Estimation and Planar Reconstruction Via Superpixel-Driven Multiple Homographies |
> Video Attachment
|
|
Wang, Xi | INRIA Rennes, IRISA |
Christie, Marc | Université De Rennes 1 |
Marchand, Eric | Univ Rennes, Inria, CNRS, IRISA |
Keywords: Visual Tracking, Mapping, SLAM
Abstract: This paper proposes a novel method to simultaneously perform relative camera pose estimation and planar reconstruction of a scene from two RGB images. We start by extracting and matching superpixel information from both images and rely on a novel multi-model RANSAC approach to estimate multiple homographies from superpixels and identify matching planes. Ambiguity issues when performing homography decomposition are handled by proposing a voting system to more reliably estimate relative camera pose and plane parameters. A non-linear optimization process is also proposed to perform bundle adjustment that exploits a joint representation of homographies and works both for image pairs and whole sequences of images (vSLAM). As a result, the approach provides a means to perform a dense 3D plane reconstruction from two RGB images only, without relying on RGB-D inputs or strong priors such as Manhattan assumptions, and can be extended to handle sequences of images. Our results compete with keypoint-based techniques such as ORB-SLAM while providing a dense representation, and are more precise than the direct and semi-direct pose estimation techniques used in LSD-SLAM or DPPTAM.
|
|
14:45-15:00, Paper WeCT16.4 | |
>PERCH 2.0: Fast and Accurate GPU-Based Perception Via Search for Object Pose Estimation
> Video Attachment
|
|
Agarwal, Aditya | Carnegie Mellon University |
Han, Yupeng | Carnegie Mellon University |
Likhachev, Maxim | Carnegie Mellon University |
Keywords: Perception for Grasping and Manipulation, RGB-D Perception
Abstract: Pose estimation of known objects is fundamental to tasks such as robotic grasping and manipulation. The need for reliable grasping imposes stringent accuracy requirements on pose estimation in cluttered, occluded scenes in dynamic environments. Modern methods employ large sets of training data to learn features in order to find correspondence between 3D models and observed data. However these methods require extensive annotation of ground truth poses. An alternative is to use algorithms that search for the best explanation of the observed scene in a space of possible rendered scenes. A recently developed algorithm, PERCH (PErception Via SeaRCH) does so by using depth data to converge to a globally optimum solution using a search over a specially constructed tree. While PERCH offers strong guarantees on accuracy, the current formulation suffers from low scalability owing to its high runtime. In addition, the sole reliance on depth data for pose estimation restricts the algorithm to scenes where no two objects have the same shape. In this work, we propose PERCH 2.0, a novel perception via search strategy that takes advantage of GPU acceleration and RGB data. We show that our approach can achieve a speedup of 100x over PERCH, as well as better accuracy than the state-of-the-art data-driven approaches on 6-DoF pose estimation without the need for annotating ground truth poses in the training data. Our code and video are available at https://sbpl-cruz.github.io/perception/
|
|
15:00-15:15, Paper WeCT16.5 | |
>Seeing through the Occluders: Robust Monocular 6-DOF Object Pose Tracking Via Model-Guided Video Object Segmentation |
|
Zhong, Leisheng | Tsinghua University |
Zhang, Yu | Tsinghua University |
Zhao, Hao | Tsinghua University |
Chang, An | Tsinghua University |
Xiang, Wenhao | Systems Engineering Research Institute, CSSC |
Zhang, Shunli | Beijing Jiaotong University |
Zhang, Li | Tsinghua University |
Keywords: Computer Vision for Other Robotic Applications, Visual Tracking, Virtual Reality and Interfaces
Abstract: Dealing with occlusion is one of the most challenging problems for monocular 6-DOF object pose tracking. In this paper, we propose a novel 6-DOF object pose tracking method which is robust to heavy occlusions. When the tracked object is occluded by another object, instead of trying to detect the occluder, we seek to see through it, as if the occluder did not exist. To this end, we propose to combine a learning-based video object segmentation module with an optimization-based pose estimation module in a closed loop. Firstly, a model-guided video object segmentation network is utilized to predict the accurate and full mask of the object (including the occluded part). Secondly, a non-linear 6-DOF pose optimization method is performed with the guidance of the predicted full mask. After solving the current object pose, we render the 3D object model to obtain a refined, model-constrained mask of the current frame, which is then fed back to the segmentation network for processing the next frame, closing the whole loop. Experiments show that the proposed method outperforms the state of the art by a large margin when dealing with heavy occlusions, and can handle extreme cases in which previous methods fail.
|
|
15:15-15:30, Paper WeCT16.6 | |
>VeREFINE: Integrating Object Pose Verification with Physics-Guided Iterative Refinement |
|
Bauer, Dominik | TU Wien |
Patten, Timothy | TU Wien |
Vincze, Markus | Vienna University of Technology |
Keywords: Perception for Grasping and Manipulation, Simulation and Animation, AI-Based Methods
Abstract: Accurate and robust object pose estimation for robotics applications requires verification and refinement steps. In this work, we propose to integrate hypotheses verification with object pose refinement guided by physics simulation. This allows the physical plausibility of individual object pose estimates and the stability of the estimated scene to be considered in a unified optimization. The proposed method is able to adapt to scenes of multiple objects and efficiently focuses on refining the most promising object poses in multi-hypotheses scenarios. We call this integrated approach VeREFINE and evaluate it on three datasets with varying scene complexity. The generality of the approach is shown by using three state-of-the-art pose estimators and three baseline refiners. Results show improvements over all baselines and on all datasets. Furthermore, our approach is applied in real-world grasping experiments and outperforms competing methods in terms of grasp success rate.
|
|
WeCT17 |
Room T17 |
Robot Perception I |
Regular session |
Chair: Belagiannis, Vasileios | Universität Ulm |
Co-Chair: Linkowski, Gregory | Ford Motor Company |
|
14:00-14:15, Paper WeCT17.1 | |
>Laser2Vec: Similarity-Based Retrieval for Robotic Perception Data |
|
Nashed, Samer | University of Massachusetts Amherst |
Keywords: Big Data in Robotics and Automation, Semantic Scene Understanding, Service Robots
Abstract: As mobile robot capabilities improve and deployment times increase, tools to analyze the growing volume of data are becoming necessary. Current state-of-the-art logging, playback, and exploration systems are insufficient for practitioners seeking to discover systemic points of failure in robotic systems. This paper presents a suite of algorithms for similarity-based queries of robotic perception data and implements a system for storing 2D LiDAR data from many deployments cheaply and evaluating top-k queries for complete or partial scans efficiently. We generate compressed representations of laser scans via a convolutional variational autoencoder and store them in a database, where a lightweight dense network for distance function approximation is run at query time. Our query evaluator leverages the local continuity of the embedding space to generate evaluation orders that, in expectation, dominate full linear scans of the database. The accuracy, robustness, scalability, and efficiency of our system are tested on real-world data gathered from dozens of deployments and synthetic data generated by corrupting real data. We find our system accurately and efficiently identifies similar scans across a number of episodes where the robot encountered the same location, or similar indoor structures or objects.
|
|
14:15-14:30, Paper WeCT17.2 | |
>Occlusion Handling for Industrial Robots |
|
Zhu, Ling | Ford Motor Company |
Menon, Meghna | Ford Motor Company |
Santillo, Mario | Ford Motor Company |
Linkowski, Gregory | Ford Motor Company |
Keywords: Cooperating Robots, Computer Vision for Manufacturing, Industrial Robots
Abstract: Industrial robots contain minimal sensing capability beyond recognition of their internal state. It is critical that an external vision system cover the designated robot workspace with awareness of blind spots and occlusions. This work presents two mechanisms to handle occlusions in an external multi-robot vision system: occlusion-aware optimal sensor positioning, and event-driven occlusion detection. When deploying sensors to the system, various scenarios are considered during optimization to reduce potential occlusions and increase sensor coverage. These methods are tested on a work cell with three industrial robot arms. The experimental results demonstrate the effectiveness of the proposed scenario-based multi-objective optimization for sensor positioning. Once the sensors are deployed, occlusion detection is actively triggered prior to robot path planning.
|
|
14:30-14:45, Paper WeCT17.3 | |
>Automatic Targetless Extrinsic Calibration of Multiple 3D LiDARs and Radars |
|
Heng, Lionel | DSO National Laboratories |
Keywords: Calibration and Identification, Field Robots
Abstract: Many self-driving vehicles use a multi-sensor system comprising multiple 3D LiDAR and radar sensors for robust all-round perception. Precise calibration of this multi-sensor system is a critical prerequisite for accurate perception data which facilitates safe operation of self-driving vehicles in highly dynamic urban environments. This paper proposes the first-known automatic targetless method for extrinsic calibration of multiple 3D LiDAR and radar sensors, and which only requires the vehicle to be driven over a short distance. The proposed method first estimates the 6-DoF pose of each LiDAR sensor with respect to the vehicle reference frame by minimizing point-to-plane distances between scans from different LiDAR sensors. In turn, a 3D map of the environment is built using data from all calibrated LiDAR sensors on the vehicle. We find the 6-DoF pose of each radar sensor with respect to the vehicle reference frame by minimizing (1) point-to-plane distances between radar scans and the 3D map, and (2) radial velocity errors. Our proposed calibration method does not require overlapping fields of view between LiDAR and radar sensors. Real-world experiments demonstrate the accuracy and repeatability of the proposed calibration method.
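In schematic form, the LiDAR part of such a calibration minimizes point-to-plane residuals of the kind
\min_{T_1, \dots, T_L} \sum_i ( n_i^{\top} ( T_{l(i)} p_i - q_i ) )^2,
where T_l is the 6-DoF pose of LiDAR l in the vehicle frame, p_i is a point observed by that LiDAR, and (q_i, n_i) are the matched point and local surface normal from another LiDAR's scan or the aggregated map (notation chosen here for illustration, not taken from the paper); the radar poses are found analogously by additionally including the radial-velocity residuals mentioned in the abstract.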
|
|
14:45-15:00, Paper WeCT17.4 | |
>Traffic Control Gesture Recognition for Autonomous Vehicles |
> Video Attachment
|
|
Wiederer, Julian | Mercedes-Benz AG |
Bouazizi, Arij | Mercedes Benz AG |
Kressel, Ulrich | Mercedes-Benz AG |
Belagiannis, Vasileios | Universität Ulm |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Gesture, Posture and Facial Expressions, Autonomous Vehicle Navigation
Abstract: A car driver knows how to react to the gestures of traffic officers. Clearly, this is not the case for an autonomous vehicle, unless it has road traffic control gesture recognition functionalities. In this work, we address the limitation of existing autonomous driving datasets in providing learning data for traffic control gesture recognition. We introduce a dataset that is based on 3D body skeleton input to perform traffic control gesture classification at every time step. Our dataset consists of 250 sequences from several actors, ranging from 16 to 90 seconds per sequence. To evaluate our dataset, we propose eight sequential processing models based on deep neural networks such as recurrent networks, attention mechanisms, temporal convolutional networks and graph convolutional networks. We present an extensive evaluation and analysis of all approaches on our dataset, as well as real-world quantitative evaluation. The code and dataset are publicly available.
|
|
15:00-15:15, Paper WeCT17.5 | |
>SelfieDroneStick: A Natural Interface for Quadcopter Photography |
> Video Attachment
|
|
Alabachi, Saif | University of Central Florida |
Sukthankar, Gita | University of Central Florida |
Sukthankar, Rahul | Google |
Keywords: Human-Centered Robotics, Aerial Systems: Perception and Autonomy, Reinforcement Learning
Abstract: A physical selfie stick extends the user's reach, enabling the acquisition of personal photos that include more of the background scene. Conversely a quadcopter can capture photos from vantage points unattainable by the user, but teleoperating a quadcopter to a good viewpoint is a non-trivial task. This paper presents a natural interface for quadcopter photography, the Selfie Drone Stick that allows the user to guide the quadcopter to the optimal vantage point based on the phone's sensors. Users specify the composition of their desired long-range selfies using their smartphone, and the quadcopter autonomously flies to a sequence of vantage points from where the desired shots can be taken. The robot controller is trained from a combination of real-world images and simulated flight data. This paper describes two key innovations required to deploy deep reinforcement learning models on a real robot: 1) an abstract state representation for transferring learning from simulation to the hardware platform, and 2) reward shaping and staging paradigms for training the controller. Both of these improvements were found to be essential in learning a robot controller from simulation that transfers successfully to the real robot.
|
|
WeCT18 |
Room T18 |
Robot Perception II |
Regular session |
Chair: Chan, Stanley | Purdue University |
Co-Chair: Savatier, Xavier | Irseem Ea 4353 |
|
14:00-14:15, Paper WeCT18.1 | |
>Autonomous RGBD-Based Industrial Staircase Localization from Tracked Robots |
|
Fourre, Jeremy | Esigelec |
Vauchey, Vincent | ESIGELEC |
Dupuis, Yohan | ESIGELEC |
Savatier, Xavier | Irseem Ea 4353 |
Keywords: Field Robots, RGB-D Perception, Industrial Robots
Abstract: This paper presents an industrial staircase localization algorithm based on RGBD data from a tracked robot. This setting is particularly challenging as the camera is placed close to the ground. Moreover, RGBD data can be very noisy on sparse staircases. Contrary to existing works, our evaluation relies on ground truth data provided by a motion capture system. Our experiments suggest that our algorithm can robustly locate industrial staircases. We also propose a new framework to evaluate stair localization performance from RGBD data. The overall performance allows a robot to be safely controlled to reach the staircase.
|
|
14:15-14:30, Paper WeCT18.2 | |
>EU Long-Term Dataset with Multiple Sensors for Autonomous Driving |
|
Yan, Zhi | University of Technology of Belfort-Montbéliard (UTBM) |
Sun, Li | University of Sheffield |
Krajník, Tomáš | Czech Technical University |
Ruichek, Yassine | University of Technology of Belfort-Montbeliard - France |
Keywords: Multi-Modal Perception, Localization, Software, Middleware and Programming Environments
Abstract: The field of autonomous driving has grown tremendously over the past few years, along with the rapid progress in sensor technology. One of the major purposes of using sensors is to provide environment perception for vehicle understanding, learning and reasoning, and ultimately interacting with the environment. In this paper, we first introduce a multi-sensor platform allowing the vehicle to perceive its surroundings and locate itself in a more efficient and accurate way. The platform integrates eleven heterogeneous sensors including various cameras and lidars, a radar, an IMU (Inertial Measurement Unit), and a GPS-RTK (Global Positioning System / Real-Time Kinematic), and exploits ROS (Robot Operating System) based software to process the sensory data. Then, we present a new dataset (https://epan-utbm.github.io/utbm_robocar_dataset/) for autonomous driving, capturing many new research challenges (e.g. highly dynamic environments), especially for long-term autonomy (e.g. creating and maintaining maps). The dataset was collected with our instrumented vehicle and is publicly available to the community.
|
|
14:30-14:45, Paper WeCT18.3 | |
>Interacting Multiple Model Navigation System for Quadrotor Micro Aerial Vehicles Subject to Rotor Drag |
> Video Attachment
|
|
Gomaa, Mahmoud A.K. | Memorial University of Newfoundland |
De Silva, Oscar | Memorial University of Newfoundland |
Mann, George K. I. | Memorial University of Newfoundland |
Gosine, Raymond G. | Memorial University of Newfoundland |
Keywords: Localization, Autonomous Vehicle Navigation, Aerial Systems: Perception and Autonomy
Abstract: This paper presents the design of an Interacting Multiple Model (IMM) filter for improved navigation performance of Micro Aerial Vehicles (MAVs). The paper considers a navigation system that incorporates rotor drag dynamics and proposes a strategy to overcome the sensitivity of the system to external wind disturbances. Two error-state Kalman filters are incorporated in an IMM filtering framework. The first filter has a model that uses conventional Inertial Navigation System (INS) mechanization equations, while the second filter considers a dynamic model with rotor drag forces of the MAV. In order to support the two error-state Kalman filters, the generic IMM algorithm [1] is modified for error-state implementation, handling of dissimilar state definitions, and adaptive switching during operation. Numerical simulations and experimental validation using the EuRoC dataset are conducted to evaluate the performance of the proposed IMM filter design for changing flight conditions and external wind disturbance scenarios.
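For reference, the mixing step of the standard IMM cycle that the error-state variant builds on is (textbook form, before the modifications described above)
\mu_{i|j} = p_{ij} \mu_i / \bar{c}_j, with \bar{c}_j = \sum_i p_{ij} \mu_i,
\hat{x}_{0j} = \sum_i \mu_{i|j} \hat{x}_i, P_{0j} = \sum_i \mu_{i|j} ( P_i + (\hat{x}_i - \hat{x}_{0j})(\hat{x}_i - \hat{x}_{0j})^{\top} ),
where p_{ij} are the Markov model-transition probabilities and \mu_i the current model probabilities; each filter j is then propagated from its mixed estimate (\hat{x}_{0j}, P_{0j}), and the model probabilities are re-weighted by the individual filter likelihoods.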
|
|
14:45-15:00, Paper WeCT18.4 | |
>Who Make Drivers Stop? Towards Driver-Centric Risk Assessment: Risk Object Identification Via Causal Inference
> Video Attachment
|
|
Li, Chengxi | Purdue University |
Chan, Stanley | Purdue University |
Chen, Yi-Ting | Honda Research Institute USA |
Keywords: Perception-Action Coupling, Intelligent Transportation Systems, Computer Vision for Transportation
Abstract: A significant number of people die in road accidents due to driver errors. To reduce fatalities, developing intelligent driving systems that assist drivers in identifying potential risks is urgently needed. In existing works, risky situations are generally defined based on collision prediction. However, collision is only one source of potential risk, and a more generic definition is required. In this work, we propose a novel driver-centric definition of risk, i.e., objects influencing drivers’ behavior are risky. A new task called risk object identification is introduced. We formulate the task as a cause-effect problem and present a novel two-stage risk object identification framework based on causal inference with the proposed object-level manipulable driving model. We demonstrate favorable performance on risk object identification compared with strong baselines on the Honda Research Institute Driving Dataset (HDD). Our framework achieves a substantial average performance boost of 7.5% over a strong baseline.
|
|
WeCT19 |
Room T19 |
Range Sensing and Deep Learning |
Regular session |
Chair: Lee, Gim Hee | National University of Singapore |
Co-Chair: Baur, Stefan Andreas | Mercedes-Benz AG |
|
14:00-14:15, Paper WeCT19.1 | |
>Point Cloud Completion by Learning Shape Priors |
|
Wang, Xiaogang | National University of Singapore |
Ang Jr, Marcelo H | National University of Singapore |
Lee, Gim Hee | National University of Singapore |
Keywords: Deep Learning for Visual Perception, Novel Deep Learning Methods
Abstract: In view of the difficulty in reconstructing object details in point cloud completion, we propose a shape prior learning method for object completion. The shape priors include geometric information in both the complete and the partial point clouds. We design a feature alignment strategy to learn the shape prior from complete points, and a coarse-to-fine strategy to incorporate the partial prior in the fine stage. To learn the complete object prior, we first train a point cloud auto-encoder to extract the latent embeddings from complete points. Then we learn a mapping to transfer the point features from partial points to those of the complete points by optimizing feature alignment losses. The feature alignment losses consist of an L2 distance and an adversarial loss obtained by a Maximum Mean Discrepancy Generative Adversarial Network (MMD-GAN). The L2 distance optimizes the partial features towards the complete ones in the feature space, and MMD-GAN decreases the statistical distance of the two point features in a Reproducing Kernel Hilbert Space. We achieve state-of-the-art performance on the point cloud completion task. Our code is available at https://github.com/xiaogangw/point-cloud-completion-shape-prior.
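For context, the MMD term referenced above has the standard kernel form
MMD^2(P, Q) = E_{x, x' ~ P}[ k(x, x') ] + E_{y, y' ~ Q}[ k(y, y') ] - 2 E_{x ~ P, y ~ Q}[ k(x, y) ],
which measures the distance between the partial-feature distribution P and the complete-feature distribution Q in the Reproducing Kernel Hilbert Space induced by the kernel k (a generic statement of the quantity, not the paper's particular estimator).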
|
|
14:15-14:30, Paper WeCT19.2 | |
>ECG: Edge-Aware Point Cloud Completion with Graph Convolution |
|
Pan, Liang | National University of Singapore |
Keywords: Deep Learning for Visual Perception, Computer Vision for Other Robotic Applications, Perception for Grasping and Manipulation
Abstract: Scanned 3D point clouds for real-world scenes often suffer from noise and incompletion. Observing that prior point cloud shape completion networks overlook local geometric features, we propose our ECG - an Edge-aware point cloud Completion network with Graph convolution, which facilitates fine-grained 3D point cloud shape generation with multi-scale edge features. Our ECG consists of two consecutive stages: 1) skeleton generation and 2) details refinement. Each stage is a generation sub-network conditioned on the input incomplete point cloud. The first stage generates coarse skeletons to facilitate capturing useful edge features against noisy measurements. Subsequently, we design a deep hierarchical encoder with graph convolution to propagate multi-scale edge features for local geometric details refinement. To preserve local geometrical details while upsampling, we propose the Edge-aware Feature Expansion (EFE) module to smoothly expand/upsample point features by emphasizing their local edges. Extensive experiments show that our ECG significantly outperforms previous state-of-the-art methods for point cloud completion.
|
|
14:30-14:45, Paper WeCT19.3 | |
>PillarFlowNet: A Real-Time Deep Multitask Network for LiDAR-Based 3D Object Detection and Scene Flow Estimation |
|
Duffhauss, Fabian | Bosch Center for Artificial Intelligence |
Baur, Stefan Andreas | Mercedes-Benz AG |
Keywords: Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization, Computer Vision for Transportation
Abstract: Mobile robotic platforms require a precise understanding about other agents in their surroundings as well as their respective motion in order to operate safely. Scene flow in combination with object detection can be used to achieve this understanding. Together, they provide valuable cues for behavior prediction of other agents and thus ultimately are a good basis for the ego-vehicle's behavior planning algorithms. Traditionally, scene flow estimation and object detection are handled by separate deep networks requiring immense computational resources. In this work, we propose PillarFlowNet, a novel method for simultaneous LiDAR scene flow estimation and object detection with low latency and high precision based on a single network. In our experiments on the KITTI dataset, PillarFlowNet achieves a 16.3 percentage points higher average precision score as well as a 21.4 percent reduction in average endpoint error for scene flow compared to the state-of-the-art in multitask LiDAR object detection and scene flow estimation. Furthermore, our method is significantly faster than previous methods, making it the first to be applicable for real-time systems.
|
|
14:45-15:00, Paper WeCT19.4 | |
>Monocular Depth Prediction through Continuous 3D Loss |
> Video Attachment
|
|
Zhu, Minghan | University of Michigan |
Ghaffari, Maani | Univ. of Michigan |
Zhong, Yuanxin | University of Michigan |
Lu, Pingping | University of Michigan |
Cao, Zhong | Tsinghua University |
Eustice, Ryan | University of Michigan |
Peng, Huei | University of MIchigan |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Range Sensing
Abstract: This paper reports a new continuous 3D loss function for learning depth from monocular images. The dense depth prediction from a monocular image is supervised using sparse LIDAR points, which enables us to leverage available open source datasets with camera-LIDAR sensor suites during training. Currently, accurate and affordable range sensors are not readily available. Stereo cameras and LIDARs measure depth either inaccurately or sparsely and at high cost. In contrast to the current point-to-point loss evaluation approach, the proposed 3D loss treats point clouds as continuous objects; therefore, it compensates for the lack of dense ground truth depth due to the LIDAR's sparse measurements. We applied the proposed loss in three state-of-the-art monocular depth prediction approaches: DORN, BTS, and Monodepth2. Experimental evaluation shows that the proposed loss improves the depth prediction accuracy and produces point clouds with more consistent 3D geometric structures compared with all tested baselines, implying the benefit of the proposed loss on general depth prediction networks. A video demo of this work is available at https://youtu.be/5HL8BjSAY4Y.
|
|
15:00-15:15, Paper WeCT19.5 | |
>MSDPN: Monocular Depth Prediction with Partial Laser Observation Using Multi-Stage Neural Networks |
> Video Attachment
|
|
Lim, Hyungtae | Korea Advanced Institute of Science and Technology |
Gil, Hyeonjae | KAIST |
Myung, Hyun | KAIST (Korea Adv. Inst. Sci. & Tech.) |
Keywords: Deep Learning for Visual Perception, Sensor Fusion, Range Sensing
Abstract: In this study, a deep-learning-based multi-stage network architecture called Multi-Stage Depth Prediction Network (MSDPN) is proposed to predict a dense depth map using a 2D LiDAR and a monocular camera. Our proposed network consists of a multi-stage encoder-decoder architecture and Cross Stage Feature Aggregation (CSFA). The proposed multi-stage encoder-decoder architecture alleviates the partial observation problem caused by the characteristics of a 2D LiDAR, and CSFA prevents the multi-stage network from diluting the features and allows the network to learn the interspatial relationship between features better. Previous works use sub-sampled data from the ground truth as an input rather than actual 2D LiDAR data. In contrast, our approach trains the model and conducts experiments with a physically-collected 2D LiDAR dataset. To this end, we acquired our own dataset called KAIST RGBD-scan dataset and validated the effectiveness and the robustness of MSDPN under realistic conditions. As verified experimentally, our network yields promising performance against state-of-the-art methods. Additionally, we analyzed the performance of different input methods and confirmed that the reference depth map is robust in untrained scenarios.
|
|
WeCT20 |
Room T20 |
Range Sensing |
Regular session |
Chair: Zhu, Yuhao | University of Rochester |
Co-Chair: Kim, Ayoung | Korea Advanced Institute of Science Technology |
|
14:00-14:15, Paper WeCT20.1 | |
>Remove, Then Revert: Static Point Cloud Map Construction Using Multiresolution Range Images |
> Video Attachment
|
|
Kim, Giseop | KAIST(Korea Advanced Institute of Science and Technology) |
Kim, Ayoung | Korea Advanced Institute of Science Technology |
Keywords: Mapping, Range Sensing, Localization
Abstract: We present a novel static point cloud map construction algorithm, called Removert, for use within dynamic urban environments. Leaving only static points and excluding dynamic objects is a critical problem for robust robot missions in changing outdoor environments, and the procedure commonly involves comparing a query scan to a noisy map that contains dynamic points. In doing so, however, the estimated discrepancies between a query scan and the noisy map tend to contain errors due to imperfect pose estimation, which degrades the static map quality. To tackle this problem, we propose a multiresolution range-image-based false prediction reverting algorithm. We first conservatively retain definite static points and iteratively recover more uncertain static points by enlarging the query-to-map association window size, which implicitly compensates for LiDAR motion or registration errors. We validate our method on the KITTI dataset using SemanticKITTI as ground truth, and show that our method qualitatively matches or outperforms the human-labeled data (SemanticKITTI) in ambiguous regions.
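A hedged sketch of the underlying visibility reasoning — projecting the map and a query scan into range images and flagging map pixels that the query ray sees "through" — is given below. The resolution, field of view, and margin are placeholder values, and the multiresolution, iterative reverting step of Removert is not reproduced here.

```python
import numpy as np

def to_range_image(points, h=64, w=1024, fov_up=2.0, fov_down=-24.8):
    """Project LiDAR points (N, 3) into a spherical range image (h, w).
    Resolution and vertical field of view are placeholder values."""
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / np.maximum(r, 1e-6))
    up, down = np.radians(fov_up), np.radians(fov_down)
    u = (((yaw + np.pi) / (2 * np.pi)) * w).astype(int) % w
    v = np.clip((up - pitch) / (up - down) * h, 0, h - 1).astype(int)
    img = np.full((h, w), np.inf)
    np.minimum.at(img, (v, u), r)       # keep the closest return per pixel
    return img

def flag_dynamic_map_points(map_range, query_range, margin=0.2):
    """A map pixel is flagged as dynamic when the query scan sees farther
    than the map at that pixel, i.e. the laser ray passed through the space
    where the map claims an obstacle."""
    valid = np.isfinite(map_range) & np.isfinite(query_range)
    return valid & (query_range > map_range * (1.0 + margin))
```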
|
|
14:15-14:30, Paper WeCT20.2 | |
>Real-Time Spatio-Temporal LiDAR Point Cloud Compression |
|
Feng, Yu | University of Rochester |
Liu, Shaoshan | PerceptIn |
Zhu, Yuhao | University of Rochester |
Keywords: Mapping, Range Sensing
Abstract: Compressing massive LiDAR point clouds in real time is critical to autonomous machines such as drones and self-driving cars. While most of the recent prior work has focused on compressing individual point cloud frames, this paper proposes a novel system that effectively compresses a sequence of point clouds. The idea is to exploit both the spatial and temporal redundancies in a sequence of point cloud frames. We first identify a key frame in a point cloud sequence and spatially encode the key frame by iterative plane fitting. We then exploit the fact that consecutive point clouds have large overlaps in physical space, so spatially encoded data can be (re-)used to encode the temporal stream. Temporal encoding by reusing spatial encoding data not only improves the compression rate, but also avoids redundant computations, which significantly improves the compression speed. Experiments show that our compression system achieves a 40x to 90x compression rate, significantly higher than MPEG's LiDAR point cloud compression standard, while retaining high end-to-end application accuracies. Meanwhile, our compression system has a compression speed that matches the point cloud generation rate of today's LiDARs and outperforms existing compression systems, enabling real-time point cloud transmission.
|
|
14:30-14:45, Paper WeCT20.3 | |
>B-Spline Surfaces for Range-Based Environment Mapping |
|
T. Rodrigues, Rômulo | Faculty of Engineering, University of Porto |
Tsiogkas, Nikolaos | KU Leuven |
Aguiar, A. Pedro | Faculty of Engineering, University of Porto (FEUP) |
Pascoal, Antonio | Instituto Superior Tecnico |
Keywords: Mapping, Range Sensing, SLAM
Abstract: In this paper, we propose a mapping technique that builds a continuous representation of the environment from range data. The strategy presented here encodes the probability of points in space to be occupied using 2.5D B-spline surfaces. For a fast update rate, the surface is recursively updated as new measurements arrive. The proposed B-spline map is less susceptible to precision and interpolation errors that are present in occupancy grid-based methods. From simulation and experimental results, we show that this approach leverages the floating point resolution of continuous metric maps and the fast update/access/merging advantages of discrete metric maps. Thus, the proposed method is suitable for online robotic tasks such as localization and path planning, requiring minor modification to existing software that usually operates on metric maps.
|
|
14:45-15:00, Paper WeCT20.4 | |
>2D Laser SLAM with General Features Represented by Implicit Functions |
|
Zhao, Jiaheng | University of Technology Sydney |
Zhao, Liang | University of Technology Sydney |
Huang, Shoudong | University of Technology, Sydney |
Wang, Yue | Zhejiang University |
Keywords: SLAM, Localization, Range Sensing
Abstract: The main contribution of this paper is the problem formulation and algorithm framework for 2D laser SLAM with general features represented by implicit functions. Since 2D laser data reflect the distances from the robot to the boundaries of objects in the environment, it is natural to use the boundaries of general objects/features within the 2D environment to describe the features. Implicit functions can represent almost arbitrary shapes, from simple (e.g. circle, ellipse, line) to complex (e.g. a cross-section of a bunny model), thus it is worth studying implicitly expressed features in 2D laser SLAM. In this paper, we clearly formulate the SLAM problem with implicit functions as features, with a rigorously computed observation covariance matrix to be used in the SLAM objective function, and propose a solution framework. Furthermore, we use ellipses and lines as examples to compare the proposed SLAM method with the traditional pre-fit method, which represents the feature using its parameters and pre-fits the laser scan to obtain the fitted parameters as virtual observations. Simulation and experimental results show that our proposed method performs better than the pre-fit method and other methods, demonstrating the potential of this new SLAM formulation and method.
|
|
15:00-15:15, Paper WeCT20.5 | |
>A Novel Coding Architecture for LiDAR Point Cloud Sequence |
|
Sun, Xuebin | USTHK |
Wang, Sukai | Robotics and Multi-Perception Lab (RAM-LAB), Robotics Institute, Hong Kong University of Science and Technology |
Wang, Miaohui | Shenzhen University |
Wang, Zheng | The University of Hong Kong |
Liu, Ming | Hong Kong University of Science and Technology |
Keywords: Range Sensing, SLAM, Automation Technologies for Smart Cities
Abstract: In this paper, we propose a novel coding architecture for LiDAR point cloud sequences based on clustering and prediction neural networks. LiDAR point clouds are structured, which provides an opportunity to convert the 3D data to a 2D array, represented as range images. Thus, we cast LiDAR point cloud compression as a range image coding problem. Inspired by the high efficiency video coding (HEVC) algorithm, we design a novel coding architecture for the point cloud sequence. The scans are divided into two categories: intra-frames and inter-frames. For intra-frames, a cluster-based intra-prediction technique is utilized to remove the spatial redundancy. For inter-frames, we design a prediction network model using convolutional LSTM cells, which is capable of predicting future inter-frames according to the encoded intra-frames. Thus, the temporal redundancy can be removed. Experiments on the KITTI dataset show that the proposed method achieves an impressive compression ratio, with 4.10% at millimeter precision. Compared with octree, Google Draco and MPEG TMC13 methods, our scheme also yields better performance in compression ratio.
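As a rough illustration of how a range-image sequence can be split into intra-frames and residual-coded inter-frames, here is a toy coder; the cluster-based intra prediction and the convolutional-LSTM inter predictor described in the abstract are replaced with plain quantization and a previous-frame predictor, so this is only a structural sketch.

```python
import numpy as np
import zlib

def encode_sequence(range_images, gop=10, step=0.004):
    """Toy intra/inter coder for a list of (H, W) float range images.
    Every gop-th frame is intra-coded; the others are coded as quantized
    residuals against the previous reconstruction, then entropy-coded with
    zlib as a stand-in for a real entropy coder."""
    streams, prev = [], None
    for i, img in enumerate(range_images):
        if i % gop == 0:
            q = np.round(img / step).astype(np.int32)            # intra-frame
            recon = q * step
        else:
            q = np.round((img - prev) / step).astype(np.int32)   # inter residual
            recon = prev + q * step
        streams.append(zlib.compress(q.tobytes()))
        prev = recon
    return streams
```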
|
|
WeCT21 |
Room T21 |
Recognition |
Regular session |
Chair: Huang, Shoudong | University of Technology, Sydney |
|
14:00-14:15, Paper WeCT21.1 | |
>Centroids Triplet Network and Temporally-Consistent Embeddings for In-Situ Object Recognition |
> Video Attachment
|
|
Lagunes-Fortiz, Miguel | Bristol Robotics Lab |
Damen, Dima | University of Bristol |
Mayol, Walterio | University of Bristol |
Keywords: Object Detection, Segmentation and Categorization, Visual Learning, Computer Vision for Automation
Abstract: This work proposes learning to recognize objects from a small number of training examples collected and deployed in-situ. That is, from data collected where the objects are commonly placed or used, perhaps after first encountering them, the learning algorithm is immediately able to recognize them again. We refer to this methodology as in-situ learning; it contrasts with the conventional methodology of using complex data acquisition mechanisms, such as rotating tables or synthetic data, to build a large-scale dataset for training convolutional neural networks (ConvNets). To learn in-situ, we propose a novel loss function that generates discriminative features for known and unseen objects by utilizing a regularization term that reduces the distance between features and their manifold centroid. Additionally, we propose a temporal filter that is particularly useful for reacting quickly to objects appearing in the scene: depending on the distance between neighboring video-frame features, it applies a weighted average between the current and the previous frame. Our framework achieves state-of-the-art accuracy for in-situ and on-the-fly learning, with an average accuracy increase of 3.01% for known objects, 3.3% for novel objects, and 7.07% for the combined case, compared with the closest baseline. Utilizing the temporal filter led to a further increase in accuracy against nuisances of 7.32% for the known and novel objects case.
|
|
14:15-14:30, Paper WeCT21.2 | |
>AMAE: Adaptive Motion-Agnostic Encoder for Event-Based Object Classification |
|
Deng, Yongjian | City University of Hong Kong |
Li, Y.F. | City University of Hong Kong |
Chen, Hao | City University of Hong Kong |
Keywords: Object Detection, Segmentation and Categorization, Recognition, Deep Learning for Visual Perception
Abstract: Event cameras, with low power consumption, high temporal resolution, and high dynamic range, have been used increasingly in computer vision. These superior characteristics enable event cameras to perform low-energy and high-response object classification tasks in challenging scenarios. Nevertheless, specific encoding methods for event-based classification are required owing to the unconventional output of event cameras. Existing event-based encoding methods have focused on extracting semantic and motion information in event signals. However, two main problems exist in these methods: (i) the motion information of event signals leads to mispredictions by the classifiers, and (ii) effective evaluation methods to validate the motion robustness of event-based classification models have yet to be proposed. In this work, we introduce an adaptive motion-agnostic encoder for event streams to address the first problem. The proposed encoder allows us to extract indistinguishable semantic information from an object under different motion circumstances. In addition, we propose a novel motion inconsistency evaluation method to assess the motion robustness of classification models. We apply our method to several benchmark datasets and evaluate it using motion consistency and inconsistency testing methods. Classification performance shows that our proposed encoder outperforms state-of-the-art methods by a large margin.
|
|
14:30-14:45, Paper WeCT21.3 | |
>A Framework for Recognition and Prediction of Human Motions in Human-Robot Collaboration Using Probabilistic Motion Models |
|
Callens, Thomas | KU Leuven |
van der Have, Arthur | Katholieke Universiteit Leuven |
Van Rossom, Sam | KU Leuven |
De Schutter, Joris | KU Leuven |
Aertbelien, Erwin | KU Leuven |
Keywords: Recognition, Learning from Demonstration, Probability and Statistical Methods
Abstract: This paper presents a framework for recognition and prediction of ongoing human motions. The predictions generated by this framework could be used in a controller for a robotic device, enabling the emergence of intuitive and predictable interactions between humans and a robotic collaborator. The framework includes motion onset detection, phase speed estimation, intent estimation, and conditioning. For recognition and prediction of a motion, the framework makes use of a motion model database. This database contains several motion models learned using the probabilistic Principal Component Analysis (PPCA) method. The proposed framework is evaluated with joint angle trajectories of eight subjects performing squatting, stooping, and lifting tasks. The motion onset and phase speed estimation modules are first evaluated separately. Next, an evaluation of the full framework provides more insight into the current challenges regarding motion prediction. A brief comparison between PPCA and the Probabilistic Movement Primitives (ProMP) method for learning motion models is made based on the influence of both methodologies on the performance of the framework. Both PPCA and ProMP motion models are able to predict motions over a short time horizon but struggle to predict motions over a longer horizon.
|
|
14:45-15:00, Paper WeCT21.4 | |
>Dense Isometric Non-Rigid Shape-From-Motion Based on Graph Optimization and Edge Selection |
> Video Attachment
|
|
Chen, Yongbo | University of Technology, Sydney |
Zhao, Liang | University of Technology Sydney |
Zhang, Yanhao | University of Technology Sydney |
Huang, Shoudong | University of Technology, Sydney |
Keywords: Recognition, Computer Vision for Medical Robotics, SLAM
Abstract: In this letter, we propose a novel framework for dense isometric non-rigid shape-from-motion (Iso-NRSfM) based on graph topology and edge selection. A weighted undirected graph is built whose nodes, edges, and edge weights are, respectively, the images, the image warps, and the number of common features. An edge selection algorithm based on a maximum spanning tree and sub-modular optimization is presented to pick out a well-connected sub-graph for the warps with multiple images. Using the infinitesimal planarity assumption, the Iso-NRSfM problem is formulated as a graph optimization problem with virtual measurements, which are based on the metric tensor and Christoffel symbols, and with variables related to the derivatives of the constructed points along the surface. The solution of this graph optimization problem directly leads to the normal field of the shape. Then, using a separable iterative optimization method, we robustly obtain the dense textured point cloud corresponding to the deformable shape. In the experiments, the proposed method outperforms existing work in terms of reconstruction accuracy, especially in the presence of missing/appearing (changing) data, noisy data, and outliers.
|
|
15:00-15:15, Paper WeCT21.5 | |
>Boosting Deep Open World Recognition by Clustering |
|
Fontanel, Dario | Politecnico Di Torino |
Cermelli, Fabio | Politecnico Di Torino |
Mancini, Massimiliano | Sapienza University of Rome |
Rota Bulò, Samuel | Fondazione Bruno Kessler |
Ricci, Elisa | University of Trento |
Caputo, Barbara | Sapienza University |
Keywords: Recognition, Deep Learning for Visual Perception, Visual Learning
Abstract: While convolutional neural networks have brought significant advances in robot vision, their ability is often limited to closed-world scenarios, where the number of semantic concepts to be recognized is determined by the available training set. Since it is practically impossible to capture all possible semantic concepts present in the real world in a single training set, we need to break the closed-world assumption, equipping our robot with the capability to act in an open world. To provide such ability, a robot vision system should be able to (i) identify whether an instance does not belong to the set of known categories (i.e. open set recognition), and (ii) extend its knowledge to learn new classes over time (i.e. incremental learning). In this work, we show how we can boost the performance of deep open world recognition algorithms by means of a new loss formulation enforcing a global-to-local clustering of class-specific features. In particular, a first loss term, i.e. global clustering, forces the network to map samples closer to the class centroid they belong to, while the second one, local clustering, shapes the representation space in such a way that samples of the same class get closer in the representation space while neighbours belonging to other classes are pushed away. Moreover, we propose a strategy to learn class-specific rejection thresholds, instead of heuristically estimating a single global threshold, as in previous works. Experiments on three benchmarks show the effectiveness of our approach.
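A minimal PyTorch sketch of the global-to-local clustering idea is shown below; the exact loss formulation in the paper may differ, and the per-class centroid bookkeeping and learned rejection thresholds are omitted. The names global_clustering_loss and local_clustering_loss, and the soft k-nearest-neighbour local term, are illustrative assumptions.

```python
import torch

def global_clustering_loss(features, labels, centroids):
    """Global term: pull each sample towards the centroid of its own class.
    `centroids` is a (num_classes, d) tensor maintained outside this function."""
    return ((features - centroids[labels]) ** 2).sum(dim=1).mean()

def local_clustering_loss(features, labels, k=5, margin=1.0):
    """Local term (soft illustration): for each sample, pull its k nearest
    in-batch neighbours of the same class closer and push neighbours of
    other classes away, up to a margin."""
    d = torch.cdist(features, features)
    d.fill_diagonal_(float("inf"))
    nn_idx = d.topk(k, largest=False).indices                 # (B, k) neighbour indices
    nn_d = d.gather(1, nn_idx)                                # distances to neighbours
    same = (labels[nn_idx] == labels.unsqueeze(1)).float()    # 1 if neighbour shares the label
    pull = same * nn_d
    push = (1.0 - same) * torch.clamp(margin - nn_d, min=0.0)
    return (pull + push).mean()
```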
|
|
15:15-15:30, Paper WeCT21.6 | |
>Augmenting Visual Place Recognition with Structural Cues |
|
Oertel, Amadeus | University of Zurich, ETH Zurich |
Cieslewski, Titus | University of Zurich |
Scaramuzza, Davide | University of Zurich |
Keywords: Recognition
Abstract: In this paper, we propose to augment image-based place recognition with structural cues. Specifically, these structural cues are obtained using structure-from-motion, such that no additional sensors are needed for place recognition. This is achieved by augmenting the 2D convolutional neural network (CNN) typically used for image-based place recognition with a 3D CNN that takes as input a voxel grid derived from the structure-from-motion point cloud. We evaluate different methods for fusing the 2D and 3D features and obtain best performance with global average pooling and simple concatenation. On the Oxford RobotCar dataset, the resulting descriptor exhibits superior recognition performance compared to descriptors extracted from only one of the input modalities, including state-of-the-art image-based descriptors. Especially at low descriptor dimensionalities, we outperform state-of-the-art descriptors by up to 90%.
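The fusion the abstract identifies as best-performing — global average pooling of each branch followed by concatenation — reduces to a few lines; the sketch below assumes generic 2D and 3D CNN backbones and is not the authors' released code.

```python
import torch
import torch.nn as nn

class FusedPlaceDescriptor(nn.Module):
    """Fuse an image branch (2D CNN) and a structure branch (3D CNN over a
    voxel grid) by global average pooling each and concatenating the results.
    The two backbones here are placeholders supplied by the caller."""
    def __init__(self, cnn2d, cnn3d):
        super().__init__()
        self.cnn2d, self.cnn3d = cnn2d, cnn3d

    def forward(self, image, voxels):
        f2d = self.cnn2d(image).mean(dim=(2, 3))      # GAP over H, W
        f3d = self.cnn3d(voxels).mean(dim=(2, 3, 4))  # GAP over D, H, W
        return torch.cat([f2d, f3d], dim=1)           # final place descriptor
```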
|
|
WeCT22 |
Room T22 |
Sensor Fusion: Vision and Perception |
Regular session |
Chair: Dolan, John M. | Carnegie Mellon University |
Co-Chair: Oishi, Takeshi | The University of Tokyo |
|
14:00-14:15, Paper WeCT22.1 | |
>Depth Completion Via Inductive Fusion of Planar LIDAR and Monocular Camera |
> Video Attachment
|
|
Fu, Chen | Carnegie Mellon University |
Dong, Chiyu | DiDi Labs |
Mertz, Christoph | CMU |
Dolan, John M. | Carnegie Mellon University |
Keywords: Sensor Fusion, Multi-Modal Perception, RGB-D Perception
Abstract: Modern high-definition LIDAR is expensive for commercial autonomous driving vehicles and small indoor robots. An affordable solution to this problem is fusion of planar LIDAR with RGB images to provide a similar level of perception capability. Even though state-of-the-art methods provide approaches to predict depth information from limited sensor input, they are usually a simple concatenation of sparse LIDAR features and dense RGB features through an end-to-end fusion architecture. In this paper, we introduce an inductive late-fusion block which better fuses different sensor modalities, inspired by a probability model. The proposed demonstration and aggregation network propagates the mixed context and depth features to the prediction network and serves as prior knowledge for the depth completion. This late-fusion block uses the dense context features to guide the depth prediction based on demonstrations by sparse depth features. In addition to evaluating the proposed method on benchmark depth completion datasets including NYUDepthV2 and KITTI, we also test it on a simulated planar LIDAR dataset. Our method shows promising results compared to previous approaches on both the benchmark datasets and the simulated dataset with various 3D densities.
|
|
14:15-14:30, Paper WeCT22.2 | |
>Discontinuous and Smooth Depth Completion with Binary Anisotropic Diffusion Tensor |
> Video Attachment
|
|
Yao, Yasuhiro | The University of Tokyo |
Roxas, Menandro | The University of Tokyo |
Ishikawa, Ryoichi | The University of Tokyo |
Ando, Shingo | Nippon Telegraph and Telephone |
Shimamura, Jun | NTT Media Intelligence Laboratories |
Oishi, Takeshi | The University of Tokyo |
Keywords: Sensor Fusion, Computer Vision for Other Robotic Applications
Abstract: We propose an unsupervised real-time dense depth completion from a sparse depth map guided by a single image. Our method generates a smooth depth map while preserving discontinuity between different objects. Our key idea is a Binary Anisotropic Diffusion Tensor (B-ADT), which can completely eliminate the smoothness constraint at intended positions and directions when applied to variational regularization. We also propose an Image-guided Nearest Neighbor Search (IGNNS) to derive a piecewise constant depth map, which is used for B-ADT derivation and in the data term of the variational energy. Our experiments show that our method can outperform previous unsupervised and semi-supervised depth completion methods in terms of accuracy. Moreover, since our resulting depth map preserves the discontinuity between objects, the result can be converted to a visually plausible point cloud. This is remarkable since previous methods generate unnatural surface-like artifacts between discontinuous objects.
|
|
14:30-14:45, Paper WeCT22.3 | |
>GRIF Net: Gated Region of Interest Fusion Network for Robust 3D Object Detection from Radar Point Cloud and Monocular Image |
|
Kim, YoungSeok | Korea Advanced Institute of Science and Technology |
Choi, Jun-Won | Hanyang University |
Kum, Dongsuk | KAIST |
Keywords: Sensor Fusion, Object Detection, Segmentation and Categorization, Computer Vision for Transportation
Abstract: Robust and accurate scene representation is essential for advanced driver assistance systems (ADAS) such as automated driving. The radar and camera are two widely used sensors for commercial vehicles due to their low cost, high reliability, and low maintenance. Despite their strengths, radar and camera have very limited performance when used individually. In this paper, we propose a low-level sensor fusion 3D object detector that combines two Regions of Interest (RoI) from radar and camera feature maps by a Gated RoI Fusion (GRIF) to perform robust vehicle detection. To take full advantage of both sensors and utilize the sparse radar point cloud, we design a GRIF that employs an explicit gating mechanism to adaptively select the appropriate data when one of the sensors is abnormal. Our experimental evaluations on nuScenes show that our fusion method GRIF not only yields a significant performance improvement over single radar and image methods but also achieves performance comparable to a LiDAR detection method. We also observe that the proposed GRIF achieves higher recall than mean or concatenation fusion operations when points are sparse.
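A hedged sketch of a gated RoI fusion block follows: a learned sigmoid gate weights the camera and radar RoI features per channel so that a degraded modality can be down-weighted. The layer sizes, the assumption that RoI features are already pooled to vectors, and the convex-combination form are illustrative choices, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GatedRoIFusion(nn.Module):
    """Gated fusion of camera and radar RoI feature vectors of shape (B, C):
    a sigmoid gate computed from both modalities blends them per channel."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * channels, channels), nn.Sigmoid())

    def forward(self, roi_cam, roi_radar):
        g = self.gate(torch.cat([roi_cam, roi_radar], dim=1))  # per-channel weights in (0, 1)
        return g * roi_cam + (1.0 - g) * roi_radar             # fused RoI feature
```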
|
|
14:45-15:00, Paper WeCT22.4 | |
>Dynamic Object Tracking for Self-Driving Cars Using Monocular Camera and LIDAR |
> Video Attachment
|
|
Zhao, Lin | School of Automation, Beijing Institute of Technology |
Wang, Meiling | Beijing Institute of Technology |
Su, Sheng | Beijing Institute of Technology |
Liu, Tong | Beijing Institute of Technology |
Yang, Yi | Beijing Institute of Technology |
Keywords: Sensor Fusion, Visual-Based Navigation, Wheeled Robots
Abstract: The detection and tracking of dynamic traffic participants (e.g., pedestrians, cars, and bicyclists) plays an important role in reliable decision-making and intelligent navigation for autonomous vehicles. However, due to the rapid movement of the target, most current vision-based tracking methods, which perform tracking in the image domain or invoke 3D information in parts of their pipeline, have real-life limitations such as the inability to recover tracking after the target is lost. In this work, we overcome such limitations and propose a complete system for dynamic object tracking in 3D space that combines: (1) a 3D position tracking algorithm for the dynamic object based on a monocular camera and LIDAR; (2) a re-tracking mechanism (RTM) that restores tracking when the target reappears in the camera's field of view. Compared with existing methods, each sensor in our method performs its own role to preserve reliability, while extending its functionality through a novel multimodality fusion module. We perform experiments in a real-world self-driving environment and achieve the desired 10 Hz update rate for real-time performance. Our quantitative and qualitative analysis shows that this system is reliable for the dynamic object tracking needs of self-driving cars.
|
|
15:00-15:15, Paper WeCT22.5 | |
>Online Configuration Selection for Redundant Arrays of Inertial Sensors: Application to Robotic Systems Covered with a Multimodal Artificial Skin |
> Video Attachment
|
|
Leboutet, Quentin | Technical University of Munich |
Bergner, Florian | Technical University of Munich |
Cheng, Gordon | Technical University of Munich |
Keywords: Sensor Networks, Multi-Modal Perception
Abstract: Multiple approaches to the estimation of high-order motion derivatives for innovative control applications now rely on the data collected by redundant arrays of inertial sensors mounted on robots, with promising results. However, most of these works suffer from scalability issues induced by the considerable amount of data generated by such large-scale distributed sensor systems. In this article, we propose a new adaptive sensor-selection algorithm for distributed inertial measurements. Our approach uses the data from a subset of sensors, selected from a larger collection of inertial sensing elements covering a rigid robot link. The sensor selection process is formulated as an optimization problem and solved using a projected-gradient heuristic. The proposed method can run online on a robot and be used to recalculate the selected sensor arrangement on the fly when physical interaction or potential sensor failure is detected. The tests performed on a simulated UR5 industrial manipulator covered with a multimodal artificial skin demonstrate the consistency and performance of the proposed sensor-selection algorithm.
|
|
15:15-15:30, Paper WeCT22.6 | |
>Robust Robotic Pouring Using Audition and Haptics |
> Video Attachment
|
|
Liang, Hongzhuo | University of Hamburg |
Zhou, Chuangchuang | RWTH Aachen University |
Li, Shuang | University of Hamburg |
Ma, Xiaojian | University of California, Los Angeles |
Hendrich, Norman | University of Hamburg |
Gerkmann, Timo | University of Hamburg |
Sun, Fuchun | Tsinghua University |
Stoffel, Marcus | RWTH Aachen University |
Zhang, Jianwei | University of Hamburg |
Keywords: Sensor Fusion, Robot Audition, Force and Tactile Sensing
Abstract: Robust and accurate estimation of liquid height is an essential part of pouring tasks for service robots. However, vision-based methods often fail under occlusion, while audio-based methods cannot work well in a noisy environment. We instead propose a multimodal pouring network (MP-Net) that is able to robustly predict liquid height by conditioning on both auditory and haptic input. MP-Net is trained on a self-collected multimodal pouring dataset. This dataset contains 300 robot pouring recordings with audio and force/torque measurements for three types of target containers. We also augment the audio data by inserting robot noise. We evaluated MP-Net on our collected dataset and in a wide variety of robot experiments. Both the network training results and the robot experiments demonstrate that MP-Net is robust against noise and against changes to the task and environment. Moreover, we further combine the predicted height and force data to estimate the shape of the target container.
|
|
WeCT23 |
Room T23 |
Sensor-Based Estimation |
Regular session |
Chair: Semini, Claudio | Istituto Italiano Di Tecnologia |
Co-Chair: Song, Dezhen | Texas A&M University |
|
14:00-14:15, Paper WeCT23.1 | |
>ARPDR: An Accurate and Robust Pedestrian Dead Reckoning System for Indoor Localization on Handheld Smartphones |
|
Teng, Xiaoqiang | Didi Chuxing |
Xu, Pengfei | Didi Chuxing |
Guo, Deke | National University of Defense Technology |
Guo, Yulan | National University of Defense Technology |
Hu, Runbo | Didi Chuxing |
Chai, Hua | Didi Chuxing |
Keywords: Sensor Fusion, Localization, AI-Based Methods
Abstract: The proliferation of mobile computing has made Pedestrian Dead Reckoning (PDR) one of the most attractive and promising indoor localization techniques for ubiquitous applications. Existing PDR approaches either suffer from position drift caused by accumulated errors or are sensitive to user variability. This paper presents ARPDR, an accurate and robust PDR approach that improves the accuracy and robustness of indoor localization. In particular, we propose a novel step counting algorithm based on motion models that deeply exploits inertial sensor data. We then combine step counting with adaptive thresholding to personalize the PDR system for different users. Furthermore, we propose a novel stride-heading model with a deep neural network to predict stride lengths and walking orientations, thereby significantly reducing displacement errors. Extensive experiments on public datasets demonstrate that ARPDR outperforms state-of-the-art PDR methods.
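The abstract's step-counting component builds on structure in the inertial signal; a simplified, non-adaptive baseline for counting steps from accelerometer data (band-pass filtering plus peak detection with scipy) is sketched below. The motion-model-based counting, adaptive thresholding, and stride-heading network of ARPDR are not reproduced, and the cut-off frequencies and threshold are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def count_steps(acc, fs=100.0, min_step_interval=0.3, threshold=0.8):
    """Count steps from an (N, 3) accelerometer stream sampled at fs Hz:
    band-pass the magnitude around the gait band and detect peaks."""
    mag = np.linalg.norm(acc, axis=1) - 9.81                # remove the gravity offset
    b, a = butter(2, [0.5 / (fs / 2), 3.0 / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, mag)                          # zero-phase band-pass filter
    peaks, _ = find_peaks(filtered,
                          height=threshold,
                          distance=int(min_step_interval * fs))
    return len(peaks)
```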
|
|
14:15-14:30, Paper WeCT23.2 | |
>Fingertip Non-Contact Optoacoustic Sensor for Near-Distance Ranging and Thickness Differentiation for Robotic Grasping |
|
Fang, Cheng | Texas A&M University |
Wang, Di | Texas A&M University |
Song, Dezhen | Texas A&M University |
Zou, Jun | Texas A&M University |
Keywords: Sensor Fusion, Range Sensing, Grasping
Abstract: We report the feasibility study of a new optoacoustic sensor for both near-distance ranging and material thickness classification for robotic grasping. It is based on the optoacoustic effect where focused laser pulses are used to generate wideband ultrasound signals in the target. With a much smaller optical focal spot, the optoacoustic sensor achieves a lateral resolution of 93 μm, which is six times higher than ultrasound pulse-echo ranging under the same condition. A new multi-mode wideband PZT (lead zirconate titanate) transducer is built to properly receive the wideband optoacoustic signal. The ability to receive both low- and high-frequency components of the optoacoustic signal enhances the material sensing capability, which makes it promising to determine not only material type but also the sub-surface structures. For demonstration, optoacoustic spectra are collected from hard and soft materials with different thickness. A Bag-of-SFA-Symbols (BOSS) classifier is designed to perform primary material and then thickness classification based on the optoacoustic spectra. The accuracy of material / thickness classification reaches ≥ 99% and ≥ 94%, respectively, which shows the feasibility of differentiating solid materials with different thickness by the optoacoustic sensor.
|
|
14:30-14:45, Paper WeCT23.3 | |
>A Mobile Robot Hand-Arm Teleoperation System by Vision and IMU |
> Video Attachment
|
|
Li, Shuang | University of Hamburg |
Jiang, Jiaxi | RWTH Aachen University |
Ruppel, Philipp | University of Hamburg |
Liang, Hongzhuo | University of Hamburg |
Ma, Xiaojian | University of California, Los Angeles |
Hendrich, Norman | University of Hamburg |
Sun, Fuchun | Department of Computer Science and Technology, Tsinghua University |
Zhang, Jianwei | University of Hamburg |
Keywords: Telerobotics and Teleoperation, Gesture, Posture and Facial Expressions, Computer Vision for Automation
Abstract: In this paper, we present a multimodal mobile teleoperation system that consists of a novel vision-based hand pose regression network (Transteleop) and an IMU (inertial measurement units)-based arm tracking method. Transteleop observes the human hand through a low-cost depth camera and generates not only joint angles but also depth images of paired robot hand poses through an image-to-image translation process. A keypoint-based reconstruction loss explores the resemblance in appearance and anatomy between human and robotic hands and enriches the local features of reconstructed images. A wearable camera holder enables simultaneous hand-arm control and facilitates the mobility of the whole teleoperation system. Network evaluation results on a test dataset and a variety of complex manipulation tasks that go beyond simple pick-and-place operations show the efficiency and stability of our multimodal teleoperation system.
|
|
14:45-15:00, Paper WeCT23.4 | |
>Supervised Autoencoder Joint Learning on Heterogeneous Tactile Sensory Data: Improving Material Classification Performance |
|
Gao, Ruihan | Nanyang Technological University |
Taunyazov, Tasbolat | National University of Singapore |
Lin, Zhiping | Nanyang Technological University |
Wu, Yan | A*STAR Institute for Infocomm Research |
Keywords: Sensor Fusion, Force and Tactile Sensing, Learning Categories and Concepts
Abstract: The sense of touch is an essential sensing modality for a robot to interact with the environment, as it provides rich and multimodal sensory information upon contact. It enriches the perceptual understanding of the environment and closes the loop for action generation. One fundamental area of perception in which touch dominates over other sensing modalities is the understanding of the materials being interacted with, for example, glass versus plastic. However, unlike the senses of vision and audition, which have standardized data formats, the format of tactile data is largely dictated by the sensor manufacturer, which makes it difficult to perform large-scale learning on data collected from heterogeneous sensors and limits the usefulness of publicly available tactile datasets. This paper investigates the joint learnability of data collected from two tactile sensors performing a touch sequence on common materials. We propose a supervised recurrent autoencoder framework that performs a joint material classification task to improve training effectiveness. The framework is implemented and tested on two sets of tactile data collected in sliding motion on 20 material textures using the iCub RoboSkin tactile sensors and the SynTouch BioTac sensor, respectively. Our results show that learning efficiency and accuracy improve for both datasets through joint learning, as compared to independent training on each dataset. This suggests the usefulness of sharing large-scale open tactile datasets collected with different sensors.
|
|
15:00-15:15, Paper WeCT23.5 | |
>Proprioceptive Sensor Fusion for Quadruped Robot State Estimation |
|
Fink, Geoff | Istituto Italiano Di Tecnologia |
Semini, Claudio | Istituto Italiano Di Tecnologia |
Keywords: Sensor Fusion, Performance Evaluation and Benchmarking, Legged Robots
Abstract: Estimation of a quadruped's state is fundamentally important to its operation. In this paper we develop a low-level state estimator for quadrupedal robots that includes attitude, odometry, ground reaction forces, and contact detection. The state estimator is divided into three parts. First, a nonlinear observer estimates attitude by fusing inertial measurements. The attitude estimator is globally exponentially stable and is able to initialize with large errors in the initial state estimates, whereas a state-of-the-art EKF would diverge. This is practical for situations when the robot has fallen over and needs to start from its side. Second, leg odometry is calculated with encoders, force sensors, and torque sensors in the robot's joints. Lastly, the leg odometry and inertial measurements are fused to obtain linear position and velocity. We experimentally validate the state estimator using a novel dataset from the HyQ robot. For the entirety of the experiment, the estimated attitude matched the ground truth data with a root mean square error (RMSE) of [2 1 5] deg, the velocity estimates have an RMSE of [0.11 0.15 0.04] m/s, and the position estimates, which are unobservable, drifted on average by [2 1 8] mm/s.
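As a simplified stand-in for the attitude part of the estimator, the complementary filter below fuses gyroscope integration with the gravity direction from the accelerometer (roll and pitch only); it is not the globally exponentially stable nonlinear observer used in the paper, only an illustration of inertial attitude fusion, and the blending factor alpha is an assumption.

```python
import numpy as np

def complementary_filter(acc, gyro, dt, alpha=0.98):
    """Fuse (N, 3) accelerometer and gyroscope streams sampled every dt seconds
    into roll/pitch estimates: trust the integrated gyro at high frequency and
    the accelerometer's gravity direction at low frequency."""
    roll, pitch = 0.0, 0.0
    out = []
    for a, w in zip(acc, gyro):
        roll_acc = np.arctan2(a[1], a[2])                      # gravity-based roll
        pitch_acc = np.arctan2(-a[0], np.hypot(a[1], a[2]))    # gravity-based pitch
        roll = alpha * (roll + w[0] * dt) + (1 - alpha) * roll_acc
        pitch = alpha * (pitch + w[1] * dt) + (1 - alpha) * pitch_acc
        out.append((roll, pitch))
    return np.array(out)
```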
|
|
15:15-15:30, Paper WeCT23.6 | |
>Real-Time Robot End-Effector Pose Estimation with Deep Network |
|
Cheng, Hu | The Chinese University of Hong Kong |
Wang, Yingying | The Chinese University of Hong Kong |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Keywords: Service Robots, Service Robotics, Computer Vision for Other Robotic Applications
Abstract: In this paper, we propose a novel algorithm that estimates the pose of the robot end effector using depth vision. The input to our system is a segmented robot hand point cloud from a depth sensor. A neural network then takes the point cloud as input and outputs the position and orientation of the robot end effector in the camera frame. The estimated pose can serve as the input to the robot's controller to reach a specific pose in the camera frame. The neural network is trained on simulated point clouds rendered from different poses of the robot hand mesh. At test time, a single robot hand pose estimate takes 10 ms on a GPU and 14 ms on a CPU, which makes the method suitable for closed-loop robot control systems that require online hand pose estimation. We design a robot hand pose estimation experiment to validate the effectiveness of our algorithm in a real-world setting. The platform we used includes a Kinova Jaco 2 robot arm and a Kinect v2 depth sensor.
|
|
WeDT1 |
Room T1 |
Brain-Machine Interfaces for Human Robot Interaction |
Regular session |
Chair: Park, Sang Hyun | DGIST |
Co-Chair: Allen, Peter | Columbia University |
|
16:30-16:45, Paper WeDT1.1 | |
>A Human-Robot Interface Based on Surface Electroencephalographic Sensors |
> Video Attachment
|
|
Mavridis, Christos | University of Maryland |
Baras, John | University of Maryland |
Kyriakopoulos, Kostas | National Technical Univ. of Athens |
Keywords: Neurorobotics, Brain-Machine Interface, Telerobotics and Teleoperation
Abstract: We propose a human-robot interface based on potentials recorded through surface electroencephalographic sensors, aiming to decode human visual attention into motion in three-dimensional space. Low-frequency components are extracted and processed in real time, and subspace system identification methods are used to derive the optimal, in the mean-squared sense, linear dynamics generating the position vectors. This results in a human-robot interface that can be used directly in robot teleoperation or as part of a shared-control robotic manipulation scheme, feels natural to the user, and is appropriate for upper-extremity amputees, since it requires no limb movement. We validate our methodology by teleoperating a redundant, anthropomorphic robotic arm in real time. The system's performance surpasses that of similar EMG-based systems and shows low long-term model drift, indicating no need for frequent model re-training.
|
|
16:45-17:00, Paper WeDT1.2 | |
>Few-Shot Relation Learning with Attention for EEG-Based Motor Imagery Classification |
|
An, Sion | DGIST |
Kim, Soopil | DGIST |
Chikontwe, Philip | DGIST |
Park, Sang Hyun | DGIST |
Keywords: Brain-Machine Interface, AI-Based Methods, Representation Learning
Abstract: Brain-Computer Interfaces (BCI) based on Electroencephalography (EEG) signals, in particular motor imagery (MI) data, have received a lot of attention and show potential for the design of key technologies in healthcare and other industries. MI data are generated when a subject imagines the movement of limbs and can be used to aid rehabilitation as well as autonomous driving scenarios. Thus, classification of MI signals is vital for EEG-based BCI systems. Recently, MI EEG classification techniques using deep learning have shown improved performance over conventional techniques. However, due to inter-subject variability, the scarcity of unseen subject data, and low signal-to-noise ratio, extracting robust features and improving accuracy is still challenging. In this context, we propose a novel two-way few-shot network that is able to efficiently learn how to learn representative features of unseen subject categories and how to classify them with limited MI EEG data. The pipeline includes an embedding module that learns feature representations from a set of samples, an attention mechanism for key signal feature discovery, and a relation module for final classification based on relation scores between a support set and a query signal. In addition to the unified learning of feature similarity and a few-shot classifier, our method emphasizes informative features in the support data relevant to the query data, which generalizes better on unseen subjects. For evaluation, we used the BCI Competition IV 2b dataset and achieved a 9.3% accuracy improvement in the 20-shot classification task with state-of-the-art performance. Experimental results demonstrate the effectiveness of employing attention and the overall generality of our method.
|
|
17:00-17:15, Paper WeDT1.3 | |
>Event-Based PID Controller Fully Realized in Neuromorphic Hardware: A One DoF Study |
|
Stagsted, Rasmus | University of Southern Denmark |
Vitale, Antonio | ETH Zurich |
Renner, Alpha | Institute of Neuroinformatics, University of Zurich and ETH Zurich |
Bonde Larsen, Leon | University of Southern Denmark |
Christensen, Anders Lyhne | University Institute of Lisbon |
Sandamirskaya, Yulia | Intel |
Keywords: Neurorobotics
Abstract: Spiking Neuronal Networks (SNNs) realized in neuromorphic hardware lead to low-power and low-latency neuronal computing architectures that have the potential to enable intelligent control of robotic platforms, such as drones, that have strict constraints on the power budget. Neuromorphic computing systems are most efficient when perception, decision making, and motor control are all seamlessly integrated into a single neuronal architecture that can be realized on neuromorphic hardware. While most work in neuromorphic computing targets perception and pattern recognition tasks, here we present an improved implementation of a neuromorphic PID controller. The controller was realized on Intel's neuromorphic research chip Loihi and used to control a drone constrained to rotate on a single axis. The SNN controller is built using neuronal populations, in which a single spike carries information about sensed and control signals. Neuronal arrays perform computation on such sparse representations to calculate the proportional, derivative, and integral terms. The SNN PID controller was compared to a conventional PID controller implemented in software and achieved comparable performance, paving the way towards a fully neuromorphic system in which perception, planning, and control are realized in an on-chip SNN.
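For reference, the control law that the spiking network approximates is the standard discrete PID update; a plain Python version is given below, with the understanding that the paper's contribution lies in realizing these terms with spiking neuronal populations on Loihi rather than in the formula itself.

```python
class PID:
    """Reference discrete PID controller: u = kp*e + ki*∫e dt + kd*de/dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_error = 0.0, 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt                    # integral term accumulator
        derivative = (error - self.prev_error) / self.dt    # finite-difference derivative
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```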
|
|
17:15-17:30, Paper WeDT1.4 | |
>Maximizing BCI Human Feedback Using Active Learning |
> Video Attachment
|
|
Wang, Zizhao | University of Michigan-Ann Arbor |
Shi, Junyao | Columbia University |
Akinola, Iretiayo | Columbia University |
Allen, Peter | Columbia University |
Keywords: Brain-Machine Interface, Novel Deep Learning Methods, Human Factors and Human-in-the-Loop
Abstract: Recent advancements in Learning from Human Feedback present an effective way to train robot agents via inputs from non-expert humans, without a need for a specially designed reward function. However, this approach needs a human to be present and attentive during robot learning to provide evaluative feedback. In addition, the amount of feedback needed grows with the level of task difficulty, and the quality of human feedback might decrease over time because of fatigue. To overcome these limitations and enable learning more complex robot tasks, there is a need to maximize the quality of the expensive feedback received and to reduce the amount of human cognitive involvement required. In this work, we present an approach that uses active learning to smartly choose queries for the human supervisor based on the uncertainty of the robot, effectively reducing the amount of feedback needed to learn a given task. We also use a novel multiple-buffer system to improve robustness to feedback noise and to guard against catastrophic forgetting as the robot learning evolves. This makes it possible to learn tasks of greater complexity using less human feedback than previous methods. We demonstrate the utility of our proposed method on a robot arm reaching task where the robot learns to reach a location in 3D without colliding with obstacles. Our approach is able to learn this task faster, with less human feedback and cognitive involvement, compared to previous methods that do not use active learning.
|
|
WeDT2 |
Room T2 |
Cognitive Human Robot Interaction |
Regular session |
Chair: Sadigh, Dorsa | Stanford University |
Co-Chair: Carreno, Pamela | Monash University |
|
16:30-16:45, Paper WeDT2.1 | |
>Active Preference Learning Using Maximum Regret |
|
Wilde, Nils | University of Waterloo |
Kulic, Dana | Monash University |
Smith, Stephen L. | University of Waterloo |
Keywords: Cognitive Human-Robot Interaction, Human-Centered Robotics
Abstract: We study active preference learning as a framework for intuitively specifying the behaviour of autonomous robots. In active preference learning, a user chooses the preferred behaviour from a set of alternatives, from which the robot learns the user's preferences, modeled as a parameterized cost function. Previous approaches present users with alternatives that minimize the uncertainty over the parameters of the cost function. However, different parameters might lead to the same optimal behaviour; as a consequence, the solution space is more structured than the parameter space. We exploit this by proposing a query selection that greedily reduces the maximum error ratio over the solution space. In simulations we demonstrate that the proposed approach outperforms other state-of-the-art techniques in both learning efficiency and ease of queries for the user. Finally, we show that evaluating the learning based on the similarities of solutions instead of the similarities of weights allows for better predictions for different scenarios.
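A simplified reading of the query-selection idea — pick the pair of candidate solutions whose regret ratio over the currently plausible cost weights is largest — can be sketched as follows, assuming costs that are linear and strictly positive in the solution features. The function and variable names are illustrative, not the authors' algorithm.

```python
import numpy as np

def max_regret_query(weights, features):
    """Select two candidate solutions for the user to compare.
    weights:  (K, d) cost-weight samples consistent with past answers
    features: (S, d) feature vectors of candidate solutions (positive linear costs)."""
    costs = features @ weights.T                       # (S, K) cost of each solution under each weight
    best = costs.argmin(axis=0)                        # optimal solution index per weight sample
    ratios = costs / costs[best, np.arange(weights.shape[0])]  # regret ratio per (solution, weight)
    s, k = np.unravel_index(np.argmax(ratios), ratios.shape)
    return int(s), int(best[k])                        # present these two alternatives to the user
```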
|
|
16:45-17:00, Paper WeDT2.2 | |
>Learning User-Preferred Mappings for Intuitive Robot Control |
> Video Attachment
|
|
Li, Mengxi | Stanford University |
Losey, Dylan | Stanford University |
Bohg, Jeannette | Stanford University |
Sadigh, Dorsa | Stanford University |
Keywords: Cognitive Human-Robot Interaction, Human-Centered Robotics, Telerobotics and Teleoperation
Abstract: When humans control drones, cars, and robots, we often have some preconceived notion of how our inputs should make the system behave. Existing approaches to teleoperation typically assume a one-size-fits-all approach, where the designers pre-define a mapping between human inputs and robot actions, and every user must adapt to this mapping over repeated interactions. Instead, we propose a personalized method for learning the human's preferred or preconceived mapping from a few robot queries. Given a robot controller, we identify an alignment model that transforms the human's inputs so that the controller's output matches their expectations. We make this approach data-efficient by recognizing that human mappings have strong priors: we expect the input space to be proportional, reversible, and consistent. Incorporating these priors ensures that the robot learns an intuitive mapping from few examples. We test our learning approach in robot manipulation tasks inspired by assistive settings, where each user has different personal preferences and physical capabilities for teleoperating the robot arm. Our simulated and experimental results suggest that learning the mapping between inputs and robot actions improves objective and subjective performance when compared to manually defined alignments or learned alignments without intuitive priors. The supplementary video showing these user studies can be found at: https://youtu.be/rKHka0_48-Q.
|
|
17:00-17:15, Paper WeDT2.3 | |
>Quantitative Operator Strategy Comparisons across Human Supervisory Control Scenarios |
|
Zhu, Haibei | Duke University |
Xu, Rong | Duke University |
Cummings, M. L. | Duke |
Keywords: Cognitive Human-Robot Interaction, Human Factors and Human-in-the-Loop, Probability and Statistical Methods
Abstract: Human-automation collaborations, like automated driving assistance and piloting drones, have become prevalent as these technologies become more commonplace. Designers need tools that help them understand how and why design interventions may change the strategies of operators in such complex human supervisory control systems. To this end, we demonstrate that when the divergence metric is applied to Hidden Markov Model (HMM) comparisons, it can accurately capture statistical differences between operator strategies for interfaces that embody different tasks. However, the use of such an approach is problematic when used to compare HMM strategy models with non-equivalent observations. To address this limitation, we developed an observation reduction approach and conducted a sensitivity analysis to assess the impact of this approach. Our results show that when comparing two non-equivalent interfaces, our observation reduction approach does not fundamentally change the divergence metric, thus allowing for direct model comparison. The results further show that HMMs from different interfaces produce a much higher divergence metric than model comparison from the same people who repeatedly use the same interface. Future work will examine if this method can detect differences in models with different tasks or modified interfaces.
|
|
17:15-17:30, Paper WeDT2.4 | |
>Abductive Recognition of Context-Dependent Utterances in Human-Robot Interaction |
|
Lanza, Davide | University of Genova |
Menicatti, Roberto | Università Di Genova |
Sgorbissa, Antonio | University of Genova |
Keywords: Cognitive Human-Robot Interaction
Abstract: Context-dependent meaning recognition in natural language utterances is one of the key problems of computational pragmatics. Abductive reasoning seems apt for modeling and understanding these phenomena. In fact, it presents observations through hypotheses, allowing us to understand subtexts and implied meanings without exact deductions. For this reason, in this paper we explore abductive reasoning and context modeling in human-robot interaction. Rather than a radical inferential approach, we assume a conventional approach towards context-dependent meanings, i.e., they are conventionally encoded rather than inferred from the utterances. To address the problem, a case study is presented, analyzing whether such a system can correctly manage these linguistic phenomena. The results obtained confirm the validity of a conventional approach to context modeling and, on this basis, further models are proposed to work around the limitations of the case study.
|
|
17:30-17:45, Paper WeDT2.5 | |
>Designing Environments Conducive to Interpretable Robot Behavior |
|
Kulkarni, Anagha | Arizona State University |
Sreedharan, Sarath | Arizona State University |
Keren, Sarah | Harvard University |
Chakraborti, Tathagata | IBM |
Smith, David | PS Research |
Kambhampati, Subbarao | Arizona State University |
Keywords: Cognitive Human-Robot Interaction, Social Human-Robot Interaction, Human-Centered Robotics
Abstract: Designing robots capable of generating interpretable behavior is essential for effective human-robot collaboration. This requires robots to be able to generate behavior that aligns with human expectations but exhibiting such behavior in arbitrary environments could be quite expensive for robots, and in some cases, the robot may not even be able to exhibit expected behavior. However, in structured environments (like warehouses, restaurants, etc.), it may be possible to design the environment so as to boost the interpretability of a robot's behavior or to shape the human's expectations of the robot's behavior. In this paper, we investigate the opportunities and limitations of environment design as a tool to promote a particular type of interpretable behavior -- known in the literature as explicable behavior. We formulate a novel environment design framework that considers design over multiple tasks and over a time horizon. In addition, we explore the longitudinal effect of explicable behavior and the trade-off that arises between the cost of design and the cost of generating explicable behavior over an extended time horizon.
|
|
17:45-18:00, Paper WeDT2.6 | |
>ETRI-Activity3D: A Large-Scale RGB-D Dataset for Robots to Recognize Daily Activities of the Elderly |
|
Jang, Jinhyeok | ETRI |
Kim, DoHyung | Electronics and Telecommunications Research Institute |
Park, Cheonshu | Electronics and Telecommunications Research Institute |
Jang, Minsu | Electronics & Telecommunications Research Institute |
Lee, Jaeyeon | ETRI |
Kim, Jaehong | ETRI |
Keywords: Cognitive Human-Robot Interaction, Deep Learning for Visual Perception, RGB-D Perception
Abstract: Deep learning, on which many modern algorithms are based, is well known to be data-hungry. In particular, datasets appropriate for the intended application are often difficult to obtain. To cope with this situation, we introduce a new dataset called ETRI-Activity3D, focusing on the daily activities of the elderly from a robot's viewpoint. The major characteristics of the new dataset are as follows: 1) practical action categories that are selected from close observation of the daily lives of the elderly; 2) realistic data collection, which reflects the robot's working environment and service situations; and 3) a large-scale dataset that overcomes the limitations of current 3D activity analysis benchmark datasets. The proposed dataset contains 112,620 samples including RGB videos, depth maps, and skeleton sequences. During the data acquisition, 100 subjects were asked to perform 55 daily activities. Additionally, we propose a novel network called four-stream adaptive CNN (FSA-CNN). The proposed FSA-CNN has three main properties: robustness to spatio-temporal variations, an input-adaptive activation function, and an extension of the conventional two-stream approach. In the experiments, we confirmed the superiority of the proposed FSA-CNN using NTU RGB+D and ETRI-Activity3D. Further, the domain difference between the two age groups was verified experimentally. Finally, the extension of FSA-CNN to deal with multimodal data was investigated.
|
|
WeDT3 |
Room T3 |
HRI Safety: Avoiding Collisions |
Regular session |
Chair: Ferraguti, Federica | Università Degli Studi Di Modena E Reggio Emilia |
Co-Chair: Michael, Nathan | Carnegie Mellon University |
|
16:30-16:45, Paper WeDT3.1 | |
>Assisted Mobile Robot Teleoperation with Intent-Aligned Trajectories Via Biased Incremental Action Sampling |
> Video Attachment
|
|
Yang, Xuning | Carnegie Mellon University |
Michael, Nathan | Carnegie Mellon University |
Keywords: Telerobotics and Teleoperation, Collision Avoidance, Motion and Path Planning
Abstract: We present a method to assist the operator in teleoperation of mobile robots by generating trajectories such that the vehicle completes the desired task with ease in unstructured environments. Traditional assisted teleoperation methods have focused on reactive methods to avoid collisions, but neglect the operator’s intention in doing so. Instead, we generate long horizon, smooth trajectories that follow the operator’s intended direction while circumventing obstacles for a seamless teleoperation experience. For mobile robot teleoperation, an explicit goal in the state space is often unclear in cases such as exploration or navigation. Therefore, we model the intent as a direction and encode it as a cost function. As trajectories of various lengths can satisfy the same directional objective, we iteratively construct a tree of sequential actions that form multiple trajectories along the intended direction. We show our algorithm on a real-time teleoperation task of a simulated hexarotor vehicle in a dense random forest environment.
|
|
16:45-17:00, Paper WeDT3.2 | |
>L2B: Learning to Balance the Safety-Efficiency Trade-Off in Interactive Crowd-Aware Robot Navigation |
> Video Attachment
|
|
Nishimura, Mai | Omron Sinic X |
Yonetani, Ryo | Omron Sinic X |
Keywords: Social Human-Robot Interaction, Collision Avoidance, Motion and Path Planning
Abstract: This work presents a deep reinforcement learning framework for interactive navigation in a crowded place. Our proposed Learning to Balance (L2B) framework enables mobile robot agents to steer safely towards their destinations by avoiding collisions with a crowd, while actively clearing a path by asking nearby pedestrians to make room, if necessary, to keep their travel efficient. We observe that the safety and efficiency requirements in crowd-aware navigation have a trade-off in the presence of social dilemmas between the agent and the crowd. On the one hand, intervening in pedestrian paths too much to achieve instant efficiency will result in collapsing a natural crowd flow and may eventually put everyone, including the self, at risk of collisions. On the other hand, keeping in silence to avoid every single collision will lead to the agent's inefficient travel. With this observation, our L2B framework augments the reward function used in learning an interactive navigation policy to penalize frequent active path clearing and passive collision avoidance, which substantially improves the balance of the safety-efficiency trade-off. We evaluate our L2B framework in a challenging crowd simulation and demonstrate its superiority, in terms of both navigation success and collision rate, over a state-of-the-art navigation approach.
|
|
17:00-17:15, Paper WeDT3.3 | |
>Collision Risk Assessment Via Awareness Estimation Toward Robotic Attendant |
|
Koide, Kenji | National Institute of Advanced Industrial Science and Technology |
Miura, Jun | Toyohashi University of Technology |
Keywords: Social Human-Robot Interaction, Service Robotics, Human-Centered Robotics
Abstract: With the aim of contributing to the development of a robotic attendant system, this study proposes the concept of assessing the risk of collision using awareness estimation. The proposed approach enables an attendant robot to assess a person's risk of colliding with an obstacle by estimating whether he/she is aware of it based on behavior, and to take the requisite preventative action. To implement the proposed concept, we design a model that can simultaneously estimate a person's awareness of obstacles and predict his/her trajectory based on a convolutional neural network. When trained on a dataset of collision-related behaviors generated from pedestrian trajectory datasets, the model can detect objects of which the person is not aware and with which he/she is at risk of colliding. The proposed method was evaluated in an empirical environment, and the results verified its effectiveness.
|
|
17:15-17:30, Paper WeDT3.4 | |
>A Control Barrier Function Approach for Maximizing Performance While Fulfilling to ISO/TS 15066 Regulations |
> Video Attachment
|
|
Ferraguti, Federica | Università Degli Studi Di Modena E Reggio Emilia |
Bertuletti, Mattia | University of Modena and Reggio Emilia |
Talignani Landi, Chiara | University of Modena and Reggio Emilia |
Bonfe, Marcello | University of Ferrara |
Fantuzzi, Cesare | Università Di Modena E Reggio Emilia |
Secchi, Cristian | Univ. of Modena & Reggio Emilia |
Keywords: Physical Human-Robot Interaction, Robot Safety, Industrial Robots
Abstract: ISO/TS 15066 is globally recognized as the guideline for designing safe collaborative robotic cells, where a human and a robot collaborate to accomplish a common job. Current approaches for implementing the ISO/TS 15066 guidelines lead to conservative behavior (e.g. low velocity) of the robot and, consequently, to poor performance of the collaborative cell. In this paper, we propose an approach based on control barrier functions that makes it possible to maximize the performance of a robot acting in a collaborative cell while satisfying the ISO/TS 15066 regulations. The proposed approach has been successfully validated both in simulation and through experiments.
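As general background on the control barrier function principle mentioned in the abstract (a minimal sketch under simplifying single-integrator assumptions, not the authors' formulation), a safety filter minimally modifies a nominal velocity command so that a barrier function h(x) stays nonnegative; here h encodes a hypothetical squared human-robot distance minus a protective separation distance, and all names and numbers are illustrative:

import numpy as np

def cbf_filter(u_des, grad_h, h, alpha=1.0):
    """Minimally modify a desired velocity u_des so that the control barrier
    function condition  dh/dt + alpha * h >= 0  holds.

    For single-integrator dynamics x_dot = u, dh/dt = grad_h @ u, and the
    minimizer of ||u - u_des||^2 subject to grad_h @ u >= -alpha * h is the
    projection of u_des onto that half-space.
    """
    margin = grad_h @ u_des + alpha * h
    if margin >= 0.0:                      # nominal command is already safe
        return u_des
    return u_des - margin * grad_h / (grad_h @ grad_h)

# Illustrative use: keep the robot at least d_min away from the human.
p_robot, p_human = np.array([0.6, 0.0, 0.4]), np.array([0.9, 0.0, 0.4])
d_min = 0.25                               # hypothetical protective distance [m]
diff = p_robot - p_human
h = diff @ diff - d_min ** 2               # barrier value: positive when safe
grad_h = 2.0 * diff                        # gradient of h w.r.t. robot position
u_safe = cbf_filter(np.array([0.3, 0.0, 0.0]), grad_h, h)

In the paper, the constraint is instead derived from the ISO/TS 15066 requirements and embedded in the robot controller; the sketch only conveys the filtering principle.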
|
|
WeDT4 |
Room T4 |
HRI Safety: Proximity Awareness |
Regular session |
Chair: Chernova, Sonia | Georgia Institute of Technology |
Co-Chair: Bruckschen, Lilli | University of Bonn |
|
16:30-16:45, Paper WeDT4.1 | |
>Learning Human-Aware Robot Navigation from Physical Interaction Via Inverse Reinforcement Learning |
> Video Attachment
|
|
Kollmitz, Marina | University of Freiburg |
Koller, Torsten | University of Freiburg |
Boedecker, Joschka | University of Freiburg |
Burgard, Wolfram | Toyota Research Institute |
Keywords: Physical Human-Robot Interaction, Learning from Demonstration, Social Human-Robot Interaction
Abstract: Autonomous systems, such as delivery robots, are increasingly employed in indoor spaces to carry out activities alongside humans. This development poses the question of how robots can carry out their tasks while, at the same time, behaving in a socially compliant manner. Further, humans need to be able to communicate their preferences in a simple and intuitive way, and robots should adapt their behavior accordingly. This paper investigates force control as a natural means to interact with a mobile robot by pushing it along the desired trajectory. We employ inverse reinforcement learning (IRL) to learn from human interaction and adapt the robot behavior to its users' preferences, thereby eliminating the need to program the desired behavior manually. We evaluate our approach in a real-world experiment where test subjects interact with an autonomously navigating robot in close proximity. The results suggest that force control presents an intuitive means to interact with a mobile robot and show that our robot can quickly adapt to the test subjects' personal preferences.
|
|
16:45-17:00, Paper WeDT4.2 | |
>Human-Aware Robot Navigation by Long-Term Movement Prediction |
> Video Attachment
|
|
Bruckschen, Lilli | University of Bonn |
Bungert, Kira | University of Bonn |
Dengler, Nils | University of Bonn |
Bennewitz, Maren | University of Bonn |
Keywords: Social Human-Robot Interaction, Human-Centered Robotics, Motion and Path Planning
Abstract: Foresighted, human-aware navigation is a prerequisite for service robots acting in indoor environments. In this paper, we present a novel human-aware navigation approach that relies on long-term prediction of human movements. In particular, we consider the problem of finding a path from the robot's current position to the initially unknown navigation goal of a moving user to provide timely assistance there. The navigation strategy has to minimize the robot's arrival time and at the same time comply with the user's comfort during the movement. Our solution predicts the user's navigation goal based on the robot's observations and prior knowledge about typical human transitions between objects. Based on the motion prediction, we then compute a time-dependent cost map that encodes the belief about the user's positions at future time steps. Using this map, we solve the time-dependent shortest path problem to find an efficient path for the robot, which still abides by the rules of human comfort. To identify robot navigation actions that are perceived as uncomfortable by humans, we performed user surveys and defined the corresponding constraints. We thoroughly evaluated our navigation system in simulation as well as in real-world experiments. As the results show, our system outperforms existing approaches in terms of human comfort, while still minimizing arrival times of the robot.
|
|
17:00-17:15, Paper WeDT4.3 | |
>A Game-Theoretic Strategy-Aware Interaction Algorithm with Validation on Real Traffic Data |
> Video Attachment
|
|
Sun, Liting | University of California, Berkeley |
Cai, Mu | Xi'an Jiaotong University |
Zhan, Wei | University of California, Berkeley |
Tomizuka, Masayoshi | University of California |
Keywords: Social Human-Robot Interaction, Behavior-Based Systems, Autonomous Agents
Abstract: Interactive decision-making and motion planning are important to safety-critical autonomous agents, particularly when they interact with humans. Many different interaction strategies can be exploited by humans. For instance, they might ignore the autonomous agents, or might behave as selfish optimizers by treating the autonomous agents as opponents, or might assume themselves to be leaders and the autonomous agents followers who should take responsive actions. Different interaction strategies can lead to quite different closed-loop dynamics, and misalignment between the human's policy and the autonomous agent's belief over that policy will severely impact both safety and efficiency. Moreover, a human's interaction policy can change as the interaction proceeds. Hence, autonomous agents need to be aware of such uncertainties on the human policy, and integrate such information into their decision-making and motion planning algorithms. In this paper, we propose a policy-aware interaction strategy based on game theory. The goal is to allow autonomous agents to estimate humans' interactive policies and respond accordingly. We validate the proposed algorithm on a roundabout scenario with real traffic data. The results show that the proposed algorithm can yield trajectories that are more similar to the ground truth than those with fixed policies. We also use the proposed algorithm to statistically estimate how humans adjust their interaction strategies.
|
|
17:15-17:30, Paper WeDT4.4 | |
>Online Velocity Constraint Adaptation for Safe and Efficient Human-Robot Workspace Sharing |
> Video Attachment
|
|
Joseph, Lucas | General Electric |
Pickard, Joshua | Inria Bordeaux Sud-Ouest |
Padois, Vincent | Inria Bordeaux |
Daney, David | Inria Bordeaux - Sud Ouest |
Keywords: Robot Safety, Human Factors and Human-in-the-Loop, Optimization and Optimal Control
Abstract: Despite the many advances in collaborative robotics, collaborative robot control laws remain similar to those used in more standard industrial robots, significantly reducing the capabilities of the robot when in proximity to a human. Improving the efficiency of collaborative robots requires revising the control approaches and modulating the robot's low-level control online and in real time, to strictly ensure the safety of the human while guaranteeing efficient task realization. In this work, a simple and fast optimization-based joint velocity controller is proposed that modulates the joint velocity constraints based on the robot's braking capabilities and the separation distance. The proposed controller is validated on the 7-degrees-of-freedom Franka Emika Panda collaborative robot.
|
|
17:30-17:45, Paper WeDT4.5 | |
>Anticipatory Human-Robot Collaboration Via Multi-Objective Trajectory Optimization |
|
Jain, Abhinav | Georgia Institute of Technology |
Chen, Daphne | Georgia Institute of Technology |
Bansal, Dhruva | Georgia Institute of Technology |
Scheele, Samuel | Georgia Institute of Technology |
Kishore, Mayank | Georgia Institute of Technology |
Sapra, Hritik | Georgia Institute of Technology |
Kent, David | Georgia Institute of Technology |
Ravichandar, Harish | Georgia Institute of Technology |
Chernova, Sonia | Georgia Institute of Technology |
Keywords: Human-Centered Robotics
Abstract: We address the problem of adapting robot trajectories to improve safety, comfort, and efficiency in human-robot collaborative tasks. To this end, we propose CoMOTO, a trajectory optimization framework that utilizes stochastic motion prediction to anticipate the human's motion and adapt the robot's joint trajectory accordingly. We design a multi-objective cost function that simultaneously optimizes for i) separation distance, ii) visibility of the end-effector, iii) legibility, iv) efficiency, and v) smoothness. We evaluate CoMOTO against three existing methods for robot trajectory generation when in close proximity to humans. Our experimental results indicate that our approach consistently outperforms existing methods over a combined set of safety, comfort, and efficiency metrics.
|
|
17:45-18:00, Paper WeDT4.6 | |
>Water Based Magnification of Capacitive Proximity Sensors: Water Containers As Passive Human Detectors |
> Video Attachment
|
|
Rocha, Rui | University of Coimbra |
de Almeida, Anibal | University of Coimbra |
Tavakoli, Mahmoud | University of Coimbra |
Keywords: Physical Human-Robot Interaction, Human-Centered Robotics, Human-Centered Automation
Abstract: Sensors that detect human presence have received increasing attention due to recent advances in smart homes, collaborative fabrication cells, and human-robot interaction. These sensors can be used in collaborative robot cells and on mobile robots to increase the robot's awareness of the presence of humans and thereby improve safety during operation. Among proximity detection systems, capacitive sensors are interesting since they are low-cost and simple human proximity detectors; however, their detection range is limited. In this article, we show that the proximity detection range of a capacitive sensor can be enhanced when the sensor is placed near a water container. In addition, the signal can pass through several adjacent water containers, even if they are separated by a few centimeters. This phenomenon has an important implication for establishing low-cost sensor networks. For instance, a limited number of active capacitive sensor nodes can be linked with several simple passive nodes, i.e. water containers, to detect human or animal proximity in a large area such as a farm, a factory or a home. We analyzed how the maximum proximity range changes with sensor dimension, container size and liquid filler in order to study this effect. Examples of application are also demonstrated.
|
|
WeDT5 |
Room T5 |
HRI: Gaze |
Regular session |
Chair: Pan, Matthew | Disney Research |
Co-Chair: Kasneci, Enkelejda | University of Tübingen |
|
16:30-16:45, Paper WeDT5.1 | |
>Gaze by Semi-Virtual Robotic Heads: Effects of Eye and Head Motion |
> Video Attachment
|
|
Vázquez, Marynel | Yale University |
Milkessa, Yofti | Yale University |
Li, Michelle M. | Yale University |
Govil, Neha | Massachusetts Institute of Technology |
Keywords: Social Human-Robot Interaction
Abstract: We study human perception of gaze rendered by popular semi-virtual robotic heads, which use a screen to render a robot's face. It is known that when these heads are stationary, the screen may induce the Mona Lisa gaze effect, which widens the robot's apparent cone of direct gaze. But how do people perceive gaze when the head can move as well? To study this question, we conducted a laboratory experiment that investigated human perception of robot gaze when a semi-virtual platform looked in different directions. We varied the way in which the robot conveyed gaze, using several behaviors involving 2D eye and head motion. Our results suggest that the interplay between these motions can regulate how wide users perceive the robot's cone of direct gaze. Also, our findings suggest that the location of observers can affect the perception of gaze by semi-virtual robotic heads. We discuss the implications of our findings for social interaction.
|
|
16:45-17:00, Paper WeDT5.2 | |
>Realistic and Interactive Robot Gaze |
> Video Attachment
|
|
Pan, Matthew | Disney Research |
Choi, Sungjoon | Disney Research |
Kennedy, James | Disney Research |
McIntosh, Kyna | Disney Research |
Campos Zamora, Daniel | Disney Research |
Niemeyer, Günter | Disney Research |
Kim, Joohyung | University of Illinois at Urbana-Champaign |
Wieland, Alexis | Walt Disney Imagineering |
Christensen, David | Stanford University |
Keywords: Social Human-Robot Interaction, Humanoid Robot Systems, Entertainment Robotics
Abstract: This paper describes the development of a system for lifelike gaze in human-robot interactions using a humanoid Audio-Animatronics® bust. Previous work examining mutual gaze between robots and humans has focused on technical implementation. We present a general architecture that seeks not only to create gaze interactions from a technological standpoint, but also through the lens of character animation where the fidelity and believability of motion is paramount; that is, we seek to create an interaction which demonstrates the illusion of life. A complete system is described that perceives persons in the environment, identifies persons-of-interest based on salient actions, selects an appropriate gaze behavior, and executes high fidelity motions to respond to the stimuli. We use mechanisms that mimic motor and attention behaviors analogous to those observed in biological systems including attention habituation, saccades, and differences in motion bandwidth for actuators. Additionally, a subsumption architecture allows layering of simple motor movements to create increasingly complex behaviors which are able to interactively and realistically react to salient stimuli in the environment through subsuming lower levels of behavior. The result of this system is an interactive human-robot experience capable of human-like gaze behaviors.
|
|
17:00-17:15, Paper WeDT5.3 | |
>Robot Gaze Behaviors in Human-To-Robot Handovers |
|
Kshirsagar, Alap | Cornell University |
Lim, Melanie | Cornell University |
Christian, Shemar | Cornell University |
Hoffman, Guy | Cornell University |
Keywords: Social Human-Robot Interaction, Cognitive Human-Robot Interaction, Physical Human-Robot Interaction
Abstract: We present the results of two studies investigating the gaze behaviors of a robot receiving an object from a human. Robot gaze is an important nonverbal behavior during human-robot handovers, yet prior work has only studied robots as givers. From a frame-by-frame video analysis of human-human handovers, we identified four receiver gaze behaviors: gazing at the giver's hand, gazing at their face, and two kinds of face-hand transition gazes. We implemented these behaviors on a robot arm equipped with an anthropomorphic head. In Study 1, participants watched and compared videos of a handover from a human actor to a robot exhibiting these four gaze behaviors. We found that when the robot transitions its head gaze from the giver's face to the giver's hand, participants consider the handover to be more likable, anthropomorphic, and communicative of timing. In Study 2, a different set of participants physically performed object handovers with the robot and rated their experiences of the handovers for each of the four gaze behaviors of the robot. We found weaker effects, with face gaze rated as the most likable and anthropomorphic behavior. In contrast to previous studies, we found no evidence that the robot's gaze affected the start time of the human's handover.
|
|
17:15-17:30, Paper WeDT5.4 | |
>Distilling Location Proposals of Unknown Objects through Gaze Information for Human-Robot Interaction |
|
Weber, Daniel | University of Tübingen |
Santini, Thiago | University of Tübingen |
Zell, Andreas | University of Tübingen |
Kasneci, Enkelejda | University of Tübingen |
Keywords: Human-Centered Robotics, Human Factors and Human-in-the-Loop, Computer Vision for Other Robotic Applications
Abstract: Successful and meaningful human-robot interaction requires robots to have knowledge about the interaction context – e.g., which objects should be interacted with. Unfortunately, the corpus of interactive objects is – for all practical purposes – infinite. This fact hinders the deployment of robots with pre-trained object-detection neural networks other than in pre-defined scenarios. A more flexible alternative to pre-training is to let a human teach the robot about new objects after deployment. However, doing so manually presents significant usability issues, as the user must manipulate the object and communicate the object's boundaries to the robot. In this work, we propose streamlining this process by using automatic object location proposal methods in combination with human gaze to distill pertinent object location proposals. Experiments show that the proposed method 1) increased the precision by a factor of approximately 21 compared to location proposals alone, 2) is able to locate objects sufficiently similar to a state-of-the-art pre-trained deep-learning method (FCOS) without any training, and 3) detected objects that were completely missed by FCOS. Furthermore, the method is able to locate objects that FCOS was not trained on, which are undetectable for FCOS by definition.
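The distillation idea can be pictured with a small sketch (purely illustrative; the paper's actual selection criterion and data structures may differ): generic object location proposals are kept only if they gather enough projected gaze samples.

import numpy as np

def distill_proposals(proposals, gaze_points, min_hits=5):
    """Keep only location proposals that receive enough gaze fixations.

    proposals:   list of (x1, y1, x2, y2) boxes from any generic proposal method.
    gaze_points: (N, 2) array of gaze samples projected into the image.
    """
    kept = []
    for (x1, y1, x2, y2) in proposals:
        inside = ((gaze_points[:, 0] >= x1) & (gaze_points[:, 0] <= x2) &
                  (gaze_points[:, 1] >= y1) & (gaze_points[:, 1] <= y2))
        if inside.sum() >= min_hits:       # enough attention on this proposal
            kept.append((x1, y1, x2, y2))
    return kept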
|
|
WeDT6 |
Room T6 |
HRI: Human Behavior Analysis |
Regular session |
Chair: Paxton, Chris | NVIDIA Research |
Co-Chair: Zanchettin, Andrea Maria | Politecnico Di Milano |
|
16:30-16:45, Paper WeDT6.1 | |
>Robust Real-Time Monitoring of Human Task Advancement for Collaborative Robotics Applications |
> Video Attachment
|
|
Maderna, Riccardo | Politecnico Di Milano |
Ciliberto, Maria | Politecnico Di Milano |
Zanchettin, Andrea Maria | Politecnico Di Milano |
Rocco, Paolo | Politecnico Di Milano |
Keywords: Human Factors and Human-in-the-Loop, Human-Centered Robotics, Intelligent and Flexible Manufacturing
Abstract: A crucial problem in human-robot collaboration is to achieve seamless coordination among the agents. Robots have to adapt to human behaviour, which is highly uncertain. In fact, humans can perform each task in many ways and with different speeds, occasional errors and short pauses. This paper offers a robust method to monitor the advancement of the current human activity in real time in order to predict its duration. The algorithm learns online templates of new variants of the task and uses them as references for a Dynamic Time Warping-based algorithm. The proposed strategy has been tested within a realistic assembly task. Results show its ability to give good predictions even in the case of unusual variants, such as those associated with errors.
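For readers unfamiliar with Dynamic Time Warping, the following generic snippet (not the authors' implementation; names are illustrative) computes the classic DTW cost between a stored template and an observed, possibly partial, execution; aligning the observation against the template is what allows the advancement of the activity to be estimated.

import numpy as np

def dtw_distance(template, observed):
    """Classic dynamic time warping cost between two 1-D sequences."""
    n, m = len(template), len(observed)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(template[i - 1] - observed[j - 1])
            # best alignment ending at (i, j): match, insertion, or deletion
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]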
|
|
16:45-17:00, Paper WeDT6.2 | |
>A Framework for Real-Time and Personalisable Human Ergonomics Monitoring |
> Video Attachment
|
|
Fortini, Luca | Istituto Italiano Di Tecnologia |
Lorenzini, Marta | Istituto Italiano Di Tecnologia |
Kim, Wansoo | Istituto Italiano Di Tecnologia |
De Momi, Elena | Politecnico Di Milano |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Keywords: Human Factors and Human-in-the-Loop, Human-Centered Automation, Modeling and Simulating Human
Abstract: The objective of this paper is to present a personalisable human ergonomics framework that integrates a method for real-time identification of a human model and an ergonomics monitoring function. The human model is based on a Statically Equivalent Serial Chain (SESC), which is used for the estimation of the whole-body centre of pressure (CoP). A recursive linear regression algorithm (i.e., a Kalman filter) is developed to achieve online identification of the human model parameters. Visual feedback provides a minimum set of suggested human poses to speed up the identification process, while enhancing the model accuracy based on a convergence value. The online ergonomics monitoring function calculates and displays the overloading effects on body joints in heavy lifting tasks. The overloading joint torques are calculated based on the displacement between the measured CoP and the estimated one. Unlike our previous work, the entire process, from model identification (personalisation) to ergonomics monitoring, is performed in real time. We evaluated the efficacy of the proposed method in human experiments during model identification and load-carrying tasks. Results demonstrate the high exploitation potential of the framework in industrial settings, due to its fast personalisation and ergonomics monitoring capacity.
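As background on the recursive linear regression step (a generic sketch, not the paper's exact SESC formulation; all symbols are illustrative), a Kalman-filter-style update for a linear-in-parameters model y = H x + v refines the parameter estimate x with every new measurement:

import numpy as np

class RecursiveLinearRegression:
    """Kalman-filter-style recursive estimator for a scalar measurement model
    y = H x + v, with H a known regressor row and v zero-mean noise."""

    def __init__(self, n_params, meas_var=1e-2, prior_var=1.0):
        self.x = np.zeros(n_params)             # parameter estimate
        self.P = prior_var * np.eye(n_params)   # estimate covariance
        self.R = meas_var                       # measurement noise variance

    def update(self, H, y):
        S = H @ self.P @ H + self.R             # innovation variance
        K = self.P @ H / S                      # Kalman gain
        self.x = self.x + K * (y - H @ self.x)  # correct the estimate
        self.P = self.P - np.outer(K, H @ self.P)
        return self.x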
|
|
17:00-17:15, Paper WeDT6.3 | |
>Predicting the Human Behaviour in Human-Robot Co-Assemblies: An Approach Based on Suffix Trees |
|
Casalino, Andrea | Politecnico Di Milano |
Massarenti, Nicola | Politecnico Di Milano |
Zanchettin, Andrea Maria | Politecnico Di Milano |
Rocco, Paolo | Politecnico Di Milano |
Keywords: Human-Centered Robotics, Assembly, Probability and Statistical Methods
Abstract: Prediction of human behaviour is essential for allowing efficient human-robot collaboration. This was recently confirmed by showing how scheduling approaches can significantly increase the productivity of a robotic cell by planning the robot's actions to be as compliant as possible with the predicted human behaviour. This work proposes an innovative approach for human activity prediction, exploiting both a-priori information and knowledge revealed during operation. The resulting approach is shown to achieve good performance both on off-line simulated sequences and in a realistic co-assembly involving a human operator and a dual-arm collaborative robot.
|
|
17:15-17:30, Paper WeDT6.4 | |
>Socially and Contextually Aware Human Motion and Pose Forecasting |
|
Adeli, Vida | Ferdowsi University of Mashhad |
Adeli, Ehsan | Stanford University |
Reid, Ian | University of Adelaide |
Niebles, Juan Carlos | Stanford University |
Rezatofighi, S. Hamid | The University of Adelaide |
Keywords: Visual-Based Navigation, Gesture, Posture and Facial Expressions, Social Human-Robot Interaction
Abstract: Smooth and seamless robot navigation while interacting with humans depends on predicting human movements. Forecasting such human dynamics often involves modeling human trajectories (global motion) or detailed body joint movements (local motion). Prior work typically tackled local and global human movements separately. In this paper, we propose a novel framework to tackle both tasks of human motion (or trajectory) and body skeleton pose forecasting in a unified end-to-end pipeline. To deal with this real-world problem, we consider incorporating both scene and social contexts, as critical clues for this prediction task, into our proposed framework. To this end, we first couple these two tasks by i) encoding their history using a shared Gated Recurrent Unit (GRU) encoder and ii) applying a metric as loss, which measures the source of errors in each task jointly as a single distance. Then, we incorporate the scene context by encoding a spatio-temporal representation of the video data. We also include social clues by generating a joint feature representation from motion and pose of all individuals from the scene using a social pooling layer. Finally, we use a GRU based decoder to forecast both motion and skeleton pose. We demonstrate that our proposed framework achieves a superior performance compared to several baselines on two social datasets.
|
|
17:30-17:45, Paper WeDT6.5 | |
>Human Grasp Classification for Reactive Human-To-Robot Handovers |
> Video Attachment
|
|
Yang, Wei | NVIDIA |
Paxton, Chris | NVIDIA Research |
Cakmak, Maya | University of Washington |
Fox, Dieter | University of Washington |
Keywords: Physical Human-Robot Interaction, Cognitive Human-Robot Interaction, Deep Learning in Grasping and Manipulation
Abstract: Transfer of objects between humans and robots is a critical capability for collaborative robots. Although there has been a recent surge of interest in human-robot handovers, most prior research focuses on robot-to-human handovers. Further, work on the equally critical human-to-robot handovers often assumes humans can place the object in the robot's gripper. In this paper, we propose an approach for human-to-robot handovers in which the robot meets the human halfway, by classifying the human's grasp of the object and quickly planning a trajectory to take the object from the human's hand according to their intent. To do this, we collect a human grasp dataset which covers typical ways of holding objects with various hand shapes and poses, and learn a deep model on this dataset to classify hand grasps into one of these categories. We present a planning and execution approach that takes the object from the human hand according to the detected grasp and hand position, and replans as necessary when the handover is interrupted. Through a systematic evaluation, we demonstrate that our system results in more fluent handovers than two baselines. We also present findings from a user study (N=9) demonstrating the effectiveness and usability of our approach with naive users in different scenarios.
|
|
17:45-18:00, Paper WeDT6.6 | |
>Analysis and Transfer of Human Movement Manipulability in Industry-Like Activities |
> Video Attachment
|
|
Jaquier, Noémie | Idiap Research Institute |
Rozo, Leonel | Bosch Center for Artificial Intelligence |
Calinon, Sylvain | Idiap Research Institute |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Learning from Demonstration
Abstract: Humans exhibit outstanding learning, planning and adaptation capabilities while performing different types of industrial tasks. Given some knowledge about the task requirements, humans are able to plan their limb motions in anticipation of the execution of specific skills. For example, when an operator needs to drill a hole in a surface, the posture of her limbs varies to guarantee a stable configuration that is compatible with the drilling task specifications, e.g. exerting a force orthogonal to the surface. Therefore, we are interested in analyzing the human arm motion patterns in industrial activities. To do so, we build our analysis on the so-called manipulability ellipsoid, which captures a posture-dependent ability to perform motion and exert forces along different task directions. Through a thorough analysis of human movement manipulability, we found that the ellipsoid shape is task-dependent and often provides more information about the human motion than classical manipulability indexes. Moreover, we show how manipulability patterns can be transferred to robots by learning a probabilistic model and employing a manipulability tracking controller that acts on the task planning and execution according to predefined control hierarchies.
|
|
WeDT7 |
Room T7 |
HRI: Learning |
Regular session |
Chair: Zhu, Yixin | University of California, Los Angeles |
Co-Chair: Tjomsland, Jonas | University of Cambridge |
|
16:30-16:45, Paper WeDT7.1 | |
>Graph-Based Hierarchical Knowledge Representation for Robot Task Transfer from Virtual to Physical World |
> Video Attachment
|
|
Zhang, Zhenliang | Tencent |
Zhu, Yixin | University of California, Los Angeles |
Zhu, Song-Chun | UCLA |
Keywords: Virtual Reality and Interfaces, Human-Centered Robotics, Learning from Demonstration
Abstract: We study the hierarchical knowledge transfer problem using a cloth-folding task, wherein the agent is first given a set of human demonstrations in the virtual world using an Oculus headset, and the learned skill is later transferred to and validated on a physical Baxter robot. We argue that such an intricate robot task transfer across different embodiments is only realizable if an abstract and hierarchical knowledge representation is formed to facilitate the process, in contrast to the prior sim2real literature in reinforcement learning settings. Specifically, the knowledge in both the virtual and physical worlds is measured by information entropy built on top of a graph-based representation, so that the problem of task transfer becomes the minimization of the relative entropy between the two worlds. An And-Or-Graph (AOG) is introduced to represent the knowledge, induced from the human demonstrations performed across six virtual scenarios inside Virtual Reality (VR). During the transfer, the success of a physical Baxter robot platform across all six tasks demonstrates the efficacy of the graph-based hierarchical knowledge representation.
|
|
16:45-17:00, Paper WeDT7.2 | |
>Building Plannable Representations with Mixed Reality |
> Video Attachment
|
|
Rosen, Eric | Brown University |
Kumar, Nishanth | Brown University |
Gopalan, Nakul | Brown University |
Ullman, Daniel | Brown University |
Tellex, Stefanie | Brown |
Konidaris, George | Brown University |
Keywords: Virtual Reality and Interfaces, Human-Centered Robotics
Abstract: We propose Action-Oriented Semantic Maps (AOSMs), a representation that enables a robot to acquire object manipulation behaviors and semantic information about the environment from a human teacher with a Mixed Reality Head-Mounted Display (MR-HMD). AOSMs are a representation that captures both: a) high-level object manipulation actions in an object class's local frame, and b) semantic representations of objects in the robot's global map that are grounded for navigation. Humans can use a MR-HMD to teach the agent the information necessary for planning object manipulation and navigation actions by interacting with virtual 3D meshes overlaid on the physical workspace. We demonstrate that our system enables users to quickly and accurately teach a robot the knowledge required to autonomously plan and execute three household tasks: picking up a bottle and throwing it in the trash, closing a sink faucet, and flipping a light switch off.
|
|
17:00-17:15, Paper WeDT7.3 | |
>Learning Human Navigation Behavior Using Measured Human Trajectories in Crowded Spaces |
|
Fahad, Muhammad | Stevens Institute of Technology |
Yang, Guang | Stevens Institute of Technology |
Guo, Yi | Stevens Institute of Technology |
Keywords: Social Human-Robot Interaction, Learning from Demonstration, Imitation Learning
Abstract: As humans and mobile robots increasingly coexist in public spaces, their close proximity demands that robots navigate following strategies similar to those exhibited by humans. This could be achieved by learning directly from human demonstration trajectories in a machine learning framework. In this paper, we present a method to learn human navigation behaviors using an imitation learning approach based on generative adversarial imitation learning (GAIL), which can directly extract a navigation policy. Specifically, we use a large open human trajectory dataset that was experimentally collected in a crowded public space. We then recreate these human trajectories in a 3D robotic simulator, and generate demonstration data using a LIDAR sensor onboard a robot while the robot follows the measured human trajectories. We then propose a GAIL-based algorithm that uses occupancy maps generated from the LIDAR data as input, and outputs a navigation policy for the robot. Simulation experiments are conducted, and performance evaluation shows that the learned navigation policy generates trajectories qualitatively and quantitatively similar to human trajectories. Compared with existing works that use analytical models (such as the social force model) to generate human demonstration trajectories, our method learns directly from intrinsic human trajectories, and thus exhibits more human-like navigation behaviors.
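For context on the GAIL component (a generic sketch of the original GAIL formulation, not the authors' code; conventions differ across implementations), the navigation policy is typically trained with a surrogate reward derived from the discriminator rather than a hand-designed reward:

import numpy as np

def gail_reward(d_policy_prob, eps=1e-8):
    # d_policy_prob: the discriminator's estimated probability that a
    # (state, action) pair was generated by the learning policy rather than
    # by the human demonstrator. The policy receives a higher reward for
    # pairs that the discriminator deems expert-like (low probability).
    return -np.log(np.clip(d_policy_prob, eps, 1.0))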
|
|
17:15-17:30, Paper WeDT7.4 | |
>Real-World Human-Robot Collaborative Reinforcement Learning |
> Video Attachment
|
|
Shafti, Ali | Imperial College London |
Tjomsland, Jonas | University of Cambridge |
Dudley, William | Imperial College London |
Faisal, Aldo | Imperial College London |
Keywords: Cognitive Human-Robot Interaction, Reinforecment Learning, Human Factors and Human-in-the-Loop
Abstract: The intuitive collaboration of humans and intelligent robots (embodied AI) in the real world is an essential objective for many desirable applications of robotics. Whilst there is much research regarding explicit communication, we focus on how humans and robots interact implicitly, at the level of motor adaptation. We present a real-world setup of a human-robot collaborative maze game, designed to be non-trivial and only solvable through collaboration, by limiting the actions to rotations about two orthogonal axes and assigning each axis to one player. This results in neither the human nor the agent being able to solve the game on their own. We use deep reinforcement learning for the control of the robotic agent, and achieve results within 30 minutes of real-world play, without any pre-training. We then use this setup to perform systematic experiments on human/agent behaviour and adaptation when co-learning a policy for the collaborative game. We present results on how co-policy learning occurs over time between the human and the robotic agent, with each participant's agent serving as a representation of how they would play the game. This allows us to relate a person's success when playing with agents other than their own by comparing the policies of those agents with that of their own agent.
|
|
17:30-17:45, Paper WeDT7.5 | |
>Enabling Robot to Assist Human in Collaborative Assembly Using Convolutional Neural Networks |
|
Chen, Yi | Clemson University |
Wang, Weitian | Clemson University |
Krovi, Venkat | Clemson University |
Jia, Yunyi | Clemson University |
Keywords: Human-Centered Robotics, Intelligent and Flexible Manufacturing
Abstract: Human-robot collaborative assembly involves humans and automated robots that cooperate with each other to accomplish complex assembly tasks, which are difficult for either humans or robots to accomplish alone. There has been some success in statistics-based and optimization-based approaches to realizing human-robot collaboration. However, they usually need a set of complex modeling and setup efforts, and the robots usually need to be programmed by a well-trained expert. In this paper, we take a new approach by introducing convolutional neural networks (CNN) into the teaching-learning-collaboration (TLC) model for collaborative assembly tasks. The proposed approach can alleviate the need for complex modeling and setup compared to existing approaches. It can collect and automatically label the data from human demonstrations and then train a CNN-based robot assistance model that makes the robot assist humans in the assembly process in real time. We have experimentally verified our proposed approach on a human-robot collaborative assembly platform, and the results suggest that the robot can successfully learn from human demonstrations to automatically generate the right actions to assist the human in accomplishing assembly tasks.
|
|
17:45-18:00, Paper WeDT7.6 | |
>A Multi-Channel Reinforcement Learning Framework for Robotic Mirror Therapy |
|
Xu, Jiajun | University of Science and Technology of China |
Xu, Linsen | Hefei Institutes of Physical Science, Chinese Academy of Sciences |
Li, You-Fu | City University of Hong Kong |
Cheng, Gaoxin | University of Science and Technology of China |
Shi, Jia | University of Science and Technology of China |
Liu, Jinfu | University of Science and Technology of China |
Chen, Shouqi | University of Science and Technology of China |
Keywords: Physical Human-Robot Interaction, Rehabilitation Robotics, Mechanism Design
Abstract: In this paper, a robotic framework is proposed for hemiparesis rehabilitation. Mirror therapy is applied to transfer therapeutic training from the patient's functional limb (FL) to the impaired limb (IL). The IL mimics the action prescribed by the FL with the assistance of the wearable robot, stimulating and strengthening the injured muscles through repetitive exercise. A master-slave robotic system is presented to implement the mirror therapy. In particular, reinforcement learning is incorporated into the human-robot interaction control to enhance rehabilitation efficacy and guarantee safety. Multi-channel sensed information, including the motion trajectory, muscle activation and the user's emotion, is incorporated into the learning algorithm. Muscle activation is measured via skin-surface electromyography (EMG) signals, and emotion is captured through facial expression. The reinforcement learning approach is realized with the normalized advantage functions (NAF) algorithm. A lower-extremity rehabilitation robot with magnetorheological (MR) actuators is then developed, and clinical experiments are carried out using the robot to verify the performance of the framework.
|
|
WeDT8 |
Room T8 |
HRI: Social Navigation |
Regular session |
Chair: Szafir, Daniel J. | University of Colorado Boulder |
Co-Chair: Wurdemann, Helge Arne | University College London |
|
16:30-16:45, Paper WeDT8.1 | |
>REFORM: Recognizing F-Formations for Social Robots |
> Video Attachment
|
|
Hedayati, Hooman | University of Colorado Boulder |
Muehlbradt, Annika | University of Colorado Boulder |
Szafir, Daniel J. | University of Colorado Boulder |
Andrist, Sean | Microsoft Research |
Keywords: Social Human-Robot Interaction, Human-Centered Robotics
Abstract: Recognizing and understanding conversational groups, or F-formations, is a critical task for situated agents designed to interact with humans. F-formations contain complex structures and dynamics, yet are used intuitively by people in everyday face-to-face conversations. Prior research exploring ways of identifying F-formations has largely relied on heuristic algorithms that may not capture the rich dynamic behaviors employed by humans. We introduce REFORM (REcognize F-FORmations with Machine learning), a data-driven approach for detecting F-formations given human and agent positions and orientations. REFORM decomposes the scene into all possible pairs and then reconstructs F-formations with a voting-based scheme. We evaluated our approach across three datasets: the SALSA dataset, a newly collected human-only dataset, and a new set of acted human-robot scenarios, and found that REFORM yielded improved accuracy over a state-of-the-art F-formation detection algorithm. We also introduce symmetry and tightness as two new quantitative measures to characterize F-formations.
|
|
16:45-17:00, Paper WeDT8.2 | |
>Modelling Social Interaction between Humans and Service Robots in Large Public Spaces |
|
Anvari, Bani | University College London |
Wurdemann, Helge Arne | University College London |
Keywords: Service Robots, Path Planning for Multiple Mobile Robots or Agents, Collision Avoidance
Abstract: With the advent of service robots in public places (e.g., in airports and shopping malls), understanding socio-psychological interactions between humans and robots is of paramount importance. On the one hand, traditional robotic navigation systems consider humans and robots as moving obstacles and focus on the problem of real-time collision avoidance in Human-Robot Interaction (HRI) using mathematical models. On the other hand, the behavior of a robot has been determined with respect to a human, with parameters for human-human interaction assumed and applied to interactions involving robots. One major limitation is the lack of sufficient data for calibration and validation procedures. This paper models, calibrates and validates the socio-psychological interaction of the human in HRIs among crowds. The mathematical model is an extension of the Social Force Model for crowd modelling. The proposed model is calibrated and validated using open-source datasets (including uninstructed human trajectories) from the Asia and Pacific Trade Center shopping mall in Osaka (Japan). In summary, the results of the calibration and validation on the multiple HRIs encountered in the datasets show that humans react to a service robot to a greater extent, and within a larger distance, than in interactions with other humans. This microscopic model, calibration and validation framework can be used to simulate HRI between service robots and humans, predict humans' behavior, conduct comparative studies, and gain insights into safe and comfortable human-robot relationships from the human's perspective.
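For reference, the classical Social Force Model that this work extends describes each pedestrian i by the following equation (standard formulation reproduced for context; the paper's calibrated robot-interaction terms are not shown):

m_i \frac{\mathrm{d}\mathbf{v}_i}{\mathrm{d}t} = m_i \frac{v_i^0 \mathbf{e}_i - \mathbf{v}_i}{\tau_i} + \sum_{j \neq i} \mathbf{f}_{ij} + \sum_{W} \mathbf{f}_{iW}

where the first term relaxes the current velocity toward the desired speed v_i^0 along direction \mathbf{e}_i over a time constant \tau_i, and \mathbf{f}_{ij}, \mathbf{f}_{iW} are repulsive forces exerted by other agents and by walls or obstacles.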
|
|
17:00-17:15, Paper WeDT8.3 | |
>Natural Criteria for Comparison of Pedestrian Flow Forecasting Models |
|
Vintr, Tomas | FEE, Czech Technical University in Prague |
Yan, Zhi | University of Technology of Belfort-Montbéliard (UTBM) |
Kubis, Filip | Faculty of Electrical Engineering, CTU |
Blaha, Jan | CTU, Departement of Computer Science |
Ulrich, Jiri | Faculty of Electrical Engineering, Czech Technical University In |
Eyisoy, Furkan Kerem | Marmara University |
Swaminathan, Chittaranjan Srinivas | Örebro University |
Molina, Sergi | University of Lincoln |
Kucner, Tomasz Piotr | Örebro Universitet |
Magnusson, Martin | Örebro University |
Cielniak, Grzegorz | University of Lincoln |
Faigl, Jan | Czech Technical University in Prague |
Duckett, Tom | University of Lincoln |
Lilienthal, Achim J. | Örebro University |
Krajník, Tomáš | Czech Technical University |
Keywords: Social Human-Robot Interaction, Motion and Path Planning, Mapping
Abstract: Models of human behaviour, such as pedestrian flows, are beneficial for safe and efficient operation of mobile robots. We present a new methodology for benchmarking of pedestrian flow models based on the afforded safety of robot navigation in human-populated environments. While previous evaluations of pedestrian flow models focused on their predictive capabilities, we assess their ability to support safe path planning and scheduling. Using real-world datasets gathered continuously over several weeks, we benchmark state-of-the-art pedestrian flow models, including both time-averaged and time-sensitive models. In the evaluation, we use the learned models to plan robot trajectories and then observe the number of times when the robot gets too close to humans, using a predefined social distance threshold. The experiments show that while traditional evaluation criteria based on model fidelity differ only marginally, the introduced criteria vary significantly depending on the model used, providing a natural interpretation of the expected safety of the system. For the time-averaged flow models, the number of encounters increases linearly with the percentage operating time of the robot, as might be reasonably expected. By contrast, for the time-sensitive models, the number of encounters grows sublinearly with the percentage operating time, by planning to avoid congested areas and times.
|
|
17:15-17:30, Paper WeDT8.4 | |
>Risk-Sensitive Sequential Action Control with Multi-Modal Human Trajectory Forecasting for Safe Crowd-Robot Interaction |
|
Nishimura, Haruki | Stanford |
Ivanovic, Boris | Stanford University |
Gaidon, Adrien | Toyota Research Institute |
Pavone, Marco | Stanford University |
Schwager, Mac | Stanford University |
Keywords: Physical Human-Robot Interaction, Collision Avoidance, Optimization and Optimal Control
Abstract: This paper presents a novel online framework for safe crowd-robot interaction based on risk-sensitive stochastic optimal control, wherein the risk is modeled by the entropic risk measure. The sampling-based model predictive control relies on mode insertion gradient optimization for this risk measure as well as Trajectron++, a state-of-the-art generative model that produces multimodal probabilistic trajectory forecasts for multiple interacting agents. Our modular approach decouples the crowd-robot interaction into learning-based prediction and model-based control, which is advantageous compared to end-to-end policy learning methods in that it allows the robot's desired behavior to be specified at run time. In particular, we show that the robot exhibits diverse interaction behavior by varying the risk sensitivity parameter. A simulation study and a real-world experiment show that the proposed online framework can accomplish safe and efficient navigation while avoiding collisions with more than 50 humans in the scene.
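For context, the entropic risk measure referred to in the abstract is commonly defined as follows (standard definition, not quoted from the paper; J denotes a random cost and \sigma > 0 the risk-sensitivity parameter):

\rho_\sigma(J) = \frac{1}{\sigma} \log \mathbb{E}\left[ e^{\sigma J} \right]

which recovers the expected cost as \sigma \to 0 and penalizes high-variance outcomes increasingly as \sigma grows.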
|
|
17:30-17:45, Paper WeDT8.5 | |
>Perception-Aware Human-Assisted Navigation of Mobile Robots on Persistent Trajectories |
> Video Attachment
|
|
Cognetti, Marco | University of Oulu |
Aggravi, Marco | CNRS |
Pacchierotti, Claudio | Centre National De La Recherche Scientifique (CNRS) |
Salaris, Paolo | University of Pisa |
Robuffo Giordano, Paolo | Centre National De La Recherche Scientifique (CNRS) |
Keywords: Human-Centered Robotics, Reactive and Sensor-Based Planning, Optimization and Optimal Control
Abstract: We propose a novel shared control and active perception framework combining the skills of a human operator in accomplishing complex tasks with the capabilities of a mobile robot in autonomously maximizing the information acquired by the onboard sensors for improving its state estimation. The human operator modifies at runtime some suitable properties of a persistent cyclic path followed by the robot so as to achieve the given task (e.g., explore an environment). At the same time, the path is concurrently adjusted by the robot with the aim of maximizing the collected information. This combined behavior enables the human operator to control the high-level task of the robot while the latter autonomously improves its state estimation. The user's commands are included in a task priority framework together with other relevant constraints, while the quality of the acquired information is measured by the Schatten norm of the Constructibility Gramian. The user is also provided with guidance feedback pointing in the direction that would maximize this information metric. We evaluated the proposed approach in two human subject studies, testing the effectiveness of including the Constructibility Gramian in the task priority framework as well as the viability of providing either visual or haptic feedback to convey this information metric.
|
|
WeDT9 |
Room T9 |
Human and Robot Teaming |
Regular session |
Chair: Short, Elaine Schaertl | Tufts University |
|
16:30-16:45, Paper WeDT9.1 | |
>Collaborative Interaction Models for Optimized Human Robot Teamwork |
> Video Attachment
|
|
Fishman, Adam | University of Washington |
Paxton, Chris | NVIDIA Research |
Yang, Wei | NVIDIA |
Fox, Dieter | University of Washington |
Boots, Byron | University of Washington |
Ratliff, Nathan | Lula Robotics Inc |
Keywords: Optimization and Optimal Control, Physical Human-Robot Interaction, Motion Control
Abstract: Effective human-robot collaboration requires informed anticipation. The robot must anticipate the human's actions, but also react quickly and intuitively when its predictions are wrong. The robot must plan its actions to account for the human's own plan, with the knowledge that the human's behavior will change based on what the robot actually does. This cyclical game of predicting a human's future actions and generating a corresponding motion plan is extremely difficult to model using standard techniques. In this work, we describe a novel Model Predictive Control (MPC)-based framework for finding optimal trajectories in a collaborative, multi-agent setting, in which we simultaneously plan for the robot while predicting the actions of its external collaborators. We use human-robot handovers to demonstrate that with a strong model of the collaborator, our framework produces fluid, reactive human-robot interactions in novel, cluttered environments. Our method efficiently generates coordinated trajectories, and achieves a high success rate in handover, even in the presence of significant sensor noise.
|
|
16:45-17:00, Paper WeDT9.2 | |
>TASC: Teammate Algorithm for Shared Cooperation |
> Video Attachment
|
|
Chang, Mai Lee | University of Texas at Austin |
Kessler Faulkner, Taylor | University of Texas at Austin |
Wei, Thomas Benjamin | The University of Texas at Austin |
Short, Elaine Schaertl | Tufts University |
Anandaraman, Gokul | The University of Texas at Austin |
Thomaz, Andrea Lockerd | University of Texas at Austin |
Keywords: Social Human-Robot Interaction, Cooperating Robots
Abstract: For robots to be perceived as full-fledged team members, they must display intelligent behavior along multiple dimensions. One challenge is that even when the robot and human are on the same team, the interaction may not feel like teamwork to the human. We present a novel algorithm, Teammate Algorithm for Shared Cooperation (TASC). TASC is motivated by the concept of shared cooperative activity (SCA) for human-human teamwork, developed in prior work by Bratman. We focus on enabling the robot to prioritize certain SCA facets in its action selection depending on the task. We evaluated TASC in three experiments using different tasks with human users on Amazon Mechanical Turk. Our results show that TASC enabled participants to predict the robot's goal earlier by one robot move and with greater confidence. The robot also helped reduce participants' energy usage in a simulated block-moving task. Altogether, these results show that considering the SCA facets in the robot's action selection improves teamwork.
|
|
17:00-17:15, Paper WeDT9.3 | |
>Lio - a Personal Robot Assistant for Human-Robot Interaction and Care Applications |
|
Miseikis, Justinas | F&P Robotics AG |
Caroni, Pietro | F&P Robotics AG |
Duchamp, Patricia | F&P Robotics AG |
Gasser, Alina | F&P Robotics AG |
Marko, Rastislav | F&P Robotics AG |
Miseikiene, Nelija | F&P Robotics AG |
Zwilling, Frederik | F&P Robotics AG |
de Castelbajac, Charles | F&P Robotics AG |
Eicher, Lucas | F&P Robotics AG |
Frueh, Michael | F&P Robotics AG |
Frueh, Hansruedi | Neuronics AG |
Keywords: Service Robotics, Human-Centered Robotics, Autonomous Agents
Abstract: Lio is a mobile robot platform with a multi-functional arm explicitly designed for human-robot interaction and personal care assistant tasks. The robot has already been deployed in several health care facilities, where it is functioning autonomously, assisting staff and patients on an everyday basis. Lio is intrinsically safe by having full coverage in soft artificial-leather material as well as collision detection, limited speed and forces. Furthermore, the robot has a compliant motion controller. A combination of visual, audio, laser, ultrasound and mechanical sensors are used for safe navigation and environment understanding. The ROS-enabled setup allows researchers to access raw sensor data as well as have direct control of the robot. The friendly appearance of Lio has resulted in the robot being well accepted by health care staff and patients. Fully autonomous operation is made possible by a flexible decision engine, autonomous navigation and automatic recharging. Combined with time-scheduled task triggers, this allows Lio to operate throughout the day, with a battery life of up to 8 hours and recharging during idle times. A combination of powerful computing units provides enough processing power to deploy artificial intelligence and deep learning-based solutions on-board the robot without the need to send any sensitive data to cloud services, guaranteeing compliance with privacy requirements. During the COVID-19 pandemic, Lio was rapidly adjusted to perform additional functionality like disinfection and remote elevated body temperature detection. It complies with ISO13482 - Safety requirements for personal care robots, meaning it can be directly tested and deployed in care facilities.
|
|
17:15-17:30, Paper WeDT9.4 | |
>Generating Alerts to Assist with Task Assignments in Human-Supervised Multi-Robot Teams Operating in Challenging Environments |
|
Al-Hussaini, Sarah | University of Southern California |
Gregory, Jason M. | US Army Research Laboratory |
Guan, Yuxiang | University of Southern California |
Gupta, Satyandra K. | University of Southern California |
Keywords: Human Factors and Human-in-the-Loop, Multi-Robot Systems, Human Performance Augmentation
Abstract: In a mission with considerable uncertainty due to intermittent communications, degraded information flow, and failures, humans need to assess both the current and expected future states, and update task assignments to robots as quickly as possible. We present a forward simulation-based alert system that proactively notifies the human supervisor of possible, negatively-impactful events, which provides an opportunity for the human to retask agents to avoid undesirable scenarios. We propose methods for speeding up mission simulations and extracting alerts from simulation data in order to enable real-time alert generation suitable for time-critical missions. We present the results from a user trial and verify our hypothesis that the decision making performance of human supervisors can be improved by introducing forward simulation-based alerts.
|
|
17:30-17:45, Paper WeDT9.5 | |
>A Visuo-Haptic Guidance Interface for the Mobile Collaborative Robotic Assistant (MOCA) |
> Video Attachment
|
|
Lamon, Edoardo | Istituto Italiano Di Tecnologia |
Fusaro, Fabio | Istituto Italiano Di Tecnologia |
Balatti, Pietro | Istituto Italiano Di Tecnologia |
Kim, Wansoo | Istituto Italiano Di Tecnologia |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Keywords: Physical Human-Robot Interaction, Human-Centered Robotics, Human Factors and Human-in-the-Loop
Abstract: In this work, we propose a novel visuo-haptic guidance interface that enables mobile collaborative robots to follow human instructions in a way understandable by non-experts. The interface is composed of a haptic admittance module and a human visual tracking module. The haptic guidance enables an individual to guide the robot end-effector in the workspace to reach and grasp arbitrary items. The visual interface, on the other hand, uses a real-time human tracking system and enables autonomous and continuous navigation of the mobile robot towards the human, with the ability to avoid static and dynamic obstacles along its path. To ensure safer human-robot interaction, the visual tracking goal is set outside of a certain area around the human body; entering this area switches the robot's behaviour to the haptic mode. The two modes are executed by two different controllers: the mobile-base admittance controller for the haptic guidance and the robot's whole-body impedance controller, which enables physically coupled and controllable locomotion and manipulation. The proposed interface is validated experimentally, where a human-guided robot performs the loading and transportation of a heavy object in a cluttered workspace, illustrating the potential of the proposed Follow-Me interface in removing the external loading from the human body in this type of repetitive industrial task.
|
|
17:45-18:00, Paper WeDT9.6 | |
>Supportive Actions for Manipulation in Human-Robot Coworker Teams |
|
Bansal, Shray | Georgia Institute of Technology |
Newbury, Rhys | Monash University |
Chan, Wesley Patrick | Monash University |
Cosgun, Akansel | Monash University |
Allen, Aimee | Monash University |
Kulic, Dana | Monash University |
Drummond, Tom | Monash University |
Isbell, Charles | Georgia Institute of Technology |
Keywords: Cooperating Robots, Human Factors and Human-in-the-Loop
Abstract: The increasing presence of robots alongside humans, such as in human-robot teams in manufacturing, gives rise to research questions about the kind of behaviors people prefer in their robot counterparts. We term actions that support interaction by reducing future interference with others as supportive robot actions and investigate their utility in a colocated manipulation scenario. We compare two robot modes in a shared table pick-and-place task: (1) Task-oriented: the robot only takes actions to further its task objective and (2) Supportive: the robot sometimes prefers supportive actions to task-oriented ones when they reduce future goal-conflicts. Our experiments in simulation, using a simplified human model, reveal that supportive actions reduce the interference between agents, especially in more difficult tasks, but also cause the robot to take longer to complete the task. We implemented these modes on a physical robot in a user study where a human and a robot perform object placement on a shared table. Our results show that a supportive robot was perceived more favorably as a coworker and also reduced interference with the human in one of two scenarios. However, it also took longer to complete the task, highlighting an interesting trade-off between task efficiency and human preference that needs to be considered before designing robot behavior for close-proximity manipulation scenarios.
|
|
WeDT10 |
Room T10 |
Social Human-Robot Interaction II |
Regular session |
Chair: Fitter, Naomi T. | Oregon State University |
Co-Chair: Shiomi, Masahiro | ATR |
|
16:30-16:45, Paper WeDT10.1 | |
>Learning to Take Good Pictures of People with a Robot Photographer |
> Video Attachment
|
|
Newbury, Rhys | Monash University |
Cosgun, Akansel | Monash University |
Koseoglu, Mehmet | University of Melbourne |
Drummond, Tom | Monash University |
Keywords: Social Human-Robot Interaction, Service Robotics, Computer Vision for Other Robotic Applications
Abstract: We present a robotic system capable of navigating autonomously by following a line and taking good quality pictures of people. When a group of people is detected, the robot rotates towards them and then back to the line while continuously taking pictures from different angles. Each picture is processed in the cloud where its quality is estimated in a two-stage algorithm. First, features such as the face orientation and likelihood of facial emotions are input to a fully connected neural network to assign a quality score to each face. Second, a representation is extracted by abstracting faces from the image and it is input to a Convolutional Neural Network (CNN) to classify the quality of the overall picture. We collected a dataset in which a picture was labeled as good quality if subjects are well-positioned in the image and oriented towards the camera with a pleasant expression. Our approach detected the quality of pictures with 78.4% accuracy in this dataset and received a better mean user rating (3.71/5) than a heuristic method that uses photographic composition procedures in a study where 97 human judges rated each picture. Statistical analysis against the state-of-the-art verified the quality of the resulting pictures.
|
|
16:45-17:00, Paper WeDT10.2 | |
>Can a Robot's Touches Express the Feeling of Kawaii Toward an Object? |
|
Okada, Yuka | Doshisha University |
Kimoto, Mitsuhiko | Keio University |
Iio, Takamasa | University of Tsukuba / JST PRESTO |
Shimohara, Katsunori | Doshisha University |
Nittono, Hiroshi | Osaka University |
Shiomi, Masahiro | ATR |
Keywords: Social Human-Robot Interaction, Service Robots
Abstract: Kawaii, a Japanese word that means “cute,” is an increasingly essential design concept in consumer and pop culture in Japan. In this study, we focused on a situation where a social robot explains an object during an information-providing task, which is commonly required of social robots in daily environments. Because kawaii feelings are associated with a motivation to approach a target, we employed touch behaviors by our robot to express the feeling of kawaii toward the object. We also focused on whether touch behaviors with an exaggerated style emphasize the feeling of kawaii of the touched object, following a phenomenon in which people strongly touch a target when they feel overwhelmingly positive emotion: cute aggression. Our experimental results showed the effectiveness of touch behaviors in expressing the feeling of kawaii from the robot toward the object and in increasing the participants’ feeling of kawaii toward the object, but weaker effects toward the robot. The emphasized motion style did not show any significant effects on the kawaii feelings.
|
|
17:00-17:15, Paper WeDT10.3 | |
>Person-Directed Pointing Gestures and Inter-Personal Relationship: Expression of Politeness to Friendliness by Android Robots |
> Video Attachment
|
|
Ishi, Carlos Toshinori | ATR |
Mikata, Ryusuke | ATR |
Ishiguro, Hiroshi | Osaka University |
Keywords: Gesture, Posture and Facial Expressions, Human and Humanoid Motion Analysis and Synthesis, Social Human-Robot Interaction
Abstract: Pointing at a person is usually deemed to be impolite. However, several different forms of person-directed pointing gestures commonly appear in casual dialogue interactions. In this study, we first analyzed pointing gestures in human-human dialogue interactions and observed different trends in the use of gesture types, based on the inter-personal relationships between dialogue partners. Then we conducted multiple subjective experiments by systematically creating behaviors in an android robot to investigate the effects of different types of pointing gestures on the impressions of its behaviors. Several factors were included: pointing gesture motion types (hand shapes, such as an open palm or an extended index finger, hand orientation, and motion direction), language types (formal or colloquial), gesture speeds, and gesture hold duration. Our evaluation results indicated that impressions of polite or casual are affected by the analyzed factors, and a behavior’s appropriateness depends on the inter-personal relationship with the dialogue partner.
|
|
17:15-17:30, Paper WeDT10.4 | |
>Socially Assistive Robots at Work: Making Break-Taking Interventions More Pleasant, Enjoyable, and Engaging |
> Video Attachment
|
|
Zhang, Brian John | Oregon State University |
Quick, Ryan Racel | Oregon State University |
Helmi, Ameer | Oregon State University |
Fitter, Naomi T. | Oregon State University |
Keywords: Social Human-Robot Interaction, Human-Centered Robotics, Service Robotics
Abstract: More than ever, people spend the workday seated in front of a computer, which contributes to health issues caused by excess sedentary behavior. While breaking up long periods of sitting can alleviate these issues, no scalable interventions have had long-term success in motivating activity breaks at work. We believe that socially assistive robotics (SAR), which combines the scalability of e-health interventions with the motivational social ability of a companion or coach, may offer a solution for changing sedentary habits. To begin this work, we designed a SAR system and conducted a within-subjects study with N = 19 participants to compare their experiences taking breaks using the SAR system versus an alarm-like device for one day each in participants’ normal workplaces. Results indicate that both systems had similar effects on sedentary behavior, but the SAR system led to greater feelings of pleasure, enjoyment, and engagement. Interviews yielded design recommendations for future systems. We find that SAR systems hold promise for further investigations of aiding healthy habit formation in work settings.
|
|
WeDT11 |
Room T11 |
Physical Human-Robot Interaction |
Regular session |
Chair: Lee, Hyunglae | Arizona State University |
Co-Chair: Chan, Wesley Patrick | Monash University |
|
16:30-16:45, Paper WeDT11.1 | |
>Interact with Me: An Exploratory Study on Interaction Factors for Active Physical Human-Robot Interaction |
> Video Attachment
|
|
Hu, Yue | National Institute of Advanced Industrial Science and Technology |
Benallegue, Mehdi | AIST Japan |
Venture, Gentiane | Tokyo University of Agriculture and Technology |
Yoshida, Eiichi | National Inst. of AIST |
Keywords: Physical Human-Robot Interaction
Abstract: In future robotic applications in environments such as nursing homes, construction sites, private homes, etc., robots might need to take unpredicted physical actions according to the state of the users to overcome possible human errors. Referring to these actions as active physical human-robot interactions (active pHRI), in this paper the goal is to verify the possibility of identifying measurable interaction factors that could be used in future active pHRI controllers, by exploring and analyzing the state of the users during active pHRI. We hypothesize that active physical robot actions can cause measurable alterations in the physical and physiological data of the users, and that these measurements could be interpreted with users' personality and perceptions. We design an experiment where the participant uses the robot to play a visual puzzle game, during which the robot takes unanticipated physical actions. We collect physiological and physical data, as well as outcomes of two state-of-the-art questionnaires on the perceptions of robots, CH-33 and the Godspeed Series Questionnaires (GSQ), and a pre-experiment personality questionnaire, to relate the collected data with the users' perceptions and personality. The experiment outcomes show that we can extract a few factors related to personality, perception, physiological, and physical measurements. Even though we could not draw very clear correlations, these outcomes give fundamental insights for the design of novel pHRI experiments.
|
|
16:45-17:00, Paper WeDT11.2 | |
>An Augmented Reality Human-Robot Physical Collaboration Interface Design for Shared, Large-Scale, Labour-Intensive Manufacturing Tasks |
|
Chan, Wesley Patrick | Monash University |
Hanks, Geoffrey | University of British Columbia |
Sakr, Maram | University of British Columbia |
Zuo, Tiger (hu) | University of British Columbia |
Van der Loos, H.F. Machiel | University of British Columbia (UBC) |
Croft, Elizabeth | Monash University |
Keywords: Virtual Reality and Interfaces, Factory Automation, Industrial Robots
Abstract: This paper investigates the potential use of augmented reality (AR) for physical human-robot collaboration in large-scale, labour-intensive manufacturing tasks. While it has been shown that the use of AR can help increase task efficiency in teleoperation and robot programming tasks involving smaller-scale robots, its use for physical human-robot collaboration in shared workspaces and large-scale manufacturing tasks has not been well studied. With the eventual goal of applying our AR system to collaborative aircraft body manufacturing, we compare, in a user study, the use of an AR interface we developed with a standard joystick for human-robot collaboration in an experimental task simulating an industrial carbon-fibre-reinforced-polymer manufacturing procedure. Results show that the use of AR yields reduced task time and physical demand, with increased robot utilization.
|
|
17:00-17:15, Paper WeDT11.3 | |
>A Control Scheme for Smooth Transition in Physical Human-Robot-Environment between Two Modes: Augmentation and Autonomous |
> Video Attachment
|
|
Li, Hsieh-Yu | Singapore University of Technology and Design |
Yang, Liangjing | Zhejiang University |
Tan, U-Xuan | Singapore University of Technology and Design |
Keywords: Physical Human-Robot Interaction, Human-Centered Robotics
Abstract: There has been an increasing demand for physical human-robot collaboration during the design prototyping phase. For example, users would like to maneuver the end-effector compliantly in free space and then supply a contact force to obtain a firm adhesive connection. The technical challenge is the design of the controller, especially during the switch from human-robot interaction (human guides robot) to robot-environment interaction, so that the robot continuously maintains the contact force even after the human lets go. Traditional controllers often result in unstable interaction during controller switches. Therefore, this paper proposes a control scheme that unifies impedance and admittance in the outer loop, and unifies adaptive position and velocity control in the inner loop, to address this issue. The cooperation of the cobot is divided into two modes, namely, an augmentation mode where the human force is the desired input to guide the motion of the cobot, and an autonomous mode where predefined position and force commands are used (e.g., to maintain a desired holding force). With the proposed control scheme, the physical interaction between the robot, human and environment can be smoothly and stably transitioned from augmentation mode to autonomous mode. Experiments are then conducted to validate the proposed approach.
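For intuition, a plain discrete-time admittance outer loop is sketched below; this is not the authors' unified impedance/admittance scheme, and the virtual mass, damping, and stiffness values are arbitrary. The integrated motion would serve as the reference for an inner-loop position/velocity controller.

```python
import numpy as np

def admittance_reference(x_d, f_ext, dt=0.002, T=1.0, M=2.0, D=20.0, K=100.0):
    """Integrate M*xdd + D*xd + K*(x - x_d) = f_ext and return the reference
    trajectory that an inner-loop position/velocity controller would track."""
    steps = int(T / dt)
    x, xd = x_d, 0.0
    traj = np.empty(steps)
    for k in range(steps):
        xdd = (f_ext - D * xd - K * (x - x_d)) / M
        xd += xdd * dt
        x += xd * dt
        traj[k] = x
    return traj

if __name__ == "__main__":
    # With these example gains, a constant 5 N push deflects the compliant
    # reference by roughly f/K = 5 cm.
    ref = admittance_reference(x_d=0.0, f_ext=5.0)
    print(f"steady-state deflection ≈ {ref[-1]:.3f} m")
```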
|
|
17:15-17:30, Paper WeDT11.4 | |
>Control Interface for Hands-Free Navigation of Standing Mobility Vehicles Based on Upper-Body Natural Movements |
|
Chen, Yang | University of Tsukuba |
Paez-Granados, Diego | EPFL - Swiss Federal School of Technology in Lausanne |
Kadone, Hideki | University of Tsukuba |
Suzuki, Kenji | University of Tsukuba |
Keywords: Physical Human-Robot Interaction, Medical Robots and Systems, Human-Centered Robotics
Abstract: In this paper, we propose and evaluate a novel human-machine interface (HMI) for controlling a standing mobility vehicle or person carrier robot, aiming for hands-free control through natural upper-body postures derived from gaze tracking while walking. We target users with lower-body impairment who retain upper-body motion capabilities. The developed HMI is based on a sensing array for capturing body postures, an intent-recognition algorithm for continuously mapping body motions to the robot control space, and a personalization system for multiple body sizes and shapes. We performed two user studies: first, an analysis of the body muscles involved in navigating with the proposed control; and second, an assessment of the HMI compared with a standard joystick through quantitative and qualitative metrics in a narrow circuit task. We concluded that the main user control contribution comes from the Rectus Abdominis and Erector Spinae muscle groups at different levels. Finally, the comparative study showed that a joystick still outperforms the proposed HMI in usability perceptions and controllability metrics; however, the smoothness of user control was similar in jerk and fluency. Moreover, users' perceptions showed that hands-free control made the vehicle feel more anthropomorphic, animated, and even safer.
|
|
17:30-17:45, Paper WeDT11.5 | |
>Regulation of 2D Arm Stability against Unstable, Damping-Defined Environments in Physical Human-Robot Interaction |
|
Zahedi, Fatemeh | Arizona State University |
Bitz, Tanner | Arizona State University |
Phillips, Connor | Arizona State University |
Lee, Hyunglae | Arizona State University |
Keywords: Physical Human-Robot Interaction, Human Factors and Human-in-the-Loop, Neurorobotics
Abstract: This paper presents an experimental study to investigate how humans interact with a robotic arm simulating primarily unstable, damping-defined, mechanical environments, and to quantify lower bounds of robotic damping that humans can stably interact with. Human subjects performed posture maintenance tasks while a robotic arm simulated a range of negative damping-defined environments and transiently perturbed the human arm to challenge postural stability. Analysis of 2-dimensional kinematic responses in both the time domain and phase space allowed us to evaluate stability of the coupled human-robot system in both anterior-posterior (AP) and medial-lateral (ML) directions, and to determine the lower bounds of robotic damping for stable physical human-robot interaction (pHRI). All subjects demonstrated higher capacity to stabilize their arm against negative damping-defined environments in the AP direction than the ML direction, evidenced by all 3 stability measures used in this study. Further, the lower bound of robotic damping for stable pHRI was more than 3.5 times lower in the AP direction than the ML direction: -30.0 Ns/m and -8.2 Ns/m in the AP and ML directions, respectively. Sensitivity analysis confirmed that the results in this study were relatively insensitive to varying experimental conditions. Outcomes of this study would allow us to design a less conservative robotic impedance controller that utilizes a wide range of robotic damping, including negative damping, and achieves more transparent and agile operations without compromising coupled stability and safety of the human-robot system, and thus improves the overall performance of pHRI.
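The notion of a damping-defined environment can be illustrated with a toy 1-D simulation. All arm parameters below are invented and do not reproduce the paper's experimentally determined bounds: the robot renders a pure damping force (negative damping is destabilizing), and coupled stability is checked by whether a transient perturbation decays.

```python
def is_coupled_interaction_stable(b_robot, m=1.5, b_human=12.0, k_human=300.0,
                                  dt=0.001, T=5.0):
    """Simulate a 1-D arm model (mass-damper-spring) coupled to a robot that
    renders pure damping b_robot [Ns/m], after a transient velocity
    perturbation. Returns True if the displacement response decays."""
    x, v = 0.0, 0.3                 # initial perturbation: 0.3 m/s
    peak_early = peak_late = 0.0
    steps = int(T / dt)
    for k in range(steps):
        f = -(b_human + b_robot) * v - k_human * x   # total restoring force
        v += (f / m) * dt
        x += v * dt
        if k < steps // 2:
            peak_early = max(peak_early, abs(x))
        else:
            peak_late = max(peak_late, abs(x))
    return peak_late < 0.1 * peak_early

if __name__ == "__main__":
    for b in (-5.0, -8.2, -30.0):
        print(f"robot damping {b:6.1f} Ns/m -> stable: "
              f"{is_coupled_interaction_stable(b)}")
```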
|
|
17:45-18:00, Paper WeDT11.6 | |
>Designing a Dummy Skin by Evaluating Contacts between a Human Hand and a Robot End Tip |
|
Iki, Yumena | Nagoya University |
Yamada, Yoji | Nagoya University |
Akiyama, Yasuhiro | Nagoya-University |
Okamoto, Shogo | Nagoya University |
Jian, Liu | Nagoya University |
Keywords: Robot Safety, Cooperating Robots, Physical Human-Robot Interaction
Abstract: In many manufacturing industries, there is a high demand for industrial robots for collaborative operation. A set of safety verification data already exists in ISO/TS 15066 for collaborative operation; however, there is no established testing method for safety validation. To establish a testing method, it may be effective to use a dummy that has mechanical properties similar to those of a human; however, no parametric study exists for designing a dummy that reproduces the static and dynamic mechanical properties, including the nonlinearity of the static properties. In this study, static and dynamic experiments are conducted to obtain the stiffness of human subjects and the contact-force transitions during dynamic contact between a robot system and a human. The same experiments are then conducted on the proposed dummy. The biofidelity of the dummy is discussed by comparing the parameters of the viscoelastic model. The proposed dummy shows the same maximum dynamic contact force as the subjects and a higher total transferred energy. Since the peak contact pressure and the total transferred energy are considered the dominant parameters for injury, the dummy is useful from the viewpoint of safety validation for injury.
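A common way to model such a contact, and a plausible basis for the parameter comparison mentioned above, is a parallel spring-damper (Kelvin-Voigt) element; the sketch below, with invented stiffness and damping values, computes the two injury-relevant quantities named in the abstract for a given indentation profile.

```python
import numpy as np

def kelvin_voigt_force(deflection, deflection_rate, k=8000.0, c=60.0):
    """Contact force of a parallel spring-damper (Kelvin-Voigt) element, a
    common viscoelastic model for soft tissue or a dummy skin pad."""
    return k * deflection + c * deflection_rate

def peak_force_and_transferred_energy(x, dt):
    """Peak contact force and total energy transferred into the skin for a
    deflection profile x(t) sampled at step dt."""
    xdot = np.gradient(x, dt)
    f = kelvin_voigt_force(x, xdot)
    energy = float(np.sum(f * xdot) * dt)   # work done on the skin
    return float(f.max()), energy

if __name__ == "__main__":
    t = np.arange(0.0, 0.1, 1e-4)
    x = 0.01 * np.sin(np.pi * t / 0.1)      # 10 mm half-sine indentation pulse
    print(peak_force_and_transferred_energy(x, 1e-4))
```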
|
|
WeDT12 |
Room T12 |
Planning for Robot Interactions with Humans |
Regular session |
Chair: Chung, Jen Jen | Eidgenössische Technische Hochschule Zürich |
Co-Chair: Yoon, Sung-eui | KAIST |
|
16:30-16:45, Paper WeDT12.1 | |
>Modeling a Social Placement Cost to Extend Navigation among Movable Obstacles (NAMO) Algorithms |
> Video Attachment
|
|
Renault, Benoit | INSA Lyon |
Saraydaryan, Jacques | Cpe Lyon |
Simonin, Olivier | INSA De Lyon |
Keywords: Motion and Path Planning, Social Human-Robot Interaction, Human-Centered Robotics
Abstract: Current Navigation Among Movable Obstacles (NAMO) algorithms focus on finding a path for the robot that only optimizes the displacement cost of navigating and moving obstacles out of its way. However, in a human environment, this focus may lead the robot to leave the space in a socially inappropriate state that may hamper human activity (i.e. by blocking access to doors, corridors, rooms or objects of interest). In this paper, we tackle this problem of Social Placement Choice by introducing a social occupation costmap computed using only geometrical information. We present how existing NAMO algorithms can be extended by exploiting this new costmap. Then, we show the effectiveness of this approach with simulations, and provide additional evaluation criteria to assess the social acceptability of plans.
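One plausible (hypothetical) ingredient of such a purely geometric social costmap is a penalty that grows near access points such as doorways, so the planner avoids parking moved obstacles there; the sketch below is an illustration, not the authors' exact formulation.

```python
import numpy as np

def social_placement_costmap(free_grid, doorway_cells, influence=6, w_social=0.8):
    """Combine a geometric occupancy grid with a social placement cost that
    grows near doorways, so a NAMO planner prefers to park moved obstacles
    away from access points. Occupied cells are forbidden placements."""
    h, w = free_grid.shape
    social = np.zeros((h, w))
    yy, xx = np.mgrid[0:h, 0:w]
    for (dy, dx) in doorway_cells:
        dist = np.hypot(yy - dy, xx - dx)
        social = np.maximum(social, np.clip(1.0 - dist / influence, 0.0, 1.0))
    return np.where(free_grid, w_social * social, np.inf)

if __name__ == "__main__":
    free = np.ones((12, 12), dtype=bool)
    free[:, 6] = False              # a wall
    free[5, 6] = True               # with a doorway at (5, 6)
    cost = social_placement_costmap(free, doorway_cells=[(5, 6)])
    print(cost[5, 5], cost[0, 0])   # near the door vs. far away
```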
|
|
16:45-17:00, Paper WeDT12.2 | |
>Optimization-Based Path Planning for Person Following Using Following Field |
> Video Attachment
|
|
Shin, Heechan | KAIST |
Yoon, Sung-eui | KAIST |
Keywords: Motion and Path Planning, Autonomous Agents, Service Robotics
Abstract: Person following is an essential task for a service robot. In indoor environments, however, the following task can fail due to occlusion of the target by structures such as walls or pillars. To address this problem, we propose a method that helps the robot follow the target well and rapidly re-detect the target after losing it. The proposed method is an optimization-based path planner that uses a Following Field, which we introduce in this paper. The following field consists of two sub-fields: a repulsion field that keeps the target person within the robot's view, and a target attraction field that pulls the robot toward the target. We describe how to construct the fields and how to merge them into the path optimization process. Finally, we present an experiment showing that our method follows the target reliably.
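As a rough stand-in for the following field (a generic potential-field formulation, not the paper's visibility-aware repulsion term; all gains are invented), the sketch below evaluates an attraction toward a following distance from the target plus a repulsion from nearby obstacles; such a field value could then be folded into the path-optimization cost.

```python
import numpy as np

def following_field(p, target, obstacles, d_follow=1.5, k_att=1.0, k_rep=2.0,
                    rep_range=1.2):
    """Evaluate a simple 2-D 'following field' at point p: attraction toward
    a desired following distance from the target plus repulsion from
    obstacles within rep_range."""
    p, target = np.asarray(p, float), np.asarray(target, float)
    to_target = target - p
    dist = np.linalg.norm(to_target) + 1e-9
    # Attraction vanishes once the robot reaches the desired following distance.
    att = k_att * (dist - d_follow) * to_target / dist
    rep = np.zeros(2)
    for obs in obstacles:
        away = p - np.asarray(obs, float)
        d = np.linalg.norm(away) + 1e-9
        if d < rep_range:
            rep += k_rep * (1.0 / d - 1.0 / rep_range) * away / (d ** 2)
    return att + rep

if __name__ == "__main__":
    f = following_field([0.0, 0.0], target=[3.0, 0.0], obstacles=[[1.0, 0.4]])
    print(f)   # points roughly toward the target, bent slightly away from the pillar
```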
|
|
17:00-17:15, Paper WeDT12.3 | |
>Human Perception-Optimized Planning for Comfortable VR Based Telepresence |
|
Becerra, Israel | Centro De Investigacion En Matematicas |
Suomalainen, Markku | University of Oulu |
Lozano, Eliezer | Centro De Investigación En Matemáticas |
Mimnaugh, Katherine J. | University of Oulu |
Murrieta-Cid, Rafael | Center for Mathematical Research |
LaValle, Steven M | University of Oulu |
Keywords: Motion and Path Planning, Virtual Reality and Interfaces, Human Factors and Human-in-the-Loop
Abstract: This paper introduces an emerging motion planning problem by considering a human that is immersed into the viewing perspective of a remote robot. The challenge is to make the experience both effective (such as delivering a sense of presence) and comfortable (such as avoiding adverse sickness symptoms, including nausea). We refer to this challenging new area as human perception-optimized planning and propose a general multiobjective optimization framework that can be instantiated in many envisioned scenarios. We then consider a specific VR telepresence task as a case of human perception-optimized planning, in which we simulate a robot that sends 360° video to a remote user to be viewed through a head-mounted display. In this particular task, we plan trajectories that minimize VR sickness (and thereby maximize comfort). An A*-type method is used to create a Pareto-optimal collection of piecewise linear trajectories while taking into account criteria that improve comfort. We conducted a study with human subjects touring a virtual museum, in which paths computed by our algorithm were compared against a reference RRT-based trajectory. Overall, users suffered less from VR sickness and preferred the paths created by the presented algorithm.
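The Pareto-filtering step can be illustrated in a few lines of code; the two criteria and the candidate values below are invented, and the paper's A*-based trajectory generation and comfort criteria are considerably richer.

```python
def pareto_front(candidates):
    """Keep trajectories not dominated in (path_length, sickness_score).
    `candidates` is a list of (name, path_length, sickness_score) tuples."""
    front = []
    for name, length, sick in candidates:
        dominated = any(l2 <= length and s2 <= sick and (l2, s2) != (length, sick)
                        for _, l2, s2 in candidates)
        if not dominated:
            front.append((name, length, sick))
    return front

if __name__ == "__main__":
    trajs = [("A", 12.0, 0.8), ("B", 15.0, 0.3), ("C", 16.0, 0.9), ("D", 12.5, 0.25)]
    print(pareto_front(trajs))   # B and C are dominated and dropped
```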
|
|
17:15-17:30, Paper WeDT12.4 | |
>IAN: Multi-Behavior Navigation Planning for Robots in Real, Crowded Environments |
> Video Attachment
|
|
Dugas, Daniel | ETH Zurich |
Nieto, Juan | ETH Zürich |
Siegwart, Roland | ETH Zurich |
Chung, Jen Jen | Eidgenössische Technische Hochschule Zürich |
Keywords: Motion and Path Planning, Social Human-Robot Interaction, Behavior-Based Systems
Abstract: State of the art approaches for robot navigation among humans are typically restricted to planar movement actions. This work addresses the question of whether it can be beneficial to use interaction actions, such as saying, touching, and gesturing, for the sake of allowing robots to navigate in unstructured, crowded environments. To do so, we first identify challenging scenarios to traditional motion planning methods. Based on the hypothesis that the variation in modality for these scenarios calls for significantly different planning policies, we design specific navigation behaviors, as interaction planners for actuated, mobile robots. We further propose a high level planning algorithm for multi-behavior navigation, named Interaction Actions for Navigation (IAN). Through both real-world and simulated experiments, we validate the selected behaviors and the high-level planning algorithm, and discuss the impact of our obtained results on our stated assumptions.
|
|
WeDT13 |
Room T13 |
Social Human-Robot Interaction I |
Regular session |
Chair: Recchiuto, Carmine Tommaso | University of Genova |
Co-Chair: Stoy, Kasper | IT University of Copenhagen |
|
16:30-16:45, Paper WeDT13.1 | |
>Robots Can Defuse High-Intensity Conflict Situations |
|
Frederiksen, Morten Roed | IT-University of Copenhagen |
Stoy, Kasper | IT University of Copenhagen |
Keywords: Social Human-Robot Interaction, Physical Human-Robot Interaction, Gesture, Posture and Facial Expressions
Abstract: This paper investigates the specific scenario of high-intensity confrontations between humans and robots, to understand how robots can defuse the conflict. It focuses on the effectiveness of using five different affective expression modalities as the main drivers for defusing the conflict. The aim is to discover any strengths or weaknesses in using each modality to mitigate the hostility that people feel towards a poorly performing robot. The defusing of the situation is accomplished by making the robot better at acknowledging the conflict and by letting it express remorse. To facilitate the tests, we used a custom affective robot in a simulated conflict situation with 105 test participants. The results show that all tested expression modalities can successfully be used to defuse the situation and convey an acknowledgment of the confrontation. The ratings were remarkably similar, but the movement modality differed from the other modalities (ANON p<.05). The test participants also had similar affective interpretations of how affected the robot was by the confrontation across all expression modalities. This indicates that defusing a high-intensity interaction may not demand special attention to the expression abilities of the robot, but rather requires the robot to be socially aware of the situation and to react in accordance with it.
|
|
16:45-17:00, Paper WeDT13.2 | |
>This or That: The Effect of Robot’s Deictic Expression on User’s Perception |
|
Kang, Dahyun | Korea Institute of Science and Technology |
Kwak, Sonya Sona | Korea Institute of Science and Technology (KIST) |
Lee, Hanbyeol | Korea Institute of Science and Technology |
Kim, Eun Ho | KITECH |
Choi, Jongsuk | Korea Inst. of Sci. and Tech |
Keywords: Social Human-Robot Interaction, Service Robots, Gesture, Posture and Facial Expressions
Abstract: The purpose of this study is to investigate a robot’s impression perceived by users as well as the accuracy of perception of location information, which the robot provided according to the modality type of the robot. To explore this, we designed two 2 (verbal types: deictic vs. descriptive) x 2 (nose pointing: with nose vs. without nose) x 2 (eye pointing: with eyes vs. without eyes) mixed-participant studies. In the first study, we investigated the impacts of the robot’s modality type in the imperative pointing situation. As a result, participants identified the robot’s pointing gesture with nose as more effective, social, and positive, than the robot’s pointing gesture without nose. Moreover, the descriptive speech robot was evaluated as more positive than the deictic speech robot. In terms of the accuracy of perception of location information, which the robot provided, participants identified the robot-designated chair more accurately when the robot delivered a deictic speech than when the robot delivered a descriptive speech. For the second study, we explored the effects of the robot’s modality type in the declarative pointing situation. As a result, the robot’s descriptive speech was rated as effective, social, natural, competent, trustworthy, and more positive than deictic speech. In the case of the robot’s pointing gestures, pointing gesture with nose was evaluated as more effective, social, natural, competent, trustworthy, and positive than that without nose. In terms of the accuracy of location information perception, participants perceived the location of the object designated by the robot more accurately when the robot used descriptive speech, pointed with nose and without eyes.
|
|
17:00-17:15, Paper WeDT13.3 | |
>Smart Speaker vs. Social Robot in a Case of Hotel Room |
> Video Attachment
|
|
Nakanishi, Junya | Osaka Univ |
Baba, Jun | CyberAgent, Inc |
Kuramoto, Itaru | Osaka University |
Ogawa, Kohei | Osaka University |
Yoshikawa, Yuichiro | Osaka University |
Ishiguro, Hiroshi | Osaka University |
Keywords: Social Human-Robot Interaction
Abstract: While social robots are increasingly being developed and studied for service encounters in public spaces, should they also be introduced into residential environments (i.e., private spaces)? This study hypothesizes that a personal assistant device in residential environments should have a human-like appearance to serve as a conversation partner. We implemented an interaction design that provides the regular services of current personal assistants plus an additional conversation-partner service, and then conducted a field experiment in which participants stayed in hotel rooms with either a smart speaker or a social robot. The results support the hypothesis in terms of the amount of conversation and the emotional experience elicited by conversation. The results also suggest the possibility of a commercial service, namely conversational advertisement through social robots.
|
|
17:15-17:30, Paper WeDT13.4 | |
>Robots versus Speakers: What Type of Central Smart Home Interface Consumers Prefer? |
|
Kwak, Sonya Sona | Korea Institute of Science and Technology (KIST) |
Kim, Jun San | KB Financial Group |
Moon, Byeong June | Seoul National University |
Kang, Dahyun | Korea Institute of Science and Technology |
Choi, Jongsuk | Korea Inst. of Sci. and Tech |
Keywords: Domestic Robots, Cognitive Human-Robot Interaction, Service Robots
Abstract: In smart home environments, central interfaces that take commands from users and give orders to each relevant device appropriately are increasingly important. We investigated the type of central interface that consumers are more willing to adopt and whether these interfaces enhance the evaluation of services provided by smart home devices. This study confirms that speaker interfaces are preferred over social robots, speaker interfaces are perceived by users as more persuasive, and the adoption of central interfaces increases the overall service evaluation.
|
|
17:30-17:45, Paper WeDT13.5 | |
>A Feasibility Study of Culture-Aware Cloud Services for Conversational Robots |
|
Recchiuto, Carmine Tommaso | University of Genova |
Sgorbissa, Antonio | University of Genova |
Keywords: Social Human-Robot Interaction, Service Robotics, Cognitive Human-Robot Interaction
Abstract: Cultural competence - i.e., the capability to adapt verbal and non-verbal interaction to the user's cultural background - may be a key element for social robots to enhance the user experience. However, designing and implementing culturally competent social robots is a complex task, given that advanced conversational skills are required. In this context, Cloud services may be useful in helping robots generate appropriate interaction patterns in a culture-aware manner. In this paper, we present the design and the implementation of the CARESSES Cloud, a set of robotic services aimed at endowing robots with cultural competence in verbal interaction. A preliminary evaluation of the Cloud services as a general dialoguing system for culture-aware social robots has been performed, analyzing the feasibility of the architecture in terms of communication and data processing delays.
|
|
WeDT14 |
Room T14 |
Virtual Reality I |
Regular session |
Chair: Szafir, Daniel J. | University of Colorado Boulder |
Co-Chair: Gopalan, Nakul | Brown University |
|
16:30-16:45, Paper WeDT14.1 | |
>Human-Robot Interaction in a Shared Augmented Reality Workspace |
> Video Attachment
|
|
Qiu, Shuwen | University of California, Los Angeles |
Liu, Hangxin | University of California, Los Angeles |
Zhang, Zeyu | UCLA |
Zhu, Yixin | University of California, Los Angeles |
Zhu, Song-Chun | UCLA |
Keywords: Virtual Reality and Interfaces, Cognitive Human-Robot Interaction
Abstract: We design and develop a new shared Augmented Reality (AR) workspace for Human-Robot Interaction (HRI), which establishes a bi-directional communication between human agents and robots. In a prototype system, the shared AR workspace enables a shared perception, so that a physical robot not only perceives the virtual elements in its own view but also infers the utility of the human agent—the cost needed to perceive and interact in AR—by sensing the human agent’s gaze and pose. Such a new HRI design also affords a shared manipulation, wherein the physical robot can control and alter virtual objects in AR as an active agent; crucially, a robot can proactively interact with human agents, instead of purely passively executing received commands. In experiments, we design a resource collection game that qualitatively demonstrates how a robot perceives, processes, and manipulates in AR and quantitatively evaluates the efficacy of HRI using the shared AR workspace. We further discuss how the system can potentially benefit future HRI studies that are otherwise challenging.
|
|
16:45-17:00, Paper WeDT14.2 | |
>An Augmented Reality Interaction Interface for Autonomous Drone |
> Video Attachment
|
|
Liu, Chuhao | Hong Kong University of Science and Technology |
Shen, Shaojie | Hong Kong University of Science and Technology |
Keywords: Aerial Systems: Applications, Virtual Reality and Interfaces
Abstract: Human-drone interaction in autonomous navigation incorporates spatial interaction tasks, including the reconstructed 3D map from the drone and the human's desired target position. Augmented Reality (AR) devices, with their spatial mapping capability, can be powerful interactive tools for handling these spatial interactions: they recognize physical objects around the human operator and render virtual objects in the physical world. In this work, we build an AR interface that displays the reconstructed 3D map from the drone on physical surfaces in front of the operator. Spatial target positions can then be set on the 3D map by intuitive head gaze and hand gestures. The AR interface is deployed to interact with an autonomous drone exploring an unknown environment, and a user study is conducted to evaluate the overall interaction performance.
|
|
17:00-17:15, Paper WeDT14.3 | |
>Visualization of Intended Assistance for Acceptance of Shared Control |
> Video Attachment
|
|
Brooks, Connor | University of Colorado Boulder |
Szafir, Daniel J. | University of Colorado Boulder |
Keywords: Human Factors and Human-in-the-Loop, Human-Centered Robotics, Telerobotics and Teleoperation
Abstract: In shared control, advances in autonomous robotics are applied to help empower a human user in operating a robotic system. While these systems have been shown to improve efficiency and operation success, users are not always accepting of the new control paradigm produced by working with an assistive controller. This mismatch between performance and acceptance can prevent users from taking advantage of the benefits of shared control systems for robotic operation. To address this mismatch, we develop multiple types of visualizations for improving both the legibility and perceived predictability of assistive controllers, then conduct a user study to evaluate the impact that these visualizations have on user acceptance of shared control systems. Our results demonstrate that shared control visualizations must be designed carefully to be effective, with users requiring visualizations that improve both legibility and predictability of the assistive controller in order to voluntarily relinquish control.
|
|
17:15-17:30, Paper WeDT14.4 | |
>Mixed Reality As a Bidirectional Communication Interface for Human-Robot Interaction |
> Video Attachment
|
|
Rosen, Eric | Brown University |
Whitney, David | Brown University |
Fishman, Michael | Brown University |
Ullman, Daniel | Brown University |
Tellex, Stefanie | Brown |
Keywords: Human-Centered Robotics, Virtual Reality and Interfaces
Abstract: We present a decision-theoretic model and robot system that interprets multimodal human communication to disambiguate item references by asking questions via a mixed reality (MR) interface. Existing approaches have either chosen to use physical behaviors, like pointing and eye gaze, or virtual behaviors, like mixed reality. However, there is a gap in research on how MR compares to physical actions for reducing robot uncertainty. We test the hypothesis that virtual deictic gestures are better for human-robot interaction (HRI) than physical behaviors. To test this hypothesis, we propose the Physio-Virtual Deixis Partially Observable Markov Decision Process (PVDPOMDP), which interprets multimodal observations (speech, eye gaze, and pointing gestures) from the human and decides when and how to ask questions (either via physical or virtual deictic gestures) in order to recover from failure states and cope with sensor noise. We conducted a between-subjects user study with 80 participants distributed across three conditions of robot communication: no feedback control, physical feedback, and MR feedback. We tested the performance of each condition with objective measures (accuracy, time) and evaluated user experience with subjective measures (usability, trust, workload). We found that the MR feedback condition was 10% more accurate than the physical condition and yielded a 160% speedup. We also found that the feedback conditions significantly outperformed the no-feedback condition in all subjective metrics.
|
|
WeDT15 |
Room T15 |
Virtual Reality II |
Regular session |
Chair: Hein, Björn | Karlsruhe Institute of Technology |
|
16:30-16:45, Paper WeDT15.1 | |
>Augmented Reality User Interfaces for Heterogeneous Multirobot Control |
> Video Attachment
|
|
Chacon Quesada, Rodrigo | Imperial College London |
Demiris, Yiannis | Imperial College London |
Keywords: Virtual Reality and Interfaces, Multi-Robot Systems, Human-Centered Robotics
Abstract: Recent advances in the design of head-mounted augmented reality (AR) interfaces for assistive human-robot interaction (HRI) have allowed untrained users to rapidly and fluently control single-robot platforms. In this paper, we investigate how such interfaces transfer onto multirobot architectures, as several assistive robotics applications need to be distributed among robots that are different both physically and in terms of software. As part of this investigation, we introduce a novel head-mounted AR interface for heterogeneous multirobot control. This interface generates and displays dynamic joint-affordance signifiers, i.e. signifiers that combine and show multiple actions from different robots that can be applied simultaneously to an object. We present a user study with 15 participants analysing the effects of our approach on their perceived fluency. Participants were given the task of filling-out a cup with water making use of a multirobot platform. Our results show a clear improvement in standard HRI fluency metrics when users applied dynamic joint-affordance signifiers, as opposed to a sequence of independent actions.
|
|
16:45-17:00, Paper WeDT15.2 | |
>What the HoloLens Maps Is Your Workspace: Fast Mapping and Set-Up of Robot Cells Via Head Mounted Displays and Augmented Reality |
> Video Attachment
|
|
Puljiz, David | Karlsruhe Institute of Technology |
Krebs, Franziska | Karlsruhe Institute of Technology |
Bösing, Fabian | Karlsruhe Institute of Technology |
Hein, Björn | Karlsruhe Institute of Technology |
Keywords: Virtual Reality and Interfaces, Industrial Robots, RGB-D Perception
Abstract: Classical methods of modelling and mapping robot work cells are time-consuming, expensive and involve expert knowledge. We present a novel approach to mapping and cell setup using modern Head Mounted Displays (HMDs) that possess self-localisation and mapping capabilities. We leveraged these capabilities to create a point cloud of the environment and build an OctoMap - a voxel occupancy grid representation of the robot's workspace for path planning. Through the use of Augmented Reality (AR) interactions, the user can edit the created OctoMap and add safety zones. We perform comprehensive tests of the HoloLens' depth sensing capabilities and the quality of the resultant point cloud. A high-end laser scanner is used to provide the ground truth for the evaluation of the point cloud quality. The number of false-positive and false-negative voxels in the OctoMap is also evaluated.
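The core quantization step behind such a voxel occupancy grid can be sketched as follows; this omits the ray casting and free-space updates that a full OctoMap performs, and the 5 cm resolution is an arbitrary example.

```python
import numpy as np

def occupied_voxels(points, resolution=0.05):
    """Quantize a 3-D point cloud (N x 3, metres) into the set of occupied
    voxel indices at the given resolution - the basic step behind building
    an occupancy grid from HMD depth data."""
    idx = np.floor(np.asarray(points, float) / resolution).astype(int)
    return {tuple(v) for v in idx.tolist()}

if __name__ == "__main__":
    cloud = np.array([[0.01, 0.02, 0.03],
                      [0.04, 0.02, 0.03],   # falls in the same 5 cm voxel
                      [0.30, 0.00, 0.10]])
    voxels = occupied_voxels(cloud)
    print(len(voxels), sorted(voxels))
```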
|
|
17:00-17:15, Paper WeDT15.3 | |
>Adaptive Precision-Enhancing Hand Rendering for Wearable Fingertip Tracking Devices |
> Video Attachment
|
|
Park, Hyojoon | Korea Institute of Science and Technology |
Park, Jung-Min | Korea Institute of Science and Technology |
Keywords: Virtual Reality and Interfaces, Wearable Robots, Haptics and Haptic Interfaces
Abstract: We introduce a 3D hand rendering framework to reconstruct a visually realistic hand from a set of fingertip positions. One of the key limitations of wearable fingertip tracking devices used in VR/AR applications is the lack of detailed measurements and tracking of the hand, making the hand rendering difficult. The motivation for this paper is to develop a general framework to render a visually plausible hand given only the fingertip positions. In addition, our framework adjusts the size of a virtual hand based on the fingertip positions and device’s structure, and reduces a mismatch between the pose of the rendered and user’s hand by retargeting virtual finger motions. Moreover, we impose a new hinge constraint on the finger model to employ a real-time inverse kinematic solver. We show our framework is helpful for performing virtual grasping tasks more efficiently when only the measurements of fingertip positions are available.
|
|
17:15-17:30, Paper WeDT15.4 | |
>Virtual Reality for Robots |
|
Suomalainen, Markku | University of Oulu |
Nilles, Alexandra | University of Illinois - Champaign-Urbana |
LaValle, Steven M | University of Oulu |
Keywords: Virtual Reality and Interfaces, Simulation and Animation, Transfer Learning
Abstract: This paper applies the principles of Virtual Reality (VR) to robots, rather than living organisms. A simulator, of either physical states or information states, renders outputs to custom displays that fool the robot's sensors. This enables a robot to experience a combination of real and virtual sensor inputs, combining the efficiency of simulation and the benefits of real world sensor inputs. Thus, the robot can be taken through targeted experiences that are more realistic than pure simulation, yet more feasible and controllable than pure real-world experiences. We define two distinctive methods for applying VR to robots, namely black box and white box; based on these methods we identify potential applications, such as testing and verification procedures that are better than simulation, the study of spoofing attacks and anti-spoofing techniques, and sample generation for machine learning. A general mathematical framework is presented, along with a simple experiment, detailed examples, and discussion of the implications.
|
|
WeDT16 |
Room T16 |
Telerobotics and Teleoperation I |
Regular session |
Chair: Ott, Christian | German Aerospace Center (DLR) |
Co-Chair: Muradore, Riccardo | University of Verona |
|
16:30-16:45, Paper WeDT16.1 | |
>A Passivity-Based Bilateral Teleoperation Architecture Using Distributed Nonlinear Model Predictive Control |
|
Piccinelli, Nicola | University of Verona |
Muradore, Riccardo | University of Verona |
Keywords: Telerobotics and Teleoperation, Optimization and Optimal Control
Abstract: Bilateral teleoperation systems allow the telepresence of an operator while working remotely. Such ability becomes crucial when dealing with critical environments like space, nuclear plants, rescue, and surgery. The main properties of a teleoperation system are stability and transparency, which are generally in conflict and cannot be fully achieved at the same time. In this paper, we present a novel model predictive controller that implements a passivity-based bilateral teleoperation algorithm. Our solution mitigates the chattering issue that arises when resorting to the energy-tank (or reservoir) mechanism by enforcing passivity as a hard constraint on the system evolution.
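For context, a classic energy-tank gate, which the proposed MPC formulation is designed to improve upon, can be sketched as below; the gains and tank levels are invented. The abrupt force cut-off when the tank empties is exactly the chattering-prone behaviour mentioned in the abstract.

```python
def tank_gated_force(f_desired, velocity, tank_energy, dt=0.001, e_min=0.05):
    """Classic energy-tank gating: transmit the desired coupling force only
    while the tank holds enough energy, otherwise cut it so the port stays
    passive. Returns the transmitted force and the updated tank level."""
    power_out = f_desired * velocity          # power the controller injects
    if power_out * dt <= tank_energy - e_min:
        f_out = f_desired                     # enough energy: pass through
    else:
        f_out = 0.0                           # tank (almost) empty: cut force
    tank_energy -= f_out * velocity * dt      # energy drained (or refilled)
    return f_out, tank_energy

if __name__ == "__main__":
    e = 0.2
    for _ in range(5):
        f, e = tank_gated_force(f_desired=20.0, velocity=2.0, tank_energy=e)
        print(f"force {f:5.1f} N, tank {e:.3f} J")   # note the abrupt cut-off
```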
|
|
16:45-17:00, Paper WeDT16.2 | |
>A Probabilistic Shared-Control Framework for Mobile Robots |
|
Gholami, Soheil | Istituto Italiano Di Tecnologia (IIT) |
Ruiz Garate, Virginia | Istituto Italiano Di Tecnologia |
De Momi, Elena | Politecnico Di Milano |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Keywords: Telerobotics and Teleoperation, Motion and Path Planning, Autonomous Agents
Abstract: Full teleoperation of mobile robots during the execution of complex tasks not only demands high cognitive and physical effort but also generates less optimal trajectories compared to autonomous controllers. However, the use of the latter in cluttered and dynamically varying environments is still an open and challenging topic. This is due to several factors such as sensory measurement failures and rapid changes in task requirements. Shared-control approaches have been introduced to overcome these issues. However, these either present a strong decoupling that leaves them sensitive to unexpected events, or highly complex interfaces only accessible to expert users. In this work, we focus on the development of a novel and intuitive shared-control framework for target detection and control of mobile robots. The proposed framework merges the information coming from a teleoperation device with a stochastic evaluation of the desired goal to generate autonomous trajectories while keeping a human-in-control approach. This allows the operator to react in case of goal changes, sensor failures, or unexpected disturbances. The proposed approach is validated through several experiments, both in simulation and in a real environment, where the users try to reach a chosen goal in the presence of obstacles and unexpected disturbances. Operators receive both visual feedback of the environment and voice feedback of the goal estimation status while teleoperating a mobile robot through a control-pad. Results of the proposed method are compared to pure teleoperation, demonstrating better time efficiency and ease of use of the presented approach.
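A generic Bayesian intent-inference step gives a flavour of such a stochastic goal evaluation; this is an illustrative stand-in rather than the authors' model, and the von Mises-like likelihood and its concentration parameter are assumptions.

```python
import numpy as np

def update_goal_belief(belief, robot_pos, goals, joystick_dir, conc=4.0):
    """One Bayesian update of the belief over candidate goals: goals whose
    direction agrees with the operator's joystick command gain probability."""
    robot_pos = np.asarray(robot_pos, float)
    u = np.asarray(joystick_dir, float)
    u = u / (np.linalg.norm(u) + 1e-9)
    likelihood = np.empty(len(goals))
    for i, g in enumerate(goals):
        d = np.asarray(g, float) - robot_pos
        d = d / (np.linalg.norm(d) + 1e-9)
        likelihood[i] = np.exp(conc * float(u @ d))   # von Mises-like model
    posterior = belief * likelihood
    return posterior / posterior.sum()

if __name__ == "__main__":
    goals = [[4.0, 0.0], [0.0, 4.0]]
    b = np.array([0.5, 0.5])
    for _ in range(3):                      # operator keeps pushing along +x
        b = update_goal_belief(b, [0.0, 0.0], goals, [1.0, 0.0])
    print(b)                                # belief concentrates on the +x goal
```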
|
|
17:00-17:15, Paper WeDT16.3 | |
>A Passivity-Shortage Based Control Framework for Teleoperation with Time-Varying Delays |
|
Babu Venkateswaran, Deepalakshmi | University of Central Florida |
Qu, Zhihua | University of Central Florida |
Keywords: Telerobotics and Teleoperation, Multi-Robot Systems, Dynamics
Abstract: This paper investigates the effect of time-varying delays in bilateral teleoperation, with respect to stability and performance, using the properties of passivity-shortage. Until recently, the desired stability and performance characteristics were achieved using the concept of passivity. However, passivity is limited to systems of relative degree zero or one, while passivity-shortage includes systems of higher relative degrees and possibly of non-minimum phase. Passivity-shortage also arises naturally when data transmission is subject to delays, either constant or time-varying. In this paper, the properties of passivity-shortage are employed to design a simple negative feedback controller. We show that the proposed method provides a faster-responding solution and improved performance compared to the existing approaches. The performance improvements include improved steady-state error convergence and robustness against environmental disturbances, even in the presence of varying delays.
|
|
17:15-17:30, Paper WeDT16.4 | |
>Proxy-Based Approach for Position Synchronization of Delayed Robot Coupling without Sacrificing Performance |
|
Singh, Harsimran | DLR German Aerospace Center |
Panzirsch, Michael | DLR Institute of Robotics and Mechatronics |
Coelho, Andre | German Aerospace Center (DLR) |
Ott, Christian | German Aerospace Center (DLR) |
Keywords: Telerobotics and Teleoperation
Abstract: The application of the position-position architecture for enabling position synchronization of two robotic agents has been proven effective in the fields of telemanipulation and rendezvous of autonomous vehicles. Nevertheless, the approaches presented to this date with the purpose of rendering the position-position architecture passive under the presence of time-delays and packet-loss are only partially able to fulfil that goal. This owes to the fact that they mostly focus on passivating the system, at the cost of transparency. Such an issue becomes even more critical in the presence of position drift caused by most passivation methods. This paper presents a novel control approach that enhances the position synchronization of agents suffering from delayed coupling, by introducing a local proxy reference to one of the agents and only closing the feedback loop when it can preserve stability. The concept is free of position drift and promises less conservatism, without having any prior information about system parameters or prior assumptions regarding time-delay. It has been experimentally validated for time-varying round-trip delays of up to 2s.
|
|
17:30-17:45, Paper WeDT16.5 | |
>The 6-DoF Implementation of the Energy-Reflection Based Time Domain Passivity Approach with Preservation of Physical Coupling Behavior |
|
Panzirsch, Michael | DLR Institute of Robotics and Mechatronics |
Singh, Harsimran | DLR German Aerospace Center |
Ott, Christian | German Aerospace Center (DLR) |
Keywords: Telerobotics and Teleoperation, Haptics and Haptic Interfaces
Abstract: Instability due to delayed communication is one of the main challenges in the coupling of autonomous robots but also in teleoperation with applications reaching from space to tele-healthcare scenarios. The Time Domain Passivity Approach assures stability despite delay and has already been validated in teleoperation scenarios from the International Space Station. It has been improved by a method considering energy reflection of the coupling controller recently. This extension has been shown to provide better performance in terms of position tracking and transmitted impedances which promises increased transparency for a human operator. This paper presents the 6-DoF implementation of the energy-reflection based approach and of an extended gradient method which promises to maintain the physical coupling behavior despite delay. An intense experimental validation confirms the performance increase due to both methods at delays up to 600ms in the 6-DoF case.
|
|
17:45-18:00, Paper WeDT16.6 | |
>Perpendicular Curve-Based Incomplete Orientation Mapping for Teleoperation with DOF Asymmetry |
> Video Attachment
|
|
Li, Gaofeng | Humanoids and Human Centered Mechatronics, Istituto Italiano Di |
Caponetto, Fernando | Istituto Italiano Di Tecnologia |
Del Bianco, Edoardo | Istituto Italiano Di Tecnologia |
Sarakoglou, Ioannis | Fondazione Istituto Italiano Di Tecnologia |
Katsageorgiou, Vasiliki-Maria | Humanoids and Human Centered Mechatronics, Istituto Italiano Di |
Tsagarakis, Nikos | Istituto Italiano Di Tecnologia |
Keywords: Telerobotics and Teleoperation, Human-Centered Robotics, Physical Human-Robot Interaction
Abstract: Teleoperation systems require a human-centered approach in which the kinematic mapping is intuitive and straightforward for the operators. However, a mismatch in the degrees-of-freedom (DoFs) between master and slave could result in an asymmetrical teleoperation system. That is an obstacle for intuitive kinematic mapping. In particular, it is even more challenging when the missing DoF is a pure rotational DoF, since the rotation group SO(3) is a nonlinear Riemannian manifold. This paper is concerned with an asymmetric teleoperation system, where the master subsystem can provide 6-DoF pose sensing while the slave subsystem only has 5 DoFs. The rotation along the missing DoF, which is configuration-dependent, is mapped to a geodesic curve in SO(3). We define and prove the closed-form solution of the perpendicular curve to the geodesic curve. Based on the perpendicular curve, we develop a novel Incomplete Orientation Mapping (IOM) approach to avoid the motion in the missing DoF. By comparing with two baseline methods, the experimental results demonstrate that the proposed method can discard the rotational motion along the missing DoF for all configurations, while preserving the remaining rotations.
|
|
WeDT17 |
Room T17 |
Telerobotics and Teleoperation II |
Regular session |
Chair: Admoni, Henny | Carnegie Mellon University |
Co-Chair: Pryor, Mitchell | University of Texas |
|
16:30-16:45, Paper WeDT17.1 | |
>Reducing the Teleoperator's Cognitive Burden for Complex Contact Tasks Using Affordance Primitives |
> Video Attachment
|
|
Pettinger, Adam | The University of Texas at Austin |
Elliott, Cassidy | The University of Texas at Austin |
Fan, Zhiyuan | University of Texas at Austin |
Pryor, Mitchell | University of Texas |
Keywords: Compliance and Impedance Control, Telerobotics and Teleoperation, Motion Control
Abstract: Using robotic manipulators to remotely perform real-world complex contact tasks is challenging whether tasks are known (due to uncertainty) or unknown a priori (lack of motion waypoints, force profiles, etc.). For known tasks we can integrate and utilize Affordance Templates with a selective compliance jogger to remotely perform high dimensional velocity/force tasks - such as turning valves, opening doors, etc. Affordance Templates (ATs) contain virtual visual representations of task-relevant objects and waypoints for interacting with visualized objects. Operators and/or developers align pre-defined ATs with real-world objects to complete complex tasks, potentially reducing the operator's input dimension to a single initiation command. In this work, we integrate a compliant controller with existing ATs to reduce the operator's burden by 1) reducing the dimension of commanded inputs, 2) internally managing contact forces even for complex tasks, and 3) providing situational awareness in the task frame. Since not all tasks can be modeled for general teleoperation, we also introduce Affordance Primitives which reduce the command dimensionality of complex spatial tasks to as low as 1-dimensional input gestures as demonstrated for this effort. To enable reduction of the command input's dimension, the same compliant jogger used to robustly handle uncertainty with ATs is used with Affordance Primitives to autonomously maintain force constraints associated with complex contact tasks. Both Affordance Templates and Affordance Primitives - when used in tandem with a compliant jogger - provide a safe, intuitive, and efficient teleoperation system for general use including using primitives to easily develop new Affordance Templates from newly completed teleoperation tasks.
|
|
16:45-17:00, Paper WeDT17.2 | |
>Design of a High-Level Teleoperation Interface Resilient to the Effects of Unreliable Robot Autonomy |
> Video Attachment
|
|
White, Samuel | Worcester Polytechnic Institute |
Bisland, Keion | Worcester Polytechnic Institute |
Collins, Michael | Worcester Polytechnic Institute |
Li, Zhi | Worcester Polytechnic Institute |
Keywords: Telerobotics and Teleoperation, Human Factors and Human-in-the-Loop, Virtual Reality and Interfaces
Abstract: High-level control is generally preferred for the control of complex robot platforms and by users inexperienced with robot teleoperation. However, high-level teleoperation interfaces can be less effective if the robot autonomy is not reliable. To address this problem, it is important to understand how the users' preference of teleoperation interface may vary with the reliability of the robot autonomy, and to understand what design features ameliorate the frustration and effort caused by unreliable autonomy. This paper proposes a graphical user interface for high-level robot control. The framework of the interface enables teleoperators to control a robot at the action level, and incorporates a simple but effective design that enables teleoperators to recover from task failure in a number of ways. We conducted a user study (N=25) comparing the performance and user experience of the proposed high-level interface with a low-level interface (i.e., a gamepad) on a representative manipulation task. We also investigated whether the high-level teleoperation interface remains effective as the reliability of the robot autonomy decreases. Our results show that a high-level interface able to handle the most frequent errors is resilient to the effects of unreliable robot autonomy. Although the total task completion time increases as the robot autonomy becomes unreliable, the users' perception of workload and task performance is not affected. The user study also reveals desirable interface features.
|
|
17:00-17:15, Paper WeDT17.3 | |
>Semi-Autonomous Control of Leader-Follower Excavator Using Admittance Control for Synchronization and Autonomy with Bifurcation and Stagnation for Human Interface |
> Video Attachment
|
|
Iwano, Kohei | Tokyo Institute of Technology |
Okada, Masafumi | Tokyo Institute of Technology |
Keywords: Telerobotics and Teleoperation, Physical Human-Robot Interaction, Mining Robotics
Abstract: To date, remote operation of excavators has relied on multiple LCD monitors and joysticks, but this setup yields low work efficiency because it is difficult for the operator to recognize the state of the excavator and its surrounding environment. We have developed a semi-autonomous control system that combines autonomy (an attractor-based dynamical system) with human action (admittance control). Excavation tasks, however, also require task selection. In this paper, we propose a nonlinear dynamical system whose attractor includes stagnation and bifurcation. The stagnation is designed as a negative-divergence vector field that converges to a point on the trajectory. A stagnation point is placed at a bifurcation point of the trajectory, and the operator selects the next task by applying a force to the leader system.
|
|
17:15-17:30, Paper WeDT17.4 | |
>Diminished Reality for Close Quarters Robotic Telemanipulation |
|
Taylor, Ada | Carnegie Mellon University |
Matsumoto, Ayaka | NARA Institute of Science and Technology |
Carter, Elizabeth | The Walt Disney Company |
Plopski, Alexander | Nara Institute of Science and Technology |
Admoni, Henny | Carnegie Mellon University |
Keywords: Virtual Reality and Interfaces, Human-Centered Robotics
Abstract: In robot telemanipulation tasks, the robot can sometimes occlude a target object from the user's view. We investigate the potential of diminished reality to address this problem. Our method uses an optical see-through head-mounted display to create a diminished reality illusion that the robot is transparent, allowing users to see occluded areas behind the robot. To investigate benefits and drawbacks of robot transparency, we conducted a user study that examined diminished reality in a simple telemanipulation task involving both occluded and unoccluded targets. We discovered that while these visualizations show promise for reducing user effort, there are drawbacks in terms of task efficiency and user preference. We identified several friction points in user experiences with diminished reality interfaces. Finally, we describe several design trade-offs among different visualization options.
|
|
17:30-17:45, Paper WeDT17.5 | |
>Telemanipulation with Chopsticks: Analyzing Human Factors in User Demonstrations |
> Video Attachment
|
|
Ke, Liyiming | University of Washington |
Kamat, Ajinkya | VNIT |
Wang, Jingqiang | University of Washington |
Bhattacharjee, Tapomayukh | University of Washington |
Mavrogiannis, Christoforos | University of Washington |
Srinivasa, Siddhartha | University of Washington |
Keywords: Telerobotics and Teleoperation, Grasping, Human and Humanoid Motion Analysis and Synthesis
Abstract: Chopsticks constitute a simple yet versatile tool that humans have used for thousands of years to perform a variety of challenging tasks ranging from food manipulation to surgery. Applying such a simple tool in a diverse repertoire of scenarios requires significant adaptability. Towards developing autonomous manipulators with comparable adaptability to humans, we study chopsticks-based manipulation to gain insights into human manipulation strategies. We conduct a within-subjects user study with 25 participants, evaluating three different data-collection methods: normal chopsticks, motion-captured chopsticks, and a novel chopstick telemanipulation interface. We analyze factors governing human performance across a variety of challenging chopstick-based grasping tasks. Although participants rated teleoperation as the least comfortable and most difficult-to-use method, teleoperation enabled users to achieve the highest success rates on three out of five objects considered. Further, we notice that subjects quickly learned and adapted to the teleoperation interface. Finally, while motion-captured chopsticks could provide a better reflection of how humans use chopsticks, the teleoperation interface can produce quality on-hardware demonstrations from which the robot can directly learn.
|
|
17:45-18:00, Paper WeDT17.6 | |
>A Passive pHRI Controller for Assisting the User in Partially Known Tasks (I) |
|
Papageorgiou, Dimitrios | Aristotle University of Thessaloniki |
Kastritsi, Theodora | Aristotle University of Thessaloniki |
Doulgeri, Zoe | Aristotle University of Thessaloniki |
Rovithakis, George | Aristotel University of Thessaloniki |
Keywords: Physical Human-Robot Interaction, Human Performance Augmentation
Abstract: In this article, a passive physical human–robot interaction (pHRI) controller is proposed to enhance pHRI performance in terms of precision, cognitive load, and user effort, in cases partial knowledge of the task is available. Partial knowledge refers to a subspace of SE(3) determined by the desired task, generally mixing both position and orientation variables, and is mathematically approximated by parametric expressions. The proposed scheme, which utilizes the notion of virtual constraints and the prescribed performance control methodology, is proved to be passive with respect to the interaction force, while guaranteeing constraint satisfaction in all cases. The control scheme is experimentally validated and compared with a dissipative control scheme utilizing a KUKA LWR4+ robot in a master–slave task; experiments also include an application to a robotic assembly case.
|
|
WeDT18 |
Room T18 |
Mapping and Planning for Distributed and Multi-Robot Systems |
Regular session |
Chair: Berman, Spring | Arizona State University |
Co-Chair: Coogan, Samuel | Georgia Tech |
|
16:30-16:45, Paper WeDT18.1 | |
>The Application of a Flexible Leader-Follower Control Algorithm to Different Mobile Autonomous Robots |
> Video Attachment
|
|
Simonsen, Aleksander Skjerlie | Norwegian Defence Research Establishment |
Ruud, Else-Line Malene | Norwegian Defence Research Establishment |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Autonomous Agents
Abstract: In a wide range of applications involving multiple mobile autonomous systems, maneuvering the robots, vehicles or vessels in some sort of formation is a vital component for the overall task performance. Maintaining a specific distance between the platforms or even a relative geometry may greatly enhance sensor performance, provide collision safety, ensure stable vehicle-to-vehicle communication, and is of critical importance when the systems are in some way physically connected. In this paper, we present a flexible leader-follower type formation control algorithm for autonomous robots which is very simple, generic, yet decent in performance. The method applies to any relative geometry between a leader and one or more followers. In addition to testing the algorithm in simulations for a wide range of scenarios, we have performed experiments involving several different autonomous systems, including small Unmanned Aerial Vehicles (UAVs), Autonomous Underwater Vehicles (AUVs) and Unmanned Surface Vehicles (USVs). This includes pairs of USVs physically interconnected by a tow.
|
|
16:45-17:00, Paper WeDT18.2 | |
>Clothoid-Based Moving Formation Control Using Virtual Structures |
|
Droge, Greg | Utah State University
Merrell, Brian | Utah State University |
Keywords: Distributed Robot Systems, Cooperating Robots, Motion and Path Planning
Abstract: Formation control is a canonical problem in multi-agent systems as many multi-agent problems require agents to travel in coordination at some point during execution. This paper develops a method for coordinated moving formation control by building upon existing virtual structures approaches to define the relative vehicle positions and orientations and building upon clothoid-based motion planning to create the desired motion of the structure. The result is a coordinated formation control method that respects individual curvature constraints of each agent while allowing agents to track their desired positions within the formation with asymptotic convergence.
|
|
17:00-17:15, Paper WeDT18.3 | |
>Multi-Robot Joint Visual-Inertial Localization and 3-D Moving Object Tracking |
|
Zhu, Pengxiang | University of California at Riverside |
Ren, Wei | University of California, Riverside |
Keywords: Distributed Robot Systems, Multi-Robot Systems, Localization
Abstract: In this paper, we present a novel distributed algorithm to track a moving object’s state by utilizing a heterogeneous mobile robot network in a three-dimensional (3-D) environment, wherein the robots’ poses (positions and orientations) are unknown. Each robot is equipped with a monocular camera and an inertial measurement unit (IMU), and has the ability to communicate with its neighbors. Rather than assuming a known common global frame for all the robots (which is often the case in the literature regarding multi-robot systems), we allow each robot to perform motion estimation locally. For localization, we propose a multi-robot visual-inertial navigation system (VINS) in which one robot builds a prior map that is then used to bound the long-term drift of the visual-inertial odometry (VIO) running on the other robots. Moreover, a novel distributed Kalman filter is introduced and employed to cooperatively track the six-degree-of-freedom (6-DoF) motion of the object, which is represented as a point cloud. Further, the object can be totally invisible to some robots during the tracking period. The proposed algorithm is extensively validated in Monte-Carlo simulations.
|
|
17:15-17:30, Paper WeDT18.4 | |
>A Distributed Scalar Field Mapping Strategy for Mobile Robots |
> Video Attachment
|
|
Lin, Tony X. | Georgia Institute of Technology |
Al-Abri, Said | Georgia Institute of Technology |
Coogan, Samuel | Georgia Tech |
Zhang, Fumin | Georgia Institute of Technology |
Keywords: Distributed Robot Systems, Mapping, Multi-Robot Systems
Abstract: This paper proposes a distributed field mapping algorithm that drives a team of robots to explore and learn an unknown scalar field. The algorithm is based on a bio-inspired approach known as Speeding-Up and Slowing-Down (SUSD) for distributed source seeking problems. Our algorithm leverages a Gaussian Process model to predict field values as robots explore. By comparing Gaussian Process predictions with measurements of the field, agents search along the gradient of the model error while simultaneously improving the Gaussian Process model. We provide a proof of convergence to the gradient direction and demonstrate our approach in simulation and experiments using 2D wheeled robots and 2D flying autonomous miniature blimps.
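The Gaussian Process ingredient of this approach can be sketched in a few lines. The snippet below is only a hedged illustration of the modelling step: fit a GP to past samples, then treat the mismatch between the prediction and a fresh measurement as the model error that drives the search. The synthetic field, kernel choice, and variable names are assumptions, and the paper's actual SUSD direction update is not reproduced.

```python
# Hedged sketch of the Gaussian-Process modelling step only (not the full
# SUSD source-seeking controller): fit a GP to collected samples and use the
# mismatch between prediction and a new measurement as the "model error".
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def true_field(xy):
    # Synthetic scalar field standing in for the unknown environment.
    return np.exp(-np.sum((xy - np.array([2.0, 1.0]))**2, axis=1))

rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(30, 2))          # past sample locations
y_train = true_field(X_train) + 0.01 * rng.standard_normal(30)

gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(1e-4))
gp.fit(X_train, y_train)

x_robot = np.array([[1.5, 0.5]])                    # a robot's current position
pred, std = gp.predict(x_robot, return_std=True)
model_error = abs(true_field(x_robot)[0] - pred[0]) # signal that drives the search
print(f"prediction {pred[0]:.3f}, uncertainty {std[0]:.3f}, error {model_error:.3f}")
```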
|
|
17:30-17:45, Paper WeDT18.5 | |
>Path Negotiation for Self-Interested Multirobot Vehicles in Shared Space |
|
Inotsume, Hiroaki | NEC Corporation |
Aggarwal, Aayush | NEC Corporation |
Higa, Ryota | NEC Corporation, National Institute of Advanced Industrial Science and Technology
Nakadai, Shinji | NEC Corporation |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Planning, Scheduling and Coordination, Multi-Robot Systems
Abstract: This paper addresses the problem of path negotiation among self-interested multirobot operators in shared space. In conventional multirobot path planning problems, most of the research thus far has focused on the coordination and planning of collision-free paths for multiple robots with some common objectives. In contrast, the recent progress of technologies in autonomous vehicles, including automated guided vehicles, unmanned aerial vehicles, and manned autonomous cars, has increased the demand for solving coordination and conflict avoidance among autonomous, self-interested agents that pursue their own objectives. In this research, we tackle this problem from the operator perspective. We assume a problem setting where collisions between robots are avoided based on path reservation and negotiation. In this setting, we propose a task-oriented utility function and a path negotiation algorithm for robot operators to maximize their task utility during path negotiation. The simulation and experiment results demonstrate the effectiveness of our task-based negotiation method over a simple path-based negotiation approach.
|
|
17:45-18:00, Paper WeDT18.6 | |
>Information Correlated Levy Walk Exploration and Distributed Mapping Using a Swarm of Robots (I) |
> Video Attachment
|
|
Ramachandran, Ragesh Kumar | University of Southern California
Kakish, Zahi | Arizona State University |
Berman, Spring | Arizona State University |
Keywords: Distributed Robot Systems, Mapping, Swarms
Abstract: In this work, we present a novel distributed method for constructing an occupancy grid map of an unknown environment using a swarm of robots with global localization capabilities and limited inter-robot communication. The robots explore the domain by performing Lévy walks in which their headings are defined by maximizing the mutual information between the robot’s estimate of its environment in the form of an occupancy grid map and the distance measurements that it is likely to obtain when it moves in that direction. Each robot is equipped with laser range sensors, and it builds its occupancy grid map by repeatedly combining its own distance measurements with map information that is broadcast by neighboring robots. Using results on average consensus over time-varying graph topologies, we prove that all robots’ maps will eventually converge to the actual map of the environment. In addition, we demonstrate that a technique based on topological data analysis, developed in our previous work for generating topological maps, can be readily extended for adaptive thresholding of occupancy grid maps. We validate the effectiveness of our distributed exploration and mapping strategy through a series of 2D simulations and multi-robot experiments.
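Two of the building blocks named in this abstract are easy to illustrate: sampling heavy-tailed Lévy step lengths and fusing occupancy evidence received from neighbours. The sketch below is a simplified stand-in (the power-law exponent, the log-odds fusion rule, and the function names are assumptions); the paper's mutual-information-based heading selection and its consensus analysis are not reproduced.

```python
# Illustrative sketch only: heavy-tailed Levy step lengths plus a simple
# log-odds fusion of occupancy values received from neighbours.  The actual
# heading choice in the paper maximizes mutual information, omitted here.
import numpy as np

def levy_step(mu=1.5, l_min=1.0, rng=np.random.default_rng()):
    """Sample a step length from a power law p(l) ~ l^(-mu), l >= l_min."""
    u = rng.uniform()
    return l_min * (1.0 - u) ** (-1.0 / (mu - 1.0))   # inverse-CDF sampling

def fuse_logodds(own, received):
    """Combine own occupancy log-odds with values broadcast by neighbours."""
    return own + sum(received)          # independent-evidence approximation

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    steps = [levy_step(rng=rng) for _ in range(5)]
    print("Levy step lengths:", np.round(steps, 2))
    print("fused cell log-odds:", fuse_logodds(own=0.4, received=[0.2, -0.1]))
```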
|
|
WeDT19 |
Room T19 |
Multi-Agent Planning & Control |
Regular session |
Chair: Kirchner, Matthew | University of California, Santa Barbara |
Co-Chair: Mahmoudian, Nina | Purdue University |
|
16:30-16:45, Paper WeDT19.1 | |
>Software Development Framework for Cooperating Robots with High-Level Mission Specification |
> Video Attachment
|
|
Hong, Hyesun | Seoul National University |
Kang, Woosuk | Seoul National University |
Ha, Soonhoi | Seoul National University |
Keywords: Software, Middleware and Programming Environments, Cooperating Robots, Distributed Robot Systems
Abstract: In recent years, there has been a growing interest in multiple robots performing a single task through different types of collaboration. There are two software challenges when deploying collaborative robots: how to specify a cooperative mission and how to program each robot to accomplish its mission. In this paper, we propose a novel software development framework to support distributed robot systems, swarm robots, and hybrids of the two. We extend the service-oriented and model-based (SeMo) framework to improve the robustness, scalability, and flexibility of robot collaboration. To enable a casual user to specify various types of cooperative missions easily, the high-level mission scripting language is extended with new features such as team hierarchy, group service, and one-to-many communication. The script program is refined to the robot codes through two intermediate steps, strategy description and task graph generation, in the proposed framework. The viability of the proposed framework is evidenced by two preliminary experiments using real robots and a robot simulator.
|
|
16:45-17:00, Paper WeDT19.2 | |
>A Hamilton-Jacobi Formulation for Optimal Coordination of Heterogeneous Multiple Vehicle Systems |
|
Kirchner, Matthew | University of California, Santa Barbara |
Debord, Mark | University of Southern California |
Hespanha, João Pedro | University of California, Santa Barbara |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems, Optimization and Optimal Control
Abstract: We present a method for optimal coordination of multiple vehicle teams when multiple endpoint configurations are equally desirable, such as in the autonomous assembly of formation flight. The individual vehicles' positions in the formation are not assigned a priori, and a key challenge is to find the optimal configuration assignment along with the optimal control and trajectory. Commonly, assignment and trajectory planning problems are solved separately. We introduce a new multi-vehicle coordination paradigm, where the optimal goal assignment and optimal vehicle trajectories are found simultaneously from a viscosity solution of a single Hamilton-Jacobi (HJ) partial differential equation (PDE), which provides a necessary and sufficient condition for global optimality. Intrinsic in this approach is that individual vehicle dynamic models need not be the same, so the method can be applied to heterogeneous systems. Numerical methods to solve the HJ equation have historically relied on a discrete grid of the solution space and exhibit exponential scaling with system dimension, preventing their applicability to multiple vehicle systems. By utilizing a generalization of the Hopf formula, we avoid the use of grids and present a method that exhibits polynomial scaling in the number of vehicles.
|
|
17:00-17:15, Paper WeDT19.3 | |
>Bounded Sub-Optimal Multi-Robot Path Planning Using Satisfiability Modulo Theory (SMT) Approach |
|
Surynek, Pavel | Czech Technical University in Prague |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Motion and Path Planning, Multi-Robot Systems
Abstract: Multi-robot path planning (MRPP) is the task of planning collision-free paths for a group of robots in a graph. Each robot starts in its individual starting vertex and its task is to reach a given goal vertex. Existing techniques for solving MRPP optimally under various objectives include search-based and compilation-based approaches. Often, however, finding an optimal solution is too difficult; hence, sub-optimal algorithms that trade off solution quality against runtime have been devised. We suggest eSMT-CBS, a new bounded sub-optimal algorithm built on top of a recent compilation-based method for optimal MRPP based on satisfiability modulo theories (SMT). We compare eSMT-CBS with ECBS, a major representative of bounded sub-optimal search-based algorithms. The experimental evaluation shows a significant advantage of eSMT-CBS across a variety of scenarios.
|
|
17:15-17:30, Paper WeDT19.4 | |
>Data Driven Online Multi-Robot Formation Planning |
> Video Attachment
|
|
Cappo, Ellen | Carnegie Mellon University |
Desai, Arjav Ashesh | Carnegie Mellon University |
Michael, Nathan | Carnegie Mellon University |
Keywords: Multi-Robot Systems, Motion and Path Planning
Abstract: This work addresses planning for multi-robot formations online in cluttered environments via a data-driven search approach. The user-specified objective function governing formation shape and rotation is expressed in terms of offline demonstrations of robot motions (performed in an obstacle free environment). We leverage the offline demonstration to inform online planning for coordinated motions in the presence of obstacles. We formulate planning as a discrete search over demonstrated multi-robot actions, and select actions using a best-first approach to minimize edge expansions for fast online operation. Actions are selected using a heuristic based on their probability distribution exhibited in the demonstration, and we show that this approach is able to recreate coordinated motions exhibited in the demonstration when navigating in the obstructed conditions of the cluttered test environments. We demonstrate results in simulation over environments with increasing numbers of obstacles, and show that resulting plans are collision free and obey dynamic constraints.
|
|
17:30-17:45, Paper WeDT19.5 | |
>Collaborative Mission Planning for Long-Term Operation Considering Energy Limitations |
|
Li, Bingxi | Michigan Technological University |
Page, Brian | Purdue University |
Moridian, Barzin | Purdue University |
Mahmoudian, Nina | Purdue University |
Keywords: Planning, Scheduling and Coordination, Multi-Robot Systems, Marine Robotics
Abstract: Mobile robotics research and deployment is highly challenged by energy limitations, particularly in marine robotics applications. This challenge can be addressed by autonomous transfer and sharing of energy in addition to effective mission planning. Specifically, it is possible to overcome energy limitations in robotic missions using an optimization approach that can generate trajectories for both working robots and mobile chargers while adapting to environmental changes. Such a method must simultaneously optimize all trajectories in the robotic network to be able to maximize overall system efficiency. This paper presents a Genetic Algorithm based approach that is capable of solving this problem at a variety of scales, both in terms of the size of the mission area and the number of robots. The algorithm is capable of re-planning during operation, allowing for the mission to adapt to changing conditions and disturbances. The proposed approach has been validated in multiple simulation scenarios. Field experiments using an autonomous underwater vehicle and a surface vehicle verify feasibility of the generated trajectories. The simulation and experimental validation show that the approach efficiently generates feasible trajectories to minimize energy use when operating multi-robot networks.
|
|
17:45-18:00, Paper WeDT19.6 | |
>Pac-Man Is Overkill |
|
Santos, Renato | Instituto Federal De Mato Grosso Do Sul |
Ramachandran, Ragesh Kumar | University of Southern California
Vieira, Marcos | Universidade Federal De Minas Gerais |
Sukhatme, Gaurav | University of Southern California |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems, Cooperating Robots
Abstract: A Pursuit-Evasion Game (PEG) consists of a team of pursuers trying to capture one or more evaders. PEG is important due to its applications in surveillance, search and rescue, disaster robotics, boundary defense, and so on. In general, PEG requires exponential time to compute the minimum number of pursuers needed to capture an evader. To mitigate this, we have designed a parallel optimal algorithm to minimize the capture time in PEG. Given a discrete topology, this algorithm also outputs the minimum number of pursuers needed to capture an evader. A classic example of PEG is the popular arcade game Pac-Man. Although the Pac-Man topology has almost 300 nodes, our algorithm can handle it. We show that Pac-Man is overkill, i.e., given the Pac-Man game topology, the game contains more pursuers/ghosts (four) than are necessary (two) to capture the evader/Pac-Man. We evaluate the proposed algorithm on many different topologies.
|
|
WeDT20 |
Room T20 |
Multi-Robot and Distributed Robot Systems |
Regular session |
Chair: Sharf, Inna | McGill University |
Co-Chair: Schwager, Mac | Stanford University |
|
16:30-16:45, Paper WeDT20.1 | |
>Distributed Motion Control for Multiple Connected Surface Vessels |
> Video Attachment
|
|
Wang, Wei | Massachusetts Institute of Technology |
Wang, Zijian | Stanford University |
Mateos, Luis | MIT |
Huang, Kuan Wei | Massachusetts Institute of Technology |
Schwager, Mac | Stanford University |
Ratti, Carlo | Massachusetts Institute of Technology |
Rus, Daniela | MIT |
Keywords: Distributed Robot Systems, Marine Robotics, Multi-Robot Systems
Abstract: We propose a scalable cooperative control approach which coordinates a group of rigidly connected autonomous surface vessels to track desired trajectories in a planar water environment as a single floating modular structure. Our approach leverages the implicit information of the structure's motion for force and torque allocation without explicit communication among the robots. In our system, a leader robot steers the entire group by adjusting its force and torque according to the structure's deviation from the desired trajectory, while follower robots run distributed consensus-based controllers to match their inputs to amplify the leader's intent using only onboard sensors as feedback. To cope with the nonlinear system dynamics in the water, the leader robot employs a nonlinear model predictive controller (NMPC), where we experimentally estimated the dynamics model of the floating modular structure in order to achieve superior performance for leader-following control. Our method has a wide range of potential applications in transporting humans and goods in many of today's existing waterways. We conducted trajectory and orientation tracking experiments in hardware with three custom-built autonomous modular robotic boats, called Roboat, which are capable of holonomic motions and onboard state estimation. Simulation results with up to 65 robots also prove the scalability of our proposed approach.
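The consensus idea used by the follower vessels can be illustrated in a few lines. The sketch below is a heavily simplified stand-in (scalar force commands, a hand-picked line graph, and the step size alpha are assumptions); the leader's NMPC and the experimentally estimated hydrodynamic model from the paper are not included.

```python
# Hedged sketch of the consensus idea only: follower vessels iteratively
# average their force commands with neighbours so the group input converges
# toward the leader's intent, using only local exchanges.
import numpy as np

# Communication graph over 4 vessels: 0 is the leader, edges are undirected.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
forces = np.array([2.0, 0.0, 0.0, 0.0])   # initial force commands (leader = 2 N)
alpha = 0.3                                # consensus step size

for _ in range(50):
    new = forces.copy()
    for i, nbrs in neighbors.items():
        if i == 0:
            continue                       # leader keeps its own command
        new[i] += alpha * sum(forces[j] - forces[i] for j in nbrs)
    forces = new

print("follower forces after consensus:", np.round(forces, 3))
```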
|
|
16:45-17:00, Paper WeDT20.2 | |
>Distributed Model Predictive Control for UAVs Collaborative Payload Transport |
|
Wehbeh, Jad | McGill University |
Rahman, Shatil | University of Toronto |
Sharf, Inna | McGill University |
Keywords: Optimization and Optimal Control, Aerial Systems: Mechanics and Control, Distributed Robot Systems
Abstract: We consider the problem of collaborative transport of a payload using several quadrotor vehicles. The payload is assumed to be a rigid body and is attached to the vehicles with rigid rods. The model of the system is presented and is employed to formulate a Model Predictive Controller. The centralized MPC formulation differs from others in the literature in the way the linearized model of the system is employed about a non-equilibrium state-input pair. We then present a decentralized formulation of MPC by distributing the computations among the vehicles. Simulations of both versions of the controller are carried out for a four-quadrotor system performing a transport maneuver of a box payload, with a cost penalizing the deviations of the vehicles from the desired trajectory and the attitude perturbations of the payload. The results confirm that the decentralized controller can yield performance comparable to the centralized MPC implementation, for the same computation time of the two algorithms.
|
|
17:00-17:15, Paper WeDT20.3 | |
>Multi-UAV Surveillance with Minimum Information Idleness and Latency Constraints |
|
Scherer, Jürgen | University of Klagenfurt |
Rinner, Bernhard | Alpen-Adria-Universität Klagenfurt |
Keywords: Multi-Robot Systems, Motion and Path Planning, Search and Rescue Robots
Abstract: We discuss surveillance with multiple unmanned aerial vehicles (UAVs) that minimize information idleness (the lag between the start of the mission and the moment when the data captured at a sensing location arrives at the base station) and constrain latency (the lag between capturing data at a sensing location and its arrival at the base station). This is important in surveillance scenarios where sensing locations should not only be visited as soon as possible, but the captured data also needs to reach the base station in due time, especially if the surveillance area is larger than the communication range. In our approach, multiple UAVs cooperatively transport the data in a store-and-forward fashion along minimum latency paths (MLPs) to guarantee data delivery within a predefined latency bound. Additionally, MLPs specify a lower bound for any latency minimization problem where multiple mobile agents transport data in a store-and-forward fashion. We introduce three variations of a heuristic employing MLPs and compare their performance with an uncooperative approach in a simulation study. The results show that cooperative data transport reduces the information idleness at the base station compared to the uncooperative approach where data is transported individually by the UAVs.
|
|
17:15-17:30, Paper WeDT20.4 | |
>BioARS: Designing Adaptive and Reconfigurable Bionic Assembly Robotic System with Inchworm Modules |
> Video Attachment
|
|
Liu, Yide | Zhejiang University |
Zhao, Donghao | Zhejiang University |
Chen, Yanhong | Zhejiang University |
Wang, Dongqi | Zhejiang University |
Wen, Zhou | Zhejiang University |
Ziyi, Ye | Zhejiang University |
Guo, Jianhui | University |
Zhou, Haofei | Zhejiang University |
Qu, Shaoxing | Zhejiang University |
Yang, Wei | Zhejiang University |
Keywords: Multi-Robot Systems, Biologically-Inspired Robots
Abstract: Designing a swarm of robots to address different tasks and adapt to various environments through self-assembly is one of the most challenging topics in the field of robotics research. Here, we present an assembly robotic system with inchworm robots as modules. The system is called BioARS (Bionic Assembly Robotic System). It can either work as a swarm of individual untethered inchworm robots or be assembled into a quadruped robot. The inchworm robots are connected by magnets using a “shoulder-to-shoulder” connecting method, which helps strengthen the magnetic connection. Central pattern generators are used to control the trot gait of the quadruped robot. Our experiments demonstrate that the bionic assembly system is adaptive in that it can pass through confined spaces in the form of inchworms and walk on rough terrain in the form of a quadruped robot. The proposed BioARS therefore combines the flexibility of inchworms with the adaptability of quadruped animals, which is promising for applications such as planetary exploration and earthquake search-and-rescue operations. We also outline several directions for future research on the system.
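The central-pattern-generator idea for the trot gait can be illustrated with coupled phase oscillators whose desired offsets put diagonal legs in phase and the two diagonals half a cycle apart. The sketch below is only illustrative: the stride frequency, coupling gain, and oscillator form are assumptions and may differ from the CPG actually used in the paper.

```python
# Minimal coupled-oscillator sketch of a trot-gait CPG (diagonal legs in
# phase, the two diagonal pairs half a cycle apart).
import numpy as np

legs = ["FL", "FR", "HL", "HR"]
# Desired phase offsets for trot: FL/HR together, FR/HL together, shifted by pi.
target = np.array([0.0, np.pi, np.pi, 0.0])
phases = np.random.default_rng(0).uniform(0, 2 * np.pi, 4)
omega, k, dt = 2 * np.pi * 1.5, 4.0, 0.01       # 1.5 Hz stride, coupling gain

for _ in range(2000):
    coupling = np.zeros(4)
    for i in range(4):
        for j in range(4):
            coupling[i] += np.sin(phases[j] - phases[i] - (target[j] - target[i]))
    phases += dt * (omega + k * coupling)

rel = np.mod(phases - phases[0], 2 * np.pi)
print({leg: round(r, 2) for leg, r in zip(legs, rel)})  # expect roughly [0, pi, pi, 0]
```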
|
|
17:30-17:45, Paper WeDT20.5 | |
>3D Coating Self-Assembly for Modular Robotic Scaffolds |
|
Thalamy, Pierre | Univ. Bourgogne Franche-Comté / FEMTO-ST / CNRS |
Piranda, Benoît | Université De Franche-Comté / FEMTO-ST |
Bourgeois, Julien | Institut FEMTO-ST |
Keywords: Cellular and Modular Robots
Abstract: This paper addresses the self-reconfiguration problem in large-scale modular robots for the purpose of shape formation for object representation. It aims to show that this process can be accelerated without compromising the visual aspect of the final object, by creating an internal skeleton of the shape using the previously introduced sandboxing and scaffolding techniques, and then coating this skeleton with a layer of modules for higher visual fidelity. We discuss the challenges of the coating problem, introduce a basic method for constructing the coating of a scaffold layer by layer, and show that even with a straightforward algorithm, the combination of scaffolding and coating uses far fewer modules than dense shapes and offers attractive reconfiguration times. Finally, we show that it could be a strong alternative to the construction of dense shapes using traditional self-reconfiguration algorithms.
|
|
17:45-18:00, Paper WeDT20.6 | |
>Autonomous Model-Based Assessment of Mechanical Failures of Reconfigurable Modular Robots with a Conjugate Gradient Solver |
|
Holobut, Pawel | Institute of Fundamental Technological Research, Polish Academy of Sciences
Bordas, Stephane | University of Luxembourg |
Lengiewicz, Jakub | Institute of Fundamental Technological Research, Polish Academy of Sciences
Keywords: Cellular and Modular Robots, Distributed Robot Systems, Whole-Body Motion Planning and Control
Abstract: Large-scale 3D autonomous self-reconfigurable modular robots are made of numerous interconnected robotic modules that operate in a close packing. The modules are assumed to have their own CPU and memory, and are only able to communicate with their direct neighbors. As such, the robots embody a special computing architecture: a distributed memory and distributed CPU system with a local message-passing interface. The modules can move and rearrange themselves changing the robot's connection topology. This may potentially cause mechanical failures (e.g., overloading of some inter-modular connections), which are irreversible and need to be detected in advance. In the present contribution, we further develop the idea of performing model-based detection of mechanical failures, posed in the form of balance equations solved by the modular robot itself in a distributed manner. A special implementation of the Conjugate Gradient iterative solution method is proposed and shown to greatly reduce the required number of iterations compared with the weighted Jacobi method used previously. The algorithm is verified in a virtual test bed---the VisibleSim emulator of the modular robot. The assessments of time-, CPU-, communication- and memory complexities of the proposed scheme are provided.
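For reference, the Conjugate Gradient iteration at the core of this scheme is shown below in centralized form on a small symmetric positive-definite test system. The distributed, message-passing realization described in the paper is not reproduced, and the test matrix is an assumption made for illustration.

```python
# Centralized sketch of the Conjugate Gradient iteration the paper
# distributes across modules; here it just solves A x = b for a small
# symmetric positive-definite system.
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    x = np.zeros_like(b)
    r = b - A @ x                 # residual
    p = r.copy()                  # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

if __name__ == "__main__":
    A = np.array([[4.0, 1.0], [1.0, 3.0]])   # SPD test matrix
    b = np.array([1.0, 2.0])
    print(conjugate_gradient(A, b), "vs", np.linalg.solve(A, b))
```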
|
|
WeDT21 |
Room T21 |
Multi-Robot Systems: Coverage |
Regular session |
Chair: Peterson, Cammy | Brigham Young University |
|
16:30-16:45, Paper WeDT21.1 | |
>Game Theoretic Formation Design for Probabilistic Barrier Coverage |
|
Shishika, Daigo | University of Pennsylvania |
Guimarães Macharet, Douglas | Universidade Federal De Minas Gerais |
Sadler, Brian | Army Research Laboratory |
Kumar, Vijay | University of Pennsylvania, School of Engineering and Applied Science
Keywords: Multi-Robot Systems, Sensor Networks, Surveillance Systems
Abstract: We study strategies to deploy defenders/sensors to detect intruders that approach a targeted region. This scenario is formulated as a barrier coverage problem, which aims to minimize the number of unseen paths. The problem becomes challenging when the number of defenders is insufficient for full coverage, requiring us to find the most effective locations to deploy them. To this end, we use ideas from game theory to account for the various paths that the intruders may take. Specifically, we propose an iterative algorithm to refine the set of candidate defender formations, which uses the payoff matrix to directly evaluate the utility of different formations. Given the set of candidate formations, a mixed Nash equilibrium gives a stochastic policy for deploying the defenders. The efficacy of the proposed strategy is demonstrated by a numerical analysis that compares our method with an existing graph-theoretic method.
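Once a payoff matrix over candidate formations and intruder paths is available, the defenders' mixed strategy can be obtained from a small linear program. The sketch below illustrates this step only, on a made-up 3x3 payoff matrix (the payoff values and variable names are assumptions); the paper's iterative refinement of the candidate formation set is not reproduced.

```python
# Hedged sketch: computing the defenders' optimal mixed strategy for a small
# zero-sum payoff matrix with a linear program.
import numpy as np
from scipy.optimize import linprog

# payoff[i, j] = detection probability when defenders use formation i and
# the intruder takes path j (defender maximizes, intruder minimizes).
payoff = np.array([[0.9, 0.3, 0.5],
                   [0.4, 0.8, 0.4],
                   [0.5, 0.5, 0.7]])
m, n = payoff.shape

# Variables: formation probabilities x (m of them) and the game value v.
c = np.zeros(m + 1)
c[-1] = -1.0                                    # maximize v  ->  minimize -v
A_ub = np.hstack([-payoff.T, np.ones((n, 1))])  # v <= sum_i x_i payoff[i, j]
b_ub = np.zeros(n)
A_eq = np.zeros((1, m + 1))
A_eq[0, :m] = 1.0                               # probabilities sum to one
b_eq = np.array([1.0])
bounds = [(0, None)] * m + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("formation mix:", np.round(res.x[:m], 3), "game value:", round(res.x[-1], 3))
```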
|
|
16:45-17:00, Paper WeDT21.2 | |
>Persistent Intelligence, Surveillance, and Reconnaissance Using Multiple Autonomous Vehicles with Asynchronous Route Updates |
|
Peterson, Cammy | Brigham Young University |
Casbeer, David | AFRL |
Manyam, Satyanarayana Gupta | Air Force Research Labs |
Rasmussen, Steven | Miami Valley Aerospace LLC |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems, Cooperating Robots
Abstract: Providing persistent intelligence, reconnaissance, and surveillance of targets is a challenging, but important task when time-critical information is required. In this letter, we provide a decentralized routing algorithm for coordinating multiple autonomous vehicles as they visit a discrete set of pre-defined targets with weighted revisit priorities. The algorithm utilizes a block coordinate ascent algorithm combined with a Monte Carlo tree search to tractably decide each vehicle's route. The result is a non-myopic algorithm for multiple vehicles that is decentralized, computationally tractable, and allows for target prioritization. Guarantees are provided that all targets will have finite revisit times and that the block coordinate ascent algorithm will converge to a block optimal solution. Numerical simulations illustrate the utility of this method by showing that the results are comparable to those of a centralized exhaustive search and that they degrade gracefully with limited communication and scale under increasing numbers of targets and vehicles.
|
|
17:00-17:15, Paper WeDT21.3 | |
>Adaptive Informative Sampling with Environment Partitioning for Heterogeneous Multi-Robot Systems |
> Video Attachment
|
|
Shi, Yunfei | Carnegie Mellon University |
Wang, Ning | Carnegie Mellon University |
Zheng, Jianmin | Carnegie Mellon University |
Zhang, Yang | Carnegie Mellon University |
Yi, Sha | Carnegie Mellon University |
Luo, Wenhao | Carnegie Mellon University |
Sycara, Katia | Carnegie Mellon University |
Keywords: Multi-Robot Systems, Environment Monitoring and Management, Distributed Robot Systems
Abstract: Multi-robot systems are widely used in environmental exploration and modeling, especially in hazardous environments. However, different types of robots have different limitations in mobility, battery life, sensor type, etc. Heterogeneous robot systems can utilize various types of robots and provide solutions in which robots compensate for one another's limitations through their different capabilities. In this paper, we consider the problem of sampling and modeling environmental characteristics with a heterogeneous team of robots. To utilize the heterogeneity of the system while remaining computationally tractable, we propose an environment partitioning approach that leverages the various robot capabilities by forming a uniformly defined heterogeneity cost space. We combine this with a mixture-of-Gaussian-Processes model-learning framework to adaptively sample and model the environment in an efficient and scalable manner. We demonstrate our algorithm in field experiments with ground and aerial vehicles.
|
|
17:15-17:30, Paper WeDT21.4 | |
>Multi-Robot Containment and Disablement |
> Video Attachment
|
|
Maymon, Yuval | Bar Ilan University |
Agmon, Noa | Bar Ilan University |
Keywords: Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents
Abstract: This paper presents the multi-robot containment and disablement (CAD) problem. In this problem, a team of (ground or aerial) robots is engaged in the cooperative task of containing and disabling a swarm (for example, a locust swarm). Each team member is equipped with a tool that can both detect and disable the swarm individuals. The swarm is active in a given physical location, and the goal of the robots is twofold: to contain the swarm members such that the individuals are prevented from expanding further beyond this area (referred to as perfect enclosure), and to fully disable the swarm by reducing the size of the contained area (while preserving the perfect enclosure). We determine the minimal number of robots necessary to ensure perfect enclosure, a placement of the robots around the contained area that guarantees perfect enclosure, and a distributed area-reduction protocol that maintains it. We then suggest algorithms for handling the case in which there are not enough robots to guarantee perfect enclosure, and describe their performance based on rigorous experiments in the TeamBots simulator.
|
|
17:30-17:45, Paper WeDT21.5 | |
>Dec-PPCPP: A Decentralized Predator--Prey-Based Approach to Adaptive Coverage Path Planning Amid Moving Obstacles |
> Video Attachment
|
|
Hassan, Mahdi | University of Technology, Sydney |
Mustafic, Daut | University of Technology Sydney |
Liu, Dikai | University of Technology, Sydney |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems, Collision Avoidance
Abstract: Enabling multiple robots to collaboratively perform coverage path planning on complex surfaces embedded in R^3 in the presence of moving obstacles is a challenging problem that has not received much attention from researchers. As robots start to be practically deployed, it is becoming important to address this problem. A novel decentralized multi-robot coverage path planning approach is proposed that is adaptive to unexpected stationary and moving obstacles while aiming to achieve complete coverage with minimal cost. The approach is inspired by the predator-prey relation. For a robot (a prey), a virtual stationary predator enforces spatial ordering on the prey, and dynamic predators (other robots) cause the prey to be repelled, resulting in better task allocation and collision avoidance. The approach makes the best use of both worlds: offline global planning for tuning of model parameters based on a prior map of the surface, and real-time local planning for adaptive and swift decision making amid moving obstacles and other robots while preserving global behavior. Comparisons with other approaches and extensive testing and validation using different numbers of robots, different surfaces and obstacles, and various scenarios are conducted.
|
|
17:45-18:00, Paper WeDT21.6 | |
>Resilient Coverage: Exploring the Local-To-Global Trade-Off |
> Video Attachment
|
|
Ramachandran, Ragesh Kumar | University of Southern California
Zhou, Lifeng | Virginia Tech |
Preiss, James | USC |
Sukhatme, Gaurav | University of Southern California |
Keywords: Multi-Robot Systems, Failure Detection and Recovery, Cooperating Robots
Abstract: We propose a centralized control framework to select suitable robots from a heterogeneous pool and place them at appropriate locations to monitor a region for events of interest. In the event of a robot failure, our framework repositions robots in a user-defined local neighborhood of the failed robot to compensate for the coverage loss. If repositioning robots locally fails to attain a user-specified level of desired coverage, the central controller augments the team with additional robots from the pool. The size of the local neighborhood around the failed robot and the desired coverage over the region are two objectives that can be varied to achieve a user-specified balance. We investigate the trade-off between the coverage compensation achieved through local repositioning and the computation required to plan the new robot locations. We also study the relationship between the size of the local neighborhood and the number of additional robots added to the team for a given user-specified level of desired coverage. Through extensive simulations and an experiment with a team of seven quadrotors we verify the effectiveness of our framework. We show that to reach a high level of coverage in a neighborhood with a large robot population, it is more efficient to enlarge the neighborhood size, instead of adding additional robots and repositioning them.
|
|
WeDT22 |
Room T22 |
Multi-Robot Systems: Learning |
Regular session |
Chair: Zhao, Ding | Carnegie Mellon University |
Co-Chair: Tokekar, Pratap | University of Maryland |
|
16:30-16:45, Paper WeDT22.1 | |
>MAPPER: Multi-Agent Path Planning with Evolutionary Reinforcement Learning in Mixed Dynamic Environments |
|
Liu, Zuxin | Carnegie Mellon University |
Chen, Baiming | Tsinghua University |
Zhou, Hongyi | Carnegie Mellon University |
Senthil Kumar, Guru Koushik | Carnegie Mellon University |
Hebert, Martial | CMU |
Zhao, Ding | Carnegie Mellon University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Reinforcement Learning, AI-Based Methods
Abstract: Multi-agent navigation in dynamic environments is of great industrial value when deploying a large-scale fleet of robots in real-world applications. This paper proposes a decentralized, partially observable multi-agent path planning with evolutionary reinforcement learning (MAPPER) method to learn an effective local planning policy in mixed dynamic environments. Reinforcement learning-based methods usually suffer performance degradation on long-horizon tasks with goal-conditioned sparse rewards, so we decompose the long-range navigation task into many easier sub-tasks under the guidance of a global planner, which increases agents' performance in large environments. Moreover, most existing multi-agent planning approaches assume either perfect information about the surrounding environment or homogeneity of nearby dynamic agents, which may not hold in practice. Our approach models dynamic obstacles' behavior with an image-based representation and trains a policy in mixed dynamic environments without a homogeneity assumption. To ensure multi-agent training stability and performance, we propose an evolutionary training approach that can be easily scaled to large and complex environments. Experiments show that MAPPER achieves higher success rates and more stable performance when exposed to a large number of non-cooperative dynamic obstacles, compared with the traditional reaction-based planner LRA* and a state-of-the-art learning-based method.
|
|
16:45-17:00, Paper WeDT22.2 | |
>Scaling up Multiagent Reinforcement Learning for Robotic Systems: Learn an Adaptive Sparse Communication Graph |
|
Sun, Chuangchuang | Massachusetts Institute of Technology |
Shen, Macheng | Massachusetts Institute of Technology |
How, Jonathan Patrick | Massachusetts Institute of Technology |
Keywords: Multi-Robot Systems, Reinforcement Learning, Novel Deep Learning Methods
Abstract: The complexity of multiagent reinforcement learning (MARL) in multiagent systems increases exponentially with the number of agents. This scalability issue prevents MARL from being applied in large-scale multiagent systems. However, one critical feature of MARL that is often neglected is that the interactions between agents are quite sparse. Without exploiting this sparsity structure, existing works aggregate information from all of the agents and thus have a high sample complexity. To address this issue, we propose an adaptive sparse attention mechanism by generalizing a sparsity-inducing activation function. A sparse communication graph in MARL is then learned by graph neural networks based on this new attention mechanism. Through this sparsity structure, the agents can communicate both effectively and efficiently by selectively attending only to the agents that matter most, and the scale of the MARL problem is thus reduced with little loss of optimality. Comparative results show that our algorithm can learn an interpretable sparse structure and outperforms previous works by a significant margin on applications involving a large-scale multiagent system.
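A standard example of the kind of sparsity-inducing activation generalized here is sparsemax, the Euclidean projection of the logits onto the probability simplex, which assigns exact zeros to weakly related agents. The snippet below implements plain sparsemax for illustration only; it is not the paper's generalized activation, and the example logits are assumptions.

```python
# Plain sparsemax: projects logits onto the probability simplex, producing
# exact zeros (unlike softmax), which is what induces a sparse attention graph.
import numpy as np

def sparsemax(z: np.ndarray) -> np.ndarray:
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, z.size + 1)
    cssv = np.cumsum(z_sorted)
    support = 1.0 + k * z_sorted > cssv          # entries that stay non-zero
    k_z = k[support][-1]
    tau = (cssv[support][-1] - 1.0) / k_z        # threshold
    return np.maximum(z - tau, 0.0)

if __name__ == "__main__":
    logits = np.array([2.0, 1.1, 0.1, -1.0])
    print("softmax  :", np.round(np.exp(logits) / np.exp(logits).sum(), 3))
    print("sparsemax:", np.round(sparsemax(logits), 3))   # exact zeros appear
```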
|
|
17:00-17:15, Paper WeDT22.3 | |
>Risk-Aware Planning and Assignment for Ground Vehicles Using Uncertain Perception from Aerial Vehicles |
|
Sharma, Vishnu | University of Maryland |
Toubeh, Maymoonah | Virginia Tech |
Zhou, Lifeng | Virginia Tech |
Tokekar, Pratap | University of Maryland |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Deep Learning for Visual Perception, Multi-Robot Systems
Abstract: We propose a risk-aware framework for multi-robot, multi-demand assignment and planning in unknown environments. Our motivation is disaster response and search-and-rescue scenarios where ground vehicles must reach demand locations as soon as possible. We consider a setting where the terrain information is available only in the form of an aerial, georeferenced image. Deep learning techniques can be used for semantic segmentation of the aerial image to create a cost map for safe ground robot navigation. Such segmentation may still be noisy. Hence, we present a joint planning and perception framework that accounts for the risk introduced due to noisy perception. Our contributions are two-fold: (i) we show how to use Bayesian deep learning techniques to extract risk at the perception level; and (ii) use a risk-theoretical measure, CVaR, for risk-aware planning and assignment. The pipeline is theoretically established, then empirically analyzed through two datasets. We find that accounting for risk at both levels produces quantifiably safer paths and assignments.
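The risk measure used at the planning level is easy to state concretely: the empirical CVaR at level alpha is the mean of the worst (1 - alpha) fraction of sampled costs. The sketch below computes it on synthetic cost samples; the distributions and the alpha value are assumptions, and in the paper the cost samples come from Bayesian deep-learning perception rather than synthetic data.

```python
# Minimal sketch of the risk measure only: empirical CVaR_alpha of sampled
# path costs (the mean of the worst (1 - alpha) fraction).
import numpy as np

def cvar(costs: np.ndarray, alpha: float = 0.9) -> float:
    """Conditional Value-at-Risk: expected cost in the (1 - alpha) worst tail."""
    var = np.quantile(costs, alpha)          # Value-at-Risk threshold
    tail = costs[costs >= var]
    return float(tail.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    costs_a = rng.normal(10.0, 1.0, 1000)    # low-variance path
    costs_b = rng.normal(9.0, 4.0, 1000)     # cheaper on average, but riskier
    print("means:", costs_a.mean().round(2), costs_b.mean().round(2))
    print("CVaR :", round(cvar(costs_a), 2), round(cvar(costs_b), 2))
```

A risk-aware planner comparing the two synthetic paths would prefer the first one despite its higher mean cost, which is exactly the kind of trade-off CVaR captures.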
|
|
17:15-17:30, Paper WeDT22.4 | |
>With Whom to Communicate: Learning Efficient Communication for Multi-Robot Collision Avoidance |
> Video Attachment
|
|
Serra-Gómez, Álvaro | Delft University of Technology |
Brito, Bruno | TU Delft |
Zhu, Hai | Delft University of Technology |
Chung, Jen Jen | Eidgenössische Technische Hochschule Zürich |
Alonso-Mora, Javier | Delft University of Technology |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Collision Avoidance, Distributed Robot Systems
Abstract: Decentralized multi-robot systems typically perform coordinated motion planning by constantly broadcasting their intentions as a means to cope with the lack of a central system coordinating the efforts of all robots. Especially in complex dynamic environments, the coordination boost allowed by communication is critical to avoid collisions between cooperating robots. However, the risk of collision between a pair of robots fluctuates through their motion and communication is not always needed. Additionally, constant communication makes much of the still valuable information shared in previous time steps redundant. This paper presents an efficient communication method that solves the problem of "when" and with "whom" to communicate in multi-robot collision avoidance scenarios. In this approach, every robot learns to reason about other robots' states and considers the risk of future collisions before asking for the trajectory plans of other robots. We evaluate and verify the proposed communication strategy in simulation with four quadrotors and compare it with three baseline strategies: non-communicating, broadcasting and a distance-based method broadcasting information with quadrotors within a predefined distance.
|
|
17:30-17:45, Paper WeDT22.5 | |
>GLAS: Global-To-Local Safe Autonomy Synthesis for Multi-Robot Motion Planning with End-To-End Learning |
> Video Attachment
|
|
Riviere, Benjamin | California Institute of Technology |
Hoenig, Wolfgang | California Institute of Technology |
Yue, Yisong | California Institute of Technology |
Chung, Soon-Jo | Caltech |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Learning from Demonstration, Distributed Robot Systems
Abstract: We present GLAS: Global-to-Local Autonomy Synthesis, a provably safe, automated, and distributed policy-generation method for multi-robot motion planning. Our approach combines the advantage of centralized planning of avoiding local minima with the advantages of decentralized controllers, namely scalability and distributed computation. In particular, our synthesized policies only require relative state information of nearby neighbors and obstacles, and compute a provably safe action. Our approach has three major components: i) we generate demonstration trajectories using a global planner and extract local observations from them, ii) we use deep imitation learning to learn a decentralized policy that can run efficiently online, and iii) we introduce a novel differentiable safety module to ensure collision-free operation, enabling end-to-end policy training. Our numerical experiments demonstrate that our policies have a 20% higher success rate than ORCA across a wide range of robot and obstacle densities. We demonstrate our method on an aerial swarm, executing the policy on low-end microcontrollers in real-time.
|
|
17:45-18:00, Paper WeDT22.6 | |
>Graph Neural Networks for Decentralized Multi-Robot Path Planning |
> Video Attachment
|
|
Li, Qingbiao | The University of Cambridge |
Gama, Fernando | University of Pennsylvania |
Ribeiro, Alejandro | University of Pennsylvania |
Prorok, Amanda | University of Cambridge |
Keywords: Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents, Imitation Learning
Abstract: Effective communication is key to successful, decentralized, multi-robot path planning. Yet, it is far from obvious what information is crucial to the task at hand, and how and when it must be shared among robots. To side-step these issues and move beyond hand-crafted heuristics, we propose a combined model that automatically synthesizes local communication and decision-making policies for robots navigating in constrained workspaces. Our architecture is composed of a convolutional neural network (CNN) that extracts adequate features from local observations, and a graph neural network (GNN) that communicates these features among robots. We train the model to imitate an expert algorithm, and use the resulting model online in decentralized planning involving only local communication and local observations. We evaluate our method in simulations by navigating teams of robots to their destinations in 2D cluttered workspaces. We measure the success rates and sum of costs over the planned paths. The results show a performance close to that of our expert algorithm, demonstrating the validity of our approach. In particular, we show our model's capability to generalize to previously unseen cases (involving larger environments and larger robot teams).
|
|
WeDT23 |
Room T23 |
Networked Robots |
Regular session |
Chair: Mehta, Ankur | UCLA |
Co-Chair: Bobadilla, Leonardo | Florida International University |
|
16:30-16:45, Paper WeDT23.1 | |
>Communication Maintenance of Robotic Parasitic Antenna Arrays |
> Video Attachment
|
|
Twigg, Jeffrey | Army Research Lab |
Chopra, Nikhil | University of Maryland, College Park |
Sadler, Brian | Army Research Laboratory |
Keywords: Networked Robots, Cooperating Robots
Abstract: Recent developments in low-VHF antenna design and parasitic antenna array research show it is possible to form multi-robot antenna arrays. These arrays have the potential to extend communication range in urban and indoor environments. However, existing control formulations are not general enough to support this new capability. We first propose a generalized version of a disk model that describes a parasitic array. This model is then integrated into a Fiedler maximization approach for maintaining communication. Through simulation, we test our approach and demonstrate its ability to increase communication range by increasing array size.
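The connectivity measure being maximized here, the Fiedler value (algebraic connectivity), is the second-smallest eigenvalue of the graph Laplacian. The sketch below computes it for a plain disk-model communication graph; the positions and range are assumptions, and the paper's generalized disk model for the parasitic array and its gradient controller are not reproduced.

```python
# Sketch of the connectivity measure only: the Fiedler value (second-smallest
# Laplacian eigenvalue) of a simple disk-model communication graph.
import numpy as np

def fiedler_value(positions: np.ndarray, comm_range: float) -> float:
    n = len(positions)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(positions[i] - positions[j]) <= comm_range:
                A[i, j] = A[j, i] = 1.0
    L = np.diag(A.sum(axis=1)) - A                 # graph Laplacian
    eigvals = np.sort(np.linalg.eigvalsh(L))
    return float(eigvals[1])                       # > 0  iff  graph connected

if __name__ == "__main__":
    robots = np.array([[0.0, 0.0], [4.0, 0.0], [8.0, 0.0], [12.0, 0.0]])
    print("lambda_2 =", round(fiedler_value(robots, comm_range=5.0), 3))
```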
|
|
16:45-17:00, Paper WeDT23.2 | |
>Swarm Relays: Distributed Self-Healing Ground-And-Air Connectivity Chains |
> Video Attachment
|
|
Varadharajan, Vivek shankar | Polytechnique Montréal |
St-Onge, David | Ecole De Technologie Superieure |
Adams, Bram | Ecole Polytechnique De Montreal |
Beltrame, Giovanni | Ecole Polytechnique De Montreal |
Keywords: Swarms, Multi-Robot Systems, Distributed Robot Systems
Abstract: The coordination of robot swarms -- large decentralized teams of robots -- generally relies on robust and efficient inter-robot communication. Maintaining communication between robots is particularly challenging in field deployments where robot motion, unstructured environments, limited computational resources, low bandwidth, and robot failures add to the complexity of the problem. In this paper we propose a novel lightweight algorithm that lets a heterogeneous group of robots navigate to a target in complex 3D environments while maintaining connectivity with a ground station by building a chain of robots. The fully decentralized algorithm is robust to robot failures, can heal broken communication links, and exploits heterogeneous swarms: when a target is unreachable by ground robots, the chain is extended with flying robots. We test the performance of our algorithm using up to 100 robots in a physics-based simulator with three mazes and several robot failure scenarios. We then validate the algorithm with physical platforms: 7 wheeled robots and 6 flying ones, in homogeneous and heterogeneous scenarios in the lab and on the field.
|
|
17:00-17:15, Paper WeDT23.3 | |
>Minimally Disruptive Connectivity Enhancement for Resilient Multi-Robot Teams |
|
Luo, Wenhao | Carnegie Mellon University |
Chakraborty, Nilanjan | Stony Brook University |
Sycara, Katia | Carnegie Mellon University |
Keywords: Multi-Robot Systems, Networked Robots, Sensor Networks
Abstract: In this work, we focus on developing algorithms to maintain and enhance the connectivity of a multi-robot system with minimal disruption to the primary tasks that the robots are performing. Such algorithms help collaborating robots remain resilient to reductions in the connectivity of the team's communication graph as robots arrive or leave. These algorithms are also useful in a supervisory control setting when an operator wants to enhance the connectivity of the robot team. In contrast to many existing works that can only maintain the current connectivity of the multi-robot graph, we propose a generalized connectivity control framework that allows for reconfiguration of the multi-robot system to provably satisfy any connectivity demand, while minimally disrupting the execution of their original tasks. In particular, we propose a novel k-Connected Minimum Resilient Graph (k-CMRG) algorithm to compute an optimal k-connectivity graph that minimally constrains the robots' original task-related motion, and employ the Finite-Time Convergence Control Barrier Function (FCBF) to enforce the pairwise robot motion constraints defined by the edges of the graph. The original controllers are minimally modified to drive the robots and form the k-CMRG. We demonstrate the effectiveness of our approach via simulations in the presence of multiple tasks and robot failures.
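For intuition on how a single edge of the connectivity graph can be enforced with minimal disruption, here is a simplified sketch: a half-space projection of the nominal input against one pairwise distance barrier, using a standard linear class-K term rather than the finite-time CBF employed in the paper, and treating the neighbour as static.

import numpy as np

def cbf_filter(x_i, x_j, u_nom, comm_range, alpha=1.0):
    # Single-integrator robot i; neighbour j treated as static for the sketch.
    # Barrier: h = comm_range^2 - ||x_i - x_j||^2 >= 0  (stay within range).
    # Constraint on u_i:  a^T u_i + alpha * h >= 0  with  a = -2 (x_i - x_j).
    diff = x_i - x_j
    h = comm_range ** 2 - diff @ diff
    a = -2.0 * diff
    slack = a @ u_nom + alpha * h
    if slack >= 0.0:                       # nominal input already keeps the edge
        return u_nom
    return u_nom - (slack / (a @ a)) * a   # closest admissible input (half-space projection)

u = cbf_filter(np.array([4.0, 0.0]), np.array([0.0, 0.0]),
               u_nom=np.array([1.0, 0.0]), comm_range=5.0)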
|
|
17:15-17:30, Paper WeDT23.4 | |
>Predictive Control of Connected Mixed Traffic under Random Communication Constraints |
|
Guo, Longxiang | Clemson University |
Jia, Yunyi | Clemson University |
Keywords: Networked Robots, Multi-Robot Systems, Control Architectures and Programming
Abstract: Fully connected and automated vehicles have been envisioned to help improve the driving safety and efficiency of the transportation system. However, human-driven vehicles will still be present in the near future, which will lead to connected mixed traffic instead of fully connected and automated traffic. This is challenging because of the complexity of human-driven vehicles and the potential communication constraints in the connectivity. To address this issue, this paper models the connected mixed traffic and proposes model predictive control approaches with various prediction approaches, including a new inverse model predictive control (IMPC) based approach, to handle random communication delays and packet losses in connectivity. The human-in-the-loop experimental results for connected mixed traffic demonstrate the effectiveness and advantages of the proposed approaches, especially the predictive control with IMPC in handling communication constraints in mixed traffic.
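As a simple illustration of delay handling (not the paper's IMPC formulation), a stale state received over the network can be rolled forward with a nominal vehicle model before the predictive controller plans from it. The constant-acceleration model and the numbers below are placeholders.

def propagate_delayed_state(pos, vel, acc, delay_steps, dt=0.1):
    # Roll a stale (position, velocity, acceleration) measurement forward.
    for _ in range(delay_steps):
        pos += vel * dt + 0.5 * acc * dt ** 2
        vel += acc * dt
    return pos, vel

# A packet carrying (pos, vel, acc) of a preceding vehicle arrives 3 steps late.
pos_hat, vel_hat = propagate_delayed_state(pos=50.0, vel=12.0, acc=-0.5, delay_steps=3)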
|
|
17:30-17:45, Paper WeDT23.5 | |
>Path Planning under MIMO Network Constraints for Throughput Enhancement in Multi-Robot Data Aggregation Tasks |
|
Pogue, Alexandra | UCLA |
Hanna, Samer | University of California, Los Angeles |
Nichols, Andy | University of California Santa Barbara |
Chen, Xin | University of Southern California |
Cabric, Danijela | University of California, Los Angeles |
Mehta, Ankur | UCLA |
Keywords: Networked Robots, Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents
Abstract: Under line-of-sight (LOS) network conditions, multi-input multi-output (MIMO) wireless communications can increase the channel capacity between a team of robots and a multi-antenna array at a stationary base station. This increased capacity can result in greater data throughput, shortening the time necessary to complete channel-limited data aggregation tasks. To take advantage of this higher capacity channel, the robots in the team must be positioned to maximize complex channel orthogonality between each robot and receiver antenna. Using geometrically motivated assumptions, we derive transmitter spacing rules that can easily be added to existing path plans to improve backhaul throughput for data offloading from the robot team, with minimal impact on other system objectives. We demonstrate the effectiveness of the approach---in both ideal and realistic channels outside the domain of our simplifying assumptions---with numerical examples of robot-coordinated path plans in two example environments, achieving up to 42% improvement in task completion times.
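To see why transmitter spacing matters, the following sketch (with placeholder carrier frequency, SNR, and geometry, not values from the paper) evaluates the equal-power capacity log2 det(I + (SNR/Nt) H H^H) of a phase-only line-of-sight channel for several robot spacings; well-chosen spacing makes the columns of H more orthogonal and raises the achievable throughput.

import numpy as np

def los_channel(tx_pos, rx_pos, wavelength):
    # Phase-only LOS channel: H[r, t] = exp(-j * 2*pi * d_rt / wavelength).
    d = np.linalg.norm(rx_pos[:, None, :] - tx_pos[None, :, :], axis=-1)
    return np.exp(-2j * np.pi * d / wavelength)

def capacity(H, snr):
    # Equal-power MIMO capacity in bit/s/Hz: log2 det(I + (SNR/Nt) H H^H).
    n_rx, n_tx = H.shape
    M = np.eye(n_rx) + (snr / n_tx) * (H @ H.conj().T)
    return float(np.log2(np.linalg.det(M).real))

wavelength = 0.125                                             # ~2.4 GHz carrier (placeholder)
rx = np.array([[0.0, y, 0.0] for y in (0.0, 0.5, 1.0, 1.5)])   # base-station array
for spacing in (0.1, 1.0, 5.0):                                # robot (transmitter) spacing in metres
    tx = np.array([[100.0, y, 0.0] for y in np.arange(4) * spacing])
    H = los_channel(tx, rx, wavelength)
    print(f"spacing {spacing:4.1f} m -> {capacity(H, snr=100.0):5.2f} bit/s/Hz")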
|
|
17:45-18:00, Paper WeDT23.6 | |
>Lightweight Multi-Robot Communication Protocols for Information Synchronization |
> Video Attachment
|
|
Alsayegh, Murtadha | Florida International University |
Dutta, Ayan | University of North Florida |
Vanegas, Peter | Florida International University |
Bobadilla, Leonardo | Florida International University |
Keywords: Distributed Robot Systems, Formal Methods in Robotics and Automation, Networked Robots
Abstract: Communication is one of the most popular and efficient means of multi-robot coordination. Due to potential real-world constraints, such as limited bandwidth and contested scenarios, a communication strategy that requires sending, for example, all n bits of an environment representation may not be feasible in situations where the robots' data exchanges are frequent and large. To this end, we propose and implement lightweight, bandwidth-efficient, robot-to-robot communication protocols, inspired by communication-complexity results, that synchronize data without exchanging the originally required n bits. We have tested our proposed approach both in simulation and with real robots. Simulation results show that the proposed method is computationally fast and enables the robots to synchronize the data (near) accurately while exchanging significantly smaller amounts of information (on the order of log n bits). Real-world experiments with two mobile robots show the practical feasibility of our proposed approach.
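In the spirit of the communication-complexity results cited, here is a minimal, assumed sketch of fingerprint-based synchronization: two robots compare constant-size randomized digests of their environment bitmaps and bisect to localize a differing region, so each comparison costs a few bytes rather than all n bits and localization takes O(log n) rounds. The digest size, shared seed, and bitmap layout are illustrative, and both maps appear in one process here purely for demonstration; in the protocol only the digests would cross the network.

import hashlib

def fingerprint(bits, seed, n_bytes=8):
    # Short keyed digest of a byte string; both robots share the random seed.
    return hashlib.blake2b(bits, key=seed, digest_size=n_bytes).digest()

def locate_difference(a, b, seed, lo=0, hi=None):
    # Bisect to a differing byte while comparing only fingerprints.
    hi = len(a) if hi is None else hi
    if fingerprint(a[lo:hi], seed) == fingerprint(b[lo:hi], seed):
        return None                        # equal with high probability
    if hi - lo == 1:
        return lo                          # single differing byte localized
    mid = (lo + hi) // 2
    left = locate_difference(a, b, seed, lo, mid)
    return left if left is not None else locate_difference(a, b, seed, mid, hi)

map_a = bytes(1024)                        # robot A's 1 KiB environment bitmap
map_b = bytearray(map_a)
map_b[300] ^= 0x04                         # robot B differs in one cell
print(locate_difference(map_a, bytes(map_b), seed=b"shared-seed"))  # -> 300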
|