| |
Last updated on July 16, 2025. This conference program is tentative and subject to change.
Technical Program for Monday August 18, 2025
|
MoAT1 |
Room T1 |
Planning, Scheduling and Control 1 |
Regular Session |
Chair: Wang, Yinan | RPI |
|
10:30-10:48, Paper MoAT1.1 | |
A Learning-Based Approach to Motion Planning with State Lattices in Off-Road Environments |
|
Rosser, Joshua | University of Rochester |
Warnell, Garrett | U.S. Army Research Laboratory |
Lancaster, Eli | Booz Allen Hamilton |
Sanchez, Felix | Booz Allen Hamilton |
Fahnestock, Ethan | MIT |
Damm, Eric | University of Rochester |
Gregory, Jason M. | US Army Research Laboratory |
Howard, Thomas | University of Rochester |
Keywords: Motion and Path Planning, Learning and Adaptive Systems, Autonomous Vehicle Navigation
Abstract: To safely navigate unmanned ground vehicles operating in unstructured, partially observed environments, intelligence architectures require efficient motion planning algorithms that generate near-optimal routes and satisfy motion constraints in real time. The recombinant nature of state lattice-based search spaces enables efficient search in motion planning graphs with precomputed trajectory libraries that satisfy nonholonomic constraints. The implementation of lattice planner-based search spaces, however, requires design choices that include the resolution, fidelity, and expressiveness of the graph. Parameters that are well tuned for some environments may prove suboptimal or ineffective in others. In this paper we propose a classification-based approach to parameter learning that optimizes state lattice planner performance online using context from the environment and the planning problem definition. Experimental results using data collected from a high-speed unmanned ground vehicle operating in off-road environments demonstrate a substantial improvement in the relative optimality of generated trajectories for regional motion planning.
|
|
10:48-11:06, Paper MoAT1.2 | |
ARMOR: Egocentric Perception for Bimanual Robot Collision Avoidance and Motion Planning |
|
Kim, Daehwa | Carnegie Mellon University |
Srouji, Mario | Apple Inc |
Chen, Chen | Apple |
Zhang, Jian | Apple |
Keywords: Motion and Path Planning, Sensor-based Control, Motion Control
Abstract: Robotic arms have significant gaps in their sensing and perception, making it hard to perform motion planning in dense environments. To address this, we introduce ARMOR, a novel egocentric perception system that integrates both hardware and software, specifically incorporating wearable-like depth sensors for bimanual robotic platforms with arms. Our distributed perception approach enhances the robot's spatial awareness and facilitates more agile motion planning. We also train a transformer-based imitation learning (IL) policy in simulation to perform dynamic collision avoidance, leveraging around 86 hours of realistic human motions from the AMASS dataset. We show that our ARMOR perception is superior to a setup with multiple dense head-mounted and externally mounted depth cameras, with a 63.7% reduction in collisions and a 78.7% improvement in success rate. We also compare our IL policy against a sampling-based motion planning expert, cuRobo, showing 31.6% fewer collisions, a 16.9% higher success rate, and a 26x reduction in computational latency. Lastly, we deploy our ARMOR perception on our real-world GR1 humanoid robot from Fourier Intelligence. The simulation environment, hardware description, and 3D CAD files are available at https://daehwakim.com/armor.
|
|
11:06-11:24, Paper MoAT1.3 | |
Interaction-Minimizing Roadmap Optimization for High-Density Multi-Agent Path Finding |
|
Weindel, Sören | Karlsruhe Institute of Technology |
Wilch, Jan | Technical University of Munich |
Kögel, Christoph | SOMIC Verpackungsmaschinen GmbH & Co. KG |
Xiao, Kevin | Planar Motor Incorporated |
Vogel-Heuser, Birgit | Technical University Munich |
Keywords: Motion and Path Planning, Intelligent Transportation Systems, Cyber-physical Production Systems and Industry 4.0
Abstract: Modern industry increasingly demands customizability from each element of its workflow and factories. A prominent example of this is the advent of Automated Guided Vehicles (AGVs) in intralogistics tasks, which autonomously navigate the manufacturing floor, reacting dynamically to variations in the workflow. One such application makes use of magnetically propelled planar drive systems to transport products between manufacturing stations, replacing traditional solutions that are limited in their ability to efficiently adapt to new requirements. This work presents an AGV control approach capable of offloading large parts of the computational expense into an offline preprocessing step: a unidirectional roadmap is generated using alternating position optimization and network modification operations, with the goal of reducing the number of interactions between agents to be resolved at runtime. This concept was successfully validated in simulation. An accompanying tech report and implementation further detail the presented approach, as well as the controller and simulator used.
|
|
11:24-11:42, Paper MoAT1.4 | |
Swarm Intelligence-Based Optimization of Matrix System Layout with Workstation Combination and Assignment |
|
Lee, Changha | Sungkyunkwan University |
Oh, Seog-Chan | General Motors |
Arinez, Jorge | General Motors Research & Development Center |
Noh, Sang Do | Sungkyunkwan University |
Keywords: Optimization and Optimal Control, AI-Based Methods, Intelligent and Flexible Manufacturing
Abstract: The layout of a manufacturing system is a critical factor that significantly influences system performance during operation. Achieving an optimal layout requires simultaneous consideration of equipment selection, such as determining the appropriate combination of workstations, and workstation assignment for a layout design. However, existing research often assumes a predetermined workstation combination, focusing solely on optimizing workstation assignments for a layout design. Since workstation combination is typically undecided in the early stages of layout design, particularly in greenfield systems, an integrated optimization approach is essential. This study proposes a layout design optimization methodology for Matrix Manufacturing Systems (MMS) that simultaneously considers workstation combination (i.e., equipment selection) and workstation assignment problems. The proposed approach integrates Particle Swarm Optimization (PSO) with production simulation, employing a dimensionally reduced and information-dense solution representation—a structured format that encapsulates candidate solutions—to optimize both aspects simultaneously. To validate the proposed methodology, we conducted a case study on an automotive assembly process, comparing our approach with an existing method that separately solves workstation combination and assignment problems.
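For readers unfamiliar with Particle Swarm Optimization, a minimal Python sketch of the canonical velocity and position update is given below; the inertia and acceleration coefficients, the solution encoding, and the fitness evaluation are illustrative assumptions, not the configuration used in the paper.

import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    # Canonical PSO update for positions x and velocities v, both of shape (n_particles, n_dims).
    # pbest: per-particle best positions; gbest: swarm-wide best position.
    # Coefficient values are illustrative defaults, not values from the paper.
    rng = np.random.default_rng() if rng is None else rng
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v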
|
|
11:42-12:00, Paper MoAT1.5 | |
Robust Temporal Logic Planning under Contingency Constraints |
|
Yuksel, Sadik Bera | Northeastern University |
Taheri, Azizollah | Northeastern University |
Yazicioglu, Yasin | Northeastern University |
Aksaray, Derya | Northeastern University |
Keywords: Formal Methods in Robotics and Automation, Motion and Path Planning, Autonomous Agents
Abstract: In dynamic and uncertain environments, robots are often required not only to complete their primary tasks but also to be able to switch to a contingency mode that facilitates a safe and effective response in the face of an unpredictable event. In this paper, we address the problem of synthesizing optimal high-level control policies for a robot 1) to satisfy a desired task and 2) to be able to respond safely in the face of an unexpected event. We model the dynamics of the robot as a Stochastic Transition System. We express the primary task as a Time Window Temporal Logic (TWTL) specification, and we consider the contingency behavior to be reaching a safe region in at most k time steps with a probability greater than a desired threshold. We propose an automata-theoretic framework to compute optimal policies for task satisfaction and contingency behavior. While the contingency behavior is always ensured, the task satisfaction probability is maintained by minimally extending the mission horizon when necessary. We demonstrate the performance of the proposed method through simulations and experiments.
|
|
MoAT2 |
Room T2 |
TASE Paper Session 1 |
Special Session |
Chair: Wang, Hui | Florida State University |
|
10:30-10:48, Paper MoAT2.1 | |
Efficient Constrained Motion Planning Using Direct Sampling of Screw-Constraint Manifolds |
|
Pettinger, Adam | Texas A&M University |
Panthi, Janak | The University of Texas at Austin |
Alambeigi, Farshid | University of Texas at Austin |
Pryor, Mitchell | University of Texas |
Keywords: Motion and Path Planning, Manipulation Planning, Industrial and Service Robotics
Abstract: Manipulating articulated objects is especially difficult if the robot is operating autonomously or far from any human operator. Object articulation imposes strict constraints on robot motion, making it a challenge to generate valid trajectories to complete the task. Problems compound when the robot is mobile and operates in an uncontrolled environment, where the location or articulation model is unknown a priori. In this work, we leverage screw theory to model the constraints imposed on a generic manipulator by simple articulated objects and present two novel, fast, and robust methods, Sequential Path Stepping (SPS) and Direct Screw Sampling (DSS), for planning trajectories by directly sampling these constraints. We show that these methods are hardware-agnostic and work in cluttered environments using long, complex paths modeled by multiple screw-axis constraints. We demonstrate that modeling constraints using multiple screw axes handles objects with multiple DoF, or multi-step tasks (e.g., turning a knob before opening the door). In addition, the direct sampling component of the proposed approaches is implemented as a module that can be used with existing well-known probabilistic planning methods, allowing customization across different hardware, domains, or planning problems. We validate our methods across many planning and inverse kinematic elements, with three different mobile and stationary manipulators, and on a set of challenging planning problems that include single- and multiple-screw constraints. Results demonstrate a 97.6% success rate when planning in cluttered environments in less than 0.2 seconds.
|
|
10:48-11:06, Paper MoAT2.2 | |
MINER-RRT*: A Hierarchical and Fast Trajectory Planning Framework in 3D Cluttered Environments |
|
Wang, Pengyu | Hong Kong University of Science and Technology |
Tang, Jiawei | Hong Kong University of Science and Technology |
Lin, Hin Wang | The Hong Kong University of Science and Technology |
Zhang, Fan | The Hong Kong University of Science and Technology |
Wang, Chaoqun | Shandong University |
Wang, Jiankun | Southern University of Science and Technology |
Shi, Ling | The Hong Kong University of Science and Technology |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Keywords: Motion and Path Planning, Planning, Scheduling and Coordination
Abstract: Trajectory planning for quadrotors in cluttered environments has been challenging in recent years. While many trajectory planning frameworks have been successful, there still exists potential for improvement, particularly in enhancing the speed of generating efficient trajectories. In this paper, we present MINER-RRT*, a novel hierarchical trajectory planning framework that reduces computational time and memory usage and consists of two main components. First, we propose a sampling-based path planning method boosted by neural networks, where the predicted heuristic region accelerates the convergence of rapidly-exploring random trees. Second, we utilize the optimal conditions derived from the quadrotor's differential flatness properties to construct polynomial trajectories that minimize control effort in multiple stages. Extensive simulation and real-world experimental results demonstrate that, compared to several state-of-the-art (SOTA) approaches, our method can generate high-quality trajectories with better performance in 3D cluttered environments.
|
|
11:06-11:24, Paper MoAT2.3 | |
Automated Ontology Generation for Zero-Shot Defect Identification in Manufacturing |
|
Yhdego, Tsegai | Florida A&M University-Florida State University College of Engin |
Wang, Hui | Florida State University |
Keywords: Process Control, Machine learning, Hybrid Strategy of Intelligent Manufacturing
Abstract: A lack of labeled data presents a significant challenge to automatic defect identification in manufacturing, which is a crucial step in process control and certification during process development. State-of-the-art transfer learning is incapable of handling such zero-shot learning (ZSL) when defect labels are absent in training datasets. The latest research on ZSL leverages natural language processing (NLP) based on large language models (LLMs) and shows promise by supplementing information to generate labels. However, its performance is hampered by the supporting LLMs being pre-trained on generic vocabulary that fails to characterize manufacturing defects accurately. This paper establishes a methodology to automatically extract multi-level attributes from the literature to improve defect representation, thereby facilitating ZSL. The extracted attributes contribute to a hierarchical knowledge graph, called a defect ontology, that characterizes multiple aspects of manufacturing defects. The proposed algorithm takes defect images and associated text from the literature as input and develops an unsupervised method to identify the hierarchical relationships among the tokenized information extracted from the input text-feature corpora. The hierarchical graph is refined to retain the most relevant information by a pruning algorithm based on a minimum path search. A walk algorithm, along with NLP, parses the generated ontology to create embeddings of defects that enable zero-shot attribute learning to identify defects. The proposed method advances ZSL methodology by automatically creating a hierarchical knowledge representation from literature and images to replace the generic vocabulary in the LLMs adopted by ZSL algorithms, thus improving defect representation.
|
|
11:24-11:42, Paper MoAT2.4 | |
Predicting Vulnerable Road User Behavior with Transformer-Based Gumbel Distribution Networks |
|
Astuti, Lia | Feng Chia University |
Lin, Yu-Chen | Feng Chia University |
Chiu, Chui-Hong | Feng Chia University |
Chen, Wen-Hui | National Taipei University of Technology |
Keywords: Collision Avoidance, Motion and Path Planning, AI-Based Methods
Abstract: This study introduces the Crossing Intention and Trajectory with Transformer Networks (CITraNet) prediction model to process multimodal input data of vulnerable road users (VRUs), such as pedestrians and bicyclists, whose behavior is inherently unpredictable. Unlike traditional approaches that rely on sequence transduction and Gaussian distribution-based models, CITraNet employs Transformer networks and the Gumbel distribution. First, we utilize multi-head attention and feed-forward layers to extract hidden features from historically observed data, allowing effective parallelization and the handling of long-range dependencies. Second, CITraNet features an innovative Transformer-based Gumbel distribution network that significantly enhances the model's ability to accurately predict all possible trajectories using extreme value theory, replacing the conventional Gaussian distribution models that struggle with discrete and non-linear data. The effectiveness and accuracy of CITraNet are validated on the Taiwan pedestrian (TaPed) dataset, as well as the publicly available JAAD and PIE datasets. The model's deterministic and stochastic trajectory predictions are assessed over short (0.5 s), medium (1.0 s), and long (1.5 s) intervals, which is crucial for gauging predictive accuracy across varying durations. The results demonstrate that CITraNet outperforms previous benchmarks.
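For reference, the standard Gumbel (extreme value type I) distribution mentioned above has the cumulative distribution function and inverse-transform sampling rule

F(x; \mu, \beta) = \exp\!\big(-\exp\!\big(-(x-\mu)/\beta\big)\big), \qquad x = \mu - \beta \ln(-\ln u), \quad u \sim \mathcal{U}(0,1),

with location \mu and scale \beta > 0; how CITraNet parameterizes these quantities is not specified in the abstract and is not assumed here.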
|
|
11:42-12:00, Paper MoAT2.5 | |
Parallel Inspection Route Optimization with Priorities for 5G Base Station Networks |
|
Dai, Xiangqi | Tsinghua University |
Liang, Zhenglin | Tsinghua University |
Keywords: Machine learning, Motion and Path Planning
Abstract: 5G base station networks generate numerous alarms daily. With the increasing demand for digital services, it is vital to inspect and rectify anomalies to uphold user satisfaction. This study explores the potential of unmanned aerial vehicle (UAV)-empowered opportunistic inspection based on alarm data. We formulate the inspection routing problem as a prioritized traveling salesman problem (PTSP) encompassing two categories of base stations. Priority is assigned to stations generating more alarms, while others are subject to opportunistic inspection. To expedite large-scale opportunistic inspection routes, we introduce a novel transformer-based parallelizable routing algorithm (TPRA). TPRA is an intelligent optimization approach that orchestrates multiple parallelized constrained reinforcement learning algorithms. Through balancing spectral clustering, the large-scale graph is segmented into manageable subgraphs. For each subgraph, the prioritized inspection routing problem is formulated as a constrained Markov decision process and optimized by transformer-based reinforcement learning in parallel. The optimized subgraphs are then merged using an adaptive large neighborhood search approach. Through parallel computing, our approach achieves as much as a 75% reduction in computation time while concurrently generating shorter routes. The approach is implemented in real-world cases to validate its efficacy.
|
|
MoAT3 |
Room T3 |
Optimization for Energy Systems |
Special Session |
Chair: Robba, Michela | University of Genoa |
Organizer: Grammatico, Sergio | Delft University of Technology |
Organizer: Dotoli, Mariagrazia | Politecnico Di Bari |
Organizer: Carli, Raffaele | Politecnico Di Bari |
Organizer: Scarabaggio, Paolo | Politecnico Di Bari |
Organizer: Mignoni, Nicola | Politecnico Di Bari |
|
10:30-10:45, Paper MoAT3.1 | |
Distributed Model Predictive Control for Building Automation Systems: A Parallel ADMM Approach (I) |
|
Robba, Michela | University of Genoa |
Ferro, Giulio | University of Genoa |
Parodi, Luca | University of Genoa |
Keywords: Building Automation, Power and Energy Systems automation, Optimization and Optimal Control
Abstract: This paper proposes a distributed Model Predictive Control (MPC)-based approach for comfort temperature tracking and electric consumption minimization in building automation systems (BASs). The optimization model and the overall architecture have been developed taking into account real-world applications with in-field controllers and sensors. A distributed optimization algorithm is proposed that extends the well-known alternating direction method of multipliers (ADMM) to handle inequality constraints (which are necessary to model the typical local temperature sensors and actuators in smart buildings). The methodology is validated through testing on a real case study, i.e., the Smart Energy Building (SEB) at the Savona Campus of the University of Genova, characterized by a geothermal heat pump, photovoltaics, storage systems, and charging stations. The algorithm allows reaching the comfort temperature, limiting power variation for the heat pump, and minimizing costs. Comparison with state-of-the-art approaches shows a 25% reduction in the number of iterations needed for convergence.
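As background, the standard two-block ADMM iteration that the proposed algorithm extends solves \min_{x,z} f(x) + g(z) subject to Ax + Bz = c via the scaled-dual updates

x^{k+1} = \arg\min_x \; f(x) + \tfrac{\rho}{2}\|Ax + Bz^k - c + u^k\|_2^2,
z^{k+1} = \arg\min_z \; g(z) + \tfrac{\rho}{2}\|Ax^{k+1} + Bz - c + u^k\|_2^2,
u^{k+1} = u^k + Ax^{k+1} + Bz^{k+1} - c;

the paper's specific handling of inequality constraints on top of this textbook baseline is not reproduced here.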
|
|
10:45-11:00, Paper MoAT3.2 | |
An Energy Management System for Green Ports (I) |
|
Casella, Virginia | University of Genoa |
Gallo, Marco | University of Genova |
Graffione, Federico | University of Genova |
Robba, Michela | University of Genoa |
Silvestro, Federico | University of Genova |
Keywords: Power and Energy Systems automation, Renewable Energy Sources, Optimization and Optimal Control
Abstract: Port areas serve as hubs for energy production, storage, distribution, and high consumption, and can be key players in the energy transition and the reduction of emissions. In fact, the management of renewables, storage systems, electric vehicles, and boats can be integrated with logistics operations to satisfy energy demands and to provide support to the distribution system operator. The aim of this paper is to present a new Energy Management System (EMS) for port areas (with specific reference to touristic ports), which can integrate sensors and technologies in the field, simulation tools, and optimization models to minimize the overall costs and customers' dissatisfaction. Each component communicates with the EMS using the MQTT (Message Queuing Telemetry Transport) protocol. A real case study of a touristic port (Finale Ligure) in the Savona Municipality is considered.
|
|
11:00-11:15, Paper MoAT3.3 | |
Optimal Stochastic Management of Energy Storage Systems Based on Non-Linear Energy Reservoir Models (I) |
|
Mignoni, Nicola | Politecnico Di Bari |
Scarabaggio, Paolo | Politecnico Di Bari |
Carli, Raffaele | Politecnico Di Bari |
Dotoli, Mariagrazia | Politecnico Di Bari |
Keywords: Power and Energy Systems automation, Renewable Energy Sources, Optimization and Optimal Control
Abstract: This paper discusses and extends energy-reservoir models (ERMs) for energy storage systems (ESSs), recently proposed in the related literature, by introducing a non-unitary efficiency for the discharging process. This enhancement allows the resulting ESS model to more accurately represent losses during both charging and discharging cycles, albeit at the cost of introducing a non-linearity. We show that, while the lower ERM capacity bound preserves convexity, the upper bound does not. Hence, a mixed-integer reformulation is provided to tackle such a non-convexity. We focus on the perspective of a prosumer equipped with an ERM and served by an energy retailer characterized by a realistic energy pricing scheme. We also account for the inherent uncertainty in ESS management related to the prosumer's energy demand and generation curves: to accommodate this uncertainty, our approach accepts probabilistic forecasts as inputs, enabling objective function approximation through techniques such as sample average approximation. The proposed approach is numerically validated using real data, implementing the formulation within a model-predictive-control framework.
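A generic energy-reservoir state update with charging efficiency \eta_c and the non-unitary discharging efficiency \eta_d discussed above takes the form

s_{t+1} = s_t + \eta_c\, p^{ch}_t \,\Delta t - \tfrac{1}{\eta_d}\, p^{dis}_t \,\Delta t, \qquad \underline{s} \le s_t \le \bar{s}, \quad \eta_c, \eta_d \in (0,1];

this is a textbook formulation given only for illustration, and the exact notation and bound structure used in the paper may differ.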
|
|
11:15-11:30, Paper MoAT3.4 | |
Multi-Energy Demand-Side Management for Flexibility Service in Distribution Networks (I) |
|
Wang, Meichen | University of Manchester |
Xu, Yiqiao | University of Manchester |
Parisio, Alessandra | The University of Manchester |
Keywords: Demand Side Management, Distributed Generation and Storage, Planning, Scheduling and Coordination
Abstract: The high penetration of renewable energy has significantly increased the demand for flexibility services, particularly in distribution networks. At the same time, the electrification of the heating sector introduces interactions and interdependencies between energy vectors, presenting a promising yet unexplored source of flexibility. This paper proposes a multi-energy demand-side management (DSM) framework for flexibility service provision in distribution networks. The proposed framework includes multi-energy assets and demands, such as pumped thermal energy storage (PTES) and combined heat and power (CHP) systems. Among diverse energy storage technologies, PTES emerges as a cost-effective and grid-scale solution with environmentally friendly operation and extended lifetime. The proposed model is formulated as a mixed-integer linear optimization problem to schedule resources, optimizing utilization payment and energy costs while ensuring compliance with the flexibility service requirements. The performance of the proposed optimization framework is verified through numerical studies on a modified benchmark network, demonstrating the successful operation of flexibility services and the potential of PTES to enhance network flexibility.
|
|
11:30-11:45, Paper MoAT3.5 | |
User-Centric Vehicle-To-Grid Optimization with an Input Convex Neural Network-Based Battery Degradation Model (I) |
|
Mallick, Arghya | Delft University of Technology |
Pantazis, Georgios | TU Delft |
Khosravi, Mohammad | TU Delft |
Mohajerin Esfahani, Peyman | TU Delft |
Grammatico, Sergio | Delft University of Technology |
Keywords: Plug-in Electric Vehicles, Machine learning, Human-Centered Automation
Abstract: We propose a data-driven, user-centric vehicle-to-grid (V2G) methodology based on multi-objective optimization to balance battery degradation and V2G revenue according to EV user preference. Given the lack of accurate and generalizable battery degradation models, we leverage input convex neural networks (ICNNs) to develop a data-driven degradation model trained on extensive experimental datasets. This approach enables our model to capture nonconvex dependencies on battery temperature and time while maintaining convexity with respect to the charging rate. Such a partial convexity property ensures that the second stage of our methodology remains computationally efficient. In the second stage, we integrate our data-driven degradation model into a multi-objective optimization framework to generate an optimal smart charging profile for each EV. This profile effectively balances the trade-off between financial benefits from V2G participation and battery degradation, controlled by a hyperparameter reflecting the user's prioritization of battery health. Numerical simulations show the high accuracy of the ICNN model in predicting battery degradation for unseen data. Finally, we present a trade-off curve illustrating financial benefits from V2G versus losses from battery health degradation based on user preferences, and we showcase smart charging strategies under realistic scenarios.
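As background on input convex neural networks, the NumPy sketch below shows the fully input-convex variant: non-negative hidden-to-hidden weights combined with convex, non-decreasing activations make the output convex in the input. The partially convex architecture used in the paper (convex only in the charging rate) is more involved; the layer sizes and softplus activation here are illustrative assumptions.

import numpy as np

def softplus(x):
    # Numerically stable softplus: convex and non-decreasing, which preserves convexity.
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

class FICNN:
    # Fully input-convex network f(y): hidden-to-hidden weights W_z are kept non-negative,
    # while the input "passthrough" weights W_y are unconstrained. Shapes are illustrative.
    def __init__(self, dims, rng=np.random.default_rng(0)):
        self.W_z = [np.abs(rng.normal(size=(m, n))) for n, m in zip(dims[1:-1], dims[2:])]
        self.W_y = [rng.normal(size=(m, dims[0])) for m in dims[1:]]
        self.b = [np.zeros(m) for m in dims[1:]]

    def __call__(self, y):
        z = softplus(self.W_y[0] @ y + self.b[0])
        for W_z, W_y, b in zip(self.W_z, self.W_y[1:], self.b[1:]):
            z = softplus(W_z @ z + W_y @ y + b)  # W_z >= 0 keeps f convex in y
        return z

# Example: scalar convex output over a 2-D input, hidden widths chosen arbitrarily.
f = FICNN([2, 8, 8, 1])
print(f(np.array([0.5, -1.0])))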
|
|
11:45-12:00, Paper MoAT3.6 | |
Automatic Virtual Metrology for Long-Term Energy Baseline Forecasting (I) |
|
Tieng, Hao | National University of Tainan |
Chen, Pin-Jui | National Cheng Kung University |
Wu, Tung-Qing | National Cheng Kung University |
Wu, Chia-Hsi | National Cheng Kung University Intelligent Manufacturing Researc |
Cheng, Fan-Tien | National Cheng Kung University |
Keywords: Big data Analytics for Large-scale Energy Systems, Energy and Environment-aware Automation, Power and Energy Systems automation
Abstract: Global warming poses significant environmental and economic challenges, with energy consumption being a major contributor. Industry 4.2 for Green Intelligent Manufacturing (I4.2-GiM) integrates digitalization, intelligent manufacturing, and energy management systems (EMSs) to optimize energy efficiency and achieve net zero. Accurate long-term energy forecasting is crucial but challenging due to complex consumption patterns and external influences. Automatic Virtual Metrology (AVM) enhances real-time quality prediction, with the second generation AVM (AVMII) incorporating Convolutional Neural Networks (CNNs) for improving prediction accuracy. However, CNNs struggle with long-term dependencies. This study proposes the third generation AVM (AVMIII) by integrating Long Short-Term Memory (LSTM) networks to enhance long-term time-series predictions so as to enable smarter energy management in factories.
|
|
MoAT4 |
Room T4 |
Trajectory, Object, and Position 1 |
Regular Session |
Chair: Mikšík, Martin | Czech Technical University in Prague |
|
10:30-10:48, Paper MoAT4.1 | |
Synthetic Dataset for Vision-Based Air-To-Air Object Detection |
|
Rassas, Basaam | Toronto Metropolitan University |
Singoji, Shashank | Toronto Metropolitan University |
Waslander, Steven Lake | University of Toronto |
Faieghi, Reza | Toronto Metropolitan University |
Keywords: Computer Vision for Transportation, Deep Learning in Robotics and Automation, Autonomous Vehicle Navigation
Abstract: The increasing deployment of Uncrewed Aerial Vehicles (UAVs) across various industries has heightened the need for robust airborne object detection to ensure safe airspace operations. Vision-based deep learning models offer effective solutions but require large, diverse, and well-annotated datasets for optimal performance. Existing datasets are limited in diversity, expensive to acquire, and manually annotated, restricting their scalability. This work presents a scalable synthetic dataset for vision-based airborne object detection, generated using AirSim integrated with Unreal Engine. Our dataset includes nine UAV models, multiple airborne object classes (birds, helicopters, balloons, aircraft), and diverse environmental conditions. Unlike existing synthetic datasets, our framework eliminates the need for photogrammetry or chroma-keying, making it fully automated and cost-effective. We evaluate our dataset by training detection models and benchmarking them against existing real-world datasets, demonstrating that synthetic data can improve model generalization while significantly reducing acquisition costs. The dataset is available at: www.kaggle.com/datasets/avldevelopment/air-to-air-object-detection-dataset
|
|
10:48-11:06, Paper MoAT4.2 | |
OPTRObot: A Synthetic Training Paradigm for Robotic Grasping of Specular Objects in Cluttered Environments |
|
Mikšík, Martin | Czech Technical University in Prague |
Zeman, Vít | Czech Technical University in Prague, Faculty of Electrical Enge |
Moroz, Artem | Czech Institute of Informatics, Robotics and Cybernetics, CTU In |
Burget, Pavel | Czech Technical University in Prague |
Keywords: Computer Vision for Manufacturing, Machine learning, Factory Automation
Abstract: We present a novel and complete vision-based pipeline for 6DoF object pose estimation in challenging industrial bin-picking scenarios, characterized by significant clutter, occlusions, and reflective surfaces. Our approach addresses the limitations of both computationally expensive fine-tuning methods and the current immaturity of foundation models in handling such complex environments. The key contribution lies in a balanced approach leveraging synthetic data augmentation and a streamlined architecture to achieve robust performance without extensive per-object optimization. The pipeline integrates existing state-of-the-art object detection, coarse pose estimation, and a render-and-compare refinement strategy, enabling accurate pose estimation from monocular images. We introduce a new benchmark on the recently released dataset, establishing a baseline for future research. Unlike existing industrial approaches, our system minimizes reliance on multi-sensor configurations, offering a cost-effective and easily deployable solution. We demonstrate the impact of error propagation across a complete pipeline using datasets that mirror real-world industrial conditions, in contrast to commonly used but less representative datasets.
|
|
11:06-11:24, Paper MoAT4.3 | |
You Only Look Once, but the Parts Keep Moving: YOLO-Based Workpiece Pose Classification for Aerodynamic Part Feeding |
|
Shieff, Dasha | Leibniz University Hannover |
Akchi, Mohamed | Leibniz University Hanover, Institute of Assembly Technology And |
Raatz, Annika | Leibniz Universität Hannover |
Keywords: Computer Vision for Manufacturing, Factory Automation, Intelligent and Flexible Manufacturing
Abstract: Flexible part feeding is a key challenge in modern automated production, where increasing uncertainties, shorter product life cycles, and cost pressures require adaptable solutions. Aerodynamic part feeding systems, which use controlled air jets to manipulate workpieces, offer a retooling-free alternative to traditional vibratory bowl feeders. To ensure precise workpiece handling, reliable pose classification is essential. This paper presents a machine learning-based framework for classifying workpiece poses using a class of convolutional neural networks (CNNs) called YOLO and an industrial camera. Instead of relying on manually labeled real-world images—which would introduce machine downtimes and increased setup efforts—the proposed method trains CNNs exclusively on synthetic datasets. Artificial images of workpieces in various poses are generated from CAD models using the open-source rendering engine Blender. Multiple CNN architectures are trained and evaluated, achieving a classification precision exceeding 95 % for most workpieces when tested on real workpiece images. The results demonstrate that the approach enables accurate and efficient workpiece pose classification without the need for labor-intensive dataset creation. While developed for aerodynamic part feeding, the proposed method is applicable to a wide range of industrial scenarios requiring automated workpiece orientation classification.
|
|
11:24-11:42, Paper MoAT4.4 | |
Voxel-Based Hierarchical Approximate Convex Decomposition for Efficient 3D Representation of Objects in Robotic Applications |
|
Mastromarino, Fabio | Politecnico Di Bari |
Scarabaggio, Paolo | Politecnico Di Bari |
Carli, Raffaele | Politecnico Di Bari |
Dotoli, Mariagrazia | Politecnico Di Bari |
Keywords: Collision Avoidance, Formal Methods in Robotics and Automation, Industrial and Service Robotics
Abstract: Approximate Convex Decomposition (ACD) is essential for industrial robotics, enabling efficient collision detection, motion planning, and physics-based simulation of robotic manipulators. However, traditional ACD methods, such as Hierarchical ACD (HACD) and Volumetric HACD (V-HACD), often suffer from high computational costs and over-segmentation, making them unsuitable for real-time robotic applications. This paper presents a novel voxel-based HACD (VX-HACD) approach designed to enhance computational efficiency while preserving the geometric fidelity of robotic manipulator components. The proposed approach first converts the input mesh into a structured voxel grid, simplifying the convex decomposition process. A gap-filling algorithm ensures topological continuity, preventing segmentation artifacts caused by voxel discretization. Additionally, a hierarchical voxel aggregation strategy reduces the number of convex components while maintaining accuracy, optimizing the representation for robotic applications. The methodology is validated on high-complexity robotic manipulator components, demonstrating reduced processing times, lower volumetric error, and fewer convex components compared to state-of-the-art ACD techniques. The proposed approach, while validated in the context of industrial robotics for collision-aware motion planning, can be applied to a wide range of applications requiring efficient convex decomposition in high-performance simulation (e.g., precision or surgical robotics, video games, and physics simulation).
|
|
11:42-12:00, Paper MoAT4.5 | |
Learning Rapid Turning, Aerial Reorientation, and Balancing Using Manipulator As a Tail |
|
Yang, Insung | Korean Advanced Institute of Science and Technology |
Hwangbo, Jemin | Korean Advanced Institute of Science and Technology |
Keywords: Motion Control, Model Learning for Control
Abstract: In this research, we investigated the innovative use of a manipulator as a tail in quadruped robots to augment their physical capabilities. Previous studies have primarily focused on enhancing various abilities by attaching robotic tails that function solely as tails on quadruped robots. While these tails improve the performance of the robots, they come with several disadvantages, such as increased overall weight and higher costs. To mitigate these limitations, we propose the use of a 6-DoF manipulator as a tail, allowing it to serve both as a tail and as a manipulator. To control this highly complex robot, we developed a controller based on reinforcement learning for the robot equipped with the manipulator. Our experimental results demonstrate that robots equipped with a manipulator outperform those without a manipulator in tasks such as rapid turning, aerial reorientation, and balancing. These results indicate that the manipulator can improve the agility and stability of quadruped robots, similar to a tail, in addition to its manipulation capabilities.
|
|
MoAT5 |
Room T5 |
Drones and UAV |
Regular Session |
Chair: Fikri, Muhamad Rausyan | Tampere University |
|
10:30-10:48, Paper MoAT5.1 | |
Tracking Multiple Moving Assets with a Smaller Group of Drones |
|
Shahsavar, Mohammadreza | University of Houston |
Rajasekaran, Siddharth | University of Houston |
Kabin, Richard A | University of Houston |
Yannuzzi, Michael | University of Houston |
Becker, Aaron | University of Houston |
Keywords: Swarms, Surveillance Systems, Optimization and Optimal Control
Abstract: A limited number of drones must monitor a large number of assets whose future motions are unknown, ensuring that each asset is monitored by at least one drone. The ideal configuration places the drones' sensors as close as possible to their assets. This objective is achieved by minimizing the altitude of all drones, which in turn reduces the total area of their ground coverage footprints. There exists an optimal assignment of assets to drones. However, if the assets are moving, then the optimal assignment is not static; assets must instead be swapped between drones. We present centralized and decentralized methods to cluster the moving assets statically at each control iteration, and our cover algorithm then guarantees continuous 100% coverage of the moving assets while minimizing the total area covered by the drones.
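The altitude-footprint coupling that this objective exploits can be illustrated under a simple assumption not taken from the paper: each drone carries a downward-looking conical sensor with half-angle alpha and hovers above the centroid of its assigned cluster, so its footprint radius is h*tan(alpha). A minimal Python sketch:

import numpy as np

def min_altitude_for_cluster(assets_xy, half_angle_rad):
    # Smallest hover altitude covering every asset in the cluster, assuming the drone
    # hovers above the cluster centroid (illustrative; a minimum enclosing circle
    # would give a slightly lower altitude).
    center = assets_xy.mean(axis=0)
    r = np.linalg.norm(assets_xy - center, axis=1).max()
    return r / np.tan(half_angle_rad)

def total_coverage_cost(clusters, half_angle_rad=np.deg2rad(30.0)):
    # Sum of per-drone altitudes for a given asset-to-drone assignment (hypothetical cost).
    return sum(min_altitude_for_cluster(np.asarray(c), half_angle_rad) for c in clusters)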
|
|
10:48-11:06, Paper MoAT5.2 | |
Human-Drone Swarm Interaction System for Persistent Monitoring of Large Disperse Area |
|
Kosonen, Petri | Tampere University |
Fadaeian, Yeganeh | Tampere University |
Eura, Reeta | Tampere University |
Sulameri, Jussi Ilari | Tampere University |
Fikri, Muhamad Rausyan | Tampere University |
Gusrialdi, Azwirman | Tampere University |
Keywords: Robot Networks, Swarms, Surveillance Systems
Abstract: This paper presents a human-drone swarm interaction system that enables adaptive and prioritized monitoring through an ergodic coverage control algorithm. The system is designed to ensure that drone swarms effectively cover a user-defined probability density function while maintaining both safety and persistent monitoring. A robust set of safety features prevents collisions between neighboring drones and prevents the drones from exiting the monitoring area. In addition, a fault tolerance algorithm is implemented to ensure continuous monitoring even if individual drones fail. The proposed algorithm is validated through real-world experiments using Crazyflie 2.1 drones from Bitcraze. The experimental results demonstrate that the algorithm successfully optimizes coverage, as measured by the ergodic metric, while maintaining safe operation.
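For reference, the spectral ergodic metric commonly used in ergodic coverage control compares the time-averaged statistics of the N drone trajectories with the target density \phi through their Fourier coefficients,

\mathcal{E}(t) = \sum_{k} \Lambda_k \big| c_k(t) - \phi_k \big|^2, \qquad c_k(t) = \frac{1}{N t}\sum_{j=1}^{N} \int_0^t F_k\big(x_j(\tau)\big)\, d\tau,

where F_k are the Fourier basis functions on the monitoring area and \Lambda_k are decaying weights (e.g., \Lambda_k = (1+\|k\|^2)^{-(d+1)/2}); whether the paper uses exactly this weighting is not stated in the abstract.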
|
|
11:06-11:24, Paper MoAT5.3 | |
On Formation Control Strategies in a Failure-Prevention Scenario for Multi-UAV Payload Transportation |
|
Delbene, Andrea | Univerity of Genoa |
Baglietto, Marco | University of Genoa |
Keywords: Autonomous Vehicle Navigation, Autonomous Agents, Motion and Path Planning
Abstract: This study considers a system composed of multiple Unmanned Aerial Vehicles (UAVs) carrying a payload via flexible cables. A formation control problem is addressed in the course of a path-following procedure. In a scenario where one UAV could fail during the flight and is automatically detached from the system, the optimal formation to be kept for the whole mission is found, minimizing the maximum distance the remaining UAVs would have to travel to change formation if any of the UAVs were to fail. The focus is on a system with three UAVs, and the transition to a formation of two UAVs is studied after possible failures. The evolution of the system is described by a set of dynamic equations; the considered software architecture is provided, and software-in-the-loop tests are performed to validate the proposed algorithms. Results compare the performance of the proposed formation against a static configuration in a failure scenario of one UAV during path-following, highlighting the benefits of the former when the failure happens at critical points during the flight.
|
|
11:24-11:42, Paper MoAT5.4 | |
AeroPowerNet: Fixed-Wing UAV Power Consumption Estimation with an AI-Driven Hybrid Deep Learning Framework |
|
Wahid, Mirza Anas | École De Technologie Supérieure ÉTS |
Lahlou, Laaziz | École De Technologie Supérieure ÉTS |
Kara, Nadjia | ETS, University of Quebec |
Keywords: AI-Based Methods, Deep Learning in Robotics and Automation, Machine learning
Abstract: The instantaneous power consumption of electric-powered aerial vehicles in the aviation industry is crucial for optimizing flight activities. However, devising physics-based power consumption models requires deep insight into the dynamics of an unmanned aerial vehicle (UAV). This becomes challenging due to the variability and complexity of the parameters of airspeed, altitude, and motion of the flight controls. Therefore, a power consumption model is needed to map the influence of flight parameters on power utilization during varying flight phases. This model is crucial for mission planning, optimization, and extending the endurance of UAVs. This study introduces the AeroPowerNet framework, employing a data-driven approach based on deep learning to model UAV power consumption utilizing real-world flight data, which can serve as a foundation for future integration into fixed-wing UAV flight operations. We trained and compared four different models: RNN-LSTM, GRU, Transformer, and a hybrid model. Experimental results show that the hybrid model outperforms all other models, achieving the best performance with an MAE of 3.38 W, an R² of 99.31%, and an NRMSE of 0.21% for the first UAV flight, and an MAE of 7.52 W, an R² of 96.68%, and an NRMSE of 1.03% for the second flight.
|
|
11:42-12:00, Paper MoAT5.5 | |
Learning Based Approach towards AUV and Marine Life Interaction |
|
Kumar, Harshith | Drexel University |
P, Siri | Global Academy of Technology |
Keywords: Collision Avoidance, Motion and Path Planning, Reinforcement Learning
Abstract: Autonomous Underwater Vehicles (AUVs) have significantly advanced in their capabilities, enabling exploration and operations in diverse underwater environments. While navigation in sparse and obstacle-free terrains is relatively simple, navigating deeper waters introduces new challenges due to the presence of marine life, such as migrating shoals of fish and predators hunting for food. This paper explores a novel approach with the integration of reinforcement learning for motion planning in underwater robotics. The primary focus is on the implementation of the Proximal Policy Optimization (PPO) algorithm and Gumbel Social Transformer (GST), which enables the robot to learn how to navigate in an underwater environment with dynamic obstacles. The obstacles are modeled as a shoal of fish and a predator, with the robot tasked with avoiding collisions. The underwater system will be simulated using the Robot Operating System (ROS) framework, and onboard sonar sensors will be employed to detect and track any dynamic obstacles in the vicinity.
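For reference, the clipped surrogate objective that defines PPO is

L^{CLIP}(\theta) = \mathbb{E}_t\big[\min\big(r_t(\theta)\hat{A}_t,\ \mathrm{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\,\hat{A}_t\big)\big], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)},

with advantage estimates \hat{A}_t and clipping parameter \epsilon; the reward shaping and observation design used together with the Gumbel Social Transformer are specific to the paper and not reproduced here.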
|
|
MoAT6 |
Room T6 |
Human Robot Collaboration for Smart Manufacturing 1 |
Special Session |
Chair: Zhang, Yunbo | Rochester Institute of Technology |
Organizer: Wang, Weitian | Montclair State University |
Organizer: Zhou, MengChu | New Jersey Institute of Technology |
Organizer: Guo, Xiwang | Liaoning Petrochemical University |
Organizer: Qiao, Yan | Macau University of Science and Technology |
|
10:30-10:48, Paper MoAT6.1 | |
RoPESim: A Framework for Robot Manipulation Policy Evaluation Via Simulation (I) |
|
Wang, Xueting | Rochester Institute of Technology |
Dengxiong, Xiwen | Rochester Institute of Technology |
Bai, Shi | IServe Robotics |
Zhang, Yunbo | Rochester Institute of Technology |
Keywords: Human-Centered Automation, Collaborative Robots in Manufacturing, Intelligent and Flexible Manufacturing
Abstract: Predicting the robot manipulation plan prior to real-world execution is an important capability for robots to complete tasks in manufacturing environments. However, current AI-based manipulation planning methods lack this capability, making it difficult to deploy them in real-world manufacturing scenarios. In this work, we propose a simulation-based human-robot collaboration framework to evaluate predicted robot actions before real-world execution. The framework consists of a VLM-based scenario generator, a diffusion-based action simulator, and an evaluator. First, the scenario generator automatically creates a simulation scenario with objects and obstacles identified and placed. Then, the action simulator generates a series of manipulation action trajectories using a diffusion model in the simulation environment. Each action trajectory is assessed by the evaluator for collision failure, manipulation failure, and completion rate. The final evaluation results are returned to the user for verification and approval. In our experiment, we apply our framework to five chosen scenarios with highly potential collision failures. For each scenario, at least one feasible planned action trajectory is generated. It is then verified through real robot execution, demonstrating the effectiveness of the proposed framework.
|
|
10:48-11:06, Paper MoAT6.2 | |
Zero-Shot Robot Manipulation Via Action Decomposition and Composition (I) |
|
Dengxiong, Xiwen | Rochester Institute of Technology |
Wang, Xueting | Rochester Institute of Technology |
Li, Rui | Rochester Institute of Technology |
Zhang, Yunbo | Rochester Institute of Technology |
Keywords: Task Planning, Intelligent and Flexible Manufacturing, AI-Based Methods
Abstract: The ability to learn generalized skills from demonstrations and apply the acquired skills in various real-world scenarios is a key challenge for robot manipulation. Different from typical robot learning tasks that learn the action from multiple demonstrated samples in a single task, zero-shot robot manipulation requires the robot to efficiently leverage multiple learned robot skills to accomplish a new task. In this paper, we propose an action decomposition/composition framework that efficiently transfers key manipulation skills to various new derivative tasks. Specifically, we first decompose one demonstration that encompasses several foundation skills that do not contain the derivative task. Then we adopt an action prediction approach to generate possible manipulation poses and the end pose for each subtask in the derivative task based on the robot's action and the video frames from the robot cameras. Since the generated poses may be impacted by previous misleading actions, we denoise the action by selecting the most probable manipulation poses based on the task to guide the robot manipulation.
|
|
11:06-11:24, Paper MoAT6.3 | |
Cognitive Architecture for Adaptive Skill Learning towards Fluent Human-Robot Collaborative Assembly (I) |
|
Zhou, Rui | The University of Auckland |
Lu, Yuqian | The University of Auckland |
Keywords: Collaborative Robots in Manufacturing, Cognitive Automation
Abstract: This paper presents a new cognitive architecture to enable human-robot collaborative assembly in complex, unstructured environments. While existing human-robot collaboration technologies have demonstrated success in simple setups, they struggle with fluent interaction in complex scenarios characterized by unpredictable human intentions and flexible workspace configurations. Our approach addresses these limitations by developing a cognitive architecture built upon the SOAR architecture, emphasizing internal cognitive processes, including real-time learning, adaptive decision-making, and knowledge evolution. The proposed system integrates perception, learning, memory, and execution components into a unified architecture that enables robots to continuously acquire skills through human interaction. Through HRC experiments for assembling the Generic Assembly Box (GAB), we demonstrated a 6.74% improvement in task success rates, a 10.9% reduction in execution time, and a 15.6% decrease in human instruction needs over traditional methods. These results validate the system’s potential to bridge the gap between traditional cognitive architectures and practical robotic applications, contributing to more adaptive and intelligent human-robot collaboration.
|
|
11:24-11:42, Paper MoAT6.4 | |
Development and Evaluation of a Deep Q-Network-Based Robot Learning Paradigm in Real-World Human-Robot Collaborative Tasks (I) |
|
Modery, Garrett | Montclair State University |
Wang, Weitian | Montclair State University |
Li, Rui | Montclair State University |
Chen, Yi | ABB US Research Center |
Zhou, MengChu | New Jersey Institute of Technology |
Keywords: Human-Centered Automation, Collaborative Robots in Manufacturing
Abstract: As robot systems continue to advance and be implemented across industries, they typically follow one of two methodologies: standalone systems and collaborative ones. Standalone systems are typically set up in their own areas, away from human workers. Collaborative robots share a common workspace with their human counterparts and work with them to complete tasks together efficiently and safely. Within this category of robotics, there exists another subcategory that describes the method of implementation and usage rather than simply the type of system. This subcategory involves how the machine will interact with workers and understand its role in the interaction. This raises interest in the field of learning from demonstrations, where the robot may dynamically learn the behavior desired by the user rather than being explicitly hardcoded to perform its task. In this work, we develop a deep Q-network-based robot learning paradigm for human-robot partnerships in shared tasks. The proposed approach is validated in real-world human-robot collaborative contexts. In addition, to assess the performance of this approach and its acceptance by practitioners, we conduct a multi-metric user study. Implementation and evaluation results indicate that the developed solution works effectively for human-robot teamwork and receives high support from active users, who rate it very well on several key metrics. The future work of this study is also discussed.
|
|
11:42-12:00, Paper MoAT6.5 | |
Safe and Intuitive Human-Robot Collaborative Assembly with Potential Field-Based Dynamic Obstacle Avoidance and Gestured-Based Communication Interface (I) |
|
Patel, Dipesh | The University of Auckland |
Phu, Nathan | The University of Auckland |
Lu, Yuqian | The University of Auckland |
Keywords: Human-Centered Automation, Collision Avoidance, Motion and Path Planning
Abstract: We present a robotic system that enhances human-robot collaboration (HRC) in manufacturing through intuitive gesture recognition and dynamic obstacle avoidance. Despite the growing adoption of HRC systems, significant challenges persist in creating natural interfaces and maintaining safety without impeding workflow. Our contributions include: (1) a motion planning algorithm with dampening and softening components for safe and efficient obstacle avoidance; (2) an accurate gesture recognition and tool detection model; and (3) a finite state machine that translates human gestures into fitting assistance actions. We validate our approach through a manufacturing case study that shows improved collaboration efficiency while maintaining safety. Results show that our system reduces production delays by eliminating the need for workers to divert attention from assembly tasks to manage robot interactions, thereby creating more natural and productive human-robot partnerships in manufacturing settings.
|
|
MoAT7 |
Room T7 |
Learning and Computation |
Regular Session |
Chair: Kim, Heeyoung | KAIST |
|
10:30-10:48, Paper MoAT7.1 | |
Development of an Automatic Algorithm for Predicting Physics-Informed Mechanical Properties in Intelligent Manufacturing of Hairpin Motors |
|
Shim, Young-Dae | Georgia Institute of Technology |
Kim, Jihun | Sungkyunkwan University |
Kim, Changhyeon | Sungkyunkwan University |
Park, Jihyun | HYUNDAI MOBIS |
Yang, DongWook | Hyundai Mobis |
Lee, Eun-Ho | Sungkyunkwan University |
Keywords: Intelligent and Flexible Manufacturing, Zero-Defect Manufacturing, Process Control
Abstract: This paper introduces a physics-informed approach for the real-time prediction of mechanical properties in metallic materials, specifically those utilized in the intelligent manufacturing of hairpin coils for electric vehicle production. Inconsistencies in processing that result in variations in material properties frequently cause faults in metal forming operations, highlighting the necessity for continuous non-destructive monitoring. Although conventional tensile testing offers precise mechanical characterization, it is inadequate for inline monitoring. We create a prediction model utilizing eddy current testing (ECT), thermodynamic energy balance, and dislocation-based crystal plasticity. The method creates a theoretical connection between plastic deformation and electrical impedance, utilizing Matthiessen's Rule and circuit theory to measure the impact of dislocation density on conductivity and magnetic energy transfer efficiency. An algorithm based on physics was deployed on a production line and adjusted using environmental sensors. Experimental validation using 22 hairpin coil samples showed remarkable agreement with tensile test results, achieving a prediction error margin of 3.5% and a root mean square error (RMSE) of 1.56 MPa. This study illustrates the effectiveness of the suggested framework for non-destructive, real-time prediction of mechanical properties in intelligent manufacturing systems.
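For context, Matthiessen's Rule referenced above states that independent scattering mechanisms contribute additively to resistivity,

\rho_{total} = \rho_{thermal} + \rho_{impurity} + \rho_{dislocation},

so an increase in dislocation density during plastic deformation raises \rho_{dislocation} and thereby changes the eddy-current response; the specific coefficients linking dislocation density to resistivity belong to the paper's model and are not given here.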
|
|
10:48-11:06, Paper MoAT7.2 | |
Quantized Parallel Particle Filtering on FPGA for High-Speed Target Tracking in Edge Computing Devices |
|
Kim, Nayeon | Kumoh National Institute of Technology |
Lee, Heoncheol | Kumoh National Institute of Technology |
Lee, Jieun | LIG Nex1 |
Kim, Haerim | LIG Nex1 |
Choi, Wonseok | LIGnex1 |
Keywords: Optimization and Optimal Control
Abstract: This paper addresses the problem of applying particle filters to edge computing devices for tracking a high-speed target with nonlinear and non-Gaussian characteristics in real time. Conventional particle filters are inefficient here because of the long computation time caused by sequentially processing a large number of particles. This paper proposes a quantized parallel particle filtering method that can be efficiently executed on an FPGA (Field-Programmable Gate Array) to accelerate most of the computation processes. The proposed method employs INT16 quantization rather than double-precision arithmetic for better compatibility with FPGAs in edge computing devices. Pipelining and unrolling methods are then used for parallelization. Experimental results showed that the proposed method achieved a 3.32× overall speedup compared to conventional particle filters on a CPU. Also, despite measurement noise and quantization errors, the proposed method tracked the high-speed target accurately.
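The INT16 idea is standard symmetric fixed-point quantization: real-valued particle states and weights are mapped to 16-bit integers with a per-array scale factor before the parallel stages. A minimal NumPy sketch of that mapping (the paper's scale selection and FPGA pipelining details are not reproduced):

import numpy as np

def quantize_int16(x):
    # Symmetric INT16 quantization: returns (q, scale) with x approximately q * scale.
    scale = np.max(np.abs(x)) / 32767.0 if np.any(x) else 1.0
    q = np.clip(np.round(x / scale), -32768, 32767).astype(np.int16)
    return q, scale

def dequantize_int16(q, scale):
    return q.astype(np.float32) * scale

# Example: quantize particle weights before a parallelizable resampling step (illustrative).
weights = np.random.rand(4096)
q, s = quantize_int16(weights)
print(np.max(np.abs(dequantize_int16(q, s) - weights)))  # worst-case quantization error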
|
|
11:06-11:24, Paper MoAT7.3 | |
Performance Analysis for AIE-Based Matrix Multiplication Acceleration in Adaptive Compute Acceleration Algorithms |
|
Kim, Haerim | LIG Nex1 |
Lee, Jieun | LIG Nex1 |
Keywords: Optimization and Optimal Control, AI-Based Methods, Domain-specific Software and Software Engineering
Abstract: Real-time matrix computations, particularly the optimization of matrix multiplication using advanced technologies like the Versal AI Engine, are essential for modern guided weapon systems to enhance precision and responsiveness. This paper investigates the optimization of matrix multiplication (MMUL) using the Versal AI Engine (AIE) on the Xilinx VCK190 board, which is significant due to its advanced parallel processing capabilities that are ideal for enhancing computational efficiency in real-time applications. We analyze the performance impact of various block size configurations, such as 4×4×4 and 2×4×8, to exploit the board's parallel processing capabilities. Experimental results demonstrate that these smaller block sizes achieve optimal computational efficiency by effectively balancing execution speed with memory access demands, thereby maximizing the use of available resources. These findings offer a scalable and efficient approach to enhancing real-time processing in guidance systems, potentially leading to significant improvements in the precision and responsiveness of modern guided weapon systems.
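The trade-off being measured is the classic tiling choice: C = A·B is computed block by block, and the (M, K, N) tile shape (e.g., 4×4×4 or 2×4×8) determines how much data each kernel invocation touches. The NumPy reference below shows only the tiling loop, not the AIE kernel code, and the mapping of each inner product to one AIE MMUL call is a conceptual assumption.

import numpy as np

def blocked_matmul(A, B, mb=4, kb=4, nb=4):
    # Reference blocked MMUL; (mb, kb, nb) mirrors tile shapes such as 4x4x4 or 2x4x8.
    # Matrix dimensions are assumed to be multiples of the tile sizes for simplicity.
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % mb == 0 and K % kb == 0 and N % nb == 0
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, mb):
        for j in range(0, N, nb):
            for k in range(0, K, kb):
                # Innermost block product: the unit of work one accelerator call would handle.
                C[i:i+mb, j:j+nb] += A[i:i+mb, k:k+kb] @ B[k:k+kb, j:j+nb]
    return C

A = np.random.rand(8, 16); B = np.random.rand(16, 8)
assert np.allclose(blocked_matmul(A, B, 2, 4, 8), A @ B)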
|
|
11:24-11:42, Paper MoAT7.4 | |
Integrated Monitoring in Aquaponics - a Preliminary Study |
|
Edan, Yael | Ben-Gurion University of the Negev |
Turetzky, Dan | Ben Gurion University of the Negev |
Amir, Nadav | Ben Gurion University |
Aflalo, Eliahu | Ben Gurion University |
Eisa, Adam | Ben Gurion University of the Negev |
Keywords: Machine learning, Deep Learning in Robotics and Automation, Computer Vision in Automation
Abstract: Aquaponics is an innovative and sustainable food production system integrating aquaculture and hydroponics to create a symbiotic environment. In this preliminary study, the growth of both plants and fish raised in a nutrient film technique (NFT) system was monitored over one season. Computer vision techniques were developed to: 1) estimate lettuce plant size, 2) measure root system size (length and width), and 3) estimate fish weight. Statistical analyses were applied to determine the effect of location along the NFT system on root length and plant size. Fish length was used to estimate fish weight based on a regression model developed from fish length and weight measurements. By providing an integrated approach to automated monitoring systems, we aim to contribute to the development of improved aquaponic cultivation.
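A common way to realize the length-to-weight regression mentioned above is an allometric model W = a·L^b fitted on log-transformed data; the sketch below uses that standard form with synthetic numbers and is not the study's actual model or coefficients.

import numpy as np

# Synthetic calibration data (hypothetical): fish length in cm, weight in g
lengths = np.array([8.0, 10.5, 12.0, 14.5, 16.0, 18.5])
weights = np.array([6.1, 13.8, 20.5, 36.9, 49.0, 77.2])

# Fit log(W) = log(a) + b*log(L) with ordinary least squares
b, log_a = np.polyfit(np.log(lengths), np.log(weights), deg=1)
a = np.exp(log_a)

def estimate_weight(length_cm):
    """Predict fish weight (g) from measured length (cm) via W = a * L^b."""
    return a * length_cm ** b

print(f"a={a:.4f}, b={b:.2f}, predicted weight at 15 cm: {estimate_weight(15.0):.1f} g")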
|
|
11:42-12:00, Paper MoAT7.5 | |
Shape Feature-Informed Segmentation in the Rolling Process |
|
Lee, Doryun | POSCO |
Keywords: AI-Based Methods, Computer Vision in Automation, Computer Vision for Manufacturing
Abstract: In the steel industry, recognizing the shape of steel plates during the rolling process using AI-based cameras is crucial. However, due to the characteristics of industrial environments, issues such as steam and water often degrade the quality of images captured by the cameras. In such conditions, errors can occur during image segmentation, where non-steel areas are mistakenly classified as steel, or parts of the steel plate are incorrectly identified as non-steel. To address these issues, this study developed a segmentation method leveraging domain knowledge of steel plate characteristics. Specifically, we designed loss functions that account for object recognition along the thickness direction and the continuity of steel plates, integrating them into the segmentation process. As a result, compared to the existing YOLOv11 segmentation method, the proposed approach improved precision from 0.8393 to 0.8647, maintained recall at 1.0, and increased mAP50-95 from 0.8782 to 0.8991, indicating enhanced overall performance. This approach allows for more accurate recognition of steel plate shapes even when image quality is compromised, increasing its applicability in the rolling processes of the steel industry.
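One way to picture a continuity-aware loss of the kind described above (a generic sketch, not the paper's exact formulation) is a penalty on gaps along the rolling direction inside the predicted steel-plate mask, added to a standard pixel-wise loss:

import numpy as np

def continuity_penalty(pred_mask, axis=1):
    """Penalize 'holes' along the rolling direction of a predicted plate mask.

    pred_mask -- 2D array of per-pixel plate probabilities in [0, 1]
    axis      -- direction in which a physical steel plate should be continuous
    """
    diffs = np.abs(np.diff(pred_mask, axis=axis))   # large jumps = fragmented prediction
    return diffs.mean()

def total_loss(pred_mask, target_mask, lam=0.1):
    """Pixel-wise cross-entropy plus a continuity term (lam is a hypothetical weight)."""
    bce = -(target_mask * np.log(pred_mask + 1e-7)
            + (1 - target_mask) * np.log(1 - pred_mask + 1e-7)).mean()
    return bce + lam * continuity_penalty(pred_mask)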
|
|
MoAT8 |
Room T8 |
Federated and Distributed Learning |
Special Session |
Chair: Yue, Xubo | University of Michigan |
Co-Chair: Reisi Gahrooei, Mostafa | University of Florida |
Organizer: Reisi Gahrooei, Mostafa | University of Florida |
Organizer: Yue, Xubo | University of Michigan |
Organizer: Gaw, Nathan | Georgia Institute of Technology |
|
10:30-10:48, Paper MoAT8.1 | |
Federated Learning for Deep Anomaly Detection with Noisy and Heterogeneous Data (I) |
|
Li, Ao | The Hong Kong University of Science and Technology (HKUST) |
Li, Songze | Southeast University |
Tsung, Fugee | HKUST |
Keywords: Machine learning
Abstract: We consider the problem of unsupervised learning for deep anomaly detection (DAD) in a federated learning (FL) network, consisting of a central server and many distributed clients. The conventional FedAvg algorithm involves clients training local DAD models with their private datasets, and then uploading these models to the server for aggregation into a global model. In practical scenarios, this framework faces two major challenges: 1) unlabeled training datasets may contain unknown anomalies; 2) training datasets from different clients are typically non-independent and identically distributed (non-IID). To address these problems, we propose FedDAD, a robust FL framework for training DAD models with noisy and heterogeneous data. In FedDAD, a small public dataset at the server, containing only a few normal samples (e.g., one sample from each normal class), serves as a normal anchor in the latent space across all clients. This anchor significantly improves the accuracy of identifying unknown anomalies at clients, in the presence of data heterogeneity. Having identified anomalies, clients utilize contrastive learning to train local feature extractors that further enhance the separation between normal and abnormal data. Extensive experimental results demonstrate the uniform superiority of FedDAD over all FL baselines across various settings and datasets. Furthermore, the model trained with FedDAD even achieves comparable performance to the model trained in a centralized manner.
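For reference, the conventional FedAvg aggregation mentioned above can be sketched as a sample-size-weighted average of client model parameters; this is a generic sketch of the baseline, not of FedDAD itself.

import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Weighted average of client parameter vectors (one FedAvg round, sketch only).

    client_weights -- list of 1D numpy arrays, flattened local model parameters
    client_sizes   -- list of local dataset sizes, used as aggregation weights
    """
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()
    return sum(c * w for c, w in zip(coeffs, client_weights))

# Three clients with unequal data volumes (illustrative numbers only)
global_model = fedavg_aggregate(
    [np.random.randn(10), np.random.randn(10), np.random.randn(10)],
    client_sizes=[500, 1200, 300])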
|
|
10:48-11:06, Paper MoAT8.2 | |
Adaptive Asynchronous Federated Learning with Convolutional Autoencoders in Heterogeneous Systems (I) |
|
Kim, Jungmin | Rutgers University |
Guo, Weihong | Rutgers University |
Keywords: Machine learning
Abstract: Federated Learning (FL) has emerged as a promising solution for distributed machine learning in various systems, enabling collaborative model training while preserving data privacy. However, traditional synchronous FL approaches face challenges in heterogeneous environments where clients have varying computational capabilities and network conditions. This paper proposes a novel asynchronous FL framework with convolutional autoencoders for unsupervised representation learning (URL) in distributed systems. Unlike conventional FL methods that require synchronous updates from all clients, our approach enables flexible client participation based on their update history, making it particularly suitable for real-time scenarios where continuous model improvement is critical. The framework addresses key challenges in diverse systems: privacy preservation, unlabeled data handling, and system heterogeneity. Experimental results on the MNIST dataset demonstrate that our method achieves faster convergence and lower training time compared to centralized learning (CL), individual learning (IL), and traditional FL approaches, such as Federated Averaging (FedAvg) and FedProx, while maintaining comparable accuracy.
|
|
11:06-11:24, Paper MoAT8.3 | |
Federated Learning of Dynamic Bayesian Network Via Continuous Optimization from Time Series Data (I) |
|
Yue, Xubo | Northeastern University |
Keywords: Causal Models
Abstract: Traditionally, learning the structure of a Dynamic Bayesian Network has been centralized, requiring all data to be pooled in one location. However, in real-world scenarios, data are often distributed across multiple entities (e.g., companies, devices) that seek to collaboratively learn a Dynamic Bayesian Network while preserving data privacy and security. More importantly, due to the presence of diverse clients, the data may follow different distributions, resulting in data heterogeneity. This heterogeneity poses additional challenges for centralized approaches. In this study, we first introduce a federated learning approach for estimating the structure of a Dynamic Bayesian Network from homogeneous time series data that are horizontally distributed across different parties. We then extend this approach to heterogeneous time series data by incorporating a proximal operator as a regularization term in a personalized federated learning framework. To this end, we propose FDBNL and PFDBNL, which leverage continuous optimization, ensuring that only model parameters are exchanged during the optimization process. Experimental results on synthetic and real-world datasets demonstrate that our method outperforms state-of-the-art techniques, particularly in scenarios with many clients and limited individual sample sizes.
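The proximal-operator regularization used for the personalized (heterogeneous) variant can be illustrated with a FedProx-style penalty that pulls each client's parameters toward the shared model during local optimization; the code below is a generic sketch with a hypothetical gradient function, not the paper's FDBNL/PFDBNL implementation.

import numpy as np

def local_update_with_prox(w_local, w_global, grad_fn, mu=0.1, lr=0.01, steps=50):
    """Gradient steps on: local_loss(w) + (mu/2) * ||w - w_global||^2  (sketch only).

    grad_fn -- callable returning the gradient of the client's local loss (hypothetical)
    mu      -- proximal strength; larger values keep clients closer to the global model
    """
    w = w_local.copy()
    for _ in range(steps):
        g = grad_fn(w) + mu * (w - w_global)   # gradient of the proximal term added
        w -= lr * g
    return w

# Toy quadratic local objective, purely illustrative
grad_fn = lambda w: 2.0 * (w - np.array([1.0, -2.0, 0.5]))
w_new = local_update_with_prox(np.zeros(3), np.ones(3), grad_fn)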
|
|
11:24-11:42, Paper MoAT8.4 | |
A Federated Semi-Supervised Approach to Predicting Parkinson’s Disease Severity from Tabular Data (I) |
|
Allsop, Jennifer | Air Force Institute of Technology |
Gaw, Nathan | Georgia Institute of Technology |
Reisi Gahrooei, Mostafa | University of Florida |
Cox, Bruce | Air Force Institute of Technology |
Johnstone, Chancellor | Air Force Institute of Technology |
Keywords: AI and Machine Learning in Healthcare
Abstract: Data privacy is a growing concern in real-world machine learning (ML) applications, particularly in sensitive domains like healthcare. Federated learning (FL) offers a promising solution by enabling model training across decentralized, private data sources. However, both traditional ML and FL approaches typically assume access to fully labeled datasets, an assumption that rarely holds in practice. Users often lack the time, motivation, or expertise to label their data, making labeled examples scarce. This paper proposes a federated semi-supervised learning (FSSL) framework that learns from a small set of labeled data alongside a large volume of unlabeled data. Our approach combines FL with VIME, a leading semi-supervised learning (SSL) method for tabular data. Unlike image or text data, tabular data presents unique challenges for SSL due to the absence of transferable pretext tasks. We evaluate our method on predicting Parkinson’s disease severity and show that it significantly outperforms both supervised and SSL baselines across varying proportions of labeled data. The model achieves an RMSE of 7.74 and an MAE of 6.26 in the most challenging setting with only 10% labeled data, substantially outperforming both supervised FL and standalone SSL baselines and demonstrating the strength of our method under limited supervision. These results show that our method effectively leverages unlabeled data to enhance predictive performance in a privacy-preserving, real-world setting.
|
|
11:42-12:00, Paper MoAT8.5 | |
Federated Automatic Latent Variable Selection in Multi-Output Gaussian Processes (I) |
|
Gao, Jingyi | University of Virginia |
Chung, Seokhyun | University of Virginia |
Keywords: Probability and Statistical Methods, Learning and Adaptive Systems, Optimization and Optimal Control
Abstract: This paper explores a federated learning approach that automatically selects the number of latent processes in multi-output Gaussian processes (MGPs). The MGP has seen great success as a transfer learning tool when data is generated from multiple sources or units. A common approach in MGPs to transfer knowledge across units involves gathering all data from each unit to a central server and extracting common independent latent processes to express each unit as a linear combination of the shared latent patterns. However, this approach poses key challenges in (i) determining the adequate number of latent processes and (ii) relying on centralized learning, which leads to potential privacy risks and significant computational burdens on the central server. To address these issues, we propose a hierarchical model that places spike-and-slab priors on the coefficients of each latent process. These priors help automatically select only the needed latent processes by shrinking the coefficients of unnecessary ones to zero. To estimate the model while avoiding the drawbacks of centralized learning, we propose a variational inference-based approach that formulates model inference as an optimization problem compatible with federated settings. We then design a federated learning algorithm that allows units to jointly select and infer the common latent processes without sharing their data. Simulation and case studies on Li-ion battery degradation demonstrate the advantageous features of our proposed approach.
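The spike-and-slab prior on the latent-process coefficients can be illustrated with a tiny generative sketch (generic formulation with made-up hyperparameters, not the paper's exact model): each coefficient is zero with probability 1 - pi (the "spike") and drawn from a broad Gaussian "slab" otherwise, which is what shrinks unneeded latent processes away.

import numpy as np

rng = np.random.default_rng(0)

def sample_spike_and_slab(n_latent, pi=0.3, slab_std=1.0):
    """Draw mixing coefficients for n_latent latent processes under a spike-and-slab prior."""
    included = rng.random(n_latent) < pi          # Bernoulli(pi) inclusion indicators
    slab = rng.normal(0.0, slab_std, n_latent)    # 'slab': diffuse Gaussian when included
    return np.where(included, slab, 0.0)          # 'spike': exact zero when excluded

coeffs = sample_spike_and_slab(n_latent=8)
print("active latent processes:", np.nonzero(coeffs)[0])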
|
|
MoAT9 |
Room T9 |
Human-Robot and HCA 1 |
Regular Session |
Chair: Rahman, S M Mizanoor | Pennsylvania State University |
|
10:30-10:48, Paper MoAT9.1 | |
Emotion Recognition: Low-Rank Multimodal Shear and Splicing Fusion |
|
Wang, Jiaming | University of Chinese Academy of Sciences |
Yue, ZhiJian | University of Chinese Academy of Sciences |
Huang, Jiangpeng | Chongqing Aerospace Rocket Electronics Technology Co., Ltd |
Yang, Leiyu | Chongqing Aerospace Rocket Electronics Technology Co., Ltd |
Liu, Yujie | Chongqing Aerospace Rocket Electronics Technology Co., Ltd |
Peng, Yi | Chongqing Aerospace Rocket Electronics Technology Co., Ltd |
Xia, Chengzhu | Chongqing Aerospace Rocket Electronics Technology Co., Ltd |
Wang, Xupeng | Chongqing Aerospace Rocket Electronics Technology Co., Ltd |
Wang, Yong | University of Chinese Academy of Sciences |
Keywords: Computer Vision in Automation, Data fusion, Deep Learning in Robotics and Automation
Abstract: We propose a low-rank multi-modal shearing and splicing fusion (LMSSF) method for accurate emotion recognition by effectively integrating information from three modalities: text, image, and voice. Recognizing users' emotions accurately from social media information is challenging due to the diversity of user-generated data and the difficulty of accurately identifying and extracting features from multi-modal information. Our method fills a gap in the multi-modal field by leveraging feature extraction and fusion techniques to combine voice, image, and text modalities for emotion recognition. To address the interaction of multi-modal information, we introduce standard feature extraction and private feature retention methods to ensure the integrity of the multimodal information. Furthermore, we have developed a step-by-step discrimination approach that significantly reduces calculation and discrimination time by distinguishing standard features of three modalities, common features of two modalities, and private features. Our method effectively solves the problem of accurately recognizing emotions of social media users with diverse information modalities, achieving 97.3% accuracy with only 180K and an accuracy improvement of nearly 10% over other methods.
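As background for the low-rank fusion idea above, a generic low-rank multimodal fusion step (in the spirit of low-rank tensor fusion, not the authors' exact LMSSF shear-and-splice operators) projects each modality with rank-r factor matrices and combines the projections by elementwise products; all dimensions below are illustrative assumptions.

import numpy as np

def low_rank_fusion(text_h, image_h, audio_h, factors):
    """Rank-r multimodal fusion sketch: combine modality projections elementwise.

    factors -- dict of per-modality factor tensors with shape (rank, dim_m, out_dim)
    """
    fused = 0.0
    rank = factors["text"].shape[0]
    for r in range(rank):
        # Project each modality with its r-th factor, then take the elementwise product
        zt = text_h  @ factors["text"][r]
        zi = image_h @ factors["image"][r]
        za = audio_h @ factors["audio"][r]
        fused = fused + zt * zi * za
    return fused

rng = np.random.default_rng(1)
dims = {"text": 32, "image": 64, "audio": 16}
factors = {m: rng.normal(size=(4, d, 8)) * 0.1 for m, d in dims.items()}
z = low_rank_fusion(rng.normal(size=32), rng.normal(size=64), rng.normal(size=16), factors)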
|
|
10:48-11:06, Paper MoAT9.2 | |
Towards Human-Understandable Visual Recognition for Nonexperts in Industrial Inspection: A Case Study for Car Manufacturing Lines |
|
Sardari, Sarvenaz | Mercedes-Benz AG |
Fernandes, Freddy | Mercedes Benz |
Araya Martinez, Jose Moises | Mercedes-Benz AG, TU Berlin |
Zak, Jan Alexander | Mercedes-Benz AG |
Roitberg, Alina | University of Stuttgart |
Keywords: Computer Vision for Manufacturing, Human-Centered Automation, Computer Vision in Automation
Abstract: Despite growing interest in attribution-based eXplainable Artificial Intelligence (XAI) methods, existing tools designed by Artificial Intelligence (AI) experts often overlook the needs of end users such as Blue Collar Workers (BCW), increasing environmental and financial risks for companies relying on them for critical decisions in industrial inspection. In this work, we aim to understand BCW needs in car manufacturing lines and improve the transparency and trustworthiness of visual recognition tools. We evaluate existing XAI methods, such as Shapley values and saliency maps, and explore how enhancing them with Large Multimodal Models (LMMs) can help bridge the gap between model explanations and human intuition. Our goal is to provide explanations that better align with the needs of nonexpert users. Our hypothesis is validated through a custom-designed user study involving 20 nonexpert users, specifically addressing two key use cases: nut welding quality assurance in body-in-white assembly and part carrier inspection for assembly line readiness. The results demonstrate that, when humans were assisted by our XAI methods, satisfaction and efficiency increased in use cases involving object detection or classification. These findings suggest that augmenting attribution-based methods with LMM explanations offers great potential for human-understandable XAI interfaces in automotive manufacturing, reducing the risks of over- or under-trusting AI in critical production settings.
|
|
11:06-11:24, Paper MoAT9.3 | |
Trust-Triggered Cyber-Physical-Human System for Human-Robot Collaboration in Flexible Manufacturing |
|
Rahman, S M Mizanoor | Pennsylvania State University |
Keywords: Human-Centered Automation, Collaborative Robots in Manufacturing, Cyber-physical Production Systems and Industry 4.0
Abstract: We proposed and investigated a bidirectional trust-triggered cyber-physical-human (CPH) system framework for human-robot collaborative assembly in flexible manufacturing. For this purpose, we developed a one human-one robot hybrid cell where the human and the robot collaborated with each other to perform the assembly of different manufacturing components in a flexible manufacturing setup. In the proposed framework, we configured the human-robot collaborative system using three interconnected components of a CPH system: the cyber system (software system), the physical system (the robot, sensors and necessary hardware), and the human system (the human co-worker, supervisor and work environment). We divided the functions of the CPH framework into three interconnected modules: computing, communication and control. We proposed a model to compute the human’s and robot’s bidirectional trust in real-time to monitor and control the performance of the CPH framework. We evaluated the performance of the framework by implementing it on a human-robot collaborative assembly setup under different experimental conditions considering variations in: (i) computing complexity, (ii) communication complexity, (iii) control complexity, and (iv) human perceptual complexity. We compared the results among those experimental conditions and identified a condition that enabled the CPH framework to demonstrate the highest level of performance. The results revealed satisfactory performance of the CPH framework in terms of human-robot interaction, task efficiency and quality. The results can help incorporate modularity and objectivity into the design, development, analysis and control of human-robot collaborative systems by configuring them in the form of a CPH framework.
|
|
11:24-11:42, Paper MoAT9.4 | |
Efficient and Human Centered Industry 5.0 Data Propagation on the Operational Technology Level. Case Study with OPC UA Interfacing, Node-RED and Ignition |
|
Korodi, Adrian | University Politehnica Timisoara, Faculty of Automation and Comp |
Vesa, Vlad-Cristian | University Politehnica Timisoara, Faculty of Automation and Comp |
Dontu, Raul Andrei | University Politehnica Timisoara, Faculty of Automation and Comp |
Keywords: Human-Centered Automation, Cyber-physical Production Systems and Industry 4.0, Factory Automation
Abstract: Industry 5.0 focuses on human centricity, sustainability and resilience, while still building on Industry 4.0's core objective of increased efficiency. The operational technology (OT) level usually lacks data structuring and context, these steps being attributed to the middleware interfacing with the information technology (IT) level, or to higher SCADA levels. The higher the hierarchical level, the less knowledge of the technological process is available, and therefore representations may not be optimal. Also, developing multiple supervisory control and data acquisition (SCADA) solutions targeting the same controlled process leads to different perspectives, higher costs, longer development times, and more difficult maintenance. Considering both legacy systems and technological progress, the current paper proposes a solution that assures efficient and human-centered structured, contextualized, and graphically sustained data propagation at the OT level. The work approaches and builds upon industry-adopted environments and protocols in order to assure fast adoption and a large impact of the solution. A Node-RED and Ignition based case study is presented, representing the PLC-to-SCADA data integration levels. The structured and graphically depicted data transfer and representation using the Open Platform Communications Unified Architecture (OPC UA) protocol shows good results, without the need for additional developments on the SCADA levels.
|
|
11:42-12:00, Paper MoAT9.5 | |
Real-Time Social Presence Modulation of Embodied AI-Based Robots: An Audio-Centric Approach |
|
Wijesinghe, Nipuni | University of Canberra |
Jayasuriya, Maleen | University of Canberra |
Grant, Janie Busby | University of Canberra |
Herath, Damith | University of Canberra |
Keywords: Human-Centered Automation, Machine learning, Behavior-Based Systems
Abstract: Recent advancements in large language models have enabled robotic embodiment, yielding AI-driven robots with simulated personalities and social adeptness. However, modulating embodiment and presence remains overlooked. Unlike humans and animals, who instinctively adjust their presence (heightened in emergencies, subdued during focused work), robots operate rigidly and lack such adaptability, for example in modulating vocal tone contextually. This paper proposes a framework for real-time, context-based social presence modulation in embodied AI, which comprises three components: Sensor Integration, Context Identification, and Adaptive Presence Expressions. In our initial implementation, we use audio-based context detection, expandable to visual and physiological cues. The system combines CNN-based ambient sound detection, speech-to-text keyword analysis, and sentiment evaluation via a transformer pipeline; these signals are statistically fused and context is classified through a Naive Bayes model. We define three primary states: Alarmed (emergency scenario), Social ('everyday' functioning scenario), and Disengaged (no system presence scenario), plus two intermediate states to address uncertainty: Alert (between Alarmed and Social) and Passive (between Social and Disengaged). Real-world testing on a robot confirmed real-time modulation of actions and speech, validating the framework’s efficacy in adaptive social presence.
|
|
MoAT10 |
Room T10 |
Manipulation 1 |
Regular Session |
Chair: Komaee, Arash | Southern Illinois University, Carbondale |
|
10:30-10:48, Paper MoAT10.1 | |
Estimating Force/Torque Sensor Offsets and Gravity Parameters Using Only Wrench Measurements to Facilitate Human Demonstration of Robot Manipulation Tasks in Contact |
|
Mousavi Mohammadi, Ali | KU Leuven |
Vochten, Maxim | KU Leuven |
De Schutter, Joris | KU Leuven |
Aertbelien, Erwin | KU Leuven |
Keywords: Calibration and Identification, Model Learning for Control, Compliant Assembly
Abstract: In human demonstration of manipulation tasks involving contact, a tool equipped with a force/torque (FT) sensor is typically used to capture contact wrenches (i.e., forces and moments). For accurate measurements, the sensor must be properly calibrated to ensure that it only records the contact wrenches during tool-environment interactions. This estimation in our context refers to estimating the sensor offsets, as well as the gravity parameters, i.e. mass and center of mass (COM), of the rigid object mounted on the FT sensor. Proper estimation enables access to the true contact wrench, which facilitates construction of reliable task models from human demonstrations. We propose a method for estimating sensor offsets and gravity parameters using only wrench measurements. This is particularly beneficial in scenarios where orientation information is unavailable or unreliable. By relying solely on wrench data, the method mitigates motion-related noise, eliminates the need for sensor orientation calibration, and remains computationally efficient, making it well-suited for real-time applications. The method's effectiveness is evaluated against a baseline method that utilizes both wrench and accurate orientation measurements, whereas our method relies solely on wrench data. The results show that both methods perform well, particularly when the excitation range exceeds 10°. However, the proposed method consistently outperforms the baseline across all experiments within this excitation range.
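One plausible way to realize wrench-only estimation of the force offset and gravity magnitude is a linear least-squares sphere fit, under the assumption that static force readings at many orientations lie on a sphere of radius m·g centered at the sensor's force offset; this is a simplified illustration, not necessarily the authors' exact formulation.

import numpy as np

def fit_force_offset_and_mass(forces, g=9.81):
    """Least-squares sphere fit to static force readings (sketch only).

    forces -- (N, 3) array of measured forces at many unknown orientations
    Returns the estimated force offset (sphere center) and tool mass (radius / g).
    """
    A = np.hstack([2.0 * forces, np.ones((forces.shape[0], 1))])
    b = np.sum(forces**2, axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center, d = sol[:3], sol[3]
    radius = np.sqrt(d + center @ center)
    return center, radius / g

# Synthetic check: offset [2, -1, 3] N, 0.8 kg tool at random orientations (illustrative)
rng = np.random.default_rng(0)
dirs = rng.normal(size=(200, 3)); dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
forces = np.array([2.0, -1.0, 3.0]) + 0.8 * 9.81 * dirs + rng.normal(0, 0.01, (200, 3))
print(fit_force_offset_and_mass(forces))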
|
|
10:48-11:06, Paper MoAT10.2 | |
HARMONI: Haptic-Guided Assistance for Unified Robotic Tele-Manipulation and Tele-Navigation |
|
Sripada, Venkatesh | University of Surrey |
Khan, Muhammad Arshad | University of Lincoln |
Foecker, Julia | University of Lincoln |
Parsa, Soran | University of Huddersfield |
Palavajjhala, Susmitha | Jaguar Land Rover |
Maior, Horia Alexandru | University of Nottingham |
Ghalamzan Esfahani, Amir Masoud | University of Surrey |
Keywords: Control Architectures and Programming, Behavior-Based Systems, Robust/Adaptive Control
Abstract: Shared control, which combines human expertise with autonomous assistance, is critical for effective teleoperation in complex environments. While recent advances in haptic-guided teleoperation have shown promise, they are often limited to simplified tasks involving 6- or 7-DoF manipulators and rely on separate control strategies for navigation and manipulation. This increases both cognitive load and operational overhead. In this paper, we present a unified tele-mobile manipulation framework that leverages haptic-guided shared control. The system integrates a 9-DoF follower mobile manipulator and a 7-DoF leader robotic arm, enabling seamless transitions between tele-navigation and tele-manipulation through real-time haptic feedback. A user study with 20 participants under real-world conditions demonstrates that our framework significantly improves task accuracy and efficiency without increasing cognitive load. These findings highlight the potential of haptic-guided shared control for enhancing operator performance in demanding teleoperation scenarios.
|
|
11:06-11:24, Paper MoAT10.3 | |
Immersive Teleoperation Framework for Locomanipulation Tasks |
|
Boehringer, Takuya | University College London |
Embley-Riches, Jonathan | University College London |
Hammoud, Karim | University College London |
Modugno, Valerio | University College London |
Kanoulas, Dimitrios | University College London |
Keywords: Virtual Reality and Interfaces, Telerobotics and Teleoperation
Abstract: Recent advancements in robotic loco-manipulation have leveraged Virtual Reality (VR) to enhance the precision and immersiveness of teleoperation systems, significantly outperforming traditional methods reliant on 2D camera feeds and joystick controls. Despite these advancements, challenges remain, particularly concerning user experience across different setups. This paper introduces a novel VR-based teleoperation framework designed for a robotic manipulator integrated onto a mobile platform. Central to our approach is the application of Gaussian splatting, a technique that abstracts the manipulable scene into a VR environment, thereby enabling more intuitive and immersive interactions. Users can navigate and manipulate within the virtual scene as if interacting with a real robot, enhancing both the engagement and efficacy of teleoperation tasks. An extensive user study validates our approach, demonstrating significant usability and efficiency improvements. Two-thirds (66%) of participants completed tasks faster, achieving an average time reduction of 43%. Additionally, 93% preferred the Gaussian Splat interface overall, with unanimous (100%) recommendations for future use, highlighting improvements in precision, responsiveness, and situational awareness. Finally, we demonstrate the effectiveness of our framework through real-world experiments in two distinct application scenarios, showcasing the practical capabilities and versatility of the Splat-based VR interface.
|
|
11:24-11:42, Paper MoAT10.4 | |
Robust Feedback Linearization for Noncontact Manipulation of Magnetic Particles by a Hexagonal Array of Electromagnets |
|
Hasan, MD Nazmul | Southern Illinois University Carbondale |
Komaee, Arash | Southern Illinois University, Carbondale |
Keywords: Robust/Adaptive Control, Medical Robots and Systems, Automation at Micro-Nano Scales
Abstract: This paper develops a robust feedback control law for planar steering of magnetic particles in a circular workspace using a hexagonal arrangement of electromagnets encircling the workspace. The electromagnets are actuated independently by a feedback loop to control their aggregate magnetic field flexibly, which is then exploited to exert magnetic force on the particles as needed for steering them along desired reference trajectories. This magnetic force is a highly nonlinear function of both the position of the magnetic particles and the control voltages actuating the electromagnets. The control strategy in this work is to cancel this nonlinearity using a nonlinear inverse map, which, in effect, realizes the concept of feedback linearization. Such an inverse map is developed in this paper in a specific way that drastically enhances its robustness against inherent modeling errors and uncertainty in the magnetic field. The resulting robust inverse map notably improves the performance of feedback control, as demonstrated by experiments in this paper.
|
|
11:42-12:00, Paper MoAT10.5 | |
Hierarchical Control Framework for Collision-Free Collaborative Loco-Manipulation of Large and Heavy Objects |
|
Rigo, Alberto | University of Southern California |
Ma, Junchao | University of Southern California |
Chun, Nathan | University of Southern California |
Gupta, Satyandra K. | University of Southern California |
Nguyen, Quan | University of Southern California |
Keywords: Optimization and Optimal Control, Planning, Scheduling and Coordination, Collision Avoidance
Abstract: Legged manipulators offer significant advantages over traditional mobile manipulators, particularly in navigating uneven terrain. However, they are limited by payload capacity and the dimensions of the objects they can manipulate. Collaboration between legged manipulators can mitigate these challenges, but it introduces complexities in coordinating the robots to manipulate and locomote as a unified team. This paper presents a novel hierarchical framework for collaborative loco-manipulation with quadruped manipulators, designed to address the challenges of coordinating a robot team and handling bulky, heavy objects. The framework starts with a manipulation planner that computes the desired trajectory of the object, incorporating obstacle avoidance. A subsequent mapping between the object's trajectory and the robot's states ensures that commands are compatible with each robot. Finally, a decentralized loco-manipulation controller tracks the reference for the end effector while steering the robot base to avoid obstacles and compensate for manipulation forces. Our approach has been validated through simulations and hardware experiments, demonstrating the framework's versatility across different robot team compositions and a variety of payload-carrying tasks. We highlight the critical role of obstacle avoidance when manipulating bulky objects in real-world scenarios, particularly the need to prevent contact between the ground and the object being manipulated.
|
|
MoAT11 |
Room T11 |
Best Conference Papers Competition |
Special Session |
Chair: Lennartson, Bengt | Chalmers University of Technology |
|
10:30-10:55, Paper MoAT11.1 | |
A Fast Solution Method for Unit Commitment with Renewable Power Via Ordinal Optimization (I) |
|
Xu, Zhibo | Tsinghua University |
Liu, Siwei | Tsinghua University |
Jia, Qing-Shan | Tsinghua University |
Keywords: Optimization and Optimal Control, AI-Based Methods, Smart Grids
Abstract: The high penetration of renewable power has led to an increasing demand for rapid solutions to large scale security-constrained unit commitment (SCUC) problems. In this paper, we develop a fast solution method for SCUC problems with a significant number of renewable power scenarios based on Ordinal Optimization (OO). First, we propose a Crude Feasibility Model to efficiently generate estimated feasible solutions. The Crude Feasibility Model uses machine learning (ML) techniques to generate high-quality initial solutions, and then evaluates the feasibility of these solutions based on feasibility conditions with negligible computational effort. Second, we propose an OO-based method to rapidly seek good enough solutions while still providing probabilistic performance guarantees. The computational burden is reduced from the original large-scale mixed-integer programming (MIP) problem to a few linear programming (LP) problems that can be solved in parallel, significantly accelerating the solution process. Numerical experiments on the IEEE 118-bus system demonstrate that our method achieves the near-optimal solution with a performance gap of no more than 0.4%, while achieving a speedup of 8.39 to 27.99 times across different scales compared to Gurobi. Our method exhibits high computational efficiency and scales effectively to larger problems.
|
|
10:55-11:20, Paper MoAT11.2 | |
Multi-Modal Generative Modeling of Event Sequences and Time Series for Solar PV Systems |
|
Huang, Jiayu | Arizona State University |
Xu, Boyang | Arizona State University |
Liu, Yongming | Arizona State University |
Yan, Hao | Arizona State University |
Keywords: Data fusion, Machine learning, Deep Learning in Robotics and Automation
Abstract: This paper presents a multimodal learning model designed to simulate a solar energy plant system for predictive maintenance and fault prediction. We propose a novel approach to enhance system generation through conditional generation by leveraging both event sequences and time-series data. We employ a Transformer Hawkes process to encode event data and an iTransformer to encode time-series data. We then introduce a co-attention mechanism to effectively combine these two modalities, capturing dependencies and interactions between events and time-series signals. The integrated representations enable a conditional generation framework that iteratively predicts future system states and events over an extended time horizon. The simulation is based on a comprehensive collection of data from the Red Rock Solar Site in Arizona. Experimental evaluations demonstrate that our approach offers promising applications in predictive maintenance, fault detection, and energy optimization in solar power systems.
|
|
11:20-11:45, Paper MoAT11.3 | |
Extended Invalid Action Mask: Training-Time Size-Agnostic Safe Reinforcement Learning for Flexible Job Shop Scheduling Problems |
|
Cai, Weilin | The Chinese University of Hong Kong, Shenzhen |
Zheng, Wenjun | The Chinese University of Hong Kong, Shenzhen |
Wang, Zhaoli | The Chinese University of Hong Kong, Shenzhen |
Chen, Yilan | Columbia University in the City of New York |
Mao, Jianfeng | The Chinese University of Hong Kong, Shenzhen |
Keywords: Planning, Scheduling and Coordination, Reinforcement, Intelligent and Flexible Manufacturing
Abstract: The flexible job shop scheduling problem (FJSP) is an NP-hard problem in which machine assignments are additionally considered compared to the classical job shop scheduling problem. It has received increased attention in recent years, as its global hard constraints highlight the importance of policy safety, especially in real production contexts. Deep reinforcement learning (DRL) has become a popular method for solving FJSP; however, ensuring both size-agnostic and safety-guaranteed properties during training remains challenging. Penalty-based approaches allow infeasible actions, degrading policy quality, while standard invalid action masking prevents training across different problem sizes. To address this, we propose an extended action-masking mechanism that maintains semantic consistency across varying scales while strictly enforcing action feasibility. Our framework integrates state and action embeddings to support adaptive scheduling and multi-scale training. Empirical results demonstrate improved performance over the latest DRL-based methods, indicating that our approach is scalable and robust for production scheduling.
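The core of invalid action masking, which the extended mechanism above builds on, can be sketched in a few lines: infeasible actions get their logits pushed toward negative infinity before the softmax, so the policy can never sample them. This is a generic sketch, not the paper's extended size-agnostic mask.

import numpy as np

def masked_policy(logits, valid_mask):
    """Zero out the probability of invalid actions by masking logits (sketch only).

    logits     -- raw policy scores, shape (num_actions,)
    valid_mask -- boolean array, True where the action is feasible under FJSP constraints
    """
    masked = np.where(valid_mask, logits, -1e9)   # effectively -inf for invalid actions
    exp = np.exp(masked - masked.max())           # numerically stable softmax
    return exp / exp.sum()

probs = masked_policy(np.array([1.2, 0.3, -0.5, 2.0]),
                      np.array([True, False, True, False]))
print(probs)   # invalid actions receive (near-)zero probability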
|
|
MoBT1 |
Room T1 |
AI-Powered Collaborative Manufacturing |
Special Session |
Chair: Wang, Junkai | Tongji University |
Co-Chair: Li, Rui | Montclair State University |
Organizer: Wang, Junkai | Tongji University |
Organizer: Chang, Qing | University of Virginia |
Organizer: Matta, Andrea | Politecnico Di Milano |
Organizer: Wang, Xi Vincent | KTH Royal Institute of Technology |
Organizer: Li, Xiaoou | Center of Research and Advanced Studies of National Polytechnic I |
Organizer: Tang, Ying | Rowan University |
Organizer: Yan, Chao-Bo | Xi'an Jiaotong University |
Organizer: Zhu, Haibin | Nipissing University |
|
14:45-15:03, Paper MoBT1.1 | |
Emotion-Based Robotic Action Optimization System for Human-Robot Collaboration (I) |
|
Murphy, Jordan | Montclair State University |
Parron, Jesse | Montclair State University |
Wang, Weitian | Montclair State University |
Li, Rui | Montclair State University |
Keywords: Collaborative Robots in Manufacturing, Human-Centered Automation, Assembly
Abstract: Although collaborative robots aim to boost productivity in manufacturing, misalignment between the robot’s actions and the human’s intentions for the collaboration can cause discomfort or frustration, potentially discouraging future collaborations. Inspired by human-to-human interactions, this paper aims to help solve this problem by enabling a collaborative robot to adjust how it moves and acts based on human emotions, thereby improving the overall collaboration process. To achieve this goal, an emotion-based robotic action optimization system was developed and integrated into a collaborative robot. The system utilizes hierarchical reinforcement learning (HRL) to train and guide the robot to adjust its actions according to detected human emotions. Specifically, this paper introduces (1) an HRL model that leverages a vision-audio-based emotion recognition model to determine and adjust robot actions (movement speed, drop-off distance, reaction time, and rate of success) according to human emotions, with the goal of avoiding negative emotions of the human user triggered by the robot’s actions; (2) a robot motion control method driven by recognized human intentions and actions from the HRL model, guiding the robot arm and gripper to adjust movements and deliver parts as desired; and (3) objective and subjective experiments to evaluate the effectiveness of the developed system. The results and analysis of the experiments demonstrated the effectiveness of our developed system in a human-robot collaboration setting.
|
|
15:03-15:21, Paper MoBT1.2 | |
Robust Electricity Forecasting in Smart Buildings with Missing Data: A Concept Echo State Network Approach (I) |
|
Zhu, Yingqin | CINVESTAV |
Li, Xiaoou | Center of Research and Advanced Studies of National Polytechnic I |
Yu, Wen | CINVESTAV-IPN |
Keywords: Smart Home and City, AI-Based Methods, Machine learning
Abstract: Accurate electricity forecasting is essential for smart building energy optimization, yet dynamic usage and missing data pose significant challenges. This paper introduces a Concept Echo State Network (CESN) approach for robust forecasting. CESNs extract semantic concepts, constructing a dynamic matrix via a recursive process. A context-aware multi-objective optimization minimizes errors and maximizes robustness against missing data. A hierarchical fusion enhances adaptability. Evaluations on real-world datasets, simulating data gaps, demonstrate that CESNs outperform existing methods. This approach delivers superior accuracy and actionable insights, even with incomplete data. This research advances smart building energy management through interpretable, robust electricity prediction, directly addressing missing data challenges.
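For orientation, the reservoir update at the heart of any echo state network follows the standard leaky-integrator form below; the concept-level extensions of CESN are not reproduced here, and the matrix sizes, spectral radius, and leak rate are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(42)

class EchoStateSketch:
    """Minimal echo state network reservoir (generic ESN, not the paper's CESN)."""
    def __init__(self, n_in, n_res=200, spectral_radius=0.9, leak=0.3):
        self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        # Rescale so the reservoir approximately satisfies the echo state property
        self.W = W * (spectral_radius / np.max(np.abs(np.linalg.eigvals(W))))
        self.leak = leak
        self.x = np.zeros(n_res)

    def step(self, u):
        pre = np.tanh(self.W_in @ u + self.W @ self.x)
        self.x = (1 - self.leak) * self.x + self.leak * pre   # leaky state update
        return self.x   # readout weights would be trained on these states, e.g. via ridge regression

esn = EchoStateSketch(n_in=3)
state = esn.step(np.array([0.2, -0.1, 0.5]))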
|
|
15:21-15:39, Paper MoBT1.3 | |
Multi-Products Production Control and Human-Autonomous Truck Distribution Planning with the Use of Collaborative Multi-Agent Reinforcement Learning (I) |
|
Deng, Yang | City University of Hong Kong |
Keywords: Inventory Management, AI-Based Methods, Optimization and Optimal Control
Abstract: In the era of Industry 5.0, the integration of human-centric values with advanced automation is pivotal. This paper addresses a complex distribution planning problem that necessitates the collaboration of human distributors and autonomous trucks within a manufacturing setting. We formulate the problem as a Partially Observable Markov Decision Process (POMDP) and propose a hierarchical multi-agent reinforcement learning (MARL) framework that unifies production control with distributor planning. At the upper level, two different agents determine production quantities for both preordered and on-time products, while concurrently planning appropriate distribution channels based on fluctuating market conditions. At the lower level, a decentralized execution strategy enables human and robotic distributors to dynamically fulfill customer demands. Key to this approach is a communication-based centralized learning scheme that leverages differentiable inter-agent learning to coordinate decisions, ensuring that autonomous operations do not undermine the human workforce, which is a core tenet of Industry 5.0. Computational experiments demonstrate that the proposed MARL framework outperforms traditional heuristic algorithms and conventional RL methods, achieving superior performance through improved coordination and adaptability. The results highlight not only the efficiency gains from automation but also the essential balance maintained between technological advancement and human involvement.
|
|
15:39-15:57, Paper MoBT1.4 | |
Manufacturing Task Scheduling Optimization with Buffer Zones Using Parallel Advantage Actor-Critic Algorithm (I) |
|
Wenjing, Zeng | New Jersey City University |
Guo, Xiwang | Liaoning Petrochemical University |
Wang, Jiacun | Monmouth University |
Tang, Ying | Rowan University |
Wang, Weitian | Montclair State University |
Bin, Hu | Kean University |
Wang, Nan | William Paterson University |
Keywords: Intelligent and Flexible Manufacturing, Robust Manufacturing, Remanufacturing
Abstract: With the diversification of market demands and the limited availability of production resources, optimizing the allocation of manufacturing tasks within the constraints of limited workstation resources becomes increasingly important. This study explores the workstation buffer zone disassembly-assembly line balancing problem, aiming to improve the operational area of workstations and reduce component transportation costs. Based on the characteristics of the problem, a computational model is developed to maximize the recovery profit. To facilitate the search for an optimal solution, the parallel advantage actor-critic (PA2C) algorithm is used to address this problem, and the feasibility of the developed approach in disassembly and assembly lines is analyzed. Comparisons with AC and the original A2C suggest the competitive performance of the proposed solution.
|
|
15:57-16:15, Paper MoBT1.5 | |
Deterioration-Aware Collaborative Energy-Efficient Batch Scheduling and Maintenance for Unrelated Parallel Machines Based on Improved MOEA/D |
|
Wang, Haixuan | Tongji University |
Qiao, Fei | Tongji University |
Jiang, Shengxi | Tongji University |
Zhu, Haibin | Nipissing University |
Wang, Junkai | Tongji University |
Keywords: Manufacturing, Maintenance and Supply Chains, Planning, Scheduling and Coordination, Sustainable Production and Service Automation
Abstract: The deterioration phenomenon is common and persistent as machines' service time increases within energy-intensive manufacturing processes such as heat treatment, and it may lead to extended processing times or even machine breakdowns. It is crucial to collaboratively optimize batch scheduling and maintenance to ensure stable, efficient production and achieve energy efficiency. This study takes into account preventive maintenance, where a maintenance activity is carried out after a certain number of batches are processed. A novel multi-objective mixed-integer programming model for unrelated parallel batching machines is proposed to minimize the makespan, total completion time and total energy consumption. The entire problem is broken down into four sub-problems: job division, job dispatching, batch formation and batch sequencing. Given the NP-hard nature of the problem, three heuristic algorithms based on several structural properties are designed according to the features of the latter three parts. Meanwhile, an integrated methodology, a Multi-Objective Evolutionary Algorithm based on Decomposition combined with Variable Neighborhood Search (MOEA/D-VNS), is put forward to handle job division and the multi-dimensional collaborative optimization problem. The performance of the proposed algorithms is compared with that of other typical dominance-based evolutionary algorithms. Extensive numerical experiments are conducted to validate the effectiveness of the proposed model and algorithms.
|
|
MoBT2 |
Room T2 |
TASE Paper Session 2 |
Special Session |
Chair: Zhang, Chen | Tsinghua University |
|
14:45-15:03, Paper MoBT2.1 | |
Nonlinear Causal Discovery Via Dynamic Latent Variables |
|
Yang, Xing | Shenzhen University |
Lan, Tian | Tsinghua University |
Qiu, Hao | Sichuan Baicha Baidao Industrial Co., Ltd |
Zhang, Chen | Tsinghua University |
Keywords: Causal Models, Probability and Statistical Methods, Machine learning
Abstract: Distinguishing causality from mere correlation is a cornerstone in empirical research, as conflating the two can result in significant errors in decision-making, affecting policy formulation and the validity of scientific inferences. Traditional experimental designs, such as randomized trials, often fall short in complex systems where variables interact in a high-dimensional space with limited data. This paper aims to address these challenges by introducing an innovative causal discovery approach, extending beyond conventional methodologies by incorporating algorithmic advances in computational efficiency and design. We present a novel double Gaussian process state space causal model (GPSSCM) that contends with the multifaceted nature of causal inference, accounting for noisy observations and latent variables, which are commonly encountered in dynamic systems. Our methodological contribution includes the application of a Markov chain Monte Carlo technique for unraveling latent state dynamics and an expectation-maximization (EM) algorithm for robust parameter estimation. The acyclic nature of the causal graph is ensured through an integrated acyclic constraint within the EM framework, maintaining the integrity of the causal model. The efficacy of our proposed GPSSCM is evaluated through a series of tests on both synthetic data and empirical case studies from the industrial domain. The results highlight the model's capacity to accurately infer complex nonlinear causal relationships, demonstrating its superiority over traditional structural equation modeling, especially when dealing with time series data and latent variables. This paper not only contributes a sophisticated tool for researchers and practitioners but also enriches the literature on causal discovery by offeri
|
|
15:03-15:21, Paper MoBT2.2 | |
LeTO: Learning Constrained Visuomotor Policy with Differentiable Trajectory Optimization |
|
Xu, Zhengtong | Purdue University |
She, Yu | Purdue University |
Keywords: Deep Learning in Robotics and Automation, Machine learning, Optimization and Optimal Control
Abstract: This paper introduces LeTO, a method for learning constrained visuomotor policy with differentiable trajectory optimization. Our approach integrates a differentiable optimization layer into the neural network. By formulating the optimization layer as a trajectory optimization problem, we enable the model to end-to-end generate actions in a safe and constraint-controlled fashion without extra modules. Our method allows for the introduction of constraint information during the training process, thereby balancing the training objectives of satisfying constraints, smoothing the trajectories, and minimizing errors with demonstrations. This "gray box" method marries optimization-based safety and interpretability with powerful representational abilities of neural networks. We quantitatively evaluate LeTO in simulation and in the real robot. The results demonstrate that LeTO performs well in both simulated and real-world tasks. In addition, it is capable of generating trajectories that are less uncertain, higher quality, and smoother compared to existing imitation learning methods. Therefore, it is shown that LeTO provides a practical example of how to achieve the integration of neural networks with trajectory optimization. We release our code at https://github.com/ZhengtongXu/LeTO.
|
|
15:21-15:39, Paper MoBT2.3 | |
DartBot: Overhand Throwing of Deformable Objects with Tactile Sensing and Reinforcement Learning |
|
Aslam, Shoaib | The Hong Kong University of Science and Technology (HKUST), Clea |
Kumar, Krish | Purdue University |
Zhou, Pokuang | Purdue University |
Yu, Hongyu | The Hong Kong University of Science and Technology |
Wang, Michael Yu | Hong Kong University of Science and Technology |
She, Yu | Purdue University |
Keywords: Machine learning, Force and Tactile Sensing, Deep Learning in Robotics and Automation
Abstract: Object transfer through throwing is a classic dynamic manipulation task that necessitates precise control and perception capabilities. However, developing dynamic models for unstructured environments using analytical methods presents challenges. In this study, we present DartBot, a robot that integrates tactile exploration and reinforcement learning to achieve robust throwing skills for relatively small, non-rigid objects whose moment of inertia causes them to spin in the air. Unlike traditional sim-to-real transfer methods, our approach involves direct training of the agent on real robot hardware equipped with a high-resolution tactile sensor, enabling reinforcement learning in a realistic and dynamic environment. By leveraging tactile perception, we incorporate pseudo-embeddings of the physical properties of objects into the learning process through tilting actions at two distinct angles. This tactile information enables the agent to infer and adapt its throwing strategy, resulting in improved accuracy when handling various objects and targeting distant locations. Furthermore, we demonstrate that the quality of a grasp significantly impacts the success rate of the throwing task. We evaluate the effectiveness of our method through extensive experiments, demonstrating superior performance and generalization capabilities in real-world throwing scenarios. We achieved a success rate of 95% for unseen objects with a mean error of 3.15 cm from the goal. A high-resolution video demo of our work is available at https://youtu.be/KNFgDeLt-0g.
|
|
15:39-15:57, Paper MoBT2.4 | |
High-Quality Dataset-Sharing and Trade Based on a Performance-Oriented Directed Graph Neural Network |
|
Zeng, Yingyan | University of Cincinnati |
Zhou, Xiaona | University of Illinois Urbana-Champaign |
Chilukuri, Premith Kumar | Virginia Tech |
Lourentzou, Ismini | University of Illinois Urbana-Champaign |
Jin, Ran | Virginia Tech |
Keywords: AI-Based Methods, Big-Data and Data Mining, Data fusion
Abstract: The advancement of Artificial Intelligence (AI) models heavily relies on large high-quality datasets. However, in advanced manufacturing, collecting such data is time-consuming and labor-intensive for a single enterprise. Hence, it is important to establish a context-aware and privacy-preserving data sharing system to share small-but-high-quality datasets between trusted stakeholders. Existing data sharing approaches have explored privacy-preserving data distillation methods and focused on valuating individual samples tied to a specific AI model, limiting their flexibility across data modalities, AI tasks, and dataset ownership. In this work, we propose a performance-oriented representation learning (PORL) framework in a Directed Graph Neural Network (DiGNN). PORL distills raw datasets into privacy-preserving proxy datasets for sharing and learns compact meta data representations for each stakeholder locally. The meta data are then used in DiGNN to forecast AI model performance and guide the sharing via graph-level supervised learning. The effectiveness of PORL-DiGNN is validated by two case studies: data sharing in a semiconductor manufacturing network between similar processes to create similar quality defect models; and data sharing in the design and manufacturing network of Microbial Fuel Cell anodes between upstream (design) and downstream (Additive Manufacturing) stages to create distinct but related AI models.
|
|
15:57-16:15, Paper MoBT2.5 | |
Orchestrated Robust Controller for Precision Control of Heavy-Duty Hydraulic Manipulators |
|
Hejrati, Mahdi | Tampere University |
Mattila, Jouni | Tampere University |
Keywords: Robust/Adaptive Control, Neural and Fuzzy Control, Motion Control
Abstract: Vast industrial investment along with increased academic research on heavy-duty hydraulic manipulators has unavoidably paved the way for their automation, necessitating the design of robust and high-precision controllers. In this study, an orchestrated robust controller is designed to address this issue for generic manipulators with an anthropomorphic arm and spherical wrist. Thanks to virtual decomposition control (VDC), the entire robotic system is decomposed into subsystems, and a robust controller is designed for each local subsystem by considering unknown model uncertainties, unknown disturbances, and compound input nonlinearities. Radial basis function neural networks (RBFNNs) are incorporated into VDC to tackle unknown disturbances and uncertainties, resulting in novel decentralized RBFNNs. All robust local controllers designed at each local subsystem are then orchestrated to accomplish high-precision control. In the end, for the first time in the context of VDC, semi-global uniform ultimate boundedness is achieved under the designed controller. The validity of the theoretical results is verified by performing extensive simulations and experiments on a 6-degrees-of-freedom industrial manipulator with a nominal lifting capacity of 600 kg at 5 meters reach. Comparison of the simulation results with state-of-the-art controllers, along with the provided experimental results, demonstrates that the proposed method fulfills all promises and performs excellently.
|
|
MoBT3 |
Room T3 |
Automation for Enhanced Healthcare 1 |
Special Session |
Chair: Liu, Feng | Stevens Institute of Technology |
Organizer: Wen, Yuxin | Chapman University |
Organizer: Huang, Jiajing | Kennesaw State University |
Organizer: Liu, Feng | Stevens Institute of Technology |
Organizer: Chen, Jia | University of California Riverside |
Organizer: Wang, Chao | University of Maryland |
Organizer: Shen, Xin | University of California, Riverside |
|
14:45-15:03, Paper MoBT3.1 | |
Integrating Intracranial EEG and Scalp EEG for Whole Brain Network Inference (I) |
|
Yang, Shihao | Stevens Institute of Technology |
Liu, Feng | Stevens Institute of Technology |
Keywords: AI and Machine Learning in Healthcare, Modelling, Simulation and Optimization in Healthcare, Machine learning
Abstract: Over the past few decades, brain imaging research has shifted from mapping task-evoked brain regions of activation to identifying and characterizing dynamic brain networks involving multiple coordinated regions. Electrophysiological signals directly reflect brain activity, making the characterization of whole-brain electrophysiological networks (WBEN) a crucial tool for both neuroscience research and clinical applications. In this work, we introduce a novel framework for integrating scalp EEG and intracranial EEG (iEEG) to estimate WBEN, based on a principled state-space model estimation approach. An Expectation-Maximization (EM) algorithm is designed to simultaneously infer state variables and brain connectivity. We validated the proposed method using synthetic data, demonstrating improved performance over traditional two-step methods that rely solely on scalp EEG. This highlights the importance of incorporating iEEG signals for accurate WBEN estimation. For real data involving simultaneous EEG and iEEG recordings, we applied the developed framework to investigate the information flow during the encoding and maintenance phases of a working memory task. Our findings reveal distinct information flows between subcortical and cortical regions, with more significant flows from cortical to subcortical regions during the maintenance phase. These results align with previous studies, but provide a comprehensive view of the whole brain, underscoring the unique utility of the proposed framework.
|
|
15:03-15:21, Paper MoBT3.2 | |
An Optimization Model to Study the Impact of Digital Health on Regional Healthcare Accessibility (I) |
|
Weng, Leqi | Tsinghua University |
Wang, Qing | Tsinghua University |
Li, Jingshan | Tsinghua University |
Keywords: Health Care Management, Modelling, Simulation and Optimization in Healthcare, Scheduling in Healthcare
Abstract: This work proposes a quantitative model to study the impact of digital health on regional healthcare accessibility. An improved two-step floating catchment area (2SFCA) method is used to describe the accessibility of offline and online medical services. Using this model, the impact of the allocation of internet medical resources on regional healthcare accessibility is investigated, and the optimal allocation of online medical resources can be achieved.
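For orientation, the classic 2SFCA computation underlying such accessibility models can be sketched in a few lines of Python; the supply, demand, and catchment values below are illustrative assumptions, and the paper's improved variant covering both offline and online services is not reproduced here.

```python
import numpy as np

def two_step_fca(supply, demand, dist, d0):
    """Classic two-step floating catchment area (2SFCA) accessibility.

    supply: (J,) capacity of each service site (e.g., physicians, online slots)
    demand: (I,) population at each demand location
    dist:   (I, J) travel "distance" (time or cost) from demand i to site j
    d0:     catchment threshold
    Returns an (I,) array of accessibility scores.
    """
    within = dist <= d0                                   # boolean catchment mask
    # Step 1: supply-to-demand ratio of every site within its catchment
    pop_served = within.T @ demand                        # (J,)
    ratio = np.divide(supply, pop_served,
                      out=np.zeros_like(supply, dtype=float),
                      where=pop_served > 0)
    # Step 2: sum the ratios of all sites reachable from each demand location
    return within @ ratio                                 # (I,)

# toy example: 3 neighborhoods, 2 facilities
acc = two_step_fca(supply=np.array([10.0, 5.0]),
                   demand=np.array([1000.0, 500.0, 2000.0]),
                   dist=np.array([[10, 40], [25, 15], [60, 20]]),
                   d0=30)
print(acc)
```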
|
|
15:21-15:39, Paper MoBT3.3 | |
An Intelligent Wireless Capsule System for Early Detection and Precision Treatment of Gastrointestinal Disorders (I) |
|
Zheng, Jie-Ming | National Sun Yat-Sen University |
Tsai, Wen Chin | National Sun Yat-Sen University |
Lin, Jyun Ying | National Sun Yat-Sen University |
Liu, Hsiao-Chuan | University of Southern California |
Wu, Jian-Xing | National Sun Yat-Sen University |
Keywords: AI and Machine Learning in Healthcare, Physically Assistive Devices, Medical Robots and Systems
Abstract: At present, no wireless optical therapeutic capsule system is commercially available. This study presents the development of an integrated capsule system combining (1) wireless chip-based data transmission, (2) image compression techniques for efficient wireless communication, (3) high-resolution optical imaging, (4) a color feature extraction algorithm for bleeding detection, and (5) a controllable microneedle injection mechanism for targeted drug delivery. The system's performance is evaluated based on three key criteria: (1) real-time visualization of gastrointestinal bleeding, (2) accurate drug injection to target sites, and (3) successful capsule excretion following operation. The wireless communication module is based on the nRF52840 chipset, ensuring stable and reliable data transmission. Optical imaging employs a CMOS sensor with a 140-degree field of view, integrated with four white-light LEDs to provide uniform illumination. Image processing is performed using an ESP32 controller, which enhances visual data and applies advanced compression algorithms, achieving a data size reduction of up to 1/12. For hemorrhage detection, a combination of ResNet-based deep learning and a color feature extraction algorithm is implemented, achieving an accuracy of 95.75%. Power is supplied by four button batteries, providing an operational duration of up to 10.3 hours. This innovative capsule system demonstrates significant potential for wireless, minimally invasive gastrointestinal diagnostics and therapy, enabling real-time monitoring, intelligent bleeding detection, and precise on-demand drug delivery within the gastrointestinal tract. The proposed platform advances current capsule endoscopy technologies and paves the way for future smart therapeutic capsules.
|
|
15:39-15:57, Paper MoBT3.4 | |
An Autonomous Robotic System for Aorta Ultrasound Screening with Deep Learning Segmentation (I) |
|
Farsoni, Saverio | University of Ferrara |
Bertagnon, Alessandro | University of Ferrara |
D'Antona, Andrea | University of Ferrara |
Rizzi, Jacopo | University of Ferrara |
Roma, Marco | University of Ferrara |
Bonfe, Marcello | University of Ferrara |
Proto, Antonino | University of Ferrara |
Baldazzi, Giulia | University of Ferrara |
Pagani, Anselmo | University of Ferrara |
Zamboni, Paolo | University of Ferrara |
Keywords: Human-Centered Automation, Robotics and Automation in Life Sciences, AI and Machine Learning in Healthcare
Abstract: Abdominal aortic aneurysm is an enlargement of the abdominal aorta to a local diameter greater than 3 cm. Although most aneurysms are asymptomatic, the pathology becomes critical in the case of complications such as embolization, occlusion, and rupture. The diagnosis is commonly achieved by means of an ultrasound examination carried out by expert sonographers. We designed a robotic system that can autonomously perform the ultrasound screening of the abdominal aorta, measuring its diameter and therefore providing the early diagnosis of the aneurysm. We use an impedance-controlled collaborative robot to move the ultrasound probe on the patient's abdomen while a deep learning neural network segments the aorta in the ultrasound image and estimates the diameter. Our motion planning algorithm makes use of an artificial potential field that guides the robot to move the probe toward poses that yield a good view of the aorta. Finally, we conducted several experiments to validate the feasibility of the proposed approach.
|
|
15:57-16:15, Paper MoBT3.5 | |
Addressing the Allocation of Medical Examination Resources Using Simulation (I) |
|
Zhang, Mirui | Tsinghua University |
Zhao, Yue | Beijing Tsinghua Changgung Hospital |
Wang, Feifan | Tsinghua University |
Keywords: Health Care Management, Modelling, Simulation and Optimization in Healthcare, Scheduling in Healthcare
Abstract: The allocation of medical examination resources plays a critical role in hospital operations. Inefficient allocation can lead to delays in treatments and extend patient stays, affecting bed availability. Examination resources are shared between inpatients and outpatients, and a balance must be struck between the two. Prioritizing either inpatients or outpatients may disproportionately affect the other, leading to delays and reduced patient satisfaction. This study is motivated by a medical examination resource allocation problem in a general hospital. Currently, hospital managers rely on experience-based methods for resource allocation. However, long patient waiting times suggest that the intuitively determined allocation is not optimal. Determining optimal allocation strategies in real-world hospital operations is difficult due to uncertain demand and unpredictable long-term factors. This study develops a simulation program to help determine an appropriate proportion of an examination team allocated to inpatients and outpatients. It integrates medical examinations, hospitalization, and surgery and has a user-friendly interface. Experiments based on both a virtual scenario and a real-world case are conducted. It is shown that the simulation program can help hospital managers systematically evaluate different allocation strategies before implementation.
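As a rough illustration of the kind of question such a simulation answers, the toy Monte Carlo sketch below splits one examination team's daily slots between inpatients and outpatients and reports the resulting unmet demand; the slot counts and demand rates are invented for illustration and are unrelated to the hospital case in the paper.

```python
import random

def simulate_day(n_slots, inpatient_share, lam_in=30, lam_out=70, seed=0):
    """Toy single-day simulation of one examination team: a fraction of the
    daily slots is reserved for inpatients and the rest for outpatients;
    daily demand is approximately Poisson and unmet demand is counted as
    next-day waiting.  All numbers are illustrative, not from the paper."""
    rng = random.Random(seed)
    slots_in = int(n_slots * inpatient_share)
    slots_out = n_slots - slots_in
    demand_in = sum(rng.random() < lam_in / 1000 for _ in range(1000))   # ~Poisson(lam_in)
    demand_out = sum(rng.random() < lam_out / 1000 for _ in range(1000))
    wait_in = max(0, demand_in - slots_in)
    wait_out = max(0, demand_out - slots_out)
    return wait_in, wait_out

# compare a few allocation proportions before implementing one
for share in (0.2, 0.3, 0.4, 0.5):
    print(share, simulate_day(n_slots=90, inpatient_share=share))
```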
|
|
MoBT4 |
Room T4 |
Trajectory, Object, and Position 2 |
Regular Session |
Chair: Yamaguchi, Tomoya | Toyota Motor Corporation |
|
14:45-15:03, Paper MoBT4.1 | |
Robot Trajectory Optimization for Safe Transport of Deformable Packages |
|
Shukla, Rishabh | University of Southern California |
Moode, Samrudh | University of Southern California |
Wang, Fan | Amazon Robotics |
Mayya, Siddharth | Amazon Robotics |
Gupta, Satyandra K. | University of Southern California |
Keywords: Motion and Path Planning, Industrial and Service Robotics, Foundations of Automation
Abstract: Efficient and safe transport of deformable packages using suction cups is crucial in warehouse automation. Unlike rigid packages, deformable packages exhibit complex oscillatory behaviors and can detach under aggressive motions. Traditional motion planners typically overlook these oscillations, often resulting in either unsafe trajectories or overly conservative, slow motions. This paper addresses that gap by formulating package oscillation dynamics as constraints and incorporating them into a Cartesian trajectory optimization framework. These constraints are formulated to be state-dependent - i.e., they adapt according to the instantaneous conditions along the planned trajectory (such as acceleration and gripper orientation) - to ensure that oscillations remain within safe limits. We derive a pendulum-like model to characterize package swing, enforcing constraints on peak oscillation angles. Our approach then optimizes end-effector trajectories under these state-dependent constraints, ensuring safe transport when the end-effector follows constant-acceleration profiles. Real-world experiments demonstrate that our optimized trajectories reduce transport time by up to 18% compared to baseline motions while strictly adhering to safety limits on package swing.
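The swing model itself is not given in the abstract, but for a simple undamped pendulum whose suspension point accelerates at a constant rate, the peak swing admits a small-angle closed form that can serve as such a constraint. The sketch below is a hedged approximation of that idea, not the authors' exact model.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def peak_swing_angle(accel):
    """Predicted peak swing (rad) of an undamped pendulum-like package whose
    suspension point undergoes a constant horizontal acceleration `accel`,
    starting at rest.  In the small-angle regime the package oscillates about
    the shifted equilibrium arctan(a/g) with that same amplitude, so the peak
    is roughly twice the equilibrium angle (cable length only sets the period)."""
    return 2.0 * np.arctan2(accel, G)

def max_safe_acceleration(theta_max):
    """Largest constant acceleration whose predicted peak swing stays below
    theta_max (rad): invert peak = 2*arctan(a/g)."""
    return G * np.tan(theta_max / 2.0)

print(np.degrees(peak_swing_angle(2.0)))       # ~23 deg peak for a = 2 m/s^2
print(max_safe_acceleration(np.radians(20)))   # ~1.7 m/s^2 keeps swing under 20 deg
```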
|
|
15:03-15:21, Paper MoBT4.2 | |
A Mixed-Integer Conic Program for the Multi-Agent Moving-Target Traveling Salesman Problem |
|
George Philip, Allen | Texas A&M University |
Ren, Zhongqiang | Shanghai Jiao Tong University |
Rathinam, Sivakumar | TAMU |
Choset, Howie | Carnegie Mellon University |
Keywords: Motion and Path Planning, Planning, Scheduling and Coordination, Autonomous Agents
Abstract: The Moving-Target Traveling Salesman Problem (MT-TSP) seeks a shortest path for an agent that starts at a stationary depot, visits a set of moving targets exactly once, each within one of their respective time windows, and returns to the depot. In this paper, we introduce a new Mixed-Integer Conic Program (MICP) formulation for the Multi-Agent Moving-Target Traveling Salesman Problem (MA-MT-TSP), a generalization of the MT-TSP involving multiple agents. Our approach begins by restating the current state-of-the-art MICP formulation for MA-MT-TSP as a Nonconvex Mixed-Integer Nonlinear Program (MINLP), followed by a novel reformulation into a new MICP. We present computational results demonstrating that our formulation outperforms the state-of-the-art, achieving up to two orders of magnitude reduction in runtime, and over 90% improvement in optimality gap.
|
|
15:21-15:39, Paper MoBT4.3 | |
Heterogeneous Performance of Swarm Collision Avoidance Strategies |
|
Lewis, Ryan | University of Houston |
Becker, Aaron | University of Houston |
Bernardini, Francesco | University of Houston |
Julien, Leclerc | University of Houston |
Keywords: Collision Avoidance, Motion Control, Agent-Based Systems
Abstract: Four prominent collision avoidance methods are Artificial Potential Fields, Artificial Potential Fields expressed as Control Barrier Functions, Control Barrier Functions, and Reciprocal Velocity Obstacles. Prior work often assumes all agents use the same obstacle avoidance method. The methods differ in computational scalability, in how they react to different types of obstacles, and in how they react to agents with heterogeneous collision avoidance methods. This paper explores the scenario robustness and scalability of these methods through three key navigation scenarios: different structures of stationary obstacles, circle-crossing collision avoidance benchmarks, and defense against an antagonistic swarm.
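For reference, the first of the four compared methods, a plain artificial potential field, reduces to a few lines; the gains, influence radius, and toy scene below are arbitrary illustrative choices, not the paper's parameterization.

```python
import numpy as np

def apf_velocity(pos, goal, obstacles, k_att=1.0, k_rep=0.5, d0=1.0, v_max=1.0):
    """Velocity command from a classic artificial potential field:
    attractive pull toward the goal plus a repulsive push from every
    obstacle closer than the influence radius d0."""
    v = k_att * (goal - pos)                       # attractive term
    for obs in obstacles:
        diff = pos - obs
        d = np.linalg.norm(diff)
        if 1e-6 < d < d0:
            # standard repulsive gradient, grows as the obstacle gets closer
            v += k_rep * (1.0 / d - 1.0 / d0) / d**2 * (diff / d)
    speed = np.linalg.norm(v)
    return v if speed <= v_max else v / speed * v_max   # saturate

pos = np.array([0.0, 0.0])
goal = np.array([5.0, 0.0])
obstacles = [np.array([0.6, 0.1])]
print(apf_velocity(pos, goal, obstacles))
```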
|
|
15:39-15:57, Paper MoBT4.4 | |
Scalable Multi-Agent Path Finding for Delivery Robot Systems with Temporal and Edge Capacity Constraints on Weighted Graphs |
|
Yamaguchi, Tomoya | Toyota Motor Corporation |
Nishitani, Ippei | Toyota Motor Corporation |
Ota, Yusuke | Toyota Motor Corporation |
Hoxha, Bardh | Toyota Research Institute of North America |
Fainekos, Georgios | Toyota NA-R&D |
Keywords: Planning, Scheduling and Coordination, Formal Methods in Robotics and Automation, Task Planning
Abstract: The Multi-Agent Path Finding (MAPF) problem on a graph is a fundamental research topic in robotics. This paper formulates a mathematical problem that not only captures path finding in traditional MAPF but also incorporates constraints related to weighted graphs, deadlines, and edge capacity considerations. The feasibility of this formulation is demonstrated and compared with Mixed Integer Linear Programming, Satisfiability Modulo Theories, and OR-Tools, focusing on scalability within a realistic delivery robot system.
|
|
15:57-16:15, Paper MoBT4.5 | |
Sampling-Based Near Time-Optimal Trajectory Generation for Pneumatic Drives |
|
Hoffmann, Kathrin | University of Stuttgart |
Baumgart, Michaela | University of Stuttgart |
Kanagalingam, Gajanan | University of Stuttgart |
Verl, Alexander | University of Stuttgart |
Sawodny, Oliver | University of Stuttgart |
Keywords: Motion and Path Planning, Hydraulic/Pneumatic Actuators
Abstract: When servo-pneumatic drives are applied in automation, their motion trajectories should be fast to maximize productivity. Nonlinear, state-dependent jerk constraints arise because the pressure dynamics are not negligibly fast, the air mass flow through the valves is subject to pressure-dependent constraints, and the mechanics and pneumatics are coupled. The goal of this work is to generate near time-optimal trajectories for pneumatic drives, taking the aforementioned effects into account in a model-based way. To this end, first, the system dynamics and constraints are formulated using differential flatness such that they can be incorporated into trajectory generation frameworks. Then, the class of sampling-based near time-optimal path parametrization approaches, which build a tree of samples in the path parameter space, is chosen and extended to the present type of constraints. Results for various scenarios are discussed, compared to our previous work where nonlinear programming was applied, and validated in real-world experiments. The experimental outcomes demonstrate the applicability of the sampling-based algorithm to the present system.
|
|
MoBT5 |
Room T5 |
Human-Robot and HCA 2 |
Regular Session |
Chair: Hu, Lianming | Massachusetts Institute of Technology |
|
14:45-15:03, Paper MoBT5.1 | |
Eye Movement Feature-Guided Signal De-Drifting in Electrooculography Systems |
|
Hu, Lianming | Massachusetts Institute of Technology |
Zhang, Xiaotong | Massachusetts Institute of Technology |
Youcef-Toumi, Kamal | Massachusetts Institute of Technology |
Keywords: Sensor Fusion, Human Performance Augmentation, Human-Centered Automation
Abstract: Electrooculography (EOG) is widely used for gaze tracking in Human-Robot Collaboration (HRC). However, baseline drift caused by low-frequency noise significantly impacts the accuracy of EOG signals, creating challenges for further sensor fusion. This paper presents an Eye Movement Feature-Guided De-drift (FGD) method for mitigating drift artifacts in EOG signals. The proposed approach leverages active eye-movement feature recognition to reconstruct the feature-extracted EOG baseline and adaptively correct signal drift while preserving the morphological integrity of the EOG waveform. The FGD is evaluated using both simulation data and real-world data, achieving a significant reduction in mean error. The average error is reduced to 0.896° in simulation, representing a 36.29% decrease, and to 1.033° in real-world data, corresponding to a 26.53% reduction. Despite additional and unpredictable noise in real-world data, the proposed method consistently outperforms conventional de-drifting techniques, demonstrating its effectiveness in practical applications such as enhancing human performance augmentation.
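The paper's FGD algorithm is not spelled out in the abstract; the sketch below only illustrates the general idea of estimating baseline drift from samples that contain no rapid eye-movement features and subtracting it. The threshold and synthetic signal are chosen arbitrarily, and level shifts between fixations still bias this crude polynomial baseline, which is exactly the limitation that feature-guided reconstruction aims to avoid.

```python
import numpy as np

def dedrift_eog(signal, fs, saccade_thresh=50.0, poly_order=3):
    """Remove slow baseline drift from an EOG trace while keeping saccade
    morphology.  Samples whose derivative exceeds `saccade_thresh` are treated
    as eye-movement features and excluded from the baseline fit; a low-order
    polynomial fitted to the remaining samples is taken as the drift estimate
    and subtracted.  This is a crude stand-in for feature-guided de-drifting."""
    t = np.arange(len(signal)) / fs
    velocity = np.gradient(signal, t)
    quiet = np.abs(velocity) < saccade_thresh          # fixation-like samples
    coeffs = np.polyfit(t[quiet], signal[quiet], poly_order)
    baseline = np.polyval(coeffs, t)
    return signal - baseline, baseline

# synthetic example: two saccade steps riding on a slow drift plus sensor noise
fs = 250.0
t = np.arange(0, 10, 1 / fs)
drift = 5.0 * t + 2.0 * np.sin(0.2 * np.pi * t)
steps = 30.0 * (t > 3) - 15.0 * (t > 7)
eog = drift + steps + np.random.normal(0, 0.05, t.size)
clean, baseline_est = dedrift_eog(eog, fs)
print(clean[:5])
```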
|
|
15:03-15:21, Paper MoBT5.2 | |
Enhancing Autonomous Manipulator Control with Human-In-Loop for Uncertain Assembly Environments |
|
Mishra, Ashutosh | Tohoku University |
Santra, Shreya | Tohoku University |
Gozbasi, Hazal | Tohoku University |
Uno, Kentaro | Tohoku University |
Yoshida, Kazuya | Tohoku University |
Keywords: Assembly, Human Factors and Human-in-the-Loop, Manipulation Planning
Abstract: This study presents an advanced approach to enhance robotic manipulation in uncertain and challenging environments, with a focus on autonomous operations augmented by human-in-the-loop (HITL) control for lunar missions. Emphasizing the critical role of HITL control, the research integrates human decision-making capabilities with autonomous robotic functions to improve task reliability and efficiency for space applications. The key task addressed is the autonomous deployment of flexible solar panels using an extendable ladder-like structure and a robotic manipulator with real-time feedback for precision. The manipulator continuously relays position and force-torque data, enabling dynamic error detection, correction, and adaptive control during deployment. To mitigate the effects of sinkage, variable payload, and low-lighting conditions, efficient motion planning strategies are employed, supplemented by human control that allows operators to intervene in ambiguous scenarios. Digital twin simulation enhances system robustness by enabling continuous feedback, iterative task refinement, and seamless integration with the deployment pipeline. The system has been tested to validate its performance in simulated lunar conditions and ensure reliability in extreme lighting, variable terrain, changing payloads, and sensor limitations.
|
|
15:21-15:39, Paper MoBT5.3 | |
Force Plates for Analyzing, Recording and Teaching Forces of Dis-/Assembly Processes for Robot Programming-By-Demonstration |
|
Bargmann, Daniel | Fraunhofer IPA |
Kraus, Werner | Fraunhofer IPA |
Huber, Marco F. | University of Stuttgart |
Keywords: Human-Centered Automation, Sensor Fusion, Assembly
Abstract: Assembly tasks remain a challenge for industrial robots, as they involve physical contact where small path deviations can cause irreversible damage. While force control can mitigate such issues, tuning control parameters requires expert knowledge. Learning-from-Demonstration (LfD) or Imitation Learning (IL) offers a more intuitive alternative, but most force-based approaches rely on hand-guiding, requiring expensive cobots or raising safety concerns. Observation-based methods, in contrast, often depend on cameras and are limited to position control, suffering from occlusions and inaccuracies. We propose a novel tool that estimates position from force signals, without cameras or direct robot interaction, to enable force-based programming of assembly tasks. Using a network of four force-torque sensors, our system detects contact position, direction, and magnitude, achieving sub-millimeter accuracy (< 1 mm) in relevant areas. Demonstrations include the force-based assembly of a snap-fit, occluded plug insertion, and terminal clamp assembly. A user study with 20 participants shows that our approach reduces teaching time for terminal clamps by 61% compared to hand-guided methods, and lowers both mental load and perceived frustration by over 40% each.
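A force-plate tool of this kind ultimately has to recover a contact point from measured wrenches. The snippet below shows the standard single-contact computation for a wrench expressed at the plate origin, assuming the contact lies on a known plane and the force has a component normal to it; it is a generic sketch, not the authors' four-sensor fusion.

```python
import numpy as np

def contact_from_wrench(force, moment, plate_z=0.0):
    """Estimate a single contact point from the net force/torque measured at
    the plate origin.  For a point contact, moment = r x force; the
    minimum-norm point on the line of action is (force x moment)/|force|^2,
    which is then slid along the force direction onto the plate surface
    z = plate_z.  Assumes force[2] != 0 (pressing into the plate)."""
    f2 = np.dot(force, force)
    if f2 < 1e-9:
        raise ValueError("no significant contact force")
    r0 = np.cross(force, moment) / f2          # closest point to the origin
    t = (plate_z - r0[2]) / force[2]           # move along the force line
    contact = r0 + t * force
    return contact, np.linalg.norm(force)

# toy check: 10 N pressing straight down at (0.05, 0.02, 0) m
p_true = np.array([0.05, 0.02, 0.0])
f = np.array([0.0, 0.0, -10.0])
m = np.cross(p_true, f)
print(contact_from_wrench(f, m))               # recovers (0.05, 0.02, 0.0)
```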
|
|
15:39-15:57, Paper MoBT5.4 | |
Enhanced Human-Robot Collaboration Using Constrained Probabilistic Human-Motion Prediction |
|
Kothari, Aadi | Massachusetts Institute of Technology |
Tohme, Tony | Massachusetts Institute of Technology |
Zhang, Xiaotong | Massachusetts Institute of Technology |
Youcef-Toumi, Kamal | Massachusetts Institute of Technology |
Keywords: Human-Centered Automation, Human Factors and Human-in-the-Loop
Abstract: Human motion prediction is an essential step for efficient and safe human-robot collaboration. Current methods either purely rely on representing the human joints in some form of neural network-based architecture or use regression models offline to fit hyper-parameters in the hope of capturing a model encompassing human motion. While these methods provide good initial results, they fail to leverage well-studied human body kinematic models as well as body and scene constraints, which can help boost the efficacy of these prediction frameworks. These methods also lack mechanisms to explicitly avoid implausible human joint configurations. We propose a novel human motion prediction framework that incorporates human joint constraints and scene constraints in a Gaussian Process Regression (GPR) model, while considering associated measurement uncertainty, to predict human motion. This formulation is combined with an online context-aware constraint model to leverage task-dependent motions. Our emphasis on explicit constraint modeling differentiates this work from prior studies. The proposed approach is validated on a human arm kinematic model and implemented in a human-robot collaborative setup with a UR5 robot arm, demonstrating its real-time feasibility. Simulations show that our framework dramatically improves overall mean per joint position error by as much as 66% on the HA4M dataset and 51% on the Andy dataset, while negative log-likelihood on the predicted probability distribution function is also improved by 32% on the HA4M dataset and 15% on the Andy dataset when compared to baseline methods.
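As a minimal sketch of the GPR backbone (not the paper's full constrained formulation), the snippet below fits a GP to a noisy joint-angle history and clamps the predicted mean to an assumed joint-limit range; the kernel, limits, and data are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# minimal sketch: predict one joint angle forward in time with a GP, then
# clamp the prediction to anatomical joint limits (the "constraint" step);
# the joint limits and noise level below are illustrative, not the paper's.
JOINT_LIMITS = (np.radians(0.0), np.radians(145.0))   # e.g. elbow flexion range

t_obs = np.linspace(0.0, 1.0, 20)[:, None]            # observed time stamps (s)
q_obs = 0.8 * np.sin(2.0 * t_obs[:, 0]) + 0.3         # observed joint angle (rad)
q_obs += np.random.normal(0.0, 0.02, q_obs.shape)     # measurement uncertainty

kernel = RBF(length_scale=0.3) + WhiteKernel(noise_level=1e-3)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(t_obs, q_obs)

t_future = np.linspace(1.0, 1.5, 10)[:, None]         # prediction horizon
q_mean, q_std = gp.predict(t_future, return_std=True)

# simplified constraint handling: project the predicted mean back into the
# feasible joint range to reject implausible configurations
q_constrained = np.clip(q_mean, *JOINT_LIMITS)
print(np.degrees(q_constrained))
```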
|
|
15:57-16:15, Paper MoBT5.5 | |
Safe Human Robot Navigation in Warehouse Scenario |
|
Farrell, Seth | University of California San Diego |
Li, Chenghao | University of California, San Diego |
Yu, Hongzhan | University of California San Diego |
Yoshimitsu, Ryo | IHI Corporation |
Gao, Sicun | UCSD |
Christensen, Henrik Iskov | UC San Diego |
Keywords: Model Learning for Control, Collision Avoidance, Autonomous Agents
Abstract: The integration of autonomous mobile robots (AMRs) in industrial environments, particularly warehouses, has revolutionized logistics and operational efficiency. However, ensuring the safety of human workers in dynamic, shared spaces remains a critical challenge. This work proposes a novel methodology that leverages control barrier functions (CBFs) to enhance safety in warehouse navigation. By integrating learning-based CBFs with the Open Robotics Middleware Framework (Open-RMF), the system achieves adaptive and safety-enhanced controls in multi-robot, multi-agent scenarios. Experiments conducted using various robot platforms demonstrate the efficacy of the proposed approach in avoiding static and dynamic obstacles, including human pedestrians. Our experiments evaluate different scenarios in which the number of robots, robot platforms, speed, and number of obstacles are varied, from which we achieve promising performance.
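For context, the safety-filtering step that a CBF provides can be written in closed form for a toy single-integrator robot and one obstacle; the learned CBFs and Open-RMF integration described in the abstract are not reproduced here, and the gains below are arbitrary.

```python
import numpy as np

def cbf_filter(u_nom, pos, obs, r_safe, alpha=1.0):
    """Minimal control-barrier-function safety filter for a single-integrator
    robot (x_dot = u).  Barrier h(x) = |x - obs|^2 - r_safe^2 must satisfy
    h_dot >= -alpha * h.  With a single linear constraint the QP
      min |u - u_nom|^2  s.t.  grad_h . u >= -alpha * h
    has the closed-form projection used below."""
    diff = pos - obs
    h = diff @ diff - r_safe**2
    grad_h = 2.0 * diff
    slack = grad_h @ u_nom + alpha * h
    if slack >= 0.0:                   # nominal command already safe
        return u_nom
    return u_nom - slack / (grad_h @ grad_h) * grad_h

pos = np.array([0.0, 0.0])
pedestrian = np.array([1.0, 0.0])
u_nominal = np.array([1.0, 0.0])       # heading straight at the pedestrian
print(cbf_filter(u_nominal, pos, pedestrian, r_safe=0.8))
```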
|
|
MoBT6 |
Room T6 |
Detection, Estimation and Prediction 1 |
Regular Session |
Chair: Jin, Ran | Virginia Tech |
|
14:45-15:03, Paper MoBT6.1 | |
DCAF: Dynamic Cross-Attention Feature Fusion from Robotic Anomaly Detection to Position Accuracy Modeling |
|
Liu, Hui | Virginia Tech |
Qiao, Guixiu | National Institute of Standards and Technology |
Piliptchak, Pavel | National Institute of Standards and Technology |
Moore, James | University of Sheffield Advanced Manufacturing Research Centre |
Sawyer, Daniela | University of Sheffield - Advanced Manufacturing Research Centre |
Zeng, Yingyan | University of Cincinnati |
Jin, Ran | Virginia Tech |
Keywords: AI-Based Methods, Failure Detection and Recovery, Data fusion
Abstract: In robotic operations, heterogeneous computation tasks and sensor configurations pose significant challenges to analyzing different modalities of data for data sharing and collaborative learning in robotic Artificial Intelligence (AI) tasks. The lack of historical data in new scenarios or new computation tasks complicates model training and limits the applicability of existing AI methodologies. Current transfer learning approaches rely heavily on static feature extraction, which fails to dynamically adjust to specific feature relationships between different samples or modalities. In the literature, these methods struggle to capture inter-modal associations effectively, resulting in insufficient information sharing and poor modeling performance. Motivated by these challenges, this paper proposes a Dynamic Cross-Attention Feature Fusion (DCAF) approach to map the features from one robotic AI task to another. By calculating attention weights tailored to each target domain sample, DCAF extracts the most relevant source domain features and generates dynamic fused representations. The proposed approach enables sample-specific feature selection and fine-grained domain alignment, effectively enhancing the modeling performance compared with traditional transfer learning and model training based on the local data source. It is particularly suited for a new robotic AI training task with limited sample size and new data modalities. Experimental results for feature fusion from a robotics anomaly detection dataset to a position accuracy modeling dataset demonstrate the effectiveness of DCAF, providing an efficient solution for domain adaptation and multimodal fusion.
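A generic cross-attention fusion block of the kind the abstract describes can be sketched with standard PyTorch components; the dimensions, task head, and module layout below are assumptions for illustration, not the DCAF architecture itself.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Minimal sketch of cross-attention feature fusion (not the exact DCAF
    architecture): each target-domain sample attends over a bank of
    source-domain features, and the attended source context is concatenated
    with the target features for a downstream task head."""
    def __init__(self, dim_tgt, dim_src, dim_common=64, n_heads=4):
        super().__init__()
        self.proj_tgt = nn.Linear(dim_tgt, dim_common)
        self.proj_src = nn.Linear(dim_src, dim_common)
        self.attn = nn.MultiheadAttention(dim_common, n_heads, batch_first=True)
        self.head = nn.Linear(dim_common * 2, 1)       # e.g. position-error regressor

    def forward(self, x_tgt, x_src):
        q = self.proj_tgt(x_tgt).unsqueeze(1)          # (B, 1, D) queries
        kv = self.proj_src(x_src)                      # (B, N_src, D) keys/values
        fused, weights = self.attn(q, kv, kv)          # sample-specific attention
        out = torch.cat([q.squeeze(1), fused.squeeze(1)], dim=-1)
        return self.head(out), weights

model = CrossAttentionFusion(dim_tgt=16, dim_src=32)
x_tgt = torch.randn(8, 16)                             # 8 target-domain samples
x_src = torch.randn(8, 20, 32)                         # 20 source features per sample
y_hat, attn_w = model(x_tgt, x_src)
print(y_hat.shape, attn_w.shape)                       # (8, 1) and (8, 1, 20)
```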
|
|
15:03-15:21, Paper MoBT6.2 | |
Reflex-Plan: A Safety Monitoring Architecture for Thinking Fast and Slow |
|
Rizwan , Momina | Lund University |
Reichenbach, Christoph | Lund University |
Krueger, Volker | Lund University |
Keywords: Domain-specific Software and Software Engineering, Failure Detection and Recovery, Software, Middleware and Programming Environments
Abstract: Ensuring functional safety is crucial for the deployment of autonomous systems in real-life dynamic environments, as they must operate reliably and safely among humans. However, existing safety systems are designed with a closed-world assumption and can over-constrain the system by shutting down the robot at every safety violation, limiting the robot's ability to complete its tasks. To address this problem, we present a novel operational safety approach supported by our software architecture Reflex-Plan, where a safety monitor proactively influences high-level planning to enable safe and adaptive recovery behaviors thus preventing unnecessary stops. Unlike traditional safety monitors that primarily react to violations through predefined stop mechanisms, our software architecture follows a two-step process: the fast-thinking safety monitor provides immediate reflexive responses, while the slow-thinking high-level planner processes the safety monitor's feedback to plan recovery strategies. This allows the robot to respond quickly to safety-critical situations while maintaining adaptability for long-term autonomy. We validate the effectiveness of Reflex-Plan through real-world robot experiments in a mock hospital environment. Our experimental results confirm that keeping immediate safety responses within the safety monitor ensures fast reactivity while delegating recovery strategies to the reasoning layer enables efficient adaptation, reducing failures and ensuring more stable operation without reliance on external intervention.
|
|
15:21-15:39, Paper MoBT6.3 | |
Towards Trustworthy Degradation Prediction: An Interpretable Deep Learning Approach with Sparse Feature Extraction and Temporal Fusion |
|
Li, Dongpeng | The Hong Kong Polytechnic University |
Zheng, Pai | The Hong Kong Polytechnic University |
Li, Weihua | South China University of Technology |
Keywords: Diagnosis and Prognostics, AI-Based Methods, Machine learning
Abstract: Degradation prediction of industrial equipment is crucial for reducing downtime and optimizing maintenance strategies. Although existing Deep Learning (DL) based estimation methods provide accurate predictions with generalizability, the interpretable extraction of deep features related to degradation has not been discussed. It is also challenging to interpret and trace the temporal dynamics of deep features, including regular degradation accumulation and abnormal situations. To address these issues, this paper proposes an interpretable and traceable framework for degradation prediction. First, the Degradation-Informed Interpretable Encoder (DIIE) encodes the raw signal into sparse features, in which the parametric wavelet kernel and degradation constraint are designed to guide the automatic degradation feature extraction. Then the Interpretable Temporal Fusion Module (ITFM) with binarized gating values is used to directly process the multi-step features with more transparency. Finally, the temporal-enhanced features are fed into the predictor to make inferences. The proposed approach was validated on a bearing degradation dataset and achieved competitive predictive performance. Additionally, it provides interpretations for feature extraction and temporal fusion, which can improve the understanding and trustworthiness of mechanical degradation prediction.
|
|
15:39-15:57, Paper MoBT6.4 | |
ROCKET-LRP: Explainable Time Series Classification with Application to Anomaly Prediction in Manufacturing |
|
Ling, Zhijian | University of Toronto |
Aoyama, Takuya | Konica Minolta |
Yano, Keijiro | Konica Minolta |
Cohen, Eldan | University of Toronto |
Keywords: Machine learning, AI-Based Methods, Intelligent and Flexible Manufacturing
Abstract: Time Series Classification is a popular approach in machine learning with many applications. The Random Convolutional Kernel Transform (ROCKET) model has achieved state-of-the-art performance in various time-series classification tasks due to its ability to capture complex patterns and temporal relationships. However, its reliance on random convolutions hinders the explainability of the model, as the relationships between the transformed features and the original input data become obscured. To address these challenges, we propose a novel approach for computing explanations in ROCKET-based time-series classification models that integrates Layer-wise Relevance Propagation with either model-agnostic post-hoc or model-intrinsic local explanation techniques. We implement our approach for two widely used classification models and three local explanation techniques. We validate our approach on two simulated datasets, demonstrating its faithfulness and effectiveness. Additionally, we present an application of our approach to anomaly prediction in real-world manufacturing data and show that it provides superior local explanations compared to popular explanation techniques such as SHAP and LIME.
|
|
15:57-16:15, Paper MoBT6.5 | |
Using Style Transfer to Leverage Synthetic Data for Machine Learning-Based Quality Inspection in Forming Processes |
|
Benfer, Achim | Technical University of Munich |
Hujo, Dominik | Technical University of Munich |
Krüger, Marius | Technical University of Munich |
Land, Kathrin Sophie | Technical University of Munich |
Lechner, Michael | Friedrich-Alexander-Universität |
Merklein, Marion | Friedrich-Alexander-Universität |
Vogel-Heuser, Birgit | Technical University Munich |
Keywords: Computer Vision in Automation, Machine learning, Data fusion
Abstract: Camera-based measurement systems are increasingly used in manufacturing, with machine learning models outperforming traditional image recognition methods. However, industrial adoption remains limited, partly due to the effort required for data collection and model training, which typically relies on real manufacturing data. Many mid-sized companies that build the machines do not have the required personnel to set up and train these systems. In addition, the time it takes to implement these networks either delays the start of manufacturing or prevents them from being implemented until after manufacturing has started. A possible solution is training models on synthetic data, such as Computer Aided Design (CAD) renderings, instead of real manufacturing images. However, the difference in appearance between renderings and real images, known as the domain gap, leads to poor model performance. This paper proposes an AI-based visual quality inspection method using synthetic training data, bridging the domain gap with a style transfer applied only to the training set. The approach is evaluated on two use cases implemented on an industrial PC (IPC) connected to an industrial high-speed press, and compared to baselines on real manufacturing photos from this process. Results show that the domain gap between synthetic and real images can be closed through style transfer. When using the same product as a style reference, an Intersection over Union (IoU) of over 96% is achieved.
|
|
MoBT7 |
Room T7 |
Assembly Automation |
Regular Session |
Chair: Popa, Dan | University of Louisville |
|
14:45-15:03, Paper MoBT7.1 | |
Multimodal Sensing and Machine Learning to Compare Printed and Verbal Assembly Instructions Delivered by a Social Robot |
|
Mishra, Ruchik | University of Louisville |
Prasanna, Laksita | University of Louisville |
Adair, Adair | University of Louisville |
Popa, Dan | University of Louisville |
Keywords: Assembly, AI-Based Methods, Machine learning
Abstract: In this paper, we compare a manual assembly task communicated to workers using both printed and robot-delivered instructions. The comparison was made using physiological signals (blood volume pulse (BVP) and electrodermal activity (EDA)) collected from individuals during an experimental study. In addition, we also collected responses using the NASA Task Load Index (TLX) survey. Furthermore, we mapped the collected physiological signals to the participants' NASA TLX responses to predict their workload. For both classification problems, we compared the performance of Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) models. Results show that for our CNN-based approach, using multimodal data including both BVP and EDA gave better results than using just BVP (approx. 8.38% better) or just EDA (approx. 20.49% better). Our LSTM-based approach likewise performed better with multimodal data (approx. 8.38% better than just BVP and 6.70% better than just EDA). Overall, CNNs performed 7.72% better than LSTMs for classifying physiological signals for paper vs. robot-based instruction. The CNN-based model also provided better classification results (approximately 17.83% better on average across all responses of the NASA TLX) within a few minutes of training compared to the LSTM-based models.
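For illustration, a small 1D CNN over stacked BVP and EDA channels of the kind compared in the study might look as follows; the layer sizes, window length, and two-class output are assumptions, not the authors' exact model.

```python
import torch
import torch.nn as nn

class PhysioCNN(nn.Module):
    """Small 1D CNN over stacked BVP and EDA channels (a sketch of the kind of
    multimodal classifier compared in the paper, not the authors' model)."""
    def __init__(self, n_channels=2, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                 # x: (batch, channels, samples)
        z = self.features(x).squeeze(-1)
        return self.classifier(z)

model = PhysioCNN()
window = torch.randn(8, 2, 512)           # 8 windows of stacked BVP + EDA samples
print(model(window).shape)                # torch.Size([8, 2])
```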
|
|
15:03-15:21, Paper MoBT7.2 | |
Design and Integration of a Robotic Gripper and Warehouse System for Automated Cable Assembly |
|
Govoni, Andrea | University of Bologna |
Cavuoto, Michela | Università Di Bologna |
Massini Alunni, Miriam | Alma Mater Studiorum |
Palli, Gianluca | University of Bologna |
Indovini, Maurizio | Iema |
Keywords: Collaborative Robots in Manufacturing, Product Design, Development and Prototyping, Manufacturing, Maintenance and Supply Chains
Abstract: Robotic automation can improve efficiency in switchgear manufacturing, but fully automating the wiring phase remains challenging. A key limitation is the unstructured organization of cables after production, which hinders robotic integration. This paper introduces a cost-effective, modular warehouse system that bridges the gap between automated cable production and robotic wiring. The proposed system combines a custom gripper, capable of handling cables of various diameters, with a structured storage solution using adaptive clips optimized via finite element analysis. By enabling deterministic cable placement, the system aims to eliminate the need for complex vision-based identification. Experimental results confirm its robustness and repeatability, paving the way toward fully automated cable assembly in industrial applications.
|
|
15:21-15:39, Paper MoBT7.3 | |
Cutaway View Learning for Visually-Guided Assembly with Crane |
|
Li, Pusong | University of Illinois at Urbana-Champaign |
Hauser, Kris | University of Illinois at Urbana-Champaign |
Nagi, Rakesh | University of Illinois, Urbana-Champaign |
Keywords: Reinforcement, Deep Learning in Robotics and Automation, Automation in Construction
Abstract: Object manipulation for construction assembly using a crane is a control problem with highly challenging dynamics, merging contact-rich manipulation, high dynamics uncertainty, and an underactuated system. Learning a vision-guided controller for such a system using reinforcement learning is a promising but challenging approach, as mating surfaces are occluded during the last stage of assembly, making feedback indirect. We present a novel form of cutaway-view privileged information for assembly tasks that is used within a student-teacher framework, making alignment information readily available during the initial stage of training. This is paired with a pretrained encoder and embedding buffer that leverages nonphysical manipulation within the simulator to collect its training data. We evaluate our method on four different assembly-type placement tasks, and find that our system significantly outperforms both kinodynamic planning and standard reinforcement-learning baselines. We also evaluate the ability of our trained controllers to transfer to a realistic simulation environment with different underlying dynamics, demonstrating continued superior performance under deployment with a significant dynamics gap.
|
|
15:39-15:57, Paper MoBT7.4 | |
Task-Context-Aware Diffusion Policy with Language Guidance for Multi-Task Disassembly |
|
Kang, Jeon Ho | University of Southern California |
Joshi, Sagar Jatin | University of Southern California |
Dhanaraj, Neel | University of Southern California |
Gupta, Satyandra K. | University of Southern California |
Keywords: Manipulation Planning, AI-Based Methods, Intelligent and Flexible Manufacturing
Abstract: Diffusion-based policy learning has shown strong performance across diverse robotic tasks, often achieving high success rates. However, real-world deployment requires more than task success—it demands efficient execution and the ability to handle complex environments. In many assembly and disassembly settings, a single scene contains multiple potential task goals. This can confuse learned policies, leading to ambiguous behavior. Enabling explicit task selection via natural language is thus crucial for robust and flexible operation. In this paper, we address two key challenges: (1) improving task execution efficiency by structuring tasks into distinct sub-task modes using language, and (2) resolving goal ambiguity by allowing human operators to specify desired tasks through natural language commands. We further introduce an adaptive parameter selection mechanism that adjusts reliance on different sensory modalities depending on the active sub-task. We evaluate our approach on the NIST Task Board, a representative benchmark with multiple co-located task goals. Our method improves execution speed by 57% and increases task success rate by 19% compared to baseline approaches. Demonstration videos are available at: https://rros-lab.github.io/task-aware-diffusion
|
|
15:57-16:15, Paper MoBT7.5 | |
Accurate Pose Estimation Using Contact Manifold Sampling for Safe Peg-In-Hole Insertion of Complex Geometries |
|
Negi, Abhay | University of Southern California |
Manyar, Omey Mohan | University of Southern California |
Penmetsa, Dhanush Kumar Varma | University of Southern California |
Gupta, Satyandra K. | University of Southern California |
Keywords: Assembly, Compliant Assembly, Intelligent and Flexible Manufacturing
Abstract: Robotic assembly of complex, non-convex geometries with tight clearances remains a challenging problem, demanding precise state estimation for successful insertion. In this work, we propose a novel framework that relies solely on contact states to estimate the full SE(3) pose of a peg relative to a hole. Our method constructs an online submanifold of contact states through primitive motions with just 6 seconds of online execution, subsequently mapping it to an offline contact manifold for precise pose estimation. We demonstrate that without such state estimation, robots risk jamming and excessive force application, potentially causing damage. We evaluate our approach on five industrially relevant, complex geometries with 0.1 to 1.0 mm clearances, achieving a 96.7% success rate, a 6x improvement over primitive-based insertion without state estimation. Additionally, we analyze insertion forces and overall insertion times, showing that our method significantly reduces the average wrench, enabling safer and more efficient assembly.
|
|
MoBT8 |
Room T8 |
Social and Intelligent Manufacturing 1 |
Special Session |
Chair: Wang, Di | South China University of Technology |
Co-Chair: Lin, Weizhi | University of Southern California |
Organizer: Wang, Feiyue | Institute of Automation, Chinese Academy of Sciences |
Organizer: Jiang, Pingyu | Xi’an Jiaotong University |
Organizer: Huang, Qiang | University of Southern California |
Organizer: Pian, Chunyuan | Xinxiang University |
Organizer: Wang, Di | South China University of Technology |
Organizer: Shen, Zhen | Institute of Automation, Chinese Academy of Sciences |
|
14:45-15:03, Paper MoBT8.1 | |
Accelerating Additive Manufacturing Slicing: A GPU-Based Parallel Algorithm for Large and Complex Mesh Models (I) |
|
Xiao, Yao | Xi'an Jiaotong University |
Qu, Zhi | Beijing Aerospace Propulsion Institute |
Wei, Chao | ZWSOFT CO., LTD.(Guangzhou) |
Yan, Chao-Bo | Xi'an Jiaotong University |
Cui, Bin | Xi'an Jiaotong University |
Keywords: Additive Manufacturing
Abstract: Efficient slicing of massive data remains a significant challenge in additive manufacturing. To address computer memory limitations and enhance slicing efficiency, this study presents a novel approach combining batch processing of large-scale mesh models with a GPU-accelerated parallel slicing algorithm. The proposed method partitions mesh model files, which are typically too large for single memory allocation, into multiple sub-models based on the slicing direction. During sub-model processing, an optimized edge-labeling algorithm is implemented to topologically mark all edges within each sub-model. The slicing operation is then executed in parallel across sub-models using GPU acceleration through OpenCL, significantly improving computational efficiency. The individual slicing results are subsequently integrated to generate the final output. Theoretically, this algorithm eliminates memory constraints on mesh model size while maintaining high slicing efficiency. Comparative experiments with industry-standard software Magics and Cura demonstrate the superiority of our method. The proposed algorithm successfully processes large-scale mesh models that exceed the capacity of both commercial solutions. Furthermore, it achieves a remarkable 80% reduction in slicing time for complex models compared to Magics and Cura, demonstrating both the feasibility and superior efficiency of our approach.
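The per-facet kernel that any slicer (CPU or GPU) evaluates is the triangle-plane intersection; the sketch below shows that kernel in plain NumPy, leaving out the paper's contributions (sub-model batching, edge labeling, and OpenCL parallelization).

```python
import numpy as np

def slice_triangle(tri, z):
    """Intersect one triangle (3x3 array of vertices) with the plane z=const.
    Returns a 2-point segment, or None when the plane misses the triangle.
    This is the per-facet kernel that a GPU slicer evaluates in parallel."""
    pts = []
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        da, db = a[2] - z, b[2] - z
        if da * db < 0:                         # edge crosses the plane
            t = da / (da - db)
            pts.append(a + t * (b - a))
        elif da == 0:                           # vertex lies exactly on the plane
            pts.append(a.copy())
    if len(pts) < 2:
        return None
    return np.array(pts[:2])

def slice_mesh(triangles, z):
    """Collect all intersection segments of one layer; contour stitching and
    batching across sub-models (as in the paper) are omitted here."""
    return [s for s in (slice_triangle(t, z) for t in triangles) if s is not None]

# toy mesh: one triangle spanning z in [0, 1]
tri = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
print(slice_mesh([tri], z=0.5))
```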
|
|
15:03-15:21, Paper MoBT8.2 | |
Dynamic Double-Sided Rolling-Horizon Auction Mechanisms for Additive Manufacturing Collaboration in Social Manufacturing (I) |
|
Sun, Mingyue | The Hong Kong Polytechnic University |
Li, Jinpeng | The Hong Kong Polytechnic University |
Chen, Qiqi | The Hong Kong Polytechnic University |
Zhang, Mengdi | The Hong Kong Polytechnic University |
Zhao, Zhiheng | The Hong Kong Polytechnic University |
Huang, George Q. | The Hong Kong Polytechnic University |
Keywords: Planning, Scheduling and Coordination, Task Planning, Additive Manufacturing
Abstract: The fusion of Social Manufacturing (SM) and Additive Manufacturing (AM) has led to the emergence of distributed and cooperative production models, where prosumers seamlessly transition between the roles of producers and consumers. However, existing manufacturing-sharing platforms often struggle to accommodate the bidirectional, dynamic, and long-term collaborative requirements inherent to AM. We first design a one-shot double-sided VCG auction mechanism, ensuring incentive compatibility, allocative efficiency, and individual rationality. To support long-term and adaptive AM collaboration, this study designs two rolling-horizon auction mechanisms: (1) a greedy algorithm-driven approach, which iteratively assigns AM orders by utilizing short-term price variations, and (2) a heuristic-based auction, which reformulates the collaborative manufacturing problem as a Maximum Weighted Independent Set (MWIS) problem, solved using a hybrid Iterated Local Search (ILS) heuristic. To assess the effectiveness of these mechanisms, we conduct a numerical experiment, demonstrating their capability to enhance AM resource efficiency and foster collaborative production.
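As a stand-in for the paper's rolling-horizon mechanisms, the snippet below runs a simple greedy weighted matching over one round of bids, which is one crude way to approximate the MWIS allocation; the bid tuples and surplus values are invented for illustration.

```python
def greedy_allocation(bids):
    """Greedy heuristic for one auction round.  Each bid is
    (order_id, provider_id, surplus); two bids conflict when they share an
    order or a provider, so accepted bids form an independent set of the
    conflict graph.  Bids are taken in order of surplus as long as they stay
    conflict-free; a simple stand-in for the paper's ILS-based MWIS solver."""
    chosen, used_orders, used_providers = [], set(), set()
    for order, provider, surplus in sorted(bids, key=lambda b: -b[2]):
        if order not in used_orders and provider not in used_providers:
            chosen.append((order, provider, surplus))
            used_orders.add(order)
            used_providers.add(provider)
    return chosen

# illustrative rolling horizon: re-run the greedy matcher as new bids arrive
round_1 = [("o1", "p1", 12.0), ("o1", "p2", 9.0), ("o2", "p1", 7.0), ("o2", "p3", 6.5)]
print(greedy_allocation(round_1))   # [('o1', 'p1', 12.0), ('o2', 'p3', 6.5)]
```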
|
|
15:21-15:39, Paper MoBT8.3 | |
Automated Qualification of 3D-Printed Products for Personalized Manufacturing (I) |
|
Lin, Weizhi | University of Southern California |
Huang, Qiang | University of Southern California |
Keywords: Additive Manufacturing, Machine learning
Abstract: Product qualification is typically performed by specifying features or regions of interest (ROIs) during design, conducting shape registration of the inspected product to establish correspondence with its design counterpart, and measuring discrepancies for compliance assessment. However, qualification of complex freeform products often requires human intervention to ensure accuracy, particularly in personalized manufacturing through 3D printing. Geometric variety and complexity can induce operator-to-operator variability due to heterogeneous spatial distributions of geometric distortions. To enable automated product qualification, we propose to identify and represent ROIs as surface patches using geometric descriptors indicative of their intrinsic deviation patterns. ROI specification via shape space dimension reduction, non-rigid intrinsic shape registration, and intrinsic deviation representation can therefore be conducted for product qualification. Finite types of ROIs or surface patches can be extracted based on their intrinsic deviation patterns, independent of covariates such as size and location. A MATLAB software suite has been developed to implement the entire process, demonstrating its effectiveness in the qualification of complex dental models.
|
|
15:39-15:57, Paper MoBT8.4 | |
EVT-CLIP: Enhancing Zero-Shot Anomaly Segmentation with Vision-Text Models (I) |
|
Yue, ZhiJian | University of Chinese Academy of Sciences |
Shen, Zhen | Institute of Automation, Chinese Academy of Sciences |
Fang, Qihang | Institute of Automation, Chinese Academy of Sciences |
Wang, Weixing | CASIA |
Xiong, Gang | Institute of Automation, Chinese Academy of Sciences |
Dong, Xisong | Institute of Automation, Chinese Academy of Sciences |
Wang, Feiyue | Institute of Automation, Chinese Academy of Sciences |
Keywords: Computer Vision for Manufacturing, Zero-Defect Manufacturing, Additive Manufacturing
Abstract: In recent years, zero-shot anomaly segmentation (ZSAS) has emerged as a cutting-edge technology, demonstrating significant potential in the field of anomaly detection. However, traditional methods often rely on manually designed fixed textual descriptions or anomaly prompts, which limits the model's adaptability to different types of anomalies. Additionally, existing methods exhibit shortcomings in the interaction and fusion of image and text features, resulting in suboptimal cross-modal understanding and insufficient information sharing. To address these challenges, in this paper we propose an innovative ZSAS method—EVT-CLIP—aimed at enhancing the performance of anomaly detection and localization tasks. The core idea of this method is to combine the Dynamic Attention-Enhanced Prompt (DAEP) module with the Cross-modal Interaction (CMI) module to improve the model's generalization capability and cross-modal information fusion. Specifically, the DAEP module reduces reliance on category-specific information by precisely fusing global image features with textual prompts, thereby enhancing the model's adaptability to various anomaly types. Meanwhile, the CMI module captures both local details and global contextual information in images through deep interaction between image and text features, optimizes text embeddings, and significantly enhances cross-modal understanding between images and text. Experimental validation on multiple benchmark datasets demonstrates that the EVT-CLIP framework achieves remarkable performance improvements in anomaly segmentation tasks, outperforming existing ZSAS methods and proving its effectiveness and advantages in practical applications.
|
|
15:57-16:15, Paper MoBT8.5 | |
Group-Based QMIX for Multi-Agent Reinforcement Learning (I) |
|
Hong, Weixin | Institute of Automation, Chinese Academy of Sciences |
Wu, Huaiyu | Institute of Automation, Chinese Academy of Sciences |
Fang, He | Institute of Automation, Chinese Academy of Sciences |
Shen, Zhen | Institute of Automation, Chinese Academy of Sciences |
Han, Yunjun | Institute of Automation, Chinese Academy of Sciences |
Lv, Yisheng | Chinese Academy of Sciences |
Xiong, Gang | Institute of Automation, Chinese Academy of Sciences |
Keywords: AI-Based Methods, Agent-Based Systems, Autonomous Agents
Abstract: In multi-agent reinforcement learning environments, value decomposition methods are popularly applied to address the cooperation issue among agents. However, in some multi-agent value decomposition methods, the global action-value is usually approximated using upper and lower bounds, which leads to a lack of fine-grained cooperative actions. Furthermore, current state-of-the-art value decomposition approaches are predominantly confined to addressing cooperative learning problems involving small-scale multi-agent systems. As the number of agents increases, these methods may lead to difficulties in the convergence of the Q value function, especially in more complex cooperative scenarios. To address the above two challenges, we propose a Group-based QMIX (GQMIX) method which learns to dynamically divide agents into multiple groups during exploration while applying a Graph Attention Network (GAT) to simultaneously learn value decomposition under both global observation and local observation. This enables the subdivision of agents into different groups in large-scale settings, allowing the learning of common subtasks in complex scenarios and improving the convergence efficiency of the value function. Experimental results demonstrate that our proposed algorithm is valid by providing better scheduling solutions for the extended flexible job shop scheduling problem, and it outperforms existing multi-agent reinforcement learning methods in terms of convergence and stability.
|
|
MoBT9 |
Room T9
Human Robot Collaboration for Smart Manufacturing 2 |
Special Session |
Chair: Zhang, Yunbo | Rochester Institute of Technology |
Organizer: Zheng, Pai | The Hong Kong Polytechnic University |
Organizer: Peng, Tao | Zhejiang University |
Organizer: Gu, Xi | Rutgers University |
Organizer: Wang, Yongjing | University of Birmingham |
Organizer: Kim, Hyun-Jung | Korea Advanced Institute of Science and Technology |
Organizer: Zhang, Yunbo | Rochester Institute of Technology |
Organizer: Bao, Jinsong | DongHua University |
Organizer: Huang, George Q. | The Hong Kong Polytechnic University |
Organizer: Wang, Lihui | KTH Royal Institute of Technology |
Organizer: Pham, Duc Truong | University of Birmingham |
|
14:45-15:03, Paper MoBT9.1 | |
More Attention for Human: A Multimodal Data-Driven Human Intention Identification Method (I) |
|
Liu, Zhixin | Zhejiang University |
Feng, Yixiong | Zhejiang University |
Lou, Shanhe | Nanyang Technological University |
Lu, Chengyu | Zhejiang University |
Tan, Jianrong | Zhejiang University |
Keywords: Human-Centered Automation, Data fusion, Machine learning
Abstract: “Human-Machine Symbiosis” is a defining characteristic of Industry 5.0, where human-centric manufacturing paradigms emphasize the integration of advanced production tools with intrinsically human problem-solving capabilities. Product manufacturing is inherently inseparable from product design, and conceptual design influences the cost of subsequent manufacturing stages. Current assistive manufacturing software is limited to passive command execution during interactions, lacking the ability to recognize human intent, which leads to barriers in human-machine collaboration. To address this limitation, a multimodal data-driven human intention identification method is proposed. This methodology employs 3D spatial modeling and verbal analysis to capture multimodal data generated during collaborative manufacturing processes. A novel Transformer architecture integrating T2T-ViT and BERT (TB-Multiformer) is developed to identify human intention. Multimodal features are extracted by an inter-modal attention module and an intra-modal self-attention module. A coordinated manufacturing case involving two types of mechanism structures is used to verify the feasibility and viability of the proposed method.
|
|
15:03-15:21, Paper MoBT9.2 | |
A Lightweight Human Posture Predictive Assessment in Human-Robot Collaboration Via Diffusion Mamba (I) |
|
Zhong, Ruirui | Zhejiang University |
Hu, Bingtao | Zhejiang University |
Zhang, Zhifeng | Zhejiang University |
Feng, Yixiong | Zhejiang University |
Yuan, Yixiu | Zhejiang University |
Tan, Jianrong | Zhejiang University |
Keywords: Deep Learning in Robotics and Automation, Human-Centered Automation, AI and Machine Learning in Healthcare
Abstract: Accurate human posture assessment is essential for ensuring safety and ergonomics in human-robot collaboration. Traditional assessment methods rely on real-time human posture estimation, which, while effective, lacks predictive capabilities and diversity in motion outcomes, limiting its ability to assess ergonomic risks proactively. To address this, we propose MambaFusion, a lightweight and diverse human posture predictive assessment framework based on Diffusion Mamba. MambaFusion integrates diffusion models and Mamba to generate diverse future posture predictions, enhancing the flexibility of ergonomic evaluation. A CrossMamba block is introduced for noise prediction, where cross attention mechanisms improve contextual understanding by refining conditional embeddings, leading to more accurate motion representation. Additionally, a REBA-based ergonomic assessment module evaluates predicted human postures, enabling more comprehensive ergonomic risk assessments. Extensive experiments demonstrate that MambaFusion exhibits strong performance in prediction accuracy, motion diversity, and the reliability of ergonomic evaluation, and outperforms other algorithms.
|
|
15:21-15:39, Paper MoBT9.3 | |
Dual-Replanning Tree: Fast Multi-Query Path Planning in Dynamic Environments (I) |
|
Li, Cheng | Xi'an Jiaotong University |
Huang, Ziang | Xi'an Jiaotong University |
Yan, Chao-Bo | Xi'an Jiaotong University |
Hu, Jianchen | Xi'an Jiaotong University |
Keywords: Motion and Path Planning, Collision Avoidance, Industrial and Service Robotics
Abstract: This paper presents Dual-Replanning Tree (DRT), a real-time multi-query path planning algorithm that integrates local and global path generation, multi-query planning, and dynamic obstacle avoidance. Existing algorithms, such as FAT, which fix the tree root at the destination, exhibit high replanning efficiency but struggle with multi-query path planning. On the other hand, algorithms like RT-RRT*, which maintain the tree root near the robot, are advantageous for multi-query path planning but are significantly affected by newly detected obstacles, limiting their performance in dynamic environments. Our method innovatively improves the replanning strategy of algorithms that keep the tree root near the robot by introducing a reference path mechanism, enabling efficient node rewiring and expansion. This reference path is obtained through local small-scale rewiring, resulting in low computational overhead. Based on this reference path, the proposed algorithm can achieve higher replanning efficiency than algorithms that fix the tree root at the destination while retaining the dynamic adjustment of the tree root to adapt to multi-query path planning. Experimental results demonstrate that under various environmental conditions, the DRT algorithm outperforms the FAT algorithm in two key metrics: execution cost and arrival time.
|
|
15:39-15:57, Paper MoBT9.4 | |
Multi-Class Human/Object Detection on Robot Manipulators Using Proprioceptive Sensing (I) |
|
Hehli, Justin | University of Zurich |
Heiniger, Marco | University of Zurich |
Rezayati, Maryam | University of Zurich (UZH), Zurich University of Applied Sciences |
van de Venn, Hans Wernher | Zurich University of Applied Sciences |
Keywords: Collaborative Robots in Manufacturing, Deep Learning in Robotics and Automation, Human-Centered Automation
Abstract: In physical human-robot collaboration (pHRC) settings, humans and robots collaborate directly in shared environments. Robots must analyze interactions with objects to ensure safety and facilitate meaningful workflows. One critical aspect is human/object detection, where the contacted object is identified. Past research introduced binary machine learning classifiers to distinguish between soft and hard objects. This study improves upon those results by evaluating three-class human/object detection models, offering more detailed contact analysis. A dataset was collected using the Franka Emika Panda robot manipulator, exploring preprocessing strategies for time-series analysis. Models including LSTM, GRU, and Transformers were trained on these datasets. The best-performing model achieved 91.11% accuracy during real-time testing, demonstrating the feasibility of multi-class detection models. Additionally, a comparison of preprocessing strategies suggests a sliding window approach is optimal for this task.
|
|
15:57-16:15, Paper MoBT9.5 | |
A Contrastive Learning Approach to Paraphrase Identification (I) |
|
Zhou, Jing | Xi'an Jiaotong University |
Hu, Min | China Mobile |
Li, Shuaipeng | Xi'an Jiaotong University |
Wang, Yuyang | Xi'an Jiaotong University |
Guo, Yifeng | China Mobile |
Dong, Xin | Xi'an Jiaotong University |
Cui, Jian | Xi'an Jiaotong University |
Li, Yibo | Xi'an Jiaotong University |
Song, Yunpeng | Xi'an Jiaotong University |
Cai, Zhongmin | Xi'an Jiaotong University |
Zhou, Wuai | China Mobile |
Yan, Chao-Bo | Xi'an Jiaotong University |
Keywords: Machine learning, AI-Based Methods, Data fusion
Abstract: This paper focuses on the paraphrase identification (PI) task, a fundamental NLP task, which aims to determine whether a pair of sentences convey the same or similar meanings. Despite the significant progress of current pre-trained language models on the PI task, the inherent ambiguity of natural language stemming from the polysemous nature of words presents a challenge in assessing semantic similarity. Therefore, there is a necessity for further enhancement in capturing intricate relationships between sentences. In light of this challenge, we propose a method that utilizes contrastive learning to enhance sentence embeddings that are optimized for discriminating between sentences with similar or dissimilar semantic meanings. To be specific, the novel framework involves training a BERT model on modified Natural Language Inference (NLI) datasets using two-level contrastive learning to obtain a 2-Level-CLPI-BERT model, aiming to enhance sentence representations for the PI task. Experiments conducted on four PI datasets demonstrate that the proposed model outperforms state-of-the-art methods in intra-dataset evaluation. Furthermore, the cross-dataset performance evaluation substantiates the generalizability of 2-Level-CLPI-BERT embeddings.
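For readers unfamiliar with contrastive sentence-embedding objectives, the snippet below is a minimal PyTorch sketch of a standard in-batch InfoNCE loss; it is not the paper's two-level 2-Level-CLPI-BERT objective, and the embedding dimension and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.05):
    """Standard InfoNCE loss over a batch: each anchor's positive is the
    same-index row of `positive`; all other rows act as in-batch negatives."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature          # (B, B) cosine-similarity logits
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

# Placeholder sentence embeddings (e.g., pooled BERT outputs of paired sentences).
emb_a = torch.randn(16, 768)
emb_b = torch.randn(16, 768)
loss = info_nce(emb_a, emb_b)
```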
|
|
MoBT10 |
Room T10 |
Planning, Scheduling and Control 2 |
Regular Session |
Chair: Reisi Gahrooei, Mostafa | University of Florida |
|
14:45-15:03, Paper MoBT10.1 | |
GS-NBV: A Geometry-Based, Semantics-Aware Viewpoint Planning Algorithm for Avocado Harvesting under Occlusions |
|
Song, Xiaoao | University of California Riverside |
Karydis, Konstantinos | University of California, Riverside |
Keywords: Agricultural Automation, Manipulation Planning, Reactive and Sensor-Based Planning
Abstract: Efficient identification of picking points is critical for automated fruit harvesting. Avocados present unique challenges owing to their irregular shape, weight, and less-structured growing environments, which require specific viewpoints for successful harvesting. We propose a geometry-based, semantics-aware viewpoint-planning algorithm to address these challenges. The planning process involves three key steps: viewpoint sampling, evaluation, and execution. Starting from a partially occluded view, the system first detects the fruit, then leverages geometric information to constrain the viewpoint search space to a 1D circle, and uniformly samples four points to balance efficiency and exploration. A new picking score metric is introduced to evaluate the viewpoint suitability and guide the camera to the next-best view. We validate our method through simulation against two state-of-the-art algorithms. Results show a 100% success rate in two case studies with significant occlusions, demonstrating the efficiency and robustness of our approach. Our code is available at https://github.com/lineojcd/GSNBV.
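A minimal sketch of the circle-constrained viewpoint sampling described above, paired with a hypothetical picking score; the radius, score weights, and score inputs are illustrative assumptions, not the paper's metric.

```python
import numpy as np

def sample_viewpoints(fruit_center, radius, n=4):
    """Uniformly sample n candidate camera positions on a horizontal circle
    around the detected fruit (the 1D search space described in the abstract)."""
    angles = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    offsets = radius * np.stack([np.cos(angles), np.sin(angles), np.zeros(n)], axis=1)
    return fruit_center + offsets

def picking_score(visible_fruit_ratio, approach_alignment):
    """Hypothetical scalar score: the weights are illustrative, not the paper's."""
    return 0.7 * visible_fruit_ratio + 0.3 * approach_alignment

candidates = sample_viewpoints(np.array([0.5, 0.2, 1.1]), radius=0.3)
scores = [picking_score(np.random.rand(), np.random.rand()) for _ in candidates]
next_best_view = candidates[int(np.argmax(scores))]
```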
|
|
15:03-15:21, Paper MoBT10.2 | |
Passenger Simulation Models for Transit Scheduling and Facility Layout Planning in Airports |
|
Alfas, Muhammad | Indian Institute of Technology Delhi |
Negi, Apurv | Indian Institute of Technology Delhi |
Masiwal, Mohit | Indian Institute of Technology Delhi |
Koriya, Vishal Kumar | Wipro India |
Babre, Tirtharaj Purushottam | Wipro India |
Vepakomma, Navya | Wipro India |
Shriyam, Shaurya | IIT Delhi |
Keywords: Planning, Scheduling and Coordination, Simulation and Animation, Agent-Based Systems
Abstract: Simulating the movements of passengers on service systems has always been an interesting business problem. Unlike manufacturing systems, where processes and the movement of personnel follow predefined pathways, service systems like airports require a lot of information about customer movements and behaviors. In this work, we propose two simulation models to simulate passenger movement in airport systems. The first one is a discrete event simulation model, which models passengers and trains in an airport transit system. In the second model, we use the social force model, a microscopic agent-based model, to model the movement of passengers inside airport check-in facilities. Simulated annealing, along with the transit simulation model, is used to optimize the scheduling of trains to minimize waiting times and maximize occupancy. We consider the facility layout planning problem for the airport check-in area and use large language models to improve the layout. To evaluate the efficacy of the model, we use the social force model and space syntax analysis. Results indicate that the scheduling optimization can bring a 200% improvement, whereas the improved facility layout delivers a 2% improvement in the throughput.
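The following is a generic simulated annealing sketch of the kind used here for train scheduling; the headway decision variables, waiting-cost proxy, and cooling schedule are assumptions, not the paper's model.

```python
import math
import random

def waiting_cost(headways):
    """Placeholder objective: proxy for passenger waiting time given headways (minutes)."""
    return sum(h * h for h in headways) / len(headways)

def simulated_annealing(headways, iters=5000, t0=10.0, alpha=0.999):
    """Generic SA loop: perturb one headway, accept worse moves with Boltzmann probability."""
    best = cur = list(headways)
    best_c = cur_c = waiting_cost(cur)
    t = t0
    for _ in range(iters):
        cand = list(cur)
        i = random.randrange(len(cand))
        cand[i] = max(2.0, cand[i] + random.uniform(-1.0, 1.0))  # keep a 2-minute minimum headway
        cand_c = waiting_cost(cand)
        if cand_c < cur_c or random.random() < math.exp((cur_c - cand_c) / t):
            cur, cur_c = cand, cand_c
            if cur_c < best_c:
                best, best_c = cur, cur_c
        t *= alpha
    return best, best_c

schedule, cost = simulated_annealing([6.0] * 12)
```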
|
|
15:21-15:39, Paper MoBT10.3 | |
Superquadric Object Representation As a Control Barrier Function for Obstacle Avoidance |
|
Fernandez, Louis Ferdinand Nicodemus | University of Technology Sydney |
Hernandez Moreno, Victor | University of Technology Sydney |
Sutjipto, Sheila | University of Technology, Sydney |
Carmichael, Marc | Centre for Autonomous Systems |
Keywords: Collision Avoidance
Abstract: Ensuring successful robot task performance and safety in unstructured environments is a critical challenge in robotics. A key requirement in addressing this challenge is to accurately model the scene in which robots operate to effectively perform obstacle avoidance. State-of-the-art approaches for online obstacle avoidance generally rely on simplified representations (e.g. ellipsoids) that often result in highly conservative collision models that limit their effectiveness. To address this challenge, this paper proposes a controller which leverages superquadrics to reduce conservative behaviours. The parametric nature of superquadrics allows for accurate modelling of the robotic system and the environment. The calculated distance between the superquadrics informs the construction of a Control Barrier Function, which is integrated into a Quadratic Program to enable obstacle avoidance. Finally, by formulating the problem in the operating space and considering object volumes, the proposed controller is able to utilise rotational deviation to achieve safer behaviours. The proposed approach is evaluated through simulation and real-world experiments. The simulation results demonstrate the effectiveness of the proposed framework, while results from the real-world experiments highlight the advantages of the framework in different scenarios.
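As a sketch of why superquadrics are less conservative than ellipsoids, the snippet below evaluates the standard superquadric inside-outside function; a barrier could, for instance, be built from h = F - 1, but the paper's actual CBF-QP formulation is not reproduced here.

```python
import numpy as np

def superquadric_value(p, scale, eps1, eps2):
    """Inside-outside function of an axis-aligned superquadric centred at the origin:
    F < 1 inside, F = 1 on the surface, F > 1 outside. `scale` = (a1, a2, a3);
    eps1/eps2 control the shape (1 -> ellipsoid, small values -> box-like)."""
    x, y, z = np.abs(p) / np.asarray(scale)
    return ((x ** (2.0 / eps2) + y ** (2.0 / eps2)) ** (eps2 / eps1)
            + z ** (2.0 / eps1))

# A box-like obstacle model excludes far less free space than its bounding ellipsoid:
point = np.array([0.45, 0.45, 0.1])
print(superquadric_value(point, scale=(0.5, 0.5, 0.2), eps1=0.2, eps2=0.2))
```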
|
|
15:39-15:57, Paper MoBT10.4 | |
Lightweight Learning Algorithm for Lane Line Detection in Different Lighting Conditions |
|
Shi, Xiaolin | Xi'an University of Posts & Telecommunications |
Keywords: Intelligent Transportation Systems, Autonomous Vehicle Navigation
Abstract: Lane line detection is a critical task in autonomous driving and autonomous vehicle navigation that requires fast and accurate prediction. For the lane detection task, an improved lightweight lane detection method is proposed by combining multi-scale feature fusion technology and an attention mechanism to improve accuracy and efficiency. Firstly, the original ResNet lane feature extraction algorithm is replaced with the RepVGG algorithm, and to further utilize low-level features, a scheme of multi-scale lane feature extraction is designed. Then, to increase the capture ability on the lane regions and expand the network's receptive field, a lane dual attention (LDA) module dedicated to lane feature extraction is proposed. Finally, to address the problem of imbalance in lane samples, the focal loss function is selected to replace the cross-entropy loss function. Experimental results on the CULane test dataset show the overall superior performance on detection accuracy and speed in different lighting and traffic environments.
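A minimal PyTorch sketch of the focal loss mentioned above (the standard binary form of Lin et al., with the usual gamma/alpha defaults); the tensor shapes are placeholders, not the paper's network outputs.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights easy examples so sparse lane pixels are
    not overwhelmed by background. gamma/alpha are the commonly used defaults."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

logits = torch.randn(4, 1, 288, 800)                     # placeholder lane-probability map
targets = (torch.rand_like(logits) > 0.97).float()       # sparse positives, as in lane masks
loss = binary_focal_loss(logits, targets)
```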
|
|
15:57-16:15, Paper MoBT10.5 | |
Trajectory Planning of a Curtain Wall Installation Robot Based on Biomimetic Mechanisms |
|
Liu, Xiao | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Wang, Weijun | Guangzhou Institute of Advanced Technology, Chinese Academy of Sc |
Huang, Tianlun | University of Chinese Academy of Sciences, Beijing 100049; Shenz |
Wang, Zhiyong | University of Chinese Academy of Sciences, Beijing 100049; Shenz |
Feng, Wei | Shenzhen Institutes of Advanced Technology, Chinese Academy of S |
Keywords: Automation in Construction, Biomimetics, Motion Control
Abstract: As the robotics market rapidly evolves, energy consumption has become a critical issue, particularly restricting the application of construction robots. To tackle this challenge, our study innovatively draws inspiration from the mechanics of human upper limb movements during weight lifting, proposing a bio-inspired trajectory planning framework that incorporates human energy conversion principles. By collecting motion trajectories and electromyography (EMG) signals during dumbbell curls, we construct an anthropomorphic trajectory planning scheme that integrates human force exertion and energy consumption patterns. Utilizing the Particle Swarm Optimization (PSO) algorithm, we achieve dynamic load distribution for robotic arm trajectory planning based on human-like movement features. In practical application, these bio-inspired movement characteristics are applied to curtain wall installation tasks, validating the correctness and superiority of our trajectory planning method. Simulation results demonstrate a 48.4% reduction in energy consumption through intelligent conversion between kinetic and potential energy. This approach provides new insights and theoretical support for optimizing energy use in curtain wall installation robots during actual handling tasks.
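The snippet below is a plain NumPy sketch of particle swarm optimization over a generic trajectory-parameter vector; the placeholder energy objective and bounds are assumptions, not the paper's human-derived cost model.

```python
import numpy as np

def pso(objective, dim, n_particles=30, iters=200, lb=-1.0, ub=1.0,
        w=0.7, c1=1.5, c2=1.5):
    """Plain particle swarm optimization over a box-constrained parameter vector
    (e.g., via-point heights or load-distribution weights of a lifting trajectory)."""
    rng = np.random.default_rng(0)
    x = rng.uniform(lb, ub, (n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)]
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lb, ub)
        f = np.array([objective(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[np.argmin(pbest_f)]
    return gbest, pbest_f.min()

# Placeholder energy model: quadratic effort plus a penalty on end-point deviation.
energy = lambda p: float(np.sum(p ** 2) + abs(p.sum() - 1.0))
best_params, best_energy = pso(energy, dim=6)
```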
|
|
MoBT11 |
Room T11 |
Peter Luh Memorial Best Paper Award for Young Researcher |
Special Session |
Chair: Lennartson, Bengt | Chalmers University of Technology |
|
14:45-15:10, Paper MoBT11.1 | |
Modifying ABIT* for Tethered Rappelling Robot Motion Planning |
|
Goddu, Austen | Michigan Technological University |
Brown, Travis | NASA Jet Propulsion Laboratory, California Institute of Technolo |
Paton, Michael | Jet Propulsion Laboratory |
Motes, James | University of Illinois Urbana-Champaign |
Chen, Tan | Michigan Technological University |
Keywords: Motion and Path Planning, Autonomous Vehicle Navigation, Motion Control
Abstract: In this paper, we improve a path planner for a tethered rappelling robot to find initial solutions efficiently. Implemented on NASA-JPL's Axel rover, the new planner offers an increase in success rate as high as 20% while using fewer resources compared to the original. This higher performance is achieved by modifying the underlying random geometric graph configuration to use a k-nearest neighbor approach, and biasing the sampling portion of the algorithm to add more consideration to sloped regions. These improvements are tested on a number of sloped maps constructed to have specific features or model the real world, using a pipeline involving generated terrain models and a simulated depth camera. In addition to the comparison to the original planner, various configurations are found to improve the success and efficiency of the motion planner on large and noisy maps.
|
|
15:10-15:35, Paper MoBT11.2 | |
Tackling the ‘Small Data’ Challenge: A Versatile Melt Pool Encoder Via Reconstructive Unsupervised Pretraining for Few-Shot Learning (I) |
|
Chen, Zijue | CSIRO |
Wang, Heng | University of Sydney |
Gunasegaram, Dayalan | CSIRO |
Keywords: Computer Vision for Manufacturing, AI-Based Methods, Additive Manufacturing
Abstract: Research in metal additive manufacturing frequently encounters ‘small data’ challenges due to the high costs associated with data generation and the extensive categorization of the process. In addition, most of the available data is unlabeled, i.e., it is not data science-ready. In this paper, we demonstrate a method that overcomes this issue for analyzing melt pool (MP) signatures, a key real-time indicator of process health. Specifically, we propose a novel approach that utilizes reconstructive self-supervised pretraining to develop a pretrained MP encoder that is subsequently finetuned for multiple downstream tasks, including predictions of MP width, aspect ratio, size, and average temperature, using only a few labeled samples. By leveraging dense unlabeled visual monitoring data, our method significantly improves prediction accuracy compared to traditional end-to-end supervised learning, especially in few-shot scenarios. Comprehensive computational experiments demonstrate that our pretrained encoder, when finetuned with a simple regression head, achieves superior performance while reducing the overall reliance on labeled data. It thus offers a cost-effective solution for advanced manufacturing applications, e.g., real-time process control.
|
|
15:35-16:00, Paper MoBT11.3 | |
Cooking Task Planning Using LLM and Verified by Graph Network |
|
Takebayashi, Ryunosuke | Osaka University |
Isume, Vitor Hideyo | Osaka University |
Kiyokawa, Takuya | Osaka University |
Wan, Weiwei | Osaka University |
Harada, Kensuke | Osaka University |
Keywords: Task Planning, Planning, Scheduling and Coordination, AI-Based Methods
Abstract: Cooking tasks remain a challenging problem for robotics due to their complexity. Videos of people cooking are a valuable source of information for such tasks, but introduce considerable variability in terms of how to translate this data to a robotic environment. This research aims to streamline this process, focusing on the task plan generation step, by using a Large Language Model (LLM)-based Task and Motion Planning (TAMP) framework to autonomously generate cooking task plans from videos with subtitles, and execute them. Conventional LLM-based task planning methods are not well-suited for interpreting the cooking video data due to uncertainty in the videos, and the risk of hallucination in its output. To address both of these problems, we explore using LLMs in combination with Functional Object-Oriented Networks (FOON), to validate the plan and provide feedback in case of failure. This combination can generate task sequences with manipulation motions that are logically correct and executable by a robot. We compare the execution of the generated plans for 5 cooking recipes from our approach against the plans generated by a few-shot LLM-only approach for a dual-arm robot setup. It could successfully execute 4 of the plans generated by our approach, whereas only 1 of the plans generated by solely using the LLM could be executed.
|
|
MoCT1 |
Room T1 |
To Automate or to Augment? |
Special Session |
Chair: Moghaddam, Mohsen | Georgia Institute of Technology |
Organizer: Moghaddam, Mohsen | Georgia Institute of Technology |
Organizer: Davari Najafabadi, Shakiba | Georgia Tech |
Organizer: Andrist, Sean | Microsoft Research |
Organizer: Bohus, Dan | Microsoft Research |
Organizer: Marsella, Stacy | Northeastern University |
|
16:30-16:48, Paper MoCT1.1 | |
A Scalable Data-Driven Methodology for Human Intention Prediction in Diverse Collaborative Scenarios (I) |
|
Dell'Oca, Samuele | University of Applied Sciences and Arts of Southern Switzerland |
Montini, Elias | University of Applied Sciences of Southern Switzerland (SUPSI) |
Cutrona, Vincenzo | University of Applied Sciences of Southern Switzerland (SUPSI) |
Matteri, Davide | University of Applied Sciences and Arts of Southern Switzerland |
Landolfi, Giuseppe | SUPSI |
Bettoni, Andrea | University of Applied Sciences of Southern Switzerland (SUPSI) |
Keywords: Collaborative Robots in Manufacturing, Human-Centered Automation, Deep Learning in Robotics and Automation
Abstract: Collaborative robots were designed to work alongside humans and enable Human-Robot Collaboration in industry, but without intelligence, these robots cannot adapt, make decisions, or respond dynamically to human actions. They function more as programmable tools rather than true teammates. This study proposes a methodology to predict operators' short-term intentions by identifying execution patterns and contextual features. Its key strength lies in its task-agnostic nature, enabling adaptation across different scenarios through a structured formalization of use-case characteristics. This flexibility allows model reconfiguration to optimize performance in various industrial applications. The predictive capability is integrated into an orchestration system, allowing the cobot to complement human actions and optimize task execution. The Intention Prediction Model was first validated in individual scenarios, demonstrating its ability to interpret human intentions in different manufacturing contexts. It was then tested with 12 participants in three collaborative scenarios, showing effective task adaptation, improved role synchronization, and task variability management, despite lower intention prediction accuracy compared to individual setups. While no collisions occurred, collaboration smoothness varied, indicating the need for advanced coordination techniques and task assignment logics based on human intention interpretation.
|
|
16:48-17:06, Paper MoCT1.2 | |
6D Pose Tracking for Adaptive AR-Mediated Human-Robot Collaboration (I) |
|
Ajikumar, Akhil | Georgia Institute of Technology |
Wen, Bowen | NVIDIA |
Moghaddam, Mohsen | Georgia Institute of Technology |
Keywords: Collaborative Robots in Manufacturing, Human-Centered Automation, Human Performance Augmentation
Abstract: This paper presents a system framework for adaptive, augmented reality (AR)-mediated human-robot collaboration (HRC), enabling real-time multimodal interaction and adaptive robot behaviors during collaborative manipulation tasks. The system integrates egocentric 6D object pose tracking with real-time visual attention tracking (gaze), hand gestures, and speech, facilitating seamless two-way communication between the human and the robot. While leveraging an existing 6D pose tracker (FoundationPose), we present the first evaluation of its integration within a real-time, egocentric AR + robot framework for dynamic HRC. Our results highlight practical limitations (e.g., tracking drift) and provide design insights for building more robust, adaptive collaboration systems. The system was validated across four real-world scenarios, demonstrating promising performance and identifying key challenges for future research.
|
|
17:06-17:24, Paper MoCT1.3 | |
SIGMA: An Open-Source Platform for Mixed-Reality Task Assistance Research (I) |
|
Andrist, Sean | Microsoft Research |
Bohus, Dan | Microsoft Research |
Keywords: Virtual Reality and Interfaces, AI-Based Methods, Software, Middleware and Programming Environments
Abstract: In this presentation, we will introduce an open-source system called SIGMA (short for "Situated Interactive Guidance, Monitoring, and Assistance") as a platform for conducting research on task-assistive agents in mixed-reality scenarios. The system leverages the sensing and rendering affordances of a head-mounted mixed-reality device in conjunction with large language and multimodal models to guide users step by step through procedural tasks. We present the system's core capabilities, discuss its overall design and implementation, and outline directions for future research enabled by the system. We will also discuss the underlying features and affordances of the Platform for Situated Intelligence framework that enabled the development of SIGMA.
|
|
17:24-17:42, Paper MoCT1.4 | |
System As a Collaborator (SAAC): A Framework for Modeling, Capturing and Augmenting Collaborative Activities in Extended Reality (I) |
|
Léchappé, Aurélien | IMT Atlantique, DAPI, LS2N, LabSticc |
Milliat, Aurélien | IMT Atlantique, DAPI, LS2N |
Kabil, Alexandre | LISN, CNRS |
Chollet, Mathieu | University of Glasgow |
Dumas, Cédric | IMT Atlantique, DAPI, LS2N |
Keywords: Virtual Reality and Interfaces, Human Factors and Human-in-the-Loop, Human Factors in Healthcare
Abstract: When humans collaborate on a shared task, they use a myriad of verbal, para-verbal and non-verbal cues to achieve this end. Modern Artificial Intelligence sensing techniques such as Social Signal Processing (SSP) allow the characterization of users’ activities in Augmented or Virtual Environments through the analysis of heterogeneous multimodal data sources, such as interaction actions, gaze direction, avatars’ positions, and speech analysis. Therefore, this enables real-time assessment of team processes, including communication or team situation awareness. In return, it can provide context-specific feedback and information to augment team capabilities. However, implementing this vision remains a technical challenge, particularly in gathering real-time data sources in multi-user collaborative scenarios. This paper presents a framework for assessing and augmenting team collaboration with Extended Reality Environments (XRE). We propose a multimodal architecture to capture users’ activities and augment the team capabilities by displaying system-generated context-specific collaborative cues. Our proposed vision, System As A Collaborator (SAAC), frames the XRE as a direct actor embedded in the collaborative activity, improving team members’ capabilities with collaborative augmentations instead of being merely the space where collaboration occurs. We demonstrate the feasibility of our approach and system architecture through use cases of experimental XRE where our software architecture collects multimodal, heterogeneous, and multi-user interaction, behavioral and physiological data, and generates direct, reactive cues, and higher-level context-specific cues, augmenting team collaboration.
|
|
17:42-18:00, Paper MoCT1.5 | |
Assessing AI Roles in a Multi-Modal Human-AI Collaboration Framework for E-Scooters (I) |
|
Lo, Wei-Hsiang | San Jose State University |
Huang, Gaojian | San Jose State University |
Keywords: Human-Centered Automation, Human Factors and Human-in-the-Loop, Human Performance Augmentation
Abstract: Although incorporating AI-driven systems may help address the safety concerns caused by the growing use of e-scooters, the specific role of AI in micromobility has yet to be examined. This study explores user preferences for AI-human interaction in e-scooters across three AI roles (Advisor, Co-Pilot, Guardian) and eight human-machine interfaces (HMIs). A national survey (N=473) found no significant preference differences among the three AI roles. Auditory HMIs were preferred over Visual and Tactile HMIs. Among visual HMIs (i.e., AR glasses, control panel display, and road projection), the results indicated that AR glasses were the least satisfying, whereas control panels were the least useful. For auditory HMIs, informative voice assistance was favored over conversational types. Tactile feedback received the most positive response when delivered through handlebars compared to helmets and footpads. The findings of this study may guide the design of next-generation AI-driven systems for intelligent mobility.
|
|
MoCT2 |
Room T2 |
RAL Paper Session 1 |
Special Session |
Chair: Greer, Ross | University of California, Merced |
|
16:30-16:48, Paper MoCT2.1 | |
The Persistent Robot Charging Problem for Long-Duration Autonomy |
|
Kumar, Nitesh | Texas A&M University |
Lee, Jaekyung Jackie | Texas A&M University |
Rathinam, Sivakumar | TAMU |
Darbha, Swaroop | TAMU |
Pb, Sujit | IISER Bhopal |
Raman, Rajiv | IIIT Delhi |
Keywords: Surveillance Robotic Systems, Planning, Scheduling and Coordination, Task Planning
Abstract: This paper introduces a novel formulation aimed at determining the optimal schedule for recharging a fleet of n heterogeneous robots, with the primary objective of minimizing resource utilization. This study provides a foundational framework applicable to Multi-Robot Mission Planning, particularly in scenarios demanding Long-Duration Autonomy (LDA) or other contexts that necessitate periodic recharging of multiple robots. A novel Integer Linear Programming (ILP) model is proposed to calculate the optimal initial conditions (partial charge) for individual robots, leading to minimal utilization of charging stations. This formulation was further generalized to maximize the servicing time for robots when charging stations are limited. The efficacy of the proposed formulation is evaluated through a comparative analysis, measuring its performance against the thrift price scheduling algorithm documented in the existing literature. The findings not only validate the effectiveness of the proposed approach but also underscore its potential as a valuable tool in optimizing resource allocation for a range of robotic and engineering applications.
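As an illustration of the kind of integer program involved (a toy formulation, not the paper's ILP), the following PuLP sketch minimizes the number of charging stations needed to cover hypothetical per-robot charging-slot requirements.

```python
import pulp

robots = ["r1", "r2", "r3"]
slots = range(6)                      # discretized time slots
need = {"r1": 2, "r2": 1, "r3": 3}    # hypothetical charging slots each robot requires

prob = pulp.LpProblem("min_chargers", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", [(r, t) for r in robots for t in slots], cat="Binary")
k = pulp.LpVariable("stations", lowBound=0, cat="Integer")

prob += k                                              # minimize charging stations used
for r in robots:                                       # each robot gets its required slots
    prob += pulp.lpSum(x[(r, t)] for t in slots) == need[r]
for t in slots:                                        # concurrent charging limited by stations
    prob += pulp.lpSum(x[(r, t)] for r in robots) <= k

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.value(k))
```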
|
|
16:48-17:06, Paper MoCT2.2 | |
Lights As Points: Learning to Look at Vehicle Substructures with Anchor-Free Object Detection |
|
Keskar, Maitrayee | University of California, San Diego |
Greer, Ross | University of California, Merced |
Gopalkrishnan, Akshay | University of California San Diego |
Deo, Nachiket | UC San Diego |
Trivedi, Mohan | University of California San Diego (UCSD) |
Keywords: Autonomous Vehicle Navigation, Computer Vision for Transportation, Deep Learning for Visual Perception
Abstract: Vehicle detection is a paramount task for safe autonomous driving, as the ego-vehicle must localize other surrounding vehicles for safe navigation. Unlike other traffic agents, vehicles have necessary substructural components such as the headlights and tail lights, which can provide important cues about a vehicle’s future trajectory. However, previous object detection methods still treat vehicles as a single entity, ignoring these safety-critical vehicle substructures. Our research addresses the detection of substructural components of vehicles in conjunction with the detection of the vehicles themselves. Emphasizing the integral detection of cars and their substructures, our objective is to establish a coherent representation of the vehicle as an entity. Inspired by the CenterNet approach for human pose estimation, our model predicts object centers and subsequently regresses to bounding boxes and key points for the object. We evaluate multiple model configurations to regress to vehicle substructures on the ApolloCar3D dataset and achieve an average precision of 0.782 for the threshold of 0.5 using the direct regression approach.
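A rough NumPy sketch of CenterNet-style decoding: peaks of a center heatmap are selected and per-center keypoint offsets (e.g., light positions) are attached; the shapes, threshold, and offset encoding are assumptions, not the paper's exact output heads.

```python
import numpy as np

def decode_centers(heatmap, kp_offsets, k=10, thresh=0.3):
    """Pick the top-k peaks of a center heatmap and attach regressed keypoint
    offsets (e.g., headlight/tail-light positions relative to the vehicle center).
    Shapes: heatmap (H, W), kp_offsets (2*J, H, W) for J keypoints."""
    H, W = heatmap.shape
    flat = heatmap.ravel()
    top = np.argsort(flat)[::-1][:k]
    detections = []
    for idx in top:
        if flat[idx] < thresh:
            break
        cy, cx = divmod(idx, W)
        offs = kp_offsets[:, cy, cx].reshape(-1, 2)          # (J, 2) offsets in pixels
        keypoints = offs + np.array([cx, cy])
        detections.append({"center": (cx, cy), "score": float(flat[idx]),
                           "keypoints": keypoints})
    return detections

dets = decode_centers(np.random.rand(96, 320), np.random.randn(8, 96, 320) * 4)
```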
|
|
17:06-17:24, Paper MoCT2.3 | |
Maximum Next-State Entropy for Efficient Reinforcement Learning |
|
Zhong, Dianyu | Tsinghua University |
Yang, Yiqin | Institue of Automation, Chinese Academy of Sciences |
Zhang, Ziyou | Tsinghua University |
Jiang, Yuhua | Tsinghua University |
Xu, Bo | Institute of Automation, Chinese Academy of Sciences |
Zhao, Qianchuan | Tsinghua University |
Keywords: Reinforcement Learning, Machine Learning for Robot Control, Motion Control
Abstract: Entropy regularization is widely used to improve policy optimization and encourage exploration in reinforcement learning. By maximizing both the expected return and entropy, the agent aims to succeed at the task while acting as randomly as possible. However, current methods based on policy entropy encourage the agent to explore diverse actions, but they do not directly promote exploring diverse states. In this study, we theoretically reveal the challenge of optimizing the agent’s next-state entropy and the gap between maximum next-state entropy and current policy entropy regularization methods. To address this limitation, we introduce Maximum Next-State Entropy (MNSE), a novel method that maximizes next-state entropy through an action mapping layer following the inner policy. We provide a theoretical analysis demonstrating that MNSE can maximize next-state entropy by optimizing the entropy of the inner policy. We conduct extensive experiments on various continuous control tasks and demonstrate that MNSE can significantly improve the performance of RL algorithms.
|
|
17:24-17:42, Paper MoCT2.4 | |
Inverse Design of Snap-Actuated Jumping Robots Powered by Mechanics-Aided Machine Learning |
|
Tong, Dezhong | University of Michigan |
Hao, Zhuonan | University of California, Los Angeles |
Liu, Mingchao | Nanyang Technological University |
Huang, Weicheng | Newcastle University |
Keywords: Modeling, Control, and Learning for Soft Robots, Dynamics, Methods and Tools for Robot System Design
Abstract: Simulating soft robots offers a cost-effective approach to exploring their design and control strategies. While current models, such as finite element analysis, are effective in capturing soft robotic dynamics, the field still requires a broadly applicable and efficient numerical simulation method. In this paper, we introduce a discrete differential geometry-based framework for the model-based inverse design of a novel snap-actuated jumping robot. Our findings reveal that the snapping beam actuator exhibits both symmetric and asymmetric dynamic modes, enabling tunable robot trajectories (e.g., horizontal or vertical jumps). Leveraging this bistable beam as a robotic actuator, we propose a physics-data hybrid inverse design strategy to endow the snap-jump robot with a diverse range of jumping capabilities. By utilizing a physical engine to examine the effects of design parameters on jump dynamics, we then use extensive simulation data to establish a data-driven inverse design solution. This approach allows rapid exploration of parameter spaces to achieve targeted jump trajectories, providing a robust foundation for the robot’s fabrication. Our methodology offers a powerful framework for advancing the design and control of soft robots through integrated simulation and data-driven techniques.
|
|
17:42-18:00, Paper MoCT2.5 | |
Towards a Steerable Neurosurgical Robot for Debulking of Brain Mass Lesions |
|
Saini, Sarvesh | University of Miami |
Rezaeian, Saeed | University of California Riverside |
Akbari, Arshia | University of California Riverside |
Badie, Behnam | City of Hope National Medical Center |
Sheng, Jun | University of California Riverside |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Medical Robots and Systems, Soft Robot Applications
Abstract: Minimally invasive surgery is regarded as a safer approach than open craniotomy for removing deep intracerebral mass lesions such as hematomas. It is usually performed by introducing a straight suction tool, sometimes combined with accessories for tissue debridement and irrigation, into the brain. Since the collateral trauma to healthy tissue is proportional to the diameter of the tools, slender tools with small diameters are desired. However, current minimally invasive tools are inadequate for removal of large, multi-focal, and fibrous mass lesions. In this work, we present a new robotic surgical device for removing intracerebral mass lesions. The device consists of four concentric tubes. From outermost to innermost, they include a straight rigid stainless steel tube, a precurved superelastic nitinol tube with asymmetric notches, a braid-reinforced composite tube with tissue cutting holes at the tip, and a suction tube connected with a suction machine. A Pebax sleeve covers the notched area of the outer tube except the two most distal notches. By rotating and translating the notched nitinol tube, the robot tip can be manipulated inside a mass lesion. By concurrently rotating the cutting tube and applying negative pressure, tissues can be cut and removed through the suction tube. In this paper, we present our design and fabrication of this robotic device, kinematic modeling of the robot in terms of the rotation and translation of the notched tube and rotation of the cutting tube, and the results of feasibility studies, which show a 540% improvement in mass lesion removal efficiency.
|
|
MoCT3 |
Room T3 |
Automation for Enhanced Healthcare 2 |
Special Session |
Chair: Wang, Feifan | Tsinghua University |
Organizer: Wang, Feifan | Tsinghua University |
Organizer: Zhong, Xiang | University of Florida |
|
16:30-16:48, Paper MoCT3.1 | |
Fractional Order Modeling and Control of Type 1 Diabetes with Genetic Algorithm Optimization (I) |
|
Caponetto, Riccardo | University of Messina |
Patanè, Luca | University of Messina |
Koledin, Nebojsa | University of Messina |
Wrona, Andrea | Sapienza University of Rome |
Baldisseri, Federico | Sapienza University of Rome |
Delli Priscoli, Francesco | Sapienza University of Rome |
Keywords: Modelling, Simulation and Optimization in Healthcare, Health Care Management, Clinical and Operational Decision Support
Abstract: Type 1 diabetes mellitus is a condition in which blood glucose levels rise to dangerously high levels due to insufficient or absent insulin production. To regulate blood glucose levels, diabetic patients require exogenous insulin infusion. This paper presents a comparison of four control strategies for autonomous regulation of glycemia: integer-order proportional-integral (IO-PI), fractional-order proportional-integral (FO-PI), integer-order sliding mode control (IO-SMC), and fractional-order sliding mode control (FO-SMC). Genetic algorithms are employed to optimize the controllers, which are validated through numerical simulations on the fractional Bergman glucoregulatory model in the presence of meal disturbances. It is found that FO-SMC exhibits superior performance in terms of time in the normal glycemic range and time spent in hypoglycemia, also showing robustness properties with respect to meal increments.
|
|
16:48-17:06, Paper MoCT3.2 | |
Optimizing Network Simulation of Cardiac Electrical Dynamics (I) |
|
Liu, Runsang | Pennsylvania State University |
Yang, Hui | The Pennsylvania State University |
Keywords: Modelling, Simulation and Optimization in Healthcare, Automation in Life Science: Biotechnology, Pharmaceutical and Health Care, AI and Machine Learning in Healthcare
Abstract: Our recent study has discovered that the structural geometry of a heart can be effectively encoded into a network, which provides an opportunity to simulate cardiac electrical dynamics on a sparse adjacency matrix. While simulation models often require optimization to effectively explore “what-if” scenarios, the calibration of heart models presents greater levels of complexity. These models not only exhibit chaotic and nonstationary dynamics but are also computationally expensive, which poses significant challenges to traditional calibration methodologies. This paper presents a new statistical metamodeling framework for optimizing network simulation of cardiac electrical dynamics. We first introduce a new control parameter to characterize the heart network and regulate cell-to-cell communications. Next, a Gaussian process (GP) surrogate is developed to predict the simulation response and guide the selection of the next best parameter setting that yields the maximum expected improvement. This calibration process is iteratively performed until convergence, and the performance is evaluated and validated through case studies on 2D cardiac tissues and 3D hearts. Experimental results show that the proposed statistical metamodeling approach efficiently tailors network simulation to complex spatiotemporal dynamics.
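A minimal sketch of the expected-improvement acquisition that drives the parameter selection described above (written for minimization, using SciPy); the posterior means, standard deviations, and incumbent value are placeholders.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_f, xi=0.01):
    """EI acquisition for minimization: mu/sigma are the GP posterior mean and
    standard deviation at candidate parameter settings, best_f the incumbent."""
    sigma = np.maximum(sigma, 1e-12)
    imp = best_f - mu - xi
    z = imp / sigma
    return imp * norm.cdf(z) + sigma * norm.pdf(z)

# Placeholder posterior over 5 candidate control-parameter settings.
mu = np.array([0.42, 0.38, 0.55, 0.31, 0.47])
sigma = np.array([0.05, 0.12, 0.03, 0.20, 0.08])
next_idx = int(np.argmax(expected_improvement(mu, sigma, best_f=0.35)))
```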
|
|
17:06-17:24, Paper MoCT3.3 | |
An Analytical Framework for Image-Based Evaluation of Motor Function Rehabilitation (I) |
|
Zhao, Yishen | Tsinghua University |
Wang, Qing | Tsinghua University |
Ma, Lin | China Rehabilitation Research Center |
Li, Jingshan | Tsinghua University |
Keywords: Health Care Management, Modelling, Simulation and Optimization in Healthcare, Rehabilitation
Abstract: This abstract introduces an analytical framework for image-based automatic evaluation in motor rehabilitation. The framework utilizes a regular camera to capture and analyze a patient's motion, uses machine learning methods for status analysis, and generates an evaluation score for rehabilitation. Such a framework provides a quantitative tool to reduce the burden on physicians and improve system efficiency and patient outcomes in rehabilitation processes.
|
|
17:24-17:42, Paper MoCT3.4 | |
Leveraging Structured EHR Data and Machine Learning Methods for Improved Prediction and Interpretability in ASA-PS Classification (I) |
|
Zheng, Hanyi | Tsinghua University |
Wang, Qing | Tsinghua University |
Li, Jingshan | Tsinghua University |
Keywords: Health Care Management, AI and Machine Learning in Healthcare, AI-Based Methods
Abstract: This abstract explores classification prediction of the American Society of Anesthesiologists Physical Status (ASA-PS) using structured electronic health records (EHR) and machine learning techniques. The MOVER database, which contains patient demographic information, laboratory measurements, and diagnosis codes from 2017 to 2022, is used in the study. The diagnosis codes are transformed into sentence embeddings based on their official descriptions and integrated with other structured features to serve as inputs for machine learning models. XGBoost and an attention-based neural network (NN) are employed, offering varying levels of interpretability. This work provides a promising preoperative tool for anesthesiologists, offering early and interpretable predictions that reduce workload prior to patient evaluation.
|
|
17:42-18:00, Paper MoCT3.5 | |
Enhancing Prediction Accuracy of Surgery Duration Via Natural Language Models in Operating Rooms (I) |
|
Liu, Zhaoyang | Tsinghua University |
Wang, Qing | Tsinghua University |
Li, Jingshan | Tsinghua University |
Keywords: Health Care Management, AI and Machine Learning in Healthcare, AI-Based Methods
Abstract: This study develops a natural language processing (NLP) framework for surgery duration prediction in operating rooms using Mixture Density Networks (MDN). By incorporating Named Entity Recognition (NER) and word embeddings, with procedure names being the dominant predictive factor, the model achieves superior accuracy. This work demonstrates that significant improvement over traditional methods can be achieved using this approach, which enables more efficient operating room scheduling.
|
|
MoCT4 |
Room T4 |
3D Point Cloud Modeling 1 |
Special Session |
Chair: Biehler, Michael | University of Wisconsin - Madison |
Co-Chair: Wang, Yinan | RPI |
Organizer: Biehler, Michael | University of Wisconsin - Madison |
Organizer: Wang, Yinan | RPI |
|
16:30-16:48, Paper MoCT4.1 | |
Thickness Measurement Method for Panel Grids Based on Point Cloud Segmentation and Clustering (I) |
|
Zuo, Liling | Donghua University |
Zhang, Jie | Donghua University |
Ding, SiLong | Donghua University |
Cai, Hongyang | Donghua University |
Lyu, Youlong | Donghua University |
Keywords: AI-Based Methods, Big-Data and Data Mining, Computer Vision for Manufacturing
Abstract: The thickness measurement of panel grids plays an important role in ensuring their quality and load-bearing capacity. With the widespread application of 3D laser scanning technology, point cloud-based methods have provided an effective solution for accurate thickness measurement. This paper proposes an automated measurement method based on point cloud segmentation and clustering to estimate the thickness of multiple grids on the panel. First, an adaptive convolution-based point cloud segmentation network is designed to identify grid points from the input point cloud of the panel. This segmentation network features a position adaptive convolution module and a direction adaptive convolution module. The former dynamically encodes spatial positions to capture local geometric relationships, while the latter learns the direction of the normal vector to distinguish global geometric differences. Second, the extracted grid points are clustered using the DBSCAN algorithm to identify individual grid cells. Finally, the thickness of each grid cell is computed by measuring the distance between corresponding inner and outer surface points based on neighborhood normal vectors. Comparative experimental results demonstrate the effectiveness of the proposed thickness measurement method for panel grids.
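As an illustrative sketch of the clustering-plus-thickness step (a simplification of the paper's inner/outer surface pairing), the snippet below clusters segmented grid points with scikit-learn's DBSCAN and takes each cell's extent along its mean normal; eps and min_samples are assumed values that depend on scan resolution.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def grid_thicknesses(grid_points, normals, eps=0.005, min_samples=30):
    """Cluster segmented grid points into individual cells, then estimate each
    cell's thickness as the spread of its points along the mean normal direction."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(grid_points)
    thicknesses = {}
    for lbl in set(labels) - {-1}:                       # -1 marks noise points
        cell = grid_points[labels == lbl]
        n = normals[labels == lbl].mean(axis=0)
        n /= np.linalg.norm(n)
        proj = cell @ n                                  # signed distance along the normal
        thicknesses[lbl] = float(proj.max() - proj.min())
    return thicknesses

pts = np.random.rand(2000, 3) * 0.1                      # placeholder segmented grid points
nrm = np.tile([0.0, 0.0, 1.0], (2000, 1))                # placeholder per-point normals
print(grid_thicknesses(pts, nrm, eps=0.02))
```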
|
|
16:48-17:06, Paper MoCT4.2 | |
STGS: Spatio-Temporal Gaussian Splatting for Traffic Prediction (I) |
|
Cui, Songyi | The University of Hong Kong |
Yan, Yimo | The University of Hong Kong |
Kuo, Yong-Hong | The University of Hong Kong |
Keywords: Big-Data and Data Mining, Intelligent Transportation Systems, Data fusion
Abstract: Accurately predicting traffic speed is challenging due to the difficulty of integrating heterogeneous urban data sources and the data sparsity in certain regions. In this paper, we propose STGS, a novel framework that integrates three distinct modalities using a spatial Gaussian splatting mechanism. By diffusing latent “hotspots” across the urban grid, our approach effectively captures complex spatial correlations and fuses this enriched representation with gated recurrent networks for traffic prediction. Experimental results on real-world datasets demonstrate that STGS outperforms traditional unimodal and simple multimodal baselines, showcasing the effectiveness of our integrated spatial diffusion process for modeling spatiotemporal evolution. These findings underscore the potential of STGS to improve real-time decision-making and support sustainable urban traffic management.
|
|
17:06-17:24, Paper MoCT4.3 | |
Physics-Informed Attention-Enhanced Fourier Neural Operator for Solar Magnetic Field Extrapolations (I) |
|
Cao, Jinghao | New Jersey Institute of Technology |
Li, Qin | New Jersey Institute of Technology |
Du, Mengnan | New Jersey Institute of Technology |
Haimin, Wang | New Jersey Institute of Technology |
Shen, Bo | NJIT |
Keywords: AI-Based Methods, Big-Data and Data Mining
Abstract: We propose a Physics-informed Attention-enhanced Fourier Neural Operator (PIANO) for solving the Nonlinear Force-Free Field (NLFFF) problem in solar physics. PIANO leverages Efficient Channel Attention (ECA) with Dilated Convolutions (DC) to capture multimodal inputs by emphasizing critical channels for magnetic field variations. Moreover, physics-informed loss functions enforcing force-free and divergence-free conditions ensure that predictions adhere to the underlying physics. Experiments on the ISEE NLFFF dataset show that PIANO outperforms state-of-the-art neural operators in accuracy and consistently reproduces the physical characteristics of solar magnetic fields.
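A minimal NumPy sketch of the two physics-informed residuals named above, computed with finite differences on a gridded field; the grid spacing and loss weighting are assumptions, and the PIANO network itself is not reproduced.

```python
import numpy as np

def nlfff_residuals(B, dx=1.0):
    """Force-free and divergence-free residuals of a magnetic field sampled on a
    regular grid. B has shape (3, Nx, Ny, Nz); finite differences approximate curl/div."""
    def grad(f, axis):
        return np.gradient(f, dx, axis=axis)

    Bx, By, Bz = B
    curl = np.stack([grad(Bz, 1) - grad(By, 2),
                     grad(Bx, 2) - grad(Bz, 0),
                     grad(By, 0) - grad(Bx, 1)])        # J ~ curl(B) up to a constant
    force = np.cross(curl, B, axis=0)                   # force-free condition: J x B = 0
    div = grad(Bx, 0) + grad(By, 1) + grad(Bz, 2)       # solenoidal condition: div(B) = 0
    return np.mean(force ** 2), np.mean(div ** 2)

loss_ff, loss_div = nlfff_residuals(np.random.randn(3, 32, 32, 16))
total_physics_loss = loss_ff + loss_div                 # weights would be tuned in practice
```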
|
|
17:24-17:42, Paper MoCT4.4 | |
Multiscale Spatio-Temporal Changepoint Detection with Outliers (I) |
|
Yang, Kai | Medical College of Wisconsin |
Keywords: Surveillance Systems, Probability and Statistical Methods, Process Control
Abstract: Spatio-temporal changepoint detection is crucial for applications such as disease surveillance, environmental monitoring, and crime analysis. While most existing methods focus on analyzing data at a single geospatial scale (e.g., state, county, or census tract), they often struggle with detecting changepoints, particularly in the presence of outliers. In many applications, however, spatio-temporal data, such as incidence rates of infectious diseases, are typically managed across multiple geospatial scales to optimize data management, resulting in multiscale spatio-temporal datasets. Furthermore, large-scale spatio-temporal datasets are prone to outliers, which complicate changepoint detection. This talk introduces a robust method for sequential spatio-temporal process monitoring, designed to effectively handle both the multiscale structure of spatio-temporal data and the presence of outliers. The proposed method integrates a rank-based loss function with a spatial variable selection procedure to identify regions experiencing changes across different spatial scales. To enhance computational efficiency, a forward elimination strategy is proposed to leverage the hierarchical aggregation structure inherent in multiscale spatio-temporal data. The optimal spatial scale for process monitoring is determined based on the detected regions of change, using a novel scale information criterion. Finally, a charting statistic is constructed based on the detected changes identified at the optimal scale, enabling real-time detection of significant changes. The proposed method is not only robust to outliers but also highly scalable and computationally efficient, making it a reliable and effective tool for online monitoring of multiscale spatio-temporal data with outliers.
|
|
17:42-18:00, Paper MoCT4.5 | |
Sub-Surface Thermal Measurement in Additive Manufacturing Via Machine Learning-Enabled High-Resolution Fiber Optic Sensing (I) |
|
Wang, Rongxuan | Auburn University |
|
|
MoCT5 |
Room T5 |
Human-Robot and HCA 3 |
Regular Session |
Chair: Zhang, Zihan | Georgia Institute of Technology |
|
16:30-16:48, Paper MoCT5.1 | |
KoARob: Towards AI-Based Safety in Human-Robot-Collaborations |
|
Bermuth, Daniel | University of Augsburg |
Poeppel, Alexander | University of Augsburg |
Reif, Wolfgang | University of Augsburg |
Keywords: Collaborative Robots in Manufacturing, AI-Based Methods, Foundations of Automation
Abstract: As industries face changes in population and the need for better production efficiency arises, combining human workers with robots is becoming more common. In such collaborations, besides the user experience, the safety of the human worker is a critical aspect. This paper introduces an AI-based safety system designed to maintain a safe distance from human workers to prevent injuries. To circumvent the limitations of similar safety approaches, the system uses redundant methods to detect humans and their various joints. An evaluation in a real-world scenario shows that such an AI-based system can reliably detect humans and stop robots before a collision occurs. This work demonstrates that AI-based systems can be used for human detection in safety-related contexts and creates a basis for a new generation of safety systems that can enhance future human-robot collaborations.
|
|
16:48-17:06, Paper MoCT5.2 | |
Trustworthy Human-Robot Collaboration Programming Using a Gantt Chart-Based Domain-Specific Modeling Language |
|
Buchner, Lukas | University of Applied Sciences Upper Austria |
Zallinger, Philipp | University of Applied Sciences Upper Austria |
Nachbagauer, Karin | University of Applied Science Upper Austria |
Zoitl, Alois | Johannes Kepler University Linz |
Froschauer, Roman | University of Applied Sciences Upper Austria |
Keywords: Collaborative Robots in Manufacturing, Task Planning, Human-Centered Automation
Abstract: Programming human-robot collaboration (HRC) systems remains challenging for non-expert users. It often requires complex coding skills and provides limited visibility into robot behavior. We present the Robot Collaboration Language (RCL), a Gantt chart-based, domain-specific modeling language designed to facilitate intuitive and safe programming of collaborative tasks by non-experts. Using familiar Gantt chart representations, RCL aims to improve HRC programming transparency, predictability, and safety. We describe the language's meta-model and implementation and demonstrate its use with an example. Our approach contributes to developing trustworthy HRC systems by simplifying the complexity of programming.
|
|
17:06-17:24, Paper MoCT5.3 | |
Language-Guided Robust Navigation for Mobile Robots in Dynamically-Changing Environments |
|
Simons, Cody | University of California, Riverside |
Liu, Zhichao | University of California, Riverside |
Marcus, Brandon | University of California, Riverside |
Roy-Chowdhury, Amit | University of California, Riverside |
Karydis, Konstantinos | University of California, Riverside |
Keywords: Human-Centered Automation, AI-Based Methods, Autonomous Vehicle Navigation
Abstract: In this paper, we develop an embodied AI system for human-in-the-loop navigation using a wheeled mobile robot. We propose a direct yet effective method for monitoring the robot's current plan to detect changes in the environment that significantly impact the intended trajectory of the robot and then query a human for feedback. We also develop a means to parse human feedback expressed in natural language into local navigation waypoints and integrate it into a global planning system by leveraging a map of semantic features and an aligned obstacle map. Extensive testing in simulation and physical hardware experiments with a resource-constrained wheeled robot tasked to navigate in a real-world environment validate the efficacy and robustness of our method. This work can support applications such as precision agriculture and construction, where persistent monitoring of the environment provides users with information about the state of the environment.
|
|
17:24-17:42, Paper MoCT5.4 | |
Enabling Shared-Control for a Riding Ballbot System |
|
Chen, Yu | University of Illinois at Urbana-Champaign |
Mansouri, Mahshid | University of Illinois at Urbana-Champaign |
Xiao, Chenzhang | University of Illinois at Urbana-Champaign |
Wang, Ze | University of Illinois Urbana-Champaign |
Hsiao-Wecksler, Elizabeth T. | University of Illinois at Urbana-Champaign |
Norris, William | University of Illinois Urbana-Champaign |
Keywords: Human Performance Augmentation, Human Factors and Human-in-the-Loop, Human-Centered Automation
Abstract: This study introduces a shared-control approach for collision avoidance in the self-balancing riding ballbot, PURE, marked by its dynamic stability, omnidirectional movement, and hands-free interface. Integrating a sensor array with a novel Passive Artificial Potential Field (PAPF) method, PURE provides intuitive navigation with deceleration assistance and haptic/audio feedback, effectively mitigating collision risks. This approach addresses the limitations of traditional APF methods, such as control oscillations and unnecessary speed reduction in challenging scenarios. A human subject test, with 20 manual wheelchair users and able-bodied individuals, was conducted to evaluate the performance of indoor navigation and obstacle avoidance with the proposed shared-control algorithm. Results showed that shared-control significantly reduced collisions and cognitive load without affecting travel speed, offering intuitive and safe operation. These findings highlight the shared-control system’s suitability for enhancing collision avoidance in self-balancing mobility devices, a relatively unexplored area in assistive mobility research.
|
|
17:42-18:00, Paper MoCT5.5 | |
Enhancing Hand Palm Motion Gesture Recognition by Eliminating Reference Frame Bias Via Frame-Invariant Similarity Measures |
|
Verduyn, Arno | KU Leuven |
Vochten, Maxim | KU Leuven |
De Schutter, Joris | KU Leuven |
Keywords: Human-Centered Automation, Machine learning
Abstract: The ability of robots to recognize human gestures facilitates a natural and accessible human-robot collaboration. However, most work in gesture recognition remains rooted in reference frame-dependent representations. This poses a challenge when reference frames vary due to different work cell layouts, imprecise frame calibrations, or other environmental changes. This paper investigated the use of invariant trajectory descriptors for robust hand palm motion gesture recognition under reference frame changes. First, a novel dataset of recorded Hand Palm Motion (HPM) gestures is introduced. The motion gestures in this dataset were specifically designed to be distinguishable without dependence on specific reference frames or directional cues. Afterwards, multiple invariant trajectory descriptor approaches were benchmarked to assess how their performance generalizes to this novel HPM dataset. After this offline benchmarking, the best scoring approach is validated for online recognition by developing a real-time Proof of Concept (PoC). In this PoC, hand palm motion gestures were used to control the real-time movement of a manipulator arm. The PoC demonstrated a high recognition reliability in real-time operation, achieving an F1-score of 92.3%. This work demonstrates the effectiveness of the invariant descriptor approach as a standalone solution. Moreover, we believe that the invariant descriptor approach can also be utilized within other state-of-the-art pattern recognition and learning systems to improve their robustness against reference frame variations.
|
|
MoCT6 |
Room T6 |
Detection, Estimation and Prediction 2 |
Regular Session |
Chair: Wang, Yongjing | University of Birmingham |
|
16:30-16:48, Paper MoCT6.1 | |
Simulation-To-Reality Hyperparameter Optimization of MPPI Controllers Via Bayesian Optimization in NVIDIA Omniverse Isaac Sim |
|
Ruhe, Maximilian | Proximity Robotics & Automation GmbH |
Alba, Kathrin | Proximity Robotics & Automation GmbH |
Kipfmüller, Martin | Karlsruhe University of Applied Sciences |
Mamaev, Ilshat | Proximity Robotics & Automation GmbH |
Keywords: Simulation and Animation, Optimization and Optimal Control, Motion Control
Abstract: Autonomous mobile robots navigating in dynamic environments require robust and efficient control strategies. Model Predictive Path Integral (MPPI) control offers flexibility and computational efficiency but relies on the careful tuning of multiple hyperparameters, which is traditionally performed through heuristic approaches. In this paper, we present an automated and systematic method for MPPI hyperparameter tuning using Bayesian Optimization (BO) within a ROS 2 and Nav2-based framework. By leveraging a high-fidelity digital twin implemented in NVIDIA Omniverse Isaac Sim, our method automates and accelerates hyperparameter tuning, significantly enhancing trajectory smoothness, reducing control effort, and improving overall navigation efficiency. Experimental validation on a real differential-drive robot demonstrates strong consistency between simulation-optimized parameters and real-world performance, confirming the effectiveness of our simulation-to-reality approach. This work provides a practical and reproducible method for integrating BO with ROS 2 and Nav2, enabling streamlined deployment and adaptive tuning of MPPI controllers in real-world robotic applications.
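As a sketch of the Bayesian-optimization-over-simulation loop (using scikit-optimize rather than the authors' framework), the snippet below tunes three hypothetical MPPI hyperparameters against a placeholder episode-cost function standing in for the Isaac Sim digital twin; the parameter names, ranges, and cost are assumptions.

```python
from skopt import gp_minimize
from skopt.space import Real, Integer

def run_episode(temperature, noise_std, horizon):
    """Placeholder cost standing in for one simulated navigation episode's
    tracking error plus control effort; the real evaluation runs in the digital twin."""
    return (temperature - 0.3) ** 2 + (noise_std - 0.2) ** 2 + 0.001 * horizon

space = [Real(0.01, 1.0, name="temperature", prior="log-uniform"),
         Real(0.05, 1.0, name="noise_std"),
         Integer(20, 80, name="horizon")]

def objective(params):
    temperature, noise_std, horizon = params
    return run_episode(temperature, noise_std, int(horizon))

result = gp_minimize(objective, space, n_calls=40, random_state=0)
print(result.x, result.fun)                  # best hyperparameters and their cost
```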
|
|
16:48-17:06, Paper MoCT6.2 | |
Retrieval-Augmented Generation Using Knowledge Graphs for Manufacturing Problem-Solving |
|
Meister, Frederic | Fraunhofer Institute for Casting, Composite and Processing Techn |
Khanal, Parikshit | Fraunhofer Institute for Casting, Composite and Processing Techn |
Trauner, Ludwig | Fraunhofer Institute for Casting, Composite and Processing Techn |
Daub, Rüdiger | Technical University of Munich (TUM, Fraunhofer IGCV |
Keywords: Failure Detection and Recovery, Causal Models, AI-Based Methods
Abstract: The increasing variety of products has resulted in a rise in production errors, primarily due to more complexity in manufacturing processes. This paper proposes a data-driven inline problem-solving approach to mitigate the response times associated with these errors. Problem-solving is initiated by detecting anomalies within processes by an autoencoder model. Upon identifying these anomalies, the proposed approach employs causal inference using a Failure Mode and Effects Analysis (FMEA)-based Bayesian Network (BN) to determine potential root causes. The inferred causes, along with the user's problem description, are processed within a hybrid Retrieval-Augmented Generation (RAG) framework. The RAG produces two sets of retrievals: one by querying a Knowledge Graph (KG) containing historic eight discipline-based (8D) problem-solving data to extract failure information and relationships; the other through keyword similarity and vector search techniques. The combined retrievals, along with the results from the BN, are then input for a relatively small-scale Large Language Model (LLM) from Mistral. The findings indicate that this approach achieves accurate information retrieval and provides reliable outputs, even when problem descriptions are vague.
|
|
17:06-17:24, Paper MoCT6.3 | |
A Unified Framework for Real-Time Failure Handling in Robotics Using Vision-Language Models, Reactive Planner and Behavior Trees |
|
Ahmad, Faseeh | Lund University |
Ismail, Hashim | Lund University |
Styrud, Jonathan | ABB |
Stenmark, Maj | Lund University |
Krueger, Volker | Lund University |
Keywords: Failure Detection and Recovery, Behavior-Based Systems, Collaborative Robots in Manufacturing
Abstract: Robotic systems often face execution failures due to unexpected obstacles, sensor errors, or environmental changes. Traditional failure recovery methods rely on predefined strategies or human intervention, making them less adaptable. This paper presents a unified failure recovery framework that combines Vision-Language Models (VLMs), a reactive planner, and Behavior Trees (BTs) to enable real-time failure handling. Our approach includes proactive pre-execution verification, which checks for potential failures before execution, and reactive failure handling, which detects and corrects failures during execution by inferring missing preconditions and, when necessary, generating new skills. The framework uses a scene graph for structured environmental perception and an execution history for continuous monitoring, enabling context-aware and adaptive failure handling. We evaluate our framework through real-world experiments with an ABB YuMi robot on tasks like peg insertion, object sorting, and drawer placement, as well as simulation benchmarks in AI2-THOR. Compared to proactive-only, reactive-only, and post-execution recovery methods, our approach achieves higher task success rates and improves adaptability. Ablation studies highlight the importance of VLM-based reasoning, structured scene representation, and execution history tracking for effective failure recovery in robotics.
|
|
17:24-17:42, Paper MoCT6.4 | |
Leveraging CVAE for Joint Configuration Estimation of Multifingered Grippers from Point Cloud Data |
|
Mérand, Julien | Université Paris-Saclay, CEA, List |
Meden, Boris | Université Paris Saclay, CEA, LIST, F-91120 Palaiseau, France |
Grossard, Mathieu | Université Paris-Saclay, CEA, List |
Keywords: Model Learning for Control, AI-Based Methods, Deep Learning in Robotics and Automation
Abstract: This paper presents an efficient approach for determining the joint configuration of a multifingered gripper solely from the point cloud data of its poly-articulated chain, as generated by visual sensors, simulations or even generative neural networks. Well-known inverse kinematics (IK) techniques can provide mathematically exact solutions (when they exist) for joint configuration determination based solely on the fingertip pose, but often require post-hoc decision-making by considering the positions of all intermediate phalanges in the gripper's fingers, or rely on algorithms to numerically approximate solutions for more complex kinematics. In contrast, our method leverages machine learning to implicitly overcome these challenges. This is achieved through a Conditional Variational Auto-Encoder (CVAE), which takes point cloud data of key structural elements as input and reconstructs the corresponding joint configurations. We validate our approach on the MultiDex grasping dataset using the Allegro Hand, operating within 0.05 milliseconds and achieving accuracy comparable to state-of-the-art methods. This highlights the effectiveness of our pipeline for joint configuration estimation within the broader context of AI-driven techniques for grasp planning.
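A minimal CVAE sketch in PyTorch is given below, assuming the point cloud has already been pooled into a fixed-length embedding (e.g., by a PointNet-style encoder); layer sizes and the latent dimension are illustrative assumptions, while the 16-dimensional output matches the Allegro Hand's 16 actuated joints.
```python
# Illustrative CVAE sketch (PyTorch); the paper's exact architecture and
# dimensions are not reproduced here.
import torch
import torch.nn as nn

class JointConfigCVAE(nn.Module):
    def __init__(self, cond_dim=256, joint_dim=16, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(cond_dim + joint_dim, 128), nn.ReLU(),
            nn.Linear(128, 2 * latent_dim),            # -> (mu, log_var)
        )
        self.decoder = nn.Sequential(
            nn.Linear(cond_dim + latent_dim, 128), nn.ReLU(),
            nn.Linear(128, joint_dim),                  # reconstructed joint angles
        )
        self.latent_dim = latent_dim

    def forward(self, cloud_emb, joints):
        h = self.encoder(torch.cat([cloud_emb, joints], dim=-1))
        mu, log_var = h.chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)   # reparameterize
        recon = self.decoder(torch.cat([cloud_emb, z], dim=-1))
        return recon, mu, log_var

    @torch.no_grad()
    def estimate(self, cloud_emb):
        # At inference, decode from the prior mean (z = 0) for a point estimate.
        z = torch.zeros(cloud_emb.shape[0], self.latent_dim)
        return self.decoder(torch.cat([cloud_emb, z], dim=-1))

model = JointConfigCVAE()
joints = torch.randn(4, 16)                              # toy training batch
recon, mu, log_var = model(torch.randn(4, 256), joints)
kl = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())
loss = nn.functional.mse_loss(recon, joints) + kl        # reconstruction + KL (ELBO)
```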
|
|
17:42-18:00, Paper MoCT6.5 | |
Predicting Bifurcation Points of Evolving Model Uncertainty in Online Alarm Flood Classification |
|
Manca, Gianluca | Ruhr University Bochum |
Kunze, Franz Christopher | Ruhr University Bochum |
Fay, Alexander | Ruhr University Bochum |
Keywords: Machine learning, Diagnosis and Prognostics, Probability and Statistical Methods
Abstract: Online alarm flood classification (AFC) methods help to address the challenge of alarm floods in automated industrial systems by assigning observed alarm sequences to known alarm flood classes. However, AFC methods can encounter temporary class ambiguities, where limited data introduces substantial uncertainty. Recent research addressed this issue by integrating conformal prediction (CP) with AFC, dynamically producing sets of plausible alarm flood classes. Despite their effectiveness, these approaches lack predictive insights into how the model uncertainty might evolve, thereby leaving operators unsure about the optimal timing for decision-making. To resolve this limitation, we propose a novel method utilizing random forest regression models to predict bifurcation points, defined as future time steps at which at least one previously plausible class can be excluded. Additionally, we introduce a delay timer to stabilize prediction sets, significantly reducing chattering and unnecessary fluctuations in the predicted class sets. In a preliminary evaluation on a synthetic dataset using three AFC methods from the literature, we demonstrate effective estimation of upcoming bifurcation points and improved stability of the predictions.
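The sketch below shows the two ingredients in simplified form: a random-forest regressor predicting the bifurcation point from assumed sequence features, and a delay timer that only commits a changed prediction set after it has persisted for a fixed number of steps; the features, labels, and delay value are all illustrative assumptions.
```python
# Sketch under assumed feature/label definitions; the paper's exact features
# and timer logic are not reproduced.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                     # e.g., alarm-sequence features
y = rng.integers(1, 30, size=200).astype(float)   # steps until next class exclusion
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print("predicted bifurcation point:", model.predict(X[:1])[0])

class DelayedPredictionSet:
    """Commit a changed prediction set only after it persists for `delay` steps."""
    def __init__(self, delay=3):
        self.delay, self.current, self.candidate, self.count = delay, None, None, 0

    def update(self, new_set):
        new_set = frozenset(new_set)
        if self.current is None or new_set == self.current:
            self.current, self.candidate, self.count = new_set, None, 0
        elif new_set == self.candidate:
            self.count += 1
            if self.count >= self.delay:              # the change has persisted
                self.current, self.candidate, self.count = new_set, None, 0
        else:
            self.candidate, self.count = new_set, 1
        return set(self.current)

stabilizer = DelayedPredictionSet(delay=3)
for step_set in [{1, 2, 3}, {1, 2}, {1, 2, 3}, {1, 2}, {1, 2}, {1, 2}]:
    print(stabilizer.update(step_set))                # chattering is suppressed
```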
|
|
MoCT7 |
Room T7 |
Planning and Control for Semiconductor Mfg |
Special Session |
Chair: Moench, Lars | University of Hagen |
Organizer: Moench, Lars | University of Hagen |
Organizer: Yugma, Claude | Ecole Des Mines De Saint-Etienne |
|
16:30-16:48, Paper MoCT7.1 | |
Analysis for Steady Schedule Convergence against Random Time Disruptions in a Dual-Armed Cluster Tool (I) |
|
Kim, Min-Chan | Korea Advanced Institute of Science and Technology |
Kim, Hyun-Jung | Korea Advanced Institute of Science and Technology |
Keywords: Semiconductor Manufacturing, Petri Nets for Automation Control, Discrete Event Dynamic Automation Systems
Abstract: Cluster tools are single-wafer processing systems widely used in semiconductor manufacturing. These tools typically operate under a steady regime, where the 1-cyclic schedule is the most desirable for maintaining stable and efficient production, as it ensures uniform quality and higher yield. However, time disruptions, such as process time delays, can destabilize the system, potentially leading to deviations from the 1-cyclic schedule. This study identifies the threshold on time disturbances below which the system naturally recovers to a 1-cyclic schedule without increasing cycle time. Furthermore, we show that dynamically adjusting the start times of robot tasks contributes to system stability, ensuring consistent productivity despite disturbances. Our work lays the theoretical groundwork for a dynamic wafer loading control method that adapts to varying disturbance levels, ensuring stable and efficient tool operation.
|
|
16:48-17:06, Paper MoCT7.2 | |
Forecasting Wafer Fab Outputs Using Lot Remaining Cycle Time Prediction in Semiconductor Manufacturing (I) |
|
Wartelle, Adrien | Mines Saint-Etienne |
Dauzere-Peres, Stephane | Mines Saint-Etienne |
Yugma, Claude | Ecole Des Mines De Saint-Etienne |
Christ, Quentin | STMicroelectronics Crolles |
Roussel, Renaud | STMicroelectronics Crolles |
Keywords: Semiconductor Manufacturing, Planning, Scheduling and Coordination, Probability and Statistical Methods
Abstract: Semiconductor manufacturing is a crucial component of modern supply chains, yet it faces significant challenges due to the complexity and variability of production processes. This study addresses the problem of forecasting wafer fab outputs by predicting the remaining cycle time (RCT) of lots in semiconductor manufacturing. We propose a statistical modeling approach that leverages historical data to estimate lot departures and output quantities for a given forecasting horizon. The study focuses on the 20 highest-volume products in a high-mix, low-volume production environment, using data from a major wafer fab in southeast France. A total of 26 linear regression models are evaluated, considering different contexts as well as input and output variables, to predict RCTs and forecast lot departures. The results indicate that the best-performing model achieves a forecasting accuracy of 41.55% for an 8-week forecasting horizon and 33.39% for a 12-week forecasting horizon, comparable to what is found in the literature. However, the study also highlights the challenges of extrapolating training data to future periods, particularly due to variability in product prioritization and system congestion. Future work will focus on integrating additional variables, such as congestion metrics and due dates, as well as exploring non-linear models to enhance forecasting accuracy and robustness. This research provides a foundational framework for wafer fab output forecasting, essential for effective production management and supply chain stability in the semiconductor industry.
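A minimal version of the forecasting pipeline, on synthetic data rather than the fab dataset, is sketched below: a linear model predicts each lot's RCT from assumed context features, and the predicted completion times are binned into weekly output counts.
```python
# Minimal sketch with synthetic data (not the fab dataset); features and scales
# are assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
# Assumed features per lot: completed steps, remaining steps, WIP ahead, priority.
X = rng.normal(size=(500, 4))
true_w = np.array([-2.0, 5.0, 3.0, -1.0])
y = X @ true_w + 40 + rng.normal(scale=2.0, size=500)    # RCT in days
model = LinearRegression().fit(X, y)

current_lots = rng.normal(size=(60, 4))                  # lots currently in the fab
rct_days = np.clip(model.predict(current_lots), 0, None)
weeks_out = (rct_days // 7).astype(int)                  # bin completions by week
horizon = 12
weekly_outs = np.bincount(weeks_out, minlength=horizon)[:horizon]
for w, n in enumerate(weekly_outs, start=1):
    print(f"week {w}: {n} lots forecast to complete")
```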
|
|
17:06-17:24, Paper MoCT7.3 | |
A Metaheuristic Approach for a Flexible Flow Shop Scheduling Problem with Batch Processing Machines and Maximal Time Lags (I) |
|
Rocholl, Jens | University of Hagen |
Moench, Lars | University of Hagen |
Keywords: Planning, Scheduling and Coordination, Semiconductor Manufacturing
Abstract: A flexible flow shop scheduling problem motivated by process conditions in semiconductor wafer fabrication facilities (wafer fabs) is considered. Maximal time lags between consecutive operations are respected. A biased random-key genetic algorithm (BRKGA) is hybridized with list scheduling to minimize the total weighted tardiness of the jobs. In addition to this job-based decomposition approach, another decomposition approach using mixed integer linear programming (MILP) is applied for comparison. Results of computational experiments based on randomly generated problem instances are presented that show that the proposed metaheuristic outperforms the MILP-based decomposition approach.
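The random-key decoding idea at the core of a BRKGA is sketched below on a toy single-machine weighted-tardiness instance; the hybrid list scheduling, the flow-shop constraints, and the full BRKGA operators (elite copying, biased crossover, mutants) are not reproduced.
```python
# Sketch of random-key decoding only; instance data are synthetic assumptions.
import numpy as np

def decode(keys):
    return np.argsort(keys)                      # job sequence from random keys

def weighted_tardiness(seq, proc, due, weight):
    t, twt = 0.0, 0.0
    for j in seq:
        t += proc[j]
        twt += weight[j] * max(0.0, t - due[j])
    return twt

rng = np.random.default_rng(2)
n_jobs, pop_size = 8, 20
proc = rng.uniform(1, 5, n_jobs)
due = rng.uniform(5, 20, n_jobs)
weight = rng.uniform(1, 3, n_jobs)

population = rng.random((pop_size, n_jobs))      # random-key chromosomes in [0, 1)
fitness = [weighted_tardiness(decode(ch), proc, due, weight) for ch in population]
best = population[int(np.argmin(fitness))]
print("best sequence:", decode(best), "TWT:", min(fitness))
```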
|
|
17:24-17:42, Paper MoCT7.4 | |
Using Genetic Programming for Solving a Two-Stage Flexible Flow Shop Scheduling Problem with Maximal Time Lags (I) |
|
Schorn, Daniel | University of Hagen |
Moench, Lars | University of Hagen |
Keywords: Planning, Scheduling and Coordination, Semiconductor Manufacturing
Abstract: A scheduling problem for a two-stage flexible flow shop with maximal time lags between consecutive operations motivated by process conditions found in semiconductor wafer fabrication facilities (wafer fabs) is considered. The jobs have unequal ready times and both initial and inter-stage time lags. The performance measure is the total weighted tardiness (TWT). A heuristic scheduling framework using genetic programming (GP) to automatically discover priority indices is designed. Computational experiments based on randomly generated problem instances are carried out. The GP scheme is compared with a reference heuristic based on a biased random key genetic algorithm (BRKGA) combined with a backtracking procedure and a constraint programming (CP)-based decomposition approach. The results show that high-quality schedules are obtained in a short amount of computing time (CT) using the GP scheme.
|
|
17:42-18:00, Paper MoCT7.5 | |
Minimizing Total Weighted Tardiness for a Multiple-Orders-Per-Job Scheduling Problem with Unequal Release Times (I) |
|
Korde, Rohan | Arizona State University |
Fowler, John | Arizona State University |
Moench, Lars | University of Hagen |
Keywords: Semiconductor Manufacturing, Planning, Scheduling and Coordination
Abstract: We minimize total weighted tardiness of customer orders with unequal release times in a two-stage permutation flow shop. We solve this problem using a mixed integer linear programming (MILP) formulation, a hybrid heuristic, and a metaheuristic. Computational experiments based on randomly generated problem instances are used to compare the solution approaches. We find that the metaheuristic provides high-quality solutions in a short amount of computing time.
|
|
MoCT8 |
Room T8 |
Social and Intelligent Manufacturing 2 |
Special Session |
Chair: Wang, Di | South China University of Technology |
Co-Chair: Lin, Weizhi | University of Southern California |
Organizer: Jiang, Zhibin | Shanghai Jiao Tong University |
Organizer: Zhou, Liping | Shanghai Jiao Tong University |
|
16:30-16:48, Paper MoCT8.1 | |
Environment-Aware Continual Transfer Learning for Real-Time Defect Detection in 3D Printing (I) |
|
Li, Hongyu | CASIA |
Shen, Zhen | Institute of Automation, Chinese Academy of Sciences |
Fang, Qihang | Institute of Automation, Chinese Academy of Sciences |
Guo, Jinyuan | Institute of Automation, Chinese Academy of Sciences |
Dong, Xisong | Institute of Automation, Chinese Academy of Sciences |
Wang, Di | South China University of Technology |
Xiong, Gang | Institute of Automation, Chinese Academy of Sciences |
Wang, Feiyue | Institute of Automation, Chinese Academy of Sciences |
Keywords: Machine learning, Computer Vision in Automation, Additive Manufacturing
Abstract: In 3D printing material defect detection, environmental variations frequently induce false positives and missed detections by automated systems. Current research addresses dynamic environments by fine-tuning models during the detection phase. However, existing methods suffer from delayed adaptation: models require prolonged iterations to achieve stable performance in new environments. Furthermore, noise-contaminated pseudo-labels during domain adaptation exacerbate error accumulation and catastrophic forgetting, leading to severe performance degradation of the model. To address these challenges, we propose a robust and rapid adaptive detection framework tailored for dynamic environments. First, we innovatively employ the Gram matrix of model feature layers to quantify environmental shifts, endowing the model with real-time environmental awareness. Second, the dynamically maintained sample buffer ensures that stored samples satisfy three critical properties: pseudo-label reliability, class-balanced distribution, and diverse environmental representation. This mechanism selects samples most representative of new environments for fine-tuning, significantly accelerating adaptation. Experimental results demonstrate that our method achieves superior detection accuracy on real-world 3D printing material datasets under complex scenarios (e.g., sudden illumination changes and environmental shifts). Compared to baseline Test-Time Adaptation (TTA) methods, it exhibits enhanced adaptability and robustness.
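A simplified sketch of the Gram-matrix shift signal is shown below, with random tensors standing in for backbone activations and an assumed detection threshold; the paper's buffer-maintenance and fine-tuning logic is not reproduced.
```python
# Illustrative sketch (assumed interfaces): compare the Gram matrix of a feature
# map on the current frame against a reference Gram from the known environment.
import torch

def gram_matrix(features):
    # features: (C, H, W) feature map from some layer of the detector backbone.
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return (f @ f.T) / (c * h * w)

def shift_score(current_feat, reference_gram):
    return torch.norm(gram_matrix(current_feat) - reference_gram, p="fro").item()

# Toy usage with random tensors standing in for backbone activations.
reference_frames = [torch.randn(64, 32, 32) for _ in range(10)]
reference_gram = torch.stack([gram_matrix(f) for f in reference_frames]).mean(0)

new_frame_feat = torch.randn(64, 32, 32) * 1.5            # pretend illumination change
if shift_score(new_frame_feat, reference_gram) > 0.05:    # threshold is an assumption
    print("environment shift detected -> trigger buffer-based fine-tuning")
```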
|
|
16:48-17:06, Paper MoCT8.2 | |
Postpone or Not: Dynamic Scheduling for Single Additive Manufacturing Machine with Monte Carlo Tree Search (I) |
|
Wu, Hao | Tongji University |
Yu, Chunlong | Tongji University |
Keywords: Additive Manufacturing, Optimization and Optimal Control
Abstract: Additive manufacturing (AM) faces high costs, driving small enterprises toward social manufacturing by pooling distributed orders to spread out fixed costs. However, due to stochastic order arrivals in such systems, real-time scheduling must balance immediate processing to minimize delays with strategic postponement to facilitate cost-effective batch production. This study proposes a dynamic scheduling method for single-AM-machine operators. We first formulate the problem as a Markov Decision Process, then develop a direct lookahead policy based on Monte Carlo Tree Search to approximate optimal decisions, effectively managing the trade-off between operational costs and tardiness penalties.
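As a heavily simplified stand-in for the MCTS-based lookahead (a one-step Monte Carlo rollout comparison rather than a tree search), the sketch below estimates whether processing the current queue now or postponing it yields lower expected cost; the Poisson arrival model and all cost parameters are assumptions.
```python
# Simplified rollout-based postpone-or-not decision; not the paper's MCTS.
import numpy as np

rng = np.random.default_rng(3)
SETUP_COST, WAIT_PENALTY, ARRIVAL_RATE, BATCH_TARGET = 100.0, 2.0, 0.6, 6  # assumed

def rollout(first_action, queued, horizon=20):
    """Simulate one arrival scenario; first_action is 'now' or 'postpone'."""
    cost, q = 0.0, queued
    for t in range(horizon):
        start = (t == 0 and first_action == "now") or q >= BATCH_TARGET
        if start and q > 0:
            cost += SETUP_COST            # one machine setup per batch started
            q = 0
        cost += WAIT_PENALTY * q          # waiting/tardiness accrued this period
        q += rng.poisson(ARRIVAL_RATE)    # stochastic new orders arrive
    return cost

def expected_cost(first_action, queued, n_rollouts=500):
    return float(np.mean([rollout(first_action, queued) for _ in range(n_rollouts)]))

queued_orders = 3
c_now = expected_cost("now", queued_orders)
c_wait = expected_cost("postpone", queued_orders)
print(f"decision: {'process now' if c_now <= c_wait else 'postpone'} "
      f"(E[cost now]={c_now:.1f}, E[cost postpone]={c_wait:.1f})")
```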
|
|
17:06-17:24, Paper MoCT8.3 | |
Quantifying Overlay Printing Registration Accuracy with Object Keypoint Detection for Automated Process Control in FPE Printing (I) |
|
Kim, Juhuhn | Korea Advanced Institute of Science and Technology |
Jung, Younsu | Sungkyunkwan University |
Parajuli, Sajjan | Sungkyunkwan University |
Shrestha, Sagar | Sungkyunkwan University |
Park, Jinhwa | Sungkyunkwan University |
Cho, Gyoujin | Sungkyunkwan University |
Lee, Jong-Seok | Korea Advanced Institute of Science and Technology |
Keywords: Process Control, Computer Vision for Manufacturing, Deep Learning in Robotics and Automation
Abstract: Achieving high-precision overlay printing registration accuracy (OPRA) is a critical challenge in the flexible printed electronics (FPE) printing process, particularly in roll-to-roll (R2R) gravure printing. Conventional OPRA quantification methods, primarily based on template matching, suffer from instability under real-world conditions, such as poor contrast, severe noise, and morphological variations in printed register markers. In this study, we propose a deep learning-based framework for marker detection and OPRA quantification, addressing key limitations of traditional approaches. Our method enables accurate localization of marker centers, overcoming inaccuracies caused by ink translucency, occlusion, and motion-induced blurring. Furthermore, it facilitates automatic real-time OPRA assessment, enabling statistical process control in FPE printing. Experimental evaluations demonstrate the superior robustness and reliability of the proposed approach.
|
|
17:24-17:42, Paper MoCT8.4 | |
Heterogeneous Graph Neural Network with Dual-View Fusion for Machine-Robot Collaborative Scheduling (I) |
|
Xiao, Meng | Shanghai Jiao Tong University |
Chen, Nuo | Shanghai Jiao Tong University |
Chen, Lu | Shanghai Jiao Tong University |
Keywords: Planning, Scheduling and Coordination, AI-Based Methods, Intelligent and Flexible Manufacturing
Abstract: In recent years, deep reinforcement learning (DRL), by integrating neural network technologies, has made significant progress in the field of production scheduling and represents a promising approach. However, current DRL-based solutions lack sufficient analysis of the heterogeneity among production factors in scheduling problems, resulting in suboptimal representations of scheduling states received by DRL algorithms during training. To address this issue, this paper proposes a heterogeneous graph neural network with dual-view fusion for the machine-robot collaborative scheduling problem. It effectively resolves the heterogeneity among the three production factors (operations, machines, and logistics robots) through three neural network modules, thereby providing high-quality representations of scheduling states for DRL algorithms. Experimental results demonstrate that the proposed method outperforms a representative metaheuristic algorithm (genetic algorithm), a priority dispatching rule (first-in, first-out), and existing DRL-based methods, both on small-scale problems and in generalization to large-scale scenarios.
|
|
17:42-18:00, Paper MoCT8.5 | |
Transfer Learning-Driven Scalability for Indoor Positioning Systems in Industrial Environments (I) |
|
Li, Peisen | The Hong Kong Polytechnic University |
Liu, Haoran | Nanyang Technological University |
Guo, Wei | The Hong Kong Polytechnic University,Department of the In |
Yue, Pengjun | The Hong Kong Polytechnic University |
Shen, Leidi | Hong Kong Polytechnic University |
Zhao, Zhiheng | The Hong Kong Polytechnic University |
Huang, George Q. | The Hong Kong Polytechnic University |
Keywords: Cyber-physical Production Systems and Industry 4.0, Intelligent and Flexible Manufacturing, Logistics
Abstract: Accurate and timely spatial-temporal data enables organizations to improve operational efficiency by optimizing production processes, monitoring worker safety, and managing resources. Indoor positioning systems (IPS) are critical for acquiring such data in environments where GNSS does not work. However, traditional IPS implementations often involve substantial effort in signal collection and system calibration, which can be challenging when production layouts change frequently or expand. This research introduces a transfer learning-enabled indoor positioning system (TLIPS) to address these challenges. TLIPS reduces the need for extensive data collection and system calibration by leveraging operational knowledge from existing environments. By applying transfer learning (TL), TLIPS enables the system to adapt and calibrate automatically in new environments, using minimal new data. This significantly reduces manual intervention and calibration time. The effectiveness of TLIPS was tested and validated through deployments in both experimental testbed (source environment) and new environment (target environment). Results demonstrated a marked reduction in calibration time and costs, highlighting the efficiency and adaptability of TLIPS. This approach offers a scalable and efficient solution for IPS deployment across diverse industrial environments, making the system more flexible and less dependent on frequent manual adjustments.
|
|
MoCT9 |
Room T9 |
Trajectory, Object, and Position 3 |
Regular Session |
Chair: Khanal, Abhish | George Mason University |
|
16:30-16:48, Paper MoCT9.1 | |
Cross-Scale Clustering and Neighborhood-Weighted Motion Pattern Extraction for Radar Trajectory Analysis |
|
Wang, Ziqian | University of Chinese Academy of Sciences |
Zhang, Lifang | The Chinese People's Liberation Army |
Guo, Yuxin | University of Chinese Academy of Sciences |
Su, Hu | Institute of Automation, Chinese Academy of Science |
Zou, Wei | Chinese Academy of Sciences, University of Chinese Academy of Sci |
Ma, Hongxuan | Institute of Automation, Chinese Academy of Sciences |
Keywords: Big-Data and Data Mining, Probability and Statistical Methods, Optimization and Optimal Control
Abstract: Clustering and motion pattern extraction of radar trajectories with the same flight route are of great significance for analyzing the intent of flight targets. However, the subjectivity of flight intent and the maneuverability of targets result in large differences between trajectories and increased trajectory complexity, presenting challenges for trajectory clustering and motion pattern extraction. To address these issues, this paper proposes a novel trajectory analysis algorithm for radar trajectories, capable of clustering trajectories across different spatial scales and extracting their motion patterns. Specifically, in the clustering phase, a cross-scale density clustering method is proposed to efficiently cluster trajectories with similar shapes but varying spatial scales. Subsequently, in the motion pattern extraction phase, a neighborhood-weighted algorithm is designed to extract concise and ordered motion patterns from complex trajectories with multiple loops and repetitive movements. Finally, we constructed a large-scale dataset containing 2,754 trajectories collected from real radar data throughout the entire year of 2017. Based on this dataset, we conducted a series of experiments to verify the effectiveness of the proposed algorithm. Experimental results show that our approach effectively overcomes the impact of geographical location and spatial scale differences on clustering and successfully extracts typical motion patterns from complex trajectories.
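Only the cross-scale intuition is sketched below (the paper's density-clustering and neighborhood-weighting algorithms are not reproduced): trajectories are centered and rescaled so that routes with the same shape but different locations and extents become comparable, then clustered with DBSCAN on resampled coordinates.
```python
# Sketch of scale/location normalization before clustering; data are synthetic.
import numpy as np
from sklearn.cluster import DBSCAN

def normalize(traj, n_points=32):
    t = np.linspace(0, 1, len(traj))
    ts = np.linspace(0, 1, n_points)
    resampled = np.column_stack([np.interp(ts, t, traj[:, d]) for d in range(2)])
    resampled -= resampled.mean(axis=0)                  # remove location
    scale = np.linalg.norm(resampled, axis=1).max()
    return resampled / max(scale, 1e-9)                  # remove spatial scale

rng = np.random.default_rng(4)
base = np.column_stack([np.linspace(0, 1, 50), np.sin(np.linspace(0, 3, 50))])
trajs = [base * s + rng.normal(scale=0.02, size=base.shape) + off
         for s, off in [(1, 0), (5, 10), (0.5, -3), (8, 20)]]
trajs.append(np.column_stack([np.linspace(0, 1, 50), np.zeros(50)]) + 7)  # other shape

features = np.stack([normalize(t).ravel() for t in trajs])
labels = DBSCAN(eps=0.8, min_samples=2).fit_predict(features)
print("cluster labels:", labels)      # same-shape routes share a label despite scale
```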
|
|
16:48-17:06, Paper MoCT9.2 | |
SceneDM: Consistent Diffusion Models for Coherent Multi-Agent Trajectory Generation |
|
Guo, Zhiming | Huazhong University of Science & Technology |
Gao, Xing | Shanghai AI Lab |
Zhou, Jianlan | Huazhong University of Science & Technology |
Cai, Xinyu | Shanghai AI Laboratory |
Yang, Xuemeng | Shanghai Artificial Intelligence Laboratory |
Wen, Licheng | Shanghai AI Laboratory |
Sun, Xiao | Shanghai AI Laboratory, China |
Keywords: Deep Learning in Robotics and Automation, Motion and Path Planning, Autonomous Vehicle Navigation
Abstract: Realistic multi-agent motion simulations are essential for the advancement of self-driving algorithms. However, the majority of existing works tend to overlook the kinematic realism of the simulated motions. In this paper, we present SceneDM, a novel consistent diffusion model designed to jointly generate consistent and realistic motions for all types of agents within a traffic scene. To employ temporal dependencies and improve the kinematic realism of the generated motions, we introduce an innovative constructive noise pattern alongside smoothing regularization techniques integrated into the framework of the diffusion model. Moreover, the inference procedure of this model is tailored to effectively ensure local temporal consistency. Furthermore, a scene-level scoring function is incorporated to evaluate the safety and road adherence of the generated agents’ motions, helping to filter out unrealistic simulations. Through empirical validation in the Waymo Sim Agents task, we substantiate the effectiveness of SceneDM in improving the smoothness and realism of generated agent trajectories. The project webpage is available at https://alperen-hub.github.io/SceneDM.
|
|
17:06-17:24, Paper MoCT9.3 | |
DRPA-MPPI: Dynamic Repulsive Potential Augmented MPPI for Reactive Navigation in Unstructured Environments |
|
Fuke, Takahiro | Keio University |
Endo, Masafumi | CyberAgent, Inc |
Honda, Kohei | Nagoya University |
Ishigami, Genya | Keio University |
Keywords: Motion and Path Planning, Reactive and Sensor-Based Planning, Optimization and Optimal Control
Abstract: Reactive mobile robot navigation in unstructured environments is challenging when robots encounter unexpected obstacles that invalidate previously planned trajectories. Model predictive path integral control (MPPI) enables reactive planning, but still suffers from limited prediction horizons that lead to local minima traps near obstacles. Current solutions rely on heuristic cost design or scenario-specific pre-training, which often limits their adaptability to new environments. We introduce dynamic repulsive potential augmented MPPI (DRPA-MPPI), which dynamically detects potential entrapments on the predicted trajectories. Upon detecting local minima, DRPA-MPPI automatically switches between standard goal-oriented optimization and a modified cost function that generates repulsive forces away from local minima. Comprehensive testing in simulated obstacle-rich environments confirms DRPA-MPPI's superior navigation performance and safety compared to conventional methods with less computational burden.
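The cost-switching idea is sketched below in schematic form; the entrapment test, the repulsive potential shape, and all parameter values are assumptions rather than the paper's formulation.
```python
# Conceptual sketch: when the predicted trajectory looks trapped, augment the
# stage cost with a repulsive potential centered on the trap location.
import numpy as np

GOAL = np.array([10.0, 0.0])

def goal_cost(state):
    return np.linalg.norm(state[:2] - GOAL)

def repulsive_cost(state, trap_center, gain=5.0, radius=2.0):
    d = np.linalg.norm(state[:2] - trap_center)
    return gain * max(0.0, radius - d) ** 2        # active only inside `radius`

def stage_cost(state, trapped, trap_center):
    c = goal_cost(state)
    if trapped:                                    # repulsion-augmented mode
        c += repulsive_cost(state, trap_center)
    return c

def looks_trapped(predicted_traj, progress_eps=0.1):
    # Crude entrapment test: the predicted trajectory barely reduces goal distance.
    return goal_cost(predicted_traj[0]) - goal_cost(predicted_traj[-1]) < progress_eps

traj = np.array([[3.0, 1.0], [3.1, 1.0], [3.1, 1.1]])   # stalled near an obstacle
trapped = looks_trapped(traj)
print("trapped:", trapped,
      "cost at [3,1]:", stage_cost(np.array([3.0, 1.0]), trapped, traj[-1]))
```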
|
|
17:24-17:42, Paper MoCT9.4 | |
Learning-Augmented Model-Based Multi-Robot Planning for Time-Critical Search and Inspection under Uncertainty |
|
Khanal, Abhish | George Mason University |
Prince Mathew, Joseph | George Mason University |
Nowzari, Cameron | George Mason University |
Stein, Gregory | George Mason University |
Keywords: Planning, Scheduling and Coordination, Motion and Path Planning, Autonomous Agents
Abstract: In disaster response or surveillance operations, quickly identifying areas needing urgent attention is critical, but deploying response teams to every location is inefficient or often impossible. Effective performance in this domain requires coordinating a multi-robot inspection team to prioritize inspecting locations more likely to need immediate response, while also minimizing travel time. This is particularly challenging because robots must directly observe the locations to determine which ones require additional attention. This work introduces a planning framework for coordinated, time-critical multi-robot search under uncertainty. Our approach uses a graph neural network to estimate the likelihood of points of interest (PoIs) needing attention from noisy sensor data and then uses those predictions to guide a multi-robot model-based planner to determine a cost-effective plan. Simulated experiments demonstrate that our planner improves performance by at least 16.3%, 26.7%, and 26.2% for 1, 3, and 5 robots, respectively, compared to non-learned and learned baselines. We also validate our approach on real-world platforms using quad-copters.
|
|
17:42-18:00, Paper MoCT9.5 | |
An Efficient and Unified Method for Extracting the Shortest Path from the Dubins Set |
|
Huang, Xuanhao | Xi'an Jiaotong University |
Yan, Chao-Bo | Xi'an Jiaotong University |
Keywords: Motion and Path Planning, Autonomous Vehicle Navigation
Abstract: Path planning is crucial for the efficient operation of Autonomous Mobile Robots (AMRs) in factory environments. Many existing algorithms rely on Dubins paths, which have been adapted for various applications. However, an efficient method for directly determining the shortest Dubins path remains underdeveloped. This paper presents a comprehensive approach to efficiently identify the shortest path within the Dubins set. We classify the initial and final configurations into six equivalency groups based on the quadrants formed by their orientation angle pairs. Paths within each group exhibit shared topological properties, enabling a reduction in the number of candidate cases to analyze. This pre-classification step simplifies the problem and eliminates the need to explicitly compute and compare the lengths of all possible paths. As a result, the proposed method significantly lowers computational complexity. Extensive experiments confirm that our approach consistently outperforms existing methods in terms of computational efficiency.
|
|
MoCT10 |
Room T10 |
Planning, Scheduling and Control 3 |
Regular Session |
Chair: Ruiz, Cesar | University of Oklahoma |
|
16:30-16:48, Paper MoCT10.1 | |
Lexicographic Optimization-Based Model Predictive Control Framework for Rigid-Formation-Based Collaborative Transportation |
|
Huang, Xuanhao | Xi'an Jiaotong University |
Li, Yuanxiang | Xi'an Jiaotong University |
Yan, Chao-Bo | Xi'an Jiaotong University |
Hu, Jianchen | Xi'an Jiaotong University |
Keywords: Collaborative Robots in Manufacturing, Motion and Path Planning, Optimization and Optimal Control
Abstract: As manufacturing continues to advance, autonomous mobile robots (AMRs) are becoming increasingly essential in logistics transportation. However, individual AMRs fall short in terms of size and payload capacity, restricting their ability to transport large or heavy objects. To address this limitation, collaborative systems composed of multiple AMRs have emerged as a promising solution for transporting oversized items. Despite their potential, existing control methods for collaborative transportation face several challenges, including formation maintenance, trajectory optimization, and obstacle avoidance, each with a different level of priority. To handle these challenges and manage task prioritization effectively, this study proposes a lexicographic optimization-based model predictive control (LO-MPC) framework. Through simulation experiments, the proposed framework demonstrates its ability to ensure a safe and reliable transportation process.
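The lexicographic ordering itself can be illustrated with a toy two-priority problem (no vehicle or formation dynamics): the priority-1 objective is optimized first, and its optimum then becomes a constraint, within a small tolerance, while the priority-2 objective is minimized, so lower-priority goals never degrade higher-priority ones.
```python
# Toy lexicographic optimization (not the paper's vehicle model or constraints).
import numpy as np
from scipy.optimize import minimize

desired_offset = np.array([1.0, 0.0])        # assumed desired relative position

def formation_error(u):                      # priority 1: keep formation
    return np.sum((u[:2] - desired_offset) ** 2)

def control_effort(u):                       # priority 2: minimize effort
    return np.sum(u ** 2)

x0 = np.zeros(4)
res1 = minimize(formation_error, x0, method="SLSQP")
eps = 1e-6                                   # tolerance on the priority-1 optimum

res2 = minimize(
    control_effort, res1.x, method="SLSQP",
    constraints=[{"type": "ineq",
                  "fun": lambda u: res1.fun + eps - formation_error(u)}],
)
print("priority-1 error:", formation_error(res2.x), "effort:", control_effort(res2.x))
```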
|
|
16:48-17:06, Paper MoCT10.2 | |
Virtual Fencing for Safer Cobots |
|
Pippera Badguna, Vineela Reddy | New York University |
Arab, Aliasghar | NYU |
Kodavalla, Durga Avinash | New York University |
Li, Rui | New York University |
Kurabayashi, Katsuo | New York University |
Keywords: Collaborative Robots in Manufacturing, Collision Avoidance, Optimization and Optimal Control
Abstract: Collaborative robots (cobots) increasingly operate alongside humans, demanding robust real-time safeguarding. Current safety standards (e.g., ISO 10218, ANSI/RIA 15.06, ISO/TS 15066) require risk assessments but offer limited guidance for real-time responses. We propose a virtual fencing approach that detects and predicts human motion, ensuring safe cobot operation. Safety and performance trade-offs are modeled as an optimization problem and solved via sequential quadratic programming. Experimental validation shows that our method minimizes operational pauses while maintaining safety, providing a modular solution for human-robot collaboration.
|
|
17:06-17:24, Paper MoCT10.3 | |
Group Confident Policy Optimization |
|
Li, Yao | Tsinghua University |
Liang, Zhenglin | Tsinghua University |
Keywords: Autonomous Agents, Collision Avoidance, Reinforcement
Abstract: Learning-based policy improvement methods rely on extensive data collection and iterative training, leading to credibility challenges during the early training stage when the agent is transferred to new, uncertain, and risky environments. To address this, this paper develops a novel reinforcement learning algorithm, termed Group Confident Policy Optimization (GCPO), which emphasizes enhancing the safety and confidence of exploration processes and policy updates. The algorithm proposes a confident advantage function that leverages group normalization to mitigate training variance induced by sampled data and introduces a confidence-based corrective factor. By employing confidence-augmented policy gradient updates, this method ensures safe agent behaviors and progressive performance improvement throughout the training cycle. Simulation experiments demonstrate superior performance of GCPO in risky tasks compared to conventional methods. These findings contribute to establishing trustworthy engineering paradigms for safety-critical automation in cross-environment transfer scenarios.
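A rough sketch of a group-normalized, confidence-scaled advantage is given below; the corrective factor shown is an assumed placeholder, not GCPO's exact formulation.
```python
# Sketch under assumed definitions; GCPO's actual corrective factor differs.
import numpy as np

def confident_advantages(group_rewards, kappa=1.0):
    r = np.asarray(group_rewards, dtype=float)
    mean, std = r.mean(), r.std() + 1e-8
    advantages = (r - mean) / std                   # group normalization
    confidence = r.size / (r.size + kappa * std)    # assumed corrective factor
    return confidence * advantages, confidence

rewards = [1.0, 0.2, 0.7, 1.4, 0.1]                 # returns of one sampled group
adv, conf = confident_advantages(rewards)
print("confidence:", round(conf, 3), "advantages:", np.round(adv, 3))
# A policy-gradient update would then weight log-prob gradients by `adv`.
```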
|
|
17:24-17:42, Paper MoCT10.4 | |
NN-PL-VIO: A Customizable Neural Network Based VIO Framework with a Lightweight Point-Line Joint Network |
|
Tang, Jiahao | Tsinghua University |
Yu, Jincheng | Tsinghua University |
Xiang, Yunfei | Tsinghua University |
Xue, Min | Tsinghua University |
Xu, Yuanfan | Tsinghua University |
Dong, Yuhan | Tsinghua University |
Wang, Yu | Tsinghua University |
Keywords: Motion and Path Planning, Deep Learning in Robotics and Automation, Learning and Adaptive Systems
Abstract: Harnessing the potential of line features to enhance the localization accuracy of point-based Visual-Inertial SLAM (VINS) has become a focus due to the additional constraints they provide on scene structure, especially in environments with low texture. However, the challenge of real-time performance when integrating line features into VINS remains unaddressed. This paper introduces NN-PL-VIO, a customizable real-time Visual-Inertial Odometry (VIO) system designed for embedded devices to achieve both accuracy and efficiency in point-line feature extraction. This framework facilitates the performance testing of various point, line extraction, and matching methods in a positioning system for multi-feature joint localization. In addition, to offer real-time methods, we propose SuperPLNet, a self-supervised fusion network for joint point-line detection and description. Experiments on the Euroc Dataset show that the accuracy surpasses the baseline by 20% in difficult scenes with a processing speed of up to 11.7fps on embedded systems. The source code of our method is available at: https://github.com/efc-robot/NN-PL-VIO
|
|
17:42-18:00, Paper MoCT10.5 | |
Risk-Aware Planner for Quadrotor in Cluttered and Dynamic Environments |
|
Li, Yongjian | The Hong Kong University of Science and Technology (Guangzhou) |
Zheng, Minzhe | The Hong Kong University of Science and Technology (Guangzhou) |
Chen, Kai | The Hong Kong University of Science and Technology |
Liu, Hongji | The Hong Kong University of Science and Technology |
Zhou, Jinni | Hong Kong University of Science and Technology (Guangzhou) |
Wang, Lujia | The Hong Kong University of Science and Technology (Guangzhou) |
Ma, Jun | The Hong Kong University of Science and Technology |
Keywords: Motion and Path Planning, Collision Avoidance
Abstract: Autonomous quadrotors face significant challenges in navigating through complex environments due to dynamic obstacles. Existing trajectory planning methods typically rely on simplified motion assumptions and struggle to account for the uncertainties introduced by dynamic agents, leading to overly conservative or unsafe paths. To address this issue, we propose a risk-aware planner that integrates probabilistic risk assessment into motion planning, which incorporates dynamic constraints to enable safer and more efficient navigation. Our approach first generates a risk map by combining a probabilistic representation of dynamic obstacles with a distance-based static risk model, offering a more comprehensive environmental risk assessment. Based on this map, we introduce a kinodynamic A* planner that generates an initial path utilizing risk-based heuristics, which is then optimized to minimize risk while ensuring smoothness and feasibility. Simulation experiments demonstrate that our method allows quadrotors to navigate dynamic environments more safely and efficiently.
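A simplified grid-world version of the idea is sketched below (the paper plans over quadrotor states with a kinodynamic A*): a risk map fuses a static obstacle-distance term with a Gaussian dynamic-obstacle term, and the A* stage cost adds a weighted risk penalty so that low-risk detours are preferred; all values are assumptions.
```python
# Simplified grid sketch of risk-map fusion and risk-aware A*; not the paper's planner.
import heapq
import numpy as np

N = 20
grid = np.zeros((N, N))
static_obs = [(8, j) for j in range(4, 16)]            # a wall of static cells
for c in static_obs:
    grid[c] = 1.0

# Static risk decays with distance to obstacles; dynamic risk is a Gaussian blob
# around a moving agent's predicted position.
ys, xs = np.mgrid[0:N, 0:N]
static_risk = np.zeros((N, N))
for (oy, ox) in static_obs:
    static_risk = np.maximum(static_risk, np.exp(-0.5 * ((ys - oy) ** 2 + (xs - ox) ** 2)))
dynamic_risk = np.exp(-0.5 * (((ys - 12) ** 2 + (xs - 10) ** 2) / 2.0))
risk_map = np.clip(static_risk + dynamic_risk, 0, 1)

def astar(start, goal, w_risk=5.0):
    open_set, g, parent = [(0.0, start)], {start: 0.0}, {}
    while open_set:
        _, cur = heapq.heappop(open_set)
        if cur == goal:
            path = [cur]
            while cur in parent:
                cur = parent[cur]
                path.append(cur)
            return path[::-1]
        for dy, dx in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
            ny, nx = cur[0] + dy, cur[1] + dx
            if not (0 <= ny < N and 0 <= nx < N) or grid[ny, nx] == 1.0:
                continue
            cost = g[cur] + 1.0 + w_risk * risk_map[ny, nx]   # risk-weighted step
            if cost < g.get((ny, nx), np.inf):
                g[(ny, nx)] = cost
                parent[(ny, nx)] = cur
                h = abs(goal[0] - ny) + abs(goal[1] - nx)     # admissible heuristic
                heapq.heappush(open_set, (cost + h, (ny, nx)))
    return None

path = astar((2, 2), (18, 18))
print("path length:", len(path), "max risk along path:", max(risk_map[p] for p in path))
```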
|
|
MoCT11 |
Room T11 |
Best Application Paper Competition |
Special Session |
Chair: Lennartson, Bengt | Chalmers University of Technology |
|
16:30-16:55, Paper MoCT11.1 | |
Requirement-Driven Sharing of Manufacturing Digital Twins Along the Value Chain |
|
Gnadlinger, Michael | Technical University of Munich (TUM) |
Tilbury, Dawn | University of Michigan |
Barton, Kira | University of Michigan at Ann Arbor |
Wilch, Jan | Technical University of Munich |
Vogel-Heuser, Birgit | Technical University Munich |
Keywords: Cyber-physical Production Systems and Industry 4.0, Manufacturing, Maintenance and Supply Chains, Software, Middleware and Programming Environments
Abstract: Digital Twins (DTs) are key enablers of Smart Manufacturing, yet their adoption across the value chain is hindered by the lack of a standardized sharing framework. This paper addresses this challenge by identifying essential descriptive and qualitative elements of DTs based on standards and literature. Leveraging the Asset Administration Shell (AAS), it proposes a Submodel Template, which standardizes the packaging of DT models, interfaces, and computational and network requirements, thus going beyond, and combining, existing AAS Submodels (i.e., for simulation models) to encapsulate the full multidimensionality of DTs. A case study on a Quality Monitoring DT (QM-DT) demonstrates the template's ability to support seamless DT deployment, aggregation, and operation across heterogeneous manufacturing environments. Results show that the template enables structured transfer of subject matter expertise captured in DT models, real-time constraint support, and interoperability, laying the groundwork for improved DT integration and exchange.
|
|
16:55-17:20, Paper MoCT11.2 | |
Bayesian Intention for Enhanced Human Robot Collaboration |
|
Hernandez-Cruz, Vanessa | Massachusetts Institute of Technology |
Zhang, Xiaotong | Massachusetts Institute of Technology |
Youcef-Toumi, Kamal | Massachusetts Institute of Technology |
Keywords: Human-Centered Automation, Probability and Statistical Methods, Industrial and Service Robotics
Abstract: As robots increasingly assist humans in dynamic tasks, predicting human intent is essential to achieving seamless Human-Robot Collaboration (HRC). Many existing approaches for human intention prediction fail to fully exploit the inherent relationships between objects, tasks, and the human model. Current methods for predicting human intent, such as Gaussian Mixture Models (GMMs) and Conditional Random Fields (CRFs), often lack interpretability due to their failure to account for causal relationships between variables. To address these challenges, in this paper, we developed a novel Bayesian Intention (BI) framework to predict human intent within a multi-modality information framework in HRC scenarios. This framework captures the complexity of intent prediction by modeling the correlations between human behavior conventions and scene data. Our framework leverages these inferred intent predictions to optimize the robot's response in real-time, enabling smoother and more intuitive collaboration. We demonstrate the effectiveness of our approach through an HRC task involving a UR5 robot, highlighting BI's capability for real-time human intent prediction and collision avoidance using a unique dataset we created. Our evaluations show that the multi-modality BI model predicts human intent within 2.69ms, with a 36% increase in precision, a 60% increase in F1 Score, and an 85% increase in accuracy compared to its best baseline method. The results underscore BI's potential to advance real-time human intent prediction and collision avoidance, making a significant contribution to the field of HRC.
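The core Bayesian update can be illustrated with toy numbers (the paper's multi-modality likelihood models are not reproduced): a posterior over candidate intentions is updated with Bayes' rule as successive observation cues arrive.
```python
# Minimal Bayes-update sketch with assumed intentions and likelihoods.
import numpy as np

intentions = ["reach_part_A", "reach_part_B", "handover_to_robot"]
prior = np.array([1 / 3, 1 / 3, 1 / 3])

# Assumed likelihoods P(observation | intention) for two successive cues.
obs_likelihoods = [
    np.array([0.7, 0.2, 0.1]),    # hand moving toward part A
    np.array([0.6, 0.3, 0.1]),    # gaze fixated near part A
]

posterior = prior.copy()
for lik in obs_likelihoods:
    posterior = posterior * lik
    posterior /= posterior.sum()             # Bayes update and normalization
    print({i: round(p, 3) for i, p in zip(intentions, posterior)})
# The robot would then adapt its motion toward the most probable intention.
```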
|
|
17:20-17:45, Paper MoCT11.3 | |
Data-Driven Inventory Management for New Products: An Adjusted Dyna-Q Approach with Transfer Learning (I) |
|
Qu, Xinye | The University of Hong Kong |
Liu, Longxiao | The University of Hong Kong |
Huang, Wenjie | The University of Hong Kong |
Keywords: Inventory Management, Reinforcement, AI-Based Methods
Abstract: In this paper, we propose a novel reinforcement learning algorithm for inventory management of newly launched products with no historical demand information. The algorithm follows the classic Dyna-Q structure, balancing the model-free and model-based approaches, while accelerating the training process of Dyna-Q and mitigating the model discrepancy generated by the model-based feedback. Based on the idea of transfer learning, warm-start information from the demand data of existing similar products can be incorporated into the algorithm to further stabilize the early-stage training and reduce the variance of the estimated optimal policy. Our approach is validated through a case study of bakery inventory management with real data. The adjusted Dyna-Q shows up to a 23.7% reduction in average daily cost compared with Q-learning, and up to a 77.5% reduction in training time within the same horizon compared with classic Dyna-Q. With transfer learning, the adjusted Dyna-Q achieves the lowest total cost, the lowest variance in total cost, and relatively low shortage percentages among all the benchmarked algorithms over a 30-day testing horizon.
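A compact Dyna-Q skeleton with a transfer-learning warm start is sketched below; the state and action spaces, demand model, cost terms, and warm-start source are toy assumptions rather than the paper's bakery case study.
```python
# Compact Dyna-Q sketch with an assumed warm-started Q-table.
import numpy as np

n_inv, n_order = 20, 10                      # inventory levels / order quantities
rng = np.random.default_rng(5)

# Warm start: initialize Q from a policy learned on a similar existing product
# (here just noisy pessimistic values standing in for transferred knowledge).
Q = rng.normal(scale=0.1, size=(n_inv, n_order)) - 1.0
model = {}                                   # learned model: (s, a) -> (reward, next_state)

def step(s, a):
    demand = rng.poisson(3)                  # unknown true demand of the new product
    s_next = min(max(s + a - demand, 0), n_inv - 1)
    reward = -(0.5 * s_next + 4.0 * max(demand - s - a, 0) + 0.2 * a)  # holding/shortage/order
    return reward, s_next

alpha, gamma, eps, planning_steps = 0.1, 0.95, 0.1, 10
s = 5
for t in range(2000):
    a = rng.integers(n_order) if rng.random() < eps else int(np.argmax(Q[s]))
    r, s_next = step(s, a)                                    # real experience
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    model[(s, a)] = (r, s_next)                               # update the learned model
    for _ in range(planning_steps):                           # simulated (planning) updates
        (ps, pa), (pr, pns) = list(model.items())[rng.integers(len(model))]
        Q[ps, pa] += alpha * (pr + gamma * Q[pns].max() - Q[ps, pa])
    s = s_next

print("order quantity at inventory level 5:", int(np.argmax(Q[5])))
```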
|
| |