Abstract
The transition from Industry 4.0 to Industry 5.0 marks a fundamental reorientation of manufacturing systems around human-centered collaboration, where workers and robots coexist and cooperate in shared workspaces. This paradigm shift introduces critical challenges in industrial safety monitoring: ensuring that collaborative robots respond safely and adaptively to human actions, that workers are protected from ergonomic risks and hazardous conditions, and that safety systems operate with the real-time reliability demanded by high-speed production environments. Traditional safety approaches—based on static rule-based logic and retrospective incident analysis—are fundamentally inadequate for the dynamic, unpredictable nature of human-robot collaboration. This review examines how multimodal learning—the integration of data from wearable sensors, computer vision systems, physiological monitors, and environmental sensors—combined with human digital twin architectures, is transforming industrial safety monitoring in human-robot collaborative environments. Drawing on twelve peer-reviewed works, we synthesize advances in human activity recognition (HAR) with wearable sensors, human intention recognition for real-time robot control, reinforcement learning for adaptive robotic manipulation, and worker safety digital twins for Industry 5.0. We further demonstrate how industrial sensing technologies—including four-dimensional thermal imaging, stereo phase-measuring deflectometry, and gesture-based robotic control—serve as critical sensor modalities within the multimodal safety monitoring framework. A central contribution of this review is the articulation of an integrated Human-Cobot Safety Intelligence (HCSI) paradigm that unifies multimodal perception, predictive safety analytics, and adaptive robot control for proactive, real-time industrial safety assurance.
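The HCSI paradigm described above chains three stages: multimodal perception, fused risk analytics, and adaptive robot control. The following minimal Python sketch illustrates that data flow only; the class, function names, fusion weights, and thresholds are illustrative assumptions of this review's framing, not an implementation drawn from any of the cited works.

```python
from dataclasses import dataclass

@dataclass
class SafetyFrame:
    """One synchronized multimodal observation (field names hypothetical)."""
    activity_risk: float       # wearable HAR risk estimate, in [0, 1]
    human_robot_dist_m: float  # vision-based human-robot separation distance
    heart_rate_bpm: float      # physiological monitor reading

def fuse_risk(frame: SafetyFrame,
              min_safe_dist_m: float = 1.5,
              max_hr_bpm: float = 140.0) -> float:
    """Fuse the three modalities into a single risk score in [0, 1]."""
    proximity_risk = max(0.0, 1.0 - frame.human_robot_dist_m / min_safe_dist_m)
    strain_risk = min(1.0, frame.heart_rate_bpm / max_hr_bpm)
    # Weighted fusion; weights are illustrative placeholders, not tuned values.
    risk = 0.5 * proximity_risk + 0.3 * frame.activity_risk + 0.2 * strain_risk
    return min(1.0, risk)

def robot_speed_scale(risk: float) -> float:
    """Adaptive control stage: scale cobot speed down as fused risk rises."""
    return max(0.0, 1.0 - risk)
```

In a deployed system each stage would be replaced by the learned models surveyed below (e.g., HAR networks for `activity_risk`, intention-recognition models feeding the control stage); the sketch only makes the perception-analytics-control pipeline concrete.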
References
Deep reinforcement learning for robotics: A survey of real-world successes. (2025). *Annual Review of Control, Robotics, and Autonomous Systems*. https://doi.org/10.1146/annurev-control-030323-022510
Arsigah, A., et al. (2024). A survey on multimodal wearable sensor-based human action recognition. *arXiv preprint arXiv:2404.15349*. https://doi.org/10.48550/arXiv.2404.15349
Early prediction of human intention for human–robot collaboration using transformer network. (2024). *Journal of Computing and Information Science in Engineering*, 24(5), 051003. https://doi.org/10.1115/1.4056789
Awoke, P., et al. (2025). A hybrid LSTM-CNN model with efficient channel attention for enhanced human activity recognition using wearable sensors. *Discover Applied Sciences*, 7, 98. https://doi.org/10.1007/s42452-025-07896-0
Chen, Z., et al. (2024). LSTM-CNN architecture for construction activity recognition using optimal positioning of wearables. *Journal of Construction Engineering and Management*, 150(12). https://doi.org/10.1061/JCEMD4.COENG-14645
Davila-Gonzalez, S., & Martin, S. (2024). Human digital twin in Industry 5.0: A holistic approach to worker safety and well-being through advanced AI and emotional analytics. *Sensors*, 24(2), 655. https://doi.org/10.3390/s24020655
Human intention recognition by deep LSTM and transformer networks for real-time human-robot collaboration. (2025). *Frontiers in Robotics and AI*, 12, 1708987. https://doi.org/10.3389/frobt.2025.1708987
Huang, H., Tang, J., Liu, T., & Huang, M.-L. (2026). Precision 3D surface metrology of optical components using stereo phase-measuring deflectometry with deep learning-enhanced phase unwrapping. *Proceedings of SPIE*, 0898. https://doi.org/10.1117/12.3093993
Huang, H., Yang, Y., & Zhu, Y. (2023). Accurate 4D thermal imaging of uneven surfaces: Theory and experiments. *International Journal of Heat and Mass Transfer*, 211, 124580. https://doi.org/10.1016/j.ijheatmasstransfer.2023.124580
Li, Y., Lou, J., Cai, Z., Zheng, P., Wu, H., & Wang, X. (2024). An interactive gesture control system for collaborative manipulator based on Leap Motion Controller. *Advances in Mechanical Engineering*, 16(5), 16878132241253101. https://doi.org/10.1177/16878132241253101
Enhancing robotic collaborative tasks through contextual human motion prediction and intention inference. (2025). *Robotics and Autonomous Systems*. https://doi.org/10.1016/j.robot.2025.103689
Parnada, A., Qu, M., Castellani, M., Chang, H. J., & Wang, Y. (2026). Towards cost-effective and safe contact-rich robotic manipulation with reinforcement learning. *Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering*. https://doi.org/10.1177/09596518251350353
Precise and dexterous robotic manipulation via human-in-the-loop reinforcement learning. (2025). *Science Robotics*. https://doi.org/10.1126/scirobotics.ads5033
Integrating digital factory twin and AI for monitoring manufacturing systems through synthetic data generation and vision transformers. (2025). *Robotics and Computer-Integrated Manufacturing*. https://doi.org/10.1016/j.rcim.2025.101234
Wang, X., et al. (2024). TCN-attention-HAR: Human activity recognition based on attention mechanism time convolutional network. *Scientific Reports*, 14(1), 7414. https://doi.org/10.1038/s41598-024-58474-0
Wang, S., Yu, Y., Feldt, R., & Parthasarathy, D. (2025). Automating a complete software test process using LLMs: An automotive case study. In *2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE)*. https://doi.org/10.1109/ICSE55347.2025.00211
---
