Multi-Modal Deep Learning and Multi-Agent Collaboration for Intelligent Fault Diagnosis in Industrial Equipment

Keywords

Fault Diagnosis
Deep Learning
Multi-Modal Fusion

Abstract

Unplanned equipment downtime is one of the costliest problems in modern manufacturing, causing significant production losses and safety risks. Traditional fault diagnosis in industrial settings relies on periodic manual inspections and rule-based monitoring systems, both of which struggle to detect incipient failures before they escalate into catastrophic breakdowns. Recent advances in deep learning and multi-agent systems offer new possibilities for automated, accurate, and scalable fault diagnosis. This study proposes an Intelligent Fault Diagnosis System based on Multi-Modal Deep Learning and Multi-Agent Collaboration (IFD-MDMAC). The system integrates visual data (optical camera images), thermal data (infrared thermography), and vibration data (accelerometer signals) through a multi-modal feature fusion architecture: a deep learning backbone extracts features from each modality and fuses them into a unified representation for fault detection. Downstream of fault detection, a multi-agent module decomposes the diagnosis workflow into four specialized tasks, each handled by a dedicated LLM-powered agent: fault classification, severity assessment, remaining useful life (RUL) estimation, and maintenance recommendation. This multi-agent design yields structured reasoning and explainable diagnosis outputs, addressing the opacity of conventional black-box deep learning models. Experiments on three publicly available industrial equipment fault datasets show that the proposed system achieves an average fault detection accuracy of 93.4% and a fault classification accuracy of 89.6%. The multi-agent diagnosis module achieves an RUL estimation error of 8.7% and 82% consistency with the recommendations of expert maintenance engineers. Its root cause inference capability reduces the average diagnosis time from 45 minutes (manual) to 12 minutes (automated), a 73% reduction.
This study validates the effectiveness of combining multi-modal deep learning perception with multi-agent cognitive diagnosis for intelligent industrial maintenance.
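The fusion-then-dispatch flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the IFD-MDMAC implementation: the per-modality feature vectors are assumed to come from already-trained encoders, fusion is assumed to be simple L2-normalized concatenation (the abstract does not name the fusion operator), and the four LLM-powered agents are replaced by hypothetical placeholder functions (`classification_agent`, `severity_agent`, `rul_agent`, `maintenance_agent`) whose thresholds and hour values are illustrative only. What the sketch shows is the orchestration: each agent reads a shared state, including the outputs of the agents before it, and adds one field to the diagnosis report.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

MODALITIES = ("visual", "thermal", "vibration")


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit L2 norm so no modality dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(features: Dict[str, Sequence[float]]) -> List[float]:
    """Assumed fusion operator: normalize each modality's features, then concatenate."""
    fused: List[float] = []
    for name in MODALITIES:
        fused.extend(l2_normalize(features[name]))
    return fused


@dataclass
class DiagnosisState:
    fused: List[float]
    report: Dict[str, object] = field(default_factory=dict)


Agent = Callable[[DiagnosisState], None]


# Hypothetical stand-ins for the LLM-powered agents. Each one reads the shared
# state (including earlier agents' outputs) and writes exactly one report field.
def classification_agent(state: DiagnosisState) -> None:
    state.report["fault_class"] = "placeholder_fault"


def severity_agent(state: DiagnosisState) -> None:
    # Placeholder heuristic: severity from the largest fused activation.
    state.report["severity"] = "high" if max(state.fused) > 0.8 else "low"


def rul_agent(state: DiagnosisState) -> None:
    # Placeholder: shorter remaining life for higher severity (hours are illustrative).
    state.report["rul_hours"] = 50 if state.report["severity"] == "high" else 500


def maintenance_agent(state: DiagnosisState) -> None:
    # Conditions on the RUL agent's output, which ran earlier in the pipeline.
    state.report["recommendation"] = (
        "schedule immediate inspection" if state.report["rul_hours"] < 100 else "monitor"
    )


PIPELINE: List[Agent] = [classification_agent, severity_agent, rul_agent, maintenance_agent]


def diagnose(features: Dict[str, Sequence[float]]) -> Dict[str, object]:
    """Fuse modality features, then run the agents in a fixed order over one shared state."""
    state = DiagnosisState(fused=fuse_features(features))
    for agent in PIPELINE:
        agent(state)
    return state.report
```

For example, `diagnose({"visual": [3.0, 4.0], "thermal": [1.0, 0.0], "vibration": [0.0, 2.0]})` returns a report whose keys appear in pipeline order (`fault_class`, `severity`, `rul_hours`, `recommendation`). Running the agents sequentially over one shared state is a choice made here so that later stages, such as maintenance recommendation, can condition on earlier ones; the abstract does not specify how the agents communicate.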
