Abstract
Industrial visual anomaly detection (VAD)—the automated identification and localization of defects, irregularities, and deviations in manufactured products—plays a critical role in ensuring product quality, operational safety, and process reliability across modern manufacturing. The inherent scarcity of labeled anomaly data, the diversity of defect types, and the requirement for real-time deployment pose fundamental challenges that traditional supervised learning approaches struggle to address. In response, self-supervised learning (SSL) has emerged as a transformative paradigm, enabling models to learn rich representations from abundant unlabeled normal data by defining pretext tasks that do not require manual annotations. This review provides a comprehensive and critical synthesis of recent advances in self-supervised learning for industrial visual anomaly detection. We examine the methodological landscape across five major SSL categories—contrastive learning, masked reconstruction, generative modeling, rotation prediction, and cross-modal pretext tasks—and map their application to key industrial domains including surface inspection, 3D component quality control, semiconductor fabrication, and predictive maintenance. A structured analysis of eight representative works—including the Iterative Mask Reconstruction Network (IMRNet), graph attention-based multivariate anomaly detection, and diffusion-enabled defect synthesis—grounds the discussion in empirical evidence. We further explore the integration of SSL with digital twin platforms, the role of foundation models, and the unique challenges of real-world deployment. Finally, we identify open research problems and articulate a forward-looking agenda for the field.
References
Deng, T., Li, Y., Liu, X., & Wang, L. (2023). Federated learning-based collaborative manufacturing for complex parts. *Journal of Intelligent Manufacturing*, 34(7), 3025–3038. https://doi.org/10.1007/s10845-022-01968-3
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In *Advances in Neural Information Processing Systems* (NeurIPS) (pp. 2672–2680). Curran Associates, Inc.
He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning. In *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition* (CVPR) (pp. 9729–9738). IEEE. https://doi.org/10.1109/CVPR42600.2020.00980
Huang, H., Tang, J., Liu, T., & Huang, M.-L. (2026). Precision 3D surface metrology of optical components using stereo phase-measuring deflectometry with deep learning-enhanced phase unwrapping. *Proceedings of SPIE*, 0898. https://doi.org/10.1117/12.3093993
Huang, H., Yang, Y., & Zhu, Y. (2023). Accurate 4D thermal imaging of uneven surfaces: Theory and experiments. *International Journal of Heat and Mass Transfer*, 211, 124580. https://doi.org/10.1016/j.ijheatmasstransfer.2023.124580
Khan, T., Urfi Khan, T., Khan, A., Mollan, C., & Vilkonciene, I. M. (2025). Data-driven digital twin framework for predictive maintenance of smart manufacturing systems. *Machines*, 13(6), 481. https://doi.org/10.3390/machines13060481
Khan, Y., et al. (2025). A few-shot steel surface defect generation method based on diffusion models. *BMC Medical Informatics and Decision Making*. https://doi.org/10.1186/s12911-025-02912-9
Li, S., et al. (2024). Towards scalable 3D anomaly detection and localization: A benchmark via 3D anomaly synthesis and a self-supervised learning network (IMRNet). In *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition* (CVPR) (pp. 12456–12466). IEEE. https://doi.org/10.1109/CVPR52733.2024.01190
Li, Y., Lou, J., Cai, Z., Zheng, P., Wu, H., & Wang, X. (2024). An interactive gesture control system for collaborative manipulator based on Leap Motion Controller. *Advances in Mechanical Engineering*, 16(5), 16878132241253101. https://doi.org/10.1177/16878132241253101
Liu, J., Xie, G., Chen, R., Li, X., Wang, J., Liu, Y., Wang, C., & Zheng, F. (2024). A survey of deep learning for industrial visual anomaly detection. *Artificial Intelligence Review*, 58, 178. https://doi.org/10.1007/s10462-025-11287-7
Liu, J., et al. (2024). Deep industrial image anomaly detection: A survey. *arXiv preprint arXiv:2401.01432*. https://doi.org/10.48550/arXiv.2401.01432
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning transferable visual models from natural language supervision. In *Proceedings of the International Conference on Machine Learning* (ICML) (pp. 8748–8763). PMLR.
Shen, Y., et al. (2025). AI-enhanced digital twins in maintenance: Systematic review, industrial challenges, and bridging research–practice gaps. *ScienceDirect*. https://doi.org/10.1016/j.promfg.2025.107634
Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). Attention is all you need. In *Advances in Neural Information Processing Systems* (NeurIPS) (pp. 5998–6008). Curran Associates, Inc.
Wang, S., Yu, Y., Feldt, R., & Parthasarathy, D. (2025). Automating a complete software test process using LLMs: An automotive case study. In *2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE)* (pp. 373–384). https://doi.org/10.1109/ICSE55347.2025.00211
Wang, X., et al. (2025). Generative and predictive AI for digital twin systems in manufacturing. *Frontiers in Artificial Intelligence*, 8, 1655470. https://doi.org/10.3389/frai.2025.1655470
Zhang, A., et al. (2025). Industrial multivariate time-series data anomaly detection incorporating attention mechanisms and adversarial training. *International Journal of Computer Integrated Manufacturing*, 38(12). https://doi.org/10.1080/0951192X.2025.2452985
Zhang, M., et al. (2025). AI-enabled defect detection in industrial products: A comprehensive survey, key insights and future research challenges. *ScienceDirect*. https://doi.org/10.1016/j.ijmachtools.2025.104960
Zhang, Y., et al. (2025). Latent diffusion models to enhance the performance of visual defect segmentation networks in steel surface inspection. *Sensors*, 24(18), 6016. https://doi.org/10.3390/s24186016
Zhu, Y., & Liu, Q. (2025). Toward transparent groundwater contamination risk forecasting: Integrating causal discovery and Bayesian graph neural networks. *Science of the Total Environment*, 998, 180233. https://doi.org/10.1016/j.scitotenv.2025.180233
Zhu, Y., & Liu, Q. (2026). Hybrid graph attention network-LSTM models for causal-aware supply chain forecasting. *Journal of Intelligent Manufacturing*. https://doi.org/10.1007/s10845-025-02782-3
