Abstract
Image recognition systems often struggle with complex backgrounds, scale variation, ambiguous textures, and noise. This article proposes a multi-scale wavelet attention network for robust image recognition in both general and medical imaging settings. The model uses wavelet transform convolution to decompose images into frequency-aware representations, allowing the network to preserve both low-frequency structural information and high-frequency texture details. A convolutional block attention module (CBAM) is integrated to adaptively emphasize informative channel and spatial features. The framework targets recognition tasks such as industrial defect identification, melanoma screening, support for breast cancer image segmentation, vaccine side-effect monitoring, and remote sensing scene analysis. Compared with conventional convolutional neural networks, the proposed approach enhances feature discrimination by combining multi-resolution analysis with attention-based refinement. The article also discusses how frequency-domain representations can improve robustness to image blur, illumination changes, noise, and partial occlusion. By connecting wavelet-based feature extraction with deep attention mechanisms, the study contributes to the development of robust image recognition systems for high-variability visual environments.
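The frequency decomposition described above can be illustrated with one level of the 2D Haar transform, the simplest wavelet. This is a minimal pure-Python sketch, not the article's implementation: the Haar basis, the sub-band naming (LL, LH, HL, HH), and the helper functions `haar_dwt2` and `haar_idwt2` are assumptions chosen for illustration. The forward transform splits an image into a half-resolution low-frequency band (LL) and three high-frequency detail bands; the inverse shows that the split discards no information.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(image: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of the orthonormal 2D Haar wavelet transform (illustrative).

    Returns four half-resolution sub-bands: LL (coarse structure) and
    LH, HL, HH (detail). Image height and width must be even.
    """
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for j in range(0, w, 2):
            # Non-overlapping 2x2 block laid out as: a b / c d
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            row_ll.append((a + b + c + d) / 2)  # scaled local average
            row_lh.append((a + b - c - d) / 2)  # top minus bottom: horizontal edges
            row_hl.append((a - b + c - d) / 2)  # left minus right: vertical edges
            row_hh.append((a - b - c + d) / 2)  # diagonal detail
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


def haar_idwt2(ll: Matrix, lh: Matrix, hl: Matrix, hh: Matrix) -> Matrix:
    """Inverse of haar_dwt2: rebuilds the full-resolution image exactly."""
    image: Matrix = []
    for i in range(len(ll)):
        top: List[float] = []
        bottom: List[float] = []
        for j in range(len(ll[0])):
            s, v, x, y = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            top.extend([(s + v + x + y) / 2, (s + v - x - y) / 2])
            bottom.extend([(s - v + x - y) / 2, (s - v - x + y) / 2])
        image.append(top)
        image.append(bottom)
    return image


if __name__ == "__main__":
    # A 4x4 image with a vertical edge inside the right-hand 2x2 blocks.
    img = [[0, 0, 0, 8] for _ in range(4)]
    ll, lh, hl, hh = haar_dwt2(img)
    print(ll)  # [[0.0, 8.0], [0.0, 8.0]]
    print(hl)  # [[0.0, -8.0], [0.0, -8.0]]
    print(lh, hh)  # both all zeros: no horizontal or diagonal detail
    assert haar_idwt2(ll, lh, hl, hh) == img  # perfect reconstruction
```

In the example, the vertical edge appears only in the HL band, while LL keeps a coarse copy of the image. Multi-level wavelet CNNs, for instance, use such a transform in place of pooling so that downsampling does not throw away high-frequency detail; how the article connects its wavelet sub-bands to the attention module is not shown in this sketch.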
References
Zhu, Y. (2026). An Image Recognition Method Based on Multi-Scale Wavelet Transform Convolution and Convolutional Block Attention. Conference Paper.
Wang, C., Zheng, G., Zhang, R., & Liu, X. (2026). DPPF: Dual-Path Pre-Fusion With Semantic-Guided Encoding for Remote Sensing Image Captioning. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.
Guo, Z., Zhao, K., & Zhang, L. (2026). InstanceRSR: Real-World Super-Resolution via Instance-Aware Representation Alignment. Proceedings of the 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 10577–10581. doi: 10.1109/ICASSP55912.2026.11462690.
Yang, J., Chung, C. I., Koach, J., Liu, H., Navalkar, A., He, H., ... Shu, X. (2024). MYC phase separation selectively modulates the transcriptome. Nature Structural & Molecular Biology, 31(10), 1567–1579. doi: 10.1038/s41594-024-01322-6.
Liu, Y., Li, C., Li, F., Lin, R., Zhang, D., & Lian, Y. (2025). Advances in computer vision and deep learning-facilitated early detection of melanoma. Briefings in Functional Genomics, 24, elaf002.
Xie, S., Xu, L., Lei, C., Wang, J., Wang, J., Wang, Z., Sun, Y., Li, D., Li, F., Lin, R., et al. (2026). RST2G: Residual-Guided Spatiotemporal Transformer Graph Fusion Enhancement for Breast Cancer Segmentation in DCE-MRI. Cyborg and Bionic Systems, 7, 0502.
Lang, H., Zhou, Y., Yu, Y., Su, Z., Zhuge, H., Wang, W., Fang, D., Qin, J., Wei, M., et al. (2026). Multi-modal low-dose medical imaging through instruction-guided unified AI. Frontiers in Medicine, 13, 1691143.
Li, C., Shao, S., Mikason, W., Lin, R., & Liu, Y. (2024). Utilizing Computer Vision for Continuous Monitoring of Vaccine Side Effects in Experimental Mice. arXiv preprint arXiv:2404.03121.
Woo, S., Park, J., Lee, J.-Y., & Kweon, I. S. (2018). CBAM: Convolutional Block Attention Module. Proceedings of the European Conference on Computer Vision (ECCV), 3–19.
Liu, P., Zhang, H., Lian, W., & Zuo, W. (2019). Multi-level Wavelet Convolutional Neural Networks. arXiv preprint arXiv:1907.03128.
Zhao, X., Zhang, W., & Xiao, X. (2022). Wavelet-Attention CNN for image classification. Multimedia Systems.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–778.
Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-Excitation Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 7132–7141.
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI), 234–241.
