Abstract
Remote sensing image interpretation draws on semantic encoding, visual restoration, hyperspectral-LiDAR fusion, and efficient vision-language representation. This topic centers on semantic-guided image captioning and cross-modal image enhancement for remote sensing scenes. Dual-path pre-fusion aligns visual and semantic cues in caption generation, while hyperspectral-LiDAR fusion improves spatial-spectral representation for land-cover and scene understanding. Instance-aware super-resolution restores fine detail in real-world imagery, and token-efficient forensic vision-language modeling contributes methods for compressing multimodal evidence while retaining the semantic information needed for downstream reasoning. Related medical imaging studies on low-dose image enhancement, melanoma detection, and DCE-MRI segmentation suggest how visual restoration and semantic reasoning can carry over to other high-stakes visual domains.
References
Dai, Y., Chen, Z., Pradeepkumar, J., Matsubara, Y., Sun, J., Sakurai, Y., & Dong, Y. (2026). EpiGraph: Building Generalists for Evidence-Intensive Epilepsy Reasoning in the Wild. arXiv preprint arXiv:2605.09505.
Wang, C., Zheng, G., Zhang, R., & Liu, X. (2026). DPPF: Dual-Path Pre-Fusion With Semantic-Guided Encoding for Remote Sensing Image Captioning. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.
Yang, J. X., Wang, J., Li, Z., Sui, C., Long, Z., & Zhou, J. (2025). HSLiNets: Evaluating Band Ordering Strategies in Hyperspectral and LiDAR Fusion. IEEE Geoscience and Remote Sensing Letters, 22, 1–5, Article 5505605. https://doi.org/10.1109/LGRS.2025.3567626
Xie, Z., Tan, G., Liu, W., & Sun, N. (2019, June). IA-SpGEMM: An input-aware auto-tuning framework for parallel sparse matrix-matrix multiplication. In Proceedings of the ACM International Conference on Supercomputing (pp. 94–105).
Lai, Y., Yu, Z., Wang, J., Shen, L., Xu, Y., & Cao, X. (2026). ForensicZip: More Tokens are Better but Not Necessary in Forensic Vision-Language Models. arXiv preprint arXiv:2603.12208.
Guo, Z., Zhao, K., & Zhang, L. (2026). InstanceRSR: Real-World Super-Resolution via Instance-Aware Representation Alignment. ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, 10577–10581. https://doi.org/10.1109/ICASSP55912.2026.11462690
Xie, S., Xu, L., Lei, C., Wang, J., Wang, J., Wang, Z., Sun, Y., Li, D., Li, F., Lin, R., et al. (2026). RST2G: Residual-Guided Spatiotemporal Transformer Graph Fusion Enhancement for Breast Cancer Segmentation in DCE-MRI. Cyborg and Bionic Systems, 7, 0502.
Lang, H., Zhou, Y., Yu, Y., Su, Z., Zhuge, H., Wang, W., Fang, D., Qin, J., Wei, M., et al. (2026). Multi-modal low-dose medical imaging through instruction-guided unified AI. Frontiers in Medicine, 13, 1691143.
Liu, Y., Li, C., Li, F., Lin, R., Zhang, D., & Lian, Y. (2025). Advances in computer vision and deep learning-facilitated early detection of melanoma. Briefings in Functional Genomics, 24, elaf002.
Hu, Y., Yuan, J., Wen, C., Lu, X., & Li, X. (2025). RSGPT: A remote sensing vision language model and benchmark. ISPRS Journal of Photogrammetry and Remote Sensing, 224, 272–286.
