Semantic-Guided Remote Sensing Captioning and Cross-Modal Image Restoration

Peggy R Jones; Emily Richardson

Vol. 2 No. 1 (2026), Articles

Published 2026-05-26

Keywords

remote sensing captioning

Abstract

Remote sensing image interpretation requires integrating semantic encoding, visual restoration, hyperspectral-LiDAR fusion, and efficient vision-language representation. The focus is semantic-guided image captioning and cross-modal image enhancement for remote sensing scenes. Dual-path pre-fusion aligns visual and semantic cues during caption generation, while hyperspectral and LiDAR fusion improves spatial-spectral representation for land-cover and scene understanding. Instance-aware super-resolution enhances detail in real-world imagery, and token-efficient forensic vision-language modeling contributes methods for compressing multimodal evidence while preserving semantic information. Related medical imaging studies on low-dose image enhancement, melanoma detection, and DCE-MRI segmentation show how visual restoration and semantic reasoning can generalize across high-stakes visual domains.

References

Dai, Y., Chen, Z., Pradeepkumar, J., Matsubara, Y., Sun, J., Sakurai, Y., & Dong, Y. (2026). EpiGraph: Building Generalists for Evidence-Intensive Epilepsy Reasoning in the Wild. arXiv preprint arXiv:2605.09505.

Wang, C., Zheng, G., Zhang, R., & Liu, X. (2026). DPPF: Dual-Path Pre-Fusion With Semantic-Guided Encoding for Remote Sensing Image Captioning. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

Yang, J. X., Wang, J., Li, Z., Sui, C., Long, Z., & Zhou, J. (2025). HSLiNets: Evaluating Band Ordering Strategies in Hyperspectral and LiDAR Fusion. IEEE Geoscience and Remote Sensing Letters, 22, 1–5, Article 5505605. https://doi.org/10.1109/LGRS.2025.3567626

Xie, Z., Tan, G., Liu, W., & Sun, N. (2019, June). IA-SpGEMM: An input-aware auto-tuning framework for parallel sparse matrix-matrix multiplication. In Proceedings of the ACM International Conference on Supercomputing (pp. 94–105).

Lai, Y., Yu, Z., Wang, J., Shen, L., Xu, Y., & Cao, X. (2026). ForensicZip: More Tokens are Better but Not Necessary in Forensic Vision-Language Models. arXiv preprint arXiv:2603.12208.

Guo, Z., Zhao, K., & Zhang, L. (2026). InstanceRSR: Real-World Super-Resolution via Instance-Aware Representation Alignment. In ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain (pp. 10577–10581). https://doi.org/10.1109/ICASSP55912.2026.11462690

Xie, S., Xu, L., Lei, C., Wang, J., Wang, J., Wang, Z., Sun, Y., Li, D., Li, F., Lin, R., et al. (2026). RST2G: Residual-Guided Spatiotemporal Transformer Graph Fusion Enhancement for Breast Cancer Segmentation in DCE-MRI. Cyborg and Bionic Systems, 7, 0502.

Lang, H., Zhou, Y., Yu, Y., Su, Z., Zhuge, H., Wang, W., Fang, D., Qin, J., Wei, M., et al. (2026). Multi-modal low-dose medical imaging through instruction-guided unified AI. Frontiers in Medicine, 13, 1691143.

Liu, Y., Li, C., Li, F., Lin, R., Zhang, D., & Lian, Y. (2025). Advances in computer vision and deep learning-facilitated early detection of melanoma. Briefings in Functional Genomics, 24, elaf002.

Hu, Y., Yuan, J., Wen, C., Lu, X., & Li, X. (2025). RSGPT: A remote sensing vision language model and benchmark. ISPRS Journal of Photogrammetry and Remote Sensing, 224, 272–286.