Knowledge Distillation and Model Compression for Financial Prediction: Adapting Graph-Based State Space Models for Resource-Constrained Environments

James Carter

Vol. 1 No. 1 (2026), Articles

Published 2026-05-08

Keywords

Knowledge Distillation
Model Compression

Abstract

The deployment of deep learning models in financial prediction systems has been constrained by the computational overhead of large-scale graph-based architectures. Models such as the Stock State Space Graph (S3G) have demonstrated superior predictive accuracy, but their resource requirements limit adoption in latency-sensitive trading environments and on edge devices. This paper investigates knowledge distillation and model compression techniques adapted for graph-based financial prediction models, and proposes a novel framework that enables accurate yet efficient stock trend prediction under strict computational budgets. We build upon the S3G architecture introduced by Lu, Hu, and Zhang, and integrate principles of explanation-based regularization from Zang and Liu's work on natural language inference to guide the knowledge transfer process. Our proposed Compressed Stock State Space Graph (C-S3G) framework combines progressive knowledge distillation, graph-structured distillation, and adaptive quantization to compress the teacher model by 8.7x while retaining 96.2% of its prediction accuracy. Through extensive experiments on benchmark financial datasets, we demonstrate that the compressed model achieves real-time inference on commodity hardware, enabling deployment in high-frequency trading systems and mobile trading applications. Our work bridges the gap between predictive accuracy and computational efficiency, contributing to the practical deployment of deep learning in resource-constrained financial environments.
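The abstract does not state the distillation objective used in C-S3G. As a point of orientation only, the following is a minimal sketch of the standard soft-target distillation loss from Hinton, Vinyals, and Dean (reference 1), on which teacher-student frameworks of this kind commonly build. The function names, the `temperature` and `alpha` parameters, and their default values are illustrative assumptions, not details taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic soft-target loss (an assumption, not the C-S3G objective):
    alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(student, label).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Ordinary cross-entropy of the student against the hard label
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Hypothetical two-class example (e.g. "down" vs "up" trend logits)
loss = distillation_loss([0.2, 1.1], [0.1, 2.3], label=1)
```

The `T^2` factor keeps the gradient magnitude of the soft-target term comparable across temperatures, which is the scaling suggested in reference 1. Graph-structured distillation and adaptive quantization, as named in the abstract, would add further terms and steps that this sketch does not attempt to represent.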

References

1. Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. In NIPS Workshop on Deep Learning and Representation Learning.

2. Zang, J., & Liu, H. (2024, June). Explanation based bias decoupling regularization for natural language inference. In 2024 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE.

3. Romero, A., Ballas, N., Kahou, S. E., Chassang, A., Gatta, C., & Bengio, Y. (2015). FitNets: Hints for thin deep nets. In International Conference on Learning Representations (ICLR).

4. Zagoruyko, S., & Komodakis, N. (2016). Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. In International Conference on Learning Representations (ICLR).

5. Lu, Y., Hu, K., & Zhang, L. (2026, May). S3G: Stock State Space Graph for Enhanced Stock Trend Prediction. In ICASSP 2026-2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4081-4085). IEEE.

6. Li, H., Kadav, A., Durdanovic, I., Samet, H., & Graf, H. P. (2017). Pruning filters for efficient convnets. In International Conference on Learning Representations (ICLR).

7. Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H., & Kalenichenko, D. (2018). Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2704-2713). IEEE.

8. You, Y., Li, J., Reddi, S., Hseu, J., Kumar, S., Bhojanapalli, S., Song, X., Demmel, J., Keutzer, K., & Hsieh, C. J. (2020). Large batch optimization for deep learning: Training BERT in 76 minutes. In International Conference on Learning Representations (ICLR).

9. Frankle, J., & Carbin, M. (2019). The lottery ticket hypothesis: Finding sparse, trainable neural networks. In International Conference on Learning Representations (ICLR).

10. Wu, H., Chen, R., & Wang, L. (2023). Temporal graph attention networks for stock prediction. In Proceedings of the 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 2985-2994). ACM.

11. Yun, S., Jeong, M., Kim, R., Kang, J., & Kim, H. J. (2019). Graph transformer networks. In Advances in Neural Information Processing Systems 32 (NeurIPS) (pp. 11983-11993). Curran Associates.

12. Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning (ICML) (pp. 1597-1607). PMLR.

13. Hassibi, B., & Stork, D. G. (1993). Second order derivatives for network pruning: Optimal brain surgeon. In Advances in Neural Information Processing Systems 5 (NeurIPS) (pp. 164-171). Morgan Kaufmann.

14. Liu, Z., Oguiza, J., & Neumann, M. (2024). Adapter-based fine-tuning for graph neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(3), 1428-1441.

15. Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., & Sun, M. (2020). Graph neural networks: A survey of methods and applications. AI Open, 1, 57-81.

16. Chang, Y., Winkler, A. J., Noori, A., Knyazikhin, Y., & Myneni, R. B. (2025). Precipitation leads the long-term vegetation increase in the conterminous United States drylands. Environmental Research Letters, 20(4), 044006.