
TransGeneSelector: using a transformer approach to mine key genes from small transcriptomic datasets in plant responses to various environments

Document type: Foreign-language journal article

Authors: Huang, Kerui 1; Tian, Jianhong 2; Sun, Lei 3; Hu, Haoliang 1; Huang, Xuebin 1; Zhou, Shiqi 4; Deng, Aihua 1; Zhou, Zhibo 1; Jiang, Ming 1; Li, Guiwu 2; Xie, Peng 1; Wang, Yun 1; Jiang, Xiaocheng 2

Author affiliations: 1.Hunan Univ Arts & Sci, Key Lab Agr Prod Proc & Food Safety Hunan Higher E, Changde 415000, Peoples R China

2.Hunan Normal Univ, Coll Life Sci, Changsha 410081, Hunan, Peoples R China

3.Huaihua Univ, Coll Biol & Food Engn, Key Lab Res & Utilizat Ethnomed Plant Resources Hu, Huaihua 418000, Peoples R China

4.Jiangxi Acad Agr Sci, Rice Res Inst, Nanchang 330200, Peoples R China

Keywords: Gene mining; Plant; Deep learning; Machine learning; Small sample

Journal: BMC GENOMICS (Impact factor: 3.7; 5-year impact factor: 4.2)

ISSN: 1471-2164

Year, volume, issue: 2025, Volume 26, Issue 1

Pages:

Indexed in: SCI

Abstract: Gene mining is crucial for understanding the regulatory mechanisms underlying complex biological processes, particularly in plants responding to environmental conditions. Traditional machine learning methods, while useful, often overlook important gene relationships due to their reliance on manual feature selection and limited ability to capture complex inter-gene regulatory dynamics. Deep learning approaches, while powerful, are often unsuitable for small sample sizes. This study introduces TransGeneSelector, the first deep learning framework specifically designed for mining key genes from small transcriptomic datasets. By integrating a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) for sample generation and a Transformer-based network for classification, TransGeneSelector efficiently addresses the challenges of small-sample transcriptomic data, capturing both global gene regulatory interactions and specific biological processes. Evaluated in Arabidopsis thaliana, the model achieved high classification accuracy in predicting seed germination and heat stress conditions, outperforming traditional methods like Random Forest and Support Vector Machines (SVM). Moreover, Shapley Additive Explanations (SHAP) analysis and gene regulatory network construction revealed that TransGeneSelector effectively identified genes that appear to have upstream regulatory functions based on our analyses, enriching them in multiple key pathways which are critical for seed germination and heat stress response. RT-qPCR validation further confirmed the model's gene selection accuracy, demonstrating consistent expression patterns across varying germination conditions. The findings underscore the potential of TransGeneSelector as a robust tool for gene mining, offering deeper insights into gene regulation and organism adaptation under diverse environmental conditions.
This work provides a framework that leverages deep learning for key gene identification in small transcriptomic datasets.
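
The abstract names WGAN-GP as the sample-generation component but gives no implementation details. As a rough illustration only, the sketch below shows the general form of the WGAN-GP critic loss: the mean critic score on generated samples minus the mean score on real samples, plus a penalty that pushes the norm of the critic's input gradient toward 1 at points interpolated between real and generated samples. It is not the authors' code. The function names, the toy linear critics, and the use of finite differences in place of automatic differentiation are all assumptions made for illustration, and `lam=10.0` is simply a commonly used penalty weight.

```python
import math
import random


def input_gradient(critic, x, eps=1e-6):
    """Approximate the gradient of critic(x) w.r.t. x by central differences.

    A real implementation would use automatic differentiation; finite
    differences are an assumption made here to stay dependency-free.
    """
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((critic(hi) - critic(lo)) / (2 * eps))
    return grad


def gradient_penalty(critic, real, fake, rng, lam=10.0):
    """Mean of lam * (||grad critic(x_hat)|| - 1)^2 over interpolated points."""
    total = 0.0
    for r, g in zip(real, fake):
        t = rng.random()  # random mixing weight in [0, 1)
        x_hat = [t * ri + (1 - t) * gi for ri, gi in zip(r, g)]
        norm = math.sqrt(sum(d * d for d in input_gradient(critic, x_hat)))
        total += (norm - 1.0) ** 2
    return lam * total / len(real)


def critic_loss(critic, real, fake, rng, lam=10.0):
    """WGAN-GP critic loss: E[D(fake)] - E[D(real)] + gradient penalty."""
    mean_real = sum(critic(x) for x in real) / len(real)
    mean_fake = sum(critic(x) for x in fake) / len(fake)
    return mean_fake - mean_real + gradient_penalty(critic, real, fake, rng, lam)


# Toy usage with made-up two-"gene" expression vectors (hypothetical data).
rng = random.Random(0)
real = [[1.0, 2.0], [0.5, -1.0]]
fake = [[0.0, 0.0], [1.0, 1.0]]
unit_critic = lambda x: 0.6 * x[0] + 0.8 * x[1]   # input-gradient norm 1
steep_critic = lambda x: 3.0 * x[0] + 4.0 * x[1]  # input-gradient norm 5
print(critic_loss(unit_critic, real, fake, rng))
print(critic_loss(steep_critic, real, fake, rng))
```

With the linear toy critics the input gradient is constant, so the penalty vanishes for the unit-norm critic and grows with the squared distance of the gradient norm from 1 for the steeper one.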
