Bioinformatics and Machine Learning to Improve Cotton by Means of Transcription Factors and SSRs
DOI:
https://doi.org/10.64252/jbm5t013
Keywords:
regulatory networks, gene evolution, comparative genomics, genomic selection, computational biology
Abstract
This study integrated bioinformatics tools and machine learning techniques to analyze transcription factors (TFs) and simple sequence repeat (SSR) markers in Gossypium species, focusing on their relationship with fiber quality and stress tolerance. A comparative analysis of three species (the tetraploid G. hirsutum and its diploid ancestors G. arboreum and G. raimondii) identified 9,306 non-redundant TFs. In G. hirsutum, regulatory families such as MYB and bHLH were notably expanded, likely as a consequence of polyploidy. SSR analysis revealed species-specific patterns: dinucleotide repeats predominated in G. raimondii, whereas trinucleotide motifs were more frequent in G. hirsutum, suggesting divergent evolutionary pathways. Predictive modeling showed that 78% of TFs are conserved, while 2,109 clusters contained single-copy genes, indicating gene loss or functional specialization. Experimental validation confirmed the functional role of TFs such as GhHOX3 and MYB48 in fiber development, as well as the association of specific SSRs with differential gene expression. These findings improve our understanding of regulatory networks in cotton and provide molecular markers for breeding programs. The methodology highlights the potential of computational approaches to accelerate the functional characterization of candidate genes in crops, reducing the time and cost associated with traditional methods.
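The comparison of dinucleotide and trinucleotide SSR frequencies described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names `find_ssrs` and `count_by_unit_length`, the regular-expression approach, and the minimum repeat thresholds (6 for dinucleotides, 5 for trinucleotides) are all assumptions chosen for illustration, and the input sequence is a toy example rather than cotton data.

```python
import re
from collections import Counter


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect SSRs in a DNA sequence.

    min_repeats maps motif length to the minimum number of tandem copies.
    The default thresholds are illustrative assumptions, not values
    taken from the study.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5}
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A captured motif followed by at least (min_rep - 1) more copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            repeats = len(match.group(0)) // unit_len
            hits.append((match.start(), motif, repeats))
    return hits


def count_by_unit_length(hits):
    """Count SSRs per motif length (2 = dinucleotide, 3 = trinucleotide)."""
    return Counter(len(motif) for _, motif, _ in hits)


# Toy sequence: one (AT)7 dinucleotide repeat and one (GAA)5 trinucleotide repeat.
toy = "GG" + "AT" * 7 + "CCG" + "GAA" * 5 + "TT"
ssrs = find_ssrs(toy)
print(ssrs)
print(count_by_unit_length(ssrs))
```

Running the same counting step on sequences from each species and comparing the resulting tallies would give the kind of motif-class profile the abstract refers to; a real analysis would also need to handle imperfect repeats, longer motifs, and both strands.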