Machine learning approaches to identify core and dispensable genes in pangenomes
Blog Article
Abstract A gene in a given taxonomic group is either present in every individual (core) or absent from at least one individual (dispensable). Previous pangenomic studies have identified certain functional differences between core and dispensable genes. However, determining whether a gene belongs to the core or dispensable portion of the genome requires the construction of a pangenome, which involves sequencing the genomes of many individuals. Here we aim to leverage the previously characterized core and dispensable gene content for two grass species [Brachypodium distachyon (L.) P. Beauv. and Oryza sativa L.] to construct a machine learning model capable of accurately classifying genes as core or dispensable using only a single annotated reference genome.
Such a model may mitigate the need for pangenome construction, an expensive hurdle especially in orphan crops, which often lack adequate genomic resources.
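The core/dispensable definition above can be made concrete with a small sketch. It assumes a gene presence/absence table has already been derived from a pangenome. The function name `label_genes`, the gene and individual identifiers, and the toy data are all hypothetical and used only for illustration. This is not the classification model described in the abstract, only the labeling rule that would produce its training targets.

```python
def label_genes(presence):
    """Label each gene as 'core' or 'dispensable'.

    presence: dict mapping gene ID -> dict of individual ID -> bool,
    where True means the gene was found in that individual.
    A gene is core only if it is present in every individual.
    """
    labels = {}
    for gene, calls in presence.items():
        labels[gene] = "core" if all(calls.values()) else "dispensable"
    return labels


# Hypothetical toy presence/absence data for three individuals.
presence = {
    "geneA": {"ind1": True, "ind2": True, "ind3": True},
    "geneB": {"ind1": True, "ind2": False, "ind3": True},
    "geneC": {"ind1": False, "ind2": False, "ind3": True},
}

labels = label_genes(presence)
for gene, label in labels.items():
    print(gene, label)
```

In this toy table only `geneA` is present in all three individuals, so it alone is labeled core. Labels of this kind are what a single-reference classifier would be trained to predict.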