Genomic selection in crop plants
Introduction
Plant breeding is changing profoundly as genomics advances rapidly. Genomic selection (GS), also known as genome-wide selection or genomic prediction, is one of the most powerful new techniques in plant breeding. In contrast to traditional breeding techniques, which rely mostly on phenotypic performance and pedigree, genomic selection uses genome-wide molecular marker data to predict an individual's genetic merit. This lets breeders select candidates earlier, shorten breeding cycles, improve selection accuracy, and increase genetic gain.
1. Definition and Background History
Genomic selection: what is it?
Genomic selection (GS) is the process of estimating breeding values, known as genomic estimated breeding values (GEBVs), for selection candidates from genome-wide marker data, even before the candidates have been fully phenotyped. Theo Meuwissen and colleagues first proposed the idea in 2001 for livestock, and it has since been adapted for plant breeding.
Essentially, GS estimates the effects of all, or a very large number of, markers simultaneously. It therefore captures more of the underlying genetic variation across the genome than marker-assisted selection (MAS), which relies on a limited number of significant marker-trait associations. This wider capture of variation makes GS particularly well suited to complex quantitative traits governed by many small-effect loci.
Historical Context
Plant breeders relied solely on pedigree data and phenotypic selection in the early days. Later, marker-assisted selection (MAS) gained popularity with the introduction of molecular markers (RFLPs, SSRs) and quantitative trait loci (QTL) mapping. However, for highly polygenic traits—traits governed by numerous genes with little individual influence—MAS proved to be limited. The next step was GS, which allows breeders to try to predict breeding value directly from marker data thanks to dense marker coverage and enhanced statistical techniques.
From proof-of-concept in cereals like maize and wheat to broader deployment in legumes, forages, and tree species, reviews have detailed the development of GS in plants.
2. The Genomic Selection Process and Principle
The principle
This is the main concept:
1. A training population, also known as the reference population, is genotyped at high density across the genome and phenotyped for the desired traits.
2. A prediction model that connects marker genotypes to phenotypic performance (or breeding value) is constructed using the training population data.
3. After that, a selection population of candidates (which may not yet have complete phenotypes) is genotyped. Genomic estimated breeding values (GEBVs) are calculated by applying the prediction model to their genotype data.
4. Instead of waiting for complete phenotypic data, individuals are chosen based on their GEBVs, which allows for earlier selection, shorter cycles, and possibly more precise selection of superior genotypes.
Key steps and workflow
Genotyping: Individuals in the training and selection populations are genotyped with high-density genome-wide markers, such as SNPs.
Phenotyping: Reliable phenotypic data on the target traits (yield, quality, disease resistance, and abiotic stress tolerance) are collected on the training population.
Creating statistical models: To estimate breeding values or marker effects, models combine phenotypic and marker data.
Prediction: GEBVs for candidates are calculated using the model.
Selection: Breeding and advancement decisions are based on GEBVs.
Validation and updating: To ensure accuracy, the model is validated (for example, through cross-validation) and updated on a regular basis with fresh phenotypic and genotypic data.
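The workflow above can be sketched end to end on simulated data. This is a minimal illustration, not a production pipeline: the population sizes, marker effects, noise level, and the fixed ridge penalty are all assumptions chosen for the example, and the ridge regression used here is only one of several possible prediction models.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Simulated data (all sizes and values are illustrative assumptions) ---
n_train, n_cand, n_markers = 200, 50, 500
n_total = n_train + n_cand

# Genotyping: SNP genotypes coded as 0/1/2 copies of one allele
allele_freq = rng.uniform(0.1, 0.9, size=n_markers)
genotypes = rng.binomial(2, allele_freq, size=(n_total, n_markers)).astype(float)

# True marker effects and breeding values (unknown in a real program)
true_effects = rng.normal(0.0, 0.1, size=n_markers)
true_bv = genotypes @ true_effects

# Phenotyping: phenotype = breeding value + environmental noise
noise_sd = true_bv.std()  # noise variance equal to genetic variance
phenotypes = true_bv + rng.normal(0.0, noise_sd, size=n_total)

train = slice(0, n_train)       # genotyped and phenotyped
cand = slice(n_train, n_total)  # genotyped only

# Model building: ridge regression of phenotype on centered marker genotypes
marker_means = genotypes[train].mean(axis=0)
Z_train = genotypes[train] - marker_means
y_train = phenotypes[train]
y_mean = y_train.mean()
ridge_penalty = 200.0  # assumed here; normally estimated or tuned
lhs = Z_train.T @ Z_train + ridge_penalty * np.eye(n_markers)
marker_effects = np.linalg.solve(lhs, Z_train.T @ (y_train - y_mean))

# Prediction: GEBVs for candidates from their genotypes alone
Z_cand = genotypes[cand] - marker_means
gebv = y_mean + Z_cand @ marker_effects

# Selection: keep the top 10% of candidates by GEBV
n_selected = max(1, int(0.1 * n_cand))
selected = np.argsort(gebv)[::-1][:n_selected]

# Validation: correlation with true breeding values (possible only in simulation)
accuracy = np.corrcoef(gebv, true_bv[cand])[0, 1]
print(f"Selected candidates: {selected}, accuracy: {accuracy:.2f}")
```

In a real program the true breeding values are not observable, so the last step is replaced by cross-validation or by comparison with phenotypes collected later.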
3. Genomic Selection Statistical Models and Techniques
Several statistical methods have been developed for GS. The choice of model affects interpretability, computational demands, and prediction accuracy.
Typical models
1 RR-BLUP (Ridge Regression Best Linear Unbiased Prediction): applies ridge-regression shrinkage, assuming that all marker effects are normally distributed with equal variance.
2 G-BLUP (Genomic BLUP): extends BLUP models by replacing the pedigree relationship matrix with a genomic relationship matrix derived from marker data.
3 Bayesian methods (BayesA, BayesB, BayesCπ, etc.): use shrinkage priors that allow markers to contribute unequally, with some having larger effects than others.
4 LASSO (least absolute shrinkage and selection operator): a penalized regression technique that combines shrinkage with marker selection.
5 Machine learning and deep learning: random forests, support vector machines (SVMs), convolutional/deep neural networks, and hybrid approaches are increasingly used in GS. Deep models have been reported to improve accuracy in some settings.
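The first two models in the list are closely related, which a short numerical sketch can make concrete. The function names below are hypothetical, the genomic relationship matrix uses the common VanRaden-style scaling as an assumption, and the variance ratios are passed in as known numbers, whereas in practice they would be estimated (for example by REML).

```python
import numpy as np

def genomic_relationship_matrix(genotypes):
    """Build G = Z Z' / c from 0/1/2 genotypes (VanRaden-style scaling).

    Returns G, the centered marker matrix Z, and the scaling constant c.
    """
    p = genotypes.mean(axis=0) / 2.0   # observed allele frequencies
    Z = genotypes - 2.0 * p            # center each marker column
    c = 2.0 * np.sum(p * (1.0 - p))    # scales G to the pedigree-matrix scale
    return Z @ Z.T / c, Z, c

def rr_blup_marker_effects(Z, y, ridge_penalty):
    """RR-BLUP marker effects: (Z'Z + lambda I)^-1 Z'y, with y centered."""
    n_markers = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + ridge_penalty * np.eye(n_markers), Z.T @ y)

def g_blup_values(G, y, variance_ratio):
    """G-BLUP genomic values: G (G + lambda_g I)^-1 y, with y centered."""
    n = G.shape[0]
    return G @ np.linalg.solve(G + variance_ratio * np.eye(n), y)

# Small demonstration on random data (illustrative only)
rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(40, 120)).astype(float)
y = rng.normal(size=40)
y = y - y.mean()

G, Z, c = genomic_relationship_matrix(M)
lambda_g = 1.0
values_from_markers = Z @ rr_blup_marker_effects(Z, y, c * lambda_g)
values_from_G = g_blup_values(G, y, lambda_g)
max_difference = np.max(np.abs(values_from_markers - values_from_G))
print(f"Largest difference between the two predictions: {max_difference:.2e}")
```

With the ridge penalty set to c times the G-BLUP variance ratio, the two approaches give the same genomic values up to rounding error. G-BLUP solves a system the size of the number of individuals, while RR-BLUP solves one the size of the number of markers, so the cheaper route depends on which is smaller.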
Assessment of the model and estimation of its accuracy
The correlation between GEBVs and observed phenotypes (or true breeding values) in validation sets (e.g., cross-validation) is frequently used to measure prediction accuracy. Training population size, trait heritability, marker density, genetic relationships, genotype × environment (G×E) interactions, and other factors all have a significant impact on model accuracy.
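A k-fold cross-validation of the kind described above might look like the following sketch. The function name, the ridge model, and the fixed penalty are assumptions made for illustration; any of the models listed earlier could be substituted inside the loop.

```python
import numpy as np

def kfold_predictive_ability(genotypes, phenotypes, ridge_penalty, k=5, seed=0):
    """Mean correlation between predicted and observed phenotypes over k folds."""
    rng = np.random.default_rng(seed)
    n = len(phenotypes)
    folds = np.array_split(rng.permutation(n), k)
    correlations = []
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False

        # Center markers and phenotypes using the training fold only,
        # so no information leaks from the held-out individuals
        marker_means = genotypes[train_mask].mean(axis=0)
        Z_train = genotypes[train_mask] - marker_means
        y_train = phenotypes[train_mask]
        y_mean = y_train.mean()

        # Ridge regression marker effects fitted on the training fold
        n_markers = Z_train.shape[1]
        lhs = Z_train.T @ Z_train + ridge_penalty * np.eye(n_markers)
        effects = np.linalg.solve(lhs, Z_train.T @ (y_train - y_mean))

        # Predict the held-out fold and compare with its observed phenotypes
        predicted = y_mean + (genotypes[test_idx] - marker_means) @ effects
        correlations.append(np.corrcoef(predicted, phenotypes[test_idx])[0, 1])
    return float(np.mean(correlations))
```

The value returned is a correlation with observed phenotypes, often called predictive ability. A common approximation for accuracy against true breeding values divides it by the square root of the trait heritability.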
Crucial factors to take into account
Training population design: Accuracy depends on the size and diversity of the training population and on how well it represents the candidate population.
Marker density and coverage: In general, denser markers capture more of the genetic variation and are more likely to be in linkage disequilibrium with causal loci.
Trait architecture: Accuracy is typically higher for traits with simpler genetic architectures and high heritability.
G×E and non-additive effects: In multi-environment breeding, models that disregard G×E interactions and non-additive effects may perform poorly.
Model updating: Relationships between the training and selection sets weaken over time, so models may need to be retrained.
4. Important Elements Affecting the Accuracy of Genomic Selection
Understanding and optimizing accuracy and expected genetic gain is a central theme in GS research. Let's look at the important elements.
1 Training population size and relatedness
Larger training populations typically increase prediction accuracy, although with diminishing returns.
The predictive ability increases with the degree of genetic similarity between the training and selection populations.
2 Coverage and density of markers
More genome-wide variation can be captured with the use of genotyping-by-sequencing (GBS) or dense SNP arrays.
3 Genetic architecture and trait heritability
Predicting traits with high heritability is simpler. When it comes to highly polygenic traits (many genes), GS performs better than MAS.
Models that incorporate non-additive genetic variation (dominance, epistasis) may be more accurate.
Adequate marker density makes it more likely that markers are in linkage disequilibrium (LD) with the causal loci, which improves prediction.
4 Interaction between genotype and environment (G×E)
If G×E is not accurately modeled, it can lower prediction accuracy in multi-environment trials. Robustness is increased by using multi-environment and multi-trait models.
5 Quantity and quality of phenotypic information
It is crucial to accurately phenotype the training population because weak models result from inaccurate phenotypes.
Data quality is being improved more and more through the use of high-throughput phenotyping (HTP) technologies (such as sensors and drones).
6 Length of breeding cycle and selection intensity
Higher annual genetic gain can result from shorter cycle times and more intense selection. Shorter cycles and earlier selection are made possible by GS.
7 Model updating and changes in the population's genetic makeup
Allele frequencies, relationships, and LD structure change over breeding cycles; accuracy is maintained through periodic retraining.
8 Financial and logistical limitations
Practical implementation is influenced by human expertise, data handling, infrastructure, and genotyping costs.
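Several of the factors above (accuracy, selection intensity, and cycle length) combine in the standard breeder's equation, in which expected gain per year equals selection intensity × accuracy × additive genetic standard deviation ÷ cycle length in years. The sketch below applies it to two hypothetical scenarios; every number is invented for illustration and does not come from any breeding program.

```python
def annual_genetic_gain(selection_intensity, accuracy, additive_sd, cycle_years):
    """Expected genetic gain per year from the breeder's equation."""
    if cycle_years <= 0:
        raise ValueError("cycle_years must be positive")
    return selection_intensity * accuracy * additive_sd / cycle_years

# Hypothetical comparison: same population and selection intensity
# (1.75 is roughly the intensity for selecting the top 10%)
gain_phenotypic = annual_genetic_gain(
    selection_intensity=1.75, accuracy=0.7, additive_sd=1.0, cycle_years=5.0
)
gain_gs = annual_genetic_gain(
    selection_intensity=1.75, accuracy=0.5, additive_sd=1.0, cycle_years=2.0
)
print(f"Phenotypic selection: {gain_phenotypic:.4f} SD units per year")
print(f"Genomic selection:    {gain_gs:.4f} SD units per year")
```

Under these assumed values GS gives more gain per year despite lower accuracy per cycle, because the cycle is much shorter. With different assumptions the comparison can go the other way, which is why cycle redesign matters as much as the prediction model.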
5. Genomic Selection's Use in Plant Breeding
Numerous breeding programs and a broad variety of crop species have incorporated GS. Here are some important uses and illustrations.
5.1 Cereals (rice, wheat, and maize)
GS has demonstrated significant genetic gains in maize (Zea mays) when used to select hybrid parents and in recurrent selection schemes.
GS is used for quality traits, disease resistance, and yield in wheat (Triticum aestivum) in multi-environment trials.
Research on GS for grain weight, stress tolerance, and climate change adaptation in rice (Oryza sativa) is moving forward.
5.2 Oilseeds, pulses, and legumes
For traits like drought tolerance, nitrogen fixation, disease resistance, and yield stability, GS is being used more and more in breeding programs for chickpea, soybean, groundnut, and other pulses.
5.3 Perennial crops, trees, and forages
GS has enormous potential in tree species and perennial forages with lengthy generation cycles. Because trees have a longer generation interval, earlier selection through GS is particularly beneficial.
5.4 Enhancement of germplasm and pre-breeding
Additionally, valuable alleles can be extracted from gene banks and added to elite breeding germplasm using GS. Genetic diversity enters breeding pipelines more quickly as a result.
5.5 Combining environmental and high-throughput phenotyping
Breeders can capture more complex traits (such as canopy temperature and stress responses) and increase model accuracy by combining GS with high-throughput phenotyping (HTP) platforms (such as imaging, drones, and sensors) and enviromics (environmental data).
6. Genomic Selection's Advantages
The following are the main benefits of GS for plant breeding:
Reduced breeding cycles: Early individual selection reduces the amount of time needed for thorough phenotypic assessments, allowing for quicker variety development.
Increased accuracy: GS can offer more accurate selection than phenotypic selection alone, particularly for traits that are difficult to measure or that can only be assessed late in the cycle.
More effective use of resources: the potential to test larger populations, reduced costs per unit gain, and less dependence on lengthy field trials.
Improved management of complex traits: GS handles polygenic traits, which are controlled by many genes and often have low heritability, better than MAS.
Greater annual genetic gain: Faster cycles combined with increased accuracy raise the genetic gain per year, which is the ultimate aim of breeding.
Genetic diversity management: Relationships and diversity in breeding populations can be more effectively tracked and controlled with genome-wide data.
7. Restrictions, Difficulties, and Useful Considerations
Despite its potential, GS has drawbacks. Breeders and students need to understand the scientific and practical limitations.
Among the difficulties are:
High initial cost: Even though genotyping costs have decreased, it is still costly to put together sizable training populations with high-density genotypes and high-quality phenotypes, especially in breeding programs with limited funding.
Need for large, representative training populations: Inadequate training sets (in size or diversity) limit transferability and reduce prediction accuracy.
Changing genetic backgrounds: Allele frequencies and relationships change as breeding cycles go on, and if models aren't updated, their accuracy may deteriorate.
G×E and complex trait architecture: Non-additive genetic effects (dominance, epistasis) and multi-environment contexts complicate modeling and can lower accuracy if ignored.
Phenotyping bottleneck: The quality of phenotypic data is still a significant constraint because subpar data leads to subpar predictions.
Computational and statistical know-how: GS calls for sophisticated statistical modeling, data administration, hardware and software infrastructure, and knowledge that might not be accessible everywhere.
Integration into breeding programs: It can be difficult to put into practice (redesigning pipelines, training breeders, reallocating funds, etc.).
Issues with diversity, ethics, and data sharing: There may be issues with germplasm access, genomic data ownership, and fair benefit sharing.
Realistic implementation considerations
1 Establishing a training population that is the proper size and diversity.
2 Ensuring cost-effective genotyping platforms with adequate marker density.
3 Investing in multi-environment trials and trustworthy phenotyping.
4 Creating breeding pipelines (such as speed breeding, integration with doubled haploids, rapid cycles, and early generation selection) that take advantage of GS.
5 Keeping an eye on prediction decay and updating models frequently.
6 Combining high-throughput phenotyping, enviromics, MAS for major genes, and GS with traditional selection.
7 Analyzing cost-benefit: what return on investment does GS deliver for your breeding program?
8. How to Use Genomic Selection in a Breeding Program
Here is a recommended road map for plant breeders considering incorporating GS into their programs:
1 Goal-setting and assessment: Determine which traits to focus on (complex quantitative traits are good candidates), establish goals for genetic gain, and assess the resources that are currently available.
2 Training population development: Identify and phenotype a well-characterized set of germplasm that reflects the diversity and relationship structure of your breeding pool.
3 Genotyping: Select a genotyping platform with sufficient marker coverage, such as SNP arrays or GBS.
4 Phenotyping: Manage experimental design, spatial variation, and data cleaning while conducting high-quality phenotyping in various settings.
5 Model building: Select appropriate statistical models, assess their accuracy by cross-validation, and choose the best-performing model or models.
6 Prediction and selection: Genotype the selection candidates, calculate their GEBVs, and rank and select individuals using indices that combine GEBV, cost, and other constraints.
7 Breeding cycle redesign: Incorporate GS into your pipeline (e.g., by using rapid generation advance and doubled haploids, reducing the number of phenotypic evaluation years, or selecting at F2 or other early generations).
8 Model updating and monitoring: Retrain and update models, track accuracy, and modify the makeup of the training population as new genotypic and phenotypic data become available.
9 Continuous improvement and cost-benefit analysis: Monitor the cost per cycle and the genetic gain per unit of time, and adjust procedures as necessary.
Plant breeding optimization studies show that GS efficiency is increased by using multi-trait/multi-environment models, training population design, field design, and HTP.
9. Prospects for the Future and New Developments in Genomic Selection
GS has a promising future, and a few new developments are noteworthy:
1 Integration with omics data: Adding transcriptomics, metabolomics, proteomics, and epigenomics to GS models could raise prediction accuracy beyond what markers alone achieve.
2 Deep learning and machine learning: More sophisticated algorithms (such as transformers and convolutional neural networks) are being evaluated for their capacity to simulate intricate and non-linear genotype-phenotype relationships.
3 High-throughput phenotyping and envirotyping: Breeders can generate richer datasets for GS by capturing detailed information on traits and environmental response using sensor data, drones, imaging, and enviromics.
4 Fast breeding cycles and speed breeding: By combining GS with methods like single-seed descent, doubled haploids, and controlled environment rapid generation advance, breeding cycles can be turned over even more quickly.
5 Accessibility and cost reduction: Smaller breeding programs and areas with fewer resources can now afford GS as genotyping and phenotyping costs decline.
6 Climate-resilient crops: GS will be crucial in breeding crops that are resilient and able to adapt to abiotic stress, changing climate conditions, and new pests and diseases.
7 Open data, worldwide training populations: Accuracy and applicability across germplasm pools will be improved by programs sharing genotypic/phenotypic datasets and cooperative training populations.
8 Breeding system integration: As part of a "modern breeding triangle" that also includes MAS, gene editing (CRISPR), speed breeding, and other technologies, GS will become a standard and essential part of breeding programs.
10. Case Study Synopsis: Genomic Selection of Maize and Wheat
Zea mays, or maize
Maize breeding programs embraced GS early. Selection decisions have improved, for instance, when hybrid performance is predicted from parental inbred lines. In recurrent selection populations, GS has also resulted in shorter cycle times.
Triticum aestivum, or wheat
GS has been applied to wheat in multi-environment trials for yield, quality, and disease resistance. Accuracy is increased in models that use G×E effects and multi-environment data.
The observable outcome was improved candidate line selection, more effective trial resource allocation, and higher annual genetic gain when compared to traditional pipelines.
11. Synopsis and Key Takeaways
1 Genomic selection is a potent and contemporary breeding technique that estimates breeding values (GEBVs) and speeds up genetic improvement using genome-wide markers and statistical models.
2 For complex traits, especially those governed by numerous small-effect genes, GS performs better than traditional marker-assisted selection.
3 Large and carefully planned training populations, dense genotyping, high-quality phenotyping, reliable statistical models, and integration into breeding pipelines are all necessary for successful implementation.
4 Training population size and relationship, marker density, trait heritability, G×E interactions, phenotypic data quality, and cycle time are important factors that affect accuracy.
5 Many crops, such as trees, perennials, legumes, and cereals, have embraced GS, leading to quicker cycles and greater genetic gains.
6 Shorter breeding cycles, increased accuracy, cost-effectiveness, enhanced management of complex traits, and increased annual genetic gain are all noteworthy advantages.
7 The cost of genotyping and phenotyping, the requirement for sophisticated modeling, shifting populations, the complexity of G×E, and practical integration into breeding programs are still obstacles.
8 In the future, GS's power will be further increased by combining it with omics, machine learning, HTP, speed-breeding, open data, and international collaborations.
9 Understanding genetics, quantitative trait modeling, statistics, data science, and practical breeding strategy is all necessary for students and plant scientists to master GS.
Keywords: genomic selection, genomic prediction, genomic estimated breeding values, plant breeding, genome-wide markers, training population, prediction accuracy, genotyping, high-throughput phenotyping, genetic gains in crops
(Note: The article was created by ChatGPT; however, conceptualization, review, and editing of this article were done by Dr. UKS Kushwaha.)