-
Sequencing genes in silico using single nucleotide polymorphisms
Background:
The advent of high throughput sequencing technology has enabled the 1000 Genomes Project Pilot 3 to generate complete sequence data for more than 906 genes and 8,140 exons representing 697 subjects. The 1000 Genomes database provides a critical opportunity for further interpreting disease associations with single nucleotide polymorphisms (SNPs) discovered from genetic association studies. Currently, direct sequencing of candidate genes or regions on a large number of subjects remains both cost- and time-prohibitive.
Results:
To accelerate the translation from discovery to functional studies, we propose an in silico gene sequencing method (ISS), which predicts phased sequences of intragenic regions, using SNPs. The key underlying idea of our method is to infer diploid sequences (a pair of phased sequences/alleles) at every functional locus utilizing the deep sequencing data from the 1000 Genomes Project and SNP data from the HapMap Project, and to build prediction models using flanking SNPs. Using this method, we have developed a database of predictive models for 611 known genes. Sequence prediction accuracy for these genes is 96.26% on average (ranges 79% - 100%). This database of predictive models can be enhanced and scaled up to include new genes as the 1000 Genomes Project sequences additional genes on additional individuals. Applying our predictive model for the KCNJ11 gene to the Wellcome Trust Case Control Consortium (WTCCC) Type 2 diabetes cohort, we demonstrate how the prediction of phased sequences inferred from GWAS SNP genotype data can be used to facilitate interpretation and identify a probable functional mechanism such as protein changes.
Conclusions:
Until routine sequencing of all subjects becomes generally available, the ISS method proposed here provides a time- and cost-effective approach to broadening the characterization of disease-associated SNPs and regions, and to facilitating the prioritization of candidate genes for more detailed functional and mechanistic studies.
-
Assessing the joint effect of population stratification and sample selection in studies of gene-gene (environment) interactions
Background:
It is well known that the presence of population stratification (PS) may cause the usual test in case-control studies to produce spurious gene-disease associations. However, the joint impact of PS and sample selection (SS) is less well understood. In this paper, we provide a systematic study of the joint effect of PS and SS under a more general risk model containing genetic and environmental factors. We provide simulation results to show the magnitude of the bias and its impact on the type I error rate of the usual chi-square test under a wide range of PS levels and selection biases.
Results:
The biases in the estimation of main and interaction effects are quantified and their bounds derived. The estimated bounds can be used to compute conservative p-values for the association tests. If the conservative p-value is smaller than the significance level, we can safely claim that the association test is significant regardless of whether PS or selection bias is present. We also identify conditions under which the bias is null. The bias depends on the allele frequencies, exposure rates, gene-environment odds ratios and disease risks across subpopulations, and on the sampling in the case and control populations.
Conclusion:
Our results show that the bias cannot be ignored even when the case and control data are matched on ethnicity. A real example is given to illustrate the application of the conservative p-value. These results are useful for genetic association studies of main and interaction effects.
-
An R package "VariABEL" for genome-wide searching of potentially interacting loci by testing genotypic variance heterogeneity
Background:
Hundreds of new loci have been discovered by genome-wide association studies of human traits. These studies mostly focused on associations between a single locus and a trait. Interactions between genes and between genes and environmental factors are of interest as they can improve our understanding of the genetic background underlying complex traits. Genome-wide testing of complex genetic models is a computationally demanding task. Moreover, testing of such models leads to multiple comparison problems that reduce the probability of new findings. Assuming that the genetic model underlying a complex trait can include hundreds of genes and environmental factors, testing of these models in genome-wide association studies presents substantial difficulties. We, and Pare with colleagues (2010), developed a method that overcomes such difficulties. The method is based on the fact that loci which are involved in interactions can show genotypic variance heterogeneity of a trait. Genome-wide testing of such heterogeneity can be a fast scanning approach which can point to the interacting genetic variants.
Results:
In this work we present a new method, SVLM, allowing for variance heterogeneity analysis of imputed genetic variation. Type I error and power of this test are investigated and contrasted with those of Levene's test. We also present an R package, VariABEL, implementing existing and newly developed tests.
Conclusions:
Variance heterogeneity analysis is a promising method for detection of potentially interacting loci. The new method and software package developed in this work will facilitate such analysis in a genome-wide context.
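As a rough illustration of the variance heterogeneity idea, the following Python sketch computes a Levene-type statistic W for a trait grouped by genotype. It is not the VariABEL implementation and not the SVLM method; the function name levene_w, the median centring (the Brown-Forsythe variant) and the toy trait values are all assumptions made for the example.

```python
from statistics import mean, median


def levene_w(groups, center=median):
    """Levene-type statistic W for equality of variances across groups.

    `groups` is a list of lists of trait values, one list per genotype
    class (hypothetical layout). Larger W suggests stronger variance
    heterogeneity; a p-value would need the F(k-1, N-k) distribution,
    which is left out to keep the sketch standard-library only.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")
    # Absolute deviations of each value from its own group's centre.
    z = [[abs(x - center(g)) for x in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    if within == 0:
        raise ValueError("no within-group variation in the deviations")
    return (n_total - k) / (k - 1) * between / within


# Toy trait values for genotype classes AA, AB and BB (invented numbers).
same_spread = [[1, 2, 3], [11, 12, 13], [21, 22, 23]]
different_spread = [[0, 1, 2], [0, 10, 20]]
print(levene_w(same_spread))       # groups differ in mean only
print(levene_w(different_spread))  # groups differ in spread
```

Groups that differ only in their means give W equal to zero, which is the property that lets such a test pick out loci affecting trait variance rather than trait level.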
-
Correction: The B chromosome of the cichlid fish Haplochromis obliquidens harbors 18S rRNA genes
After the publication of our work [1], we detected that the species focus of the study, Astatotilapia latifasciata (Figure 1), was erroneously identified as Haplochromis obliquidens. This species was described as Haplochromis latifasciatus [2] and later ascribed to the genus Astatotilapia [3]. Our mistake comes from the fact that this species is also frequently listed as Haplochromis "zebra obliquidens" in the aquarium trade. Astatotilapia latifasciata has been reported to occur in Lake Nawampasa, a small satellite lake of the much larger Lake Kyoga, and in Lake Kyoga, located north of Lake Victoria in Uganda [3].
1. Poletto AB, Ferreira IA, Martins C: The B chromosome of the cichlid fish Haplochromis obliquidens harbors 18S rRNA genes. BMC Genetics 2010, 11:1.
-
Correction: Chromosome differentiation patterns during cichlid fish evolution
After the publication of our work [1], we detected that one of the species analyzed in the study, Astatotilapia latifasciata (Figure 1), was erroneously identified as Haplochromis obliquidens. This species was described as Haplochromis latifasciatus [2] and later ascribed to the genus Astatotilapia [3]. Our mistake comes from the fact that this species is also frequently listed as Haplochromis "zebra obliquidens" in the aquarium trade. Astatotilapia latifasciata has been reported to occur in Lake Nawampasa, a small satellite lake of the much larger Lake Kyoga, and in Lake Kyoga, located north of Lake Victoria in Uganda [3].
1. Poletto AB, Ferreira IA, Cabral-de-Mello DC, Nakajima RT, Mazzuchelli J, Ribeiro HB, Venere PC, Nirchio M, Kocher TD, Martins C: Chromosome differentiation patterns during cichlid fish evolution. BMC Genetics 2010, 11:50.
-
UPDG: Utilities package for data analysis of Pooled DNA GWAS
Background:
Despite being a well-established strategy for cost reduction in disease gene mapping, the pooled DNA association study is much less popular than the individual DNA approach. This is especially true for pooled DNA genome-wide association studies (GWAS), for which very few computing resources have been developed for data analysis. This motivates the development of UPDG (Utilities package for data analysis of Pooled DNA GWAS).
Results:
UPDG represents a generalized framework for data analysis of pooled DNA GWAS with the integration of Unix/Linux shell operations, Perl programs and R scripts. With the input of raw intensity data from GWAS, UPDG performs the following tasks in a stepwise manner: raw data manipulation, correction for allelic preferential amplification, normalization, nested analysis of variance for genetic association testing, and summarization of analysis results. Detailed instructions, procedures and commands are provided in the comprehensive user manual describing the whole process from preliminary preparation of software installation to final outcome acquisition. An example dataset (input files and sample output files) is also included in the package so that users can easily familiarize themselves with the data file formats, working procedures and expected output. Therefore, UPDG is especially useful for users with some computer knowledge, but without a sophisticated programming background.
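To make the "correction for allelic preferential amplification" step more concrete, here is a minimal Python sketch of one commonly used form of that correction, in which the second allele's pooled intensity is rescaled by a factor k estimated from heterozygous individuals. Whether UPDG applies exactly this formula is an assumption; the function name, the argument names and the example intensities are hypothetical and do not come from the UPDG manual.

```python
def corrected_allele_frequency(intensity_a, intensity_b, k=1.0):
    """Estimate the frequency of allele A in a DNA pool.

    intensity_a, intensity_b: raw pooled signal intensities of alleles A and B.
    k: correction factor, assumed here to be the mean A/B intensity ratio
       observed in known heterozygotes (k = 1 means no preferential
       amplification). All names are illustrative.
    """
    if intensity_a < 0 or intensity_b < 0:
        raise ValueError("intensities must be non-negative")
    if k <= 0:
        raise ValueError("k must be positive")
    denominator = intensity_a + k * intensity_b
    if denominator == 0:
        raise ValueError("both intensities are zero")
    return intensity_a / denominator


# Invented intensities: allele A amplifies twice as strongly as allele B.
uncorrected = corrected_allele_frequency(2.0, 1.0)        # k = 1, no correction
corrected = corrected_allele_frequency(2.0, 1.0, k=2.0)   # rescales B by k
print(uncorrected, corrected)
```

With k = 2 the apparent excess of allele A disappears and the pool is estimated to carry both alleles at equal frequency, which is the kind of bias the correction step is meant to remove before association testing.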
Conclusions:
UPDG provides a free, simple and platform-independent one-stop service to scientists who work on pooled DNA GWAS data analysis but lack advanced programming knowledge. Through UPDG, we aim to lower the barriers to data analysis of pooled DNA GWAS. More importantly, we hope to promote the popularity of pooled DNA GWAS, which is a very useful research strategy.
-
Next generation DNA sequencing technology delivers valuable genetic markers for the genomic orphan legume species, Bituminaria bituminosa
Background:
Bituminaria bituminosa is a perennial legume species from the Canary Islands and Mediterranean region that has potential as a drought-tolerant pasture species and as a source of pharmaceutical compounds. Three botanical varieties have previously been identified in this species: albomarginata, bituminosa and crassiuscula. B. bituminosa can be considered a genomic 'orphan' species with very few genomic resources available. New DNA sequencing technologies provide an opportunity to develop high quality molecular markers for such orphan species.
Results:
432,306 mRNA molecules were sampled from a leaf transcriptome of a single B. bituminosa plant using Roche 454 pyrosequencing, resulting in an average read length of 345 bp (149.1 Mbp in total). Sequences were assembled into 3,838 isotigs/contigs representing putatively unique gene transcripts. Gene ontology descriptors were identified for 3,419 sequences. Raw sequence reads containing simple sequence repeat (SSR) motifs were identified, and 240 primer pairs flanking these motifs were designed. Of 87 primer pairs developed this way, 75 (86.2%) successfully amplified primarily single fragments by PCR. Fragment analysis using 20 primer pairs in 79 accessions of B. bituminosa detected 130 alleles at 21 SSR loci. Genetic diversity analyses confirmed that variation at these SSR loci accurately reflected known taxonomic relationships in original collections of B. bituminosa and provided additional evidence that a division of the botanical variety bituminosa into two according to geographical origin (Mediterranean region and Canary Islands) may be appropriate. Evidence of cross-pollination was also found between botanical varieties within a B. bituminosa breeding programme.
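The summary figures quoted above can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers reported in the Results; the variable names are chosen for illustration.

```python
# Figures reported in the Results paragraph.
reads = 432_306            # sequenced mRNA molecules (reads)
total_bp = 149.1e6         # total sequence yield, 149.1 Mbp
primer_pairs_tested = 87
primer_pairs_amplified = 75

# Average read length is total yield divided by the number of reads.
mean_read_length = total_bp / reads

# PCR success rate as a percentage of the primer pairs tested.
success_rate = 100 * primer_pairs_amplified / primer_pairs_tested

print(round(mean_read_length))   # rounds to the reported 345 bp
print(round(success_rate, 1))    # rounds to the reported 86.2
```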
Conclusions:
B. bituminosa can no longer be considered a genomic orphan species, having now a large (albeit incomplete) repertoire of expressed gene sequences that can serve as a resource for future genetic studies. This experimental approach was effective in developing codominant and polymorphic SSR markers for application in diverse genetic studies. These markers have already given new insight into genetic variation in B. bituminosa, providing evidence that a division of the botanical variety bituminosa may be appropriate. This approach is commended to those seeking to develop useful markers for genomic orphan species.
-
Association, effects and validation of polymorphisms within the NCAPG - LCORL locus located on BTA6 with feed intake, gain, meat and carcass traits in beef cattle.
Background:
In a previously reported genome-wide association study based on a high-density bovine SNP genotyping array, 8 SNP were nominally associated (P ≤ 0.003) with average daily gain (ADG) and 3 of these were also associated (P ≤ 0.002) with average daily feed intake (ADFI) in a population of crossbred beef cattle. The SNP were clustered in a 570 kb region around 38 Mb on the draft sequence of bovine chromosome 6 (BTA6), an interval containing several positional and functional candidate genes including the bovine LAP3, NCAPG, and LCORL genes. The goal of the present study was to develop and examine additional markers in this region to optimize the ability to distinguish favorable alleles, with potential to identify functional variation.
Results:
Animals from the original study were genotyped for 47 SNP within or near the gene boundaries of the three candidate genes. Sixteen markers in the NCAPG-LCORL locus displayed significant association with both ADFI and ADG even after stringent correction for multiple testing (P ≤ 0.005). These markers were evaluated for their effects on meat and carcass traits. The alleles associated with higher ADFI and ADG were also associated with higher hot carcass weight (HCW) and ribeye area (REA), and lower adjusted fat thickness (AFT). A reduced set of markers was genotyped on a separate, crossbred population including genetic contributions from 14 beef cattle breeds. Two of the markers located within the LCORL gene locus remained significant for ADG (P ≤ 0.04).
Conclusions:
Several markers within the NCAPG-LCORL locus were significantly associated with feed intake and body weight gain phenotypes. These markers were also associated with HCW, REA and AFT suggesting that they are involved with lean growth and reduced fat deposition. Additionally, the two markers significant for ADG in the validation population of animals may be more robust for the prediction of ADG and possibly the correlated trait ADFI, across multiple breeds and populations of cattle.
-
Spatial and temporal variation in population genetic structure of wild Nile tilapia (Oreochromis niloticus) across Africa
Background:
Reconstructing the evolutionary history of a species is challenging. It often depends not only on the past biogeographic and climatic events but also the contemporary and ecological factors, such as current connectivity and habitat heterogeneity. In fact, these factors might interact with each other and shape the current species distribution. However, to what extent the current population genetic structure reflects the past and the contemporary factors is largely unknown. Here we investigated spatio-temporal genetic structures of Nile tilapia (Oreochromis niloticus) populations, across their natural distribution in Africa. While its large biogeographic distribution can cause genetic differentiation at the paleo-biogeographic scales, its restricted dispersal capacity might induce a strong genetic structure at micro-geographic scales.
Results:
Using nine microsatellite loci and 350 samples from ten natural populations, we found the highest genetic differentiation among the three ichthyofaunal provinces and regions (Ethiopian, Nilotic and Sudano-Sahelian) (RST = 0.38 - 0.69). This result suggests the predominant effect of paleo-geographic events at macro-geographic scale. In addition, intermediate divergences were found between rivers and lakes within the regions, presumably reflecting relatively recent interruptions of gene flow between hydrographic basins (RST = 0.24 - 0.32). The lowest differentiations were observed among connected populations within a basin (RST = 0.015 in the Volta basin). Comparison of temporal sample series revealed subtle changes in the gene pools in a few generations (F = 0 - 0.053). The estimated effective population sizes were 23 - 143 and the estimated migration rate was moderate (m ~ 0.094 - 0.097) in the Volta populations.
Conclusions:
This study revealed clear hierarchical patterns of the population genetic structuring of O. niloticus in Africa. The effects of paleo-geographic and climatic events were predominant at macro-geographic scale, and the significant effect of geographic connectivity was detected at micro-geographic scale. The estimated effective population size, the moderate level of dispersal and the rapid temporal change in genetic composition might reflect a potential effect of life history strategy on population dynamics. This hypothesis deserves further investigation. The dynamic pattern revealed at micro-geographic and temporal scales appears important from a genetic resource management as well as from a biodiversity conservation point of view.
-
Ploidy mosaicism and allele-specific gene expression differences in the allopolyploid Squalius alburnoides
Background:
Squalius alburnoides is an Iberian cyprinid fish resulting from an interspecific hybridisation between Squalius pyrenaicus females (P genome) and males of an unknown Anaecypris hispanica-like species (A genome). S. alburnoides is an allopolyploid hybridogenetic complex, which makes it a likely candidate for ploidy mosaicism occurrence, and is also an interesting model to address questions about gene expression regulation and genomic interactions. Indeed, it was previously suggested that in S. alburnoides triploids (PAA composition) silencing of one of the three alleles (mainly of the P allele) occurs. However, a whole haplome is not inactivated; instead, a more or less random inactivation of alleles, varying between individuals and even between organs of the same fish, was seen. In this work we set out to correlate expression differences between individuals and/or between organs with the occurrence of mosaicism, evaluating whether mosaics could explain previous observations and assessing the impact of mosaicism on the assessment of gene expression patterns.
Results:
To achieve our goal, we developed flow cytometry and cell sorting protocols for this system, generating more homogeneous cellular and transcriptional samples. With this set-up we detected 10% ploidy mosaicism within the S. alburnoides complex, and determined the allelic expression profiles of ubiquitously expressed genes (rpl8, gapdh and beta-actin) in cells from liver and kidney of mosaic and non-mosaic individuals coming from different rivers over a wide geographic range.
Conclusions:
Ploidy mosaicism occurs sporadically within the S. alburnoides complex, but at a frequency significantly higher than reported for other organisms. Moreover, we could exclude the influence of this phenomenon on the detection of variable allelic expression profiles of ubiquitously expressed genes (rpl8, gapdh and beta-actin) in cells from liver and kidney of triploid individuals. Finally, we determined that the expression patterns previously detected only in a narrow geographic range are not a locally restricted phenomenon but are pervasive in rivers where S. pyrenaicus is sympatric with S. alburnoides. We discuss mechanisms that could lead to the formation of mosaic S. alburnoides and hypothesise about a relaxation of the mechanisms that impose tight control over mitosis and ploidy in mixoploids.