比较基因组学-美高梅正规网址(www.9844.com)生物信息学中心-www.9844.com

www.9844.com-美高梅正规网址

研究团队

当前位置: 首页 / 研究团队 / 比较基因组学

比较基因组学

发布时间:2020-12-08 发布人: 浏览次数:

Computational comparative genomics

A key goal in biology is understanding the genetic basis for phenotypic diversity. High-throughput sequencing increasingly enables the detection of genetic differences between individuals. Nevertheless, identifying causal relationships between genotypic and phenotypic variation remains a key focus for evolutionary biology, human genetics, and plant breeding. We are interested in developing and applying new statistical models and algorithms to genome-wide inter- and intraspecific genome comparisons in order to investigate the causal genes/loci underlying trait diversity.

Gap-free chromosome-scale genome assembly

High-quality genome assembly has wide applications in genetics and medical studies. However, achieving gap-free chromosome-scale assemblies using high-throughput sequencing can still be a challenge. We implemented GALA (Gap-free long-read assembler), a statistical framework for genome assembly through a multi-layer computer graph that identifies mis-assemblies within preliminary assemblies or chimeric raw reads and partitions the data into chromosome-scale linkage groups. The subsequent independent assembly of each linkage group generates a gap-free assembly free from the mis-assembly errors that usually impede existing workflows. This flexible framework also allows us to integrate data from various technologies such as Hi-C, genetic maps, and even motif analyses forde novoor reference-guided assemblies. GALA is also the first practical computational framework on chromosome-by-chromosome assembly.

undefined

Figure 1. GALA implements a statistical framework for chromosome-by-chromosome assembly. The preliminary assemblies and raw reads are aligned against each other and encoded into a multi-layer graph. The conflicting alignments, indicated in red, are then removed iteratively by splitting the nodes involved. New edges are assigned accordingly. Nodes are then clustered into different linkage groups for independent assembly.

Structural variation studies

Many structural variants are associated with trait diversity. However, the genome-wide study is still limited because of computational challenges in their reliable identification. The long-term commitment of the group to genome assembly provides us a competitive edge in this field. For example, IMR/DENOM developed in the group can reliably detect insertions, deletions (INDEL), and structural variants (SV) such as those two variants in FRI, which are regarded as challenging to most algorithms. This allows us to perform genome-wide analysis in order to investigate the role of INDELs and SVs. Our study showed that a multi-allelic artefact caused by inconsistent alignments was a key obstacle for testing the association of insertion and deletion polymorphisms (INDELs) as well as for integrated association methods such as burden testing. To address this problem, we developed the Irisas software. This synchronises variants and integrates the impact of SNPs, INDELs, and structural variants for burden testing. We also identified novel trait loci that previous SNP-based association studies failed to map and which contain established candidate genes. We are also applying the method to explore the influence of structural variation on the genetic architecture of trait diversity.

undefined

Figure 2: Integrating the effect of SNPs, INDELs, and SVs for burden analysis leads to the identification of a novel association locus around FRI for phenotypes related to flowering time.

NGS data management and analysis support

The group provides NGS data management and analysis support for other groups in theuniversity. We have created pipelines to support various projects involving high throughput sequencing including DNAseq, RNAseq, microRNAseq and CHiPseq. These pipelines are implemented via an integrated web portal and linked to a manually curated database.