A gapless telomere-to-telomere (T2T) assembly of a human genome (T2T-CHM13) was recently released. The complete reference newly resolved >240 Mbp of sequence not previously represented in GRCh38, improving the discovery of single-nucleotide variants and copy number variants (CNVs). Compared to other classes of variation, the detection of balanced events such as inversions is particularly challenging. This is because most inversions are copy-number neutral and are associated with repetitive DNA. This is especially true for the largest events, which are frequently flanked by long and highly identical segmental duplications (SDs). Even among existing high-quality long-read genome assemblies, large inversion polymorphisms are often missed or incorrectly represented. While various approaches have been developed over the years to detect inversions (including mate-pair detection, optical mapping, Strand-seq, and long-read sequence detection), a combination of these methods has been shown to produce the best results. Long-read sequencing methods are particularly powerful for detecting smaller inversions (<10 kbp) that are flanked by SDs and affect the greatest number of base pairs per haploid genome. Accurate detection of inversions of this type is critical for understanding human variation and disease, because recurrent inversions have been shown to associate with regions of genome instability and neurodevelopmental disease. The T2T-CHM13 assembly has been put forward as an improved human reference genome over the current, incomplete GRCh38 and GRCh37 references. We sought to assess the potential advantage of detecting inversions on this new reference compared to GRCh38, and whether doing so would significantly alter our understanding of the landscape and frequency of inversion polymorphism in the human genome.

More accurate and complete inversion discovery with T2T reference

We, therefore, recalled inversions with respect to the T2T-CHM13 reference using data from multiple genomic platforms (Strand-seq, Bionano, and long-read assemblies) for 41 human genomes of diverse population origin. Previously, we generated Strand-seq data from 41 samples from the 1000 Genomes Project. Using the same algorithm applied to GRCh38, we remapped the Strand-seq data to T2T-CHM13 (v1.1) and combined it with both Bionano Genomics and assembly-based approaches to detect inversions (Methods). With this reanalysis, we identified 373 inverted regions, including 296 balanced inversions, 56 inverted duplications, and 21 complex events across the autosomes and chromosome X (Additional files 1: Fig. …). For the remainder of this study, we focus exclusively on the 296 balanced inversions (referred to as inversions or inversion polymorphisms), because they can be accurately and comprehensively genotyped to establish meaningful population frequencies (Fig. 1A, Additional file 1: Supplemental Notes). While we report a comparable number of inverted bases per chromosome (Additional file 1: Fig. S1B), the T2T-CHM13 callset increases the total number of inverted bases by ~10.5 Mbp (82.8 Mbp versus 72.3 Mbp for GRCh38). In addition, the GRCh38 reference harbors 26 misorientations, defined here as any region where all 41 samples are homozygously inverted compared to the reference. In stark contrast, no misorientations are identified in the T2T-CHM13 genome, confirming its value as an improved reference (Additional file 1: Fig. …). Between the two references, inversion counts differ for most human chromosomes (n = 19), with the majority showing a net increase on T2T-CHM13 (n = 13) (Additional file 1: Fig. …).
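The misorientation rule and the per-chromosome comparison described above reduce to simple set logic over genotyped regions. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline: it assumes a hypothetical record per inverted region carrying one genotype string per sample ("0|0", "0|1", "1|1", with "1" marking the inverted allele), and the names `InvertedRegion`, `is_misorientation`, `split_callset`, `inverted_bases`, and `compare_per_chrom` are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical genotype encoding (assumption): "1" denotes the inverted allele.
HOM_INVERTED = {"1|1", "1/1"}


@dataclass
class InvertedRegion:
    """One inverted region with per-sample genotypes (illustrative record, not a real file format)."""
    chrom: str
    start: int  # assumed 0-based, half-open coordinates
    end: int
    genotypes: list[str]  # one genotype string per sample


def is_misorientation(region: InvertedRegion, n_samples: int = 41) -> bool:
    """True if all n_samples genotypes are homozygous inverted relative to the reference."""
    return len(region.genotypes) == n_samples and all(
        g in HOM_INVERTED for g in region.genotypes
    )


def split_callset(regions, n_samples=41):
    """Partition a callset into (misorientations, remaining regions)."""
    misoriented, remaining = [], []
    for r in regions:
        (misoriented if is_misorientation(r, n_samples) else remaining).append(r)
    return misoriented, remaining


def inverted_bases(regions) -> int:
    """Total inverted span in bp (assumes the regions do not overlap)."""
    return sum(r.end - r.start for r in regions)


def compare_per_chrom(calls_a, calls_b):
    """Return (#chromosomes whose counts differ, #of those with more calls in calls_b)."""
    a = Counter(r.chrom for r in calls_a)
    b = Counter(r.chrom for r in calls_b)
    differ = [c for c in set(a) | set(b) if a[c] != b[c]]
    increased = [c for c in differ if b[c] > a[c]]
    return len(differ), len(increased)


# Toy example with made-up regions (not data from the study).
fixed = InvertedRegion("chr1", 0, 100, ["1|1"] * 41)
poly = InvertedRegion("chr1", 200, 500, ["1|1"] * 40 + ["0|1"])
extra = InvertedRegion("chr2", 1_000, 1_250, ["0|1"] * 41)

m, rest = split_callset([fixed, poly])
print(len(m), len(rest))  # 1 1
print(inverted_bases([fixed, poly]))  # 400
print(compare_per_chrom([fixed, poly], [fixed, extra]))  # (2, 1)
```

With `n_samples=41`, a region counts as a misorientation only when every genotype is homozygous inverted; a single heterozygous or reference call makes it polymorphic. How missing genotypes were handled is not stated here, so this sketch simply requires exactly `n_samples` calls.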