Description

The Shared ASat HORs track annotates chromosome-specific haplotypes of alpha satellite higher-order repeats (HORs) that are shared by two or more chromosomes (S1C1/5/19H1L, S2C13/21H1L and S2C14/22H1L) in the T2T CHM13 human genome assembly, release v1.1.

Methods

cen13 - cen21

According to the StV track statistics, the most common StVs for cen13 and cen21 were S2C13/21H1L_13.1-11, S2C13/21H1L_13.1-7 and S2C13/21H1L_21.1-11. The complete sequences of these StVs were extracted from the StV track and aligned with MUSCLE (Edgar, 2004). The resulting multiple alignments were used as HMMs on the HMMER platform (Krogh et al., 1994; Eddy, 1998), with only these profiles supplied, as described for HumAS-HMMER-HOR (Uralsky et al., 2019). The chr13 and chr21 assemblies from CHM13 T2T v1.1 were processed, the output was converted into a BED file, and low-score and overlapping hits were filtered as described (Uralsky et al., 2019). The BED files were then loaded into the UCSC Genome Browser. In the "Shared ASat HORs" track, the HOR names show the individual cen13 StVs (11mer or 7mer), whereas color indicates only the chromosome to which the HOR belongs. The track shows the regular arrangement of certain dimers within HORs on each chromosome.
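The filtering step above can be illustrated with a minimal Python sketch. It is not the HumAS-HMMER-HOR filter itself, whose exact rules are given in Uralsky et al., 2019. It assumes a simple greedy rule: discard hits below a score cutoff, then keep the highest-scoring hit among any overlapping hits on the same chromosome. The `Hit` record, the function names and the cutoff value are hypothetical and used for illustration only.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One HMM hit in BED-style coordinates (hypothetical record layout)."""
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # exclusive
    name: str     # e.g. the StV or haplotype name
    score: float  # hit score reported by the search
    strand: str


def filter_hits(hits, min_score):
    """Drop low-score hits, then resolve overlaps greedily by score.

    Assumption: the real pipeline may resolve overlaps differently;
    this only illustrates the general idea.
    """
    kept = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if hit.score < min_score:
            continue
        overlaps = any(
            k.chrom == hit.chrom and hit.start < k.end and k.start < hit.end
            for k in kept
        )
        if not overlaps:
            kept.append(hit)
    # Return hits in genomic order, as expected in a BED file.
    return sorted(kept, key=lambda h: (h.chrom, h.start))


def to_bed_line(hit):
    """Format a hit as a six-column BED line (score clamped to 0-1000)."""
    bed_score = min(1000, max(0, round(hit.score)))
    return "\t".join(
        [hit.chrom, str(hit.start), str(hit.end), hit.name, str(bed_score), hit.strand]
    )
```

The pairwise overlap check is quadratic in the number of kept hits, which is acceptable for a sketch; an interval index would be the usual choice for whole-chromosome output.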

cen14 - cen22

According to the StV track statistics, the most common StVs for cen14 and cen22 were the full-size HORs S2C14/22H1L_14.1-8 and S2C14/22H1L_22.1-8. These sequences were processed as described above for S2C13/21H1L.

cen1 - cen5 - cen19

The active HOR arrays of cen1, cen5 and cen19 consist mostly of the non-canonical dimer S1C1/5/19H1L.6/4-5. The dimer accounts for the following proportions of the whole S1C1/5/19H1L HOR array length: cen1, 55%; cen5, 13%; cen19, 78%. Dimer sequences were extracted from the HOR track for each chromosome separately. Dimers from the cen1 inversion (chr1:124130687-125858867) were reversed. A random sample of 500 sequences was taken from each set of dimers, aligned with MUSCLE (Edgar, 2004) and used to build a tree in which the dimers from each chromosome were marked. All separate branches of this tree were extracted as sequences and used as HMMs (cen1: 6 branches named 1a-f; cen5: 2 branches named 5a and 5b; cen19: 6 branches named 19a-f) to haplotype the CHM13 T2T v1.1 assembly. In the "Shared ASat HORs" track, the HOR names show each individual haplotype (e.g. S1C1/5/19_19c), whereas color indicates only the chromosome to which the haplotype belongs.
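The orientation and sampling steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used to build the track: it assumes that "reversed" means reverse-complemented, that a dimer counts as inverted only if it lies entirely within the inversion interval, and that a fixed random seed is acceptable. The function names and the seed are hypothetical.

```python
import random

# Inversion interval quoted in the text (browser-style coordinates).
INVERSION = ("chr1", 124130687, 125858867)

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient(chrom, start, end, seq):
    """Reverse-complement dimers that fall inside the cen1 inversion.

    Assumption: containment is tested on a closed interval; a one-base
    difference between coordinate conventions is ignored in this sketch.
    """
    inv_chrom, inv_start, inv_end = INVERSION
    if chrom == inv_chrom and start >= inv_start and end <= inv_end:
        return reverse_complement(seq)
    return seq


def sample_dimers(seqs, n=500, seed=0):
    """Draw a random sample of n sequences without replacement.

    If fewer than n sequences are available, all of them are returned.
    """
    if len(seqs) <= n:
        return list(seqs)
    return random.Random(seed).sample(seqs, n)
```

Sampling would be applied once per chromosome (cen1, cen5 and cen19), after the cen1 dimers have been oriented, so that the subsequent alignment sees all dimers on the same strand.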

For details, see Altemose et al., 2021, Supplementary Materials and Methods, Section 3: Chromosome-specific Array Comparison. Resolving the shared ASat arrays S2C13/21H1L, S2C14/22H1L and S1C1/5/19H1L.

References

Nicolas Altemose, Glennis Logsdon, Andrey V Bzikadze, Pragya Sidhwani, Sasha A Langley, Gina V. Caldas, Savannah J. Hoyt, Lev Uralsky, Fedor D. Ryabov, Colin Shew, Michael E.G. Sauria, Matthew Borchers, Ariel Gershman, Alla Mikheenko, Valery A. Shepelev, Tatiana Dvorkina, Olga Kunyavskaya, Mitchell R Vollger, Arang Rhie, Ann M. McCartney, Mobin Asri, Ryan Lorig-Roach, Kishwar Shafin, Sergey Aganezov, Daniel Olson, Leonardo Gomes de Lima, Tamara Potapova, Gabrielle A. Hartley, Marina Haukness, Peter Kerpedjiev, Fedor Gusev, Kristof Tigyi, Shelise Y. Brooks, Alice Young, Sergey Nurk, Sergey Koren, Sofie Salama, Benedict Paten, Evgeny I. Rogaev, Aaron M Streets, Gary H Karpen, Abby Dernburg, Beth A Sullivan, Aaron F Straight, Travis Wheeler, Jennifer L. Gerton, Evan Eichler, Adam M. Phillippy, Winston Timp, Megan Y. Dennis, Rachel J O'Neill, Justin M Zook, Michael Schatz, Pavel A Pevzner, Mark Diekhans, Charles H. Langley, Ivan A. Alexandrov, Karen H Miga (2021). Complete genomic and epigenetic maps of human centromeres. bioRxiv 2021.07.12.452052. https://doi.org/10.1101/2021.07.12.452052

Krogh, A., Brown, M., Mian, I. S., Sjölander, K., & Haussler, D. (1994). Hidden Markov models in computational biology. Applications to protein modeling. Journal of molecular biology, 235(5), 1501–1531. https://doi.org/10.1006/jmbi.1994.1104

Eddy, S. R. (1998). Profile hidden Markov models. Bioinformatics, 14(9), 755–763. https://doi.org/10.1093/bioinformatics/14.9.755

Edgar R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research, 32(5), 1792–1797. https://doi.org/10.1093/nar/gkh340

Uralsky, L. I., Shepelev, V. A., Alexandrov, A. A., Yurov, Y. B., Rogaev, E. I., & Alexandrov, I. A. (2019). Classification and monomer-by-monomer annotation dataset of suprachromosomal family 1 alpha satellite higher-order repeats in hg38 human genome assembly. Data in Brief, 24, 103708. https://doi.org/10.1016/j.dib.2019.103708

Release history

  1. Shared_Sat - First public version of the Shared ASat HORs annotation.
  2. 2021-10-22 t2t-CHM13.release_v1.1 - Annotation of CHM13 v1.1 assembly.

Contact

Contact Fedor Ryabov <fedorrik1@gmail.com>

Contact Ivan A. Alexandrov <ivanalx@hotmail.com>

Credits

Fedor Ryabov (Moscow Polytechnic University, Russia)

Ivan A. Alexandrov (1. Vavilov Institute of General Genetics, Moscow, Russia; 2. Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia)