Description

Intron predictions from short-read RNA-Seq experiments

Methods

Reads from 432 single-end and paired-end poly(A)+ short-read RNA-Seq experiments were obtained from ENCODE. The selection emphasized brain, testis, liver, or the H1 cell line to optimize transcript diversity, as well as read lengths of at least 100 bp. The experiments included are listed below.

Reads were aligned to the T2T CHM13 assembly using STAR 2.7.5 with parameters derived from the ENCODE pipeline. The maximum number of alignments allowed per multi-mapping read (--outFilterMultimapNmax) was increased to 2048, favoring sensitivity over specificity. The intention is to develop post-mapping filters to obtain more specific mappings.

--outFilterMultimapNmax 2048
--alignSJoverhangMin 8
--alignSJDBoverhangMin 1
--outFilterMismatchNmax 999
--outFilterMismatchNoverReadLmax 0.04
--alignIntronMin 20
--alignIntronMax 1000000
--alignMatesGapMax 1000000
--outSAMunmapped Within
--outFilterType BySJout
--outSAMstrandField intronMotif (single-ended)
--sjdbScore 1
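
As a rough illustration of how these options combine into one alignment call, the following Python sketch assembles a STAR argument list. It is not the pipeline actually used for this track. The function name build_star_command, the genome directory, the FASTQ names, and the thread count are hypothetical, and the sketch assumes that the "(single ended)" note means --outSAMstrandField intronMotif was applied only to single-ended runs.

```python
import shlex

# Options listed in the Methods above, as (flag, value) pairs.
STAR_PARAMS = [
    ("--outFilterMultimapNmax", "2048"),
    ("--alignSJoverhangMin", "8"),
    ("--alignSJDBoverhangMin", "1"),
    ("--outFilterMismatchNmax", "999"),
    ("--outFilterMismatchNoverReadLmax", "0.04"),
    ("--alignIntronMin", "20"),
    ("--alignIntronMax", "1000000"),
    ("--alignMatesGapMax", "1000000"),
    ("--outSAMunmapped", "Within"),
    ("--outFilterType", "BySJout"),
    ("--sjdbScore", "1"),
]


def build_star_command(genome_dir, fastqs, paired, threads=8):
    """Return a STAR argument list (hypothetical helper, nothing is executed)."""
    expected = 2 if paired else 1
    if len(fastqs) != expected:
        raise ValueError(f"expected {expected} FASTQ file(s), got {len(fastqs)}")
    cmd = ["STAR", "--runThreadN", str(threads),
           "--genomeDir", genome_dir,
           "--readFilesIn", *fastqs]
    for flag, value in STAR_PARAMS:
        cmd += [flag, value]
    # Assumption: the strand-field option applies to single-ended runs only.
    if not paired:
        cmd += ["--outSAMstrandField", "intronMotif"]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; substitute a real STAR index and read files.
    print(shlex.join(build_star_command("chm13_index", ["reads.fastq"], paired=False)))
```

Building the command as a list rather than a single string keeps each flag paired with its value and avoids shell-quoting problems if it is later passed to subprocess.run.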

Introns were called using intronProspector with --min-confidence-score=1.0.

Display Conventions and Configuration

Each intron call is represented by a two-block record, with the blocks representing the longest observed exon overlap on each side of the intron. The name field contains the splice junction motif, with known splice junctions capitalized. Items are colored by intron type:

U2-type intron	green
U12-type intron	blue
unknown intron type	red
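
To make the two-block convention concrete, the sketch below recovers intron coordinates from one record. It assumes the records are tab-separated BED12 lines, that the name field holds only the motif, and that a lowercase motif marks a junction that is not known. The names IntronCall and parse_intron_record are hypothetical, and the example record is made up for illustration.

```python
from typing import NamedTuple


class IntronCall(NamedTuple):
    chrom: str
    start: int   # 0-based start of the intron (end of the first block)
    end: int     # end of the intron (start of the second block)
    strand: str
    motif: str
    known: bool  # True when the motif is capitalized


def parse_intron_record(line: str) -> IntronCall:
    """Parse one two-block BED12 line (assumed format) into an intron call."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected a 12-column BED record")
    chrom, chrom_start, name, strand = fields[0], int(fields[1]), fields[3], fields[5]
    if int(fields[9]) != 2:
        raise ValueError("expected exactly two blocks")
    # BED block lists may carry a trailing comma.
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    starts = [int(x) for x in fields[11].rstrip(",").split(",")]
    # The intron is the gap between the two exon-overlap blocks.
    intron_start = chrom_start + starts[0] + sizes[0]
    intron_end = chrom_start + starts[1]
    return IntronCall(chrom, intron_start, intron_end, strand, name, name.isupper())


if __name__ == "__main__":
    # Made-up record: blocks of 50 and 60 bases flanking one intron.
    example = "chr1\t1000\t2000\tGT/AG\t5\t+\t1000\t2000\t0,128,0\t2\t50,60,\t0,940,"
    print(parse_intron_record(example))
```

The block sizes give the longest observed exon overlap on each side, so they can also serve as a simple per-junction support measure.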

Data access

Release history

2020-09-01: initial t2tChm13_20200727 release

Contacts

Mark Diekhans <markd@ucsc.edu>

Credits

Mark Diekhans
Ann Mc Cartney