RNA impact maps
The group of Mihaela Zavolan at the Biozentrum of the University of Basel has recently presented two new computational tools for studying the 3' untranslated regions (3' UTRs) of mRNAs, and in particular their poly(A) sites. The first is PAQR, a robust method for inferring the relative use of poly(A) sites in terminal exons from RNA sequencing data. The second is KAPAC, an approach for inferring sequence motifs that are associated with the processing of poly(A) sites in specific samples.
In a recent Genome Biology paper, they demonstrate that these methods help uncover regulators of polyadenylation in cancers and shed light on how these regulators act. As an important side result, the researchers stress the importance of assessing the quality of samples used for high-throughput analyses, because sample quality can have a substantial impact on estimates of gene expression.
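PAQR's own algorithm is more involved than can be shown here. As a minimal sketch of what "relative poly(A) site use" means, assume a terminal exon with several poly(A) sites ordered from proximal to distal, and assume we already know the mean RNA-seq coverage of each segment that ends at one of these sites. Under the simplifying assumption that every transcript covers the exon up to the site where it is cleaved, the drop in coverage from one segment to the next approximates how many transcripts end at that site, and dividing by the total gives relative usage. The function name `relative_pas_usage` and the coverage values are hypothetical; this is an illustration, not PAQR's implementation.

```python
from typing import Sequence


def relative_pas_usage(segment_coverage: Sequence[float]) -> list[float]:
    """Estimate the relative usage of poly(A) sites in one terminal exon.

    segment_coverage[i] is the mean read coverage of the segment ending at
    poly(A) site i, with sites ordered from proximal (5') to distal (3').
    Simplifying assumption (not PAQR's model): each transcript covers the
    exon up to its cleavage site, so coverage drops at every used site.
    """
    if not segment_coverage:
        raise ValueError("need coverage for at least one segment")
    n = len(segment_coverage)
    site_expression = []
    for i in range(n):
        # Coverage that continues past site i belongs to more distal sites.
        downstream = segment_coverage[i + 1] if i + 1 < n else 0.0
        # Negative drops (e.g. from noisy coverage) are clipped to zero.
        site_expression.append(max(segment_coverage[i] - downstream, 0.0))
    total = sum(site_expression)
    if total == 0:
        return [0.0] * n
    return [e / total for e in site_expression]


# Hypothetical coverage values for three sites: 100 -> 60 -> 20.
print(relative_pas_usage([100, 60, 20]))  # prints [0.4, 0.4, 0.2]
```

The most distal site keeps all remaining coverage because nothing lies downstream of it within the exon. With real data, noisy or degraded coverage profiles make such estimates unreliable, which is one reason the quality of the input samples matters.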
Despite our current understanding of 3' UTRs, much about them remains mysterious. Because mRNAs usually contain overlapping control elements, it is often difficult to pin down the identity and function of these elements, let alone the regulatory factors that may bind to them. Nevertheless, a number of methods are available for studying the complex structures and functions of 3' UTRs. Computational approaches, based primarily on sequence analysis, have identified one or more microRNA (miRNA) target sites in 60% or more of human 3' UTRs. Such software can rapidly compare millions of sequences to find similarities among 3' UTRs across the genome.
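To illustrate the simplest form of such sequence analysis, the sketch below scans a 3' UTR for sites complementary to a miRNA "seed", here assumed to be nucleotides 2-8 of the miRNA (a so-called 7mer-m8 match). Both sequences are made up, and the function names are hypothetical; published target-prediction tools combine seed matching with further criteria such as conservation and site context.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (T is read as U)."""
    seq = seq.upper().replace("T", "U")
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def seed_match_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions of 7mer-m8 seed matches in a 3' UTR."""
    seed = mirna[1:8]  # miRNA nucleotides 2-8 (1-based numbering)
    # The UTR site pairs antiparallel with the seed, hence the reverse complement.
    site = reverse_complement_rna(seed)
    utr = utr.upper().replace("T", "U")
    k = len(site)
    return [i for i in range(len(utr) - k + 1) if utr[i:i + k] == site]


# Hypothetical sequences: the UTR contains two copies of the site ACUCGGU.
toy_mirna = "UACCGAGUUACGAUCGAUCGAU"
toy_utr = "GGACUCGGUAAAUACUCGGUCC"
print(seed_match_sites(toy_mirna, toy_utr))  # prints [2, 13]
```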
Promising attempts to construct “RNA maps”, which relate the position of cis-acting elements to the processing of individual exons, have shown the potential of combining the mapping of RNA-binding protein (RBP) binding sites with measurements of isoform expression. However, it was not known whether the impact of a regulator can be inferred solely from RNA sequencing data obtained from samples in which various regulators are expressed at different levels. To address this question, the Zavolan group developed KAPAC (k-mer activity on polyadenylation site choice), a method that infers the position-dependent activities of sequence motifs on 3' end processing from changes in poly(A) site usage between conditions. The group refers to the activities of individual motifs inferred by KAPAC as “impact maps”, by analogy with RNA maps and to emphasize that their approach does not use information about RBP binding to RNA targets.
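The article does not spell out KAPAC's model. As a rough sketch of the idea of a position-dependent motif activity, assume that for each poly(A) site we know the change in its usage between two conditions (for instance, a log ratio of relative usage) and the sequence around it. Counting one k-mer in a window at a given offset from each site and taking the least-squares slope of usage change against that count gives one "activity" per window; sliding the window yields a crude profile along the sequence. This is a simplified single-motif regression, not KAPAC's published model, and it omits, for example, any test of statistical significance. All function names, sequences and usage changes are hypothetical.

```python
from typing import Sequence


def count_kmer(seq: str, kmer: str) -> int:
    """Count overlapping occurrences of kmer in seq."""
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)


def window_counts(seqs: Sequence[str], pas_positions: Sequence[int],
                  kmer: str, offset: int, width: int) -> list[int]:
    """Count kmer in the window [pas + offset, pas + offset + width) of each sequence."""
    counts = []
    for seq, pas in zip(seqs, pas_positions):
        start = max(pas + offset, 0)  # clip windows that run off the 5' end
        end = max(pas + offset + width, 0)
        counts.append(count_kmer(seq[start:end], kmer))
    return counts


def motif_activity(counts: Sequence[float], usage_change: Sequence[float]) -> float:
    """Least-squares slope of usage change on motif count (with an intercept)."""
    n = len(counts)
    mean_x = sum(counts) / n
    mean_y = sum(usage_change) / n
    sxx = sum((x - mean_x) ** 2 for x in counts)
    if sxx == 0:
        return 0.0  # the count does not vary, so no activity can be inferred
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(counts, usage_change))
    return sxy / sxx


def impact_profile(seqs: Sequence[str], pas_positions: Sequence[int],
                   usage_change: Sequence[float], kmer: str,
                   offsets: Sequence[int], width: int) -> dict[int, float]:
    """Activity of one k-mer for windows at several offsets relative to the site."""
    return {off: motif_activity(window_counts(seqs, pas_positions, kmer, off, width),
                                usage_change)
            for off in offsets}


# Hypothetical toy data: three sites, each at index 5 of its sequence.
toy_seqs = ["AAAAAUUUUAAAAA", "AAAAAAAAAAAAAA", "AAAAAUUUUUUUUA"]
toy_pas = [5, 5, 5]
toy_change = [-1.0, 0.0, -5.0]  # e.g. log2 ratio of relative usage, B vs A
print(impact_profile(toy_seqs, toy_pas, toy_change, "UUUU", [-8, 0], 8))
# prints {-8: 0.0, 0: -1.0}
```

In this toy input the k-mer UUUU occurs only downstream of the site, and more copies go with lower usage, so the window starting at the site receives a negative activity while the upstream window receives none.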
With their method, the group demonstrated that modeling poly(A) site (PAS) usage in terms of the motifs in the vicinity of PAS can reveal global regulators, while the reconstructed position-dependent activity of the corresponding motifs provides insights into how these regulators act. In their paper, they pay special attention to the fact that some of the uncovered proteins are splicing factors, which in their view underscores a general coupling between splicing and polyadenylation. The new methods have already proven relevant for medical research: running PAQR and KAPAC on RNA sequencing data from normal and tumor tissue samples uncovered motifs that can explain changes in cleavage and polyadenylation in specific cancers. In particular, the researchers suspect that polypyrimidine tract binding protein 1 (PTBP1) regulates poly(A) site choice in glioblastoma.
Gruber A.J. et al. (2018) Genome Biology 19 (1), 44 (open access)
Text by Roland Fischer