Detection of long non–coding RNA homology, a comparative study on alignment and alignment–free metrics

Background: Long non-coding RNAs (lncRNAs) represent a novel class of non-coding RNAs having a crucial role in many biological processes. The identification of long non-coding homologs among different species is essential to investigate such roles in model organisms as homologous genes tend to retain similar molecular and biological functions. Alignment–based metrics are able to effectively capture the conservation of transcribed coding sequences and then the homology of protein coding genes. However, unlike protein coding genes the poor sequence conservation of long non-coding genes makes the identification of their homologs a challenging task.
Results: In this study we compare alignment–based and alignment–free string similarity metrics and look at promoter regions as a possible source of conserved information. We show that promoter regions encode relevant information for the conservation of long non-coding genes across species and that such information is better captured by alignment–free metrics. We perform a genome wide test of this hypothesis in human, mouse, and zebrafish.
Conclusions: The obtained results persuaded us to postulate the new hypothesis that, unlike protein coding genes, long non-coding genes tend to preserve their regulatory machinery rather than their transcribed sequence. All datasets, scripts, and the prediction tools adopted in this study are available at
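As a rough illustration of the distinction the abstract draws, the sketch below contrasts one alignment-free metric (cosine similarity of k-mer count vectors) with one alignment-based metric (a Needleman-Wunsch global alignment score). These two functions, their names, the choice of k = 3, and the scoring parameters are assumptions made for illustration only; the record does not state which specific metrics the study used, and the toy sequences are not real promoter data.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_cosine(a, b, k=3):
    """Alignment-free similarity: cosine of the two k-mer count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    dot = sum(ca[m] * cb[m] for m in ca.keys() & cb.keys())
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    if na == 0 or nb == 0:
        return 0.0  # a sequence shorter than k has no k-mers
    return dot / (na * nb)


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Alignment-based similarity: Needleman-Wunsch score with linear gaps."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Toy sequences sharing the same blocks in a different order.
    x = "ACGTACGTTTTT"
    y = "TTTTACGTACGT"
    print("k-mer cosine:", kmer_cosine(x, y))
    print("global alignment score:", global_alignment_score(x, y))
```

The toy pair shares its blocks in a different order, a case where k-mer composition stays similar even though a global alignment is penalized. This is one intuition for why an alignment-free metric might capture conserved signal that alignment misses.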
Author:Teresa M. R. Noviello, Antonella Di Liddo, Giovanna M. M. Ventola, Antonietta Spagnuolo, Salvatore D'Aniello, Michele Ceccarelli, Luigi Cerulo
Pubmed Id:
Parent Title (English):BMC bioinformatics
Publisher:BioMed Central ; Springer
Place of publication:London ; Berlin ; Heidelberg
Document Type:Article
Year of Completion:2018
Date of first Publication:2018/11/06
Publishing Institution:Universitätsbibliothek Johann Christian Senckenberg
Release Date:2018/12/11
Tag:Homology; Long ncRNA; String similarity
Issue:1, Art. 407
Page Number:12
First Page:1
Last Page:12
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.
Institutes:Biosciences / Biosciences
Dewey Decimal Classification:5 Natural sciences and mathematics / 57 Life sciences; biology / 570 Life sciences; biology
Licence (German):Creative Commons - Attribution 4.0