Author | Carels, Nicolas | |
Author | Vidal, Ramon | |
Author | Frias, Diego | |
Access date | 2018-12-26T12:52:26Z | |
Available date | 2018-12-26T12:52:26Z | |
Document date | 2009 | |
Citation | CARELS, Nicolas; VIDAL, Ramon; FRIAS, Diego. Universal Features for the Classification of Coding and Non-coding DNA Sequences. Bioinformatics and Biology Insights, v.3, p.37-49, 2009. | pt_BR |
ISSN | 1177-9322 | pt_BR |
URI | https://www.arca.fiocruz.br/handle/icict/30768 | |
Language | eng | pt_BR |
Publisher | Libertas Academica | pt_BR |
Rights | open access | |
Subject in Portuguese | genômica | pt_BR |
Subject in Portuguese | predição do exon | pt_BR |
Subject in Portuguese | viés de purina | pt_BR |
Subject in Portuguese | recursos de codificação | pt_BR |
Subject in Portuguese | quadro de leitura aberta | pt_BR |
Subject in Portuguese | codon ancestral | pt_BR |
Title | Universal Features for the Classification of Coding and Non-coding DNA Sequences | pt_BR |
Type | Article | |
Abstract | In this report, we revisited simple features that allow coding sequences (CDS) to be distinguished from non-coding DNA. The spectrum of codon usage in our sequence sample is large, which suggests that these features are universal. The features that we investigated combine (i) the stop codon distribution, (ii) the product of purine probabilities in the three positions of nucleotide triplets, (iii) the product of cytosine, guanine, and adenine probabilities in the 1st, 2nd, and 3rd positions of triplets, respectively, and (iv) the product of G and C probabilities in the 1st and 2nd positions of triplets. These features are a natural consequence of the physico-chemical properties of proteins, and their combination is successful in classifying CDS and non-coding DNA (introns) with a success rate of 95% for sequences longer than 350 bp. The coding strand and coding frame are implicitly deduced when the sequences are classified as coding. | pt_BR |
Affiliation | Fundação Oswaldo Cruz. Instituto Oswaldo Cruz. Laboratório de Genômica Funcional e Bioinformática. Rio de Janeiro, RJ, Brasil / Universidade Estadual de Santa Cruz. Núcleo de Biologia Computacional e Gestão de Informações Biotecnológicas. Ilhéus, BA, Brasil. | pt_BR |
Affiliation | Universidade Estadual de Santa Cruz. Núcleo de Biologia Computacional e Gestão de Informações Biotecnológicas. Ilhéus, BA, Brasil. | pt_BR |
Affiliation | Universidade Estadual de Santa Cruz. Núcleo de Biologia Computacional e Gestão de Informações Biotecnológicas. Ilhéus, BA, Brasil. | pt_BR |
Subject | genomics | pt_BR |
Subject | exon prediction | pt_BR |
Subject | purine bias | pt_BR |
Subject | coding features | pt_BR |
Subject | open reading frame | pt_BR |
Subject | ancestral codon | pt_BR |
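
Illustrative note: the four features listed in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`position_frequencies`, `frame_features`) are hypothetical, the stop codon distribution is reduced to a simple in-frame stop count, feature (iv) is read as G in the 1st position times C in the 2nd, and no classification thresholds are given because the record does not state any.

```python
from collections import Counter

# Standard stop codons of the universal genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def position_frequencies(seq, frame=0):
    """Return per-position base frequencies for triplets read in `frame`.

    The result is a list of three dicts (1st, 2nd, 3rd triplet position),
    each mapping A, C, G, T to its relative frequency.
    """
    seq = seq.upper()
    counts = [Counter(), Counter(), Counter()]
    n_triplets = 0
    # Only complete triplets are counted; trailing bases are ignored.
    for i in range(frame, len(seq) - 2, 3):
        for pos, base in enumerate(seq[i:i + 3]):
            counts[pos][base] += 1
        n_triplets += 1
    if n_triplets == 0:
        raise ValueError("sequence too short for the requested frame")
    return [{b: counts[pos][b] / n_triplets for b in "ACGT"} for pos in range(3)]


def frame_features(seq, frame=0):
    """Compute the four abstract-level features for one reading frame."""
    seq = seq.upper()
    p = position_frequencies(seq, frame)
    triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # Purine (A or G) probability at each triplet position.
    purine = [p[k]["A"] + p[k]["G"] for k in range(3)]
    return {
        # (i) assumption: in-frame stop count as a proxy for the distribution
        "stop_codons": sum(t in STOP_CODONS for t in triplets),
        # (ii) product of purine probabilities over the three positions
        "purine_product": purine[0] * purine[1] * purine[2],
        # (iii) C in 1st, G in 2nd, A in 3rd position
        "cga_product": p[0]["C"] * p[1]["G"] * p[2]["A"],
        # (iv) assumption: G in 1st position times C in 2nd position
        "gc_product": p[0]["G"] * p[1]["C"],
    }


if __name__ == "__main__":
    # Toy sequence only; real use would scan all frames on both strands.
    for frame in range(3):
        print(frame, frame_features("ATGGCATAA", frame))
```

Evaluating the features in each of the three frames (and on the reverse complement) is one way the coding strand and frame could be read off once a sequence is classified as coding, as the abstract describes.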