Please use this identifier to cite or link to this item:
https://www.arca.fiocruz.br/handle/icict/13544
Type
Article
Copyright
Open access
Collections
- IOC - Artigos de Periódicos [12339]
IMPROVED ORTHOLOGOUS DATABASES TO EASE PROTOZOAN TARGETS INFERENCE
Affiliation
Fundação Oswaldo Cruz. Instituto Oswaldo Cruz. Laboratório de Biologia Computacional e Sistemas. Rio de Janeiro, RJ, Brasil.
Abstract
Background: Homology inference helps identify similarities and differences among organisms, which
provides better insight into how closely related they are to one another. Comparative genomics
pipelines, which combine different bioinformatics applications and algorithms, are widely adopted for such analyses. In this
article, we propose a methodology to build improved orthologous databases with the potential to aid
protozoan target identification, one of the many tasks that benefit from comparative genomics tools.
Methods: Our analyses are based on OrthoSearch, a comparative genomics pipeline originally designed to infer
orthologs through protein-profile comparison, supported by an approach based on hidden Markov models (HMMs) and reciprocal best hits. Our
methodology allows OrthoSearch to compare two orthologous databases and generate a new, improved one.
This database can later be used to infer potential protozoan targets through a similarity analysis against the human
genome.
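The reciprocal best hits criterion mentioned above can be illustrated with a minimal sketch. It is not OrthoSearch's implementation: the score tables, the protein names (`protA`, ...) and the group names (`group1`, ...) are hypothetical toy data, and real scores would come from profile searches in both directions.

```python
from typing import Dict, Set, Tuple

# Hypothetical score table: query -> {subject: similarity score}.
Scores = Dict[str, Dict[str, float]]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_hits(forward: Scores, reverse: Scores) -> Set[Tuple[str, str]]:
    """Keep (query, subject) pairs that are each other's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {(q, s) for q, s in fwd.items() if rev.get(s) == q}


# Toy data (assumed): proteins searched against group profiles, and back.
protein_vs_profile = {
    "protA": {"group1": 250.0, "group2": 40.0},
    "protB": {"group1": 90.0, "group2": 85.0},
    "protC": {"group2": 120.0},
}
profile_vs_protein = {
    "group1": {"protA": 245.0, "protB": 88.0},
    "group2": {"protC": 118.0, "protB": 80.0},
}

pairs = reciprocal_best_hits(protein_vs_profile, profile_vs_protein)
print(sorted(pairs))
```

In this toy example `protB` is dropped: its best hit is `group1`, but the best hit of `group1` is `protA`, so the relation is not reciprocal.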
Results: The protein sequences of the Cryptosporidium hominis, Entamoeba histolytica and Leishmania infantum genomes
were comparatively analyzed against three orthologous databases: (i) EggNOG KOG, (ii) ProtozoaDB and (iii) KEGG
Orthology (KO). This allowed us to create two new orthologous databases, “KO + EggNOG KOG” and “KO + EggNOG
KOG + ProtozoaDB”, with 16,938 and 27,701 orthologous groups, respectively.
These new orthologous databases were then used in a regular OrthoSearch run. By comparing the protozoan species against the “KO + EggNOG KOG” and
“KO + EggNOG KOG + ProtozoaDB” databases, we detected the following numbers of
orthologous groups and coverage (relation between the inferred orthologous groups and the species total number of
proteins), respectively: Cryptosporidium hominis: 1,821 (11 %) and 3,254 (12 %); Entamoeba histolytica: 2,245 (13 %) and 5,305 (19 %);
Leishmania infantum: 2,702 (16 %) and 4,760 (17 %).
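As a quick arithmetic check on the figures above, the sketch below divides each count of detected groups by the number of groups in the corresponding database (16,938 or 27,701). This reproduces the reported percentages after rounding. Whether this is exactly how the authors computed coverage is an assumption: the text defines coverage relative to each species' total number of proteins, which the abstract does not list, so that definition cannot be checked here.

```python
# Number of orthologous groups in each database (from the abstract).
db_sizes = {
    "KO + EggNOG KOG": 16938,
    "KO + EggNOG KOG + ProtozoaDB": 27701,
}

# Orthologous groups detected per species and database (from the abstract).
detected = {
    "Cryptosporidium hominis": {"KO + EggNOG KOG": 1821, "KO + EggNOG KOG + ProtozoaDB": 3254},
    "Entamoeba histolytica": {"KO + EggNOG KOG": 2245, "KO + EggNOG KOG + ProtozoaDB": 5305},
    "Leishmania infantum": {"KO + EggNOG KOG": 2702, "KO + EggNOG KOG + ProtozoaDB": 4760},
}


def percent(part: int, whole: int) -> int:
    """Ratio expressed as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Assumed denominator: the size of the database, not the proteome size.
coverage = {
    species: {db: percent(n, db_sizes[db]) for db, n in per_db.items()}
    for species, per_db in detected.items()
}

for species, per_db in coverage.items():
    print(species, per_db)
```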
Using our HMM-based methodology and the largest orthologous database created, we inferred 13
orthologous groups that represent potential protozoan targets; these were detected thanks to our distant homology
approach.
We also provide the number of species-specific, pair-to-pair and core groups from these analyses, depicted in Venn
diagrams.
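The species-specific, pair-to-pair and core counts behind such Venn diagrams can be derived with plain set operations. The following is a minimal sketch under stated assumptions: the group identifiers (`OG1`, ...) are hypothetical placeholders, not results from the article, and the function is not part of OrthoSearch.

```python
from typing import Dict, FrozenSet, Set


def venn_partition(groups: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Assign each group to the exact set of species in which it was found."""
    regions: Dict[FrozenSet[str], Set[str]] = {}
    for group in set().union(*groups.values()):
        members = frozenset(sp for sp, found in groups.items() if group in found)
        regions.setdefault(members, set()).add(group)
    return regions


# Hypothetical orthologous groups detected per species (toy data).
groups_by_species = {
    "C. hominis": {"OG1", "OG2", "OG3"},
    "E. histolytica": {"OG1", "OG2", "OG4"},
    "L. infantum": {"OG1", "OG5"},
}

regions = venn_partition(groups_by_species)

# Core: found in all species; pair-to-pair: exactly two; specific: exactly one.
core = regions.get(frozenset(groups_by_species), set())
pairwise = {sp: g for sp, g in regions.items() if len(sp) == 2}
specific = {sp: g for sp, g in regions.items() if len(sp) == 1}

print("core:", len(core))
print("pair-to-pair:", {tuple(sorted(sp)): len(g) for sp, g in pairwise.items()})
print("species-specific:", {next(iter(sp)): len(g) for sp, g in specific.items()})
```

Each group lands in exactly one region, so the region sizes sum to the total number of distinct groups.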
Conclusions: The orthologous databases generated by our HMM-based methodology provide a broader dataset, with
more orthologous groups than the original databases used as input. They may be used for
several homology inference analyses, annotation tasks and protozoan target identification.
Keywords
Comparative genomics
Homology inference
Target identification
Protozoa
Orthologous database
Distant homology
Leishmania
Cryptosporidium
Entamoeba