Please use this identifier to cite or link to this item:
https://www.arca.fiocruz.br/handle/icict/69271
Type
Article
Copyright
Open access
Collections
- IOC - Artigos de Periódicos [12965]
- MG - IRR - Artigos de Periódicos [4307]
A COMPUTATIONAL FRAMEWORK FOR EXTRACTING BIOLOGICAL INSIGHTS FROM SRA CANCER DATA
Affiliation
Fundação Oswaldo Cruz. Instituto René Rachou. Grupo de Pesquisa Informática de Biossistemas, Bioengenharia e Genômica. Belo Horizonte, MG, Brasil / Fundação Oswaldo Cruz. Instituto Oswaldo Cruz. Laboratório de Biologia Computacional e Sistemas. Rio de Janeiro, RJ, Brasil.
Abstract
Integrating sequenced samples and clinical data from independent yet related studies in public-domain databases, such as the Sequence Read Archive (SRA), can increase sample sizes and enhance the statistical power needed for more precise bioinformatic analysis. Data mining and sample grouping are the starting points of this process and still present several challenges, including the coexistence of structured and unstructured data, missing deposited data, and experimental conditions and techniques that vary across studies. Designed to address the main challenges of data mining and sample grouping for biomarker research, the proposed methodology employs a computational approach that integrates relational database construction, text and data mining, natural language processing, network analysis, and PubMed publication searches, and combines the MeSH, TTD, and WordNet databases to identify groups of samples with the same characteristics. As a result, it identifies and illustrates relationships among sample collections, with the aim of discovering potential cancer biomarkers. In colorectal cancer (CRC) and acute lymphoblastic leukemia (ALL) case studies, the methodology effectively navigates SRA metadata, retrieving, extracting, and integrating data. It highlights significant connections between samples and patient clinical data, revealing important biological insights. The study grouped 2,737 (CRC) and 3,655 (ALL) samples into potential comparison groups, demonstrating the method's power in identifying relationships and aiding biomarker discovery.
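The sample-grouping idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the synonym table is a hypothetical stand-in for lookups against vocabularies such as MeSH or WordNet, and the run identifiers, field names, and function names are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical synonym table (assumption): a tiny stand-in for a
# controlled-vocabulary lookup that maps free-text labels to one term.
SYNONYMS = {
    "crc": "colorectal cancer",
    "colorectal carcinoma": "colorectal cancer",
    "colon tumor": "colorectal cancer",
    "all": "acute lymphoblastic leukemia",
}


def normalize(value):
    """Lowercase, collapse whitespace, and map known synonyms to one term."""
    key = " ".join(value.lower().split())
    return SYNONYMS.get(key, key)


def group_samples(samples, fields):
    """Group run identifiers by their normalized values for the given fields.

    Missing fields are recorded as "unknown" so that incomplete records
    stay visible instead of being silently dropped.
    """
    groups = defaultdict(list)
    for sample in samples:
        key = tuple(normalize(sample.get(f, "unknown")) for f in fields)
        groups[key].append(sample["run"])
    return dict(groups)


# Invented example records (assumption): not real SRA runs.
samples = [
    {"run": "RUN001", "disease": "CRC", "tissue": "Colon"},
    {"run": "RUN002", "disease": "colorectal carcinoma", "tissue": "colon "},
    {"run": "RUN003", "disease": "ALL", "tissue": "Bone Marrow"},
    {"run": "RUN004", "disease": "Colon tumor"},
]

groups = group_samples(samples, ["disease", "tissue"])
for key, runs in groups.items():
    print(key, runs)
```

In this sketch, the first two records fall into the same group even though their free-text labels differ, while the record with a missing tissue field is kept in its own "unknown" group. A real pipeline would replace the dictionary lookup with the vocabulary and text-mining steps named in the abstract.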