BioLake: an RNA expression analysis framework for prostate cancer biomarker powered by data lakehouse

Date

2025

Access Rights

info:eu-repo/semantics/openAccess
Attribution-NonCommercial-NoDerivatives 4.0 International

Abstract

Biomedical researchers often work with large volumes of raw data whose analysis can yield significant insights, but the sheer size of the data can make those insights difficult to uncover. This paper presents BioLake, a data framework that provides minimalist, interactive methods to help researchers conduct bioinformatics data analysis. Unlike some existing analytical tools, BioLake supports a wide range of web-based bioinformatics analyses of public datasets while also allowing researchers to analyze their private datasets immediately. The tool also improves the interpretability of results by providing the source code and detailed instructions. For data storage, BioLake adopts the data lakehouse architecture to provide storage scalability and analysis flexibility. To further improve analysis efficiency, BioLake supports online analysis of custom data: researchers upload their own data through a defined procedure without waiting for server-side approval. A single upload of custom data can be up to 500 MB, so that researchers avoid problems with data being too large to upload. For the built-in dataset, BioLake applies reactive continuous data integration, which removes most preprocessing steps from the analysis pipeline. The only built-in dataset in the first public version is TCGA-PRAD mRNA expression data for prostate cancer research, the primary focus of the BioLake development team. In summary, BioLake is a lightweight online tool that facilitates bioinformatic mRNA data analysis, with support for online processing of custom data.

Keywords

Data Lakehouse, Data Visualization, Expression Analysis, Parallel Computing, Prostate Cancer

Source

BMC Bioinformatics

WoS Q Value

Q1

Scopus Q Value

Q1

Volume

26

Issue

1

Citation

Li, Q., Gamallat, Y., Rokne, J. G., Bismar, T., & Alhajj, R. (2025). BioLake: an RNA expression analysis framework for prostate cancer biomarker powered by data lakehouse. BMC Bioinformatics, 26(1). http://dx.doi.org/10.1186/s12859-025-06050-2