BioLake: an RNA expression analysis framework for prostate cancer biomarker powered by data lakehouse

dc.contributor.author: Li, Qiaowang
dc.contributor.author: Gamallat, Yaser
dc.contributor.author: Rokne, Jon George
dc.contributor.author: Bismar, Tarek
dc.contributor.author: Alhajj, Reda
dc.date.accessioned: 2026-05-11T11:14:08Z
dc.date.available: 2026-05-11T11:14:08Z
dc.date.issued: 2025
dc.department: İstanbul Medipol University, Faculty of Engineering and Natural Sciences, Department of Computer Engineering
dc.description.abstract: Biomedical researchers often work with large amounts of raw data whose analysis can yield significant insights, but the larger the data, the harder those insights are to uncover. This paper presents BioLake, a data framework that provides minimalist interactive methods to help researchers conduct bioinformatics data analysis. Unlike some existing analytical tools on the market, BioLake supports a wide range of web-based bioinformatics analyses of public datasets while also allowing researchers to analyze their private datasets instantly. The tool also significantly improves the interpretability of results by providing its source code and detailed instructions. For data storage, BioLake adopts a data lakehouse architecture to provide storage scalability and analytical flexibility. To further improve analysis efficiency, BioLake supports online analysis of custom data: researchers upload their own data through a dedicated procedure without waiting for server-side approval. BioLake accepts a single upload of up to 500 MB of custom data, so that researchers avoid problems with files that are too large to upload. For the built-in dataset, BioLake applies reactive continuous data integration, which removes most preprocessing steps from the analysis pipeline. The only built-in dataset in the first public version of BioLake is TCGA-PRAD mRNA expression data for prostate cancer research, the primary focus of the BioLake development team. In summary, BioLake is a lightweight online tool that facilitates bioinformatics mRNA data analysis and supports custom online data processing.
dc.identifier.citation: Li, Q., Gamallat, Y., Rokne, J. G., Bismar, T., & Alhajj, R. (2025). BioLake: an RNA expression analysis framework for prostate cancer biomarker powered by data lakehouse. BMC Bioinformatics, 26(1). http://dx.doi.org/10.1186/s12859-025-06050-2
dc.identifier.doi: 10.1186/s12859-025-06050-2
dc.identifier.issn: 1471-2105
dc.identifier.issue: 1
dc.identifier.pmid: 39905288
dc.identifier.scopus: 2-s2.0-85217989904
dc.identifier.scopusquality: Q1
dc.identifier.uri: http://dx.doi.org/10.1186/s12859-025-06050-2
dc.identifier.uri: https://hdl.handle.net/20.500.12511/13443
dc.identifier.volume: 26
dc.identifier.wos: WOS:001412894100002
dc.identifier.wosquality: Q1
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.indekslendigikaynak: PubMed
dc.institutionauthor: Alhajj, Reda
dc.institutionauthorid: 0000-0001-6657-9738
dc.language.iso: en
dc.relation.ispartof: BMC Bioinformatics
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Data Lakehouse
dc.subject: Data Visualization
dc.subject: Expression Analysis
dc.subject: Parallel Computing
dc.subject: Prostate Cancer
dc.title: BioLake: an RNA expression analysis framework for prostate cancer biomarker powered by data lakehouse
dc.type: Article
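The abstract states that BioLake accepts custom data in a single upload of up to 500 MB. The record does not describe how that limit is enforced, so the following is only a minimal sketch of a size check a client might run before uploading, assuming "MB" means 2^20 bytes (the abstract does not say whether 10^6 or 2^20 is meant). The names `MAX_UPLOAD_BYTES`, `within_upload_limit`, and `validate_upload` are hypothetical and are not taken from BioLake's source code.

```python
import os
import tempfile

# Hypothetical constant: 500 MB interpreted as 500 * 2**20 bytes (an assumption;
# the abstract does not specify decimal or binary megabytes).
MAX_UPLOAD_BYTES = 500 * 2**20


def within_upload_limit(size_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if an upload of size_bytes fits within the one-time limit."""
    return 0 <= size_bytes <= limit


def validate_upload(path: str, limit: int = MAX_UPLOAD_BYTES) -> int:
    """Return the file size in bytes; raise ValueError if it exceeds limit."""
    size = os.path.getsize(path)
    if not within_upload_limit(size, limit):
        raise ValueError(f"{path}: {size} bytes exceeds the {limit}-byte limit")
    return size


if __name__ == "__main__":
    # Write a tiny tab-separated file standing in for a custom expression matrix.
    with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as f:
        f.write("gene\tsample1\nKLK3\t12.5\n")
        tmp_path = f.name
    print(validate_upload(tmp_path))  # prints the file size in bytes
    os.remove(tmp_path)
```

Checking the size on the client before transfer avoids sending a file that would be rejected anyway; a server would still need its own check, since client-side validation can be bypassed.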

Files

Original bundle
Name: Alhajj-Reda-2025.pdf
Size: 2.44 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.17 KB
Format: Item-specific license agreed upon to submission