Abstract
Objective
High-grade ovarian cancer (HGOC) remains a significant therapeutic challenge due to its aggressive nature and poor prognosis. The aim was to elucidate the molecular drivers of HGOC through an integrated bioinformatics analysis.
Material and Methods
Two microarray datasets (GSE6008 and GSE14764) served as the training set, while an independent microarray dataset (GSE23603) was used as the validation set. These datasets included low- and high-grade ovarian tumor samples and were downloaded from the ArrayExpress database. Selection criteria included clearly classified low-grade ovarian cancer and HGOC samples, as well as compatible platforms and sample-processing methods. After normalization, differentially expressed genes (DEGs) were obtained using R software. Functional enrichment analysis [including gene ontology (GO) and pathway analysis] was performed using the DAVID database. A protein-protein interaction (PPI) network was constructed with STRING to identify hub genes associated with HGOC.
Results
A total of 106 common DEGs were identified across all three datasets, including 66 up-regulated and 40 down-regulated genes. Given the study’s focus on potential oncogenic drivers, subsequent analyses prioritized the 66 up-regulated genes. The DEGs were classified into three groups by GO terms (21 biological process, 10 molecular function and 12 cellular component). Kyoto Encyclopedia of Genes and Genomes pathway analysis showed enrichment in metabolic pathways, oxidative phosphorylation, drug metabolism, and cell cycle regulation. The top nine up-regulated hub genes in the PPI network were GMPS, RFC4, YWHAZ, CHEK1, CYC1, MRPL13, MRPL15, SDHA, and CLPB.
Conclusion
The identification of these hub genes and pathways may represent an important step forward in our understanding of HGOC. While down-regulated genes may also hold biological significance, their analysis was beyond the scope of this study and warrants future investigation. Further experimental validation is needed to confirm the roles of the identified genes in disease pathogenesis and their potential as biomarkers and therapeutic targets.
Introduction
Ovarian cancer remains a significant global health concern, with a high mortality rate, primarily attributed to late-stage diagnosis (1). The World Ovarian Cancer Coalition’s 2022 analysis highlights this challenge, projecting a substantial increase in global ovarian cancer deaths in the coming decades (2). Early-stage ovarian cancer is associated with a favorable prognosis, with a 5-year survival rate exceeding 90%. However, most cases are diagnosed at advanced stages, which limits treatment options and reduces survival rates (3).
The key distinctions between high-grade ovarian cancer (HGOC; Grade 3, FIGO stages III-IV) and low-grade ovarian cancer (LGOC; Grade 1, FIGO stages I-II) are rooted in their disparate molecular profiles, which in turn dictate their divergent clinical behaviors. HGOC is recognized by its high degree of genomic instability, contributing to its aggressive nature, rapid disease progression, and generally poor prognosis, often observed as widespread metastatic disease at initial diagnosis. This aggressive cellular behavior is driven by characteristic genetic alterations that present considerable obstacles to effective treatment. In contrast, LGOC is defined by a slower proliferation rate and a more stable genome, reflecting fewer overall genetic changes. Although less aggressive clinically, LGOC is driven by its own unique spectrum of molecular alterations. These divergent genetic mechanisms are crucial in explaining the varied clinical progression and differential responses to therapy observed in HGOC vs. LGOC. A comprehensive understanding of these molecular specificities is essential for advancing targeted treatment approaches and enhancing patient prognosis across both disease categories (4). This biological dichotomy underscores the need for grade-specific molecular characterization.
Surgery remains the primary treatment modality for ovarian cancer, often followed by adjuvant chemotherapy for advanced-stage disease. While advances in biological therapies, immunotherapy, and radiotherapy have emerged, treatment resistance persists as a major challenge (4).
A deeper understanding of the molecular mechanisms underlying ovarian cancer progression may be beneficial for identifying potential biomarkers and therapeutic targets (5).
High-throughput microarray analysis enables comprehensive gene expression studies, revealing the molecular complexity of ovarian cancer. Comparing expression profiles between tumor grades identifies differentially expressed genes (DEGs), offering insights into tumorigenesis (6-8).
Bioinformatics analysis now plays an important role in integrating and analyzing large-scale genomic data, enabling the identification of key genes and pathways associated with ovarian cancer (9). Recent years have witnessed significant advances in multi-omics technologies, spanning genomics to metabolomics, which are revolutionizing biomarker discovery and personalized medicine (10).
Previous studies have employed a data-driven approach, using publicly available gene expression datasets, to identify prognostic signatures associated with ovarian cancer (11). To further explore the molecular basis of ovarian cancer, we analyzed three publicly available microarray datasets (GSE6008, GSE14764, and GSE23603) from the ArrayExpress database. GSE23603 was chosen as the validation set due to its balanced HGOC/LGOC ratio (38 HGOC vs. 24 LGOC) and platform consistency (GPL570). The datasets were selected based on the following criteria: (1) inclusion of both LGOC and HGOC samples; (2) availability of raw gene expression data; and (3) sufficient sample size for robust statistical analysis. Samples with incomplete clinical data or unclear tumor grade classification were excluded. The primary objective of this study was to use bioinformatics analysis to identify key genes that distinguish HGOC from LGOC and to elucidate their potential molecular mechanisms. By uncovering these molecular differences, we hope to identify potential biomarkers and therapeutic targets for HGOC.
Material and Methods
This study employed a comprehensive bioinformatics workflow to identify potential biomarkers in HGOC, as illustrated in Figure 1. The methodology included the following components:
Gene expression data sources
This study analyzed three publicly available microarray datasets (GSE6008, GSE14764, and GSE23603) retrieved from the ArrayExpress database (12), comprising gene expression profiles from 110 HGOC and 72 LGOC samples. Samples were included based on histopathological grade (HGOC: high-grade serous carcinoma; LGOC: low-grade serous/borderline tumors) as defined in the original studies (13-15). Clinical annotations (e.g., treatment history, mutation status) were unavailable and thus not used for filtering. The sample distribution was as follows:
• GSE6008: 36 HGOC, 24 LGOC (13)
• GSE14764: 36 HGOC, 24 LGOC (14)
• GSE23603: 38 HGOC, 24 LGOC (15)
GSE23603 served as the validation set due to its larger sample size and balanced HGOC/LGOC representation. This study used de-identified data; thus, no additional ethical approval was required.
A summary of dataset filtering and categorization is presented in Table 1 to detail the data selection workflow.
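As a quick arithmetic check, the per-dataset sample counts listed above can be tallied to confirm the stated totals of 110 HGOC and 72 LGOC samples. The following Python sketch is illustrative only; the dictionary layout and the function name totals are assumptions introduced here and are not part of the original workflow, which was carried out in R.

```python
# Sample counts per dataset as reported in the text: (HGOC, LGOC).
SAMPLES = {
    "GSE6008": (36, 24),
    "GSE14764": (36, 24),
    "GSE23603": (38, 24),
}
TRAINING = ("GSE6008", "GSE14764")   # training set named in the text
VALIDATION = ("GSE23603",)           # validation set named in the text


def totals(names):
    """Return (HGOC, LGOC) totals over the named datasets."""
    hgoc = sum(SAMPLES[name][0] for name in names)
    lgoc = sum(SAMPLES[name][1] for name in names)
    return hgoc, lgoc


if __name__ == "__main__":
    print("training:", totals(TRAINING))
    print("validation:", totals(VALIDATION))
    print("all datasets:", totals(SAMPLES))
```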
Statistical analysis
Raw data were normalized using the Robust Multi-array Average method. DEGs between HGOC and LGOC were identified using R 3.4.0 [two-tailed t-test, p<0.05, |log2 fold change (FC)| >0]. Genes were classified as up-regulated (log2 FC >0) or down-regulated (log2 FC <0); subsequent analyses prioritized up-regulated genes as potential oncogenic drivers.
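The per-gene test described above can be sketched as follows. This is a minimal Python illustration, not the R code used in the study. It assumes Welch's unequal-variance form of the two-tailed t-test (the text does not state which variant was applied) and relies on the fact that RMA-normalized values are already on the log2 scale, so the log2 FC is a difference of group means. The function names and the numerical integration of the t density are choices made for this sketch.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 - 2 * integral of the density over [0, |t|].

    The integral uses Simpson's rule; steps must be even.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    area = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return max(0.0, 1.0 - 2.0 * area)


def welch_test(hgoc, lgoc):
    """Return (log2 FC, p-value) for one gene from log2-scale expression values."""
    log2_fc = mean(hgoc) - mean(lgoc)  # difference of log2 values = log2 ratio
    se2_h = variance(hgoc) / len(hgoc)
    se2_l = variance(lgoc) / len(lgoc)
    se2 = se2_h + se2_l
    if se2 == 0:  # no within-group variation at all
        return log2_fc, (1.0 if log2_fc == 0 else 0.0)
    t = log2_fc / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / (se2_h ** 2 / (len(hgoc) - 1) + se2_l ** 2 / (len(lgoc) - 1))
    return log2_fc, two_tailed_p(t, df)


def classify(hgoc, lgoc, alpha=0.05):
    """Label a gene 'up', 'down', or 'not DE' using p < alpha and the sign of log2 FC."""
    log2_fc, p = welch_test(hgoc, lgoc)
    if p >= alpha or log2_fc == 0:
        return "not DE"
    return "up" if log2_fc > 0 else "down"
```

In practice a statistics library would supply the p-value directly; the explicit integration is used here only to keep the sketch free of external dependencies.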
Functional enrichment analysis
Gene ontology (GO) analysis categorized DEGs into molecular functions, biological processes, and cellular components. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment was performed using DAVID (16) with a significance threshold of p<0.05 (17), adjusted via the Benjamini-Hochberg method (18).
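To make the enrichment step concrete, the sketch below computes an over-representation p-value for a single term and then applies the Benjamini-Hochberg adjustment across terms. It is an illustration under stated assumptions: it uses the plain hypergeometric tail probability rather than DAVID's own modified Fisher exact (EASE) score, and the function names and arguments are hypothetical.

```python
from math import comb


def enrichment_p(background, annotated, list_size, hits):
    """Hypergeometric P(X >= hits).

    background: number of genes in the background set
    annotated:  background genes annotated to the term
    list_size:  number of genes in the submitted list
    hits:       genes in the list that are annotated to the term
    """
    upper = min(annotated, list_size)
    favorable = sum(
        comb(annotated, i) * comb(background - annotated, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favorable / comb(background, list_size)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A term would then be reported as enriched when its adjusted p-value falls below the 0.05 threshold given above.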
Candidate hub gene identification
Protein-protein interaction (PPI) networks were constructed using STRING (19, 20). Hub genes were selected based on high connectivity scores and cancer pathway relevance, with the top nine up-regulated genes prioritized.
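One common way to operationalize "high connectivity" is node degree, that is, the number of interaction partners a protein has in the network. The sketch below ranks nodes by degree from an edge list. Treating degree as the connectivity score, the 0.4 confidence cut-off, and the edge-list format are assumptions made for illustration; the text does not specify these details, and the pathway-relevance criterion is not modeled here.

```python
from collections import Counter


def top_hubs(edges, n=9, min_score=0.4):
    """Rank proteins by degree in an undirected interaction network.

    edges: iterable of (protein_a, protein_b, confidence) tuples
    n: number of top-ranked nodes to return
    min_score: edges below this confidence are ignored (assumed cut-off)
    Returns a list of (protein, degree), ties broken alphabetically.
    """
    kept = set()
    for a, b, score in edges:
        if a != b and score >= min_score:
            kept.add(frozenset((a, b)))  # de-duplicate A-B and B-A
    degree = Counter()
    for pair in kept:
        for protein in pair:
            degree[protein] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


if __name__ == "__main__":
    # Toy network with placeholder node names, not real interaction data.
    toy_edges = [
        ("A", "B", 0.9), ("A", "C", 0.8), ("A", "D", 0.7),
        ("B", "C", 0.5), ("D", "E", 0.2), ("B", "A", 0.9),
    ]
    print(top_hubs(toy_edges, n=2))
```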
Results
A total of 5,171 DEGs were identified in the GSE6008 dataset (TS1), while 1,557 DEGs were identified in GSE14764 (TS2). Comparative analysis revealed 484 overlapping genes between these datasets (Figure 2). These 484 probe sets were further validated using a t-test in the independent dataset GSE23603 (VS), yielding 106 consistently dysregulated probe sets (66 up-regulated and 40 down-regulated genes; Table 2). Given their higher likelihood of representing oncogenic drivers in HGOC, subsequent analyses focused on the 66 up-regulated genes.
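The filtering logic described above (overlap of the two training sets, followed by confirmation in the validation set) can be sketched with set operations. The sketch assumes that "consistently dysregulated" means the same direction of change in all three datasets; this interpretation, the function name, and the input format are assumptions, and the probe identifiers in the usage example are placeholders.

```python
def consistent_degs(ts1, ts2, vs):
    """Intersect training-set DEGs and confirm them in the validation set.

    Each argument maps a probe ID to its log2 FC, restricted to probes
    already called significant in that dataset.
    Returns (shared, up, down): the TS1/TS2 overlap, and the validated
    probes with a consistent up or down direction in all three datasets.
    """
    shared = ts1.keys() & ts2.keys()
    up, down = set(), set()
    for probe in shared & vs.keys():
        directions = {ts1[probe] > 0, ts2[probe] > 0, vs[probe] > 0}
        if len(directions) == 1:  # same sign in TS1, TS2, and VS
            (up if vs[probe] > 0 else down).add(probe)
    return shared, up, down


if __name__ == "__main__":
    # Placeholder probe IDs and fold changes, for illustration only.
    ts1 = {"p1": 1.2, "p2": -0.8, "p3": 0.5, "p4": 0.9}
    ts2 = {"p1": 0.7, "p2": -0.4, "p3": -0.6, "p5": 1.1}
    vs = {"p1": 0.9, "p2": -0.5, "p3": 0.3}
    shared, up, down = consistent_degs(ts1, ts2, vs)
    print(sorted(shared), sorted(up), sorted(down))
```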
FC values calculated across TS1, TS2, and VS demonstrated strong concordance in all pairwise comparisons (Figure 3), indicating robust experimental design and effective data normalization. The uniform distribution of FCs in scatter plots further corroborated the reliability of our differential expression analysis.
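Concordance between two sets of FC values can be quantified with a correlation coefficient computed over the shared genes. The sketch below uses Pearson's r as one reasonable choice; the text does not state which concordance measure underlies Figure 3, so this is an assumption, as is the function name.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of log2 FC values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

Applied to each pair (TS1 vs. TS2, TS1 vs. VS, TS2 vs. VS), a value near 1 would indicate that genes change in the same direction and by similar magnitudes across datasets.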
Hierarchical clustering and heatmap analysis of the 106 DEGs in the validation set (Figure 4) reinforced the stratification of samples based on gene expression profiles. For clarity, the heatmap employed a red-black-green color gradient (red: up-regulation; black: no change; green: down-regulation) and standardized labeling of samples and genes.
GO analysis categorized the DEGs into three functional groups: biological processes (21 terms), molecular functions (10 terms), and cellular components (12 terms) (Table 3).
• Biological processes were enriched for terms related to oxidative stress response (GO:0006975), regulation of cell death (GO:0043066, GO:0043065), and mitochondrial function (GO:0006121, GO:0070125), suggesting their potential disruption in HGOC progression.
• Molecular functions predominantly involved protein binding (GO:0005515) and catalytic activities linked to energy metabolism, including ATP binding (GO:0005524) and succinate dehydrogenase activity (GO:0000104), implicating altered PPIs and metabolic pathways in HGOC pathogenesis.
• Cellular components highlighted enrichment in mitochondrial (GO:0005739), cytosolic (GO:0005829), and extracellular exosome compartments (GO:0070062), indicating diverse subcellular localization and potential roles in metastasis.
KEGG pathway analysis highlighted enrichment in metabolic pathways (hsa01100), oxidative phosphorylation (hsa00190), and drug metabolism (hsa00983), underscoring aberrant energy metabolism and therapeutic resistance in HGOC (Table 4). In addition, pathways associated with the cell cycle (hsa04110) and neurodegenerative diseases [e.g., Huntington’s disease (hsa05016), Parkinson’s disease (hsa05012)] were enriched, reflecting the complex interplay of biological processes in HGOC.
Among the 66 up-regulated genes, nine hub genes (GMPS, RFC4, YWHAZ, CHEK1, CYC1, MRPL13, MRPL15, SDHA, and CLPB) emerged as central nodes in the PPI network (Figure 5). This network topology underscores their potential as coordinators of the aggressive phenotype of HGOC. Pathway enrichment further emphasized their roles in metabolism, oxidative phosphorylation, and cell cycle regulation, suggesting actionable therapeutic targets.
Discussion
Ovarian cancer remains a major health challenge, with high mortality rates attributable to late diagnosis (21). While our study employs established bioinformatics methodologies, its novel contribution lies in the identification of underinvestigated genes, such as GMPS and CLPB, as possible pivotal players in HGOC pathogenesis. This finding extends beyond conventional biomarker discovery by highlighting potential therapeutic targets overlooked in prior studies. Through integrated analysis of multiple datasets, we identified 66 consistently up-regulated DEGs, predominantly enriched in metabolic and cell cycle pathways, which collectively illuminate the molecular drivers of HGOC progression.
Our decision to prioritize up-regulated genes (GMPS, RFC4, YWHAZ, CHEK1, CYC1, MRPL13, MRPL15, SDHA, and CLPB) was guided by three key considerations: 1) oncogenic drivers in aggressive cancers typically exhibit increased expression; 2) up-regulated genes generally engage in more extensive PPI networks than down-regulated counterparts (22); and 3) therapeutic targeting of overexpressed genes is clinically more feasible. Importantly, this strategic focus does not negate the potential relevance of down-regulated genes but aligns with translational priorities for diagnostic and therapeutic development.
Among the up-regulated hub genes identified, GMPS, RFC4, and CLPB emerged as particularly noteworthy due to their high connectivity in PPI networks and their understudied roles in HGOC. For instance, GMPS up-regulation has been linked to therapy resistance and tumor aggressiveness in other cancers (23, 24), suggesting a parallel mechanism in HGOC that warrants further validation. CHEK1, a critical mediator of genomic stability (25), was similarly overexpressed in our analysis, corroborating findings by Lopes et al. (25) and Fadaka et al. (26) in ovarian cancer. The consistent association of CHEK1 with tumorigenesis, particularly in DNA repair and therapy resistance, underscores its potential as a therapeutic target, as proposed in preclinical studies advocating CHEK1 inhibition (27-30).
CLPB, though primarily recognized for its role in protein homeostasis (31), demonstrated significant up-regulation in HGOC, implying an unexplored oncogenic function. While limited studies link CLPB to ovarian function (32), our findings provide the first evidence of its potential involvement in HGOC pathogenesis, meriting mechanistic investigation. YWHAZ, a multifunctional regulator of cell proliferation and apoptosis (33, 34), was similarly overexpressed, consistent with prior observations of its role in chemotherapy resistance (35) and microRNA-mediated oncogenic pathways (36). Our data reinforce YWHAZ as a candidate for targeted therapy, though its precise molecular mechanisms in HGOC remain to be elucidated.
Metabolic reprogramming emerged as a hallmark of HGOC in our study, exemplified by the up-regulation of CYC1, MRPL13, MRPL15, and SDHA. CYC1, a key component of mitochondrial respiration (37), was elevated in aggressive tumors, aligning with reports of metabolic heterogeneity in ovarian cancer (38, 39) and its association with uncontrolled proliferation (40). Similarly, MRPL13 and MRPL15, coding mitochondrial ribosomal proteins, were overexpressed, mirroring their documented roles in breast and lung cancers (41, 42). Their association with advanced disease stages (43) and modulation of PI3K/AKT/mTOR signaling suggests a broader role in HGOC progression, possibly through mitochondrial dysfunction. SDHA, a critical enzyme in oxidative phosphorylation (44), further underscores the metabolic adaptability of HGOC cells, with silencing of SDHA shown to suppress tumor growth (44-46).
Finally, RFC4, a DNA replication and repair factor, was significantly up-regulated, consistent with its established association with poor prognosis (47, 48). Our results not only validate RFC4 as a prognostic biomarker but also highlight its broader role in HGOC genomic instability, offering a rationale for targeting DNA replication machinery in therapy.
While this study primarily focused on up-regulated genes as potential oncogenic drivers, it is crucial to acknowledge the biological significance of down-regulated genes, which can function as tumor suppressors or indicators of impaired cellular processes. For instance, BCL2, DLC1, and SOX4 were notable among the 40 down-regulated genes identified. BCL2 is a well-known anti-apoptotic gene, and its down-regulation could indicate altered apoptotic pathways that paradoxically might contribute to drug resistance or selective survival mechanisms in certain contexts of HGOC (49).
DLC1, a Rho GTPase-activating protein, is frequently reported as a tumor suppressor in various cancers, and its reduced expression can lead to increased cell migration and invasion (50).
Similarly, SOX4 is a transcription factor with context-dependent roles, often acting as an oncogene but also demonstrating tumor-suppressive functions in some cancers (51); its down-regulation here warrants further investigation into its precise role in HGOC.
While these genes were not the primary focus, their inclusion in the identified DEGs underscores the complex molecular landscape of HGOC and opens avenues for future research into their specific contributions to disease progression and their potential as therapeutic targets.
Study limitations
This study has several inherent limitations. Firstly, our analysis primarily focuses on gene expression data, which provides a snapshot of gene activity but may not fully capture the intricate complexities of ovarian cancer. Other crucial factors, such as epigenetic modifications, PPIs, and microRNA regulation, also significantly influence tumor development and progression. Secondly, the reliance on publicly available datasets introduces potential biases arising from variations in patient populations, sample collection methods, and data processing techniques. Finally, experimental validation is crucial to confirm the functional roles of the identified genes and to explore their therapeutic potential in preclinical and clinical settings.
Conclusion
Our study identified a set of up-regulated genes, including GMPS, CHEK1, CLPB, YWHAZ, CYC1, MRPL13, MRPL15, SDHA, and RFC4, as potential biomarkers for HGOC. Although the role of GMPS in ovarian cancer has been relatively understudied, our findings suggest its potential as a valuable biomarker. Down-regulated genes may also hold biological significance, but their analysis falls outside the scope of this study, which focused on oncogenic drivers, and should be explored in future research. To fully exploit the clinical potential of these genes, a systems biology approach is necessary to understand their complex interactions within cellular networks. Further wet lab validation is crucial to translate these findings into clinical applications.


