prioGene


Introduction

Prioritizing candidate genes for complex non-communicable diseases is critical to understanding their mechanisms and developing better diagnostics and treatments.

With the growing volume of biological data, an increasing number of studies identify and rank disease-related genes using computational methods based on protein-protein interaction (PPI) networks. Gene prioritization methods often exploit the topological features of PPI networks; examples include ToppNet (https://toppGene.cchmc.org) and the gene ranking method of Razaghi Moghadam.

In this study, a candidate gene prioritization method was proposed for non-communicable diseases. The method considers disease risk transferred between genes in a weighted disease PPI network, in which both nodes and edges carry weights derived from functional information.

Construction of weighted networks

In these biological networks, nodes represented genes and edges represented interactions between the products of those genes. Weights for nodes (genes) and edges (interactions) were both calculated from functional information represented by GO terms. For each gene g, the gene weight w_g was defined as the proportion of GO terms annotated to g among all GO terms annotated to human genes. The interaction weight W_gh was defined as the functional similarity of the two interacting genes g and h.
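As a minimal sketch of these two definitions, the base R snippet below computes node and edge weights for a toy annotation list. The gene names, GO identifiers, and the names `node_weight_toy` and `edge_weight_toy` are hypothetical. The Jaccard index is only an assumed stand-in for functional similarity, since the similarity measure is not specified here.

```r
# Toy GO annotations: each gene maps to the GO terms annotated to it.
# Gene names and GO identifiers are made up for illustration.
go_annot <- list(
  geneA = c("GO:1", "GO:2", "GO:3"),
  geneB = c("GO:2", "GO:3"),
  geneC = c("GO:4")
)

# All GO terms annotated to any gene in the toy universe.
all_terms <- unique(unlist(go_annot))

# Node weight w_g: share of all annotated GO terms that are annotated to g.
node_weight_toy <- vapply(
  go_annot,
  function(terms) length(unique(terms)) / length(all_terms),
  numeric(1)
)

# Edge weight W_gh: Jaccard index of the two genes' GO term sets
# (an assumed similarity measure, not necessarily the one used by prioGene).
edge_weight_toy <- function(g, h) {
  shared <- length(intersect(go_annot[[g]], go_annot[[h]]))
  total <- length(union(go_annot[[g]], go_annot[[h]]))
  if (total == 0) 0 else shared / total
}

print(node_weight_toy)
print(edge_weight_toy("geneA", "geneB"))
```

Genes with more annotations receive larger node weights, and gene pairs that share more GO terms receive larger edge weights.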

Prioritization of candidate genes

Candidate genes were prioritized by the disease risk score of each gene, obtained from an iterative process that considers disease risk transferred between genes.
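The exact update rule is not given here, so the following is only an illustrative sketch of iterative risk transfer on a toy network. It uses a generic restart-style update, which is an assumption and not necessarily the rule implemented in prioGene. The matrix `W`, the gene names, and the function `propagate_risk` with its `beta`, `threshold`, and `max_iter` arguments are all hypothetical.

```r
# Toy symmetric edge-weight matrix for four genes (values are made up).
genes <- c("g1", "g2", "g3", "g4")
W <- matrix(
  c(0.0, 0.8, 0.2, 0.0,
    0.8, 0.0, 0.5, 0.0,
    0.2, 0.5, 0.0, 0.6,
    0.0, 0.0, 0.6, 0.0),
  nrow = 4, byrow = TRUE, dimnames = list(genes, genes)
)

# Column-normalise so that each gene distributes its risk across its neighbours.
P <- sweep(W, 2, colSums(W), "/")

# Initial risk: all mass on the single known disease gene g1.
R0 <- c(g1 = 1, g2 = 0, g3 = 0, g4 = 0)

# Generic restart-style update (an assumption, not prioGene's exact rule):
#   R_next = (1 - beta) * R0 + beta * P %*% R
propagate_risk <- function(P, R0, beta = 0.5, threshold = 1e-9, max_iter = 1000) {
  R <- R0
  for (iter in seq_len(max_iter)) {
    R_next <- drop((1 - beta) * R0 + beta * (P %*% R))
    delta <- sum(abs(R_next - R))  # total change in this iteration
    R <- R_next
    if (delta < threshold) break   # stop once the scores have stabilised
  }
  list(scores = R, iterations = iter)
}

fit <- propagate_risk(P, R0)
print(sort(fit$scores, decreasing = TRUE))
```

In this sketch the known disease gene keeps the highest score, and the remaining genes are ordered by how strongly they are connected to it.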


Installation

Get the development version from GitHub:

if(!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github("huerqiang/prioGene")

Or the released version from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("prioGene")

Common operations on prioGene

library(prioGene)
#> Warning: multiple methods tables found for 'union'
#> Warning: multiple methods tables found for 'intersect'
#> Warning: multiple methods tables found for 'setdiff'
#> Warning: multiple methods tables found for 'setequal'

2. Calculation of network weights

These five functions form a pipeline that weights the nodes and edges of the disease-related network `net_disease` using functional information. GO annotations are taken from org.Hs.eg.db.

genes_mat <- get_gene_mat(net_disease)
#> 'select()' returned 1:many mapping between keys and columns
terms_mat <- get_term_mat(net_disease)
#> 'select()' returned 1:many mapping between keys and columns
net_disease_term <- get_net_disease_term(genes_mat, net_disease)
# Node (gene) weights
node_weight <- get_node_weight(genes_mat)
# Edge (interaction) weights
edge_weight <- get_edge_weight(net_disease_term, terms_mat)

3. Prioritization of candidate genes

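Each gene receives a disease risk score from an iterative process in which risk is transferred between interacting genes. `get_R_0` computes the initial scores `R_0` from the known disease genes `dise_gene` and the node weights. `get_R` then iterates from `R_0`, with `threshold` serving as the stopping criterion.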

R_0 <- get_R_0(dise_gene, node_weight, f = 1)
result <- get_R(node_weight, net_disease_term, bet = 0.5, R_0 = R_0, threshold = 10^(-9))
#> [1] 24
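
The structure of `result` is not shown above, so the following is only a sketch under an assumption: that the final risk scores can be represented as a named numeric vector. The vector `risk_scores`, the gene names, and `known_disease_genes` are hypothetical placeholders. The point is only how candidates could be ordered once scores are in that form.

```r
# Hypothetical final risk scores, one per gene (values are made up).
risk_scores <- c(geneA = 0.41, geneB = 0.07, geneC = 0.29, geneD = 0.23)

# Hypothetical set of genes already known to be associated with the disease.
known_disease_genes <- c("geneA")

# Candidates are the genes not already known to be disease genes.
candidates <- risk_scores[setdiff(names(risk_scores), known_disease_genes)]

# Rank candidates from highest to lowest risk score.
ranking <- data.frame(
  gene = names(candidates),
  score = unname(candidates),
  stringsAsFactors = FALSE
)
ranking <- ranking[order(ranking$score, decreasing = TRUE), ]
ranking$rank <- seq_len(nrow(ranking))
rownames(ranking) <- NULL
print(ranking)
```

Excluding the known disease genes before ranking keeps the list focused on new candidates; whether to exclude them depends on the analysis.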

Session information

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] prioGene_1.0.1 rmarkdown_2.29
#> 
#> loaded via a namespace (and not attached):
#>  [1] bit_4.5.0               jsonlite_1.8.9          compiler_4.4.2         
#>  [4] crayon_1.5.3            Biobase_2.67.0          blob_1.2.4             
#>  [7] Biostrings_2.75.1       jquerylib_0.1.4         IRanges_2.41.0         
#> [10] png_0.1-8               yaml_2.3.10             fastmap_1.2.0          
#> [13] org.Hs.eg.db_3.20.0     R6_2.5.1                XVector_0.47.0         
#> [16] generics_0.1.3          GenomeInfoDb_1.43.0     knitr_1.49             
#> [19] BiocGenerics_0.53.2     maketools_1.3.1         GenomeInfoDbData_1.2.13
#> [22] AnnotationDbi_1.69.0    DBI_1.2.3               bslib_0.8.0            
#> [25] rlang_1.1.4             KEGGREST_1.47.0         cachem_1.1.0           
#> [28] xfun_0.49               sass_0.4.9              sys_3.4.3              
#> [31] bit64_4.5.2             RSQLite_2.3.7           memoise_2.0.1          
#> [34] cli_3.6.3               zlibbioc_1.52.0         digest_0.6.37          
#> [37] lifecycle_1.0.4         S4Vectors_0.45.1        vctrs_0.6.5            
#> [40] evaluate_1.0.1          buildtools_1.0.0        stats4_4.4.2           
#> [43] httr_1.4.7              pkgconfig_2.0.3         UCSC.utils_1.3.0       
#> [46] tools_4.4.2             htmltools_0.5.8.1