Identification of disease-causing genes among a large number of candidates is a fundamental challenge in human disease studies. However, it is still time-consuming and laborious to determine the real disease-causing genes by biological experiments. With advances in high-throughput techniques, a large number of protein-protein interactions have been produced. Therefore, to address this issue, several methods based on protein interaction networks have been proposed. In this paper, we propose a shortest path-based algorithm, named SPranker, to prioritize disease-causing genes in protein interaction networks. Considering the fact that diseases with similar phenotypes are generally caused by functionally related genes, we further propose an improved algorithm, SPGOranker, by integrating the semantic similarity of gene ontology (GO) annotations. SPGOranker not only considers the topological similarity between protein pairs in a protein interaction network but also takes their functional similarity into account. The proposed algorithms SPranker and SPGOranker were applied to 1598 known orphan disease-causing genes from 172 orphan diseases and compared with three state-of-the-art approaches: ICN, VS, and RWR. The experimental results show that SPranker and SPGOranker outperform ICN, VS, and RWR for the prioritization of orphan disease-causing genes. Importantly, for the case study of severe combined immunodeficiency, SPranker and SPGOranker predict several novel causal genes.
LI Min, LI Qi, GANEGODA Gamage Upeksha, WANG JianXin, WU FangXiang, PAN Yi
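The idea of ranking candidates by their network proximity to known disease genes, as described in the abstract above, can be illustrated with a small sketch. This is not the SPranker scoring function, which the abstract does not specify. It assumes an unweighted, undirected network and a hypothetical score: the sum of inverse shortest-path distances from a candidate to each known (seed) gene. The names `bfs_distances`, `rank_candidates`, and the toy network `ppi` are invented for illustration.

```python
from collections import deque


def bfs_distances(graph, source):
    """Return hop counts from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank_candidates(graph, seeds, candidates):
    """Rank candidates by a hypothetical closeness score: sum of 1/d over seeds."""
    seed_dists = {s: bfs_distances(graph, s) for s in seeds}
    scores = {}
    for c in candidates:
        score = 0.0
        for s in seeds:
            d = seed_dists[s].get(c)
            if d:  # skips unreachable seeds (None) and the case c == s (0)
                score += 1.0 / d
        scores[c] = score
    # Higher score first; ties broken alphabetically for determinism.
    ranking = sorted(candidates, key=lambda c: (-scores[c], c))
    return ranking, scores


# Toy undirected network stored as symmetric adjacency lists (assumed data).
ppi = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D", "F"],
    "F": ["E"],
}
ranking, scores = rank_candidates(ppi, seeds=["A"], candidates=["D", "E", "F"])
```

In this toy network, candidates closer to the seed gene rank higher. A functional-similarity term, such as the GO semantic similarity that SPGOranker integrates, could be added as edge weights, but how the paper combines the two is not stated in the abstract.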
Essential proteins are those necessary for the survival or reproduction of a species. Discovering such proteins is fundamental for understanding the minimal requirements for cellular life, and it is also meaningful for disease study and drug design. With the development of high-throughput techniques, a large number of Protein-Protein Interactions (PPIs) can be used to identify essential proteins at the network level. Although a series of network-based computational methods have been proposed, it is still a challenge to improve the prediction precision because of the high rate of false positives in PPI networks. In this paper, we propose a new method, GOS, to identify essential proteins by integrating Gene expression, Orthology, and Subcellular localization information. The gene expression and subcellular localization information are used to determine whether a neighbor in the PPI network is reliable. Only reliable neighbors are considered when we analyze the topological characteristics of a protein in a PPI network. We also analyze the orthologous attributes of each protein to reflect its conservation, and use a random walk model to integrate a protein's topological characteristics and its orthology. The experimental results on the yeast PPI network show that the proposed method GOS outperforms the ten existing methods DC, BC, CC, SC, EC, IC, NC, PeC, ION, and CSC.
Min Li, Zhibei Niu, Xiaopei Chen, Ping Zhong, Fangxiang Wu, Yi Pan
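The abstract above says a random walk model combines a protein's topological characteristics with its orthology, without giving the formula. As a rough illustration only, the sketch below uses a generic random walk with restart: scores spread along network edges, and at each step a fraction of the score is reset to a per-protein prior (standing in for an orthology score). The function name `random_walk_scores`, the parameter `alpha`, and the toy data are assumptions, not the GOS method.

```python
def random_walk_scores(graph, prior, alpha=0.5, iterations=100):
    """Generic random walk with restart on an undirected graph.

    graph: dict mapping each node to a list of neighbors (symmetric).
    prior: dict of non-negative restart weights (e.g. an orthology score).
    alpha: fraction of score propagated along edges at each step (assumed value).
    """
    total = sum(prior.get(n, 0.0) for n in graph)
    restart = {n: prior.get(n, 0.0) / total for n in graph}
    scores = dict(restart)
    for _ in range(iterations):
        updated = {}
        for node in graph:
            # Each neighbor passes on an equal share of its current score.
            incoming = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            updated[node] = alpha * incoming + (1 - alpha) * restart[node]
        scores = updated
    return scores


# Toy star-shaped network: hub "H" interacts with three other proteins.
network = {
    "H": ["a", "b", "c"],
    "a": ["H"],
    "b": ["H"],
    "c": ["H"],
}
uniform_prior = {"H": 1.0, "a": 1.0, "b": 1.0, "c": 1.0}
rw_scores = random_walk_scores(network, uniform_prior)
```

With a uniform prior the hub receives the highest score, which reflects topology alone. A non-uniform prior shifts scores toward proteins with stronger orthology evidence. Restricting `graph` to reliable neighbors before the walk would mirror the filtering step the abstract mentions.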
Essential proteins are vital to the survival of a cell. Various features are related to the essentiality of proteins, such as biological and topological features, and many computational methods have been developed to identify essential proteins by using these features. However, it is still a big challenge to design an effective method that can select suitable features and integrate them to predict essential proteins. In this work, we first collect 26 features and use SVM-RFE to select some of them to create a feature space for predicting essential proteins. We then remove the features that share the same biological meaning with other features in the feature space according to their Pearson Correlation Coefficients (PCC). The experiments are carried out on S. cerevisiae data. Six features are determined to be the best subset of features. To assess the prediction performance of our method, we further compare it with some machine learning methods, such as SVM, Naive Bayes, Bayes Network, and NBTree, when they are given different numbers of features. The results show that the methods using these six features outperform those using other features, which confirms the effectiveness of our feature selection method for essential protein prediction.
Jiancheng Zhong, Jianxin Wang, Wei Peng, Zhen Zhang, Min Li
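The second step described in the abstract above, removing features that duplicate others according to their Pearson correlation, can be sketched as follows. The sketch assumes the features have already been ranked from most to least important (for example by SVM-RFE, which is not implemented here) and uses a hypothetical cutoff of 0.8. The abstract does not state the threshold or the exact removal rule. The names `pearson`, `drop_redundant`, and the toy columns are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Assumes neither sequence is constant (a constant column would divide by zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def drop_redundant(ranked_features, columns, threshold=0.8):
    """Keep a feature only if it is not strongly correlated with one already kept.

    ranked_features: feature names, best first (assumed to come from a ranker).
    columns: dict mapping feature name to its list of values.
    threshold: hypothetical |PCC| cutoff above which a feature counts as redundant.
    """
    kept = []
    for name in ranked_features:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy feature table: "degree_x2" is a rescaled copy of "degree".
columns = {
    "degree": [1.0, 2.0, 3.0, 4.0, 5.0],
    "degree_x2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "expression": [3.0, 1.0, 4.0, 1.0, 5.0],
}
selected = drop_redundant(["degree", "degree_x2", "expression"], columns)
```

Walking the list in rank order means that, of two correlated features, the one ranked higher survives. This is one reasonable design choice, not necessarily the rule used in the paper.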