Gligorijevic, Dj., et al. @ Sci. Rep.

Data-driven phenotype analyses on Electronic Health Record (EHR) data have recently drawn benefits across many areas of clinical practice, uncovering new links in the medical sciences that can potentially affect the well-being of millions of patients. In this paper, EHR data is used to discover novel relationships between diseases by studying their comorbidities (co-occurrences in patients). A novel embedding model is designed to extract knowledge from disease comorbidities by learning from a large-scale EHR database comprising more than 35 million inpatient cases spanning nearly a decade, revealing significant improvements in disease phenotyping over current computational approaches. In addition, the proposed methodology is extended to discover novel disease-gene associations by including valuable domain knowledge from genome-wide association studies. To evaluate our approach, its effectiveness is compared against a held-out set, where it again yields compelling results. For selected diseases, we further identify candidate gene lists for which disease-gene associations were not studied previously. Our approach thus provides biomedical researchers with new tools to filter genes of interest, reducing costly lab studies.


Gligorijevic, Dj., Stojanovic, J., Djuric, N., Radosavljevic, V., Grbovic, M., Kulathinal, R.J., Obradovic, Z. (2016) “Large-Scale Discovery of Disease-Disease and Disease-Gene Associations,” Scientific Reports, Nature Publishing Group, Aug 31, 6:32404, doi: 10.1038/srep32404.
(Impact Factor: 5.578)

Data (disease2vec and disease&gene2vec vectors)
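
Embedding vectors of this kind are typically explored by ranking entities by cosine similarity to a query entity. The following is a minimal sketch of that idea, not the authors' code. The file format of the released vectors is not described here, so the sketch uses a small in-memory dictionary; the names `toy_vectors`, `cosine`, and `nearest`, and all vector values, are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest(query, vectors, k=3):
    """Return the k entries most similar to `query`, as (name, similarity) pairs."""
    q = vectors[query]
    scored = [(name, cosine(q, vec)) for name, vec in vectors.items() if name != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors; real embeddings would be loaded from the released data.
toy_vectors = {
    "disease_A": [1.0, 0.0, 0.0],
    "disease_B": [0.9, 0.1, 0.0],
    "disease_C": [0.0, 1.0, 0.0],
    "gene_X": [0.8, 0.2, 0.1],
}

for name, score in nearest("disease_A", toy_vectors, k=2):
    print(f"{name}\t{score:.3f}")
```

If diseases and genes share one embedding space, as the name disease&gene2vec suggests, the same ranking could be restricted to gene entries to shortlist candidate genes for a disease; that reading is an assumption rather than something stated on this page.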