Research

1. Machine Learning-Empowered Quantitative Pathology

I spearheaded the first fully automated analytical workflow that extracted 9,879 quantitative features from digital whole-slide histopathology images. Whole-slide histopathology images contain billions of pixels and are difficult to process. To address this challenge, I established image processing modules that identify the regions of interest and extract features describing the size, shape, and pixel intensity distribution of the cell nuclei and cytoplasm. The features extracted from lung cancer histopathology slides successfully predicted patients’ diagnoses and prognoses. I further integrated patients’ quantitative histopathology features with their transcriptomic profiles to identify associations between molecular pathways and the histological morphology of lung adenocarcinoma. This work lays the foundation for quantitative digital pathology analysis.

Selected publications:
a. Kun-Hsing Yu, Ce Zhang, Gerald J. Berry, Russ B. Altman, Christopher Ré, Daniel L. Rubin, Michael Snyder. Predicting Non-Small Cell Lung Cancer Prognosis by Fully Automated Microscopic Pathology Image Features. Nature Communications. 2016 Aug 16;7:12474. [PubMed] [Codes]
b. Kun-Hsing Yu, Gerald J. Berry, Daniel L. Rubin, Christopher Ré, Russ B. Altman, Michael Snyder. Association of Omics Features with Histopathology Patterns in Lung Adenocarcinoma. Cell Systems. 2017 Dec 27;5(6):620-627.e3. [PubMed]
c. Kun-Hsing Yu. Quantitative Pathology Analysis and Diagnosis using Neural Networks. U.S. Patent Application No.: 16/179,101.
d. Kun-Hsing Yu, Andrew L. Beam, Isaac S. Kohane. Artificial Intelligence in Healthcare. Nature Biomedical Engineering. 2018 Oct 10;2:719–731. [Paper]

2. Bioinformatics Methods for Multi-Omics Data Analyses

My early work focused on the development of bioinformatics methods for human proteome analysis. High-throughput proteomics methods generate gigabytes of raw data. To fully utilize this rich information source, I established a computational framework that integrates proteomic and genomic data to characterize the genome-proteome correlation. Using the algorithms I devised, we identified biomarkers for chemotherapy resistance in ovarian cancer patients and early-stage markers for colorectal cancer. These biomarkers were validated in a separate test set that was not used in model training.

Selected publications:
a. Kun-Hsing Yu, Douglas A. Levine, Hui Zhang, Daniel W. Chan, Zhen Zhang, Michael Snyder. Predicting Ovarian Cancer Patients' Clinical Response to Platinum-based Chemotherapy by their Tumor Proteomic Signatures. Journal of Proteome Research. 2016 Aug 5;15(8):2455-65. [PubMed]
b. Kun-Hsing Yu, Michael Snyder. Omics profiling in precision oncology. Molecular & Cellular Proteomics. 2016 Aug;15(8):2525-36. [PubMed]
c. Hui Zhang, Tao Liu, Zhen Zhang, Samuel H Payne, Jason E McDermott, Jian-Ying Zhou, Vladislav A Petyuk, Lily Chen, Debjit Ray, Shisheng Sun, Feng Yang, Bai Zhang, Jing Wang, Seong Won Cha, Lijun Chen, Sunghee Woo, Punit Shah, Paul Aiyetan, Yuan Tian, Caitlin Choi, Marina A Gritsenko, Ronald J Moore, Matthew E Monroe, Kun-Hsing Yu, David Tabb, David Fenyo, Vineet Bafna, Joseph Wang, Ie-Ming Shih, Akhilesh Pandey, Bing Zhang, Michael Snyder, Doug Levine, Richard D Smith, Daniel W Chan, Karin D Rodland and the TCGA investigators. Deep proteogenomic characterization of human ovarian cancer. Cell. 2016 Jul 28;166(3):755-65. [PubMed]
d. Chia-Li Han, Jinn-Shiun Chen, Err-Cheng Chan, Chien-Peng Wu, Kun-Hsing Yu, Chia-Feng Tsai, Guei-Tian Chen, Chih-Wei Chien, Yung-Bin Kuo, Pei-Yi Lin, Chung-Chuan Chan, Jao-Song Yu, Yu-Ju Chen. An Informatics-assisted Label-free Approach for Personalized Tissue Membrane Proteomics: Case Study on Colorectal Cancer. Molecular & Cellular Proteomics. 2011 Apr;10(4):M110.003087. [PubMed]

3. Data-driven Algorithms for Structured and Unstructured Medical Data

I develop data-driven algorithms that extract information from terabytes of structured and unstructured medical data, including annotations of genomic variations, transcriptomic profiles, and the medical literature. These methods improved precision and recall in identifying human proteins relevant to health and disease status, and predicted the clinical phenotypes of cancer patients who participated in The Cancer Genome Atlas (TCGA). I deployed these tools as cloud-based systems and open-sourced my code. My algorithms are routinely used by investigators of the Biology/Disease-driven Human Proteome Project (B/D-HPP) of the Human Proteome Organization (HUPO).

Selected publications:
a. Kun-Hsing Yu, Oren Miron, Nathan Palmer, Dario Lemos, Kathe Fox, S. C. Kou, Mustafa Sahin, Isaac S. Kohane. Data-Driven Analyses Revealed the Comorbidity Landscape of Tuberous Sclerosis Complex. Neurology. 2018 Nov 20;91(21):974-976. [PubMed]
b. Kun-Hsing Yu, Tsung-Lu Michael Lee, Chi-Shiang Wang, Yu-Ju Chen, Christopher Ré, Samuel C. Kou, Jung-Hsien Chiang, Michael Snyder, Isaac S. Kohane. A Cloud-Based Metabolite and Chemical Prioritization System for the Biology/Disease-Driven Human Proteome Project. Journal of Proteome Research. 2018 Aug 21;17(12):4345-4357. [PubMed] [Codes] [Running Server]
c. Kun-Hsing Yu, Michael R. Fitzpatrick, Luke Pappas, Warren Chan, Jessica Kung, Michael Snyder. Omics AnalySIs System for PRecision Oncology (OASISPRO): A Web-based Omics Analysis Tool for Clinical Phenotype Prediction. Bioinformatics. 2018 Jan 15;34(2):319-320. [PubMed] [Codes] [Running Server]
d. Kun-Hsing Yu, Tsung-Lu Michael Lee, Chi-Shiang Wang, Yu-Ju Chen, Christopher Ré, Samuel C. Kou, Jung-Hsien Chiang, Isaac S. Kohane, Michael Snyder. Systematic Protein Prioritization for Targeted Proteomics Studies through Literature Mining. Journal of Proteome Research. 2018 Apr 6;17(4):1383-1396. [PubMed] [Codes] [Running Server]

4. Unravelling the Genetic Basis of Complex Phenotypes

In collaboration with a team of researchers, I developed bioinformatics methods to identify the genetic and transcriptomic profiles related to complex phenotypes, such as bronchopulmonary dysplasia (BPD). In these studies, I identified rare mutations associated with increased BPD risk among premature infants and predicted patient-specific susceptibility to drug cardiotoxicity. These analyses revealed the disrupted biological pathways underlying disease and pharmacodynamics and pointed to potential treatment strategies.

Selected publications:
a. Jingjing Li*, Kun-Hsing Yu*, John Oehlert, Laura L Jeliffe-Pawlowski, Jeffrey B Gould, David K Stevenson, Michael Snyder, Gary M Shaw, Hugh M O’Brodovich. Exome Sequencing of Neonatal Blood Spots Identifies Genes Implicated in Bronchopulmonary Dysplasia. American Journal of Respiratory and Critical Care Medicine. 2015 Sep 1;192(5):589-96. [PubMed]
b. Kun-Hsing Yu, Jingjing Li, Michael Snyder, Gary M. Shaw, Hugh M. O'Brodovich. The Genetic Predisposition to Bronchopulmonary Dysplasia. Current Opinion in Pediatrics. 2016 Jun;28(3):318-23. [PubMed]
c. Elena Matsa, Paul W. Burridge, Kun-Hsing Yu, John H. Ahrens, Haodi Wu, Praveen Shukla, Jared M. Churko, Joseph D. Gold, Michael P. Snyder, Joseph C. Wu. Transcriptomic analysis of inter and intra patient variation in hiPSC-cardiomyocytes is predictive of patient-specific cardiotoxicity. Cell Stem Cell. 2016 Sep 1;19(3):311-25. [PubMed]