Hi there, I'm Yue ZHAO (赵越 in Chinese)! 👋
😄 I am on the market with expected graduation in Summer 2023. I am broadly interested in positions in machine learning, data mining, data science, and information science and systems. I can work in the U.S., Canada, and China without sponsorship. If you have an open opportunity in either academia or industry, please reach out by email (zhaoy [AT] cmu.edu).
🌱 Short Bio: My name is Yue ZHAO (赵越 in Chinese). I am a rising fourth-year Ph.D. student at Carnegie Mellon University (CMU). Before joining CMU, I earned my Master's degree from the University of Toronto (2016) and my Bachelor's degree from the University of Cincinnati (2015), and worked as a senior consultant at PwC Canada (2016-19). I am an expert on anomaly detection (a.k.a. outlier detection) algorithms, systems, and their applications in security, healthcare, and finance, with more than 7 years of professional experience and 20+ papers (in JMLR, TKDE, NeurIPS, etc.). I appreciate the support from the Norton Labs Graduate Fellowship. See my homepage and CV for more information.
Contributions to outlier detection systems, benchmarks, and applications: I build automated, scalable, and accelerated machine learning systems (MLSys) to support large-scale, real-world outlier detection (OD) applications in security, finance, and healthcare; these systems have millions of downloads. I designed CPU-based (PyOD), GPU-based (TOD), and distributed (SUOD) detection systems for tabular (PyOD), time-series (TODS), and graph data (PyGOD). To understand the characteristics of OD algorithms, I co-authored large-scale benchmarks for tabular data (ADBench), time-series data (paper), and graph data (UNOD). My work has been used by thousands of projects and applications, including at leading firms like IBM, Morgan Stanley, and Tesla. See more applications.
🔭 Research outcomes (related to outlier detection if not specified):
Primary field | Secondary | Method | Year | Venue | Lead author |
---|---|---|---|---|---|
large-scale Benchmark | tabular anomaly detection | ADBench | 2022 | Preprint | Y |
large-scale Benchmark | graph anomaly detection | UNOD | 2022 | Preprint | Y |
large-scale Benchmark | sequence anomaly detection | TODS | 2021 | NeurIPS | |
automated machine learning | outlier model selection | MetaOD | 2021 | NeurIPS | Y |
automated machine learning | outlier model selection | ELECT | 2022 | ICDM | Y |
automated machine learning | outlier HP optimization | HPOD | 2022 | Preprint | Y |
automated machine learning | outlier evaluation | IPM | 2021 | Preprint | Y |
machine learning systems | | PyOD | 2019 | JMLR | Y |
machine learning systems | time series | TODS | 2020 | AAAI | |
machine learning systems | | SUOD | 2021 | MLSys | Y |
machine learning systems | distributed systems | TOD | 2022 | Preprint | Y |
machine learning systems | graph neural networks | PyGOD | 2022 | Preprint | Y |
ensemble learning | semi-supervised | XGBOD | 2018 | IJCNN | Y |
ensemble learning | | LSCP | 2019 | SDM | Y |
ensemble learning | machine learning systems | combo | 2020 | AAAI | Y |
ensemble learning | interpretable ML | COPOD | 2020 | ICDM | Y |
ensemble learning | interpretable ML | ECOD | 2022 | TKDE | Y |
graph mining | finance | AutoAudit | 2020 | BigData | |
graph neural networks | contrastive learning | CONAD | 2022 | PAKDD | |
Diffusion Models | survey | | 2022 | Preprint | |
AI x Science | synthetic data | SynC | 2020 | ICDMW | |
AI x Science | healthcare AI | PyHealth | 2020 | Preprint | Y |
AI x Science | Datasets & Benchmark | TDC | 2021 | NeurIPS | |
AI x Science | Datasets & Benchmark | TDC V2 | 2022 | NCHEMB | |
At CMU, I work with Prof. Leman Akoglu, Prof. Zhihao Jia, and Prof. George H. Chen. Externally, I collaborate with Prof. Jure Leskovec at Stanford, Prof. Xia "Ben" Hu at Rice University, and Prof. Philip S. Yu at UIC.
⚡ Open-source Contribution: I have led or contributed as a core member to more than 10 ML open-source initiatives, receiving 14,000 GitHub stars (top 0.002%: ranked 800 out of 40M GitHub users) and >10,000,000 total downloads. Popular ones:
- PyOD: A Python Toolbox for Scalable Outlier Detection (Anomaly Detection).
- ADBench: The most comprehensive tabular anomaly detection benchmark (30 anomaly detection algorithms on 55 benchmark datasets).
- TOD: Tensor-based Outlier Detection, the first large-scale GPU-based system for accelerating outlier detection!
- SUOD: An Acceleration System for Large-scale Heterogeneous Outlier Detection.
- anomaly-detection-resources: The most starred collection of anomaly detection resources (books, courses, etc.)!
- Python Graph Outlier Detection (PyGOD): A Python Library for Graph Outlier Detection.
- Therapeutics Data Commons (TDC): Machine learning for drug discovery.
- PyTorch Geometric (PyG): Graph Neural Network Library for PyTorch. Contributed to profiler & benchmarking, and heterogeneous data transformation.
- combo: A Python Toolbox for ML Model Combination (Ensemble Learning).
- TODS: Time-series Outlier Detection. Contributed to core detection models.
- MetaOD: Automatic Unsupervised Outlier Model Selection (AutoML).
📫 Contact me by email (zhaoy [AT] cmu.edu).
💬 News & Travel:
- Sep 2022: Check out our comprehensive survey on diffusion models. Star the code repo!
- Aug 2022: ELECT: Toward Unsupervised Outlier Model Selection was accepted to the IEEE International Conference on Data Mining (ICDM) as a regular paper!
- Jul 2022: 🌟 Reached 1000 citations on Google Scholar!
- Jul 2022: Invited talk on anomaly detection for risk modeling at Wells Fargo.
- Jul 2022: Invited guest lecture on anomaly detection (e.g., PyOD) for the Master's program in Applied Analytics at Columbia University.
- Jun 2022: We just released a 36-page paper on the most comprehensive anomaly detection benchmark. The fully open-sourced ADBench compares 30 anomaly detection algorithms on 55 benchmark datasets. Please star, fork, and follow for the latest update! See paper here!
- Jun 2022: We just released the first node-level graph outlier detection benchmark paper. PyGOD Benchmark compares 10+ graph outlier detection algorithms with many new insights! Please star, fork, and follow for the latest update! See paper here!