Clustering & Classification (4 of 11): Distribution-Based Models
The Advanced R Series: Clustering & Classification
Data rarely comes neatly labeled or structured, yet patterns still exist—even when they are not immediately obvious. Clustering and classification methods allow researchers to uncover structure in their data, group similar observations, and reduce dimensionality without imposing rigid assumptions about the underlying relationships.
This series introduces researchers to statistical and machine-learning methods for grouping, modeling, and interpreting high-dimensional data. Participants will learn a broad range of approaches—from hierarchical and centroid-based models to probabilistic, fuzzy, density-based, graph-based, and mixed-type clustering techniques—along with strategies for dimensionality reduction and fairness considerations. Emphasis is placed on understanding model assumptions, evaluating model performance, and selecting methods that align with the characteristics of the data rather than forcing data to fit inappropriate models.
All workshops will use R and RStudio, so some experience with R or another programming language is encouraged but not required. For an introduction to R and RStudio, see R Fundamentals for Data Analysis; attendees without prior experience are encouraged to review that content beforehand.
Distribution-Based Models (workshop 4 of 11): This workshop introduces distribution-based clustering techniques, with emphasis on Gaussian mixture models. We discuss likelihood-based estimation, model selection, and cluster uncertainty. Participants will learn to fit mixture models, interpret component distributions, and assess probabilistic cluster assignments.
Application: Biomedical Data
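For a sense of what fitting a Gaussian mixture model involves, here is a minimal sketch in base R. It is not taken from the workshop materials: the simulated data, the function name `fit_gmm`, and the use of BIC to choose the number of components are assumptions made for illustration, and the workshop itself may rely on dedicated packages rather than hand-written code.

```r
# Minimal sketch: EM for a one-dimensional Gaussian mixture in base R.
# fit_gmm and the simulated data are hypothetical, for illustration only.
set.seed(42)
x <- c(rnorm(150, mean = 0, sd = 1), rnorm(100, mean = 5, sd = 1.5))

fit_gmm <- function(x, k, max_iter = 500, tol = 1e-8) {
  n <- length(x)
  # Initialise: means at spread-out quantiles, common sd, equal weights
  mu <- as.numeric(quantile(x, probs = (seq_len(k) - 0.5) / k))
  sigma <- rep(sd(x), k)
  w <- rep(1 / k, k)
  loglik <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: weighted component densities, one column per component
    dens <- sapply(seq_len(k), function(j) w[j] * dnorm(x, mu[j], sigma[j]))
    dens <- matrix(dens, nrow = n)  # keep matrix shape when k == 1
    total <- rowSums(dens)
    resp <- dens / total            # posterior membership probabilities
    new_loglik <- sum(log(total))
    converged <- (new_loglik - loglik) < tol
    loglik <- new_loglik
    if (converged) break
    # M-step: update weights, means, and standard deviations
    nk <- colSums(resp)
    w <- nk / n
    mu <- colSums(resp * x) / nk
    sigma <- sqrt(colSums(resp * outer(x, mu, "-")^2) / nk)
  }
  n_par <- 3 * k - 1  # k means + k sds + (k - 1) free weights
  list(weights = w, means = mu, sds = sigma, resp = resp,
       loglik = loglik, bic = -2 * loglik + n_par * log(n))
}

# Model selection: fit 1 to 3 components, keep the lowest BIC
fits <- lapply(1:3, function(k) fit_gmm(x, k))
bic <- sapply(fits, function(f) f$bic)
best <- fits[[which.min(bic)]]

# Probabilistic assignments and a simple per-observation uncertainty
hard <- apply(best$resp, 1, which.max)
uncertainty <- 1 - apply(best$resp, 1, max)
```

Each row of `best$resp` gives one observation's probability of belonging to each component, so `uncertainty` is highest for points that fall between components. Note that BIC is written here so that lower is better; some packages report it with the opposite sign.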
Questions? Please reach out to the Centre for Scholarly Communication at csc.ok@ubc.ca.
A full schedule of workshops can be found at csc.ok.ubc.ca/workshops/