Clustering & Classification (7 of 11): Mixed-Type Data Quantification
The Advanced R Series: Clustering & Classification
Data rarely comes neatly labeled or structured, yet patterns still exist—even when they are not immediately obvious. Clustering and classification methods allow researchers to uncover structure in their data, group similar observations, and reduce dimensionality without imposing rigid assumptions about the underlying relationships.
This series introduces researchers to statistical and machine-learning methods for grouping, modeling, and interpreting high-dimensional data. Participants will learn a broad range of approaches—from hierarchical and centroid-based models to probabilistic, fuzzy, density-based, graph-based, and mixed-type clustering techniques—along with strategies for dimensionality reduction and fairness considerations. Emphasis is placed on understanding model assumptions, evaluating model performance, and selecting methods that align with the characteristics of the data rather than forcing data to fit inappropriate models.
All workshops use R and RStudio. Experience with R or another programming language is encouraged but not required. Attendees without prior experience are encouraged to review R Fundamentals for Data Analysis, which introduces R and RStudio, before attending.
Mixed-Type Data Quantification (workshop 7 of 11): This session focuses on preparing mixed-type datasets (those combining categorical, ordinal, and continuous variables) for clustering. We cover distance measures, data transformations, and quantification techniques that preserve variable meaning. By the end of the session, participants will be able to preprocess heterogeneous data in R and select appropriate similarity metrics.
Application: Survey & Demographic Data
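For a flavour of the kind of task involved, here is a minimal sketch of one common approach to mixed-type data: computing Gower dissimilarities with `daisy()` from the `cluster` package and passing them to hierarchical clustering. This is an illustration only and not necessarily the method taught in the session. The `survey` data frame, its column names, and all of its values are made up for the example.

```r
library(cluster)  # provides daisy(); a recommended package bundled with most R installs

# Hypothetical survey responses (all values invented for illustration)
survey <- data.frame(
  age       = c(23, 35, 47, 52, 29, 61),            # continuous
  income    = c(32, 58, 75, 81, 40, 66),            # continuous, hypothetical units
  education = factor(c("HS", "BA", "MA", "BA", "HS", "MA"),
                     levels = c("HS", "BA", "MA"),
                     ordered = TRUE),                # ordinal
  region    = factor(c("north", "south", "south",
                       "north", "north", "south"))   # nominal
)

# Gower dissimilarity treats each variable according to its type
# and scales every pairwise value to the range [0, 1]
gower_dist <- daisy(survey, metric = "gower")

# Any method that accepts a dissimilarity object can follow;
# average-linkage hierarchical clustering is used here as one option
hc     <- hclust(gower_dist, method = "average")
groups <- cutree(hc, k = 2)  # k = 2 is an arbitrary choice for the sketch
```

Declaring `education` as an ordered factor matters here: `daisy()` then treats it as ordinal rather than nominal, which keeps the ordering of the categories in the dissimilarity.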
Questions? Please reach out to the Centre for Scholarly Communication at csc.ok@ubc.ca.
A full schedule of workshops can be found at csc.ok.ubc.ca/workshops/