
Speaker: Brian Tom
Venue: Room 210, Haina Building 2, Zijingang Campus
Abstract: In precision medicine, there is growing interest in discovering disease endotypes from high-dimensional biomarkers that are associated with clinically relevant outcomes or phenotypes. Motivated by an application in knee osteoarthritis (OA) using synovial fluid protein markers, we aimed to identify OA endotypes linked to disease severity (based on Kellgren–Lawrence grade). Bayesian profile regression (Molitor et al. 2010) was adopted to perform this outcome-guided clustering. However, although the clusters found were stable across training and validation sets, the prediction of disease severity from these clusters was poor in the validation set, which limits the practical use of this approach. The poor out-of-sample performance can partly be attributed to the weak influence of the single binary outcome relative to the high-dimensional set of biomarkers in determining the clustering structure. To address this imbalance, and to handle more general multi-task learning problems, we propose a generalized Bayesian framework for outcome-guided clustering that reformulates the problem as a decision problem with separate loss components for the clustering and prediction tasks. This framework offers greater flexibility and admits standard Bayesian profile regression as a special case. To improve the interpretability of the weighting of the different loss components, we propose a principled standardization that places each task risk on a common scale, thereby decoupling the task weights from the learning rate. The learning rate is tuned using an estimate of the expected out-of-sample risk under a user-chosen evaluation loss.