Identification of Biomarkers to Improve Patient Selection in a Clinical Trial (9/20/2014)


Sat, 09/20/2014

Customer requests a proposal for a project whose purpose is to identify mutation signatures that differentiate patients who responded to a treatment (undergoing a Phase II clinical trial) from those who did not respond well, in order to define improved criteria for selecting the patients most suitable for the treatment. The available data come from targeted resequencing of a few hundred genes, followed by variant calling (SNVs and indels), for a few dozen patients participating in the trial.

Sat, 09/20/2014 at 8:26 PM

AccuraScience LB: (Abbreviated version of the proposal) We would propose that three types of analysis be attempted.

(1) Unsupervised clustering analysis of the mutation profiles of all samples (blinded), with the expectation that the profiles of the patients responding to the treatment will cluster together, and those of the patients not responding well will form a separate group. It will also be interesting to see whether there is second-level subgrouping within the patients’ mutation profiles.

Multiple clustering methods will be attempted, including hierarchical clustering, K-means clustering, and self-organizing maps (SOMs).
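
As a rough illustration of the hierarchical clustering option, the sketch below groups binary mutation profiles (1 = variant called, 0 = not called) by average linkage on Jaccard distance. The sample IDs, the toy profiles, the choice of Jaccard distance and the function names are all assumptions made for this example; the real analysis would more likely rely on an established statistics library than on hand-written code.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Jaccard distance between two binary mutation profiles (1 = variant present)."""
    either = sum(1 for x, y in zip(a, b) if x or y)
    both = sum(1 for x, y in zip(a, b) if x and y)
    return 0.0 if either == 0 else 1.0 - both / either

def average_linkage_clusters(profiles, n_clusters=2):
    """Agglomerative clustering with average linkage; returns lists of sample IDs."""
    ids = list(profiles)
    # Pre-compute all pairwise distances between samples.
    dist = {frozenset(pair): jaccard_distance(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(ids, 2)}
    clusters = [[s] for s in ids]
    while len(clusters) > n_clusters:
        # Find the two clusters with the smallest mean pairwise distance.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
            d = sum(dist[frozenset(p)] for p in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical toy data: one row per patient, one column per variant (SNV or indel).
profiles = {
    "P1": [1, 1, 0, 0, 1, 0],
    "P2": [1, 1, 0, 0, 0, 0],
    "P3": [1, 0, 0, 0, 1, 0],
    "P4": [0, 0, 1, 1, 0, 1],
    "P5": [0, 0, 1, 1, 0, 0],
    "P6": [0, 0, 1, 0, 0, 1],
}
clusters = average_linkage_clusters(profiles, n_clusters=2)
print(clusters)
```

Because the clustering is blinded, the response labels would only be compared with the resulting groups afterwards. Cutting the tree at more than two clusters is one way to look for the second-level subgrouping mentioned above.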

(2) Supervised classification analysis. Using the class labels (which patients responded well and which did not), we will attempt to develop classification models that distinguish the patients who responded well from those who did not.

Multiple classification algorithms will be attempted, including logistic regression and support vector machines (SVMs). The classification performance of each method will be evaluated and compared.
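
To make the evaluation step more concrete, here is a minimal sketch of one of the named classifiers, logistic regression, scored by leave-one-out cross-validation. The use of leave-one-out, the L2 penalty, the learning rate, the toy data and all function names are assumptions for illustration only; with only a few dozen patients some form of cross-validation seems a natural fit, but the proposal does not specify one. An SVM could be evaluated through the same loop.

```python
import math

def train_logistic(X, y, lr=0.5, epochs=500, l2=0.01):
    """Fit L2-regularised logistic regression by batch gradient descent."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(model, x):
    """Return 1 (responder) if the linear score is positive, else 0."""
    w, b = model
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0

def leave_one_out_accuracy(X, y):
    """Train on all patients but one, test on the held-out patient, and repeat."""
    correct = 0
    for i in range(len(X)):
        model = train_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)

# Hypothetical toy data: rows are patients, columns are variants; 1 = responder.
X = [
    [1, 1, 0, 0, 1], [1, 0, 0, 1, 1], [1, 1, 0, 0, 0], [1, 1, 1, 0, 1],
    [0, 0, 1, 1, 0], [0, 1, 1, 0, 0], [0, 0, 1, 1, 1], [0, 0, 0, 1, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]
accuracy = leave_one_out_accuracy(X, y)
print(f"Leave-one-out accuracy: {accuracy:.2f}")
```

Running every candidate algorithm through the same held-out procedure keeps the comparison between methods like for like.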

(3) Feature selection analysis combined with supervised classification. The term “features” refers to the mutations (SNVs and indels). Hundreds or even thousands of mutations are expected to appear in the mutation profiles, given the hundreds of genes and tens of samples. The purpose of feature selection is to eliminate the uninformative features that do not contribute to classification performance, and to retain the features that help most in distinguishing between the two groups of patients. This analysis is performed in conjunction with the supervised classification, and often improves the performance of the resulting classification model.

Multiple feature selection methods can be attempted, including recursive feature elimination (RFE), a conditional mutual information (CMI) based method, and a t-test based method.
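
The t-test based option is the simplest of the three to sketch: score each variant by the absolute value of a two-sample t statistic between responders and non-responders, then keep the top-ranked variants. The example below uses Welch's form of the statistic; that choice, the variant names, the toy data and the function names are all assumptions made for illustration.

```python
import math

def welch_t(a, b):
    """Welch's t statistic for two samples of presence/absence values."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se = math.sqrt(va / na + vb / nb)
    if se == 0.0:
        # No variation within either group: score 0 if the means agree,
        # otherwise the variant separates the groups perfectly.
        return 0.0 if ma == mb else math.copysign(math.inf, ma - mb)
    return (ma - mb) / se

def rank_features(X, y, names):
    """Return (|t|, name) pairs sorted from most to least discriminating."""
    scores = []
    for j, name in enumerate(names):
        responders = [row[j] for row, label in zip(X, y) if label == 1]
        others = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(welch_t(responders, others)), name))
    return sorted(scores, key=lambda s: s[0], reverse=True)

# Hypothetical toy data: four variants across eight patients; 1 = responder.
names = ["v1", "v2", "v3", "v4"]
X = [
    [1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0],
    [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]
ranking = rank_features(X, y, names)
selected = [name for _, name in ranking[:2]]  # keep the two top-ranked variants
print(selected)
```

When feature selection is combined with cross-validated classification, the ranking should be recomputed inside each training fold. Selecting features on the full data set first lets information from the held-out patients leak into the model and inflates the performance estimate.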


Note: LB stands for Lead Bioinformatician. An AccuraScience LB is a senior bioinformatics expert and leader of an AccuraScience data analysis team.

Disclaimer: This text was selected and edited based on genuine communications that took place between a customer and the AccuraScience data analysis team on the specified dates and times. The editing was done to protect the customer’s privacy and for brevity. The edited text may or may not have been reviewed and approved by the customer. AccuraScience is solely responsible for the accuracy of the information reflected in this text.