Biomarkers: Statistical session

# Biomarkers: Statistical session
## Almirall
### <a href="#alejandro-caceres">Alejandro Cáceres</a>
### 2021/15/03

---

## Who am I?

---

## Experience

- Senior statistician in the Bioinformatics group at Barcelona Institute of Global Health <https://www.isglobal.org/>
 
- Adjunct lecturer in statistics at the Universitat Politècnica de Catalunya <https://eebe.upc.edu/es>
 
- Over 13 years of experience analyzing biomedical data

- Develop novel analysis methods for biomarker discovery

- High dimensional data including imaging and omic data: genomic, transcriptomic, exposomic, etc.

- I write scientific articles and implement methods in software packages (R/Matlab).

---

## You can find me at
  
- [linkedin](https://es.linkedin.com/in/alejandro-caceres-dominguez-7449aa176)

- [google scholar](https://scholar.google.es/citations?user=s1D-6WAAAAAJ&hl=es)

- [gitHub](https://github.com/alejandro-isglobal)

- [my blog](https://alejandro-isglobal.github.io/)

---

## Some examples of my work

Analytical validation of a biomarker: Functional magnetic resonance imaging

---

## Some examples of my work

Discovery of a risk biomarker: Chromosome Y function and risk of disease in men

---

## Aim of the talk

- What is the utility of surveying thousands or millions of biomarkers in drug development?
  
- How is biomarker utility assessed in terms of statistical evidence?

---

## Content

- Introduction:
  - Biomarker definition
  - Types of biomarkers (context of use)
  - Guidance for qualification and reporting (FDA)
- Statistical evidence of biomarker utility in assessing treatment efficacy:
  - Biomarker's sensitivity and specificity
  - Regression analyses of biomarker levels on efficacy (stratified and with interactions by treatment)
  - Multiple/Composite Biomarkers
- Examples:
  - Assessment of HIV antiviral resistance with a composite genomic biomarker (ROC curve, pubmed: https://pubmed.ncbi.nlm.nih.gov/12060770/)
  - Prediction of Brodalumab treatment on psoriasis area severity index using gene expression data. (https://pubmed.ncbi.nlm.nih.gov/31883845/)

---

## Introduction: Biomarker definition


**Definition**:
 - A biomarker is a biological measurement that has the potential to inform **decision-making** in relation to a clinical treatment or intervention.

**Properties**:
 - Source material or matrix 
 - Method of measurement
 - Purpose

---

## Biomarker's source

- Source material or matrix:
  - specific analyte (e.g., cholesterol)
  - anatomic feature (e.g., joint angle)
  - physiological characteristic (e.g., blood pressure)

- When they are composite:
  - how the components are interrelated (e.g. genomic data)
  - how the biomarker is obtained (e.g. algorithm, score)

---

## Biomarker's Method of measurement

- molecular

- histologic

- radiographic

- physiologic characteristic

- composite (algorithmic)

---

## Biomarkers are continuously created

- New methods of measurement

- New analyses are constantly developed

*The aim is to address outstanding needs!*

---

## How do biomarkers inform decision-making during drug development?

---

## Context of use (COU)

COU is a description of the biomarker's specific use in drug development:
  
**1. Category**

- Disease-related: (Who should be studied?)
  - Diagnostic (selection)
  - Prognostic (stratification)
  - Susceptibility/risk (enrich)

- Treatment-related: 
  - Predictive (Who should be treated?)
  - Safety (Should we stop treatment?)
  - Monitoring (Should we continue with treatment?)
  - Pharmacodynamic/response biomarker (What is the outcome of treatment? surrogate endpoint)

---

## Types of biomarkers

| Role | Description |
| ---------- | ------------- |
| Diagnosis of a disease | To make a diagnosis more reliably, more rapidly, or more inexpensively than available methods |
| Severity assessment | To identify a subgroup of patients with a severe form of a disease associated with an increased probability of death or severe outcome |
| Risk assessment | To identify a subgroup of patients who may experience better (or worse) outcome when exposed to an intervention |
| Prediction of drug effects | To identify the pharmacological response of a patient exposed to a drug (efficacy, toxicity, and pharmacokinetics) |
| Monitoring | To assess the response to a therapeutic intervention |

---

## Types of biomarkers

**2. Use**

- Purpose of use in drug development (safety biomarker to evaluate drug-induced injury)

- Stage of use (phase 1 clinical trial)

- Population (healthy adults, psoriasis patients)

- Therapeutic mechanisms of action for which the biomarkers offer information.

---

## Guidance for qualification (FDA)

- **Qualification** of a biomarker is a determination that the biomarker can be relied on to have a specific interpretation and application in drug development and regulatory review.

- Biomarker Qualification: Evidentiary Framework [doc](https://alejandro-isglobal.github.io/teaching/docs/fdabiomarker.pdf)

- When using a biomarker:
  - What has the biomarker been qualified for?

- When developing a biomarker:
  - What are the evidentiary requirements for demonstrating the utility of the biomarker?

---

## Guidance for qualification (FDA)

FDA Guidance for biomarker qualification (2018) under section 507 of the Federal Food, Drug, and Cosmetic Act (FD&C Act):

**A**. Define the **need**

- e.g. disease needs and added value of the biomarker to drug development

**B**. Define the context of use **COU**

- e.g. Prediction: identify patients that respond to treatment

**C**. Assess **benefits** and **risks**

- e.g. Benefit: do not treat a patient who won't benefit (specificity); treat a patient who will benefit (sensitivity).
- e.g. Risk: fail to treat a patient who could benefit; treat a patient who won't benefit.

**D**. Determine evidence that is **statistically** sufficient to support COU.

---

## Statistical evidence

**D**. Determining evidence that is **statistically** sufficient to support COU

- Analytical considerations: Is the test reliable?
  
    - validation of the biomarker test’s technical performance
    
    - cost-effectiveness, feasibility
    
    - assessment of measurement error

---

## Statistical evidence

Examples:

- Gene expression biomarkers:
  - RNAseq experiments have been experimentally validated, ready to include in a clinical trial

- Composite biomarkers derived from gene expression data through algorithms need to be validated

---

## Statistical evidence

Establish the relationship between a biomarker and an outcome of interest from:

- Randomized controlled trial
- Single-arm/historical control trial
- Cohort study
- Case-control study (including nested)
- Cross-sectional study
- Case series or case reports
- Registry information
- Meta-analysis

The strongest evidence comes from prospective studies that are specifically designed for this purpose, but data from studies conducted for other purposes can be used to support biomarker utility.

---

## Statistical evidence

- The aim is to provide statistical evidence for the **correlation** between the biomarker and the outcome according to the COU:
    - e.g. Predictive biomarker: correlation between the biomarker and efficacy of treatment.

---

## Types of statistical analysis

---

## Sensitivity and specificity

- Suppose we have a collection of **treated** individuals with measurements of  
  - efficacy as a **response** variable
  - a test to detect the presence of a **biomarker**

- The **Response** measurement is dichotomous and has the events:
  - yes (the patient responded to treatment)
  - no (the patient did not respond to treatment)

- The **Biomarker** measurement is dichotomous (dichotomized by a cut-off) and has the events:
  - positive (the biomarker was detected)
  - negative (the biomarker was not detected)

---

## Sensitivity and specificity

|  Subject  |  Response  |  Biomarker  |
| ------------- | ------------- | ---------- |
| `$s_1$`         |   yes        | positive |
| `$s_2$`         |   no         | negative |
| `$s_3$`         |   yes        | positive |
|...            |   ...        | ...      |
| `$s_i$`         |   no         | positive* |
|...            |   ...        | ...      |
|...            |   ...        | ...      |
| `$s_n$`         |   yes        | negative* |
|...            |   ...        | ...      |
  
- Each individual has two measurements: (Response, Biomarker)

---

## Sensitivity and specificity 
  
Let's think first in terms of the response

Within those who responded to treatment (yes), how many 
were detected with the biomarker (positive)?

Sensitivity (true positive rate)
`$$fr(positive|yes)=\frac{n_{positive|yes}}{n_{positive|yes}+n_{negative|yes}}$$`

---

## Sensitivity and specificity

Let's think first in terms of the response

Among those who did not respond to treatment (no), how many were not detected with the biomarker (negative)?

Specificity (true negative rate)

`$$fr(negative|no)=\frac{n_{negative|no}}{n_{positive|no}+n_{negative|no}}$$`

---

## Sensitivity and specificity 
 
| | Response: Yes | Response: No |
| --------- | --------- | -------- |
| Biomarker: positive | fr(positive&#124;yes) | fr(positive&#124;no) | 
| Biomarker: negative | fr(negative&#124;yes) | fr(negative&#124;no) | 
| sum | 1 | 1 |

| | Response: Yes | Response: No |
| --------- | --------- | -------- |
| Biomarker: positive |True positive rate (sensitivity) | False positive rate| 
| Biomarker: negative | False negative rate| True negative rate (specificity)| 
| sum | 1 | 1 |

- The trade-off between sensitivity and specificity needs to be evaluated in the Context of Use and usefulness of the biomarker's test
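
As a minimal sketch, sensitivity and specificity can be read from the column-wise proportions of a 2x2 table of counts. The counts below are hypothetical and only serve as an illustration.

```r
# Hypothetical counts (illustration only): rows = biomarker, columns = response
counts <- matrix(c(40, 10,    # response yes: positive, negative
                   15, 35),   # response no:  positive, negative
                 nrow = 2,
                 dimnames = list(Biomarker = c("positive", "negative"),
                                 Response  = c("yes", "no")))

# Column-wise proportions fr(biomarker | response); each column sums to 1
fr <- prop.table(counts, margin = 2)

sensitivity <- fr["positive", "yes"]  # true positive rate
specificity <- fr["negative", "no"]   # true negative rate
c(sensitivity = sensitivity, specificity = specificity)
```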

---

## Sensitivity and specificity

Let's think in terms of the biomarker test

Within those whose biomarker was detected (positive), how many responded to treatment (yes)?

Positive predictive value:

`$$fr(yes|positive)=\frac{n_{yes|positive}}{n_{yes|positive}+n_{no|positive}}$$`

---

## Sensitivity and specificity 
  
Let's think in terms of the biomarker test

Within those whose biomarker was not detected (negative), how many did not respond to treatment (no)?

Negative predictive value

`$$fr(no|negative)=\frac{n_{no|negative}}{n_{yes|negative}+n_{no|negative}}$$`

---

## Sensitivity and specificity

| | Response: Yes | Response: No | sum |
| --------- | --------- | -------- | ------ |
| Biomarker: positive | PPV: fr(yes&#124;positive) | fr(no&#124;positive) | 1 |
| Biomarker: negative | fr(yes&#124;negative) | NPV: fr(no&#124;negative) | 1 |

- PPV: positive predictive value
- NPV: negative predictive value

- These are really the values that we want to know. They depend on the probability of response to treatment.
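
To illustrate this dependence, here is a small sketch that computes PPV and NPV for several response probabilities using Bayes' rule. The sensitivity, specificity and response probabilities are assumed values, not results from any study.

```r
# Hypothetical test characteristics (illustration only)
sensitivity <- 0.8
specificity <- 0.7

# PPV and NPV as functions of the response probability p = fr(yes)
ppv <- function(p) sensitivity * p /
  (sensitivity * p + (1 - specificity) * (1 - p))
npv <- function(p) specificity * (1 - p) /
  (specificity * (1 - p) + (1 - sensitivity) * p)

p <- c(0.1, 0.5, 0.9)  # assumed response probabilities
round(cbind(p, PPV = ppv(p), NPV = npv(p)), 3)
```

With a fixed test, the PPV grows and the NPV shrinks as response to treatment becomes more common.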

---

## Sensitivity and specificity

There is a way to convert from **sensitivity** to **positive predictive value** (Bayes' rule)

`$$fr(yes|positive)=\frac{fr(positive|yes)}{fr(positive)}fr(yes)$$`
  
which can be rewritten

`$$\frac{fr(yes|positive)}{1-fr(yes|positive)}=\frac{fr(positive|yes)}{fr(positive|no)} \frac{fr(yes)}{1-fr(yes)}$$`

Odds pre-test `$\rightarrow$` odds post-test

---

## Sensitivity and specificity

`$$ODDS_{posttest}=LHR{+} \times ODDS_{pretest}$$`


 
`$$LHR+=\frac{sensitivity}{1-specificity}$$`

| | LHR+ |
| --- | --- |
| Excellent Efficacy Value | >10 |
| Good Efficacy Value | 5-10 |
| Poor Efficacy Value | 1-5 |
| No Efficacy Value | 1 |
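
A short worked example of the odds conversion, as a sketch under assumed values: the sensitivity, specificity and pre-test probability below are hypothetical.

```r
# Hypothetical values (illustration only)
sensitivity <- 0.90
specificity <- 0.95
p_pre       <- 0.20                           # assumed pre-test probability of response

lhr_pos   <- sensitivity / (1 - specificity)  # positive likelihood ratio
odds_pre  <- p_pre / (1 - p_pre)              # pre-test odds
odds_post <- lhr_pos * odds_pre               # post-test odds after a positive test
p_post    <- odds_post / (1 + odds_post)      # back to a probability (the PPV)

c(LHR = lhr_pos, odds_pre = odds_pre, odds_post = odds_post, p_post = p_post)
```

The post-test probability agrees with the direct Bayes' rule calculation `sensitivity * p_pre / (sensitivity * p_pre + (1 - specificity) * (1 - p_pre))`.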
 
 
 
---

- When the levels of the biomarker are continuous then a regression analysis can be used to determine the association with the outcome (response to treatment).
    
- The type of correlation depends on the COU.
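
A minimal sketch of such a regression on simulated data, assuming a dichotomous response and a single continuous biomarker. The data, effect size and variable names are assumptions for illustration only.

```r
set.seed(1)
n <- 200
biomarker <- rnorm(n)                    # simulated continuous biomarker levels
p_resp    <- plogis(-0.5 + 1 * biomarker) # assumed true response probability
response  <- rbinom(n, 1, p_resp)         # 1 = responded, 0 = did not respond

# Logistic regression of response on the biomarker level
fit <- glm(response ~ biomarker, family = binomial)
summary(fit)$coefficients                 # slope = log-odds ratio per unit of biomarker
```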
    
    
          
---

- Let's consider the following [study](https://www.nature.com/articles/s41398-019-0521-7):

- **Biomarkers for response in major depression: comparing paroxetine and venlafaxine from two randomized placebo-controlled clinical studies.** Carboni et al. *Translational psychiatry*. 2019
    
    
---

class: middle
      
## Regression analyses
      
- Features of the study 
    
  - Two placebo-controlled studies evaluating the efficacy and tolerability of novel drug candidates.
    
  - Two drug treatments: paroxetine or venlafaxine as active comparators
    
  - A panel of peripheral biomarkers (including IL-6, IL-10, TNF-α, TNFRII, BDNF, CRP, MMP9 and PAI1) in depressed patients receiving paroxetine, venlafaxine, or placebo

- Aim: assess the correlation between **biomarker levels** and the response outcome, a 17-item scale of depression symptoms; responders show a reduction of more than 50% from baseline (a reduction from 2 to 1 counts the same as a reduction from 10 to 5)
    
---

class: middle
 
## Regression analyses
 
- Demographics:
 
<img src="img/tab1.JPG" style="width:100%" align="center">
 
---

class: middle
      
## Regression analyses
      
**Analysis 1.**
    
Associations between biomarker levels and depression severity at **base-line**:

*Which are state biomarkers?* (COU: diagnosis) 
      
`$$D_{base} \rightarrow B_{base}$$`
      
- Paroxetine study: IL-6 (r=0.23, p=0.018) and IL-10 (r=0.19, p=0.045); when stratifying by sex, no significant associations were found for females. 
    
- Venlafaxine study: no significant correlations were found. The biomarkers do not show diagnostic capacity.
    
    
    
---

class: middle
      
## Regression analyses
     
**Analysis 2.**
    
Associations between biomarkers' changes and changes in depression symptoms:

*Which are biomarkers of treatment efficacy?* (COU: surrogate endpoints)

`$$\Delta D=D_{w10}-  D_{base} \rightarrow \Delta B= B_{w10} - B_{base}$$`

- Adjusting for sex and `$B_{base}$` and `$D_{base}$` in **full** population

---

- Reductions in TNF-α, IL-6, IL-10 and CRP were significantly associated with `$\Delta D$` in the paroxetine study; none were in the venlafaxine study.

---

**Analysis 3.**

Associations between changes in symptoms and biomarkers' levels at baseline:

*Which biomarkers predict improvement in symptoms when treated?* (COU: prognostic under treatment biomarker)
      
- Similar to the sensitivity and specificity analysis, we can condition (stratify) on the treated only
    
`$$B_{base|treated} \rightarrow \Delta D$$`
      
- Adjusting by `$D_{base|treated}$` and sex
    
---

**Analysis 3.**

- For those treated with paroxetine: baseline levels of IL-10 and TNF-α were significantly associated with changes in depression symptoms at week 10. 
    
- IL-10 and TNF-α showed predictive capacity under paroxetine treatment
    
    
---

class: middle
      
## Regression analyses
      
**Analysis 3.**
    
Associations between changes in symptoms and biomarkers' levels at baseline:

*Which biomarkers predict response when not treated?*
(COU: Prognostic biomarkers)

`$$B_{base|placebo} \rightarrow \Delta D$$`

- Adjusting by `$D_{base|placebo}$` and sex

---

**Analysis 3.**

- CRP was associated with improvement of symptoms in the venlafaxine study.

- This indicates that improvement of symptoms, when treated with paroxetine, may not be due to placebo effects.

---

**Analysis 3.**

Associations between changes in symptoms and the interaction between baseline biomarkers' levels and treatment (COU: Predictive biomarkers for treatment) 
    
`$$B_{base}\times T \rightarrow \Delta D$$`
      
- Adjusting by `$D_{base}$` and sex
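
A minimal sketch of this interaction model on simulated data. The variable names (`dD`, `B_base`, `arm`, `D_base`, `sex`) and the effect sizes are assumptions for illustration and are not taken from the depression study.

```r
set.seed(2)
n <- 120
dat <- data.frame(
  B_base = rnorm(n),                                  # baseline biomarker level
  arm    = factor(rep(c("placebo", "treated"), each = n / 2)),
  D_base = rnorm(n, mean = 20, sd = 4),               # baseline symptom severity
  sex    = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Assumed truth: the biomarker modifies the change in symptoms only under treatment
treated <- dat$arm == "treated"
dat$dD  <- -2 * treated - 1.5 * dat$B_base * treated + rnorm(n)

# B_base * arm expands to both main effects plus their interaction
fit <- lm(dD ~ B_base * arm + D_base + sex, data = dat)
coef(summary(fit))["B_base:armtreated", ]             # interaction term
```

The row `B_base:armtreated` estimates how much the slope of the biomarker differs between treated and placebo individuals, which is the quantity of interest for a predictive biomarker.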

---

**Analysis 3.**

- For those treated with paroxetine: treatment interactions with IL-10 and TNF-α showed a trend toward significance (P=0.054, P=0.085).
    
- While testing for interactions requires more power, this suggests that individuals with low values of IL-10 will respond better to treatment than those with high values.
    
- Where to set the threshold?

---

- One of the main problems when analyzing multiple biomarkers independently is **multiplicity**
      
  - Take a biomarker with no correlation with efficacy and test the correlation in 100 clinical trials: about 5% of the trials will report a significant result. 
    
  - Take 100 biomarkers with no correlation with efficacy in one clinical trial: about 5% of the biomarkers will be declared significant. 
    
- Any such clinical trial is almost sure to have at least one significant biomarker. 
    
- What you want is that only 5% of null trials report a significant biomarker. 
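
The size of the problem can be checked with a short calculation, sketched here under the assumption that the 100 tests are independent (real biomarkers are usually correlated, so this is only an approximation).

```r
alpha <- 0.05
m     <- 100                      # number of null biomarkers tested in one trial

# Probability of at least one false positive among m independent null tests
fwer <- 1 - (1 - alpha)^m
fwer                              # about 0.994

# Simulation check: P-values of null biomarkers are uniform on (0, 1)
set.seed(3)
any_sig <- replicate(2000, any(runif(m) < alpha))
mean(any_sig)
```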
    
    
    
---

class: middle
 
## Multiple/Composite Biomarkers
 
 
- A corrected threshold of significance (correction for multiple comparisons):
 
  - **Bonferroni**: divide the significance threshold by the number of biomarkers. In the depression study, `$P < 0.05/8 = 0.00625$`: none of the results are significant!
 
  - **False discovery rate**: order the 8 P-values from lowest to highest, `$P_i$` for `$i=1...8$`, and select the largest `$i$` such that `$P_i \leq i/8*0.05$`. The P-values ranked from 1 to `$i$` are declared significant. 
 
- Both methods are widely implemented in statistical software. Bonferroni is more conservative than FDR, and FDR is most commonly used in omic studies. 
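
In R both corrections are available through `p.adjust()`. A minimal sketch with 8 hypothetical P-values (these are made up for illustration and are not the values of the depression study):

```r
# Hypothetical P-values for 8 biomarkers (illustration only)
p <- c(0.001, 0.008, 0.012, 0.018, 0.045, 0.20, 0.50, 0.90)

p_bonf <- p.adjust(p, method = "bonferroni")  # controls the family-wise error rate
p_fdr  <- p.adjust(p, method = "BH")          # Benjamini-Hochberg false discovery rate

data.frame(p, p_bonf, p_fdr,
           sig_bonf = p_bonf < 0.05,
           sig_fdr  = p_fdr  < 0.05)
```

With these values Bonferroni keeps only the smallest P-value, while the FDR procedure keeps the first four.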
 
---

class: middle
      
## Multiple/Composite Biomarkers
      
      
- Another alternative is to **construct** a composite biomarker. 
  - Computational processes and/or algorithms using machine learning and AI to discover a subset of individuals where the treatment effect is maximum. 
    
  - Use the biomarkers to measure a new biological quantity that could be in the disease pathway. 
    
    
---

## Genetic mosaicisms 
 
<img src="img/mosaicism_med.jpeg" style="width:50%" align="center">
 
- One of the most common somatic mutations in men is the loss of chromosome Y.
 
- In a sample, we can measure the loss of RNA transcription from chromosome Y genes, caused by cells that no longer produce RNA from chromosome Y because they have lost it.
 
- We convert 1000 biomarkers into 1 with biological sense. 
 
- We have shown that the loss of transcription of chromosome Y is associated with cancer, BMI, and lower immune cell count in the blood.

---

## Examples

---

## Examples

To run the examples in R.

- Install R (https://cran.r-project.org/)

- In the R console, install the following packages (copy-paste the following code)

```r
# install BiocManager first if it is not available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# install the CRAN and Bioconductor packages used in the examples
BiocManager::install(c("RCurl", "clusterProfiler", "cvAUC", "pROC", "drc", "org.Hs.eg.db", "AnnotationDbi", "BiocGenerics", "Biobase", "sva", "limma", "repmis"))
```

- Just copy-paste into the R console all the code that I show!

---

## Examples 
  
Load all the libraries

```r
library(cvAUC)
library(RCurl)
library(clusterProfiler)
library(pROC)
library(drc)
library(sva)
library(limma)
library(repmis)
```

---

## Example 1

---

## Example 1 
  
  
  If we have detected one biomarker with continuous levels that is associated with efficacy, how do we select the threshold for clinical applications?
  
  [Diversity and complexity of HIV-1 drug resistance: a bioinformatics approach to predicting phenotype from genotype](https://pubmed.ncbi.nlm.nih.gov/12060770)

- Response:  Antiretroviral drug resistance

- Biomarker: Score that stratifies patients with genomic data using a machine learning method

---

## Example 1: ROC curve

```r
hiv <- read.delim("https://github.com/alejandro-isglobal/Biomarkers/raw/master/data/hiv.txt")
head(hiv)
```

```
##   response      test
## 1        1 -0.438185
## 2        1 -0.766791
## 3        1  0.695282
## 4        1 -0.689079
## 5        1  0.325977
## 6        1  0.704040
```

---

## Example 1: ROC curve

```r
table(hiv$response)
```

```
## 
##  -1   1 
## 267  78
```

- `-1`: no resistance to drug treatment
 
- `1`: resistance to drug treatment

---

## Example 1: ROC curve

```r
hist(hiv$test)
```

- Biomarker test: ranges from `$-2$` to `$2$`

---

## Example 1: ROC curve

cut-off at `$-1$`
 
- Biomarker: negative: `$test < -1$` 
- Biomarker: positive: `$test > -1$`

```r
Biomarker <- hiv$test > -1 
Resistance <- factor(hiv$response, labels = c("No", "Yes"))
table(Resistance, Biomarker)
```

```
##           Biomarker
## Resistance FALSE TRUE
##        No    192   75
##        Yes     9   69
```
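
The same table also gives the predictive values discussed earlier. A minimal sketch, with the counts typed in by hand from the output above so that it runs without the data file:

```r
# Counts copied from the table above (cut-off at -1)
tab <- matrix(c(192, 9, 75, 69), nrow = 2,
              dimnames = list(Resistance = c("No", "Yes"),
                              Biomarker  = c("FALSE", "TRUE")))

sens <- tab["Yes", "TRUE"]  / sum(tab["Yes", ])     # 69 / 78
spec <- tab["No",  "FALSE"] / sum(tab["No", ])      # 192 / 267
ppv  <- tab["Yes", "TRUE"]  / sum(tab[, "TRUE"])    # 69 / 144
npv  <- tab["No",  "FALSE"] / sum(tab[, "FALSE"])   # 192 / 201

round(c(sensitivity = sens, specificity = spec, PPV = ppv, NPV = npv), 3)
```

At this cut-off the test misses few resistant patients, but fewer than half of the positive tests correspond to actual resistance.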

---

## Example 1: ROC curve

```r
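br <- seq(-2, 2, 0.25)  # common bin edges for both groups

# density histogram of the test for patients without resistance
hist(hiv$test[hiv$response==-1], 
     breaks=br, freq=FALSE, xlab="RF", main="", col="white")

# overlay the patients with resistance in blue
hist(hiv$test[hiv$response==1], 
     breaks=br, freq=FALSE, add=TRUE, col="blue")

legend("topright",
       legend=c("no resis.", "yes resis."),          
       fill=c("white", "blue"))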

---

## Example 1: ROC curve

cut-off at `$-1$`
 
- Biomarker: negative: `$test < -1$` 
- Biomarker: positive: `$test > -1$` 
 
<img src="Biomarkdown_files/figure-html/unnamed-chunk-8-1.png" width="50%" />

---

## Example 1: ROC curve

Biomarker was *positive* ( `$> -1$` ) when there was  resistance (yes)
  
**Sensitivity**: `$fr_{[cut-off=-1]}(positive|yes)$`

```r
mean(hiv$test[hiv$response==1] > -1 )
```

```
## [1] 0.8846154
```

---

## Example 1: ROC curve

Biomarker was *positive* ( `$> -1$` ) when there was no resistance (no)

**False positive rate** ( `$1 - specificity$` ): `$fr_{[cut-off=-1]}(positive|no)$`

```r
mean(hiv$test[hiv$response==-1] > -1 )
```

```
## [1] 0.2808989
```

---

## Example 1: ROC curve

`$(1 - specificity, sensitivity)_{[cut-off]}= (FPR, TPR)_{[cut-off]}$`

for each cut-off we get one point, e.g. `$(0.280, 0.884)_{[-1]}$`

```r
out <- cvAUC(hiv$test, hiv$response) #compute ROC
plot(out$perf, col="blue", main="ROC") #plot
lines(c(0,1),c(0,1)); points(0.28, 0.88, pch=16) #cutoff=-1
```

---

## Example 1: ROC curve

Area under the curve

`$AUC=Pr(X2 < X1)$`
 
where `$X1$` is the test value of a randomly chosen individual with resistance (yes) and `$X2$` the test value of a randomly chosen individual without resistance (no)

```r
ci.cvAUC(hiv$test, hiv$response)
```

```
## $cvAUC
## [1] 0.9047825
## 
## $se
## [1] 0.02276935
## 
## $ci
## [1] 0.8601554 0.9494096
## 
## $confidence
## [1] 0.95
```
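
The probabilistic reading of the AUC can be checked directly. This is a sketch on simulated scores (not the HIV data): the proportion of case-control pairs in which the case has the higher score equals the Wilcoxon-Mann-Whitney statistic divided by the number of pairs.

```r
set.seed(4)
x1 <- rnorm(50, mean = 1)   # simulated scores of cases (e.g. resistance: yes)
x2 <- rnorm(80, mean = 0)   # simulated scores of controls (resistance: no)

# Proportion of (case, control) pairs in which the case has the higher score
auc_pairs <- mean(outer(x1, x2, ">"))

# Same quantity from the Wilcoxon-Mann-Whitney statistic
auc_wmw <- unname(wilcox.test(x1, x2)$statistic) / (length(x1) * length(x2))

c(pairs = auc_pairs, wmw = auc_wmw)
```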

---

## Example 1: ROC curve

```r
rocobj <- roc(hiv$response, hiv$test)
```

```
## Setting levels: control = -1, case = 1
```

```
## Setting direction: controls < cases
```

```r
coords(rocobj, "best", best.method="youden")
```

```
##   threshold specificity sensitivity
## 1 -0.700003   0.9138577   0.7948718
```

It maximizes `$sensitivity-(1-specificity)$` (Youden's index)
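
What this criterion does can be sketched by hand. The example below uses simulated scores, not the HIV data, and scans the observed values as candidate cut-offs; `pROC` uses midpoints between observed values, so its thresholds may differ slightly.

```r
set.seed(5)
response <- rep(c(-1, 1), times = c(150, 50))                 # simulated labels
test     <- rnorm(200, mean = ifelse(response == 1, 1, -0.5)) # simulated scores

cutoffs <- sort(unique(test))
youden <- sapply(cutoffs, function(k) {
  sens <- mean(test[response == 1]  >  k)   # true positive rate at cut-off k
  spec <- mean(test[response == -1] <= k)   # true negative rate at cut-off k
  sens + spec - 1                           # equals sensitivity - (1 - specificity)
})

best <- cutoffs[which.max(youden)]          # cut-off with the largest index
c(cutoff = best, youden = max(youden))
```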
  
---

## Example 1: ROC curve

```r
out <- cvAUC(hiv$test, hiv$response) #compute ROC
plot(out$perf, col="blue", main="ROC") #plot
lines(c(0,1),c(0,1)) #diagonal: no discrimination
points(0.280, 0.884, pch=16) #cutoff=-1
points(1-0.9138, 0.7948, pch=16, col="red") #optimal (Youden)
```

---

## Example 2

---

## Example2. Prediction of response to treatment
  
  [Short-term transcriptional response to IL-17 receptor-A antagonism in the treatment of psoriasis](https://pubmed.ncbi.nlm.nih.gov/31883845/). JACI. 2020

<img src="img/abs.JPG" style="width:75%" align="center">
 
 
---

## Example2. Prediction of response to treatment

- They used transcription data from a panel of genes associated with psoriasis.

- They show an improvement of the psoriasis transcriptome in responders to brodalumab treatment.

COU: surrogate endpoint. Improvement of the psoriasis transcriptome shows causal action of the drug in a biological pathway.

<img src="img/paper.JPG" style="width:75%" align="center">
 
 
---

## Example2. Prediction of response to treatment

The question of whether the biomarkers can be used to predict the response to brodalumab remains open.

I downloaded the data from [GEO](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117468) to find predictors of the efficacy.

```r
source_data("https://github.com/alejandro-isglobal/Biomarkers/raw/master/data/GSE117468.Rdata")
```

```
## [1] "phenodat"    "expr"        "genesid"     "genesentrez"
```

---

## Example2. Prediction of response to treatment

```r
dim(expr)
```

```
## [1] 53951    96
```

```r
expr[1:5,1:5]
```

```
##           GSM3300910 GSM3300916 GSM3300920 GSM3300928 GSM3300932
## 1007_s_at   9.975898   8.947814  10.544516  10.088859   9.948196
## 1053_at     7.264437   7.323783   6.468845   7.648613   7.304800
## 117_at      6.066067   6.010792   6.373146   6.520405   6.567029
## 121_at      6.037013   6.100783   6.261066   5.624866   6.403674
## 1255_g_at   3.013214   2.937823   3.087177   2.914161   3.672842
```

- These are gene transcription data of nonlesional tissue of psoriasis patients at baseline.
  
---

## Example2. Prediction of response to treatment

```r
dim(phenodat)
```

```
## [1] 96  8
```

```r
head(phenodat)
```

```
##            age    bmi     patient          t   eff      effdif effbase effend
## GSM3300910  53 20.750 10216001001 brodalumab  TRUE  1.00000000    12.4    0.0
## GSM3300916  51 35.235 10216001004    placebo  TRUE  0.44791667    19.2   10.6
## GSM3300920  47 35.471 10216001005    placebo FALSE -0.16417910    13.4   15.6
## GSM3300928  38 33.272 10216003001 brodalumab  TRUE  0.85427136    19.9    2.9
## GSM3300932  47 36.553 10216003002    placebo FALSE -0.67980296    20.3   34.1
## GSM3300936  64 32.189 10216003003    placebo FALSE -0.08116883    30.8   33.3
```

- <code>t</code>: placebo or brodalumab (210 mg or 140 mg).
- <code>effdif</code>: relative improvement in PASI (psoriasis area-and-severity index) between baseline and week 12, (PASI W0 - PASI W12)/PASI W0.
- <code>eff</code>: TRUE if PASI improved (PASI at W12 < PASI at W0), FALSE otherwise.

---

## Example2. Prediction of response to treatment

**Analysis 1.**

I aimed to test which biomarkers are significantly correlated with improvement in PASI (predictors of efficacy)

`$$\Delta PASI \times T \rightarrow B_{base}$$`

```r
table(phenodat$eff, phenodat$t)
```

```
##        
##         placebo brodalumab
##   FALSE       9          1
##   TRUE       16         70
```

- Only one individual under treatment did not improve PASI, but 16 individuals on placebo did. Conditioning on the treated individuals is not suitable.

- Individuals on placebo still carry information.

---

## Example2. Prediction of response to treatment

```r
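#null model (without the interaction) and full model (with the t:eff interaction)
mod0 <- model.matrix( ~ t + eff + age + bmi, data = phenodat)
mod <- model.matrix( ~ t:eff + t + eff + age + bmi, data = phenodat)

#estimate surrogate variables to adjust for unmeasured confounding
ns <- num.sv(expr, mod, method="be")
ss <- sva(expr, mod, mod0, n.sv=ns)$sv
modss <- cbind(mod, ss)

#estimate associations
fit <- lmFit(expr, modss)
fit <- eBayes(fit)

#rank transcripts by the treatment-by-efficacy interaction
tt <- topTable(fit, number=Inf, coef="tbrodalumab:effTRUE")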

---

## Example2. Prediction of response to treatment

```r
source_data("https://github.com/alejandro-isglobal/Biomarkers/raw/master/data/tt.RData")
```

```
## [1] "fit" "tt"
```

---

## Example2. Prediction of response to treatment

```r
head(tt)
```

```
##                 logFC  AveExpr         t      P.Value    adj.P.Val        B
## 204622_x_at -3.110576 6.517228 -8.155612 4.312802e-12 2.326800e-07 14.25881
## 211868_x_at -3.644099 4.872004 -7.726811 2.949329e-11 7.955961e-07 12.79006
## 215565_at   -2.831032 2.754007 -7.551538 6.452170e-11 1.160337e-06 12.18802
## 210090_at   -2.534860 3.629721 -7.344970 1.618297e-10 2.182719e-06 11.47802
## 215036_at   -2.370792 4.318090 -7.178871 3.380056e-10 3.647148e-06 10.90720
## 234884_x_at -3.816265 4.428421 -7.019798 6.824035e-10 6.004875e-06 10.36100
```

---

## Example2. Prediction of response to treatment

**Volcano plot**

```r
volcanoplot(fit, highlight=11, coef="tbrodalumab:effTRUE", 
            names=genesid[rownames(fit$coefficients)],cex=0.1)
```

---

## Example2. Prediction of response to treatment

**Volcano plot**
 
![](Biomarkdown_files/figure-html/unnamed-chunk-25-1.png)

---

## Example2. Prediction of response to treatment

We identified 87 transcripts from 48 genes significantly (adjusted `$P <0.05$`) associated with efficacy when treated with brodalumab.

```r
trascriptname <- rownames(tt)
sigGenespso <- trascriptname[tt$adj.P.Val<0.05]
length(sigGenespso)
```

```
## [1] 87
```

---

## Example2. Prediction of response to treatment

We selected the significant genes and asked which metabolic **pathways** are enriched with those genes

```r
#select genes in the format of ENTREZ
mappedgenesIds <- genesentrez[sigGenespso]
mappedgenesIds <- unique(unlist(strsplit(mappedgenesIds, " /// ")))

#run enrichment in GO
GO <- enrichGO(gene = mappedgenesIds, 'org.Hs.eg.db', ont="MF", pvalueCutoff=0.05, pAdjustMethod="BH")

dotplot(GO)
```

---

## Example2. Prediction of response to treatment

**pathways**

![](Biomarkdown_files/figure-html/unnamed-chunk-28-1.png)

---

## Example2. Prediction of response to treatment

**Analysis 2.**

I used new causal inference methods (random causal forest) that I have recently implemented for transcriptomic data.

The method

- builds a predictor from the transcription data of the relevant genes
- estimates the probability at baseline of the response to a potential brodalumab treatment
- predicts probabilities on 19 randomly selected individuals not used to build the predictor.

---

## Example2. Prediction of response to treatment

```r
source_data("https://github.com/alejandro-isglobal/Biomarkers/raw/master/data/pred.Rdata")
```

```
## [1] "pred"
```

---

## Example2. Prediction of response to treatment

- <code>pasi_imp</code> is PASI improvement:

`$\Delta PASI = \frac{PASI_{W0}-PASI_{W12}}{PASI_{W0}}$`

- <code>prob</code> probability of brodalumab response at baseline 
- <code>t</code>: 1=Placebo, 2=Brodalumab

```r
head(pred)
```

```
##      pasi_imp      prob          t
## 1  0.08522727 0.3279899    placebo
## 2  1.00000000 0.3818976 brodalumab
## 3  0.92333333 0.3433848 brodalumab
## 4  0.80152672 0.3193873 brodalumab
## 5 -0.67980296 0.4011619    placebo
## 6  0.99019608 0.3488585 brodalumab
```

---

## Example2. Prediction of response to treatment

We can test the dose-response relationship between the brodalumab response at baseline (dose) and PASI improvement (response).

```r
met <- drm(pasi_imp ~ prob, t, fct=LL.4(), data=pred)
plot(met, legendPos=c(0.36,-0.25)) 
```

---

We can test if the dose-response model is significant

```r
noEffect(met)
```

```
## Chi-square test              Df         p-value 
##    4.163001e+01    7.000000e+00    6.126367e-07
```
---

- Gene expression data at baseline in nonlesional skin can strongly predict the level of response to a 12-week brodalumab treatment.

- The top gene *NR4A2* is mechanistically involved in the IL-23/Th17 axis. Blockage of *NR4A2* prevents Th17 from producing IL-17 and IL-21 in vitro.

- *NR4A2* offers additional treatment targeting or biomarker testing for response.

---
class: middle
name: Alejandro-Caceres

.left-col-50[
<img src="img/caceres.jfif" width = "180px"/>
### Alejandro Cáceres
&nbsp;ISGlobal Bioinformatics 
&nbsp;alejandro.caceres@isglobal.org 
]

## Thank you for your attention

### Find the slides [online](https://alejandro-isglobal.github.io/teaching/Biomarkdown.html).