FROC Models

Issei Tsunoda

2021-10-27

This paper aimes to explain what models are implemented in this package. To do so, it suffices to show the trial from which data arise and its likelihoods.

Installation and usage of this package.

Only three R-codes are included for user. After installing R language, execute the followings.


# Installation
install.packages("BayesianFROC", dependencies = TRUE)

# Load the package
library(BayesianFROC);

# Run the function
BayesianFROC::fit_GUI_Shiny() # Variables are not required.

#The above three are R codes, 
#so please execute them on R console or R-studio console.

So, this package gives only one function BayesianFROC::fit_GUI_Shiny() to users.

Conventional Notation

In the following, we use the conventional likelihood notation;

\[ f(y|\theta), \]

where \(\LARGE{\color{red}{y}}\) denotes data, \(\LARGE{\color{blue}{\theta}}\) a model parameter. E.g., \(f(y|\theta) = \theta^y(1-\theta)^{1-y}\) for \(\{0,1\}\) valued random variable \(y\) defined by Bernoulli trial of success rate \(\theta, 0 \leq \theta \leq 1\). In the next section, we show that what trial arise and what is \(\theta\) and what is \(y\) in context of FROC analysis.

Data \(\large{\color{red}{y}}\) and trial

First of all, we introduce a trial from which we obtain a data-set to be fitted a model defined later.

Suppose that there are a physician, radiographs and a data-analyst. In the trial, the physician localizes shadows in radiographs if and only if he thinks shadows are lesions. In addition, the physician assigns to each localized shadow a confidence level which is a number as the following table.

Confidence Level meaning
3 definitely present
2 probably present
1 possibly present

E.g., if the physician localized a shadow with his highest confidence, then he assigns the number 3 to the localized shadow, which means that, in his point of view, the shadow is definitely a lesion. For example, if physician marks 10 positions in radiographs, then he also assigns 10 confidence levels for 10 marked positions.

After the physician’s lesion finding task, the data-analyst evaluates physician’s localization, i.e. the data-analyst counts physician’s truly localized positions (hits) and incorrectly marked positions (false alarms) for each confidence level. Here, we assume that data-analyst knows the truth (Gold-Standard of Diagnosis). Consequently, we will obtain the following table reflecting the physician’s recognition performance ability.

Confidence Level Number of Hits Number of False alarms
3 = definitely present \(H_{3}\) \(F_{3}\)
2 = probably present \(H_{2}\) \(F_{2}\)
1 = questionable \(H_{1}\) \(F_{1}\)

In this table, each component \(H_c\) and \(F_c\) are non negative integer valued random variables. For example, \(H_{3}\) denotes the number of hit with reader’s confidence level \(3^\text{rd}\).

So, in conventional notation we may write

\[\large{\color{red}{y}} = (H_0,H_1,H_2,H_3;F_1,F_2,F_3),\]

where we set \(H_\color{red}{0}:=N_L-(H_1+H_2+H_3)\) and \(N_L\) denotes the number of lesions among all radiographs. We note that \(\Sigma_{c=1,2,3} H_c \leq N_L\).

In the next section, we provide the probability law of these random variables \(H_c\) and \(F_c\) as the notion of image measure or the so-called push-forward measure, namely, likelihood function.

Likelihood \(f(\large{\color{red}{y}}|\large{\color{blue}{\theta}})\)

For model of 1 reader, 1 modality and 3 confidence levels. Let \(N_I\) denote the number of radiographs. Define the model by \[ \{H_{c};c=\color{red}{0},1,2,\cdots C\} \sim \color{green}{\text{Multinomial}}(\{p_{c,}(\theta)\}_{c=\color{red}{0},1,2,\cdots C}),\\ F_{c} \sim \text{Poisson}(q_c(\theta)N_I),\\ \]

where

\[ p_{c}(\theta) := \int_{\theta_c}^{\theta_{c+1}}\text{Gaussian}_{}(x|\mu,\sigma)dx,\\ q_c(\theta) := \int_{\theta_c}^{\theta_{c+1}} \biggl( \frac{d}{dz} \biggr) \log \Phi(z)dz. \]

from which, we can calculate the most important characteristic indicating the observer recognition performance ability as

\[ AUC := \Phi (\frac{\mu/\sigma}{\sqrt{(1/\sigma)^2+1}}), \\ \]

which is a real number between 0 and 1. If AUC is higher, then it means that radiologist recognition ability is also higher.

Note that model parameter is \(\LARGE{\color{blue}{\theta}}\) \(= (\theta_1,\theta_2,\theta_3,...\theta_C;\mu,\sigma)\) which should be estimated and \(\Phi\) denotes the cumulative distribution functions of the canonical Gaussian. Note that \(\theta_{C+1} = \infty\) and \(\theta_{0} = -\infty\).

Prior

Recall that our model has parameter \((\theta_1,\theta_2,\theta_3;\mu,\sigma)\). 4 years ago, I reparamatrized the model as \((\theta_1,\theta_2 - \theta_1,\theta_3-\theta_2;\mu,\sigma)\). What the fucking benefit of transformation of parameter is that we can deduce the monotonicity of thresholds parameter as follows.

\[ d\theta_c := \theta_{c+1}-\theta_{c} \sim \text{Uniform}(0,3),\\ \sigma \sim \text{Uniform}(0,3),\\ \theta_{1} \sim \text{Uniform}( -3,3),\\ \]

What we want is that the model parameters \(d\theta_c\) are distributed by probability density functions whose supports are all positive.

Diagnosis of prior by SBC: VERY Very Very BAD

We note that these prior are bad in the sense of SBC (simulation based calibration). Jeffrays prior gives good one?

Reference:

Maximum likelihood analysis of free-response receiver operating characteristic (FROC) data July 1989; Medical Physics 16(4):561-8 ##### Example R codes to fit the above model

Execute the following R codes from R console or R studio console, then the desired fitting will be done with GUI of Shiny.

library("BayesianFROC")
fit_GUI_Shiny()

MCS

I am a patient suffering from Chemical Sensitivity. Don’t wear volatile organic compounds (VOC), or huge burden to me. I hate prurigo nodularis in my body, syndet is the cause, (VOCs is also).

Explanatory variable for FROC models (Future development plan)

Here, the author announces how to include explanatory variable \(X \in \mathbb{R}^d\) in FROC models.

You know, to include fucking explanatory variable in FROCs, what should we do is very simple.

To do so, it is sufficient to consider the following model:

Define the model by \[ \{H_{c};c=\color{red}{0},1,2,\cdots C\} \sim \color{green}{\text{Multinomial}}(\{p_{c,}(\theta)\}_{c=\color{red}{0},1,2,\cdots C}),\\ F_{c} \sim \text{Poisson}(q_c(\theta)N_I),\\ \]

where

\[ p_{c}(\theta) := \int_{\theta_c}^{\theta_{c+1}}\text{Gaussian}_{}(x|\mu(\alpha,X),\sigma(\beta,X))dx,\\ q_c(\theta) := \int_{\theta_c}^{\theta_{c+1}} \biggl( \frac{d}{dz} \biggr) \log \Phi(z)dz. \]

and we modify the model parameters by using the following fucking link functions as follows.

\[\mu(\alpha,X):= \alpha X = \mu_0 + \sum \alpha_i X_i,\]

\[\sigma(\beta,X) := \sigma_0 \exp(\beta X)= \sigma_0\exp(\sum \beta_i X_i).\]

Let \(X=(x_!,x_2)\) be a explanatory variable of dummies indicating some categories. Then the data to be fitted a model is the following form!

Confidence Level \(x_1\) \(x_2\) Number of Hits Number of False alarms
3 = definitely present 1 1 \(H_{3,1,1}\) \(F_{3,1,1}\)
2 = equivocal 1 1 \(H_{2,1,1}\) \(F_{2,1,1}\)
1 = questionable 1 1 \(H_{1,1,1}\) \(F_{1,1,1}\)
3 = definitely present 1 2 \(H_{3,1,2}\) \(F_{3,1,2}\)
2 = equivocal 1 2 \(H_{2,1,2}\) \(F_{2,1,2}\)
1 = questionable 1 2 \(H_{1,1,2}\) \(F_{1,1,2}\)
3 = definitely present 2 1 \(H_{3,2,1}\) \(F_{3,2,1}\)
2 = equivocal 2 1 \(H_{2,2,1}\) \(F_{2,2,1}\)
1 = questionable 2 1 \(H_{1,2,1}\) \(F_{1,2,1}\)
3 = definitely present 2 2 \(H_{3,2,2}\) \(F_{3,2,2}\)
2 = equivocal 2 2 \(H_{2,2,2}\) \(F_{2,2,2}\)
1 = questionable 2 2 \(H_{1,2,2}\) \(F_{1,2,2}\)

For ref of this section,

A Bayesian approach to a general regression models for ROC curves; …et al; 1998;18;436-443

Sayonara! Good Bye \(\star\)

\(( \dot{} \omega \dot{})\) Love love me do! \(('; \omega ;)\) You know I love you \(\star\)

A short course on ROC

What is the difference between FROC and ROC? In ROC case, both of hits \(H_c\) and false alarms \(F_c\) have upper bounds, namely, there are positive integers \(N,M\) such that

\(\Sigma H_c \leq N\) and \(\Sigma F_c \leq M\), On the other hand, in FROC case, \(F_c\) dose not have any upper bounds.

If we assume such an upper bound for \(F_c\), say \(M\), then, by putting \(F_0 := M - \Sigma F_c\), the following ROC model is available.

\[ \{H_{c};c=\color{red}{0},1,2,\cdots C\} \sim \color{green}{\text{Multinomial}}(\{p_{c,}(\theta)\}_{c=\color{red}{0},1,2,\cdots C}),\\ \{F_{c};c=\color{red}{0},1,2,\cdots C\} \sim \color{green}{\text{Multinomial}}(\{q_{c,}(\theta)\}_{c=\color{red}{0},1,2,\cdots C}),\\ \]

where

\[ p_{c}(\theta) := \int_{\theta_c}^{\theta_{c+1}}\text{Gaussian}_{}(x|\mu,\sigma)dx,\\ q_{c}(\theta) := \int_{\theta_c}^{\theta_{c+1}}\text{Gaussian}_{}(x| 0,1)dx.\\ \]

Unfortunately, it is not so good, according to the following paper, which suggests that reductions of parameter makes MCMC to be more stable. E.g., by regarding \(\mu\) and \(\sigma\) as functions of parameter \(\theta\), i.e., \(\mu=\mu(\theta), \sigma=\sigma(\theta)\), we can reduce the parameter.

I have tried the above VERY VERY BAD model ROC model as follows.

library("BayesianFROC")
# Make an ROC dataset 
d<-BayesianFROC::ROC_data_creator2()

# Fit the above bad ROC model
f<-BayesianFROC::fit_srsc_ROC(d,ite  = 111, summary = FALSE,  cha = 1,)

# Check the model. The result indicate MCMC does not converge.
rstan::check_hmc_diagnostics(f)

# Plot looks bad so BAD MODEL...
rstan::traceplot(f)

# Plot looks good but BAD MODEL...
BayesianFROC::draw_ROC_Curve_from_fitted_model(f)
BayesianFROC::fit_GUI_ROC() #FROC user interface is used, so under construction...

Why the above ROC model is bad? I am not sure, but I guess level surface of likelihood function relates, or several critical points of likelihood function relates… Reference Bayesian analysis of a ROC curve for categorical data using a skew-binormal model; Balgobin Nandram and Thelge Buddika Peiris (This paper is nice!); 2018 volume 11 369-384; Statistics and its interface

Reference As another reference of ROC or FROC, see page 109 equation (6.3.5) of the book: Chakraborty, Dev P - Observer performance methods for diagnostic imaging foundations, modeling, and applications with R-based examples-CRC Press (2017)__

Next section, ….

we consider the comparison of imaging modality, such as MRI, CT, PET, …

This model also can be interpreted as a subject-specific random effect model if we forget the modality and use it as image instead.

I implement this because Bayesian is suitable for including individual differences,…. the author think this is a benefit of Bayesian ….

Data \(\large{\color{red}{y}}\)

2 readers, 2 modalities and 3 confidence levels.

Confidence Level Modality ID Reader ID Number of Hits Number of False alarms
3 = definitely present 1= MRI 1= Bob \(H_{3,1,1}\) \(F_{3,1,1}\)
2 = equivocal 1= MRI 1= Bob \(H_{2,1,1}\) \(F_{2,1,1}\)
1 = questionable 1= MRI 1= Bob \(H_{1,1,1}\) \(F_{1,1,1}\)
3 = definitely present 1= MRI 2=Alice \(H_{3,1,2}\) \(F_{3,1,2}\)
2 = equivocal 1= MRI 2=Alice \(H_{2,1,2}\) \(F_{2,1,2}\)
1 = questionable 1= MRI 2=Alice \(H_{1,1,2}\) \(F_{1,1,2}\)
3 = definitely present 2= CT 1= Bob \(H_{3,2,1}\) \(F_{3,2,1}\)
2 = equivocal 2= CT 1= Bob \(H_{2,2,1}\) \(F_{2,2,1}\)
1 = questionable 2= CT 1= Bob \(H_{1,2,1}\) \(F_{1,2,1}\)
3 = definitely present 2= CT 2=Alice \(H_{3,2,2}\) \(F_{3,2,2}\)
2 = equivocal 2= CT 2=Alice \(H_{2,2,2}\) \(F_{2,2,2}\)
1 = questionable 2= CT 2=Alice \(H_{1,2,2}\) \(F_{1,2,2}\)

where, each component \(H\) and \(F\) are non negative integers. By the multi-index notation, for example, \(H_{3,2,1}\) denotes the number of hit of the \(1^\text{st}\) reader over all images taken by \(2^\text{nd}\) modality with reader’s confidence level is \(3^\text{rd}\).

So, in conventional notation we may write

\[y = (H_{c,m,r},F_{c,m,r} ;N_L,N_I).\]

Note that the above data format is available in case of single modality and multiple readers. Then, in this case, the modality corresponds to a subject-specific random effect. Thus, reinterpreting this model, we also introduce, in FROC models, subject-specific random variables for heterogeneous hit and false rates among images (radiographs). You know, I am a patient, and tired, no want to write this. Anyway, in this context, it would be

2 readers, 2 radiographs and 3 confidence levels.

Confidence Level radiography ID Reader ID Number of Hits Number of False alarms
3 = definitely present 1 1 \(H_{3,1,1}\) \(F_{3,1,1}\)
2 = equivocal 1 1 \(H_{2,1,1}\) \(F_{2,1,1}\)
1 = questionable 1 1 \(H_{1,1,1}\) \(F_{1,1,1}\)
3 = definitely present 1 2 \(H_{3,1,2}\) \(F_{3,1,2}\)
2 = equivocal 1 2 \(H_{2,1,2}\) \(F_{2,1,2}\)
1 = questionable 1 2 \(H_{1,1,2}\) \(F_{1,1,2}\)
3 = definitely present 2 1 \(H_{3,2,1}\) \(F_{3,2,1}\)
2 = equivocal 2 1 \(H_{2,2,1}\) \(F_{2,2,1}\)
1 = questionable 2 1 \(H_{1,2,1}\) \(F_{1,2,1}\)
3 = definitely present 2 2 \(H_{3,2,2}\) \(F_{3,2,2}\)
2 = equivocal 2 2 \(H_{2,2,2}\) \(F_{2,2,2}\)
1 = questionable 2 2 \(H_{1,2,2}\) \(F_{1,2,2}\)

Likelihood \(f(y|\theta)\)

\[ \{H_{c,m,r};c=1,2,\cdots C\} \sim \text{Multinomial}(\{p_{c,m,r}(\theta);c=1,2,\cdots C\},N_L),\\ F_{c,m,r} \sim \text{Poisson}(q_c(\theta)).\\ \]

\[ p_{c,m,r}(\theta) := \int_{\theta_c}^{\theta_{c+1}}\text{Gaussian}_{}(x|\mu_{m,r},\sigma_{m,r})dx,\\ q_c(\theta) := \int_{\theta_c}^{\theta_{c+1}} \frac{d \log \Phi(z)}{dz}dz. \]

\[ A_{m,r} := \Phi (\frac{\mu_{m,r}/\sigma_{m,r}}{\sqrt{(1/\sigma_{m,r})^2+1}}), \\ A_{m,r} \sim \text{Normal} (A_{m},\sigma_{r}^2), \\ \]

where model parameter is is \(\LARGE{\color{blue}{\theta}}\) \(= (\theta_1,\theta_2,\theta_3,...\theta_C;\mu_{m,r},\sigma_{m,r})\) which should be estimated and \(\Phi\) denotes the cumulative distribution functions of the canonical Gaussian. Note that \(\theta_{C+1} = \infty\).

Prior

\[ d\theta_c := \theta_{c+1}-\theta_{c},\\ d\theta_c, \sigma_{m,r} \sim \text{Uniform}(0,3),\\ \theta_{1} \sim \text{Uniform}( -3,3),\\ A_{m} \sim \text{Uniform}(0,1).\\ \]

This is only example, and in this package I implement proper priors. The author thinks the above prior is intuitively the simplest non informative priors without the coordinate free property.

R script of MRMC

 


fit <- fit_Bayesian_FROC(
   ite  = 1111, 
   cha = 1,
   summary = TRUE,  
   Null.Hypothesis = F, 
   dataList = dd # example data to be fitted a model
   )


# Or GUI by Shiny

fit_GUI_Shiny_MRMC()

Conjecture on prior

We should require that

\[\varepsilon<p_1<p_2<...< 1-\epsilon\] or \[ 1-\epsilon>q_1>q_2>...>\varepsilon\]

Poisson rate and multinomial Bernoulli rates should be contained the regular interval \([ \epsilon, 1-\epsilon]\) for some fixed small \(\epsilon\), i.e., \(p_c,q_c \in [ \epsilon, 1-\epsilon]\)

Monotonicity \(p_1<p_2<\cdots\) and \(q_1>q_2>\cdots\) also reasonable, where subscripts means rating and a high number indicates a high confidence level.

The author found simultaneous zero hits or false alarms cause a bias in MCMC sampling. If prior is not suitable or non-informative, then such phenomenon occurs and SBC detects it.

So, we have to find the prior to satisfy the monotonicity and the regular interval condition.

The following SBC shows our prior is not good, because for some parameter, the rank statistics is not uniformly distributed.


stanModel <- stan_model_of_sbc()

Simulation_Based_Calibration_single_reader_single_modality_via_rstan_sbc(
stanModel = stanModel,
ite     = 233,
M       = 111, # or 1111 which tooks a lot of times
epsilon = 0.04,BBB = 1.1,AAA =0.0091,sbc_from_rstan = TRUE)

# or 

stanModel <- stan_model_of_sbc()

Simulation_Based_Calibration_single_reader_single_modality_via_rstan_sbc(
stanModel = stanModel,
ite     = 233,
M       = 1111, 
epsilon = 0.04,BBB = 1.1,AAA =0.0091,sbc_from_rstan = TRUE)

In my view, SBC dose not give me practical priors. Even if SBS has good performance, those priors cannot be implemented in this pkg, because such a model dose not fit practical data.

Reference

Bayesian analysis of a ROC curve for categorical data using a skew-binormal model; Balgobin Nandram and Thelge Buddika Peiris (This paper is nice!); 2018 volume 11 369-384; Statistics and its interface

Appendix:

\(\color{green}{\textit{ Why a Bayesian approach now. }}\)

In the following, the author pointed out why frequentist p value is problematic. Of course, under some condition, Bayesian p -value coincides with frequentist p value, so the scheme of statistical test is itself problematic. We shall show the reason in the following simple example.

To tell the truth, I want to use epsilon delta manner, but for non-mathematics people, I do not use it.

  • The next section proves the monotonicity of p value for the most simple statistical test.
Monotonicity issues on p value

The methods of statistical testing are widely used in medical research. However, there is a well-known problem, which is that a large sample size gives a small p-value. In this section, we will provide an explicit explanation of this phenomenon with respect to simple hypothesis tests.

Consider the following null hypothesis \(H_0\) and its alternative hypothesis \(H_1\); \[\begin{eqnarray*} H_0: \mathbb E[ X_i] &=&m_0, \\ H_1: \mathbb E[ X_i] &>&m_0, \\ \end{eqnarray*}\] where \(\mathbb E[ X_i]\) means the expectation of random samples \(X_i\) from a normal distribution whose variance \(\sigma _0 ^2\) is known. In this situation, the test statistic is given by \[ Z^{\text{test}} := \frac{\overline{X_{n}} -m_0 }{\sqrt{\sigma _0 ^2/n}}, \] where \(\overline{X_{n}} := \sum_{i=1,\cdots,n} X_i /n\) is normally distributed with mean \(m_0\) and standard deviation \(\sigma_0/\sqrt{n}\). Under the null hypothesis, \(Z^{\text{test}}\) is normally distributed with mean \(0\) and a standard deviation \(1\) (standard normal distribution). The null hypothesis is rejected if \(Z^{\text{test}} >z_{2\alpha}\) , where \(z_{2\alpha}\) is a percentile point of the normal distribution, e.g., \(z_{0.025}=1.96 .\)

Suppose that the true distribution of \(X_1, \cdots, X_n\) is a normal distribution with mean \(m_0 + \epsilon\) and variance \(\sigma _0 ^2\), where \(\epsilon\) is an arbitrary fixed positive number. Then \[\begin{eqnarray*} Z^{\text{test}} &=&\frac{\overline{X_{n}} -(m_0+\epsilon -\epsilon) }{\sqrt{\sigma _0 ^2/n}}\\ &=& Z^{\text{Truth}} + \frac{\epsilon}{\sqrt{\sigma _0 ^2/n}}\\ \end{eqnarray*}\] where \(Z^{\text{Truth} }:=(\overline{X_{n}} -(m_0+\epsilon ))/\sqrt{\sigma _0 ^2/n}\).

In the following, we calculate the probability with which we reject the null hypothesis \(H_0\) with confidence level \(\alpha\). \[\begin{eqnarray*} \text{Prob}(Z^{\text{test}} >z_{2\alpha} ) &=&\text{Prob} (Z^{\text{Truth} } + \frac{\epsilon}{\sqrt{\sigma _0 ^2/n}} >z_{2\alpha})\\ &=&\text{Prob} (Z^{\text{Truth} } >z_{2\alpha} - \frac{\epsilon}{\sqrt{\sigma _0 ^2/n}})\\ &=&\text{Prob} (Z^{\text{Truth} } >z_{2\alpha} - \frac{\epsilon}{\sigma _0 }\sqrt{n} )\\ \end{eqnarray*}\] Note that \(\epsilon /\sigma _0\) is called the effect size.

Thus, if \(z_{2\alpha} - \epsilon \sqrt{n} /\sigma _0 < z_{2(1-\beta)}\), i.e., if \(n > ( z_{2\alpha}- z_{2(1-\beta)})^2 \sigma _0 ^2 \epsilon ^{-2}\), then the probability that the null hypothesis is rejected is greater than \(1 - \beta\).

For example, consider the case \(\sigma _0 =1\), \(\alpha =0.05\), and \((1-\beta) =\alpha\), then \(z_{2\alpha}=1.28\) and in this case, for all \(\epsilon>0\), if \(n > 7 \epsilon ^{-2}\) then the probability in which above hypothesis test concludes that the difference of the observed mean from the hypothesized mean is significant is greater than \(0.95\). This means that almost always the p-value is less than 0.05. Thus a large sample size induces a small p-value.

For example,

  • if \(\epsilon =1\) then by taking a sample size such that \(n > 7\), then almost always the conclusion of the test will be that the observed difference is statistically significant. Similarly,

  • if \(\epsilon =0.1\) then by taking a sample size such that \(n > 700\), then almost all tests will reach the conclusion that the difference is significant; and

  • if \(\epsilon =0.01\) then by taking sample size so that \(n > 70000\), then the same problem will arise.

This phenomenon also means that in large samples statistical tests will detect very small differences between populations.

By above consideration we can get the result ``significance difference’’ with respect to any tiny difference \(\epsilon\) by collecting a large enough sample \(n\), and thus we must not use the statistical test.