# Diagnostic for fine-mapping with summary statistics

#### 2021-06-07

This vignette demonstrates a diagnostic plot for checking the consistency between summary statistics and a reference LD matrix.

`susie_rss` assumes that the LD matrix accurately estimates the correlations among SNPs in the original GWAS genotype data. Typically, the LD matrix comes from a public database of genotypes in a suitable reference population. Inaccurate LD information can lead to unreliable fine-mapping results.

The diagnostic for consistency between summary statistics and the reference LD matrix is based on the RSS model under the null, with a regularized LD matrix:

$$\hat{z} \mid R, s \sim N\big(0, (1-s)R + sI\big), \quad 0 < s < 1.$$

The parameter $s$ is estimated by maximum likelihood; a larger $s$ indicates greater inconsistency between the summary statistics and the LD matrix. For each SNP $j$, the expected z-score given the z-scores of all other SNPs, $E(\hat{z}_j \mid \hat{z}_{-j})$, is computed and plotted against the observed z-score.
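
Both quantities have closed forms from standard multivariate normal algebra; the derivation below is general and is not a description of how susieR computes them internally. Write the eigendecomposition of the $p \times p$ LD matrix as $R = V D V^\top$, with eigenvalues $d_i$ and eigenvectors $v_i$, and let $\Omega = \big((1-s)R + sI\big)^{-1}$ be the precision matrix under the null model:

$$
\begin{aligned}
\lambda_i(s) &= (1-s)\,d_i + s, \\
\log L(s) &= -\frac{1}{2}\sum_{i=1}^{p}\left[\log \lambda_i(s) + \frac{(v_i^\top \hat{z})^2}{\lambda_i(s)}\right] - \frac{p}{2}\log(2\pi), \\
E(\hat{z}_j \mid \hat{z}_{-j}) &= -\frac{1}{\Omega_{jj}}\sum_{k \neq j}\Omega_{jk}\,\hat{z}_k = \hat{z}_j - \frac{(\Omega \hat{z})_j}{\Omega_{jj}}, \\
\operatorname{Var}(\hat{z}_j \mid \hat{z}_{-j}) &= \frac{1}{\Omega_{jj}}.
\end{aligned}
$$

Because $(1-s)R + sI$ has the same eigenvectors as $R$, a single eigendecomposition suffices to evaluate $\log L(s)$ for every candidate $s$, so the maximum-likelihood estimate reduces to a one-dimensional search over $(0, 1)$.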

```r
library(susieR)
```

## LD information from the original genotype data

We first demonstrate the diagnostic plot in a simple case in which the LD matrix is estimated from the original genotype data. We use the same simulated data as in the fine-mapping vignette.

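```r
# Simulated genotypes (X), phenotypes (Y) and true effects shipped with susieR
data("N3finemapping")
# True effects for the first phenotype; non-zero entries mark the causal SNPs
b = N3finemapping$true_coef[,1]
# Marginal z-scores from per-SNP simple linear regressions
sumstats <- univariate_regression(N3finemapping$X, N3finemapping$Y[,1])
z_scores <- sumstats$betahat / sumstats$sebetahat
# In-sample LD: correlations among the same genotypes; cache its eigendecomposition
Rin = cor(N3finemapping$X)
attr(Rin, 'eigen') = eigen(Rin, symmetric = TRUE)
susie_plot(z_scores, y = "z", b=b)
```

The estimated $s$ is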

```r
s = estimate_s_rss(z_scores, Rin)
s
#  0.0001123485
```
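
To make the maximum-likelihood step concrete, here is a minimal base-R sketch that maximizes the null RSS likelihood over $s$ using one eigendecomposition of $R$ and `optimize()`. The functions `neg_loglik_s` and `estimate_s_sketch` are hypothetical and are not claimed to reproduce how `estimate_s_rss` is implemented; clipping small negative eigenvalues to zero and the search interval are assumptions of this sketch.

```r
# Hypothetical sketch (not susieR's implementation): MLE of s under
#   z ~ N(0, (1 - s) R + s I), using the eigendecomposition R = V diag(d) V'.

# Negative log-likelihood as a function of s (additive constant dropped).
neg_loglik_s <- function(s, zt2, d) {
  lam <- (1 - s) * d + s            # eigenvalues of (1 - s) R + s I
  0.5 * sum(log(lam) + zt2 / lam)
}

estimate_s_sketch <- function(z, R) {
  eig <- eigen(R, symmetric = TRUE)
  d <- pmax(eig$values, 0)          # assumption: clip tiny negative eigenvalues
  zt2 <- drop(crossprod(eig$vectors, z))^2  # squared projections (v_i' z)^2
  # One-dimensional search; the small lower bound keeps lam > 0 if some d_i = 0.
  optimize(neg_loglik_s, interval = c(1e-8, 1), zt2 = zt2, d = d)$minimum
}
```

For example, `estimate_s_sketch(z_scores, Rin)` can be compared with the `estimate_s_rss` output above; close agreement is expected only if the package maximizes the same likelihood.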

The expected z-scores are plotted against the observed z-scores:

```r
condz_in = kriging_rss(z_scores, Rin)
condz_in$plot
```

## LD information from the reference panel

We use another simulated data set in which the LD matrix is estimated from a reference panel. The data contain one signal (red point) and one SNP whose reference and alternative alleles are mismatched between the summary statistics and the reference panel (yellow point).

```r
data("SummaryConsistency")
zflip = SummaryConsistency$z
ld = SummaryConsistency$ldref
# Marginal z-scores in grey; signal SNP in red (col = 2), allele-flipped SNP in yellow (col = 7)
plot(zflip, pch = 16, col = '#767676', main = 'Marginal Associations', xlab = 'SNP', ylab = 'z scores')
points(SummaryConsistency$signal_id, zflip[SummaryConsistency$signal_id], col = 2, pch = 16)
points(SummaryConsistency$flip_id, zflip[SummaryConsistency$flip_id], col = 7, pch = 16)
```

The estimated $s$ is

```r
s = estimate_s_rss(zflip, ld)
s
#  0.0198113
```

This is roughly 176 times the estimate obtained with the in-sample LD matrix, indicating greater inconsistency between the z-scores and the reference LD matrix. The expected z-scores are plotted against the observed z-scores:

```r
condz = kriging_rss(zflip, ld)
condz$plot
```

The diagnostic plot identifies the SNP whose allele is flipped between the summary statistics and the reference panel.
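
The conditional expectation shown in the plot can be sketched with the Gaussian conditioning identity $E(\hat{z}_j \mid \hat{z}_{-j}) = \hat{z}_j - (\Omega \hat{z})_j / \Omega_{jj}$, with conditional variance $1/\Omega_{jj}$, where $\Omega = \big((1-s)R + sI\big)^{-1}$. The function `conditional_z_sketch` below is hypothetical and not claimed to match `kriging_rss`; it uses a dense `solve()` for brevity, which is an assumption that only suits moderately sized LD matrices. It also returns standardized differences, one simple way to flag SNPs whose observed z-score disagrees with what the other SNPs predict.

```r
# Hypothetical sketch (not susieR's kriging_rss): conditional mean and sd of
# each z_j given all other z-scores under z ~ N(0, (1 - s) R + s I).
conditional_z_sketch <- function(z, R, s) {
  Sigma <- (1 - s) * R + s * diag(length(z))
  Omega <- solve(Sigma)                          # precision matrix
  omega_jj <- diag(Omega)
  cond_mean <- z - drop(Omega %*% z) / omega_jj  # E(z_j | z_-j)
  cond_sd <- sqrt(1 / omega_jj)                  # sd(z_j | z_-j)
  data.frame(z = z, cond_mean = cond_mean, cond_sd = cond_sd,
             std_diff = (z - cond_mean) / cond_sd)
}
```

For instance, `res <- conditional_z_sketch(zflip, ld, s)` followed by `plot(res$cond_mean, res$z)` draws a plot of the same kind, and `which.max(abs(res$std_diff))` returns the SNP with the largest standardized discrepancy between its observed and expected z-score.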

## Session information

Here are some details about the computing environment, including the versions of R and the R packages used to generate these results.

```r
sessionInfo()
# R version 3.6.2 (2019-12-12)
# Platform: x86_64-apple-darwin15.6.0 (64-bit)
# Running under: macOS Catalina 10.15.7
#
# Matrix products: default
# BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
# LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
#
# locale:
#  en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#
# attached base packages:
#  stats     graphics  grDevices utils     datasets  methods   base
#
# other attached packages:
#  ggplot2_3.3.0        microbenchmark_1.4-7 Matrix_1.2-18
#  L0Learn_1.2.0        susieR_0.11.42
#
# loaded via a namespace (and not attached):
#   Rcpp_1.0.5       pillar_1.4.3     compiler_3.6.2   plyr_1.8.5
#   highr_0.8        tools_3.6.2      digest_0.6.23    evaluate_0.14
#   lifecycle_0.1.0  tibble_2.1.3     gtable_0.3.0     lattice_0.20-38
#   pkgconfig_2.0.3  rlang_0.4.5      yaml_2.2.0       xfun_0.11
#   withr_2.1.2      stringr_1.4.0    dplyr_0.8.3      knitr_1.26
#   grid_3.6.2       tidyselect_0.2.5 reshape_0.8.8    glue_1.3.1
#   R6_2.4.1         rmarkdown_2.3    mixsqp_0.3-46    irlba_2.3.3
#   farver_2.0.1     reshape2_1.4.3   purrr_0.3.3      magrittr_1.5
#   scales_1.1.0     htmltools_0.4.0  assertthat_0.2.1 colorspace_1.4-1
#   labeling_0.3     stringi_1.4.3    munsell_0.5.0    crayon_1.3.4
```