s-LDSC analysis

Last updated: 2020-03-18

Checks: 7 0

Knit directory: m6AQTL_reproducibleDocument/

This reproducible R Markdown analysis was created with workflowr (version 1.6.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20200317)

The command set.seed(20200317) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: cdc8ea5

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rproj.user/

Unstaged changes:
    Modified:   analysis/index.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File	Version	Author	Date	Message
Rmd	cdc8ea5	kevinlkx	2020-03-18	wflow_publish(“analysis/sLDSC_GWAS_m6AQTL.Rmd”)
html	d5ff06d	kevinlkx	2020-03-18	Build site.
Rmd	1db583b	kevinlkx	2020-03-18	wflow_publish(“analysis/sLDSC_GWAS_m6AQTL.Rmd”)

LD Score Regression (LDSC) https://github.com/bulik/ldsc

Download LDSC data

Download the baselineLD model LD scores

cd /project2/xinhe/m6A/ldsc/LDSCORE/

## baselineLD_v1.1
wget https://data.broadinstitute.org/alkesgroup/LDSCORE/1000G_Phase3_baselineLD_v1.1_ldscores.tgz
mkdir 1000G_Phase3_baselineLD_v1.1_ldscores
tar -xvzf 1000G_Phase3_baselineLD_v1.1_ldscores.tgz -C 1000G_Phase3_baselineLD_v1.1_ldscores

## baselineLD_v2.2
wget https://data.broadinstitute.org/alkesgroup/LDSCORE/1000G_Phase3_baselineLD_v2.2_ldscores.tgz
mkdir 1000G_Phase3_baselineLD_v2.2_ldscores
tar -xvzf 1000G_Phase3_baselineLD_v2.2_ldscores.tgz -C 1000G_Phase3_baselineLD_v2.2_ldscores

Download weights

cd /project2/xinhe/m6A/ldsc/LDSCORE/

wget https://data.broadinstitute.org/alkesgroup/LDSCORE/weights_hm3_no_hla.tgz
tar -xvzf weights_hm3_no_hla.tgz

Download allele frequencies

cd /project2/xinhe/m6A/ldsc/LDSCORE/

wget https://data.broadinstitute.org/alkesgroup/LDSCORE/1000G_Phase3_frq.tgz
tar -xvzf 1000G_Phase3_frq.tgz

Download 1000 genomes reference genotypes at hapmap 3 loci

cd /project2/xinhe/m6A/ldsc/LDSCORE/

wget https://data.broadinstitute.org/alkesgroup/LDSCORE/1000G_Phase3_plinkfiles.tgz
tar -xvzf 1000G_Phase3_plinkfiles.tgz

Download HapMap3 SNPs

cd /project2/xinhe/m6A/ldsc/LDSCORE/

wget https://data.broadinstitute.org/alkesgroup/LDSCORE/hapmap3_snps.tgz
tar -xvzf hapmap3_snps.tgz

Download a concatenated list of HapMap3 SNPs

wget https://data.broadinstitute.org/alkesgroup/LDSCORE/w_hm3.snplist.bz2
bzip2 -d w_hm3.snplist.bz2

List of HapMap 3 SNPs rsIDs

awk '{if ($1!="SNP") {print $1} }' w_hm3.snplist > listHM3.txt

Download 1000G_Phase3_cell_type_groups

tar -xvzf 1000G_Phase3_cell_type_groups.tgz

Create annotation files and compute LD scores

Computing LD scores for `baseline_gene_MAF_LD` annotation (based on MAF, LD, genomic annotations)

source: m6AQTL_workflowr/code/extract_baselineLD_generic_annot.R

args <- commandArgs(trailingOnly = TRUE)
dir_annot <- args[1]
chrom <- args[2]

name_annot <- "baseline_gene_MAF_LD"

library(data.table)

dir_baselineLD <- "/project2/xinhe/m6A/ldsc/LDSCORE/1000G_Phase3_baselineLD_v1.1_ldscores"
annot <- read.table(paste0(dir_baselineLD, "/baselineLD.", chrom, ".annot.gz"), header = T, stringsAsFactors = F)

annot_list <- c("CHR", "BP", "SNP", "CM", "base",
                "Coding_UCSC", "Coding_UCSC.extend.500",
                "UTR_5_UCSC", "UTR_5_UCSC.extend.500",
                "UTR_3_UCSC", "UTR_3_UCSC.extend.500",
                "Intron_UCSC", "Intron_UCSC.extend.500",
                "Promoter_UCSC", "Promoter_UCSC.extend.500",
                paste0("MAFbin", 1:10),
                "MAF_Adj_Predicted_Allele_Age","MAF_Adj_LLD_AFR", "Recomb_Rate_10kb", "Nucleotide_Diversity_10kb", "Backgrd_Selection_Stat", "CpG_Content_50kb")
annot_included <- annot[, annot_list]

cat("Annotations included: \n")
print(annot_list)

dir.create(paste0(dir_annot, "/", name_annot), showWarnings = F, recursive = T)
write.table(annot_included, gzfile(paste0(dir_annot, "/", name_annot, "/", name_annot, ".", chrom, ".annot.gz")), sep = "\t", quote = F, col.names = T, row.names = F)

source: m6AQTL_workflowr/code/compute_ldscore_generic_annot.sbatch

#!/bin/bash

source activate ldsc

module load R/3.5.1

name_annot=baseline_gene_MAF_LD

dir_ldscore=/project2/xinhe/m6A/ldsc/annot/ldscores/

mkdir -p ${dir_ldscore}/${name_annot}
cd ${dir_ldscore}/${name_annot}

echo "annot: ${dir_ldscore}/${name_annot}"

for chrom in {1..22}
do
echo ${chrom}

Rscript /home/kaixuan/projects/m6A/m6AQTL_workflowr/code/extract_baselineLD_generic_annot.R ${name_annot} ${dir_ldscore} ${chrom}

zless ${dir_ldscore}/${name_annot}/${name_annot}.${chrom}.annot.gz | head -2

echo "Computing LD scores with the annot file ${name_annot}.${chrom}.annot.gz"
python2 $HOME/softwares/ldsc/ldsc.py \
--l2 \
--bfile /project2/xinhe/m6A/ldsc/LDSCORE/1000G_EUR_Phase3_plink/1000G.EUR.QC.${chrom} \
--print-snps /project2/xinhe/m6A/ldsc/LDSCORE/listHM3.txt \
--ld-wind-cm 1 \
--annot ${dir_ldscore}/${name_annot}/${name_annot}.${chrom}.annot.gz \
--out ${dir_ldscore}/${name_annot}/${name_annot}.${chrom}

done

Prepare QTL annotation bed files

source: m6AQTL_workflowr/code/prepare_QTL_annot_bedfiles_threshQvalue.R

## prepare QTL annotation bed files for LDSC
m6A_version <- "jointPeak_threshold5_MeRIP_HISAT2Map"
m6A_phenotype_name <- "m6APeak_logOR_GC.IP.adjusted_qqnorm"
thresh_qvalue <- 0.1

num_PCs_m6AQTL <- 15

## Parameters
cat("m6A version: ", m6A_version, "\n")
cat("m6A phenotype: ", m6A_phenotype_name, "\n")
cat("QTL cutoff q-value: ", thresh_qvalue, "\n")
cat("# PCs for m6A-QTL: ", num_PCs_m6AQTL, "\n")

dir_annot_bed <- "/project2/xinhe/m6A/ldsc/annot/annot_bed/"
dir.create(dir_annot_bed, showWarnings = F, recursive = T)

## QTL annotations

### m6AQTL
m6AQTL <- readRDS(paste0("/project2/xinhe/m6A/data_shared/m6A_QTLs/", m6A_version, "/fastQTL_cis_100kb/", m6A_phenotype_name, "/m6AQTL.", m6A_phenotype_name, ".",num_PCs_m6AQTL, "PCs.fastQTL.nominals.rds"))
sig_m6AQTL <- m6AQTL[m6AQTL$qvalue < as.numeric(thresh_qvalue), ]

sig_m6AQTL$chr <- sapply(strsplit(as.character(sig_m6AQTL$SNP), ":"), "[", 1)
sig_m6AQTL$SNP_POS <- as.integer(sapply(strsplit(as.character(sig_m6AQTL$SNP), ":"), "[", 2))

sig_m6AQTL.bed <- data.frame(chr = sig_m6AQTL$chr, start = sig_m6AQTL$SNP_POS - 1, end = sig_m6AQTL$SNP_POS)
sig_m6AQTL.bed$chr <- factor(sig_m6AQTL.bed$chr, levels = paste0("chr", 1:22))
sig_m6AQTL.bed <- sig_m6AQTL.bed[order(sig_m6AQTL.bed$chr, sig_m6AQTL.bed$start), ]
sig_m6AQTL.bed <- unique(sig_m6AQTL.bed)
cat(nrow(sig_m6AQTL.bed), "unique m6A-QTLs with SNP level q-value <", thresh_qvalue, "\n")
write.table(sig_m6AQTL.bed, paste0(dir_annot_bed, "sig_m6AQTL_qvalue_", thresh_qvalue, ".bed"), sep = "\t", col.names = F, row.names = F, quote = F)

### eQTL
eQTL <- readRDS("/project2/xinhe/m6A/GWAS/anno/molecular_QTL_summary/eQTL_summary_stats.RDS")
sig_eQTL <- eQTL[eQTL$FDR < as.numeric(thresh_qvalue), ]
sig_eQTL$snps <- gsub("[.]",":",sig_eQTL$snps)

sig_eQTL$chr <- sapply(strsplit(as.character(sig_eQTL$snps), ":"), "[", 1)
sig_eQTL$SNP_POS <- as.integer(sapply(strsplit(as.character(sig_eQTL$snps), ":"), "[", 2))

sig_eQTL.bed <- data.frame(chr = sig_eQTL$chr, start = sig_eQTL$SNP_POS - 1, end = sig_eQTL$SNP_POS)
sig_eQTL.bed$chr <- factor(sig_eQTL.bed$chr, levels = paste0("chr", 1:22))
sig_eQTL.bed <- sig_eQTL.bed[order(sig_eQTL.bed$chr, sig_eQTL.bed$start), ]
sig_eQTL.bed <- unique(sig_eQTL.bed)
cat(nrow(sig_eQTL.bed), "unique eQTLs with SNP level q-value <", thresh_qvalue, "\n")
write.table(sig_eQTL.bed, paste0(dir_annot_bed, "sig_eQTL_qvalue_", thresh_qvalue, ".bed"), sep = "\t", col.names = F, row.names = F, quote = F)

### sQTL
sQTL <- readRDS("/project2/xinhe/m6A/GWAS/anno/molecular_QTL_summary/sQTL_summary_stats.RDS")
sig_sQTL <- sQTL[sQTL$FDR < as.numeric(thresh_qvalue), ]
sig_sQTL$snps <- gsub("[.]",":",sig_sQTL$snps)

sig_sQTL$chr <- sapply(strsplit(as.character(sig_sQTL$snps), ":"), "[", 1)
sig_sQTL$SNP_POS <- as.integer(sapply(strsplit(as.character(sig_sQTL$snps), ":"), "[", 2))

sig_sQTL.bed <- data.frame(chr = sig_sQTL$chr, start = sig_sQTL$SNP_POS - 1, end = sig_sQTL$SNP_POS)
sig_sQTL.bed$chr <- factor(sig_sQTL.bed$chr, levels = paste0("chr", 1:22))
sig_sQTL.bed <- sig_sQTL.bed[order(sig_sQTL.bed$chr, sig_sQTL.bed$start), ]
sig_sQTL.bed <- unique(sig_sQTL.bed)
cat(nrow(sig_sQTL.bed), "unique sQTLs with SNP level q-value <", thresh_qvalue, "\n")
write.table(sig_sQTL.bed, paste0(dir_annot_bed, "sig_sQTL_qvalue_", thresh_qvalue, ".bed"), sep = "\t", col.names = F, row.names = F, quote = F)

cat("annotation bed files saved at:", dir_annot_bed, "\n")

Prepare annotation bed files for PIP based annotations

source: m6AQTL_workflowr/code/prepare_QTL_annot_bedfiles_PIP.R

## prepare QTL PIP annotation for LDSC
args <- commandArgs(trailingOnly = TRUE)
QTL_type <- args[1] # jointPeak_threshold5_MeRIP_HISAT2Map
phenotype_name <- args[2] # m6APeak_logOR_GC.IP.adjusted_qqnorm
num_PCs <- as.integer(args[3]) # 15

genotype_name <- "Genotype_YangVCF"
window_size <- 1e5
L_Susie <- 3
susie_setting <- "Susie_L3_estPriorVar"
susie_prior <- "Susie_noPrior"

## Parameters
cat("phenotype_name:", phenotype_name, "\n")
cat("genotype_name:", genotype_name, "\n")
cat("Susie setting:", susie_setting, "\n")
cat(num_PCs, "PCs included \n")

dir_annot_bed <- "/project2/xinhe/m6A/ldsc/annot/annot_bed/"
dir.create(dir_annot_bed, showWarnings = F, recursive = T)

## QTL annotations
### Load Susie fine-mapping results
dir_fineMapping <- paste0("/project2/compbio/jointLCLs/results/fineMapping_SuSiE/", phenotype_name, "/", genotype_name, "/all_genes")
dir_susie_output <- paste0(dir_fineMapping, "/", susie_setting, "/", susie_prior)
SusieResult <- readRDS(paste0(dir_susie_output, "/SusieResult_allGenes.RDS"))

### take max PIP per SNP
SusieResult <- SusieResult[order(SusieResult$PIP,decreasing = T),]
SusieResult <- SusieResult[!duplicated(SusieResult$SNP),]

SusieResult$chr <- sapply(strsplit(as.character(SusieResult$SNP), ":"), "[", 1)
SusieResult$SNP_POS <- as.integer(sapply(strsplit(as.character(SusieResult$SNP), ":"), "[", 2))

### continuous maxPIP
QTL_continuous_PIP.bed <- data.frame(chr = SusieResult$chr, start = SusieResult$SNP_POS - 1, end = SusieResult$SNP_POS, PIP = SusieResult$PIP)
QTL_continuous_PIP.bed$chr <- factor(QTL_continuous_PIP.bed$chr, levels = paste0("chr", 1:22))
QTL_continuous_PIP.bed <- QTL_continuous_PIP.bed[order(QTL_continuous_PIP.bed$chr, QTL_continuous_PIP.bed$start), ]
QTL_continuous_PIP.bed <- unique(QTL_continuous_PIP.bed)
cat(nrow(QTL_continuous_PIP.bed), "SNPs maxPIP \n")
write.table(QTL_continuous_PIP.bed, paste0(dir_annot_bed, "/", QTL_type, "_PIP_", susie_prior, ".bed"), sep = "\t", col.names = F, row.names = F, quote = F)

### continuous maxPIP in CS
SusieResult_CS <- SusieResult[!is.na(SusieResult$CS), ]

QTL_continuous_PIP.bed <- data.frame(chr = SusieResult_CS$chr, start = SusieResult_CS$SNP_POS - 1, end = SusieResult_CS$SNP_POS, PIP = SusieResult_CS$PIP)
QTL_continuous_PIP.bed$chr <- factor(QTL_continuous_PIP.bed$chr, levels = paste0("chr", 1:22))
QTL_continuous_PIP.bed <- QTL_continuous_PIP.bed[order(QTL_continuous_PIP.bed$chr, QTL_continuous_PIP.bed$start), ]
QTL_continuous_PIP.bed <- unique(QTL_continuous_PIP.bed)
cat(nrow(QTL_continuous_PIP.bed), "SNPs maxPIP in CS \n")
write.table(QTL_continuous_PIP.bed, paste0(dir_annot_bed, "/", QTL_type, "_PIPinCS_", susie_prior, ".bed"), sep = "\t", col.names = F, row.names = F, quote = F)

cat("annotation bed files saved at:", dir_annot_bed, "\n")

Rscript m6AQTL_workflowr/code/prepare_QTL_annot_bedfiles_PIP.R jointPeak_threshold5_MeRIP_HISAT2Map m6APeak_logOR_GC.IP.adjusted_qqnorm 15

Rscript m6AQTL_workflowr/code/prepare_QTL_annot_bedfiles_PIP.R eQTL_Geuvadis fastqtl_qqnorm_RNAseqGeuvadis_phase2 15

Rscript m6AQTL_workflowr/code/prepare_QTL_annot_bedfiles_PIP.R sQTL_Geuvadis fastqtl_qqnorm_ASintron_RNAseqGeuvadis_YangVCF 3

Compute LD scores for binary annotations

source: m6AQTL_workflowr/code/ldsc_binary_annot_QTL.sbatch

#!/bin/bash

source activate ldsc

prefix_annot=$1
dir_LDSC=$2

echo "prefix_annot: ${prefix_annot}"

mkdir -p ${dir_LDSC}/annot/ldscores/${prefix_annot}

module load R/3.5.1

cd ${dir_LDSC}/annot/ldscores/

for chrom in {1..22}
do
echo ${chrom}

## Step 1: Creating an annot file
echo "Make ldsc-friendly annotation files for ${prefix_annot}.bed"
python $HOME/softwares/ldsc/make_annot.py \
--bed-file ${dir_LDSC}/annot/annot_bed/${prefix_annot}.bed \
--bimfile ${dir_LDSC}/LDSCORE/1000G_EUR_Phase3_plink/1000G.EUR.QC.${chrom}.bim \
--annot-file ${dir_LDSC}/annot/ldscores/${prefix_annot}/${prefix_annot}.${chrom}.annot.gz

## Step 2: Computing LD scores with an annot file
echo "Computing LD scores with the annot file ${prefix_annot}.${chrom}.annot.gz"
python $HOME/softwares/ldsc/ldsc.py \
--l2 \
--bfile ${dir_LDSC}/LDSCORE/1000G_EUR_Phase3_plink/1000G.EUR.QC.${chrom} \
--print-snps ${dir_LDSC}/LDSCORE/listHM3.txt \
--ld-wind-cm 1 \
--annot ${dir_LDSC}/annot/ldscores/${prefix_annot}/${prefix_annot}.${chrom}.annot.gz \
--thin-annot \
--out ${dir_LDSC}/annot/ldscores/${prefix_annot}/${prefix_annot}.${chrom}

done

sbatch m6AQTL_workflowr/code/ldsc_binary_annot_QTL.sbatch sig_m6AQTL_qvalue_0.1
sbatch m6AQTL_workflowr/code/ldsc_binary_annot_QTL.sbatch sig_eQTL_qvalue_0.1
sbatch m6AQTL_workflowr/code/ldsc_binary_annot_QTL.sbatch sig_sQTL_qvalue_0.1

Compute LD scores for PIP based continuous annotations

source: m6AQTL_workflowr/code/make_ldsc_continuous_annot.R

## make continuous annotation file from bim file and annotation bed file
args <- commandArgs(trailingOnly = TRUE)
bed_file <- args[1]
bim_file <- args[2]
annot_file <- args[3]
thin_annot <- args[4]

if(is.na(thin_annot) | is.null(thin_annot)){
  thin_annot <- "full"
}

suppressPackageStartupMessages(library(GenomicRanges))

annot_bed <- read.table(bed_file, header = F, stringsAsFactors = F)
colnames(annot_bed) <- c("chr", "start", "end", "value")
annot_bed$chr <- gsub("chr", "", annot_bed$chr)
annot_bed$start <- annot_bed$start + 1

dir_annot_ldsc <- dirname(annot_file)
dir.create(dir_annot_ldsc, showWarnings = F, recursive = T)

# cat("Make LDSC continuous annot for: ", bed_file, "\n")

bim_data <- read.table(bim_file, header = F, stringsAsFactors = F)
colnames(bim_data) <- c("CHR", "SNP", "CM", "BP", "A1", "A2")

annot_ldsc <- bim_data[, c("CHR", "BP", "SNP", "CM")]
annot_ldsc.gr <- makeGRangesFromDataFrame(annot_ldsc, seqnames.field = "CHR", start.field = "BP", end.field = "BP", keep.extra.columns = T)

annot_bed.gr <- makeGRangesFromDataFrame(annot_bed, keep.extra.columns = T)

idx_overlaps <- data.frame(findOverlaps(query = annot_ldsc.gr, subject = annot_bed.gr))

annot_ldsc[, "ANNOT"] <- 0
annot_ldsc[idx_overlaps$queryHits, "ANNOT"] <- annot_bed$value[idx_overlaps$subjectHits]

if(thin_annot == "thin-annot"){
  write.table(data.frame(ANNOT = annot_ldsc[, "ANNOT"]), gzfile(annot_file), col.names = T, row.names = F, quote = F, sep = "\t")
}else{
  write.table(annot_ldsc, gzfile(annot_file), col.names = T, row.names = F, quote = F, sep = "\t")
}

cat("LDSC annotation file saved at: ", annot_file, "\n")

source: m6AQTL_workflowr/code/ldsc_continuous_annot_QTL.sbatch

#!/bin/bash

source activate ldsc

prefix_annot=$1
dir_LDSC=$2

if [ ! -d "${dir_LDSC}" ]
then
dir_LDSC=/project2/xinhe/m6A/ldsc
fi

echo "dir_LDSC: ${dir_LDSC}"

echo "prefix_annot: ${prefix_annot}"

mkdir -p ${dir_LDSC}/annot/ldscores/${prefix_annot}

module load R/3.5.1

cd ${dir_LDSC}/annot/ldscores/

for chrom in {1..22}
do
echo ${chrom}

## Step 1: Creating an annot file
echo "Make ldsc-friendly annotation files for ${prefix_annot}.bed"
Rscript ~/projects/m6A/m6AQTL_workflowr/code/make_ldsc_continuous_annot.R \
${dir_LDSC}/annot/annot_bed/${prefix_annot}.bed \
${dir_LDSC}/LDSCORE/1000G_EUR_Phase3_plink/1000G.EUR.QC.${chrom}.bim \
${dir_LDSC}/annot/ldscores/${prefix_annot}/${prefix_annot}.${chrom}.annot.gz "full-annot"

## Step 2: Computing LD scores with an annot file
echo "Computing LD scores with the annot file ${prefix_annot}.${chrom}.annot.gz"
python $HOME/softwares/ldsc/ldsc.py \
--l2 \
--bfile ${dir_LDSC}/LDSCORE/1000G_EUR_Phase3_plink/1000G.EUR.QC.${chrom} \
--print-snps ${dir_LDSC}/LDSCORE/listHM3.txt \
--ld-wind-cm 1 \
--annot ${dir_LDSC}/annot/ldscores/${prefix_annot}/${prefix_annot}.${chrom}.annot.gz \
--out ${dir_LDSC}/annot/ldscores/${prefix_annot}/${prefix_annot}.${chrom}

done

## m6A-QTL
sbatch m6AQTL_workflowr/code/ldsc_continuous_annot_QTL.sbatch m6AQTL_PIP_Susie_noPrior

## eQTL
sbatch m6AQTL_workflowr/code/ldsc_continuous_annot_QTL.sbatch eQTL_Geuvadis_PIP_Susie_noPrior

## sQTL
sbatch m6AQTL_workflowr/code/ldsc_continuous_annot_QTL.sbatch sQTL_Geuvadis_PIP_Susie_noPrior

Partition heritability

Reference: https://github.com/bulik/ldsc/wiki/Partitioned-Heritability
convert GWAS summary stats to the .sumstats format using munge_sumstats.py

using binary QTL annotations

source: m6AQTL_workflowr/code/sldsc_annot_jointQTLs.sbatch

#!/bin/bash

dir_GWAS=$1
trait=$2
thresh_qvalue=$3
dir_sLDSC_output=$4

prefix_annot_m6AQTL=sig_m6AQTL_qvalue_${thresh_qvalue}
prefix_annot_eQTL=sig_eQTL_qvalue_${thresh_qvalue}
prefix_annot_sQTL=sig_sQTL_qvalue_${thresh_qvalue}

dir_ldsc_annot=/project2/xinhe/m6A/ldsc/annot/ldscores/
dir_baselineLD=/project2/xinhe/m6A/ldsc/LDSCORE/1000G_Phase3_baselineLD_v1.1_ldscores

source activate ldsc

echo "GWAS trait: ${trait}"

############### Separate model ##################
echo "***** Separate model *****"

for prefix_annot in ${prefix_annot_m6AQTL} ${prefix_annot_eQTL} ${prefix_annot_sQTL}
do

echo ${prefix_annot}

## Adjusting gene annotations, MAF and LD related annotations (baseline_gene_MAF_LD)
echo "Adjusting for gene annotations, MAF and LD related annotations (baseline_gene_MAF_LD)"

dir_out=${dir_sLDSC_output}/${trait}/baseline_gene_MAF_LD
mkdir -p ${dir_out}

python $HOME/softwares/ldsc/ldsc.py \
--h2 ${dir_GWAS}/${trait}.sumstats.gz \
--ref-ld-chr ${dir_ldsc_annot}/baseline_gene_MAF_LD/baseline_gene_MAF_LD.,${dir_ldsc_annot}/${prefix_annot}/${prefix_annot}. \
--frqfile-chr /project2/xinhe/m6A/ldsc/LDSCORE/1000G_Phase3_frq/1000G.EUR.QC. \
--w-ld-chr /project2/xinhe/m6A/ldsc/LDSCORE/1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC. \
--overlap-annot --print-cov --print-coefficients --print-delete-vals \
--out ${dir_out}/${trait}_${prefix_annot}_baseline_gene_MAF_LD

done

############### Joint analysis of three types of QTLs ##################
echo "***** Joint analysis of three types of QTLs  *****"

## Adjusting gene annotations, MAF and LD related annotations (baseline_gene_MAF_LD)
echo "Adjusting for gene annotations, MAF and LD related annotations (baseline_gene_MAF_LD)"

dir_out=${dir_sLDSC_output}/${trait}/baseline_gene_MAF_LD
mkdir -p ${dir_out}

python $HOME/softwares/ldsc/ldsc.py \
--h2 ${dir_GWAS}/${trait}.sumstats.gz \
--ref-ld-chr ${dir_ldsc_annot}/baseline_gene_MAF_LD/baseline_gene_MAF_LD.,\
/${dir_ldsc_annot}/${prefix_annot_m6AQTL}/${prefix_annot_m6AQTL}.,\
/${dir_ldsc_annot}/${prefix_annot_eQTL}/${prefix_annot_eQTL}.,\
/${dir_ldsc_annot}/${prefix_annot_sQTL}/${prefix_annot_sQTL}. \
--frqfile-chr /project2/xinhe/m6A/ldsc/LDSCORE/1000G_Phase3_frq/1000G.EUR.QC. \
--w-ld-chr /project2/xinhe/m6A/ldsc/LDSCORE/1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC. \
--overlap-annot --print-cov --print-coefficients --print-delete-vals \
--out ${dir_out}/${trait}_jointQTL_qvalue_${thresh_qvalue}_baseline_gene_MAF_LD


############### Joint with eQTL ##################
echo "***** Joint with eQTL *****"

## Adjusting gene annotations, MAF and LD related annotations (baseline_gene_MAF_LD)
echo "Adjusting for gene annotations, MAF and LD related annotations (baseline_gene_MAF_LD)"

dir_out=${dir_sLDSC_output}/${trait}/baseline_gene_MAF_LD
mkdir -p ${dir_out}

python $HOME/softwares/ldsc/ldsc.py \
--h2 ${dir_GWAS}/${trait}.sumstats.gz \
--ref-ld-chr ${dir_ldsc_annot}/baseline_gene_MAF_LD/baseline_gene_MAF_LD.,\
/${dir_ldsc_annot}/${prefix_annot_m6AQTL}/${prefix_annot_m6AQTL}.,\
/${dir_ldsc_annot}/${prefix_annot_eQTL}/${prefix_annot_eQTL}. \
--frqfile-chr /project2/xinhe/m6A/ldsc/LDSCORE/1000G_Phase3_frq/1000G.EUR.QC. \
--w-ld-chr /project2/xinhe/m6A/ldsc/LDSCORE/1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC. \
--overlap-annot --print-cov --print-coefficients --print-delete-vals \
--out ${dir_out}/${trait}_m6AQTL_eQTL_qvalue_${thresh_qvalue}_baseline_gene_MAF_LD

############### Joint with sQTL ##################
echo "***** Joint with sQTL *****"

## Adjusting gene annotations, MAF and LD related annotations (baseline_gene_MAF_LD)
echo "Adjusting for gene annotations, MAF and LD related annotations (baseline_gene_MAF_LD)"

dir_out=${dir_sLDSC_output}/${trait}/baseline_gene_MAF_LD
mkdir -p ${dir_out}

python $HOME/softwares/ldsc/ldsc.py \
--h2 ${dir_GWAS}/${trait}.sumstats.gz \
--ref-ld-chr ${dir_ldsc_annot}/baseline_gene_MAF_LD/baseline_gene_MAF_LD.,\
/${dir_ldsc_annot}/${prefix_annot_m6AQTL}/${prefix_annot_m6AQTL}.,\
/${dir_ldsc_annot}/${prefix_annot_sQTL}/${prefix_annot_sQTL}. \
--frqfile-chr /project2/xinhe/m6A/ldsc/LDSCORE/1000G_Phase3_frq/1000G.EUR.QC. \
--w-ld-chr /project2/xinhe/m6A/ldsc/LDSCORE/1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC. \
--overlap-annot --print-cov --print-coefficients --print-delete-vals \
--out ${dir_out}/${trait}_m6AQTL_sQTL_qvalue_${thresh_qvalue}_baseline_gene_MAF_LD

echo "Done. Results saved at: ${dir_out}"

using PIP based annotations

source: sldsc_annot_generic_baselineLD_separate.sbatch and sldsc_annot_generic_baselineLD_eQTL_sQTL_PIP.sbatch

#!/bin/bash

dir_GWAS=$1
trait=$2
m6A_annot=$3
eQTL_annot=$4
sQTL_annot=$5
dir_sLDSC_output=$6

dir_ldsc_annot=/project2/xinhe/m6A/ldsc/annot/ldscores/
dir_baselineLD=/project2/xinhe/m6A/ldsc/LDSCORE/1000G_Phase3_baselineLD_v1.1_ldscores

source activate ldsc

echo "GWAS trait: ${trait}"

name_annot=baseline_gene_MAF_LD
echo "Adjusting for gene annotations, MAF and LD related annotations: ${name_annot}"

############### Separate model ##################

dir_out=${dir_sLDSC_output}/${trait}/${name_annot}
mkdir -p ${dir_out}

python $HOME/softwares/ldsc/ldsc.py \
--h2 ${dir_GWAS}/${trait}.sumstats.gz \
--ref-ld-chr ${dir_ldsc_annot}/${name_annot}/${name_annot}.,${dir_ldsc_annot}/${m6A_annot}/${m6A_annot}. \
--frqfile-chr /project2/xinhe/m6A/ldsc/LDSCORE/1000G_Phase3_frq/1000G.EUR.QC. \
--w-ld-chr /project2/xinhe/m6A/ldsc/LDSCORE/1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC. \
--overlap-annot --print-cov --print-coefficients --print-delete-vals \
--out ${dir_out}/${trait}_${m6A_annot}_${name_annot}

############### Joint analysis with eQTLs and sQTLs ##################

echo "Adjusting for ${eQTL_annot} and ${sQTL_annot}"

## Joint with eQTL and sQTL PIP
python $HOME/softwares/ldsc/ldsc.py \
--h2 ${dir_GWAS}/${trait}.sumstats.gz \
--ref-ld-chr ${dir_ldsc_annot}/${name_annot}/${name_annot}.,${dir_ldsc_annot}/${m6A_annot}/${m6A_annot}.,${dir_ldsc_annot}/${eQTL_annot}/${eQTL_annot}.,${dir_ldsc_annot}/${sQTL_annot}/${sQTL_annot}. \
--frqfile-chr /project2/xinhe/m6A/ldsc/LDSCORE/1000G_Phase3_frq/1000G.EUR.QC. \
--w-ld-chr /project2/xinhe/m6A/ldsc/LDSCORE/1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC. \
--overlap-annot --print-cov --print-coefficients --print-delete-vals \
--out ${dir_out}/${trait}_${m6A_annot}_eQTL_sQTL_PIP_${name_annot}

## Joint with eQTL PIP
python $HOME/softwares/ldsc/ldsc.py \
--h2 ${dir_GWAS}/${trait}.sumstats.gz \
--ref-ld-chr ${dir_ldsc_annot}/${name_annot}/${name_annot}.,${dir_ldsc_annot}/${m6A_annot}/${m6A_annot}.,${dir_ldsc_annot}/${eQTL_annot}/${eQTL_annot}. \
--frqfile-chr /project2/xinhe/m6A/ldsc/LDSCORE/1000G_Phase3_frq/1000G.EUR.QC. \
--w-ld-chr /project2/xinhe/m6A/ldsc/LDSCORE/1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC. \
--overlap-annot --print-cov --print-coefficients --print-delete-vals \
--out ${dir_out}/${trait}_${m6A_annot}_eQTL_PIP_${name_annot}

## Joint with sQTL PIP
python $HOME/softwares/ldsc/ldsc.py \
--h2 ${dir_GWAS}/${trait}.sumstats.gz \
--ref-ld-chr ${dir_ldsc_annot}/${name_annot}/${name_annot}.,${dir_ldsc_annot}/${m6A_annot}/${m6A_annot}.,${dir_ldsc_annot}/${sQTL_annot}/${sQTL_annot}. \
--frqfile-chr /project2/xinhe/m6A/ldsc/LDSCORE/1000G_Phase3_frq/1000G.EUR.QC. \
--w-ld-chr /project2/xinhe/m6A/ldsc/LDSCORE/1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC. \
--overlap-annot --print-cov --print-coefficients --print-delete-vals \
--out ${dir_out}/${trait}_${m6A_annot}_sQTL_PIP_${name_annot}

echo "s-LDSC done. Results saved at: ${dir_out}"

sessionInfo()

s-LDSC analysis

Kaixuan Luo

2020-03-17

Download LDSC data

Download the baselineLD model LD scores

Download weights

Download allele frequencies

Download 1000 genomes reference genotypes at hapmap 3 loci

Download HapMap3 SNPs

Download a concatenated list of HapMap3 SNPs

List of HapMap 3 SNPs rsIDs

Download 1000G_Phase3_cell_type_groups

Create annotation files and compute LD scores

Computing LD scores for `baseline_gene_MAF_LD` annotation (based on MAF, LD, genomic annotations)

Prepare QTL annotation bed files

Prepare annotation bed files for PIP based annotations

Compute LD scores for binary annotations

Compute LD scores for PIP based continuous annotations

Partition heritability

using binary QTL annotations

using PIP based annotations

s-LDSC analysis

Kaixuan Luo

2020-03-17

Download LDSC data

Download the baselineLD model LD scores

Download weights

Download allele frequencies

Download 1000 genomes reference genotypes at hapmap 3 loci

Download HapMap3 SNPs

Download a concatenated list of HapMap3 SNPs

List of HapMap 3 SNPs rsIDs

Download 1000G_Phase3_cell_type_groups

Create annotation files and compute LD scores

Computing LD scores for baseline_gene_MAF_LD annotation (based on MAF, LD, genomic annotations)

Prepare QTL annotation bed files

Prepare annotation bed files for PIP based annotations

Compute LD scores for binary annotations

Compute LD scores for PIP based continuous annotations

Partition heritability

using binary QTL annotations

using PIP based annotations

Computing LD scores for `baseline_gene_MAF_LD` annotation (based on MAF, LD, genomic annotations)