Last updated: 2020-03-17

Checks: 6 0

Knit directory: m6AQTL_reproducibleDocument/

This reproducible R Markdown analysis was created with workflowr (version 1.3.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20200317) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Untracked files:
    Untracked:  analysis/Re-analysis_of_Wang_2015_cell.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


There are no past versions. Publish this analysis with wflow_publish() to start tracking its development.


Analysis of m6A consensus motif break and m6AQTL.

library(MeRIPtools)
library(BSgenome.Hsapiens.UCSC.hg19)

LCLs <- readRDS( "~/m6AQTL/peakcalling/MeRIPtools/jointPeak_threshold5_MeRIP.RDS")

m6AQTL <-  readRDS("~/m6AQTL/m6A_QTL_results/linear_model/m6AQTL.m6APeak_logOR_GC.IP.adjusted_qqnorm.15PCs.fastQTL.nominals.rds")
lead_m6AQTL <- readRDS( file = "~/m6AQTL/m6A_QTL_results/linear_model/lead.m6AQTL.m6APeak_logOR_GC.IP.adjusted_qqnorm.15PCs.fastQTL.nominals.rds")

permutation_m6AQTL <-  readRDS("~/m6AQTL/m6A_QTL_results/linear_model/m6AQTL.m6APeak_logOR_GC.IP.adjusted_qqnorm.15PCs.fastQTL.permutations.rds")

sig.m6AQTL <- lead_QTL <- lead_m6AQTL[lead_m6AQTL$PEAK %in% permutation_m6AQTL[permutation_m6AQTL$qvalue<0.1,"PEAK"], ]



m6Apeaks <- jointPeak(LCLs)

LCLs_tested <- filter(LCLs, paste0(m6Apeaks$chr,":",m6Apeaks$start,"-",m6Apeaks$end,"_",m6Apeaks$name,"_",m6Apeaks$strand ) %in% unique(m6AQTL$PEAK) ) 

LCLs_lead <- filter(LCLs, paste0(m6Apeaks$chr,":",m6Apeaks$start,"-",m6Apeaks$end,"_",m6Apeaks$name,"_",m6Apeaks$strand ) %in% unique(lead_QTL$PEAK) )

m6Apeaks <- m6Apeaks[which( paste0(m6Apeaks$chr,":",m6Apeaks$start,"-",m6Apeaks$end,"_",m6Apeaks$name,"_",m6Apeaks$strand ) %in% unique(m6AQTL$PEAK)),]
sig.QTL.peaks <- m6Apeaks[match(unique(lead_QTL$PEAK), paste0(m6Apeaks$chr,":",m6Apeaks$start,"-",m6Apeaks$end,"_",m6Apeaks$name,"_",m6Apeaks$strand) ),1:12]
vcf <- readRDS("~/m6AQTL/variant_data/sixtySample_hg19/MAF_filtered_allChr_vcf.RDS")

sig.m6AQTL.SNP <-  do.call(rbind.data.frame,strsplit(as.character(sig.m6AQTL$SNP) ,split = ":") )

## get significant SNPs in variant-associated peaks
sig.QTL.peaks.gr <- makeGRangesFromDataFrame(sig.QTL.peaks)

sig.m6AQTL.gr <- GRanges(seqnames = sig.m6AQTL.SNP[,1],ranges = IRanges(start = as.numeric(as.character(sig.m6AQTL.SNP[,2])),end = as.numeric(as.character(sig.m6AQTL.SNP[,2])) ) )
peakSNP_overlap <- findOverlaps(sig.m6AQTL.gr,sig.QTL.peaks.gr)


sig.m6AQTL.vcf <- vcf[match(sig.m6AQTL[queryHits(peakSNP_overlap),"SNPID"],vcf$ID),1:5]
colnames(sig.m6AQTL.vcf)[1] <- "CHROM"
## get strand for SNPs
sig.m6AQTL.vcf$strand <- as.character( strand( sig.QTL.peaks.gr[subjectHits(peakSNP_overlap)] ) )
## filter out unknown alleles
sig.m6AQTL.vcf <- sig.m6AQTL.vcf[!duplicated(sig.m6AQTL.vcf$ID) & !sig.m6AQTL.vcf$ALT %in% levels(sig.m6AQTL.vcf$ALT)[grep("<", levels(sig.m6AQTL.vcf$ALT))],]
sig.m6AQTL.vcf <- sig.m6AQTL.vcf[nchar(as.character(sig.m6AQTL.vcf$REF))==1 & nchar(as.character(sig.m6AQTL.vcf$ALT))==1, ]
SusieResult <- readRDS("~/m6AQTL/m6A_QTL_results/linear_model/fineMapping/Susie_noPrior/SusieResult_qvalue_0.1.RDS")
SusieResult <- SusieResult[order(SusieResult$PIP, decreasing = T),]
## get best SNP from each credible set
Susie_leadQTL <- SusieResult[!duplicated(paste(SusieResult$PEAK,SusieResult$CS)) & !is.na(SusieResult$CS),]
sig.m6AQTL <- m6AQTL[match(paste(Susie_leadQTL$SNP,Susie_leadQTL$PEAK), paste(m6AQTL$SNP,m6AQTL$PEAK)),]
lead.m6AQTL.SNP <-  do.call(rbind.data.frame,strsplit(as.character(sig.m6AQTL$SNP) ,split = ":") )


lead.m6AQTL.gr <- GRanges(seqnames = lead.m6AQTL.SNP[,1],ranges = IRanges(start = as.numeric(as.character(lead.m6AQTL.SNP[,2])),end = as.numeric(as.character(lead.m6AQTL.SNP[,2])) ) )
lead_peakSNP_overlap <- findOverlaps(lead.m6AQTL.gr,sig.QTL.peaks.gr)


lead.m6AQTL.vcf <- vcf[match(sig.m6AQTL[queryHits(lead_peakSNP_overlap),"SNPID"],vcf$ID),1:5]
colnames(lead.m6AQTL.vcf)[1] <- "CHROM"
## get strand for SNPs
lead.m6AQTL.vcf$strand <- as.character( strand( sig.QTL.peaks.gr[subjectHits(lead_peakSNP_overlap)] ) )
## filter out unknown alleles
lead.m6AQTL.vcf <- lead.m6AQTL.vcf[!duplicated(lead.m6AQTL.vcf$ID) & !lead.m6AQTL.vcf$ALT %in% levels(lead.m6AQTL.vcf$ALT)[grep("<", levels(lead.m6AQTL.vcf$ALT))],]
## filter for biallelic SNPs
lead.m6AQTL.vcf <- lead.m6AQTL.vcf[nchar(as.character(lead.m6AQTL.vcf$REF))==1 & nchar(as.character(lead.m6AQTL.vcf$ALT))==1, ]
sig.m6AQTL.bed <-  data.frame(chr = sig.m6AQTL.vcf$CHROM, start =sig.m6AQTL.vcf$POS-1, end = sig.m6AQTL.vcf$POS, name = paste( sig.m6AQTL.vcf$CHROM, sig.m6AQTL.vcf$POS, sig.m6AQTL.vcf$REF,  sig.m6AQTL.vcf$ALT,sep = ":"), score = 0, strand = "+" )

write.table(sig.m6AQTL.bed,file = "~/m6AQTL/m6AQTL_analysis/data/sig.m6AQTL_in_m6Apeaks.bed",sep = "\t",col.names = F,row.names = F,quote = F)

## lead SNPs
lead.m6AQTL.bed <-  data.frame(chr = lead.m6AQTL.vcf$CHROM, start =lead.m6AQTL.vcf$POS-1, end = lead.m6AQTL.vcf$POS, name = paste( lead.m6AQTL.vcf$CHROM, lead.m6AQTL.vcf$POS, lead.m6AQTL.vcf$REF,  lead.m6AQTL.vcf$ALT,sep = ":"), score = 0, strand = "+" )

write.table(lead.m6AQTL.bed,file = "~/m6AQTL/m6AQTL_analysis/data/sig.m6AQTL_lead_in_m6Apeaks.bed",sep = "\t",col.names = F,row.names = F,quote = F)
library(motifbreakR)
library(SNPlocs.Hsapiens.dbSNP144.GRCh37)
library(BSgenome.Hsapiens.UCSC.hg19)

snps.mb <- snps.from.file( file = "~/m6AQTL/m6AQTL_analysis/data/sig.m6AQTL_in_m6Apeaks.bed", dbSNP = SNPlocs.Hsapiens.dbSNP144.GRCh37 ,search.genome = BSgenome.Hsapiens.UCSC.hg19, format = "bed" )


result <- motifbreakR(snpList = snps.mb, filterp = TRUE,
                       pwmList = MotifList_m6A,
                       threshold = 0.05,
                       method = "ic",
                       bkg = c(A=0.25, C=0.25, G=0.25, T=0.25),
                       BPPARAM = BiocParallel::MulticoreParam(workers = 30) )

result$m6A_strand <- sig.m6AQTL.vcf$strand[match(names(result), paste( sig.m6AQTL.vcf$CHROM, sig.m6AQTL.vcf$POS, sig.m6AQTL.vcf$REF,  sig.m6AQTL.vcf$ALT,sep = ":") )]
## filter for strand of motif match
result <- result[ as.character(strand(result)) == result$m6A_strand ]
## filter for minimal motif match score
#result <- result[  result$scoreRef >4 | result$scoreAlt > 4 ]

saveRDS(result, file = "~/m6AQTL/m6A_QTL_results/linear_model/Homer2/motifBreakR_match.RDS")
snps.mb <- snps.from.file( file = "~/m6AQTL/m6AQTL_analysis/data/sig.m6AQTL_lead_in_m6Apeaks.bed", dbSNP = SNPlocs.Hsapiens.dbSNP144.GRCh37 ,search.genome = BSgenome.Hsapiens.UCSC.hg19, format = "bed" )


result <- motifbreakR(snpList = snps.mb, filterp = TRUE,
                       pwmList = MotifList_m6A,
                       threshold = 0.05,
                       method = "ic",
                       bkg = c(A=0.25, C=0.25, G=0.25, T=0.25),
                       BPPARAM = BiocParallel::MulticoreParam(workers = 5) )

result$m6A_strand <- lead.m6AQTL.vcf$strand[match(names(result), paste( lead.m6AQTL.vcf$CHROM, lead.m6AQTL.vcf$POS, lead.m6AQTL.vcf$REF,  lead.m6AQTL.vcf$ALT,sep = ":") )]
## filter for strand of motif match
result <- result[ as.character(strand(result)) == result$m6A_strand ]
## filter for minimal motif match score
#result <- result[  result$scoreRef >4 | result$scoreAlt > 4 ]

saveRDS(result, file = "~/m6AQTL/m6A_QTL_results/linear_model/Homer2/leadQTL_motifBreakR_match.RDS")

Plot match GGACU Motif

library(ggpmisc)
library(ggpubr)
motif1_match <- readRDS("~/m6AQTL/m6A_QTL_results/linear_model/Homer2/motifBreakR_match.RDS")
motif1_match <- motif1_match[motif1_match$providerName == "m6A_Concensus_motif1" ]


m6AQTL <- m6AQTL[order(abs(m6AQTL$DIST)),]
motif1_match$QTL_effect <- m6AQTL$beta[ match( gsub('.{4}$', '', names(motif1_match) ), m6AQTL$SNP) ]
motif1_match$QTL_distance <- m6AQTL$DIST[ match( gsub('.{4}$', '', names(motif1_match) ), m6AQTL$SNP) ]
motif1_match <- motif1_match[ abs(motif1_match$QTL_distance) <200 ]


motif1_match_df <- as.data.frame(motif1_match)

## plot in discrete version
#motif1_match_df <- motif1_match_df[motif1_match_df$effect == "strong",]
motif1_match_df$motifBreak <- ifelse(motif1_match_df$scoreAlt - motif1_match_df$scoreRef > 0, "Create", "Break")

motif1_match_df$markPoints <- TRUE

motif1_match_df$markPoints[229] <-FALSE


ggplot(motif1_match_df,aes(x=motifBreak,y=QTL_effect))+geom_violin(trim = F,fill = "grey80")+ggtitle("GGACU motif")+ xlab("Effect of SNP on motif ")+ ylab("Effect of SNP on m6A (beta)") +geom_boxplot(aes(colour = motifBreak),width=0.2,notch = T,outlier.shape = NA)+ geom_jitter(aes(colour = markPoints),  width = 0.05) + geom_hline(aes(yintercept = 0), colour="#990000", linetype="dashed") +theme_classic() + theme(axis.line = element_line(colour = "black",size = I(0.5)),axis.title = element_text(size = 18, face = "bold"),axis.text = element_text(colour  = "black", face = "bold", size = 16), plot.title = element_text(family = "arial", face = "bold", size = 20, hjust = 0.5), legend.position = "none" ) +scale_color_manual(values = c("TRUE" = "black", "FALSE" = "red","Break"="#F8766D","Create"="#00BFC4"))

print("P-value:")

print(t.test(motif1_match_df$QTL_effect[motif1_match_df$motifBreak == "Break"],motif1_match_df$QTL_effect[motif1_match_df$motifBreak == "Create"])$p.value)

Plot match GGACU Motif (lead SNPs only)

lead_motif1_match <- readRDS("~/m6AQTL/m6A_QTL_results/linear_model/Homer2/leadQTL_motifBreakR_match.RDS")
lead_motif1_match <- lead_motif1_match[lead_motif1_match$providerName == "m6A_Concensus_motif1" ]

SusieResult <- readRDS("~/m6AQTL/m6A_QTL_results/linear_model/fineMapping/Susie_noPrior/SusieResult_qvalue_0.1.RDS")
SusieResult <- SusieResult[order(SusieResult$PIP, decreasing = T),]
## get best SNP from each credible set
Susie_leadQTL <- SusieResult[!duplicated(paste(SusieResult$PEAK,SusieResult$CS)) & !is.na(SusieResult$CS),]
sig.m6AQTL <- m6AQTL[match(paste(Susie_leadQTL$SNP,Susie_leadQTL$PEAK), paste(m6AQTL$SNP,m6AQTL$PEAK)),]

sig.m6AQTL <- sig.m6AQTL[order(abs(sig.m6AQTL$DIST)),]
lead_motif1_match$QTL_effect <- sig.m6AQTL$beta[ match( gsub('.{4}$', '', names(lead_motif1_match) ), sig.m6AQTL$SNP) ]
lead_motif1_match$QTL_distance <- sig.m6AQTL$DIST[ match( gsub('.{4}$', '', names(lead_motif1_match) ), sig.m6AQTL$SNP) ]

lead_motif1_match <- lead_motif1_match[ abs(lead_motif1_match$QTL_distance) <200 ]


lead_motif1_match_df <- as.data.frame(lead_motif1_match)

## plot in discrete version
lead_motif1_match_df <- lead_motif1_match_df[lead_motif1_match_df$effect == "strong",]
lead_motif1_match_df$motifBreak <- ifelse(lead_motif1_match_df$scoreAlt - lead_motif1_match_df$scoreRef > 0, "Create", "Break")

ggplot(lead_motif1_match_df,aes(x=motifBreak,y=QTL_effect))+geom_violin(trim = F,fill = "grey80")+ggtitle("GGACU motif")+ xlab("Effect of SNP on motif ")+ ylab("Effect of SNP on m6A (beta)") +geom_boxplot(aes(colour = motifBreak),width=0.15,notch = F,outlier.shape = NA)+ geom_jitter(width = 0.05) + geom_hline(aes(yintercept = 0), colour="#990000", linetype="dashed") +theme_classic() + theme(axis.line = element_line(colour = "black",size = I(0.5)),axis.title = element_text(size = 18, face = "bold"),axis.text = element_text(colour  = "black", face = "bold", size = 16), plot.title = element_text(family = "arial", face = "bold", size = 20, hjust = 0.5), legend.position = "none" )
print("P-value:")
print(t.test(lead_motif1_match_df$QTL_effect[lead_motif1_match_df$motifBreak == "Break"],lead_motif1_match_df$QTL_effect[lead_motif1_match_df$motifBreak == "Create"])$p.value)

Plot example

LCLs <- PrepCoveragePlot(LCLs)
LCLs <- normalizeLibrary(LCLs)
plotSNPpeakPairs(LCLs,genotypeFile =  "~/m6AQTL/variant_data/sixtySample_hg19/Imputed.sixty.hg19.chr21.vcf.gz",SNPID = "rs58559714",
                 geneName = "PCNT",
                 libraryType = "opposite",
                 center = mean, ZoomIn = c(47831011,47831859),adjustExprLevel = F)
library(motifbreakR)
library(BSgenome.Hsapiens.UCSC.hg19)
plotMB(results = motif1_match, rsid = "chr21:47831309:G:A", effect = "strong" )

Analysis of CLIP-seq RBP motif break and m6AQTL.

library(motifbreakR)
library(Logolas)
library(MeRIPtools)

MotifList_CISBP <- readRDS( file = "~/m6AQTL/RBP_motif/CISBP_MotifDb.RDS")
MotifList_encode <- readRDS( file = "~/m6AQTL/RBP_motif/Encode_MotifDb.RDS")

Test Motif breaks for QTNs (PIP>0.5 or highest PIP in a peak ) w/o prior fine-mapping

vcf <- readRDS("~/m6AQTL/variant_data/sixtySample_hg19/MAF_filtered_allChr_vcf.RDS")
m6AQTL <-  readRDS("~/m6AQTL/m6A_QTL_results/linear_model/fineMapping/Susie_noPrior/SusieResult_qvalue_0.1.RDS")
m6AQTL <- m6AQTL[order(m6AQTL$PIP, decreasing = T),]
## get best SNP from each credible set
sig.m6AQTL <- m6AQTL[!duplicated(paste(m6AQTL$PEAK,m6AQTL$CS)) & !is.na(m6AQTL$CS),]
### get top SNP per peak
#sig.m6AQTL <- m6AQTL[m6AQTL$PIP>0.5 | !duplicated(m6AQTL$PEAK) ,]

LCLs <- readRDS("~/m6AQTL/peakcalling/MeRIPtools/jointPeak_threshold5_MeRIP.RDS")
m6Apeaks <- jointPeak(LCLs)
m6Apeaks <- m6Apeaks[which(paste0(m6Apeaks$chr,":",m6Apeaks$start,"-",m6Apeaks$end,"_",m6Apeaks$name,"_",m6Apeaks$strand ) %in% unique(m6AQTL$PEAK)),]
sig.m6AQTL$peakGene <- unlist(lapply(strsplit(as.character(sig.m6AQTL$PEAK), split = "_"),function(x) x[2]))

sig.m6AQTL_SNP.gr <- GRanges(unlist(lapply(strsplit(as.character(sig.m6AQTL$SNP),split = ":"),function(x){x[1]})),
                         IRanges(as.numeric(unlist(lapply(strsplit(as.character(sig.m6AQTL$SNP),split = ":"),function(x){x[2]}))),
                                 as.numeric(unlist(lapply(strsplit(as.character(sig.m6AQTL$SNP),split = ":"),function(x){x[2]})))) )

## Filter for on transcipt SNPs that act on peaks of the same transcript
## get significant SNPs in transcribed regions of the associated peak bearing gene.
geneModel <- MyTools::gtfToGeneModel("~/Database/genome/hg19/hg19_UCSC.gtf")
transcribed_region <- range(geneModel)

sigQTL_match <- findOverlaps(sig.m6AQTL_SNP.gr,transcribed_region[sig.m6AQTL$peakGene ], ignore.strand = F)
sig.m6AQTL_onTrans <- sig.m6AQTL[queryHits( sigQTL_match[queryHits(sigQTL_match) == subjectHits(sigQTL_match)] ), ]
sig.m6AQTL_onTansSNP.gr <- sig.m6AQTL_SNP.gr[ queryHits( sigQTL_match[queryHits(sigQTL_match) == subjectHits(sigQTL_match)] )]

sig.m6AQTL_onTrans.vcf <- vcf[match(sig.m6AQTL_onTrans$SNP, paste(vcf$CHROM,vcf$POS,sep=":") ),1:5]
## get strand for SNPs
sig.m6AQTL_onTrans.vcf$strand <- as.character(  unlist(lapply(strsplit(as.character(sig.m6AQTL_onTrans$PEAK), split = "_"),function(x) x[3])) )
## filter out unknown alleles
sig.m6AQTL_onTrans.vcf <- sig.m6AQTL_onTrans.vcf[!duplicated(sig.m6AQTL_onTrans.vcf$ID) & !sig.m6AQTL_onTrans.vcf$ALT %in% levels(sig.m6AQTL_onTrans.vcf$ALT)[grep("<", levels(sig.m6AQTL_onTrans.vcf$ALT))],]
## filter biallelic SNPs
sig.m6AQTL_onTrans.vcf <- sig.m6AQTL_onTrans.vcf[nchar(as.character(sig.m6AQTL_onTrans.vcf$REF))==1 & nchar(as.character(sig.m6AQTL_onTrans.vcf$ALT))==1, ]

sig.m6AQTL_onTrans.bed <-  data.frame(chr = sig.m6AQTL_onTrans.vcf$CHROM, start =sig.m6AQTL_onTrans.vcf$POS-1, end = sig.m6AQTL_onTrans.vcf$POS, name = paste( sig.m6AQTL_onTrans.vcf$CHROM, sig.m6AQTL_onTrans.vcf$POS, sig.m6AQTL_onTrans.vcf$REF,  sig.m6AQTL_onTrans.vcf$ALT,sep = ":"), score = 0, strand = "+" )
write.table(sig.m6AQTL_onTrans.bed,file = "~/m6AQTL/m6AQTL_analysis/data/sig.m6AQTLsusie_on_transcript.bed",sep = "\t",col.names = F,row.names = F,quote = F)
library(motifbreakR)
library(SNPlocs.Hsapiens.dbSNP144.GRCh37)
library(BSgenome.Hsapiens.UCSC.hg19)

snps.mb <- snps.from.file( file = "~/m6AQTL/m6AQTL_analysis/data/sig.m6AQTLsusie_on_transcript.bed", dbSNP = SNPlocs.Hsapiens.dbSNP144.GRCh37 ,search.genome = BSgenome.Hsapiens.UCSC.hg19, format = "bed" )

result <- motifbreakR(snpList = snps.mb, filterp = TRUE,
                       pwmList = c(MotifList_CISBP,MotifList_encode),
                       threshold = 1e-3,
                       method = "ic",
                       bkg = c(A=0.25, C=0.25, G=0.25, T=0.25),
                       BPPARAM = BiocParallel::MulticoreParam(workers = 15) )


saveRDS(result, file = "~/m6AQTL/RBP_motif/sig.m6AQTL_SusieUniPrior_motifBreakR_match_RBP.RDS")

Compute correlation for all SNPs that act on its host gene.

library(annotatr)
library(MyTools)
library(motifbreakR)
library(ggpmisc)
library(ggpubr)

motifBreak <- readRDS("~/m6AQTL/RBP_motif/sig.m6AQTL_SusieUniPrior_motifBreakR_match_RBP.RDS")
motifBreak <-motifBreak[which(! duplicated(paste0(names(motifBreak),motifBreak$providerName)) )]
RBPs <- as.character(unique(motifBreak$geneSymbol))
cat(paste0("There are motif (break/create) match on significant m6AQTLs for ",length(RBPs), " RBPs \n"))
cat(paste0("There are motif (break/create) match on significant m6AQTLs for ",length(unique(motifBreak$providerName)), " motifs \n"))


m6AQTL <-   readRDS("~/m6AQTL/m6A_QTL_results/linear_model/fineMapping/Susie_noPrior/SusieResult_qvalue_0.1.RDS")
m6AQTL <- m6AQTL[order(m6AQTL$PIP, decreasing = T),]
sig.m6AQTL <- m6AQTL[m6AQTL$PIP>0.5 | !duplicated(m6AQTL$PEAK) ,]

LCLs <- readRDS( "~/m6AQTL/peakcalling/MeRIPtools/jointPeak_threshold5_MeRIP.RDS")
m6Apeaks <- jointPeak(LCLs)
m6Apeaks <- m6Apeaks[which(paste0(m6Apeaks$chr,":",m6Apeaks$start,"-",m6Apeaks$end,"_",m6Apeaks$name,"_",m6Apeaks$strand ) %in% unique(m6AQTL$PEAK)),]
sig.m6AQTL$peakGene <- unlist(lapply(strsplit(as.character(sig.m6AQTL$PEAK), split = "_"),function(x) x[2]))

sig.m6AQTL_SNP.gr <- GRanges(unlist(lapply(strsplit(as.character(sig.m6AQTL$SNP),split = ":"),function(x){x[1]})),
                         IRanges(as.numeric(unlist(lapply(strsplit(as.character(sig.m6AQTL$SNP),split = ":"),function(x){x[2]}))),
                                 as.numeric(unlist(lapply(strsplit(as.character(sig.m6AQTL$SNP),split = ":"),function(x){x[2]})))) )

## Filter for on transcipt SNPs that act on peaks of the same transcript
## get significant SNPs in transcribed regions of the associated peak bearing gene.
geneModel <- MyTools::gtfToGeneModel("~/Database/genome/hg19/hg19_UCSC.gtf")
transcribed_region <- range(geneModel)

sigQTL_match <- findOverlaps(sig.m6AQTL_SNP.gr,transcribed_region[sig.m6AQTL$peakGene ], ignore.strand = F)
sig.m6AQTL_onTrans <- sig.m6AQTL[queryHits( sigQTL_match[queryHits(sigQTL_match) == subjectHits(sigQTL_match)] ), ]
sig.m6AQTL_onTrans$strand <- as.character(  unlist(lapply(strsplit(as.character(sig.m6AQTL_onTrans$PEAK), split = "_"),function(x) x[3])) )

onTransPeakRange <- as.character(  unlist(lapply(strsplit(as.character(sig.m6AQTL_onTrans$PEAK), split = "_"),function(x) x[1])) )
onTransPeakRange <- as.character(  unlist(lapply(strsplit(onTransPeakRange, split = ":"),function(x) x[2])) )
sig.m6AQTL_onTrans$peakStart <- as.numeric(  unlist(lapply(strsplit(onTransPeakRange, split = "-"),function(x) x[1])) )
sig.m6AQTL_onTrans$peakEnd <- as.numeric(  unlist(lapply(strsplit(onTransPeakRange, split = "-"),function(x) x[2])) )

sig.m6AQTL_onTrans.SNP <-  do.call(rbind.data.frame,strsplit(as.character(sig.m6AQTL_onTrans$SNP) ,split = ":") )
sig.QTL.peaks <- m6Apeaks[which(paste0(m6Apeaks$chr,":",m6Apeaks$start,"-",m6Apeaks$end,"_",m6Apeaks$name,"_",m6Apeaks$strand ) %in% unique(sig.m6AQTL$PEAK)),]
sig.QTL.peaks.gr <- makeGRangesFromDataFrame(sig.QTL.peaks)
## extend the peak by 1000bps
start(sig.QTL.peaks.gr) <- start(sig.QTL.peaks.gr) -1000
end(sig.QTL.peaks.gr) <- end(sig.QTL.peaks.gr) +1000
sig.m6AQTL_onTrans_nearPeak <- sig.m6AQTL_onTrans[unique(queryHits( findOverlaps( GRanges(seqnames = sig.m6AQTL_onTrans.SNP[,1],ranges = IRanges(start = as.numeric(as.character(sig.m6AQTL_onTrans.SNP[,2])),end = as.numeric(as.character(sig.m6AQTL_onTrans.SNP[,2])) ) ) , sig.QTL.peaks.gr) ) ), ]

Motifs <- unique(motifBreak$providerName)

require RBP binds <500 bp near peaks boundary

RG_pvalue_motif <- sapply(Motifs[grep("Encode_homer", Motifs)] , function(motif){
  tmp_motifBreak <- motifBreak[motifBreak$providerName == motif ]
  rbp <- unique(tmp_motifBreak$geneSymbol)
  tmp_QTL <- sig.m6AQTL_onTrans[which(sig.m6AQTL_onTrans$SNP %in% gsub('.{4}$', '', names(tmp_motifBreak) ) ) ,]
  
  matchID <- match(tmp_QTL$SNP, gsub('.{4}$', '', names(tmp_motifBreak) ))
  tmp_QTL$motif <-  tmp_motifBreak$providerName[matchID]
  tmp_QTL$RBP <- tmp_motifBreak$geneSymbol[matchID]
  tmp_QTL$deltaBind <- tmp_motifBreak$scoreAlt[matchID] - tmp_motifBreak$scoreRef[matchID]
  tmp_QTL$bindEffect <- tmp_motifBreak$effect[matchID] 
  tmp_QTL$bindStrand <- as.character( strand(tmp_motifBreak)[matchID] )
  
  tmp_QTL <- tmp_QTL[tmp_QTL$strand == tmp_QTL$bindStrand,]
  tmp_QTL$SNP.POS <- as.numeric(  unlist(lapply(strsplit(as.character(tmp_QTL$SNP), split = ":"),function(x) x[2])) )
  
  tmp_QTL <- tmp_QTL[(tmp_QTL$SNP.POS > (tmp_QTL$peakStart - 500) & tmp_QTL$SNP.POS < (tmp_QTL$peakStart + 500) ) | 
                       (tmp_QTL$SNP.POS > (tmp_QTL$peakEnd - 500) & tmp_QTL$SNP.POS < (tmp_QTL$peakEnd + 500) ),]
  
  regression.p <-  if( nrow(tmp_QTL)>= 5 ){ summary( lm(beta ~ deltaBind, data = tmp_QTL ) )$coef[2,4]}else{NA} 
  
  return( min(regression.p) )
  
})

## reorder according to regression p-value
RG_pvalue_motif <- sort(RG_pvalue_motif)

cat(paste("Regression performed on",length(RG_pvalue_motif[grep("Encode",names(RG_pvalue_motif))]) ),"motifs that has more than 5 data points.\n")

cat("FDR computed by Storey's qvalue method (top RBPs):\n")
print(head(qvalue(RG_pvalue_motif[grep("Encode",names(RG_pvalue_motif))], lambda = 0)$qvalue))
for( motif in names(RG_pvalue_motif[RG_pvalue_motif<0.1]) ){
  
  tmp_motifBreak <- motifBreak[motifBreak$providerName == motif ]
  rbp <- unique(tmp_motifBreak$geneSymbol)
  tmp_QTL <- sig.m6AQTL_onTrans[which(sig.m6AQTL_onTrans$SNP %in% gsub('.{4}$', '', names(tmp_motifBreak) ) ) ,]

  matchID <- match(tmp_QTL$SNP, gsub('.{4}$', '', names(tmp_motifBreak) ))
  tmp_QTL$motif <-  tmp_motifBreak$providerName[matchID]
  tmp_QTL$RBP <- tmp_motifBreak$geneSymbol[matchID]
  tmp_QTL$deltaBind <- tmp_motifBreak$scoreAlt[matchID] - tmp_motifBreak$scoreRef[matchID]
  tmp_QTL$bindEffect <- tmp_motifBreak$effect[matchID] 
  tmp_QTL$bindStrand <- as.character( strand(tmp_motifBreak)[matchID] )
  
  tmp_QTL <- tmp_QTL[tmp_QTL$strand == tmp_QTL$bindStrand,]
  tmp_QTL$SNP.POS <- as.numeric(  unlist(lapply(strsplit(as.character(tmp_QTL$SNP), split = ":"),function(x) x[2])) )
  
  tmp_QTL <- tmp_QTL[(tmp_QTL$SNP.POS > (tmp_QTL$peakStart - 500) & tmp_QTL$SNP.POS < (tmp_QTL$peakStart + 500) ) | 
                       (tmp_QTL$SNP.POS > (tmp_QTL$peakEnd - 500) & tmp_QTL$SNP.POS < (tmp_QTL$peakEnd + 500) ),]
  
  tmp_QTL$motifBreak <-  ifelse(tmp_QTL$deltaBind>0, "Create", "Break")
  
  
  pp <- ggplot(tmp_QTL , aes(x = deltaBind, y = beta) )+geom_point()+ggtitle(motif)+ylab("m6A-QTL effect size") + xlab("Motif match score (Alt - Ref)") +stat_smooth(method = 'lm')+ geom_hline(aes(yintercept = 0), colour="#990000", linetype="dashed") +theme_classic() + theme(axis.line = element_line(colour = "black",size = I(0.5)),axis.title = element_text(family = "arial", face = "bold"),axis.text = element_text(family = "arial", face = "bold", size = 12), plot.title = element_text(family = "arial", face = "bold", size = 14, hjust = 0.5), legend.position = "none" )+stat_poly_eq(aes(label = paste(..rr.label..)), label.x.npc = "left", label.y.npc = 0.98, formula = y~x, parse = TRUE, size = 5)+stat_fit_glance(method = "lm", 
                  method.args = list(formula = y~x),
                  label.x = "left",
                  label.y = 0.9,
                  aes(label = paste("italic(P)*\"-value = \"*", 
                                    signif(..p.value.., digits = 3), sep = "")), size = 5,
                  parse = TRUE)
  
   
ppZ <- ggplot(tmp_QTL , aes(x = deltaBind, y = beta/std.err ) )+geom_point()+ggtitle(motif)+ylab("m6A-QTL effect Z-score") + xlab("Motif match score (Alt - Ref)") +stat_smooth(method = 'lm')+ geom_hline(aes(yintercept = 0), colour="#990000", linetype="dashed") +theme_classic() + theme(axis.line = element_line(colour = "black",size = I(0.5)),axis.title = element_text(family = "arial", face = "bold"),axis.text = element_text(family = "arial", face = "bold", size = 12), plot.title = element_text(family = "arial", face = "bold", size = 14, hjust = 0.5), legend.position = "none" )+stat_poly_eq(aes(label = paste(..rr.label..)), label.x.npc = "left", label.y.npc = 0.98, formula = y~x, parse = TRUE, size = 5)+stat_fit_glance(method = "lm", 
                  method.args = list(formula = y~x),
                  label.x = "left",
                  label.y = 0.9,
                  aes(label = paste("italic(P)*\"-value = \"*", 
                                    signif(..p.value.., digits = 3), sep = "")), size = 5,
                  parse = TRUE)

ppDiscreate <- ggplot(tmp_QTL ,aes(x=motifBreak,y=beta))+geom_violin(trim = F,fill = "grey80")+ggtitle(paste0("Effec on  ",motif))+ xlab("Effect of SNP on motif ")+ ylab("Effect of SNP on m6A (log fold change)") + geom_boxplot(aes(colour = motifBreak),width=0.2,outlier.shape = NA)+ geom_jitter(width = 0.05) + geom_hline(aes(yintercept = 0), colour="#990000", linetype="dashed") +theme_classic() + theme(axis.line = element_line(colour = "black",size = I(0.5)),axis.title = element_text(family = "arial", face = "bold"),axis.text = element_text(family = "arial", face = "bold", size = 12), plot.title = element_text(family = "arial", face = "bold", size = 14, hjust = 0.5), legend.position = "none" )+stat_compare_means(comparisons = list(c("Create","Break")),method = "t.test")
  
gridExtra::grid.arrange(pp, ppZ, ppDiscreate, ncol = 3)
  
 
}

sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 17.10

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] workflowr_1.3.0 Rcpp_1.0.1      digest_0.6.18   rprojroot_1.3-2
 [5] backports_1.1.4 git2r_0.25.2    magrittr_1.5    evaluate_0.13  
 [9] stringi_1.4.3   fs_1.3.0        rmarkdown_1.12  tools_3.5.3    
[13] stringr_1.4.0   glue_1.3.1      xfun_0.6        yaml_2.2.0     
[17] compiler_3.5.3  htmltools_0.3.6 knitr_1.22