Last updated: 2020-03-17
Checks: 7 0
Knit directory: m6AQTL_reproducibleDocument/
This reproducible R Markdown analysis was created with workflowr (version 1.6.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20200317)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rproj.user/
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view them.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | c26375b | kevinlkx | 2020-03-17 | wflow_publish(“analysis/m6AQTL_mapping.Rmd”) |
html | fbf4376 | scottzijiezhang | 2020-03-17 | Build site. |
Rmd | bb81e9f | scottzijiezhang | 2020-03-17 | wflow_publish(c(“WASP_mapping.Rmd”, “peak_calling.Rmd”, “m6AQTL_mapping.Rmd”, “Analysis_of_m6AQTL.Rmd”, “Annotating_m6AQTL_enrichment.Rmd”, |
Preprocess m6A phenotype: Extract peak logOR, adjust PC and IP efficiency, perform quantile normalization
## source: m6AQTL_workflowr/code/prepare_fastQTL_input_peak_logOR_adjust.GC.IP_qqnorm_PCs.R
library(MeRIPtools)
library("BSgenome.Hsapiens.UCSC.hg19")
options(scipen=999) ## suppress scientific notations
m6A_version <- "jointPeak_threshold5_MeRIP_HISAT2Map"
num_tasks_per_node <- 10
## standardization and quantile normalization
normalize_phenotype <- function(phenotype.m, zscore_standardize = TRUE, quantile_normalize = TRUE, plot = FALSE){
colnames_phenotypes <- colnames(phenotype.m)
phenotype.m <- as.matrix(phenotype.m)
if(zscore_standardize == T){
## Standardize all measurements by gene.
## center and scale the phenotype measurements across individuals
## by removing the sample mean and dividing by the sample standard deviation.
## scale: center and scale the columns of a numeric matrix
cat("Center and scale standardization ... \n")
phenotype.m <- t(scale(t(phenotype.m), center = T, scale = T))
if(plot){
hist(phenotype.m, xlab = paste('z-score'), main = "Distribution after Z-score standardization")
}
}
if(quantile_normalize == T){
## Quantile-normalization to fit a standard normal distribution by individual.
cat("Quantile normalization ... \n")
# library(preprocessCore) ## for quantile normalization
# phenotype.m <- normalize.quantiles.use.target(phenotype.m, target = qnorm(ppoints(nrow(phenotype.m))))
phenotype.m <- apply(phenotype.m, 2, function(x) qqnorm(x, plot.it=FALSE)$x )
if(plot){
hist(phenotype.m, main = paste('Distribution after quantile normalization'))
qqnorm(phenotype.m, main = "After quantile normalization"); qqline(phenotype.m, col = 2)
}
}
colnames(phenotype.m) <- colnames_phenotypes
return(phenotype.m)
}
MeRIPdata <- readRDS(paste0("/project2/xinhe/m6A/m6A_seq/m6A_QTL/peakcalling/", m6A_version, "/", m6A_version, ".RDS"))
vcf_file <- "/project2/xinhe/m6A/sixtySample_hg19/Imputed.sixty.hg19.chr21.vcf.gz" ## this vcf file was simply used to get sample names
## get peak logOR
cat("get peak logOR ... \n")
peak_logOR_MeRIPdata <- get_peak_logOR(MeRIPdata, vcf_file, BSgenome.Hsapiens.UCSC.hg19, AdjustGC = TRUE, AdjIPeffi = TRUE, thread = num_tasks_per_node)
saveRDS(peak_logOR_MeRIPdata, paste0("/project2/xinhe/m6A/m6A_seq/m6A_QTL/peakcalling/", m6A_version, "/", "peak_logOR_MeRIPdata.rds"))
peak_logOR <- peak_logOR_MeRIPdata$logOR
MeRIPdata <- peak_logOR_MeRIPdata$MeRIPdata
## m6APeak_logOR_GC.IP.adjusted with qqnorm
phenotype_name <- "m6APeak_logOR_GC.IP.adjusted_qqnorm"
### normalize phenotypes
cat("Normalize phenotypes...\n")
peak_logOR.m <- as.matrix(peak_logOR[,-c(1:4)])
peak_logOR_normalized.m <- normalize_phenotype(peak_logOR.m, zscore_standardize = T, quantile_normalize = T)
peak_logOR_normalized <- data.frame(peak_logOR[,1:4], peak_logOR_normalized.m, check.names = F)
dir_output <- paste0("/project2/xinhe/m6A/m6A_seq/m6A_QTL/peakcalling/", m6A_version, "/", phenotype_name)
dir.create(dir_output, showWarnings = F, recursive = T)
## Compute phenotype PCs
cat("Computing Principal components...\n")
idx_peaks_filtered <- apply(extractIP(MeRIPdata),1,function(x) all(x!=0))
peak_logOR_normalized_filtered.m <- peak_logOR_normalized.m[idx_peaks_filtered, ]
saveRDS(peak_logOR_normalized_filtered.m, paste0(dir_output, "/", phenotype_name, ".filtered_matrix.rds"))
PCs <- prcomp( t( peak_logOR_normalized_filtered.m), center = T, scale = T )$x
dir.create(paste0(dir_output,"/PCs/"), showWarnings = F, recursive = T)
for(PCsToInclude in 1:30){
PCs_included <- data.frame(ID = paste0("PC", 1:PCsToInclude), t(PCs[, 1:PCsToInclude]), check.names = FALSE)
write.table(PCs_included, gzfile(paste0(dir_output, "/PCs/", PCsToInclude, "PCs.txt.gz")),
sep = "\t", quote = FALSE, col.names = TRUE, row.names = FALSE)
}
## Save to fastQTL format, use peak midpoint position
peak_logOR_normalized_sorted <- peak_logOR_normalized
peak_mid <- round((peak_logOR_normalized$start + peak_logOR_normalized$end)/2)
peak_logOR_normalized_sorted$start <- peak_mid - 1
peak_logOR_normalized_sorted$end <- peak_mid
peak_logOR_normalized_sorted$chr <- factor(peak_logOR_normalized_sorted$chr, levels = paste0("chr", c(1:22, "X", "Y", "M")))
peak_logOR_normalized_sorted <- peak_logOR_normalized_sorted[with(peak_logOR_normalized_sorted, order(chr, start, end)), ]
colnames(peak_logOR_normalized_sorted)[1] <- paste0("#", colnames(peak_logOR_normalized_sorted)[1])
write.table(peak_logOR_normalized_sorted, paste0(dir_output, "/", phenotype_name, ".fastQTL.txt"),
sep = "\t", quote = FALSE, col.names = TRUE, row.names = FALSE)
cat(phenotype_name, "results saved at: ", dir_output, "\n")
setwd(dir_output)
system(paste("bgzip", paste0(phenotype_name, ".fastQTL.txt"), "&& tabix -p bed", paste0(phenotype_name, ".fastQTL.txt.gz")))
cat("done. \n")
## source: ~/projects/m6A/m6AQTL_workflowr/code/fastQTL_nominal_m6APeak_logOR_PCs.sbatch
## map m6A-QTLs using fastQTL nominal pass
## parameters
m6A_version=jointPeak_threshold5_MeRIP_HISAT2Map
PHENOTYPE=m6APeak_logOR_GC.IP.adjusted_qqnorm
test_window=1e5
thresh_pvalue=1 # keep all results
echo "m6A_version: ${m6A_version}, phenotype: ${PHENOTYPE}, cis window size: ${test_window}, covariate: PCs"
dir_genotype=/project2/xinhe/m6A/m6A_seq/m6A_QTL/genotypes/hg19/YRI/sixtySample_hg19_201903
dir_phenotypes=/project2/xinhe/m6A/m6A_seq/m6A_QTL/peakcalling/${m6A_version}/${PHENOTYPE}
dir_covariates=${dir_phenotypes}/PCs/
echo "dir_genotype: ${dir_genotype}"
echo "dir_phenotypes: ${dir_phenotypes}"
echo "dir_covariates: ${dir_covariates}"
dir_output=/project2/xinhe/m6A/m6A_seq/m6A_QTL/results/hg19/m6A_QTLs/${m6A_version}/fastQTL_output/${PHENOTYPE}/cis_${test_window}bp/nominal_pass
mkdir -p ${dir_output}
dir_log=${dir_output}/log
mkdir -p ${dir_log}
for NUM_PC in {0..20}
do
for CHR in "chr"{1..22}
do
echo "fastQTL for ${PHENOTYPE} in ${CHR} with ${NUM_PC} PCs included."
if [[ ${NUM_PC} -gt 0 ]]; then
# include PCs as covariates
fastQTL --vcf ${dir_genotype}/Imputed.sixty.hg19.${CHR}.vcf.gz \
--bed ${dir_phenotypes}/${PHENOTYPE}.fastQTL.txt.gz \
--cov ${dir_covariates}/${NUM_PC}PCs.txt.gz \
--region ${CHR} \
--window ${test_window} \
--threshold ${thresh_pvalue} \
--out ${dir_output}/fastQTL.nominals.${PHENOTYPE}.${CHR}.${test_window}bp.${NUM_PC}PCs.pvalue${thresh_pvalue}.txt.gz \
--log ${dir_log}/fastQTL.nominals.${PHENOTYPE}.${CHR}.${test_window}bp.${NUM_PC}PCs.log \
--silent
else
# No covariates
fastQTL --vcf ${dir_genotype}/Imputed.sixty.hg19.${CHR}.vcf.gz \
--bed ${dir_phenotypes}/${PHENOTYPE}.fastQTL.txt.gz \
--region ${CHR} \
--window ${test_window} \
--threshold ${thresh_pvalue} \
--out ${dir_output}/fastQTL.nominals.${PHENOTYPE}.${CHR}.${test_window}bp.${NUM_PC}PCs.pvalue${thresh_pvalue}.txt.gz \
--log ${dir_log}/fastQTL.nominals.${PHENOTYPE}.${CHR}.${test_window}bp.${NUM_PC}PCs.log \
--silent
fi
done
zcat ${dir_output}/fastQTL.nominals.${PHENOTYPE}.chr*.${test_window}bp.${NUM_PC}PCs.pvalue${thresh_pvalue}.txt.gz | gzip -c > \
${dir_output}/fastQTL.nominals.${PHENOTYPE}.AllChrs.${test_window}bp.${NUM_PC}PCs.pvalue${thresh_pvalue}.txt.gz
done
echo "Done."
echo "fastQTL results at: ${dir_output}"
# source: m6AQTL_workflowr/code/fastQTL_permutation_m6APeak_logOR_PCs.sbatch
## map m6A-QTLs using fastQTL permutation pass
## parameters
m6A_version=jointPeak_threshold5_MeRIP_HISAT2Map
PHENOTYPE=m6APeak_logOR_GC.IP.adjusted_qqnorm
test_window=1e5
num_permutations=1000
echo "m6A_version: ${m6A_version}, phenotype: ${PHENOTYPE}, cis window size: ${test_window}, number of permutations: ${num_permutations}, covariate: PCs"
dir_genotype=/project2/xinhe/m6A/m6A_seq/m6A_QTL/genotypes/hg19/YRI/sixtySample_hg19_201903
dir_phenotypes=/project2/xinhe/m6A/m6A_seq/m6A_QTL/peakcalling/${m6A_version}/${PHENOTYPE}
dir_covariates=${dir_phenotypes}/PCs
echo "dir_genotype: ${dir_genotype}"
echo "dir_phenotypes: ${dir_phenotypes}"
echo "dir_covariates: ${dir_covariates}"
dir_output=/project2/xinhe/m6A/m6A_seq/m6A_QTL/results/hg19/m6A_QTLs/${m6A_version}/fastQTL_output/${PHENOTYPE}/cis_${test_window}bp/permutation_pass
mkdir -p ${dir_output}
dir_log=${dir_output}/log
mkdir -p ${dir_log}
for NUM_PC in {0..20}
do
for CHR in "chr"{1..22}
do
echo "fastQTL for ${PHENOTYPE} in ${CHR} with ${NUM_PC} PCs included."
if [[ ${NUM_PC} -gt 0 ]]; then
# include PCs as covariates
fastQTL --vcf ${dir_genotype}/Imputed.sixty.hg19.${CHR}.vcf.gz \
--bed ${dir_phenotypes}/${PHENOTYPE}.fastQTL.txt.gz \
--cov ${dir_covariates}/${NUM_PC}PCs.txt.gz \
--region ${CHR} \
--permute ${num_permutations} \
--window ${test_window} \
--out ${dir_output}/fastQTL.${num_permutations}permutations.${PHENOTYPE}.${CHR}.${test_window}bp.${NUM_PC}PCs.txt.gz \
--log ${dir_log}/fastQTL.${num_permutations}permutations.${PHENOTYPE}.${CHR}.${test_window}bp.${NUM_PC}PCs.log \
--silent
else
# No covariates
fastQTL --vcf ${dir_genotype}/Imputed.sixty.hg19.${CHR}.vcf.gz \
--bed ${dir_phenotypes}/${PHENOTYPE}.fastQTL.txt.gz \
--region ${CHR} \
--permute ${num_permutations} \
--window ${test_window} \
--out ${dir_output}/fastQTL.${num_permutations}permutations.${PHENOTYPE}.${CHR}.${test_window}bp.${NUM_PC}PCs.txt.gz \
--log ${dir_log}/fastQTL.${num_permutations}permutations.${PHENOTYPE}.${CHR}.${test_window}bp.${NUM_PC}PCs.log \
--silent
fi
done
zcat ${dir_output}/fastQTL.${num_permutations}permutations.${PHENOTYPE}.chr*.${test_window}bp.${NUM_PC}PCs.txt.gz | gzip -c > \
${dir_output}/fastQTL.${num_permutations}permutations.${PHENOTYPE}.AllChrs.${test_window}bp.${NUM_PC}PCs.txt.gz
done
echo "Done."
echo "fastQTL results at: ${dir_output}"
cd /project2/xinhe/m6A/m6A_seq/m6A_QTL/genotypes/hg19/YRI/sixtySample_hg19_201903
for chr in "chr"{1..22} "chrX"
do
echo $chr
bcftools query -f '%CHROM %POS %ID %REF %ALT\n' Imputed.sixty.hg19.${chr}.vcf.gz | gzip > SNP_info/"snp_info_${chr}.txt.gz"
done
zcat snp_info_chr*.txt.gz | gzip -c > snp_info.txt.gz
#source: m6AQTL_workflowr/code/process_fastQTL_summarystats.R
library(qvalue)
library(data.table)
options(scipen=999)
# get standard error and t-statitic from linear model output
get_se_summarystats <- function(beta, pvalue, N, num_cvrt = 0){
cat("Extract summary stats from linear model ... \n")
# Number of indepedent variables in the model: intercept + genotype + covariates
P <- 1 + 1 + num_cvrt
# Calculating the t-statistic
t_statistic <- abs(qt(pvalue/2, df = N - P)) * sign(beta)
# Calculating standard error
se <- beta / t_statistic
summary_stats <- data.frame(beta = beta, se = se, statistic = t_statistic, pvalue = pvalue)
return(summary_stats)
}
# extract peak regions
extract_m6Apeak_region <- function(peaks){
peaks <- as.character(peaks)
peak_loc <- sapply(strsplit(peaks, "_"), `[`, 1)
GENE <- sapply(strsplit(peaks, "_"), `[`, 2)
strand <- sapply(strsplit(peaks, "_"), `[`, 3)
chr <- gsub(":.+", "", peak_loc)
peak_range <- gsub(".+:", "", peak_loc)
peak_start <- as.integer(sapply(strsplit(peak_range, "-"), `[`, 1))
peak_end <- as.integer(sapply(strsplit(peak_range, "-"), `[`, 2))
peak_mid <- round((peak_start + peak_end)/2)
result.df <- data.frame(chr, peak_start, peak_end, peak_mid, strand, GENE)
return(result.df)
}
####### begins here #######
m6A_version <- "jointPeak_threshold5_MeRIP_HISAT2Map"
phenotype_name <- "m6APeak_logOR_GC.IP.adjusted_qqnorm"
num_PCs <- 15
chr_name <- "AllChrs"
num_permutations <- 1000
cat("Load fastQTL results for", m6A_version, "version of m6A reads alignment, phenotype file:", phenotype_name, "in", chr_name, "\n")
cat("Choose the results with", num_PCs, "PCs \n")
dir_fastQTL_nominal <- paste0("/project2/xinhe/m6A/m6A_seq/m6A_QTL/results/hg19/m6A_QTLs/", m6A_version, "/fastQTL_output/", phenotype_name, "/cis_1e5bp/nominal_pass/")
dir_fastQTL_PermPass <- paste0("/project2/xinhe/m6A/m6A_seq/m6A_QTL/results/hg19/m6A_QTLs/", m6A_version, "/fastQTL_output/", phenotype_name, "/cis_1e5bp/permutation_pass/")
dir_output <- paste0("/project2/xinhe/m6A/m6A_seq/m6A_QTL/results/hg19/m6A_QTLs/", m6A_version, "/fastQTL_cis_100kb/", phenotype_name)
dir.create(dir_output, showWarnings = F, recursive = T)
cat("Directory of output:", dir_output, "\n")
## load SNP location
cat("loading SNP location ... \n")
SNP_info <- fread("/project2/xinhe/m6A/m6A_seq/m6A_QTL/genotypes/hg19/YRI/sixtySample_hg19_201903/SNP_info/snp_info.txt.gz")
colnames(SNP_info) <- c("CHROM", "POS", "ID", "REF", "ALT")
SNP_info$SNP <- paste(SNP_info$CHROM, SNP_info$POS, sep = ":")
##################################
## nominal pass ##
##################################
cat("process results from fastQTL nominal pass ... \n")
filename_fastQTL_nominal <- paste0(dir_fastQTL_nominal, "/fastQTL.nominals.", phenotype_name, ".", chr_name, ".1e5bp.", num_PCs, "PCs.pvalue1.txt.gz")
if(!file.exists(filename_fastQTL_nominal)){
stop(filename_fastQTL_nominal, "does not exist! ")
}
m6AQTLs_nominal.df <- fread(filename_fastQTL_nominal)
colnames(m6AQTLs_nominal.df) <- c("PEAK", "SNPID", "DIST", "pvalue", "beta")
## add SNP info and update DIST
m6AQTLs_nominal.df <- cbind(m6AQTLs_nominal.df, extract_m6Apeak_region(m6AQTLs_nominal.df$PEAK)[,c("chr", "peak_mid", "strand","GENE")])
m6AQTLs_nominal.df$SNP_POS <- m6AQTLs_nominal.df$peak_mid + m6AQTLs_nominal.df$DIST
m6AQTLs_nominal.df$SNP <- paste(m6AQTLs_nominal.df$chr, m6AQTLs_nominal.df$SNP_POS, sep = ":")
SNPinfo_matched.df <- SNP_info[match(m6AQTLs_nominal.df$SNP, SNP_info$SNP), ]
if(!identical(m6AQTLs_nominal.df$SNPID, SNPinfo_matched.df$ID)){stop("SNP ID not match!")}
m6AQTLs_nominal.df <- data.frame(m6AQTLs_nominal.df, REF = SNPinfo_matched.df$REF, ALT = SNPinfo_matched.df$ALT)
m6AQTLs_nominal.df$DIST <- m6AQTLs_nominal.df$SNP_POS - m6AQTLs_nominal.df$peak_mid
m6AQTLs_nominal.df[m6AQTLs_nominal.df$strand == "-", "DIST"] <- - m6AQTLs_nominal.df[m6AQTLs_nominal.df$strand == "-", "DIST"]
cat("extract SE and t-statistic ... \n")
m6AQTLs_nominal_summarystats <- get_se_summarystats(m6AQTLs_nominal.df$beta, m6AQTLs_nominal.df$pvalue, N = 60, num_cvrt = num_PCs)
m6AQTLs_nominal.df$se <- m6AQTLs_nominal_summarystats$se
m6AQTLs_nominal.df$statistic <- m6AQTLs_nominal_summarystats$statistic
m6AQTLs_nominal.df$qvalue <- qvalue(m6AQTLs_nominal.df$pvalue)$qvalues
m6AQTLs_nominal.df <- m6AQTLs_nominal.df[, c("PEAK", "GENE", "SNP", "SNPID", "REF", "ALT", "DIST", "beta", "se", "statistic", "pvalue", "qvalue")]
cat("Save nominal-pass summary stats ... \n")
saveRDS(m6AQTLs_nominal.df, paste0(dir_output, "/m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.nominals.rds"))
fwrite(m6AQTLs_nominal.df, paste0(dir_output, "/m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.nominals.txt"), sep = "\t")
system(paste0("gzip -f ", dir_output, "/m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.nominals.txt"))
## qvalue < 0.1
sig_m6AQTLs_nominal.df <- m6AQTLs_nominal.df[m6AQTLs_nominal.df$qvalue < 0.1, ]
cat(nrow(sig_m6AQTLs_nominal.df), "sig m6A-QTLs based on nominal p-values (qvalue < 0.1) \n")
saveRDS(sig_m6AQTLs_nominal.df, paste0(dir_output, "/sig.m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.nominals.rds"))
lead_m6AQTLs_nominal.df <- sig_m6AQTLs_nominal.df[order(sig_m6AQTLs_nominal.df$pvalue), ]
lead_m6AQTLs_nominal.df <- lead_m6AQTLs_nominal.df[!duplicated(lead_m6AQTLs_nominal.df$PEAK), ]
cat(nrow(lead_m6AQTLs_nominal.df), "lead m6A-QTLs based on nominal p-values (qvalue < 0.1) \n")
saveRDS(lead_m6AQTLs_nominal.df, paste0(dir_output, "/lead.m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.nominals.rds"))
## qvalue < 0.2
sig_m6AQTLs_nominal.df <- m6AQTLs_nominal.df[m6AQTLs_nominal.df$qvalue < 0.2, ]
cat(nrow(sig_m6AQTLs_nominal.df), "sig m6A-QTLs based on nominal p-values (qvalue < 0.2) \n")
saveRDS(sig_m6AQTLs_nominal.df, paste0(dir_output, "/sig.m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.nominals.qvalue_0.2.rds"))
lead_m6AQTLs_nominal.df <- sig_m6AQTLs_nominal.df[order(sig_m6AQTLs_nominal.df$pvalue), ]
lead_m6AQTLs_nominal.df <- lead_m6AQTLs_nominal.df[!duplicated(lead_m6AQTLs_nominal.df$PEAK), ]
cat(nrow(lead_m6AQTLs_nominal.df), "lead m6A-QTLs based on nominal p-values (qvalue < 0.2) \n")
saveRDS(lead_m6AQTLs_nominal.df, paste0(dir_output, "/lead.m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.nominals.qvalue_0.2.rds"))
cat("Done. \n")
##################################
## permutation pass ##
##################################
cat("process results from fastQTL permutation pass ... \n")
filename_fastQTL_PermPass <- paste0(dir_fastQTL_PermPass, "/fastQTL.", num_permutations, "permutations.", phenotype_name, ".", chr_name, ".1e5bp.", num_PCs, "PCs.txt.gz")
if(!file.exists(filename_fastQTL_PermPass)){
stop(filename_fastQTL_PermPass, "does not exist! ")
}
m6AQTLs_PermPass.df <- fread(filename_fastQTL_PermPass)
colnames(m6AQTLs_PermPass.df) <- c("PEAK", "nvar", "shape1", "shape2", "dummy", "SNPID", "DIST", "pvalue", "beta", "ppval", "bpval")
## add SNP info and update DIST
m6AQTLs_PermPass.df <- cbind(m6AQTLs_PermPass.df, extract_m6Apeak_region(m6AQTLs_PermPass.df$PEAK)[,c("chr", "peak_mid", "strand","GENE")])
m6AQTLs_PermPass.df$SNP_POS <- m6AQTLs_PermPass.df$peak_mid + m6AQTLs_PermPass.df$DIST
m6AQTLs_PermPass.df$SNP <- paste(m6AQTLs_PermPass.df$chr, m6AQTLs_PermPass.df$SNP_POS, sep = ":")
SNPinfo_matched.df <- SNP_info[match(m6AQTLs_PermPass.df$SNP, SNP_info$SNP), ]
if(!identical(m6AQTLs_PermPass.df$SNPID, SNPinfo_matched.df$ID)){stop("SNP ID not match!")}
m6AQTLs_PermPass.df <- data.frame(m6AQTLs_PermPass.df, REF = SNPinfo_matched.df$REF, ALT = SNPinfo_matched.df$ALT)
m6AQTLs_PermPass.df$DIST <- m6AQTLs_PermPass.df$SNP_POS - m6AQTLs_PermPass.df$peak_mid
m6AQTLs_PermPass.df[m6AQTLs_PermPass.df$strand == "-", "DIST"] <- - m6AQTLs_PermPass.df[m6AQTLs_PermPass.df$strand == "-", "DIST"]
cat("extract SE and t-statistic ... \n")
m6AQTLs_PermPass_summarystats <- get_se_summarystats(m6AQTLs_PermPass.df$beta, m6AQTLs_PermPass.df$pvalue, N = 60, num_cvrt = num_PCs)
m6AQTLs_PermPass.df$se <- m6AQTLs_PermPass_summarystats$se
m6AQTLs_PermPass.df$statistic <- m6AQTLs_PermPass_summarystats$statistic
m6AQTLs_PermPass.df$qvalue <- qvalue(m6AQTLs_PermPass.df$bpval)$qvalues
m6AQTLs_PermPass.df <- m6AQTLs_PermPass.df[, c("PEAK", "GENE", "SNP", "SNPID", "REF", "ALT", "DIST", "beta", "se", "statistic", "pvalue", "ppval", "bpval", "qvalue")]
cat("Save permutation-pass summary stats ... \n")
saveRDS(m6AQTLs_PermPass.df, paste0(dir_output, "/m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.permutations.rds"))
fwrite(m6AQTLs_PermPass.df, paste0(dir_output, "/m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.permutations.txt"), sep = "\t")
system(paste0("gzip -f ", dir_output, "/m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.permutations.txt"))
## qvalue < 0.1
sig_m6AQTLs_PermPass.df <- m6AQTLs_PermPass.df[m6AQTLs_PermPass.df$qvalue < 0.1, ]
cat(nrow(sig_m6AQTLs_PermPass.df), "sig m6A-QTLs based on permutation adjusted p-values (qvalue < 0.1) \n")
saveRDS(sig_m6AQTLs_PermPass.df, paste0(dir_output, "/sig.m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.permutations.rds"))
## qvalue < 0.2
sig_m6AQTLs_PermPass.df <- m6AQTLs_PermPass.df[m6AQTLs_PermPass.df$qvalue < 0.2, ]
cat(nrow(sig_m6AQTLs_PermPass.df), "sig m6A-QTLs based on permutation adjusted p-values (qvalue < 0.2) \n")
saveRDS(sig_m6AQTLs_PermPass.df, paste0(dir_output, "/sig.m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.permutations.qvalue_0.2.rds"))
cat("Done. \n")
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)
Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] workflowr_1.6.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.3 rprojroot_1.3-2 digest_0.6.23 later_1.0.0
[5] R6_2.4.1 backports_1.1.5 git2r_0.26.1.9000 magrittr_1.5
[9] evaluate_0.14 stringi_1.4.5 rlang_0.4.4 fs_1.3.1
[13] promises_1.1.0 whisker_0.4 rmarkdown_2.1 tools_3.5.1
[17] stringr_1.4.0 glue_1.3.1 httpuv_1.5.2 xfun_0.12
[21] yaml_2.2.0 compiler_3.5.1 htmltools_0.4.0 knitr_1.28