m6A-QTL mapping

Last updated: 2020-03-17

Checks: 7 0

Knit directory: m6AQTL_reproducibleDocument/

This reproducible R Markdown analysis was created with workflowr (version 1.6.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20200317)

The command set.seed(20200317) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: c26375b

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rproj.user/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File	Version	Author	Date	Message
Rmd	c26375b	kevinlkx	2020-03-17	wflow_publish(“analysis/m6AQTL_mapping.Rmd”)
html	fbf4376	scottzijiezhang	2020-03-17	Build site.
Rmd	bb81e9f	scottzijiezhang	2020-03-17	wflow_publish(c(“WASP_mapping.Rmd”, “peak_calling.Rmd”, “m6AQTL_mapping.Rmd”, “Analysis_of_m6AQTL.Rmd”, “Annotating_m6AQTL_enrichment.Rmd”,

Preprocess m6A phenotype and extract phenotype PCs

Preprocess m6A phenotype: Extract peak logOR, adjust PC and IP efficiency, perform quantile normalization

## source: m6AQTL_workflowr/code/prepare_fastQTL_input_peak_logOR_adjust.GC.IP_qqnorm_PCs.R

library(MeRIPtools)
library("BSgenome.Hsapiens.UCSC.hg19")
options(scipen=999) ## suppress scientific notations
m6A_version <- "jointPeak_threshold5_MeRIP_HISAT2Map"
num_tasks_per_node <- 10

## standardization and quantile normalization
normalize_phenotype <- function(phenotype.m, zscore_standardize = TRUE, quantile_normalize = TRUE, plot = FALSE){

  colnames_phenotypes <- colnames(phenotype.m)
  phenotype.m <- as.matrix(phenotype.m)

  if(zscore_standardize == T){
    ## Standardize all measurements by gene.
    ## center and scale the phenotype measurements across individuals
    ## by removing the sample mean and dividing by the sample standard deviation.
    ## scale: center and scale the columns of a numeric matrix
    cat("Center and scale standardization ... \n")
    phenotype.m <- t(scale(t(phenotype.m), center = T, scale = T))

    if(plot){
      hist(phenotype.m, xlab = paste('z-score'), main = "Distribution after Z-score standardization")
    }
  }

  if(quantile_normalize == T){
    ## Quantile-normalization to fit a standard normal distribution by individual.
    cat("Quantile normalization ... \n")

    # library(preprocessCore) ## for quantile normalization
    # phenotype.m <- normalize.quantiles.use.target(phenotype.m, target = qnorm(ppoints(nrow(phenotype.m))))
    phenotype.m <- apply(phenotype.m, 2, function(x) qqnorm(x, plot.it=FALSE)$x )

    if(plot){
      hist(phenotype.m, main = paste('Distribution after quantile normalization'))
      qqnorm(phenotype.m, main = "After quantile normalization"); qqline(phenotype.m, col = 2)
    }
  }
  colnames(phenotype.m) <- colnames_phenotypes
  return(phenotype.m)
}

MeRIPdata <- readRDS(paste0("/project2/xinhe/m6A/m6A_seq/m6A_QTL/peakcalling/", m6A_version, "/", m6A_version, ".RDS"))
vcf_file <- "/project2/xinhe/m6A/sixtySample_hg19/Imputed.sixty.hg19.chr21.vcf.gz" ## this vcf file was simply used to get sample names

## get peak logOR
cat("get peak logOR ... \n")
peak_logOR_MeRIPdata <- get_peak_logOR(MeRIPdata, vcf_file, BSgenome.Hsapiens.UCSC.hg19, AdjustGC = TRUE, AdjIPeffi = TRUE, thread = num_tasks_per_node)
saveRDS(peak_logOR_MeRIPdata, paste0("/project2/xinhe/m6A/m6A_seq/m6A_QTL/peakcalling/", m6A_version, "/", "peak_logOR_MeRIPdata.rds"))

peak_logOR <- peak_logOR_MeRIPdata$logOR
MeRIPdata <- peak_logOR_MeRIPdata$MeRIPdata

## m6APeak_logOR_GC.IP.adjusted with qqnorm
phenotype_name <- "m6APeak_logOR_GC.IP.adjusted_qqnorm"
### normalize phenotypes
cat("Normalize phenotypes...\n")
peak_logOR.m <- as.matrix(peak_logOR[,-c(1:4)])
peak_logOR_normalized.m <- normalize_phenotype(peak_logOR.m, zscore_standardize = T, quantile_normalize = T)
peak_logOR_normalized <- data.frame(peak_logOR[,1:4], peak_logOR_normalized.m, check.names = F)

dir_output <- paste0("/project2/xinhe/m6A/m6A_seq/m6A_QTL/peakcalling/", m6A_version, "/", phenotype_name)
dir.create(dir_output, showWarnings = F, recursive = T)

## Compute phenotype PCs
cat("Computing Principal components...\n")
idx_peaks_filtered <- apply(extractIP(MeRIPdata),1,function(x) all(x!=0))
peak_logOR_normalized_filtered.m <- peak_logOR_normalized.m[idx_peaks_filtered, ]
saveRDS(peak_logOR_normalized_filtered.m, paste0(dir_output, "/", phenotype_name, ".filtered_matrix.rds"))

PCs <- prcomp( t( peak_logOR_normalized_filtered.m), center = T, scale = T )$x

dir.create(paste0(dir_output,"/PCs/"), showWarnings = F, recursive = T)

for(PCsToInclude in 1:30){
  PCs_included <- data.frame(ID = paste0("PC", 1:PCsToInclude), t(PCs[, 1:PCsToInclude]), check.names = FALSE)
  write.table(PCs_included, gzfile(paste0(dir_output, "/PCs/", PCsToInclude, "PCs.txt.gz")),
              sep = "\t", quote = FALSE, col.names = TRUE, row.names = FALSE)
}

## Save to fastQTL format, use peak midpoint position
peak_logOR_normalized_sorted <- peak_logOR_normalized
peak_mid <- round((peak_logOR_normalized$start + peak_logOR_normalized$end)/2)
peak_logOR_normalized_sorted$start <- peak_mid - 1
peak_logOR_normalized_sorted$end <- peak_mid

peak_logOR_normalized_sorted$chr <- factor(peak_logOR_normalized_sorted$chr, levels = paste0("chr", c(1:22, "X", "Y", "M")))
peak_logOR_normalized_sorted <- peak_logOR_normalized_sorted[with(peak_logOR_normalized_sorted, order(chr, start, end)), ]
colnames(peak_logOR_normalized_sorted)[1] <- paste0("#", colnames(peak_logOR_normalized_sorted)[1])

write.table(peak_logOR_normalized_sorted, paste0(dir_output, "/", phenotype_name, ".fastQTL.txt"),
            sep = "\t", quote = FALSE, col.names = TRUE, row.names = FALSE)

cat(phenotype_name, "results saved at: ", dir_output, "\n")

setwd(dir_output)

system(paste("bgzip",  paste0(phenotype_name, ".fastQTL.txt"), "&& tabix -p bed", paste0(phenotype_name, ".fastQTL.txt.gz")))

cat("done. \n")

fastQTL nominal pass

## source: ~/projects/m6A/m6AQTL_workflowr/code/fastQTL_nominal_m6APeak_logOR_PCs.sbatch

## map m6A-QTLs using fastQTL nominal pass

## parameters
m6A_version=jointPeak_threshold5_MeRIP_HISAT2Map
PHENOTYPE=m6APeak_logOR_GC.IP.adjusted_qqnorm
test_window=1e5
thresh_pvalue=1 # keep all results

echo "m6A_version: ${m6A_version}, phenotype: ${PHENOTYPE}, cis window size: ${test_window}, covariate: PCs"

dir_genotype=/project2/xinhe/m6A/m6A_seq/m6A_QTL/genotypes/hg19/YRI/sixtySample_hg19_201903
dir_phenotypes=/project2/xinhe/m6A/m6A_seq/m6A_QTL/peakcalling/${m6A_version}/${PHENOTYPE}
dir_covariates=${dir_phenotypes}/PCs/

echo "dir_genotype: ${dir_genotype}"
echo "dir_phenotypes: ${dir_phenotypes}"
echo "dir_covariates: ${dir_covariates}"

dir_output=/project2/xinhe/m6A/m6A_seq/m6A_QTL/results/hg19/m6A_QTLs/${m6A_version}/fastQTL_output/${PHENOTYPE}/cis_${test_window}bp/nominal_pass
mkdir -p ${dir_output}

dir_log=${dir_output}/log
mkdir -p ${dir_log}

for NUM_PC in {0..20}
do

  for CHR in "chr"{1..22}
  do

    echo "fastQTL for ${PHENOTYPE} in ${CHR} with ${NUM_PC} PCs included."

    if [[ ${NUM_PC} -gt 0 ]]; then
      # include PCs as covariates
      fastQTL --vcf ${dir_genotype}/Imputed.sixty.hg19.${CHR}.vcf.gz \
              --bed ${dir_phenotypes}/${PHENOTYPE}.fastQTL.txt.gz \
              --cov ${dir_covariates}/${NUM_PC}PCs.txt.gz \
              --region ${CHR} \
              --window ${test_window} \
              --threshold ${thresh_pvalue} \
              --out ${dir_output}/fastQTL.nominals.${PHENOTYPE}.${CHR}.${test_window}bp.${NUM_PC}PCs.pvalue${thresh_pvalue}.txt.gz \
              --log ${dir_log}/fastQTL.nominals.${PHENOTYPE}.${CHR}.${test_window}bp.${NUM_PC}PCs.log \
              --silent

    else
      # No covariates
      fastQTL --vcf ${dir_genotype}/Imputed.sixty.hg19.${CHR}.vcf.gz \
              --bed ${dir_phenotypes}/${PHENOTYPE}.fastQTL.txt.gz \
              --region ${CHR} \
              --window ${test_window} \
              --threshold ${thresh_pvalue} \
              --out ${dir_output}/fastQTL.nominals.${PHENOTYPE}.${CHR}.${test_window}bp.${NUM_PC}PCs.pvalue${thresh_pvalue}.txt.gz \
              --log ${dir_log}/fastQTL.nominals.${PHENOTYPE}.${CHR}.${test_window}bp.${NUM_PC}PCs.log \
              --silent
    fi

  done

  zcat ${dir_output}/fastQTL.nominals.${PHENOTYPE}.chr*.${test_window}bp.${NUM_PC}PCs.pvalue${thresh_pvalue}.txt.gz | gzip -c > \
  ${dir_output}/fastQTL.nominals.${PHENOTYPE}.AllChrs.${test_window}bp.${NUM_PC}PCs.pvalue${thresh_pvalue}.txt.gz

done

echo "Done."

echo "fastQTL results at: ${dir_output}"

fastQTL permutation pass

# source: m6AQTL_workflowr/code/fastQTL_permutation_m6APeak_logOR_PCs.sbatch

## map m6A-QTLs using fastQTL permutation pass

## parameters
m6A_version=jointPeak_threshold5_MeRIP_HISAT2Map
PHENOTYPE=m6APeak_logOR_GC.IP.adjusted_qqnorm
test_window=1e5
num_permutations=1000

echo "m6A_version: ${m6A_version}, phenotype: ${PHENOTYPE}, cis window size: ${test_window}, number of permutations: ${num_permutations}, covariate: PCs"

dir_genotype=/project2/xinhe/m6A/m6A_seq/m6A_QTL/genotypes/hg19/YRI/sixtySample_hg19_201903
dir_phenotypes=/project2/xinhe/m6A/m6A_seq/m6A_QTL/peakcalling/${m6A_version}/${PHENOTYPE}
dir_covariates=${dir_phenotypes}/PCs

echo "dir_genotype: ${dir_genotype}"
echo "dir_phenotypes: ${dir_phenotypes}"
echo "dir_covariates: ${dir_covariates}"

dir_output=/project2/xinhe/m6A/m6A_seq/m6A_QTL/results/hg19/m6A_QTLs/${m6A_version}/fastQTL_output/${PHENOTYPE}/cis_${test_window}bp/permutation_pass
mkdir -p ${dir_output}

dir_log=${dir_output}/log
mkdir -p ${dir_log}

for NUM_PC in {0..20}
do

  for CHR in "chr"{1..22}
  do

    echo "fastQTL for ${PHENOTYPE} in ${CHR} with ${NUM_PC} PCs included."

    if [[ ${NUM_PC} -gt 0 ]]; then
      # include PCs as covariates
      fastQTL --vcf ${dir_genotype}/Imputed.sixty.hg19.${CHR}.vcf.gz \
              --bed ${dir_phenotypes}/${PHENOTYPE}.fastQTL.txt.gz \
              --cov ${dir_covariates}/${NUM_PC}PCs.txt.gz \
              --region ${CHR} \
              --permute ${num_permutations} \
              --window ${test_window} \
              --out ${dir_output}/fastQTL.${num_permutations}permutations.${PHENOTYPE}.${CHR}.${test_window}bp.${NUM_PC}PCs.txt.gz \
              --log ${dir_log}/fastQTL.${num_permutations}permutations.${PHENOTYPE}.${CHR}.${test_window}bp.${NUM_PC}PCs.log \
              --silent

    else
      # No covariates
      fastQTL --vcf ${dir_genotype}/Imputed.sixty.hg19.${CHR}.vcf.gz \
              --bed ${dir_phenotypes}/${PHENOTYPE}.fastQTL.txt.gz \
              --region ${CHR} \
              --permute ${num_permutations} \
              --window ${test_window} \
              --out ${dir_output}/fastQTL.${num_permutations}permutations.${PHENOTYPE}.${CHR}.${test_window}bp.${NUM_PC}PCs.txt.gz \
              --log ${dir_log}/fastQTL.${num_permutations}permutations.${PHENOTYPE}.${CHR}.${test_window}bp.${NUM_PC}PCs.log \
              --silent
    fi

  done

  zcat ${dir_output}/fastQTL.${num_permutations}permutations.${PHENOTYPE}.chr*.${test_window}bp.${NUM_PC}PCs.txt.gz | gzip -c > \
  ${dir_output}/fastQTL.${num_permutations}permutations.${PHENOTYPE}.AllChrs.${test_window}bp.${NUM_PC}PCs.txt.gz

done

echo "Done."

echo "fastQTL results at: ${dir_output}"

Process fastQTL summary stats

Extract SNP info


cd /project2/xinhe/m6A/m6A_seq/m6A_QTL/genotypes/hg19/YRI/sixtySample_hg19_201903

for chr in "chr"{1..22} "chrX"
do
  echo $chr
  bcftools query -f '%CHROM %POS %ID %REF %ALT\n' Imputed.sixty.hg19.${chr}.vcf.gz | gzip > SNP_info/"snp_info_${chr}.txt.gz"
done

zcat snp_info_chr*.txt.gz | gzip -c > snp_info.txt.gz

Process fastQTL summary stats with PCs, extract peak and SNP location, allele information, beta, SE, t-statistic, pvalue and qvalue

#source: m6AQTL_workflowr/code/process_fastQTL_summarystats.R

library(qvalue)
library(data.table)
options(scipen=999)

# get standard error and t-statitic from linear model output
get_se_summarystats <- function(beta, pvalue, N, num_cvrt = 0){
  cat("Extract summary stats from linear model ... \n")
  # Number of indepedent variables in the model: intercept + genotype + covariates
  P <- 1 + 1 + num_cvrt
  # Calculating the t-statistic
  t_statistic <- abs(qt(pvalue/2, df = N - P)) * sign(beta)
  # Calculating standard error
  se <- beta / t_statistic
  summary_stats <- data.frame(beta = beta, se = se, statistic = t_statistic, pvalue = pvalue)
  return(summary_stats)
}

# extract peak regions
extract_m6Apeak_region <- function(peaks){
  peaks <- as.character(peaks)
  peak_loc <- sapply(strsplit(peaks, "_"), `[`, 1)
  GENE <- sapply(strsplit(peaks, "_"), `[`, 2)
  strand <- sapply(strsplit(peaks, "_"), `[`, 3)
  chr <- gsub(":.+", "", peak_loc)
  peak_range <- gsub(".+:", "", peak_loc)
  peak_start <- as.integer(sapply(strsplit(peak_range, "-"), `[`, 1))
  peak_end <- as.integer(sapply(strsplit(peak_range, "-"), `[`, 2))
  peak_mid <- round((peak_start + peak_end)/2)

  result.df <- data.frame(chr, peak_start, peak_end, peak_mid, strand, GENE)
  return(result.df)
}

####### begins here #######
m6A_version <- "jointPeak_threshold5_MeRIP_HISAT2Map"
phenotype_name <- "m6APeak_logOR_GC.IP.adjusted_qqnorm"
num_PCs <- 15
chr_name <- "AllChrs"
num_permutations <- 1000

cat("Load fastQTL results for", m6A_version, "version of m6A reads alignment, phenotype file:", phenotype_name, "in", chr_name, "\n")
cat("Choose the results with", num_PCs, "PCs \n")

dir_fastQTL_nominal <- paste0("/project2/xinhe/m6A/m6A_seq/m6A_QTL/results/hg19/m6A_QTLs/", m6A_version, "/fastQTL_output/", phenotype_name, "/cis_1e5bp/nominal_pass/")
dir_fastQTL_PermPass <- paste0("/project2/xinhe/m6A/m6A_seq/m6A_QTL/results/hg19/m6A_QTLs/", m6A_version, "/fastQTL_output/", phenotype_name, "/cis_1e5bp/permutation_pass/")

dir_output <- paste0("/project2/xinhe/m6A/m6A_seq/m6A_QTL/results/hg19/m6A_QTLs/", m6A_version, "/fastQTL_cis_100kb/", phenotype_name)
dir.create(dir_output, showWarnings = F, recursive = T)
cat("Directory of output:", dir_output, "\n")

## load SNP location
cat("loading SNP location ... \n")
SNP_info <- fread("/project2/xinhe/m6A/m6A_seq/m6A_QTL/genotypes/hg19/YRI/sixtySample_hg19_201903/SNP_info/snp_info.txt.gz")
colnames(SNP_info) <- c("CHROM", "POS", "ID", "REF", "ALT")
SNP_info$SNP <- paste(SNP_info$CHROM, SNP_info$POS, sep = ":")

##################################
## nominal pass                 ##
##################################
cat("process results from fastQTL nominal pass ... \n")

filename_fastQTL_nominal <- paste0(dir_fastQTL_nominal, "/fastQTL.nominals.", phenotype_name, ".", chr_name, ".1e5bp.", num_PCs, "PCs.pvalue1.txt.gz")
if(!file.exists(filename_fastQTL_nominal)){
  stop(filename_fastQTL_nominal, "does not exist! ")
}
m6AQTLs_nominal.df <- fread(filename_fastQTL_nominal)
colnames(m6AQTLs_nominal.df) <- c("PEAK", "SNPID", "DIST", "pvalue", "beta")

## add SNP info and update DIST
m6AQTLs_nominal.df <- cbind(m6AQTLs_nominal.df, extract_m6Apeak_region(m6AQTLs_nominal.df$PEAK)[,c("chr", "peak_mid", "strand","GENE")])
m6AQTLs_nominal.df$SNP_POS <- m6AQTLs_nominal.df$peak_mid + m6AQTLs_nominal.df$DIST
m6AQTLs_nominal.df$SNP <- paste(m6AQTLs_nominal.df$chr, m6AQTLs_nominal.df$SNP_POS, sep = ":")

SNPinfo_matched.df <- SNP_info[match(m6AQTLs_nominal.df$SNP, SNP_info$SNP), ]
if(!identical(m6AQTLs_nominal.df$SNPID, SNPinfo_matched.df$ID)){stop("SNP ID not match!")}
m6AQTLs_nominal.df <- data.frame(m6AQTLs_nominal.df, REF = SNPinfo_matched.df$REF, ALT = SNPinfo_matched.df$ALT)

m6AQTLs_nominal.df$DIST <- m6AQTLs_nominal.df$SNP_POS - m6AQTLs_nominal.df$peak_mid
m6AQTLs_nominal.df[m6AQTLs_nominal.df$strand == "-", "DIST"] <- - m6AQTLs_nominal.df[m6AQTLs_nominal.df$strand == "-", "DIST"]

cat("extract SE and t-statistic ... \n")
m6AQTLs_nominal_summarystats <- get_se_summarystats(m6AQTLs_nominal.df$beta, m6AQTLs_nominal.df$pvalue, N = 60, num_cvrt = num_PCs)
m6AQTLs_nominal.df$se <- m6AQTLs_nominal_summarystats$se
m6AQTLs_nominal.df$statistic <- m6AQTLs_nominal_summarystats$statistic
m6AQTLs_nominal.df$qvalue <- qvalue(m6AQTLs_nominal.df$pvalue)$qvalues

m6AQTLs_nominal.df <- m6AQTLs_nominal.df[, c("PEAK", "GENE", "SNP", "SNPID", "REF", "ALT", "DIST", "beta", "se", "statistic", "pvalue", "qvalue")]

cat("Save nominal-pass summary stats ... \n")
saveRDS(m6AQTLs_nominal.df, paste0(dir_output, "/m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.nominals.rds"))
fwrite(m6AQTLs_nominal.df, paste0(dir_output, "/m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.nominals.txt"), sep = "\t")
system(paste0("gzip -f ", dir_output, "/m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.nominals.txt"))

## qvalue < 0.1
sig_m6AQTLs_nominal.df <- m6AQTLs_nominal.df[m6AQTLs_nominal.df$qvalue < 0.1, ]
cat(nrow(sig_m6AQTLs_nominal.df), "sig m6A-QTLs based on nominal p-values (qvalue < 0.1) \n")
saveRDS(sig_m6AQTLs_nominal.df, paste0(dir_output, "/sig.m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.nominals.rds"))

lead_m6AQTLs_nominal.df <- sig_m6AQTLs_nominal.df[order(sig_m6AQTLs_nominal.df$pvalue), ]
lead_m6AQTLs_nominal.df <- lead_m6AQTLs_nominal.df[!duplicated(lead_m6AQTLs_nominal.df$PEAK), ]
cat(nrow(lead_m6AQTLs_nominal.df), "lead m6A-QTLs based on nominal p-values (qvalue < 0.1) \n")
saveRDS(lead_m6AQTLs_nominal.df, paste0(dir_output, "/lead.m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.nominals.rds"))

## qvalue < 0.2
sig_m6AQTLs_nominal.df <- m6AQTLs_nominal.df[m6AQTLs_nominal.df$qvalue < 0.2, ]
cat(nrow(sig_m6AQTLs_nominal.df), "sig m6A-QTLs based on nominal p-values (qvalue < 0.2) \n")
saveRDS(sig_m6AQTLs_nominal.df, paste0(dir_output, "/sig.m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.nominals.qvalue_0.2.rds"))

lead_m6AQTLs_nominal.df <- sig_m6AQTLs_nominal.df[order(sig_m6AQTLs_nominal.df$pvalue), ]
lead_m6AQTLs_nominal.df <- lead_m6AQTLs_nominal.df[!duplicated(lead_m6AQTLs_nominal.df$PEAK), ]
cat(nrow(lead_m6AQTLs_nominal.df), "lead m6A-QTLs based on nominal p-values (qvalue < 0.2) \n")
saveRDS(lead_m6AQTLs_nominal.df, paste0(dir_output, "/lead.m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.nominals.qvalue_0.2.rds"))

cat("Done. \n")

##################################
## permutation pass             ##
##################################
cat("process results from fastQTL permutation pass ... \n")

filename_fastQTL_PermPass <- paste0(dir_fastQTL_PermPass, "/fastQTL.", num_permutations, "permutations.", phenotype_name, ".", chr_name, ".1e5bp.", num_PCs, "PCs.txt.gz")
if(!file.exists(filename_fastQTL_PermPass)){
  stop(filename_fastQTL_PermPass, "does not exist! ")
}
m6AQTLs_PermPass.df <- fread(filename_fastQTL_PermPass)
colnames(m6AQTLs_PermPass.df) <- c("PEAK", "nvar", "shape1", "shape2", "dummy", "SNPID", "DIST", "pvalue", "beta", "ppval", "bpval")

## add SNP info and update DIST
m6AQTLs_PermPass.df <- cbind(m6AQTLs_PermPass.df, extract_m6Apeak_region(m6AQTLs_PermPass.df$PEAK)[,c("chr", "peak_mid", "strand","GENE")])
m6AQTLs_PermPass.df$SNP_POS <- m6AQTLs_PermPass.df$peak_mid + m6AQTLs_PermPass.df$DIST
m6AQTLs_PermPass.df$SNP <- paste(m6AQTLs_PermPass.df$chr, m6AQTLs_PermPass.df$SNP_POS, sep = ":")

SNPinfo_matched.df <- SNP_info[match(m6AQTLs_PermPass.df$SNP, SNP_info$SNP), ]
if(!identical(m6AQTLs_PermPass.df$SNPID, SNPinfo_matched.df$ID)){stop("SNP ID not match!")}
m6AQTLs_PermPass.df <- data.frame(m6AQTLs_PermPass.df, REF = SNPinfo_matched.df$REF, ALT = SNPinfo_matched.df$ALT)

m6AQTLs_PermPass.df$DIST <- m6AQTLs_PermPass.df$SNP_POS - m6AQTLs_PermPass.df$peak_mid
m6AQTLs_PermPass.df[m6AQTLs_PermPass.df$strand == "-", "DIST"] <- - m6AQTLs_PermPass.df[m6AQTLs_PermPass.df$strand == "-", "DIST"]

cat("extract SE and t-statistic ... \n")
m6AQTLs_PermPass_summarystats <- get_se_summarystats(m6AQTLs_PermPass.df$beta, m6AQTLs_PermPass.df$pvalue, N = 60, num_cvrt = num_PCs)
m6AQTLs_PermPass.df$se <- m6AQTLs_PermPass_summarystats$se
m6AQTLs_PermPass.df$statistic <- m6AQTLs_PermPass_summarystats$statistic
m6AQTLs_PermPass.df$qvalue <- qvalue(m6AQTLs_PermPass.df$bpval)$qvalues

m6AQTLs_PermPass.df <- m6AQTLs_PermPass.df[, c("PEAK", "GENE", "SNP", "SNPID", "REF", "ALT", "DIST", "beta", "se", "statistic", "pvalue", "ppval", "bpval", "qvalue")]

cat("Save permutation-pass summary stats ... \n")
saveRDS(m6AQTLs_PermPass.df, paste0(dir_output, "/m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.permutations.rds"))
fwrite(m6AQTLs_PermPass.df, paste0(dir_output, "/m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.permutations.txt"), sep = "\t")
system(paste0("gzip -f ", dir_output, "/m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.permutations.txt"))

## qvalue < 0.1
sig_m6AQTLs_PermPass.df <- m6AQTLs_PermPass.df[m6AQTLs_PermPass.df$qvalue < 0.1, ]
cat(nrow(sig_m6AQTLs_PermPass.df), "sig m6A-QTLs based on permutation adjusted p-values (qvalue < 0.1) \n")
saveRDS(sig_m6AQTLs_PermPass.df, paste0(dir_output, "/sig.m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.permutations.rds"))

## qvalue < 0.2
sig_m6AQTLs_PermPass.df <- m6AQTLs_PermPass.df[m6AQTLs_PermPass.df$qvalue < 0.2, ]
cat(nrow(sig_m6AQTLs_PermPass.df), "sig m6A-QTLs based on permutation adjusted p-values (qvalue < 0.2) \n")
saveRDS(sig_m6AQTLs_PermPass.df, paste0(dir_output, "/sig.m6AQTL.", phenotype_name, ".", num_PCs, "PCs.fastQTL.permutations.qvalue_0.2.rds"))

cat("Done. \n")

sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] workflowr_1.6.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3        rprojroot_1.3-2   digest_0.6.23     later_1.0.0      
 [5] R6_2.4.1          backports_1.1.5   git2r_0.26.1.9000 magrittr_1.5     
 [9] evaluate_0.14     stringi_1.4.5     rlang_0.4.4       fs_1.3.1         
[13] promises_1.1.0    whisker_0.4       rmarkdown_2.1     tools_3.5.1      
[17] stringr_1.4.0     glue_1.3.1        httpuv_1.5.2      xfun_0.12        
[21] yaml_2.2.0        compiler_3.5.1    htmltools_0.4.0   knitr_1.28