Last updated: 2020-03-17

Checks: 2 0

Knit directory: m6AQTL_reproducibleDocument/

This reproducible R Markdown analysis was created with workflowr (version 1.3.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Untracked files:
    Untracked:  analysis/Re-analysis_of_Wang_2015_cell.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


There are no past versions. Publish this analysis with wflow_publish() to start tracking its development.


# Set these environment vars to point to
# your local installation of WASP
WASP=$HOME/Program/WASP
DATA=$HOME/m6AQTL/raw_fastq_all
DATA_DIR=$HOME/m6AQTL/WASP_OUT_201907
Genotype_data=$HOME/m6AQTL/variant_data/sixtySample_hg19
export LD_LIBRARY_PATH=$HOME/anaconda3/lib:$LD_LIBRARY_PATH
#export PATH=/home/scott/anaconda2/bin:$PATH

# These environment vars point to the reference genome and bowtie2.
# in the examples below, the reference genome is assumed
# to be indexed for use with bowtie2
INDEX=$HOME/Database/genome/hg19/hg19_UCSC
SPLICE=$HOME/Database/genome/hg19/hisat2_splice_sites.txt
GTF=$HOME/Database/genome/hg19/hg19_UCSC.gtf
re="re"

#$WASP/snp2h5/snp2h5 --chrom $Genotype_data/chromInfo.hg19.txt --format vcf --snp_index $Genotype_data/snp_index.h5 --snp_tab $Genotype_data/snp_tab.h5 --haplotype $Genotype_data/haps.h5 --samples $Genotype_data/YRI_samples.txt $Genotype_data/fiftyEightSample.hg19.chr*.vcf.gz
       
         
mySample="18486 18498 18499 18501 18502 18504 18505 18507 18508 18510 18511 18516 18517 18519 18522 18523 18852 18855 18856 18858 18861 18862 18870 18907 18909 18912 18913 18916 19092 19093 19098 19099 19101 19102 19108 19114 19116 19119 19127 19128 19130 19131 19137 19138 19140 19143 19144 19147 19152 19153 19159 19160 19192 19193 19200 19204 19209 19222 19239 19257"

for s in $mySample
do      
# Map reads using tophat2 (or another mapping tool of your choice)
hisat2 -x $INDEX --known-splicesite-infile $SPLICE -k 1 --no-unal --summary-file $DATA_DIR/alignment_summary/$s.IN.align_summary -p 15 -U $DATA/$s.IN.fastq.gz,$DATA/$s$re.IN.fastq.gz |samtools view -bS |samtools sort > $DATA_DIR/$s.input.bam
hisat2 -x $INDEX --known-splicesite-infile $SPLICE -k 1 --no-unal --summary-file $DATA_DIR/alignment_summary/$s.m6A.align_summary -p 15 -U $DATA/$s.m6A.fastq.gz,$DATA/$s$re.m6A.fastq.gz |samtools view -bS |samtools sort > $DATA_DIR/$s.m6A.bam


# Pull out reads that need to be remapped to check for bias
# Use the -p option for paired-end reads.
python $WASP/mapping/find_intersecting_snps.py \
       --output_dir $DATA_DIR  \
       --snp_index $Genotype_data/snp_index.h5 \
       --snp_tab $Genotype_data/snp_tab.h5 \
       --haplotype $Genotype_data/haps.h5 \
       --samples $Genotype_data/YRI_samples.txt \
       $DATA_DIR/$s.input.bam

python $WASP/mapping/find_intersecting_snps.py \
       --output_dir $DATA_DIR  \
       --snp_index $Genotype_data/snp_index.h5 \
       --snp_tab $Genotype_data/snp_tab.h5 \
       --haplotype $Genotype_data/haps.h5 \
       --samples $Genotype_data/YRI_samples.txt \
       $DATA_DIR/$s.m6A.bam
# Remap the reads, using same the program and options as before.
# NOTE: If you use an option in the first mapping step that modifies the
# reads (e.g. the -5 read trimming option to bowtie2) you should omit this
# option during the second mapping step here (otherwise the reads will be modified
# twice)!
hisat2 -x $INDEX --known-splicesite-infile $SPLICE -k 1 --no-unal --summary-file $DATA_DIR/alignment_summary/$s.IN.SNPswap_align_summary -p 15 -U $DATA_DIR/$s.input.remap.fq.gz |samtools view -bS |samtools sort > $DATA_DIR/$s.input.remap.bam
hisat2 -x $INDEX --known-splicesite-infile $SPLICE -k 1 --no-unal --summary-file $DATA_DIR/alignment_summary/$s.m6A.SNPswap_align_summary -p 15 -U $DATA_DIR/$s.m6A.remap.fq.gz |samtools view -bS |samtools sort > $DATA_DIR/$s.m6A.remap.bam

# Use filter_remapped_reads.py to create filtered list of reads that correctly
# remap to same position
python $WASP/mapping/filter_remapped_reads.py \
       $DATA_DIR/$s.input.to.remap.bam \
       $DATA_DIR/$s.input.remap.bam \
       $DATA_DIR/$s.input.remap.keep.bam

python $WASP/mapping/filter_remapped_reads.py \
       $DATA_DIR/$s.m6A.to.remap.bam \
       $DATA_DIR/$s.m6A.remap.bam \
       $DATA_DIR/$s.m6A.remap.keep.bam

# Create a merged BAM containing [1] reads that did
# not need remapping [2] filtered remapped reads
samtools merge $DATA_DIR/$s.input.keep.merged.bam \
         $DATA_DIR/$s.input.keep.bam $DATA_DIR/$s.input.remap.keep.bam
samtools merge $DATA_DIR/$s.m6A.keep.merged.bam \
         $DATA_DIR/$s.m6A.keep.bam $DATA_DIR/$s.m6A.remap.keep.bam

# Sort and index the bam file
samtools sort $DATA_DIR/$s.input.keep.merged.bam \
        -o $DATA_DIR/$s.input.keep.merged.sorted.bam
samtools sort $DATA_DIR/$s.m6A.keep.merged.bam \
        -o $DATA_DIR/$s.m6A.keep.merged.sorted.bam

samtools index $DATA_DIR/$s.input.keep.merged.sorted.bam
samtools index $DATA_DIR/$s.m6A.keep.merged.sorted.bam
# Filter out duplicate reads. Use rmdup_pe.py for paired-end reads,
# rmdup.py for single-end reads.
#python $WASP/mapping/rmdup_pe.py $DATA_DIR/sim_pe_reads.keep.merged.sorted.bam \
#       $DATA_DIR/sim_pe_reads.keep.rmdup.merged.sorted.bam

rm $DATA_DIR/$s.m6A.keep.merged.bam
rm $DATA_DIR/$s.input.keep.merged.bam

rm $DATA_DIR/$s.input.keep.bam
rm $DATA_DIR/$s.m6A.keep.bam

rm $DATA_DIR/$s.input.remap.keep.bam
rm $DATA_DIR/$s.m6A.remap.keep.bam

rm $DATA_DIR/$s.input.remap.bam
rm $DATA_DIR/$s.m6A.remap.bam

rm $DATA_DIR/$s.input.remap.fq.gz
rm $DATA_DIR/$s.m6A.remap.fq.gz

rm $DATA_DIR/$s.input.sort.bam
rm $DATA_DIR/$s.m6A.sort.bam

rm $DATA_DIR/$s.input.to.remap.bam
rm $DATA_DIR/$s.m6A.to.remap.bam

rm $DATA_DIR/$s.input.bam
rm $DATA_DIR/$s.m6A.bam


wait
done