The 4DN Repli-seq data processing pipeline includes read clipping, alignment, filtering, and aggregation. Downstream normalization, smoothing and replicate merging steps will be implemented in the near future.
Overview
Read Clipping
Adaptor sequences are clipped from Repli-seq reads using cutadapt version 1.14, run with the following options:
- The -q 0 option turns off low-quality base removal before adapter searching.
- The -O 1 option sets the minimum required overlap between the read end and the adaptor to 1 (the default is 3), so that an adaptor that only partially overlaps the read end, rather than being fully contained in the read, is still clipped.
- The -m 0 option sets the minimum read length to 0, so empty reads are kept and appear in the output.
AGATCGGAAGAGCACACGTCTG is used as the adaptor sequence.
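The options described above can be assembled into a single cutadapt invocation. The sketch below builds such a command line in Python. It is an illustration, not the pipeline's own script: the file names and the helper `build_cutadapt_command` are hypothetical, and passing the adaptor with `-a` (3' adapter) is an assumption based on the description of overlap with the read end.

```python
import shlex

# Adaptor sequence stated in the text above.
ADAPTER = "AGATCGGAAGAGCACACGTCTG"


def build_cutadapt_command(input_fastq, output_fastq, adapter=ADAPTER):
    """Return a cutadapt argument list using the options described above.

    Hypothetical helper: file names are placeholders, and '-a' (3' adapter)
    is an assumption, not confirmed by the pipeline documentation.
    """
    return [
        "cutadapt",
        "-a", adapter,       # adaptor to clip (assumed to be a 3' adapter)
        "-q", "0",           # no low-quality base trimming
        "-O", "1",           # minimum overlap of 1 (cutadapt default is 3)
        "-m", "0",           # minimum length 0, so empty reads are kept
        "-o", output_fastq,  # output file
        input_fastq,         # input file
    ]


# Print the command as it would be typed in a shell.
print(shlex.join(build_cutadapt_command("reads.fastq.gz", "reads.clipped.fastq.gz")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where cutadapt is installed.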
Alignment
Filtering
For filtering valid Repli-seq alignments, we use samtools.
Specifically, the filtering workflow consists of the following
steps:
- MAPQ filtering: the samtools view command with -q 20 was used to skip alignments with MAPQ smaller than 20.
- Sorting: the samtools sort command was used to sort alignments by genomic coordinates.
- Removal of PCR duplicates: the samtools rmdup command was used to remove duplicate alignments.
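To make the logic of the three steps concrete, here is a minimal pure-Python sketch that applies them to in-memory records. It does not call samtools and is not the pipeline's implementation. The `Alignment` record and `filter_alignments` function are hypothetical, chromosomes are sorted by name rather than by BAM header order, and duplicates are simplified to alignments sharing chromosome, start position and strand, with the highest-MAPQ copy retained.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical, minimal stand-in for a BAM record."""
    chrom: str
    pos: int
    strand: str
    mapq: int


def filter_alignments(alignments, min_mapq=20):
    """Toy version of the MAPQ filter, coordinate sort and duplicate removal."""
    # Step 1: drop alignments with MAPQ below the threshold.
    kept = [a for a in alignments if a.mapq >= min_mapq]

    # Step 2: sort by genomic coordinate (chromosome name, then position).
    kept.sort(key=lambda a: (a.chrom, a.pos))

    # Step 3: among alignments with the same chromosome, start and strand
    # (a simplified duplicate definition), keep the one with the highest MAPQ.
    best = {}
    for a in kept:
        key = (a.chrom, a.pos, a.strand)
        if key not in best or a.mapq > best[key].mapq:
            best[key] = a

    return sorted(best.values(), key=lambda a: (a.chrom, a.pos))


example = [
    Alignment("chr1", 100, "+", 30),
    Alignment("chr1", 100, "+", 40),  # duplicate of the record above
    Alignment("chr1", 50, "+", 10),   # fails the MAPQ filter
    Alignment("chr1", 20, "-", 25),
]
for aln in filter_alignments(example):
    print(aln)
```

In this example the low-MAPQ record at position 50 is dropped, and of the two records at position 100 only the one with MAPQ 40 is kept.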
Binning and Aggregation
Filtered reads were aggregated into 5 kb windows using bedtools coverage.
Output is provided in both gzipped bedgraph and bigwig formats and can be viewed using HiGlass.
As of v16.1, the pipeline output includes a raw counts file in addition to the default scaled counts (RPKM).
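The relationship between raw counts and scaled counts can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline's code: the functions `count_per_window` and `to_rpkm` are hypothetical, reads are given as half-open (chrom, start, end) intervals, a read is counted in every window it overlaps, and RPKM is taken as reads per kilobase of window per million filtered reads.

```python
from collections import Counter

WINDOW_SIZE = 5000  # 5 kb windows, as described above


def count_per_window(reads, window_size=WINDOW_SIZE):
    """Count reads overlapping each fixed-size window.

    reads: iterable of (chrom, start, end) with 0-based, half-open coordinates.
    Returns a Counter keyed by (chrom, window_start).
    """
    counts = Counter()
    for chrom, start, end in reads:
        first = start // window_size
        last = (end - 1) // window_size
        # A read spanning a window boundary is counted in each window it touches.
        for idx in range(first, last + 1):
            counts[(chrom, idx * window_size)] += 1
    return counts


def to_rpkm(counts, total_reads, window_size=WINDOW_SIZE):
    """Scale raw counts to reads per kilobase per million reads (assumed definition)."""
    scale = 1e9 / (window_size * total_reads)
    return {key: n * scale for key, n in counts.items()}


reads = [("chr1", 100, 150), ("chr1", 4990, 5040), ("chr1", 12000, 12050)]
raw = count_per_window(reads)
scaled = to_rpkm(raw, total_reads=len(reads))
for key in sorted(raw):
    print(key, raw[key], round(scaled[key], 2))
```

Here the second read straddles the boundary at 5000 and is therefore counted in both the first and the second window.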
Source files
The pipeline components are pre-installed in a publicly
available Docker image (4dndcic/4dn-repliseq:v16.1) on
Docker Hub. The source code for the Docker image and pipeline
description in Common Workflow Language (CWL) can be found on
GitHub.
- Latest version (v16.1)
  - Workflow metadata: https://data.4dnucleome.org/workflows/622bdf75-2dd1-457f-ad78-d4cd128f8f5b/
  - CWL: https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v16.1/cwl
  - Docker: https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v16.1
- Older versions
  - v16
    - Workflow metadata: https://data.4dnucleome.org/workflows/2a6807f1-93db-4c7b-b148-672534193974/
    - CWL: https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v16/cwl
    - Docker: https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v16
  - v14
    - Workflow metadata: https://data.4dnucleome.org/workflows/4459a4d8-1bd8-4b6a-b2cc-2506f4270a34/
    - CWL: https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v14/cwl
    - Docker: https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v14
  - v13.1
    - Workflow metadata: https://data.4dnucleome.org/workflows/146da22a-502d-4500-bf57-a7cf0b4b2364/
    - CWL: https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v13.1/cwl
    - Docker: https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v13.1