scATAC Walkthrough =================== About ------------------------- ABC can predict cell-type-specific enhancer-gene interactions using only the chromatin accessibility profile obtained through scATAC-seq. Given the sparsity of scATAC data, we recommend pseudo-bulking the scATAC fragments by cell cluster. Each pseudo-bulked scATAC profile should contain at least 3 million (preferably 6 million) unique fragments to ensure optimal performance. We'll walk through an example of running a scATAC dataset through ABC. 1. Converting to TagAlign ------------------------- .. admonition:: **Prerequisite** Make sure you've followed the environment setup steps in :ref:`GettingStarted` and activate your abc conda environment. When working with scATAC files, you may start off with a `fragment file `_. Since ABC works with `tagAlign file format `_, we'll want to convert our fragment file to tagAlign. If you have a tagAlign file already, great! You can skip to the stage where we create the biosamples table. Let's download an example fragment file from the internet .. code-block:: bash (abc-env) [atan5133@sh03-04n23 /oak/stanford/groups/engreitz/Users/atan5133/data] (job 37050919) $ wget https://www.encodeproject.org/files/ENCFF794UXO/@@download/ENCFF794UXO.tar.gz (abc-env) [atan5133@sh03-04n23 /oak/stanford/groups/engreitz/Users/atan5133/data] (job 37050919) $ tar -xf ENCFF794UXO.tar.gz (abc-env) [atan5133@sh03-04n23 /oak/stanford/groups/engreitz/Users/atan5133/data] (job 37050919) $ ls encode_scatac_dcc_2/results/ENCSR308ZGJ-1/fragments/ fragments.tsv.gz fragments.tsv.gz.tbi Let's convert the fragment file to .tagAlign and index it .. code-block:: bash (abc-env) [atan5133@sh03-04n23 /oak/stanford/groups/engreitz/Users/atan5133/data] (job 37050919) $ cd encode_scatac_dcc_2/results/ENCSR308ZGJ-1/fragments (abc-env) [atan5133@sh03-04n23 /oak/stanford/groups/engreitz/Users/atan5133/data/encode_scatac_dcc_2/results/ENCSR308ZGJ-1/fragments] (job 37050919) $ LC_ALL=C zcat fragments.tsv.gz | sed '/^#/d' | awk -v OFS='\t' '{mid=int(($2+$3)/2); print $1,$2,mid,"N",1000,"+"; print $1,mid+1,$3,"N",1000,"-"}' | sort -k 1,1V -k 2,2n -k3,3n --parallel 5 | bgzip -c > tagAlign.gz # Adjust --parallel 5 based on number of cpus you have. The more cpus, the faster (abc-env) [atan5133@sh03-04n24 /oak/stanford/groups/engreitz/Users/atan5133/data/encode_scatac_dcc_2/results/ENCSR308ZGJ-1/fragments] (job 37151429) $ tabix -p bed tagAlign.gz Sorting the tagAlign typically takes a while and varies based on the size of the file. Setting LC_ALL=C and adding parallelization makes the sorting much faster. 2. Creating Your Config ------------------------- Once we have the tagAlign file, we'll create our biosample_table config. Let's assume we don't have any HiC data, so we'll opt to use powerlaw as our estimate for contact. Our biosamples config would look like the following: .. list-table:: :header-rows: 1 :widths: auto * - biosample - DHS - ATAC - H3K27ac - default_accessibility_feature - HiC_file - HiC_type - HiC_resolution - alt_TSS - alt_genes * - K562 - - /oak/stanford/groups/engreitz/Users/atan5133/data/encode_scatac_dcc_2/results/ENCSR308ZGJ-1/fragments/tagAlign.gz - - ATAC - - - - - Update the `ABC Config `_ to point the biosample tsv file you just created 3. Running ABC --------------- .. code-block:: bash (abc-env) [atan5133@sh02-ln01 login /oak/stanford/groups/engreitz/Users/atan5133/ ABC-Enhancer-Gene-Prediction]$ snakemake -j1 All your results will go to the ``results`` directory of your ABC directory! The actual predictions will be stored under the ``results/{biosample}/Predictions`` folder.