Single-cell RNA sequencing (scRNA-seq) data analysis is not a matter of pressing one button. To get from raw FASTQ files to exploratory reports or publication-ready plots, you will need to:
- Preprocess your raw FASTQ files.
- Evaluate quality control (QC) metrics.
- Choose the right mapping tools and reference genomes.
- Preprocess gene count tables.
- Perform cell calling.
In reality, then, there is no single button to press; there are many. Here at SCD, we are experts in scRNA-seq analysis and aim to minimise “button pressing” as much as possible. In this blog, we share some of the tips and tricks we use to streamline our work.
[ Please note that commands are adapted for macOS and Linux systems ]
Create aliases
Working from the terminal is great. If you use specific commands on a daily basis, you can add them to your ~/.bashrc (Linux) or ~/.zshrc (macOS) as aliases. An alias can be seen as a command’s nickname. After adding a new alias to your .bashrc / .zshrc, do not forget to source the file (e.g. source ~/.bashrc) so that the change takes effect in your current session.
Simple alias examples to add to your ~/.bashrc (~/.zshrc):
# copy a file into your current folder; useful when you have template files stored in a shared location
alias get_templ='cp /path/to/templates/template.txt . '
# if you work on a project over a long period of time, add an alias to cd to its folder quickly
alias cd1='cd /path/to/where/to/cd/to'
# sync data ($1) from a remote machine, e.g. HPC, to your local computer
# (bash needs the ';' before the closing brace in a one-line function)
get () { rsync -azvPL "location/to/sync/from/$1" . ; }
# sync data ($1) to a remote machine ($2), e.g. HPC, from your local computer
put () { rsync -azvP "$1" "location/to/sync/to/$2" ; }
# open Finder in current working directory (macOS)
alias o='open .'
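If you add aliases often, appending them by hand can leave duplicate lines in your rc file. Below is a minimal sketch of one way to avoid that; add_to_rc is a hypothetical helper name (not a standard command), and the demo writes to a temporary file rather than your real ~/.bashrc.

```shell
# Hypothetical helper: append a line to an rc file unless it is already there
add_to_rc() {
  line="$1"
  rcfile="${2:-$HOME/.bashrc}"   # default target is an assumption; pass ~/.zshrc on macOS
  touch "$rcfile"
  # -F: fixed string, -x: match the whole line, -q: quiet
  grep -Fxq -- "$line" "$rcfile" || printf '%s\n' "$line" >> "$rcfile"
}

# Demo against a temporary file rather than the real ~/.bashrc
demo_rc="$(mktemp)"
add_to_rc "alias o='open .'" "$demo_rc"
add_to_rc "alias o='open .'" "$demo_rc"   # second call finds the line and does nothing
cat "$demo_rc"
```

After adding a line to your real rc file, you would still source it (e.g. source ~/.bashrc) to pick up the new alias.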
Speed up your cursor
Very simple, yet very handy. Imagine you have a set of FASTQ files that you want to map to a reference genome using a cellranger multi command, e.g.:
cellranger multi --id=library_A --csv=config.csv --disable-ui
You copy-paste the command into the terminal. Next, you want to edit the --id value (e.g. from library_A to library_X), which means moving the cursor left all the way to the --id part of the command. In cases like these, a faster cursor is handy!
Speed up your cursor (macOS):
- From the Apple menu, choose System Settings.
- Go to Keyboard (or type “key repeat rate” into the search bar).
- Increase Key repeat rate.
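The same setting can also be changed from the terminal. The sketch below assumes the macOS defaults keys KeyRepeat and InitialKeyRepeat, which are the commonly used keys for these sliders; the values 2 and 15 are illustrative assumptions, not recommendations, and set_key_repeat is a hypothetical function name. On any other system the function does nothing.

```shell
# Hypothetical helper: raise the key repeat rate on macOS from the terminal
set_key_repeat() {
  if [ "$(uname)" != "Darwin" ]; then
    echo "skipped: not macOS"
    return 0
  fi
  # Lower numbers mean faster repeat and a shorter delay (values are assumptions)
  defaults write -g KeyRepeat -int 2
  defaults write -g InitialKeyRepeat -int 15
  echo "key repeat updated; log out and back in to apply"
}
```

Run set_key_repeat once; the System Settings sliders remain the simplest way to revert.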
Edit files from the terminal
Let’s continue with the above example: cellranger multi --id=library_X --csv=config.csv --disable-ui.
Now, we want to modify the config.csv, which looks something like:
[gene-expression]
reference,/path/to/refdata-gex-GRCh38-2024-A
probe-set,/path/to/Chromium_Human_Transcriptome_Probe_Set_v1.1.0_GRCh38-2024-A.csv
no-bam,true
[libraries]
fastq_id,fastqs,feature_types
library_A,/path/to/data,Gene Expression
[samples]
sample_id,probe_barcode_ids,description
sample_a,BC001,control_1
sample_b,BC002,control_2
sample_c,BC003,test_1
sample_d,BC004,test_2
The above config.csv is exactly what we need to map our imaginary 10x Flex dataset. Let’s say we want to replace ‘test’ with ‘treatment’ in the sample description column.
Edit files from the terminal:
### Edit file with: sed ###
# view changes (you will see that test_1 and test_2 now appear as treatment_1 and treatment_2)
sed 's/test/treatment/g' config.csv
# save changes to a new file
sed 's/test/treatment/g' config.csv > config_new.csv
### Edit file with: vim ###
# open the file with vim
vim config.csv
# once in vim, use the following substitution command to replace test with treatment for all rows
# substitution command explained:
# : → enters command-line mode in Vim.
# % → applies the command to the entire file (from the first to the last line).
# s → stands for substitute.
# test → this is the pattern to search for.
# treatment → this is the replacement text.
# g → stands for global on each line, meaning “replace all occurrences on the line,” not just the first one.
:%s/test/treatment/g
# save changes (w - write) and exit vim (q - quit)
:wq
This is what your file will look like after you have replaced test with treatment:
[gene-expression]
reference,/path/to/refdata-gex-GRCh38-2024-A
probe-set,/path/to/Chromium_Human_Transcriptome_Probe_Set_v1.1.0_GRCh38-2024-A.csv
no-bam,true
[libraries]
fastq_id,fastqs,feature_types
library_A,/path/to/data,Gene Expression
[samples]
sample_id,probe_barcode_ids,description
sample_a,BC001,control_1
sample_b,BC002,control_2
sample_c,BC003,treatment_1
sample_d,BC004,treatment_2
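A plain s/test/treatment/g replaces every occurrence of test in the file, including any that happen to appear in a path. The sketch below shows one way to narrow the substitution to the [samples] section and to edit in place while keeping a backup. It runs on a throwaway copy; the /path/to/test_data path is a made-up example added only to show that it is left alone.

```shell
# Work on a throwaway copy so the real config.csv is untouched
tmpdir="$(mktemp -d)"
cat > "$tmpdir/config.csv" <<'EOF'
[libraries]
fastq_id,fastqs,feature_types
library_A,/path/to/test_data,Gene Expression
[samples]
sample_id,probe_barcode_ids,description
sample_c,BC003,test_1
sample_d,BC004,test_2
EOF

# Substitute only from the [samples] header to the end of the file,
# and only where "test_" starts a comma-separated field.
# -i.bak edits in place and keeps the original as config.csv.bak
sed -i.bak '/^\[samples\]/,$ s/,test_/,treatment_/g' "$tmpdir/config.csv"

# Show what changed (diff exits non-zero when files differ, hence || true)
diff "$tmpdir/config.csv.bak" "$tmpdir/config.csv" || true
```

Writing the suffix directly after -i (no space) is the form that works with both GNU sed on Linux and the BSD sed shipped with macOS.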
Show the branch name when in a git repo
Perfect, now that we have our terminal set up and our cellranger command running in the background, we have time for code development. Naturally, we will use git to manage our code. In the terminal, once we enter a git repository (a project folder with code, data, etc.), it is very helpful to automatically see the branch we are on, which would look something like this:
# no information about the branch is given
test-repository%
# the name of the branch is clearly visible
test-repository(branch-add-documentation-to-single-cell-flow)%
To show the branch name when in a git repo, add the following lines to your .bashrc / .zshrc file:
### In your .bashrc file ###
# Show git branch
parse_git_branch() {
git branch 2>/dev/null | sed -n '/\* /s///p'
}
# Export the custom shell prompt, including the git branch
export PS1="\[\e[97m\]\u@\h \[\e[38;2;36;230;193m\]\W \[\e[38;2;226;255;82m\]\$(parse_git_branch)\[\e[00m\] $"
### In your .zshrc file ###
autoload -Uz vcs_info
precmd() { vcs_info }
# allow ${...} in PROMPT to expand
setopt prompt_subst
# configure vcs_info for git
zstyle ':vcs_info:git:*' formats '%F{cyan}(%b)%f' # cyan branch name in parentheses
zstyle ':vcs_info:*' enable git
# set the prompt
PROMPT='%n@%m %1~ ${vcs_info_msg_0_} %# '
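To check a branch helper before wiring it into your prompt, you can try it in a throwaway repository. The sketch below uses git symbolic-ref as an alternative to parsing git branch output; current_git_branch and the demo branch name are hypothetical names chosen for illustration, and this is not a replacement for the snippets above.

```shell
# Hypothetical alternative to parse_git_branch: ask git for the branch name directly
current_git_branch() {
  # prints nothing outside a repository or on a detached HEAD
  git symbolic-ref --quiet --short HEAD 2>/dev/null
}

# Try it in a throwaway repository
demo_repo="$(mktemp -d)"
git init -q "$demo_repo"
# point HEAD at a branch with a recognisable name (no commit needed)
git -C "$demo_repo" symbolic-ref HEAD refs/heads/branch-add-documentation
( cd "$demo_repo" && current_git_branch )
```

Unlike the git branch | sed approach, this also reports the branch name in a freshly initialised repository that has no commits yet.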