How to get started with single-cell data analysis

Once you receive your single-cell sequencing data, you can get started with your single-cell data analysis. It can be overwhelming when you first download the data. Here, we’ll give you some tips for the first steps of single-cell sequencing data analysis.

Choose your method

The first and most important point is that there is no single, general way to analyze single-cell data. There are many different methods and software solutions. We advise you to test several of them in an exploratory effort to find the one that works best for your data and biological question.

You have three options to do your analysis:

  1. Learn R or Python and run one of the many single-cell analysis packages
  2. Download or buy software into which you can load your data to perform the downstream analysis and visualization
  3. Outsource your analysis to a collaborator or company

Here, we’ll give you some advice for the first option: performing your own analysis. We assume that you have count tables of mapped and demultiplexed data. This means that your data is in one gigantic table with genes as rows and cells as columns, which is the typical starting point for single-cell data analysis.
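To make the shape of such a count table concrete, here is a minimal sketch in Python with NumPy. The gene names, cell names, and counts are made up for illustration; a real table has thousands of genes and cells and is usually loaded from a file.

```python
import numpy as np

# Hypothetical toy count table: rows are genes, columns are cells.
genes = ["Actb", "Gapdh", "mt-Co1", "Cd3e"]
cells = ["cell_1", "cell_2", "cell_3"]
counts = np.array([
    [10, 0, 3],
    [5, 2, 0],
    [1, 8, 0],
    [0, 0, 4],
])

n_genes, n_cells = counts.shape
print(f"{n_genes} genes x {n_cells} cells")

# Look up one entry: the count of a given gene in a given cell.
value = counts[genes.index("Gapdh"), cells.index("cell_2")]
print("Gapdh in cell_2:", value)
```

Keeping the gene and cell labels next to the matrix, as in this sketch, is what dedicated analysis packages do for you behind the scenes.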

Step 1: Quality Control

It is essential to start with quality control (QC) of your data right away. The QC will give you an instant idea of how well your experiment worked at a technical level. If the quality is not sufficient, you might need to repeat the experiment.

Doing the quality control before moving on to downstream analysis can save you a lot of wasted time. A few helpful questions: Do you detect the number of cells and the read depth you were aiming for? Do most cells look good (see filtering below)? Are all samples equally represented in the data?

You can do this by loading your data in R or Python and looking at a few basic metrics. First, take a look at the number of successfully mapped reads. Second, check the number of cells in your dataset. And third, take a look at the number of genes detected per cell.
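As a rough sketch of how these basic metrics can be computed from a count table, assuming a small made-up matrix with genes as rows and cells as columns (the number of mapped reads usually comes from the report of your mapping step, so it is not shown here):

```python
import numpy as np

# Hypothetical toy count table (genes as rows, cells as columns).
counts = np.array([
    [10, 0, 3, 0],
    [5, 2, 0, 0],
    [1, 8, 0, 1],
    [0, 0, 4, 0],
    [7, 1, 2, 0],
])

n_cells = counts.shape[1]                   # number of cells (columns)
counts_per_cell = counts.sum(axis=0)        # total counts in each cell
genes_per_cell = (counts > 0).sum(axis=0)   # genes with at least one count

print("cells:", n_cells)
print("total counts per cell:", counts_per_cell)
print("genes detected per cell:", genes_per_cell)
```

In this toy example the last cell has a single count in a single gene, which is the kind of cell you would expect to drop during filtering.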

Step 2: Exploring filtering parameters

After the basic metrics, you can continue the quality control with three standard histograms.

A great first plot to make is a histogram of the total UMIs/cell. This will give you an overview of the distribution of UMI counts per cell and insight into which filtering cutoffs to use. It also gives you a rough estimate of how many cells yield good data.
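A minimal sketch of this histogram, assuming simulated UMI totals instead of real data. The two simulated populations, the bin edges, and the cutoff of 1000 UMIs are illustrative assumptions, not recommendations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical total UMIs per cell: a mix of poor and good cells.
umis_per_cell = np.concatenate([
    rng.poisson(200, size=300),    # e.g. empty wells or damaged cells
    rng.poisson(5000, size=700),   # cells with good coverage
])

# Bin on a log10 scale, since UMI totals often span orders of magnitude.
bin_edges = np.linspace(1.5, 4.5, 31)
hist, _ = np.histogram(np.log10(umis_per_cell), bins=bin_edges)

# Illustrative cutoff, chosen by eye from the gap between the two peaks.
min_umis = 1000
n_pass = int((umis_per_cell >= min_umis).sum())
print(f"{n_pass} of {umis_per_cell.size} cells have >= {min_umis} UMIs")
```

To actually draw the plot, you could pass the same log-transformed values and bin edges to a plotting function such as `matplotlib.pyplot.hist`.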

You can make a similar histogram for genes/cell, which tells you how complex your dataset is.

Third, a histogram of the percentage of mitochondrial reads per cell indicates how many stressed or low-quality cells there are*.
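The mitochondrial percentage itself is simple to compute. The sketch below assumes made-up counts and assumes that mitochondrial genes can be recognized by an "mt-" prefix in the gene name (commonly "mt-" in mouse and "MT-" in human annotations); check how your own annotation names them.

```python
import numpy as np

# Hypothetical toy data: genes as rows, cells as columns.
genes = ["Actb", "Gapdh", "mt-Co1", "mt-Nd1", "Cd3e"]
counts = np.array([
    [50, 10, 30],
    [30,  5, 20],
    [10, 40,  0],
    [10, 45,  0],
    [ 0,  0, 50],
])

# Assumption: mitochondrial genes carry an "mt-" prefix (case-insensitive).
is_mito = np.array([g.lower().startswith("mt-") for g in genes])

mito_counts = counts[is_mito].sum(axis=0)   # mitochondrial counts per cell
total_counts = counts.sum(axis=0)           # all counts per cell
pct_mito = 100 * mito_counts / total_counts
print("percent mitochondrial per cell:", pct_mito)
```

The resulting values can be binned into a histogram in the same way as the UMI totals.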

If you do this for all your samples, you will have an idea about the general quality of your dataset and if you have enough cells from each sample for your downstream analysis. If all is well, it is time to start answering your biological questions with clustering and differential gene expression analysis.
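To check whether each sample keeps enough cells, you can count the cells that pass your cutoffs per sample. This is a sketch with made-up per-cell values and purely illustrative cutoffs; in particular, the mitochondrial cutoff may not be appropriate for your cell type.

```python
import numpy as np
from collections import Counter

# Hypothetical per-cell QC values and sample labels.
sample = np.array(["A", "A", "A", "B", "B", "B", "B"])
umis = np.array([5200, 300, 4100, 6000, 4800, 150, 3900])
n_genes = np.array([2100, 150, 1800, 2500, 2000, 90, 1700])
pct_mito = np.array([4.0, 35.0, 6.5, 3.2, 55.0, 20.0, 5.1])

# Illustrative cutoffs only; pick your own from the histograms.
keep = (umis >= 1000) & (n_genes >= 500) & (pct_mito <= 25)

cells_total = Counter(sample)        # cells per sample before filtering
cells_kept = Counter(sample[keep])   # cells per sample after filtering
for name in sorted(cells_total):
    print(f"sample {name}: {cells_kept[name]} of {cells_total[name]} cells pass")
```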

*High mitochondrial content does not indicate low-quality cells for every cell type. This depends on your sample.

Step 3: Downstream analysis with dedicated analysis packages

After gaining some intuition about the QC metrics, it’s time to cluster your data. A helpful way to get started is to look into packages specifically made for single-cell analysis, like Seurat or Scanpy.

These packages are written for R and Python, respectively, and offer a wide set of tools. For example, they can normalize your data, correct for batch effects, filter out low-quality cells, and, most importantly, perform clustering and differential gene expression analysis.
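To give a feel for what normalization means here, the sketch below scales every cell to the same total and then log-transforms the values. This is one common, simple approach written in plain NumPy for illustration; it is not the exact procedure of Seurat or Scanpy, and the target of 10,000 counts per cell is an assumption.

```python
import numpy as np

# Hypothetical toy count table (genes as rows, cells as columns).
counts = np.array([
    [10, 0, 3],
    [5, 2, 0],
    [5, 8, 1],
], dtype=float)

target_sum = 10_000                        # assumed per-cell scaling target
totals = counts.sum(axis=0)                # total counts per cell
normalized = counts / totals * target_sum  # every cell now sums to target_sum
log_norm = np.log1p(normalized)            # log(1 + x) keeps zeros at zero

print(normalized.sum(axis=0))
print(log_norm.round(2))
```

After this step, differences in sequencing depth between cells no longer dominate the comparison of expression values.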

Before trying these packages, it might be good to first familiarize yourself with the R or Python basics. Great starting points are this concise introduction to R or this Python for beginners starting page.

Next, you can try one of the example vignettes that both packages provide, for example this Seurat vignette. Once you feel comfortable with the analysis and understand the plots it produces, you are ready to cluster and analyze your data!


Sounds too complex? If you generated your data with us, you could outsource your analysis to our experienced bioinformaticians. Contact us to discuss the options.