How to get started with your own data analysis

There is no single, generally accepted, failsafe way to analyze single-cell data. Many methods and software solutions exist. A practical way forward is to briefly try several of them in an exploratory round and pick the one that works best for your data and biological question.

Broadly speaking, there are three options to do your analysis:

Do your own analysis in R or Python (basic skills are enough to start), using one of the many single-cell analysis packages written over the past few years.
Download or buy software in which you can load your data to perform downstream analysis and visualization.
Outsource your analysis to a collaborator or company.

Data analysis by yourself

Step 1: Quality control

The first step after receiving your data is quality control (QC). Doing this promptly and thoroughly gives you an immediate idea of how well your experiment worked at a technical level. If the quality looks insufficient, that is important to know early, because you may need to repeat the experiment.

Spending time on proper QC before moving on to downstream analysis can save a lot of wasted effort. A few useful questions: Do you detect the number of cells and the read depth you were aiming for? Do most cells look good (see filtering below)? Are all samples equally represented in the data?

To answer these, you can load your data in R, Python or other analysis software of choice and look at a few basic metrics:

How many reads were successfully mapped?
How many cells did I detect?
How many genes do I detect per cell?
How many cells do I detect per sample?
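
As a rough illustration, most of these metrics can be computed directly from a count matrix. The sketch below is a minimal example that assumes a small, made-up cells x genes matrix and made-up sample labels; the function name basic_qc is hypothetical and not part of any package. The number of mapped reads is not included, because it comes from the report of your mapping pipeline rather than from the count matrix.

```python
import numpy as np
from collections import Counter

def basic_qc(counts, sample_labels):
    """Summarize a cells x genes count matrix (hypothetical helper)."""
    counts = np.asarray(counts)
    total_counts = counts.sum(axis=1)          # total counts (UMIs) per cell
    genes_per_cell = (counts > 0).sum(axis=1)  # genes with at least one count
    cells_per_sample = Counter(sample_labels)  # number of cells per sample
    return {
        "n_cells": counts.shape[0],
        "total_counts": total_counts,
        "genes_per_cell": genes_per_cell,
        "cells_per_sample": dict(cells_per_sample),
    }

# Toy data: 4 cells x 5 genes from two samples (made-up numbers).
toy_counts = [
    [3, 0, 1, 0, 2],
    [0, 0, 5, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 4],
]
toy_samples = ["A", "A", "B", "B"]

qc = basic_qc(toy_counts, toy_samples)
print("cells detected:", qc["n_cells"])
print("genes per cell:", qc["genes_per_cell"])
print("cells per sample:", qc["cells_per_sample"])
```

With real data, the count matrix would typically be loaded through your analysis package of choice rather than typed in by hand.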

Step 2: Determine filter parameters

Here, we explain three plots you can use to determine your filtering parameters. You want to set cutoffs to include only the cells that yield data of good quality.

A great first plot to make is a histogram of the total UMIs per cell. This gives you an overview of the distribution of UMI counts per cell and a first insight into which filtering cutoffs to use (and hence roughly how many cells yield good data).

You can do the same for genes per cell, which tells you how complex your dataset is. Third, a histogram of the percentage of mitochondrial reads per cell indicates how many stressed or low-quality cells there are*.

If you do this for all your samples, you will have a first idea of the general quality of your dataset and of whether you have enough cells from each sample for your downstream analysis. If all is well, it is time to start answering your biological questions with clustering and differential gene expression analysis.

*High mitochondrial content does not indicate a low-quality cell in every cell type. This depends on your sample.
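
The three quantities behind these plots, and the effect of a set of cutoffs, can be sketched as follows. This is a minimal example under stated assumptions: the toy matrix, the function name filter_cells, and the cutoff values are all made up for illustration and are not recommendations. Mitochondrial genes are recognized here by the "MT-" prefix of human gene symbols (compared case-insensitively so that mouse-style "mt-" symbols also match); other annotations may need a different rule.

```python
import numpy as np

def filter_cells(counts, gene_names, min_counts, min_genes, max_pct_mito):
    """Return a keep-mask plus the per-cell metrics (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)
    is_mito = np.array([g.upper().startswith("MT-") for g in gene_names])
    total = counts.sum(axis=1)            # total UMIs per cell
    n_genes = (counts > 0).sum(axis=1)    # genes detected per cell
    # np.maximum avoids dividing by zero for cells without any counts.
    pct_mito = 100.0 * counts[:, is_mito].sum(axis=1) / np.maximum(total, 1)
    keep = (total >= min_counts) & (n_genes >= min_genes) & (pct_mito <= max_pct_mito)
    return keep, total, n_genes, pct_mito

genes = ["GeneA", "GeneB", "GeneC", "MT-CO1"]
toy = [
    [10, 5, 5, 0],  # 20 UMIs, 3 genes, 0% mitochondrial
    [1, 0, 0, 9],   # 10 UMIs, 2 genes, 90% mitochondrial
    [1, 0, 0, 0],   # 1 UMI, 1 gene
    [8, 8, 2, 2],   # 20 UMIs, 4 genes, 10% mitochondrial
]

# Illustrative cutoffs only; choose yours from the histograms of your own data.
keep, total, n_genes, pct_mito = filter_cells(
    toy, genes, min_counts=5, min_genes=2, max_pct_mito=20.0
)

# Bin counts of UMIs per cell; a plotting library would draw these as bars.
hist, bin_edges = np.histogram(total, bins=4)
print("cells kept:", int(keep.sum()), "of", len(keep))
```

Returning the metrics alongside the mask makes it easy to plot the distributions first and only then decide on the cutoffs.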

Step 3: Downstream analysis

A useful way to get started with downstream analysis is to look into packages that are specifically made for single-cell analysis, like Seurat or Scanpy. Seurat is written for R and Scanpy for Python, and both offer a wide set of tools. You can normalize your data, correct for batch effects, filter out low-quality cells and, most importantly, perform clustering and differential gene expression analysis.

Before trying them, it is worth familiarizing yourself with the basics of R or Python (this will pay dividends in the long run!). Great starting points are a very short introduction to R or a Python-for-beginners starting page.

Next, you can try one of the example vignettes that both packages provide (for example, try to repeat the analysis of the often-used PBMC datasets). Once you feel comfortable with the analysis and understand the plots it produces, you are ready to cluster and analyze your own data!
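
To give a feel for what the normalization step does, here is a minimal sketch of one common approach: scale every cell to the same total count and then log-transform. It is written in plain NumPy for illustration and is not the implementation of Seurat or Scanpy; the function name log_normalize, the toy matrix, and the target of 10,000 counts per cell are assumptions made for this example.

```python
import numpy as np

def log_normalize(counts, target_sum=10_000):
    """Scale each cell to target_sum total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave empty cells as all zeros
    scaled = counts / totals * target_sum
    return np.log1p(scaled)

# Toy data: cell 2 has the same composition as cell 1 but 10x the depth.
toy = [
    [1, 1, 2],
    [10, 10, 20],
    [0, 0, 0],
]
norm = log_normalize(toy)
print(norm)
```

After normalization the first two cells have identical values, which is the point of the step: differences in sequencing depth no longer dominate the comparison between cells.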

Software

Code-free software solutions for single-cell data analysis are developing rapidly, with new and more sophisticated solutions appearing almost every month. Doing the analysis yourself gives you the most freedom to perform the analysis exactly how you want it, but not everyone has the time or expertise to choose this route.

If you don’t want to completely outsource the analysis, these software solutions are a great way of browsing and visualizing the data yourself without resorting to code.

If you would like to hear some recommendations from our staff, feel free to email us at data@scdiscoveries.com.

