How to download your data
When your data has arrived and the preliminary analysis has been performed, you will receive an e-mail from our data team with a link to download your data.
Your data will be shared via our secure Amazon Web Services (AWS) server. There are multiple ways to download:
- Your institute uses AWS as a data portal. In this case, you can fetch your data directly from our AWS server space to your own AWS server space.
- Your institute does not use AWS. In this case, you can either:
  - Download your data by pasting the download link directly into your internet browser, or
  - Download your data via a command-line session on your server cluster.
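For the command-line route, the download can also be scripted. The following is a minimal sketch in Python, assuming the link in your e-mail is a plain HTTPS URL that can be fetched without additional authentication. The function name, the example URL and the output path are hypothetical placeholders.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, destination: str, chunk_size: int = 1024 * 1024) -> Path:
    """Stream the content at `url` to `destination` without loading it all into memory."""
    target = Path(destination)
    # Create the output folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(target, "wb") as out_file:
        # Copy in chunks so that large sequencing files do not exhaust memory.
        shutil.copyfileobj(response, out_file, length=chunk_size)
    return target


# Example call (hypothetical link and file name; paste your own link instead):
# download_file("https://example.com/your-download-link", "downloads/my_project_data.zip")
```

Streaming in chunks matters here because sequencing deliveries are often many gigabytes in size.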
We send parts of the standard Cell Ranger output, consisting of the files listed below, together with a package of files and reports from our own preliminary clustering and differential gene expression analysis, which is performed on all of your samples that need to be compared.
The following files are always included:
- Html reports – Contains one .html file per sample with basic QC metrics and clustering info, as provided by the automated Cell Ranger software.
- Raw_counts – Contains the Cell Ranger output (mapped count tables). These are the raw count matrices with all barcodes.
- Filtered_counts – Same as above, but with empty barcodes filtered out (this is used for downstream analysis).
- Metrics – Contains the QC metrics from the html files in .csv format.
- Clustering – Contains a preliminary clustering and differential gene expression analysis we have done with Seurat. You can find the description of the folders and files in our Seurat manual.
- cloupe_file – Contains the cloupe files needed to load the data into the 10x Loupe cell browser.
- H5 files – Contain the file structure information as generated by Cell Ranger.
- Fastq – Contains the raw fastq files.
.bam files – If you would like to receive the .bam files produced by Cell Ranger, these can be included in the data transfer as well.
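As an illustration of what the count tables look like on disk, here is a minimal sketch that reads one count matrix using only the Python standard library. It assumes each sample folder holds the three gzipped files that recent Cell Ranger versions write (matrix.mtx.gz, features.tsv.gz and barcodes.tsv.gz); the exact folder layout of your delivery may differ, and the function name is hypothetical. In practice, packages such as Seurat (Read10X) or Scanpy (read_10x_mtx) do this for you.

```python
import gzip
from pathlib import Path


def read_10x_matrix(folder: str):
    """Read a Cell Ranger style matrix folder into plain Python structures.

    Returns (genes, barcodes, counts), where counts maps
    (gene_index, barcode_index) -> UMI count, with 0-based indices.
    """
    base = Path(folder)
    with gzip.open(base / "features.tsv.gz", "rt") as handle:
        # Assumed columns: gene ID, gene name, feature type; keep the gene name.
        genes = [line.rstrip("\n").split("\t")[1] for line in handle if line.strip()]
    with gzip.open(base / "barcodes.tsv.gz", "rt") as handle:
        barcodes = [line.strip() for line in handle if line.strip()]

    counts = {}
    with gzip.open(base / "matrix.mtx.gz", "rt") as handle:
        # Skip the Matrix Market banner and comment lines, which start with '%'.
        data_lines = (line for line in handle if not line.startswith("%"))
        n_rows, n_cols, _n_entries = map(int, next(data_lines).split())
        if (n_rows, n_cols) != (len(genes), len(barcodes)):
            raise ValueError("Matrix dimensions do not match features/barcodes files")
        for line in data_lines:
            if not line.strip():
                continue
            row, col, value = line.split()
            # Matrix Market indices are 1-based; convert to 0-based.
            counts[(int(row) - 1, int(col) - 1)] = int(value)
    return genes, barcodes, counts
```

Only non-zero entries are stored, which is why the matrix is kept as a dictionary here rather than as a full table.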
When performing a mapping assembly, the generated data is ‘mapped’ against the reference genome. The reference genome stores the genetic information of your species of interest, e.g., the sequences and positions of genes on the chromosomes. During this procedure, the algorithm scans the reference genome for the best-matching position for each read. When the algorithm finds a match between a read and the reference genome, the read is annotated with this biological information. This tells you whether transcripts of specific genes are present or absent in your data.
Mapping is most efficient when the mapping software first indexes the genome. Two widely used mapping tools are Spliced Transcripts Alignment to a Reference (STAR) and Burrows-Wheeler Aligner (BWA). The main difference between the two is that STAR is splice-aware and therefore provides extra information about the spliced and unspliced transcripts in your data. Because of this additional information, we map 10x Genomics data with STAR.
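To illustrate why indexing speeds up mapping, here is a toy sketch: all short subsequences (k-mers) of a reference are stored in a lookup table once, after which each read only has to be checked against a few candidate positions. This is a deliberately simplified exact-match illustration, not the algorithm STAR or BWA actually use (STAR relies on suffix arrays and BWA on the Burrows-Wheeler transform), and it ignores mismatches and splicing. The function names and sequences are made up for the example.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for start in range(len(reference) - k + 1):
        index[reference[start:start + k]].append(start)
    return index


def map_read(read: str, reference: str, index: dict, k: int) -> list:
    """Return all 0-based reference positions where the read matches exactly."""
    if len(read) < k:
        return []
    # The index lookup narrows the search to a few candidate positions,
    # so the full reference does not have to be scanned for every read.
    candidates = index.get(read[:k], [])
    return [pos for pos in candidates if reference[pos:pos + len(read)] == read]


reference = "ACGTACGTTAGCCGATAGCTAGGATCC"
index = build_kmer_index(reference, k=4)
print(map_read("TAGCCGAT", reference, index, k=4))  # read present in the reference
print(map_read("GGGGGGGG", reference, index, k=4))  # read absent from the reference
```

The index is built once per genome and reused for every read, which is where the efficiency gain comes from.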
How to get started with your data analysis
There is no single generally accepted, failsafe way to analyze single-cell data; many different methods and software solutions exist. A general way forward is to briefly test several of them in an exploratory effort to find the one that works best for your data and biological question.
Broadly speaking, there are three options to do your analysis:
- Do your own analysis, by running one of the many single-cell analysis packages that have been written over the past few years with (basic) R or Python.
- Download or buy software in which you can load your data to perform downstream analysis and visualization.
- Outsource your analysis to a collaborator or company.
Data analysis by yourself
Step 1: Quality control
The first step after receiving your data is quality control. If you do this quickly and thoroughly, you will have an immediate idea of how well your experiment worked at a technical level. If the quality looks insufficient, this is important to know early, as it might be necessary to repeat the experiment.
Spending time on proper QC before moving on to downstream analysis can save a lot of (wasted) time. A few useful questions: Do you detect the number of cells and read depth you were looking for? Do most cells look good (see filtering below)? Are all samples equally represented in the data?
To answer these, you can load your data in R, Python or other analysis software of choice and look at a few basic metrics:
- How many reads were successfully mapped?
- How many cells did I detect?
- How many genes do I detect per cell?
- How many cells do I detect per sample?
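Several of these questions can be answered from the .csv files in the Metrics folder. The sketch below assumes one .csv file per sample with a single header row and a single data row, with numbers formatted like "5,247" or "95.3%", as in Cell Ranger's metrics summary. The column names in the usage example are assumptions and should be checked against your own files.

```python
import csv
from pathlib import Path


def parse_metric(value: str):
    """Convert text such as '5,247' or '95.3%' to a float; leave other text unchanged."""
    cleaned = value.replace(",", "").strip()
    is_percent = cleaned.endswith("%")
    try:
        number = float(cleaned.rstrip("%"))
    except ValueError:
        return value
    # Percentages are returned as fractions (95.3% -> 0.953).
    return number / 100 if is_percent else number


def load_sample_metrics(csv_path: str) -> dict:
    """Read a one-row metrics CSV into a {metric name: value} dictionary."""
    with open(csv_path, newline="") as handle:
        row = next(csv.DictReader(handle))
    return {name: parse_metric(text) for name, text in row.items()}


def summarize(metrics_dir: str, columns: list) -> dict:
    """Collect the chosen metrics for every CSV file in a folder, keyed by file name."""
    summary = {}
    for path in sorted(Path(metrics_dir).glob("*.csv")):
        metrics = load_sample_metrics(str(path))
        summary[path.stem] = {col: metrics.get(col) for col in columns}
    return summary


# Example call (folder and column names are assumptions):
# summarize("Metrics", ["Estimated Number of Cells", "Median Genes per Cell", "Reads Mapped to Genome"])
```

Putting all samples side by side in one summary makes it easy to spot a sample with far fewer cells or a lower mapping rate than the rest.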
Step 2: Determine filter parameters
Here, we explain three plots you can use to determine your filtering parameters. You want to set cutoffs to include only the cells that yield data of good quality.
A great first plot to make is a histogram of the total UMIs per cell. This will give you an overview of the distribution of UMIs per cell and a first insight into what filtering cutoffs to use (and hence also roughly how many cells yield good data).
You can do the same for genes per cell, which will tell you how complex your dataset is. Third, a histogram of the percentage of mitochondrial reads per cell indicates how many stressed or low-quality cells there are*.
If you do this for all your samples, you will have a first idea of the general quality of your dataset and whether you have enough cells from each sample for your downstream analysis. If all is well, it is time to start answering your biological questions with clustering and differential gene expression analysis.
*High mitochondrial content does not equal a bad quality cell for all cell types. This depends on your sample.
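The three metrics behind these plots can be computed per cell as in the sketch below. It uses a small made-up dataset, bins the values for a histogram, and applies example cutoffs. The "MT-" prefix assumes human gene symbols (mouse genes use "mt-"), and the cutoff values are arbitrary illustrations, not recommendations.

```python
from collections import Counter


def per_cell_qc(cells: dict, mito_prefix: str = "MT-") -> dict:
    """Compute total UMIs, detected genes and mitochondrial percentage per cell.

    `cells` maps barcode -> {gene name: UMI count}.
    """
    qc = {}
    for barcode, gene_counts in cells.items():
        total = sum(gene_counts.values())
        mito = sum(n for gene, n in gene_counts.items() if gene.startswith(mito_prefix))
        qc[barcode] = {
            "total_umis": total,
            "n_genes": sum(1 for n in gene_counts.values() if n > 0),
            "pct_mito": 100.0 * mito / total if total else 0.0,
        }
    return qc


def histogram(values, bin_width: float) -> dict:
    """Count values per bin; keys are the lower edge of each bin."""
    bins = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))


def passing_cells(qc: dict, min_umis: int, min_genes: int, max_pct_mito: float) -> list:
    """Return the barcodes that satisfy all three cutoffs."""
    return [
        barcode for barcode, m in qc.items()
        if m["total_umis"] >= min_umis
        and m["n_genes"] >= min_genes
        and m["pct_mito"] <= max_pct_mito
    ]


# Toy data: three cells, one of them with a high mitochondrial fraction.
cells = {
    "AAAC-1": {"GeneA": 40, "GeneB": 55, "MT-CO1": 5},
    "AAAG-1": {"GeneA": 3, "MT-CO1": 7},
    "AAAT-1": {"GeneA": 120, "GeneB": 60, "GeneC": 15, "MT-ND1": 5},
}
qc = per_cell_qc(cells)
print(histogram([m["total_umis"] for m in qc.values()], bin_width=50))
print(passing_cells(qc, min_umis=50, min_genes=3, max_pct_mito=20.0))
```

On real data, the histogram counts would be plotted, and the cutoffs chosen by looking at where the distributions separate good cells from the rest.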
Step 3: Downstream analysis
A useful way to get started with downstream analysis is to look into packages that are specifically made for single-cell analysis, like Seurat or Scanpy. These packages are written for R and Python, respectively, and offer a wide set of tools. You can normalize your data, correct for batch effects, filter out low-quality cells, and, most importantly, perform clustering and differential gene expression analysis.
Before trying them, it might be good to first familiarize yourself with the R or Python basics (this will pay dividends in the long run!). Great starting points are this very short introduction to R or this Python for beginners starting page.
Next, you can try one of the example vignettes both these packages have (try to repeat the analysis of the often-used PBMC datasets for example). Once you feel comfortable with the analysis and understand the plots it produces, you are ready to cluster and analyze your own data!
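To give a feel for what one of these steps does under the hood, here is a minimal sketch of log-normalization in plain Python: each cell is scaled to the same total count and then log-transformed. This resembles the default normalization in Seurat (scale factor 10,000), but it is an illustration on a made-up data structure, not a replacement for the packages' own functions.

```python
import math


def log_normalize(cells: dict, scale_factor: float = 10_000) -> dict:
    """Scale each cell to `scale_factor` total counts, then apply log(1 + x).

    `cells` maps barcode -> {gene name: UMI count}.
    """
    normalized = {}
    for barcode, gene_counts in cells.items():
        total = sum(gene_counts.values())
        if total == 0:
            # A cell without counts cannot be scaled; keep all values at zero.
            normalized[barcode] = {gene: 0.0 for gene in gene_counts}
            continue
        normalized[barcode] = {
            gene: math.log1p(count / total * scale_factor)
            for gene, count in gene_counts.items()
        }
    return normalized


# Two toy cells with the same expression profile but different sequencing depth.
toy_cells = {
    "shallow": {"GeneA": 1, "GeneB": 3},
    "deep": {"GeneA": 10, "GeneB": 30},
}
normalized = log_normalize(toy_cells)
print(normalized["shallow"]["GeneA"], normalized["deep"]["GeneA"])
```

After normalization, the two toy cells get identical values, which shows how this step removes differences that are only due to sequencing depth.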
Code-free software solutions for single-cell data analysis are developing rapidly, with new and more sophisticated solutions appearing almost every month. Doing the analysis yourself gives you the most freedom to perform the analysis exactly how you want it, but not everyone has the time or expertise to choose this route.
For your 10x Genomics data, you can use their Loupe browser to do clustering and differential gene expression analysis. However, it only allows you to analyze one sample at a time. For comparisons between multiple samples, or for data generated on other platforms, different software is needed.
If you don’t want to completely outsource the analysis, these software solutions are a great way of browsing and visualizing the data yourself without resorting to code.
If you would like to hear some recommendations from our staff, feel free to email us at email@example.com
Please mention Single Cell Discoveries in your acknowledgments if we helped you with single-cell sequencing experiments and/or data analysis. We always love to know when our clients have published data generated by our services, so please also let us know when your paper comes online, so we can help promote it. Sample text for acknowledgments:
“We thank Single Cell Discoveries for their help with project design, single-cell sequencing services and data analysis”
Materials & Methods Single Cell Gene Expression
This sample text is for the Single Cell Gene Expression solution. Please adjust the highlighted text and other details where necessary.
Single-cell mRNA sequencing was performed at Single Cell Discoveries (single-cell sequencing service, the Netherlands) according to standard 10x Genomics 5’ or 3’ V3.1 chemistry. In short, single-cell suspensions were [fixed in 80% methanol and frozen at -80°C. Prior to loading the cells on the 10x Chromium controller, cells were rehydrated in rehydration buffer following vendor instructions¹.] OR [cryopreserved following X protocol, stored at -80°C and thawed according to vendor instructions².]
Cells were then filtered to prevent clumping and counted to assess cell integrity and concentration. Cells were loaded and the resulting sequencing libraries were prepared following the standard 10x Genomics protocol. The DNA libraries were paired-end sequenced on an Illumina NovaSeq S4, with a 2×150 bp Illumina kit.
BCL files resulting from sequencing were transformed to FASTQ files with 10x Genomics Cell Ranger mkfastq. FASTQ files were mapped with Cell Ranger count. During sequencing, Read 1 was assigned 28 base pairs and was used for identification of the Illumina library barcode, cell barcode, and UMI. Read 2 was used to map to the reference transcriptome [Hg38 GRCh38 3.0.0]. Filtering of empty barcodes was done with Cell Ranger. Unsupervised clustering and differential gene expression analysis were done with the Seurat R toolkit (Butler et al. 2018).
- Butler et al. 2018. PMID: 29608179
Materials & Methods Single Cell Immune Profiling
Single-cell mRNA sequencing was performed at Single Cell Discoveries (single-cell sequencing service, the Netherlands) on a 10x Genomics Chromium controller, following the 10x Genomics v2 single-cell VDJ Next GEM chemistry. In short, single-cell suspensions were cryopreserved following X protocol, stored at -80°C, and thawed according to vendor instructions². Cells were then filtered to prevent clumping and counted to assess cell integrity and concentration. Cells were loaded and the resulting sequencing libraries were prepared following standard 10x Genomics protocols, generating a transcriptome and a V(D)J library from each experiment. The DNA libraries were paired-end sequenced on an Illumina NovaSeq S4, with a 2×150 bp Illumina kit.
BCL files resulting from sequencing were transformed to FASTQ files with 10x Genomics Cell Ranger mkfastq. FASTQ files were mapped with Cell Ranger vdj. During sequencing, Read 1 was assigned 28 base pairs and was used for identification of the Illumina library barcode, cell barcode, and UMI. Read 2 was used to map to the reference transcriptome [Hg38 GRCh38 3.0.0]. Filtering of empty barcodes was done with Cell Ranger. Unsupervised clustering and differential gene expression analysis were done with the Seurat R toolkit (Butler et al. 2018).