Data analysis

How to download your data

When your data has arrived and the preliminary analysis has been performed, you will receive an e-mail from our data team with a link to download your data.

Your data will be shared via our secure Amazon Web Services (AWS) server. There are multiple ways to download it:

  • Your institute uses AWS as a data portal.
    In this case, you can fetch your data directly from our AWS server space to your own AWS server space.
  • Your institute does not use AWS.
    • Download your data by pasting the download link directly into your internet browser.
    • Download your data via a command-line session on your server cluster.
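
For the command-line route, the download can also be scripted. The following is a minimal sketch, assuming the link in your e-mail is a plain HTTPS URL that needs no extra authentication; the function name `download_file` and the example paths are hypothetical and not part of our tooling.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, destination: str, chunk_size: int = 1024 * 1024) -> Path:
    """Stream the content at `url` to `destination` and return the local path."""
    target = Path(destination)
    # Create the output folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # copyfileobj copies in chunks, so a large archive is never
    # held in memory all at once.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return target


# Example usage (placeholder link and file name; paste the link from your e-mail):
# saved = download_file("https://example.com/your-download-link", "scd_data/download.zip")
# print(saved, saved.stat().st_size, "bytes")
```

Streaming in chunks is a deliberate choice here, since sequencing deliveries can be many gigabytes in size.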

Data files

The Diagnostics file contains basic QC and diagnostic plots for your samples. It contains the following information:

  • Reads per sample – Shows the total number of mapped reads for all samples.
  • Genes per sample – Shows the total number of genes detected for all samples.
  • Correlation heatmap – Shows how similar samples are to each other (red = similar, blue = different). Samples are ordered by unsupervised clustering based on their similarity.
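
The three diagnostics above can be reproduced from a count table. The following is a minimal sketch on a made-up toy table. It assumes Pearson correlation as the similarity measure, which may differ from the measure used for the actual heatmap; all names and numbers are illustrative.

```python
from math import sqrt

# Toy count table (made-up numbers): genes as rows, samples as columns.
samples = ["S1", "S2", "S3"]
counts = {
    "GeneA": [10, 12, 0],
    "GeneB": [0, 0, 25],
    "GeneC": [30, 33, 5],
    "GeneD": [5, 4, 40],
}


def reads_per_sample(counts, n_samples):
    """Total number of reads per sample (column sums)."""
    return [sum(row[i] for row in counts.values()) for i in range(n_samples)]


def genes_per_sample(counts, n_samples):
    """Number of genes with at least one read in each sample."""
    return [sum(1 for row in counts.values() if row[i] > 0) for i in range(n_samples)]


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(counts, n_samples):
    """Sample-by-sample correlation matrix, the values behind a heatmap."""
    columns = [[row[i] for row in counts.values()] for i in range(n_samples)]
    return [[pearson(a, b) for b in columns] for a in columns]


n = len(samples)
total_reads = reads_per_sample(counts, n)
detected_genes = genes_per_sample(counts, n)
corr = correlation_matrix(counts, n)
```

In this toy table, S1 and S2 have nearly identical profiles and correlate strongly, while S3 is anti-correlated with both, so a clustering step would group S1 with S2.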

The Counts folder contains a count table with the expression matrix of your data. Rows are genes and columns are samples.

The *_raw file contains the raw, unnormalized read counts.

The *_normalized file contains the RPM (reads per million) normalized counts. We use RPM rather than RPKM normalization because library preparation is based on CEL-seq, which is a 3’-based method. Normalization should therefore not take the full length of the transcript into account.
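
As an illustration, RPM scales each count by the total number of reads in its sample: RPM = count / total reads in the sample × 1,000,000. The following is a minimal sketch of that calculation on a made-up table; the function name `rpm_normalize` is hypothetical, and it assumes the per-sample total of the count table as the denominator.

```python
def rpm_normalize(raw_counts):
    """Return reads-per-million values for a {gene: [count per sample]} table."""
    n_samples = len(next(iter(raw_counts.values())))
    # Total reads per sample (column sums) are the scaling factors.
    totals = [sum(row[i] for row in raw_counts.values()) for i in range(n_samples)]
    return {
        gene: [
            # Transcript length is deliberately not used (RPM, not RPKM).
            count / total * 1_000_000 if total else 0.0
            for count, total in zip(row, totals)
        ]
        for gene, row in raw_counts.items()
    }


# Made-up example: sample 2 was sequenced 20x deeper than sample 1.
raw = {
    "GeneA": [10, 200],
    "GeneB": [30, 600],
    "GeneC": [60, 1200],
}
rpm = rpm_normalize(raw)
```

After normalization both samples sum to one million and each gene has the same value in both columns, which shows that the difference in sequencing depth has been removed.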

Fastq contains the raw sequencing FASTQ files for all samples.

Mapping data

During mapping, your sequencing reads are aligned (‘mapped’) against the reference genome. The reference genome stores the genetic information of your species of interest, e.g., the sequences and positions of genes on the chromosomes. The mapping algorithm scans the reference genome for the best-matching position for each read. When it finds a match between a read and the reference genome, the biological information stored at that position, such as the gene the read originates from, is added to your data. This tells you whether transcripts of specific genes are present or absent in your data.

Mapping is most efficient when the mapping software first indexes the genome. Two widely used mapping tools are Spliced Transcripts Alignment to a Reference (STAR) and Burrows-Wheeler Aligner (BWA). The main difference between the two is that STAR provides additional information about the spliced and unspliced transcripts in your data. Because of this additional information, Single Cell Discoveries (SCD) will soon switch from BWA to STAR for bulk RNA sequencing.
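
To illustrate why indexing makes mapping efficient, here is a toy sketch of index-based mapping. It is not how BWA or STAR work internally: real aligners use much more sophisticated indexes and tolerate mismatches, whereas this sketch only finds exact matches. The sequences and function names are made up.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read: str, reference: str, index: dict, k: int) -> list:
    """Return all reference positions where the read matches exactly."""
    if len(read) < k:
        return []
    # Look up the first k bases (the seed) in the index instead of
    # scanning the whole reference, then verify the full read at each hit.
    candidates = index.get(read[:k], [])
    return [pos for pos in candidates if reference[pos:pos + len(read)] == read]


reference = "ACGTACGTTTGACCA"  # made-up toy reference sequence
k = 4
index = build_index(reference, k)

unique_hit = map_read("TTGACC", reference, index, k)  # one position
multi_hit = map_read("ACGT", reference, index, k)     # two positions
no_hit = map_read("GGGG", reference, index, k)        # no position
```

The index is built once and reused for every read, so each lookup avoids a full scan of the reference. A read that matches equally well at more than one position, like the second example, is ambiguous.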



Please mention Single Cell Discoveries in your acknowledgements if we helped you with single-cell sequencing experiments and/or data analysis. We always love to know when our clients have published data generated by our services, so please also let us know when your paper comes online, so we can help promote it. Sample text for acknowledgements:

“We thank Single Cell Discoveries for their help with project design, single-cell sequencing services, and data analysis”

Thank you!

Materials & Methods

Sequencing was performed at Single Cell Discoveries, a sequencing service provider located in the Netherlands, using an adapted version of the CEL-seq protocol. In brief:

Total RNA was extracted using the standard TRIzol (Invitrogen) protocol* and used for library preparation and sequencing. mRNA was processed as described previously, following an adapted version of the single-cell mRNA-seq protocol of CEL-Seq [Hashimshony et al., 2012; PMID: 22939981 and Simini et al., 2014; PMID: 25500896]. In brief, samples were barcoded with CEL-seq primers during reverse transcription and pooled after second-strand synthesis. The resulting cDNA was amplified with an overnight in vitro transcription reaction. From this amplified RNA, sequencing libraries were prepared with Illumina TruSeq small RNA primers. Paired-end sequencing was performed on the Illumina NextSeq 500 platform. Read 1 was used to identify the Illumina library index and CEL-Seq sample barcode. Read 2 was aligned to the hg38 human reference transcriptome using BWA [Li and Durbin, 2010; PMID: 20080505]. Reads that mapped equally well to multiple locations were discarded. Mapping and generation of count tables were done using the MapAndGo script. Samples were normalized using RPM normalization.


Updated on May 20, 2021
