Our data analysis

How to download your data

When your data has arrived and the preliminary analysis has been performed, you will receive an e-mail from our data team with a link to download your data.

Your data is shared via our secure Amazon Web Services (AWS) server. There are several ways to download it:

  • Your institute uses AWS as a data portal
    In this case, you can fetch your data directly from our AWS server space into your own AWS server space.
  • Your institute does not use AWS
    • Download your data by pasting the download link directly into your internet browser
    • Download your data via a command-line session on your server cluster
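
For the command-line route, the download can also be scripted. The following is a minimal sketch, assuming the link from the e-mail is an ordinary URL that can be fetched without extra authentication; the function name, the example URL and the output path are hypothetical placeholders.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, destination: str) -> Path:
    """Stream the content behind `url` to `destination` and return the path."""
    target = Path(destination)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so large archives never have to fit in memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Hypothetical usage: paste the link from the e-mail between the quotes.
# download_file("https://example.com/your-download-link", "data/my_project_download")
```

Streaming the response to disk keeps memory use low, which matters because sequencing deliveries are often many gigabytes in size.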

Data files

Data files | 10x Genomics

We send parts of the standard Cell Ranger output, which consists of the files listed below. We also send a package of files and reports from our own preliminary clustering and differential gene expression analysis, performed on all samples that are to be compared.

The following files are always included:

  • Html reports – Contains an .html file for each sample with basic QC metrics and clustering information, as produced by the automated Cell Ranger software.
  • Raw_counts – Contains the Cell Ranger output (mapped count tables). These are the raw count matrices with all barcodes.
  • Filtered_counts – Same as Raw_counts, but with empty barcodes filtered out (this is used for downstream analysis).
  • Metrics – Contains the QC metrics from the html reports in .csv format.
  • Clustering – Contains a preliminary clustering and differential gene expression analysis we have done with Seurat. You can find the description of the folders and files in our Seurat manual.
  • cloupe_file – Contains the cloupe files needed to load the data into the 10x Loupe Cell Browser.
  • H5 files – Contain the file structure information as generated by Cell Ranger.
  • Fastq – Contains the raw FASTQ files.

Optional:

.bam files – If you would like to receive the .bam files produced by Cell Ranger, these can be included in the data transfer as well.
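
The .csv files in the Metrics folder can be read programmatically to compare samples. The following is a minimal sketch, assuming each file has one header row and one row of values, and that values may carry thousands separators or percent signs (e.g. "5,247" or "95.6%"); the function names are hypothetical.

```python
import csv


def parse_metric(value: str):
    """Turn '5,247' into 5247.0 and '95.6%' into 95.6; return other text unchanged."""
    cleaned = value.replace(",", "").rstrip("%")
    try:
        return float(cleaned)
    except ValueError:
        return value


def read_metrics(path: str) -> dict:
    """Read the single row of metrics from one sample's .csv file."""
    with open(path, newline="") as handle:
        row = next(csv.DictReader(handle))
    return {name: parse_metric(value) for name, value in row.items()}
```

Calling `read_metrics` on each sample's file and collecting the resulting dictionaries gives a quick side-by-side overview of the QC metrics.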

Data files | SORT-seq

The diagnostics file contains the QC and diagnostic plots for each plate. Here you can check how well each cell worked (endogenous reads, UMIs and genes per cell) and how well the technical handling of the plate worked (ERCC spike-in reads). It also includes a plot to estimate oversequencing, which shows how many times each molecule was sequenced (by comparing UMI-corrected counts with raw reads). This tells us whether enough sequencing depth was assigned to the sample. It also reports the most highly expressed genes and the most variable genes in the library. You can see an example of such a diagnostic plot here.


The Counts folder contains count tables. These come in three flavors:

  • read counts (raw mapped reads)
  • barcode counts (UMI-corrected version of the read counts)
  • transcript counts (version of the barcode counts corrected with Poisson counting statistics). This is the file we use as input for downstream analysis, since it comes closest to the real situation in the cell.

Each of these three file types is then also split into three separate kinds of mapped reads:

  • Exonic reads – reads found in exons
  • Intronic reads – reads found in introns
  • Total reads – combination of both intronic and exonic reads

It is up to you to decide which one you are most interested in. For clustering and downstream analysis, we usually use the transcript counts of the exonic reads.
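
The step from barcode counts to transcript counts can be illustrated with a small calculation. This article does not spell out the formula; a form commonly used for UMI-based protocols estimates the number of transcripts t from the number of distinct UMIs observed k as t = -K * ln(1 - k/K), where K is the number of possible UMI sequences. The sketch below assumes this form and K = 4096 (all 6-nucleotide UMIs); both are assumptions, and the function name is hypothetical.

```python
import math


def poisson_corrected_count(observed_umis: int, n_possible_umis: int = 4096) -> float:
    """Estimate the transcript count from the number of distinct UMIs observed.

    Assumes UMIs are drawn uniformly at random from `n_possible_umis`
    sequences, so two different molecules can receive the same UMI.
    """
    if not 0 <= observed_umis < n_possible_umis:
        raise ValueError("observed_umis must be in [0, n_possible_umis)")
    return -n_possible_umis * math.log(1 - observed_umis / n_possible_umis)
```

For low counts the correction is negligible, and it grows as the observed number of UMIs approaches K, because UMI collisions become more likely.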


Fastq contains the raw sequencing FASTQ files. These typically are a set of 8 files: two reads (Read 1 and Read 2) from each of the four lanes of the NextSeq 500. R1 and R2 in the file name indicate the read type, and L00X indicates the lane. To map, we concatenate all Read 1 files into one file and all Read 2 files into another, and use these two files as input for mapping with STAR.
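
The lane concatenation described above can be sketched as follows. This is an illustration under assumptions, not our exact pipeline: it assumes gzip-compressed files ending in .fastq.gz whose names contain the lane as _L00X_ followed by R1 or R2, and the function name and output naming are hypothetical. Gzip files can be joined byte by byte, so no decompression is needed.

```python
import re
import shutil
from pathlib import Path

# Assumed naming pattern, e.g. plate_L001_R1_001.fastq.gz
LANE_READ = re.compile(r"_L00(\d)_(R[12])")


def concatenate_lanes(fastq_dir: str, output_prefix: str) -> dict:
    """Join all lanes per read type into <output_prefix>_R1/_R2.fastq.gz."""
    groups = {"R1": [], "R2": []}
    for path in sorted(Path(fastq_dir).glob("*.fastq.gz")):
        match = LANE_READ.search(path.name)
        if match:
            groups[match.group(2)].append((int(match.group(1)), path))

    outputs = {}
    for read, lane_files in groups.items():
        if not lane_files:
            continue
        out_path = Path(f"{output_prefix}_{read}.fastq.gz")
        with open(out_path, "wb") as out:
            # Sort by lane number so R1 and R2 stay in the same order.
            for _lane, path in sorted(lane_files):
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)
        outputs[read] = out_path
    return outputs
```

Keeping the lane order identical for Read 1 and Read 2 matters, because paired-end mappers expect the two files to list the reads in the same order.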

The clustering folder contains a preliminary clustering analysis done with Seurat.

Data files | Bulk RNA sequencing

The diagnostics file contains basic QC and diagnostic plots for your samples. It contains the following information:

  • Reads per sample – Shows the total number of mapped reads for all samples.
  • Genes per sample – Shows the total number of genes detected for all samples.
  • Correlation heatmap – Shows how similar samples are to each other (red = similar, blue = different). Samples are grouped by unsupervised clustering based on their similarity.
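
To illustrate what a correlation heatmap summarizes, here is a minimal sketch that computes pairwise correlations between samples. It assumes Pearson correlation on per-gene counts; the metric actually used for the heatmap is not stated here, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(samples: dict) -> dict:
    """Map every pair of sample names to the correlation of their counts."""
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names} for a in names}
```

Values close to 1 correspond to the "similar" end of the color scale, and lower values to the "different" end.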

The Counts folder contains a count table with the expression matrix of your data. Rows are genes and columns are samples.
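
With genes as rows and samples as columns, per-sample summaries similar to those in the diagnostics file can be derived from the count table. The following is a minimal sketch, assuming a tab-separated file whose first row holds the sample names and whose first column holds the gene identifiers; the file layout and the function name are assumptions.

```python
import csv


def summarize_count_table(path: str, delimiter: str = "\t"):
    """Return (total counts per sample, number of detected genes per sample)."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        samples = next(reader)[1:]  # first cell is the gene column header
        reads = dict.fromkeys(samples, 0.0)
        genes = dict.fromkeys(samples, 0)
        for row in reader:
            for sample, value in zip(samples, row[1:]):
                count = float(value)
                reads[sample] += count
                if count > 0:  # a gene counts as detected if it has any reads
                    genes[sample] += 1
    return reads, genes
```

The totals only cover reads assigned to genes in the table, so they approximate rather than reproduce the mapped-read numbers shown in the diagnostics file.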

The *_raw file contains the raw, non-normalized read counts.

Fastq contains the raw sequencing FASTQ files for all samples.

Data files | VASA-seq

The diagnostics file contains the QC and diagnostic plots for each plate. Here you can check how well each cell worked (endogenous reads, UMIs and genes per cell) and how well the technical handling of the plate worked (ERCC spike-in reads). It also includes a plot to estimate oversequencing, which shows how many times each molecule was sequenced (by comparing UMI-corrected counts with raw reads). This tells us whether enough sequencing depth was assigned to the sample. It also reports the most highly expressed genes and the most variable genes in the library. You can see an example of such a diagnostic plot here.
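
The oversequencing estimate boils down to a ratio of raw reads to UMI-corrected counts. The following is a minimal sketch, assuming you have per-cell totals taken from the read count and barcode count tables; the function name is hypothetical, and the plot in the diagnostics file may be computed differently.

```python
def reads_per_molecule(read_counts, barcode_counts):
    """Average number of reads per unique molecule across all cells."""
    total_reads = sum(read_counts)
    total_umis = sum(barcode_counts)
    if total_umis == 0:
        raise ValueError("no UMI-corrected counts found")
    return total_reads / total_umis
```

A ratio close to 1 means most molecules were sequenced only once, so deeper sequencing would likely reveal more molecules. A clearly higher ratio means most molecules were already seen several times.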


The Counts folder contains count tables. These come in three flavors:

  • read counts (raw mapped reads)
  • barcode counts (UMI-corrected version of the read counts)
  • transcript counts (version of the barcode counts corrected with Poisson counting statistics). This is the file we use as input for downstream analysis, since it comes closest to the real situation in the cell.

Each of these three file types is then also split into three separate kinds of mapped reads:

  • Exonic reads – reads found in exons
  • Intronic reads – reads found in introns
  • Total reads – combination of both intronic and exonic reads

It is up to you to decide which one you are most interested in. For clustering and downstream analysis, we usually use the transcript counts of the exonic reads.


Fastq contains the raw sequencing FASTQ files. These typically are a set of 8 files: two reads (Read 1 and Read 2) from each of the four lanes of the NextSeq 500. R1 and R2 in the file name indicate the read type, and L00X indicates the lane. To map, we concatenate all Read 1 files into one file and all Read 2 files into another, and use these two files as input for mapping with STAR.

The clustering folder contains a preliminary clustering analysis done with Seurat.

Data mapping

During mapping, the generated reads are aligned (‘mapped’) against the reference genome. The reference genome, together with its annotation, holds the genetic information of your species of interest, e.g., the positions of genes on the chromosomes. During this procedure, the algorithm scans the reference genome for the best-matching position for each read. When it finds a match between a read and the reference genome, the read is assigned to that position and the associated biological information, such as the gene it falls in, is recorded. This tells you whether transcripts of specific genes are present or absent in your data.

Mapping is most efficient when the mapping software first builds an index of the genome. Two widely used mapping tools are Spliced Transcripts Alignment to a Reference (STAR) and Burrows-Wheeler Aligner (BWA). The main difference between them is that STAR is splice-aware, so it provides extra information about the spliced and unspliced transcripts in your data.
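
To show why an index speeds up mapping, here is a toy sketch that indexes every k-letter substring of a reference and uses it to look up exact matches for a read. It is only an illustration of the idea: STAR and BWA use far more sophisticated index structures and also handle mismatches, and STAR additionally handles splicing. All names in the sketch are hypothetical.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read: str, reference: str, index: dict, k: int) -> list:
    """Return the start positions where the read matches the reference exactly."""
    hits = []
    # Look up the first k letters in the index instead of scanning the genome.
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits
```

The index is built once per genome and reused for every read, so each lookup only has to check a handful of candidate positions.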

Updated on April 7, 2022
