CITE-seq: single-cell RNA sequencing plus surface protein analysis

CITE-seq is a method to simultaneously perform single-cell RNA sequencing and single-cell surface protein analysis. But how does this work? And what does it accomplish?

Single-cell RNA sequencing (often abbreviated to scRNA-seq) is a powerful technology that allows researchers to measure the gene expression of individual cells. This has provided valuable insights into how gene expression varies across different cells in a tissue and has contributed to our understanding of many biological processes.

CITE-seq was developed to measure, in the same individual cells, both gene expression and the proteins expressed on the cell surface. Because cell-surface proteins are often useful markers of cell identity or cell state, adding them to the transcriptome can expand the power of single-cell analysis.

In this blog, we will discuss the benefits of this technology and how researchers use it. Here are the topics we cover:

  1. What is CITE-seq?
  2. What does CITE-seq stand for?
  3. How does it work?
  4. TotalSeq: how Single Cell Discoveries performs CITE-seq
  5. What are its advantages?
  6. What are its limitations?
  7. How is it applied?

What is CITE-seq?

CITE-seq is a method that labels cell-surface proteins with uniquely barcoded antibodies. The DNA barcodes on these antibodies are then captured and quantified along with the cell's RNA molecules during single-cell RNA sequencing. As a result, CITE-seq data contain information on both the transcriptome and the surface proteins of each cell.

What does it stand for?

CITE-seq stands for Cellular Indexing of Transcriptomes and Epitopes by Sequencing. You can understand the term as follows:

  • Cellular Indexing in this context means adding a cell-specific barcode to both the RNA fragments and the antibody oligos bound to the cell-surface proteins;
  • Transcriptomes are the first indexed dataset: the gene expression data, also known as the RNA profile;
  • Epitopes are, strictly speaking, the parts of a protein that an antibody recognizes; here, the term stands for the cell-surface proteins, which form the second dataset;
  • Sequencing is the detection method, i.e., the reading of the DNA barcodes that identify transcript and epitope.

How does it work, exactly?

CITE-seq combines the process of single-cell RNA sequencing with cell-surface protein detection. We will explain both technologies.

Single-cell RNA sequencing

CITE-seq is compatible with 10x Genomics technology. This high-throughput single-cell RNA sequencing technology uses a microfluidic system based on an oil-water emulsion. How does this work?

First, tissues need to be dissociated into a single-cell suspension using a tissue-specific protocol. Then, inside a Chromium X or comparable instrument, the cells are partitioned into water droplets together with reagents and barcoded gel beads. These gel beads are covered with oligonucleotides (oligos): short stretches of nucleotides.

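Each oligo contains two barcoding elements and one RNA-binding element. The RNA-binding element is a poly(dT) stretch that binds the poly(A) tail of mRNA. The first barcoding element, the cell barcode, is shared by all oligos on one bead but differs between beads. It is attached to the captured mRNA and makes it traceable to the cell it came from. The second barcoding element differs between the oligos on a bead, so it tags each individual RNA molecule. It is called the unique molecular identifier, or UMI, and it is important for accurate quantification: all copies made from one molecule during amplification carry the same UMI, so duplicates can be collapsed and each original molecule counted once.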
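To make the role of the UMI concrete, here is a minimal Python sketch of UMI collapsing. It assumes the reads have already been aligned and reduced to hypothetical (cell barcode, UMI, gene) tuples, with barcodes shortened for readability. Real pipelines such as Cell Ranger also correct sequencing errors in barcodes and UMIs, which this sketch ignores.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count mRNA molecules per (cell barcode, gene) by collapsing reads that share a UMI.

    `reads` is an iterable of (cell_barcode, umi, gene) tuples, a hypothetical
    pre-aligned input format used only for illustration.
    """
    umis = defaultdict(set)  # (cell, gene) -> set of distinct UMIs observed
    for cell, umi, gene in reads:
        umis[(cell, gene)].add(umi)
    # Each distinct UMI counts as one original molecule, however often it was amplified.
    return {key: len(seen) for key, seen in umis.items()}


reads = [
    ("AAACCTGA", "TTGCA", "CD4"),
    ("AAACCTGA", "TTGCA", "CD4"),   # PCR duplicate: same cell, UMI and gene
    ("AAACCTGA", "GGATC", "CD4"),   # a second CD4 molecule in the same cell
    ("CCGTTAGC", "TTGCA", "CD8A"),  # same UMI sequence, but a different cell
]
print(count_molecules(reads))
# {('AAACCTGA', 'CD4'): 2, ('CCGTTAGC', 'CD8A'): 1}
```

Four reads become three molecules: without UMIs, the PCR duplicate would inflate the CD4 count for that cell.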

The water-in-oil droplets are called GEMs (Gel Beads-in-emulsion), and ideally each GEM contains one cell, one barcoded gel bead, and reagents. In each GEM, the reagents lyse the cell to release its RNA, and the mRNA molecules bind to the oligos on the gel bead. Reverse transcription then produces barcoded cDNA. Finally, the GEMs are broken, and the pooled cDNA is amplified and converted into a sequencing library of barcoded DNA fragments.

Each fragment includes both barcodes from the gel bead plus part of the transcript sequence, so it contains all the information needed to link an mRNA molecule to its gene and its cell. After next-generation sequencing and data analysis, you have a transcriptomic profile of each captured cell in the tissue.
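How the bead-derived tags show up in the sequencing reads can be sketched as follows. This assumes the read layout of 10x Genomics 3′ v3 chemistry, in which Read 1 starts with a 16-nucleotide cell barcode followed by a 12-nucleotide UMI, while Read 2 carries the transcript sequence that is aligned to a reference genome to identify the gene. The function names and the example read are hypothetical.

```python
# Assumed Read 1 layout for 10x Genomics 3' v3 chemistry (v2 used a 10-nt UMI).
BARCODE_LEN = 16
UMI_LEN = 12


def split_read1(seq: str) -> tuple[str, str]:
    """Split a Read 1 sequence into its cell barcode and UMI."""
    if len(seq) < BARCODE_LEN + UMI_LEN:
        raise ValueError("Read 1 is too short to contain a cell barcode and UMI")
    barcode = seq[:BARCODE_LEN]
    umi = seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    return barcode, umi


def tag_read(read1: str, gene: str) -> tuple[str, str, str]:
    """Pair the bead-derived tags from Read 1 with the gene that Read 2 aligned to.

    Alignment of Read 2 is assumed to have happened elsewhere; `gene` is its result.
    """
    barcode, umi = split_read1(read1)
    return barcode, umi, gene


# Hypothetical example read: 16-nt cell barcode followed by a 12-nt UMI.
read1 = "AAACCTGAGAAACCAT" + "GGATCTTGCAAC"
print(tag_read(read1, "CD4"))
# ('AAACCTGAGAAACCAT', 'GGATCTTGCAAC', 'CD4')
```

Grouping such (barcode, UMI, gene) tuples by cell and gene, and counting distinct UMIs, yields the cell-by-gene count matrix used in downstream analysis.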

Cell-surface protein detection

The protein detection part is based on antibody technology. CITE-seq uses antibodies specifically manufactured to bind to a cell-surface protein of interest. Say that the protein of interest is CD4, a surface marker for T-helper cells. You can add anti-CD4 antibodies to the cell mix so they will bind to all cells that express CD4. Then, you wash all unbound antibodies away.

What makes the two technologies compatible

Each antibody is conjugated to an oligo that is compatible with the 10x Genomics platform. In our example, the oligo contains a barcode specific to the anti-CD4 antibody, which identifies the protein being detected. The oligo also includes a poly(A) stretch that the poly(dT) oligos on the gel beads can capture, just as they capture mRNA. This shared capture mechanism allows protein and RNA to be detected simultaneously in one experiment.

Because the antibodies are bound to individual cells, they end up in the same GEM as the cell they are attached to. There, the antibody oligos are captured by the gel-bead oligos and tagged with the same cell barcode as that cell's mRNA, plus a UMI. They are then processed, sequenced, and quantified alongside the mRNA molecules.

In the sequencing data, the antibody barcodes, known as antibody-derived tags (ADTs), identify which antibodies were bound to each cell, and therefore which surface proteins that cell carried. In our example, counting the CD4 ADTs per cell gives a relative measure of how much CD4 each cell expresses.
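Counting ADTs works much like counting mRNA molecules, except that the antibody barcode is looked up in a panel instead of being aligned to a genome. A minimal sketch, assuming a hypothetical three-antibody panel with made-up barcodes and reads that have already been reduced to (cell barcode, UMI, antibody barcode) tuples:

```python
from collections import defaultdict

# Hypothetical panel: antibody barcode -> target protein. Real panels come with a
# reference list linking each antibody barcode to the protein it detects.
PANEL = {"ACGTACGTAC": "CD4", "TTGGCCAATT": "CD8", "GATCGATCGA": "CD19"}


def count_adt(reads):
    """Count antibody-derived tags per cell.

    `reads` yields (cell_barcode, umi, antibody_barcode) tuples.
    Returns {cell_barcode: {protein: number of distinct UMIs}}.
    """
    umis = defaultdict(set)
    for cell, umi, ab_barcode in reads:
        protein = PANEL.get(ab_barcode)
        if protein is None:  # barcode not in the panel (e.g. a sequencing error): skip it
            continue
        umis[(cell, protein)].add(umi)
    counts = defaultdict(dict)
    for (cell, protein), seen in umis.items():
        counts[cell][protein] = len(seen)
    return dict(counts)


reads = [
    ("CELL1", "UMI01", "ACGTACGTAC"),
    ("CELL1", "UMI01", "ACGTACGTAC"),  # PCR duplicate of the same CD4 tag
    ("CELL1", "UMI02", "ACGTACGTAC"),
    ("CELL2", "UMI03", "TTGGCCAATT"),
    ("CELL2", "UMI04", "NNNNNNNNNN"),  # not in the panel, so it is skipped
]
print(count_adt(reads))
# {'CELL1': {'CD4': 2}, 'CELL2': {'CD8': 1}}
```

The resulting counts are relative: they reflect how many antibody tags were captured, not an absolute number of protein molecules.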

CITE-seq explained
Schematic figure of (A) the CITE-seq antibody linked with the barcoded oligo and (B) the CITE-seq protocol. Credit: Winston & Gregory Timp (2020), license: CC-BY 4.0

Simultaneous quantification of multiple proteins

You can combine multiple antibodies in one experiment. After a cocktail of different barcoded antibodies is added to a sample, each antibody binds its corresponding epitope, and all unbound antibodies are washed away so that they do not add background signal. Such multiplexing is often necessary in immunology, where cell subtypes and their activation states may be defined by two or more surface proteins.
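With several antibodies per cell, raw ADT counts are usually normalized before cells are compared, because the total number of captured tags varies from cell to cell. One common choice is a centered log-ratio (CLR) transform across the antibodies measured in each cell: each protein's log count is expressed relative to the mean log count of that cell. The sketch below uses a pseudocount of 1; tools such as Seurat implement slightly different variants, and the example counts are invented.

```python
import math


def clr(counts: dict[str, int]) -> dict[str, float]:
    """Centered log-ratio transform of one cell's ADT counts.

    Each protein's log count (with a pseudocount of 1) is expressed relative to the
    mean log count over all antibodies in the same cell, so the values sum to zero.
    """
    logs = {protein: math.log(n + 1) for protein, n in counts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {protein: value - mean_log for protein, value in logs.items()}


# Invented ADT counts for one cell stained with a four-antibody cocktail.
cell = {"CD3": 120, "CD4": 85, "CD8": 3, "CD19": 2}
for protein, value in clr(cell).items():
    print(f"{protein}: {value:+.2f}")
# Prints: CD3: +1.86 / CD4: +1.52 / CD8: -1.55 / CD19: -1.84 (one per line)
```

Positive values mark proteins that are enriched relative to the cell's overall antibody signal, which makes markers comparable between cells with different sequencing depths.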

How do we perform CITE-seq?

Single Cell Discoveries uses a version of CITE-seq called TotalSeq, which uses antibody-oligonucleotide conjugates from BioLegend, a global antibody manufacturer.

We can combine TotalSeq with the 10x Genomics Single Cell Gene Expression solution and Single Cell Immune Profiling solution.

The Single Cell Gene Expression solution provides 3′ gene expression profiling at single-cell resolution. This high-throughput solution is well suited to larger projects that require data for a few thousand cells per sample.

The Single Cell Immune Profiling solution provides 5′ transcriptome analysis combined with the T- and B-cell receptor repertoire and antigen specificity. It sequences the 5′ end of mRNA molecules because, in T and B cells, the 5′ end of receptor transcripts encodes the variable region that determines which antigen the receptor recognizes. With this solution, you can explore immune cell diversity or characterize the immune response to infection or treatment.

TotalSeq-B vs. TotalSeq-C

The two solutions require antibodies with slightly different oligos. We combine TotalSeq-B with the 10x Genomics Single Cell Gene Expression solution and TotalSeq-C with the 10x Genomics Single Cell Immune Profiling solution.

TotalSeq-B antibody oligos attach to oligos on the Single Cell Gene Expression gel beads. In contrast to the original CITE-seq design, they bind to different bead oligos from the ones that capture RNA molecules (see figure). Specifically, the antibody oligos bind a nucleotide stretch on these bead oligos called Capture Sequence 1.

TotalSeq-C antibody oligos attach to the Single Cell Immune Profiling gel beads. Here, the 5′ ends of the RNA molecules and the antibody oligos attach to the same gel bead oligo (see figure): both bind a nucleotide stretch on this oligo called the template switch oligo (TSO).

TotalSeq-B uses barcoded antibodies whose oligos bind Capture Sequence 1 on the 10x Genomics Single Cell Gene Expression gel bead oligo. This oligo is separate from the poly(dT)-containing oligos that bind mRNA. TotalSeq-C uses barcoded antibodies whose oligos bind the TSO sequence on the 10x Genomics Single Cell Immune Profiling gel bead oligo, together with the mRNA.
Credits: 10x Genomics 3′ and 5′ protocol, and BioLegend.

Antibody cocktails

For both solutions, we can provide one of two antibody cocktails.

  1. The Human TBNK Cocktail. This mix contains antibodies specific to nine surface proteins: CD19, CD3, CD16, CD4, CD11c, CD56 (NCAM), CD14, CD8, and CD45. With this cocktail, you can characterize immune cell subtypes such as T cell subsets, B cells, natural killer cells, antigen-presenting cells, and dendritic cells.
  2. The Human Universal Cocktail. This mix contains antibodies specific to 130–134 different epitopes (find out which for TotalSeq-B or TotalSeq-C). It helps characterize major immune cell lineages and selected cell states.

What are the advantages of CITE-seq?

Combining single-cell RNA sequencing with cell surface protein analysis brings a lot to the table.

The major advantage of CITE-seq is that it directly links a cell's RNA profile with its cell-surface protein expression. This matters because proteins are the central functional units of the cell and carry out many cellular processes, yet mRNA levels do not always reflect protein levels. In some cases, a surface protein that is abundant across a cell population is detected in the transcriptome of only a fraction of those cells (see Peterson et al., 2017), partly because single-cell RNA sequencing captures only a fraction of each cell's mRNA molecules. Measuring both modalities lets researchers work around this discrepancy.
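Why might an abundant surface protein be missing from many cells' transcriptomes? A simple way to see the sampling effect is to assume that each mRNA molecule is captured independently with some efficiency p, so a cell holding n copies of a transcript yields at least one detected copy with probability 1 − (1 − p)^n. The capture efficiency of 10% used below is an assumed, illustrative value, not a measured one, and the model ignores other sources of variation.

```python
def detection_probability(mrna_copies: int, capture_efficiency: float) -> float:
    """Probability of capturing at least one of a cell's transcripts of a gene.

    Simplified binomial model: every molecule is captured independently with
    probability `capture_efficiency`.
    """
    return 1.0 - (1.0 - capture_efficiency) ** mrna_copies


# Assumed 10% capture efficiency; marker genes with few mRNA copies per cell.
for copies in (1, 5, 20):
    print(copies, round(detection_probability(copies, 0.1), 2))
# 1 0.1
# 5 0.41
# 20 0.88
```

Under these assumptions, a marker gene with five mRNA copies per cell is missed in most cells, even though every cell may display the protein on its surface.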

In some cases, cell subsets are difficult to distinguish by their RNA profile alone. The multimodal data of transcriptome plus cell-surface proteins can then help identify subsets that would otherwise be missed. For example, in the proof-of-concept study by Stoeckius et al. (2017), the researchers could identify natural killer cell subsets that were, at the time, difficult to separate into distinct clusters based on the RNA-seq data. More recently, Ben-Chetrit et al. (2023) applied a novel spatial variant of CITE-seq to a mouse breast tumor and showed that protein markers could distinguish between two macrophage states: tumor-fighting and immune-suppressive macrophages.

Because CITE-seq combines transcriptome and cell surface proteome analysis in one experiment, researchers can understand how these two layers of information interact at the single-cell level. As a result, researchers can gain a more complete understanding of the biological processes occurring within a cell population during health or disease.

This is particularly significant for pharmaceutical purposes, as many cell-surface proteins serve as diagnostic biomarkers or therapeutic targets. Surface proteins make up approximately two-thirds of the targets of approved human drugs currently listed in the DrugBank database (also see this paper). This makes CITE-seq (and alternative approaches) an asset to drug development.

What are the limitations of CITE-seq?

As with any technology, CITE-seq has certain technical limitations.

CITE-seq and TotalSeq are limited to tagging surface proteins. Reaching epitopes inside a cell would require permeabilizing the cell membrane, which would release the cell's RNA and make the sample incompatible with current single-cell RNA sequencing workflows.

Barcoded antibodies do not exist for every surface protein, so CITE-seq is limited to the antibodies that have been manufactured. For example, more barcoded antibodies are available for human or mouse epitopes than for those of other species. Moreover, measuring surface proteins beyond those targeted by universal antibody cocktails may be out of reach for some researchers, because labeling custom antibodies can be very expensive. New antibodies and antibody cocktails are likely to become available as the field develops.

Surface proteins may change during experimental steps. For example, dissociating tissue with enzymes that digest proteins may also cleave the surface proteins you want to measure. In addition, an antibody binding to a surface protein could activate a signaling pathway in the cell and thereby change its transcriptome. For these reasons, experienced users recommend fixing cells on ice and testing the effects of dissociation and antibody binding on each specific tissue.

The reliability of CITE-seq results depends on how specifically each antibody binds its epitope, and the risk of cross-reactivity increases when multiple antibodies are mixed.

An alternative approach for simultaneous cell-surface protein analysis and single-cell RNA sequencing is to combine fluorescence-activated cell sorting (FACS) with plate-based sequencing methods such as SORT-seq or VASA-seq. When should you choose this method over CITE-seq? Generally, what you lose in throughput and speed, you gain in accuracy and in flexibility over which modalities you measure. If you're interested in performing simultaneous transcriptome and surface protein analysis on a specific sample, you should always reach out for a consultation.

Advantages and limitations of CITE-seq. Advantages: directly link a cell's RNA profile with its surface proteins, and study samples at single-cell resolution and at a high throughput of thousands of cells. With CITE-seq, you can identify cell subsets that are hard to distinguish with single-cell RNA sequencing alone, and you can combine long-standing knowledge of surface protein analysis with ever more complete RNA-seq data. Finally, CITE-seq allows profiling multiple surface proteins simultaneously. Limitations: the number of targetable surface proteins is limited by antibody availability; an optimized protocol is required to prevent transcriptome changes due to pathway activation by antibody binding, or protein changes due to dissociation; and mixing antibodies can lower accuracy due to cross-reactivity.

How researchers apply CITE-seq

One of the key applications of CITE-seq is studying immune cells. The immune system comprises a diverse array of cell types, each of which plays a unique role in the body’s defense against infection and disease.

Using CITE-seq, researchers can measure individual immune cells’ gene and protein expression, providing valuable insights into how these cells function and respond to different stimuli.

Decades of research with fluorescent antibodies and flow cytometry assays have built an extensive knowledge base linking cell-surface proteins to immune cell type and function. CITE-seq lets researchers apply this knowledge to single-cell RNA sequencing data to better identify subsets and activation states of immune cells in a tissue. For example, a group of scientists from Amsterdam UMC used CITE-seq to study which immune cells reacted to SARS-CoV-2 in people who had been exposed to neither the coronavirus nor a vaccine against it (read about it here).
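Flow-cytometry-style gating carries over naturally to CITE-seq data. A minimal sketch, assuming CLR-normalized ADT values per cell (a dictionary from marker name to value) and a hypothetical positivity cutoff of 0.5; in practice, gates are chosen per experiment, for example from the valley between the negative and positive populations of each marker.

```python
def gate_t_cell(clr_values: dict[str, float], threshold: float = 0.5) -> str:
    """Assign a coarse T-cell label from normalized ADT values.

    Mimics sequential flow cytometry gating: first CD3, then CD4 versus CD8.
    `threshold` is a hypothetical cutoff; a missing marker counts as negative.
    """
    def positive(marker: str) -> bool:
        return clr_values.get(marker, float("-inf")) > threshold

    if not positive("CD3"):
        return "non-T cell"
    if positive("CD4") and not positive("CD8"):
        return "CD4+ T cell"
    if positive("CD8") and not positive("CD4"):
        return "CD8+ T cell"
    return "ambiguous T cell"  # double-positive or double-negative T cells


print(gate_t_cell({"CD3": 1.9, "CD4": 1.5, "CD8": -1.5}))  # CD4+ T cell
print(gate_t_cell({"CD3": -1.0, "CD19": 2.0}))             # non-T cell
```

Labels obtained this way can then be compared with clusters derived from the transcriptome, which is how protein markers help resolve subsets that RNA alone blurs together.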

3D reconstruction and electron microscopic images of three antibodies that target the stable version of the spike protein.
Mathieu Claireaux et al. used CITE-seq to study which immune cells reacted to SARS-CoV-2 in people who had been exposed to neither the coronavirus nor a vaccine against it. Credit: Claireaux et al. (2022)

High-throughput sequencing makes it possible to analyze large groups of (immune) cells all at once. For example, one group of scientists performed CITE-seq on an entire atherosclerotic plaque to study T-cell heterogeneity. This enables the study of the immune system in greater detail and can help researchers develop better therapies for treating immunological diseases and improving the body’s immune response.

Another area where CITE-seq is being used is cancer research. Tumors are heterogeneous tissues, often infiltrated by diverse immune cell populations. Multimodal information on cancer and immune cells can improve our understanding of how the tumor immune microenvironment evolves and help researchers develop more effective cancer treatments.

Concluding remarks

Overall, you can view CITE-seq (and, equally, TotalSeq) as a valuable expansion of single-cell RNA sequencing. It allows researchers to combine decades of knowledge on cell-surface protein analysis with increasingly complete single-cell transcriptome data. It is applied in immunology, oncology, and other fields to address both fundamental and translational questions.

Further reading

  • CITE-seq was developed by Marlon Stoeckius and his colleagues from the New York Genome Center in collaboration with the Satija lab. In the proof-of-concept study published in 2017, the researchers monitored ten surface proteins, together with the RNA profiles, of 8,000 cells. You can read it here or watch a lecture by Stoeckius here.
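  • An extension called Cell Hashing labels the cells of each sample with their own barcoded antibodies (hashtags) against broadly expressed surface proteins. This makes it possible to pool different samples in one experiment and separate them again during data analysis. You can find the scientific article explaining this technology here.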
  • Two years later, the same center developed an updated technology called ECCITE-seq, adapted to fit with other 10x Genomics technologies that enable immune profiling. Read about it here.
  • In 2023, a group of researchers led by Marlon Stoeckius and Dan Landau published a spatial variant of CITE-seq. It combines spatial transcriptomics with surface protein analysis; find the article here.
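The demultiplexing step behind Cell Hashing can be illustrated with a deliberately simplified rule: assign each cell to the sample whose hashtag dominates its hashtag-oligo (HTO) counts, and flag cells with too little hashtag signal, or with two strong hashtags, which suggests a doublet (two cells from different samples in one droplet). The cutoffs and sample names below are hypothetical; published tools such as Seurat's HTODemux use statistical models rather than fixed thresholds.

```python
def assign_sample(hto_counts: dict[str, int], min_count: int = 10, min_ratio: float = 3.0) -> str:
    """Assign a cell to a sample from its hashtag-oligo (HTO) counts.

    Simplified rule with hypothetical cutoffs: the top hashtag must reach
    `min_count` and exceed the runner-up by a factor of `min_ratio`.
    """
    if not hto_counts:
        return "negative"
    ranked = sorted(hto_counts.items(), key=lambda item: item[1], reverse=True)
    top_tag, top = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0
    if top < min_count:
        return "negative"  # too little hashtag signal to make a call
    if top < min_ratio * max(second, 1):
        return "doublet"   # two samples' hashtags are comparably strong
    return top_tag


print(assign_sample({"HTO1": 250, "HTO2": 4, "HTO3": 6}))  # HTO1
print(assign_sample({"HTO1": 120, "HTO2": 95}))            # doublet
print(assign_sample({"HTO1": 3, "HTO2": 1}))               # negative
```

Doublets flagged this way are typically removed before analysis, which is one reason hashing allows more cells to be loaded per run.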
