On September 9, Nucleic Acids Research published the latest findings of a research team led by Prof. FAN Xiaohui of the Zhejiang University College of Pharmaceutical Sciences and Prof. CHEN Huajun of the Zhejiang University College of Computer Science and Technology. Their work is scDeepSort, a pre-trained cell-type annotation method for single-cell transcriptomics that uses deep learning with a weighted graph neural network (GNN). scDeepSort treats cells and genes as nodes of the graph and trains a supervised deep learning model on single-cell RNA sequencing (scRNA-seq) data. This lets researchers predict cell types in new datasets and offers a new approach to precise cell-type annotation in scRNA-seq data.
General conceptual framework and validation of scDeepSort
Cell-type annotation is a vital step in scRNA-seq data analysis. It is commonly performed at the cluster level. Cells are first grouped into clusters, and the differentially expressed genes of each cluster are then matched against prior knowledge of cell markers. A second, cell-based strategy compares each single cell with a reference database of bulk or single-cell RNA-seq profiles and assigns the most similar cellular identity. However, both approaches rely heavily on references, such as marker lists or reference expression profiles, so accurate cell-type annotation of single-cell transcriptomic data remains a major challenge.
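The reference-based strategy described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any specific published tool. Similarity is measured here with Pearson correlation over the genes shared by the cell and each reference profile, although real tools may use other measures. The function names, gene names and expression values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of floats."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: correlation undefined, treat as no similarity
    return cov / (sx * sy)


def annotate_by_reference(cell, reference):
    """Return (cell_type, r) for the reference profile most correlated with `cell`.

    cell: dict gene -> expression for one cell
    reference: dict cell_type -> dict gene -> expression (hypothetical profiles)
    Only genes present in both the cell and a reference profile are compared.
    """
    best_type, best_r = None, -math.inf
    for cell_type, profile in reference.items():
        genes = sorted(set(cell) & set(profile))
        if len(genes) < 2:
            continue  # too few shared genes to compute a correlation
        r = pearson([cell[g] for g in genes], [profile[g] for g in genes])
        if r > best_r:
            best_type, best_r = cell_type, r
    return best_type, best_r


# Hypothetical reference profiles and query cell.
reference = {
    "T cell": {"CD3E": 9.0, "CD19": 0.5, "LYZ": 0.2, "NKG7": 2.0},
    "B cell": {"CD3E": 0.3, "CD19": 8.0, "LYZ": 0.4, "NKG7": 0.1},
    "Monocyte": {"CD3E": 0.2, "CD19": 0.1, "LYZ": 9.5, "NKG7": 0.3},
}
cell = {"CD3E": 7.5, "CD19": 0.0, "LYZ": 1.0, "NKG7": 1.5}
label, r = annotate_by_reference(cell, reference)
```

The query cell expresses mostly CD3E, so it correlates best with the hypothetical "T cell" profile. The sketch also shows the weakness noted above: the result can only ever be one of the types present in the reference.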
The weighted GNN algorithm of scDeepSort
Researchers from Zhejiang University adopted a modified version of the GraphSAGE framework and used it to build scDeepSort, a graph neural network whose nodes are cells and genes. The scDeepSort algorithm consists of three components: an embedding layer, a weighted graph aggregator layer, and a linear classifier layer. The embedding layer, i.e. the input layer, generates vectors for the graph nodes (cells and genes). For each node, the weighted graph aggregator layer gathers information from its neighborhood and from the node itself, producing a new representation of that node. The linear classifier layer assigns the final cell representation to one of the predefined cell types. scDeepSort was tested on 76 external test datasets comprising 265,489 human and mouse cells. It outperformed existing methods and reached an overall accuracy of 83.79%. The researchers further demonstrated the broad applicability of scDeepSort on more challenging datasets and with references generated by different scRNA-seq technologies.
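The three components can be illustrated with a toy forward pass. This is a hedged sketch under several assumptions and is not scDeepSort's implementation. The graph is reduced to one cell and the genes it expresses, and edge weights are taken directly from expression values. The cell's own starting vector is assumed to be the unweighted mean of its genes' embeddings. All weights are random rather than trained, so the predicted label carries no biological meaning. The class `ToyWeightedGNN` and the helper functions are hypothetical names. The aggregation step follows the general GraphSAGE idea of combining a node's own vector with an aggregate of its neighbors' vectors.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def weighted_mean(vectors, weights):
    """Weighted mean of equal-length vectors (weights must sum to > 0)."""
    total = sum(weights)
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) / total
            for i in range(len(vectors[0]))]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class ToyWeightedGNN:
    """Hypothetical, untrained toy: embedding -> weighted aggregation -> linear classifier."""

    def __init__(self, genes, cell_types, dim=4, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.cell_types = list(cell_types)
        # Embedding layer: one vector per gene node (random here, learned in practice).
        self.gene_emb = {g: [rng.gauss(0.0, 1.0) for _ in range(dim)] for g in genes}
        # Aggregator weights for the node itself and for its weighted neighborhood.
        self.W_self = rand_matrix(dim, dim)
        self.W_neigh = rand_matrix(dim, dim)
        # Linear classifier: final cell representation -> one score per cell type.
        self.W_cls = rand_matrix(len(self.cell_types), dim)

    def predict(self, expr):
        """expr: dict gene -> non-negative expression; nonzero entries act as cell-gene edges."""
        genes = [g for g, x in expr.items() if x > 0 and g in self.gene_emb]
        if not genes:
            raise ValueError("cell expresses none of the known genes")
        vecs = [self.gene_emb[g] for g in genes]
        # Initial cell vector: unweighted mean of its expressed genes (an assumption).
        h_self = weighted_mean(vecs, [1.0] * len(vecs))
        # Neighborhood message: gene vectors weighted by expression (edge weights).
        h_neigh = weighted_mean(vecs, [expr[g] for g in genes])
        # Weighted graph aggregator: combine self and neighborhood, then apply ReLU.
        combined = [a + b for a, b in zip(matvec(self.W_self, h_self),
                                          matvec(self.W_neigh, h_neigh))]
        h = [max(0.0, v) for v in combined]
        # Linear classifier followed by softmax over the predefined cell types.
        probs = softmax(matvec(self.W_cls, h))
        best = max(range(len(probs)), key=probs.__getitem__)
        return self.cell_types[best], probs


model = ToyWeightedGNN(genes=["CD3E", "CD19", "LYZ", "NKG7"],
                       cell_types=["T cell", "B cell", "Monocyte"])
label, probs = model.predict({"CD3E": 7.5, "CD19": 0.0, "LYZ": 1.0, "NKG7": 1.5})
```

Because the weights are untrained, the sketch only shows how data flows through the three layers. In a real model, the embeddings and the aggregator and classifier weights would be fitted on annotated training cells. That training is what allows a pre-trained model to annotate new data without consulting a reference at prediction time.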
“scDeepSort is the first attempt to annotate cell types in scRNA-seq data with a pre-trained GNN model, and it can achieve accurate cell-type annotation without additional references,” Prof. Fan said.