Introduction
Myeloid cells play vital roles in the health and disease of the central nervous system (CNS).1,2 Myeloid cells in the CNS mainly comprise microglia, monocytes, macrophages, dendritic cells, and granulocytes.3 In the healthy CNS, monocytes and granulocytes are absent from the parenchyma and are instead localized in the leptomeninges.3 In CNS pathologies, however, all of these myeloid cell types can appear and be active in the diseased parenchyma.3 Although these cells have been studied extensively, distinguishing them clearly remains difficult.
Morphology, immunohistochemistry, and flow cytometry are frequently used to identify these cells.4–6 Morphological identification relies on conventional staining and on recognizing characteristic cell shapes under the microscope, which is highly subjective. Moreover, because the morphologies of these cells are very similar under pathological conditions, conventional morphology cannot distinguish them.7 Immunohistochemistry and flow cytometry identify myeloid cells by labeling their markers with a panel of antibodies. Combining the two methods allows cells to be both quantified and localized, which appears to be an ideal scheme. In practice, however, myeloid cells often share identical or overlapping markers, which seriously affects the accuracy of the analysis.8 Therefore, an effective method for distinguishing myeloid cells in the CNS is needed.
Single-cell RNA sequencing (scRNA-Seq) can profile thousands of cells at single-cell resolution and then divide the cells into clusters according to the similarity of their gene expression.9 However, defining these cell clusters further remains difficult, because collecting cell markers is a knotty problem for researchers.10 At present, there are three main methods for cell type identification based on single-cell transcriptome data. The first compares the upregulated genes of a cluster with the marker genes in databases such as CellMarker (http://xteam.xbio.top/CellMarker/),10 PanglaoDB (https://panglaodb.se/),11 and the Mouse Cell Atlas (http://bis.zju.edu.cn/MCA/gallery.html),12 and then identifies the cell type in combination with the expression of those genes. Marker genes of particular cell types can also be collected from the literature. The second performs a similarity analysis between the gene expression profiles of unknown cell clusters and those of known cell types; if the similarity is high, the cluster is assigned to that cell type.13,14 The R package SingleR, for example, can perform this analysis.15 The third uses the expression profiles of known cell types as training sets to construct classifiers, to which the gene expression profiles of unknown cell clusters are then input for classification and identification.13,14 The R package Garnett, for example, can be used for this analysis.16 Although more and more automatic cell type annotation tools have been developed, it is difficult to ensure that any one of them is suitable for all cell types.17 Researchers should therefore take one of the automated results as a reference and name the corresponding cell clusters in combination with manual annotation and relevant background knowledge.
In any case, specific marker genes remain the basis for defining cell clusters.13,14 They are generally selected according to the background knowledge of the discipline, the literature, and databases. However, distinguishing the various myeloid cells in the CNS is not easy, because their markers overlap and are unstable.8 For example, Adgre1 (F4/80), the established marker for macrophages,18,19 is also expressed in monocytes, microglia, and dendritic cells.20 P2ry12 and Tmem119, which are microglia markers, are often downregulated or even negative under conditions of CNS injury, inflammation, and degeneration.21,23 Establishing a simple and practical cell type identification method (CTIM) to distinguish these cell populations is therefore of great significance.
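The similarity-based approach can be illustrated with a minimal sketch. It is not the algorithm of SingleR or of any other package: it simply computes a Pearson correlation between the expression profile of one unknown cluster and each reference profile and reports the best match. All expression values and the function names `pearson` and `best_match` are hypothetical and chosen for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def best_match(cluster_profile, reference_profiles):
    """Return the reference cell type most correlated with the cluster."""
    genes = sorted(cluster_profile)
    query = [cluster_profile[g] for g in genes]
    scores = {
        cell_type: pearson(query, [profile[g] for g in genes])
        for cell_type, profile in reference_profiles.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical mean expression values (arbitrary units).
references = {
    "MG": {"P2ry12": 5.0, "Tmem119": 4.5, "Mrc1": 0.5, "Ly6g": 0.0},
    "MAC": {"P2ry12": 0.5, "Tmem119": 0.3, "Mrc1": 4.8, "Ly6g": 0.1},
}
unknown_cluster = {"P2ry12": 4.0, "Tmem119": 3.9, "Mrc1": 0.8, "Ly6g": 0.2}

label, scores = best_match(unknown_cluster, references)
print(label)
```

With only four genes the correlation is dominated by a few markers, which is why a call of this kind would in practice be checked against manual annotation.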
Material and methods
Excel template design for CTIM
Based on CellMarker (http://xteam.xbio.top/CellMarker/),10 PanglaoDB (https://panglaodb.se/),11 and the Mouse Cell Atlas (http://bis.zju.edu.cn/MCA/gallery.html), combined with the recent literature,2–4,6,8,19,23–34 a simple Excel template for CTIM was designed that included a panel of gene markers corresponding to myeloid cells, lymphocytes, common CNS cells, and proliferative cells (Fig. 1 and Table S1). Here, myeloid cells included monocytes (MNCs), macrophages (MACs), microglia (MG), granulocytes (mainly neutrophils, NEUTs), and dendritic cells (DCs). To minimize the effects of lymphocytes on myeloid cell identities, gene markers specific to T cells, B cells, and natural killer (NK) cells were also listed in the table.
Excel template design for gene markers and expression extraction
To identify the cells of a cluster, four Excel sheets were used: cell definition (Figs. 1 and 2e), cluster data (Fig. 2a), avg_logFC extraction (Fig. 2b and d), and gene extraction (Fig. 2c). In the cluster data table, column A contained the genes of a cluster and column B contained avg_logFC (average log2 fold change), which is based on the ratio of the normalized mean gene counts in that cluster to those in all other clusters. Because expression is usually measured as counts, transcripts per million, or fragments per kilobase of exon model per million mapped fragments, gene expression values are non-negative and the fold change is positive. When the expression of gene A was lower than that of gene B, the fold change of B relative to A was >1 and its log2 value was >0; conversely, the log2 fold change was <0. On this basis, upregulated (red) and downregulated (green) gene expression could be displayed in different colors in the Excel template. In some reports, the average value of gene expression was used instead. In the avg_logFC extraction table, the data in columns A and B came from the corresponding columns of the cluster data table, column C held the genes extracted in column C of the gene extraction table, and column D looked up the value for each gene in column C using the Excel formula VLOOKUP(Cn, A:B,2,0). In the gene extraction table, column A held the gene markers from column B of the cell definition table, column B held the genes from column A of the avg_logFC extraction table, and column C returned each marker in column A that was present in column B using the Excel formula IF(COUNTIF(B:B,An)>0,An,"").
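For readers who prefer to see the logic outside a spreadsheet, the sign convention of the log2 fold change and the two Excel formulas can be sketched in Python. This is an illustrative equivalent rather than part of the template: the function names and the example values are hypothetical, and `None` stands in for the `#N/A` that VLOOKUP returns when no exact match exists.

```python
import math


def log2_fold_change(mean_in_cluster, mean_in_others):
    """log2 of the expression ratio: > 0 means upregulated in the cluster."""
    return math.log2(mean_in_cluster / mean_in_others)


def extract_markers(marker_genes, cluster_genes):
    """Mimics IF(COUNTIF(B:B,An)>0,An,""): keep a marker only if present."""
    present = set(cluster_genes)
    return [gene if gene in present else "" for gene in marker_genes]


def lookup_avg_logfc(extracted_genes, cluster_table):
    """Mimics VLOOKUP(Cn, A:B, 2, 0): exact-match lookup of avg_logFC."""
    return [cluster_table.get(gene) if gene else None for gene in extracted_genes]


# Hypothetical cluster data: gene -> avg_logFC.
cluster_table = {"Hexb": 2.1, "P2ry12": 1.4, "Mrc1": -0.7}
markers = ["Hexb", "Ly6g", "Mrc1"]

extracted = extract_markers(markers, cluster_table.keys())
values = lookup_avg_logfc(extracted, cluster_table)

print(log2_fold_change(8, 2))  # 2.0
print(log2_fold_change(2, 8))  # -2.0
print(extracted)               # ['Hexb', '', 'Mrc1']
print(values)                  # [2.1, None, -0.7]
```

One difference worth keeping in mind is that COUNTIF and VLOOKUP match text case-insensitively, whereas the dictionary lookup here is case-sensitive, so gene symbols would need consistent capitalization.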
CTIM workflow
The CTIM workflow included the following steps: (1) Copy columns A and B from the cluster data table and paste them into the corresponding columns A and B of the avg_logFC extraction table; (2) Copy column A from the avg_logFC extraction table and paste it into column B of the gene extraction table; the genes extracted from the gene markers (column A) will then appear in column C; (3) Copy column C from the gene extraction table and paste it as values into column C of the avg_logFC extraction table; the extracted values will then be shown in column D; (4) Copy column D from the avg_logFC extraction table and paste it as values into any column (such as C1, C2, and Cn) of the cell definition table; (5) In the cell definition table, assign cell identities by comparing the extracted values (upregulated and downregulated genes are shown in red and green, respectively) with the cell types (column A) and gene markers (column B). Finally, the cell types were identified based on the upregulated markers (Fig. 2).
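Step (5) is performed by eye in the template. As a rough sketch of the same comparison, the snippet below tallies, for each cell type, which of its markers are upregulated (red), downregulated (green), or carry no signal in one cluster. The marker lists are short subsets of those named in this article, the avg_logFC values are hypothetical, and `summarize_markers` is an illustrative name rather than part of the template.

```python
def summarize_markers(cell_definition, cluster_avg_logfc):
    """Tally each cell type's markers as up, down, or other in one cluster."""
    summary = {}
    for cell_type, markers in cell_definition.items():
        tally = {"up": [], "down": [], "other": []}
        for gene in markers:
            value = cluster_avg_logfc.get(gene)
            if value is not None and value > 0:
                tally["up"].append(gene)      # shown in red in the template
            elif value is not None and value < 0:
                tally["down"].append(gene)    # shown in green in the template
            else:
                tally["other"].append(gene)   # missing from the cluster or zero
        summary[cell_type] = tally
    return summary


# Subsets of the marker panels; the expression values are hypothetical.
cell_definition = {
    "MAC": ["Mrc1", "Cd163", "Lyve1"],
    "MG": ["Hexb", "P2ry12", "Tmem119"],
    "NEUT": ["Ly6g", "S100a8", "S100a9"],
}
cluster = {"Hexb": 1.8, "P2ry12": 2.3, "Tmem119": 1.1, "Mrc1": -0.9, "S100a8": -0.4}

summary = summarize_markers(cell_definition, cluster)
for cell_type, tally in summary.items():
    print(cell_type, tally)
```

In this hypothetical cluster all three MG markers are upregulated and none of the MAC or NEUT markers are, so a reader of the tally would name the cluster MG, as in the manual workflow.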
Data
Normalized and clustered data used in this study were obtained from previous studies.12,35–37 These data were chosen because they could be downloaded directly, which allowed us to compare our analysis with the original reports. The data are summarized in Table 1 and shown as an Excel worksheet in Figure 2a.
Data | Mice | Tissue | Single cell | scRNA-Seq | Clustering | Cluster annotation |
---|---|---|---|---|---|---|
Ximerakis et al.35 | C57BL/6J mice (male, 2–3 months of age, and 21–22 months of age) | A total of 8 young and 8 old brains | Dissociated brain | Chromium Single Cell 3′ Chip (10x Genomics), the sequencing was performed on NextSeq 500 instrument (Illumina) | Seurat package (v.2.3) in R (v.3.3.4) | Using multiple cell type-specific/enriched marker genes that have been previously described in the literature (Plac8 for MNC) |
Han et al.12 | Wild-type C57BL/6J mice (SPF, female, 6–10 weeks old) | Brain, blood, and bone marrow | Brain was dissociated using Accutase; bone marrow was treated with red blood cell lysis buffer; blood was treated with red blood cell lysis buffer or Ficoll separation | Microwell-Seq; the 3′ ends of the transcripts were enriched during library generation using PCR and sequenced on the Illumina HiSeq platform | Seurat was used for dimension reduction, clustering, and differential gene expression analysis | Single cell MCA (scMCA) analysis built by the authors (Fig. 7A) |
Sankowski et al.36 (embj2021108605-sup-0008-datasetev1) | SPF and GF C57BL/6J mice (mixed sex, 6–10 weeks old) | The brain parenchyma, choroid plexus, leptomeninges, and perivascular space (20 mice per group) | Parenchyma and perivascular space cells were isolated using a Percoll gradient. The choroid plexuses and leptomeninges were mechanically dissociated through a 70 μm cell strainer. Viable CD11b+CD45+CD3−B220−Ly6G− cells were FACS-isolated | High-throughput scRNA-Seq using the high-sensitivity method mCEL-Seq2; sequencing was performed on an Illumina HiSeq 3000 system (paired-end multiplexing run) at a depth of 130,000–200,000 reads per cell | Seurat version 3 | Maps of the myeloid cell populations were generated based on published signature genes (Jordao et al.33; Fig. 1B) |
Mimouna et al.37 | C57BL/6 mice (mixed sex, 6–10 weeks old) | EAE mouse spinal cord | CNS-infiltrating cells were isolated using a Percoll density gradient. F4/80+CD11b+CD45+ cells were sorted using FACS | Chromium Single Cell 3′ Chip (10x Genomics); sequencing was performed on the Illumina NovaSeq system using a 28-8-98 paired-end cycle | R version 4.0.1 software (R Core Team, 2019), fastMNN implementation, Louvain graph-based community clustering | Cluster-specific markers were searched using the Wilcoxon rank-sum test. An automated cell type assignment was performed with SingleR using training sets derived from the Immunological Genome Project database. PanglaoDB was used to identify the putative cell type and/or activation state of each individual Louvain cluster. Cell type and cell activation state transitions were identified by trajectory analysis with slingshot |
Statistical analysis
To test the consistency of this CTIM with previous reports, the identification results were divided into three grades, excellent, satisfactory, and poor (Table 2). Bowker’s test and kappa symmetric measures were used to test the difference and consistency of the paired data between the two groups. For Bowker’s test, p < 0.05 was considered to be a statistically significant difference. For kappa symmetric measures, kappa ≥ 0.75 indicated good consistency, 0.4 ≤ kappa < 0.75 indicated general consistency and kappa < 0.4 indicated poor consistency. Data were analyzed with SPSS software v.26 (IBM Corp., Armonk, NY, USA).
Consistency | Accuracy | Grade |
---|---|---|
Consistent | Both completely accurate | Both excellent (A) |
Consistent | Both partially accurate | Both satisfactory (B) |
Consistent | Neither is accurate | Both poor (C) |
Nonconsistent | One is completely accurate | Excellent (A) |
Nonconsistent | One is partially accurate | Satisfactory (B) |
Nonconsistent | One is not accurate | Poor (C) |
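The two statistics were computed in SPSS. As a rough, software-independent sketch, the kappa thresholds stated above and the Bowker symmetry statistic can be written as follows. The degrees-of-freedom convention used here (counting only off-diagonal pairs with at least one observation) is an assumption and may differ from the one SPSS applies, and the example table is hypothetical.

```python
def interpret_kappa(kappa):
    """Map a kappa value to the consistency categories used in this study."""
    if kappa >= 0.75:
        return "good consistency"
    if kappa >= 0.4:
        return "general consistency"
    return "poor consistency"


def bowker_statistic(table):
    """Bowker symmetry statistic for a square table of paired gradings.

    Returns (statistic, degrees_of_freedom). Pairs of cells (i, j) and
    (j, i) that are both empty are skipped, which is one common convention.
    """
    k = len(table)
    statistic, df = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            total = table[i][j] + table[j][i]
            if total > 0:
                statistic += (table[i][j] - table[j][i]) ** 2 / total
                df += 1
    return statistic, df


# Hypothetical 3 x 3 table (rows: grading A/B/C by one method, columns: by the other).
example = [
    [10, 2, 0],
    [4, 5, 0],
    [0, 0, 3],
]
stat, df = bowker_statistic(example)
print(round(stat, 3), df)
print(interpret_kappa(0.8), "|", interpret_kappa(0.5), "|", interpret_kappa(0.1))
```

The statistic is compared with a chi-square distribution with `df` degrees of freedom to obtain the p value.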
Results
Descriptive comparison of the CTIM with the literature in CNS myeloid cells
Using the CTIM, CNS myeloid cells in four data sources reported in the literature were identified (Table 1).12,35–37 In supplementary Table 3 of Ximerakis et al.,35 the authors listed the most discriminating genes per cell type. From that table, MNCs, MACs, MG, NEUTs, DCs, neuronal-restricted precursors (NRPs), immature neurons, mature neurons, astrocyte-restricted precursors, astrocytes, oligodendrocyte precursor cells, oligodendrocytes, ependymocytes, and hypendymal cells were chosen as gold standard cells to test the CTIM. As shown in Figure 3, Table 3, and Figure S1, of the 14 cell clusters, MNCs were identified as mixed with a few NEUTs and DCs, and NRPs as proliferative cells. The other 12 cell clusters were completely consistent.
Cluster | Reported cell type | Our cell type | Consistency | Reason |
---|---|---|---|---|
MNC | MNC | MNC (mixed with a few NEUT and DC) | Part | Plac8 is also expressed in NEUT and DC |
MAC | MAC | MAC | Yes | NR |
MG | MG | MG | Yes | NR |
NEUT | NEUT | NEUT | Yes | NR |
DC | DC | DC | Yes | NR |
NRP | NRP | Proliferative cells | NA | Not within the scope of our evaluation. |
ImmN | ImmN | Neuron | Yes | NR |
mNEUR | mNEUR | Neuron | Yes | NR |
ARP | ARP | AST | Yes | NR |
AST | AST | AST | Yes | NR |
OPC | OPC | OPC | Yes | NR |
OL | OL | OL | Yes | NR |
EPC | EPC | Ependymal | Yes | NR |
HypEPC | HypEPC | Ependymal | Yes | NR |
Table 4 compares the cell types identified in adult mouse brains. Fifteen clusters of adult mouse brains from Han et al.12 were analyzed. Of these 15 clusters, pan-GABAergic and Schwann cells were not covered by the CTIM, the reported cluster 4 (Macrophage_Klf2 high) was mixed with a few MG, and the other 12 clusters were completely consistent. The CD11b+CD45+CD3−B220−Ly6G− cells that Sankowski et al.36 isolated by fluorescence-activated cell sorting from adult mouse brain parenchyma, choroid plexus, leptomeninges, and perivascular space (embj2021108605-sup-0008-datasetev1) were also compared. As shown in Table 5, 14 of the 17 cell clusters were completely consistent. Cluster 15 was nonconsistent because it contained stromal cells, which were not in our table. The reported cluster 6 (CNS-associated macrophages, CAMs) may have been Kolmer epiplexus cells, which are reported to express microglial markers, and in cluster 9 (CAMs) the genes typically expressed in MACs were not increased.34
Cluster | Reported cell type | Our cell type | Consistency | Reason |
---|---|---|---|---|
1 | Myelinating oligodendrocyte | OL | Yes | NR |
2 | Microglia | MG | Yes | NR |
3 | Astrocyte_Mfe8 high | AST | Yes | NR |
4 | Macrophage_Klf2 high | MAC/MG | Part | The reported cluster 4 was mixed with a few MG |
5 | Astrocyte_Atp1b2 high | AST | Yes | NR |
6 | Oligodendrocyte precursor cell | OPC | Yes | NR |
7 | Neuron | Neuron | Yes | NR |
8 | Macrophage_Lyz2 high | MAC | Yes | NR |
9 | Astroglial cell (Bergman glia) | AST | Yes | NR |
10 | Pan-GABAergic | Proliferative cells | NA | Not within the scope of our evaluation. |
11 | Astrocyte_Pla2g7 high | AST | Yes | NR |
12 | Schwann cell | Unknown | NA | Not within the scope of our evaluation. |
13 | Granulocyte_Il33 high | NEUT | Yes | NR |
14 | Hypothalamic ependymal cell | Ependymal cells | Yes | NR |
15 | Granulocyte_Ngp high | NEUT | Yes | NR |
Cluster | Reported cell type | Our cell type | Consistency | Reason |
---|---|---|---|---|
C0 | MG | MG | Yes | NR |
C1 | CAMs | MAC | Yes | NR |
C2 | MG | MG | Yes | NR |
C3 | CAMs | MAC | Yes | NR |
C4 | CAMs | MAC | Yes | NR |
C5 | MG | MG | Yes | NR |
C6 | CAMs | MG | No | The expression of typical MAC genes, including Mrc1, Cd163, Lyve1, Pf4, Ms4a7, Stab1, and Cbr2, was not elevated. In contrast, the MG-specific markers Hexb, Olfml3, and Sparc were significantly elevated. These might be Kolmer epiplexus cells, which are reported to express “microglial markers” (Van Hove et al., 2019).34 |
C7 | CAMs | MAC | Yes | NR |
C8 | Ly6clow monocytes | MNC | Yes | NR |
C9 | CAMs | Unknown | NA | The expression of typical MAC genes, including Mrc1, Cd163, Lyve1, Pf4, Ms4a7, Stab1, and Cbr2, was not elevated. The other genes were not within the scope of our evaluation. |
C10 | MG | MG | Yes | NR |
C11 | Ly6chi monocytes | MNC | Yes | NR |
C12 | DCs | DC | Yes | NR |
C13 | CAMs | MAC | Yes | NR |
C14 | Proliferating cells | Proliferating cells | Yes | NR |
C15 | Stromal cells | Unknown | NA | Not within the scope of our evaluation. |
C16 | Lymphocytes | NK | Yes | NR |
We encountered some thorny problems when analyzing the data of Mimouna et al.37 In that data source, Louvain graph-based community clustering was used to divide the cells into clusters, and PanglaoDB was used to identify the putative cell type and/or activation state of each individual Louvain cluster. The cell types identified using CTIM are shown in Table 6. Although the results were basically consistent, the cell types within each cluster were mixed, which indicated that the cell clustering of these data was not ideal.
Cluster | Reported cell type | Our cell type | Consistency | Reason |
---|---|---|---|---|
C1 | MAC/MG/others | MAC/MG/others | Yes | Cell clustering was not ideal. |
C2 | MAC/MG/NEUT | MAC/MG/NEUT | Yes | Cell clustering was not ideal |
C3 | MNC/MAC/MG | MAC/MG/NEUT | Part | Cell clustering was not ideal |
C4 | MAC/MG/NEUT | MAC/MG/NEUT | Yes | Cell clustering was not ideal |
C5 | MNC/MAC | MAC/MG/NEUT | Part | Cell clustering was not ideal |
C6 | NEUT | MAC/MG/NEUT | Part | Cell clustering was not ideal |
C7 | MAC/MG/others | MAC/MG/NEUT | Yes | Cell clustering was not ideal |
C8 | T/others | MAC/MG/NEUT | Part | Cell clustering was not ideal |
C9 | MNC/MAC | MAC/MG/NEUT | Part | Cell clustering was not ideal |
Comparison of the CTIM with the literature in peripheral blood and bone marrow myeloid cells
To test the identification of non-CNS myeloid cells by CTIM, 21 peripheral blood cell clusters and 17 bone marrow cell clusters of adult mice from Han et al.12 were used. Table 7 shows the peripheral blood results. Of the 21 cell clusters, cluster 14 (Erythroblast_Car2 high), cluster 20 (B cell_Igha high), and cluster 21 (Erythroblast_Hba-a2 high) were not in the table. The reported cluster 18 (Macrophage_Pf4 high) included a few NEUTs, and the other 17 cell clusters were completely consistent. The bone marrow results are shown in Table 8. Of the 17 cell clusters, cluster 3 (neutrophil progenitors), cluster 8 (hematopoietic stem progenitor cells), cluster 9 (erythroblasts), and cluster 15 (mast cells) were not in the table, and the other 14 cell clusters were completely consistent.
Cluster | Reported cell type | Our cell type | Consistency | Reason |
---|---|---|---|---|
1 | T cell_Trbc2 high | T | Yes | NR |
2 | B cell_Ly6d high | B | Yes | NR |
3 | Macrophage_S100a4 high | MAC | Yes | NR |
4 | Neutrophil_Retnlg high | NEUT | Yes | NR |
5 | Neutrophil_Ltf high | NEUT | Yes | NR |
6 | Neutrophil_Camp high | NEUT | Yes | NR |
7 | Neutrophil_Il1b high | NEUT | Yes | NR |
8 | NK cell_Gzma high | NK | Yes | NR |
9 | Macrophage_Ace high | MAC | Yes | NR |
10 | Monocyte_Elane high | MNC | Yes | NR |
11 | B cell_Vpreb3 high | B | Yes | NR |
12 | Monocyte_F13a1 high | MNC | Yes | NR |
13 | T cell_Gm14303 high | T | Yes | NR |
14 | Erythroblast_Car2 high | Proliferative cells | NA | Not within the scope of our evaluation. |
15 | B cell_Rps27rt high | B | Yes | NR |
16 | Dendritic cell_Siglech high | DC | Yes | NR |
17 | Basophil_Prss34 high | Unknown | NA | NA |
18 | Macrophage_Pf4 high | MAC/NEUT | Part | The reported cluster 18 was mixed with a few NEUT. |
19 | B cell_Igha high | Unknown | NA | Not within the scope of our evaluation. |
20 | Macrophage_Flt-ps1 high | MAC | Yes | NR |
21 | Erythroblast_Hba-a2 high | Unknown | NA | Not within the scope of our evaluation. |
Cluster | Reported cell type | Our cell type | Consistency | Reason |
---|---|---|---|---|
1 | Neutrophil_Cebpe high | NEUT | Yes | NR |
2 | Neutrophil_Mmp8 high | NEUT | Yes | NR |
3 | Neutrophil progenitor | MNC/MAC/NEUT | NA | Not within the scope of our evaluation. |
4 | Monocyte_Prtn3 high | MNC | Yes | NR |
5 | Macrophage_Ms4a6c high | MAC | Yes | NR |
6 | Neutrophil_Ngp high | NEUT | Yes | NR |
7 | Prepro B cell | B | Yes | NR |
8 | Hematopoietic stem progenitor cell | Unknown | NA | Not within the scope of our evaluation. |
9 | Erythroblast | Proliferative unknown cell | NA | Not within the scope of our evaluation. |
10 | Neutrophil_Fcnb high | NEUT | Yes | NR |
11 | B cell_Igkc high | B | Yes | NR |
12 | Macrophage_S100a4 high | MAC | Yes | NR |
13 | T cell_Ms4a4b high | T | Yes | NR |
14 | Dendritic cell_Siglech high | DC | Yes | NR |
15 | Mast cell | Unknown | NA | Not within the scope of our evaluation. |
16 | Dendritic cell_H2-Eb1 high | DC | Yes | NR |
17 | Monocyte_Mif high | MNC | Yes | NR |
Results of the CTIM compared with the published literature
According to the grading evaluation method in Table 2, the results of all data analysis (Tables 3–8) were evaluated. Excluding those clusters that are not within the scope of the analysis (N/A), a total of 83 valid cases were obtained. As shown in Table 9, excellent, satisfactory, and poor results in previous studies were 74, 3, and 6, respectively. Correspondingly, they were 77, 1, and 5 in the results of CTIM. The overall consistency rate was 93.98% (78/83). Bowker’s test showed that there was no significant difference between the two groups (p > 0.05). Kappa symmetric measures showed that the kappa value was 0.642 (p < 0.01), indicating general consistency.
Studies * CTIM crosstabulation

Grading (studies) | Grading (CTIM): A (excellent) | Grading (CTIM): B (satisfactory) | Grading (CTIM): C (poor) | Total |
---|---|---|---|---|
A (excellent) | 73 | 1 | 0 | 74 |
B (satisfactory) | 3 | 0 | 0 | 3 |
C (poor) | 1 | 0 | 5 | 6 |
Total | 77 | 1 | 5 | 83 |
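The consistency rate and kappa value reported above can be reproduced directly from the crosstabulation. The sketch below assumes the unweighted Cohen's kappa, which matches the reported value; the function name `agreement_and_kappa` is illustrative, and the original analysis was run in SPSS.

```python
def agreement_and_kappa(table):
    """Return (observed agreement, unweighted Cohen's kappa) for a square table."""
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Rows: grading in the published studies (A, B, C); columns: grading by CTIM.
crosstab = [
    [73, 1, 0],
    [3, 0, 0],
    [1, 0, 5],
]
observed, kappa = agreement_and_kappa(crosstab)
print(f"{observed:.2%}", round(kappa, 3))  # 93.98% 0.642
```

The diagonal holds 78 of the 83 cases, and the agreement expected by chance is high (about 0.83) because grade A dominates both margins, which is why kappa is lower than the raw consistency rate suggests.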
Discussion
For the last few decades, many advanced techniques, such as immunohistochemistry and flow cytometry, have been used to identify CNS myeloid cell subtypes. However, owing to the lack of absolutely specific markers and the unstable expression of biomarkers under different pathophysiological conditions, their accuracy is still unsatisfactory.8 Although scRNA-Seq is a promising new technology for solving this problem, using the various programming-language analysis packages for scRNA-Seq data is not easy for ordinary researchers, and bioinformatics experts do not necessarily know the specific markers of CNS myeloid cell subtypes.9 Therefore, it is important to build a bridge across the knowledge gap between ordinary researchers and bioinformatics experts.
In this study, a Microsoft Excel template was designed that included a panel of gene markers corresponding to myeloid cells, lymphocytes, common CNS cells, and proliferative cells. Once the gene expression data of cell clusters have been obtained, users can name the clusters directly using this Excel template. It should be emphasized that the template is mainly suitable for determining the major categories of myeloid cells. To further distinguish the subtypes of certain cells, researchers only need to add the corresponding gene markers. The template is open source, and researchers can modify it or add new genes according to their needs (Table S1). In selecting gene markers, we considered not only their relative specificity but also their overlap and commonality among different cells. In the Excel template, the letters P and N indicate that a gene marker is positive or negative, respectively. Markers that can be either positive or negative are defined as “P/N” (Fig. 1). For example, Ptprc (the gene encoding CD45) is a common marker of myeloid cells and lymphocytes,38–40 and it was used to distinguish these cells from CNS nonmyeloid cells (astrocytes, oligodendrocytes, neurons, etc.). In theory, the CD45 protein encoded by Ptprc is positive in many leukocytes, but while collecting gene markers and drawing up the Excel template, we found that Ptprc was not expressed in every cell cluster, so it was defined as P/N. There were many similar examples besides Ptprc (see Fig. 1 and Table S1 for details). Even when a cell type had some relatively specific gene markers, a panel of gene markers was still used to evaluate and define it comprehensively. This could effectively distinguish cell types with similar or overlapping gene expression and ensure the accuracy of cell cluster identification.
In this Excel template, there were 73 gene markers (excluding those of nonmyeloid CNS cells) in each panel that could be used to distinguish myeloid cell subtypes and lymphocytes (Fig. 1). For example, MNC could express Ptprc (P/N), Cd14 (P/N), Itgam (P/N), Itgax (P/N), Csf3r (P/N), Adgre1 (P/N), Ly6c1 (P/N), S100a4 (P/N), Cd68 (P), Ly86 (P/N), Ctsb (P/N), Ccr2 (P/N), Ly6c2 (P), Plac8 (P), Pf4 (P/N), Lyz1 (P), Hmox1 (P/N), F13a1 (P), Lyst (P/N), Prtn3 (P/N), Elane (P/N), and Pilra (P/N). Although several molecules (Cd68, Ly6c2, Plac8, and Lyz1) are positive (P) in MNC, they are also expressed in other cells, so this template contained no absolutely specific markers of MNC. Nevertheless, its cell type could still be determined by comparative analysis. For cell types with their own specific gene markers, cell clusters were easy to identify by comparative analysis. Typical examples were Ms4a7, Lyve1, Cbr2, Mrc1, and Cd163 for MAC; Hexb, Olfml3, Sparc, Tgfbr1, P2ry12, and Tmem119 for MG; and Ltf, Ly6g, Mmp8, Camp, Ngp, Fcnb, Cebpe, Retnlg, S100a8, S100a9, Lcn2, G0s2, and Wfdc21 for NEUT. Of course, because of the limitations of our background knowledge and research level, this Excel template still has some defects. For DCs, for example, the expression of H2-Ab1, H2-Eb1, H2-Aa, Cd74, and Cd209a should be positive, but these markers can also be expressed in MAC and B cells; because B cells in particular are not myeloid cells, this can easily result in misidentification. B cell markers were therefore also added to the template to help distinguish B cells from DCs. In addition, users should be aware of Kolmer epiplexus cells, which have also been reported to express “microglial markers” such as P2ry12.34–40 Kolmer epiplexus cells, first reported by Kolmer in 1921, are a population of macrophages that attach to the ventricle-facing surface of the choroid plexus.41,42 The gene transcription of these cells is more consistent with that of microglia than with that of nonparenchymal macrophages.
In addition, Kolmer epiplexus cells have the same ontogeny and self-renewal ability as microglia, so they are considered a nonparenchymal microglia subtype.34,41 Therefore, microglia and macrophages should be interpreted and defined with care when suspected Kolmer epiplexus cells are encountered. For example, in cluster 6 of Table 5, the typical gene markers of MAC, including Mrc1, Cd163, Lyve1, Pf4, Ms4a7, Stab1, and Cbr2, were not increased. In contrast, the MG-specific markers Hexb, Olfml3, and Sparc were significantly increased. These cells might therefore be Kolmer epiplexus cells.
Compared with the findings of Ximerakis et al.,35 only one cluster was inconsistent (Table 3). Our results showed that a few NEUT and DC were mixed with their MNC. A possible reason is that they took Plac8 as a specific marker of MNC, whereas Plac8 is in fact also expressed in NEUT and DC.12 Compared with the cell types that Han et al.12 identified in adult brain, cluster 4 was inconsistent (Table 4). The reason may be that the reported cluster 4 was mixed with a few MG, because the typical microglia markers (Hexb, Olfml3, Sparc, Tgfbr1, P2ry12, and Tmem119) could be found in it. Compared with the findings of Sankowski et al.,36 clusters 6 and 9 were inconsistent (Table 5). Both clusters were reported as CAMs; however, the expression of typical MAC genes (Mrc1, Cd163, Lyve1, Pf4, Ms4a7, Stab1, and Cbr2) was not increased in either cluster. In contrast, MG-specific markers (Hexb, Olfml3, and Sparc) were significantly increased in cluster 6, while the other genes in cluster 9 were not in our table. Compared with the cell types that Han et al.12 identified in peripheral blood and bone marrow, all clusters were completely consistent except cluster 18 of peripheral blood, which was mixed with a few NEUT. These results indicated that our Excel template was also very effective for the analysis of non-CNS myeloid cells.
From the above analysis, it can be deduced that appropriate gene markers and ideal clustering of scRNA-Seq data are key factors for the accuracy of cell definition. The importance of cell clustering can be understood from the following example. When the data reported by Mimouna et al.37 were analyzed, neither the reported results nor the CTIM results were ideal. On examining the reasons, we found that their data clustering method differed from those used in the other studies. The cell clustering method in that study was Louvain graph-based community clustering, which may be the reason why the clustering was not ideal. Although this Excel template could still be used to identify the cell types based on the authors’ data, the cell types in each of the nine clusters were mixed (Table 6). Therefore, the data used in this Excel template should be processed through the standard scRNA-Seq analysis pipeline, including quality control, normalization, data correction, feature selection, and dimensionality reduction, after which the cells are divided into clusters according to the similarity of their gene expression.
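The reasoning applied to cluster 6 can be written as a simple rule, sketched below. The marker lists are those named in the text; the decision rule itself, the function name `review_cam_cluster`, and the avg_logFC values are hypothetical, and the output is meant to prompt manual review rather than to assign a definitive identity.

```python
MAC_MARKERS = ["Mrc1", "Cd163", "Lyve1", "Pf4", "Ms4a7", "Stab1", "Cbr2"]
MG_MARKERS = ["Hexb", "Olfml3", "Sparc"]


def review_cam_cluster(cluster_avg_logfc):
    """Suggest how to read a cluster that was reported as CAMs."""
    mac_up = [g for g in MAC_MARKERS if cluster_avg_logfc.get(g, 0) > 0]
    mg_up = [g for g in MG_MARKERS if cluster_avg_logfc.get(g, 0) > 0]
    if not mac_up and len(mg_up) == len(MG_MARKERS):
        # No MAC marker is increased, but every MG marker is.
        return "MG-like; consider Kolmer epiplexus cells"
    if mac_up and not mg_up:
        return "MAC"
    return "undetermined; inspect manually"


# Hypothetical avg_logFC values resembling the pattern described for cluster 6.
cluster_like_c6 = {"Hexb": 1.5, "Olfml3": 1.1, "Sparc": 0.9, "Mrc1": -1.2, "Cd163": -0.8}
# Hypothetical values for a cluster dominated by MAC markers.
cluster_like_cam = {"Mrc1": 2.0, "Lyve1": 1.3, "Hexb": -0.6}

print(review_cam_cluster(cluster_like_c6))
print(review_cam_cluster(cluster_like_cam))
print(review_cam_cluster({}))
```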
Conclusions
The Excel template can serve as a bridge across the knowledge gap between ordinary researchers and bioinformatics experts. For ordinary researchers without a foundation in computer programming, it makes it easy to distinguish myeloid cell subtypes and nonmyeloid cells in clustered CNS data by using a panel of gene markers. For bioinformatics experts, it is also a valuable reference for selecting gene markers. It may also encourage researchers in other fields who are interested in using the ever-growing body of scRNA-Seq data to design similar templates and pipelines for their specific cell populations.
Supporting information
Supplementary material for this article is available at https://doi.org/10.61474/ncs.2023.00004 .
Abbreviations
- ARP:
astrocyte-restricted precursor
- AST:
astrocyte
- B:
B lymphocyte
- CAM:
CNS-associated macrophage
- CNS:
central nervous system
- CTIM:
cell type identification method
- DC:
dendritic cell
- EAE:
experimental autoimmune encephalomyelitis
- EPC:
ependymocyte
- FACS:
fluorescence-activated cell sorting
- GF:
germ-free
- HypEPC:
hypendymal cell
- ImmN:
immature neuron
- MAC:
macrophage
- MG:
microglia
- MNC:
monocyte
- mNEUR:
mature neuron
- NA:
not available
- NEUT:
neutrophil
- NK:
natural killer cell
- NK/T:
natural killer T cell
- NR:
not relevant
- NRP:
neuronal-restricted precursor
- OL:
oligodendrocyte
- OPC:
oligodendrocyte precursor cell
- scMCA:
a tool that defines cell types in mouse based on single-cell digital expression
- scRNA-Seq:
single-cell RNA sequencing
- SPF:
specific pathogen free
- T:
T lymphocyte
Declarations
Funding
This study was supported by a grant from the National Natural Science Foundation of China (82072416).
Conflict of interest
The manuscript was submitted during Dr. He-Zuo Lü's term as an editorial board member of Nature Cell and Science. The authors have no other conflict of interest to declare.
Authors’ contributions
Study design, data interpretation, and writing (HZL, JGH); literature search, data collection, data analysis, and generation of tables and figures (XYL, JLL, SQD). All authors made a significant contribution to this study and have approved the final manuscript.