Artificial Intelligence: Milestone at the Single-Cell Level

In order to develop new therapies or preventive measures for common chronic diseases such as lung diseases, allergies, or diabetes, biomedical research relies on large amounts of data. However, such data are often available only to a limited extent, and integrating and analyzing different datasets can be difficult. With the participation of DZD partner Helmholtz Munich, researchers are developing solutions to precisely these problems using artificial intelligence and machine learning. In the latest issue of the journal 'Nature Methods', they present three articles describing these new solutions.

CellRank's fate probabilities for lung regeneration: each cell is placed at a position that reflects its probabilities of reaching each of the terminal states. © Helmholtz Munich / Marius Lange.

The researchers focus on single-cell genomics, which asks which genes are active in a specific cell at a specific point in time. The aim is to understand the molecular origins of diseases and to develop medical innovations for a healthier society. The solutions described below are intended to help researchers visualize and analyze these highly complex data.

Making different datasets comparable
If researchers want to check whether their results from single-cell analyses hold in general, they have to compare their data with other datasets from the same system. However, because such datasets are not always generated at the same time, in the same place, or by the same person, even identical cell types can differ in their measured molecular profiles. This problem, known as the batch effect, makes integrating datasets extremely difficult. In the first of the three articles listed below, the researchers benchmark data integration methods and provide guidance on how best to solve this dilemma.
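To make the batch effect concrete, here is a minimal sketch under our own assumptions: the same three cells are "measured" in two batches, and batch B carries a constant technical offset. The function `center_per_batch` and the toy numbers are hypothetical, and the naive correction it applies (subtracting each batch's mean profile) is only an illustration, not one of the methods evaluated in the benchmark.

```python
import numpy as np


def center_per_batch(expr, batches):
    """Naive batch correction: subtract each batch's mean expression profile.

    expr: cells x genes expression matrix; batches: one batch label per cell.
    """
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    corrected = expr.copy()
    for b in np.unique(batches):
        mask = batches == b
        # Remove the per-gene offset shared by all cells in this batch.
        corrected[mask] -= corrected[mask].mean(axis=0)
    return corrected


# Hypothetical toy data: the same three cells measured in two batches,
# where batch "B" adds a constant technical offset of 2 to every gene.
base = np.array([[1.0, 5.0, 0.0],
                 [2.0, 4.0, 1.0],
                 [3.0, 6.0, 2.0]])
expr = np.vstack([base, base + 2.0])
batches = ["A", "A", "A", "B", "B", "B"]

corrected = center_per_batch(expr, batches)
print(np.allclose(corrected[:3], corrected[3:]))  # True: the offset is gone
```

This only works here because both batches contain the same cells in the same proportions. When batches differ in cell-type composition, mean-centering also erases genuine biological differences, which is one reason more sophisticated integration methods are needed and worth benchmarking.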

Predicting cell fates with open-source software
A key question in medical research is how cells develop. To answer it, researchers use single-cell RNA sequencing to analyze the gene expression of individual cells. However, the method captures only a brief snapshot of gene activity in each cell, not its long-term trajectory. In 'Nature Methods', the researchers now present CellRank, a new algorithm that combines these snapshots with directional information from RNA velocity to predict how cells develop and which genes may regulate this development. Applied to lung regeneration, the open-source software predicted previously unknown intermediate cell states, whose existence was subsequently confirmed experimentally.
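The fate probabilities shown in the figure above can be framed as absorption probabilities of a Markov chain over cells: if Q holds the transitions among non-terminal cells and R the transitions into terminal states, the fate probabilities B solve (I - Q) B = R. The sketch below illustrates only this general idea on a hand-written, hypothetical transition matrix; the function `fate_probabilities` is our own and not part of CellRank's API, and CellRank itself derives its transition matrix from data such as RNA velocity and transcriptomic similarity.

```python
import numpy as np


def fate_probabilities(P, transient, terminal):
    """Absorption probabilities of an absorbing Markov chain.

    P: row-stochastic cell-to-cell transition matrix.
    transient, terminal: index lists of non-terminal and terminal cells.
    Returns B where B[i, j] is the probability that transient cell i
    eventually reaches terminal cell j, i.e. the solution of (I - Q) B = R.
    """
    P = np.asarray(P, dtype=float)
    Q = P[np.ix_(transient, transient)]  # transient -> transient
    R = P[np.ix_(transient, terminal)]   # transient -> terminal
    return np.linalg.solve(np.eye(len(transient)) - Q, R)


# Hypothetical toy chain: cells 0-2 are progenitor-like (transient);
# cells 3 and 4 are two terminal states that are never left once reached.
P = np.array([
    [0.0, 0.5, 0.5, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])
B = fate_probabilities(P, transient=[0, 1, 2], terminal=[3, 4])
print(B.round(3))
```

Because every cell in this chain eventually reaches a terminal state, each row of `B` sums to 1; cell 0 is split almost evenly between the two fates, while cell 1 leans strongly toward terminal state 3.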

Visualizing spatial omics data
In recent years, it has become increasingly possible to determine precisely which genes are expressed and which proteins are produced at specific times, in specific cells and tissues. Using such omics data, researchers can better understand how tissues are structured and how cells communicate with each other. However, analyzing these large amounts of data requires flexible computational tools. A new software tool, 'Squidpy', can support researchers here: it combines omics data with data from image analysis and can thus visualize the exact spatial distribution of the molecular data.
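One common starting point for spatial analyses of this kind is a neighborhood graph built from the cells' spatial coordinates. The sketch below is a generic illustration under our own assumptions, not Squidpy's implementation or API: the function `neighbor_type_counts`, the coordinates, and the cell-type labels are hypothetical. It counts how often the k nearest spatial neighbors of cells of one type belong to each cell type, a simple indication of which cell types sit next to each other in the tissue.

```python
import numpy as np


def neighbor_type_counts(coords, labels, k):
    """Count, for each pair of cell types (a, b), how many of the k nearest
    spatial neighbors of type-a cells have type b."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    types = sorted(set(labels.tolist()))
    index = {t: i for i, t in enumerate(types)}
    # Pairwise Euclidean distances between all cells (fine for toy sizes).
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    counts = np.zeros((len(types), len(types)), dtype=int)
    for i in range(len(coords)):
        for j in np.argsort(dist[i])[:k]:
            counts[index[labels[i]], index[labels[j]]] += 1
    return types, counts


# Hypothetical toy tissue: two spatially separated clusters of cells.
coords = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = ["epithelial"] * 3 + ["immune"] * 3
types, counts = neighbor_type_counts(coords, labels, k=2)
print(types)
print(counts)
```

Here every cell's two nearest neighbors share its type, so all counts fall on the diagonal. The all-pairs distance matrix grows quadratically with the number of cells; for real datasets a spatial index such as a k-d tree would be the usual choice.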


Original publications:
Lücken, M. D. et al.: Benchmarking atlas-level data integration in single-cell genomics. In: Nature Methods, 2021, DOI: 10.1038/s41592-021-01336-8.

Lange, M. et al.: CellRank for directed single-cell fate mapping. In: Nature Methods, 2022, DOI: 10.1038/s41592-021-01346-6.

Palla, G. et al.: Squidpy: a scalable framework for spatial omics analysis. In: Nature Methods, 2022, DOI: 10.1038/s41592-021-01358-2.