Building graph

Cell Interactome Toolkit builds a knowledge graph with three commands:

  • ci_preprocessing: Pre-processing of multi-omics data

  • ci_pipeline: Estimation of cell-microbe-metabolite associations from omics data

  • ci_dataload: Build knowledge graph



Example data

Example data used in the paper ([Maruyama2023]) is downloadable from the following URLs.



Basic usage

  1. Start the Neo4j server

neo4j start

  2. Prepare the dataset in the specified format

  3. Run the three commands to build the Cell Interactome Graph database

# preprocessing of multi-omics data
ci_preprocessing --config config.yaml --output output/raw/

# analyze the omics data to infer cell-cell / cell-microbe interactions
ci_pipeline --config config.yaml --input output/raw/ --output output/pipeline/

# load the results into Neo4j database
ci_dataload --config config.yaml --data output/raw/ --interaction output/pipeline/ --db_config db_config.yaml
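
The three commands depend on each other through their directories: ci_pipeline reads what ci_preprocessing wrote, and ci_dataload reads both. As a minimal sketch of that chaining (not part of the toolkit), the Python helper below builds the same command lines as above and runs them in order, stopping at the first failure. The function names build_commands and run_all are hypothetical; only the flags shown above are used.

```python
import subprocess


def build_commands(config="config.yaml", raw_dir="output/raw/",
                   pipeline_dir="output/pipeline/", db_config="db_config.yaml"):
    """Return the three command lines in the order they have to run."""
    return [
        # 1. standardize the multi-omics input
        ["ci_preprocessing", "--config", config, "--output", raw_dir],
        # 2. infer interactions from the preprocessed data
        ["ci_pipeline", "--config", config,
         "--input", raw_dir, "--output", pipeline_dir],
        # 3. load preprocessed data and network files into Neo4j
        ["ci_dataload", "--config", config, "--data", raw_dir,
         "--interaction", pipeline_dir, "--db_config", db_config],
    ]


def run_all(commands, runner=subprocess.run):
    """Run each command in turn; check=True raises on a non-zero exit code."""
    for cmd in commands:
        runner(cmd, check=True)
```

Calling run_all(build_commands()) would execute the full build, assuming the three commands are on the PATH and the Neo4j server is already running.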


Commands

ci_preprocessing

The ci_preprocessing step standardizes the input multi-omics data (e.g., it converts gene, microbe, and metabolite names into standardized IDs). The output files are used by the subsequent steps.

Arguments

-h, --help         show this help message and exit
--config CONFIG    Configuration file
--output OUTPUT    Output directory
--logfile LOGFILE  Log file

ci_pipeline

The ci_pipeline step runs analysis pipelines to estimate cell-cell / cell-microbe interactions. The output files (called network files) describe these interactions in a standardized format; see the example output below.

Arguments

-h, --help         show this help message and exit
--config CONFIG    Configuration file
--input INPUT      Input directory
--output OUTPUT    Output directory
--logfile LOGFILE  Log file

Example output

entity1       entity_class1   entity2 entity_class2   relation_type   value   method
Anaerostipes  microbe Mature venous EC        cell    CORRELATE_WITH  0.25679405532429        pearson
Blautia       microbe Mature venous EC        cell    CORRELATE_WITH  0.202618250187864       pearson
Anaerostipes  microbe BEST2+ Goblet cell (FER1L6+)    cell    CORRELATE_WITH  -0.20693        pearson
Escherichia-Shigella  microbe BEST2+ Goblet cell (FER1L6+)    cell    CORRELATE_WITH  -0.21256        pearson
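
To inspect a network file outside the toolkit, a short Python reader is enough. The sketch below assumes the file is tab-separated with the header row shown above; the delimiter is an assumption based on the example, and read_network is a hypothetical helper, not a toolkit function.

```python
import csv
import io


def read_network(handle):
    """Read a network file into a list of dicts, converting value to float."""
    reader = csv.DictReader(handle, delimiter="\t")  # assumed delimiter: tab
    rows = []
    for row in reader:
        row["value"] = float(row["value"])
        rows.append(row)
    return rows


# Two rows taken from the example output above, joined with tabs.
sample = (
    "entity1\tentity_class1\tentity2\tentity_class2\trelation_type\tvalue\tmethod\n"
    "Anaerostipes\tmicrobe\tMature venous EC\tcell\tCORRELATE_WITH\t0.25679405532429\tpearson\n"
    "Anaerostipes\tmicrobe\tBEST2+ Goblet cell (FER1L6+)\tcell\tCORRELATE_WITH\t-0.20693\tpearson\n"
)
rows = read_network(io.StringIO(sample))
negative = [r for r in rows if r["value"] < 0]  # keep negative correlations only
```

For a file on disk, pass an open file handle instead of the io.StringIO object.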

ci_dataload

The ci_dataload step loads the cell-cell / cell-microbe interactions, either estimated above or taken from public resources, into the Neo4j database. In the network files, the columns entity1 and entity2 become nodes, entity_class1 and entity_class2 become the node classes, and relation_type becomes the edge class. All other columns are stored as properties of the edges.
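
The column mapping can be illustrated with a small Python sketch that turns one network-file row into a Cypher statement plus parameters. This is not the toolkit's actual loader: the helper row_to_cypher, the node property key name, and the edge direction from entity1 to entity2 are all assumptions made for illustration. Cypher does not accept parameters for labels or relationship types, so the sketch validates them before placing them in the query text.

```python
import re

# Labels and relationship types are interpolated into the query text,
# so only plain identifiers are accepted here.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_CORE_COLUMNS = {"entity1", "entity_class1", "entity2",
                 "entity_class2", "relation_type"}


def row_to_cypher(row):
    """Map one network-file row to (query, parameters)."""
    for key in ("entity_class1", "entity_class2", "relation_type"):
        if not _IDENTIFIER.match(row[key]):
            raise ValueError(f"unsafe value in column {key}: {row[key]!r}")
    query = (
        f"MERGE (a:{row['entity_class1']} {{name: $entity1}}) "   # node 1
        f"MERGE (b:{row['entity_class2']} {{name: $entity2}}) "   # node 2
        f"MERGE (a)-[r:{row['relation_type']}]->(b) "             # edge
        "SET r += $props"                                         # edge properties
    )
    # Every column that is not a node, node class, or edge class
    # becomes an edge property.
    props = {k: v for k, v in row.items() if k not in _CORE_COLUMNS}
    params = {"entity1": row["entity1"], "entity2": row["entity2"],
              "props": props}
    return query, params
```

Note that merging on the edge class alone would fold several methods for the same pair of nodes into one edge; a real loader may need to handle that case differently.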

Arguments

-h, --help            show this help message and exit
--config CONFIG       Configuration file
--data DATA           Preprocessed data
--interaction INTERACTION
                      Output files of pipelines
--db_config DB_CONFIG
                      Database configuration file
--logfile LOGFILE     Log file
--reset_db            Reset database before loading the data
--fast_load           Use LOAD CSV to load data quickly

Option fast_load

The --fast_load option speeds up data loading by using the Cypher LOAD CSV command. To use it, edit the neo4j.conf file in your $NEO4J_HOME/conf/ directory as follows, so that LOAD CSV can read files from outside Neo4j's import directory.

### Comment out the following variable
#server.directories.import=import
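
Before running ci_dataload with --fast_load, it can help to confirm that the setting really is commented out. The Python sketch below checks the text of a neo4j.conf file for an active (uncommented) entry; import_restriction_active is a hypothetical helper, and the default key is simply the one shown above.

```python
def import_restriction_active(conf_text, key="server.directories.import"):
    """Return True if key is set on an uncommented line of conf_text."""
    for line in conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments do not set anything
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            return True
    return False
```

If the function returns True for the contents of your neo4j.conf, the setting is still active and needs to be commented out; restart Neo4j after editing the file.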