Building graph
Cell Interactome Toolkit builds a knowledge graph with three commands.
ci_preprocessing
: Pre-processing of multi-omics data
ci_pipeline
: Estimation of cell-microbe-metabolite associations from omics data
ci_dataload
: Building of the knowledge graph
Example data
The example data used in the paper ([Maruyama2023]) can be downloaded from the following URLs.
Inflammatory Bowel Disease: http://cellinteractome.com/download/IBD.tar.gz
Oral Squamous Cell Carcinoma: http://cellinteractome.com/download/OSCC.tar.gz
Basic usage
Start the Neo4j server
neo4j start
Prepare the dataset in the specified format
Run the three commands to build the Cell Interactome Graph database
# preprocessing of multi-omics data
ci_preprocessing --config config.yaml --output output/raw/
# analyze the omics data to infer cell-cell / cell-microbe interactions
ci_pipeline --config config.yaml --input output/raw/ --output output/pipeline/
# load the results into Neo4j database
ci_dataload --config config.yaml --data output/raw/ --interaction output/pipeline/ --db_config db_config.yaml
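The three commands can also be chained from a script. The following is a minimal sketch, not part of the toolkit: the helper names build_commands and run_all are hypothetical, only the flags shown above are used, and the ci_* commands are assumed to be on your PATH.

```python
import subprocess


def build_commands(config, db_config, out_dir="output"):
    """Return the three toolkit command lines in the order they must run.

    Hypothetical helper: it only assembles the flags shown in the usage above.
    """
    raw = f"{out_dir}/raw/"
    pipeline = f"{out_dir}/pipeline/"
    return [
        ["ci_preprocessing", "--config", config, "--output", raw],
        ["ci_pipeline", "--config", config, "--input", raw, "--output", pipeline],
        ["ci_dataload", "--config", config, "--data", raw,
         "--interaction", pipeline, "--db_config", db_config],
    ]


def run_all(config, db_config, out_dir="output"):
    """Run each step in order; check=True stops at the first failing step."""
    for cmd in build_commands(config, db_config, out_dir):
        subprocess.run(cmd, check=True)
```

With the Neo4j server already started, calling run_all("config.yaml", "db_config.yaml") would execute the same three commands as above and stop as soon as one of them exits with an error.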
Commands
ci_preprocessing
The ci_preprocessing step standardizes the input multi-omics data (e.g., it converts gene, microbe, and metabolite names into standardized IDs). Its output files are used by the subsequent steps.
Arguments
-h, --help show this help message and exit
--config CONFIG Configuration file
--output OUTPUT Output directory
--logfile LOGFILE Log file
ci_pipeline
The ci_pipeline step runs analysis pipelines to estimate cell-cell / cell-microbe interactions. Its output files (called network files) describe these interactions in a standardized format, as shown under Example output.
Arguments
-h, --help show this help message and exit
--config CONFIG Configuration file
--input INPUT Input directory
--output OUTPUT Output directory
--logfile LOGFILE Log file
Example output
entity1               entity_class1  entity2                       entity_class2  relation_type   value              method
Anaerostipes          microbe        Mature venous EC              cell           CORRELATE_WITH  0.25679405532429   pearson
Blautia               microbe        Mature venous EC              cell           CORRELATE_WITH  0.202618250187864  pearson
Anaerostipes          microbe        BEST2+ Goblet cell (FER1L6+)  cell           CORRELATE_WITH  -0.20693           pearson
Escherichia-Shigella  microbe        BEST2+ Goblet cell (FER1L6+)  cell           CORRELATE_WITH  -0.21256           pearson
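Network files can be inspected with standard tools before they are loaded. The sketch below is an illustration under two assumptions that this page does not state: that the columns are tab-separated (entity names contain spaces) and that value is always numeric. The function name read_network is hypothetical.

```python
import csv
import io

COLUMNS = ["entity1", "entity_class1", "entity2", "entity_class2",
           "relation_type", "value", "method"]


def read_network(handle):
    """Parse a network file into a list of dicts, converting value to float.

    Assumes a tab-separated file with a header row (an assumption).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        row["value"] = float(row["value"])
        rows.append(row)
    return rows


# Two rows from the example output, joined with tabs.
sample = "\t".join(COLUMNS) + "\n" + "\n".join([
    "Anaerostipes\tmicrobe\tMature venous EC\tcell\tCORRELATE_WITH\t0.25679405532429\tpearson",
    "Anaerostipes\tmicrobe\tBEST2+ Goblet cell (FER1L6+)\tcell\tCORRELATE_WITH\t-0.20693\tpearson",
]) + "\n"

rows = read_network(io.StringIO(sample))
# Keep only the negatively correlated partners.
negative = [r["entity2"] for r in rows if r["value"] < 0]
print(negative)
```

To read a real file, you could pass open(path, newline="") instead of the in-memory sample.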
ci_dataload
The ci_dataload step loads the cell-cell / cell-microbe interactions, either estimated above or taken from public resources, into the Neo4j database.
The entity1 and entity2 columns of the network files are used as nodes, entity_class1 and entity_class2 as node classes, and relation_type as the edge class; all other columns are stored as properties of the edges.
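The column mapping described above can be sketched as follows. This illustrates the mapping only and is not the toolkit's loader: to_graph is a hypothetical name, and the in-memory representation of nodes and edges is an assumption.

```python
def to_graph(rows):
    """Split network-file rows into nodes and edges.

    Nodes are (name, class) pairs taken from entity1/entity2 and their
    class columns; each edge keeps relation_type as its type and every
    remaining column as a property.
    """
    node_columns = {"entity1", "entity_class1", "entity2", "entity_class2"}
    nodes = set()
    edges = []
    for row in rows:
        source = (row["entity1"], row["entity_class1"])
        target = (row["entity2"], row["entity_class2"])
        nodes.update([source, target])
        properties = {k: v for k, v in row.items()
                      if k not in node_columns and k != "relation_type"}
        edges.append({"source": source, "target": target,
                      "type": row["relation_type"], "properties": properties})
    return nodes, edges


# Two rows from the example output of ci_pipeline.
rows = [
    {"entity1": "Anaerostipes", "entity_class1": "microbe",
     "entity2": "Mature venous EC", "entity_class2": "cell",
     "relation_type": "CORRELATE_WITH", "value": "0.25679405532429",
     "method": "pearson"},
    {"entity1": "Blautia", "entity_class1": "microbe",
     "entity2": "Mature venous EC", "entity_class2": "cell",
     "relation_type": "CORRELATE_WITH", "value": "0.202618250187864",
     "method": "pearson"},
]
nodes, edges = to_graph(rows)
print(len(nodes), len(edges))
```

Because nodes are collected in a set, an entity that appears in several rows (here the shared cell type) yields a single node, while every row yields its own edge.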
Arguments
-h, --help show this help message and exit
--config CONFIG Configuration file
--data DATA Preprocessed data
--interaction INTERACTION Output files of pipelines
--db_config DB_CONFIG Database configuration file
--logfile LOGFILE Log file
--reset_db Reset database before loading the data
--fast_load Use LOAD CSV to load data quickly
Option fast_load
The --fast_load option makes data loading faster by using the Cypher LOAD CSV command.
To use it, modify the neo4j.conf file in your $NEO4J_HOME/conf/ directory as follows.
### Comment out the following variable
#server.directories.import=import
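If you prefer to script this change, the sketch below comments out the setting in the text of a configuration file. It is a hypothetical helper, not part of the toolkit or of Neo4j; it works on a string so that you can review the result before writing it back to neo4j.conf.

```python
def comment_out(conf_text, key="server.directories.import"):
    """Return conf_text with every active `key=...` line commented out.

    Lines that are already comments, and all other settings, are unchanged.
    """
    result = []
    for line in conf_text.splitlines(keepends=True):
        stripped = line.lstrip()
        name = stripped.split("=", 1)[0].strip()
        if not stripped.startswith("#") and name == key:
            line = "#" + line
        result.append(line)
    return "".join(result)


before = "server.directories.import=import\nserver.directories.data=data\n"
after = comment_out(before)
print(after)
```

Neo4j reads neo4j.conf at startup, so the server needs to be restarted for the change to take effect.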