================
Building graph
================
The Cell Interactome Toolkit builds a knowledge graph with three commands:

* ``ci_preprocessing``: Pre-processing of multi-omics data
* ``ci_pipeline``: Estimation of cell-microbe-metabolite associations from omics data
* ``ci_dataload``: Construction of the knowledge graph

----------------
Example data
----------------
The example data used in the paper ([Maruyama2023]_) can be downloaded from the following URLs:

* Inflammatory Bowel Disease: http://cellinteractome.com/download/IBD.tar.gz
* Oral Squamous Cell Carcinoma: http://cellinteractome.com/download/OSCC.tar.gz

----------------
Basic usage
----------------
1. Start the Neo4j server

   .. code-block:: bash

      neo4j start

2. Prepare the dataset in the specified format

3. Run the three commands to build the Cell Interactome Graph database

   .. code-block:: bash

      # preprocess the multi-omics data
      ci_preprocessing --config config.yaml --output output/raw/

      # analyze the omics data to infer cell-cell / cell-microbe interactions
      ci_pipeline --config config.yaml --input output/raw/ --output output/pipeline/

      # load the results into the Neo4j database
      ci_dataload --config config.yaml --data output/raw/ --interaction output/pipeline/ --db_config db_config.yaml

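
The three commands can also be chained in a small shell function so that a failing step stops the build. This is only a sketch: the function name ``build_graph`` and its positional arguments are hypothetical and not part of the toolkit, while the command-line flags are the ones used in the commands above.

.. code-block:: bash

   # Hypothetical helper (not part of the toolkit): run the three steps in
   # order and stop at the first one that fails.
   build_graph() {
       local config="$1"            # e.g. config.yaml
       local db_config="$2"         # e.g. db_config.yaml
       local outdir="${3:-output}"  # base output directory (assumed layout)

       # Create the output directories up front; harmless if the commands
       # would create them themselves.
       mkdir -p "$outdir/raw" "$outdir/pipeline" || return 1

       ci_preprocessing --config "$config" --output "$outdir/raw/" || return 1
       ci_pipeline --config "$config" --input "$outdir/raw/" \
           --output "$outdir/pipeline/" || return 1
       ci_dataload --config "$config" --data "$outdir/raw/" \
           --interaction "$outdir/pipeline/" --db_config "$db_config" || return 1
   }

After sourcing the function, a call such as ``build_graph config.yaml db_config.yaml output`` would run all three steps and return a non-zero status as soon as one of them fails.
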
----------------
Commands
----------------
``ci_preprocessing``
====================
The step ``ci_preprocessing`` standardizes the input multi-omics data (e.g., it converts gene, microbe, and metabolite names into standardized IDs). The output files are used by the subsequent steps.

**Arguments**

.. code-block:: bash

   -h, --help         show this help message and exit
   --config CONFIG    Configuration file
   --output OUTPUT    Output directory
   --logfile LOGFILE  Log file

++++
``ci_pipeline``
====================
The step ``ci_pipeline`` runs analysis pipelines to estimate cell-cell / cell-microbe interactions. The output files (called network files) describe these interactions in a standardized format; see the example output below.

**Arguments**

.. code-block:: bash

   -h, --help         show this help message and exit
   --config CONFIG    Configuration file
   --input INPUT      Input directory
   --output OUTPUT    Output directory
   --logfile LOGFILE  Log file


**Example output**

.. code-block:: bash

   entity1               entity_class1  entity2                       entity_class2  relation_type   value              method
   Anaerostipes          microbe        Mature venous EC              cell           CORRELATE_WITH  0.25679405532429   pearson
   Blautia               microbe        Mature venous EC              cell           CORRELATE_WITH  0.202618250187864  pearson
   Anaerostipes          microbe        BEST2+ Goblet cell (FER1L6+)  cell           CORRELATE_WITH  -0.20693           pearson
   Escherichia-Shigella  microbe        BEST2+ Goblet cell (FER1L6+)  cell           CORRELATE_WITH  -0.21256           pearson

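
A network file can be inspected with standard tools before it is loaded. The sketch below assumes that the columns are tab-separated and that the first line is a header; the temporary file and the 0.21 cutoff are purely illustrative.

.. code-block:: bash

   # Write the example rows to a throwaway file (assumed tab-separated).
   network_file=$(mktemp)
   printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
       entity1 entity_class1 entity2 entity_class2 relation_type value method \
       Anaerostipes microbe 'Mature venous EC' cell CORRELATE_WITH 0.25679405532429 pearson \
       Blautia microbe 'Mature venous EC' cell CORRELATE_WITH 0.202618250187864 pearson \
       Anaerostipes microbe 'BEST2+ Goblet cell (FER1L6+)' cell CORRELATE_WITH -0.20693 pearson \
       Escherichia-Shigella microbe 'BEST2+ Goblet cell (FER1L6+)' cell CORRELATE_WITH -0.21256 pearson \
       > "$network_file"

   # Keep microbe-cell rows whose absolute value is at least 0.21
   # (illustrative cutoff), skipping the header line.
   strong=$(awk -F'\t' -v OFS='\t' 'NR > 1 && $2 == "microbe" && $4 == "cell" {
       v = ($6 < 0) ? -$6 : $6
       if (v >= 0.21) print $1, $3, $6
   }' "$network_file")

   printf '%s\n' "$strong"

With the four example rows, only the Anaerostipes / Mature venous EC and Escherichia-Shigella / BEST2+ Goblet cell (FER1L6+) pairs pass the cutoff.
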
++++
``ci_dataload``
====================
The step ``ci_dataload`` loads the cell-cell / cell-microbe interactions, either estimated above or taken from public resources, into the Neo4j database.
Columns ``entity1`` and ``entity2`` in the network files are used as nodes, ``entity_class1`` and ``entity_class2`` as node classes, ``relation_type`` as the edge class, and all other columns as properties of the edges.
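
To make this column mapping concrete, the sketch below derives the node and edge lists that a network file implies. It is not what ``ci_dataload`` runs internally: it assumes a tab-separated file with a header line and the first five columns in the order described above, and the printed edge notation is made up for illustration.

.. code-block:: bash

   # Two example rows in a throwaway file (assumed tab-separated).
   network_file=$(mktemp)
   printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
       entity1 entity_class1 entity2 entity_class2 relation_type value method \
       Anaerostipes microbe 'Mature venous EC' cell CORRELATE_WITH 0.25679405532429 pearson \
       Blautia microbe 'Mature venous EC' cell CORRELATE_WITH 0.202618250187864 pearson \
       > "$network_file"

   # Nodes: distinct (class, entity) pairs taken from both ends of every row.
   nodes=$(awk -F'\t' 'NR > 1 { print $2 "\t" $1; print $4 "\t" $3 }' \
       "$network_file" | sort -u)

   # Edges: one per row, labeled with relation_type; every column after the
   # fifth becomes a name=value property, named after its header.
   edges=$(awk -F'\t' '
       NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }
       {
           props = ""
           for (i = 6; i <= NF; i++)
               props = props (i > 6 ? ", " : "") name[i] "=" $i
           print $1 " -[" $5 "]- " $3 "  {" props "}"
       }' "$network_file")

   printf 'Nodes:\n%s\n\nEdges:\n%s\n' "$nodes" "$edges"

The two rows share the cell ``Mature venous EC``, so they yield three nodes (one cell, two microbes) and two ``CORRELATE_WITH`` edges that each carry ``value`` and ``method`` as properties.
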

**Arguments**

.. code-block:: bash

   -h, --help            show this help message and exit
   --config CONFIG       Configuration file
   --data DATA           Preprocessed data
   --interaction INTERACTION
                         Output files of pipelines
   --db_config DB_CONFIG
                         Database configuration file
   --logfile LOGFILE     Log file
   --reset_db            Reset database before loading the data
   --fast_load           Use LOAD CSV to load data quickly

Option ``fast_load``
--------------------
Option ``--fast_load`` speeds up data loading by using the Cypher ``LOAD CSV`` command.
To use it, edit the ``neo4j.conf`` file in your ``$NEO4J_HOME/conf/`` directory as follows, so that ``LOAD CSV`` can read files located outside Neo4j's ``import`` directory.

.. code-block::

   ### Comment out the following variable
   #server.directories.import=import
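
The same edit can be scripted. The sketch below works on a throwaway file so that it is safe to run as-is; ``conf_file`` is a placeholder variable that you would point at your real ``neo4j.conf``.

.. code-block:: bash

   # Demo on a temporary file; set conf_file to your real neo4j.conf instead.
   conf_file=$(mktemp)
   printf '%s\n' '# sample configuration' 'server.directories.import=import' \
       > "$conf_file"

   # Prefix the active setting with "#". Lines that are already commented out
   # are left alone, and a backup is kept as "$conf_file.bak".
   sed -i.bak 's/^[[:space:]]*server\.directories\.import=/#&/' "$conf_file"

   # Show the result.
   grep -n 'server.directories.import' "$conf_file"

Restart the server afterwards (for example with ``neo4j restart``) so that the changed configuration takes effect.
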