==================
Building the graph
==================

Cell Interactome Toolkit builds a knowledge graph with three commands:

* ``ci_preprocessing``: Pre-processing of multi-omics data
* ``ci_pipeline``: Estimation of cell-microbe-metabolite associations from the omics data
* ``ci_dataload``: Building the knowledge graph in a Neo4j database

.. raw:: html
    
   <br>
   <hr style="border:1px solid gray">


----------------
Example data
----------------

The example data used in the paper ([Maruyama2023]_) can be downloaded from the following URLs:

* Inflammatory Bowel Disease: http://cellinteractome.com/download/IBD.tar.gz
* Oral Squamous Cell Carcinoma: http://cellinteractome.com/download/OSCC.tar.gz

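The archives can be fetched and unpacked from the command line. The following is a minimal sketch using ``curl`` and ``tar``; the function name ``download_example`` and the target directory argument are hypothetical conveniences, not part of the toolkit, and the layout of the extracted files is not assumed here.

.. code-block:: bash

  # Hypothetical helper (not part of Cell Interactome Toolkit): download one of
  # the example archives listed above and unpack it into a directory.
  download_example() {
    local name="$1"        # IBD or OSCC
    local dest="${2:-.}"   # target directory, current directory by default

    # Only the two archives listed on this page are known to exist.
    case "$name" in
      IBD|OSCC) ;;
      *) echo "unknown dataset: $name (expected IBD or OSCC)" >&2; return 1 ;;
    esac

    mkdir -p "$dest" || return 1
    # -f: fail on HTTP errors, -L: follow redirects
    curl -fL -o "$dest/$name.tar.gz" "http://cellinteractome.com/download/$name.tar.gz" || return 1
    tar -xzf "$dest/$name.tar.gz" -C "$dest"
  }

For example, ``download_example IBD data/`` downloads ``IBD.tar.gz`` into ``data/`` and extracts it there.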

.. raw:: html

   <br>
   <hr style="border:1px solid gray">


----------------
Basic usage
----------------

1. Start the Neo4j server.

   .. code-block:: bash

      neo4j start

2. Prepare the dataset in the specified format.
3. Run the three commands to build the Cell Interactome Graph database.

   .. code-block:: bash

      # pre-process the multi-omics data
      ci_preprocessing --config config.yaml --output output/raw/

      # analyze the omics data to infer cell-cell / cell-microbe interactions
      ci_pipeline --config config.yaml --input output/raw/ --output output/pipeline/

      # load the results into the Neo4j database
      ci_dataload --config config.yaml --data output/raw/ --interaction output/pipeline/ --db_config db_config.yaml
  
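The three commands can also be chained in a small script so that a failing step stops the later ones. This is a minimal sketch, assuming the file names used above and a ``neo4j`` executable on ``PATH``; the function name ``build_graph`` is hypothetical, and ``neo4j status`` is used only to check that the server started in step 1 is running.

.. code-block:: bash

  #!/usr/bin/env bash
  # Hypothetical wrapper (not part of the toolkit) that runs the three documented
  # commands in order; with "set -e" a failing step stops the later ones.
  set -euo pipefail

  build_graph() {
    local config="config.yaml" db_config="db_config.yaml"
    local raw="output/raw/" net="output/pipeline/"

    # Fail early if the Neo4j server is not running
    # (only ci_dataload needs it, but the earlier steps may take a while).
    if ! neo4j status >/dev/null; then
      echo "Neo4j is not running; start it with 'neo4j start'" >&2
      return 1
    fi

    ci_preprocessing --config "$config" --output "$raw"
    ci_pipeline --config "$config" --input "$raw" --output "$net"
    ci_dataload --config "$config" --data "$raw" --interaction "$net" --db_config "$db_config"
  }

  # Run all three steps:
  # build_graph

Checking the server first avoids discovering a stopped database only after the slower analysis steps have finished.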

.. raw:: html

   <br>
   <hr style="border:1px solid gray">


----------------
Commands
----------------

``ci_preprocessing``
====================

The ``ci_preprocessing`` step standardizes the input multi-omics data (e.g., it converts gene, microbe, and metabolite names into standardized IDs). Its output files are used by the subsequent steps.

**Arguments**

.. code-block:: bash
  
  -h, --help         show this help message and exit
  --config CONFIG    Configuration file
  --output OUTPUT    Output directory
  --logfile LOGFILE  Log file

++++


``ci_pipeline``
====================

The ``ci_pipeline`` step runs analysis pipelines that estimate cell-cell and cell-microbe interactions. Its output files, called network files, describe these interactions in a standardized tab-separated format, as shown below.

**Arguments**

.. code-block:: bash
  
  -h, --help         show this help message and exit
  --config CONFIG    Configuration file
  --input INPUT      Input directory
  --output OUTPUT    Output directory
  --logfile LOGFILE  Log file
  
**Example output**

.. code-block:: text
  
  entity1	entity_class1	entity2	entity_class2	relation_type	value	method
  Anaerostipes	microbe	Mature venous EC	cell	CORRELATE_WITH	0.25679405532429	pearson
  Blautia	microbe	Mature venous EC	cell	CORRELATE_WITH	0.202618250187864	pearson
  Anaerostipes	microbe	BEST2+ Goblet cell (FER1L6+)	cell	CORRELATE_WITH	-0.20693	pearson
  Escherichia-Shigella	microbe	BEST2+ Goblet cell (FER1L6+)	cell	CORRELATE_WITH	-0.21256	pearson
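
Because network files are plain tab-separated text, they can be inspected with standard tools before loading. The following sketch is an illustration, not part of the toolkit: the function name ``filter_network`` is hypothetical, and it assumes the seven-column header shown above. It checks the header and keeps only edges whose absolute ``value`` is at least a given threshold.

.. code-block:: bash

  # Hypothetical helper (not part of the toolkit): print the header and every
  # edge of a network file whose absolute "value" is at least a threshold.
  filter_network() {
    local file="$1" min_abs="$2"
    awk -F'\t' -v min="$min_abs" '
      BEGIN {
        m = min + 0
        expected = "entity1\tentity_class1\tentity2\tentity_class2\trelation_type\tvalue\tmethod"
      }
      NR == 1 {
        # Refuse files whose header differs from the documented layout
        if ($0 != expected) { print "unexpected header in " FILENAME > "/dev/stderr"; exit 1 }
        print
        next
      }
      {
        v = $6 + 0          # "value" column as a number
        if (v < 0) v = -v   # absolute value
        if (v >= m) print
      }
    ' "$file"
  }

For example, calling ``filter_network`` on a network file with a threshold of ``0.3`` keeps the header and only the edges whose correlation is at least 0.3 in absolute value.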

++++

``ci_dataload``
====================

The ``ci_dataload`` step loads the cell-cell and cell-microbe interactions, whether estimated by ``ci_pipeline`` or taken from public resources, into the Neo4j database.
In the network files, columns ``entity1`` and ``entity2`` become nodes, ``entity_class1`` and ``entity_class2`` give the node classes, ``relation_type`` gives the edge class, and all other columns (e.g., ``value`` and ``method``) become edge properties.
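
To make the column-to-graph mapping concrete, the sketch below turns each row of a network file into a Cypher ``MERGE`` statement. It is an illustration only, not the statements ``ci_dataload`` actually issues: the function name ``network_to_cypher``, the ``name`` node property, and the use of the entity class as a node label are assumptions, and entity names are assumed to contain no double quotes or backslashes.

.. code-block:: bash

  # Hypothetical illustration (not the statements ci_dataload issues): print one
  # Cypher MERGE statement per data row of a network file. Assumes the header
  # shown above and entity names without double quotes or backslashes.
  network_to_cypher() {
    awk -F'\t' '
      NR == 1 { next }   # skip the header line
      {
        # entity1/entity2 -> nodes; entity_class1/entity_class2 -> node labels
        printf "MERGE (a:`%s` {name: \"%s\"}) MERGE (b:`%s` {name: \"%s\"}) ", $2, $1, $4, $3
        # relation_type -> relationship type; value and method -> edge properties
        printf "MERGE (a)-[:`%s` {value: %s, method: \"%s\"}]->(b);\n", $5, $6, $7
      }
    ' "$1"
  }

It is meant only to make the mapping visible; use ``ci_dataload`` to actually load the data.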
 
**Arguments**

.. code-block:: bash
  
  -h, --help            show this help message and exit
  --config CONFIG       Configuration file
  --data DATA           Preprocessed data
  --interaction INTERACTION
                        Output files of pipelines
  --db_config DB_CONFIG
                        Database configuration file
  --logfile LOGFILE     Log file
  --reset_db            Reset database before loading the data
  --fast_load           Use LOAD CSV to load data quickly
  
Option ``fast_load``
--------------------

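The ``--fast_load`` option speeds up data loading by using the Cypher ``LOAD CSV`` command.
By default, the ``server.directories.import`` setting restricts ``LOAD CSV`` to files inside Neo4j's ``import`` directory; commenting it out lifts this restriction so that files elsewhere on the local file system can be read.
Edit the ``neo4j.conf`` file in your ``$NEO4J_HOME/conf/`` directory as follows, then restart Neo4j (e.g., ``neo4j restart``).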

.. code-block:: text
  
  ### Comment out the following variable
  #server.directories.import=import
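
Instead of editing the file by hand, the change can be scripted. The following is a minimal sketch; the function name ``disable_import_dir`` is hypothetical, and it assumes the setting appears uncommented as shown above. Consider backing up ``neo4j.conf`` before running it.

.. code-block:: bash

  # Hypothetical helper: comment out server.directories.import in neo4j.conf.
  # Writes to a temporary file first so it works with both GNU and BSD sed.
  disable_import_dir() {
    local conf="$1"
    local tmp
    tmp="$(mktemp)" || return 1
    # Prefix the active setting (optionally indented) with '#';
    # lines that are already commented out are left unchanged.
    sed 's/^[[:space:]]*server\.directories\.import=/#&/' "$conf" > "$tmp" && cat "$tmp" > "$conf"
    local status=$?
    rm -f "$tmp"
    return "$status"
  }

For example, ``disable_import_dir /path/to/neo4j.conf``. Because already-commented lines are skipped, running it twice is harmless.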