================
Building graph
================

Cell Interactome Toolkit builds a knowledge graph with three commands.

* ``ci_preprocessing``: Pre-processing of multi-omics data
* ``ci_pipeline``: Estimation of cell-microbe-metabolite associations from omics data
* ``ci_dataload``: Build the knowledge graph

----------------
Example data
----------------

Example data used in the paper ([Maruyama2023]_) is downloadable from the following URLs.

* Inflammatory Bowel Disease: http://cellinteractome.com/download/IBD.tar.gz
* Oral Squamous Cell Carcinoma: http://cellinteractome.com/download/OSCC.tar.gz

----------------
Basic usage
----------------

1. Start Neo4j server

   .. code-block:: bash

      neo4j start

2. Prepare dataset in specified format

3. Run three commands to build Cell Interactome Graph database

   .. code-block:: bash

      # preprocessing of multi-omics data
      ci_preprocessing --config config.yaml --output output/raw/

      # analyze the omics data to infer cell-cell / cell-microbe interactions
      ci_pipeline --config config.yaml --input output/raw/ --output output/pipeline/

      # load the results into Neo4j database
      ci_dataload --config config.yaml --data output/raw/ --interaction output/pipeline/ --db_config db_config.yaml
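
The three commands in step 3 can also be chained from a script. The following is a minimal sketch, not part of the toolkit: the helper names ``build_commands`` and ``run_all`` are hypothetical, only the flags shown in the usage above are used, and the ``ci_*`` commands are assumed to be on ``PATH``.

.. code-block:: python

   import subprocess


   def build_commands(config="config.yaml", db_config="db_config.yaml",
                      raw_dir="output/raw/", pipeline_dir="output/pipeline/"):
       """Return the three command lines in the order they must run.

       Hypothetical helper: the output directory of each step is passed
       as the input of the next one.
       """
       return [
           ["ci_preprocessing", "--config", config, "--output", raw_dir],
           ["ci_pipeline", "--config", config,
            "--input", raw_dir, "--output", pipeline_dir],
           ["ci_dataload", "--config", config, "--data", raw_dir,
            "--interaction", pipeline_dir, "--db_config", db_config],
       ]


   def run_all(commands):
       """Run each command in turn and stop at the first failure."""
       for cmd in commands:
           # check=True raises CalledProcessError on a non-zero exit status
           subprocess.run(cmd, check=True)

Calling ``run_all(build_commands())`` would then run the whole workflow. Building the command lists separately from running them makes it easy to inspect or log the exact invocations first.
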

----------------
Commands
----------------

``ci_preprocessing``
====================

The ``ci_preprocessing`` step standardizes the input multi-omics data (e.g., it converts gene, microbe, and metabolite names into standardized IDs). Its output files are used by the subsequent steps.

**Arguments**

.. code-block:: bash

   -h, --help         show this help message and exit
   --config CONFIG    Configuration file
   --output OUTPUT    Output directory
   --logfile LOGFILE  Log file

++++

``ci_pipeline``
====================

The ``ci_pipeline`` step runs analysis pipelines to estimate cell-cell / cell-microbe interactions. Its output files (called network files) describe these interactions in a standardized format; see the example output below.

**Arguments**

.. code-block:: bash

   -h, --help         show this help message and exit
   --config CONFIG    Configuration file
   --input INPUT      Input directory
   --output OUTPUT    Output directory
   --logfile LOGFILE  Log file

**Example output**

.. code-block:: bash

   entity1               entity_class1  entity2                       entity_class2  relation_type   value              method
   Anaerostipes          microbe        Mature venous EC              cell           CORRELATE_WITH  0.25679405532429   pearson
   Blautia               microbe        Mature venous EC              cell           CORRELATE_WITH  0.202618250187864  pearson
   Anaerostipes          microbe        BEST2+ Goblet cell (FER1L6+)  cell           CORRELATE_WITH  -0.20693           pearson
   Escherichia-Shigella  microbe        BEST2+ Goblet cell (FER1L6+)  cell           CORRELATE_WITH  -0.21256           pearson

++++

``ci_dataload``
====================

The ``ci_dataload`` step loads the cell-cell / cell-microbe interactions, either estimated above or taken from public resources, into the Neo4j database. In the network files, the columns ``entity1`` and ``entity2`` are used as nodes, ``entity_class1`` and ``entity_class2`` as node classes, and ``relation_type`` as the edge class. All other columns are stored as edge properties.

**Arguments**

.. code-block:: bash

   -h, --help                 show this help message and exit
   --config CONFIG            Configuration file
   --data DATA                Preprocessed data
   --interaction INTERACTION  Output files of pipelines
   --db_config DB_CONFIG      Database configuration file
   --logfile LOGFILE          Log file
   --reset_db                 Reset database before loading the data
   --fast_load                Use LOAD CSV to load data quickly

Option ``fast_load``
--------------------

The ``--fast_load`` option speeds up data loading by using the ``LOAD CSV`` Cypher command. To use it, modify the ``neo4j.conf`` file in your ``$NEO4J_HOME/conf/`` directory as follows.

.. code-block::

   ### Comment out the following variable
   #server.directories.import=import
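
Returning to the network files that ``ci_dataload`` reads: the column-to-graph mapping described for that command can be illustrated with a short sketch. This is not the toolkit's loader. The function ``rows_to_graph`` is hypothetical, and the file is assumed to be tab-separated with the header shown in the example output of ``ci_pipeline``.

.. code-block:: python

   import csv
   import io

   # (name column, class column) pairs that define the two endpoint nodes
   NODE_COLUMNS = (("entity1", "entity_class1"), ("entity2", "entity_class2"))
   # columns with a fixed role; everything else becomes an edge property
   RESERVED = {"entity1", "entity_class1", "entity2", "entity_class2",
               "relation_type"}


   def rows_to_graph(handle):
       """Split a tab-separated network file into nodes and edges."""
       nodes = set()
       edges = []
       for row in csv.DictReader(handle, delimiter="\t"):
           for name_col, class_col in NODE_COLUMNS:
               nodes.add((row[name_col], row[class_col]))
           props = {k: v for k, v in row.items() if k not in RESERVED}
           edges.append((row["entity1"], row["relation_type"],
                         row["entity2"], props))
       return nodes, edges


   # Two rows taken from the example output (tab-separation is assumed)
   example = io.StringIO(
       "entity1\tentity_class1\tentity2\tentity_class2\trelation_type\tvalue\tmethod\n"
       "Anaerostipes\tmicrobe\tMature venous EC\tcell\tCORRELATE_WITH\t0.25679405532429\tpearson\n"
       "Blautia\tmicrobe\tMature venous EC\tcell\tCORRELATE_WITH\t0.202618250187864\tpearson\n"
   )
   nodes, edges = rows_to_graph(example)

Here ``nodes`` holds three distinct (name, class) pairs because both rows share the cell ``Mature venous EC``, and each edge carries ``value`` and ``method`` as properties.
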