algonote(en)

There's More Than One Way To Do It

Visualize software architecture with a graph database

Use a graph database to advance package-by-feature

Graph databases are powerful for interactive architecture visualization

Software architecture is essentially a graph. If the contents of File A are used in File B, and File B is used in File C, you can draw a dependency graph like A ⇒ B ⇒ C.

You can export static graphs as images with tools like Graphviz, but when considering architecture changes such as file-structure refactoring, being able to manipulate the graph interactively leads to deeper understanding.

Neo4j, one of the major graph databases, supports Cypher, the graph-database equivalent of SQL, and ships with a visualization tool. I feel it is well suited to this kind of task, so we'll explore it here.

For the data, we’ll use the Redmine model graph and model clustering results from the following article to evaluate a package-by-feature architecture:

www.algonote.com

Installing Neo4j

brew install neo4j
brew services start neo4j

python3 -m venv venv
source venv/bin/activate
pip install neo4j

The web GUI becomes available at http://localhost:7474.

Log in with id: neo4j, password: neo4j, and follow the prompt to change the password.

Importing model relationships

Import data into Neo4j via CSV using Python. Neo4j does not require you to define a schema in advance.

Read queries can treat the graph as undirected, but the data is created as a directed graph, so we reverse belongs_to relationships when loading: every edge is stored as has, pointing from the owning model (from_model) to the owned model (to_model).

from_model,to_model,association_type
Doorkeeper::AccessToken,Doorkeeper::Application,belongs_to
Doorkeeper::AccessGrant,Doorkeeper::Application,belongs_to
WorkflowRule,Role,belongs_to
WorkflowRule,Tracker,belongs_to
WorkflowRule,IssueStatus,belongs_to
...
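The direction-flipping rule can be sketched as a small pure function (a hypothetical helper, mirroring what the loader script below does inside parse_csv):

```python
def normalize_edge(from_model: str, to_model: str, association_type: str):
    """Flip belongs_to edges so every stored edge reads as 'has',
    pointing from the owning model to the owned one."""
    if association_type == "belongs_to":
        return (to_model, from_model, "has")
    return (from_model, to_model, "has")

# The belongs_to row from the sample CSV above gets flipped:
print(normalize_edge("WorkflowRule", "Role", "belongs_to"))
# → ('Role', 'WorkflowRule', 'has')
```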


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Load model associations from a CSV into Neo4j.

Nodes: (:Model {name})
Relationships (directed from from_model -> to_model):
  :HAS  (belongs_to rows are reversed; all association types collapse into HAS)
Each relationship keeps the normalized association_type as a property too.

Usage:
  export NEO4J_URI="neo4j://localhost:7687"
  export NEO4J_USER="neo4j"
  export NEO4J_PASSWORD="password"
  python load_associations.py path/to/associations.csv
"""

import os
import sys
import csv
import argparse
from typing import List, Dict
from neo4j import GraphDatabase
from neo4j.exceptions import ServiceUnavailable

REL_MAP = {
    "belongs_to": "BELONGS_TO",
    "has_one": "HAS_ONE",
    "has_many": "HAS_MANY",
    "has_and_belongs_to_many": "HAS_AND_BELONGS_TO_MANY",
}

def parse_csv(path: str) -> List[Dict[str, str]]:
    rows = []
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        reader = csv.DictReader(f)
        for i, row in enumerate(reader, start=2):  # first data row is file line 2
            fm = row["from_model"].strip()
            tm = row["to_model"].strip()
            at = row["association_type"].strip()
            key = at.lower()
            if key not in REL_MAP:
                raise ValueError(f"Unknown association_type at CSV line {i}: {at}")
            if key == "belongs_to":
                rows.append({
                    "from_model": tm,
                    "to_model": fm,
                    "association_type": "has",
                    "rel_type": "HAS",
                })
            else:
                rows.append({
                    "from_model": fm,
                    "to_model": tm,
                    # Collapse every association into a single HAS edge;
                    # keep REL_MAP[key] here instead to preserve the type.
                    "association_type": "has",
                    "rel_type": "HAS",
                })
    return rows

def ensure_constraints(tx):
    # Unique model names
    tx.run("CREATE CONSTRAINT model_name_unique IF NOT EXISTS "
           "FOR (m:Model) REQUIRE m.name IS UNIQUE")

def load_chunk_by_type(tx, rows: List[Dict[str, str]], rel_type: str):
    """
    Insert a chunk of rows having the same relationship type.
    """
    query = f"""
    UNWIND $rows AS row
    MERGE (a:Model {{name: row.from_model}})
    MERGE (b:Model {{name: row.to_model}})
    MERGE (a)-[r:{rel_type}]->(b)
      ON CREATE SET r.association_type = row.association_type
      ON MATCH  SET r.association_type = coalesce(r.association_type, row.association_type)
    """
    tx.run(query, rows=rows)

def chunked(iterable, size):
    buf = []
    for it in iterable:
        buf.append(it)
        if len(buf) >= size:
            yield buf
            buf = []
    if buf:
        yield buf

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("csv_path", help="Path to associations CSV")
    parser.add_argument("--uri", default=os.getenv("NEO4J_URI", "neo4j://localhost:7687"))
    parser.add_argument("--user", default=os.getenv("NEO4J_USER", "neo4j"))
    parser.add_argument("--password", default=os.getenv("NEO4J_PASSWORD", "neo4j"))
    parser.add_argument("--batch", type=int, default=1000, help="Batch size per write")
    args = parser.parse_args()

    rows = parse_csv(args.csv_path)

    # Group rows per relationship type to allow typed relationships in Cypher
    grouped: Dict[str, List[Dict[str, str]]] = {}
    for r in rows:
        grouped.setdefault(r["rel_type"], []).append(r)

    driver = GraphDatabase.driver(args.uri, auth=(args.user, args.password))
    try:
        with driver.session() as session:
            session.execute_write(ensure_constraints)

            total = 0
            for rel_type, rel_rows in grouped.items():
                for ch in chunked(rel_rows, args.batch):
                    session.execute_write(load_chunk_by_type, ch, rel_type)
                    total += len(ch)

        print(f"Done. Inserted/merged {len(grouped)} relationship groups, {total} rows.")
    except ServiceUnavailable as e:
        print("Neo4j service unavailable. Check URI/credentials or that Neo4j is running.", file=sys.stderr)
        raise e
    finally:
        driver.close()

if __name__ == "__main__":
    main()

Run the Python script, then execute a Cypher query in the GUI to draw the graph:

export NEO4J_URI="neo4j://localhost:7687"
export NEO4J_USER="neo4j"
export NEO4J_PASSWORD="password"
python neo4j_import.py model_relations.csv


MATCH (n) RETURN n

You’ll see a dense graph, but isolated parts stand out clearly.

Assuming that grouping models under the same feature is useful, import feature-to-model relations.

feature,model
Feature_1,Doorkeeper::AccessGrant
Feature_1,Doorkeeper::AccessToken
Feature_1,Doorkeeper::Application
Feature_2,IssueStatus
Feature_2,Tracker
...


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Import feature->model pairs from a CSV into Neo4j.

CSV columns:
  feature,model

Schema created/used:
  (:Feature {name: <feature>})
  (:Model   {name: <model>})   # Reuse if exists
  (:Feature)-[:INCLUDES]->(:Model)

Usage:
  export NEO4J_URI="neo4j://localhost:7687"
  export NEO4J_USER="neo4j"
  export NEO4J_PASSWORD="password"
  python import_feature_model.py path/to/feature_model.csv

Options:
  --rel-type INCLUDES         # relation type
  --batch 1000                # batch size
"""

import os
import csv
import argparse
from typing import List, Dict, Tuple, Iterable
from neo4j import GraphDatabase
from neo4j.exceptions import ServiceUnavailable

def parse_csv(csv_path: str) -> List[Dict[str, str]]:
    rows: List[Dict[str, str]] = []
    with open(csv_path, "r", encoding="utf-8-sig", newline="") as f:
        reader = csv.DictReader(f)
        # header: feature,model
        for row in reader:
            feat = (row.get("feature") or "").strip()
            model = (row.get("model") or "").strip()
            if not feat or not model:
                # skip malformed rows
                continue
            rows.append({"feature": feat, "model": model})
    return rows

def dedup(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    seen: set[Tuple[str, str]] = set()
    out: List[Dict[str, str]] = []
    for r in rows:
        key = (r["feature"], r["model"])
        if key in seen:
            continue
        seen.add(key)
        out.append(r)
    return out

def ensure_constraints(tx):
    # Unique constraints (create them if they don't exist)
    tx.run("CREATE CONSTRAINT feature_name_unique IF NOT EXISTS "
           "FOR (f:Feature) REQUIRE f.name IS UNIQUE")
    tx.run("CREATE CONSTRAINT model_name_unique IF NOT EXISTS "
           "FOR (m:Model) REQUIRE m.name IS UNIQUE")

def write_batch(tx, batch_rows: List[Dict[str, str]], rel_type: str):
    query = f"""
    UNWIND $rows AS row
    MERGE (f:Feature {{name: row.feature}})
    MERGE (m:Model   {{name: row.model}})
    MERGE (f)-[r:{rel_type}]->(m)
    """
    tx.run(query, rows=batch_rows)

def chunked(items: List[Dict[str, str]], size: int):
    for i in range(0, len(items), size):
        yield items[i:i+size]

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("csv_path", help="Path to feature_model.csv (feature,model)")
    parser.add_argument("--uri", default=os.getenv("NEO4J_URI", "neo4j://localhost:7687"))
    parser.add_argument("--user", default=os.getenv("NEO4J_USER", "neo4j"))
    parser.add_argument("--password", default=os.getenv("NEO4J_PASSWORD", "neo4j"))
    parser.add_argument("--rel-type", default="INCLUDES",
                        help="Relationship type name (default: INCLUDES)")
    parser.add_argument("--batch", type=int, default=1000, help="Write batch size")
    args = parser.parse_args()

    rows = dedup(parse_csv(args.csv_path))
    if not rows:
        print("No rows to import. Check the CSV content/headers (feature,model).")
        return

    driver = GraphDatabase.driver(args.uri, auth=(args.user, args.password))
    try:
        with driver.session() as session:
            # constraint
            session.execute_write(ensure_constraints)

            total = 0
            for ch in chunked(rows, args.batch):
                session.execute_write(write_batch, ch, args.rel_type)
                total += len(ch)

        print(f"Done. Upserted {total} (Feature)-[:{args.rel_type}]->(Model) pairs.")
    except ServiceUnavailable as e:
        print("Neo4j service unavailable. Check URI/credentials or that Neo4j is running.")
        raise e
    finally:
        driver.close()

if __name__ == "__main__":
    main()

Run the script to import Feature nodes alongside the existing Model nodes:

python import_feature_model.py feature_model.csv

To reduce clutter, delete relationships between Models that belong to the same Feature:

MATCH (m1:Model)-[r:HAS]->(m2:Model)
WHERE EXISTS {
  MATCH (f:Feature)-[:INCLUDES]->(m1)
  MATCH (f)-[:INCLUDES]->(m2)
}
DELETE r;

The graph becomes easier to read.
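The same pruning can be expressed over an in-memory edge list in plain Python (toy data, mirroring the Cypher above):

```python
# feature -> set of member models (illustrative toy data)
features = {
    "Feature_1": {"Doorkeeper::AccessGrant", "Doorkeeper::AccessToken",
                  "Doorkeeper::Application"},
    "Feature_2": {"IssueStatus", "Tracker"},
}
# (from, to) HAS edges
edges = [
    ("Doorkeeper::AccessToken", "Doorkeeper::AccessGrant"),  # intra-feature
    ("Tracker", "Doorkeeper::Application"),                  # cross-feature
]

def same_feature(a: str, b: str) -> bool:
    """True if both models belong to one common feature."""
    return any(a in members and b in members for members in features.values())

# Keep only edges that cross a feature boundary
cross_feature_edges = [(a, b) for a, b in edges if not same_feature(a, b)]
print(cross_feature_edges)
# → [('Tracker', 'Doorkeeper::Application')]
```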

Removing edges concentrated on certain models

By exploring the data you may find, for example, that many edges converge on the Role node. Even though user-related edges were supposed to be filtered earlier, Redmine’s user → role → model pattern remains.
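Hubs like this can also be spotted without the GUI by counting how many edges touch each node (a plain-Python sketch; the edge list here is illustrative):

```python
from collections import Counter

# (from, to) HAS edges (toy data)
edges = [
    ("Role", "WorkflowRule"), ("Role", "Member"), ("Role", "MemberRole"),
    ("Tracker", "WorkflowRule"), ("Project", "Member"),
]

# Count edges touching each node, in either direction
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

print(degree.most_common(3))
```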

Because Role represents a user-oriented concept and adds noise when deciding on package-by-feature, remove edges to Role.

After deleting Role, regroup the models. The result is as follows.

Detected communities (Louvain):
Community 1: Doorkeeper::AccessGrant, Doorkeeper::AccessToken, Doorkeeper::Application
Community 2: Attachment, Container, Issue, IssueRelation, IssueStatus, Journal, JournalDetail, Journalized, Tracker, Version, WorkflowPermission, WorkflowRule, WorkflowTransition
Community 3: Board, Comment, Commented, EnabledModule, Message, News, Principal, Reactable, Reaction, Watchable, Watcher, Wiki, WikiContent, WikiContentVersion, WikiPage, WikiRedirect
Community 4: Change, Changeset, IssueQuery, Project, ProjectAdminQuery, ProjectQuery, Query, Repository, Repository::Bazaar, Repository::Cvs, Repository::Filesystem, Repository::Git, Repository::Mercurial, Repository::Subversion, TimeEntryQuery, UserQuery
Community 5: Import, ImportItem, IssueImport, TimeEntryImport, UserImport
Community 6: EmailAddress, Group, GroupAnonymous, GroupBuiltin, GroupNonMember, IssueCategory, Member, MemberRole
Community 7: CustomField, CustomFieldEnumeration, CustomValue, Customized, Document, DocumentCategory, DocumentCategoryCustomField, DocumentCustomField, Enumeration, GroupCustomField, IssueCustomField, IssuePriority, IssuePriorityCustomField, ProjectCustomField, TimeEntry, TimeEntryActivity, TimeEntryActivityCustomField, TimeEntryCustomField, UserCustomField, VersionCustomField

The visualization looks like this.

The graph is easier to read now that the number of edges is reduced. However, Project is still a central model and makes clustering hard.

Unlike Role, Project is not on the user-side axis; it is on the feature-side axis.

Excluding nodes is not always the right move, but in this experiment I assume that treating Project as a global concept improves overall visibility.

MATCH (m:Model {name:'Project'})-[r:HAS]-(:Model)
DELETE r;

The relationships are now much simpler and easier to understand.

Routing cross-feature requests through Feature APIs

When we divide software by feature, ideally features would have zero inter-feature relationships. In reality, some cross-feature dependencies remain in this relationship model.

Since we cannot eliminate every inter-feature relationship in practice, minimize their impact by exposing them through a Feature API: not an external web API, but an internal interface for feature-to-feature interaction.

Calling another feature's internals directly allows too much freedom; we need a rule to keep it under control.

Instead of calling another feature’s model directly (Model a → Model b), force a hop through its Feature API (Model a → Feature_B_API → Model b). This limits freedom and allows mocking Feature API calls for isolated testing.

Create Feature_API nodes and rewire cross-feature edges through them.

When Model a has_one/has_many Model b and Model b belongs_to Model a, in practice sometimes a calls b and sometimes b calls a. For simplicity, the graph assumes that only the a-calls-b direction exists.

// Create unique constraint
CREATE CONSTRAINT feature_api_name_unique IF NOT EXISTS
FOR (n:Feature_API) REQUIRE n.name IS UNIQUE;

// Create nodes (reuse them if they already exist)
UNWIND ['Feature_2_API','Feature_3_API','Feature_4_API','Feature_6_API','Feature_7_API'] AS nm
MERGE (:Feature_API {name: nm});
UNWIND ['Feature_2','Feature_3','Feature_4','Feature_6','Feature_7'] AS featName
WITH featName, featName + '_API' AS apiName
MATCH (f:Feature {name: featName})
MERGE (api:Feature_API {name: apiName})
WITH f, api
MATCH (f)-[:INCLUDES]->(b:Model)
MATCH (a:Model)-[h:HAS]->(b)
WHERE a <> b
WITH DISTINCT a, api, b, h
MERGE (a)-[:CALL]->(api)
MERGE (api)-[:CALL]->(b)
WITH DISTINCT h
DELETE h;
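For intuition, the same rewiring can be sketched in plain Python over an edge list (`feature_of` is a hypothetical model-to-feature lookup; the data is illustrative):

```python
def reroute(edges, feature_of):
    """Replace each cross-feature edge (a, b) with two hops through
    the callee feature's API node: a -> Feature_X_API -> b."""
    new_edges = set()
    for a, b in edges:
        if feature_of[a] == feature_of[b]:
            new_edges.add((a, b))  # intra-feature calls stay direct
        else:
            api = f"{feature_of[b]}_API"
            new_edges.add((a, api))
            new_edges.add((api, b))
    return new_edges

feature_of = {"Issue": "Feature_2", "TimeEntry": "Feature_7"}
print(sorted(reroute([("TimeEntry", "Issue")], feature_of)))
# → [('Feature_2_API', 'Issue'), ('TimeEntry', 'Feature_2_API')]
```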

The visualization is now much cleaner.

Finally: revisiting the architecture

This completes the visualization.

From here, architects can iterate on the structure—aiming to minimize the number of edges to Feature APIs, which would increase feature independence.

For example, Feature 7 still has too many API calls. Treating CustomValue as a global concept like Project could improve clarity.

Or analyzing Feature 2’s API calls might suggest moving Issue and IssueCategory into the same feature.
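To rank such candidates, counting inbound CALL edges per Feature_API node is enough (a sketch over illustrative data):

```python
from collections import Counter

# (caller, callee) CALL edges into Feature API nodes (toy data)
calls = [
    ("Issue", "Feature_7_API"), ("TimeEntry", "Feature_2_API"),
    ("CustomValue", "Feature_7_API"), ("Journal", "Feature_7_API"),
]

# Count how many callers each Feature API receives
inbound = Counter(callee for _, callee in calls if callee.endswith("_API"))
for api, n in inbound.most_common():
    print(api, n)
```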

Continuously refining these decisions is the essence of architectural work.