Use a graph database to advance package-by-feature
Graph databases are powerful for interactive architecture visualization
Software architecture is essentially a graph. If the contents of File A are used in File B, and File B is used in File C, you can draw a dependency graph like A ⇒ B ⇒ C.
You can export static graphs as images with tools like Graphviz, but when considering architecture changes such as file-structure refactoring, being able to manipulate the graph interactively leads to deeper understanding.
Neo4j, one of the major graph databases, supports Cypher, the SQL equivalent for graph databases. It also ships with a visualization tool, which makes it well suited to this use case, so we'll explore it here.
For the data, we’ll use the Redmine model graph and model clustering results from the following article to evaluate a package-by-feature architecture:
Installing Neo4j
brew install neo4j
brew services start neo4j
source venv/bin/activate
pip install neo4j
A GUI starts at http://localhost:7474.
Log in with id: neo4j, password: neo4j, and follow the prompt to change the password.
Importing model relationships
Import data into Neo4j via CSV using Python. Neo4j does not require you to define a schema in advance.
Read queries can treat the graph as undirected, but data creation is directed. So we reverse belongs_to rows, swapping from_model and to_model, so that a HAS relationship always points from the owning model toward to_model.
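As a minimal sketch of that reversal (plain Python, mirroring the row shape the import script uses):

```python
def normalize(row):
    """Reverse belongs_to rows so the HAS edge always points owner -> owned."""
    if row["association_type"] == "belongs_to":
        return {"from_model": row["to_model"], "to_model": row["from_model"],
                "rel_type": "HAS"}
    return {"from_model": row["from_model"], "to_model": row["to_model"],
            "rel_type": "HAS"}

print(normalize({"from_model": "WorkflowRule", "to_model": "Role",
                 "association_type": "belongs_to"}))
# -> {'from_model': 'Role', 'to_model': 'WorkflowRule', 'rel_type': 'HAS'}
```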
from_model,to_model,association_type
Doorkeeper::AccessToken,Doorkeeper::Application,belongs_to
Doorkeeper::AccessGrant,Doorkeeper::Application,belongs_to
WorkflowRule,Role,belongs_to
WorkflowRule,Tracker,belongs_to
WorkflowRule,IssueStatus,belongs_to
...
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Load model associations from a CSV into Neo4j.

Nodes: (:Model {name})
Relationships (directed from from_model -> to_model):
    :BELONGS_TO | :HAS_ONE | :HAS_MANY | :HAS_AND_BELONGS_TO_MANY
Each relationship keeps the original association_type as a property too.

Usage:
    export NEO4J_URI="neo4j://localhost:7687"
    export NEO4J_USER="neo4j"
    export NEO4J_PASSWORD="password"
    python load_associations.py path/to/associations.csv
"""
import os
import sys
import csv
import argparse
from typing import List, Dict

from neo4j import GraphDatabase
from neo4j.exceptions import ServiceUnavailable

REL_MAP = {
    "belongs_to": "BELONGS_TO",
    "has_one": "HAS_ONE",
    "has_many": "HAS_MANY",
    "has_and_belongs_to_many": "HAS_AND_BELONGS_TO_MANY",
}


def parse_csv(path: str) -> List[Dict[str, str]]:
    rows = []
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        reader = csv.DictReader(f)
        for i, row in enumerate(reader, start=1):
            fm = row["from_model"].strip()
            tm = row["to_model"].strip()
            at = row["association_type"].strip()
            key = at.lower()
            if key not in REL_MAP:
                raise ValueError(f"Unknown association_type at line {i}: {at}")
            if key == "belongs_to":
                # Reverse belongs_to so the edge points from owner to owned
                rows.append({
                    "from_model": tm,
                    "to_model": fm,
                    "association_type": "has",
                    "rel_type": "HAS",
                })
            else:
                rows.append({
                    "from_model": fm,
                    "to_model": tm,
                    # "association_type": at,
                    "association_type": "has",
                    # "rel_type": REL_MAP[key],
                    "rel_type": "HAS",
                })
    return rows


def ensure_constraints(tx):
    # Unique model names
    tx.run("CREATE CONSTRAINT model_name_unique IF NOT EXISTS "
           "FOR (m:Model) REQUIRE m.name IS UNIQUE")


def load_chunk_by_type(tx, rows: List[Dict[str, str]], rel_type: str):
    """Insert a chunk of rows having the same relationship type."""
    query = f"""
    UNWIND $rows AS row
    MERGE (a:Model {{name: row.from_model}})
    MERGE (b:Model {{name: row.to_model}})
    MERGE (a)-[r:{rel_type}]->(b)
      ON CREATE SET r.association_type = row.association_type
      ON MATCH SET r.association_type = coalesce(r.association_type, row.association_type)
    """
    tx.run(query, rows=rows)


def chunked(iterable, size):
    buf = []
    for it in iterable:
        buf.append(it)
        if len(buf) >= size:
            yield buf
            buf = []
    if buf:
        yield buf


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("csv_path", help="Path to associations CSV")
    parser.add_argument("--uri", default=os.getenv("NEO4J_URI", "neo4j://localhost:7687"))
    parser.add_argument("--user", default=os.getenv("NEO4J_USER", "neo4j"))
    parser.add_argument("--password", default=os.getenv("NEO4J_PASSWORD", "neo4j"))
    parser.add_argument("--batch", type=int, default=1000, help="Batch size per write")
    args = parser.parse_args()

    rows = parse_csv(args.csv_path)

    # Group rows per relationship type to allow typed relationships in Cypher
    grouped: Dict[str, List[Dict[str, str]]] = {}
    for r in rows:
        grouped.setdefault(r["rel_type"], []).append(r)

    driver = GraphDatabase.driver(args.uri, auth=(args.user, args.password))
    try:
        with driver.session() as session:
            session.execute_write(ensure_constraints)
            total = 0
            for rel_type, rel_rows in grouped.items():
                for ch in chunked(rel_rows, args.batch):
                    session.execute_write(load_chunk_by_type, ch, rel_type)
                    total += len(ch)
            print(f"Done. Inserted/merged {len(grouped)} relationship groups, {total} rows.")
    except ServiceUnavailable:
        print("Neo4j service unavailable. Check URI/credentials or that Neo4j is running.",
              file=sys.stderr)
        raise
    finally:
        driver.close()


if __name__ == "__main__":
    main()
Run the Python script, then run a Cypher query in the GUI to draw the graph:
export NEO4J_URI="neo4j://localhost:7687"
export NEO4J_USER="neo4j"
export NEO4J_PASSWORD="password"
python neo4j_import.py model_relations.csv
MATCH (n) RETURN n

You’ll see a dense graph, but isolated parts stand out clearly.
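Those isolated parts can also be listed directly. A query along these lines (assuming the `:Model` label created by the import script) finds models with no relationships at all:

```cypher
// Models with no relationships at all
MATCH (m:Model)
WHERE NOT (m)--()
RETURN m.name;
```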

Adding Feature-to-Model links
On the assumption that grouping models under the same feature is useful, we next import feature-to-model relations.
feature,model
Feature_1,Doorkeeper::AccessGrant
Feature_1,Doorkeeper::AccessToken
Feature_1,Doorkeeper::Application
Feature_2,IssueStatus
Feature_2,Tracker
...
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Import feature->model pairs from a CSV into Neo4j.

CSV columns: feature,model

Schema created/used:
    (:Feature {name: <feature>})
    (:Model {name: <model>})   # Reused if it exists
    (:Feature)-[:INCLUDES]->(:Model)

Usage:
    export NEO4J_URI="neo4j://localhost:7687"
    export NEO4J_USER="neo4j"
    export NEO4J_PASSWORD="password"
    python import_feature_model.py path/to/feature_model.csv

Options:
    --rel-type INCLUDES   # relationship type
    --batch 1000          # batch size
"""
import os
import csv
import argparse
from typing import List, Dict, Tuple, Iterable

from neo4j import GraphDatabase
from neo4j.exceptions import ServiceUnavailable


def parse_csv(csv_path: str) -> List[Dict[str, str]]:
    rows: List[Dict[str, str]] = []
    with open(csv_path, "r", encoding="utf-8-sig", newline="") as f:
        reader = csv.DictReader(f)  # header: feature,model
        for i, row in enumerate(reader, start=2):
            feat = (row.get("feature") or "").strip()
            model = (row.get("model") or "").strip()
            if not feat or not model:
                # Skip broken rows
                continue
            rows.append({"feature": feat, "model": model})
    return rows


def dedup(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    seen: set[Tuple[str, str]] = set()
    out: List[Dict[str, str]] = []
    for r in rows:
        key = (r["feature"], r["model"])
        if key in seen:
            continue
        seen.add(key)
        out.append(r)
    return out


def ensure_constraints(tx):
    # Unique constraints (created only if absent)
    tx.run("CREATE CONSTRAINT feature_name_unique IF NOT EXISTS "
           "FOR (f:Feature) REQUIRE f.name IS UNIQUE")
    tx.run("CREATE CONSTRAINT model_name_unique IF NOT EXISTS "
           "FOR (m:Model) REQUIRE m.name IS UNIQUE")


def write_batch(tx, batch_rows: List[Dict[str, str]], rel_type: str):
    query = f"""
    UNWIND $rows AS row
    MERGE (f:Feature {{name: row.feature}})
    MERGE (m:Model {{name: row.model}})
    MERGE (f)-[r:{rel_type}]->(m)
    RETURN count(r) AS created_or_merged
    """
    tx.run(query, rows=batch_rows)


def chunked(items: List[Dict[str, str]], size: int):
    for i in range(0, len(items), size):
        yield items[i:i + size]


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("csv_path", help="Path to feature_model.csv (feature,model)")
    parser.add_argument("--uri", default=os.getenv("NEO4J_URI", "neo4j://localhost:7687"))
    parser.add_argument("--user", default=os.getenv("NEO4J_USER", "neo4j"))
    parser.add_argument("--password", default=os.getenv("NEO4J_PASSWORD", "neo4j"))
    parser.add_argument("--rel-type", default="INCLUDES",
                        help="Relationship type name (default: INCLUDES)")
    parser.add_argument("--batch", type=int, default=1000, help="Write batch size")
    args = parser.parse_args()

    rows = dedup(parse_csv(args.csv_path))
    if not rows:
        print("No rows to import. Check the CSV content/headers (feature,model).")
        return

    driver = GraphDatabase.driver(args.uri, auth=(args.user, args.password))
    try:
        with driver.session() as session:
            session.execute_write(ensure_constraints)
            total = 0
            for ch in chunked(rows, args.batch):
                session.execute_write(write_batch, ch, args.rel_type)
                total += len(ch)
            print(f"Done. Upserted {total} (Feature)-[:{args.rel_type}]->(Model) pairs.")
    except ServiceUnavailable:
        print("Neo4j service unavailable. Check URI/credentials or that Neo4j is running.")
        raise
    finally:
        driver.close()


if __name__ == "__main__":
    main()
Running this script imports Feature nodes alongside the existing Model nodes:
python import_feature_model.py feature_model.csv


To reduce clutter, delete relationships between Models that belong to the same Feature:
MATCH (m1:Model)-[r:HAS]->(m2:Model)
WHERE EXISTS {
MATCH (f:Feature)-[:INCLUDES]->(m1)
MATCH (f)-[:INCLUDES]->(m2)
}
DELETE r;

The graph becomes easier to read.
Removing edges concentrated on certain models

By exploring the data you may find, for example, that many edges converge on the Role node.
Even though user-related edges were supposed to be filtered earlier, Redmine’s user → role → model pattern remains.
Because Role represents a user-oriented concept and adds noise when deciding on package-by-feature, remove edges to Role.
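The exact deletion step isn't shown in the original; one way to drop Role together with all of its edges is `DETACH DELETE` (a sketch, assuming Role was imported as a `:Model` node):

```cypher
// Remove the Role node and every edge attached to it
MATCH (r:Model {name: 'Role'})
DETACH DELETE r;
```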
Delete Role, then regroup the models. The resulting communities are as follows:
Detected communities (Louvain):
Community 1: Doorkeeper::AccessGrant, Doorkeeper::AccessToken, Doorkeeper::Application
Community 2: Attachment, Container, Issue, IssueRelation, IssueStatus, Journal, JournalDetail, Journalized, Tracker, Version, WorkflowPermission, WorkflowRule, WorkflowTransition
Community 3: Board, Comment, Commented, EnabledModule, Message, News, Principal, Reactable, Reaction, Watchable, Watcher, Wiki, WikiContent, WikiContentVersion, WikiPage, WikiRedirect
Community 4: Change, Changeset, IssueQuery, Project, ProjectAdminQuery, ProjectQuery, Query, Repository, Repository::Bazaar, Repository::Cvs, Repository::Filesystem, Repository::Git, Repository::Mercurial, Repository::Subversion, TimeEntryQuery, UserQuery
Community 5: Import, ImportItem, IssueImport, TimeEntryImport, UserImport
Community 6: EmailAddress, Group, GroupAnonymous, GroupBuiltin, GroupNonMember, IssueCategory, Member, MemberRole
Community 7: CustomField, CustomFieldEnumeration, CustomValue, Customized, Document, DocumentCategory, DocumentCategoryCustomField, DocumentCustomField, Enumeration, GroupCustomField, IssueCustomField, IssuePriority, IssuePriorityCustomField, ProjectCustomField, TimeEntry, TimeEntryActivity, TimeEntryActivityCustomField, TimeEntryCustomField, UserCustomField, VersionCustomField
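The clustering step itself isn't shown here. Assuming the Neo4j Graph Data Science (GDS) plugin is installed, a Louvain run over the model graph might look like this (the graph name and projection are illustrative; run the two statements separately in the Browser):

```cypher
// Project the model graph as undirected, then stream Louvain communities
CALL gds.graph.project('models', 'Model', {HAS: {orientation: 'UNDIRECTED'}});

CALL gds.louvain.stream('models')
YIELD nodeId, communityId
RETURN communityId, collect(gds.util.asNode(nodeId).name) AS members
ORDER BY communityId;
```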
The visualization looks like this.

The graph is easier to read now that the number of edges is reduced. However, Project is still a central model and makes clustering hard.

Unlike Role, Project is not on the user-side axis; it is on the feature-side axis.
Excluding a model is not always the right move, but for this experiment I assume that treating Project as a global concept improves overall visibility.
MATCH (m:Model {name:'Project'})-[r:HAS]-(:Model)
DELETE r;

The relationships are now much simpler and easier to understand.
Routing cross-feature requests through Feature APIs
When we divide software by feature, ideally features would have zero inter-feature relationships. But in reality, some cross-feature dependencies remain in this relationship model.
The ideal is the ideal; in practice we cannot delete all inter-feature relationships. To minimize their impact, expose such relationships via a Feature API: not an external web API, but an internal interface for feature-to-feature interaction.
Calling another feature's internals directly allows too much freedom; we need a rule to control it.
Instead of calling another feature’s model directly (Model a → Model b), force a hop through its Feature API (Model a → Feature_B_API → Model b).
This limits freedom and allows mocking Feature API calls for isolated testing.
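As a code-level sketch of that rule (the class and method names are hypothetical, not taken from the Redmine codebase): the calling feature depends only on an interface the other feature exposes, so the hop can be replaced by a test double.

```python
from typing import Protocol
from unittest.mock import Mock


class Feature2API(Protocol):
    """Hypothetical internal API exposed by Feature 2 (issue tracking)."""
    def issue_status_name(self, issue_id: int) -> str: ...


def report_line(api: Feature2API, issue_id: int) -> str:
    # The caller never touches Feature 2's models directly, only the API
    return f"issue {issue_id}: {api.issue_status_name(issue_id)}"


# In tests, the Feature API hop can be mocked for isolated testing:
fake = Mock()
fake.issue_status_name.return_value = "Closed"
print(report_line(fake, 42))  # -> issue 42: Closed
```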
Create Feature_API nodes and rewire cross-feature edges through them.
When Model a has_one/has_many Model b and Model b belongs_to Model a, in reality sometimes a calls b and sometimes b calls a. For simplicity, we describe the graph assuming that only the a-calls-b direction exists.
// Create unique constraint
CREATE CONSTRAINT feature_api_name_unique IF NOT EXISTS
FOR (n:Feature_API) REQUIRE n.name IS UNIQUE;
// Create nodes (reuse them if they already exist)
UNWIND ['Feature_2_API','Feature_3_API','Feature_4_API','Feature_6_API','Feature_7_API'] AS nm
MERGE (:Feature_API {name: nm});
UNWIND ['Feature_2','Feature_3','Feature_4','Feature_6','Feature_7'] AS featName
WITH featName, featName + '_API' AS apiName
MATCH (f:Feature {name: featName})
MERGE (api:Feature_API {name: apiName})
WITH f, api
MATCH (f)-[:INCLUDES]->(b:Model)
MATCH (a:Model)-[h:HAS]->(b)
WHERE a <> b
WITH DISTINCT a, api, b, h
MERGE (a)-[:CALL]->(api)
MERGE (api)-[:CALL]->(b)
WITH DISTINCT h
DELETE h;
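To confirm the rewiring, you can count the direct cross-feature HAS edges that have not yet been routed through a Feature API (a sketch using the labels defined above):

```cypher
// Direct cross-feature HAS edges still bypassing the Feature APIs
MATCH (f1:Feature)-[:INCLUDES]->(a:Model)-[r:HAS]->(b:Model)<-[:INCLUDES]-(f2:Feature)
WHERE f1 <> f2
RETURN count(r) AS direct_cross_feature_edges;
```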

The visualization is now much cleaner.
Finally: revisiting the architecture
This completes the visualization.
From here, architects can iterate on the structure—aiming to minimize the number of edges to Feature APIs, which would increase feature independence.
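One way to quantify that independence, assuming the CALL edges created above, is to count inbound calls per Feature API:

```cypher
// Inbound CALL edges per Feature API, busiest first
MATCH (:Model)-[c:CALL]->(api:Feature_API)
RETURN api.name AS feature_api, count(c) AS inbound_calls
ORDER BY inbound_calls DESC;
```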

For example, Feature 7 still has too many API calls.
Treating CustomValue as a global concept like Project could improve clarity.

Or analyzing Feature 2’s API calls might suggest moving Issue and IssueCategory into the same feature.
Continuously refining these decisions is the essence of architectural work.
