How to measure cohesion in Python
What is Cohesion?
Cohesion is a software metric that measures how closely the methods and variables of a class are related to each other.
For example, in LCOM4, the functions accessing a variable in a class are traced and clustered. If the number of clusters is 1, the class is in a good state. If the number is 2 or more, the class is in a bad state because it should be split up. 0 (no method) is also a bad state.
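The clustering described above can be sketched as counting connected components in a graph of methods. This is only a minimal sketch, not the code of any particular tool: the input mapping `method_uses` and the example class members are hypothetical, and it assumes you have already worked out which variables and sibling methods each method touches.

```python
from collections import defaultdict


def lcom4(method_uses: dict[str, set[str]]) -> int:
    """Count clusters (connected components) of methods.

    method_uses maps each method name to the names it touches: variables
    of the class and other methods of the same class. Two methods are
    linked when they share a name or when one calls the other.
    """
    methods = list(method_uses)

    # Build an undirected graph over the methods.
    neighbours = defaultdict(set)
    for i, a in enumerate(methods):
        for b in methods[i + 1:]:
            shares_name = bool(method_uses[a] & method_uses[b])
            calls = b in method_uses[a] or a in method_uses[b]
            if shares_name or calls:
                neighbours[a].add(b)
                neighbours[b].add(a)

    # Count components with an iterative depth-first search.
    seen = set()
    components = 0
    for start in methods:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(neighbours[current] - seen)
    return components


# Hypothetical classes described only by what each method touches.
cohesive = {"deposit": {"balance"}, "withdraw": {"balance"}, "report": {"deposit"}}
split = {"deposit": {"balance"}, "withdraw": {"balance"}, "render": {"template"}}

print(lcom4(cohesive))  # 1
print(lcom4(split))     # 2
print(lcom4({}))        # 0
```

A result of 2 for `split` corresponds to the "should be split up" case, and 0 corresponds to a class with no methods.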
This time I found a library called cohesion that measures cohesion for Python code, so I will try it out.
In the cohesion library, a method appears to score 100% when it uses every instance variable and class variable of its class.
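To make that definition concrete, here is a rough sketch of the arithmetic. It is not the library's implementation: the function names and variable names are made up for illustration, and it assumes the class total is the plain mean of the per-method percentages, which is what the totals in the outputs shown in this post are consistent with.

```python
def method_cohesion(used: set[str], class_variables: set[str]) -> float:
    """Percentage of the class's variables that one method uses."""
    if not class_variables:
        return 0.0  # assumption: a class with no variables scores 0%
    return 100.0 * len(used & class_variables) / len(class_variables)


def class_cohesion(methods: dict[str, set[str]], class_variables: set[str]) -> float:
    """Mean of the per-method percentages (assumed aggregation)."""
    if not methods:
        return 0.0  # assumption: a class with no methods scores 0%
    scores = [method_cohesion(used, class_variables) for used in methods.values()]
    return sum(scores) / len(scores)


# Hypothetical class with three variables and three methods.
variables = {"a", "b", "c"}
methods = {
    "first": {"a"},        # 1/3
    "second": {"a", "b"},  # 2/3
    "third": set(),        # 0/3
}
print(round(class_cohesion(methods, variables), 2))  # 33.33
```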
Setting up the environment
I want to evaluate this against a real code base, so I will use the code of Zulip, a chat service written in Python.
Looking at the dependencies, it seems to use both Django and Tornado. Complicated.
$ git clone --depth 1 git@github.com:zulip/zulip.git # 1e0339c18bb46a1c502b01a71c2d66471848cf36
Install cohesion
$ python -m pip install cohesion
$ python -m cohesion -h
usage: __main__.py [-h] [-v | -x] (-f FILES [FILES ...] | -d DIRECTORY) [-b BELOW | -a ABOVE]

A tool for measuring Python class cohesion.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         print more verbose output
  -x, --debug           print debugging output
  -f FILES [FILES ...], --files FILES [FILES ...]
                        analyze these Python files
  -d DIRECTORY, --directory DIRECTORY
                        recursively analyze this directory of Python files
  -b BELOW, --below BELOW
                        only show results with this percentage or lower
  -a ABOVE, --above ABOVE
                        only show results with this percentage or higher
Running cohesion on Zulip
Now I will run cohesion against Zulip.
Django is MTV (Model, Template, View), not MVC. The rough mapping is MVC Model = Django Model, MVC View = Django Template, MVC Controller = Django View.
I guessed that most of the logic lives in the models and views, so I will start by applying cohesion to a single file from each.
$ python -m cohesion -f zerver/models/alert_words.py
File: zerver/models/alert_words.py
  Class: AlertWord (14:0)
    Total: 0.0%
  Class: Meta (24:4)
    Total: 0.0%

$ python -m cohesion -f zerver/views/auth.py
File: zerver/views/auth.py
  Class: TwoFactorLoginView (794:0)
    Function: __init__ 1/3 33.33%
    Function: get_context_data 2/3 66.67%
    Function: done 0/3 0.00%
    Total: 33.33%
By this definition, methods with double underscores are also measured. I can still accept __init__, but there are cases where __str__ appears as well. I feel it is noise to include it as a factor when measuring good class design.
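If the double-underscore methods feel like noise, one option is to recompute the total without them. The sketch below is my own helper, not a feature of the cohesion library: it assumes the total is the mean of the per-method percentages and takes the used/total counts copied by hand from the tool's output.

```python
def total_without_dunders(per_method: dict[str, tuple[int, int]]) -> float:
    """Mean per-method percentage, skipping names like __init__ or __str__.

    per_method maps a method name to (variables used, variables in class).
    """
    scores = [
        100.0 * used / total if total else 0.0
        for name, (used, total) in per_method.items()
        if not (name.startswith("__") and name.endswith("__"))
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Counts taken from the RealmFilter result in the directory run in this post.
realm_filter = {"__str__": (3, 4), "clean": (2, 4)}
print(total_without_dunders(realm_filter))  # 50.0
```

With __str__ included the tool reports 62.5% for this class, so dropping it changes the picture noticeably.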
Batch measurement by specifying a directory
Next, use -d to apply cohesion to a directory.
$ python -m cohesion -d zerver/models/
File: zerver/models/custom_profile_fields.py
  Class: CustomProfileField (59:0)
    Function: __str__ 4/20 20.00%
    Function: as_dict 7/20 35.00%
    Function: is_renderable 1/20 5.00%
    Total: 20.0%
  Class: CustomProfileFieldValue (182:0)
    Function: __str__ 3/4 75.00%
    Total: 75.0%
  Class: Meta (188:4)
    Total: 0.0%
File: zerver/models/linkifiers.py
  Class: RealmFilter (48:0)
    Function: __str__ 3/4 75.00%
    Function: clean 2/4 50.00%
    Total: 62.5%
  Class: Meta (60:4)
    Total: 0.0%
File: zerver/models/onboarding_steps.py
  Class: OnboardingStep (8:0)
    Total: 0.0%
  Class: Meta (13:4)
    Total: 0.0%
(rest omitted)

$ python -m cohesion -d zerver/views
File: zerver/views/registration.py
File: zerver/views/user_settings.py
File: zerver/views/digest.py
File: zerver/views/auth.py
  Class: TwoFactorLoginView (794:0)
    Function: __init__ 1/3 33.33%
    Function: get_context_data 2/3 66.67%
    Function: done 0/3 0.00%
    Total: 33.33%
File: zerver/views/custom_profile_fields.py
File: zerver/views/realm_linkifiers.py
File: zerver/views/documentation.py
  Class: DocumentationArticle (35:0)
    Total: 0.0%
  Class: ApiURLView (66:0)
    Function: get_context_data 1/1 100.00%
    Total: 100.0%
  Class: MarkdownDirectoryView (78:0)
    Function: get_path 4/5 80.00%
    Function: get_context_data 4/5 80.00%
    Function: get 2/5 40.00%
    Total: 66.67%
  Class: IntegrationView (323:0)
    Function: get_context_data 1/2 50.00%
    Total: 50.0%
(rest omitted)
I also tried a separate directory called actions, but overall a lot of files contain no classes at all.
Code quality metrics are often based on a Java-derived, class-based philosophy, and they cannot simply be carried over when the architecture does not use many classes.
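To check how class-light a code base is before reaching for a class-based metric, a small script with the standard ast module is enough. This is a sketch under my own assumptions: the helper names and the in-memory example files are hypothetical, and nested classes such as Meta are counted like any other class.

```python
import ast


def count_classes(source: str) -> int:
    """Return how many class definitions (including nested ones) appear in source."""
    return sum(isinstance(node, ast.ClassDef) for node in ast.walk(ast.parse(source)))


def classless_files(sources: dict[str, str]) -> list[str]:
    """Return the names of files that define no class at all."""
    return sorted(name for name, text in sources.items() if count_classes(text) == 0)


# Hypothetical in-memory files standing in for a real directory.
sources = {
    "helpers.py": "def add(a, b):\n    return a + b\n",
    "models.py": "class User:\n    class Meta:\n        pass\n",
}
print(classless_files(sources))  # ['helpers.py']
```

For a real tree, the mapping can be built by walking the directory with pathlib.Path(...).rglob("*.py") and reading each file with read_text().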
Identifying high- and low-scoring results
You can use -b to show only results at or below a given percentage and -a to show only results at or above it.
As mentioned above, files without classes add a lot of noise, so I filter out the File: lines with grep.
$ python -m cohesion -d zerver/views -b 30 | grep -v 'File:'
  Class: DocumentationArticle (35:0)
    Total: 0.0%
  Class: VideoCallSession (39:0)
    Function: __init__ 0/0 0.00%
    Total: 0.0%
  Class: InvalidZoomTokenError (44:0)
    Function: __init__ 0/1 0.00%
    Total: 0.0%
  Class: InvalidMirrorInputError (29:0)
    Total: 0.0%
  Class: SentryTunnelSession (25:0)
    Function: __init__ 0/0 0.00%
    Total: 0.0%

$ python -m cohesion -d zerver/views -a 70 | grep -v 'File:'
  Class: ApiURLView (66:0)
    Function: get_context_data 1/1 100.00%
    Total: 100.0%
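Filtering with grep -v 'File:' removes every file name, so you can no longer tell which file a class belongs to. As an alternative, the sketch below drops only the File: lines that are not followed by any class. It is a hypothetical helper of mine that works on the report text, and it assumes the report layout is the one shown in this post.

```python
def drop_classless_files(report: str) -> str:
    """Remove 'File:' lines that are not followed by any class result."""
    kept = []
    pending_file = None  # a File: line waiting to see whether a class follows
    for line in report.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.lstrip().startswith("File:"):
            pending_file = line  # replaces any earlier File: line with no classes
            continue
        if pending_file is not None:
            kept.append(pending_file)
            pending_file = None
        kept.append(line)
    return "\n".join(kept)


# Shortened, hand-written report in the same layout as the tool's output.
report = """File: a.py
File: b.py
  Class: X (1:0)
    Total: 0.0%
File: c.py"""
print(drop_classless_files(report))
```

Reading the report from sys.stdin instead of a string would let the script sit at the end of the same pipe in place of grep.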
Finally
That's all.
It is useful as a tool to measure cohesion easily, but it does not fit Zulip's design so well.
The library code is also relatively short, so it would be interesting to create your own tool for your own use case.
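As a starting point for such a tool, the standard ast module can already list which attributes of self each method touches. This is a minimal sketch and not how the cohesion library works internally: the function name and the Account example are made up, it assumes the instance is always called self, and it treats method calls such as self.save() as attribute accesses too.

```python
import ast


def self_attributes_by_method(source: str, class_name: str) -> dict[str, set[str]]:
    """Map each method of class_name to the self.<name> attributes it touches."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            result = {}
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    result[item.name] = {
                        sub.attr
                        for sub in ast.walk(item)
                        if isinstance(sub, ast.Attribute)
                        and isinstance(sub.value, ast.Name)
                        and sub.value.id == "self"
                    }
            return result
    raise ValueError(f"class {class_name!r} not found")


# Hypothetical class used only to exercise the helper.
SOURCE = '''
class Account:
    def __init__(self, owner):
        self.owner = owner
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

    def greet(self):
        return "hello"
'''

usage = self_attributes_by_method(SOURCE, "Account")
for name, attrs in usage.items():
    print(name, sorted(attrs))
```

From this mapping you can derive whichever score fits your own use case, for example a per-method percentage or a cluster count.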