Sunday, June 8, 2014

Duplication and Cohesion

A lot of the conversation around duplication boils down to a single instruction:
"remove duplication"
The idea is that there is a scale, and your code should sit on the low-duplication side of it.


Unfortunately, this is only half the story. There is a hidden, often forgotten, label that belongs on this scale: Cohesion, meaning how strongly things need to change together.


The rule you want to follow is this:

Choose the lowest duplication for the natural cohesion

Examples of low cohesion

My last name is 'Falco'. My sister's last name is also 'Falco'. This is duplication, but there is no cohesion between the names. If my sister marries and changes her name, my name should not change as well.

Because of the low cohesion between our two names, they should have high duplication: each of us keeps our own copy.
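Here is a minimal sketch of the name example in Python. The `Person` class and the married name "Smith" are made up for illustration; the point is only that each person holds an independent copy of the value, so changing one leaves the other alone.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Each person owns an independent copy of the last name.
    last_name: str


me = Person(last_name="Falco")
sister = Person(last_name="Falco")  # duplicated on purpose: low cohesion

# "Smith" is a hypothetical married name, used only for illustration.
sister.last_name = "Smith"

print(me.last_name)      # Falco
print(sister.last_name)  # Smith
```

Had both of us read the name from one shared value, the "deduplicated" version would have renamed me as well.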

Many unit test scenarios fall into this area, which is one reason you might end up with higher duplication in your test code.

Examples of high cohesion

An advantage of working in many PHP systems is that no matter what I do to a page, the worst I can do is mess up that single page. PHP systems tend to have high duplication. The downside shows up when something changes, for example how sales tax is calculated on a page. Sales tax has high cohesion: I don't want it to be different for each page. This means that when I need to change it, I have to go to every page that implements it and change each one. The biggest problem is that I will most likely forget one of the pages.
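As a rough sketch of the alternative, the sales tax rule can live in exactly one place that every page calls. I'm using Python here rather than PHP, and the 8% rate, the function names, and the two "pages" are all assumptions invented for the example.

```python
from decimal import Decimal

# Hypothetical rate; the real value depends on your jurisdiction.
SALES_TAX_RATE = Decimal("0.08")


def sales_tax(amount: Decimal) -> Decimal:
    """The single place where the sales tax rule lives."""
    return (amount * SALES_TAX_RATE).quantize(Decimal("0.01"))


def cart_page_total(subtotal: Decimal) -> Decimal:
    # Hypothetical page: it calls the shared rule instead of copying it.
    return subtotal + sales_tax(subtotal)


def invoice_page_total(subtotal: Decimal) -> Decimal:
    # A second hypothetical page using the same shared rule.
    return subtotal + sales_tax(subtotal)


print(cart_page_total(Decimal("100.00")))     # 108.00
print(invoice_page_total(Decimal("100.00")))  # 108.00
```

When the rule changes, only `sales_tax` changes, so there is no page left to forget.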

Bob Martin has a nice blog post on how to separate parts to remove duplication when there is high duplication but low cohesion.

Detecting Cohesion via Source Control

Think about the following example:
You are looking at your source control and discover that in January 5 files all changed in a single check-in. Those same 5 files all changed together in February as well.
March - Those 5 files changed together
April - Those 5 files changed together
May - Only 4 files changed together
June - Those 5 files changed together

What happened in May?
Yes, someone most likely introduced a bug by missing the fifth file.

Think about that from a bug detection point of view. You didn't test, you didn't even know what the expected user behavior was, and you certainly didn't look at the code. You just detected a natural cohesion and noted that a change was missed in one place. This is why you want the minimal duplication for the cohesion that naturally exists. If everything that needs to change together lives in only one place, you cannot forget the other places.
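The detection described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the file names and the commit list are made up to mirror the January to June example, the data is hard-coded instead of being read from real source control history, and the `min_occurrences` threshold is an arbitrary choice.

```python
from collections import Counter


def find_suspect_commits(commits, min_occurrences=3):
    """Flag commits that changed most, but not all, of a group of files
    that habitually change together.

    commits: list of (label, set_of_file_names) pairs.
    """
    # Count how often each exact set of files was changed in one commit.
    counts = Counter(frozenset(files) for _, files in commits)
    habitual = [group for group, n in counts.items() if n >= min_occurrences]

    suspects = []
    for label, files in commits:
        changed = frozenset(files)
        for group in habitual:
            # A strict subset covering more than half of the group suggests
            # that a file which normally changes with the others was missed.
            if changed < group and len(changed) * 2 > len(group):
                suspects.append((label, sorted(group - changed)))
    return suspects


# Hypothetical history mirroring the example in the text.
five_files = {"a.py", "b.py", "c.py", "d.py", "e.py"}
commits = [
    ("January", five_files),
    ("February", five_files),
    ("March", five_files),
    ("April", five_files),
    ("May", five_files - {"e.py"}),
    ("June", five_files),
]

print(find_suspect_commits(commits))  # [('May', ['e.py'])]
```

Notice that the check never runs a test or reads a line of the changed code. It only looks at which files moved together.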


TL;DR: If things that need to change together are duplicated, you have too much duplication. However, if you've removed duplication between similar things that don't change together, you will have problems when you change one of them.
