Part of being a good engineer is finding the right balance.
I know engineers who would gladly duplicate code all over the code base to avoid creating a new abstraction.
I know engineers who create polymorphic abstractions for a single caller with a very obvious set of parameters.
So much of wisdom is in finding balance and not being dogmatic about rules.
Usually, some moron decided to copy paste things a few levels up and then the top half of the system metastasized into two parallel universes of broken garbage.
For instance, one might decide to perform auth later in the flow so unauthorized handlers can run and set a “this requires auth” bit that defaults to false, and the other flow could add a forged auth header before the auth step.
Now, the auth handler needs a “allow forged header” flag and a “already authenticated” flag.
I’ve seen that grow to a half dozen cases until massive production dataloss occurred. A buggy client tried to delete something local to their account without specifying a userid as a parameter (this codebase was garbage!) and deleted the something for all users instead.
I can’t remember how the dataloss was “fixed”, but it definitely wasn’t “all requests go through a simple auth check, and all handlers declare/implement their auth requirements in the same way”.
Getting a design approved to require a user id be specified exactly once for account-level operations was fantasy land for that team. (Most hires with any sort of engineering talent bounced in under a year.)
Anyway the “abstractions are hard so copy paste” approach did provide job security for the lifers on that product. I can’t imagine them holding a job elsewhere, but they were completely immune to layoffs (hostage style).
This is a pretty valid approach if you’re an agent hired to perform industrial sabotage, or if you keep replacing keyboards after you knaw through the corner.
Code duplication and 'wrong' abstractions both count themselves amongst the other foibles of programming. But they don't directly produce a cost which can be cheap or expensive.
They produce some other high dimensional intermediate value which can then produce highly variable cost dependent on the domain, goals, and scenario.
As ever, it depends.
The depends is quantifiable, but it doesn't fit in a blog post. Think more along the lines of war and peace.
Overengineering, abstractions and premature optimisation are the 3 worst plagues of engineering.
At the same time I’m happy they exist because it means we’ll always have a job.
The Wrong Abstraction (2016) - https://news.ycombinator.com/item?id=35927149 - May 2023 (69 comments)
The Wrong Abstraction (2016) - https://news.ycombinator.com/item?id=27095503 - May 2021 (17 comments)
The Wrong Abstraction (2016) - https://news.ycombinator.com/item?id=23739596 - July 2020 (240 comments)
The Wrong Abstraction (2016) - https://news.ycombinator.com/item?id=17578714 - July 2018 (207 comments)
Prefer duplication over the wrong abstraction - https://news.ycombinator.com/item?id=12061453 - July 2016 (96 comments)
The Wrong Abstraction - https://news.ycombinator.com/item?id=11032296 - Feb 2016 (119 comments)
Also I’ve seen the kind of codebase that seems to be LZW packed due to the sheer desire to DRY everything out. Not pleasant thing, by the time you goto 10 layers deep on some “helper” function you forgot why you in there.
Copy and paste once is fine, twice, not so much.
Often I've seen two totally different things exist in one bit of code, no overlap!
Premature generification is bad, and leads the developer to believe that two things are the same, making it harder to see they are not.
Also, can make it much harder to see that a different abstraction would give a cleaner outcome....
There has been growth since but it's been concentrated into fewer channels and somewhat industrialized.
IMO it's easier to inline a bad abstraction than it is to consolidate a bunch of subtly different things that should have been abstracted from the beginning.
But I expect people's opinions on this differ wildly based on their personal experiences. Just my anecdotal take.
Otherwise what is better is better and we don't know what we don't know
I've worked in too many projects, where every new feature needs to be built on top of existing abstractions, that often lead to severe restrictions if something slightly different is required. I always try to create reusable units/components, that can either be used as intended or replaced by something that behaves slightly different if needed.
Components are not necessarily frontend components, this extends also to backend logic.
This step should also be parameterized by how many times the duplication has occurred. Refactoring preemptively may lead to poor abstractions, but not refactoring after seeing the exact same thing tens of times would also be weird. See also:
On the other hand it is pretty difficult and error prone to consolidate duplicated code which have drifted apart over time.
If in doubt, chose the approach which is simplest and least risk to revert if you discover in the future you made the wrong choice.
I do agree a bad abstraction can cause huge problems. But it’s usually not the kind of abstractions introduced to eliminate code duplication, but the kind of top-down “architecture astronaut” abstractions, where a model is chosen which does not fit the complexity of the problem.
But with that in mind, I mostly agree with the article: if it's not a violation of "single source of truth", then abstractions are just a convenience. If it starts being inconvenient, then it's not doing its job and there's no reason to use it. It's a serious code smell if a function needs several flags for custom behavior; that means it's probably the wrong abstraction or violating the single responsibility principle. If there is a legit need for lots of customization, an often-good way to handle is to take a function/functor as an argument for the customization. E.g., rather than `solve(f:double -> double, max_iters = 99, x_abs_tol = 1e-15, x_rel_tol = 1e-15, ...)` you can do `solve(f:double -> double, stopping_criteria: StoppingCriteriaClass)`
Mike's talk argues that code solutions need not be modelled on the real world, and that different data creates different problems, which need different solutions. I can't do the talk justice, but it's had a big impact on me.
Brian's talk is about abstraction generally, and how it's difficult to find the "right" abstraction.
it wasn't received well and senior developer told me that 'good developers know exactly what patterns to use all the time before writing any piece of code and that he will clean up my mess'
long story short his refactoring caused what was otherwise a stable system into a complete mess and it reminded me of Nassim Taleb's book
So code duplication because of abstraction issues is rare. Code duplication because of siloed developers is so much more common.
At the very least it is not once you're working at the wrong kind of scale.
Once you have an awkward number of customers (more than five and less than a hundred), maintaining duplicated code that should have been abstracted and modularised will only seem cheap if you don't mind that you burn through even junior employees at a pace.
And in the LLM era the wrong kind of scale appears in different ways; code generated and duplicated without proper abstraction and then maintained by an LLM that cannot be trusted to do the same modification each time it encounters a pattern or to have enough of an overview to slowly rescue duplicated code through good abstractions.
I would go as far as to say that any abstraction you can maintain (that is in active maintenance, I mean) is better than code duplication once you are past a de minimis threshold.
Refactoring code to reduce the number of lines is _compression_, akin to RLE coding.
Refactoring the code to lift conceptually coherent parts is _abstraction_.
Less compression, more abstraction. Then you're fine.
Having a wrong abstraction means you end up with a class/function/module with a huge amount of configurations through boolean/enum parameters. It's not even clear that all combinations of configurations is even valid. This situation may be simplified by duplicating, and then eliminating code, thus creating more streamlined code for each use case. This may require fixing similar or cross-cutting bugs in multiple places (eg: JSON serialization is stupid, need to hack a workaround), but keeps the business logic changes simple. Maybe a bit more numerous, but the code is able to raise all the scenarios to consider.
Having no abstraction means you may have to change business logic consistently in multiple places, or you have to fix exactly the same misconception (aka a bug) in multiple cases. e.g. tax rate management in a multi-national context. This is also terrible, because you may fix an important problem in one place and forget other places with the same issue. Now you missed 12 potential bugs by fixing one. This can however allow you to discover a true abstraction. Maybe these 12 places should call just one place?
But for code evolving across a team understanding this tension, a bit of duplication while waiting for confirmation that these pieces of code break together and change together is better than just shoving the same 3 if-statements into a function to avoid "line duplication". Concept duplication is more important.
Some of the biggest rabbit holes come from naming conventions not aligning across the business and technology silos. If everyone agrees that Customer has exactly 34 attributes, then it is possible to move to the next step of sharing libraries of types across the team. Getting your POCOs/DTOs 1:1 across the board is when the duplication really starts to melt away.
Generalizing this in the abstract is a wrong abstraction.
Some previous discussions:
2023 https://news.ycombinator.com/item?id=35927149
2021 https://news.ycombinator.com/item?id=27095503
2020 https://news.ycombinator.com/item?id=23739596
Abstraction is a vague term when used here. Is a shared function an “abstraction”? It’s more like implementation hiding, maybe some data hiding. But you definitely have a dependency on it now.
Acronyms like DRY are for beginners. Once you get good you know when to break the “rules” (and when not to).
Duplications can often be cleaned up over time, bad abstractions can quickly become a bottleneck, that severely slow down everyone working on the project.
Do you want to iterate using for loop or using .iter().step(2).map()?
I would rather have consistency than a mixed bag of levels of abstractions.
The Node ecosystem is full of wrong abstractions.
Very true in some sense, but I continue to encourage DRY-bias because I've literally never seen teams duplicate code responsibly and later dedupe it when it's the right time. 95% of the time this sentiment is quoted to justify shipping quick slop and stable reusable bits are never extracted into a shared lib later.
Fundamentally, the article addresses cases where it's not clear yet how many sources of truth there will be. Are the two spots in the code using the same algorithm, or slightly different versions? More importantly, will they change for the same sorts of reasons?
The title adage (correctly, imo) argues that making two different things the same will cause you more pain than making two same things different via duplication. In the latter thing case, the "damage" is just having to make the same changes twice, or doing a refactor to introduce the abstraction. In the former case, you have to keep adding to your abstraction, or undo it. Most crucially, it breaks "locality", which is the only property you really care about when making changes. I just want to make this change and not worry about side effects to unrelated parts of the system.
I've always found it odd when even fairly smart engineers sometimes prioritize real-world metaphors over the actual needs of the codebase. Years ago when I was only a few years out of school, I was implementing a connection pool in Rust, and the most reasonable way to implement it was to have the connection hold a weak reference to the pool so that it could get checked back in automatically when dropped. My manager (an extremely experienced engineer) didn't like this idea because "a library holds library books, not the other way around". I didn't feel like this was a compelling reason to design things differently, but he refused to engage with the issue in any way other than through the lens of that metaphor. Eventually the impasse was solved by one of the other managers in my department suggested that while library books don't contain libraries, they do have the name of the library stamped in the back as a reference to where they should be returned, and I guess my manager found this to be a reasonable extension of the analogy. If I were more experienced, maybe I would have recognized that I could find a way to engage with the analogy like the other manager did without ceding the point, but even today I still feel that it was completely bizarre to insist on that as the canonical way to frame things rather than just considering the ramifications of the abstraction in the code and the experience of using the library based on it.
This isn't really a good example, assuming both can be used to represent the same thing.
The problem with the wrong abstraction is when your abstraction doesn't let you represent something. Then, because of you've already invested so heavily into it, you start contorting the problem to fit your abstraction and it becomes a shit show.
I don’t think it matters, specially for sort sized loop scopes
But beyond that, any stable abstraction is better than duplicated code.
OP is right that code duplication is far cheaper than the wrong abstraction, but the opposite is also true - the right abstraction is far cheaper than code duplication.
Maybe this is an area where AI can help identify duplicate code though to show opportunities for de-duping
Automated re-factoring means you can refactor duplicated code only as long as it is exactly duplicate.
Whereas the whole problem is that when somebody changes 3 out of 10 of the duplicate cases in a simple way that they are no longer exactly duplicate, and then somebody fixes a bug in one of the other 7/10 cases, they can update the bug across the 7 "duplicate" cases but they'll miss the 3 that aren't.
The problem with duplicate code is always when some of the instances get changed/fixed but not all of them. And that when somebody edits one instance, they often aren't even aware of all the other instances.
Abstractions are low-risk, because you know where the code is. If it's the wrong abstraction, you can fix that and know what you're fixing. Whereas with duplicated-yet-modified code, you've now lost the connections between them.
Unsurprisingly, that goes for just about any idea in software development. I worked in one code base that heard small functions are best, so every function was less than three lines long. You don’t gain anything by replacing `lst.get(0)` with `get_first_item_in_list(lst)` (in fact, understanding becomes much more difficult), but breaking down functions into the smallest units that make sense independently within the business domain can be very helpful, both for understanding and testing.
It's of course possible to functional-ize segments of logic, but then the question of state mutation must be brought up. How isolated are these changes from other parts of the code / system state. Can this be run in parallel or is it something that must be serial? What potential race conditions exist?
ideal case: support libraries and then very simple duplicated code that is easy to read and modify. critically the core control flow should remain duplicated, but simplified by the support libraries.
I've learned to tolerate a small amount of duplicate code for this reason. If the duplication remains small, it's not that harmful, and if it starts to grow, then one has a better shot at finding a good abstraction for it.
Code that is coincidentally similar very often diverges in either the short or long term, and DRYing it up aggressively tends to result in functions that have many boolean parameters that each trigger disjoint sets of behavior - which is a bit of a nightmare to maintain due to the high cognitive overhead of remembering how all the interleaved-but-actually-unrelated behaviors should work.
This outcome is low-cohesion code.
It's a useful concept to be aware of - worth clicking through to the actual content of the talk rather than just the headline.
It really depends on the exact type of code we're working with, and what our objectives are.
In my case, I often use object inheritance. It's a damn cheap way to DRY. However, when people hear "inheritance," they often think "polymorphism." There's a really big difference between the two, but popular culture has jammed them into one ball, and it's not worth the agita, to try to explain the difference.
But if you are doing optimization, long stacks can be your enemy, and inheritance tends to have long, windy stacks.
In these cases, the copy/pasta method may well be the best approach.
Like I said, "It Depends."
Starting with abstraction when you are only beginning something rarely works well and leads to code bases littered with interfaces having only one implementation.
Abstracting the code when you have two copies does not always pay off, especially when you end up not needeing more than just two copies anyway.
But once you have three copies, it's indeed time to start generalizing.
For $19.95, you can replace your single single point of failure with multiple single points of failure!
It would be iconoclastic if the common sense basic approach would be to start with abstraction. It's not, the common sense default is to write possibly duplicate behavior until you actually discover several cases to abstract away, until you bevalop a sensible idea of which functionality unites them and which doesn't carry over all of them.
>Once you have an awkward number of customers (more than five and less than a hundred), maintaining duplicated code that should have been abstracted and modularised will only seem cheap if you don't mind that you burn through even junior employees at a pace
Maintaining the wrong abstraction, or, god help, abstractions, would be even worse.
If you haven't figured out a good abstraction at 5-100 customers, God help you.
I agree that LLMs are naturally anti abstraction machines.. I'm often trying to find way to reverse that.
Even better is an old school MPA with progressive enhancement.
Not that I'm immune from choosing the wrong abstraction sometimes. More than once the "other people" was me. We all make mistakes.
Perhaps "recompilation" - rewrite by replaying all prompts in strict code quality context (linters, complexity & dedup checks) would make better abstractions.
The only problem now is that LLMs are non-deterministic.
Basically since moving to a functional approach in typescript I find I do not fight abstractions as I used to when I used classes and inheritance.
Hard disagree. When you've had to chase through a change in untold and actually unknown numbers of duplications of code in different permutations and fix them because they are all on fire simultaneously, you'd disagree too. A bad abstraction would at least have had one fire in one place.
This blend of opinion is very naive. Every single project is a business requirement away from having the wrong abstraction in place.
Half of your abstractions are wrong. The hard part is knowing which half.
This is tautological though, it's like saying “starving is much better than eating the wrong food” (for instance: eating quick lime).
Of course you'll always find a way to do things wrong in a way that is costlier than not doing anything.
I am a bit of an LLM cynic but I am trying to learn it all, and I have to say I have spent most time trying to work out: how do you explain how a brown-field codebase actually works, in such a way that the LLM won't pervert it through misunderstanding.
It does encourage you towards the "conventional" coding standard for any new project, because you want to use a pattern that it will have seen in its training set.
But for example there are differences of opinion in how wordpress plugins (which have a very complex control flow) should be structured. LLMs are incredible at knowing how WP works, actually, but what is difficult is explaining how your methodology for a large plugin is going to work.
It is a battle — but a useful one because it can be used for, er, studying the comparative belief systems of the LLMs.
Of course it's a truism if you just say any abstraction that works is a good abstraction.
That is not what I am saying at all. Bullshit abstractions at least let you control the problem. Duplication doesn't.
But also it's very possible to not realise you needed an abstraction until it catches fire in multiple places.
And quite often it's not you that got the codebase to a hundred customers, is it? Sometimes it is a sequence of fresh-faced young developers who didn't have the authority to say "this duplication is bullshit" and were instead compelled to repeat it.
I think a lot of these discussions happen in nice little blog-post vacuums of progressive thinking, where people can go "mmm, object oriented coding obscures intent and clarity, mmm", blog posts with "an X is a Y", "the unreasonable effectiveness of foobar" etc.
In the real world, every duplication that works sticks for good; there is rarely budget to electively replace code that isn't broken. Until one day it doesn't work. And then… how many times is it actually duplicated? How many of the duplicates diverged? How many of these do we no longer need?
Compiled assembly code is not an input to the next compilation; source code is an input to the LLM's next inference.
Sure, maybe "prompts are the code," but you must realize that code is also the prompt.
The other end of this spectrum is dealing with the architecture astronaut's up-front abstraction. Totally overengineered for solving the initial requirements, but then constantly needing new hacks to make it cope with new requirements as they come up in the normal course of work.
That's why there's a balance in there, it's somewhere between "always duplicate code even when you know a lot about the problem" and "always write abstractions even when you know very little about the problem."
That's true only for "good" abstractions. Bad abstractions will often require you to change code in all the places using it, requiring you to understand how all of them work and what are their requirements, _all at the same time_.
It all depends on the amount of duplication and the complexity of the abstraction. Like you said, no generic advice is possible that clearly separates it into "abstract here" and "duplicatehere".
In your example it sounds like we aren't talking about 2-3 places where duplicate code existed that just needed to be refactored into separate units. It sounds more like a complete disregard for abstraction to move on quickly.
If you see duplicate code and have a good understanding how to solve that then it's totally a good thing. The real problem comes in if you add abstractions without knowing wether they will hold up. And this is where the blogpost comes in. In my opinion 2 duplicates are fine, at 3 you should start thinking or implementing an abstraction if you have a good understanding of the code and usecases.
Wouldn't most large codebases with poor abstractions just have engineers engineer around them with their own solutions? In a large enough codebase you'd have both the bad abstractions and all the not-quite-duplicate implementations ignoring the bad abstraction?
I'm using bad here loosely, it could be buggy, incorrect, incomplete, insufficient and more; while being owned by someone or some team that's a challenge to work with for various reasons (overloaded, under-resourced, overbearing, etc., etc.).
On the contrary: that's precisely what a bad abstraction would not offer.
Instead it would spread its assumptions to different parts of the system, as every caller, sub-service, etc. would have to change shape to fit in that abstraction's box, however unnatural it is (and we know it would be unnatural, because we already said it's a bad abstraction).
Abstraction is not the same as encapsulation.
Abstractions are a form of coupling, and coupling can be good, if the components are truly interdependent, and have a well defined domain. The problem with most abstractions, and I’ve seen this time and time again, is that they become brittle, are over used, and the cost of maintaining them grows exponentially with the size of the code base. With the reason for the cost ballooning being the system has disparate components that look interrelated but are absolutely not. Once you give someone a hammer they tend to assume everything is a nail.
The biggest problem, IMHO, is that abstractions are often used where a pattern would be more effective, easier to maintain, and easier to iterate on. And the primary difference between a pattern and an abstraction really comes down to coupling. Patterns remain decoupled, abstractions are tightly coupled.
And to be clear, I will and do use abstractions, when and where they make sense. But only after clear patterns emerge, and it’s been proven that components are truly coupled.
I will gladly die on the hill, that abstractions are measurably worse than duplication an overwhelming amount of the time. They’re often nothing more than a form of premature optimization.
Exactly. The abstraction purists are not working in the messy, dead line driven real world.
But if I tell it "read these files that use the same conventions" first, there's no misunderstanding, and the agent also picks up the general "tone" of the code. I have very little to tweak if I've defined the problem well.
I agree with you that it’s a truism, but it’s useful advice for people who have a habit of trying too hard to DRY their code. IIRC the author comes from the Ruby world, where DRY was a big thing, and this talk was part of the pendulum swinging back away from this DRY obsession that sometimes just resulted in convoluted code.
Write everything twice quickly becomes write everything 4 times once a new change appears, just as quickly as it becomes write everything 8 times, and so on.
I'm afraid there's no sensible soundbite developers can follow blindly.
So... the wrong abstraction, no matter how bad, is better than code duplication?
Obviously, yes. But it is my experience that this happens more slowly and that API invocations that break when the abstraction is changed are much easier to identify than broader duplicated patterns of code that span many lines and subtly diverge.
And even then those divergences are better because each wrapper around the abstraction is documenting the problem with it. But the abstraction can generally be replaced by one with the same API surface.
(Even if you take into account the fact that any API behaviour ultimately gets relied upon even if undocumented. Which is true.)
To be fair my experience is that of a freelancer and contractor who arrives trying to fix things that have been through many such hands. And I think if these developers had it drummed into their head that any attempt at abstraction would be better than copy and paste, these situations would be more knowable.
When that happens there's a major engineering leadership failure currently in progress, even if engineering leadership isn't aware of it.
But so does duplication, in practice, and it diverges more as it does.
But any abstraction ends up with a signature and a name that can quickly be found in code.
The risk of a long-lived duplication losing its shape and being hard to find is much greater. Especially if the code is going through multiple hands.
I once had to pick up a project — a working, fully functional website. I could see, pretty clearly, the work of several people. All but one of them terrible.
The one was a diligent developer who was fully wrong in their abstraction (in fact significantly) but was consistent in how they used it.
The rest had simply worked around that code, copied and re-copied their own modified duplications and let things lose any shape. The result was error-prone stuff.
Clearly either the budget (or the client's capriciousness — a separate issue and arguably the bigger one) scared away the one guy, who I actually wanted to talk to but could not track down. He possibly had the origin story, and I wanted to know why his particular abstraction, which was at odds with the framework, was there. It was good code in the wrong shape, and it clearly used to do more, and that is interesting.
All the expedient people who had decided to avoid his code and just patch in duplicated pieces around it were the problem. There was no form to their solution at all. And that had clearly happened over some time (because you could see several different code styles)
Having a lot of if/else in your code is definitely a cost. My weakness isn’t so much the libraries and APIs, but the actual binary - once I have a service that does A very well, and I run into needing A’ I mostly just add in a config line “op_mode = A|A’” and have the else/if chains in the server driving code. Moreso for CLIs that I use myself than production services, but I have added tunables for consistency and replication to datastores to allow new use cases and expand my footprint in the data center.
That's a good problem to have. Getting to 4 or 8 or 12, and then pruning it to 1 or maybe 2 or 3 clearly different cases, is better than shoehorning multiple cases into the wrong abstraction, having everything that speaks with them coupled to that and dancing around their assumptions, and then having to untangle that.
Duplicated code is by definition LESS coupled.
> I would go as far as to say that any abstraction you can maintain (that is in active maintenance, I mean) is better than code duplication once you are past a de minimis threshold.
I appear to be in a solid minority thinking this. But I'm OK with it. I'm probably not going to write a blog post.
A story I like is that in the now lost era of handwriting recognition on PDAs, Jef Raskin concluded that the easiest way to solve the problem was to change handwriting so as to meet the algorithm in the middle.
That is, to find a noticeable simplification of handwriting that people could learn quickly and that eliminated hard-to-process quirks.
I feel I am there with the LLM at the moment, trying to work out what the common ground is.
I originally wrote the following for my Chainline Newsletter, but I continue to get tweets about this idea, so I'm re-publishing the article here on my blog. This version has been lightly edited.
I've been thinking about the consequences of the "wrong abstraction." My RailsConf 2014 "all the little things" talk included a section where I asserted:
duplication is far cheaper than the wrong abstraction
And in the summary, I went on to advise:
prefer duplication over the wrong abstraction
This small section of a much bigger talk invoked a surprisingly strong reaction. A few folks suggested that I had lost my mind, but many more expressed sentiments along the lines of:
This, a million times this! "@BonzoESC: "Duplication is far cheaper than the wrong abstraction" @sandimetz @rbonales pic.twitter.com/3qMI0waqWb"
— 41 shades of blue (@pims) March 7, 2014
The strength of the reaction made me realize just how widespread and intractable the "wrong abstraction" problem is. I started asking questions and came to see the following pattern:
Programmer A sees duplication.
Programmer A extracts duplication and gives it a name.
This creates a new abstraction. It could be a new method, or perhaps even a new class.
Programmer A replaces the duplication with the new abstraction.
Ah, the code is perfect. Programmer A trots happily away.
Time passes.
A new requirement appears for which the current abstraction is almost perfect.
Programmer B gets tasked to implement this requirement.
Programmer B feels honor-bound to retain the existing abstraction, but since isn't exactly the same for every case, they alter the code to take a parameter, and then add logic to conditionally do the right thing based on the value of that parameter.
What was once a universal abstraction now behaves differently for different cases.
Another new requirement arrives.
Programmer X.
Another additional parameter.
Another new conditional.
Loop until code becomes incomprehensible.
You appear in the story about here, and your life takes a dramatic turn for the worse.
Existing code exerts a powerful influence. Its very presence argues that it is both correct and necessary. We know that code represents effort expended, and we are very motivated to preserve the value of this effort. And, unfortunately, the sad truth is that the more complicated and incomprehensible the code, i.e. the deeper the investment in creating it, the more we feel pressure to retain it (the "sunk cost fallacy"). It's as if our unconscious tell us "Goodness, that's so confusing, it must have taken ages to get right. Surely it's really, really important. It would be a sin to let all that effort go to waste."
When you appear in this story in step 8 above, this pressure may compel you to proceed forward, that is, to implement the new requirement by changing the existing code. Attempting to do so, however, is brutal. The code no longer represents a single, common abstraction, but has instead become a condition-laden procedure which interleaves a number of vaguely associated ideas. It is hard to understand and easy to break.
If you find yourself in this situation, resist being driven by sunk costs. When dealing with the wrong abstraction, the fastest way forward is back. Do the following:
This removes both the abstraction and the conditionals, and reduces each caller to only the code it needs. When you rewind decisions in this way, it's common to find that although each caller ostensibly invoked a shared abstraction, the code they were running was fairly unique. Once you completely remove the old abstraction you can start anew, re-isolating duplication and re-extracting abstractions.
I've seen problems where folks were trying valiantly to move forward with the wrong abstraction, but having very little success. Adding new features was incredibly hard, and each success further complicated the code, which made adding the next feature even harder. When they altered their point of view from "I must preserve our investment in this code" to "This code made sense for a while, but perhaps we've learned all we can from it," and gave themselves permission to re-think their abstractions in light of current requirements, everything got easier. Once they inlined the code, the path forward became obvious, and adding new features become faster and easier.
The moral of this story? Don't get trapped by the sunk cost fallacy. If you find yourself passing parameters and adding conditional paths through shared code, the abstraction is incorrect. It may have been right to begin with, but that day has passed. Once an abstraction is proved wrong the best strategy is to re-introduce duplication and let it show you what's right. Although it occasionally makes sense to accumulate a few conditionals to gain insight into what's going on, you'll suffer less pain if you abandon the wrong abstraction sooner rather than later.
When the abstraction is wrong, the fastest way forward is back. This is not retreat, it's advance in a better direction. Do it. You'll improve your own life, and the lives of all who follow.
The 2nd Edition of 99 Bottles of OOP has been released!
The 2nd Edition contains 3 new chapters and is about 50% longer than the 1st. Also, because 99 Bottles of OOP is about object-oriented design in general rather than any specific language, this time around we created separate books that are technically identical, but use different programming languages for the examples.
99 Bottles of OOP is currently available in Ruby, JavaScript, and PHP versions, and beer and milk beverages. It's delivered in epub, kepub, mobi and pdf formats. This results in six different books and (3x2x4) 24 possible downloads; all unique, yet still the same. One purchase gives you rights to download any or all.