The Six Stages of Code Grief
Even worse than there being no comments: the code is extensively commented, but its function has drifted from what the comments describe to the point where they are actively misleading.
The good old "signal left when switching to right lane."
I mean sometimes you gotta trick the compiler to get a leg up in runtime.
longest file I have ever maintained contained 50,000 lines of code.
fifty THOUSAND.
forgive me for not weeping for 2000 lines.
my advice, don't fucking touch it. pull as much functionality out of it into other things as you can over time.
there will come a day when you can throw it away. maybe not today, maybe not tomorrow... but some day.
Yeah, been there. The codebase I worked on also had a single method with 10k lines.
The database IDs were strings including the hostname of the machine that wrote to the DB. Since it was a centralized server, all IDs had the same hostname. The ID also included date and time accurate to the millisecond, and the table name itself.
Me: Mom, can we have UUIDs? Mom: We have UUIDs at home UUIDs at home: that shit
You should add the local weather forecast, a random fun fact and the canteen menu of the day to the key to make it more interesting to read.
I was working on a project that had a 100,000-line Oracle PL/SQL procedure that ordered a work order from a subcontractor. It was just one single function, called by a classic ASP + Visual Basic COM component.
Oh Lord, I get Vietnam flashbacks about it.
Jesus, I worked on exactly this kind of project once. The only other dev was also very hostile and protective of his position. He did not want me there in the slightest. It took about 6 months before we cancelled the contract, since this dude was actively harassing me in Teams DMs on the daily and ignored all my concerns regarding maintainability since "he could understand the code" and I was probably just "not experienced enough".
Don't downplay what this does to your mental health. 5 years of workplaces like this and I'm now starting to see a therapist due to exhaustion disorder symptoms in my goddamn 20s. Take care out there!
So infuriating when you have some dickhead making themselves unfireable by intentionally convoluting the codebase and chasing out any other hire. And even worse when management bought into it and think the guy's an actual irreplaceable genius.
Probably even believes it himself. I hate narcissists.
I literally told my boss that I was just going to rebuild the entire pipeline from the ground up when I took over the codebase. The legacy code is a massive pile of patchwork spaghetti that takes days just to track down where things are happening because someone, in their infinite wisdom, decided to just pass a dictionary around and add/remove shit from it so there is no actual way to find where or when anything is done.
FUCK. Triggers me. Just got let go from a place that had this problem and wouldn’t let me make any changes whatsoever. I didn’t even push hard.
I did this once
I was generating a large fake dataset that had to make sense in certain ways. I created a neat thing in C# where you could index a hashmap by the type of model it stored, and it would give you the collection storing that data.
This made obtaining resources for generation trivial
However, it made figuring out the order i needed to generate things an effing nightmare
Of note, a lot of these resource "Pools" depended on other resource Pools, and often times, adding a new Pool dependency to a generator meant more time fiddling with the Pool standup code
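For what it's worth, the ordering headache can sometimes be handed to a topological sort. A minimal sketch in Python rather than C#, assuming the dependencies between pools can be written down as a table; `Customer`, `Order`, `Pools`, and `DEPENDS_ON` are made-up names, not the original project's:

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass
class Customer:
    name: str


@dataclass
class Order:
    customer: Customer
    item: str


class Pools:
    """Collections of generated records, looked up by the record's type."""

    def __init__(self) -> None:
        self._pools: dict[type, list] = {}

    def pool(self, model: type) -> list:
        # Create the pool on first access so callers never see a KeyError.
        return self._pools.setdefault(model, [])


# Hypothetical dependency table: each model maps to the models it needs first.
DEPENDS_ON: dict[type, set[type]] = {
    Customer: set(),
    Order: {Customer},
}


def generation_order() -> list[type]:
    # static_order() yields a model only after everything it depends on.
    return list(TopologicalSorter(DEPENDS_ON).static_order())


pools = Pools()
for model in generation_order():
    if model is Customer:
        pools.pool(Customer).append(Customer("Ada"))
    elif model is Order:
        pools.pool(Order).append(Order(pools.pool(Customer)[0], "widget"))
```

Adding a new pool then means adding one entry to the table instead of re-shuffling the standup code by hand. A circular dependency makes `TopologicalSorter` raise `CycleError`, which is at least a loud failure.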
Side-rant:
I rarely write Python code. One reason for that is the lack of type safety.
Whenever I'm automating something and try to use some 3rd party Python library, it feels like there's a good 50/50 chance that front and center in its API is some method that takes a dict of strings. What the fuck. I feel like there's perhaps also something of a cultural difference between users of scripting languages and those of backend languages.
What you described sounds so much worse though holy shit.
Yeah, the new pipeline is based HEAVILY on object inheritance and method/property calls so there is a paper trail for ALL of it. Also using Abstract Base Classes so future developers are forced to adhere to the architecture. It has to be in Python, but I am also trying to use the type hinting as much as humanly possible to force things into something resembling a typed codebase.
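Not the actual pipeline, obviously, but a minimal sketch of the idea: an abstract base class that every stage has to implement, plus a typed record in place of the anonymous dictionary. `Record`, `Stage`, `NormalizeEmail`, and `run_pipeline` are hypothetical names for illustration:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    """Typed replacement for a dict that gets passed around and mutated."""
    user_id: int
    email: str


class Stage(ABC):
    """Every pipeline stage must implement run(); the ABC enforces it."""

    @abstractmethod
    def run(self, record: Record) -> Record:
        ...


class NormalizeEmail(Stage):
    def run(self, record: Record) -> Record:
        # replace() returns a new Record, so each change has one visible origin.
        return replace(record, email=record.email.strip().lower())


def run_pipeline(stages: list[Stage], record: Record) -> Record:
    for stage in stages:
        record = stage.run(record)
    return record


result = run_pipeline([NormalizeEmail()], Record(1, "  Ada@Example.COM "))
```

Because `Record` is frozen, a stage cannot quietly add or remove fields, and a subclass of `Stage` that forgets `run()` fails at instantiation with a `TypeError` instead of somewhere deep in the pipeline.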
I was part of a project that scoffed at the idea of documenting code. Comments were also few and far between. In retrospect, it really seemed like they wanted to give off that elitist feel, because everything reeked of wanting to keep things under wraps despite everything being done out in the freakin' open.
The language is COBOL.
your paycheck is $5000 because you know COBOL
Pretty sure that knowing COBOL isn't the hard part. It has relatively few language concepts.
This lack of language concepts just makes it difficult to reason about it, so that's what you're getting a paycheck for. Well, and possibly also because it might take months to have a new dev figure out your legacy codebase, so it's cheaper to keep the current dev by paying them competitive prices.
Per day.
What? I make that kind of money by dabbling in Ansible, Python and Kubernetes. $5000 sounds pretty lowball for fairly niche knowledge like COBOL.
I just inherited my first codebase a few months ago. It's like this everywhere and original developer was fired, so what should sometimes be a simple fix turns into a full day of finding what needs to change. Any recommendations on fixing/maintaining code like this or should I just make it the next person's problem?
Figure out what something does, and rename it (with a stupidly verbose name, if you have to). Use the IDE refactor tools to rename all instances of that identifier
For example, a function, property, or class might be invoked using Reflection, via a string literal (or even worse, a constructed string), and renaming it can cause a reflective invocation somewhere else, seemingly at random, to fail
Or function or operator overloading/overriding doing something bizarre
Or two tightly coupled objects that mutate each other, and expect certain unstated invariants to be held (like, foo() can only be called once, or thingyA.len() must equal thingyB.len())
You can use these to more thoroughly compare behavior between the original and a refactor
Separate out those "concerns", into their own object/interface, and pass them into the class / function at invocation (Dependency Injection)
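A rough sketch of what passing a dependency in can look like, in Python. `Mailer`, `RecordingMailer`, and `notify_overdue` are made-up names for illustration, and the 30-day rule is an invented business rule:

```python
from typing import Protocol


class Mailer(Protocol):
    """The one 'concern' this function needs from the outside world."""

    def send(self, to: str, body: str) -> None: ...


class RecordingMailer:
    """Test double: remembers messages instead of sending them."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, to: str, body: str) -> None:
        self.sent.append((to, body))


def notify_overdue(invoice_ages: dict[str, int], mailer: Mailer) -> int:
    """Mail every customer whose invoice is more than 30 days old."""
    count = 0
    for customer, age_days in invoice_ages.items():
        if age_days > 30:
            mailer.send(customer, f"Invoice overdue by {age_days - 30} days")
            count += 1
    return count


mailer = RecordingMailer()
notified = notify_overdue({"ada@example.com": 45, "bob@example.com": 10}, mailer)
```

The function never constructs its own mail client, so a test can hand it the recording version and production code can hand it the real one.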
cs
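// Prefer guard clauses: return early and keep the main path flat.
public Value? Func(string arg) {
    if (string.IsNullOrEmpty(arg)) {
        return null;
    }
    if (this.Bar == null) {
        return null;
    }
    // ...
    return new Value();
}

// instead of nesting the main path inside the conditions:
public Value? FuncNested(string arg) {
    if (!string.IsNullOrEmpty(arg)) {
        if (this.Bar != null) {
            // ...
            return new Value();
        }
    }
    return null;
}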
I use sentences as variable names sometimes, because I necessarily end up with lots of similar-sounding variables or functions.
List_of_foo_dicts = Get_foo_from_bar_api()
Add comments as you go
You're going to want to follow the "campsite rule" everywhere you go, and also sneak in positive refactors into your feature changes (if business is not willing to commit time to improving the maintainability of the codebase).
Read up on good software design principles. I don't know your experience level, but for instance, everyone agrees that appropriate abstraction and encapsulation make code easier and more enjoyable to work with, and will let you run tests on isolated sections of the code without having to do a full end-to-end test suite run.
Having tests that you trust, especially if they execute quickly, will increase your "developer velocity" and let you code fearlessly, knowing that your changes are reasonably safe to deploy. (Bugs and escaped defects will happen, but you just fix them and continue on.)
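As a sketch of comparing behavior between the original and a refactor: run both over the same inputs and list every disagreement. The shipping-cost functions here are invented stand-ins, not code from any real project:

```python
def legacy_shipping_cost(weight_kg, express):
    # Stand-in for the inherited code: works, but hard to read.
    c = 5
    if weight_kg > 10:
        c = c + (weight_kg - 10) * 2
    if express:
        c = c * 2
    return c


def refactored_shipping_cost(weight_kg: int, express: bool) -> int:
    base_fee = 5
    surcharge = max(weight_kg - 10, 0) * 2
    total = base_fee + surcharge
    return total * 2 if express else total


def mismatches() -> list[tuple[int, bool]]:
    """Return every input where the refactor disagrees with the original."""
    cases = [(w, e) for w in range(0, 50) for e in (False, True)]
    return [
        (w, e)
        for (w, e) in cases
        if legacy_shipping_cost(w, e) != refactored_shipping_cost(w, e)
    ]
```

An empty list from `mismatches()` only says the two versions agree on the inputs that were tried, bugs included. That is usually the point when the original behavior is the only specification you have.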
Good luck!
ARGH this triggered a bit of PTSD for me....
"We're going to convert these COBOL applications to C#, and you need to test that the new application works exactly the same, including the same bugs as the old application."
"Ok, where's the specifications and test reports of the old COBOL applications?"
"They were lost to time, we don't know where they are."
"Ok, so how are the developers going to write the C# code?"
"They're going to read the COBOL scripts and recreate them into C#, we advise you do the same."
Cue me spending a month trying to decipher the COBOL gobbledygook into inputs and outputs, and writing testcases based on that. And after that month was up, and I had delivered my testcases, they told me that my services were no longer needed.
Gee, I wonder how all those specifications and test reports became "lost to time"....
Hey! This was my first real job. It was Matlab code written by physicists who had just recently learned programming.
My first thought immediately was of academia also.
I'll get shit on for suggesting it but this is a great use case for AI: comment the code and generate some basic docs. Even if it's wrong it'll give you a sense of where to start looking for flows.
Problem is, you won't know what the AI screwed up until someone breaks everything.
You got 3 letters?! Luck!
I worked at a Japanese company whose engineers were former NTT developers. Copypasta (i.e. not using functions), inefficient algos, single-letter var names, remote code execution from code as root, etc. Good times!
There are no comments in the code
At my last job, I was assigned to a project being run by a straight-out-of-college developer who felt that not only were comments unnecessary, they were actually a "code smell", a sign of professional incompetence on the part of whoever added them. It's an insane philosophy that could only appeal to people who have never had to take over an old codebase.
I kind of get the idea that code should be self-documenting, but at the same time, there's so many crazy business rules that comments are basically a necessity if nothing else other than to explain why in the hell the crazed mess that provides the required functionality for the business rules exists.
Yeah some comments are not useful
python
# returns the value as a string
return str(user.id)
Some comments are
python
# returns the user id as a string because ZenDesk's API throws errors if it gets a number.
# See ticket RA-1037
# See ZenDesk docs: https://etc/
return str(user.id)
That's typically what people who advocate for less/no comments really mean. The code should self-explain "what" it does, but if the "why" isn't obvious (i.e. confusing business logic) nobody argues that you shouldn't comment it. That's how I've worked in every company I've been at (and all developers around me), from 50-person startups to >2k people. It's a really common mentality with Ruby developers
Or, it appeals to people that have had to take over an old codebase where the comments were all lies.
“Code never lies. Comments sometimes do.”
It's funny, the exact same logic applies to method and variable names. There's no compiler that ensures that a method's name accurately describes what the method does or ensures that a variable's name accurately describes what the variable represents. Yet nobody ever says "you shouldn't use descriptive method and variable names because they might be misleading". And this is hardly academic: I can't count the number of times I've run into methods that no longer do what the method name implies they do.
And yet method and variable names are exactly what people mean when they talk about "self-documenting" code.
I don't know that I could have stopped myself from asking whose nephew they are and I'm just a hobbyist
Oh, it's only the files that have over 2k lines of code? Hell, I'll take that over what I'm dealing with now. I've got multiple FUNCTIONS that are over 2k lines. >:(
Yeah, I don't see a big problem with files over 2000 lines in some cases, as long as things remain well written, organized, abstracted.
One piece of garbage that I'll never touch again had most functions this size. One was 50,000 lines! Hundreds of lines of if/else, half of the functions passed the same 60 arguments because he didn't understand classes or even dictionaries, etc etc. And it was used heavily.
Reject merge.
Documentation is part of design.
Do it or die in obscurity.
Something that I'm disproportionately proud of is that my contributions to open source software are a few minor documentation improvements. One of those times, the docs were wrong and it took me ages to figure out how to do the thing I was trying to do. After I solved it, I was annoyed at the documentation being wrong, and fixed it before submitting a pull request.
I've not yet made any code contributions to open source, but there have been a few people on Lemmy who helped me to realise I shouldn't diminish my contribution because good documentation is essential, but often neglected.
The fact that documentation and comments can't "fail" if the underlying code changes is a real problem. I've even worked at places which dictated that comments had to go directly above or even beside (inline) with the code they were explaining, so they would show up in any patches changing the code.
What do you think happened? Yup, people would change code and leave the outdated (and wrong) comment untouched, directly to the right of the code they just changed.
Hell, I was one of those people, so I get how it can happen.
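One partial workaround, assuming Python: the standard library's `doctest` module runs the examples embedded in docstrings, so at least that slice of the documentation can fail. A minimal sketch; `display_name` and `failed_examples` are made-up names:

```python
import doctest


def display_name(first: str, last: str) -> str:
    """Format a name for display.

    >>> display_name("Ada", "Lovelace")
    'Ada Lovelace'
    """
    # The code was later changed to "Last, First"; the docstring was not.
    return f"{last}, {first}"


def failed_examples(func) -> int:
    """Run the examples in func's docstring and count the ones that fail."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the count matters here.
    return runner.run(test, out=lambda text: None).failed


stale_examples = failed_examples(display_name)
```

This only covers claims that can be written as an executable example. A prose comment explaining why the code does something still rots silently.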
Code was written before git was invented.
Tell that to Linus.
Shit needs syntax and documentation.
Anything submitted needs to be reviewed before merge/push. Syntax and Documentation rejections don't result in errors. Get your shit right first. You are trading on someone else's rep with this.
If you want to push your own code do it with a separate pull. If you want it merged that carries responsibility to the person carrying it.
https://en.wikipedia.org/wiki/Git
Initial release 7 April 2005; 20 years ago
Oh oh
That's what agentic AI is for! Your OS will figure out by itself what you are doing and weave together a shambolic rococo digital house of cards that will be not just undocumented but utterly incomprehensible.
It's fine, just get a 5GHz CPU with 48 cores, 1TB of DDR5 HBM super RAM, and maybe a few petabytes of storage (in the cloud in a flatpack Docker that runs on a VM), so that you can finally make that button blue.
Shut up and take my venture capital money! And maybe 2/3 of the whole market cap in stock options! /s
Fucken right, get your agentic AI to get in touch with my agentic AI (with wire transfer deets)
I've been doing this for years at my current job. It has become a masterpiece of refactoring and comments. They weren't even asking the right questions. I'm very proud of myself.
So naturally, I'm about to get fired and have the whole thing redone by AI.
Then re-hired for 3x salary to make it work again, I hope. Or just watch the company/project fail spectacularly
I didn't even know we were hiring ...
Bonus frame:
The 2000 line file is one function
That implements a ******* VM in which all of the byte code runs in, and rest of source is just byte code listings that the linker magically gathers into a working program.
What’s a hunter2 VM?
Oh, so you worked with my ex-coworker.
It implemented a database. Giant branching if/for loop.
I’m dealing with this currently. Thank you all for confirming that I’m not crazy
That time I started a new job and my first task was "fix bash"...and then I discovered a multi megabyte monstrosity called "bash.sh"
omfg that's over 1 MILLION characters 💀 💀 💀
vomit
Then you fix 90% of a problem and get blamed when the remaining 10% doesn't work
Honest question: would an LLM be able to write useful comments in code like this?
It would probably struggle to see the larger picture. I can see it being used to add comments in self-contained functions though without too much difficulty.
100% I use them a lot to ingest and understand shitty code for me. Of course it's not perfect, it's like having a colleague who's not super strong but has infinite patience for bullshit
Honest question: would an LLM be able to write useful comments in code like this?
It can be better than nothing, but not really. The LLM faces the same challenge that any competent coder does: neither was present to learn the human, business and organizational context when the code was first written.
use the LLM to generate regression tests for the large file, then start refactoring it
The only experience I have like this is when I wanted to see how the ARMA Life mod was doing certain things, but it was programmed by like 20 different people in 3 different languages. Most of it was in German and French.
It was easier just to find my own way of doing what I wanted to do.
"Documenting the code base will be your first task for the next month to help show us how well you understand the codebase."
Translation: please help us understand our codebase. We're paralyzed by fear.
Cursor please document this codebase
The next row would be "boss fires you thinking Claude can maintain the codebase."
At least there's a kind of happy ending when we walk past the old boss and don't toss a dollar into his pan-handling hat.
Well, I’m the only maintainer for my project, so ha! (I only have myself to blame.)
That just means my boss will have to do all the work. Ha, what an idiot. Wait… aw. 🙁
"Code IS documentation"
Those are rookie numbers. We got functions with 5000+ lines and 20 levels of indentation directly in the user-interaction event handlers :)
Well, that's how you do it!
And if two widgets need to create the same effect, you just copy the 5000 lines around. That's why copy-and-paste was invented.
(It really shouldn't be necessary... but in case somebody still needs it, here's the /s)
This is the right strategy. Storage space costs nothing these days. Why not just clone and go? That's what I always say.
Why, you can just 'inherit' some code by copying a block, pasting it, then making a few small changes. No thinking, no problem.
Ok, I'm off to make a copy of my code folder for the next release.
every programmer I've seen who says their code is self documenting writes dogshit code
I think we're all just dogshit but think we're better than the next person; it's like driving. I'm a "comment if there's no way to make it readable" kinda guy, and I work with some "comment and don't bother to make it readable because there's comments" people. We all suck. I probably forget to comment on unreadable places sometimes, or overestimate readability; they either don't update comments so they're out of date, or the code is so gibberish that a comment doesn't help.
Ideally I guess you comment AND make it readable AND make sure the comments are up to date, but who do you think we are? Superman? And what's the right level of commenting anyway? Probably depends on who is reading them.
The team lead has spent the last two months writing a permissions library that nobody understands how to use or debug. He wrote it with Cthulhu at his side. Soon not even Cthulhu will understand it.
Those are amateur problems; the real problems start when you are unable to run it or you don't have the source code. Bonus: it's written in an in-house language made by a developer who left the job or died. True story.
Oh God. Story time.
I had an important CICD pipeline that published a dinky little web-thing that was important for customer experience. The first line of the final docker file was from company-node:base. I had all the source code. I had all the docker files. At no point was there ever a container named company-node let alone a tag of base.
The one and only version of this container was on the CICD server.
This was me when I started working with my current full time job.
What a nightmare.
This is the dream. Since the entire codebase is shit, you basically have to rewrite it in its entirety.
Which means you can do it with an actual good design.
And if you mess up on something, you have a working version you can consult.
The link is a proxied image link for some reason.
JavaScript developer in a strongly typed language decoding json into dictionaries with single letter keys.
I can live without documentation and comments, but then you've got to write really well-structured, self-documenting code. Which means long variable names (or better: local constants) that describe exactly what's in them, and function names that describe clearly what the function is for, and readable code that shows what it does.
But perhaps expecting that kind of discipline from people who lack the discipline to write documentation, was not entirely realistic.
Yeah, that was a fun job... at least the database tended to have some descriptive column names. They never lined up with the entity they mapped to, but it was better than nothing.
#include "globals.h"
// please help
A few years ago I had to port a tool from HTBasic (a proprietary BASIC dialect) to Python. The original source only runs in their proprietary IDE. Of course, no comments whatsoever and a lot of GOTO magic and matrix calculations, some of which have no other purpose than to confuse the reader. The variables had only cryptic and meaningless three-letter names. My theory is that they intentionally wrote it in a way that it would be a nightmare to reverse engineer. And they succeeded.
Allow me to introduce a shit ton of jQuery into all the jsp files you got.
With the short variable you probably also get shadowing. That's super fun in a new code base.
Or another favourite of mine: The first time I had to edit a perl script at work someone had used a scalar and a hash with the same name. Took me a while to realize that scalars, arrays, and hashes have separate namespaces, and the two things with seemingly the same name were unrelated.
Fork the repo.
Ask an LLM to rename all the variables and add comments and docstrings. Give it your style guide (assuming you have one).
Ask another LLM to check their work.
Done.
Disclaimer: I'm not a programmer, I'm a network engineer who dabbles in automation and scripting. But it seems to me that grunt work like this is what LLMs are really good for.
Also I only use short variable names inside of loops (for i in iterable...). Is that not how it should be done?
i and j are acceptable in small loops. But it depends a lot on the language used. If you're in C or bash maybe it's fine. But if you're in a higher-level language like C# you usually have built-in functions for iterating over something.
For example you have a list of movies you want to get the rating from, instead of doing
for (var i = 0; i < movies.length; i++) {
    var movie = movies[i]
    ....
}
It's often more readable to do
movies.forEach { movie ->
    var rating = movie.rating
    ....
}
Also if you work with tables it can be very helpful to name your iteration variables as row and column.
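A tiny sketch of the row/column naming in Python, using a made-up table of numbers:

```python
table = [
    [1, 2, 3],
    [4, 5, 6],
]

# Named loop variables make it obvious which axis each loop walks.
column_totals = [0] * len(table[0])
for row in table:
    for column_index, cell in enumerate(row):
        column_totals[column_index] += cell
```

Compare that with the same loops written as `for i in t: for j, x in enumerate(i)`, where you have to stop and work out which one is the row.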
It's all about making it readable, understandable, and correct. There's no point having comments if you forget to update them when you change the code. And you'd better make sure the AI's comments on the 2000 lines of three-letter variables are correct!
Yeah I script more than anything...python, bash, powershell, etc.
Only terrible code I inherit is the stuff I wrote >=3 months ago. I'll keep saying that three months from now, too.
In Go, the recommended convention is for variable name length to be proportional to scope. It is common to use one- or few-letter names for variables that are local to a loop of a few lines or a short function.
Ngl that's like baby levels of nasty code. The real nasty shit is the stuff with pointless abstractions and call chains that make you question your sanity. Stuff that looks like its only purpose was to burn the clock and show off a niche language feature. Or worse than that even is when the project you inherit has decade-old dependencies that have all been forked and patched by the old team
If all I had to worry about was organization and naming I'd be over the moon
Git commits with message saying “pushing changes” and there are over 50 files with unrelated code in it.
"fixed issue"
"stuff, lol"
In the past I had commit messages with change numbers from a system that was no longer in use.
So the commit just said “CH-12345“. It is the kind of annoying, where you can't even really be mad at someone.
Former coworkers: “oh, these two lines are the same in function x and function y. TIME TO ABSTRACT”
Such DRY
The real nasty stuff is not code; it's in proprietary blobs which can only be edited through proprietary software. The documentation is shit (because the vendor also sells training) and there are no communities (because implementation specialists think having secrets is having an edge).
My favorite was an abstract class that called 3 levels in to other classes that then called another implementation of said abstract class.
And people wonder why no one on our team ever got shit done.
And hard casting onto the wrong class because a neat function lives in there (which will detect you did that and treat you a little differently because you don't have all the required data in that class instance) as a "quick fix"
Even if the abstractions aren't pointless, there's a limit to how many levels of abstraction you can make sense of.
I've seen some projects that are very well engineered, with nice code, good comments, well named variables and functions. But, the levels of abstraction and nesting get so deep that you forget why you were digging by the time you get somewhere relevant.
What's frustrating there is that you can't blame someone else. It's just a limit for how much your brain can contain.