Cube farm
=========

Summary
-------

I remember helping this guy with a problem. The guy was an absolute emotional wreck. I decided to be as friendly and calm as I could to get him to relax. I started looking at the problem, and the more I dug into it, the more I realized there were rotten problems all the way down our code stack. And also, this shit was way beyond this guy's skill level.

Then my boss snaps at me for taking too long. And she is MAD. I feel like shit because my boss's approval means a lot to me. It means too much to me. I can't calm down until I get her back on my side.

So I stay at work late putting together patches for the bugs I found. Objectively, I did some excellent work. Then I went into our bug tracker and found that these issues have been going on for months now, maybe forever.

The whole time, I'm berating myself for not writing the patch first.

"Save the herd, cull the defect"

"Needs of the many outweigh the needs of the few or the one"

I feel proud writing my boss an email at like 1:30 AM describing the patches and the situation. I intentionally make it seem like no big deal, but this issue was already reported.

The idea is that I'm showing how desperate I am for my boss's approval. And also a way to show how miserable the work culture is.

Story
-----

"It looks like you're touching the file system in that method.

"See the parameter named buf? There's a stringIO buffer for handling in-memory stuff. That's the fastest buffer.

"Then there's buffers that look just like stringIO. They're meant to be used interchangeably.

I asked the kid if I could drive for a minute. I

"Yep. You're hitting a network mount. In other words, you think you are looking up some data that's already on the literal box that the Python interpreter is running on, right?

"But it's actually network mounted. On your local laptop, the network is never that busy. There's like a dozen of us. In prod, customers are browsing images.
We might have 5000 customers all doing network file access concurrently.

"So the code hangs here, right? Or even worse,

"And there's another issue. Hold up. This isn't even the problem. We're trying to check out a memory leak.

"The problem is that the local box's logs are in a text file on the box, and we're not allowed to download them.

"And the Python interpreter logs are in SUMO. But the SUMO logs said that python interpreter

"OK, so I think the problem is that these Python processes are loading these files into memory as IO buffers so they can add watermarks later.

"Remember there's like 80 Python interpreters running on each box. All the processes share the same box memory, and if too many of them gobble up the memory, the OS has a thing called the low-mem-killer, and it starts whacking stuff.

"I think there's a way to configure that thing.

"But the problem with the network mount calls is also there. That's interesting too.

Then Kathy came by. She was already pissed. She said something about how this needed to be fixed immediately.

"There's nothing worse than fixing a small bug and then creating a much bigger bug. You don't want trashy on your desk."

"There's two bugs.

"The code was written assuming that all image files were under a certain size, and the holiday stuff is much bigger, so when too many people on the same box play the Santa Claus video, the OS kills the process.

"So then necromancer restarts the process, and the problem goes away."

"But that's not the only bug that we saw. We're using the C3 client library and it's supposed to take care of all the network stuff.

"So, there's a solution where the C3 team just changes some authorization rules for my client and I can make the client calls with callbacks instead of blocking. But the C3 team are a bunch of power-hungry do-nothings that took a bad system and, instead of rewriting it, added a new wrapper to it that requires all new client code.
The idea was to be able to switch out the internals of C3 and we would be happy, but they're never gonna finish that.

"They let me do this, then other people are gonna do it too. Just use C2 the old way, and then C3 is another WOMBAT."

Mike called out over the wall, "Waste of money, brains, and time."

But here's the twist: The C3 team said they would fix the issue where some image files were not being cached on the local boxes. And they wouldn't let me! They wouldn't give me access to the thing that lets you look up metadata about the image. I could use that to figure out how to load the image.

----

I remember feeling bewildered. I was watching in real time how the system was getting more complex and more difficult.