Tales from the Machine Room
Every time something is done, you should have a reason, a GOOD, SOUND reason to do it, especially if it involves a lot of potential problems in the long run.
Let's suppose that you want to build a bridge. A bridge that can be used by invaders to invade your town. Well, if the town exists, it does so without the bridge, so having a bridge is not a condition for the existence of the town. You could point out that having a -BETTER- connection with the external world is going to improve the economy of the town and, in so doing, improve the general condition of the whole population. However, you should also balance that "possible" improvement against whatever "guaranteed" problems the construction represents.
Money will be diverted into that project, money that will not be spent on something else. You'll have a large construction crew on site, a crew of people who probably are not from the town, so they will be using resources that may be scarce and/or not readily available. If the crew is from the town, they will either be taken away from other jobs or otherwise not be available for the tasks they are performing right now.
In the end, if the "pros" are still good enough, the project will be carried out, but this balance of pros and cons should be done before the first step of the project is set in stone. Unfortunately, most of the time, things get put in motion because somebody thought it was a good idea at the time... That, incidentally, is the same thing that the guy who stripped naked and jumped onto a cactus said when asked wtf...
And after this weird introduction, we're going to begin.
It is a dark, mid-wintery day in the office, and I have just arrived after an hour spent on a semi-refrigerated train that got delayed twice, so I'm already not really in a good mood when I get in. I turn to the daily tasks and see a mail from DB.
It's for the installation of a new server for $ConfusingPeople, another of our customers. $Confusing has a semi-large system composed of 12 application servers behind a load balancer. Of the 12, 10 are serving web pages; of the last 2, one is used as the entry point for all the editors to enter their changes into the CMS, while the other is a "management" server used for ... various stuff.
Now the request is to install an extra server to be used as an SFTP server where the editors have to upload pictures, movies and other stuff. This server will respond to a 'special' URL and provide all the 'static' content for the system, so it will also be attached to the load balancer.
It doesn't seem too big of a problem. I do point out that ONE server that has to handle all the requests for static content is probably going to be a very busy one, and that since the editors will be able to manually rummage around in it, they will also be able to rummage directly in the LIVE content of the site, which shouldn't be allowed under the current agreement with $Confusing (it's another very long story).
Of course, MarketingMan points out that 1) the customer has already signed the contract for the installation of the system and 2) they want to go into production with it the same day. As usual, planning doesn't belong to this universe.
I return to my desk and start the basic installation of the system. One extra detail makes me turn around and walk back to DB:
Me - Why is this detail here? "Create a second virtual disk and map it to /dev/sdb"?
DB - It's because of the static images.
Me - ...I'm listening...
DB - What do you mean?
Me - That I don't see a reason to have the pictures on a separate disk. A partition, I understand, but why a disk? And why not LVM?
DB - That way if there is a problem with the server we can un-link the disk from the VM and attach it to another VM.
Me - ...we can do the same with the entire volume and then only mount the part that we need, but besides that... What kind of problem could it have? It's a VIRTUAL server. Hosted on a CLUSTER of servers. For this server ALONE to have a problem it would take.... I don't know, something extremely odd. Anything that can potentially damage this virtual server is probably going to damage every other host at the datacenter and even the storage.
DB - The storage is backed up, we can recover the disk.
Me - We can recover the entire volume. Again, why a separate DISK?
The "discussion" went on for a bit but without a real explanation. Anyway, since this was the Decree Of The Powers That Be, I went back to my desk and began the installation.
Before 5pm the system was 'alive'. Except for some small details, like the fact that the new URL needed to be HTTPS and nobody had thought about buying a certificate, or the fact that the DNS wasn't under our control so we couldn't point the URL to the correct machine, or the fact that none of the 'editors' had been informed of the thing and they had no clue about SFTP... As usual.
Time passes, details are fixed, the system goes into full-swing production...
Months later, I am checking logs when I notice a red blinking thing in the systems monitor: the "static" server's partition is 100% full. Of course. 100GB gone. OK, time to either increase the size of the disk or tell $Confusing that it's time to delete old stuff.
Of course $Confusing has even less of a clue than we do about what could or could not be deleted, so we start the process of asking them how big the disk should be.
If you ask me, things are easy: the disk is now 100GB, and it got 100% full in X months. We should assume the same rate of use, so we should increase the size to at least 300GB to ensure the same timeframe before it goes kaboom again.
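For the curious, the back-of-the-envelope estimate can be written down as a small sketch. It assumes the disk keeps filling at a constant rate, which is a simplification, and the value used for X below is purely hypothetical since the real number of months isn't given here. The function name runway_months is made up for this illustration.

```python
def runway_months(old_size_gb: float, new_size_gb: float, months_to_fill_old: float) -> float:
    """Months until the enlarged disk is full again, assuming linear growth.

    The old disk went from empty to full in `months_to_fill_old` months,
    so only the space added on top of it counts as fresh room.
    """
    if old_size_gb <= 0 or months_to_fill_old <= 0:
        raise ValueError("old size and fill time must be positive")
    if new_size_gb < old_size_gb:
        raise ValueError("the new size cannot be smaller than the old one")
    added_gb = new_size_gb - old_size_gb
    # added space divided by the fill rate (old_size_gb / months_to_fill_old)
    return added_gb * months_to_fill_old / old_size_gb


if __name__ == "__main__":
    X = 6.0  # HYPOTHETICAL: months the 100GB disk took to fill up
    for new_size in (150, 200, 300):
        months = runway_months(100, new_size, X)
        print(f"{new_size}GB total -> about {months:.1f} months before it is full again")
```

Under that linear assumption, going to 200GB buys roughly the same X months again, 300GB buys about twice that, and a 50GB bump buys only half of it.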
But logic is also an unknown thing, so MarketingMan proposes an increase in size of 50GB. After several days of discussion and agonizing over the price (50 euro), $Confusing approves, on the condition that the operation can be done with no downtime, a condition that MarketingMan is very happy to guarantee.
Me - We can't extend the disk without downtime.
MM - What?? I've already told $Confusing that we'll do it!
Me - Sucks to be you.
MM - Why can't we do it? We did it for $OtherCustomer!
Me - $Other had LVM, this is not LVM.
MM - And what's the difference?
Me - ...are you asking me to explain to you a deep technical thing?
Since the last time I had to explain something slightly technical to MM (how HTTP requests work) I had to break out the puppet show, and I still don't think he understood, I wasn't inclined to go into technical details.
MM - So what can we do?
Me - What we MUST do to resize that thing is extend the storage, then unmount the partition, which requires the system to be in downtime, then resize the partition, which can be done only by removing and rebuilding it, and there is also a chance that this will actually destroy the partition, so maybe a data restore too. Then we can remount the partition.
MM - And how long should it take?
Me - Don't know, never did it before. That's the advantage of LVM. You can do this kind of thing on the fly with LVM.
MM - ...I'll talk with $Confusing.
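For those who, unlike MM, do want the technical bit: here is a rough sketch of the two paths once the virtual disk itself has been enlarged on the storage side. It only builds and prints the command lists, it doesn't run anything. The /dev/sdb device comes from the install request; the partition number, the mount point /srv/static, the volume group and logical volume names and the ext4 filesystem are all assumptions made for illustration. In the offline path, parted's resizepart stands in for the "remove and rebuild the partition" step.

```python
import shlex


def lvm_grow_plan(pv_device: str, vg: str, lv: str, extra_gb: int) -> list[list[str]]:
    """Commands to grow a logical volume and its filesystem while it stays mounted."""
    return [
        # Let LVM see the extra space on the enlarged physical volume.
        ["pvresize", pv_device],
        # Grow the logical volume and resize the filesystem in the same step.
        ["lvextend", "--resizefs", "--size", f"+{extra_gb}G", f"/dev/{vg}/{lv}"],
    ]


def plain_partition_grow_plan(disk: str, part_number: int, mount_point: str) -> list[list[str]]:
    """Commands for the offline path described above: the data is unavailable throughout."""
    partition = f"{disk}{part_number}"  # assumes sdX-style naming, e.g. /dev/sdb1
    return [
        ["umount", mount_point],  # downtime starts here
        ["parted", "--script", disk, "resizepart", str(part_number), "100%"],
        ["e2fsck", "-f", partition],  # resize2fs wants a freshly checked filesystem
        ["resize2fs", partition],  # grow the filesystem to fill the partition
        ["mount", partition, mount_point],  # downtime ends here
    ]


if __name__ == "__main__":
    # HYPOTHETICAL names: vg_static, lv_static and /srv/static are not from the story.
    print("# with LVM, no unmount:")
    for cmd in lvm_grow_plan("/dev/sdb", "vg_static", "lv_static", 50):
        print(shlex.join(cmd))
    print("# plain partition, with downtime:")
    for cmd in plain_partition_grow_plan("/dev/sdb", 1, "/srv/static"):
        print(shlex.join(cmd))
```

The point of the comparison is simply that the second list starts with an unmount and ends with a mount, and everything in between happens with the content offline.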
Of course, after a lot of yelling, crying, bargaining and howling, $Confusing manages to get from MM the guarantee that the process won't take more than 5 minutes and that it will be done the next day between 5 and 5.15. At this point I said "fuck you" and told him that I wasn't going to do it.
The Quest for the Holy Asshole began! And in the end some unlucky sod was found for the activity. At 8.30 the operation still wasn't finished (resizing a partition is not for the faint of heart).
A month or so later, I was, again, looking at log files, when a familiar red light caught my attention in the systems monitor... Guess what? The fucking "static" system's disk is -AGAIN- 100% full. And guess what? That thing is STILL not on LVM.
At least this time MM already knows that telling the customer "we'll do it in 5 minutes" ain't gonna cut it. Or so I hope.
Of course my suggestion to "scrap that crap: let's add a new disk, LVM it and copy the data from the old one, then we can swap the two in less than a minute" was brushed aside with a "if something happens to the server we can transfer the disk to a different virtual server".... Nobody ever learns anything, eh?
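The "scrap that crap" idea, spelled out as a sketch. Again this only prints the commands instead of running them, and every name in it (the new disk /dev/sdc, the volume group, the logical volume, the mount point and the staging directory) is a hypothetical placeholder, as is the choice of ext4. The bulk copy happens while the old disk is still serving; only the short second list needs the content to be offline.

```python
import shlex


def migration_plan(new_disk: str, vg: str, lv: str, mount_point: str, staging: str):
    """Return (online_steps, downtime_steps) for moving the data onto a new LVM disk."""
    lv_path = f"/dev/{vg}/{lv}"
    src = mount_point.rstrip("/") + "/"  # trailing slash: copy the directory contents
    dst = staging.rstrip("/") + "/"
    online = [
        ["pvcreate", new_disk],
        ["vgcreate", vg, new_disk],
        ["lvcreate", "--name", lv, "--extents", "100%FREE", vg],
        ["mkfs.ext4", lv_path],
        ["mkdir", "-p", staging],
        ["mount", lv_path, staging],
        # Bulk copy while the old disk is still live; this is the slow part.
        ["rsync", "-a", "--delete", src, dst],
    ]
    downtime = [
        # Uploads should be stopped before this point, so the final pass
        # only has to copy whatever changed since the bulk copy.
        ["rsync", "-a", "--delete", src, dst],
        ["umount", staging],
        ["umount", mount_point],
        ["mount", lv_path, mount_point],
        # /etc/fstab still has to be updated by hand to make the swap permanent.
    ]
    return online, downtime


if __name__ == "__main__":
    # HYPOTHETICAL names throughout; none of these appear in the story.
    online, downtime = migration_plan(
        "/dev/sdc", "vg_static", "lv_static", "/srv/static", "/mnt/static_new"
    )
    print("# while the site is up:")
    for cmd in online:
        print(shlex.join(cmd))
    print("# during the short downtime:")
    for cmd in downtime:
        print(shlex.join(cmd))
```

How short the downtime really is depends on how much changed since the bulk copy, so the "less than a minute" is a hope, not something this sketch can promise.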
No, nobody ever learns anything, otherwise we wouldn't have our History
By Boso - posted 25/04/2017 20:23
Did the pathetic excu... I mean, the carefully considered reason why LVM couldn't be used ever come to light?
By Guido - posted 26/04/2017 07:24
"Of course, MarketingMan [...]"
When you've said "marketingman" you've said it all. Their job is to sell, no matter whether what they sell is absurd, unfeasible or illogical...
who uses Debian learns Debian but who uses Slackware learns Linux
By Anonymous coward - posted 26/04/2017 17:36
One of the EXCELLENT things added by $windows since 2008 is hot resizing of partitions, even the system one. I admit it's really handy...
By Steve - posted 27/11/2017 16:24
Buuut... adding a second 150 GB disk to the VM and mounting it, doing a nice rsync and then swapping the mount points (5 minutes of downtime) seemed too safe?
This site is made by me with blood, sweat and gunpowder. If you want to republish or redistribute any part of it, please drop me (or the author of the article, if it's not me) a mail.