Tales from the Machine Room




Wake Me Up, Before You GoGo

Ahhh, the '80s, when we were young and had long hair... or, for some of us, simply had hair. And the best thing about that time was the music.

Alan Parsons, Pink Floyd, the Police, the Doors, Van Halen, Duran Duran... Say what you want, but if you turn on the radio (if you still have one, I mean) you'll hear some song from that time. And that must mean something, I think.

The only thing that wasn't very good at that time was the computers. The problem was that, after the PC explosion, everybody, I mean every-fricking-body, had started assembling and selling PCs. The result was that a lot of garbage was sold and used in places that could have used something a lot better. And the biggest problem with using such garbage instead of decent machines was that it tended to crash at random.

This produced the "save at every change" effect that was typical of anyone working on a PC in that era. You could see it in people's ability to hit the 'save' key without missing a beat while typing, so the end result was something like "make a change - save - make a change - save - make a chan - save - make a - save - make - save - ma - save - save - SAVE - SAVEFORFUCKSSAKESAVE!!!"

And since hard disks were the component most prone to failure, on top of that you had to add "floppy-disk-jockeying" too.

Luckily for us, time passes and technology improves. Computers got better, the number of crashes dropped, and that let us work with less save-happy fingers and go on for more than an hour without triggering a panic-save.

Then... The "cloud" arrived...

And as usual, it arrived promising heaven and earth. Servers and services that are impervious to hardware or software problems, servers that can be upgraded or downgraded at will with just a few mouse clicks, low cost, infinite bandwidth and so on. And, as usual, since everybody is still looking for "something for nothing" and even thinks they can get it, forgetting that "you get what you pay for", everybody jumped on it.

And immediately the first problems started to show. Sure, your services may still be running, but if the gateway is offline they are unreachable anyway, and to a customer an unreachable system looks a lot like a dead one. Sure, you can upgrade a system with a click of the mouse, but good luck downgrading it without having to trash it and rebuild it from scratch. And sure, the initial costs are low, just like those package deals that promise "holidays in $expensiveplace starting from X$": for X$ they show you the flyer; if you want something more, like actually getting to $expensiveplace and being able to stay there for a while, that costs extra.

And then... you discover that the "server" is a fantasy: it exists only as long as you look at it. If you decide to turn it off because you want to save on costs (those costs that looked so low at the beginning), you discover that when you turn it on again, everything in it is gone like a bad dream, and all your data has gone to that empty white room where data goes when nobody wants it anymore.

And it is at this point that the SAVESAVESAVE effect came back with a vengeance.

And after this introduction, let's talk about... nobody special.

We manage a series of "cloud" environments, from the 'home-made' one to stuff hosted on Amazon and Azure. Not a big difference, except that with Amazon and Azure you can pick from several locations and it should be more "reliable". In theory, at least.

What passes for the "infrastructure" department put together an enormous tool, built with Python and Ansible, that is supposed to automate the creation and maintenance of environments on all the "clouds". This thing is built in such a way that, when you create a server (or more than one), all the info ends up in a database from which you should also be able to extract billing data for the customer. I say "should" because apparently that bit was never finished.

Everything is fine as long as you use that tool. If you don't, you're basically guaranteed that something, somewhere, sooner or later, is going to go south. In an unexpected way, with no details on how or why it broke, or on how to fix it without trashing everything and starting from scratch.

Then there is the problem of "persistence".

Yes, because initializing the machines is easy, but once that is done, the machine doesn't do anything until you configure it for what it needs to do. That involves configuring the services, adding stuff like firewalls, gateways and relays, and maybe also putting together special scripts or adjusting others to do... whatever that thing needs to do.

Now, somewhere in the head of the guy who designed all this there is the idea that everybody should always and only use Ansible to do... everything. An idea I think is crackpot. Let's be clear here: being able, in the event of a catastrophe, to rebuild everything simply by running a script is fantastic, BUT... I'm old enough to know that this relies on EVERYTHING being in the script, and that is not going to happen.

Everybody (everybody who works in sysadmin, I mean), during normal operations, first tries to figure out what the problem is and then tries different things, looking for the best approach; this requires some trial and error and multiple changes. So the "change the script - try - change again" approach turns something that might take 5 minutes into several hours of script debugging, and when multiple people are trying to maintain a single script, things are going to spiral down fast.

What is going to happen in the end is that you make the changes, check whether it works, repeat and then... you leave it as it is. If you remember it later (yeah, right) you add it to the script; otherwise... well, that's what backups are for.

And this (the backup, I mean) is ESSENTIAL. Because you can have a script that installs the system, but the USERS' DATA are not in it. The data has to be saved and restored anyway.

So the idea of having "everything in the script" is broken anyway. After the script, you need a restore. So why bother so much with the script?

Moreover, if you're like me, you'll end up with a lot of stuff that is not strictly essential, but is sure as hell very handy to have around. That script that analyzes the logs and generates a report, so handy for the two times a year they ask for it. But if somebody trashes the script, you can bet your ass they are going to ask for it right away. The other script that runs in cron and does... something, and that you've never backed up or copied anywhere. Your personal configuration file for vi with all your macros... All that stuff that is guaranteed not to be in the backup, and that isn't essential and isn't impossible to rebuild, but when it disappears it's a pain in the ass to rebuild, and you keep thinking "what else did I have on this thing?".

So... why am I writing this, you ask? Simple, I say: because we were working on a bunch of servers for a customer, and to apply some upgrades to them, somebody from infrastructure decided that a nice reinstall was the easy way. And he did it without telling anyone. Again, this is not critical: the environment wasn't production, so who cares. But the next day I was puzzled, because some stuff I had there and was testing was gone. I managed to rebuild or recover it, but it took me a good chunk of the day and I'm still not convinced that everything is back where it was. And I'll only find out when I need something and it isn't there.

Now, the aforementioned dude insists that it's my fault because I hadn't "ansibilized" that stuff, and we could argue about this, but my point is that none of that stuff was "production": it wasn't supposed to stay there forever, and once the system goes into production it will probably be disposed of. Still, I'd rather not have spent a full day rebuilding it.

Somebody suggested having a "permanent" partition to save all this stuff in. Nice, I say, but let's go back to the original problem. Why should I drive myself nuts saving stuff that shouldn't be deleted without warning in the first place? Also, if we do that, we'll end up putting everything on the "permanent" partition, which will eventually explode, and you'll still need to back it up. Now, if it's a natural disaster, like a Tyrannosaurus Rex loose in the datacenter, OK, but this is a problem that could be avoided with a simple email to the team announcing the impending datadoom.

Like... Wake me up before you make the FS gogo...

Davide
20/07/2020 10:09


Comments are added when, and more importantly if, I have the time to review them, and after removing spam, crap, phishing and the like. So don't hold your breath. And if your comment doesn't appear, it's probably because it wasn't worth it.

13 messages  this document does not accept new posts

Messer Franz

By Messer Franz posted 31/08/2020 09:17

A T. rex in the datacenter? Dangerous, but if a manager is wandering around it's muuuch worse...

anyway, keeping scripts up to date so you can put everything back in place is a good idea (as you say too), but it's humanly impossible unless you multiply the time it takes to do everything by a hundred (and I gather we agree on that). But the "best" part is that for managers everything is black or white: either you have the automated script, or, if you simply ask for a company wiki to document what to do in certain cases, even just so you can remember what the heck that problem was and how you solved it, they tell you no, it's wasted company time, there's no space on the servers, and so on... and you have to get by on Post-its, because writing things down in a notebook isn't cool, while Post-its are fine, very new millennium... you know, that millennium where data gets lost, a job of minutes takes months, and bureaucracy multiplies like the little creatures in a horror movie, but where we're all super smart...

-- Messer Franz

Marco

By Marco posted 01/09/2020 21:23

Alan Parsons at the top of the list! Great!

-- Marco

Anonymous coward

By Anonymous coward posted 03/09/2020 11:31

I'm eagerly waiting for the cloud fad to collapse... fine for piddly stuff on the consumer side (typically syncing data from smartphones), ready to blow up on the business side...

-- Anonymous coward

Davide Bianchi

@ Anonymous coward By Davide Bianchi posted 03/09/2020 12:50

I'm eagerly waiting for the cloud fad to collapse.

I think you'll have to wait quite a while.

-- Davide Bianchi

Anonymous coward

@ Anonymous coward By Anonymous coward posted 07/09/2020 11:16

 

I'm eagerly waiting for the cloud fad to collapse... fine for piddly stuff on the consumer side (typically syncing data from smartphones), ready to blow up on the business side...

but don't you get tired of being such "Nazis"? The cloud, like many other things, is good and valid, but not always. Fine if the fad passes, but saying it's only good for piddly stuff is a big load of bullsh*t.

 

-- Anonymous coward

Anonymous coward

By Anonymous coward posted 04/09/2020 09:14

I'd put Pink Floyd more in the '70s, though.

-- Anonymous coward

Anonymous coward

By Anonymous coward posted 04/09/2020 11:01

You reminded me of when I wrote my thesis: after every full stop I pressed Shift+F12 to save.

 

-- Anonymous coward

Massimo m.

By Massimo m. posted 05/09/2020 15:30

You're forgetting about when you spent all day going save save save save and thought you were covered, then the program or the PC froze in the middle of saving and you were left with an unusable file.

-- Massimo m.

Davide Bianchi

@ Massimo m. By Davide Bianchi posted 07/09/2020 07:24

You're forgetting about when

Yes, I was trying to forget it...

 

 

-- Davide Bianchi

Theodore

By Theodore posted 08/09/2020 23:00

The solution I see around doesn't solve every persistence problem, but since scripts are code, one idea could be to put the stuff in a git repository hosted somewhere. These days GitHub isn't the only option if you're willing to keep a private repo on a hosted service; if you don't trust that, you can always do a git bundle and drop the package on your HDD, in the cloud, or wherever you like.

So save save save becomes a psychotic save git commit git push, save git commit git push, save git commit git push, all work and no play makes Jack a dull boy, all work and no play makes Jack a dull boy

-- Theodore

Anonymous coward

By Anonymous coward posted 15/09/2020 01:40

proposal: why don't you go into "offer-you-can't-refuse" mode?

I mean: UL zaps your configuration files and then blames you for not having "ansemblized" them? Fine: YOU SET HIS CAR ON FIRE! Then, at the coffee machine, you give him a little talk like "see, it's dangerous to fry my work files, because then, you know... karma... and your car catches fire! See how the universe balances everything out?", making it clear that next time his balls, dried and embalmed, would become your keychain. Walk around with a permanent bulge under your jacket at armpit height: who cares if it's just a wood/aluminum cutout, what matters is that it looks like a gun. In short, build yourself a reputation as a psycho/killer and you'll see how things change at the company: "good morning, sir", "good evening, sir", "my regards to your wife", "don't worry about that project, take all the time you need", "oh, I would never let you pay for the coffee, here, take my coffee key", etc. etc. etc.

-- Anonymous coward

Davide Bianchi

@ Anonymous coward By Davide Bianchi posted 15/09/2020 07:45

proposal: why don't you go into "offer-you-can't-refuse" mode?

Do you set fire to the bus driver when you find out they no longer accept cash, only card payments (and they announced it a month ago, and there are signs at every stop and inside the bus, and they even advertise it on TV, but you didn't notice)?

-- Davide Bianchi

Anonymous coward

@ Davide Bianchi By Anonymous coward posted 13/10/2020 19:01

 

proposal: why don't you go into "offer-you-can't-refuse" mode?

Do you set fire to the bus driver when you find out they no longer accept cash, only card payments (and they announced it a month ago, and there are signs at every stop and inside the bus, and they even advertise it on TV, but you didn't notice)?

----

As for the card-payment example, no, I don't set fire to the driver.

But reading the story, I understood that it was the guy's own choice that everything had to be "ansemblized": if I misunderstood, I take back what I said. But the offer-you-can't-refuse method still holds regardless: buck-passing ULs, asshole SUSLs and the like absolutely deserve it. Otherwise you'll keep saying until retirement "my mom was right to call me an idiot". You know the saying? "l'era bun, l'era bun, l'era anca un po' un cujun": he was good, he was good... he was also a bit of an idiot. So, every now and then, you HAVE to be mean, in my opinion.

-- Anonymous coward



This site is made by me with blood, sweat and gunpowder. If you want to republish or redistribute any part of it, please drop me (or the author of the article, if it's not me) an email.


This site was composed with VIM; now it is composed with VIM and the (in)famous CMS FdT.

This site isn't optimized for viewing with any specific browser, nor does it require special fonts or resolutions.
You're free to view it as you wish.
