Tales from the Machine Room
DevOps is the word of the day. Not actually a word, mind you, but a combination of two.
If you have been on an extended vacation for a few years, then you probably don't even know what the heck 'DevOps' is. In that case, you're lucky. DevOps stands for 'Developer/Operations' and is the "new way" of doing all the sys-adminning things, born from the spread of virtual environments and the incredible number of developers turned sysadmin.
With virtual environments it's very easy to create a plethora of servers to cover all your performance needs. Your application is slow? Don't waste precious time and resources debugging it to figure out what doesn't work; just toss a dozen or so application servers into the mix and voilà: the system is now fast-ish. It will probably cost you a lot more and potentially blow up faster, because everything else is still the same as before, but at least nobody can say it's slow anymore. Unless it still is. Because the problem wasn't the number of application servers but the way they work.
Anyway, having to "toss in" a dozen or so servers all at once creates more problems on the sysadmin side of things, because somebody must configure those servers, make sure that whatever needs to be installed is installed, that they hang off the correct subnet or VLAN (or both), and so on.
And what better way to configure a bunch of servers that are supposed to be identical twins (except for the obvious differences in names and IPs) than a script? Yes, exactly: it's reinventing hot water, but with a snazzy new name.
Now, if you have a minimum of experience, you'll understand that writing a script to configure a bunch of servers works fine if they are all the same (identical clones), but if the number of servers to be configured is low... like... ONE, then the entire point is moot. It takes longer to write (and test) the script than to configure the single machine. And that's the problem with DevOps: it works fine if the environment is the right one; if the environment is not... it doesn't work at all.
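The "script a fleet of identical clones" idea is simple enough to sketch. A minimal example (hostnames, addresses and the config format are all invented for illustration) that stamps out a per-server config from a shared template:

```shell
#!/bin/sh
# Sketch: generate one config file per server from a shared template.
# Hostnames, IPs and the config keys are hypothetical.
gen_config() {
  host="$1"
  ip="$2"
  cat <<EOF
hostname $host
address $ip
vlan 42
ntp pool.ntp.org
EOF
}

# Stamp out configs for a batch of identical application servers.
i=1
for ip in 10.0.0.11 10.0.0.12 10.0.0.13; do
  gen_config "app$i" "$ip" > "app$i.conf"
  i=$((i + 1))
done
```

For three clones this beats clicking through three installers; for a single one-off box, as noted above, writing and testing it takes longer than just doing the work by hand.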
If you've followed my (mis)adventures so far, you should have gathered that the environment of $ShittyHostingProvider wasn't much oriented to DevOpsing, being composed of (mostly) completely different collections of servers grouped per customer. Sure, there were similarities (a Tomcat server is a Tomcat server), but not that many, since most machines were installed at different times and with different "preferred ways"; even servers with the same "purpose" were very different from each other. And let's not talk about the different versions of applications and libraries that couldn't be updated or replaced because of incompatibilities between applications.
DevOps in such a situation means writing a separate script for each machine, turning what was a largely manual operation into a still-manual operation with a pretense of automation.
This if nothing goes pear-shaped of course.
Enter $WeAreTheFuture, another customer brought to us by the wave of acquisitions the company carried out over the last few years.
They were "tech-savvy", by which I mean they knew that things tend to go pear-shaped more often than not, so they wanted a very "reliable" system. They decided to get two load balancers, one in each datacenter, with round-robin DNS; two application servers behind each load balancer (for a total of four); and two database servers (one in each datacenter). The database servers were supposed to be replicated master/master between the two datacenters, so that in the unlikely event of one system going down, the other could carry on.
Now, we could discuss the failings of this plan at length... Having two load balancers with round-robin DNS doesn't mean that, when one fails, the DNS automatically switches to the other; and even if it did, chances are the remote client would still try to talk to the 'dead' system for a while, until its cached DNS record expired. And exactly WHAT CAN FAIL? Not the hardware, since we are talking about VIRTUAL servers running on a cluster of hosts... Hardware failure is not an option, unless you consider a bomb dropped directly on the datacenter, but that's a bit far-fetched for 'normal' operations.
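For the record: plain round-robin DNS just hands out all the addresses and lets the client pick one; no health check is involved. Actual failover needs logic on top, along the lines of this sketch (the probe and hostnames are invented; a real probe would be something like an HTTP health-check request):

```shell
#!/bin/sh
# Sketch: pick the first load balancer that answers a health probe.
# Round-robin DNS alone gives the client none of this logic.
probe() {
  # Stand-in for a real check, e.g.: curl -fsS "http://$1/health"
  # Here we pretend lb1 is the dead one.
  [ "$1" != "lb1.example.com" ]
}

pick_backend() {
  for h in "$@"; do
    if probe "$h"; then
      echo "$h"
      return 0
    fi
  done
  return 1   # everything is down
}

pick_backend lb1.example.com lb2.example.com   # prints lb2.example.com
```

Without something like this on the client (or a floating IP / health-checked DNS in front), half the requests keep landing on the corpse until the DNS cache expires.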
Oh, and the cherry on top? The application servers were Windows. With applications developed in-house that ran as services.
Despite many discussions, this... thing went on. But with an addition: they wanted a "management" server to perform releases on the entire system "dev-ops-style". And the management server was Linux of course.
After a bit of push-and-pull, somebody (I don't know for sure who; I think he moved away immediately after the project got the green light) cobbled together a bunch of scripts that used Puppet and Python to perform a 'deployment' on the whole system. These still required a huge amount of manual messing around (disable services on the Windows servers 'cause 'net stop' fails most of the time, remove files 'cause 'XDEL' fails most of the time, and so on); basically the deployment was still a manual operation, except that it was sold as 'semi-auto'.
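Most of that manual messing around was retrying commands that fail intermittently. A generic retry wrapper of the kind those scripts could have used (the commented usage line is hypothetical; the real scripts drove the Windows services remotely):

```shell
#!/bin/sh
# Sketch: retry a flaky command a few times before giving up.
# Usage: retry <attempts> <command> [args...]
retry() {
  attempts="$1"; shift
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep 1
  done
  return 0
}

# Hypothetical usage against a flaky remote 'net stop':
# retry 5 ssh winbox 'net stop SomeService'
```

It doesn't fix why 'net stop' fails most of the time, but at least it turns "run it again by hand until it sticks" into one line.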
My involvement in the project was minimal until a very rainy day, when I drew the short straw and ended up having to handle a ticket from the customer, who wanted to do a 'backoffice' deployment the same day. The 'backoffice' deployment turned out to be the replacement of some services that, in theory, shouldn't have affected the actual website or other 'public' services, and it could have been performed during office hours. Fine. I grabbed the documentation and discovered that the details were... scarce.
After a round of questions, I ended up asking the customer himself, who reacted with surprise at the news that we had no clue how to maintain the system they paid us to maintain. Ah, the New Technology...
Anyhow, after a while I managed to put together an idea of how the system was supposed to work, so when the time rolled around I started the operations, which... went almost too smoothly to be true. And in fact it wasn't true. About half an hour later I got a call from the customer, complaining that no deployment had taken place and the application was still the old one.
I checked and... the timestamps appeared to be new, but the version returned by the application was still the old one. Puzzled, I performed the deployment again and checked whether the version was the correct one. It was. I called the customer to verify, and they reported that, again, the application was the old one. I checked again and... it was the old one again! WTF?
So I started investigating.
The procedure seemed to be the correct one; at least, nobody could remember anything different, and the only person who had performed a deployment before (CL) wasn't available because he was "working from home", that is: he was not in the office, his cellphone was turned off, and he was not responding to mail or chat messages. I call that "a free day that doesn't get marked as one". Anyhow, after several hours spent rummaging through the system, I couldn't find anything obviously wrong. So I decided to call it a day and told the customer that "today ain't a good day for deployment".
Did I mention that in their "high-availability" picture there was only a PRODUCTION system and nothing else, so no testing environment to play with, or did you know that already?
The next day, about mid-morning, CL rolled into the office and got immediately tackled. Of course, first he needed to go get breakfast, which delayed his "start of the day" until about noon; then it was almost lunch time, and everything was sluggish until 2 PM... after that I was almost ready to reach directly for the baseball bat.
Anyhow, he seemed to remember that he had a bit of a problem with that deployment too... but he couldn't remember the small details.
At this point the customer had called about 25 times to ask when we were going to do whatever they paid us to do (I think they had started to wonder what exactly the "managed" bit next to "hosting" in their contract and invoices meant). Summoned by the Ringing Phone of Doom, DumBoss moved in to investigate.
We (me, CL and DumBoss) all three ended up looking at the 'deployment'. After several attempts that ended exactly the same way, with the 'new' version being silently replaced by the 'old' one after a few minutes, CL showed signs of brain activity:
CL - Oh, right! Now I remember, it's very simple actually!
Me - Oh really?
CL - Yes...
Me - So are you gonna tell us or do we need to use a vice and pliers?
CL - What?
DB - (seeing my blood pressure rising) What is the problem?
CL - Well... you need to turn off this-and-that services, and after the deployment take a copy of the application and place it in this-and-that directory on the server; otherwise it overwrites the application with the old version that is in that directory.
Me - ...Question one: what the fuck? Question two: no, seriously: WHAT THE FUCK? And question three: why the fuck is this not documented?
CL - I think they wanted to do some sort of auto-deployment, but it never worked because there are too many services that fail otherwise.
Me - That answers the first question; how about the other two?
CL - What?
Me - Why is this shit not documented?
CL - ...I don't know
Me - You did this stuff last time; didn't you think it was a good idea to write it down?
CL - Well...
Me - Well what?
CL - It's time to get a cup of tea.
And he went to get a cup of tea, leaving me to hope he choked on it. No, it didn't happen.
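For the record, once reconstructed, the whole 'procedure' boiled down to something like the sketch below. Every name in it is invented for illustration (the real services and directories were only ever described as "this-and-that"), and the stubs just echo what the real steps (the remote 'net stop', the Puppet/Python deployment) would do:

```shell
#!/bin/sh
# Sketch of the undocumented deployment order, with invented names.
# The stub functions stand in for the real remote operations.
stop_service()   { echo "stop $1"; }     # stand-in for the remote 'net stop'
deploy_release() { echo "deploy $1"; }   # stand-in for the Puppet/Python step
start_service()  { echo "start $1"; }

deploy() {
  deploy_dir="$1"     # where the new release lands
  watchdog_dir="$2"   # directory the watchdog restores the app FROM
  stop_service   "SomeAppService"
  deploy_release "$deploy_dir"
  # The crucial, undocumented step: refresh the watchdog's copy,
  # or it silently rolls the application back to the old version.
  mkdir -p "$watchdog_dir"
  cp -R "$deploy_dir/." "$watchdog_dir/"
  start_service "SomeAppService"
}

# Demo run with a fake release directory:
mkdir -p demo.current && echo "v2" > demo.current/VERSION
deploy demo.current demo.restore
```

One `cp` in the right order. That's the whole secret that cost a day, 25 phone calls, and a chunk of my blood pressure.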
Davide
13/02/2017 16:15
Comments are added when, and more importantly if, I have the time to review them, and after removing Spam, Crap, Phishing and the like. So don't hold your breath. And if your comment doesn't appear, it's probably because it wasn't worth it.
By Guido posted 06/03/2017 09:24
DevOps, to me at least, sounds like somebody who wanted to pronounce the word "Development" and slipped halfway through...
-- who uses Debian learns Debian but who uses Slackware learns Linux
@ Guido By Paolo posted 08/04/2017 11:26
Well, I hear developers talk about it too and it makes me laugh, because until I arrived they developed applications / web apps that didn't bother to handle failures of the resources they used, and always blamed everything on the sysadmin, who in fact had nothing to prevent such failures, as if "SysAdmin" meant the second server in a load-balancing setup. Luckily my previous experience as a developer led me, in a way, to force them to consider certain aspects... but unfortunately their mindset is always the same: "it works on my PC and it's fast, so if it's slow or broken in production it's not my fault", never considering that developing an application that runs on one PC and serves a single user is one thing, and making a web app available to hundreds, if not thousands, of users is quite another, and that systems behave strangely when put under pressure, just like human beings.
In the end it almost seems that handing over a system to be managed as a service through scripts solves all problems; learning how to develop efficient and scalable applications, maybe not????
But it seems the world, thanks in part to all these Cloud Providers, is heading in that direction.
Bah!?!?!?!
-- Paolo
By Mattia posted 06/03/2017 13:47
Oh my god... And this CL can still walk on his own two legs? You're too kind...
-- Mattia
By trekfan1 posted 07/03/2017 10:22
If I had been in your place, that CL would have updated the documentation right away, obviously to the tune of lashes on $postodovenonbatteilsole (the place where the sun doesn't shine); tea, my foot!
-- trekfan1
By Anonymous coward posted 07/03/2017 10:30
More to the point: DaBoss keeps people like that around?
-- Anonymous coward
By Francesco posted 07/03/2017 13:09
Davide, out of curiosity: why do you write "CMD" scripts instead of PowerShell?
-- Francesco
@ Francesco By Davide Bianchi posted 07/03/2017 16:51
Davide, out of curiosity: why do you write "CMD" scripts instead of PowerShell?
Where do you see "cmd"?
-- Davide Bianchi
By Francesco posted 08/03/2017 07:35
By Boso posted 08/03/2017 13:37
I think the worst part of these situations is the total, shameless lack of shame and interest on everybody's part, employees and managers alike. Nothing works and nobody seems to care. At least in the old days they whined for somebody (usually the SysAdmin) to fix the problem.
Davide, DumBoss didn't bat an eye? This guy can just go make himself a tea in the middle of a discussion like that?
-- Boso
By Anonimo codardo (!!!) posted 08/03/2017 15:24
After a devOooops like this you didn't skin CL alive? That's not like you!
-- Anonimo codardo (!!!)
By Anonymous coward posted 08/03/2017 16:45
But didn't DB think of doing something about CL? Like docking his salary enough to cover a penalty for the delayed release?
If the answer is "DB wouldn't be DumBoss if he did that", no need to say it.
By trekfan1 posted 10/03/2017 12:06
But then why not put the new application directly in the directory from which it gets "installed" into the right place? Or is that not possible?
-- trekfan1
@ trekfan1 By Davide Bianchi posted 13/03/2017 07:18
But then why not put the new application directly in the directory from which it gets "installed" into the right place? Or is that not possible?
Don't ask questions like that... the answer might scare you...
-- Davide Bianchi
By Sciking posted 05/04/2017 12:35