Tales from the Machine Room



Set: Day - Interior IT office - The sysadmin sits at his table and is happily typing on the PC. Suddenly the door opens and CL enters brandishing a not-so-pristine shoe.

CL - I need a shoe! With a very sturdy heel!
Me - Hemmm... this is the IT office, I'm not a shoemaker...
CL - (waving the shoe) Shoe! Heel! Sturdy!
Me - Ohi! Calm the fuck down. What the heck are you doing?
CL - I need a shoe with a sturdy heel! I told you already!
Me - And I told you that I'm not a shoemaker! What are you doing with a shoe?
CL - (shows a huge nail) I need to put this into the wall!
Me - ...and why don't you use a hammer for that?
CL - Hammer? What the fuck.. SHOE! HEEL!

The scene vanishes with wibbidly-wobbidly effect.

Ok, this hasn't happened (yet), but if it does, I wouldn't be too surprised.

Very often there are individuals who get "specialized" in doing specific things in a specific way. This is what is normally referred to as "getting competent" and there is nothing wrong with it. The problem is when the acquired "competence" is the only one and the person in question sees only one solution for every problem, no matter what kind of problem.

The situation is aggravated when the "specialist" in question has actually acquired fragmented or just plain wrong information. In that case, the person tends to repeatedly apply the same process that, of course, produces results that are not the expected ones. This in turn produces more attempts at the 'solution' that doesn't work.

And after this theatrical introduction, let's talk about $weknoweverything, a company that... did several things.

They had acquired some sort of specialization in a very narrow field and were selling such "experience" at a price (it is called "consultancy" I think). The problem wasn't that they were specialized in a field, but that they had convinced themselves that such "specialization" could be extended to every other field no matter how disconnected from the original it was.

For example, they had decided to build some sort of "database" with information about their expertise and to give some carefully selected customers (aka: everybody that could pay) access to it. And since they were "competent" (they had read some article somewhere), they decided the whole IT infrastructure themselves, then mailed their ideas to MarketingMan, who talked with somebody else, and that was how the whole "architecture" was designed.

Fast forward a couple of months, when their database apparently went into a coma and obviously $weknow started to scream a lot.

That day I had the misfortune to look at the problem.

The database seemed to work correctly, at least I couldn't see any problem, and the application used to access it was also working fine, so the problem was somewhere else.
However, I noticed that when repeating the same request, sometimes I got an "error 500", but that error didn't show up in the application log. So... somewhere there was a snag.

After a bit spent poking and prodding, I realized that $weknow's idea of a "redundant system" was to have 2 frontend servers at 2 different hosting providers, with both IPs in the DNS. Now, the frontend that was hosted by the other ISP was apparently returning errors.

I could ping it and I could send it requests, but all I got back was errors. At this point my hypothesis was that the server, or its connection with the database, was broken, but I couldn't check.

Every time the DNS returned the "wrong" IP, I got an error in response. And obviously the DNS wasn't under our control.
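A check like this can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names (`resolve_all`, `probe`, `check_frontends`) are made up for the example, it assumes plain HTTP on port 80, and it is not necessarily how the check was done that day. The idea is simply to ask the resolver for every address behind the name and then talk to each address directly, so the DNS no longer decides which frontend answers.

```python
import http.client
import socket


def resolve_all(hostname, port=80):
    """Return every distinct IPv4 address the resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (ip, port)).
    return sorted({info[4][0] for info in infos})


def probe(ip, hostname, port=80, path="/", timeout=5.0):
    """Send one GET straight to ip, with the proper Host header.

    Returns the HTTP status code, or a short text if the request failed.
    """
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        conn.request("GET", path, headers={"Host": hostname})
        return conn.getresponse().status
    except (OSError, http.client.HTTPException) as exc:
        return f"failed ({exc})"
    finally:
        conn.close()


def check_frontends(hostname, resolver=resolve_all, prober=probe):
    """Map each address behind hostname to the result of probing it directly.

    resolver and prober can be swapped out, e.g. to test without a network.
    """
    return {ip: prober(ip, hostname) for ip in resolver(hostname)}
```

Calling `check_frontends("www.example.com")` (a placeholder name, not the customer's) would give back one entry per address. In a situation like the one above you would expect one address answering 200 and the other one answering 500.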

After composing a mail to explain the problem, basically saying "you're barking up the wrong tree", I declared the problem closed on my side and went to do something else. But of course this didn't go down well with $weknow: about a quarter of an hour later, we got CL on the phone.

CL - What do you mean you're not the one that manages that stuff?
Me - The server that returns errors is managed by $otherISP, therefore you have to ask them what the problem is, I can't do anything about it.
CL - But your server should respond.
Me - And it does, the errors are from the other one.
CL - But if your server answers we shouldn't get any error!
Me - Hummm... Maybe I wasn't clear, the errors are coming from the other server, the one hosted by $otherISP that you should contact for...
CL - I got that, we're EXPERTS you know? If one of the servers responds, it should always respond, so why isn't it doing it?
Me - ...Ok, now I'm confused, how can the server respond to requests that are not sent to it without a connection between the two?
CL - Of course there is a connection, they're both in the DNS, right?
Me - ...And what is the point of that?
CL - What do you mean? Aren't you an expert?
Me - Maybe I didn't understand.
CL - That both systems are in the DNS, so if one doesn't respond the other one should.
Me - Hemmm... No, the DNS doesn't work like this.
CL - What are you talking about?
Me - Maybe you're talking about a load balancer, but I don't see any load balancer in your structure...
CL - No, what load balancer, those things are useless, we have the DNS and that should answer with the good server all the time.
Me - No, the DNS doesn't work like that. The DNS simply translates a hostname into an IP address and that's it. The DNS doesn't check if the server exists, answers or anything.
CL - What the heck are you talking about? Experts...

The discussion went on for a bit, then the boss was called upon, and the boss' boss... and in the end we were all invited to a big meeting, where several people explained to them that NO, the DNS DOESN'T WORK LIKE THAT. And what you need is a LOAD BALANCER.
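The difference between "both IPs in the DNS" and a load balancer can be shown with a toy simulation in Python. Everything here is an assumption made for the example: the addresses are documentation addresses, `LoadBalancer` is a hypothetical class and not a real product, and real resolvers and clients cache and reorder answers instead of alternating this neatly. What matters is that the DNS side has no health check at all, while the balancer skips the backend it knows is broken.

```python
import itertools


def dns_round_robin(addresses):
    """Hand out addresses in rotation, like a simplified round-robin DNS.

    Note what is missing: nothing here ever checks whether a server works.
    """
    return itertools.cycle(addresses)


class LoadBalancer:
    """Toy balancer: rotates over the backends but skips unhealthy ones."""

    def __init__(self, backends, is_healthy):
        self._backends = list(backends)
        self._is_healthy = is_healthy  # callable: backend -> bool
        self._next = 0

    def pick(self):
        # Try each backend at most once per request.
        for _ in range(len(self._backends)):
            backend = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if self._is_healthy(backend):
                return backend
        raise RuntimeError("no healthy backend left")


def serve(backend, broken):
    """Pretend HTTP call: a broken frontend answers 500, a good one 200."""
    return 500 if backend in broken else 200


def run(n_requests=10):
    """Send n_requests through each setup and return both lists of statuses."""
    frontends = ["192.0.2.10", "198.51.100.20"]  # made-up addresses
    broken = {"198.51.100.20"}                   # the frontend at the other ISP

    dns = dns_round_robin(frontends)
    via_dns = [serve(next(dns), broken) for _ in range(n_requests)]

    lb = LoadBalancer(frontends, is_healthy=lambda b: b not in broken)
    via_lb = [serve(lb.pick(), broken) for _ in range(n_requests)]
    return via_dns, via_lb
```

In this model, the requests that go through the DNS rotation fail every time the broken address comes up, while the ones that go through the balancer all succeed, because the balancer is the only piece that knows which backend is alive.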

CL - Yes, and what if the load balancer breaks then?

Now, I want to open a little parenthesis here: this story of the "server that breaks" is by now a bit of a fairy tale. A long (ok, not really that long) time ago, we had PHYSICAL machines that were expensive, and as such everybody was trying to get by saving every possible penny, using machines way past their lifetime or recycling all kinds of garbage. Hardware breaking at some point was not so uncommon. But we're in 201X now! It was some time ago that VMware blew everybody away by showing how they could run a "server" as a piece of software. Today there is a good chance that the 'server' is nothing more than software running on a cluster, that is, a lot of machines, and if one of them goes down you barely notice. And disks are also "slices" of a very large "volume". So the "server breaks" scenario is getting less and less likely, and even when it DOES happen, you can spin up an identical one in minutes. So... No, that's not the problem.

Everybody happy then? In that case you wouldn't be reading this. Because, as said, the problem with an "expert" is that he doesn't easily accept that his "expertise" is in doubt. So $weknow decided to get not one, but TWO load balancers, at different hosting providers, each one connected to ONE backend. And both IPs for the LBs were in the DNS. In case of a malfunction, somebody had to log in to the remaining LB and reconfigure it. And obviously we had no access to the other LB, and the other way around.

14/01/2019 12:56


Comments are added when, and more importantly if, I have the time to review them, and after removing Spam, Crap, Phishing and the like. So don't hold your breath. And if your comment doesn't appear, it's probably because it wasn't worth it.

8 messages this document does not accept new posts
Messer Franz By Messer Franz - posted 04/03/2019 09:34

I tried to read your story right after getting up, when I clear my head with a bit of internet. I read the first line, and I read "The sysadmin sits at his table happily CELEBRATING"... no, DB never celebrates at work, except when he walks out the door and heads home or when he gets his paycheck (and heads home)... I guess I have to read it again when my head is clearer... nice to be able to recognize a slip at first glance, it gives the impression "I know this site"..

Messer Franz

Messer Franz By Messer Franz - posted 04/03/2019 09:35

Two things:

First: "No, what load balancer, those things are useless"... it is well known, in fact, that load balancers are used to make the server trendier by adding colorful icons that make the desktop livelier and so make the sysadmin more cheerful and productive, but real men want a monochrome server.. not two-color, black and white (or black and green, for those who still have nightmares about it), but just black... and they only want sysadmins who scare even Chuck Norris...

Second: "you can spin up an identical one in minutes"... even when they don't tell you what has to be in the server, which settings to apply and how big it has to be?... minutes to install the standard one, months to argue because you should have done what they hadn't told you to do, but you should have figured it out with the sysadmin's ESP powers!... evidently you're not as expert as they are, see, they're right?


Messer Franz

Davide Bianchi@ Messer Franz By Davide Bianchi - posted 04/03/2019 13:11

Second: "you can spin up an identical one in minutes"... even when they don't tell you what has to be in the server,

Ok, the correct sentence would be "a server can be CLONED in minutes, snapshots help".




Davide Bianchi

Anonymous coward By Anonymous coward - posted 04/03/2019 16:04

In my opinion, at the meeting you could very well have jumped at $EXPERT's throat and torn out his carotid with your teeth. I can picture the scene: blood everywhere, colleagues petrified but deep down overjoyed, you howling over $EXPERT's corpse, $EXPERT2 horrified by the fact that he would be next.

You have a clean record, a lawyer with balls would have an easy time getting you declared mentally unfit. With 5 years in a psychiatric hospital you'd get away with it... and think of the satisfaction?

Anonymous coward

magaolimpia By magaolimpia - posted 04/03/2019 20:32

"So the fact that "the server breaks" is not really an event to consider anymore."



Guido By Guido - posted 05/03/2019 08:22

You should put up a disclaimer, because reading about somebody who confuses DNS with a "load balancer" is pretty hard to take...

who uses Debian learns Debian but who uses Slackware learns Linux

Massimo m. By Massimo m. - posted 06/03/2019 20:42

That the server can break is entirely possible.

The only difference between a physical server and a virtual one is that when the physical server hosting the virtual one crashes, N virtual servers crash.

Massimo m.

Davide Bianchi@ Massimo m. By Davide Bianchi - posted 07/03/2019 08:37

The only difference between a physical server and a virtual one is that when the physical server hosting the virtual one crashes, N virtual servers crash.

If you do things "properly", the physical server is not ONE server but a cluster of machines, so ONE server breaking makes no difference. My experience is that it's not the server that breaks but the admin who does some kind of crap.


Davide Bianchi


This site is made by me with blood, sweat and gunpowder. If you want to republish or redistribute any part of it, please drop me (or the author of the article, if it is not me) a mail.

This site was composed with VIM, now it is composed with VIM and the (in)famous CMS FdT.

This site isn't optimized for vision with any specific browser, nor does it require special fonts or resolution.
You're free to see it as you wish.
