Tales from the Machine Room



Performance Proformance

"But it works fine on my laptop!"

Who hasn't heard this sentence a few billion times from the latest developer trying to justify why his state-of-the-art craptastic application crawls like a snail on a production system that cost the customer a small fortune, when the promised performance was stellar?

The gap between "perceived performance" and "actual performance" is only a small part of this discussion.

What the developer(s) ignore most of the time, out of laziness or simply because they're not used to thinking about scale, is that what works fine on their system with a limited amount of data and operations doesn't scale at all once data and operations grow by a few orders of magnitude.

The problem is compounded when the "design" of the system doesn't consider the need to scale horizontally as well as vertically. Or when, out of laziness or simply because the developer(s) don't know any better, they insist on using the wrong tool even when it is clear as day that it's not the right tool for the job.

What normally happens in these situations is that the developer pushes to throw more hardware at the problem, in the hope that this will make it disappear. And sometimes it does help, but when the problem is actually bad design (in addition to bad execution), it doesn't get removed, it simply gets hidden behind a curtain, ready to resurface with the next release cycle that pushes the load a little further.

Normally, what I do is immediately propose to put the aforementioned laptop in production in place of the production system. Strangely enough, I haven't seen a single developer agree to this request. Yet.

And after this performant introduction, let's go on with today's story.

Once upon a time, there was a company, let's call them $WeMeasureShit, that was busy trying to get a foot in the door of the Internet of Shit market. Their plan was to provide everybody with some sort of network-enabled ambient sensor that could monitor several parameters (temperature, humidity, air pressure, ambient light etc.) and notify a centralized system.

The idea was that this could be applied to several things, like freezers or cold rooms in a production plant, or merely office spaces, to monitor the environment for wanted or unwanted waste (heating running while the windows are open and so on). That, seen in a certain way, is actually a good idea.

However, the goodness of the idea was fighting hard with the badness of the implementation, and losing. Badly.

First of all, the devices were designed more with "let's pretend we're Apple" in mind than with "let's focus on something that works first", and the result is that placing a thermal sensor right on top of a light sensor fucks up both and helps neither.

Then, there was the whole IoShit part...

Of course, since we're talking about IoS stuff, we're talking mostly about hastily cobbled-together software with a preponderance of PHP/Python, a good dose of JavaScript, a robust measure of MySQL, lots and lots and lots of buzzwords, very little (if any) documentation and NOT A FUCKING LINE dedicated to security. So here we go...

One of the first steps $WeMeasure took was to order the 'production' environment, since the 'test' environment was the one the developer(s) were using. Now, the 's' is in parentheses because we only ever saw one guy. He could have been one in a team of dozens, but we only saw one, so I'll leave the 's' there just as a reminder.

The idea behind $WeMeasure was to hook up all these devices with a GSM connection so they could keep sending data even when no network connection was available. They wanted to market them as tracking/information devices for shipments and things on the move. To do so, they added another "thing" to the "thing", which added even more stuff to the whole package. That per se was not the problem; the problem was that the connection between the devices and the "central" location now also depended on things like SMS (which every carrier explicitly labels "best-effort" and which therefore comes with zero reliability guarantees).

About a month before the actual "launch", $WeMeasure was busy with a huge marketing campaign, touting the advantages of their "tracker" to everybody, from shipping/delivery companies to mere mortals who were curious about the whereabouts of their "vacation house" and wanted to know whether it was at risk of freezing or something. The result was that even before the system went "live" there were serious discussions along the lines of "will the hardware handle the amount of requests we're going to put on it?".

Now, since neither the customers nor the developers had any clue how their system was going to react, the answer was a resounding "fuck if we know". So we reached the moment to push the button, and that was it.

For a while everything seemed to work fine, but as the number of users increased and features were added, things quickly started to look on the sluggish side.

And immediately we began to receive requests for "tuning". However, it quickly turned out that you cannot tune what is badly designed. This prompted more requests for tuning and for "improvements" in the hardware department, until we reached a stalemate. And in the end, of course, there was "the" meeting.

On the customer's side there were the Developer (DE) and the Customer (CL); on our side there were me (since, apparently, I was the only one around with a clue about tuning and development), our Marketing Man (MM) and, of course, DB.

DE - ...and the system performs fine on my laptop, so I'm asking whether there is something wrong with the production hardware, since it doesn't perform anywhere near our development environment.
MM - We have already increased the hardware resources of the environment, but there was no improvement in performance. The question is whether the performance depends on the hardware or on things outside our system.
ME - May I ask how many records you have in your database?
DE - Well, it's comparable to production.
ME - "Comparable" meaning what?
DE - We can't get the production database, it's a matter of privacy.
ME - Agreed. What does "comparable" mean?
DE - What do you mean?
ME - (checking my notes) At the last check, the production database was about 850 GB; the "readout" table is the largest with approximately 1.5 million rows, followed by the "users" table with about 250 thousand... What are the numbers on your system?
DE - Well... not that large.
ME - Have you tried with any dataset around this size?
DE - ...
CL - Can we optimize the database?
ME - There is nothing to optimize in the database, especially if the database itself is not used correctly.
DE - What do you mean?
ME - What I see from a simple scan of the system is that 90% of the queries being processed are of the form "WHERE value IN ...". As written, these queries force a table scan every time, which in turn forces a temporary table to be allocated to perform a sort and a selection. And this happens approximately every 5 seconds for every person visiting your site.
DE - Ah, that's probably the JavaScript that refreshes the views...
ME - And kills the database.
DE - We could increase the database's memory!
ME - It will still create a temporary table to do its job; that's the way the database is designed. And the way your queries are designed.
DE - How about using faster disks?
ME - It's a VIRTUAL disk.
DB - What we could do is add another database server and use the two of them to split the load.
ME - It wouldn't do anything unless the split is done by the application.
DB - Why not? If we put a load balancer in front...
ME - The load balancer would just keep sending the same query over and over again. The problem isn't that the database server is slow, it's that the query is slow. Unless you redesign that query to take advantage of multiple databases, it would make absolutely no difference.
DE - No, we can't redesign the query, that would be like redesigning the entire application.
ME - That would definitely improve things.
CL - What do you mean?
ME - As far as I can see, you've used the wrong database. What you need is speed in recording the data, and after that you need to coalesce it into "views" that can be searched. I'd split the two parts: a binary store for the 'raw' data and then an offline process that creates views already optimized for searching, instead of using the same database for both.
CL - No we can't do that, it's not what we have established as our "core" functionality.
ME - Well, it was an idea.
CL - But why can things like Facebook or Booking do it and we can't?
ME - Because Booking and Facebook designed their infrastructure from the ground up so that they can easily partition the load across several HUNDRED systems without problems.
CL - What do you mean?
ME - Let's take Booking as an example: they don't have one database, they have about 400 of them. Each one handles maybe one or two tables, and their applications are designed so that they know how to open connections to all of them in turn, reading from some while writing to others. And let's not even talk about the fact that they've also basically rewritten the whole programming language from scratch. What they run ain't PHP. Not the version you get from the site, anyway.
DE - We can't do that, it will be way too expensive.
CL - What if we simply double the memory of the server?
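ME's description of the "WHERE value IN ..." queries (a table scan, then a temporary structure to sort the result) is easier to picture with an actual query plan. The sketch below is only an illustration: it uses SQLite from Python's standard library as a stand-in for the story's MySQL server, and the `readout` schema, column names and index are hypothetical. It prints the plan for an `IN` filter plus `ORDER BY`, first with no index on the filtered column and then with one. In MySQL, comparable hints show up in the `Extra` column of `EXPLAIN` (for example "Using filesort" or "Using temporary"), though the exact plan depends on engine, version and data.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
# Hypothetical schema loosely modelled on the story's "readout" table
conn.execute(
    "CREATE TABLE readout (id INTEGER PRIMARY KEY, sensor_id INTEGER, ts INTEGER, value REAL)"
)

SQL = "SELECT * FROM readout WHERE sensor_id IN (?, ?, ?) ORDER BY ts"
PARAMS = (1, 2, 3)

# No index on sensor_id: every row is read, then sorted in a temporary B-tree
unindexed_plan = query_plan(conn, SQL, PARAMS)

# Hypothetical index on (sensor_id, ts): the planner can search instead of scanning
conn.execute("CREATE INDEX idx_readout_sensor_ts ON readout (sensor_id, ts)")
indexed_plan = query_plan(conn, SQL, PARAMS)

print("without index:", unindexed_plan)
print("with index:   ", indexed_plan)
```

The exact wording differs between SQLite versions (older ones print `SCAN TABLE readout`), but the shape is what matters: the first plan reads every row and sorts in a temporary B-tree, while the second looks up matching rows through the index. Even a good plan is still executed once per visitor every few seconds, so it does not remove the polling load on its own.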
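ME's objection that a second database server helps only "if the split is done by the application", together with the description of Booking spreading tables across many databases and reading from some while writing to others, amounts to functional partitioning with a read/write split. Below is a minimal sketch of just the routing half of that idea, assuming each table is assigned to one cluster made of a primary for writes and optional read replicas. `Cluster`, `TableRouter` and the hostnames are hypothetical, nothing here opens a real connection, and it is not a description of Booking's actual code.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """One functional partition: a primary that takes writes, replicas that serve reads."""
    primary: str
    replicas: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Round-robin over the replicas; with no replicas, reads go to the primary
        self._reads = itertools.cycle(self.replicas or [self.primary])

    def next_reader(self) -> str:
        return next(self._reads)


class TableRouter:
    """The application, not a load balancer, decides which database holds each table."""

    def __init__(self, placement: dict[str, Cluster]) -> None:
        self._placement = placement  # table name -> Cluster

    def for_write(self, table: str) -> str:
        return self._placement[table].primary

    def for_read(self, table: str) -> str:
        return self._placement[table].next_reader()


# Hypothetical placement: the big "readout" table gets its own cluster with two replicas
router = TableRouter({
    "readout": Cluster("readout-primary", ["readout-replica-1", "readout-replica-2"]),
    "users": Cluster("users-primary"),
})

print(router.for_write("readout"))  # readout-primary
print([router.for_read("readout") for _ in range(3)])
# ['readout-replica-1', 'readout-replica-2', 'readout-replica-1']
print(router.for_read("users"))  # users-primary
```

The design point is that placement knowledge lives in the application: a load balancer in front of identical servers cannot know that `readout` and `users` belong on different machines, while the router does. Reads sent to replicas may see slightly stale data, a trade-off the application has to accept explicitly.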
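ME's suggestion of separating fast recording of raw data from an offline process that builds search-optimized "views" can be sketched in a few lines. This is an illustration under stated assumptions, not the design ME had in mind: the "binary system" is stood in for by an append-only stream of fixed-size packed records (a hypothetical layout of sensor id, Unix timestamp and temperature), and the "view" is an hourly-average table in SQLite keyed by sensor and hour. `append_readings`, `build_hourly_view` and `hourly_temp` are hypothetical names.

```python
import io
import sqlite3
import struct
from collections import defaultdict

# Hypothetical fixed-size raw record: sensor id (uint32), Unix time (int64), temperature (float32)
RECORD = struct.Struct("<Iqf")


def append_readings(stream, readings):
    """Ingest path: append packed records to the raw log; no indexes, no queries."""
    for sensor_id, ts, temperature in readings:
        stream.write(RECORD.pack(sensor_id, ts, temperature))


def build_hourly_view(stream, conn):
    """Offline path: scan the raw log once and materialize a search-friendly table."""
    totals = defaultdict(lambda: [0.0, 0])  # (sensor_id, hour) -> [sum, count]
    stream.seek(0)
    while chunk := stream.read(RECORD.size):
        if len(chunk) < RECORD.size:
            break  # ignore a partially written trailing record
        sensor_id, ts, temperature = RECORD.unpack(chunk)
        acc = totals[(sensor_id, ts - ts % 3600)]
        acc[0] += temperature
        acc[1] += 1
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hourly_temp ("
        " sensor_id INTEGER, hour INTEGER, avg_temp REAL, samples INTEGER,"
        " PRIMARY KEY (sensor_id, hour))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO hourly_temp VALUES (?, ?, ?, ?)",
        [(s, h, total / n, n) for (s, h), (total, n) in totals.items()],
    )
    conn.commit()


raw = io.BytesIO()  # stands in for an append-only file
append_readings(raw, [(7, 1_500_000_000, 20.5), (7, 1_500_000_600, 21.5), (8, 1_500_000_000, 5.0)])

conn = sqlite3.connect(":memory:")
build_hourly_view(raw, conn)
rows = conn.execute(
    "SELECT avg_temp, samples FROM hourly_temp WHERE sensor_id = ?", (7,)
).fetchall()
print(rows)  # [(21.0, 2)]
```

The ingest path never touches an index or runs a query, so it stays cheap as readings pile up; the expensive aggregation runs offline on whatever schedule suits, and a page that refreshes every few seconds reads a small precomputed table by primary key instead of scanning raw readings. A real system would also need to process the raw log incrementally instead of rescanning it each time.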

Yeah, because after repeating for half an hour that throwing more hardware at the problem is not a solution, of course you're going to throw more hardware at it. As long as you have hardware to throw.

I don't have to say that the doubling of the memory resolved absolutely nothing, right? And that they decided that they didn't need a "test" environment with the same amount of data of the production system to do testing either, right?

Davide
05/09/2017 14:57


Comments are added when, and more importantly if, I have the time to review them and after removing spam, crap, phishing and the like. So don't hold your breath. And if your comment doesn't appear, it's probably because it wasn't worth it.

19 messages. This document does not accept new posts.
By Guido - posted 11/09/2017 09:13

Let me guess, they ran the tests on 10 rows?

--
who uses Debian learns Debian but who uses Slackware learns Linux


By Cobra78 - posted 11/09/2017 09:31

On the one hand, it's true that sometimes having a test environment that's really comparable to production is difficult, if not impossible. We're in the same boat: a development environment on the devs' PCs, a staging environment on a virtual environment much bigger than the devs' PCs, and a production environment that's much, much, much bigger than staging.

Fortunately our devs are smart enough to see the problem and know that it is a problem; the ones you usually deal with, on the other hand....

--
Take life by the minute, not wholesale.
Dream as if you'll live forever; live as if you'll die today.


By Anonymous Coward - posted 11/09/2017 10:59

Did someone say "SAP + Accenture"? Where "Select *" is fine both for tables with 100 rows and for tables with 50 million rows?

What I used to do when I found myself in these situations was propose putting the aforementioned laptop in production in place of the system meant for it. Strangely enough, none of the developers involved ever said they were happy with that.

Were you my colleague and I never noticed? (blush)

--
Anonymous Coward


By Manuel - posted 11/09/2017 13:52

Fantastic.

A small OT: the stories are finally back! I REALLY missed them. Thanks Davide!

--
::: meksONE :::


By Il solito anonimo codardo - posted 11/09/2017 16:08

F**k, on my cousin's Commodore VIC-20 (my cousin, my cousin) everything ran like a dream, and on this fanta-mega-thingy with 128 virtual CPUs, 64 terabytes of virtual RAM and a 7-zettabyte virtual HD it doesn't run! What's going on here? I mean, if handling 1 piece of data runs fine on the VIC-20, what's the big deal about handling a googolplex of data here? (cheeky)

N.B.: you really went out of the frying pan and into the fire, from your previous job, the one you published stories about until 2012, to this one...

--
Il solito anonimo codardo


By Pengh (replying to Il solito anonimo codardo) - posted 22/09/2017 09:12

F**k, on my cousin's Commodore VIC-20 (my cousin, my cousin) everything ran like a dream, and on this fanta-mega-thingy with 128 virtual CPUs, 64 terabytes of virtual RAM and a 7-zettabyte virtual HD it doesn't run! What's going on here? I mean, if handling 1 piece of data runs fine on the VIC-20, what's the big deal about handling a googolplex of data here? (cheeky)

Come on, on my latest-model lappytop, a Sinclair ZX81, it runs even better with zero data! And since zero is a multiple of every number, that means it's handling an incredible amount of data.

--
Pengh


By Eladamri - posted 11/09/2017 17:47

This reminds me of a short discussion with a Programmer Friend of mine:

PF: "It's pointless to spend so much time programming something well; a programmer costs more than hardware."

ME: "So sooner or later you'll end up with a server farm just to play Minesweeper."

PF: "But you don't get it: with debugging and yadda yadda, paying programmers costs more; for the same price you add more hardware."

ME: "So if you pay a good programmer, you also save on hardware."

The silence that followed was quite eloquent.

Have a good week, Big D.

--
Eladamri


By Anonymous coward (replying to Eladamri) - posted 18/09/2017 09:56

This reminds me of a short discussion with a Programmer Friend of mine:

PF: "It's pointless to spend so much time programming something well; a programmer costs more than hardware."

ME: "So sooner or later you'll end up with a server farm just to play Minesweeper."

PF: "But you don't get it: with debugging and yadda yadda, paying programmers costs more; for the same price you add more hardware."

ME: "So if you pay a good programmer, you also save on hardware."

The silence that followed was quite eloquent.

Have a good week, Big D.

Legendary

--
Anonymous coward


By Anonymous genius - posted 11/09/2017 20:32

Good thing you've started the stories again, I was looking for a way to track you down and apply the "Kathy Bates" method to you XD!

--
Anonymous genius


By Messer Franz - posted 12/09/2017 07:48

All very nice (from a certain point of view), but I missed a step.

Was the programmer called DE in the story because he's DEficient, are those his initials (and obviously you can't tell us his name), or did I miss a step and a new acronym got added alongside CL, DB, etc.?

P.S.: chin up, retirement is getting ever closer... it's the day after the nervous breakdown and the committal to the asylum...

--
Messer Franz


By Davide Bianchi (replying to Messer Franz) - posted 13/09/2017 10:57

Was the programmer called DE in the story because he's DEficient, are those his initials (and obviously you can't tell us his name), or did I miss a step and a new acronym got added alongside CL, DB, etc.?

Simply DEveloper

--
Davide Bianchi


By Mattia - posted 12/09/2017 19:51

The laptop-in-production idea isn't bad at all... I should suggest it to someone I know who hasn't thought of it yet but would surely be glad to put it into practice.

P.S.: the Gojira image still links to Gort.

--
Mattia


By Guido (replying to Mattia) - posted 02/10/2017 11:05

Which is actually what we do here when we want to see how some optimization runs on real data rather than test data (read-only, obviously, certainly no modifications :P )

The laptop-in-production idea isn't bad at all... I should suggest it to someone I know who hasn't thought of it yet but would surely be glad to put it into practice.

P.S.: the Gojira image still links to Gort.

--
who uses Debian learns Debian but who uses Slackware learns Linux


By trekfan1 - posted 13/09/2017 07:26

Of the aforementioned "developers" I only ever saw one; he may have been one in a group of a dozen or more, but I only ever saw one.

No, you saw all of them: they had ONE developer, the one you saw. He WAS the TEAM!

--
trekfan1


By Messer Franz (replying to trekfan1) - posted 13/09/2017 14:53

Of the aforementioned "developers" I only ever saw one; he may have been one in a group of a dozen or more, but I only ever saw one.

No, you saw all of them: they had ONE developer, the one you saw. He WAS the TEAM!

Well, it could also be just one guy with a severe case of multiple personality... if he manages his health the way he manages his programs, that wouldn't even be the most tragic thing that could have happened to him*....

*Yes, I somewhat murdered verb agreement and grammar in general, but that's fine anyway...

--
Messer Franz


By ste - posted 13/09/2017 18:31

I just had a similar discussion with a colleague.

Summary:

"nah, there's the cloud, worst case you just add resources"

(angry)

--
ste


By Tsktsk - posted 14/09/2017 15:19

Developer and customer are both to blame. The developer could have built the application a bit better, giving the customer more time before the app reached the stage where scaling the hardware costs more than rewriting the application with higher-performance technologies. The customer should have known that at some point they would have to budget for rewriting at least part of the application, because certain loads just can't be handled with Python.

Still, applications like Booking or Twitter went through a similar phase. Take Twitter: it started as a Ruby on Rails application that was pushed to its absolute limit until they finally decided to rewrite a large part of the backend in Scala/Java.

--
Tsktsk


By Nik - posted 28/09/2017 15:48

>> CL - But why can things like Facebook and Booking handle huge amounts of data and we can't?

How sweet.... like the people who used to come asking for a site "like Facebook"...

--
Chronicles of a Broken Heart


By massimo79m - posted 25/10/2017 17:13

But why can things like Facebook and Booking handle huge amounts of data and we can't?

You were being kind; I would have told him: because they have developers who know what they're doing.

--
massimo79m




This site is made by me with blood, sweat and gunpowder. If you want to republish or redistribute any part of it, please drop me (or the author of the article, if it isn't me) an email.


This site was composed with VIM; now it is composed with VIM and the (in)famous CMS FdT.

This site isn't optimized for viewing with any specific browser, nor does it require special fonts or resolutions.
You're free to view it as you wish.

Powered By Gojira