Tales from the Machine Room




Are You Anonymized?

Let's talk about data. Lots of data. The kind of data you get when you run the same system for a few years without ever cleaning up your data structure. What do you do with that whole bunch of stuff? Well, mostly nothing. You keep it in there, make backups and keep on going. The problem is that every now and then, you need to test stuff. And if you have a TEST environment, it should contain the same stuff as production, so a very nice, big copy of all that junk.

But then you have 2 problems: 1. you have lots of data that you don't actually need and 2. you shouldn't keep real personal data in TEST, because there is no need for it and bad things (tm) could happen.

What to do?

Simple. Instead of keeping all the data in there permanently, every now and then you do a "data refresh": you take a copy of the production data, load it into test, then zap all the data older than 1 month (for example) and replace all the names and addresses with random junk. This way your database is a lot cleaner and leaner, the data are still coherent, and you don't have "personal" data in there that could cause problems.

Now, notice that I called this 'simple', not 'easy', because the process of going from a full-sized database to a leaner, meaner database is anything but easy. First of all, you need to get the whole database. That means A LOT OF DATA, potentially. And every time you run it, it's more data than the last time. And then you need to figure out a way to DELETE stuff without breaking the data coherency.
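To make the idea concrete, here is a minimal sketch of such a refresh step in Python with an in-memory SQLite database. Everything in it is an assumption made up for illustration: the `customers` table, its columns, the 30-day cutoff and the `random_junk` helper. A real schema would have many more tables and much more to scrub.

```python
import random
import sqlite3
import string
from datetime import datetime, timedelta


def random_junk(rng, length=10):
    """Return a random lowercase string to use in place of personal data."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def refresh_test_copy(conn, now, keep_days=30, seed=0):
    """Drop rows older than keep_days, then overwrite names and addresses."""
    rng = random.Random(seed)
    cutoff = (now - timedelta(days=keep_days)).isoformat()
    # ISO-8601 timestamps in one format compare correctly as plain strings.
    conn.execute("DELETE FROM customers WHERE created_at < ?", (cutoff,))
    ids = [row[0] for row in conn.execute("SELECT id FROM customers")]
    for customer_id in ids:
        conn.execute(
            "UPDATE customers SET name = ?, address = ? WHERE id = ?",
            (random_junk(rng), random_junk(rng, 20), customer_id),
        )
    conn.commit()


def demo():
    """Build a tiny hypothetical 'copy of production' and refresh it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers ("
        "id INTEGER PRIMARY KEY, name TEXT, address TEXT, created_at TEXT)"
    )
    now = datetime(2018, 5, 31)
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?, ?)",
        [
            (1, "Old Customer", "1 Old Road", (now - timedelta(days=400)).isoformat()),
            (2, "Recent Customer", "2 New Street", (now - timedelta(days=3)).isoformat()),
        ],
    )
    refresh_test_copy(conn, now)
    return conn.execute("SELECT id, name, address FROM customers").fetchall()
```

Calling `demo()` leaves only the recent customer, with the name and address replaced by junk. Note that this only ever runs against the copy, never against the original.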

That is not easy and that is why it costs a shitload of time to do right.
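A small sketch of why the deleting part is tricky, again in Python with SQLite. The two tables (`customers` and `orders`) and their contents are hypothetical. The point is only that rows have to go in dependency order: children first, then the parents that nothing references anymore.

```python
import sqlite3


def build_demo_db():
    """Create a hypothetical two-table schema with an enforced foreign key."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "customer_id INTEGER NOT NULL REFERENCES customers(id), "
        "placed_on TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "old only"), (2, "old and new")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, "2017-01-15"), (11, 2, "2017-02-20"), (12, 2, "2018-05-20")],
    )
    conn.commit()
    return conn


def wrong_order_fails(conn):
    """Deleting a parent that still has children is rejected by the database."""
    try:
        conn.execute("DELETE FROM customers WHERE id = 1")
    except sqlite3.IntegrityError:
        return True
    return False


def prune(conn, cutoff):
    """Delete old orders first, then customers with no orders left."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM orders WHERE placed_on < ?", (cutoff,))
        conn.execute(
            "DELETE FROM customers "
            "WHERE id NOT IN (SELECT customer_id FROM orders)"
        )
```

With two tables this is trivial. With a few hundred tables and years of accumulated cross-references, working out that order is where the time goes.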

Enter $hurryupitslate, a company that... I don't know what they do, but they are... sorta... always late for something, apparently.

Their normal procedure is to phone requesting something. When you ask for a support ticket to be opened, they yell across the room for somebody else to enter the ticket while they keep you on the phone, and as soon as the other guy says "it's done" they keep asking if you got the ticket, and when you say "yes" they start asking why you haven't already done whatever they asked.

It's no wonder that when the phone rings and it's their number, nobody wants to pick up and suddenly everybody is really busy doing... something...

Anyhow, some time ago they did their "shtick" to get a copy of their production database moved to the test environment. Unfortunately for them, they got ME. And this means that I promptly informed them that no database could be copied to the 'test' environment, since it didn't exist, and that I wasn't going to create one without a signed contract, for which I tossed them to the wolves, I mean MarketingMan.

When they finished with that, I also informed them that the size of database server they picked for the test environment was a bit on the small side and, as such, only a very small set of data could be kept in there. Also, there was no established procedure for anonymizing the data and we couldn't devise one, because we had no idea of the data structure.

A lot of discussion later, their developers provided a script that was supposed to do the whole delete-old-stuff-and-anonymize gig. It was only a matter of testing it.

And now, let's introduce the star of the day: CL, who was the "on call" person for the day and got the not-so-nice phone call from $hurryupitslate at 9 PM to test the whole thing at once.

CL wasn't exactly thrilled about doing the whole thing. In fact, he was rather counting on a quiet evening without having to do anything at all but, as you can understand, $hurryup wasn't going to take "no" for an answer. And before somebody mentions it: no, that shouldn't have been a task to do in the evening unless it was arranged beforehand and agreed with the person, but $hurryup did what they were used to doing: called the "emergency number" and ignored the fact that it wasn't a fucking emergency at all.

Anyhow, about half an hour into the process, CL discovered a tragic mismatch between the "test" environment and the production one: the disk size. Specifically, the disk size of the database server. However, $hurryup told him, basically, not to bother because the "script should take care of that". So CL simply proceeded.

The problem was that the process required the entire database to be imported before being reduced in size. And when he attempted to do so, the import failed because the disk was too small to hold the whole production database, leaving the TEST database basically a blank slate.

And here comes the problem.
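This kind of failure can be caught before the import even starts. A minimal sketch of a pre-flight check in Python, using `os.path.getsize` and `shutil.disk_usage` from the standard library. The function names and the 1.5x headroom factor are assumptions of mine, and the size of a dump file is only a rough proxy for what the restored database will take on disk (indexes and the like are not in there).

```python
import os
import shutil


def fits_on_disk(needed_bytes, free_bytes, headroom=1.5):
    """True if needed_bytes, padded by the headroom factor, fits in free_bytes."""
    return needed_bytes * headroom <= free_bytes


def preflight(dump_path, target_dir, headroom=1.5):
    """Refuse to start an import that obviously cannot fit on the target disk."""
    needed = os.path.getsize(dump_path)
    free = shutil.disk_usage(target_dir).free
    if not fits_on_disk(needed, free, headroom):
        raise RuntimeError(
            f"dump is {needed} bytes but only {free} bytes are free "
            f"on {target_dir}: not importing"
        )
    return free
```

A check like this doesn't solve anything, but it turns "the script should take care of that" into a hard stop before the test database gets wiped.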

The correct procedure at this point would have been to refuse to proceed with anything and refer the whole matter to DB & MarketingMan the next day, but of course $hurryup wasn't pleased with that, so they pressed for a different solution. A solution that didn't exist. And this made CL confused... And when CL is confused, CL does stupid things. What he thought was "hey, how about making a copy of the db on production and running the script on the copy? Then there should be enough space to make a dump of the reduced copy and move that to Test!".

That is a nice idea, I have to admit. But... it requires running a destructive script in a PRODUCTION environment. With an already noisy customer on the phone pressing for a "speedy solution", and at a time when your brain is no longer in "high attention mode". You know what happened, of course.

What happened is that CL made a new database on production. And then ran the script without making a copy first. The result was a nicely shrunk and anonymized database. On production. Followed by a full test of the restore procedure for the production database, with the whole day's data lost.

What did we learn that day? First: do not give the 'emergency phone number' to customers. Second: the emergency phone number is for EMERGENCIES, that is, "something in production does not work"; anything else is NOT an emergency. And third: $hurryup can go fuck themselves.
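For what it's worth, a destructive script can be made to defend itself against a tired operator. What follows is only a sketch of one possible guard, not what $hurryup's script did. The database name `prod_main` and all the function names are invented for the example.

```python
class UnsafeTargetError(RuntimeError):
    """Raised when a destructive script is pointed at the wrong database."""


# Hypothetical list of databases that must never be shrunk or anonymized.
PROTECTED_DATABASES = frozenset({"prod_main"})


def check_target(target_db, confirmed_name, protected=PROTECTED_DATABASES):
    """Return target_db only if it is unprotected and was typed in again."""
    if target_db in protected:
        raise UnsafeTargetError(f"{target_db!r} is protected, refusing to run")
    if confirmed_name != target_db:
        raise UnsafeTargetError(
            f"confirmation {confirmed_name!r} does not match target {target_db!r}"
        )
    return target_db
```

The script would call `check_target()` before touching anything, so that running against the original instead of the copy fails loudly instead of succeeding quietly. It won't save you from a copy that was never made, but it does save the original.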

Davide
31/05/2018 16:30


Comments are added when, and more importantly if, I have the time to review them, and after removing Spam, Crap, Phishing and the like. So don't hold your breath. And if your comment doesn't appear, it's probably because it wasn't worth it.

8 messages  this document does not accept new posts

emi.ska

By emi.ska posted 23/07/2018 09:21

"What happened was that CL created a new database on production and then ran the script, but without copying the data first. The result was a perfectly anonymized and much smaller production database."

I was sure that would happen!!! Anyway, when a customer behaves like that, they deserve it!

Happy working to everyone who isn't under a beach umbrella but in front of a PC!!

Emiliano

-- emi.ska

Berroll

By Berroll posted 23/07/2018 09:25

Hey, to skim the sea you need a slimming-and-anonymizing script that works on a small test pond!

-- Berroll

Anonymous coward

By Anonymous coward posted 24/07/2018 12:04

CL is the least to blame, reading it like this... under the conditions imposed by the boss it was the only sensible thing to do in a hurry; the missing copy probably came down to the late hour, the tiredness and the pressure he was under.

Reading the introduction I thought you were talking about our case: network drives with stuff that is EONS old, which nobody wants to throw away because "you never know", and a director who doesn't want to spend money on backups because "first throw away what you don't need". In the middle, people who are exploited, underpaid and always to blame for everything, that is, us...

-- Anonymous coward

Messer Franz

By Messer Franz posted 25/07/2018 05:42

The day after: hey, DB, there's a sea to skim, $hurryup has it in the contract, well, it wasn't there but we offered to add it ourselves, so WE make lots of lovely money... you, on base salary minus a quarter, just have to do everything that's necessary!

By the way, to coordinate in complete synergy, we gave $hurryup your mobile number, your home landline number (and if you don't have a landline, no problem, we got you one at your expense) and also your address and physical description...

HAPPY?

Oh, I almost forgot... it has to be finished by tonight...

-- Messer Franz

Anonymous coward

@ Messer Franz By Anonymous coward posted 26/07/2018 15:40

The day after: hey, DB, there's a sea to skim, $hurryup has it in the contract, well, it wasn't there but we offered to add it ourselves, so WE make lots of lovely money... you, on base salary minus a quarter, just have to do everything that's necessary!

By the way, to coordinate in complete synergy, we gave $hurryup your mobile number, your home landline number (and if you don't have a landline, no problem, we got you one at your expense) and also your address and physical description...

HAPPY?

Oh, I almost forgot... it has to be finished by tonight...

Do you work at my place? This sounds familiar...

-- Anonymous coward

Guido

By Guido posted 30/07/2018 07:14

To say that CL wasn't exactly happy to be called at 9 PM is an understatement; he was counting on a quiet evening without having to do a damn thing

Our "pampers" hosting provider, when it wants to stall, avoids answering the ticket and then asks for authorization to proceed (even though it is explicit in the ticket itself) at improbable hours...

Or it asks for clarification on what to do (for example, if the requested operation is to copy the contents of tables A, B...Z from production to test, they make you wait a day and then send you an email asking "which tables should we copy?")

-- who uses Debian learns Debian but who uses Slackware learns Linux

Luca Ballarati

By Luca Ballarati posted 05/08/2018 21:43

This $bianconiglio (white rabbit) is more of a Mad Hatter

-- Luca Ballarati

Massimo m.

By Massimo m. posted 03/01/2019 20:15

I think reducing the amount of data in test is a double-edged sword.

On the one hand you get a snappier db, which is fine for testing updates and libraries; on the other, if you need to see whether the updates affect performance, you can get skewed results.

-- Massimo m.



This site is made by me with blood, sweat and gunpowder. If you want to republish or redistribute any part of it, please drop me (or the author of the article, if it is not me) a mail.

