Tales from the Machine Room



...only one is 28...

Backup is good, backup is nice, but you have to make it and test it. And you also need to hold onto it. Because if you make it and then throw it away, it's useless.

Back when the world was younger, backups were done with those magical things called TAPES. Mention them today and everybody looks at you like a Martian arrived from... well... Mars... Yo! Have you ever seen Star Wars? Even the Death Star plans were on "tapes", eh! Yes: a civilization that can build faster-than-light ships and lightsabers, and they still use tapes for storage. Well...

Anyhow, when we used tapes for backups, we used different tapes for different days. We had daily tapes, weekly tapes, monthly tapes and so on. This way we could go back in time until we found what we were looking for. That meant having A LOT of tapes, and keeping them ORGANIZED.
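For the younger readers, a rotation like this fits in a few lines of Python. The exact scheme below (a monthly tape on the 1st, a weekly tape on Fridays, a daily tape for every other weekday) is an assumption made up for illustration, not the rotation we actually ran, and `tape_label` is a hypothetical name.

```python
from datetime import date

# Hypothetical rotation, for illustration only: the story doesn't say
# which days went to which tape set.
WEEKDAYS = ("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")


def tape_label(day: date) -> str:
    """Return the tape set that should receive the backup taken on `day`."""
    if day.day == 1:
        # Monthly set: one tape per month, overwritten a year later.
        return f"MONTHLY-{day.month:02d}"
    if day.weekday() == 4:  # date.weekday(): Monday == 0, Friday == 4
        # Weekly set: one tape per Friday of the month, overwritten a month later.
        return f"WEEKLY-{(day.day - 1) // 7 + 1}"
    # Daily set: one tape per weekday, overwritten a week later.
    return f"DAILY-{WEEKDAYS[day.weekday()]}"


for d in (date(2019, 3, 1), date(2019, 3, 4), date(2019, 3, 8)):
    print(d, tape_label(d))
# prints:
# 2019-03-01 MONTHLY-03
# 2019-03-04 DAILY-MON
# 2019-03-08 WEEKLY-2
```

What matters is the overwrite cycle: a daily tape only ever holds last week's data, so anything older has to come from the weekly or monthly sets, and a tape shelved in the wrong slot silently shortens the history you can restore.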

Putting yesterday's tapes in the library instead of last week's meant living the whole week in fear that somebody would show up and ask for a restore of exactly that date. Which usually happened (Murphy).

Obviously, tapes were expensive and bulky, eating money and space that could have been used for plenty of other things, but if you want to sleep at night...

Today... today everything is "cloud". That basically means the backups are in the cloud too. That can be good or bad; it depends.

On the 'good' side, doing backups mostly means ticking a box that says "take a snapshot every X hours".

On the other side, the "bad" one, it also means that when you need to do a restore, first you have to figure out where the fuck your snapshot is, what its name is and how to use it, and then figure out whether you can just grab one file or you need to rebuild the whole fucking machine and then go pick up your file. And that normally takes a lot of time, coffee and swearing.

Oh, and you also need to keep an eye on how much space all these "snapshots" are using, before you get a multi-billion invoice for "archive space".

And this brings us to today's topic: retention. No, it ain't a swear word. Yet. It means that data that is "not very useful anymore", be it old backups or old logs, has to be kept for a while and then destroyed, because it's using space that could be used for something else.

Now, everybody has their own way of managing data retention: some people build monster scripts/applications that generate dozens upon dozens of 'reminders' and move stuff in and out of 'warm storage', 'cold storage', 'wtf storage' or whatever before throwing everything into the shredder; some do everything by hand (if things are manageable)... and some, in many cases, don't do squat until something blows up in their face.

And speaking of "blowing up in your face", let's talk about $salesAndDeals, a company that managed 'vouchers' and 'discount tickets'.

These people had a system based on a website that looked like it had been designed by somebody used to working in Fortran. First of all, every time a table in the database had to be changed, the table was copied into a new one with the date appended to its name, and the changes were made in the copy. The program was designed to always pick the latest table available. This meant that the db held a bazillion tables named 'User_XXXXYYZZ' or 'Product_XXXXYYZZ' and so on and so forth.
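Roughly, "always pick the latest table" boils down to something like this Python sketch. It assumes the 'XXXXYYZZ' suffix is a zero-padded YYYYMMDD date (the story doesn't spell it out), and `current_table` is a hypothetical name, not the real program's code.

```python
import re

# Assumption: the 'XXXXYYZZ' suffix is a zero-padded date (YYYYMMDD).
DATED_TABLE = re.compile(r"^(?P<base>\w+?)_(?P<stamp>\d{8})$")


def current_table(base: str, tables: list[str]) -> str:
    """Return the newest dated copy of `base`, i.e. the table the program would use."""
    stamps = [
        m.group("stamp")
        for m in map(DATED_TABLE.match, tables)
        if m and m.group("base") == base
    ]
    if not stamps:
        raise LookupError(f"no dated copy of {base!r}")
    # Zero-padded YYYYMMDD strings sort chronologically, so max() is the latest.
    return f"{base}_{max(stamps)}"


tables = ["User_20190101", "Product_20190215", "User_20190301", "User_20190115"]
print(current_table("User", tables))  # User_20190301
```

Nothing in this sketch ever drops the older copies: they simply stop being picked, so the table count only grows.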

Every first of the month, some poor asshole had to prepare the 'billing cycle', which was used to send invoices to the various customers to get the money. An extremely important task. It started by running a script that scrubbed the db and generated a humongous .csv file, which was then manipulated in Excel to be "cut" into pieces; each piece was then re-manipulated to add discounts, specials and the like. This whole mess was then processed by another script that ended up generating a million invoices to be sent to a million customers.

As you can imagine, this procedure was slow, long and prone to going pear-shaped, especially in the last phase, which was mostly manual editing. And the problem was that if something went wrong, the only way to repeat the whole thing was to restore the database to the point BEFORE the beginning, since the 'scrubbing' phase created a zillion new tables.

Somebody with a brain had suggested that $sales keep its backups for at least 30 days, to be able to go back to the previous month in case of problems.

And so we get to a nice day when $sales opened an "urgent" ticket asking to restore the db to two months before. Our CL looked at it and replied that the retention was only 30 days, so the restore wasn't possible. And then he closed the ticket faster than Lucky Luke.

20 seconds later the phone started ringing. And this is where the problems begin, because $sales pointed out that if the 'retention' is 30 days and today is the first of March, then it should be possible to go back to January 30 (since February has only 28 days). And mathematics agrees.
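$sales' arithmetic checks out with Python's `datetime` (a quick sketch; the helper names are made up): 30 days before 1 March lands on 30 January in a common year like 2019 and on 31 January in a leap year. Either way that is "two months back" by the calendar, yet still inside a 30-day window.

```python
from datetime import date, timedelta


def oldest_restorable(today: date, retention_days: int = 30) -> date:
    """Oldest date still covered by a retention window of `retention_days` days."""
    return today - timedelta(days=retention_days)


def calendar_months_back(today: date, other: date) -> int:
    """How many calendar months `other` lies before `today` (ignoring the day of month)."""
    return (today.year * 12 + today.month) - (other.year * 12 + other.month)


for year in (2019, 2020):  # 2019 is a common year, 2020 a leap year
    today = date(year, 3, 1)
    oldest = oldest_restorable(today)
    print(today, "->", oldest, "| calendar months back:", calendar_months_back(today, oldest))
# prints:
# 2019-03-01 -> 2019-01-30 | calendar months back: 2
# 2020-03-01 -> 2020-01-31 | calendar months back: 2
```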

However, I couldn't help noticing that the snapshot wasn't there anymore. In fact... THERE WERE NO SNAPSHOTS AT ALL for that customer. What the fuck? Obviously the thing (and the mails $sales sent to everybody) attracted DB's attention.

So I started looking at the "backup" script and found that... the script was supposed to compute $dayoflastbackup = $today - $numberofdaysretention, but apparently the CL who wrote this thing wasn't good with DateCalc, so he resorted to $dayoflastbackup = $firstdayofthemonth. Watch out: $firstdayofthemonth, not $firstdayofthemonthbefore. This means that on the first day of the month, everything before it gets zapped.
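A Python sketch of both versions of the cutoff, plus the kind of test that would have caught the bug: sweep every day of a year (leap years included) and count the days on which the 30-day policy isn't honored. The language of the original script isn't stated; the function names here are hypothetical and only mirror the variables above, and a snapshot is assumed to survive when its date is on or after the cutoff.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # the retention $sales was promised


def cutoff_intended(today: date, retention_days: int = RETENTION_DAYS) -> date:
    # $dayoflastbackup = $today - $numberofdaysretention
    return today - timedelta(days=retention_days)


def cutoff_as_written(today: date) -> date:
    # $dayoflastbackup = $firstdayofthemonth: nothing older than the 1st survives
    return today.replace(day=1)


def days_of_history(today: date, cutoff: date) -> int:
    # Assumption: snapshots dated on or after the cutoff (and before today) are kept.
    return (today - cutoff).days


def days_violating_policy(year: int, cutoff_fn) -> list:
    """Check every day of `year` (365 or 366 of them) and collect the days on
    which `cutoff_fn` keeps less history than RETENTION_DAYS."""
    bad = []
    day = date(year, 1, 1)
    while day.year == year:
        if days_of_history(day, cutoff_fn(day)) < RETENTION_DAYS:
            bad.append(day)
        day += timedelta(days=1)
    return bad


for year in (2019, 2020):
    print(year,
          "intended:", len(days_violating_policy(year, cutoff_intended)),
          "as written:", len(days_violating_policy(year, cutoff_as_written)))
# prints:
# 2019 intended: 0 as written: 358
# 2020 intended: 0 as written: 359
```

On the 1st the as-written cutoff keeps zero days of history, which is exactly what happened here; the sweep also shows it falls short on every other day except the 31st of the 31-day months. Testing only "today" would never have shown that.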

And that's why making the backup is only the first step: you also have to test it.

Davide
23/04/2019 15:56


Comments are added when, and more importantly if, I have the time to review them and after removing spam, crap, phishing and the like. So don't hold your breath. And if your comment doesn't appear, it's probably because it wasn't worth it.

Anonymous coward - posted 17/06/2019 10:43

mmm, this time the C(og)L(ioide) (roughly "dumbass") seems to be one of your colleagues...

I wonder how big a penalty $BigBoss had to pay, seeing that he couldn't guarantee the backup service (AND THE CONSEQUENT RESTORE) he gets paid $beisoldoni for, and above all how hard said boss tore the aforementioned CL, responsible for the outage, a new one.

Because you tell us about the misdeed but say nothing about the consequences, and we're as curious as monkeys.

When writing functions like that, I've learned that you should always check the behavior over the whole range of admissible values: in this case, every day of the year, with a dedicated script that tests all 365+1 of them (yes, I'm counting leap years too), instead of settling for "it works for today's date, so it will work for the rest of the year too". Yeah, keep dreaming!

PS: how come there are no more contributions from others? The contributors' "stories" were nice; they should be brought back.

--
Anonymous coward


Messer Franz - posted 17/06/2019 11:25

the customer must have been thrilled, above all because (if I understood correctly, and it's a strange thing to read, I'm still shaken) this time he was right (and as for the 28 thing, I think you'd have to check the contract)... I don't know why, but I'm having a vision of DB locking the guy who wrote the script in a room, pants off, together with an elephant in a springtime hormonal frenzy that's been given 200 kg of viagra, and then (DB) sitting outside with popcorn in hand, waiting for the vocalizing...

--
Messer Franz


Nik - posted 17/06/2019 12:31

So was the CL skinned alive afterwards?

--
If it crawls, it gets zapped; if it flutters, it gets killed


Davide Bianchi @ Nik - posted 17/06/2019 16:06

> So was the CL skinned alive afterwards?

Of course not... what did you think?

--
Davide Bianchi


Nik @ Davide Bianchi - posted 19/06/2019 10:48

> So was the CL skinned alive afterwards?

> Of course not... what did you think?

I see: he'll become a UL.

--
If it crawls, it gets zapped; if it flutters, it gets killed


Vrann - posted 17/06/2019 14:41

But backup tapes still exist. Ultrium LTO cartridges, now at generation 8, hold a good 12 terabytes of backup (which can even become 30 compressed). There'd be no retention problems (water retention??? :D) with a backup unit able to handle those cartridges.

(Sure, but I suspect a backup unit costs too much mon€y for the DB of the day...)

--
Vrann


massimo m. @ Vrann - posted 17/06/2019 19:11

> But backup tapes still exist.

A few years ago I got the urge to use tapes to back up my personal server (a few dozen GB), but one look at the drive prices cured me of it right away. They're not exactly cheap, quite the opposite. It almost makes more sense to buy a few external HDs.

--
massimo m.


Guido - posted 18/06/2019 10:04

...only one has 28, and unfortunately it isn't mine...

--
who uses Debian learns Debian but who uses Slackware learns Linux


Luca Bertoncello - posted 18/06/2019 18:47

Tapes, what a lovely thing... We use them at the office too (and a script checks every day that everything is OK).

$provider_schifoso uses them too. With the result that, because of a file system that got trashed over the weekend, we're still waiting right now for the restore of 6 files (read: six) totaling 10KB (read: ten kilobytes!!).

Sure, if the $provider_schifoso guy didn't waste time fooling around writing garbage in the ticket (like: "file X restored. I expect to restore file Y in three hours" and stuff like that), maybe we'd be done already; unfortunately that's not the case...

And now that we've seen how their backup works, we're panicking at the thought of what would happen to $notissima_assicurazione_sanitaria_tedesca, hosted by $provider_schifoso, if some really important files vanished into thin air the way those few files did...

Three more days until the well-deserved weekend...

--
Luca Bertoncello


Antonio Pennino - posted 18/06/2019 18:58

well... but couldn't they at least make the careless guy pay part of the damages?

say, 30 days' pay :)

--
Antonio Pennino


Tipo strano - posted 19/06/2019 08:56

The 28 one was used for the CL, right? :D

--
Tipo strano


Anonymous coward - posted 20/06/2019 11:49

just a few days ago I was thinking about a couple of "temporary" scripts I wrote a while back (2011), trying to imagine whether they were still in use.

It was a blob, a crime against computing, that checked and archived things, logging and notifying everything along the way.

A few months after I wrote the script, my relationship with the company ended abruptly (in 5 minutes of lucidity I told the biggest client to go straighten bananas with their butt, while $bigboss was invited to sit on an artichoke plant)*, so I'll never know for sure.

I suspect it met the same fate as a certain temporary firewall.

* long story; a reaction that caused me some trouble at work, but one I don't regret, quite the opposite.

--
Anonymous coward


Anonymous coward - posted 21/06/2019 14:10

Or it could happen that, because of an electrical mess (let's not go there...), half the virtual machines go down, get promptly reset and rebooted the next morning, and they all come back except one, which OBVIOUSLY is the server that handles the backups (and the restores...), and support takes more than a month to fix everything...

--
Anonymous coward


Anonymous coward - posted 11/07/2019 23:43

I imagine the CL got promoted to a management position where he'll do less damage, while enjoying %attuale_salario * 1.5. That's how it works, folks!

--
Anonymous coward


Anonymous coward - posted 05/08/2019 13:36

how did it go? ah, yes:

"Thirty days has November, with April, June and September,

twenty-eight has January,

all the others... have a couple"

That's all: the guy misremembered the rhyme :-)

--
Anonymous coward




This site is made by me with blood, sweat and gunpowder; if you want to republish or redistribute any part of it, please drop me (or the author of the article, if it's not me) a mail.


This site was composed with VIM; now it is composed with VIM and the (in)famous CMS FdT.

This site isn't optimized for viewing with any specific browser, nor does it require special fonts or resolution.
You're free to view it as you wish.
