Author Topic: QRZ Logbook Server Temporarily Down  (Read 3921 times)

Offline VE2UGO Hugo

  • Guru member
  • *****
  • Posts: 871
  • Rapid Deployment Amateur Radio
    • http://files.qrz.com/m/kd4sm/life.jpg
QRZ Logbook Server Temporarily Down
« on: 13 November 2014, 22:36:10 »
Well, that's just great!!! It has been like this since about 4 p.m. here!!!

Due to unforeseen circumstances, the QRZ Logbook Server is temporarily down. You will notice that your Logbook will be unavailable. You will also notice that the Logbook tab will be blank during this time. This outage could last as long as 24 hours, though we hope that we will be able to restore service much more quickly. We assure you that we will make the database available as soon as we are able. We sincerely apologize for any inconvenience caused by this event.

Offline VE2UGO Hugo

Re: QRZ Logbook Server Temporarily Down
« Reply #1 on: 15 November 2014, 02:03:33 »
Logbook Outage Update

First, thanks to all for your patience. While we're waiting on the database to rebuild, I'd like to take this moment to provide some greater insight into what happened.

Last month, our cloud service provider, Amazon, distributed a set of mandatory security patches to all of our servers. These patches required system reboots and we never knew exactly when they would come. The "reboot day" came and went with little fanfare. We did notice a few problems on the Forums server which were immediately corrected. Everything else survived the reboot and was working properly, including the Logbook server.

This week, while performing routine health checks on our servers, we noticed that a key disk drive on the Logbook machine was filling up and had reached 86% full. While looking at ways to remedy this, we noticed that the error log file on the Logbook server had grown to an enormous size, nearly 300 Gigabytes. Looking at the log, there was an indication of an internal fault in the Logbook database which was causing a flood of warning messages. Despite these warnings, the Logbook server was apparently running normally.
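For readers who run their own servers, the kind of routine check described above can be sketched in a few lines of Python. This is only an illustration, not QRZ's actual tooling: the function name `check_health`, the thresholds, and the paths in the usage section are all hypothetical.

```python
import os
import shutil


def check_health(mount_point, log_path, disk_warn=0.80, log_warn_bytes=1024**3):
    """Return warning strings for a nearly full disk or an oversized log file.

    disk_warn and log_warn_bytes are hypothetical thresholds chosen for
    illustration (80% full, 1 GiB); tune them for your own system.
    """
    warnings = []

    # shutil.disk_usage reports total, used and free bytes for the filesystem.
    usage = shutil.disk_usage(mount_point)
    used_fraction = usage.used / usage.total
    if used_fraction >= disk_warn:
        warnings.append(f"{mount_point} is {used_fraction:.0%} full")

    # A log file that keeps growing is often the first visible symptom.
    if os.path.isfile(log_path):
        size = os.path.getsize(log_path)
        if size >= log_warn_bytes:
            warnings.append(f"{log_path} has grown to {size} bytes")

    return warnings


if __name__ == "__main__":
    # Hypothetical mount point and log path, for illustration only.
    for message in check_health("/", "/var/log/example-db/error.log"):
        print("WARNING:", message)
```

Run from a scheduler, a check like this would flag both symptoms mentioned in the update: the drive at 86% and the error log that had grown to nearly 300 gigabytes.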

Although the server was still running and the data was (apparently) still intact, we shut the server down as a precautionary measure. At first, we didn't know if this was a problem that we could correct immediately, or one which would require an extended outage. Accordingly, our first announcement indicated that the outage was less severe than it was, based on what we knew at the time. When we discovered that a full database reload was the only fix, we amended our announcement to more closely convey the situation.

The problem was extremely esoteric, and it is our belief that it was caused by last month's forced reboot. The most trustworthy fix was to dump the entire database and reload it into a fresh database engine. This is easy, technically, but very time-consuming, as there are now in excess of 72 million QSOs (stored in 67 gigabytes) to insert and index back into the database. The time needed to re-insert these records is not immediately known; however, at this moment, about 18 hours later, the database operation appears to be about 50% complete. Clearly, the Logbook won't be back this afternoon, and the reload may well take another 18 hours to complete.
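The estimate above is simple arithmetic, and it can be reproduced with a short Python sketch. The figures come straight from the update; the function name `estimate_reload` and the assumption of a constant insert rate are ours, and real reloads often slow down as indexes grow.

```python
# Back-of-the-envelope estimate using the figures quoted in the update.
TOTAL_QSOS = 72_000_000   # "in excess of 72 million QSOs"
ELAPSED_HOURS = 18        # time spent on the reload so far
FRACTION_DONE = 0.50      # "about 50% complete"


def estimate_reload(total_rows, elapsed_hours, fraction_done):
    """Return (rows_per_second, remaining_hours), assuming a constant insert rate."""
    rows_done = total_rows * fraction_done
    rows_per_second = rows_done / (elapsed_hours * 3600)
    # If fraction_done took elapsed_hours, the rest takes proportionally as long.
    remaining_hours = elapsed_hours * (1 - fraction_done) / fraction_done
    return rows_per_second, remaining_hours


rate, remaining = estimate_reload(TOTAL_QSOS, ELAPSED_HOURS, FRACTION_DONE)
print(f"~{rate:.0f} QSOs/second, ~{remaining:.0f} hours remaining")
```

Under that constant-rate assumption, the reload is running at roughly 556 QSOs per second, which matches the "another 18 hours" figure given above.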

While we regret the outage, we still believe that, given the circumstances, the precautionary shutdown was the best course to take to avoid the loss of QSO data. For those IT professionals among us, the same issue exists even with replicated databases, as the error seems to propagate to replication servers. We don't use replication; however, we do snapshot the entire 500GB database every night, and we have several weeks' worth of back copies to draw upon in the event of a more substantial disaster.

Again, we thank you for your patience and understanding as we are working diligently to bring the server back online.

73, -fred
Fred Lloyd, AA7BQ

Offline VE2UGO Hugo

Re: QRZ Logbook Server Temporarily Down
« Reply #2 on: 15 November 2014, 19:56:26 »
The logbook seems to be back.

As far as mine is concerned, there don't seem to be any errors!

Good DX!

Michel VE2CRH

  • Guest
Re: QRZ Logbook Server Temporarily Down
« Reply #3 on: 15 November 2014, 20:37:09 »

Hello,

As for me, after checking my log, I didn't find any errors.

Good DX,