catallaxy files

catallaxy in technical exile

Jacques Chester is a god II

with 19 comments

Following the tech disaster last Monday, Jacques is trying to recover as much as he can from web caches etc. Basically where we stand right now is that everything after 2006 is lost. The National Library has a copy of Catallaxy up to October this year, and everything after October needs to be recovered. Even after Jacques recovers what he can, it will need to be programmed to reappear in blog format. So it looks like a huge and complex job ahead.

Sinc

Written by Admin

December 16, 2009 at 8:22 pm

Posted in Uncategorized

19 Responses

  1. I regularly export a copy of my blog to my computer – worth doing in future (and a lesson learned from multiple Catallaxy server disasters when I was a blogger there).

    Andrew Norton

    December 16, 2009 at 9:38 pm

  2. That’s both more and less serious than I feared – to lose years’ worth is bad, but lucky for the National Library (and Jacques of course *bows respectfully*). At least it won’t be a repeat of the Great Server Crash that lost us the first and best Thread of Doom. Or am I confusing that with one of the other crashes?

    Jarrah

    December 16, 2009 at 10:23 pm

  3. Sinc, this actually demonstrates that Jacques has not done a good job of operating your site.

    A competent admin maintains readily-deployed backups and also protects the site against disruptions in the first place.

    Sanjay

    December 16, 2009 at 11:35 pm

  4. Indeed. Jacques has not been the technical support person, C8to has been the technical support person. Jacques is the person we’ve asked to do the FUFU work.

    Admin

    December 16, 2009 at 11:38 pm

  5. It’s the lack of a pyramid based security system. You mock bird at your peril

    pedro

    December 17, 2009 at 2:21 am

  6. Constructive criticism here and I stress, constructive.

    But look, Catallaxy is a pretty popular, important Australian blog. I think it’s time it had an Instapundit-like professionalism about its appearance and security. It’s a run-down looking piece of crap at the moment and the previous platform wasn’t much better.

    Time to decide whether you want a proper modern blog or an internet jalopy.

    C.L.

    December 17, 2009 at 2:48 am

  7. On the plus side the font is so much easier to read now. Respite for my poor eyes

    Jason Soon

    December 17, 2009 at 3:49 am

  8. CL – I hear what you’re saying – right now this is a patch job. Most importantly this is our hobby. Although I’ve come to a greater appreciation of the reach of catallaxy in the last few days as I’ve had communication with people wondering what has happened.

    Sinclair Davidson

    December 17, 2009 at 3:53 am

  9. You mean the embarrassing arguments reach a wide audience? Yikes!

    pedro

    December 17, 2009 at 4:00 am

  10. What embarrassing argument? 🙂

    Sinclair Davidson

    December 17, 2009 at 4:03 am

  11. Basically the problem is that the export of the original data failed. WordPress has an inbuilt export tool which spits out a “WXR” file. But on a site with lots of stories and comments – like Catallaxy – this takes time. Minutes.

    Most shared hosts, like the one Catallaxy was on, have settings to kill any program that runs for more than 60 seconds. So the Catallaxy export was killed mid-flight, as you can see. But WordPress doesn’t tell you that the export failed; the failure is silent. So it’s completely understandable that c8to thought he had a full backup.

    At the moment I am getting in touch with someone I know at the NLA. I’m going to ask for a copy of the structured archive they keep, which should be quicker and easier to write an import program for. I also found a tool called “Warrick”, which is automatically spidering web caches to obtain copies of every post I can. This is very time consuming as it cannot request items more than a few times in any 10 minute period. It will probably take days to complete the downloads from the caches.

    Once I have those two sources of data, I will need to write and test a script to combine their data with the incomplete backup, then spit out an SQL file that I can upload to the new server. This will probably take a few days to get right. But hopefully, between the three sources, we should be able to recover a large fraction of Catallaxy as it was up to last week.

    As for preventing future disasters, I have opened an account with the remote backup service Tarsnap. I’ve been following its development off and on for a while and it seems to fit my requirements perfectly: it allows efficient full and incremental backups, is cheap, and uses strong cryptography to secure the backups. I plan to add other sites I administer to the same regime soon.

    In summary: recovery works are under way, but it will take at least a week before full service is restored. Feel free to tell your friends and colleagues that I am utterly amazing and available to consult or freelance at very reasonable rates.

    Jacques Chester

    December 17, 2009 at 3:49 pm

  12. Jacques Chester really is a god!

    jtfsoon

    December 17, 2009 at 5:22 pm

  13. Yes. I am setting up a little Roman shrine in a little grotto in my house with a wooden sculpture of Jacques and some candles.

    dover_beach

    December 17, 2009 at 7:18 pm

  14. Google Reader has a massive cache, to October and further. You might be able to just sign up to Reader, add catallaxy and scroll through “all items”, alternatively send me an email and I’ll fetch all the HTML for you; it’s pretty nicely structured so you can just parse it apart into stories.
    — Rod

    Rod

    December 17, 2009 at 9:02 pm

  15. In fact, it’s pretty simple so here it is, back to the beginning of October, not too hard to pull apart, no comments though of course:

    http://pastebin.com/m7803f88a

    No idea how much further you could go, I imagine it’ll be turtles all the way down though if you just keep scrolling!

    And while you’re receiving unsolicited tech advice, Amazon AWS is a brilliant solution, very cheap with block-device-level differential backups, awesome.

    — Rod

    Rod

    December 17, 2009 at 9:20 pm

  16. Excellent idea, Rod, and thanks for the pastebin entry. However I will continue to follow up with the Warrick script, as it will also try to retrieve images and attached files from web caches and archives.

    I’m still trying to reach my contact at the NLA. All of this might come together faster than originally hoped.

    As it happens, Tarsnap uses Amazon S3 as its storage backend.

    Jacques Chester

    December 17, 2009 at 9:58 pm

  17. Rod – thank you for that.

    Sinclair Davidson

    December 18, 2009 at 8:24 am

  18. […] constructive commentary from CL. Catallaxy is a pretty popular, important Australian blog. I think it’s time it had an […]

  19. No Chester is not a god. Chester is God.

    JC1

    January 21, 2010 at 2:01 am
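
Comment 11 describes a WordPress export that was killed partway through without any error being shown. A WXR file is XML, so a file that was cut off will not parse as well-formed XML. The sketch below is one possible way to check an export before trusting it as a backup. The function name count_wxr_items is hypothetical and is not part of WordPress; the only assumption about the format is that each post appears as an <item> under <channel>.

```python
import xml.etree.ElementTree as ET


def count_wxr_items(data):
    """Parse WXR export data (bytes or str) and return how many <item> entries it holds.

    Raises ValueError if the data is not well-formed XML, which is what a
    file cut off partway through looks like to a parser.
    (Hypothetical helper, not a WordPress API.)
    """
    try:
        root = ET.fromstring(data)
    except ET.ParseError as err:
        raise ValueError(f"export is truncated or corrupt: {err}") from err
    if root.tag != "rss":
        raise ValueError(f"unexpected root element <{root.tag}>, expected <rss>")
    # Posts, pages and attachments each appear as one <item> under <channel>.
    return len(root.findall("./channel/item"))
```

Reading the export file in binary mode and passing the bytes to this function would either return an item count, which can be compared with the number of posts the site reports, or raise an error that makes the failure visible.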

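Comment 11 also notes that the web caches cannot be queried more than a few times in any 10 minute period. Warrick handles this itself and its internals are not shown here. As a rough illustration of the general idea only, this is a sliding-window limiter. The class name and the figures of 3 requests per 600 seconds are assumptions for the example, not the caches' real limits.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls in any window of window_seconds.

    Hypothetical helper for illustration; not taken from Warrick.
    """

    def __init__(self, max_requests, window_seconds,
                 clock=time.monotonic, sleep=time.sleep):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # times of recent requests, oldest first

    def wait(self):
        """Block until one more request is allowed, then record it."""
        now = self._clock()
        # Forget requests that have already left the window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()
        if len(self._stamps) >= self._max:
            # Sleep until the oldest recorded request falls out of the window.
            self._sleep(self._window - (now - self._stamps[0]))
            now = self._clock()
            self._stamps.popleft()
        self._stamps.append(now)
```

Calling wait() before each cache request, for example with SlidingWindowLimiter(3, 600), spaces the requests out automatically, which is also why a full crawl can take days.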

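The recovery plan in comment 11 combines three sources: the incomplete backup, the NLA archive and the web caches. The actual script is not shown in the thread. A minimal sketch of the merging step might look like the following, assuming each source has already been parsed into a dictionary keyed by permalink, and assuming the backup is preferred over the NLA copy, which is preferred over a cached copy. The function name, the dictionary layout and that priority order are all assumptions.

```python
def merge_sources(*sources):
    """Merge dictionaries of permalink -> post, where earlier sources win.

    Hypothetical helper: a post already taken from a higher-priority
    source is never overwritten by a later one.
    """
    merged = {}
    for source in sources:
        for permalink, post in source.items():
            merged.setdefault(permalink, post)
    return merged


# Illustrative data only; the permalinks and fields are made up.
backup = {"/2006/01/a": {"title": "A", "origin": "backup"}}
nla = {
    "/2006/01/a": {"title": "A", "origin": "nla"},
    "/2009/09/b": {"title": "B", "origin": "nla"},
}
caches = {
    "/2009/09/b": {"title": "B", "origin": "caches"},
    "/2009/12/c": {"title": "C", "origin": "caches"},
}
merged = merge_sources(backup, nla, caches)
```

Keeping track of where each post came from makes it easier to spot gaps afterwards, since anything present in none of the three sources simply never appears in the merged result.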