The original proposal for the World Wide Web, written by Tim Berners-Lee in 1989, is an important piece of internet history. It also cannot be opened on modern computers.
John Graham-Cumming, a British software engineer and writer, attempted to open the Word document containing the proposal. Modern versions of Microsoft Word and Apple’s Pages failed to open the file, as he explained in a blog post. The open source word processor LibreOffice did open it, albeit with formatting problems. Graham-Cumming eventually found a PDF exported by CERN in 1998, which was the only way he could view the document as it existed in 1989.
It’s worrying that such an important piece of history, in such a common file format, could be almost completely lost to the passage of time and software updates. Anyone with a collection of old digital documents, photos, and videos might wonder if the same will happen to their archives — which is the kind of question digital archivists deal with all the time, it seems. So I reached out to one of them.
“Twenty years, in the digital realm, is a long time,” says Lance Stuchell, director of digital preservation services at the University of Michigan. His team is frequently tasked with recovering digital files from old computers and storage media. “We have a lab that can handle old media — floppy drives, CDs, old computers. We can take them off those types of media and move them into our preservation system, making sure we don’t damage them while we’re at it.”
But getting files off a disk is just the first step: they then have to be opened and stored in a form that will stay readable for decades. It’s a job that has given Stuchell reason to think about strategies for preserving documents as long as possible. I asked him what those of us who aren’t professional archivists should do to ensure our archives last for decades.
Use open formats
The Word document I mentioned earlier could no longer be opened in Microsoft Word because the software had changed over time. This is part of the challenge of archiving digital files.
“With physical objects, the less you look at them, the longer they last,” Stuchell says. “With digital objects, we are constantly fighting against obsolescence. As the archive progresses in time, it loses information.”
Updates to software like Microsoft Word mean that files that opened fine in the 1980s won’t open in the 2020s. Part of the problem is that Microsoft, and only Microsoft, controls the file format and knows exactly how it works. For this reason, Stuchell says he encourages people to export files to an open file format, especially files they want to keep accessible long-term.
For documents, he recommends PDF/A, an open standard built on top of Adobe’s PDF format that includes everything the file needs to open, including the fonts used in the document. Microsoft Office, LibreOffice, and Adobe Acrobat all support exporting to PDF/A, meaning it’s relatively easy to create such a file. Stuchell recommends that you archive any documents you want to keep in that format.
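If your archive is large, it may help to first find out which files are worth exporting. The short Python sketch below is one possible way to do that, not something Stuchell described: it walks a folder and lists word-processor files by extension. The function name `find_export_candidates` and the list of extensions are assumptions made for illustration, so adjust them to match your own files. The script only lists files; the PDF/A export itself still happens in your word processor.

```python
from pathlib import Path

# Assumption: an illustrative, non-exhaustive set of word-processor
# extensions you might want to keep a PDF/A copy of.
EXPORT_CANDIDATES = {".doc", ".docx", ".pages", ".wpd", ".odt"}


def find_export_candidates(archive_dir):
    """Return a sorted list of files under archive_dir whose extension
    (compared case-insensitively) is in EXPORT_CANDIDATES."""
    root = Path(archive_dir)
    return sorted(
        path
        for path in root.rglob("*")  # walk the folder and all subfolders
        if path.is_file() and path.suffix.lower() in EXPORT_CANDIDATES
    )
```

Calling `find_export_candidates("Documents/archive")` (a hypothetical folder name) returns the matching paths, which you can then open and export one at a time.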