When it comes to long-term data storage, modern man has yet to improve upon the clay tablets of Sumer. Tablets imprinted with cuneiform writing and baked in a kiln are as readable today as when first fired as far back as the Bronze Age and will be intact ten thousand years from now. However, clay tablets are neither portable nor convenient. The search for better writing mediums led to improvements fueling the growth of civilization. Each innovation brought with it a loss in permanence, a trade-off people made in exchange for convenience, lower price, and greater ease in transportation. Nowhere is the paradox more pertinent than with digital data. Digital data enjoys immense advantages in terms of storage, duplication, and transmission costs over any other medium in history, yet may prove to be the most short-lived of all.
The Written Word – Mediums of the Past
Papyrus and parchment were the media preferred by the ancient world. Papyrus, made from the pith of the papyrus plant, could last for thousands of years in the dry climate of the Middle East. Parchment, made from the skin of calves, sheep, or goats, was tough. Illuminated parchment manuscripts from the Middle Ages are still readable today. Paper first appeared in China around 200 B.C. and slowly spread to the west, first to the Arabs, and later to Europe, coming into wide use by the 14th century. Cheaper and easier to produce, paper replaced parchment in most everyday use, leaving parchment for documents meant to last, which is why formal copies of the Declaration of Independence, the Constitution, and the Magna Carta are all parchment. From the Middle Ages until the 19th century, paper was made from pressed cellulose, such as vegetable fibers or rags. Cotton or hemp rag paper could last anywhere from 300 to 800 years in ideal conditions.
The 19th century changed things. The invention of pulp paper created an explosion of books and reading not seen since the invention of the printing press. Pulp paper was cheap and fast. Without it, newspapers and periodicals would have been too expensive to produce. One could argue the Industrial Revolution depended as much on pulp paper as coal and the steam engine. Mass market literature and the rapid dissemination of ideas went hand in hand with the telegraph, telephone, railroad, and steamship. There was a hidden danger, one unrecognized until William Barrow discovered it in the 1930s. Paper made from wood pulp tended to be high in acid, courtesy of the lignin caught in the pulp during production. Over time, pulp paper tends to turn yellow and brittle. By the end of the 20th century, there were frantic concerns about losing all the books, newspapers, and periodicals printed since 1850 due to the acid eating away the paper. Some envisioned future generations looking back and seeing a two hundred year gap, like another Dark Age, and lamented the effect on culture and civilization.
This will be controversial, but the threat of deterioration present in acid pulp paper may have been overstated and the cure, in some cases, has been worse than the threat itself. Those who claimed acid paper would disintegrate into powder or that it would last only 50 to 70 years have not been wholly right. Pulp paper can last up to 200 years if handled and stored with care. The short fibers in the pulp do mean the paper can break if bent or creased. I have yet to hear of an authenticated case of a pulp book disintegrating on its own; where it does happen, mold or fungi are usually involved. The acid paper scare, although based on a real problem, did result in successful preservation projects and research that produced acid-free paper, such as “permanent paper”, that might rival rag paper in lifespan (although claims of 500 to 1,000 years for archival grade alkaline paper may be unrealistic). Ironically, it may also have created the very loss of history and culture feared in some quarters by causing libraries to discard old books and newspaper archives, moving it all over to microfilm and microfiche, which have their own problems and issues with long-term preservation few seem willing to address. However, that is a separate article.
Paper and the printed word remain a key component of the modern world, although the internet and the digital revolution are causing a sea change in the publishing world as reading habits and expectations shift.
Photos, Film, and Audio
Similar problems and issues crop up in other media. Until the 1950s, film stock was celluloid, which is so flammable it can explode and degrades over time. Today we have lost perhaps 90% of all the films made before 1929. The first photographs were daguerreotypes and tintypes, both expensive but long lasting with proper care. Photographic paper was cheaper, but not as durable, especially if not stored away from heat, light, and moisture, the three greatest enemies of paper and film. No one is sure how long photo prints and slides will last, so figures are all over the place. There are too many variables in ink chemistry and paper choice. A photo print on display in a frame may last a few decades because of UV exposure, but will last longer in a photo album. Professional photos can last between 65 and 75 years before fading or discoloration. Photos from local pharmacies or elsewhere may last only 50 years or less. Photos 150 years old exist, indicating a wide range of possibilities. Archival photo paper claims a shelf life of 100 to 200 years, but it is unproven. Developed negatives can last up to a century while undeveloped negatives go stale within three to six years unless refrigerated or frozen. Freezing extends the life of film past the expiration date, but background radiation renders it unusable over time.
Recorded music went from wax cylinders to 78 rpm records, long playing vinyl albums, magnetic tape, 8 track tapes, cassette tapes, DAT tape, CD, SACD, and now MP3, FLAC, and Ogg Vorbis files. In each case, a cheaper product holding more information in a more convenient package replaced the preceding physical medium. The same holds true in the digital realm, although with the curious fact that the digital realm saw a regression in quality and longevity, from wax and vinyl albums still playable a century later to digital downloads that can degrade within a few years, that tracks closely to the increase in portability and convenience. It is almost as if the jump in density of information carried with it a penalty in robustness.
I won’t get into the debate on digital sound or picture quality. The flame wars over the superiority of one format over another continue to this day and are brutal. In broad, crude terms, the original 16 bit digital tracks on CDs are an approximation of the analogue wave form. There is missing information. That is the nature of the digital beast. You replace a physical analogue object with one made of zeros and ones, a construct of electronic mathematics, and while you gain tremendous benefits, there are compromises. The real debate devolves to whether or not people can hear the differences in quality. Still, I suggest if one compared a 16-bit CD to the original analogue source, whether tape or LP, the difference in the midrange to treble can be astonishing. Compare a CD to an SACD and the differences can be just as stark. (For fun, do a search on the hotly debated 2007 experiment by Meyer and Moran that probably contributed to companies abandoning SACD.) Now consider how the MP3 takes the quality issue and throws it under a bus, resolving to a fraction of the original digital source. The fall in audio resolution was offset by two things: smaller file size, which meant you could store more in smaller spaces and transmit it more easily over old dial-up modems, and the ability to format shift music from one media to another.
The ability to move digital data, whether words, music, video, or pictures, from one storage media to another has been essential for the information age and the digital revolution. No matter how much some in the music and film industry lament phantom lost dollars, the truth is format shifting increased demand for content because customers, the people who bought and owned the data, want to enjoy it as they please and not how someone else tells them they should. Because the data was digital, the barrier to copying was low and modern culture became hooked on the easy and cheap availability of data, information, and entertainment. One sometimes wonders why companies cannot understand that if you give people what they want, they will pay for it hand over fist.
Photography mirrors the audio world. The first digital images were cartoons in terms of resolution compared to 35mm film. That soon changed. DSLR cameras now match and exceed the resolution of film. Again, the debates over film resolution get nasty. I’ve seen claims that 35mm film equals anywhere from 6 megapixels to 36 megapixels. The quality of a 35mm negative will vary tremendously depending upon type of film, the lens, lighting conditions, and other factors. My personal opinion based on experience is that film, under good conditions, occupies a spectrum of quality from 12 to 24 megapixels with 18 megapixels as an average. The only real advantage film continues to offer is a greater color bit depth and dynamic range, resulting in gradations of color and tone no digital camera can yet match. The best digital cameras capture 14 bits of color while film can reach 24 bits of color or more. Different file formats offer different advantages and disadvantages. JPEG is the most popular choice. It uses varying levels of compression, resulting in files that can be small but lossy or large with better quality. Despite alternatives like the JPEG 2000 standard, no one has come up with a viable replacement for the JPEG, for all its flaws, which include a tendency to degrade the more you edit and re-save the original file. Professional photographers who shoot digital prefer RAW files or convert the result to TIFF or DNG files. The problem is you gain quality but lose portability: a RAW or TIFF file can be enormous. Data on longevity for digital image files is hard to come by, but JPEGs can corrupt over time even if never touched. I’ve seen estimates of 30 to 50 years for digital photos and I feel that is overstating actual life by a factor of five. Unlike audio, photography is on a quest to develop higher resolution, higher ISOs, higher dynamic range and color depth. Within ten years I expect digital cameras to rival film in every area where it currently excels.
Given the race to higher resolution in photography and video, I suppose people are more willing to trust what they see rather than what they hear.
Like pulp paper before it, digital data allowed for the cheap production, transmission, and storage of information. The rate of creation is accelerating to unprecedented levels. Take a look at the following infographic from domo.com:
Digital data soon included words, images, music, and much else. The only limits on digital data became how you store and transmit it. Best of all, it could be manipulated with ease. Digital data, zeros and ones like the dot dash of a telegraph, could be compressed, rearranged, truncated, expanded, upsampled, downsampled, shaken and stirred like a martini – all for almost nothing.
The result was an information age whose effects still reverberate around us. All the issues noted above with paper, film, and photographs replicate themselves in the digital realm, though with different implications, multiplying when you factor in file formats, operating systems, computer languages, and planned obsolescence. One wonders how much of this staggering content will be available five or ten years down the road and how much will be lost or forgotten within twenty.
Lifespans of Digital Media
Forgive the history dump. There is a method to my madness. If you’ve read up to this point, you might ask “so what? What does all this mean to me?” Fair enough. Consider this graphic, found on crashplan.com:
Please note the infographic is short on nuance. Magnetic tape was the original media of choice for digital data. It has a lifespan of 30 years. Cassette tapes can last up to 30 years, with 10 to 20 years as an average. Floppy disks are variable. They can last 3 to 5 years or up to 20, depending upon conditions.
A commercially pressed CD can last 30 or more years if kept away from heat, sunlight, and excessive handling. CDs made with gold layers may last even longer. DVD-Rs and CD-Rs may only last 5 to 10 years according to the National Archives, a situation compounded by a phenomenon known as “CD rot” or “disc rot”. I have encountered this myself: three CDs burned about ten years ago are now virtually unplayable because of the presence of static equivalent to listening to a distant AM radio station at midnight with a hangover. Giving a new meaning to the term “CD rot”, there have been documented cases of mold or a type of fungus in South America literally eating away the CD.
USB flash drives are new enough that no hard figures exist, so the infographic may be optimistic. The main limitation of a flash drive lies in the number of times you can overwrite data: a flash drive may last anywhere from 10 to 75 years, depending upon usage and build quality of the drive.
Hard drives have grown in size as they dropped in price. One manufacturer recently unveiled an 8 terabyte drive. Like a flash drive, a computer hard drive's lifespan will vary depending on usage. A server hard drive might fail within 3 to 5 years while a typical consumer hard drive might last between 7 and 10 years; a high end drive may last even longer. Solid-state drives usually have a write limit; once it is reached, failures become increasingly likely (though not all manufacturers provide those numbers). For example, Samsung rates its 256 GB EVO solid-state drive for 1000 writes. In theory, you could write all 256 gigabytes one thousand times, equaling 256 terabytes, before it reaches a state where failures are likely. (Most experts believe the real world figure to be 25% of the theoretical rewrites simply because some areas will wear faster than others: the distribution of writing will not be uniform.) The average computer user writes 10 gigabytes a day to their computer. At 25% of the maximum, or 64 terabytes, that translates to 6,400 days, an impressive seventeen years. A recent real-world test of solid-state drives suggests they are more durable than anyone thought and the results are intriguing.
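The endurance arithmetic above is easy to rerun for a different drive or a different writing habit. What follows is only a rough sketch in Python: the function name and its parameters are my own invention for illustration, and the 25% real-world factor is the rule of thumb quoted above, not a manufacturer specification.

```python
def ssd_lifespan_years(capacity_gb, rated_full_writes, daily_writes_gb,
                       real_world_fraction=0.25):
    """Estimate the theoretical endurance and the years until an SSD
    reaches its assumed real-world write limit.

    real_world_fraction is the rule-of-thumb share of the theoretical
    rewrites you can expect before uneven wear becomes a problem
    (an assumption, not a vendor figure).
    """
    total_gb = capacity_gb * rated_full_writes       # theoretical limit in GB
    theoretical_tb = total_gb / 1000                 # decimal terabytes
    usable_gb = total_gb * real_world_fraction       # assumed real-world limit
    days = usable_gb / daily_writes_gb
    return theoretical_tb, days / 365.25


# The figures from the text: 256 GB drive, 1000 full writes, 10 GB written per day
theoretical_tb, years = ssd_lifespan_years(256, 1000, 10)
print(f"Theoretical endurance: {theoretical_tb:.0f} TB")
print(f"Estimated lifespan: {years:.1f} years")  # roughly 17.5 years
```

Plug in your own drive's rated writes and a more honest guess at your daily usage; heavy video editing, for instance, would shrink the estimate considerably.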
The cloud? Now that is the most problematic storage media of all.
Data Storage and Some Best Practices
What does this mean to you? How do you store all your data? How do you keep it so you can access it ten, twenty, thirty years down the road? What happens if your computer hard drive dies, taking with it all your family photos, all the music or movies you have downloaded or paid for, the software you depend on? Or what if there is a flood, earthquake, or hurricane? What happens when your old VCR tapes die, most certainly at the same time you realize you cannot find a VCR? Or if you’ve stored all your data on CD but can’t read it because the disks have gone bad or you’ve moved on to new software that cannot read the old data?
There was a case a few years back where NASA realized they had hours and hours of film footage from the Apollo moon landings – and no way to view or copy them. They had to call in former NASA engineers and rebuild a machine from spare parts to view the footage. This could easily happen to ordinary users a decade or two down the road. What happens if your grandkids want copies of your old photos, but cannot open the files because of the format or because of data rot? Or let’s say you saved your will in an obsolete file format and no one has a computer or software capable of loading or reading it? It is not far-fetched. There have been cases where criminal investigations got stalled because the criminal operated a computer system so old no one could copy or read the data. There may be possible legal challenges, too, in attempts to reverse engineer obsolete or abandoned software, file formats, and the hardware needed to read or display the data.
A few might say, “put it in the cloud!” The cloud may be a good short-term solution, but one must consider that digital data has to be stored on something. If it is the web, that means a server and a hard drive. Hardware is not free. Someone has to pay for server space and hosting. So who foots the bill? What happens if servers or hard drives fail? What happens when the hosting service goes out of business? I know photographers who lost thousands of images when their hosting service shut down, giving users 24 hours to retrieve their data before deletion.
Take a look at the Megaupload fiasco and the resulting still unresolved legal case. Megaupload was a giant among online storage lockers. You paid a fee and you got so much space to store your data and even share it with others. Unfortunately, it became a popular place to trade and share copyrighted movies. The U.S. Justice Department shut down Megaupload in 2012 on charges of hosting copyrighted material. The U.S. government stymied all efforts to retrieve legitimate data stored on the Megaupload servers. So who really owns the data you’ve stored in the cloud and will it be there when you need it?
The same goes for streaming services and anything else where files or data are in the control of someone else. Governments and corporations are not the only actors who might cause problems or have an interest in your data. Black hat crackers routinely hit personal computers with viruses and ransomware or turn them into part of a botnet. The cloud is now a prime target, an alluring new frontier of crime. One hears of security breaches on an almost monthly basis -- and the tempo seems to be increasing.
There is no silver bullet to fix the problems listed above. For most people, the best thing they can do is settle on a system of backups and stick to it. I said “system” because you should never rely on a single source for backups. A single source of backups becomes a single point of failure. Redundancy is essential. A combination strategy is the wisest choice, even if it takes more work and trouble to maintain.
The first question one should ask is what to back up. Most experts say users should concentrate on documents, recent documents if you use a program that can handle incremental backups, application data (data files for your programs, browser bookmarks, etc), system backup of your operating system, and backups for any important files such as photos, music, and video with sentimental value. So-called heirloom files should receive special priority. A typical progression for the thorough is to make a system image (a snapshot of your entire computer system), a local backup of files on a regular basis, some system to archive old data, and an off-site or online back-up system with some method to sync all the data.
Before you choose a system, look again at the lifespans I quoted and think about what happens if data corrupts at any point without any alternative source for recovery. Think about the three types of back-ups: full, incremental, and differential. Full back-ups copy everything. Incremental back-ups copy only files changed since the last back-up of any kind, whether full or incremental. Differential back-ups copy everything changed since the last full back-up. Which type suits your needs? Now look at the different backup strategies.
The easiest and most popular way to back up data is to use an external hard drive featuring automatic backup software. External hard drives range in price from $50 to $200 depending upon size and features. Expect to pay more for hardened hard drives designed to survive water and fire damage. The software and capabilities of external hard drives vary greatly. Some offer complete backup solutions for system, data, and files, including all three types of back-ups. Others back up just files or certain files. External hard drives have one advantage not often listed: if you need to evacuate, an external hard drive is easier to grab than a desktop computer.
Some recommend using an online backup service. The advantage is the files are not where you live. If your house has a fire or vanishes in a mudslide, your daily backups are in another state, safe from harm. However, as I pointed out earlier, while your data may be safe from disasters, it will also be out of your control and vulnerable to criminal crackers or hardware failures. A common variant is to use a service like Dropbox to back up essential files or make them available when traveling. If your needs are modest and security isn’t a major concern, the price of an online backup service may be worth it.
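The difference between the three types is easier to see with a toy example. The sketch below is my own illustration in Python, not how any particular backup program works: the function name, the file names, and the day numbers are all hypothetical, and it assumes an incremental back-up picks up whatever changed since the most recent back-up of any kind.

```python
def files_to_copy(mtimes, kind, last_full, last_backup):
    """Return the paths a back-up of the given kind would copy.

    mtimes:      dict mapping path -> last-modified time (any comparable number)
    last_full:   time of the most recent full back-up
    last_backup: time of the most recent back-up of any kind
    """
    if kind == "full":
        return sorted(mtimes)              # everything, changed or not
    if kind == "differential":
        cutoff = last_full                 # changes since the last full back-up
    elif kind == "incremental":
        cutoff = last_backup               # changes since the last back-up, period
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(path for path, modified in mtimes.items() if modified > cutoff)


# Hypothetical timeline: full back-up on day 10, incremental on day 13, today is day 15
mtimes = {"will.odt": 1, "photos/beach.jpg": 12, "notes.txt": 14}

print(files_to_copy(mtimes, "full", last_full=10, last_backup=13))
print(files_to_copy(mtimes, "differential", last_full=10, last_backup=13))
print(files_to_copy(mtimes, "incremental", last_full=10, last_backup=13))
```

The trade-off shows up at restore time: a differential needs only the last full back-up plus the latest differential, while an incremental scheme needs the full back-up and every incremental made since.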
Technologically adept users may opt for a network storage solution. A NAS, or network-attached storage, is a way to back up files from several computers linked together on a network. It can double as a file server for computers on the network. All the files are stored or backed up in one central location so if any one computer fails, the data won’t be lost. Good NAS servers or hard drives range in price between $200 and $500. The downside? It sits in the same house as your computers, so a fire or flood takes both.
Many use CDs or DVDs as a popular and cheap means of backups. The main issue with CDs lies in versioning. If you back up files that will not change over time, this can be a great option. If you are backing up files that often change, keeping a clear record of which version of which files is on which disc is imperative. The discs should always be clearly labeled and dated. If you choose CDs as a back-up, check old CDs every year to see if they are still readable. Be sure to store the CDs or DVDs in a cool, dry place away from light or moisture. If long-term archiving is a priority, investigate archival CDs and DVDs made with gold. Popular brands include Delkin’s Archival Gold, Kodak’s Gold Preservation, and MAM-A’s Archive Gold. They are expensive, but may be worth it if the data is important enough.
In the same vein, many use USB flash drives as back-ups. Modern flash drives are high capacity, reaching 1 terabyte in size, relatively cheap, and durable. I've seen flash drives survive washing machines. I have yet to have a USB flash drive go bad and I have some nearly fifteen years old. However, they have not seen heavy use so do not take my experience as representative. USB flash drives, if used in conjunction with CDs and DVDs, may be an affordable and portable solution. Because of their capacity and small size, they are a prime candidate for storing files along with your family documents and jewels in a safety deposit box. However, versioning can be a larger problem with flash drives because one looks pretty much like another. Getting mixed up about which files are on which drive happens too easily.
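One way to make that yearly check more than a glance at the file listing is to record a checksum for every file when you burn the disc and compare against it later. The following is a minimal sketch in Python using only the standard library; the function names are mine, and it assumes the disc is mounted as an ordinary folder you can read.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its SHA-256 checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Return (relative path, problem) pairs for missing or altered files."""
    root = Path(root)
    problems = []
    for relative, expected in manifest.items():
        target = root / relative
        if not target.is_file():
            problems.append((relative, "missing"))
        elif sha256_of(target) != expected:
            problems.append((relative, "changed"))
    return problems
```

Build the manifest from the original folder before burning, keep a copy of it somewhere other than the disc itself, and run the verification each year. An empty list means every file still reads back exactly as written. A badly damaged disc may also raise a read error instead of returning data, which answers the question just as clearly.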
Most users should weigh their choices, decide what is important to them, pick a strategy, and stay with it. The paranoid should back-up their back-ups, keeping one local copy and one off-site. If you plan to store data off-site or in the cloud, use encryption. I repeat, use encryption.
Digital Memory and the Internet
However, there is more to the issue than merely saving your data. The chameleon-like nature of digital data has unseen, unnoticed impacts: files, file formats, and storage media are not the only things affected by systemic instability. Go back to the first infographic. Ask yourself how long all that new content will be available.
Now, moving far afield, fire up your browser and think about what you see. Digital memory on the net, in an extension of the impermanence of digital storage, is far more volatile than many realize. Nowhere is the power of the digital world more evident than on the internet. The sheer variety of content is staggering. Once upon a time, one could claim in all seriousness if you look hard enough, you can find almost anything you could imagine on the net.
Yet, it is incredibly fragile. Think of all the issues outlined for one person and their personal computer extrapolated for the whole world. How many times have you sat at your computer trying to find a lost file or picture? For it to be of use, internet content has to be searchable. You have to be able to find it. The first problem is technical. Servers and an internet connection are necessary for content to stay online. If the company hosting the server goes out of business or the site owner takes down the site, any content or information is lost. It simply vanishes. Just like data stored in the cloud or a data locker.
Then there are the pressures to remove content by force. It started with the DMCA and the increasing use of takedown notices. In recent years, the rate of notices has exploded and so has the number of companies using the DMCA to silence critics, remove embarrassing information, and even seek control of content that was never theirs to begin with, often with a complete failure to contemplate any legal Fair Use exemptions. Abuse of the DMCA system is so common it is hard to pretend there isn’t a problem. Instances of theft by companies and corporations of content posted online serve as an odd counterpoint. It is ironic that corporations go into circle-the-wagons mode when someone posts or shares a picture, screen cap, bit of music, and so on, yet try to justify outright appropriation, without compensation or attribution, of the work of photographers and other artists who post samples of their original work online. A quick search will turn up many examples, so these are not isolated incidents.
Now there is the European “right to be forgotten”, which allows people to force search engines to delink “embarrassing” information in the name of privacy rights, ignoring the obvious ways the system can be abused and the many potential pitfalls stemming from it. The point is the “right to be forgotten” serves as one more stake aimed at the heart of the internet because it allows the removal of information someone objects to, in effect rewriting history in a way to astonish Orwell. This is almost a classic “good idea gone bad because no one thought it through”. The question becomes how far one can go to scrub data, who gets to do it, and where does it end? When one considers how courts have continually conflated real life with a person’s online life, erasing the distinction between the two, the tin foil hat brigade starts looking reasonable. With computers hooked up to the net and people spending much of their life online, at what point does a line in the sand magically appear that says “this is private” and will a court listen? I suggest interested readers look up the third party doctrine, keeping an eye open toward what it could mean for data stored in the cloud and in social media. The point is, once data is in the cloud or on the net, it may no longer be “your” data. Some court decisions seem to suggest the possibility that data on your personal computer may not be entirely yours either since, according to the EULA for some operating systems, you do not “own” the software running on the computer.
A corollary is the way the internet itself can, because of the impermanence of digital memory, either perpetuate false information or have amnesia about the true history of events. It took watching “The Nightmare Before Christmas” with my nephews last weekend to drive that last bit home. I have refused to watch that movie since it was released on home video in 1997 so it was the first time I had seen it in seventeen years, bringing back a mix of memories. I first saw the movie in the theater with my girlfriend at the time, a big Patrick Stewart fan. (I knew from reviews Stewart did the narration and wanted to see her reaction.) All the way home after the movie, she talked about Patrick Stewart’s voice over as the narrator and the cute little epilogue at the end, noting how odd it was that it left no room for a sequel. It stuck in my memory. When the movie came out on video four years later, I plopped it in the VCR. Imagine my surprise when I didn’t hear Patrick Stewart’s voice and found the epilogue was gone. Thinking my memory was in error, I bought the soundtrack CD. It was all there, just as I remembered from four years earlier. Why should it not be? The CD released alongside the movie contained the songs and narration as they appeared in the theatrical version.
Perplexed and a little outraged, I searched online. There was a dispute between Stewart and Disney – I never found any details – and some claimed this was Disney’s payback, starting with the removal of his name from the theatrical credits. I even remember the vocal outrage of a few fans holding out hope to see the material reinstated for a special edition. When I watched the movie with my nephews, I told them about the cuts. They asked, “can’t we see it on YouTube?” Like a dutiful uncle, I fired up Google and Yahoo...and got another surprise.
I assumed there would be references all over the place. No one had a clip of the epilogue I saw at the movies. Every source online claimed Disney never used Patrick Stewart’s narration in the theatrical movie, only on the original soundtrack album. I could not find any of the original sites discussing the cut epilogue because they had long since gone offline. The more I looked, the more I saw an almost textbook example of how bad information can contaminate the web. Then I stumbled into Disney’s webpage on the film and realized even Disney’s official account of the film production stated Stewart’s narration was recorded but never used in the film.
All I have left is my own fallible human memory as proof.
The image of O’Brien in “1984” incinerating a newspaper article and then claiming the article and the incident it described never existed ran through my head. The very impermanence of digital memory, its ease of replication and alteration, allows for both the infection of false information or memes through mistakes or misstatements and the possibility of a government, corporation, or some other entity rewriting the web for their purposes – all because it is easy, cheap, and invisible. Most of the time, the only real cost on the internet is bandwidth. The ease of manipulation and duplication in digital data becomes a temptation as well as a boon.
This issue has come up before. Anyone who followed the old Groklaw website, back when Pamela Jones updated it, knew how often companies involved in lawsuits scrubbed their sites to remove statements, announcements, and other information contrary to their position in the suit, an attempt to pull an Orwellian O’Brien. Fortunately, there often were people who cared enough to not only bookmark but also download the webpage itself or take screenshots or print it out. What happens if no one did? Who then can challenge the “new and improved” version of events? There are efforts to archive the web, such as the Internet Archive, but they are neither absolute nor immune to legal challenges.
The Price and Promise of Digital Memory
Like an ouroboros, we’ve come full circle.
The digital world carries immense promise. So much of our lives exists in the digital realm it is breathtaking when one contemplates its reach. We depend on it on so many levels, yet it has become so integrated as to become invisible. This is doubly true with our personal computers and the data we accumulate. Think about how many eBooks, music downloads, photos, and so much else exists only on our computers or on social media like Facebook or Flickr. A single computer crash can be catastrophic for the average user.
Look a bit further. Digital storage may have tremendous capacity, small size, and cheap transmission costs, but how durable it will be remains in question. Future technology might solve the problems in digital storage – and it might not. The only way to ensure digital data and content are not lost is through redundancy, saving to multiple storage media and locations, and some method, both technical and legal, to retrieve that data decades down the road. I say legal because some questions of who owns digital information remain unsettled in law. For example, only one state allows the heirs of the deceased legal access to social media left orphaned by the death of the user.
The digital world has reached a point where it can affect and transform every human on the planet. That sort of potential scares some people, outrages others, invokes seismic trembling from authoritarians, and makes every corporation in existence salivate at the chance to make money. The next twenty years will determine the fate of the internet – and by extension, how we maintain control over personal data, both in our computers and with third parties. The future is uncertain and no one can say how it will end, whether the internet becomes a corporate walled garden, a government-controlled panopticon, a fusion of the two, or something we cannot envision yet.
But I believe the choice of future starts with two basic questions: how do I save my data and who controls it?
This post first appeared on The Bookworm's Apple | Book Reviews And Reflections By An Amateur Polymath.