Scanning as an act of preservation

As much as piracy gets a bad rap from those who seemingly suffer from it, it has constantly functioned as a tool of archiving, even if by accident. I doubt too many groups who ripped games, or people who uploaded and shared music on eMule, thought they were doing historical archival of the era’s popular culture. This is probably best reflected in how things were, and still are, scanned. Be it books, booklets, manuals etc., you’ll most likely end up with scans that are harshly compressed and filled with artifacting across the board, destroying the original information of the image. This is like having lower and lower bitrate in digital music files, except worse, because the scans floating around are usually low resolution to begin with. Sadly, there are times when original works have been all but lost, and the only things we’re left with are sub-150dpi scans with heavy compression thrown in. They don’t stand up to modern standards, and they never really did.

Scanning guides on the Internet often seem to recommend using medium settings for the output file, arguing that it’ll save disk space. This may have been an argument in the earlier days of computing, when space was at a premium. With time, this has become effectively a non-issue, especially with cloud storage being a thing. Keeping websites light was also a priority, so finding that sweet spot between good-enough quality and load times was important. 56k dial-up modems weren’t exactly the most effective way to transfer data around, but that’s what was available at the time, and you can’t really complain about that. Nowadays, with blazingly fast connections on our phones, that’s not exactly an issue. All sites are more or less JavaScript hells anyway. Of course, a lot of sites that carry any sort of scans or cover photos would like to keep everything rather small in order to avoid copyright infringement claims. Amazon often has small scans from God knows when for older products, and even some new products have images so limited in size that you can’t really see much. Again, bandwidth and storage space are cited as the issue, but nothing would really keep these guys from using a thumbnail as a link that sends the user directly to the largest available version of the image. We should of course consider that giving everyone access to the highest possible version of an image might make copyright infringement or knock-off productions easier, but tracing exists for a reason.

Because this post will be heavy on images, more after the jump.

But to get to the point, here’s a comparison between a generic Mazin Wars scan that’s on Sega Retro and one from someone else’s stash. Sega Retro has to recycle whatever they get and whatever they can scan themselves, much like every wiki out there. All the images are set as galleries of two, so click them for a bigger version. You can also access the images through the galleries.

The first point, of course, is the sheer size difference. More information can be packed into a larger space, and it can be more detailed. However, packing is also important, as a high-resolution file can and will lose information per pixel if the compression quality is set too low. That said, high-resolution images don’t suffer nearly as much from artifacting when compressed hard, because there is so much more information to begin with. Smaller pictures, not so much. The difference between these two images is rather large. The Sega Retro image is 178.8 KB in size, whereas the larger one is about 36.2 MB. To illustrate what I mean, here’s the same cover as above in two resolutions, 1200dpi and 150dpi, both saved at the smallest file size Photoshop can muster.

The space background has taken a hit in detail in both of these versions, as the compression tries to smooth out information between pixels in order to save space. Some programs handle compression better than others, with Photoshop often doing a pretty good job. Some programs, on the other hand, just don’t do compression all that well. It doesn’t help that there are large differences between scanners as well, from what sort of scanning elements they have to how accurately they are able to replicate the colours of the target. The two comparison images above are much smaller, 6.26 MB (or 6 420.2 KB) and 137.8 KB respectively. The compression is far less visible on the 1200dpi scan because of its sheer size, which gives the compression algorithm more room to move pixels around. It still yields worse image quality than the uncompressed scan.

The Sega Retro example above has visibly more saturated colours than the rest, either because of different settings or post-processing, and this will always be a problem in itself: how do we know if the scanned object is accurately represented? Without an accurately calibrated monitor and the scanned object in our hands, we don’t. Then you have all the other problems scanning produces, with creases and parts that have not been flattened against the scan bed. If you look at the left side of the largest example, you can see that it is fuzzier than the rest of the image, as it has been lifted off the scan bed just enough. Cover scans are rather difficult to do unless you take extra steps to straighten the cover first.

Resolution is really the key to all this. A friend asked me years back: why would anyone scan a picture in 1200dpi when it has been printed in 300dpi? It’s a really good question, and you probably know the answer already: information. The more information you have on a picture, the more you can work with it (be it fixing tears or smudges), the less cumbersome certain elements become when you begin to downscale to a more usable size, and the more accurate the colours will be. This is because of how printing works. Let’s zoom into the face on the cover in both 1200dpi and 150dpi.
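To put rough numbers on how much information each dpi setting captures, here is a minimal sketch in Python. The 5 x 7 inch cover size and 24-bit colour are assumptions picked for illustration, not measurements of the actual cover, and scan_dimensions is a made-up helper name.

```python
def scan_dimensions(width_in, height_in, dpi, channels=3, bits_per_channel=8):
    """Return (width_px, height_px, uncompressed_bytes) for a scan.

    The paper size is given in inches, since dpi means dots per inch.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_pixel = channels * bits_per_channel // 8
    return width_px, height_px, width_px * height_px * bytes_per_pixel


# Hypothetical 5 x 7 inch cover scanned in 24-bit colour.
for dpi in (150, 300, 600, 1200):
    w, h, size = scan_dimensions(5, 7, dpi)
    print(f"{dpi:>4} dpi: {w} x {h} px, {size / 1_000_000:.1f} MB uncompressed")
```

Every doubling of the dpi quadruples the pixel count, so a 1200dpi scan holds 64 times the pixels of a 150dpi scan of the same sheet.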

The “only” difference between these two pictures is their dpi. Some things you cannot see in the 150dpi version, like the little scratch on the left side of the nose or the dust particle to the left of the forehead gem. These simply do not exist in the image’s information. Another thing you should probably notice is the vertical line running around the middle of the image, where it seems like something less than a pixel’s worth was cut out of the whole image. This is due to the scanner’s scanning element. Some scanners have a segmented element, which effectively doesn’t see the whole image and loses a fraction of a millimetre. In high-dpi scans this becomes an issue, but with sub-1200dpi scans it is a non-issue, as the resolution doesn’t carry information at that small a scale. Picking a scanner isn’t exactly straightforward, to be sure, as there are a lot of variables to consider.
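At 150dpi each pixel covers roughly 0.17 mm of paper, while at 1200dpi it covers roughly 0.02 mm. The sketch below shows why a hairline scratch vanishes at the lower setting. It assumes that a low-dpi capture behaves like averaging blocks of high-dpi pixels, which is a simplification of what a real scanner does, and the grey values are invented for the example.

```python
def box_downsample(pixels, factor):
    """Shrink a 2-D list of grey values by averaging factor x factor blocks."""
    height, width = len(pixels), len(pixels[0])
    result = []
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [pixels[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(round(sum(block) / len(block)))
        result.append(row)
    return result


# A 16 x 16 patch of light paper (grey value 200) with a one-pixel-wide,
# four-pixel-long black scratch (grey value 0), as seen at 1200 dpi.
patch = [[200] * 16 for _ in range(16)]
for y in range(2, 6):
    patch[y][3] = 0

# 1200 dpi down to 150 dpi is a factor of 8 in each direction.
small = box_downsample(patch, 8)
print(small)  # [[188, 200], [200, 200]]
```

The pitch-black scratch ends up as a single pixel that is only slightly darker than the paper around it, which in practice is indistinguishable from noise.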

At this point I should probably also point out that JPG is a lossy format, no matter how large the file size is. PNG, GIF, TIF and a whole slew of others are around. However, JPG is the most common, with PNG in its wake. JPG in general should be used when space is an issue, but as you can see above, low compression with a high file size does produce good and accurate results nevertheless. The discussion about compression and image quality usually applies to smaller sizes, not to cases where you have information down to the grain of the paper the image was printed on. PNG is the go-to format for all basic digital works, and you should never use JPG when you’re saving vector images or the like. SuperPNG is a Photoshop plug-in that packs PNG files into a smaller size without losing any of the quality. If you’re a Photoshop user and create images, for the love of God use this. TIF is often cited as the best for commercial works, but I can say from experience that most people have never heard of it or can’t work the format out. All of these also factor into how much time it takes to scan. A bitmap file would probably be the best all around, as it has barely any compression as such.
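The difference between lossless and lossy saving can be sketched with nothing but Python’s standard library. PNG compresses with DEFLATE, which zlib implements, so the lossless half is a fair stand-in. The lossy half is not how JPG actually works: it only rounds values down to coarser steps before compressing, which is a deliberately crude assumption to show the principle. The data is a made-up row of grey values, and quantize is a hypothetical helper.

```python
import random
import zlib


def quantize(data, step):
    """Crude lossy stand-in: round every byte down to a multiple of step."""
    return bytes((value // step) * step for value in data)


# Hypothetical grey values with fine random variation, a bit like paper grain.
rng = random.Random(0)
original = bytes(100 + rng.randint(0, 22) for _ in range(4096))

# Lossless: compress, decompress, and every byte comes back.
lossless = zlib.compress(original, 9)
lossless_restored = zlib.decompress(lossless)

# Lossy: throw away the fine variation first, then compress what is left.
lossy = zlib.compress(quantize(original, 16), 9)
lossy_restored = zlib.decompress(lossy)

print("sizes:", len(original), len(lossless), len(lossy))
print("lossless identical to original:", lossless_restored == original)
print("lossy identical to original:", lossy_restored == original)
```

The lossy version comes out smaller, but the grain it discarded cannot be recovered from the file afterwards, no matter what you do to it.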

A sub-600dpi image scans in a matter of seconds, while a 1200dpi image can take up to a minute when the paper size gets closer to A3. If you ever wonder why it takes so long to scan a 300-page book in 1200dpi, it’s because each of those pages takes a minute to scan, and then there is all the other time spent processing the pages, both with the scanner and in software. As I’ve often discussed, JPG is the good-enough format, but only when its initial size is large enough and compression is non-existent. Just remember to enable Unsharp Mask at these sizes. To showcase the difference between scanners and image qualities with formats, here’s the same high-resolution scan from above in JPG, and a new scan with a different scanner in PNG. The sizes are 26.2 MB for the JPG file and 92.3 MB for the PNG. The difference can be best appreciated on higher-end monitors.
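The arithmetic behind that 300-page book is quick to sketch. The one minute per page is the figure from above; the 30 seconds of handling per page (turning the page, flattening it, starting the next scan) is purely a guess for illustration, and scan_job_hours is a made-up helper.

```python
def scan_job_hours(pages, scan_seconds_per_page, handling_seconds_per_page=0):
    """Estimate the total time for a scanning job, in hours."""
    total_seconds = pages * (scan_seconds_per_page + handling_seconds_per_page)
    return total_seconds / 3600


# Scanner time alone for 300 pages at one minute each.
print(scan_job_hours(300, 60))      # 5.0

# The same job with an assumed 30 seconds of handling per page.
print(scan_job_hours(300, 60, 30))  # 7.5
```

That is before any cleanup in software, which easily takes longer than the scanning itself.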

It is important to save as much of the original information on the scanned object as possible. Be it for archival or to provide replacement materials, the better the original scan and the more redundant information it carries, the more we’ve managed to save. This cover here will never see an official reprint of any kind, and the only real way to see one is either on the Internet or by buying a used copy. Sadly, the Internet mostly holds low-quality images, but anyone who has a scanner that can do at least 600dpi can help. After all, are these not art? I consider scan piracy, illegal as it might be, a sort of conservation of history and an act of archiving. It should not be done for profit, but rather for the same reason museums want to take care of any work of art. It’s part of our cultural heritage, however small. We’ve already lost so many comics, drawings and paintings to the void of time, and scanning might be the only way to preserve what might be gone the next day.
