
I started scanning physical documents at high resolution and found that each page came out to about 290-330 KB. A 3-page document could cost me close to 1 MB to store, which sounded higher than it needed to be. That adds up quickly when you have hundreds of documents to scan. Could I save some storage space by scanning at a lower resolution? Here are my current storage size estimates, assuming an average of 310 KB per page:

| Pages | Storage Size |
| --- | --- |
| 1 | 310 KB |
| 10 | 3.1 MB |
| 100 | 31 MB |
| 1,000 | 310 MB |
| 10,000 | 3.1 GB |
| 100,000 | 31 GB |

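The estimates above are just the page count multiplied by an average page size. As a rough sketch, the Python below reproduces them, assuming 310 KB per page and decimal units (1 MB = 1,000 KB); the function names are made up for illustration.

```python
def format_size(kb: float) -> str:
    """Format a size given in KB using decimal units (1 MB = 1,000 KB)."""
    for unit in ("KB", "MB", "GB"):
        # Stop at the first unit where the value drops below 1,000,
        # or at GB, the largest unit handled here.
        if kb < 1000 or unit == "GB":
            return f"{kb:g} {unit}"
        kb /= 1000


def storage_estimate(pages: int, kb_per_page: float = 310) -> str:
    """Estimate total storage for a number of scanned pages."""
    return format_size(pages * kb_per_page)


for pages in (1, 10, 100, 1_000, 10_000, 100_000):
    print(f"{pages:>7,} | {storage_estimate(pages)}")
```

Changing `kb_per_page` is enough to see how a smaller average page size affects the totals.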
The fine details of most documents I’m scanning aren’t that important; all that matters is that the text is legible. For example, I like to keep a history of my car maintenance, and most shops only provide a physical copy. A smog check receipt is important to save, but the quality only needs to be good enough to verify some key text like the certificate number, location, and date. It’s not a document I’m going to refer back to regularly, and I may never need to look at it again, so it doesn’t need to have crisp text.
I decided to try out various resolutions for the smog check receipt, a standard letter-size document, to find the pixel density sweet spot: the lowest density at which the text is still easily readable. I calculated the PPI (pixels per inch) of each scan by dividing the diagonal of the scanned page in pixels by the diagonal of a letter-size sheet in inches:

$$PPI = {Pixels \over Inches} = {\sqrt{W^2 + H^2} \over \sqrt{8.5^2 + 11^2}}$$

where W and H are the width and height of the scanned page in pixels.

Here are the samples I took using my phone:
[Sample images: Sample 1 (327 PPI), Sample 2 (250 PPI), Sample 3 (202 PPI), Sample 4 (168 PPI), Sample 5 (129 PPI), Sample 6 (115 PPI)]

| Sample | Camera Setting | PPI | Size (KB) | Size Reduction |
| --- | --- | --- | --- | --- |
| 1 | 12M (4048×3036) | 327 | 286 | 0% |
| 2 | 7M (3200×2400) | 250 | 214 | 25% |
| 3 | 5M (2592×1944) | 202 | 156 | 45% |
| 4 | 3M (2048×1536) | 168 | 118 | 59% |
| 5 | 2M (1600×1200) | 129 | 79 | 72% |
| 6 | 1M (1440×1080) | 115 | 63 | 78% |

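The two calculations behind this table can be sketched in a few lines of Python. The function names are placeholders, and the 2550×3300 page in the PPI example is a hypothetical input chosen because it works out to exactly 300 PPI. Note that the formula expects the pixel dimensions of the page itself, which may be smaller than the full camera frame listed under Camera Setting. The file sizes are the ones from the table.

```python
import math

LETTER_INCHES = (8.5, 11.0)  # width and height of a letter-size sheet


def ppi(width_px: float, height_px: float, paper_inches=LETTER_INCHES) -> float:
    """Pixels per inch: page diagonal in pixels over page diagonal in inches."""
    return math.hypot(width_px, height_px) / math.hypot(*paper_inches)


def size_reduction(size_kb: float, baseline_kb: float) -> int:
    """Percentage saved relative to the baseline, rounded to a whole percent."""
    return round(100 * (1 - size_kb / baseline_kb))


# Hypothetical page of 2550x3300 pixels (8.5 in x 300, 11 in x 300).
print(round(ppi(2550, 3300)))

# File sizes in KB for samples 1 to 6, with sample 1 as the baseline.
sizes_kb = [286, 214, 156, 118, 79, 63]
reductions = [size_reduction(size, sizes_kb[0]) for size in sizes_kb]
print(reductions)
```

The reductions list matches the Size Reduction column: each value is one minus the ratio of the sample's size to the 286 KB baseline.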
I’ve heard that 300 PPI is the tipping point for high quality, and that increasing the density beyond that value doesn’t make a significant difference. Even the Library of Congress recommends 300 PPI for preserving digital resources. So I’m not surprised that our first sample, at 327 PPI, is sharp and very close to the original source.
Moving on to the next samples, 2 and 3 are both pretty easy to read. Sample 4, at 168 PPI, is where legibility starts to become questionable, and samples 5 and 6 are too difficult to read. That makes our winner sample 3! So about 200 PPI is as low as you can go and still retain sharp enough text. Of course, this could vary by document depending on personal preference, the size of the text, or other small details you are trying to preserve. Let’s revise our size estimate chart from the beginning using our new density:

| Pages | Storage Size | Reduced Storage Size |
| --- | --- | --- |
| 1 | 310 KB | 170 KB |
| 10 | 3.1 MB | 1.7 MB |
| 100 | 31 MB | 17 MB |
| 1,000 | 310 MB | 170 MB |
| 10,000 | 3.1 GB | 1.7 GB |
| 100,000 | 31 GB | 17 GB |

I’m able to shave off about 45% of my initial estimated storage space without losing any of the information I actually care about. That will add up to significant savings as I accumulate more and more scanned documents over time.
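To turn a target density like 200 PPI back into pixel dimensions, multiply each side of the page by the density. The sketch below assumes a letter-size page that fills the whole image with no margin; `pixels_for_ppi` is a made-up helper name.

```python
def pixels_for_ppi(target_ppi: float, paper_inches=(8.5, 11.0)) -> tuple[int, int]:
    """Width and height in pixels needed to cover the page at the target PPI."""
    width_in, height_in = paper_inches
    return round(width_in * target_ppi), round(height_in * target_ppi)


width_px, height_px = pixels_for_ppi(200)
print(width_px, height_px)  # 1700 2200
print(f"{width_px * height_px / 1e6:.2f} megapixels")  # 3.74 megapixels
```

In practice the page rarely fills the frame exactly, so the camera setting needs some headroom above these numbers.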