Basic premise of a scan to Google Docs project
An open source project to scan documents (or photograph them with a phone) and put the resulting document into Google Docs.
It would use a little bit of OCR and some rules to pick out enough information to categorize the resulting document.
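As a rough sketch of what the "rules" step could look like, the snippet below scores OCR output against a few keyword patterns and picks the best matching category. The category names, the patterns, and the `categorize` function are all made up for illustration; real rules would depend on the documents being scanned.

```python
import re

# Hypothetical rules: category name -> regex patterns we expect in the OCR text.
RULES = {
    "invoice": [r"\binvoice\b", r"\bamount due\b"],
    "receipt": [r"\breceipt\b", r"\btotal paid\b"],
    "bank statement": [r"\bstatement\b", r"\bclosing balance\b"],
}


def categorize(ocr_text, rules=RULES, default="uncategorized"):
    """Return the category whose patterns match the OCR text most often."""
    text = ocr_text.lower()
    best, best_score = default, 0
    for category, patterns in rules.items():
        # Count how many of this category's patterns appear in the text.
        score = sum(1 for pattern in patterns if re.search(pattern, text))
        if score > best_score:
            best, best_score = category, score
    return best


if __name__ == "__main__":
    print(categorize("INVOICE No. 17\nAmount due: 120.00"))
```

Counting matches rather than stopping at the first hit means a document that mentions "statement" once but looks like an invoice everywhere else still ends up as an invoice.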
Why scan to Google Docs?
Businesses need a way to go paperless. The main document scanning and storage tools are slow, cumbersome, and expensive. Most of the products I have seen are designed to lock you in. An open source project allows flexibility and transparency, so business and home users can use it with confidence.
- Integrate with Amazon S3 for storage.
- Use an in-house Cassandra store for data.
- Pre-storage encryption and decryption on retrieval.
- Camera plugin to use photos or a scanner.
Python with SANE looks like a relatively simple input mechanism.
Google Docs has a Python API that includes examples of how to use the OCR.
Cassandra can be used to store base64-encoded documents up to 16 MB without too many issues. Other database engines could be used.
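To make the base64 idea concrete, here is a minimal sketch that encodes a scanned file and refuses anything over the 16 MB figure mentioned above. The function names and the way the limit is enforced are my own assumptions, not part of any Cassandra API. Note that base64 grows data by about a third, so the raw scan has to stay under roughly 12 MB to fit.

```python
import base64

# The 16 MB figure from the text, applied to the encoded size (an assumption).
MAX_ENCODED_BYTES = 16 * 1024 * 1024


def encode_document(raw):
    """Base64-encode raw scan bytes, rejecting results over the size limit."""
    encoded = base64.b64encode(raw)
    if len(encoded) > MAX_ENCODED_BYTES:
        raise ValueError(
            "encoded document is %d bytes, limit is %d"
            % (len(encoded), MAX_ENCODED_BYTES)
        )
    # Base64 output is plain ASCII, so it can be stored as a text column.
    return encoded.decode("ascii")


def decode_document(encoded):
    """Turn the stored base64 text back into the original bytes."""
    return base64.b64decode(encoded)


if __name__ == "__main__":
    stored = encode_document(b"pretend this is a scanned page")
    print(len(stored), decode_document(stored))
```

The returned string would then be written to whichever store is chosen; checking the size before the write keeps oversized scans from failing halfway through.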
Anyone interested in starting this project with me?