Document Classification

Classification is the process of assigning document images to predefined document types.

Manual classification of large numbers of documents is:

labor-intensive
costly
slow
error-prone when working against tight deadlines

Automatic classification is useful in scenarios such as the following:

Sorting of invoices by supplier.
For example: The input documents are invoices from a number of different suppliers. Classification can sort those documents and group invoices from the same supplier together.
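
The grouping step in the invoice scenario can be sketched in a few lines of Python. This is only an illustration: the function name group_by_label, the file names and the supplier labels are hypothetical, and the labels are assumed to come from a classifier that is not shown here.

```python
from collections import defaultdict


def group_by_label(classified):
    """Group document names by their predicted label.

    `classified` is an iterable of (document_name, label) pairs. The label
    is assumed to be the output of some classifier (not shown).
    """
    groups = defaultdict(list)
    for name, label in classified:
        groups[label].append(name)
    return dict(groups)


# Hypothetical classifier output for four scanned invoices.
results = [
    ("scan_001.tif", "Acme Ltd"),
    ("scan_002.tif", "Globex"),
    ("scan_003.tif", "Acme Ltd"),
    ("scan_004.tif", "Initech"),
]
invoices_by_supplier = group_by_label(results)
```

Each key of invoices_by_supplier could then be mapped to an output folder or a downstream processing queue.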

Assembly of input page images followed by automatic sorting.
For example: If batches of images are scanned in no particular order, document classification can determine which document each image belongs to. The images can then be assembled into the correct documents and either saved to specific folders or processed further to extract any required data.
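
A minimal sketch of the assembly step follows. It assumes that classification yields, for every image, a document key and a page position within that document. Both values, the function name assemble_documents and the sample data are assumptions made for illustration, not the output format of any particular product.

```python
from collections import defaultdict


def assemble_documents(classified_pages):
    """Collect page images per document and order them by page number.

    `classified_pages` is an iterable of (image_name, document_key, page_no)
    tuples; the last two values are assumed to come from classification.
    """
    documents = defaultdict(list)
    for image, doc_key, page_no in classified_pages:
        documents[doc_key].append((page_no, image))
    # Sort each document's pages by page number and keep only the image names.
    return {
        key: [image for _, image in sorted(pages)]
        for key, pages in documents.items()
    }


# Hypothetical batch scanned in no particular order.
batch = [
    ("img_03.png", "contract", 2),
    ("img_01.png", "invoice", 1),
    ("img_02.png", "contract", 1),
    ("img_04.png", "invoice", 2),
]
assembled = assemble_documents(batch)
```

Here assembled maps each document key to its images in reading order, ready to be saved to a folder or passed on for data extraction.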

Detection of documents within larger files.
For example: The input is a large multi-page PDF file containing numerous documents related to a single case. The classification process can detect the different types of document within the PDF file and extract them as separate files. The remaining pages are saved as a separate PDF file.
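
The splitting logic in this scenario can be sketched as follows. The sketch assumes that each page has already been given a type label, with None for pages that match no known type, and it makes the simplifying assumption that a run of consecutive pages with the same label forms one document. The function name split_by_type and the labels are hypothetical. Writing the page ranges out as PDF files is left to whichever PDF library is in use.

```python
from itertools import groupby


def split_by_type(page_labels):
    """Split a page sequence into typed documents and leftover pages.

    `page_labels` holds one label per page (None = unrecognised).
    Returns (documents, remaining), where documents is a list of
    (label, page_indexes) and remaining is a list of page indexes.
    """
    documents = []
    remaining = []
    # groupby yields runs of consecutive pages that share the same label.
    for label, run in groupby(enumerate(page_labels), key=lambda item: item[1]):
        pages = [index for index, _ in run]
        if label is None:
            remaining.extend(pages)
        else:
            documents.append((label, pages))
    return documents, remaining


# Hypothetical per-page labels for an eight-page case file (zero-based pages).
labels = ["letter", "letter", None, "invoice", "invoice", "invoice", None, "form"]
documents, remaining = split_by_type(labels)
```

Each entry in documents corresponds to one file to extract, and the indexes in remaining identify the pages that would go into the separate leftover PDF.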