Robotic Process Automation (RPA) helps auto lenders accelerate the lending process and significantly increase their revenue. One of the biggest challenges for RPA is the quality of the documents. Documents that reach the lender have been printed, scanned, and faxed multiple times, resulting in sub-par document quality that might not even be readable by a human.
Optical Character Recognition (OCR) sits at the center of any RPA system for documents. OCR results from these documents are often incorrect; for example, 162 might become TGZ. Many common English words can be fixed with a spelling corrector. The challenging part is fixing a number or an uncommon word, such as a Vehicle Identification Number (VIN).
VIN verification is one of the most important steps in automating the auto loan origination process. Here is an example of the quality of the documents and the corresponding OCR output:
Informed.IQ can address the issue using data and domain expertise. Here is how we approached the issue.
A VIN is a 17-character alphanumeric string that follows certain rules, as shown in the image below. (Reference Here.)
Domain Knowledge about VIN:
- The serial number can contain only digits, i.e., 0-9
- Checksum can only be 0-9 or X
- I, O, Q are never present in a VIN
- Certain cars can only be manufactured in certain countries (position 1)
- Each make has a predefined set of attribute codes describing the car (positions 4-8)
- Given the model year of the car, the 10th character is fixed and can be looked up, see chart below:
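The rules above can be written as simple checks. The following is a minimal sketch, not our production code: the function names are hypothetical, the year codes follow the standard 30-year cycle starting at A for 1980, and the serial number is assumed to occupy positions 12-17.

```python
import re

# Model-year codes repeat every 30 years; I, O, Q, U, Z and 0 are never used.
YEAR_CODES = "ABCDEFGHJKLMNPRSTVWXY123456789"
DIGITS = "0123456789"


def year_code(model_year: int) -> str:
    """Return the position-10 character for a model year (1980 onward)."""
    return YEAR_CODES[(model_year - 1980) % 30]


def rule_violations(vin: str, model_year: int) -> list:
    """List every domain rule that the given VIN breaks (empty list = none)."""
    vin = vin.upper()
    if len(vin) != 17:
        return ["VIN must have exactly 17 characters"]
    problems = []
    if re.search(r"[^A-Z0-9]", vin):
        problems.append("VIN must be alphanumeric")
    if re.search(r"[IOQ]", vin):
        problems.append("I, O and Q never appear in a VIN")
    if vin[8] not in DIGITS + "X":
        problems.append("position 9 (checksum) must be 0-9 or X")
    if vin[9] != year_code(model_year):
        problems.append("position 10 does not match the model year")
    # ASSUMPTION: the serial number is taken to be positions 12-17.
    if any(c not in DIGITS for c in vin[11:]):
        problems.append("serial number must be numeric")
    return problems


print(rule_violations("3C6URSULEGG336004", 2016))
```

Rule checks like these only flag a VIN as suspicious. They do not say which character to change, which is where the data-driven lookup comes in.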
Data driven AI approach:
We take all the data we have on the make, model, year and VIN of each car and use it to build a lookup for each make and model.
For every make and model, we break the VIN down into 5 parts:
- Country : Position 1
- Manufacturer : Position 2-3
- Model : Position 4-5
- Configuration : Position 6-8
- Plant: Position 11
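As a rough illustration of this breakdown, the sketch below slices a VIN into the five parts listed above. The `VinParts` type and `split_vin` function are hypothetical names chosen for this example.

```python
from typing import NamedTuple


class VinParts(NamedTuple):
    country: str        # position 1
    manufacturer: str   # positions 2-3
    model: str          # positions 4-5
    configuration: str  # positions 6-8
    plant: str          # position 11


def split_vin(vin: str) -> VinParts:
    """Slice a 17-character VIN into the five looked-up components."""
    if len(vin) != 17:
        raise ValueError("a VIN must have exactly 17 characters")
    # VIN positions are 1-based; Python slices are 0-based and end-exclusive.
    return VinParts(vin[0], vin[1:3], vin[3:5], vin[5:8], vin[10])


print(split_vin("3C6URSULEGG336004"))
```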
Using the data we have on VINs, makes, models and years, we map out all possible characters for each of the positions listed above, for every make and model. Based on our experience working with OCR engines, we identified the common errors for each character and built a lookup of all likely misreads. We use this information to generate candidates for an incorrect VIN, then use the checksum logic to select the correct one.
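For readers unfamiliar with the checksum, here is a minimal sketch of the standard North American check-digit calculation: each character is mapped to a number, multiplied by a position weight, and the sum modulo 11 gives the character expected in position 9 (a remainder of 10 is written as X). The function names are our own for this illustration.

```python
# Numeric value of each VIN character (I, O and Q are deliberately absent).
VALUES = {
    **{str(d): d for d in range(10)},
    **dict(zip("ABCDEFGH", range(1, 9))),
    **dict(zip("JKLMN", range(1, 6))),
    "P": 7,
    "R": 9,
    **dict(zip("STUVWXYZ", range(2, 10))),
}

# Weight for each of the 17 positions; position 9 (the check digit) weighs 0.
WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def check_digit(vin: str) -> str:
    """Compute the character that should sit in position 9."""
    remainder = sum(VALUES[c] * w for c, w in zip(vin, WEIGHTS)) % 11
    return "X" if remainder == 10 else str(remainder)


def has_valid_check_digit(vin: str) -> bool:
    """True when the VIN is well formed and position 9 matches the checksum."""
    if len(vin) != 17 or any(c not in VALUES for c in vin):
        return False
    return vin[8] == check_digit(vin)


print(has_valid_check_digit("3C6URSULEGG336004"))
```

Because a single wrong character almost always changes the remainder, the check digit works well as a final filter over generated candidates.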
Let’s take the above-mentioned case, where OCR gives us an incorrect value, and apply this approach to fix the VIN:
Incorrect VIN: 3C6URSULEGG336004
- Make: Ram
- Model: 2500
- Year: 2016
Step 1: Break down VIN into its base components
Step 2: Using the make, model and year, check whether each component exists in the lookup and flag the components that were not found
Step 3: Make sure all the components follow the rules for a valid VIN and, if they do not, fix them using known OCR errors
Step 4: Using known OCR errors, substitute characters in the flagged components and check whether each new component is part of the lookup for the given make, model and year
Step 5: Run the checksum to validate all candidates. This gives us the corrected VIN ‘3C6UR5JL6GG336004’.
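The five steps can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not our production pipeline: `OCR_CONFUSIONS` and `LOOKUP` are small hypothetical tables with illustrative values (real ones are built from historical data), and `correct_vin` is a name chosen for this example.

```python
from itertools import product

# HYPOTHETICAL confusion table: character read by OCR -> what it may really be.
OCR_CONFUSIONS = {"S": "5", "U": "J", "E": "6", "B": "8", "Z": "2"}

# Component name -> (start, end) slice, matching the five-part breakdown.
SLICES = {"country": (0, 1), "manufacturer": (1, 3), "model": (3, 5),
          "configuration": (5, 8), "plant": (10, 11)}

# HYPOTHETICAL lookup with illustrative values for one make, model and year.
LOOKUP = {("Ram", "2500", 2016): {
    "country": {"3"}, "manufacturer": {"C6"}, "model": {"UR"},
    "configuration": {"5JL", "5HL"}, "plant": {"G"}}}

VALUES = {**{str(d): d for d in range(10)},
          **dict(zip("ABCDEFGH", range(1, 9))),
          **dict(zip("JKLMN", range(1, 6))), "P": 7, "R": 9,
          **dict(zip("STUVWXYZ", range(2, 10)))}
WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def checksum_ok(vin):
    """True when position 9 equals the weighted sum modulo 11 (10 -> X)."""
    if len(vin) != 17 or any(c not in VALUES for c in vin):
        return False
    r = sum(VALUES[c] * w for c, w in zip(vin, WEIGHTS)) % 11
    return vin[8] == ("X" if r == 10 else str(r))


def correct_vin(vin, make, model, year):
    known = LOOKUP[(make, model, year)]
    # Steps 1-2: split into components and flag those missing from the lookup.
    suspect = set()
    for name, (lo, hi) in SLICES.items():
        if vin[lo:hi] not in known[name]:
            suspect.update(range(lo, hi))
    # Step 3: apply the VIN rules; here, the check digit must be 0-9 or X.
    if vin[8] not in "0123456789X":
        suspect.add(8)
    # Step 4: at each suspect position try the original character and its
    # known confusions, keeping candidates whose components are in the lookup.
    positions = sorted(suspect)
    options = [vin[i] + OCR_CONFUSIONS.get(vin[i], "") for i in positions]
    results = []
    for combo in product(*options):
        chars = list(vin)
        for i, c in zip(positions, combo):
            chars[i] = c
        cand = "".join(chars)
        in_lookup = all(cand[lo:hi] in known[n] for n, (lo, hi) in SLICES.items())
        # Step 5: the checksum is the final filter.
        if in_lookup and checksum_ok(cand):
            results.append(cand)
    return results


print(correct_vin("3C6URSULEGG336004", "Ram", "2500", 2016))
```

With these illustrative tables, only one of the eight generated candidates survives both the lookup and the checksum: 3C6UR5JL6GG336004. Restricting substitutions to the flagged positions keeps the candidate set small, which matters because the number of combinations grows quickly with every extra position.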
With the approach outlined above, we can automatically fix ~50% of incorrect VINs with a precision of 95%+. Getting the same result from OCR alone, even with manual intervention, would take a lot of time and be quite expensive.
Informed’s proprietary machine learning enhances OCR results, and our proven track record of simplifying and optimizing RPA results in a best-in-class product for lenders. VIN enhancements are just a small piece of meeting the needs of the automotive financial services ecosystem. Combining leading-edge technology with your customer-facing team enables you to improve your business and grow your bottom line.
Harshil Prajapati is a Senior Machine Learning Engineer at Informed.IQ, where he develops machine learning models for classification and named entity extraction. He has a Master’s degree in Machine Learning from Boston University.