Beyond simple OCR: an outlook on modern AI vision for insurance automation
By Elliot Hofman, Data Scientist at Zelros
Our mission at Zelros is to bring Artificial Intelligence to insurance companies and enable them to offer a better service to both insurers and their policyholders. Even though the digital era has already started to reshape the insurance industry, many processes, from underwriting to claim handling, still require heavy human intervention. Of course, full automation by AI algorithms is often an unreachable goal, but automating certain parts of these processes with high confidence already reduces processing delays. AI becomes an additional tool to strike the right balance between repetitive, low-value tasks and complex, high-value tasks: insurers can automate the former while focusing on the latter, which require full human experience and knowledge.
Zelros Documents2Insights is a module of our platform that automates the processing of the legal documents commonly handled in the insurance industry: national ID cards, driving licenses, car registrations, insurance statements, tax notices, …
Optical Character Recognition (OCR) is a Machine Learning subfield in which models are given an image as input and asked to predict the text displayed in that image.
While OCR is obviously one of the main pillars of automated document parsing, many other ML tasks are also crucial when analyzing these documents. Here are a few examples of the Computer Vision problems we work on at Zelros:
- Checkbox analysis
- Detection and recognition of signatures
- Forgery detection for automated fraud prevention
- Drawing classification
- Automatic image processing to remove artefacts that hinder OCR
Out of all the documents handled during insurance processes, we chose to discuss in more detail how we automated the processing of the European Accident Report. In Europe, when two drivers are involved in an accident, they must fill in a form describing the circumstances of the accident and providing some information about both drivers.
It is an interesting document because it is very rich from a Computer Vision perspective. It features typed and handwritten fields, both structured and unstructured, along with drawings, checkboxes and signatures. Also, even though insurance companies are free to decide on the content and layout of this document, it has been standardized, so differences between companies and countries are often minor. In France, an estimated 5 million accident reports are produced each year, which represents a huge volume.
In this blog-post series, we will focus on how we automated checkbox analysis, that is, predicting whether each checkbox is ticked or not. This task is easy to understand and is also encountered in many other kinds of documents. In fact, the methods presented here have also been applied successfully to other insurance forms, for both subscription and claims.
Checkbox analysis will serve as a pretext to discuss several general Machine Learning concepts:
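To make the task concrete, here is a deliberately naive baseline that frames checkbox analysis as a binary decision on a cropped image of a single box. It is not the approach used at Zelros; it is only a sketch, assuming the box has already been located and cropped to a small grayscale image (0 = black, 255 = white). The function names (`ink_ratio`, `is_ticked`, `make_box`) and the values of `margin`, `dark_threshold` and `min_fill` are hypothetical choices for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, 0 = black, 255 = white


def ink_ratio(crop: Image, margin: int = 2, dark_threshold: int = 128) -> float:
    """Fraction of dark pixels inside the box, skipping a border of `margin`
    pixels so that the printed outline of the checkbox is not counted as ink."""
    inner = [row[margin:len(row) - margin] for row in crop[margin:len(crop) - margin]]
    pixels = [p for row in inner for p in row]
    if not pixels:
        raise ValueError("crop is too small for the chosen margin")
    dark = sum(1 for p in pixels if p < dark_threshold)
    return dark / len(pixels)


def is_ticked(crop: Image, min_fill: float = 0.05) -> bool:
    """Naive rule (hypothetical threshold): ticked if enough of the interior is inked."""
    return ink_ratio(crop) >= min_fill


def make_box(size: int = 10, ticked: bool = False) -> Image:
    """Synthetic test crop: a black square outline, optionally with a diagonal stroke."""
    img = [[255] * size for _ in range(size)]
    for i in range(size):
        img[0][i] = img[size - 1][i] = img[i][0] = img[i][size - 1] = 0  # outline
    if ticked:
        for i in range(2, size - 2):
            img[i][i] = 0  # crude "tick" drawn inside the box
    return img


if __name__ == "__main__":
    print(is_ticked(make_box()))             # empty box
    print(is_ticked(make_box(ticked=True)))  # box with a stroke
```

A hand-tuned rule like this is easy to reason about, but it depends on fixed thresholds and on a clean, well-aligned crop, so it is likely to be fragile on real scans with noise, skew or crossed-out boxes.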
- (1/4) What are some good and bad ways to model an ML problem?
- (2/4) How can we fight dataset biases with synthetic data generation?
- (3/4) How can we combine and validate several ML models?
- (4/4) How can we explain and interpret our ML model predictions?
Happy reading!