Kamanda Wycliffe - Freelance data analyst and data scientist


Machine Learning

Aim

By anticipating key change agents and potential future business scenarios, machine learning and AI enable enterprises to forecast market trends, better understand a challenging environment, and respond appropriately, giving them an early advantage. The goal is to predict the outcome with high precision.

Result

Results are presented in an interactive report. Resulting artifacts, such as models, can be integrated into business processes, including client and business applications, to help drive decisions.

Project Duration

Project duration ranges from 2 weeks to 3 months. To complete the project as quickly as possible, it is important that the relevant data is available, complete, and clean. Where necessary, we design project-specific ETL pipelines or use available data sources to curate relevant data for the project.


Case

An insurance provider wanted to improve insurance claim processing by quickly determining the legitimacy of the claims made. To provide better services, claims flagged as potentially fraudulent would be investigated by the company to ensure the correct decision is made.


WHAT INCIDENTS WERE REPORTED BY THE POLICY HOLDERS?

Most of the incidents reported to the company were related to vehicle collisions, including multiple-vehicle collisions with 49,206 cases and single-vehicle collisions with 26,608 cases. Fraudulent cases were found among the claims made. The highest proportion of fraud relative to the number of cases was recorded among Parked Car cases (19%).


HOW CAN WE PREDICT THE OUTCOME?

A machine learning model is designed to use the various features of a claim to predict whether the claim is likely to be legitimate or fraudulent. Each outcome is assigned a cost, and a probability threshold for treating a claim as legitimate is chosen so as to reduce the total cost associated with the predictions. It is desirable to reduce the number of legitimate claims wrongly flagged as fraudulent, i.e., to improve the precision of the model. Ranking claims by predicted probability provides the list of claims that need to be investigated further to ascertain their legitimacy.
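As a minimal sketch of how outcome costs can be turned into a threshold and a ranked investigation list: the code below assumes correct decisions cost nothing, so a claim is worth investigating when its fraud probability exceeds cost_fp / (cost_fp + cost_fn). The cost figures, claim IDs, probabilities, and function names are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class ScoredClaim:
    claim_id: str
    fraud_probability: float  # model output in [0, 1]


def cost_based_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Probability above which investigating is cheaper in expectation.

    Assumes correct decisions cost nothing. Flagging is worthwhile when
    p * cost_false_negative > (1 - p) * cost_false_positive.
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def claims_to_investigate(claims, threshold):
    """Keep claims at or above the threshold, most suspicious first."""
    flagged = [c for c in claims if c.fraud_probability >= threshold]
    return sorted(flagged, key=lambda c: c.fraud_probability, reverse=True)


# Hypothetical scored claims and costs, for illustration only.
claims = [
    ScoredClaim("A-1", 0.05),
    ScoredClaim("A-2", 0.62),
    ScoredClaim("A-3", 0.31),
    ScoredClaim("A-4", 0.18),
]
threshold = cost_based_threshold(cost_false_positive=200.0, cost_false_negative=800.0)
queue = claims_to_investigate(claims, threshold)

print(f"threshold = {threshold:.2f}")
for claim in queue:
    print(claim.claim_id, claim.fraud_probability)
```

With these made-up costs, a missed fraud is four times as expensive as an unnecessary investigation, so the threshold falls to 0.20 and two of the four claims are queued for review.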

Depending on the objective set, the processed data is used to train and tune a model for the best achievable performance. Model training and optimization are among the most important steps in ensuring a high-performing end product.

A high-performing model that has been back-tested using historical as well as simulated data is then deployed to a production environment, e.g., a dashboard, to generate live predictions for decision-making.

Selecting a model for deployment depends on multiple factors. The best model is often the one that meets the multiple selection criteria specified for the given business case. The confusion matrix, together with the time taken to run the model, is considered the basic unit of model evaluation.
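The two evaluation inputs named above can be sketched in a few lines. This is an illustrative example only: the labels, scores, and the stand-in `predict` function (a fixed 0.5 cutoff) are hypothetical, with 1 meaning fraudulent and 0 meaning legitimate.

```python
import time


def confusion_counts(y_true, y_pred):
    """Count the four cells of a binary confusion matrix (1 = fraudulent)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["tn" if actual == 0 else "fn"] += 1
    return counts


def precision(counts):
    """Share of flagged claims that were actually fraudulent."""
    flagged = counts["tp"] + counts["fp"]
    return counts["tp"] / flagged if flagged else 0.0


def recall(counts):
    """Share of fraudulent claims that were flagged."""
    fraudulent = counts["tp"] + counts["fn"]
    return counts["tp"] / fraudulent if fraudulent else 0.0


def timed_predictions(predict, rows):
    """Run predict on every row and report the elapsed wall-clock seconds."""
    start = time.perf_counter()
    predictions = [predict(row) for row in rows]
    return predictions, time.perf_counter() - start


# Hypothetical data: true labels and model scores for eight claims.
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.3, 0.8]


def predict(score):
    """Stand-in for a trained model: flag when the score reaches 0.5."""
    return 1 if score >= 0.5 else 0


y_pred, elapsed_seconds = timed_predictions(predict, scores)
counts = confusion_counts(y_true, y_pred)

print(counts)
print("precision:", precision(counts), "recall:", recall(counts))
print(f"run time: {elapsed_seconds:.6f} s")
```

Comparing candidate models on the same held-out claims with these counts and timings gives a common basis for the selection criteria of a given business case.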

Minimizing specific error types in binary classification problems, such as false negatives or false positives, becomes increasingly important as more machine learning is built into existing products.
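One way to target a specific error type is to pick the decision threshold from labeled data. The sketch below, with hypothetical labels and scores, caps false negatives by requiring a minimum recall and then takes the highest threshold that satisfies it, since raising the threshold can only reduce false positives.

```python
def threshold_for_recall(y_true, scores, min_recall):
    """Highest score threshold whose recall is at least min_recall.

    A claim is flagged when its score is >= the threshold (1 = fraudulent).
    """
    if not 0.0 < min_recall <= 1.0:
        raise ValueError("min_recall must be in (0, 1]")
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("need at least one fraudulent example")
    # Try candidate thresholds from strictest to most lenient.
    for threshold in sorted(set(scores), reverse=True):
        true_positives = sum(
            1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold
        )
        if true_positives / positives >= min_recall:
            return threshold
    return min(scores)  # flagging everything always reaches full recall


def false_positives_at(y_true, scores, threshold):
    """Legitimate claims that would be flagged at this threshold."""
    return sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)


# Hypothetical validation data, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.3, 0.8]

for target in (0.75, 1.0):
    t = threshold_for_recall(y_true, scores, target)
    print(f"recall >= {target}: threshold {t}, "
          f"false positives {false_positives_at(y_true, scores, t)}")
```

On this toy data, demanding that no fraudulent claim is missed lowers the threshold and admits one false positive, which illustrates the trade-off between the two error types.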
