How to understand if your ML model is wrong and how to fix it?

Machine learning is a branch of artificial intelligence that predicts outcomes based on previous data. It allows a computer to learn from examples without being explicitly programmed for each task.

Machine learning models can be grouped in many ways, but they broadly fall into three types: supervised learning, unsupervised learning, and reinforcement learning.

For example, the first step of intelligent document processing is classifying the type of document, which relies on OCR technology backed by machine learning algorithms.

Let’s look at how to identify if the ML model is wrong and what you can do to fix it.

While training an ML model, we use a set of historical data to help the model learn the relationship between the features of the input data and the predicted output. But even if the model accurately predicts the output for the historical data, how can we be sure it will work just as well on new data sets?

The first step in determining whether an ML model is right for you is to check it for the High Bias and High Variance scenarios.

A High Bias scenario means the model is underfitting the example dataset. In this scenario, your model has not learned the actual relationship between the input data and the output, and it often predicts wrong outputs.

Your model is wrong if its output has a high error, measured, for example, as the difference between the actual values and the values the ML model predicts.
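One common way to put a number on that difference is the mean absolute error. The sketch below is only an illustration: the function name and the sample values are made up for this example, and other error measures would work just as well.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Illustrative values only, not real measurements.
actual = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 12.0, 13.0, 24.0]

mae = mean_absolute_error(actual, predicted)
print(mae)  # 1.75
```

The larger this number is relative to the scale of your outputs, the further the model's predictions are from reality.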

A High Variance scenario refers to an overfitting ML model, the opposite of High Bias. In this scenario, your model predicts highly accurate outputs for the data set it was trained on. Even though that seems like a good thing, it is a reason for concern. An overfitting ML model may fail to generalize to future datasets and predict bad outcomes. Your model might work great on the existing data sets, but you cannot be sure how it will work on future ones.

To determine whether your model has High Variance or High Bias, you can apply a train-test split to your example dataset. Split the dataset in a 70-30 ratio: train your model on 70% of the data, then use the remaining 30% to measure the error rate. If your ML model shows a high error rate on both the 70% train dataset and the 30% test dataset, it indicates High Bias, and the model is underfitting.

If your ML model shows low errors on the 70% train dataset but a high error rate on the 30% test dataset, it is a scenario of High Variance. Your model was not able to generalize to the test dataset.

If your model shows low errors on both the train and test datasets, it indicates a good balance of bias and variance, and the ML model is the right one for you.
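The three cases above can be sketched in a few lines of plain Python. Treat this as a rough illustration: the helper names are hypothetical, and the max_acceptable_error cutoff is an assumption, since what counts as a "high" error rate depends on your problem.

```python
import random

def train_test_split(rows, train_ratio=0.7, seed=0):
    """Shuffle a copy of rows and split it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def diagnose(train_error, test_error, max_acceptable_error=0.1):
    """Map train and test error rates to the three cases described above.

    max_acceptable_error is a hypothetical cutoff; choose one for your problem.
    """
    if train_error > max_acceptable_error:
        # The model cannot even fit the data it was trained on.
        return "high bias (underfitting)"
    if test_error > max_acceptable_error:
        # Fits the training data but fails on unseen data.
        return "high variance (overfitting)"
    return "balanced"

train_rows, test_rows = train_test_split(range(100))
print(len(train_rows), len(test_rows))  # 70 30

print(diagnose(0.30, 0.32))  # high bias (underfitting)
print(diagnose(0.02, 0.25))  # high variance (overfitting)
print(diagnose(0.03, 0.04))  # balanced
```

In practice you would compute the two error rates by evaluating your trained model on train_rows and test_rows, then pass them to diagnose.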

High accuracy doesn’t always mean the ML model is suitable. When a model classifies inputs as positive or negative and one class heavily outnumbers the other, it can learn to always predict the majority class and still achieve a high accuracy rate.
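A tiny example makes the problem concrete. The 95-to-5 class ratio below is an assumed figure chosen for illustration, and the "model" is just a stand-in that predicts positive no matter what it is given.

```python
# 95 positive and 5 negative examples: a heavily imbalanced dataset.
labels = [1] * 95 + [0] * 5

# A "model" that ignores its input and always predicts the positive class.
predictions = [1] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
negatives_found = sum(p == 0 and y == 0 for p, y in zip(predictions, labels))

print(accuracy)         # 0.95
print(negatives_found)  # 0
```

The model scores 95% accuracy without ever identifying a single negative example, so accuracy alone says very little here.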

In such cases, Precision and Recall give a clearer picture than accuracy of how well the model actually identifies the positive class.

Precision measures how many of the model’s positive predictions are correct. You can calculate it as the number of True Positives divided by the sum of True Positives and False Positives.

Recall measures how many of the actual positives the model finds. You can calculate it as the number of True Positives divided by the sum of True Positives and False Negatives.

When only a small share of the model’s positive predictions are actually positive, it is a case of Low Precision. When the ML model misses most of the actual positives, it is a case of Low Recall.
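Both metrics follow directly from the counts of True Positives, False Positives, and False Negatives. Here is a minimal sketch, assuming labels are encoded as 1 for positive and 0 for negative; the sample labels are invented for the example.

```python
def precision_recall(actual, predicted):
    """Return (precision, recall) for binary labels where 1 means positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # True Positives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # False Positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # False Negatives
    # Guard against division by zero when there are no positives at all.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative labels: 4 actual positives, 6 actual negatives.
actual    = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
predicted = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

precision, recall = precision_recall(actual, predicted)
print(precision)  # 0.5
print(recall)     # 0.75
```

Here the model finds three of the four actual positives (recall 0.75), but only half of its six positive predictions are right (precision 0.5).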

The Demerits of the Wrong ML Model

Using the wrong ML model results in higher error rates. Your model may predict false positives, or it may miss the actual positive outcomes.

How to Fix the Wrong ML Model?

There are several strategies to fix a wrong ML model, depending on whether it shows High Bias, High Variance, or an imbalance between Precision and Recall.

For instance, in the case of High Bias, you can increase the number of features of the input data. An underfitting ML model results in a high error rate on both the test and train datasets. If you plot the model error against the number of input features, the error typically falls as informative features are added, giving a better-fitting ML model.
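To see the effect on a toy problem, the sketch below compares a model with no input features (it always predicts the average) against one that uses a single feature. The data, the 7-to-3 split, and the helper names are all assumptions made for this illustration, not a recipe for real datasets.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def fit_constant(xs, ys):
    """A model with no input features: always predict the mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """A model with one input feature, fitted by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

# Toy data where the output really depends on x: y = 2x + 1.
xs = list(range(10))
train_x = [x for x in xs if x % 3 != 2]  # 7 points for training
test_x = [x for x in xs if x % 3 == 2]   # 3 points for testing
train_y = [2 * x + 1 for x in train_x]
test_y = [2 * x + 1 for x in test_x]

results = {}
for name, fit in (("no features", fit_constant), ("one feature", fit_linear)):
    model = fit(train_x, train_y)
    train_error = mse(train_y, [model(x) for x in train_x])
    test_error = mse(test_y, [model(x) for x in test_x])
    results[name] = (train_error, test_error)
    print(f"{name}: train error {train_error:.3f}, test error {test_error:.3f}")
```

The featureless model shows high error on both train and test data, the signature of High Bias, while adding the one relevant feature brings both errors close to zero.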

Similarly, in the case of High Variance, you can decrease the number of features of the input data. An overfitting ML model might result from using too many features. Decreasing the number of features makes the model simpler, which helps it generalize to future datasets.

Another way to help an ML model generalize is to increase the number of training examples.

In the case of Low Recall or Low Precision, it helps to adjust the probability threshold that separates the negative class from the positive class.

Increasing the probability threshold makes the model predict the positive class less often, which helps in the case of Low Precision. On the other hand, reducing the probability threshold makes it predict the positive class more often, which helps in the case of Low Recall.
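The trade-off is easy to see by sweeping the threshold over a handful of predicted probabilities. The scores, labels, and threshold values below are invented for illustration, and the helper names are hypothetical.

```python
def classify(scores, threshold):
    """Predict positive (1) when the score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]

def precision_recall(actual, predicted):
    """Return (precision, recall) for binary labels where 1 means positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative model scores (predicted probability of the positive class).
actual = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

sweep = {}
for threshold in (0.35, 0.5, 0.75):
    sweep[threshold] = precision_recall(actual, classify(scores, threshold))
    print(threshold, sweep[threshold])
# 0.35 (0.8, 1.0)
# 0.5 (0.75, 0.75)
# 0.75 (1.0, 0.5)
```

As the threshold rises, precision climbs from 0.8 to 1.0 while recall drops from 1.0 to 0.5, so the right setting depends on which kind of error costs you more.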

By iterating on these changes, you can find the right balance between Precision and Recall and between Bias and Variance.

Conclusion

Even after you train your ML model to predict the right outcome for a dataset, you can still end up with a wrong model that produces higher error rates.

Bias, Variance, Precision, and Recall are four measures that help determine whether an ML model is suitable for you. KlearStack machine learning solutions ensure the right ML model for your requirement by offering the right balance between these four.

Ashutosh Saitwal
www.klearstack.com/

Ashutosh is the founder and director of the award-winning KlearStack AI platform. You can catch him speaking at NASSCOM events around the world, where he is an evangelist for RPA, AI, Machine Learning and Intelligent Document Processing.