Adaptive Regression

We recently put through our observation on Regression Problem in our research. This post is a nonformal attempt to explain it.
regression
research
analysis
Author

Jitin Kapila

Published

May 1, 2018

Adapting path through mountains! Photo by Zülfü Demir📸

Adapting path through mountains! Photo by Zülfü Demir📸

Introduction

Here I am trying to express our logic to find such Observation. Lets dive in.

There are different value estimation technique like regression analysis and time-series analysis. Everyone of us has experimented on regression using OLS ,MLE, Ridge, LASSO, Robust etc., and also might have evaluated them using RMSE (Root Mean/Median Square Error), MAD (Mean/Median Absolute Deviation), MAE (Mean / Median Absolute Error) and MAPE (Mean/Median Absolute Percentage Error), etc…

But all of these gives a single point estimate that what is the overall error looks like. Just a different thought!! can we be sure that this single value of MAPE or MAE? How easy it is to infer that our trained model has fitted well across the distribution of dependent variable?

Plot of Anscombe’s Quartet

Plot of Anscombe’s Quartet

Some Descriptive Stats for Anscombe’s Quartet

Some Descriptive Stats for Anscombe’s Quartet

Let me give you a pretty small data-set to play with “The Anscombe’s quartet”. This is a very famous data-set by Francis Anscombe. Please refer the plots below to understand the distribution of y1, y2, y3, y4. Isn’t it different?

Would the measure of central tendency and disportion be same for this data? I am sure none of us would believe but to our utter surprise we see all the descriptive stats are kind of same. Don’t believe me !!! Please see the results below ( Source: Wikipedia ):

So what we do Now!

Astonished !!! Don’t be. This is what has been hiding behind those numbers. And this is why we really won’t be able to cross certain performance level. Unless you change some features or even do a lot of hyper parameter tuning, your results won’t vary much.

If you look at the average value of MAPE in each decile you would see an interesting pattern. Let us show you what we see that pattern. One day while working on a business problem where I was using regression on a discussion with Kumarjit, we deviced a different way of model diagnosis. We worked together to give this a shape and build on it.

As you can see it is absolutely evident that either of the side in the distribution of MAPE values is going wild!!!!!!! Still overall MAPE is good (18%).

Seeking Scope of Improvement

We worked together to build a different framework to address such issues on the go and reduce the MAPE deterioration on the edge of the distribution.

This problems gives rise to a concept we named as Distribution Assertive Regression (DAR).

DAR is a framework that is based on cancelling the weakness of one point summaries by using the classical concepts of Reliability Engineering : The Bath Tub Curve.

Plot for Classical Bath Tub Curve using a Hazard Function

Plot for Classical Bath Tub Curve using a Hazard Function

The Specialty of this curve is that it gives you the likelihood which areas one tends to have high failure rates. In our experiments when we replace failure with MAPE value and the Time with sorted (ascending) value of target / dependent variable, we observe the same phenomenon. This is likely to happen because most of regression techniques assumes Normal (Gaussian) Distribution of data and fits itself towards the central tendency of this distribution.

Because of this tendency, any regression methods tends to learn less about data which are away from the central tendency of the target.

Lets look at BostonHousing data from “mlbench” package in R.

Plot for MAPE Bath Tub Curve for Decile Split “mdev” from Data

Plot for MAPE Bath Tub Curve for Decile Split “mdev” from Data

Here the MAPE is calculated for each decile split of ordered target variable. As you can observe it is following the bath tub curve. Hence the validates our hypothesis that the regression method is not able to understand much about the data at the either ends of the distribution.

Final Analysis

Now the DAR framework essentially fixes this weakness of regression method and understands the behavior of data which is stable and can be tweak in a fashion that can be use in general practice.

Plot of MAPE Bath Tub Curve after applying DAR Framework for Decile Split “mdev” from Data

How this framework with same method reduced MAPEs so much and made model much more stable…?? Well here it is:

The DAR framework splits the data at either ends of the order target variable and performs regression on these “split” data individually. This inherently reduces the so called “noise” part of the data and treat it as an individual data.

Scoring on New Data

Now you might be thinking while applying regression this sounds good but how will one score this on new data. Well to answer that we used our most simple yet very effective friend “KNN” (Though any multiclass Classifier can be used here). So ideally scoring involves two step method :

  1. Score new value against each KNN / Multiclass Classifier model of the data
  2. Based on closeness we score it with the regression method used for that part of data.

So now we know how we can improve the prediction power of data for regression.

Code and Flowchart

If things are simple lets keep it simple. Refer flowchart and code below for implementation of this framework. Paper here!

graph TB
    
    subgraph Testing
        p1(Finding bucket of model to choose)
        p1 --> p2([Making predictions <br> based on selected model for inference])
        p2 --> p3(Consolidate final score of prediction)
    end

    subgraph Training
        md([Fitting a <br>Regression model])==> di
        di{Binning Data via <br/> evaluating Distribution <br/> MAPE values }
        di --> md2([Fitting a Buckteing model <br/> to Binned MAPE Buckets])
        md2 --> md3([Fitting Regression <br> Models on Binned Data])
        md == Keeping main<br/>model ==> ro        
        md3 ==> ro(Final Models <br> Binning Data Models + <br> Set of Regressoin Models)
    end

    
    od([Data Input]) -- Training<br> Data--> md
    od -- Testing<br> Data--> p1
    ro -.-> p1
    ro -.-> p2

    classDef green fill:#9f6,stroke:#333,stroke-width:2px;
    classDef yellow fill:#ff6,stroke:#333,stroke-width:2px;
    classDef blue fill:#00f,stroke:#333,stroke-width:2px,color:#fff;
    classDef orange fill:#f96,stroke:#333,stroke-width:4px;
    class md,md2,md3 green
    class di orange
    class p1,p2 yellow
    class ro,p3 blue