How do you identify and deal with multicollinearity in your models?

The correlation coefficient measures the strength of the linear relationship between two variables. The easiest way to find it in R is the cor() function, for example cor(GaData), where GaData is the name of your dataset. The correlation coefficient, denoted r, ranges from -1 to +1: its sign tells you whether the relationship is positive or negative, and its absolute value tells you how strong the relationship is. When r is greater than zero, the relationship is positive; when r is less than zero, it is negative. A value of zero indicates that there is no linear relationship between the two variables, although a non-linear relationship may still exist.

In short, the correlation between two variables ranges from -1 to 1.

If the value is
> 0: positive relationship
< 0: negative relationship
= 0: no linear relationship

As you can see in the output below, history_avg_timeonsite has a much higher correlation with today_avg_timeonsite (about 0.60) than with today_sessions (about 0.05).

                        today_sessions  today_avg_timeonsite
history_avg_timeonsite    0.0461329219           0.595848332
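To make the reading of r concrete, here is a minimal sketch on made-up data (the toy data frame and its column names are hypothetical and are not part of the GaData set). It shows a perfectly positive, a perfectly negative, and a zero correlation, and how to pull one coefficient out of the matrix that cor() returns.

```r
# Toy data, made up for illustration only (not the GaData set from this post)
toy <- data.frame(
  x      = c(1, 2, 3, 4, 5),
  y_pos  = c(2, 4, 6, 8, 10),  # rises exactly in step with x
  y_neg  = c(10, 8, 6, 4, 2),  # falls exactly as x rises
  y_flat = c(2, 1, 0, 1, 2)    # V-shaped: related to x, but with no linear trend
)

# Pearson correlation matrix for all pairs of columns
r <- cor(toy)
print(round(r, 2))

# Single coefficients can be read out by row and column name
r["x", "y_pos"]   # 1 (up to floating-point rounding)
r["x", "y_neg"]   # -1 (up to floating-point rounding)
r["x", "y_flat"]  # 0 (up to floating-point rounding)
```

Note that y_flat clearly depends on x, yet its correlation with x is zero. This is why a zero coefficient should be read as "no linear relationship" rather than "no relationship at all".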

Apart from this, you can also plot the correlation matrix as a graph using the corrplot package:

# Draw the correlation graph
library(corrplot)  # install once with install.packages("corrplot")
M <- cor(GaData)   # GaData is the dataset name; cor() needs numeric columns
corrplot(M, method = "circle")

So, by looking at the correlation values, you can drop one variable from each pair of highly correlated predictors, refit the model, and check whether its accuracy holds up.
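One possible way to carry out this step is sketched below on simulated data. Everything here is an assumption made for illustration: the variables x1, x2, x3 and y are invented, the 0.8 cutoff is arbitrary, and adjusted R-squared merely stands in for whatever accuracy measure you actually use.

```r
# Simulated data, made up for illustration only
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)  # nearly a copy of x1, so the two are collinear
x3 <- rnorm(n)                 # independent of x1 and x2
y  <- 2 * x1 + 3 * x3 + rnorm(n)
df <- data.frame(y, x1, x2, x3)

# Correlation matrix of the predictors only
r <- cor(df[, c("x1", "x2", "x3")])

# List each predictor pair whose absolute correlation exceeds the cutoff
cutoff <- 0.8             # arbitrary illustrative threshold
r[!upper.tri(r)] <- 0     # keep each pair once and ignore the diagonal
high <- which(abs(r) > cutoff, arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(r)[high[, "row"]],
  var2 = colnames(r)[high[, "col"]],
  r    = r[high]
)
print(high_pairs)

# Fit the model with and without one variable of the flagged pair
full    <- lm(y ~ x1 + x2 + x3, data = df)
reduced <- lm(y ~ x1 + x3, data = df)

# Compare the fit of the two models
summary(full)$adj.r.squared
summary(reduced)$adj.r.squared
```

If the reduced model fits about as well as the full one, the dropped variable was adding little beyond what its correlated partner already provides.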

