Page 397 - Special Topic Session (STS) - Volume 4

STS2320 Ali S. H.
            experimental rule, a variable has severe skewness if the absolute value of its SC
            is greater than 2 and severe kurtosis if the absolute value of its KC is greater
            than 0.5.
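            The rule of thumb above can be checked mechanically. The Python sketch below is
            only an illustration under stated assumptions. The definitions of SC and KC are
            not repeated in this section, so the code assumes the moment-based sample
            skewness and the excess kurtosis (0 for a Normal distribution). The helper names
            `skewness_kurtosis` and `severity` are hypothetical. The cut-offs 2 and 0.5 are
            the ones stated in the text.

```python
import math

def skewness_kurtosis(x):
    """Moment-based sample skewness (SC) and excess kurtosis (KC).

    Assumed definitions: SC = m3 / m2**1.5 and KC = m4 / m2**2 - 3,
    where mk is the k-th central moment with divisor n.
    """
    n = len(x)
    mean = math.fsum(x) / n
    m2 = math.fsum((v - mean) ** 2 for v in x) / n
    m3 = math.fsum((v - mean) ** 3 for v in x) / n
    m4 = math.fsum((v - mean) ** 4 for v in x) / n
    sc = m3 / m2 ** 1.5
    kc = m4 / m2 ** 2 - 3.0  # excess kurtosis: 0 for a Normal distribution
    return sc, kc

def severity(x, sc_cut=2.0, kc_cut=0.5):
    """Flag severe skewness (|SC| > sc_cut) and severe kurtosis (|KC| > kc_cut)."""
    sc, kc = skewness_kurtosis(x)
    return {
        "SC": sc,
        "KC": kc,
        "severe_skewness": abs(sc) > sc_cut,
        "severe_kurtosis": abs(kc) > kc_cut,
    }

# Hypothetical data: nine ones and a single large value give strong right skew.
print(severity([1] * 9 + [20]))
```

            Because both cut-offs are keyword arguments, other thresholds can be tried
            without changing the code.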
                What should be done with variables that have severe skewness and/or
            kurtosis? One remedy is to apply the Box-Cox power transformation, which
            brings a variable with severe skewness and/or kurtosis closer to the Normal
            distribution.
            To be specific, one can replace the i-th value, $x_i$, by
            $y_i^{(\lambda)} = (x_i^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$. The parameter
            $\lambda$ is chosen such that the distribution of the transformed variable
            $Y^{(\lambda)}$ is close to Normal. One way to achieve this is to draw the Normal
            Probability Plot of $Y^{(\lambda)}$ and choose the value of $\lambda$ that makes the
            graph as linear as possible.
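            Choosing λ so that the Normal probability plot is "as linear as possible" can be
            sketched in Python as a grid search. The sketch measures linearity by the
            correlation between the sorted transformed values and standard Normal quantiles.
            Several details are assumptions and are not taken from the text:
              - the plotting positions (i - 0.5)/n;
              - the grid from -2 to 2 in steps of 0.1;
              - the function names `box_cox`, `qq_correlation` and `choose_lambda`, which
                are hypothetical.
            The data must be strictly positive. The code is a simple illustration, not the
            method of any particular software package.

```python
import math
from statistics import NormalDist

def box_cox(x, lam):
    """Box-Cox transform: (x**lam - 1) / lam, or log(x) when lam == 0.

    Requires strictly positive data.
    """
    if any(v <= 0 for v in x):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]

def qq_correlation(y):
    """Correlation between the sorted data and standard Normal quantiles.

    Plotting positions (i - 0.5) / n are an assumed convention.
    """
    n = len(y)
    ys = sorted(y)
    q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    my, mq = math.fsum(ys) / n, math.fsum(q) / n
    sxy = math.fsum((a - my) * (b - mq) for a, b in zip(ys, q))
    syy = math.fsum((a - my) ** 2 for a in ys)
    sqq = math.fsum((b - mq) ** 2 for b in q)
    return sxy / math.sqrt(syy * sqq)

def choose_lambda(x, grid=None):
    """Grid search for the lambda whose Normal Q-Q plot is most linear."""
    if grid is None:
        # Assumed default grid: -2.0, -1.9, ..., 2.0 (rounded so that 0.0 is exact).
        grid = [round(-2.0 + 0.1 * k, 1) for k in range(41)]
    return max(grid, key=lambda lam: qq_correlation(box_cox(x, lam)))
```

            For data whose logarithms are exactly Normal quantiles (log-Normal in shape),
            this search should return λ = 0, which corresponds to the log transformation.
            A finer grid trades speed for precision.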
            Interactive techniques such as sliders (see, e.g., the software package Data
            Desk) can be used to find the value of $\lambda$ that makes the plot most linear.
            Alternatively, the function “BoxCoxLambda” in the R package “DescTools”
            automatically determines the optimal parameter $\lambda$. Note that if the optimal
            value of $\lambda$ turns out to be zero, the optimal transformation is the log
            transformation, that is, $y^{(0)} = \log(x)$.
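            The statement that λ = 0 corresponds to the log transformation follows from a
            first-order expansion of $x^{\lambda} = e^{\lambda \log x}$ for $x > 0$. The
            expansion shows that the Box-Cox family is continuous in λ at 0:

```latex
y^{(\lambda)} = \frac{x^{\lambda} - 1}{\lambda}
  = \frac{e^{\lambda \log x} - 1}{\lambda}
  = \frac{\lambda \log x + \tfrac{1}{2}\lambda^{2} (\log x)^{2} + O(\lambda^{3})}{\lambda}
  = \log x + \tfrac{1}{2}\lambda (\log x)^{2} + O(\lambda^{2})
  \;\longrightarrow\; \log x \quad \text{as } \lambda \to 0 .
```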
                For example, Figure 1(a) shows the histogram of a variable X, which departs
            clearly from Normality, as indicated by SC = 5.079 (significantly positively
            skewed) and KC = 25.495 (a significantly heavy right tail). Being highly skewed
            with a heavy tail, the variable needs a transformation to achieve Normality.
            Figure 1(b) shows the Normal Q-Q plot of the variable X. Here $\lambda = 1$ means
            that no transformation is applied, since $y^{(1)} = x - 1$ only shifts the data.
            The points do not lie along a straight line, and the correlation between the
            sample quantiles and the theoretical quantiles (under Normality) is low
            (correlation = 0.544). Consistent with the histogram in Figure 1(a), the plot in
            Figure 1(b) shows a clear departure from Normality.
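            The claim that λ = 1 leaves the variable effectively untransformed can be
            illustrated in Python. With λ = 1 we get (x^1 - 1)/1 = x - 1, which only shifts
            the data, so the Normal Q-Q correlation does not change. The sample below is
            hypothetical: it uses exact quantiles of a standard exponential distribution to
            get a right-skewed shape. It does not reproduce the variable X or the
            correlation of 0.544 in Figure 1(b). The helper `qq_correlation` and its
            (i - 0.5)/n plotting positions are assumptions.

```python
import math
from statistics import NormalDist

def qq_correlation(y):
    """Correlation between sorted data and Normal quantiles at (i - 0.5)/n (assumed)."""
    n = len(y)
    ys = sorted(y)
    q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    my, mq = math.fsum(ys) / n, math.fsum(q) / n
    sxy = math.fsum((a - my) * (b - mq) for a, b in zip(ys, q))
    syy = math.fsum((a - my) ** 2 for a in ys)
    sqq = math.fsum((b - mq) ** 2 for b in q)
    return sxy / math.sqrt(syy * sqq)

# Hypothetical right-skewed sample: exact quantiles of a standard exponential.
n = 50
x = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]

# lambda = 1: (x**1 - 1) / 1 = x - 1, a pure shift of the data.
y1 = [(v ** 1 - 1.0) / 1 for v in x]

r_x, r_y1 = qq_correlation(x), qq_correlation(y1)
print(f"Q-Q correlation, raw data:   {r_x:.4f}")
print(f"Q-Q correlation, lambda = 1: {r_y1:.4f}")  # equal: a shift cannot remove skewness
```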

                         Figure 1. Box-Cox Transformation of the variable X

                                                               386 | I S I   W S C   2 0 1 9