Alexander Gutenkunst

Tech, Tricks and Thoughts

In this post, the effect of outliers in a data set on its mean and its median is visualized and briefly explained.


Drag the points to see the effect on the mean and median.

The mean of a set of $N$ values $x_1, \dots, x_N$ is usually defined as

$\bar{x} = \frac{1}{N} \sum\limits_{i=1}^N {x_i}$

One can think of this value, for example, as the centre of gravity of the points.

The median, on the other hand, is defined as the central value of an ordered(!) set of values. For example, if you have the set

$X = [3\; 5\; 2\; 1\; 9\; 23\; 7]$

the median would be $5$ because it is the value in the middle of the ordered set

$X_{sort} = [ 1\; 2\; 3\; \underline{5}\; 7\; 9\; 23 ]$.
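Both definitions can be checked on the example set with a few lines of Python. This is only a minimal sketch: the helper names `mean_by_definition` and `median_by_sorting` are made up for illustration, and the handling of an even number of values (averaging the two middle values) is a common convention that the definition above does not cover.

```python
def mean_by_definition(values):
    """Sum of all values divided by their count."""
    return sum(values) / len(values)


def median_by_sorting(values):
    """Middle value of the sorted list.

    Assumption: for an even count, average the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


X = [3, 5, 2, 1, 9, 23, 7]

print(sorted(X))              # [1, 2, 3, 5, 7, 9, 23]
print(median_by_sorting(X))   # 5
print(mean_by_definition(X))  # 50 / 7, roughly 7.14
```

Note that the mean of this set (about 7.14) is already larger than five of its seven values, because the single large value 23 pulls it upwards.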

To visualize the effect of outliers on both the mean and the median, I created a small interactive demo for you. As you can see, the mean is heavily influenced by the outlier to the right, whereas the median shows some robustness against it. You can drag the points to see the influence live.
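If you cannot run the demo, the same effect can be reproduced numerically. The sketch below is not the code behind the demo; it simply reuses the example set from above and moves its largest value further and further to the right. The helper `with_outlier` is a hypothetical name chosen for this illustration, while `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

X = [3, 5, 2, 1, 9, 23, 7]


def with_outlier(values, outlier):
    """Return a sorted copy in which the largest value is replaced by `outlier`."""
    moved = sorted(values)
    moved[-1] = outlier
    return moved


# Drag the right-most point further away and watch both statistics.
for outlier in (23, 230, 2300):
    moved = with_outlier(X, outlier)
    print(outlier, round(mean(moved), 2), median(moved))

# 23 7.14 5
# 230 36.71 5
# 2300 332.43 5
```

The mean follows the outlier, while the median stays at 5, since replacing the largest value does not change which value sits in the middle of the ordered set.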

Note that while the definition of the mean in higher dimensions is straightforward, the definition of the median in higher dimensions is a bit more complicated. But more on this soon...
