
Learn MATLAB Episode #27: Mean and Variance

In this lecture we’re going to talk about mean and variance. So how do we measure the distribution of continuous variables? We know that with discrete variables we can just count how often the random variable takes on a certain value and divide that count by the total number of samples, which gives us an estimate of that value’s probability. We can’t exactly do that with continuous variables, because we can’t count them and put them into buckets the way we can with discrete values. Instead, we measure certain characteristics of the distribution that tell us about the shape of its curve. Two common measures that you have probably already heard of are the mean and the variance, so first let’s talk about the mean.
The mean is the average: you add up all the values and divide by the total number of values, and that tells you roughly where the middle of the data is. So what I want you to do is load our random integer CSV back up. You already know how to calculate the sum of all the values, and we want to divide this by the total number of values. Let’s plot r again to remind ourselves what values it can take, so -5 to 5, and we get an average value of 0.35.
So why is this value so small when we generated values from -5 to 5? With a uniform distribution, values below zero are just as likely as values above zero, so a lot of the numbers end up canceling each other out when you sum them, and the mean comes out close to zero. Notice that we can never actually draw the value 0.35 from our distribution, because it only gives us uniformly distributed integers between -5 and 5. You could get -5, -4, -3, -2, -1, 0, 1, and so on all the way up to 5, but you would never get the value 0.35.
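As a rough sketch of the calculation just described, the snippet below computes the mean by hand and checks whether the result is a value the data could ever take. The vector R here is a made-up stand-in of ten integers between -5 and 5, not the contents of the lecture's CSV file, so its mean will differ from the 0.35 quoted above.

```matlab
% Hypothetical stand-in for the lecture's data: ten integers in [-5, 5].
% (The real values come from the CSV file, which is not reproduced here.)
R = [-5 -3 0 2 4 5 1 -1 3 -2];

% Mean by hand: sum of all values divided by the number of values.
mean_by_hand = sum(R) / length(R);

% Check whether the mean is one of the values that actually occurs in R.
is_possible_value = any(R == mean_by_hand);

fprintf('mean = %.4f, appears in data: %d\n', mean_by_hand, is_possible_value);
```

For this stand-in vector the mean works out to 0.4, which, like 0.35 in the lecture, is not a value that can be drawn from the integers -5 to 5.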
So it might seem odd that we also call the mean the expected value, even though we would never expect to draw that actual value. What we expect is that it will be the average of the values we do draw from this random distribution. Of course, there is an easier way to calculate the mean in MATLAB, and it’s just the function mean. Right, so, we get the same value.
Now let’s talk about the second measure of a distribution, the variance. We already talked about the mean, which is kind of like the middle value, so we could say that it measures the center of the distribution. Variance does a different thing: it measures the spread. So the mean tells us where the random variable is centered, and the variance tells us how much it spreads out from that middle value. The definition of the variance is the expected value of the squared difference between the random variable and its mean; that is, we subtract the mean, square the result, and take the expected value. Of course that doesn’t help us much, since it doesn’t tell us how to calculate or estimate the variance, but the estimate is very similar to how we do the mean.
So, let’s set the mean of R to be mean(R), like that. We can then calculate the variance of R by taking R minus the mean of R and doing the dot product of that difference with itself, which effectively squares each individual value and then adds the squares together. So it’s treating R as a vector and finding the dot product with itself after subtracting the mean. In other words, it’s like subtracting the mean and then taking the squared length, and of course we divide by the length of R. So we get 10.6475. Of course MATLAB has its own variance function, so we didn’t really need to do all this work, and it’s just var. So we get a pretty similar value.
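The dot-product calculation above can be sketched roughly as follows. As before, R is a made-up stand-in vector rather than the lecture's CSV data, and the variable names mu, d, var_by_hand and var_builtin are chosen here only for illustration, so the numbers will not match the 10.6475 from the lecture.

```matlab
% Hypothetical stand-in for the lecture's data: ten integers in [-5, 5].
R = [-5 -3 0 2 4 5 1 -1 3 -2];

% Center of the data, using the built-in mean function.
mu = mean(R);

% Deviations from the mean.
d = R - mu;

% dot(d, d) is the sum of squared deviations; dividing by the number of
% values gives the variance estimate described in the lecture.
var_by_hand = dot(d, d) / length(R);

% Built-in variance. By default MATLAB divides by n - 1 rather than n,
% so this is close to, but not identical to, var_by_hand.
var_builtin = var(R);

fprintf('by hand: %.4f, var(R): %.4f\n', var_by_hand, var_builtin);
```

With only ten values the gap between the two results is easy to see; with a larger data set the two numbers move closer together, which is why the lecture's results look "pretty similar".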
One thing that comes up in statistics, which is a little bit outside the scope of our discussion, is that if we divide by n minus 1 instead of by n, we get what we call an unbiased estimate of the variance, and this gives us exactly the value of MATLAB’s var(R).
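To make the difference between dividing by n and dividing by n minus 1 concrete, here is a small sketch on the same kind of made-up stand-in vector (again, not the lecture's CSV data). It relies on var(R) dividing by n - 1 by default and on var(R, 1) dividing by n.

```matlab
% Hypothetical stand-in for the lecture's data: ten integers in [-5, 5].
R = [-5 -3 0 2 4 5 1 -1 3 -2];

n = length(R);
d = R - mean(R);      % deviations from the mean
ss = sum(d .^ 2);     % sum of squared deviations

var_n = ss / n;              % divide by n
var_n_minus_1 = ss / (n - 1); % divide by n - 1 (unbiased estimate)

% Compare against the built-in function with both normalizations.
fprintf('divide by n:     %.4f   var(R, 1): %.4f\n', var_n, var(R, 1));
fprintf('divide by n - 1: %.4f   var(R):    %.4f\n', var_n_minus_1, var(R));
```

Each hand-computed value should agree with the corresponding built-in call up to floating-point rounding.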