
Learn MATLAB Episode #28: Gaussian (Normal) Distribution

In this lecture we’re going to talk about a special continuous distribution called the normal distribution, or the Gaussian distribution. It probably looks very familiar to you, since it is what most people refer to as the bell curve, and you’ve probably seen it in school, where bell curves are used to shift marks up or down based on how well students perform. The formula you see here is the PDF of the Gaussian distribution; notice how Wikipedia also uses the little f notation.

Last time we talked about how the mean and the variance are two special numbers that help us describe what a continuous distribution looks like. The interesting thing about the Gaussian distribution is that the mean and the variance completely describe its shape. The mean tells us where the center peak of the bell curve is, and the variance tells us how much the bell curve is spread out. You can see that this yellow curve is very spread out and the blue curve is not spread out much at all.

So let’s talk a little bit about this formula. First, there is a normalizing constant. It’s 1 over Sigma times the square root of 2 pi, where Sigma stands for the standard deviation. Usually it’s written as 1 over the square root of 2 pi Sigma squared, where Sigma squared goes inside the square root, so Sigma squared is the variance and Sigma is the standard deviation. The second part of the PDF is the exponential. We take X minus the mean, which we denote by mu, square that, divide it by 2 Sigma squared, or two times the variance, put a negative sign in front, and then exponentiate. Note that since we square the term that contains X, this PDF is symmetric: if you go some distance from the mean to the left, or that same distance to the right, you get the same value for the PDF.

So let’s do an exercise where we plot the values of a Gaussian curve from, say, -100 to 100. We’ll create a new function and call it my Gaussian.
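Before writing the function, it may help to have the formula described above written out in full. This is the standard form of the normal PDF, with mu as the mean and Sigma squared as the variance:

```latex
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```

Setting x equal to mu makes the exponential equal to 1, so the height of the peak is just the normalizing constant, 1 over the square root of 2 pi Sigma squared.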
The function will take in two distribution parameters, mu and sigma squared, together with the range and the number of points (min_x, max_x, and n), and it will output an array. n is the number of different values between min_x and max_x. We start our little x value at min_x. We also want to know how much to increment x on each iteration of the loop, so we’ll call that dx, and we’ll say it’s max_x minus min_x, divided by n. At the end of each loop iteration we add dx to x. We’ll call the array of PDF values f, and we’re going to return the array of x values as well, which we’ll call X. So X(i) is going to equal x, and f(i) is going to equal 1 over the square root of 2 pi sigma_sq, times the exponential of negative the quantity x minus mu, squared, divided by 2 times sigma squared.

So let’s do this for mu equals 0, sigma squared equals 1, from -10 to 10, with a thousand values in between. Now we can plot x and f, and we get this bell curve. The peak is at zero because that’s the mean, and most of the spread is from about -2 to 2, so the drop-off, or how fast f of x goes to 0, is pretty quick. You can see the maximum value is about 0.4.

Let’s try that again with a smaller variance, and smaller values for min_x and max_x as well. Let’s do 0.1 for sigma squared and plot it again. The drop-off is even quicker now: we get to about 0 at -1 and 1, and notice the peak value is above 1.2. So PDF values above 1 are allowed; it is the area under the curve, not the height, that has to equal 1.
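The function narrated above might look roughly like the sketch below. The transcript does not show the actual file, so the name my_gaussian, the argument order, and the preallocation with zeros are assumptions made for illustration.

```matlab
% my_gaussian.m
% Sketch of the function described in the lecture. The signature and the
% argument order (mu, sigma_sq, min_x, max_x, n) are assumed, not confirmed.
function [X, f] = my_gaussian(mu, sigma_sq, min_x, max_x, n)
    X = zeros(1, n);            % array of x values to return
    f = zeros(1, n);            % array of PDF values to return
    x = min_x;                  % start at the left edge of the range
    dx = (max_x - min_x) / n;   % how much to increment x on each iteration
    for i = 1:n
        X(i) = x;
        % normalizing constant times the exponential part of the PDF
        f(i) = 1 / sqrt(2 * pi * sigma_sq) * exp(-(x - mu)^2 / (2 * sigma_sq));
        x = x + dx;             % the last sample lands at max_x - dx
    end
end
```

With that file on the MATLAB path, the two experiments from the lecture could be run like this (the range -2 to 2 for the second call is an assumed choice, since the lecture only says the values were made smaller):

```matlab
[x, f] = my_gaussian(0, 1, -10, 10, 1000);
plot(x, f);

[x, f] = my_gaussian(0, 0.1, -2, 2, 1000);
plot(x, f);
```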