
Learn MATLAB Episode #29: Hypothesis Testing

In this lecture we’re going to talk more about the Gaussian distribution. So far we’ve generated data from uniform distributions, so the first thing we’ll ask is: how can we generate data from a normal distribution? There’s a function in MATLAB called randn that will do this. If you type in randn, it should show you its different function signatures. So let’s say we want to generate a thousand random normal values. This can stand in for any set of random values. If you download data from Kaggle, or some other data set that you want to do machine learning or statistics on, you want to know the distribution of your input. In this case, we want to know whether the distribution is normal or not.

One thing we could do to check that is just to plot the histogram. Alright, so you can see that it looks relatively normal. We can use more bins to get a finer granularity. We still see the same type of distribution even with 50 bins, and of course if we increase n, our histogram looks closer and closer to the Gaussian curve that we saw in the last lecture. Now of course in the real world your data is not going to look this nice and clean, and it may or may not be Gaussian distributed.

The question now is how do you test if your data is Gaussian distributed or not? We’re going to talk a little bit about hypothesis testing, but not too much, since it’s a pretty long and difficult subject. The basic idea with hypothesis testing is that you have two different hypotheses. In this lecture we’re not going to talk about the mechanics behind statistical testing; we’re going to work backwards and jump right into statistical tests that you can use in MATLAB right away.

The first is the Jarque-Bera test. In MATLAB it’s the function jbtest. You can use it to return a hypothesis and a p-value. The hypothesis will be whether or not your data is normally distributed, and the p-value tells you how strong the evidence behind that decision is. So let’s try this on the data that we just generated. Alright, so one thing that we need to talk about is how to interpret the return values of the Jarque-Bera test. When we talk about hypothesis testing, you’ll see that there is a null hypothesis and an alternative hypothesis. The null hypothesis here is that our data is normally distributed, and the Jarque-Bera test will return 1 if we reject the null hypothesis. So if we pass in r, we get h equal to 0, meaning we stay with the null hypothesis, and a p-value of 0.5, which means we do not reject the null hypothesis that our data is normally distributed.

There is another test called the Kolmogorov-Smirnov test, which essentially does the same thing. The null hypothesis again is that the data comes from a normal distribution (by default, kstest compares against a standard normal), and the result h will be 1 if you reject the null hypothesis, so let’s try that. Same thing with kstest: this also does not reject the null hypothesis that the data is normally distributed.

So now let’s do something interesting: let’s generate some random data that we know is not normally distributed. We’ll draw this data from a uniform distribution. Now I want to do jbtest on the uniform data. We get h equal to 1, which means we do reject the null hypothesis, and our p-value is 0.001. A result is usually considered significant when the p-value is less than five percent. So what this is basically saying is that our uniformly generated data is not Gaussian distributed, which we already know is true. We can do the same test with Kolmogorov-Smirnov, and this also rejects the null hypothesis, with a much smaller p-value. Alright, and so this is how you can test if your data is normally distributed or not.
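The steps from this lecture can be sketched as a short script. This is only an illustration: it assumes the Statistics and Machine Learning Toolbox is installed (for jbtest and kstest), and the variable names r, u and the h_/p_ outputs are made up for the example. The exact p-values will differ from run to run because the data is random.

```matlab
% Sketch: normality tests on normal vs. uniform data.
% Assumes the Statistics and Machine Learning Toolbox (jbtest, kstest).
n = 1000;
r = randn(n, 1);   % samples from a standard normal distribution
u = rand(n, 1);    % samples from a uniform distribution on (0, 1)

% For both tests the null hypothesis is "the data is normally distributed".
% h = 0 means we do not reject it, h = 1 means we reject it.
[h_jb_r, p_jb_r] = jbtest(r);
[h_ks_r, p_ks_r] = kstest(r);   % compares against a standard normal by default
[h_jb_u, p_jb_u] = jbtest(u);
[h_ks_u, p_ks_u] = kstest(u);

fprintf('normal data:  jbtest h=%d (p=%.3f), kstest h=%d (p=%.3g)\n', ...
        h_jb_r, p_jb_r, h_ks_r, p_ks_r);
fprintf('uniform data: jbtest h=%d (p=%.3f), kstest h=%d (p=%.3g)\n', ...
        h_jb_u, p_jb_u, h_ks_u, p_ks_u);
```

To look at the data first, hist(r, 50) plots the histogram with 50 bins. As far as I know, jbtest only reports p-values inside a tabulated range and warns when the true value falls outside it, which would be consistent with the 0.5 and 0.001 seen in the lecture.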


Learn MATLAB Episode #28: Gaussian (Normal) Distribution

In this lecture we’re going to talk about a special continuous distribution called the normal distribution, or the Gaussian distribution. It probably looks very familiar to you since it is what most people refer to as the bell curve, and you’ve probably seen this in school, where bell curves are used to shift marks up or down based on how well students perform. This formula you see here is the PDF of the Gaussian distribution; notice how they also use the little f notation on Wikipedia.

Last time we said that the mean and the variance are two special numbers that help us describe what a continuous distribution looks like. The interesting thing about the Gaussian distribution is that the mean and the variance completely describe the shape of the distribution. The mean tells us where the center peak of the bell curve is, and the variance tells us how much that bell curve is spread out. So you can see this yellow curve is very spread out, and the blue curve is not spread out that much.

So let’s talk a little bit about this formula. First, there is a normalizing constant. It’s 1 over the square root of 2 pi, times sigma, where sigma stands for the standard deviation, not the variance. Usually it’s written as 1 over the square root of 2 pi sigma squared, where sigma squared goes inside the square root, so sigma squared is the variance and sigma is the standard deviation. The second part of the PDF is this exponential. We take x minus the mean, which we denote by mu, square that, divide it by 2 sigma squared, or two times the variance, take the negative, and then we exponentiate that. Note that since the term containing x is squared, this PDF is symmetric: if you go a distance from the mean to the left, or that same distance to the right, you will get the same value for the PDF.

So let’s do an exercise where we plot the values of a Gaussian curve from, say, -100 to 100. We’ll create a new function and call it my_gaussian. It will take in the parameters mu and sigma squared, along with min_x, max_x, and n, and it will output an array. So n will be the number of different values between min_x and max_x. We’re going to start our little x value at min_x. Then we want to know how much to increment x on each iteration of the loop, so we’ll call that dx, and we’ll say it’s max_x minus min_x, divided by n. At the end of each loop iteration we’re going to add dx to x. We’re going to call the output f. We’re going to return the array of x values also, so the function returns both X and f. Inside the loop, X(i) is going to equal x, and f(i) is going to equal 1 over the square root of 2 pi sigma_sq, times the exponential of the negative of x minus the mean, squared, divided by 2 times sigma squared.

So let’s do this for mu equals 0, sigma squared equals 1, let’s say from -10 to 10, with a thousand values between them. Okay, so now we can plot x and f, and we get this bell curve. The peak is at zero because that’s the mean, and the curve is spread out from about -2 to 2, so the drop-off, or how fast f of x goes to 0, is pretty quick. You can see the maximum value is about 0.4. Let’s try that again with a smaller variance, and some smaller values also for min_x and max_x. So let’s do 0.1 for sigma squared, and let’s plot it again. Alright, so the drop-off is even quicker now, where we get to about 0 at -1 and 1, and notice the peak value is above 1.2. That is fine, since PDF values above 1 are allowed.
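The loop described above might look roughly like the following. It is written as a plain script rather than a function file so it can be pasted straight into the command window, which is a choice made for this sketch and not necessarily how the lecture’s my_gaussian file is laid out. The names xi, peak and area are illustrative.

```matlab
% Sketch of the Gaussian PDF loop, written inline instead of as a function.
mu = 0;          % mean
sigma_sq = 1;    % variance
min_x = -10;
max_x = 10;
n = 1000;        % number of x values

x = zeros(n, 1);
f = zeros(n, 1);
dx = (max_x - min_x) / n;   % step between consecutive x values
xi = min_x;                 % current x value
for i = 1:n
    x(i) = xi;
    f(i) = 1 / sqrt(2 * pi * sigma_sq) * exp(-(xi - mu)^2 / (2 * sigma_sq));
    xi = xi + dx;
end

peak = max(f);       % height of the bell curve at the mean
area = sum(f) * dx;  % Riemann-sum estimate of the area under the curve
```

Calling plot(x, f) afterwards draws the bell curve. With sigma_sq set to 1 the peak is 1/sqrt(2*pi), about 0.4, and with sigma_sq set to 0.1 it rises to about 1.26, while the area under the curve stays at about 1 in both cases.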


Learn MATLAB Episode #27: Mean and Variance

In this lecture we’re going to talk about mean and variance. So how do we measure the distribution of continuous variables? We know that with discrete variables we can just check how often the random variable takes on a certain value and divide that by the total number of values to give an approximation, or an estimate, of that value’s probability, but we can’t exactly do that with continuous variables. We can’t really count them and put them in buckets like discrete variables. So we need to measure certain characteristics of the distribution that might tell us the shape of its curve. Two common measures that you have probably already heard of are the mean and the variance.

First let’s talk about the mean. The mean is like the average. You add up all the different values and you divide by the total number of values, and that kind of tells you what the middle value would be. So what I want you to do is to load back up our random integer CSV. You know how to calculate the sum of all the values, and we want to divide this by the total number of values. Let’s just plot r again to remind ourselves what values it can take, so -5 to 5, and we get an average value of 0.35. So why is this value so small? Because we generated values from -5 to 5. With a uniform distribution, the distribution is balanced between values less than zero and values greater than zero, so a lot of the numbers end up canceling each other out when you sum them, and therefore our mean value is about zero.

Notice that we can never actually get the value 0.35 from our distribution, because our distribution gave us uniformly distributed numbers between -5 and 5, but only with discrete values. So you could get negative 5, negative 4, negative 3, negative 2, negative 1, 0, 1, all the way up to 5, but you would never get the value 0.35. So it might seem odd that we also call the mean the expected value, even though we would never expect to get that actual value. We only expect it to be the average of the values that we do draw from this random distribution. Of course there is an easier way to calculate the mean in MATLAB, and it’s just the function mean. Right, so we get the same value.

Now let’s talk about the second measure of a distribution, the variance. We already talked about the mean, which is kind of like the middle value, so we could say that it measures the centeredness of the distribution: where the middle is, where the center is. Variance does a different thing: it measures the spread. So the mean tells us where the random variable is centered, and the variance tells us how much it’s spread out from that middle value. The definition of the variance is the expected value of the squared difference between the random variable and its mean. Of course that doesn’t help us much, since it doesn’t tell us how to calculate or estimate the variance, but it’s very similar to how we do the mean.

So let’s set the mean of r to be mean(R), like that. We can then calculate the variance of R by taking R minus the mean of R, and taking the dot product of that with itself, which is effectively squaring all the individual values and then summing them. So it’s treating R as a vector, and then finding the dot product with itself after subtracting the mean. In other words, it’s like subtracting the mean and then taking the squared distance, or the squared length, and of course we divide by the length of R. So we get 10.6475. Of course MATLAB has its own variance function, so we didn’t really need to do all this work, and it’s just var. So we get a pretty similar value.

One thing that comes up in statistics, which is a little bit outside the scope of our discussion, is that if instead of dividing by n we divide by n minus 1, that gives us what we call an unbiased estimate of the variance, and this gives us exactly the value of MATLAB’s var(R).
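Here is a minimal sketch of the manual calculations compared with the built-in functions. Since the CSV file from the lecture is not named here, the script generates stand-in data with randi; that substitution, and the variable names, are assumptions made only for this example.

```matlab
% Sketch: manual mean and variance vs. MATLAB's mean and var.
% Stand-in data: 100 uniformly distributed integers from -5 to 5
% (the lecture loads its values from a CSV file instead).
r = randi([-5 5], 100, 1);
n = length(r);

mean_manual = sum(r) / n;            % add everything up, divide by the count

d = r - mean_manual;                 % deviations from the mean
var_biased   = (d' * d) / n;         % dot product with itself, divided by n
var_unbiased = (d' * d) / (n - 1);   % divided by n - 1 instead

fprintf('mean: manual %.4f, built-in %.4f\n', mean_manual, mean(r));
fprintf('variance: /n %.4f, /(n-1) %.4f, var(r) %.4f\n', ...
        var_biased, var_unbiased, var(r));
```

The divide-by-n version matches var(r, 1), and the divide-by-(n-1) version matches the default var(r).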


Learn MATLAB Episode #26: Continuous Variables

In this lecture we’re going to talk about continuous variables; we’ve talked about discrete variables up until now. Discrete variables can only take on distinct values, but continuous variables can take on any value. So with continuous variables we don’t have a notion of probabilities for exact values, because X can take on an infinite number of values, so the probability of equaling any specific exact value is zero. We can have probabilities for ranges though. For example, we can say the probability of X being between 3.13 and 3.15 is greater than zero.

We have a useful function called the cumulative distribution function, or the CDF, that helps us measure such probabilities. We usually label this function as big F of x, and the definition of F(x) is that it’s the probability that the random variable big X is greater than negative infinity, but less than little x. Note that the probability of big X being between negative infinity and positive infinity is 1, since X has to take on a value; therefore the value of big F of positive infinity is equal to 1. Now going back to our original problem, what if we want to calculate the probability that X is between 3.13 and 3.15? That would just be big F of 3.15 minus big F of 3.13.

Now let’s talk about the other useful function for continuous variables. This one’s called the probability density function, or the PDF. We usually denote it by little f of x, and it is defined as the derivative of big F of x with respect to x, so it’s like the slope of big F of x. Note that this function can be greater than one, since it’s not a probability; it is a probability density. Little f of x does have to be greater than or equal to 0 though.

So here’s one example where little f of x can be bigger than one. Let’s say little f of x is uniform between 0 and 0.1. That means if you try to sample from this random variable X, you’ll always get a value between 0 and 0.1, and every value in that range is as likely as all the others. Now I’m going to claim that little f of x has to equal 10 if x is between 0 and 0.1, and 0 otherwise. Now why is this? Say little f of x equals some constant c between 0 and 0.1. Since little f of x is the derivative of big F of x, big F of x is the integral of little f of x. In the integral we can take the constant out and then calculate the integral from 0 to x. Now we know from above that big F of infinity has to equal 1, so the integral from minus infinity to infinity equals 1, but since little f of x is 0 outside of 0 to 0.1, we can just take the integral from 0 to 0.1. That gives us 0.1c, and if we solve for c, c equals 10. Therefore, we’ve seen a scenario where little f of x can have a value greater than one, because it’s a probability density and not a probability value. Later on in this course we’ll look at more complex continuous distributions.
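The uniform example can be checked numerically with a few lines of MATLAB. This is only a sketch: the handles f and F, and the example range 0.03 to 0.05, are chosen for illustration and do not come from the lecture.

```matlab
% Sketch: uniform density on [0, 0.1] with height c = 10.
a = 0;
b = 0.1;
c = 10;

f = @(x) c * (x >= a & x <= b);          % PDF: 10 inside [0, 0.1], 0 outside
F = @(x) c * (min(max(x, a), b) - a);    % CDF: integral of f from -Inf to x

x = linspace(a, b, 1001);
total = trapz(x, f(x));                  % numerical integral of f over [0, 0.1]

p_range = F(0.05) - F(0.03);             % P(0.03 < X < 0.05)

fprintf('integral of f = %.4f, F(Inf) = %.4f, P(0.03<X<0.05) = %.4f\n', ...
        total, F(Inf), p_range);
```

The density is 10 everywhere on the interval, well above 1, yet it still integrates to 1, and the probability of any range is the difference of two CDF values.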


Learn MATLAB Episode #25: Birthday Paradox

One interesting and popular problem in probability is called the birthday problem, or the birthday paradox, and the problem goes something like this: given a classroom of n students, what is the probability that at least one pair of students shares a birthday? Now you might be surprised that at n equals 23 the probability is about fifty percent, which is why this problem is called a paradox. That means in an average sized classroom there’s a pretty good chance that two people in the class share a birthday. This is counterintuitive, since there are 365 days in a year. In this lecture I’ll show you the theory behind the solution, and how to visualize it in MATLAB.

So the first thing is that working directly with at least one pair of people sharing a birthday is difficult, but remember that the probabilities of all disjoint events have to add up to one. So what is the opposite of at least one pair of people sharing a birthday? It’s not two people sharing a birthday, or three people sharing a birthday, or two pairs of two people sharing a birthday; these events all fit into at least one pair sharing a birthday. The two disjoint events that we want to talk about are that at least one pair shares a birthday, and that nobody shares a birthday. These two events are disjoint and between them cover every possibility, so their probabilities have to add up to one. So we can calculate the probability that at least one pair of people shares a birthday as 1 minus the probability that nobody shares a birthday, and in mathematics we would call this a counting problem.

So now let’s think about how we calculate the probability that nobody shares a birthday. There are 365 days in a year. Now if you think of each day as a bucket, we have one person and they have 365 buckets to choose from. With only one person there is nobody to collide with, so the probability that this person shares a birthday with somebody else is 0, and the probability that nobody shares a birthday in this case is 1, or 365/365. Now what about two people? With one person already having chosen a birthday, or a bucket, the second person has a 1/365 chance of colliding with that person. So the probability that nobody shares a birthday is 364/365, and the probability of at least one pair having a common birthday is 1 minus that, or 1/365. So this is the case with two people. Now if we have three people, the third person only has 363 buckets to choose from. So we have 364/365 times 363/365, and we can multiply these probabilities because each person’s birthday is chosen independently. This is the probability that in a group of three people nobody shares a birthday, and 1 minus this is the probability that at least one pair would share a birthday.

We can continue this pattern, but it would probably be easier to write a MATLAB function to do this. So my birthday function is going to return all the probabilities up to the value n. I’m going to initialize a to be an array of zeros, and I’m going to count up to n and fill in the values of a. Actually, I’m going to use ones, because we’re subtracting from one. Actually, it doesn’t matter what I initialize a to, because I’m going to say 1 minus over here. Okay, so we know that we have to multiply by something over 365 each time and subtract that from one, so we can use a for loop to iterate over the thing that has to be subtracted and multiply in the new value iteratively.

Now that we have our function let’s test it, so let’s say a equals birthday, let’s set n equal to 100, and let’s plot. Okay, so you can see here that when n is about 23 you get a probability around 0.5. When n is equal to 50 you’re well above 90%, so there’s a pretty good chance in a group of 50 people that at least one pair of people shares a birthday. So let’s check a(23). Right, it’s about 50%, and that is the solution to the birthday paradox.
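A minimal sketch of the loop described above follows. It is written inline as a script rather than as the lecture’s birthday function file, and the helper variable p_none is a name introduced here for the running probability that nobody shares a birthday.

```matlab
% Sketch: probability that at least one pair shares a birthday,
% for every group size from 1 to n.
n = 100;
a = ones(n, 1);     % a(k) will hold the probability for a group of k people
p_none = 1;         % running probability that nobody shares a birthday

for k = 1:n
    % The k-th person has 365 - (k - 1) free days left to choose from.
    p_none = p_none * (365 - (k - 1)) / 365;
    a(k) = 1 - p_none;
end

fprintf('a(2) = %.4f, a(23) = %.4f, a(50) = %.4f\n', a(2), a(23), a(50));
```

Calling plot(a) afterwards shows the curve crossing 0.5 near a group size of 23. In this sketch a(23) comes out at about 0.507 and a(50) at about 0.970.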


Learn MATLAB Episode #24: Generating Random Values

So in this class we’re going to talk about generating a random variable from a certain distribution. This could be useful for doing simulations of systems that have uncertainty. MATLAB has some built-in functions to help us do this. The first one we’re going to talk about is called randi. It takes one argument called imax, and it gives us a uniformly distributed integer between 1 and imax. So let’s try it. So 9 is in between 1 and 10. Now there’s another form of randi which takes in a maximum value and another parameter called n. Let’s set n to 3, so that returns an n-by-n matrix of random values between 1 and imax. So suppose I wanted to generate random values between 10 and 20, how would we do that? Because randi can give us values anywhere between 1 and imax, what we could do is just add 10 to all the values that randi returns. So this gives us a 3 by 3 matrix with values only between 10 and 20.

Another useful function is just rand by itself. This function gives us a random number between 0 and 1, so it’s different from the previous one in that we don’t get integers, we get real numbers. rand returns a number with a uniform distribution, so a value near 0.25 is just as likely as a value near 0.75. So let’s try to plot histograms for different values of n. Alright, so this is a histogram for random numbers between 0 and 1, with n equal to 10. It’s not quite uniformly distributed, so let’s try a bigger n value. Alright, so immediately it starts looking more uniformly distributed as n increases, so let’s try a bigger n. Alright, so it looks even more uniformly distributed. Now, 10,000. Alright, so it’s almost flat. That’s 100,000, and this is a million, and it looks almost perfectly flat. So that’s the idea with the frequentist view of probability: as n approaches infinity, your measured probabilities approach their true values.

So now let’s think about a different problem. Suppose I want a specific discrete distribution, so say I want to simulate an unfair coin. To write it out, I want p of heads to equal 0.25, and I want p of tails to equal 0.75. How could I write a function that gives me random values drawn from this distribution instead of a uniform distribution? We can create a function to do this. We can call it biased_coin. It’s going to take in one value, let’s call it p_heads, which is the probability of getting heads, and it’s going to return the coin face. So we’re going to generate a random value; if it’s less than p_heads we’re going to return heads, else we’re going to return tails. Let’s try our function.

Alright, so now we’re going to try our new biased_coin function by initializing an array of, say, size 1,000… you know what, we’re going to do this in a separate function. We’re going to initialize an n by 1 array, we’re going to count from 1 to n, and we’re going to use the biased_coin function to generate a value for each element of the array. Alright, so let’s try the function we just made: test_coin with 0.25 and n equal to 1,000. Alright, so you see heads, which resolves to the integer 104 (the character code for 'h'), and tails, which resolves to the integer 116 (the character code for 't'). The count for heads is about 250 and the count for tails is about 750, which is what we would expect in a thousand coin tosses.
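The biased coin idea can be sketched as a single script. The lecture splits this into two function files; here the coin toss is written inline so the example is self-contained, and the choice of the characters 'h' and 't' for the faces is an assumption based on the integers 104 and 116 mentioned above.

```matlab
% Sketch: simulate n tosses of a coin that lands heads with probability p_heads.
p_heads = 0.25;
n = 1000;

faces = repmat('t', n, 1);   % start with all tails
for i = 1:n
    % rand() is uniform on (0, 1), so it falls below p_heads
    % with probability p_heads.
    if rand() < p_heads
        faces(i) = 'h';
    end
end

num_heads = sum(faces == 'h');
num_tails = sum(faces == 't');
fprintf('heads: %d, tails: %d\n', num_heads, num_tails);
```

Plotting hist(double(faces)) shows two bars, one at 104 for 'h' and one at 116 for 't'. The counts vary from run to run but should land near 250 and 750.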


FREE Python Programming Course on Teachable

If you want to learn how to program, you will LOVE this course! This course was designed for complete beginners with little to no understanding of programming, and will give you the knowledge to get started coding using Python 3.

Enroll now for FREE on Teachable!

https://goo.gl/x6oBPE

We will cover the following topics in this course:

  • Python installation
  • Running Python scripts in terminal
  • PyCharm IDE setup
  • Numbers, strings, Boolean operators, lists, dictionaries, and variables
  • Functions, arguments, return values, loops, and modules
  • Final project using the information covered in the course

We hope you enjoy the course. It is our goal to give you the knowledge to begin writing your own programs in Python!

https://goo.gl/x6oBPE


Learn MATLAB Episode #23: Measuring Probability

Welcome to this third lecture on MATLAB and probability. In this lecture we’re going to stray from your typical probability course. We’re going to have some data matrix, and we’re going to measure the probability of a variable that the data matrix represents. So what I want you to do is go in your web browser to github.com/lazyprogrammer/matlab-probability-class. Once you go there, you’re going to copy this SSH clone URL, and I want you to go into your terminal, type in git clone, and then paste that URL. I’ve already done it so I’m not going to do it again. This is going to give you some files that are relevant to this class that I’m going to use for the coming lectures. Okay, so now that you have those files you want to go into MATLAB and change your directory to work in that same directory that you checked out from git.

So we’re going to load this data. Ok, so r is a 100 by 1 matrix, so let’s just plot r and see what it looks like. Alright, so it has a bunch of random values that are between -5 and 5. So now let’s say I wanted to calculate the probability that r is equal to -5. How would I do that? One way is I could count all the values where r is equal to -5 and divide by the total number of values in r. So it gives me 0.07. I can do the same thing for every other value in the matrix. So we get about 0.06 to 0.11. This is what we call the frequentist view of statistics. It means that if we flip a coin 1000 times, and it’s a fair coin with heads or tails, the probability is 50%, so we should get heads about 500 times and tails about 500 times. The idea is that as the number of coin flips approaches infinity, our measurement of the probability of heads should approach 0.5. We can also plot the histogram of r, and this should give us an idea of the shape of the distribution.

Alright, so let’s do a little more complex example. Let’s say I want to calculate the probability that r is even. How can we do that? In the same way as we did before, we might want to say r equals -4 or r equals -2, and so on, but that wouldn’t be the best way to do this. We would use the modulo function. If mod(r,2) is equal to 0, that means r is even, so we count those values and divide by the length of r. Alright, so the probability that r is even is 0.43, and we can do the same thing to determine the probability that r is odd. Notice that these two events are disjoint. You can either have r even or r odd, and those two events are the only possible events. So their probabilities should add up to 1, and we can verify that 0.43 plus 0.57 is equal to 1.

Now let’s say I want to test if r takes on one of several specific values, so let’s say I want to calculate the probability that r is equal to -5 or positive 5. How would I do that? We would use the or operator. So r equals 5 or r equals -5, divided by the length of r. So the probability that r is equal to 5 or negative 5 is 0.21. Notice that we can use the same method for our earlier problem, which was to determine even or odd. That is the way I suggested not doing it, but we want to check our answers, and it gives us the same answer. The probability that r is equal to -4, -2, 0, 2, or 4 is the same as the probability that r is even.
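The counting approach above can be sketched as follows. The data file from the repository is not named in this transcript, so the script uses randomly generated stand-in values instead; that substitution and the variable names are assumptions for illustration, and the printed probabilities will not match the lecture’s numbers.

```matlab
% Sketch: estimating probabilities by counting matching entries.
% Stand-in data: 100 integers between -5 and 5 (the lecture loads a file).
r = randi([-5 5], 100, 1);
n = length(r);

p_minus5 = sum(r == -5) / n;              % P(r = -5)
p_even   = sum(mod(r, 2) == 0) / n;       % P(r is even)
p_odd    = sum(mod(r, 2) == 1) / n;       % P(r is odd); mod(-5, 2) is 1 in MATLAB
p_5_or_minus5 = sum(r == 5 | r == -5) / n;

% The long way of testing for even values, as a cross-check.
p_even_listed = sum(r == -4 | r == -2 | r == 0 | r == 2 | r == 4) / n;

fprintf('P(-5)=%.2f  P(even)=%.2f  P(odd)=%.2f  P(5 or -5)=%.2f\n', ...
        p_minus5, p_even, p_odd, p_5_or_minus5);
```

Each comparison such as r == -5 produces a logical array of ones and zeros, so summing it counts the matches. hist(r) plots the histogram of the values.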


Learn MATLAB Episode #22: Introduction to Probability

Welcome to this course on MATLAB and probability. This first lecture will focus on the introduction and course outline. Because this course uses MATLAB it won’t follow a traditional probability and statistics course outline; rather, I’ll show you as we go along how the concepts of probability can be applied, or viewed, through the lens of MATLAB. More generally, I want to show you how a programmer might approach problems in probability.

So now let’s talk about some of the topics we’re going to go through. In the second lecture we’re going to talk about what probability is, and we’re going to give some definitions and examples. In the lecture after that we’re going to talk about how we can measure probability given some data: how to open a file, and measure the probability of some of the features of your data. In the next lecture we’ll talk about how to generate random data, so how you can do a probabilistic simulation. In the next lecture we’ll talk about a famous problem in probability called the birthday problem, or the birthday paradox. In the next lecture we’ll extend the idea of probability from discrete variables to continuous variables. In the lecture after that we’ll talk about a special continuous distribution called the Gaussian distribution, or the normal distribution. In the next lecture we’ll talk about how, if you have some data that is continuous, you test whether it is Gaussian distributed. In the lecture after that, we’ll talk about how, if you have two different Gaussian distributed groups of data, you can compare the two. And then in the last lecture, we will extend the idea of the Gaussian to a multi-dimensional Gaussian.

So, what will you be able to do by the end of this course? You’ll understand mathematical problems that contain uncertainty. You’ll be able to quantify uncertainty both in theory and in practice. You’ll be able to use MATLAB to measure uncertainty in your data. You’ll be able to use MATLAB to simulate systems that contain uncertainty. You’ll be able to understand and calculate probabilities in the famous birthday paradox. You’ll understand the mathematics behind the very famous bell curve, or normal distribution, or Gaussian distribution. You’ll know whether or not the data you’re working with is Gaussian distributed, and you’ll know how to handle Gaussian distributed data. I look forward to teaching you.


Learn MATLAB Episode #21: Gaussian Filter Blur and Edge Detection

So now let’s take our Gaussian and convolve it with the image. Now remember that A is 512 x 512 x 3, which is a three-dimensional matrix, and H is a two-dimensional matrix. So if I try to do this I’m going to get an error. Sorry, we should be using the conv2 function, and you still get an error because A is not supposed to be a three-dimensional matrix. If we look at the definition of convolution, two-dimensional convolution works with two-dimensional matrices. So what we want to do is just take the red channel, and it doesn’t matter which one, because remember when we were looking at the red, green, and blue channels we pretty much saw the same image for every channel. Ok, so we still get another warning, and that’s that A still holds uint8 values, so let’s change that to double.

Ok, so now let’s imshow(C) and see what we get. It’s all white, so what does white mean? That means all of the values are too high, so there’s too much intensity in this image. You have to play around with the values a little bit, so let’s try making a smaller filter. Let’s say H is equal to my_gaussian with a size of 25 and a sigma of 5. So now the filter is smaller and the sigma is smaller, so the blur is going to be less, but the intensity is still too high. That could mean the values are just too high and we need to multiply by lower values. So let’s divide the filter values in H by 1000, and imshow(C) again. Now we’re starting to see some black, so that’s good. Ok, so I’m decreasing the values and I’m starting to see the image. Okay, so here’s the original image, but blurred using a Gaussian with sigma equal to 5.

Now one thing I didn’t show you, because I wanted you to go through the exercise of creating a two-dimensional Gaussian filter yourself, is that we already have a function called fspecial in MATLAB that creates filters for us, and fspecial can create many different types of filters. In addition to the Gaussian it can create Laplacian filters, an averaging filter, which is another thing we’ve used, the Sobel filter, which is useful for finding edges, and so on. So let’s try using fspecial instead: fspecial('gaussian', 25, 5); Now let’s do our convolution. I’m going to divide by a thousand, because I know I’m going to have to divide. So let’s just see what we see. Ok, so it’s a little dark, so I didn’t have to divide by a thousand, maybe I could have divided by a lower number, but you can see the idea is that we’ve blurred the original image using the Gaussian filter given to us by fspecial instead of the Gaussian filter that we built ourselves.

Ok, so like we did before, I want to plot the filter that we created. I’m going to imshow little h, which is the filter we created with fspecial, and notice how it’s just all black. Remember that 0 is black and 1 is white. So if we want to look at what’s in h, let’s check the maximum value of h. That’s max of max, because it’s two-dimensional. The maximum value is 0.0065, so of course when we plot that it’s going to be pretty close to black. If we want to scale it so that the maximum value is one, we could do imshow of h divided by the maximum value. Ok, so now we see what we expect to see, which is white in the middle. So one question you might have is why the values of h are so small when we use fspecial. That’s because it does a thing called normalization. If you have ever studied probability you know that a probability distribution has to sum to 1, and here it is a similar thing. If we sum across both dimensions of h we get 1, and that’s why the values of h are so small.

And so now, just for completeness’ sake, I want to do sort of the opposite of blurring: I want to do edge detection on the image. I want to find all the edges and set those values to 1. MATLAB has a function called edge that will do this, and it’s very simple to use. You pass in the gray-scale image again, so only a two-dimensional matrix; it won’t work if you pass in the entire image matrix. Ok, so edge just does all of that automatically for you. If I imshow(E) I can more or less see where all the sharpest edges are, and of course there are parameters you can pass into edge to make it more or less sensitive, but this is what it does by default.
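The blur and edge detection steps can be sketched end to end as below. This assumes the Image Processing Toolbox is available (for fspecial and edge). Because the image file used in the lecture is not named here, the script builds a small synthetic gray-scale image as a stand-in; that image, and the variable names img and G, are purely illustrative.

```matlab
% Sketch: Gaussian blur with fspecial + conv2, then edge detection.
% Assumes the Image Processing Toolbox (fspecial, edge).

% Synthetic stand-in for one color channel: a bright square on black,
% stored as uint8 like a channel read from an image file would be.
img = zeros(128, 128, 'uint8');
img(33:96, 33:96) = 200;

G = double(img);                   % conv2 needs single or double input

h = fspecial('gaussian', 25, 5);   % 25 x 25 Gaussian filter, sigma = 5
C = conv2(G, h, 'same');           % blurred image, same size as G

E = edge(G);                       % logical matrix, 1 where an edge is found

fprintf('sum of h = %.4f, max of h = %.4f\n', sum(h(:)), max(h(:)));
```

Because h sums to 1, the blurred values stay in the original 0 to 255 range, so imshow(C / 255) displays the result without guessing a divisor like 1000. imshow(h / max(h(:))) shows the filter itself, and imshow(E) shows the detected edges.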