
Learn MATLAB Episode #30: 2 Sample Tests

In this lecture we’re going to continue our talk about hypothesis testing of Gaussian distributed data. To elaborate on how hypothesis tests work, we’re going to talk about the hypotheses a little bit more. There are two hypotheses when we’re doing a statistical test. There’s the null hypothesis, which we call H0 (h-naught), and it usually represents the control data, or random data that results purely from chance. The second is the alternative hypothesis. This is the thing we’re trying to show, generally. For example, if we’re doing an experiment where we’re testing a drug, the drug actually working would be the alternative hypothesis. So the alternative hypothesis is the hypothesis that the sample observations are influenced by some non-random cause.

Now suppose we have some random data, so let’s say R1 equals randn. Let’s say it has a hundred points, a mean of zero, and a variance of one. Now let’s say we have another data set which is maybe 20 points, but this one’s going to have a different mean. So let’s say its mean is one, and let’s increase its variance a little bit. Okay, so, we have two samples that are Gaussian distributed. The first one has a mean of zero and a variance of one, and the second one has a mean of one but a standard deviation of two. So, how do we compare these two distributions? There is a function called ttest2, the two-sample t-test, which does what we want, and it again returns a hypothesis decision and a p-value. The null hypothesis here is that the two samples come from distributions with the same mean.

So we can try this: ttest2(R1, R2). Okay, so, we reject the null hypothesis in this case with a very, very small p-value. Remember, the p-value only has to be less than five percent (0.05) for us to reject the null hypothesis. So let’s try some things. Let’s have fewer data points for R1. Okay, and let’s do our t-test again. Notice how we still reject the null hypothesis, but our p-value has increased, so the result is less significant than before.
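The steps so far can be sketched roughly as follows. This is a minimal sketch, not the exact session from the video: the random seed, the column-vector shapes, the reduced size of 30 points, and the name `R1_small` are all assumptions added for illustration. `ttest2` comes with the Statistics and Machine Learning Toolbox, and by default it assumes the two samples have equal variances.

```matlab
% Requires the Statistics and Machine Learning Toolbox (for ttest2).
rng(0);                          % assumed seed, only so the run is repeatable

R1 = randn(100, 1);              % 100 points, mean 0, standard deviation 1
R2 = 1 + 2 * randn(20, 1);       % 20 points, mean 1, standard deviation 2

% h = 1 means "reject the null hypothesis at the 5% level", h = 0 means "do not reject"
[h, p] = ttest2(R1, R2);
fprintf('100 points in R1: h = %d, p = %.4g\n', h, p);

% Same test with fewer points in R1 (30 is an assumed value; the lecture does not state one)
R1_small = randn(30, 1);
[h_small, p_small] = ttest2(R1_small, R2);
fprintf(' 30 points in R1: h = %d, p = %.4g\n', h_small, p_small);
```

Because the data are random, the exact p-values change from run to run, and with a different seed the smaller sample will not always give the larger p-value. If you don't want to assume equal variances, `ttest2(R1, R2, 'Vartype', 'unequal')` runs the unequal-variance version of the test instead.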
Alright, so, now let’s do the same thing for R2. Let’s say this now only has 10 points, and let’s do the t-test again. Alright, so, this is still significant. Let’s increase the variance. Alright, so, I had to increase the variance a lot to get a p-value that isn’t significant. So that’s one thing to know when you’re comparing two Gaussian distributions: you can’t really say the means are different if the data are spread out a lot. So let’s put the variance back for R2, but let’s say the mean is now closer to R1’s mean. Let’s do our t-test again, and this also gives us a p-value that isn’t significant. So that’s another fact about the t-test: you also can’t tell that two distributions are different if their means are very close together. If they’re very far apart, so let’s move the mean of R2 much further away and do the t-test again, we now get a very small p-value.
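To see why spread and distance between the means matter, it can help to compute the t statistic by hand next to `ttest2`. The sketch below is only illustrative: the four mean and standard deviation pairs in `cases`, the seed, and the variable names are assumptions, since the lecture doesn't state the values it used. The hand computation uses the pooled-variance formula, which matches the default equal-variance behaviour of `ttest2`.

```matlab
% Requires the Statistics and Machine Learning Toolbox (for ttest2).
rng(1);                          % assumed seed, only so the run is repeatable
R1 = randn(100, 1);              % reference sample: mean 0, standard deviation 1
n1 = numel(R1);
n2 = 10;                         % R2 has 10 points, as in the lecture

% Each row is [mean, standard deviation] for R2. All values are illustrative guesses.
cases = [1.0   2;                % baseline
         1.0  10;                % same mean, much larger spread
         0.1   2;                % mean close to R1's mean
         5.0   2];               % mean far from R1's mean

maxDiff = 0;                     % largest gap between the hand-computed t and ttest2's t
for k = 1:size(cases, 1)
    R2 = cases(k, 1) + cases(k, 2) * randn(n2, 1);
    [h, p, ~, stats] = ttest2(R1, R2);

    % Pooled standard deviation, then the two-sample t statistic
    sp = sqrt(((n1 - 1) * var(R1) + (n2 - 1) * var(R2)) / (n1 + n2 - 2));
    tManual = (mean(R1) - mean(R2)) / (sp * sqrt(1/n1 + 1/n2));
    maxDiff = max(maxDiff, abs(tManual - stats.tstat));

    fprintf('mean = %4.1f, sd = %4.1f: h = %d, p = %.3g, t = %.3f\n', ...
            cases(k, 1), cases(k, 2), h, p, tManual);
end
```

The difference of the sample means sits in the numerator, and the pooled spread together with the sample sizes sits in the denominator. So a bigger spread or fewer points shrinks the t statistic and pushes the p-value up, while a bigger gap between the means grows it and pushes the p-value down.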