Wilcoxon-Mann-Whitney test and a small sample size | Oxford Protein Informatics Group

The Wilcoxon-Mann-Whitney (WMW) test for two samples is a non-parametric test used to assess whether the distributions of two populations are shifted relative to one another, i.e. $f_1(x) = f_2(x+k)$, where $k$ is the shift between the two distributions; if $k=0$, the two populations are actually the same. The test is based on the ranks of the observations in the two samples, which means it does not take into account how large the differences between the values are. For example, a WMW test comparing S1=(1,2) and S2=(100,300) gives exactly the same result as one comparing S1=(1,2) and S2=(4,5). With a small sample size, this is a considerable loss of information.
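To make the rank-only behaviour concrete, here is a minimal Python sketch. The helper names `rank_sum` and `u_statistic` are illustrative and not from any library, and the code assumes there are no tied values. It computes the statistic for both pairs of samples mentioned above and shows that they coincide.

```python
def rank_sum(sample1, sample2):
    """Sum of the ranks of sample1 within the pooled data (assumes no ties)."""
    pooled = sorted(sample1 + sample2)
    return sum(pooled.index(x) + 1 for x in sample1)


def u_statistic(sample1, sample2):
    """Mann-Whitney U for sample1: rank sum minus its smallest possible value."""
    n1 = len(sample1)
    return rank_sum(sample1, sample2) - n1 * (n1 + 1) // 2


# Only the ordering of the pooled values matters, not their magnitudes.
print(u_statistic([1, 2], [4, 5]))
print(u_statistic([1, 2], [100, 300]))
```

Both calls print the same value, because in both cases S1 occupies ranks 1 and 2 of the pooled data.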

Now, what happens when you perform a WMW test on two samples of size 2 that are as different as they can be as far as the test is concerned, let's say S1=(1,2) and S2=(4,5)? The two-sided p-value of this test is 0.3333: under the null hypothesis the 6 possible ways of assigning two of the four ranks to S1 are equally likely, and 2 of them are this extreme. This means that the smallest p-value you can obtain from a WMW test when comparing two samples of size 2 is 0.3333. Hence you would only be able to detect a difference between the two samples when using a significance level greater than 0.3333.
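The value 0.3333 can be checked by brute force. The sketch below uses a hypothetical helper, `exact_two_sided_p`, which is not a library function. It assumes no ties and measures "extreme" as the distance of the rank sum from its expected value under the null hypothesis. It enumerates every way of assigning ranks to the first sample and counts those at least as extreme as the observed one.

```python
from fractions import Fraction
from itertools import combinations


def exact_two_sided_p(sample1, sample2):
    """Exact two-sided WMW p-value by full enumeration (assumes no ties)."""
    pooled = sorted(sample1 + sample2)
    n1, n = len(sample1), len(pooled)
    # Observed rank sum of sample1 within the pooled data.
    observed = sum(pooled.index(x) + 1 for x in sample1)
    # Expected rank sum of sample1 under the null hypothesis.
    mean = Fraction(n1 * (n + 1), 2)
    # Under the null, every choice of n1 ranks out of n is equally likely.
    all_sums = [sum(c) for c in combinations(range(1, n + 1), n1)]
    extreme = sum(1 for w in all_sums if abs(w - mean) >= abs(observed - mean))
    return Fraction(extreme, len(all_sums))


print(exact_two_sided_p([1, 2], [4, 5]))  # 1/3
```

For sizes 2 and 2 there are 6 possible rank assignments, with rank sums 3, 4, 5, 5, 6 and 7. Only the sums 3 and 7 are as far from the expected value 5 as the observed sum, which gives 2/6.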

Finally, keep in mind that a sample of size two is usually not enough for a statistical test. The following table shows the smallest p-value for different small sample sizes when the alternative hypothesis is two-sided. (Values in the table are rounded.)
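As a cross-check, the smallest attainable two-sided p-value can also be computed directly. With no ties, exactly two of the $\binom{n_1+n_2}{n_1}$ equally likely rank assignments are the most extreme ones (one in each direction), so the minimum is $2/\binom{n_1+n_2}{n_1}$. The function name `min_two_sided_p` below is illustrative, and the sketch assumes the exact test with no ties and both sample sizes at least 1.

```python
from math import comb


def min_two_sided_p(n1, n2):
    """Smallest two-sided exact WMW p-value for sample sizes n1 and n2 (no ties)."""
    # Two extreme arrangements (all of sample 1 lowest, or all highest)
    # out of comb(n1 + n2, n1) equally likely ones; capped at 1.
    return min(1.0, 2 / comb(n1 + n2, n1))


for n1, n2 in [(2, 2), (3, 3), (4, 4), (15, 2)]:
    print(n1, n2, round(min_two_sided_p(n1, n2), 5))
```

For sizes 2 and 2 this gives 2/6, and for sizes 15 and 2 it gives 2/136.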

Author

Luis Ospina Forero


3 thoughts on “Wilcoxon-Mann-Whitney test and a small sample size”

Llibertat December 16, 2014 at 11:28 am

Hi,
However, what if we have two samples of different sizes? For instance, n1=15 and n2=2. Would it be possible to perform a WMW test, just in an exploratory way?
Thanks a lot

Luis Ospina Forero (Post author) January 20, 2015 at 10:56 am

Hi, a WMW test can still be performed with such sample sizes. In that case the minimum p-value you can obtain is 0.01471 for a two-sided alternative.

I’m updating the post with a table of minimum p-values for different small sample sizes.

JP May 6, 2016 at 4:15 pm

I’m trying to understand the logic of these P values. Let’s take the simplest n1=n2=2 case you highlighted, with S1=(1,2) and S2=(4,5). Intuitively, the random chance that the first value in S2, when compared to the two values in S1, is the highest of the three, is 1/3. Likewise for the second value in S2. So I would have guessed that the one-sided P value would be 1/3 x 1/3 = 1/9, and the two sided P value would be 1/9 + 1/9 = 2/9 = 0.2222. But the table has P = 0.3333 instead, and the other P values also depart from my logic. Is there something I’m missing? Thanks.
