The Wilcoxon Mann Whitney test (two samples), is a non-parametric test used to compare if the distributions of two populations are shifted , i.e. say $latex f_1(x) =f_2(x+k)$ where k is the shift between the two distributions, thus if k=0 then the two populations are actually the same one. This test is based in the rank of the observations of the two samples, which means that it won’t take into account how big the differences between the values of the two samples are, e.g. if performing a WMW test comparing S1=(1,2) and S2=(100,300) it wouldn’t differ of comparing S1=(1,2) and S2=(4,5). Therefore when having a small sample size this is a great loss of information.
Now, what happens when you perform a WMW test on samples of size 2 and 2 and they are as different as they can be (to what the test concerns), lest say S1=(1,2) and S2=(4,5). Then the p-value of this test would be 0.333, which means that the smallest p-value you can obtain from a WMW test when comparing two samples of size 2 and 2 is 0.3333. Hence you would only be able to detect differences between the two samples when using a level of significance greater than 0.333 .
Finally you must understand that having a sample of two is usually not enough for a statistical test. The following table shows the smallest p-value for different small sample sizes when the alternative hypothesis is two sided. (Values in the table are rounded).