First, thanks for stopping by again.
Second, it has long been known that the game preserves the random seed, so I sort of assumed that you knew that and didn't mention it again. Sorry for not doing so, because it would have saved you some unnecessary work: your repetition of the combats is not usable for our statistics. You did not provide a new sample, you just shuffled the existing sample a little, and hence you get very similar results, no matter in which sequence you do the fights.
Here's an example. Let's say you did the warriors first, then some riflemen, and then all knights. Your warriors lost, your riflemen won, and your knights lost again, so all in all you had bad luck. To be precise, you mostly had bad luck, with a streak of good luck in between. Now you turn off stack attack and change the order in which you do your attacks. The RNG will still yield the same number sequence. So if you start with SAMs now, you will probably lose the first ones, because the SAMs now have the initial bad luck that your warriors had before. Also, the streak of good luck that your riflemen had *will* appear again (probably within the SAM battles, as these need more rounds than the warrior battles in your first try). This means that no matter in which sequence you do the fights, you will get similar, and by no means independent, results. You then pile up these dependent results, get to a 12:2 win ratio and say that this is improbable, so the battle results have to be skewed. But you did this skewing yourself by using dependent data, which magnifies the random differences instead of balancing them.
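To make this concrete, here is a toy sketch in Python. It is not the game's actual combat code: the per-unit round counts, the 50% round chance, and the names `ROUNDS`, `run_fights` and `flatten` are all assumptions made up for illustration. It only shows that with a preserved seed, reordering the fights reassigns the same luck to different units.

```python
import random

# Hypothetical number of combat rounds each fight consumes (assumed values).
ROUNDS = {"warrior": 3, "rifleman": 5, "knight": 4}

def run_fights(seed, order):
    """Fight the units in the given order, drawing rounds from one seeded RNG."""
    rng = random.Random(seed)  # same seed -> same number sequence
    results = []
    for unit in order:
        # Each round is a 50% coin flip; True means the round was won.
        rounds = [rng.random() < 0.5 for _ in range(ROUNDS[unit])]
        results.append((unit, rounds))
    return results

def flatten(results):
    """Return the raw sequence of round outcomes, ignoring which unit got them."""
    return [won for _, rounds in results for won in rounds]

first = run_fights(42, ["warrior", "rifleman", "knight"])
second = run_fights(42, ["knight", "warrior", "rifleman"])

# The luck itself is identical; only its assignment to units differs.
print(flatten(first) == flatten(second))  # True
```

Pooling `first` and `second` would count the same twelve rolls twice, which is exactly what the repeated combats did.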
If you do want more data, you have to get an independent sample, i.e. a new RNG sequence. To get this, load the game, enter the worldbuilder, save the game as a worldbuilder file, leave the game, and then load the worldbuilder file as a scenario. This will reseed the RNG, and you get new, independent data.
Third, why do you limit your analysis to "total matchup results"? This leaves you with only 6 independent data points (if we leave the tanks out). Of course the expected variation within 6 data points is much larger than in a sample with more (independent) data points. It's much better to base such an analysis on the outcome of the individual fights, because then we have 16*6=96 data points already. In each trial, there's a 50% chance to win, so each trial is like the toss of a coin. With some basic probability theory, you can determine how probable or improbable our observed outcome is. Do you want to do this, or shall I take up the work again?
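A minimal sketch of that calculation, assuming the 96 fights really are independent and each has exactly a 50% win chance. The helper `binom_tail` is my own name, and the win count used below is a made-up placeholder, not a result from any test:

```python
from math import comb

def binom_tail(n, k, p=0.5):
    """Probability of at least k wins in n independent trials with win chance p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_fights = 16 * 6   # 96 individual fights
observed_wins = 60  # hypothetical placeholder, replace with the real count

# One-sided probability of seeing this many wins or more by pure chance.
print(binom_tail(n_fights, observed_wins))
```

A small value here would suggest the results are skewed; a value that is not small means the outcome is well within what fair coin tosses produce. This only holds if the data points are independent, which is why the reseeding matters.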
Edit: corrected formula in above paragraph, sorry. Also, an addendum to my paragraph about more data points: a still better way to get a greater number of them is to simply put more islands with units on the map. This has the advantage that a single save can be used to replicate the findings.