I do not care what you say. (Part 2)

Bigv32 (Prince; joined Jun 27, 2008; 567 messages)

Hello. Some of you may have read my first thread a couple of days ago about the combat odds and why I do not think the game is as accurate as the odds say.
Sorry about the length, but I felt I needed to explain what I did in a little more detail.

For those of you who did not, here is what happened and why I am starting this new thread. If you read the other thread, skip to part 2.

1. I did a test with samurai and axemen. The first test was only about twenty battles, with reloads and the new random seed option on, for those of you who asked about that. The end result was that I lost more than I should have.

The next test I did was one hundred battles with samurai and axemen. There was a plus or minus 15% error in the outcome. (Thanks to whoever came up with that.)

2. Due to the arguments about samurai and axemen promotions, as well as the first strikes of the samurai, I have redone my test. I also used the standard version of the game (with BUG, if it matters; no ACO) with no mods.

For those who do not know, the first tests were done on one of my mods, but I did nothing to affect the units.

Now for the test and results.

I was Boudica of the Celts (yes, I wanted the Aggressive trait; I will explain later) and I played against Washington, so the enemy had no promotion trait.

I did not care about the map, so I just started and then went into WorldBuilder. I made a 2-by-50 island of grassland so I could have fifty battles and see them easily. Because of the arguments about samurai and first strikes, I used axemen for both sides.

Now, since I think the odds are wrong, I was Boudica for the easy Combat I promotion. Since the enemy axemen would get a 10% defense bonus from grassland, this way the odds should be fifty-fifty.

I did the fifty battles, reloaded, and did another fifty (new random seed option on).
I then repeated the whole thing by reloading, for a total of two hundred battles.

The results were not surprising to me. In the first hundred I won only 38 battles, while in the second hundred I won 63.

This is just like my test with the samurai and axemen because the error margin was pretty close to plus or minus 15%.

Any comments or ideas? I am willing to try again, but I do not see anything wrong with this test. The only thing I will not do is thousands of tests, like one person in my last thread said. This topic, while it interests me, is not that important because there is nothing I can do to fix it.
 
So, you did two tests and one of them was 12% below 50% and the other was 13% above 50% and this somehow doesn't average out to 50%?
 
Now, since I think the odds are wrong, I was Boudica for the easy Combat I promotion. Since the enemy axemen would get a 10% defense bonus from grassland, this way the odds should be fifty-fifty.

1. Axemen placed with WorldBuilder do not get Boudica's free Combat I promotion.
2. Civ4 grassland does not have a 10% defense bonus.
 
They were technically two different tests, so the results from one had a twelve percent error while the other had a thirteen percent error. While the two together may average out, they are not the same test and cannot be considered together.

Thus the average margin of error for the two tests was 12.5%. Not the 15% from before, but still a rather large number when you think about it.

If these numbers really show an error (and I think they do), every time you look at the odds there is a range of plus or minus 12.5%, so there is a total range of 25% in which the true odds are.
 
What do you mean they can't be considered together? You did two tests of the same scenario to validate that scenario. How can you do anything other than consider them together? Your results indicate that the 50% average seems to be accurate.
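
For what it's worth, pooling the two runs can be sketched in a few lines. This is only a rough illustration using the win counts posted above (38 and 63 out of 100 each); the helper name `pooled_interval` is made up for this example, and the interval uses the textbook normal approximation rather than anything taken from the game.

```python
import math

def pooled_interval(wins, trials, z=1.96):
    """Win rate with a rough 95% normal-approximation interval (hypothetical helper)."""
    p_hat = wins / trials
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat, p_hat - half_width, p_hat + half_width

# Counts reported in the thread: 38 wins in the first 100 battles, 63 in the second 100.
wins = 38 + 63
trials = 100 + 100

rate, low, high = pooled_interval(wins, trials)
print(f"pooled win rate {rate:.3f}, rough 95% interval {low:.3f} to {high:.3f}")
```

The pooled rate is 101 wins out of 200, and the interval around it comfortably contains 50%.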
 
They were two tests because I did fifty and then went to WorldBuilder and repeated.

Then I reloaded and did it again. Different random seed, but still there was about a 12.5% error in both of them. One above and one below. The first tests I did in my first thread had about a 15% error, but both were below, while this time one was above.
 
I repeated the tests with two more pairs of axemen.

First test: the attacker won!
100% win rate.

Second test: the attacker lost!
0% win rate.

That's a 100% gap between the two tests. Clearly something strange is going on.
 
:lol: heresy. Everyone knows it's impossible to win against the computer with 50% odds :rolleyes:.
 
Well, a little knowledge is a dangerous thing.

The maths isn't that hard to understand; it's just probability and statistical (confidence interval) testing. We know each round is an independent binomial trial.

I can't be bothered to do the maths though ;) PieceOfMind knows his onions since he did the advanced combat odds calculator.
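
For anyone who does want to do the maths, here is a minimal sketch of an exact binomial check. It assumes the true odds per battle really were 50%, which nobody has confirmed from the displayed odds yet, and it treats whole battles (not rounds) as the trials. The function names are invented for the example.

```python
from math import comb, sqrt

def two_sided_p(wins, trials, p=0.5):
    """Exact binomial tail probability, doubled and capped at 1 (a simple two-sided test)."""
    pmf = [comb(trials, k) * p**k * (1 - p) ** (trials - k) for k in range(trials + 1)]
    lower = sum(pmf[: wins + 1])   # P(X <= wins)
    upper = sum(pmf[wins:])        # P(X >= wins)
    return min(1.0, 2 * min(lower, upper))

def z_score(wins, trials, p=0.5):
    """Distance from the expected number of wins, in standard deviations."""
    return (wins - trials * p) / sqrt(trials * p * (1 - p))

# Win counts reported in the thread, plus the two runs pooled together.
results = {}
for label, wins, trials in [("run 1", 38, 100), ("run 2", 63, 100), ("pooled", 101, 200)]:
    results[label] = (z_score(wins, trials), two_sided_p(wins, trials))
    print(f"{label}: z = {results[label][0]:+.2f}, two-sided p = {results[label][1]:.3f}")
```

With 100 battles at 50% the standard deviation is 5 wins, so a result a dozen wins away from 50 is a bit over two standard deviations out, while the pooled 101 out of 200 sits almost exactly on the expected value.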
 
You should probably also report

a) what the displayed odds were and
b) what the combat log said was happening (the combat log gives a round by round report of each battle - an important detail it tells us is how much damage each unit can do to the other).

Trying a couple of quick experiments on my machine (BTS 3.19):
  • Axe vs Axe displays 50% odds of winning, with damage/round at 20/20
  • Combat I Axe vs Axe displays 68.1% odds of winning, with damage/round at 20/19
  • Combat I + II Axe vs Axe displays 73.0% odds of winning, with damage/round at 21/18.
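
The damage-per-round numbers explain why a 10% strength bonus moves the odds so far from 50%. A rough sketch of the calculation, under assumptions that are mine rather than taken from the game code: units have 100 hit points, each round is won by the attacker with probability equal to its share of the combined strength, axemen have base strength 5, and Combat I and Combat II each add 10%. The function name is hypothetical.

```python
from math import ceil, comb

def win_probability(att_strength, def_strength, att_damage, def_damage, hit_points=100):
    """Chance the attacker lands enough hits to kill before taking enough hits to die."""
    p = att_strength / (att_strength + def_strength)  # assumed per-round win chance
    q = 1 - p
    hits_to_win = ceil(hit_points / att_damage)    # hits the attacker must land
    hits_to_lose = ceil(hit_points / def_damage)   # hits the attacker can take before dying
    # Sum over the number of rounds lost (k) before the final, winning hit.
    return sum(
        comb(hits_to_win - 1 + k, k) * p**hits_to_win * q**k
        for k in range(hits_to_lose)
    )

# Damage figures are the ones listed above; strengths are assumptions.
even = win_probability(5.0, 5.0, 20, 20)
combat_1 = win_probability(5.5, 5.0, 20, 19)
combat_2 = win_probability(6.0, 5.0, 21, 18)
print(f"{even:.3f} {combat_1:.3f} {combat_2:.3f}")
```

Under these assumptions the results land close to the displayed 50%, 68.1% and 73.0%. The big jump comes from the defender needing six hits instead of five once its damage drops below 20; small differences in the last digit could come from rounding inside the game, which this sketch does not model.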
 
They were two tests because I did fifty and then went to WorldBuilder and repeated.

Then I reloaded and did it again. Different random seed, but still there was about a 12.5% error in both of them. One above and one below. The first tests I did in my first thread had about a 15% error, but both were below, while this time one was above.

So, you're saying that some are below and some are above. If you have a 50% average that's what you're going to expect. It's unlikely that you'd get any at 50% but likely that multiple tests will show that as a mean. While one can't deduce any broad statistical trends from two trials, the results you posted do seem to indicate that the 50% average the game claims seems to be reflected in what you found.

I just don't get how it is that you're saying they don't do that.
 
I did not check the combat odds for these tests because I just assumed fifty-fifty with the defensive bonus, but someone said there is no defensive bonus, so there goes that idea.

Also, you want to see the combat logs? There were over 200 battles. That is a lot of logs, and I did not even try to look at them. Just saying that is a lot.
 
Look at the log after the first battle of an experiment, when there aren't nearly as many entries. Get the data, then run the rest of the experiment. We're not trying to verify every combat (although I suppose we could), but instead to make sure that the numbers displayed match the numbers we are modeling.
 
I will try to get them, but I did not save after the battles, and my brother may be playing when I get home, so no autosave. If not, I did save before the battles, but the numbers will not be the same because of the new random seed. I hope I remember. I will not be home till about eight, so it may be tomorrow before I post that info.
 
Does the OP realise that getting a 50:50 result for his test is actually very unlikely? Getting a so-called 'error' is what's to be expected. Also, 100 trials is laughably insufficient for proving any inconsistency. It seems most people who question the validity of RNGs can't grasp the fact that a random process is supposed to be unpredictable. If you always got the average result from such short experiments, you'd be proving it's a poor RNG. Ask yourself, should a good RNG be predictable?
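
To put a number on "very unlikely", here is a small sketch that assumes 100 independent battles at exactly 50% odds each. The helper name is made up for the example.

```python
from math import comb

def prob_wins_between(trials, low, high, p=0.5):
    """Probability that the number of wins falls in [low, high] for a binomial process."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(low, high + 1)
    )

exactly_half = prob_wins_between(100, 50, 50)   # exactly 50 wins out of 100
within_five = prob_wins_between(100, 45, 55)    # anywhere from 45 to 55 wins

print(f"exactly 50 wins: {exactly_half:.3f}")
print(f"45 to 55 wins:   {within_five:.3f}")
```

Hitting exactly 50 wins happens only about 8% of the time, and even landing within five wins of 50 fails more than a quarter of the time.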
 
More math, less name-calling.
What name-calling? "Laughably" is an adjective (or is it an adverb?) describing the extent to which I believe 100 trials to be insufficient.
 
I have to agree with this. He was only discussing the results of the experiment itself and didn't mock the poster personally. There was no name-calling involved.
 