Actually my point is more that the data given in the OP's post is mostly complete. It is just his description of how he set it up that is lacking. He did say the odds were 50:50, and the results of 38 and 63 make this very likely.
My argument is that the OP has not used this data to construct a proper argument. Instead he seems to be ignoring (possibly unintentionally) important points relevant to RNGs, invalidating his hypothesis.
My hypothesis, if you'd call it that, is that the RNG is adequate for the purpose of being used throughout this game. It is not perfect, as no PRNG can be, but I have yet to see any significant evidence (i.e. not just anecdotal) that would differentiate it from a pure RNG (one that is perfectly random). As I briefly mentioned earlier, if an argument put forward to discredit an RNG could equally be used to discredit a pure RNG, then something is not right.
The cardinal sin that is committed by the more naive when examining RNGs is to consider results in a retrospective fashion. This is done pretty frequently on the boards. People will say, "You won't believe my luck! I lost two 99% battles in a row!". The person or another poster soon after will calculate that the odds of such a thing happening were 1 in 10,000 and say he/she was very unlucky. Technically the RNG was just doing its job and there was no luck involved whatsoever.
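To put a number on why the retrospective view misleads, here is a rough Python sketch. It computes the chance that a back-to-back loss shows up at least once somewhere in a long run of 99% battles, rather than in two battles picked in advance. The function name and the battle counts you feed it are my own assumptions for illustration; nothing here comes from the game's code.

```python
def prob_double_loss(n_battles, p_loss=0.01):
    """Chance of at least one pair of consecutive losses in n_battles.

    Assumes every battle is independent with the same loss chance
    (an idealised 'pure' RNG, not Civ4's actual generator).
    """
    # Probability that no double loss has happened yet, split by
    # whether the most recent battle was a win or a loss.
    no_double_last_win = 1.0
    no_double_last_loss = 0.0
    for _ in range(n_battles):
        new_win = (no_double_last_win + no_double_last_loss) * (1 - p_loss)
        new_loss = no_double_last_win * p_loss
        no_double_last_win, no_double_last_loss = new_win, new_loss
    return 1.0 - (no_double_last_win + no_double_last_loss)


if __name__ == "__main__":
    # Hypothetical numbers of 99% battles fought over many games.
    for n in (2, 100, 1000, 10000):
        print(n, prob_double_loss(n))
```

For exactly two battles this gives the 1 in 10,000 figure. The more battles you look back over, the larger the chance that the "unbelievable" event turns up somewhere, which is the whole problem with judging after the fact.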
Examining a random sequence retrospectively, looking for patterns (e.g. battles lost in a row), is actually a flawed way to examine the quality of an RNG. Proposing tests that would strengthen or weaken a particular hypothesis, and then carrying out those tests, is the more reliable method (I was about to say the superior method; Attacko must be rubbing off on me).
If one tossed a coin 100 times and then examined the resulting string of results, it is very likely one would be able to find some string that seemed 'unrandom' somehow. For example, TTTTTTTT, or THTHTHTH, or TTHHTTHH, or THHTTTTHHT etc. If you looked hard enough you would usually find something you consider odd that had less than a 1 in 10 chance of occurring. If you could do this in every random sequence of 100 coin tosses, does it prove anything? All it proves is the human brain's remarkable ability to detect patterns quickly (arguably there is an obvious evolutionary reason for this which I won't discuss here), and that the way our brains are programmed to examine data is not conducive to detecting the absence of true randomness.
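Anyone who wants to try the coin-toss experiment can do it with a few lines of Python. This is only a sketch: it uses Python's built-in generator as a stand-in for a fair coin, and the run length of 6, the trial count and the seed are arbitrary choices of mine.

```python
import random


def longest_run(tosses):
    """Length of the longest block of identical consecutive results."""
    best = current = 0
    previous = None
    for t in tosses:
        current = current + 1 if t == previous else 1
        best = max(best, current)
        previous = t
    return best


def fraction_with_run(min_run=6, n_tosses=100, trials=2000, seed=1):
    """Share of simulated 100-toss sequences containing a run >= min_run."""
    rng = random.Random(seed)  # stand-in for a fair coin, not Civ4's RNG
    hits = 0
    for _ in range(trials):
        tosses = [rng.choice("HT") for _ in range(n_tosses)]
        if longest_run(tosses) >= min_run:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(fraction_with_run())
```

A run of six identical results looks suspicious when you spot it in a single sequence, yet in a simulation like this it should turn out to be common rather than rare.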
Testing Civ4's RNG for correctness in monobit frequency (that is, over time it produces as many "wins" and "losses" as it is supposed to) is almost pointless. This is because a PRNG that did not adequately meet this condition would likely fail in most other regards too. Also, a PRNG that did not even pass this test would be very easy to catch with relatively simple, cheap tests. I highly doubt Soren would code a PRNG that was flawed in so basic a way.
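To show just how cheap such a test is, here is a minimal sketch of a frequency check in Python. It uses a normal approximation to the binomial, so it is only reasonable for a decent number of battles; the function name and the example counts are hypothetical.

```python
import math


def monobit_p_value(wins, battles, p_win=0.5):
    """Two-sided p-value for seeing `wins` out of `battles`.

    Assumes independent battles with a fixed win chance p_win and
    uses the normal approximation to the binomial distribution.
    """
    expected = battles * p_win
    std_dev = math.sqrt(battles * p_win * (1 - p_win))
    z = (wins - expected) / std_dev
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical tally: 60 wins in 100 battles at 50:50 odds.
    print(monobit_p_value(60, 100))
```

A small p-value would suggest the win count is further from expectation than chance alone easily explains. The important part is deciding on the test before collecting the battles, not after noticing something odd.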
What is more likely to concern users is the appearance of streaks (sometimes called 'block' frequency) or periodicity (the random sequence will repeat after an unacceptably short time). I would guess that Soren's RNG is of the Linear Congruential type (LCG = Linear Congruential Generator - look it up if you're interested) - one of the most primitive yet decent PRNGs out there. LCGs to my knowledge can be prone to periodicity, though certainly not on a scale that any of us would detect when playing the game. Every LCG has a finite maximum period (it can never exceed its modulus), but once again if I were to guess, I'd say Soren's RNG wouldn't have a period any lower than 10^6 or 10^7.
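For anyone who doesn't want to look it up, an LCG is just the recurrence x_next = (a * x + c) mod m. The sketch below uses deliberately tiny, made-up parameters so the period can be counted by brute force. They are not Civ4's constants, which I am only guessing at here.

```python
def lcg_next(x, a, c, m):
    """One step of a linear congruential generator."""
    return (a * x + c) % m


def lcg_period(a, c, m, seed):
    """Length of the cycle the generator eventually falls into."""
    seen = {}  # state -> step at which it first appeared
    x = seed
    step = 0
    while x not in seen:
        seen[x] = step
        x = lcg_next(x, a, c, m)
        step += 1
    return step - seen[x]


if __name__ == "__main__":
    # Toy parameters chosen for illustration only.
    print(lcg_period(a=5, c=3, m=16, seed=0))  # visits all 16 states
    print(lcg_period(a=2, c=0, m=16, seed=1))  # collapses to a fixed point
```

The first parameter set cycles through every possible state before repeating, while the second gets stuck at zero almost immediately. With a real modulus such as 2^32 and sensible constants the period is far too long to notice in play, which is why I'd be surprised if periodicity were the issue here.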
EDIT... I see DanF has given me the source code. Thanks!
