Just a quick word about the statistical side of it:
The "95% formula" will look like K1*(#defenders) + K2*sqrt(#defenders) for any reasonably large number of defenders. The second term reflects the standard deviation of the number of attackers each defender will take out before he perishes.
The exact sizes of K1 and K2 aren't easy to determine except by experiment.
As an approximation, you might treat attack strength and defense strength (3 and ~3.7, for the sword-vs-spear-in-city example) as fixed, but instead of counting numbers of units, count total hit points. That is, treat ten veteran defenders as having 40 hit points - maybe a few more, since they get a few more promotions than attackers do.
For each individual *point* in the battle, the attackers have ~44% chance of inflicting damage, ~56% chance of taking it.
Now just look at a negative-binomial distribution and ask: how many attacking hit points will it take to take out N defending hit points? If you use this approximation you will conclude K1~1.23 and K2~2.03 (applied here to hit points rather than units), which for 5 defenders = 20 points estimates 34 attack points, or about 9 units. This is quite a lot too low, since units that end their turn damaged are unable to "use up" the rest of their hit points the same turn. It would not be a bad approximation, though, if you allowed the battle to go on for several turns with neither side healing or getting reinforcements.
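For anyone who wants to poke at the pooled-hit-point arithmetic, here is a rough Python sketch. It assumes every round is an independent flip that the attacker wins with probability A/(A+D); the function names are made up for illustration. Under that assumption the attacking hit points lost before N defending points are gone follow a negative binomial, with mean N*q/p and standard deviation sqrt(N*q)/p.

```python
import math

def hit_chance(attack: float, defense: float) -> float:
    """Assumed per-round chance that the attacker inflicts the damage."""
    return attack / (attack + defense)

def pooled_loss_stats(attack: float, defense: float, defender_hp: int):
    """Mean and standard deviation of attacking hit points lost before
    defender_hp defending hit points are removed (negative binomial:
    failures before defender_hp successes)."""
    p = hit_chance(attack, defense)
    q = 1.0 - p
    mean = defender_hp * q / p
    sd = math.sqrt(defender_hp * q) / p
    return mean, sd

# 5 veteran spearmen = 20 hit points, sword (3) vs spear in city (~3.7)
mean, sd = pooled_loss_stats(3.0, 3.7, 20)
print(f"mean loss {mean:.1f} hp, sd {sd:.1f} hp")
```

Note that q/p is just defense/attack, so the mean term per defending hit point is 3.7/3, about 1.23. How many standard deviations to stack on top of the mean is the judgment call that sets K2.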
On the other extreme, you can look at it unit by unit - ignoring the fact that two attackers might be able to lay into the same defender. Now you get K1~2.68 and K2~2.05, which for 5 defenders suggests 18 units are needed. This is quite a lot too high. It is still a little too high, though not by as much, if you assume damaged defending units are able to heal somewhat between turns but damaged attacking units can't.
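The unit-by-unit end of the range can be sketched the same way. This assumes a fresh 4 hp attacker against a fresh 4 hp defender, a fight to the death, and the ~44% per-round hit chance quoted above; `duel_win_chance` is a hypothetical helper, not anything from the game.

```python
from math import comb

def duel_win_chance(p: float, attacker_hp: int = 4, defender_hp: int = 4) -> float:
    """Chance that the attacker lands defender_hp hits before taking
    attacker_hp hits, given per-round hit chance p (fight to the death)."""
    q = 1.0 - p
    # k = number of hits the attacker absorbs on the way to winning
    return sum(
        comb(defender_hp - 1 + k, k) * p**defender_hp * q**k
        for k in range(attacker_hp)
    )

win = duel_win_chance(0.44)
print(f"win chance {win:.3f}, attackers per defender {1 / win:.2f}")
```

If every defender has to be beaten from full health, the expected number of attackers per defender is 1/win. With p = 0.44 that comes out near 2.7, in the neighbourhood of the K1 quoted for this case.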
The truth lies somewhere in between, and can probably only be discovered by experiment. I would say offhand that somewhere halfway between these two end-member cases would be about right... so how does 2*#def + 2*sqrt(#def) look to you all?
1 spearman -> 4 attacking swordsmen needed
2 spearmen -> 6.8 swords
3 spearmen -> 9.5 swords
4 spearmen -> 12 swords
5 spearmen -> 14.5 swords
6 spearmen -> 16.9 swords
...
10 spearmen -> 26.3 swords.
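The table above is just the proposed rule of thumb evaluated at each stack size. A few lines of Python (the function name is made up) will extend it to any number of defenders:

```python
import math

def swords_needed(spearmen: int) -> float:
    """Rule of thumb from this post: 2*#def + 2*sqrt(#def)."""
    return 2 * spearmen + 2 * math.sqrt(spearmen)

for n in (1, 2, 3, 4, 5, 6, 10):
    print(f"{n} spearmen -> {swords_needed(n):.1f} swords")
```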
The predictions of 4 and 6.8 swords for 1 and 2 spearmen are dangerous, since "2 standard deviations from the mean" won't get you to 95% confidence for very skewed distributions. Still, the fit to the 1-to-6 spearmen experiment is good.
Anyone want to run a simulation for 10 spearmen and see how close 26 is to being right?
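As a starting point, here is a minimal Monte Carlo sketch in Python. It is a toy model, not the game's actual combat code, and it bakes in several assumptions: every unit has 4 hit points, each attacker arrives fresh and fights to the death, the healthiest defender always takes the fight, the per-round hit chance is a flat 0.44, and nobody heals or gets promoted. All names and parameters are made up for illustration.

```python
import math
import random

def attackers_used(n_defenders, p_hit, hp=4, rng=random):
    """Attackers spent in one turn to kill every defender.
    ASSUMPTIONS (not taken from the game): equal hp for all units,
    fresh attackers, fights to the death, healthiest defender defends,
    no healing and no promotions."""
    defenders = [hp] * n_defenders
    used = 0
    while defenders:
        used += 1
        attacker = hp
        # pick the defender with the most hit points left
        i = max(range(len(defenders)), key=defenders.__getitem__)
        while attacker > 0 and defenders[i] > 0:
            if rng.random() < p_hit:
                defenders[i] -= 1
            else:
                attacker -= 1
        if defenders[i] == 0:
            defenders.pop(i)
    return used

def attackers_for_confidence(n_defenders, p_hit=0.44, confidence=0.95,
                             trials=5000, seed=1):
    """Smallest stack size that clears the city in at least
    `confidence` of the simulated battles."""
    rng = random.Random(seed)
    results = sorted(attackers_used(n_defenders, p_hit, rng=rng)
                     for _ in range(trials))
    return results[math.ceil(confidence * trials) - 1]

print(attackers_for_confidence(10))
```

Damage carries over on the defenders here, but winning attackers are retired for the turn, so the answer should land between the two end-member cases. Swapping in different hit chances or hit points is a cheap way to see how sensitive the rule of thumb is.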
(The math was so much easier to do in Civ1!)