That's a very good match! I would say none of the 'worst' results are far enough off to call the formula invalid, but if you wanted to check you could run those tests again (ignoring the first results) and see if they come out closer.
Another reason to trust your formula is that the 'worst' results fall both above and below the expected values - if they were all over or all under, it would imply you might be missing something, but they aren't.
If you take the 'worst' result for the 80/80 test run and apply the binomial distribution to it, you can see exactly how well the actual values match your formula...
The standard deviation is:
s.d. = sqrt( npq )
where n=number of tries, p=prob(success), q=1-p
So, for the 80/80 run (n=400, p=61.54%=0.6154, q=0.3846):
s.d. = sqrt( 400 * 0.6154 * 0.3846 ) = 9.73
In 400 tests:
The expected number of retreats is 400*0.6154=246.16
The s.d. is 9.73
The actual number of retreats is 227, which is 19.16 away from the expected value.
That difference is about 1.97 standard deviations.
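If you want to reproduce that arithmetic for your other runs, here's a quick sketch (Python, with the 80/80 numbers hard-coded; swap in the n, p, and observed count for each test run):

```python
import math

n = 400         # number of trials in the 80/80 run
p = 0.6154      # probability of a retreat predicted by the formula
observed = 227  # retreats actually seen in that run

expected = n * p                 # 246.16
sd = math.sqrt(n * p * (1 - p))  # ~9.73
z = (observed - expected) / sd   # ~ -1.97 standard deviations

print(f"expected={expected:.2f}, sd={sd:.2f}, z={z:.2f}")
```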
For a "one-off" test that would be further from the expected value than I'd normally be comfortable with, but since you did 28 test runs, it is not that unusual. There are a couple of other runs with similarly anomalous results, but nothing too unusual IMO.
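To put a rough number on "not that unusual": a single run landing about 1.97 standard deviations out happens roughly 5% of the time, so across 28 independent runs you'd actually expect one or two such outliers. A quick check, using the normal approximation to the binomial (the run count of 28 comes from your description):

```python
import math

z = 1.97     # how far the worst run was from expected, in standard deviations
runs = 28    # number of test runs performed

# Two-tailed probability of a single run being at least this far out (normal approx.)
p_single = math.erfc(z / math.sqrt(2))    # ~0.049

# Probability that at least one of the 28 runs is at least this extreme
p_any = 1 - (1 - p_single) ** runs        # ~0.75

print(f"p(single run this far out)          = {p_single:.3f}")
print(f"p(at least one of {runs} runs this far out) = {p_any:.2f}")
```

So seeing one result like the 227 is actually more likely than not over that many runs.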