So I made some more diagrams with the latest version and I figured they might be interesting for fellow nerds.
But first some definitions.
* A run of the simulation takes an initial position of friendly/neutral/enemy units and tries different assignments (move/attack/etc.) for the friendly units, resulting in new positions.
* The positions have scores to guide the search, but once a position is completed (no more assignments possible) we check whether it is acceptable or not.
* So highly-scored positions might be discarded because they leave a unit exposed to counterattack. It is not guaranteed that we find any completed and acceptable positions at all.
* Such failed runs are excluded (but don't worry, the AI will try to do something else with its units if a simulation run does not yield anything usable).
* A run terminates when there are no more incomplete positions to work on (or if we run out of memory).
* There is deduplication logic: if we already have a position with assignments ...AB, we ignore ...BA (if both are moves - for attacks the order matters!).
* I used a dataset of ~7600 runs from an AI-only test game (turn 300 to turn 500). I used the lategame because of the big armies involved, but I also included smaller battles in the dataset.
* On a release build even the longest runs take less than 1 second to compute on my machine, so runtime performance is good enough.
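The deduplication rule above (move order doesn't matter, attack order does) can be sketched as a canonical key over the assignment sequence. This is just an illustration, not the game's actual data structures: assignments are modeled as `(kind, unit)` tuples, and runs of consecutive moves are collapsed into order-insensitive sets while attacks keep their sequence.

```python
def dedup_key(assignments):
    """Canonical key for a position's assignment list.
    Assignments are ("move"|"attack", unit) tuples (an assumption for this sketch).
    Consecutive moves commute, so they collapse into a frozenset;
    attacks stay in order because their order changes the outcome."""
    key, pending_moves = [], []
    for a in assignments:
        if a[0] == "move":
            pending_moves.append(a)
        else:
            if pending_moves:
                key.append(frozenset(pending_moves))
                pending_moves = []
            key.append(a)
    if pending_moves:
        key.append(frozenset(pending_moves))
    return tuple(key)

# ...AB and ...BA map to the same key when both are moves:
assert dedup_key([("move", "A"), ("move", "B")]) == \
       dedup_key([("move", "B"), ("move", "A")])
# ...but for attacks the order is preserved:
assert dedup_key([("attack", "A"), ("attack", "B")]) != \
       dedup_key([("attack", "B"), ("attack", "A")])
```

A run can then keep a set of keys for positions it has already generated and skip any new position whose key is already present.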
On to the pictures!
1. To limit memory consumption there is a hard stop after 32000 positions per run. Here we see that most runs are quite short and do not reach that limit.
2. How many positions should we evaluate to find the highest score? Only very few runs really need the full length (most dots are at the bottom). Blue and red dots do not overlap, meaning that sometimes the overall best score (blue) belonged to an intermediate position whose descendants all had to be discarded; the red dots are completed and accepted positions. On average we need to look at 1355 positions to find the best one, but of course there is high variance.
3. What happens if we limit the maximum number of positions to evaluate? Not much. If we only allow 4000 positions, the best score is still 99.6 percent as good.
4. What happens if we limit the number of completed+accepted positions before we terminate? For the test set I allowed 1280 completed positions max. If the limit were 20 instead, the resulting scores would be 96% as good. This is the main mechanism for setting difficulty levels!
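The two cutoffs from points 3 and 4 can be sketched as a single loop over candidate positions. This is a toy model, not the real search: candidates are assumed to arrive as `(score, completed, acceptable)` tuples in evaluation order, and the function returns the best accepted score found before either limit trips.

```python
def best_score_with_limits(candidates, max_evaluated=32000, max_completed=1280):
    """Apply both termination criteria from the post (values are the defaults
    mentioned there; the tuple format is an assumption for this sketch):
    stop after max_evaluated positions, or after max_completed
    completed-and-accepted positions, whichever comes first."""
    best = None
    completed = 0
    for i, (score, is_completed, is_acceptable) in enumerate(candidates):
        if i >= max_evaluated:          # hard cap on positions evaluated
            break
        if is_completed and is_acceptable:
            completed += 1
            if best is None or score > best:
                best = score
            if completed >= max_completed:  # the difficulty knob
                break
    return best  # None means the run failed to find anything usable

candidates = [(5, True, True), (9, True, True), (7, True, True)]
assert best_score_with_limits(candidates, max_completed=2) == 9
assert best_score_with_limits(candidates, max_completed=1) == 5  # weaker AI
```

Lowering `max_completed` makes the AI settle for an earlier, usually worse, accepted position, which matches the 1280-vs-20 comparison above.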
That's all for now ...