# AI algorithm for city improvements

Discussion in 'Civ4 - General Discussions' started by GatlingGun, Mar 19, 2008.

1. ### GatlingGunWarlord

Joined:
Feb 3, 2008
Messages:
273
Location:
Northwest USA
I've been hacking up a python script (using a C-based backend for speed) to take a an unimproved city site and show the best possible uses for those 20 tiles, given some food, production and tree constraints...

Essentially I wind up creating a multi-dimensional list of the valid possible tile improvements, and then generate combinations of those 20 improved tiles... I feed those combinations into a scoring algorithm and pop out the top seven combinations for my constraints...

The problem with this method is that 20 tiles with three improvements (assume the simplest case of all grass... so Farm, Cottage, Workshop) gives you 3**20 combinations to iterate through! Just generating this many combinations takes in excess of 1 minute and exponentiates when you consider something like 20 tiles where you have to chop trees... there are literally 1.1 trillion combinations with 20 flat tree tiles... this takes a long time, no matter what home computer you are using...

I'm assuming that the AI isn't choosing improvements like that... I'm not familiar with the Civ4 source, but is there anyone who can comment on how the AI does it? I'd be interested in comparing this algorithm against the "long way"... I realize the AI improvements are criticized for being silly at times, but I wonder whether I could reuse this and improve it somehow.

Even if someone could point me to the routine in the source, that could get me started...

Thoughts?

[Edit: as Refar alluded to below, 3**20 is the number of permutations, not combinations... so I need a better algorithm]

2. ### KrikkitTwoImmortal

Joined:
Apr 3, 2004
Messages:
12,404
Its in the Python files... I think CvTile, CvUnit or CvCity (search for "bestimprovement")

The AI does it on an individual tile basis... if amount of food is too low, the value of food is increased so all tiles are more likely to be given farms or other food boosters. Since there is a random factor, as to which ones are improved, as more farms are put in, the amount of food rises and the Value of it falls.

3. ### RefarDeity

Joined:
Apr 10, 2005
Messages:
4,608
Perhaps you could chat with Bhruic on this - he was working on something like tile scoring and improvements for his unofficial patch....

As to your idea - you need some kind of prevaluation: If you can prune a whole branch of the decision tree - before you even list all combination on this branch - you will save a lot of time.

I think food shortage might make a useful criteria for this: count food first and remove (dont even bother to list in the first place) that have to few food to feed all working pop.

Another thing is: do not need all the combinations - if the city has 20 grasslands those grasslands are all the same - it does not matter if you farm Nr.1 and cottage Nr.2 or vice versa.

There might be more usefull pruning criteria - you will have to reduce a lot, to make this sub-exponential.

4. ### r_rolo1King of myself

Joined:
May 19, 2006
Messages:
13,818
Location:
Lisbon, Portugal
I suppose that GatlingGun wants a algorythym that says to him how to squeze all the possible juice from a BFC, assuming enough and to work all tiles....

As Refar said, the best way is to first create a database of what one tile can give under diferent improvements,surrounding enviroments ( river, Moai, levee, dike , Feitoria, Golden Age..... ), possible events ( you can rule this out if you want ) and under diferent civics. Then try to maximize a certain position maximizing all the tiles... and then ( as Refar said again ), check if there is enough food. If not , go to second...

I suppose that this does not give a warrant of finding the best configuration in all scenarios ( this NP problems are a a PITA ) but it will give you a very good solution before the end of the world....

5. ### GatlingGunWarlord

Joined:
Feb 3, 2008
Messages:
273
Location:
Northwest USA
Thanks for the tip about Bhruic's work... I'll ping him and see what he says...

Regrettably, I have two topo-generation algorithms and neither is suited for pruning before I generate the possibilities... that is probably going to be my next focus, but the initial cut was just to generate all the possible combinations (see my note below)

An order-dependent series is a permutation, not a combination... most people confuse the terms so I don't blame you for pointing out the issue of which grass tile you farm vs cottage... my most thorough (and time-consuming) algorithm uses a Cartesian product of that tile improvement matrix... AFAICT, that's giving me the combinations (not permutations) of improvements... I'm not a linear algebra expert, so I could be wrong here but so far I think I'm getting combinations.... [Subsequent edit: I spoke too soon... I have counters in the script for duplicate topologies and the Cartesian product does generate duplicates... at least the way I'm using it now]

Your food comment is essentially suggesting an iterative approach after calculating the basic starting point... I have pretty much concluded that is another avenue I'm going to try... but remarkably, I have a rough hack on a cartesian product that is quick and seems to generate well over 90% of the usable possibilities... but my problem with it is twofold:

First, I got really lucky, and I am still trying to figure out why it works so well...
Second, it never took things like cottaging hills or workshops on bananas into account... and my attempts at extending it for these cases seems to have negative affects on coverage elsewhere... thus I tried the methodical (and computationally insane ) approach...

6. ### r_rolo1King of myself

Joined:
May 19, 2006
Messages:
13,818
Location:
Lisbon, Portugal
How is your algorythim that creates 90+% of the possibilities? and how time friendly it is ( i suppose that it is better than the methodical aproach, or otherwise you would still be looking at it )?

P.S This problem is one of rearrangements with repetition ( you can have repeated tile improvements ), and like that the result has to be factorial based

7. ### MyOtherNameEmperor

Joined:
Dec 7, 2004
Messages:
1,526
For the record, this general form of problem is a very hard one -- I think it's the 0-1 knapsack problem in disguise (or possibly harder than that). Of course, some instances of this problem are easier than others!

If you're looking for algorithmic ideas, here is a greedy algorithm that has worked reasonably well for me when I want to compute the optimum output of a set of tiles.

Suppose we have pre-chosen a constant for the relative value of a commerce versus a hammer... so that we have a concrete value for the total output of our tiles. (If we want to see a spectrum of possibilities, just rerun the algorithm with different constants)

First, I'm presuming we want a steady-state situation: we won't be swapping tiles every few turns, and we do not intend to grow to the next population point.

Second, I configure the city for maximum food surplus. (which might include decreasing population! e.g. so that it doesn't work a plains hill)

Then, while a food surplus still exists, I consider every possible tweak in terms of its food --> value tradeoff, and make the tweak with the best ratio.

e.g. if I consider hammers = 1 point and commerce = 0.01 points (so I'm making a hammer-specialized city, with commerce as a tie-break), there are no happiness/healthiness issues, and we won't consider specialists, and no bonus resources, then the possible tweaks might be,

(If I have all relevant tech and civics)

farm -> watermill : 1 food for 2.02 value : ratio = 2.02
farm -> workshop : 2 food for 4 value : ratio = 2
watermill -> workshop: 1 food for 1.98 value : ratio = 1.98

For grassland hills...
Not working -> windmill : 0 food for 2.01 value : ratio = infinity
Not working -> mine: 1 food for 4 value : ratio = 4
windmill -> mine: 1 food for 1.99 value : ratio = 1.99

For plains hills...
Not working -> windmill : 1 food for 3.01 value : ratio = 3.01
Not working -> mine: 2 food for 5 value : ratio = 2.5
windmill -> mine: 1 food for 1.99 value : ratio = 1.99

For desert hills...
Not working -> windmill : 1 food for 2.01 value : ratio = 2.01
Not working -> mine: 2 food for 4 value : ratio = 2
windmill -> mine: 1 food for 1.99 value : ratio = 1.99

So starting from the all-farm and no hill configuration, I would apply these tweaks in order until my food surplus reduced to 0:

windmill an unworked grassland hill: 0 food for 2.01 value : ratio = infinity * 2.01
windmill an unworked plains hill: 1 food for 3.01 value : ratio = 3.01
farm -> watermill : 1 food for 2.02 value : ratio = 2.02
windmill an unworked desert plains hill: 1 food for 2.01 value : ratio = 2.01
windmill -> mine: 1 food for 1.99 value : ratio = 1.99
watermill -> workshop: 1 food for 1.98 value : ratio = 1.98
rice farm -> watermill:

(If an unworked tile is on the river, then add it as an extra case, with its extra value)

And in this particular situation, this greedy algorithm is clearly guaranteed to find the optimal configuration.

8. ### r_rolo1King of myself

Joined:
May 19, 2006
Messages:
13,818
Location:
Lisbon, Portugal
I think that starting by the other side of the problem ( optimizing the BFC and then check if there is food avaliable, else try the second best ( after defining the F/H/C ratios of course ) ) may be more computer friendly for a great number of BFC calculations... and the greedy algorythim can get stuck in certain situations....

9. ### GatlingGunWarlord

Joined:
Feb 3, 2008
Messages:
273
Location:
Northwest USA
Typical runs without trees are rather fast... less than 5 seconds on a 3.0Ghz P4 and python 2.4.4 under linux... below I'm showing the output for a former jungle site with rice, bananas, grass and grasshills in the BFC...

Typical tree-heavy runs usually come in around 20 or 30 seconds

At the risk of being bombarded with requests for the code (right now the answer is no... sorry) I'll illustrate what it does... on the right is:
F = Food surplus (at 20 pop)
P = Production (at 20 pop)
C = Cottages
H = Local Health bonus (really a rough number at the moment)
T = Number of trees

As you can see, it only runs through about 1200 possibilities before I get an answer... my Cartesian product algorithm generates much more than this... it literally takes hours for 20 (treeless) tiles... probably because it's calculating permutations... I'm still trying to come up with a generalized algorithm that generates minimal (if no) duplicates...

Code:
```[FONT="CourierNew"]
[mpenning@tsunami ~]\$ ./civ4_city.py.orig --gsrc 1 --gsbn 1 --gs 12 --gshl 6

Total duplicate topology permutations rejected: 1055
Total unique topology combination (before pruning): 151
Number of feasible topologies (after pruning): 59
This analysis took 0.002040 minutes
=====
Top cottage options...
KEY                                                            F   P   C   H   T
gsbn:1-gs:c7-gshl:m1-gshl:wm5-gsrc:1-gs:wk5                    1  24   7   3   0
gsbn:1-gs:c8-gs:f1-gshl:wm6-gsrc:1-gs:wk3                      5  16   8   3   0
gsbn:1-gs:c8-gshl:wm6-gsrc:1-gs:wk4                            3  19   8   3   0
gsbn:1-gs:c8-gshl:m1-gshl:wm5-gsrc:1-gs:wk4                    2  21   8   3   0
gsbn:1-gs:c9-gshl:wm6-gsrc:1-gs:wk3                            4  16   9   3   0
gsbn:1-gs:c9-gshl:m1-gshl:wm5-gsrc:1-gs:wk3                    3  18   9   3   0
gsbn:1-gs:c10-gshl:m1-gshl:wm5-gsrc:1-gs:wk2                   4  15  10   3   0
=====
Top production options...
KEY                                                            F   P   C   H   T
gsbn:1-gs:c2-gs:f2-gshl:wm6-gsrc:1-gs:wk8                      1  31   2   3   0
gsbn:1-gs:c1-gs:f3-gshl:wm6-gsrc:1-gs:wk8                      2  31   1   3   0
gsbn:1-gs:f4-gshl:wm6-gsrc:1-gs:wk8                            3  31   0   3   0
gsbn:1-gs:f4-gshl:m1-gshl:wm5-gsrc:1-gs:wk8                    2  33   0   3   0
gsbn:1-gs:c1-gs:f2-gshl:wm6-gsrc:1-gs:wk9                      0  34   1   3   0
gsbn:1-gs:f3-gshl:wm6-gsrc:1-gs:wk9                            1  34   0   3   0
gsbn:1-gs:f3-gshl:m1-gshl:wm5-gsrc:1-gs:wk9                    0  36   0   3   0
=====
Top food options...
KEY                                                            F   P   C   H   T
gsbn:1-gs:f8-gshl:m1-gshl:wm5-gsrc:1-gs:wk4                   10  21   0   3   0
gsbn:1-gs:c2-gs:f7-gshl:wm6-gsrc:1-gs:wk3                     11  16   2   3   0
gsbn:1-gs:f8-gshl:wm6-gsrc:1-gs:wk4                           11  19   0   3   0
gsbn:1-gs:c1-gs:f8-gshl:wm6-gsrc:1-gs:wk3                     12  16   1   3   0
gsbn:1-gs:f9-gshl:m1-gshl:wm5-gsrc:1-gs:wk3                   12  18   0   3   0
gsbn:1-gs:f9-gshl:wm6-gsrc:1-gs:wk3                           13  16   0   3   0
gsbn:1-gs:f10-gshl:m1-gshl:wm5-gsrc:1-gs:wk2                  14  15   0   3   0
[mpenning@tsunami ~]\$
[/FONT]```
I'm also going to caveat my estimate of 90% of the usable Cartesian topologies... that is based on comparisons at about 10 to 13 workable tiles (where the Cartesian run-times are sane)... for all I know, things could diverge quickly as it evaluates many more possibilities than the approximation.

10. ### RefarDeity

Joined:
Apr 10, 2005
Messages:
4,608
I am not sure on Combinations vs Permutations - you might well be right. It was not the point however. The point was:
While in fact it's bound 20^3 (20 equal grasslands, so find c+f+w=20 with c,f,w from {1..20}) even if you do it by brute force. Reading this i assumed that you try to do it by "Very-brute-force", and that triggered my comment.

So basicly my point was - you don't need to concider a imporvement for evey single plot and should be working with terrain types instead (number of plots of a certain kind).

But now when i am see your sample output, i see that you actually trying something similar to what i had in mind.

Why do you want to have the Cartesian Product thingie, which would be much-much slower ?

11. ### GatlingGunWarlord

Joined:
Feb 3, 2008
Messages:
273
Location:
Northwest USA
My background is far more engineering than computer science... so I really appreciate the pointer on the Knapsack problem... it looks really close... just knowing the right name for it may help a lot... once you know what to call it, Google tutors the rest... thanks a mil!

This algorithm is making a couple of baseline assumptions that I'd rather not go with... as an example, it assumes fixed ratios between food/production... the goal of my script is to find the best way to build a specialized city on a given plot of land... ideally I want to pump out nearly one rifleman per turn (epic speed)... so I like to consider the goal of the city as well as food/production tradeoffs... sometimes I am surprised by what I find... maybe I expected a great production city, but it really looks better for commerce... but fixed ratios are interesting idea that I may incorporate for the majority of city sites that seem to have lackluster specialization potential...

12. ### MyOtherNameEmperor

Joined:
Dec 7, 2004
Messages:
1,526
If I've reverse engineered the output correctly, you are counting

(numbers are food / hammers)
City = 2/1
Banana = 3/0
Rice = 2/0
Grass Farm = 1/0
Grass Workshop = -1/3
Grass Mine = -1/3
Grass Windmill = 0/1

I see some obvious problems in that you've entirely missed the maximum in each category:

1 banana
1 rice
12 grassland cottages
6 grassland hill cottages
For a total of 18 cottages and 1 food surplus (and 1 commerce)

1 banana
1 rice
12 grassland farms
6 grassland hill windmills
For a total of 19 food (and 7 hammers, and 7 commerce)

The maximum production with 1 food surplus is
1 banana plantation
1 rice farm
6 grassland farms
6 grassland workshops
6 grassland hill mines
For a total of 1 food, 37 hammers. (and 1 commerce)
(This is what my greedy algorithm gave)

I'm pretty sure the maximum comes at swapping one farm for a workshop and one mine for a windmill, giving
1 banana plantation
1 rice farm
5 grassland farms
7 grassland workshops
5 grassland hill mines
1 grassland hill windmill
For a total of 0 food, 38 hammers. (and 2 commerce)

13. ### GatlingGunWarlord

Joined:
Feb 3, 2008
Messages:
273
Location:
Northwest USA
You made a fair point about brute force, and one that I was still wrestling with as I (finally) got the cartesian product to work this morning after about five hours of battle...

Yes, I agree... I don't want to generate all permutations... that clearly slows me down...

I considered brute-force because I'm trying to take this quick hack to the next level... right now, it's got several hard-coded assumptions about how you should improve a tile... so I want to make it more flexible and consider other options... for instance what happens if I put a cottage on a banana... or a grasshill... stuff like that...

The other side is that I find lots of sites that just look pretty boring... I'm hoping that I can squeeze a little more commerce and hammers out of those guys...

14. ### MyOtherNameEmperor

Joined:
Dec 7, 2004
Messages:
1,526
The zero food surplus is an important special case, because when you've reached the optimal output of a normal city, food surplus is of no value.

(The only exception I see is when you cyclically grow then whip/draft)

The point of assigning a fixed relative value between commerce and hammers is that you now have a specific objective function, and the problem can then be formulated as a standard optimization problem: maximize the objective function subject to the constraint that surplus food cannot be negative.

The fixed ratio between food and production is entirely a function of the tile improvement system and the objective you're trying to optimize. If you set the values = 1 and = 0, then (in your scenario) replacing a farm with a workshop means that you decrease food by 2 and increase value by 3, for a ratio of 1.5 value per food. But if you value = 0.3 and = 0.7, then the tradeoff for switching a farm to a workshop would be 0.45 value per food.

As for considering food surpluses -- if you are only interested in long-term output, and have decided upon the value of a hammer and of a commerce, then there's a nifty dynamic programming approach for deciding what the best way is to balance working food tiles for growth versus working productive tiles. (And the results are usually pretty unbalanced -- you usually want to heavily emphasize food up until a certain point where you suddenly barely care about food at all)

15. ### RefarDeity

Joined:
Apr 10, 2005
Messages:
4,608
I see how you want to reduce assumptions. However, that will slow you down again.

The problem here is, that if you have too many different terrain types, the running time of the type based approach will grow closer the brute-force carthesian running time. (Whith the worst case being 20 different terrain types wich only one plot of each type - which would basicle be 3^20 again - assuming 3 possible improvements per type).

This is why your code will run noticeable slower if you add forests for example.

So basicly your goal should be to make even more assumptions, to reduce the number of possible terrain types (declaring some equivalent), and perhaps reducing the number of possible improvements (for example assuming that hill-forest will allways be choped)...

16. ### GatlingGunWarlord

Joined:
Feb 3, 2008
Messages:
273
Location:
Northwest USA
Good work on decoding my city keys ... however, I make assumptions that are different than yours... you'll notice that about 90 options were "pruned"... well those 90 options include things with (what I consider) substandard production... I realize I can squeeze out a lot more cottages by sacrificing production, but I refuse to consider one of the first five city sites unless it has at least 15 (base) prod (assuming 20 pop).... otherwise, I can't even build the basic infrastructure without a lot of slave whipping and a lot of time to get to the point where I can even whip ... when I run this without pruning, I get:

Code:
```[FONT="Courier New"][mpenning@tsunami ~]\$ ./civ4_city.py.orig --gsrc 1 --gsbn 1 --gs 12 --gshl 6 --minprod 1 --noprune

Total duplicate topology permutations rejected: 1055
Total unique topology combination (before pruning): 151
Number of feasible topologies (after pruning): 151
This analysis took 0.002029 minutes
=====
Top cottage options...
KEY                                                            F   P   C   H   T
gsbn:1-gs:c11-gs:f1-gshl:wm6-gsrc:1                            8   7  11   3   0
gsbn:1-gs:c11-gs:f1-gshl:m1-gshl:wm5-gsrc:1                    7   9  11   3   0
gsbn:1-gs:c11-gshl:wm6-gsrc:1-gs:wk1                           6  10  11   3   0
gsbn:1-gs:c11-gshl:m1-gshl:wm5-gsrc:1-gs:wk1                   5  12  11   3   0
gsbn:1-gs:c12-gshl:wm6-gsrc:1                                  7   7  12   3   0
gsbn:1-gs:c12-gshl:m1-gshl:wm5-gsrc:1                          6   9  12   3   0
gsbn:1-gs:c12-gshl:m2-gshl:wm4-gsrc:1                          5  11  12   3   0
=====
Top production options...
KEY                                                            F   P   C   H   T
gsbn:1-gs:c1-gshl:wm6-gsrc:1-gs:wk11                          -4  40   1   3   0
gsbn:1-gs:f1-gshl:wm6-gsrc:1-gs:wk11                          -3  40   0   3   0
gsbn:1-gs:f1-gshl:m1-gshl:wm5-gsrc:1-gs:wk11                  -4  42   0   3   0
gsbn:1-gs:c1-gshl:m1-gshl:wm5-gsrc:1-gs:wk11                  -5  42   1   3   0
gsbn:1-gshl:wm6-gsrc:1-gs:wk12                                -5  43   0   3   0
gsbn:1-gshl:m1-gshl:wm5-gsrc:1-gs:wk12                        -6  45   0   3   0
gsbn:1-gshl:m2-gshl:wm4-gsrc:1-gs:wk12                        -7  47   0   3   0
=====
Top food options...
KEY                                                            F   P   C   H   T
gs:1-gsbn:1-gs:f11-gshl:m1-gshl:wm5-gsrc:1                    17   9   0   3   0
gsbn:1-gs:c1-gs:f11-gshl:m1-gshl:wm5-gsrc:1                   17   9   1   3   0
gsbn:1-gs:f11-gshl:wm6-gsrc:1-gs:wk1                          17  10   0   3   0
gsbn:1-gs:f12-gshl:m2-gshl:wm4-gsrc:1                         17  11   0   3   0
gsbn:1-gs:c1-gs:f11-gshl:wm6-gsrc:1                           18   7   1   3   0
gsbn:1-gs:f12-gshl:m1-gshl:wm5-gsrc:1                         18   9   0   3   0
gsbn:1-gs:f12-gshl:wm6-gsrc:1                                 19   7   0   3   0
[mpenning@tsunami ~]\$[/FONT]
```
It only finds 12 cottages and the minimum production was 9H... So you can see there is still work to be done... it never found that 18 cottage option (see my comment to Refar above about not considering GrassHill cottages at the moment)... In this case, I'm not complaining... yet it does illustrate why I'm looking for a better algorithm...

My theory is that (in general) if my top 7 options are pretty close in production and food, then I probably have good coverage in that part of the envelope... if my options widely diverge (as this case did), then I probably could use more coverage there.

17. ### GatlingGunWarlord

Joined:
Feb 3, 2008
Messages:
273
Location:
Northwest USA
Well... even on Noble and Prince, I find sites that have significant health problems... at a population of 14... so food surplus is good in those cases, but the goal of the algorithm is 0 food surplus (by default). Health (and the National Park) is also why I made it consider the number of trees... by default, it considers chopping everything... but it takes an option for the minimum number of trees to preserve. When I sort the output, health bonuses from trees are added into the sort number along with food...

Production cities are an easy decision... minimize commerce... but on the other hand, I'm not quite sure I know the relative value of a hammer vs commerce for commerce cities... I just concluded that less than 15 base hammers isn't fun to play unless you can leverage Universal Sufferage. Since that is a pretty late-game option, I eliminate anything less than 15 by default... but after building those first 5 cities, it isn't uncommon to settle for a base production of somewhere between 9 or 12...

18. ### WillemDeity

Joined:
Feb 12, 2002
Messages:
7,313
Location:
Blake would probably be a better person to contact as he's the one who did the AI for BtS.

19. ### OTAKUjbskiTK421

Joined:
Mar 4, 2007
Messages:
1,511
Location:
not at my post
I've done basic production tests comparing 4, 8, 12 and 16 total generated at the expense of working tiles (maturing or not) to build +population and appropriate +% infrastructure, and the findings were pretty surprising:

8 production resulted in a significant increase in commerce over 4.
12 production resulted in a slight increase over 8.
16 production resulted in little/no increase over 12 and was even worse than 4 in some scenarios.

NOTE: the tests were done with values common to the Ancient era and were primarily Cottage-Town vs Grassland Hill Mine and Plains Hill Mine in a city with one 4F tile (i.e. FP Farm).

My conclusion is that 8 - 12 base hammers is the magic value for Commerce Cities.

Also:

I really like what you're doing here, because I highly enjoy the topic of city specialization. I eagerly look forward to your final result.

20. ### MyOtherNameEmperor

Joined:
Dec 7, 2004
Messages:
1,526
I guess I should clarify: value doesn't have to correspond to the "worth" you ascribe to the different things. I mainly think of it as a slider I can adjust between "only hammers count" and "only commerce counts". e.g. if I decided that I need a minimum of X hammers in my city, I might start at "only commerce counts", and then adjust the relative value until the optimum configuration has enough hammers.