[R&F] Use machine learning to select better city settlement locations

Kebnoa · Dec 19, 2018

FearSunn said:
If you are talking here about getting yields data it is super easy. Just loop cities at the end of each turn and simply print each yield into Lua.log in csv compatible format... Or maybe I do not quite understand your needs here.

Yep, I think you don't understand my needs. In order to process the data it needs to be in a format I can easily manipulate later. I chose to create it, in-game, as a json string and print it to the lua.log.

The 2 problems I couldn't get around are that a single row in the lua.log file craps out at just over 2000 characters, and that trying to write more than 500 rows to the lua.log craps out as well. Hence the rather convoluted LivePanel panel to extract the data.

If anyone has a better suggestion on how to capture and extract data at this scale I would love to hear from you. If anyone knows how I can get at the data without any coding or mods I would be even happier ;-) This is one of the reasons I reduced my data gathering scope.

Victoria said:
It was a long time ago and a little incorrect. The basic answer was production without food and vice versa is not good, happy balance. Mainly it comes down to growth being better until about 4 pop to a degree, strongest tile wins but initially a strong production tile like 1-4 will likely drag you down and with loyalty now important it certainly tips food initially... but it is situational and I really appreciate that feel in civ 6. Naturally amenities and housing limits change the food value later and the choice is about value of many districts.

Hi @Victoria, do you have a link to this handy? I would like to see how you approached it and see if I can redo it with the data I captured.

FWIW, I now have a database of 502 cities, all settled within the first 3 turns of the game and each city's per turn yields and tooltip strings. One thing I am thinking of trying to do is work out for example the average food, prod, gold etc per turn at turns 5, 10, 15 etc.
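A minimal sketch of how the per-checkpoint averages could be computed, assuming the captured data can be flattened into one row per city per turn. The field names (`city`, `turn`, `food`, `production`, `gold`) and the sample rows are made up for illustration and are not the actual database layout.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: one per city per turn (field names are assumptions).
records = [
    {"city": "A", "turn": 5, "food": 3.0, "production": 2.0, "gold": 5.0},
    {"city": "B", "turn": 5, "food": 5.0, "production": 4.0, "gold": 3.0},
    {"city": "A", "turn": 10, "food": 4.0, "production": 3.0, "gold": 6.0},
    {"city": "B", "turn": 10, "food": 6.0, "production": 5.0, "gold": 4.0},
    {"city": "A", "turn": 7, "food": 9.0, "production": 9.0, "gold": 9.0},
]


def average_yields_at(rows, checkpoints, yields=("food", "production", "gold")):
    """Return {turn: {yield_name: mean across cities}} for the checkpoint turns."""
    buckets = defaultdict(lambda: defaultdict(list))
    for row in rows:
        if row["turn"] in checkpoints:
            for name in yields:
                buckets[row["turn"]][name].append(row[name])
    # Turns with no rows (here turn 15) simply do not appear in the result.
    return {
        turn: {name: mean(values) for name, values in per_yield.items()}
        for turn, per_yield in sorted(buckets.items())
    }


averages = average_yields_at(records, checkpoints={5, 10, 15})
for turn, per_yield in averages.items():
    print(turn, per_yield)
```

Rows for turns that are not checkpoints (turn 7 above) are ignored, so the same function works on the full per-turn data.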

I didn't capture the build order, and regret it a little. Would have been interesting to look at what impact this has.

Arent11 · Dec 19, 2018

Kebnoa said:
What I found interesting simply from a numerical perspective is the cumulative yields and the variability and shape of the distributions. Check out the Food yields graph for example.

Food in the end is limited by housing, so the best way would maybe be to add up cumulative production, science and culture to get a rough estimate of how "productive" a city is after 50 turns. Gold could be added as 1/2 production (you can pay 4 gold for 1 production, but gold is also spent on important upgrades and is more flexible); faith could also be 1/2 production, but there might be disagreement about the "value" of faith.
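As a rough sketch, that weighting could look like the following. The function name, the dictionary keys and the sample numbers are made up for illustration; only the weights (production, science and culture at 1, gold and faith at 1/2, food left out) come from the suggestion above.

```python
def productivity_score(cumulative, gold_weight=0.5, faith_weight=0.5):
    """Combine cumulative yields into one rough 'productivity' number.

    Food is deliberately excluded, since it is limited by housing.
    The 0.5 weights are the suggested values and are open to debate.
    """
    return (
        cumulative["production"]
        + cumulative["science"]
        + cumulative["culture"]
        + gold_weight * cumulative["gold"]
        + faith_weight * cumulative["faith"]
    )


# Made-up cumulative yields for one city after 50 turns.
example_city = {
    "production": 400.0,
    "science": 150.0,
    "culture": 100.0,
    "gold": 300.0,
    "faith": 50.0,
}
score = productivity_score(example_city)
print(score)  # 400 + 150 + 100 + 150 + 25 = 825.0
```

Keeping the gold and faith weights as parameters makes it easy to test how sensitive the city ranking is to the disputed value of faith.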

Victoria said:
It was a long time ago and a little incorrect. The basic answer was production without food and vice versa is not good, happy balance. Mainly it comes down to growth being better until about 4 pop to a degree, strongest tile wins but initially a strong production tile like 1-4 will likely drag you down and with loyalty now important it certainly tips food initially... but it is situational and I really appreciate that feel in civ 6. Naturally amenities and housing limits change the food value later and the choice is about value of many districts.

You always need to start somewhere & the first steps are always strongly simplified. Nothing wrong with that.

Trav'ling Canuck · Dec 19, 2018

Arent11 said:
Food in the end is limited by housing

The value of food is also limited by the amount of amenities you have. For the purposes of this experiment, though, you can probably set the amount of available amenities to be constant and compare the science, culture, faith, gold, and production values of the various cities over time.

One benefit of lower pop cities is that they do not need as many amenities, which allows them to sell their extra luxuries for gold. As long as you're tracking the pop of the city, this extra yield can be tracked artificially to get a closer approximation of the city's true yield compared to a higher pop city.

Victoria · Dec 19, 2018

Kebnoa said:
do you have a link to this handy?

I would have to drag it out and it was inaccurate anyways so a bit embarrassed.
It was a simplified production vs food analysis with tiles from 0-4 to 4-0 being worked. So it was to clarify if just working 2-2 tiles was better than working 3-1 then 1-3 tiles.
AI is an area of potential risks ... bad in = bad out. It is not the wonder many think it is and it's not like it spits out all the answers at you at the end. Gotta get those I, Robot rules in there and Asimov showed holes in them, let alone what a programmer first - player second, codes.

It is a complex area with settling on a lux, adjacency, river placement, defensive hill, culture vs science vs faith, position of enemy, the way enemies view each other currently and likely to in the future.... and so forth

We cannot even agree on the value of things let alone everything else.

I am an analyst for a job, my boss tells me I am a very good one... there are so many variables involved in civ that it is going to be a bumpy road to any answer and any answer will be heavily criticised, so it's a thankless task if you want to take it on. Hat off to you if you can convince me your answer is right with a 90%+ success rate.

Kebnoa · Dec 19, 2018

Trav'ling Canuck said:
The value of food is also limited by the amount of amenities you have. For the purposes of this experiment, though, you can probably set the amount of available amenities to be constant and compare the science, culture, faith, gold, and production values of the various cities over time.

One benefit of lower pop cities is that they do not need as many amenities, which allows them to sell their extra luxuries for gold. As long as you're tracking the pop of the city, this extra yield can be tracked artificially to get a closer approximation of the city's true yield compared to a higher pop city.

I need to go double check it, but from working through the top and bottom yields per city I noticed that they were frequently "unhappy". Also, on average, amenities available exceeded amenities needed. A long way of saying that at Prince level the impact of city happiness is negligible?

Victoria said:
I would have to drag it out and it was inaccurate anyways so a bit embarrassed.
It was a simplified production vs food analysis with tiles from 0-4 to 4-0 being worked. So it was to clarify if just working 2-2 tiles was better than working 3-1 then 1-3 tiles.
AI is an area of potential risks ... bad in = bad out. It is not the wonder many think it is and it's not like it spits out all the answers at you at the end. Gotta get those I, Robot rules in there and Asimov showed holes in them, let alone what a programmer first - player second, codes.

It is a complex area with settling on a lux, adjacency, river placement, defensive hill, culture vs science vs faith, position of enemy, the way enemies view each other currently and likely to in the future.... and so forth

We cannot even agree on the value of things let alone everything else.

I am an analyst for a job, my boss tells me I am a very good one... there are so many variables involved in civ that it is going to be a bumpy road to any answer and any answer will be heavily criticised, so it's a thankless task if you want to take it on. Hat off to you if you can convince me your answer is right with a 90%+ success rate.

No problem from my side. More curious than anything else.

Jeje - I have no plans on creating an AI for Civilization. Far too complicated. I did enjoy creating this model though, and might tune it a little based on the feedback received so far.

Trav'ling Canuck · Dec 19, 2018

I personally think it's a very interesting exercise. I have a checklist of things that I look for when settling the first couple of cities, and I'd be curious to see how well the data aligns to my current perception.

Its most important use could be as a way to benchmark the value of different starting positions.

_Calyx · Dec 19, 2018

This is a great idea, and I applaud you for doing it!

Someone mentioned Settler production earlier as a potential issue, and I think this is important to take into account. I don't know anything about modding or how you record the data, but could you add a specific build order into your mod? That would control everything build order related; another (better) option might be to simply use the mod to disable Settler building and record what is actually built in the city.

Another issue that may need to be taken into account is tile improvements - clearly these are occurring over the turn range of your data collection, but your algorithm isn't learning about them? If you can get the data out of the game, this would be a good thing to include.

Finally, I agree with others that the total yields probably aren't the best measuring stick to use. I really like the suggestion from @Arent11 to try to convert gold and faith to production, though that still leaves you with science, culture, and most importantly food. I doubt it's possible to convert all of these to a single unit in any reliable way.

Great job though!

Victoria · Dec 19, 2018

_Calyx said:
and most importantly food

see, this is where it goes a bit wrong, a culture tile at the start is a lovely little gift while 3 are not 3 times the value because you progress too fast for the inspirations.
A faith plains tile can be lovely to settle on but not to have next to your city.
Both can be much more important than food initially... or not.

_Calyx · Dec 19, 2018

Victoria said:
see, this is where it goes a bit wrong, a culture tile at the start is a lovely little gift while 3 are not 3 times the value because you progress too fast for the inspirations.
A faith plains tile can be lovely to settle on but not to have next to your city.
Both can be much more important than food initially... or not.

Yeah, I agree - by 'most importantly food' I meant that food is probably the most difficult to compare to the other yields. Obviously some is good, but having a ton of food is much worse than a ton of most other yields - unlike the other yields, I would argue food gets less important the more you have of it (the marginal benefit is lower). Probably needs to be modeled with some type of scaling function, or maybe a step function.
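To make the two options concrete, here is a small sketch of a diminishing-returns scaling and a step function for cumulative food. The function names, the logarithmic shape, the `scale` value and the thresholds are all assumptions picked for illustration, not values derived from the game or from the data.

```python
import math


def scaled_food(food, scale=100.0):
    """Concave scaling: each extra point of food is worth a bit less than the last.

    The log shape and the scale of 100 are arbitrary illustrative choices.
    """
    return scale * math.log1p(food / scale)


def stepped_food(food, thresholds=(200.0, 400.0, 600.0)):
    """Step function: count how many (made-up) food thresholds were reached."""
    return sum(1 for threshold in thresholds if food >= threshold)


for food in (100.0, 450.0, 1000.0):
    print(food, round(scaled_food(food), 1), stepped_food(food))
```

With the scaling version, going from 0 to 100 food is worth more than going from 100 to 200, which is the "lower marginal benefit" idea; the step version only rewards crossing a threshold.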

Trav'ling Canuck · Dec 19, 2018

_Calyx said:
Yeah, I agree - by 'most importantly food' I meant that food is probably the most difficult to compare to the other yields. Obviously some is good, but having a ton of food is much worse than a ton of most other yields - unlike the other yields, I would argue food gets less important the more you have of it (the marginal benefit is lower). Probably needs to be modeled with some type of scaling function, or maybe a step function.

Or just let the raw data show what the raw data shows. Food's only good to the extent it eventually turns into one of the other yields through higher population. Tracking those other yields over time would give some evidence to the relative value of each extra source of early food.

Kebnoa · Dec 19, 2018

Trav'ling Canuck said:
I personally think it's a very interesting exercise. I have a checklist of things that I look for when settling the first couple of cities, and I'd be curious to see how well the data aligns to my current perception.

Its most important use could be as a way to benchmark the value of different starting positions.

Thank you. I'd love to see the outcome of your comparison to your own checklist.

_Calyx said:
This is a great idea, and I applaud you for doing it!

Someone mentioned Settler production earlier as a potential issue, and I think this is important to take into account. I don't know anything about modding or how you record the data, but could you add a specific build order into your mod? That would control everything build order related; another (better) option might be to simply use the mod to disable Settler building and record what is actually built in the city.

Another issue that may need to be taken into account is tile improvements - clearly these are occurring over the turn range of your data collection, but your algorithm isn't learning about them? If you can get the data out of the game, this would be a good thing to include.

Finally, I agree with others that the total yields probably aren't the best measuring stick to use. I really like the suggestion from @Arent11 to try to convert gold and faith to production, though that still leaves you with science, culture, and most importantly food. I doubt it's possible to convert all of these to a single unit in any reliable way.

Great job though!

Thank you, appreciated. I actively chose to NOT record build sequence or worker actions. It isn't too difficult to capture. My issues were around how to incorporate it into the model. At the moment the input data is static and hence easy to use in a ML model. When you start using "time series" data it becomes more complicated and I didn't want to go there. (yet)

Victoria said:
see, this is where it goes a bit wrong, a culture tile at the start is a lovely little gift while 3 are not 3 times the value because you progress too fast for the inspirations.
A faith plains tile can be lovely to settle on but not to have next to your city.
Both can be much more important than food initially... or not.

When I play I check the trade-off between time to next pop vs time to complete production by selecting different tiles. Sometimes I go food, sometimes production. I haven't tried to formalise it. I usually settle for a balance between the two.

_Calyx said:
Yeah, I agree - by 'most importantly food' I meant that food is probably the most difficult to compare to the other yields. Obviously some is good, but having a ton of food is much worse than a ton of most other yields - unlike the other yields, I would argue food gets less important the more you have of it (the marginal benefit is lower). Probably needs to be modeled with some type of scaling function, or maybe a step function.

This is exactly why I decided to use deciles and score based on yield. For example the 1098 cumulative Food has the same score as the 768.8 cumulative Production. That is, they are both in the top 10% of results and equal weighting based on their relative values. (Relative to that yield across all cities). I literally couldn't think of a more objective means of scoring cities.
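One possible reading of the decile scoring, as a sketch: rank every city within each yield, turn the rank into a score from 1 to 10, and add the scores up. The function name and the sample values are made up, apart from the two top values (1098 Food and 768.8 Production) quoted above; ties are broken by input order here, which is an assumption.

```python
def decile_scores(values):
    """Map each value to a score 1..10 according to its rank within the list."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices, lowest value first
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = rank * 10 // n + 1  # bottom 10% -> 1, top 10% -> 10
    return scores


# Made-up cumulative yields for ten cities; city 3 holds both top values.
food = [310.0, 940.0, 505.0, 1098.0, 610.0, 680.0, 420.0, 830.0, 560.0, 750.0]
production = [150.5, 610.7, 260.2, 768.8, 345.9, 400.1, 210.0, 520.3, 300.0, 455.0]

food_scores = decile_scores(food)
production_scores = decile_scores(production)
city_scores = [f + p for f, p in zip(food_scores, production_scores)]
print(food_scores[3], production_scores[3], city_scores[3])
```

Because each yield is scored only relative to the same yield in other cities, 1098 Food and 768.8 Production both earn a 10 without any conversion between the two.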

Trav'ling Canuck said:
Or just let the raw data show what the raw data shows. Food's only good to the extent it eventually turns into one of the other yields through higher population. Tracking those other yields over time would give some evidence to the relative value of each extra source of early food.

It might be interesting to check for correlations between each city's cumulative Food and Production curves to see what is visible.
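A sketch of such a check using a plain Pearson correlation. Within a single city both cumulative curves can only go up, so they will tend to correlate strongly no matter what; for that reason this example compares final cumulative totals across cities instead. The totals below are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up cumulative totals at turn 50, one entry per city.
total_food = [620.0, 840.0, 510.0, 1098.0, 700.0]
total_production = [540.0, 410.0, 600.0, 380.0, 768.8]

r = pearson(total_food, total_production)
print(round(r, 3))
```

A value near +1 would suggest food-rich cities are also production-rich, a value near -1 a trade-off between the two, and a value near 0 no linear relationship.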

Infixo · Dec 19, 2018

Kebnoa said:
Please expand a little more on your reasoning.
I was surprised to see that the AI does in fact move before settling, and relatively frequently. Sometimes it even moves twice before doing so. If you look in the raw data you can find many examples where cities are settled on the 2nd or 3rd turn of the game.

I have the same concerns as @Lily_Lancer
The AI already uses an algorithm to evaluate settle spots. It has several parameters that can be modded. It even adjusts them to the passing time.
So, the subset of cities you have is what AI already thinks is good.
Until we get DLL source code we can only play with the parameters. I would gladly see some research or analysis showing if they can be improved.

Kebnoa · Dec 19, 2018

Infixo said:
I have the same concerns as @Lily_Lancer
The AI already uses an algorithm to evaluate settle spots. It has several parameters that can be modded. It even adjusts them to the passing time.
So, the subset of cities you have is what AI already thinks is good.
Until we get DLL source code we can only play with the parameters. I would gladly see some research or analysis showing if they can be improved.

Hi Infixo,

What parameters are you referring to? Please keep in mind that I am new to modding. If you look at the logger code you'll see it isn't exactly sophisticated.

When you say showing if they can be improved, what do you mean? I am assuming you mean the AI's, but what would you want to improve? Their choice of tile to settle on? The order in which they build things? Which tiles they choose?

Maybe a better experiment would be where a couple of hundred people play the exact same map, start, leader combo. Maybe this will give us a better indication of what really matters :-)

Infixo · Dec 19, 2018

@Kebnoa
AiList PlotEvaluations:

Coastal
Cultural Pressure
Foreign Continent
Fresh Water
Inner Ring Yield
Nearest Friendly City
New Resources
Resource Class
Specific Feature
Specific Resource
Total Yield

AiList SettlementPreferences

SETTLEMENT_ADDITIONAL_VALUE_PER_CITY
SETTLEMENT_CITY_MINIMUM_VALUE
SETTLEMENT_CITY_VALUE_MULTIPLIER
SETTLEMENT_DECAY_AMOUNT
SETTLEMENT_DECAY_TURNS
SETTLEMENT_MIN_VALUE_NEEDED

Afaik, the first one (PlotEvals) is used to evaluate the potential settlement location. Since you checked only cities settled in the first 3 turns, StandardSettlePlot is mainly used (a few leaders use slightly modified params). So, basically fresh water is very important, the spot can be coastal, luxury and strategic resources are important, and iron is very important (btw, AI cheats here, same for Niter). It prefers to get new resources. It doesn't like Ice. It probably calculates Food and Production for the inner ring, but also production for the entire range. So, in the end I suppose Production is King. With RnF, it doesn't go where loyalty pressure is and prefers pressure-free places.
The second one (SettlePrefs) is used to make a decision if that spot is good enough. The second list works in a dynamic way, i.e. the longer a spot is searched for, the lower the expectations get (decay).
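Purely as an illustration of the "decay" idea, and not the game's actual logic (which is not visible without the DLL source), a decaying acceptance threshold could be sketched like this. The function names, parameter names and all numbers are hypothetical; they only loosely mirror the SETTLEMENT_* names listed above.

```python
def required_value(turns_searching, start_value=100.0, decay_amount=10.0,
                   decay_turns=5, minimum_value=50.0):
    """Guess at a decaying threshold: expectations drop the longer the search lasts.

    Every `decay_turns` turns the bar falls by `decay_amount`, but never
    below `minimum_value`. All numbers here are made up.
    """
    steps = turns_searching // decay_turns
    return max(minimum_value, start_value - decay_amount * steps)


def accept_spot(spot_value, turns_searching):
    """Accept a spot once its evaluated value clears the current threshold."""
    return spot_value >= required_value(turns_searching)


for turns in (0, 4, 10, 100):
    print(turns, required_value(turns))
print(accept_spot(85.0, 0), accept_spot(85.0, 10))
```

In this toy version a spot worth 85 is rejected on turn 0 but accepted after 10 turns of searching, which is the behaviour described: the bar drops over time until some spot is good enough.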

Arent11 · Dec 19, 2018

Infixo said:
Inner Ring Yield

So, they just sum the whole yields of the inner ring? Or is it more difficult?

Would explain why the city settlement places the ai suggests are so bad. I almost never settle in those places & almost always prefer to have at least one high yield (2/2) square nearby & settle on an additional yield (production, food, gold, culture) tile.

Infixo · Dec 19, 2018

Arent11 said:
So, they just sum the whole yields of the inner ring? Or is it more difficult?
Would explain why the city settlement places the ai suggests are so bad. I almost never settle in those places & almost always prefer to have at least one high yield (2/2) square nearby & settle on an additional yield (production, food, gold, culture) tile.

Without access to the source code we can only guess. Probably just adds yields, could be with some weight applied internally.

Kebnoa · Dec 20, 2018

Okay, so after a good night's rest I think the results of the experiment highlight a difference in people's expectations when you use the term "Machine Learning".

Those who assume Machine Learning (ML) = Artificial Intelligence (AI) are most likely disappointed by this experiment as this is not in any way, even remotely, AI territory.

What this result tells us is which tiles a machine learning classification algorithm learned to be useful to look for, based on an artificial criterion for a "good" city that I decided on.

What I found particularly interesting about the result is that NOT settling next to a river is quantifiably negative. It doesn't determine if the city will be good, but it tells you that not settling next to a river is a poor choice. (On average given the limited information provided). Likewise, the Lux seems a no brainer to me. Also, the fact that the model values GrasslandMountains is telling. You can't work a mountain tile, but it does provide adjacency bonuses. The fact that the model detected this with limited information is interesting to me.

What I would love to hear from you is whether you strongly agree with, or disagree with any of the "recommendations" based on your playing experience.

Lily_Lancer said:
As a machine learning researcher, I seriously doubt what your system has learnt. From the description on Github, I guess it just learned from how AIs settle their cities.

I used to think AIs don't move their capital, it seems that I'm wrong. But this doesn't change my result here, your approach just learned how AIs settle their capitals.

Hi @Lily_Lancer ,

I feel you are being overly pessimistic here, but I appreciate the feedback regardless, and am keen to understand your viewpoint.

The model takes in settled plot information and then tries to classify how "good" that city will be after playing 50 turns. Granted, the dataset is small, and not 100%, but that doesn't invalidate the fact that there is huge variability in yields after 50 turns. Determining the types of tiles that best determine this is more than "learn how the AI settle their capitals" :-)

Please tell me more about how you reached your conclusion.

Leyrann · Dec 20, 2018

Lily_Lancer said:
As a machine learning researcher, I seriously doubt what your system has learnt. From the description on Github, I guess it just learned from how AIs settle their cities.

I used to think AIs don't move their capital, it seems that I'm wrong. But this doesn't change my result here, your approach just learned how AIs settle their capitals.

I would say this doesn't matter. He doesn't look at where cities are settled, but rather how well those cities perform after having settled. Sure, the dataset you're looking at is skewed towards what the AI prioritizes, but if those cities consistently perform worse than a subset of where the AI chose differently, then that subset will be considered the 'better' cities by the system.

For example, AIs relatively often settle a tile away from the river, and apparently the AI doesn't properly value fresh water (I believe there is even a bug where they only check for a tile in the city having fresh water, rather than the city center tile having fresh water?). So what happens if the AI settles on the river half of the time, and a tile away the other half of the time? This system detects that the first set of cities performs consistently better (due to having an easier time growing), and therefore says "settling next to fresh water is a positive", even if the AI doesn't get that.

Arent11 · Dec 20, 2018

Kebnoa said:
Hi @Lily_Lancer ,

I feel you are being overly pessimistic here, but I appreciate the feedback regardless, and am keen to understand your viewpoint.

The model takes in settled plot information and then tries to classify how "good" that city will be after playing 50 turns. Granted, the dataset is small, and not 100%, but that doesn't invalidate the fact that there is huge variability in yields after 50 turns. Determining the types of tiles that best determine this is more than "learn how the AI settle their capitals"

Please tell me more about how you reached your conclusion.

He means that the cities were probably not being settled randomly. To compare all possible outcomes, you would need to have randomly placed cities which you compare.

Kebnoa said:
Here are the results of the model:

2 or more Grassland (Hills) with Woods tiles are great. (1 is better than none, though)

1x Luxury is great. (More than 1 isn't significant, not having one is significantly negative).

2x Stone is good. (1 is better than none)

Bonus resources are good and are significant in this order of preference:

1x Bananas tile.

1x Rice tile, more than 1 isn't significant.

2 or more Wheat tiles.

1x Deer tile

1x Fish tile

2x Plains with Woods is good.

2x Plains (Hills) with Rainforest is good.

Minimal, or no, Grassland with Woods tiles are preferable.

Plains with Rainforest are positive.

8 or more Grassland tiles are positive. (Less than this is generally negative)

4 or more Grassland (Hills) tiles are positive.

1x Coast and Lake tile is marginally positive, more than this is negative.

Minimal Plains tiles are preferable.

1 or more Grassland Mountain tiles are positive

Not settling next to a River is negative.

As comparison to my own experience/approach:

(1) to maximize city jump start, it is always good to settle on or near high yield tiles.
(2) Grassland forest hills are 2/2, which is good.
(3) Bananas, stone etc. can also result in relatively high yield tiles, which can jumpstart your cities
(4) Settling near coast might be acceptable, there is not always a river around & you might still want to colonize such an area even if growth might be slightly impaired
(5) Too much desert, plains, tundra etc. might obviously hamper your growth

Everything else can often be addressed by improving the land.

Lily_Lancer · Dec 20, 2018

Leyrann said:
I would say this doesn't matter. He doesn't look at where cities are settled, but rather how well those cities perform after having settled. Sure, the dataset you're looking at is skewed towards what the AI prioritizes, but if those cities consistently perform worse than a subset of where the AI chose differently, then that subset will be considered the 'better' cities by the system.

For example, AIs relatively often settle a tile away from the river, and apparently the AI doesn't properly value fresh water (I believe there is even a bug where they only check for a tile in the city having fresh water, rather than the city center tile having fresh water?). So what happens if the AI settles on the river half of the time, and a tile away the other half of the time? This system detects that the first set of cities performs consistently better (due to having an easier time growing), and therefore says "settling next to fresh water is a positive", even if the AI doesn't get that.

I mean the dataset is biased to only those locations that the AI prefers. So it is only learning "how the AI performs on AI-preferred starting locations", which makes no sense for human players, since in Civ 6 the AI basically plays at a terribly low level.
