play balancing the civs

davidlallen · Dec 6, 2009

That is fine, assuming you mean "industrious" for "scientific". We did not add scientific. That leaves aggressive and spiritual tied for the lead with 5 LH each, all the others at 3-4 each, and Mechanized at 2.

EDIT: Oh, wait. That would make Goya's traits same as Prad Vidal. Not fatal, but it should be possible to avoid.

Ahriman · Dec 6, 2009

Here's the Excel sheet with updated leader values for the 3 new leaders.

I tweaked a few of the existing values, so it's probably better to read in *all* of the values than just the new ones. Also, I decided that Vernius fit the existing Ix values better than Malky, and that Goya fit the existing Tleilaxu values better than Scytale.

Here is a list of adjectives that help describe some of the more extreme behaviours:

*edit*
Values changed twice (see post below)

Ahriman · Dec 6, 2009

That is fine, assuming you mean "industrious" for "scientific". We did not add scientific.

Why not? It seems appropriate to me.

Oh, wait. That would make Goya's traits same as Prad Vidal

Prad Vidal should probably be aggressive/industrious.

davidlallen · Dec 6, 2009

Ahriman said:
I tweaked a few of the existing values, so it's probably better to read in *all* of the values than just the new ones. Also, I decided that Vernius fit the existing Ix values better than Malky, and that Goya fit the existing Tleilaxu values better than Scytale.

Great, thanks. I will shove these into the file.

Why not? It seems appropriate to me.

That is nice. Adding a trait surprisingly requires changing 8-10 files, and the incremental benefit seems small. I will put it on the to-do list (but not near the top.)

Prad Vidal should probably be aggressive/industrious.

I find it difficult to adjust one at a time this way. Perhaps the count of leaders per trait is not important to you, but this would make 6 aggressive (the largest number of any trait) and only two protective (the smallest number of any trait except for Ix Mechanized).

Ahriman · Dec 6, 2009

Adding a trait surprisingly requires changing 8-10 files, and the incremental benefit seems small. I will put it on the to-do list (but not near the top.)

Ok, I didn't realize it was so complex.

Can we change creative to Political though? Just the name-change?

I find it difficult to adjust one at a time this way

Ok, I am relatively indifferent.
I also don't see it as a big problem for two leaders of different factions to have the same traits. In vanilla, that's a problem, because the trait combinations are really all that distinguish the leaders, but here we have more faction variation.

Ahriman · Dec 7, 2009

Just realized, serious error in the sheet.

I had misinterpreted iMaxWarRand.
"A factor that affects how likely an AI leader will start a "total" war.
(50 very likely; 400 unlikely)"

I had thought that a higher number made a total war more likely.
So the values are all the wrong way round.
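To see why a lower iMaxWarRand means a more warlike leader, here is a minimal Python sketch. It assumes, as a simplification, that the value acts as the N in a 1-in-N random roll made each turn; the real AI code applies further modifiers, so the numbers are only a relative guide. The function names are hypothetical.

```python
# Simplified model (assumption): each turn the AI makes a 1-in-N roll,
# where N is the leader's iMaxWarRand. The real AI applies extra modifiers,
# so compare leaders relative to each other rather than reading absolutes.


def war_chance_per_turn(max_war_rand: int) -> float:
    """Probability that the roll succeeds on a single turn."""
    return 1.0 / max_war_rand


def war_chance_over(max_war_rand: int, turns: int) -> float:
    """Probability that the roll succeeds at least once in `turns` turns."""
    return 1.0 - (1.0 - war_chance_per_turn(max_war_rand)) ** turns


# Lower iMaxWarRand (e.g. Rabban at 50) gives a higher chance than 400 (Leto).
for n in (50, 100, 200, 400):
    print(f"iMaxWarRand={n}: {war_chance_over(n, 100):.0%} over 100 turns")
```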

So, it should be:
Alia 100
Armand 200
Vladimir 100
Rabban 50
Leto 400
Shaddam 50
Vernius 200
Feyd-Rautha 100
Gaius Moiham 400
Margot 400
Leto II 200
Liet-Kynes 200
Goya Solidar 150
Paul 100
Irulan 200
Wensiscia/Elrood 150
Roma 200
Stilgar 100
Executrix 150

Malky 200
Prad 100
Scytale 200

(Version above corrected)

*edit*
Similarly for iLimitedWarRand, and for iDogpileWarRand.
Version above changed thrice.

davidlallen · Dec 9, 2009

We have discussed creating a 9x1 spreadsheet showing the relative power of each civ. I did an experiment to test out one possible method. If the method is useful, we can start to get some actual statistics about whether one civ is weaker or stronger.

Here is what I did.

1. Take a three player map, standard mapscript, normal speed, noble difficulty and save it as a WorldBuilder file.

2. Make three copies of the map. Each copy has the same map and start locations, but each civ starts in a different location. Suppose the start locations are A, B, C and the civs are X, Y, Z. For game 1, civ X starts at A, civ Y starts at B, and civ Z starts at C. For game 2, civ X starts at B, civ Y starts at C, and civ Z starts at A. There are 3! = 6 possible arrangements; I picked three.

3. Autoplay each of the three games for 450 turns, printing the total score of each civ into a file once per 10 turns. As it happens, two of the games were won by diplomatic victory and one completed with no victor.

4. Plot each of the nine score lines on a graph. You can see the two winners where the score graph sticks "out of the top" of the other graphs and stops.

5. Since one game finished around turn 355, the last time at which I have scores for all three games is turn 350. I computed the average score for each *start position* at time 350. One of the start positions consistently had a higher score, regardless of which player started there; I did not look into the map, but presumably that start location was a little richer. So I decided to use a handicap factor and decrease the scores of the player in that position to 0.66 of the actual value.

6. Each civ had three scores, so I applied the handicap factor and then computed each civ's average score. The raw data, graph and computations are in the attached Excel file.
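The rotation in step 2 is a cyclic Latin square: in game g, civ i starts at position (i + g) mod n. That reproduces games 1 and 2 as described and extends directly to 9 civs and 9 games. A minimal Python sketch (the function name is hypothetical; generating the actual WorldBuilder files is not shown):

```python
def rotation_schedule(civs, positions):
    """Cyclic (Latin-square) start assignment.

    In game g, civ i starts at position (i + g) mod n, so over n games
    every civ uses every start position exactly once."""
    n = len(civs)
    if len(positions) != n:
        raise ValueError("need exactly one start position per civ")
    return [
        {civ: positions[(i + g) % n] for i, civ in enumerate(civs)}
        for g in range(n)
    ]


games = rotation_schedule(["X", "Y", "Z"], ["A", "B", "C"])
for number, assignment in enumerate(games, start=1):
    print(f"game {number}: {assignment}")
# game 1: {'X': 'A', 'Y': 'B', 'Z': 'C'}
# game 2: {'X': 'B', 'Y': 'C', 'Z': 'A'}
# game 3: {'X': 'C', 'Y': 'A', 'Z': 'B'}
```

With three civs this uses 3 of the 6 possible arrangements. Every civ tries every start position once, but the cyclic order of civs around the positions never changes, so depending on the map the same civs may keep starting next to each other.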

The result:
Atreides: 7858
Bene Tleilaxu: 6408
Bene Gesserit: 4944

If this method has any validity, the conclusion is that BG is a little underpowered and Atreides is a little overpowered.

I chose three, simply because it was a small size experiment and I could manually create the files in step 2. If the method seems interesting, I can expand it to all 9 civs and 9 games. I will have to write scripts to generate the maps and extract the results. Running the games will take a few hours but I have a spare laptop I can use as a small server farm.

Any questions or feedback on the method? I don't have a good way to compensate for games which are won at different turns, I can only pick the min turn. I don't have a good way to compensate for different traits either.
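Picking the min turn can be written down explicitly. Assuming scores are logged every 10 turns starting from turn 0, the latest turn with data for every game is the earliest game end rounded down to the logging interval. A small Python sketch (hypothetical helper name):

```python
def last_common_turn(final_turns, interval=10):
    """Latest logged turn for which every game still has score data.

    Assumes scores are written every `interval` turns starting at turn 0,
    so the result is the largest multiple of `interval` that is not after
    the earliest game end."""
    earliest_end = min(final_turns)
    return (earliest_end // interval) * interval


print(last_common_turn([355, 450, 450]))  # 350
```

This gives 350 for the three-player test above (one game ended around turn 355), and a win at turn 318 gives 310.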

Ahriman · Dec 9, 2009

A very interesting idea.

My thoughts:

a) AI auto-testing is good, but is only part of the story. Balance for the human player is not the same as balance for the AI; the human player can exploit some mechanics much more than the AI can (and a few go the other way; the human player suffers more against Tleilaxu plague than they benefit from the plague as a human playing as Tleilaxu).
But BG spy powers, espionage, and diplomatic manipulation will always be more powerful in the human player's hands and weaker in the AI's. So will the BG Kwisatz Haderach units; the human is better at knowing to accompany super-stacks.

b) Which leaders did you use? Trait balance is very important here. Financial trait is incredibly powerful if you're using a version where windtraps still give +2 commerce from game start.

c) Obviously Tleilaxu are much more powerful in a 3-player setting than in a ~9-player setting. In a 3-player setting, their diplomatic penalties from the religion are unimportant. Similarly, the diplomatic penalties from terraforming (everyone except Atreides and Fremen tend to hate you) are lower.
A diplomatic win with three players is obviously much easier: if one other player votes for you, you win.

d) Can you use a wider color palette for the graphs? It's very hard to see which is which in some cases (e.g. C-BG vs C-BT).

e) Score is not always a great metric of power, because it is so population based. In our mod, a Spice-civ and a terraforming civ might have the same economy size, but the terraforming civ would have a much higher power rating, because it would have more citizens, because of the higher water income. Citizens count for score, spice resources don't.
This is one of the main reasons why Atreides rates high on the graph.

It's an interesting method, but very limited. I don't think it really tells us anything about balance that we couldn't learn better through experience playing games as a human, or introspective analysis. And I don't think it lets us conclude that Atreides is overpowered and Bene Gesserit are underpowered. All it really tells us is that financial trait (potentially) and terraforming are very powerful in a 3-player environment, and that the AI isn't very good at using espionage.

* * *
Adding a spice silo building (=+0.15 gold per spice resource, cheap building that requires arrakis spice civic and spice industry tech) might also help. Or maybe +0.2 gold per spice resource.

davidlallen · Dec 9, 2009

Ahriman said:
a) AI auto-testing is good, but is only part of the story. Balance for the human player is not the same as balance for the AI

Yes, but on the other hand, feedback from human players is subjective and cannot be guaranteed to come in sufficient volume to make the best decisions. For example, have you played a full game as each of the 9 civs? And when 1.7 comes out will you play another such 9 games?

b) Which leaders did you use? Trait balance is very important here. Financial trait is incredibly powerful if you're using a version where windtraps still give +2 commerce from game start.

I'm using a version where windtraps do not give commerce. But in general I agree with your point about traits. I was trying to limit it to 9 games by overlooking the contribution from traits. For some civs, the difference between one LH and another may be small. For example, all three Harkonnen leaders are aggressive, and both Ix leaders are Mechanized. Can you suggest a way to be independent of traits, or a separate experiment to focus on traits?

c) Obviously Tleilaxu are much more powerful in a 3-player setting than in a ~9 player setting.

Do not focus too much on the selection of 3 players. This was just to prove out the method. One could consider doing the experiment separately for each different number of players. For example, a set of 2-player games would give some information about the 9x9 matrix which we have discussed. My plan was to only do the experiment on a 9-player game, so each civ is represented once.

d) Can you use a wider color palette for the graphs? It's very hard to see which is which in some cases (e.g. C-BG vs C-BT).

All the data is there in the spreadsheet, and the graph is on a separate tab. If you view the graph in the spreadsheet, the hover help for each line tells you which is which.

Ahriman · Dec 9, 2009

Yes, but on the other hand, feedback from human players is subjective and cannot be guaranteed to come in sufficient volume to make the best decisions.

Agreed. Hence, a mix of both is needed.

Can you suggest a way to be independent of traits, or a separate experiment to focus on traits?

Create dummy leaders for the factions who have no traits. Create a dummy faction that has no UUs or UBs or unique mechanics, but has leaders with different trait combinations.

Do not focus too much on the selection of 3 players. This was just to prove out the method.

Ok. Then, similarly, do not focus too much on the balance results from the 3-player method; they could be quite different from the full 9-player results.

I think this method would be a useful test for a 9-player game, with some of the caveats about score as a measurement device above.

All the data is there in the spreadsheet, and the graph is on a separate tab. If you view the graph in the spreadsheet, the hover help for each line tells you which is which.

Yes, but something weird happened to your colors in going from the graph in the Excel sheet to the image you posted - several colors turned to shades of brown. It would be nice to have the forum picture be easy to read, so results are more readily apparent to casual observers.

davidlallen · Dec 9, 2009

Ahriman said:
Create dummy leaders for the factions who have no traits.

I don't think it is quite that simple. For example, Ix has the Mechanized trait, which is actually critical to enable promotions on their vehicles. You have not seen that yet because it is part of 1.7. Thinking a little further, once the 9 games are run, I could do another run of 9 after replacing one leader with another. Then any *difference* in their score would be attributed to the different traits. Unfortunately if I change two civs' leaders and one civ's score changes, it could actually be due to the *other* leader. For example, if my neighbor did not have the aggressive trait, and now he does, my score may go down because he attacks me more effectively. Sigh. Doing 9 runs, then swapping in each of the other leaders one at a time and doing another 9 each time, requires 9 * (22 - 9) = 117 runs, which is too many. Perhaps I will just try to pick out the LH which "seems" to have more advantage, and then not worry about the effect of changing traits.

Then, similarly, do not focus too much on the balance results from the 3-player method; they could be quite different from the full 9-player results.

Agreed. This is just an experiment to show what information could be derived.

I think this method would be a useful test for a 9-player game, with some of the caveats about score as a measurement device above.

Can you suggest a different metric besides score? I also have statistics on total commerce and total population. But score is easy for players to understand.

Ahriman · Dec 9, 2009

I don't think it is quite that simple

Ix is the only faction where this is an issue, and even then the main benefit of Mechanized is for the base stat-boost on their vehicles. Having leaders with no traits is the easiest way to test the mechanics without interference from the traits.

The way I would test it isn't to run things adding and removing leader traits, which has lots of complex interactions, as you suggest.
The way I would do it is to run things with a new dummy leader for *every* faction, who has no traits.
If you're worried about Ix, then leave the Ix leader with Mechanized, but change the Mechanized promotion to be a placeholder for the promotions, without the strength/withdraw chance.

Can you suggest a different metric besides score? I also have statistics on total commerce and total population. But score is easy for players to understand.

Can you record several metrics? Surely the main time constraint is running the simulations.
Total commerce can be useful, but it will undercount spice benefits again if these benefits give +gold, rather than +commerce.
So, score, total commerce, total gold+beakers, population would all be useful. But that's a lot of data.
So, score and gold+beakers would probably be most useful.
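Recording several metrics costs little if they all go into one file per run. A minimal Python sketch using the standard csv module; the column names, and how each value would be read from the game for each civ, are assumptions, since the mod's own logging code is not shown in the thread:

```python
import csv
import io

# Illustrative column names (assumptions, not the mod's actual log format).
FIELDS = ["turn", "civ", "score", "commerce", "gold", "beakers",
          "population", "gold_plus_beakers"]


def log_turn(writer, turn, metrics_by_civ, interval=10):
    """Write one CSV row per civ on every `interval`-th turn.

    `metrics_by_civ` maps a civ name to a dict with the keys score,
    commerce, gold, beakers and population (hypothetical inputs that
    would be collected from the game each turn)."""
    if turn % interval != 0:
        return
    for civ, metrics in metrics_by_civ.items():
        row = {"turn": turn, "civ": civ, **metrics}
        # Derived metric suggested above: total gold + beaker output.
        row["gold_plus_beakers"] = metrics["gold"] + metrics["beakers"]
        writer.writerow(row)


# Usage with dummy values, writing to memory instead of a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
log_turn(writer, 10, {"X": {"score": 1, "commerce": 2, "gold": 3,
                            "beakers": 4, "population": 5}})
print(buffer.getvalue())
```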

Another thing to note: AI personalities differ. Some AI strategies are more or less effective than others. Some AI leaders are builders, others are warmongers. So this is another variable that affects AI play, but doesn't affect human-controlled faction power. Does being easily bought into a war make you stronger or weaker? How about being highly aggressive, and willing to attack even factions with higher power ratings? How about diverting resources towards espionage or wonder construction?

davidlallen · Dec 9, 2009

Ahriman said:
Can you record several metrics? ... So, score, total commerce, total gold+beakers, population would all be useful. But that's a lot of data. So, score and gold+beakers would probably be most useful.

I can record any number of metrics, but then I will just be overwhelming myself with data. When you say "gold+beakers", what do you mean? There is one total commerce count; then you have the sliders to allocate this among gold, research, culture and espionage; then those are multiplied by some building multipliers.

I suppose that multiplication must happen at a city-by-city level. So if I have a total commerce of 100, and 50% is going to research, and I have a building in one city which gives +50% research, I am not sure what number is expected for total beakers. Somewhere between 50 and 75, but less than 75 unless all the commerce is in that one city. Do you see what I mean? I am not sure where to even see the real total beakers in the GUI.

Another thing to note: AI personalities differ. Some AI strategies are more or less effective than others. Some AI leaders are builders, others are warmongers. So this is another variable that affects AI play, but doesn't affect human-controlled faction power.

That is a good point, and another argument against making fake leaders with no traits. In order to do anything, we have to give up on studying several variables. I have already given up on game size (9-player only), speed (normal only) and handicap (noble only), all of which will undoubtedly affect the results. I will pick one real LH for each civ to conduct the experiment and give up on studying traits as well.

Ahriman · Dec 9, 2009

I can record any number of metrics, but then I will just be overwhelming myself with data. When you say "gold+beakers", what do you mean? There is one total commerce count; then you have the sliders to allocate this among gold, research, culture and espionage; then those are multiplied by some building multipliers.

I literally mean the number of beaker output per turn + the number of gold output per turn (after all the building multipliers and such) for the entire civ as a whole.

Yes, commerce goes into gold, beakers and potentially culture/espionage.
HOWEVER: not all gold and beakers come from commerce. Gold from a new house spice firm (+3 gold per spice resource), or from a specialist economy (scientist + merchant give gold and beakers, not commerce) are not recorded in total commerce.

So, an Arrakis spice user running a specialist economy will have large amounts of gold and beaker income that is not coming from commerce.

If all you do is compare commerce scores of different factions, then you risk building an inaccurate picture of economic strength; an Arrakis paradise user running a cottage economy will seem much more powerful than an Arrakis spice user with a specialist economy, even if their economic output levels of beakers and gold are similar.

Hence, total beakers + total gold is probably a more useful measure than total commerce.

I suppose that multiplication must happen at a city by city level. So if I have a total commerce of 100, and 50% is going to research, and I have a building in one city which gives +50% research, I am not sure what number is expected for total beakers. Somewhere between 50 and 75, but not 75.

Yes, but just report the aggregates.
Suppose you have 2 cities, A and B. City A has +50% beakers, +0% gold (from buildings). City B has +25% gold, +0% beakers. City A has 60 commerce, city B has 40 commerce. The slider is set to 50% beakers, 50% gold.
Assume NO beaker/gold income from anywhere else.
Then:
City A has 45 beakers, 30 gold.
City B has 20 beakers, 25 gold.
Faction has 65 beakers, 55 gold. So beakers + gold = 120, while commerce = 100.

Now suppose that cities C and D for a different faction have the same +50/0 and +0/25 buildings. Assume city C has 45 commerce, and city D has 30 commerce. But assume city C has 4 scientists = 12 beakers from specialists. And city D has 3 merchants = 9 gold from specialists.
Then:
City C has 51.75 beakers and 22.5 gold
City D has 15 beakers and 30 gold.
Faction has 66.75 beakers and 52.5 gold. So beakers + gold = 119.25, while commerce = 75.

So the A/B faction has 100 commerce vs 75 commerce from faction C/D, but their actual economic strength is only slightly higher (120 vs 119).
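The worked example above can be checked with a short Python sketch. It follows the order of operations used in the example (slider split, then flat specialist yields, then the city's percentage building modifiers); whether the game applies them in exactly this order is an assumption, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class City:
    commerce: float
    beaker_bonus: float = 0.0        # e.g. 0.5 for +50% beakers from buildings
    gold_bonus: float = 0.0          # e.g. 0.25 for +25% gold from buildings
    specialist_beakers: float = 0.0  # flat beakers from scientists
    specialist_gold: float = 0.0     # flat gold from merchants


def city_output(city, science_share=0.5):
    """Split commerce by the slider, add specialist yields, then apply
    the city's percentage modifiers (the order used in the example)."""
    beakers = (city.commerce * science_share + city.specialist_beakers) * (1 + city.beaker_bonus)
    gold = (city.commerce * (1 - science_share) + city.specialist_gold) * (1 + city.gold_bonus)
    return beakers, gold


def faction_totals(cities, science_share=0.5):
    """Return (total beakers, total gold, total commerce) for a faction."""
    outputs = [city_output(c, science_share) for c in cities]
    beakers = sum(b for b, _ in outputs)
    gold = sum(g for _, g in outputs)
    commerce = sum(c.commerce for c in cities)
    return beakers, gold, commerce


ab = faction_totals([City(60, beaker_bonus=0.5), City(40, gold_bonus=0.25)])
cd = faction_totals([
    City(45, beaker_bonus=0.5, specialist_beakers=12),
    City(30, gold_bonus=0.25, specialist_gold=9),
])
print(ab)  # (65.0, 55.0, 100)
print(cd)  # (66.75, 52.5, 75)
```

Faction A/B gets 120 gold plus beakers from 100 commerce, while C/D gets 119.25 from only 75 commerce.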

I am not sure where to even see the real total beakers in the GUI.

Aren't total beakers and total gold income reported in the economy screen?
Total beakers are certainly reported next to the slider.
To demonstrate this; create a bunch of scientist specialists. Then move your science slider all the way to 0%, so 0% of your commerce is transformed into beakers. Note that your beaker income is still positive, because of the scientists.

That is a good point, and another argument against making fake leaders with no traits

How is it an argument against making no-trait leaders?
You could easily create a dummy leader who had average values for every AI parameter, and had no traits.

Having said all this, I think just picking 9 leaders, and normal mapsize, epic speed (I think it works better than normal; the mod doesn't play well on normal IMO, since it takes too long to move armies around) and Noble difficulty and recording a few variables will give us most of what we want.
We just need to keep traits and AI behavior in mind as background behavior.

davidlallen · Dec 13, 2009

Using the method in this post and the final version of 1.7, I did one complete set of AI runs on a 9-player map. I've attached two spreadsheets to show the result. The bottom line is, based on score, Ecaz is a little too powerful, and Fremen is a little too weak. Also, based on the score by position, the mapscript should try harder to make sure everybody starts about the same distance from the center. Positions "too close" to the center consistently did better.

First, look at manyleaders-posn.xls. This shows the scores at turn 310 for all 9 games. Because one player won a diplomatic victory at turn 318, this is the latest turn for which I have data on all the games. The map inside the sheet shows the locations; sorry if the text is a little hard to read. The graph has one line for each start position, including 9 games and the average. The key is row L, which shows the position handicap value I derived. If a position always had higher scores, that means it is a good start position and we should decrease the scores of players at that position to normalize. So a handicap value of 1.48 (the highest) is the worst start position, and a handicap value of 0.71 (the lowest) is the best start position. Theoretically, multiplying the scores by these values should remove the effect of a better start position.

Second, look at manyleaders-score.xls. This summarizes the 9 games. I thought it would be a little better to get data from later in the game, so I excluded the one game which was won "early", and then stopped at the point just before the second game was won. So I chose turn 470 for this analysis. The yellow highlighted column shows the key statistic. After normalizing by the handicap value for start position from sheet 1, this column is the average score of the civ across all the games.
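The handicap and normalization steps can be sketched in Python as follows. The exact formula behind the handicap values in the spreadsheet is not given in the post, so this assumes handicap = overall mean score / mean score at that start position, which gives values below 1 for strong positions and above 1 for weak ones, as described above. Function names and the toy numbers are hypothetical.

```python
from statistics import mean


def position_handicaps(games):
    """games: list of dicts mapping civ -> (start_position, score).

    Assumed formula: handicap = overall mean score / mean score at that
    position, so positions that score above average get a factor below 1."""
    by_position = {}
    for game in games:
        for position, score in game.values():
            by_position.setdefault(position, []).append(score)
    overall = mean(s for scores in by_position.values() for s in scores)
    return {p: overall / mean(scores) for p, scores in by_position.items()}


def normalized_civ_scores(games, handicaps):
    """Average each civ's handicap-adjusted score across all games."""
    adjusted = {}
    for game in games:
        for civ, (position, score) in game.items():
            adjusted.setdefault(civ, []).append(score * handicaps[position])
    return {civ: mean(values) for civ, values in adjusted.items()}


# Toy data: position A is strong, B is weak, and the civs swap positions.
toy_games = [
    {"X": ("A", 300), "Y": ("B", 100)},
    {"X": ("B", 100), "Y": ("A", 300)},
]
handicaps = position_handicaps(toy_games)
print(handicaps)
print(normalized_civ_scores(toy_games, handicaps))
```

In the toy data both civs end up with the same normalized score, because their raw differences came only from the start positions.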

Since Ecaz is much higher than all the others, I conclude Ecaz is too strong. Since Fremen is much lower than all the others, I conclude Fremen is too weak. This is probably because the AI is not using their desert movement well enough, but it could be because they are no longer allowed to build vehicles. I am a little surprised that Corrino did well, since they have no specific UU or abilities. I am a little surprised that Tleilax did poorly, since players report that fighting against the plague is very annoying.

I can put up the final save games if anybody is interested in looking; you will only be able to use version 1.7 to view them.

Ahriman · Dec 13, 2009

Some very nice work here.

My comments:
1. Faction design is not completed or fully implemented, so trying to do faction balance at this stage is somewhat premature.
2. If we did want to weaken Ecaz, the best way would probably be to revert back to the standard Mushtamal (i.e. remove the Sculptor's Garden) and have the +1 trade route on a later building UU. After some testing, I agree with Sylvnn that the extra trade route on the Mushtamal building comes too soon.
3. Fremen issue is mostly AI-related.
4. I suspect that the near-center vs far-from-center issue has to do with the AI's tendency to rush to colonize the pole. Quite often, *every* AI faction rushes to try to colonize the pole, but only the near ones succeed. This means that for the non-close factions, an early settler spends many wasted turns in a scout thopter going for the pole but failing to get there, when it could have been a city somewhere growing instead.
While trying to get the mapscript to place more equal distance to center is one solution, this could significantly reduce the value of the start places (the more constraints you add, the harder it is to optimize). I would think a better solution might be improved city selection algorithm that places a higher cost on distance, so factions on the outside don't try to colonize the pole.
Changing polar from 2w to 1w1h might help with this too, since there would no longer be quite as much a rush for the pole (water counts more than hammers).
Also, factions on the "outside" often start on small junk islands - look at start position 0 here.
I'm also finding that 24% land seems to be leading to slightly better maps than 22% - maybe we should consider changing the default?
Also, maybe we should have the mapscript delete any islands that are size ~3 or less, since they're pretty useless.

5. Corrino strength may come from the preference for Imperial religion, so they try to found it, which can help them get a diplomatic victory.
Tleilaxu do poorly I suspect because the diplomacy penalty really does add up a great deal.
I wonder if the AI parameters also seem to work. Rabban's extreme aggression seems valuable, and I wonder if Ecaz's trade-with-anyone is also valuable, whereas Bene Gesserit relative pacifism does poorly.
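For the suggestion in point 4 of weighting distance more heavily in city site selection, one simple way to express it is to discount a site's value by the number of turns the settler needs to reach it. This is a hypothetical Python sketch with an assumed per-turn discount factor, not the AI's actual site evaluation code:

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    value: float       # how good the spot is once founded (hypothetical units)
    travel_turns: int  # turns for the settler to get there


def discounted_value(site, discount=0.9):
    """Treat the site as paying off only after the settler arrives,
    discounting geometrically per turn of travel (an assumed model)."""
    return site.value * discount ** site.travel_turns


def best_site(sites, discount=0.9):
    """Pick the site with the highest distance-discounted value."""
    return max(sites, key=lambda s: discounted_value(s, discount))


far_pole = Site("pole", 100, 12)
nearby = Site("nearby", 50, 2)
print(best_site([far_pole, nearby]).name)
```

With a 0.9 discount, a site worth 100 that is 12 turns away scores about 28, so a nearer site worth 50 at 2 turns (40.5) wins; a faction only 3 turns from the same 100-point site (about 73) would still go for it.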

davidlallen · Dec 13, 2009

Ahriman said:
1. Faction design is not completed or fully implemented, so trying to do faction balance at this stage is somewhat premature.

I think this experiment is one *input* to faction balancing, which is the point of this particular thread. For example, if I can fix the problem with Fremen AI not attacking on water, and nerf the Sculptor's Garden, then re-run, we can see if the min and max get closer.

I'm also finding that 24% land seems to be leading to slightly better maps than 22% - maybe we should consider changing the default? Also, maybe we should have the mapscript delete any islands that are size ~3 or less, since they're pretty useless.

"Better" may be subjective. Better how? Also, how would the game change if we deleted the size 3 islands?

Ahriman · Dec 13, 2009

I think this experiment is one *input* to faction balancing, which is the point of this particular thread.

This is a fair point, and I agree. It can help guide us, and evaluate the impact of changes.

"Better" may be subjective. Better how?

These are of course subjective.

More decent city spots, fewer useless city spots or islands that are too small to support a city, more connected land that is contested, cultural contesting actually important, more space for resources, larger graben/saltpan "sinks" that actually look like sinks (rather than little 2-3 tile channels).

Also, how would the game change if we deleted the size 3 islands?

Fewer junk city spots.
More aesthetically pleasing look; I think the small 1 and 2 tile islands look bad.
Remove worrying about the AI issue of unloading on small islands.
Less human exploit of building forts on the small islands (just inside culture) to expand culture range and capture more spice.
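Deleting islands of size ~3 or less amounts to finding connected land regions with a flood fill and converting the small ones to water. A minimal Python sketch on a character grid; it treats diagonal tiles as connected and ignores map wrapping, both assumptions, and a real mapscript would work on its own plot data rather than strings (function name hypothetical):

```python
from collections import deque

LAND, WATER = "#", "."


def remove_small_islands(grid, max_size=3):
    """Turn every land mass of at most `max_size` tiles into water.

    `grid` is a list of equal-length strings. Diagonal neighbours count
    as connected and the map does not wrap (both assumptions)."""
    rows, cols = len(grid), len(grid[0])
    cells = [list(row) for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if cells[r][c] != LAND or seen[r][c]:
                continue
            # Breadth-first flood fill to collect this island's tiles.
            island, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                island.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx] and cells[ny][nx] == LAND):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(island) <= max_size:
                for y, x in island:
                    cells[y][x] = WATER
    return ["".join(row) for row in cells]


for line in remove_small_islands(["##....#", "##.....", "....###", "....###"]):
    print(line)
```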

davidlallen · Dec 13, 2009

OK, I agree with these points on the mapscript. I am not making any more changes to 1.7 apart from updating with deliverator's patch 2, but we can put these on the list for 1.7.1.

Ahriman · Dec 13, 2009

OK, I agree with these points on the mapscript. I am not making any more changes to 1.7 apart from updating with deliverator's patch 2, but we can put these on the list for 1.7.1.

Fine with me, these are non-urgent. Thanks.
