Hammers: Advantage Irgy, since he has a Temple in Delhi while mine is still 4 turns away.
Delhi's population: Advantage Mitch by 1 pop, which should lead to some advantage down the road.
Here you are discussing what is probably the most complicated calculation of the two approaches.
Knowing the exact differences of Basic Inputs will also likely help us to decide whether to whip the Confucian Temple or to build it manually.
Probably the "fairest" way to compare is to play two identical games. So, we'd want one person to save a test game at the point where the Temple would be whipped. Then, starting from that saved game, we would play out two scenarios and compare them against each other:
a) Don't whip
VS
b) Whip
Now, that part sounds straightforward, but the real question becomes: for how long do we continue to compare the two scenarios?
Well, let's break things down.
In scenario b), you trade Food for Hammers. However, you also give up the opportunity to have those citizens work more squares or run more Specialists for Great People Points.
In scenario a), you may or may not run into a Happiness cap, which may or may not cause you to work different squares in exchange for intentionally delaying growth.
If you are lucky enough to be near a Happiness cap, the scenarios are more easily comparable, since both will eventually reach the same City Size.
If not, though, then the effect of not whipping but instead getting Food, Hammers, Commerce, and Great People Points from the extra citizen or citizens can drag on for a really long time.
I suppose that you could say something like:
Well, if we whip at the ideal time in scenario b), there will be one turn where we're behind by two citizens and then for a while we'll only be behind by one citizen. However, if you still have Happiness available to grow into with scenario a), you'll soon be back to two citizens ahead of scenario b).
The extra 1 Commerce per turn from working a River square, plus the extra Food and Hammers from working that square, can be totalled up using a spreadsheet. Such a square might even bring in more than 1 Commerce (by working a Cottage), or the citizen might instead bring in GPP by running a Specialist.
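If you do go the spreadsheet route, the tally itself is simple addition. Here is a minimal Python sketch of it, not anything from an existing tool: the function name `tally` and all the per-turn yields are hypothetical, and the yields assume an unimproved grassland River square (2 Food, 1 Commerce) and a Scientist worth 3 Flasks and 3 GPP, so check the real values in your own game.

```python
from collections import Counter


def tally(per_turn_yields):
    """Add up the yields of one extra worked square or Specialist, turn by turn.

    Each element is a dict such as {"food": 2, "commerce": 1}; a key that is
    missing on a given turn simply contributes nothing for that turn.
    """
    totals = Counter()
    for turn_yield in per_turn_yields:
        totals.update(turn_yield)  # adds each value to its running total
    return dict(totals)


# Assumed yields (hypothetical, verify in-game before relying on them)
river_square = {"food": 2, "hammers": 0, "commerce": 1}
scientist = {"flasks": 3, "gpp": 3}

# Hypothetical history of the extra citizen in scenario a):
# 5 turns on the River square, then 3 turns as a Scientist
turns = [river_square] * 5 + [scientist] * 3
print(tally(turns))
# {'food': 10, 'hammers': 0, 'commerce': 5, 'flasks': 9, 'gpp': 9}
```

Swapping in a Cottage or a different Specialist is just a matter of changing the dicts for the turns in question.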
Further complicating things are Maintenance Costs that may or may not be reduced by having a smaller City (generally, Maintenance in your capital is pretty small relative to other Cities in your empire, but it could still play a factor). Also, the extra Commerce in scenario a) (the non-whipping scenario) may or may not trigger Science bonuses that give us an additional Flask, either from the Library's bonus or from the tech's Science bonus (such as the 20% bonus to Science for knowing a tech's prerequisite). To keep these items from complicating things, simply play from the same saved game and keep track of these numbers. If you play both scenarios out equivalently for all other factors (Worker actions, what other Cities do, etc.) for a good number of turns, then you can simply take the final Science and Gold values from each to see those differences.
That way, you won't need to calculate the Commerce gained specifically; you'll only need to account for the Food and Hammers.
On the plus side for scenario a), your citizens get extra turns working squares.
On the plus side for scenario b), you need less Food to grow your City to the next population size.
If you play for long enough that the City grows a couple of times in each scenario, then the total amount of Food won't necessarily have to be added up the way it would if you played only a few turns.
Similarly, if you play for long enough, you could compare the items built and the Hammers still invested in the current build at the end, to get a feel for which scenario ended up with more Hammers in total over the long run (say, after 20 turns).
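The "items built plus remaining Hammers" comparison boils down to one sum per scenario. A minimal sketch, assuming both scenarios follow the same build order; the function name `total_hammers` and every Hammer cost below are hypothetical placeholders, not real game values (look up the actual costs for your game speed).

```python
def total_hammers(completed, in_progress):
    """Hammers spent on finished items plus Hammers sitting in the current build.

    completed   -- list of (item name, Hammer cost) for everything finished
    in_progress -- Hammers already invested in the item currently being built
    """
    return sum(cost for _name, cost in completed) + in_progress


# Placeholder costs and progress, for illustration only
whip = total_hammers([("Temple", 80), ("Settler", 100)], in_progress=25)
no_whip = total_hammers([("Temple", 80)], in_progress=90)
print(whip, no_whip, whip - no_whip)  # 205 170 35
```

Because a finished item's full cost is counted, Hammers that came from the whip are included automatically; the Food the whip cost still has to be compared separately.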
Now, in order to make these comparisons "fair," you almost have to "cheat" a bit. For example, if we built The Pyramids and/or a Temple a bit earlier, the temptation would be to run a Priest or two a bit earlier. However, if you do so, you need to use spreadsheets to keep track of all of the variables.
Instead, forget the spreadsheets and aim to run Specialists at exactly the same time in both scenarios. The whipping scenario then simply gets more time to "grow back" some of the lost population, since it isn't running Specialists earlier.
My suggestion for the "easiest" and "fairest" comparison would be not to run ANY Specialists until after The Pyramids would be built in the slowest scenario (the non-Temple-whipping scenario), and then to run the same number of Specialists at the same time in each scenario, so that the Specialists benefit equally from the Representation bonus.
Yes, it is a bit unrealistic, but what we are doing here is more accurately measuring whether the Food-for-Hammers tradeoff from whipping, which includes the tradeoff of citizens not working squares, is worthwhile.
Once we have those numbers available, we can say "well, we'll be ahead by X Hammers and Y Great People Points in this scenario but we'll be behind by Z Flasks, A Gold, and B Food. In addition, such-and-such a scenario gets us The Pyramids a bit faster, which gives us the opportunity to run Specialists a bit faster... do we care about this fact?"
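Turning the end-of-test totals into that kind of sentence is one subtraction per tracked value. A minimal sketch; the names `compare` and `describe` are hypothetical, and the totals are made-up placeholders rather than real test results.

```python
def compare(whip, no_whip):
    """Return whip-minus-no-whip differences for every tracked value."""
    keys = sorted(set(whip) | set(no_whip))
    return {key: whip.get(key, 0) - no_whip.get(key, 0) for key in keys}


def describe(diff):
    """Summarise the differences from the whipping scenario's point of view."""
    ahead = [f"{value} {key}" for key, value in diff.items() if value > 0]
    behind = [f"{-value} {key}" for key, value in diff.items() if value < 0]
    return (f"ahead by {', '.join(ahead) or 'nothing'}; "
            f"behind by {', '.join(behind) or 'nothing'}")


# Made-up end-of-test totals, for illustration only
whip_totals = {"hammers": 310, "gpp": 42, "flasks": 505, "gold": 60, "food": 180}
no_whip_totals = {"hammers": 295, "gpp": 36, "flasks": 520, "gold": 64, "food": 200}

print(describe(compare(whip_totals, no_whip_totals)))
# ahead by 6 gpp, 15 hammers; behind by 15 flasks, 20 food, 4 gold
```

Anything that can't be reduced to a number, such as being able to run Specialists sooner, still needs a qualitative judgment.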
Note that by playing out another 20 turns or so, you can also factor in the extra GPP from getting The Pyramids a bit earlier, making the numerical comparison even fairer. The only unfair part then is that we can switch to Representation for higher-powered Specialists sooner in one scenario than in the other, so that judgment would have to be a qualitative one. However, we'd have turned all other values into quantitative ones, so comparing them will be a lot easier.
The fastest way to do all of this would be to fortify most units, leaving only the Workers active, and to have those Workers chop and build identically in the rest of our empire in both scenarios.
A complication might arise in the non-whipping scenario if Delhi has extra citizens who could have worked a Cottage had you built one, so you may need to have the Workers build an extra river-side Cottage if Delhi will grow large enough to work it before hiring Specialists. Build the same Cottage at the same time in both scenarios, but if Delhi won't be growing much beyond that (due to building a Settler and hiring Specialists), then you can safely fortify the Workers to make the testing run quicker.
Another complication might arise if you chop unequal numbers of Forests. Chop the same number of Forests in both scenarios if possible. If that isn't possible because chopping 1 less Forest would mess up a strategy, then chop that extra Forest in BOTH scenarios, putting the extra Hammers toward a different build item in the scenario that didn't need the chop.
A final, minor complication could result from chopping Forests on different turns--if in one scenario you chop 4 Forests in one turn but in the other you chop those same Forests over a span of 3 turns, your Organized Religion and Stone bonuses might not be fairly compared. However, if you chop 4 Forests on Turn 132 in one scenario and 4 Forests on Turn 134 in the other, the comparison will be about as fair as it can possibly be, since the same amount of Hammers goes in on a single turn in both.
I suggest that after building the Temple, we build a Settler, but I'm not sure what to put in the build queue next. Pick SOMETHING (another Settler, a building, a Missionary) and build the same item next in both scenarios. It is extremely likely that the items built will not be completed on the same turns as each other, but as long as you keep building the same items in the same order in both scenarios, the comparisons should be as fair as they can be.
One more caveat is to ensure that you research the same techs and adjust the Science rate on identical turns in each scenario. Therefore, if you are approaching having 0 Gold in one scenario, switch to 100% Gold sooner than you think you'd need to, just so that you won't "run out" of Gold in the other scenario. That's about as fair as I can possibly suggest for comparing Flasks vs Gold across the two scenarios "easily" (i.e. without requiring a spreadsheet).
Playing 20 turns from the time that we whipped may or may not give us a complete picture, but it should be enough time to show the short-term impact of whipping.
Other tricky parts come in, such as how soon we can run Bureaucracy. I would suggest switching to Bureaucracy at the same time in both scenarios, meaning that you'll do so on the earliest turn that the slowest scenario allows.
But what about running 4 Scientists? One scenario might allow you to do so while the other might not. That's not an easy question to answer.
Here's where I would suggest you save the game again and start tracking a THIRD scenario:
a) Don't whip, but run 4 Scientists on XXXX BC
b) Whip, and run as many Scientists as you can on XXXX BC
c) Don't whip, and run as many Scientists as scenario b) can on XXXX BC
So, while it means a bit more work, by comparing scenarios a) and b) we can compare the approaches more fairly, without ignoring the potential to regain some of the GPP from getting The Pyramids sooner by running another Scientist for GPP later.
We also have a fairer way to compare the Food, Hammer, Flask, and Gold output by comparing scenarios b) and c) together.
Scenarios a) and c) would then be compared against each other just to see what the baseline differences are.
I suggest picking "XXXX BC" as the same date in every scenario, even if it means delaying the hiring of the Scientists in one of them, again to keep the comparisons as close as possible, so that we can avoid having to build spreadsheets to track all of the differences.
There are indeed a number of inaccuracies here. In the real game, we may perform other actions, such as whipping a building that might reduce our Distance or Civic Maintenance or might increase our Unit Cost Maintenance, or building a Trade Route via Roads that might make us change our Science Rate on a different turn, in which case the turn that we hire a Specialist for its additional Flasks might not give us identical Flask bonuses. What we care about more, though, is comparing the scenarios relative to each other with as many other factors as possible held equal, so that we can easily and quantitatively determine which scenario is better.
If someone wants to go nuts and compare values using spreadsheets, I warn you that it is going to be a very intensive piece of work, taking up many more hours of your time, but go nuts if you'd rather do so. Just keep a lot of saved games if you do, as invariably, someone is going to point out a missing value or two in your spreadsheets and you'll have to load up old saved games to look for said values on a turn-by-turn basis (such as Maintenance costs for Civics on a given turn, since they may differ by a bit across scenarios due to Delhi's size being different).