Optimizing your Lua code for speed

Discussion in 'Civ5 - Modding Tutorials & Reference' started by Pazyryk, Jun 5, 2012.

1. PazyrykDeity

Joined:
Jun 13, 2008
Messages:
3,584
This tutorial is a work in progress. I'll keep adding things as I test them. Please make suggestions if you want me to test a specific issue. Or post your own test. There are similar analyses elsewhere (my test here is modeled on this web page), but I thought it might be useful to have one for Civ5. See here for an excellent article on optimization by the leading architect of Lua himself.

An important point is that, in many cases, you don't need to optimize and there is no reason to bother. Run time checks to see where you need to improve something. Still, it's nice to know what runs faster and it may change the way you write your code.

All tests were run on my computer (top of the line 5 years ago) with live tuner running and a Civ5 game in progress (small map; turn 0). I put it all in a mod with a single Lua file, with code shown here (mod was rebuilt for each test):
Spoiler :
Code:
```local a = {}		--just some tables to hold stuff
local b = {}
for i = 1, 100 do
number = Game.Rand(1000000, "hello") / 1000
a[i] = number
b[i] = tostring(number)
end

local TestCode = {}

function TimeTest()

local time = {}
for j = 1, #TestCode do
time[j] = 0
end

--do 5 runs testing each code block
for i = 1, 5 do
for j = 1, #TestCode do
local Code = TestCode[j]
local start = os.clock()
Code()
time[j] = time[j] + os.clock() - start
end
end

--find smallest time
local minimumTime = 1000
for j = 1, #TestCode do
if time[j] < minimumTime then minimumTime = time[j] end
end

for j = 1, #TestCode do
local percent = math.floor(100 * time[j] / minimumTime + 0.5)
print("Code " .. j .. ": " .. tostring(time[j] / 5) .. " (" .. percent .. "%)")
end
end

-- TEST CODE -------------------------------------------------------------------------

TestCode[1] = function()	-- "Code 1"
for i = 1, 10000 do
for j, v in pairs(a) do
local x = v
end
end
end

TestCode[2] = function()	-- "Code 2"
for i = 1, 10000 do
for j, v in ipairs(a) do
local x = v
end
end
end

TestCode[3] = function()	-- "Code 3"
for i = 1, 10000 do
for j = 1, #a do
local v = a[j]
local x = v
end
end
end

--[[
TestCode[4] = function()

end

TestCode[5] = function()

end
]]
```

You can ignore the code above. Just remember that times are averaged over 5 runs (interpolated), so an OS or Civ5 dll "hickup" won't affect results much. I also run the enclosing TimeTest() function 3 times to ensure that the results are consistent within a few percent. I adjust iteration number in each test so that it is high enough for a reliable test, but not so high that the map flickers or goes black (which happens if Lua code runs for more than a couple seconds). Percent time is always relative to the fastest code.

Some tests use a table called "a". This is an array (indexes 1 to 100) that holds 100 random non-integer numbers.

Test 1: for-loops (is ipairs faster than pairs?)

Code 1:
Code:
```	for i = 1, 10000 do
for j, v in pairs(a) do
local x = v
end
end```
Code 2:
Code:
```	for i = 1, 10000 do
for j, v in ipairs(a) do
local x = v
end
end```
Code 3:
Code:
```	for i = 1, 10000 do
for j = 1, #a do
local v = a[j]
local x = v
end
end```
Results:
Code 1: 0.083000000000004 (163%)
Code 2: 0.10359999999999 (204%)
Code 3: 0.050800000000004 (100%)

Conclusions:
No, ipairs is not faster than pairs. In fact, the opposite is true. If your indexes are continuous integers, then you should use the method in Code 3 ("for i = 1, #table do" or whatever range you need). Note that code 3 can be sped up even more (maybe 2x, though I haven't tested it) if you store the value of #a so Lua doesn't have to calculate it at this step. Use pairs when your indexes are not continuous integers. There is really no reason that I can think of to use ipairs.

Test 2: Localization of a simple math function

Code 1
Code:
```	for i = 1, 1000000 do
local x = math.log10(222.222)
end```

Code 2
Code:
```	local log10 = math.log10
for i = 1, 1000000 do
local x = log10(222.222)
end```
Results:
Code 1: 0.16780000000003 (171%)
Code 2: 0.098199999999952 (100%)

Conclusion:

Test 3: Table lookup vs. localized constant

Code 1
Code:
```	for i = 1, 1000000 do
local x = GameInfo.Improvements.IMPROVEMENT_FARM.ID
end```
Code 2
Code:
```	for i = 1, 1000000 do
local x = GameInfo.Improvements["IMPROVEMENT_FARM"].ID
end```
Code 3
Code:
```	for i = 1, 1000000 do
local x = GameInfoTypes.IMPROVEMENT_FARM
end```
Code 4
Code:
```	local IMPROVEMENT_FARM_ID = GameInfo.Improvements.IMPROVEMENT_FARM.ID
for i = 1, 1000000 do
local x = IMPROVEMENT_FARM_ID
end```
Results:

Code 1: 0.19180000000001 (1744%)
Code 2: 0.19599999999991 (1782%)
Code 3: 0.053000000000111 (482%)
Code 4: 0.010999999999967 (100%)

Conclusion:
Getting an integer from GameInfoTypes is much faster than from GameInfo. However, if you are going to reference a table item a lot (e.g., you are iterating over all plots and testing terrain, improvement or feature types), it is much faster to put whatever values you need in specific (localized!) constants. My suggestion is to put all of these in a "local defines" section at the top of every Lua file.

Test 4: Inserting table items

Code 1:
Code:
```	local c = {}
for i = 1, 500000 do
table.insert(c, "some text")
end```
Code 2:
Code:
```	local insert = table.insert
local c = {}
for i = 1, 500000 do
insert(c, "some text")
end```
Code 3:
Code:
```	local c = {}
for i = 1, 500000 do
c[#c + 1] = "some text"
end```
Code 4:
Code:
```	local c = {}
local position = #c
for i = 1, 500000 do
position = position + 1
c[position] = "some text"
end```
Results:
Code 1: 0.16999999999989 (392%)
Code 2: 0.13500000000004 (311%)
Code 3: 0.11140000000014 (257%)
Code 4: 0.043399999999929 (100%)

Conclusion:
table.insert is slower than the simpler (imho) method in Code 3, especially if you don't localize the insert function. However, you can get a big boost by keeping track of table position in an external variable (Code 4).

Test 5: Recycling/reusing tables (vs. re-initing)

Code 1:
Code:
```	for i = 1, 10000 do
local tempTable = {}	-- re-init table at each iteration
for j = 1, #a do
tempTable[j] = a[j]
end
end```

Code 2:
Code:
```	local tempTable = {}
for i = 1, 10000 do
for j = 1, #tempTable do	-- recycle table (clean out old contents first)
tempTable[j] = nil
end
for j = 1, #a do
tempTable[j] = a[j]
end
end```
Code 3:
Code:
```	local tempTable = {}
local tempTablePosition = 0		-- reuse table without cleaning previous contents (keep position in external variable)
for i = 1, 10000 do
tempTablePosition = 0
for j = 1, #a do
tempTablePosition = tempTablePosition + 1
tempTable[j] = a[j]
end
end```
Results:
Code 1: 0.13920000000007 (165%)
Code 2: 0.11739999999991 (139%)
Code 3: 0.084400000000005 (100%)

Conclusion:
Better to recycle/reuse your tables rather than letting Lua clean up after you. There is a lot of behind-the-scenes action in code 1 even though it looks simpler. Lua has to init and grow the table with each iteration, plus garbage collect tables from previous iterations. However, Lua is quite fast at this, so take advantage of it in cases where code simplicity is more important than a modest speed improvement.

Test 6: GameInfo.Units() iteration: condition test in Lua vs. SQL

Code 1:
Code:
```	for i = 1, 100 do
for row in GameInfo.Units() do
if row.Cost < 200 then
local x = row.Type
end
end
end```
Code 2:
Code:
```	for i = 1, 100 do
for row in GameInfo.Units("Cost < 200") do
local x = row.Type
end
end```
Results:
Code 1: 0.096799999999985 (130%)
Code 2: 0.074400000000014 (100%)

Conclusion:
Better to pass the condition test through SQL. But note the tiny number I had to use on the i iterator here. Both of these tests were horrendously slow! That brings me to the next test...

Test 7: Using GameInfo.Units() iterator versus iterating over the same table data cached in Lua

Code 1:
Code:
```	local n = 0
for i = 1, 1000 do
for row in GameInfo.Units("Cost < 200") do
n = n + 1
end
end
print(n)```
Code 2:
Code:
```	local tempTable = {}
for row in GameInfo.Units() do
tempTable[row.ID] = row
end

local highestID = #tempTable

local n = 0
for i = 1, 1000 do
for j = 0, highestID do
local row = tempTable[j]
if row.Cost < 200 then
n = n + 1
end
end
end
print(n)```
Results:
Code 1: 0.74800000000001 (6800%)
Code 2: 0.01099999999999 (100%)
--prints 57000 in both cases

Conclusion:
Iterators like GameInfo.Units() are really very slow. I'm totally guessing, but perhaps these iterate through the actual DB rather than C++ cached values (which is crazy!). In any case, if you need speed in a DB table iteration, cache the table (or the parts needed) into a Lua table and iterate over that instead.

To be continued if folks show any interest...

Joined:
Jun 13, 2008
Messages:
3,584
reserved...

3. PazyrykDeity

Joined:
Jun 13, 2008
Messages:
3,584
I'm redoing tests 1 to 3. I had an assignment to a global x rather than a local x inside the loop. Assignment to global was slow enough to partly mask the speed differences (just magnitude; no real difference in conclusion).

4. DanthorWarlord

Joined:
Feb 17, 2006
Messages:
289
Gender:
Male
Location:
Punta Arenas, Chile
Nice!

5. PazyrykDeity

Joined:
Jun 13, 2008
Messages:
3,584
I renamed Tests 6 and 7 above because I'm not really sure if these involve DB traversal or not. I suspect that they do because they are sooooo slow. GameInfo.Units and similar constructs are a strange animal: they are type userdata, which is (if I understand it) a call to some C++ code. So GameInfo.Units() returns an iterator (a function) and GameInfo.Units[arg] returns a table. It's hard to say how the GameInfo.Units() iterator works, but it's clearly slower than Lua iteration using Lua-cached table data.

6. bc1

Joined:
Jan 12, 2004
Messages:
1,266
Has anyone tested whether the GameInfo. iterator works any faster than a DB. sql query ? Thanks.

7. PazyrykDeity

Joined:
Jun 13, 2008
Messages:
3,584
Haven't tested it. I suspect they would be similar. In both cases you are sending info from Lua to C++ (takes time), then C++ has to build an SQL query (a text string), then SQLite has to compile that text string into its own code, then SQLite runs it, then C++ has to get the results from SQLite, convert it into C++ values, and then port it back out to Lua in the form of a table. Each row in the iteration requires a whole new Lua table to be inited from scratch and then grown by Lua's table routines. There is really no way that this could be a fast process.

If you need speed, then do what the dll does: cache the database values into a "native" form. For Lua, this would be tables, as I did in Test #7. You can cache whole tables or just the parts you need. Do this anywhere speed is an issue.

8. bc1

Joined:
Jan 12, 2004
Messages:
1,266
Testing shows GameInfo is more than 7 times faster than DB.Query

9. qqqbbbPrince

Joined:
Sep 25, 2010
Messages:
530
When running test 7, line "if row.Cost < 200 then" gives error "attempt to index local 'row' (a nil value)". How can there be nil entries in tempTable?

10. whoward69DLL Minion

Joined:
May 30, 2011
Messages:
8,532
Location:
Near Portsmouth, UK
If you've deleted units from the Units table, there will be gaps in that, and as tempTable is created from the <Units> table using its rowId value any gaps will be carried over.

Replace
Code:
```tempTable[row.ID] = row
```
with

Code:
```table.insert(tempTable, row)
```
and any gaps will be closed up

Otherwise, just run the tests on an unmodded game

11. qqqbbbPrince

Joined:
Sep 25, 2010
Messages:
530
I am running tests on unmodded game but there are gaps in vanilla Units table.

12. qqqbbbPrince

Joined:
Sep 25, 2010
Messages:
530
Something worth mentioning:
Code:
```for unitsRow in GameInfo.Units() do
for policyRow in GameInfo.Policies() do
if unitsRow.PolicyType == policyRow.Type then
end
end
end```
Caching Units table in lua does not decrease code execution time. Caching Policies table does.