Optimizing your Lua code for speed

Pazyryk

Deity
Joined
Jun 13, 2008
Messages
3,584
This tutorial is a work in progress. I'll keep adding things as I test them. Please make suggestions if you want me to test a specific issue. Or post your own test. There are similar analyses elsewhere (my test here is modeled on this web page), but I thought it might be useful to have one for Civ5. See here for an excellent article on optimization by the leading architect of Lua himself.

An important point is that, in many cases, you don't need to optimize and there is no reason to bother. Run time checks to see where you need to improve something. Still, it's nice to know what runs faster and it may change the way you write your code.

All tests were run on my computer (top of the line 5 years ago) with live tuner running and a Civ5 game in progress (small map; turn 0). I put it all in a mod with a single Lua file, with code shown here (mod was rebuilt for each test):
Spoiler :
Code:
local a = {}		--just some tables to hold stuff
local b = {}
for i = 1, 100 do
	number = Game.Rand(1000000, "hello") / 1000
	a[i] = number
	b[i] = tostring(number)
end

local TestCode = {}

function TimeTest()

	local time = {}
	for j = 1, #TestCode do
		time[j] = 0
	end

	--do 5 runs testing each code block
	for i = 1, 5 do
		for j = 1, #TestCode do
			local Code = TestCode[j]
			local start = os.clock()
			Code()
			time[j] = time[j] + os.clock() - start
		end
	end

	--find smallest time
	local minimumTime = 1000
	for j = 1, #TestCode do
		if time[j] < minimumTime then minimumTime = time[j] end
	end

	for j = 1, #TestCode do
		local percent = math.floor(100 * time[j] / minimumTime + 0.5)
		print("Code " .. j .. ": " .. tostring(time[j] / 5) .. " (" .. percent .. "%)")
	end
end

-- TEST CODE -------------------------------------------------------------------------

TestCode[1] = function()	-- "Code 1"
	for i = 1, 10000 do
		for j, v in pairs(a) do
			local x = v
		end
	end
end

TestCode[2] = function()	-- "Code 2"
	for i = 1, 10000 do
		for j, v in ipairs(a) do
			local x = v
		end
	end
end

TestCode[3] = function()	-- "Code 3"
	for i = 1, 10000 do
		for j = 1, #a do
			local v = a[j]
			local x = v
		end
	end
end

--[[
TestCode[4] = function()

end

TestCode[5] = function()

end
]]
You can ignore the code above. Just remember that times are averaged over 5 runs (interpolated), so an OS or Civ5 dll "hickup" won't affect results much. I also run the enclosing TimeTest() function 3 times to ensure that the results are consistent within a few percent. I adjust iteration number in each test so that it is high enough for a reliable test, but not so high that the map flickers or goes black (which happens if Lua code runs for more than a couple seconds). Percent time is always relative to the fastest code.

Some tests use a table called "a". This is an array (indexes 1 to 100) that holds 100 random non-integer numbers.

Test 1: for-loops (is ipairs faster than pairs?)

Code 1:
Code:
	for i = 1, 10000 do
		for j, v in pairs(a) do
			local x = v
		end
	end

Code 2:
Code:
	for i = 1, 10000 do
		for j, v in ipairs(a) do
			local x = v
		end
	end

Code 3:
Code:
	for i = 1, 10000 do
		for j = 1, #a do
			local v = a[j]
			local x = v
		end
	end

Results:
Code 1: 0.083000000000004 (163%)
Code 2: 0.10359999999999 (204%)
Code 3: 0.050800000000004 (100%)

Conclusions:
No, ipairs is not faster than pairs. In fact, the opposite is true. If your indexes are continuous integers, then you should use the method in Code 3 ("for i = 1, #table do" or whatever range you need). Note that code 3 can be sped up even more (maybe 2x, though I haven't tested it) if you store the value of #a so Lua doesn't have to calculate it at this step. Use pairs when your indexes are not continuous integers. There is really no reason that I can think of to use ipairs.

Test 2: Localization of a simple math function

Code 1
Code:
	for i = 1, 1000000 do
		local x = math.log10(222.222)
	end


Code 2
Code:
	local log10 = math.log10
	for i = 1, 1000000 do
		local x = log10(222.222)
	end

Results:
Code 1: 0.16780000000003 (171%)
Code 2: 0.098199999999952 (100%)

Conclusion:
Localize your functions!

Test 3: Table lookup vs. localized constant

Code 1
Code:
	for i = 1, 1000000 do
		local x = GameInfo.Improvements.IMPROVEMENT_FARM.ID
	end

Code 2
Code:
	for i = 1, 1000000 do
		local x = GameInfo.Improvements["IMPROVEMENT_FARM"].ID
	end

Code 3
Code:
	for i = 1, 1000000 do
		local x = GameInfoTypes.IMPROVEMENT_FARM
	end

Code 4
Code:
	local IMPROVEMENT_FARM_ID = GameInfo.Improvements.IMPROVEMENT_FARM.ID
	for i = 1, 1000000 do
		local x = IMPROVEMENT_FARM_ID
	end

Results:

Code 1: 0.19180000000001 (1744%)
Code 2: 0.19599999999991 (1782%)
Code 3: 0.053000000000111 (482%)
Code 4: 0.010999999999967 (100%)

Conclusion:
Getting an integer from GameInfoTypes is much faster than from GameInfo. However, if you are going to reference a table item a lot (e.g., you are iterating over all plots and testing terrain, improvement or feature types), it is much faster to put whatever values you need in specific (localized!) constants. My suggestion is to put all of these in a "local defines" section at the top of every Lua file.

Test 4: Inserting table items

Code 1:
Code:
	local c = {}
	for i = 1, 500000 do
		table.insert(c, "some text")
	end

Code 2:
Code:
	local insert = table.insert
	local c = {}
	for i = 1, 500000 do
		insert(c, "some text")
	end

Code 3:
Code:
	local c = {}
	for i = 1, 500000 do
		c[#c + 1] = "some text"
	end

Code 4:
Code:
	local c = {}	
	local position = #c
	for i = 1, 500000 do
		position = position + 1
		c[position] = "some text"
	end

Results:
Code 1: 0.16999999999989 (392%)
Code 2: 0.13500000000004 (311%)
Code 3: 0.11140000000014 (257%)
Code 4: 0.043399999999929 (100%)

Conclusion:
table.insert is slower than the simpler (imho) method in Code 3, especially if you don't localize the insert function. However, you can get a big boost by keeping track of table position in an external variable (Code 4).

Test 5: Recycling/reusing tables (vs. re-initing)

Code 1:
Code:
	for i = 1, 10000 do
		local tempTable = {}	-- re-init table at each iteration
		for j = 1, #a do
			tempTable[j] = a[j]
		end
	end


Code 2:
Code:
	local tempTable = {}
	for i = 1, 10000 do
		for j = 1, #tempTable do	-- recycle table (clean out old contents first)
			tempTable[j] = nil
		end
		for j = 1, #a do
			tempTable[j] = a[j]
		end
	end

Code 3:
Code:
	local tempTable = {}
	local tempTablePosition = 0		-- reuse table without cleaning previous contents (keep position in external variable)
	for i = 1, 10000 do
		tempTablePosition = 0
		for j = 1, #a do
			tempTablePosition = tempTablePosition + 1
			tempTable[j] = a[j]
		end
	end

Results:
Code 1: 0.13920000000007 (165%)
Code 2: 0.11739999999991 (139%)
Code 3: 0.084400000000005 (100%)

Conclusion:
Better to recycle/reuse your tables rather than letting Lua clean up after you. There is a lot of behind-the-scenes action in code 1 even though it looks simpler. Lua has to init and grow the table with each iteration, plus garbage collect tables from previous iterations. However, Lua is quite fast at this, so take advantage of it in cases where code simplicity is more important than a modest speed improvement.

Test 6: GameInfo.Units() iteration: condition test in Lua vs. SQL

Code 1:
Code:
	for i = 1, 100 do
		for row in GameInfo.Units() do
			if row.Cost < 200 then
				local x = row.Type
			end
		end
	end

Code 2:
Code:
	for i = 1, 100 do
		for row in GameInfo.Units("Cost < 200") do
			local x = row.Type
		end
	end

Results:
Code 1: 0.096799999999985 (130%)
Code 2: 0.074400000000014 (100%)

Conclusion:
Better to pass the condition test through SQL. But note the tiny number I had to use on the i iterator here. Both of these tests were horrendously slow! That brings me to the next test...

Test 7: Using GameInfo.Units() iterator versus iterating over the same table data cached in Lua

Code 1:
Code:
	local n = 0
	for i = 1, 1000 do
		for row in GameInfo.Units("Cost < 200") do
			n = n + 1
		end
	end
	print(n)

Code 2:
Code:
	local tempTable = {}
	for row in GameInfo.Units() do
		tempTable[row.ID] = row
	end

	local highestID = #tempTable

	local n = 0
	for i = 1, 1000 do
		for j = 0, highestID do
			local row = tempTable[j]
			if row.Cost < 200 then
				n = n + 1
			end
		end
	end
	print(n)

Results:
Code 1: 0.74800000000001 (6800%)
Code 2: 0.01099999999999 (100%)
--prints 57000 in both cases

Conclusion:
Iterators like GameInfo.Units() are really very slow. I'm totally guessing, but perhaps these iterate through the actual DB rather than C++ cached values (which is crazy!). In any case, if you need speed in a DB table iteration, cache the table (or the parts needed) into a Lua table and iterate over that instead.

To be continued if folks show any interest...
 
I'm redoing tests 1 to 3. I had an assignment to a global x rather than a local x inside the loop. Assignment to global was slow enough to partly mask the speed differences (just magnitude; no real difference in conclusion).
 
I renamed Tests 6 and 7 above because I'm not really sure if these involve DB traversal or not. I suspect that they do because they are sooooo slow. GameInfo.Units and similar constructs are a strange animal: they are type userdata, which is (if I understand it) a call to some C++ code. So GameInfo.Units() returns an iterator (a function) and GameInfo.Units[arg] returns a table. It's hard to say how the GameInfo.Units() iterator works, but it's clearly slower than Lua iteration using Lua-cached table data.
 
Haven't tested it. I suspect they would be similar. In both cases you are sending info from Lua to C++ (takes time), then C++ has to build an SQL query (a text string), then SQLite has to compile that text string into its own code, then SQLite runs it, then C++ has to get the results from SQLite, convert it into C++ values, and then port it back out to Lua in the form of a table. Each row in the iteration requires a whole new Lua table to be inited from scratch and then grown by Lua's table routines. There is really no way that this could be a fast process.

If you need speed, then do what the dll does: cache the database values into a "native" form. For Lua, this would be tables, as I did in Test #7. You can cache whole tables or just the parts you need. Do this anywhere speed is an issue.
 
When running test 7, line "if row.Cost < 200 then" gives error "attempt to index local 'row' (a nil value)". How can there be nil entries in tempTable?
 
How can there be nil entries in tempTable?
If you've deleted units from the Units table, there will be gaps in that, and as tempTable is created from the <Units> table using its rowId value any gaps will be carried over.

Replace
Code:
tempTable[row.ID] = row

with

Code:
table.insert(tempTable, row)

and any gaps will be closed up

Otherwise, just run the tests on an unmodded game ;)
 
I am running tests on unmodded game but there are gaps in vanilla Units table.
 
Something worth mentioning:
Code:
for unitsRow in GameInfo.Units() do
	for policyRow in GameInfo.Policies() do
		if unitsRow.PolicyType == policyRow.Type then
		end
	end
end
Caching Units table in lua does not decrease code execution time. Caching Policies table does.
 
Top Bottom