What I'm writing here is based on experience with Colonization, but odds are it also applies to BTS. Based on my investigation, the game appears to stall during startup without allocating more memory. At some point it suddenly starts allocating memory rapidly and reaches the main menu within 3-5 seconds. It doesn't actually connect to the DLL file before that memory allocation starts.
I just ran a test: it allocates 18 MB, then does something that isn't CPU bound (the CPU is active, but not at full load) for around 80 seconds. After that it allocates 200 MB within seconds and reaches the main menu. Quitting and starting again, it does the same, but starts allocating heavily after 9 seconds and reaches the main menu after 13 seconds in total. Once more, and again 9 and 13 seconds.
Pure speculation is that it relies heavily on Windows' disk cache. When Windows (or any other OS, it's not Windows specific) reads a file, the data is copied from the disk into memory. Only then can the CPU read it, as the CPU can't access the disk directly. If nothing else needed that memory in the meantime, the data stays in memory, and the next time the file is requested the OS can report right away that it has already been read and is waiting in memory.
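To check the disk cache guess independently of the game, one option is to read every file under a folder several times and compare the timings. This is only a rough sketch: the function names are made up for illustration, and it measures plain file reads, which may or may not resemble what the engine does during startup. The first pass is only "cold" if the files haven't been touched since the last reboot.

```python
import time
from pathlib import Path


def time_full_read(root):
    """Read every file under root once.

    Returns (seconds, file_count, bytes_read).
    """
    start = time.perf_counter()
    file_count = 0
    bytes_read = 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        with open(path, "rb") as handle:
            # Read in 1 MB chunks so large files don't need to fit in memory.
            while True:
                chunk = handle.read(1 << 20)
                if not chunk:
                    break
                bytes_read += len(chunk)
        file_count += 1
    return time.perf_counter() - start, file_count, bytes_read


def compare_passes(root, passes=3):
    """Run several read passes; later passes can be served from the OS cache."""
    return [time_full_read(root) for _ in range(passes)]
```

Calling compare_passes on the mod's Assets folder right after a reboot and seeing the first pass take far longer than the rest would be consistent with the cache explanation, though it wouldn't prove that the game's stall has the same cause.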
The conclusion is that XML caching is worthless: XML loading happens in the DLL, so it only takes place after the long stall, and the speed difference seems to be below 1 second. You're better off disabling it to avoid issues, or even going as far as I have done: permanently disabling it in the DLL and ignoring the setting.
The real question is what the game is doing when it's this slow to start. It's not XML loading, and it's not loading graphics, because it's not allocating memory for the graphics. The only thing I can think of would be building some sort of table of available files. If this is the case, then it's done in a very strange and inefficient way. Since repeated starts are much faster than the first one, the problem is less severe for modders who restart the game frequently to check changes, but it's still annoying, and it doesn't help the players much.
I have a hunch that modules will slow down startup, purely based on the fact that they add more places to look for files. I haven't tested it though, as I'm avoiding modules as much as I can.
Packing files will provide a speed boost, but it has a tradeoff. It copies all the packed files into memory, yet some files are still read from the disk, meaning the packed copies of those files use memory without ever being used. I have also had a case where 32 bit computers crashed at startup while 64 bit computers worked fine. Unpacking the files made it work on 32 bit computers too. I don't think there are many 32 bit players left, but it's something to keep in mind.
I did an experiment. I made a copy of Art and placed it inside Art. This doubled the number of files there. Before, repeated starts took 9 seconds. Now they take 28 seconds. Before, it ended up allocating 18-19 MB. Now it allocates 25 MB. The delay is clearly affected greatly by the number of files in Art, even if nothing points to the extra files in the copy. It doesn't look like the size of any of the other folders affects the time prior to the fast memory allocations. Moving Art out of Assets changed the delay to 0 seconds.
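For anybody who wants to repeat this kind of experiment, it helps to know how many files of each type a folder holds before and after a change. A minimal sketch (the function name is made up, and the folder you point it at is whatever your own mod uses):

```python
from collections import Counter
from pathlib import Path


def count_by_extension(root):
    """Count the files under root, grouped by lowercased file extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts
```

Something like count_by_extension(r"Assets\Art").most_common(10) lists the ten most common extensions, and sum(counts.values()) gives the total number of files.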
EDIT: I just had the idea of adding non-game files. I added 6400 files with extensions not used by the civ4 engine. For comparison, Art contains 7570 files. This means that if file extensions don't matter, startup should take around 25 seconds, but it ended up taking only 11 seconds. In other words, file extensions do matter, and the delay is related to something graphical, perhaps dds.
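The "around 25 seconds" figure comes from assuming the delay grows linearly with the number of files. Spelled out, using only the numbers measured above (the linear model itself is an assumption, and the function names are just for illustration):

```python
def per_file_cost(base_seconds, doubled_seconds, base_files):
    """Extra seconds per added file, from the copy-of-Art experiment."""
    return (doubled_seconds - base_seconds) / base_files


def predicted_delay(base_seconds, cost, extra_files):
    """Expected delay if every extra file costs the same as an Art file."""
    return base_seconds + cost * extra_files


# 9 s with the original 7570 files in Art, 28 s after doubling them.
cost = per_file_cost(9.0, 28.0, 7570)        # about 2.5 ms per file
estimate = predicted_delay(9.0, cost, 6400)  # about 25 s

# Measured: 11 s, so the 6400 unrelated files only added about 2 s.
observed_cost = (11.0 - 9.0) / 6400          # about 0.3 ms per file
ratio = cost / observed_cost                 # about 8
```

So under this model a file with an unknown extension costs roughly an eighth of what an average file in Art costs.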
Interestingly renaming Art to Art_ didn't change speed at all. It seems to be based purely on the number of files in Assets and has nothing to do with the contents of the files or if they are used.
Without packing, the way to reduce the loading time seems to be reducing the number of files. Assets\Art\Interface\Buttons\Beyond_the_Sword_Atlas.dds shows one way of reducing the number of files without actually packing files.
Another thing to test would be whether it's fastest to give each file a unique name or to reuse names from vanilla. I have had indications that reusing the same names results in the biggest slowdown, but I haven't tried to test this in any setup where I can be sure this is the case.
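A first step towards such a test would be to find out which mod files reuse a vanilla name in the first place. The sketch below compares relative paths between two folders, ignoring case since Windows file names are case insensitive. Both folder arguments and the function names are placeholders, and treating "same relative path" as "same name" is my assumption about what the engine considers a match.

```python
from pathlib import Path


def relative_files(root):
    """Return the lowercased relative path of every file under root."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix().lower()
        for path in root.rglob("*")
        if path.is_file()
    }


def shadowed_files(mod_assets, base_assets):
    """List relative paths that exist in both the mod and the base game."""
    return sorted(relative_files(mod_assets) & relative_files(base_assets))
```

With that list, one could rename the shadowing files in a test copy of the mod and compare startup times with the file count kept identical.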
Last, but not least: the slowdown seems to affect different computers very differently, way more differently than hardware specs can explain. Lots of memory will increase the odds of reusing the disk cache in memory, but that's for the second run, not the first.
That's all I can contribute to this topic for now. I would be most interested in hearing about anything that can be used to reduce this startup delay, as it's really annoying.