faster_exporter: making the exporter faster

Started by rubdos
rubdos

Hi!

TynkaTopi/JuhaW complained about the Blender exporter times of TheBounty being too high. So I started working on that.

I identified the following factors as "bad" for exporter performance, in order of importance:

  1. The exporter uses the XML format when not rendering into Blender;
  2. The exporter is written entirely in Python (an interpreted scripting language);
  3. The exporter uses badly optimized code;
  4. The current yafarayInterface cannot make use of multithreading.

Getting away from the XML takes a lot of time and consideration. If you're going to do that (and we are, that's for sure), you'll have to invent a completely new, fast alternative, which takes a lot of time and investigation.

The multithreaded exporter will also take time, as we'll need to implement a thebountyInterface or something like that which actually supports multithreading.

The fact that the exporter is entirely in Python (plus SWIG, which accounts for some time too) means that interpreting the code probably takes more than 50% of the time. The solution is to rewrite the entire exporter code (the io directory in the source) in a compiled language (C++ or C, as the Core of TheBounty is C++ code).
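One way to check a guess like "more than 50% of the time" is to profile the Python side before rewriting anything. The following is only a minimal sketch using the standard library's `cProfile`; `export_scene` is a hypothetical stand-in for the exporter's hot loop, not a function from the actual exporter.

```python
import cProfile
import io
import pstats


def export_scene(n_vertices):
    """Hypothetical stand-in for an exporter's hot loop (not real exporter code)."""
    total = 0.0
    for i in range(n_vertices):
        total += i * 0.5
    return total


def profile_export(n_vertices=100_000, top=5):
    """Profile one export run and return the report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    export_scene(n_vertices)
    profiler.disable()

    # Collect the report in memory instead of printing to stdout.
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_export())
```

The report lists the functions with the highest cumulative time, which shows where the interpreter is actually spending its time.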

I first started optimizing the Python code (faster_exporter branch). There were some constructs that ran the same loop twice, and there was the "calling a CPython function for each vertex" problem, which I solved by caching all vertices and then calling a single C++ method. In the graph embedded later in this post, you can see the performance increase this gives. The little bump in the first part of the curve (before C++) is where I construct the cache; the commit after that shows the effect of actually using the cache.
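The vertex caching idea can be sketched roughly as follows. This is an illustration only: `FakeInterface`, `add_vertex` and `add_vertices` are hypothetical names standing in for the SWIG-wrapped interface, and the real method names in the exporter may differ. The point is that the batched version crosses the Python/native boundary once per mesh instead of once per vertex.

```python
from array import array


class FakeInterface:
    """Hypothetical stand-in for the SWIG-wrapped core interface."""

    def __init__(self):
        self.calls = 0       # number of Python -> native boundary crossings
        self.vertices = []

    def add_vertex(self, x, y, z):
        # One boundary crossing per vertex (the slow pattern).
        self.calls += 1
        self.vertices.append((x, y, z))

    def add_vertices(self, flat_coords):
        # One boundary crossing for the whole mesh (the batched pattern).
        self.calls += 1
        it = iter(flat_coords)
        # Pulling three items at a time from one iterator regroups x, y, z.
        self.vertices.extend(zip(it, it, it))


def export_per_vertex(iface, verts):
    """Call into the interface once for every vertex."""
    for x, y, z in verts:
        iface.add_vertex(x, y, z)


def export_batched(iface, verts):
    """Build a flat cache of coordinates first, then make a single call."""
    cache = array("d")
    for v in verts:
        cache.extend(v)
    iface.add_vertices(cache)


if __name__ == "__main__":
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    slow, fast = FakeInterface(), FakeInterface()
    export_per_vertex(slow, verts)
    export_batched(fast, verts)
    print(slow.calls, fast.calls)
```

Building the cache costs some time by itself, which would match the small bump in the graph before the cache is actually used.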

After that, I had around a 30% performance increase on a scene that TynkaTopi claims takes 4 minutes to export (to HDD, the XML problem) and for which I measured a 42-second export time (rendering into Blender, so in RAM).

In total, I want to get below one second, compared to the 4 minutes TynkaTopi reported. That means I should cut about 98% of the time, which in turn means that 98% of the time your computer spends exporting is completely wasted. Also keep in mind that I have an ongoing background in optimizing Python code, and that I have well-founded reasons to believe I can achieve this.

So, after optimizing the Python code a bit, I found that the exporter could get a huge speedup from being rewritten in a compiled language. I told povmaniac that I was going to do so, and I started rewriting/translating the whole thing in C++.

This is the current state of the C++ exporter:

rubdos: Testing C++ exporter 3

Currently, there are 60 lines of Python exporter code left. They act like glue code. A bunch of the original Python code is still in comments in the C++ source code.

For things that are still left to be done, please refer to the merge request.

Graph showing performance improvements

I'm already confident enough to show the above graph. I spoke about the first part, in which you can see the Python optimizations I made. After that, I only started measuring once I had actually finished the export of all geometry (including duplis), because that takes ~95% of all export time.

It started at about a 0.43% time increase, which was kind of unexpected. I did some profiling (callgrind is awesome!) and made three commits to improve the exporter's performance. Now I'm at a 115% performance increase (in exports per second), at around 19 seconds for the same scene that took 42s on my machine.
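Since "performance increase in exports per second" and "time saved" are easy to mix up, here is a small sketch of how the two relate. The function names are made up for illustration; the formulas are plain arithmetic on the old and new export times.

```python
def throughput_increase_percent(old_seconds, new_seconds):
    """Increase in exports per second, in percent, when time drops old -> new."""
    return (old_seconds / new_seconds - 1.0) * 100.0


def time_saved_percent(old_seconds, new_seconds):
    """Share of the original export time that was removed, in percent."""
    return (1.0 - new_seconds / old_seconds) * 100.0


if __name__ == "__main__":
    # Halving the export time doubles the throughput.
    print(throughput_increase_percent(40, 20))  # 100.0
    print(time_saved_percent(40, 20))           # 50.0
```

Plugging in the rounded 42 s and 19 s from above gives roughly a 121% throughput increase, in the same ballpark as the measured 115%.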

Now I'll finish the rest of the exporter code. After that, I still have some measurements and enhancements in mind to get the time down further. I want to achieve a factor of 1500%. Let's hope that goes well; I've only done less than half a day of optimizing so far…

I'll keep this topic updated on the progress I make.

Regards,

Ruben

rubdos

Added a basic implementation for volumetrics, except for NoiseVolume, as that requires textures (and isn't implemented yet).

Also had to enable strikethrough in the forum markdown so the first post could be edited appropriately. :o

rubdos

Spent half a day optimizing. From 19 seconds (start of the day) to 13 seconds (end of the day) on the above scene. Coming from Python (35 seconds), that means less than half the time already.

Edit: please keep in mind that this is about export time. The Core is pretty optimized already ;p

rubdos

Quick update: getting the code to work under Windows and Mac appears more difficult than anticipated. Working on this now.

rubdos

Quick update:

We got a first version working on Windows. Mac OS X is next. In the meantime, I'm implementing everything else on the TODO list :)