Weeks 4, 5
Over the last two weeks, we have drilled down further to evaluate and test the performance of the available alternatives. I also took off on a self-assigned side quest to optimize the commit-graph write using buffers, which did not turn out well.
Overview
After I had published the initial performance report, we realized that we don't need a backward-compatible generation number if we use the GDAT chunk. Therefore, it was possible to store corrected commit date offsets into GDAT while saving GENERATION_NUMBER_INFINITY or topological levels into the CDAT chunk.
Dr. Stolee pointed out that one of the concerns was how close the offsets are to overflowing, and suggested using the number of commits as a metric instead of wall-clock timing (which can be influenced by other factors like CPU usage at the time).
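To make the split concrete, here is a minimal sketch of the idea under my own simplified assumptions; the struct, field, and function names below are illustrative, not the actual commit-graph code. The GDAT entry is the difference between a commit's corrected committer date and its plain committer date, stored in 32 bits (which is exactly why overflow is worth watching), while the CDAT generation field keeps the topological level.

```c
#include <stdint.h>

/*
 * Illustrative sketch only: these names are simplified assumptions,
 * not the actual commit-graph implementation.
 */
struct commit_info {
	uint64_t committer_date;    /* plain committer date, seconds since epoch */
	uint64_t corrected_date;    /* >= own date and > every parent's corrected date */
	uint32_t topological_level; /* 1 + max(parents' topological levels) */
};

/* Value that would go into the GDAT chunk for one commit. */
static uint32_t gdat_offset(const struct commit_info *c)
{
	/*
	 * Stored in 32 bits, which is why "how close are the offsets
	 * to overflowing?" is a question worth measuring.
	 */
	return (uint32_t)(c->corrected_date - c->committer_date);
}

/* Value that would go into the generation field of the CDAT chunk. */
static uint32_t cdat_generation(const struct commit_info *c)
{
	return c->topological_level;
}
```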
However, the lesson I would draw from these weeks is to solve the problem the right way on the first try. See:
I used #define macros to set up flags for different choices (metadata chunk vs. GDAT, among others) instead of using environment variables, because #define macros were familiar whereas git_env_bool was not. As a result, I was rebuilding Git whenever I wanted to try a different set of flags.
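For contrast, the switch to runtime flags looked roughly like the sketch below. git_env_bool() is the real helper Git provides for reading boolean environment variables, but GIT_TEST_GDAT_CHUNK is a placeholder name I am using only for illustration, not an actual GIT_TEST_* knob.

```c
#include "config.h" /* for git_env_bool() */

/* Before: a compile-time switch, which meant rebuilding Git per configuration. */
/* #define WRITE_GDAT_CHUNK 1 */

/*
 * After: a runtime switch read from the environment. The variable name
 * is a placeholder made up for this post; flipping it is just a matter
 * of exporting a different value instead of rebuilding.
 */
static int write_gdat_chunk(void)
{
	return git_env_bool("GIT_TEST_GDAT_CHUNK", 0);
}
```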
I also did not automate running the tests and calculating the average time, because parsing the output of time initially looked hard. I used to fire up a Python console, type in the timings, and calculate the mean. That is a step up from adding numbers by hand, but still inefficient.
It took me a morning of sitting down and focusing on efficiency. Switching to environment variables was as easy as a search and replace once I understood it, and getting the script right took a few attempts. But with these changes, the time I needed to measure performance dropped from half an hour (of which only ten minutes were test run time; the rest went into building, calculating, and noting the results) to just seven minutes of actual test run time.
Do things the right way around the tenth time you have to do them.
You can find the second version of the report on the mailing list and in my pull request, which has instructions for replicating the results.
Improving commit-graph write
While working on the generation data chunk, I realized that I was calling hashwrite for each offset. I could load all the offsets into a buffer and call hashwrite just once to reduce I/O operations. Or so I thought.
To test my hypothesis, I wrote up a patch that would load the fanout table into a buffer and write it out.
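The experiment looked roughly like the sketch below. hashwrite() and put_be32() are the real helpers from Git's csum-file.h and compat headers, but the function itself is a simplified stand-in for the patch, and computing the fanout counts from the commit list is elided.

```c
#include "git-compat-util.h" /* put_be32() */
#include "csum-file.h"       /* struct hashfile, hashwrite() */

/*
 * Simplified stand-in for the experiment, not the actual patch:
 * build the 256-entry fanout table (4 bytes per entry, 1024 bytes
 * total) in a local buffer and hand it to hashwrite() in one call
 * instead of 256 calls.
 */
static void write_fanout_buffered(struct hashfile *f,
				  const uint32_t fanout[256])
{
	unsigned char buf[256 * sizeof(uint32_t)];
	int i;

	for (i = 0; i < 256; i++)
		put_be32(buf + i * sizeof(uint32_t), fanout[i]);

	hashwrite(f, buf, sizeof(buf));
}
```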
It performed slightly better than master (taking about 70 ms less), so I assumed the fanout table was simply too small to make a noticeable difference. After all, we were writing around 50 megabytes for the complete graph, of which the fanout table was a measly 1024 bytes (< 0.002%).
I continued down the same path by loading commit OIDs into the buffer. I was awestruck when it took longer to write the graph this time (about 50 ms more than master)! Analyzing further, I found:
- hashwrite() is pretty smart. Calling hashwrite does not necessarily imply an xwrite() call: hashwrite keeps an internal buffer of 8192 bytes and flushes it only when there is not enough space left in the buffer (a simplified sketch of this buffering follows the list).
- Not passing a buffer of the same size as the hashfile buffer would require a memcpy over at least 3N/2 bytes, compared to the current N bytes.
- Commit OIDs are 20 bytes long (for SHA-1) and do not fit neatly into the 8192-byte buffer, since 8192 is not a multiple of 20. Likewise, each CDAT record is 36 bytes long and does not divide it evenly either.
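Here is a toy illustration of that buffering behaviour. It is a sketch of the idea only, not the code in csum-file.c, which also checksums the data and handles cases this version ignores.

```c
#include <string.h>

#define TOY_BUFFER_SIZE 8192

/*
 * Toy model of hashwrite()'s buffering idea: bytes accumulate in an
 * 8192-byte buffer, and only a completely full buffer triggers an
 * actual flush to disk.
 */
struct toy_hashfile {
	unsigned char buffer[TOY_BUFFER_SIZE];
	unsigned int offset; /* bytes currently sitting in the buffer */
};

static void toy_flush(struct toy_hashfile *f)
{
	/* In Git, this is where the buffered bytes get hashed and written out. */
	f->offset = 0;
}

static void toy_hashwrite(struct toy_hashfile *f, const void *buf, unsigned int count)
{
	const unsigned char *data = buf;

	while (count) {
		unsigned int space = TOY_BUFFER_SIZE - f->offset;
		unsigned int chunk = count < space ? count : space;

		/* Caller bytes are memcpy'd into the internal buffer... */
		memcpy(f->buffer + f->offset, data, chunk);
		f->offset += chunk;
		data += chunk;
		count -= chunk;

		/* ...and a flush happens only when the buffer is completely full. */
		if (f->offset == TOY_BUFFER_SIZE)
			toy_flush(f);
	}
}
```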
Measuring the performance of hashwrite, I found:
- Number of flushes: 6379
- The average number of bytes flushed: 8190
- Maximum number of bytes flushed: 8192
Hashwrite wastes only one flush for the Linux repository! With an average of 8190 bytes per flush against an 8192-byte buffer, nearly every flush is full, and 6379 flushes of roughly 8 KB each account for the ~50 megabytes written. I could reduce the number of function calls to hashwrite (and there are around five calls for each commit), but any savings are not going to come from “reduced” I/O operations.
Profile before you optimize.
Future Plans
With the second report, storing corrected commit dates in GDAT while still computing topological levels seems like a no-brainer. I have started working on the patch and will push it to the mailing list after some discussion on the report.