GSoC - Weeks 1, 2
Jun 14, 2020 by Abhishek Kumar
Over the first two weeks of GSoC, I have been working on moving the struct members generation and graph_pos out of struct commit. You can find the patch and the subsequent discussion for versions 1 and 2 and for versions 3 and 4.
Let’s take a moment to talk about commit-slab. A commit slab dynamically associates commits with some kind of information. For example, the indegree commit slab stores the indegree of a commit when running the topological sort, the generation commit slab stores the generation number of a commit, and so on.
Commit slabs are neat because they reduce the memory footprint. The generation commit slab allocates memory only when generation numbers are assigned. For large repositories, such savings can be pretty nifty.
But this, of course, comes at a cost. A struct member access is much cheaper than accessing the same member through commit-slab.
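To make the trade-off concrete, here is a minimal sketch of a slab-like structure. It is not git's implementation: the real commit-slab is generated by macros in commit-slab.h and stores its data in fixed-size chunks, while this toy version uses one growable array. All names here (toy_commit, toy_slab, toy_slab_at, toy_slab_peek) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct commit: only the slab key is kept. */
struct toy_commit {
	unsigned int index; /* unique per commit, used to index the slab */
};

/* A toy slab mapping commit index -> uint32_t, grown on demand. */
struct toy_slab {
	uint32_t *values;
	size_t alloc; /* number of entries currently allocated */
};

/* Return a pointer to the entry for `c`, growing the slab if needed. */
uint32_t *toy_slab_at(struct toy_slab *s, const struct toy_commit *c)
{
	if (c->index >= s->alloc) {
		size_t new_alloc = s->alloc ? s->alloc : 16;
		uint32_t *grown;

		while (new_alloc <= c->index)
			new_alloc *= 2;
		grown = realloc(s->values, new_alloc * sizeof(*grown));
		if (!grown)
			abort();
		/* Newly added entries start out zeroed. */
		memset(grown + s->alloc, 0,
		       (new_alloc - s->alloc) * sizeof(*grown));
		s->values = grown;
		s->alloc = new_alloc;
	}
	return &s->values[c->index];
}

/* Return the entry for `c` without allocating, or NULL if out of range. */
uint32_t *toy_slab_peek(const struct toy_slab *s, const struct toy_commit *c)
{
	return c->index < s->alloc ? &s->values[c->index] : NULL;
}
```

Nothing is allocated until the first toy_slab_at() call, which is where the memory saving comes from. Every access, however, pays for a bounds check and an extra pointer dereference, whereas a plain struct member is a single load at a fixed offset.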
As discovered by SZEDER [1], specific commands like “git merge-base --is-ancestor HEAD~50000 HEAD” were slower by nearly 40%!
Dr. Stolee believes the slowdown has to do with the underlying algorithm, rather than the slower commit slab access [2].
|Test|Time taken by Master|Time taken by Version 4|Change in Time (%)|
|---|---|---|---|
|git merge-base --is-ancestor HEAD~50000 HEAD|0.787s|0.927s|+17.79%|

|Test|Max RSS, Master|Max RSS, Version 4|Change in Max RSS (%)|
|---|---|---|---|
|git merge-base --is-ancestor HEAD~50000 HEAD|177694kb|177707kb|+0.01%|
As for my own experience, over these weeks I have grown more comfortable with the build process and with hacking around the codebase. While the implementation is straightforward, I had to spend a lot of time debugging tests.
For example, consider the idea “commits within the graph have definite generation numbers”. So, if a commit is not from the graph, it must have an infinite generation number.
This fails when we think about writing the commit-graph. Commits that have not been written (yet) already have a generation number, but they are also not from the graph.
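The pitfall can be sketched in a few lines. This is a simplified illustration, not the code from the series: the struct, the function names, and the TOY_ constants are hypothetical, loosely modelled on the "not from graph" and "infinite generation" sentinels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinels, for illustration only. */
#define TOY_GENERATION_INFINITY 0xFFFFFFFFu
#define TOY_NOT_FROM_GRAPH      0xFFFFFFFFu

/* Hypothetical per-commit data that would live in a slab. */
struct toy_graph_data {
	uint32_t generation;
	uint32_t graph_pos; /* TOY_NOT_FROM_GRAPH until the commit is written */
};

/* The tempting shortcut: infer the generation from graph membership. */
uint32_t naive_generation(const struct toy_graph_data *d)
{
	if (!d || d->graph_pos == TOY_NOT_FROM_GRAPH)
		return TOY_GENERATION_INFINITY;
	return d->generation;
}

/* Alternative: only a missing slab entry means "infinite". */
uint32_t stored_generation(const struct toy_graph_data *d)
{
	if (!d)
		return TOY_GENERATION_INFINITY;
	return d->generation;
}
```

For a commit that is queued for writing, with generation 5 but no graph position yet, naive_generation() reports infinity and hides the computed value, while stored_generation() returns 5. This is only one possible way to separate the two cases.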
Likewise, consider the following idea: maintaining a count of “parsed commits so far” for a repository should provide a unique value, which can then be used as an index for commit-slab. This seems reasonable, but it’s true only for single-repository setups that have no submodules at all! With submodules, each repository keeps its own count, so commits from different repositories can end up with the same index. The build failed the tests for diff-submodules.
The commit-slab has been used for the following:
contains_cache, commit_seen, indegree_slab, author_date_slab, commit_base, commit_pos, bloom_filter_slab, buffer_slab, commit_rev_name, commit_names, commit_name_slab, saved_parents, blame_suspects, commit_todo_item
None of these had failed so far.
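The collision between repositories is easy to reproduce in miniature. The sketch below is an assumption-laden toy, not git's code: toy_repo, per_repo_index(), and global_index() are hypothetical names standing in for a per-repository counter and a process-wide counter.

```c
#include <assert.h>

/* Hypothetical repository object holding its own commit counter. */
struct toy_repo {
	unsigned int commit_count;
};

/* Per-repository counter: unique within one repository only. */
unsigned int per_repo_index(struct toy_repo *r)
{
	return r->commit_count++;
}

/* Process-wide counter: unique across every repository in the process. */
unsigned int global_index(void)
{
	static unsigned int count;

	return count++;
}
```

With a superproject and a submodule loaded in the same process, the first commit parsed in each gets index 0 from per_repo_index(), so both commits would share one slot in a slab keyed by that index. global_index() hands out distinct values regardless of which repository a commit belongs to.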
As v3 was queued into next, I will polish the current series a bit more: solving the global counter issue with alloc_commit_index() more cleanly by dropping commit_count from parsed_object_pool, and fixing the segfault when allocating new slabs [6].
Once that’s done, I will focus entirely on “handling commit-graph format change”.
Thanks to Dr. Stolee, Dr. Jakub, Junio, and Szeder Gábor for their interest in the patch series.