GSoC - Week 3

Jun 21, 2020 by Abhishek Kumar

Over the third week, I made progress on a crucial prerequisite for implementing generation number v2: using corrected commit dates with monotonically increasing offsets without increasing the generation number version.

Overview

As discovered by Ævar Arnfjörð Bjarmason, Git versions 2.21 and earlier die when the graph version in the commit-graph file does not match the GRAPH_VERSION macro in commit-graph.c. We would rather have Git benefit from the commit-graph file and switch off generation-number-related improvements than error out completely.
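To make the compatibility concern concrete, here is a minimal Python sketch of how a reader might walk the commit-graph header and chunk lookup table. It is an illustration, not Git's C code: the function name `read_chunk_table` and the `KNOWN_CHUNKS` set are hypothetical, and the layout (an 8-byte header followed by 12-byte table entries) follows my reading of the commit-graph format documentation. A version mismatch aborts the whole read, while an unrecognized chunk ID is simply skipped.

```python
import struct

# Chunk IDs this hypothetical "old" reader understands.
KNOWN_CHUNKS = {b"OIDF", b"OIDL", b"CDAT", b"EDGE"}
SUPPORTED_GRAPH_VERSION = 1


def read_chunk_table(data):
    """Return {chunk_id: offset} for the chunks this reader recognizes."""
    # Header: signature, graph version, hash version, chunk count, base graph count.
    signature, version, _hash_version, num_chunks, _num_base = struct.unpack_from(
        ">4sBBBB", data, 0
    )
    if signature != b"CGPH":
        raise ValueError("not a commit-graph file")
    if version != SUPPORTED_GRAPH_VERSION:
        # Old clients give up here, which is why bumping the version is costly.
        raise ValueError("unsupported graph version %d" % version)

    table = {}
    position = 8
    for _ in range(num_chunks):
        # Each entry: 4-byte chunk ID followed by an 8-byte file offset.
        chunk_id, offset = struct.unpack_from(">4sQ", data, position)
        position += 12
        if chunk_id in KNOWN_CHUNKS:
            table[chunk_id] = offset
        # Unknown (optional) chunks are ignored rather than treated as errors.
    return table
```

Under these assumptions, a file that keeps graph version 1 but carries an extra chunk stays readable by such a client, whereas a file with graph version 2 does not.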

We could increment the graph version (as with implementing generation number v2) only after old Git clients are no longer in use. Instead, we could create additional, optional chunks to signal the use of corrected commit dates while still maintaining graph version 1. The following alternatives were discussed:

Generation Data Chunk

The fact that generation numbers must fit within 30 bits of the CDAT chunk is restrictive. We could move generation numbers into a new optional chunk (“GDAT”), eliminating the restriction. Using a generation data chunk would be the clean and future-proof solution, allowing more extensive and creative implementations of generation numbers.

However, we would spend more time on I/O while parsing commits, as the commit data and generation data would no longer be sequential.
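The 30-bit restriction can be illustrated with a small Python sketch. It assumes, following my reading of the commit-graph format documentation, that each CDAT entry packs the generation number into the upper 30 bits of a 64-bit field and the commit time into the lower 34 bits. The function names are hypothetical and chosen for this example only.

```python
GENERATION_BITS = 30
COMMIT_TIME_BITS = 34
GENERATION_MAX = (1 << GENERATION_BITS) - 1  # largest value that fits in 30 bits


def pack_generation_and_time(generation, commit_time):
    """Pack both values into one 64-bit integer, generation in the high bits."""
    if not 0 <= generation <= GENERATION_MAX:
        raise ValueError("generation number does not fit in 30 bits")
    if not 0 <= commit_time < (1 << COMMIT_TIME_BITS):
        raise ValueError("commit time does not fit in 34 bits")
    return (generation << COMMIT_TIME_BITS) | commit_time


def unpack_generation_and_time(word):
    """Split a packed 64-bit integer back into (generation, commit_time)."""
    return word >> COMMIT_TIME_BITS, word & ((1 << COMMIT_TIME_BITS) - 1)
```

A topological level such as 250000 fits comfortably, but a full corrected commit date (for example, a 2020 Unix timestamp like 1592697600) is larger than `GENERATION_MAX` and would be rejected. That is the restriction a separate chunk would lift.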

Metadata Chunk

We could introduce an optional metadata chunk to store the generation number version as metadata and store the offsets in CDAT. Since generation number v2 is backward compatible, old clients, which would not read the metadata chunk and would consider the offsets to be topological levels, would still work. Newer clients would correctly understand them to be corrected commit date offsets and add the commit date to form generation numbers for comparison.

While the metadata chunk would perform just as well as generation number v1 mechanically, it skirts around the limitation on the size of generation numbers.
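The offset scheme from this section can be sketched in Python. This is my own reading of "corrected commit dates with monotonically increasing offsets", not the prototype's code: each commit gets the smallest offset that keeps both the offset and the corrected date (commit date plus offset) strictly greater than those of every parent. The function name `compute_offsets` and the input layout are hypothetical, and starting root commits at offset 1 is an assumption that mirrors topological levels starting at 1.

```python
def compute_offsets(commits):
    """Compute offsets and corrected dates.

    commits: list of (name, commit_date, parent_names), parents listed first.
    Returns (offsets, corrected_dates) as two dicts keyed by commit name.
    """
    offsets = {}
    corrected = {}
    for name, date, parents in commits:
        # Corrected date must exceed every parent's corrected date.
        min_corrected = max([date] + [corrected[p] + 1 for p in parents])
        # Offset must exceed every parent's offset; roots start at 1 (assumption).
        min_offset = max([1] + [offsets[p] + 1 for p in parents])
        offset = max(min_corrected - date, min_offset)
        offsets[name] = offset
        corrected[name] = date + offset
    return offsets, corrected


# Example history with clock skew: B is dated earlier than its parent A.
history = [
    ("A", 100, []),
    ("B", 90, ["A"]),
    ("C", 200, ["B"]),
]
offsets, corrected = compute_offsets(history)
```

In this example the offsets increase along the history even though B's commit date is skewed, so a client that treats them as topological levels still sees a child ranked above its parent, while a newer client can add the commit date back to compare corrected dates.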

Performance

In deciding between the generation data chunk and the metadata chunk, it’s essential to understand the performance difference between them.

| Command | Time (master) | Time (metadata) | Time (generation data) |
|---|---|---|---|
| git commit-graph write | 14.45s | 14.28s | 14.63s |
| git log --topo-order -10000 | 0.211s | 0.206s | 0.208s |
| git log --topo-order -100 A..B | 0.019s | 0.015s | 0.015s |
| git merge-base A..B | 0.137s | 0.137s | 0.137s |

While both the metadata and generation data approaches perform better than generation number v1 when using the commit-graph, metadata outperforms both generation number v1 and the generation data chunk in writing commit-graph files.

You can find a rough prototype here.