GSoC - Weeks 9, 10

Aug 9, 2020 by Abhishek Kumar

Over the last two weeks, I worked on handling mixed commit-graph chain and adding tests for the new features.

I apologize for the rather slow progress and missing out on the blog post for Week 9 - My college has re-commenced classes on 3rd August, and the transition to online learning has been chaotic and uncertain.

Mixed Commit Graph Chains

By the end of week 8, our patch series could be reliably used for single Git environments. However, real life is not as rosy, and we need to handle mixed generation chains to ensure backward compatibility and a smooth transition to newer versions of Git.

A mixed generation chain contains split commit-graph files with two different generation number definitions.

As we cannot compare corrected commit dates and topological levels, we need to avoid mixed generation chains and modify our read behavior from an “if it has generation data chunk, read corrected commit date. Otherwise, read topological level”.

The read behavior now needs the context of the entire commit-graph chain, as we need to identify if any two commit-graph files have two different generation number definitions.

At first thought, I iterated through commit-graph files from the tip-checking if adjacent pairs had different definitions. If no pair had different definitions, then the entire chain has the same generation number version and can be compared safely. If not, disable generation numbers.

Dr. Stolee improved upon my idea as we can still make use of topological levels present in each commit-graph file as a common generation number and don’t need to disable to generation numbers.

The write behavior for split commit-graph chains is a bit tricky, as follows:

  1. Read and parse existing commit-graph files - This helps us avoid re-computing generation numbers for commits already in commit-graph files.
  2. Compute topological level and corrected commit dates for commits not in the graph.
  3. Write generation data chunk into the new tip if the existing tip has a generation data chunk.
  4. If writing split commit-graph chain with replace strategy, write generation data chunk.

In the reviews to the second version, Dr. Stolee pointed out a crucial flaw - Our code makes the assumption “commit-graph file has generation data chunk, so it must store corrected commit date offsets” which is incorrect. If we check out to cb797e, we can write generation data chunk that stores topological levels and, therefore, interprets the values stored as topological levels.

As our problem is “we write topological levels into generation data chunk”, we can re-order the patches to introduce generation data chunk after implementing corrected commit dates, and write offsets to generation data chunk. Thus, there will never be a version of Git that writes topological levels into generation data chunk.

Adding Tests

With the patch series, we introduce two new premises:

While writing tests for these, I hit a wall. Hard. I have never written tests before thoughtfully and comprehensively, never to the rigor of the Git’s test suite.

I was stuck at deciding what options to go for:

  1. Will we control read or write behavior (i.e., ignore the existence of GDAT chunk or not write it in the first place).
  2. Would it be an environment variable, configuration variable, or a command-line argument?
  3. How would the tests be structured?
  4. Should I test corrupting both CDAT and GDAT chunk?

After constantly revising my changes and making no progress, I reached out to Dr. Stolee and Dr. Narębski for guidance.

Dr. Stolee’s advice and encouragement helped me to approach the problem anew. I didn’t have to write tests to check if New Git can use topological levels, as I could leverage tests already in t6600-test-reach. While extending verify subcommand to validate both topological levels and corrected commit dates would have been useful, the current patch series can progress without it too.

Futute Plans

I hope to finish code re-organization and publish the third version of the patch series by Friday night. With that done, I can work on improvements left out of the series - extending tests for both CDAT and GDAT as a part of a separate patch series.

Thanks to Dr. Stolee, Dr. Narębski, and Taylor for continued reviews and guidance.