At work, we develop in GitHub pull requests that get merged to the main branch. We also have twice-yearly community release branches, and a small fraction of the main-branch changes need to be copied onto the current release branch. Trying to automate choosing the commits to cherry-pick led me into some git and GitHub complexities.
GitHub has three different ways to finish up a pull request, which complicates the process of figuring out what to cherry-pick. Before getting into cherry-picking, let’s look at the three finishes to pull requests. Suppose we have five commits on the main branch (A-B-C-D-E), and a pull request for a feature branch started from B with two commits (F-G) on it:
The F-G pull request can be brought into the main branch in three ways. First, the F-G commits can be merged to main with a merge commit:
Second, the two commits can be rebased onto main as two new commits Fr-Gr (for F-rebased and G-rebased):
Lastly, the two commits can be squashed down to one new commit FGs (for F and G squashed):
Note that for rebased and squashed pull requests, the original commits F-G will not be reachable from the main branch, and will eventually disappear from the repo, indicated by their dashed outlines.
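To make the reachability point concrete, here is a minimal sketch that models the three finishes as toy commit graphs. The dictionaries, the merge commit name M, and the `reachable` helper are all made up for illustration; this is not how git stores history, just the same parent-pointer idea in miniature.

```python
def reachable(graph, tip):
    """Return the set of commits reachable from tip by following parents."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen


# Shared history: A-B-C-D-E on main, with F-G branching from B.
# Each commit maps to its list of parents.
base = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"],
    "F": ["B"], "G": ["F"],
}

merged = {**base, "M": ["E", "G"]}             # merge commit with two parents
rebased = {**base, "Fr": ["E"], "Gr": ["Fr"]}  # two new commits on top of E
squashed = {**base, "FGs": ["E"]}              # one new commit on top of E

# The tip of main differs for each finish.
for name, graph, tip in [
    ("merged", merged, "M"),
    ("rebased", rebased, "Gr"),
    ("squashed", squashed, "FGs"),
]:
    on_main = reachable(graph, tip)
    print(name, "F on main:", "F" in on_main, "G on main:", "G" in on_main)
```

Only the merged graph keeps F and G reachable from the tip of main. In the other two, nothing on main points back at them.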
Now let’s consider the release branch. This is a branch made twice a year to mark community releases of the platform. Once the branch is made, some fixes need to be cherry-picked onto it from the main branch. We can’t just merge the fixes, because that would bring the entire history of the main branch into the release. Cherry-picking lets us take just the commits we want.
As an example, here E has been cherry-picked as Ec:
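In command form, that cherry-pick might be built like the sketch below. The function name and the placeholder commit id are hypothetical. The `-x` option is a real `git cherry-pick` flag that appends a "(cherry picked from commit ...)" line to the new commit message, which is one way to keep a pointer back to the main-branch commit.

```python
def cherry_pick_command(shas):
    """Build a git command that cherry-picks the given commits, oldest first."""
    if not shas:
        raise ValueError("need at least one commit to cherry-pick")
    # -x records the original commit id in each new commit message.
    return ["git", "cherry-pick", "-x", *shas]


# "E" stands in for the real commit id of E on main.
print(cherry_pick_command(["E"]))
```

Actually running it would take something like `subprocess.run(cmd, check=True)` inside a checkout of the release branch.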
The question now is:
To get the changes from a finished pull request onto the release branch, what commits should we cherry-pick?
The two rules are:
- The commits should make the same change to the release branch that was made to the main branch, and
- The commits should be reachable from the main branch, in case we need to later investigate how the changes came to be.
GitHub doesn’t record what approach was used to finish a pull request (unless I’ve missed something). It records what it calls the “merge commit”. For a merged pull request, this is the actual merge commit. For rebased and squashed pull requests, it’s the final commit that ended up on the main branch.
In the case of a merged pull request, the answer is easy: cherry-pick the two original commits in the pull request. We can tell the pull request was merged because the merge commit (with a thicker outline) has two parents (it’s actually a merge):
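The merged case can be sketched as a parent-count check. The function names and inputs here are hypothetical: `recorded_parents` is the list of parents of the recorded “merge commit”, and `pr_commits` is the pull request's own commits, oldest first. How those lists get fetched is left out.

```python
def is_real_merge(parents):
    """A commit with two or more parents is a genuine merge."""
    return len(parents) >= 2


def picks_for_merged_pr(recorded_parents, pr_commits):
    """For a truly merged pull request, pick its original commits."""
    if not is_real_merge(recorded_parents):
        raise ValueError("recorded commit is not a merge; needs other handling")
    return list(pr_commits)


# The merge commit from the example has parents E (main) and G (the branch).
print(picks_for_merged_pr(["E", "G"], ["F", "G"]))
```

With real git, `git rev-list --parents -n 1 <sha>` prints a commit followed by its parents, which is one way to get the parent list.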
But for rebased and squashed pull requests, the answer is not so simple. We can tell the pull request wasn’t merged, because the recorded “merge commit” isn’t a merge. Somehow we have to figure out how many commits, counting back from the recorded “merge commit”, are the right ones to take. For a rebased pull request we’d like to cherry-pick as many commits as the pull request had:
And for a squashed pull request, we want to cherry-pick just the one squashed commit:
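If we somehow knew which finish was used, choosing the commits would just be a walk back from the recorded “merge commit”. Here is a rough sketch, assuming a made-up `first_parent` mapping from each commit to its first parent, and a count `n` that is the pull request's commit count for a rebase or 1 for a squash:

```python
def last_n_on_main(first_parent, tip, n):
    """Return n commits ending at tip, oldest first, following first parents."""
    commits = []
    commit = tip
    for _ in range(n):
        if commit is None:
            raise ValueError("ran out of history before finding n commits")
        commits.append(commit)
        commit = first_parent.get(commit)
    commits.reverse()  # cherry-pick wants oldest first
    return commits


main_history = {"B": "A", "C": "B", "D": "C", "E": "D"}

rebased_main = {**main_history, "Fr": "E", "Gr": "Fr"}
print(last_n_on_main(rebased_main, "Gr", 2))    # the rebased pull request

squashed_main = {**main_history, "FGs": "E"}
print(last_n_on_main(squashed_main, "FGs", 1))  # the squashed pull request
```

The hard part is still choosing `n`, since the function can't tell a rebase from a squash on its own.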
But how to tell the difference between these two situations? I don’t know the best approach. Maybe comparing the commit messages? My first attempt was to compare the counts of added and deleted lines. If the merge commit changes as many lines as the pull request as a whole, then just take that one commit. But that could be wrong if a rebased pull request had overlapping commits, and the last commit changed all the lines.
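Here is a small sketch of that line-count heuristic, along with the case that fools it. The `(added, deleted)` tuples are invented numbers, and where they would come from is not shown.

```python
def looks_squashed(merge_commit_stats, pr_stats):
    """Guess "squashed" if the recorded commit changes as much as the whole PR.

    Both arguments are (added, deleted) line counts.
    """
    return merge_commit_stats == pr_stats


# The pull request as a whole rewrites 10 existing lines.
pr_total = (10, 10)

# Squashed: the single commit on main carries the whole change.
print(looks_squashed((10, 10), pr_total))  # True, correctly

# Rebased, typical: the last commit is only part of the change.
print(looks_squashed((4, 1), pr_total))    # False, correctly

# Rebased, overlapping: F rewrote the 10 lines, then G rewrote the same
# 10 lines again, so Gr alone has the same counts as the whole PR.
print(looks_squashed((10, 10), pr_total))  # True, but the PR was rebased
```

The first and third calls get identical inputs, so no rule based only on these counts can tell them apart.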
Is there some bit of information I’ve overlooked? Does git or GitHub have a way to unambiguously distinguish these cases?