gitcoin: the author of the commit sha1 with the longest prefix of 0’s in your repository is now the project maintainer
The genius in the tweet is how it draws a comparison to Bitcoin’s approach to achieving distributed consensus with achieving consensus on choosing a project maintainer.
With Bitcoin, there’s a proof-of-work algorithm that relies on generating SHAs until you find one with a certain number of leading zeroes. Git commit SHAs could perhaps serve a similar purpose.
Is this a good way to pick a project maintainer? Probably not. But then again, it’s not that far off from how I make most important life decisions. If your project wants to take a walk on the wild side, I’ve got just the command for you.
A simple solution
Run the following command in a Git repository and it’ll return the name of the author, the commit date, and the SHA of the commit that has the lowest SHA sorted lexicographically.
git log --pretty=format:"%H %ad %an" | sort | head -n1
Or, if you prefer a Git alias:
[alias] coin = !git log --pretty=format:'%H %ad %an' | sort | head -n1
The SHA that results will have the most leading zeroes in the repository. There may be other commits with the same number of leading zeroes, but for the sake of this thought exercise, I’ll just pick the one that’s sorted first.
For those not familiar with the
git log command, there are a gaggle of options. I’ll break down this specific invocation.
--pretty=format option takes a custom format string that specifies the contents of the output.
%H is the commit SHA.
%ad is the commit date and
%an is the author name.
We pipe that to the
sort command. Since no two SHAs can be the same, we don’t have to worry about sorting on just the first column. We can just sort using the entire line as the sort key. Then we use
head -n1 to pluck the first item.
It’s possible that there won’t be any commits with leading 0s, but I ignore that for now. I figure the commit with the lowest SHA sorted alphabetically fits with the spirit of the idea.
Since github.com runs on Ruby on Rails, I thought it’d be fun to try it out on https://github.com/rails/rails. I cloned the repository to my machine and ran
git coin on it. Here’s the output (SHA truncated for presentation purposes):
000121e8 Thu Nov 15 23:10:45 2012 +0200 Agis Anastasopoulos
Congratulations Agis! You are the new maintainer of Rails!
Not so fast!
I know what some of you are thinking, “You are ridiculous. This is a waste of time.” To those I say, hold my beer because I’m not done yet.
Others who are familiar with Bitcoin’s consensus protocol are thinking, “This is not how the protocol works. It’s not about choosing the lowest sorted SHA, it’s about reaching a target number of leading zeros.” To those I say, you’re taking this too seriously!
Even so, in anticipation of all the “Well, Actually” responses I’m sure to receive, I’ll address this fair point.
With Bitcoin, the first miner to generate a SHA with the target number of leading zeros is the one to add their block to the global blockchain.
I’m hand waving a bit here for the sake of brevity. The important point is that it’s not the block with the lowest SHA. It has nothing to do with the sorting of SHAs.
Over time, the protocol compensates for the global increase in computing power by increasing the number of leading zeros in the proof of work target. That way a block is added roughly every ten minutes no matter how fast computers get and no matter how many computers are mining.
If we translate this to the gitcoin idea, we probably want to look at the first commit to reach the most number of leading zeros.
For example, say that the current maintainer was chosen because of a commit with a SHA that has two leading zeros. The next maintainer is chosen by the commit that has three (or more) leading zeroes. The next maintainer after that is chosen by the first commit with one more leading zero than the commit that chose the previous maintainer. And so on.
In other words, every time a new maintainer is chosen by this protocol, the target number of leading zeroes increases by one. The implication is that over time, maintainers chosen by this project will spend more and more time maintaining the project. Not sure that’s necessarily a desirable trait or not.
The shell script to find the maintainer with these rules is considerably more complex than my previous script. This is why I originally wanted to stop with that script and call it a day. Also, I’m lazy.
Not to mention, my background is primarily with Windows so my Unix-fu is fairly weak. However, my time at GitHub working with Git has helped me exercise those muscles quite a bit more than I did before. So I thought it’d be fun to give it a shot. Here’s the script I came up with:
TZ=UTC git log --pretty=format:'%H%x09%ad%x09%an' --date=iso-local | grep ^0.* | sed -E 's/(0+)(.*)/t/' | sort -k1,1 -k3,3r | tail -n1 | cut -f 2,3,4
And the award goes to…
When I run this against the Rails repository, it outputs (again, SHA truncated for presentation purposes):
00050dfe 2006-04-09 21:27:32 +0000 David Heinemeier Hansson
Sorry Agis, this David Heinemeier Hansson person is now the Rails maintainer! I hope David accepts this responsibility seriously.
The excrutiatingly detailed breakdown
Let’s break this down piece by piece for those of you like me who don’t eat and breath shell scripting.
The first thing we do is set the local timezone to UTC (
TZ=UTC) so we can sort by date and compare apples to apples.
git log --pretty=format:'%H%x09%ad%x09%an' --date=iso-local
Just like before, we’re running a git log command. It looks ugly, but all I’m doing here is using tab character
%x09 in place of spaces. That’ll come
in handy later. I also specify that the date format should be
iso-local. This provides a date that’s sortable lexicographically. We’ll need that later too.
I then pipe all these results to
grep with a very simple regex. We’re only looking at commit SHAs that start with at least one
sed -E 's/(0+)(.*)/t/'
Sed is a powerful command used to perform text transformations on an input stream. In this case, we’re using the
s/ command which is a regex replacement. The
-E indicates that sed should use extended regular expressions. What I’m doing here is extracting the consecutive sequence of
0s that the SHA starts with as a new column in the output.
So if the
git log command we ran earlier returned something like this (SHAs and name truncated for presentation purposes):
005371e1 2004-12-01 13:59:16 +0000 David 0daa29ec 2004-12-01 13:18:51 +0000 David 08a2249e 2004-11-26 02:16:05 +0000 David
Piping this output to this sed expression results in (name truncated for brevity):
00 005371e1 2004-12-01 13:59:16 +0000 David 0 0daa29ec 2004-12-01 13:18:51 +0000 David 0 08a2249e 2004-11-26 02:16:05 +0000 David
That format is pretty handy because we can sort this by the first column which is the number of zeroes a SHA has as its prefix. This would short the commits from those with the least leading zeroes to the most.
This will also group all SHAs with the same number of leading zeroes together. Then we can sort by the date column to find the first commit in any such group.
sort -k1,1 -k3,3r
Does exactly that. One thing that tripped me up when I first worked on this is I thought I should be able to
sort -k1 -k3r. The
-k option specifies a sort key. By default, when you specify a column, it takes that column and all columns after it as the sort key. Thus
-k1 is pretty much equivalent to not specifying a sort key at all as it sorts by the whole line.
Fortunately, you can specify an end column for the sort key using the comma. So
-k1,1 sorts just by the first column. Whereas
-k1,3 would take the first three columns as a sort key.
-k3,3r indicates to sort that column in reverse. After all, we want the first commit among the group of commits with the most leading zeroes.
Now that we have the proper sort in play, we just need to take the last entry. That’ll be the oldest commit with the most leading zeroes.
cut -f 2,3,4
And finally, we don’t need the leading zeroes column in the final output so I run the
cut command and only keep columns 2, 3, and 4. This is where inserting the tabs before comes in handy. By default,
cut uses the tab character as a delimiter.