Counting Github commits

The Github API enables you to do some pretty awesome things with their data. You can query for information and use it to build statistics or you can interface into their system and make changes to your repositories.

I spent some time creating a Github project comparison tool as a way to help compare two projects hosted on Github. While building it I ran into an interesting problem that I will discuss here: how to count things.

The Github API

image

The API Github provides is fairly straight-forward and well documented. For example you can get a set of commits for a repository by simply doing a GET request to:

https://api.github.com/repos/bayne/github-compare/commits

Which returns something like:

image

Pagination

Typically when you make a request to a URI that does not end in an ID, you are going to receive a collection. So in the above example a collection is returned.

Another thing, the Github API automatically paginates long collections. The pagination can be accessed by two parameters: per_page and page. Traversing the collection requires diving into the response headers:

Link:<https://api.github.com/repositories/458058/issues?page=2>; rel="next", <https://api.github.com/repositories/458058/issues?page=26>; rel="last"

By using the Link header, you are able to traverse the collection.

Getting the count

Getting the count of the collection requires getting the entire collection, which can be an expensive operation. However by using the per_page parameter we can utilize a little hack to make it cheap.

Set the per_page parameter to 1

https://api.github.com/repos/symfony/symfony?per_page=1

Which gives this in your response header

Link:<https://api.github.com/repositories/458058/issues?per_page=1&page=2>; rel="next", <https://api.github.com/repositories/458058/issues?per_page=1&page=760>; rel="last"

Now you know exactly how many issues are in that repository. By looking at the last link and the value of the page parameter, you are able to count the number in that collection without traversing.

Getting the count without “last”

What if you don’t have the link to the last page? Unfortunately the commits endpoint doesn’t provide you with the last page.

Counting the number of commits (efficiently) is a bit more involved. By thinking back to my Computer Science backround the best way to do this is to divide-and-conquer or a binary search. The binary search has to be two step though since a binary search requires a ceiling (this does not have one).

The first part of the algorithm requires finding the ceiling, which from my intuition is best to do exponentially (optimally would probably be to fit it to some heuristic based on standard distribution). Then from there do a binary search for the page that is only partially filled.

github-services.js

function binarySearch(lower, upper, callback) {
  var deferred = $q.defer();
  function recurse(lower, upper) {
    var midpoint = Math.ceil((lower+upper)/2);
    callback(midpoint).then(function (result) {
      if (result < 0) {
        recurse(midpoint + 1, upper);
      } else if (result > 0) {
        recurse(lower, midpoint - 1);
      } else {
        deferred.resolve(midpoint);
      }
    });
    return deferred.promise;
  }
  return recurse(lower, upper);
}

var compare = function (page) {
  var deferred = $q.defer();
  $http.get(
    urlParser(url)
      .addToSearch('page', page)
      .url+'&per_page=100',
    {stopPropagate: true}
  ).then(function (response) {
      var next = parse_link_header(response.headers('Link')).next;
      var result;
      if (response.data.length === 0) {
        result = 1;
      } else if (response.data.length == 100 && next !== undefined) {
        result = -1;
      } else {
        result = 0;
      }
      deferred.resolve(result);
  });
  return deferred.promise;
};

getUpperbound(compare).then(function (upperBound) {
  binarySearch(1, upperBound, compare).then(function (lastPage) {
    $http.get(
      urlParser(url)
        .addToSearch('page', lastPage)
        .url+'&per_page=100',
      {stopPropagate: true}
    ).then(function (response) {
        deferred.resolve(response.data.length+(lastPage-1)*100);
    });
  });
});