List GitHub Repositories with Search Term

Context

I recently needed to update code that was common across dozens of GitHub repositories. The changes themselves were easy; the challenge was getting the list of repositories that needed them. GitHub provides code search, but it returns every matching file, so a single repository can appear many times. My search spanned hundreds of pages, when all I needed was the list of repositories!

Solution

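The GitHub API lets us fetch these results programmatically. I use two scripts for this. The first is a Python script, saved as github_search_repo_list.py (the name the shell script below calls), which queries the code search API: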

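#!/usr/bin/env python3

import os
import sys

import requests

PER_PAGE = 100  # largest page size the search API accepts
SEARCH_RESULT_CAP = 1000  # the search API only returns the first 1,000 results


def print_search_repos(organisation: str, token: str, search_term: str):
    """Print the repository URL of every code-search hit (one line per hit, so duplicates occur)."""
    page = 1
    while True:
        # Let requests URL-encode the query instead of building the URL by hand.
        result = requests.get(
            "https://api.github.com/search/code",
            params={"q": f"{search_term} org:{organisation}", "per_page": PER_PAGE, "page": page},
            headers={"Authorization": f"token {token}"},
        )
        # Fail loudly on errors (bad token, rate limit) instead of silently stopping early.
        result.raise_for_status()
        data = result.json()
        items = data.get("items", [])
        for entry in items:
            print(entry["repository"]["html_url"])
        # Stop on an empty page, after the last result, or at the API's result cap.
        if not items or page * PER_PAGE >= min(data["total_count"], SEARCH_RESULT_CAP):
            return
        page += 1


def main():
    if len(sys.argv) != 2:
        sys.exit(f"Usage: {sys.argv[0]} <search term>")

    search_term = sys.argv[1]
    token = os.environ["GITHUB_TOKEN"]
    org = os.environ["GITHUB_ORG"]
    print_search_repos(org, token, search_term)


if __name__ == "__main__":
    main()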

The Python script walks through every page of search results returned by the GitHub API and prints the repository URL for each match, in the form:

https://github.com/<org>/<repo>
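The subtle part of the scrape is knowing when to stop paging. The sketch below separates the paging from the HTTP call so the stopping rule can be checked without touching the network. It assumes each page is a dict shaped like the search API's JSON (a `total_count` plus an `items` list whose entries carry `repository.html_url`); `iter_repo_urls` and `fetch_page` are hypothetical names used only for illustration. It stops on an empty page, once `total_count` results have been covered, or at the 1,000-result ceiling the search API imposes.

```python
from typing import Callable, Iterator

# The GitHub search API only exposes the first 1,000 results of any query.
SEARCH_RESULT_CAP = 1000


def iter_repo_urls(fetch_page: Callable[[int], dict], per_page: int = 100) -> Iterator[str]:
    """Yield the repository URL of every code-search hit (duplicates included).

    fetch_page is a hypothetical callable: given a 1-based page number, it returns
    the decoded JSON body of one search/code response.
    """
    page = 1
    while True:
        data = fetch_page(page)
        items = data.get("items", [])
        for entry in items:
            yield entry["repository"]["html_url"]
        # Stop on an empty page, after the last result, or at the API's cap.
        limit = min(data.get("total_count", 0), SEARCH_RESULT_CAP)
        if not items or page * per_page >= limit:
            return
        page += 1


if __name__ == "__main__":
    # Fake two-page response standing in for the real API (hypothetical data).
    pages = {
        1: {"total_count": 3, "items": [
            {"repository": {"html_url": "https://github.com/acme/api"}},
            {"repository": {"html_url": "https://github.com/acme/web"}},
        ]},
        2: {"total_count": 3, "items": [
            {"repository": {"html_url": "https://github.com/acme/api"}},
        ]},
    }
    fake_fetch = lambda page: pages.get(page, {"total_count": 3, "items": []})
    print(list(iter_repo_urls(fake_fetch, per_page=2)))
    # → ['https://github.com/acme/api', 'https://github.com/acme/web', 'https://github.com/acme/api']
```

In real use, `fetch_page` would wrap a `requests.get` call to the search endpoint and return `response.json()`. Note that `acme/api` is yielded twice because it matched in two files, which is exactly the duplication the next step deals with.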

Because a repository is printed once for every matching file, the output contains duplicates. The following shell script takes care of them:

#!/bin/bash

export GITHUB_TOKEN="<YOUR GITHUB ACCESS TOKEN>"
export GITHUB_ORG="<YOUR GITHUB ORGANISATION>"
SEARCH_TERM="$1"

echo "Searching all repositories in $GITHUB_ORG for [$SEARCH_TERM]" >&2
python3 ./github_search_repo_list.py "$SEARCH_TERM" | sort | uniq

This shell script sets up the environment variables and calls the Python script. We then pipe the results through sort and uniq (uniq only removes adjacent duplicate lines, which is why the list is sorted first) to get a unique list of all repositories containing the search term.
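If you would rather not rely on the shell pipeline, the same de-duplication can be done in Python. This is a minimal sketch, and `unique_sorted` is a hypothetical helper name: a `set` removes duplicates regardless of order, and `sorted` then orders the result by code point, which matches `sort` under the C locale (other locales may order some characters differently). Unlike `sort | uniq`, it also drops blank lines.

```python
from typing import Iterable, List


def unique_sorted(lines: Iterable[str]) -> List[str]:
    """Return the distinct non-blank lines in sorted order, similar to `sort | uniq`."""
    return sorted({line.strip() for line in lines if line.strip()})


if __name__ == "__main__":
    # Hypothetical sample of raw script output containing a duplicate.
    hits = [
        "https://github.com/acme/web",
        "https://github.com/acme/api",
        "https://github.com/acme/web",
    ]
    print("\n".join(unique_sorted(hits)))
    # prints:
    # https://github.com/acme/api
    # https://github.com/acme/web
```

To use it as a filter on the search script's output, pass `sys.stdin` instead of the sample list.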