List GitHub Repositories with Search Term
Context
I recently needed to update code that was common across dozens of GitHub repositories. The changes were easy; the challenge was getting the list of repos that needed them. GitHub provides search functionality, but it returns every mention in code (often several per repo), and the results spanned hundreds of pages. I just needed the list of repositories!
Solution
The GitHub API lets us fetch these results programmatically. For this I have two scripts:
#!/usr/bin/env python3
import os
import sys

import requests


def print_search_repos(organisation: str, token: str, search_term: str):
    """Print the URL of the repository behind every code search hit."""
    page = 1
    while True:
        # Let requests URL-encode the query so terms with spaces or symbols work.
        result = requests.get(
            "https://api.github.com/search/code",
            params={"q": f"{search_term} org:{organisation}", "page": page},
            headers={"Authorization": f"token {token}"},
        )
        data = result.json()
        # Stop on an error response (no "items" key) or once a page comes back empty.
        if not data.get("items"):
            return
        for entry in data["items"]:
            print(entry["repository"]["html_url"])
        page += 1


def main():
    if len(sys.argv) != 2:
        raise ValueError("Please provide search term as input param")
    search_term = sys.argv[1]
    token = os.environ["GITHUB_TOKEN"]
    org = os.environ["GITHUB_ORG"]
    print_search_repos(org, token, search_term)


if __name__ == "__main__":
    main()
The Python script walks through every page of search results using the GitHub API and extracts the repository URL of each hit:
https://github.com/<org>/<repo>
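To make the extraction step concrete, here is a minimal sketch run against a hand-written payload. The sample dictionary is made up and only mimics the fields the script reads (`items`, `repository`, `html_url`); the helper name `repo_urls` and the `example-org` repositories are hypothetical.

```python
def repo_urls(payload: dict) -> list:
    """Return the repository URL for every search hit in one page of results."""
    return [entry["repository"]["html_url"] for entry in payload.get("items", [])]


# Made-up payload: two hits in the same repository, one hit in another.
sample_page = {
    "total_count": 3,
    "items": [
        {"path": "a.py", "repository": {"html_url": "https://github.com/example-org/repo-one"}},
        {"path": "b.py", "repository": {"html_url": "https://github.com/example-org/repo-one"}},
        {"path": "c.py", "repository": {"html_url": "https://github.com/example-org/repo-two"}},
    ],
}

for url in repo_urls(sample_page):
    print(url)
```

Each matching file yields one line, so a repository with several matches is listed several times.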
This script will produce duplicates, because a repository is printed once for every match found in it. We use the following shell script to help with that.
#!/bin/bash
export GITHUB_TOKEN="<YOUR GITHUB ACCESS TOKEN>"
export GITHUB_ORG="<YOUR GITHUB ORGANISATION>"
SEARCH_TERM="$1"
echo "Searching all repositories in $GITHUB_ORG for [$SEARCH_TERM]"
python3 ./github_search_repo_list.py "$SEARCH_TERM" | sort | uniq
This shell script sets up the environment variables and calls the Python script. We then pipe the results through sort and uniq to get a unique list of all repositories containing the search term.
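If you would rather not rely on the shell pipeline, the same deduplication can be sketched in Python. The helper `unique_sorted` is hypothetical and not part of the scripts above; it assumes the URLs have been collected into a list instead of printed straight away, and the `example-org` URLs are made up.

```python
def unique_sorted(urls):
    """Drop duplicate URLs and return the rest in sorted order, roughly like `sort | uniq`."""
    return sorted(set(urls))


# Made-up hits: repo-one appears twice because two files matched.
hits = [
    "https://github.com/example-org/repo-two",
    "https://github.com/example-org/repo-one",
    "https://github.com/example-org/repo-one",
]

for url in unique_sorted(hits):
    print(url)
```

A set keeps memory use proportional to the number of distinct repositories, which is small compared with the number of raw hits.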