
Fetch docs from remote repos via git clone instead of individually requesting every file from GitHub #3951

@sengi

Description


GitHubRepoFetcher is super unkind to GitHub's HTTP API. On startup, we basically crawl every remote repo listed in data/repos.yml and issue a separate HTTP request for every file in its docs/ directory (including subdirectories).

This makes for a miserable developer experience when previewing changes to documentation. Startup takes many minutes and often fails because we hit rate limits (sometimes even when using an API token!). Worst of all, the tests take forever to run and depend on thousands of network requests, to endpoints outside our control, all succeeding.

It'd be simpler, faster and more reliable to clone the remote repos and read the .md docs from the local filesystem. We wouldn't even have to download the whole of each repo: by using git clone --filter together with sparse-checkout, we can fetch just the docs/ directory at the head of the default branch.
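A minimal sketch of that approach, written in Python purely for illustration (the function names `sparse_clone_commands`, `clone_docs` and `read_markdown_docs` are hypothetical, not part of the existing fetcher). It assumes a reasonably recent git (2.25 or later, for `git clone --sparse` and `git sparse-checkout`): `--depth 1` limits the clone to the latest commit of the default branch, `--filter=blob:none` defers downloading file contents, and `sparse-checkout set docs` then materialises only `docs/`.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def sparse_clone_commands(repo_url: str, dest: Path, docs_dir: str = "docs") -> List[List[str]]:
    """Build git commands for a shallow, blobless, sparse clone of `docs_dir` only."""
    return [
        # --depth 1: only the latest commit (implies --single-branch, so default branch only).
        # --filter=blob:none: skip file contents until they are actually needed.
        # --sparse: start with only the top-level files checked out.
        ["git", "clone", "--depth", "1", "--filter=blob:none", "--sparse",
         repo_url, str(dest)],
        # Add docs_dir (recursively) to the sparse checkout; git fetches just those blobs.
        ["git", "-C", str(dest), "sparse-checkout", "set", docs_dir],
    ]


def clone_docs(repo_url: str, dest: Path, docs_dir: str = "docs") -> None:
    """Run the clone commands, raising CalledProcessError if git fails."""
    for argv in sparse_clone_commands(repo_url, dest, docs_dir):
        subprocess.run(argv, check=True)


def read_markdown_docs(checkout: Path, docs_dir: str = "docs") -> Dict[str, str]:
    """Return {relative path: contents} for every .md file under docs_dir, including subdirs."""
    root = checkout / docs_dir
    if not root.is_dir():
        return {}
    return {
        path.relative_to(checkout).as_posix(): path.read_text(encoding="utf-8")
        for path in sorted(root.rglob("*.md"))
        if path.is_file()
    }
```

Usage would be something like `clone_docs("https://github.com/OWNER/REPO.git", Path("tmp/REPO"))` followed by `read_markdown_docs(Path("tmp/REPO"))`, where OWNER/REPO and the destination path are placeholders. Splitting command construction from execution keeps the file-reading side testable without any network access.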

This would also let us ditch our homegrown cache mechanism, because the cloned files will simply stick around between runs when developing locally, and in CI we can use the built-in cache in GitHub Actions.
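The "files just stick around" idea can be sketched as a small wrapper that reuses an existing local checkout instead of cloning again. This is a Python sketch under stated assumptions: `ensure_checkout`, `run_git`, the `refresh` flag and the injectable `run` hook are hypothetical names, and refreshing via a depth-1 `git fetch` of the remote's `HEAD` followed by `git reset --hard FETCH_HEAD` is just one possible strategy that the issue itself does not prescribe.

```python
import subprocess
from pathlib import Path
from typing import Callable, List

# Hypothetical hook: anything that executes a single git command given as an argv list.
Runner = Callable[[List[str]], None]


def run_git(argv: List[str]) -> None:
    subprocess.run(argv, check=True)


def ensure_checkout(repo_url: str, dest: Path, refresh: bool = False,
                    run: Runner = run_git) -> str:
    """Make sure `dest` holds a sparse docs checkout of `repo_url`.

    Returns "reused", "refreshed" or "cloned" to report what happened.
    """
    if (dest / ".git").is_dir():
        if not refresh:
            # A clone from a previous run is still on disk: no network access at all.
            return "reused"
        # Move the existing checkout to the remote's current default-branch head,
        # still fetching only a single commit.
        run(["git", "-C", str(dest), "fetch", "--depth", "1", "origin", "HEAD"])
        run(["git", "-C", str(dest), "reset", "--hard", "FETCH_HEAD"])
        return "refreshed"
    # No local copy yet: shallow, blobless, sparse clone limited to docs/.
    run(["git", "clone", "--depth", "1", "--filter=blob:none", "--sparse",
         repo_url, str(dest)])
    run(["git", "-C", str(dest), "sparse-checkout", "set", "docs"])
    return "cloned"
```

Locally, the default (`refresh=False`) makes repeat startups essentially free; in CI, the same destination directory is what would presumably be saved and restored by the GitHub Actions cache, so the "reused" path applies there too.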

Metadata

Labels: good first issue (Issue is likely suitable for newcomers to the project.)
