airfs.storage.github

GitHub as a read-only file-system.

New in version 1.5.0.

Mount

The GitHub storage does not need to be mounted before use.

It can, however, be mounted with a personal access token to access private repositories and to increase the API rate limit, which is very low in unauthenticated mode. Accessing private repositories requires a token with the repo scope; accessing public repositories does not require selecting any scope.

import airfs

# Mount GitHub with an API token
airfs.mount(
    storage='github',
    storage_parameters=dict(
        token='my_token',
    )
)

# Open a GitHub object with airfs
with airfs.open('github://my_organization/my_repo/HEAD/my_object', 'rt') as file:
    text = file.read()
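To avoid hard-coding the token, the storage parameters can be assembled from the environment. This is a minimal sketch, assuming the token is stored in a `GITHUB_TOKEN` environment variable (a naming convention chosen here, not something airfs reads by itself); `github_storage_parameters` is a hypothetical helper, not part of airfs:

```python
import os


def github_storage_parameters(wait_rate_limit=True, env_var="GITHUB_TOKEN"):
    """Build a storage_parameters dict for the GitHub storage.

    Hypothetical helper: the environment variable name is an assumption.
    Without a token, the storage stays in unauthenticated mode.
    """
    parameters = {"wait_rate_limit": wait_rate_limit}
    token = os.environ.get(env_var)
    if token:
        # Only pass a token when one is actually available
        parameters["token"] = token
    return parameters


# Usage (requires airfs):
# airfs.mount(storage="github", storage_parameters=github_storage_parameters())
```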

Limitation

Only one GitHub configuration can be mounted simultaneously.

Usage

With the GitHub storage, any repository can be navigated like a local Git repository: any branch, tag or commit, but also the current branch, source code archives, releases and release assets.

The storage supports common GitHub URLs, as well as some specific shortcuts.

For instance, with this project’s own GitHub repository:

# Listing the files of the current branch (HEAD)
airfs.listdir("https://github.com/JGoutin/airfs/HEAD")

# Listing the files of a specific branch
airfs.listdir("https://github.com/JGoutin/airfs/branches/master")

# Listing the files of a specific tag
airfs.listdir("https://github.com/JGoutin/airfs/tags/1.4.0")

# Listing the assets published with the latest release
airfs.listdir("https://github.com/JGoutin/airfs/releases/latest/assets")

# Listing the assets published with a specific release
airfs.listdir("https://github.com/JGoutin/airfs/releases/tag/1.4.0/assets")

# Listing all source code archives for tags and branches
airfs.listdir("https://github.com/JGoutin/airfs/archive")

# Getting the size of the latest release source code archive
airfs.getsize(
    "https://github.com/JGoutin/airfs/releases/latest/archive/source_code.tar.gz")

Many references are handled like symlinks to more precise references. This helps with navigating repositories, but can also be used to get extra information:

from os.path import basename

# Getting the name of the current branch
basename(airfs.readlink("https://github.com/JGoutin/airfs/HEAD"))

# Getting the commit of a specific branch
basename(airfs.readlink("https://github.com/JGoutin/airfs/branches/master"))

# Getting the commit of a specific tag
basename(airfs.readlink("https://github.com/JGoutin/airfs/tags/1.4.0"))

# Getting the tag of the latest release
basename(airfs.readlink("https://github.com/JGoutin/airfs/releases/latest"))
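The chain of references behind these calls can be modeled with a small in-memory link table. This is a toy sketch that runs offline; the link targets below (HEAD pointing to `branches/master`, the commit placeholder `0123abcd`, the release tag) are invented for illustration and are not what airfs or GitHub actually return:

```python
from os.path import basename

# Hypothetical link table: each reference points to a more precise one.
# All targets are illustrative placeholders, not real airfs output.
LINKS = {
    "JGoutin/airfs/HEAD": "JGoutin/airfs/branches/master",
    "JGoutin/airfs/branches/master": "JGoutin/airfs/commits/0123abcd",
    "JGoutin/airfs/releases/latest": "JGoutin/airfs/releases/tag/1.4.0",
}


def readlink(path):
    """Return the direct target of a link (one level, like os.readlink)."""
    return LINKS[path]


def resolve(path):
    """Follow links until reaching a path that is not itself a link."""
    seen = set()
    while path in LINKS:
        if path in seen:
            raise RuntimeError(f"link loop at {path}")
        seen.add(path)
        path = LINKS[path]
    return path


print(basename(readlink("JGoutin/airfs/HEAD")))  # master
print(basename(resolve("JGoutin/airfs/HEAD")))  # 0123abcd
```

One `readlink` call moves a single level (HEAD to its branch), while following the whole chain reaches the commit.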

This is just a subset of what is possible; see the next sections for a detailed description of the file and directory structure.

GitHub API Rate limit

GitHub API calls are subject to a rate limit.

By default, when the rate limit is reached, airfs waits until the limit resets. To raise an exception instead, set the wait_rate_limit argument to False in storage_parameters when mounting.
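The wait-or-raise behavior described above can be sketched as follows. This is not airfs’s implementation: it assumes the `x-ratelimit-remaining` and `x-ratelimit-reset` headers returned by the GitHub REST API (the reset value is a UTC epoch timestamp in seconds, and header keys are assumed lower-case here), and it uses local stand-in classes instead of airfs’s own GithubRateLimitException and GithubRateLimitWarning. The `now` and `sleep` parameters are injectable so the logic can be tested without waiting:

```python
import time
import warnings


class RateLimitExceeded(Exception):
    """Stand-in for the exception raised when waiting is disabled."""


class RateLimitWaiting(Warning):
    """Stand-in for the warning emitted before waiting."""


def wait_for_rate_limit(headers, wait_rate_limit=True, now=None, sleep=time.sleep):
    """Wait (or raise) when GitHub reports that the rate limit is exhausted.

    headers: response headers with lower-case keys (an assumption).
    Returns the number of seconds waited.
    """
    if int(headers.get("x-ratelimit-remaining", 1)) > 0:
        return 0.0  # quota left: nothing to do
    now = time.time() if now is None else now
    # Never wait a negative duration if the reset time is already past
    delay = max(0.0, int(headers["x-ratelimit-reset"]) - now)
    if not wait_rate_limit:
        raise RateLimitExceeded(f"rate limit reached, resets in {delay:.0f} s")
    warnings.warn(f"rate limit reached, waiting {delay:.0f} s", RateLimitWaiting)
    sleep(delay)
    return delay
```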

Airfs uses the GitHub API v3 (REST API) because it allows unauthenticated requests.

Therefore, authenticating with an API token when mounting the storage gives a higher rate limit than the unauthenticated default mount.

Airfs tries to minimize API rate limit usage through GitHub’s conditional requests mechanism, caching and lazy evaluation.
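Conditional requests work by sending back the ETag of a previous response in an If-None-Match header; if the resource has not changed, GitHub replies 304 Not Modified, and GitHub’s REST API documentation states that such 304 responses do not count against the primary rate limit when the request is authorized. A minimal cache built on that mechanism is sketched below; `ConditionalCache` and the caller-supplied `fetch(url, headers)` function (standing in for a real HTTP client) are hypothetical, not airfs internals:

```python
class ConditionalCache:
    """Minimal ETag cache illustrating GitHub conditional requests.

    fetch(url, headers) must return (status, response_headers, body);
    it stands in for a real HTTP client.
    """

    def __init__(self, fetch):
        self._fetch = fetch
        self._entries = {}  # url -> (etag, body)

    def get(self, url):
        headers = {}
        cached = self._entries.get(url)
        if cached is not None:
            # Ask the server to reply 304 if the resource did not change
            headers["If-None-Match"] = cached[0]
        status, response_headers, body = self._fetch(url, headers)
        if status == 304:
            return cached[1]  # unchanged: reuse the cached body
        etag = response_headers.get("ETag")
        if etag is not None:
            self._entries[url] = (etag, body)
        return body
```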

Supported paths and URLs

Variables

Definitions of all variables used in paths and URLs in the following sections:

  • :asset_name: Filename of a release asset.

  • :branch: Git branch name.

  • :dir_path: Path of a directory in the Git tree. Git tree root is used if not specified.

  • :file_path: Path of a file (or blob) in the Git tree.

  • :owner: Repository owner name (user or organization).

  • :path: Path of a file or directory in the Git tree. Git tree root is used if not specified.

  • :ref: Git reference that can be HEAD, a branch name, a tag name or a commit ID.

  • :repo: Repository name.

  • :tag: Git tag name.
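These variables act as placeholders in the path and URL templates of the following sections. As an illustration, a hypothetical `fill_template` helper (not part of airfs) can substitute concrete values into such a template:

```python
import re


def fill_template(template, **values):
    """Replace ':name' placeholders in a path or URL template.

    Hypothetical helper for illustration. The "https:" prefix is left
    alone because its colon is not followed by a letter; an unknown
    placeholder raises KeyError.
    """
    return re.sub(r":([a-z_]+)", lambda match: values[match.group(1)], template)


url = fill_template(
    "https://github.com/:owner/:repo/blob/:ref/:path",
    owner="JGoutin", repo="airfs", ref="HEAD", path="README.md",
)
```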

Files and directory structure

To expose GitHub as a file system, airfs provides a specific directory structure.

This structure is designed to be as compatible as possible with the URLs used to navigate the GitHub website itself.

It also adds some extra paths and symbolic link relationships that are not available on the GitHub website; these paths are annotated below.

The parameters used in the structure are the following:

  • :asset_name: A GitHub release asset (download) filename. GitHub also provides .tar.gz and .zip source code archives for each release; these do not count as assets.

  • :branch: A Git branch.

  • :path: The path to any file or directory inside the repository.

  • :owner: The GitHub user or organization.

  • :ref: A Git reference that can be a branch, a commit or a tag.

  • :repo: The repository.

  • :sha: A Git commit SHA.

  • :tag: A Git tag.

The structure is the following:

  • `:owner`

    • `:repo`

      • archive

        • `:ref`.zip

        • `:ref`.tar.gz

      • blob [8]

        • `:ref` [1]

          • `:path`

      • branches

      • commits

        • `:sha`

      • HEAD [2] [4]

        • `:path`

      • refs [4]

        • heads

          • `:branch` [1]

            • `:path`

        • tags

          • `:tag` [1]

            • `:path`

      • releases

        • tag

          • `:tag`

            • source_code.zip [5]

            • source_code.tar.gz [5]

            • assets [6]

              • `:asset_name`

            • tree [7]

              • `:path`

        • latest [3]

          • source_code.zip [5]

          • source_code.tar.gz [5]

          • assets [6]

            • `:asset_name`

          • tree [7]

            • `:path`

        • download

          • `:tag`

            • `:asset_name`

      • tags

      • tree [8]

        • `:ref` [1]

          • `:path`
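To make the nesting explicit, the structure can be encoded as nested dictionaries and walked to list every path template. This is an illustrative sketch: the table is written by hand from the structure above (footnote markers dropped) and is not taken from airfs’s source; `latest` and `download` are placed under `releases`, matching URLs such as releases/latest and releases/download/:tag/:asset_name:

```python
# Hand-written encoding of the directory structure (illustrative only).
RELEASE = {
    "source_code.zip": {},
    "source_code.tar.gz": {},
    "assets": {":asset_name": {}},
    "tree": {":path": {}},
}

TREE = {
    ":owner": {
        ":repo": {
            "archive": {":ref.zip": {}, ":ref.tar.gz": {}},
            "blob": {":ref": {":path": {}}},
            "branches": {},
            "commits": {":sha": {}},
            "HEAD": {":path": {}},
            "refs": {
                "heads": {":branch": {":path": {}}},
                "tags": {":tag": {":path": {}}},
            },
            "releases": {
                "tag": {":tag": RELEASE},
                "latest": RELEASE,
                "download": {":tag": {":asset_name": {}}},
            },
            "tags": {},
            "tree": {":ref": {":path": {}}},
        }
    }
}


def iter_paths(tree, prefix=""):
    """Yield every path template in the tree, depth first."""
    for name, children in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        yield path
        yield from iter_paths(children, path)


# Print only the release-related templates
for template in iter_paths(TREE):
    if "/releases/" in template:
        print(template)
```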

GitHub URLs

In addition to its own github:// scheme, airfs supports common GitHub URLs:

  • https://github.com/:owner

  • https://github.com/:owner/:repo

  • https://github.com/:owner/:repo/archive/:ref.zip

  • https://github.com/:owner/:repo/archive/:ref.tar.gz

  • https://github.com/:owner/:repo/branches

  • https://github.com/:owner/:repo/blob/:ref/:path

  • https://github.com/:owner/:repo/commits

  • https://github.com/:owner/:repo/releases

  • https://github.com/:owner/:repo/releases/latest

  • https://github.com/:owner/:repo/releases/tag/:tag

  • https://github.com/:owner/:repo/releases/download/:tag/:asset_name

  • https://github.com/:owner/:repo/tags

  • https://github.com/:owner/:repo/tree/:ref/:path

  • https://raw.githubusercontent.com/:owner/:repo/:ref/:path (redirects to github://:owner/:repo/tree/:ref/:path)
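The github:// and https://github.com/ examples earlier on this page share the same path layout, so normalizing a supported URL to the github:// scheme is mostly a matter of swapping the scheme, while raw.githubusercontent.com URLs additionally gain a `tree` segment, as listed above. A sketch of such a normalization, assuming that same-path correspondence holds; `to_github_scheme` is a hypothetical helper, not an airfs function:

```python
from urllib.parse import urlsplit


def to_github_scheme(url):
    """Rewrite a supported GitHub URL as a github:// path (illustrative).

    Assumes https://github.com/... maps to github:// with the same path,
    and raw.githubusercontent.com/:owner/:repo/:ref/:path maps to
    github://:owner/:repo/tree/:ref/:path.
    """
    parts = urlsplit(url)
    if parts.scheme == "github":
        return url  # already normalized
    path = parts.path.strip("/")
    if parts.netloc == "github.com":
        return f"github://{path}"
    if parts.netloc == "raw.githubusercontent.com":
        owner, repo, rest = path.split("/", 2)
        return f"github://{owner}/{repo}/tree/{rest}"
    raise ValueError(f"not a GitHub URL: {url}")
```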

File object classes


class airfs.storage.github.GithubBufferedIO(name, mode='r', buffer_size=None, max_buffers=0, max_workers=None, **kwargs)

Buffered GitHub Object I/O.

Parameters:
  • name (path-like object) – URL to the file which will be opened.

  • mode (str) – The mode can be ‘r’ for reading.

  • buffer_size (int) – The size of buffer.

  • max_buffers (int) – The maximum number of buffers to preload in read mode or awaiting flush in “write” mode. 0 for no limit.

  • max_workers (int) – The maximum number of threads that can be used to execute the given calls.

close()

Flush the write buffers of the stream if applicable and close the object.

detach()

Disconnect this buffer from its underlying raw stream and return it.

After the raw stream has been detached, the buffer is in an unusable state.

fileno()

Returns underlying file descriptor if one exists.

OSError is raised if the IO object does not use a file descriptor.

flush()

Flush the write buffers of the stream if applicable.

isatty()

Return whether this is an ‘interactive’ stream.

Return False if it can’t be determined.

property mode

The mode.

Returns:

Mode.

Return type:

str

property name

The file name.

Returns:

Name.

Return type:

str

peek(size=-1)

Return bytes from the stream without advancing the position.

Parameters:

size (int) – Number of bytes to read. -1 to read the full stream.

Returns:

bytes read

Return type:

bytes

property raw

The underlying raw stream.

Returns:

Raw stream.

Return type:

ObjectRawIOBase subclass

read(size=-1)

Read the object content.

Read and return up to size bytes.

Parameters:

size (int) – Number of bytes to read. -1 to read the stream until the end.

Returns:

Object content

Return type:

bytes

read1(size=-1)

Read the object content.

Read and return up to size bytes, with at most one call to the underlying raw stream’s read method.

Parameters:

size (int) – Number of bytes to read. -1 to read the stream until the end.

Returns:

Object content

Return type:

bytes

readable()

Return True if the stream can be read from.

If False, read() will raise OSError.

Returns:

Supports reading.

Return type:

bool

readinto(b)

Read the object content into a buffer.

Read bytes into a pre-allocated, writable bytes-like object b, and return the number of bytes read.

Parameters:

b (bytes-like object) – buffer.

Returns:

number of bytes read

Return type:

int

readinto1(b)

Read the object content into a buffer.

Read bytes into a pre-allocated, writable bytes-like object b, and return the number of bytes read.

Use at most one call to the underlying raw stream’s readinto method.

Parameters:

b (bytes-like object) – buffer.

Returns:

number of bytes read

Return type:

int
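readinto() fills a caller-provided buffer, which makes it possible to process a large object in fixed-size chunks without allocating a new bytes object for each read. A minimal sketch; io.BytesIO serves as an offline stand-in, and it is assumed that a file opened with airfs.open(url, 'rb') behaves like any other binary stream here:

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=64 * 1024):
    """Hash a binary stream chunk by chunk, reusing one pre-allocated buffer."""
    digest = hashlib.sha256()
    view = memoryview(bytearray(chunk_size))
    while True:
        count = stream.readinto(view)
        if not count:  # 0 bytes read: end of stream
            break
        digest.update(view[:count])  # only hash the bytes actually read
    return digest.hexdigest()


# Stand-in usage; with airfs, the stream could come from airfs.open(url, "rb")
print(sha256_of_stream(io.BytesIO(b"example content"), chunk_size=4))
```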

readline(size=-1, /)

Read and return a line from the stream.

If size is specified, at most size bytes will be read.

The line terminator is always b'\n' for binary files; for text files, the newline argument to open() can be used to select the line terminator(s) recognized.

readlines(hint=-1, /)

Return a list of lines from the stream.

hint can be specified to control the number of lines read: no more lines will be read if the total size (in bytes/characters) of all lines so far exceeds hint.
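In binary mode, readline() splits only on b'\n'; decoding and newline translation belong to a text layer. The sketch below shows the difference, with io.BytesIO standing in for a GitHub object opened in binary mode (an offline stand-in, not an airfs object):

```python
import io

raw = io.BytesIO(b"first\r\nsecond\nthird")

# Binary mode: readline() splits on b"\n" only, so b"\r" stays in the line
print(raw.readline())  # b'first\r\n'

raw.seek(0)
# Text mode: newline=None enables universal newlines ("\r\n" becomes "\n")
text = io.TextIOWrapper(raw, encoding="utf-8", newline=None)
print(text.readlines())  # ['first\n', 'second\n', 'third']
```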

seek(offset, whence=0)

Change the stream position to the given byte offset.

Parameters:
  • offset (int) – Offset is interpreted relative to the position indicated by whence.

  • whence (int) – The default value for whence is SEEK_SET. Values are:

    • SEEK_SET or 0 – Start of the stream (the default); offset should be zero or positive.

    • SEEK_CUR or 1 – Current stream position; offset may be negative.

    • SEEK_END or 2 – End of the stream; offset is usually negative.

Returns:

The new absolute position.

Return type:

int
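Because seek() returns the new absolute position, seeking to SEEK_END with offset 0 gives the stream size, from which the start of the tail can be computed. A minimal sketch with io.BytesIO as an offline stand-in for a seekable GitHub object; the position is clamped explicitly rather than relying on how each stream handles a position before the start:

```python
import io
import os


def read_tail(stream, size):
    """Return the last `size` bytes of a seekable binary stream."""
    end = stream.seek(0, os.SEEK_END)  # new absolute position = stream size
    stream.seek(max(0, end - size))  # clamp for streams shorter than size
    return stream.read()


print(read_tail(io.BytesIO(b"0123456789"), 3))  # b'789'
```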

seekable()

Return True if the stream supports random access.

If False, seek(), tell() and truncate() will raise OSError.

Returns:

Supports random access.

Return type:

bool

tell()

Return the current stream position.

Returns:

Stream position.

Return type:

int

truncate(size=None, /)

Truncate file to size bytes.

File pointer is left unchanged. Size defaults to the current IO position as reported by tell(). Returns the new size.

writable()

Return True if the stream supports writing.

If False, write() and truncate() will raise OSError.

Returns:

Supports writing.

Return type:

bool

write(b)

Write into the object.

Write the given bytes-like object, b, to the underlying raw stream, and return the number of bytes written.

Parameters:

b (bytes-like object) – Bytes to write.

Returns:

The number of bytes written.

Return type:

int

writelines(lines, /)

Write a list of lines to stream.

Line separators are not added, so it is usual for each of the lines provided to have a line separator at the end.

exception airfs.storage.github.GithubRateLimitException

Exception raised if the rate limit is reached.

add_note()

Exception.add_note(note) – add a note to the exception

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception airfs.storage.github.GithubRateLimitWarning

Warning emitted if the rate limit is reached and airfs is waiting for it to reset.

add_note()

Exception.add_note(note) – add a note to the exception

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
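Assuming GithubRateLimitWarning is a Warning subclass emitted through Python’s warnings module (its name and description suggest this, but it is an assumption), the standard warnings machinery can record it, for example to log how often a long-running job waits on the rate limit. The sketch defines a local stand-in class and a hypothetical emit() function so it runs without airfs or network access:

```python
import warnings


class GithubRateLimitWarning(Warning):
    """Local stand-in; real code would import it from airfs.storage.github."""


def emit():
    # Hypothetical stand-in for an airfs call that waits on the rate limit
    warnings.warn("rate limit reached, waiting", GithubRateLimitWarning)


# Record the warnings instead of printing them to stderr
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", GithubRateLimitWarning)
    emit()

for warning in caught:
    print(warning.category.__name__, warning.message)
```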

class airfs.storage.github.GithubRawIO(*args, **kwargs)

Binary GitHub Object I/O.

Parameters:
  • name (path-like object) – URL to the file which will be opened.

  • mode (str) – The mode can be ‘r’ for reading (the default).

close()

Flush the write buffers of the stream if applicable and close the object.

fileno()

Returns underlying file descriptor if one exists.

OSError is raised if the IO object does not use a file descriptor.

flush()

Flush.

Flush the write buffers of the stream if applicable and save the object on the storage.

isatty()

Return whether this is an ‘interactive’ stream.

Return False if it can’t be determined.

property mode

The mode.

Returns:

Mode.

Return type:

str

property name

Name.

Returns:

Name

Return type:

str

readable()

Return True if the stream can be read from.

If False, read() will raise OSError.

Returns:

Supports reading.

Return type:

bool

readall()

Read and return all the bytes from the stream until EOF.

Returns:

Object content

Return type:

bytes

readinto(b)

Read the object content into a buffer.

Read bytes into a pre-allocated, writable bytes-like object b, and return the number of bytes read.

Parameters:

b (bytes-like object) – buffer.

Returns:

number of bytes read

Return type:

int

readline(size=-1, /)

Read and return a line from the stream.

If size is specified, at most size bytes will be read.

The line terminator is always b'\n' for binary files; for text files, the newline argument to open() can be used to select the line terminator(s) recognized.

readlines(hint=-1, /)

Return a list of lines from the stream.

hint can be specified to control the number of lines read: no more lines will be read if the total size (in bytes/characters) of all lines so far exceeds hint.

seek(offset, whence=0)

Change the stream position to the given byte offset.

Parameters:
  • offset (int) – Offset is interpreted relative to the position indicated by whence.

  • whence (int) – The default value for whence is SEEK_SET. Values are:

    • SEEK_SET or 0 – Start of the stream (the default); offset should be zero or positive.

    • SEEK_CUR or 1 – Current stream position; offset may be negative.

    • SEEK_END or 2 – End of the stream; offset is usually negative.

Returns:

The new absolute position.

Return type:

int

seekable()

Return True if the stream supports random access.

If False, seek(), tell() and truncate() will raise OSError.

Returns:

Supports random access.

Return type:

bool

tell()

Return the current stream position.

Returns:

Stream position.

Return type:

int

truncate(size=None, /)

Truncate file to size bytes.

File pointer is left unchanged. Size defaults to the current IO position as reported by tell(). Returns the new size.

writable()

Return True if the stream supports writing.

If False, write() and truncate() will raise OSError.

Returns:

Supports writing.

Return type:

bool

write(b)

Write into the object.

Write the given bytes-like object, b, to the underlying raw stream, and return the number of bytes written.

Parameters:

b (bytes-like object) – Bytes to write.

Returns:

The number of bytes written.

Return type:

int

writelines(lines, /)

Write a list of lines to stream.

Line separators are not added, so it is usual for each of the lines provided to have a line separator at the end.