Preface

(A Japanese version is also available.)

While PyPI has a security page, it doesn’t have a clear policy for vulnerability assessments. [1]
This article describes issues that I identified and reported as potential vulnerabilities, using only publicly available information. I did not actually exploit or demonstrate any of them.

This article is not intended to encourage you to perform an unauthorized vulnerability assessment.
If you find any vulnerabilities in PyPI, please report them to security@python.org. [2]

TL;DR

There was a vulnerability in a GitHub Actions workflow in PyPI’s repository that allowed a malicious pull request to execute arbitrary commands.
This allowed an attacker to obtain write permission to the repository, which could lead to arbitrary code execution on pypi.org.

About PyPI

PyPI is the package registry used by Python’s package manager (pip), and is referenced when you run commands such as pip install [package name].
Many projects including Flask and TensorFlow are indirectly using PyPI.

[Figure: Installation steps for Flask]

Reasons for investigation

While reading Max Justicz’s blog post, I noticed that they had reported a vulnerability to PyPI.
After checking the advisory mentioned in that post, I found that PyPI’s source code is published on GitHub, so I decided to read it.

Investigation of source code

The current source code of PyPI is in the pypa/warehouse repository.

A quick read of the code revealed that it uses a Python web framework called Pyramid.
After reading the Pyramid documentation, I continued reading PyPI’s code and found the following two vulnerabilities.

Arbitrary Legacy Document Deletion

PyPI used to offer a documentation hosting feature for Python projects.
The feature was removed because its usage did not grow, but the documents that had already been uploaded were not deleted.
A feature to delete these legacy documents was therefore requested, and it was implemented in this pull request.

Internally, the feature removes the documents with the following code:

    def remove_by_prefix(self, prefix):
        if self.prefix:
            prefix = os.path.join(self.prefix, prefix)
        keys_to_delete = []
        keys = self.s3_client.list_objects_v2(Bucket=self.bucket_name, Prefix=prefix)
        for key in keys.get("Contents", []):
            keys_to_delete.append({"Key": key["Key"]})
            if len(keys_to_delete) > 99:
                self.s3_client.delete_objects(
                    Bucket=self.bucket_name, Delete={"Objects": keys_to_delete}
                )
                keys_to_delete = []
        if len(keys_to_delete) > 0:
            self.s3_client.delete_objects(
                Bucket=self.bucket_name, Delete={"Objects": keys_to_delete}
            )

As the snippet above shows, it calls list_objects_v2 with the Prefix parameter to fetch the objects to delete.
The name of a user-owned project is passed in as that prefix.

This means that if someone deleted the legacy documents of a project named examp, the documents of every project whose name starts with examp (e.g. example, exampleasdf) would be deleted as well.
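
The prefix problem can be reproduced without S3. The following is a minimal sketch that assumes documents are stored under keys of the form `<project name>/<file>`; that key layout, the helper `keys_with_prefix`, and the trailing-slash mitigation are assumptions made for illustration and are not taken from PyPI's code.

```python
def keys_with_prefix(keys, prefix):
    """Mimic a prefix listing: a plain string-prefix match on object keys."""
    return [key for key in keys if key.startswith(prefix)]


# Hypothetical bucket contents (assumed key layout: "<project>/<file>").
bucket_keys = [
    "examp/index.html",
    "example/index.html",
    "exampleasdf/docs/api.html",
    "other/index.html",
]

# Using the bare project name as the prefix also matches other projects.
unsafe_matches = keys_with_prefix(bucket_keys, "examp")

# Ending the prefix with the path separator limits the match to one project.
safe_matches = keys_with_prefix(bucket_keys, "examp/")

print(unsafe_matches)
print(safe_matches)
```

With the bare name, all three `examp*` projects are selected; with the trailing separator, only the keys of `examp` itself remain.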

Arbitrary Role Deletion

PyPI has a permission management feature for packages.
With this feature, a project owner can grant and remove permissions. There was a vulnerability in the removal process.

When fetching the permission record from the database, PyPI used the following code:

    role = (
        request.db.query(Role)
        .join(User)
        .filter(Role.id == request.POST["role_id"])
        .one()
    )

As the snippet above shows, the query does not restrict the lookup to the current project when fetching the permission record to delete.
Hence, an attacker who knew or guessed a role_id could delete a permission belonging to another project. [3]
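
The difference between an unscoped and a project-scoped lookup can be sketched with the standard library's sqlite3 module. The table layout, the sample rows, and both helper functions below are hypothetical and only illustrate the idea; they are not PyPI's schema or the fix that was applied.

```python
import sqlite3

# In-memory database with a hypothetical, simplified roles table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roles (id TEXT PRIMARY KEY, project TEXT, username TEXT)")
conn.executemany(
    "INSERT INTO roles VALUES (?, ?, ?)",
    [
        ("r1", "attacker-project", "attacker"),
        ("r2", "victim-project", "victim"),
    ],
)


def find_role_unscoped(role_id):
    """Look a role up by ID only, as in the vulnerable pattern."""
    return conn.execute(
        "SELECT id, project, username FROM roles WHERE id = ?", (role_id,)
    ).fetchone()


def find_role_scoped(role_id, project):
    """Look a role up by ID and require it to belong to the given project."""
    return conn.execute(
        "SELECT id, project, username FROM roles WHERE id = ? AND project = ?",
        (role_id, project),
    ).fetchone()


# An owner of attacker-project submits the ID of a role in victim-project.
print(find_role_unscoped("r2"))
print(find_role_scoped("r2", "attacker-project"))
```

The unscoped lookup returns the victim's role, while the scoped lookup returns nothing because the role does not belong to the project being managed.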

Remote code execution

As described above, I reported two vulnerabilities that I discovered while investigating the source code.
However, these vulnerabilities have limited impact and, at most, could only be used for harassment.
I wanted to report something with a little more impact, so I decided to look for a vulnerability that could be used to execute arbitrary code.

I then read the code for the package upload and project management functions, but I could not find any vulnerabilities that could lead to arbitrary code execution.

While taking a break, I found an article named Unintended Deployments to PyPI Servers.
After reading this article carefully, I learned that code pushed to the main branch of the pypa/warehouse repository is automatically deployed to pypi.org.
In other words, if I could obtain write permission for this repository, it would be possible to execute arbitrary code on pypi.org.
Therefore, I checked the GitHub Actions workflow files, since GitHub Actions has write permission for the repository by default, and I found the following vulnerability.

Investigation of the workflow file

In pypa/warehouse, there is a workflow called combine-prs.yml.

This workflow collects pull requests whose branch names start with dependabot and merges them into a single pull request.
Because Dependabot has no feature to combine all of its pull requests, the maintainers used this workflow to emulate one.

This workflow does not verify the author of a pull request.
This means that anyone who creates a pull request from a branch whose name starts with dependabot can force the workflow to process the crafted pull request.
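
The missing check can be illustrated with a small filter. This is only a sketch: the pull request records, their field names, and both functions are hypothetical, and the author login `dependabot[bot]` is an assumption about how such a check might identify Dependabot.

```python
# Hypothetical pull request records; the field names are illustrative only.
pull_requests = [
    {"number": 1, "head_ref": "dependabot/pip/requests-2.26.0", "author": "dependabot[bot]"},
    {"number": 2, "head_ref": "dependabot-looks-legit", "author": "attacker"},
    {"number": 3, "head_ref": "fix-typo", "author": "contributor"},
]


def select_by_branch(prs, prefix="dependabot"):
    """Select pull requests by branch-name prefix only (no author check)."""
    return [pr["number"] for pr in prs if pr["head_ref"].startswith(prefix)]


def select_by_branch_and_author(prs, prefix="dependabot", trusted_author="dependabot[bot]"):
    """Select pull requests by branch-name prefix and by a trusted author."""
    return [
        pr["number"]
        for pr in prs
        if pr["head_ref"].startswith(prefix) and pr["author"] == trusted_author
    ]


print(select_by_branch(pull_requests))
print(select_by_branch_and_author(pull_requests))
```

Filtering on the branch name alone also selects the attacker's pull request, because the branch name is fully controlled by whoever opens the pull request.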

However, this workflow only combines pull requests into a single pull request.
The pull request it generates is still reviewed by a human, who would simply discard it if it contained malicious changes.

Therefore, this could not be used directly to execute arbitrary code.
But as I read through the workflow, I realized that there was another vulnerability.

combine-prs.yml prints the list of pull request branches using the following code:

    run: |
      echo "${{steps.fetch-branch-names.outputs.result}}"

It’s a simple echo command that looks fine at first glance, but it is not safe because of how GitHub Actions behaves.

As described in Keeping your GitHub Actions and workflows secure: Untrusted input, a ${{ }} expression is evaluated and substituted into the script before the script is passed to Bash.

The substitution is unaware of the surrounding Bash syntax, so if steps.fetch-branch-names.outputs.result contains a string like ";curl https://example.com;#, the command curl https://example.com will be executed.
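
The effect of this substitution can be simulated with plain string replacement. The sketch below does not run GitHub Actions or Bash; the `render` helper is a hypothetical stand-in for the expression evaluation step, and it only shows what script text Bash would receive.

```python
PLACEHOLDER = "${{steps.fetch-branch-names.outputs.result}}"
TEMPLATE = 'echo "' + PLACEHOLDER + '"'


def render(template, value):
    """Hypothetical stand-in for expression evaluation: textual substitution
    that happens before the shell ever sees the script."""
    return template.replace(PLACEHOLDER, value)


benign = render(TEMPLATE, "dependabot/pip/requests-2.26.0")
malicious = render(TEMPLATE, '";curl https://example.com;#')

print(benign)
print(malicious)
```

In the second rendered script, the injected double quote closes the string opened by echo, the semicolon starts a new command, and the trailing # comments out the leftover quote. GitHub's guidance for avoiding this is to pass untrusted values through an intermediate environment variable instead of interpolating them into the script.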

Because this workflow used actions/checkout, .git/config contained secrets.GITHUB_TOKEN, which has write permission.
So, by executing a command like cat .git/config, it was possible to leak a GitHub access token with write permission to the pypa/warehouse repository.

As described above, pushing changes to the main branch triggers the automatic deployment to pypi.org.
Hence, by following the steps below, it was possible to execute arbitrary code on pypi.org.

  1. Fork pypa/warehouse
  2. In the forked repository, create a branch named dependabot;cat$IFS$(echo$IFS'LmdpdA=='|base64$IFS'-d')/config|base64;sleep$IFS'10000';# [4]
  3. Add a harmless modification to the created branch
  4. Create a pull request with a harmless name (e.g. WIP)
  5. Wait for combine-prs.yml to be executed
  6. A GitHub access token with write permission to pypa/warehouse is leaked; use it to add an arbitrary modification to the main branch
  7. The modified code is deployed to pypi.org
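
The branch name in step 2 is easier to read once its encoding tricks are undone. The following sketch only rewrites the string and never executes it; treating `$IFS` as a stand-in for a space (Git branch names cannot contain spaces) is the only interpretation it applies, and the variable names are chosen for illustration.

```python
import base64

branch = "dependabot;cat$IFS$(echo$IFS'LmdpdA=='|base64$IFS'-d')/config|base64;sleep$IFS'10000';#"

# The base64 literal embedded in the branch name decodes to a directory name.
decoded_dir = base64.b64decode("LmdpdA==").decode()

# Replace $IFS with the space it expands to, then substitute the decoded
# value for the command substitution that would produce it in Bash.
readable = branch.replace("$IFS", " ").replace(
    "$(echo 'LmdpdA=='|base64 '-d')", decoded_dir
)

print(decoded_dir)
print(readable)
```

The readable form matches the commands the author describes for this branch name: print .git/config as base64, then sleep.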

[Figure: Remote code execution in GitHub Actions]

I reported this vulnerability to Python’s security team, and they fixed it in this commit.

(Update: 2021/07/31 12:55 UTC)
@mrtc0 pointed out that the attack procedure above doesn’t work.
After rechecking the steps, I confirmed that a different attack procedure is necessary.

At line 119 of combine-prs.yml, there is code like the following:

    script: |
      const prString = `${{ steps.fetch-branch-names.outputs.prs-string }}`;

As mentioned earlier, the ${{ }} expression is substituted without regard to the surrounding context, which here is a JavaScript template literal.
So, if steps.fetch-branch-names.outputs.prs-string contains a string like `;console.log("test")//, console.log("test") will be executed.
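
The same substitution can be simulated for the JavaScript case. In the sketch below, `render_raw` is a hypothetical stand-in for the expression evaluation, and `render_encoded` shows one generic way to embed an untrusted string as a quoted literal using `json.dumps`; the latter is an illustration of the idea only, not the fix that was applied to combine-prs.yml.

```python
import json

PLACEHOLDER = "${{ steps.fetch-branch-names.outputs.prs-string }}"
TEMPLATE = "const prString = `" + PLACEHOLDER + "`;"


def render_raw(value):
    """Hypothetical stand-in for expression evaluation: paste the value as-is."""
    return TEMPLATE.replace(PLACEHOLDER, value)


def render_encoded(value):
    """Illustrative alternative: emit the value as a JSON string literal, so
    quotes inside it are escaped instead of ending the literal."""
    return "const prString = " + json.dumps(value) + ";"


payload = '`;console.log("test")//'
raw_script = render_raw(payload)
encoded_script = render_encoded(payload)

print(raw_script)
print(encoded_script)
```

In the raw script, the injected backtick closes the template literal, so the text after the semicolon becomes a separate statement and the trailing // comments out the leftover backtick. In the encoded script, the whole payload stays inside one string literal.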

Since steps.fetch-branch-names.outputs.prs-string contains the titles of pull requests, it was possible to execute arbitrary code on pypi.org by using the following attack procedure.

  1. Fork pypa/warehouse
  2. Find a branch that starts with dependabot in pypa/warehouse
  3. In the forked repository, add a harmless modification to the branch found in step 2
  4. Create a pull request named `;github.auth().then(auth=>console.log(auth.token.split("")))//
  5. Wait for combine-prs.yml to be executed
  6. A GitHub access token with write permission to pypa/warehouse is leaked; use it to add an arbitrary modification to the main branch
  7. The modified code is deployed to pypi.org

Conclusion

The vulnerabilities described in this article could have had a significant impact on the Python ecosystem.
As I’ve mentioned several times before, some supply chains have critical vulnerabilities.
However, only a limited number of people are researching supply chain attacks, and most supply chains are not properly protected.
Therefore, I believe that users who depend on a supply chain need to actively contribute to improving its security.

If you have any questions / comments about this article, please send a message to @ryotkak on Twitter.

Timeline

Date (UTC)       Event
July 25, 2021    Found / reported document deletion vulnerability
July 26, 2021    Fixed document deletion vulnerability
July 26, 2021    Found / reported role deletion vulnerability
July 27, 2021    Found / reported combine-prs vulnerability
July 27, 2021    Fixed role deletion vulnerability
July 27, 2021    Fixed combine-prs vulnerability
July 29, 2021    Published the advisory
July 30, 2021    Published this article

  1. They plan to improve the security page: https://github.com/pypa/warehouse/issues/7970

  2. Reference: Reporting a security issue

  3. As a side note, I assumed that role_id was a sequential number and told them that “it’s possible to spray the ID”. In fact, it was a UUID.

  4. If executed in Bash, this branch name runs cat .git/config | base64; sleep 10000.