Skip to content

Replace the Suboptimal fuzz_tree.py Harness With a Better Alternative #1910

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Replace the suboptimal fuzz_tree harness with a better alternative
As discussed in the initial fuzzing integration PR[^1], `fuzz_tree.py`'s
implementation was not ideal in terms of coverage and its reading/writing to
hard-coded paths inside `/tmp` was problematic as (among other concerns), it
causes intermittent crashes on ClusterFuzz[^2] when multiple workers execute
the test at the same time on the same machine.

The changes here replace `fuzz_tree.py` completely with a completely new
`fuzz_repo.py` fuzz target which:

- Uses `tempfile.TemporaryDirectory()` to safely manage tmpdir creation and
  tear down, including during multi-worker execution runs.
- Retains the same feature coverage as `fuzz_tree.py`, but it also adds
  considerably more from much smaller data inputs and with less memory consumed
  (and it doesn't even have a seed corpus or target specific dictionary yet.)
- Can likely be improved further in the future by exercising additional features
  of `Repo` to the harness.

Because `fuzz_tree.py` was removed and `fuzz_repo.py` was not derived from it,
the Apache License call outs in the docs were also updated as they only apply to
the singe `fuzz_config.py` file now.

[^1]: #1901 (comment)
[^2]: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=68355
  • Loading branch information
DaveLak committed Apr 29, 2024
commit c84e643c6aa177f364ebe28e4c7bab1e37fb0242
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -240,8 +240,8 @@ Please have a look at the [contributions file][contributing].

[3-Clause BSD License](https://opensource.org/license/bsd-3-clause/), also known as the New BSD License. See the [LICENSE file][license].

Two files exclusively used for fuzz testing are subject to [a separate license, detailed here](./fuzzing/README.md#license).
These files are not included in the wheel or sdist packages published by the maintainers of GitPython.
One file exclusively used for fuzz testing is subject to [a separate license, detailed here](./fuzzing/README.md#license).
This file is not included in the wheel or sdist packages published by the maintainers of GitPython.

[contributing]: https://github.com/gitpython-developers/GitPython/blob/main/CONTRIBUTING.md
[license]: https://github.com/gitpython-developers/GitPython/blob/main/LICENSE
16 changes: 8 additions & 8 deletions fuzzing/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -225,14 +225,14 @@ to [the official OSS-Fuzz documentation][oss-fuzz-docs].
## LICENSE

All files located within the `fuzzing/` directory are subject to [the same license](../LICENSE)
as [the other files in this repository](../README.md#license) with two exceptions:

Two files located in this directory, [`fuzz_config.py`](./fuzz-targets/fuzz_config.py)
and [`fuzz_tree.py`](./fuzz-targets/fuzz_tree.py), have been migrated here from the OSS-Fuzz project repository where
they were originally created. As such, these two files retain their original license and copyright notice (Apache
License, Version 2.0 and Copyright 2023 Google LLC respectively.) Each file includes a notice in their respective header
comments stating that they have been modified. [LICENSE-APACHE](./LICENSE-APACHE) contains the original license used by
the OSS-Fuzz project repository at the time they were migrated.
as [the other files in this repository](../README.md#license) with one exception:

[`fuzz_config.py`](./fuzz-targets/fuzz_config.py) was migrated to this repository from the OSS-Fuzz project's repository
where it was originally created. As such, [`fuzz_config.py`](./fuzz-targets/fuzz_config.py) retains its original license
and copyright notice (Apache License, Version 2.0 and Copyright 2023 Google LLC respectively) as in a header
comment, followed by a notice stating that it has have been modified contributors to GitPython.
[LICENSE-APACHE](./LICENSE-APACHE) contains the original license used by the OSS-Fuzz project repository at the time the
file was migrated.

[oss-fuzz-repo]: https://github.com/google/oss-fuzz

Expand Down
13 changes: 0 additions & 13 deletions fuzzing/dictionaries/fuzz_tree.dict

This file was deleted.

47 changes: 47 additions & 0 deletions fuzzing/fuzz-targets/fuzz_repo.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
import atheris
import io
import sys
import os
import tempfile

if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
path_to_bundled_git_binary = os.path.abspath(os.path.join(os.path.dirname(__file__), "git"))
os.environ["GIT_PYTHON_GIT_EXECUTABLE"] = path_to_bundled_git_binary

with atheris.instrument_imports():
import git


def TestOneInput(data):
fdp = atheris.FuzzedDataProvider(data)

with tempfile.TemporaryDirectory() as temp_dir:
repo = git.Repo.init(path=temp_dir)

# Generate a minimal set of files based on fuzz data to minimize I/O operations.
file_paths = [os.path.join(temp_dir, f"File{i}") for i in range(min(3, fdp.ConsumeIntInRange(1, 3)))]
for file_path in file_paths:
with open(file_path, "wb") as f:
# The chosen upperbound for count of bytes we consume by writing to these
# files is somewhat arbitrary and may be worth experimenting with if the
# fuzzer coverage plateaus.
f.write(fdp.ConsumeBytes(fdp.ConsumeIntInRange(1, 512)))

repo.index.add(file_paths)
repo.index.commit(fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(1, 80)))

fuzz_tree = git.Tree(repo, git.Tree.NULL_BIN_SHA, 0, "")

try:
fuzz_tree._deserialize(io.BytesIO(data))
except IndexError:
return -1


def main():
atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()


if __name__ == "__main__":
main()
67 changes: 0 additions & 67 deletions fuzzing/fuzz-targets/fuzz_tree.py

This file was deleted.

Loading