Skip to content

PyEval_SetProfileAllThreads is racy under free-threading #132817

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
hawkinsp opened this issue Apr 22, 2025 · 5 comments
Open

PyEval_SetProfileAllThreads is racy under free-threading #132817

hawkinsp opened this issue Apr 22, 2025 · 5 comments
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading type-bug An unexpected behavior, bug, or error

Comments

@hawkinsp
Copy link
Contributor

hawkinsp commented Apr 22, 2025

Bug report

Bug description:

PyEval_SetProfileAllThreads iterates over the list of thread-states without doing anything to stop the world:

while (ts) {

In jax-ml/jax#28142, we saw the following TSAN report from a 3.14 freethreaded build:

2025-04-21T05:51:27.9856358Z WARNING: ThreadSanitizer: heap-use-after-free (pid=135448)
2025-04-21T05:51:27.9857362Z   Write of size 8 at 0x72a000030050 by thread T2 (mutexes: write M0):
2025-04-21T05:51:27.9859014Z     #0 setup_profile /__w/jax/jax/cpython/Python/legacy_tracing.c:488:27 (python3.14+0x4afbc2) (BuildId: eefa0d7b7d9f04b19cecc485006a6f3518071e23)
2025-04-21T05:51:27.9861003Z     #1 _PyEval_SetProfile /__w/jax/jax/cpython/Python/legacy_tracing.c:513:36 (python3.14+0x4afbc2)
2025-04-21T05:51:27.9862948Z     #2 PyEval_SetProfileAllThreads /__w/jax/jax/cpython/Python/ceval.c:2485:13 (python3.14+0x424d96) (BuildId: eefa0d7b7d9f04b19cecc485006a6f3518071e23)
2025-04-21T05:51:27.9907359Z     #3 xla::profiler::PythonHookContext::SetProfilerInAllThreads() /proc/self/cwd/external/xla/xla/python/profiler/internal/python_hooks.cc:385:3 (libjax_common.so+0xe960701) (BuildId: 6ab57a4ee1673a660c9ab8350dab69cfc1b93d94)
2025-04-21T05:51:27.9911718Z     #4 xla::profiler::PythonHookContext::Start(xla::profiler::PythonHooksOptions const&) /proc/self/cwd/external/xla/xla/python/profiler/internal/python_hooks.cc:162:7 (libjax_common.so+0xe95fd33) (BuildId: 6ab57a4ee1673a660c9ab8350dab69cfc1b93d94)
...

2025-04-21T05:51:28.0175658Z   Previous write of size 8 at 0x72a000030050 by thread T137:
2025-04-21T05:51:28.0176775Z     #0 free <null> (python3.14+0xdff13) (BuildId: eefa0d7b7d9f04b19cecc485006a6f3518071e23)
2025-04-21T05:51:28.0178539Z     #1 _PyMem_RawFree /__w/jax/jax/cpython/Objects/obmalloc.c:91:5 (python3.14+0x2cce25) (BuildId: eefa0d7b7d9f04b19cecc485006a6f3518071e23)
2025-04-21T05:51:28.0180714Z     #2 PyMem_RawFree /__w/jax/jax/cpython/Objects/obmalloc.c:989:5 (python3.14+0x2cecb4) (BuildId: eefa0d7b7d9f04b19cecc485006a6f3518071e23)
2025-04-21T05:51:28.0183087Z     #3 free_threadstate /__w/jax/jax/cpython/Python/pystate.c:1486:9 (python3.14+0x4cdca1) (BuildId: eefa0d7b7d9f04b19cecc485006a6f3518071e23)
2025-04-21T05:51:28.0185149Z     #4 _PyThreadState_DeleteCurrent /__w/jax/jax/cpython/Python/pystate.c:1909:5 (python3.14+0x4cdca1)
2025-04-21T05:51:28.0187230Z     #5 thread_run /__w/jax/jax/cpython/./Modules/_threadmodule.c:371:5 (python3.14+0x59f81c) (BuildId: eefa0d7b7d9f04b19cecc485006a6f3518071e23)
2025-04-21T05:51:28.0189489Z     #6 pythread_wrapper /__w/jax/jax/cpython/Python/thread_pthread.h:242:5 (python3.14+0x4f9017) (BuildId: eefa0d7b7d9f04b19cecc485006a6f3518071e23)

and reading the code this makes sense; PyEval_SetProfileAllThreads does nothing to prevent threads from starting or stopping during its execution.

CPython versions tested on:

3.14

Operating systems tested on:

Linux

@hawkinsp hawkinsp added the type-bug An unexpected behavior, bug, or error label Apr 22, 2025
@ZeroIntensity
Copy link
Member

Very similar to #132296. I'm not totally sure what the best way to deal with this is. Stop-the-world isn't really an option because _PyEval_SetProfile is re-entrant.

@ZeroIntensity ZeroIntensity added interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.13 bugs and security fixes topic-free-threading 3.14 new features, bugs and security fixes labels Apr 22, 2025
@hawkinsp
Copy link
Contributor Author

hawkinsp commented Apr 23, 2025

Without knowing this code well, I wonder if you could keep the thread states (the list of which is read-mostly?) in a QSBR-style data structure so it's safe to iterate on them even in the presence of concurrent list updates.

@ZeroIntensity
Copy link
Member

There's a few ways to deal with the concurrent deletion of the thread state, but the much bigger problem is that thread states are, by design, not thread-safe, because they're supposed to be thread-local.

@ZeroIntensity
Copy link
Member

A crazy idea that could possibly work: I guess we could re-enable the GIL right before the loop to prevent other threads from messing things up, and then disable it again at the end.

@colesbury
Copy link
Contributor

colesbury commented Apr 25, 2025

The whole iteration over thread states isn't safe even in the GIL-enabled build. It's a mess. I think the "right" way to do this is something like:

  1. Move the _PySys_Audit() earlier and call it only once instead of once per-thread
  2. Stop the world before iteration over thread states (or at least hold the HEAD_LOCK() for the entire loop)
  3. Make sure that nothing else in _PyEval_SetProfile() is possibly reentrant

I wonder if you could keep the thread states (the list of which is read-mostly?) in a QSBR-style data structure...

I've played around with this before and it started to get really complicated. I think we may want to pursue this eventually if we contention on HEAD_LOCK() becomes a problem, but probably not immediately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants