Skip to content

Fix dangling‐capture in TestNodeConfig::complete_func to eliminate flaky IDLE returns #967

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

pepeRossRobotics
Copy link

@pepeRossRobotics pepeRossRobotics commented Apr 23, 2025

Summary

When using TestNode for JSON-driven substitution rules, the default complete_func lambda in
TestNodeConfig was capturing this (the address of a temporary). After move-constructing into the member _test_config, that pointer became dangling and often returned IDLE from onRunning(), triggering a LogicError. This patch changes the capture to copy return_status by value, fully eliminating the UB and making the test nodes reliably return SUCCESS or FAILURE as configured.

Background / Motivation

Reported in issue #927: “TestNode sometimes returns IDLE onRunning()”
Reproduced in a minimal repo: test nodes intermittently returned IDLE even when configured for SUCCESS
Added file/minitrace loggers, saw onRunning() → IDLE and confirmed it came from complete_func()
Debugging revealed complete_func = [this](){…} pointed at a destroyed temporary
Changes

--- a/include/behaviortree_cpp/test_node.h
+++ b/include/behaviortree_cpp/test_node.h
@@ struct TestNodeConfig
-  std::function<NodeStatus(void)> complete_func = [this]() { return return_status; };
+  /// Capture return_status by value to avoid dangling this-pointer
+  std::function<NodeStatus(void)> complete_func = [ret = return_status]() { return ret; };

Before:

onStart() → RUNNING
onRunning() → complete_func() returns IDLE  ← UB, crashes

After:

onStart() → RUNNING
onRunning() → complete_func() returns SUCCESS ← correct

Testing

Added file- and minitrace-logger in user code to capture every tick.
Wrote a small BT with a single <Action ID="Test" …/> substituted via JSON.
Verified 1st tick RUNNING, 2nd tick SUCCESS.
Ran full scenario suite — no more LogicError.

Related issues Issues:
#927: #927
and maybe
#930: #930

Thanks for reviewing 🙏 Let me know if you’d like any tweaks!

…by value:

Now every TestNodeConfig instance has its own copy of return_status, and no dangling pointer ever occurs, eliminating the illegal IDLE return in onRunning().
@facontidavide
Copy link
Collaborator

CI is red

@facontidavide
Copy link
Collaborator

I opted for a different solution, but thanks for finding the issue 😄

3540b4a

@facontidavide
Copy link
Collaborator

also, thinking more carefully about it, I decided to make a more radical change: 2da7906

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants