-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Add Uri\WhatWg classes to ext/uri #18672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comments regarding the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that these tests naturally are all WhatWg specific, it would make sense to move them into a subdirectory:
ext/uri/tests/whatwg/*
Possibly with an even deeper structure:
ext/uri/tests/whatwg/parse/*
ext/uri/tests/whatwg/getters/*
ext/uri/tests/whatwg/stringify/*
ext/uri/tests/whatwg/withers/*
or something like that. This would make it easy to check whether something is already tested and would keep things organized when adding tests after the fact.
ext/random is smaller, but perhaps it can serve as an inspiration (I've went through every test and gave it a proper name): https://github.com/php/php-src/tree/master/ext/random/tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah no, these are not specific to WHATWG, I just selectively removed all RFC 3986 related parts from each test (and from every other file). I also planned some test renaming as I saw that tests were currently out of control, but this would be much easier to do after the Uri
class is also merged so that the conflicts are avoided.
@@ -0,0 +1,32 @@ | |||
--TEST-- | |||
Parse URL exotic URLs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the improved directory structure, having a common prefix in the test title would also be helpful when looking at the test output. Specifically the “WhatWg” information is currently missing.
Parse URL exotic URLs | |
Uri: WhatWg: parse(): Exotic URLs |
Also an example:
Random: Randomizer: getInt(): Basic functionality |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some first comments before ending my day. I'm having some troubles with the amount of “indirection” / “function pointers” handling things in a very abstract fashion that I'm not sure is always necessary.
if (property_handler->write_func(new_internal_uri, property_zv, &errors) == FAILURE) { \ | ||
throw_invalid_uri_exception(new_internal_uri->handler, &errors); \ | ||
zval_ptr_dtor(&errors); \ | ||
zend_object_release(new_object); \ | ||
zval_ptr_dtor(property_zv); \ | ||
RETURN_THROWS(); \ | ||
} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be cleaner if the write_func()
would throw the exception by itself. This gives it best control over the exception data.
If necessary a silent
parameter could be added, but extensions are already able to suppress any exceptions if they are undesired, so this is probably not necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to throw in the handlers themselves because of the internal API: different kinds of exceptions/errors may be used in different contexts, especially when code needs to be backward compatible. For example SOAP uses a SoapFault
with the "Unable to parse URL" message (https://github.com/php/php-src/blob/cf0c982b0ba937f4b94923c8fe4d541f0181a283/ext/soap/php_http.c#L467) to indicate an error, while the FILTER_VALIDATE_URL
filter uses the false
return value etc. Another example from the code of the RFC itself is the parse_uri
handler that you spotted below: it should generally throw an InvalidUriException
, except when it's used during unserialization.
That's why my reasoning is that handlers should be free of any exception throwing, and the caller should decide based on the context what to do with the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why my reasoning is that handlers should be free of any exception throwing, and the caller should decide based on the context what to do with the error.
When internal code throws an exception while an Exception is already active, it will automatically set the $previous
value to the active exception. This matches the:
It MUST set the
$previous
property to the original exception when doing so.
rule from the Throwable policy RFC (https://externals.io/message/127340) that is very likely to pass. Thus it is super easy to wrap the exception during unserialization (and probably also in Soap). You can see this pattern in action at:
php-src/ext/standard/password.c
Lines 87 to 91 in c4fba37
if (FAILURE == php_random_bytes_throw(ZSTR_VAL(buffer), ZSTR_LEN(buffer))) { | |
zend_value_error("Unable to generate salt"); | |
zend_string_release_ex(buffer, 0); | |
return NULL; | |
} |
Where the Random\RandomException
will be wrapped into the ValueError
.
Generally speaking, APIs that do not throw an Exception on error should become more rare going forward, thus throwing an exception is the "standard case". In case an exception is absolutely undesired either a silent
parameter would work or alternatively the exception can be suppressed with zend_clear_exception()
after extracting the message.
if (this_str == NULL) { | ||
zend_throw_exception_ex(NULL, 0, "Cannot recompose %s to string", ZSTR_VAL(this_object->ce->name)); | ||
RETURN_THROWS(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this reachable? If not I would just ZEND_ASSERT(this_str)
. Same elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current classes won't fail during recomposition, that's true. However, if we want to expose uri_equals
in the URI parsing API (I think we should (?)), then:
- I'll have to refactor it a little bit so that it doesn't depend on ZPP etc.
- We have to take into account other 3rd party classes that may fail during recomposition (e.g. when their URL parser is not integrated into PHP's memory handler).
extern zend_class_entry *whatwg_url_ce; | ||
extern zend_object_handlers whatwg_uri_object_handlers; | ||
extern zend_class_entry *uri_comparison_mode_ce; | ||
extern zend_class_entry *uri_exception_ce; | ||
extern zend_class_entry *invalid_uri_exception_ce; | ||
extern zend_class_entry *whatwg_invalid_url_exception_ce; | ||
extern zend_class_entry *whatwg_url_validation_error_type_ce; | ||
extern zend_class_entry *whatwg_url_validation_error_ce; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming suggestion from #18658 apply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First round, didn't check the tests yet
RETURN_THROWS(); | ||
} | ||
|
||
object_properties_load(object, Z_ARRVAL_P(arr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: check if this is always safe or if this can be abused in crazy ways
ext/uri/php_uri.c
Outdated
ZEND_ASSERT(new_object != NULL); | ||
uri_object_t *new_uri_object = uri_object_from_obj(new_object); | ||
|
||
URI_ASSERT_INITIALIZATION(internal_uri); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should happen at the start of the function to avoid doing needless work + to avoid memleaks
lxb_url_t *url, *lexbor_base_url = NULL; | ||
|
||
if (base_url) { | ||
lexbor_base_url = (lxb_url_t *) base_url; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are casting away the const
. lexbor_base_url
should be const lxb_url_t *
and the cast removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not fully resolved: The cast needs to be removed. In fact, the if (base_url)
is also unnecessary. You can just do: const lxb_url_t *lexbor_base_url = base_url;
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the simplification works!
zend_update_property_string( | ||
whatwg_invalid_url_exception_ce, | ||
Z_OBJ_P(exception_zv), | ||
ZSTR_VAL(ZSTR_KNOWN(ZEND_STR_MESSAGE)), | ||
ZSTR_LEN(ZSTR_KNOWN(ZEND_STR_MESSAGE)), | ||
"URL parsing failed" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would find this a little cleaner. Using a known string just to decompose it into val + length doesn't bring any benefit:
zend_update_property_string( | |
whatwg_invalid_url_exception_ce, | |
Z_OBJ_P(exception_zv), | |
ZSTR_VAL(ZSTR_KNOWN(ZEND_STR_MESSAGE)), | |
ZSTR_LEN(ZSTR_KNOWN(ZEND_STR_MESSAGE)), | |
"URL parsing failed" | |
); | |
zval value; | |
ZVAL_STRING(&value, "URL parsing failed"); | |
zend_update_property_ex(whatwg_invalid_url_exception_ce, Z_OBJ_P(exception_zv), ZSTR_KNOWN(ZEND_STR_MESSAGE), &value); | |
zval_ptr_dtor(&value); |
#define CHECK_WRITE_RESULT(status, lexbor_uri, str, errors) do { \ | ||
if (status != LXB_STATUS_OK) { \ | ||
fill_errors(errors); \ | ||
return FAILURE; \ | ||
} \ | ||
return SUCCESS; \ | ||
} while (0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that this macro is not much simpler than simply using copy and paste of the definition. To me it makes the code much harder to follow, though, since it is not clear that it unconditionally returns and since it also has unused parameters.
ext/uri/php_uri.c
Outdated
Z_PARAM_BOOL(failure) | ||
ZEND_PARSE_PARAMETERS_END(); | ||
|
||
zend_update_property_str(whatwg_url_validation_error_ce, Z_OBJ_P(ZEND_THIS), "context", sizeof("context") - 1, context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need the following after every assignment:
/* The assignment might fail due to 'readonly'. */
if (UNEXPECTED(EG(exception))) {
RETURN_THROWS();
}
Test case:
<?php
use Uri\WhatWg\UrlValidationErrorType;
$r = new Uri\WhatWg\UrlValidationError('foo', UrlValidationErrorType::DomainInvalidCodePoint, true);
$r->__construct('bar', UrlValidationErrorType::HostMissing, false);
var_dump($r);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it was an oversight that I forgot these checks. I'll add the test case.
whatwg_url_validation_error_ce = register_class_Uri_WhatWg_UrlValidationError(); | ||
whatwg_url_validation_error_type_ce = register_class_Uri_WhatWg_UrlValidationErrorType(); | ||
|
||
zend_hash_init(&uri_handlers, 4, NULL, ZVAL_PTR_DTOR, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ZVAL_PTR_DTOR
the correct destructor? Should it perhaps rather be something that calls handler->destroy_parser()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, ZVAL_PTR_DTOR
shouldn't be there indeed, no matter that it doesn't do anything as far as I see, because the hash content is not refcounted (
Line 42 in b068c2f
if (Z_REFCOUNTED_P(zval_ptr)) { |
destroy_parser()
there is not good either, because this handler is called in RSHUTDOWN
(as discussed under Niels' 2nd comment: #18672 (comment)).
I think skipping the destructor entirely works well in this case.
} while (0) | ||
|
||
#define URI_REGISTER_PROPERTY_READ_WRITE_HANDLER(property_handlers, name, property_read_func, property_write_func) do { \ | ||
static const uri_property_handler_t handler = {.read_func = property_read_func, .write_func = property_write_func}; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a while to see why this isn't just storing a stack pointer (I missed the static). I think we can do without the macro by using zend_hash_add_new_mem()
in uri_register_property_handler()
, changing the signature to void uri_register_property_handler(HashTable *property_handlers, zend_string *name, const uri_property_handler_t handler)
and then just calling:
uri_register_property_handler(&lexbor_property_handlers, ZSTR_KNOWN(ZEND_STR_SCHEME), (uri_property_handler_t){
.read_func = lexbor_read_scheme,
.write_func = lexbor_write_scheme,
});
like this. It's a little more verbose, but very clear in what is happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the current approach was taken over from the DOM extension. DOM used to have the _add_mem approach you describe, but I stepped away from it because the data doesn't need to be on the heap and can sit in rodata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I'll have another look at this, when I perform an "in-depth" review after the first round of fixes.
I addressed most review comments (that I marked with 👀 ). I'll continue tomorrow with the rest of the comments. Niels, please let me know if you'd prefer to resolve your comments yourself or I should do it. I'll resolve the obvious ones for Tim tomorrow. |
It's fine by me if you resolve them. If you'd like some extra hands helping with the coding just hmu. |
return result; | ||
} | ||
|
||
zend_string *object_str = ZSTR_INIT_LITERAL("(object value omitted)", false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just before I sign off today: this string initialization can now be removed.
zval error_type; | ||
zend_enum_new(&error_type, whatwg_url_validation_error_type_ce, error_str, NULL); | ||
zend_update_property(whatwg_url_validation_error_ce, Z_OBJ(error), ZEND_STRL("type"), &error_type); | ||
zend_string_release_ex(error_str, false); | ||
zval_ptr_dtor(&error_type); | ||
|
||
zend_update_property(whatwg_url_validation_error_ce, Z_OBJ(error), ZEND_STRL("failure"), &failure); | ||
|
||
add_next_index_zval(errors, &error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zval error_type; | |
zend_enum_new(&error_type, whatwg_url_validation_error_type_ce, error_str, NULL); | |
zend_update_property(whatwg_url_validation_error_ce, Z_OBJ(error), ZEND_STRL("type"), &error_type); | |
zend_string_release_ex(error_str, false); | |
zval_ptr_dtor(&error_type); | |
zend_update_property(whatwg_url_validation_error_ce, Z_OBJ(error), ZEND_STRL("failure"), &failure); | |
add_next_index_zval(errors, &error); | |
zval error_type; | |
zend_enum_new(&error_type, whatwg_url_validation_error_type_ce, error_str, NULL); | |
zend_update_property(whatwg_url_validation_error_ce, Z_OBJ(error), ZEND_STRL("type"), &error_type); | |
zval_ptr_dtor(&error_type); | |
zend_update_property(whatwg_url_validation_error_ce, Z_OBJ(error), ZEND_STRL("failure"), &failure); | |
add_next_index_zval(errors, &error); | |
zend_string_release_ex(error_str, false); |
To me it makes more sense to release the error_str
at the very bottom of the loop, since it is initialized at the top. It feels odd to do that in-between, which makes it more likely to miss it (leading to use-after-frees after refactoring).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually try to free stuffs as soon as possible so that I don't have to worry about memleaks when new stop conditions are added before the free
operation. Free
ing late is defensive to avoid possible use-after-frees indeed, but on the other hand, it makes it easier to commit memleaks.
Relates to #14461 and https://wiki.php.net/rfc/url_parsing_api