Skip to content

Add Uri\WhatWg classes to ext/uri #18672

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

kocsismate
Copy link
Member

Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comments regarding the tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that these tests naturally are all WhatWg specific, it would make sense to move them into a subdirectory:

ext/uri/tests/whatwg/*

Possibly with an even deeper structure:

ext/uri/tests/whatwg/parse/*
ext/uri/tests/whatwg/getters/*
ext/uri/tests/whatwg/stringify/*
ext/uri/tests/whatwg/withers/*

or something like that. This would make it easy to check whether something is already tested and would keep things organized when adding tests after the fact.

ext/random is smaller, but perhaps it can serve as an inspiration (I've went through every test and gave it a proper name): https://github.com/php/php-src/tree/master/ext/random/tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no, these are not specific to WHATWG, I just selectively removed all RFC 3986 related parts from each test (and from every other file). I also planned some test renaming as I saw that tests were currently out of control, but this would be much easier to do after the Uri class is also merged so that the conflicts are avoided.

@@ -0,0 +1,32 @@
--TEST--
Parse URL exotic URLs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the improved directory structure, having a common prefix in the test title would also be helpful when looking at the test output. Specifically the “WhatWg” information is currently missing.

Suggested change
Parse URL exotic URLs
Uri: WhatWg: parse(): Exotic URLs

Also an example:

Random: Randomizer: getInt(): Basic functionality

Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some first comments before ending my day. I'm having some troubles with the amount of “indirection” / “function pointers” handling things in a very abstract fashion that I'm not sure is always necessary.

Comment on lines +140 to +146
if (property_handler->write_func(new_internal_uri, property_zv, &errors) == FAILURE) { \
throw_invalid_uri_exception(new_internal_uri->handler, &errors); \
zval_ptr_dtor(&errors); \
zend_object_release(new_object); \
zval_ptr_dtor(property_zv); \
RETURN_THROWS(); \
} \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be cleaner if the write_func() would throw the exception by itself. This gives it best control over the exception data.

If necessary a silent parameter could be added, but extensions are already able to suppress any exceptions if they are undesired, so this is probably not necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to throw in the handlers themselves because of the internal API: different kinds of exceptions/errors may be used in different contexts, especially when code needs to be backward compatible. For example SOAP uses a SoapFault with the "Unable to parse URL" message (https://github.com/php/php-src/blob/cf0c982b0ba937f4b94923c8fe4d541f0181a283/ext/soap/php_http.c#L467) to indicate an error, while the FILTER_VALIDATE_URL filter uses the false return value etc. Another example from the code of the RFC itself is the parse_uri handler that you spotted below: it should generally throw an InvalidUriException, except when it's used during unserialization.

That's why my reasoning is that handlers should be free of any exception throwing, and the caller should decide based on the context what to do with the error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why my reasoning is that handlers should be free of any exception throwing, and the caller should decide based on the context what to do with the error.

When internal code throws an exception while an Exception is already active, it will automatically set the $previous value to the active exception. This matches the:

It MUST set the $previous property to the original exception when doing so.

rule from the Throwable policy RFC (https://externals.io/message/127340) that is very likely to pass. Thus it is super easy to wrap the exception during unserialization (and probably also in Soap). You can see this pattern in action at:

if (FAILURE == php_random_bytes_throw(ZSTR_VAL(buffer), ZSTR_LEN(buffer))) {
zend_value_error("Unable to generate salt");
zend_string_release_ex(buffer, 0);
return NULL;
}

Where the Random\RandomException will be wrapped into the ValueError.


Generally speaking, APIs that do not throw an Exception on error should become more rare going forward, thus throwing an exception is the "standard case". In case an exception is absolutely undesired either a silent parameter would work or alternatively the exception can be suppressed with zend_clear_exception() after extracting the message.

Comment on lines +243 to +246
if (this_str == NULL) {
zend_throw_exception_ex(NULL, 0, "Cannot recompose %s to string", ZSTR_VAL(this_object->ce->name));
RETURN_THROWS();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this reachable? If not I would just ZEND_ASSERT(this_str). Same elsewhere.

Copy link
Member Author

@kocsismate kocsismate May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current classes won't fail during recomposition, that's true. However, if we want to expose uri_equals in the URI parsing API (I think we should (?)), then:

  • I'll have to refactor it a little bit so that it doesn't depend on ZPP etc.
  • We have to take into account other 3rd party classes that may fail during recomposition (e.g. when their URL parser is not integrated into PHP's memory handler).

Comment on lines +20 to +27
extern zend_class_entry *whatwg_url_ce;
extern zend_object_handlers whatwg_uri_object_handlers;
extern zend_class_entry *uri_comparison_mode_ce;
extern zend_class_entry *uri_exception_ce;
extern zend_class_entry *invalid_uri_exception_ce;
extern zend_class_entry *whatwg_invalid_url_exception_ce;
extern zend_class_entry *whatwg_url_validation_error_type_ce;
extern zend_class_entry *whatwg_url_validation_error_ce;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming suggestion from #18658 apply.

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round, didn't check the tests yet

RETURN_THROWS();
}

object_properties_load(object, Z_ARRVAL_P(arr));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: check if this is always safe or if this can be abused in crazy ways

ZEND_ASSERT(new_object != NULL);
uri_object_t *new_uri_object = uri_object_from_obj(new_object);

URI_ASSERT_INITIALIZATION(internal_uri);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should happen at the start of the function to avoid doing needless work + to avoid memleaks

lxb_url_t *url, *lexbor_base_url = NULL;

if (base_url) {
lexbor_base_url = (lxb_url_t *) base_url;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are casting away the const. lexbor_base_url should be const lxb_url_t * and the cast removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not fully resolved: The cast needs to be removed. In fact, the if (base_url) is also unnecessary. You can just do: const lxb_url_t *lexbor_base_url = base_url;.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the simplification works!

Comment on lines +542 to +548
zend_update_property_string(
whatwg_invalid_url_exception_ce,
Z_OBJ_P(exception_zv),
ZSTR_VAL(ZSTR_KNOWN(ZEND_STR_MESSAGE)),
ZSTR_LEN(ZSTR_KNOWN(ZEND_STR_MESSAGE)),
"URL parsing failed"
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would find this a little cleaner. Using a known string just to decompose it into val + length doesn't bring any benefit:

Suggested change
zend_update_property_string(
whatwg_invalid_url_exception_ce,
Z_OBJ_P(exception_zv),
ZSTR_VAL(ZSTR_KNOWN(ZEND_STR_MESSAGE)),
ZSTR_LEN(ZSTR_KNOWN(ZEND_STR_MESSAGE)),
"URL parsing failed"
);
zval value;
ZVAL_STRING(&value, "URL parsing failed");
zend_update_property_ex(whatwg_invalid_url_exception_ce, Z_OBJ_P(exception_zv), ZSTR_KNOWN(ZEND_STR_MESSAGE), &value);
zval_ptr_dtor(&value);

Comment on lines +62 to +68
#define CHECK_WRITE_RESULT(status, lexbor_uri, str, errors) do { \
if (status != LXB_STATUS_OK) { \
fill_errors(errors); \
return FAILURE; \
} \
return SUCCESS; \
} while (0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that this macro is not much simpler than simply using copy and paste of the definition. To me it makes the code much harder to follow, though, since it is not clear that it unconditionally returns and since it also has unused parameters.

Z_PARAM_BOOL(failure)
ZEND_PARSE_PARAMETERS_END();

zend_update_property_str(whatwg_url_validation_error_ce, Z_OBJ_P(ZEND_THIS), "context", sizeof("context") - 1, context);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need the following after every assignment:

	/* The assignment might fail due to 'readonly'. */
	if (UNEXPECTED(EG(exception))) {
		RETURN_THROWS();
	}

Test case:

<?php

use Uri\WhatWg\UrlValidationErrorType;

$r = new Uri\WhatWg\UrlValidationError('foo', UrlValidationErrorType::DomainInvalidCodePoint, true);
$r->__construct('bar', UrlValidationErrorType::HostMissing, false);

var_dump($r);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was an oversight that I forgot these checks. I'll add the test case.

whatwg_url_validation_error_ce = register_class_Uri_WhatWg_UrlValidationError();
whatwg_url_validation_error_type_ce = register_class_Uri_WhatWg_UrlValidationErrorType();

zend_hash_init(&uri_handlers, 4, NULL, ZVAL_PTR_DTOR, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ZVAL_PTR_DTOR the correct destructor? Should it perhaps rather be something that calls handler->destroy_parser()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, ZVAL_PTR_DTOR shouldn't be there indeed, no matter that it doesn't do anything as far as I see, because the hash content is not refcounted (

if (Z_REFCOUNTED_P(zval_ptr)) {
). Calling destroy_parser() there is not good either, because this handler is called in RSHUTDOWN (as discussed under Niels' 2nd comment: #18672 (comment)).

I think skipping the destructor entirely works well in this case.

} while (0)

#define URI_REGISTER_PROPERTY_READ_WRITE_HANDLER(property_handlers, name, property_read_func, property_write_func) do { \
static const uri_property_handler_t handler = {.read_func = property_read_func, .write_func = property_write_func}; \
Copy link
Member

@TimWolla TimWolla May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to see why this isn't just storing a stack pointer (I missed the static). I think we can do without the macro by using zend_hash_add_new_mem() in uri_register_property_handler(), changing the signature to void uri_register_property_handler(HashTable *property_handlers, zend_string *name, const uri_property_handler_t handler) and then just calling:

	uri_register_property_handler(&lexbor_property_handlers, ZSTR_KNOWN(ZEND_STR_SCHEME), (uri_property_handler_t){
		.read_func = lexbor_read_scheme,
		.write_func = lexbor_write_scheme,
	});

like this. It's a little more verbose, but very clear in what is happening.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the current approach was taken over from the DOM extension. DOM used to have the _add_mem approach you describe, but I stepped away from it because the data doesn't need to be on the heap and can sit in rodata

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I'll have another look at this, when I perform an "in-depth" review after the first round of fixes.

@kocsismate
Copy link
Member Author

I addressed most review comments (that I marked with 👀 ). I'll continue tomorrow with the rest of the comments.

Niels, please let me know if you'd prefer to resolve your comments yourself or I should do it. I'll resolve the obvious ones for Tim tomorrow.

@nielsdos
Copy link
Member

Niels, please let me know if you'd prefer to resolve your comments yourself or I should do it. I'll resolve the obvious ones for Tim tomorrow.

It's fine by me if you resolve them. If you'd like some extra hands helping with the coding just hmu.

return result;
}

zend_string *object_str = ZSTR_INIT_LITERAL("(object value omitted)", false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just before I sign off today: this string initialization can now be removed.

Comment on lines +233 to +241
zval error_type;
zend_enum_new(&error_type, whatwg_url_validation_error_type_ce, error_str, NULL);
zend_update_property(whatwg_url_validation_error_ce, Z_OBJ(error), ZEND_STRL("type"), &error_type);
zend_string_release_ex(error_str, false);
zval_ptr_dtor(&error_type);

zend_update_property(whatwg_url_validation_error_ce, Z_OBJ(error), ZEND_STRL("failure"), &failure);

add_next_index_zval(errors, &error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
zval error_type;
zend_enum_new(&error_type, whatwg_url_validation_error_type_ce, error_str, NULL);
zend_update_property(whatwg_url_validation_error_ce, Z_OBJ(error), ZEND_STRL("type"), &error_type);
zend_string_release_ex(error_str, false);
zval_ptr_dtor(&error_type);
zend_update_property(whatwg_url_validation_error_ce, Z_OBJ(error), ZEND_STRL("failure"), &failure);
add_next_index_zval(errors, &error);
zval error_type;
zend_enum_new(&error_type, whatwg_url_validation_error_type_ce, error_str, NULL);
zend_update_property(whatwg_url_validation_error_ce, Z_OBJ(error), ZEND_STRL("type"), &error_type);
zval_ptr_dtor(&error_type);
zend_update_property(whatwg_url_validation_error_ce, Z_OBJ(error), ZEND_STRL("failure"), &failure);
add_next_index_zval(errors, &error);
zend_string_release_ex(error_str, false);

To me it makes more sense to release the error_str at the very bottom of the loop, since it is initialized at the top. It feels odd to do that in-between, which makes it more likely to miss it (leading to use-after-frees after refactoring).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually try to free stuffs as soon as possible so that I don't have to worry about memleaks when new stop conditions are added before the free operation. Freeing late is defensive to avoid possible use-after-frees indeed, but on the other hand, it makes it easier to commit memleaks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants