Commit 306a197

Update article.md
1 parent dc7a157 commit 306a197

1 file changed, +7 -7 lines changed

1-js/99-js-misc/06-unicode/article.md

@@ -2,7 +2,7 @@
 # Unicode, String internals
 
 ```warn header="Advanced knowledge"
-The section goes deeper into string internals. This knowledge will be useful for you if you plan to deal with emoji, rare mathematical or hieroglyphic characters, or other rare symbols.
+The section goes deeper into string internals. This knowledge will be useful for you if you plan to deal with emoji, rare mathematical or logographic characters, or other rare symbols.
 ```
 
 As we already know, JavaScript strings are based on [Unicode](https://en.wikipedia.org/wiki/Unicode): each character is represented by a byte sequence of 1-4 bytes.
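
Note on the context line above: the 1-4 byte range matches the UTF-8 encoding, while `length` on a JavaScript string counts 16-bit UTF-16 code units. The following is a minimal sketch of that difference, assuming a runtime where `TextEncoder` is available globally (modern browsers and Node.js). `utf8ByteLength` is a hypothetical helper name used only for illustration, and `console.log` stands in for the article's `alert` so the snippet also runs outside a browser.

```javascript
// Hypothetical helper: number of bytes the string occupies when encoded as UTF-8.
// Assumes TextEncoder exists as a global (browsers, Node.js).
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

// One sample for each UTF-8 length from 1 to 4 bytes.
const samples = ["z", "\u044F", "\u2191", "\u{1F60D}"];

for (const ch of samples) {
  console.log(
    ch,
    "UTF-8 bytes:", utf8ByteLength(ch),
    "UTF-16 code units:", ch.length
  );
}
```

Only the last sample needs two UTF-16 code units, which is the case the "Surrogate pairs" section of the article deals with.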
@@ -13,9 +13,9 @@ JavaScript allows us to insert a character into a string by specifying its hexad
 
 `XX` must be two hexadecimal digits with a value between `00` and `FF`, then `\xXX` is the character whose Unicode code is `XX`.
 
-Because the `\xXX` notation supports only two digits, it can be used only for the first 256 Unicode characters.
+Because the `\xXX` notation supports only two hexadecimal digits, it can be used only for the first 256 Unicode characters.
 
-These first 256 characters include the latin alphabet, most basic syntax characters, and some others. For example, `"\x7A"` is the same as `"z"` (Unicode `U+007A`).
+These first 256 characters include the Latin alphabet, most basic syntax characters, and some others. For example, `"\x7A"` is the same as `"z"` (Unicode `U+007A`).
 
 ```js run
 alert( "\x7A" ); // z
@@ -29,7 +29,7 @@ JavaScript allows us to insert a character into a string by specifying its hexad
 
 ```js run
 alert( "\u00A9" ); // ©, the same as \xA9, using the 4-digit hex notation
-alert( "\u044F" ); // я, the cyrillic alphabet letter
+alert( "\u044F" ); // я, the Cyrillic alphabet letter
 alert( "\u2191" ); // ↑, the arrow up symbol
 ```
 
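
The two hunks above describe the `\xXX` and `\uXXXX` escapes. As a quick cross-check, a sketch like the following compares each escaped literal with the character it is expected to produce and prints its code in `U+XXXX` form. The `pairs` array is made up for illustration, and `console.log` replaces `alert` so the snippet runs in Node.js as well as in a browser.

```javascript
// Each entry holds an escaped literal and the character it should be equal to.
const pairs = [
  ["\x7A", "z"],       // two-digit hex escape
  ["\xA9", "\u00A9"],  // the same character written in both notations
  ["\u044F", "я"],     // Cyrillic letter
  ["\u2191", "↑"],     // arrow up
];

for (const [escaped, expected] of pairs) {
  // charCodeAt(0) returns the 16-bit code of the first (and only) unit.
  const hex = escaped.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0");
  console.log(`U+${hex}`, escaped, escaped === expected);
}

// String.fromCharCode builds the same characters from their numeric codes.
console.log(String.fromCharCode(0x7a, 0xa9, 0x44f, 0x2191));
```

Every comparison in the loop should print `true`, since the escape sequences are resolved when the literal is parsed and leave no trace in the resulting string.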
@@ -38,13 +38,13 @@ JavaScript allows us to insert a character into a string by specifying its hexad
 `XXXXXXX` must be a hexadecimal value of 1 to 6 bytes between `0` and `10FFFF` (the highest code point defined by Unicode). This notation allows us to easily represent all existing Unicode characters.
 
 ```js run
-alert( "\u{20331}" ); // 佫, a rare Chinese hieroglyph (long Unicode)
+alert( "\u{20331}" ); // 佫, a rare Chinese character (long Unicode)
 alert( "\u{1F60D}" ); // 😍, a smiling face symbol (another long Unicode)
 ```
 
 ## Surrogate pairs
 
-All frequently used characters have 2-byte codes. Letters in most european languages, numbers, and even most hieroglyphs, have a 2-byte representation.
+All frequently used characters have 2-byte codes (4 hex digits). Letters in most European languages, numbers, and the basic CJK ideograph set (from Chinese, Japanese, and Korean writing systems), have a 2-byte representation.
 
 Initially, JavaScript was based on UTF-16 encoding that only allowed 2 bytes per character. But 2 bytes only allow 65536 combinations and that's not enough for every possible symbol of Unicode.
 
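
Note on the first context line of this hunk: the braces of `\u{…}` take one to six hexadecimal digits (not bytes), with a value up to `10FFFF`. The sketch below, which uses only the standard `codePointAt` and `String.fromCodePoint` methods, shows that the two sample characters lie above `U+FFFF` and therefore cannot fit into the 65536 values that 2 bytes provide. The variable names are illustrative.

```javascript
// The braces form accepts one to six hexadecimal digits.
const rare = "\u{20331}";
const face = "\u{1F60D}";
console.log("\u{7A}" === "z"); // true: short values work too

// codePointAt(0) reads the whole code point, even above U+FFFF.
console.log(rare.codePointAt(0).toString(16)); // 20331
console.log(face.codePointAt(0).toString(16)); // 1f60d

// String.fromCodePoint is the inverse operation.
console.log(String.fromCodePoint(0x1f60d) === face); // true

// Two bytes give 2 ** 16 = 65536 distinct values, so the largest is 0xFFFF.
console.log(2 ** 16);                             // 65536
console.log(0x20331 > 0xffff, 0x1f60d > 0xffff);  // true true
```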
@@ -55,7 +55,7 @@ As a side effect, the length of such symbols is `2`:
 ```js run
 alert( '𝒳'.length ); // 2, MATHEMATICAL SCRIPT CAPITAL X
 alert( '😂'.length ); // 2, FACE WITH TEARS OF JOY
-alert( '𩷶'.length ); // 2, a rare Chinese hieroglyph
+alert( '𩷶'.length ); // 2, a rare Chinese character
 ```
 
 That's because surrogate pairs did not exist at the time when JavaScript was created, and thus are not correctly processed by the language!
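
To make the `length` of `2` concrete, here is a minimal sketch of the standard UTF-16 arithmetic that splits a code point above `U+FFFF` into two 16-bit units. `toSurrogatePair` is a hypothetical helper written for this note, not a built-in and not code from the article.

```javascript
// Hypothetical helper: split a supplementary code point into its UTF-16 pair.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("expected a code point in U+10000..U+10FFFF");
  }
  const offset = codePoint - 0x10000;     // 20-bit value
  const high = 0xd800 + (offset >> 10);   // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits
  return [high, low];
}

const x = "\u{1D4B3}"; // MATHEMATICAL SCRIPT CAPITAL X
console.log(x.length);                      // 2
console.log(x.charCodeAt(0).toString(16));  // d835
console.log(x.charCodeAt(1).toString(16));  // dcb3
console.log(toSurrogatePair(0x1d4b3).map((n) => n.toString(16))); // [ 'd835', 'dcb3' ]

// String iteration walks code points, so spreading counts the symbol once.
console.log([...x].length);                 // 1
```

Spreading the string (or using `Array.from`) is one way to count such symbols as single characters when `length` is not what is wanted.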