Skip to content

Commit 9ac2438

Browse files
committed
Handle encodings better; make the sum type "public"
Windows does not have a direct analogue of LANG=C or LC_ALL=C. Some programs give them special treatment, but they do not affect the way localized behavior of the Windows API operates. In particular, the bash.exe WSL wrapper, as well as wsl.exe and wslconfig.exe, do not produce their own localized messages (for errors not originating in a running distribution) when they are set. Windows also provides significant localization through localized versions of Windows, so changing language settings in Windows, even system-wide, does not always produce the same effect that many or most Windows users who use a particular language would experience. Various encodings may appear when bash.exe is WSL-related but gives its own error message. Such a message is often in UTF-16LE, which is what Windows uses internally, and preserves any localization. That is the more well behaved scenario and already was detected; this commit moves, but does not change, the code for that. The situation where it is not UTF-16LE was previously handled by treating it as UTF-8. Because the default strict error treatment was used, this would error out in test discovery in some localized setups, preventing all tests in test_index from running, including the majority of them that are not related to hooks. This fixes that by doing better detection that should decode the mesages correctly most of the time, that should in practice decode them well enough to tell (by the aka.ms URL) if the message is complaining about there being no installed distribution all(?) of the time, and that should avoid breaking unrelated tests even if that can't be done. An English non-UTF-16LE message appears on GitHub Actions CI when no distribution is installed. Testing of this situation on other languages was performed in a virtual machine on a development machine. That the message is output in a narrow character set some of the time when bash.exe produces it appears to be a limitation of the bash.exe wrapper. In particular, with a zh-CN version of Windows (and with the language not changed to anything else), a localized message in Simplified Chinese was correctly printed when running wsl.exe, but running bash.exe produced literal "?" characters in place of Chinese characters (it was not a display or font issue, and they were "?" and not Unicode replacement characters). The change here does not overcome that; the literal "?" characters will be included. But "https://aka.ms/wslstore" is still present if it is an error about not having any distributions, so the correct status is still inferred. For more information on code pages in Windows, see: https://serverfault.com/a/836221 The following alternatives to all or part of the approach taken here were considered but, at least for now, not done, because they would not clearly be simpler or more correct: - Abandoning this "run bash.exe and see what it shows us" approach altogether and instead reimplementing the rules CreateProcessW uses, to find if the bash.exe the system finds is the one in System32, and then, if so, checking the metadata in that executable to determine if it's the WSL wrapper. I believe that would be even more complex to do correctly than it seems; the behavior noted in the WinBashStatus docstring and recent commit messages is not the whole story. The documented search order for CreateProcessW seems not to be followed in some circumstances. One is that the Windows Store version of Python seems always to forgo the usual System32 search that precedes seaching directories in PATH. It looks like there may also be some subtleties in which directories 32-bit builds search in. - Using chardet. Although the chardet library is excellent, it is not clear that the code needed to bridge this highly specific use case to it would be simpler than the code currently in use. Some of the work might still need to be done by other means; when I tested it out for this, this did not detect the UTF-16LE messages as such for English. (They are often also valid UTF-8, because interleaving null characters is, while strange, permitted.) - Also calling wsl.exe and/or wslconfig.exe. It's still necessary to call bash.exe, because it may not be the WSL bash, even on a system with WSL fully set up. Furthermore, those tools' output seem to vary in some complex ways, too. Using only one subprocess for the detection seemed the simplest. Even using "wsl --list" would introduce significant additional logic. Sometimes its output is a list of distributions, sometimes it is an error message, and if WSL is not set up it may be a help message. - Using the Windows API to check for WSL systems. https://learn.microsoft.com/en-us/windows/win32/api/wslapi/ does not currently include functions to list registered distributions. - Attempting to get wsl.exe to produce an English message using Windows API techniques like those used in Locale Emulator. This would be complicated, somewhat unintuitive and error prone to do in Python, and I am not sure how well it would work on a system that does not have an English language pack installed. - Checking on disk for WSL distributions in the places they are most often expected to be. This would intertwine WinBashStatus with deep details of how WSL actually operates, and this seems like the sort of thing that is likely to change in the future. However, there may be a more straightforward way to do this (that is at least as correct and that remains transparent to debug). Especially if the WinBashStatus class remains in test_index for long (rather than just being used to aid in debugging existing test falures and possible subsequent design decisions for making commit hooks work more robustly on Windows in GitPython), then this may be worth revisiting. Thus it is *not* with the intention of treating WinBashStatus as a "stable" part of the test suite that it is renamed from _WinBashStatus. This is instead done because: - Like HOOKS_SHEBANG, it makes sense to import it when figuring out how the tests work or debugging them. - When debugging, it is intended that it be imported to call check() and examine the resulting `process` and `message` information, at least in the CheckError case.
1 parent d779a75 commit 9ac2438

File tree

1 file changed

+54
-16
lines changed

1 file changed

+54
-16
lines changed

test/test_index.py

+54-16
Original file line numberDiff line numberDiff line change
@@ -41,10 +41,13 @@
4141

4242

4343
@sumtype
44-
class _WinBashStatus:
44+
class WinBashStatus:
4545
"""Status of bash.exe for native Windows. Affects which commit hook tests can pass.
4646
4747
Call :meth:`check` to check the status.
48+
49+
The :class:`CheckError` and :class:`WinError` cases should not typically be used in
50+
``skip`` or ``xfail`` mark conditions, because they represent unexpected situations.
4851
"""
4952

5053
Inapplicable = constructor()
@@ -84,10 +87,10 @@ def check(cls):
8487
On Windows, `Popen` calls ``CreateProcessW``, which checks some locations before
8588
using the ``PATH`` environment variable. It is expected to try the ``System32``
8689
directory, even if another directory containing the executable precedes it in
87-
``PATH``. (Other differences are less relevant here.) When WSL is present, even
88-
with no distributions, ``bash.exe`` usually exists in ``System32``, and `Popen`
89-
finds it even if another ``bash.exe`` precedes it in ``PATH``, as on CI. If WSL
90-
is absent, ``System32`` may still have ``bash.exe``, as Windows users and
90+
``PATH``. (The other differences are less relevant here.) When WSL is present,
91+
even with no distributions, ``bash.exe`` usually exists in ``System32``, and
92+
`Popen` finds it even if another ``bash.exe`` precedes it in ``PATH``, as on CI.
93+
If WSL is absent, ``System32`` may still have ``bash.exe``, as Windows users and
9194
administrators occasionally put executables there in lieu of extending ``PATH``.
9295
"""
9396
if os.name != "nt":
@@ -105,9 +108,7 @@ def check(cls):
105108
except OSError as error:
106109
return cls.WinError(error)
107110

108-
# FIXME: When not UTF-16LE: try local ANSI code page, then fall back to UTF-8.
109-
encoding = "utf-16le" if b"\r\0\n\0" in process.stdout else "utf-8"
110-
text = process.stdout.decode(encoding).rstrip() # stdout includes WSL errors.
111+
text = cls._decode(process.stdout).rstrip() # stdout includes WSL's own errors.
111112

112113
if process.returncode == 1 and re.search(r"\bhttps://aka.ms/wslstore\b", text):
113114
return cls.WslNoDistro(process, text)
@@ -121,8 +122,45 @@ def check(cls):
121122
log.error("Strange output checking WSL status: %s", text)
122123
return cls.CheckError(process, text)
123124

125+
@staticmethod
126+
def _decode(stdout):
127+
"""Decode ``bash.exe`` output as best we can. (This is used only on Windows.)"""
128+
# When bash.exe is the WSL wrapper but the output is from WSL itself rather than
129+
# code running in a distribution, the output is often in UTF-16LE, which Windows
130+
# uses internally. The UTF-16LE representation of a Windows-style line ending is
131+
# rarely seen otherwise, so use it to detect this situation.
132+
if b"\r\0\n\0" in stdout:
133+
return stdout.decode("utf-16le")
134+
135+
import winreg
136+
137+
# At this point, the output is probably either empty or not UTF-16LE. It's often
138+
# UTF-8 from inside a WSL distro or a non-WSL bash shell. But our test command
139+
# only uses the ASCII subset, so it's safe to guess wrong for that command's
140+
# output. Errors from inside a WSL distro or non-WSL bash.exe are arbitrary, but
141+
# unlike WSL's own messages, go to stderr, not stdout. So we can try the system
142+
# active code page first. (Although console programs usually use the OEM code
143+
# page, the ACP seems more accurate here. For example, on en-US Windows set to
144+
# fr-FR, the message, if not UTF-16LE, is windows-1252, same as the ACP, while
145+
# the OEM code page on such a system defaults to 437, which can't decode it.)
146+
hklm_path = R"SYSTEM\CurrentControlSet\Control\Nls\CodePage"
147+
with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, hklm_path) as key:
148+
value, _ = winreg.QueryValueEx(key, "ACP")
149+
try:
150+
return stdout.decode(f"cp{value}")
151+
except UnicodeDecodeError:
152+
pass
153+
except LookupError as error:
154+
log.warning("%s", str(error)) # Message already says "Unknown encoding:".
155+
156+
# Assume UTF-8. If we don't have valid UTF-8, substitute Unicode replacement
157+
# characters. (For example, on zh-CN Windows set to fr-FR, error messages from
158+
# WSL itself, if not UTF-16LE, are in windows-1252, even though the ACP and OEM
159+
# code pages are 936; decoding as code page 936 or as UTF-8 both have errors.)
160+
return stdout.decode("utf-8", errors="replace")
161+
124162

125-
_win_bash_status = _WinBashStatus.check()
163+
_win_bash_status = WinBashStatus.check()
126164

127165

128166
def _make_hook(git_dir, name, content, make_exec=True):
@@ -994,7 +1032,7 @@ class Mocked:
9941032
self.assertEqual(rel, os.path.relpath(path, root))
9951033

9961034
@pytest.mark.xfail(
997-
type(_win_bash_status) is _WinBashStatus.WslNoDistro,
1035+
type(_win_bash_status) is WinBashStatus.WslNoDistro,
9981036
reason="Currently uses the bash.exe for WSL even with no WSL distro installed",
9991037
raises=HookExecutionError,
10001038
)
@@ -1005,7 +1043,7 @@ def test_pre_commit_hook_success(self, rw_repo):
10051043
index.commit("This should not fail")
10061044

10071045
@pytest.mark.xfail(
1008-
type(_win_bash_status) is _WinBashStatus.WslNoDistro,
1046+
type(_win_bash_status) is WinBashStatus.WslNoDistro,
10091047
reason="Currently uses the bash.exe for WSL even with no WSL distro installed",
10101048
raises=AssertionError,
10111049
)
@@ -1016,7 +1054,7 @@ def test_pre_commit_hook_fail(self, rw_repo):
10161054
try:
10171055
index.commit("This should fail")
10181056
except HookExecutionError as err:
1019-
if type(_win_bash_status) is _WinBashStatus.Absent:
1057+
if type(_win_bash_status) is WinBashStatus.Absent:
10201058
self.assertIsInstance(err.status, OSError)
10211059
self.assertEqual(err.command, [hp])
10221060
self.assertEqual(err.stdout, "")
@@ -1032,12 +1070,12 @@ def test_pre_commit_hook_fail(self, rw_repo):
10321070
raise AssertionError("Should have caught a HookExecutionError")
10331071

10341072
@pytest.mark.xfail(
1035-
type(_win_bash_status) in {_WinBashStatus.Absent, _WinBashStatus.Wsl},
1073+
type(_win_bash_status) in {WinBashStatus.Absent, WinBashStatus.Wsl},
10361074
reason="Specifically seems to fail on WSL bash (in spite of #1399)",
10371075
raises=AssertionError,
10381076
)
10391077
@pytest.mark.xfail(
1040-
type(_win_bash_status) is _WinBashStatus.WslNoDistro,
1078+
type(_win_bash_status) is WinBashStatus.WslNoDistro,
10411079
reason="Currently uses the bash.exe for WSL even with no WSL distro installed",
10421080
raises=HookExecutionError,
10431081
)
@@ -1055,7 +1093,7 @@ def test_commit_msg_hook_success(self, rw_repo):
10551093
self.assertEqual(new_commit.message, "{} {}".format(commit_message, from_hook_message))
10561094

10571095
@pytest.mark.xfail(
1058-
type(_win_bash_status) is _WinBashStatus.WslNoDistro,
1096+
type(_win_bash_status) is WinBashStatus.WslNoDistro,
10591097
reason="Currently uses the bash.exe for WSL even with no WSL distro installed",
10601098
raises=AssertionError,
10611099
)
@@ -1066,7 +1104,7 @@ def test_commit_msg_hook_fail(self, rw_repo):
10661104
try:
10671105
index.commit("This should fail")
10681106
except HookExecutionError as err:
1069-
if type(_win_bash_status) is _WinBashStatus.Absent:
1107+
if type(_win_bash_status) is WinBashStatus.Absent:
10701108
self.assertIsInstance(err.status, OSError)
10711109
self.assertEqual(err.command, [hp])
10721110
self.assertEqual(err.stdout, "")

0 commit comments

Comments
 (0)