Skip to content

Fix for parsing non-ASCII chars in status lines #473

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

nvie
Copy link
Contributor

@nvie nvie commented Jun 14, 2016

It's a bit unclear which test fixture contained the data that tests this: changing the existing ones with - [up to date] foo -> foo lines did not turn any tests red.

I've updated this regex twice now very recently and I think just generically accepting any characters except whitespace is most future-compatible. We're running into some very wild ref names in practice, so this seems the most stable fix.

nvie added 2 commits June 14, 2016 21:49
Previously, the following fields on Diff instances were assumed to be
passed in as unicode strings:

  - `a_path`
  - `b_path`
  - `rename_from`
  - `rename_to`

However, since Git natively records paths as bytes, these may
potentially not have a valid unicode representation.

This patch changes the Diff instance to instead take the following
equivalent fields that should be raw bytes instead:

  - `a_rawpath`
  - `b_rawpath`
  - `raw_rename_from`
  - `raw_rename_to`

NOTE ON BACKWARD COMPATIBILITY:
The original `a_path`, `b_path`, etc. fields are still available as
properties (rather than slots).  These properties now dynamically decode
the raw bytes into a unicode string (performing the potentially
destructive operation of replacing invalid unicode chars by "�"'s).
This means that all code using Diffs should remain backward compatible.
The only exception is when people would manually construct Diff
instances by calling the constructor directly, in which case they should
now pass in bytes rather than unicode strings.

See also the discussion on
#467
@Byron Byron added this to the v2.0.6 - Bugfixes milestone Jun 20, 2016
@Byron
Copy link
Member

Byron commented Jun 20, 2016

Thanks a bunch !

The question is to what extend the unit-tests can be trusted in that regard. Taking into consideration that the progress parsing has always been a weak spot, I'd be happy to give it another shot by merging this commit.

@Byron Byron closed this Jun 20, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants