Skip to content

Fix diff patch parser for paths with unsafe chars #415

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 20, 2016
Merged

Fix diff patch parser for paths with unsafe chars #415

merged 2 commits into from
Apr 20, 2016

Conversation

nvie
Copy link
Contributor

@nvie nvie commented Apr 19, 2016

This specifically covers the cases where unsafe chars occur in path names, and git-diff -p will escape those.

From the git-diff-tree manpage:

  1. TAB, LF, double quote and backslash characters in pathnames are represented as \t, \n, \" and \\, respectively. If there is need for such substitution then the whole pathname is put in double quotes.

This patch checks whether or not this has happened and will unescape those paths accordingly.

One thing to note here is that, depending on the position in the patch format, those paths may be prefixed with an a/ or b/. I've specifically made sure to never interpret a path that actually starts with a/ or b/ incorrectly.

Example of that subtlety below. Here, the actual file path is b/normal. On the diff file that gets encoded as b/b/normal.

 diff --git a/b/normal b/b/normal
 new file mode 100644
 index 0000000000000000000000000000000000000000..eaf5f7510320b6a327fb308379de2f94d8859a54
 --- /dev/null
 +++ b/b/normal
 @@ -0,0 +1 @@
 +dummy content

Here, we prefer the --- and +++ lines' values. Note that these paths start with a/ or b/. The only exception is the value /dev/null, which is handled as a special case.

Suppose now the file gets moved b/moved, the output of that diff would then be this:

 diff --git a/b/normal b/b/moved
 similarity index 100%
 rename from b/normal
 rename to b/moved

We prefer the rename lines' values in this case (the diff line is always a last resort). Take note that those lines are not prefixed with a/ or b/, but the ones in the diff line are (just like the ones in --- or +++ lines).

This specifically covers the cases where unsafe chars occur in path
names, and git-diff -p will escape those.

From the git-diff-tree manpage:

> 3. TAB, LF, double quote and backslash characters in pathnames are
>    represented as \t, \n, \" and \\, respectively. If there is need
>    for such substitution then the whole pathname is put in double
>    quotes.

This patch checks whether or not this has happened and will unescape
those paths accordingly.

One thing to note here is that, depending on the position in the patch
format, those paths may be prefixed with an a/ or b/.  I've specifically
made sure to never interpret a path that actually starts with a/ or b/
incorrectly.

Example of that subtlety below.  Here, the actual file path is
"b/normal".  On the diff file that gets encoded as "b/b/normal".

     diff --git a/b/normal b/b/normal
     new file mode 100644
     index 0000000000000000000000000000000000000000..eaf5f7510320b6a327fb308379de2f94d8859a54
     --- /dev/null
     +++ b/b/normal
     @@ -0,0 +1 @@
     +dummy content

Here, we prefer the "---" and "+++" lines' values.  Note that these
paths start with a/ or b/.  The only exception is the value "/dev/null",
which is handled as a special case.

Suppose now the file gets moved "b/moved", the output of that diff would
then be this:

     diff --git a/b/normal b/b/moved
     similarity index 100%
     rename from b/normal
     rename to b/moved

We prefer the "rename" lines' values in this case (the "diff" line is
always a last resort).  Take note that those lines are not prefixed with
a/ or b/, but the ones in the "diff" line are (just like the ones in
"---" or "+++" lines).
Specifically "string_escape" does not exist as an encoding anymore.
@Byron Byron added this to the v1.1.0 - Features and Fixes milestone Apr 20, 2016
@Byron Byron merged commit fcb6e88 into gitpython-developers:master Apr 20, 2016
@Byron
Copy link
Member

Byron commented Apr 20, 2016

Thank @nvie , I particularly like your test !
Even though I didn't particularly verify it's truly exhaustive, it does look good to me.

@nvie nvie deleted the fix-for-unicode-paths branch April 20, 2016 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants