Where does the excerpt in the git diff hunk header come from?
When I use git diff on a C# file, I see something like this:
diff --git a/foo.cs b/foo.cs
index ff61664..dd8a3e3 100644
--- a/foo.cs
+++ b/foo.cs
@@ -15,6 +15,7 @@ static void Main(string[] args)
string name = Console.ReadLine();
}
Console.WriteLine("Hello {0}!", name);
+ Console.WriteLine("Goodbye");
}
}
}
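The hunk header line contains the first line of the current method (static void Main(string[] args)), which is great. However, it doesn't seem very reliable; I have seen many cases where it doesn't work.
So I'm wondering: where does this excerpt come from? Does git diff somehow recognize the language syntax? Is there a way to customize it?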
Is there a way to customize it?
The configuration is set up through .gitattributes; see the gitattributes documentation, section "Defining a custom hunk-header":
First, in .gitattributes, you would assign the diff attribute for paths.
*.tex diff=tex
Then, you would define a "diff.tex.xfuncname" configuration to specify a regular expression that matches a line that you would want to appear as the hunk header "TEXT". Add a section to your $GIT_DIR/config file (or $HOME/.gitconfig file) like this:
[diff "tex"]
xfuncname = "^(\\(sub)*section\{.*)$"
Note. A single level of backslashes are eaten by the configuration file parser, so you would need to double the backslashes; the pattern above picks a line that begins with a backslash, and zero or more occurrences of sub followed by section followed by open brace, to the end of line.
There are a few built-in patterns to make this easier, and tex is one of them, so you do not have to write the above in your configuration file (you still need to enable this with the attribute mechanism, via .gitattributes).
('csharp' is one of the current built-in patterns.)
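To make the effect of such a pattern concrete, here is a minimal Python sketch of how a hunk header could be picked with the tex regex quoted above. It is not Git's code: Python's re module stands in for POSIX extended regular expressions, the helper name hunk_header_text and the sample document are made up for illustration, and the sketch assumes that the text of the first capture group is what gets displayed.

```python
import re

# The regex as it is meant to reach the regex engine, per the note above:
# a line starting with a backslash, zero or more "sub", then "section{".
# Python's `re` is only a stand-in for POSIX extended regex here.
TEX_XFUNCNAME = re.compile(r"^(\\(sub)*section\{.*)$")


def hunk_header_text(lines, hunk_start):
    """Return the text to show after the second '@@', or '' if nothing matches.

    `lines` holds the old file's lines and `hunk_start` is the 0-based index
    of the first line of the hunk (both names are hypothetical).
    """
    # Walk backwards from the line just before the hunk.
    for line in reversed(lines[:hunk_start]):
        match = TEX_XFUNCNAME.match(line)
        if match:
            # Assumption: the first capture group is the displayed text.
            return match.group(1)
    return ""


document = [
    r"\section{Introduction}",
    "Some text.",
    r"\subsection{Details}",
    "More text.",
    "Even more text.",
]

print(hunk_header_text(document, 4))
```

A hunk starting at "Even more text." would be labelled with the nearest preceding sectioning command, which is the behaviour the documentation describes.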
Where does this excerpt come from? Does git diff somehow recognize the language syntax?
Originally, the algorithm for function name detection was very crude:
See commit acb7257 (Git 1.3.0, April 2006, authored by Mark Wooding)
xdiff: Show function names in hunk headers.
The speed of the built-in diff generator is nice; but the function names shown by diff -p are really nice. And I hate having to choose. So, we hack xdiff to find the function names and print them.
The function names are parsed by a particularly stupid algorithm at the moment: it just tries to find a line in the 'old' file, from before the start of the hunk, whose first character looks plausible. Still, it's most definitely a start.
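The heuristic described in that commit message can be sketched in a few lines of Python. This is an illustration under assumptions, not the xdiff source: I am reading "looks plausible" as "starts with an ASCII letter, an underscore or a dollar sign", and the function names and sample files are hypothetical.

```python
def looks_plausible(line):
    """Assumed reading of 'first character looks plausible'."""
    if not line:
        return False
    first = line[0]
    return (first.isascii() and first.isalpha()) or first in "_$"


def crude_function_line(old_lines, hunk_start):
    """Scan backwards from just before the hunk for a plausible line."""
    for line in reversed(old_lines[:hunk_start]):
        if looks_plausible(line):
            return line.rstrip()
    return ""


c_file = [
    "#include <stdio.h>",
    "",
    "int main(void)",
    "{",
    '    puts("hi");',
    "    return 0;",
    "}",
]

cs_file = [
    "namespace Demo",
    "{",
    "    class Program",
    "    {",
    "        static void Main(string[] args)",
    "        {",
    "            Console.WriteLine();",
]

print(crude_function_line(c_file, 5))
print(crude_function_line(cs_file, 6))
```

Under this reading, a C function declared at column 0 is found, while an indented C# method is skipped in favour of the outermost unindented line. That may be one reason results look unreliable when no language-specific pattern is in effect.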
It was refined with get_func_line(), itself coming from commit f258475 (Git 1.5.3, Sept 2007, authored by Junio C Hamano (gitster)).
You can see in that commit the test t/t4018-diff-funcname.sh, which tests custom diff function name patterns.
Per-path attribute based hunk header selection.
This makes "diff -p" hunk headers customizable via gitattributes mechanism.
It is based on Johannes's earlier patch that allowed to define a single regexp to be used for everything.
The mechanism to arrive at the regexp that is used to define hunk header is the same as other use of gitattributes.
You assign an attribute, funcname (because "diff -p" typically uses the name of the function the patch is about as the hunk header), a simple string value.
This can be one of the names of built-in pattern (currently, "java" is defined) or a custom pattern name, to be looked up from the configuration file.
(in .gitattributes)
*.java funcname=java
*.perl funcname=perl
(in .git/config)
[funcname]
java = ... # ugly and complicated regexp to override the built-in one.
perl = ... # another ugly and complicated regexp to define a new one.
The current xfuncname syntax was introduced in commit 45d9414 (Git 1.6.0.3, Oct. 2008, authored by Brandon Casey).
diff.*.xfuncname which uses "extended" regex's for hunk header selection
Currently, the hunk headers produced by 'diff -p' are customizable by setting the diff.*.funcname option in the config file. The 'funcname' option takes a basic regular expression. This functionality was designed using the GNU regex library which, by default, allows using backslashed versions of some extended regular expression operators, even in Basic Regular Expression mode. For example, the following characters, when backslashed, are interpreted according to the extended regular expression rules: ?, +, and |.
As such, the builtin funcname patterns were created using some extended regular expression operators.
Other platforms which adhere more strictly to the POSIX spec do not interpret the backslashed extended RE operators in Basic Regular Expression mode. This causes the pattern matching for the builtin funcname patterns to fail on those platforms.
Introduce a new option 'xfuncname' which uses extended regular expressions, and advertise it instead of funcname.
Since most users are on GNU platforms, the majority of funcname patterns are created and tested there.
Advertising only xfuncname should help to avoid the creation of non-portable patterns which work with GNU regex but not elsewhere.
Additionally, the extended regular expressions may be less ugly and complicated compared to the basic RE since many common special operators do not need to be backslashed.
For example, the GNU Basic RE:
^[ ]*\(\(public\|static\).*\)$
becomes the following Extended RE:
^[ ]*((public|static).*)$
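As a quick check of the two forms above, the following Python sketch converts the GNU Basic RE into the Extended RE by dropping the backslashes, then applies the extended form to sample lines. Python's re module is used as a stand-in for POSIX extended regular expressions, the converter is deliberately naive and only valid for this simple pattern, and the helper names are hypothetical.

```python
import re

GNU_BRE = r"^[ ]*\(\(public\|static\).*\)$"
ERE = r"^[ ]*((public|static).*)$"


def naive_bre_to_ere(pattern):
    """Drop the backslash before (, ) and |.

    Only meant for simple patterns like the one above; a general converter
    would also have to handle literal parentheses, braces, and more.
    """
    for escaped in (r"\(", r"\)", r"\|"):
        pattern = pattern.replace(escaped, escaped[1])
    return pattern


def header_for(line):
    """Return the captured header text, or None if the line does not match."""
    match = re.match(ERE, line)
    # Assumption: the first capture group is what ends up in the hunk header.
    return match.group(1) if match else None


print(naive_bre_to_ere(GNU_BRE) == ERE)
print(header_for("    static void Main(string[] args)"))
print(header_for("    int x = 1;"))
```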
Finally, it was extended with commit 14937c2, for Git 1.7.8 (Dec. 2011), authored by René Scharfe.
diff: add option to show whole functions as context
Add the option -W/--function-context to git diff.
It is similar to the same option of git grep and expands the context of change hunks so that the whole surrounding function is shown.
This "natural" context can allow changes to be understood better.
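The idea behind that option can be approximated as follows: grow the context backwards to the nearest function line, and forwards until just before the next one. This is a rough Python sketch of the concept, not Git's implementation; the stand-in pattern FUNC_LINE, the function name and the sample source are all assumptions made for illustration.

```python
import re

# Stand-in "function line" pattern (assumption): a line that starts with a
# letter, an underscore or a dollar sign.
FUNC_LINE = re.compile(r"^[A-Za-z_$]")


def function_context(lines, changed_index):
    """Return (start, end) slice bounds covering the function around a change."""
    # Walk back to the function line at or before the change.
    start = changed_index
    while start > 0 and not FUNC_LINE.match(lines[start]):
        start -= 1
    # Walk forward until the next function line or the end of the file.
    end = changed_index + 1
    while end < len(lines) and not FUNC_LINE.match(lines[end]):
        end += 1
    # Drop trailing blank lines so the context stops at the function's end.
    while end > start + 1 and lines[end - 1].strip() == "":
        end -= 1
    return start, end


source = [
    "int add(int a, int b)",
    "{",
    "    return a + b;",
    "}",
    "",
    "int sub(int a, int b)",
    "{",
    "    return a - b;",
    "}",
]

start, end = function_context(source, 2)
print("\n".join(source[start:end]))
```

With a change on the "return a + b;" line, the slice covers the whole add function and nothing of sub.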
It was still being tweaked in Git 2.15 (Q4 2017):
The built-in pattern to detect the "function header" for HTML did not match <H1>..<H6> elements without any attributes, which has been fixed.
Before 2.15, it failed to match <h1>...</h1>, while <h1 class="smth">...</h1> did match.
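The difference is easy to reproduce with two regular expressions. The patterns below are written to exhibit the behaviour described, requiring whitespace after the tag name in the first and making the attribute part optional in the second; treat them as illustrative rather than as the exact strings in Git's source. Python's re module stands in for POSIX extended regex.

```python
import re

# Illustrative patterns (assumptions), not copied from Git.
BEFORE_FIX = re.compile(r"^[ \t]*(<[Hh][1-6][ \t].*>.*)$")
AFTER_FIX = re.compile(r"^[ \t]*(<[Hh][1-6]([ \t].*)?>.*)$")

samples = [
    "<h1>Title</h1>",
    '<h1 class="smth">Title</h1>',
    "<p>Not a heading</p>",
]

for line in samples:
    before = bool(BEFORE_FIX.match(line))
    after = bool(AFTER_FIX.match(line))
    print(f"{line!r}: before={before} after={after}")
```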
See commit 9c03cac (23 Sep 2017) by Ilya Kantor (iliakan).
(Merged by Junio C Hamano -- gitster -- in commit 376a1da, 28 Sep 2017)
The pattern that detects function boundaries is called xfuncref.
See commit a807200 (08 Nov 2019) by Łukasz Niemier (hauleth).
(Merged by Junio C Hamano -- gitster -- in commit 376e730, 01 Dec 2019), for Git 2.25 (Q1 2020)
userdiff: add Elixir to supported userdiff languages
Signed-off-by: Łukasz Niemier
Acked-by: Johannes Sixt
Adds support for xfuncref in Elixir language which is Ruby-like language that runs on Erlang Virtual Machine (BEAM).
And:
See commit d1b1384 (13 Dec 2019) by Ed Maste (emaste).
(Merged by Junio C Hamano -- gitster -- in commit ba6b662, 25 Dec 2019)
userdiff: remove empty subexpression from elixir regex
Signed-off-by: Ed Maste
Reviewed-by: Jeff King
Helped-by: Johannes Sixt
The regex failed to compile on FreeBSD.
Also add /* -- */ mark to separate the two regex entries given to the PATTERNS() macro, to make it consistent with patterns for other content types.
A userdiff pattern for Markdown documents was added in Git 2.27 (Q2 2020).
See commit 09dad92 (02 May 2020) by Ash Holland (sersorrel).
(Merged by Junio C Hamano -- gitster -- in commit dc4c393, 08 May 2020)
userdiff: support Markdown
Signed-off-by: Ash Holland
Acked-by: Johannes Sixt
It's typical to find Markdown documentation alongside source code, and having better context for documentation changes is useful; see also commit 69f9c87d4 ("userdiff: add support for Fountain documents", 2015-07-21, Git v2.6.0-rc0 -- merge listed in batch #1).
The pattern is based on the CommonMark specification 0.29, section 4.2 https://spec.commonmark.org/ but doesn't match empty headings, as seeing them in a hunk header is unlikely to be useful.
Only ATX headings are supported, as detecting setext headings would require printing the line before a pattern matches, or matching a multiline pattern. The word-diff pattern is the same as the pattern for HTML, because many Markdown parsers accept inline HTML.
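A small Python sketch shows what "ATX headings only, no empty headings" means in practice. The regex is my own approximation of an ATX heading (up to three spaces of indentation, one to six # characters, whitespace, then some text); it is not the pattern from Git's source, and the helper name and sample document are hypothetical.

```python
import re

# Approximate ATX heading matcher (assumption, not Git's pattern).
ATX_HEADING = re.compile(r"^ {0,3}#{1,6}[ \t]+\S.*$")


def markdown_hunk_header(lines, hunk_start):
    """Return the nearest ATX heading before the hunk, or '' if none."""
    for line in reversed(lines[:hunk_start]):
        if ATX_HEADING.match(line):
            return line
    return ""


doc = [
    "# Project",
    "",
    "Installation",   # setext heading text: needs the next line to be detected
    "------------",
    "",
    "Run the installer.",
    "",
    "## Usage",
    "",
    "Call the tool.",
]

print(markdown_hunk_header(doc, 5))
print(markdown_hunk_header(doc, 9))
```

A change under the setext-style "Installation" heading is still labelled "# Project", because recognising a setext heading would require looking at two lines at once, which is the limitation the commit message mentions.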
With Git 2.30 (Q1 2021), the userdiff pattern learned to identify function definitions in POSIX shells and bash.
See commit 2ff6c34 (22 Oct 2020) by Victor Engmark (l0b0).
(Merged by Junio C Hamano -- gitster -- in commit 292e53f, 02 Nov 2020)
userdiff: support Bash
Signed-off-by: Victor Engmark
Acked-by: Johannes Sixt
Support POSIX, bashism and mixed function declarations, all four compound command types, trailing comments and mixed whitespace.
Even though Bash allows locale-dependent characters in function names, only detect function names with characters allowed by POSIX.1-2017 for simplicity.
This should cover the vast majority of use cases, and produces system-agnostic results.
Since a word pattern has to be specified, but there is no easy way to know the default word pattern, use the default IFS characters for a starter. A later patch can improve this.
gitattributes now includes in its man page:
bash suitable for source code in the Bourne-Again SHell language. Covers a superset of POSIX shell function definitions.
With Git 2.32 (Q2 2021), a userdiff pattern for "Scheme" has been added.
See commit a437390 (08 Apr 2021) by Atharva Raykar (tfidfwastaken).
(Merged by Junio C Hamano -- gitster -- in commit 6d7a62d, 20 Apr 2021)
userdiff: add support for Scheme
Signed-off-by: Atharva Raykar
Add a diff driver for Scheme-like languages which recognizes top level and local define forms, whether it is a function definition, binding, syntax definition or a user-defined define-xyzzy form.
Also supports R6RS library forms, module forms along with class and struct declarations used in Racket (PLT Scheme).
Alternate "def" syntax such as those in Gerbil Scheme are also supported, like defstruct, defsyntax and so on.
The rationale for picking define forms for the hunk headers is because it is usually the only significant form for defining the structure of the program, and it is a common pattern for schemers to have local function definitions to hide their visibility, so it is not only the top level define's that are of interest.
Schemers also extend the language with macros to provide their own define forms (for example, something like a define-test-suite) which is also captured in the hunk header.
Since it is common practice to extend syntax with variants of a form like module+, class* etc, those have been supported as well.
The word regex is a best-effort attempt to conform to R7RS (section 2.1) valid identifiers, symbols and numbers.
gitattributes now includes in its man page:
scheme suitable for source code in the Scheme language.
With Git 2.33 (Q3 2021), the userdiff pattern for C# learned the token "record".
See commit c4e3178 (02 Mar 2021) by Julian Verdurmen (304NotModified).
(Merged by Junio C Hamano -- gitster -- in commit f741069, 08 Jul 2021)
userdiff: add support for C# record types
Signed-off-by: Julian Verdurmen
Reviewed-by: Johannes Schindelin
Records are added in C# 9
Code example :
public record Person(string FirstName, string LastName);
For more information, see https://docs.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-9
With Git 2.34 (Q4 2021), the userdiff pattern for the "java" language has been updated.
See commit a8cbc89 (11 Aug 2021) by Tassilo Horn (tsdh).
(Merged by Junio C Hamano -- gitster -- in commit a896086, 30 Aug 2021)
userdiff: improve java hunk header regex
Signed-off-by: Tassilo Horn
Currently, the git diff hunk headers show the wrong method signature if the method has a qualified return type, an array return type, or a generic return type because the regex doesn't allow dots (.), [], or < and > in the return type.
Also, type parameter declarations couldn't be matched.
Add several t4018 tests asserting the right hunk headers for different cases:
- enum constant change
- change in generic method with bounded type parameters
- change in generic method with wildcard
- field change in a nested class
And, still with Git 2.34 (Q4 2021), the userdiff pattern for the C++ language has been updated.
See commit 386076e (24 Oct 2021), commit c4fdba3, commit 637b80c, commit bfaaf19 (10 Oct 2021), and commit 350b87c, commit 3e063de, commit 1cf9384 (08 Oct 2021) by Johannes Sixt (j6t).
(Merged by Junio C Hamano -- gitster -- in commit f3f157f, 25 Oct 2021)
For instance:
userdiff-cpp: permit the digit-separating single-quote in numbers
Signed-off-by: Johannes Sixt
Since C++17, the single-quote can be used as digit separator:
3.141'592'654
1'000'000
0xdead'beaf
Make it known to the word regex of the cpp driver, so that numbers are not split into separate tokens at the single-quotes.
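To see why this matters for word diffs, here is a Python sketch that tokenizes a line with two toy word regexes, one that stops at a single-quote and one that accepts it inside a number. Both regexes and the tokenize helper are simplified assumptions for illustration; the cpp driver's real word regex is considerably more detailed.

```python
import re

IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
# Toy number patterns (assumptions): without and with the digit separator.
NUMBER_OLD = r"[0-9][0-9a-zA-Z_.]*"
NUMBER_NEW = r"[0-9](?:[0-9a-zA-Z_.]|'[0-9a-zA-Z_])*"


def tokenize(text, number_pattern):
    """Split text into identifiers, numbers, and single other characters."""
    word_regex = re.compile(IDENT + "|" + number_pattern + r"|\S")
    return [match.group(0) for match in word_regex.finditer(text)]


print(tokenize("1'000'000", NUMBER_OLD))
print(tokenize("1'000'000", NUMBER_NEW))
print(tokenize("x = 0xdead'beaf;", NUMBER_NEW))
```

With the first pattern the number falls apart into five tokens, so a word diff would highlight fragments of it; with the second it stays a single token.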