带通配符、Glob 和 Gitignore 样式 Glob 的快速字符串匹配 - 如何不让它爆炸

Robert van Engelen

4.99/5 (43投票s)

2019 年 8 月 5 日

CPOL

11分钟阅读

45051

1453

经典的 globbing 和现代的 gitignore 样式 globbing 算法可以很快，而递归实现则以指数级爆炸而闻名；为什么有些免费提供的源代码不应使用。

引言

Jack Handy 的通配符字符串比较可高效地将字符串与包含 * 和 ? 通配符的模式进行比较。该算法快速且安全，但不支持 gitignore 风格的 globbing 规则。在本文中，我将说明经典的 globbing 和现代的 gitignore 风格的 globbing 算法也可以很快。我还将解释为什么会爆炸式增长的递归实现存在问题，以及为什么有些免费提供的源代码不应使用。

背景

通配符字符串匹配和 globbing 并非初看起来那么简单。事实上，过去的错误已导致严重的漏洞，例如拒绝服务。简单的补丁，例如限制匹配数量以限制 CPU 时间，已被应用于修复执行时间呈指数级爆炸的实现。更令人不安的是，有缺陷的 globbing 实现可以在网上轻松找到：只需搜索 wildmat.c 即可找到可能在字符类范围不完整（例如 [a-）时崩溃的实现副本。

递归的问题

为了理解 globbing 实现如何可能导致拒绝服务，我们将快速回顾一些示例。在通配符匹配 * 和 ? 的简单实现中，通常使用递归。其思想是从左到右扫描模式和字符串，同时成对匹配字符。当在模式中遇到星号时，将递归调用 do-match 函数，以将模式的其余部分与字符串的其余部分进行比较。如果递归的 do-match 调用成功，那么我们就完成了。但是，如果递归调用失败，我们将回退到递归调用之前的位置，在 string 中向前推进一个字符，然后重复递归调用。此循环是“star-loop”，它在模式的星号处消耗零个、一个或多个字符，如下面的示例所示。

// returns TRUE if text string matches wild pattern with * and ?
int naive_recursive_match(const char *text, const char *wild)
{
  while (*text != '\0')
  {
    if (*wild == '*')
    {
      // any number of stars act as one star
      while (*++wild == '*')
        continue;
      // a star at the end of the pattern matches any text
      if (*wild == '\0')
        return TRUE;
      // star-loop: match the rest of the pattern and text
      while (naive_recursive_match(text, wild) == FALSE && *text != '\0')
        text++;
      return *text != '\0' ? TRUE : FALSE;
    }
    // ? matches any character or we match the current non-NUL character
    if (*wild != '?' && *wild != *text)
      return FALSE;
    text++;
    wild++;
  }
  // ignore trailing stars
  while (*wild == '*')
    wild++;
  // at end of text means success if nothing else is left to match
  return *wild == '\0' ? TRUE : FALSE;
}

该算法相当简单，只需很少的解释：为星号调用递归，直到匹配或到达 string 文本的末尾。? 匹配任何单个字符。否则，将当前模式字符与文本中的当前字符进行匹配。我们将文本和模式向前移动一个字符，然后重复，直到到达文本末尾。请注意，当到达文本末尾时，可以忽略模式中的任何尾随星号。

这种方法对于要匹配的短字符串和星号很少的模式效果很好。但是，很容易找到由于星号上的过度回溯而导致执行时间爆炸的示例。例如，a*a*a*a*a*a*a*a*b 和包含 100 个 a 的字符串 aaa…aaa 需要几分钟才能终止并报告匹配失败。只需尝试以下两个 shell 命令

$ touch `python3 -c 'print(100*"a")'`
$ ls a*a*a*a*a*a*a*a*b

在我的 2.9GHz i7 机器上，bash-3.2 大约需要 8 分钟才能终止，性能相当不错。tcsh-6.18 则需要一个多小时。2*ⁿ 个状态对于 *n 个星号是造成这种行为的原因。

使用 ABORT

历史上最早的 globbing 算法之一，由 Rich Salz 编写的 wildmat.c（原始版）具有这种不良行为，但后来在 2000 年代初由 Lars Mathiesen 使用 ABORT 状态进行了更新。他们对更新后的 wildmat.c 源代码的注释至今仍在，并且非常有见地。

引用

一旦 DoMatch 的一个实例的控制进入 star-loop，该实例将返回 TRUE 或 ABORT，任何调用实例因此将在（不再次递归调用后）立即返回。实际上，只有一个 star-loop 处于活动状态。可以修改代码以显式维护此上下文，从而消除所有递归调用，但会增加一些复杂性并降低清晰度（并且 ABORT 的东西本身似乎就足够不清楚了）。

显然，指数级爆炸问题在那时已经广为人知，因为我们在 wildmat.c 中发现了以下注释，演示了这个问题。

pattern: -*-*-*-*-*-*-12-*-*-*-m-*-*-*
text 1:  -adobe-courier-bold-o-normal--12-120-75-75-m-70-iso8859-1
text 2:  -adobe-courier-bold-o-normal--12-120-75-75-X-70-iso8859-1
Text 1 matches with 51 calls, while text 2 fails with 54 calls. 
Without the ABORT, then it takes 22310 calls to fail.  Ugh.

这里强大的洞察力是，只有最后一个 star-loop 应该是活动的，因为当最后一个 star-loop 失败时，推进任何之前的 star-loop 都没有成效。这个巧妙的更改使得 globbing 在典型情况下以线性时间执行。只有在最坏的情况下，该算法在模式和字符串长度上的执行时间才为二次方。

让我们将此改进应用于我们示例中的递归匹配器。

enum { FALSE = 0, TRUE = 1, ABORT = 2 };
// returns TRUE if text string matches wild pattern with * and ?
int recursive_match(const char *text, const char *wild)
{
  while (*text != '\0')
  {
    if (*wild == '*')
    {
      int ret;
      // any number of stars act as one star
      while (*++wild == '*')
        continue;
      // a star at the end of the pattern matches any text
      if (*wild == '\0')
        return TRUE;
      // star-loop: match the rest of the pattern and text
      while ((ret = recursive_match(text, wild)) == FALSE && *text != '\0')
        text++;
      // ABORT recursion when failing the last star-loop
      if (ret != TRUE)
        return ABORT;
      return *text != '\0' ? TRUE : FALSE;
    }
    // ? matches any character or we match the current non-NUL character
    if (*wild != '?' && *wild != *text)
      return FALSE;
    text++;
    wild++;
  }
  // ignore trailing stars
  while (*wild == '*')
    wild++;
  // at end of text means success if nothing else is left to match
  return *wild == '\0' ? TRUE : FALSE;
}

此改进避免了指数级爆炸，但仍然为模式中的每个星号进行递归调用。这是不必要的。我们可以保存匹配器的状态，以便从备份中恢复它，以执行最后一个 star-loop 的另一个迭代，直到完成。

非递归通配符字符串匹配

为了用迭代代替递归，我们需要两个备份位置：模式中紧跟星号的当前位置和文本字符串中的当前位置。为了迭代 star-loop，我们恢复备份的位置，在文本字符串中前进一个字符，然后重试。我们一直迭代，直到匹配成功或用完文本。如果星号出现在我们最后一个备份位置之后，我们就会有效地进入一个新的 star-loop，并移除我们之前正在迭代的那个。

// returns TRUE if text string matches wild pattern with * and ?
int match(const char *text, const char *wild)
{
  const char *text_backup = NULL;
  const char *wild_backup = NULL;
  while (*text != '\0')
  {
    if (*wild == '*')
    {
      // new star-loop: backup positions in pattern and text
      text_backup = text;
      wild_backup = ++wild;
    }
    else if (*wild == '?' || *wild == *text)
    {
      // ? matched any character or we matched the current non-NUL character
      text++;
      wild++;
    }
    else
    {
      // if no stars we fail to match
      if (wild_backup == NULL)
        return FALSE;
      // star-loop: backtrack to the last * by restoring the backup positions 
      // in the pattern and text
      text = ++text_backup;
      wild = wild_backup;
    }
  }
  // ignore trailing stars
  while (*wild == '*')
    wild++;
  // at end of text means success if nothing else is left to match
  return *wild == '\0' ? TRUE : FALSE;
}

该算法在典型情况下，执行时间与模式和字符串的长度成线性关系。在最坏的情况下，该算法的执行时间为二次方（要了解原因，请考虑模式 *aaa…aaab，其中包含 ½n 个 a，以及字符串 aaa…aaa，其中包含 *n* 个 a，这需要 ¼*n*² 次比较才能失败）。

如果您想知道是否存在线性最坏情况算法，答案是肯定的：通过从模式构建确定性有限自动机 (DFA) 进行匹配。这超出了本文的范围，并且需要考虑 (高昂的) DFA 构建成本。

非递归 Glob 匹配

现在我们回顾了字符串匹配和 globbing 的背景和问题，让我们进入本文的主要部分。Globbing 与字符串匹配的区别在于 * 和 ? 通配符在 glob 中具有微妙的含义：这些通配符不匹配路径分隔符，即不匹配 /（在 Windows 中也不匹配 \）。例如，foo*.h 不匹配 foo/bar.h。

为了调整我们之前的匹配算法以执行 glob 匹配，我们添加了一些针对 / 路径分隔符的检查，这样 * 和 ? 就不能匹配它。

// returns TRUE if text string matches glob-like pattern with * and ?
int globly_match(const char *text, const char *glob)
{
  const char *text_backup = NULL;
  const char *glob_backup = NULL;
  while (*text != '\0')
  {
    if (*glob == '*')
    {
      // new star-loop: backup positions in pattern and text
      text_backup = text;
      glob_backup = ++glob;
    }
    else if ((*glob == '?' && *text != '/') || *glob == *text)
    {
      // ? matched any character except /, or we matched the current non-NUL character
      text++;
      glob++;
    }
    else
    {
      if (glob_backup == NULL || *text_backup == '/')
        return FALSE;
      // star-loop: backtrack to the last * but do not jump over /
      text = ++text_backup;
      glob = glob_backup;
    }
  }
  // ignore trailing stars
  while (*glob == '*')
    glob++;
  // at end of text means success if nothing else is left to match
  return *glob == '\0' ? TRUE : FALSE;
}

然而，对于实际的 glob 匹配应用程序，这个 glob 式（“globly”）算法还不完整。它缺乏对字符类的支持，例如 [a-zA-Z] 匹配字母，[^0-9] 匹配除数字以外的任何字符。它还缺乏一种方法来转义 *、? 和 [ 元字符的特殊含义，使用反斜杠。添加这些新功能需要重新设计主循环中的 glob 元字符测试，使用 switch 来选择 glob 元字符，并使用 continue 来启动主循环。从 switch 跳出会将控制权转移到 star-loop。

#define PATHSEP '/'  // pathname separator, we should define '\\' for Windows instead
// returns TRUE if text string matches glob pattern with * and ?
int glob_match(const char *text, const char *glob)
{
  const char *text_backup = NULL;
  const char *glob_backup = NULL;
  while (*text != '\0')
  {
    switch (*glob)
    {
      case '*':
        // new star-loop: backup positions in pattern and text
        text_backup = text;
        glob_backup = ++glob;
        continue;
      case '?':
        // match any character except /
        if (*text == PATHSEP)
          break;
        text++;
        glob++;
        continue;
      case '[':
      {
        int lastchr;
        int matched = FALSE;
        int reverse = glob[1] == '^' || glob[1] == '!' ? TRUE : FALSE;
        // match any character in [...] except /
        if (*text == PATHSEP)
          break; 
        // inverted character class
        if (reverse)
          glob++;
        // match character class
        for (lastchr = 256; *++glob != '\0' && *glob != ']'; lastchr = *glob)
          if (lastchr < 256 && *glob == '-' && glob[1] != ']' && glob[1] != '\0' ?
              *text <= *++glob && *text >= lastchr :
              *text == *glob)
            matched = TRUE;
        if (matched == reverse)
          break;
        text++;
        if (*glob != '\0')
          glob++;
        continue;
      }
      case '\\':
        // literal match \-escaped character
        glob++;
        // FALLTHROUGH
      default:
        // match the current non-NUL character
        if (*glob != *text && !(*glob == '/' && *text == PATHSEP))
          break;
        text++;
        glob++;
        continue;
    }
    if (glob_backup == NULL || *text_backup == PATHSEP)
      return FALSE;
    // star-loop: backtrack to the last * but do not jump over /
    text = ++text_backup;
    glob = glob_backup;
  }
  // ignore trailing stars
  while (*glob == '*')
    glob++;
  // at end of text means success if nothing else is left to match
  return *glob == '\0' ? TRUE : FALSE;
}

PATHSEP 字符是字符串文本中用于分隔路径名的传统 / 或 Windows \。请注意，传统 Unix 使用 ! 来反转字符类，例如 [!0-9]。在这里，我们提供 ! 和更传统的 ^ 来反转字符类，例如 [^0-9]。

该算法的执行时间在典型情况下与模式和字符串的长度成线性关系，在最坏情况下为二次方。

非递归 gitignore 样式 glob 匹配

在我们到达本文的第二部分时，让我们看看现代的 gitignore 样式 globbing 规则，并设计一个高效的非递归算法。

Gitignore 样式 globbing 应用以下规则来确定文件和目录的路径名匹配。

`*`	匹配除 `/` 以外的任何内容。
`?`	匹配除 `/` 以外的任何单个字符。
`[a-z]`	匹配所选字符范围内的单个字符。
`[^a-z]`	匹配所选字符范围外的单个字符。
`[!a-z]`	匹配所选字符范围外的单个字符。
`/`	在 glob 的开头使用时，如果路径名没有 `/`，则匹配。
`**/`	匹配零个或多个目录。
`/**`	在 glob 的末尾时，匹配 `/` 之后的所有内容。
`\?`	匹配 `?`（或反斜杠后指定的任何字符）。

我们还做以下假设：当 glob 包含路径分隔符 / 时，匹配完整路径名。否则，匹配文件或目录的基本名称。例如，*.h 匹配 foo.h 和 bar/foo.h，bar/*.h 匹配 bar/foo.h 但不匹配 foo.h 也不匹配 bar/bar/foo.h。Glob 中的前导 / 可用于强制 /*.h 匹配 foo.h 但不匹配 bar/foo.h。此外，路径名中的前导 ./ 或 / 被忽略，因此 ./foo/bar 被匹配为 foo/bar，正如我们所预期的。

示例

`*`	匹配 `a`、`b`、`x/a`、`x/y/b`。
`a`	匹配 `a`、`x/a`、`x/y/a`。	但不匹配 `b`、`x/b`、`a/a/b`。
`/*`	匹配 `a`、`b`。	但不匹配 `x/a`、`x/b`、`x/y/a`。
`/a`	匹配 `a`。	但不匹配 `x/a`、`x/y/a`。
`a?b`	匹配 `axb`、`ayb`。	但不匹配 `a`、`b`、`ab`、`a/b`。
`a[xy]b`	匹配 `axb`、`ayb`。	但不匹配 `a`、`b`、`azb`。
`a[a-z]b`	匹配 `aab`、`abb`、`acb`、`azb`。	但不匹配 `a`、`b`、`a3b`、`aAb`、`aZb`。
`a[^xy]b`	匹配 `aab`、`abb`、`acb`、`azb`。	但不匹配 `a`、`b`、`axb`、`ayb`。
`a[^a-z]b`	匹配 `a3b`、`aAb`、`aZb`。	但不匹配 `a`、`b`、`aab`、`abb`、`acb`、`azb`。
`a/*/b`	匹配 `a/x/b`、`a/y/b`。	但不匹配 `a/b`、`a/x/y/b`。
`**/a`	匹配 `a`、`x/a`、`x/y/a`。	但不匹配 `b`、`x/b`。
`a/**/b`	匹配 `a/b`、`a/x/b`、`a/x/y/b`。	但不匹配 `x/a/b`、`a/b/x`。
`a/**`	匹配 `a/x`、`a/y`、`a/x/y`。	但不匹配 `a`、`b/x`。
`a\?b`	匹配 `a?b`。	但不匹配 `a`、`b`、`ab`、`axb`、`a/b`。

为了实现 gitignore 风格的 globbing，我们需要两个星号循环：一个用于单个星号，“*-loop”，另一个用于双星号，“**-loop”。**-loop 优先于 *-loop，因为当我们在 glob 模式中遇到它后面的双星号时，在单个星号上回溯是没有意义的。反之则不然：当后面有一个单个星号不匹配时，我们应该在双星号上回溯。

// returns TRUE if text string matches gitignore-style glob pattern
int gitignore_glob_match(const char *text, const char *glob)
{
  const char *text1_backup = NULL;
  const char *glob1_backup = NULL;
  const char *text2_backup = NULL;
  const char *glob2_backup = NULL;
  // match pathname if glob contains a / otherwise match the basename
  if (*glob == '/')
  {
    // if pathname starts with ./ then ignore these pairs
    while (*text == '.' && text[1] == PATHSEP)
      text += 2;
    // if pathname starts with / then ignore it
    if (*text == PATHSEP)
      text++;
    glob++;
  }
  else if (strchr(glob, '/') == NULL)
  {
    const char *sep = strrchr(text, PATHSEP);
    if (sep)
      text = sep + 1;
  }
  while (*text != '\0')
  {
    switch (*glob)
    {
      case '*':
        if (*++glob == '*')
        {
          // trailing ** match everything after /
          if (*++glob == '\0')
            return TRUE;
          // ** followed by a / match zero or more directories
          if (*glob != '/')
            return FALSE;
          // new **-loop, discard *-loop
          text1_backup = NULL;
          glob1_backup = NULL;
          text2_backup = text;
          glob2_backup = ++glob;
          continue;
        }
        // trailing * matches everything except /
        text1_backup = text;
        glob1_backup = glob;
        continue;
      case '?':
        // match any character except /
        if (*text == PATHSEP)
          break;
        text++;
        glob++;
        continue;
      case '[':
      {
        int lastchr;
        int matched = FALSE;
        int reverse = glob[1] == '^' || glob[1] == '!' ? TRUE : FALSE;
        // match any character in [...] except /
        if (*text == PATHSEP)
          break;
        // inverted character class
        if (reverse)
          glob++;
        // match character class
        for (lastchr = 256; *++glob != '\0' && *glob != ']'; lastchr = *glob)
          if (lastchr < 256 && *glob == '-' && glob[1] != ']' && glob[1] != '\0' ?
              *text <= *++glob && *text >= lastchr :
              *text == *glob)
            matched = TRUE;
        if (matched == reverse)
          break;
        text++;
        if (*glob != '\0')
          glob++;
        continue;
      }
      case '\\':
        // literal match \-escaped character
        glob++;
        // FALLTHROUGH
      default:
        // match the current non-NUL character
        if (*glob != *text && !(*glob == '/' && *text == PATHSEP))
          break;
        text++;
        glob++;
        continue;
    }
    if (glob1_backup != NULL && *text1_backup != PATHSEP)
    {
      // *-loop: backtrack to the last * but do not jump over /
      text = ++text1_backup;
      glob = glob1_backup;
      continue;
    }
    if (glob2_backup != NULL)
    {
      // **-loop: backtrack to the last **
      text = ++text2_backup;
      glob = glob2_backup;
      continue;
    }
    return FALSE;
  }
  // ignore trailing stars
  while (*glob == '*')
    glob++;
  // at end of text means success if nothing else is left to match
  return *glob == '\0' ? TRUE : FALSE;
}

该算法在典型情况下，执行时间与模式和字符串的长度成线性关系，在最坏情况下为三次，当 **-loop 和 *-loop 都处于活动状态时。

以点 (.) 开头的文件的名称被 shell glob 以不同的方式处理，这意味着名称开头的点必须显式匹配，并且不能被通配符匹配。为了复制此行为，我们添加以下内容。

int glob_match(const char *text, const char *glob)
{
  int nodot = 1; // new flag to indicate that we shouldn't match a dot, e.g. after a /
  ...
  while (*text != '\0')
  {
    switch (*glob)
    {
      case '*':
        // match anything except . after /
        if (nodot && *text == '.')
          break;
        ...
      case '?':
        // match anything except . after /
        if (nodot && *text == '.')
          break;
        ...
      case '[':
      {
        ...
        // match any character in [...] except . after /
        if (nodot && *text == '.')
          break;
        ...
      }
      ...
      default:
        // match the current non-NUL character
        if (*glob != *text && !(*glob == '/' && *text == PATHSEP))
          break;
        nodot = *glob == '/';
        ...

可以通过 tolower() 和 toupper() 实现 ASCII 中的不区分大小写的匹配，如下所示。

        // match the current non-NUL character
        if (tolower(*glob) != tolower(*text) && !(*glob == '/' && *text == PATHSEP))
          break;

        // match character class
        int chr;
        for (lastchr = 256; *++glob != '\0' && *glob != ']'; lastchr = *glob)
          if (lastchr < 256 && *glob == '-' && glob[1] != ']' && glob[1] != '\0' ?
              (tolower(*text) <= (chr = *++glob) && tolower(*text) >= lastchr) ||
              (toupper(*text) <= chr && toupper(*text) >= lastchr)
              : tolower(*text) == tolower(*glob))
            matched = TRUE;

Unicode 匹配可以通过两种方式支持：使用宽字符串，即 wchar_t 或 std::wstring，或者使用 UTF-8 编码的字符串。宽字符串选项不需要对算法进行任何更改。UTF-8 版本需要对 ? 和字符类进行一些更改，其他所有内容保持不变。

      case '?':
        // match anything except /
        if (*text == PATHSEP)
          break;
        utf8(&text);
        glob++;
        continue;

      case '[':
      {
        int chr = utf8(&text);
        int lastchr = 0x10ffff;
        int matched = FALSE;
        int reverse = glob[1] == '^' || glob[1] == '!' ? TRUE : FALSE;
        // match any character in [...] except /
        if (chr == PATHSEP)
          break;
        // inverted character class
        if (reverse)
          glob++;
        glob++; 
        while (*glob != '\0' && *glob != ']')
          if (lastchr < 0x10ffff && *glob == '-' && glob[1] != ']' && glob[1] != '\0' ?
              glob++, chr <= utf8(&glob) && chr >= lastchr :
              chr == (lastchr = utf8(&glob)))
            matched = TRUE;
        if (matched == reverse)
          break;
        if (*glob)
          glob++;
        continue;
      }

其中 utf8() 返回宽字符并按一个 UTF-8 字符在 glob 中前进。

int utf8(const char **s)
{
  int c1, c2, c3, c = (unsigned char)**s;
  if (c != '\0')
    (*s)++;
  if (c < 0x80)
    return c;
  c1 = (unsigned char)**s;
  if (c < 0xC0 || (c == 0xC0 && c1 != 0x80) || c == 0xC1 || (c1 & 0xC0) != 0x80)
    return 0xFFFD;
  if (c1 != '\0')
    (*s)++;
  c1 &= 0x3F;
  if (c < 0xE0)
    return (((c & 0x1F) << 6) | c1);
  c2 = (unsigned char)**s;
  if ((c == 0xE0 && c1 < 0x20) || (c2 & 0xC0) != 0x80)
    return 0xFFFD;
  if (c2 != '\0')
    (*s)++;
  c2 &= 0x3F;
  if (c < 0xF0)
    return (((c & 0x0F) << 12) | (c1 << 6) | c2);
  c3 = (unsigned char)**s;
  if (c3 != '\0')
    (*s)++;
  if ((c == 0xF0 && c1 < 0x10) || (c == 0xF4 && c1 >= 0x10) || 
       c >= 0xF5 || (c3 & 0xC0) != 0x80)
    return 0xFFFD;
  return (((c & 0x07) << 18) | (c1 << 12) | (c2 << 6) | (c3 & 0x3F));
}

结论

通过使用星号循环迭代而不是递归来实现，通配符匹配、经典 globbing 和现代 gitignore 风格的 globbing 可以很快。这些算法通常在字符串和模式长度上以线性时间运行，在最坏情况下，可能在模式和字符串长度上以二次方或三次方的速度运行，而不会导致 CPU 使用率呈指数级爆炸。

我们可以将类似 shell 的花括号展开等扩展的 globbing 功能添加到我们的匹配器中。然而，花括号展开需要递归和回溯，如果我们不小心，可能会使指数级爆炸问题变得更糟。例如，考虑 a*{b,a*{b,a*{b,a*{b,a*{b,a*{b,a*{b,a*b}}}}}}}。除了限制花括号的数量或将扩展的 glob 转换为用于匹配的正则表达式（尽管正则表达式匹配也可能成本高昂）之外，没有简单的方法来限制花括号展开的执行时间。

我在实现 gitignore 风格的 globbing 时编写了这篇文章，用于 ugrep（快速通用 grep），因为缺乏可用且可用的开源代码。我在官方存储库中找到的 wildmat.c 版本都有一个烦人的 bug，实际上在源代码中也有记录：“在处理格式不正确的模式时可能不够健壮；例如，“foo[a-”可能会导致段错误。” 哎呀。

历史

2019 年 8 月 3 日：初稿
2019 年 8 月 5 日：首次发布
2019 年 8 月 8 日：更新
2019 年 9 月 12 日：添加了 Java 下载源
2019 年 9 月 19 日：添加了 shell 执行示例；[] 不匹配 /；dotglob 改进；新下载源和更新的下载源
2019 年 10 月 17 日：次要更新以通过移除路径名中的前导 /（如果存在）来改进 gitignore globbing。
2020 年 1 月 23 日：改进了字符类 globbing，以便在 -（连字符）是最后一个字符时匹配 shell globbing，例如 [a-]，修复了 UTF-8 字符类范围匹配（即 [] 带有连字符）在“改进”部分。
2023 年 7 月 22 日：次要更新以改进不区分大小写的 globbing 代码。