65.9K
CodeProject 正在变化。 阅读更多。
Home

最快的精确/模糊/通配符全文搜索器!?

starIconstarIconstarIconstarIcon
emptyStarIcon
starIcon

4.71/5 (22投票s)

2013 年 11 月 17 日

CPOL

17分钟阅读

viewsIcon

49352

downloadIcon

2111

优化的多线程控制台全文本搜索器

引言

本文献适合那些追求速度的人。

与其“浪费”时间来学习/调整/基准测试精确/模糊/通配符搜索技术,不如看看我经过大量测试和彻底优化的三首练习曲,它们是用 C 语言编写的,在实际环境中,即作为实际实现。

您总是可以购买包含算法的书籍,而获得一个(超高速运行的)源代码则完全是另一回事。为什么会有这种不匹配?有如此多精通数学和编程的人,却做了如此多的半成品工作!金钱,金钱,金钱,令人惊讶的是,多年来一直在思考(隐藏在金钱背后)的真正原因——缺乏信任。我相信,追随尼古拉·特斯拉(我崇拜的偶像)的精神是一种令人振奋的方式,他之所以前进的动力不是金钱,而是纯真

假设您需要从一个 42GB 的日志文件中快速提取所有包含特定 IP 或文件名的行!

或者,假设您需要“实时”找到“edelweiss”,同时搜索“edelvais”!

你会呼叫谁?

  • 捉鬼敢死队做不到。
  • GREP,不不不,我说的是快速。
  • SQL,省省吧,它太神奇了,能把菜鸟变成“专业程序员”。我喜欢业余主义,[源自拉丁语 amator,意为“爱好者”,源自 amare,意为“爱”],在那里纯真尚未褪色。

    啊哈
    我们现在该去哪里?
    我不知道
    纯真已逝
    正在迅速消逝
    你仍在承诺完美,完美
    用空洞的话语
    用空洞的话语
    用空洞的话语
    用空洞的话语
    很难改掉习惯
    你迷失其中
    你现在爱谁?
    一阵寒意
    刺痛我(刺痛我)
    关闭我感受的方式
    为什么我看不清,看不清
    奋力呼吸
    奋力呼吸
    奋力呼吸
    奋力呼吸
    /Dannii Minogue/

    现在,我所知的最快的全文搜索器 Kazahana 来啦。

背景

有用链接

Kazahana 概览

  • 一个开源项目;
  • 无与伦比的速度性能;
  • 始终通过 OpenMP 强制执行 16 个线程;
  • 充分利用 I/O 读取带宽,例如在 SSD 上达到 234.3MB/s,平均线性读取速度为 236.6MB/s,块大小为 1MB;
  • 转储物理行(以 LF 结尾)以包含您的模式;
  • 可同时在 Linux/Windows 下作为 32/64 位代码运行;
  • 通过“Railgun_Sekireigan_Wolfram”进行精确匹配 - 速度超群;
  • 通过 Wagner–Fischer 算法实现高度优化的模糊字符串匹配,使用了 3 种 Kaze 的启发式方法;
  • 通过 Wagner–Fischer 算法模式进行全面的模糊字符串匹配,功能强大;
  • 通配符匹配牢牢掌握,提供 9 种通配符;
  • 最快的通配符匹配,提供“经典”的 2 种通配符,在 Core 2 T7500 上每线程遍历速度达到 75MB/s 以上;
  • 可调节的小型(通常是最后一级 CPU 缓存的一半大小)主缓冲区,1-7MB,有时高达 350MB 以上;
  • 100% 免费;
  • 由 machinely yours Kaze 用 C 语言编写。

Using the Code

编译简单直接,只需分别使用
Intel, Windows

icl /O3 /Qunroll Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM.c 
  /FAcs /FeKazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM_HEXADECAD-Threads_IntelV12 
  /Qopenmp /Qopenmp-link:static -DCommence_OpenMP -D_icl_mumbo_jumbo_

MinGW, Windows

gcc -O3 -funroll-loops -static -o Kazahana_r1-++fix+
  nowait_critical_nixFIX_WolfRAM_GCC_472 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM.c 
  -fopenmp -DCommence_OpenMP -D_gcc_mumbo_jumbo_

GCC, Linux

gcc -O3 -funroll-loops -static -o Kazahana_r1-++fix+
  nowait_critical_nixFIX_WolfRAM_HEXADECAD_GCC_471 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM.c 
  -fopenmp -D_FILE_OFFSET_BITS=64 -DCommence_OpenMP -D_gcc_mumbo_jumbo_
// Change appropriately in 'Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM.c':
#define _WIN32_ENVIRONMENT_
//#define _POSIX_ENVIRONMENT_

下面按顺序给出模糊/通配符/精确三种搜索模式

D:\Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM>"Kazahana_r1-
  ++fix+nowait_critical_nixFIX_WolfRAM_HEXADECAD-Threads_IntelV12.exe" 4 
  "beautifability" MASAKARI_General-Purpose_Grade_English_Wordlist.wrd 1536
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram, copyleft Kaze 2013-Nov-15.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,065,535 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 317,919/179,525/18
Kazahana: Performance: 34 KB/clock
Kazahana: Performance: 2,890 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 110/0
Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 18,577,955 ticks
Kazahana: Done.

D:\Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM>type Kazahana.txt
dutiability
eatability
bearability
equitability
exuviability
fatigability
breathability
buffability
certifiability
practicability
quantifiability
realisability
realizability
rectifiability
refutability
reputability
negotiability
satiability

D:\Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM>"Kazahana_r1-++fix+
  nowait_critical_nixFIX_WolfRAM_HEXADECAD-Threads_IntelV12.exe" 
  "*fathom*" MASAKARI_General-Purpose_Grade_English_Wordlist.wrd 1536
Kazahana, a superfast exact & wildcards & Levenshtein Distance 
  (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram, copyleft Kaze 2013-Nov-15.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,196,607 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 317,919/317,919/17
Kazahana: Performance: 60 KB/clock
Kazahana: Performance: 5,046 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 63/0
Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 11,228,173 ticks
Kazahana: Done.

D:\Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM>type Kazahana.txt
[*fathom*] fathom /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathomable /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathomage /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathomed /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathomers /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathometer /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathometers /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathoming /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathomless /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathomlessly /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathomlessness /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathoms /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] unfathomability /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] unfathomable /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] unfathomableness /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] unfathomably /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] unfathomed /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/

D:\Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM>"Kazahana_r1-++fix+
  nowait_critical_nixFIX_WolfRAM_HEXADECAD-Threads_IntelV12.exe" 
  "molyb" MASAKARI_General-Purpose_Grade_English_Wordlist.wrd 1536
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram, copyleft Kaze 2013-Nov-15.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,185,042 bytes/clock
Kazahana: Dumped xgrams: 24
Kazahana: Performance: 223 KB/clock
Kazahana: Performance: Total/fread() clocks: 17/16
Kazahana: Performance: I/O time, i.e. fread() time, is 94 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 10,902,419 ticks
Kazahana: Done.

D:\Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM>type Kazahana.txt
[molyb] ferrimolybdite /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] ferromolybdenum /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdaena /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdate /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdates /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdena /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdenic /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdenite /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdenosis /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdenous /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdenum /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdic /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdite /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdoenzyme /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdomancy /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdomenite /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdophosphate /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdophyllite /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdopterin /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdosis /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdous /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] phosphomolybdate /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] phosphomolybdic /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] polymolybdate /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/

D:\Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM>

正如中国的古老谚语所说,两张图胜过两千言。

图 #1

683665/Bonboniera_rag.png

对于上面在 Wikipedia 文件中进行全文搜索“金属疲劳”(在我的笔记本电脑“Bonboniera”上执行)的统计数据是

总时间I/O 时间:183941 时钟周期 vs 164638 时钟周期,这意味着搜索时间(在“Railgun_Sekireigan_Wolfram”中花费的时间)等于 183941-164638 = 19303 时钟周期,或者说“metal fatigue”这个目标的所有出现次数都以 45,261,093,005 字节 / 19303 时钟周期 = 2236MB/s 的速度找到,通常这种目标搜索速度为 3500MB/s,作为一个整体块,这里的主缓冲区是 7168KB,分为 16 个块,得到 (43164MB/7168KB)*16 = 98660 次“WolfRAM”调用,我称之为 MUTSI 性能。更精确的计时证实 I/O 时间为 359617617625 滴答 / 2200000000 滴答 = 163 秒。

图 #2

683665/Results_Core2_T7500_small.png

上面的基准测试是在我的笔记本电脑上完成的,配备 Core 2 T7500 2200MHz(根据 EVEREST,内存读取速度为 5454MB/s,内存复制速度为 4014MB/s)。造成如此受限的结果(1.2-1.4GB/s)的原因是需要查找/提取包含模式的行,但是真正的瓶颈并非该逻辑,'memchr()' 部分,而是 'memcpy()'。在“Ivy Bridge”上的一些基准测试显示内存读取速度为 42GB/s,而内存复制速度则难以达到 19-20GB/s。由于 haystack 是间接遍历的,即涉及 memcpy(),复制 haystack 到主缓冲区需要 2-3 秒。这两个瓶颈占用了笔记本电脑(Core 2)总时间的 75%。

目标“metal fatigue”的全部 16 个出现次数都以 1024MB / (781-574) 时钟周期 = 4946MB/s 的速度找到,太棒了!

Kazahana: Dumped xgrams: 16
Kazahana: Performance: 1,342 KB/clock
Kazahana: Performance: Total/fread() clocks: 781/574
Kazahana: Performance: I/O time, i.e. fread() time, is 73 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,331,116,253 ticks

美味,4946MB/s:2236MB/s 的比率或2.2:1 更清晰地显示了“WolfRAM”有多酷。
再次检查:1,331,116,253 滴答 / 2200000000 滴答 = 0.6 秒 ~ 574 时钟周期。

回到那 42GB,如果机器是一头野兽,例如拥有 64GB 主内存(系统缓存可以容纳整个文件)和 16 个线程,那么您可以期待 |???| “Hana Senpu” “风之旋转”,这正是 Kazahana 的含义。

看到连最基本的操作(查找一些文本)都如此缓慢,我感到恶心。
全文处理很重要,无法在一秒钟内提取离线所需的 Wikipedia 行,这说明了当今硬件和软件的拥堵程度。

我每天(或者说每小时)都需要包含指定单词/短语的所有句子/段落,例如,我想要 Wikipedia 中“metal fatigue”这 338 个出现的所有内容,形式如下

[metal fatigue] The [[ICE 1]] trains were equipped with single-cast [[wheelset 
  (rail transport)|wheelsets]], known as [[monobloc]] wheels. Once in service it soon became apparent 
  that this design could, as a result of [[metal fatigue]] and out-of-round conditions, result in [[resonance]] 
  and vibration at cruising speed. Passengers noticed this particularly in the restaurant car, where there 
  were reports of loud vibrations in the dinnerware and of glasses "creeping" across tables. /enwiki-20130904-pages-articles.xml/
[metal fatigue] At Eschede in Lower Saxony, Germany a high speed passenger train became derailed in 1998, 
and 101 persons died. The primary cause was the fracture from metal fatigue of a wheel tyre; the train failed 
to negotiate two sets of points and struck the pier of an overbridge. It was the most serious railway accident 
in Germany, and also the most serious on any high speed (over {{convert|200|kph|mph}}) line. 
Ultrasonic testing had failed to reveal the incipient fracture.<ref name = eschede>Erich Preuß, 
''Eschede, 10 Uhr 59. Die Geschichte einer Eisenbahn-Katastrophe'', GeraNova Zeitschriftenverlag, 2002, 
ISBN 3-932785-21-5</ref> See [[Eschede train disaster]]. /enwiki-20130904-pages-articles.xml/
[metal fatigue] * The [[De Havilland Comet#Comet disasters of 1954|De Havilland Comet disasters of 1954]], 
later determined to be structural failures due to unanticipated metal fatigue at the corners of square 
windows used by the Comet 1. /enwiki-20130904-pages-articles.xml/
...

在我看来,如今的 PC 已经成为过去,也就是说,它们将被更快、更便宜的主流计算机所取代——这是我尊敬的保加利亚...先知 Slava Sevryukova 的预言。
简单来说,即将到来的重大技术飞跃将吹走我们的狭隘思维,Kazahana 只是一个小小的台风级先驱。

“现代‘台风’一词受到粤语借词‘taaîfung’的影响,并重新拼写使其更像希腊语。‘Taaîfung’,字面意思是‘大风’,碰巧与阿拉伯语的借词相似,并在 1699 年首次以英文形式记录为‘tuffoon’。”
/Heritage/

微风
认识你是一种荣幸
但我听到了风暴
在呼唤我
在呼唤我
微风
它们在呼唤我
所以,带上火焰
带上冰霜
给我你所拥有的一切
我不再掷骰子
稳住我的枪
稳住我的心
一旦我做出选择
就无法回头
微风
一丝滋味就像天堂
我会战斗
再次感受你
又一次,又一次,又一次,又一次
/Conjure One - Zephyr, 'Exilarch' 专辑/

由于这是一个开源项目,我希望在更有经验的程序员的帮助下改进 Kazahana,最终使其成为一个能够匹配快速硬件的搜索工具,并免费提供。

2014-11-20 的一个附加项

我的歉意,昨天我发现了一个愚蠢的命令行解析错误,导致在全面模糊模式下跳过长行。

错误是

// MAXboth = MaxLineLength +1+1 +(167*WILDCARD_IP_flag*MaxLineLength); // Buggy line, fixed with next one in r. ...CS_fix
//	if (WILDCARD_IP_flag) {
//		MAXboth = MaxLineLength +1+1 +(167*WILDCARD_IP_flag*MaxLineLength);
//	} else {
//		MAXboth = MaxLineLength +1+1 +(167*EXHAUSTIVE_flag*MaxLineLength);
//	}

现在一切都正常了,已修复并重新上传,玩得开心!

我一直想展示一个利用 Kazahana 全部强大功能的重度模糊示例,但我仍然没有 16 线程的机器,所以以下是在 Q9550s 2833MHz 笔记本电脑上运行的最新英文 Wikipedia 测试

D:\_KAZE\GameraWikipediaWiktionary>dir

11/15/2014  01:25 PM    50,151,236,957 enwiki-20141008-pages-articles.xml
11/19/2014  12:33 PM           494,080 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+EX+CS_fix_HEXADECAD-Threads_IntelV12_SSE2_32bit.exe
09/17/2014  02:27 PM             4,096 timer32.exe

D:\_KAZE\GameraWikipediaWiktionary>timer32.exe Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+EX+CS_fix_HEXADECAD-Threads_IntelV12_SSE2_32bit.exe 4e "Silvestor Staloune" enwiki-20141008-pages-articles.xml 11263
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix, copyleft Kaze 2014-Nov-19.
Pattern: Silvestor Staloune
omp_get_num_procs( ) = 4
omp_get_max_threads( ) = 4
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 11263KB ... OK
\; 00,000,002,131 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 800,855,553/341,600,291,334/2,106
Kazahana: Performance: 2 KB/clock
Kazahana: Performance: 34 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 23,542,319/369,148
Kazahana: Performance: I/O time, i.e. fread() time, is 1 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,045,731,585,253 ticks
Kazahana: Done.

Kernel  Time =   222.800 =    0%
User    Time = 88536.121 =  376%
Process Time = 88758.921 =  377%    Virtual  Memory =     18 MB
Global  Time = 23542.910 =  100%    Physical Memory =     18 MB

尽管进行了所有调整(包括选择 11MB 的主缓冲区,Q9550s 拥有 12MB 缓存),遍历速度仍然很低 - 2MB/s。顺便说一句,我们那位朋友的正确名字是“Sylvester Stallone”,我故意犯了 4 个错误,在结果文件中我发现了 4 个拼写错误。

Sylvester Stalone
Sylvestor Stallone
Slvester Stallone
Silvester Stallone

显然 Wikipedia 还没有运行过模糊过滤器。

2013 年 11 月 22 日的附加项

最后说最后的事。我对“gcc+OpenMP+recursion”的组合非常失望,我的建议是除非你精通这三者,否则要避免这个组合。原因是我花了 2 个小时来应对“MinGW+OpenMP+recursion”和“Linux+OpenMP+recursion”,当进行通配符搜索(带有长行)并且使用 gcc+OpenMP 时,会发生简单的“段错误”崩溃?!
我不会再浪费时间在这“巨魔”身上了。一种结束危机的方法是用 PTHREADS 替换 OpenMP,但这目前不是我的优先事项。
不要弄错,问题不在 Kazahana,而在于这个组合,而且即使是 MONAD(单线程)ELF/EXE 玩偶式的,也比所有 GREP 式的爬行者要快得多,也就是说,它们与它们相比是行走的。
Intel 的可执行文件工作得很好——32/64 位多线程 EXE 在与这个组合相同的场景下运行得飞快,而那个组合则会崩溃。

昨天,进行了一个小改动,以便允许对所有长度的行进行模糊搜索,下面给出了对 Wikipedia 转储的搜索,改动本身

//if (m>MaxLineLength)
//{ printf( "\nKazahana: Incoming xgram exceeding the limit.\n" ); exit( 2 ); }
// Above two commented lines are too severe, changed with next line allowing to search into lines bigger than our needs:
if (m<=MaxLineLength)

之所以这样做,是因为我只需要针对特定的 x-gram 文件进行模糊搜索。这个改动为查看 Wikipedia 文件提供了可能性,嗯哼。Levenshtein 距离越大,所需的计算就越重,也就是说,线程得到了很好的喂养(与 EXACT 模式相反,在这种模式下主要强调主内存 I/O),让我们遍历 Wikipedia 1024MB 的转储,使用 LD = 18,这将给出我们字符串的所有“变异”,顺便说一句,这种搜索在您需要给定句子 i.e. 当您需要查看给定句子在您的 LD 中的可行变体时非常有用。换句话说,Levenshtein 不仅对于单词和短语(所谓的 x-gram)非常有用,而且对于句子和……段落也很有用,哇!想象一个复合行(Wikipedia 由这样的行组成):一个由几个句子组成的物理行,这些句子在视觉换行后显示为一个段落。这就是 16 个线程尖叫的地方,FUZZY & WILDCARD,对于 2 个线程,它们几乎使性能加倍,对于 4 个线程也是如此 - 四倍。

下面给出了两个测试,一个在 Linux 下,一个在 Windows 下。这三种搜索模式都被“折磨”。

测试 #1,Linux 32 位,Kazahana 作为 32 位单线程 ELF(gcc 4.7.1;选项:-O3 -funroll-loops -static)

time ./Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+_MONAD_GCC_471 
  "metal fatigue" enwiki-20130904-pages-articles.7z.001 1536
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+, copyleft Kaze 2013-Nov-21.
Enforcing MONAD i.e. single-thread ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,001,204 bytes/clock
Kazahana: Dumped xgrams: 16
Kazahana: Performance: 1 KB/clock
Kazahana: Performance: Total/fread() clocks: 900,001/550,000
Kazahana: Performance: I/O time, i.e. fread() time, is 61 percents
Kazahana: Done.

real 0m0.934s
user 0m0.380s
sys  0m0.552s

time ./Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+_MONAD_GCC_471 
  "*[[*Category:*pai*nters *" enwiki-20130904-pages-articles.7z.001 1536
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+, copyleft Kaze 2013-Nov-21.
Enforcing MONAD i.e. single-thread ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,000,014 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/56
Kazahana: Performance: 0 KB/clock
Kazahana: Performance: 0 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 75,640,001/560,000
Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
Kazahana: Done.

real 1m15.750s
user 1m15.169s
sys  0m0.584s

time ./Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+_MONAD_GCC_471 18 
  "[[Category:Italian painters of the 20th century]]" enwiki-20130904-pages-articles.7z.001 1536
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+, copyleft Kaze 2013-Nov-21.
Enforcing MONAD i.e. single-thread ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,000,089 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/2,005,354/69
Kazahana: Performance: 0 KB/clock
Kazahana: Performance: 0 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 11,980,001/550,000
Kazahana: Performance: I/O time, i.e. fread() time, is 4 percents
Kazahana: Done.

real 0m12.332s
user 0m11.613s
sys  0m0.716s

此外,我想将三种搜索类型的性能与 Intel 的 32 位和 64 位代码并列展示,快速的决斗如下。

测试 #2,Windows 7 64 位,Kazahana 作为 32/64 位多线程 EXE(Intel 12.1;选项:/O3 /Qunroll /Qopenmp /Qopenmp-link:static)

D:\Kazahana_C_ELF_EXE\EXE>dir kaz*

11/22/2013  03:44 AM               389 Kazahana_compile_WolfRAM+_Intel.bat
11/22/2013  04:07 AM           779,675 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+.c
11/22/2013  04:11 AM         7,543,701 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+_32bit.cod
11/22/2013  04:09 AM         8,296,283 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+_64bit.cod
11/22/2013  04:11 AM           466,944 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+_HEXADECAD-Threads_IntelV12_32bit.exe
11/22/2013  04:09 AM           555,520 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+_HEXADECAD-Threads_IntelV12_64bit.exe
11/22/2013  04:11 AM           153,088 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+_MONAD-Thread_IntelV12_32bit.exe
11/22/2013  04:09 AM           183,808 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+_MONAD-Thread_IntelV12_64bit.exe

D:\Kazahana_C_ELF_EXE\EXE>dir en*

11/20/2013  03:12 AM     1,073,741,824 enwiki-20130904-pages-articles.7z.001

D:\Kazahana_C_ELF_EXE\EXE>Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  _HEXADECAD-Threads_IntelV12_32bit.exe "metal fatigue" enwiki-20130904-pages-articles.7z.001 1536 
Kazahana, a superfast exact & wildcards & Levenshtein Distance 
  (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+, copyleft Kaze 2013-Nov-21.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK

/; 00,001,183,695 bytes/clockKazahana: Dumped xgrams: 16
Kazahana: Performance: 1,157 KB/clock
Kazahana: Performance: Total/fread() clocks: 906/671
Kazahana: Performance: I/O time, i.e. fread() time, is 74 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,397,776,523 ticks
Kazahana: Done.

D:\Kazahana_C_ELF_EXE\EXE>Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  _HEXADECAD-Threads_IntelV12_32bit.exe "*[[*Category:*pai*nters *" 
  enwiki-20130904-pages-articles.7z.001 1536 
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+, copyleft Kaze 2013-Nov-21.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,031,162 bytes/clockKazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/56
Kazahana: Performance: 30 KB/clock
Kazahana: Performance: 271 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 34,523/684
Kazahana: Performance: I/O time, i.e. fread() time, is 1 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,420,836,593 ticks
Kazahana: Done.

D:\Kazahana_C_ELF_EXE\EXE>Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  _HEXADECAD-Threads_IntelV12_32bit.exe 18 "[[Category:Italian painters of the 20th century]]" 
  enwiki-20130904-pages-articles.7z.001 1536 
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+, copyleft Kaze 2013-Nov-21.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK

/; 00,000,146,546 bytes/clockKazahana: Total/Checked/Dumped xgrams: 9,382,307/2,005,354/69
Kazahana: Performance: 142 KB/clock
Kazahana: Performance: 1,276 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 7,349/730
Kazahana: Performance: I/O time, i.e. fread() time, is 9 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,426,790,820 ticks
Kazahana: Done.

D:\Kazahana_C_ELF_EXE\EXE>Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  _HEXADECAD-Threads_IntelV12_64bit.exe "metal fatigue" enwiki-20130904-pages-articles.7z.001 1536 
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+, copyleft Kaze 2013-Nov-21.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK

/; 00,001,125,318 bytes/clockKazahana: Dumped xgrams: 16
Kazahana: Performance: 1,100 KB/clock
Kazahana: Performance: Total/fread() clocks: 953/717
Kazahana: Performance: I/O time, i.e. fread() time, is 75 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,380,011,061 ticks
Kazahana: Done.

D:\Kazahana_C_ELF_EXE\EXE>Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  _HEXADECAD-Threads_IntelV12_64bit.exe "*[[*Category:*pai*nters *" enwiki-20130904-pages-articles.7z.001 1536 
Kazahana, a superfast exact & wildcards & Levenshtein Distance 
  (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+, copyleft Kaze 2013-Nov-21.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK

/; 00,000,029,966 bytes/clockKazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/56
Kazahana: Performance: 29 KB/clock
Kazahana: Performance: 261 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 35,913/509
Kazahana: Performance: I/O time, i.e. fread() time, is 1 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,426,409,831 ticks
Kazahana: Done.

D:\Kazahana_C_ELF_EXE\EXE>Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  _HEXADECAD-Threads_IntelV12_64bit.exe 18 "[[Category:Italian painters of the 20th century]]" 
  enwiki-20130904-pages-articles.7z.001 1536 
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, 
  r. 1-++fix+nowait_critical_nixFIX_Wolfram+, copyleft Kaze 2013-Nov-21.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,174,890 bytes/clockKazahana: Total/Checked/Dumped xgrams: 9,382,307/2,005,354/69
Kazahana: Performance: 170 KB/clock
Kazahana: Performance: 1,522 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 6,163/647
Kazahana: Performance: I/O time, i.e. fread() time, is 10 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,436,490,308 ticks
Kazahana: Done.

D:\Kazahana_C_ELF_EXE\EXE>type kazahana.txt 
[[Category:Italian painters of the 16th century]]
[[Category:Italian painters of the 17th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Italian painters of the 17th century]]
[[Category:Films set in the 13th century]]
[[Category:West Indian cricketers of the 21st century]]
[[Category:Films set in the 13th century]]
[[Category:Italian painters of the 13th century]]
[[Category:Italian painters of the 14th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Italian painters of the 17th century]]
[[Category:Italian painters of the 15th century]]
[[Category:Italian painters of the 15th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Italian Ministers of the Interior]]
[[Category:Films set in the 23rd century]]
[[Category:Italian people of Irish descent]]
[[Category:Italian painters of the 17th century]]
[[Category:Italian painters of the 17th century]]
[[Category:Italian painters of the 18th century]]
[[Category:Italian painters of the 15th century]]
[[Category:Italian painters of the 15th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Novels set in the 18th century]]
[[Category:Italian painters of the 15th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Chilean Ministers of the Interior]]
[[Category:Italian painters of the 15th century]]
[[Category:Italian people of Greek descent]]
[[Category:Italian painters of the 20th century]]
[[Category:Italian painters of the 21st century]]
[[Category:Films set in the 23rd century]]
[[Category:Films set in the 23rd century]]
[[Category:Films set in the 24th century]]
[[Category:Films set in the 24th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Italian painters of the 17th century]]
[[Category:Italian painters of the 15th century]]
[[Category:Films set in the 16th century]]
[[Category:Italian people of the Risorgimento]]
[[Category:Italian painters of the 15th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Italian people of French descent]]
[[Category:Italian people of the Risorgimento]]
[[Category:Italian people of the Risorgimento]]
[[Category:Films set in the 23rd century]]
[[Category:Italian painters of the 16th century]]
[[Category:Italian painters of the 17th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Italian painters of the 17th century]]
[[Category:Italian painters of the 18th century]]
[[Category:Films set in the 19th century]]
[[Category:Films set in the 17th century]]
[[Category:Italian painters of the 19th century]]
[[Category:Italian painters of the 20th century]]
[[Category:Novels set in the 19th century]]
[[Category:Films set in the 14th century]]
[[Category:Italian people of Swedish descent]]
[[Category:Italian people of French descent]]
[[Category:Italian people of the Risorgimento]]
[[Category:Italian painters of the 15th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Novels set in the 19th century]]
[[Category:Films set in the 20th century]]
[[Category:Italian people of French descent]]
[[Category:Films set in the 16th century]]
[[Category:Films set in the 23rd century]]
[[Category:Films set in the 10th century]]

D:\Kazahana_C_ELF_EXE\EXE>

嗯,我本来以为会看到“法国/荷兰/俄罗斯/...画家”,哦,这只是整个 Wikipedia 的 1/42。正如你所见,Levenshtein 距离为 18 的字符串“[[Category:Italian painters of the 20th century]]”内的所有 69 个命中都被以这样的速度转储:

* 32 位代码:9,382,307/7,349 = 1,276,678 行/秒。
* 64 位代码:9,382,307/6,163 = 1,522,360 行/秒。

转换为更易读的格式,看到这 69 个出现次数在我的笔记本电脑上花费了 7.4/6.2 秒。嗯……需要更多线程。

简而言之,就 2 个线程而言,EXACT 搜索模式受限于主 RAM 复制,没有显示出显著的提升,而 FUZZY 和 WILDCARD 模式的运行速度是其两倍。

最后,希望有谁知道 gcc 行为异常的原因能与我们分享。在此期间,源代码以及单线程 ELF 和多线程 EXE 均可在下载区找到,请享用!

2013 年 11 月 24 日的附加项

抱歉,修复一个愚蠢的错误迫使我重新编译并重新上传。

错误是 Kazahana 中的数组被声明为“unsigned int”,而在 Galadriel 中是‘int’,修复本身

int LevenshteinT1[MaxLineLength+1][MaxLineLength+1]; // declare int d[0..m, 0..n]
...
int LevenshteinTf[MaxLineLength+1][MaxLineLength+1]; // declare int d[0..m, 0..n]

Galadriel 源代码包含在 ZIP 文件中,整个 Wagner-Fischer 部分最初是用我的工具 Galadriel 编写的,然后才复制粘贴到 Kazahana 中,在传输过程中,我愚蠢地在“int”前面加上了“unsigned”(值不能为负,但它们决定了减法),原始片段如下

int Levenshtein[MaxLineLength+1][MaxLineLength+1]; // declare int d[0..m, 0..n]
...
if ( 0 < wrdlen && wrdlen < MaxLineLength +1+1)
{
    wrd[ wrdlen ] = 0;
    if ( wrd[ wrdlen-1 ] == 13 ) //CR
        wrd[ --wrdlen ] = 0;
// A simple heuristic #1: Don't enter the nasty loops unless MaximumLevenshteinDistance >= ABS(m-n).
m = wrdlen; // strlen(wrd);
if (m>MaxLineLength)
{ printf( "Galadriel: Incoming xgram exceeding the limit.\n" ); return( 1 ); }
#if defined(_WIN32ASM_)
if (AtMostLevenshteinDistance >= abs_AF(m-n))
#else
if (AtMostLevenshteinDistance >= MAX(m,n)-MIN(m,n)) // This is the only add-on for r.1+
#endif
{
// LD [
// A simple heuristic #3: Don't recalculate rows for identical leftmost chars we already have.
// Caution: heuristic #3 works in pair with heuristic #2
// (the bail out position is less or equal to cached chars), whereas heuristic #1 is standalone.
StartingPosition = 1;
while (    wrdCACHED[StartingPosition-1] && wrdCACHED[StartingPosition-1]==wrd[StartingPosition-1])
// No need of && wrd[StartingPosition-1] 
    StartingPosition++;
// The bail out 'i' value (heuristic #2) affects our cached value here, 'StartingPosition' cannot be greater than 'i':
StartingPosition = MIN(StartingPosition, i);
// Printing Matrix [
// printf("\n%s", wrd);
// Printing Matrix ]
    WordsChecked++;
    SkipHeuristic=0;
    for(i=StartingPosition;i<=m;i++) { // StartingPosition is in range 1..
// Printing Matrix [
// printf("\n");
// Printing Matrix ]
    for (j=1;j<=n;j++) {
        if(wrd[i-1] == argv[2][j-1])
            Levenshtein[i][j] = Levenshtein[i-1][j-1];
        else
#if defined(_WIN32ASM_)
            Levenshtein[i][j] = min_AF(Levenshtein[i-1][j]+1, 
              Levenshtein[i][j-1]+1, Levenshtein[i-1][j-1]+1); // Variant 1: 237,270 xgrams/s
#else
            Levenshtein[i][j] = MIN(MIN((Levenshtein[i-1][j]+1),(Levenshtein[i][j-1]+1)),
              (Levenshtein[i-1][j-1]+1)); // Variant 2: This MS code is faster than above jumpless code! // 358,327 xgrams/s
            //{Levenshtein[i][j] = MIN(MIN(Levenshtein[i-1][j],Levenshtein[i][j-1]),
            //  Levenshtein[i-1][j-1]); --Levenshtein[i][j];} // Variant 3: This compound
            // line is much slower than above inc-inc-inc code! // 237,270 xgrams/s
#endif
// Printing Matrix [
// printf("%s ", _ui64toaKAZEzerocomma(Levenshtein[i][j], llTOaDigits, 10)+(26-2) );
// Printing Matrix ]
        }

        // A simple heuristic #2: Discontinue the nasty vertical loop (i.e. m)
        // when the LD in cell in the last column minus the remaining vertical cycles is greater than our MAX LD:
        if ( Levenshtein[i][n] - (m-i) >= 0 && AtMostLevenshteinDistance < 
          Levenshtein[i][n] - (m-i) ) {SkipHeuristic=1; break;}
          // Caution: Levenshtein[i][n] can be less than (m-i), this changes nothing the logic is the same.
        //          C  C  C  C  C  C  C  C  C  C  C  C  C
        //          o  o  o  o  o  o  o  o  o  o  o  o  o
        //          l  l  l  l  l  l  l  l  l  l  l  l  l
        //          u  u  u  u  u  u  u  u  u  u  u  u  u
        //          m  m  m  m  m  m  m  m  m  m  m  m  m
        //          n  n  n  n  n  n  n  n  n  n  n  n  n
        //          0  0  0  0  0  0  0  0  0  1  1  1  1
        //          1  2  3  4  5  6  7  8  9  0  1  2  3
        //          p  s  y  c  h  e  d  l  i  c  i  z  e
        // p Row01: 00 01 02 03 04 05 06 07 08 09 10 11 12 !Caution!
        // Here the last cell Levenshtein[i][n] - (m-i) = 12 - (14-1) = -1
        // s Row02: 01 00 01 02 03 04 05 06 07 08 09 10 11
        // y Row03: 02 01 00 01 02 03 04 05 06 07 08 09 10
        // c Row04: 03 02 01 00 01 02 03 04 05 06 07 08 09
        // h Row05: 04 03 02 01 00 01 02 03 04 05 06 07 08
        // e Row06: 05 04 03 02 01 00 01 02 03 04 05 06 07
        // d Row07: 06 05 04 03 02 01 00 01 02 03 04 05 06
        // e Row08: 07 06 05 04 03 02 01 01 02 03 04 05 05
        // l Row09: 08 07 06 05 04 03 02 01 02 03 04 05 06
        // i Row10: 09 08 07 06 05 04 03 02 01 02 03 04 05
        // c Row11: 10 09 08 07 06 05 04 03 02 01 02 03 04
        // i Row12: 11 10 09 08 07 06 05 04 03 02 01 02 03
        // z Row13: 12 11 10 09 08 07 06 05 04 03 02 01 02
        // e Row14: 13 12 11 10 09 08 07 06 05 04 03 02 01
    }
    if (SkipHeuristic==0 && AtMostLevenshteinDistance >= Levenshtein[m][n]) {
        fprintf( fp_outLINE, "%s\n", wrd); DumpedLines++;
        if ((DumpedLines & 0xff) == 0xff)
            fflush(fp_outLINE); // Not sure: CTRL+C doesn't flush?!
    }
    // The bail out 'i' value (heuristic #2) affects our cached value here, 'i' is the needed one:
    memcpy(wrdCACHED, wrd, wrdlen+1); // +1 because we need the ASCII 000 termination
// LD ]
}            
}

并突出显示此错误

#define MaxLineLength 126+100

int main(int argc, char **argv) {

//unsigned int Levenshtein[MaxLineLength+1][MaxLineLength+1]; // declare int d[0..m, 0..n]
int Levenshtein[MaxLineLength+1][MaxLineLength+1]; // declare int d[0..m, 0..n]
unsigned int AtMostLevenshteinDistance;
int i,j,m,n;

AtMostLevenshteinDistance = 2;
m=14;
i=1;
n=12;
Levenshtein[i][n]=12;

//if ( Levenshtein[i][n] - (m-i) >= 0 && AtMostLevenshteinDistance
// < Levenshtein[i][n] - (m-i) ) {SkipHeuristic=1; break;}
// Caution: Levenshtein[i][n] can be less than (m-i), this changes nothing the logic is the same.

printf("'Levenshtein[i][n] - (m-i) >= 0' equals %d\n", Levenshtein[i][n] - (m-i) >= 0); 
printf("'AtMostLevenshteinDistance < Levenshtein[i][n] - (m-i)' equals %d\n", 
  AtMostLevenshteinDistance < Levenshtein[i][n] - (m-i)); 

exit (0);
}

// When 'unsigned int Levenshtein[MaxLineLength+1][MaxLineLength+1]; // declare int d[0..m, 0..n]':
// 
// D:\_KAZE>bug.exe
// 'Levenshtein[i][n] - (m-i) >= 0' equals 1
// 'AtMostLevenshteinDistance < Levenshtein[i][n] - (m-i)' equals 1
// 
// When 'int Levenshtein[MaxLineLength+1][MaxLineLength+1]; // declare int d[0..m, 0..n]':
// 
// D:\_KAZE>bug.exe
// 'Levenshtein[i][n] - (m-i) >= 0' equals 0
// 'AtMostLevenshteinDistance < Levenshtein[i][n] - (m-i)' equals 1

好吧,12 - (14-1) = -1,即 -1 >= 0 应该为 0,但由于“阴影”转换,它变成了 1。“基础”错误如 C 基础不容小觑,存在一些滑溜的石头。

现在,是时候做一些令人兴奋且极其……缓慢的事情了——全面模糊搜索。

我的意图是在 Kazahana 的 revision 1 中添加这个必备功能,遍历(模糊搜索)在每个可能的位置,即完全遍历所有传入的行,下面立即给出一个示例。

Kazahana "*edelweiss*" enwiki-20130904-pages-articles.7z.001 1537
D:\_KAZE>"Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fix_HEXADECAD-Threads_IntelV12_32bit.exe" "*edelweiss*" enwiki-20130904-pages-articles.7z.001 1537
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/33

上面的WILDCARD搜索产生了33个命中,其中有3种不同的情况:edelweiss/Edelweiss/EDELWEISS

D:\_KAZE>"Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fix_HEXADECAD-Threads_IntelV12_32bit.exe" "edelweiss" enwiki-20130904-pages-articles.7z.001 1537
Kazahana: Dumped xgrams: 5

D:\_KAZE>"Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fix_HEXADECAD-Threads_IntelV12_32bit.exe" "Edelweiss" enwiki-20130904-pages-articles.7z.001 1537
Kazahana: Dumped xgrams: 25

D:\_KAZE>"Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fix_HEXADECAD-Threads_IntelV12_32bit.exe" "EDELWEISS" enwiki-20130904-pages-articles.7z.001 1537
Kazahana: Dumped xgrams: 3

好的,EXACT 搜索得到了
edelweiss 5 次命中
Edelweiss 25 次命中
EDELWEISS 3 次命中

这 33/5 行中的一行是一个 46 字节长的字符串:|Edelweiß || [[edelweiss]] || edelweiss flower

让我们模糊搜索“edelvais”,假装我们不知道正确的拼写。

要找到“edelweiss”,我们需要至少 Levenshtein 距离 3(1 次插入和 2 次替换):'edelvais' vs 'edelweiss'

  • 拼写错误的那个少一个字符;
  • 拼写错误的那个有 2 个错误字符:'v' 和 'a' 代替 'w' 和 'e'。

    目前的模糊模式是非全面的,现在我们在m字节长的stringS(传入的行)中搜索 n 字节长的字符串(模式),希望能落在这种情况之下

if (AtMostLevenshteinDistance >= MAX(m,n)-MIN(m,n)) {
//...
}

上述条件限制了我们的命中数,它有自己的细微差别和用途,而 EF(Exhaustive Fuzzy)是补充现有的三种模式。

// EXHAUSTIVE [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[
// Here the old fuzzy is complemented with exhaustive one, using BBs of the incoming string (here 'm').
// We need to factorize 'm' down to all 'n+AtMostLevenshteinDistance'
// long strings/BBs and to search into them ONE-BY-ONE - a gruelling task indeed!
// Example:
// One of those 33/5 lines is a 46 bytes long string, i.e. m = 46:
// |Edelweiß || [[edelweiss]] || edelweiss flower
// Let's search fuzzily for "edelvais", pretending we don't know the right spelling:
// To find "edelweiss" we need at least Levenshtein distance 3, i.e. n = 8, AtMostLevenshteinDistance = 3:
// 'edelvais' vs 'edelweiss':
// - the mispelled one has one character less;
// - the mispelled one has 2 wrong characters: 'v' & 'a' instead of 'w' & 'e'.
// From above we need Building-Blocks of 46 bytes order 8+3.
// Inhere we are using order 11, 'm - Order + 1' is the number of total BBs for text 'm' bytes long: 46-11+1 = 36
// The 36 BBs:
// |Edelweiß |
// Edelweiß ||
// ...
// weiss flowe
// eiss flower
    for (l=0; l < m-(n+AtMostLevenshteinDistance)+1; l++) {
    // Here we throw 'edelvais' against '|Edelweiß |', ... 'eiss flower':
    // ...
    }
// Even more gruelling, 'edelvais' vs 'EDELWEISS' needs LD = 9 (1 insertion and 8 substitutions) as a minimum.
// EXHAUSTIVE ]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]

待完成和基准测试。

edelweiss
n.
一种高山植物(Leontopodium alpinum),原产于欧洲,叶子覆盖着白色绒毛,小花头周围有显眼的白色苞片。
[德语:edel,高贵(源自中古高地德语 edele,源自古高地德语 edili)+ weiss,白色(源自中古高地德语 wiz,源自古高地德语 wiz,hwiz;请参阅 Indo-European 词根 kweit-)。]

endure
v. tr.
1. 坚持到底,尽管困难重重;经历:忍受严酷的北极冬天。
2. 容忍地承受:“我们寻求真相,并将忍受后果”(Charles Seymour)。参见 bear1 的同义词。
v. intr.
1. 继续存在;持续:经受住几个世纪的风雨侵蚀的建筑。
2. 毫无屈服地耐心忍受。
[中古英语 enduren,源自古法语 endurer,源自拉丁语 indurare,意为“使坚硬”:in-,反对,进入;参见 en-1 + durus,坚硬;参见 Indo-European 词根 deru-。]

我最喜欢的花之一——一位坚忍不拔者——雪绒花在寒风中默默承受,从不屈服。

深层的高贵/美丽/智慧是基于耐力的,这一点毫无疑问。
同样,本着这种精神,Kazahana 的第四种模式将是与时间的战斗,我已经感到既害怕又兴奋,它将多么惊人——极其缓慢且方便。

2013 年 11 月 29 日的附加项

WILDCARD 模式已增加一个额外的快速通配符子模式。

看到三天前,Mr. Handy 的函数——一位 CodeProject 的同事——我推迟了所有活动,试图拿出一些绝活。

我已经有一个运行良好的通配符搜索器,一个非常通用的,但速度较慢。因此,我为我的 3 合 1 搜索器 Kazahana 添加了一个快速附加项,从而实现了通用(9 种通配符)和快速(经典的 2 种通配符)模式。我还测试了我的和他的,使用了 2 个线程,简而言之,它们确实很快,2 个线程利用了 180-192% 的 CPU,实现了 140-170MB/s 的总遍历速度,请看下文。

FAST 模式只使用两个经典通配符 '*' 和 '?'(我的分别是 '&' 和 '+'),如下所示

...
Note9a: Nine SLOW wildcards are available:
        wildcard '*' any character(s) or empty,
        wildcard '.' any ALPHA character(s) or empty,
        wildcard '`' any NON-ALPHA character(s) or empty,
        wildcard '@'/'#' any character {or empty}/{and not empty},
        wildcard '^'/'$' any ALPHA character {or empty}/{and not empty},
        wildcard '|'/'~' any NON-ALPHA character {or empty}/{and not empty}.
Note9b: Two FAST wildcards are available:
        wildcard '&' any character(s) or empty,
        wildcard '+' any character and not empty.
Note9c: Don't mix SLOW and FAST, the SLOW overrides the FAST, i.e. presence of at least one of the 9 wildcards cancels FAST mode.
...

不幸的是,他的函数比我的快,让我的练习曲能够超越他的只有两件事:

  • 我的函数更简洁、更美观、更短小
  • 我的函数是无许可的,即 100% 免费。

    我需要帮助来加快速度,源代码如下。

// Igor Pavlov's recursive variant modified (and converted to iterative) by Kaze, 2013-Nov-28:
//static boolean Wildcard_IP(const char *mask, int maskPos, const char *name, int namePos)
//{
//int maskLen = maskGLOBALlen - maskPos;
//int nameLen = nameGLOBALlen - namePos;
//if (maskLen == 0)
//    if (nameLen == 0)
//        return true;
//    else
//        return false;
//if (mask[maskPos] == '*') {
//    if (Wildcard_IP(mask, maskPos + 1, name, namePos))
//        return true;
//    if (nameLen == 0) 
//        return false;
//    return Wildcard_IP(mask, maskPos, name, namePos + 1);
//} else if (mask[maskPos] == '?') {
//    if (nameLen == 0) 
//        return false;
//    return Wildcard_IP(mask, maskPos + 1, name, namePos + 1);
//} else {
//    if (mask[maskPos] != name[namePos])
//        return false;
//    return Wildcard_IP(mask, maskPos + 1, name, namePos + 1);
//}
//}
int WildcardMatch_Iterative_Kaze(const char* mask, const char* name) {
const char* maskSTACK;
const char* nameSTACK;
int BacktrackFlag = 0;
Backtrack:
for (nameSTACK = name, maskSTACK = mask; *nameSTACK; ++nameSTACK, ++maskSTACK) {
    if (*maskSTACK == '*') {
        mask = maskSTACK+1;
        if (!*mask) return 1;
        name = nameSTACK;
        BacktrackFlag = -1;
        goto Backtrack;
    }
    //else if (*maskSTACK == '?') {
    //} else {
    else if (*maskSTACK != '?') {
        //if (tolower(*nameSTACK) != tolower(*maskSTACK)) {
        // These 'tolower's are outrageous, they hurt speed BADLY,
        // in real-world usage both should have been lowercased outwith the 'for'.
        if (*nameSTACK != *maskSTACK) { 
            if (!BacktrackFlag) return 0;
            name++;
            goto Backtrack;
        }
    } 
} 
while (*maskSTACK == '*') ++maskSTACK;
return (!*maskSTACK);
}

大型基准测试,搜索 Wikipedia 1024MB 转储中的所有行。
我的函数用于可执行文件:Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12.exe
他的函数用于可执行文件:Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12_JH.exe
运行结果是,我的通配符 '&'/' + ' 分别代表 '*'/' ?'

timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fixITER_HEXADECAD-Threads_IntelV12.exe "&karolina&wydra&" 
  enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12_JH.exe 
  "&karolina&wydra&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12.exe 
  "++++++++++ &AUTHENTICITY&EMPOWERS&YOU+&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fixITER_HEXADECAD-Threads_IntelV12_JH.exe "++++++++++ &AUTHENTICITY&EMPOWERS&YOU+
  &" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fixITER_HEXADECAD-Threads_IntelV12.exe "&Traven&" 
  enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fixITER_HEXADECAD-Threads_IntelV12_JH.exe "&Traven&" 
  enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12.exe 
  "+++++++++ +++++++++ & influence&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12_JH.exe 
  "+++++++++ +++++++++ & influence&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12.exe 
  "&metal+fat&u+&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12_JH.exe 
  "&metal+fat&u+&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12.exe 
  "&[+&Category:&pai&nte++ &+++++&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12_JH.exe 
  "&[+&Category:&pai&nte++ &+++++&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt

模式 "&karolina&wydra&" 的速度结果

D:\_KAZE>timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fixITER_HEXADECAD-Threads_IntelV12.exe "&karolina&wydra&" 
  enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER, copyleft Kaze 2013-Nov-29.
Enforcing FAST wildcard mode ...
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,160,591 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/0
Kazahana: Performance: 156 KB/clock
Kazahana: Performance: 1,401 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 6,694/654
Kazahana: Performance: I/O time, i.e. fread() time, is 9 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,334,917,342 ticks
Kazahana: Done.
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31

Kernel Time  =     0.717 =    9%
User Time    =    13.041 =  178%
Process Time =    13.759 =  188%
Global Time  =     7.298 =  100%

D:\_KAZE>timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fixITER_HEXADECAD-Threads_IntelV12_JH.exe "&karolina&wydra&" 
  enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER, copyleft Kaze 2013-Nov-29.
Enforcing FAST wildcard mode ...
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,167,227 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/0
Kazahana: Performance: 163 KB/clock
Kazahana: Performance: 1,459 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 6,428/639
Kazahana: Performance: I/O time, i.e. fread() time, is 9 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,308,754,183 ticks
Kazahana: Done.
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31

Kernel Time  =     0.748 =   11%
User Time    =    12.230 =  181%
Process Time =    12.979 =  192%
Global Time  =     6.729 =  100%

在此,为避免信息过载,仅给出亮点。
模式 "++++++++++ &AUTHENTICITY&EMPOWERS&YOU+&" 的速度结果

Mine:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/0
Kazahana: Performance: 184 KB/clock
His:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/0
Kazahana: Performance: 186 KB/clock

模式 "&Traven&" 的速度结果

Mine:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/617
Kazahana: Performance: 145 KB/clock
His:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/617
Kazahana: Performance: 152 KB/clock

模式 "+++++++++ +++++++++ & influence&" 的速度结果

Mine:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/540
Kazahana: Performance: 189 KB/clock
His:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/540
Kazahana: Performance: 189 KB/clock

模式 "&metal+fat&u+&" 的速度结果

Mine:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/18
Kazahana: Performance: 152 KB/clock
His:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/18
Kazahana: Performance: 159 KB/clock

模式 "&[+&Category:&pai&nte++ &+++++&" 的速度结果

Mine:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/56
Kazahana: Performance: 153 KB/clock
His:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/56
Kazahana: Performance: 159 KB/clock

还有一些来自下载区“WildBench.zip”的结果。

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>TEST.BAT

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>"Kazahana_r1-++fix+
  nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12.exe" 
  "&a&e+f++&e" MASAKARI_General-Purpose_Grade_English_Wordlist.wrd 1024
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER, copyleft Kaze 2013-Nov-29.
Enforcing FAST wildcard mode ...
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1024KB ... OK
-; 00,000,149,796 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 317,919/317,919/11
Kazahana: Performance: 105 KB/clock
Kazahana: Performance: 8,831 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 36/5
Kazahana: Performance: I/O time, i.e. fread() time, is 13 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 5,433,065 ticks
Kazahana: Done.

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>"Kazahana_r1-++fix+
  nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12_jh.exe" 
  "&a&e+f++&e" MASAKARI_General-Purpose_Grade_English_Wordlist.wrd 1024
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER, copyleft Kaze 2013-Nov-29.
Enforcing FAST wildcard mode ...
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1024KB ... OK
-; 00,000,149,796 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 317,919/317,919/11
Kazahana: Performance: 105 KB/clock
Kazahana: Performance: 8,831 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 36/0
Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 5,974,419 ticks
Kazahana: Done.

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>type Kazahana.txt
[&a&e+f++&e] afterfame /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] afterfuture /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] arsenferratose /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] azelfafage /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] brazenface /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] laissezfaire /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] malperformance /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] salesforce /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] schadenfreude /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] waterfree /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] zauberflote /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>

还有下载区提供的来自 WildBench(Dezhi Zhao 的基准修改版)的速度结果。

// Results on my laptop Core 2 T7500 2200MHz:
/*
D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>_compile_Intel.bat

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>icl /O3 wildbench.c /FAcs /Fewildbench_Intel12
Intel(R) C++ Compiler XE for applications running on IA-32, Version 12.1.1.258 Build 20111011
Copyright (C) 1985-2011 Intel Corporation.  All rights reserved.

wildbench.c
Microsoft (R) Incremental Linker Version 10.00.30319.01
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:wildbench_Intel12.exe
wildbench.obj

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>wildbench_Intel12.exe
Running three times, for charm ...

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 44.272s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 15.959s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 18.237s, r = 350000000

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 46.035s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 14.711s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 21.403s, r = 350000000

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 44.164s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 16.302s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 18.595s, r = 350000000

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>_compile_VS2010.bat

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>cl /Ox wildbench.c /FAcs /Fewildbench_VS2010
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

wildbench.c
Microsoft (R) Incremental Linker Version 10.00.30319.01
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:wildbench_VS2010.exe
wildbench.obj

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>wildbench_VS2010.exe
Running three times, for charm ...

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 63.835s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 26.130s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 26.567s, r = 350000000

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 62.384s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 26.005s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 26.068s, r = 350000000

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 63.040s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 26.036s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 26.333s, r = 350000000

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>
*/

以及来自基准测试中包含的 Intel 的“wildbench.cod”,您可以看到
_WildcardMatch_Iterative_JackHandy: 000ce-00000+2= 208 字节长
_WildcardMatch_Iterative_Kaze: 000a2-00000+14= 176 字节长

奇怪,我做了我最好的尝试,但未能超越 Mr. Handy 的练习曲。嗯,我错过了什么?

2013 年 11 月 30 日的附加项

自由打击!

从昨晚开始,最快的通配符函数被称为 'WildcardMatch_Iterative_Kaze' revision 2,它 100% 免费,一如既往。

不再犹豫,我很幸运,现在最快的通配符函数不是 Mr.Handy 的,而是 Kaze 的,源代码/基准测试的 revision 2 如下。我很快就找到了昨天修订版延迟的原因——一个不必要的逻辑分支,现在已移除,使函数能够更流畅地运行。

新的调整与他的第一个“while”非常相似,而且看起来我“窃取”了他的想法,但当然不是这样,我只是看到三个“if”嵌套得很糟糕,其中第三个完全不必要,即

if (!BacktrackFlag) return 0;

这个第三个“if”本应被移除,但昨天时间不够,所以要移除它,还需要另一个“for”循环来设置 'BacktrackFlag',并与主“for”循环一起。

在这里,我想感谢 Mr.Handy 的工作,看到他的函数自 2001 年以来一直占主导地位真是太棒了,但是,现在是时候被取代了。

最终对决,无论是短字符串(类似文件名)还是长字符串(Wikipedia 的行)。

对于短字符串,我再次使用了修改版的 Dezhi Zhao 的 WildBench,在下载区更新。

// Results on my laptop Core 2 T7500 2200MHz:
/*
D:\_KAZE\WildBench>RUNME.bat

D:\_KAZE\WildBench>wildbench_Intel12.exe
FREEDOM strikes a blow!
Running three times, for charm ...

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 46.503s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 15.241s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 15.008s, r = 350000000

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 44.351s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 15.038s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 14.633s, r = 350000000

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 48.782s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 14.976s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 15.413s, r = 350000000

D:\_KAZE\WildBench>wildbench_VS2010.exe
FREEDOM strikes a blow!
Running three times, for charm ...

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 61.152s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 25.896s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 22.776s, r = 350000000

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 61.167s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 26.068s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 22.386s, r = 350000000

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 61.682s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 26.146s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 22.620s, r = 350000000

D:\_KAZE\WildBench>
*/

对于长字符串,我再次使用了 Wikipedia 的 1024MB 转储。

timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+_HEXADECAD-Threads_IntelV12.exe 
  "&karolina&wydra&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/0
Kazahana: Performance: 162 KB/clock Kaze's revision 2
Kazahana: Performance: 163 KB/clock Jack Handy's

timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+_HEXADECAD-Threads_IntelV12.exe 
  "++++++++++ &AUTHENTICITY&EMPOWERS&YOU+&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/0
Kazahana: Performance: 186 KB/clock Kaze's revision 2
Kazahana: Performance: 186 KB/clock Jack Handy's

timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+_HEXADECAD-Threads_IntelV12.exe 
  "&Traven&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/617
Kazahana: Performance: 152 KB/clock Kaze's revision 2
Kazahana: Performance: 152 KB/clock Jack Handy's

timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+_HEXADECAD-Threads_IntelV12.exe 
  "+++++++++ +++++++++ & influence&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/540
Kazahana: Performance: 190 KB/clock Kaze's revision 2
Kazahana: Performance: 189 KB/clock Jack Handy's

timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+_HEXADECAD-Threads_IntelV12.exe 
  "&metal+fat&u+&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/18
Kazahana: Performance: 158 KB/clock Kaze's revision 2
Kazahana: Performance: 159 KB/clock Jack Handy's

timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+_HEXADECAD-Threads_IntelV12.exe 
  "&[+&Category:&pai&nte++ &+++++&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/56
Kazahana: Performance: 162 KB/clock Kaze's revision 2
Kazahana: Performance: 159 KB/clock Jack Handy's

以及源代码。

// Revision 2, 2013-Nov-30:
int WildcardMatch_Iterative_Kaze(const char* mask, const char* name) {
const char* maskSTACK;
const char* nameSTACK;
//int BacktrackFlag = 0; // No need of it in rev.2
/*
    // Simplest heuristic with SUPEROVERHEAD enforced: trying to skip the
    // whole wildcard section by comparing the two arrays order 1 of mask&name.
    int i;
    unsigned char maskOrder1[256];
    unsigned char nameOrder1[256];
    for (i='a'; i <= 'z'; i++) { maskOrder1[i]=0; nameOrder1[i]=0;}
    for (i=0; i < strlen(mask); i++) maskOrder1[mask[i]]=1;
    for (i=0; i < strlen(name); i++) nameOrder1[name[i]]=1;
    // Assuming the incoming strings are already lowercased (as it should for speed) and
    // if we don't have matching alphabet parts (from mask side) means
    // we don't need to compare any further i.e. the match fails.
    for (i='a'; i <= 'z'; i++) if ( maskOrder1[i] == 1 && nameOrder1[i] == 0 ) return 0;
*/
/*
    int i;
    for (i=0; i < strlen(mask); i++) {
        if ( mask[i] != '*' && mask[i] != '?' )
            if ( !memchr(name,mask[i],strlen(name)) ) return 0;
    }
*/
/*
Backtrack:
for (nameSTACK = name, maskSTACK = mask; *nameSTACK; ++nameSTACK, ++maskSTACK) {
    if (*maskSTACK == '*') {
        mask = maskSTACK+1;
        if (!*mask) return 1;
        name = nameSTACK;
        BacktrackFlag = -1;
        goto Backtrack;
    }
    //else if (*maskSTACK == '?') {
    //} else {
    else if (*maskSTACK != '?') {
        //if (tolower(*nameSTACK) != tolower(*maskSTACK)) {
        // These 'tolower's are outrageous, they hurt speed BADLY,
        // in real-world usage both should have been lowercased outwith the 'for'.
        if (*nameSTACK != *maskSTACK) { 
            if (!BacktrackFlag) return 0; // Stupid branching, SLOW!
            name++;
            goto Backtrack;
        }
    } 
} 
*/
// Here, outside the main/second 'for', in order to avoid branching
// we need to set the old/obsolete BacktrackFlag i.e. we need first occurrence of '*':
for (name, mask; *name; ++name, ++mask) {
    if (*mask == '*') {
        goto Backtrack;
    }
    //else if (*mask == '?') {
    //} else {
    else if (*mask != '?') {
        if (*name != *mask) { 
            return 0;
        }
    } 
}
// We are entering the main/second 'for' with mask pointing
// to '*' as if BacktrackFlag is already set in the very first iteration at first condition:
Backtrack:
for (nameSTACK = name, maskSTACK = mask; *nameSTACK; ++nameSTACK, ++maskSTACK) {
    if (*maskSTACK == '*') {
        mask = maskSTACK+1;
        if (!*mask) return 1;
        name = nameSTACK;
        //BacktrackFlag = -1;
        goto Backtrack;
    }
    //else if (*maskSTACK == '?') {
    //} else {
    else if (*maskSTACK != '?') {
        //if (tolower(*nameSTACK) != tolower(*maskSTACK)) {
        // These 'tolower's are outrageous, they hurt speed BADLY,
        // in real-world usage both should have been lowercased outwith the 'for'.
        if (*nameSTACK != *maskSTACK) { 
            //if (!BacktrackFlag) return 0; // Stupid branching, SLOW!
            name++;
            goto Backtrack;
        }
    } 
}
while (*maskSTACK == '*') ++maskSTACK;
return (!*maskSTACK);
}

以及来自基准测试中包含的 Intel 的‘wildbench.cod’,您可以看到
_WildcardMatch_Iterative_JackHandy: 000ce-00000+2= 208 字节长
_WildcardMatch_Iterative_Kaze: 000a3-00000+13= 176 字节长

我非常接近发布 Kazahana 的 revision 1 了。

2013 年 12 月 7 日的附加项

EXHAUSTIVE FUZZY 模式已添加,通过在命令行中将 LD 后缀加上 'e' 来进入,如下行所示。

经过 304,017,070 次比较,1 次命中

D:\_KAZE>Kazahana 0e "miesha tate" enwiki-20130904-pages-articles.7z.001 1536
D:\_KAZE>type Kazahana.txt
* [[Miesha Tate]], Mixed martial artist

经过 893,637,435 次比较,1 次命中

D:\_KAZE>Kazahana 1e "mieshaA tate" enwiki-20130904-pages-articles.7z.001 1536
D:\_KAZE>type Kazahana.txt
* [[Miesha Tate]], Mixed martial artist

经过 930,871,807 次比较,1 次命中

D:\_KAZE>Kazahana 1e "misha tate" enwiki-20130904-pages-articles.7z.001 1536
D:\_KAZE>type Kazahana.txt
* [[Miesha Tate]], Mixed martial artist

经过 1,489,713,146 次比较,3 次命中

D:\_KAZE>Kazahana 2e "meescha tate" enwiki-20130904-pages-articles.7z.001 1536
D:\_KAZE>type Kazahana.txt
*  1977   &ndash; [[Ameesha Patel]], Indian actress and producer
* [[Ameesha Patel]]
* [[Miesha Tate]], Mixed martial artist

正如所示,所有包含您的模式且在您的“AtMostLevenshteinDistance”(最后一个示例为 2)内的行都会被转储。

当一个人不知道正确的拼写时,组合至少有 4x3 种:Meesha/Meescha/Mischa/Misha Tayt/Taith/Tate

并展示了一个重度的 EF“弯曲膝盖”示例,用 1 个线程扫描 1GB,耗时 31,876 秒。

D:\_KAZE>Kazahana 13e "Project Icarus, a design study of an 
interstellar spacecraft based on Project Daedalus" enwiki-20130904-pages-articles.7z.001 1536
Kazahana, a superfast exact & wildcards & Levenshtein Distance 
(Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX, copyleft Kaze 2013-Dec-05.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,000,062 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/2,527,965,825/1
Kazahana: Performance: Total/fread() clocks: 17,282,584/734
Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,777,296,620 ticks
Kazahana: Done.
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31

Kernel Time  =    10.358 =    0%
User Time    = 31866.433 =  184%
Process Time = 31876.791 =  184%
Global Time  = 17283.201 =  100%

模式,你的模糊文本,86 字节长

Project Icarus, a design study of an interstellar spacecraft based on Project Daedalus

Haystack,Wikipedia 1024MB 转储中的一行,126 字节长

* [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstellar spacecraft based on Project Daedalus

要匹配上述两个,我们需要至少 Levenshtein 距离 13,为什么,就是这样。

GOAL  : Project Icarus, a design study of an interstellar spacecraft based on Project Daedalus
PHASE0: tudy)]], a design study of an interstellar spacecraft based on Project Daedalus

Insert 6 chars 'Projec':
GOAL  : Project Icarus, a design study of an interstellar spacecraft based on Project Daedalus
PHASE1: Projectudy)]], a design study of an interstellar spacecraft based on Project Daedalus

Insert 1 char ' ':
GOAL  : Project Icarus, a design study of an interstellar spacecraft based on Project Daedalus
PHASE2: Project udy)]], a design study of an interstellar spacecraft based on Project Daedalus

Substitute 6 chars 'udy)]]' with 'Icarus':
GOAL  : Project Icarus, a design study of an interstellar spacecraft based on Project Daedalus
PHASE3: Project Icarus, a design study of an interstellar spacecraft based on Project Daedalus

13=6+1+6

逻辑很简单(经典的 Wagner-Fischer 被包装在另外两个循环中,以便以所有可能的构建块顺序将我们的模式投射到传入行的 86-13..86+13 范围),唯一令人担忧的是比较次数,以上述示例为例:2,527,965,825 次。
扫描的行数:9,382,307 行。

一个快速示例
我们需要将传入的行分解为构建块,从 86-13=73 开始。

BB order 86-13=73: "* [[Project Icarus (Interstellar Probe Design Study)]], a design study of"
BB order 86-12=74: "* [[Project Icarus (Interstellar Probe Design Study)]], a design study of "
...
BB order 86-01=85: "* [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interste"
BB order 86-00=86: "* [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstel"
BB order 86+01=87: "* [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstell"
...
BB order 86+12=98: "* [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstellar spacecra"
BB order 86+13=99: "* [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstellar spacecraf"

Those above are the first loop, second loop is sliding within the line one position at a time:

BB order 86-13=73: " [[Project Icarus (Interstellar Probe Design Study)]], a design study of "
BB order 86-12=74: " [[Project Icarus (Interstellar Probe Design Study)]], a design study of a"
...
BB order 86-01=85: " [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstel"
BB order 86-00=86: " [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstell"
BB order 86+01=87: " [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstella"
...
BB order 86+12=98: " [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstellar spacecraf"
BB order 86+13=99: " [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstellar spacecraft"

一种有用的自动穷举模糊搜索方法是,从 Levenshtein 距离 0(与精确不区分大小写搜索相同)开始,并将 LD 逐渐增加 1,直到出现命中。

这种自动模式将非常强大,只需提供你的模式并等待命中,命中是有保证的,当 LD 达到模式长度甚至超过时,所有种类的命中都会涌入。

在这里,16 个线程是“基本必需品”,对于像我这样的硬核文本探索者来说,128 个线程将挽救一天,字面意义上。

在我看来,无论计算多么广泛(或者说密集),EXHAUSTIVE FUZZY 模式都是一个必须拥有、已经拥有的功能。

2013 年 12 月 10 日的附加项

对 Exhaustive Fuzzy 部分进行了重大重排,使执行速度提高了数倍。Galadriel 的启发式算法 #1 和 #3 回归,允许“缓存”,从而避免不必要的 LD 矩阵更新。我还下载了最新的 Intel v14.0 编译器进行免费评估,并重新编译了 Kazahana 的 SSE 和 AVX 可执行文件,在下载区提供。谢谢 Intel。

目标是通过用 256 位内在函数替换缓慢的rep movsd来拓宽讨厌的memcpy()瓶颈,如果我有一台现代 CPU,我也会编译 512 位复制。

最新修订版1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+的实际提升如下。

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark>timer "Kazahana_r1-++fix+
  nowait_critical_nixFIX_WolfRAM+fixITER+EX_HEXADECAD-Threads_IntelV14_SSE2_32bit.exe" 
  2e "meescha tate" enwiki-20130904-pages-articles.7z.001 1536
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kazahana, a superfast exact & wildcards & Levenshtein Distance 
  (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX, copyleft Kaze 2013-Dec-05.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,004,486 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/1,489,713,146/3
Kazahana: Performance: 4 KB/clock
Kazahana: Performance: 39 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 239,876/944
Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 2,407,651,279 ticks
Kazahana: Done.

Kernel Time  =    27.750 =   11%
User Time    =   450.656 =  187%
Process Time =   478.406 =  198%
Global Time  =   240.506 =  100%

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark>type Kazahana.txt
*  1977   &ndash; [[Ameesha Patel]], Indian actress and producer
* [[Ameesha Patel]]
* [[Miesha Tate]], Mixed martial artist

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark>timer "Kazahana_r1-++fix+
  nowait_critical_nixFIX_WolfRAM+fixITER+EX+_HEXADECAD-Threads_IntelV14_SSE2_32bit.exe" 
  2e "meescha tate" enwiki-20130904-pages-articles.7z.001 1536
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, 
  r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+, copyleft Kaze 2013-Dec-10.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,006,880 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/1,489,713,186/3
Kazahana: Performance: 6 KB/clock
Kazahana: Performance: 59 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 156,439/1,095
Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 2,436,846,588 ticks
Kazahana: Done.

Kernel Time  =    18.796 =   11%
User Time    =   294.390 =  186%
Process Time =   313.187 =  198%
Global Time  =   157.485 =  100%

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark>type Kazahana.txt
*  1977   &ndash; [[Ameesha Patel]], Indian actress and producer
* [[Ameesha Patel]]
* [[Miesha Tate]], Mixed martial artist

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark>timer Kazahana_r1-++fix+
  nowait_critical_nixFIX_WolfRAM+fixITER+EX+_HEXADECAD-Threads_IntelV14_SSE2_32bit.exe 13e 
  "Project Icarus, a design study of an interstellar spacecraft based 
  on Project Daedalus" enwiki-20130904-pages-articles.7z.001 1536
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
    searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+, copyleft Kaze 2013-Dec-10.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,000,518 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/2,527,966,497/1
Kazahana: Performance: 0 KB/clock
Kazahana: Performance: 4 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 2,075,345/1,049
Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 2,655,364,558 ticks
Kazahana: Done.

Kernel Time  =   106.125 =    5%
User Time    =  3867.406 =  186%
Process Time =  3973.531 =  191%
Global Time  =  2076.243 =  100%

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark>type Kazahana.txt
* [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstellar spacecraft based on Project Daedalus

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark>

“缓存”带来了强大的提升,17,283 秒 vs 2,076 秒,但仍然很慢,我看不出如何加快速度,Wikipedia 有 42GB,这意味着完全遍历将花费 42x2000 秒,太可怕了!

关注点

最后,我感谢 Igor Pavlov,干得漂亮。我还感谢 Fantasy 和 Mr.Eiht,他们帮助我校准了 MokujIN 和 FNV1A_YoshimitsuTRIAD,而它们又被用于 Kazahana。如果问我,贪婪(而不是需要)更多的速度是一种美德,它是一个强大的驱动力,唯一应该害怕的缺点是忘恩负义。

历史

  • 2013-11-17:Kazahana 成为主流的日子
© . All rights reserved.