ASP.NET Exact Phrase Website Live Search, Version 2.0 (for .NET Framework 2 and Later)

Tushar Arora

August 14, 2010

CPOL

An ASP.NET script that finds exact keywords or phrases across a website in real time. Works with .NET Framework 2.0.

Download source code (version 2.0) - 156.78 KB

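Introduction

My project is an ASP.NET exact-phrase live search script for websites, targeting .NET Framework 2.0 and later. The script can be used on any ASP.NET website: localhost, www.mydomain.com, and so on. I wrote it in C#. It is a free tool that searches the whole of domain.com, including text-based files such as .html, .htm, .txt, .rtf, .nfo, .doc, .php, .asp, .aspx, .php4, .php5, and .xml, according to your custom settings. You can search for an exact keyword or an exact phrase, and the script's administrator can impose further restrictions. The script can also be upgraded into a more advanced web application.

With this search script you need no database, no temporary files, and no temporary space: the search runs directly on the server in real time. On a Xeon server, parsing at least 100 large web pages takes only two seconds, and matching files are indexed immediately. Because the search is live, it puts load on the whole physical server, so the script is intended for high-end servers as well as universities, colleges, libraries, and schools. It is a complete web application that solves the live site search problem for people who do not know programming or who simply want keyword and phrase search on their website.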

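Extended Information

This is version 2.0 + Revision 8 of the older version 1.0 script. Place this ASP.NET script in the yourdomain.com/<folder> directory to search the whole website for exact words or phrases. The script is freeware for personal and educational use only. You can deploy it at universities, schools, and public libraries to search all the files on a website. The script is upgradable. Because it currently searches files in real time, it needs no database, files, or temporary space to index them, but it does consume memory and increase server load. Any new content stored on the server is picked up by the script.

The phrase or keyword you specify is searched for exactly as you entered it in the search form. All search phrases are converted to lowercase before searching, so a user who does not know whether the word is written in uppercase or lowercase still finds every matching word or phrase. For validly parsed files (.html, .htm, .txt, .nfo, .rtf), an exact search is guaranteed to find a match 99.9% of the time, which makes the script well suited to public libraries searching for books, titles, names, and so on across the configured file extensions.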
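The lowercase comparison described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the class name `PhraseMatchSketch` and the method `CountPhrase` are hypothetical, and the sketch assumes the file text has already been read into a string.

```csharp
using System;

public static class PhraseMatchSketch
{
    // Hypothetical helper: counts non-overlapping occurrences of an
    // exact phrase in a text, ignoring upper/lower case.
    public static int CountPhrase(string text, string phrase)
    {
        if (String.IsNullOrEmpty(text) || String.IsNullOrEmpty(phrase))
            return 0;

        // Lower-case both sides so "Quick" and "quick" compare equal.
        string haystack = text.ToLowerInvariant();
        string needle = phrase.ToLowerInvariant();

        int count = 0;
        int pos = haystack.IndexOf(needle, StringComparison.Ordinal);
        while (pos >= 0)
        {
            count++;
            // Continue searching after the end of the previous match.
            pos = haystack.IndexOf(needle, pos + needle.Length,
                                   StringComparison.Ordinal);
        }
        return count;
    }
}
```

A file would count as a hit whenever `CountPhrase` returns a value greater than zero, regardless of how many times the phrase occurs.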

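Background

This is a free search engine module written in ASP.NET and C#. It contains everything that a demanding educational institution or the administrator of a free website needs for their site. The script is upgradable because it is entirely method/function-driven, clearly structured, and well organized, with no messy code.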

Using the Code

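Download the code and extract the files into the <localhost>/search folder, or place them directly in the <domain.com>/search folder of your website. If ASP.NET is available, the script runs from the main file phrasesearch.aspx and displays the search form. The phrasesearch.aspx.cs file is the main C# code file; at the top it has a few constants and global variables that can be configured manually, though you do not need to touch them unless required. There are only four to six of them, and they are easy to understand for any programmer familiar with web development and ASP.

In phrasesearch.aspx.cs, these global variables are strictly optional. Unless an error occurs, there is no real need to configure them.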

```
/*****************************************************************
//
//  MANUAL CONFIGURATION SECTION**/
    
//*****************************************************************/
    
// Configure your website address here:
// example: http://www.mydomain.com:80
// example: http://www.mydomain2.com:8080
// example: http://www.mydomain3.com:2200
// also include port number of your http server along with the website url :XX
// please do not put slash mark "/" after the fully qualified website address
public const String mywebsiteaddr = "";
    
// Configure your domain here:
// example: http://mydomain.com:9090
// example: http://localhost:50
// example: http://tushar
// X NOT: mydomain.com, mydomain, etc. etc.
// Fully qualified http://domain.com address is required along with port number
// also include port number of your http server along with the website url :XX
public const String mydomain = "";
    
// Configure your domain/website's root directory here:
// example: "public_html" or "wwwroot" or "httpdocs"
// please do not put slash mark "/" after the directory's name
public const String mydomainrootaddr = "";
    
// Configure your starting search directory, which will be searched entirely
// example: "/" (root of your domain) or "/folder1" or "/folder2" 
// are www.lawsofbrahman.com/folder1 and www.lawsofbrahman.com/folder2
// please put a slash mark "/" after the directory's name
public const String mysearchdir = "/";
    
// Enable the flag that you have filled the variables above:
// example: isSetupDone = true;
public const Boolean isSetupDone = false;    
    
//*****************************************************************/
    
// Configure maximum number of files to be searched in the entire website.
// example: myfileslimit = 10000 -> will search only Ten thousand files
// example: myfileslimit = 1000 -> will search only One thousand files
// example: myfileslimit = 50 -> will search only Fifty files
// please only put digits and no negative number
// This variable is not under the isSetupDone flag and 
// so is always enabled in the code. 
// When you wish to use this variable, please select [ search unlimited files ] 
// option from the total searched files limit combobox, 
// otherwise this variable won't work.
public static int myfileslimit = 100;
    
// Recommended configuration
// Configure your domain/website's physical path here, 
// if you actually and exactly know it!:
// example: myphysicalpath = "C:\\Inetpub\\wwwroot"
// please do not put backslash mark "\" after the directory's name
// please put double backslash between all the 
// directory names in the path: C:\\dir1\\dir2\\dir3
// This setting does not depend on isSetupDone flag.
// Example2: myphysicalpath = "C:\\Inetpub\\vhosts\\lawsofbrahman.com\\httpdocs"
public const String myphysicalpath = "";
    
// arrayFileTypesToSearch is a String Array containing 
// the extensions of the files that are to be parsed. All other files are ignored.
// ".htm" will search the htm webpages
// ".txt" will search the text files
// IMP: PLEASE PUT "." DOT IN FRONT OF THE EXTENSIONS.
// MODIFY THE ARRAY AS IT IS AND DO NOT MAKE IT SOMETHING ELSE! 
// OTHERWISE SCRIPT MIGHT MALFUNCTION.
// LIKE THIS: = {".txt", ".docx", ".html", ".xml"} 
// WITHIN "{" AND "}" CURLY BRACKETS.
public static String[] arrayFileTypesToSearch = 
    {".htm", ".html", ".txt", ".nfo", ".rtf", ".doc", ".xml", ".php", 
    ".asp", ".aspx", ".php4", ".php5"};
```

Configure these variables manually if the script does not work as expected.
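To make the roles of `arrayFileTypesToSearch` and `myfileslimit` more concrete, here is a rough sketch of how an extension filter and a file limit could be applied while walking a directory tree. The class `FileFilterSketch` and its methods are hypothetical and are not taken from the download; the script's own traversal code may work differently.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileFilterSketch
{
    // Hypothetical helper: true when the file's extension appears in the
    // configured list. Comparison ignores case, so ".HTML" matches ".html".
    public static bool HasSearchableExtension(string path, string[] extensions)
    {
        string ext = Path.GetExtension(path);
        foreach (string allowed in extensions)
        {
            if (String.Equals(ext, allowed, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }

    // Hypothetical helper: returns at most 'limit' files under 'root'
    // whose extension is in the allowed list.
    public static List<string> CollectFiles(string root, string[] extensions, int limit)
    {
        List<string> result = new List<string>();
        foreach (string file in Directory.GetFiles(root, "*", SearchOption.AllDirectories))
        {
            if (result.Count >= limit)
                break; // file limit reached, stop collecting
            if (HasSearchableExtension(file, extensions))
                result.Add(file);
        }
        return result;
    }
}
```

Note that `Directory.GetFiles` with `SearchOption.AllDirectories` throws if any subdirectory is inaccessible, so a production version would likely recurse manually and catch `UnauthorizedAccessException` per directory.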

```
public static Boolean initializeSiteSearching()
{
    Boolean returnval = false;
    
    // Reset all static variables before beginning a new search...
    strSearchPhrase = "";
    inumResultsLimit = 0;
    strDomainRootDir = "";
    strDomainUrl = "";
    boolSearchingDone = false;
    arrayPhraseMatches = null;
    iPhraseMatchCount = 0;
    arrayPhrasePositions = null;
    numMatchingFilesFound = 0;
    iCountSearchedFiles = 0;
    iFilesLimitReached = false;
    
    // Find the domain's root directory
    try
    {
        strDomainRootDir = HttpContext.Current.Server.MapPath("/");
    }
    catch (Exception)
    {
        strDomainRootDir = HttpContext.Current.Server.MapPath("~/");
    }
    
    // Find domain's URL
    strDomainUrl = HttpContext.Current.Request.Url.Host;
    
    // Get the server/domain.com's port
    strDomainPort = HttpContext.Current.Request.ServerVariables["SERVER_PORT"];
    
    // Configure the total files limit if user specified negative digits, 
    // reconfigure it with default value
    if (myfileslimit <= 0)
        myfileslimit = 1000;
        
    // Finalize if we got the domain root directory and the domain url
    if (strDomainRootDir.Length > 0 && strDomainUrl.Length > 0)
    {
        returnval = true;
    }
    return returnval;
}
```

This function is called to start a new search.

Some Functions in the Code

```
public static string PageName()
```
Returns the file name of the script, <filename.aspx>. This function is optional.
```
public static Boolean initializeSiteSearching()
```
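This function runs only when a new search is about to happen. It resets all required variables to null or 0. To keep ASP.NET functions from being called over and over while countless files are parsed, it also fetches the global site-related values once: the website URL, the domain name, and the port the site runs on. Finally, it sets a globally configurable variable: the limit on the number of files searched at one time.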
```
public static String GenerateInternalLink(String strPath)
```
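This function is called whenever a valid file has been parsed and is about to be indexed live into the search results page. It matters because it strips the physical path from the file, converts it to a relative URL path, and returns that path so it can serve as the URL or hyperlink of the matched file in the indexed results. It is essential for resolving site-relative paths.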
```
public static String GetRelavitePathOfFile(String filepath)
```
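This function is also called whenever a valid file has been parsed and is about to be indexed. It resolves the physical path relative to localhost and is the main function for converting a physical path into a site-relative path/URL. Only if this function returns an error instead of a processed site-relative URL is GenerateInternalLink called as the last resort for obtaining the site-relative URL. Both methods are upgradable and are called from the FindMatchingCurrentFile() method, which performs the automated work: reading the file's entire content, stripping tags, and then matching the user-specified phrase against the text held in the file's buffer. If the text matches, no matter how many times, the file is indexed, the title is taken from the HTML file, and a link is generated so the user can click through to the file containing the phrase or keyword. This function uses GenerateInternalLink() to help resolve paths.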
```
public static Boolean FindMatchingCurrentFile(string path)
```
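This function performs the file-related work: it searches the file for any match of the user-specified phrase and, when a matching pattern is found, indexes the file in the results section. It is the main file-handling and indexing function.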
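The tag-stripping and title-extraction steps mentioned in the descriptions above are not shown in this article, so the following is only a rough sketch of one way to do them with regular expressions. The class `HtmlTextSketch` and both method names are hypothetical, and regex-based stripping is a simplification that will not handle every malformed page.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlTextSketch
{
    // Hypothetical helper: returns the text inside <title>...</title>,
    // or an empty string when the page has no title element.
    public static string ExtractTitle(string html)
    {
        Match m = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value.Trim() : "";
    }

    // Hypothetical helper: removes script/style blocks and all remaining
    // tags, then collapses runs of whitespace into single spaces.
    public static string StripTags(string html)
    {
        string noScripts = Regex.Replace(html,
            @"<(script|style)[^>]*>.*?</\1>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        string noTags = Regex.Replace(noScripts, @"<[^>]+>", " ");
        return Regex.Replace(noTags, @"\s+", " ").Trim();
    }
}
```

The stripped text can then be lower-cased and searched for the phrase, while the extracted title can label the result link.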

Some Sample Code Snippets

```
public static String GenerateInternalLink(String strPath)
{
    String strfinalurl = "";
    String strPathstripped = "";
    String strDomainRootStripped = "";

    String strDomainRootPath = "";
    try
    {
        strDomainRootPath = HttpContext.Current.Request.MapPath("/");
    }
    catch (Exception)
    {
        strDomainRootPath = HttpContext.Current.Request.MapPath(
            HttpContext.Current.Request.ApplicationPath);
    }

    // Remove the physical path
    strPathstripped = Replace(strPath, @"\\", "/", -1, 0,
        RegexOptions.IgnoreCase | RegexOptions.Singleline);
    strDomainRootStripped = Replace(strDomainRootPath, @"\\", "/", -1, 0,
        RegexOptions.IgnoreCase | RegexOptions.Singleline);
    strfinalurl = Replace(strPathstripped, strDomainRootStripped, "", -1, 0,
        RegexOptions.IgnoreCase | RegexOptions.Singleline);
    return strfinalurl;
}
```

```
public static String GetRelavitePathOfFile(String filepath)
{
    String strresult = "";
    String strfinalurl = "";
    String strRootPath = "";
    try
    {
        strRootPath = HttpContext.Current.Request.MapPath("/");
    }
    catch (Exception)
    {
        strRootPath = HttpContext.Current.Request.MapPath(
            HttpContext.Current.Request.ApplicationPath);
    }

    // First check if a physical path had been configured.

    // Check if admin user specified an exact physical path
    if (myphysicalpath.Length > 0)
    {
        // A physical file path was specified which will be stripped directly.
        // Strip the site's root path from the file path
        strfinalurl = Replace(filepath, strRootPath, "",
            -1, 0, RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Assumed fix: build the pattern from the configured myphysicalpath.
        // The listing as published built it from strfinalurl, so the final
        // Replace below removed the whole string and returned "".
        String modphysicalpath = Replace(myphysicalpath, @"\\", "/",
            -1, 0, RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Replace physical path separators with url path separator
        strfinalurl = Replace(strfinalurl, @"\\", "/",
            -1, 0, RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // now finally remove the physical path's unnecessary bytes 
        // by matching modphysicalpath
        strresult = Replace(strfinalurl, modphysicalpath, "",
            -1, 0, RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Return the result
        return strresult;
    }

    String strfileurl = GenerateInternalLink(filepath);

    // If the user has not created the setup then proceed with default settings
    if (isSetupDone == false)
    {
        /*
         * NEW CODE
         */
        if (thisDomainName == "localhost")
        { strfileurl = "http://localhost:" + strDomainPort + "/" + strfileurl; }
        else
        { strfileurl = "http://" + thisDomainName + ":" + strDomainPort + "/" + strfileurl; }
        return strfileurl;
    }
    else
    {
        // using user defined variables****************
        // NEW CODE
        if (mydomain == "localhost")
        { strfileurl = "http://localhost:" + strDomainPort + "/" + strfileurl; }
        else
        { strfileurl = "http://" + mydomain + "/" + strfileurl; }
        return strfileurl;
    }

    // Nothing found
    return "Error in GetRelativePathOfFile method";
}
```
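The two methods above call a `Replace` helper that is not shown in this article. Because it takes `RegexOptions`, it presumably treats its second argument as a regular expression, in which case physical paths containing backslashes or dots need care. As an alternative illustration only, the sketch below converts a physical path to a site-relative URL with plain string operations. The class `PathToUrlSketch` and its methods are hypothetical and are not part of the download.

```csharp
using System;

public static class PathToUrlSketch
{
    // Hypothetical helper: converts a physical file path into a path
    // relative to the site root, using "/" separators. Returns null
    // when the file does not live under the given root.
    public static string ToRelativeUrl(string rootPath, string filePath)
    {
        // Normalise separators and drop any trailing slash on the root.
        string root = rootPath.Replace('\\', '/').TrimEnd('/');
        string file = filePath.Replace('\\', '/');

        // Windows paths are case-insensitive, so compare ignoring case.
        if (!file.StartsWith(root + "/", StringComparison.OrdinalIgnoreCase))
            return null;

        return file.Substring(root.Length + 1);
    }

    // Hypothetical helper: builds an absolute link from host, port,
    // and a relative URL produced by ToRelativeUrl.
    public static string ToAbsoluteUrl(string host, string port, string relativeUrl)
    {
        return "http://" + host + ":" + port + "/" + relativeUrl;
    }
}
```

In an ASP.NET page, `rootPath` would typically come from `Server.MapPath("/")`, and the host and port from the current request, as `initializeSiteSearching()` does.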

Code/User Interface Language

The language is English.

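Conclusion

This is my second and latest version. It runs on IIS with .NET 2.0 or later, is fully automated, and includes bug fixes. If you run into any problem, please contact me at aroratushar@gmail.com.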

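Points of Interest

I am fascinated by keyword and phrase searching, so I tried to build a script-based search engine. I love .NET and enjoy programming in Visual Studio, Visual Basic, and .NET. I am glad to have created my own search engine, and I find this script very valuable for people who need one.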

History

August 17, 2010: New update and bug fixes
August 27, 2009: First release