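Integrating the Google Search REST API with .NET
This article and its accompanying code show how to integrate the Google Search REST API into a .NET project.
Introduction
This article shows how to use managed code (C#) to integrate the Google Search REST API into a .NET project. The REST API requires no license key and places no limit on the number of queries.
It also covers deserializing the JSON response into .NET objects, along with workarounds for some of the API's limitations and quirks.
You can query the REST API by typing the URL and search term into a browser's address bar. For example, the following call...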
https://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=Earth%20Day
returns...
{"responseData":
{"results":[{"GsearchResultClass":"GwebSearch",
"unescapedUrl":"http://www.earthday.net/",
"url":"http://www.earthday.net/",
"visibleUrl":"www.earthday.net",
"cacheUrl":"http://www.google.com/search?
q\u003dcache:szJHSzSCm38J:www.earthday.net",
"title":"\u003cb\u003eEarth Day\u003c/b\u003e Network",
"titleNoFormatting":"Earth Day Network",
"content":"Get information on \u003cb\u003eEarth
Day\u003c/b\u003e events, activities and actions, and learn how you
can join the \u003cb\u003eEarth Day\u003c/b\u003e Network
and Green Generation to make a difference
in \u003cb\u003e...\u003c/b\u003e"},
{"GsearchResultClass":"GwebSearch",
"unescapedUrl":"http://en.wikipedia.org/wiki/Earth_Day",
"url":"http://en.wikipedia.org/wiki/Earth_Day",
"visibleUrl":"en.wikipedia.org",
"cacheUrl":"http://www.google.com/search
?q\u003dcache:57bkwmGRNFkJ:en.wikipedia.org",
"title":"\u003cb\u003eEarth Day\u003c/b\u003e - Wikipedia,
the free encyclopedia","titleNoFormatting":"Earth
Day - Wikipedia, the free encyclopedia",
"content":"\u003cb\u003eEarth Day\u003c/b\u003e
is celebrated in the US on April 22
and is a day designed to inspire
awareness and appreciation for
the Earth\u0026#39;s environment.
\u003cb\u003e...\u003c/b\u003e"},
{"GsearchResultClass":"GwebSearch",
"unescapedUrl":"http://www.epa.gov/earthday/",
"url":"http://www.epa.gov/earthday/",
"visibleUrl":"www.epa.gov",
"cacheUrl":"http://www.google.com/search?
q\u003dcache:3IS9GTb-r0IJ:www.epa.gov",
"title":"\u003cb\u003eEarth Day\u003c/b\u003e | US EPA",
"titleNoFormatting":"Earth Day | US EPA",
"content":"Includes history of the celebration,
listing of EPA-sponsored events, and other resources."},
{"GsearchResultClass":"GwebSearch",
"unescapedUrl":"http://holidays.kaboose.com/earth-day/",
"url":"http://holidays.kaboose.com/earth-day/",
"visibleUrl":"holidays.kaboose.com",
"cacheUrl":"http://www.google.com/search?
q\u003dcache:v3MWPNh2yBkJ:holidays.kaboose.com",
"title":"\u003cb\u003eEarth Day\u003c/b\u003e 2009:
Crafts, Environmental Games, and
Recycling \u003cb\u003e...\u003c/b\u003e",
"titleNoFormatting":"Earth Day 2009:
Crafts, Environmental Games, and Recycling ...",
"content":"Jun 1, 2009 \u003cb\u003e...\u003c/b\u003e
\u003cb\u003eEarth Day\u003c/b\u003e is a special day to learn
about to take care of the planet.
Kids can learn with fun \u003cb\u003eEarth
Day\u003c/b\u003e activities, environmental projects,
\u003cb\u003e...\u003c/b\u003e"}],
"cursor":{"pages":[
{"start":"0","label":1},
{"start":"4","label":2},
{"start":"8","label":3},
{"start":"12","label":4},
{"start":"16","label":5},
{"start":"20","label":6},
{"start":"24","label":7},
{"start":"28","label":8}],
"estimatedResultCount":"40300000",
"currentPageIndex":0,
"moreResultsUrl":"http://www.google.com/search?
oe\u003dutf8\u0026ie\u003dutf8\u0026source\u003duds\u0026start\
u003d0\u0026hl\u003den\u0026q\u003dEarth+Day"}},
"responseDetails": null, "responseStatus": 200}
This article shows how to make this kind of call and use the response in a .NET project.
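As a minimal sketch of how such a request URL might be assembled in code, the helper below percent-encodes the search term before appending it to the query string. The names SearchUrlSketch and BuildSearchUrl are hypothetical and are not part of the article's download; the endpoint is passed in as a parameter rather than assumed, and Uri.EscapeDataString is used here because it encodes a space as %20, as in the example call.

```csharp
using System;

public static class SearchUrlSketch
{
    // Hypothetical helper (not from the article's code): builds the query
    // URL for one request. 'endpoint' is the service URL, 'start' is the
    // zero-based index of the first result wanted.
    public static string BuildSearchUrl(string endpoint, string searchTerm, int start)
    {
        if (endpoint == null) throw new ArgumentNullException(nameof(endpoint));
        if (searchTerm == null) throw new ArgumentNullException(nameof(searchTerm));

        // EscapeDataString percent-encodes spaces as %20 and escapes
        // reserved characters such as '&', '=' and ':'.
        string encodedTerm = Uri.EscapeDataString(searchTerm.Trim());

        return endpoint + "?v=1.0&start=" + start + "&q=" + encodedTerm;
    }
}
```

Calling BuildSearchUrl with the endpoint from the example call, the term "Earth Day" and a start of 0 yields a URL whose query string ends in q=Earth%20Day.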
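One of the requirements in Google's terms of use is that search results must be attributed to Google. The web control in this project includes JavaScript that displays the "Powered by Google" logo on the results page.
This article focuses on standard web search results, although the API can also return images and other types of search result. The code would need to be modified to handle those other types.
For more information on interacting with the Google Search REST API, see: http://googlesystem.blogspot.com/2008/04/google-search-rest-api.html.
Background
The REST API does not have to be used on the server side; the results can also be parsed on the client using JavaScript. However, we needed to use the Google Search API to generate and log information about the search results returned, so we came up with a server-side solution. Another advantage is that we were able to build a reusable .NET web control that can be dropped into any web project with very little configuration.
One advantage of the Google Search REST API is that no API key is required and there is no limit on the number of queries, which makes the code and controls here easy to reuse in new applications.
One disadvantage of the REST API is that each query returns at most 8 results, and Google's documentation indicates that only the first 32 results for any search term can be accessed. The code here partly works around this by issuing multiple queries and grouping the results when more are needed, so developers do not have to deal with the problem when implementing search functionality.
Google Search is a powerful tool and server-side integration has many practical applications, so we decided to contribute our code to help other .NET developers who want to make use of the REST API. The code accompanying this article uses the API to create a simple site search box that redirects the user to a results page, although the solution's modular structure is intended to allow the code to be used in other applications.
Screenshot
The integration can be seen on a live site at: Cognize.co.uk.
Points of Interest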
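The Google Search REST API returns a JSON response. One of the more interesting pieces of code in this article is the deserialization of that JSON response into .NET objects. This requires building objects that represent the API's JSON response; these were built after testing and observing the response data. The code uses WebClient to make the call, because the Google Search REST API requires valid HTTP headers and WebClient is the easiest way to supply them.
The code then uses DataContractJsonSerializer to deserialize the response data into .NET objects.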
// Requires: using System; using System.IO; using System.Net;
//           using System.Runtime.Serialization.Json; using System.Web;

/// <summary>
/// Gets one chunk of Google search results, optionally limited to a particular site.
/// </summary>
/// <param name="siteName">The fully qualified
/// site root path for the site that results are to be limited to,
/// e.g. http://www.cognize.co.uk. Pass null or empty to search the whole web.</param>
/// <param name="searchString">The raw
/// (non-encoded) search string.</param>
/// <param name="resultCountStartPos">The zero-based index of the
/// first result to return.</param>
/// <returns>The response data, including this chunk of search results.</returns>
public static ResponseData GetSearchResultsChunk( string siteName,
    string searchString, int resultCountStartPos )
{
    ResponseData responseData = null;
    try
    {
        using (WebClient client = new WebClient())
        {
            // Set the request headers. The Google REST API requires a valid
            // request header, hence the use of WebClient rather than WebRequest.
            client.Headers.Add( "user-agent",
                "Mozilla/4.0 (compatible; MSIE 6.0; " +
                "Windows NT 5.2; .NET CLR 1.0.3705;)" );

            string siteSearchString = String.Empty;
            if (!String.IsNullOrEmpty( siteName ))
            {
                // The operator and its value must contain no spaces
                siteSearchString = "site:" + siteName + " ";
            }

            // Encode the whole query (site operator plus search terms) so that
            // the separating space and any reserved characters are escaped.
            string query = HttpUtility.UrlEncode(
                siteSearchString + searchString.Trim() );

            // Result size is rsz in the query string. 'large' makes the API
            // return results in sets of 8, rather than 4 for 'small',
            // so fewer API requests are required in total.
            string resultSize = "large";
            string searchRequestURL = "https://ajax.googleapis.com/" +
                "ajax/services/search/web?v=1.0&start=" +
                resultCountStartPos.ToString() + "&rsz=" +
                resultSize + "&q=" + query;

            DataContractJsonSerializer jsonSerializer =
                new DataContractJsonSerializer( typeof( GoogleAPISearchResults ) );

            // Read our search results into a .NET object, disposing of the
            // response stream once it has been read
            using (Stream responseStream = client.OpenRead( searchRequestURL ))
            {
                GoogleAPISearchResults searchResultsObj =
                    (GoogleAPISearchResults)jsonSerializer.ReadObject( responseStream );
                responseData = searchResultsObj.responseData;
            }
        }
    }
    catch (Exception)
    {
        // Log error here
        // Rethrow with "throw;" so the original stack trace is preserved
        throw;
    }
    // Return response data including search results
    return responseData;
}
The objects that represent the JSON response need the [DataContract] attribute at class level and the [DataMember] attribute on each property.
using System;
using System.Runtime.Serialization;

namespace Cognize.GoogleSearchAPIIntegration
{
    [DataContract]
    public class Pages
    {
        [DataMember]
        public int start
        {
            get;
            set;
        }

        [DataMember]
        public int label
        {
            get;
            set;
        }
    }
}
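To illustrate how the attributes and the serializer fit together, here is a minimal, self-contained sketch that deserializes a JSON string rather than a live response. The Sketch* classes are assumptions inferred from a few fields of the example response shown earlier; they are not the article's actual GoogleAPISearchResults, ResponseData or Results classes, which are not reproduced in the text. Property names match the JSON member names, and JSON members without a matching property are simply ignored.

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

// Assumed shapes, inferred from the example JSON; not the article's classes.
[DataContract]
public class SketchResult
{
    [DataMember] public string url { get; set; }
    [DataMember] public string titleNoFormatting { get; set; }
}

[DataContract]
public class SketchResponseData
{
    [DataMember] public SketchResult[] results { get; set; }
}

[DataContract]
public class SketchSearchResponse
{
    [DataMember] public SketchResponseData responseData { get; set; }
    [DataMember] public int responseStatus { get; set; }
}

public static class JsonSketch
{
    // Deserializes a JSON string into the sketch object graph.
    public static SketchSearchResponse Parse(string json)
    {
        var serializer = new DataContractJsonSerializer(typeof(SketchSearchResponse));
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
        {
            return (SketchSearchResponse)serializer.ReadObject(stream);
        }
    }
}
```

Passing a trimmed-down copy of the example response to JsonSketch.Parse gives an object whose responseData.results array can be walked like any other .NET array.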
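The management of multiple queries and the grouping of their results (necessary because the API returns only 8 results per call) is handled by the following code. If more results are needed, additional calls are made with a different start index; for example, if 10 results are requested, the second call uses 8 as the result start parameter (chunkResultCountStart).
One odd behavior of the Google REST API is that if a search term returns only a few results, say 1, a second call to fetch the second batch returns the same search results again, even though the start parameter is set to 8 (the start of the second set of 8). To work around this, when the first query's results are read, the response data is inspected for the estimated result count (estimatedResultCount). If that value is lower than the maximum number of search results requested, maxChunksRequired is modified inside the for loop (I know this is not best practice, but it is necessary) to prevent superfluous API calls and duplicate results.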
// Requires: using System.Collections.Generic;

/// <summary>
/// Get Google search results from a specific domain.
/// Google documentation suggests that the maximum number
/// of results that can be requested without
/// throwing an exception is 32. In testing, using large chunks,
/// up to 64 results have been achieved. Any number above 64 is amended to 64.
/// </summary>
/// <param name="siteName">The fully qualified site name URL.</param>
/// <param name="searchString">The raw search string.</param>
/// <param name="requestedResults">The number of results requested.</param>
/// <returns>The search results, keyed by their position in the result set.</returns>
public static SortedList<int, Results>
    GetSearchResults( string siteName,
    string searchString, int requestedResults )
{
    SortedList<int, Results> searchResultsLst =
        new SortedList<int, Results>();

    // API will error if we request more than 64 results,
    // so modify if over that figure
    if (requestedResults > 64)
    {
        requestedResults = 64;
    }

    int maxChunksRequired = CalculateChunksRequired( requestedResults );
    for (int chunkIndx = 0; chunkIndx < maxChunksRequired; chunkIndx++)
    {
        // Results are returned in sets of 8,
        // so we request results from the start of
        // the next group of 8.
        int chunkResultCountStart = (chunkIndx * 8);

        // Return the response data including chunk of search results
        ResponseData responseData = GetSearchResultsChunk( siteName,
            searchString, chunkResultCountStart );

        // For some search terms, the maximum number of requested
        // results will be higher than the actual number of results
        // Google has. This is determined by the estimatedResultCount
        // value returned by the API. In this case we need to reduce
        // the number of API calls, since superfluous calls would
        // duplicate results: Google repeats the same results
        // regardless of the value passed in chunkResultCountStart.
        if (responseData != null)
        {
            if (chunkIndx == 0)
            {
                int realChunksRequired = CalculateChunksRequired(
                    responseData.cursor.estimatedResultCount );
                if (maxChunksRequired > realChunksRequired)
                {
                    maxChunksRequired = realChunksRequired;
                }
            }

            // Put the results in a more manageable sorted list
            for (int resultIndx = 0; resultIndx <
                responseData.results.Length; resultIndx++)
            {
                searchResultsLst.Add( searchResultsLst.Count,
                    responseData.results[resultIndx] );
            }
        }
    }
    return searchResultsLst;
}
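The CalculateChunksRequired helper is called above but not shown in the article. A plausible sketch, assuming it is simply a ceiling division by the chunk size of 8, is given below together with a hypothetical ChunksToFetch method that combines the 64-result cap with the estimated result count. Both names and bodies in ChunkSketch are assumptions made for illustration, not the article's implementation.

```csharp
using System;

public static class ChunkSketch
{
    // The API returns results in sets of 8 when rsz=large is used.
    public const int ChunkSize = 8;

    // Hypothetical stand-in for CalculateChunksRequired: the number of
    // chunks of 8 needed to cover resultCount results (ceiling division).
    public static int CalculateChunksRequired(int resultCount)
    {
        if (resultCount <= 0)
        {
            return 0;
        }
        return resultCount / ChunkSize + (resultCount % ChunkSize == 0 ? 0 : 1);
    }

    // Hypothetical helper: caps the request at 64 results and at the
    // API's estimated result count, then converts that to a chunk count.
    public static int ChunksToFetch(int requestedResults, long estimatedResultCount)
    {
        int requested = Math.Min(requestedResults, 64);
        long capped = Math.Min((long)requested, estimatedResultCount);
        return CalculateChunksRequired((int)capped);
    }
}
```

Under these assumptions, a request for 10 results against a large estimate needs 2 chunks (start indexes 0 and 8), while the same request against an estimate of 1 needs only 1, which avoids the duplicated second batch described above.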
Using the Code
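To use the code, you can integrate it directly into your web application using the included web control.
If you want to use the code for other purposes, the main method to call is: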
public static SortedList<int, Results>
GetSearchResults( string siteName,
string searchString, int requestedResults )
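As its name suggests, this retrieves the results for a given search term, up to the number of results requested (the maximum in testing was 64).
There is also an overload that does not restrict the results to a particular site; it can be used in a similar way.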
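The overload itself is not reproduced in the article. One way it could be written, sketched here under the assumption that it simply delegates with no site name, is shown below. The Results class and the three-argument method in this block are stand-ins so that the sketch compiles on its own; in the real project they would be the article's own types and method.

```csharp
using System.Collections.Generic;

namespace OverloadSketch
{
    // Stand-in for the article's Results class (assumption, for compilation only).
    public class Results
    {
        public string url { get; set; }
    }

    public static class SearchFacade
    {
        // Stand-in for the article's three-argument method; the real one
        // calls the API. It records the site name so the delegation is visible.
        public static string LastSiteName = "unset";

        public static SortedList<int, Results> GetSearchResults(
            string siteName, string searchString, int requestedResults)
        {
            LastSiteName = siteName;
            return new SortedList<int, Results>();
        }

        // Hypothetical overload: searches the whole web by passing a null
        // site name, which the chunk method treats as "no site restriction".
        public static SortedList<int, Results> GetSearchResults(
            string searchString, int requestedResults)
        {
            return GetSearchResults(null, searchString, requestedResults);
        }
    }
}
```

Because GetSearchResultsChunk only adds the site: operator when siteName is neither null nor empty, delegating with null is enough to search without a site restriction.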
History
- 30 December 2009: Initial release.