A Very Vain "How Are My Articles Doing" Web Spider

22 Dec 2006

CPOL

A simple web spider that scrapes CodeProject articles.

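Introduction

This article was written purely out of curiosity. A friend of mine built a bar-crawl website (barcrawl) that uses a small web spider to scrape a mapping site and extract map coordinates. I had never done anything like that before, so I decided to give it a try.

But first, I needed a site that I wanted to pull some data from. As I am completely obsessed with The Code Project, I chose that. The next question was: what data did I want to grab? Well, I am a pretty vain fellow (I think we all are at times), so I thought I would write a web spider that grabs my article summary area, similar to the area shown below, which you can find at a URL like this: https://codeproject.org.cn/script/articles/list_articles.asp?userid=userId

So that is what this article sets out to grab. Along the way, I think it shows how to scrape the useful information out of a page that may contain a great deal of other data.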

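How It Is All Done

The VainWebSpider application involves several steps, as follows:

  1. Get the user ID, so that the full URL for fetching the articles can be formed.
  2. Store this user ID in the registry, so that the VainWebSpider application knows which user's data to fetch the next time it runs.
  3. Fetch the entire web page for the full URL (for example, https://codeproject.org.cn/script/articles/list_articles.asp?userid=569009).
  4. Store the web page in a suitable object; a string will do.
  5. Split the string to keep only the area of interest, which greatly reduces the size of the string held in memory. We are only interested in the article summary details; we don't care about anything else.
  6. Use regular expressions to grab the required data.

These six steps are the basis of this article.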
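Step 5 can be sketched in isolation as follows. This is a minimal illustration rather than the application's own code: the class and method names (SummaryTrimmer, TrimToSummary) are hypothetical, and the only thing taken from the article is the `<a name='#Author0'>` marker that the listing further down uses to find the start of the summary area.

```csharp
using System;

public static class SummaryTrimmer
{
    // Marker that, according to the article's listing, precedes the summary area.
    private const string AuthorMarker = "<a name='#Author0'>";

    // Returns the page from the marker onwards, or an empty string
    // when the page is empty or the marker is not present.
    public static string TrimToSummary(string page)
    {
        if (string.IsNullOrEmpty(page))
        {
            return string.Empty;
        }

        int idx = page.IndexOf(AuthorMarker, StringComparison.Ordinal);

        // IndexOf returns -1 when nothing is found; 0 is a valid hit
        // at the very start of the string, so the test is ">= 0".
        return idx >= 0 ? page.Substring(idx) : string.Empty;
    }
}
```

Calling `SummaryTrimmer.TrimToSummary(page)` on the downloaded page leaves a much shorter string for the regular expressions in step 6 to work on.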

Step 6 is really the most interesting one, as it lets us pull the data we want out of the surrounding junk. Let's look at an example, shall we?

Using Regular Expressions to Grab the Data

<hr size=1 noshade><a name='#Author0'></a><h2>Articles 
  by Sacha Barber (10 articles found)</h2><h4>Average article rating: 
  4.5</h4><h3>C# Algorithms</h3>

  <p><a href='https://codeproject.org.cn/cs/algorithms/#Evolutional'>Evolutional</a></p>

  <div class=smallText style='width:600;margin-left:40px;margin-top:10px'>

  <a href='https://codeproject.org.cn/cs/algorithms/Genetic_Algorithm.asp'>
  <b>AI - Simple Genetic 
  Algorithm (GA) to solve a card problem</b></a><div style='font-size:8pt; 
  color:#666666'>Last Updated: 8 Nov 2006  Page views: 7,164  Rating: 
  4.7/5  Votes: 17  Popularity: 5.8</div>

  <div style='margin-top:3;font-size:8pt;'>A simple Genetic Algorithm (GA) 
  to solve a card problem.</div>
</div>

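The above shows the actual web content for one article. Suppose we only want to grab the number of views for each article. How can that be done? It is quite easy really. We simply create a well-formed regular expression, such as: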

private List<long> getViews()
{
    string pattern = "Page views: [0-9,]*";
    MatchCollection matches = Regex.Matches(this.webContent, 
        pattern, RegexOptions.ExplicitCapture);
    List<long> lViews = new List<long>();
    foreach (Match m in matches)
    {
        int idx = m.Value.LastIndexOf(":") + 2;
        lViews.Add(long.Parse(m.Value.Substring(idx).Replace(",", "")));
    }
    return lViews;
}

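This little piece of code is enough to match every "Page views: XXXX" entry in the web content. Each match in the matches object holds the whole matched text (for example, "Page views: 7,164" in the sample above); the loop then strips the label and the thousands separators and stores the number itself, 7164. From here it is easy; we simply repeat this for every part of the web content we are interested in.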
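As a variation on the same idea, the number can be pulled out with a named capture group instead of LastIndexOf and Substring. The sketch below is only an alternative illustration; the names ViewCountSketch and ExtractViews are hypothetical, and the pattern assumes the label and the number are separated by ordinary whitespace, as in the sample above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class ViewCountSketch
{
    // Returns one view count per "Page views: N" entry found in the text.
    public static List<long> ExtractViews(string html)
    {
        List<long> views = new List<long>();

        // The named group "views" captures only the digits and commas,
        // so no index arithmetic is needed afterwards.
        foreach (Match m in Regex.Matches(html, @"Page views:\s*(?<views>[0-9][0-9,]*)"))
        {
            string digits = m.Groups["views"].Value.Replace(",", "");
            views.Add(long.Parse(digits, CultureInfo.InvariantCulture));
        }

        return views;
    }
}
```

Note that RegexOptions.ExplicitCapture, which the article's code passes, still allows named groups like this one; it only stops unnamed parentheses from capturing.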

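In the end, the WebScreenScraper class holds regular expressions for grabbing the following details:

  • Article views
  • Article votes
  • Article popularity
  • Article rating
  • Article URL

All of this happens in the WebScreenScraper class. Once we have the results, they are simply made available as a standard ADO.NET DataTable, so that the main interface (frmMain) can display them nicely.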
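One consequence of running a separate regular expression per field is that the resulting lists have to line up index by index when they are combined into articles. A possible alternative, sketched here under assumptions, is a single pattern with named groups that reads the whole statistics line in one pass. The type and member names (ArticleStats, StatsLineParser) are hypothetical, and the pattern assumes the fields appear in the order shown in the sample HTML and are separated by plain whitespace.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical holder for the numbers found on one statistics line.
public sealed class ArticleStats
{
    public long Views { get; set; }
    public string Rating { get; set; }
    public int Votes { get; set; }
    public float Popularity { get; set; }
}

public static class StatsLineParser
{
    // One pattern for the whole line; \s also matches the line breaks
    // that appear inside the sample HTML.
    private static readonly Regex StatsLine = new Regex(
        @"Page views:\s*(?<views>[0-9][0-9,]*)\s+" +
        @"Rating:\s*(?<rating>[0-9./]+)\s+" +
        @"Votes:\s*(?<votes>[0-9]+)\s+" +
        @"Popularity:\s*(?<pop>[0-9]+(?:\.[0-9]+)?)");

    public static List<ArticleStats> Parse(string html)
    {
        List<ArticleStats> result = new List<ArticleStats>();
        foreach (Match m in StatsLine.Matches(html))
        {
            ArticleStats stats = new ArticleStats();
            stats.Views = long.Parse(
                m.Groups["views"].Value.Replace(",", ""), CultureInfo.InvariantCulture);
            stats.Rating = m.Groups["rating"].Value;
            stats.Votes = int.Parse(m.Groups["votes"].Value, CultureInfo.InvariantCulture);
            // Invariant culture keeps "5.8" parsing the same way on every machine.
            stats.Popularity = float.Parse(m.Groups["pop"].Value, CultureInfo.InvariantCulture);
            result.Add(stats);
        }
        return result;
    }
}
```

Because every field of an article comes from the same match, a line with a missing field is skipped as a whole rather than shifting the values of later articles.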

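Class Diagram

The class diagram for the VainWebSpider classes is as follows:

Code Listings

The code that does all this is basically as follows:

Program Class

This class holds the various pop-up boxes and common helper functions, as well as the Main method.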

using System;
using System.Collections.Generic;
using System.Windows.Forms;
using Microsoft.Win32;

namespace VainWebSpider
{
    #region Program CLASS
    /// <summary>
    /// provides the main access point into the application. Also
    /// provides several generic helper methods, such as InputBox(..),
    /// ErrorBox(..),InfoBox(..) and also provides read/write functions
    /// to store the current UserID within the registry
    /// </summary>
    public static class Program
    {
        #region Instance fields
        //instance fields
        private static long userId;

        #endregion
        #region Public Methods/Properties

        /// <summary>
        /// gets or sets the UserID which will be used to retrieve codeproject
        /// articles for. When a new UserID is set, the new value is also written
        /// to the windows registry, using the writeToRegistry(..) method. This 
        /// ensures the next time the VainWebSpider application is run, the last
        /// selected UserID will be used. 
        /// </summary>
        public static long UserID
        {
            get { return Program.userId; }
            set
            {
                Program.userId = value;
                Program.writeToRegistry(value);
            }
        }

        /// <summary>
        /// Creates a new "VainWebSpider" subkey (if none exists) under 
        /// the HKEY_LOCAL_MACHINE\SOFTWARE registry key. It also creates
        /// a new value within the newly created VainWebSpider subkey, for
        /// the userId input parameter. This is done so that the VainWebSpider
        /// application can know which user it was looking at last time
        /// </summary>
        /// <param name="userId">
        ///     The userId to store within the registry</param>
        public static void writeToRegistry(long userId)
        {
            try
            {
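                // Writing under HKEY_LOCAL_MACHINE normally needs administrative
                // rights. The using blocks close the registry handles promptly.
                using (RegistryKey hkSoftware =
                            Registry.LocalMachine.OpenSubKey("Software", true))
                using (RegistryKey hkVainWebSpider =
                            hkSoftware.CreateSubKey("VainWebSpider"))
                {
                    hkVainWebSpider.SetValue("userId", userId);
                }
            }
            catch (Exception)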
            {
                Program.ErrorBox(
                    "There was a problem creating " + 
                    "the Registry key for VainWebSpider");
            }
        }

        /// <summary>
        /// Returns the userId value within the
        /// HKEY_LOCAL_MACHINE\SOFTWARE\VainWebSpider registry key
        /// </summary>
        /// <returns>The value of the userId value within the
        /// HKEY_LOCAL_MACHINE\SOFTWARE\VainWebSpider registry key, 
        /// if it exists, else returns -1</returns>
        public static long readFromRegistry()
        {
            try
            {
                RegistryKey hklm = Registry.LocalMachine;
                RegistryKey hkSoftware = hklm.OpenSubKey("Software");
                RegistryKey hkVainWebSpider = 
                            hkSoftware.OpenSubKey("VainWebSpider");
                return long.Parse(hkVainWebSpider.GetValue("userId").ToString());
            }
            catch (Exception)
            {
                return -1;
            }
        }

        /// <summary>
        /// InputBox, returns user input string
        /// </summary>
        /// <param name="prompt">the prompt</param>
        /// <param name="title">the form title</param>
        /// <param name="defaultValue">the default value to use</param>
        /// <returns>the string the user entered</returns>
        public static string InputBox(string prompt,
          string title, string defaultValue)
        {
            InputBoxDialog ib = new InputBoxDialog();
            ib.FormPrompt = prompt;
            ib.FormCaption = title;
            ib.DefaultValue = defaultValue;
            ib.ShowDialog();
            string s = ib.InputResponse;
            ib.Close();
            return s;
        } 

        /// <summary>
        /// Shows an error message within a MessageBox
        /// </summary>
        /// <param name="error">the error message</param>
        public static void ErrorBox(string error)
        {
            MessageBox.Show(error,"Error", 
               MessageBoxButtons.OK,MessageBoxIcon.Error);
        }

        /// <summary>
        /// Shows an information message within a MessageBox
        /// </summary>
        /// <param name="info">the information message</param>
        public static void InfoBox(string info)
        {
            MessageBox.Show(info, "Information",
                MessageBoxButtons.OK, MessageBoxIcon.Information);
        }

        /// <summary>
        /// Shows a Yes/No query within a MessageBox
        /// </summary>
        /// <param name="query">the query message</param>
        /// <returns>DialogResult,
        /// which is the result of the Confirmation query</returns>
        public static DialogResult YesNoBox(string query)
        {
            return MessageBox.Show(query,
                "Confirmation", MessageBoxButtons.YesNo,
                MessageBoxIcon.Question);
        }
        #endregion
        #region MAIN THREAD
        /// <summary>
        /// The main entry point for the application.
        /// Expects 0 command line arguments
        /// </summary>
        [STAThread]
        static void Main()
        {
            Application.EnableVisualStyles();
            Application.SetCompatibleTextRenderingDefault(false);
            Application.Run(new frmLoader());

        }
        #endregion
    }
    #endregion
}

WebScreenScraper Class

This class is responsible for fetching and extracting the data from the relevant CodeProject web page.

using System;
using System.Collections.Generic;
using System.Text;
using System.Data;
using System.Text.RegularExpressions;
using System.IO;
using System.Net;
using System.Windows.Forms;


namespace VainWebSpider
{
    #region WebScreenScraper CLASS
    /// <summary>
    /// This class reads the entire contents of the article summary codeproject
    /// web page for the currently selected user. An example URL for such a codeproject
    /// page may be 
    /// https://codeproject.org.cn/script/articles/list_articles.asp?userid=569009
    /// which would fetch all articles for author 569009 that's Sacha Barber, which
    /// is Me.
    /// Data within this page is then extracted using regular expressions which are then
    /// used to create new <see cref="CPArticle">
    ///       CPArticle</see> objects. The values within
    /// these new CPArticle objects are then used to create 
    /// a <see cref="DataTable">DataTable
    /// </see> which is used by the
    ///       <see cref="frmMain">main interface </see>
    /// </summary>
    public class WebScreenScraper
    {
        #region Instance Fields
        // Fields
        private List<CPArticle> cpArticlesForUser;
        private bool hasArticles;
        private string authorName;
        private long userId;
        public string webContent;
        public event EventHandler EndParse;
        public event EventHandler StartParse;
        #endregion
        #region Constructor
        /// <summary>
        /// Constructs a new WebScreenScraper using the parameters provided
        /// </summary>
        /// <param name="userId">The codeproject
        ///             user to fetch articles for</param>
        public WebScreenScraper(long userId)
        {
            this.hasArticles = true;
            this.cpArticlesForUser = new List<CPArticle>();
            this.userId = userId;
        }
        #endregion
        #region Public Properties / Methods
        /// <summary>
        /// Raises the start event, then calls the following methods
        /// readInSiteContents(..) and getArticleSummaryArea(..)
        /// </summary>
        public void getInitialData()
        {
            this.OnStartParse(this, new EventArgs());
            this.readInSiteContents();
            this.getArticleSummaryArea();
        }

        /// <summary>
        /// Returns a <see cref="DataTable">DataTable</see>
        /// of all the articles found (if any) for the current
        /// codeproject user
        /// </summary>
        /// <returns>A <see cref="DataTable">DataTable</see>
        /// which holds all the articles found for the current
        /// codeproject user</returns>
        public DataTable getWebData()
        {
            //screen scrape the web page, to gather the
            //data that we are interested in
            List<long> lViews = this.getViews();
            List<string> lRatings = this.getRatings();
            List<int> lVotes = this.getVotes();
            List<float> lPopularity = this.getPopularity();
            List<string> lURLS = this.getArticleURLS();
            //create new CPArticles using the extracted data
            for (int i = 0; i < lViews.Count; i++)
            {
                this.cpArticlesForUser.Add(new CPArticle(
                                                    lViews[i],
                                                    lRatings[i],
                                                    lVotes[i],
                                                    lPopularity[i],
                                                    lURLS[i]));
            }
            //raise the finished event, to alert the event subscribers
            //that we are now done
            this.OnEndParse(this, new EventArgs());
            //return the DataTable to the caller
            return this.createDataSet();
        }

        /// <summary>
        /// Returns true if the currently parsed web page has 
        /// codeproject articles. Some codeproject users don't
        /// have any articles published
        /// </summary>
        public bool HasArticles
        {
            get { return this.hasArticles; }
        }

        /// <summary>
        /// Gets the number of articles for the currently
        /// requested codeproject member
        /// </summary>
        public int NoOfArticles
        {
            get { return this.cpArticlesForUser.Count; }
        }

        /// <summary>
        /// Gets the name for the currently requested 
        /// codeproject member
        /// </summary>
        public string AuthorName
        {
            get { return this.authorName; }
        }
        #endregion
        #region Events
        /// <summary>
        /// Raised when the parsing of the requested codeproject page is completed
        /// </summary>
        /// <param name="sender"><see
        ///    cref="WebScreenScraper">
        ///         WebScreenScraper</see></param>
        /// <param name="e"><see
        ///     cref="EventArgs">EventArgs</see></param>
        public void OnEndParse(object sender, EventArgs e)
        {
            if (this.EndParse != null)
            {
                this.EndParse(this, e);
            }
        }

        /// <summary>
        /// Raised at the start of parsing of the requested codeproject page
        /// </summary>
        /// <param name="sender"><see
        ///     cref="WebScreenScraper">WebScreenScraper</see></param>
        /// <param name="e"><see
        ///     cref="EventArgs">EventArgs</see></param>
        public void OnStartParse(object sender, EventArgs e)
        {
            if (this.StartParse != null)
            {
                this.StartParse(this, e);
            }
        }
        #endregion
        #region Private Methods
        /// <summary>
        /// Returns a <see cref="DataTable">DataTable</see>
        /// of all the articles found (if any) for the current
        /// codeproject user
        /// </summary>
        /// <returns>A <see cref="DataTable">DataTable</see>
        ///  which holds all the article details for the current
        ///  code project user </returns>
        private DataTable createDataSet()
        {
            //create a new DataTable and set up the column types
            DataTable dt = new DataTable("CPArticles");
            dt.Columns.Add("ArticleURL", Type.GetType("System.String"));
            dt.Columns.Add("Views", Type.GetType("System.Int64"));
            dt.Columns.Add("Ratings", Type.GetType("System.String"));
            dt.Columns.Add("Votes", Type.GetType("System.Int32"));
            dt.Columns.Add("Popularity", Type.GetType("System.Single"));
            //loop through all the previously fetched CPArticle(s) and
            //add the contents of each to the DataTable
            foreach (CPArticle cpa in this.cpArticlesForUser)
            {
                DataRow row = dt.NewRow();
                row["ArticleURL"] = cpa.ArticleURL;
                row["Views"] = cpa.Views;
                row["Ratings"] = cpa.Ratings;
                row["Votes"] = cpa.Votes;
                row["Popularity"] = cpa.Popularity;
                dt.Rows.Add(row);
            }
            return dt;
        }

        /// <summary>
        /// Trims the entire web content read from the codeproject page down to
        /// just the article summary area, which is a much smaller, more manageable
        /// string. This means that the webContent instance field now contains
        /// a string which has ALL the details we need, but none of the other stuff
        /// which is of no interest.
        /// </summary>
        private void getArticleSummaryArea()
        {
            //clear all the articles that may have been stored for
            //the previous run
            this.cpArticlesForUser.Clear();
            //check for no articles found
            if (this.webContent.Contains("(No articles found)"))
            {
                this.webContent = "";
                this.hasArticles = false;
                this.authorName = "";
            }
            else
            {
                //check for an author name, codeproject article summary page
                //always uses <a name='#Author0'> to denote author text
                int idx = this.webContent.IndexOf("<a name='#Author0'>", 0);
                //IndexOf returns -1 when the marker is missing; 0 is a valid position
                if (idx >= 0)
                {
                    this.webContent = this.webContent.Substring(idx);
                    this.hasArticles = true;
                    this.authorName = getAuthor();
                }
                //ERROR, no author, no articles, this is bad, must be a totally
                //unknown user at the codeproject web site
                else
                {
                    this.webContent = "";
                    this.hasArticles = false;
                    this.authorName = "";
                }
            }
        }

        /// <summary>
        /// returns a string, which represents the name of the author
        /// for all the articles for the current codeproject user
        /// This name is extracted by using the RegEx
        /// pattern "Articles by [a-z\sA-Z]*" on the 
        /// codeproject web page for the current user
        /// </summary>
        /// <returns>a string, which represent the authors name
        /// for all the articles for the current codeproject user</returns>
        private string getAuthor()
        {
            string pattern = @"Articles by [a-z\sA-Z]*";
            MatchCollection matches = Regex.Matches(this.webContent, 
                pattern, RegexOptions.ExplicitCapture);
            List<string> author = new List<string>();
            foreach (Match m in matches)
            {
                int idx = m.Value.LastIndexOf("by") + "by ".Length;
                author.Add(m.Value.Substring(idx));
            }
            //guard against a page on which the pattern found no author name
            return author.Count > 0 ? author[0].Trim() : "";
        }

        /// <summary>
        /// returns a list of strings, which represent the URLs
        /// for all the articles for the current codeproject user
        /// These URLs are extracted by using the RegEx
        /// pattern "<a href='([-a-zA-Z_/#0-9]*).asp'>" on the 
        /// codeproject web page for the current user
        /// </summary>
        /// <returns>a generic list of strings, which represent the URLs
        /// for all the articles for the current codeproject user</returns>
        private List<string> getArticleURLS()
        {
            string pattern = "<a href='([-a-zA-Z_/#0-9]*).asp'>";
            MatchCollection matches = Regex.Matches(this.webContent, 
                pattern, RegexOptions.ExplicitCapture);
            List<string> urls = new List<string>();
            foreach (Match m in matches)
            {
                urls.Add(m.Value.Replace("<a href='", "").Replace("'>", ""));
            }
            return urls;
        }

        /// <summary>
        /// returns a list of floats, which represent the Popularity
        /// for all the articles for the current codeproject user
        /// These Popularity are extracted by using the RegEx
        /// pattern "Popularity: [0-9.]*" on the codeproject web page 
        /// for the current user
        /// </summary>
        /// <returns>a generic list of floats, which represent the Popularity
        /// for all the articles for the current codeproject user</returns>
        private List<float> getPopularity()
        {
            string pattern = "Popularity: [0-9.]*";
            MatchCollection matches = Regex.Matches(this.webContent, 
                pattern, RegexOptions.ExplicitCapture);
            List<float> lPopularity = new List<float>();
            foreach (Match m in matches)
            {
                int idx = m.Value.LastIndexOf(":") + 2;
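                //parse with the invariant culture so that "5.8" is read the
                //same way whatever the regional settings of the machine
                lPopularity.Add(float.Parse(m.Value.Substring(idx),
                    System.Globalization.CultureInfo.InvariantCulture));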
            }
            return lPopularity;
        }

        /// <summary>
        /// returns a list of strings, which represent the Ratings
        /// for all the articles for the current codeproject user
        /// These Ratings are extracted by using the RegEx
        /// pattern "Rating: [0-9./]*" on the codeproject web page 
        /// for the current user
        /// </summary>
        /// <returns>a generic list of strings, which represent the Ratings
        /// for all the articles for the current codeproject user</returns>
        private List<string> getRatings()
        {
            string pattern = "Rating: [0-9./]*";
            MatchCollection matches = Regex.Matches(this.webContent, 
                pattern, RegexOptions.ExplicitCapture);
            List<string> lRatings = new List<string>();
            foreach (Match m in matches)
            {
                int idx = m.Value.LastIndexOf(":") + 2;
                lRatings.Add(m.Value.Substring(idx));
            }
            return lRatings;
        }

        /// <summary>
        /// returns a list of longs, which represent the views
        /// for all the articles for the current codeproject user
        /// These views are extracted by using the RegEx
        /// pattern "Page views: [0-9,]*" on the codeproject web page 
        /// for the current user
        /// </summary>
        /// <returns>a generic list of longs, which represent the views
        /// for all the articles for the current codeproject user</returns>
        private List<long> getViews()
        {
            string pattern = "Page views: [0-9,]*";
            MatchCollection matches = Regex.Matches(this.webContent, 
                pattern, RegexOptions.ExplicitCapture);
            List<long> lViews = new List<long>();
            foreach (Match m in matches)
            {
                int idx = m.Value.LastIndexOf(":") + 2;
                lViews.Add(long.Parse(m.Value.Substring(idx).Replace(",", "")));
            }
            return lViews;
        }

        /// <summary>
        /// returns a list of ints, which represent the votes
        /// for all the articles for the current codeproject user
        /// These votes are extracted by using the RegEx
        /// pattern "Votes: [0-9]*" on the codeproject web page 
        /// for the current user
        /// </summary>
        /// <returns>a generic list of ints, which represent the votes
        /// for all the articles for the current codeproject user</returns>
        private List<int> getVotes()
        {
            string pattern = "Votes: [0-9]*";
            MatchCollection collection1 = Regex.Matches(this.webContent, 
                pattern, RegexOptions.ExplicitCapture);
            List<int> lVotes = new List<int>();
            foreach (Match m in collection1)
            {
                int num1 = m.Value.LastIndexOf(":") + 2;
                lVotes.Add(int.Parse(m.Value.Substring(num1)));
            }
            return lVotes;
        }

        /// <summary>
        /// Reads the entire contents of the article summary codeproject web page
        /// for the currently selected user. An example URL for such a codeproject
        /// page may be 
        /// https://codeproject.org.cn/script/articles/list_articles.asp?userid=569009
        /// which would fetch all articles for author 569009 that's Sacha Barber, which
        /// is Me.
        /// </summary>
        private void readInSiteContents()
        {
            WebClient wc = null;
            Stream strm = null;

            try
            {
                //open the codeproject site, for the currently selected user
                //basically get the article summary for the currently selected user
                wc = new WebClient();
                strm = wc.OpenRead(
                    "https://codeproject.org.cn/script/articles/" +
                    "list_articles.asp?userid=" + this.userId);
                //read the contents into the webContent instance field
                using (StreamReader reader = new StreamReader(strm))
                {
                    string line;
                    StringBuilder sBuilder = new StringBuilder();
                    while ((line = reader.ReadLine()) != null)
                    {
                        sBuilder.AppendLine(line);
                    }
                    this.webContent = sBuilder.ToString();
                }
            }
            catch (Exception)
            {
                Program.ErrorBox(
                "Could not access web site https://codeproject.org.cn/script/" +
                "articles/list_articles.asp?userid=" + this.userId);
            }
            finally
            {
                //release the held resources if they need releasing
                if (wc != null) { wc.Dispose(); }
                if (strm != null) { strm.Close(); }
            }
        }
        #endregion
    }
    #endregion
}

CPArticle Class

This class represents a single CodeProject article.

using System;
using System.Collections.Generic;
using System.Text;

namespace VainWebSpider
{
    #region CPArticle CLASS
    /// <summary>
    /// Provides a single code project article summary object, which has
    /// the following properties : Votes, views, popularity, ratings and
    /// an article URL
    /// </summary>
    public class CPArticle
    {
        #region Instance fields
        // Fields
        private string articleURL;
        private float popularity;
        private string ratings;
        private long views;
        private int votes;
        #endregion
        #region Constructor
        /// <summary>
        /// Creates a new CPArticle object, assigning the constructor parameters
        /// to public properties
        /// </summary>
        /// <param name="views">The number of view for the article</param>
        /// <param name="ratings">The ratings for the article</param>
        /// <param name="votes">The number of votes for the article</param>
        /// <param name="popularity">The popularity for the article</param>
        /// <param name="articleURL">The article url</param>
        public CPArticle(long views, string ratings, int votes, 
                         float popularity, string articleURL)
        {
            this.views = views;
            this.ratings = ratings;
            this.votes = votes;
            this.popularity = popularity;
            this.articleURL = articleURL;
        }
        #endregion
        #region Public Properties
        /// <summary>
        /// Gets the Views for the current CPArticle
        /// </summary>
        public long Views
        {
            get { return this.views; }
        }

        /// <summary>
        /// Gets the Ratings for the current CPArticle
        /// </summary>
        public string Ratings
        {
            get { return this.ratings; }
        }

        /// <summary>
        /// Gets the Votes for the current CPArticle
        /// </summary>
        public int Votes
        {
            get { return this.votes; }
        }

        /// <summary>
        /// Gets the Popularity for the current CPArticle
        /// </summary>
        public float Popularity
        {
            get { return this.popularity; }
        }

        /// <summary>
        /// Gets the ArticleURL for the current CPArticle
        /// </summary>
        public string ArticleURL 
        {
            get { return this.articleURL; }
        }
        #endregion
    } 
    #endregion
}

frmLoader Class

This class is the initial form that is shown (for the full designer listing, see the attached application).

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using Microsoft.Win32;

namespace VainWebSpider
{
    #region frmLoader CLASS
    /// <summary>
    /// This form obtains the currently selected user from the registry
    /// (if there is a current user; it may be the 1st run, so there won't be)
    /// by using the <see cref="Program">Programs </see>readFromRegistry(..) 
    /// method. This class also allows the user to change the currently selected user
    /// via clicking a change user hyperlink. The user may also show the 
    /// <see cref="frmMain"> main interface</see> from this form using the hyperlink
    /// provided
    /// </summary>
    public partial class frmLoader : Form
    {
        #region Constructor
        /// <summary>
        /// Constructs a new frmLoader object
        /// </summary>
        public frmLoader()
        {
            InitializeComponent();
        }
        #endregion
        #region Private Methods
        /// <summary>
        /// Allows the user to specify a new UserId
        /// to fetch codeproject articles for by the
        /// use of a <see cref="InputBoxDialog">InputBoxDialog </see>
        /// The value entered must be a positive number
        /// </summary>
        /// <param name="sender">lnkChangeUser</param>
        /// <param name="e">LinkLabelLinkClickedEventArgs</param>
        private void lnkChangeUser_LinkClicked(object sender,
            LinkLabelLinkClickedEventArgs e)
        {
            //get the new userId
            string stringEntered = 
              Program.InputBox("Enter a new user ID to examine",
              "Enter a new user ID", "");
            //check for empty
            if (stringEntered.Equals(string.Empty)) 
            {
                Program.ErrorBox("You must enter a value for the userId");
            }
            else 
            {
                try 
                {
                    //make sure its a positive number, then update the Program
                    //held property
                    long userId = long.Parse(stringEntered);
                    if (userId > 0)
                    {
                        Program.UserID = userId;
                        lblCurrentUser.Text = 
                        "Currently set-up to fetch articles for user ID : " + 
                        Program.UserID.ToString();

                    }
                    else
                    {
                        Program.ErrorBox("User ID must be a positive value");
                    }
                }
                //it's not a number that was entered, tell them off
                catch (Exception)
                {
                    Program.ErrorBox("The value you entered was not valid\r\n" +
                                    "The user ID must be a number");
                }
            }
        }

        /// <summary>
        /// Check to see if there is already a user within the registry (from last time)
        /// to fetch codeproject articles for, by using the <see cref="Program">Programs
        ///  </see>readFromRegistry(..) method. And update this forms GUI accordingly
        /// </summary>
        /// <param name="sender">frmLoader</param>
        /// <param name="e">EventArgs</param>
        private void frmLoader_Load(object sender, EventArgs e)
        {
            //check if there is a user in the registry, if there is a user
            //update the Program class and the GUI label
            long userId = Program.readFromRegistry();
            Program.UserID = userId;
            if (userId != -1)
            {
                lblCurrentUser.Text = "Currently set-up to fetch " + 
                                      "articles for user ID : " + userId.ToString();
            }
            else
            {
                lblCurrentUser.Text = "Not setup for any user as yet, " + 
                                      "use the link to pick a new user";
            }
        }

        /// <summary>
        /// Create and show a new <see
        ///    cref="frmMain">frmMain</see> object, and hide this form
        /// </summary>
        /// <param name="sender">lnkLoadMainForm</param>
        /// <param name="e">LinkLabelLinkClickedEventArgs</param>
        private void lnkLoadMainForm_LinkClicked(object sender, 
                     LinkLabelLinkClickedEventArgs e)
        {
            frmMain fMain = new frmMain();
            this.Hide();
            fMain.ShowDialog(this);
        }
        #endregion
    }
    #endregion
}
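
The user ID check above relies on long.Parse inside a try/catch. As a minimal sketch of an alternative, the same check can be written with long.TryParse, which reports failure through its return value instead of an exception. The class and method names here (UserIdSketch, TryGetUserId) are hypothetical and are not part of the attached application.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the original VainWebSpider source.
public static class UserIdSketch
{
    // Returns true only when the text is made up of digits and is greater than zero.
    // NumberStyles.None rejects signs, whitespace and separators.
    public static bool TryGetUserId(string entered, out long userId)
    {
        bool parsed = long.TryParse(entered, NumberStyles.None,
            CultureInfo.InvariantCulture, out userId);
        return parsed && userId > 0;
    }
}
```

A caller could then show the error box when TryGetUserId returns false, and assign Program.UserID when it returns true.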

frmMain class

This class is the main interface (for the full Designer listing, see the attached application).

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using Microsoft.Win32;

namespace VainWebSpider
{
    #region frmMain CLASS
    /// <summary>
    /// The main form. Runs a 
    /// <see cref="WebScreenScraper">WebScreenScraper </see>
    /// on a BackgroundWorker and, if the currently selected codeproject 
    /// user has some articles, displays the article data in a DataGridView.
    /// </summary>
    public partial class frmMain : Form
    {
        #region Instance Fields
        //instance fields
        private Boolean formShown = true;
       
        #endregion
        #region Constructor
        /// <summary>
        /// Constructs a new frmMain object
        /// </summary>
        public frmMain()
        {
            InitializeComponent();
        }
        #endregion
        #region Private Methods

        /// <summary>
        /// User double clicked the system tray icon, so if the form
        /// is shown it is hidden, if its hidden its shown
        /// </summary>
        /// <param name="sender">The notify icon</param>
        /// <param name="e">The event arguments</param>
        private void nfIcon_DoubleClick(object sender, EventArgs e)
        {
            if (formShown)
            {
                this.Hide();
                formShown = false;
            }
            else
            {
                this.Show();
                formShown = true;
            }
        }

        /// <summary>
        /// Shows the form
        /// </summary>
        /// <param name="sender">The show menu</param>
        /// <param name="e">The event arguments</param>
        private void showFormToolStripMenuItem_Click(object sender, EventArgs e)
        {
            this.Show();
        }

        /// <summary>
        /// Hides the form
        /// </summary>
        /// <param name="sender">The hide menu</param>
        /// <param name="e">The event arguments</param>
        private void hideFormToolStripMenuItem_Click(object sender, EventArgs e)
        {
            this.Hide();
        }

        /// <summary>
        /// Exits the application if the user confirms they
        /// wish to quit.
        /// </summary>
        /// <param name="sender">The exit menu</param>
        /// <param name="e">The event arguments</param>
        private void exitToolStripMenuItem_Click(object sender, EventArgs e)
        {
            DialogResult dr = MessageBox.Show("Are you sure you want to quit?",
                "Exit",
                 MessageBoxButtons.YesNo, MessageBoxIcon.Question);
            if (dr.Equals(DialogResult.Yes))
            {
                Application.Exit();
            }
        }
        
        /// <summary>
        /// Creates a new <see cref="WebScreenScraper">WebScreenScraper </see>
        /// and subscribe to its StartParse/EndParse events. If the  WebScreenScraper
        /// data signifies that the currently selected codeproject user has some
        /// articles, get the article data out of the WebScreenScraper, and display
        /// the data in a DataGridView.
        /// </summary>
        /// <param name="sender">BackgroundWorker</param>
        /// <param name="e">DoWorkEventArgs</param>
        private void bgw_DoWork(object sender, DoWorkEventArgs e)
        {
            //create a new WebScreenScraper and subscribe to its events
            WebScreenScraper wss = new WebScreenScraper(Program.UserID);
            wss.StartParse += new EventHandler(wss_StartParse);
            wss.EndParse += new EventHandler(wss_EndParse);
            //get the initial article summary area only, discard the other 
            //text that doesnt hold any text we need to parse
            wss.getInitialData();

            //need to test for an invoke initially, as the BackgroundWorker
            //that is run to do the web site parsing is on a different thread
            //to that of this forms controls, so the call will need to be
            //marshalled to the correct thread, in order to change properties
            if (this.InvokeRequired)
            {
                this.Invoke(new EventHandler(delegate
                {
                    //are there any articles for the current user
                    if (wss.HasArticles)
                    {
                        //only worry about getting the rest if the
                        //author has articles
                        DataTable dt = wss.getWebData();
                        lblCurrentUser.Text = wss.AuthorName + " " +
                            wss.NoOfArticles + " articles available";
                        
                        //check there is at least 1 article, before showing the 
                        //article DataGridView
                        if (dt.Rows.Count > 0)
                        {
                            dgArticles.Columns.Clear();
                            dgArticles.DataSource = dt;
                            alterColumns();
                            resizeColumns();
                            dgArticles.Visible = true;
                            pnlResults.Visible = true;
                            this.Invalidate();
                            Application.DoEvents();
                        }
                        //known author, but no articles to show
                        else
                        {
                            dgArticles.Visible = false;
                            pnlResults.Visible = false;
                            this.Invalidate();
                            Application.DoEvents();
                        }
                    }
                    //there are no articles to show, so update GUI to show this
                    else
                    {
                        pnlResults.Visible = false;
                        lblCurrentUser.Text = "Unknown Or Unpublished Author";
                        lblProgress.Visible = false;
                        prgBar.Visible = false;
                        dgArticles.Visible = false;
                        pnlResults.Visible = false;
                        pnlUser.Visible = true;
                        this.Invalidate();
                        Application.DoEvents();
                        Program.InfoBox(
                            "There are no CodeProject articles available for user ("
                            + Program.UserID + ")");
                    }
                }));
            }
        }

        /// <summary>
        /// Alter the article DataGridView columns, by firstly adding an image column
        /// which will be a new column index of 4. And then Delete the auto mapped
        /// "ArticleURL" column, and create a new DataGridViewLinkColumn column for
        /// the "ArticleURL" column, which will be column index 5.
        /// </summary>
        private void alterColumns()
        {

            //need to catch this, as this column may not be in existence
            //when the request to remove it is made.
            try
            {
                //remove existing ArticleURL column
                dgArticles.Columns.Remove("ArticleURL");
            }
            catch (Exception ex)
            {
                //cant do much about the removal of a non-existent column
            }
            //create a new image column
            DataGridViewImageColumn imgs = new DataGridViewImageColumn();
            imgs.Image = global::VaneWebSpider.FormResources.LinkIcon;
            imgs.DisplayIndex = 0;
            imgs.Width = 40;
            dgArticles.Columns.Add(imgs);
            //create a new hyperlink column
            DataGridViewLinkColumn links = new DataGridViewLinkColumn();
            links.HeaderText = "ArticleURL";
            links.DataPropertyName = "ArticleURL";
            links.ActiveLinkColor = Color.Blue;
            links.LinkBehavior = LinkBehavior.SystemDefault;
            links.LinkColor = Color.Blue;
            links.SortMode = DataGridViewColumnSortMode.Automatic;
            links.TrackVisitedState = true;
            links.VisitedLinkColor = Color.Blue;
            links.DisplayIndex = 1;
            links.Width = 300;
            dgArticles.Columns.Add(links);
        }

        /// <summary>
        /// Resize all article DataGridView columns to fixed sizes
        /// </summary>
        private void resizeColumns()
        {
            //resize all other columns to have default width of 60
            dgArticles.Columns[0].Width = 60; //Views column
            dgArticles.Columns[1].Width = 60; //Ratings column
            dgArticles.Columns[2].Width = 60; //Votes column
            dgArticles.Columns[3].Width = 60; //Popularity column
        }

        /// <summary>
        /// Puts all the GUI components into a EndParse state
        /// </summary>
        /// <param name="sender">The 
        /// <see cref="WebScreenScraper">WebScreenScraper</see></param>
        /// <param name="e">EventArgs</param>
        private void wss_EndParse(object sender, EventArgs e)
        {
            lblProgress.Visible = false;
            prgBar.Visible = false;
            pnlUser.Visible = true;
            pnlGridMainFill.Visible = true;
            this.Invalidate();
            Application.DoEvents();
        }

        /// <summary>
        /// Puts all the GUI components into a StartParse state
        /// </summary>
        /// <param name="sender">The 
        /// <see cref="WebScreenScraper">WebScreenScraper</see></param>
        /// <param name="e">EventArgs</param>
        private void wss_StartParse(object sender, EventArgs e)
        {
            //need to test for an invoke initially, as the BackgroundWorker
            //that is run to do the web site parsing is on a different thread
            //to that of this forms controls, so the call will need to be
            //marshalled to the correct thread, in order to change properties
            if (this.InvokeRequired)
            {
                this.Invoke(new EventHandler(delegate
                {
                    lblProgress.Visible = true;
                    prgBar.Visible = true;
                    this.Invalidate();
                    Application.DoEvents();
                }));
            }
        }

        /// <summary>
        /// If the column of the DataGridView clicked was the link column
        /// call the startProcess, passing it the correct URL to navigate to
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void dgArticles_CellContentClick(object sender, 
                     DataGridViewCellEventArgs e)
        {
            int LINK_COLUMN_INDEX = 5;
            //the link column is index 5,
            //as it was created at index 5, as there were
            //originally 5 auto generated columns
            //created by the WebScreenScraper.createDataSet() 
            //method, but then we deleted that auto-generated
            //column, and swapped it for a hyperlink
            //column which was added to the end of the
            //existing auto-generated columns. Thats why its
            //at index 5 which is a little strange, but there you go.
            if (e.ColumnIndex == LINK_COLUMN_INDEX)
            {
                startProcess(@"https://codeproject.org.cn" +
                    dgArticles[e.ColumnIndex, e.RowIndex].Value.ToString());
            }
        }

        /// <summary>
        /// Attempts to start the process which has
        /// the name of the parameter supplied, So
        /// long as the process is a URL. Must start
        /// with www or http, as we are attempting
        /// to start a web browser
        /// </summary>
        /// <param name="target">The process to start</param>
        private void startProcess(string target)
        {
            // If the value looks like a URL, navigate to it.
            if (null != target && (target.StartsWith("www") || 
                               target.StartsWith("http")))
            {
                try
                {
                    System.Diagnostics.Process.Start(target);
                }
                catch (Exception ex)
                {
                    Program.ErrorBox("Problem with starting process " + target);
                }
            }
        }

        /// <summary>
        /// Creates a new BackgroundWorker thread and calls the 
        /// BackgroundWorkers bgw_DoWork(..) method, where the 
        /// argument is the value of the <see cref="Program">
        /// Program classes </see>UserID
        /// </summary>
        /// <param name="sender">frmMain</param>
        /// <param name="e">EventArgs</param>
        private void frmMain_Load(object sender, EventArgs e)
        {
            pnlUser.Visible = false;
            pnlGridMainFill.Visible = false;
            BackgroundWorker bgw = new BackgroundWorker();
            bgw.DoWork += new DoWorkEventHandler(bgw_DoWork);
            bgw.RunWorkerAsync(Program.UserID);
        }

        /// <summary>
        /// Allows the user to specify a new UserId
        /// to fetch codeproject articles for by the
        /// use of a <see cref="InputBoxDialog">InputBoxDialog </see>
        /// The value entered must be a positive number
        /// </summary>
        /// <param name="sender">lnkChangeUser</param>
        /// <param name="e">LinkLabelLinkClickedEventArgs</param>
        private void lnkChangeUser_LinkClicked(object sender, 
                     LinkLabelLinkClickedEventArgs e)
        {
            //get the new userId
            string stringEntered = 
                Program.InputBox("Enter a new user ID to examine",
                "Enter a new user ID", "");
            //check for empty
            if (stringEntered.Equals(string.Empty)) 
            {
                Program.ErrorBox("You must enter a value for the userId");
            }
            else 
            {
                try 
                {
                    //make sure its a positive number, then update the Program
                    //held property
                    long uId = long.Parse(stringEntered);
                    if (uId > 0)
                    {
                        Program.UserID = uId;
                        BackgroundWorker bgw = new BackgroundWorker();
                        bgw.DoWork += new DoWorkEventHandler(bgw_DoWork);
                        bgw.RunWorkerAsync(Program.UserID);
                    }
                    else
                    {
                        Program.ErrorBox("User ID must be a positive value");
                    }
                }
                //its not a number that was entered, tell them off
                catch(Exception ex) 
                {
                    Program.ErrorBox("The value you entered was not valid\r\n" +
                                    "The user ID must be a number");
                }
            }
        }

        /// <summary>
        /// Hide the notify icon, and shutdown the application
        /// </summary>
        /// <param name="sender">frmMain</param>
        /// <param name="e">FormClosedEventArgs</param>
        private void frmMain_FormClosed(object sender, FormClosedEventArgs e)
        {
            nfIcon.Visible = false;
            Application.Exit();
        }

        /// <summary>
        /// Create and show a new <see
        ///     cref="frmPie">frmPie</see> object, and hide this form
        /// </summary>
        /// <param name="sender">lnkResults</param>
        /// <param name="e">LinkLabelLinkClickedEventArgs</param>
        private void lnkResults_LinkClicked(object sender, 
                     LinkLabelLinkClickedEventArgs e)
        {
            frmPie fPie = new frmPie();
            fPie.GridIsUse = dgArticles;
            fPie.AuthorString = lblCurrentUser.Text;
            this.Hide();
            fPie.ShowDialog(this);
            this.Show();
        }
        #endregion
    }
    #endregion
}
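
The listing above updates the controls from inside bgw_DoWork by calling Invoke. As a hedged sketch of another arrangement, the work can be left in DoWork and the result handed back through the BackgroundWorker's RunWorkerCompleted event, which in a Windows Forms application is normally raised on the thread that called RunWorkerAsync. FetchArticleCount below is a hypothetical stand-in for the scraping step and returns made-up numbers; WorkerSketch and Start are also illustrative names, not part of the attached application.

```csharp
using System;
using System.ComponentModel;

// Hypothetical sketch, not part of the original VainWebSpider source.
public static class WorkerSketch
{
    // Stand-in for the real scraping step; the values are invented.
    public static int FetchArticleCount(long userId)
    {
        return userId > 0 ? 10 : 0;
    }

    // Runs the fetch on a BackgroundWorker and passes the result to onDone.
    public static BackgroundWorker Start(long userId, Action<int> onDone)
    {
        BackgroundWorker bgw = new BackgroundWorker();
        bgw.DoWork += delegate(object sender, DoWorkEventArgs e)
        {
            // e.Argument is the value that was passed to RunWorkerAsync
            e.Result = FetchArticleCount((long)e.Argument);
        };
        bgw.RunWorkerCompleted += delegate(object sender,
            RunWorkerCompletedEventArgs e)
        {
            // e.Result may only be read when the work did not throw
            if (e.Error == null)
            {
                onDone((int)e.Result);
            }
        };
        bgw.RunWorkerAsync(userId);
        return bgw;
    }
}
```

In a form, the onDone callback would be the place to fill the DataGridView and toggle the panels, since no explicit Invoke call should be needed there.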

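frmPie class

This class is the pie chart display window (for the full Designer listing, see the attached application). This form uses a third-party DLL by Julijan Sribar, which can be found on CodeProject at: pie library. Credit where credit is due. Thanks Julijan, excellent work.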

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using Microsoft.Win32;

//https://codeproject.org.cn/csharp/julijanpiechart.asp
using System.Drawing.PieChart;


namespace VainWebSpider
{
    #region frmPie CLASS
    /// <summary>
    /// Sets up the pie chart with the values that match
    /// the data type the user selected "views", "votes", 
    /// "popularity", "ratings"
    /// </summary>
    public partial class frmPie : Form
    {
        #region Instance Fields
        //instance fields
        private DataGridView gridInUse;
        private int CLUSTERED_THRESHOLD = 20;
        #endregion
        #region Constructor
        /// <summary>
        /// Constructs a new frmPie object
        /// </summary>
        public frmPie()
        {
            InitializeComponent();
        }
        #endregion
        #region Public Properties
        /// <summary>
        /// Sets the <see cref="DataGridView">DataGridView</see> to use
        /// </summary>
        public DataGridView GridIsUse
        {
            set { gridInUse = value; }
        }

        /// <summary>
        /// Sets the AuthorString to use
        /// </summary>
        public string AuthorString
        {
            set { lblCurrentUser.Text = value; }
        }
        #endregion
        #region Private Methods
        /// <summary>
        /// Calls the populatePieData() method and sets up some
        /// other miscellaneous pie chart values
        /// </summary>
        private void setupPie()
        {
            populatePieData();
            pnlPie.Font = new Font("Arial", 8F);
            pnlPie.ForeColor = SystemColors.WindowText;
            pnlPie.EdgeColorType = EdgeColorType.DarkerThanSurface;
            pnlPie.LeftMargin = 10F;
            pnlPie.RightMargin = 10F;
            pnlPie.TopMargin = 10F;
            pnlPie.BottomMargin = 10F;
            pnlPie.SliceRelativeHeight = 0.25F;
            pnlPie.InitialAngle = -90F;
        }

        /// <summary>
        /// Sets up the pie chart with the values that match
        /// the data type the user selected "views", "votes", 
        /// "popularity", "ratings"
        /// </summary>
        private void populatePieData()
        {
            //Switch on the current data type
            switch (cmbViewData.SelectedItem.ToString().ToLower())
            {
                //Views DataGridView column = 0
                //Rating DataGridView column = 1
                //Votes DataGridView column = 2
                //Popularity DataGridView column = 3
                //URL DataGridView column = 5
                case "views" :
                    getGridData("views", 0);
                    break;
                case "votes":
                    getGridData("votes", 2);
                    break;
                case "popularity":
                    getGridData("popularity", 3);
                    break;
                case "ratings":
                    getGridData("ratings", 1);
                    break;
                default:
                    getGridData("views", 0);
                    break;
            }
        }

        /// <summary>
        /// Builds a single dimension decimal array of data, extracted
        /// from this forms gridInUse field, and uses it to populate
        /// the embedded pie chart
        /// </summary>
        /// <param name="type">The type
        ///       of columns "views", "votes", 
        /// "popularity", "ratings" </param>
        /// <param name="column">Column number 0-3</param>
        private void getGridData(string type, int column)
        {
            try
            {
                //setup some holding fields for the pie data
                int qty = gridInUse.RowCount;
                decimal[] results = new decimal[qty];
                string[] pieToolTips = new string[qty];
                string[] pieText = new string[qty];
                float[] pieRelativeDisplacements = new float[qty];
                Color[] pieColors = new Color[qty];
                int alpha = 60;
                Random rnd = new Random();
                Color[] colorsAvailable = new Color[] { Color.FromArgb(alpha, Color.Red), 
                                                Color.FromArgb(alpha, Color.Green), 
                                                Color.FromArgb(alpha, Color.Yellow), 
                                                Color.FromArgb(alpha, Color.Blue),
                                                Color.FromArgb(alpha, Color.CornflowerBlue), 
                                                Color.FromArgb(alpha, Color.Cyan), 
                                                Color.FromArgb(alpha, Color.DarkGreen), 
                                                Color.FromArgb(alpha, Color.PeachPuff),
                                                Color.FromArgb(alpha, Color.Plum), 
                                                Color.FromArgb(alpha, Color.Peru)         };
                //loop through the grid and set up the pie chart to use the grids data
                for (int i = 0; i < gridInUse.RowCount; i++)
                {
                    //Views DataGridView column = 0
                    //Rating DataGridView column = 1
                    //Votes DataGridView column = 2
                    //Popularity DataGridView column = 3
                    //URL DataGridView column = 5
                    pieToolTips[i] = "URL " + gridInUse[5, i].Value.ToString() + " " +
                                     "Views " + gridInUse[0, i].Value.ToString() + " " +
                                     "Rating " + gridInUse[1, i].Value.ToString() + " " +
                                     "Votes " + gridInUse[2, i].Value.ToString() + " " +
                                     "Popularity " + gridInUse[3, i].Value.ToString();
                    if (type.Equals("ratings"))
                    {
                        string val = gridInUse[column, i].Value.ToString();
                        int idx = val.LastIndexOf("/");
                        string sNewValue = val.Substring(0, idx);
                        results[i] = decimal.Parse(sNewValue);
                    }
                    else
                    {
                        results[i] = decimal.Parse(gridInUse[column, i].Value.ToString());
                    }
                    //if there are loads of articles, we dont want any text on pie chunks
                    //as it becomes illegible
                    if (gridInUse.RowCount < CLUSTERED_THRESHOLD)
                    {
                        pieText[i] = gridInUse[column, i].Value.ToString();
                    }
                    else
                    {
                        pieText[i] = " ";
                    }
                    pieRelativeDisplacements[i] = 0.1F;
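                    //the upper bound of Random.Next is exclusive, so pass the
                    //full length to make every colour selectable
                    int idxColor = rnd.Next(0, colorsAvailable.Length);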
                    pieColors[i] = colorsAvailable[idxColor];
                }
                //update the pie components
                pnlPie.ToolTips = pieToolTips;
                pnlPie.Texts = pieText;
                pnlPie.SliceRelativeDisplacements = pieRelativeDisplacements;
                pnlPie.Colors = pieColors;
                pnlPie.Values = results;

            }
            catch (Exception ex)
            {
                //Cant do much about it, but catch it all the same.
                //just dont update pie chart if we get an Exception
            }
        }

        /// <summary>
        /// Selects index 1 in the cmbViewData combobox and then
        /// calls the setupPie() method
        /// </summary>
        /// <param name="sender">frmPie</param>
        /// <param name="e">EventArgs</param>
        private void frmPie_Load(object sender, EventArgs e)
        {
            cmbViewData.SelectedIndex = 1;
            setupPie();
        }

        /// <summary>
        /// Calls the setupPie() method
        /// </summary>
        /// <param name="sender">cmbViewData</param>
        /// <param name="e">EventArgs</param>
        private void cmbViewData_SelectedValueChanged(object sender, EventArgs e)
        {
            setupPie();
        }
        #endregion
    }
    #endregion
}
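
getGridData treats the ratings column specially because its cells hold text such as "4.7/5". The following is a minimal sketch of that conversion as a standalone helper, together with a way of cycling through the colour list by index instead of picking colours at random. PieDataSketch, ParseCell and ColorIndexFor are hypothetical names, and the use of the invariant culture for parsing is an assumption about how the numbers are formatted.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the original VainWebSpider source.
public static class PieDataSketch
{
    // Converts a grid cell such as "4.7/5" to 4.7; plain numbers pass through.
    public static decimal ParseCell(string cell)
    {
        int idx = cell.LastIndexOf('/');
        string number = idx >= 0 ? cell.Substring(0, idx) : cell;
        // NumberStyles.Number allows a decimal point and thousands separators
        return decimal.Parse(number, NumberStyles.Number,
            CultureInfo.InvariantCulture);
    }

    // Maps a row number onto a colour slot, wrapping round when there
    // are more rows than colours.
    public static int ColorIndexFor(int row, int colorCount)
    {
        return row % colorCount;
    }
}
```

Cycling by row keeps neighbouring slices from sharing a colour as long as there are at least two colours, which random selection cannot promise.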

InputBoxDialog class

This class is a simple input box.

using System;
using System.Drawing;
using System.Collections;
using System.ComponentModel;
using System.Windows.Forms;

namespace VainWebSpider
{
    #region InputBoxDialog CLASS
    /// <summary>
    /// Provides a generic modal text input box, for use with
    /// any other form
    /// </summary>
    public class InputBoxDialog : System.Windows.Forms.Form
    {
        #region Instance Fields
        //instance fields
        string formCaption = string.Empty;
        string formPrompt = string.Empty;
        string inputResponse = string.Empty;
        string defaultValue = string.Empty;
        private System.Windows.Forms.Label lblPrompt;
        private System.Windows.Forms.Button btnOK;
        private System.Windows.Forms.Button btnCancel;
        private System.Windows.Forms.TextBox txtInput;
        private System.ComponentModel.Container components = null;
        #endregion
        #region Constructor
        /// <summary>
        /// Constructs a new InputBoxDialog object
        /// </summary>
        public InputBoxDialog()
        {
            InitializeComponent();
        }
        #endregion
        #region Windows Form Designer generated code
        /// <summary>
        /// Required method for Designer support - do not modify
        /// the contents of this method with the code editor.
        /// </summary>
        private void InitializeComponent()
        {
            this.lblPrompt = new System.Windows.Forms.Label();
            this.btnOK = new System.Windows.Forms.Button();
            this.btnCancel = new System.Windows.Forms.Button();
            this.txtInput = new System.Windows.Forms.TextBox();
            this.SuspendLayout();
            // 
            // lblPrompt
            // 
            this.lblPrompt.Anchor = (
                (System.Windows.Forms.AnchorStyles)((((
                System.Windows.Forms.AnchorStyles.Top | 
                System.Windows.Forms.AnchorStyles.Bottom)
                        | System.Windows.Forms.AnchorStyles.Left)
                        | System.Windows.Forms.AnchorStyles.Right)));
            this.lblPrompt.BackColor = System.Drawing.SystemColors.Control;
            this.lblPrompt.Font = new System.Drawing.Font("Microsoft Sans Serif",
                8.25F, System.Drawing.FontStyle.Regular,
                System.Drawing.GraphicsUnit.Point, ((byte)(0)));
            this.lblPrompt.Location = new System.Drawing.Point(9, 35);
            this.lblPrompt.Name = "lblPrompt";
            this.lblPrompt.Size = new System.Drawing.Size(302, 22);
            this.lblPrompt.TabIndex = 3;
            // 
            // btnOK
            // 
            this.btnOK.DialogResult = System.Windows.Forms.DialogResult.OK;
            this.btnOK.Location = new System.Drawing.Point(265, 59);
            this.btnOK.Name = "btnOK";
            this.btnOK.Size = new System.Drawing.Size(60, 20);
            this.btnOK.TabIndex = 1;
            this.btnOK.Text = "Ok";
            this.btnOK.Click += new System.EventHandler(this.btnOK_Click);
            // 
            // btnCancel
            // 
            this.btnCancel.DialogResult = System.Windows.Forms.DialogResult.Cancel;
            this.btnCancel.Location = new System.Drawing.Point(331, 59);
            this.btnCancel.Name = "btnCancel";
            this.btnCancel.Size = new System.Drawing.Size(60, 20);
            this.btnCancel.TabIndex = 2;
            this.btnCancel.Text = "Cancel";
            this.btnCancel.Click += new System.EventHandler(this.btnCancel_Click);
            // 
            // txtInput
            // 
            this.txtInput.Location = new System.Drawing.Point(8, 59);
            this.txtInput.MaxLength = 40;
            this.txtInput.Name = "txtInput";
            this.txtInput.Size = new System.Drawing.Size(251, 20);
            this.txtInput.TabIndex = 0;
            // 
            // InputBoxDialog
            // 
            this.AutoScaleBaseSize = new System.Drawing.Size(5, 13);
            this.ClientSize = new System.Drawing.Size(398, 103);
            this.Controls.Add(this.txtInput);
            this.Controls.Add(this.btnCancel);
            this.Controls.Add(this.btnOK);
            this.Controls.Add(this.lblPrompt);
            this.FormBorderStyle = System.Windows.Forms.FormBorderStyle.FixedDialog;
            this.KeyPreview = true;
            this.MaximizeBox = false;
            this.MinimizeBox = false;
            this.Name = "InputBoxDialog";
            this.StartPosition = System.Windows.Forms.FormStartPosition.CenterScreen;
            this.Text = "InputBox";
            this.KeyDown += new System.Windows.Forms.KeyEventHandler(
                this.InputBoxDialog_KeyDown);
            this.Load += new System.EventHandler(this.InputBox_Load);
            this.ResumeLayout(false);
            this.PerformLayout();
        }

        #region Dispose
        /// <summary>
        /// Clean up any resources being used.
        /// </summary>
        protected override void Dispose(bool disposing)
        {
            if (disposing)
            {
                if (components != null)
                {
                    components.Dispose();
                }
            }
            base.Dispose(disposing);
        }

        #endregion
        #endregion
        #region Public Properties
        // property FormCaption
        public string FormCaption
        {
            get { return formCaption; }
            set { formCaption = value; }
        }
        // property FormPrompt
        public string FormPrompt
        {
            get { return formPrompt; }
            set { formPrompt = value; }
        }
        // property InputResponse
        public string InputResponse
        {
            get { return inputResponse; }
            set { inputResponse = value; }
        }
        // property DefaultValue
        public string DefaultValue
        {
            get { return defaultValue; }
            set { defaultValue = value; }
        } 

        #endregion
        #region Form and Control Events
        /// <summary>
        /// The InputBoxDialog form load event, sets focus to the
        /// txtInput control
        /// </summary>
        /// <param name="sender">The InputBoxDialog</param>
        /// <param name="e">The event arguments</param>
        private void InputBox_Load(object sender, System.EventArgs e)
        {
            this.txtInput.Text = defaultValue;
            this.lblPrompt.Text = formPrompt;
            this.Text = formCaption;
            this.txtInput.SelectionStart = 0;
            this.txtInput.SelectionLength = this.txtInput.Text.Length;
            this.txtInput.Focus();
        }

        /// <summary>
        /// The btnOk click event, sets the InputResponse=txtInput
        /// and then closes the form
        /// </summary>
        /// <param name="sender">The btnOK</param>
        /// <param name="e">The event arguments</param>
        private void btnOK_Click(object sender, System.EventArgs e)
        {
            InputResponse = this.txtInput.Text;
            this.Close();
        }

        /// <summary>
        /// The btnCancel click event, closes the form
        /// </summary>
        /// <param name="sender">The btnCancel</param>
        /// <param name="e">The event arguments</param>
        private void btnCancel_Click(object sender, System.EventArgs e)
        {
            this.Close();
        }

        /// <summary>
        /// The InputBoxDialog key down event, if the key == Enter, sets the
        /// InputResponse=txtInput and then closes the form
        /// </summary>
        /// <param name="sender">The InputBoxDialog</param>
        /// <param name="e">The event arguments</param>
        private void InputBoxDialog_KeyDown(object sender, KeyEventArgs e)
        {
            if (e.KeyCode == Keys.Enter)
            {
                InputResponse = this.txtInput.Text;
                this.Close();
            }
        }
        #endregion
    }
    #endregion
}

Demonstration Screenshots

The first screen displayed is the loader form (frmLoader).

The VainWebSpider user can either pick another CodeProject user whose articles should be fetched, or use the link provided to go straight to the main interface.

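In the screenshot, the application already has a user configured. This user ID is stored in the registry, and the registry is updated every time a new user ID is selected.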

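The VainWebSpider key and its associated value are stored under HKEY_LOCAL_MACHINE\SOFTWARE\, where a new subkey named "VainWebSpider" is created. The currently selected user ID is stored in that key. This lets the VainWebSpider application know at startup which user it last used, or whether it has ever had a user at all. On the very first run there is no registry key or associated value; they are created as soon as a user ID is selected.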
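The registry round trip described above could look roughly like the following. This is a minimal sketch, not the application's actual code: the class name `UserIdStore` and the value name `UserId` are assumptions made for illustration, and only the key location (HKEY_LOCAL_MACHINE\SOFTWARE\VainWebSpider) comes from the article.

```csharp
using System;
using Microsoft.Win32;

// Hypothetical helper; the article does not show its own registry code here.
public static class UserIdStore
{
    // Subkey path as described in the article.
    public const string KeyPath = @"SOFTWARE\VainWebSpider";

    // Assumed value name; the article does not say what the value is called.
    public const string ValueName = "UserId";

    // Returns the stored user ID, or null on a first run when no key or value exists yet.
    public static long? Load()
    {
        using (RegistryKey key = Registry.LocalMachine.OpenSubKey(KeyPath))
        {
            if (key == null) return null;

            object raw = key.GetValue(ValueName);
            long id;
            if (raw != null && long.TryParse(raw.ToString(), out id)) return id;
            return null;
        }
    }

    // Creates the subkey if it is missing and overwrites the stored user ID.
    public static void Save(long userId)
    {
        using (RegistryKey key = Registry.LocalMachine.CreateSubKey(KeyPath))
        {
            key.SetValue(ValueName, userId.ToString());
        }
    }
}
```

Note that writing under HKEY_LOCAL_MACHINE normally requires administrator rights, so a sketch like this may fail with an access exception for a standard user; a per-user location under HKEY_CURRENT_USER would avoid that, at the cost of differing from what the article describes.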

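When the main interface (frmMain) loads, all of the currently selected user's articles are displayed in a standard Windows Forms DataGridView. The user can sort the entries by any column header, and can open an article by clicking the hyperlink provided for it.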

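The main interface (frmMain) also places a notification icon in the system tray, which lets the VainWebSpider user hide or show the main interface (frmMain) or exit the application entirely.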

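From the main interface (frmMain), the VainWebSpider user can also choose to inspect the web results in a pie chart. Many thanks to Julijan Sribar for his excellent (even award-winning) pie chart library; I simply had to find a use for it. The user can choose which results are shown in the pie chart, and hovering the mouse over a pie slice displays a tooltip with all of the web results for that article.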

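The VainWebSpider user can also pick a new user with the "Select another user" link on the main interface (frmMain). The application obtains the new entry through an input box (InputBoxDialog). If the value entered is a positive integer, the relevant web page is queried and the new data is extracted. Below is an example for CodeProject user number 1, Chris Maunder, the founder of CodeProject.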
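The "positive integer" check on the input box value could be sketched as follows. This is an illustrative assumption about how such a check might be written, not the article's own code; `UserIdInput` and `TryParseUserId` are hypothetical names.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for validating the text returned by the input box.
public static class UserIdInput
{
    // Returns true and sets userId only when the input is a whole number greater than zero.
    public static bool TryParseUserId(string input, out long userId)
    {
        userId = 0;
        if (input == null) return false;

        long parsed;
        // NumberStyles.None accepts digits only: no sign, no separators, no decimals.
        bool ok = long.TryParse(input.Trim(), NumberStyles.None,
                                CultureInfo.InvariantCulture, out parsed);
        if (!ok || parsed <= 0) return false;

        userId = parsed;
        return true;
    }
}
```

In the application, the string passed in would presumably be the dialog's `InputResponse` property after the dialog closes. An empty response, such as one left behind when the user cancels, is rejected by the same check, so the previously selected user can simply be kept.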

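As you can see, Chris Maunder has quite a few articles: 102 at the time this article was published. For that reason, the pie chart does not draw any text on the pie slices. With CodeProject users who have a large number of articles, the chart becomes too crowded to read clearly.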

What Do You Think?

That's it. I would just like to ask that, if you liked this article, you vote for it.

Conclusion

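I think this article shows how easy it is to extract the data you want from what may be a very large amount of data. Personally, I am very pleased with how it works, and I will probably use it, because it is quicker than opening Firefox, going to my articles, and looking through them. It also displays the results in a nice pie chart (thanks again to Julijan Sribar for his excellent, even award-winning, pie chart library).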

Bugs

None that I know of.

History

  • v1.0: 22/12/06: Initial release.