CodeProject Statistics Calculator Using WebDriver

Anton Angelov

5.00/5 (2 votes)

18 Jan 2017

Ms-PL

3 min read


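A tool that creates a report of an author's articles, either for a specific year or for all time, and calculates the total number of views.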

Download the application

Download the full source code

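Introduction

I decided to start a new series of articles dedicated to automation tools: the Automation Tools Series. The first tool I am sharing with you is one I built for my previous article, "Accelerate Your 2017 with the Best of Automate The Planet 2016". There I had to calculate the total views of my CodeProject articles. I had already done this exercise a couple of times the old-fashioned way, with a calculator. This time I told myself I was too lazy to do it a third time, and so a new tool was born.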

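What Is the Problem?

I am a man who always has a plan and who loves numbers and statistics. So, to measure my success as a writer, I wanted to know how many times my articles have been read in total. However, CodeProject does not provide this figure. You can only see the current view count of each individual article.

The articles page has an RSS feed. Unfortunately, it does not carry the view counts; they are always equal to 0.

Initially, I planned to base my application on the RSS feed, but this shortcoming ruled it out. So I needed to find another way to get the numbers.

I had one more requirement for the tool: it should be able to extract this information for a specific year only, as well as for all time. That way I can track the popularity of my latest articles.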

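How to Use the Application?

I designed the application to be called from the console with arguments.

Based on your input, it generates a table with the columns Views, Title, Publish Date, and URL. You can extract the article information for a specific year or for all time.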

CodeProjectStatisticsCalculator.exe --y -1 --p "allTime.csv" --i 11449574

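The line above generates a report with information about all of your articles, from the first one until now.

Arguments

--y -1: With this argument, you specify the year for which the report is generated. If you change -1 to 2016, the tool extracts only the data for that year.

--p "allTime.csv": After --p, specify the path and name of the Excel sheet where the data will be stored. It should be in CSV format.

--i 11449574: After --i, specify the ID of your public CodeProject account. You can get it from the URL.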
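To make the meaning of the --y argument concrete, here is a minimal sketch of the selection it implies: -1 keeps every article, any other value keeps only articles published in that year. The ArticleInfo type, the YearFilterSketch class, and the sample data are assumptions made for illustration; they are not part of the tool.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the tool's Article type; only the fields used here.
public class ArticleInfo
{
    public string Title { get; set; }
    public double Views { get; set; }
    public DateTime PublishDate { get; set; }
}

public static class YearFilterSketch
{
    // year == -1 means "all time"; any other value keeps only that publish year.
    // The result is ordered by views, most viewed first.
    public static List<ArticleInfo> Select(IEnumerable<ArticleInfo> articles, int year)
    {
        return articles
            .Where(a => year == -1 || a.PublishDate.Year == year)
            .OrderByDescending(a => a.Views)
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample data.
        var articles = new List<ArticleInfo>
        {
            new ArticleInfo { Title = "A", Views = 1200, PublishDate = new DateTime(2015, 3, 1) },
            new ArticleInfo { Title = "B", Views = 300, PublishDate = new DateTime(2016, 7, 9) },
            new ArticleInfo { Title = "C", Views = 4500, PublishDate = new DateTime(2016, 11, 20) },
        };

        foreach (int year in new[] { -1, 2016 })
        {
            var selected = Select(articles, year);
            Console.WriteLine("--y {0}: {1} articles, {2} views",
                year, selected.Count, selected.Sum(a => a.Views));
        }
    }
}
```

Summing the views of the selected list, rather than of the full list, gives a total that matches the rows written to the report.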

How Does It Work?

The tool is built on UI test automation with WebDriver. It uses the Page Object design pattern. The main logic is located in the ArticlesPage class.

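using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using OpenQA.Selenium;

public partial class ArticlesPage : BasePage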
{
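    // Captures the view count (digits and thousands separators) that follows "Views: ".
    private string viewsRegex = @".*Views: (?<Views>[0-9,]{1,})";
    // Captures the publish date text that follows "Posted: ".
    private string publishDateRegex = @".*Posted: (?<PublishDate>[0-9,A-Za-z ]{1,})";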
    private readonly int profileId;
    public ArticlesPage(IWebDriver driver, int profileId) : base(driver)
    {
        this.profileId = profileId;
    }
    public override string Url
    {
        get
        {
            return string.Format("https://codeproject.org.cn/script/Articles/MemberArticles.aspx?amid={0}", this.profileId);
        }
    }
    public void Navigate(string part)
    {
        base.Open(part);
    }
    public List<Article> GetArticlesByUrl(string sectionPart)
    {
        this.Navigate(sectionPart);
        var articlesInfos = new List<Article>();
        foreach (var articleRow in this.ArticlesRows.ToList())
        {
            if (!articleRow.Displayed)
            {
                continue;
            }
            var article = new Article();
            var articleTitleElement = this.GetArticleTitleElement(articleRow);
            article.Title = articleTitleElement.GetAttribute("innerHTML");
            article.Url = articleTitleElement.GetAttribute("href");
            var articleStatisticsElement = this.GetArticleStatisticsElement(articleRow);
            string articleStatisticsElementSource = articleStatisticsElement.GetAttribute("innerHTML");
            if (!string.IsNullOrEmpty(articleStatisticsElementSource))
            {
                article.Views = this.GetViewsCount(articleStatisticsElementSource);
                article.PublishDate = this.GetPublishDate(articleStatisticsElementSource);
            }
            articlesInfos.Add(article);
        }
        return articlesInfos;
    }
    private double GetViewsCount(string articleStatisticsElementSource)
    {
        var regexViews = new Regex(viewsRegex, RegexOptions.Singleline);
        Match currentMatch = regexViews.Match(articleStatisticsElementSource);
        if (!currentMatch.Success)
        {
            throw new ArgumentException("No content for the current statistics element.");
        }
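        // Parse with the invariant culture so "1,234" is read as 1234 on any machine.
        return double.Parse(currentMatch.Groups["Views"].Value, System.Globalization.CultureInfo.InvariantCulture);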
    }
    private DateTime GetPublishDate(string articleStatisticsElementSource)
    {
        var regexPublishDate = new Regex(publishDateRegex, RegexOptions.IgnorePatternWhitespace);
        Match currentMatch = regexPublishDate.Match(articleStatisticsElementSource);
        if (!currentMatch.Success)
        {
            throw new ArgumentException("No content for the current statistics element.");
        }
        return DateTime.Parse(currentMatch.Groups["PublishDate"].Value);
    }
}

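The main workflow is in the public method GetArticlesByUrl. You need to specify which part of the URL to load: #Articles, #TechnicalBlog, or #Tip. The page's constructor requires the profile ID in order to build the full URL.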
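As a rough illustration of how the full address comes together, the sketch below combines the format string from the Url property with a section fragment. The BasePage.Open method is not shown in the article, so appending the fragment to the base URL is an assumption, and the ArticlesUrlSketch class and its Build method are hypothetical names.

```csharp
using System;

public static class ArticlesUrlSketch
{
    // Same format string as ArticlesPage.Url in the article.
    private const string UrlFormat =
        "https://codeproject.org.cn/script/Articles/MemberArticles.aspx?amid={0}";

    // Assumption: opening a section means appending its fragment to the base URL.
    public static string Build(int profileId, string sectionPart)
    {
        return string.Format(UrlFormat, profileId) + sectionPart;
    }

    public static void Main()
    {
        // The three section fragments the program passes to GetArticlesByUrl.
        foreach (var section in new[] { "#Articles", "#TechnicalBlog", "#Tip" })
        {
            Console.WriteLine(Build(11449574, section));
        }
    }
}
```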

Extracting the Title and URL

We locate all article rows and iterate through them. We find the rows with the XPath expression below.

public ReadOnlyCollection<IWebElement> ArticlesRows
{
    get
    {
        return this.driver.FindElements(By.XPath("//tr[contains(@id,'CAR_MainArticleRow')]"));
    }
}

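For each row, we extract the title and the URL. To do that, we find the anchor element inside the row. The URL comes from its href attribute and the title from its inner HTML.

Extracting the Views and Publish Date

To get the publish date and the view count, we need to find the statistics DIV.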

public IWebElement GetArticleStatisticsElement(IWebElement articleRow)
{
    return articleRow.FindElement(By.CssSelector("div[id$='CAR_SbD']"));
}

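We look for this element inside the current row, this time with a CSS selector that matches a div whose id ends with CAR_SbD.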

private string viewsRegex = @".*Views: (?<Views>[0-9,]{1,})";
private string publishDateRegex = @".*Posted: (?<PublishDate>[0-9,A-Za-z ]{1,})";
private double GetViewsCount(string articleStatisticsElementSource)
{
    var regexViews = new Regex(viewsRegex, RegexOptions.Singleline);
    Match currentMatch = regexViews.Match(articleStatisticsElementSource);
    if (!currentMatch.Success)
    {
        throw new ArgumentException("No content for the current statistics element.");
    }
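    // Parse with the invariant culture so "1,234" is read as 1234 on any machine.
    return double.Parse(currentMatch.Groups["Views"].Value, System.Globalization.CultureInfo.InvariantCulture);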
}
private DateTime GetPublishDate(string articleStatisticsElementSource)
{
    var regexPublishDate = new Regex(publishDateRegex, RegexOptions.IgnorePatternWhitespace);
    Match currentMatch = regexPublishDate.Match(articleStatisticsElementSource);
    if (!currentMatch.Success)
    {
        throw new ArgumentException("No content for the current statistics element.");
    }
    return DateTime.Parse(currentMatch.Groups["PublishDate"].Value);
}

I think the easiest way to extract the information is to use regular expressions.
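To see what the two patterns capture, here is a small self-contained sketch that runs them against a sample statistics string. The sample text and its "d MMM yyyy" date format are assumptions about what the statistics DIV contains, not a copy of the real page. The sketch also parses the date with DateTime.ParseExact and default regex options, which differs from the DateTime.Parse call used in the tool.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class StatisticsParsingSketch
{
    // Patterns copied from the article.
    private const string ViewsPattern = @".*Views: (?<Views>[0-9,]{1,})";
    private const string PublishDatePattern = @".*Posted: (?<PublishDate>[0-9,A-Za-z ]{1,})";

    public static double ParseViews(string source)
    {
        Match match = Regex.Match(source, ViewsPattern, RegexOptions.Singleline);
        if (!match.Success)
        {
            throw new ArgumentException("No views found in the statistics text.");
        }
        // AllowThousands with the invariant culture reads "6,310" as 6310.
        return double.Parse(match.Groups["Views"].Value,
            NumberStyles.AllowThousands, CultureInfo.InvariantCulture);
    }

    public static DateTime ParsePublishDate(string source)
    {
        Match match = Regex.Match(source, PublishDatePattern);
        if (!match.Success)
        {
            throw new ArgumentException("No publish date found in the statistics text.");
        }
        // Assumption: the date is rendered as day, short month name, year.
        return DateTime.ParseExact(match.Groups["PublishDate"].Value.Trim(),
            "d MMM yyyy", CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        // Hypothetical inner HTML of one statistics element.
        string sample = "Posted: 18 Jan 2017<br>Views: 6,310<br>Bookmarked: 4";

        Console.WriteLine(ParseViews(sample));
        Console.WriteLine(ParsePublishDate(sample).ToString("yyyy-MM-dd"));
    }
}
```

Note that the date pattern's character class includes letters and spaces, so it keeps matching until it meets a character outside the class, such as the "<" in the sample. If the fields were separated only by spaces, the capture would run into the next label.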

The Main Body of the Program

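using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using CsvHelper;
using Fclp;
using OpenQA.Selenium.Chrome;

class Program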
{
    private static string filePath = string.Empty;
    private static string yearInput = string.Empty;
    private static int year = -1;
    private static int profileId = 0;
    private static string profileIdInput = string.Empty;
    private static List<Article> articlesInfos;
    static void Main(string[] args)
    {
        var commandLineParser = new FluentCommandLineParser();
        commandLineParser.Setup<string>('p', "path").Callback(s => filePath = s);
        commandLineParser.Setup<string>('y', "year").Callback(y => yearInput = y);
        commandLineParser.Setup<string>('i', "profileId").Callback(p => profileIdInput = p);
        commandLineParser.Parse(args);
        bool isProfileIdCorrect = int.TryParse(profileIdInput, out profileId);
        if (string.IsNullOrEmpty(profileIdInput) || !isProfileIdCorrect)
        {
            Console.WriteLine("Please specify a correct profileId.");
            return;
        }
        if (string.IsNullOrEmpty(filePath))
        {
            Console.WriteLine("Please specify a correct file path.");
            return;
        }
        if (!string.IsNullOrEmpty(yearInput))
        {
            bool isYearCorrect = int.TryParse(yearInput, out year);
            if (!isYearCorrect)
            {
                Console.WriteLine("Please specify a correct year!");
                return;
            }
        }
        articlesInfos = GetAllArticlesInfos();
        if (year == -1)
        {
            CreateReportAllTime();
        }
        else
        {
            CreateReportYear();
        }
        Console.WriteLine("Total VIEWS: {0}", articlesInfos.Sum(x => x.Views));
        Console.ReadLine();
    }
    private static void CreateReportAllTime()
    {
        // Dispose the writers so that buffered rows are flushed to the file.
        using (TextWriter textWriter = new StreamWriter(filePath))
        using (var csv = new CsvWriter(textWriter))
        {
            csv.WriteRecords(articlesInfos.OrderByDescending(x => x.Views));
        }
    }
    private static void CreateReportYear()
    {
        // Dispose the writers so that buffered rows are flushed to the file.
        using (TextWriter currentYearTextWriter = new StreamWriter(filePath))
        using (var csv = new CsvWriter(currentYearTextWriter))
        {
            csv.WriteRecords(articlesInfos.Where(x => x.PublishDate.Year.Equals(year)).OrderByDescending(x => x.Views));
        }
    }
    private static List<Article> GetAllArticlesInfos()
    {
        var articlesInfos = new List<Article>();
        using (var driver = new ChromeDriver())
        {
            var articlePage = new ArticlesPage(driver, profileId);
            articlesInfos.AddRange(articlePage.GetArticlesByUrl("#Articles"));
        }
        using (var driver = new ChromeDriver())
        {
            var articlePage = new ArticlesPage(driver, profileId);
            articlesInfos.AddRange(articlePage.GetArticlesByUrl("#TechnicalBlog"));
        }
        using (var driver = new ChromeDriver())
        {
            var articlePage = new ArticlesPage(driver, profileId);
            articlesInfos.AddRange(articlePage.GetArticlesByUrl("#Tip"));
        }
        return articlesInfos;
    }
}

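There are a few important parts in the code above.

Arguments Parser

For parsing the arguments, I used the FluentCommandLineParser NuGet package. The code below defines which logic is executed when a particular argument is specified.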

var commandLineParser = new FluentCommandLineParser();
commandLineParser.Setup<string>('p', "path").Callback(s => filePath = s);
commandLineParser.Setup<string>('y', "year").Callback(y => yearInput = y);
commandLineParser.Setup<string>('i', "profileId").Callback(p => profileIdInput = p);
commandLineParser.Parse(args);

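CSV Utility

I think the easiest way to create a CSV file from the list of article data is to use the CsvHelper NuGet package. We first create a TextWriter that points to the CSV file path and then create the CsvWriter object. Finally, we call the WriteRecords method.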

private static void CreateReportAllTime()
{
    // Dispose the writers so that buffered rows are flushed to the file.
    using (TextWriter textWriter = new StreamWriter(filePath))
    using (var csv = new CsvWriter(textWriter))
    {
        csv.WriteRecords(articlesInfos.OrderByDescending(x => x.Views));
    }
}
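To show the shape of the file that such a report ends up with, the sketch below builds comparable CSV text using only the base class library. It is not how CsvHelper works internally and does not use its API. The ReportRow type, the column names, the date format, and the quoting rule are all assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical row type with the four report columns.
public class ReportRow
{
    public double Views { get; set; }
    public string Title { get; set; }
    public DateTime PublishDate { get; set; }
    public string Url { get; set; }
}

public static class CsvSketch
{
    // Quote a field when it contains a comma, a quote, or a line break.
    private static string Escape(string field)
    {
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Header line first, then one line per row, most viewed first.
    public static string ToCsv(IEnumerable<ReportRow> rows)
    {
        var builder = new StringBuilder();
        builder.Append("Views,Title,PublishDate,Url\n");
        foreach (var row in rows.OrderByDescending(r => r.Views))
        {
            builder.Append(string.Join(",", new[]
            {
                row.Views.ToString(CultureInfo.InvariantCulture),
                Escape(row.Title),
                row.PublishDate.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture),
                Escape(row.Url),
            }));
            builder.Append('\n');
        }
        return builder.ToString();
    }

    public static void Main()
    {
        // Made-up sample rows.
        var rows = new List<ReportRow>
        {
            new ReportRow { Views = 10, Title = "Hello, World", PublishDate = new DateTime(2017, 1, 18), Url = "http://example.com/a" },
            new ReportRow { Views = 25, Title = "Plain title", PublishDate = new DateTime(2016, 5, 2), Url = "http://example.com/b" },
        };
        Console.Write(ToCsv(rows));
    }
}
```

A library such as CsvHelper handles these quoting details for you, which is the main reason to prefer it over hand-written string building.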

All images are purchased from DepositPhotos.com and cannot be downloaded and used for free.