GUIMiner: Fast Data Entry Using Association Rules

Paolo Parise

7 Aug 2013

CPOL

Fast data entry using association rules.

Introduction

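Data entry can be a tedious and time-consuming experience for end users. This article discusses how the well-known association rules [AGRAWAL93] can be used to suggest "frequent" values based on what the user has entered so far. It also describes an implementation of the technique in an n-tier scenario and closes with a case study on healthcare drug prescriptions.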

Background

We introduce the idea behind association rules through an example and refer the reader to [AGRAWAL93] for a formal treatment.

Consider the following table T:

Attribute1	Attribute2	Attribute3
a	b	c
a	b	c
a	b	d
a	d	d
a	f	d

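We say that the rule Attribute1[a], Attribute2[b] → Attribute3[c] is satisfied in T with support = 2/5 and confidence = 2/3. The support of a rule is the fraction of transactions in T that satisfy the union of the rule's antecedent and consequent; here, two of the five rows contain a, b, and c. The rule has confidence factor c, with 0 ≤ c ≤ 1, if at least a fraction c of the transactions that satisfy the antecedent also satisfy the consequent; here, three rows contain a and b, and two of them also contain c. In plain English: "if Attribute1 = a and Attribute2 = b then, with a certain probability, Attribute3 = c". Several algorithms have been introduced for mining association rules, such as Apriori [AGRAWAL94].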
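To make the two measures concrete, here is a minimal sketch that computes them over table T. The `RuleMetrics` class and its method names are assumptions made for this illustration and do not appear in the GUIMiner source. Support counts the rows that satisfy both the antecedent and the consequent, as in the definition above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; it is not part of the GUIMiner source.
public static class RuleMetrics
{
    // Table T from the text: each row maps an attribute name to its value.
    public static readonly List<Dictionary<string, string>> TableT =
        new List<Dictionary<string, string>>
        {
            Row("a", "b", "c"),
            Row("a", "b", "c"),
            Row("a", "b", "d"),
            Row("a", "d", "d"),
            Row("a", "f", "d"),
        };

    private static Dictionary<string, string> Row(string v1, string v2, string v3)
    {
        return new Dictionary<string, string>
        {
            { "Attribute1", v1 }, { "Attribute2", v2 }, { "Attribute3", v3 }
        };
    }

    // True when the row contains every attribute/value pair in items.
    private static bool Satisfies(Dictionary<string, string> row,
                                  Dictionary<string, string> items)
    {
        return items.All(kv => row.TryGetValue(kv.Key, out string v) && v == kv.Value);
    }

    // Fraction of all rows that satisfy both the antecedent and the consequent.
    public static double Support(List<Dictionary<string, string>> table,
                                 Dictionary<string, string> antecedent,
                                 Dictionary<string, string> consequent)
    {
        int both = table.Count(r => Satisfies(r, antecedent) && Satisfies(r, consequent));
        return (double)both / table.Count;
    }

    // Fraction of the rows satisfying the antecedent that also satisfy the consequent.
    public static double Confidence(List<Dictionary<string, string>> table,
                                    Dictionary<string, string> antecedent,
                                    Dictionary<string, string> consequent)
    {
        int matching = table.Count(r => Satisfies(r, antecedent));
        if (matching == 0) return 0.0; // no row satisfies the antecedent
        int both = table.Count(r => Satisfies(r, antecedent) && Satisfies(r, consequent));
        return (double)both / matching;
    }
}
```

Calling `Support` and `Confidence` on `RuleMetrics.TableT` with the antecedent {Attribute1 = a, Attribute2 = b} and the consequent {Attribute3 = c} counts two matching rows out of five for the support and two out of three for the confidence.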

Using the code

Server side

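This tier contains the service contract declaration (IService), its implementation (Service), a set of lightweight objects that are serialized during client-server communication (DTO), and the persistence layer used to query the mined set of rules (RuleSet).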

ServiceLayer.IService: defines the service contract.

namespace ServiceLayer
{
    public interface IService
    {
        List<DTO.AttributeDTO> minedConsequents(List<DTO.AttributeDTO> 
                  antecedents, List<DTO.AttributeDTO> consequents);
    }
}

ServiceLayer.Service: the implementation of the service call.

namespace ServiceLayer
{
    public class Service : IService
    {
        public List<DTO.AttributeDTO> minedConsequents(List<DTO.AttributeDTO> 
                   antecedents, List<DTO.AttributeDTO> consequents)
        {
            var ants = Dtos_to_Attributes(antecedents);
            var cons = Dtos_to_Attributes(consequents);
            var tmp = PersistenceLayer.RuleDAO.getConsequents(ants, cons);
            return Attributes_to_Dtos(tmp);
        }
    } 
}

DTO.AttributeDTO: the data transfer object used to serialize mined rule information to the client and deserialize it from the client.

namespace DTO
{
    public class AttributeDTO
    {
        public string AttributeName { get; private set; }
        public string AttributeValue { get; set; }
 
        public AttributeDTO(string name = null, string value = null)
        {
            AttributeName = name;
            AttributeValue = value;
        }
    }
}

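PersistenceLayer.RuleDAO: the implementation of the data access object that queries the RuleSet. In short, the call queries the RuleSet and returns the consequents of all rules that have the specified antecedent.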

namespace PersistenceLayer
{
    public class RuleDAO
    {
        public static List<Entities.Attribute> getConsequents(
          List<Entities.Attribute> antecedents, List<Entities.Attribute> consequents = null)
        {
            ...
        }
    }
}
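The body of getConsequents is elided above. As a rough idea of what such a lookup could look like, the following sketch runs the exact-match query against an in-memory rule list. The `RuleAttribute`, `MinedRule`, and `InMemoryRuleDao` types are hypothetical stand-ins invented for this example, and ordering the results by confidence is an assumption rather than something the source states.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Entities.Attribute.
public class RuleAttribute
{
    public string Name;
    public string Value;
    public RuleAttribute(string name, string value) { Name = name; Value = value; }
}

// Hypothetical in-memory representation of one mined rule.
public class MinedRule
{
    public List<RuleAttribute> Antecedents = new List<RuleAttribute>();
    public RuleAttribute Consequent;
    public double Confidence;
}

public static class InMemoryRuleDao
{
    // Returns the consequents of every rule whose antecedent equals the given
    // antecedents. When consequents is non-null, only rules that predict one of
    // those attribute names are kept.
    public static List<RuleAttribute> GetConsequents(
        IEnumerable<MinedRule> ruleSet,
        List<RuleAttribute> antecedents,
        List<RuleAttribute> consequents = null)
    {
        var input = new HashSet<string>(antecedents.Select(a => Key(a)));
        var wanted = consequents == null
            ? null
            : new HashSet<string>(consequents.Select(c => c.Name));

        return ruleSet
            .Where(r => input.SetEquals(r.Antecedents.Select(a => Key(a))))
            .Where(r => wanted == null || wanted.Contains(r.Consequent.Name))
            .OrderByDescending(r => r.Confidence) // assumption: best rule first
            .Select(r => r.Consequent)
            .ToList();
    }

    // Encodes an attribute/value pair so that sets of pairs can be compared.
    private static string Key(RuleAttribute a)
    {
        return a.Name + "=" + a.Value;
    }
}
```

A linear scan like this is only reasonable for small rule sets; a larger RuleSet would probably call for an index keyed on the antecedent.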

Storage

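No particular assumption is made about the production database model. A relational database is the usual choice, but other approaches, such as object-relational databases and the more recent NoSQL models, can also be considered.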

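Data warehouse: the ETL phase. Transformations are performed to produce an input (the Datamart) suitable for association rule mining with Apriori. As with any ETL extraction, decisions about how often and at what time of day to run it, which input dataset to consider, and any other policy depend on the scenario at hand.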

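RuleSet.xml: the output of an Apriori run. The RuleSet can be stored in any format accepted by the data access layer. A custom XML representation was chosen here, but more common formats such as XRFF or ARFF could be used instead.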

<?xml version="1.0" encoding="utf-8" ?>
<rules> 
<rule support="2" confidence="60">
<antecedent attributeName = "attribute1" value ="10"/>
<antecedent attributeName = "attribute2" value ="20"/>
<consequent attributeName = "attribute3" value ="30"/>
</rule>
...
</rules>
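A file in this format can be read with LINQ to XML. The sketch below is one possible reader, not the one shipped with the article: `ParsedRule` and `RuleSetReader` are names made up for the example, and the support and confidence attributes are read as plain numbers because the source does not state their units.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical container for one <rule> element.
public class ParsedRule
{
    public double Support;     // raw attribute value; units are not stated in the source
    public double Confidence;  // raw attribute value; units are not stated in the source
    public List<KeyValuePair<string, string>> Antecedents;
    public List<KeyValuePair<string, string>> Consequents;
}

public static class RuleSetReader
{
    // Parses the custom RuleSet XML shown above into a list of rules.
    public static List<ParsedRule> Parse(string xml)
    {
        return XDocument.Parse(xml).Root
            .Elements("rule")
            .Select(r => new ParsedRule
            {
                Support = (double)r.Attribute("support"),
                Confidence = (double)r.Attribute("confidence"),
                Antecedents = Pairs(r, "antecedent"),
                Consequents = Pairs(r, "consequent")
            })
            .ToList();
    }

    // Collects the (attributeName, value) pairs of the child elements with the given name.
    private static List<KeyValuePair<string, string>> Pairs(XElement rule, string elementName)
    {
        return rule.Elements(elementName)
            .Select(e => new KeyValuePair<string, string>(
                (string)e.Attribute("attributeName"),
                (string)e.Attribute("value")))
            .ToList();
    }
}
```

The reader throws if a rule lacks the support or confidence attribute, which seems a reasonable way to surface a malformed RuleSet early.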

Client side

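GUI.Form1.gui_KeyPress: this event is raised when the user performs a keystroke-like action, such as typing a character into a field.
GUI.Form1._bindingGUIAttributes: this in-memory data structure binds each GUI component to its associated attribute.
GUI.Form1.setMinedAttributes(): this method is called every time the gui_KeyPress event is raised. Given the user's current input (the antecedent), it queries GUI.Proxy for the corresponding consequents (which correspond to the fields not yet filled in) and refreshes the associated fields accordingly.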

namespace GUI
{
    public partial class Form1 : Form
    {
        Dictionary<string, BindedValue> _bindingGUIAttributes;
        
        public Form1()
        {
            ...
            _bindingGUIAttributes = new Dictionary<string, BindedValue>();
            _bindingGUIAttributes.Add(textBox1.Name, 
              new BindedValue(new DTO.AttributeDTO("Drug", null),true));
            _bindingGUIAttributes.Add(textBox2.Name, 
              new BindedValue(new DTO.AttributeDTO("Route", null), true));
            _bindingGUIAttributes.Add(textBox3.Name, 
              new BindedValue(new DTO.AttributeDTO("Form", null), true));
            _bindingGUIAttributes.Add(textBox4.Name, 
              new BindedValue(new DTO.AttributeDTO("Dose", null), true));
        }
 
        private void setMinedAttributes()
        {
            List<DTO.AttributeDTO> antecedents = this.getNonMineableAttributes();
            List<DTO.AttributeDTO> consequents = this.getMineableAttributes();
            var minedConsequents = new GUI.Proxy().minedConsequents(antecedents, consequents);
            ...
            foreach (var cons in consequents)
            {
                ...
                foreach (var mined in minedConsequents)
                {
                    ...
                    updateAttributeValue(cons, mined.AttributeValue);
                }
            }
        }
 
        private void gui_KeyPress(object sender, KeyPressEventArgs e)
        {
            ...
            setMinedAttributes();
        }
 
        class BindedValue
        {
            public DTO.AttributeDTO attribute;
            public bool isMineable;
 
            public BindedValue(DTO.AttributeDTO attr, bool mineable)
            {
                attribute = attr;
                this.isMineable = mineable;
            }
        }
    }
}

GUI.Proxy: manages the service call mechanism on the client side and shares the IService contract with the server-side ServiceLayer.

namespace GUI
{
    class Proxy : ServiceLayer.IService
    {
        public List<DTO.AttributeDTO> minedConsequents(List<DTO.AttributeDTO> 
                    antecedents, List<DTO.AttributeDTO> consequents) 
        {
            var service = new ServiceLayer.Service();
            return service.minedConsequents(antecedents, consequents);
        }
    }
}

Case study

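Consider a data entry task in which a physician uses GUIMiner to submit a drug prescription consisting of a drug, a route of administration, a dosage form, and a dose, and has a specific desired prescription in mind.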

Through an ETL procedure and a suitably parameterized Apriori run (support and confidence), the following RuleSet is mined:

[Drug]en → [Route]oral

[Drug]en, [Route]oral, [Form]t → [Dose]3

The physician opens GUIMiner and starts entering the new drug prescription described above, beginning with the name of the drug.

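Behind the scenes, the gui_KeyPress event is raised and the system checks whether the current input matches the antecedent of any rule in the RuleSet. No match is found yet, so no suggestion is returned to GUIMiner. The physician then types the second letter of the drug name. Again the gui_KeyPress event is raised and the system looks for a rule. This time the rule [Drug]en → [Route]oral is found, and its consequent, [Route]oral, is returned wrapped in an AttributeDTO instance. GUIMiner automatically fills in the corresponding field, possibly marking it graphically as a "suggested value".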

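Because "oral" is the correct route for this prescription, the physician can skip typing it and move on to the dosage form. The system checks whether the current input ([Drug]en, [Route]oral, [Form]t) is the antecedent of some rule. Because the rule [Drug]en, [Route]oral, [Form]t → [Dose]3 is found, the corresponding consequent is returned and GUIMiner automatically fills in the dose field.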

Finally, the physician replaces this last mined value with the desired one, Dose = 2, and submits the prescription.

Conclusions

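What is shown above is a first sketch of the idea, but in the real world the devil is in the details. To give you a sense of the issues you may run into, here are a few considerations in no particular order.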

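If the service call is a bottleneck, a client-side caching approach may be the right choice. In the solution presented here, a remote call is made every time the user presses a key. To avoid these remote calls, a client-side cache can be implemented behind GUI.Proxy. Different caching policies can also be adopted, such as expiration times, downloading only the subset of rules that involve the attributes present in the GUI, and rule accumulation strategies.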
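One way such a cache could look is sketched below, using the expiration-time policy. Everything here is an assumption made for illustration: `Pair` stands in for DTO.AttributeDTO, `CachingLookup` is a hypothetical class, and the remote call is passed in as a delegate so that the sketch stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for DTO.AttributeDTO.
public class Pair
{
    public string Name;
    public string Value;
    public Pair(string name, string value) { Name = name; Value = value; }
}

// Caches lookup results for a fixed time so repeated inputs skip the remote call.
public class CachingLookup
{
    private class Entry
    {
        public DateTime StoredAt;
        public List<Pair> Result;
    }

    private readonly Func<List<Pair>, List<Pair>, List<Pair>> _remote;
    private readonly TimeSpan _timeToLive;
    private readonly Dictionary<string, Entry> _cache = new Dictionary<string, Entry>();

    // Number of times the remote delegate was actually invoked.
    public int RemoteCalls { get; private set; }

    public CachingLookup(Func<List<Pair>, List<Pair>, List<Pair>> remote, TimeSpan timeToLive)
    {
        _remote = remote;
        _timeToLive = timeToLive;
    }

    public List<Pair> MinedConsequents(List<Pair> antecedents, List<Pair> consequents)
    {
        string key = BuildKey(antecedents, consequents);
        Entry hit;
        if (_cache.TryGetValue(key, out hit) && DateTime.UtcNow - hit.StoredAt < _timeToLive)
            return hit.Result; // fresh entry: no remote call

        RemoteCalls++;
        List<Pair> result = _remote(antecedents, consequents);
        _cache[key] = new Entry { StoredAt = DateTime.UtcNow, Result = result };
        return result;
    }

    // Builds an order-independent key from the antecedent pairs and the
    // names of the requested consequent attributes.
    private static string BuildKey(List<Pair> antecedents, List<Pair> consequents)
    {
        var ants = antecedents
            .OrderBy(a => a.Name, StringComparer.Ordinal)
            .Select(a => a.Name + "=" + a.Value);
        var cons = consequents
            .OrderBy(c => c.Name, StringComparer.Ordinal)
            .Select(c => c.Name);
        return string.Join("&", ants) + "->" + string.Join(",", cons);
    }
}
```

In GUIMiner the delegate would wrap the existing service call. The sketch is not thread-safe and never evicts entries, so a production version would need both.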

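Another interesting aspect is graphical feedback that tells the user whether a field holds explicit user input or a mined value. In the solution presented here, a different border style is rendered to show that a field was mined. On the next keystroke, the border style reverts to the default.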

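Good news and bad news: running Apriori can be expensive, but a RuleSet mined from a large amount of data only changes after a large number of insert/delete/update operations on the database. You should therefore find a good balance between keeping the RuleSet up to date and saving computational resources for other business tasks.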

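Using syntactic constraints: the call getConsequents(List<DTO.AttributeDTO> ants, List<DTO.AttributeDTO> cons = null) considers {X→I_j ∈ RuleSet | ants = X and I_j ∈ cons}. Other queries, such as {X→I_j ∈ RuleSet | ants ⊆ X and I_j ∈ cons}, may also work well, provided you keep in mind that confidence is not monotone in the antecedent: confidence(AB→C) can be higher or lower than confidence(A→C), so the matching rules should be ranked by their own confidence.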
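The relaxed query {X→I_j ∈ RuleSet | ants ⊆ X and I_j ∈ cons} can be sketched as follows. `CandidateRule` and `SubsetQuery` are hypothetical names introduced for the example. Because several rules may then propose a value for the same field, the sketch keeps only the highest-confidence rule per consequent attribute, which is a design choice of this example rather than something the source prescribes.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical rule representation; antecedent items are encoded as "Name=Value".
public class CandidateRule
{
    public HashSet<string> Antecedent;
    public string ConsequentName;
    public string ConsequentValue;
    public double Confidence;
}

public static class SubsetQuery
{
    // Returns, for each requested consequent attribute, the highest-confidence
    // rule whose antecedent X contains every item the user has entered (ants ⊆ X).
    public static List<CandidateRule> Match(
        IEnumerable<CandidateRule> ruleSet,
        ICollection<string> ants,
        ICollection<string> consequentNames)
    {
        return ruleSet
            .Where(r => r.Antecedent.IsSupersetOf(ants))
            .Where(r => consequentNames.Contains(r.ConsequentName))
            .GroupBy(r => r.ConsequentName)
            .Select(g => g.OrderByDescending(r => r.Confidence).First())
            .ToList();
    }
}
```

With rules {Drug=en} → Route=oral at confidence 0.6 and {Drug=en, Form=t} → Route=iv at confidence 0.9 (made-up numbers), the input {Drug=en} matches both, and the sketch returns the second one.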

References

AGRAWAL93: Rakesh Agrawal, Tomasz Imielinski, Arun Swami. Mining Association Rules between Sets of Items in Large Databases. 1993.
AGRAWAL94: Rakesh Agrawal, Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules. 1994.