65.9K
CodeProject 正在变化。 阅读更多。
Home

转换 NorthWind

starIconstarIconstarIconstarIcon
emptyStarIcon
starIcon

4.95/5 (29投票s)

2013年9月25日

GPL3

7分钟阅读

viewsIcon

59906

downloadIcon

341

结合反规范化、转换、复制和精彩内容。

引言

Transformalize 旨在近乎实时地转换和反规范化关系数据。生成的数据可以用于多种方式:

  • 作为 OLAP 多维数据集的数据源
  • 用于填充 SOLR 或 Elasticsearch 索引
  • 提供更快、更简单、非阻塞的常规 SQL 查询和报表访问
  • 或者,发挥您的想象力...(例如,用于填充 Redis 缓存服务器,加载 NoSql 数据库等)

Transformalize 是一个开源的 .NET 4.5 类库。它可以直接在代码中引用和运行,或者使用包含的控制台应用程序运行。其源代码托管在 GitHub 上。

演示

从 XML 配置开始

<transformalize>
	<processes>
		<add name="NorthWind">
		    <connections>
		        <add name="input" />
		        <add name="output" />
		    </connections>
		    <entities/>
		    <relationships/>
		</add>
	</processes>
</transformalize>

指定 NorthWind 数据库作为输入,另一个作为输出。默认的输入和输出连接提供程序是 SQL Server 2005+。

<connections>
    <add name="input" connection-string="server=localhost;Database=NorthWind;Trusted_Connection=True;"/>
    <add name="output" connection-string="server=localhost;Database=NorthWindOutput;Trusted_Connection=True;"/>
</connections>

NorthWind 数据库架构

NorthWind Schema

添加一个 Order Details 实体并将文件保存为NorthWind.xml

<entities>
    <add name="Order Details"/>
</entities> 

使用控制台应用程序,以“metadata”模式运行 Transformalize(又名 tfl)。

tfl NorthWind.xml {'mode':'metadata'}

Metadata 模式读取数据库的信息架构。然后,它会写入并打开一个包含 Order Detail 主键和字段定义的 XML 文件。将它们复制到NorthWind.xml

<entities>
    <add name="Order Details">
        <fields>
            <add name="OrderID" type="System.Int32" primary-key="true" />
            <add name="ProductID" type="System.Int32" primary-key="true" />
            <add name="Discount" type="System.Single" />
            <add name="Quantity" type="System.Int16" />
            <add name="UnitPrice" type="System.Decimal" precision="19" scale="4"/>
        </fields>
    </add>
</entities>  

现在,以 Initialize 模式运行 Transformalize。

tfl NorthWind.xml {'mode':'init'}
23:38:57 | Info | NorthWind | All | Initialized TrAnSfOrMaLiZeR.
23:38:57 | Info | NorthWind | All | Initialized NorthWindOrderDetails in NorthWindOutput on localhost.
23:38:57 | Info | NorthWind | All | Process completed in 00:00:00.5585967.

Initialize 模式初始化输出,准备存储数据的地方。现在运行 Tfl 而不指定模式。

tfl NorthWind.xml
23:43:01 | Info | NorthWind | Order Details....... | Processed 2155 inserts, and 0 updates in Order Details.
23:43:01 | Info | NorthWind | Order Details....... | Process completed in 00:00:00.7455880.

Transformalize 复制了 Northwind.xml 中配置的数据。如果我们再次运行它,就会发生这种情况:

tfl NorthWind.xml
23:44:18 | Info | NorthWind | Order Details....... | Processed 0 inserts, and 2155 updates in Order Details.
23:44:18 | Info | NorthWind | Order Details....... | Process completed in 00:00:01.0926105. 

它会更新数据。它会复制新数据并更新现有数据,但这效率低下。源中的 2155 条记录未被修改,但在目标中却不必要地更新了。因此,我们需要向 Order Details 实体添加一个version列。版本列应该是一个在记录插入或更新时会递增的值。方便的是,SQL Server 提供了 ROWVERSION 类型,可以在不修改应用程序或添加触发器的情况下提供版本列。

ALTER TABLE [Order Details] ADD RowVersion ROWVERSION; 

更新 Order Details 实体以使用 RowVersion:

<entities>
    <add name="Order Details" version="RowVersion">
        <fields>
            <add name="OrderID" type="System.Int32" primary-key="true" />
            <add name="ProductID" type="System.Int32" primary-key="true" />
            <add name="Discount" type="System.Single" />
            <add name="Quantity" type="System.Int16" />
            <add name="RowVersion" type="System.Byte[]" length="8" />
            <add name="UnitPrice" type="System.Decimal" precision="19" scale="4"/>
        </fields>
    </add>
</entities> 

重新初始化并运行两次:

tfl NorthWind.xml {'mode':'init'}
23:58:52 | Info | NorthWind | All | Initialized TrAnSfOrMaLiZeR.
23:58:52 | Info | NorthWind | All | Initialized NorthWindOrderDetails in NorthWindOutput on localhost.
23:58:52 | Info | NorthWind | All | Process completed in 00:00:00.5504415.

tfl NorthWind.xml
00:00:18 | Info | NorthWind | Order Details....... | Processed 2155 inserts, and 0 updates in Order Details.
00:00:18 | Info | NorthWind | Order Details....... | Process completed in 00:00:00.7417452.

tfl NorthWind.xml
00:00:23 | Info | NorthWind | Order Details....... | Processed 0 inserts, and 0 updates in Order Details.
00:00:23 | Info | NorthWind | Order Details....... | Process completed in 00:00:00.6042720.

现在它不会不必要地更新数据。它使用版本字段来感知数据未被更新。让我们查看输出。

SELECT TOP 10 Discount, OrderID, ProductID, Quantity, UnitPrice
FROM NorthWindStar;
Discount      OrderID     ProductID   Quantity UnitPrice
------------- ----------- ----------- -------- ---------
0.2           10248       11          12       14.0000
0             10248       42          10       9.8000
0             10248       72          5        34.8000
0             10249       14          9        18.6000
0             10249       51          40       42.4000
0             10250       41          10       7.7000
0.15          10250       51          35       42.4000
0.15          10250       65          15       16.8000
0.05          10251       22          6        16.8000
0.05          10251       57          15       15.6000

查看 NorthWind 图。与 Order Details 最近的表是 OrdersProducts。添加 Orders 实体。提示:添加实体 <add name="Orders"/> 并以元数据模式运行 Tfl。

<add name="Orders" version="RowVersion">
    <fields>
        <add name="OrderID" type="System.Int32" primary-key="true" ></add>
        <add name="Discount" type="System.Single" ></add>
        <add name="Quantity" type="System.Int16" ></add>
        <add name="RowVersion" type="System.Byte[]" length="8" ></add>
        <add name="UnitPrice" type="System.Decimal" precision="19" scale="4"></add>
        <add name="CustomerID" type="System.Char" length="5" ></add>
        <add name="EmployeeID" type="System.Int32" ></add>
        <add name="Freight" type="System.Decimal" precision="19" scale="4"></add>
        <add name="OrderDate" type="System.DateTime" ></add>
        <add name="RequiredDate" type="System.DateTime" ></add>
        <add name="RowVersion" type="System.Byte[]" length="8" ></add>
        <add name="ShipAddress" length="60" ></add>
        <add name="ShipCity" length="15" ></add>
        <add name="ShipCountry" length="15" ></add>
        <add name="ShipName" length="40" ></add>
        <add name="ShippedDate" type="System.DateTime" ></add>
        <add name="ShipPostalCode" length="10" ></add>
        <add name="ShipRegion" length="15" ></add>
        <add name="ShipVia" type="System.Int32" ></add>
    </fields>
</add> 

重新初始化。

tfl NorthWind.xml {'mode':'init'}
22:32:14 | Error | NorthWind | The entity Orders must have a relationship to the master entity Order Details.

添加另一个表时,它必须与主表相关。主表是定义的第一个表。在这种情况下,它是 Order Details。因此,我们必须添加一个关系。

</entities>
    <relationships>
        <add left-entity="Order Details" left-field="OrderID" 
                right-entity="Orders" right-field="OrderID"/>
    </relationships>
</process> 

重新初始化。

tfl NorthWind.xml {'mode':'init'}
23:13:31 | Error | NorthWind | field overlap error in Orders. The field: RowVersion is already defined in a previous entity.  You must alias (rename) it.

就像在 SQL 视图中一样,多个实体(或表)连接在一起可能会引入相同的字段名称。因此,您必须重命名(或别名)任何名称相同的列。在这种情况下,它是我们用于检测更改的 RowVersion 列。因此,将 Orders 实体中的 RowVersion 别名为 OrdersRowVersion,如下所示:

<add name="Orders" version="RowVersion">
	<fields>
		<!-- ... -->
		<add name="RowVersion" alias="OrdersRowVersion" type="System.Byte[]" length="8" />
	</fields>
</add>

重新初始化并运行两次。

tfl NorthWind.xml {'mode':'init'}
23:23:47 | Info | NorthWind | All | Initialized TrAnSfOrMaLiZeR.
23:23:47 | Info | NorthWind | All | Initialized NorthWindOrderDetails in NorthWindOutput on localhost.
23:23:47 | Info | NorthWind | All | Initialized NorthWindOrders in NorthWindOutput on localhost.
23:23:47 | Info | NorthWind | All | Process completed in 00:00:00.6609756.

tfl NorthWind.xml
23:24:30 | Info | NorthWind | Order Details....... | Processed 2155 inserts, and 0 updates in Order Details.
23:24:30 | Info | NorthWind | Orders.............. | Processed 830 inserts, and 0 updates in Orders.
23:24:30 | Info | NorthWind | Orders.............. | Process completed in 00:00:00.9719255.

tfl NorthWind.xml
23:24:35 | Info | NorthWind | Order Details....... | Processed 0 inserts, and 0 updates in Order Details.
23:24:35 | Info | NorthWind | Orders.............. | Processed 0 inserts, and 0 updates in Orders.
23:24:35 | Info | NorthWind | Orders.............. | Process completed in 00:00:00.7284382.

查看输出

SELECT TOP 10 Discount, OrderID, ProductID, Quantity, UnitPrice, 
  CustomerID, EmployeeID, Freight, OrderDate, RequiredDate, 
  ShipAddress, ShipCity, ShippedDate, ShipPostalCode, ShipRegion, ShipVia
FROM NorthWindStar;
Discount OrderID   ProductID   Quantity UnitPrice  CustomerID EmployeeID  Freight    OrderDate  RequiredDate ShipAddress               ShipCity        ShippedDate ShipPostalCode ShipRegion ShipVia
-------- --------- ----------- -------- ---------  ---------- ----------- --------   ---------- ------------ ------------------------- --------------- ----------- -------------- ---------- -----------
0.2      10248     11          12       14.0000    VINET      5           32.3800    1996-07-04 1996-08-01   59 rue de l'Abbaye    Reims           1996-07-16  51100                     3
0        10248     42          10       9.8000     VINET      5           32.3800    1996-07-04 1996-08-01   59 rue de l'Abbaye    Reims           1996-07-16  51100                     3
0        10248     72          5        34.8000    VINET      5           32.3800    1996-07-04 1996-08-01   59 rue de l'Abbaye    Reims           1996-07-16  51100                     3
0        10249     14          9        18.6000    TOMSP      6           11.6100    1996-07-05 1996-08-16   Luisenstr. 48             Münster    1996-07-10  44087                     1
0        10249     51          40       42.4000    TOMSP      6           11.6100    1996-07-05 1996-08-16   Luisenstr. 48             Münster    1996-07-10  44087                     1
0        10250     41          10       7.7000     HANAR      4           65.8300    1996-07-08 1996-08-05   Rua do Paço, 67    Rio de Janeiro  1996-07-12  05454-876      RJ         2
0.15     10250     51          35       42.4000    HANAR      4           65.8300    1996-07-08 1996-08-05   Rua do Paço, 67    Rio de Janeiro  1996-07-12  05454-876      RJ         2
0.15     10250     65          15       16.8000    HANAR      4           65.8300    1996-07-08 1996-08-05   Rua do Paço, 67    Rio de Janeiro  1996-07-12  05454-876      RJ         2
0.05     10251     22          6        16.8000    VICTE      3           41.3400    1996-07-08 1996-08-05   2, rue du Commerce        Lyon            1996-07-15  69004                     1
0.05     10251     57          15       15.6000    VICTE      3           41.3400    1996-07-08 1996-08-05   2, rue du Commerce        Lyon            1996-07-15  69004                     1

现在,重复执行。也就是说,查阅 NorthWind 图并继续添加相关实体,直到关系配置如下所示:

<relationships>
    <add left-entity="Order Details" left-field="OrderID" right-entity="Orders" right-field="OrderID" />
    <add left-entity="Order Details" left-field="ProductID" right-entity="Products" right-field="ProductID" />
    <add left-entity="Orders" left-field="CustomerID" right-entity="Customers" right-field="CustomerID" />
    <add left-entity="Orders" left-field="EmployeeID" right-entity="Employees" right-field="EmployeeID" />
    <add left-entity="Orders" left-field="ShipVia" right-entity="Shippers" right-field="ShipperID" />
    <add left-entity="Products" left-field="SupplierID" right-entity="Suppliers" right-field="SupplierID" />
    <add left-entity="Products" left-field="CategoryID" right-entity="Categories" right-field="CategoryID" />
</relationships>

正如您所料,添加所有这些实体会产生许多重复的字段名称。与其重命名每个字段,不如为实体添加前缀。前缀将所有字段别名化为 prefix + name。

<add name="Employees" version="RowVersion" prefix="Employee">
    <fields>
		<!-- ... -->
    </fields>
</add>

初始化并运行两次。控制台输出应如下所示:

tfl NorthWind.xml {'mode':'init'}
19:41:53 | Info | NorthWind | All | Initialized TrAnSfOrMaLiZeR.
19:41:53 | Info | NorthWind | All | Initialized NorthWindOrderDetails in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindOrders in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindProducts in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindCustomers in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindEmployees in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindShippers in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindSuppliers in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindCategories in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Process completed in 00:00:01.1828232.

tfl NorthWind.xml
19:42:06 | Info | NorthWind | Order Details....... | Processed 2155 inserts, and 0 updates in Order Details.
19:42:07 | Info | NorthWind | Orders.............. | Processed 830 inserts, and 0 updates in Orders.
19:42:07 | Info | NorthWind | Products............ | Processed 77 inserts, and 0 updates in Products.
19:42:07 | Info | NorthWind | Customers........... | Processed 91 inserts, and 0 updates in Customers.
19:42:07 | Info | NorthWind | Employees........... | Processed 9 inserts, and 0 updates in Employees.
19:42:07 | Info | NorthWind | Shippers............ | Processed 3 inserts, and 0 updates in Shippers.
19:42:07 | Info | NorthWind | Suppliers........... | Processed 29 inserts, and 0 updates in Suppliers.
19:42:07 | Info | NorthWind | Categories.......... | Processed 8 inserts, and 0 updates in Categories.
19:42:07 | Info | NorthWind | Orders.............. | Processed 2155 rows. Updated Order Details with Orders.
19:42:07 | Info | NorthWind | Products............ | Processed 2155 rows. Updated Order Details with Products.
19:42:07 | Info | NorthWind | All................. | Process completed in 00:00:01.2583563.

tfl NorthWind.xml
19:42:13 | Info | NorthWind | Order Details....... | Processed 0 inserts, and 0 updates in Order Details.
19:42:13 | Info | NorthWind | Orders.............. | Processed 0 inserts, and 0 updates in Orders.
19:42:13 | Info | NorthWind | Products............ | Processed 0 inserts, and 0 updates in Products.
19:42:13 | Info | NorthWind | Customers........... | Processed 0 inserts, and 0 updates in Customers.
19:42:13 | Info | NorthWind | Employees........... | Processed 0 inserts, and 0 updates in Employees.
19:42:13 | Info | NorthWind | Shippers............ | Processed 0 inserts, and 0 updates in Shippers.
19:42:13 | Info | NorthWind | Suppliers........... | Processed 0 inserts, and 0 updates in Suppliers.
19:42:13 | Info | NorthWind | Categories.......... | Processed 0 inserts, and 0 updates in Categories.
19:42:13 | Info | NorthWind | All................. | Process completed in 00:00:00.7708553.

现在输出 NorthWindStar 中有 81 个可用字段。

SELECT COUNT(*) AS FieldCount
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'NorthWindStar'
FieldCount
-----------
81

Northwind Star Schema

如上图所示,我并没有完全反规范化数据。相反,我创建了一个 星型模型,其中每个相关表都与主表有一个连接。此外,我还创建了一个视图(例如 NorthWindStar),以便用户可以像查询单个表一样查询它。拥有此重复数据的单一视图使我们能够快速创建简单的多维数据集或与搜索引擎集成:

利用 SQL Server Analysis Services

打开 BIDS,创建一个 Analysis Services 项目(或多维数据集)来浏览数据。

  • 将数据源设置为您的 NorthWindOutput 数据库。
  • 将数据视图设置为 NorthWindStar 视图。
  • 创建一个使用视图中所有字段(二进制字段除外)的维度。称之为 Properties。
  • 创建一个包含几个度量(例如 UnitPrice、Quantity)的多维数据集,并将其连接到 Properties 维度。
  • 处理多维数据集并进行浏览。注意:您可能需要授予 NT AUTHORITY\LOCAL SERVICE 对 NorthWindOutput 数据库的读写权限。

BIDS

正如您所看到的,按订单日期切片度量并非理想。此外,单价和数量度量本身作用不大。此多维数据集需要时间层次结构和收入计算。我们可以通过 Transformalize 添加它们。首先,添加三个基于“order date”的计算字段以创建时间层次结构。

<add name="Orders" version="RowVersion" prefix="Orders">
    <fields>
      <!-- ... -->
    </fields>
    <calculated-fields>
      <add name="TimeDate" length="10" default="9999-12-31">
        <transforms>
            <add method="toString" format="yyyy-MM-dd" parameter="OrderDate" />
        </transforms>
      </add>
      <add name="TimeMonth" length="6" default="12-DEC">
        <transforms>
            <add method="toString" format="MM-MMM" parameter="OrderDate" />
            <add method="toUpper" />
        </transforms>
      </add>
      <add name="TimeYear" type="System.Int16" default="9999">
        <transforms>
            <add method="toString" format="yyyy" parameter="OrderDate" />
		</transforms>
      </add>
    </calculated-fields>
</add>

计算字段根据其他字段和其他先前定义的计算字段的值来生成新字段。它们在实体级别或处理级别使用。在实体中,它们可以访问实体内的任何字段。在处理中,它们可以访问所有数据。要控制它们可以访问哪些字段,请使用如下参数:

<transform method="format" format="{0} is a big city!">
    <parameters>
        <add field="City" />
    </parameters>
</transform>

您可以这样添加多个参数。但是,如果您只有一个参数,可以在 transform 元素本身的 parameter 属性中指定它,如下所示:

<transform method="format" format="{0} is a big city!" 
        parameter="City" /> 

另一个快捷方式是将 parameter 属性设置为“*”以包含所有字段。

有许多内置的 Transforms。如果您找不到适合您需求的,可以使用 C#、JavaScript 或 Razor 模板 transforms 来定义自己的。让我们使用一个 JavaScript transform 来计算收入。

<calculated-fields>
    <add name="Revenue" type="System.Decimal" >
        <transforms>
            <add method="javascript" script="(UnitPrice * (1 - Discount)) * Quantity" parameter="*" />
        </transforms>
    </add>
</calculated-fields>

重新初始化并运行 Tfl。然后,使用新的时间字段和收入,看看它是否能改善多维数据集的浏览体验。

BIDS 2

多维数据集现在看起来更好,但我们需要它在 Transformalize 运行时更新。因此,添加一个到 Analysis Services 的连接和一个相应的模板操作:

<connections>
    <add name="input" connection-string="server=localhost;Database=NorthWind;Trusted_Connection=True;"/>
    <add name="output" connection-string="Server=localhost;Database=NorthWindOutput;Trusted_Connection=True;"/>
    <add name="cube" connection-string="Data Source=localhost;Catalog=NorthWind;" provider="AnalysisServices"/>
</connections>

<!-- ... -->

<templates path="C:\Tfl\">
    <add name="process-cube" file="process-cube.xmla">
        <settings>
            <add name="DatabaseID" value="NorthWind2"></add>
        </settings>    
        <actions>
            <add action="run" connection="cube"></add>
        </actions>
    </add>
</templates>**

Transformalize 的“templates”使用 C# Razor 语法。设置会传入模板并像这样使用:

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200">
    <Object>
      <DatabaseID>@(Model.Settings.DatabaseID)</DatabaseID>
    </Object>
    <Type>ProcessFull</Type>
    <WriteBackTableCreation>UseExisting</WriteBackTableCreation>
  </Process>
</Batch>

@(Model.Settings.DatabaseID) 将被替换为 NorthWind2。Transformalize 的模板管理器将渲染模板,并随后运行定义的“actions”。“run”操作针对指定的连接执行渲染后的内容。这允许您动态构建数据操作查询,或者在此情况下是 XMLA 命令,并执行它们。

tfl NorthWind.xml
00:14:28 | Info | NorthWind | Order Details.. | Processed 2155 inserts, and 0 updates in Order Details.
00:14:28 | Info | NorthWind | Orders......... | Processed 830 inserts, and 0 updates in Orders.
00:14:28 | Info | NorthWind | Products....... | Processed 77 inserts, and 0 updates in Products.
00:14:28 | Info | NorthWind | Customers...... | Processed 91 inserts, and 0 updates in Customers.
00:14:28 | Info | NorthWind | Employees...... | Processed 9 inserts, and 0 updates in Employees.
00:14:28 | Info | NorthWind | Shippers....... | Processed 3 inserts, and 0 updates in Shippers.
00:14:28 | Info | NorthWind | Suppliers...... | Processed 29 inserts, and 0 updates in Suppliers.
00:14:28 | Info | NorthWind | Categories..... | Processed 8 inserts, and 0 updates in Categories.
00:14:28 | Info | NorthWind | Orders......... | Processed 2155 rows. Updated Order Details with Orders.
00:14:29 | Info | NorthWind | Products....... | Processed 2155 rows. Updated Order Details with Products.
00:14:31 | Info | NorthWind | Categories..... | process-cube ran successfully.
00:14:31 | Info | NorthWind | All............ | Process completed in 00:00:03.8312882.

利用 Apache SOLR

使用更复杂的模板和 Apache SOLR 服务器,还可以将全文搜索集成到该过程中。Transformalize 附带一对模板,可以构建必要的 SOLR 配置文件,用于模式和数据导入处理。

<templates>

    <add name="solr-data-handler" file="solr-data-handler.cshtml" cache="true">
      <actions>
        <add action="copy" file="C:\Solr\NorthWind\conf\data-config.xml"/>
      </actions>
    </add>

    <add name="solr-schema" file="solr-schema.cshtml" cache="true">
      <actions>
        <add action="copy" file="C:\Solr\NorthWind\conf\schema.xml"/>
        <add action="web" url="https://:8983/solr/NorthWind/dataimport?command=full-import&clean=true&commit=true&optimize=true"/>
      </actions>
    </add>

</templates>

<search-types>
    <add name="default" />
    <add name="facet" analyzer="lowercase" store="true" index="true" />
    <add name="standard" analyzer="standard_lowercase" store="false" index="true"/>
</search-types>

Razor 模板“solr-data-handler.cshtml”和“solr-schema.cshtml”渲染 SOLR 配置文件。这是可能的,因为模板管理器将我们上面用 XML 构建的整个 NorthWind 配置传递到模板中。

为了控制 SOLR 中字段的处理方式,“search types”(搜索类型)应用于每个 &lt;field/&gt;&lt;calculated-field/&gt;。默认情况下,每个字段都会根据其数据类型在搜索索引中进行索引和存储。要分配更复杂的文本分析,您可以将 search-type 属性设置为 facet、standard 或任何您定义的其他类型。要将字段排除在搜索之外,请将 search-type 设置为“none”。

现在运行 Tfl 会生成:

tfl NorthWind.xml
...
00:48:25 | Info | NorthWind | Products...... | Processed 2155 rows. Updated Order Details with Products.
00:48:28 | Info | NorthWind | Categories.... | process-cube ran successfully.
00:48:28 | Info | NorthWind | Categories.... | Copied solr-data-handler template output to C:\Solr\NorthWind\conf\data-config.xml.
00:48:29 | Info | NorthWind | Categories.... | Copied solr-schema template output to C:\Solr\NorthWind\conf\schema.xml.
00:48:29 | Info | NorthWind | Categories.... | Made web request to https://:8983/solr/NorthWind/dataimport?command=full-import&clean=true&commit=true&optimize=true.
00:48:29 | Info | NorthWind | Categories.... | Process completed in 00:00:04.8287386.

在此示例中,模板操作“web”会触发 SOLR 清理和重新导入索引。在生产环境中,您希望在模式更改时重新加载它,并适当地利用完全导入和增量导入。如果一切顺利,您将在 SOLR 管理界面中看到类似以下的内容:

SOLR

现在,如果您将 Transformalize 设置为每几分钟运行一次,您就可以在 OLTP 数据之上获得近乎实时的 OLAP 和搜索引擎服务。OLAP 多维数据集支持更高效、更复杂的报表需求,而搜索引擎则可以实现闪电般快速的模糊搜索来查找特定记录。如果您的用户希望以不同的方式查看数据,而且他们一定会这样做,您所要做的就是添加 transforms 和/或新的计算字段并重新初始化您的输出。

当 Transformalize 读取您的生产数据库时,它会尝试尽可能少地引起争用。使用版本字段,它可以非常快速地保持非常大的数据库的星型模型副本的最新状态。

摘要

NorthWind 数据相当干净。在现实中,您将面临更具挑战性的数据源。


Transformalize 使用了几个其他开源项目,包括:

  1. Rhino ETL 
  2. Razor Engine
  3. Jint
  4. Ninject
  5. NLog
  6. fastJSON 
  7. Dapper-dot-net
  8. File Helpers
  9. Excel Data Reader
  10. Enterprise Library 6 Validation Block
  11. Elasticsearch.NET
  12. SolrNet

在可能的情况下,我包含了这些项目的源代码,而不是 Nuget 包。这样做的好处是,我可以在他人的代码中进行深入学习。缺点是保持这些库更新有点困难。

© . All rights reserved.