Venn Diagram in VisualBasic


4 Apr 2016

CPOL


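An R API for drawing Venn diagrams in VisualBasic

Introduction

A Venn diagram is a chart that shows the relationships between data sets. In biological research, for example, a Venn diagram can show the elements that bacterial genomes share and the elements unique to each genome, based on the results of a protein BBH blastp analysis.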

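Background

R is a popular language for data mining and machine learning, and a powerful tool for data visualization. To draw a Venn diagram in R, the VennDiagram package is recommended: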

https://cran.r-project.cn/web/packages/VennDiagram/index.html

Here is a simple example of drawing a Venn diagram in R:

library(VennDiagram)

# Creates the data set
d0 <- c(3, 4, 5);
d1 <- c(2, 3);
d2 <- c(1, 3);
d3 <- c(3, 5);
d4 <- c(1, 2, 3, 4);
input_data <- list(objA=d0,objB=d1,objC=d2,objD=d3,objE=d4);

# Creates output 
output_image_file <- "C:/Users/xieguigang/Desktop/venn_venn.tiff";

# Configs for the diagram
title <- "venn";
fill_color <- c("mediumorchid4","azure1","gray24","darkolivegreen3","grey13");

# Invoke drawing of the venn Diagram
venn.diagram(input_data,fill=fill_color,filename=output_image_file,
             width=5000,height=3000,main=title);
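
To make clear what the diagram drawn by the R example above actually displays, here is a minimal sketch in Python (used only for illustration, it is not part of the article's toolchain). It groups every element by the exact combination of sets that contain it, which is the content of one region of the diagram. The function name `venn_regions` is hypothetical.

```python
from collections import defaultdict

# Same five sets as the R example (objA..objE).
input_data = {
    "objA": {3, 4, 5},
    "objB": {2, 3},
    "objC": {1, 3},
    "objD": {3, 5},
    "objE": {1, 2, 3, 4},
}


def venn_regions(sets):
    """Group every element by the exact combination of sets containing it."""
    regions = defaultdict(list)
    universe = set().union(*sets.values())
    for element in sorted(universe):
        # The tuple of owning set names identifies one region of the diagram.
        owners = tuple(name for name in sets if element in sets[name])
        regions[owners].append(element)
    return dict(regions)


regions = venn_regions(input_data)
for owners, members in regions.items():
    print(" & ".join(owners), "->", members)
```

With this input only the element 3 belongs to all five sets, so it is the only member of the central region.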

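The R.Bioinformatics project is a component of the GCModeller tools. Its R API is ported to the .NET languages through the RDotNET project. This article builds on the R API tools from my previous article on how to build an R API for the .NET languages:

<R Statics Language API to VB.NET Language>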

https://codeproject.org.cn/Articles/1083875/R-Statics-Language-API-to-VB-NET-Language

Using the Code

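Why mix VisualBasic with R

Generally speaking, R is not well suited to processing large amounts of text. It is better suited to numerical data analysis and to plotting your research data.

The data analyzed in bioinformatics research is often larger than 10 GB, and can reach 100 GB in a single computational experiment. Examples include a blastp BBH analysis against a reference sequence database for functional annotation, a blastp search against the Pfam database for protein functional structure analysis, or an RNA-seq experiment for genome functional analysis. Most biological data is stored as plain text files, to stay consistent with object-oriented databases.

R therefore needs a utility language upstream in its analysis workflow to produce clean input from the experimental data. Such workflows usually pair R with another language that handles large text data efficiently, for example Python/R, Java/R, or VisualBasic/R.

The .NET languages benefit from parallel LINQ and regular expressions, which give VisualBasic and C# good performance on large-scale text processing and let them handle databases in any text format.

The raw data is processed by a .NET program that generates the input for the R API, which is then passed to R through RDotNET. Finally, your user code reads the raw output data from the R server, and you can convert the R objects into .NET objects for downstream analysis.

The R hybrid workflow

1. User code in Python, Java, or VisualBasic processes the large raw data to generate the R data input.

2. The user code is mixed with R to generate the script workflow.

3. The raw in-memory data of the R server is read back from the executed script for downstream analysis.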
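
Step 1 of the workflow above can be sketched as follows. This is a minimal illustration in Python, one of the upstream languages mentioned above, and not the article's VisualBasic code. It assumes a hypothetical tab-separated line layout (`<genome>\t<gene_id>\t<free text>`), and the names `collect_sets` and `to_r_vector` are made up for this example. The text is consumed line by line, so the whole file never has to be held in memory, and only the compact result is handed to R.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Assumed (hypothetical) layout of one raw line: "<genome>\t<gene_id>\t<free text>"
HIT = re.compile(r"^(?P<genome>\S+)\t(?P<gene>\S+)")


def collect_sets(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Stream over raw text lines and collect the gene ids seen per genome."""
    sets = defaultdict(set)
    for line in lines:
        match = HIT.match(line)
        if match:  # lines that do not fit the layout are skipped
            sets[match.group("genome")].add(match.group("gene"))
    return dict(sets)


def to_r_vector(name: str, values: Iterable[str]) -> str:
    """Render one collected set as an R character vector assignment."""
    quoted = ", ".join(f'"{value}"' for value in sorted(values))
    return f"{name} <- c({quoted})"


sample = [
    "xcc8004\tXC_0001\thypothetical protein",
    "ecoli\tb0001\tthr operon leader peptide",
    "xcc8004\tXC_0002\tsome annotation",
    "# a comment line that is ignored",
]
sets = collect_sets(sample)
print(to_r_vector("d0", sets["xcc8004"]))
```

Passing an open file object instead of the `sample` list keeps the same code working on large inputs, because a file is iterated one line at a time.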

venn.diagram R API

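The venn.diagram API has already been created in the R.Bioinformatics project. It is available as the class RDotNet.Extensions.Bioinformatics.VennDiagram.vennDiagramPlot, and the details of the original API can be found in the R console with the help command ??venn.diagram.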

Imports RDotNet.Extensions.VisualBasic
Imports RDotNet.Extensions.VisualBasic.Services.ScriptBuilder
Imports RDotNet.Extensions.VisualBasic.Services.ScriptBuilder.RTypes

Namespace VennDiagram

    ''' <summary>
    ''' This function takes a list and creates a publication-quality TIFF Venn Diagram
    ''' </summary>
    <RFunc("venn.diagram")> Public Class vennDiagramPlot : Inherits vennBase

        ''' <summary>
        ''' A list of vectors (e.g., integers, chars), 
        ''' with each component corresponding to a separate circle in the Venn diagram
        ''' </summary>
        ''' <returns></returns>
        Public Property x As RExpression
        ''' <summary>
        ''' Filename for image output, Or if NULL returns the grid object itself
        ''' </summary>
        ''' <returns></returns>
        <Parameter("filename", ValueTypes.Path)> Public Property filename As String
        ''' <summary>
        ''' Integer giving the height Of the output figure In units
        ''' </summary>
        ''' <returns></returns>
        Public Property height As Integer = 4000
        ''' <summary>
        ''' Integer giving the width of the output figure in units
        ''' </summary>
        ''' <returns></returns>
        Public Property width As Integer = 7000
        ''' <summary>
        ''' Resolution of the final figure in DPI
        ''' </summary>
        ''' <returns></returns>
        Public Property resolution As Integer = 600
        ''' <summary>
        ''' Specification of the image format (e.g. tiff, png or svg)
        ''' </summary>
        ''' <returns></returns>
        Public Property imagetype As String = "tiff"
        ''' <summary>
        ''' Size-units to use for the final figure
        ''' </summary>
        ''' <returns></returns>
        Public Property units As String = "px"
        ''' <summary>
        ''' What compression algorithm should be applied to the final tiff
        ''' </summary>
        ''' <returns></returns>
        Public Property compression As String = "lzw"
        ''' <summary>
        ''' Missing value handling method: "none", "stop", "remove"
        ''' </summary>
        ''' <returns></returns>
        Public Property na As String = "stop"
        ''' <summary>
        ''' Character giving the main title of the diagram
        ''' </summary>
        ''' <returns></returns>
        Public Property main As RExpression = NULL
        ''' <summary>
        ''' Character giving the subtitle of the diagram
        ''' </summary>
        ''' <returns></returns>
        Public Property [sub] As RExpression = NULL
        ''' <summary>
        ''' Vector of length 2 indicating (x,y) of the main title
        ''' </summary>
        ''' <returns></returns>
        <Parameter("main.pos")> Public Property mainPos As RExpression = c(0.5, 1.05)
        ''' <summary>
        ''' Character giving the fontface (font style) of the main title
        ''' </summary>
        ''' <returns></returns>
        <Parameter("main.fontface")> Public Property mainFontface As String = "plain"
        ''' <summary>
        ''' Character giving the fontfamily (font type) of the main title
        ''' </summary>
        ''' <returns></returns>
        <Parameter("main.fontfamily")> Public Property mainFontfamily As String = "serif"
        ''' <summary>
        ''' Character giving the colour of the main title
        ''' </summary>
        ''' <returns></returns>
        <Parameter("main.col")> Public Property mainCol As String = "black"
        ''' <summary>
        ''' Number giving the cex (font size) of the main title
        ''' </summary>
        ''' <returns></returns>
        <Parameter("main.cex")> Public Property mainCex As Integer = 1
        ''' <summary>
        ''' Vector of length 2 indicating horizontal and 
        ''' vertical justification of the main title
        ''' </summary>
        ''' <returns></returns>
        <Parameter("main.just")> Public Property mainJust As RExpression = c(0.5, 1)
        ''' <summary>
        ''' Vector of length 2 indicating (x,y) of the subtitle
        ''' </summary>
        ''' <returns></returns>
        <Parameter("sub.pos")> Public Property subPos As RExpression = c(0.5, 1.05)
        ''' <summary>
        ''' Character giving the fontface (font style) of the subtitle
        ''' </summary>
        ''' <returns></returns>
        <Parameter("sub.fontface")> Public Property subFontface As String = "plain"
        ''' <summary>
        ''' Character giving the fontfamily (font type) of the subtitle
        ''' </summary>
        ''' <returns></returns>
        <Parameter("sub.fontfamily")> Public Property subFontfamily As String = "serif"
        ''' <summary>
        ''' Character Colour of the subtitle
        ''' </summary>
        ''' <returns></returns>
        <Parameter("sub.col")> Public Property subCol As String = "black"
        ''' <summary>
        ''' Number giving the cex (font size) of the subtitle
        ''' </summary>
        ''' <returns></returns>
        <Parameter("sub.cex")> Public Property subCex As Integer = 1
        ''' <summary>
        ''' Vector of length 2 indicating horizontal and 
        ''' vertical justification of the subtitle
        ''' </summary>
        ''' <returns></returns>
        <Parameter("sub.just")> Public Property subJust As RExpression = c(0.5, 1)
        ''' <summary>
        ''' Allow specification of category names using plotmath syntax
        ''' </summary>
        ''' <returns></returns>
        <Parameter("category.names")> Public Property categoryNames _
                                      As RExpression = names("x")
        ''' <summary>
        ''' Logical specifying whether to use only unique elements 
        ''' in each item of the input list or use all elements. Defaults to FALSE
        ''' </summary>
        ''' <returns></returns>
        <Parameter("force.unique")> Public Property forceUnique As Boolean = True
        ''' <summary>
        ''' Can be either 'raw' or 'percent'. This is the format that the numbers 
        ''' will be printed in. Can pass in a vector with the second element 
        ''' being printed under the first
        ''' </summary>
        ''' <returns></returns>
        <Parameter("print.mode")> Public Property printMode As String = "raw"
        ''' <summary>
        ''' If one of the elements in print.mode is 'percent', 
        ''' then this is how many significant digits will be kept
        ''' </summary>
        ''' <returns></returns>
        Public Property sigdigs As Integer = 3
        ''' <summary>
        ''' If this is equal to true, then the vector passed into 
        ''' area.vector will be directly assigned to the areas of the 
        ''' corresponding regions. Only use this if you know which positions 
        ''' in the vector correspond to which regions in the diagram
        ''' </summary>
        ''' <returns></returns>
        <Parameter("direct.area")> Public Property directArea As Boolean = False
        ''' <summary>
        ''' An argument to be used when direct.area is true. 
        ''' These are the areas of the corresponding regions in the Venn Diagram
        ''' </summary>
        ''' <returns></returns>
        <Parameter("area.vector")> Public Property areaVector As Integer = 0
        ''' <summary>
        ''' If there are only two categories in the venn diagram and 
        ''' total.population is not NULL, then perform the hypergeometric test 
        ''' and add it to the sub title.
        ''' </summary>
        ''' <returns></returns>
        <Parameter("hyper.test")> Public Property hyperTest As Boolean = False
        ''' <summary>
        ''' An argument to be used when hyper.test is true. 
        ''' This is the total population size
        ''' </summary>
        ''' <returns></returns>
        <Parameter("total.population")> Public Property totalPopulation _
                                        As RExpression = NULL

        ''' <summary>
        ''' The partition fill color
        ''' </summary>
        ''' <returns></returns>
        Public Property fill As RExpression
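
The class above declares one property per R parameter and uses attributes to record the R name where it differs from the property name (for example mainPos and main.pos). How such a declaration can be turned into an R call is sketched below in Python. This is an illustration of the idea only, not the implementation used by RDotNet.Extensions: the names `RExpr`, `r_literal`, and `VennDiagramCall` are hypothetical, and only a handful of the parameters are included.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


class RExpr(str):
    """Marks text that must be emitted as raw R code, not as a quoted string."""


def r_literal(value) -> str:
    """Convert a Python value to the text of an R literal."""
    if value is None:
        return "NULL"
    if isinstance(value, RExpr):  # checked before str, since RExpr is a str
        return str(value)
    if isinstance(value, bool):  # checked before numbers, since bool is an int
        return "TRUE" if value else "FALSE"
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return repr(value)


@dataclass
class VennDiagramCall:
    x: RExpr
    filename: str
    height: int = 4000
    width: int = 7000
    main: Optional[str] = None
    # "r_name" plays the role of the <Parameter("...")> attribute.
    main_pos: RExpr = field(default=RExpr("c(0.5, 1.05)"),
                            metadata={"r_name": "main.pos"})
    force_unique: bool = field(default=True,
                               metadata={"r_name": "force.unique"})

    def render(self) -> str:
        """Build the venn.diagram(...) call from the declared fields."""
        args = []
        for f in fields(self):
            name = f.metadata.get("r_name", f.name)
            args.append(f"{name}={r_literal(getattr(self, f.name))}")
        return "venn.diagram(" + ", ".join(args) + ")"


call = VennDiagramCall(x=RExpr("input_data"), filename="venn.tiff")
print(call.render())
```

Keeping raw R expressions in a separate type is a design choice of this sketch: it lets a variable name such as input_data pass through unquoted while ordinary strings such as the file name are quoted.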

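The VennDiagram data model

Details of the R hybrid steps

The Venn diagram data model is available in the namespace

RDotNet.Extensions.Bioinformatics.VennDiagram.ModelAPI.VennDiagram

The function that automatically converts the data model into an R script: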

Imports System.Drawing
Imports System.Text
Imports System.Xml.Serialization
Imports Microsoft.VisualBasic
Imports Microsoft.VisualBasic.DocumentFormat.Csv
Imports Microsoft.VisualBasic.DocumentFormat.Csv.DocumentStream
Imports Microsoft.VisualBasic.Linq
Imports Microsoft.VisualBasic.Linq.Extensions
Imports RDotNet.Extensions.VisualBasic
Imports RDotNet.Extensions.VisualBasic.Services.ScriptBuilder

Const venn__plots_out As String = NameOf(venn__plots_out)

''' <summary>
''' Convert the data model to the R script for venn diagram drawing.
''' </summary>
''' <returns></returns>
''' <remarks></remarks>
Protected Overrides Function __R_script() As String
    Dim R As ScriptBuilder = New ScriptBuilder(capacity:=5 * 1024)
    Dim dataList As New List(Of String) ' The list elements for the 
                                        ' venn diagram partitions
    Dim color As New List(Of String) ' The partitions color name vector

    For i As Integer = 0 To partitions.Length - 1
        Dim x As Partition = partitions(i)
        Dim objName As String = x.Name.NormalizePathString.Replace(" ", "_")

        R += $"d{i} <- c({x.Vector})"
        color += x.Color
        dataList += $"{objName}=d{i}"

        If Not String.Equals(x.Name, objName) Then
             Call $"{x.Name} => '{objName}'".__DEBUG_ECHO
        End If
    Next

    plot.categoryNames = c(partitions.ToArray(Function(x) x.DisplName))

    R += $"input_data <- list({dataList.JoinBy(",")})"
    R += $"fill_color <- {c(color.ToArray)}"

    ' Calling the venn.diagram R API
    R += venn__plots_out <= plot.Copy("input_data", "fill_color", plot.categoryNames)

    Return R.ToString
End Function
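
The script-building logic of the function above can be sketched in a few lines of Python. This is an illustration only and not a port of the library: the function name `build_venn_script` and the tuple layout of a partition are assumptions of this example. The generated text follows the shape of the R example from the Background section.

```python
def build_venn_script(partitions, out_file, title):
    """Build an R script for venn.diagram.

    partitions: list of (name, color, vector) tuples, where vector is a
    list of integers (hypothetical layout chosen for this sketch).
    """
    lines = ["library(VennDiagram)"]
    items, colors = [], []
    for i, (name, color, vector) in enumerate(partitions):
        # R list element names must not contain spaces.
        obj_name = name.replace(" ", "_")
        lines.append(f"d{i} <- c({', '.join(str(v) for v in vector)})")
        items.append(f"{obj_name}=d{i}")
        colors.append(f'"{color}"')
    lines.append(f"input_data <- list({','.join(items)})")
    lines.append(f"fill_color <- c({','.join(colors)})")
    lines.append(
        f'venn.diagram(input_data, fill=fill_color, '
        f'filename="{out_file}", main="{title}")'
    )
    return "\n".join(lines)


script = build_venn_script(
    [("obj A", "blue", [3, 4, 5]), ("objB", "red", [2, 3])],
    out_file="venn.tiff",
    title="venn",
)
print(script)
```

The sketch does not escape quotes or backslashes in the file name and title, so Windows paths would need forward slashes, as in the R example.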

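Using the Venn diagram model

To draw a Venn diagram directly from an existing Venn diagram XML model file, you can use the following code. It loads the Venn diagram data model from the XML document, and you can then generate the R script directly from the model: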

Imports Microsoft.VisualBasic.CommandLine.Reflection
Imports Microsoft.VisualBasic.ConsoleDevice.STDIO
Imports Microsoft.VisualBasic.Scripting.MetaData
Imports Microsoft.VisualBasic.Linq
Imports Microsoft.VisualBasic.DocumentFormat.Csv
Imports RDotNET.Extensions.VisualBasic.RSystem
Imports RDotNET.Extensions.VisualBasic
Imports RDotNET.Extensions.Bioinformatics.VennDiagram.ModelAPI

Dim venn As VennDiagram = path.LoadXml(Of VennDiagram)
Dim EXPORT As String = venn.saveTiff.TrimFileExt & ".r"

Call TryInit()
Call venn.RScript.SaveTo(EXPORT, Encodings.ASCII.GetEncodings)
Call RSystem.Source(EXPORT)
Call Process.Start(venn.saveTiff)

To draw a Venn diagram from a raw CSV data file, use the function RModelAPI.Generate, which converts the raw CSV data set into the partitions of the Venn diagram:

Private Function __run(inData As String, title As String, _
    options As String, out As String, R_HOME As String) As Integer
    Dim dataset As DocumentStream.File = New DocumentStream.File(inData)
    Dim VennDiagram As VennDiagram = RModelAPI.Generate(source:=dataset)

    If String.IsNullOrEmpty(options) Then ' Infer the partition options from the raw data
        VennDiagram += From col As String In dataset.First Select _
                       {col, GetRandomColor()} '
    Else ' Parse the partition options from the user input
        VennDiagram += From s As String In options.Split(CChar(";")) _
                       Select s.Split(CChar(",")) '
    End If

    VennDiagram.Title = title
    VennDiagram.saveTiff = out

    Dim RScript As String = VennDiagram.RScript
    Dim EXPORT As String = FileIO.FileSystem.GetParentPath(out)
    EXPORT = $"{EXPORT}/{title.NormalizePathString}_venn.r"

    If Not R_HOME.DirectoryExists Then
        Call TryInit()
    Else
        Call TryInit(R_HOME)
    End If

    Call RScript.SaveTo(EXPORT, Encodings.ASCII.GetEncodings)
    Call VennDiagram.SaveAsXml(EXPORT.TrimFileExt & ".Xml")
    Call RSystem.Source(EXPORT)

    Printf("The venn diagram R script was saved at location:\n '%s'", EXPORT)
    Call Process.Start(out)

    Return 0
End Function
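
The format of the options string handled above (entries separated by ';', fields separated by ',') can be sketched in Python as follows. This is an illustration, not the tool's parser. The function name `parse_options` is hypothetical, and two behaviors are assumptions of this sketch rather than facts about the VisualBasic code: splitting each entry at most twice so that a display title may itself contain a comma, and falling back to the name when no title is given.

```python
def parse_options(options):
    """Parse 'name,color[,title];name,color[,title];...' into tuples."""
    parsed = []
    for entry in options.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';'
        # Split at most twice: anything after the second comma is the title.
        parts = [part.strip() for part in entry.split(",", 2)]
        if len(parts) < 2:
            raise ValueError(f"expected 'name,color[,title]', got {entry!r}")
        name, color = parts[0], parts[1]
        title = parts[2] if len(parts) == 3 else name
        parsed.append((name, color, title))
    return parsed


options = parse_options(
    "objA,blue,Object Test A;objD,black,DEFGGG, HI;ecoli,green"
)
for name, color, title in options:
    print(name, color, title)
```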

Generating the partitions of the Venn diagram from the raw CSV data

Imports System.Drawing
Imports System.Runtime.CompilerServices
Imports System.Text
Imports System.Xml.Serialization
Imports Microsoft.VisualBasic
Imports Microsoft.VisualBasic.DocumentFormat.Csv
Imports Microsoft.VisualBasic.DocumentFormat.Csv.DocumentStream
Imports Microsoft.VisualBasic.Linq
Imports Microsoft.VisualBasic.Linq.Extensions
Imports RDotNET.Extensions.VisualBasic

Namespace VennDiagram.ModelAPI

    Public Module RModelAPI

        ''' <summary>
        ''' Generate a Venn diagram data model from an Excel comma-separated (CSV) file
        ''' </summary>
        ''' <param name="source"></param>
        ''' <returns></returns>
        ''' <remarks></remarks>
        Public Function Generate(source As DocumentStream.File) As VennDiagram
            Dim LQuery = From vec
                         In __vector(source:=source)
                         Select New Partition With {
                             .Vector = String.Join(", ", vec.Value),
                             .Name = vec.Key
                         } '
            Return New VennDiagram With {
                .partitions = LQuery.ToArray
            }
        End Function

        Private Function __vector(source As File) As Dictionary(Of String, String())
            Dim Width As Integer = source.First.Count
            Dim Vector = (From name As String
                          In source.First
                          Select k = name,
                              lst = New List(Of String)).ToArray

            For row As Integer = 1 To source.RowNumbers - 1
                Dim Line As RowObject = source(row)
                For colums As Integer = 0 To Width - 1
                    If Not String.IsNullOrEmpty(Line.Column(colums).Trim) Then
                        Call Vector(colums).lst.Add(CStr(row))
                    End If
                Next
            Next

            Return Vector.ToDictionary(Function(x) x.k, Function(x) x.lst.ToArray)
        End Function
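
The conversion performed by the function above can be sketched in Python. This is an illustration rather than a port, and the function name `partitions_from_csv` is hypothetical. Each column of the CSV file becomes one partition, and every non-empty cell contributes its 1-based data row number to that column's vector, so rows filled in several columns end up in the overlap of those partitions.

```python
import csv
import io


def partitions_from_csv(text):
    """Map each CSV column name to the row numbers of its non-empty cells."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    vectors = {name: [] for name in header}
    # Data rows are numbered from 1, the header row is not counted.
    for row_number, row in enumerate(body, start=1):
        for name, cell in zip(header, row):
            if cell.strip():
                vectors[name].append(row_number)
    return vectors


sample_csv = (
    "objA,objB,objC\n"
    ",geneB1,geneC1\n"
    ",geneB2,\n"
    "geneA3,geneB3,geneC3\n"
)
vectors = partitions_from_csv(sample_csv)
print(vectors)
```

In this sample, row 3 is filled in every column, so the value 3 appears in all three vectors and lands in the central region of the diagram.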

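Running the example tool

A sample tool for drawing Venn diagrams in VisualBasic has been published on GitHub. You can download the sample application from the sample link, and type venn man in the console to get the help manual of the venn tool: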

E:\GCModeller\GCModeller-x64\Templates>venn man
GCModeller [version 1.3.11.2]
Module AssemblyName: file:///E:/GCModeller/GCModeller-x64/venn.exe
Root namespace: LANS.SystemsBiology.AnalysisTools.DataVisualization.VennDiagramTools

All of the command that available in this program has been list below:

 .Draw:  Draw the venn diagram from a csv data file, 
         you can specific the diagram drawing options from this command switch value. 
         The generated venn dragram will be saved as tiff file format.

Commands
--------------------------------------------------------------------------------
1.  Help for command '.Draw':

  Information:  Draw the venn diagram from a csv data file, 
                you can specific the diagram drawing options from this 
                command switch value. The generated venn dragram will be 
                saved as tiff file format.
  Usage:        E:\GCModeller\GCModeller-x64\venn.exe .Draw -i <csv_file> 
                [-t <diagram_title> -o <_diagram_saved_path> 
                 -s <partitions_option_pairs> -rbin <r_bin_directory>]
  Example:      venn .Draw .Draw -i /home/xieguigang/Desktop/genomes.csv 
                -t genome-compared -o ~/Desktop/xcc8004.tiff -s "Xcc8004,
                blue,Xcc 8004;ecoli,green,Ecoli. K12;pa14,yellow,PA14;ftn,
                black,FTN;aciad,red,ACIAD"

  Parameters information:
   ---------------------------------------
    -i
    Description:  The csv data source file for drawing the venn diagram graph.

    Example:      -i "/home/xieguigang/Desktop/genomes.csv"

   [-t]
    Description:  Optional, the venn diagram title text

    Example:      -t "genome-compared"

   [-o]
    Description:  Optional, the saved file location for the venn diagram, 
                  if this switch value is not specific by the user then
                  the program will save the generated venn diagram to 
                  user desktop folder and using the file name of the input 
                  csv file as default.

    Example:      -o "~/Desktop/xcc8004.tiff"

   [-s]
    Description:  Optional, the profile settings for the partitions 
                  in the venn diagram, each partition profile data is
                  in a key value paired like: name,color, 
                  and each partition profile pair is seperated by a ';' character.
                  If this switch value is not specific by the user then 
                  the program will trying to parse the partition name
                  from the column values and apply for each partition a randomize color.

    Example:      -s "Xcc8004,blue,Xcc 8004;ecoli,green,Ecoli. K12;
                  pa14,yellow,PA14;ftn,black,FTN;aciad,red,ACIAD"

   [-rbin]
    Description:  Optional, Set up the r bin path for drawing the venn diagram, 
                  if this switch value is not specific by the user then
                  the program just output the venn diagram drawing R script file 
                  in a specific location, or if this switch
                  value is specific by the user and is valid for call the R program 
                  then will output both venn diagram tiff image file and R script 
                  for drawing the output venn diagram.
                  This switch value is just for the windows user, 
                  when this program was running on a LINUX/UNIX/MAC platform 
                  operating system, you can ignore this switch value, 
                  but you should install the R program in your linux/MAC first 
                  if you wish to get the venn diagram directly from this program.

    Example:      -rbin "C:\\R\\bin\\"

Using the example utils CLI

venn .Draw -i <csv_file> [-t <diagram_title> -o <_diagram_saved_path> 
           -s <serials_option_pairs> -rbin <r_bin_directory>]

An example CLI invocation:

venn .Draw -i "E:\GCModeller\GCModeller-x64\Templates\venn.csv" 
           -t "test example plot title" -s objA,blue,"Object Test A";objB,
           red,"BBBB";objC,green,"3333333";objD,black,"DEFGGG, HI";objE,yellow,"Good!!"

Output of the example run