
Posted 4 Apr 2016
Licenced CPOL

Venn Diagram in VisualBasic

An R API for drawing Venn diagrams in VisualBasic

Introduction

A Venn diagram is a plot that represents the relationships between data sets. For example, in biological research a Venn diagram can show the common and unique elements between bacterial genomes, based on the bidirectional best hit (BBH) results of a protein blastp analysis.
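To make the idea of "common" and "unique" elements concrete, here is a minimal VisualBasic sketch (not part of the R.Bioinformatics API) that computes the three regions of a two-set Venn diagram with HashSet(Of String). The module name TwoSetVenn and the identifiers used in the test are hypothetical; in the genome comparison above, each set would hold the identifiers from one genome.

```vb
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic

' Hypothetical helper module, for illustration only.
Public Module TwoSetVenn

    ''' <summary>
    ''' Returns the elements unique to A, the elements shared by A and B,
    ''' and the elements unique to B. Inputs are treated as sets, so duplicates are ignored.
    ''' </summary>
    Public Function Regions(a As IEnumerable(Of String), b As IEnumerable(Of String)) _
        As (OnlyA As HashSet(Of String), Common As HashSet(Of String), OnlyB As HashSet(Of String))

        ' A minus B
        Dim onlyA As New HashSet(Of String)(a)
        onlyA.ExceptWith(b)

        ' A intersect B
        Dim common As New HashSet(Of String)(a)
        common.IntersectWith(b)

        ' B minus A
        Dim onlyB As New HashSet(Of String)(b)
        onlyB.ExceptWith(a)

        Return (onlyA, common, onlyB)
    End Function

End Module
```

The three returned sets are exactly the three areas of a two-circle diagram; their sizes are the numbers venn.diagram prints in each area.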

Background

R is a popular language for data mining and machine learning, and it is also a powerful tool for data visualization. For drawing Venn diagrams in R, the VennDiagram package is recommended:

https://cran.r-project.org/web/packages/VennDiagram/index.html

Here is a simple example of drawing a Venn diagram in R:

library(VennDiagram)

# Creates the data set
d0 <- c(3, 4, 5);
d1 <- c(2, 3);
d2 <- c(1, 3);
d3 <- c(3, 5);
d4 <- c(1, 2, 3, 4);
input_data <- list(objA=d0,objB=d1,objC=d2,objD=d3,objE=d4);

# Creates output 
output_image_file <- "C:/Users/xieguigang/Desktop/venn_venn.tiff";

# Configs for the diagram
title <- "venn";
fill_color <- c("mediumorchid4","azure1","gray24","darkolivegreen3","grey13");

# Invoke drawing of the venn Diagram
venn.diagram(input_data,fill=fill_color,filename=output_image_file,width=5000,height=3000,main=title);
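The diagram drawn by venn.diagram is built from the exclusive regions of the input sets. As a cross-check of what the plot should show, the following sketch (a hypothetical module VennRegions, not part of VennDiagram or R.Bioinformatics) assigns every element to exactly one region, keyed by the names of all sets that contain it, and is exercised with the same five vectors as the R example above.

```vb
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical helper module, for illustration only.
Public Module VennRegions

    ''' <summary>
    ''' Partitions the elements of several named sets into exclusive Venn regions.
    ''' The region key is the ordinally sorted list of owning set names joined with "&amp;".
    ''' </summary>
    Public Function ExclusiveRegions(sets As IDictionary(Of String, Integer())) _
        As Dictionary(Of String, List(Of Integer))

        ' For every element, collect the names of the sets that contain it.
        Dim membership As New SortedDictionary(Of Integer, List(Of String))()
        For Each kv In sets
            For Each element In kv.Value.Distinct()
                Dim owners As List(Of String) = Nothing
                If Not membership.TryGetValue(element, owners) Then
                    owners = New List(Of String)()
                    membership(element) = owners
                End If
                owners.Add(kv.Key)
            Next
        Next

        ' Group elements by their membership signature.
        Dim regions As New Dictionary(Of String, List(Of Integer))()
        For Each kv In membership
            Dim key As String = String.Join("&", kv.Value.OrderBy(Function(n) n, StringComparer.Ordinal))
            If Not regions.ContainsKey(key) Then regions(key) = New List(Of Integer)()
            regions(key).Add(kv.Key)
        Next
        Return regions
    End Function

End Module
```

For the example data, element 3 falls in the region shared by all five sets (objA&objB&objC&objD&objE), while each other element sits in a two-set region, for example 1 in objC&objE.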

The R.Bioinformatics project is a component of the GCModeller tools. The R API is ported to .NET through the RDotNET project, and this article builds on the R API tools from my previous article about how to build an R API for .NET languages:

R Statics Language API to VB.NET Language

http://www.codeproject.com/Articles/1083875/R-Statics-Language-API-to-VB-NET-Language

Using the code

Reasons for hybrid programming with R and VisualBasic

In general, R is not good at processing large amounts of text; it is better suited to numerical data analysis and to plotting research data.

The data analyzed in bioinformatics research is often larger than 10 GB, and can reach 100 GB in a single computational experiment: for example, blastp BBH analysis against a reference sequence database for function annotation, blastp against the Pfam database for protein function and structure analysis, or RNA-seq experiments for genome function analysis. Most biological data is also stored as plain text files that together make up an object-oriented database.

So R needs a tool language upstream in its analysis workflow to generate clean input from the experimental data. This is usually done by hybrid programming with another language that performs well on large-scale text processing, such as Python/R, Java/R and VisualBasic/R.

.NET languages benefit from parallel LINQ (PLINQ) and regular expressions, which give VisualBasic/C# high performance on large text files and let them handle any text-based database format.
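As a minimal sketch of that claim, assuming a hypothetical tab-separated hit table with three columns (query, subject, identity) rather than the real blastp output format, the code below parses lines in parallel with PLINQ and a compiled Regex and keeps the best hit per query. HitTableParser and ParseHits are hypothetical names.

```vb
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Linq
Imports System.Text.RegularExpressions

' Hypothetical parser for an assumed "query<TAB>subject<TAB>identity" layout.
Public Module HitTableParser

    ' Compiled once and shared; Regex matching is safe to call from many threads.
    Private ReadOnly HitLine As New Regex("^(\S+)\t(\S+)\t(\d+(?:\.\d+)?)", RegexOptions.Compiled)

    ''' <summary>
    ''' Returns query -> best subject, keeping only hits with identity >= minIdentity.
    ''' Lines that do not match the pattern (headers, comments) are skipped.
    ''' </summary>
    Public Function ParseHits(lines As IEnumerable(Of String), minIdentity As Double) _
        As Dictionary(Of String, String)

        Return lines.AsParallel() _
            .Select(Function(line) HitLine.Match(line)) _
            .Where(Function(m) m.Success) _
            .Select(Function(m) New With {
                .Query = m.Groups(1).Value,
                .Subject = m.Groups(2).Value,
                .Identity = Double.Parse(m.Groups(3).Value, CultureInfo.InvariantCulture)
            }) _
            .Where(Function(h) h.Identity >= minIdentity) _
            .GroupBy(Function(h) h.Query) _
            .ToDictionary(Function(g) g.Key,
                          Function(g) g.OrderByDescending(Function(h) h.Identity).First().Subject)
    End Function

End Module
```

For a file on disk, pass File.ReadLines(path) so the lines are streamed instead of loaded at once. PLINQ does not preserve input order, which is why the best hit is chosen by identity rather than by position in the file.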

The raw data is processed by a .NET program to generate the R API input, and the R part is then run through RDotNET. Finally, your code reads the raw output data from the R server and converts the R objects into .NET objects for downstream analysis.

R hybrid workflow:

1. User code in Python, Java or VisualBasic processes the large raw data and generates the input data for R.

2. Hybrid programming with R generates the script workflow.

3. The script is executed, and the raw in-memory data is retrieved from the R server for downstream analysis.
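These steps can also be sketched without RDotNET. The version below swaps in a different technique: it shells out to Rscript, the command-line front end that ships with R, so results come back as files and console output rather than as in-memory R objects. It assumes Rscript is on the PATH (or is passed as a full path); RScriptRunner is a hypothetical name, and step 1 is whatever .NET preprocessing produces the script text.

```vb
Option Strict On
Option Infer On

Imports System
Imports System.Diagnostics
Imports System.IO

' Hypothetical alternative to RDotNET: run R scripts through the Rscript executable.
Public Module RScriptRunner

    ''' <summary>Describes how to launch Rscript on a script file (step 2).</summary>
    Public Function CreateStartInfo(scriptPath As String, Optional rscript As String = "Rscript") As ProcessStartInfo
        Return New ProcessStartInfo(rscript) With {
            .Arguments = """" & scriptPath & """",   ' quote the path in case it contains spaces
            .UseShellExecute = False,
            .RedirectStandardOutput = True
        }
    End Function

    ''' <summary>Saves the generated script, runs it and returns R's exit code and stdout (steps 2 and 3).</summary>
    Public Function RunScript(script As String, scriptPath As String, Optional rscript As String = "Rscript") _
        As (ExitCode As Integer, Output As String)

        File.WriteAllText(scriptPath, script)
        Using p As Process = Process.Start(CreateStartInfo(scriptPath, rscript))
            ' Read stdout to the end before waiting, so a full pipe buffer cannot block R.
            Dim output As String = p.StandardOutput.ReadToEnd()
            p.WaitForExit()
            Return (p.ExitCode, output)
        End Using
    End Function

End Module
```

A typical call is RunScript(script, "venn.r"); a non-zero ExitCode means R reported an error, which is then visible on the console because standard error is not redirected.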

The venn.diagram R API

The venn.diagram API has already been implemented in the R.Bioinformatics project. It is available as the class RDotNet.Extensions.Bioinformatics.VennDiagram.vennDiagramPlot; the documentation of the original R function can be found with the help command ??venn.diagram in the R console.

Imports RDotNet.Extensions.VisualBasic
Imports RDotNet.Extensions.VisualBasic.Services.ScriptBuilder
Imports RDotNet.Extensions.VisualBasic.Services.ScriptBuilder.RTypes

Namespace VennDiagram

    ''' <summary>
    ''' This function takes a list and creates a publication-quality TIFF Venn Diagram
    ''' </summary>
    <RFunc("venn.diagram")> Public Class vennDiagramPlot : Inherits vennBase

        ''' <summary>
        ''' A list of vectors (e.g., integers, chars), with each component corresponding to a separate circle in the Venn diagram
        ''' </summary>
        ''' <returns></returns>
        Public Property x As RExpression
        ''' <summary>
        ''' Filename for image output, or if NULL returns the grid object itself
        ''' </summary>
        ''' <returns></returns>
        <Parameter("filename", ValueTypes.Path)> Public Property filename As String
        ''' <summary>
        ''' Integer giving the height of the output figure in units
        ''' </summary>
        ''' <returns></returns>
        Public Property height As Integer = 4000
        ''' <summary>
        ''' Integer giving the width of the output figure in units
        ''' </summary>
        ''' <returns></returns>
        Public Property width As Integer = 7000
        ''' <summary>
        ''' Resolution of the final figure in DPI
        ''' </summary>
        ''' <returns></returns>
        Public Property resolution As Integer = 600
        ''' <summary>
        ''' Specification of the image format (e.g. tiff, png or svg)
        ''' </summary>
        ''' <returns></returns>
        Public Property imagetype As String = "tiff"
        ''' <summary>
        ''' Size-units to use for the final figure
        ''' </summary>
        ''' <returns></returns>
        Public Property units As String = "px"
        ''' <summary>
        ''' What compression algorithm should be applied to the final tiff
        ''' </summary>
        ''' <returns></returns>
        Public Property compression As String = "lzw"
        ''' <summary>
        ''' Missing value handling method: "none", "stop", "remove"
        ''' </summary>
        ''' <returns></returns>
        Public Property na As String = "stop"
        ''' <summary>
        ''' Character giving the main title of the diagram
        ''' </summary>
        ''' <returns></returns>
        Public Property main As RExpression = NULL
        ''' <summary>
        ''' Character giving the subtitle of the diagram
        ''' </summary>
        ''' <returns></returns>
        Public Property [sub] As RExpression = NULL
        ''' <summary>
        ''' Vector of length 2 indicating (x,y) of the main title
        ''' </summary>
        ''' <returns></returns>
        <Parameter("main.pos")> Public Property mainPos As RExpression = c(0.5, 1.05)
        ''' <summary>
        ''' Character giving the fontface (font style) of the main title
        ''' </summary>
        ''' <returns></returns>
        <Parameter("main.fontface")> Public Property mainFontface As String = "plain"
        ''' <summary>
        ''' Character giving the fontfamily (font type) of the main title
        ''' </summary>
        ''' <returns></returns>
        <Parameter("main.fontfamily")> Public Property mainFontfamily As String = "serif"
        ''' <summary>
        ''' Character giving the colour of the main title
        ''' </summary>
        ''' <returns></returns>
        <Parameter("main.col")> Public Property mainCol As String = "black"
        ''' <summary>
        ''' Number giving the cex (font size) of the main title
        ''' </summary>
        ''' <returns></returns>
        <Parameter("main.cex")> Public Property mainCex As Integer = 1
        ''' <summary>
        ''' Vector of length 2 indicating horizontal and vertical justification of the main title
        ''' </summary>
        ''' <returns></returns>
        <Parameter("main.just")> Public Property mainJust As RExpression = c(0.5, 1)
        ''' <summary>
        ''' Vector of length 2 indicating (x,y) of the subtitle
        ''' </summary>
        ''' <returns></returns>
        <Parameter("sub.pos")> Public Property subPos As RExpression = c(0.5, 1.05)
        ''' <summary>
        ''' Character giving the fontface (font style) of the subtitle
        ''' </summary>
        ''' <returns></returns>
        <Parameter("sub.fontface")> Public Property subFontface As String = "plain"
        ''' <summary>
        ''' Character giving the fontfamily (font type) of the subtitle
        ''' </summary>
        ''' <returns></returns>
        <Parameter("sub.fontfamily")> Public Property subFontfamily As String = "serif"
        ''' <summary>
        ''' Character Colour of the subtitle
        ''' </summary>
        ''' <returns></returns>
        <Parameter("sub.col")> Public Property subCol As String = "black"
        ''' <summary>
        ''' Number giving the cex (font size) of the subtitle
        ''' </summary>
        ''' <returns></returns>
        <Parameter("sub.cex")> Public Property subCex As Integer = 1
        ''' <summary>
        ''' Vector of length 2 indicating horizontal and vertical justification of the subtitle
        ''' </summary>
        ''' <returns></returns>
        <Parameter("sub.just")> Public Property subJust As RExpression = c(0.5, 1)
        ''' <summary>
        ''' Allow specification of category names using plotmath syntax
        ''' </summary>
        ''' <returns></returns>
        <Parameter("category.names")> Public Property categoryNames As RExpression = names("x")
        ''' <summary>
        ''' Logical specifying whether to use only unique elements in each item of the input list or use all elements. Defaults to TRUE
        ''' </summary>
        ''' <returns></returns>
        <Parameter("force.unique")> Public Property forceUnique As Boolean = True
        ''' <summary>
        ''' Can be either 'raw' or 'percent'. This is the format that the numbers will be printed in. Can pass in a vector with the second element being printed under the first
        ''' </summary>
        ''' <returns></returns>
        <Parameter("print.mode")> Public Property printMode As String = "raw"
        ''' <summary>
        ''' If one of the elements in print.mode is 'percent', then this is how many significant digits will be kept
        ''' </summary>
        ''' <returns></returns>
        Public Property sigdigs As Integer = 3
        ''' <summary>
        ''' If this is equal to true, then the vector passed into area.vector will be directly assigned to the areas of the corresponding regions. Only use this if you know which positions in the vector correspond to which regions in the diagram
        ''' </summary>
        ''' <returns></returns>
        <Parameter("direct.area")> Public Property directArea As Boolean = False
        ''' <summary>
        ''' An argument to be used when direct.area is true. These are the areas of the corresponding regions in the Venn Diagram
        ''' </summary>
        ''' <returns></returns>
        <Parameter("area.vector")> Public Property areaVector As Integer = 0
        ''' <summary>
        ''' If there are only two categories in the venn diagram and total.population is not NULL, then perform the hypergeometric test and add it to the sub title.
        ''' </summary>
        ''' <returns></returns>
        <Parameter("hyper.test")> Public Property hyperTest As Boolean = False
        ''' <summary>
        ''' An argument to be used when hyper.test is true. This is the total population size
        ''' </summary>
        ''' <returns></returns>
        <Parameter("total.population")> Public Property totalPopulation As RExpression = NULL

        ''' <summary>
        ''' The partition fill color
        ''' </summary>
        ''' <returns></returns>
        Public Property fill As RExpression

        ' (any further members of this class are not shown in this article)

    End Class

End Namespace
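The RFunc and Parameter attributes map the class to the R function name and each property to an R argument name (for example mainPos to main.pos); properties without a Parameter attribute, such as height, presumably keep their own name. How the library turns such an object into R code is not shown here, so the following is a hypothetical reflection-based sketch. RFunctionNameAttribute, RArgNameAttribute, VennOptions and RCallBuilder are stand-ins, not the real RDotNet.Extensions types.

```vb
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Linq
Imports System.Reflection

' Hypothetical stand-in for RFunc: names the R function to call.
<AttributeUsage(AttributeTargets.Class)>
Public Class RFunctionNameAttribute : Inherits Attribute
    Public ReadOnly Property Name As String
    Public Sub New(name As String)
        Me.Name = name
    End Sub
End Class

' Hypothetical stand-in for Parameter: names the R argument for a property.
<AttributeUsage(AttributeTargets.Property)>
Public Class RArgNameAttribute : Inherits Attribute
    Public ReadOnly Property Name As String
    Public Sub New(name As String)
        Me.Name = name
    End Sub
End Class

' A cut-down, hypothetical options class with three venn.diagram arguments.
<RFunctionName("venn.diagram")>
Public Class VennOptions
    <RArgName("filename")> Public Property FileName As String = "venn.tiff"
    Public Property height As Integer = 3000
    <RArgName("main.cex")> Public Property MainCex As Double = 1.5
End Class

Public Module RCallBuilder

    ' Renders a .NET value as an R literal; strings are quoted and escaped.
    Private Function ToRLiteral(value As Object) As String
        If value Is Nothing Then Return "NULL"
        If TypeOf value Is String Then
            Dim s As String = DirectCast(value, String).Replace("\", "\\").Replace("""", "\""")
            Return """" & s & """"
        End If
        If TypeOf value Is Boolean Then Return If(DirectCast(value, Boolean), "TRUE", "FALSE")
        ' Numbers use "." as the decimal separator regardless of the current culture.
        Return Convert.ToString(value, CultureInfo.InvariantCulture)
    End Function

    ''' <summary>Builds "fname(arg1 = v1, arg2 = v2, ...)" from the public properties of obj.</summary>
    Public Function ToRCall(obj As Object) As String
        Dim t As Type = obj.GetType()
        Dim fn = t.GetCustomAttribute(Of RFunctionNameAttribute)()
        Dim fname As String = If(fn Is Nothing, t.Name, fn.Name)
        Dim args As New List(Of String)()

        ' MetadataToken order follows declaration order in practice, keeping the output stable.
        For Each p In t.GetProperties(BindingFlags.Public Or BindingFlags.Instance).OrderBy(Function(pi) pi.MetadataToken)
            Dim arg = p.GetCustomAttribute(Of RArgNameAttribute)()
            Dim argName As String = If(arg Is Nothing, p.Name, arg.Name)
            args.Add($"{argName} = {ToRLiteral(p.GetValue(obj))}")
        Next
        Return $"{fname}({String.Join(", ", args)})"
    End Function

End Module
```

Because R matches these arguments by name, the order of the generated arguments does not affect the call; the stable order only makes the generated scripts easier to diff.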

The VennDiagram Data Model

Details of the R hybrid steps

The Venn diagram data model is the class RDotNet.Extensions.Bioinformatics.VennDiagram.ModelAPI.VennDiagram.

The function that converts the data model into an R script automatically:

Imports System.Drawing
Imports System.Text
Imports System.Xml.Serialization
Imports Microsoft.VisualBasic
Imports Microsoft.VisualBasic.DocumentFormat.Csv
Imports Microsoft.VisualBasic.DocumentFormat.Csv.DocumentStream
Imports Microsoft.VisualBasic.Linq
Imports Microsoft.VisualBasic.Linq.Extensions
Imports RDotNet.Extensions.VisualBasic
Imports RDotNet.Extensions.VisualBasic.Services.ScriptBuilder

Const venn__plots_out As String = NameOf(venn__plots_out)

''' <summary>
''' Converts this data model object into the R script for drawing the Venn diagram.
''' </summary>
''' <returns></returns>
''' <remarks></remarks>
Protected Overrides Function __R_script() As String
    Dim R As ScriptBuilder = New ScriptBuilder(capacity:=5 * 1024)
    Dim dataList As New List(Of String) ' The list elements for the venn diagram partitions
    Dim color As New List(Of String) ' The partitions color name vector

    For i As Integer = 0 To partitions.Length - 1
        Dim x As Partition = partitions(i)
        Dim objName As String = x.Name.NormalizePathString.Replace(" ", "_")

        R += $"d{i} <- c({x.Vector})"
        color += x.Color
        dataList += $"{objName}=d{i}"

        If Not String.Equals(x.Name, objName) Then
             Call $"{x.Name} => '{objName}'".__DEBUG_ECHO
        End If
    Next

    plot.categoryNames = c(partitions.ToArray(Function(x) x.DisplName))

    R += $"input_data <- list({dataList.JoinBy(",")})"
    R += $"fill_color <- {c(color.ToArray)}"

    ' Calling the venn.diagram R API
    R += venn__plots_out <= plot.Copy("input_data", "fill_color", plot.categoryNames)

    Return R.ToString
End Function
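For readers without the GCModeller libraries, here is a self-contained sketch of the kind of script __R_script produces, written against plain .NET types. PartitionSpec, VennScriptWriter and BuildScript are hypothetical names. One deliberate difference: elements and names are emitted as quoted, escaped R strings, so partition names may contain spaces and elements need not be numeric, whereas the original inserts x.Vector unquoted and replaces spaces in names with underscores.

```vb
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text

' Hypothetical stand-in for the library's Partition class.
Public Class PartitionSpec
    Public Property Name As String
    Public Property Color As String
    Public Property Elements As String()
End Class

Public Module VennScriptWriter

    ' Quotes a .NET string as an R string literal, escaping backslashes and quotes.
    Private Function Quote(s As String) As String
        Return """" & s.Replace("\", "\\").Replace("""", "\""") & """"
    End Function

    ''' <summary>Generates an R script that draws the partitions with VennDiagram::venn.diagram.</summary>
    Public Function BuildScript(parts As IList(Of PartitionSpec), outFile As String, title As String) As String
        Dim sb As New StringBuilder()
        Dim listItems As New List(Of String)()

        sb.AppendLine("library(VennDiagram)")
        For i As Integer = 0 To parts.Count - 1
            ' One R vector per partition: d0, d1, ...
            Dim elements As String = String.Join(", ", parts(i).Elements.Select(Function(e) Quote(e)))
            sb.AppendLine($"d{i} <- c({elements})")
            ' Quoted list names may contain spaces, so no renaming is needed.
            listItems.Add($"{Quote(parts(i).Name)} = d{i}")
        Next

        Dim inputList As String = String.Join(", ", listItems)
        Dim colors As String = String.Join(", ", parts.Select(Function(p) Quote(p.Color)))
        sb.AppendLine($"input_data <- list({inputList})")
        sb.AppendLine($"fill_color <- c({colors})")
        sb.AppendLine($"venn.diagram(input_data, fill = fill_color, filename = {Quote(outFile)}, main = {Quote(title)})")
        Return sb.ToString()
    End Function

End Module
```

Escaping also matters for Windows output paths: C:\out\venn.tiff is written as "C:\\out\\venn.tiff", which R reads back as the original path.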

Using the Venn diagram Model

To draw a Venn diagram directly from an existing Venn diagram XML model file, you can use the code below. It loads the Venn diagram data model from an existing XML document and then generates the R script directly from the model:

Imports Microsoft.VisualBasic.CommandLine.Reflection
Imports Microsoft.VisualBasic.ConsoleDevice.STDIO
Imports Microsoft.VisualBasic.Scripting.MetaData
Imports Microsoft.VisualBasic.Linq
Imports Microsoft.VisualBasic.DocumentFormat.Csv
Imports RDotNET.Extensions.VisualBasic.RSystem
Imports RDotNET.Extensions.VisualBasic
Imports RDotNET.Extensions.Bioinformatics.VennDiagram.ModelAPI

Dim venn As VennDiagram = path.LoadXml(Of VennDiagram)
Dim EXPORT As String = venn.saveTiff.TrimFileExt & ".r"

Call TryInit()
Call venn.RScript.SaveTo(EXPORT, Encodings.ASCII.GetEncodings)
Call RSystem.Source(EXPORT)
Call Process.Start(venn.saveTiff)
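The article does not show the XML layout of the VennDiagram model. Assuming path.LoadXml(Of T) behaves like a thin wrapper over System.Xml.Serialization.XmlSerializer (an assumption, not confirmed here), loading and saving a model looks roughly like the sketch below. The classes VennModelXml and VennPartitionXml and their element names are hypothetical, not the real file format.

```vb
Option Strict On
Option Infer On

Imports System
Imports System.IO
Imports System.Xml.Serialization

' Hypothetical XML shape for one partition: name and color as attributes, vector as text.
Public Class VennPartitionXml
    <XmlAttribute> Public Property Name As String
    <XmlAttribute> Public Property Color As String
    <XmlText> Public Property Vector As String
End Class

' Hypothetical root element of a Venn model file.
<XmlRoot("VennDiagram")>
Public Class VennModelXml
    Public Property Title As String
    Public Property SaveTiff As String
    <XmlElement("Partition")> Public Property Partitions As VennPartitionXml()
End Class

Public Module VennModelIO

    ''' <summary>Deserializes a model from XML text.</summary>
    Public Function Load(xml As String) As VennModelXml
        Dim serializer As New XmlSerializer(GetType(VennModelXml))
        Using reader As New StringReader(xml)
            Return DirectCast(serializer.Deserialize(reader), VennModelXml)
        End Using
    End Function

    ''' <summary>Serializes a model to XML text.</summary>
    Public Function Save(model As VennModelXml) As String
        Dim serializer As New XmlSerializer(GetType(VennModelXml))
        Using writer As New StringWriter()
            serializer.Serialize(writer, model)
            Return writer.ToString()
        End Using
    End Function

End Module
```

To read from a file instead of a string, open a StreamReader on the path and pass it to Deserialize in the same way.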

To draw a Venn diagram from a raw CSV data file, first convert the raw CSV dataset into the partitions of the Venn diagram with the function RModelAPI.Generate:

Private Function __run(inData As String, title As String, options As String, out As String, R_HOME As String) As Integer
    Dim dataset As DocumentStream.File = New DocumentStream.File(inData)
    Dim VennDiagram As VennDiagram = RModelAPI.Generate(source:=dataset)

    If String.IsNullOrEmpty(options) Then ' Infer the partitions from the raw data
        VennDiagram += From col As String In dataset.First Select {col, GetRandomColor()}
    Else ' Parse the partitions from the user input
        VennDiagram += From s As String In options.Split(CChar(";")) Select s.Split(CChar(","))
    End If

    VennDiagram.Title = title
    VennDiagram.saveTiff = out

    Dim RScript As String = VennDiagram.RScript
    Dim EXPORT As String = FileIO.FileSystem.GetParentPath(out)
    EXPORT = $"{EXPORT}/{title.NormalizePathString}_venn.r"

    If Not R_HOME.DirectoryExists Then
        Call TryInit()
    Else
        Call TryInit(R_HOME)
    End If

    Call RScript.SaveTo(EXPORT, Encodings.ASCII.GetEncodings)
    Call VennDiagram.SaveAsXml(EXPORT.TrimFileExt & ".Xml")
    Call RSystem.Source(EXPORT)

    Printf("The venn diagram R script was saved at location:\n '%s'", EXPORT)
    Call Process.Start(out)

    Return 0
End Function
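The options string (the -s switch of the CLI) is a ';'-separated list of name,color,display-name triples. A minimal parsing sketch follows; PartitionOptionParser is a hypothetical name. It splits each entry into at most three fields, so a comma inside the display name, such as "DEFGGG, HI" in the CLI example at the end of this article, stays part of the name, whereas s.Split(CChar(",")) in the code above would produce a fourth field.

```vb
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic

' Hypothetical parser for "name,color,display;name,color,display;..." option strings.
Public Module PartitionOptionParser

    ''' <summary>
    ''' Parses partition options. A missing color is returned as "" (the caller can
    ''' substitute a random color); a missing display name defaults to the partition name.
    ''' </summary>
    Public Function Parse(options As String) As List(Of (Name As String, Color As String, Title As String))
        Dim result As New List(Of (Name As String, Color As String, Title As String))()

        For Each entry In options.Split(";"c)
            If String.IsNullOrWhiteSpace(entry) Then Continue For

            ' At most three fields, so commas inside the display name are preserved.
            Dim fields = entry.Split(New Char() {","c}, 3)
            Dim name = fields(0).Trim()
            Dim color = If(fields.Length > 1, fields(1).Trim(), "")
            Dim title = If(fields.Length > 2, fields(2).Trim(), name)
            result.Add((name, color, title))
        Next

        Return result
    End Function

End Module
```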

Generating the partitions of the Venn diagram from the raw CSV data:

Imports System.Drawing
Imports System.Runtime.CompilerServices
Imports System.Text
Imports System.Xml.Serialization
Imports Microsoft.VisualBasic
Imports Microsoft.VisualBasic.DocumentFormat.Csv
Imports Microsoft.VisualBasic.DocumentFormat.Csv.DocumentStream
Imports Microsoft.VisualBasic.Linq
Imports Microsoft.VisualBasic.Linq.Extensions
Imports RDotNET.Extensions.VisualBasic

Namespace VennDiagram.ModelAPI

    Public Module RModelAPI

        ''' <summary>
        ''' Generates a Venn diagram data model from an Excel comma-separated (CSV) file
        ''' </summary>
        ''' <param name="source"></param>
        ''' <returns></returns>
        ''' <remarks></remarks>
        Public Function Generate(source As DocumentStream.File) As VennDiagram
            Dim LQuery = From vec
                         In __vector(source:=source)
                         Select New Partition With {
                             .Vector = String.Join(", ", vec.Value),
                             .Name = vec.Key
                         }
            Return New VennDiagram With {
                .partitions = LQuery.ToArray
            }
        End Function

        Private Function __vector(source As File) As Dictionary(Of String, String())
            Dim Width As Integer = source.First.Count
            Dim Vector = (From name As String
                          In source.First
                          Select k = name,
                              lst = New List(Of String)).ToArray

            For row As Integer = 1 To source.RowNumbers - 1
                Dim Line As RowObject = source(row)
                For colums As Integer = 0 To Width - 1
                    If Not String.IsNullOrEmpty(Line.Column(colums).Trim) Then
                        Call Vector(colums).lst.Add(CStr(row))
                    End If
                Next
            Next

            Return Vector.ToDictionary(Function(x) x.k, Function(x) x.lst.ToArray)
        End Function

    End Module

End Namespace
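The rule implemented by __vector is: each CSV column is one set, and the 1-based data row index is added to a column's set when that cell is non-blank. Here is a dependency-free sketch of the same rule that takes already-parsed rows (header first) instead of a DocumentStream.File; CsvToPartitions and ColumnsToSets are hypothetical names.

```vb
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical re-implementation of the column-to-set rule, for illustration only.
Public Module CsvToPartitions

    ''' <summary>
    ''' rows(0) is the header; every other row is one entity. Entity r (as text) is a
    ''' member of column c's set when rows(r)(c) is not blank.
    ''' </summary>
    Public Function ColumnsToSets(rows As IList(Of String())) As Dictionary(Of String, List(Of String))
        Dim header = rows(0)
        Dim sets = header.ToDictionary(Function(h) h, Function(h) New List(Of String)())

        For r As Integer = 1 To rows.Count - 1
            Dim cells = rows(r)
            For c As Integer = 0 To header.Length - 1
                ' Short rows are treated as blank in the missing columns instead of throwing.
                If c < cells.Length AndAlso Not String.IsNullOrWhiteSpace(cells(c)) Then
                    sets(header(c)).Add(CStr(r))
                End If
            Next
        Next

        Return sets
    End Function

End Module
```

Each resulting list corresponds to one Partition, with its elements joined by ", " as in Generate above.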

Running the example tools

An example tool for drawing Venn diagrams in VisualBasic has already been released on GitHub. You can download this example application from the example link and type venn man in the console to get the help manual of the venn tool:

E:\GCModeller\GCModeller-x64\Templates>venn man
GCModeller [version 1.3.11.2]
Module AssemblyName: file:///E:/GCModeller/GCModeller-x64/venn.exe
Root namespace: LANS.SystemsBiology.AnalysisTools.DataVisualization.VennDiagramTools

All of the command that available in this program has been list below:

 .Draw:  Draw the venn diagram from a csv data file, you can specific the diagram drawing options from this command switch value. The generated venn dragram will be saved as tiff file format.

Commands
--------------------------------------------------------------------------------
1.  Help for command '.Draw':

  Information:  Draw the venn diagram from a csv data file, you can specific the diagram drawing options from this command switch value. The generated venn dragram will be saved as tiff file format.
  Usage:        E:\GCModeller\GCModeller-x64\venn.exe .Draw -i <csv_file> [-t <diagram_title> -o <_diagram_saved_path> -s <partitions_option_pairs> -rbin <r_bin_directory>]
  Example:      venn .Draw .Draw -i /home/xieguigang/Desktop/genomes.csv -t genome-compared -o ~/Desktop/xcc8004.tiff -s "Xcc8004,blue,Xcc 8004;ecoli,green,Ecoli. K12;pa14,yellow,PA14;ftn,black,FTN;aciad,red,ACIAD"

  Parameters information:
   ---------------------------------------
    -i
    Description:  The csv data source file for drawing the venn diagram graph.

    Example:      -i "/home/xieguigang/Desktop/genomes.csv"

   [-t]
    Description:  Optional, the venn diagram title text

    Example:      -t "genome-compared"

   [-o]
    Description:  Optional, the saved file location for the venn diagram, if this switch value is not specific by the user then
              the program will save the generated venn diagram to user desktop folder and using the file name of the input csv file as default.

    Example:      -o "~/Desktop/xcc8004.tiff"

   [-s]
    Description:  Optional, the profile settings for the partitions in the venn diagram, each partition profile data is
               in a key value paired like: name,color, and each partition profile pair is seperated by a ';' character.
              If this switch value is not specific by the user then the program will trying to parse the partition name
              from the column values and apply for each partition a randomize color.

    Example:      -s "Xcc8004,blue,Xcc 8004;ecoli,green,Ecoli. K12;pa14,yellow,PA14;ftn,black,FTN;aciad,red,ACIAD"

   [-rbin]
    Description:  Optional, Set up the r bin path for drawing the venn diagram, if this switch value is not specific by the user then
              the program just output the venn diagram drawing R script file in a specific location, or if this switch
              value is specific by the user and is valid for call the R program then will output both venn diagram tiff image file and R script for drawing the output venn diagram.
              This switch value is just for the windows user, when this program was running on a LINUX/UNIX/MAC platform operating
              system, you can ignore this switch value, but you should install the R program in your linux/MAC first if you wish to
               get the venn diagram directly from this program.

    Example:      -rbin "C:\\R\\bin\\"

Using the example CLI utility:

venn .Draw -i <csv_file> [-t <diagram_title> -o <_diagram_saved_path> -s <serials_option_pairs> -rbin <r_bin_directory>]

A CLI example is:

venn .Draw -i "E:\GCModeller\GCModeller-x64\Templates\venn.csv" -t "test example plot title" -s objA,blue,"Object Test A";objB,red,"BBBB";objC,green,"3333333";objD,black,"DEFGGG, HI";objE,yellow,"Good!!"

[Figure: output of the example run]

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Mr. xieguigang 谢桂纲
Student, South China Microbial Resource Utilization Center (中国南方微生物资源利用中心, SMRUCC)
China
He is good and loves VisualBasic!



github: https://github.com/xieguigang

Last Updated 4 Apr 2016
Article Copyright 2016 by Mr. xieguigang 谢桂纲
Everything else Copyright © CodeProject, 1999-2018