MS Chart Control 和 VB.NET 在统计中的应用:绘制 Kaplan-Meier 估计。






4.90/5 (15投票s)
我们在 Web 应用程序中使用 VB.NET 和 MS Chart,从用户上传的或在线数据库中提取数据,使用 Kaplan-Meier 公式创建生命表并绘制生存曲线。整个操作通过用户友好的基于 Web 的向导完成。
引言
自从 17 世纪以来,随着精算科学和人口统计学的发展,生存分析开始出现。[1] 生存分析,今天被广泛应用于临床研究,不仅限于对实际生存率的分析,测量死亡率,而且具有更广泛的意义,包括临床试验研究者可能想要研究的任何“事件”。 Kaplan 和 Meier 在 1958 年发表的一篇研究论文,被引用超过 33,000 次[2],提出了他们“著名的”生存曲线估计器,这导致了生存分析的重大进展。
Kaplan 和 Meier 提出了经典生命表的“连续时间”版本,后者基于将时间划分为固定间隔[1]。
有很多商业上可用的桌面软件,用于计算总体或无事件生存率并绘制 Kaplan-Meier 估计;例如 IBM SPSS ® 和 MedCalc ®。“R”等开源统计软件也可使用,但是,“标准”用户仍然需要接受专门的培训才能有效使用该软件。虽然可以实现 Web 启用的数据收集和分析界面,但要执行它并非易事。
我们在 Web 应用程序中使用 VB.Net 和 MS Chart,从用户上传的或在线数据库中提取数据,使用 Kaplan-Meier 公式创建生命表并绘制生存曲线。整个操作通过用户友好的基于 Web 的向导完成。
向导首先要求用户选择要分析的数据库表。选择后,用户会将“开始日期”、“结束日期”、“事件”和“分层依据”与他上传表中的字段进行匹配。
然后将绘制曲线以及生命表。以编程方式生成的图表将通过 MS Chart 的工具提示属性描述指定时间点的生存数据。
我们的工具提供了一种“实时”分析生存数据的方法,可以对代码进行操作以符合研究方案的统计考量部分(例如,设置最佳的随访期)。可以导出生成的生命表以进行进一步的研究。 作为基于 Web 的应用程序,不需要客户端设置。 与其他统计软件相比,无需在服务器上安装其他软件。
背景
令 S(t) 为给定研究人群中的患者的寿命超过 t [3,4] 的概率。对于来自该人群的大小为 N 的样本,令观察到的 N 个样本成员的死亡时间为
在下面的示例中,有十名患者参加了一项研究,“持续时间”表示 [结束日期 - 开始日期],其中开始日期是研究或临床试验的登记日期。 如果发生“死亡”或“事件”,结束日期分别代表死亡日期或事件日期。 如果患者仍然存活或未发生指定的“事件”,则结束日期是最后随访日期。
对于每个 ti,对应的是 ni (时间 ti 之前“有风险”的数量),以及 di (时间 ti 发生的死亡或事件的数量)。
为了将其应用于前面的示例,让我们设置“事件”= 1 的时间点
第一个时间点 @ 3 个月
ni = 10
di = 1
第二个时间点 @ 11 个月
ni = 10(初始)– 1(患者在时间点 1 死亡)– 1(患者在持续时间 9 个月时存活)= 8
di = 2(两名患者在持续时间 11 个月时死亡)
即 此示例中 11 个月的生存率 = 67.5 %
使用代码
该项目由两个网页(向导 和 绘图)以及放置在 "App_Code" 文件夹中的 VB.Net 代码文件 (SurvivalCurve.vb) 组成。 以下代码以内联方式记录。
向导页面
以下是构建向导所需的 ASP.Net 控件
<asp:Label ID="Label1" runat="server" Text="Label"> </asp:Label>
<asp:Label ID="Label2" runat="server" Text="Label"> </asp:Label>
<asp:DropDownList ID="myList" runat="server"> </asp:DropDownList>
<asp:Button ID="Button3" runat="server" Text="Select Table" />
<asp:TextBox ID="choosetbl" runat="server">Table / Query Name</asp:TextBox>
<asp:Button ID="Button2" runat="server" Text="Cancel" />
<asp:Button ID="Button1" runat="server" Text="Next" />
以下是代码隐藏文件
Imports System.Data
Imports System.Data.OleDb
Partial Class DynamicKaplan
Inherits System.Web.UI.Page
Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
If Session("ChosenTable") = "" Then InitMyDB()
End Sub
Private Sub InitMyDB()
Dim myDB As New OleDbConnection
Try
myDB = New OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + Session("CurrentDB") + ";Persist Security Info=True")
myDB.Open()
Catch
Threading.Thread.Sleep(1)
Response.Redirect("Default.aspx")
End Try
'Retrieve the name of the current database
Dim tmpString = myDB.DataSource.ToString
Dim tmpInd = tmpString.LastIndexOf("|")
tmpString = Right(tmpString, Len(tmpString) - (tmpInd + 1))
Label1.Text = "The currently used Database is " & tmpString
Label2.Text = "Please select a table or query from the database to estimate Kaplan Meier curves from: "
'Retrieve the tables & queries schema
Dim mytables = myDB.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, New Object() {})
myDB.Close()
'Iterate through the schema & fill the names to the dynamically created list
' myList.Items.Clear()
For i = 1 To mytables.Rows.Count
'to clean the list from annoying system tables
If mytables.Rows(i - 1).Item(3).ToString = "VIEW" Then
'Item(2 is the third column containing table name from the schema)
myList.Items.Add(mytables.Rows(i - 1).Item(2).ToString)
End If
Next i
End Sub
Protected Sub Button1_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles Button1.Click
If Session("ChosenTable") = "" Then
'this is the first time click
Session("ChosenTable") = choosetbl.Text
Session("CurrentLoadState") = "Tables"
End If
LoadCases()
End Sub
Private Sub LoadCases()
Dim CurrentLoadStateEx = Session("CurrentLoadState")
Select Case CurrentLoadStateEx
Case "Tables"
'The user has pressed the "Next" Button for the first time, and chosen a table. It's time to fill the list with the available columns in the chosen table.
InitSurvivalStatus()
Session("CurrentLoadState") = "SurvivalStatus"
Case "SurvivalStatus"
Session("SurvivalStatusColumn") = myList.SelectedItem.Text
Label1.Text = Label1.Text & ", 'EVENT STATUS' Column is '" & Session("SurvivalStatusColumn") & "' "
Label2.Text = "Please select the field of 'REGISTRATION DATE' (The date of first contact) : "
myList.Items.Remove(myList.SelectedItem)
Session("CurrentLoadState") = "RegDate"
Case "RegDate"
Session("RegDateColumn") = myList.SelectedItem.Text
Label1.Text = Label1.Text & "& the 'REGISTRATION DATE' Column is '" & Session("RegDateColumn") & "' "
Label2.Text = "Please select the field of 'LAST CONTACT DATE' : "
myList.Items.Remove(myList.SelectedItem)
Session("CurrentLoadState") = "LastContact"
Case "LastContact"
Session("LastContactColumn") = myList.SelectedItem.Text
Label1.Text = Label1.Text & "& the 'LAST CONTACT DATE' Column is '" & Session("LastContactColumn") & "'"
Label2.Text = "Please select the field of 'STRATIFICATION' : "
myList.Items.Remove(myList.SelectedItem)
myList.Items.Insert(0, "None")
Session("CurrentLoadState") = "Strata"
Case "Strata"
Session("Strata") = myList.SelectedItem.Text
If Session("Strata") <> "None" Then
CheckStrata()
Else
Session("StrataCount") = 1
End If
Response.Redirect("KaplanAccess.aspx")
End Select
End Sub
Private Sub InitSurvivalStatus()
Dim myDB = New OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & Session("CurrentDB") & ";Persist Security Info=True")
myDB.Open()
Label1.Text = "The currently chosen table or view is '" & Session("ChosenTable") & "'"
Label2.Text = "Please select the field of 'SURVIVAL STATUS' where 1 means occurence of the event while 0 means no event : "
'Retrieve the tables & queries schema
Dim SqlString = "SELECT * FROM [" & Session("ChosenTable") & "];"
Dim myDBCommand = New OleDbCommand(SqlString, myDB)
Dim myReader As IDataReader = myDBCommand.ExecuteReader()
Dim myColumns = myReader.GetSchemaTable
myDB.Close()
myList.Items.Clear()
For I = 1 To myColumns.Rows.Count
myList.Items.Add(myColumns.Rows(I - 1).Item(0).ToString)
Next I
End Sub
Private Sub CheckStrata()
Dim myDB = New OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + Session("CurrentDB") + ";Persist Security Info=True")
myDB.Open()
Dim SqlString = "SELECT DISTINCT [" + Session("Strata") + "] FROM [" + Session("ChosenTable") + "] WHERE ([" + Session("Strata") + "] Is Not Null);"
Dim myDBCommand = New OleDbCommand(SqlString, myDB)
Dim myReader As IDataReader = myDBCommand.ExecuteReader()
Dim StrataCount As Integer
Dim StrataArray As New ArrayList
Do While myReader.Read
StrataArray.Add(myReader.GetValue(0))
StrataCount = StrataCount + 1
Loop
Session("StrataCount") = StrataCount
Session("StrataArray") = StrataArray
End Sub
Protected Sub Button2_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles Button2.Click
Response.Redirect("default.aspx")
End Sub
Protected Sub Button3_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles Button3.Click
choosetbl.Text = myList.SelectedItem.Text
End Sub
<pre lang="vb.net">End Class
绘图页面
在这里,我们添加 MS Chart 如下
<asp:Chart ID="Chart1" runat="server" Height="600px" Width="742px">
<pre lang="aspnet"> <ChartAreas>
<asp:ChartArea Name="ChartArea1">
</asp:ChartArea>
</ChartAreas>
</asp:Chart>
以及代码隐藏文件
Imports System.Web.UI.DataVisualization.Charting
Imports Microsoft.VisualBasic
Imports System.Data
Imports System.Data.OleDb
Imports System.Math
Partial Class KaplanAccess
Inherits System.Web.UI.Page
Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
LoadKaplanData()
'Chart visual settings
Chart1.ChartAreas(0).AxisY.Maximum = 1
Chart1.ChartAreas(0).AxisY.Interval = 0.1
Chart1.ChartAreas(0).AxisX.MajorGrid.Enabled = False
Chart1.ChartAreas(0).AxisY.MajorGrid.Enabled = False
Chart1.ChartAreas(0).AxisX.Title = "Duration of Survival in months"
Chart1.ChartAreas(0).AxisY.Title = "Cumulative Survival Propabaility"
Dim myTitle As New Title
myTitle.Text = "Kaplan Meier Survival Curve(s) for total of " & Session("TotalCasesCount").ToString & " cases"
Chart1.Titles.Add(myTitle)
End Sub
Private Sub LoadKaplanData()
'Get all settings from the session
Dim CurrentDB = Session("CurrentDB")
Dim ChosenTable = Session("ChosenTable")
Dim SurvivalStatusColumn = Session("SurvivalStatusColumn")
Dim RegDateColumn = Session("RegDateColumn")
Dim LastContactColumn = Session("LastContactColumn")
Dim StrataColumn = Session("Strata")
Dim StrataCount = CInt(Session("StrataCount"))
Dim StrataArray = CType(Session("StrataArray"), ArrayList)
'Open the Database
Dim myDB = New OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & CurrentDB & ";Persist Security Info=True")
myDB.Open()
'Base of the SQL String, Note that you can manipulate the DateDiff to add the optimum followup period recommended by the study protocol statistical consideration
Dim SQLString = "SELECT [" & RegDateColumn & "], [" & LastContactColumn & "], [" & SurvivalStatusColumn & "] , DateDiff('m',[" & RegDateColumn & "],[" & LastContactColumn & "]) AS survivalduration" _
& " FROM([" & ChosenTable _
& "]) WHERE ((([" & RegDateColumn & "]) Is Not Null) AND (([" & LastContactColumn & "]) Is Not Null) AND (([" & SurvivalStatusColumn & "]) Is Not Null)) and ([" & RegDateColumn & "] < [" & LastContactColumn & "])"
'Do this as many times as there is strata
For I = 1 To StrataCount
Dim FinalSQLString, SQLString2, Stratum As String
SQLString2 = SQLString
Stratum = "All cases"
'Append strata selection criteria if there is any strata
If StrataCount > 1 Then
If IsNumeric(StrataArray.Item(I - 1)) Then
SQLString2 = SQLString & "and ([" & StrataColumn & "]=" & StrataArray.Item(I - 1).ToString & ")"
Else
SQLString2 = SQLString & "and ([" & StrataColumn & "]='" & StrataArray.Item(I - 1).ToString & "')"
End If
Stratum = StrataArray.Item(I - 1).ToString
End If
'Formulate the final SQL String
FinalSQLString = SQLString2 & "ORDER BY DateDiff('m',[" & RegDateColumn & "],[" & LastContactColumn & "]), " & SurvivalStatusColumn & " DESC;"
Dim myDBCommand = New OleDbCommand(FinalSQLString, myDB)
'Generate a temporary reader to get the number of cases
Dim myReader As IDataReader = myDBCommand.ExecuteReader()
Dim CasesCount As Integer = 0
Do While myReader.Read
CasesCount = CasesCount + 1
Loop
myReader.Close()
'Generate the main reader
myReader = myDBCommand.ExecuteReader()
'Call the function which performs calculations
Session("TotalCasesCount") = CInt(Session("TotalCasesCount")) + Kaplan.KaplanTables(Me, myReader, CasesCount, Stratum, Chart1)
myReader.Close()
Next I
myDB.Close()
End Sub
End Class
SurvivalCurve.vb
Imports Microsoft.VisualBasic
Imports System.Web.UI.DataVisualization.Charting
Imports System.Data
Imports System.Data.OleDb
Imports System.Math
Public Class Kaplan
Public Shared Function KaplanTables(ByRef myPage As Page, ByRef myReader As IDataReader, ByVal CasesCount As Integer, ByVal Stratum As String, ByRef myChart As Chart) As Int32
'This function will take two variables, an OLE Database Connection with a table named "Query1", _
'This query should contain the Survival status as the third column, and the survival duration as the fourth colum,
'the other variable passed to this function should be a gridview to produce the lifetables in.
'The function will bind the generated lifetables to the gridview & return the total number of processed cases as Int32
'Start generating the life tables
Dim myTable As New DataTable
myTable.Columns.Add("Survial Status")
myTable.Columns.Add("Remaining Patients")
myTable.Columns.Add("Survial Duration")
myTable.Columns.Add("Survial Propability")
myTable.Columns.Add("Cumulative Survial Propability")
Dim myFirstRow As DataRow = myTable.NewRow
myFirstRow.Item(0) = 1
myFirstRow.Item(1) = CasesCount
myFirstRow.Item(2) = 0
myFirstRow.Item(3) = 1
myFirstRow.Item(4) = 1
myTable.Rows.Add(myFirstRow)
Dim Ptnseq = CasesCount
For I = 1 To CasesCount
Dim myRow As DataRow = myTable.NewRow
'Get only one record from KaplanTable
Dim Kaplantmp = myReader.Read
Ptnseq = Ptnseq - 1
myRow.Item(0) = myReader.GetValue(2)
myRow.Item(1) = Ptnseq 'Sets the total number of remaining patients
myRow.Item(2) = myReader.GetValue(3)
If myRow.Item(0) = 0 Then
myRow.Item(3) = myTable.Rows(I - 1).Item(3)
myRow.Item(4) = myTable.Rows(I - 1).Item(4)
ElseIf myRow.Item(0) = 1 Then
myRow.Item(3) = myRow.Item(1) / myTable.Rows(I - 1).Item(1)
myRow.Item(4) = myRow.Item(3) * myTable.Rows(I - 1).Item(4)
End If
myTable.Rows.Add(myRow)
Next I
'Finished generating the lifetables, bind it to a grid
Dim myGrid As New GridView 'Create a new dynamic Grid
Dim myLabel As New Label 'Create a new dynamic label for this grid
myPage.Form.Controls.Add(myLabel) 'add the label, then
myPage.Form.Controls.Add(myGrid) 'add the grid
myGrid.DataSource = myTable 'Bind the grid to the calculated lifetables
myGrid.DataBind()
DrawKaplanCurve(myTable, myChart, Stratum)
myLabel.Text = "Current Stratum is: " & Stratum & "<br/>" & "Total Number of cases is: " & (myTable.Rows.Count - 1).ToString & " Cases" '(if strata)
Return myTable.Rows.Count - 1
End Function
Public Shared Sub DrawKaplanCurve(ByVal myTable As DataTable, ByVal myChart As Chart, ByVal Stratum As String)
Dim KaplanSeries As New Series
KaplanSeries.ChartType = SeriesChartType.StepLine
KaplanSeries.Name = Stratum
Dim CensoredSeries As New Series
CensoredSeries.ChartType = SeriesChartType.Stock
CensoredSeries.Name = "Censored " & Stratum
For I = 1 To myTable.Rows.Count - 1
Dim myPoint As New DataPoint
Dim xval As Double = myTable.Rows(I).Item(2)
Dim yval As Double = myTable.Rows(I).Item(4)
myPoint.SetValueXY(xval, yval)
' If alive case, then add to censored data
If myTable.Rows(I).Item(0) = 0 Then
Dim CensoredPoint As New DataPoint
CensoredPoint.SetValueXY(myPoint.XValue, yval - 0.01, yval + 0.01)
CensoredPoint.ToolTip = "Censored Case Number " & myTable.Rows(I).Item(1).ToString & vbNewLine & "Survival Duration = " & myTable.Rows(I).Item(2).ToString & " months" & vbNewLine & "Cumulative Survival Propability = " & Round(yval * 100, 2).ToString & "%"
CensoredPoint.Color = myPoint.Color
If I <> myTable.Rows.Count - 1 Then CensoredSeries.Points.Add(CensoredPoint) 'add all except the last point because it shouldn't be censored
End If
If I = myTable.Rows.Count - 1 Then myPoint.Label = Round(yval * 100, 2).ToString & "%"
KaplanSeries.Points.Add(myPoint)
Next
myChart.Series.Add(KaplanSeries)
myChart.Series.Add(CensoredSeries)
Dim myLegend As New Legend
myLegend.TitleForeColor = myChart.Series(myChart.Series.Count - 1).Color
myChart.Legends.Add(myLegend)
End Sub
End Class
参考文献
1. martingales 在生存分析中的应用历史。 Odd O.AALEN, PerKragh ANDERSEN , Ørnulf BORGAN, Richard D. GILL 和 Niels KEIDING。 2009 年 6 月,概率和统计历史电子杂志,第 5 卷。
2. 来自不完整观测的非参数估计。 Kaplan, E. L. 和 Meier。 1958 年,美国统计协会杂志,pp. P.53, 457–481 和 562–563。
3. 社区,维基。 Kaplan–Meier 估计器。 wikipedia.org。 [在线] 2012。
4. Dunn, Steve。 生存曲线:应计和 Kaplan-Meier 估计。 cancerguide.org。 [在线] 2002。 http://www.cancerguide.org/scurve_km.html.