Sparkm4n Sparkm4n - 6 months ago 76
Vb.net Question

get column names Jet OLE DB in vb.net

I've written a function which reads csv files and parametrizes them accordingly, therefore i have a function gettypessql which queries sql table at first to get data types and therefore to adjust the columns which are later inserted in sql. So my problem is when I set HDR=Yes in Jet OLE DB I get only column names like F1, F2, F3. To circumvent this issue I've set HDR=No and written some for loops but now I get only empty strings, what is actually the problem? here is my code:

Private Function GetCSVFile(ByVal file As String, ByVal min As Integer, ByVal max As Integer) As DataTable
Dim ConStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & TextBox1.Text & ";Extended Properties=""TEXT;HDR=NO;IMEX=1;FMT=Delimited;CharacterSet=65001"""
Dim conn As New OleDb.OleDbConnection(ConStr)
Dim dt As New DataTable
Dim da As OleDb.OleDbDataAdapter = Nothing
getData = Nothing

Try
Dim CMD As String = "Select * from " & _table & ".csv"
da = New OleDb.OleDbDataAdapter(CMD, conn)
da.Fill(min, max, dt)
getData = New DataTable(_table)
Dim firstRow As DataRow = dt.Rows(0)

For i As Integer = 0 To dt.Columns.Count - 1
Dim columnName As String = firstRow(i).ToString()
Dim newColumn As New DataColumn(columnName, mListOfTypes(i))
getData.Columns.Add(newColumn)
Next

For i As Integer = 1 To dt.Rows.Count - 1
Dim row As DataRow = dt.Rows(i)
Dim newRow As DataRow = getData.NewRow()

For j As Integer = 0 To getData.Columns.Count - 1
If row(j).GetType Is GetType(String) Then
Dim colValue As String = row(j).ToString()
colValue = ChangeEncoding(colValue)
colValue = ParseString(colValue)
colValue = ReplaceChars(colValue)
newRow(j) = colValue
Else
newRow(j) = row(j)
End If
Next

getData.Rows.Add(newRow)
Application.DoEvents()
Next
Catch ex As OleDbException
MessageBox.Show(ex.Message)
Catch ex As Exception
MessageBox.Show(ex.Message)
Finally
dt.Dispose()
da.Dispose()
End Try

Return getData
End Function


and get types sql, this one doesn't convert properly, especially doubles

Private Sub GetTypesSQL()
If (mListOfTypes Is Nothing) Then
mListOfTypes = New List(Of Type)()
End If

mListOfTypes.Clear()

Dim dtTabelShema As DataTable = db.GetDataTable("SELECT TOP 0 * FROM " & _table)

Using dtTabelShema
For Each col As DataColumn In dtTabelShema.Columns
mListOfTypes.Add(col.DataType)
Next
End Using
End Sub

Answer

I think you have made it more complicated than it needs to be. For instance, you get the dbSchema by creating an empty DataTable and harvesting the Datatypes from it. Why not just use that first table rather than creating a new table from the Types? The table also need not be reconstructed over and over for each batch of rows imported.

Generally since OleDb will try to infer types from the data, it seems unnecessary and may even get in the way in some cases. Also, you are redoing everything that OleDB does and copying data to a different DT. Given that, I'd skip the overhead OleDB imposes and work with the raw data.

This creates the destination table using the CSV column name and the Type from the Database. If the CSV is not in the same column order as those delivered in a SELECT * query, it will fail.


The following uses a class to map csv columns to db table columns so the code is not depending on the CSVs being in the same order (since they may be generated externally). My sample data CSV is not in the same order:

Public Class CSVMapItem

    Public Property CSVIndex As Int32
    Public Property ColName As String = ""
   'optional
    Public Property DataType As Type

    Public Sub New(ndx As Int32, csvName As String,
                   dtCols As DataColumnCollection)

        CSVIndex = ndx

        For Each dc As DataColumn In dtCols
            If String.Compare(dc.ColumnName, csvName, True) = 0 Then
                ColName = dc.ColumnName
                DataType = dc.DataType
                Exit For
            End If
        Next

        If String.IsNullOrEmpty(ColName) Then
            Throw New ArgumentException("Cannot find column: " & csvName)
        End If
    End Sub
End Class

The code to parse the csv uses CSVHelper but in this case the TextFieldParser could be used since the code just reads the CSV rows into a string array.

Dim SQL = String.Format("SELECT * FROM {0} WHERE ID<0", DBTblName)
Dim rowCount As Int32 = 0
Dim totalRows As Int32 = 0
Dim sw As New Stopwatch
sw.Start()

Using dbcon As New MySqlConnection(MySQLConnStr)
    Using cmd As New MySqlCommand(SQL, dbcon)

        dtSample = New DataTable
        dbcon.Open()

        ' load empty DT, create the insert command
        daSample = New MySqlDataAdapter(cmd)
        Dim cb = New MySqlCommandBuilder(daSample)
        daSample.InsertCommand = cb.GetInsertCommand
        dtSample.Load(cmd.ExecuteReader())

        ' dtSample is not only empty, but has the columns
        ' we need

        Dim csvMap As New List(Of CSVMapItem)

        Using sr As New StreamReader(csvfile, False),
                        parser = New CsvParser(sr)

            ' col names from CSV
            Dim csvNames = parser.Read()
            ' create a map of CSV index to DT Columnname  SEE NOTE
            For n As Int32 = 0 To csvNames.Length - 1
                csvMap.Add(New CSVMapItem(n, csvNames(n), dtSample.Columns))
            Next

            ' line data read as string
            Dim data As String()
            data = parser.Read()
            Dim dr As DataRow

            Do Until data Is Nothing OrElse data.Length = 0

                dr = dtSample.NewRow()

                For Each item In csvMap
                    ' optional/as needed type conversion
                    If item.DataType = GetType(Boolean) Then
                        ' "1" wont convert to bool, but (int)1 will
                        dr(item.ColName) = Convert.ToInt32(data(item.CSVIndex).Trim)
                    Else
                        dr(item.ColName) = data(item.CSVIndex).Trim
                    End If
                Next
                dtSample.Rows.Add(dr)
                rowCount += 1

                data = parser.Read()

                If rowCount = 50000 OrElse (data Is Nothing OrElse data.Length = 0) Then
                    totalRows += daSample.Update(dtSample)
                    ' empty the table if there will be more than 100k rows
                    dtSample.Rows.Clear()
                    rowCount = 0
                End If
            Loop
        End Using

    End Using
End Using
sw.Stop()
Console.WriteLine("Parsed and imported {0} rows in {1}", totalRows,
                    sw.Elapsed.TotalMinutes)

The processing loop updates the DB every 50K rows in case there are many many rows. It also does it in one pass rather than reading N rows thru OleDB at a time. CsvParser will read one row at a time, so there should never be more than 50,001 rows worth of data on hand at a time.

There may be special cases to handle for type conversions as shown with If item.DataType = GetType(Boolean) Then. A Boolean column read in as "1" cant be directly passed to a Boolean column, so it is converted to integer which can. There could be other conversions such as for funky dates.

Time to process 250,001 rows: 3.7 mins. An app which needs to apply those string transforms to every single string column will take much longer. I'm pretty sure that using the CsvReader in CSVHelper you could have those applied as part of parsing to a Type.


There is a potential disaster waiting to happen since this is meant to be an all-purpose importer/scrubber.

For i As Integer = 0 To dt.Columns.Count - 1
    Dim columnName As String = firstRow(i).ToString()
    Dim newColumn As New DataColumn(columnName, mListOfTypes(i))
    getData.Columns.Add(newColumn)
Next

Both the question and the self-answer build the new table using the column names from the CSV and the DataTypes from a SELECT * query on the destination table. So, it assumes the CSV Columns are in the same order that SELECT * will return them, and that all CSVs will always use the same names as the tables.

The answer above is marginally better in that it finds and matches based on name.

A more robust solution is to write a little utility app where a user maps a DB column name to a CSV index. Save the results to a List(Of CSVMapItem) and serialize it. There could be a whole collection of these saved to disk. Then, rather than creating a map based on dead reckoning, just deserialize the desired for user as the csvMap in the above code.