Sparkm4n Sparkm4n - 3 months ago 27x Question

Import to DataTable using OleDB/Jet

I wrote a short function with Oledb which should read large amount of data but there are still several issues I couldnt't solve, it's about reading and inserting semicolon separated data in sql database

Private Function GetCSVFile(ByVal file As String) As DataTable

Dim dt As New DataTable
Dim ConStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & TextBox1.Text & ";Extended Properties=""TEXT;HDR=Yes;FMT=Delimited"""
Dim conn As New OleDb.OleDbConnection(ConStr)

Dim da As New OleDb.OleDbDataAdapter("Select * from " & _table & ".csv", conn)
getData = dt
Catch ex As OleDbException
Catch ex As Exception
End Try
Return getData
End Function

1. it reads whole file but i need to tell the function that it should read only 50.000 rows and pass them to another function step for step, it should probably better work in for loop because jet oledb doesn't read files larger than 1 GB

  1. i need to replace characters

value(i) = value(i).Replace("\t", Constants.vbTab).Replace("\n", Constants.vbLf).Replace("\r", Constants.vbCr).Replace("\""", """").Replace("\\", "\")

but it usually works only for strings

  1. I need to recognize data types like integer, double, string etc. my first idea was to do it over sql query and check the tables over tryparse

    Dim dtInserts As DataTable = db.GetDataTable("SELECT TOP 0 * FROM " & _table)
    Dim ListOfTypes As New List(Of System.Type)
    For Each _col As DataColumn In dtInserts.Columns
    Dim _type As System.Type = _col.DataType
    Dim _wert1 As String = "11.11.2011"
    Dim _type1 As System.Type = ListOfTypes.Item(1)
    If DateTime.TryParse(_wert1, New Date) Then
    End If

    but still not sure if it will work

  2. All data that have been read should be encoded in 1252 Codepage.
    this one doesn't really work

Dim ConStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & TextBox1.Text & ";Extended Properties=""TEXT;HDR=Yes;FMT=Delimited;CODEPAGE=1252"""

anyone idea what could be done?


CSVHelper can do most of what it sounds like you want. FileHelpers might also work, I just haven't worked with it as much. I wouldn't recommend VB's TextFieldParser because it returns a string array rather than typed data.

OleDB provides a great way to import, but saving to the database presents a small challenge. All the loaded rows will have a RowState of Unchanged. The only way to change that to Added is to copy the rows into a new table:

For Each dr As DataRow In dtCSV.Rows

It will work, but as a result, you will have 2 tables and all those rows loaded at once. It turns out that using CSVHelper and a simple INSERT query is the most economical - since the records are fetched from an IEnumerable<T> only 1 source record will be loaded at a time. It is also a bit faster than copying rows: about 20% faster on 500k rows.

Note: We have no idea whatsoever what the data looks like other than it is semicolon delimited...and there is apparently a lot of it.

Using sr As New StreamReader(CSVFilePath, False),
     csv As New CsvReader(sr)

    ' some CSVHelper config options
    csv.Configuration.HasHeaderRecord = True
    csv.Configuration.TrimFields = True
    csv.Configuration.TrimHeaders = True
    csv.Configuration.Delimiter = ";"
    csv.Configuration.IsHeaderCaseSensitive = False
    csv.Configuration.RegisterClassMap(Of RandItem.RandItemMap)()

    ' get the file into IEnumerable collection
    Dim csvData = csv.GetRecords(Of OleImportItem)()

    Dim SQL = <sql>
              INSERT INTO RandomData 
                     (Foo, Bar, Cat, Dog...)
                    (@p1, @p2, @p3, @p4...)

    Using dbcon As New OleDbConnection(ACEConnStr)
        Using cmd As New OleDbCommand(SQL, dbcon)

            ' create the parameters
            cmd.Parameters.Add("@p1", OleDbType.VarChar)
            cmd.Parameters.Add("@p2", OleDbType.VarChar)
            cmd.Parameters.Add("@p3", OleDbType.Integer)
            cmd.Parameters.Add("@p4", OleDbType.Integer)
            ' load one item at a time, to save it
            For Each item In csvData
                cmd.Parameters("@p1").Value = item.Foo
                cmd.Parameters("@p2").Value = item.Bar
                cmd.Parameters("@p3").Value = item.Cat
                cmd.Parameters("@p4").Value = item.Dog
        End Using
    End Using
End Using

I cant give a complete tutorial on how to use it, but in general you create a type (Class) which defines the datatype for each column (here,RandItem), the RandItemMap is another class which specifies the order of those properties in the file. In so doing, CSVHelper knows the datatype of each column and will convert for you.

There are several ways to use it, this way is reading one line at a time from the file and immediately saves that item to the database: csvData = csv.GetRecords(Of OleImportItem)() initializes csvData as an IEnumberable(Of RandItem), so only one item is loaded at a time in the loop which makes it very economical.

In the loop, the code gets a typed item from the csv file which is then mapped to the respective parameters and saved. This is about 20% faster than copying the data to a DataTable in batches and saving; and much less memory intensive since there is only ever 1 item loaded.

For smaller csv files, you could use .ToArray() or ToList() to load the file to a collection.

Note Importing/parsing a text file without headers is a little different than shown, but just as easy.

It sounds like CSVHelper might obviate most issues in your laundry list.