Sparkm4n Sparkm4n - 1 year ago 70 Question

Import range of rows / batches from CSV with OleDB

I want to fetch dynamically selected rows with specific range with ole db, thats the code that i wrote:

Dim ConStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & TextBox1.Text & ";Extended Properties=""TEXT;HDR=Yes;FMT=Delimited"""
Dim conn As New OleDb.OleDbConnection(ConStr)
Dim dt As New DataTable
Dim CMD As String = "Select * from " & _table & ".csv WHERE ID BETWEEN " & min & "AND " & max
Dim da As New OleDb.OleDbDataAdapter(CMD, conn)

it runs in a for loop with max and min and fetches rows step for step and passes them to another function, but the problem is that the code above doesn't really work because it looks after ID but it should actually look after rows. I mean there are 70.000 rows but 200.000 ids. how can i select rows in oledb?

Answer Source

Fundamentally, Ids do not really have any relation to row count except by accident. There is an overload for the DataAdapter.Fill() method which lets you specify the range of rows loaded. I double checked to be sure it works with the OleDB Text Driver.

Private OLECSVConnstr = ...your text driver connection string
Private firstRow As Int32 = 0
Private rowCount As Int32 = 1000
Private Function ImportRows(csvFile As String) As Int32

    Dim SQL = String.Format("SELECT * FROM {0}", csvFile)
    Dim rows As Int32

    Using dbcon As New OleDbConnection(OLECSVConnstr)
        Using cmd As New OleDbCommand(SQL, dbcon)

            ' DataTable is IDiposable, so dont just
            '   (re)create a new one each time
            If dtSample Is Nothing Then
                dtSample = New DataTable
            End If

            Using da As New OleDbDataAdapter(cmd)
                rows = da.Fill(firstRow, rowCount, dtSample)
            End Using

            ' increment firstrow for next time
            firstRow += rowCount

        End Using
    End Using

    dgv2.DataSource = dtSample
    Return rows

End Function

Edit: Since DataTable implements IDisposable creating a new one for each "batch"/row set may result in a leak. The above checks and creates a new one if needed and clears the rows otherwise. Alternatively, you could create the DataAdapter once only.

DataAdapter.Fill(int first, int count, DataTable dt) allows you to tell the adapter the first row to load and how many. The ImportRows method above returns how many rows were actually loaded so when it returns fewer than the number requested, it should mean there are no more rows.

With such a big file, I would use a Schema.INI with the OleDB Text Driver so you can specify the data types used for each row. Without it, the Text Driver determines data types (guesses) from the data in the first few rows.

I should mention that CSVHelper could also be used to load rows in batches as well. It reads the CSV a line at a time as you call Read(), so all you'd have to do is maintain a reference to the CSV reader and call it in a loop to get N rows at a time.

DBDataAdapter.Fill on MSDN