Grey Walker - 2 months ago · Question

Sequence or Batch items DataGridView

I have a large DataGridView with 940000 rows (ouch), filled by parsing a csv file. The DataGridView has a column named Sequence, numbered 1 to 940000. What I am attempting to do is re-number the Sequence column, splitting it into sequences of 1 to 7000 across the rows in the DataGridView. What's the most efficient way to reorder the sequence column?

Using reader As New Microsoft.VisualBasic.FileIO.TextFieldParser(fileName)
    reader.TextFieldType = FileIO.FieldType.Delimited
    Dim currentRow As String()
    Dim serial As String
    Dim sequence As Integer = 0
    Dim RollId As String

    'pbUploadFile.Maximum = serialAmmount / quantityBreak
    pbUploadFile.Maximum = serialAmmount
    pbUploadFile.Step = 1
    pbUploadFile.Value = 0

    For i = 1 To serialAmmount / quantityBreak
        For j = 1 To quantityBreak
            Try
                currentRow = reader.ReadFields()
                serial = currentRow(0).ToString
                sequence += 1
                EnterDataIntoDatabase(serial, sequence, nextRollNumber, ddSelectPartNumber.Text)
            Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
                MsgBox("Code " & ex.Message & " is not valid and will be skipped; check csv file")
            End Try
        Next j

        sqlCmd = New SqlClient.SqlCommand("SELECT * FROM serials WHERE Sequence=@sequence AND RollNo=@rollNo ", sqlCon)
        sqlCmd.CommandType = CommandType.Text
        sqlCmd.Parameters.AddWithValue("@sequence", 1)
        sqlCmd.Parameters.AddWithValue("@rollNo", nextRollNumber)
        Dim readRollId As SqlClient.SqlDataReader = sqlCmd.ExecuteReader()
        If readRollId.Read() Then
            RollId = readRollId.Item("Code")
        End If

        UpdateAvailableRolls(ddSelectPartNumber.Text, nextRollNumber, RollId)
        nextRollNumber += 1
        sequence = 0
    Next i
End Using


It is usually best to take into consideration how the data will be used when deciding exactly how to do something and what tools to use to do it. There is no one right | fast | efficient way to do most things.

That said, there are some bad ways of doing things. Using a DataGridView as a data container seems ill advised (I can't actually see anything related to a DGV in the code, though). A) there is no automatic way for the data to get into it - you had to write code to do it, and B) there is no automatic way for the data to go somewhere else - you have to write code to loop through it and fish the data back out. Then there is the matter of all the data likely being stored as strings.

There also looks to be more going on than just batching up items. The following will import rows from a CSV, process them and write them back to a DB (I am using MySql, but the concepts are the same).

First, the TextFieldParser is a pretty handy tool, but it has a major drawback in that it only returns strings. If the CSV has prices, dates, booleans, etc. in it, that type information is lost. In many cases, CSVHelper would be a better choice.
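As a sketch of what that looks like (the `SerialRecord` class and its properties are assumptions for illustration, not from the question), CsvHelper can map each row to a typed object so prices and dates keep their types:

```vbnet
' Hedged sketch using the CsvHelper NuGet package.
' SerialRecord and its properties are illustrative assumptions.
Imports System.Globalization
Imports CsvHelper

Public Class SerialRecord
    Public Property Serial As String
    Public Property Price As Decimal       ' stays a Decimal, not a string
    Public Property EnteredOn As DateTime  ' stays a DateTime
End Class

Module CsvDemo
    Sub ReadTyped(fileName As String)
        Using sr As New IO.StreamReader(fileName),
              csv As New CsvReader(sr, CultureInfo.InvariantCulture)
            ' GetRecords returns strongly typed objects, one per CSV row
            For Each rec As SerialRecord In csv.GetRecords(Of SerialRecord)()
                Console.WriteLine($"{rec.Serial}: {rec.Price}")
            Next
        End Using
    End Sub
End Module
```

With TextFieldParser you would get all of those back as strings and have to convert them yourself.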

In this case, since the data is destined for a database, I would use OleDB to read the CSV into a DataTable, batch it, then send it to the DB.

Import Data using OleDB


OleDb includes a text file driver which can be used to parse CSVs. It can "guess" at the data types based on the context of the first few rows, but you can also define them. In the folder/directory where the CSV resides, create a new text file named Schema.INI. Define the CSV and columns like this:

Col1="Country" Text Width 254
Col2="Capital City" Text Width 254
Col3="Population" Single
Col4="Rank" Integer
Col5="National Day" Date

  • You can have multiple csv definitions in a single file, each starting with [...]
  • The [...] would be the name of the CSV
  • If the CSV has a header row, it can use those for the column names
  • If the columns are also enclosed in quotes ("Like this","in","the csv"), use TextDelimiter="
  • Each Col#= entry defines the datatype and can override the name. This allows you to "map" a column named "Foo" in the CSV to one named "Bar" in the DB.
  • Other options like the decimal and currency symbol and the code page used in the file can be specified.
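Putting those points together, a complete Schema.INI for the sample file used below might look like this (the `[RandomOle.CSV]` section name matches the CSV file name; the header and delimiter options are illustrative):

```ini
[RandomOle.CSV]
ColNameHeader=True
Format=CSVDelimited
TextDelimiter="
Col1="Country" Text Width 254
Col2="Capital City" Text Width 254
Col3="Population" Single
Col4="Rank" Integer
Col5="National Day" Date
```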

Connection String

The connection string to use would be:

ACEImportConnStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='{0}';Extended Properties='TEXT'"

The Data Source would be the folder where both the CSV and Schema.INI exist and the 'TEXT' element tells it to use the Text driver. Fill in the blank using the folder name:

ACEImportConnStr = String.Format(ACEImportConnStr, "C:\Temp")

OLEDB.12 can sometimes be finicky; if you have problems, use Microsoft.Jet.OLEDB.4.0 for the Provider instead.
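For example, a Jet version of the same connection string (assuming the same C:\Temp folder) would be:

```vbnet
' Fallback for machines without the ACE provider installed.
' Note that Jet is 32-bit only, so the app must target x86.
ACEImportConnStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source='C:\Temp';" &
                   "Extended Properties='TEXT;HDR=YES'"
```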

Now, to load the data, just select from the CSV file name (no folder):

Dim sSQL = "SELECT * FROM RandomOle.CSV"
Dim daSrc = New OleDbDataAdapter(sSQL, ACEImportConnStr)
rowsLoaded = daSrc.Fill(dtSample)

The DataAdapter will read the Schema for the definitions and load the CSV into a datatable in just a few seconds. There is more to be done to handle other tasks, but that is the concept.

Dim sw As New Stopwatch
sw.Start()

Dim rowsLoaded As Int32
Dim rowsUpdated As Int32

Dim sSQL = "SELECT * FROM RandomOle.CSV"
ACEImportConnStr = String.Format(ACEImportConnStr, "C:\Temp")

' create Destination MySQL conn, Src and Dest dataadapters,
' and a command builder (because I am lazy...and fallible)
Using mysqlCon As New MySqlConnection(MySQLConnStr),
    daSrc As New OleDbDataAdapter(sSQL, ACEImportConnStr),
    daDest As New MySqlDataAdapter("SELECT * FROM Sample", mysqlCon),
    cb As New MySqlCommandBuilder(daDest)

    ' important!
    daSrc.AcceptChangesDuringFill = False

    dtSample = New DataTable
    rowsLoaded = daSrc.Fill(dtSample)

    ' csv lacks an ID column - add it
    Dim dc As New DataColumn("Id", GetType(Int32))
    dc.DefaultValue = 1
    dtSample.Columns.Add(dc)

    ' MY csv also lacks a BATCH column
    dc = New DataColumn("Batch", GetType(Int32))
    dc.DefaultValue = 1
    dtSample.Columns.Add(dc)

    ' set the batch number
    ' each 5k rows == a batch
    Dim batch As Int32 = 1
    Dim counter As Int32 = 1
    For Each dr As DataRow In dtSample.Rows
        dr("Batch") = batch
        counter += 1
        If counter > 5000 Then
            counter = 0
            batch += 1
        End If
    Next

    ' now save the data to MySQL
    ' inserting 250k rows takes a while,
    ' use a transaction
    mysqlCon.Open()
    Using t As MySqlTransaction = mysqlCon.BeginTransaction
        rowsUpdated = daDest.Update(dtSample)
        t.Commit()
    End Using

End Using

' show the IMPORT in a dgv
dgv1.DataSource = dtSample
dgv1.Columns("Id").Visible = False

' report
sw.Stop()
Console.WriteLine($"Loaded {rowsLoaded:N0} rows; saved {rowsUpdated:N0} rows in {sw.Elapsed.TotalSeconds:N1} seconds")

The principle is simple: since the data is bound for a DB, get the data into a DataTable ASAP. The trick here is that there are 2 DB Providers involved: OleDB to read the csv and MySql for the saving.

  • Normally when a DataAdapter fills a DataTable all the rows are set to Unchanged. AcceptChangesDuringFill = False leaves the states set to Added so that the MySql adapter can insert those rows.
  • The CommandBuilder builds the INSERT SQL to be used from the SELECT command.
  • I don't know what that serials-rollno query is doing, but I would not run queries inside the import process loop. If some of the values you need to set depend on values in the DB, load them into another DT and query them from there. There are some DataTable extension methods that make it easy to find rows.
  • Likewise, I don't know what EnterDataIntoDatabase does, but you should strive to process and prepare all the imported data in the DataTable, then update it all at once.
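To illustrate the lookup-table idea, here is a hedged sketch that assumes the serials table and columns from the question's own query - load it once before the loop, then find rows in memory instead of round-tripping to the DB each pass:

```vbnet
' Hedged sketch: table/column names come from the question's serials query.
' Load the lookup rows once...
Dim dtRolls As New DataTable
Using daRolls As New MySqlDataAdapter("SELECT Code, Sequence, RollNo FROM serials", mysqlCon)
    daRolls.Fill(dtRolls)
End Using

' ...then DataTable.Select replaces a per-iteration query against the DB
Dim hits = dtRolls.Select($"Sequence = 1 AND RollNo = {nextRollNumber}")
If hits.Length > 0 Then
    RollId = hits(0)("Code").ToString()
End If
```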

You appear to have more going on than just batching or sequencing a bunch of rows. The code above can import 250k rows, assign batch numbers, and insert 250k new rows into MySql in 1.2 minutes (almost 3500 rows per second).

If the batch/sequence logic really is just numbering every X rows in order from the CSV, you might be able to load 7000 rows at a time, set the value, save that batch and then load the next 7k rows. This would limit the number of rows loaded at any one time and reduce the memory the app uses. I am not sure if it applies, though.
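A hedged sketch of that paging idea, using the `startRecord`/`maxRecords` overload of `Fill` and assuming the `daSrc`/`daDest` adapters are configured as in the code above (including `AcceptChangesDuringFill = False`):

```vbnet
' Hedged sketch: page through the CSV 7000 rows at a time, stamp each
' page with a batch number, and save it before loading the next page.
Dim pageSize As Int32 = 7000
Dim startRow As Int32 = 0
Dim batchNo As Int32 = 1
Do
    Dim dtPage As New DataTable
    Dim fetched = daSrc.Fill(startRow, pageSize, dtPage)
    If fetched = 0 Then Exit Do

    ' my csv lacks a Batch column, so add it to each page
    dtPage.Columns.Add(New DataColumn("Batch", GetType(Int32)))
    For Each dr As DataRow In dtPage.Rows
        dr("Batch") = batchNo
    Next

    daDest.Update(dtPage)   ' save just this 7k-row batch
    startRow += fetched
    batchNo += 1
Loop
```

Only one page of rows is in memory at a time, at the cost of re-opening the CSV scan for each page.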