insomniac insomniac - 1 month ago 5
SQL Question

Why does this VBA code for SQL queries on CSV files work intermittently?

A very simple query function that takes in a path for a source CSV file and a SQL statement as a string (I'm also transposing the data from the VBA function),

Public Function RunQuery(FilePath As String, SQLStatement As String)

Dim Conn As New ADODB.Connection
Dim RecSet As New ADODB.Recordset

With Conn
.Provider = "Microsoft.Jet.OLEDB.4.0"
.ConnectionString = "Data Source=" & FilePath & ";" & _
"Extended Properties=""text;HDR=Yes;FMT=Delimited;IMEX=1"""
End With

Conn.Open
RecSet.Open SQLStatement, Conn
RecSet.MoveFirst
RunQuery = RecSet.GetRows()

Conn.Close
Set RecSet = Nothing
Set Conn = Nothing

End Function


This code works intermittently against a CSV files, some data is retrieved correctly and some is not.

An example are these two CSV files - Abbreviated and Full. The following SQL query works perfectly on the Abbreviated file, but returns #VALUE on the Full file.

SELECT birthYear FROM [File]


It's definitely not a data limit/size issue as the Full file only contains 1800 rows. I'm completely befuddled and would appreciate any thoughts/pointers.

Incidentally if I wrap up the logic into a Sub rather than a UDF then it works perfectly without any errors,

Public Sub RunQuerySub()

Dim Conn As New ADODB.Connection
Dim RecSet As New ADODB.Recordset
Dim FilePath As String
FilePath = ActiveSheet.Range("Path")

With Conn
.Provider = "Microsoft.Jet.OLEDB.4.0"
.ConnectionString = "Data Source=" & FilePath & ";" & _
"Extended Properties=""text;HDR=Yes;FMT=Delimited;IMEX=1"""
End With
Dim SQLStatement As String
SQLStatement = ActiveSheet.Range("SQL")

Conn.Open
RecSet.Open SQLStatement, Conn
ActiveSheet.Cells(1, 8).CopyFromRecordset RecSet

Conn.Close
Set RecSet = Nothing
Set Conn = Nothing

End Sub


I am very confused, and would appreciate any pointers.

Answer

I adapted the technique for using a Sub and managed to get a Function which returns an array for both abbreviated and full files.

Highlight a range of 1892 cells in a column & use this array function

=RunQuery("C:\stackoverflow", "SELECT birthYear FROM [full.csv]")

This is the function. It replaces Null values in the resultset with zero.

Public Function RunQuery(FilePath As String, SQLStatement As String)

    Dim Conn As New ADODB.Connection
    Dim RecSet As New ADODB.Recordset
    Dim rows As Variant
    On Error GoTo ErrHandler
    With Conn
        .Provider = "Microsoft.Jet.OLEDB.4.0"
        .ConnectionString = "Data Source=" & FilePath & ";" & _
        "Extended Properties=""text;HDR=Yes;FMT=Delimited;IMEX=1"""
    End With

    Conn.Open
    RecSet.Open SQLStatement, Conn
    RecSet.MoveFirst
    rows = RecSet.GetRows()

    Conn.Close
    Set RecSet = Nothing
    Set Conn = Nothing

    Dim nrows As Integer, i As Integer, valu As Integer
    nrows = UBound(rows, 2) + 1
    ReDim arr2(1 To nrows, 1 To 1) As Integer
    For i = 1 To nrows
        If IsNull(rows(0, i - 1)) Then
            valu = 0
        Else
            valu = rows(0, i - 1)
        End If
        arr2(i, 1) = valu
    Next
    RunQuery = arr2
    Exit Function

ErrHandler:
    Debug.Print Err.Number, Err.Description
    Resume Next
End Function