A method for quickly importing and working with CSV files (part 2)

This is part 2 in which I explain how I solved the problem. Make sure to read the previous blog posts. Taken together, these show I though through and ultimately solved the problem.

Query Tables to the Rescue!

To solve this problem, I stumbled on something I rarely use, and that’s our good ol’ friend Query Tables. I don’t think they go by Query Tables anymore, but that’s what what they’re called in the code, so that’s the terminology I’m going to use here.

You might have used Query Tables as a manual feature before. For instance, to import a CSV file into Excel using Query Tables you would simply select From Text in the Get External Data Group from on the Data tab. See the snapshot below. These are Query Tables.

image

This functionality is really meant to create a connection between Excel and some other data source. The idea being you can manipulate and analyze data using Excel as your business logic and presentation layer.

But I didn’t need any of that here. I never expected the data in the underlying CSV file to change, so a connection that could refresh wasn’t necessary. Furthermore, unless you’re using these connections with PowerPivot, Excel has a bad history linking to other files.

But what Query Tables brings to the table (pun intended) is that they can dump an entire table’s worth of data rather quickly into a worksheet tab in one fell swoop. Try it for yourself manually to see how quick it is.

In fact, I was able to automate the entire process using the code below.

   1: Private Sub UploadFiles(FilePath As String, WorksheetTabName As String)

   2:     On Error Resume Next

   3:

   4:     Dim CurrentWorksheet As Excel.Worksheet

   5:     Set CurrentWorksheet = Worksheets(WorksheetTabName)

   6:

   7:     With CurrentWorksheet.QueryTables.Add(Connection:= _

   8:         "TEXT;" & FilePath _

   9:         , Destination:=CurrentWorksheet.Range("$A$1"))

  10:         .CommandType = 0

  11:         .Name = "DeleteMe"

  12:         .FieldNames = True

  13:         .RefreshOnFileOpen = False

  14:         .RefreshStyle = xlOverwriteCells

  15:         .AdjustColumnWidth = True

  16:         .RefreshPeriod = 0

  17:         .TextFilePlatform = 437

  18:         .TextFileStartRow = 1

  19:         .TextFileParseType = xlDelimited

  20:         .TextFileTextQualifier = xlTextQualifierDoubleQuote

  21:         .TextFileConsecutiveDelimiter = False

  22:         .TextFileCommaDelimiter = True

  23:         .TextFileTrailingMinusNumbers = True

  24:         .Refresh BackgroundQuery:=False

  25:     End With

  26:

  27:     CurrentWorksheet.QueryTables(1).Delete

  28:     ThisWorkbook.Names(WorksheetTabName & "!DeleteMe_1").Delete

  29: End Sub

So let’s go through how this works.

First, the way my program works is that the user supplies a folder containing the CSV files. Then, when they hit a button that says “Run,” I have Excel iterate through the contents of the folder. When I’ve verified a file in the folder is a correct file that will work with my application, I call the UploadFiles method just shown. The FilePath parameter contains to the direct path to the CSV file whose contents I’d like to upload; and the WorksheetTabName parameter refers to the worksheet where’d I’d like the contents to be dumped.

As you can see, on line 8, FilePath is supplied to the connection string when the query table is created. CurrentWorksheet is then used to tell Excel where the data will be placed (line 9).

On line 11, you see I named this query “DeleteMe.” Whenever Excel creates a new Query Table, it also creates a new named range in the background. There’s a default name Excel uses, but you can overwrite it with your own name. In this case, I use “DeleteMe” in place of the default as a reminder that the range shouldn’t exist (if my code errors out, I will see a DeleteMe named range in the name manager). You can see this happening below. It really helps with debugging.

image

The reason there are so many repeats is that each name created by the code above is scoped locally to the sheet in which it was created. Excel also adds that “_1” to the end. I imagine if I had multiple connections on the same sheet it would increment that last number. But maybe someone with more experience could confirm that for me?

In any event, we want to ensure there are no traces of the connection settings. So the BackgroundQuery setting to FALSE on line 24 ensures there are no direct updates from the backend data. Line 27 deletes the QueryTable we just created. Don’t worry this just deletes the connection behind the scenes, but it keeps the data that’s been dumped. Finally, line 28 removes the range that was just created. Since it’s local to the worksheet, we can refer to it as “WorksheetName!DeleteMe_1” and then delete it.

So let’s recap. I’ve created a mechanism to quickly import the data from CSV files. I used the functionality incorporated into Query Tables to bypass other solutions that either were not as good or weren’t feasible. In particular, I was  able to get around two solutions I particularly hate: (1) the iterative algorithm; and (2) copying and pasting from other files.

All told, this solution works great! Right now, Excel will iterate through a folder of about eight CSV files and place the data from each CSV file rather quickly. In fact, it happens so quickly, I had to add a completion msgbox just to ensure the code was actually being run! Neither the iterative or copying/pasting solution can boast these results.

Remember, whenever possible, do more with less.

Advertisements

14 thoughts on “A method for quickly importing and working with CSV files (part 2)

  1. Stakov

    Funny: I faced the same situation few days ago : opening several csv files to get their content.

    After trying to use these qwerytable I finally use this method : open file through workbook.open, pass its content into an array, close the file, do whatever you want with the array.

    I didn’t like the idea to create and delete file connections.

    Why do you prefer queries?

    Reply
      1. Jordan G Post author

        Make sure to read part I of this article: https://optionexplicitvba.com/2014/12/04/a-method-for-quickly-importing-and-working-with-csv-files-part-1, since I talk about ADO there. I actually have nothing against ADODB in terms of functionality – so I agree, it’s an excellent option. But projects which have referenced earlier versions of the ADO libraries always seem to have problems later on when ADO libraries change names (say, from “2.7” to “2.8”). In other words: ADO isn’t always a lasting solution. In fact, project references aren’t what they used to be for anything. Common Controls, for instances, no longer ship with Office, introducing a whole new set of headaches for us.

    1. Jordan G Post author

      @Stakov
      Make Sure to check out Part 1: A method for quickly importing and working with CSV files (part 1: https://optionexplicitvba.com/2014/12/04/a-method-for-quickly-importing-and-working-with-csv-files-part-1/.) One issue with putting the entire thing into an array is that there are limits to array size, and we’re dealing with record counts that could be of any size. Second, why this method is better is because I’m not placing the entire dataset into the additional memory of a variable. Instead, I’m letting Excel take care of it quickly using its own internal mechanisms.

      I don’t particularly like the idea of creating and deleting connections either, but the functionality appeared to be the fastest way to do what I wanted. I’m interested in your experiences. You say you tried the above method and it didn’t work? Could elaborate more on that?

      Reply
      1. Christopher J. McClellan

        Hmmmm I wouldn’t worry too much about the connection string changing. If you’re doing it right, it should be trivial to change. As for the API changing, yes. That could be an issue, but I’m skeptical that it’s as big of an issue as you make it out to be.

      2. Jordan G Post author

        I’ve run into both problems before. The connection string I only ran into once building on work that came long before me, so I really don’t expect it to be an issue long term. But the API changing versions is one I’ve run into more than I’d like. I just have a preference to never tell the client the fix to a problem is for them to jump into the backend and check/uncheck things. We can debate whether it’s a big issue, and chances are, it probably isn’t. It’s more of a preference on my part.

        I will say the most recent break in ActiveX controls (re: the December Office update) is just further reason why I’ve moved away from ActiveX in general. But I will say you gotta go with what works for you. So, if for nothing else, consider this article as providing a workable if somewhat uncommon alternative.

  2. Jimbo

    Jordan, Love the Adv Essentials Book! Many of the concepts remind me of when complex CORBA got dumbed down simple AJAX web dev, but i digress. I am trying to use your Short hand range syntax between two Workbooks, but the syntax always fails. Searching the web seems to say you can’t use this across two workbooks, only the current workbook. So how would you solution copying cells from Workbook2 to Workbook1 using the “Short hand range” syntax. -Jimbo

    Reply
    1. Jordan G Post author

      You know, that is a great question I’ve never ever though of. I think you ought to be able to do it this way [Workbook1!Range1] and [Workbook2!Range1]. Worse case scenario, you move away from shorthand and do Workbooks(…).Range(…).

      Reply
    2. Jordan G Post author

      Also please consider leaving an honest amazon review so others might know what to expect before reading the book.

      Reply
  3. Stephane

    Jordan,

    Me too i like your Advanced Excel book !!! A masterpiece even if i have some question on the use of merged cells into your spreadsheets. Many Excel Gurus consider it as a bad practice because they kind of change the range structure. Can you give us some thoughts on this subject ?

    Thanks,

    STEPHANE

    Reply
    1. Jordan G Post author

      I know many of them consider it bad practice but I’ve yet to figure out why. My guess is that many of developers use VBA heavy solutions that require iteration through large ranges of cells. I use merged ranges for titles and input cells, but I’m rarely in a situation where I a merged cell is in my iteration path. And in any event, I always advocate doing as much work on the spreadsheet as possible, so I think my development style has necessarily avoided the problems that typically arise via merged ranges.

      Reply
  4. H D

    Thank you! I didn’t know this method before and was searching on internet to look for a fast way to import large amount of data ( 100K-300K record at a time) into excel and found this! I also have tried establishing ADO connections and open workbooks separately, but this is the best solution by far! Thank you again!

    Reply

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s