Get Advanced Excel Essentials for $9.99!

I wish this deal were available a few days ago.

Apress’ parent company, Springer, currently has a coupon to receive $30 off any of their books, including Advanced Excel Essentials.

Here’s what you do:

1. Go here to my Advanced Excel Essentials book page. Then click “Get it now.”

2. Use the coupon code HLDY14A to take $30 off the regular price. That’s all. Shipping is free.

Obviously, you can use this code for any of the other books in their library. I’m not affiliated with the coupon, so I won’t get a kick back every time you use it (in fact, I’ll make pennies in royalties from the lowered price!) nor can I really troubleshoot if it doesn’t work. I will say that it’s worked for my friend’s who’ve tried it, so why not give it a try yourself?

A method for quickly importing and working with CSV files (part 2)

This is part 2 in which I explain how I solved the problem. Make sure to read the previous blog posts. Taken together, these show I though through and ultimately solved the problem.

Query Tables to the Rescue!

To solve this problem, I stumbled on something I rarely use, and that’s our good ol’ friend Query Tables. I don’t think they go by Query Tables anymore, but that’s what what they’re called in the code, so that’s the terminology I’m going to use here.

You might have used Query Tables as a manual feature before. For instance, to import a CSV file into Excel using Query Tables you would simply select From Text in the Get External Data Group from on the Data tab. See the snapshot below. These are Query Tables.

image

This functionality is really meant to create a connection between Excel and some other data source. The idea being you can manipulate and analyze data using Excel as your business logic and presentation layer.

But I didn’t need any of that here. I never expected the data in the underlying CSV file to change, so a connection that could refresh wasn’t necessary. Furthermore, unless you’re using these connections with PowerPivot, Excel has a bad history linking to other files.

But what Query Tables brings to the table (pun intended) is that they can dump an entire table’s worth of data rather quickly into a worksheet tab in one fell swoop. Try it for yourself manually to see how quick it is.

In fact, I was able to automate the entire process using the code below.

   1: Private Sub UploadFiles(FilePath As String, WorksheetTabName As String)

   2:     On Error Resume Next

   3:

   4:     Dim CurrentWorksheet As Excel.Worksheet

   5:     Set CurrentWorksheet = Worksheets(WorksheetTabName)

   6:

   7:     With CurrentWorksheet.QueryTables.Add(Connection:= _

   8:         "TEXT;" & FilePath _

   9:         , Destination:=CurrentWorksheet.Range("$A$1"))

  10:         .CommandType = 0

  11:         .Name = "DeleteMe"

  12:         .FieldNames = True

  13:         .RefreshOnFileOpen = False

  14:         .RefreshStyle = xlOverwriteCells

  15:         .AdjustColumnWidth = True

  16:         .RefreshPeriod = 0

  17:         .TextFilePlatform = 437

  18:         .TextFileStartRow = 1

  19:         .TextFileParseType = xlDelimited

  20:         .TextFileTextQualifier = xlTextQualifierDoubleQuote

  21:         .TextFileConsecutiveDelimiter = False

  22:         .TextFileCommaDelimiter = True

  23:         .TextFileTrailingMinusNumbers = True

  24:         .Refresh BackgroundQuery:=False

  25:     End With

  26:

  27:     CurrentWorksheet.QueryTables(1).Delete

  28:     ThisWorkbook.Names(WorksheetTabName & "!DeleteMe_1").Delete

  29: End Sub

So let’s go through how this works.

First, the way my program works is that the user supplies a folder containing the CSV files. Then, when they hit a button that says “Run,” I have Excel iterate through the contents of the folder. When I’ve verified a file in the folder is a correct file that will work with my application, I call the UploadFiles method just shown. The FilePath parameter contains to the direct path to the CSV file whose contents I’d like to upload; and the WorksheetTabName parameter refers to the worksheet where’d I’d like the contents to be dumped.

As you can see, on line 8, FilePath is supplied to the connection string when the query table is created. CurrentWorksheet is then used to tell Excel where the data will be placed (line 9).

On line 11, you see I named this query “DeleteMe.” Whenever Excel creates a new Query Table, it also creates a new named range in the background. There’s a default name Excel uses, but you can overwrite it with your own name. In this case, I use “DeleteMe” in place of the default as a reminder that the range shouldn’t exist (if my code errors out, I will see a DeleteMe named range in the name manager). You can see this happening below. It really helps with debugging.

image

The reason there are so many repeats is that each name created by the code above is scoped locally to the sheet in which it was created. Excel also adds that “_1” to the end. I imagine if I had multiple connections on the same sheet it would increment that last number. But maybe someone with more experience could confirm that for me?

In any event, we want to ensure there are no traces of the connection settings. So the BackgroundQuery setting to FALSE on line 24 ensures there are no direct updates from the backend data. Line 27 deletes the QueryTable we just created. Don’t worry this just deletes the connection behind the scenes, but it keeps the data that’s been dumped. Finally, line 28 removes the range that was just created. Since it’s local to the worksheet, we can refer to it as “WorksheetName!DeleteMe_1” and then delete it.

So let’s recap. I’ve created a mechanism to quickly import the data from CSV files. I used the functionality incorporated into Query Tables to bypass other solutions that either were not as good or weren’t feasible. In particular, I was  able to get around two solutions I particularly hate: (1) the iterative algorithm; and (2) copying and pasting from other files.

All told, this solution works great! Right now, Excel will iterate through a folder of about eight CSV files and place the data from each CSV file rather quickly. In fact, it happens so quickly, I had to add a completion msgbox just to ensure the code was actually being run! Neither the iterative or copying/pasting solution can boast these results.

Remember, whenever possible, do more with less.

A method for quickly importing and working with CSV files (part 1)

So many folks have asked me to write more on my coding style. So here it is: I’ve developed a two-parter blog post to present my thinking style and how I ultimately solved the problem.

The Problem

I have a client with several CSV files that must be checked and validated for certain data specifications. Each individual CSV file is essentially an output of a table from a query of a larger system. My client required I verify the contents of the CSV files and ensure the connections between the file are correct.

By connection, I mean that several of these CSV files have foreign and primary keys between them. The CSV files could also contain many errors. My client needs an Excel application that can verify these files link together with good data. With that information they can upload the files into another system.

Below is my thinking on how to solve the problem. In the next blog post, I’ll go through implementing the solution.

PowerPivot Won’t Cut It

Now, PowerPivot is actually a great candidate for this problem. But I couldn’t be sure my client (and the folks using my Excel application) would have the ability to install and use PowerPivot. So while I’d love to use PowerPivot for this in an ideal world, it just wasn’t a feasible solution for my client.

ADO is Out

So another would be to use ADO (here’s an example to see what I mean). The problem with using ADO is that it relies on references outside the normal VBA library. In previous years, that didn’t use to be a problem. But as I try to run many of my older Excle applications that had interfaced with Access and other Excel files using ADO, it’s just not a solution that ages well. The reason is that the libraries keep getting updated or names change. New versions of Excel require new connection strings. Over time, these types of solutions just break.

Iteration: My Own Worst Enemy

The most obvious solution—indeed, the one I really, really wanted to avoid—was to open these files all at the same time and then iterate through each record. This solution however is actually the bane of my existence.

For one, iterative algorithms like these explode in complexity as more records are added to the CSV file. And they take forever to complete.

Second, iterative algorithms for this type of problem really promotes something I don’t like. Specifically, it requires you create nested loops upon nested loops of iterations and compounding conditions.

This type of iterative solution is often the go-to for many organizations. It often results in hundreds of lines of VBA in a module with large sections of comments explaining what’s going on. But those comments never help. Indeed, when a company expresses concern about hiring an Excel developer because they fear what might happen when that person leaves, it’s because they’ve experienced one of these long-form iterative solutions. Let’s be honest: they take forever to complete; they’re a pain in the ass to debug; and when passed along to the next developer, they’re rarely touched because nobody understands them.

Copying/Pasting: Just Say No

So I decided early on I had no interest in opening these CSV files several at a time and creating a gigantic loop that only I could (barely) understand. I reasoned that I could do much of the required validation rather easily (and quickly) with formulas. But to use formulas on the data, I would have to load these CSV files into the current workbook somehow. My goal then was to have corresponding tabs filled in with the contents of each given CSV file.

Because each CSV files would always have the same name, I wouldn’t have to create new tabs on the fly. So that was nice: I could first create a tab for each corresponding CSV file (in this case, the tab’s name was the same as the CSV file).

So another obvious solution would be to open each file programmatically, and then copy its entire contents over using the Copy and Paste commands in VBA. This solution is common, but also fraught with problems. For one, it’s an expensive solution when the datasets are large. A large CSV file would require a lot of data copied to the clipboard. Several of these pasting operations happening one after another is a great way to crash a spreadsheet. Second, sometimes the clipboard just simply acts unpredictably; clearing out for now apparent reason.

So what did I do? Find out in the the next thrilling blog post… but feel free to share your ideas!